Determine the Length of the Longest Word in a String

Shadley Thomas

Determine the Length of the Longest Word in a String

Hi Everyone,

I'm new to programming R and have accomplished my goal, but feel that there
is probably a more efficient way of coding this.  I'd appreciate any
guidance that a more advanced programmer can provide.

My goal --
I would like to find the length of the longest word in a string containing
many words separated by spaces.

How I did it --
I was able to find the length of the longest word by parsing the string into
a list of separate words, using the function "which.max" to determine the
element with the longest length, and then using "nchar" to calculate the
length of that particular word.

My question --
It seems inefficient to determine which element is the longest and then
calculate the length of that longest element.  I was hoping to find a way to
simply return the length of the longest word in a more straightforward way.

Short sample code --
> shadstr <- c("My string of words with varying lengths.  Longest word is nine - 1 22 333 999999999 4444")
> shadvector <- unlist(strsplit(shadstr, split=" "))
> shadvlength <- lapply(shadvector,nchar)
> shadmaxind <- which.max(shadvlength) ## Maximum element
> shadmax <- nchar(shadvector[shadmaxind])
> shadmax
[1] 9

Many thanks for your help and suggestions.
Shad

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Kingsford Jones

Re: Determine the Length of the Longest Word in a String

On Fri, Apr 10, 2009 at 2:40 PM, Shadley Thomas
<[hidden email]> wrote:
[snip]
> My question --
> It seems inefficient to determine which element is the longest and then
> calculate the length of that longest element.  I was hoping to find a way to
> simply return the length of the longest word in a more straightforward way.
>
> Short sample code --
>> shadstr <- c("My string of words with varying lengths.  Longest word is nine - 1 22 333 999999999 4444")
>> shadvector <- unlist(strsplit(shadstr, split=" "))

nchar is vectorized, so at this point you can just do

> max(nchar(shadvector))
[1] 9
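
A minimal sketch of what "vectorized" means here, using a made-up vector `words` that is not from the thread: calling nchar() once on the whole vector should give the same lengths as calling it element by element, so the lapply() step can be dropped.

```r
# Hypothetical input, chosen only for illustration.
words <- c("a", "bb", "", "dddd")

# Element-by-element: call nchar() once per word.
per_element <- sapply(words, nchar, USE.NAMES = FALSE)

# Vectorized: a single call returns one length per element.
vectorized <- nchar(words)

# Both routes give the same integer vector.
stopifnot(identical(per_element, vectorized))

# The longest length is then just the maximum of that vector.
longest <- max(vectorized)
```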

hth,
Kingsford Jones


Marc Schwartz-3

Re: Determine the Length of the Longest Word in a String

In reply to this post by Shadley Thomas
On Apr 10, 2009, at 3:40 PM, Shadley Thomas wrote:

[snip]

Welcome to R Shad.

Note that the 'x' argument to nchar() can be a vector, which means  
that it will return the character lengths of the individual elements  
of the vector. Thus:

# Get the individual components, I use list indexing here, but
# unlist() works as well
 > strsplit(shadstr, " ")[[1]]
 [1] "My"        "string"    "of"        "words"     "with"
 [6] "varying"   "lengths."  ""          "Longest"   "word"
[11] "is"        "nine"      "-"         "1"         "22"
[16] "333"       "999999999" "4444"

# Get the lengths of each
 > nchar(strsplit(shadstr, " ")[[1]])
 [1] 2 6 2 5 4 7 8 0 7 4 2 4 1 1 2 3 9 4

# Get the max length
 > max(nchar(strsplit(shadstr, " ")[[1]]))
[1] 9
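
The three steps above can be folded into one small function. This is only a sketch: the name `longest_word_length` and its `split` argument are my own, not something from base R or from this thread. It also accepts several strings at once, returning one length per string.

```r
# Hypothetical helper, not part of the thread: longest word length per string.
longest_word_length <- function(x, split = " ") {
  # strsplit() returns a list with one character vector of pieces per string.
  pieces <- strsplit(x, split)
  # The extra 0L keeps max() well defined when a string yields no pieces.
  vapply(pieces, function(w) max(nchar(w), 0L), integer(1))
}

# Sample string from the original post, kept on a single line.
shadstr <- "My string of words with varying lengths.  Longest word is nine - 1 22 333 999999999 4444"

res_single <- longest_word_length(shadstr)
res_many <- longest_word_length(c("a bb ccc", "dddd e"))
```

Using vapply() with `integer(1)` rather than sapply() is a design choice: it fails loudly if any element does not produce exactly one integer.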


As an aside, note that there are two spaces between the period '.' and  
the word "Longest", which results in an empty element in the resultant  
vector. If you wanted to split on one or more spaces, you could use a  
'regular expression' in strsplit() such as:

 > strsplit(shadstr, " +")[[1]]
 [1] "My"        "string"    "of"        "words"     "with"
 [6] "varying"   "lengths."  "Longest"   "word"      "is"
[11] "nine"      "-"         "1"         "22"        "333"
[16] "999999999" "4444"

In the above, the use of " +" says to match one or more spaces as the  
split character. See ?regex for more information on that point.
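
A related case, sketched under the assumption that the string may also contain tabs or line breaks (for example, if the wrapped sample string is pasted with its line break intact): " +" only matches spaces, while the POSIX class "[[:space:]]+" matches any run of whitespace. The string `wrapped` below is a shortened, made-up variant of the sample.

```r
# Hypothetical variant of the sample string with an embedded newline.
wrapped <- "lengths.  Longest word is\nnine - 1"

# Splitting on spaces only leaves "is\nnine" glued together as one piece.
by_spaces <- strsplit(wrapped, " +")[[1]]

# Splitting on any run of whitespace separates "is" and "nine".
by_whitespace <- strsplit(wrapped, "[[:space:]]+")[[1]]

length(by_spaces)
length(by_whitespace)
```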

HTH,

Marc Schwartz

Gabor Grothendieck

Re: Determine the Length of the Longest Word in a String

In reply to this post by Shadley Thomas
Using strapply from the gsubfn package, we extract every run of word
characters, apply nchar to each, and simplify the result by taking the max.

library(gsubfn)
strapply(shadstr, "\\w+", nchar, simplify = max)

See the info on the gsubfn home page:
http://gsubfn.googlecode.com
as well as the vignette, help file and demos.
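
For readers without gsubfn installed, roughly the same idea can be sketched in base R with gregexpr() and regmatches(). This is a stand-in, not the gsubfn method itself. Because "\\w+" matches only word characters, punctuation is left out: "lengths." is counted as "lengths" and the lone "-" is dropped entirely.

```r
# Sample string from the original post, kept on a single line.
shadstr <- "My string of words with varying lengths.  Longest word is nine - 1 22 333 999999999 4444"

# Locate every run of word characters, then pull the matched text out.
tokens <- regmatches(shadstr, gregexpr("\\w+", shadstr))[[1]]

# Length of the longest token.
longest <- max(nchar(tokens))
```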
