Extracting matched expressions

Hadley Wickham-2

Extracting matched expressions

Hi all,

Is there a tool in base R to extract matched expressions from a
regular expression?  i.e. given the regular expression "(.*?) (.*?)
([ehtr]{5})" is there a way to extract the character vector c("one",
"two", "three") from the string "one two three" ?

Thanks,

Hadley

--
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Gabor Grothendieck

Re: Extracting matched expressions

strapply in the gsubfn package can do that. It applies the given
function, here just c, to the back-references from each pattern match
and then simplifies the result with the function passed as simplify.
(If you omit simplify here, it returns a one-element list, as strsplit
does.)

library(gsubfn)
pat <- "(.*?) (.*?) ([ehtr]{5})"
strapply("one two three", pat, c, simplify = c)

See home page at: http://gsubfn.googlecode.com


On Sun, Nov 8, 2009 at 1:51 PM, Hadley Wickham <[hidden email]> wrote:

> Hi all,
>
> Is there a tool in base R to extract matched expressions from a
> regular expression?  i.e. given the regular expression "(.*?) (.*?)
> ([ehtr]{5})" is there a way to extract the character vector c("one",
> "two", "three") from the string "one two three" ?
>
> Thanks,
>
> Hadley

jholtman

Re: Extracting matched expressions

Is this what you want:

> x <- 'xxxx xxxx one two three xxxx xxxx'
> y <- sub(".*?([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([ehrt]{5}).*",
+     "\\1 \\2 \\3", x, perl=TRUE)
> unlist(strsplit(y, ' '))
[1] "one"   "two"   "three"
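
For readers on a later version of R, the same result can be had without joining and re-splitting. The sketch below assumes regexec() and regmatches(), which, as far as I know, were added to base R after this thread was written (with the perl argument to regexec() arriving later still), so they were not an option for the posters. The variable names are illustrative only.

```r
x <- "one two three"
pat <- "(.*?) (.*?) ([ehtr]{5})"

# regexec() records the position of the full match and of each group
m <- regexec(pat, x, perl = TRUE)

# regmatches() returns one character vector per input string:
# the full match first, then the captured groups
parts <- regmatches(x, m)[[1]]
captures <- parts[-1]   # drop the full match, keep only the groups
print(captures)
```

Dropping the first element leaves only the parenthesised groups, so no delimiter has to be chosen for a later split.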



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Hadley Wickham-2

Re: Extracting matched expressions

Thanks Jim - it's not elegant, but it works.  Instead of using a space
as the delimiter, I used "\u001E", the Unicode record separator
character, since I figure there's less chance of it clashing with a
character in the match.

Hadley
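
The delimiter swap described above might look like the following sketch. Jim's pattern is reused unchanged; the names sep, y and words, and the second example with a space inside a captured group, are assumptions added for illustration and are not Hadley's actual code.

```r
x <- "xxxx xxxx one two three xxxx xxxx"
pat <- ".*?([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([ehrt]{5}).*"
sep <- "\u001E"   # U+001E, unlikely to occur in ordinary text

# Join the back-references with the separator instead of a space
y <- sub(pat, paste("\\1", "\\2", "\\3", sep = sep), x, perl = TRUE)
words <- strsplit(y, sep, fixed = TRUE)[[1]]
print(words)

# Why it matters: a captured group may itself contain a space,
# which a split on " " would break into extra pieces
x2 <- "one two - three"
y2 <- sub("(.*) - (.*)", paste("\\1", "\\2", sep = sep), x2)
groups <- strsplit(y2, sep, fixed = TRUE)[[1]]
print(groups)
```

Splitting with fixed = TRUE treats the separator as a literal string, so it is never interpreted as a regular expression.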

On Sun, Nov 8, 2009 at 1:40 PM, jim holtman <[hidden email]> wrote:

> Is this what you want:
>
>> x <- 'xxxx xxxx one two three xxxx xxxx'
>> y <- sub(".*?([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([ehrt]{5}).*",
> +     "\\1 \\2 \\3", x, perl=TRUE)
>> unlist(strsplit(y, ' '))
> [1] "one"   "two"   "three"

--
http://had.co.nz/
