Issue with %in% - not matching identical rows in data frames

Kaushik Krishnan

Hi folks

I have two data frames.  I know that the nth (let's say the 7th) row
in the first data frame (sequence) is there in the second
(today.sequence).  When I try to check that by doing 'sequence[7,]
%in% today.sequence', I get all FALSE when it should be all TRUE.

I'm certain I'm making some trivial mistake.  Any solutions?

The code to recreate the data frames and see for yourself is:
----
sequence <- structure(list(DATE = structure(c(14549, 14549, 14553, 14550,
14557, 14550, 14551, 14550), class = "Date"), DATASET = c(1L,
2L, 1L, 2L, 2L, 3L, 3L, 4L), REP = c(1L, 0L, 2L, 2L, 3L, 0L,
1L, 0L), WRONGS_ABS = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), WRONGS_RATIO = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), DONE = c(1L, 1L, 0L, 1L, 0L, 1L,
0L, 0L)), .Names = c("DATE", "DATASET", "REP", "WRONGS_ABS",
"WRONGS_RATIO", "DONE"), class = "data.frame", row.names = c(NA,
-8L))

today.sequence <- structure(list(DATE = structure(c(14551, 14550),
class = "Date"),
    DATASET = 3:4, REP = c(1L, 0L), WRONGS_ABS = c(0L, 0L),
WRONGS_RATIO = c(0L,
    0L), DONE = c(0L, 0L)), .Names = c("DATE", "DATASET", "REP",
"WRONGS_ABS", "WRONGS_RATIO", "DONE"), row.names = 7:8, class = "data.frame")

sequence[7,]
# You should see '2009-11-03       3   1          0            0    0'

today.sequence
# You can clearly see that sequence[7,] is the first row in today.sequence

sequence[7,] %in% today.sequence
# This should show 'TRUE TRUE TRUE TRUE TRUE TRUE'.  Instead
# it shows 'FALSE FALSE FALSE FALSE FALSE FALSE'
----

Thanks

--
Kaushik Krishnan
([hidden email])

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Sundar Dorai-Raj-2

Re: Issue with %in% - not matching identical rows in data frames

?"%in%" says "x" and "table" must be vectors. You supplied
data.frames, which are lists of columns. So %in% is coercing each data
frame to a character vector with one string per column, as in

as.character(today.sequence)
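
To see what that coercion produces, here is a small sketch. The data frame `df` is made up for illustration and is not from the original post; the exact deparsed strings can differ between R versions, so none are shown.

```r
# Hypothetical two-column data frame, for illustration only.
df <- data.frame(DATASET = 3:4, REP = c(1L, 0L))

# as.character() on a data frame (a list of columns) returns one string
# per column, not one string per cell.
whole <- as.character(df)
length(whole)          # equals ncol(df), i.e. 2

# A one-row data frame yields the bare cell values as strings, so they
# cannot equal the deparsed whole-column strings above.
one_row <- as.character(df[1, ])
one_row %in% whole
```

This is why comparing a single row against a multi-row data frame finds no matches, while comparing a data frame against itself does.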

Perhaps you should paste the columns together first:

x <- do.call("paste", c(sequence, sep = "::"))
table <- do.call("paste", c(today.sequence, sep = "::"))
x[7] %in% table

I'm not sure if this is what you want/need, but it does match your example.
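
The same idea can be wrapped in a small helper that tests every row at once. This is only a sketch: `rows_in`, `a` and `b` are hypothetical names, and it assumes both data frames have the same columns in the same order and that the separator never occurs inside the data.

```r
# Hypothetical helper (not from the thread): for each row of `x`, report
# whether an identical row exists in `table`, by pasting columns into keys.
rows_in <- function(x, table, sep = "::") {
  stopifnot(identical(names(x), names(table)))
  # unname() keeps a column called "sep" or "collapse" from being taken
  # as an argument of paste().
  key_x     <- do.call("paste", c(unname(as.list(x)), sep = sep))
  key_table <- do.call("paste", c(unname(as.list(table)), sep = sep))
  key_x %in% key_table
}

# Made-up data in the spirit of the example.
a <- data.frame(DATE = as.Date(c("2009-11-01", "2009-11-03", "2009-11-02")),
                DATASET = c(1L, 3L, 4L),
                REP = c(1L, 1L, 0L))
b <- data.frame(DATE = as.Date(c("2009-11-03", "2009-11-02")),
                DATASET = 3:4,
                REP = c(1L, 0L))

res <- rows_in(a, b)
res   # one logical per row of `a`
```

Because paste() converts every column to character, values that print the same are treated as equal; that is usually fine for integers and dates but may be too loose for floating-point columns.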

HTH,

--sundar

On Tue, Nov 3, 2009 at 7:53 AM, Kaushik Krishnan
<[hidden email]> wrote:

> [original message snipped]

Charles C. Berry

Re: Issue with %in% - not matching identical rows in data frames

In reply to this post by Kaushik Krishnan


Kaushik,

The documentation doesn't quite tell (me, anyway) how the function behaves
when 'table' is a list (or data.frame). You'll need to dig into the C source
for match() (src/main/unique.c) or experiment with match() or %in% to see
what it is actually doing.

But it looks like it is matching whole columns of the data.frame rather
than elements within each column:

>  sequence %in% sequence
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>  sequence %in% rev(sequence)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>
>  sequence[1,] %in% sequence
[1] FALSE FALSE FALSE FALSE FALSE FALSE
>  sequence[1,] %in% sequence[1,]
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>

Maybe you wanted something like

  mapply( function(x,y) x%in%y , sequence[7, ], today.sequence )

??
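
One caveat with the column-by-column check, sketched below with made-up data (`a` and `b` are hypothetical, not the poster's data frames): each value can be found in its own column without the whole row being present. Using merge() for the row-level check is an alternative swapped in here, not something proposed earlier in the thread.

```r
# Made-up data: every value of a[1, ] occurs somewhere in the matching
# column of b, but never together in one row.
a <- data.frame(x = c(1L, 2L), y = c("p", "q"), stringsAsFactors = FALSE)
b <- data.frame(x = c(1L, 2L), y = c("q", "p"), stringsAsFactors = FALSE)

# Column-wise test: is each cell of row 1 present in the same column of b?
colwise <- mapply(function(x, y) x %in% y, a[1, ], b)
colwise

# Row-wise test: merge() joins on all shared columns by default, so a
# non-empty result means the complete row exists in b.
row_present <- nrow(merge(a[1, ], b)) > 0
row_present
```

So all(colwise) being TRUE is necessary for a row match but not sufficient.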

HTH,

Chuck


On Tue, 3 Nov 2009, Kaushik Krishnan wrote:

> [original message snipped]

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:[hidden email]            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
