Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Tymek W

Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Hi,

Could anyone tell me what is wrong:

> length(unique(mydata$myvariable))
[1] 2
>

and in t-test:

(...)
Error in t.test.formula(othervariable ~ myvariable, mydata) :
  grouping factor must have exactly 2 levels
>

I re-checked the code and still don't get what is wrong.

Moreover, there is some strange behavior:

/1 It seems that the error is sensitive to NA's: it affects some
variables in the data set with NA's and does not affect the same ones
in the data set with NA's removed.

/2 It seems to work differently depending on how the variables are
passed to t.test:

e.g. it happens here: t.test(x~y, dataset) and does not here:
t.test(dataset[['x']]~dataset[['y']])

Does anyone have any ideas?

Greetz,
Timo

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Marc Schwartz-3

Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

On Jul 9, 2009, at 5:04 PM, Tymek W wrote:


Check the output of:

   na.omit(cbind(mydata$othervariable, mydata$myvariable))

which will give you some insight into what data is actually available  
to be used in the t test. This will remove any rows that have missing  
data. Your first test above, checking the number of levels, is before  
missing data is removed.
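
One related detail that is easy to miss: unique() treats NA as a value of its own, so length(unique(...)) can report 2 even when only one real group is present. A minimal sketch with a made-up vector g (hypothetical, not the poster's data):

```r
# Hypothetical grouping vector: one real value plus a missing entry
g <- c("a", "a", NA, "a")

length(unique(g))           # 2: NA is counted as a distinct value
length(unique(na.omit(g)))  # 1: only non-missing values are counted
nlevels(factor(g))          # 1: factor() leaves NA out of the levels by default
```

Whether this applies to the data in question cannot be told from the thread; it is only one way the first check could be misleading.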

The likelihood is that once missing values have been removed, you are  
only left with one unique grouping value in mydata$myvariable.
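
A minimal sketch of that check, assuming a hypothetical data frame with a response column score and a grouping column grp (both names are made up for illustration). It uses complete.cases() rather than cbind(), which keeps the grouping values readable when they are character or factor:

```r
# Hypothetical data: 'score' is the response, 'grp' the two-valued grouping variable
mydata <- data.frame(
  score = c(1.2, 2.3, 3.1, NA, NA, NA),
  grp   = c("a", "a", "a", "b", "b", "b")
)

# TRUE for rows where both the response and the grouping variable are present
ok <- complete.cases(mydata$score, mydata$grp)

# Group counts before and after dropping incomplete rows
table(mydata$grp)
table(mydata$grp[ok])

# Number of distinct groups that actually reach the test
n_groups <- length(unique(mydata$grp[ok]))
n_groups
```

If n_groups is not 2 for a given response variable, the formula method of t.test has nothing to compare for that variable.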

For your note number 2, it should be the same for both examples, as in  
both cases, the same basic approach is used. For example:

> DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))

> DF
   x y
1  1 1
2  2 1
3  3 1
4 NA 2
5 NA 2
6 NA 2

> # Remove missing data
> na.omit(DF)
  x y
1 1 1
2 2 1
3 3 1

> t.test(x ~ y, data = DF)
Error in t.test.formula(x ~ y, data = DF) :
  grouping factor must have exactly 2 levels

> t.test(DF$x ~ DF$y)
Error in t.test.formula(DF$x ~ DF$y) :
  grouping factor must have exactly 2 levels


If you have a small reproducible example where the two function calls  
behave differently, please post back with it.

HTH,

Marc Schwartz

Tymek W

Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Thanks for your hints, but I'm still stuck... In the dataset I mentioned
(N=134) there are only 3 NA's in the variable, and a 41% : 59% distribution
of the two values. It doesn't look like the data is the cause...

I changed and simplified my function; it now prints the number of levels
before doing the rest. Here's a "funny" error result:

> myfun(data, 'varname')

 Levels = 2

Error in t.test.formula(data[[nam[v]]] ~ data[[g]]) :
  grouping factor must have exactly 2 levels

...

I'll paste the simplified code; maybe it will give someone a clue about what is going wrong:

myfun <- function(data, g) {

    require(stats)

    data <- as.data.frame(data)
    nam <- names(data)
    res <- matrix(NA, ncol(data))

    cat("\n Levels =", nlevels(factor(data[[g]])), "\n\n")

    for (v in 1:ncol(data)) {
        if (nam[v] != g) {
            res[v] <- list(t.test(data[[nam[v]]]~data[[g]]))
        }
    }
    res
}

What is going wrong here?

Greetz,
Timo


2009/7/10 Marc Schwartz <[hidden email]>:



--
Regards,
Tymek W

Petr Pikal

Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Hi,

You have to look at your data.
When I applied your function to some artificial data I got the expected result:

> myfun(visko,"konc")

 Levels = 2

[[1]]
[1] NA

[[2]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]]
t = -1.7778, df = 4.541, p-value = 0.1415
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.861362   2.535362
sample estimates:
mean in group 1 mean in group 2
          6.685          11.848


[[3]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]]
t = -2.6074, df = 3.263, p-value = 0.07327
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.070027   0.775027
sample estimates:
mean in group 1 mean in group 2
         2.3275          6.9750

Try

debug(myfun)

and see at which column it gives an error and what all the values look like
immediately before the error.
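
As a non-interactive complement to debug(), the same question (which column fails, and what does the grouping look like there) can be sketched with tryCatch(). The data frame dat and the column names grp, a and b below are hypothetical stand-ins, not the poster's data:

```r
# Hypothetical data frame: 'grp' is the grouping column, 'a' and 'b' are responses
dat <- data.frame(
  grp = rep(1:2, each = 4),
  a   = c(1.1, 2.0, 2.9, 4.2, 5.1, 6.3, 6.8, 8.0),
  b   = c(1.5, 2.5, 3.5, 4.5, NA, NA, NA, NA)  # group 2 is entirely missing
)
g <- "grp"
responses <- setdiff(names(dat), g)

# For each response column, run the test and record "ok" or the error message
status <- sapply(responses, function(v) {
  tryCatch({
    t.test(dat[[v]] ~ dat[[g]])
    "ok"
  }, error = function(e) conditionMessage(e))
})
status

# For each response column, observations per group once its NAs are dropped
survivors <- lapply(dat[responses], function(x) table(dat[[g]][!is.na(x)]))
survivors
```

A column whose entry in survivors shows fewer than two groups is one where the overall "Levels = 2" printout says nothing about what t.test actually receives.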

Regards
Petr


[hidden email] wrote on 10.07.2009 11:40:30:
