Using something like the "by" command, but on rows instead of columns

Josh B-3

Using something like the "by" command, but on rows instead of columns

Hello R Forum users,

I was hoping someone could help me with the following problem. Consider the following "toy" dataset:

Accession    SNP_CRY2    SNP_FLC    Phenotype
1    NA    A    0.783143079
2    BQ    A    0.881714811
3    BQ    A    0.886619488
4    AQ    B    0.416893034
5    AQ    B    0.621392903
6    AS    B    0.031719125
7    AS    NA    0.652375037

"Accession" = individual plants, arbitrarily identified by unique numbers
"SNP_" = individual genes.
"SNP_CRY2" = the CRY2 gene. The plants either have the BQ, AQ, or AS genotype at the CRY2 gene. "NA" = missing data.
"SNP_FLC" = the FLC gene. The plants either have the A or B genotype at the FLC gene. "NA" = missing data.
"Phenotype" = a continuous variable of interest.

I have a much larger number of columns corresponding to genes (i.e., more columns with the "SNP_" prefix) in my real dataset. For each gene in turn (i.e., each "SNP_" column), I would like to find the phenotypic variance for all of the plants with non-missing data. Note that the plants with missing genotype data ("NA") differ for each gene (each "SNP_" column).

Would one of you be able to offer some specific code that could do this operation? Please rest assured that I am not a student trying to elicit help with a homework assignment. I am a post-doc with limited R skills, working with a large genetic dataset.

Thanks very much in advance to a wonderful online community.
Sincerely,
Josh



     

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Freedman

Re: Using something like the "by" command, but on rows instead of columns

Some variation of the following might be what you want:

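# simulated example data: one grouping column and two 'snp' columns
df <- data.frame(sex = sample(1:2, 100, replace = TRUE), snp.1 = rnorm(100), snp.15 = runif(100))
df$snp.1[df$snp.1 > 1.0] <- NA  # put some missing values into the data
x <- grep('^snp', names(df)); x  # indices of the columns whose names begin with 'snp'
apply(df[, x], 2, summary)  # summary of each 'snp' column
# or
apply(df[, x], 2, FUN = function(x) mean(x, na.rm = TRUE))  # column means, ignoring NAs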
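
Adapted to the toy data in the question, the same idea might look roughly like the sketch below. It assumes the data sit in a data frame called dat (a hypothetical name, as is snp_cols), that every gene column starts with "SNP_", and that "phenotypic variance" means the variance of Phenotype over the plants whose genotype at that gene is not NA.

```r
# Toy data from the question; `dat` is a hypothetical name for the data frame.
dat <- data.frame(
  Accession = 1:7,
  SNP_CRY2  = c(NA, "BQ", "BQ", "AQ", "AQ", "AS", "AS"),
  SNP_FLC   = c("A", "A", "A", "B", "B", "B", NA),
  Phenotype = c(0.783143079, 0.881714811, 0.886619488, 0.416893034,
                0.621392903, 0.031719125, 0.652375037)
)

# Names of the columns that begin with "SNP_"
snp_cols <- grep("^SNP_", names(dat), value = TRUE)

# For each SNP column in turn, take the variance of Phenotype over the
# plants whose genotype at that gene is not missing.
pheno_var <- sapply(dat[snp_cols], function(g) var(dat$Phenotype[!is.na(g)]))
pheno_var
```

The result is a named numeric vector with one variance per SNP column. If the variance within each genotype class is wanted instead, something like tapply(dat$Phenotype, dat$SNP_CRY2, var) gives that for a single gene; tapply leaves out the NA genotypes by default.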

hth,
david
