Hello R Forum users,
I was hoping someone could help me with the following problem. Consider the following "toy" dataset:
Accession  SNP_CRY2  SNP_FLC  Phenotype
1          NA        A        0.783143079
2          BQ        A        0.881714811
3          BQ        A        0.886619488
4          AQ        B        0.416893034
5          AQ        B        0.621392903
6          AS        B        0.031719125
7          AS        NA       0.652375037
"Accession" = individual plants, arbitrarily identified by unique numbers
"SNP_" = prefix marking the genotype columns; each "SNP_" column corresponds to one gene.
"SNP_CRY2" = the CRY2 gene. The plants either have the BQ, AQ, or AS genotype at the CRY2 gene. "NA" = missing data.
"SNP_FLC" = the FLC gene. The plants either have the A or B genotype at the FLC gene. "NA" = missing data.
"Phenotype" = a continuous variable of interest.
My real dataset has many more gene columns (i.e., more columns with the "SNP_" prefix). For each gene in turn (i.e., each "SNP_" column), I would like to compute the variance of "Phenotype" across only those plants whose genotype at that gene is not missing. Note that the set of plants with missing genotype data ("NA") differs from gene to gene (from one "SNP_" column to the next).
Would one of you be able to offer some specific code that could do this operation? Please rest assured that I am not a student trying to elicit help with a homework assignment. I am a post-doc with limited R skills, working with a large genetic dataset.
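To make the request concrete and reproducible, here is a minimal sketch of the operation described above, written in base R. It is only an illustration under stated assumptions: the data frame name `dat`, the helper names `snp_cols` and `pheno_var`, and the choice of `sapply()` are all hypothetical, and the approach may not be the most efficient one for a large dataset.

```r
# Toy data from above; `dat` is a hypothetical name for the data frame.
dat <- data.frame(
  Accession = 1:7,
  SNP_CRY2  = c(NA, "BQ", "BQ", "AQ", "AQ", "AS", "AS"),
  SNP_FLC   = c("A", "A", "A", "B", "B", "B", NA),
  Phenotype = c(0.783143079, 0.881714811, 0.886619488, 0.416893034,
                0.621392903, 0.031719125, 0.652375037),
  stringsAsFactors = FALSE
)

# Names of all genotype columns, i.e. those starting with "SNP_".
snp_cols <- grep("^SNP_", names(dat), value = TRUE)

# For each gene, take the variance of Phenotype over the plants
# whose genotype at that gene is not missing.
pheno_var <- sapply(snp_cols, function(snp) {
  keep <- !is.na(dat[[snp]])
  var(dat$Phenotype[keep])
})

print(pheno_var)
```

The result is a named numeric vector with one variance per "SNP_" column. If "Phenotype" itself could contain missing values, `var(..., na.rm = TRUE)` would presumably be needed as well.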
Thanks very much in advance to a wonderful online community.
Sincerely,
Josh