correct way to subset a vector

Juliet Hannah

correct way to subset a vector

Hi,

#make example data
dat <- data.frame(matrix(rnorm(15),ncol=5))
colnames(dat) <- c("ab","cd","ef","gh","ij")

If I want to get a subset of the data for the middle 3 columns, and I
know the names of the start column and the end column, I can do this:

mysub <- subset(dat,select=c(cd:gh))

If I wanted to do this just on the column names, without subsetting
the data, how could I do this?

mynames <- colnames(dat)

#mynames
#[1] "ab" "cd" "ef" "gh" "ij"

Is there an easy way to create the vector c("cd","ef","gh") as I did
above using something similar to cd:gh?

Thanks,

Juliet

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Steve Lianoglou-6

Re: correct way to subset a vector

Hi,

On Jul 9, 2009, at 11:40 AM, Juliet Hannah wrote:

> Is there an easy way to create the vector c("cd","ef","gh") as I did
> above using something similar to cd:gh?

How about just indexing your mynames vector by position? E.g.:

R> mynames[2:4]
[1] "cd" "ef" "gh"

R> dat[, mynames[2:4]]
           cd         ef          gh
1  1.7745386  1.0958930 -0.07213304
2  0.7480372 -0.1364458 -0.62848211
3 -0.5477843  1.5811382 -0.74404103
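
For what it's worth, here is a minimal sketch of the same idea on a reproducible copy of the example data. The set.seed() call and the variable names middle, by_name and by_pos are my additions, not part of the original post. It only shows that indexing by position and indexing by the extracted names pick the same columns; it still assumes you already know the positions 2:4.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(matrix(rnorm(15), ncol = 5))
colnames(dat) <- c("ab", "cd", "ef", "gh", "ij")
mynames <- colnames(dat)

middle <- mynames[2:4]      # names of columns 2 through 4

by_name <- dat[, middle]    # select columns by name
by_pos  <- dat[, 2:4]       # select the same columns by position
identical(by_name, by_pos)  # both routes give the same data frame
```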

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos

Marc Schwartz-3

Re: correct way to subset a vector

On Jul 9, 2009, at 10:40 AM, Juliet Hannah wrote:

> Is there an easy way to create the vector c("cd","ef","gh") as I did
> above using something similar to cd:gh?



Using the same presumption that the desired values are consecutive in  
the vector:

# Use which() to get the indices for the start and end of the subset
 > mynames[which(mynames == "cd"):which(mynames == "gh")]
[1] "cd" "ef" "gh"


You can encapsulate that in a function:

subset.vector <- function(x, start, end)
{
   # Assumes 'start' and 'end' each match exactly one element of 'x'
   x[which(x == start):which(x == end)]
}

 > subset.vector(mynames, "cd", "gh")
[1] "cd" "ef" "gh"
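
The which()-based function above relies on both names being present exactly once. If a name is missing, which() returns a zero-length vector and the ':' call fails with a fairly cryptic message. A hedged sketch of a more defensive variant follows; name_range is a hypothetical helper name of my own, and match() is used so that only the first occurrence of each name counts.

```r
# Hypothetical helper (not from the thread): same idea as the which()-based
# function, but it fails with a clear message when a name is missing or the
# two names are in the wrong order.
name_range <- function(x, start, end) {
  i <- match(start, x)  # position of the first occurrence, NA if absent
  j <- match(end, x)
  if (is.na(i) || is.na(j)) {
    stop("both 'start' and 'end' must occur in 'x'")
  }
  if (i > j) {
    stop("'start' must not come after 'end'")
  }
  x[i:j]
}

mynames <- c("ab", "cd", "ef", "gh", "ij")
name_range(mynames, "cd", "gh")
```

Whether a reversed range should be an error or should return the names in reverse order is a design choice; the sketch treats it as an error.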



Note that you can also do this:

 > names(subset(dat, select = cd:gh))
[1] "cd" "ef" "gh"

but that actually goes through the process of subsetting the data  
frame first, which potentially introduces a lot of overhead and memory  
use if the data frame is large. It also presumes that the desired  
vector is a subset of the column names of the initial data frame.
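
If you do want to stay with subset() here, one possible way around the overhead is to hand it a zero-row copy of the data frame, so the column selection has no rows to carry along. This is only a sketch: the names empty and picked are mine, and I have not benchmarked it on large data.

```r
dat <- data.frame(matrix(rnorm(15), ncol = 5))
colnames(dat) <- c("ab", "cd", "ef", "gh", "ij")

# dat[0, ] keeps every column but no rows, so subset() only has
# empty columns to select from.
empty <- dat[0, , drop = FALSE]
picked <- names(subset(empty, select = cd:gh))
picked
```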


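To use the same sequence-based approach as subset.data.frame(), you can
do what that function does internally: build a list that maps each name
to its position, then evaluate the 'select' expression against that list:

subset.vector <- function(x, select)
{
   # Map each name to its integer position, e.g. list(ab = 1L, cd = 2L, ...)
   nl <- as.list(seq_along(x))
   names(nl) <- x
   # Evaluate the unevaluated 'select' expression with the names acting
   # as variables, so cd:gh becomes 2:4
   vars <- eval(substitute(select), nl)
   x[vars]
}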


 > subset.vector(mynames, select = cd:gh)
[1] "cd" "ef" "gh"
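
To make the mechanism less magical, here are the same steps run by hand outside the function. This is a sketch with quote() standing in for substitute(); the names idx, keep_some and drop_mid are mine. It also shows that anything which evaluates to valid indices works, not just a single range. Note that it assumes the names are syntactically valid R names.

```r
mynames <- c("ab", "cd", "ef", "gh", "ij")

# Map each name to its position: list(ab = 1L, cd = 2L, ...).
nl <- as.list(seq_along(mynames))
names(nl) <- mynames

# Evaluated with nl as the lookup table, cd is 2 and gh is 4,
# so cd:gh is the integer sequence 2:4.
idx <- eval(quote(cd:gh), nl)
idx

# Other index expressions work the same way.
keep_some <- mynames[eval(quote(c(ab, ef:gh)), nl)]  # ab, then ef through gh
drop_mid  <- mynames[eval(quote(-(cd:gh)), nl)]      # everything except cd through gh
```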



BTW, well done on recognizing that you can use the sequence of column  
names for the 'select' argument. A lot of folks, even experienced  
useRs, don't realize that you can do that...  :-)

HTH,

Marc Schwartz
