creating multiple new variables from one data.frame according to columns and rows in that frame

Hayes, Daniel

Dear R-helpers,

I have a data.frame (bcpe.lat.m) containing 13 countries, ages 0-50 yrs in monthly steps, and the corresponding mu and sigma (see below).

* I would like to limit the age range to include all 12 months for each of the first 5 years and only whole years for all ages thereafter, for each of the countries present in the data frame.

* I would like to create separate data.frames according to the country the data is from (Bolivia.bcpe.lat.m, brazil.bcpe.lat.m, etc.)


I have tried using c(seq(0,5,1/12),seq(5,50,1)) to select the desired ages, but am unsure how to repeat that sequence for consecutive countries.
I have tried using split(bcpe.lat.m, bcpe.lat.m$country), but end up with a string from which I am no longer able to select the specific ages I want, and all the data still remains in one variable.
I have also looked at 'by', 'apply' and things like 'for (i in 1:13)'.

Help with either or both steps would be greatly appreciated.

Greetings from Formentera,
Daniel

           Age(yrs) country       mu     sigma
1   0.00000000   Bolivia 11.42168 0.1014872
2   0.08333333   Bolivia 11.33625 0.1053837
3   0.16666667   Bolivia 11.28417 0.1070594
4   0.25000000   Bolivia 11.21125 0.1083872
5   0.33333333   Bolivia 11.11637 0.1095305
...
602  0.00000000  Brazil 11.54888 0.10839417
603  0.08333333  Brazil 11.46345 0.11255592
604  0.16666667  Brazil 11.41137 0.11434565
605  0.25000000  Brazil 11.33844 0.11576378
606  0.33333333  Brazil 11.24357 0.11698489
...



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
jholtman

Re: creating multiple new variables from one data.frame according to columns and rows in that frame

try this:

> x <- read.table(textConnection("          Age(yrs) country       mu     sigma
+ 1   0.00000000   Bolivia 11.42168 0.1014872
+ 2   0.08333333   Bolivia 11.33625 0.1053837
+ 3   0.16666667   Bolivia 11.28417 0.1070594
+ 4   0.25000000   Bolivia 11.21125 0.1083872
+ 5   0.33333333   Bolivia 11.11637 0.1095305
+ 5.1   5  Bolivia 11.11637 0.1095305
+ 5.2   5.5   Bolivia 11.11637 0.1095305
+ 5.3   6   Bolivia 11.11637 0.1095305
+ 5.4   20   Bolivia 11.11637 0.1095305
+ 5.5   20.1   Bolivia 11.11637 0.1095305
+ 5.6   50   Bolivia 11.11637 0.1095305
+ 602  0.00000000  Brazil 11.54888 0.10839417
+ 603  0.08333333  Brazil 11.46345 0.11255592
+ 604  0.16666667  Brazil 11.41137 0.11434565
+ 605  0.25000000  Brazil 11.33844 0.11576378
+ 606  0.33333333  Brazil 11.24357 0.11698489"), header=TRUE)
> closeAllConnections()
> result <- lapply(split(x, x$country), function(.ctry){
+     # keep all < 5 and only integers over 5
+     subset(.ctry, .ctry$Age.yrs. < 5 | .ctry$Age.yrs. %in% 5:50)
+ })
>
> result
$Bolivia
       Age.yrs. country       mu     sigma
1    0.00000000 Bolivia 11.42168 0.1014872
2    0.08333333 Bolivia 11.33625 0.1053837
3    0.16666667 Bolivia 11.28417 0.1070594
4    0.25000000 Bolivia 11.21125 0.1083872
5    0.33333333 Bolivia 11.11637 0.1095305
5.1  5.00000000 Bolivia 11.11637 0.1095305
5.3  6.00000000 Bolivia 11.11637 0.1095305
5.4 20.00000000 Bolivia 11.11637 0.1095305
5.6 50.00000000 Bolivia 11.11637 0.1095305

$Brazil
      Age.yrs. country       mu     sigma
602 0.00000000  Brazil 11.54888 0.1083942
603 0.08333333  Brazil 11.46345 0.1125559
604 0.16666667  Brazil 11.41137 0.1143456
605 0.25000000  Brazil 11.33844 0.1157638
606 0.33333333  Brazil 11.24357 0.1169849
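
The list returned above already holds one data frame per country, so separate objects are often unnecessary. The following is a minimal sketch of both options, using a small made-up data frame; the names `toy`, `by_country` and `named` and all values are hypothetical stand-ins, not the poster's data.

```r
# Toy data standing in for the real data frame (values are made up).
toy <- data.frame(
  age     = c(0, 0.5, 5, 0, 0.5, 5),
  country = c("Bolivia", "Bolivia", "Bolivia", "Brazil", "Brazil", "Brazil"),
  mu      = c(11.4, 11.0, 10.5, 11.5, 11.1, 10.6)
)

# split() returns a named list with one data frame per country.
by_country <- split(toy, toy$country)

names(by_country)        # the country names
by_country$Bolivia       # access one country by name
by_country[["Brazil"]]   # equivalent, handy when the name is held in a variable

# If separate objects are really wanted, list2env() can create them.
# make.names() turns labels such as "Dominican Rep." into valid object names.
named <- by_country
names(named) <- paste0(make.names(names(named)), ".bcpe.lat.m")
list2env(named, envir = environment())

nrow(Bolivia.bcpe.lat.m)
```

Keeping the list is usually easier to work with, since later steps can loop over it with lapply() instead of referring to 13 separately named objects.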


On Tue, Nov 3, 2009 at 9:31 AM, Hayes, Daniel <[hidden email]> wrote:

> [...]



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Hayes, Daniel

Re: creating multiple new variables from one data.frame according to columns and rows in that frame

Jim Holtman,
Thank you for your reply.
Your script is very concise and I think it could help me.
However, when I run it on my real data object (musigma.lat.m), the ages from 5 to 50 skip certain whole years (see the script below).
I am not sure why that is, and no error is given.
I hope you can help.

Thank you in advance for your time and energy.
All the best,
Daniel

> dput(musigma.lat.m[580:620,])
structure(list(age = c(48.25, 48.3333333333333, 48.4166666666667,
48.5, 48.5833333333333, 48.6666666666667, 48.75, 48.8333333333333,
48.9166666666667, 49, 49.0833333333333, 49.1666666666667, 49.25,
49.3333333333333, 49.4166666666667, 49.5, 49.5833333333333, 49.6666666666667,
49.75, 49.8333333333333, 49.9166666666667, 50, 0, 0.0833333333333333,
0.166666666666667, 0.25, 0.333333333333333, 0.416666666666667,
0.5, 0.583333333333333, 0.666666666666667, 0.75, 0.833333333333333,
0.916666666666667, 1, 1.08333333333333, 1.16666666666667, 1.25,
1.33333333333333, 1.41666666666667, 1.5), country = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bolivia", "Brazil",
"Colombia", "Dominican Rep.", "El Salvador", "Guatemala", "Guyana",
"Haiti", "Honduras", "Nicaragua", "Paraguay", "Peru", "Suriname"
), class = "factor"), mu = c(10.7198320154036, 10.7193221119285,
10.7188036231439, 10.7182764259851, 10.7177406001273, 10.7171962535812,
10.7166435245754, 10.7160826629999, 10.7155141252060, 10.7149385933270,
10.7143568116012, 10.7137696820872, 10.7131779280271, 10.7125822168258,
10.7119832145823, 10.7113816139594, 10.7107780960397, 10.7101732860418,
10.7095677728307, 10.7089620284128, 10.7083564497153, 10.7077512971194,
11.548875536071, 11.4634458099448, 11.4113675486745, 11.3384424250672,
11.2435706626324, 11.1313969585720, 11.0086560681222, 10.8827443523793,
10.7598371816865, 10.6440424747848, 10.5382165128003, 10.4442220905656,
10.3633207905823, 10.2961499250469, 10.2427320635721, 10.2025802100475,
10.1749531325293, 10.1590477762319, 10.1540156426321), sigma = c(0.0947487228789027,
0.0947760295260326, 0.0948033853581562, 0.0948307832769866, 0.094858216728106,
0.0948856796527442, 0.0949131660004063, 0.0949406718763748, 0.0949681949273155,
0.0949957322607503, 0.0950232806230888, 0.095050836445582, 0.0950783958990592,
0.0951059550037287, 0.0951335102859937, 0.0951610590705954, 0.0951885984623664,
0.0952161256367413, 0.0952436392777666, 0.0952711384472643, 0.0952986226318235,
0.0953260918098295, 0.108394172852678, 0.112555919942990, 0.114345649992535,
0.115763779372203, 0.116984886895669, 0.118065092089138, 0.119029362771532,
0.119887968678076, 0.120638553936562, 0.121278180095107, 0.121810743569063,
0.122245010348365, 0.122590801228219, 0.122858869689557, 0.123059216409329,
0.123199542683827, 0.123286339009648, 0.123324768295488, 0.123319375423601
)), .Names = c("age", "country", "mu", "sigma"), row.names = c("580",
"581", "582", "583", "584", "585", "586", "587", "588", "589",
"590", "591", "592", "593", "594", "595", "596", "597", "598",
"599", "600", "601", "602", "603", "604", "605", "606", "607",
"608", "609", "610", "611", "612", "613", "614", "615", "616",
"617", "618", "619", "620"), class = "data.frame")
>
> result <- lapply(split(musigma.lat.m, musigma.lat.m$country), function(.ctry){
+      # keep all < 5 and only integers over 5
+      subset(.ctry, .ctry$age < 5 | .ctry$age %in% 5:50)
+  })
>
> result
$Bolivia
            age country       mu      sigma
1    0.00000000 Bolivia 11.42168 0.10148719
2    0.08333333 Bolivia 11.33625 0.10538375
3    0.16666667 Bolivia 11.28417 0.10705943
4    0.25000000 Bolivia 11.21125 0.10838720
5    0.33333333 Bolivia 11.11637 0.10953050
...
59   4.83333333 Bolivia 10.49080 0.10671819
60   4.91666667 Bolivia 10.48562 0.10653400
109  9.00000000 Bolivia 10.43279 0.10180158
133 11.00000000 Bolivia 10.33394 0.10160484
169 14.00000000 Bolivia 10.24878 0.09946659
193 16.00000000 Bolivia 10.20148 0.09694376
205 17.00000000 Bolivia 10.16589 0.09573946

$Brazil
             age country       mu      sigma
602   0.00000000  Brazil 11.54888 0.10839417
603   0.08333333  Brazil 11.46345 0.11255592
604   0.16666667  Brazil 11.41137 0.11434565
605   0.25000000  Brazil 11.33844 0.11576378
...
660   4.83333333  Brazil 10.61799 0.11398118
661   4.91666667  Brazil 10.61281 0.11378445
710   9.00000000  Brazil 10.55999 0.10872996
734  11.00000000  Brazil 10.46113 0.10851983
770  14.00000000  Brazil 10.37597 0.10623606
794  16.00000000  Brazil 10.32867 0.10354153

-----Original Message-----
From: jim holtman [mailto:[hidden email]]
Sent: 04 November 2009 03:12
To: Hayes, Daniel
Cc: [hidden email]
Subject: Re: [R] creating multiple new variables from one data.frame according to columns and rows in that frame

[...]
jholtman

Re: creating multiple new variables from one data.frame according to columns and rows in that frame

My guess is that we are being affected by FAQ 7.31 (good old floating
point numbers).  The test 'age %in% 5:50' might be affected by round
off.  Something like the following might be better:

age < 5 | (abs(age - round(age)) < 0.001)

This should give TRUE for all ages that are 'close' to the year.  Take
a look at your data where you think values might be missing and set
'options(digits=20)' to print out the full values.
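
The tolerance test above can be dropped into the earlier split/lapply approach. Here is a minimal sketch under stated assumptions: the data frame `toy`, the helper `keep_ages` and its `tol` argument are hypothetical names, and building the age column with seq() is only a guess at how such monthly ages might arise.

```r
# Hypothetical stand-in for the real data: monthly ages 0-50 for two countries.
age <- seq(0, 50, by = 1/12)
toy <- data.frame(
  age     = rep(age, times = 2),
  country = rep(c("Bolivia", "Brazil"), each = length(age))
)

# Keep every month below age 5, and after that only ages within
# `tol` of a whole year (0.001 is the tolerance from the reply above).
keep_ages <- function(d, tol = 0.001) {
  d[d$age < 5 | abs(d$age - round(d$age)) < tol, ]
}

result <- lapply(split(toy, toy$country), keep_ages)

# Exact matching, for comparison: it can silently drop whole years
# whose stored value is off by a tiny rounding error.
exact <- toy$age < 5 | toy$age %in% 5:50

sapply(result, nrow)  # 106 rows per country: 60 months plus 46 whole years
sum(exact)            # at most 212; lower if any whole year is stored inexactly
```

A tolerance of 0.001 years is far smaller than the 1/12-year spacing of the data, so it cannot pick up a neighbouring month by mistake.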

On Wed, Nov 4, 2009 at 8:03 AM, Hayes, Daniel <[hidden email]> wrote:

> [...]
Hayes, Daniel

Re: creating multiple new variables from one data.frame according to columns and rows in that frame

YES, that does the trick.
Glad to have your help, for I had no idea that FAQ 7.31 existed, nor, for that matter, do I completely understand what floating point numbers are (but that is another story :P)
Think I am all set.
Cheers again for your time and energy.

Daniel
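
For readers in the same position, the floating point issue behind FAQ 7.31 can be seen in a few lines. This is a small illustrative sketch, not taken from the thread; the variable `x` is arbitrary.

```r
# Many decimal fractions have no exact binary representation,
# so an arithmetic result can differ slightly from the "obvious" value.
x <- 0.1 + 0.2

x == 0.3                    # FALSE
isTRUE(all.equal(x, 0.3))   # TRUE: all.equal() compares within a tolerance
abs(x - 0.3) < 1e-9         # TRUE: the same idea written out by hand

# Printing more digits shows the value that is actually stored.
sprintf("%.17f", x)
```

This is why a test such as 'age %in% 5:50' can miss values that print as whole numbers, while a comparison within a small tolerance does not.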

-----Original Message-----
From: jim holtman [mailto:[hidden email]]
Sent: 04 November 2009 15:30
To: Hayes, Daniel
Cc: [hidden email]
Subject: Re: [R] creating multiple new variables from one data.frame according to columns and rows in that frame

[...]