avoiding loop

11 messages Options
Embed this post
Permalink
parkbomee

avoiding loop

Reply Threaded More More options
Print post
Permalink

Hi all,

I am trying to figure out a way to improve my code's efficiency by avoiding the use of loop.
I want to calculate a conditional mean(?) given time.
For example, from the data below, I want to calculate sum((value|choice==1)/sum(value)) across time.
Is there a way to do it without using a loop?

time  cum_time  choice    value
1         4             1           3
1         4              0           2
1          4             0           3
1          4             0           3
2         6             1           4
2         6             0           4
2         6             0           2
2         6             0           4
2         6             0           2
2         6             0           2
3         4             1           2
3         4             0           3
3         4             0           5
3         4             0           2



My code looks like

objective[1] = value[1] / sum(value[1:cum_time[1])
for (i in 2:max(time)){
     objective[i] = value[cum_time[i-1]+1] / sum(value[(cum_time[i-1]+1) : cum_time[i])])
}
sum(objective)


Anyone have an idea that I can do this without using a loop??
Thanks.

     
_________________________________________________________________
[[elided Hotmail spam]]

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dimitris Rizopoulos-4

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
one approach is the following:

# say 'DF' is your data frame, then
with(DF, {
     ind <- choice == 1
     n <- tapply(value[ind], time[ind], sum)
     d <- tapply(value, time, sum)
     n / d
})


I hope it helps.

Best,
Dimitris


parkbomee wrote:

> Hi all,
>
> I am trying to figure out a way to improve my code's efficiency by avoiding the use of loop.
> I want to calculate a conditional mean(?) given time.
> For example, from the data below, I want to calculate sum((value|choice==1)/sum(value)) across time.
> Is there a way to do it without using a loop?
>
> time  cum_time  choice    value
> 1         4             1           3
> 1         4              0           2
> 1          4             0           3
> 1          4             0           3
> 2         6             1           4
> 2         6             0           4
> 2         6             0           2
> 2         6             0           4
> 2         6             0           2
> 2         6             0           2
> 3         4             1           2
> 3         4             0           3
> 3         4             0           5
> 3         4             0           2
>
>
>
> My code looks like
>
> objective[1] = value[1] / sum(value[1:cum_time[1])
> for (i in 2:max(time)){
>      objective[i] = value[cum_time[i-1]+1] / sum(value[(cum_time[i-1]+1) : cum_time[i])])
> }
> sum(objective)
>
>
> Anyone have an idea that I can do this without using a loop??
> Thanks.
>
>      
> _________________________________________________________________
> [[elided Hotmail spam]]
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
This is pretty much equivalent:

tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
         tapply(DF$value, DF$time, sum)

And both will probably fail if the number of groups with choice==1 is  
different than the number overall.

--
David.

On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:

> one approach is the following:
>
> # say 'DF' is your data frame, then
> with(DF, {
>    ind <- choice == 1
>    n <- tapply(value[ind], time[ind], sum)
>    d <- tapply(value, time, sum)
>    n / d
> })
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
>
> parkbomee wrote:
>> Hi all,
>> I am trying to figure out a way to improve my code's efficiency by  
>> avoiding the use of loop.
>> I want to calculate a conditional mean(?) given time.
>> For example, from the data below, I want to calculate sum((value|
>> choice==1)/sum(value)) across time.
>> Is there a way to do it without using a loop?
>> time  cum_time  choice    value
>> 1         4             1           3
>> 1         4              0           2
>> 1          4             0           3
>> 1          4             0           3
>> 2         6             1           4
>> 2         6             0           4
>> 2         6             0           2
>> 2         6             0           4
>> 2         6             0           2
>> 2         6             0           2 3         4              
>> 1           2 3         4             0           3 3          
>> 4             0           5 3         4             0           2  
>> My code looks like
>> objective[1] = value[1] / sum(value[1:cum_time[1])
>> for (i in 2:max(time)){
>>     objective[i] = value[cum_time[i-1]+1] /  
>> sum(value[(cum_time[i-1]+1) : cum_time[i])])
>> }
>> sum(objective)
>> Anyone have an idea that I can do this without using a loop??
>> Thanks.
>>    
>> _________________________________________________________________
>> [[elided Hotmail spam]]
>> [[alternative HTML version deleted]]
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Dimitris Rizopoulos
> Assistant Professor
> Department of Biostatistics
> Erasmus University Medical Center
>
> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> Tel: +31/(0)10/7043478
> Fax: +31/(0)10/7043014
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
parkbomee

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink

Thank you both.

However, using tapply() instead of a loop does not seem to improve my code much.
I am using this inside of an optimization function,
and it still takes more than it needs...



> CC: [hidden email]; [hidden email]
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [R] avoiding loop
> Date: Sat, 31 Oct 2009 22:26:17 -0400
>
> This is pretty much equivalent:
>
> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
>          tapply(DF$value, DF$time, sum)
>
> And both will probably fail if the number of groups with choice==1 is  
> different than the number overall.
>
> --
> David.
>
> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
>
> > one approach is the following:
> >
> > # say 'DF' is your data frame, then
> > with(DF, {
> >    ind <- choice == 1
> >    n <- tapply(value[ind], time[ind], sum)
> >    d <- tapply(value, time, sum)
> >    n / d
> > })
> >
> >
> > I hope it helps.
> >
> > Best,
> > Dimitris
> >
> >
> > parkbomee wrote:
> >> Hi all,
> >> I am trying to figure out a way to improve my code's efficiency by  
> >> avoiding the use of loop.
> >> I want to calculate a conditional mean(?) given time.
> >> For example, from the data below, I want to calculate sum((value|
> >> choice==1)/sum(value)) across time.
> >> Is there a way to do it without using a loop?
> >> time  cum_time  choice    value
> >> 1         4             1           3
> >> 1         4              0           2
> >> 1          4             0           3
> >> 1          4             0           3
> >> 2         6             1           4
> >> 2         6             0           4
> >> 2         6             0           2
> >> 2         6             0           4
> >> 2         6             0           2
> >> 2         6             0           2 3         4              
> >> 1           2 3         4             0           3 3          
> >> 4             0           5 3         4             0           2  
> >> My code looks like
> >> objective[1] = value[1] / sum(value[1:cum_time[1])
> >> for (i in 2:max(time)){
> >>     objective[i] = value[cum_time[i-1]+1] /  
> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
> >> }
> >> sum(objective)
> >> Anyone have an idea that I can do this without using a loop??
> >> Thanks.
> >>    
> >> _________________________________________________________________
> >> [[elided Hotmail spam]]
> >> [[alternative HTML version deleted]]
> >> ______________________________________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Dimitris Rizopoulos
> > Assistant Professor
> > Department of Biostatistics
> > Erasmus University Medical Center
> >
> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> > Tel: +31/(0)10/7043478
> > Fax: +31/(0)10/7043014
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
     
_________________________________________________________________
[[elided Hotmail spam]]
        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
On Sat, 31 Oct 2009, parkbomee wrote:

>
> Thank you both.
>
> However, using tapply() instead of a loop does not seem to improve my code much.
> I am using this inside of an optimization function,
> and it still takes more than it needs...
>


Well, you haven't given us much to work with.

The optimization choices depend on the particulars of your problem, which
you've not detailed.

It does not take long to run the tapply() code once, so you need to do it
many times. Right?

If so, you need to say how the structure varies (rendered as DF in David's
reply) from iteration to iteration in the optimization.

If it turns out that only 'value' changes and that the number of different
times is not too large, then precomputing suitable indicator matrics may
help:

  mat1 <- model.matrix( ~ 0 + factor(time):as.numeric(choice==1),DF)
  mat2 <- model.matrix( ~ 0 + factor(time), DF )

Inside the optimization use something like

  with(DF,(value%*%mat1)/(value%*%mat2))

If the structure can change or the number of unique times is large, then
with so simple a calculation you should probably just inline some C code.

  http://cran.r-project.org/web/packages/inline/index.html

HTH,

Chuck

>
>
>> CC: [hidden email]; [hidden email]
>> From: [hidden email]
>> To: [hidden email]
>> Subject: Re: [R] avoiding loop
>> Date: Sat, 31 Oct 2009 22:26:17 -0400
>>
>> This is pretty much equivalent:
>>
>> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
>>          tapply(DF$value, DF$time, sum)
>>
>> And both will probably fail if the number of groups with choice==1 is
>> different than the number overall.
>>
>> --
>> David.
>>
>> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
>>
>>> one approach is the following:
>>>
>>> # say 'DF' is your data frame, then
>>> with(DF, {
>>>    ind <- choice == 1
>>>    n <- tapply(value[ind], time[ind], sum)
>>>    d <- tapply(value, time, sum)
>>>    n / d
>>> })
>>>
>>>
>>> I hope it helps.
>>>
>>> Best,
>>> Dimitris
>>>
>>>
>>> parkbomee wrote:
>>>> Hi all,
>>>> I am trying to figure out a way to improve my code's efficiency by
>>>> avoiding the use of loop.
>>>> I want to calculate a conditional mean(?) given time.
>>>> For example, from the data below, I want to calculate sum((value|
>>>> choice==1)/sum(value)) across time.
>>>> Is there a way to do it without using a loop?
>>>> time  cum_time  choice    value
>>>> 1         4             1           3
>>>> 1         4              0           2
>>>> 1          4             0           3
>>>> 1          4             0           3
>>>> 2         6             1           4
>>>> 2         6             0           4
>>>> 2         6             0           2
>>>> 2         6             0           4
>>>> 2         6             0           2
>>>> 2         6             0           2 3         4
>>>> 1           2 3         4             0           3 3
>>>> 4             0           5 3         4             0           2
>>>> My code looks like
>>>> objective[1] = value[1] / sum(value[1:cum_time[1])
>>>> for (i in 2:max(time)){
>>>>     objective[i] = value[cum_time[i-1]+1] /
>>>> sum(value[(cum_time[i-1]+1) : cum_time[i])])
>>>> }
>>>> sum(objective)
>>>> Anyone have an idea that I can do this without using a loop??
>>>> Thanks.
>>>>
>>>> _________________________________________________________________
>>>> [[elided Hotmail spam]]
>>>> [[alternative HTML version deleted]]
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Dimitris Rizopoulos
>>> Assistant Professor
>>> Department of Biostatistics
>>> Erasmus University Medical Center
>>>
>>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>>> Tel: +31/(0)10/7043478
>>> Fax: +31/(0)10/7043014
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>
> _________________________________________________________________
> [[elided Hotmail spam]]
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:[hidden email]            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
jholtman

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
In reply to this post by parkbomee
What you need to do is to understand how to use Rprof so that you can
determine where the time is being spent.  It probably indicates that
this is not the source of slowness in your optimization function.  How
much time are we talking about?  You may spent more time trying to
optimize the function than just running the current version even if it
is "slow" (slow is a relative term and does not hold much meaning
without some context round it).

On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]> wrote:

>
> Thank you both.
>
> However, using tapply() instead of a loop does not seem to improve my code much.
> I am using this inside of an optimization function,
> and it still takes more than it needs...
>
>
>
>> CC: [hidden email]; [hidden email]
>> From: [hidden email]
>> To: [hidden email]
>> Subject: Re: [R] avoiding loop
>> Date: Sat, 31 Oct 2009 22:26:17 -0400
>>
>> This is pretty much equivalent:
>>
>> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
>>          tapply(DF$value, DF$time, sum)
>>
>> And both will probably fail if the number of groups with choice==1 is
>> different than the number overall.
>>
>> --
>> David.
>>
>> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
>>
>> > one approach is the following:
>> >
>> > # say 'DF' is your data frame, then
>> > with(DF, {
>> >    ind <- choice == 1
>> >    n <- tapply(value[ind], time[ind], sum)
>> >    d <- tapply(value, time, sum)
>> >    n / d
>> > })
>> >
>> >
>> > I hope it helps.
>> >
>> > Best,
>> > Dimitris
>> >
>> >
>> > parkbomee wrote:
>> >> Hi all,
>> >> I am trying to figure out a way to improve my code's efficiency by
>> >> avoiding the use of loop.
>> >> I want to calculate a conditional mean(?) given time.
>> >> For example, from the data below, I want to calculate sum((value|
>> >> choice==1)/sum(value)) across time.
>> >> Is there a way to do it without using a loop?
>> >> time  cum_time  choice    value
>> >> 1         4             1           3
>> >> 1         4              0           2
>> >> 1          4             0           3
>> >> 1          4             0           3
>> >> 2         6             1           4
>> >> 2         6             0           4
>> >> 2         6             0           2
>> >> 2         6             0           4
>> >> 2         6             0           2
>> >> 2         6             0           2 3         4
>> >> 1           2 3         4             0           3 3
>> >> 4             0           5 3         4             0           2
>> >> My code looks like
>> >> objective[1] = value[1] / sum(value[1:cum_time[1])
>> >> for (i in 2:max(time)){
>> >>     objective[i] = value[cum_time[i-1]+1] /
>> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
>> >> }
>> >> sum(objective)
>> >> Anyone have an idea that I can do this without using a loop??
>> >> Thanks.
>> >>
>> >> _________________________________________________________________
>> >> [[elided Hotmail spam]]
>> >>    [[alternative HTML version deleted]]
>> >> ______________________________________________
>> >> [hidden email] mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > --
>> > Dimitris Rizopoulos
>> > Assistant Professor
>> > Department of Biostatistics
>> > Erasmus University Medical Center
>> >
>> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> > Tel: +31/(0)10/7043478
>> > Fax: +31/(0)10/7043014
>> >
>> > ______________________________________________
>> > [hidden email] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>
> _________________________________________________________________
> [[elided Hotmail spam]]
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
parkbomee

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink

Thank you all.

What Chuck has suggested might not be applicable since the number of different times is around 40,000.
The object of optimization in my function is the varying "value", which is basically data * parameter, of which "parameter" is the object of optimization..
 
And from the r profiling with a subset of data,
I got this report..any idea what "<Anonymous>" is?


$by.total
                        total.time total.pct self.time self.pct
"<Anonymous>"               571.56     100.0      0.02      0.0
"optim"                     571.56     100.0      0.00      0.0
"fn"                        571.54     100.0      0.98      0.2
"eval"                      423.74      74.1      0.00      0.0
"with.default"              423.74      74.1      0.00      0.0
"with"                      423.74      74.1      0.00      0.0
"tapply"                    414.28      72.5     13.84      2.4
"lapply"                    255.48      44.7     76.94     13.5
"factor"                    127.68      22.3     11.08      1.9
"unlist"                    120.54      21.1     80.46     14.1
"FUN"                        94.16      16.5     94.16     16.5
.
.
.
.
.


> Date: Sun, 1 Nov 2009 15:35:41 -0400
> Subject: Re: [R] avoiding loop
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]; [hidden email]
>
> What you need to do is to understand how to use Rprof so that you can
> determine where the time is being spent.  It probably indicates that
> this is not the source of slowness in your optimization function.  How
> much time are we talking about?  You may spent more time trying to
> optimize the function than just running the current version even if it
> is "slow" (slow is a relative term and does not hold much meaning
> without some context round it).
>
> On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]> wrote:
> >
> > Thank you both.
> >
> > However, using tapply() instead of a loop does not seem to improve my code much.
> > I am using this inside of an optimization function,
> > and it still takes more than it needs...
> >
> >
> >
> >> CC: [hidden email]; [hidden email]
> >> From: [hidden email]
> >> To: [hidden email]
> >> Subject: Re: [R] avoiding loop
> >> Date: Sat, 31 Oct 2009 22:26:17 -0400
> >>
> >> This is pretty much equivalent:
> >>
> >> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
> >>          tapply(DF$value, DF$time, sum)
> >>
> >> And both will probably fail if the number of groups with choice==1 is
> >> different than the number overall.
> >>
> >> --
> >> David.
> >>
> >> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
> >>
> >> > one approach is the following:
> >> >
> >> > # say 'DF' is your data frame, then
> >> > with(DF, {
> >> >    ind <- choice == 1
> >> >    n <- tapply(value[ind], time[ind], sum)
> >> >    d <- tapply(value, time, sum)
> >> >    n / d
> >> > })
> >> >
> >> >
> >> > I hope it helps.
> >> >
> >> > Best,
> >> > Dimitris
> >> >
> >> >
> >> > parkbomee wrote:
> >> >> Hi all,
> >> >> I am trying to figure out a way to improve my code's efficiency by
> >> >> avoiding the use of loop.
> >> >> I want to calculate a conditional mean(?) given time.
> >> >> For example, from the data below, I want to calculate sum((value|
> >> >> choice==1)/sum(value)) across time.
> >> >> Is there a way to do it without using a loop?
> >> >> time  cum_time  choice    value
> >> >> 1         4             1           3
> >> >> 1         4              0           2
> >> >> 1          4             0           3
> >> >> 1          4             0           3
> >> >> 2         6             1           4
> >> >> 2         6             0           4
> >> >> 2         6             0           2
> >> >> 2         6             0           4
> >> >> 2         6             0           2
> >> >> 2         6             0           2 3         4
> >> >> 1           2 3         4             0           3 3
> >> >> 4             0           5 3         4             0           2
> >> >> My code looks like
> >> >> objective[1] = value[1] / sum(value[1:cum_time[1])
> >> >> for (i in 2:max(time)){
> >> >>     objective[i] = value[cum_time[i-1]+1] /
> >> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
> >> >> }
> >> >> sum(objective)
> >> >> Anyone have an idea that I can do this without using a loop??
> >> >> Thanks.
> >> >>
> >> >> _________________________________________________________________
> >> >> [[elided Hotmail spam]]
> >> >>    [[alternative HTML version deleted]]
> >> >> ______________________________________________
> >> >> [hidden email] mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> > --
> >> > Dimitris Rizopoulos
> >> > Assistant Professor
> >> > Department of Biostatistics
> >> > Erasmus University Medical Center
> >> >
> >> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> >> > Tel: +31/(0)10/7043478
> >> > Fax: +31/(0)10/7043014
> >> >
> >> > ______________________________________________
> >> > [hidden email] mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> David Winsemius, MD
> >> Heritage Laboratories
> >> West Hartford, CT
> >>
> >
> > _________________________________________________________________
> > [[elided Hotmail spam]]
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
     
_________________________________________________________________
[[elided Hotmail spam]]

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Martin Morgan

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
parkbomee <[hidden email]> writes:

> Thank you all.
>
> What Chuck has suggested might not be applicable since the number of
> different times is around 40,000.
>
> The object of optimization in my function is the varying "value",
> which is basically data * parameter, of which "parameter" is the
> object of optimization..
>  
> And from the r profiling with a subset of data,
> I got this report..any idea what "<Anonymous>" is?
>
>
> $by.total
>                         total.time total.pct self.time self.pct
> "<Anonymous>"               571.56     100.0      0.02      0.0
> "optim"                     571.56     100.0      0.00      0.0
> "fn"                        571.54     100.0      0.98      0.2

You're giving us 'by.total', so these are saying that all the time was
spent in these functions or the functions they called. Probably all
are in 'optim' and its arguments; since little self.time is spent
here, there isn't much to work with

> "eval"                      423.74      74.1      0.00      0.0
> "with.default"              423.74      74.1      0.00      0.0
> "with"                      423.74      74.1      0.00      0.0

These are probably in the internals of optim, where the function
you're trying to optimize is being set up for evaluation. Again
there's little self.time, and all these say is that a big piece of the
time is being spent in code called by this code.

> "tapply"                    414.28      72.5     13.84      2.4
> "lapply"                    255.48      44.7     76.94     13.5
> "factor"                    127.68      22.3     11.08      1.9
> "unlist"                    120.54      21.1     80.46     14.1
> "FUN"                        94.16      16.5     94.16     16.5

these look like they are tapply-related calls (looking at the code for
tapply, it calls lapply, factor, and unlist, and FUN is the function
argument to tapply), perhaps from the function you're optimizing (did
you implement this as suggested below?  it would really help to have a
possibly simplified version of the code you're calling).

There is material to work with here, as apparently a fairly large
amount of self.time is being spent in each of these functions. So
here's a sample data set

  n <- 100000
  set.seed(123)
  df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
                   value=ceiling(runif(n)*5))

It would have been helpful for you to provide reproducible code like
that above, so that the characteristics of your data were easily
reproducible. Let's time tapply

> replicate(5, {
+     system.time(x0 <<- tapply0(df$value, df$time, sum), gcFirst=TRUE)[[1]]
+ })
[1] 0.316 0.316 0.308 0.320 0.304

tapply is quite general, but in your case I think you'd be happy with

  tapply1 <- function(X, INDEX, FUN)
      unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)

> replicate(5, {
+     system.time(x1 <<- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
+ })
[1] 0.156 0.148 0.152 0.144 0.152

so about twice the speed (timing depends quite a bit on what 'time' is,
integer or numeric or character or factor). The vector values of the
two calculations are identical, though tapply presents the data as an
array with column names

> identical(as.vector(x0), x1)
[1] TRUE

tapply allows FUN to be anything, but if the interest is in the sum of
each time interval, and the time intervals can be assumed to be sorted
(sorting is not expensive, so could be done on the fly), then

  tapply2 <- function(X, INDEX)
  {
      csum <- cumsum(c(0, X))
      idx <- diff(INDEX) != 0
      csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
  }

calculates the cumulative sum and the points in INDEX where the time
intervals change. It then takes the difference over the appropriate
interval.

> replicate(5, {
+     system.time(x2 <<- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
+ })
[1] 0.024 0.024 0.024 0.024 0.024
> identical(as.vector(x0), x2)
[1] TRUE

This approach could be subject to rounding error (if csum gets very
large and the intervals remain small). To calculate values where
choice == 1 I think you'd want to

  tapply2(df$value * (df$choice==1), df$time)

rather than sub-setting, so that the result of tapply2 is always a
vector of the same length even when some time intervals never have
choice==1.

Because tapply in these examples seems so fast compared to your
calculation, I wonder whether optim is evaluating your function many
times, and that reformulating the optimization might lead to a very
substantial speed-up?

Martin

> .
> .
> .
> .
> .
>
>
>> Date: Sun, 1 Nov 2009 15:35:41 -0400
>> Subject: Re: [R] avoiding loop
>> From: [hidden email]
>> To: [hidden email]
>> CC: [hidden email]; [hidden email]; [hidden email]
>>
>> What you need to do is to understand how to use Rprof so that you can
>> determine where the time is being spent.  It probably indicates that
>> this is not the source of slowness in your optimization function.  How
>> much time are we talking about?  You may spent more time trying to
>> optimize the function than just running the current version even if it
>> is "slow" (slow is a relative term and does not hold much meaning
>> without some context round it).
>>
>> On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]> wrote:
>> >
>> > Thank you both.
>> >
>> > However, using tapply() instead of a loop does not seem to improve my code much.
>> > I am using this inside of an optimization function,
>> > and it still takes more than it needs...
>> >
>> >
>> >
>> >> CC: [hidden email]; [hidden email]
>> >> From: [hidden email]
>> >> To: [hidden email]
>> >> Subject: Re: [R] avoiding loop
>> >> Date: Sat, 31 Oct 2009 22:26:17 -0400
>> >>
>> >> This is pretty much equivalent:
>> >>
>> >> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
>> >>          tapply(DF$value, DF$time, sum)
>> >>
>> >> And both will probably fail if the number of groups with choice==1 is
>> >> different than the number overall.
>> >>
>> >> --
>> >> David.
>> >>
>> >> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
>> >>
>> >> > one approach is the following:
>> >> >
>> >> > # say 'DF' is your data frame, then
>> >> > with(DF, {
>> >> >    ind <- choice == 1
>> >> >    n <- tapply(value[ind], time[ind], sum)
>> >> >    d <- tapply(value, time, sum)
>> >> >    n / d
>> >> > })
>> >> >
>> >> >
>> >> > I hope it helps.
>> >> >
>> >> > Best,
>> >> > Dimitris
>> >> >
>> >> >
>> >> > parkbomee wrote:
>> >> >> Hi all,
>> >> >> I am trying to figure out a way to improve my code's efficiency by
>> >> >> avoiding the use of loop.
>> >> >> I want to calculate a conditional mean(?) given time.
>> >> >> For example, from the data below, I want to calculate sum((value|
>> >> >> choice==1)/sum(value)) across time.
>> >> >> Is there a way to do it without using a loop?
>> >> >> time  cum_time  choice    value
>> >> >> 1         4             1           3
>> >> >> 1         4              0           2
>> >> >> 1          4             0           3
>> >> >> 1          4             0           3
>> >> >> 2         6             1           4
>> >> >> 2         6             0           4
>> >> >> 2         6             0           2
>> >> >> 2         6             0           4
>> >> >> 2         6             0           2
>> >> >> 2         6             0           2 3         4
>> >> >> 1           2 3         4             0           3 3
>> >> >> 4             0           5 3         4             0           2
>> >> >> My code looks like
>> >> >> objective[1] = value[1] / sum(value[1:cum_time[1])
>> >> >> for (i in 2:max(time)){
>> >> >>     objective[i] = value[cum_time[i-1]+1] /
>> >> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
>> >> >> }
>> >> >> sum(objective)
>> >> >> Anyone have an idea that I can do this without using a loop??
>> >> >> Thanks.
>> >> >>
>> >> >> _________________________________________________________________
>> >> >> [[elided Hotmail spam]]
>> >> >>    [[alternative HTML version deleted]]
>> >> >> ______________________________________________
>> >> >> [hidden email] mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >> > --
>> >> > Dimitris Rizopoulos
>> >> > Assistant Professor
>> >> > Department of Biostatistics
>> >> > Erasmus University Medical Center
>> >> >
>> >> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> >> > Tel: +31/(0)10/7043478
>> >> > Fax: +31/(0)10/7043014
>> >> >
>> >> > ______________________________________________
>> >> > [hidden email] mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >> David Winsemius, MD
>> >> Heritage Laboratories
>> >> West Hartford, CT
>> >>
>> >
>> > _________________________________________________________________
>> > [[elided Hotmail spam]]
>> >        [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [hidden email] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>      
> _________________________________________________________________
> [[elided Hotmail spam]]
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
parkbomee

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink

This is the Rprof() report by self time.
Is it also possible that these routines, which take long self.time, are causing the optim() to be slow?


$by.self
                        self.time self.pct total.time total.pct
"FUN"                       94.16     16.5      94.16      16.5
"unlist"                    80.46     14.1     120.54      21.1
"lapply"                    76.94     13.5     255.48      44.7
"match"                     60.76     10.6      60.88      10.7
"as.matrix.data.frame"      31.00      5.4      51.12       8.9
"as.character"              29.28      5.1      29.28       5.1
"unique.default"            24.36      4.3      24.40       4.3
"data.frame"                21.06      3.7      55.78       9.8
"split.default"             20.42      3.6      84.38      14.8
"tapply"                    13.84      2.4     414.28      72.5
"structure"                 11.32      2.0      22.36       3.9
"factor"                    11.08      1.9     127.68      22.3
"attributes<-"              11.00      1.9      11.00       1.9
"=="                        10.56      1.8      10.56       1.8
"%*%"                       10.30      1.8      10.30       1.8
"as.vector"                 10.22      1.8      10.22       1.8
"as.integer"                 9.86      1.7       9.86       1.7
"list"                       9.64      1.7       9.64       1.7
"exp"                        7.12      1.2       7.12       1.2
"as.data.frame.integer"      5.98      1.0       8.10       1.4

> To: [hidden email]
> CC: [hidden email]; [hidden email]
> Subject: Re: [R] avoiding loop
> From: [hidden email]
> Date: Sun, 1 Nov 2009 22:14:09 -0800
>
> parkbomee <[hidden email]> writes:
>
> > Thank you all.
> >
> > What Chuck has suggested might not be applicable since the number of
> > different times is around 40,000.
> >
> > The object of optimization in my function is the varying "value",
> > which is basically data * parameter, of which "parameter" is the
> > object of optimization..
> >  
> > And from the r profiling with a subset of data,
> > I got this report..any idea what "<Anonymous>" is?
> >
> >
> > $by.total
> >                         total.time total.pct self.time self.pct
> > "<Anonymous>"               571.56     100.0      0.02      0.0
> > "optim"                     571.56     100.0      0.00      0.0
> > "fn"                        571.54     100.0      0.98      0.2
>
> You're giving us 'by.total', so these are saying that all the time was
> spent in these functions or the functions they called. Probably all
> are in 'optim' and its arguments; since little self.time is spent
> here, there isn't much to work with
>
> > "eval"                      423.74      74.1      0.00      0.0
> > "with.default"              423.74      74.1      0.00      0.0
> > "with"                      423.74      74.1      0.00      0.0
>
> These are probably in the internals of optim, where the function
> you're trying to optimize is being set up for evaluation. Again
> there's little self.time, and all these say is that a big piece of the
> time is being spent in code called by this code.
>
> > "tapply"                    414.28      72.5     13.84      2.4
> > "lapply"                    255.48      44.7     76.94     13.5
> > "factor"                    127.68      22.3     11.08      1.9
> > "unlist"                    120.54      21.1     80.46     14.1
> > "FUN"                        94.16      16.5     94.16     16.5
>
> these look like they are tapply-related calls (looking at the code for
> tapply, it calls lapply, factor, and unlist, and FUN is the function
> argument to tapply), perhaps from the function you're optimizing (did
> you implement this as suggested below?  it would really help to have a
> possibly simplified version of the code you're calling).
>
> There is material to work with here, as apparently a fairly large
> amount of self.time is being spent in each of these functions. So
> here's a sample data set
>
>   n <- 100000
>   set.seed(123)
>   df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
>                    value=ceiling(runif(n)*5))
>
> It would have been helpful for you to provide reproducible code like
> that above, so that the characteristics of your data were easily
> reproducible. Let's time tapply
>
> > replicate(5, {
> +     system.time(x0 <<- tapply0(df$value, df$time, sum), gcFirst=TRUE)[[1]]
> + })
> [1] 0.316 0.316 0.308 0.320 0.304
>
> tapply is quite general, but in your case I think you'd be happy with
>
>   tapply1 <- function(X, INDEX, FUN)
>       unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)
>
> > replicate(5, {
> +     system.time(x1 <<- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
> + })
> [1] 0.156 0.148 0.152 0.144 0.152
>
> so about twice the speed (timing depends quite a bit on what 'time' is,
> integer or numeric or character or factor). The vector values of the
> two calculations are identical, though tapply presents the data as an
> array with column names
>
> > identical(as.vector(x0), x1)
> [1] TRUE
>
> tapply allows FUN to be anything, but if the interest is in the sum of
> each time interval, and the time intervals can be assumed to be sorted
> (sorting is not expensive, so could be done on the fly), then
>
>   tapply2 <- function(X, INDEX)
>   {
>       csum <- cumsum(c(0, X))
>       idx <- diff(INDEX) != 0
>       csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
>   }
>
> calculates the cumulative sum and the points in INDEX where the time
> intervals change. It then takes the difference over the appropriate
> interval.
>
> > replicate(5, {
> +     system.time(x2 <<- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
> + })
> [1] 0.024 0.024 0.024 0.024 0.024
> > identical(as.vector(x0), x2)
> [1] TRUE
>
> This approach could be subject to rounding error (if csum gets very
> large and the intervals remain small). To calculate values where
> choice == 1 I think you'd want to
>
>   tapply2(df$value * (df$choice==1), df$time)
>
> rather than sub-setting, so that the result of tapply2 is always a
> vector of the same length even when some time intervals never have
> choice==1.
>
> Because tapply in these examples seems so fast compared to your
> calculation, I wonder whether optim is evaluating your function many
> times, and that reformulating the optimization might lead to a very
> substantial speed-up?
>
> Martin
>
> > .
> > .
> > .
> > .
> > .
> >
> >
> >> Date: Sun, 1 Nov 2009 15:35:41 -0400
> >> Subject: Re: [R] avoiding loop
> >> From: [hidden email]
> >> To: [hidden email]
> >> CC: [hidden email]; [hidden email]; [hidden email]
> >>
> >> What you need to do is to understand how to use Rprof so that you can
> >> determine where the time is being spent.  It probably indicates that
> >> this is not the source of slowness in your optimization function.  How
> >> much time are we talking about?  You may spent more time trying to
> >> optimize the function than just running the current version even if it
> >> is "slow" (slow is a relative term and does not hold much meaning
> >> without some context round it).
> >>
> >> On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]> wrote:
> >> >
> >> > Thank you both.
> >> >
> >> > However, using tapply() instead of a loop does not seem to improve my code much.
> >> > I am using this inside of an optimization function,
> >> > and it still takes more than it needs...
> >> >
> >> >
> >> >
> >> >> CC: [hidden email]; [hidden email]
> >> >> From: [hidden email]
> >> >> To: [hidden email]
> >> >> Subject: Re: [R] avoiding loop
> >> >> Date: Sat, 31 Oct 2009 22:26:17 -0400
> >> >>
> >> >> This is pretty much equivalent:
> >> >>
> >> >> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
> >> >>          tapply(DF$value, DF$time, sum)
> >> >>
> >> >> And both will probably fail if the number of groups with choice==1 is
> >> >> different than the number overall.
> >> >>
> >> >> --
> >> >> David.
> >> >>
> >> >> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
> >> >>
> >> >> > one approach is the following:
> >> >> >
> >> >> > # say 'DF' is your data frame, then
> >> >> > with(DF, {
> >> >> >    ind <- choice == 1
> >> >> >    n <- tapply(value[ind], time[ind], sum)
> >> >> >    d <- tapply(value, time, sum)
> >> >> >    n / d
> >> >> > })
> >> >> >
> >> >> >
> >> >> > I hope it helps.
> >> >> >
> >> >> > Best,
> >> >> > Dimitris
> >> >> >
> >> >> >
> >> >> > parkbomee wrote:
> >> >> >> Hi all,
> >> >> >> I am trying to figure out a way to improve my code's efficiency by
> >> >> >> avoiding the use of loop.
> >> >> >> I want to calculate a conditional mean(?) given time.
> >> >> >> For example, from the data below, I want to calculate sum((value|
> >> >> >> choice==1)/sum(value)) across time.
> >> >> >> Is there a way to do it without using a loop?
> >> >> >> time  cum_time  choice    value
> >> >> >> 1         4             1           3
> >> >> >> 1         4              0           2
> >> >> >> 1          4             0           3
> >> >> >> 1          4             0           3
> >> >> >> 2         6             1           4
> >> >> >> 2         6             0           4
> >> >> >> 2         6             0           2
> >> >> >> 2         6             0           4
> >> >> >> 2         6             0           2
> >> >> >> 2         6             0           2 3         4
> >> >> >> 1           2 3         4             0           3 3
> >> >> >> 4             0           5 3         4             0           2
> >> >> >> My code looks like
> >> >> >> objective[1] = value[1] / sum(value[1:cum_time[1])
> >> >> >> for (i in 2:max(time)){
> >> >> >>     objective[i] = value[cum_time[i-1]+1] /
> >> >> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
> >> >> >> }
> >> >> >> sum(objective)
> >> >> >> Anyone have an idea that I can do this without using a loop??
> >> >> >> Thanks.
> >> >> >>
> >> >> >> _________________________________________________________________
> >> >> >> [[elided Hotmail spam]]
> >> >> >>    [[alternative HTML version deleted]]
> >> >> >> ______________________________________________
> >> >> >> [hidden email] mailing list
> >> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >> >
> >> >> > --
> >> >> > Dimitris Rizopoulos
> >> >> > Assistant Professor
> >> >> > Department of Biostatistics
> >> >> > Erasmus University Medical Center
> >> >> >
> >> >> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> >> >> > Tel: +31/(0)10/7043478
> >> >> > Fax: +31/(0)10/7043014
> >> >> >
> >> >> > ______________________________________________
> >> >> > [hidden email] mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >> David Winsemius, MD
> >> >> Heritage Laboratories
> >> >> West Hartford, CT
> >> >>
> >> >
> >> > _________________________________________________________________
> >> > [[elided Hotmail spam]]
> >> >        [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > [hidden email] mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >>
> >>
> >>
> >> --
> >> Jim Holtman
> >> Cincinnati, OH
> >> +1 513 646 9390
> >>
> >> What is the problem that you are trying to solve?
> >      
> > _________________________________________________________________
> > [[elided Hotmail spam]]
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
     
_________________________________________________________________
[[elided Hotmail spam]]

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
jholtman

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink
The first thing I would suggest is convert your dataframes to matrices
so that you are not having to continually convert them in the calls to
the functions.  Also I am not sure what the code:

        realized_prob = with(DF, {
      ind <- (CHOSEN == 1)
      n <- tapply(theta_multiple[ind], CS[ind], sum)
      d <- tapply(theta_multiple, CS, sum)
      n / d
                                                })

is doing.  It looks like 'n' and 'd' might have different lengths
since they are being created by two different (CS & CS[ind])
sequences.  I have no idea why you are converting to the "DF"
dataframe.  THere is no need for that.  You could just leave the
vectors (e.g., theta_multiple, CS and ind) as they are and work with
them.  This is probably where most of your time is being spent.  So if
you start with matrices and leave the dataframes out of the main loop
you will probably see an increase in performance.

2009/11/2 parkbomee <[hidden email]>:

> This is the Rprof() report by self time.
> Is it also possible that these routines, which take long self.time, are
> causing the optim() to be slow?
>
>
> $by.self
>                         self.time self.pct total.time total.pct
> "FUN"                       94.16     16.5      94.16      16.5
> "unlist"                    80.46     14.1     120.54      21.1
> "lapply"                    76.94     13.5     255.48      44.7
> "match"                     60.76     10.6      60.88      10.7
> "as.matrix.data.frame"      31.00      5.4      51.12       8.9
> "as.character"              29.28      5.1      29.28       5.1
> "unique.default"            24.36      4.3      24.40       4.3
> "data.frame"                21.06      3.7      55.78       9.8
> "split.default"             20.42      3.6      84.38      14.8
> "tapply"                    13.84      2.4     414.28      72.5
> "structure"                 11.32      2.0      22.36       3.9
> "factor"                    11.08      1.9     127.68      22.3
> "attributes<-"              11.00      1.9      11.00       1.9
> "=="                        10.56      1.8      10.56       1.8
> "%*%"                       10.30      1.8      10.30       1.8
> "as.vector"                 10.22      1.8      10.22       1.8
> "as.integer"                 9.86      1.7       9.86       1.7
> "list"                       9.64      1.7       9.64       1.7
> "exp"                        7.12      1.2       7.12       1.2
> "as.data.frame.integer"      5.98      1.0       8.10       1.4
>
>> To: [hidden email]
>> CC: [hidden email]; [hidden email]
>> Subject: Re: [R] avoiding loop
>> From: [hidden email]
>> Date: Sun, 1 Nov 2009 22:14:09 -0800
>>
>> parkbomee <[hidden email]> writes:
>>
>> > Thank you all.
>> >
>> > What Chuck has suggested might not be applicable since the number of
>> > different times is around 40,000.
>> >
>> > The object of optimization in my function is the varying "value",
>> > which is basically data * parameter, of which "parameter" is the
>> > object of optimization..
>> >
>> > And from the r profiling with a subset of data,
>> > I got this report..any idea what "<Anonymous>" is?
>> >
>> >
>> > $by.total
>> > total.time total.pct self.time self.pct
>> > "<Anonymous>" 571.56 100.0 0.02 0.0
>> > "optim" 571.56 100.0 0.00 0.0
>> > "fn" 571.54 100.0 0.98 0.2
>>
>> You're giving us 'by.total', so these are saying that all the time was
>> spent in these functions or the functions they called. Probably all
>> are in 'optim' and its arguments; since little self.time is spent
>> here, there isn't much to work with
>>
>> > "eval" 423.74 74.1 0.00 0.0
>> > "with.default" 423.74 74.1 0.00 0.0
>> > "with" 423.74 74.1 0.00 0.0
>>
>> These are probably in the internals of optim, where the function
>> you're trying to optimize is being set up for evaluation. Again
>> there's little self.time, and all these say is that a big piece of the
>> time is being spent in code called by this code.
>>
>> > "tapply" 414.28 72.5 13.84 2.4
>> > "lapply" 255.48 44.7 76.94 13.5
>> > "factor" 127.68 22.3 11.08 1.9
>> > "unlist" 120.54 21.1 80.46 14.1
>> > "FUN" 94.16 16.5 94.16 16.5
>>
>> these look like they are tapply-related calls (looking at the code for
>> tapply, it calls lapply, factor, and unlist, and FUN is the function
>> argument to tapply), perhaps from the function you're optimizing (did
>> you implement this as suggested below? it would really help to have a
>> possibly simplified version of the code you're calling).
>>
>> There is material to work with here, as apparently a fairly large
>> amount of self.time is being spent in each of these functions. So
>> here's a sample data set
>>
>> n <- 100000
>> set.seed(123)
>> df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
>> value=ceiling(runif(n)*5))
>>
>> It would have been helpful for you to provide reproducible code like
>> that above, so that the characteristics of your data were easily
>> reproducible. Let's time tapply
>>
>> > replicate(5, {
>> + system.time(x0 <<- tapply0(df$value, df$time, sum), gcFirst=TRUE)[[1]]
>> + })
>> [1] 0.316 0.316 0.308 0.320 0.304
>>
>> tapply is quite general, but in your case I think you'd be happy with
>>
>> tapply1 <- function(X, INDEX, FUN)
>> unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)
>>
>> > replicate(5, {
>> + system.time(x1 <<- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
>> + })
>> [1] 0.156 0.148 0.152 0.144 0.152
>>
>> so about twice the speed (timing depends quite a bit on what 'time' is,
>> integer or numeric or character or factor). The vector values of the
>> two calculations are identical, though tapply presents the data as an
>> array with column names
>>
>> > identical(as.vector(x0), x1)
>> [1] TRUE
>>
>> tapply allows FUN to be anything, but if the interest is in the sum of
>> each time interval, and the time intervals can be assumed to be sorted
>> (sorting is not expensive, so could be done on the fly), then
>>
>> tapply2 <- function(X, INDEX)
>> {
>> csum <- cumsum(c(0, X))
>> idx <- diff(INDEX) != 0
>> csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
>> }
>>
>> calculates the cumulative sum and the points in INDEX where the time
>> intervals change. It then takes the difference over the appropriate
>> interval.
>>
>> > replicate(5, {
>> + system.time(x2 <<- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
>> + })
>> [1] 0.024 0.024 0.024 0.024 0.024
>> > identical(as.vector(x0), x2)
>> [1] TRUE
>>
>> This approach could be subject to rounding error (if csum gets very
>> large and the intervals remain small). To calculate values where
>> choice == 1 I think you'd want to
>>
>> tapply2(df$value * (df$choice==1), df$time)
>>
>> rather than sub-setting, so that the result of tapply2 is always a
>> vector of the same length even when some time intervals never have
>> choice==1.
>>
>> Because tapply in these examples seems so fast compared to your
>> calculation, I wonder whether optim is evaluating your function many
>> times, and that reformulating the optimization might lead to a very
>> substantial speed-up?
>>
>> Martin
>>
>> > .
>> > .
>> > .
>> > .
>> > .
>> >
>> >
>> >> Date: Sun, 1 Nov 2009 15:35:41 -0400
>> >> Subject: Re: [R] avoiding loop
>> >> From: [hidden email]
>> >> To: [hidden email]
>> >> CC: [hidden email]; [hidden email];
>> >> [hidden email]
>> >>
>> >> What you need to do is to understand how to use Rprof so that you can
>> >> determine where the time is being spent. It probably indicates that
>> >> this is not the source of slowness in your optimization function. How
>> >> much time are we talking about? You may spent more time trying to
>> >> optimize the function than just running the current version even if it
>> >> is "slow" (slow is a relative term and does not hold much meaning
>> >> without some context round it).
>> >>
>> >> On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]>
>> >> wrote:
>> >> >
>> >> > Thank you both.
>> >> >
>> >> > However, using tapply() instead of a loop does not seem to improve my
>> >> > code much.
>> >> > I am using this inside of an optimization function,
>> >> > and it still takes more than it needs...
>> >> >
>> >> >
>> >> >
>> >> >> CC: [hidden email]; [hidden email]
>> >> >> From: [hidden email]
>> >> >> To: [hidden email]
>> >> >> Subject: Re: [R] avoiding loop
>> >> >> Date: Sat, 31 Oct 2009 22:26:17 -0400
>> >> >>
>> >> >> This is pretty much equivalent:
>> >> >>
>> >> >> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
>> >> >> tapply(DF$value, DF$time, sum)
>> >> >>
>> >> >> And both will probably fail if the number of groups with choice==1
>> >> >> is
>> >> >> different than the number overall.
>> >> >>
>> >> >> --
>> >> >> David.
>> >> >>
>> >> >> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
>> >> >>
>> >> >> > one approach is the following:
>> >> >> >
>> >> >> > # say 'DF' is your data frame, then
>> >> >> > with(DF, {
>> >> >> > ind <- choice == 1
>> >> >> > n <- tapply(value[ind], time[ind], sum)
>> >> >> > d <- tapply(value, time, sum)
>> >> >> > n / d
>> >> >> > })
>> >> >> >
>> >> >> >
>> >> >> > I hope it helps.
>> >> >> >
>> >> >> > Best,
>> >> >> > Dimitris
>> >> >> >
>> >> >> >
>> >> >> > parkbomee wrote:
>> >> >> >> Hi all,
>> >> >> >> I am trying to figure out a way to improve my code's efficiency
>> >> >> >> by
>> >> >> >> avoiding the use of loop.
>> >> >> >> I want to calculate a conditional mean(?) given time.
>> >> >> >> For example, from the data below, I want to calculate sum((value|
>> >> >> >> choice==1)/sum(value)) across time.
>> >> >> >> Is there a way to do it without using a loop?
>> >> >> >> time cum_time choice value
>> >> >> >> 1 4 1 3
>> >> >> >> 1 4 0 2
>> >> >> >> 1 4 0 3
>> >> >> >> 1 4 0 3
>> >> >> >> 2 6 1 4
>> >> >> >> 2 6 0 4
>> >> >> >> 2 6 0 2
>> >> >> >> 2 6 0 4
>> >> >> >> 2 6 0 2
>> >> >> >> 2 6 0 2 3 4
>> >> >> >> 1 2 3 4 0 3 3
>> >> >> >> 4 0 5 3 4 0 2
>> >> >> >> My code looks like
>> >> >> >> objective[1] = value[1] / sum(value[1:cum_time[1])
>> >> >> >> for (i in 2:max(time)){
>> >> >> >> objective[i] = value[cum_time[i-1]+1] /
>> >> >> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
>> >> >> >> }
>> >> >> >> sum(objective)
>> >> >> >> Anyone have an idea that I can do this without using a loop??
>> >> >> >> Thanks.
>> >> >> >>
>> >> >> >> _________________________________________________________________
>> >> >> >> [[elided Hotmail spam]]
>> >> >> >> [[alternative HTML version deleted]]
>> >> >> >> ______________________________________________
>> >> >> >> [hidden email] mailing list
>> >> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> >> PLEASE do read the posting guide
>> >> >> >> http://www.R-project.org/posting-guide.html
>> >> >> >> and provide commented, minimal, self-contained, reproducible
>> >> >> >> code.
>> >> >> >
>> >> >> > --
>> >> >> > Dimitris Rizopoulos
>> >> >> > Assistant Professor
>> >> >> > Department of Biostatistics
>> >> >> > Erasmus University Medical Center
>> >> >> >
>> >> >> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> >> >> > Tel: +31/(0)10/7043478
>> >> >> > Fax: +31/(0)10/7043014
>> >> >> >
>> >> >> > ______________________________________________
>> >> >> > [hidden email] mailing list
>> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> > PLEASE do read the posting guide
>> >> >> > http://www.R-project.org/posting-guide.html
>> >> >> > and provide commented, minimal, self-contained, reproducible code.
>> >> >>
>> >> >> David Winsemius, MD
>> >> >> Heritage Laboratories
>> >> >> West Hartford, CT
>> >> >>
>> >> >
>> >> > _________________________________________________________________
>> >> > [[elided Hotmail spam]]
>> >> > [[alternative HTML version deleted]]
>> >> >
>> >> > ______________________________________________
>> >> > [hidden email] mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jim Holtman
>> >> Cincinnati, OH
>> >> +1 513 646 9390
>> >>
>> >> What is the problem that you are trying to solve?
>> >
>> > _________________________________________________________________
>> > [[elided Hotmail spam]]
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [hidden email] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>
> ________________________________
> 내 맘대로 바꾸는 배경 화면 색, 더욱 다양해진 홈페이지 슬라이드 쇼 구성! 확 달라진 MSN 홈페이지!



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
parkbomee

Re: avoiding loop

Reply Threaded More More options
Print post
Permalink


Thanks for your help.


> Date: Mon, 2 Nov 2009 18:50:42 -0500
> Subject: Re: [R] avoiding loop
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]
>
> The first thing I would suggest is convert your dataframes to matrices
> so that you are not having to continually convert them in the calls to
> the functions.  Also I am not sure what the code:
>
> realized_prob = with(DF, {
>       ind <- (CHOSEN == 1)
>       n <- tapply(theta_multiple[ind], CS[ind], sum)
>       d <- tapply(theta_multiple, CS, sum)
>       n / d
> })
>
> is doing.  It looks like 'n' and 'd' might have different lengths
> since they are being created by two different (CS & CS[ind])
> sequences.  I have no idea why you are converting to the "DF"
> dataframe.  THere is no need for that.  You could just leave the
> vectors (e.g., theta_multiple, CS and ind) as they are and work with
> them.  This is probably where most of your time is being spent.  So if
> you start with matrices and leave the dataframes out of the main loop
> you will probably see an increase in performance.
>
> 2009/11/2 parkbomee <[hidden email]>:
> > This is the Rprof() report by self time.
> > Is it also possible that these routines, which take long self.time, are
> > causing the optim() to be slow?
> >
> >
> > $by.self
> >                         self.time self.pct total.time total.pct
> > "FUN"                       94.16     16.5      94.16      16.5
> > "unlist"                    80.46     14.1     120.54      21.1
> > "lapply"                    76.94     13.5     255.48      44.7
> > "match"                     60.76     10.6      60.88      10.7
> > "as.matrix.data.frame"      31.00      5.4      51.12       8.9
> > "as.character"              29.28      5.1      29.28       5.1
> > "unique.default"            24.36      4.3      24.40       4.3
> > "data.frame"                21.06      3.7      55.78       9.8
> > "split.default"             20.42      3.6      84.38      14.8
> > "tapply"                    13.84      2.4     414.28      72.5
> > "structure"                 11.32      2.0      22.36       3.9
> > "factor"                    11.08      1.9     127.68      22.3
> > "attributes<-"              11.00      1.9      11.00       1.9
> > "=="                        10.56      1.8      10.56       1.8
> > "%*%"                       10.30      1.8      10.30       1.8
> > "as.vector"                 10.22      1.8      10.22       1.8
> > "as.integer"                 9.86      1.7       9.86       1.7
> > "list"                       9.64      1.7       9.64       1.7
> > "exp"                        7.12      1.2       7.12       1.2
> > "as.data.frame.integer"      5.98      1.0       8.10       1.4
> >
> >> To: [hidden email]
> >> CC: [hidden email]; [hidden email]
> >> Subject: Re: [R] avoiding loop
> >> From: [hidden email]
> >> Date: Sun, 1 Nov 2009 22:14:09 -0800
> >>
> >> parkbomee <[hidden email]> writes:
> >>
> >> > Thank you all.
> >> >
> >> > What Chuck has suggested might not be applicable since the number of
> >> > different times is around 40,000.
> >> >
> >> > The object of optimization in my function is the varying "value",
> >> > which is basically data * parameter, of which "parameter" is the
> >> > object of optimization..
> >> >
> >> > And from the r profiling with a subset of data,
> >> > I got this report..any idea what "<Anonymous>" is?
> >> >
> >> >
> >> > $by.total
> >> > total.time total.pct self.time self.pct
> >> > "<Anonymous>" 571.56 100.0 0.02 0.0
> >> > "optim" 571.56 100.0 0.00 0.0
> >> > "fn" 571.54 100.0 0.98 0.2
> >>
> >> You're giving us 'by.total', so these are saying that all the time was
> >> spent in these functions or the functions they called. Probably all
> >> are in 'optim' and its arguments; since little self.time is spent
> >> here, there isn't much to work with
> >>
> >> > "eval" 423.74 74.1 0.00 0.0
> >> > "with.default" 423.74 74.1 0.00 0.0
> >> > "with" 423.74 74.1 0.00 0.0
> >>
> >> These are probably in the internals of optim, where the function
> >> you're trying to optimize is being set up for evaluation. Again
> >> there's little self.time, and all these say is that a big piece of the
> >> time is being spent in code called by this code.
> >>
> >> > "tapply" 414.28 72.5 13.84 2.4
> >> > "lapply" 255.48 44.7 76.94 13.5
> >> > "factor" 127.68 22.3 11.08 1.9
> >> > "unlist" 120.54 21.1 80.46 14.1
> >> > "FUN" 94.16 16.5 94.16 16.5
> >>
> >> these look like they are tapply-related calls (looking at the code for
> >> tapply, it calls lapply, factor, and unlist, and FUN is the function
> >> argument to tapply), perhaps from the function you're optimizing (did
> >> you implement this as suggested below? it would really help to have a
> >> possibly simplified version of the code you're calling).
> >>
> >> There is material to work with here, as apparently a fairly large
> >> amount of self.time is being spent in each of these functions. So
> >> here's a sample data set
> >>
> >> n <- 100000
> >> set.seed(123)
> >> df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
> >> value=ceiling(runif(n)*5))
> >>
> >> It would have been helpful for you to provide reproducible code like
> >> that above, so that the characteristics of your data were easily
> >> reproducible. Let's time tapply
> >>
> >> > replicate(5, {
> >> + system.time(x0 <<- tapply0(df$value, df$time, sum), gcFirst=TRUE)[[1]]
> >> + })
> >> [1] 0.316 0.316 0.308 0.320 0.304
> >>
> >> tapply is quite general, but in your case I think you'd be happy with
> >>
> >> tapply1 <- function(X, INDEX, FUN)
> >> unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)
> >>
> >> > replicate(5, {
> >> + system.time(x1 <<- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
> >> + })
> >> [1] 0.156 0.148 0.152 0.144 0.152
> >>
> >> so about twice the speed (timing depends quite a bit on what 'time' is,
> >> integer or numeric or character or factor). The vector values of the
> >> two calculations are identical, though tapply presents the data as an
> >> array with column names
> >>
> >> > identical(as.vector(x0), x1)
> >> [1] TRUE
> >>
> >> tapply allows FUN to be anything, but if the interest is in the sum of
> >> each time interval, and the time intervals can be assumed to be sorted
> >> (sorting is not expensive, so could be done on the fly), then
> >>
> >> tapply2 <- function(X, INDEX)
> >> {
> >> csum <- cumsum(c(0, X))
> >> idx <- diff(INDEX) != 0
> >> csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
> >> }
> >>
> >> calculates the cumulative sum and the points in INDEX where the time
> >> intervals change. It then takes the difference over the appropriate
> >> interval.
> >>
> >> > replicate(5, {
> >> + system.time(x2 <<- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
> >> + })
> >> [1] 0.024 0.024 0.024 0.024 0.024
> >> > identical(as.vector(x0), x2)
> >> [1] TRUE
> >>
> >> This approach could be subject to rounding error (if csum gets very
> >> large and the intervals remain small). To calculate values where
> >> choice == 1 I think you'd want to
> >>
> >> tapply2(df$value * (df$choice==1), df$time)
> >>
> >> rather than sub-setting, so that the result of tapply2 is always a
> >> vector of the same length even when some time intervals never have
> >> choice==1.
> >>
> >> Because tapply in these examples seems so fast compared to your
> >> calculation, I wonder whether optim is evaluating your function many
> >> times, and that reformulating the optimization might lead to a very
> >> substantial speed-up?
> >>
> >> Martin
> >>
> >> > .
> >> > .
> >> > .
> >> > .
> >> > .
> >> >
> >> >
> >> >> Date: Sun, 1 Nov 2009 15:35:41 -0400
> >> >> Subject: Re: [R] avoiding loop
> >> >> From: [hidden email]
> >> >> To: [hidden email]
> >> >> CC: [hidden email]; [hidden email];
> >> >> [hidden email]
> >> >>
> >> >> What you need to do is to understand how to use Rprof so that you can
> >> >> determine where the time is being spent. It probably indicates that
> >> >> this is not the source of slowness in your optimization function. How
> >> >> much time are we talking about? You may spent more time trying to
> >> >> optimize the function than just running the current version even if it
> >> >> is "slow" (slow is a relative term and does not hold much meaning
> >> >> without some context round it).
> >> >>
> >> >> On Sat, Oct 31, 2009 at 11:36 PM, parkbomee <[hidden email]>
> >> >> wrote:
> >> >> >
> >> >> > Thank you both.
> >> >> >
> >> >> > However, using tapply() instead of a loop does not seem to improve my
> >> >> > code much.
> >> >> > I am using this inside of an optimization function,
> >> >> > and it still takes more than it needs...
> >> >> >
> >> >> >
> >> >> >
> >> >> >> CC: [hidden email]; [hidden email]
> >> >> >> From: [hidden email]
> >> >> >> To: [hidden email]
> >> >> >> Subject: Re: [R] avoiding loop
> >> >> >> Date: Sat, 31 Oct 2009 22:26:17 -0400
> >> >> >>
> >> >> >> This is pretty much equivalent:
> >> >> >>
> >> >> >> tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
> >> >> >> tapply(DF$value, DF$time, sum)
> >> >> >>
> >> >> >> And both will probably fail if the number of groups with choice==1
> >> >> >> is
> >> >> >> different than the number overall.
> >> >> >>
> >> >> >> --
> >> >> >> David.
> >> >> >>
> >> >> >> On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
> >> >> >>
> >> >> >> > one approach is the following:
> >> >> >> >
> >> >> >> > # say 'DF' is your data frame, then
> >> >> >> > with(DF, {
> >> >> >> > ind <- choice == 1
> >> >> >> > n <- tapply(value[ind], time[ind], sum)
> >> >> >> > d <- tapply(value, time, sum)
> >> >> >> > n / d
> >> >> >> > })
> >> >> >> >
> >> >> >> >
> >> >> >> > I hope it helps.
> >> >> >> >
> >> >> >> > Best,
> >> >> >> > Dimitris
> >> >> >> >
> >> >> >> >
> >> >> >> > parkbomee wrote:
> >> >> >> >> Hi all,
> >> >> >> >> I am trying to figure out a way to improve my code's efficiency
> >> >> >> >> by
> >> >> >> >> avoiding the use of loop.
> >> >> >> >> I want to calculate a conditional mean(?) given time.
> >> >> >> >> For example, from the data below, I want to calculate sum((value|
> >> >> >> >> choice==1)/sum(value)) across time.
> >> >> >> >> Is there a way to do it without using a loop?
> >> >> >> >> time cum_time choice value
> >> >> >> >> 1 4 1 3
> >> >> >> >> 1 4 0 2
> >> >> >> >> 1 4 0 3
> >> >> >> >> 1 4 0 3
> >> >> >> >> 2 6 1 4
> >> >> >> >> 2 6 0 4
> >> >> >> >> 2 6 0 2
> >> >> >> >> 2 6 0 4
> >> >> >> >> 2 6 0 2
> >> >> >> >> 2 6 0 2 3 4
> >> >> >> >> 1 2 3 4 0 3 3
> >> >> >> >> 4 0 5 3 4 0 2
> >> >> >> >> My code looks like
> >> >> >> >> objective[1] = value[1] / sum(value[1:cum_time[1])
> >> >> >> >> for (i in 2:max(time)){
> >> >> >> >> objective[i] = value[cum_time[i-1]+1] /
> >> >> >> >> sum(value[(cum_time[i-1]+1) : cum_time[i])])
> >> >> >> >> }
> >> >> >> >> sum(objective)
> >> >> >> >> Anyone have an idea that I can do this without using a loop??
> >> >> >> >> Thanks.
> >> >> >> >>
> >> >> >> >> _________________________________________________________________
> >> >> >> >> [[elided Hotmail spam]]
> >> >> >> >> [[alternative HTML version deleted]]
> >> >> >> >> ______________________________________________
> >> >> >> >> [hidden email] mailing list
> >> >> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> >> >> PLEASE do read the posting guide
> >> >> >> >> http://www.R-project.org/posting-guide.html
> >> >> >> >> and provide commented, minimal, self-contained, reproducible
> >> >> >> >> code.
> >> >> >> >
> >> >> >> > --
> >> >> >> > Dimitris Rizopoulos
> >> >> >> > Assistant Professor
> >> >> >> > Department of Biostatistics
> >> >> >> > Erasmus University Medical Center
> >> >> >> >
> >> >> >> > Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> >> >> >> > Tel: +31/(0)10/7043478
> >> >> >> > Fax: +31/(0)10/7043014
> >> >> >> >
> >> >> >> > ______________________________________________
> >> >> >> > [hidden email] mailing list
> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> >> > PLEASE do read the posting guide
> >> >> >> > http://www.R-project.org/posting-guide.html
> >> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >> >>
> >> >> >> David Winsemius, MD
> >> >> >> Heritage Laboratories
> >> >> >> West Hartford, CT
> >> >> >>
> >> >> >
> >> >> > _________________________________________________________________
> >> >> > [[elided Hotmail spam]]
> >> >> > [[alternative HTML version deleted]]
> >> >> >
> >> >> > ______________________________________________
> >> >> > [hidden email] mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide
> >> >> > http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jim Holtman
> >> >> Cincinnati, OH
> >> >> +1 513 646 9390
> >> >>
> >> >> What is the problem that you are trying to solve?
> >> >
> >> > _________________________________________________________________
> >> > [[elided Hotmail spam]]
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > [hidden email] mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Martin Morgan
> >> Computational Biology / Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N.
> >> PO Box 19024 Seattle, WA 98109
> >>
> >> Location: Arnold Building M1 B861
> >> Phone: (206) 667-2793
> >
> > ________________________________
> > ³» ¸¾´ë·Î ¹Ù²Ù´Â ¹è°æ È­¸é »ö, ´õ¿í ´Ù¾çÇØÁø ȨÆäÀÌÁö ½½¶óÀÌµå ¼î ±¸¼º! È® ´Þ¶óÁø MSN ȨÆäÀÌÁö!
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
     
_________________________________________________________________
À©µµ¿ì ¶óÀ̺ê ÄûÁîÇ®°í~¼±¹° ¹ÞÀ¸¼¼¿ä~

        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.