Discretising intra-day data using zoo?

Ajay Shah

Discretising intra-day data using zoo?

Folks,

I have a zoo object where the time-stamps are intra-day with
sub-second resolution. Can you take a look:

  library(zoo)
  print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
  options("digits.secs"=6)
  head(demo)
  tail(demo)

My question is: how do I force this down to a uniform grid of (say)
four-second resolution? In that case, we'd have readings for

    10:30:00
    10:30:04
    10:30:08
    10:30:12

out of this dataset. As with the standard zoo idiom
  aggregate(z, by, tail, 1)
we'd take the last available record as of 10:30:08 and assign it
to that time point.

Suppose there is not a single record in the raw data from 10:30:04 to
10:30:09. Despite this, the resulting object should contain a record
for 10:30:08 with NA values (which can then be filled in, e.g. using
na.locf()). How would we do this? The problem does not arise in this
data, where records are plentiful, but discretisation code should be
general and handle this case correctly.

--
Ajay Shah                                      http://www.mayin.org/ajayshah 
[hidden email]                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.
Gabor Grothendieck

Re: Discretising intra-day data using zoo?

See the aggregate.zoo example in vignette("zoo-quickref"), but round up
to the next multiple of 4 seconds instead of to the next Friday:

> to4sec <- function(x) as.POSIXct(4*ceiling(as.numeric(x)/4), origin = "1970-01-01")
> aggregate(demo, to4sec, tail, 1)
                    spread    ltp
2009-02-16 05:00:04 0.0050 48.715
2009-02-16 05:00:08 0.0025 48.715
2009-02-16 05:00:12 0.0025 48.715
2009-02-16 05:00:16 0.0025 48.715
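The rounding function does the real work: 4*ceiling(as.numeric(x)/4) maps every time stamp to the end of its 4-second bin (a stamp already on the grid stays put), so aggregate(..., tail, 1) keeps the last observation at or before each grid point. A minimal, self-contained sketch, assuming made-up tick times in place of demo and a hypothetical helper ceiling_to() that builds such a function for any bin width. It passes tz = "UTC" explicitly as an assumption; depending on the R version, as.POSIXct(..., origin = ) may display its result in GMT rather than local time, which would explain why the output above reads 05:00 rather than the 10:30 times in the question.

```r
library(zoo)

# Hypothetical helper (not part of zoo): returns a function that rounds
# POSIXct time stamps up to the next multiple of n seconds since the epoch.
ceiling_to <- function(n) {
  force(n)
  function(x) as.POSIXct(n * ceiling(as.numeric(x) / n),
                         origin = "1970-01-01", tz = "UTC")
}
to4sec <- ceiling_to(4)

# Made-up sub-second ticks standing in for demo (values are invented)
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
z  <- zoo(c(10, 11, 12, 13, 14), t0 + c(0.25, 3.9, 4.0, 4.1, 9.75))

# Each tick maps to the grid point that closes its bin; 4.0 stays at 4
format(to4sec(index(z)), "%H:%M:%S", tz = "UTC")
# "05:00:04" "05:00:04" "05:00:04" "05:00:08" "05:00:12"

# tail(, 1) then keeps the last value in each bin: 12, 13, 14
agg <- aggregate(z, to4sec, tail, 1)
```

Calling ceiling_to(5) gives the same behaviour as the to5sec used later in the thread.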


On Sun, Nov 8, 2009 at 2:10 AM, Ajay Shah <[hidden email]> wrote:

Ajay Shah

Re: Discretising intra-day data using zoo?

library(zoo)
print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
options("digits.secs"=6)
head(demo)
tail(demo)

On Sun, Nov 08, 2009 at 07:20:02AM -0500, Gabor Grothendieck wrote:

Gabor, thanks! I am not as fluent with as.POSIXct() as I should be.

And, to continue with my original question:

> > Suppose there is not a single record in the raw data from 10:30:04 to
> > 10:30:09. Despite this, the resulting object should contain a record
> > for 10:30:08 with NA values (which can then be filled out e.g. using
> > na.locf()). How would we do this? This problem is not present in this
> > data, where records are plentiful. But discretisation code should be
> > general and handle this case right.

How would we do this? To illustrate:

  demo2 <- demo[-(300:700), ]                         # drop rows 300 to 700
  plot(index(demo2), seq_len(NROW(demo2)), type="l")  # the 5th to 10th
                                                      # seconds are missing
  to5sec <- function(x) as.POSIXct(5*ceiling(as.numeric(x)/5), origin = "1970-01-01")


Now:

> aggregate(demo, to5sec, tail, 1)
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10 0.0025 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
> aggregate(demo2, to5sec, tail, 1)
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715  

We should get:

                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10     NA     NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
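Because demo.rda may no longer be downloadable, here is a self-contained sketch of the same situation, assuming made-up ticks with nothing in the bin that ends at 05:00:10. It shows that aggregate() only creates rows for bins that actually contain data, so the result is no longer strictly regular, and it lists the grid points that were dropped. The names t0, z, grid and gaps are illustrative, and the explicit tz = "UTC" is an assumption.

```r
library(zoo)

# Same rounding as to5sec above, with an explicit time zone (assumption)
to5sec <- function(x) as.POSIXct(5 * ceiling(as.numeric(x) / 5),
                                  origin = "1970-01-01", tz = "UTC")

# Made-up ticks: no observation falls in the bin (05:00:05, 05:00:10]
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
z  <- zoo(c(1, 2, 3, 4, 5), t0 + c(1.5, 4.2, 11.0, 14.9, 19.3))

ag <- aggregate(z, to5sec, tail, 1)
is.regular(ag, strict = TRUE)   # FALSE: the 05:00:10 row was never created

# Compare against the full 5-second grid (numerically) to list dropped points
grid <- seq(start(ag), end(ag), by = 5)
gaps <- grid[!(as.numeric(grid) %in% as.numeric(index(ag)))]
format(gaps, "%H:%M:%S", tz = "UTC")   # "05:00:10"
```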

Gabor Grothendieck

Re: Discretising intra-day data using zoo?

On Sun, Nov 8, 2009 at 7:58 AM, Ajay Shah <[hidden email]> wrote:

The trick is that converting to ts makes the series regular (a regular
series is the only thing ts can represent), so any missing time points
come back as NA: just convert to ts and then back to zoo. Since ts
cannot represent POSIXct, the times you get back will not have the
POSIXct class attribute set, so set it yourself.

> # aggregate to 5 seconds
> ag <- aggregate(demo2, to5sec, tail, 1)
>
> # make regular (this will strip class from time)
> ag.fill <- as.zoo(as.ts(ag))
>
> # put class back on time
> time(ag.fill) <- structure(time(ag.fill), class = class(time(ag)))
> ag.fill
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10     NA     NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
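An alternative to the ts round trip (a different zoo idiom swapped in here, not the approach above) is to merge the aggregated series with a zero-width zoo series whose index is the full regular grid. merge() takes the union of the two indexes and puts NA in the empty bins, and the index stays POSIXct throughout. A minimal sketch on made-up data, which also applies the na.locf() fill mentioned in the original question; the names and the explicit tz = "UTC" are assumptions.

```r
library(zoo)

# Same rounding as to5sec in the thread, with an explicit time zone (assumption)
to5sec <- function(x) as.POSIXct(5 * ceiling(as.numeric(x) / 5),
                                  origin = "1970-01-01", tz = "UTC")

# Made-up ticks with nothing in the bin ending at 05:00:10
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
z  <- zoo(c(1, 2, 3, 4, 5), t0 + c(1.5, 4.2, 11.0, 14.9, 19.3))
ag <- aggregate(z, to5sec, tail, 1)   # rows at 05:00:05, :15 and :20 only

# Full regular grid, then merge with a zero-width zoo series on that grid:
# merge() takes the union of the indexes and puts NA in the empty bins
grid    <- seq(start(ag), end(ag), by = 5)
ag.fill <- merge(ag, zoo(, grid))     # 2, NA, 4, 5
ag.locf <- na.locf(ag.fill)           # 2,  2, 4, 5 (last value carried forward)
```

Since no conversion to ts happens, there is no class attribute to restore afterwards.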

Gabor Grothendieck

Re: Discretising intra-day data using zoo?

On Sun, Nov 8, 2009 at 9:13 AM, Gabor Grothendieck
<[hidden email]> wrote:

A slightly shorter alternative to the time(ag.fill) <- line in my previous message is:

class(time(ag.fill)) <- class(time(ag))
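Setting the class works because a POSIXct time is stored as a plain number of seconds since 1970-01-01 UTC, with the class (and optionally a tzone attribute) layered on top. A minimal base-R sketch on a made-up time stamp; note that copying the class alone does not restore tzone, so the display time zone may change unless that attribute is set too.

```r
# A made-up time stamp, stored internally as seconds since the epoch
tt <- as.POSIXct("2009-02-16 05:00:10", tz = "UTC")

secs <- as.numeric(tt)        # plain number, as left behind by a ts round trip
class(secs)                   # "numeric"

class(secs) <- class(tt)      # restore c("POSIXct", "POSIXt")
attr(secs, "tzone") <- "UTC"  # tzone is a separate attribute; set it for display

format(secs, "%Y-%m-%d %H:%M:%S")   # "2009-02-16 05:00:10"
```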
