[math] Generate random data using the Inverse CDF Method?

Mikkel Meyer Andersen-2

[math] Generate random data using the Inverse CDF Method?

Dear community.

I've just started using the Apache Commons Math library. With regard to
generating random data from probability distributions, the library
doesn't support generating random data using the inverse cdf method,
although many of the distributions can calculate the inverse cdf.

Is there any particular reason for this?

If not, I would create a public interface
DistributionWithInverseCumulativeProbability (who has a better name?)
with the method inverseCumulativeProbability (right now it's on
ContinuousDistribution and some other subclasses and doesn't seem to be
gathered in an interface), and all the distributions with the
inverseCumulativeProbability method should implement this interface.

With this small change, a new class called
RandomDistributionWithInverseCumulativeProbability (again, who has a
better name?) could simply use the uniform generator and a class
implementing DistributionWithInverseCumulativeProbability to generate
random data from that distribution.

What do you think about that idea? I look forward to comments,
suggestions, and preferably, better names :-).

Cheers, Mikkel Meyer Andersen.
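
A minimal sketch of the proposal above, for illustration only. The two
long type names are the ones floated in the post; ExponentialSketch,
nextValue, and the use of java.util.Random as the uniform generator are
assumptions made here, and the library's actual method signature may
declare exceptions that are left out.

```java
import java.util.Random;

/** Hypothetical interface gathering the inverse cdf method (name from the proposal). */
interface DistributionWithInverseCumulativeProbability {
    /** Returns x such that P(X <= x) = p. */
    double inverseCumulativeProbability(double p);
}

/** Illustrative distribution only: exponential with the given mean. */
final class ExponentialSketch implements DistributionWithInverseCumulativeProbability {
    private final double mean;

    ExponentialSketch(double mean) {
        this.mean = mean;
    }

    @Override
    public double inverseCumulativeProbability(double p) {
        // The cdf is F(x) = 1 - exp(-x / mean), so F^{-1}(p) = -mean * ln(1 - p).
        return -mean * Math.log(1.0 - p);
    }
}

/** Hypothetical generator: pushes uniform draws through the inverse cdf. */
final class RandomDistributionWithInverseCumulativeProbability {
    private final DistributionWithInverseCumulativeProbability distribution;
    private final Random uniform;

    RandomDistributionWithInverseCumulativeProbability(
            DistributionWithInverseCumulativeProbability distribution, Random uniform) {
        this.distribution = distribution;
        this.uniform = uniform;
    }

    double nextValue() {
        // Random.nextDouble() is uniform on [0, 1).
        return distribution.inverseCumulativeProbability(uniform.nextDouble());
    }
}
```

With this shape, any distribution that implements the interface can be
sampled by constructing the generator with it and calling nextValue()
repeatedly.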

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

If you define this new interface, can't you just put the correct
implementation of the generator into some abstract class somewhere and make
sure that all of the distributions extend it?  That implementation can
check whether this implements DistributionWithInverseCumulativeProbability.
If it does, then generating a sample is easy.  If not, throwing
UnsupportedOperationException would be in order.

For that matter, would it be better to just insert an abstract class named
DistributionWithInverseCumulativeProbability somewhere in the inheritance
chain?  Do we need the interface at all?
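
One way the suggestion above could look, as a rough sketch rather than
the library's code. AbstractDistributionSketch stands in for the real
base class, and InvertibleCdf, nextSample, UniformSketch, and
OpaqueSketch are hypothetical names introduced for this example.

```java
import java.util.Random;

/** Hypothetical interface for distributions that can invert their cdf. */
interface InvertibleCdf {
    double inverseCumulativeProbability(double p);
}

/** Hypothetical base class standing in for the library's abstract distribution. */
abstract class AbstractDistributionSketch {
    /** Default sampler: inversion when available, otherwise unsupported. */
    double nextSample(Random uniform) {
        if (this instanceof InvertibleCdf) {
            return ((InvertibleCdf) this).inverseCumulativeProbability(uniform.nextDouble());
        }
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not provide an inverse cdf");
    }
}

/** Uniform between lower and upper: invertible, so the default sampler works. */
final class UniformSketch extends AbstractDistributionSketch implements InvertibleCdf {
    private final double lower;
    private final double upper;

    UniformSketch(double lower, double upper) {
        this.lower = lower;
        this.upper = upper;
    }

    @Override
    public double inverseCumulativeProbability(double p) {
        return lower + p * (upper - lower);
    }
}

/** A distribution without an inverse cdf: sampling throws. */
final class OpaqueSketch extends AbstractDistributionSketch {
}
```

A subclass with a better specialised algorithm would simply override
nextSample.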



--
Ted Dunning, CTO
DeepDyve
Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

Hi Ted.

Thanks for your answer.

I like the idea of an abstract class named
DistributionWithInverseCumulativeProbability, but I have to look
more closely at where in the chain it should go. Both
AbstractIntegerDistribution and AbstractContinuousDistribution have
the inverse cdf method, and both extend AbstractDistribution, but
AbstractDistribution doesn't have an inverse cdf. So the best option
might be to put an inverse cdf method on AbstractDistribution that
throws an exception by default, since AbstractIntegerDistribution and
AbstractContinuousDistribution already implement it. How does that sound?

Cheers, Mikkel.


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

That sounds nice.  It also means that more distributions are likely to
benefit "by accident" even if they don't know to advertise what they can do.

It is also plausible to use reflection at class construction time to
determine whether the method is available.  That would let
AbstractDistribution use the inverse distribution to implement a generator
if possible.
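
A sketch of the reflection idea, under the assumption that the method
is public and named inverseCumulativeProbability(double). The class
names and nextSample are hypothetical; Class.getMethod and
Method.invoke are standard java.lang.reflect calls.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Random;

/** Hypothetical base class that looks up the inverse cdf once, at construction. */
abstract class ReflectiveDistributionSketch {
    private final Method inverseCdf;

    protected ReflectiveDistributionSketch() {
        Method found;
        try {
            // getClass() is the concrete subclass, even inside this constructor.
            found = getClass().getMethod("inverseCumulativeProbability", double.class);
            if (found.getReturnType() != double.class) {
                found = null;
            } else {
                // Lets the call succeed even when the subclass itself is not public.
                found.setAccessible(true);
            }
        } catch (NoSuchMethodException e) {
            found = null;
        }
        this.inverseCdf = found;
    }

    double nextSample(Random uniform) {
        if (inverseCdf == null) {
            throw new UnsupportedOperationException("no inverse cdf available");
        }
        try {
            return (Double) inverseCdf.invoke(this, uniform.nextDouble());
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("inverse cdf call failed", e);
        }
    }
}

/** Has the method, without declaring any interface for it. */
final class ZeroToTenSketch extends ReflectiveDistributionSketch {
    public double inverseCumulativeProbability(double p) {
        return 10.0 * p;
    }
}

/** Lacks the method, so sampling is unsupported. */
final class NoInverseSketch extends ReflectiveDistributionSketch {
}
```

The trade-off is that a misspelled or differently typed method is only
noticed at run time, which an interface would catch at compile time.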

On Mon, Oct 26, 2009 at 3:58 PM, Mikkel Meyer Andersen <[hidden email]> wrote:

> So the best might be
> to put an inverse cdf method at AbstractDistribution, and throw an
> exception, because AbstractIntegerDistribution and
> AbstractContinuousDistribution implement it. How does that sound?
>



--
Ted Dunning, CTO
DeepDyve
Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

Yes, that was exactly one of the planned "side-effects" :-). If some
distributions are better served by another way of generating random
data, explicit classes for them must be written in the random package.

Yes, but I would like to avoid reflection if possible. I'll try to
have a go, and send the patch proposal to the list.

Cheers, Mikkel.


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

Just a suggestion.  You are the one doing the patch and should get the final
word on what you implement.

(and thanks from everybody)

On Mon, Oct 26, 2009 at 4:08 PM, Mikkel Meyer Andersen <[hidden email]> wrote:

> Yes, but I would like to avoid reflection if possible. I'll try to
> have a go, and send the patch proposal to the list.
>



--
Ted Dunning, CTO
DeepDyve
Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

Suggestions are happily received! And thanks for all the suggestions.
I haven't really done any open source, so I don't know how this stuff
works :-).


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

The way you are doing it is exactly the way it works.  And if you keep it
up, you will wind up a committer.  Then they introduce you to the societies
for helping open source contributors recover from the effects of spending
all their time posting new software.


On Mon, Oct 26, 2009 at 4:13 PM, Mikkel Meyer Andersen <[hidden email]> wrote:

> I haven't really done any open source, so I don't know how this stuff
> works :-).
>



--
Ted Dunning, CTO
DeepDyve
Phil Steitz

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Mikkel Meyer Andersen-2
Mikkel Meyer Andersen wrote:
> Dear community.
>
> I've just started using the Apache Commons Math library. In regards to
> generating random data from probability distributions, the library
> doesn't support generating random data using the inverse cdf method
> although a lot of the distributions gives the possibility to calculate
> the inverse cdf.

Have you looked at the RandomData interface and RandomDataImpl
class?  This class provides methods for generating random deviates
from multiple distributions, in some cases using inverse cdfs (see
e.g. RandomDataImpl#nextExponential()).
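
For reference, the inversion behind an exponential generator can be
written out. This is the standard textbook derivation for an
exponential distribution with mean \mu, not a transcription of the
library's source:

```latex
F(x) = 1 - e^{-x/\mu}, \quad x \ge 0
\quad\Longrightarrow\quad
F^{-1}(u) = -\mu \ln(1 - u), \quad 0 \le u < 1
```

If U is uniform on (0, 1), then X = F^{-1}(U) satisfies
P(X <= x) = P(U <= F(x)) = F(x), so X is exponential with mean \mu.
Because 1 - U is also uniform on (0, 1), X = -\mu \ln U works equally
well.
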
>
> Are there any particular good reason for this?

>
> If not, I would create a public interface
> DistributionWithInverseCumulativeProbability (who has a better name?)
> with the method inverseCumulativeProbability (right now it's on
> ContinuousDistribution and some other subclasses and don't seem to be
> gathered in an interface) and all the distributions with the
> inverseCumulativeProbability-method should implement this interface.

I am not following you here. What exactly is the difference between
DistributionWithInverseCumulativeProbability and
ContinuousDistribution?  ContinuousDistribution extends Distribution
with the inverseCumulativeProbability method you are describing.
>
> With this small change, a new class called
> RandomDistributionWithInverseCumulativeProbability (again, who has a
> better name?) could simply use the uniform generator and a class
> implementing DistributionWithInverseCumulativeProbability to generate
> random data from that distribution.

This is essentially what RandomDataImpl does for the distributions
that it supports. Support for more distributions would be a welcome
addition.  I guess along the lines of what you are talking about
above, it might make sense to add a single generic
nextInversionDeviate method parameterized by ContinuousDistribution.
The implementation of this would use a uniform generator and
inversion to generate deviates.  That would be simpler than creating
separate classes for each distribution or adding random data
generation to the distributions themselves.  It would also be more
consistent with the current organization of the code, which locates
random data generation in the random package.

Phil


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

Inverse CDF methods work for discrete distributions as well as continuous
ones.
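
To illustrate the discrete case: the inverse cdf at u is the smallest
integer k whose cumulative probability reaches u. The sketch below
assumes the cdf is supplied as a function on integers and reaches 1
at some finite k (otherwise the search would not terminate);
DiscreteInversionSketch and its method names are hypothetical.

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;

/** Hypothetical helper: inversion sampling for an integer-valued distribution. */
final class DiscreteInversionSketch {
    private DiscreteInversionSketch() {
    }

    /** Smallest integer k >= lowerBound with cdf(k) >= u, found by sequential search. */
    static int inverseCumulativeProbability(IntToDoubleFunction cdf, int lowerBound, double u) {
        int k = lowerBound;
        while (cdf.applyAsDouble(k) < u) {
            k++;
        }
        return k;
    }

    /** Draws one deviate by inverting a uniform draw from [0, 1). */
    static int nextDeviate(IntToDoubleFunction cdf, int lowerBound, Random uniform) {
        return inverseCumulativeProbability(cdf, lowerBound, uniform.nextDouble());
    }
}
```

For a fair six-sided die, the cdf could be passed as
k -> Math.min(1.0, k / 6.0) with lowerBound 1. Sequential search is
the simplest choice; a distribution with a wide support would likely
want a faster search or a specialised algorithm.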

On Mon, Oct 26, 2009 at 4:50 PM, Phil Steitz <[hidden email]> wrote:

> > If not, I would create a public interface
> > DistributionWithInverseCumulativeProbability (who has a better name?)
> > with the method inverseCumulativeProbability (right now it's on
> > ContinuousDistribution and some other subclasses and don't seem to be
> > gathered in an interface) and all the distributions with the
> > inverseCumulativeProbability-method should implement this interface.
>
> I am not following you here. What exactly is the difference between
> DistributionWithInverseCumulativeProbability and
> ContinuousDistribution?
>



--
Ted Dunning, CTO
DeepDyve
Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Phil Steitz
Hi Phil.

Yes, I have seen RandomDataImpl and the next{Int, Poisson, ...} methods,
but as you mention, not that many distributions are supported. What I
talked about is essentially the nextInversionDeviate idea.

I must admit that I find it a bit odd that the distributions are
separated from the random generation, i.e. the next{Int, ...} methods.
Ideally, when you have an instance of a distribution with the
parameters already specified, it would be nice to get a sample
from the distribution directly, just as you can get the probability
mass or whatever.

Yes, ContinuousDistribution has it, but DiscreteDistribution doesn't?
And discrete distributions have inverse cdfs as well.

Cheers, Mikkel.


Phil Steitz

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Ted Dunning
Ted Dunning wrote:
> Inverse CDF methods work for discrete distributions as well as continuous
> ones.

Thanks.  That's what I was missing. I would still rather see the
implementations in the random package and for common distributions,
e.g. Poisson, pick a method that is well-suited for the distribution.

Phil


Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

Ted, sorry, I hadn't seen your e-mail before sending mine.

Yes, I agree with your point about having good specialised algorithms.
But in the absence of such methods, I'd prefer to have a general
method, even though it might be worse than a specialised one.


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Phil Steitz
It is similar to what RandomDataImpl does, but the current type hierarchy is
a problem because it over-uses that interface/factory/implementation
pattern.  The result is confusing as can be, especially since virtually all
of the distributions have just one implementation and the factories aren't
polymorphic.  This causes the reader (*this* reader anyway) to be completely
convinced that there is more code hidden somewhere.

The approach suggested by Mikkel seems much better than the current one.

On Mon, Oct 26, 2009 at 4:50 PM, Phil Steitz <[hidden email]> wrote:

> > With this small change, a new class called
> > RandomDistributionWithInverseCumulativeProbability (again, who has a
> > better name?) could simply use the uniform generator and a class
> > implementing DistributionWithInverseCumulativeProbability to generate
> > random data from that distribution.
>
> This is essentially what RandomDataImpl does for the distributions
> that it supports. Support for more distributions would be a welcome
> addition.  I guess along the lines of what you are talking about
> above, it might make sense to add a single generic
> nextInversionDeviate method parameterized by ContinuousDistribution.
> The implementation of this would use a uniform generator and
> inversion to generate deviates.  That would be simpler than creating
> separate classes for each distribution or adding random data
> generation to the distributions themselves.  It would also be more
> consistent with the current organization of the code, which locates
> random data generation in the random package.
>



--
Ted Dunning, CTO
DeepDyve
Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Mikkel Meyer Andersen-2
That was Phil. (not that it matters)

+1 for the idea of a default generator for all distributions that define a
cumulative distribution function.

+1 as well for specialized implementations where possible that over-ride the
default generator even if it exists.

I can't imagine much dispute on either of these points because they satisfy
the general principle of doing the best we can for all cases as well as for
special cases.

I also completely agree with Mikkel in not understanding why the
generation of deviates is separated from the distribution.

On Mon, Oct 26, 2009 at 5:11 PM, Mikkel Meyer Andersen <[hidden email]> wrote:

> Ted, sorry hadn't seen your e-mail before sending mine.
>
> Yes, I agree in you point of having specialised good algorithms. But
> in lack of such methods, I'd prefer being able to have a general
> method, although it might be bad compared to a specialised one.
>
> 2009/10/27 Phil Steitz <[hidden email]>:
> > Thanks.  That's what I was missing. I would still rather see the
> > implementations in the random package and for common distributions,
> > e.g. Poisson, pick a method that is well-suited for the distribution.
>
Mikkel Meyer Andersen-2

Re: [math] Generate random data using the Inverse CDF Method?

Ted: No, I mean with the discrete inverse cdf. But anyway. Thanks for
clarifying the points.

Phil, if you're not convinced, I'll be happy to provide a
patch-draft/prototype of code so you can see exactly what I mean?

If we were to put a generator in the distributions (for supporting the
specialised generators), should this method then just be parameterised
by a RandomGenerator? Or what would be a proper approach?


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

I think that the implementations with specialized generators should just
over-ride the generic generator and do the specialized operation.

I have seen cases where _developers_ think there should be multiple
implementations of random number generators, but I don't think I have ever
seen a case where _users_ think that there should be such.

The only reasonable exception I can think of is in test cases.  There I can
imagine that it would be possible to need to say "what *if* I really did get
this sequence of numbers".  That can generally be handled by mocking and
doesn't motivate me to want to make the user experience more complex than it
needs to be.
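
The test-case scenario can be sketched with a stubbed uniform source.
Everything named here is hypothetical: the sampler takes its uniform
draws from a java.util.function.DoubleSupplier, so a test can replay a
chosen sequence without the user-facing API growing extra generators.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical exponential sampler whose uniform source can be swapped in tests. */
final class ExponentialSamplerSketch {
    private final double mean;
    private final DoubleSupplier uniform;

    ExponentialSamplerSketch(double mean, DoubleSupplier uniform) {
        this.mean = mean;
        this.uniform = uniform;
    }

    double nextSample() {
        // Inversion of the exponential cdf: F^{-1}(u) = -mean * ln(1 - u).
        return -mean * Math.log(1.0 - uniform.getAsDouble());
    }
}

/** Test double: replays a fixed sequence of "uniform" values, cycling at the end. */
final class FixedSequenceSketch implements DoubleSupplier {
    private final double[] values;
    private int next;

    FixedSequenceSketch(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        this.values = values.clone();
    }

    @Override
    public double getAsDouble() {
        double v = values[next];
        next = (next + 1) % values.length;
        return v;
    }
}
```

In ordinary use the supplier could be a method reference such as
new java.util.Random()::nextDouble, while a test passes
new FixedSequenceSketch(0.0, 0.5) and checks the resulting samples.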

As the best case in point, does R provide more than one way to generate
exponential deviates?  (no)  Does SPSS? (no)  Does SAS? (don't think so)
Matlab?  (nope)

Why should we?

On Mon, Oct 26, 2009 at 5:33 PM, Mikkel Meyer Andersen <[hidden email]> wrote:

>
> If we were to put a generator in the distributions (for supporting the
> specialised generators), should this method then just be parameterised
> by a RandomGenerator? Or what would be a proper approach?
>
>
Phil Steitz

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Mikkel Meyer Andersen-2
Mikkel Meyer Andersen wrote:
> Ted: No, I mean with the discrete inverse cdf. But anyway. Thanks for
> clarifying the points.
>
> Phil, if you're not convinced, I'll be happy to provide a
> patch-draft/prototype of code so you can see exactly what I mean?
>
> If we were to put a generator in the distributions (for supporting the
> specialised generators), should this method then just be parameterised
> by a RandomGenerator? Or what would be a proper approach?

That is part of the reason that I would rather see random data
generation remain in the random package.  That way the
RandomGenerator can be easily configured.  It is also better
separation of concerns.  There are lots of things that one can do
with probability distributions, including generate random data
following them.  That does not mean all of these things should be in
the distribution implementation classes.

I now understand your point about a missing interface. I would be
fine with adding a HasInverse or HasInverseCumulativeDistribution
interface to mark invertible distributions and then adding a generic
nextDeviate to RandomDataImpl.  I am also open to deprecating
RandomData/RandomDataImpl and refactoring the setup there.  What I
am -1 on is adding (potentially poor) random data generation to the
distributions implementations.
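
A rough sketch of this alternative, in which the distribution knows
nothing about randomness and the generator is configured in one place.
HasInverseCumulativeDistribution is one of the names suggested above;
RandomDataSketch, setGenerator, and the use of java.util.Random in
place of the library's RandomGenerator are assumptions for
illustration, not the existing RandomDataImpl API.

```java
import java.util.Random;

/** Hypothetical marker interface for invertible distributions. */
interface HasInverseCumulativeDistribution {
    double inverseCumulativeProbability(double p);
}

/** Hypothetical stand-in for a class in the random package. */
final class RandomDataSketch {
    private Random generator = new Random();

    /** Swaps in a different uniform generator, for example a seeded one. */
    void setGenerator(Random generator) {
        this.generator = generator;
    }

    /** Generic inversion-based deviate for any invertible distribution. */
    double nextDeviate(HasInverseCumulativeDistribution distribution) {
        return distribution.inverseCumulativeProbability(generator.nextDouble());
    }
}
```

Because the interface has a single method, a caller could even pass a
lambda such as p -> 3.0 + 2.0 * p (a uniform distribution between 3
and 5) to nextDeviate.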

Phil


Phil Steitz

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Ted Dunning
Ted Dunning wrote:
> It is similar to what RandomDataImpl does, but the current type hierarchy is
> a problem because it over-uses that interface/factory/implementation
> pattern.

Explain what you mean.  There is one interface and one
implementation for RandomData.

> The result is confusing as can be, especially since virtually all
> of the distributions have just one implementation and the factories aren't
> polymorphic.

What factories?  These were removed in 2.0.

> This causes the reader (*this* reader anyway) to be completely
> convinced that there is more code hidden somewhere.

Where?  Is it the abstract classes that are confusing you?  These
make implementing actual distributions much easier.

Phil


Ted Dunning

Re: [math] Generate random data using the Inverse CDF Method?

In reply to this post by Phil Steitz
Are you against adding any nextSample() method to distributions at all
(regardless of the quality of the implementation)?

Or just unhappy about adding nextSample() hooked to a bad implementation?

The first opinion, I just don't understand.  The second can be dealt with by
putting in good implementations or by throwing UOE.

I have a little bit of sympathy as a developer for separating all sampling
from the distributions, but I have no sympathy at all with this as a user.
I think of a distribution as something that you can take the density of,
(often) get the cumulative distribution from and get a sample from.  I know
in my heart of hearts that there is something down deep that is probably
called a <mumble>DistributionGenerator.  I even know that underneath that,
there is likely to be a uniform distribution generator.  But what I think
about when using a system is "sampling from a distribution" just like
anybody trained in statistics would.  That means that I expect
<mumble>Distribution.nextSample() to exist.  I know that it might be fast or
slow, but having hunted up the distribution I want, I *don't* want to have
to imagine what class might generate the distribution I want.

The key here is what a user of the system thinks.  Not how an implementor
thinks.

On Mon, Oct 26, 2009 at 6:01 PM, Phil Steitz <[hidden email]> wrote:

> What I
> am -1 on is adding (potentially poor) random data generation to the
> distributions implementations.
>



--
Ted Dunning, CTO
DeepDyve