Massive Algorithms: Google Interview - Discrete random variables - 我的博客

Google Interview - Discrete random variables - 我的博客 - ITeye技术网站

Given K discrete events with different probabilities P[k], produce a random value k consistent with its probability.

discrete random variable distribution, generate random number.
比如给你 [0.2, 0.3, 0.5] 输出0 with probability 0.2, 1 with prob 0.3, 2 with prob 0.5
Follow up : how to speed up

This is an important problem with many applications, especially in Monte-Carlo simulations. There are several solutions to it. They all start with a standard random number u that is uniformly distributed in the interval [0,1). The simplest algorithm would then proceed as follows:

  if      u<p[0]            then return 0;
  else if u<p[0]+p[1]       then return 1;
  else if u<p[0]+p[1]+p[2]  then return 2;
  ... and so on ...

This algorithm is O(n). Using binary search, it can be brought down to O(log(n)).

However, there is another solution that is O(1). It uses precomputed arrays. Effort for this precomputation is O(n). Drawing a representative event requires just drawing of u plus two array lookups plus little arithemetics, independent of n. This ingenious algorithm is called the alias method. It is due to Walker (1974/77), with refinements by Vose (1991). It is the preferred method for M > > n > > 1, where M is the number of drawings.

The obvious way to do this is to preprocess the probability list by generating a cumulative probability array with K+1 elements:

C[0] = 0 
C[k+1] = C[k]+P[k].

Note that this construction produces C[K]=1. Now choose a uniform deviate u between 0 and 1, and find the value of k such that C[k] <= u < C[k+1]. Although this in principle requires of order \log K steps per random number generation, they are fast steps, and if you use something like \lfloor uK \rfloor as a starting point, you can often do pretty well.

But faster methods have been devised. Again, the idea is to preprocess the probability list, and save the result in some form of lookup table; then the individual calls for a random discrete event can go rapidly. An approach invented by G. Marsaglia (Generating discrete random variables in a computer, Comm ACM 6, 37–38 (1963)) is very clever, and readers interested in examples of good algorithm design are directed to this short and well-written paper. Unfortunately, for large K, Marsaglia’s lookup table can be quite large.

A much better approach is due to Alastair J. Walker (An efficient method for generating discrete random variables with general distributions, ACM Trans on Mathematical Software 3, 253–256 (1977); see also Knuth, v2, 3rd ed, p120–121,139). This requires two lookup tables, one floating point and one integer, but both only of size K. After preprocessing, the random numbers are generated in O(1) time, even for large K. The preprocessing suggested by Walker requiresO(K^2) effort, but that is not actually necessary, and the implementation provided here only takes O(K) effort. In general, more preprocessing leads to faster generation of the individual random numbers, but a diminishing return is reached pretty early. Knuth points out that the optimal preprocessing is combinatorially difficult for large K.

public static int discreteRandom(double[] A) {
int n = A.length;
double[] C = new double[n+1];
for(int i=0; i<n; i++) {
C[i+1] = C[i] + A[i];
}
double r = new Random().nextDouble();
int l = 0, h = n-1;
while(l <= h) {
int m = (l + h)/2;
if(r >= C[m] && r < C[m+1]) {
return m;
} else if(r < C[m]) {
h = m - 1;
} else {
l = m + 1;
}
}
return l;
}
// Test Case
public static void main(String[] args) {
double[] B = {0.2, 0.3, 0.5};
int[] D = {0, 0, 0};
for(int i=1; i<10000000; i++) {
D[discreteRandom(B)]++;
}
System.out.println(Arrays.toString(D));
//output: [2001045, 3001794, 4997160]
}

Read full article from Google Interview - Discrete random variables - 我的博客 - ITeye技术网站

Google Interview - Discrete random variables - 我的博客 - ITeye技术网站

Labels

Popular Posts