Choose a simple random sample, without replacement, of k items from a population of size n.

Fisher-Yates shuffling for known n

The basic idea is to shuffle the original array of n number, and pick the first k numbers.
To shuffle, we use Fisher-Yates algorithm:

for i from n−1 downto 1 do
     j ← random integer s.t. 0 ≤ j ≤ i
     swap(a[j], a[i])
The intuation

Imagine you have a magical hat and n balls. Put all n balls into the hat.

Now close your eyes and start taking balls randomly out of the hat and placing them in a line. If you can convince yourself that the line of ball you get is random, you can understand how Fisher-Yates works.

Math proof

We have to prove that number a[i] will have 1/n probability of ending up at index j.

This is equivalent to a[i] is picked at j trial, and not being picked at first n-j trial. This probability is

  (1/j) * (1 - 1/(j+1)) * (1 - 1/(j+2)) * ... * (1 - 1/n)
= 1/j * (j/j+1) * (j+1)/(j+2) * (j+2)/(j+3) * ... * (n-1)/n
= 1/n

Algorithm R for unknown n

When n is unknown in advance, for example when we receive a stream of events, the problem becomes harder. This is known as Reservior sampling problem. There are many algorithms for this problem, but I will write the simplest one:

ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i := 1 to k
      R[i] := S[i]

  for i := k+1 to n
    j := randomInteger(1, i)
    if j <= k
        R[j] := S[i]

Runtime: O(n)


We need to prove that Pr(S[i] ∈ R) = k/n.
This is equivalent to S[i] is picked into R and S[i+1], S[i+2], ..., S[n] do not replace S[i]. This probability is

   k/i * (1 - 1/(i+1)) * (1 - 1/(i+2)) * ... * (1 - 1/n)
=  k/i * i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n
=  k/n