Problem
Choose a simple random sample, without replacement, of k items from a population of size n.
Fisher-Yates shuffling for known n
The basic idea is to shuffle the original array of n number, and pick the first k numbers.
To shuffle, we use Fisher-Yates algorithm:
for i from n−1 downto 1 do
j ← random integer s.t. 0 ≤ j ≤ i
swap(a[j], a[i])
The intuation
Imagine you have a magical hat and n balls. Put all n balls into the hat.


Now close your eyes and start taking balls randomly out of the hat and placing them in a line. If you can convince yourself that the line of ball you get is random, you can understand how Fisher-Yates works.
Math proof
We have to prove that number a[i]
will have 1/n
probability of ending up at index j
.
This is equivalent to a[i]
is picked at j
trial, and not being picked at first n-j
trial. This probability is
(1/j) * (1 - 1/(j+1)) * (1 - 1/(j+2)) * ... * (1 - 1/n)
= 1/j * (j/j+1) * (j+1)/(j+2) * (j+2)/(j+3) * ... * (n-1)/n
= 1/n
Algorithm R for unknown n
When n is unknown in advance, for example when we receive a stream of events, the problem becomes harder. This is known as Reservior sampling problem. There are many algorithms for this problem, but I will write the simplest one:
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i := 1 to k
R[i] := S[i]
for i := k+1 to n
j := randomInteger(1, i)
if j <= k
R[j] := S[i]
Runtime: O(n)
Proof
We need to prove that Pr(S[i] ∈ R) = k/n
.
This is equivalent to S[i] is picked into R and S[i+1], S[i+2], ..., S[n] do not replace S[i]. This probability is
k/i * (1 - 1/(i+1)) * (1 - 1/(i+2)) * ... * (1 - 1/n)
= k/i * i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n
= k/n