#### Problem

Choose a simple random sample, without replacement, of k items from a population of size n.

#### Fisher-Yates shuffling for known n

The basic idea is to shuffle the original array of n number, and pick the first k numbers.
To shuffle, we use Fisher-Yates algorithm:

``````for i from n−1 downto 1 do
j ← random integer s.t. 0 ≤ j ≤ i
swap(a[j], a[i])
``````
##### The intuation

Imagine you have a magical hat and n balls. Put all n balls into the hat.

Now close your eyes and start taking balls randomly out of the hat and placing them in a line. If you can convince yourself that the line of ball you get is random, you can understand how Fisher-Yates works.

##### Math proof

We have to prove that number `a[i]` will have `1/n` probability of ending up at index `j`.

This is equivalent to `a[i]` is picked at `j` trial, and not being picked at first `n-j` trial. This probability is

``````  (1/j) * (1 - 1/(j+1)) * (1 - 1/(j+2)) * ... * (1 - 1/n)
= 1/j * (j/j+1) * (j+1)/(j+2) * (j+2)/(j+3) * ... * (n-1)/n
= 1/n
``````

#### Algorithm R for unknown n

When n is unknown in advance, for example when we receive a stream of events, the problem becomes harder. This is known as Reservior sampling problem. There are many algorithms for this problem, but I will write the simplest one:

``````ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i := 1 to k
R[i] := S[i]

for i := k+1 to n
j := randomInteger(1, i)
if j <= k
R[j] := S[i]
``````

Runtime: `O(n)`

#### Proof

We need to prove that `Pr(S[i] ∈ R) = k/n`.
This is equivalent to S[i] is picked into R and S[i+1], S[i+2], ..., S[n] do not replace S[i]. This probability is

``````   k/i * (1 - 1/(i+1)) * (1 - 1/(i+2)) * ... * (1 - 1/n)
=  k/i * i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n
=  k/n
``````