#### Problem

Choose a simple random sample, without replacement, of **k items** from a population of **size n**.

#### Fisher-Yates shuffling for known n

The basic idea is to shuffle the original array of n number, and pick the first k numbers.

To shuffle, we use Fisher-Yates algorithm:

```
for i from n−1 downto 1 do
j ← random integer s.t. 0 ≤ j ≤ i
swap(a[j], a[i])
```

##### The intuation

Imagine you have a magical hat and n balls. Put all n balls into the hat.

Now close your eyes and start taking balls randomly out of the hat and placing them in a line. If you can convince yourself that the line of ball you get is random, you can understand how Fisher-Yates works.

##### Math proof

We have to prove that number `a[i]`

will have `1/n`

probability of ending up at index `j`

.

This is equivalent to `a[i]`

is picked at `j`

trial, **and** not being picked at first `n-j`

trial. This probability is

```
(1/j) * (1 - 1/(j+1)) * (1 - 1/(j+2)) * ... * (1 - 1/n)
= 1/j * (j/j+1) * (j+1)/(j+2) * (j+2)/(j+3) * ... * (n-1)/n
= 1/n
```

#### Algorithm R for unknown n

When n is unknown in advance, for example when we receive a stream of events, the problem becomes harder. This is known as Reservior sampling problem. There are many algorithms for this problem, but I will write the simplest one:

```
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i := 1 to k
R[i] := S[i]
for i := k+1 to n
j := randomInteger(1, i)
if j <= k
R[j] := S[i]
```

Runtime: `O(n)`

#### Proof

We need to prove that `Pr(S[i] ∈ R) = k/n`

.

This is equivalent to S[i] is picked into R and S[i+1], S[i+2], ..., S[n] do not replace S[i]. This probability is

```
k/i * (1 - 1/(i+1)) * (1 - 1/(i+2)) * ... * (1 - 1/n)
= k/i * i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n
= k/n
```