Introduction
Recently, I have several questions about BTree and its application - database indexes:
- What is BTree? Kind of a cool name, huh?
- What special properties make we use BTree for database indexes?
- Why is advantage of BTree over BST/ Red-black tree or Hashtable that we don't use the latter for indexing?
- Is it hard to implement a BTree? How?
- What are disadvantages of BTree?
Let's figure out the answers!
BTree's concept
- A BTree of order m is a data structure with following properties
- at most m children per node
- at least ⌈m/2⌉ children per non-leaf node (except root)
- root has at least 2 children if it is not a leaf node
- a non-leaf node with k children contains k-1 keys
- all leaves appear in the same level (height)

- external node: leave node - contain client key
- internal node: contain copy of client key to guide search
- suppose subtree Ti is the ith child of the node:
all keys in Ti must be between keys k[i-1] and k[i] i.e.k[i-1] ≤ Ti < k[i]
- k[i-1] is the smallest key in Ti
- All keys in first subtree T1 < k[1]
- All keys in last subtree TM ≥ k[M-1]

- Search operation
- start from root
- find interval for search key and follow the corresponding link. Keys in each node are sorted so finding step can be done in O(log M) by binary search tree
- repeat until find the key or terminate in external nodes.

- Insert operation
- search for the external node that contains new key
- insert new key to that node
- if the node overflow, .i.e containing M keys then split it into 2 nodes and insert the newly created node to parent node.
- repeat above step until we reach the root or there is no more overflow node.
- if we reach root, split the root and creat a new root, increate the height of tree.

- Runtime analysis for a BTree of order M
- depth: because each node has at least M/2 key => O(logN / log⌈M/2⌉)
- insert/ delete:
- time for splitting or combining key: O(M)
- insert/ delete time: O(depth * M) = O((log N/log ⌈M/2⌉ )*M) = O((M/log M) * log N)
- search: O(depth*log M) = O(log N)
We can see that BTree has guarantee performance for all operations, while BST has worst case runtime O(n) for search/insert/delete operation
Implementation
We will implement a BTree with key is interger and value is String.
- Entry: representing a key-link in each Node. It has key, value and a pointer to next Node.
private static final class Entry {
private final String value;
private int key;
private Node next;
Entry(int key, String value, Node next) {
this.key = key;
this.value = value;
this.next = next;
}
}
- Node
private static final class Node {
// number of entries
private int m;
private Entry[] children = new Entry[M];
private Node(int k) {
m = k;
}
}
Instead of using a sentinal key like in BTree concept, in implementation, each Node has an array of Entry, entry[i] link to a branch in which all keys satisfy:
entry[i].key <= key < entry[i+1].key
if i == m-1: entry[m-1].key <= key < +∞
if i == 0: -∞ < key < entry[1].key

- Search
public String get(int key) {
return search(root, key, height);
}
private String search(Node x, int key, int h) {
Entry[] children = x.children;
if (h == 0) {
// can be improved by Binary search
for (int i = 0; i < x.m; i++)
if (children[i].key == key)
return children[i].value;
} else {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (i == x.m - 1 || key < children[i + 1].key) {
return search(children[i].next, key, h - 1);
}
}
}
return null;
}
- Insert
public void put(int key, String value) {
Node u = insert(root, key, value, height);
n++;
if (u == null) {
return;
}
// need to split root
Node newRoot = new Node(2);
newRoot.children[0] = new Entry(root.children[0].key, null, root);
newRoot.children[1] = new Entry(u.children[0].key, null, u);
root = newRoot;
height++;
}
private Node insert(Node x, int key, String value, int h) {
Entry newEntry = new Entry(key, value, null);
int insertPosition = x.m;
if (h == 0) {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (key < x.children[i].key) {
insertPosition = i;
break;
}
}
} else {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (i + 1 == x.m || key < x.children[i + 1].key) {
Node u = insert(x.children[i].next, key, value, h - 1);
insertPosition = i + 1;
if (u == null) {
return null;
}
newEntry.key = u.children[0].key;
newEntry.next = u;
break;
}
}
}
for (int j = x.m; j > insertPosition; j--) {
x.children[j] = x.children[j - 1];
}
x.children[insertPosition] = newEntry;
x.m++;
// fit in the max size of each node
if (x.m < M) {
return null;
}
// split
return split(x);
}
private Node split(Node x) {
Node newNode = new Node(M / 2);
// cut the size of x by half
x.m = M / 2;
for (int i = 0; i < M / 2; i++) {
newNode.children[i] = x.children[M / 2 + i];
}
return newNode;
}
Full implementation repor: https://github.com/quangh33/BTree
Reference:
- https://courses.cs.washington.edu/courses/cse373/01sp/Lect9.pdf
- https://www.coursera.org/learn/algorithms-part1/lecture/HIlHd/b-trees-optional
- https://www.cs.yale.edu/homes/aspnes/pinewiki/BTrees.html
<to be continue>