##### Introduction

Recently, I have several questions about BTree and its application - database indexes:

1. What is BTree? Kind of a cool name, huh?
2. What special properties make we use BTree for database indexes?
3. Why is advantage of BTree over BST/ Red-black tree or Hashtable that we don't use the latter for indexing?
4. Is it hard to implement a BTree? How?
5. What are disadvantages of BTree?

##### BTree's concept
1. A BTree of order m is a data structure with following properties
• at most m children per node
• at least ⌈m/2⌉ children per non-leaf node (except root)
• root has at least 2 children if it is not a leaf node
• a non-leaf node with k children contains k-1 keys
• all leaves appear in the same level (height)
• external node: leave node - contain client key
• internal node: contain copy of client key to guide search
• suppose subtree Ti is the ith child of the node:
all keys in Ti must be between keys k[i-1] and k[i] i.e. k[i-1] ≤ Ti < k[i]
• k[i-1] is the smallest key in Ti
• All keys in first subtree T1 < k
• All keys in last subtree TM ≥ k[M-1]

Visualization of BTree

1. Search operation
• start from root
• find interval for search key and follow the corresponding link. Keys in each node are sorted so finding step can be done in O(log M) by binary search tree
• repeat until find the key or terminate in external nodes.

1. Insert operation
• search for the external node that contains new key
• insert new key to that node
• if the node overflow, .i.e containing M keys then split it into 2 nodes and insert the newly created node to parent node.
• repeat above step until we reach the root or there is no more overflow node.
• if we reach root, split the root and creat a new root, increate the height of tree.
1. Runtime analysis for a BTree of order M
• depth: because each node has at least M/2 key => O(logN / log⌈M/2⌉)
• insert/ delete:
• time for splitting or combining key: O(M)
• insert/ delete time: O(depth * M) = O((log N/log ⌈M/2⌉ )*M) = O((M/log M) * log N)
• search: O(depth*log M) = O(log N)

We can see that BTree has guarantee performance for all operations, while BST has worst case runtime O(n) for search/insert/delete operation

##### Implementation

We will implement a BTree with key is interger and value is String.

1. Entry: representing a key-link in each Node. It has key, value and a pointer to next Node.
private static final class Entry {
private final String value;
private int key;
private Node next;

Entry(int key, String value, Node next) {
this.key = key;
this.value = value;
this.next = next;
}
}

1. Node
private static final class Node {
// number of entries
private int m;
private Entry[] children = new Entry[M];

private Node(int k) {
m = k;
}
}


Instead of using a sentinal key like in BTree concept, in implementation, each Node has an array of Entry, entry[i] link to a branch in which all keys satisfy:

• entry[i].key <= key < entry[i+1].key
• if i == m-1: entry[m-1].key <= key < +∞
• if i == 0: -∞ < key < entry.key
1. Search
public String get(int key) {
return search(root, key, height);
}

private String search(Node x, int key, int h) {
Entry[] children = x.children;
if (h == 0) {
// can be improved by Binary search
for (int i = 0; i < x.m; i++)
if (children[i].key == key)
return children[i].value;
} else {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (i == x.m - 1 || key < children[i + 1].key) {
return search(children[i].next, key, h - 1);
}
}
}
return null;
}

1. Insert
public void put(int key, String value) {
Node u = insert(root, key, value, height);
n++;
if (u == null) {
return;
}
// need to split root
Node newRoot = new Node(2);
newRoot.children = new Entry(root.children.key, null, root);
newRoot.children = new Entry(u.children.key, null, u);
root = newRoot;
height++;
}

private Node insert(Node x, int key, String value, int h) {
Entry newEntry = new Entry(key, value, null);
int insertPosition = x.m;

if (h == 0) {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (key < x.children[i].key) {
insertPosition = i;
break;
}
}
} else {
// can be improved by Binary search
for (int i = 0; i < x.m; i++) {
if (i + 1 == x.m || key < x.children[i + 1].key) {
Node u = insert(x.children[i].next, key, value, h - 1);
insertPosition = i + 1;
if (u == null) {
return null;
}
newEntry.key = u.children.key;
newEntry.next = u;
break;
}
}
}
for (int j = x.m; j > insertPosition; j--) {
x.children[j] = x.children[j - 1];
}
x.children[insertPosition] = newEntry;
x.m++;
// fit in the max size of each node
if (x.m < M) {
return null;
}
// split
return split(x);
}

private Node split(Node x) {
Node newNode = new Node(M / 2);
// cut the size of x by half
x.m = M / 2;
for (int i = 0; i < M / 2; i++) {
newNode.children[i] = x.children[M / 2 + i];
}
return newNode;
}


Full implementation repor: https://github.com/quangh33/BTree

Reference:

<to be continue>