Introduction

Recently, I have several questions about BTree and its application - database indexes:

  1. What is BTree? Kind of a cool name, huh?
  2. What special properties make we use BTree for database indexes?
  3. Why is advantage of BTree over BST/ Red-black tree or Hashtable that we don't use the latter for indexing?
  4. Is it hard to implement a BTree? How?
  5. What are disadvantages of BTree?

Let's figure out the answers!

BTree's concept
  1. A BTree of order m is a data structure with following properties
  • at most m children per node
  • at least ⌈m/2⌉ children per non-leaf node (except root)
  • root has at least 2 children if it is not a leaf node
  • a non-leaf node with k children contains k-1 keys
  • all leaves appear in the same level (height)
  • external node: leave node - contain client key
  • internal node: contain copy of client key to guide search
  • suppose subtree Ti is the ith child of the node:
    all keys in Ti must be between keys k[i-1] and k[i] i.e. k[i-1] ≤ Ti < k[i]
    • k[i-1] is the smallest key in Ti
    • All keys in first subtree T1 < k[1]
    • All keys in last subtree TM ≥ k[M-1]

Visualization of BTree

  1. Search operation
    • start from root
    • find interval for search key and follow the corresponding link. Keys in each node are sorted so finding step can be done in O(log M) by binary search tree
    • repeat until find the key or terminate in external nodes.

  1. Insert operation
    • search for the external node that contains new key
    • insert new key to that node
    • if the node overflow, .i.e containing M keys then split it into 2 nodes and insert the newly created node to parent node.
    • repeat above step until we reach the root or there is no more overflow node.
    • if we reach root, split the root and creat a new root, increate the height of tree.
  1. Runtime analysis for a BTree of order M
  • depth: because each node has at least M/2 key => O(logN / log⌈M/2⌉)
  • insert/ delete:
    • time for splitting or combining key: O(M)
    • insert/ delete time: O(depth * M) = O((log N/log ⌈M/2⌉ )*M) = O((M/log M) * log N)
  • search: O(depth*log M) = O(log N)

We can see that BTree has guarantee performance for all operations, while BST has worst case runtime O(n) for search/insert/delete operation

Implementation

We will implement a BTree with key is interger and value is String.

  1. Entry: representing a key-link in each Node. It has key, value and a pointer to next Node.
private static final class Entry {
    private final String value;
    private int key;
    private Node next;

    Entry(int key, String value, Node next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
  1. Node
private static final class Node {
    // number of entries
    private int m;
    private Entry[] children = new Entry[M];

    private Node(int k) {
        m = k;
    }
}

Instead of using a sentinal key like in BTree concept, in implementation, each Node has an array of Entry, entry[i] link to a branch in which all keys satisfy:

  • entry[i].key <= key < entry[i+1].key
  • if i == m-1: entry[m-1].key <= key < +∞
  • if i == 0: -∞ < key < entry[1].key
BTree of order 4 (M=4)
  1. Search
public String get(int key) {
        return search(root, key, height);
}

private String search(Node x, int key, int h) {
    Entry[] children = x.children;
    if (h == 0) {
        // can be improved by Binary search
        for (int i = 0; i < x.m; i++)
            if (children[i].key == key)
                return children[i].value;
    } else {
        // can be improved by Binary search
        for (int i = 0; i < x.m; i++) {
            if (i == x.m - 1 || key < children[i + 1].key) {
                return search(children[i].next, key, h - 1);
            }
        }
    }
    return null;
}
  1. Insert
public void put(int key, String value) {
        Node u = insert(root, key, value, height);
        n++;
        if (u == null) {
            return;
        }
        // need to split root
        Node newRoot = new Node(2);
        newRoot.children[0] = new Entry(root.children[0].key, null, root);
        newRoot.children[1] = new Entry(u.children[0].key, null, u);
        root = newRoot;
        height++;
}

private Node insert(Node x, int key, String value, int h) {
    Entry newEntry = new Entry(key, value, null);
    int insertPosition = x.m;

    if (h == 0) {
        // can be improved by Binary search
        for (int i = 0; i < x.m; i++) {
            if (key < x.children[i].key) {
                insertPosition = i;
                break;
            }
        }
    } else {
        // can be improved by Binary search
        for (int i = 0; i < x.m; i++) {
            if (i + 1 == x.m || key < x.children[i + 1].key) {
                Node u = insert(x.children[i].next, key, value, h - 1);
                insertPosition = i + 1;
                if (u == null) {
                    return null;
                }
                newEntry.key = u.children[0].key;
                newEntry.next = u;
                break;
            }
        }
    }
    for (int j = x.m; j > insertPosition; j--) {
        x.children[j] = x.children[j - 1];
    }
    x.children[insertPosition] = newEntry;
    x.m++;
    // fit in the max size of each node
    if (x.m < M) {
        return null;
    }
    // split
    return split(x);
}

private Node split(Node x) {
    Node newNode = new Node(M / 2);
    // cut the size of x by half
    x.m = M / 2;
    for (int i = 0; i < M / 2; i++) {
        newNode.children[i] = x.children[M / 2 + i];
    }
    return newNode;
}

Full implementation repor: https://github.com/quangh33/BTree

Reference:

<to be continue>