BTree and database indexes
1. How data is stored in hard disk?
  • Modern hard disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. The size of a logical block is usually 512 or 1024 bytes.This one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially.

  • When data is read/ written, the arm needs to go to a particular track, this is measured as seek time. After that, spindle will spin to a particular sector, this is measured as rotational latency. When we read a large amount of data, seek time becomes the bottleneck because disk has to continuously move its arm.

2. Memory locality
  • Due to the way data is stored in disk, a data structure will run faster if it has good locality
  • Good locality: data accessed around the same time tend to be in about the same place. Example, searching through an unsorted array will be faster than searching through an unsorted linked list.

Array locality
  • Each time we create a linked list node by new Node(x), it will locate the node in a unpreditable location in main memory, in the worst case, one node per virtual memory block, each of which needs to be paged in from disk if you are running low on space. A similar issue arises with BST.

  • In fact, there are space limitations when it comes to memory, and not all of the data can reside in memory, and hence majority of the data has to be saved on disk, which is slow (because of the seek time). Without a lookup data structure, DBMS would have to do a sequential scan to find a value in database.

  • The solution is we create a B-trees to store all key values and corresponding pointer to rows in main memory => searching time will be proportional to the height of the tree. In fact, when data volumn grows, the BTree cannot fit in memory, so it is stored in disk. BTree is designed to store data in block fashion so it’s efficient for Operating Systems to read & write data in blocks instead of writing individual bytes. Imagine we use BST/ AVL tree in this case, there will be a lot of random access (because of bad locality), our disk manager has to move the disk head around the spinning disk, so naturally it’s a costly operation as the disk has to look for data like crazy.

In summary, advantages of B-Trees over BST, AVL tree for being used:

  • B-trees are self-balancing while BST are not self-balancing => can lead to a very tall tree
  • B-trees are much shorter than other balanced binary tree structures such as AVL => fewer disk access.
M = 1024
N = 63 billion
Height = logN / log(M/2) ≤ 4
  • If we want to retrieve a range of entries in order by key, B-trees and its variant B+trees are much efficient than AVL, BST or Hash table, because each node contains a large number of entries with sorted key.
3. Disadvantage of B-trees
  • much more complicated to implement than Hash table
  • for equality comparisons lookup (==, !=), hash table is much faster O(1)
  • the write speed in BTree is typically slower than some other method (such as LSM-trees)