Table of Contents

Vocabulary

B-Trees Internal Nodes Leaf Nodes Maximum Height Minimum Height Searching Insertion Deletion

What Are B-Trees?

<aside> 💡 With AVLs, we discovered that Binary Search Trees (BSTs) can become self-balancing to prevent performing operations as bad as $O(n)$. However, there is still a problem. What if we are dealing with large amounts of data?

B-Trees are a solution to this problem! B-Trees are self-balancing BSTs that have more than 2 children and can store multiple keys in a node. Being able to store multiple keys in a node and have more children node, significantly reduces the height of the tree, allowing for faster runtimes when performing operations such as insertion, deletion, or searching for a specific key. Additionally, the amount of times data would need to be retrieved from seconday storage would lessen as we would be able to store more data in shorter amounts of disk accesses.

</aside>

B-Tree Requirements

<aside> 💡 For a tree to be deemed a valid B-Tree of order $m$:

Prerequisite: A valid B-Tree must meet the general requirements of a Binary Search Tree (i.e. no duplicates, etc.)

Important Variables To Note $M$ = max number of children $M - 1$ = max number of keys $t$ = $\lceil \frac{M}{2} \rceil$ A full node has $M - 1$ keys.

General Requirements

  1. Keys are stored in ascending order in a node (smallest to biggest). 2a. Keys in left subtree ($T_L$) < the leftmost key in a parent node. 2b. Keys in right subtree ($T_R$) > the rightmost key in a parent node. 2c. Children nodes with keys that are in between a preceding key and following key in value should be placed in between these nodes.

Root Node Requirements

  1. The root must contain a number of keys between $1$ and $M - 1$ (aka between $2$ and $M$ children).
  2. If the root node isn’t a leaf node, the number of children = # of keys in the root + 1.

Internal Node Requirements

  1. Internal Nodes must contain a number of keys between $t - 1$ and $M -1$.
  2. If the internal nodes aren’t leaf nodes, the number of children = # of keys in the node + 1 (aka between $t$ and $M$ children. ****3. If a parent node has $n$ children, then the parent node must have $n-1$ keys.

Leaf Node Requirements

  1. Leaf Nodes (nodes with no children) must contain a number of keys between $t - 1$ and $M -1$.
  2. All leaf nodes must be at the same level of the tree.

</aside>

Estimating The Height of B-Trees

<aside> 💡 Given $n$ data, how many keys are needed to hold that data? What is the minimum and maximum height of a B-Tree?

</aside>

Maximum/Worst Case Height of A B-Tree

<aside> 💡 The worst case height of a B-Tree is $log_t \space n$.

To showcase the calculation behind this, let’s create a B-Tree that isn’t full (meaning it would contain the minimum amount of nodes).

</aside>

<aside> 💡 In Figure 1, the root node only contains 1 key (since this B-Tree isn’t full and we want to obtain the maximum height). Let’s assume the root node contains a minimum amount of children. As a result, it will only have 2 children nodes that contain $2(t-1)$ keys since the minimum amount of keys internal nodes can have (assuming they aren’t leaf nodes) are $t -1$.

</aside>

<aside> 💡 In Figure 2, at depth 2, each of the 2 nodes can have $t$ children with each child having $t-1$ keys. Thus, the total amount of keys at depth 2 is $2t(t-1)$ with $2t$ representing the total amount of nodes within depth 2 and $t-1$ being the amount of keys for each child node.

</aside>

<aside> 💡 See a pattern? The total amount of keys can be determined with the following formula:

</aside>

$$ keys = 1 + \sum_{i=1}^{h} 2t^{i-1}(t-1) $$

<aside> 💡 We can solve this summation using the Geometric Series to determine the most of amount of keys:

</aside>

$$ 1 + \sum_{i=1}^{h} 2t^{i-1}(t-1) \newline 1 + 2(t-1) \sum_{i=1}^{h} t^{i-1} \newline 1 + 2(t-1) \frac {t^h - 1}{t - 1} \newline 1 + 2(t^h - 1) \newline most \space amount \space of keys = 2t^h - 1 $$