Massive Algorithms: B Tree

http://en.wikipedia.org/wiki/B-tree
Key Point: Sorted, Join and Split when insert or delete

a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

Usually, the number of keys is chosen to vary between

d

and

2d

, where

d

is the minimum number of keys, and

d+1

is the minimum degree or branching factorof the tree. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If an internal node has

2d

keys, then adding a key to that node can be accomplished by splitting the

2d

key node into two

d

key nodes and adding the key to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have

d

keys, then a key may be deleted from the internal node by combining with its neighbor. Deleting the key would make the internal node have

d-1

keys; joining the neighbor would add

d

keys plus one more key brought down from the neighbor's parent. The result is an entirely full node of

2d

keys.

The number of branches (or child nodes) from a node will be one more than the number of keys stored in the node. In a 2-3 B-tree, the internal nodes will store either one key (with two child nodes) or two keys (with three child nodes). A B-tree is sometimes described with the parameters

(d+1)

—

(2d+1)

or simply with the highest branching order,

(2d+1)

In the narrow sense, a B-tree stores keys in its internal nodes but need not store those keys in the records at the leaves. The general class includes variations such as the B+ tree and the B*.

In the B+ tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access
The B^*-tree balances more neighboring internal nodes to keep the internal nodes more densely packed (Comer 1979, p. 129). This variant requires non-root nodes to be at least 2/3 full instead of 1/2 (Knuth 1998, p. 488). To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. Deleting nodes is somewhat more complex than inserting however.
B-trees can be turned into order statistic trees to allow rapid searches for the Nth record in key order, or counting the number of records between any two records, and various other related operations.

The database problem
Large databases have historically been kept on disk drives. The time to read a record on a disk drive far exceeds the time needed to compare keys once the record is available. For simplicity, assume reading from disk takes about 10 milliseconds.

An index speeds the search

a B-tree:
keeps keys in sorted order for sequential traversing
uses a hierarchical index to minimize the number of disk reads
uses partially full blocks to speed insertions and deletions
keeps the index balanced with an elegant recursive algorithm
In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.

http://en.wikipedia.org/wiki/B-tree

B Tree

Labels

Popular Posts