read/B-tree

原文地址 : https://en.wikipedia.org/wiki/B-tree

B-tree

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.[1] Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as discs. It is commonly used in databases and file systems.

What, if anything, the B stands for has never been established.

在计算机科学中, B-tree 是自平衡树数据结构, 管理排序后的数据, 能常量时间内搜索. 序列访问, 插入和删除操作

B-tree 是二分查找树的广泛实现, 即一个节点可以有超过两个子节点

不像其他自平衡二分查找树, B-tree 适用于存储系统, 用作读取和写入相当庞大的数据

比如 discs, 通常被数据库和操作系统使用

Overview (概览)

In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

Each internal node of a B-tree contains a number of keys. The keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.

在 B-tree 中, 内部(非叶子)节点可以在预定义范围内拥有可变数量的子节点

当从节点中插入或删除数据时, 子节点的数量改变. 为了管理预定义的范围, 内部节点可以连接或分裂

因为子节点的范围是受限的, B-tree 不需要像其他自平衡二分查找数一样频繁地重平衡, 但是会浪费一些空间, 因为节点不是填充满的

子节点数量的上限和下限通常有具体的实现固定, 比如 2-3 B-tree(通常简称 2-3 tree) 每个内部节点只有 2 个或 3个子节点

每个 B-tree 的内部节点含有一些 keys, keys 充当分离的值分离子树

比如, 如果内部节点有 3 个子节点(或子树) 那么, 它必须有 2 个 keys : a1 和 a2

所有在最左边子树的值 <a1, 所有在中间子树的值 >a1 <a2, 在最右子树中的所有值 >a2

(PS. 这里没有考虑 = 的情况, 不过还行, 等于放在值的左边还是右边, 不会有太大影响)

Usually, the number of keys is chosen to vary between {\displaystyle d} $d$ and {\displaystyle 2d} $2d$ , where {\displaystyle d} $d$ is the minimum number of keys, and {\displaystyle d+1} $d+1$ is the minimum degree or branching factor of the tree. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If an internal node has {\displaystyle 2d} $2d$ keys, then adding a key to that node can be accomplished by splitting the hypothetical {\displaystyle 2d+1} $2d+1$ key node into two {\displaystyle d} $d$ key nodes and moving the key that would have been in the middle to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have {\displaystyle d} $d$ keys, then a key may be deleted from the internal node by combining it with its neighbor. Deleting the key would make the internal node have {\displaystyle d-1} $d-1$ keys; joining the neighbor would add {\displaystyle d} $d$ keys plus one more key brought down from the neighbor’s parent. The result is an entirely full node of {\displaystyle 2d} $2d$ keys.