R-tree: spatial access methods (CSDN blog)

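An R-tree is a spatial data structure similar to a B-tree, used mainly for indexing multi-dimensional information. It partitions space using hierarchically nested, possibly overlapping minimum bounding rectangles. Each entry in a non-leaf node holds an identifier for a child node and the bounding box of all entries in that child. Insertion and deletion use these bounding boxes to keep "nearby" elements in the same leaf node. Search algorithms also use them to decide whether a child node needs to be searched at all.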

R-trees are tree data structures similar to B-trees, but used for spatial access methods, i.e., for indexing multi-dimensional information such as the (X, Y) coordinates of geographical data. A typical real-world query answered with an R-tree is: "Find all museums within 2 km of my current location".

Used for indexing multi-dimensional information.

The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles (MBRs, also known as bounding boxes; "rectangle" is what the "R" in R-tree stands for).
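To make the MBR idea concrete, here is a minimal Python sketch of a 2-D bounding rectangle with the operations the later algorithms rely on: area, union (the smallest rectangle covering two others) and an overlap test. The class name `Rect`, its fields and the helper `mbr_of` are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned 2-D minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle that encloses both self and other.
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap (or touch) unless they are separated on some axis.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)


def mbr_of(rects):
    """MBR of a non-empty iterable of rectangles."""
    it = iter(rects)
    box = next(it)
    for r in it:
        box = box.union(r)
    return box


# Two overlapping squares and the MBR that covers both.
a, b = Rect(0, 0, 2, 2), Rect(1, 1, 3, 3)
print(a.intersects(b), mbr_of([a, b]))
```

Freezing the dataclass makes rectangles immutable and hashable, which keeps later sketches from accidentally mutating a shared box.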

Each node of an R-tree has a variable number of entries (up to some pre-defined maximum). Each entry within a non-leaf node stores two pieces of data: a way of identifying a child node, and the bounding box of all entries within this child node.
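The node layout can be sketched as follows, assuming 2-D rectangles, a hypothetical maximum of four entries per node (`MAX_ENTRIES`) and hypothetical integer record ids (`data_id`) in the leaves. What it illustrates is that the bounding box stored in a parent entry is simply the union of all boxes in the child node that entry points to.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_ENTRIES = 4  # hypothetical pre-defined maximum number of entries per node


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


@dataclass
class Entry:
    bbox: Rect
    child: Optional["Node"] = None  # set in non-leaf nodes
    data_id: Optional[int] = None   # set in leaf nodes (hypothetical record id)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.entries) >= MAX_ENTRIES

    def bbox(self) -> Rect:
        # Bounding box of all entries in this node: exactly what the
        # parent entry pointing at this node stores.
        box = self.entries[0].bbox
        for e in self.entries[1:]:
            box = box.union(e.bbox)
        return box


def make_parent(children: List[Node]) -> Node:
    """Build a non-leaf node whose entries pair each child with its MBR."""
    return Node(leaf=False,
                entries=[Entry(bbox=c.bbox(), child=c) for c in children])
```

For example, a leaf holding boxes (0, 0, 1, 1) and (2, 2, 3, 3) is represented in its parent by the single box (0, 0, 3, 3).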

The insertion and deletion algorithms use the bounding boxes from the nodes to ensure that "nearby" elements are placed in the same leaf node (in particular, a new element goes into the leaf node that requires the least enlargement of its bounding box). Each entry within a leaf node stores two pieces of information: a way of identifying the actual data element (which, alternatively, may be placed directly in the node) and the bounding box of the data element.
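A minimal sketch of the descent used by insertion, assuming nodes store parallel lists of entry MBRs (`boxes`) and child nodes (`children`). At each level it follows the entry whose MBR would grow least; ties are broken by smaller area, as in Guttman's original ChooseLeaf. Splitting a full leaf is deliberately left out, so the hypothetical `insert_no_split` is only valid while the chosen leaf still has room.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


def enlargement(box: Rect, new: Rect) -> float:
    """How much box's area must grow to also cover new."""
    return box.union(new).area() - box.area()


@dataclass
class Node:
    leaf: bool
    boxes: List[Rect] = field(default_factory=list)       # one MBR per entry
    children: List["Node"] = field(default_factory=list)  # empty in leaves


def insert_no_split(root: Node, new: Rect) -> Node:
    """Descend to the leaf needing least enlargement, store `new` there,
    and grow every MBR on the path so parents still cover their children.
    Overflow handling (node splitting) is omitted in this sketch."""
    node = root
    while not node.leaf:
        # Least enlargement first; ties broken by smaller current area.
        i, _ = min(enumerate(node.boxes),
                   key=lambda kb: (enlargement(kb[1], new), kb[1].area()))
        node.boxes[i] = node.boxes[i].union(new)
        node = node.children[i]
    node.boxes.append(new)
    return node
```

Enlarging the parent boxes on the way down keeps the invariant that every entry's MBR covers everything beneath it, which the search algorithms depend on.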

Similarly, the searching algorithms (e.g., intersection, containment, nearest neighbor) use the bounding boxes to decide whether or not to search inside a child node. In this way, most of the nodes in the tree are never "touched" during a search. As with B-trees, this makes R-trees suitable for databases, where nodes can be paged into memory when needed.
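The pruning described above can be sketched as a recursive intersection query. This assumes a hypothetical layout with parallel `boxes`/`items` lists, where `items` holds child nodes in internal nodes and record ids in leaves; the optional `stats` dict counts visited nodes so the effect of pruning is visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)


@dataclass
class Node:
    leaf: bool
    boxes: List[Rect]
    items: list  # child Nodes in internal nodes, record ids in leaves


def search(node: Node, query: Rect, stats: Optional[dict] = None) -> list:
    """Return ids of leaf entries whose MBR intersects `query`."""
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    hits = []
    for box, item in zip(node.boxes, node.items):
        if not box.intersects(query):
            continue  # prune: nothing under this entry can match
        if node.leaf:
            hits.append(item)
        else:
            hits.extend(search(item, query, stats))
    return hits
```

With one leaf around the origin and another around (10, 10), a query box near the origin visits only the root and the first leaf; the second subtree is never touched.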

Different algorithms can be used to split nodes when they become too full, resulting in the quadratic and linear R-tree sub-types.
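As an example of a split algorithm, here is a sketch of Guttman's quadratic split operating on a plain list of entry rectangles, where `min_fill` is the assumed minimum number of entries per node. PickSeeds chooses the pair that would waste the most area if kept together, which is the quadratic-cost step; PickNext then assigns the remaining entries one at a time, always taking the entry with the strongest preference for one group. The linear variant differs mainly in using a cheaper, linear-time seed choice and taking the remaining entries in arbitrary order.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


def enlargement(box: Rect, r: Rect) -> float:
    return box.union(r).area() - box.area()


def mbr(rects: List[Rect]) -> Rect:
    box = rects[0]
    for r in rects[1:]:
        box = box.union(r)
    return box


def quadratic_split(rects: List[Rect], min_fill: int) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's entries (at least two) into two groups."""
    # PickSeeds: the pair wasting the most area if placed in one node.
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return a.union(b).area() - a.area() - b.area()

    i, j = max(combinations(range(len(rects)), 2), key=waste)
    g1, g2 = [rects[i]], [rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest:
        # If a group needs every remaining entry to reach min_fill, give them all.
        if len(g1) + len(rest) <= min_fill:
            g1.extend(rest)
            break
        if len(g2) + len(rest) <= min_fill:
            g2.extend(rest)
            break
        b1, b2 = mbr(g1), mbr(g2)
        # PickNext: the entry whose enlargement costs differ the most.
        r = max(rest, key=lambda x: abs(enlargement(b1, x) - enlargement(b2, x)))
        rest.remove(r)
        # Smaller enlargement wins; then smaller area; then fewer entries.
        if (enlargement(b1, r), b1.area(), len(g1)) <= (enlargement(b2, r), b2.area(), len(g2)):
            g1.append(r)
        else:
            g2.append(r)
    return g1, g2
```

Evaluating every pair in PickSeeds is what makes this variant quadratic in the number of entries being split.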

Historically, R-trees do not guarantee good worst-case performance, but they generally perform well on real-world data.[citation needed] However, an algorithm published in 2004 defines the Priority R-tree, which claims to be as efficient as the most efficient methods of 2004 while also being worst-case optimal.[1]

When data is organized in an R-tree, the k nearest neighbors (under any L^p norm) of all points can be computed efficiently using a spatial join.[2] This benefits many algorithms built on k nearest neighbors, for example the Local Outlier Factor.
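The spatial join from [2] is beyond a short example, but the way bounding boxes speed up nearest-neighbor queries can be sketched with a different, well-known technique: best-first search over a priority queue keyed by MINDIST, the smallest L^p distance from the query point to a rectangle. Because an MBR's MINDIST never exceeds the distance to anything inside it, the first k data entries popped are the k nearest. The `boxes`/`items` layout and storing points as degenerate rectangles are assumptions of this sketch; running `knn` once per point is a naive all-kNN, not an implementation of the cited join.

```python
import heapq
import math
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass
class Node:
    leaf: bool
    boxes: List[Rect]
    items: list  # child Nodes in internal nodes, record ids in leaves


def mindist(q: Tuple[float, float], r: Rect, p: float = 2.0) -> float:
    """Smallest L^p distance from point q to any point of rectangle r."""
    dx = max(r.xmin - q[0], 0.0, q[0] - r.xmax)
    dy = max(r.ymin - q[1], 0.0, q[1] - r.ymax)
    if math.isinf(p):
        return max(dx, dy)  # L^inf (Chebyshev) norm
    return (dx ** p + dy ** p) ** (1.0 / p)


def knn(root: Node, q: Tuple[float, float], k: int, p: float = 2.0) -> list:
    """Best-first search: ids of the k entries nearest to q under L^p."""
    tie = count()  # tie-breaker so heapq never compares Node objects
    heap = [(0.0, next(tie), root)]
    result = []
    while heap and len(result) < k:
        _, _, obj = heapq.heappop(heap)
        if isinstance(obj, Node):
            # Expand a node: queue each entry keyed by its MINDIST to q.
            for box, item in zip(obj.boxes, obj.items):
                heapq.heappush(heap, (mindist(q, box, p), next(tie), item))
        else:
            # A popped data entry is no farther than anything still queued.
            result.append(obj)
    return result
```

Subtrees whose MINDIST exceeds the k-th result's distance are never expanded, which is the same pruning effect as in the intersection search.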