My current research area is similarity search. As with ordinary search, we need data structures that make similarity search both effective and efficient, and they should support at least range queries and k-nearest-neighbor (kNN) queries. In this essay I summarize my recent study of the M-Tree, a kind of metric tree, that is, an index that relies only on the relative distances between objects.
First, let us look at an example taken from the book "Similarity Search: The Metric Space Approach" by Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal and Michal Batko.
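To make the two query types concrete, here is a minimal linear-scan sketch of both. It is only a baseline for comparison, not part of the M-Tree itself; the function names `range_query` and `knn_query` and the sample points are assumptions made for illustration.

```python
import heapq
import math


def range_query(objects, q, radius, dist):
    """Return every object whose distance to the query q is at most radius."""
    return [o for o in objects if dist(q, o) <= radius]


def knn_query(objects, q, k, dist):
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: dist(q, o))


# Hypothetical 2-D sample data; math.dist is the Euclidean distance.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
in_range = range_query(points, (0.0, 0.0), 5.0, math.dist)
nearest = knn_query(points, (0.0, 0.0), 2, math.dist)
```

Both functions compute one distance per stored object. An index such as the M-Tree aims to answer the same queries with far fewer distance computations.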

The example is a two-dimensional space, so the objects O1 to O11 can be treated as points in a vector space, and the Euclidean distance is the usual choice for computing their relative distances. In this data structure, every entry of an internal node (the root included) stores a radius that describes the region it covers and the distance between its own object and its parent object (0 for entries in the root). Entries in leaf nodes store the distance to the parent object as well, but their radius is always 0.
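The entry layout described above might be sketched as follows. This is an illustrative in-memory layout under my own assumptions, not the on-disk format of any real M-Tree implementation; the names `Entry`, `Node` and `make_subtree` and the sample points are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Entry:
    obj: Point                      # routing object, or a data object in a leaf
    dist_to_parent: float           # distance to the parent object; 0.0 in the root
    radius: float = 0.0             # region radius; always 0.0 for leaf entries
    child: Optional["Node"] = None  # subtree covered by a routing entry


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return all(e.child is None for e in self.entries)


def make_subtree(pivot, objects, parent_pivot=None):
    """Build one routing entry whose child is a leaf holding `objects`."""
    leaf = Node([Entry(o, math.dist(o, pivot)) for o in objects])
    # The radius must reach the farthest object stored below this entry.
    radius = max(e.dist_to_parent for e in leaf.entries)
    d_parent = 0.0 if parent_pivot is None else math.dist(pivot, parent_pivot)
    return Entry(pivot, d_parent, radius, leaf)


# A two-level tree: a root with two routing entries, each over one leaf.
root = Node([
    make_subtree((0.0, 0.0), [(0.0, 0.0), (3.0, 4.0)]),
    make_subtree((10.0, 0.0), [(10.0, 0.0), (10.0, 2.0)]),
])
```

A single `Entry` type is used for both node kinds here because, as noted above, a leaf entry is simply an entry whose radius is 0.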
The features of the M-Tree can be summarized as follows:
1. Balanced.
2. All of its objects are stored in the leaf nodes.
3. Dynamic, meaning that objects can be inserted without reorganizing the whole tree.
4. Most importantly, it is designed for secondary memory, so it can handle large data sets.
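To further improve the performance of the M-Tree, the triangle inequality is applied to reduce the number of distance computations, since computing distances in a high-dimensional space is rather time-consuming. The distances already stored in the entries make this possible: combined with the triangle inequality, they give a lower bound on the distance between the query and an entry, so many entries can be discarded without computing their actual distance to the query.
Note that the Euclidean distance is not the only way to measure distance. A function can serve as the distance in an M-Tree only if it is a metric, that is, it satisfies non-negativity, symmetry and the triangle inequality (and, in the usual definition, the distance from an object to itself is 0).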
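The pruning idea can be sketched with a range search. By the triangle inequality, |d(q, parent) - d(entry, parent)| is a lower bound on d(q, entry), and both terms are already known when a node is visited, so an entry can be skipped whenever that bound exceeds the query radius plus the entry radius. The code below is a simplified in-memory sketch under my own assumptions; the names `Entry`, `Node`, `CountingDistance` and `range_search` and the sample tree are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Entry:
    obj: Point                      # routing object, or a data object in a leaf
    dist_to_parent: float           # stored distance to the parent object
    radius: float = 0.0             # 0.0 for leaf entries
    child: Optional["Node"] = None


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


class CountingDistance:
    """Euclidean distance that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return math.dist(a, b)


def range_search(node, q, r_q, dist, d_q_parent=None, out=None):
    """Collect all data objects within r_q of q, pruning with stored distances."""
    if out is None:
        out = []
    for e in node.entries:
        # Cheap test: uses only stored distances, no new distance computation.
        # In the root there is no parent object, so this test is skipped.
        if d_q_parent is not None and abs(d_q_parent - e.dist_to_parent) > r_q + e.radius:
            continue
        d = dist(q, e.obj)
        if d > r_q + e.radius:
            continue  # the query ball and the entry's region do not intersect
        if e.child is None:
            out.append(e.obj)
        else:
            range_search(e.child, q, r_q, dist, d, out)
    return out


leaf_a = Node([Entry((0.0, 0.0), 0.0), Entry((3.0, 4.0), 5.0), Entry((0.0, 1.0), 1.0)])
leaf_b = Node([Entry((10.0, 0.0), 0.0), Entry((10.0, 2.0), 2.0)])
root = Node([Entry((0.0, 0.0), 0.0, 5.0, leaf_a), Entry((10.0, 0.0), 0.0, 2.0, leaf_b)])

dist = CountingDistance()
hits = range_search(root, (0.0, 2.0), 1.5, dist)
```

In this small example the search evaluates the distance function three times (two routing objects and one leaf object), whereas a linear scan over the five stored objects would need five. Two of the three objects in the first leaf are rejected by the cheap test alone.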
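A quick way to screen a candidate distance function is to test these properties on a sample of objects. The sketch below is an assumption-laden helper of my own (the names `violates_metric_axioms` and `squared_euclidean` are hypothetical); it can refute a function but cannot prove that it is a metric, since it only examines the sample it is given.

```python
import itertools
import math


def violates_metric_axioms(dist, sample, tol=1e-9):
    """Return the name of the first property violated on `sample`, or None."""
    for a, b in itertools.product(sample, repeat=2):
        if dist(a, b) < 0:
            return "non-negativity"
        if abs(dist(a, b) - dist(b, a)) > tol:
            return "symmetry"
    for a, b, c in itertools.product(sample, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return "triangle inequality"
    return None


def squared_euclidean(a, b):
    """Squared Euclidean distance: cheap to compute, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
euclid_result = violates_metric_axioms(math.dist, sample)
squared_result = violates_metric_axioms(squared_euclidean, sample)
```

The squared Euclidean distance fails here because the distance from (0, 0) to (2, 0) is 4, while going through (1, 0) costs only 1 + 1 = 2. Pruning based on the triangle inequality would therefore be unsafe with it.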
Several useful materials are listed below:
1. http://www-db.deis.unibo.it/Mtree/ (the most relevant items from it are listed below)
2. P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing metric spaces with M-tree. In Atti del Quinto Convegno Nazionale SEBD, Verona, Italy, June 1997.
3. P. Ciaccia and M. Patella. Bulk loading the M-tree. In Proceedings of the 9th Australasian Database Conference (ADC'98), Perth, Australia, February 1998.
4. M. Patella. Similarity Search in Multimedia Databases. PhD thesis, Dipartimento di Elettronica Informatica e Sistemistica, Università degli Studi di Bologna, Bologna, Italy, February 1999.



