Automatic Accuracy Assessment via Hashing in Multiple-Source Environment

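This paper proposes a method based on the Jensen–Shannon divergence for automatically quantifying data accuracy. The method compares data with the closest approximation of its entity in the available context. This lets it quickly assign objective accuracy scores to large data sources and reduces the need for manual interaction.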
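
The summary describes scoring accuracy by how far a value diverges from its closest approximation. Below is a minimal sketch under two illustrative assumptions that are not taken from the paper: each value is represented as a character-frequency distribution, and the score is defined as 1 − JSD with base-2 logarithms. The names `distribution`, `kl_divergence`, `jsd`, and `accuracy_score` are hypothetical.

```python
import math
from collections import Counter


def distribution(text: str) -> dict[str, float]:
    """Character-frequency distribution (an assumed representation of a value)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    # q is always the mixture M below, so q[k] > 0 wherever p[k] > 0.
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence, base 2, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def accuracy_score(value: str, approximation: str) -> float:
    """Hypothetical score: 1 - JSD between a value and its closest approximation."""
    if not value and not approximation:
        return 1.0
    if not value or not approximation:
        return 0.0
    return 1.0 - jsd(distribution(value), distribution(approximation))
```

Because base-2 JSD is bounded by 1, identical strings score 1.0 and strings with disjoint character sets score 0.0. A token- or field-level distribution could be swapped in for `distribution` without changing `jsd`.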

Accuracy is one of the most important data quality dimensions, and its assessment is a key issue in data management. Most current studies focus on analyzing the accuracy dimension qualitatively, and such analysis depends heavily on expert knowledge. Little work has addressed how to quantify accuracy automatically. Based on the Jensen–Shannon divergence (JSD) measure, we propose that the accuracy of data can be automatically quantified by comparing the data with the closest approximation of its entity in the available context. To identify this closest approximation quickly in large-scale data sources, locality-sensitive hashing (LSH) is employed to extract it at multiple levels, namely the column, record, and field levels. As long as context members are available, our approach not only gives each data source an objective accuracy score very quickly but also avoids laborious human interaction. As an automatic accuracy assessment solution for multiple-source environments, our approach stands out, especially for large-scale data sources. Theoretical analysis and experiments show that our approach performs well in obtaining metadata on the accuracy dimension.
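
The abstract relies on LSH to find the closest approximation without comparing every pair of records. The sketch below assumes MinHash over character 3-grams combined with the banding technique, which is one common LSH family for Jaccard similarity. The paper's actual hash family and parameters are not given here, and the names `MinHashLSH`, `shingles`, and `closest` are hypothetical. The sketch works at record level on plain strings.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime for the assumed universal hash family


def shingles(text: str, k: int = 3) -> set[str]:
    """Lower-cased character k-grams; short strings become a single shingle."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


class MinHashLSH:
    """Hypothetical MinHash + banding index; a sketch, not the paper's code."""

    def __init__(self, num_perm: int = 128, bands: int = 32, seed: int = 42):
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.bands, self.rows = bands, num_perm // bands
        rng = random.Random(seed)
        # One (a, b) pair per simulated permutation: h(x) = (a*x + b) mod PRIME.
        self.coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(num_perm)]
        self.buckets = defaultdict(set)  # (band index, band slice) -> record ids
        self.records = {}

    def signature(self, text: str) -> list[int]:
        # crc32 is stable across runs, unlike the per-process salted hash().
        xs = [zlib.crc32(s.encode("utf-8")) for s in shingles(text)]
        return [min((a * x + b) % PRIME for x in xs) for a, b in self.coeffs]

    def _band_keys(self, sig: list[int]):
        for i in range(self.bands):
            yield i, tuple(sig[i * self.rows:(i + 1) * self.rows])

    def add(self, rid: str, text: str) -> None:
        self.records[rid] = text
        for key in self._band_keys(self.signature(text)):
            self.buckets[key].add(rid)

    def closest(self, text: str):
        """Collect bucket-mates, then rank them by exact Jaccard similarity."""
        candidates = set()
        for key in self._band_keys(self.signature(text)):
            candidates |= self.buckets.get(key, set())
        query = shingles(text)

        def jaccard(rid: str) -> float:
            other = shingles(self.records[rid])
            return len(query & other) / len(query | other)

        return max(candidates, key=jaccard, default=None)
```

LSH only decides which records get compared. The final ranking by exact Jaccard similarity picks the best among those candidates, and `None` is returned when no record shares a bucket with the query. With 32 bands of 4 rows, pairs whose Jaccard similarity exceeds roughly (1/32)^(1/4) ≈ 0.42 are likely to land in at least one common bucket. The abstract applies this search at column, record, and field level. Presumably the same kind of index could be built over column values or individual fields instead of whole records, and the matched item would then serve as the reference for the JSD comparison.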

Published in Expert Systems with Applications (Elsevier). The online version is available at http://dx.doi.org/10.1016/j.eswa.2009.08.023.
