MatchScore
packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/MatchScore.java
This class records the match score of each contact. Automatic aggregation uses these scores to pick candidates.
public class MatchScore implements Comparable<MatchScore> {
// Scores are multiplied by this number to allow room for "fractional" scores
public static final int SCORE_SCALE = 1000; // scale factor applied to the score
// Best possible match score
public static final int MAX_SCORE = 100; // maximum score
private long mRawContactId; // identifiers of the contact being scored
private long mContactId;
private long mAccountId;
private boolean mKeepIn; // always treat as a match
private boolean mKeepOut; // always treat as a non-match
private int mPrimaryScore; // primary score: the name match score
private int mSecondaryScore; // secondary score: phone number and other data matches
private int mMatchCount; // incremented on every score update; also feeds into the final score
...
}
Next is the method that shows how these member variables relate:
public int getScore() {
if (mKeepOut) {
return 0; // excluded: return 0 immediately
}
if (mKeepIn) {
return MAX_SCORE; // forced match: return 100 immediately
}
int score = (mPrimaryScore > mSecondaryScore ? mPrimaryScore : mSecondaryScore); // take the larger of the two scores
// Ensure that of two contacts with the same match score the one with more matching
// data elements wins.
return score * SCORE_SCALE + mMatchCount; // the scale is 1000, so mMatchCount carries very little weight
}
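To make the weighting concrete, here is a minimal sketch of the same arithmetic. The class name `MatchScoreSketch` and the `score` helper are hypothetical; the fields are passed in as parameters instead of being read from a real `MatchScore` object.

```java
// Hypothetical stand-alone sketch; not part of ContactsProvider.
final class MatchScoreSketch {
    static final int SCORE_SCALE = 1000;
    static final int MAX_SCORE = 100;

    // Mirrors the getScore() logic quoted above, with fields passed as parameters.
    static int score(boolean keepOut, boolean keepIn,
                     int primary, int secondary, int matchCount) {
        if (keepOut) {
            return 0;
        }
        if (keepIn) {
            return MAX_SCORE;
        }
        // The larger score dominates; matchCount only breaks ties.
        return Math.max(primary, secondary) * SCORE_SCALE + matchCount;
    }

    public static void main(String[] args) {
        System.out.println(score(false, false, 99, 0, 1));  // 99001
        System.out.println(score(false, false, 99, 71, 2)); // 99002
        System.out.println(score(false, false, 90, 71, 5)); // 90005
    }
}
```

Two contacts with the same best score of 99 differ only in the last digits, so the one with more matching data elements ranks higher, while a contact with a lower best score cannot catch up through the match count alone (as long as the count stays below 1000).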
The comparison method:
@Override
public int compareTo(MatchScore another) {
return another.getScore() - getScore(); // compare by score; higher scores sort first
}
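Because the comparison is "other minus this", sorting a list puts the best match first. The sketch below illustrates that ordering with a hypothetical `Entry` class standing in for `MatchScore`. It uses `Integer.compare` instead of subtraction, which is my own choice. The subtraction in the original looks safe because the scores appear to be small non-negative values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the descending sort order; not the real MatchScore.
final class ScoreOrderSketch {
    static final class Entry implements Comparable<Entry> {
        final long contactId;
        final int score;

        Entry(long contactId, int score) {
            this.contactId = contactId;
            this.score = score;
        }

        @Override
        public int compareTo(Entry other) {
            // Reversed argument order: higher scores sort first.
            return Integer.compare(other.score, score);
        }
    }

    // Returns the contact ids ordered from best to worst score.
    static List<Long> idsBestFirst(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        Collections.sort(sorted);
        List<Long> ids = new ArrayList<>();
        for (Entry e : sorted) {
            ids.add(e.contactId);
        }
        return ids;
    }
}
```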
NameDistance
packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/NameDistance.java
This class has just one method:
public float getDistance(byte bytes1[], byte bytes2[])
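It returns the distance between two names; note that both arguments are byte arrays. For the definition of this distance, see
"A Brief Introduction to the Jaro–Winkler Distance Matching Algorithm"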
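For readers without the referenced article at hand, the following is a sketch of the textbook Jaro–Winkler similarity on plain strings. It is not the `NameDistance` implementation, which works on byte arrays and may differ in detail; the class name, the prefix weight of 0.1 and the prefix cap of 4 are the usual textbook choices and are assumptions here. Judging from how `ContactMatcher` uses the result (`distance > threshold`), higher values mean closer names, which is also how this sketch behaves.

```java
// Textbook Jaro-Winkler similarity; an illustration, not the AOSP NameDistance code.
final class JaroWinklerSketch {
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        if (s1.isEmpty() || s2.isEmpty()) return 0.0;
        // Characters count as matching only within this sliding window.
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(s2.length() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        // Boost the score for a shared prefix of up to 4 characters.
        int prefix = 0;
        int max = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < max && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

For the classic pair "MARTHA" and "MARHTA" this gives roughly 0.961, and identical strings give 1.0.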
ContactMatcher
packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/ContactMatcher.java
Computes contact match scores.
Constants
// Suggest to aggregate contacts if their match score is equal or greater than this threshold
public static final int SCORE_THRESHOLD_SUGGEST = 50;
// Automatically aggregate contacts if their match score is equal or greater than this threshold
public static final int SCORE_THRESHOLD_PRIMARY = 70;
// Automatically aggregate contacts if the match score is equal or greater than this threshold
// and there is a secondary match (phone number, email etc).
public static final int SCORE_THRESHOLD_SECONDARY = 50;
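These three constants are the thresholds for how well contacts match; the lower the value, the weaker the match it accepts. There are other constants as well:
private static final int NO_DATA_SCORE = -1; // score used when there is no data to match
private static final int PHONE_MATCH_SCORE = 71; // score for a phone number match
private static final int EMAIL_MATCH_SCORE = 71; // score for an email match
private static final int NICKNAME_MATCH_SCORE = 71; // score for a nickname match
private static final int MAX_MATCHED_NAME_LENGTH = 30; // maximum name length considered when matching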
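Read literally, the comments on the three `SCORE_THRESHOLD_*` constants describe a simple decision rule. The sketch below spells that rule out. The `decide` helper and the `Decision` enum are hypothetical, and the real aggregator may apply the thresholds in a different order or with extra conditions.

```java
// Hypothetical reading of the threshold comments; not the aggregator's actual logic.
final class ThresholdSketch {
    static final int SCORE_THRESHOLD_SUGGEST = 50;
    static final int SCORE_THRESHOLD_PRIMARY = 70;
    static final int SCORE_THRESHOLD_SECONDARY = 50;

    enum Decision { AGGREGATE, SUGGEST, NONE }

    static Decision decide(int score, boolean hasSecondaryMatch) {
        // A strong enough score aggregates automatically.
        if (score >= SCORE_THRESHOLD_PRIMARY) {
            return Decision.AGGREGATE;
        }
        // A weaker score still aggregates when a phone number, email etc. also matches.
        if (hasSecondaryMatch && score >= SCORE_THRESHOLD_SECONDARY) {
            return Decision.AGGREGATE;
        }
        // Otherwise the contact may only be offered as a suggestion.
        if (score >= SCORE_THRESHOLD_SUGGEST) {
            return Decision.SUGGEST;
        }
        return Decision.NONE;
    }
}
```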
Matching algorithm constants, from strictest to loosest:
public static final int MATCHING_ALGORITHM_EXACT = 0; // exact match
public static final int MATCHING_ALGORITHM_CONSERVATIVE = 1; // conservative match
public static final int MATCHING_ALGORITHM_APPROXIMATE = 2; // approximate match
Finally:
public static final float APPROXIMATE_MATCH_THRESHOLD = 0.82f; // threshold for name distance
public static final float APPROXIMATE_MATCH_THRESHOLD_FOR_EMAIL = 0.95f; // threshold for email address distance
Members
private static int[] sMinScore =
new int[NameLookupType.TYPE_COUNT * NameLookupType.TYPE_COUNT]; // minimum scores
private static int[] sMaxScore =
new int[NameLookupType.TYPE_COUNT * NameLookupType.TYPE_COUNT]; // maximum scores
They are initialized in a static block:
static {
setScoreRange(NameLookupType.NAME_EXACT,
NameLookupType.NAME_EXACT, 99, 99);
setScoreRange(NameLookupType.NAME_VARIANT,
NameLookupType.NAME_VARIANT, 90, 90);
setScoreRange(NameLookupType.NAME_COLLATION_KEY,
NameLookupType.NAME_COLLATION_KEY, 50, 80);
...
setScoreRange(NameLookupType.NICKNAME,
NameLookupType.NICKNAME, 50, 60);
setScoreRange(NameLookupType.NICKNAME,
NameLookupType.NAME_COLLATION_KEY, 50, 60);
setScoreRange(NameLookupType.NICKNAME,
NameLookupType.EMAIL_BASED_NICKNAME, 50, 60);
}
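The setScoreRange method computes an index from the two types and stores the values, using a one-dimensional array as if it were two-dimensional: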
private static void setScoreRange(int candidateNameType, int nameType, int scoreFrom, int scoreTo) {
int index = nameType * NameLookupType.TYPE_COUNT + candidateNameType;
sMinScore[index] = scoreFrom;
sMaxScore[index] = scoreTo;
}
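The flattened indexing can be tried out in isolation. In the sketch below, `TYPE_COUNT` and the two type ordinals are made-up values, and the `getMinScore`/`getMaxScore` lookups are assumed to use the same index formula as `setScoreRange`.

```java
// Hypothetical sketch of the flattened score matrix; values are illustrative.
final class ScoreTableSketch {
    static final int TYPE_COUNT = 4; // assumed; the real value is NameLookupType.TYPE_COUNT
    static final int NAME_EXACT = 0; // assumed ordinal
    static final int NICKNAME = 1;   // assumed ordinal

    private static final int[] sMinScore = new int[TYPE_COUNT * TYPE_COUNT];
    private static final int[] sMaxScore = new int[TYPE_COUNT * TYPE_COUNT];

    // Row = nameType, column = candidateNameType.
    private static int index(int candidateNameType, int nameType) {
        return nameType * TYPE_COUNT + candidateNameType;
    }

    static void setScoreRange(int candidateNameType, int nameType, int from, int to) {
        int index = index(candidateNameType, nameType);
        sMinScore[index] = from;
        sMaxScore[index] = to;
    }

    static int getMinScore(int candidateNameType, int nameType) {
        return sMinScore[index(candidateNameType, nameType)];
    }

    static int getMaxScore(int candidateNameType, int nameType) {
        return sMaxScore[index(candidateNameType, nameType)];
    }
}
```

Java initializes int arrays to 0, so any pair of types that was never registered reports a maximum score of 0, which is the value matchName treats as "nothing to match".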
Next, the collection members that hold MatchScore objects:
private final HashMap<Long, MatchScore> mScores = new HashMap<Long, MatchScore>(); // looks up a MatchScore by contact id
private final ArrayList<MatchScore> mScoreList = new ArrayList<MatchScore>(); // list of MatchScore objects
private int mScoreCount = 0; // number of scores in use; may be smaller than mScoreList.size()
The getMatchingScore method shows how they relate:
private MatchScore getMatchingScore(long contactId) {
MatchScore matchingScore = mScores.get(contactId); // look in the map first
if (matchingScore == null) { // not found: obtain a MatchScore for this contact
if (mScoreList.size() > mScoreCount) { // the list holds more objects than are in use: reuse one and reset it
matchingScore = mScoreList.get(mScoreCount);
matchingScore.reset(contactId);
} else {
matchingScore = new MatchScore(contactId); // create a new object
mScoreList.add(matchingScore);
}
mScoreCount++; // one more MatchScore is now in use
mScores.put(contactId, matchingScore); // store it under this contact id
}
return matchingScore;
}
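This is an object-reuse pattern: the list keeps every MatchScore ever allocated, while the map and the counter track only those currently in use. The sketch below reproduces it with a hypothetical `Score` class. The `clear()` method is an assumption about how the matcher is reset between runs, since that code is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical sketch of the reuse pattern; Score stands in for MatchScore.
final class ScorePoolSketch {
    static final class Score {
        long contactId;
        int primary;

        Score(long contactId) {
            this.contactId = contactId;
        }

        void reset(long contactId) {
            this.contactId = contactId;
            this.primary = 0;
        }
    }

    private final HashMap<Long, Score> mScores = new HashMap<>();
    private final ArrayList<Score> mScoreList = new ArrayList<>();
    private int mScoreCount = 0;

    Score get(long contactId) {
        Score score = mScores.get(contactId);
        if (score == null) {
            if (mScoreList.size() > mScoreCount) {
                // A previously allocated object is idle: recycle it.
                score = mScoreList.get(mScoreCount);
                score.reset(contactId);
            } else {
                score = new Score(contactId);
                mScoreList.add(score);
            }
            mScoreCount++;
            mScores.put(contactId, score);
        }
        return score;
    }

    // Assumed reset: forget the mappings but keep the allocated objects.
    void clear() {
        mScores.clear();
        mScoreCount = 0;
    }

    int inUse() {
        return mScoreCount;
    }

    int allocated() {
        return mScoreList.size();
    }
}
```

After a `clear()`, the next lookups hand out the objects already sitting in the list, so repeated matching runs avoid allocating new objects until more contacts are scored than in any earlier run.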
Methods
public void updateScoreWithPhoneNumberMatch(long contactId) {
updateSecondaryScore(contactId, PHONE_MATCH_SCORE);
}
public void updateScoreWithEmailMatch(long contactId) {
updateSecondaryScore(contactId, EMAIL_MATCH_SCORE);
}
public void updateScoreWithNicknameMatch(long contactId) {
updateSecondaryScore(contactId, NICKNAME_MATCH_SCORE);
}
private void updatePrimaryScore(long contactId, int score) {
getMatchingScore(contactId).updatePrimaryScore(score);
}
private void updateSecondaryScore(long contactId, int score) {
getMatchingScore(contactId).updateSecondaryScore(score);
}
public void keepIn(long contactId) {
getMatchingScore(contactId).keepIn();
}
public void keepOut(long contactId) {
getMatchingScore(contactId).keepOut();
}
The methods above are a set of helpers that update a MatchScore.
public void matchName(long contactId, int candidateNameType, String candidateName,
int nameType, String name, int algorithm) {
int maxScore = getMaxScore(candidateNameType, nameType); // a max score of 0 in the score matrix means there is nothing to match
if (maxScore == 0) {
return;
}
if (candidateName.equals(name)) { // the names match exactly
updatePrimaryScore(contactId, maxScore);
return;
}
if (algorithm == MATCHING_ALGORITHM_EXACT) { // exact algorithm: nothing more to do
return;
}
int minScore = getMinScore(candidateNameType, nameType);
if (minScore == maxScore) { // min and max are equal: nothing more to do
return;
}
final byte[] decodedCandidateName;
final byte[] decodedName;
try {
decodedCandidateName = Hex.decodeHex(candidateName); // decode to bytes for the distance computation
decodedName = Hex.decodeHex(name);
} catch (RuntimeException e) {
// How could this happen?? See bug 6827136
Log.e(TAG, "Failed to decode normalized name. Skipping.", e);
return;
}
NameDistance nameDistance = algorithm == MATCHING_ALGORITHM_CONSERVATIVE ?
mNameDistanceConservative : mNameDistanceApproximate;
int score;
float distance = nameDistance.getDistance(decodedCandidateName, decodedName); // compute the name distance
boolean emailBased = candidateNameType == NameLookupType.EMAIL_BASED_NICKNAME
|| nameType == NameLookupType.EMAIL_BASED_NICKNAME;
float threshold = emailBased
? APPROXIMATE_MATCH_THRESHOLD_FOR_EMAIL
: APPROXIMATE_MATCH_THRESHOLD;
if (distance > threshold) {
score = (int)(minScore + (maxScore - minScore) * (1.0f - distance)); // compute the score
} else {
score = 0;
}
updatePrimaryScore(contactId, score); // update the primary score
}
matchName is the method that computes the name match score.
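The final scoring step of matchName can be isolated and run with concrete numbers. The sketch below copies the threshold check and the interpolation formula from the code above; the class and method names are hypothetical.

```java
// Hypothetical extraction of the last step of matchName.
final class NameScoreSketch {
    static final float APPROXIMATE_MATCH_THRESHOLD = 0.82f;
    static final float APPROXIMATE_MATCH_THRESHOLD_FOR_EMAIL = 0.95f;

    static int approximateScore(int minScore, int maxScore, float distance, boolean emailBased) {
        float threshold = emailBased
                ? APPROXIMATE_MATCH_THRESHOLD_FOR_EMAIL
                : APPROXIMATE_MATCH_THRESHOLD;
        if (distance > threshold) {
            // Same formula as in matchName.
            return (int) (minScore + (maxScore - minScore) * (1.0f - distance));
        }
        return 0;
    }

    public static void main(String[] args) {
        // NAME_COLLATION_KEY vs NAME_COLLATION_KEY has the range 50..80.
        System.out.println(approximateScore(50, 80, 0.85f, false)); // 54
        System.out.println(approximateScore(50, 80, 0.80f, false)); // 0, below 0.82
        System.out.println(approximateScore(50, 80, 0.90f, true));  // 0, below 0.95
    }
}
```

With the range 50..80 and a distance of 0.85, the result is 50 + 30 × 0.15 = 54.5, truncated to 54. As the formula is written, the interpolated part shrinks as the distance approaches 1, so values that pass the threshold land close to minScore.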
public List<Long> prepareSecondaryMatchCandidates(int threshold)
Returns the list of ids of contacts whose secondary score meets the threshold.
public List<MatchScore> pickBestMatches(int threshold)
Returns the list of MatchScore objects that meet the threshold.
public long pickBestMatch(int threshold, boolean allowMultipleMatches)
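Returns the id of the best-matching contact, that is, the one with the highest score. If several contacts share the highest score, the return value depends on allowMultipleMatches.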
RawContactMatcher
packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/RawContactMatcher.java
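Essentially the same as ContactMatcher. The biggest change is that most method parameters now include an account id. This class is used only by ContactAggregator2, while ContactMatcher is used by ContactAggregator. The new aggregation version uses ContactAggregator2. ContactsProvider2 creates one or the other depending on the algorithm version number: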
private void initForDefaultLocale() {
...
PROPERTY_AGGREGATION_ALGORITHM_VERSION = (value == 0)
? AGGREGATION_ALGORITHM_OLD_VERSION
: AGGREGATION_ALGORITHM_NEW_VERSION;
mContactAggregator = (value == 0)
? new ContactAggregator(this, mContactsHelper,
createPhotoPriorityResolver(context), mNameSplitter, mCommonNicknameCache)
: new ContactAggregator2(this, mContactsHelper,
createPhotoPriorityResolver(context), mNameSplitter, mCommonNicknameCache);
...
}
The current value is the new version number, so ContactAggregator2 is used.
RawContactMatchingCandidates
packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/RawContactMatchingCandidates.java
This class simply holds data about the matched contacts and is used in ContactAggregator2:
private List<MatchScore> mBestMatches; // the MatchScore objects
private Set<Long> mRawContactIds = null; // all raw contact ids
private Map<Long, Long> mRawContactToContact = null; // raw contact id to contact id
private Map<Long, Long> mRawContactToAccount = null; // raw contact id to account id
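As an illustration of how such a holder could be filled, the sketch below derives the id set and the two maps from a list of matches. Everything here is hypothetical: the `Match` class stands in for MatchScore, and the maps are built eagerly in the constructor, whereas the real fields start out as null and may be populated differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a candidates holder; not the real RawContactMatchingCandidates.
final class CandidatesSketch {
    static final class Match {
        final long rawContactId;
        final long contactId;
        final long accountId;

        Match(long rawContactId, long contactId, long accountId) {
            this.rawContactId = rawContactId;
            this.contactId = contactId;
            this.accountId = accountId;
        }
    }

    private final List<Match> mBestMatches;
    private final Set<Long> mRawContactIds = new HashSet<>();
    private final Map<Long, Long> mRawContactToContact = new HashMap<>();
    private final Map<Long, Long> mRawContactToAccount = new HashMap<>();

    CandidatesSketch(List<Match> bestMatches) {
        mBestMatches = bestMatches;
        // Index every match by its raw contact id.
        for (Match m : bestMatches) {
            mRawContactIds.add(m.rawContactId);
            mRawContactToContact.put(m.rawContactId, m.contactId);
            mRawContactToAccount.put(m.rawContactId, m.accountId);
        }
    }

    int getCount() {
        return mRawContactIds.size();
    }

    // Returns null when the raw contact id is unknown.
    Long getContactId(long rawContactId) {
        return mRawContactToContact.get(rawContactId);
    }

    Long getAccountId(long rawContactId) {
        return mRawContactToAccount.get(rawContactId);
    }
}
```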
CommonNicknameCache
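packages/providers/ContactsProvider/src/com/android/providers/contacts/aggregation/util/CommonNicknameCache.java
A wrapper around the NICKNAME_LOOKUP table; see the NICKNAME_LOOKUP table in "Contacts Storage: ContactsProvider Table Analysis". For example, "Robert", "Bob" and "Rob" belong to the same CLUSTER, meaning they are treated as the same name in different forms (Western names often have several such forms; Chinese names do not have this problem). Note that this is unrelated to the rows in the data table whose MIME type is Nickname.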
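The cluster idea can be sketched with an in-memory map in place of the NICKNAME_LOOKUP table. The class, its methods and the cluster id string are hypothetical, and the sketch assumes each name belongs to a single cluster, which the real table may not guarantee.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a nickname cluster lookup.
final class NicknameClusterSketch {
    private final Map<String, String> mClusterByName = new HashMap<>();

    // Registers all given names as members of one cluster.
    void addCluster(String clusterId, String... names) {
        for (String name : names) {
            mClusterByName.put(normalize(name), clusterId);
        }
    }

    // Two names are equivalent when both are known and share a cluster.
    boolean sameCluster(String a, String b) {
        String clusterA = mClusterByName.get(normalize(a));
        return clusterA != null && clusterA.equals(mClusterByName.get(normalize(b)));
    }

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

Registering "Robert", "Bob" and "Rob" under one cluster id makes `sameCluster("bob", "ROB")` return true, while a name that was never registered matches nothing.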