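A project I worked on recently needed a feature that looks up the home region of a phone number. At first glance this looked simple: the database already existed, so it was just a query. Only during implementation did I find that loading the 5 MB database alone took over a minute, which is far too slow on an Android phone, so I had to find another approach.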
The basic idea of this article is as follows:
1. First group the data, one group per region, for example
1898742 1898743 1898744 :云南曲靖
1894380 1894381 1894382 :吉林松原
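2. Sort the phone numbers so that ranges of numbers can be identified, for example
1894815 --> 1899819 :广东珠海
Once such a range is found, there is no need to store every phone number in it. Storing the range alone is enough,
which saves a great deal of storage space.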
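The merge described in step 2 can be sketched as a single pass over the sorted numbers. This is a minimal illustration rather than the program's exact code: `RangeSketch`, `Range`, and `compress` are names made up for this example, and the region names are romanized stand-ins. It assumes, as `combinedRecords` in the full source does, that a gap between two numbers of the same region belongs to that region too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RangeSketch {
    /** One run of numbers that map to the same region (hypothetical helper type). */
    static final class Range {
        final long start;   // first number of the range
        int count;          // how many numbers the range covers
        final String area;  // region name

        Range(long start, int count, String area) {
            this.start = start;
            this.count = count;
            this.area = area;
        }
    }

    /** Collapses a sorted number-to-region map into ranges. */
    static List<Range> compress(TreeMap<Long, String> sorted) {
        List<Range> out = new ArrayList<Range>();
        for (Map.Entry<Long, String> e : sorted.entrySet()) {
            long num = e.getKey();
            String area = e.getValue();
            Range last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.area.equals(area)) {
                // Same region as the previous range: stretch it up to this number.
                last.count = (int) (num - last.start) + 1;
            } else {
                // Region changed: start a new range at this number.
                out.add(new Range(num, 1, area));
            }
        }
        return out;
    }
}
```

Fed the two sample groups from step 1, `compress` turns six entries into two ranges of three numbers each.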
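3. Design a new storage format. The program in this article stores the data as follows
Offset of the first phone record in the file | Offset of the last phone record in the file |
Region strings (e.g. 辽宁大连,湖北武汉,广东珠海,广东深圳..., separated by commas; in the code below the string is preceded by its length in bytes) | |
First phone record (e.g. 1894815 {first number of the range} 5 {count of consecutive numbers} 2 {index of this region's string within the list of region strings}) | |
Second phone record | |
... | |
Last phone record |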
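To make the record layout concrete, here is a small sketch of how one record fits into 12 bytes: an 8-byte start number followed by a 4-byte word whose high 16 bits hold the count and low 16 bits hold the region index, which is how `Record.write` in the source below packs it. `RecordLayoutSketch`, `encode`, and `decode` are hypothetical names used only here, and `ByteBuffer` stands in for the `RandomAccessFile` calls (both are big-endian).

```java
import java.nio.ByteBuffer;

final class RecordLayoutSketch {
    static final int RECORD_SIZE = 12; // 8-byte long + 4-byte int

    /** Packs one record into its 12-byte on-disk form. */
    static byte[] encode(long base, int count, int areaIndex) {
        // Each of the two fields has only 16 bits available.
        if (count < 0 || count > 0xFFFF || areaIndex < 0 || areaIndex > 0xFFFF) {
            throw new IllegalArgumentException("count and areaIndex must fit in 16 bits");
        }
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE); // big-endian by default
        buf.putLong(base);
        buf.putInt((count << 16) | areaIndex);
        return buf.array();
    }

    /** Unpacks a record into {base, count, areaIndex}. */
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long base = buf.getLong();
        int packed = buf.getInt();
        // The unsigned shift keeps counts of 32768 and above positive.
        return new long[] { base, packed >>> 16, packed & 0xFFFF };
    }
}
```

One consequence of this packing is that a single record can cover at most 65535 numbers and the region table can hold at most 65536 entries.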
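4. Region lookup
Given the phone number entered by the user, a binary search quickly locates the matching record's offset in the file. The region index stored in that record is then looked up in the region table to get the region.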
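The lookup in step 4 can be sketched on an in-memory buffer of fixed-size records. This is an illustration under assumptions, not the program's code: `LookupSketch`, `lookup`, and `build` are invented names, the search works on record indexes instead of file offsets, and the 12-byte layout (8-byte start number, then count in the high 16 bits and region index in the low 16 bits) follows the `Record` class in the source below.

```java
import java.nio.ByteBuffer;

final class LookupSketch {
    static final int RECORD_SIZE = 12; // 8-byte start number + 4-byte packed word

    /** Returns the region index for the number, or -1 if no range contains it. */
    static int lookup(ByteBuffer records, long number) {
        int lo = 0;
        int hi = records.capacity() / RECORD_SIZE - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Absolute reads: the buffer position is not used or changed.
            long base = records.getLong(mid * RECORD_SIZE);
            int packed = records.getInt(mid * RECORD_SIZE + 8);
            int count = packed >>> 16;
            if (number < base) {
                hi = mid - 1;            // target lies before this range
            } else if (number >= base + count) {
                lo = mid + 1;            // target lies after this range
            } else {
                return packed & 0xFFFF;  // inside the range
            }
        }
        return -1;
    }

    /** Builds a buffer from rows of {base, count, areaIndex}, sorted by base. */
    static ByteBuffer build(long[][] rows) {
        ByteBuffer buf = ByteBuffer.allocate(rows.length * RECORD_SIZE);
        for (long[] r : rows) {
            buf.putLong(r[0]);
            buf.putInt(((int) r[1] << 16) | (int) r[2]);
        }
        return buf;
    }
}
```

Because every record has the same size, the position of record `mid` is just `mid * RECORD_SIZE`, so each step of the search needs only one seek and one 12-byte read when the same idea is applied to a file.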
The program's source code is as follows:
package com.carey.tel;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
public class JavaTelArea {
private static JavaTelArea jta = null;
private static final String INDEXDATAFILE = "tel.bin";
private static Hearder cacheHearder = null;
private static AreaCode cacheAreaCode = null;
private static String cacheIndexFilePath = null;
public static void main(String[] args) {
JavaTelArea ctc = JavaTelArea.getInstance();
//ctc.genIndexFile("e:\\", "e:\\telphone.txt");
String indexPath = "e:\\";
long startTime = new Date().getTime();
String searchNum = "13889650920";
String res = ctc.searchTel(indexPath, searchNum);
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
startTime = new Date().getTime();
searchNum = "+8613659867583";
res = ctc.searchTel(indexPath, searchNum);
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
startTime = new Date().getTime();
searchNum = "1301815";
res = ctc.searchTel(indexPath, searchNum);
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
startTime = new Date().getTime();
searchNum = "1301816";
res = ctc.searchTel(indexPath, searchNum);
System.out.println("No prediction");
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
startTime = new Date().getTime();
searchNum = "1301816";
res = ctc.searchTel(indexPath, searchNum, true);
System.out.println("Prediction based on number contiguity");
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
startTime = new Date().getTime();
searchNum = "1301817";
res = ctc.searchTel(indexPath, searchNum);
System.out.println(searchNum + " : " + res);
System.out.println(System.currentTimeMillis() - startTime + "ms");
}
private HashMap<Long, String> generateTestData() {
HashMap<Long, String> telToAreaCode = new HashMap<Long, String>();
telToAreaCode.put(1310944L, "新疆伊犁州");
telToAreaCode.put(1301263L, "新疆伊犁州");
telToAreaCode.put(1301264L, "新疆伊犁州");
telToAreaCode.put(1301260L, "新疆伊犁州");
telToAreaCode.put(955L, "海南");
telToAreaCode.put(1320955L, "海南");
telToAreaCode.put(1320957L, "海南");
telToAreaCode.put(1300561L, "陕西商州");
telToAreaCode.put(1300562L, "陕西商州");
return telToAreaCode;
}
public static synchronized JavaTelArea getInstance() {
if (jta == null) {
jta = new JavaTelArea();
}
return jta;
}
/** * Generate Index File (tel.bin) * */
private void genIndexFile(String indexFilePath, String souceFile) {
ArrayList<String> strs = readFileToList(souceFile);
HashMap<Long, String> telToArea = createTel2AreaHashMap(strs);
writeDate(indexFilePath + INDEXDATAFILE, telToArea);
}
/** * read file content to String array list, every line one string. * */
private ArrayList<String> readFileToList(String filePath) {
final ArrayList<String> strLists = new ArrayList<String>();
BufferedReader bReader = null;
try {
bReader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath)));
String str = bReader.readLine();
while (str != null) {
strLists.add(str);
str = bReader.readLine();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if (bReader != null) {
try {
bReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return strLists;
}
/** * create telephone number to area hash map. * */
HashMap<Long, String> createTel2AreaHashMap(ArrayList<String> strs) {
final HashMap<Long, String> telToArea = new HashMap<Long, String>();
String[] tels = null;
int len = 0;
String strArea = null;
for (String string : strs) {
tels = string.split(" ");
len = tels.length;
strArea = tels[len - 1].substring(1);
for (int i = 0; i < len - 1; i++) {
telToArea.put(Long.valueOf(tels[i]), strArea);
}
}
return telToArea;
}
/** Merge a new record into the previous one when both map to the same area. */
private void combinedRecords(ArrayList<Record> records, Record newRecord) {
int size = records.size();
if (size > 0 && records.get(size - 1).areacodeIndex == newRecord.areacodeIndex) {
// Same area as the previous record: extend it to cover the new record,
// including any gap between the two.
Record lastRecord = records.get(size - 1);
lastRecord.numCnt = (int) (newRecord.baseTelNum - lastRecord.baseTelNum) + newRecord.numCnt;
} else {
records.add(newRecord);
}
}
/** * write index info to file. * */
private void writeDate(String filePath, HashMap<Long, String> telToAreaCode) {
// 1. get all area info
ArrayList<String> tmpAreaCodes = new ArrayList<String>(telToAreaCode.values());
ArrayList<String> strAreaCodes = new ArrayList<String>();
for (String str : tmpAreaCodes) {
if (!strAreaCodes.contains(str)) {
strAreaCodes.add(str);
}
}
tmpAreaCodes.clear();
tmpAreaCodes = null;
// Join the area names with commas; safe even when the list is empty.
StringBuilder sb = new StringBuilder();
for (String str : strAreaCodes) {
if (sb.length() > 0) {
sb.append(',');
}
sb.append(str);
}
AreaCode areaCode = new AreaCode(sb.toString());
areaCode.print();
// 2. Sort HashMap and combined the adjacent telephone number
ArrayList<Long> telNunms = new ArrayList<Long>(telToAreaCode.keySet());
Collections.sort(telNunms);
ArrayList<Record> records = new ArrayList<Record>();
long baseNum = 0;
String baseArea = null;
int numCnt = 0;
for (Long tm : telNunms) {
if (numCnt == 0) {
baseNum = tm;
baseArea = telToAreaCode.get(tm);
numCnt = 1;
} else {
if (tm == baseNum + numCnt && baseArea.equals(telToAreaCode.get(tm))) {
numCnt++;
} else {
combinedRecords(records, new Record(baseNum, numCnt, strAreaCodes.indexOf(baseArea)));
baseNum = tm;
baseArea = telToAreaCode.get(tm);
numCnt = 1;
}
}
}
combinedRecords(records, new Record(baseNum, numCnt, strAreaCodes.indexOf(baseArea)));
// for (Record record : records) {
// record.print();
// }
// 3. Write data to the file
RandomAccessFile raf = null;
try {
raf = new RandomAccessFile(filePath, "rw");
raf.seek(0);
Hearder hearder = new Hearder();
hearder.firstRecordOffset = hearder.Size() + areaCode.Size();
hearder.lastRecordOffset = hearder.firstRecordOffset + (records.size() - 1) * records.get(0).Size();
hearder.print();
hearder.write(raf);
areaCode.write(raf);
for (Record record : records) {
record.write(raf);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
class Hearder {
int firstRecordOffset;
int lastRecordOffset;
public int Size() {
return (Integer.SIZE + Integer.SIZE) / Byte.SIZE;
}
public void write(RandomAccessFile raf) throws IOException {
raf.writeInt(this.firstRecordOffset);
raf.writeInt(this.lastRecordOffset);
}
public void read(RandomAccessFile raf) throws IOException {
this.firstRecordOffset = raf.readInt();
this.lastRecordOffset = raf.readInt();
}
public void print() {
System.out.println("===== Hearder ===== ");
System.out.println("[" + firstRecordOffset + " , " + lastRecordOffset + "]");
}
}
class AreaCode {
private String areacode;
private String[] codes;
public AreaCode() {
this("");
}
public AreaCode(String areacode) {
this.areacode = areacode;
this.codes = null;
}
public int Size() {
return areacode.getBytes().length + (Integer.SIZE / Byte.SIZE);
}
public void print() {
System.out.println("===== AreaCode ===== ");
System.out.println("[" + areacode.getBytes().length + "]" + areacode);
}
public void write(RandomAccessFile raf) throws IOException {
raf.writeInt(areacode.getBytes().length);
raf.write(this.areacode.getBytes());
}
public void read(RandomAccessFile raf) throws IOException {
byte[] bytes = new byte[raf.readInt()];
// readFully blocks until the whole buffer is filled; read() may return early.
raf.readFully(bytes);
this.areacode = new String(bytes);
}
public String getCodeByIndex(int index) {
if (this.codes == null) {
this.codes = this.areacode.split(",");
}
return (index < 0 || this.codes == null || index >= this.codes.length) ? null : this.codes[index];
}
}
class Record {
long baseTelNum;
int numCnt;
int areacodeIndex;
public Record() {
this(0, 0, 0);
}
public Record(long baseTelNum, int numCnt, int areacodeIndex) {
this.baseTelNum = baseTelNum;
this.numCnt = numCnt;
this.areacodeIndex = areacodeIndex;
}
public void print() {
System.out.println("===== Record ===== ");
System.out.println("<" + baseTelNum + "> <" + numCnt + "> <" + areacodeIndex + ">");
}
public int Size() {
return (Long.SIZE + Integer.SIZE) / Byte.SIZE;
}
public void write(RandomAccessFile raf) throws IOException {
raf.writeLong(this.baseTelNum);
int tmp = this.numCnt << 16;
tmp += 0xFFFF & this.areacodeIndex;
raf.writeInt(tmp);
}
public void read(RandomAccessFile raf) throws IOException {
this.baseTelNum = raf.readLong();
int tmp = raf.readInt();
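// Unsigned shift: numCnt sits in the high 16 bits, so counts of 32768 or more must not be sign-extended.
this.numCnt = tmp >>> 16;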
this.areacodeIndex = 0xFFFF & tmp;
}
public int inWhich(long telNum) {
if (telNum < this.baseTelNum) {
return -1;
} else if (telNum >= this.baseTelNum + this.numCnt) {
return 1;
} else {
return 0;
}
}
}
public String searchTel(String indexFilePath, String telNum) {
return searchTel(indexFilePath, telNum, false);
}
/** * search * */
public String searchTel(String indexFilePath, String telNum, boolean forecast) {
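StringBuffer sb = new StringBuffer(telNum);
// strip a leading '+'
if (sb.length() > 0 && sb.charAt(0) == '+') {
sb.deleteCharAt(0);
}
// strip the country code 86
if (sb.length() > 2 && sb.charAt(0) == '8' && sb.charAt(1) == '6') {
sb.delete(0, 2);
}
// nothing left to look up
if (sb.length() == 0) {
return null;
}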
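// starts with 0: an area code
if (sb.charAt(0) == '0') {
sb.deleteCharAt(0);
// try the 4-digit area code first (3 digits once the leading 0 is removed);
// if nothing is found, fall back to the 3-digit area code (2 digits)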
if (sb.length() >= 3) {
sb.delete(3, sb.length());
}
String dial = searchTel(indexFilePath, Long.valueOf(sb.toString()),false);
if (dial != null) {
return dial;
}
if (sb.length() >= 2) {
sb.delete(2, sb.length());
}
}
// starts with 1: a mobile number or a service number
else if (sb.charAt(0) == '1') {
// query as a mobile number first; if nothing is found, query as a special number
if (sb.length() > 7) {
String dial = searchTel(indexFilePath, Long.valueOf(sb.substring(0, 8)),false);
if (dial != null) {
return dial;
}
dial = searchTel(indexFilePath, Long.valueOf(sb.toString()),false);
if (dial != null) {
return dial;
}
// only the first 7 digits are needed; drop the rest
if (sb.length() > 7) {
sb.delete(7, sb.length());
}
} else {
// 7 digits or fewer: most likely a service number
//do nothing.
}
}
// starts with some other digit: no idea what kind of number this is
else {
//do nothing.
}
return searchTel(indexFilePath, Long.valueOf(sb.toString()), forecast);
}
private String searchTel(String indexFilePath, long telNum, boolean forecast) {
RandomAccessFile raf = null;
try {
raf = new RandomAccessFile(indexFilePath + INDEXDATAFILE, "r");
if (cacheIndexFilePath == null || !cacheIndexFilePath.equals(indexFilePath)) {
cacheIndexFilePath = indexFilePath;
cacheHearder = new Hearder();
cacheHearder.read(raf);
cacheHearder.print();
cacheAreaCode = new AreaCode();
cacheAreaCode.read(raf);
cacheAreaCode.print();
}
int index = lookUP(raf, cacheHearder.firstRecordOffset, cacheHearder.lastRecordOffset, telNum, forecast);
return cacheAreaCode.getCodeByIndex(index);
} catch (Exception e) {
e.printStackTrace();
} finally {
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return null;
}
private int lookUP(RandomAccessFile raf, long startpos, long endpos, long looknum, boolean forecast) throws IOException {
Record record = new Record();
long seekpos = 0;
do {
seekpos = startpos + (endpos - startpos) / record.Size() / 2 * record.Size();
raf.seek(seekpos);
record.read(raf);
if (record.inWhich(looknum) > 0) {
startpos = seekpos + record.Size();
} else if (record.inWhich(looknum) < 0) {
endpos = seekpos - record.Size();
} else {
return record.areacodeIndex;
}
} while (startpos <= endpos);
if (forecast) {
return record.areacodeIndex;
} else {
return -1;
}
}
}
The program's output is as follows:
===== Hearder =====
[4554 , 605622]
===== AreaCode =====
[4542]北福建南平,福建三明,海果洛,青海海南,...
13889650920 : 辽宁大连
20ms
+8613659867583 : 湖北武汉
2ms
1301815 : 四川泸州
2ms
No prediction
1301816 : null
2ms
Prediction based on number contiguity
1301816 : 四川泸州
1ms
1301817 : 四川宜宾
2ms
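As the output shows, apart from the first query, which takes about 20 ms because the index file has to be loaded, later queries take about 1 to 2 ms each, which is very fast.
The test data file for this program can be downloaded here: telinfo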