Finally, compute the TF-IDF score.
$$\text{TF-IDF}_{i,j} = \text{TF}_{i,j} \times \text{IDF}_{i}$$
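To make the formula concrete, here is a small worked example. It reads i as the term index and j as the document index, and the numbers are made up purely for illustration (they are not taken from any corpus in this article). Suppose term i has TF = 0.25 in document j and IDF = 2.0 over the whole corpus:

$$\text{TF-IDF}_{i,j} = \text{TF}_{i,j} \times \text{IDF}_{i} = 0.25 \times 2.0 = 0.5$$

A term whose IDF is 0 gets a TF-IDF score of 0 in every document, however often it occurs there.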
Code:
import java.util.ArrayList;
import java.util.HashMap;

class TFIDF
{
    // TF-IDF scores per document (term -> score), filled by PrintTFIDF()
    ArrayList<HashMap<String,Double>> TFIDFMainFileList = new ArrayList<HashMap<String,Double>>();
    // TF values per document (term -> TF)
    ArrayList<HashMap<String,Double>> TFMainFileList;
    // Terms of each document
    ArrayList<ArrayList<String>> MainFileList;
    // IDF value of each term over the whole corpus
    HashMap<String,Double> IDFMainFileList;

    public TFIDF(ArrayList<ArrayList<String>> mfl, HashMap<String,Double> idfm, ArrayList<HashMap<String,Double>> tfmfl)
    {
        MainFileList = mfl;
        IDFMainFileList = idfm;
        TFMainFileList = tfmfl;
    }

    // Despite its name, this method prints nothing: it computes and returns the TF-IDF maps.
    public ArrayList<HashMap<String,Double>> PrintTFIDF()
    {
        // Start from an empty list so that a second call does not append duplicates
        TFIDFMainFileList.clear();
        for(int i=0; i<MainFileList.size(); i++)
        {
            ArrayList<String> SubFileList = MainFileList.get(i);
            // TF map of the i-th document
            HashMap<String,Double> tfFile = TFMainFileList.get(i);
            HashMap<String,Double> GetTFIDF = new HashMap<String, Double>();
            for(String term : SubFileList)
            {
                // A term missing from either map counts as 0 instead of
                // throwing a NullPointerException when the Double is unboxed
                double tf = tfFile.getOrDefault(term, 0.0);
                double idf = IDFMainFileList.getOrDefault(term, 0.0);
                GetTFIDF.put(term, tf * idf);
            }
            TFIDFMainFileList.add(GetTFIDF);
        }
        return TFIDFMainFileList;
    }
}
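The multiplication step can also be tried in isolation. The following is a minimal, self-contained sketch rather than the article's own class: the name `TfIdfSketch`, the method `tfidf`, and all sample values are hypothetical, and it assumes that a term without an IDF entry should score 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TfIdfSketch {
    // Multiplies each document's TF values by the corpus-wide IDF of the same term.
    static List<Map<String, Double>> tfidf(List<Map<String, Double>> tfPerDoc,
                                           Map<String, Double> idf) {
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Double> tf : tfPerDoc) {
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Double> entry : tf.entrySet()) {
                // Assumption: a term with no IDF entry scores 0 rather than failing.
                double idfValue = idf.getOrDefault(entry.getKey(), 0.0);
                scores.put(entry.getKey(), entry.getValue() * idfValue);
            }
            result.add(scores);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-picked TF values for two tiny documents (illustrative only).
        Map<String, Double> tfDoc0 = new HashMap<>();
        tfDoc0.put("apple", 0.5);
        tfDoc0.put("banana", 0.5);
        Map<String, Double> tfDoc1 = new HashMap<>();
        tfDoc1.put("apple", 1.0);

        List<Map<String, Double>> tfPerDoc = new ArrayList<>();
        tfPerDoc.add(tfDoc0);
        tfPerDoc.add(tfDoc1);

        // Hand-picked IDF values: "apple" occurs in every document, so its IDF is set to 0.
        Map<String, Double> idf = new HashMap<>();
        idf.put("apple", 0.0);
        idf.put("banana", 2.0);

        // Prints one term -> score map per document.
        System.out.println(tfidf(tfPerDoc, idf));
    }
}
```

Iterating over the entries of each TF map, instead of over the document's word list, visits every distinct term exactly once, so a word that is repeated in a document is not recomputed.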
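This article introduced a text feature extraction method based on the TF-IDF algorithm and walked through its implementation step by step: building the file list, computing term frequency (TF), computing inverse document frequency (IDF), and finally computing the TF-IDF score.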