plda
largetalk
A tech geek who enjoys coding, a Python fan who loves C and Linux, and is looking for someone to travel with.
Articles in this column
plda source code (1)
common.h mainly defines three classes: TopicDistribution, DocumentWordTopicsPB, and Random.
    // A dense vector of counts used for storing topic counts.
    // No memory allocation here, just keep pointers.
    template <class T>c...
Original 2018-12-14 15:37:27 · 456 views · 0 comments
plda source code (10)
Sparse LDA. The standard Gibbs sampling formula is:
$$\begin{aligned} q(z) &= \frac{n^{t}_{k,\neg i} + \beta}{n_{k,\neg i} + \beta V}\,(n^{k}_{m,\neg i} + \alpha_k) \end{aligned}$$
...
Original 2019-01-15 16:39:03 · 465 views · 0 comments
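plda source code (8)
sampler.h: we finally reach the most important part. LDA cannot compute the topic distributions of words and documents in closed form; instead it repeatedly resamples the topic of each word from its conditional distribution given all other assignments, which is Gibbs sampling.
    // LDASampler trains LDA models and computes statistics about documents in
    // LDA models.
    class LDASam...
Original 2019-01-09 17:53:32 · 362 views · 0 comments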
plda source code (12)
plda source code (11) LightLDA. Gibbs sampling:
$$p(z_{di}=k \mid \text{rest}) \propto \frac{(n^{-di}_{kd}+\alpha_k)(n^{-di}_{kw}+\beta_w)}{n^{-di}_{k}+\beta}$$
...
Original 2019-02-01 15:56:48 · 453 views · 0 comments
plda source code (7)
FastMatrix: vals and mapped_vec
    class FastMatrix {
     public:
      struct FElem {
        int col;
        double val;
      };
      class ElemIter {  // row iterator
       public:
        ElemIter(FElem * ptr, int size) ...
Original 2019-01-07 18:08:37 · 370 views · 0 comments
plda source code (6)
LDAModel only adds the IncrementTopic and ReassignTopic functions.
    class LDAModel : public ModelBase<int32>
    void LDAModel::IncrementTopic(int word, int topic, int32 count) {
      CHECK_GT(num_topics(), topic)...
Original 2019-01-07 15:21:55 · 420 views · 0 comments
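To make the role of IncrementTopic and ReassignTopic more concrete, here is a minimal sketch of how such count updates might look. `TinyLDAModel` is a simplified stand-in written for this example, not the plda class. The storage layout, the accessor names, and in particular the four-argument `ReassignTopic` signature are assumptions, since the excerpt is truncated before showing them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for an LDA model (hypothetical, not the plda class).
// It keeps one topic-count vector per word plus a global topic-count vector,
// which is the sum of the per-word vectors.
class TinyLDAModel {
 public:
  TinyLDAModel(int num_words, int num_topics)
      : word_topic_(num_words, std::vector<int32_t>(num_topics, 0)),
        global_(num_topics, 0) {}

  int num_topics() const { return static_cast<int>(global_.size()); }

  // Adds `count` to both the per-word count and the global count of `topic`.
  void IncrementTopic(int word, int topic, int32_t count) {
    assert(topic >= 0 && topic < num_topics());
    word_topic_[word][topic] += count;
    global_[topic] += count;
  }

  // Assumed semantics: moves `count` occurrences of `word` from `old_topic`
  // to `new_topic`, keeping the global vector consistent.
  void ReassignTopic(int word, int old_topic, int new_topic, int32_t count) {
    IncrementTopic(word, old_topic, -count);
    IncrementTopic(word, new_topic, count);
  }

  int32_t WordTopicCount(int word, int topic) const {
    return word_topic_[word][topic];
  }
  int32_t GlobalTopicCount(int topic) const { return global_[topic]; }

 private:
  std::vector<std::vector<int32_t>> word_topic_;
  std::vector<int32_t> global_;
};
```

Routing every update through one function keeps the per-word and global counts in sync, which a Gibbs sampler relies on when it removes a word's current topic and adds the newly sampled one.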
plda source code (5)
model_base.h stores the topic distributions of all words.
    // The ModelBase class stores topic-word co-occurrence count vectors as
    // well as a vector of global topic occurrence counts. The global vector is
    // the sum ...
Original 2019-01-07 12:52:54 · 360 views · 0 comments
plda source code (9)
BaseSampler abstracts the sampler, adding word similarity and providing an interface for new sampling methods.
    class BaseSampler {
     public:
      BaseSampler(double alpha, double beta, LDAModel* model, LDAAccumulative...
Original 2019-01-10 16:49:34 · 362 views · 0 comments
plda source code (4)
corpus.h
    typedef std::list<LDADocument*> LDACorpus;
    // Stores multiple documents and manages the memory pool of
    // the topic distributions.
    class LDACorpusManager {
     public:
      LDACorpusManage...
Original 2018-12-19 00:10:01 · 418 views · 0 comments
plda source code (3)
document.h
    class DocumentWordTopicsPB;
    // Stores a document as a bag of words and provides methods for interacting
    // with Gibbs LDA models.
    class LDADocument {  // class that stores the topic distribution of one document
     public:
      // An ite...
Original 2018-12-18 16:59:37 · 333 views · 0 comments
plda source code (2)
vocabulary.cc stores the mapping from words to IDs.
    class Vocabulary {
     public:
      int GetOrCreateID(string word, bool &created);
      bool GetID(string word, int &id) const;
      bool GetWordByID(int id, string &w...
Original 2018-12-17 15:38:43 · 388 views · 0 comments
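One plausible way to implement a word-to-ID mapping with these three methods is a hash map for the forward direction and a vector for the reverse direction. The sketch below uses a hypothetical `TinyVocabulary` class. The choice of containers, the dense 0-based IDs, and the meaning of the return values are assumptions, since the excerpt shows only the declarations.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a word <-> ID mapping (not the plda Vocabulary).
// IDs are assumed to be dense and assigned in order of first appearance.
class TinyVocabulary {
 public:
  // Returns the ID of `word`, creating a new one if the word is unseen.
  // `created` reports whether a new ID was assigned.
  int GetOrCreateID(const std::string& word, bool& created) {
    auto it = word_to_id_.find(word);
    if (it != word_to_id_.end()) {
      created = false;
      return it->second;
    }
    const int id = static_cast<int>(id_to_word_.size());
    word_to_id_.emplace(word, id);
    id_to_word_.push_back(word);
    created = true;
    return id;
  }

  // Looks up an existing word; returns false if it is unknown.
  bool GetID(const std::string& word, int& id) const {
    auto it = word_to_id_.find(word);
    if (it == word_to_id_.end()) return false;
    id = it->second;
    return true;
  }

  // Looks up the word for an ID; returns false if the ID is out of range.
  bool GetWordByID(int id, std::string& word) const {
    if (id < 0 || id >= static_cast<int>(id_to_word_.size())) return false;
    word = id_to_word_[id];
    return true;
  }

 private:
  std::unordered_map<std::string, int> word_to_id_;
  std::vector<std::string> id_to_word_;
};
```

Dense IDs let the model index its per-word count vectors directly by ID, without another lookup.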
plda source code (11)
    class VoseAlias {
    public:
      unsigned short n;  // Dimension
      double wsum;  // Sum of proportions
      std::vector<std::pair<double, unsigned short>> table;  // Alias probabilities and i...
Original 2019-01-30 10:42:11 · 396 views · 0 comments
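The fields in this excerpt match the usual layout of Vose's alias method, which allows sampling from a discrete distribution in O(1) after an O(n) setup. The sketch below is a generic version of that method. The free functions `BuildAliasTable` and `SampleAlias`, and the caller-supplied uniform numbers, are assumptions for illustration and do not reproduce the truncated class.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each entry holds (probability of keeping this column's own index,
// alias index to return otherwise).
using AliasTable = std::vector<std::pair<double, unsigned short>>;

// Hypothetical helper: builds an alias table from non-negative weights that
// need not be normalized. Assumes weights.size() fits in unsigned short.
AliasTable BuildAliasTable(const std::vector<double>& weights) {
  const std::size_t n = weights.size();
  double wsum = 0.0;
  for (double w : weights) wsum += w;

  AliasTable table(n, std::pair<double, unsigned short>(1.0, 0));
  std::vector<double> scaled(n);
  std::vector<std::size_t> small, large;
  for (std::size_t i = 0; i < n; ++i) {
    table[i].second = static_cast<unsigned short>(i);  // default: alias to self
    scaled[i] = weights[i] * static_cast<double>(n) / wsum;
    (scaled[i] < 1.0 ? small : large).push_back(i);
  }
  // Pair each under-full column with an over-full one until one list is empty.
  while (!small.empty() && !large.empty()) {
    const std::size_t s = small.back();
    small.pop_back();
    const std::size_t l = large.back();
    large.pop_back();
    table[s].first = scaled[s];
    table[s].second = static_cast<unsigned short>(l);
    scaled[l] = (scaled[l] + scaled[s]) - 1.0;
    (scaled[l] < 1.0 ? small : large).push_back(l);
  }
  // Columns left over keep probability 1.0 and alias to themselves.
  return table;
}

// Hypothetical helper: u_col picks a column and u_flip decides between the
// column's own index and its alias; both are uniform in [0, 1).
std::size_t SampleAlias(const AliasTable& table, double u_col, double u_flip) {
  std::size_t i = static_cast<std::size_t>(u_col * table.size());
  if (i >= table.size()) i = table.size() - 1;
  return u_flip < table[i].first ? i : table[i].second;
}
```

For weights {1, 3}, column 0 keeps index 0 with probability 0.5 and otherwise returns 1, while column 1 always returns 1, so index 0 is drawn with overall probability 0.25.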