Introduction
Gensim = “Generate Similar”
Gensim started off in 2008 as a collection of Python scripts for the Czech Digital Mathematics Library (dml.cz), where it served to generate a short list of the articles most similar to a given article.
I also wanted to try these fancy “Latent Semantic Methods”, but the libraries that implemented the necessary computation were not much fun to work with.
Naturally, I set out to reinvent the wheel. Our 2010 LREC publication describes the initial design decisions behind Gensim: clarity, efficiency and scalability. It is fairly representative of how Gensim works even today.