Topic modeling is used to analyze clusters of "topics," or co-occurring words, in a text or series of texts, often with the aim of understanding recurring themes.
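To make "co-occurring words" concrete, the following is a minimal sketch in plain Python that counts how often pairs of words appear in the same document. The function name `cooccurrence_counts` and the four toy documents are invented for illustration. None of the tools listed below is claimed to work this way internally; real topic models build on richer statistics than raw pair counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, the number of documents containing both words."""
    counts = Counter()
    for doc in docs:
        # Use each distinct word once per document; sorting gives every pair
        # a stable (alphabetical) order so ("sea", "ship") is always the key.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Toy corpus (hypothetical): two "seafaring" texts and two "garden" texts.
docs = [
    "whale ship sea captain",
    "ship sea storm",
    "garden rose tea",
    "tea garden party",
]

pairs = cooccurrence_counts(docs)
for pair, n in pairs.most_common(5):
    print(pair, n)
```

Pairs such as ("sea", "ship") and ("garden", "tea") occur together in more than one document, while a cross-theme pair such as ("rose", "ship") never does. Topic models exploit this kind of signal to group words into themes.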
Tools
Out-of-the-Box
MALLET
For statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text
Topic Modeling Tool
For Latent Dirichlet Allocation (LDA) topic modeling
Factorie
For natural language processing and information integration such as segmentation, tokenization, part-of-speech tagging, named entity recognition, dependency parsing, mention finding, coreference, lexicon-matching, and latent Dirichlet allocation
jsLDA
For in-browser topic modeling
Programmatic
Gensim
For latent semantic analysis (LSA, LSI, SVD), unsupervised topic modeling (Latent Dirichlet allocation; LDA), embeddings (fastText, word2vec, doc2vec), non-negative matrix factorization (NMF), and term frequency–inverse document frequency (tf-idf)
NLTK (Natural Language Toolkit)
For accessing corpora and lexicons, tokenization, stemming, (part-of-speech) tagging, parsing, transformations, translation, chunking, collocations, classification, clustering, topic segmentation, concordancing, frequency distributions, sentiment analysis, named entity recognition, probability distributions, semantic reasoning, evaluation metrics, manipulating linguistic data (in SIL Toolbox format), language modeling, and other NLP tasks
spaCy
For tokenization, named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking, and more
scikit-learn
For classification, regression, clustering, dimensionality reduction, model selection, and preprocessing
NLP Architect
For word chunking, named entity recognition, dependency parsing, intent extraction, sentiment classification, language models, transformations, Aspect Based Sentiment Analysis (ABSA), joint intent detection and slot tagging, noun phrase embedding representation (NP2Vec), most common word sense detection, relation identification, cross document coreference, noun phrase semantic segmentation, term set expansion, topics and trend analysis, and optimizing NLP/NLU models
Top2Vec
For topic modeling, semantic search, and word and document embeddings
tidytext
For converting to and from non-tidy formats, word and document frequency analysis (tf-idf), n-grams and correlations, sentiment analysis with tidy data, and topic modeling
topicmodels
For Latent Dirichlet Allocation (LDA) models and Correlated Topic Models (CTM); provides an interface to the C code by David M. Blei and co-authors and to the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors
BTM
For identifying topics in texts from term-term co-occurrences (hence "biterm" topic model, BTM)
topicdoc
For topic-specific diagnostics for LDA and CTM topic models, to assist in evaluating topic quality
lda
For Latent Dirichlet Allocation and related models, similar to LSA and topic models
stm (Structural Topic Model)
For implementing a topic model derivative that can include document-level metadata; also includes tools for model selection, visualization, and estimation of topic-covariate regressions
text2vec
For text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities
mscstexta4r
For sentiment analysis, topic detection, language detection, and key phrase extraction; provides an interface to the Microsoft Cognitive Services Text Analytics API
Weka
For data preprocessing (e.g., stemming, data resampling, transformation), classification, regression, clustering, latent semantic analysis (LSA, LSI), association rules, visualization, filtering, and anonymization
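Many of the tools above fit Latent Dirichlet Allocation (LDA) models, in some cases by Gibbs sampling. As a rough illustration of what such a fit involves, here is a toy collapsed Gibbs sampler in plain Python. It is a sketch only and is not the implementation used by any package listed here. The function name `fit_lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`, `iterations`), and the toy corpus are all assumptions made for this example.

```python
import random
from collections import Counter


def fit_lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA; `docs` is a list of token lists.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []                                     # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Hypothetical toy corpus, already tokenized.
docs = [
    "ship sea whale captain sea".split(),
    "sea storm ship sail".split(),
    "garden rose tea party".split(),
    "tea garden rose bloom".split(),
]

doc_topic, topic_word = fit_lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
```

The sampler is stochastic, so the topics it prints depend on the seed and on the number of sweeps. For real corpora, the libraries listed above are a far better choice, since they handle preprocessing, convergence, and scale.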