What is a Structural Topic Model (STM)?

STM is a framework for topic modeling that allows document-level metadata, such as treatment groups or demographics, to influence topic prevalence and content directly within the model.

How does this skill help choose the number of topics?

It provides a workflow for evaluating candidate models across multiple diagnostics including semantic coherence, exclusivity (FREX), held-out likelihood, and residuals.
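To make one of these diagnostics concrete, the following is a minimal sketch of a co-occurrence-based semantic coherence score in the style commonly used for topic models. It is an illustration only, not the skill's own code: the function name `semantic_coherence`, its arguments, and the toy corpus in the usage note are all assumptions introduced here.

```python
import math


def semantic_coherence(top_words, docs):
    """Co-occurrence coherence for one topic (hypothetical helper).

    top_words: the topic's top words, ordered from most to least probable.
    docs: iterable of token lists, one per document.
    Scores closer to zero mean the top words tend to appear together.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            base = doc_freq(top_words[l])
            if base == 0:
                # Assumption: top words come from a model fit on `docs`,
                # so each should occur in at least one document.
                raise ValueError(f"{top_words[l]!r} does not occur in docs")
            joint = doc_freq(top_words[m], top_words[l])
            # The +1 smoothing avoids log(0) when two words never co-occur.
            score += math.log((joint + 1) / base)
    return score
```

On a toy corpus such as `[["tax", "cut", "economy"], ["tax", "economy"], ["health", "care"], ["health", "tax"]]`, the pair `["tax", "economy"]` scores higher than `["tax", "care"]`, because the first pair co-occurs in documents and the second never does. In practice this score would be compared across candidate values of K alongside exclusivity, held-out likelihood, and residuals rather than used alone.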

Does this skill assist with preprocessing decisions?

Yes. It covers the substantive impact of stemming, stopword removal, and term-frequency thresholds, helping you justify these choices for your specific domain.
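As a rough illustration of why these choices matter, the sketch below removes stopwords and then drops terms that appear in fewer than a minimum number of documents. The function name `prune_vocabulary`, the parameter `min_doc_count`, and the example stopword set are hypothetical and chosen for this example; they are not taken from the skill or from any particular library.

```python
from collections import Counter


def prune_vocabulary(docs, stopwords=frozenset(), min_doc_count=2):
    """Remove stopwords and rare terms (hypothetical helper).

    docs: list of token lists, one per response.
    Returns the pruned documents, the kept vocabulary, and the indices of
    documents left empty by pruning.
    """
    # Lowercase tokens and drop stopwords.
    cleaned = [
        [tok.lower() for tok in doc if tok.lower() not in stopwords]
        for doc in docs
    ]
    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter(term for doc in cleaned for term in set(doc))
    keep = {term for term, n in doc_freq.items() if n >= min_doc_count}
    pruned = [[tok for tok in doc if tok in keep] for doc in cleaned]
    # Short survey answers can lose every token; report them so the
    # matching metadata rows can be handled deliberately.
    emptied = [i for i, doc in enumerate(pruned) if not doc]
    return pruned, sorted(keep), emptied
```

Calling `prune_vocabulary(docs, stopwords={"the", "is"}, min_doc_count=2)` and inspecting the third return value shows which responses were emptied by the threshold. With short open-ended answers, even a low threshold can remove whole responses, which is one reason the threshold deserves a stated justification.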

Can I use this for BERTopic or LDA?

Yes. The skill includes criteria for when traditional LDA or an embedding-based method such as BERTopic is a better fit than STM, for example with short texts or multilingual corpora.

Topic Modeling for Survey Data

Name: Topic Modeling for Survey Data
Author: scdenney

Data Science & ML

Guides the specification, validation, and reporting of Structural Topic Models (STM) for survey and experimental text data.

This skill provides a methodological framework for topic modeling that follows social science standards. It helps users choose between Structural Topic Models (STM), LDA, and BERTopic, and gives technical guidance on preprocessing text, selecting the number of topics using diagnostic metrics, and estimating the effects of metadata covariates on topic prevalence. Whether you are analyzing open-ended survey responses or experimental corpora, the skill is designed to keep your analysis reproducible, validated against treatment groups, and reported according to academic best practices.

Key Features

01 15 GitHub stars

02 Methodological guidance on text preprocessing and frequency thresholds

03 Standardized reporting templates for DA-RT compliant research

04 Validation techniques including permutation tests and FREX word analysis

05 Multi-metric diagnostic evaluation for selecting topic counts (K)

06 Structural Topic Model (STM) specification with metadata covariates

Use Cases

01 Analyzing open-ended survey responses with respondent demographic metadata

02 Developing reproducible text analysis pipelines for academic publications

03 Estimating the effect of experimental treatments on text discussion topics
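
For the treatment-effect use case, a permutation test is one way to check that a difference in topic prevalence between groups is not an artifact of chance. The sketch below is a deliberately simplified stand-in: it takes per-document proportions for a single topic as given and shuffles the treatment labels, whereas a fuller procedure would refit the topic model for each permuted assignment. The function name `permutation_test` and all parameters are hypothetical.

```python
import random


def permutation_test(proportions, treated, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean topic proportion.

    proportions: per-document proportion of one topic (assumed already estimated).
    treated: per-document 0/1 treatment indicator, same length as proportions.
    Both groups must contain at least one document.
    """
    def mean_diff(labels):
        t = [p for p, z in zip(proportions, labels) if z]
        c = [p for p, z in zip(proportions, labels) if not z]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = mean_diff(treated)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    labels = list(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling keeps group sizes fixed while breaking any real link
        # between treatment and topic proportion.
        rng.shuffle(labels)
        if abs(mean_diff(labels)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)
```

The function returns the observed difference in means and the share of shuffled assignments that produce a difference at least as large. A small share suggests the treatment, rather than chance, is associated with how much respondents discuss the topic.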
