For Bayesian diagnostic classification models
W. Jake Thompson, Ph.D.
The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D210045 to the University of Kansas. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Absolute fit: How well does a model fit the data?
Relative fit: How well does a model fit compared to another model?
Different methods available depending on how the model was estimated (e.g., maximum likelihood, MCMC)
Categorical response data create sparse contingency tables of response patterns
Limited-information indices use lower-order summaries of the contingency tables (Maydeu-Olivares & Joe, 2005)
The most popular method for assessing model fit in DCMs is the M2 statistic (Hansen et al., 2016; Liu et al., 2016)
Bayesian methods are not constrained to limited-information indices
Posterior predictive model checks (PPMCs)
In this study, we examine a PPMC of the raw score distribution (Park et al., 2015; Thompson, 2019)
Information criteria such as the AIC (Akaike, 1973), BIC (Schwarz, 1978), or similar
Compare the information criteria for each competing model
These methods are often inappropriate when using a Bayesian estimation process (e.g., MCMC; Hollenbach & Montgomery, 2020)
Information criteria that are designed for Bayesian estimation methods
Leave-one-out (LOO) cross validation with Pareto-smoothed importance sampling (Vehtari et al., 2017)
As with more traditional methods, we compare the LOO for each competing model
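A minimal sketch of this comparison in R, assuming `ll_dina` and `ll_lcdm` are hypothetical posterior log-likelihood matrices (draws × observations) extracted from two fitted models:

```r
library(loo)

# PSIS-LOO for each competing model
loo_dina <- loo(ll_dina)
loo_lcdm <- loo(ll_lcdm)

# Models are listed from best to worst expected predictive accuracy
loo_compare(loo_dina, loo_lcdm)
```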
Assessment of model fit is primarily limited to methods that rely on point estimates (e.g., M2, AIC, BIC)
Research has not compared the efficacy of Bayesian measures of model fit to the more commonly used measures
Recent software advances have made Bayesian estimation of DCMs more accessible to applied researchers
Simulation study to evaluate the efficacy of Bayesian measures of model fit
Research questions:
Preprint
| Generating model | Estimated model | Absolute-fit flag | Relative-fit preference |
|------------------|-----------------|-------------------|-------------------------|
| DINA             | DINA            | No                | DINA                    |
| DINA             | LCDM            | No                | DINA                    |
| LCDM             | DINA            | Yes               | LCDM                    |
| LCDM             | LCDM            | No                | LCDM                    |
Bayesian methods performed as well as or better than existing methods for absolute fit
For relative fit, Bayesian methods performed as well as or better than what has been reported for non-Bayesian information criteria
Future directions
For each posterior iteration, calculate the total number of respondents at each score point in the replicated data
Calculate the expected number of respondents at each score point (i.e., the average across iterations)
Calculate the observed number of respondents at each score point (see the sketch below)
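A minimal R sketch of these counts, assuming `post_rep` is a hypothetical list of replicated response matrices (one per posterior iteration) and `y` is the observed respondent-by-item matrix:

```r
# Raw score distribution for one response matrix: counts at each
# score point 0, 1, ..., J (where J is the number of items)
raw_scores <- function(resp) {
  tabulate(rowSums(resp) + 1, nbins = ncol(resp) + 1)
}

n_rep <- t(sapply(post_rep, raw_scores))  # iterations x score points
e_n   <- colMeans(n_rep)                  # expected count per score point
n_obs <- raw_scores(y)                    # observed count per score point
```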
\[ \chi^2_{rep} = \sum_{s=0}^S\frac{[n_s - E(n_s)]^2}{E(n_s)} \]
#> [1] 25.26204
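A minimal sketch of this calculation for a single replication, reusing the hypothetical objects from the counts sketch above:

```r
# Chi-square discrepancy between one set of counts and the expectation
chisq_stat <- function(n_s, e_n) {
  sum((n_s - e_n)^2 / e_n)
}

# Statistic for the first replicated data set
chisq_stat(n_rep[1, ], e_n)
```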
For each replication, calculate a $\chi^2_{rep}$ statistic
Create a distribution of the expected value of the $\chi^2$ statistic
Calculate the $\chi^2$ value comparing the observed data to the expectation
Calculate the proportion of $\chi^2_{rep}$ draws greater than our observed value
Flag misfit if the posterior predictive p-value (ppp) falls outside a predefined boundary (e.g., 0.025 < ppp < 0.975)
In our example, ppp = 0.856 (see the sketch below)
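A minimal sketch of the ppp calculation, again using the hypothetical objects defined in the sketches above:

```r
# Chi-square statistic for every replication and for the observed data
chisq_reps <- apply(n_rep, 1, chisq_stat, e_n = e_n)
chisq_obs  <- chisq_stat(n_obs, e_n)

# Proportion of replicated statistics greater than the observed value;
# the poster's example yields ppp = 0.856
ppp <- mean(chisq_reps > chisq_obs)
```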