Title: | The Q-Matrix Validation Methods Framework |
---|---|
Description: | Provide a variety of Q-matrix validation methods for generalized cognitive diagnosis models, including the method based on the generalized deterministic input, noisy, and gate (G-DINA; de la Torre, 2011) <DOI:10.1007/s11336-011-9207-7> model discrimination index (the GDI method) by de la Torre and Chiu (2016) <DOI:10.1007/s11336-015-9467-8>, the stepwise Wald test method (the Wald method) by Ma and de la Torre (2020) <DOI:10.1111/bmsp.12156>, the Hull method by Najera et al. (2021) <DOI:10.1111/bmsp.12228>, the multiple logistic regression-based Q-matrix validation method (the MLR-B method) by Tu et al. (2022) <DOI:10.3758/s13428-022-01880-x>, the beta method based on signal detection theory by Li and Chen (2024) <DOI:10.1111/bmsp.12371>, and Q-matrix validation based on relative fit indices by Chen et al. (2013) <DOI:10.1111/j.1745-3984.2012.00185.x>. Various search methods and iterative procedures are available during Q-matrix validation. |
Authors: | Haijiang Qin [aut, cre, cph]
|
Maintainer: | Haijiang Qin <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2025-03-08 05:43:53 UTC |
Source: | https://github.com/cran/Qval |
A function to estimate parameters of cognitive diagnosis models by the MMLE/EM (de la Torre, 2009; de la Torre, 2011)
or MMLE/BM (Ma & Jiang, 2020) algorithm. The function imports various functions from the GDINA
package to perform and extend parameter estimation for cognitive diagnosis models (CDMs). The CDM
function accomplishes parameter estimation for the most commonly used models (e.g., GDINA, DINA, DINO,
ACDM, LLM, or rRUM). Furthermore, it incorporates Bayes modal estimation
(BM; Ma & Jiang, 2020) to obtain more reliable estimates, especially with small sample sizes.
Monotonic constraints can also be imposed.
CDM( Y, Q, model = "GDINA", method = "EM", mono.constraint = TRUE, maxitr = 2000, verbose = 1 )
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
model |
Type of model to be fitted; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
method |
Type of method used to estimate the CDM's parameters; one out of "EM" or "BM". Default = "EM". |
mono.constraint |
Logical indicating whether monotonicity constraints should be fulfilled in estimation.
Default = TRUE. |
maxitr |
A vector for each item or nonzero category, or a scalar applied to all items,
specifying the maximum number of EM or BM cycles allowed. Default = 2000. |
verbose |
Can be 0, 1, or 2, indicating how much estimation progress is printed. Default = 1. |
CDMs are statistical models that fully integrate cognitive structure variables; they define examinees' response probabilities on items by assuming a mechanism among attributes. In a dichotomous test, this probability is the probability of answering correctly. According to the specificity or generality of their assumptions, CDMs can be divided into reduced CDMs and saturated CDMs.
Reduced CDMs make specific assumptions about the mechanisms of attribute interaction, leading to clearly defined interactions between attributes. Representative reduced models include the Deterministic Input, Noisy and Gate (DINA) model (Haertel, 1989; Junker & Sijtsma, 2001; de la Torre & Douglas, 2004), the Deterministic Input, Noisy or Gate (DINO) model (Templin & Henson, 2006), the Additive Cognitive Diagnosis Model (A-CDM; de la Torre, 2011), and the reduced Reparameterized Unified Model (rRUM; Hartz, 2002), among others. Compared to reduced models, saturated models, such as the Log-Linear Cognitive Diagnosis Model (LCDM; Henson et al., 2009) and the generalized Deterministic Input, Noisy and Gate (G-DINA) model (de la Torre, 2011), do not impose strict assumptions on the mechanisms of attribute interaction. When appropriate constraints are applied, saturated models can be transformed into various reduced models (Henson et al., 2008; de la Torre, 2011).
The LCDM is a saturated CDM proposed fully within the framework of cognitive diagnosis. Unlike reduced models, which only consider the main effects of attributes, it also models interactions between attributes, and thus makes more general assumptions. Its definition of the probability of a correct response is as follows:

P(X_{pi}=1 \mid \boldsymbol{\alpha}_l) = \frac{\exp\left(\lambda_{i0} + \sum_{k=1}^{K_i^*}\lambda_{ik}\alpha_{lk} + \sum_{k=1}^{K_i^*-1}\sum_{k'=k+1}^{K_i^*}\lambda_{ikk'}\alpha_{lk}\alpha_{lk'} + \cdots + \lambda_{i12\cdots K_i^*}\prod_{k=1}^{K_i^*}\alpha_{lk}\right)}{1 + \exp\left(\lambda_{i0} + \sum_{k=1}^{K_i^*}\lambda_{ik}\alpha_{lk} + \sum_{k=1}^{K_i^*-1}\sum_{k'=k+1}^{K_i^*}\lambda_{ikk'}\alpha_{lk}\alpha_{lk'} + \cdots + \lambda_{i12\cdots K_i^*}\prod_{k=1}^{K_i^*}\alpha_{lk}\right)}

where P(X_{pi}=1 \mid \boldsymbol{\alpha}_l) represents the probability that an examinee with attribute mastery pattern \boldsymbol{\alpha}_l (l = 1, 2, \ldots, 2^{K_i^*}) correctly answers item i. Here, K_i^* denotes the number of attributes in the collapsed q-vector, \lambda_{i0} is the intercept parameter, and \boldsymbol{\lambda}_i represents the effect vector of the attributes. Specifically, \lambda_{ik} is the main effect of attribute k, \lambda_{ikk'} is the interaction effect between attributes k and k', and \lambda_{i12\cdots K_i^*} represents the interaction effect of all required attributes.
The G-DINA model, proposed by de la Torre (2011), is another saturated model that offers three types of link functions: identity link, log link, and logit link, which are defined as follows:

P(\boldsymbol{\alpha}_{lj}) = \delta_{j0} + \sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk} + \sum_{k=1}^{K_j^*-1}\sum_{k'=k+1}^{K_j^*}\delta_{jkk'}\alpha_{lk}\alpha_{lk'} + \cdots + \delta_{j12\cdots K_j^*}\prod_{k=1}^{K_j^*}\alpha_{lk}

\log P(\boldsymbol{\alpha}_{lj}) = v_{j0} + \sum_{k=1}^{K_j^*}v_{jk}\alpha_{lk} + \sum_{k=1}^{K_j^*-1}\sum_{k'=k+1}^{K_j^*}v_{jkk'}\alpha_{lk}\alpha_{lk'} + \cdots + v_{j12\cdots K_j^*}\prod_{k=1}^{K_j^*}\alpha_{lk}

\mathrm{logit}\, P(\boldsymbol{\alpha}_{lj}) = \lambda_{j0} + \sum_{k=1}^{K_j^*}\lambda_{jk}\alpha_{lk} + \sum_{k=1}^{K_j^*-1}\sum_{k'=k+1}^{K_j^*}\lambda_{jkk'}\alpha_{lk}\alpha_{lk'} + \cdots + \lambda_{j12\cdots K_j^*}\prod_{k=1}^{K_j^*}\alpha_{lk}

where \delta_{j0}, v_{j0}, and \lambda_{j0} are the intercept parameters for the three link functions, respectively; \delta_{jk}, v_{jk}, and \lambda_{jk} are the main effect parameters of \alpha_{lk} for the three link functions, respectively; \delta_{jkk'}, v_{jkk'}, and \lambda_{jkk'} are the interaction effect parameters between \alpha_{lk} and \alpha_{lk'} for the three link functions, respectively; and \delta_{j12\cdots K_j^*}, v_{j12\cdots K_j^*}, and \lambda_{j12\cdots K_j^*} are the interaction effect parameters of all required attributes for the three link functions, respectively. It can be observed that when the logit link is adopted, the G-DINA model is equivalent to the LCDM.
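Since the three links parameterize the same probabilities, their equivalence can be checked numerically. The following base-R sketch (all probability values are illustrative, not taken from any fitted model) shows that identity-link delta parameters and logit-link lambda parameters recover the same correct-response probability for a toy item requiring two attributes:

```r
# Toy item with K* = 2 required attributes; success probabilities for the
# four reduced attribute patterns (00, 10, 01, 11) are illustrative.
p <- c(`00` = 0.2, `10` = 0.5, `01` = 0.4, `11` = 0.9)

# Identity-link parameters: delta0 is P(00); main effects and the
# interaction are increments over it.
d0  <- p["00"]
d1  <- p["10"] - d0
d2  <- p["01"] - d0
d12 <- p["11"] - d0 - d1 - d2

# Logit-link parameters: the same construction on the logit scale (= LCDM).
l0  <- qlogis(p["00"])
l1  <- qlogis(p["10"]) - l0
l2  <- qlogis(p["01"]) - l0
l12 <- qlogis(p["11"]) - l0 - l1 - l2

# Both links recover the same response probability for pattern 11.
p11.identity <- d0 + d1 + d2 + d12
p11.logit    <- plogis(l0 + l1 + l2 + l12)
round(c(p11.identity, p11.logit), 6)  # both 0.9
```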
Specifically, the A-CDM can be formulated as:

P(\boldsymbol{\alpha}_{lj}) = \delta_{j0} + \sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk}

The rRUM can be written as:

P(\boldsymbol{\alpha}_{lj}) = \pi_j^* \prod_{k=1}^{K_j^*} r_{jk}^{*\,(1-\alpha_{lk})}

The item response function for the linear logistic model (LLM) can be given by:

\mathrm{logit}\, P(\boldsymbol{\alpha}_{lj}) = \lambda_{j0} + \sum_{k=1}^{K_j^*}\lambda_{jk}\alpha_{lk}
In the DINA model, every item is characterized by two key parameters: guessing (g) and slipping (s). Within
the traditional framework of DINA model parameterization, a latent variable \eta_{pi}, specific to
examinee p who has attribute mastery pattern \boldsymbol{\alpha}_p and responds to item i,
is defined as follows:

\eta_{pi} = \prod_{k=1}^{K}\alpha_{pk}^{q_{ik}}

If examinee p, whose attribute mastery pattern is \boldsymbol{\alpha}_p, has acquired every attribute
required by item i, \eta_{pi} is given a value of 1. If not, \eta_{pi} is set to 0. The
DINA model's item response function can be concisely formulated as such:

P(X_{pi}=1 \mid \boldsymbol{\alpha}_p) = (1 - s_i)^{\eta_{pi}} g_i^{\,1-\eta_{pi}} = \delta_{i0} + \delta_{i12\cdots K_i^*}\prod_{k=1}^{K_i^*}\alpha_{lk}

The first expression is the original form of the DINA model, while the second
is an equivalent form of the DINA model after adding constraints in the G-DINA model.
Here, g_i = \delta_{i0} and 1 - s_i = \delta_{i0} + \delta_{i12\cdots K_i^*}.
In contrast to the DINA model, the DINO model assumes that an examinee can correctly respond to
an item if he/she has mastered at least one of the item's measured attributes. Additionally, like the
DINA model, the DINO model also accounts for guessing and slipping parameters. Therefore,
the main difference between DINO and DINA lies in their latent response formulations. The
DINO model can be given by:

P(X_{pi}=1 \mid \boldsymbol{\alpha}_p) = (1 - s_i)^{\omega_{pi}} g_i^{\,1-\omega_{pi}}, \qquad \omega_{pi} = 1 - \prod_{k=1}^{K}(1-\alpha_{pk})^{q_{ik}}
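The contrast between the conjunctive (DINA) and disjunctive (DINO) condensation rules can be illustrated in base R; the q-vector, attribute patterns, and guessing/slipping values below are made up for this sketch:

```r
# Illustrative item measuring attributes 1 and 2 of K = 3 (q-vector 1,1,0).
q     <- c(1, 1, 0)
alpha <- rbind(c(0,0,0), c(1,0,0), c(1,1,0), c(1,1,1))  # four example patterns
g <- 0.1   # guessing parameter (illustrative)
s <- 0.1   # slipping parameter (illustrative)

# DINA: eta = 1 only when ALL required attributes are mastered.
eta.dina <- apply(alpha, 1, function(a) prod(a^q))
# DINO: omega = 1 when AT LEAST ONE required attribute is mastered.
eta.dino <- apply(alpha, 1, function(a) 1 - prod((1 - a)^q))

P.dina <- (1 - s)^eta.dina * g^(1 - eta.dina)
P.dino <- (1 - s)^eta.dino * g^(1 - eta.dino)
cbind(eta.dina, eta.dino, P.dina, P.dino)
```

Note how the pattern (1,0,0) yields a success probability of g under DINA but 1 - s under DINO, which is exactly the conjunctive/disjunctive difference described above.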
An object of class CDM
containing the following components:
A GDINA object obtained from the GDINA package, or a list produced by the BM algorithm, depending on which estimation method is used.
Individuals' attribute parameters calculated by the EAP method
Individuals' posterior probabilities
Individuals' marginal mastery probabilities matrix
Attribute prior weights for calculating the marginalized likelihood in the last iteration
Some basic model-fit indices, including Deviance, npar, AIC, and BIC. See also fit.
Haijiang Qin <[email protected]>
de la Torre, J. (2009). DINA Model and Parameter Estimation: A Didactic. Journal of Educational and Behavioral Statistics, 34(1), 115-130. DOI: 10.3102/1076998607309474.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333-353. DOI: 10.1007/BF02295640.
de la Torre, J. (2011). The Generalized DINA Model Framework. Psychometrika, 76(2), 179-199. DOI: 10.1007/s11336-011-9207-7.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301-323. DOI: 10.1111/j.1745-3984.1989.tb00336.x.
Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign.
Henson, R. A., Templin, J. L., & Willse, J. T. (2008). Defining a Family of Cognitive Diagnosis Models Using Log-Linear Models with Latent Variables. Psychometrika, 74(2), 191-210. DOI: 10.1007/s11336-008-9089-5.
Huebner, A., & Wang, C. (2011). A note on comparing examinee classification methods for cognitive diagnosis models. Educational and Psychological Measurement, 71, 407-419. DOI: 10.1177/0013164410388832.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258-272. DOI: 10.1177/01466210122032064.
Ma, W., & Jiang, Z. (2020). Estimating Cognitive Diagnosis Models in Small Samples: Bayes Modal Estimation and Monotonic Constraints. Applied Psychological Measurement, 45(2), 95-111. DOI: 10.1177/0146621620977681.
Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological methods, 11(3), 287-305. DOI: 10.1037/1082-989X.11.3.287.
Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.
################################################################
#                           Example 1                          #
#           fit using MMLE/EM to fit the GDINA models          #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data to fit
K <- 3
I <- 30
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## using MMLE/EM to fit GDINA model
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "GDINA",
                       method = "EM", maxitr = 2000, verbose = 1)

################################################################
#                           Example 2                          #
#             fit using MMLE/BM to fit the DINA                #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data to fit
K <- 5
I <- 30
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "DINA", distribute = "horder")

## using MMLE/BM to fit GDINA model
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "GDINA",
                       method = "BM", maxitr = 1000, verbose = 2)

################################################################
#                           Example 3                          #
#             fit using MMLE/EM to fit the ACDM                #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data to fit
K <- 5
I <- 30
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "ACDM", distribute = "horder")

## using MMLE/EM to fit GDINA model
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "ACDM",
                       method = "EM", maxitr = 2000, verbose = 1)
A generic function to extract elements from objects of class CDM, validation, or sim.data.
Objects that can be extracted from a CDM object include:
A GDINA object (see also GDINA) obtained from the GDINA package, or a list produced by the BM algorithm, depending on which estimation method is used.
Individuals' attribute parameters calculated by the EAP method
Individuals' posterior probabilities
Individuals' marginal mastery probabilities matrix
Attribute prior weights for calculating the marginalized likelihood in the last iteration
deviance, or negative two times observed marginal log likelihood
The number of parameters
AIC
BIC
Objects that can be extracted from a validation object include:
The original Q-matrix, which may contain misspecifications and needs to be validated.
The Q-matrix suggested by the validation method.
The CPU time spent completing the validation.
A matrix recording the modification process of each item during each iteration.
Each row represents an iteration, and each column corresponds to the q-vector index of the respective
item. The order of the indices is consistent with the row numbers of the matrix generated by
the attributepattern function in the GDINA package. Available only when maxitr > 1.
The number of iterations. Available only when maxitr > 1.
An I × K matrix containing the priority of every attribute for
each item. Available only when search.method is "PAA".
A list containing all the information needed to plot the Hull plot;
available only when method = "Hull".
Objects that can be extracted from a sim.data object include:
An N × I simulated item response matrix.
The Q-matrix.
An N × K matrix of individuals' attribute patterns.
A list of non-zero category success probabilities for each attribute mastery pattern.
A list of delta parameters.
Higher-order parameters.
Multivariate normal distribution parameters.
A matrix of item/category success probabilities for each attribute mastery pattern.
extract(object, what, ...)

## S3 method for class 'CDM'
extract(object, what, ...)

## S3 method for class 'validation'
extract(object, what, ...)

## S3 method for class 'sim.data'
extract(object, what, ...)
object |
Objects of class CDM, validation, or sim.data. |
what |
What to extract. |
... |
Additional arguments. |
extract(CDM)
: various elements of CDM
object
extract(validation)
: various elements of validation
object
extract(sim.data)
: various elements of sim.data
object
set.seed(123)
library(Qval)

## generate Q-matrix and data to fit
K <- 3
I <- 30
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 1000, IQ = IQ,
                         model = "GDINA", distribute = "horder")
extract(example.data, "dat")

## using MMLE/EM to fit GDINA model
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "GDINA",
                       method = "EM", maxitr = 2000, verbose = 1)
extract(example.CDM.obj, "alpha")
extract(example.CDM.obj, "npar")

example.MQ <- sim.MQ(example.Q, 0.1)
example.CDM.obj <- CDM(example.data$dat, example.MQ, model = "GDINA",
                       method = "EM", maxitr = 2000, verbose = 1)
validation.obj <- validation(example.data$dat, example.MQ, example.CDM.obj,
                             method = "MLR-B", eps = 0.90)
extract(validation.obj, "Q.sug")
Calculate relative fit indices (-2LL, AIC, BIC, CAIC, SABIC) and absolute fit indices (M2 test, RMSEA2, SRMSR)
using the modelfit function in the GDINA package.
fit(Y, Q, model = "GDINA")
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
model |
Type of model to be fitted; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
An object of class list
. The list contains various fit indices:
The number of parameters.
The Deviance.
The Akaike information criterion.
The Bayesian information criterion.
The consistent Akaike information criterion.
The Sample-size Adjusted BIC.
A vector consisting of the M2 statistic, degrees of freedom, significance level, and
RMSEA2 (Liu, Tian, & Xin, 2016).
The standardized root mean squared residual (SRMSR; Ravand & Robitzsch, 2018).
Haijiang Qin <[email protected]>
Khaldi, R., Chiheb, R., & Afa, A.E. (2018). Feed-forward and Recurrent Neural Networks for Time Series Forecasting: Comparative Study. In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL 18). Association for Computing Machinery, New York, NY, USA, Article 18, 1–6. DOI: 10.1145/3230905.3230946.
Liu, Y., Tian, W., & Xin, T. (2016). An application of M2 statistic to evaluate the fit of cognitive diagnostic models. Journal of Educational and Behavioral Statistics, 41, 3–26. DOI: 10.3102/1076998615621293.
Ravand, H., & Robitzsch, A. (2018). Cognitive diagnostic model of best choice: a study of reading comprehension. Educational Psychology, 38, 1255–1277. DOI: 10.1080/01443410.2018.1489524.
set.seed(123)
library(Qval)

## generate Q-matrix and data to fit
K <- 5
I <- 30
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## calculate fit indices
fit.indices <- fit(Y = example.data$dat, Q = example.Q, model = "GDINA")
print(fit.indices)
The function calculates the \beta index for all items, either after fitting the CDM or directly from the response data.
get.beta(Y = NULL, Q = NULL, CDM.obj = NULL, model = "GDINA")
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
CDM.obj |
An object of class CDM. |
model |
Type of model to be fitted; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
For item i with a q-vector of the l-th (l = 1, 2, \ldots, 2^K - 1) type, the \beta index is
computed from the following quantities: r_{li} represents the number of examinees in attribute
mastery pattern \alpha_l who correctly answered item i, while n_l is the total number of
examinees in attribute mastery pattern \alpha_l. P_{li} denotes the model-implied probability
that an examinee in attribute mastery pattern \alpha_l answers item i correctly when the
q-vector for item i is of the l-th type. In fact, r_{li} / n_l is the observed probability
that an examinee in attribute mastery pattern \alpha_l answers item i correctly, and the
\beta index aggregates, across attribute mastery patterns, the difference between the actual
proportion of correct answers for item i in each pattern and the expected probability of
answering the item incorrectly in that pattern. Therefore, to some extent, the \beta index
can be considered a measure of discriminability.
An object of class matrix, consisting of the \beta index for each item and each possible attribute mastery pattern.
Haijiang Qin <[email protected]>
Li, J., & Chen, P. (2024). A new Q-matrix validation method based on signal detection theory. British Journal of Mathematical and Statistical Psychology, 00, 1–33. DOI: 10.1111/bmsp.12371
library(Qval)
set.seed(123)

## generate Q-matrix and data
K <- 3
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
model <- "DINA"
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = model, distribute = "horder")

## calculate beta directly
beta <- get.beta(Y = example.data$dat, Q = example.Q, model = model)
print(beta)

## calculate beta after fitting CDM
example.CDM.obj <- CDM(example.data$dat, example.Q, model = model)
beta <- get.beta(CDM.obj = example.CDM.obj)
print(beta)
Calculate the M matrix for saturated CDMs (de la Torre, 2011). The M matrix is used to
represent the interaction mechanisms between attributes.
get.Mmatrix(K = NULL, pattern = NULL)
K |
The number of attributes; can be NULL when pattern is provided. |
pattern |
The attribute mastery pattern matrix containing all possible attribute mastery patterns;
can be obtained from the attributepattern function in the GDINA package. |
An object of class matrix
.
Haijiang Qin <[email protected]>
de la Torre, J. (2011). The Generalized DINA Model Framework. Psychometrika, 76(2), 179-199. DOI: 10.1007/s11336-011-9207-7.
library(Qval)
example.Mmatrix <- get.Mmatrix(K = 5)
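The interaction structure such a matrix encodes can be written out by hand for a small case. The base-R sketch below builds the saturated design matrix for K = 2, mapping each attribute pattern to the intercept, main-effect, and interaction terms of the G-DINA/LCDM parameterization; the column labels are illustrative:

```r
# Saturated design matrix for K = 2 attributes: rows are the attribute
# patterns 00, 01, 10, 11; columns are intercept, main effects, interaction.
pattern <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1))
M <- cbind(
  1,                           # intercept term
  pattern[, 1],                # main effect of attribute 1
  pattern[, 2],                # main effect of attribute 2
  pattern[, 1] * pattern[, 2]  # interaction of attributes 1 and 2
)
colnames(M) <- c("0", "1", "2", "12")
M
```

Only the full pattern (1,1) activates every column, which is why the interaction parameter contributes only for examinees who have mastered all required attributes.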
This function will provide the priorities of attributes for all items.
get.priority(Y = NULL, Q = NULL, CDM.obj = NULL, model = "GDINA")
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
CDM.obj |
An object of class CDM. |
model |
Type of model to fit; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
The calculation of priorities is straightforward (Qin & Guo, 2025): the priority of an attribute is the regression coefficient obtained from a LASSO multinomial logistic regression, with the attributes as the independent variables and the examinees' response data as the dependent variable. The formula (Tu et al., 2022) is as follows:

\log\frac{P(Y_{pi}=1)}{1-P(Y_{pi}=1)} = \beta_{i0} + \sum_{k=1}^{K}\beta_{ik}\Lambda_{pk}

where Y_{pi} represents the response of examinee p on item i, \Lambda_{pk} denotes the marginal mastery probability of examinee p on attribute k (which can be obtained from the return value alpha.P of the CDM function), \beta_{i0} is the intercept term, and \beta_{ik} represents the regression coefficient.

The LASSO loss function can be expressed as:

l_{lasso}(\mathbf{Y}_i \mid \boldsymbol{\Lambda}) = l(\mathbf{Y}_i \mid \boldsymbol{\Lambda}) - \lambda \lVert \boldsymbol{\beta}_i \rVert_1

where l_{lasso}(\mathbf{Y}_i \mid \boldsymbol{\Lambda}) is the penalized likelihood, l(\mathbf{Y}_i \mid \boldsymbol{\Lambda}) is the original likelihood, and \lambda is the tuning parameter for penalization (a larger value imposes a stronger penalty on \boldsymbol{\beta}_i). The priority of attribute k for item i is defined as the corresponding LASSO coefficient \beta_{ik}.
A matrix containing all attribute priorities.
Qin, H., & Guo, L. (2025). Priority attribute algorithm for Q-matrix validation: A didactic. Behavior Research Methods, 57(1), 31. DOI: 10.3758/s13428-024-02547-5.
Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.
set.seed(123)
library(Qval)

## generate Q-matrix and data
K <- 5
I <- 20
IQ <- list(
  P0 = runif(I, 0.1, 0.3),
  P1 = runif(I, 0.7, 0.9)
)
Q <- sim.Q(K, I)
data <- sim.data(Q = Q, N = 500, IQ = IQ,
                 model = "GDINA", distribute = "horder")
MQ <- sim.MQ(Q, 0.1)
CDM.obj <- CDM(data$dat, MQ)
priority <- get.priority(data$dat, Q, CDM.obj)
head(priority)
The function calculates the proportion of variance accounted for (PVAF) for all items, either after fitting the CDM or directly from the response data.
get.PVAF(Y = NULL, Q = NULL, CDM.obj = NULL, model = "GDINA")
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
CDM.obj |
An object of class CDM. |
model |
Type of model to be fitted; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
The intrinsic essence of the GDI index (denoted by \zeta^2) is the weighted variance of the
correct-response probabilities across all attribute mastery patterns for item i, which can be computed as:

\zeta_i^2 = \sum_{l=1}^{2^{K_i^*}} \pi(\boldsymbol{\alpha}_l^*)\left(P(X_i=1 \mid \boldsymbol{\alpha}_l^*) - \bar{P}_i\right)^2

where \pi(\boldsymbol{\alpha}_l^*) represents the prior probability of mastery pattern \boldsymbol{\alpha}_l^*,
and \bar{P}_i = \sum_{l=1}^{2^{K_i^*}} \pi(\boldsymbol{\alpha}_l^*) P(X_i=1 \mid \boldsymbol{\alpha}_l^*) is the
weighted average of the correct-response probabilities across all attribute mastery patterns. When the
q-vector is correctly specified, the calculated \zeta^2 should be maximized, indicating
the maximum discrimination of the item.

Theoretically, \zeta^2 is larger when the q-vector is either correctly specified or over-specified
than when it is under-specified; when the q-vector is over-specified, \zeta^2 is larger than,
but close to, its value under correct specification. The value of \zeta^2 continues to
increase slightly as the number of over-specified attributes increases, until the q-vector
becomes \mathbf{q} = [11\cdots1]. Thus, the PVAF is computed to indicate the proportion of variance
accounted for by a candidate q-vector:

PVAF = \frac{\zeta_q^2}{\zeta_{1:K}^2}

where \zeta_{1:K}^2 is the GDI of the q-vector containing all K attributes.
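A toy base-R computation makes the GDI and PVAF definitions concrete; the priors and correct-response probabilities below are illustrative numbers, not estimates from any data set:

```r
# One item with K = 2 attributes; four patterns 00, 01, 10, 11.
w      <- rep(0.25, 4)           # illustrative prior weights of the patterns
P.full <- c(0.1, 0.1, 0.8, 0.9)  # P(X = 1 | pattern) under q = (1, 1)

# GDI: weighted variance of the correct-response probabilities.
gdi <- function(P, w) {
  P.bar <- sum(w * P)            # weighted average correct-response probability
  sum(w * (P - P.bar)^2)
}

# Candidate q = (1, 0): patterns that agree on attribute 1 are collapsed
# to their common (weight-averaged) probability.
P.q10 <- c(0.1, 0.1, 0.85, 0.85)

zeta2.full <- gdi(P.full, w)     # GDI of the full q-vector (1, 1)
zeta2.q10  <- gdi(P.q10, w)      # GDI of the under-/mis-specified candidate
PVAF <- zeta2.q10 / zeta2.full
round(PVAF, 3)                   # close to, but below, 1
```

The candidate's GDI is close to the full q-vector's GDI, so its PVAF is near 1, which is exactly the behavior the validation criterion exploits.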
An object of class matrix, consisting of the PVAF for each item and each possible q-vector.
Haijiang Qin <[email protected]>
de la Torre, J., & Chiu, C. Y. (2016). A General Method of Empirical Q-matrix Validation. Psychometrika, 81(2), 253-273. DOI: 10.1007/s11336-015-9467-8.
library(Qval)
set.seed(123)

## generate Q-matrix and data
K <- 3
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## calculate PVAF directly
PVAF <- get.PVAF(Y = example.data$dat, Q = example.Q)
print(PVAF)

## calculate PVAF after fitting CDM
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "GDINA")
PVAF <- get.PVAF(CDM.obj = example.CDM.obj)
print(PVAF)
The function calculates the McFadden pseudo-R^2 for all items, either after fitting the CDM or directly from the response data.
get.R2(Y = NULL, Q = NULL, CDM.obj = NULL, model = "GDINA")
Y |
A required N × I matrix of examinees' binary responses, where rows are examinees and columns are items. |
Q |
A required binary I × K Q-matrix, where rows are items and columns are attributes. |
CDM.obj |
An object of class CDM. |
model |
Type of model to fit; can be "GDINA", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA". |
The McFadden pseudo-R^2 (McFadden, 1974) serves as a definitive model-fit index,
quantifying the proportion of variance explained by the observed responses. Comparable to the
squared multiple-correlation coefficient in linear statistical models, this coefficient of
determination finds its application in logistic regression models. Specifically, in the context
of the CDM, where probabilities of accurate item responses are predicted for each examinee,
the McFadden pseudo-R^2 provides a metric to assess the alignment between these predictions
and the actual responses observed. Its computation is straightforward, following the formula:

R_i^2 = 1 - \frac{\log L_i}{\log L_{i0}}

where \log L_i is the log-likelihood of the model for item i, and \log L_{i0} is the log-likelihood of
the null model. If there were N examinees taking a test comprising I items, then \log L_i
would be computed as:

\log L_i = \sum_{p=1}^{N} \sum_{l=1}^{L} \pi(\boldsymbol{\alpha}_l \mid \mathbf{X}_p) \left[ X_{pi}\log P(X_{pi}=1 \mid \boldsymbol{\alpha}_l) + (1-X_{pi})\log\left(1 - P(X_{pi}=1 \mid \boldsymbol{\alpha}_l)\right) \right]

where \pi(\boldsymbol{\alpha}_l \mid \mathbf{X}_p) is the posterior probability of examinee p with attribute
mastery pattern \boldsymbol{\alpha}_l when their response vector is \mathbf{X}_p, and X_{pi} is
examinee p's response to item i. Let \bar{X}_i be the average probability of correctly responding
to item i across all N examinees; then \log L_{i0} could be computed as:

\log L_{i0} = \sum_{p=1}^{N} \left[ X_{pi}\log \bar{X}_i + (1-X_{pi})\log\left(1-\bar{X}_i\right) \right]
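A simplified base-R sketch of this computation follows. It plugs in one model-implied probability per examinee rather than marginalizing over posterior pattern probabilities, and all responses and probabilities are illustrative:

```r
# Toy McFadden pseudo-R^2 for one item: compare the item log-likelihood
# under model-implied probabilities with a null model that uses only the
# overall proportion correct.
X     <- c(1, 1, 0, 1, 0, 0)              # six examinees' responses (illustrative)
P.hat <- c(0.9, 0.8, 0.2, 0.7, 0.3, 0.1)  # model-implied P(X = 1) per examinee

# Bernoulli log-likelihood of responses x under probabilities p.
loglik <- function(p, x) sum(x * log(p) + (1 - x) * log(1 - p))

L.model <- loglik(P.hat, X)
L.null  <- loglik(rep(mean(X), length(X)), X)  # null: average proportion correct
R2 <- 1 - L.model / L.null
round(R2, 3)  # approx. 0.670
```

The closer the model-implied probabilities track the observed responses, the smaller the model's negative log-likelihood relative to the null model's, and the closer R^2 gets to 1.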
An object of class matrix, consisting of the McFadden pseudo-R^2 for each item and each possible attribute mastery pattern.
Haijiang Qin <[email protected]>
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in economics (pp.105–142). Academic Press.
Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology, 74, 110–130. DOI: 10.1111/bmsp.12228.
Qin, H., & Guo, L. (2023). Using machine learning to improve Q-matrix validation. Behavior Research Methods. DOI: 10.3758/s13428-023-02126-0.
library(Qval)
set.seed(123)

## generate Q-matrix and data
K <- 3
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## calculate R2 directly
R2 <- get.R2(Y = example.data$dat, Q = example.Q)
print(R2)

## calculate R2 after fitting CDM
example.CDM.obj <- CDM(example.data$dat, example.Q, model = "GDINA")
R2 <- get.R2(CDM.obj = example.CDM.obj)
print(R2)
This function returns the restriction matrix (de la Torre, 2011; Ma & de la Torre, 2020) based on two q-vectors, where the two q-vectors can only differ by one attribute.
get.Rmatrix(q1, q2)
q1 |
A q-vector |
q2 |
Another q-vector |
A restriction matrix
de la Torre, J. (2011). The Generalized DINA Model Framework. Psychometrika, 76(2), 179-199. DOI: 10.1007/s11336-011-9207-7.
Ma, W., & de la Torre, J. (2020). An empirical Q-matrix validation method for the sequential generalized DINA model. British Journal of Mathematical and Statistical Psychology, 73(1), 142-163. DOI: 10.1111/bmsp.12156.
q1 <- c(1, 1, 0)
q2 <- c(1, 1, 1)
Rmatrix <- get.Rmatrix(q1, q2)
print(Rmatrix)
This function performs a single iteration of the beta method for one item's validation. It is designed
to be used in parallel computing environments to speed up the validation process of the beta method.
The function is a utility function for validation.
When the user calls the validation function with method = "beta", parallel_iter
runs automatically, so there is no need for the user to call parallel_iter directly.
It may seem that parallel_iter, as an internal function, could better serve users.
However, we found that the Qval package must export it to resolve variable environment conflicts in R
and to enable parallel computation. Perhaps a better solution will be found in the future.
parallel_iter( i, Y, criter.index, P.alpha.Xi, P.alpha, pattern, ri, Ni, Q.pattern.ini, model, criter, search.method, P_GDINA, Q.beta, L, K, alpha.P, get.MLRlasso )
i |
An integer indicating the item number that needs to be validated. |
Y |
A matrix of observed data used for validation. |
criter.index |
An integer representing the index of the criterion. |
P.alpha.Xi |
A matrix representing individual posterior probability. |
P.alpha |
A vector of attribute prior weights. |
pattern |
A matrix representing the attribute mastery patterns. |
ri |
A vector containing the number of examinees in each knowledge state who correctly answered item i. |
Ni |
A vector containing the total number of examinees in each knowledge state. |
Q.pattern.ini |
An integer representing the initial pattern order for the model. |
model |
A model object used for fitting, such as the GDINA model. |
criter |
A character string specifying the fit criterion. Possible values are "AIC", "BIC", "CAIC", or "SABIC". |
search.method |
A character string specifying the search method for model selection. Options include "beta", "ESA", "SSA", or "PAA". |
P_GDINA |
A function that calculates probabilities for the GDINA model. |
Q.beta |
A Q-matrix used for validation. |
L |
An integer representing the number of all attribute mastery patterns. |
K |
An integer representing the number of attributes. |
alpha.P |
A matrix of individuals' marginal mastery probabilities (Tu et al., 2022). |
get.MLRlasso |
A function for Lasso regression with multiple linear regression. |
A list
containing the following components:
The previous fit index value after applying the selected search method.
The current fit index value after applying the selected search method.
The pattern that corresponds to the optimal model configuration for the current iteration.
The priority vector used in the PAA method, if applicable.
This function can provide the Hull plot. The point suggested by the Hull method is marked in red.
## S3 method for class 'validation' plot(x, i, ...)
x |
A |
i |
A numeric value indicating the item for which to plot the Hull curve. |
... |
Additional arguments. |
None. This function is used for side effects (plotting).
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
IQ <- list(
  P0 = runif(I, 0.2, 0.4),
  P1 = runif(I, 0.6, 0.8)
)
Q <- sim.Q(K, I)
data <- sim.data(Q = Q, N = 500, IQ = IQ, model = "GDINA", distribute = "horder")
MQ <- sim.MQ(Q, 0.1)

CDM.obj <- CDM(data$dat, MQ)

############### ESA ###############
Hull.obj <- validation(data$dat, MQ, CDM.obj, method = "Hull", search.method = "ESA")

## plot Hull curve for item 20
plot(Hull.obj, 20)

############### PAA ###############
Hull.obj <- validation(data$dat, MQ, CDM.obj, method = "Hull", search.method = "PAA")

## plot Hull curve for item 20
plot(Hull.obj, 20)
This function prints the details of a CDM
object.
It outputs the call used to create the object, the version and the date of the Qval
package.
## S3 method for class 'CDM' print(x, ...)
x |
A |
... |
Additional arguments. |
This function prints the details of a sim.data
object.
It outputs the call used to create the object, the version and the date of the Qval
package.
## S3 method for class 'sim.data' print(x, ...)
x |
A |
... |
Additional arguments. |
This function prints the details of a validation
object.
It outputs the call used to create the object, the version and the date of the Qval
package.
## S3 method for class 'validation' print(x, ...)
x |
A |
... |
Additional arguments. |
Randomly generate a response matrix according to certain conditions, including attribute distribution, item quality, sample size, Q-matrix, and cognitive diagnosis models (CDMs).
sim.data( Q = NULL, N = NULL, IQ = list(P0 = NULL, P1 = NULL), model = "GDINA", distribute = "uniform", control = NULL, verbose = TRUE )
Q |
The Q-matrix. A random 30 × 5 Q-matrix ( |
N |
Sample size. Default = 500. |
IQ |
A list containing two |
model |
Type of model to be fitted; can be |
distribute |
Attribute distributions; can be |
control |
A list of control parameters with elements:
|
verbose |
Logical indicating to print information or not. Default is |
An object of class sim.data, initially obtained from the simGDINA function of the GDINA package.
Elements that can be extracted using method extract include:
An N
× I
simulated item response matrix.
The Q-matrix.
An N
× K
matrix of individuals' attribute patterns.
A list of non-zero success probabilities for each attribute mastery pattern.
A list of delta parameters.
Higher-order parameters.
Multivariate normal distribution parameters.
A matrix of success probabilities for each attribute mastery pattern.
Haijiang Qin <[email protected]>
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster Analysis for Cognitive Diagnosis: Theory and Applications. Psychometrika, 74(4), 633-665. DOI: 10.1007/s11336-009-9125-0.
Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.
################################################################
#                          Example 1                           #
#          generate data follow the uniform distribution       #
################################################################
library(Qval)

set.seed(123)

K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
data <- sim.data(Q = Q, N = 10, IQ = IQ, model = "GDINA", distribute = "uniform")
print(data$dat)

################################################################
#                          Example 2                           #
#          generate data follow the mvnorm distribution        #
################################################################
set.seed(123)

K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example_cutoffs <- sample(qnorm(c(1:K)/(K+1)), ncol(Q))
data <- sim.data(Q = Q, N = 10, IQ = IQ, model = "GDINA", distribute = "mvnorm",
                 control = list(sigma = 0.5, cutoffs = example_cutoffs))
print(data$dat)

#################################################################
#                          Example 3                            #
#          generate data follow the horder distribution         #
#################################################################
set.seed(123)

K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example_theta <- rnorm(10, 0, 1)
example_b <- seq(-1.5, 1.5, length.out = K)
data <- sim.data(Q = Q, N = 10, IQ = IQ, model = "GDINA", distribute = "horder",
                 control = list(theta = example_theta, a = 1.5, b = example_b))
print(data$dat)
Simulate a certain rate of mis-specifications in the Q-matrix.
sim.MQ(Q, rate, verbose = TRUE)
Q |
The Q-matrix ( |
rate |
The ratio of mis-specifications in the Q-matrix. |
verbose |
Logical indicating to print information or not. Default is |
An object of class matrix
.
Haijiang Qin <[email protected]>
library(Qval)

set.seed(123)

Q <- sim.Q(5, 10)
print(Q)

MQ <- sim.MQ(Q, 0.1)
print(MQ)
Randomly generate an I × K Q-matrix, consisting of one-attribute q-vectors (50%) and q-vectors
measuring multiple attributes.
This function ensures that the generated Q-matrix contains at least two identity matrices as a priority.
Therefore, the number of items (I) must be at least twice the number of attributes (K).
sim.Q(K, I)
K |
The number of attributes of the Q-matrix. |
I |
The number of items. |
An object of class matrix
.
Haijiang Qin <[email protected]>
Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. Br J Math Stat Psychol, 74 Suppl 1, 110-130. DOI: 10.1111/bmsp.12228.
library(Qval)

set.seed(123)

Q <- sim.Q(5, 10)
print(Q)
This function uses generalized Q-matrix validation methods to validate the Q-matrix, including commonly used methods such as GDI (de la Torre, & Chiu, 2016; Najera, Sorrel, & Abad, 2019; Najera et al., 2020), Wald (Ma, & de la Torre, 2020), Hull (Najera et al., 2021), and MLR-B (Tu et al., 2022). It supports different iteration methods (test level or item level; Najera et al., 2020; Najera et al., 2021; Tu et al., 2022) and can apply various attribute search methods (ESA, SSA, PAA; de la Torre, 2008; Terzi, & de la Torre, 2018).
validation( Y, Q, CDM.obj = NULL, par.method = "EM", mono.constraint = TRUE, model = "GDINA", method = "GDI", search.method = "PAA", iter.level = "no", maxitr = 1, eps = 0.95, alpha.level = 0.05, criter = NULL, verbose = TRUE )
Y |
A required |
Q |
A required binary |
CDM.obj |
An object of class |
par.method |
Type of method used to estimate the CDM parameters; one out of |
mono.constraint |
Logical indicating whether monotonicity constraints should be fulfilled in estimation.
Default = |
model |
Type of model to fit; can be |
method |
The method used to validate the Q-matrix; can be |
search.method |
Character string specifying the search method to use during validation.
|
iter.level |
Can be |
maxitr |
Number of max iterations. Default = |
eps |
Cut-off point of PVAF; it works when method = "GDI" or method = "Wald". |
alpha.level |
Alpha level for the Wald test. Default = |
criter |
The kind of fit-index value. When |
verbose |
Logical indicating to print iterative information or not. Default is |
An object of class validation
containing the following components:
The original Q-matrix, which may contain some mis-specifications and needs to be validated.
The Q-matrix suggested by the chosen validation method.
The CPU time taken to finish the function.
A matrix that records the modification process of each item during each iteration.
Each row represents an iteration, and each column corresponds to the q-vector index of the respective
item. The order of the indices is consistent with the row numbering in the matrix generated by
the attributepattern
function in the GDINA
package. The value is only available when
maxitr
> 1.
The number of iterations. The value is only available when maxitr
> 1.
An I
× K
matrix that contains the priority of every attribute for
each item. Only when the search.method
is "PAA"
, the value is available. See details.
A list
containing all the information needed to plot the Hull plot, which is
available only when method
= "Hull"
.
The GDI method (de la Torre & Chiu, 2016), as the first Q-matrix validation method applicable to saturated models, serves as an important foundation for various mainstream Q-matrix validation methods.
The method calculates the proportion of variance accounted for (PVAF; @seealso
get.PVAF
)
for all possible q-vectors for each item, and selects the q-vector with a PVAF just
greater than the cut-off point (or Epsilon, EPS) as the correction result. The variance \zeta^2
is the generalized discriminating index (GDI; de la Torre & Chiu, 2016).
Therefore, the GDI method is also considered as a generalized extension of the \delta
method (de la Torre, 2008), which also takes maximizing discrimination as its basic idea.
In the GDI method, \zeta^2
is defined as the weighted variance of the correct
response probabilities across all mastery patterns, that is:

\zeta^2 = \sum_{l=1}^{2^K} \pi_l \left[ P_i(\boldsymbol{\alpha}_l) - \bar{P}_i \right]^2, \qquad \bar{P}_i = \sum_{l=1}^{2^K} \pi_l P_i(\boldsymbol{\alpha}_l)
where \pi_l represents the prior probability of mastery pattern \boldsymbol{\alpha}_l,
and \bar{P}_i is the weighted
average of the correct response probabilities across all attribute mastery patterns.
When the q-vector is correctly specified, the calculated \zeta^2 should be maximized,
indicating the maximum discrimination of the item. However, in reality, \zeta^2
continues to increase when the q-vector is over-specified, and the more attributes that
are over-specified, the larger \zeta^2 becomes. The q-vector with all attributes set
to 1 (i.e., \mathbf{q}_{1:K}) has the largest \zeta^2 (de la Torre, 2016).
This is because an increase in attributes in the q-vector leads to an increase in item
parameters, resulting in greater differences in correct response probabilities across
attribute patterns and, consequently, increased variance. However, this increase in
variance is spurious. Therefore, de la Torre et al. calculated PVAF = \zeta^2 / \zeta_{1:K}^2
to describe the degree to which the discrimination of the current q-vector explains
the maximum discrimination. They selected an appropriate PVAF cut-off point to achieve
a balance between q-vector fit and parsimony. According to previous studies,
the PVAF cut-off point is typically set at 0.95 (Ma & de la Torre, 2020; Najera et al., 2021).
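The weighted-variance idea above can be sketched numerically. The following toy computation (not the Qval implementation; all pattern weights and success probabilities are hypothetical) computes the GDI for a candidate q-vector and for the saturated q-vector, and the resulting PVAF:

```r
## Toy GDI/PVAF computation for an item with K = 2 attributes.
## Pattern order: (00, 10, 01, 11); weights and probabilities are made up.

# prior weights of the 2^K = 4 attribute mastery patterns
pi_l <- c(0.25, 0.25, 0.25, 0.25)

# correct response probabilities under candidate q-vector q = (1, 0):
# patterns mastering attribute 1 (10 and 11) share the higher probability
P_q <- c(0.2, 0.8, 0.2, 0.8)

# correct response probabilities under the saturated q-vector q = (1, 1)
P_sat <- c(0.1, 0.7, 0.3, 0.9)

# zeta^2: weighted variance of the correct response probabilities
zeta2 <- function(p, w) {
  p_bar <- sum(w * p)      # weighted average correct response probability
  sum(w * (p - p_bar)^2)
}

# PVAF: discrimination of the candidate relative to the saturated q-vector
PVAF <- zeta2(P_q, pi_l) / zeta2(P_sat, pi_l)
print(PVAF)  # 0.9
```

With the cut-off point at 0.95, this candidate q-vector would not yet be retained.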
Najera et al. (2019) proposed using multinomial logistic regression to predict a more appropriate
cut-off point for PVAF, denoted \epsilon. The regression predicts \epsilon from the item quality IQ,
calculated as the negative difference between the probability of an examinee
with all attributes answering the item correctly and the probability of an examinee with no attributes
answering the item correctly
(IQ = -\left[ P_i(\boldsymbol{\alpha}_{1:K}) - P_i(\boldsymbol{\alpha}_{0}) \right]),
and from N and I, which represent the number of examinees and the number of items, respectively.
The Wald method (Ma & de la Torre, 2020) combines the Wald test with PVAF to correct
the Q-matrix at the item level. Its basic logic is as follows: when correcting item i,
the single attribute that maximizes the PVAF value is added to a q-vector with all
attributes set to 0 (i.e., \mathbf{q} = [0, 0, \ldots, 0]) as a starting point.
In subsequent iterations, attributes in this vector are continuously added or
removed through the Wald test. The correction process ends when the PVAF exceeds the
cut-off point or when no further attribute changes occur. The Wald statistic follows an
asymptotic \chi^{2} distribution, with degrees of freedom equal to the number of
restrictions imposed (i.e., the number of rows of the restriction matrix).
The calculation method is as follows:

Wald = \left[ \mathbf{R} \times P_{i}(\boldsymbol{\alpha}) \right]^{\prime} \left( \mathbf{R} \times \mathbf{V}_{i} \times \mathbf{R}^{\prime} \right)^{-1} \left[ \mathbf{R} \times P_{i}(\boldsymbol{\alpha}) \right]

\mathbf{R} represents the restriction matrix (@seealso
get.Rmatrix
); P_{i}(\boldsymbol{\alpha}) denotes
the vector of correct response probabilities for item i; \mathbf{V}_{i} is the
variance-covariance matrix of the correct response probabilities for item i, which
can be obtained by multiplying the \mathbf{M}_{i} matrix (de la Torre, 2011) with the
variance-covariance matrix of item parameters \boldsymbol{\Sigma}_{i}, i.e.,
\mathbf{V}_{i} = \mathbf{M}_{i} \times \boldsymbol{\Sigma}_{i} \times \mathbf{M}_{i}^{\prime}.
The \boldsymbol{\Sigma}_{i} can be
derived by inverting the information matrix, using the empirical cross-product information
matrix (de la Torre, 2011) to calculate \boldsymbol{\Sigma}_{i}.
\mathbf{M}_{i} is a matrix (@seealso
get.Mmatrix
)
that represents the relationship between the parameters of item i and the attribute mastery patterns. The
rows represent different mastery patterns, while the columns represent different item parameters.
The Hull method (Najera et al., 2021) addresses the issue of the cut-off point in the GDI method and demonstrates good performance in simulation studies. Najera et al. applied the Hull method for determining the number of factors to retain in exploratory factor analysis (Lorenzo-Seva et al., 2011) to the retention of attribute quantities in the q-vector, specifically for Q-matrix validation. The Hull method aligns with the GDI approach in its philosophy of seeking a balance between fit and parsimony. While GDI relies on a preset, arbitrary cut-off point to determine this balance, the Hull method utilizes the most pronounced elbow in the Hull plot to make this judgment. The most pronounced elbow is determined using the following formula:
st_k = \frac{(f_k - f_{k-1}) / (np_k - np_{k-1})}{(f_{k+1} - f_k) / (np_{k+1} - np_k)}

where f_k represents the fit-index value (can be PVAF,
@seealso
get.PVAF
, or R^2,
@seealso
get.R2
) when the q-vector contains k attributes;
similarly,
f_{k-1}
and
f_{k+1}
represent the fit-index values when the q-vector contains
k-1
and
k+1
attributes, respectively. np_k denotes the number of parameters when the
q-vector has
k
attributes, which is 2^k
for a saturated model. Likewise,
np_{k-1}
and
np_{k+1}
represent the number of parameters when the q-vector has
k-1
and
k+1
attributes, respectively. The Hull method calculates the st
index for all possible q-vectors
and retains the q-vector with the maximum st
index as the corrected result.
Najera et al. (2021) removed any concave points from the Hull plot, and when only the first and
last points remained in the plot, the saturated q-vector was selected.
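The elbow criterion can be sketched with a toy fit curve. The following base-R snippet (not the Qval implementation; the fit-index values are hypothetical, and the elbow index is denoted st here) evaluates the ratio of successive fit gains per parameter and picks the number of attributes with the most pronounced elbow:

```r
## Toy Hull elbow computation.
## fit[k]: hypothetical fit-index value (e.g. PVAF) for the best q-vector
## with k attributes; np[k] = 2^k parameters under a saturated model.

fit <- c(0.60, 0.90, 0.93, 0.94)   # k = 1, 2, 3, 4 attributes
np  <- 2^(1:4)                     # 2, 4, 8, 16 parameters

# st is defined for the interior points k = 2, ..., K - 1
st <- function(k, f, np) {
  ((f[k] - f[k - 1]) / (np[k] - np[k - 1])) /
    ((f[k + 1] - f[k]) / (np[k + 1] - np[k]))
}

st_values <- sapply(2:3, st, f = fit, np = np)
names(st_values) <- paste0("k=", 2:3)
print(st_values)

# keep the k with the most pronounced elbow (largest st)
best_k <- (2:3)[which.max(st_values)]
print(best_k)  # 2
```

Here the big jump in fit from 1 to 2 attributes, followed by only marginal gains, makes k = 2 the elbow.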
The MLR-B method proposed by Tu et al. (2022) differs from the GDI, Wald, and Hull methods in that
it does not employ PVAF. Instead, it directly uses the marginal probabilities of attribute mastery for
examinees to perform multivariate logistic regression on their observed scores. This approach assumes
all possible q-vectors and conducts the corresponding regression modelings. After proposing regression
equations that exclude any insignificant regression coefficients, it selects the q-vector corresponding to
the equation with the minimum AIC value as the validation result. The performance of this method in both the
LCDM and GDM models even surpasses that of the Hull method (Tu et al., 2022), making it an efficient and reliable
approach for Q-matrix validation.
The beta method (Li & Chen, 2024) addresses the Q-matrix validation problem from the
perspective of signal detection theory. Signal detection theory posits that any stimulus is
a signal embedded in noise, where the signal always overlaps with noise. The beta method
treats the correct q-vector as the signal and other possible q-vectors as noise. The goal is
to identify the signal from the noise, i.e., to correctly identify the q-vector. For item i
with the q-vector of the m-th type, the \beta index is computed from the following quantities:
r_{li} represents the number of examinees in knowledge state \boldsymbol{\alpha}_l who correctly
answered item i, while n_l is the total number of examinees in knowledge state \boldsymbol{\alpha}_l.
P_{i}^{(m)}(\boldsymbol{\alpha}_l) denotes the probability that an examinee in knowledge state
\boldsymbol{\alpha}_l answers item i correctly when the q-vector for item i is of the m-th type.
In fact, r_{li} / n_l is the observed probability that an examinee in knowledge state
\boldsymbol{\alpha}_l answers item i correctly, and \beta_{i}^{(m)} aggregates, across knowledge
states, the difference between the actual proportion of correct answers for item i in each
knowledge state and the expected probability of answering the item incorrectly in that state.
Therefore, to some extent, \beta_{i}^{(m)} can be considered as a measure of discriminability,
and the beta method posits that the correct q-vector maximizes \beta_{i}^{(m)}, i.e.:

\mathbf{q}_i = \arg\max_{m} \left( \beta_{i}^{(m)} \right)

Therefore, essentially, \beta is an index similar to GDI. Both increase as the number of attributes
in the q-vector increases. Unlike the GDI method, the beta method does not continue to compute
PVAF but instead uses the minimum AIC value to determine whether the attributes
in the q-vector are sufficient. In Package Qval, parLapply will be used to accelerate the beta
method.

Please note that the beta method has different meanings when applying different search algorithms.
For more details, see section 'Search algorithm' below.
The iterative procedure that makes one item modification at a time is the item-level iteration (iter.level = "item"
) in Najera et al. (2020, 2021). The steps of the item-level iterative procedure algorithm are as follows:

1. Fit the CDM according to the item responses and the provisional Q-matrix (Q^{0}).
2. Validate the provisional Q-matrix and gain a suggested Q-matrix (Q^{1}).
3. For each item, define PVAF^{0} as the PVAF of the provisional q-vector specified in Q^{0}, and PVAF^{1} as the PVAF of the suggested q-vector in Q^{1}.
4. Calculate all items' \Delta PVAF, defined as \Delta PVAF = |PVAF^{1} - PVAF^{0}|.
5. Define the hit item as the item with the highest \Delta PVAF.
6. Update Q^{0} by replacing the provisional q-vector of the hit item with its suggested q-vector.
7. Iterate over Steps 1 to 6 until \Delta PVAF = 0 for all items.

When the Q-matrix validation method is "MLR-B"
, or "Hull"
with criter = "AIC"
or criter = "R2"
, PVAF is not used.
In this case, the criterion for determining which item's q-vector will be replaced is
\Delta AIC or \Delta R^2, respectively.
The iterative procedure in which the entire Q-matrix is modified at each iteration
is the test-level iteration ( iter.level = "test"
) (Najera et al., 2020; Tu et al., 2022).
The steps of the test-level iterative procedure algorithm are as follows:

1. Fit the CDM according to the item responses and the provisional Q-matrix (Q^{0}).
2. Validate the provisional Q-matrix and gain a suggested Q-matrix (Q^{1}).
3. Check whether Q^{1} = Q^{0}. If TRUE, terminate the iterative algorithm. If FALSE, update Q^{0} as Q^{1}.
4. Iterate over Steps 1 to 3 until one of the following conditions is satisfied: (a) Q^{1} = Q^{0}; (b) the maximum number of iterations (maxitr
) is reached; (c) Q^{1} does not satisfy
the condition that every attribute is measured by at least one item.
iter.level = 'test.att'
will use a method called the test-attribute iterative procedure (Najera et al., 2021), which
modifies all items in each iteration while following the principle of minimizing changes in the number of attributes.
The test-attribute iterative procedure and the test-level iterative procedure therefore largely follow the same process.
The key difference is that the test-attribute iterative procedure only allows minimal adjustments to each q-vector in a given iteration:
if the suggested q-vector differs from the provisional one by more than one attribute, the test-level iterative procedure
can adopt the suggested q-vector directly, whereas the test-attribute iterative procedure moves toward it one attribute at a time.
As a result, the test-attribute iterative procedure is more cautious than the test-level iterative procedure
and may require more iterations.
Three search algorithms are available: Exhaustive Search Algorithm (ESA), Sequential Search Algorithm (SSA),
and Priority Attribute Algorithm (PAA).
ESA is a brute-force algorithm. When validating the q-vector of a particular item, it traverses all possible
q-vectors and selects the most appropriate one based on the chosen Q-matrix validation method. Since there are
2^{K} - 1 possible q-vectors with K attributes, ESA requires 2^{K} - 1
searches for each item.
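ESA's search space is easy to enumerate in base R. This sketch (illustrative only, not the Qval implementation) builds all candidate q-vectors for K = 3 attributes, excluding the all-zero vector since every item must measure at least one attribute:

```r
## Enumerate the ESA search space for K = 3 attributes.
K <- 3
candidates <- as.matrix(expand.grid(rep(list(0:1), K)))  # 2^K rows
candidates <- candidates[rowSums(candidates) > 0, ]      # drop the zero vector
print(nrow(candidates))  # 2^K - 1 = 7 candidate q-vectors
```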
SSA reduces the number of searches by adding one attribute at a time to the q-vector in a stepwise manner.
Therefore, in the worst-case scenario, SSA requires K(K+1)/2 searches.
The detailed steps are as follows:

1. Define an empty q-vector \mathbf{q}^{0} = [0, 0, \ldots, 0] of length K, where all elements are 0.
2. Examine all single-attribute q-vectors, which are those formed by changing one of the 0s in \mathbf{q}^{0} to 1. According to the criteria of the chosen Q-matrix validation method, select the optimal single-attribute q-vector, denoted \mathbf{q}^{1}.
3. Examine all two-attribute q-vectors, which are those formed by changing one of the 0s in \mathbf{q}^{1} to 1. According to the criteria of the chosen Q-matrix validation method, select the optimal two-attribute q-vector, denoted \mathbf{q}^{2}.
4. Repeat this process until \mathbf{q}^{K} is found, or the stopping criterion
of the chosen Q-matrix validation method is met.
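The stepwise-growth logic of SSA can be sketched as a greedy loop. In this illustrative base-R version (not the Qval implementation), the hypothetical scoring function fit_of stands in for the chosen validation criterion (e.g., PVAF), and the loop stops when adding another attribute no longer improves the score:

```r
## Greedy forward search in the spirit of SSA, with a toy criterion.
K <- 4
true_q <- c(1, 0, 1, 0)

# hypothetical criterion: rewards attributes in true_q, mildly penalizes extras
fit_of <- function(q) sum(q * true_q) - 0.1 * sum(q)

ssa <- function(K, fit_of, tol = 0) {
  q <- rep(0, K)
  repeat {
    zeros <- which(q == 0)
    if (length(zeros) == 0) break
    # try turning each remaining 0 into a 1, keep the best candidate
    trial_fits <- sapply(zeros, function(k) { q2 <- q; q2[k] <- 1; fit_of(q2) })
    if (max(trial_fits) <= fit_of(q) + tol) break   # stopping criterion
    q[zeros[which.max(trial_fits)]] <- 1
  }
  q
}

print(ssa(K, fit_of))  # recovers c(1, 0, 1, 0)
```

At most K + (K-1) + ... + 1 = K(K+1)/2 candidate evaluations occur, matching the worst case stated above.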
PAA is a highly efficient and concise algorithm that evaluates whether each attribute needs to be included in the
q-vector based on the priority of the attributes. @seealso get.priority
. Therefore, even in
the worst-case scenario, PAA only requires K searches. The detailed process is as follows:

1. Use an applicable CDM (e.g., the G-DINA model) to estimate the model parameters and obtain the marginal attribute mastery probabilities matrix (alpha.P).
2. Use LASSO regression to calculate the priority of each attribute in the q-vector for item i.
3. Check whether each attribute is included in the optimal q-vector based on the attribute priorities, from high to low seriatim, and output the final suggested q-vector according to the criteria of the chosen Q-matrix validation method.
The calculation of priorities is straightforward (Qin & Guo, 2025): the priority of an attribute is the regression coefficient obtained from a LASSO multinomial logistic regression, with the attribute as the independent variable and the response data from the examinees as the dependent variable. The formula (Tu et al., 2022) is as follows:

\log \frac{P(Y_{pi} = 1 \mid \boldsymbol{\Lambda}_{p})}{1 - P(Y_{pi} = 1 \mid \boldsymbol{\Lambda}_{p})} = \beta_0 + \boldsymbol{\beta}^{\prime} \boldsymbol{\Lambda}_{p}

where Y_{pi} represents the response of examinee p on item i,
\boldsymbol{\Lambda}_{p} denotes the marginal mastery probabilities of examinee p
(which can be obtained from the return value
alpha.P
of the CDM
function),
\beta_0 is the intercept term, and \boldsymbol{\beta} represents the vector of regression coefficients.
The LASSO loss function can be expressed as:

l_{lasso}(\boldsymbol{\beta}) = l(\boldsymbol{\beta}) - \lambda \lVert \boldsymbol{\beta} \rVert_1

where l_{lasso}(\boldsymbol{\beta}) is the penalized likelihood, l(\boldsymbol{\beta}) is the original likelihood,
and \lambda is the tuning parameter for penalization (a larger value imposes a stronger penalty on \boldsymbol{\beta}).
The priority for attribute k
is defined as its estimated LASSO regression coefficient \beta_k.
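The priority computation can be sketched in base R. The Qval package uses a LASSO-penalized regression (via its get.MLRlasso helper); to keep this sketch dependency-free, an ordinary unpenalized glm is used as a stand-in, so the "priorities" below are plain logistic regression coefficients rather than LASSO estimates, and all data are simulated:

```r
## Attribute priorities via (unpenalized) logistic regression -- a stand-in
## for the LASSO step; all quantities here are simulated.
set.seed(1)
N <- 300
K <- 3

# hypothetical marginal mastery probabilities (the role of alpha.P)
Lambda <- matrix(runif(N * K), N, K)

# simulate responses to one item that depends on attributes 1 and 3 only
eta <- -1 + 2 * Lambda[, 1] + 0 * Lambda[, 2] + 2 * Lambda[, 3]
Y   <- rbinom(N, 1, plogis(eta))

fit <- glm(Y ~ Lambda, family = binomial())
priority <- coef(fit)[-1]   # drop the intercept; one coefficient per attribute
print(round(priority, 2))
```

Attributes that actually drive the item response receive clearly larger coefficients (higher priority) than the irrelevant attribute 2, which is the ordering PAA exploits.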
It should be noted that the Wald method proposed by Ma and de la Torre (2020) uses a "stepwise"
search approach.
This approach involves incrementally adding or removing 1 from the q-vector and evaluating the significance of
the change using the Wald test:
1. If removing a 1 results in non-significance (indicating that the 1 is unnecessary), the 1 is removed from the q-vector;
otherwise, the q-vector remains unchanged.
2. If adding a 1 results in significance (indicating that the 1 is necessary), the 1 is added to the q-vector;
otherwise, the q-vector remains unchanged.
The process stops when the q-vector no longer changes or when the PVAF reaches the preset cut-off point (i.e., 0.95).
Stepwise is a search approach unique to the Wald method, and users should be aware of this. Since stepwise is
inefficient and differs significantly from the extremely high efficiency of PAA, the Qval
package also provides PAA
for q-vector search in the Wald method. When applying the PAA version of the Wald method, the search still
examines whether each attribute is necessary (by checking if the Wald test reaches significance after adding the attribute)
according to attribute priority. The search stops when no further necessary attributes are found or when the
PVAF reaches the preset cut-off point (i.e., 0.95). The "forward" search approach is another search method
available for the Wald method, which is equivalent to "SSA"
. When "Wald"
uses search.method = "SSA"
,
it means that the Wald method is employing the forward search approach. Its basic process is the same as 'stepwise'
,
except that it does not remove elements from the q-vector. Therefore, the "forward" search approach is essentially equivalent to SSA.
Please note that, since the beta method essentially selects q-vectors based on AIC,
even without using the iterative process,
the beta method requires multiple parameter estimations to obtain the AIC values for different q-vectors.
Therefore, the beta method is more time-consuming and computationally intensive compared to the other methods.
Li and Chen (2024) introduced a specialized search approach for the beta method, which is referred to as the
beta search (search.method = 'beta'
). This reduces the number of searches relative to ESA, and
the specific steps are as follows:

1. For item i, sequentially examine the \beta values for each single-attribute q-vector, and select the largest \beta and the smallest \beta, along with the corresponding attributes. (K searches)
2. Add all possible q-vectors containing the attribute with the largest \beta and not containing the attribute with the smallest \beta to the search space, and unconditionally add the saturated q-vector to the search space to ensure that it is tested.
3. Select the q-vector with the minimum AIC from the search space as the final output of the beta method. (The remaining searches)
The Qval
package also provides three search methods, ESA, SSA, and PAA, for the beta method.
When the beta method applies these three search methods, Q-matrix validation can be completed without
calculating any \beta values, as the beta method essentially uses
AIC
for selecting q-vectors.
For example, when applying ESA, the beta method does not need to perform Step 1 of the beta search
and only needs to include all possible q-vectors (a total of 2^{K} - 1) in the search space, then outputs
the corresponding q-vector based on the minimum AIC. When applying SSA or PAA, the beta method also
does not require any calculation of \beta values. In this case, the beta method is consistent
with the Q-matrix validation process described by Chen et al. (2013) using relative fit indices. Therefore, when
the beta method does not use the beta search, it is equivalent to the method of Chen et al. (2013).
To better implement Chen et al. (2013)'s Q-matrix validation method using relative fit indices, the
Qval
package also provides BIC, CAIC, and SABIC as alternatives to validate q-vectors, in addition
to AIC.
Haijiang Qin <[email protected]>
Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and Absolute Fit Evaluation in Cognitive Diagnosis Modeling. Journal of Educational Measurement, 50(2), 123-140. DOI: 10.1111/j.1745-3984.2012.00185.x
de la Torre, J., & Chiu, C. Y. (2016). A General Method of Empirical Q-matrix Validation. Psychometrika, 81(2), 253-273. DOI: 10.1007/s11336-015-9467-8.
de la Torre, J. (2008). An Empirically Based Method of Q-Matrix Validation for the DINA Model: Development and Applications. Journal of Education Measurement, 45(4), 343-362. DOI: 10.1111/j.1745-3984.2008.00069.x.
Li, J., & Chen, P. (2024). A new Q-matrix validation method based on signal detection theory. British Journal of Mathematical and Statistical Psychology, 00, 1–33. DOI: 10.1111/bmsp.12371
Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46, 340–364. DOI: 10.1080/00273171.2011.564527.
Ma, W., & de la Torre, J. (2020). An empirical Q-matrix validation method for the sequential generalized DINA model. British Journal of Mathematical and Statistical Psychology, 73(1), 142-163. DOI: 10.1111/bmsp.12156.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in economics (pp. 105–142). New York, NY: Academic Press.
Najera, P., Sorrel, M. A., & Abad, F. J. (2019). Reconsidering Cutoff Points in the General Method of Empirical Q-Matrix Validation. Educational and Psychological Measurement, 79(4), 727-753. DOI: 10.1177/0013164418822700.
Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020). Improving Robustness in Q-Matrix Validation Using an Iterative and Dynamic Procedure. Applied Psychological Measurement, 44(6), 431-446. DOI: 10.1177/0146621620909904.
Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology, 74 Suppl 1, 110-130. DOI: 10.1111/bmsp.12228.
Qin, H., & Guo, L. (2025). Priority attribute algorithm for Q-matrix validation: A didactic. Behavior Research Methods, 57(1), 31. DOI: 10.3758/s13428-024-02547-5.
Terzi, R., & de la Torre, J. (2018). An Iterative Method for Empirically-Based Q-Matrix Validation. International Journal of Assessment Tools in Education, 248-262. DOI: 10.21449/ijate.40719.
Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.
################################################################
#                          Example 1                           #
#             The GDI method to validate Q-matrix              #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

## use MMLE/EM to fit the CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## use the fitted CDM.obj to avoid extra parameter estimation
Q.GDI.obj <- validation(example.data$dat, example.MQ, example.CDM.obj,
                        method = "GDI")

## the Q-matrix can also be validated directly
Q.GDI.obj <- validation(example.data$dat, example.MQ)

## item-level iteration
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        iter.level = "item", maxitr = 150)

## search method
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        search.method = "ESA")

## cut-off point
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        eps = 0.90)

## check QRR
print(zQRR(example.Q, Q.GDI.obj$Q.sug))

################################################################
#                          Example 2                           #
#            The Wald method to validate Q-matrix              #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

## use MMLE/EM to fit the CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## use the fitted CDM.obj to avoid extra parameter estimation
Q.Wald.obj <- validation(example.data$dat, example.MQ, example.CDM.obj,
                         method = "Wald")

## the Q-matrix can also be validated directly
Q.Wald.obj <- validation(example.data$dat, example.MQ, method = "Wald")

## check QRR
print(zQRR(example.Q, Q.Wald.obj$Q.sug))

################################################################
#                          Example 3                           #
#            The Hull method to validate Q-matrix              #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

## use MMLE/EM to fit the CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## use the fitted CDM.obj to avoid extra parameter estimation
Q.Hull.obj <- validation(example.data$dat, example.MQ, example.CDM.obj,
                         method = "Hull")

## the Q-matrix can also be validated directly
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull")

## change PVAF to R2 as the fit index
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull",
                         criter = "R2")

## check QRR
print(zQRR(example.Q, Q.Hull.obj$Q.sug))

################################################################
#                          Example 4                           #
#           The MLR-B method to validate Q-matrix              #
################################################################
set.seed(123)
library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

## use MMLE/EM to fit the CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## use the fitted CDM.obj to avoid extra parameter estimation
Q.MLR.obj <- validation(example.data$dat, example.MQ, example.CDM.obj,
                        method = "MLR-B")

## the Q-matrix can also be validated directly
Q.MLR.obj <- validation(example.data$dat, example.MQ, method = "MLR-B")

## check QRR
print(zQRR(example.Q, Q.MLR.obj$Q.sug))
This function performs the Wald test for any two q-vectors of a given item in the Q-matrix, provided
that the two q-vectors differ by only one attribute. It requires a fitted CDM.obj
as input.
Wald.test(CDM.obj, q1, q2, i = 1)
CDM.obj |
An object of class CDM.obj. |
q1 |
A q-vector |
q2 |
Another q-vector |
i |
The item to be validated. |
An object of class htest
containing the following components:

- The Wald test statistic.
- The degrees of freedom for the Wald statistic.
- The p-value of the test.
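For intuition, the statistic has the standard Wald quadratic form W = (R b)' (R V R')^{-1} (R b), referred to a chi-square distribution with degrees of freedom equal to the rank of the restriction matrix R. A minimal base-R sketch of this generic form (not Qval's internal implementation; `beta`, `V`, and `R` below are made-up numbers for illustration only):

```r
# Hypothetical item-parameter estimates and covariance (illustration only)
beta <- c(0.10, 0.45, 0.40)            # estimated item parameters
V    <- diag(c(0.002, 0.004, 0.004))   # their estimated covariance matrix
R    <- matrix(c(0, 1, -1), nrow = 1)  # restriction: beta[2] = beta[3]

# Wald statistic, degrees of freedom, and p-value
W  <- t(R %*% beta) %*% solve(R %*% V %*% t(R)) %*% (R %*% beta)
df <- qr(R)$rank
p  <- pchisq(as.numeric(W), df, lower.tail = FALSE)
print(c(W = as.numeric(W), df = df))   # W = 0.3125, df = 1
```

A small W (relative to the chi-square critical value) indicates no significant difference between the two parameterizations, which in the Q-matrix context favors the simpler q-vector.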
library(Qval)
set.seed(123)

K <- 3
I <- 20
N <- 500
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
Q <- sim.Q(K, I)
data <- sim.data(Q = Q, N = N, IQ = IQ, model = "GDINA", distribute = "horder")
CDM.obj <- CDM(data$dat, Q)

q1 <- c(1, 0, 0)
q2 <- c(1, 1, 0)

## test whether there is a significant difference when the q-vector
## of the 2nd item in the Q-matrix is q1 versus q2
Wald.test.obj <- Wald.test(CDM.obj, q1, q2, i = 2)
print(Wald.test.obj)
Calculate the over-specification rate (OSR)
zOSR(Q.true, Q.sug)
Q.true |
The true Q-matrix. |
Q.sug |
The suggested Q-matrix obtained from validation. |
The OSR is defined as:

$$OSR = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} < q_{ik}^{s})}{I \times K}$$

where $q_{ik}^{t}$ denotes the $k$th attribute of item $i$ in the true Q-matrix (Q.true),
$q_{ik}^{s}$ denotes the $k$th attribute of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (OSR index).
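The computation above amounts to counting entries where a true 0 was changed to a 1. A minimal base-R sketch on toy matrices (the matrices below are hypothetical, not Qval output):

```r
# Toy true and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- matrix(c(1, 1,   # this true 0 became 1: over-specified
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)

# OSR: proportion of entries where a true 0 was changed to 1
OSR <- sum(Q.true < Q.sug) / length(Q.true)
print(OSR)  # 1 over-specified entry out of 6, i.e. 1/6
```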
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
OSR <- zOSR(example.Q1, example.Q2)
print(OSR)
Calculate Q-matrix recovery rate (QRR)
zQRR(Q.true, Q.sug)
Q.true |
The true Q-matrix. |
Q.sug |
The suggested Q-matrix obtained from validation. |
The Q-matrix recovery rate (QRR) provides information on overall performance, and is defined as:

$$QRR = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} = q_{ik}^{s})}{I \times K}$$

where $q_{ik}^{t}$ denotes the $k$th attribute of item $i$ in the true Q-matrix (Q.true),
$q_{ik}^{s}$ denotes the $k$th attribute of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (QRR index).
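The QRR is simply the element-wise agreement rate between the two matrices. A minimal base-R sketch on toy matrices (the matrices below are hypothetical, not Qval output):

```r
# Toy true and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- matrix(c(1, 1,   # entry [1, 2] disagrees
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)

# QRR: proportion of entries on which the two Q-matrices agree
QRR <- mean(Q.true == Q.sug)
print(QRR)  # 5 matching entries out of 6, i.e. 5/6
```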
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
QRR <- zQRR(example.Q1, example.Q2)
print(QRR)
Calculate true-negative rate (TNR)
zTNR(Q.true, Q.orig, Q.sug)
Q.true |
The true Q-matrix. |
Q.orig |
The original Q-matrix to be validated. |
Q.sug |
The suggested Q-matrix obtained from validation. |
TNR is defined as the proportion of correct elements which are correctly retained:

$$TNR = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} = q_{ik}^{s} \mid q_{ik}^{t} = q_{ik}^{o})}{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} = q_{ik}^{o})}$$

where $q_{ik}^{t}$ denotes the $k$th attribute of item $i$ in the true Q-matrix (Q.true),
$q_{ik}^{o}$ denotes the $k$th attribute of item $i$ in the original Q-matrix (Q.orig),
$q_{ik}^{s}$ denotes the $k$th attribute of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (TNR index).
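In words: among the entries that were already correct in the original Q-matrix, what share did validation leave correct? A minimal base-R sketch on toy matrices (all three matrices below are hypothetical, not Qval output):

```r
# Toy true, original, and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.orig <- matrix(c(1, 1,   # entry [1, 2] is misspecified
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- matrix(c(1, 0,
                   1, 1,
                   1, 1), nrow = 3, byrow = TRUE)  # entry [3, 1] wrongly changed

correct  <- Q.true == Q.orig   # entries correct in the original (5 of 6)
retained <- Q.true == Q.sug    # entries correct after validation
TNR <- sum(correct & retained) / sum(correct)
print(TNR)  # 4 of the 5 originally correct entries survive, i.e. 0.8
```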
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
example.Q3 <- sim.MQ(example.Q1, 0.05)
TNR <- zTNR(example.Q1, example.Q2, example.Q3)
print(TNR)
Calculate true-positive rate (TPR)
zTPR(Q.true, Q.orig, Q.sug)
Q.true |
The true Q-matrix. |
Q.orig |
The original Q-matrix to be validated. |
Q.sug |
The suggested Q-matrix obtained from validation. |
TPR is defined as the proportion of misspecified elements which are correctly recovered:

$$TPR = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} = q_{ik}^{s} \mid q_{ik}^{t} \neq q_{ik}^{o})}{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} \neq q_{ik}^{o})}$$

where $q_{ik}^{t}$ denotes the $k$th attribute of item $i$ in the true Q-matrix (Q.true),
$q_{ik}^{o}$ denotes the $k$th attribute of item $i$ in the original Q-matrix (Q.orig),
$q_{ik}^{s}$ denotes the $k$th attribute of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (TPR index).
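In words: among the entries that were misspecified in the original Q-matrix, what share did validation fix? A minimal base-R sketch on toy matrices (all three matrices below are hypothetical, not Qval output):

```r
# Toy true, original, and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.orig <- matrix(c(1, 1,   # entry [1, 2] is misspecified
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- Q.true           # validation recovered the true Q-matrix

miss      <- Q.true != Q.orig  # originally misspecified entries (1 of 6)
recovered <- Q.true == Q.sug   # entries correct after validation
TPR <- sum(miss & recovered) / sum(miss)
print(TPR)  # the single misspecified entry was fixed, i.e. 1
```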
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
example.Q3 <- sim.MQ(example.Q1, 0.05)
TPR <- zTPR(example.Q1, example.Q2, example.Q3)
print(TPR)
Calculate the under-specification rate (USR)
zUSR(Q.true, Q.sug)
Q.true |
The true Q-matrix. |
Q.sug |
The suggested Q-matrix obtained from validation. |
The USR is defined as:

$$USR = \frac{\sum_{i=1}^{I}\sum_{k=1}^{K}I(q_{ik}^{t} > q_{ik}^{s})}{I \times K}$$

where $q_{ik}^{t}$ denotes the $k$th attribute of item $i$ in the true Q-matrix (Q.true),
$q_{ik}^{s}$ denotes the $k$th attribute of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (USR index).
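The computation mirrors the OSR, but counts entries where a true 1 was dropped to 0. A minimal base-R sketch on toy matrices (the matrices below are hypothetical, not Qval output):

```r
# Toy true and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- matrix(c(1, 0,
                   1, 0,   # this true 1 was dropped: under-specified
                   0, 1), nrow = 3, byrow = TRUE)

# USR: proportion of entries where a true 1 was changed to 0
USR <- sum(Q.true > Q.sug) / length(Q.true)
print(USR)  # 1 under-specified entry out of 6, i.e. 1/6
```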
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
USR <- zUSR(example.Q1, example.Q2)
print(USR)
Calculate vector recovery ratio (VRR)
zVRR(Q.true, Q.sug)
Q.true |
The true Q-matrix. |
Q.sug |
The suggested Q-matrix obtained from validation. |
The VRR shows the ability of the validation method to recover q-vectors, and is determined by

$$VRR = \frac{\sum_{i=1}^{I} I(\boldsymbol{q}_{i}^{t} = \boldsymbol{q}_{i}^{s})}{I}$$

where $\boldsymbol{q}_{i}^{t}$ denotes the q-vector of item $i$ in the true Q-matrix (Q.true),
$\boldsymbol{q}_{i}^{s}$ denotes the q-vector of item $i$ in the suggested Q-matrix (Q.sug),
and $I(\cdot)$ is the indicator function.
A numeric (VRR index).
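Unlike the element-wise QRR, the VRR scores each item's whole q-vector as a unit: one mismatched attribute spoils the entire row. A minimal base-R sketch on toy matrices (the matrices below are hypothetical, not Qval output):

```r
# Toy true and suggested Q-matrices (3 items x 2 attributes)
Q.true <- matrix(c(1, 0,
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)
Q.sug  <- matrix(c(1, 1,   # item 1's q-vector differs in one attribute
                   1, 1,
                   0, 1), nrow = 3, byrow = TRUE)

# VRR: proportion of items whose entire q-vector is recovered
VRR <- mean(apply(Q.true == Q.sug, 1, all))
print(VRR)  # 2 of 3 q-vectors fully match, i.e. 2/3
```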
library(Qval)
set.seed(123)

example.Q1 <- sim.Q(5, 30)
example.Q2 <- sim.MQ(example.Q1, 0.1)
VRR <- zVRR(example.Q1, example.Q2)
print(VRR)