<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Sequential Gibbs Sampling Algorithm for Cognitive Diagnosis Models with Many Attributes</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10251914</idno>
					<idno type="doi">10.1080/00273171.2021.1896352</idno>
					<title level='j'>Multivariate Behavioral Research</title>
<idno>0027-3171</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Juntao Wang</author><author>Ningzhong Shi</author><author>Xue Zhang</author><author>Gongjun Xu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Cognitive diagnosis models (CDMs) are useful statistical tools to provide rich information relevant for intervention and learning. As a popular approach to estimate and make inference of CDMs, the Markov chain Monte Carlo (MCMC) algorithm is widely used in practice. However, when the number of attributes, K, is large, the existing MCMC algorithm may become time-consuming, due to the fact that O(2 K ) calculations are usually needed in the process of MCMC sampling to get the conditional distribution for each attribute profile. To overcome this computational issue, motivated by Culpepper and Hudson (2018), we propose a computationally efficient sequential Gibbs sampling method, which needs O(K) calculations to sample each attribute profile. We use simulation and real data examples to show the good finite-sample performance of the proposed sequential Gibbs sampling, and its advantage over existing methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, cognitive diagnosis models (CDMs) have gained great achievements in educational and psychological assessments, where latent binary random vectors are often assumed to represent the presence or absence of multiple fine-grained skills or attributes. The CDMs can be viewed as a family of restricted latent class models, with the goal of achieving personalized diagnostic classification. Compared with the Item Response Theory (IRT) models, the CDMs can provide more informative feedbacks on attribute profiles and allow for the design of more effective intervention strategies <ref type="bibr">(Rupp, Templin, &amp; Henson, 2010)</ref>.</p><p>Many CDMs have been proposed in the literature. An incomplete list contains the Deterministic Input, Noisy "And" gate and Noisy Inputs, Deterministic "And" gate models <ref type="bibr">(DINA and NIDA;</ref><ref type="bibr">Haertel, 1989;</ref><ref type="bibr">Junker &amp; Sijtsma, 2001)</ref>, the reduced version of the Reparameterized Unified Model (rRUM; <ref type="bibr">Hartz, 2002;</ref><ref type="bibr">Rupp et al., 2010)</ref>, the Deterministic Input, Noisy "Or" gate and Noisy Inputs, Deterministic "Or" gate models (DINO and NIDO; <ref type="bibr">Templin &amp; Henson, 2006)</ref>, the general diagnostic model <ref type="bibr">(GDM;</ref><ref type="bibr">von Davier, 2005)</ref>, the log-linear cognitive diagnosis model (LCDM; <ref type="bibr">Henson, Templin, &amp; Willse, 2009)</ref>, and the generalized DINA model (GDINA; de la Torre, 2011).</p><p>To estimate the CDM parameters and perform classification of examinees, the Bayesian MCMC method is one popular approach, as it will not only provide the point estimation but also the whole posterior distributional information for statistical inferences. 
In the Bayesian framework, the MCMC algorithm is used to generate a Markov chain whose unique stationary distribution is the target posterior distribution of the parameters of interest.</p><p>The MCMC algorithm provides a useful tool to solve many complicated problems in statistics and psychometrics. In the CDM literature, the Bayesian MCMC estimation of CDMs has also been studied. For instance, under the confirmatory setting with the Q-matrix prespecified, <ref type="bibr">Culpepper (2015)</ref> proposed an efficient Gibbs sampler for the DINA model, in which all parameters were sampled from their full conditional distributions. <ref type="bibr">Chung (2014</ref>, <ref type="bibr">2019)</ref> estimated the DINA and rRUM models in the Bayesian framework using a Gibbs sampling algorithm. <ref type="bibr">Culpepper and Hudson (2018)</ref> further proposed a Bayesian sequential Gibbs sampler for the rRUM, which samples each latent attribute sequentially from the corresponding conditional Bernoulli distribution. In addition, the software "JAGS" has also been used to fit many common CDMs (e.g., <ref type="bibr">Zhan, Jiao, Man, &amp; Wang, 2019)</ref>. In the exploratory CDM setting, Bayesian methods have also been used to estimate the model parameters and the Q-matrix jointly under identifiability conditions. For instance, <ref type="bibr">Chen, Culpepper, Chen, and Douglas (2018)</ref> proposed an easily implemented MCMC algorithm through a data augmentation strategy and item parameter reparameterizations. Following <ref type="bibr">Culpepper and Hudson (2018)</ref>, <ref type="bibr">Culpepper and Chen (2019)</ref> proposed a similar sequential sampler to estimate the Q-matrix under the exploratory rRUM.</p><p>In modern psychological, educational and medical applications of CDMs, large-scale data, with large numbers of manifest attributes of interest (denoted by K), are often collected. 
In many applications, the number of the corresponding latent classes, 2^K, could become comparable to or even larger than the number of examinees N. Examples with a large number of latent classes can be found in educational assessment <ref type="bibr">(Lee, Park, &amp; Taylan, 2011)</ref> and medical diagnosis <ref type="bibr">(Wu, Deloria-Knoll, &amp; Zeger, 2016)</ref>. The increasing dimension of attributes and items often causes high computational cost and therefore introduces new challenges for the estimation and inference of the CDMs.</p><p>In this paper, we focus on improving the MCMC with the Gibbs sampling in the setting of many latent attributes under the confirmatory CDMs with the Q-matrix prespecified.</p><p>Existing MCMC algorithms often directly sample from the posterior distribution of each latent attribute profile (see <ref type="bibr">Zhan et al., 2019)</ref>, with the whole attribute profile treated as one random sample from a categorical distribution with 2^K different categories. Therefore, in order to sample one attribute profile, one needs to evaluate the posterior probabilities of all 2^K candidate profiles. The corresponding computational overhead for sampling each individual's attribute profile is of the order O(2^K). For a large K, this would lead to a significant computational burden and also affect the convergence of the MCMC algorithm. Alternatively, <ref type="bibr">Zhan, Li, Wang, Bian, and Wang (2015)</ref> proposed to model the attributes as independent variables by introducing an independent Bernoulli prior for each attribute; the corresponding sampling method is referred to as independent Gibbs sampling in this article. 
Without modeling the dependence among the attributes, the computational cost of sampling each attribute profile in the independent Gibbs sampling is then reduced to O(K). However, the independence assumption on the attributes is often too strong to hold in practice.</p><p>Since the computational difficulty for large K mainly arises from the sampling of the attribute profiles, we follow the novel idea of the sequential Gibbs sampler proposed in <ref type="bibr">Culpepper and Hudson (2018)</ref> to develop an efficient sequential Gibbs sampling method. This work extends <ref type="bibr">Culpepper and Hudson (2018)</ref>, which focuses on the rRUM model, to the more general GDINA model under the high-dimensional setting with many attributes; such a high-dimensional setting arises in many applications but the related estimation challenge has not been addressed. Following <ref type="bibr">Culpepper and Hudson (2018)</ref>, the sequential sampler samples each attribute separately instead of sampling the attribute profile as a whole, and consequently, the computational overhead of sampling attribute profiles is greatly reduced from O(2^K) to O(K). For a large K, the improvement is especially significant, as shown in the simulation studies.</p><p>The rest of the paper is organized as follows. In Section 2, we give an overview of the CDMs and a Bayesian formulation for the estimation. Section 3 introduces the proposed sequential Gibbs sampling, with a focus on the estimation of the GDINA model as a general version of CDMs. The simulations and real data analyses are shown in Section 4 and Section 5, respectively. A discussion is given in Section 6. The supplementary materials include more details for the proposed algorithm. Source code of the proposed method will be made publicly available upon the acceptance of this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Bayesian GDINA Model</head><p>This section focuses on the GDINA model as the general framework for CDMs, which includes many CDMs as special cases, such as the DINA, DINO, and reduced RUM models <ref type="bibr">(Hartz, 2002;</ref><ref type="bibr">Junker &amp; Sijtsma, 2001;</ref><ref type="bibr">Rupp et al., 2010)</ref>. We present the formulation for the Bayesian GDINA model, which contains the model setup, the item parameter priors and the population parameter prior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The DINA and GDINA Models</head><p>In CDMs, the examinee's responses depend on his/her latent attribute profile, which is a K-dimensional binary vector <formula notation="TeX">\alpha = (\alpha_1, \ldots, \alpha_K)^\top \in \{0, 1\}^K</formula>. Let <formula notation="TeX">Y = (Y_1, \ldots, Y_J)^\top</formula></p><p>represent an examinee's responses to J items. Both &#945; and Y are examinee-specific; a particular examinee i's attribute profile and responses are denoted by &#945; i and Y i for i = 1, &#8226; &#8226; &#8226; , N. The N examinees' attribute profiles are random samples from a population distribution with the probability: &#960; &#945; = P (&#945; i = &#945;), where <formula notation="TeX">\pi_\alpha \in [0, 1]</formula> and <formula notation="TeX">\sum_{\alpha \in \{0,1\}^K} \pi_\alpha = 1</formula>.</p><p>Thus, the population distribution of attribute profiles is characterized by the vector &#960; = (&#960; &#945; , &#945; &#8712; {0, 1}^K ). For notational convenience, for &#945; = (&#945; 1 , &#945; 2 , &#8226; &#8226; &#8226; , &#945; K ) , we will also</p><p>The binary Q-matrix (K. K. <ref type="bibr">Tatsuoka, 1983</ref>) is a key component for CDMs. For each pair of j and k, q jk = 1 indicates that attribute k is required by item j; otherwise q jk = 0, for j = 1, &#8226; &#8226; &#8226; , J. In particular, the jth row vector q j of the Q-matrix corresponds to the attributes required by item j.</p><p>The DINA model <ref type="bibr">(Haertel, 1989;</ref><ref type="bibr">Junker &amp; Sijtsma, 2001</ref>) is one of the simplest, and consequently most restrictive and interpretable, CDMs available for dichotomously scored tests. For a specific examinee with an attribute profile &#945;, we can define the ideal response &#951;(&#945;, q j ) to item j, which depends on &#945; and q j , as</p><p><formula notation="TeX">\eta(\alpha, q_j) = \prod_{k=1}^{K} \alpha_k^{q_{jk}}. \qquad (1)</formula></p><p>For brevity, given examinee i's attribute profile &#945; i , the ideal response &#951;(&#945; i , q j ) can also be written as &#951; ij if the context permits. The &#951; ij is an indicator of whether examinee i masters all the required attributes for item j, which indicates that each item partitions all examinees into two latent groups. 
Let</p><p><formula notation="TeX">g_j = P(Y_{ij} = 1 \mid \eta_{ij} = 0)</formula> and <formula notation="TeX">s_j = P(Y_{ij} = 0 \mid \eta_{ij} = 1)</formula> be the guessing and slipping parameters, respectively. For examinee i and item j, the positive response probability, denoted by &#952; j,&#945; i = P (Y ij = 1|&#945; i ), takes the form</p><p><formula notation="TeX">\theta_{j,\alpha_i} = g_j^{1-\eta_{ij}} (1 - s_j)^{\eta_{ij}}.</formula></p><p>de la Torre (2011) proposed a general framework for CDMs based on the DINA model, called the GDINA model, which characterizes more complex relationships between attribute profiles and response data. In the GDINA model, the positive response probability can be decomposed into the sum of the effects due to the presence of required attributes and their interactions. We let <formula notation="TeX">K_j^* = \sum_{k=1}^{K} q_{jk}</formula> be the number of attributes required by item j, which is determined by the jth row vector q j in the Q-matrix. For a specific examinee with &#945; and item j, we rearrange the structure of attribute profiles, so that the first K * j attributes are the attributes required by item j. The reduced attribute profile for item j consists of the first K * j required attributes, denoted by <formula notation="TeX">\alpha_j^* = (\alpha_{j1}^*, \ldots, \alpha_{jK_j^*}^*)^\top</formula>.</p><p>Similarly to &#945; and Y , there also exists the examinee-specific reduced attribute profile, <formula notation="TeX">\alpha_{ij}^*</formula></p><p>. Given &#945; * j , the item response probability of item j is modeled as</p><p><formula notation="TeX">h(\theta_{j,\alpha_j^*}) = \lambda_{j0} + \sum_{k=1}^{K_j^*} \lambda_{jk} \alpha_{jk}^* + \sum_{k=1}^{K_j^*-1} \sum_{k'=k+1}^{K_j^*} \lambda_{jkk'} \alpha_{jk}^* \alpha_{jk'}^* + \cdots + \lambda_{j12 \cdots K_j^*} \prod_{k=1}^{K_j^*} \alpha_{jk}^*, \qquad (3)</formula></p><p>where &#952; j,&#945; * j = P (Y ij = 1|&#945; * j ) represents the positive response probability of the examinees with the reduced &#945; * j to item j, h(&#8226;) is the link function, for which the probit, identity, log and logit links are commonly employed, &#955; j0 is the intercept, &#955; jk is the main effect corresponding to &#945; * jk , &#955; jkk' is the two-way interaction corresponding to &#945; * jk and &#945; * jk' , . . . , &#955; j12&#8226;&#8226;&#8226;K * j is the K * j -way interaction corresponding to all required attributes. We let <formula notation="TeX">\lambda_j = (\lambda_{j0}, \lambda_{j1}, \ldots, \lambda_{jK_j^*}, \lambda_{j12}, \ldots, \lambda_{j12 \cdots K_j^*})^\top</formula></p><p>represent the item parameters for item j and &#955; = (&#955; 1 , &#8226; &#8226; &#8226; , &#955; J ) represent the item parameters for all items. The number of item parameters is determined by the structure of the Q-matrix. 
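Specifically, for item j, the number of item parameters is 2^(K * j ). In this paper, we shall focus on the probit link function, whereas the proposed method can be applied to other link functions as well.</p><p>Under the GDINA model, each item j can divide examinees into 2^(K * j ) latent groups.</p><p>Because &#945; * j is a sub-vector of &#945;, we should notice that &#952; j,&#945; = &#952; j,&#945; * j . For Equation (3), we can use vector notation to rewrite the positive response probability as follows</p><p><formula notation="TeX">h(\theta_{j,\alpha_j^*}) = X_{\alpha_j^*} \lambda_j,</formula></p><p>where <formula notation="TeX">X_{\alpha_j^*} = (1, \alpha_{j1}^*, \ldots, \alpha_{jK_j^*}^*, \alpha_{j1}^* \alpha_{j2}^*, \ldots, \prod_{k=1}^{K_j^*} \alpha_{jk}^*)</formula> is the row vector collecting the intercept, the required attributes and their interactions.</p><p>The GDINA model degenerates to the DINA model by setting all item parameters, except &#955; j0 and &#955; j12&#8226;&#8226;&#8226;K * j , to zero.</p><p>The collection of positive response probabilities is denoted by a J &#215; C matrix &#920; = (&#952; j,&#945; ), which may depend on different forms of item parameters in different CDMs. Given the response data and all attribute profiles, the conditional likelihood function takes the following form:</p><p><formula notation="TeX">p(Y \mid \alpha, \lambda) = \prod_{i=1}^{N} \prod_{j=1}^{J} \theta_{j,\alpha_i}^{Y_{ij}} (1 - \theta_{j,\alpha_i})^{1 - Y_{ij}}. \qquad (4)</formula></p><p>After integrating out the attribute profiles, the marginal likelihood function takes the following form:</p><p><formula notation="TeX">p(Y \mid \pi, \lambda) = \prod_{i=1}^{N} \sum_{\alpha \in \{0,1\}^K} \pi_\alpha \prod_{j=1}^{J} \theta_{j,\alpha}^{Y_{ij}} (1 - \theta_{j,\alpha})^{1 - Y_{ij}}. \qquad (5)</formula></p></div>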
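<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the DINA special case described in this section, the following Python sketch computes the ideal response of Equation (1) and one examinee's factor of the conditional likelihood in Equation (4). It assumes the standard DINA form of the positive response probability, that is, g j when the ideal response is 0 and 1 - s j when it is 1. The function names are hypothetical and are not taken from the authors' source code.</p>
<eg>
```python
from math import prod


def ideal_response(alpha, q_row):
    """Equation (1): eta = prod_k alpha_k ** q_jk (Python evaluates 0 ** 0 as 1)."""
    return prod(a ** q for a, q in zip(alpha, q_row))


def dina_prob(alpha, q_row, guess, slip):
    """P(Y_ij = 1 | alpha) under the assumed standard DINA form."""
    eta = ideal_response(alpha, q_row)
    return (guess ** (1 - eta)) * ((1 - slip) ** eta)


def conditional_likelihood(responses, alpha, q_matrix, guess, slip):
    """One examinee's factor of the conditional likelihood in Equation (4)."""
    like = 1.0
    for y, q_row, g, s in zip(responses, q_matrix, guess, slip):
        theta = dina_prob(alpha, q_row, g, s)
        like *= theta if y == 1 else 1.0 - theta
    return like
```
</eg>
<p>For example, with q j = (110) an examinee with profile (110) has ideal response 1, whereas an examinee with profile (101) has ideal response 0 and can answer correctly only by guessing.</p></div>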
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Priors of Measurement Models' Parameters</head><p>The population proportion parameter &#960; includes the saturated information about the attribute profile distribution in CDMs. The Dirichlet distribution is commonly used as a conjugate prior for &#960;, as in <ref type="bibr">Culpepper (2015)</ref>, <ref type="bibr">Culpepper and Hudson (2018)</ref> and <ref type="bibr">Zhan et al. (2019)</ref>. The specific form of the Dirichlet prior for &#960; is</p><p><formula notation="TeX">\pi \sim \mathrm{Dirichlet}(\delta), \qquad p(\pi) \propto \prod_{c=1}^{C} \pi_c^{\delta_c - 1},</formula></p><p>where &#948; = (&#948; 1 , &#8226; &#8226; &#8226; , &#948; C ) represents a C-dimensional hyper-parameter vector for &#960;.</p><p>In different CDMs, item parameters are presented in different forms, such as g j and s j in the DINA model, and &#955; j in the GDINA model. For the DINA model, independent Beta distributions, Beta(a g , b g ) and Beta(a s , b s ), are often used as the priors for guessing and slipping parameters, respectively. We may also constrain 0 &#8804; g j &lt; 1 - s j &#8804; 1 to ensure the model identifiability <ref type="bibr">(Chen et al., 2018;</ref><ref type="bibr">Gu &amp; Xu, 2019;</ref><ref type="bibr">Junker &amp; Sijtsma, 2001;</ref><ref type="bibr">Xu &amp; Zhang, 2016)</ref>. For the GDINA model, normal distributions are often taken as priors for the item parameters &#955; (e.g., <ref type="bibr">Zhan et al., 2019)</ref>. Specifically, two types of priors are often chosen: one is a multivariate normal distribution, &#955; j &#8764; N(&#181; &#955; j , &#931; &#955; j ), as a general choice; the other is a truncated multivariate normal distribution, &#955; j &#8764; N(&#181; &#955; j , &#931; &#955; j )I {&#955; j &#8712;T } , which is used to ensure a certain monotonicity assumption of the item response function. Here</p><p>, and each T m represents some pre-specified constraint of the m-th element of &#955; j . 
For instance, we may restrict the main effect terms in &#955; j to be positive to ensure the monotonicity assumption.</p><p>As discussed in the introduction, for a large K, the existing MCMC algorithms with a Dirichlet prior for &#960; often suffer from the increasing computational cost of sampling each latent attribute profile &#945; i from its conditional distribution, which is a categorical distribution with 2^K different categories. Therefore, sampling each &#945; i requires evaluating the posterior probabilities of all 2^K candidate profiles, and the corresponding computational overhead for sampling &#945; i is O(2^K). For a large K, this would lead to a significant computational burden and also affect the convergence of the MCMC algorithm.</p></div>
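<div xmlns="http://www.tei-c.org/ns/1.0"><p>The O(2^K) cost discussed above can be made concrete with a minimal Python sketch of one simultaneous draw of an attribute profile under the DINA model. The sketch is illustrative only: the function name is hypothetical, the class ordering (the first attribute is the lowest-order bit of the class index) is an assumption, and all guessing and slipping values are assumed to leave at least one profile with positive weight.</p>
<eg>
```python
import random
from math import prod


def simultaneous_draw(responses, q_matrix, guess, slip, pi, rng):
    """Draw a whole profile from its categorical full conditional.

    Every one of the 2**K candidate profiles needs its own weight,
    which is the source of the O(2**K) cost per examinee.
    """
    n_attr = len(q_matrix[0])
    # Assumed ordering: class c has alpha_k equal to bit k of c (alpha_1 lowest).
    profiles = [tuple((c // 2 ** k) % 2 for k in range(n_attr))
                for c in range(2 ** n_attr)]
    weights = []
    for profile, prior in zip(profiles, pi):
        like = 1.0
        for y, q_row, g, s in zip(responses, q_matrix, guess, slip):
            eta = prod(a ** q for a, q in zip(profile, q_row))
            theta = (1 - s) if eta == 1 else g
            like *= theta if y == 1 else 1 - theta
        weights.append(prior * like)  # unnormalized posterior weight
    drawn = rng.choices(profiles, weights=weights, k=1)[0]
    return drawn, len(weights)
```
</eg>
<p>With K = 15, the list of weights already has 32768 entries for every examinee and every iteration.</p></div>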
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Sequential Gibbs Sampling</head><p>In this section, we introduce the sequential Gibbs sampling method, which samples each attribute separately and is computationally efficient for large K. The sequential Gibbs sampling algorithm will be derived for the GDINA model. It's natural to apply the sequential Gibbs sampling to other CDMs, and an example of the DINA model is given in Appendix A.</p></div>
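<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Motivation</head><p>With the commonly used Dirichlet prior for &#960;, many existing Gibbs sampling methods suggest that the full conditional distribution for &#945; takes the following form:</p><p><formula notation="TeX">p(\alpha_c \mid *) = \frac{\pi_c \, p(Y \mid \alpha_c, \lambda)}{\sum_{c'=1}^{C} \pi_{c'} \, p(Y \mid \alpha_{c'}, \lambda)},</formula></p><p>where the " * " represents all the other parameters and responses. To infer a specific examinee's attribute profile, we need to calculate the posterior probability p(&#945; c | * ) for c = 1, &#8226; &#8226; &#8226; , C to obtain the posterior distribution. For large K, the computation is challenging.</p><p>Since in this Gibbs sampling method the whole &#945; is sampled simultaneously, hereafter this sampling method is referred to as the simultaneous Gibbs sampling.</p><p>Following <ref type="bibr">Culpepper and Hudson (2018)</ref>, we first describe the sequential sampling method for the attributes. Let &#945; \k denote the sub-vector of &#945; excluding the k-th attribute.</p><p>Since knowing &#945; \k and &#945; k is equivalent to knowing the attribute profile &#945;, we have p(&#945;|&#960;) = p(&#945; \k , &#945; k |&#960;). According to Bayes' theorem, given &#945; \k and &#960;, the conditional probability of &#945; k is</p><p><formula notation="TeX">p(\alpha_k \mid \alpha_{\setminus k}, \pi) = \frac{p(\alpha_{\setminus k}, \alpha_k \mid \pi)}{\sum_{a \in \{0,1\}} p(\alpha_{\setminus k}, \alpha_k = a \mid \pi)}. \qquad (7)</formula></p><p>Considering Equation (<ref type="formula">7</ref>), the full conditional distribution for &#945; k is calculated from</p><p><formula notation="TeX">p(\alpha_k \mid Y, \alpha_{\setminus k}, \pi, \lambda) \propto p(Y \mid \alpha, \lambda) \, p(\alpha_k \mid \alpha_{\setminus k}, \pi). \qquad (8)</formula></p><p>In Equation (<ref type="formula">8</ref>), p(Y |&#945;, &#955;) is the conditional likelihood function.</p><p>For the second term on the RHS of Equation (<ref type="formula">8</ref>), noticing the binary nature of &#945; k &#8712;{0, 1}, we know that conditional on &#945; \k and &#960;,</p><p><formula notation="TeX">\alpha_k \mid \alpha_{\setminus k}, \pi \sim \mathrm{Bernoulli}(p_{k \mid \alpha_{\setminus k}, \pi}), \qquad (9)</formula></p><p>with</p><p><formula notation="TeX">p_{k \mid \alpha_{\setminus k}, \pi} = \frac{\sum_{c=1}^{C} \pi_c \, I\{\alpha_{ck} = 1, \alpha_{c \setminus k} = \alpha_{\setminus k}\}}{\sum_{c=1}^{C} \pi_c \, I\{\alpha_{c \setminus k} = \alpha_{\setminus k}\}}</formula>, where we use the notation &#945; c\k to represent the &#945; vector corresponding to a general latent class c excluding the k-th attribute. 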
</p><p>When there is no ambiguity, we will write p k|&#945; \k ,&#960; and p k|&#945; i\k ,&#960; as p k and p ik in the following.</p><p>Equations (<ref type="formula">8</ref>) and (9) imply a sampling method that can sample the latent attributes sequentially one by one. Without loss of generality, the attributes are sampled in increasing order (i.e., &#945; 1 , &#8226; &#8226; &#8226; , &#945; K are sampled in turn). In Table <ref type="table">1</ref>, an example with three attributes is presented to show how Equation (9) works. An 8-dimensional vector &#960; (i.e., K = 3) is used to represent the saturated population information and &#945; 1 , &#945; 2 and &#945; 3 are generated in turn. The first two rows show a one-to-one mapping between &#945; and &#960;.</p><p>As in Gibbs sampling, an initial value of the attribute profile is needed as the starting point. Without loss of generality, let the initial value of &#945; be (000). To sample the first attribute &#945; 1 , &#945; 2 = &#945; 3 = 0 is used in Equation (<ref type="formula">9</ref>), so that &#945; 1 can be drawn from a Bernoulli distribution with <formula notation="TeX">p_1 = \pi_2 / (\pi_1 + \pi_2)</formula>, which is the prior conditional probability of &#945; 1 = 1 given &#945; 2 = &#945; 3 = 0. Assuming the realization of the first attribute &#945; 1 is 1, we can then sample the second attribute &#945; 2 , conditional on &#945; 1 = 1 and &#945; 3 = 0, from a Bernoulli distribution with <formula notation="TeX">p_2 = \pi_4 / (\pi_2 + \pi_4)</formula> in Table <ref type="table">1</ref>. Assuming the realization of &#945; 2 is 0, we then move on to sample &#945; 3 , conditional on &#945; 1 = 1 and &#945; 2 = 0, from a Bernoulli distribution with <formula notation="TeX">p_3 = \pi_6 / (\pi_2 + \pi_6)</formula> in Table <ref type="table">1</ref>. </p><p>Note. The column "p k " represents the conditional probability of &#945; k = 1. 
The column "Prob" is the probability of the realization of &#945; k shown in the first column (in this table, the realizations are &#945; 1 = 1, &#945; 2 = 0 and &#945; 3 = 1).</p><p>In both the sequential and simultaneous sampling methods, the sampling of attribute profiles depends on &#960; and &#948;. However, in the simultaneous Gibbs sampling, each attribute profile is treated as a basic unit, and the joint information p(&#945;|&#960;) is used to sample &#945;. This method is very slow when K is large. In the sequential Gibbs sampling, each element of the attribute profile is sampled separately from the conditional Bernoulli distribution of &#945; k |&#945; \k , &#960;, which reduces the computational cost significantly.</p></div>
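<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three-attribute example above can be reproduced with a short Python sketch of the prior conditional probability p k in Equation (9). The index mapping is inferred from the worked example (the first entry of &#960; corresponds to (000), the second to (100), the fourth to (110) and the sixth to (101)), and the function names and the numerical values of &#960; are hypothetical.</p>
<eg>
```python
def class_index(alpha):
    """0-based index of a profile in pi; alpha_1 is the lowest-order bit."""
    return sum(a * 2 ** k for k, a in enumerate(alpha))


def prior_conditional(pi, alpha, k):
    """p_k = P(alpha_k = 1 | remaining attributes, pi); k is 0-based.

    Only the two classes that agree with alpha outside position k
    contribute, so each attribute needs two look-ups in pi.
    """
    with_k = list(alpha)
    without_k = list(alpha)
    with_k[k], without_k[k] = 1, 0
    p_one = pi[class_index(with_k)]
    p_zero = pi[class_index(without_k)]
    return p_one / (p_one + p_zero)


# Hypothetical population vector for K = 3 (entries sum to 1).
pi = [0.10, 0.20, 0.05, 0.15, 0.10, 0.20, 0.10, 0.10]
p1 = prior_conditional(pi, (0, 0, 0), 0)  # pi_2 / (pi_1 + pi_2)
p2 = prior_conditional(pi, (1, 0, 0), 1)  # pi_4 / (pi_2 + pi_4)
p3 = prior_conditional(pi, (1, 0, 0), 2)  # pi_6 / (pi_2 + pi_6)
```
</eg>
<p>Sampling all K attributes in this way touches 2K entries of &#960; rather than all 2^K of them.</p></div>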
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Sequential Gibbs Sampling Schedules</head><p>With the sequential sampling method for attributes introduced above, in this section we derive the Gibbs sampling updates for the other model parameters. To illustrate our method, we shall focus on the GDINA model with a probit link function and use the prior settings introduced in Section 2.2.</p><p>We will use a data augmentation strategy to derive a closed-form Gibbs sampling method for the item parameters. Similar sampling methods have been proposed in the CDM literature <ref type="bibr">(Chen, Culpepper, &amp; Liang, 2020;</ref><ref type="bibr">Culpepper, 2019a</ref>, <ref type="bibr">2019b)</ref>. Specifically, we introduce the data augmentation process for the examinee with &#945; to item j as follows</p><p><formula notation="TeX">Z_j = X_{\alpha_j^*} \lambda_j + \varepsilon_j,</formula></p><p>where &#949; j follows a standard normal distribution and Z j is a latent auxiliary variable. The variable Z j is examinee-specific, and the augmented data of item j for examinee i is denoted by <formula notation="TeX">Z_{ij}</formula>.</p><p>With the introduced augmented data Z, the Gibbs sampling needs to sample from the four full conditional distributions: p(Z|Y , &#945;, &#960;, &#955;), p(&#955;|Y , Z, &#945;, &#960;), p(&#945; ik |Y , Z, &#945; i\k , &#960;, &#955;) and p(&#960;|Y , Z, &#945;, &#955;).</p><p>Sample Augmented Data. For examinee i and item j, the augmented data is Z ij .</p><p>Conditional on &#945;, the distribution of Z ij is independent of the parameter &#960;, which means the distributions p(Z|Y , &#945;, &#960;, &#955;) and p(Z|Y , &#945;, &#955;) are equivalent.</p><p>According to the jth row vector q j in the Q-matrix, we can get the reduced vector <formula notation="TeX">\alpha_{ij}^*</formula>.</p><p>The augmented data is generated by the formula</p><p><formula notation="TeX">Z_{ij} \mid Y_{ij}, \alpha_i, \lambda_j \sim N(X_{\alpha_{ij}^*} \lambda_j, 1) \, I\{Z_{ij} > 0\}^{Y_{ij}} \, I\{Z_{ij} \le 0\}^{1 - Y_{ij}}. \qquad (10)</formula></p><p>Sample Item Parameters. In the GDINA model, the two types of item parameter priors considered are the multivariate normal distribution and the truncated multivariate normal distribution, which induce two methods to sample item parameters. 
The truncated prior is suitable when some constraints on the item parameters are known. The multivariate normal distribution is suitable when we do not have additional information about the item parameters.</p><p>For the item parameters, the conditional independence implies that p(&#955;|Y , Z, &#945;, &#960;) and p(&#955;|Y , Z, &#945;) are equivalent. To sample the item parameters for item j, the information of all examinees for this item needs to be considered. We arrange all examinees' augmented data about item j in a vector <formula notation="TeX">Z_j = (Z_{1j}, \ldots, Z_{Nj})^\top</formula> and stack the corresponding row vectors into the design matrix <formula notation="TeX">X_j = (X_{\alpha_{1j}^*}^\top, \ldots, X_{\alpha_{Nj}^*}^\top)^\top</formula>.</p><p>Given Z j and X j , a linear regression model is obtained as follows:</p><p><formula notation="TeX">Z_j = X_j \lambda_j + \varepsilon_j,</formula></p><p>where <formula notation="TeX">\varepsilon_j \sim N(0, I_N)</formula>. If</p><p>there are no constraints on the item parameter &#955; j , which follows the prior N(&#181; &#955; j , &#931; &#955; j ), then we can obtain the full conditional distribution <ref type="bibr">(Minka, 2000)</ref>, whose form is</p><p><formula notation="TeX">\lambda_j \mid Z_j, X_j \sim N(\tilde{\mu}_{\lambda_j}, \tilde{\Sigma}_{\lambda_j}), \qquad (11)</formula></p><p>where <formula notation="TeX">\tilde{\Sigma}_{\lambda_j}^{-1} = \Sigma_{\lambda_j}^{-1} + X_j^\top X_j</formula> and <formula notation="TeX">\tilde{\mu}_{\lambda_j} = \tilde{\Sigma}_{\lambda_j} (\Sigma_{\lambda_j}^{-1} \mu_{\lambda_j} + X_j^\top Z_j)</formula>.</p><p>The sampling method using Equation (<ref type="formula">11</ref>) is called the sampling without truncation. The specifics of the derivation can be found in Appendix B. If the prior of &#955; j is the truncated distribution N(&#181; &#955; j , &#931; &#955; j )I {&#955; j &#8712;T } , we can obtain the closed form for &#955; j 's full conditional distribution:</p><p><formula notation="TeX">\lambda_j \mid Z_j, X_j \sim N(\tilde{\mu}_{\lambda_j}, \tilde{\Sigma}_{\lambda_j}) \, I\{\lambda_j \in T\}. \qquad (12)</formula></p><p>The sampling method using Equation (<ref type="formula">12</ref>) is called the sampling with truncation. The details about how to sample from the truncated multivariate normal distribution are discussed in Appendix C.</p><p>Sample Attribute Profiles. In the sequential Gibbs sampling, attributes are sampled one by one, instead of the whole attribute profile. For examinee i, if the k-th attribute &#945; ik is not required by an item, the value of &#945; ik does not affect the item's likelihood. So to sample attribute &#945; ik , we only need to pay attention to the items requiring the k-th attribute. 
Hence, we define a set &#937;k = {j | q jk = 1, j = 1, &#8226; &#8226; &#8226; , J}, which represents the items which require attribute &#945; k , and the complementary set of &#937;k is defined as</p><p>Only the items from &#937;k will affect the inference about &#945; k .</p><p>Assuming item j belongs to &#937;k and giving the reduced attribute profile &#945; * ij , the positive response probability &#952; j,&#945; i = &#934;(X &#945; * ij &#955; j ). For the specific examinee i, the likelihood</p><p>where</p><p>1 with the two terms T ij 0 and T ij 1 defined as follows. For X &#945; * ij &#955; j , the notation T ij 0 is the sum of the terms which don't contain &#945; ik and T ij 1 &#945; ik is the sum of the terms related to &#945; ik . If examinee i masters attribute &#945; k , the positive response probability is &#934;(T ij 0 + T ij 1 ), otherwise, the positive response probability is &#934;(T ij 0 ). Therefore, the positive response probability is</p><p>, can be obtained for the negative response. For example, assume that the vector q j = (110), the third attribute doesn't affect the positive response probability and the likelihood function. In other words, from the responses on this item we can't get any information about the third attribute. We show how to calculate T ij 0 and T ij 1 . When to investigate the first attribute &#945; 1 , the positive response probability is that</p><p>According to Equation ( <ref type="formula">13</ref>), it's obvious that only the items in &#937;ik will affect the full conditional distribution for &#945; ik . The parameters &#960; and &#945; \k are used to calculate the prior conditional probability for &#945; ik , with p ik calculated as in Section 3. Then the full conditional distribution for &#945; ik is calculated by</p><p>Hence, the full conditional distribution for &#945; ik is Bernoulli(p ik ), where the value of pik is given by</p><p>Sample the Population Parameter. 
The population parameter &#960; is a C-dimensional vector, whose prior is Dirichlet(&#948;). Given &#945;, we can calculate the number of examinees within the latent class c, <formula notation="TeX">N_c = \sum_{i=1}^{N} I\{\alpha_i = \alpha_c\}</formula>, and the vector of counts <formula notation="TeX">(N_1, \ldots, N_C)</formula>.</p><p>From the conditional independence, we know that p(&#960;|Y , Z, &#945;, &#955;; &#948;) and p(&#960;|&#945;; &#948;) are equivalent, and we can write the posterior of &#960; as</p><p><formula notation="TeX">\pi \mid \alpha; \delta \sim \mathrm{Dirichlet}(\delta_1 + N_1, \ldots, \delta_C + N_C). \qquad (17)</formula></p><p>We summarize the sequential Gibbs sampling for the GDINA model in Algorithm 1.</p><p>The sequential sampling method can be straightforwardly applied to other CDMs as well, and the DINA example is illustrated in the Appendix.</p><p>Algorithm 1: Sequential Gibbs Sampling for GDINA models</p><p>Input: Initialize &#955; (0) , &#945; (0) , &#960; (0) , Y , m = 0, M and specify priors.</p><p>Output: Markov chains of &#955;, &#945;, &#960;.</p><p>while m &lt; M do</p><list><item>Generate the augmented data from Equation (10).</item><item>Sample item parameters from Equation (<ref type="formula">11</ref>) or (<ref type="formula">12</ref>).</item><item>Sample attribute profiles from Equation (<ref type="formula">16</ref>).</item><item>Sample the population parameter from Equation (<ref type="formula">17</ref>).</item><item>Set m = m + 1.</item></list><p>end</p></div>
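<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate the attribute-profile step of the algorithm, the following Python sketch performs one sequential sweep over the K attributes of a single examinee. For brevity it uses the DINA model rather than the probit GDINA model treated in this section, and it is not the authors' implementation: the function names are hypothetical, the class ordering (the first attribute is the lowest-order bit of the class index) is an assumption, and the two weights compared for each attribute are assumed not to vanish simultaneously.</p>
<eg>
```python
import random
from math import prod


def class_index(alpha):
    """0-based index of a profile in pi; alpha_1 is the lowest-order bit."""
    return sum(a * 2 ** k for k, a in enumerate(alpha))


def item_likelihood(y, alpha, q_row, g, s):
    """DINA likelihood of one response (assumed standard DINA form)."""
    eta = prod(a ** q for a, q in zip(alpha, q_row))
    theta = (1 - s) if eta == 1 else g
    return theta if y == 1 else 1 - theta


def sequential_sweep(alpha, responses, q_matrix, guess, slip, pi, rng):
    """Update alpha_1, ..., alpha_K in turn from Bernoulli full conditionals."""
    alpha = list(alpha)
    for k in range(len(alpha)):
        weight = []
        for value in (0, 1):
            alpha[k] = value
            w = pi[class_index(alpha)]  # prior part, proportional to p_k or 1 - p_k
            for j, q_row in enumerate(q_matrix):
                if q_row[k] == 1:  # only items requiring attribute k matter
                    w *= item_likelihood(responses[j], alpha, q_row,
                                         guess[j], slip[j])
            weight.append(w)
        p_tilde = weight[1] / (weight[0] + weight[1])
        alpha[k] = 1 if p_tilde > rng.random() else 0
    return alpha
```
</eg>
<p>Each attribute compares only two candidate values, so one sweep evaluates 2K weights per examinee instead of the 2^K weights needed by the simultaneous sampler.</p></div>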
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Simulation Studies</head><p>In this section, the simultaneous Gibbs sampling, independent Gibbs sampling <ref type="bibr">(Zhan et al., 2015)</ref><hi rend="superscript">1</hi>, and sequential Gibbs sampling are used to estimate parameters in the DINA and GDINA models. The simulation studies are carried out under different settings of K.</p><p>However, for large K, the simultaneous Gibbs sampling method is not feasible due to its high computational cost, so only the results of independent and sequential Gibbs sampling are shown. To show that the difference among these methods is purely caused by the difference among the sampling techniques rather than the software, we implemented all three methods ourselves. The simulation study was run on a Dell XPS with a 3.0 GHz Intel Core i7-9700 and 24 GB RAM.</p><p>The statistical software JAGS (Just Another Gibbs Sampler; <ref type="bibr">Plummer, 2003)</ref>, as an off-the-shelf sampling tool, is also used to implement the simultaneous Gibbs sampling method for the DINA and GDINA models. JAGS is similar to WinBUGS <ref type="bibr">(Lunn, Thomas, Best, &amp; Spiegelhalter, 2000)</ref> and OpenBUGS <ref type="bibr">(Foulley &amp; Jaffr&#233;zic, 2010)</ref>. <ref type="bibr">Zhan et al. (2019)</ref> showed how to implement the DINA and linear logistic models (LLM, see <ref type="bibr">Maris, 1999)</ref> in JAGS. We use JAGS to analyze the DINA and GDINA models. When using JAGS, the initial values of all parameters are generated by the default procedure within JAGS. Under the DINA model, another simultaneous sampler, the R package "dina" <ref type="bibr">(Culpepper, 2015)</ref>, is also used to estimate the model. Our simulation results indicate that "dina" is faster than JAGS<hi rend="superscript">2</hi> and that the parameter estimates of "dina" are similar to those of JAGS. 
As the R package "dina" can't handle the GDINA model, the detailed results of "dina" are not shown.</p><p>1 For the independent Gibbs sampling, we use the independent Bernoulli prior that &#945; ik &#8764; Bernoulli(p ik ) and p ik &#8764; Beta(1, 1), where i and k indicate examinee and attribute, respectively.</p><p>2 In particular, using "dina" to run the chains with 2000 iterations for the DINA settings in our simulation with {N = 2000, K = 3}, {N = 2000, K = 5}, and {N = 2000, K = 7} takes about 20, 82, and 344 seconds, which is faster than JAGS but slower than the proposed method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Simulation Design</head><p>The attribute profiles are generated from the two following structures.</p><p>Uniform Structure. The uniform structure assumes that all latent classes share the same probability.</p><p>Correlated Structure. <ref type="bibr">Chiu, Douglas, and Li (2009)</ref> proposed a correlated structure for attribute profiles, which can be viewed as a special case of the higher-order attribute structure. For each examinee, the K-dimensional vector &#952; = (&#952; 1 , &#8226; &#8226; &#8226; , &#952; K ) follows a multivariate normal distribution N(0, &#931;), where the covariance matrix &#931; has a common correlation &#961; as follows</p><p>then attributes are determined by</p><p>Chen, Liu, Xu, and Ying (2015) also called this situation as "Dependent Attributes".</p><p>For the DINA and GDINA models, the generation methods of item parameters need to be introduced separately. For the DINA model, we set the guessing and slipping parameters to 0.2. For the GDINA model, another equivalent notation is introduced to make the description of item parameters clear. For item j, let &#955; (0) j and &#955;</p><p>(w) j denote the intercept parameter and w-way interaction parameter, respectively. We generate &#955; j from a multivariate normal distribution with a diagonal covariance matrix. In particular, the distribution to generate &#955; j is specified as:</p><p>This generation method of item parameters indicates the same-way interactions share similar properties (i.e., the same distribution).</p><p>Based on the model identifiability and generic identifiability restrictions <ref type="bibr">(Chen et al., 2020;</ref><ref type="bibr">Gu &amp; Xu, 2020;</ref><ref type="bibr">Xu, 2017;</ref><ref type="bibr">Xu &amp; Zhang, 2016)</ref>, the Q-matrix has this form</p><p>where each item in Q 2 and Q 3 requires two and three attributes, respectively. 
In addition, we randomly sample non-zero q-vectors that require three or fewer attributes to fill Q.</p><p>Given the attribute profiles, the item parameters, and the Q-matrix, the response data can be generated. For different K, the Q-matrices are fixed and shown in Appendix D.</p><p>The sampling methods are compared in three aspects: speed, parameter estimation accuracy and classification accuracy. The running times of the sampling methods reflect the speed, and the bias, root mean squared error (RMSE) and mean squared error (MSE) are used to evaluate the accuracy. The average bias, RMSE and MSE, denoted by Bias, RMSE and MSE, are computed for each type of parameter according to</p><p>where $\hat{\phi}_h^{(r)}$ denotes the estimate of a parameter from the r-th replication, $\phi_h$ denotes its true value, and R denotes the number of replications.</p><p>When n is a positive integer, PARn is a relaxation of PAR (i.e., PAR0).</p><p>For any estimate of the parameter &#960;, Bias &#960; = 0 always holds, because both the estimate and the true value sum to one over the latent classes. The maximum norm is therefore used in place of the bias to evaluate the estimation of the population parameter. When the true value and the estimate of the population parameter are $\pi$ and $\hat{\pi}^{(r)}$, the maximum norm of their difference is $\|\pi - \hat{\pi}^{(r)}\|_\infty = \max_c |\pi_c - \hat{\pi}_c^{(r)}|$, which measures the maximum absolute deviation. If the estimation is repeated R times, the average maximum norm (MN) is $\mathrm{MN}_\pi = \frac{1}{R} \sum_{r=1}^{R} \|\pi - \hat{\pi}^{(r)}\|_\infty$.</p><p>Table <ref type="table">2</ref> summarizes the basic settings of the simulation study: sample sizes N = 1000 and 2000; number of items J = 30; and three attribute structures, namely the uniform structure and correlated structures with two correlation levels &#961; = 0.3 and 0.7. We refer to K = 3 and 5 as the low-dimensional cases, in which both the simultaneous and sequential Gibbs sampling are conducted.</p><p>We refer to K = 7 and 15 as the high-dimensional cases, in which only the sequential Gibbs sampling is performed. For each case, 25 independent response datasets are generated. Note. 
The columns "Sim", "Seq" and "Ind" represent the simultaneous, sequential and independent Gibbs sampling methods, respectively.</p><p>For the low-dimensional cases (K = 3, 5), the hyper-parameter &#948; of the Dirichlet prior is the C-dimensional vector 1, leading to a non-informative prior. For the high-dimensional cases (K = 7, 15), three values of &#948; are used: &#948; = 0.01, 0.1 and 1, which are indicated as "S", "M" and "L" in Table <ref type="table">2</ref>. For the GDINA model, the priors for the item parameter &#955; j are as follows:</p><p>Besides the priors, we need to specify initial values for &#945;, &#960; and &#955;. For the initial value of &#945;, each attribute is sampled independently from Bernoulli(0.5). The initial value of the population parameter &#960; is the C-dimensional vector 1/C. The initial value of &#955; is a random sample from the prior of &#955;. <ref type="bibr">Culpepper (2015)</ref> showed that, for the DINA model, simultaneous Gibbs sampling needed only about 750 iterations to reach convergence. Consequently, in this paper, we run a Markov chain of 2000 iterations and discard the first 1000 iterations as burn-in, which is adequate to reach convergence. For the GDINA model, we run a Markov chain of length 3000 with a burn-in of length 2000.</p><p>For the simulation results, the potential scale reduction factor $\hat{R}^{1/2}$ <ref type="bibr">(Brooks &amp; Gelman, 1998;</ref><ref type="bibr">Gelman &amp; Rubin, 1992</ref>) is used for convergence diagnosis. <ref type="bibr">Brooks and Gelman (1998)</ref> suggested that $\hat{R}^{1/2} &lt; 1.2$ for all model parameters indicates that convergence has been reached. To make the conclusion more reliable, the stricter condition $\hat{R}^{1/2} &lt; 1.1$ can be used.</p><p>In Figure <ref type="figure">1</ref> </p></div>
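<div xmlns="http://www.tei-c.org/ns/1.0"><p>The convergence check described above can be illustrated with a minimal sketch of the potential scale reduction factor for a single scalar parameter. The function name psrf and the toy chains are illustrative assumptions and are not part of the study. The sketch uses the basic textbook form without further sampling-variability corrections, and the exact variant used in the simulations is not stated in the text.</p><p><![CDATA[
```python
import random
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: list of m >= 2 chains, each a list of n >= 2 post-burn-in draws
    of equal length.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


# Toy chains (illustrative only): "mixed" chains share one distribution,
# "stuck" chains sit around different means.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(3.0 * c, 1.0) for _ in range(1000)] for c in range(3)]
print(psrf(mixed), psrf(stuck))
```
]]></p><p>With several chains per parameter, values below the 1.2 or 1.1 thresholds quoted above would be read as no evidence against convergence, while well-separated chains give values far above them.</p></div>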
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Simulation Results</head><p>Seven tables show the results for the different settings of the DINA and GDINA models. For the GDINA model with a probit link function, the generated item parameters have, with high probability, a negative intercept and positive interactions.</p><p>Consequently, the results of sampling with and without truncation are very similar, and here we only show the results of sampling with truncation. The values of bias, RMSE, MSE, AAR and PARn are rounded to four decimal places and the times are rounded to five significant figures.</p><p>Tables <ref type="table">3</ref>, <ref type="table">4</ref> and <ref type="table">5</ref> show the results for the uniform population and the correlated structures with &#961; = 0.3 and 0.7 in the low-dimensional cases, respectively. Hereafter, "Sim", "Seq", "Ind", and "sSim" represent the simultaneous (implemented by JAGS), sequential, independent, and self-compiled simultaneous Gibbs sampling methods, respectively. For the DINA model, "Sim", "Seq" and "sSim" always obtain similar estimation results for item parameters, population parameters, and attribute patterns, which indicates that the sequential and simultaneous Gibbs sampling methods give consistent estimates.</p><p>When the population is uniform, the "Ind" method also performs similarly to the other methods. However, as the correlation &#961; among the attributes increases, the "Ind" method performs more poorly in MN &#960; . For the GDINA model, inferences from the sequential and simultaneous Gibbs sampling methods are also consistent. As the correlation &#961; increases, the "Ind" method underperforms the other methods because its assumption of independent attributes is violated.</p><p>For both models, the estimation accuracy improves with the larger sample size. 
The classification accuracy of the "Sim" method, as a baseline, is comparable to that of the other methods.</p><p>Tables 6-8 show the results for the uniform population and the correlated structures with &#961; = 0.3 and 0.7 in the high-dimensional cases. If the population distribution of attribute profiles is uniform, the "Ind" method is comparable to or slightly outperforms the sequential Gibbs sampling method. When the attributes become more correlated, the sequential sampling method outperforms the "Ind" method. For instance, in the high-dimensional condition with &#961; = 0.7 and K = 15, the sequential sampling with a smaller &#948; is superior to the other methods. We also find that, when K = 15, the RMSE &#960; is approximately 0.0001 (i.e., the MSE &#960; is approximately 0), which is due to the large number of latent classes.</p><p>In Table <ref type="table">9</ref>, the average computational time is reported. Since the dependence structure of the attribute profiles has an almost negligible influence on the computational time, we only report the computational time of the different methods under the uniform population. The results show that the simultaneous Gibbs sampling implemented by JAGS is the slowest of the four methods. To show the computational advantage of the sequential over the simultaneous Gibbs sampling, we also add the K = 7 condition for the self-compiled simultaneous Gibbs sampling. Comparing the times of "Seq" and "sSim", the advantage of the sequential Gibbs sampling method appears at K = 7 for the GDINA model, whereas it is already visible at K = 5 for the DINA model. In addition, "Ind" is also computationally efficient, with computational time comparable to that of the sequential Gibbs method.</p><p>In the simulation study, using JAGS to estimate the GDINA model needs more memory than the DINA model. 
In particular, for the GDINA model, we find that the estimation process in JAGS is often terminated or shut down due to "run out of memory".</p><p>Therefore, the reported computational time for the GDINA model is only the average over the replications that converged well in JAGS, which leads to the counter-intuitive observation that the simultaneous Gibbs sampling for the GDINA model is "faster" than that for the DINA model. Due to the high computational cost, the simultaneous Gibbs sampling time for some high-dimensional cases is not reported.</p><p>The simulation studies show that, when K is small, the sequential Gibbs sampling needs less time than JAGS to obtain results of similar accuracy. When K is large, the sequential Gibbs sampling algorithm still works well, whereas the simultaneous Gibbs sampling algorithm does not, due to its high computational cost. The speed advantage of the sequential Gibbs sampling becomes more apparent as K increases. In addition, when the number of attributes K becomes large, a moderately small hyperparameter &#948; is preferred. Comparing the independent and sequential Gibbs sampling methods, for the uniform structure with independent attributes, the independent Gibbs sampling method is comparable to or slightly outperforms the sequential Gibbs sampling method. However, for correlated structures, the sequential Gibbs sampling method performs better (see the Bias in Tables <ref type="table">5</ref> and <ref type="table">8</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Real Data Analysis</head><p>In this analysis, the DINA and GDINA models are used to deal with the Tatsuoka's fraction-subtraction data (C. <ref type="bibr">Tatsuoka, 2002)</ref>. The fraction-subtraction data has been widely analyzed. For the data set, the Q-matrix (de la Torre &amp; Douglas, 2004) and contents are shown in Table <ref type="table">10</ref>. This data set contains responses of 536 middle school students  0.0010 0.0010 0.0013 0.0010 0.0002 0.0002 0.0004 0.0002 0.0009 0.0011 0.0005 0.0011 0.0008 0.0009 0.0014 0.0009 RMSE g 0.0156 0.0156 0.0156 0.0155 0.0113 0.0113 0.0113 0.0114 0.0164 0.0164 0.0161 0.0164 0.0122 0.0122 0.0122 0.0122 MSE g 0.0003 0.0003 0.0003 0.0003 0.0001 0.0001 0.0001 0.0001 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 Bias s 0.0046 0.0037 0.0034 0.0037 0.0001 -0.0003 -0.0004 -0.0003 0.0035 0.0022 0.0036 0.0021 0.0019 0.0013 0.0005 0.0013 RMSE s 0.0257 0.0254 0.0253 0.0254 0.0171 0.0171 0.0171 0.0171 0.0278 0.0273 0.0273 0.0274 0.0179 0.0178 0.0177 0.0178 MSE s 0.0007 0.0007 0.0007 0.0007 0.0003 0.0003 0.0003 0.0003 0.0009 0.0008 0.0008 0.0008 0.0004 0.0004 0.0003 0.0004 MN &#960; 0.0158 0.0159 0.0144 0.0159 0.0117 0.0116 0.0115 0.0118 0.0154 0.0155 0.0109 0.0154 0.0105 0.0105 0.0065 0.0106 RMSE &#960; 0.0094 0.0093 0.0082 0.0094 0.0060 0.0059 0.0054 0.0060 0.0059 0.0059 0.0041 0.0059 0.0040 0.0040 0.0026 0.0040 MSE &#960; 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 0.9742 0.9741 0.9739 0.9740 0.9749 0.9748 0.9748 0.9748 0.9445 0.9443 0.9445 0.9443 0.9461 0.9459 0.9466 0.9460 PAR1 0.9956 0.9953 0.9952 0.9954 0.9962 0.9963 0.9961 0.9961 0.9644 0.9643 0.9654 0.9645 0.9667 0.9669 0.9676 0.9667 GDINA Bias &#955; 0.0305 0.0310 0.0297 0.0307 0.0243 0.0244 0.0250 0.0244 0.0371 0.0372 0.0313 0.0374 0.0326 0.0331 0.0319 0.0332 RMSE &#955; 0.1370 0.1373 0.1378 0.1370 0.1148 0.1149 0.1151 0.1148 0.1334 0.1336 
0.1325 0.1336 0.1288 0.1288 0.1275 0.1285 MSE &#955; 0.0221 0.0222 0.0223 0.0221 0.0162 0.0161 0.0162 0.0161 0.0205 0.0205 0.0203 0.0206 0.0209 0.0210 0.0206 0.0209 MN &#960; 0.0235 0.0238 0.0180 0.0236 0.0119 0.0118 0.0091 0.0119 0.0214 0.0217 0.0107 0.0214 0.0153 0.0152 0.0063 0.0155 RMSE &#960; 0.0129 0.0130 0.0098 0.0130 0.0063 0.0063 0.0046 0.0063 0.0089 0.0090 0.0040 0.0090 0.0061 0.0061 0.0025 0.0061 MSE &#960; 0.0002 0.0002 0.0001 0.0002 0.0000 0.0000 0.0000 0.0000 0.0001 0.0001 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 AAR 0.9436 0.9423 0.9431 0.9435 0.9452 0.9446 0.9443 0.9452 0.8921 0.8920 0.8945 0.8923 0.8959 0.8960 0.8969 0.8956 PAR1 0.9575 0.9557 0.9563 0.9578 0.9604 0.9595 0.9590 0.9605 0.8882 0.8876 0.8909 0.8886 0.8891 0.8892 0.8907 0.8887 Note. The "Sim", "Seq", "Ind" and "sSim" represent the simultaneous (implemented by JAGS), sequential, independent and self-compiled simultaneous Gibbs sampling methods, respectively. 0.0004 0.0005 -0.0018 0.0005 -0.0002 -0.0001 -0.0024 -0.0001 0.0010 0.0012 -0.0044 0.0011 0.0005 0.0006 -0.0043 0.0006 RMSE g 0.0159 0.0159 0.0161 0.0159 0.0113 0.0113 0.0117 0.0113 0.0170 0.0170 0.0180 0.0170 0.0120 0.0120 0.0132 0.0120 MSE g 0.0003 0.0003 0.0003 0.0003 0.0001 0.0001 0.0001 0.0001 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001 Bias s 0.0023 0.0016 0.0037 0.0016 0.0006 0.0003 0.0024 0.0003 0.0030 0.0020 0.0072 0.0020 0.0012 0.0007 0.0046 0.0007 RMSE s 0.0216 0.0215 0.0217 0.0215 0.0157 0.0156 0.0161 0.0157 0.0237 0.0235 0.0250 0.0235 0.0166 0.0166 0.0179 0.0166 MSE s 0.0005 0.0005 0.0005 0.0005 0.0003 0.0003 0.0003 0.0003 0.0006 0.0006 0.0007 0.0006 0.0003 0.0003 0.0003 0.0003 MN &#960; 0.0076 0.0077 0.0227 0.0076 0.0055 0.0054 0.0253 0.0054 0.0128 0.0128 0.0480 0.0125 0.0084 0.0082 0.0444 0.0083 RMSE &#960; 0.0034 0.0034 0.0068 0.0034 0.0025 0.0025 0.0072 0.0025 0.0039 0.0039 0.0047 0.0039 0.0026 0.0026 0.0039 0.0026 MSE &#960; 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0001 
0.0000 0.0000 0.0000 0.0001 0.0000 AAR 0.9789 0.9788 0.9748 0.9789 0.9790 0.9789 0.9740 0.9790 0.9531 0.9530 0.9423 0.9533 0.9545 0.9543 0.9445 0.9544 APR1 0.9964 0.9965 0.9964 0.9964 0.9969 0.9969 0.9967 0.9969 0.9724 0.9722 0.9624 0.9721 0.9732 0.9728 0.9639 0.9728 GDINA Bias &#955; 0.0304 0.0314 0.0371 0.0310 0.0220 0.0224 0.0279 0.0225 0.0371 0.0398 0.0565 0.0402 0.0270 0.0285 0.0457 0.0281 RMSE &#955; 0.1301 0.1303 0.1285 0.1304 0.1051 0.1053 0.1021 0.1052 0.1457 0.1459 0.1474 0.1461 0.1192 0.1193 0.1200 0.1194 MSE &#955; 0.0198 0.0199 0.0195 0.0199 0.0130 0.0130 0.0126 0.0130 0.0250 0.0252 0.0260 0.0252 0.0176 0.0176 0.0183 0.0176 MN &#960; 0.0109 0.0111 0.0250 0.0109 0.0091 0.0093 0.0195 0.0091 0.0215 0.0218 0.0453 0.0223 0.0153 0.0157 0.0415 0.0156 RMSE &#960; 0.0064 0.0065 0.0112 0.0063 0.0049 0.0050 0.0089 0.0049 0.0076 0.0076 0.0073 0.0077 0.0056 0.0057 0.0064 0.0057 MSE &#960; 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0001 0.0000 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0001 0.0000 AAR 0.9488 0.9485 0.9466 0.9488 0.9531 0.9531 0.9512 0.9531 0.8936 0.8933 0.8914 0.8933 0.8976 0.8977 0.8938 0.8977 PAR1 0.9660 0.9657 0.9630 0.9661 0.9701 0.9697 0.9665 0.9697 0.8931 0.8920 0.8877 0.8924 0.9031 0.9030 0.8952 0.9030 Note. The "Sim", "Seq", "Ind" and "sSim" represent the simultaneous (implemented by JAGS), sequential, independent and self-compiled simultaneous Gibbs sampling methods, respectively. 
0.0001 0.0001 -0.0055 0.0002 0.0000 0.0000 -0.0057 0.0000 0.0019 0.0019 -0.0097 0.0019 -0.0000 -0.0000 -0.0119 -0.0000 RMSE g 0.0164 0.0164 0.0181 0.0164 0.0116 0.0116 0.0140 0.0116 0.0169 0.0169 0.0223 0.0169 0.0130 0.0130 0.0200 0.0130 MSE g 0.0003 0.0003 0.0003 0.0003 0.0001 0.0001 0.0002 0.0001 0.0003 0.0003 0.0005 0.0003 0.0002 0.0002 0.0005 0.0002 Bias s 0.0022 0.0016 0.0069 0.0016 0.0011 0.0008 0.0066 0.0008 0.0019 0.0014 0.0108 0.0014 0.0008 0.0005 0.0104 0.0005 RMSE s 0.0209 0.0208 0.0227 0.0208 0.0145 0.0145 0.0171 0.0145 0.0210 0.0209 0.0260 0.0209 0.0141 0.0140 0.0210 0.0140 MSE s 0.0004 0.0004 0.0005 0.0004 0.0002 0.0002 0.0003 0.0002 0.0005 0.0004 0.0007 0.0004 0.0002 0.0002 0.0005 0.0002 MN &#960; 0.0072 0.0070 0.0546 0.0070 0.0054 0.0053 0.0581 0.0053 0.0127 0.0128 0.1206 0.0126 0.0080 0.0082 0.1259 0.0082 RMSE &#960; 0.0034 0.0035 0.0144 0.0034 0.0024 0.0024 0.0149 0.0024 0.0033 0.0033 0.0087 0.0033 0.0023 0.0023 0.0087 0.0023 MSE &#960; 0.0000 0.0000 0.0005 0.0000 0.0000 0.0000 0.0005 0.0000 0.0000 0.0000 0.0005 0.0000 0.0000 0.0000 0.0005 0.0000 AAR 0.9815 0.9817 0.9696 0.9817 0.9825 0.9825 0.9678 0.9826 0.9652 0.9651 0.9380 0.9651 0.9667 0.9667 0.9361 0.9667 PAR1 0.9966 0.9968 0.9959 0.9967 0.9976 0.9976 0.9963 0.9975 0.9795 0.9796 0.9511 0.9796 0.9819 0.9821 0.9478 0.9819 GDINA Bias &#955; 0.0247 0.0252 0.0343 0.0252 0.0177 0.0186 0.0308 0.0186 0.0197 0.0190 0.0619 0.0184 0.0138 0.0141 0.0477 0.0141 RMSE &#955; 0.1227 0.1226 0.1347 0.1228 0.1139 0.1142 0.1356 0.1142 0.1458 0.1462 0.1647 0.1466 0.1145 0.1140 0.1406 0.1141 MSE &#955; 0.0173 0.0173 0.0209 0.0173 0.0164 0.0164 0.0232 0.0164 0.0262 0.0262 0.0339 0.0264 0.0166 0.0164 0.0259 0.0165 MN &#960; 0.0117 0.0117 0.0754 0.0118 0.0076 0.0075 0.0762 0.0077 0.0198 0.0205 0.1253 0.0195 0.0129 0.0132 0.1414 0.0125 RMSE &#960; 0.0069 0.0069 0.0269 0.0069 0.0042 0.0043 0.0285 0.0043 0.0057 0.0058 0.0153 0.0057 0.0041 0.0041 0.0149 0.0041 MSE &#960; 0.0000 0.0000 0.0011 0.0000 0.0000 0.0000 0.0012 
0.0000 0.0000 0.0000 0.0009 0.0000 0.0000 0.0000 0.0010 0.0000 AAR 0.9561 0.9562 0.9449 0.9559 0.9629 0.9626 0.9462 0.9628 0.9365 0.9362 0.9150 0.9364 0.9368 0.9369 0.9137 0.9369 PAR1 0.9774 0.9780 0.9720 0.9776 0.9790 0.9786 0.9724 0.9789 0.9481 0.9475 0.9286 0.9480 0.9514 0.9516 0.9321 0.9514 Note. The "Sim", "Seq", "Ind" and "sSim" represent the simultaneous (implemented by JAGS), sequential, independent and self-compiled simultaneous Gibbs sampling methods, respectively.  0.0066 0.0041 0.0026 0.0018 0.0018 0.0010 0.0005 -0.0004 0.0127 0.0055 0.0056 0.0055 0.0120 0.0007 0.0001 0.0001 RMSE g 0.0208 0.0190 0.0182 0.0178 0.0140 0.0134 0.0131 0.0128 0.0329 0.0299 0.0290 0.0293 0.0260 0.0205 0.0200 0.0198 MSE g 0.0005 0.0004 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 0.0012 0.0010 0.0009 0.0010 0.0007 0.0005 0.0004 0.0004 Bias s 0.0112 0.0060 0.0026 0.0013 0.0073 0.0040 0.0020 0.0019 0.0036 -0.0038 -0.0046 -0.0046 0.0079 0.0013 0.0013 0.0012 RMSE s 0.0321 0.0290 0.0280 0.0276 0.0229 0.0214 0.0208 0.0201 0.0394 0.0367 0.0359 0.0360 0.0321 0.0268 0.0262 0.0259 MSE s 0.0011 0.0009 0.0008 0.0008 0.0006 0.0005 0.0005 0.0005 0.0017 0.0015 0.0014 0.0014 0.0011 0.0008 0.0008 0.0007 MN &#960; 0.0232 0.0163 0.0106 0.0056 0.0154 0.0114 0.0082 0.0040 0.0056 0.0003 0.0000 0.0006 0.0052 0.0004 0.0000 0.0003 RMSE &#960; 0.0072 0.0052 0.0032 0.0018 0.0052 0.0039 0.0028 0.0013 0.0001 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 MSE &#960; 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 0.9039 0.9178 0.9256 0.9173 0.9161 0.9237 0.9269 0.9200 0.8167 0.8386 0.8396 0.8246 0.8114 0.8388 0.8402 0.8201 PAR2 0.9712 0.9780 0.9792 0.9777 0.9773 0.9794 0.9799 0.9779 0.4679 0.5574 0.5632 0.4998 0.4436 0.5585 0.5654 0.4797 GDINA Bias &#955; 0.0372 0.0410 0.0403 0.0355 0.0197 0.0264 0.0304 0.0320 0.0336 0.0325 0.0325 0.0320 0.0245 0.0315 0.0318 0.0324 RMSE &#955; 0.1884 0.1700 0.1595 0.1570 0.1445 0.1328 0.1256 0.1219 0.2057 
0.1804 0.1781 0.1784 0.1886 0.1578 0.1562 0.1552 MSE &#955; 0.0424 0.0344 0.0306 0.0296 0.0239 0.0204 0.0184 0.0175 0.0465 0.0367 0.0359 0.0361 0.0411 0.0297 0.0293 0.0290 MN &#960; 0.0473 0.0311 0.0144 0.0037 0.0253 0.0185 0.0107 0.0025 0.0077 0.0000 0.0000 0.0001 0.0090 0.0001 0.0000 0.0000 RMSE &#960; 0.0120 0.0077 0.0036 0.0012 0.0089 0.0060 0.0033 0.0009 0.0001 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 MSE &#960; 0.0002 0.0001 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 0.7846 0.8187 0.8358 0.8401 0.8298 0.8504 0.8589 0.8625 0.6852 0.7328 0.7355 0.7356 0.6754 0.7266 0.7303 0.7302 PAR2 0.8187 0.8760 0.8943 0.9014 0.8981 0.9212 0.9278 0.9305 0.1025 0.1851 0.1923 0.1950 0.0897 0.1781 0.1880 0.1865 Note. In the row "Method", there are three different levels of &#948;; the sequential Gibbs samplings with &#948; = 0.01, 0.1 and 1. 0.0069 0.0039 0.0005 -0.0046 0.0047 0.0029 0.0012 -0.0038 0.0074 -0.0053 -0.0013 0.0008 0.0151 -0.0011 0.0012 0.0049 RMSE g 0.0212 0.0193 0.0185 0.0200 0.0152 0.0142 0.0137 0.0149 0.0283 0.0290 0.0292 0.0293 0.0251 0.0189 0.0208 0.0212 MSE g 0.0005 0.0004 0.0004 0.0004 0.0003 0.0002 0.0002 0.0002 0.0008 0.0009 0.0010 0.0009 0.0007 0.0004 0.0005 0.0005 Bias s 0.0092 0.0058 0.0045 0.0053 0.0044 0.0025 0.0015 0.0022 0.0053 0.0000 -0.0085 -0.0111 0.0012 -0.0034 -0.0116 -0.0163 RMSE s 0.0270 0.0251 0.0246 0.0265 0.0181 0.0173 0.0170 0.0194 0.0339 0.0335 0.0386 0.0399 0.0241 0.0227 0.0294 0.0326 MSE s 0.0008 0.0007 0.0006 0.0007 0.0003 0.0003 0.0003 0.0004 0.0012 0.0012 0.0018 0.0020 0.0006 0.0006 0.0012 0.0017 MN &#960; 0.0233 0.0160 0.0110 0.0384 0.0174 0.0124 0.0096 0.0416 0.0177 0.0100 0.0128 0.0115 0.0174 0.0110 0.0188 0.0183 RMSE &#960; 0.0055 0.0038 0.0025 0.0022 0.0041 0.0029 0.0020 0.0019 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 MSE &#960; 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 
0.9114 0.9219 0.9287 0.9117 0.9199 0.9265 0.9306 0.9176 0.8347 0.8506 0.8448 0.8193 0.8314 0.8574 0.8499 0.8231 PAR2 0.9749 0.9790 0.9820 0.9701 0.9793 0.9818 0.9826 0.9734 0.5423 0.6028 0.5769 0.5039 0.5250 0.6302 0.6004 0.5161 GDINA Bias &#955; 0.0302 0.0289 0.0359 0.0459 0.0121 0.0200 0.0286 0.0508 0.0454 0.0492 0.0488 0.0490 0.0440 0.0585 0.0610 0.0602 RMSE &#955; 0.1867 0.1685 0.1581 0.1565 0.1469 0.1328 0.1235 0.1246 0.1940 0.1799 0.1788 0.1791 0.1730 0.1637 0.1693 0.1683 MSE &#955; 0.0412 0.0341 0.0300 0.0297 0.0256 0.0211 0.0186 0.0198 0.0429 0.0381 0.0381 0.0383 0.0342 0.0325 0.0352 0.0347 MN &#960; 0.0510 0.0313 0.0207 0.0450 0.0351 0.0229 0.0172 0.0414 0.0232 0.0235 0.0240 0.0238 0.0310 0.0204 0.0269 0.0267 RMSE &#960; 0.0105 0.0065 0.0035 0.0029 0.0084 0.0055 0.0031 0.0026 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 MSE &#960; 0.0001 0.0001 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 0.8212 0.8505 0.8627 0.8611 0.8368 0.8561 0.8630 0.8595 0.7179 0.7513 0.7497 0.7492 0.7294 0.7716 0.7652 0.7640 PAR2 0.8752 0.9159 0.9290 0.9249 0.8978 0.9257 0.9306 0.9282 0.1747 0.2452 0.2348 0.2349 0.2104 0.3131 0.2860 0.2812 Note. In the row "Method", there are three different levels of &#948;; the sequential Gibbs samplings with &#948; = 0.01, 0.1 and 1. 
0.0079 0.0043 -0.0000 -0.0117 0.0043 0.0021 -0.0002 -0.0111 0.0051 -0.0176 -0.0327 -0.0038 0.0094 -0.0111 -0.0342 -0.0019 RMSE g 0.0225 0.0201 0.0188 0.0261 0.0147 0.0136 0.0131 0.0221 0.0234 0.0303 0.0450 0.0334 0.0192 0.0200 0.0407 0.0268 MSE g 0.0005 0.0004 0.0004 0.0008 0.0002 0.0002 0.0002 0.0006 0.0006 0.0010 0.0023 0.0012 0.0004 0.0004 0.0019 0.0008 Bias s 0.0053 0.0032 0.0026 0.0090 0.0032 0.0021 0.0016 0.0064 0.0083 0.0136 0.0210 -0.0082 0.0040 0.0075 0.0204 -0.0125 RMSE s 0.0215 0.0208 0.0207 0.0284 0.0152 0.0149 0.0147 0.0238 0.0271 0.0291 0.0394 0.0471 0.0175 0.0185 0.0307 0.0413 MSE s 0.0005 0.0004 0.0004 0.0008 0.0002 0.0002 0.0002 0.0006 0.0008 0.0009 0.0016 0.0031 0.0003 0.0003 0.0010 0.0030 MN &#960; 0.0313 0.0197 0.0292 0.1556 0.0153 0.0108 0.0168 0.1443 0.0193 0.1174 0.1447 0.1425 0.0368 0.0910 0.1454 0.1449 RMSE &#960; 0.0040 0.0028 0.0023 0.0036 0.0029 0.0021 0.0016 0.0032 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 MSE &#960; 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0412 0.0303 0.0515 0.1693 0.0383 0.0238 0.0327 0.1585 0.0559 0.1413 0.1569 0.1562 0.0465 0.1134 0.1422 0.1418 RMSE &#960; 0.0074 0.0047 0.0034 0.0058 0.0062 0.0040 0.0027 0.0054 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 MSE &#960; 0.0001 0.0000 0.0000 0.0004 0.0001 0.0000 0.0000 0.0004 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AAR 0.8861 0.9046 0.9108 0.8800 0.8866 0.9013 0.9061 0.8740 0.8143 0.8185 0.7807 0.7769 0.8007 0.8242 0.7839 0.7736 PAR2 0.9303 0.9559 0.9612 0.9522 0.9323 0.9525 0.9577 0.9432 0.4738 0.4750 0.3406 0.3249 0.4330 0.4921 0.3601 0.3181 Note. In the row "Method", there are three different levels of &#948;; the sequential Gibbs samplings with &#948; = 0.01, 0.1 and 1.  <ref type="bibr">(2000,3) 856.37 19.573 21.568 18.986 516.98 103.63 105.93 53.263 (2000,5)</ref>  (i.e., N = 536) to 20 items (i.e., J = 20). 
There are 8 attributes and $2^8 = 256$ latent classes.</p><p>We fit both the DINA and GDINA models. Based on the results of the simulation studies, we set the hyper-parameter of the Dirichlet prior to &#948; = 0.1. When applying the GDINA model, we assume the item parameter prior &#955; (w) &#8764; N(0, 1) for all items, for both w = 0 and w &gt; 0. The prior hyper-parameters, MCMC chain lengths and burn-in lengths are listed in</p><p>When the fraction-subtraction data are analyzed with the DINA model, the independent, sequential and simultaneous Gibbs sampling methods take 5.46, 4.91 and 942 seconds, respectively. The off-the-shelf software JAGS is treated as a benchmark. We find that the simultaneous Gibbs sampling using JAGS is time-consuming. Figures 2(a   similar results to the sequential Gibbs sampling, while the latter has an obvious speed advantage. Compared with the MCMC estimates obtained by JAGS, the independent Gibbs sampling tends to overestimate the guessing parameters and underestimate the slipping parameters, which may be due to the correlation among the attributes.</p><p>In the GDINA model, the item parameters are intercept and interaction parameters rather than guessing and slipping parameters. The time costs of the independent and sequential Gibbs sampling algorithms are 19.25 and 17.65 seconds, respectively. The notation &#955; (w) represents the w-way interaction. Since the items in the fraction-subtraction data need up to 5 attributes, there exist interaction parameters up to the 5-way order. The item parameter estimates show that the means of the intercept parameters and the 4-way interaction parameters are negative and the others are positive. The finding that the intercept term is negative is consistent with our intuition, because a subject without any of the required attributes is usually expected to have a low probability of a positive response. </p><p>The estimations of item parameters for the fraction-subtraction data. 
The "Ind", "Seq" and "Sim" represent the results from the independent, sequential and simultaneous (implemented by JAGS) Gibbs sampling, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion</head><p>In practice, one computational challenge is that, when the number of attributes is large, the existing MCMC algorithms for CDMs may become slow. In this paper, a computationally efficient algorithm, named the sequential Gibbs sampling, was proposed for a general CDM, i.e., the GDINA model. When K is small, the proposed method yields results similar to those of the existing method (e.g., JAGS). The proposed method also works well and runs fast when K is large. When K = 15, I = 2000, J = 40 and the model is the GDINA model, running 3,000 iterations needs less than 110 seconds.</p><p>In particular, for a large K, the computational advantage over the simultaneous Gibbs sampling method becomes more pronounced. The proposed method can be easily applied to other CDMs. In the appendix, we show the algorithm for the DINA model as an illustration.</p><p>In this paper, we focus only on the computational challenge for large K, given that the Q-matrix is correctly specified. Most references on identifiability theory point out that the Q-matrix needs to contain at least an identity matrix for strict identifiability <ref type="bibr">(Gu &amp; Xu, 2020;</ref><ref type="bibr">Xu &amp; Shang, 2018)</ref>. In practice, however, the Q-matrix may be misspecified, and it then needs to be estimated together with the model parameters and latent attributes. The estimation of the Q-matrix is known to be a challenging issue, especially when K is large, and the proposed algorithm may be extended to such applications to help reduce the computational cost of the conventional MCMC approaches. Another interesting extension is to use the idea in this paper to solve other latent variable modeling problems with many latent attributes. 
This idea may be helpful not only for discrete but also for continuous abilities.</p><p>conditional distribution as follows:</p><p>Obviously, the posterior distribution of &#945; ik is a Bernoulli distribution Bernoulli(p ik ) with the parameter</p><p>.</p><p>Algorithm 2: Sequential Gibbs Sampling for DINA</p><p>Input: Initialize g (0) , s (0) , &#945; (0) , &#960; (0) , Y , m = 0, M and specify priors.</p><p>Output: Markov chains of g, s, &#945;, &#960;.</p><p>while m &lt; M do</p><p>Sample attribute profiles from Equation (A.5).</p><p>Sample the other parameters according to the reference <ref type="bibr">(Culpepper, 2015)</ref>.</p><p>Set m = m + 1.</p><p>end</p><p>B Full conditional distribution for &#955; j</p><p>Focusing on the jth item, we can get a classical linear regression $Z_j = X_j \lambda_j + \varepsilon_j$, where $\lambda_j$ is the item parameter and the residual $\varepsilon_j \sim N(0, I)$, where $I$ represents an identity matrix. Assuming the joint prior for &#955; j is N(&#181; &#955; j , &#931; &#955; j ), the specific form of parameter's prior is</p><p>According to the Bayesian linear regression, the kernel of the posterior is</p><p>Let $\tilde{\Sigma}_{\lambda_j}^{-1} = X_j^\top X_j + \Sigma_{\lambda_j}^{-1}$ and use the method of undetermined coefficients to solve for $\tilde{\mu}_{\lambda_j} = \tilde{\Sigma}_{\lambda_j} (X_j^\top Z_j + \Sigma_{\lambda_j}^{-1} \mu_{\lambda_j})$. The full conditional distribution for the jth item's parameters is then $N(\tilde{\mu}_{\lambda_j}, \tilde{\Sigma}_{\lambda_j})$.</p></div>
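<div xmlns="http://www.tei-c.org/ns/1.0"><p>The attribute-profile step of Algorithm 2 can be sketched as follows. Equation (A.5) is not reproduced in the text above, so the conditional used here is an assumed standard form in which the Bernoulli parameter of each attribute is proportional to the class probability times the DINA likelihood. The helper names class_index, dina_likelihood and sequential_update, the binary class encoding, and all toy values are hypothetical and serve only as illustration.</p><p><![CDATA[
```python
import random


def class_index(alpha):
    """Map a binary profile to an index in 0..2^K - 1 (illustrative encoding)."""
    index = 0
    for a in alpha:
        index = 2 * index + a
    return index


def dina_likelihood(y, alpha, Q, g, s):
    """Likelihood of one response vector y given profile alpha under DINA."""
    likelihood = 1.0
    for j, q in enumerate(Q):
        # eta is True only if every attribute required by item j is mastered
        eta = all(alpha[k] == 1 for k in range(len(q)) if q[k] == 1)
        p_correct = 1.0 - s[j] if eta else g[j]
        likelihood *= p_correct if y[j] == 1 else 1.0 - p_correct
    return likelihood


def sequential_update(y, alpha, Q, g, s, pi, rng):
    """One sweep over k = 1..K, redrawing alpha_k given the other attributes.

    Each attribute needs two likelihood evaluations, so a sweep costs 2K
    evaluations rather than an enumeration of all 2^K profiles.
    Assumes pi, g and s lie strictly inside (0, 1) so the weights are positive.
    """
    alpha = list(alpha)
    for k in range(len(alpha)):
        weights = []
        for value in (0, 1):
            alpha[k] = value
            weights.append(pi[class_index(alpha)] * dina_likelihood(y, alpha, Q, g, s))
        p_k = weights[1] / (weights[0] + weights[1])  # Bernoulli parameter for alpha_k
        alpha[k] = 1 if rng.random() < p_k else 0
    return alpha


# Toy usage with K = 2 attributes and J = 3 items (all values illustrative).
Q = [[1, 0], [0, 1], [1, 1]]
g = [0.2, 0.2, 0.2]
s = [0.2, 0.2, 0.2]
pi = [0.25, 0.25, 0.25, 0.25]
rng = random.Random(0)
alpha = sequential_update([1, 1, 1], [0, 0], Q, g, s, pi, rng)
print(alpha)
```
]]></p><p>In a full sampler this sweep would be applied to every examinee within each iteration, followed by the updates of the remaining parameters.</p></div>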
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Sample from the truncated multivariate normal distribution</head><p>Because the posterior is a normal distribution, only the multivariate normal distribution with truncation needs to be derived. Assuming that a random vector X &#8764; N p (&#181;, &#931;) and</p><p>When we ignore the truncation, the marginal distribution of X 1 is a normal distribution, which can be used to generate x 1 , and the conditional distributions of the following parameters:</p><p>. . .</p><p>are also normal distributions <ref type="bibr">(Anderson, 1958)</ref>, which are used to generate x 2 , &#8226; &#8226; &#8226; , x p . As a result, we obtain a realization (x 1 , x 2 , &#8226; &#8226; &#8226; , x p ) following N p (&#181;, &#931;).</p><p>In this paper, the first component X 1 is negative and the other components are positive.</p><p>Imposing these restrictions on (x 1 , x 2 , &#8226; &#8226; &#8226; , x p ) , we sample x 1 from the (-&#8734;, 0) part of the marginal distribution of X 1 and sample x 2 , &#8226; &#8226; &#8226; , x p from the (0, &#8734;) part of the remaining conditional distributions.</p><p>Furthermore, more complex restrictions can easily be imposed through this method.</p><p>For any X i , left censoring, right censoring and interval censoring can be employed.</p><p>Generating censored data from a univariate normal distribution is easy, so this method is rather flexible and simple.</p><p>D The Q-matrices for different K</p><p>The Q-matrices used for different K in the simulation studies are given in Table <ref type="table">12</ref>.</p></div>
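<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sequential scheme of Appendix C can be sketched as follows. The sketch assumes that the conditional normal distributions are obtained from a Cholesky factor of &#931; and that each univariate truncated draw uses the inverse-CDF method. Neither choice is stated in the text, and the function names and toy values are hypothetical. It only mirrors the component-by-component scheme as described and makes no further claim about the resulting joint distribution.</p><p><![CDATA[
```python
import math
import random
from statistics import NormalDist


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - acc)
            else:
                L[i][j] = (sigma[i][j] - acc) / L[j][j]
    return L


def truncated_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to (lower, upper).

    Note: this simple approach is numerically fragile far in the tails.
    """
    dist = NormalDist(mean, sd)
    lo = dist.cdf(lower) if lower > -math.inf else 0.0
    hi = dist.cdf(upper) if upper < math.inf else 1.0
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the argument of inv_cdf inside (0, 1)
    return dist.inv_cdf(u)


def sequential_truncated_mvn(mu, sigma, bounds, rng):
    """Draw x_1 from its truncated marginal, then x_k from its truncated conditional."""
    L = cholesky(sigma)
    z, x = [], []
    for k in range(len(mu)):
        # X_k | x_1..x_{k-1} is normal with this mean and standard deviation L[k][k]
        cond_mean = mu[k] + sum(L[k][j] * z[j] for j in range(k))
        cond_sd = L[k][k]
        xk = truncated_normal(cond_mean, cond_sd, bounds[k][0], bounds[k][1], rng)
        z.append((xk - cond_mean) / cond_sd)
        x.append(xk)
    return x


# Toy usage: negative first component, positive remaining components.
mu = [-1.0, 0.5, 0.5]
sigma = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.3], [0.2, 0.3, 1.0]]
bounds = [(-math.inf, 0.0), (0.0, math.inf), (0.0, math.inf)]
rng = random.Random(0)
print(sequential_truncated_mvn(mu, sigma, bounds, rng))
```
]]></p><p>Other restrictions of the kind mentioned in Appendix C would only change the entries of bounds, for example a finite interval for a single component.</p></div>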
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E Analyses of TIMSS 2007</head><p>The  A1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 A2 0 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 1 0 0 1 0 1 0 A3 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 1 0 0 1 0 1 K=5 A1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 A2 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 A3 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 A4 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 A5 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 K=7 A1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 A2 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 A3 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 A4 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 A5 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 A6 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 A7 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 K=15 A1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 A2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 A3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 A4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 A5 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 A6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 A7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 A8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 
0 1 0 0 A9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 A10 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 A11 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 A12 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 A13 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 A14 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 A15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1</p><p>The Q-matrix used by <ref type="bibr">Lee et al. (2011)</ref> is listed in Table <ref type="table">13</ref>. There are 15 attributes Since the items in TIMSS 2007 need up to 6 attributes, there exist the 6-way interaction parameters. The box-plot of estimated item parameters is given in Figure <ref type="figure">3</ref>(c). We can see that the average effect of intercept is negative and the others are positive. </p></div></body>
		</text>
</TEI>
