Modelling a student’s learning and mastery using BKT, IRT and PFA

Jun 13, 2020

Educational technology applies technology to education in order to improve learning outcomes. For example, machine learning models can be applied to pedagogy to understand how well a student is learning different concepts. This knowledge can be used to fine-tune teaching materials, improve online learning platforms or Intelligent Tutoring Systems (ITS), focus on areas where the student is weak, and accelerate the learning of important concepts.

In this article, we study a few popular models of a student’s learning: Bayesian Knowledge Tracing (BKT), Item Response Theory (IRT) and Performance Factors Analysis (PFA). BKT models how the student is imbibing or learning a new concept, while IRT is about testing or assessing the student’s ability when they try to solve questions of varying difficulty levels. PFA is a relatively new model that takes the student’s historical practice and performance into account, along with other real-life factors (such as multiple knowledge components being learned or tested, and accumulated learning on a knowledge component through repeated practice).

Traditionally, we test a student’s learning by their scores on a test. But a score is an aggregate value: various latent factors, such as the difficulty of the question and the ability of the student, contribute to it. These models shed some light on these factors.

We examine the basics of these theories and also how to apply them in real life with data on the student’s performance in a test.

Bayesian Knowledge Tracing (BKT)

BKT models the learning of a student as a Markov process. It is a very popular model and has many variations, such as individualised BKT and deep BKT (or deep knowledge tracing). The basic idea is that a student’s learning is not fixed: as a student interacts with a learning platform, their skill in a given concept improves. BKT therefore models learning, forgetting, guessing and slipping explicitly, and estimates how fast and how well the student is learning. It is widely used to evaluate how effectively online learning platforms improve a student’s learning.

The BKT model assumes that the student’s learning can be measured by giving them standardised tests on a concept or combination of concepts. Each question in such a standardised test is answered either correctly or incorrectly. BKT assumes that initially a student may not know a concept, but that their knowledge gets better with learning and practice related to that concept. Because learning is modelled as a Markov process with two hidden states (concept known or not known), the model is described by a small set of probabilities governing the initial state, the transitions between the states and the observed answers.

The BKT model has the following parameters, each a probability about the student (the standard model uses four, plus an optional forgetting parameter):

P_0: initial probability that the student already knows the concept before any practice

P_forgetting: probability that the student forgot something previously learned (this is often kept as 0, i.e. we assume that once a student has learnt a concept, their knowledge stays constant)

P_learned: probability that the student has learned something that was previously not known

P_slip: probability that the student gave a wrong answer even though they had learned the concept

P_guess: probability that the student guessed the right answer while not knowing the concept
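To make the role of these parameters concrete, here is a minimal sketch of the usual BKT update, written in plain Python. The function names and the parameter values are made up for illustration, and p_init is read here as the prior probability that the concept is already known. Each step first applies Bayes' rule to the observed answer and then applies the learning and forgetting transition.

```python
def bkt_update(p_known, correct, p_learn, p_slip, p_guess, p_forget=0.0):
    """Return P(known) after one answer: Bayes update, then learn/forget step."""
    if correct:
        # A correct answer comes from knowing without slipping, or from a guess.
        evidence_known = p_known * (1.0 - p_slip)
        evidence_unknown = (1.0 - p_known) * p_guess
    else:
        # A wrong answer comes from a slip, or from not knowing and not guessing.
        evidence_known = p_known * p_slip
        evidence_unknown = (1.0 - p_known) * (1.0 - p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Markov transition: a known concept may be forgotten, an unknown one learned.
    return posterior * (1.0 - p_forget) + (1.0 - posterior) * p_learn


def bkt_trace(responses, p_init, p_learn, p_slip, p_guess, p_forget=0.0):
    """Return the mastery estimate after each response in the sequence."""
    p_known = p_init
    trace = []
    for correct in responses:
        p_known = bkt_update(p_known, correct, p_learn, p_slip, p_guess, p_forget)
        trace.append(p_known)
    return trace


# Hypothetical parameter values and responses, for illustration only.
trace = bkt_trace([True, False, True, True],
                  p_init=0.2, p_learn=0.1, p_slip=0.1, p_guess=0.2)
print([round(p, 3) for p in trace])
```

The estimate rises after a correct answer and falls after an incorrect one, while the learning probability nudges it upwards at every step.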

Now let us assume that we have a dataset of the student’s responses to questions in a test, recording whether each answer was correct or incorrect. Fitting the BKT model then amounts to finding the values of the probabilities P_0, P_learned, P_slip and P_guess that best explain these responses.

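There are several ways to learn the parameters from the data. Classical BKT implementations typically use expectation maximisation or a search over the parameter space; another way, used in deep knowledge tracing, is to fit a neural network to the student’s attempt data.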
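As a simple alternative to a neural network, the parameters can also be estimated by a brute-force search that maximises the likelihood of the observed answers. The sketch below is only an illustration under stated assumptions: forgetting is fixed at 0, the grids are coarse and arbitrary, the response sequences are invented, and the function names are hypothetical.

```python
import itertools
import math


def sequence_log_likelihood(responses, p_init, p_learn, p_slip, p_guess):
    """Log-likelihood of one student's answer sequence under BKT (no forgetting)."""
    p_known = p_init
    log_likelihood = 0.0
    for correct in responses:
        # Probability of a correct answer given everything seen so far.
        p_correct = p_known * (1.0 - p_slip) + (1.0 - p_known) * p_guess
        if correct:
            log_likelihood += math.log(p_correct)
            posterior = p_known * (1.0 - p_slip) / p_correct
        else:
            log_likelihood += math.log(1.0 - p_correct)
            posterior = p_known * p_slip / (1.0 - p_correct)
        # Learning transition before the next question.
        p_known = posterior + (1.0 - posterior) * p_learn
    return log_likelihood


def fit_bkt_grid(sequences,
                 state_grid=(0.1, 0.3, 0.5, 0.7, 0.9),
                 noise_grid=(0.05, 0.15, 0.25, 0.35, 0.45)):
    """Return (best_params, best_log_likelihood) over a coarse parameter grid."""
    best_params, best_ll = None, -math.inf
    for p_init, p_learn, p_slip, p_guess in itertools.product(
            state_grid, state_grid, noise_grid, noise_grid):
        ll = sum(sequence_log_likelihood(seq, p_init, p_learn, p_slip, p_guess)
                 for seq in sequences)
        if ll > best_ll:
            best_params = {"p_init": p_init, "p_learn": p_learn,
                           "p_slip": p_slip, "p_guess": p_guess}
            best_ll = ll
    return best_params, best_ll


# Invented answer sequences for three students on one concept.
sequences = [[False, False, True, True],
             [False, True, True, True],
             [True, True, True, True]]
params, ll = fit_bkt_grid(sequences)
print(params, round(ll, 3))
```

Slip and guess are restricted to values below 0.5 here, a common way of keeping the two hidden states distinguishable; a finer grid or an iterative optimiser would be needed for real data.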

Some papers on the BKT and similar models are:

Corbett, A. T.; Anderson, J. R. (1995). “Knowledge tracing: Modeling the acquisition of procedural knowledge”. User Modeling and User-Adapted Interaction. 4 (4): 253–278.
Yudelson, M.V.; Koedinger, K.R.; Gordon, G.J. (2013). “Individualized bayesian knowledge tracing models”. Artificial Intelligence in Education.
Deep Knowledge Tracing (DKT)

There are a few available BKT packages which can also be used. Examples are

CAHLR/pyBKT (github.com): Python implementation of the Bayesian Knowledge Tracing algorithm and variants, estimating student cognitive mastery…
vinitra/CognitiveTutorPrediction (github.com): INFO C260F Getting started with prediction (part 1) Vinitra Swamy, Madeline Wu, Wilton Wu Team Name: "Four-layer Dip"…
bayesnet/bnt (github.com): Bayes Net Toolbox for Matlab.
myudelson/hmm-scalable (github.com): This command-line tool is developed to fit Hidden Markov Models (HMM) from the large datasets. In particular, it is…

Item Response Theory (IRT)

Unlike the BKT model, the Item Response Theory (IRT) model assumes that the student’s knowledge of a given concept is constant and just has to be determined by administering tests and measuring their performance. The model has a long history, going back to the 1920s. IRT is more popular for constructing psychometric tests and adaptive tests (where the difficulty level of successive questions increases or decreases as the student gives correct or incorrect answers to questions of a given difficulty level). Examples of such standardised tests include the GMAT, GRE and SAT.

The basic idea in an IRT model is this: there are some latent (hidden) traits, such as a question’s difficulty and discrimination, the student’s ability and a chance factor. From the student’s correct or incorrect responses to the test questions, we can estimate these latent variables.

There are four IRT models of increasing complexity, from the 1-parameter model (called the 1PL model) to the 4-parameter model (or 4PL model).

Some useful books to learn more about the IRT model are:

Handbook of Item Response Theory, Three Volume Set, by Wim J. van der Linden. https://www.routledge.com/Handbook-of-Item-Response-Theory-Volume-One-Models/Linden/p/book/9780367220013
List of books on Rasch (IRT 1 PL) model: https://www.rasch.org/rmt/rmt182d.htm
The Basics of Item Response Theory. Second Edition. by Frank B Baker https://eric.ed.gov/?id=ED458219

1PL IRT model

The 1PL (or 1 parameter) IRT model (also known as the Rasch model) is described as follows.

As per the 1PL model, the probability P_ij of the jth user correctly answering the ith question is given as

logit(P_ij) = θ_j - β_i

where the logit function is the log-odds, logit(p) = ln(p / (1 - p)). Equivalently,

P_ij = 1 / (1 + exp(-(θ_j - β_i)))

where

i is a question, j is a learner (student), θ_j is the ability of the learner, β_i is the difficulty level of the question.

Using the 1PL IRT model, we can estimate the ability level θ_j of a learner, given the data about the learner’s response to each attempted question.
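As an illustration of this estimation step, the sketch below finds the maximum-likelihood ability of one learner with a few Newton steps. It assumes that the question difficulties are already known, which is a simplification, and the function names and data are made up for the example.

```python
import math


def rasch_probability(ability, difficulty):
    """1PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, max_iter=50, tol=1e-10):
    """Maximum-likelihood ability for one learner, item difficulties known.

    responses holds 1 for a correct answer and 0 for an incorrect one.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have equal length")
    total_correct = sum(responses)
    if total_correct == 0 or total_correct == len(responses):
        # The likelihood keeps growing as ability goes to -inf or +inf.
        raise ValueError("no finite estimate for all-correct or all-wrong answers")
    ability = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(ability, b) for b in difficulties]
        # Gradient of the log-likelihood: observed minus expected score.
        gradient = sum(x - p for x, p in zip(responses, probs))
        # Negative second derivative (Fisher information).
        information = sum(p * (1.0 - p) for p in probs)
        # Newton step, clipped to avoid large jumps.
        step = max(-1.0, min(1.0, gradient / information))
        ability += step
        if abs(step) < tol:
            break
    return ability


# Invented data: three questions of increasing difficulty, two answered correctly.
theta = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
print(round(theta, 3))
```

At the solution the expected number of correct answers equals the observed number, which is a convenient check on the result.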

2PL IRT model

We explore the 2 parameter (2PL) IRT model in a similar way. The 2PL model adds an additional factor, discrimination, to the 1PL model. The discrimination parameter indicates how good the question is in discriminating between students of differing abilities.

As per the 2PL IRT model, the probability P_ij of the jth user responding correctly to the ith question is given as follows:

logit(P_ij) = a_i (θ_j - β_i)

where

i is a question, j is a learner (student), θ_j is the ability of the learner, a_i is the discrimination and β_i is the difficulty level of the question.
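To see what the discrimination parameter does, the short sketch below compares two hypothetical questions of equal difficulty. The ability values, parameter values and function names are invented for illustration.

```python
import math


def two_pl_probability(ability, difficulty, discrimination):
    """2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def ability_gap(discrimination, weak=-0.5, strong=0.5, difficulty=0.0):
    """Difference in success probability between a stronger and a weaker student."""
    return (two_pl_probability(strong, difficulty, discrimination)
            - two_pl_probability(weak, difficulty, discrimination))


# Two hypothetical questions with the same difficulty but different discrimination.
for a in (0.5, 2.0):
    print(f"discrimination={a}: gap={ability_gap(a):.3f}")
```

The question with the larger discrimination separates the two students more sharply, which is what makes it more informative around its difficulty level.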

3PL IRT model

The 3PL IRT model is more complex than the 1PL and 2PL models. It adds an additional factor, chance or guess, to the 2PL model. The guess parameter is the probability that a student of very low ability still answers the question correctly by guessing, so it sets the lower limit of the probability of a correct answer.

As per the 3PL IRT model, the probability P_ij of the jth user responding correctly to the ith question is given as follows:

P_ij = c_i + (1 - c_i) / (1 + exp(-a_i (θ_j - β_i)))

where i is a question and j is a learner (student), θ_j is the ability of the learner, a_i is the discrimination, β_i is the difficulty level of the question and c_i is the chance parameter, indicating the probability that a student will guess the answer to the question.

4PL IRT model

The 4PL IRT model adds an additional factor to the 3PL model, called d, which is the upper asymptotic limit on the chance of getting the answer correct.

As per the 4PL IRT model, the probability P_ij of the jth user responding correctly to the ith question is given as follows:

P_ij = c_i + (d_i - c_i) / (1 + exp(-a_i (θ_j - β_i)))

where

i is a question, j is a learner (student), θ_j is the ability of the learner, a_i is the discrimination, β_i is the difficulty level of the question, c_i is the chance parameter, indicating the probability that a student will guess the answer to the question, and d_i is the upper asymptotic limit.

Therefore, by training any of these models (1PL to 4PL IRT model) on the student’s test performance data, we can find the values of the variables such as discrimination level, difficulty level etc. Similar to the BKT case, a deep neural network could be used for training the model and determining the parameters.

One can then plot the item characteristic curve (student’s ability level on X axis vs probability of getting the answer right on Y axis).
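The points of such a curve can be computed directly from the model. The sketch below tabulates the curve for one hypothetical question under the 4PL model; the parameter values and function names are invented, and the simpler models follow by leaving the upper limit at 1, the guess parameter at 0 or the discrimination at 1.

```python
import math


def four_pl_probability(ability, difficulty, discrimination=1.0,
                        guess=0.0, upper=1.0):
    """4PL probability of a correct answer; the defaults reduce it to 1PL."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guess + (upper - guess) * logistic


def item_characteristic_curve(abilities, **item):
    """Return (ability, probability) pairs for one question."""
    return [(theta, four_pl_probability(theta, **item)) for theta in abilities]


# Hypothetical question: moderately hard, four answer options, small slip chance.
abilities = [x / 2.0 for x in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
curve = item_characteristic_curve(abilities, difficulty=0.5,
                                  discrimination=1.5, guess=0.25, upper=0.95)
for theta, p in curve:
    print(f"{theta:5.1f}  {p:.3f}")
```

The resulting pairs can be passed to any plotting library, with ability on the X axis and probability on the Y axis. The curve starts near the guess value, ends near the upper limit, and is steepest around the difficulty level.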

A few IRT packages include the following:

aimir/irt (github.com): Currently contains simple code, using a 4-parameter model, and allowing for partial credit. The parameter estimation is…
jplalor/py-irt (github.com): Bayesian IRT models in Python This repository includes code for fitting Item Response Theory (IRT) models using…
pluralsight/irt_parameter_estimation (github.com): This package implements parameter estimation for logistic Item Characteristic Curves (ICC) from Item Response Theory…
ckyeungac/DeepIRT (github.com): This is the repository for the code in the paper Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using…

Performance Factors Analysis (PFA)

The PFA model has different variables to model each knowledge component (KC) or skill. It is a variant of Learning Factors Analysis or LFA. It can be modelled using a logistic regression model where the student’s performance is the dependent variable.

It can be represented by the following equation:

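m(i, j, n) = a_i + sum_j (β_j + γ_j n_i,j)

Here

i is the ith student, j is the knowledge component (KC) or skill, n_i,j is the number of observations of the student i with KC j (representing the past practice of the student), m is the logit value representing the accumulated learning for the student, a_i is the ability of the student, β_j is a parameter representing the easiness (or difficulty) of the KC, γ_j is the rate at which practice on the KC adds to learning, and the summing is done over all the KCs involved. The probability of a correct answer is then 1 / (1 + exp(-m)).

This is the form inherited from LFA. PFA itself splits the practice count n_i,j into the student’s prior successes s_i,j and prior failures f_i,j on each KC, each with its own weight, and drops the student ability term:

m(i, j, s, f) = sum_j (β_j + γ_j s_i,j + ρ_j f_i,j)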

PFA is superior to the knowledge tracing models in that it takes into account the previous successes and failures of the student with each skill or knowledge component.
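The following sketch shows how a PFA prediction could be computed once the parameters are known. It uses the success and failure form of the model, in which each KC has an easiness β, a success weight γ and a failure weight ρ. The KC names, parameter values and function name are invented for illustration, and in practice the parameters would be fitted by logistic regression.

```python
import math


def pfa_probability(item_kcs, kc_params, successes, failures):
    """Probability of a correct answer on a question involving the given KCs.

    kc_params maps each KC to (beta, gamma, rho); successes and failures map
    each KC to the student's counts of prior correct and incorrect attempts.
    """
    m = 0.0
    for kc in item_kcs:
        beta, gamma, rho = kc_params[kc]
        m += beta + gamma * successes.get(kc, 0) + rho * failures.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-m))


# Hypothetical parameters for two knowledge components.
kc_params = {"fractions": (-0.5, 0.3, 0.1), "decimals": (0.2, 0.2, 0.05)}
successes = {"fractions": 3, "decimals": 1}
failures = {"fractions": 2}

p = pfa_probability(["fractions", "decimals"], kc_params, successes, failures)
print(round(p, 3))
```

Because successes and failures carry separate weights, two students with the same amount of practice but different track records receive different predictions.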

Some papers on the PFA theory are as follows:

Cen H., Koedinger K., and Junker B. Learning factors analysis–a general method for cognitive model evaluation and improvement. In Intelligent tutoring systems (2006), Springer, pp. 164–175.
Performance Factors Analysis, a new alternative
How to construct more accurate student models

Some GitHub projects that include PFA are as follows:

theophilee/learner-performance-prediction (github.com): Simple and performant implementations of learner performance prediction algorithms: Create a new conda environment…
yemao616/522fall-Project (github.com): This project is an effort to model student's changing knowledge during skill acquisition as they learn different…

Public Datasets for student’s learning

A few public datasets of student performance in tests (on which the BKT and other models may be trained and tested) are as follows:

1. ASSISTment 2009–2010 dataset https://sites.google.com/site/assistmentsdata/home

2. ASSISTment 2015 dataset https://sites.google.com/site/assistmentsdata/home/2015-assistments-skill-builder-data

3. KDD Cup 2010 cognitive tutor dataset https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp

4. Synthetic-5 dataset: simulated data used in the DKT paper, available on GitHub at https://github.com/chrispiech/DeepKnowledgeTracing

5. OLI Engineering statics 2011 dataset https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=507

Written by Joy Bose
