Identifying a research problem (Creswell)

Problem

In the spring of 2020, the COVID-19 pandemic disrupted everything, including, and perhaps especially, higher education. As universities realized that they could no longer hold in-person classes, the large majority pivoted to digital technologies so that teaching and learning could continue with teachers and learners separated both geographically and temporally. This represented a monumental change in practice for many faculty, who realized they needed to rethink how to reach learners in this new model.

Very early in my experience of supporting faculty who had mere days to think about how they would finish the spring semester without a physical classroom, there was one challenge that seemed to take priority over the rest:

<aside> 👉🏻 What about final exams?

</aside>

Some presumed that the shift would simply be one of modality: in-person lectures would be replaced with lectures delivered via web-conferencing technologies, with the same requirements for 'attendance' and 'participation' in place. Under this view, assessment could continue as close to 'normal' as possible by replacing high-stakes, high-security, in-person exams with high-stakes, high-security remote exams, with digital technologies standing in for in-person proctors.

Others recognized that the pandemic represented an opportunity to scrutinize the modality of teaching and learning in higher education (where and when teaching and learning happen), and that some pedagogical or andragogical practices employed in face-to-face settings had been exposed as inadequate for the task of higher education. Conversations about the security of faculty-created summative exams opened up conversations about the general quality of those exams. If every learner has ready access to their notes, textbooks, the internet, and each other for the duration of an examination, there are two main options: increase security through costly and invasive surveillance technology, or rethink how to assess learning. The latter calls for an entirely new approach to teaching and learning, one which centres the complex contexts of learners as they navigate their new physically isolated lives.

A mere shift in modality would only magnify the inequities already present in higher education: learners with test anxiety were suddenly forced to subject themselves to highly invasive surveillance technologies, compounding their anxiety and reducing their capacity to perform at their full capability. At the same time, because these technologies rely on faulty face-detection algorithms to function, they proved inadequate at detecting non-white faces, further entrenching inequity.

The alternative to doubling down on practices that we know to be ineffective and inequitable is, of course, to increase the degree to which assessments are sensitive to the personal context of, and equity between, learners. The challenge, however, as illustrated in figure 1, is that the degree to which an assessment is sensitive to context and equity is inversely correlated with the degree to which it can be considered valid and reliable. The only way to increase reliability and validity is to reduce the influence of local context: to increase the uniformity of testing procedures, of assumptions about learners, and of the contexts in which the test is written.

Figure 1: As an assessment's sensitivity to learner context and equity increases, its validity and reliability decrease.

What teachers know, however, is that every learner is different; no learner could be considered a Platonic ideal. So we are left on the horns of a dilemma: either we increase validity and reliability, or we increase sensitivity to context and equity. Unless, of course, there is a third option, in which we reconceptualize 'validity and reliability' as 'evidentiary warrant'. If assessment is 'reasoning based on evidence' (National Research Council, 2001), then we can see that different types of inferences are supported by different kinds of evidence.

Purpose

The purpose of this research is to highlight the tension between (1) the traditional practice of using high-stakes, high-pressure, in-class examinations for learner assessment in higher education, (2) the need for these examinations to be reliable and valid, and (3) the impact of different assessment strategies on the performance and holistic well-being of learners. The research will consider various approaches to resolving these tensions, including (1) providing professional development opportunities designed to increase faculty capacity to build tests that provide evidence for valid and reliable inferences, (2) increasing validity and reliability through automated test creation and deployment, and (3) rethinking the scale used to determine the quality of an inference.

Questions

  1. In what ways do high-stakes summative exams impact learners, especially traditionally disadvantaged learners?
  2. Does recognizing the lack of reliability in faculty-created summative assessments contribute to educational transformation?
  3. Does recognizing the impact of assessment practices on learner well-being and mental health contribute to educational transformation?
  4. What practices support warranted inferences in human-centred assessment design?
  5. How should post-secondary institutions (PSIs) respond to the proliferation of digital assessment technologies?
  6. What do learners think of various assessment practices?