Identifying a research problem (Creswell)

Topic

Reconciling Educational Measurement and Classroom Assessment

Assessment Research Overview

Problem

In higher education, there is a long history of using high-stakes, summative exams to determine learner grades in a course or program of study. These exams tend to be seen as 'objective' measurements of learners' knowledge, skills, and attitudes with respect to the learning outcomes of the course. This positivist view of the nature of learning has very real consequences for learners and faculty alike, the most significant of which is the dehumanization of a learning process that is non-linear, context-dependent, and idiosyncratic. This can be seen in the rise in mental health challenges among learners during parts of the semester with heavy exam loads. Not only does this disadvantage learners predisposed to anxiety, depression, or other mental health challenges, but it also disadvantages learners, such as Indigenous learners, who are acculturated to non-linear, context-dependent, and highly relational and participatory learning contexts. Beyond these negative mental health effects, high-stress examinations are also significant contributors to the incidence of academic dishonesty.

From a social constructivist perspective, learning is a social and participatory endeavour which relies on deep and meaningful interactions within and between learners and instructors. The positivist view that seeks to objectify and decontextualize the assessment of a subjective and contextual process can, ironically, be shown to be inadequate by its own standards; in other words, it is falsifiable on its own terms. This view borrows the assumptions made in large-scale assessments and attempts to apply those principles to the context of a classroom, yet does not generally employ the tools required for drawing inferences in large-scale contexts.

The gold standard by which tests are judged to be true indicators of latent constructs is that they are shown to be both reliable and valid. These interdependent ideas refer to how consistently a test measures over time or across populations (reliability) and how well a test actually measures the latent construct in question (validity). Psychometricians consider reliability values above 0.90 necessary for making high-stakes decisions about learner achievement in highly regulated fields like nursing, medicine, accounting, and law. For a summative assessment in higher education, a reliability of 0.80 is the minimum, and for pilot studies or exploratory research, 0.70 is the minimum. Anything below 0.70 is considered unreliable and should not be used to make meaningful decisions. Because reliability is a necessary (though not sufficient) condition for validity, if high-stakes, summative exams can be shown to be unreliable indicators of learning (below 0.80), then they cannot be considered valid and should not be used in higher education.
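To make the reliability thresholds above concrete, the sketch below estimates internal-consistency reliability with Cronbach's alpha, one common coefficient for faculty-created exams (other estimates, such as test-retest correlations, address consistency across time instead). The item-score matrix is hypothetical, invented purely for illustration:

    import numpy as np

    def cronbach_alpha(item_scores):
        """Estimate internal-consistency reliability (Cronbach's alpha).

        item_scores: 2-D array, rows = learners, columns = exam items.
        """
        item_scores = np.asarray(item_scores, dtype=float)
        k = item_scores.shape[1]                         # number of items
        item_vars = item_scores.var(axis=0, ddof=1)      # per-item sample variance
        total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of learners' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical item-score matrix: 6 learners x 4 items, each item scored 0/1.
    scores = [
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]

    print(f"alpha = {cronbach_alpha(scores):.2f}")  # ~0.66 on this invented data

On this invented data the estimate falls below even the 0.70 floor, let alone the 0.80 minimum for summative use; by the standard outlined above, that result would disqualify the exam from grounding high-stakes decisions.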

Purpose

The purpose of this research is to determine whether faculty-created, high-stakes, summative examinations can be considered reliable or valid measures of learning, and how post-secondary institutions might respond to the shortcomings of either high-stakes, summative testing or more learning-centric models of assessment. If learning-centric models of assessment can be shown to be more reliable than high-stakes, summative exams, and to also reduce stress and mental health challenges for learners, then post-secondary institutions should consider transitioning to learning-centric models.

Questions

  1. Are faculty-created, high-stakes summative assessments reliable measures of learning?
  2. If faculty-created, high-stakes examinations are not reliable, should we rely on technological solutions to increase reliability and validity? (See the JISC Report.)
  3. In what ways do high-stakes, summative exams impact traditionally disadvantaged learners?
  4. What practices support reliable measurements in human-centred assessment design?
  5. How do learners perceive and experience various assessment practices?

Notes