Formative Approaches to Assessment in Higher Education


Problem

Modern assessment practices in higher education focus heavily on high-stakes, summative, selected-response instruments that may lack evidence of validity and reliability (Broadfoot, 2016; Lipnevich et al., 2020). Since the spring of 2020, when higher education was forced to pivot to digital technologies, these instruments have increasingly been administered in high-security settings using digital surveillance tools that appear to negatively and systematically affect examinees with darker skin colour and those who are neurodivergent (Benjamin, 2019). Countering these trends will require paradigm shifts in both pedagogy and assessment that centre a relational approach to transformative learning experiences for learners (Black & Wiliam, 1998).

Psychometricians consider the validity and reliability of measurement instruments to be critical to their usefulness in supporting inferences about what examinees know or can do in relation to particular constructs (Finch & French, 2019). Large-scale assessments (LSAs) such as the Scholastic Aptitude Test (SAT) and the Graduate Record Examination (GRE) are used to make admissions decisions in colleges and universities (Test Specifications for the Redesigned SAT, 2015), while licensing examinations such as the National Council Licensure Examination (NCLEX) for nurses in the United States and Canada are used to ensure that graduates of nursing programs have the requisite knowledge and skills to be licensed as Registered Nurses (Wendt & Alexander, 2007). These LSAs undergo thorough vetting and statistical analysis to establish that they are valid and reliable predictors of success in college or in the nursing profession. LSAs are also designed to be context-independent, meaning that they provide useful information about an examinee regardless of where and when the assessment is completed. The consequences of these examinations being shown to be invalid or unreliable would be severe, undermining both admissions decisions for entry into higher education and licensure decisions in the nursing profession, where lives are genuinely at stake. These are only two of many LSAs used as gatekeepers into programs or professions.
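
To make the psychometric sense of these terms concrete, a brief sketch from classical test theory (a standard formalization, not drawn from the sources cited above) treats each observed score X as a true score T plus random error E, so that reliability is the proportion of observed-score variance attributable to true scores:

X = T + E, \qquad \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}

In practice, reliability is commonly estimated with internal-consistency statistics such as Cronbach's alpha for a k-item instrument, where \sigma^2_{Y_i} is the variance of item i and \sigma^2_X is the variance of total scores:

\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)

It is this kind of analysis that LSAs routinely undergo and that instructor- or publisher-created classroom instruments typically do not.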

Black and Wiliam (1998) and Guskey and Link (2019) report that classroom teachers, when given the opportunity, tend to emulate these summative LSAs by building their own assessment items and instruments or using publisher-created ones. Unfortunately, these instructor- or publisher-created instruments have not been vetted through the same degree of statistical analysis or refinement as typical LSAs, meaning that much of the assessment practice in higher education rests on untested assumptions of validity and reliability. Compounding this problem, classroom-based summative assessments are used in much the same way as LSAs: they serve as gatekeepers for learners hoping to progress through a degree program in a timely manner.

Other research suggests, however, that assessment in higher education may be conceptualized more broadly: generally, as reasoning from evidence, and more specifically, as comprising three interdependent components known as the assessment triangle (National Research Council, 2001). The three critical components of the assessment triangle are:

  1. Cognition: a model of how learners represent knowledge and develop competence in the domain.
  2. Observation: tasks or situations that elicit evidence of what learners know and can do.
  3. Interpretation: a method for drawing warranted inferences from the evidence the observations produce.

Lipnevich et al. (2020) report that 78% of the first-year undergraduate syllabi they examined relied heavily on exams to elicit evidence of learning, suggesting an imbalanced focus on the "observation" pillar of the assessment triangle. This imbalance may weaken any conclusions drawn from the evidence. Some faculty, however, approach assessment differently, eschewing a narrow focus on the "observation" pillar for a more balanced view: they consider the nature of the content or skills to be learned (cognition), align it with robust opportunities for learners to practice and demonstrate their new knowledge (observation), and draw warranted inferences in the form of formative feedback to the learner and instructor or, as appropriate, a summative rating (interpretation).

Purpose

The purpose of this research is to:

  1. Identify faculty and learner perceptions of various approaches to assessment in higher education (summative, formative, ungrading).
  2. Identify factors that encourage faculty to adopt more formative and human-centred approaches to assessment.
  3. Identify the effects of automating assessment practices (selected-response, automated scoring, technology-enabled surveillance) on learners' experiences of learning in higher education.
  4. Provide recommendations for faculty or departments seeking to reform assessment and grading practices.

Questions

  1. How do faculty model content or skills to be learned?
  2. What types of evidence do faculty use to determine final grades?