Approaches to Assessment

Approaches to assessment are commonly divided into two broad traditions: quantitative approaches, associated with assessment of learning, and qualitative approaches, associated with assessment for and as learning.

Jankowski et al. define assessment more procedurally than the NRC as,

Outcome/Assessment Misalignment

This gap between outcomes and assessments can also be considered in light of the model of assessment described by the National Research Council (NRC) in the United States [-@nationalresearchcouncilKnowingWhatStudents2001]. The authors describe the assessment triangle with three interdependent components: cognition, a model of how learners represent knowledge and develop competence in a domain; observation, the tasks or situations that elicit evidence of what learners know and can do; and interpretation, the process of drawing inferences from that evidence.

If any one of these components is lacking, the quality of the assessment process will be compromised. Accordingly, both the observation and the interpretation must align with the cognition component of the model. If, as Timmis et al. [-@timmisRethinkingAssessmentDigital2016] argue, the cognition component has changed with the increased use of digital technologies in society and in higher education, then it follows that the ways in which we observe learner competence and infer a summative grade from those observations must also change.

Assessment practices in higher education tend to focus on summative, selected-response instruments designed to maximize efficiency in administration and objectivity in scoring [@lipnevichWhatGradesMean2020]. These tests have notable benefits for learners, instructors, and institutions, including presumed greater objectivity in scoring, the potential for robust psychometric qualities, and more defensible grading practices. They also have downsides, however, including learner anxiety, the tendency of learners to cram for exams, misalignment between learning outcomes and assessments, and 'teaching to the test' [@broadfootAssessmentTwentyFirstCenturyLearning2016; @gerritsen-vanleeuwenkampAssessmentQualityTertiary2017]. Moreover, while these classroom assessments are designed to mimic the psychometric rigour of large-scale assessments such as the Scholastic Aptitude Test (SAT), they may lack evidence of validity and reliability [@broadfootAssessmentTwentyFirstCenturyLearning2016; @lipnevichWhatGradesMean2020].

Validity and Reliability

Psychometricians consider the validity and reliability of measurement instruments to be critical to their usefulness in supporting inferences about what examinees know or can do in relation to particular constructs [@finchEducationalPsychologicalMeasurement2019]. Large-scale assessments (LSAs) such as the SAT and the Graduate Record Examination (GRE) are used to make admissions decisions in colleges and universities [@TestSpecificationsRedesigned2015], while licensing examinations such as the National Council Licensure Examination (NCLEX) for nurses in the United States and Canada are used to ensure that graduates of nursing programs have the requisite knowledge and skills to be licensed as Registered Nurses [@wendtStandardizedEvidenceBasedContinued2007]. These LSAs undergo thorough vetting and rigorous statistical analysis to ensure that they are valid and reliable predictors of success in college or in the nursing profession. LSAs are necessarily context independent: they are intended to provide useful information about an examinee regardless of who they are or where and when they complete the assessment [@bairdAssessmentLearningFields2017]. If these examinations were shown to be invalid or unreliable, the consequences would be severe, undermining both admissions decisions for entry into higher education and licensure decisions in the nursing profession, where lives are genuinely at stake. These are only two of many LSAs used as gatekeepers into programs or professions.
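To make the notion of reliability concrete, one widely reported index of internal consistency is Cronbach's alpha. For a test of $k$ items, where $\sigma^2_i$ is the variance of scores on item $i$ and $\sigma^2_X$ is the variance of the total score,

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_X}\right),$$

with values approaching 1 indicating that the items covary strongly and that the total score is a more dependable measure. An index such as this is only one strand of the evidence LSA developers assemble; validity requires a broader argument about what the scores mean.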

Black and Wiliam [-@blackAssessmentClassroomLearning1998] and Guskey and Link [-@guskeyExploringFactorsTeachers2019] report that classroom teachers, when given the opportunity, attempt to emulate these summative LSAs by building their own assessment items and instruments or by using publisher-created ones. Unfortunately, these instructor- or publisher-created instruments have not been vetted through the same degree of psychometric analysis and refinement as typical LSAs, meaning that much of the assessment practice in higher education may rest on untested assumptions of validity and reliability. Compounding this problem, classroom-based summative assessments are used in much the same way as LSAs: they serve as gatekeepers for learners hoping to progress through a degree program in a timely manner.
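To illustrate the gap, the following sketch shows the kind of basic item analysis with which psychometric vetting might begin, applied to a classroom test. It is a minimal illustration, not a procedure drawn from the cited studies; the response matrix and variable names are hypothetical, and responses are assumed to be dichotomously scored (1 = correct, 0 = incorrect).

```python
import numpy as np

# Hypothetical learners x items matrix of dichotomous scores
# (rows are learners, columns are test items).
responses = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])

k = responses.shape[1]          # number of items
total = responses.sum(axis=1)   # each learner's total score

# Item difficulty: proportion of learners answering each item correctly.
difficulty = responses.mean(axis=0)

# Item discrimination: corrected item-total correlation, i.e. the
# correlation between each item and the total score with that item removed.
discrimination = np.array([
    np.corrcoef(responses[:, i], total - responses[:, i])[0, 1]
    for i in range(k)
])

# Cronbach's alpha: internal-consistency reliability of the total score.
item_variances = responses.var(axis=0, ddof=1)
total_variance = total.var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print("difficulty:    ", difficulty)
print("discrimination:", discrimination)
print("alpha:         ", round(float(alpha), 2))
```

Even this level of scrutiny, routine in LSA development, is rarely applied to instructor-created tests, to say nothing of the equating and differential item functioning analyses that large-scale testing programs conduct.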

Technology and Assessment

The use of technology in higher education has been particularly noticeable in assessment practices, with instructors relying on the ease of administration and efficiency of scoring of selected-response tests to determine learners' grades [@broadfootAssessmentTwentyFirstCenturyLearning2016]. Technology has been slower, however, to drive significant innovation in assessment structures themselves [@pellegrinoPerspectivesIntegrationTechnology2010]. This does not mean that technology lacks affordances that may empower new ways of thinking about how grading decisions are made; the use of learner blogs, for example, may create more opportunities for metacognitive reflection or for formative self- and peer assessment. Researchers caution, however, that the increased use of technology in assessment will require careful thought about the ethical use of data, especially as surveillance tools have begun to proliferate in the field [@oldfieldAssessmentDigitalAge].