Among the most important roles of instructors in higher education is certifying that each learner in their course has achieved a particular standard in relation to the intended outcomes of the course, and that this certification is a valid and reliable measure of the learner's true ability. The importance of this determination reflects how student achievement data are used not only in summative course assessments but also in predicting future success, awarding scholarships, and determining acceptance into competitive programs [@guskeyExploringFactorsTeachers2019; @bairdAssessmentLearningFields2017]. For this accounting of learning to accurately reflect the goals of any given course, the assessment strategy must be aligned with the course learning outcomes [@biggsTeachingQualityLearning2011]. However, Broadfoot [-@broadfootAssessmentTwentyFirstCenturyLearning2016] and Pellegrino and Quellmalz [-@pellegrinoPerspectivesIntegrationTechnology2010] argue that the goals and intended outcomes of higher education have changed as society has become more saturated with digital technologies. These changes have precipitated parallel shifts in both the cognitive and affective competencies required of digitally literate citizens. Predictably, this has widened the gap between traditional assessment structures, which prioritize validity, reliability, fairness, and objectivity through psychometric analyses, and the modern goals of higher education, which prioritize more affective constructs such as cooperation, empathy, creativity, and inquiry.
Complicating this problem is the trend, accelerated by the COVID-19 pandemic, towards the use of digital technologies to create, administer, and score assessments. Typically, digital technologies are used to increase the efficiency and objectivity of test administration [@benjaminRaceTechnologyAbolitionist2019] through automated scoring of selected-response tests, reinforcing traditional assessment structures. However, as Shute et al. [-@shuteAdvancesScienceAssessment2016] argue, digital technologies could instead be used to drive innovation in assessment practice while balancing the need for both quantitative and qualitative approaches to assessment.
This gap between outcomes and assessments can also be considered in light of the model of assessment described by the National Research Council in the United States [-@nationalresearchcouncilKnowingWhatStudents2001]. The authors describe an assessment triangle with three interdependent components: cognition (a model of how learners represent and develop competence), observation (the tasks used to elicit evidence of that competence), and interpretation (the inferences drawn from the observed evidence).
If any one of these components is lacking, the quality of the assessment process will be compromised. Accordingly, both the observation and the inference must align with the cognition component of the model. If, according to Timmis et al. [-@timmisRethinkingAssessmentDigital2016], the cognition component has changed with the increase in the use of digital technologies in society and in higher education, then it follows that the ways in which we observe learner competence and infer a summative grade from the observation must also change.
Assessment practices in higher education tend to focus on summative, selected-response instruments designed to maximize efficiency in administration and objectivity in scoring [@lipnevichWhatGradesMean2020]. While selected-response tests have notable benefits for learners, instructors, and institutions, including presumed greater objectivity in scoring, the potential for robust psychometric qualities, and more defensible grading practices, there are also downsides, including learner anxiety, the tendency of learners to cram for exams, misalignment between learning outcomes and assessments, and 'teaching to the test' [@broadfootAssessmentTwentyFirstCenturyLearning2016; @gerritsen-vanleeuwenkampAssessmentQualityTertiary2017]. While these classroom assessments are designed to mimic the psychometric rigour of large-scale assessments such as the Scholastic Aptitude Test (SAT), they may lack evidence of validity and reliability [@broadfootAssessmentTwentyFirstCenturyLearning2016; @lipnevichWhatGradesMean2020]. Researchers have called for instructors to reconsider their assessment practices [@timmisRethinkingAssessmentDigital2016], recognizing that the aims of higher education in the 21st century have shifted from a predominantly top-down transmission model, in which graduates demonstrate knowledge mastery in a cognitive domain, to a model that demands graduates demonstrate skills and attitudes in non-cognitive domains such as cooperation, problem-solving, creativity, and empathy, which selected-response instruments are ill-suited to assess [@broadfootAssessmentTwentyFirstCenturyLearning2016]. Encouraging a broader range of assessment structures will require paradigm shifts in pedagogy, assessment, and the use of technology that centre a relational, human-centred approach to transformative learning experiences [@blackAssessmentClassroomLearning1998].
Understanding how instructors think about and implement assessment structures and also how those assessment structures impact learners can help stakeholders plan for assessment in the 21st century.
Psychometricians consider the validity and reliability of measurement instruments to be critical to their usefulness in supporting inferences about what examinees know or can do in relation to particular constructs [@finchEducationalPsychologicalMeasurement2019]. Large-scale assessments (LSA) such as the SAT, the Graduate Record Examination (GRE), and other tests are used to make admissions decisions in colleges and universities [@TestSpecificationsRedesigned2015], while licensing examinations such as the National Council Licensure Examination (NCLEX) for nurses in the United States and Canada are used to ensure that graduates of nursing programs have the requisite knowledge and skills to be licensed as a Registered Nurse [@wendtStandardizedEvidenceBasedContinued2007]. These LSAs undergo thorough vetting and rigorous statistical analysis to ensure that they are valid and reliable predictors of success in college or in the nursing profession. LSAs are necessarily context independent, meaning that they are intended to provide useful information about an examinee regardless of who they are or where and when they complete the assessment [@bairdAssessmentLearningFields2017]. If these examinations were shown to be invalid or unreliable, the consequences would be severe, undermining both admissions decisions for entry into higher education and licensure decisions in the nursing profession, where lives are genuinely at stake. These are only two of many LSAs used as gatekeepers into programs or professions.
Black and Wiliam [-@blackAssessmentClassroomLearning1998] and Guskey and Link [-@guskeyExploringFactorsTeachers2019] report that classroom teachers, when given the opportunity, attempt to emulate these summative LSAs by building their own assessment items and instruments or using publisher-created ones. Unfortunately, these instructor- or publisher-created instruments have not been vetted through the same degree of psychometric analysis and refinement as typical LSAs, meaning that much of the assessment practice in higher education may rest on untested assumptions of validity and reliability. Further compounding this problem is that classroom-based summative assessments are used in much the same way as LSAs, serving as gatekeepers for learners hoping to progress through a degree program in a timely manner.
The use of technology in higher education has been particularly noticeable in assessment practices, with instructors relying on the ease of administration and efficiency of scoring selected-response tests to determine learners' grades [@broadfootAssessmentTwentyFirstCenturyLearning2016], but technology has been slower to drive significant innovation in the assessment structures it might afford [@pellegrinoPerspectivesIntegrationTechnology2010]. This does not mean, however, that technology lacks affordances that may empower new ways of thinking about how grading decisions are made. For example, the use of learner blogs may lead to more opportunities for metacognitive reflection or for self and peer formative assessment. Researchers caution, however, that the increased use of technology in assessment will require careful thought about the ethical use of data, especially as surveillance tools have begun to proliferate in the field [@oldfieldAssessmentDigitalAge].