Problem

The importance of assessment.

Among the most important roles of instructors in higher education is certifying that each individual learner in a course has achieved a particular standard in relation to the course's intended outcomes, and that this judgment is a valid and reliable measurement of the learner's true ability. The weight of this determination reflects how student achievement data are used not only in summative course assessments but also in predicting future success, awarding scholarships, and determining acceptance into competitive programs (Guskey & Link, 2019). For this accounting of learning to accurately reflect the goals of a given course, the assessment strategy must be aligned with the course learning outcomes (Biggs & Tang, 2011). However, Broadfoot (2016) and Pellegrino and Quellmalz (2010) argue that the goals and intended outcomes of higher education have changed as society has become saturated with digital technologies, and that these changes have precipitated a parallel shift in the cognitive and affective competencies required of digitally literate citizens. Predictably, this has widened the gap between traditional assessment structures, which prioritize validity, reliability, fairness, and objectivity established through statistical analyses, and the modern goals of higher education, which prioritize more affective constructs such as cooperation, empathy, creativity, and inquiry.

Outcome/assessment misalignment.

This gap between outcomes and assessments can also be considered in light of the model of assessment described by the National Research Council in the United States (2001). The authors describe the assessment triangle with three interdependent components:

  1. Cognition: a model of how learners represent knowledge and develop competence in a domain.
  2. Observation: the tasks or situations used to elicit evidence of what learners know and can do.
  3. Inference: the method used to interpret that evidence and draw conclusions about learner competence.

If any one of these components is lacking, the quality of the assessment process will be compromised. Accordingly, both the observation and the inference must align with the cognition component of the model. If, according to Timmis et al. (2016), the cognition component has changed with the increase in the use of digital technologies in society and in higher education, then it follows that the ways in which we observe learner competence and infer a summative grade from the observation must also change.

Assessment practices in higher education tend to focus on summative, selected-response instruments designed to maximize efficiency in administration and objectivity in scoring (Lipnevich et al., 2020). These instruments offer notable benefits for learners, instructors, and institutions, including greater objectivity in scoring, robust psychometric qualities, and more defensible grading practices, but they also carry downsides, including learner anxiety, the tendency of learners to cram for exams, misalignment between learning outcomes and assessments, and 'teaching to the test' (Broadfoot, 2016; Gerritsen-van Leeuwenkamp et al., 2017). While these classroom assessments are designed to mimic the psychometric rigour of large-scale assessments such as the Scholastic Aptitude Test (SAT), it appears they may lack evidence of validity and reliability (Broadfoot, 2016; Lipnevich et al., 2020). Researchers have called for instructors to reconsider their assessment practices (Timmis et al., 2016), recognizing that the aims of higher education in the 21st century have shifted from a predominantly top-down transmission model, in which graduates demonstrate knowledge mastery in a cognitive domain, to a model that demands graduates demonstrate skills and attitudes in non-cognitive domains such as cooperation, problem-solving, creativity, and empathy, which selected-response instruments are ill-suited to assess (Broadfoot, 2016). Encouraging a broader range of assessment structures will require paradigm shifts in both pedagogy and assessment that centre a relational, human-centred approach to transformative learning experiences (Black & Wiliam, 1998). Understanding how instructors think about and implement assessment structures, and how those structures affect learners, can help stakeholders plan for assessment in the 21st century.

Validity and reliability.

Psychometricians consider the validity and reliability of measurement instruments critical to their usefulness in supporting inferences about what examinees know or can do in relation to particular constructs (Finch & French, 2019). Large-scale assessments (LSAs) such as the SAT and the Graduate Record Examination (GRE) are used to make admissions decisions in colleges and universities (Test Specifications for the Redesigned SAT, 2015), while licensing examinations such as the National Council Licensure Examination (NCLEX) for nurses in the United States and Canada are used to ensure that graduates of nursing programs have the requisite knowledge and skills to be licensed as Registered Nurses (Wendt & Alexander, 2007). These LSAs undergo thorough vetting and rigorous statistical analysis to ensure that they are valid and reliable predictors of success in college or in the nursing profession. They are also designed to be context independent, meaning that they provide useful information about an examinee regardless of where and when the assessment is completed. Were these examinations shown to be invalid or unreliable, the consequences would be severe, undermining both admissions decisions for entry into higher education and licensure decisions in a profession where lives are genuinely at stake. These are only two of many LSAs used as gatekeepers into programs or professions.
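For readers unfamiliar with how psychometricians formalize reliability, a brief sketch in standard classical test theory notation may help (this is textbook psychometric material, summarized here rather than drawn from the sources above). An observed score $X$ is modelled as a true score plus error, and reliability is the proportion of observed-score variance attributable to true scores:

\[
X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}.
\]

Because true scores are unobservable, reliability must be estimated; a common internal-consistency estimate for a test of $k$ items is Cronbach's alpha,

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{Y_i}^2}{\sigma_X^2}\right),
\]

where $\sigma_{Y_i}^2$ is the variance of scores on item $i$ and $\sigma_X^2$ is the variance of total scores. Vetting an LSA involves, among many other analyses, checking that such estimates remain acceptably high across administrations and examinee populations.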

Black and Wiliam (1998) and Guskey and Link (2019) report that classroom teachers, when given the opportunity, attempt to emulate these summative LSAs by building their own assessment items and instruments or by using publisher-created ones. Unfortunately, these instructor- and publisher-created instruments have not been vetted through the same degree of statistical analysis and refinement as typical LSAs, meaning that much of the assessment practice in higher education may rest on untested assumptions of validity and reliability. Compounding the problem, classroom-based summative assessments are used in much the same way as LSAs: they serve as gatekeepers for learners hoping to progress through a degree program in a timely manner.
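As a concrete illustration of this gap, the following minimal sketch (in Python; the function and data are hypothetical, not taken from any cited study) computes Cronbach's alpha for a small classroom test: the kind of basic reliability check that LSAs undergo routinely but instructor-built instruments often never receive.

```python
# Minimal illustrative sketch: estimating Cronbach's alpha for a classroom
# test from a (students x items) score matrix. All data are hypothetical.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency estimate for a (respondents x items) score matrix."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Six students answering four dichotomously scored items (1 = correct).
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
])

print(f"alpha = {cronbach_alpha(responses):.2f}")  # ~0.70 for this toy data
```

Even a check this simple requires retaining item-level score data and is rarely part of classroom practice, which underscores how far instructor-built instruments sit from the vetting that LSAs receive.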

Summary

Assessment in higher education is critical and consequential, requiring careful planning to ensure alignment among the cognition, observation, and inference components of the assessment triangle. Where there is misalignment, the validity and reliability of the inference may be compromised. Many instructors in higher education try to emulate the psychometric rigour of large-scale assessments, such as the NCLEX for nurses, by designing their own selected-response instruments or using publisher-created ones. However, these instruments lack the psychometric qualities of LSAs, and they are ill-suited to assessing the skills and competencies required of digitally literate citizens.

Purposes and Possible Questions

  1. Identify factors which influence instructors’ approaches to assessment.
  2. Identify instructor perceptions of various approaches to assessment in higher education (summative, formative, ungrading).