NCTE - The National Council of Teachers of English - A Professional Association of Educators in English Studies, Literacy and Language Arts

Standards for the Assessment of Reading and Writing

 

Excerpts from the booklet prepared by the International Reading Association and National Council of Teachers of English Joint Task Force on Assessment 
(To order, see below.)

1994

 

The Standards

1. The interests of the student are paramount in assessment.

2. The primary purpose of assessment is to improve teaching and learning.

3. Assessment must reflect and allow for critical inquiry into curriculum and instruction.

4. Assessments must recognize and reflect the intellectually and socially complex nature of reading and writing and the important roles of school, home, and society in literacy development.

5. Assessment must be fair and equitable.

6. The consequences of an assessment procedure are the first, and most important, consideration in establishing the validity of the assessment.

7. The teacher is the most important agent of assessment.

8. The assessment process should involve multiple perspectives and sources of data.

9. Assessment must be based in the community.

10. All members of the educational community -- students, parents, teachers, administrators, policymakers, and the public -- must have a voice in the development, interpretation, and reporting of assessment.

11. Parents must be involved as active, essential participants in the assessment process.

 

Standard 1: The interests of the student are paramount in assessment. 

Rationale

This standard refers to individual students, not students on average nor students collectively. Assessment must serve, not harm, each and every student. This means that each individual's intellectual, social, and emotional well-being must be considered, even when the decision to be made from the assessment will affect other individual students or even an entire class or school.

We must recognize that assessment experiences, formal or informal, have consequences for students (see standard 6 -- consequential validity). Assessment procedures have profound effects on students' lives. Assessments may alter their educational opportunities, increase or decrease their motivation to learn, elicit positive or negative feelings about themselves and others, and influence their understanding of what it means to be literate, educated, or successful.

What features of assessment are likely to serve the students' interests? First and foremost, assessment must encourage students to reflect on their own reading and writing in productive ways, to evaluate their own intellectual growth, and to set goals. In this way, students become involved in and responsible for their own learning and better able to assist the teacher in focusing instruction. Past assessment practices, particularly normative practices, have often produced conditions of threat and defensiveness for students. Constructive reflection is particularly difficult under such conditions. Thus assessment should emphasize what students can do rather than what they cannot do. Portfolio assessment, for example, if managed properly, can be reflective, involving students in their own learning and assisting teachers in refocusing their instruction.

Second, assessment must provide useful information to inform and enable reflection. The information must be both specific and timely. Specific information on students' knowledge, skills, strategies, and attitudes helps teachers, parents, and students set goals and plan instruction more thoughtfully. Information about students' confusions, counterproductive strategies, and limitations, too, can help students and teachers reflect on and learn about students' reading and writing as long as it is provided in the context of clear descriptions of what they can do. The timeliness of the information is equally important. If information from assessment is not provided immediately, it is not likely to be used. Nor is it likely to be useful, because needs, interests, and aspirations are likely to change with the passage of time. In either case the opportunity to influence and promote learning may be missed.

Third, the assessment must yield high-quality information. The quality of information is suspect when tasks are too difficult or too easy, when students do not understand the tasks or cannot follow the directions, or when they are too anxious to be able to do their best or even their typical work. In these situations students cannot produce their best efforts or demonstrate what they know. Requiring students to spend their time and efforts on assessment tasks that do not yield high-quality, useful information results in students losing valuable learning time. Such a loss does not serve their interests and is thus an invalid practice (see standard 6).

Implications

This standard implies that if any individual student's interests are not served by an assessment practice, regardless of whether it is intended for administration or decision making by an individual or by a group, then that practice is not valid for that student. Since group-administered, machine-scorable tests do not normally encourage students to reflect constructively on their reading and writing, do not provide specific and timely feedback, and generally do not provide high-quality information about students, they seem unlikely to serve the best interests of students. Similarly, many less formal classroom assessments fail to meet these criteria. Regardless of the source or motivation for any particular assessment, states, school districts, schools, and teachers must demonstrate how these assessment practices benefit and do not harm individual students.

Assessment instruments or procedures themselves are not the only consideration in this standard. The context in which they are used can be equally important. For example, portfolio assessment that satisfies this standard when used in one class may not satisfy it in the context of a high-stakes assessment, such as an accountability assessment in which comparative scores are published in the newspaper. Authentic assessment tasks such as those being tested in California and in the New Standards Project in 112 school districts across the nation offer exciting and insightful possibilities for producing useful information. Students will perform "authentic" or "real-life" tasks over time, and these tasks can be evaluated at the district, state, and national levels and provide much more meaningful information about what a student knows and is able to do. Rather than a simple comparative reporting of aggregate test scores by a school or district, which provides numbers only and is more likely to produce defensiveness and anxiety than insight, such task-oriented assessments can produce meaningful information that shows the level of teaching and learning actually taking place in a learning community.

Indeed, the most powerful assessments for students are likely to be those that occur in the daily activity of the classroom.  Maximizing the value of these for students and minimizing the likelihood that they are damaging will involve an investment in staff development and the creation of conditions that enable teachers to reflect on their own practice.

 

Glossary of Assessment Terminology

Rapid changes in the field of reading and writing assessment have generated a variety of new terms as well as new uses for many established terms.  The purpose of this glossary is to specify how assessment terms are generally used in discussions of literacy assessment and to point out alternative meanings of terms where they are common.  We begin with curriculum since it is the foundation for our understanding of assessment as curriculum inquiry.

Curriculum

We can think of curriculum as having three components: the envisioned curriculum, the enacted curriculum, and the experienced curriculum.  The envisioned curriculum is the curriculum we intend for students to learn.  The enacted curriculum is our daily attempts in classrooms to put the envisioned curriculum into practice.  The experienced curriculum is the sense the language learner makes of what goes on in the classroom and is thus constructed within the language of that classroom.  For example, if most of the reading material in one class involves racial or gender stereotypes, then that is likely to be reflected in students' learning, and, by contrast, students are likely to construct different knowledge about human relationships from a more balanced selection of reading material.  However, the knowledge and attitudes students construct from those works are strongly influenced by the ways the teacher talks about them, the nature of group discussions, and the ways teachers and other students respond to each other.  Ultimately, it is the experienced curriculum that is our concern, and that is why students must be our primary curricular informants.  However, it is the discrepancies among the envisioned, enacted, and experienced curricula that drive curriculum inquiry, the process of assessment.

Standards 1, 3, and 4 are particularly closely related to issues of curriculum.

Aggregation

In assessment, aggregation is the process of collecting data together for the purpose of making a more general statement. For example, it is common practice for school districts to add together all of the test scores for their students in order to find the average performance of students in the district. This process strips away all of the differences among the various cultural groups, schools, and students within the district in order to make the larger statement. Even an individual student's test score is a result of aggregating all of the individual items to which the student responded in order to make a general statement about a student's "reading ability." It is also common then to "disaggregate" the scores to see how subgroups performed within the larger group. Aggregation and disaggregation are in some ways a matter of deciding what are relevant and what are irrelevant data.
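As a rough illustration of the arithmetic described above, the following sketch averages a handful of scores across a whole district and then averages the same scores school by school. The school names, the scores, and the function names `aggregate` and `disaggregate` are all hypothetical, chosen only to show how a single aggregate figure can hide differences between subgroups.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (school, score) records; the values are invented for illustration.
records = [
    ("School A", 82), ("School A", 78), ("School A", 90),
    ("School B", 55), ("School B", 61), ("School B", 64),
]

def aggregate(rows):
    """Collapse every record into one district-wide average."""
    return mean(score for _, score in rows)

def disaggregate(rows):
    """Average the same records separately for each subgroup (here, each school)."""
    groups = defaultdict(list)
    for group, score in rows:
        groups[group].append(score)
    return {group: mean(scores) for group, scores in groups.items()}

district_average = aggregate(records)    # about 71.7
school_averages = disaggregate(records)  # School A about 83.3, School B 60

print(f"District average: {district_average:.1f}")
for school, average in school_averages.items():
    print(f"{school}: {average:.1f}")
```

In this invented data set the district average of roughly 71.7 describes neither school well, which is the kind of information loss the paragraph above points to.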

There are powerful tensions in this society around the issue of aggregation -- reflecting, on the one hand, the need to make general statements about students, teachers, and schools, and, on the other, the problem of stripping away the particulars of individual performances and situations in the process. It is not universally agreed that it is valuable to reduce students or schools to numbers, let alone for which purposes or on what grounds that might be reasonable. It is often argued that administrators need highly aggregated data to make programmatic and budgetary decisions. However, both in education and in industry administrators make different decisions when facing aggregated data than they do when facing real situations with real persons.

Authentic, Performance-based Assessment

These terms and the kinds of assessment to which they refer arise from the realization that widely employed assessment tools generally have been poor reflections of what literate people actually do when they read, write, and speak. The logic of authentic assessment suggests, for example, that merely identifying grammatical elements or proofreading for potential flaws is not an acceptable measure of writing ability. For their writing to be assessed, students must write, facing the real challenges faced by literate people.

The general issue of the "realness" of what is being measured (its construct validity) is alluded to by the terms authentic assessment, performance-based assessment, performance assessment, and demonstrations. Regardless of what the assessments are called, the issue is that tests must measure what they purport to measure: a reading test requires a demonstration of, among other things, constructing meaning from written text; a writing assessment requires a demonstration of producing written text.

Controversy continues to exist about whether machine-scorable, multiple-choice tests have a place in a world in which the criterion of authenticity is applied systematically and rigorously to the evaluation of assessments. The issues of authentic, performance-based assessment are particularly relevant to standards 4 and 6.

Equity

Issues of fairness surround literacy assessment. Testing originated as a means to control nepotism in job selection -- providing an independent perspective on selection to uphold fairness. But equity cannot be assured through testing alone. Those who control the assessment process control what counts, or what is valued. As we pointed out in the introduction, language assessment is laden with cultural issues and biases. Although equity cannot be assured through assessment, it must be pursued relentlessly in assessment and in schooling. It is more likely to be achieved through the involvement of multiple, independent perspectives than through the use of a single perspective.

Tests have traditionally been administered, their results published, and their impact on instruction instigated with little regard to issues such as cultural, economic, and gender equity. But many equity issues affect assessment, rendering comparisons difficult and often meaningless. Because traditional test makers have all too frequently designed assessment tools reflecting narrow cultural values, students and schools with different backgrounds and concerns often have not been fairly assessed.

Equity issues also include the kinds of educational experiences available to students who will face similar assessments, particularly in certification or gatekeeping situations. Questions of access to sound instruction, appropriate materials, and enriching learning opportunities are critical. Educators have become increasingly aware of the connections between assessment results and levels of safety, health and welfare support, and physical accessibility. Any responsible assessment must engage the full complexity of situations faced by educational communities. These issues related to equity are most closely tied to standard 5, but touch all the standards here.

Norm-referenced or Criterion-referenced Assessment

"Referencing" is choosing a framework for interpreting something, in this case assessment data. Norm-referenced interpretations are based on comparisons with others, usually resulting in a ranking. A norm-referenced interpretation of a student's writing will assert, for example, that the sample of writing is "as good as that of 20 percent of the students in that grade nationally." Criterion-referenced assessment is based on predetermined criteria that serve as "yardsticks" or "benchmarks" of performance. Neither frame of reference is particularly illuminating instructionally.

Other, less common frames of reference are more productive in that regard. For example, performance can be interpreted in the context of previous performance (self-referenced). Performance can also be interpreted in the context of a particular theory of literate learning (theory-referenced). But these frames of reference have consequences for the whole process of assessment. They bring with them consequential changes in assessment procedures. In order to make self-referenced assessments one needs to arrange for the collection of historical examples. In order to make theory-referenced interpretations one has to have a coherent theory. To make norm-referenced assessments, assessment practices need to be standardized and focus on maximizing the differences among individuals on a single scale.

Norm-referenced testing is the most prevalent form of large-scale testing, in which large groups of students take a test and the scores are grouped and interpreted in relation to other scores. In other words, the score of any student or group (school, district, state, or nation) has meaning only in relationship to all the other scores of like entities, e.g., school to school, district to district, state to state. In order to make such comparisons, we have to make the assumption of "all else being equal." In other words, we try to make everything the same so that differences in performance can be attributed to one source: the student, or school, or district -- whichever is the level of aggregation. This assumption, as we pointed out earlier in our discussion of language, is extremely dubious. It does not usually take into account the differences that abound throughout the thousands of schools and districts relating to curriculum, culture, gender, ethnicity, economic circumstance, per-pupil funding, and so forth. National norm-referenced tests assume that all students in our society have had similar cultural and curricular experiences.

Norm-referenced interpretations often occur in classrooms, too. A teacher who has little knowledge of the complexity of literacy learning will often have to resort to comparisons and rankings in order to interpret students' reading and writing. Such normative assessments often turn up as grades on report cards. Teachers with a reasonably detailed knowledge of their students' reading and writing, on the other hand, will have difficulty reducing their knowledge to simple rankings for such purposes. Indeed, the process poses highly stressful ethical dilemmas for them. Although grades and rankings are a common part of the educational history of most individuals in this culture, this committee believes the practice to be unnecessary and generally counterproductive.

Some of the stakeholders in assessment -- parents, teachers, students, administrators, policymakers -- have been seduced into believing that norm-referenced test scores are readily interpretable and productive. However, when it comes to assessing reading and writing, norm-referenced test scores have little utility because they oversimplify highly complex processes. These processes cannot be evaluated by a machine-scored, multiple-choice test -- the most common form of norm-referenced assessment. Assessments based on norm-referenced tests give at best inadequate and often actually misleading information about many students. Most unfortunately, norm-referenced test scores have too often become the single most important criterion for decisions about placement and promotion that have a powerful impact on students' lives.

Criterion-referenced testing involves tests that compare students' performance against established benchmarks. These benchmarks or criteria are usually expressed as numerical ranges that define levels of achievement. For example, an 80-85 score may mean high performance among levels of achievement ranging from unsatisfactory to outstanding. Criterion-based testing can also involve holistic scoring of writing, for example, where a score is based on a set of pre-established consensual criteria.
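The difference between the two frames of reference can be sketched in a few lines. This is only an illustration under stated assumptions: the comparison groups, the cut points in `BANDS`, and the function names `percentile_rank` and `criterion_level` are hypothetical and do not come from any actual test.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced reading: percentage of the comparison group at or below `score`."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

# Hypothetical benchmark bands as (lower bound, label), highest band first.
BANDS = [(80, "high"), (60, "satisfactory"), (0, "unsatisfactory")]

def criterion_level(score, bands=BANDS):
    """Criterion-referenced reading: compare the score with fixed benchmarks."""
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    raise ValueError("score is below the lowest band")

# Two invented comparison groups of ten students each.
norm_group = [40, 50, 60, 70, 82, 88, 90, 93, 95, 97]
other_group = [40, 50, 60, 70, 75, 78, 80, 81, 82, 97]

print(percentile_rank(82, norm_group))   # 50.0
print(percentile_rank(82, other_group))  # 90.0
print(criterion_level(82))               # high
```

The same score of 82 reads as the 50th percentile in one group and the 90th in the other, while the criterion-referenced label stays the same. Neither number says anything about how the student reads or writes.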

Standards 1, 2, 3, 4, and 6 raise issues related to norm-referenced and criterion-referenced assessments.

Reliability

Broadly speaking, reliability is an index of the extent to which a set of results or interpretations can be generalized across tasks, over time, and among interpreters. In other words, it is a particular kind of generalizability. For example, a common concern raised by newer forms of literacy assessment is whether different examiners, evaluating a complex response and using complex scoring criteria, will draw similar conclusions about a student's performance (whether an assessment will generalize across different examiners). Experience from scoring complex student writing samples does suggest that when people are well trained in the application of specific criteria, high rates of agreement can be achieved; however, this agreement does not guarantee a high-quality assessment. Indeed, current assessment practices stressing reliability as the central quality of assessments generally focus on trivial matters, on which it is easiest to gain agreement. Reliability is only important within the context of validity -- the extent to which the assessment leads to useful, meaningful conclusions and consequences.
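One simple way to quantify agreement among examiners is the proportion of responses to which two raters assign the same score. The sketch below assumes two hypothetical raters scoring eight writing samples on a 1-5 scale; the scores and the function name `exact_agreement` are invented for illustration, and exact agreement is only one of several possible indices.

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of samples to which both raters gave exactly the same score."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of samples")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical scores from two raters on the same eight writing samples.
rater_a = [4, 3, 5, 2, 4, 3, 1, 5]
rater_b = [4, 3, 4, 2, 4, 3, 2, 5]

print(exact_agreement(rater_a, rater_b))  # 0.75
```

A high value here shows only that the raters applied the criteria consistently. As the paragraph above notes, it says nothing about whether the criteria capture anything worth knowing.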

In order to provide more "authentic" tasks, newer approaches to testing reading use more substantial bodies of text than the brief excerpts typical of older tests. Because these require more reading and response time, fewer assessment tasks or "items" are typical. For example, rather than having students read and answer multiple-choice questions about a dozen or more short passages, students may be asked to read one or two long pieces. The specific content of those passages may seriously influence a student's performance. This would limit the generalizability of any statements made about the student's reading of expository materials. In "one-shot" tests, there is thus a trade-off between the extent to which one can generalize performance in reading and writing to real ("authentic") situations, and whether one can generalize across examiners or tests.

One way to increase the reliability of statements about students' reading and writing performance while maintaining authenticity is to avoid dependence on one-shot tests, taking more advantage of continuous classroom assessment, at least where classroom practices reflect the literate activities of the real world. Standards 4 and 5 raise issues related to the reliability of assessment.

Validity

Historically, a common definition of a valid measure is that it measures what it purports to measure. The evidence for the validity of most reading and writing assessment tasks in the past was very thin, or nonexistent, often consisting only of how well a new test of reading, for example, correlated with some other measure of reading. If assessments of literate learning are to measure what they purport to measure, they will need to concern themselves with the nature of language. Valid assessments must then respect and value student diversity and acknowledge that there is generally no single "correct" response. Such assessments would allow for and encourage multiple interpretations of a reading selection and make provisions for allowing students to demonstrate their ability to construct meaning through multiple response modes such as writing, drawing, speaking, or performing.

To a very great extent, a valid assessment is one that reflects a valid curriculum. But more recent conceptions of validity include an examination of the consequences of assessment practices. In other words, one cannot have a valid assessment procedure that destroys curriculum in the process. Consequently, a more productive definition of a valid assessment practice would be one that reflects and supports a valid curriculum. As standard 6 asserts, assessment must have consequential validity.

Validity issues are addressed particularly in standards 1, 2, 3, 4, 5, and 6.

 


These excerpts are from Standards for the Assessment of Reading and Writing, prepared by the IRA/NCTE Joint Task Force on Assessment. To order Standards for the Assessment of Reading and Writing, visit the NCTE Online Bookstore, call the NCTE Customer Service Department toll-free at 877-369-6283, or e-mail .


 
 
 
Copyright © 1998- National Council of Teachers of English. All rights reserved in all media.
1111 W. Kenyon Road, Urbana, Illinois 61801-1096 Phone: 217-328-3870 or 877-369-6283