National Monitoring of Education
One important function of assessment is to monitor national changes in the education of young people so that the various stakeholders, including educators, the public, and their representatives, can take any necessary actions to improve the quality of education. The following case studies present two examples, the National Assessment of Educational Progress (NAEP) in the United States and the National Educational Monitoring Project (NEMP) in New Zealand. Following the descriptions of these two national assessments, Table 1 compares the ways in which they meet (or do not meet) the assessment standards.
Case 1: The United States’ National Assessment of Educational Progress
The NAEP was developed as a test broad enough to cover what the designers considered appropriate educational domains including mathematics, reading, science, writing, the arts, civics, economics, geography, and U.S. history. The test, which was far too big for individual students to take, was then broken into smaller overlapping tests. These have been administered to a representative national sample of 9-, 13-, and 17-year-old students every four years since 1969. The four-year cycle was considered appropriate because shorter-term systemic change was viewed as relatively unlikely.
The tests, which include multiple-choice and extended-answer questions, are administered by individuals hired and trained specifically for the purpose. The sampling system was designed to be nationally representative but is deliberately structured in such a way that comparisons cannot be made among states, school districts, or cities. Such comparisons were viewed as likely to increase the stakes involved and thereby encourage people to engage in activities such as “teaching to the test,” which would then affect the extent to which the results could provide a valid representation of general achievement.
The NAEP results are presented to the public as scaled scores (from 0–300 or 0–500, depending on the subject) and at five percentiles through National Report Cards. Gains and (particularly) losses in performance are attended to by the press and politicians. The numbers remain relatively abstract since only a small percentage of the items are released for scrutiny by the public. The item structure of this long-term trend assessment test has been consistent since 1971 so that direct comparisons can be made over time. Participation is mandatory and sampling includes public and private schools, though in 2004 the private school sample was too small to be reported.
In 1990, politicians decided that enabling state-by-state comparisons would be a good idea, and energy was diverted to development of a second NAEP test. This second test, now called the “main NAEP,” is administered at grades 4, 8, and 12, only in public schools. It allows state-by-state comparisons and, on a trial basis, comparisons of large urban districts. It is administered every two years and changes about every ten years to reflect curriculum changes. Tests are administered in science, math, reading, and writing. They are all administered in English. Some students are excluded for various reasons. Although participation in the state-level test had been voluntary, the No Child Left Behind Act of 2001 required states receiving Title I money to participate in the reading and math tests. Test items include multiple-choice, extended-answer, and short-answer questions, and results are reported both in scaled score performance levels and in categories of achievement (basic, proficient, and advanced) determined by cut scores. These are reported to the public by the press, though it seems likely that most who receive the information have little idea of what is meant by either the scaled scores or the categories (i.e., what it means to be “proficient”).
Case 2: New Zealand’s National Educational Monitoring Project
NEMP uses a national sampling of students over four-year cycles to assess 15 different areas of the national curriculum: art, music, speaking, listening, viewing, health and physical education, science, reading, writing, math, information skills, graphs, tables and maps, social studies, and technology. Knowledge, skills, motivation, and attitudes are all assessed. The assessment includes items addressing material not in the school curriculum in order to monitor the effects of any changes in the national curriculum. Students are assessed in English at two pivotal transition periods, year 4 (age 8–9) and year 8 (age 12–13). In Māori Medium settings, assessment is only at year 8. There is a deliberate effort to accommodate a range of differences in language, culture, gender, ability, and disability in the design and administration of assessment tasks. There are virtually no exclusions.
Almost all items are performance based, requiring students to work on tasks for three to four hours spread over five days, with the support of a trained teacher–test administrator. Tasks are selected to be meaningful and enjoyable for the students to ensure optimal engagement and the best picture of their capabilities. The task formats include working one-on-one with the teacher–administrator, working cooperatively in a group of four, and working independently on a series of hands-on activities or pencil-and-paper tasks. Some of the activities are videotaped and scored with rubrics. All items are carefully piloted.
In the NEMP, literacy is viewed as a social activity as much as a cognitive activity. For example, one task has a group of four year 4 students acting as the library committee. They are given a set of books and must choose, individually and then collectively, which books the library should purchase. The videotaped event is scored for the collaborative process as well as for individual performance.
School participation is voluntary; if a school is selected on multiple occasions or is unable to participate in a given testing, it is replaced with the most comparable school available. Replacement is rare because of a history of positive experiences. The test is administered by a group of teachers who are seconded from the schools, trained, and then returned to their teaching after the six-week test-administration period. Teachers are involved in the development of tasks, trialing of items, administration of tasks, and analysis of responses, and they report that the experience provides excellent professional development, which they share with their schools upon their return.
Results are reported to the public and to educators in terms of national performance and the performance of subgroups by demographics (e.g., race, gender, school size and characteristics). Results are reported in different formats to accommodate a wide audience, but typically they are reported in concrete terms of types of items, citing specific examples. About two thirds of the items are released in order to maintain transparency and, in addition, so that teachers might use these items to see how their students compare with the national sample.
Table 1. Analysis of National Monitoring Cases 1 and 2 in Relation to the IRA–NCTE Assessment Standards