New Approaches to the Assessment of Reading and Writing: Focus on Language Awareness

Silvia-Maria Chireac, Norbert Francis & John McClure

The following report is an abbreviated version of “Awareness of form and pattern in literacy assessment: Classroom applications for first and second language,” published in The Reading Matrix, v. 19, n. 1, pp. 20–34 (2019).

How educators understand literacy learning determines the methods we use for evaluating the progress that our students make in becoming proficient in reading and writing. The approach that we favor in our project is one that attempts to estimate this proficiency at the levels of orthographic knowledge, sentence processing and discourse comprehension/expression with the idea of better understanding the components of literacy. Here, the idea of “components” should be taken in the more informal or broad sense, the sense that we have in mind when thinking about how students learn the skills and abilities related to word identification, spelling, sentence production, text comprehension, coherent expression in writing, and so forth. Research on how these skills and abilities develop has suggested that awareness (focused attention) is an important aspect of ultimate attainment when advanced comprehension and expression are important (Ehri 2014, Gombert 2005). Focused attention and awareness is one of the key dimensions that the analysis of the pilot assessment, described in this report, sought to measure.

For example, the foundation of fluent reading, necessary for advanced literacy in both first and second language, is the development of efficient word recognition skills (Share and Stanovich, 1995). At the text level, effective comprehension depends on the ability to attend to and reflect upon ideas and concepts, and how they are organized into larger units. In the same way, in skilled writing, a special kind of awareness of language is necessary for on-going monitoring and self-correction even on the first draft attempt. How, then, can teachers measure progress in these different aspects of literacy learning?

The other consideration related to the relationship between assessment and teaching is the direct positive effect that the testing procedures should have on student learning. Ideally, learners should benefit, directly, from working on the problems they are presented with in the assessments we give them. The problem-solving tasks include the training activities they work on in preparation for an assessment and the review of correct and incorrect responses after an assessment. This is one of the ideas behind the concept of formative evaluation. In turn, measuring progress, of the summative kind, should be useful for program evaluation: is the literacy teaching program effective? Answering this question then counts as an indirect, longer term, benefit to learners.

San Lucas

Participants and their community

The site of the study was the town of San Lucas (approximately 6,000 inhabitants) located on the Pan-American Highway, in northern Loja Province, Ecuador. The economy is rural (predominantly cattle raising and the cultivation of corn, potato and bean), and 91% of residents self-identify as belonging to the Saraguro ethnic group, historically Quichua-speaking. In San Lucas, today, only a small percentage of the population speaks the language. Children enrolled in the public elementary school are all native speakers of the national language (NL) of instruction, Spanish. Knowledge of Quichua in the school-age population is rare. Consistent with the official Ministry of Education curriculum, students receive weekly instruction in Quichua as part of a language revitalization program.


Three assessments

Results from the piloting of the first assessment, of reading ability, will be reported below. In this section we discuss the design features of the other two tentative components of the literacy evaluation for the purpose of presenting an overall framework of how they might be related upon completing the design of all three components.

Reading — the cloze test

Of the different variants of the cloze procedure the project selected the text-level narrative type to tap into, potentially, comprehension processes beyond the sentence level in addition to word identification skills at the sentence level. For each of the 20 items, the reader selects the correct response from three choices (see “Historia de los niños que vivían en el bosque [Story of the children who lived in the forest],” Appendix 1). Omissions, together with the selection of distractors, were chosen primarily to require that the reader take subsequent sentence-level context into account (14 items). For six items, previous context provides sufficient information to eliminate the two distractors. Nevertheless, in all cases, all previous context from the beginning of the story might provide the reader with relevant information for a correct response. An omissions/total-word-count ratio of 1:9.6 was maintained by generally avoiding more than one omission in any independent clause, which was the case for 18 of 20 items. For example, in item #9 the omission appears in a dependent clause, while in items #13 and #14 the omissions appear in two separate independent clauses (see Appendix 1).

Items #9 and #10:

Cuando completó ____ veces siete, se le cayeron las ______….

(dos, los, la) (manos, piernas, brujas)

When [she] completed _______ times seven, from her fell off [her] ________….

(two, the [masc. plural], the [fem. singular]) (hands, legs, witches)

Items #13 and #14:

Salió al _________, se quejó y empezó a __________.

(otro, árbol, patio) (volar, velar, valer)

[She] went out to the _______, [she] groaned and began _______.

(other, tree, patio) (to fly, to watch over, to cost)
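As a rough check on the omission ratio reported above, the following minimal Python sketch estimates the length of the test passage. The function name `implied_word_count` is introduced here for illustration only, and the calculation assumes that the 1:9.6 ratio is taken over the full word count of the story; the text does not say whether the omitted words themselves are included in that count.

```python
def implied_word_count(omissions: int, words_per_omission: float) -> int:
    """Approximate total word count implied by a 1:N omission ratio."""
    return round(omissions * words_per_omission)

# Values reported in the text: 20 items and a ratio of 1:9.6.
print(implied_word_count(20, 9.6))
```

Under these assumptions the story would run to about 192 words, a length short enough to fit on a single test page.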

Almost all omissions required a content word for completion. We are proposing that the present closed-ended (choice-question) design represents an improvement over the project’s previous more open-ended (limited-response type) application of the cloze procedure which provided a word bank for each page of text. The word bank consisted of a list of correct responses plus one distractor. In turn, our hypothesis is that the multiple-choice format might be superior overall, especially for beginning readers, to the traditional cloze test that provides no choices. The latter no-choice design appears to impose processing conditions that are simply too onerous for all except the most advanced readers, potentially leading to unreliable results, especially in the lower grades and for beginning second language learners of the text language. Even using grade-level material, response patterns, depending on circumstances, can end up being massively random for a significant portion of the participants, in addition to results with many items left blank.

In a study by Mostow et al. (2017) the interesting design features of the cloze test that provides choices for each omission in a closed-ended design are discussed at length. For example, the selection of distractors allows for managing the task difficulty of items and possibly studying different aspects of word identification and comprehension processes. Among the three incorrect choices distractors can be systematically included that are:

  • ungrammatical,

The authors’ design emphasized predictability even more than in our proposal by omitting only the final word of a sentence, with a resulting omissions/total words ratio that is even higher. The point is well taken, as ratios lower than 1:9, in our view, begin to undermine the very purpose of the assessment: estimating reading comprehension approximating actual text processing demands, an important consideration of validity. An interesting contrast to the Mostow study, together with the present closed-ended item proposal, is the discussion by Brown (2013) regarding the limitations of the traditional cloze procedure (no multiple-choice) in a wide-ranging retrospective of the research.

Error correction

Separately, a series of three error correction evaluations was designed utilizing the same text to probe the initial acceptance on the part of students of the assessment tasks. Future piloting will select the one error correction test that yields the best performance across the grades to accompany the cloze test. The correction task requires students to read the story and identify as many errors, previously introduced into the text, as possible, and then supply the correct form. The assessments seek to estimate students’ ability to identify and correct, on the three separate instruments, errors of spelling (20 errors total, Appendix 2), grammar (20 errors, Appendix 3) or punctuation/capitalization (25 errors, Appendix 4).

Target errors of orthography on the spelling test include 14 homophonous or near-homophonously spelled non-words, and 6 similarly sounding words or non-words. On the grammar correction test, 19 errors are correctly spelled words plus one omission (e.g., errors of concordance, subject-verb agreement and gender, all of which correspond to basic sentence-level grammar knowledge of native speaking children, not aspects of higher-order academic register). On the third correction test, all 21 capitalization errors occur in real words that are otherwise spelled correctly. All 4 target punctuation errors consist in the omission of a period (a clue is provided by inserting a space and correctly writing the first word of the next sentence with an upper-case letter). Thus, on each test, in turn, readers are specifically directed to attend to spelling errors, to sentence grammar errors, or to grammar problems at the sentence level and beyond occasioned by the error of punctuation or upper/lower case.

Each item on the three instruments receives a score of 2 (successful identification + correction), 1 (identification only), or 0. We recommend that, when the cloze test and error correction test are applied as a pair, the former precede the latter, preferably on separate days. During the application of the cloze test, students can be told that if they select all the correct words there will be no omissions or errors remaining. In any case, the tests should always be given in the same order from one group to another: cloze test first. If the complete piloting project so indicates, using the same instrument for 2nd, 4th and 6th grades would allow direct group comparisons to be made easily. For obvious reasons, none of the proposed assessments are recommended for use as tests in 1st grade. For 1st grade, the materials can be used as a group learning activity, with the teacher reading the text aloud to the class.
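The item scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's scoring procedure: the function names and the representation of each response as an `(identified, corrected)` pair are assumptions introduced here, as is the student data in the example. Only the 2/1/0 rule and the number of target errors per test (20, 20 and 25) come from the text.

```python
# Number of target errors on each instrument, as reported in the text.
ERRORS_PER_TEST = {
    "spelling": 20,
    "grammar": 20,
    "punctuation_capitalization": 25,
}

def score_item(identified: bool, corrected: bool) -> int:
    """Return 2 for identification plus correction, 1 for identification only, else 0."""
    if not identified:
        # Assumption: a correction earns no credit unless the error was identified.
        return 0
    return 2 if corrected else 1

def score_test(responses) -> int:
    """Sum item scores over a list of (identified, corrected) pairs."""
    return sum(score_item(identified, corrected) for identified, corrected in responses)

def max_score(test_name: str) -> int:
    """Highest possible total: 2 points for each target error on the test."""
    return 2 * ERRORS_PER_TEST[test_name]

# Hypothetical student on the spelling test:
# 12 errors corrected, 5 identified but not corrected, 3 missed.
responses = [(True, True)] * 12 + [(True, False)] * 5 + [(False, False)] * 3
print(score_test(responses), "of", max_score("spelling"))
```

Because the three instruments contain different numbers of target errors, totals would need to be expressed as a proportion of the maximum before comparing performance across tests.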

This design, we propose, is an improvement over our previous (open-ended) design for error correction in which students freely selected which errors to correct on their own first draft compositions. On the previous design, after a training session that introduced the children to correcting and revising techniques, attempts were scored for effectiveness (degree to which the revision improved the original text segment) and for level of text (word, sentence, discourse). Aside from considerations of practicality (the test is usable only by full-time researchers), there is no way to systematically focus the assessment on learners’ ability to attend to specific error patterns of interest to the teacher and the instructional program.

Sample writing assessment from Mexican students

Written expression

Finally, to complete the series, a writing sample was taken from students, utilizing a completely open-ended design, based on a narrative model presented orally with graphic context support (“El cazador de venados [The deer hunter],” Appendices 7–10). Following up on the successful application of this prompt from a previous study in Mexico, a satisfactory initial response (full participation by all grade levels) was received, pending analysis for coherence and inter-grade level comparison. As a typical classroom literacy activity, no previous training session is required, as was widely confirmed in the Mexican study in two separate second language learner communities, with participants starting at age eight. Nevertheless, providing a narrative model (serving as a story-grammar framework) from which to compose is an important feature to ensure consistent access to a previous knowledge schema for the writing task, necessary for comparison purposes. Foreign language and indigenous language versions of the writing assessment follow the same protocol with a different story of equivalent event structure and narrative pattern, again for comparison (see Francis, 2012, for implementation in two languages for bilingual learners).



Especially taking into account the relevant experience of the youngest beginning readers, evaluator and students work together on the cloze training activity (“El rey Midas [King Midas]” Appendix 5) to solve all of the items. Learning the logic of the cloze test, for example how to process both previous and subsequent context, and in the present case, for understanding the concept of distractor, is important for maintaining comparability across the grade levels. The evaluator selects a distractor to prompt students: why the wrong choice produces an ungrammatical sentence or a sequence that is not semantically compatible with the rest of the sentence or with the larger passage (“doesn’t make sense”).

For the error detection and correction assessment, students will first work with the teacher/evaluator on the correction of the training texts (Appendix 6):

  • how to be attentive, respectively, to errors of spelling, grammar or punctuation/capitalization, and

After helping students make the appropriate revisions, in the same way as on the cloze training activity, the teacher/evaluator deliberately selects a non-target word (no error) to modify incorrectly. Students are prompted to explain how a correct form was made incorrect. We emphasize that, except for the writing sample, for which no practice session is required, students should not attempt any of the assessments without first working with the teacher/evaluator on the corresponding training activity. After reviewing responses on the practice test as a group, students submit a completed attempt with responses to all items answered correctly. The first two assessments, cloze and error correction, should be given on different days, and students should not be allowed to consult their cloze test page while working on the error correction assessment.


On the cloze test, there appeared an interesting correlation between the need to use subsequent sentence context and difficulty. Hypothetically, the more difficult items required more deliberate attention to meaning and grammar pattern and higher degrees of controlled processing at the word and sentence level. Thus, this dimension, related to sentence complexity, can be varied to generate a range of difficulty among items. The finding, by the way, coincides with previous results from our Mexican assessment project: students who were able to recover from reading errors (self-correcting) by successfully attending to wording after the error were shown to be, on average, more attentive readers and writers on other measures (Francis, 2012). We attributed the development of this skill set, hypothetically, to more advanced levels of language awareness. We propose that the construct underlying this dimension of item difficulty (in that it follows from general theoretical principles of information processing in reading), together with the finding of progressive and consistent advances across 2nd, 4th, 6th and secondary grades, can be taken as evidence in favor of overall validity.

The present alternative for classroom literacy evaluation, in its three parts described here, proposes for future research the idea that language awareness forms part of the central core of advanced literacy ability. Further empirical support for our proposal is still needed. But the different features of these assessments that are conceivably related to language awareness, or metalinguistic awareness, appear to be tied to the kinds of knowledge and skill that support advanced proficiency in academic uses of language. They appear to be the kinds of abilities that teachers promote when students are challenged with reading and writing tasks that are more demanding. Metalinguistic awareness (MA), if the hypothesis is shown to be correct, presents itself at all levels of processing when learners go beyond basic implicit language ability (as in everyday conversation):

  • beginning with phonemic and orthographic awareness, for young readers in particular,

In the higher-order domains, MA can be understood as metacognition applied to discourse-level comprehension, reflection on and monitoring of understanding. Metacognition of this kind comes into play, for example, in the cognitive demands of challenging expository text reading in school. Metacognitively aware readers apply self-monitoring strategies such as detecting the breakdown of comprehension to then engage in self-correction. When composing, writers would do the same when checking their own writing for correct spelling, sentence meaning and text-level coherence, all of this the hallmark of academic language proficiency in one of its core acquisitions: secondary discourse ability. These kinds of ability were the ones that we had in mind in the design of the error correction tests in particular (Appendices 2, 3, 4).

In regard to the features of the cloze test, while frequent backtracking in reading has been shown to correlate with difficulties in decoding among poor readers, other aspects of retrospection (“looking and thinking back”) might be shown to be associated with monitoring strategies characteristic of skilled and efficient reading. Hypothetically, monitoring takes effect at the point of contact where accurate decoding and comprehension processes come together, in a kind of self-regulation. Monitoring can be thought of as real-time problem solving and “trouble-shooting” applied to revision strategies, as in the case of the repair of comprehension breakdown. But to reiterate a point made in the Introduction, efficient word identification is the basic foundation that allows this kind of monitoring and self-correction to work.

Considering again the idea of evaluating both lower-level word processing and higher-level comprehension and expression, the proposed three-part series presents assessment options that include the range from:

  • the closed-ended choice item (the cloze test),

The limitations of each design might be compensated for by results from one or both of the other assessment designs, each of which presents a different set of limitations, advantages and disadvantages. For example, trade-offs involve different aspects of reliability, the transparency of task requirements (in the case of each assessment, what the target performance consists of), and the problem of making valid interpretations from results.

Returning to the question in the Introduction regarding the positive effect that evaluation methods should have on student learning, Amini and Ibrahim-González (2012) suggest from their findings that the cloze procedure, for example as a learning task integrated into the teaching program, compares favorably to other methods such as the post-reading comprehension question. Recall our recommendation on how to implement the cloze training activity in the section on procedures, in which students receive direct instruction on productive reading strategies. The error-detection/correction training activities (Appendix 6) are designed with this same purpose in mind.


Increasingly, students in Latin America are learning second languages in school, a minority or indigenous language (IL) as part of a language revitalization program, and/or English as a foreign language (FL), for learning a language of international communication.

The present Ecuador pilot study continued work on a previous literacy assessment in the Mexico study with an eye toward developing instruments for classroom use by teachers themselves. In the case of the multiple-choice cloze test, the new streamlined design, with its ease of administration and scoring, represents an improvement over previous formats and, aside from practicality, might be more accessible for beginner and second language (L2) readers. The reason for this is that solution of items requires recognition, as opposed to production of a language form, which is often retrieved with more difficulty from long-term memory. For example, in the case of a given lexical item, the L2 learner may have mastery of some of the components (e.g., orthographic representation, an aspect of meaning) but may know others only partially or not at all (its phonological form, properties of syntax and morphology). If the current multiple-choice format proves to be effective, parallel and equivalent assessments can be easily designed in the additional languages that form part of the L2 learning curriculum: FL-English and IL-Quichua, in the case of the highland region of Ecuador.

Scoring the limited-response type error correction test is somewhat more demanding, requiring a judgment: the comparison between the student’s response and the answer key’s expected response. As on the cloze test, at least partial credit is obtained by recognizing the incorrect form. The writing assessment, of course, is based on a purely productive task.

Satisfactory results would be indicated by parallel progress across the grades when performance on the literacy assessments is compared between the languages. Work on the development of the parallel L2 assessment materials follows the recommendations in Ryan and Brunfaut (2016): where, often, the “ideal test writer profile is…spread across more than one person” (p. 394). Second language materials are parallel and equivalent, but not translations of the L1 version. (1)


Chireac and Francis (2018) proposed an international study for evaluating the actual literacy achievement parity between boys and girls that should correspond to the region-wide Latin American gender parity in school enrollment at both the primary and secondary levels, a historical attainment today confirmed by United Nations surveys of participation (attendance) in public school programs. Specifically, the proposal was for this evaluation to be carried out in rural community schools, with an emphasis on geographically distant and isolated localities, for the purpose of confirming that actual parity in literacy performance corresponds to the reported enrollment parity, evenly, across all regional school systems. The report by Chireac and Francis, based on partial findings from three rural communities in Mexico and Ecuador, presented the working hypothesis that, in fact, such gender parity has been attained, potentially measurable in actual performance on objective evaluation of literacy skills. Given that individual records of national school literacy testing, if they exist, cannot be relied upon in these communities, an on-site independent measure, taken by visiting evaluators, requires the use of an instrument of the type presented in this paper. For example, administered immediately following its corresponding training test, the multiple-choice cloze test provides for optimal conditions of consistency from one setting to the next. Aside from applications in program evaluation, as in the previous two examples, classroom teachers should benefit from ongoing access to practical assessment tools tied to the learning objectives of their lesson plans.

Bilingual classroom


1. Teachers and researchers are invited to administer the training activities and assessments in Appendices 1–10, and are welcome to publish the results, in addition to sending us a summary of the findings. Additional information regarding implementation of the training activities, testing and grading procedures is available by writing to the authors. As mentioned in the Method section, none of the materials are appropriate for 1st grade testing. In addition, individual student scores cannot be used for either norm-referenced or criterion-referenced comparisons. As should be obvious, but to be absolutely clear, the score of a given student (e.g. in Bolivia) that falls above or below the average reported for grades 2, 4 and 6 in this article (results from three elementary grade classrooms in Loja Province, Ecuador) should not be used as an estimate of his or her individual literacy learning progress in school or for any other individual assessment purpose, formal or informal.


Appendix 1

Cloze test

Appendix 2

Error correction test: Spelling

Appendix 3

Error correction test: Grammar

Appendix 4

Error correction test: Punctuation/Capitalization

Appendix 5

Cloze training activity

Appendix 6

Error correction training activity

Appendix 7

Model story (prompt) for writing assessment

Appendix 8

Graphic context support for writing assessment

Appendix 9

Sample response paper (two-sided)

Appendix 10

Writing Sample


Amini, M. and Ibrahim-González, N. (2012). The washback effect of cloze and multiple-choice tests on vocabulary acquisition. Language in India, 12, 71–91.

Brown, J. D. (2013). My twenty-five years of cloze testing research: So what? International Journal of Language Studies, 7, 1–32.

Chireac, S.-M. and Francis, N. (2018). Alfabetización en la comunidad rural de América Latina: Las niñas en la escuela. Contextos Educativos, 21, 153–168.

Ehri, L. (2014). Orthographic mapping in the acquisition of sight word reading, spelling memory and vocabulary learning. Scientific Studies of Reading, 18, 5–21.

Francis, N. (2012). Bilingual competence and bilingual proficiency in child development. Cambridge: MIT Press.

Gombert, J.E. (2005). Apprentissage implicite et explicite de la lecture. Rééducation Orthophonique, 223, 177–187.

Mostow, J.; Huang, Y.-T.; Jang, H.-J.; Weinstein, A.; Valeri, J. and Gates, D. (2017). Developing, evaluating, and refining an automatic generator of diagnostic multiple choice cloze questions to assess children’s comprehension while reading. Natural Language Engineering, 23, 245–294.

Ryan, E. and Brunfaut, T. (2016). When the test developer does not speak the target language: The use of language informants in the test development process. Language Assessment Quarterly, 13, 393–408.

Share, D. and Stanovich, K. (1995). Cognitive processes in early reading development: Accommodating individual differences into a model of acquisition. Issues in Education, 1, 1–57.



Norbert Francis works on problems of language and culture, research in Latin America and East Asia.
