Journal of the Slovene Association of LSP Teachers
ISSN: 1854-
Melita Djurić
Dealing with Situations of Positive and Negative Washback
ABSTRACT
The article deals with the complexity of the washback phenomenon in language testing.
It focuses on its positive effects within an institution as well as on situations
of negative washback. Washback is presented as a stimulus for change and as a bridge
for efficient communication between teachers and testers. Certain changes resulting
from positive washback point to the opportunities which a testing institution has when
it organizes, designs and administers criterion-
The complexity of washback is confirmed when the teachers’ perspective is discussed.
Teacher-
Keywords: washback, STANAG tests, criterion-
validity.
1. Introduction
The concept of washback covers both teaching and testing situations. Washback is generally known as the effect of testing on teaching. Alderson and Wall (1993, as cited in Fulcher and Davidson, 2007: 224) described it as “a complex phenomenon” and in their washback hypotheses they assumed that teachers and learners “do things they would not necessarily otherwise do because of the test”. Alderson and Wall presented numerous elements which create positive or negative washback and emphasized the need to further investigate the nature of washback. In the second edition of his book, Hughes (2003) writes about “greater interest in backwash than was previously the case” and acknowledges its importance in language testing (2003: 53).
Zavašnik and Pižorn (2006) examined the situation in Slovenia and found that there were no empirical studies of washback although external examinations and proficiency testing in a foreign language were introduced at the national level by the National Examination Centre in 1996. Another unfortunate fact is that testing as a field of applied linguistics is not included in the undergraduate or postgraduate studies of Slovenian language teachers. Consequently, this has resulted in an absence of empirical studies and research into the current language testing situation in Slovenia.
It is not only the National Examination Centre which organizes language testing in Slovenia. The Ministry of Defence also needs to organize proficiency testing that follows the NATO Standardisation Agreement (STANAG), which under the number 6001 describes language levels set for the international military community. The Slovenian Ministry of Defence established the School of Foreign Languages (SFL) in 1999 and STANAG testing for English started the same year.
This paper deals with washback at the institutional level as a result of ten years
of testing experience at the SFL. The text begins with the theoretical background
of three issues which are closely connected, criterion-
2. Criterion-
Brown and Hudson (2002: xiv) claim that criterion-
Hughes (2003: 21) explains the purpose of criterion-
Criterion-
2.1. Test validity
This section does not attempt to cover the complete theoretical background of test validity. It deals with those aspects of test validity which support the connection between test validity and washback, especially in view of the issues developed later in the paper.
Alderson and Wall (1993, as cited in Fulcher and Davidson, 2007: 223) claim that washback cannot be related directly to a test’s validity and criticize the statement of some writers that a test’s validity should be measured by the degree to which it has had a beneficial influence on teaching. Alderson and Wall reject the concept of “washback validity” because “this form of validity has never been demonstrated, or indeed investigated, nor have proposals been made as to how it could be established empirically rather than asserted” (Ibid.).
Messick (1996) emphasizes two elements of test properties, authenticity and directness,
because they are likely to produce washback. He classifies both properties under
construct validity. Looking at the broader concept of “validity framework, washback
is seen as an instance of the consequential aspect of construct validity” (1996:
242). To encourage positive and reduce negative washback, testers should minimize
construct under-
“If a test’s validity is compromised because of construct under-
(Messick, 1996: 247)
According to Messick, positive washback is linked to authentic and direct assessments
and to the need to minimize construct under-
Hughes (2003) agrees that direct testing implies the testing of performance skills with texts and tasks as authentic as possible. “If we test directly the skills that we are interested in fostering, then practice for the test represents practice in those skills” (2003: 54). He is very explicit in promoting direct testing:
“If we want people to learn to write compositions, we should get them to write compositions in the test. If a course objective is that students should be able to read scientific articles, then we should get them to do that in the test.”
(2003: 54)
For the purposes of this paper, the concept of washback will be treated as consequential validity, first at the level of institutional organization and later at the level of curriculum.
2.2. Washback
Alderson and Wall (1993, as cited in Fulcher and Davidson 2007) include different factors in their washback hypotheses. If teachers use tests to get their students to pay more attention to lessons and to prepare more thoroughly, the washback is positive. If teachers fear poor results and the associated guilt, they may want their students to achieve high scores in tests, which can become a reason for teaching to the test. Consequently, teachers narrow the curriculum and produce negative washback. In their Sri Lankan Impact Study, Wall and Alderson (1993) came to the important conclusion that “tests have impact on what teachers teach but not on how they teach” (1993: 68).
Bachman and Palmer (1996) place washback within the scope of impact. They understand learning and teaching as two processes which “take place in and are implemented by individuals, as well as educational and societal systems, and society at large” (1996: 30). Understanding washback as an intended outcome of the test, Bachman and Palmer expect “the specific components (for instance, teaching method, curriculum, materials) to be affected and the degrees to which they are affected” (1996: 137).
Shohamy, Donitsa-
Bailey (1996) distinguishes between washback to the learners and washback to the programme. The latter includes judging students’ language in relation to the expectations of the curriculum, to determine whether the school as a whole performs well or whether teaching methods and textbooks are effective tools for achieving the curricular goals.
Being aware of the complexity of washback, Hamp-
In her review of empirical studies of washback, Spratt (2005) identified the areas of teaching and learning which could be affected by washback: curriculum, materials, teaching methods, feelings and attitudes, learning. A teacher is the most important and influential agent in the process of introducing the effects of washback into teaching and learning. Spratt sees teachers facing “a set of pedagogic and ethical decisions about what and how best to teach and facilitate learning if they wish to make the most of teaching towards exams” (2005: 27).
Wall and Horak (2008) focus on the role of communication in creating positive washback. They found that teachers usually do not understand the nature of tests and encourage testers to communicate their intentions so that teachers and learners can prepare for new kinds of assessment. They also call for dissemination of the principles embodied by the tests and the provision of teacher and learner support and conclude: “Much advice is available from exam designers and teachers, if only someone could collect it and organize it effectively” (conference presentation, 2008).
Washback (and its communication) can act as a gap or a bridge between teachers and testers, as well as an indicator of a need for change. If teachers are not isolated from testing and if they recognize and respect ethical principles in the classroom, their growing awareness works towards positive washback and they will promote good practices. The complex nature of washback allows broad expectations in different areas. Consequently, washback can be understood as a powerful tool to introduce changes not only in teaching and testing but also in educational policy if it is supported by evidence and/or research.
Situations of positive and negative washback will be described in STANAG testing situations.
3. English language tests STANAG 6001
STANAG proficiency levels were introduced in 1976 (Edition 1) for the English and French languages and updated in 2003 (Edition 2). The three purposes of the NATO Standardisation Agreement STANAG 6001 were:
-
-
-
Language proficiency levels and standards classify STANAG tests among criterion-
The descriptions of five levels give definitions of language proficiency in four language skills: oral proficiency (listening and speaking) and written proficiency (reading and writing). A language proficiency profile (Standard Language Profile, SLP) is recorded by four digits indicating the specific skills in the following order: Listening, Speaking, Reading, and Writing (SLP 3321).
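The four-digit notation can be illustrated with a short sketch. It is only an illustration of the skill order described above, not part of STANAG 6001 itself; the function name `parse_slp` and the dictionary keys are hypothetical, and the sketch assumes a profile is written as exactly four plain digits.

```python
from typing import Dict

# Skill order as stated in the text: Listening, Speaking, Reading, Writing.
SKILLS = ("listening", "speaking", "reading", "writing")


def parse_slp(profile: str) -> Dict[str, int]:
    """Split a four-digit SLP string into one level per skill.

    Assumption: the profile consists of exactly four ASCII digits.
    The permitted range of each digit is not checked here.
    """
    if len(profile) != 4 or not (profile.isascii() and profile.isdigit()):
        raise ValueError(f"expected four digits, got {profile!r}")
    return {skill: int(digit) for skill, digit in zip(SKILLS, profile)}


if __name__ == "__main__":
    # SLP 3321: level 3 in listening and speaking, 2 in reading, 1 in writing.
    print(parse_slp("3321"))
```

Read this way, SLP 3321 gives the writing level as the last digit, which makes it easy to see at a glance when one skill is set lower than the others.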
STANAG 6001 tests are language tests for the military, but Green and Wall (2005) found in their study that the tests were not necessarily ESP tests in nature. Some national testing teams take a general English approach while others add a more or less specified number of military texts. Testing teams usually know the SLP requirements within NATO but they have little or no information about what candidates should do with the language. Testers and teachers need to know what constitutes adequate language performance. It is an important issue for test constructors, as it has “implications for how much they can be expected to contribute to the design and running of tests and how much candidates from different backgrounds and levels of the hierarchy can be expected to handle” (2005: 382).
The Slovenian Armed Forces included the STANAG language levels among the criteria
for working positions from the lowest to the highest military ranks. The same criteria
were upheld for the civilian employees of the Ministry of Defence. To meet the needs
of STANAG levels for military personnel, the SFL used to organize three testing sessions
per year and test 200 – 250 candidates in all four language skills for levels 1-
The teaching staff of the SFL was small and nobody specialized in testing. This
situation was not unique to Slovenia; it was typical of other countries as well. In his
survey of modern language testing, Bachman (2000) writes that language testing as
a subfield within applied linguistics “evolved and expanded in a number of ways in
the past 20 years or so” (2000: 3). Foreign contract testers and teachers who worked
temporarily in the SFL were mostly American and English. They organized internal
workshops to familiarize Slovenian language teachers with basic testing principles.
At the same time international networking of military testers started. Testing workshops
and seminars were organized by the British Council Peacekeeping English Project and
the Bureau of International Language Co-
Testing sessions were reduced to two per year in 2003 when a systematic approach
to organization and quality of proficiency testing started. As of 2008, the SFL has
three full-
3.1. Experiences of positive washback
STANAG tests are high-stakes tests, which significantly affect the lives of those who take them. Within this context, washback represents a constant pressure to respond to test results with appropriate actions. In the SFL, it was not until 2005 that changes began to be introduced. The situations will be described first and the actions, as examples of positive washback, will be presented later.
Situation 1
Following the test results from 1999, the SFL testers were faced with significant differences in language proficiency within the same STANAG level. For example, the quality of language knowledge at the threshold of STANAG level 2 differed considerably from that at the upper end of level 2. When candidates from both extremes of the scale applied for a higher level course (STANAG 3), the teachers complained that some of the students who had reached level 2 did not show adequate language knowledge and, as a result, the students did not form a homogeneous learning group. The need arose for levels within a level.
Situation 2
The Slovenian STANAG requirements for writing skills were, and still are, one level lower than for the other three language skills. In 2004 Slovenia became a NATO member and suddenly the demand for writing skills and the awareness of their importance increased. There was not much that teachers could do because both the teaching and testing staff of the School were aware that weak writing skills in English may be correlated with weak writing skills in the mother tongue. In the past, learning and teaching reading techniques in English had improved reading skills in the mother tongue. The challenge of introducing the same practice for writing skills was in the air.
Situation 3
Test results showed that younger generations of military personnel (20-
Situation 4
Candidates have to re-
The actions and changes introduced referred to the programme and the policy of the SFL:
Descriptive marks within levels
Three descriptive marks within each STANAG level were introduced: threshold, good,
excellent. Two main purposes were to better place students in language courses and
to give more specific information about the language level reached to the personnel
department and to the candidates themselves. Descriptive marks explain that at the
level of threshold a candidate’s language is too weak for a higher-
Writing courses
New courses were developed aiming at improving writing skills. During the course
design phase the element of mother tongue was also considered and the content was
aimed at improving writing skills and organization of writing in both languages.
The Writing 2 and Writing 3 curricula included contact time with teachers and self-
Refresher courses
Two new courses were developed, Refresher Basic and Refresher Intermediate, aimed at refreshing and improving the existing language levels STANAG 1 and 2. Two additional purposes were to fill the gap between the existing courses and to offer those with threshold marks an opportunity to raise their language level to good so that they would be able to apply for higher level courses. The Refresher Basic has proven to be a very attractive course and the SFL has organized six since 2006. There have always been more candidates than free slots, which is not the case with the Refresher Intermediate. It was organized once and was not full. Our assumption is that STANAG 2 is the realistic language level for the functional purposes of military employees.
Exam-
A solution as to how to reach the military population with basic and simple information
about proficiency testing was to publish information about tests in a military newsletter.
After it was published, students made little effort to find out more about the test,
although some made inquiries by phone. As a result, testers produced two booklets:
Frequently Asked Questions and a Self-
3.2. Experiences of negative washback
It has been mentioned above that teachers often bring negative feelings into the
issue of testing especially in the cases of high-
Course content -
Evidence about the course content was gathered from final course reports; evidence about the test content was taken from feedback forms which test takers and test administrators filled in after STANAG exams.
Test takers were from two different level courses, STANAG 2 and 3. They perceived
bi-
Final course reports reflected students’ opinions about the course content after the course and before the exam. Listening skills came out repeatedly as the skills not taught enough during courses, and students’ perception was that they did not progress in listening as much as they could. The same information appeared after several courses, and the course directors and course teachers were expected to discuss the issue and suggest a change either in the test method or in the organization of the listening module. Unfortunately, little, if anything, has changed. This fact can be understood as confirmation of Wall and Alderson’s statement (1993: 68) that tests do not have an impact on how teachers teach, especially because testers did not highlight the difference in interpretation between the course content and the test content.
Feelings/attitudes
Feelings/attitudes were not observed or noted in an organized way. We infer them from test administrators’ feedback, aware that various factors might influence our assumptions (subjective perceptions, tense atmosphere resulting from test anxiety, etc.).
Test-
It can be assumed that listening tests and the listening module of the course are
both affected by the lack of information about the target language situation. Not
being able to specify an authentic listening situation has resulted in inefficient
teaching and learning practices. Once testers have as accurate a description as possible
of the target language, it will be realistic to organize listening training sessions
and discuss the appropriate teaching methods with teachers. Teachers will then be
able to re-
3.3. Lessons from washback experiences
In the SFL, washback is observed during each testing session. So far more action has been taken concerning the educational policy and curriculum organization than teaching methods and specific teaching situations. It seems that changes which can be interpreted as positive washback are more efficiently introduced into the system than into the thinking process of teachers and testers.
The actions of positive washback placed new tasks on testers and teachers:
-
-
-
Considering the difference between course content and test content, Wall and Alderson
(1993) reported that “teachers cannot tell by looking at the exam how they should
teach” (1993: 66) a certain skill and many teachers are “unable, or feel unable,
to implement the recommended methodology” (1993: 67). They found that “an exam on
its own cannot reinforce an approach to teaching the educational system has not adequately
prepared its teachers for” (Ibid.). Among elements of the educational system Wall
and Alderson included insufficient exam-
Similarly, testers need course-
Negative washback has raised new issues for teacher training: designing courses aiming at specific language levels instead of at test tasks or a test in general; classroom assessment according to the objectives of the course instead of the textbook content; training in methods to teach individual language skills.
The areas remaining open for research are the following:
-
-
-
-
-
These are questions that are, and will remain, difficult to answer.
4. Conclusions
Washback needs to be planned, observed, studied, and communicated. The process of
producing positive washback includes testers and teachers, their training, communication
and consistency. The management of a language/testing institution needs to inform
teachers and testers how influential their roles are when introducing changes at
the institutional, programme or classroom levels. These changes present positive
washback when teachers know how to introduce a change and when testers and teachers
are aware of their professional responsibility and ethical aspects. The culture of
sharing teaching and testing information and further discussion on these professional
issues will contribute to the awareness of and a need for professional development
and life-
Negative washback does not necessarily have negative effects. As soon as negative washback is noted, it can be addressed. Given its complex nature, it is difficult enough to identify, but responding to it in a professional and timely manner is the responsibility of testers, teachers and institutions towards their clients – students and test takers.
Teachers help testers improve their tests, testers help teachers improve their teaching, and both need to accomplish a common mission, i.e. to help students and test takers reach the course objectives during a course and demonstrate the required language level in valid tests. Changes resulting from washback should be introduced to improve teaching and testing processes primarily for the sake of students and test takers.
Acknowledgements
I would like to express thanks to both reviewers for their detailed reading and constructive suggestions.
1 Hughes uses the expression backwash.
2 Level 1-
3 SLP 3321 means level 3 in listening, level 3 in speaking, level 2 in reading and level 1 in writing.
4 BILC is an advisory language body to NATO.
5 1110, 2221, 3332.
6 Beginner course – 300 hrs, Intermediate course – 300 hrs, Upper-
7 Testers' purpose was to elicit a certain language sample from a memo and a letter as two test tasks. However, a memo does not represent a common practice in Slovenian official correspondence and letters are not perceived as a standard way of communication any more.
References
Bachman, L. F. and Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: assuring
that what we count counts. Language Testing, 17 (1), 1-
Bailey, K. (1996). Working for washback: a review of the washback concept in language
testing. Language Testing, 13 (3), 257-
Brown, J. D. and Hudson, T. (2002). Criterion-
Fulcher, G. and Davidson, F. (2007). Language Testing and Assessment. London, New York: Routledge.
Green, R. and Wall, D. (2005). Language testing in the military: problems, politics and progress. Language Testing, 22 (3), 379–398.
Hamp -
Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.
Messick, S. (1996). Validity and washback in language testing. Language Testing,
13 (3), 241-
NATO Standardization Agreement, STANAG 6001. (2003). Edition 2.
Shohamy, E., Donitsa-
Spratt, M. (2005). Washback and the classroom: the implications for teaching and learning of studies of washback from exams. Language Teaching Research, 9 (1), 5–29.
Wall, D. and Alderson, J.C. (1993). Examining washback: the Sri Lankan Impact Study. Language Testing, 10 (1), 41–69.
Wall, D. and Horak, T. (2008). The Role of Communication in Creating Positive Washback. Presentation at EALTA Conference, Athens.
Zavašnik, M. and Pižorn, K. (2006). Povratni učinek nacionalnih tujejezikovnih preizkusov:
opredelitev pojma in posnetek stanja v svetu. Sodobna pedagogika, 57 (1), 76-
Scripta Manent Vol. 4 (1)
Key Issues in Testing English for Specific Purposes