Monday, February 2, 2009

Language and Testing Evaluation

1. Some people say that tests that are easy to make up are hard to grade, and tests that are hard to make up are easy to grade. How may this relate to the contrast between “subjective” and “objective” tests?

First, let us look at the differences between subjective and objective tests. A subjective test poses what might be called “questions of application”: there is no single exact answer to each question. Students often need to elaborate at length and give examples to support their claims and demonstrate their understanding. Most subjective items, however, use simple prompts such as “discuss”, “elaborate” and “explain”. At grading time, scorers may have difficulty deciding whether the answers students give are acceptable, since some opinions or pieces of writing may seem logical yet not be the answer the item was aiming at. Moreover, what one course leader accepts may not be accepted by other lecturers. In short, a subjective test may be easy to make up but hard to grade.

An objective test, on the other hand, is a format in which the test items are scored objectively and have only one right answer. It can be described as consisting of “questions of constitution”, in that the answer to each question has already been established; examples include true/false, multiple-choice and matching items. This is the type of test that is easy to grade but hard to make up. When marking an objective test, scorers do not need to exercise judgment over whether answers are correct or incorrect; they can simply follow the marking key, and some institutions even provide scanning machines and computers to ease the grading stage. It may be hard to make up, however, because an answer key may specify the correct answer for a one-word, gap-filling item when there are in fact multiple acceptable alternative responses that the teacher or test developer did not anticipate (CARLA; Evaluation Process). For example, during a class discussion of a recent test, teachers sometimes discover that some of the other responses given are equally or partially correct. Thus, in creating an objective test, teachers need to make sure that the questions and answer options are exact and that no ambiguous alternative choices remain.
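
To make the marking-key point concrete, here is a minimal sketch of mechanical objective scoring. The item IDs, the answer key and the scoring function are my own invented illustration, not taken from any of the sources cited.

```python
# Minimal sketch of objective scoring: each response is checked against a
# fixed answer key, so no scorer judgment is needed (item IDs are made up).
answer_key = {"Q1": "D", "Q2": "B", "Q3": "A"}

def score(responses):
    """Count how many responses exactly match the answer key."""
    return sum(1 for item, key in answer_key.items() if responses.get(item) == key)

student_answers = {"Q1": "D", "Q2": "C", "Q3": "A"}
print(score(student_answers), "out of", len(answer_key))  # prints: 2 out of 3
```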

(a) What are the basic characteristics of “good” tests?

According to Mark Coughlin (2006), a good test should have a positive effect on learning and teaching and should result in improved learning habits. A good-quality test should therefore be able to discover and locate the specific areas of difficulty that the class or a particular individual is facing. This needs to be highlighted because every learner has a different kind of ability and capability, including different ways of absorbing knowledge. Only then can teachers create effective practices and exercises to assist the students’ learning and improve their own teaching methods and materials. Most importantly, a well-developed test gives students an opportunity to show how well they can perform a particular language task and a chance to learn from their own weaknesses through the exam papers or exercises.

Basically, a good test can be characterized by its “reliability”, “validity” and “practicality” (Bachman, 1990; Harris, 1969; Lado, 1961). Reliability is the consistency of measurement. Specifically, it concerns whether a particular test, given to the same respondent on a second occasion, would produce results equal to those of the first occasion (Cohen, 1994). Among the factors that influence the reliability of a test are, first, test factors (e.g. the layout of the test, the respondents’ familiarity with the test format, the clarity of the instructions); second, situational factors (e.g. the environment the respondent is in during the test, such as the physical space and lighting); and finally, individual factors (e.g. the respondents’ physical health and psychological state, their cognitive abilities and their motivation) (Cohen, 1994).

According to Alderson, Clapham and Wall (1995), one way to check the reliability of a test is parallel-form reliability, in which the scores from two very similar tests taken by the same students are compared. They suggest that, to establish parallel-form reliability, both tests should have identical instructions, the same response type and the same number of questions.
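
As a rough sketch of what a parallel-form comparison might look like in practice, the two score lists from the same group of students can be compared with a correlation coefficient. The scores below and the use of a Pearson correlation are my own assumptions for illustration, not part of Alderson, Clapham and Wall’s procedure.

```python
# Hypothetical parallel-form reliability check: the same five students sit two
# very similar test forms, and the two score lists are correlated (values near
# 1.0 suggest the forms rank students consistently). All scores are made up.
from statistics import mean, stdev

form_a = [72, 85, 60, 90, 78]  # scores on the first form
form_b = [70, 88, 58, 93, 75]  # scores on the parallel form

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(round(pearson(form_a, form_b), 3))  # a high value indicates consistent scores
```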

Secondly, validity refers to how well the assessment instrument measures the original objectives of the test (Cohen, 1994). Cohen notes that several terms are associated with validity, and these too help to determine whether a test is good. The examples he gives are face validity (whether the test appears legitimate to the respondent), criterion validity (verification of the functionality of the test against another language test of equal value), construct validity (how a respondent’s performance correlates across two different tests that test the same abilities), content validity (how well the test matches the objectives of the course being evaluated), systematic validity (evidence of progress in the respondent’s skills after the test is applied), internal validity (the content of the test as perceived by the respondents) and external validity (comparison of a respondent’s test results with their general language ability).

Last but not least is practicality: the test should be sensible and realistic to administer. For instance, the practicality of the TOEFL test can be seen in the fact that candidates from around the world are able to sit the same test, which has a standardized format, questions and procedures.

(b) Is testing the word hurry in the context of this sentence better than testing it in isolation? Is it possible to respond correctly to this item without knowledge of the highlighted word?

“The traveler had to hurry to the boarding gate, because the plane was about to take off”.

A. Walk B. Look C. Refer D. Rush

Personally, I believe that testing the learner’s knowledge in context is a good and effective way of both testing and learning, rather than teaching students and testing items in isolation. Testing the word with an isolated item such as the example below may not only cause students who have never come across the word to give up trying, but also lead them simply to circle a choice at random. As a matter of fact, Mark Coughlin mentions that a well-developed test should provide an opportunity for students to show their ability to perform certain language tasks, and that a test should be constructed with the goal of having students learn from their weaknesses. Last but not least, I agree with Coughlin’s argument that a good test can actually be used as a valuable teaching tool. With a dry question like the one below, students might not only fail to answer but also fail to learn at the same time.

Hurry = ____________

A. Walk B. Look C. Refer D. Rush


Conversely, by testing the word in context, as in the former example above, students are provided with contextual clues. In other words, even if they do not know the meaning of the word “hurry”, the context given, “had to hurry” and “the plane was about to take off”, hints that “had to” expresses necessity and that “hurry” is therefore a verb indicating fast or quick action; the closest answer would be “rush”. So, in conclusion, it IS possible to respond correctly to this item without knowledge of the highlighted word, with the help of the context given.

If so, what language skills/knowledge does the item test?

The item illustrated tests the students’ skill in using contextual clues: whether they are able to apply what little knowledge they have of the given context or situation to a particular word they do not understand.

(c) Would knowledge of the word hurry be necessary to respond correctly to this item if it were to appear in isolation, with no context?

Basically, if the word “hurry” is tested in isolation, students who do not know the answer may not strive hard to work it out, since the dry question gives them nothing to reason from. Because they do not know the answer, there is no urge to respond correctly. This differs from testing the word in context, where there is a sense that only one correct and appropriate answer can fit the given context. Students then not only try hard to answer the question but also exercise their cognitive ability by applying whatever knowledge they have about the question, and they learn at the same time.

(d) What advantages and disadvantages do you see for discrete point and integrative tests based on these considerations?


An integrative test refers to a test that requires students to use their knowledge and skills to complete a task (Dr. Kathleen Bueno, July 1999). It tests more than one point at a time (Cohen, 1999). All the linguistic components, and often more than one skill, may be required in the assessment without specific reference to or identification of particular sounds, words or grammatical rules (Harry L., 1997; Daniel J., 1997). In other words, in an integrative test, language components (e.g. vocabulary, grammar) and skills (e.g. listening, speaking) are not tested separately, one at a time, but concurrently. For instance, students may be asked to write a summary of their favourite movie watched during the summer break.

Indeed, integrating all the language skills and components into context is an effective idea. Its advantage is that students are able to develop all their language knowledge in parallel: they do not merely learn the language but also its application, especially in the context of their daily lives. Its disadvantage, however, is that if students do not understand the lesson, they may not be able to carry out the application, since the test mostly assesses the students’ understanding of a particular subject.

A discrete-point test, on the other hand, refers to a test in which each individual test question focuses on only one particular piece of knowledge or skill (Dr. Kathleen Bueno, July 1999). It tests one and only one point at a time, such as an isolated item of grammar, vocabulary or socio-cultural knowledge (Omaggio, Chap. 9). This type of test is sometimes thought of as “objective”, because a list of specific points can be stated, based on a language description, and questions and test items can be written with those specific points as their focus (Harry L., 1997; Daniel J., 1997). For instance:

Define “home”.

A. The building where you stay.
B. The land of your birth.
C. Your family.


The above gives a rough idea of the focus of a discrete-point test. Its disadvantage is that there must be one specific answer to each question, yet as students read different materials individually, the knowledge they have gathered, and hence their definitions, may differ as well. Its advantage, however, is the clarity that students can gain: things look organised to them as they learn and are tested accordingly, from one point to another.


References:

Bardovi-Harlig, K., & Hartford, B. (1997). Beyond Methods: Components of Second Language Teacher Education. New York: McGraw-Hill.

De Benedetti, K. (2006). Language testing: Some problems and solutions. Vol. 30 (No. 1). Universidad de Guanajuato, Mexico.

Morrow, K. Communicative language testing: Revolution or evolution? In C. J. Brumfit & K. Johnson (Eds.), The Communicative Approach to Language Teaching (pp. 143-157). Oxford University Press.
