Suggested Readings

Chapter 7, Developing Assessment Instruments, from Dick, Carey and Carey.

Background Information

The next step in the Dick, Carey and Carey instructional design model is to develop assessment instruments so that you will be able to determine if your learners have achieved your objectives. For an instructional designer the emphasis on assessment is an important one: appropriate, well-thought-out assessment helps us determine which objectives have or have not been learned, and it also helps in performing the formative evaluation. As Robert Mager states in his book Making Instruction Work, "If it's worth teaching, it's worth finding out whether the instruction was successful. If it wasn't entirely successful, it's worth finding out how to improve it" (pg. 83). Once again, if you think of objectives as describing where you are going, the assessment items are the means by which you find out whether you got there.

Your assessment items should be prescribed by your objectives. This means that the performance asked for in the assessment item should match the performance described in the objective. They should not be based on what you think are good or fun test questions. They should also not be based on what your instructional activities are. In fact, your instructional activities should be based on your objectives and assessment items, not the other way around. The good thing is that if you've written worthwhile objectives, you already know what content to test for. Then it's just a matter of creating good test items that measure the acquisition of the skills, knowledge, or attitudes you are looking for.

Introduction to Assessment

Learner-centered assessment is linked very closely to the traditional notion of criterion-referenced tests. The name criterion-referenced is derived from the purpose of the test: to find out whether the criteria stated in an objective have been achieved. Criterion-referenced assessments are composed of items or performance tasks that directly measure skills described in one or more behavioral objectives. The importance of criterion-referenced assessment from an instructional design standpoint is that it is closely linked to instructional goals and a matched set of performance objectives, thereby giving designers an opportunity to evaluate performance and revise instructional strategies if needed. In other words, criterion-referenced assessment allows instructors to decide how well the learners have met the objectives that were set forth. It also facilitates a reflective process in which learners are able to evaluate their own performance against the stated objectives and assessment items. Smith and Ragan (1999) note that criterion-referenced tests have also been referred to as objective-referenced or domain-referenced instruments. They believe that this testing strategy is effective for determining "competency", especially as it relates to meeting instructional objectives.

In contrast to criterion-referenced tests, norm-referenced tests are designed to yield scores that compare each student's performance with that of a group or with a norm established by group scores. They provide a spread of scores that generally allows decision makers to compare or rank learners. They are not based on each student achieving a certain level of mastery. In fact, in many cases items are selected to produce the largest possible variation in scores among students. As a result, items that all students are able to master are often removed in order to maintain a certain spread of scores. An example of a norm-referenced test is the SAT, whose scores are used to compare students for various purposes (such as college admission). Although this form of assessment can be learner-centered, it differs in the manner in which it defines the content that is to be assessed.

Designing Tests & Writing Items

There are quite a few issues to consider when designing assessment instruments. Let's spend a little time discussing some of the more important ones.

Types of Assessment Items

The first thing we want to look at is the various types of items you can use when creating assessment instruments. Possible test items include:

  • Essay
  • Fill-in-the-blank
  • Completion
  • Multiple-choice
  • Matching
  • Product checklist
  • Live performance checklist

In Table 7.1 (page 140, 7th edition), Dick, Carey and Carey give some guidelines for selecting item types according to the type of behavior specified in your objective. This table provides a good starting point for deciding what item type to use for a particular objective. However, when it comes right down to it, the wording of your objective should guide the selection of item type. You should select the type of item that gives learners the best opportunity to demonstrate the performance specified in the objective. For example, if our objective was for students to state the capital of Virginia, it would be best to have them state it from memory (fill-in-the-blank) and not pick it from a list of choices (multiple-choice).

In addition to selecting the appropriate test item type, it is also important to consider the testing environment. If your test items require special equipment and facilities, as specified in the "conditions" component of your objective, you will need to make sure that those things will be available to your learners. If not, you will need to create a realistic alternative to the ideal test item. Keep in mind that the farther removed the behavior in the assessment is from the behavior specified in the objective, the less likely you will be able to predict whether learners can perform the objective.

Matching Learning Domain and Item Type

The next issue we want to look at is that of matching the learning domain with an appropriate item type. Organizing your objectives according to learning domain can also aid you in selecting the most appropriate type of assessment item. Gagne defined four main learning domains (categories); a short sketch after this list shows one way to encode the mapping:

  1. Verbal Information: Verbal skill objectives generally call for simple objective-style test items. This includes short-answer, matching, and multiple-choice.

  2. Intellectual Skills: Intellectual skills objectives require either objective-style test items, the creation of some product, or a performance of some sort. The product or performance would need to be judged by a checklist of criteria.

  3. Attitudes: Attitude objectives are more problematic since there is not usually a way to directly measure a person's attitude. Assessment items generally involve observing learners in action and inferring their attitudes, or having learners state their preferences on a questionnaire.

  4. Psychomotor Skills: Psychomotor objectives are usually assessed by having the learner perform a set of tasks that lead to the achievement of the goal. Assessment also requires a checklist or rating scale so that the instructor can determine whether each step is performed properly.
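
To make this mapping concrete, here is a minimal Python sketch. It is an informal illustration based on the summaries above; the domain names, item-type labels, and function are assumptions for the example, not an authoritative taxonomy.

    # Informal mapping from Gagne's learning domains to candidate assessment
    # item types, based on the summaries above (illustrative, not authoritative).
    SUITABLE_ITEM_TYPES = {
        "verbal information": ["short-answer", "matching", "multiple-choice"],
        "intellectual skills": ["objective-style item", "product judged by checklist",
                                "performance judged by checklist"],
        "attitudes": ["observation with inferred attitude", "preference questionnaire"],
        "psychomotor skills": ["task performance with checklist or rating scale"],
    }

    def suggest_item_types(domain: str) -> list[str]:
        """Return candidate assessment item types for a learning domain."""
        return SUITABLE_ITEM_TYPES.get(domain.strip().lower(), [])

    print(suggest_item_types("Attitudes"))
    # ['observation with inferred attitude', 'preference questionnaire']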

Writing Test Items

You should write an assessment item for each objective whose accomplishment you want to measure. Mager provides these steps to follow when writing a criterion assessment item (a brief sketch after the list illustrates the congruence idea behind steps 1 through 4):

  1. Read the objective and determine what it wants someone to be able to do (i.e., identify the performance).
  2. Draft a test item that asks students to exhibit that performance.
  3. Read the objective again and note the conditions under which the performing should occur (i.e., tools and equipment provided, people present, key environmental conditions).
  4. Write those conditions into your item.
  5. For conditions you cannot provide, describe approximations that are as close to the objective as you can imagine.
  6. If you feel you must have more than one item to test an objective, it should be because (a) the range of possible conditions is so great that one performance won't tell you that the student can perform under the entire range of conditions, or (b) the performance could be correct by chance. Be sure that each item calls for the performance stated in the objective, under the conditions called for.
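
To illustrate the congruence behind steps 1 through 4, here is a minimal Python sketch. The Objective and TestItem classes and the matching rule are hypothetical simplifications for the example; real objectives and items are prose, so the string comparison stands in for the designer's own judgment.

    # Hypothetical sketch: checking that a draft item asks for the same
    # performance, under the same conditions, as its objective (steps 1-4).
    from dataclasses import dataclass, field

    @dataclass
    class Objective:
        performance: str                 # the behavior the learner must exhibit
        conditions: list[str] = field(default_factory=list)  # tools, givens, setting

    @dataclass
    class TestItem:
        performance: str
        conditions: list[str] = field(default_factory=list)

    def is_congruent(objective: Objective, item: TestItem) -> bool:
        """True if the item calls for the objective's performance under
        (at least) the objective's stated conditions."""
        return (item.performance == objective.performance
                and all(c in item.conditions for c in objective.conditions))

    obj = Objective("state the capital of Virginia", ["from memory"])
    print(is_congruent(obj, TestItem("state the capital of Virginia", ["from memory"])))  # True
    print(is_congruent(obj, TestItem("select the capital from a list")))                  # False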

If you follow these steps and still find yourself having trouble drafting an assessment item, it is almost always because the objective isn't clear enough to provide the necessary guidance.

Criteria for Writing Test Items

Dick, Carey and Carey list several criteria that you should consider when writing test items:

  1. Goal-Centered Criteria
  2. Learner-Centered Criteria
  3. Context-Centered Criteria
  4. Assessment-Centered Criteria

Let's take a brief look at each one.

Goal-Centered Criteria

As we have implied already, test items should be congruent with the terminal and performance objectives by matching the behavior involved. What this means is that each test item should measure the exact behavior and response stated in the objective. The language of the objective should guide the process of writing the assessment items. A well-written objective will prescribe the form of test item that is most appropriate for assessing achievement of the objective. Appropriate assessment items should answer "yes" to the following questions:

  1. Does the assessment item require the same performance of the student as specified in the instructional objective?
  2. Does the assessment item provide the same conditions (or "givens") as those specified in the instructional objective?

For example, if the performance of an objective states that learners will be able to state or define a term, the assessment item should ask them to state or define the term, not to choose the definition from a list of answers.

Another common bad practice is teaching one thing and then testing for another. You should not use a test item that asks for a different performance than the one called for by your objectives. For example, if you have an objective that says students need to be able to make change, it would be misleading to then use test items such as the following:

  1. Define money.
  2. Name the president on the fifty-dollar bill.
  3. Describe the risks of not being able to count.

None of these items asks the student to do what the objective asks, which is to make change. As a result you will not know if your students can perform as required. Kemp, Morrison, and Ross (1998) cite several more examples of "mismatches" between objectives and assessment. In one example, a college professor whose objective asked students to analyze and synthesize developments of the Vietnam War simply asked them to list those developments on the final exam. Another example is a corporate training course on group leadership skills whose objectives were performance- or skill-based, yet whose sole assessment items were multiple-choice. As these examples illustrate, it is important to determine which learning domain your objective falls into in order to write the most appropriate type of assessment.

So why do we keep saying that the performance indicated in the assessment item has to match the performance in the objective? Well, the point of testing is to be able to predict whether your learners will be able to do what you want them to do when they leave you, and the best way to do that is to observe the actual performance that you are trying to develop. Mager (1988) provides a good story to illustrate this point. Suppose your surgeon were standing over you with gloved hands and the following conversation took place:

Surgeon: Just relax. I'll have that appendix out in no time.
You: Have you done this operation before?
Surgeon: No, but I passed all the tests.
You: Oh? What kind of tests?
Surgeon: Mostly multiple-choice. But there were some essay items, too.
You: Goodbye!

So, would you prefer your surgeon to have had some meaningful, practical types of assessments or strictly paper-and-pencil tests?

Learner-Centered Criteria

Test items should take into consideration the characteristics and needs of the learners. This includes issues such as learners' vocabulary and language levels, motivational and interest levels, experiences and backgrounds, and special needs. To start with, test items should be written using language and grammar that are familiar to the learners. Another important aspect of learner-centered assessment is that learners' familiarity with the experiences and contexts involved needs to be taken into consideration. Learners should not be asked to demonstrate a desired performance in an unfamiliar context or setting. The examples, question types, and response formats should also be familiar to learners, and your items should be free of any gender, racial, or cultural bias.

Context-Centered Criteria

Remember the context analysis you wrote? Well, when writing test items you should consider both the performance context and the learning context you wrote about. It is important to make your test items as realistic and as close to the performance setting as possible. This will help ensure the transfer of skills from the learning environment to the eventual performance environment. According to Dick, Carey and Carey, "the more realistic the testing environment, the more valid the learners' responses will be" (pg. 153). It is also important to make sure the learning environment contains all the necessary tools to adequately simulate the performance environment.

Assessment-Centered Criteria

Test items should be well written and free of spelling, grammar, and punctuation errors. Directions should be clearly written to avoid any confusion on the part of the learner. It's also important to avoid writing "tricky" questions that feature double negatives, deliberately confusing directions, or compound questions. Your learners should miss questions because they do not have the necessary skill, not because your directions were unclear, or because you wanted to throw them off with unclear wording.

How Many Items?

The question inevitably arises as to how many items are necessary to demonstrate mastery of an objective. For some skills only one item is necessary; for others, more than one may be required. For example, a second grade student may be asked to demonstrate mastery of an arithmetic rule by means of the item:

3M + 2M = 25; M = ?

Obviously, the purpose of assessment would be to determine if the student could perform a class of arithmetic operations similar to this, not whether he or she is able to perform this single one. Normally, items of the same type and class would be employed to ensure the reliability of the results.

Also, on any single item a student may make a correct response because he or she has seen the correct response before, or perhaps has just made a good guess. In this case several items may be warranted. With some assessment items, though, a correct response cannot be produced by guessing, so you may only require a single performance. Another possibility is that a single item may be missed because a student has been misled by some confusing characteristic of the item, or has simply made a "stupid" mistake.

It is essential to keep in mind that, no matter how many items are created for an objective, the conclusion aimed for should not be "how many did they get correct?" but rather "does the number correct indicate mastery of the objective?" Also keep in mind that while two items may be better than one, they may also yield a 50-50 result, with a student getting one right and one wrong. Would this indicate mastery? Gagne (1988) suggests having three items in this case instead of two, as two out of three provides a better means of making a reliable decision about mastery.
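
A decision rule like Gagne's can be made concrete in a short Python sketch. The function name and the default two-out-of-three threshold are illustrative assumptions based on the discussion above:

    # Minimal sketch of a mastery decision over parallel items measuring the
    # same objective. Defaults follow Gagne's two-out-of-three suggestion above.
    def demonstrates_mastery(num_correct: int, num_items: int = 3,
                             required: int = 2) -> bool:
        """Decide mastery from the number of parallel items answered correctly."""
        if not 0 <= num_correct <= num_items:
            raise ValueError("num_correct must be between 0 and num_items")
        return num_correct >= required

    print(demonstrates_mastery(2))  # True: 2 of 3 suggests mastery
    print(demonstrates_mastery(1))  # False: 1 of 3 does not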

Assessment of Performances, Products, and Attitudes

Some intellectual skills, as well as psychomotor and attitudinal skills, cannot be assessed using common objective-type test items. They require either the creation of some type of product or a performance of some sort. These types of performances need to be assessed using an evaluation or observation instrument. Dick, Carey and Carey suggest that you provide guidance during the learning activities and construct a rubric to assist in the evaluation of the performance or product.

Attitudes are unique in that they are not directly measurable. Instead, the best way to assess attitudes is to observe the learner exhibiting or not exhibiting the desired attitude. During observation, it is important that the learners be given the choice to behave according to their attitudes. If you are observing a performance and the learners know they are being observed, their behavior may not reflect their true attitudes. If direct observation isn't possible you can have students respond to a questionnaire or open-ended essay question. Much care should be taken when constructing such tests, though. If you simply give them a test with leading questions and/or directions describing the nature of the test, they are likely to give you the answer they think you want to read. The results would not tell you how they would act when faced with a real-world situation involving that attitude.

Dick, Carey and Carey make the following suggestions regarding the development of these types of assessment instruments:

Writing Directions

Directions for performances and products should clearly describe what is expected of the learners. You should include any special conditions and decide on the amount of guidance you will provide during the assessment. In some situations you may want to provide no guidance.

Developing the Instrument

When assessing performances, products, or attitudes you will need to create an assessment instrument to help you evaluate the performance, product, or attitude. Dick, Carey and Carey offer five steps to creating this instrument:

  1. Identify the elements to be evaluated: These elements should be taken directly from the behaviors and criteria included in your objectives. Make sure that the elements you select can be observed during the performance.

  2. Paraphrase each element: Elements should be paraphrased to cut down on the length of the instrument. Also, make sure that a "Yes" response on the instrument always corresponds with a positive performance, and a "No" response with a negative performance.

  3. Sequence the elements on the instrument: The order in which the elements are listed should match the natural order of the performance. For example, if you are creating an instrument to help assess the changing of a tire you would not put "Tightens lug nuts on new tire" at the top of the list.

  4. Select the type of judgment to be made by the evaluator: When evaluating a performance, product, or attitude, judgments can be made using checklists, rating scales, or frequency counts. Checklists provide a simple "yes" or "no" as to whether or not a learner meets a criterion or element. Rating scales take this a step further by allowing for in-between ratings instead of strictly "yes" or "no". Frequency counts are used for indicating the number of times a learner meets or displays a criterion or element. This is useful when the element can be observed more than once.

  5. Determine how the instrument will be scored: With checklists you can simply add up the "yes" answers to obtain a score for each objective and for the entire process or product. With rating scales you can add up the numbers assigned for each element. Frequency counts are a little more complicated as you have to determine how to create a score. You can add up the frequencies for an element, but you would still have to determine what constitutes a good score and whether a lack of an occurrence would be detrimental. (A minimal scoring sketch for these judgment types follows this list.)
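
As a rough illustration of steps 4 and 5, here is a Python sketch of scoring under the three judgment types. The element names, minimums, and scoring rules are assumptions for the example, not Dick, Carey and Carey's:

    # Illustrative scoring for the three judgment types described above.
    def score_checklist(judgments: list[bool]) -> int:
        """Checklist: add up the 'yes' judgments."""
        return sum(judgments)

    def score_rating_scale(ratings: list[int]) -> int:
        """Rating scale: add up the number assigned to each element."""
        return sum(ratings)

    def meets_frequency_minimums(counts: dict[str, int],
                                 minimums: dict[str, int]) -> bool:
        """Frequency counts: compare each element's observed count against a
        designer-chosen minimum (what counts as 'good' is up to the designer)."""
        return all(counts.get(element, 0) >= needed
                   for element, needed in minimums.items())

    # Example usage with hypothetical elements.
    print(score_checklist([True, True, False]))                  # 2 of 3 elements observed
    print(score_rating_scale([3, 2, 3]))                         # 8 points total
    print(meets_frequency_minimums({"offers assistance": 4},
                                   {"offers assistance": 3}))    # True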

Dick, Carey and Carey provide good examples of assessment instruments for evaluating psychomotor skills and attitudes on pages 169 and 171 of their book.

Examples

Here are some examples of good and bad assessment items:

Objective: The student will state the time shown on an analog clock to the nearest 5 minutes.

Bad assessment: Students are given a time and are asked to draw the corresponding minute and hour hands on a blank clock diagram.
Good assessment: Students are given pictures of analog clock faces and are asked to state the time indicated on each clock.

Objective: The student will set up an attractive merchandise display in the student store, with appropriate signs.

Bad assessment: Students are asked to write a paragraph describing the six elements of an attractive merchandise display.
Good assessment: Students are given a week to set up an attractive merchandise display in the student bookstore. Displays are evaluated using a criteria checklist.

Objective: Students will write a descriptive essay of at least 300 words.

Bad assessment: Have students read several examples of good essays.
Bad assessment: Write a descriptive essay in class by having each student contribute a sentence.
Bad assessment: Have each student orally describe an unknown object until the other students can guess what the object is.
Good assessment: Have students choose a topic and write an essay describing it.