Lesson 7 - Assessment Instruments

Lesson 7 Readings
  • Read Chapter 7, Developing Assessment Instruments, from Dick and Carey.

Background Information

At this point you’ve come quite a ways in the instructional design process. You assessed your needs and wrote an instructional goal statement. Then you analyzed your goal to identify goal steps, substeps, subordinate skills, and entry behaviors. After that you analyzed the learners and both the performance and learning context. And in the last lesson you wrote objectives for each of the skills in your instructional analysis. The next step in the Dick and Carey model is to develop assessment instruments so that you will be able to determine if your learners have achieved your objectives. In the ASSURE model, assessment is not really discussed until the final step (Evaluate and Revise).

For an instructional designer, the emphasis on assessment is an important one: performing appropriate, well-thought-out assessment helps us determine which objectives have or have not been learned by the learners and will also help in performing the formative evaluation. As Robert Mager states in his book Making Instruction Work, "If it’s worth teaching, it’s worth finding out whether the instruction was successful. If it wasn’t entirely successful, it’s worth finding out how to improve it" (p. 83). If you think of objectives as describing where you are going, the assessment items are the means by which you find out whether you got there.

You may wonder why test items are created now, before you have even developed your instruction. The idea is that your assessment items should stem directly from your objectives. The performance asked for in an assessment item should match the performance described in the objective. Items should not be based on what you think are good or fun test questions, nor on what your instructional activities will be. In fact, the activities should be based on your objectives and assessment items. The good news is that if you’ve written worthwhile objectives, you already know what content to test for. Then it’s just a matter of creating good test items that measure the acquisition of the skills, knowledge, or attitudes you are looking for.

Introduction to Assessment

As discussed in Chapter 7 of Dick and Carey, learner-centered assessment is linked very closely to the traditional notion of criterion-referenced tests. The name criterion-referenced is derived from the purpose of the test: to find out whether the criteria stated in an objective have been achieved. Criterion-referenced assessments are composed of items or performance tasks that directly measure the skills described in one or more behavioral objectives. The importance of criterion-referenced assessment from an instructional design standpoint is that it is closely linked to the instructional goal and a matched set of performance objectives, giving designers an opportunity to evaluate performance and revise instructional strategies if needed. In other words, criterion-referenced assessment allows instructors to decide how well learners have met the objectives that were set forth. It also facilitates a reflective process in which learners are able to evaluate their own performance against the stated objectives and assessment items. Smith and Ragan (1999) note that criterion-referenced tests have also been referred to as objective-referenced or domain-referenced instruments. They believe that this testing strategy is effective for determining "competency", especially as it relates to meeting instructional objectives.

In contrast to criterion-referenced tests, norm-referenced tests are designed to yield scores that compare each student’s performance with that of a group or with a norm established by group scores. They provide a spread of scores that generally allows decision makers to compare or rank learners. They are not based on each student achieving a certain level of mastery. In fact, in many cases items are selected to produce the largest possible variation in scores among students. As a result, items that all students are able to master are often removed in order to maintain a certain spread of scores. An example of a norm-referenced test is the SAT. Scores from this test are used to compare students for various purposes (such as college admission). Although this form of assessment can be learner-centered, it differs in the manner in which it defines the content that is to be assessed. In this course we will mainly concern ourselves with criterion-referenced assessment.

Types of Criterion-Referenced Tests

Dick, Carey and Carey discuss four different types of criterion-referenced tests that fit into the design process:

  1. Entry Behaviors Test
  2. Pretest
  3. Practice Tests
  4. Posttests

1. Entry Behaviors Test

An entry behaviors test is given to learners before instruction begins. It is designed to assess learners’ mastery of prerequisite skills. These are the skills that appear below the dotted line you drew on your instructional analysis flowchart. If you have no entry behaviors, then there is no need to develop an entry behaviors test. However, if you have entry behaviors that you are unsure about, you should test your learners to help determine whether they are indeed entry behaviors after all.

2. Pretest

A pretest is used to determine whether learners have already mastered some of the skills in your instructional analysis. If they have, then they do not need as much instruction for those skills. If it becomes obvious that they lack certain skills then your instruction can be developed with enough depth to help them attain those skills. When using a pretest in this manner you are not trying to get a score that you can compare with a later posttest in order to document gains.

A pretest is often combined with an entry behaviors test. However, it is important to keep in mind the purpose of each test. The entry behaviors test determines whether or not students are ready to begin your instruction, while the pretest helps determine which skills in your main instructional analysis they may already be familiar with. However, if you already know that your learners have no clue about the topic you are teaching them, then they may not need a pretest.

3. Practice Tests

Practice tests solicit learner participation during the instruction by giving learners a chance to rehearse the new skills they are being taught. They also allow instructors to provide corrective feedback to keep learners on track.

4. Posttests

Posttests are given following instruction, and help you determine if learners have achieved the objectives you set out for them in the beginning. Each item on a posttest should match one of your objectives, and the test should assess all of the objectives, especially focusing on the terminal objective. If time is a factor, it may be necessary to create a shorter test that assesses only the terminal objective and any important related subskills.

Posttests are used by instructors to assess learner performance and hand out grades, but for the designer the primary purpose of a posttest is to help identify areas where the instruction is not working. If learners are not performing adequately on the terminal objective, then there is something wrong with your instruction, and you will have to identify the areas that are not working. Since each test item should correspond to one of your objectives, it should be relatively easy to figure this out.

Designing Tests & Writing Items

There are quite a few issues to consider when designing assessment instruments. Let’s spend a little time discussing some of the more important ones.

Types of Assessment Items

The first thing we want to look at is the various types of items you can use when creating assessment items. Earlier we discussed different types of tests (Entry Behaviors Test, Pretest, Practice Tests, and Posttests); now we are discussing individual test items. Possible test items include:

  • Essay
  • Fill-in-the-blank
  • Completion
  • Multiple-choice
  • Matching
  • Product checklist
  • Live performance checklist

In the table on page 154, Dick and Carey give some guidelines for selecting item types according to the type of behavior specified in your objective. This table provides a good starting point for deciding on what item type to use for a particular objective. However, when it comes right down to it, the wording of your objective should guide the selection of item type. You should select the type of item that gives learners the best opportunity to demonstrate the performance specified in the objective. For example, if our objective was for students to state the capital of Virginia, it would be best to have them state it from memory (fill-in-the-blank) and not pick it from a list of choices (multiple-choice).

In addition to selecting the appropriate test item type, it is also important to consider the testing environment. If your test items require special equipment and facilities – as specified in the "conditions" component of your objective – you will need to make sure that those things will be available to learners. If they will not be, you will need to create a realistic alternative to the ideal test item. Keep in mind that the farther removed the behavior in the assessment is from the behavior specified in the objective, the less likely you will be able to predict whether learners can or cannot perform the objective.

Matching Learning Domain and Item Type

The next issue we want to look at is that of matching the learning domain with an appropriate item type. Organizing your objectives according to learning domain can also aid you in selecting the most appropriate type of assessment item. If you remember, Gagné defined four main learning domains (categories):

  1. Verbal Information - Verbal skill objectives generally call for simple objective-style test items. This includes short-answer, matching, and multiple-choice.

  2. Intellectual Skills – Intellectual skills objectives require either objective-style test items, the creation of some product, or a performance of some sort. The product or performance would need to be judged by a checklist of criteria.

  3. Attitudes – Attitude objectives are more problematic since there is not usually a way to directly measure a person’s attitude. Assessment items generally involve observing learners in action and inferring their attitudes, or having learners state their preferences on a questionnaire.

  4. Psychomotor Skills – Psychomotor objectives are usually assessed by having the learner perform a set of tasks that lead to the achievement of the goal. This type of assessment also requires a checklist or rating scale so that the instructor can determine whether each step is performed properly.

Writing Test Items

You should write an assessment item for each objective whose accomplishment you want to measure. Mager provides these steps to follow when writing a criterion assessment item:

  1. Read the objective and determine what it wants someone to be able to do (i.e., identify the performance).
  2. Draft a test item that asks students to exhibit that performance.
  3. Read the objective again and note the conditions under which the performing should occur (i.e., tools and equipment provided, people present, key environmental conditions).
  4. Write those conditions into your item.
  5. For conditions you cannot provide, describe approximations that are as close to the objective as you can imagine.
  6. If you feel you must have more than one item to test an objective, it should be because (a) the range of possible conditions is so great that one performance won’t tell you that the student can perform under the entire range of conditions, or (b) the performance could be correct by chance. Be sure that each item calls for the performance stated in the objective, under the conditions called for.

If you follow these steps and still find yourself having trouble drafting an assessment item, it is almost always because the objective isn’t clear enough to provide the necessary guidance.

Criteria for Writing Test Items

Dick and Carey list several criteria that you should consider when writing test items:

  1. Goal-Centered Criteria
  2. Learner-Centered Criteria
  3. Context-Centered Criteria
  4. Assessment-Centered Criteria

Let’s take a brief look at each one.

Goal-Centered Criteria

As we have already implied, test items should be congruent with the terminal and performance objectives by matching the behavior involved. What this means is that each test item should measure the exact behavior and response stated in the objective. The language of the objective should guide the process of writing the assessment items. A well-written objective will prescribe the form of test item that is most appropriate for assessing achievement of the objective. For an appropriate assessment item, you should be able to answer "yes" to the following questions:

  1. Does the assessment item require the same performance of the student as specified in the instructional objective?
  2. Does the assessment item provide the same conditions (or "givens") as those specified in the instructional objective?

For example, if the performance of an objective states that learners will be able to state or define a term, the assessment item should ask them to state or define the term, not to choose the definition from a list of answers.

Another common bad practice is teaching one thing and then testing for another. You should not use a test item that asks for a different performance than the one called for by your objectives. For example, if you have an objective that says students need to be able to make change, it would be deceitful to then have test items such as the following:

  1. Define money.
  2. Name the president on the fifty-dollar bill.
  3. Describe the risks of not being able to count.

None of these items asks the student to do what the objective asks, which is to make change. As a result you will not know whether your students can perform as required. Kemp, Morrison, and Ross (1998) cite several more examples of "mismatches" between objectives and assessment. In one example, a college professor whose objective asked students to analyze and synthesize developments of the Vietnam War simply asked students to list those developments on the final exam. Another example is a corporate training course on group leadership skills whose objectives were performance- or skill-based, yet whose only assessment items were multiple-choice. As these examples illustrate, it is important to determine which learning domain your objective falls into in order to write the most appropriate type of assessment.

So why do we keep saying that the performance indicated in the assessment item has to match the performance in the objective? Well, the point of testing is to be able to predict whether your learners will be able to do what you want them to do when they leave you, and the best way to do that is to observe the actual performance that you are trying to develop. Mager (1988) provides a good story to illustrate this point. Suppose your surgeon were standing over you with gloved hands and the following conversation took place:

Surgeon: Just relax. I’ll have that appendix out in no time.
You: Have you done this operation before?
Surgeon: No, but I passed all the tests.
You: Oh? What kind of tests?
Surgeon: Mostly multiple-choice. But there were some essay items, too.
You: Goodbye!

So, would you prefer your surgeon to have had some meaningful, practical types of assessments or strictly paper-and-pencil tests?

Learner-Centered Criteria

Test items should take into consideration the characteristics and needs of the learners. This includes issues such as learners’ vocabulary and language levels, motivational and interest levels, experiences and backgrounds, and special needs. To start with, test items should be written using language and grammar that are familiar to the learners. Learners’ familiarity with the experiences and contexts used in the items also needs to be taken into consideration; learners should not be asked to demonstrate a desired performance in an unfamiliar context or setting. The examples, question types, and response formats should also be familiar to learners, and your items should be free of any gender, racial, or cultural bias.

Context-Centered Criteria

Remember the context analysis you wrote? Well, when writing test items you should consider both the performance context and the learning context you wrote about. It is important to make your test items as realistic and as close to the performance setting as possible. This will help ensure the transfer of skills from the learning environment to the eventual performance environment. According to Dick and Carey, "the more realistic the testing environment, the more valid the learners’ responses will be" (p. 153). It is also important to make sure the learning environment contains all the necessary tools to adequately simulate the performance environment.

Assessment-Centered Criteria

Test items should be well written and free of spelling, grammar, and punctuation errors. Directions should be clearly written to avoid any confusion on the part of the learner. It’s also important to avoid writing "tricky" questions that feature double negatives, deliberately confusing directions, or compound questions. Your learners should miss questions because they do not have the necessary skill, not because your directions were unclear or because you wanted to throw them off with tricky wording.

Dick and Carey provide a checklist of these four criteria on page 165. Use this checklist as you create your own test items.

How Many Items?

The question inevitably arises: how many items are necessary to determine mastery of an objective? For some skills a single item is sufficient; others may require more. For example, a second-grade student may be asked to demonstrate his or her mastery of an arithmetic rule by means of the item: 3M + 2M = 25; M = ? Obviously, the purpose of assessment would be to determine whether the student can perform a class of arithmetic operations similar to this one, not whether he or she can perform this single item. Generally, items of the same type and class would be employed to ensure the reliability of the results.

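To make the idea of testing a class of items concrete, here is the sample item worked out, along with a second, hypothetical item of the same type and class (the second item is invented for illustration and does not come from the book):

```latex
% Worked solution to the sample item:
\[ 3M + 2M = 25 \;\Rightarrow\; 5M = 25 \;\Rightarrow\; M = 5 \]
% A hypothetical parallel item of the same type and class:
\[ 4N + 3N = 35 \;\Rightarrow\; 7N = 35 \;\Rightarrow\; N = 5 \]
```
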
Also, on any single item a student may make a correct response because he or she has seen the correct response before, or perhaps has just made a good guess. In such cases several items may be warranted. For some assessment items, though, a correct response cannot be produced by guessing, so a single performance may be sufficient. Another possibility is that a single item may be missed because a student has been misled by some confusing characteristic of the item, or has simply made a "stupid" mistake.

It is essential to keep in mind that, no matter how many items are created for an objective, the question should not be "how many did they get correct?" but rather "does the number correct indicate mastery of the objective?" Also keep in mind that while two items may be better than one, they can yield a split result, with a student getting one right and one wrong. Would this indicate mastery? Gagné (1988) suggests using three items in this case instead of two, as two out of three provides a better basis for making a reliable decision about mastery.

Assessment of Performances, Products, and Attitudes

Some intellectual skills, as well as psychomotor and attitudinal skills, cannot be assessed using common objective-type test items. They require either the creation of some type of product or a performance of some sort. These types of performances need to be assessed using an evaluation or observation instrument. Dick and Carey suggest that you provide guidance during the learning activities and construct a rubric to assist in the evaluation of the performance or product.

Attitudes are unique in that they are not directly measurable. Instead, the best way to assess attitudes is to observe the learner exhibiting or not exhibiting the desired attitude. During observation, it is important that the learners be given the choice to behave according to their attitudes. If you are observing a performance and the learners know they are being observed, their behavior may not reflect their true attitudes. If direct observation isn’t possible, you can have students respond to a questionnaire or an open-ended essay question. Much care should be taken when constructing such instruments, though. If you simply give learners a test with leading questions and/or directions describing the nature of the test, they are likely to give you the answer they think you want. The results would not tell you how they would act when faced with a real-world situation involving that attitude.

Dick and Carey make the following suggestions regarding the development of these types of assessment instruments:

Writing Directions

Directions for performances and products should clearly describe what is expected of the learners. You should include any special conditions and decide on the amount of guidance you will provide during the assessment. In some situations you may want to provide no guidance.

Developing the Instrument

When assessing performances, products, or attitudes you will need to create an assessment instrument to help you evaluate the performance, product, or attitude. Dick and Carey offer five steps to creating this instrument:

  1. Identify the elements to be evaluated – these elements should be taken directly from the behaviors and criteria included in your objectives. Make sure that the elements you select can be observed during the performance.

  2. Paraphrase each element – elements should be paraphrased to cut down on the length of the instrument. Also, make sure that a "Yes" response on the instrument always corresponds with a positive performance, and a "No" response with a negative performance.

  3. Sequence the elements on the instrument – the order in which the elements are listed should match the natural order of the performance. For example, if you are creating an instrument to help assess the changing of a tire you would not put "Tightens lug nuts on new tire" at the top of the list.

  4. Select the type of judgment to be made by the evaluator – When evaluating a performance, product, or attitude, judgments can be made using checklists, rating scales, or frequency counts. Checklists provide a simple "yes" or "no" as to whether or not a learner meets a criterion or element. Rating scales take this a step further by allowing for in-between ratings instead of strictly "yes" or "no". Frequency counts are used for indicating the number of times a learner meets or displays a criterion or element. This is good if the element can be observed more than once.

  5. Determine how the instrument will be scored – With checklists you can simply add up the "yes" answers to obtain a score for each objective and for the entire process or product. With rating scales you can add up the numbers assigned for each element. Frequency counts are a little more complicated, as you have to decide how to turn the counts into a score. You can add up the frequencies for an element, but you would still have to determine what constitutes a good score and whether a lack of an occurrence would be detrimental. (A small worked example follows this list.)

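To make the scoring arithmetic in step 5 concrete, here is a small hypothetical example (the numbers are invented for illustration and do not come from Dick and Carey):

```latex
% Checklist: 10 elements, 8 marked "yes"
\[ 8/10 = 80\% \]
% Rating scale: 10 elements rated on a 1-5 scale, ratings summing to 42 of a possible 50
\[ 42/50 = 84\% \]
% Frequency count: an element observed 4 times in 5 opportunities; whether this
% constitutes a good score is a judgment you still have to make for the instrument
\[ 4/5 = 80\% \]
```
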
Dick and Carey provide good examples of assessment instruments for evaluating psychomotor skills and attitudes on pages 169 and 170 of their book.

Portfolio Assessment

Many of you are probably familiar with this type of assessment. Portfolios are collections of work that together represent learners’ achievements over an extended period of time. This could include tests, products, performances, essays, or anything else related to the goals of the portfolio. They allow you to assess learners’ work as well as their growth during the process. As with all other forms of assessment, whatever is included in the portfolio must be related to specific goals and objectives. The choice of what to include can be decided entirely by the teacher, or in cooperation with students. Assessment of each portfolio component is done as it is completed, and the overall assessment of the portfolio is carried out at the end of the process using rubrics. In addition, learners are given the opportunity to assess their own work by reflecting on the strengths and weaknesses of various components. Portfolios can also be used as part of the evaluation process to determine what students did and did not learn, and that information can then be used to strengthen the instruction.

Evaluating Congruence in the Design Process

One of the most crucial aspects of the assessment phase of the design process is to be able to evaluate the congruence of the assessment against the objectives and analyses that have been performed. Remember that this is a systematic approach to instructional design, which means that every step in the process influences subsequent steps. As such, all of your skills, objectives, and assessment items should be parallel. One way to clearly represent this relationship is to create a three-column table that lists each of the skills from your instructional analysis, the accompanying objective, and the resulting assessment item. At the bottom of the table you would finish up with your main instructional goal, the terminal objective, and the test item for the terminal objective.

Design Evaluation Chart

Skill              | Objective           | Assessment Item(s)
-------------------|---------------------|-------------------
1                  | Objective 1         | Test item
2                  | Objective 2         | Test item
3                  | Objective 3         | Test item
Instructional Goal | Terminal Objective  | Test item

It is important at this point to make sure that your design is adequate so that you will be able to move on to the next step in the instructional design process. The next step involves developing an instructional strategy based on all of the design work you have done up to this point. But before we move on, let’s close with one more note from Mager regarding objectives and assessment:

If you write your test items according to the above procedures, and if you find yourself saying, "But the test items look pretty much like the objectives," you need to have a little chat with yourself. Remember that the objective of instruction is to bestow competence just as elegantly as you can manage to do it. The object is not to use trick questions just to make it harder, or to spread people on a curve, or to find out whether students "really" understand. The object is to find out whether they have achieved the objectives you derived for them to achieve. If your test items look similar to your objectives, rejoice. They’re supposed to look similar.

Examples

Here are some examples of good and bad assessment items:

Objective: The student will state the time shown on an analog clock to the nearest 5 minutes.

Bad assessment: Students are given a time and are asked to draw the corresponding minute and hour hands on a blank clock diagram.
Good assessment: Students are given pictures of analog clock faces and are asked to state the time indicated on each clock.

Objective: The student will set up an attractive merchandise display in the student store, with appropriate signs.

Bad assessment: Students are asked to write a paragraph describing the six elements of an attractive merchandise display.
Good assessment: Students are given a week to set up an attractive merchandise display in the student bookstore. Displays are evaluated using a criteria checklist.

Objective: Students will write a descriptive essay of at least 300 words.

Bad assessment: Have students read several examples of good essays.
Bad assessment: Write a descriptive essay in class by having each student contribute a sentence.
Bad assessment: Have each student orally describe an unknown object until the other students can guess what the object is.
Good assessment: Have students choose a topic and write an essay describing it.

In addition, if you return to Appendix D in the Dick and Carey book you will see that they have a Design Evaluation Chart that lists the skills, objectives, and test items for a portion of their project on story writing. This will provide a good example for you to follow.

Instructional Design Project Part Four (cont.)

The activities in this lesson should be added to the document you began in the last lesson (objectives.doc). If you recall, in the last lesson you began Part Four of your ID Project by writing objectives for each of the skills and subskills in your instructional analysis. Now that you have drafted a list of objectives describing what you want your learners to be able to do after your instruction, it is time to create test items that will determine whether or not they have achieved those objectives. While you may want to administer an entry behaviors test or pretest to your learners, for now we will concentrate on creating posttest assessment items. To complete Part Four of your ID Project, perform the following tasks:

To begin with, write down each of your objectives in order, including the terminal objective. Beneath each one, answer the following questions:

  1. According to your objective, what is it that the learner will need to do?
  2. What conditions will need to be provided for this performance to occur?
  3. What type of learning domain is covered by this objective: verbal information, intellectual skill, psychomotor skill, or attitude?
  4. What type of test item will you need for this objective? Will it be an objective-style test item, or will you need to create a checklist or rating scale to evaluate a product, performance, or attitude? If it is an objective-style test item, which type of item will be most congruent with the prescribed behavior and conditions (e.g., multiple-choice, matching, essay, etc.)?

When you have answered these questions, create a criterion-referenced assessment item or evaluation tool for the objective. The criterion-referenced items or evaluation tools do not need to be paper-and-pencil tests, but they must accurately assess the behavior or performance called for by each of your objectives, and they should attempt to provide the conditions stated in the objective. If you feel you need more than one item in order to assess achievement of the objective, feel free to include them. However, at this point you are only required to create one item per objective.

If you are assessing a product, performance, or attitude, you will not create an objective-type item. Instead, describe the product you will have them create or the behavior you will have them perform. Then, list some of the criteria you would include in a checklist or rating scale for that item. These criteria should reflect the characteristics of the product, the steps in the performance, or the items you will use to determine the presence of the attitude. Also indicate how these criteria will be rated.

When writing your assessment items, keep in mind the four categories of test item qualities that were discussed earlier:

  1. Goal-Centered Criteria
  2. Learner-Centered Criteria
  3. Context-Centered Criteria
  4. Assessment-Centered Criteria

If you need to, use the checklist on page 165 of your book to help you evaluate your assessment items.

Here’s an example of what your document should look like for each objective. This example shows an intellectual skill with an objective-type test item.

Objective 4.0 – Given a research topic and a list of ten AltaVista search results, select the three sites most appropriate to the research topic.

  1. What will they need to do? The learners should be able to select web sites from a list of search results.
  2. What conditions will need to be provided? The learners will need to be given a predetermined research topic and a list of actual AltaVista search results related to that topic.
  3. Domain – Intellectual Skills: Rules. Students have to apply a set of criteria in order to make a decision.
  4. This objective will require an objective-style fill-in-the-blank test item, as the students will have to write down the three most appropriate sites based on certain criteria.

Test Item 1 - Take a look at the following AltaVista search results: (show screen capture of search results). Which 3 web sites are likely to have specific and relevant information dealing with the subject of Life on Mars?

When you are finished you should have an assessment item for each of your objectives. The final thing you need to do is create a design evaluation chart that indicates the congruence between your skills, objectives, and assessment items. Following the examples in the book, create a three-column table. In the left column list all of your goal steps, substeps, and subordinate skills in order. In the middle column list the accompanying objective for each skill. In the last column list your assessment item(s) for that skill and objective. Make sure that everything is lined up properly, so that it is obvious which skill goes with which objective, and which objective goes with which test item.

Submitting Part Four of Your ID Project

At the end of this lesson you will submit your completed Part Four. To recap, Part Four of your ID Project should be typed up in Microsoft Word. At the top of the paper you should have "ID Project Part Four: Objectives and Assessment". Underneath that should be your name, email address, and the date. Also, make sure the file is named "objectives.doc". When you have completed Part Four, upload the Word document to the "instrdes" folder in your Filebox. When you have finished uploading your file, proceed to the online student interface to officially submit your activities for grading.

Assignment: ID Project Part Four (cont.)
Points: 20

Grading Criteria:

  • For each objective, includes a description of what the learner will need to do in the assessment. (1)
  • For each objective, includes a description of the conditions that will need to be provided in the assessment. (1)
  • For each objective, describes the learning domain covered by the objective. (1)
  • For each objective, identifies the type of test item that is appropriate for assessing that objective. (1)
  • Criterion-referenced assessment item or evaluation tool created for each objective that will accurately assess the behavior or performance called for by the objective. Realistically provides for the conditions stated in the objective. Product, performance, or attitude assessment items should describe the product or performance and list the criteria that would be included on a checklist or rating scale for that item. (10)
  • Three-column design evaluation chart created that indicates the congruence between skills, objectives, and assessment items. (6)