Item Formats & Standards
Types of Test Item Formats
Test items can be written in various formats, including multiple choice, matching, true/false, short answer, and essay. Each format has its own strengths and weaknesses, and no single format is ideal in all circumstances.
The first three formats are known as selected-response formats, because the student sees the possible answers and has to choose (or select) the correct one.
- Multiple Choice. The student is given directions or a question, and needs to choose the correct or best answer from several possible responses.
- Matching. The student is given two lists of words or phrases, and for each item in the first list, must choose the correct item from the second list to go with it. The matching format is actually a type of multiple choice, where each test item has the same list of possible responses.
- True/False. This venerable format presents a statement, and the student must mark whether it is correct or not. It may take the form of true/false, yes/no, agree/disagree, or others.
The last two formats are known as constructed-response formats, because the student has to come up with the answers on his own.
- Short Answer. The student is given a statement that requires him to fill in one or more blanks, either within the statement or at the end of it. As the name implies, the expected response is usually a word or phrase. This format is common in curriculum-based tests.
- Essay. This item format calls for a longer answer, which usually demands more creative thought or recall. The student may be asked to describe, discuss, or summarize a given subject. This format requires the most critical thinking skills and is also the most challenging for the examiner to score.
We will discuss only the three selected-response formats for now.
The Multiple Choice Item Format
The multiple choice (MC) format is the most commonly used format in formal testing. It typically consists of a stem followed by three or more options: the correct answer (the key) and several incorrect options (distractors), although the structure can vary widely. The matching format can be thought of as an MC format in which several items share the same group of options. Multiple choice is popular for several reasons:
- No subjective evaluation is required in scoring (the answer is either right or wrong, best or not best, not half-right or partly wrong).
- It lends itself to detailed analysis of responses, in which even incorrect answers can provide information on the student's skills.
- It lends itself well to computer scoring.
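Each response is either the keyed option or it is not. Machine scoring therefore reduces to a comparison against an answer key. Counting how often each option was chosen on each item supplies the raw data for analyzing distractors. The Python sketch below assumes that each student's responses are stored as one letter per item. The data layout, the function name `score_and_tally`, and the sample data are hypothetical.

```python
from collections import Counter

def score_and_tally(responses, key):
    """Score responses against an answer key and count option choices per item.

    responses: dict mapping a (hypothetical) student ID to a string with one
               chosen option letter per item, e.g. "BCAD".
    key:       string with the correct option letter for each item.
    Returns (scores, tallies), where tallies[i] counts the options chosen on item i.
    """
    scores = {}
    tallies = [Counter() for _ in key]
    for student, answers in responses.items():
        if len(answers) != len(key):
            raise ValueError(f"{student}: expected {len(key)} answers, got {len(answers)}")
        # Objective scoring: one point per exact match with the key.
        scores[student] = sum(a == k for a, k in zip(answers, key))
        # Record every choice, right or wrong, for later distractor analysis.
        for i, a in enumerate(answers):
            tallies[i][a] += 1
    return scores, tallies

key = "BCAD"
responses = {"s1": "BCAD", "s2": "BCBD", "s3": "ACBD", "s4": "BDBD"}
scores, tallies = score_and_tally(responses, key)
print(scores)                    # {'s1': 4, 's2': 3, 's3': 2, 's4': 2}
print(tallies[2].most_common())  # [('B', 3), ('A', 1)]
```

In this made-up data, three of four students chose distractor B on the third item. A pattern like this may point to a common misconception, or to a miskeyed item, and is worth a closer look.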
There is also one significant drawback to multiple choice. As a selected-response format, it is unable to test writing skills, including organization of thought and originality. These skills are generally beyond the scope of a standardized achievement test.
In addition to the general characteristics of a good test item noted in Introduction to Test Items, there are some specific guidelines to follow when writing or evaluating MC items. Some relate to the stem, some to the options.
Characteristics of a Good Multiple Choice Item
- The stem should clearly state the problem. A good stem is often clear enough that a competent student can answer the item correctly without seeing any of the options.
- The stem should contain as much of the item as possible, but no more. There is no point in repeating something in every option that could be stated once in the stem. On the other hand, the stem should not be wordy or contain irrelevant information, known as window dressing. One exception is a problem that deliberately requires the student to determine which of the facts presented are needed to solve it.
- The stem should, in most cases, be worded positively and in the active voice. When a negative must be used, it should be emphasized in boldface or ALL CAPS.
- Use "story problems" – literally or figuratively – to present scenarios that require comprehension and analysis, not merely recall of the concept.
- Always keep in mind that the primary goal in writing MC response options is to make it difficult for an uninformed person who is skilled at taking tests to figure out the correct answer. Ideally, knowledge of the construct being evaluated should be the only factor in answering an MC item, or an item in any other format, correctly.
- Three or four options are best, since it is difficult to write more than two or three plausible distractors. The various authors of the Handbook of Test Development range from mild to strong in their support for using only three options.
- All options should be parallel in structure and similar in length. The item is more readable, and there will be no obvious clues as to which options may be correct or are obviously incorrect.
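- Options must be grammatically consistent with the stem. Otherwise, a test-taker can eliminate distractors on grammatical grounds alone (for example, when a stem ends in "an" and some options begin with a consonant).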
- All options must be plausible. If someone skilled, or at least comfortable, in a testing environment were to take a test on a subject he knew nothing about, he should not be able to dismiss any of the options as implausible.
- Distractors should reflect typical student errors, which makes them more plausible and more valuable in analyzing student performance.
- The option "All of the above" is confusing and should generally be avoided. The option "None of the above" should be used only when there is one absolutely correct answer, as in spelling or math.
- Options should avoid clang associations, in which the correct answer contains a word or phrase from the stem that the distractors lack.
- Options should be placed in a logical order, such as numerical, alphabetical, or by length. At the same time, the position of the correct response should vary randomly across the test. Any discernible pattern of correct answers can invalidate a test.
- Options should not overlap each other; one option should not be a partial version of another.
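Some of these guidelines can be checked mechanically during item review. The Python sketch below applies a few rough heuristics. It flags:

- words shared by the stem and the key but absent from every distractor (a possible clang association);
- a key much longer than all the distractors;
- the "All of the above" option;
- long runs of the same correct letter in an answer key.

The function names, the 1.5 length ratio, the rule of ignoring words of three letters or fewer, and the run limit of 3 are illustrative assumptions, not established standards. A flag only prompts a human reviewer.

```python
import re
from collections import Counter

def content_words(text):
    """Lowercased words longer than three letters (a crude filter for function words)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def review_item(stem, options, key_index, length_ratio=1.5):
    """Flag a few MC guideline violations heuristically; a human reviewer still decides."""
    warnings = []
    key = options[key_index]
    distractors = [o for i, o in enumerate(options) if i != key_index]
    # Clang association: stem words that reappear in the key but in no distractor.
    clang = {w for w in content_words(stem) & content_words(key)
             if not any(w in content_words(d) for d in distractors)}
    if clang:
        warnings.append(f"possible clang association: {sorted(clang)}")
    # Length clue: the key is much longer than every distractor.
    if all(len(key) > length_ratio * len(d) for d in distractors):
        warnings.append("key is much longer than every distractor")
    if any(o.strip().lower() == "all of the above" for o in options):
        warnings.append('uses "All of the above"')
    return warnings

def key_pattern(answer_key, max_run=3):
    """Count how often each letter is keyed and find the longest run of one letter."""
    if not answer_key:
        return Counter(), 0, False
    longest = run = 1
    for prev, cur in zip(answer_key, answer_key[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return Counter(answer_key), longest, longest > max_run

stem = "Which gas do green plants absorb during photosynthesis?"
options = ["Oxygen", "Carbon dioxide taken in during photosynthesis", "Nitrogen", "Hydrogen"]
warnings = review_item(stem, options, key_index=1)
print(warnings)  # flags both the clang association and the length clue
counts, longest, flagged = key_pattern("ABBBBCDA")
print(longest, flagged)  # 4 True
```
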
The Matching Item Format
As was mentioned earlier, the matching format can be considered a type of multiple choice. The matching format is common in curriculum-based tests. It is sometimes used to good advantage and sometimes very poorly done. Some of the strengths of the matching format are:
- It is easy to construct. Since options are used for more than one item, not nearly as much effort needs to be put into constructing each individual item.
- It is compact in size. An individual item usually takes only a fraction of the space occupied by one conventional MC item.
- It is usually time efficient for the test taker. He only needs to analyze one set of options for multiple items, provided the matching group is competently designed.
- It is very useful for working with groups of homogeneous items, for example, matching states with their capitals.
There can also be some serious weaknesses in the matching item format, which could make an entire section of test items invalid. Some things to look out for:
- Cued answers. A competent test-taker can usually get one or more items correct "for free" by using the process of elimination. In a group of ten items with ten options, each used exactly once, a student needs to know the answers to at most nine of the items. The tenth follows by elimination.
- Non-homogeneous options. A great many groups of matching items are practically worthless because they mix totally unrelated things together as options. In such cases, a skilled student can use the process of elimination to dramatically increase his score, and very little valid testing has taken place.
- Excessively large groups of items or options. Since each item has the entire set of options as possible answers, a student may become overwhelmed by the number of choices from which to select the correct answer.
Recommendations for the Matching Format
- There should be more options than items. This will reduce the effectiveness of elimination and guessing.
- Even better, the group of items should be designed to use some options more than once and some options not at all. This will nullify the process of elimination completely. If this is done, it must be explicitly stated in the directions. For example: "Answers may be used once, more than once, or not at all."
- If options may be used more than once, the pool of options can be much smaller and less confusing. Some very effective item groups of ten or more may have only three options for the entire group. An example would be listing various geographic and political characteristics of North American countries and having Canada, the United States, and Mexico as the three options.
- Options must be homogeneous. Do not mix crops with rivers or Roman numerals with geometric shapes. Note that the items themselves do not necessarily need to be homogeneous, as long as the list of possible answers is. The idea is to prevent elimination based on test-taking skill.
- Items and options should be ordered alphabetically or in some other logical arrangement. As in multiple choice, the correct answers should form no discernible pattern.
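A little arithmetic shows how these recommendations affect elimination. The Python sketch below enumerates every possible set of guesses for the items a student does not know. It assumes the student guesses uniformly at random among the options still available. When options can be used only once, that means the options left after eliminating those used by known items. The function name and the example group sizes are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def expected_correct_guesses(correct, n_options, reusable):
    """Exact expected number of correct guesses on the items a student does not know.

    correct:   correct option index (0..n_options-1) for each unknown item.
    n_options: options still available to those items.
    reusable:  if False, each option is used at most once, so the student assigns
               distinct options (elimination); if True, any option may be reused.
    Assumes every allowed assignment of guesses is equally likely.
    """
    m = len(correct)
    if reusable:
        guesses = product(range(n_options), repeat=m)
    else:
        guesses = permutations(range(n_options), m)
    hits = count = 0
    for g in guesses:
        hits += sum(g[i] == correct[i] for i in range(m))
        count += 1
    return Fraction(hits, count)

# Ten items, ten options, each used once; the student knows nine items.
print(expected_correct_guesses([0], 1, reusable=False))  # 1 (the last item is free)
# Ten items, twelve options used at most once; knowing nine leaves three options.
print(expected_correct_guesses([0], 3, reusable=False))  # 1/3
# Three options that may be reused (e.g. Canada, the United States, Mexico).
print(expected_correct_guesses([0], 3, reusable=True))   # 1/3
```

With a one-to-one key, knowing nine items guarantees the tenth. Extra options, or options that may be reused, remove that guarantee. With reusable options, the chance on each unknown item stays at one in the number of options, no matter how many other items the student knows. A very small reusable pool, as in the three-country example, still leaves the guessing rate of a three-option MC item.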
The True/False Item Format
The true/false (T/F) format is limited in usefulness compared with most other formats, but is still common. A few reasons for its refusal to fade into oblivion are the relative ease of writing a true/false item and the ease and objectivity of scoring it. There are more problems than benefits, however:
- T/F items tend to focus on trivial facts, rather than significant concepts. As a result, they tend to be either too easy or unreasonably difficult.
- T/F items are much more likely to be ambiguous or "tricky" to answer. Often the answer turns on a single word. A student may need to analyze multiple words in the item to catch the one that is incorrect.
- T/F items are too rewarding for guessers, since a random answer has a 50% chance of being correct. On a curriculum-based test, where a passing score is typically 75% to 80%, blind guessing alone is unlikely to produce a passing grade. On a norm-referenced achievement test, however, where a student's score is compared with those of other students, guessing with a 50% chance of success may significantly affect the overall score.
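The binomial distribution makes the guessing argument concrete. If each of n items is answered by pure guessing with success probability p, the number correct follows Binomial(n, p). The Python sketch below computes the chance that blind guessing reaches a cutoff score. The 20-item test length, the 75% cutoff, and the comparison with four-option MC items are illustrative assumptions.

```python
from math import ceil, comb

def p_at_least(n_items, min_correct, p_correct):
    """P(X >= min_correct) for X ~ Binomial(n_items, p_correct)."""
    return sum(comb(n_items, k) * p_correct**k * (1 - p_correct)**(n_items - k)
               for k in range(min_correct, n_items + 1))

n = 20
cutoff = ceil(0.75 * n)            # 15 correct answers needed to pass
tf = p_at_least(n, cutoff, 0.5)    # true/false: 1 chance in 2 per item
mc4 = p_at_least(n, cutoff, 0.25)  # four-option MC: 1 chance in 4 per item
print(f"T/F: {tf:.3f}")            # T/F: 0.021
print(mc4 < 1e-5)                  # True: guessing almost never passes here
# Expected points gained by guessing on 8 unknown items, T/F vs. four-option MC:
print(8 * 0.5, 8 * 0.25)           # 4.0 2.0
```

Under these assumptions, pure guessing on a 20-item T/F test reaches 15 correct only about 2% of the time. Yet a student who guesses on eight unknown items still gains an expected two more points than on four-option items. A gap of that size can change a student's standing on a norm-referenced test.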
Suggestions for True/False Items
- Avoid vague, indefinite, or broad terms in favor of precise statements. Good test items must be unambiguous, and T/F items even more so.
- If the correctness of a statement hinges on a particular word or phrase, highlight or emphasize that word or phrase.
- Avoid negative statements if at all possible. Negative statements are harder to decode, particularly those with two negatives.
- Include similar numbers of true and false items and make them similar in length.
- Group T/F items under a common statement, story, illustration, graph, or other material. Because the items then share a specific frame of reference, there is less room for ambiguity.
- Avoid generalizations such as all, always, never, or none, since they usually signal a false statement. Also avoid qualifiers such as sometimes, generally, often, and can be, since they often signal a true statement.
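A few of these suggestions lend themselves to an automated first pass over a draft set of T/F items. The Python sketch below counts true and false keys. It also flags the absolute terms, qualifiers, and negatives listed above, using simple word-boundary matching. The word lists come from the suggestions; the function name, the data layout, and the sample items are hypothetical. A flag is only a prompt for a human reviewer, since a word like "all" is sometimes legitimate.

```python
import re

# Word lists taken from the suggestions above.
ABSOLUTES = ("all", "always", "never", "none")
QUALIFIERS = ("sometimes", "generally", "often", "can be")

def review_tf_items(items):
    """items: list of (statement, is_true) pairs.

    Returns (n_true, n_false, flags), where flags lists (item_number, reason).
    """
    n_true = sum(1 for _, is_true in items if is_true)
    n_false = len(items) - n_true
    flags = []
    for number, (statement, _) in enumerate(items, start=1):
        text = statement.lower()
        for term in ABSOLUTES + QUALIFIERS:
            if re.search(rf"\b{re.escape(term)}\b", text):
                flags.append((number, term))
        # Negatives are harder to decode; catch "not" and contractions like "isn't".
        if re.search(r"\bnot\b|n't\b", text):
            flags.append((number, "negative"))
    return n_true, n_false, flags

items = [
    ("Water boils at 100 degrees Celsius at sea level.", True),
    ("All metals are liquid at room temperature.", False),
    ("Deserts often receive little rainfall.", True),
    ("The Pacific is not the largest ocean.", False),
]
print(review_tf_items(items))
# (2, 2, [(2, 'all'), (3, 'often'), (4, 'negative')])
```
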