Educational Handbook for Health Personnel (WHO, 1998, 392 p.)
Chapter 2: Evaluation planning

Common methodology for student evaluation¹

¹ See also Rezler, A.G. The assessment of attitudes. In: Development of educational programmes for the health professions. Geneva, WHO, 1973 (Public Health Papers No. 52), pp. 70-83.


Evaluation of practical skills
Evaluation of communication skills
Evaluation of knowledge and intellectual skills

1. Make a list of observable types of behaviour showing that the objective pursued has been reached.

2. Make a list of observable types of behaviour showing that the objective pursued has not been reached.

3. Determine the essential features of behaviour in both lists.

4. Assign a positive or negative weight to the items on both lists.

5. Decide on the acceptable performance score.

* For the last three stages obtain the agreement of several experts.
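The five steps above can be sketched as a small weighted checklist. This is only an illustration under stated assumptions: the behaviours, the weights, the acceptable score and the names `Behaviour`, `performance_score` and `objective_reached` are hypothetical and do not come from the handbook. In practice the weights and the threshold would be agreed with several experts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    """One observable behaviour from either list (steps 1 and 2)."""
    description: str
    weight: int  # step 4: positive if it shows success, negative if it shows failure


def performance_score(checklist, observed):
    """Sum the weights of the behaviours the observer actually recorded."""
    return sum(b.weight for b in checklist if b.description in observed)


def objective_reached(checklist, observed, acceptable_score):
    """Step 5: compare the total with the agreed acceptable performance score."""
    return performance_score(checklist, observed) >= acceptable_score


# Hypothetical checklist: behaviours and weights are placeholders only.
CHECKLIST = [
    Behaviour("explains what has been done in plain words", 3),
    Behaviour("invites the mother to ask questions", 2),
    Behaviour("checks that the mother has understood", 2),
    Behaviour("uses medical terms without explaining them", -2),
    Behaviour("interrupts the mother", -1),
]

# Behaviours noted by the observer for one student (also hypothetical).
observed = {
    "explains what has been done in plain words",
    "invites the mother to ask questions",
    "interrupts the mother",
}

score = performance_score(CHECKLIST, observed)  # 3 + 2 - 1
passed = objective_reached(CHECKLIST, observed, acceptable_score=5)
print(score, passed)
```

Keeping the checklist as data, separate from the scoring functions, makes it easy for colleagues to review and adjust the items, weights and threshold without touching the arithmetic.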

Example. Objective: Reassure the mother of a child admitted to hospital

Explain clearly what has been done to the child:

- often uses medical terms and never explains what they mean
- often uses medical terms and rarely explains what they mean
- rarely uses medical terms and sometimes explains what they mean
- rarely uses medical terms and always explains what they mean
- uses only terms suited to the mother's vocabulary

etc. See the complete table on p. 4.32.

Minimum Performance Score: The student should score n marks out of 10 on the rating scale.
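One possible way of turning such a rating scale into marks out of 10 is sketched below. The handbook does not state how each level is marked, so everything numerical here is an assumption: levels are numbered 0 (lowest) to 4 (highest) in the order listed in the example, all criteria are weighted equally, and the total is rescaled linearly to 10. The function names and the ratings for the two additional criteria are hypothetical.

```python
# Levels for one criterion, taken from the example and assumed to run
# from the least to the most satisfactory behaviour.
LEVELS = [
    "often uses medical terms and never explains what they mean",
    "often uses medical terms and rarely explains what they mean",
    "rarely uses medical terms and sometimes explains what they mean",
    "rarely uses medical terms and always explains what they mean",
    "uses only terms suited to the mother's vocabulary",
]


def marks_out_of_10(ratings, levels_per_criterion=5):
    """Rescale ratings (0 = lowest level) to a mark out of 10.

    Assumes every criterion has the same number of levels and equal weight.
    """
    if not ratings:
        raise ValueError("at least one criterion must be rated")
    top = levels_per_criterion - 1
    if any(r < 0 or r > top for r in ratings):
        raise ValueError("rating outside the scale")
    return 10 * sum(ratings) / (top * len(ratings))


def meets_minimum(ratings, n):
    """Check the mark against the minimum performance score n (out of 10)."""
    return marks_out_of_10(ratings) >= n


# One rating from the example criterion plus two hypothetical criteria.
explains = LEVELS.index("rarely uses medical terms and always explains what they mean")
ratings = [explains, 4, 2]
print(marks_out_of_10(ratings), meets_minimum(ratings, n=7))
```

The value of n is left as a parameter because the handbook leaves the minimum performance score to be decided by the teaching team.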



Try to answer questions 17 - 20 on p. 2.46 and check your answers on p. 2.48.

Evaluation methodology according to domains to be evaluated




For each of the educational objectives you have already defined (pp. 1.68, 1.69), choose from among the methods of evaluation set out on p. 2.22 the one you think most suitable for informing you and the student on the extent to which the objective has been achieved.


Objective     | Method of evaluation | Instrument of evaluation
1 (page 1.68) | Indirect method      | Short, open-answer question based on the patient's record
2 (page 1.68) | Indirect method      |
3 (page 1.68) | Direct observation   | Practical examination






For the purposes of this exercise the total number of students to be considered should be fixed: e.g., 100, or any other number that is realistic in your situation.



General remarks concerning examinations


Analysis of the most commonly used tests shows that the questions set are often ambiguous, unclear, disputable, esoteric or trivial. Anyone constructing an examination, whether of the traditional written type, an objective test or a practical test, should therefore submit it to his colleagues for criticism, to make sure that its content is relevant (related to an educational objective) and of general interest, and does not exclusively concern a special interest or taste of the author; that the subject is interesting and real for the general practitioner or for physicians with a specialty different from that of the author; and that the questions (and the answers in the case of multiple-choice questions) are so formulated that experts can agree on the correct response. A critical analysis along these lines would avoid the oversimplification of many tests, which only too often justifies the conclusion: “the more you know about a question the lower will be your score”.

The author of a test is not the best judge of its clarity, precision, relevance and interest. Critical review of the test by colleagues is consequently essential for its sound construction.

Moreover, an examination must take the factor of practicability into account. This will be governed by the time necessary for its construction and administration, scoring and interpretation of the results, and by its general ease of use.

If the examination methods employed become a burden on the teacher because they are impractical, he will tend not to give the measuring instrument the importance it deserves.

A discussion is not always pertinent to the problem at hand, but one learns to allow for some rambling. It seems to help people realize that they normally use quite a few fuzzies during what they consider “technical discussions”; it helps them realize that they don't really know what they are talking about... a little rambling helps clear the air. Asking someone to define his goal in terms of performance is a little like asking someone to take his clothes off in public - if he hasn't done it before, he may need time to get used to the idea.

R.F. Mager

Qualities of a test


Directly related to educational objectives
Realistic and practical
Concerned with important and useful matters
Comprehensive but brief
Precise and clear

Judge the consequences of the student's not achieving the objective by answering such questions as: “If he cannot perform the objective when he leaves my instruction he is likely to.....”. The answer should help you decide how much energy to put into constructing a valid evaluation system to find out whether the objective is achieved as written.

R.F. Mager

Considerations of the type of competence a test purports to measure


First, no test format (objective, essay or oral) has a monopoly on the measurement of the highest and most complex intellectual processes. Studies of various types of tests support the view that the essay and the oral examination, as commonly employed, test predominantly simple recall and, like the objective tests in current use, rarely require the student to engage in reasoning and problem-solving. In short, the form of a question does not necessarily determine the nature of the intellectual process required to answer it.

Second, there is often a tendency to confuse the difficulty of a question with the complexity of the intellectual process measured by it. However, it should be noted that a question requiring simple recall may be very “difficult” because of the esoteric nature of the information demanded; alternatively, a question requiring interpretation of data or application of principles could be quite “easy” because the principles of interpretation are so familiar and the data to be analysed so simple. In short, question difficulty and complexity of instructions are not necessarily related to the nature of the intellectual process being tested.

Third, there is often a strong inclination to assume that any question which includes data about a specific case necessarily involves problem-solving, whereas, in fact, “data” are often merely “window dressing” when the question is really addressed to a general condition and can be answered equally well without reference to the data. Or, the data furnished about a “specific case” may constitute a “cut-and-dried”, classical textbook picture that, for example, simply requires the student to recall symptoms associated with a specific diagnosis. It is interesting to note that questions of this type can readily be converted into problems that do require interpretation of data and evaluation, simply by making the case material conform more closely to the kind of reality that an actual case, rather than a textbook, presents.

In short, just as each patient in the ward or outpatient department represents a unique configuration of findings that must be analysed, a test that purports to measure the student's clinical judgement and his ability to solve clinical problems must simulate reality as closely as possible by presenting him with specific constellations of data that are in some respects unique and, in that sense, are new to him. Do not try to use a multiple-choice question (MCQ) or a short, open-answer question (SOAQ) to find out whether the student is able to communicate orally with a patient!

However reliable or objective a test may be, it is of no value if it does not measure ability to perform the tasks expected of a health worker in his/her professional capacity.

Common defects of examinations (domain of intellectual skills)


A review of examinations currently in use strongly suggests that the most common defects of testing are:


- triviality of the questions asked, which is all the more serious in that examination questions can only represent a small sample of all those that could be asked; consequently it is essential for each question to be important and useful;

- outright error in phrasing the question (or, in the case of multiple-choice questions, in phrasing the distractors and the correct response);

- ambiguity in the use of language, which may lead the student to spend more time trying to understand the question than answering it, and which carries the risk of his giving an irrelevant answer;

- forcing the student to answer in terms of the outmoded ideas of the examiner, a bias which is well known and often aggravated by the teaching methods themselves (particularly traditional lectures);

- requesting the student to answer in terms of the personal preferences of the examiner when several equally correct options are available;

- complexity or ambiguity of the subject matter taught, so that the search for the correct answer is more difficult than was anticipated;

- unintended cues in the formulation of the questions that make the correct answer obvious; this fault, which is often found in multiple-choice questions, is just as frequent in oral examinations.

Outside factors to be avoided


In constructing an examination, outside factors must not be allowed to interfere with the factor to be measured.

Complicated instructions (ability to understand instructions)

In some tests, the instructions for students on how to solve the problems are so complicated that what is really evaluated is the students' aptitude to understand the question rather than their actual knowledge and ability to use it. This criticism is often made of multiple-choice examinations in which the instructions appear too complicated. The complexity is often more apparent than real and disturbs the teacher rather than the student.

Over-elaborate style (ability to use words)

The student may disguise his lack of knowledge in such elegant prose that he succeeds in influencing the corrector, who judges the words and style rather than the student's knowledge.

Trap questions (ability to avoid traps)

This type of interference does not depend on a measuring instrument, but on possible sadistic tendencies on the part of the examiner who, during an examination, may allow himself to be influenced by the candidate's appearance, sex, etc. Some candidates are more or less skilled at playing on these tendencies.


“Test-wiseness”

This is a criticism that is generally made of multiple-choice examinations; it may in fact be applied to other forms of evaluation. In oral and written examinations, students develop a sixth sense, often based on statistical analysis of past questions, which enables them somehow to predict the questions that will be set.