Early Supplementary Feeding and Cognition (Society for Research in Child Development, 1993, 123 pages)
V. Methods of the cross-sectional follow-up

The psychological test battery

With the intention of assessing two distinct aspects of cognition, two psychological test batteries were used in the follow-up. The psychoeducational test battery included Raven's Progressive Matrices and tests of complex intellectual aptitudes, abilities, and achievements that are heavily influenced by experience, education, and cultural upbringing. Illustrative of the latter are two standardized tests of reading and vocabulary and a knowledge test that was developed locally. The theoretical justification for the selection of these tests was the expectation that proficiency in reading and vocabulary and breadth of general knowledge will determine in part the potential that an adolescent or a young adult has to contribute to his or her community's social and economic development. Our particular concern was whether the nutritional supplement made a difference in terms of the crystallization of those mental abilities.







[Figure or table not recovered from the source; surviving labels: Age at entry; Highest grade reached.]





The second test battery included elementary cognitive tasks, such as simple and choice reaction time (RT), that measure a single attribute of information processing: speed. A paired associates test was also included in this battery. The between-subject variability in RT tests is generally not accounted for by schooling and cultural background, yet test performance still maintains a low-level correlation (r's ranging from -.10 to -.30) with g or a general ability factor. Theoreticians currently claim that RT is a sensitive indicator of differences in brain function (Eysenck, 1986; Jensen, 1991; Vernon, 1987). In the present study, inclusion of these tests was justified by the assumption that RT would be particularly sensitive to the effects of nutrition on central nervous system activity.

The tests included in the two batteries and their psychometric properties are described below.

Psychoeducational Tests

The battery included tests of literacy, numeracy, and general knowledge, two standardized educational achievement tests, and Raven's Progressive Matrices (RPM). The achievement tests were part of the Interamerican Series originally designed to assess reading abilities of Spanish-speaking children in Texas (Manuel, 1967).

Tests of literacy, numeracy, and general knowledge were administered individually by four trained testers. The achievement and intelligence tests were administered either individually or in a group, depending on subject availability, time, and logistical constraints. All the testers were females with certification as primary school teachers, and they came from Guatemala City or from a medium-sized town located near the villages. Testers received extensive training by both Guatemalan and U.S. psychologists during pre-testing and the pilot study.

Interrater reliability was calculated for literacy, numeracy, and general knowledge tests on the basis of four testing sessions with five raters (four testers and the psychologist) at each session. Percentage agreement varied between 86% and 100% for literacy, 97% and 100% for numeracy, and 94% and 100% for general knowledge.
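As a rough illustration of how a percentage-agreement figure of this kind can be computed, the sketch below counts the share of items on which every rater in a session assigned the same code. The text does not state the exact agreement formula, so the all-raters-agree definition, the function name `percent_agreement`, and the sample codes are assumptions made here for illustration only.

```python
def percent_agreement(ratings_by_rater):
    """Percentage of items on which all raters gave the same code.

    ratings_by_rater: list of equal-length lists, one list per rater.
    Assumed definition; the source does not state the formula used.
    """
    if not ratings_by_rater:
        raise ValueError("need at least one rater")
    n_items = len(ratings_by_rater[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings_by_rater):
        raise ValueError("raters must code the same non-empty set of items")
    # zip(*...) regroups the codes item by item across raters.
    agreed = sum(1 for codes in zip(*ratings_by_rater) if len(set(codes)) == 1)
    return 100.0 * agreed / n_items


# Hypothetical session: five raters coding the same ten items (1 = correct).
session = [
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 1, 1, 0],  # disagrees on the fifth item
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
]
agreement = percent_agreement(session)
```

A pairwise definition (averaging agreement over all rater pairs) would give somewhat higher values for the same data; which variant was used in the study is not stated.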


The literacy test consisted of two parts: a preliteracy measure of knowledge of letters, syllables, words, and short phrases and a reading test based on material familiar to the subjects. All subjects who reported having achieved 4 or fewer years of schooling were given the preliteracy test. Subjects who achieved between 4 and 6 years of schooling were asked to read the headline of a newspaper article aloud ("Futbol Guatemalteco bien representado en Caracas"). If mistakes were made in word recognition or pronunciation, then the preliteracy test was administered. Subjects who achieved more than 6 years of schooling were presumed to be literate.

The preliteracy test was scored on a four-point scale as follows: 1 = unable to complete prereading test, suspended; 2 = completed test with at least five errors, suspended; 3 = completed test with fewer than five errors, continued; 4 = only reading test, not preliteracy test, administered.
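The four-point scale can be written as a small coding function. This is only a sketch of the rule as described; the function name `preliteracy_code` and its argument names are hypothetical and do not come from the study's materials.

```python
def preliteracy_code(preliteracy_given, completed=False, errors=0):
    """Return the 1-4 preliteracy code described in the text.

    preliteracy_given: False when only the reading test was administered.
    completed: whether the subject finished the preliteracy test.
    errors: number of errors made on the preliteracy test.
    (Argument names are illustrative assumptions.)
    """
    if not preliteracy_given:
        return 4  # only the reading test was administered
    if not completed:
        return 1  # unable to complete the test; testing suspended
    if errors >= 5:
        return 2  # completed with at least five errors; testing suspended
    return 3      # completed with fewer than five errors; testing continued
```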

The reading test consisted of 19 questions about two different sets of stimuli: a cedula (identification card) and related personal data, and a newspaper article about a soccer game. For each stimulus, subjects were asked to read a short paragraph and then respond verbally to a series of questions regarding the information they had read. Coding was done by individual testers, and scoring was based on the total number of correct answers.


In the numeracy test, subjects were asked to read aloud a list of numbers ranging from one to three digits, to read a list of prices of familiar articles, and to order a list of items sequentially by their prices. They were also shown three pictures reflecting common situations of buying, working, and transportation and asked to answer questions regarding costs, wages, fares, and distances that required the ability to add, subtract, multiply, or divide. There was a total of 41 items. Coding was done by individual testers, and scoring was based on the number of correct answers across all items.


The knowledge test consisted of 22 questions regarding common experiences related to school, work, transportation, legal-political structures, and health. Subjects were presented with situations that required either basic knowledge or simple decision-making skills to be understood. They were given three possible choices and asked to select the option that best answered the question. Coding was done by individual testers, and scoring reflected the total number of correct answers.

Achievement Tests

The Interamerican Reading Series is a standardized test that consists of three parts: level of comprehension, speed of comprehension, and vocabulary. As a result of the pilot study, and owing to time constraints, only the level of comprehension and vocabulary sections were included. All subjects who passed the preliteracy test, independent of years of schooling, were given the achievement tests. The tests were timed and given either individually or in a group of up to four subjects. Scores were the number of correct answers on each of the two scales.


Intelligence was assessed with Raven's Progressive Matrices (RPM), which consists of five scales (A-E) containing 12 items each. Data from pilot testing indicated very low variance on scales D and E; consequently, only scales A, B, and C were administered. The test was administered either individually or in a group, and scoring reflected the number of correct answers summed across the three scales.

Information Processing

Tests of simple, choice, and memory reaction time (RT) (Sternberg, 1966) composed the computerized battery of tests to assess information processing. In addition, a paired associates test was administered as part of this battery. The intent of the battery was to assess the speed with which an individual processed information in completing elementary cognitive tasks. As described below, two of the RT tests (i.e., choice and memory) also allowed an assessment of efficiency, that is, speed in relation to errors in response.

The computer programs for each test were designed for this study. Two Guatemalan testers from a medium-sized town centrally located near the villages were trained in the use of the computer program and data management. They had limited previous experience with computers but were trained extensively during both the pilot and the pretesting stages of the project.

Subjects to be tested were first introduced to the computer as if it were a television and a typewriter (both familiar objects). They were then given a chance to interact with the computer in a series of warm-up exercises prior to the administration of the test battery.

Simple Reaction Time

This task consisted of repeated presentations of a randomly selected stimulus (geometric figures such as a circle or triangle) at the center of a computer screen. The duration of the presentation was 0.5 sec, with an interstimulus interval that varied systematically between 0.5 and 2 sec. The subjects were instructed to press the bar of the keyboard as quickly as possible on appearance of the stimulus. The test consisted of 30 trials. The lapse between presentation of the target stimulus and the bar press was recorded for each response. The score was the mean reaction time across successful trials.
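The scoring rule for this task amounts to averaging the recorded latencies over the trials that count as successful. A minimal sketch follows; the text does not define how unsuccessful trials were recorded, so representing them as `None` is an assumption, as is the function name `mean_reaction_time`.

```python
def mean_reaction_time(latencies):
    """Mean latency (in seconds) across successful trials.

    latencies: one entry per trial; None marks a trial with no valid
    bar press (an assumed encoding, not taken from the study).
    Returns None when no trial was successful.
    """
    valid = [t for t in latencies if t is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical record of five trials, one of them missed.
score = mean_reaction_time([0.31, 0.28, None, 0.35, 0.30])
```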

Choice Reaction Time/Accuracy

The task consisted of the presentation of 12 geometric figures, from which the subject selected two that then became target figures. A series of five figures (two target and three randomly selected from the initial set of 12) flashed on the screen sequentially with a display period of 0.5 sec and interstimulus intervals that varied systematically between 0.5 and 3 sec. Subjects were instructed to press the bar when the two target figures appeared in sequential order and to refrain from pressing the bar in response to any other figures or to the target figures when not presented in sequential order. The test consisted of 30 trials. In addition to calculating reaction time for all correct responses, the percentage positive (presence of motor response) and negative (inhibition of motor response) correct and the number of errors of omission and commission were also calculated. The standardized error and reaction time scores were then used to calculate measures of efficiency (total error score plus reaction time) and impulsivity (total error score minus reaction time) (Salkind & Wright, 1977). These two measures capture variation in style of response, taking into account both accuracy and speed. Large negative scores on the efficiency measure are interpreted as highly efficient responses, and large positive scores on the impulsivity index indicate impulsive responses.
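The efficiency and impulsivity indices described above can be sketched as the sum and the difference of standardized error and reaction time scores. The code below is an illustration under stated assumptions: standardization uses the population standard deviation across the subjects passed in (the text does not say which variant was used), and the function names and sample data are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and SD 1 (population SD, an assumption)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def efficiency_and_impulsivity(total_errors, reaction_times):
    """Return (efficiency, impulsivity) lists, one value per subject.

    efficiency  = z(errors) + z(RT): large negative = fast and accurate.
    impulsivity = z(errors) - z(RT): large positive = fast but inaccurate.
    """
    if len(total_errors) != len(reaction_times):
        raise ValueError("one error score and one RT are needed per subject")
    z_err = z_scores(total_errors)
    z_rt = z_scores(reaction_times)
    efficiency = [e + t for e, t in zip(z_err, z_rt)]
    impulsivity = [e - t for e, t in zip(z_err, z_rt)]
    return efficiency, impulsivity


# Hypothetical data for four subjects: total errors and mean RT in seconds.
errors = [0, 2, 6, 2]
rts = [0.40, 0.70, 0.35, 0.55]
efficiency, impulsivity = efficiency_and_impulsivity(errors, rts)
```

In this made-up sample, the first subject (no errors, fast) has the most negative efficiency score, and the third subject (many errors, fast) has the largest impulsivity score.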

Memory Task

This task follows Sternberg's (1966) paradigm. It consisted of the horizontal presentation of six geometric figures at the top of the computer screen for 3 sec; the figures then flashed off the screen, and a single target figure appeared at the center of the screen. Subjects had to press one of two different keys depending on whether the target figure was one of the six previously displayed figures or not. The test included 20 trials. As in the previous testing, scores consisted of reaction time, percentage of positive and negative correct, impulsivity, and efficiency.

Paired Associates

The task consisted of four pairs of randomly selected geometric figures that appeared at the top-left-hand corner of the screen for 5 sec. Figures were presented in two horizontal rows, paired vertically. Pairs were then flashed off the screen, and one of the four figures from the top row appeared in the middle of the screen; concurrently, the four figures from the bottom row appeared at the bottom of the screen. Each of the four figures was numbered (1-4). Subjects were requested to select the numbered figure that had been paired originally with the target figure by selecting the corresponding number on the keyboard. Each trial consisted of the presentation of four target figures (selected in random order). A bell rang after every correct response; incorrect answers received no feedback. The four pairs were consistent across all trials, while order of presentation was random. The test was completed after 30 trials or when all four pairs had been successfully matched on three consecutive trials. The score was the number of trials required to reach criterion.
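The stopping rule and score for this task can be sketched as follows. The input is the number of pairs matched correctly on each trial. Assigning the maximum of 30 to subjects who never reach criterion is an assumption made here, since the text does not say how such cases were scored; the function and parameter names are illustrative.

```python
def trials_to_criterion(correct_per_trial, n_pairs=4, run_length=3, max_trials=30):
    """Number of trials needed to match all pairs on consecutive trials.

    correct_per_trial: list with the count of correctly matched pairs
    on each trial, in order of administration.
    Returns max_trials when criterion is never reached (an assumption).
    """
    consecutive = 0
    for trial, n_correct in enumerate(correct_per_trial[:max_trials], start=1):
        if n_correct == n_pairs:
            consecutive += 1  # another perfect trial in the current run
        else:
            consecutive = 0   # any miss resets the run
        if consecutive == run_length:
            return trial
    return max_trials


# Hypothetical subject: one miss on the second trial, then three perfect trials.
score = trials_to_criterion([4, 3, 4, 4, 4])
```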


Test Administration

Each of the four villages was visited twice by a research team, once during the dry and once during the rainy season. The teams were rotated, and each team visited each village during one round of testing. The team stayed in the village for 3-9 weeks, depending on the size of the village and coverage rates. Teams were made up of a doctor, two anthropometrists, several interviewers for sociodemographic data collection, and three persons trained to collect the behavioral data: one person on each team administered the information-processing tests, and two administered the psychoeducational tests.

Subjects were asked to complete the series of psychoeducational and information-processing tests on two separate days; completion of both series in a single day was strongly discouraged and occurred infrequently. When both series were administered on the same day, a break was given between the two testing sessions. The information-processing evaluation lasted approximately 30 min, while the psychoeducational assessment averaged 1 hour and 15 min. In the case of illiterate subjects, all tests were administered individually.

In each community, two staff members recruited subjects and made appointments for testing. All testing was conducted in houses in the community rented by the project and adapted appropriately. In addition to psychological assessments, subjects were given medical and anthropometric examinations and interviewed regarding sociodemographic characteristics.

Reliability of Tests


Test-retest stability coefficients (Pearson product-moment correlation) for the psychoeducational and information-processing tests were assessed on a subsample of the Guatemalan adolescent study population (N = 217). Subjects who agreed to participate in retesting were assigned randomly to one or more of the information-processing and/or psychoeducational tests. The test-retest interim period ranged across subjects from 2 to 34 days, with a mean of 17.7 days (SD = 7.99). Tests with a test-retest stability coefficient of .40 or less were dropped from further analyses.
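The stability check described here can be sketched as a Pearson correlation between first and second administrations, followed by the retention rule (tests at or below .40 are dropped). The helper names, the dictionary layout, and the sample scores are hypothetical; only the correlation formula and the .40 cutoff come from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def retained_tests(scores, threshold=0.40):
    """Keep tests whose test-retest coefficient exceeds the threshold.

    scores: dict mapping test name -> (first_scores, retest_scores).
    Returns a dict of test name -> stability coefficient.
    """
    kept = {}
    for name, (first, retest) in scores.items():
        r = pearson_r(first, retest)
        if r > threshold:  # coefficients of .40 or less are dropped
            kept[name] = r
    return kept


# Hypothetical first and retest scores for two measures.
sample = {
    "stable_measure": ([10, 12, 15, 20, 22], [11, 12, 16, 19, 23]),
    "unstable_measure": ([1, 2, 3, 4, 5], [3, 1, 4, 2, 3]),
}
kept = retained_tests(sample)
```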

As shown in Table 13, the stability coefficients for the psychoeducational tests were high, ranging from .85 to .98. These coefficients are similar to published test-retest values for Raven's Progressive Matrices (Rash, 1959; Stinissen, 1956 [cited in Raven, Court, & Raven, 1984]) and the Interamerican Series (Manuel, 1967).

The stability coefficients for the reaction time and paired associates tests were also moderate to high. However, the other variables on the choice reaction test had coefficients under .40. The means and frequency distributions for these variables indicate that the test was not sufficiently difficult to capture individual differences; the frequency distribution of errors of commission, for example, showed the majority of subjects to have made few or no such errors.

Test-retest stability coefficients were also assessed on the basis of sub-samples of subjects with longer versus shorter interims between testing. Differences in range of time between testing sessions did not significantly affect reliability coefficients.

Internal Homogeneity

Using the entire sample, Cronbach's alphas were calculated to assess the internal consistency of the RPM, the Interamerican Series, and the knowledge, numeracy, and reading tests. Alphas obtained for the RPM and the Interamerican vocabulary and reading tests were high (.79-.98) and similar to the internal consistency measures published for these tests in the literature (Arnold, 1969; Barahini, 1973; Stinissen, 1956 [cited in Raven et al., 1984]; Swinnen, 1958 [cited in Raven et al., 1984]).



[Table 13. Test-retest stability coefficients. The coefficient values were not recovered from the source; only the row labels below survive.]

Psychoeducational battery tests:
Raven (N = 88)
Knowledge (N = 87)
Interamerican (N = 70)
Reading (N = 70)
Literacy (N = 89)
Numeracy (N = 89)

Information-processing battery tests:
Reaction time (N = 82)
Paired associates (N = 85): trials to criterion
Choice reaction time: reaction time; % positive correct; % negative correct
Memory task (N = 70): reaction time; % positive correct; % negative correct

a Dropped from further analysis.

The internal homogeneity of the numeracy, knowledge, and reading tests was of particular interest as these tests had been constructed specifically for use with the Guatemalan adolescents. The coefficient alpha for the numeracy test was .95; alphas for the knowledge (.67) and the reading (.75) tests were not as high, but we nevertheless considered them to fall within an acceptable range. Item deletions proved to increase the coefficient only marginally and hence were not considered necessary for subsequent analyses.
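For reference, coefficient alpha for a test of k items is k/(k - 1) times one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies that standard formula to a small made-up response matrix; the function name, the use of sample variances, and the data are assumptions for illustration and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a subjects-by-items score matrix.

    item_scores: list of rows, one row per subject, one score per item.
    Uses sample variances throughout (an assumption; the ratio is the
    same with population variances as long as one kind is used).
    """
    n_subjects = len(item_scores)
    k = len(item_scores[0]) if item_scores else 0
    if n_subjects < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every subject needs a score on every item")
    columns = list(zip(*item_scores))  # regroup scores by item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical responses: five subjects, four items scored 1 = correct.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(responses)
```

The item-deletion check mentioned in the text would correspond to recomputing alpha with each item column removed in turn and comparing the results with the full-test value.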

Tester Differences

Assessment of differences among testers was made by comparing mean scores obtained on each variable by each of the two information-processing testers and by each of the four psychoeducational testers.

As shown in Table 14, significant intertester differences were observed on all psychoeducational tests, except for the Interamerican reading test. A series of analyses was run to assess whether the differences were a function of length of time spent in the village, round of testing, systematic disposition of an individual tester, or teams of testers (since two testers were always working in each village together). The results suggest that the differences were more likely to be related to teams rather than to individual testers and that they were not systematic - no one tester appeared to be biasing the results in a specific direction. Nevertheless, because some of these differences were large and potentially capable of affecting findings on the effects of treatment, final analyses of the psychoeducational outcomes were run both with and without controlling for testers. Comparisons of results indicated that none of the treatment effects were modified significantly by tester variation.











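[Table 14. Intertester differences in mean scores on the psychoeducational tests. Row and column labels were mostly not recovered; surviving entries, in source order: 3.30 a,b; 3.22 b; 3.05 c; 16.58 a; 15.75 b; Raven's Matrices; 11.36 a; 11.01 a,b; 32.66 a,b; 31.91 b; 13.59 a,b; 13.04 c; 13.32 b,c.]

NOTE - Duncan test; means with the same letter are not significantly different. df(3, 1,405) for literacy, numeracy, knowledge, and RPM; df(3, 1,052) for reading and Interamerican reading and vocabulary.

*p < .05.

**p < .01.

***p < .001.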

Among the information-processing tests, the only scores on which significant tester differences were obtained were the choice reaction time and memory reaction time variables. These differences, however, were not large (.028 and .09 sec, respectively), suggesting that their statistical significance can possibly be attributed to the large sample size and does not represent behaviorally meaningful differences between testers.


To assess construct validity, a factor analysis was conducted to test the original assumption that the overall battery of tests assessed two distinct domains of cognition: complex intellectual aptitudes, abilities, and educational achievements and elementary aspects of information processing. Factor loadings obtained from a factor analysis with varimax rotation performed on the full set of variables showed that all the psychoeducational tests loaded strongly on the first factor, which reflects an overall, general abilities factor (see Table 15). Factor 2 loaded most heavily with two of the reaction time variables, and Factor 3 loaded a memory variable. Factor 4 included the number of trials to reach criterion on the paired associates test and reaction time on the memory test. The composition of this last factor was somewhat unexpected as it had been assumed that memory reaction time would load with the other two reaction time measures.








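[Table 15. Factor loadings from the factor analysis with varimax rotation. The loadings were not recovered from the source; surviving row labels: Raven's Matrices; Interamerican vocabulary; Interamerican reading; Choice reaction time; Simple reaction time; Memory - reaction time; Trials to criterion; Memory - % negative correct; % variance.]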





The factor analysis supports the assumption that the psychoeducational and the information-processing test batteries assess two distinct cognitive domains. A clear division exists between Factor 1 (psychoeducational) and Factors 2, 3, and 4 (information processing). The factor-analytic separation of the simple and choice RT tests from the memory RT suggests that, in these subjects, the different measures of RT are not tapping the same cognitive functions and may, therefore, be sensitive to different types of influences. The existence of these distinct domains was confirmed with an oblique rotation.

Concurrent test validity was also addressed by calculating correlations between the test scores and the educational variables. Positive and statistically significant correlations, ranging from r = .18 to r = .58, were found between highest grade achieved and all the tests contained in the psychoeducational battery. Correlations between grade attainment and the information-processing variables were also statistically significant, but much weaker, ranging from -.10 (simple RT) to -.22 (memory efficiency).