
Assessment Equity in a Multicultural Society

by Asa G. Hilliard III-Baffour Amankwatia II


A physicist from MIT spoke about what is required for "measurement" in science:

The main defect in both sides, or either side, of this argument is that the protagonists pay so little attention to the quality of the data base . . . the worst error in the whole business lies in attempting to put people, of whatever age or station, into a single ordered line of "intelligence" or "achievement" like numbers along a measuring tape: 86 comes after 85 and before 93. Everyone knows that people are complex: talented in some ways, clumsy in others; educated in some ways, ignorant in others; calm, careful, persistent, and patient in some ways; impulsive, careless, or lazy in others. Not only are these characteristics different in different people, they also vary in any one person from time to time. To further complicate the problem, there is variety in the types of descriptions. The traits tall, handsome, and rich are not along the same sets of scales as affectionate, impetuous, or bossy . . .

As an old professional measurer (by virtue of being an experimental physicist) I can say categorically that it makes no sense to try to represent a multidimensional space with an array of numbers ranged along one line. This does not mean it is impossible to cook up a scheme that tries to do it; it's just that the scheme won't make any sense. It's possible to make an average of a column of figures in a telephone directory, but one would never try to dial it. Telephone numbers at least represent the same kind of idea: they are all address-like codes for the central office to respond to.

. . . Implicit in the process of averaging is the process of adding. To obtain an average, first add a number of quantitative measures, and then divide by however many there are. This is all very simple provided the quantities can be added, but for the most part, with disparate objects, they cannot be. (Zacharias, 1977, pp. 69-70, emphasis mine.)

A sociolinguist speaks about using language to construct measurement devices:

Meanwhile it will do for us to examine some recent uses of quantitative language analyses from the perspective of the linguist. As noted earlier, linguists generally feel more comfortable about using quantitative analyses to probe for patterned differences than to generalize for broad groupings. Likewise, the more linguists study the semantic and pragmatic meaning conveyed by language, the less comfortable they become about the possibility of accurate measurement by tests that use language as a medium. It is beginning to be believed, in fact, that the most critical measurement points of all, at least as far as language is concerned, are the ones least susceptible to quantification.

A basic problem is that the goal of getting responses that will be comparable across subjects or across testing times is often realized by forcing one standard interpretation of a question (or stimulus) and answer (or response) that is, in fact, not uniquely interpretable but rather is vague and can be fully specified only with reference to specifics of the individual test-takers' background and the individual test-taking occasion. (Shuy, 1975, emphasis mine)

Zacharias and Shuy, quoted above, are two of many scholars, representing two of several related academic disciplines, whose messages have rarely if ever been heard by psychologists and educators in the field of "mental measurement." They offer glimpses of the porous foundation upon which the field of mental measurement is built, a foundation whose weaknesses surface as the symptoms we mistake for the problems themselves.

Taboo Topics

I have spent the better part of my professional career trying to start discussions about the empirical study of context in testing and evaluation, especially the context of language variety. I first issued the challenge to testing advocates to speak to these issues at an American Psychological Association panel in San Francisco during the 1970s. President William Turnbull of the Educational Testing Service and President Roger Lennon of the Psychological Corporation were on the panel, along with six or so others, including Attorney Dan Bersoff for the APA. David Wechsler, author of the Wechsler intelligence tests, was in the audience. I raised the language issues there, asking merely that they be discussed. They were ignored by everyone.

During the 1980s, I did the same thing with the senior professional staff of the American College Testing Program in Iowa. Good questions were raised by the attendees, with what appeared to be an understanding of the importance of the language issue. I have not heard of any follow-up. I also served on an invited panel at the Educational Testing Service, specifically on the topic of "bias" in testing, chaired by Anne Anastasi. I raised the same issues there. Again, I have not heard of any follow-up. I served on the Committee on Psychological Tests and Assessment (CPTA) of the American Psychological Association, and was only minimally successful in getting a discussion there. To the best of my knowledge, the discussion never went beyond the committee. Shortly after my presentation at the American Psychological Association annual meeting, the President of the Educational Testing Service became a member of the board of directors of the Center for Applied Linguistics, where I was also serving as a board member. Still, I know of no serious engagement of this branch of scholarship in the work of test design and use at ETS or at other testing companies.

The Purposes, Practices and the Utility of Tests and Assessment Approaches

Apart from measuring achievement, the purposes of school assessment have been the ranking, classification, and placement of students for treatment in special education and/or school tracks. A small number of psychologists and special educators also claimed to be doing "diagnostic testing and assessment."

Over recent decades there have been fundamental conceptual changes in the science and scholarship of testing and assessment. One of the most interesting is the cognitive change approach. (Feuerstein, Rand and Hoffman, 1979; Lidz, 2000) Another has been the acknowledgment by experts that the field of mental measurement is underdeveloped, together with the development of a vision of what it could become. (Rowe, 1991) Although a significant number of scientists have embraced these conceptual changes, there has been far less change in the professional practice of the overwhelming majority of applied psychologists in education.

Testing and Assessment Issues Specific to Ethnic Minorities

It will be seen that where the "equity problem" in testing and assessment in education for minority cultural groups is concerned, these groups are the canaries in the coal mine, signaling deep problems with the whole enterprise of mass-produced standardized testing and assessment: a paradigm problem. The Zacharias and Shuy quotations above show that we are in deep water with this "measurement" business. Cultural anthropologists and experts from other related academic disciplines can inform the work of psychometrics and the assessments that are linked to that work. When assessment is applied to ethnic minority groups in a charged cultural and political environment, even simple problems of assessment practice can be magnified.

Common types of validity in testing and assessment are "construct validity," "face validity," "content validity," "concurrent validity," and "predictive validity." Predictive validity, or forecasting, is by far the most valued, especially in "intelligence" and cognitive testing. In general, predictive validity rests on the assumption of fixed abilities, resulting in the expectation that students will maintain their approximate rank in a distribution of scores of "mental ability." These various validities were applied in different proportions to "cognitive" or "intelligence" testing and to "achievement" testing, collectively the primary forms of school testing. Of course, "personality," "interest," and other types of tests were also used, though to a lesser degree.

For the most part, testing in practice includes little "assessment" or problem solving. Unlike testing, "assessment" has few rigorous and commonly accepted standards. The romantic notion that many sources of valid data are combined with testing, and integrated in an instructionally valid way in school practice, simply has not been documented. (Donovan and Cross, 2002) The standardized routines that were carried out were accepted, on faith, as important supports, implicitly and rarely explicitly, for beneficial school services. Gradually, some psychologists and educators began to be concerned about alternative meanings and values of the testing/assessment enterprise in schools. A few took action to develop approaches based on a new paradigm. However, even today, testing and assessment activities tend to be compliance activities, not to be confused with activities that inform instructionally valid design.

When I entered the field, there was not even a hint of a substantial challenge to the accepted idea of standardized, mass-produced, "one-size-fits-all" assessment for all children, or to the assumption that it would benefit every student. Culture was ignored or minimized as a factor in creating testing routines, or in interpreting testing and assessment data. Students' economic opportunity and what we now call "opportunities to learn" were also ignored. In fact, I do not recall that culture was "on the radar screen" at all. Moreover, I recall no serious discussions in any classes or at any professional meetings of psychologists about culture, based on the work of appropriate experts. I recall no recognition by psychometricians of the academic disciplines that could inform cultural considerations, such as cultural anthropology and cultural linguistics. Most responses to the challenge of cultural deficiencies in testing and assessment tended to be political.

Some educators and psychologists have argued for "equity" in testing and assessment, for "culture-free," "culturally fair," and "culturally relevant" or "culturally salient" testing and assessment. There is also "non-discriminatory" testing and assessment, which speaks to equity without specific mention of culture. This equity-oriented and intuitive response to the validity problem fell far short of what was needed, mainly because the assessment problem was not understood for what it is: a validity problem, a matter of science. No scientifically based data have been collected routinely on "opportunity-to-learn," "non-discrimination," and "cultural" factors to be included in the assessment. No standards have been developed to demonstrate the scientific basis for such adjustments. By what standards, then, were these culturally responsive and "opportunity-to-learn" data to be evaluated, and for what purpose were the procedures to be performed? Clearly, no serious scientific response to the challenges has yet been made in general practice.

Although challenges to standardized testing, especially IQ tests, and to assessment practices were made in a variety of states, California alone offers a good example of these challenges with three landmark court cases. In 1970, Diana vs. California State Board of Education dealt with the failure to take Mexican culture and language into account in testing. When retested in their native language, the Mexican children gained an average of 15 points on IQ tests. Lau vs. Nichols, decided by the U.S. Supreme Court in 1974, dealt with the failure to take the language of Chinese-speaking students into account in providing access to school services. In 1972, Larry P. vs. Wilson Riles dealt with the failure to take African culture into account because of the use of biased IQ tests. Similar cases were tried throughout the nation: Hobson vs. Hansen on IQ and tracking in the District of Columbia in 1967, PASE [Parents in Action on Special Education] vs. Hannon in Chicago in 1980 on IQ bias and special education, and Mattie T. vs. Holladay in Mississippi on IQ and special education in the late 1970s and 1980s.

So, the primary pressure for consideration of cultural context in mental measurement came through the courts, rather than through the academy or the profession, which had turned its back on these questions. That is still true. Most responses are more political than scientific.

Of all the cultural challenges, the most transparent one is language diversity. Why would families and communities have to go to court to get psychologists and educators to understand the existence and the meaning of language diversity, especially in California, one of the most linguistically diverse states? California was also the state where the "Ebonics" controversy flared up a few years ago, the source of almost hysterical opposition to the recognition of the cultural uniqueness of the language spoken by many African students. (Adger, Christian and Taylor, 1999; Delpit and Perry, 1998; Delpit and Dowdy, 2002; Crawford, 2001; Jones, 1995) The Center for Applied Linguistics and the Linguistic Society of America joined in virtually unanimous support of the substance of the approach that the Oakland Unified School District was taking. Clearly, the level of hysteria was exceeded only by the level of ignorance that propelled it.

A fuller reading of linguist Roger Shuy's (1979) analysis of the construct of intelligence, and of the possibility of measuring it using language data, offers a powerful example of the validity of cultural linguistic criticism of IQ testing in particular, and of standardized tests and assessments in general. This is not trivial criticism. I must quote at length from Shuy's brilliant presentation.

. . . Now, if such a basic principle is overlooked by the schools, it's also overlooked by those who measure things in schools. It's common practice to measure that which can be seen and that which can be counted. In the area of foreign language instruction, what is most frequently measured is vocabulary, phonology, and grammar, on the assumption that these are what can most easily be measured. We need to be cognizant that the ability to use language to get things accomplished is difficult to measure, not very physical, and virtually impossible to count. Naturally, it is seldom tested.

If those who measure intelligence are not cognizant of the interference of surface forms with deep structure, it's small wonder; they're not alone. Most everybody else does the same thing. (Shuy, 1979, p. 2)

At this point, it has been asserted that subjective judgments of all sorts, including judgments about intelligence, are made by teachers, employers, researchers, and the general public regardless of the languages or settings in which they occur. Rather compelling evidence rejects every claim made by those who attempt to portray linguistic variation as a deficit. Most arguments put forth to support this claim misrepresent linguistic theory and reveal a naive methodology rooted in a lack of cultural understanding.

One basic contribution of linguistics to this question is that no one language or dialect, standard or nonstandard, is known to be significantly more complex than another in its basic grammatical or semantic characteristics. The Cakchiquel Indians of Guatemala, for example, were said by the Spanish to be primitive, but their language had over a hundred times more verb forms than Spanish does; and whatever complexity means, it certainly isn't that.

Well, linguists have not found any speech community with a native language that can be said to be logically or conceptually primitive. Likewise, the so-called nonstandard dialects of English spoken by lower-class families in the inner cities of this country are fully formed logical languages with only superficial differences in the means of expression from Standard English, and are sometimes superior to it.

 . . . My point today is that such a representation is already several steps away from real data, for language use is influenced by a multitude of developmental, cultural, stereotypical and representational variances. This brings us a long way from a happy feeling about measurement of any sort.

In my opinion, the distance is too great to be of any real significance: even if the construct of intelligence can someday be shown to exist, the attempt to reflect it in language is far too distant to be of any real help. (Shuy, 1979, p. 6)

THIS IS PARADIGM-BREAKING STUFF! Psychologists cannot simply leave such well-grounded professional opinions and documentation hanging with no response. I know of no meaningful responses from the psychometric discipline to this seminal linguistic challenge, or to other equally powerful challenges like it in other academic disciplines. I suspect that psychologists are incapable of responding because the consequences of doing so are enormous. To accept the reality of diversity is to undermine the possibility of standardized, mass-produced, universally applicable measurement instruments. Agree or disagree, it is a fundamental scientific flaw to ignore this particular challenge. It seems that there is a code of silence here among many professionals.

The doubts about the validity of testing and assessment are not new. Some of the early pioneers were clear about the limitations of standardized testing.

Existing instruments (for measuring intellect) represent enormous improvements over what was available twenty years ago, but three fundamental defects remain. Just what they measure is not known; how far it is proper to add, subtract, multiply, divide, and compute ratios with the measures obtained is not known; just what the measures obtained signify concerning intellect is not known. We may refer to these defects in order as ambiguity in content, arbitrariness in units, and ambiguity in significance. (Edward L. Thorndike, cited by Sheldon White in Houts, 1977, p. 23)

Unfortunately, the testing movement seems to be propelled by inertia. It has not been able to hear, and to respond appropriately to, scientific information from other disciplines that would have a major bearing on the work of psychologists.

Validity requires taking such variables as culture and opportunity-to-learn into account. These things matter greatly in testing and assessment, and in the delivery of instructional services to students. Context includes culture and socioeconomic status. Only within the past decade has the research community belatedly recognized contextual variables as influential. These variables pose great threats to the validity of standardized, mass-produced testing and assessment, especially in the absence of controls for context variation in validation studies. They also pose a great challenge to profits from mass production. It must be noted that typical validation studies of IQ tests are correlational, not experimental. How can the linkages between powerful instruction and assessment be demonstrated without some use of experimental studies that control for critical contextual variables? These are scientific matters, not political ones. When will we ever learn?

Validity More than Equity: Instructional Validity More than Others

Typically, the foundational academic disciplines supporting education have been psychology, sociology, and, to a lesser extent, anthropology. When we move beyond day-to-day school instructional services, disciplines such as economics, business, and other areas contribute to policy studies. What is interesting about all of these disciplines, and especially about psychology [testing and assessment], is that until very recently, there was no explicit benefits criterion for the evaluation of instruction and of the testing and assessment practices that inform instruction. Psychology, for example, the primary testing and assessment discipline, was not evaluated in terms of its contribution to beneficial instruction, only in terms of the faithful execution of psychological procedures, routines, and recipes. Under current job descriptions, school psychologists take no responsibility for powerful instruction, only for forecasting student performance.

In other words, the traditional validity criteria for "mental measurement" did not include empirically determined "instructional validity," the linking of testing and assessment to the improvement of instructional outcomes. Some psychometric experts refer to this as "consequential validity." To name "instructional validity" is to set in motion activities to provide empirical documentation to determine whether it exists. In fact, that is precisely what happened when the first National Research Council of the National Academy of Sciences committee on disproportionate placement of black males [later changed to "minorities"] in special education sought to determine whether testing and assessment and special education services produced benefits, not merely predictive validity. The report was entitled Placing Children in Special Education: A Strategy for Equity. (Heller, Holtzman and Messick, 1982) A second National Academy of Sciences report during the same year, Ability Testing: Uses, Consequences, and Controversies, echoed the call for instructional validity. (Wigdor and Garner, 1982) Both reports begin with clear statements:

Our ultimate message is a strikingly simple one. The purpose of the entire process, from referral for assessment to eventual placement in special education, is to improve instruction for children. The focus on educational benefits for children became our underlying theme, cutting across disciplinary boundaries and sharply divergent points of view.

. . . These two themes, the validity of assessment and the quality of instruction, are the subjects of this report. Valid assessment, in our view, is marked by its relevance to and usefulness for instruction. (Heller, Holtzman and Messick, 1982, pp. x-xi)

. . . The basic principle underlying the Committee's discussion of testing in the schools is that the classification of pupils is warranted only when the decision rules, whether based on tests or not, have instructional validity. No school child should be relegated to a program of instruction that is not expected to enhance performance. (Wigdor & Garner, 1982, p. 5)

These clear statements introduce a "benefits" criterion to determine the "instructional validity" of professional practices and services for the first time, to my knowledge, in major scientific or professional publications. This is a seismic shift in educational and psychological paradigms, from a custodial to a remedial paradigm.

In the final report, Placing Children in Special Education: A Strategy for Equity (Heller, Holtzman and Messick, 1982), and in subsequent years, empirical research was done to determine whether the linking of testing, assessment, and school services produced student achievement benefits. The work showed, essentially, that the benefits ranged from few to negative. Twenty years later, the second National Academy of Sciences report on minority disproportion, Minority Students in Special and Gifted Education (Donovan and Cross, 2001), found virtually the same results as the first, after reviewing even more studies that sought to document benefits. In other words, a cost-benefit analysis of the current popular use of testing and assessment in the schools would find virtually all cost and few to no benefits in the high-incidence special education categories. In fact, the lack of evidence for "instructional validity" and utility led the second National Academy of Sciences committee to lay the groundwork for the argument to eliminate IQ tests from school use! The following excerpts from the National Academy of Sciences report add rationale for this thinking. (Donovan and Cross, 2001)

"In addition to the limitations of IQ tests from the perspectives of cultural psychology, it is questionable whether the costs of IQ tests are worth the benefits in special education eligibility determination."

. . . "The use of IQ tests and IQ-based disability determination do not promote the achievement of those critical goals; therefore, IQ should be abandoned, even if that action complicates the work of other agencies."

. . . "Perhaps the most convincing of the arguments against IQ tests is that the results are largely unrelated to the design, implementation, and evaluation of interventions designed to overcome learning and behavioral problems in school settings. For example, IQ is not a good predictor either of the kind of reading problem that a student exhibits or of the student's response to treatments designed to overcome that reading problem."

. . . "The same general interventions appear to work with basic skills problems regardless of whether the student is classified with mild mental retardation (MMR), learning disability (LD), or emotional disturbance (ED)."

. . . "The differentiation between LD and MMR that is done primarily with IQ test results does not lead to unique treatments or to more effective treatments."

. . . "No contemporary test author or publisher endorses the notion that IQ tests are direct measures of innate ability. Yet misconceptions that the tests reflect genetically determined, innate ability that is fixed throughout the life span remain prominent with the public, many educators, and some social scientists."

. . . "The present study suggests that the concept of discrepancy operationalized using IQ scores does not produce a unique subgroup of children with reading disabilities when a chronological age design is used; rather, it simply provides an arbitrary subdivision of the reading-IQ distribution that is fraught with statistical and other interpretative problems." . . . "Poor readers who make up 70 to 80 percent of the current LD population seem to have the same needs and the same cognitive processing profiles, and they respond to the same treatments regardless of their IQ status (it should be noted that children with IQs less than 80 generally were excluded from the NICHD studies). Therefore, arbitrarily dividing poor readers into subgroups with higher IQs (those who meet the current LD criteria) and those with IQs similar to their reading achievement levels is invalid. With regard to reading-related characteristics, these subgroups are much more similar than different, calling into serious question the current LD diagnostic practices."

. . . "Importantly for future policy development, the IQ test results and whether or not a child shows a discrepancy between IQ and reading achievement has little significance for understanding or treating a reading disability." (See especially pages 283-91)

Why did it take so long to discover that professional practice lacked value, or even did harm; in short, that it was invalid? "Blinders" were built into the professional conceptualizations and routines. Professional habits, really fixations, rarely critiqued, are self-perpetuating.

I must hasten to add here that instructionally valid, beneficial testing and assessment approaches have been available for more than half a century. (Feuerstein, 1979; Lidz, 2000) However, they come from an entirely different paradigm, a marginalized paradigm. This alternative paradigm, rooted in the cognitive psychology of Binet and Piaget, assumes malleable intelligence, and demonstrates the instructional validity of diagnostic cognitive assessment and remedial teaching, or mediation. The diagnostic assessment is called "dynamic assessment" for structural cognitive modifiability. The remedial teaching is called "instrumental enrichment," for changing cognitive structures in an enduring way. The goal is to assess the student's mental perceptions, definitions, functions, operations, and structures. In other words, cognitive processes are of greater interest than the acquisition of cognitive content. Further, the uses of the process information must be beneficial for instruction. The remedial paradigm changes the assumptions, the questions, the goals, the roles of professionals, and many other things. This approach, although marginalized, does show powerful achievement benefits for students.

A brief word is in order here about achievement testing. The primary concern about achievement testing is that tests be "content valid." It is not rocket science to say that tests should measure what schools promise to teach. Of course, in the real world, students are rarely exposed to a common curriculum with a uniformly high quality of professional service. Valid tests of the curriculum actually offered, that is, of the actual opportunities to learn, rarely exist. Time and time again, empirical work demonstrates the lack of content validity of popularly used standardized tests. Empirical documentation of the "Savage Inequalities" (Kozol, 1991) in opportunities-to-learn in school services offered to students is abundant.

Of course, since few curricula are culturally salient, few tests are either. Cultural salience is of great importance, since students should have the opportunity to demonstrate what they know and can do, in perceptions, operations, functions, and structures, using cultural material that is familiar to them. In addition, the human experience is an enormously diverse experience. The truth of the human experience must be reflected in the diversity of the school world. Any non-diverse curriculum is an untruthful one, be it a curriculum in mathematics, science, art, music, history, or any other subject.

Mass-produced standardized achievement tests have a poor content validity track record. That will remain a problem because the savage inequalities in opportunities-to-learn show no signs of abating. I do not argue that standardized, mass-produced tests should be eliminated. My argument is that they do not constitute measurement when applied in a world of diversity. This means that they must be taken for what they are: rough instruments for gathering data as a point of departure for assessment. They may give some indication of the results of exposure to instruction. Some of these data may be useful, potentially even beneficial. However, the benefits must be demonstrated empirically.

Cultural Pluralism is a Reality, Not Rhetoric

It is a simple fact, empirically demonstrable, that there is no universal human culture. Testing and assessment depend upon communication systems that are tied to language. Language is culturally embedded and is a core part of the communication process. Cultural linguists are scientists who have developed a body of knowledge and models of meaning that are essential to the work of those who use language in making measurement devices and in interpreting the meaning of mediation and responses. (Shuy, 1979) Standardization in testing and assessment is of high value to entrepreneurs, who make money if they can mass-produce products such as tests. The best of all possible worlds for them is "one size fits all." Unfortunately for their needs, humanity is made up of a plurality of cultures.

Until now, it has been rare to see any academic/scientific expertise in culture, its contents and principles, reflected in mainstream psychology. The reality of cultural diversity and its meaning for science is often seen by measurement experts as a "minority concern," a "fairness concern," a "civil rights concern," an "equity concern," but not a scientific concern. This means that many scholars, often noting the concerns above, seem to believe that cultural diversity matters are more political than scientific. Moreover, there is real resistance among many traditional psychologists to engaging in the required scientific study and dialogue about these cultural matters. Their cultural naiveté is almost legendary. The field of psychology is littered with the wreckage of attempts to function professionally in ignorance of cultural realities, and in ignorance of the cultural sciences that document and interpret those realities.

Politics, Testing and Assessment: Mental Measurement and Hegemony

This section deserves extensive treatment. However, it will be brief. I simply cannot conclude this paper without raising the issue of the contamination of the professions, including psychology and other behavioral sciences, by the more than 400-year tradition of white supremacy hegemony. Many of the invalid and bad practices in testing and assessment, in particular, stem from the well-documented partnership between many powerful people in our field and the forces of slavery, colonialism, segregation/apartheid and white supremacy ideology.

It would take many books to treat this subject well. I will merely provide a few bibliographical references to support the charge. More recent examples range from the widely respected and supported psychologist Arthur Jensen [How Much Can We Boost IQ, 1969, and Bias in Mental Testing, 1980], to Richard Herrnstein and Charles Murray [The Bell Curve, 1994], and finally to the internationalization of bell curve racist ideology by Richard Lynn and Tatu Vanhanen [IQ and the Wealth of Nations, 2002]. Other classic works document the continuing contamination of professional practice with white supremacist thinking (Gould, 1996; Guthrie, 1976; Kamin, 1974; Tucker, 1994). Few courses in the history and systems of psychology devote the documentation and discussion time that the white supremacist use of the discipline deserves. This was, and remains, a non-trivial matter.

This is truly a taboo topic. White supremacist ideology and behavior are contextual variables, subject to empirical confirmation. They are rarely investigated. They must be part of validity considerations. No investigation of the validity of testing and assessment with respect to culture and equity can be complete without an examination of this continuing phenomenon.

Conclusion

The professional remedies for the problems that I have tried to describe are worthy of careful consideration, though there is little time for that discussion here. I have merely tried to open up or reopen neglected topics. Cultural salience appears to be among the taboo topics, as does the inclusion of the sciences of cultural linguistics and cultural anthropology in the psychometric endeavor. Even to enter into discussion of them is to threaten to derail the mass-produced testing and assessment train as it is now constituted. Perhaps that explains the grand silence on these matters.

As I see it, the main problems here are political and economic, not academic and professional at all. Professionally, scientifically and academically, the way has already been prepared for appropriate responses to instructional validity and cultural salience in testing, in assessment, and in their linkage to teaching. Vested interests will seek stasis. Courageous psychologists will determine whether we consider the taboo topics, and whether we embrace beneficial professional practice.

The students are waiting.


Selected References and Bibliography

Adger, Carolyn Temple; Christian, Donna and Taylor, Orlando, eds. (1999) Making the connection: Language and academic achievement among African American students. McHenry, IL: Center for Applied Linguistics and Delta Systems.

Alland, Alexander Jr. (2002) Race in Mind: Race, IQ and other Racisms. New York: Palgrave Macmillan.

Crawford, Clinton (Ed.) (2001) Ebonics and Language Education of African Ancestry Students. New York: Sankofa World Publishers.

Dandy, Evelyn B. (1991) Black Communications: Breaking Down the Barriers. Chicago: African American Images.

Delpit, Lisa and Perry, Theresa (1998) The Real Ebonics Debate: Power, Language and the Education of African American Children. Boston: Beacon Press.

Delpit, Lisa and Dowdy, Jo Ann (Eds.) (2002) The Skin That We Speak: Thoughts on Language and Culture in the Classroom. New York: New Press.

Donovan, M. Suzanne and Cross, Christopher T. (2001) Minority Students in Special and Gifted Education, Commission on Minority Representation in Special and Gifted Education, Behavioral and Social Sciences and Education, National Research Council. Washington, DC: National Academy Press.

Feuerstein, Reuven; Rand, Ya'acov and Hoffman, Mildred (1979) The Dynamic Assessment of Retarded Performers: the Learning Potential Assessment Device, Theory, Instruments and Techniques. Baltimore: University Park Press.

Fuller, Renee (1977) In Search of the IQ Correlation: a Scientific Whodunit. Stony Brook, NY: Ball-Stick-Bird Publishing.

Gould, Stephen J. (1996) The Mismeasure of Man. New York: W. W. Norton.

Guthrie, Robert V. (1976) Even the Rat was White: a Historical View of Psychology. New York: Harper and Row.

Hehir, Thomas and Latus, Thomas, eds. (1992) Special Education at Century's End: Evolution of Theory and Practice Since 1970. Harvard Educational Review Reprint Series No. 23.

Heller, Kirby A., Holtzman, Wayne H., and Messick, Samuel, eds. (1982) Placing Children in Special Education: A Strategy for Equity. Washington, DC: National Academy Press.

Herrnstein, Richard and Murray, Charles (1994) The Bell Curve: Intelligence and Class Structure in American Life. New York: The Free Press.

Hilliard, Asa G. III (1990) "Back to Binet: The case against the use of IQ tests in the schools." Contemporary Education. 61, 4, 184-9.

Hilliard, Asa G. III (1995) "Either a Paradigm Shift or no Mental Measurement: The Non-science and Non-sense of the Bell Curve." Psych Discourse. 76, 10, 620.

Hilliard, Asa G. III (1984) "IQ testing as the Emperor's New Clothes: a Critique of Bias in Mental Testing," in C. Reynolds, ed. Perspectives on Bias in Mental Testing. New York: Plenum.

Hilliard, Asa G. III (1988) "Misunderstanding and Testing Intelligence," in John Goodlad and Pamela Keating, eds. Access to Knowledge. New York: The College Board, 145-157.

Hilliard, Asa G. III (1987) "Testing African American students." Special Issue of the Negro Educational Review. 38, numbers 2 and 3. (Republished 1995, Chicago: Third World Press.)

Hilliard, Asa G. III (1975) "The Strengths and Weaknesses of Cognitive Tests for Young Children," in J. D. Andrews, ed. One Child Indivisible. Washington, DC: National Foundation for the Education of Young Children.

Hilliard, Asa G. III (1983). "Factors associated with language in the education of the African-American child." Journal of Negro Education, 52(1), 24-34.

Hilliard, Asa G. III (1994) "What Good is this thing called Intelligence and Why Bother to Measure it?" Journal of Black Psychology, 20, 4, 430-444.

Houts, Paul, ed. (1977) The Myth of Measurability: IQ tests, Standardized Tests. New York: Hart Publishing Company, Inc.

Jacoby, Russell and Glauberman, Naomi, eds. (1995) The Bell Curve Debate: History, Documents, Opinion. New York: Times Books, Random House.

Jones, J. Arthur (1987) "Look at Math Teachers, Not 'Black English.'" Essays and Policy Studies. Washington, DC: Institute for Independent Education.

Kamin, Leon (1974) The Science and Politics of IQ. New York: John Wiley and Sons.

Kincheloe, Joe L., Steinberg, Shirley R., and Gresson, Aaron D., eds. (1997) Measured Lies: the Bell Curve Examined. New York: St. Martin's Press.

Kozol, Jonathan (1991) Savage Inequalities: Children in America's Schools. New York: Crown.

Lidz, Carol and Elliott, Julian G., eds. (2000) Dynamic Assessment: Theory, Models, and Applications. New York: Pantheon Books.

Lidz, Carol and Elliott, Julian G. (2000) Dynamic Assessment: Prevailing Models and Applications. New York: JAI, An Imprint of Elsevier Science.

Lynn, Richard and Vanhanen, Tatu (2002) IQ and the Wealth of Nations. Westport, CT: Praeger.

Rowe, Helga A.H. (1991) Intelligence: Reconceptualization and Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Shuy, Roger (1979) "Is the Construct Intelligence a Twentieth Century Myth?" (Transcript of Symposium Presentation.) American Psychological Association Annual Convention, New York, 1979. (Other Symposium Participants: David Elkind, Chairman; John Horn, Renee Fuller, Asa Hilliard, Presenters; and J. McVicker Hunt, Discussant.)

Shuy, Roger (1975) "Quantitative linguistic data: a case for and Some Warnings Against." Unpublished manuscript. (Shuy was at the Center For Applied Linguistics in Washington, DC.)

Skrtic, Thomas M. (1992) "The Special Education Paradox: Equity as the Way to Excellence," in Thomas Hehir and Thomas Latus, eds. Special Education at Century's End: Evolution of Theory and Practice Since 1970. Harvard Educational Review Reprint Series No. 23.

Smitherman, Geneva (Ed.) (1981). Black English and the Education of Black Children and Youth: Proceedings of the National Invitational Symposium on the King Decision, Detroit: Harlo Press.

Tucker, William H. (1994). The Science and Politics of Racial Research. Chicago: University of Illinois Press.

Van Keulen, G., Weddington, G. and DeBose, C. (1998) Speech, Language, Learning and the African-American Child. Boston: Allyn and Bacon.

Wigdor, Alexandra K. and Garner, Wendell R., eds. (1982) Ability Testing: Uses, Consequences, and Controversies, Part I. Committee on Ability Testing, Assembly of Behavioral and Social Sciences, National Research Council. Washington, DC: National Academy Press.

Zacharias, Jerrold R. (1977) "The Trouble with Tests." In Houts, Paul, ed. The Myth of Measurability. New York: Hart Publishing Company, Inc.


About the author

Dr. Asa G. Hilliard III-Nana Baffour Amankwatia II is the Fuller E. Callaway Professor of Urban Education at Georgia State University, with joint appointments in the Department of Educational Policy Studies and the Department of Educational Psychology/Special Education. A teacher, psychologist, and historian, he began his career in the Denver Public Schools, teaching psychology, mathematics and American history. He earned a B.A. in Psychology, an M.A. in Counseling, and an Ed.D. in Educational Psychology from the University of Denver, where he also taught in the College of Education and in the Honors Program in philosophy in the College of Arts and Sciences. Dr. Hilliard served on the faculty at San Francisco State University for 18 years. During that time he was a Department Chair for 2 years and Dean of Education for 8 years. During his six years in Liberia, West Africa, he was a consultant to the Peace Corps and to the Superintendent of Schools in Monrovia, and a school psychologist.

He has helped to develop several national assessment systems, such as proficiency assessments of professional educators and developmental assessments of young children and infants. He is a Board Certified Forensic Examiner and a Diplomate of both the American Board of Forensic Examiners and the American Board of Forensic Medicine. He served as an expert witness in several landmark federal cases on test validity and bias, including Larry P. v. Wilson Riles in California, Mattie T. v. Holladay in Mississippi, and Debra P. v. Turlington in Florida, and also in two Supreme Court cases, Ayers v. Fordice in Mississippi and Marino v. Ortiz in New York City.

Dr. Hilliard is a founding member and First Vice President of the Association for the Study of Classical African Civilizations. He has conducted Ancient African History study tours to Egypt for 18 years and is the co-developer of an educational television series on ancient Kemetic (Egyptian) history. He has produced videotapes and educational materials on African history through his production company, Waset Educational Productions. He is co-founder, with his daughter, Nefertari Patricia Hilliard-Nunn, of Makare Publishing. Dr. Hilliard has written more than three hundred research reports, articles and books on testing, ancient African history, teaching strategies, African culture, and child growth and development. He served with Dr. Barbara Sizemore as Chief Consultant on the Every Child Can Succeed television series, produced by the Agency for Instructional Technology.


© March 2004 New Horizons for Learning
http://www.newhorizons.org

info@newhorizons.org
