ALBEMARLE PAPER COMPANY v. MOODY
Mr. Justice STEWART delivered the opinion of the Court.
These consolidated cases raise . . . important questions under Title VII of the Civil Rights Act of 1964 . . . What must an employer show to establish that pre-employment tests racially discriminatory in effect, though not in intent, are sufficiently `job related’ to survive challenge under Title VII?
The respondents — plaintiffs in the District Court — are a certified class of present and former Negro employees at a paper mill in Roanoke Rapids, N.C.; the petitioners — defendants in the District Court — are the plant’s owner, the Albemarle Paper Co., and the plant employees’ labor union, Halifax Local No. 425. . . .
At the trial, in July and August 1971, the major issues were the plant’s seniority system, its program of employment testing, . . . In its opinion of November 9, 1971, the court found that the petitioners had `strictly segregated’ the plant’s departmental `lines of progression’ prior to January 1, 1964, reserving the higher paying and more skilled lines for whites. The `racial identifiability’ of whole lines of progression persisted until 1968, when the lines were reorganized under a new collective-bargaining agreement. The court found, however, that this reorganization left Negro employees “locked’ in the lower paying job classifications.’ The formerly ‘Negro’ lines of progression had been merely tacked on to the bottom of the formerly ‘white’ lines, and promotions, demotions, and layoffs continued to be governed–where skills were ‘relatively equal’–by a system of ‘job seniority.’ Because of the plant’s previous history of overt segregation, only whites had seniority in the higher job categories. Accordingly, the court ordered the petitioners to implement a system of ‘plantwide’ seniority.
The court . . . refused to enjoin or limit Albemarle’s testing program. Albemarle had required applicants for employment in the skilled lines of progression to have a high school diploma and to pass two tests, the Revised Beta Examination, allegedly a measure of nonverbal intelligence, and the Wonderlic Personnel Test (available in alternative Forms A and B), allegedly a measure of verbal facility. After this Court’s decision in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and on the eve of trial, Albemarle engaged an industrial psychologist to study the ‘job relatedness’ of its testing program. His study compared the test scores of current employees with supervisorial judgments of their competence in ten job groupings selected from the middle or top of the plant’s skilled lines of progression. The study showed a statistically significant correlation with supervisorial ratings in three job groupings for the Beta Test, in seven job groupings for either Form A or Form B of the Wonderlic Test, and in two job groupings for the required battery of both the Beta and the Wonderlic Tests. The respondents’ experts challenged the reliability of these studies, but the court concluded:
The personnel tests administered at the plant have undergone validation studies and have been proven to be job related. The defendants have carried the burden of proof in proving that these tests are ‘necessary for the safe and efficient operation of the business’ and are, therefore, permitted by the Act. However, the high school education requirement used in conjunction with the testing requirements is unlawful in that the personnel tests alone are adequate to measure the mental ability and reading skills required for the job classifications.
The . . . respondents appealed the denial of a backpay award and the refusal to enjoin or limit Albemarle’s use of pre-employment tests. A divided Court of Appeals for the Fourth Circuit reversed the judgment of the District Court, ruling that . . . use of the tests should have been enjoined . . .
As for the pre-employment tests, the Court of Appeals held that it was error
to approve a validation study done without job analysis, to allow Albemarle to require tests for 6 lines of progression where there has been no validation study at all, and to allow Albemarle to require a person to pass two tests for entrance into 7 lines of progression when only one of those tests was validated for that line of progression.
. . . We granted certiorari because of an evident Circuit conflict . . . as to the showing required to establish the `job relatedness’ of pre-employment tests. . . .
In Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), this Court unanimously held that Title VII forbids the use of employment tests that are discriminatory in effect unless the employer meets `the burden of showing that any given requirement (has) . . . a manifest relationship to the employment in question.’ This burden arises, of course, only after the complaining party or class has made out a prima facie case of discrimination, i.e., has shown that the tests in question select applicants for hire or promotion in a racial pattern significantly different from that of the pool of applicants. If an employer does then meet the burden of proving that its tests are ‘job related,’ it remains open to the complaining party to show that other tests or selection devices, without a similarly undesirable racial effect, would also serve the employer’s legitimate interest in `efficient and trustworthy workmanship.’ Such a showing would be evidence that the employer was using its tests merely as a ‘pretext’ for discrimination. In the present case, however, we are concerned only with the question whether Albemarle has shown its tests to be job related.
. . . Like the employer in Griggs, Albemarle uses two general ability tests, the Beta Examination, to test nonverbal intelligence, and the Wonderlic Test (Forms A and B), the purported measure of general verbal facility which was also involved in the Griggs case. Applicants for hire into various skilled lines of progression at the plant are required to score 100 on the Beta Exam and 18 on one of the Wonderlic Test’s two alternative forms.
The question of job relatedness must be viewed in the context of the plant’s operation and the history of the testing program. The plant, which now employs about 650 persons, converts raw wood into paper products. It is organized into a number of functional departments, each with one or more distinct lines of progression, the theory being that workers can move up the line as they acquire the necessary skills. The number and structure of the lines have varied greatly over time. For many years, certain lines were themselves more skilled and paid higher wages than others, and until 1964 these skilled lines were expressly reserved for white workers. In 1968, many of the unskilled `Negro’ lines were `end-tailed’ onto skilled `white’ lines, but it apparently remains true that at least the top jobs in certain lines require greater skills than the top jobs in other lines. In this sense, at least, it is still possible to speak of relatively skilled and relatively unskilled lines.
In the 1950’s while the plant was being modernized with new and more sophisticated equipment, the Company introduced a high school diploma requirement for entry into the skilled lines. Though the Company soon concluded that this requirement did not improve the quality of the labor force, the requirement was continued until the District Court enjoined its use. In the late 1950’s, the Company began using the Beta Examination and the Bennett Mechanical Comprehension Test (also involved in the Griggs case) to screen applicants for entry into the skilled lines. The Bennett Test was dropped several years later, but use of the Beta Test continued.(1)
The Company added the Wonderlic Tests in 1963, for the skilled lines, on the theory that a certain verbal intelligence was called for by the increasing sophistication of the plant’s operations. The Company made no attempt to validate the test for job relatedness, and simply adopted the national `norm’ score of 18 as a cut-off point for new job applicants. After 1964, when it discontinued overt segregation in the lines of progression, the Company allowed Negro workers to transfer to the skilled lines if they could pass the Beta and Wonderlic Tests, but few succeeded in doing so. Incumbents in the skilled lines, some of whom had been hired before adoption of the tests, were not required to pass them to retain their jobs or their promotion rights. The record shows that a number of white incumbents in high-ranking job groups could not pass the tests.(2)
Because departmental reorganization continued up to the point of trial, and has indeed continued since that point, the details of the testing program are less than clear from the record. The District Court found that, since 1963, the Beta and Wonderlic Tests have been used in 13 lines of progression, within eight departments. Albemarle contends that at present the tests are used in only eight lines of progression, within four departments.
Four months before this case went to trial, Albemarle engaged an expert in industrial psychology to `validate’ the job relatedness of its testing program. He spent a half day at the plant and devised a `concurrent validation’ study, which was conducted by plant officials, without his supervision. The expert then subjected the results to statistical analysis. The study dealt with 10 job groupings, selected from near the top of nine of the lines of progression. Jobs were grouped together solely by their proximity in the line of progression; no attempt was made to analyze jobs in terms of the particular skills they might require. All, or nearly all, employees in the selected groups participated in the study — 105 employees in all, but only four Negroes. Within each job grouping the study compared the test scores of each employee with an independent ‘ranking’ of the employee, relative to each of his coworkers, made by two of the employee’s supervisors. The supervisors, who did not know the test scores, were asked to `determine which ones they felt irrespective of the job that they were actually doing, but in their respective jobs, did a better job than the person they were rating against . . . .’
For each job grouping, the expert computed the `Phi coefficient’ of statistical correlation between the test scores and an average of the two supervisorial rankings. Consonant with professional conventions, the expert regarded as ‘statistically significant’ any correlation that could have occurred by chance only five times, or fewer, in 100 trials. On the basis of these results, the District Court found that ‘(t)he personnel tests administered at the plant have undergone validation studies and have been proven to be job related.’ Like the Court of Appeals, we are constrained to disagree.
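The significance convention described above can be illustrated with a short sketch. It is not drawn from the record: the opinion does not say how the expert reduced test scores and rankings to the two-by-two form that a Phi coefficient requires, so the pass/fail and high/low split, the function names, and the counts below are all hypothetical. The p-value uses the standard identity that, for a two-by-two table of n observations, n times Phi squared is the chi-square statistic with one degree of freedom; with groups as small as those in the study, this is only a rough approximation.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not from the opinion):
        rows    = scored at/above the cut-off, scored below it
        columns = ranked in the top half, ranked in the bottom half
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("Phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def p_value(phi, n):
    """Approximate chance probability of a Phi this large in magnitude.

    Uses chi-square = n * phi**2 with one degree of freedom, whose
    upper-tail probability equals erfc(sqrt(chi_square / 2)).
    """
    chi_square = n * phi ** 2
    return math.erfc(math.sqrt(chi_square / 2))


def is_significant(a, b, c, d, alpha=0.05):
    """Apply the 'five times or fewer in 100 trials' convention."""
    n = a + b + c + d
    return p_value(phi_coefficient(a, b, c, d), n) <= alpha


if __name__ == "__main__":
    # Hypothetical counts for one job grouping of 20 employees.
    table = (8, 2, 3, 7)
    phi = phi_coefficient(*table)
    print("phi =", round(phi, 3))
    print("p   =", round(p_value(phi, sum(table)), 3))
    print("significant at 0.05:", is_significant(*table))
```

A result that clears the five-in-100 threshold shows only that scores and rankings moved together in the sample. It says nothing about whether the rankings themselves measured job performance.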
The EEOC has issued ‘Guidelines’ for employers seeking to determine, through professional validation studies, whether their employment tests are job related. These Guidelines draw upon and make reference to professional standards of test validation established by the American Psychological Association. The EEOC Guidelines are not administrative regulations promulgated pursuant to formal procedures established by the Congress. But, as this Court has heretofore noted, they do constitute `(t)he administrative interpretation of the Act by the enforcing agency,’ and consequently they are ‘entitled to great deference.’
The message of these Guidelines is the same as that of the Griggs case — that discriminatory tests are impermissible unless shown, by professionally acceptable methods, to be `predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.’
Measured against the Guidelines, Albemarle’s validation study is materially defective in several respects:
(1) Even if it had been otherwise adequate, the study would not have `validated’ the Beta and Wonderlic test battery for all of the skilled lines of progression for which the two tests are, apparently, now required. The study showed significant correlations for the Beta Exam in only three of the eight lines. Though the Wonderlic Test’s Form A and Form B are in theory identical and interchangeable measures of verbal facility, significant correlations for one form but not for the other were obtained in four job groupings. In two job groupings neither form showed a significant correlation. Within some of the lines of progression, one form was found acceptable for some job groupings but not for others. Even if the study were otherwise reliable, this odd patchwork of results would not entitle Albemarle to impose its testing program under the Guidelines. A test may be used in jobs other than those for which it has been professionally validated only if there are `no significant differences’ between the studied and unstudied jobs. The study in this case involved no analysis of the attributes of, or the particular skills needed in, the studied job groups. There is accordingly no basis for concluding that ‘no significant differences’ exist among the lines of progression, or among distinct job groupings within the studied lines of progression. Indeed, the study’s checkered results appear to compel the opposite conclusion.
(2) The study compared test scores with subjective supervisorial rankings. While they allow the use of supervisorial rankings in test validation, the Guidelines quite plainly contemplate that the rankings will be elicited with far more care than was demonstrated here. Albemarle’s supervisors were asked to rank employees by a `standard’ that was extremely vague and fatally open to divergent interpretations. As previously noted, each `job grouping’ contained a number of different jobs, and the supervisors were asked, in each grouping, to `determine which ones (employees) they felt irrespective of the job that they were actually doing, but in their respective jobs, did a better job than the person they were rating against . . . .’
There is no way of knowing precisely what criteria of job performance the supervisors were considering, whether each of the supervisors was considering the same criteria or whether, indeed, any of the supervisors actually applied a focused and stable body of criteria of any kind.(3) There is, in short, simply no way to determine whether the criteria actually considered were sufficiently related to the Company’s legitimate interest in job-specific ability to justify a testing system with a racially discriminatory impact.
(3) The Company’s study focused, in most cases, on job groups near the top of the various lines of progression. . . . The Guidelines take a sensible approach to this issue, and we now endorse it:
If job progression structures and seniority provisions are so established that new employees will probably, within a reasonable period of time and in a great majority of cases, progress to a higher level, it may be considered that candidates are being evaluated for jobs at that higher level. However, where job progression is not so nearly automatic, or the time span is such that higher level jobs or employees’ potential may be expected to change in significant ways, it shall be considered that candidates are being evaluated for a job at or near the entry level.
The fact that the best of those employees working near the top of a line of progression score well on a test does not necessarily mean that that test, or some particular cutoff score on the test, is a permissible measure of the minimal qualifications of new workers entering lower level jobs. In drawing any such conclusion, detailed consideration must be given to the normal speed of promotion, to the efficacy of on-the-job training in the scheme of promotion, and to the possible use of testing as a promotion device, rather than as a screen for entry into low-level jobs. The District Court made no findings on these issues. The issues take on special importance in a case, such as this one, where incumbent employees are permitted to work at even high-level jobs without passing the company’s test battery.
(4) Albemarle’s validation study dealt only with job-experienced, white workers; but the tests themselves are given to new job applicants, who are younger, largely inexperienced, and in many instances nonwhite. The APA Standards state that it is ‘essential’ that
(t)he validity of a test should be determined on subjects who are at the age or in the same educational or vocational situation as the persons for whom the test is recommended in practice.
The EEOC Guidelines likewise provide that `(d)ata must be generated and results separately reported for minority and nonminority groups wherever technically feasible.’ In the present case, such `differential validation’ as to racial groups was very likely not `feasible,’ because years of discrimination at the plant have insured that nearly all of the upper level employees are white. But there has been no clear showing that differential validation was not feasible for lower level jobs. More importantly, the Guidelines provide:
If it is not technically feasible to include minority employees in validation studies conducted on the present work force, the conduct of a validation study without minority candidates does not relieve any person of his subsequent obligation for validation when inclusion of minority candidates becomes technically feasible.
. . . (E)vidence of satisfactory validity based on other groups will be regarded as only provisional compliance with these guidelines pending separate validation of the test for the minority group in question.
For all these reasons, we agree with the Court of Appeals that the District Court erred in concluding that Albemarle had proved the job relatedness of its testing program . . . .
Accordingly, the judgment is vacated, and these cases are remanded to the District Court for proceedings consistent with this opinion.
It is so ordered.
1. While the Company contends that the Bennett and Beta Tests were ‘locally validated’ when they were introduced, no record of this validation was made. Plant officials could recall only the barest outlines of the alleged validation. Job relatedness cannot be proved through vague and unsubstantiated hearsay.
2. In the course of a 1971 validation effort, test scores were accumulated for 105 incumbent employees (101 of whom were white) working in relatively high-ranking jobs. Some of these employees apparently took the tests for the first time as part of this study. The Company’s expert testified that the test cut-off scores originally used to screen these incumbents for employment or promotion `couldn’t have been . . . very high scores because some of these guys tested very low, as low as 8 in the Wonderlic test, and as low as 95 in the Beta. They couldn’t have been using very high cut-off scores or they wouldn’t have these low testing employees.’
3. It cannot escape notice that Albemarle’s study was conducted by plant officials, without neutral, on-the-scene oversight, at a time when this litigation was about to come to trial. Studies so closely controlled by an interested party in litigation must be examined with great care.