How to Choose a Test for Your Applications

Any individual test or test battery needs to be meaningfully tied to the job. If you're using full-scale intelligence tests for entry-level store clerks, you're asking for trouble. However, personality testing for managers or others in people-interface jobs makes sense. These are the things you need to consider with any testing system, and with any selection process in general.

Be sure it is appropriate for your purposes and for your population. You should not use a calculus test for entry-level customer service reps or a low-level personality typing measure for executive selection.

Be sure it is fair. It should not systematically exclude individuals or groups on the basis of non-job-related factors (e.g., ethnicity, gender, religion). You need to be able to demonstrate that your test results are related to job behaviors, not to unrelated factors. You want to discriminate between good and poor job performers, not between groups of people on factors other than job performance. Periodic checks for adverse impact are necessary to ensure the test remains fair over time.
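One common way to run such a check is the four-fifths (80%) rule from the U.S. Uniform Guidelines on Employee Selection Procedures: if one group's selection rate falls below 80% of the highest group's rate, that is treated as evidence of adverse impact. Below is a minimal sketch of that check in Python; the group labels and counts are hypothetical illustrations, not data from any real study.

    # Minimal adverse impact check using the four-fifths (80%) rule.
    # Group names and counts are hypothetical illustrations.
    applicants = {"group_a": 120, "group_b": 80}   # candidates tested
    selected   = {"group_a": 60,  "group_b": 20}   # candidates passing

    # Selection rate = proportion of each group that passed.
    rates = {g: selected[g] / applicants[g] for g in applicants}
    highest = max(rates.values())

    for group, rate in rates.items():
        ratio = rate / highest
        flag = "potential adverse impact" if ratio < 0.80 else "ok"
        print(f"{group}: rate {rate:.2f}, ratio {ratio:.2f} -> {flag}")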

Be sure it is reliable. It should provide consistent results over time. If your test gives the same general results when you test the same people with it later (test-retest reliability), that is a good sign. However, it is not the whole story. If your thermometer consistently registers ten degrees warmer than the actual temperature, it is reliable, but also wrong. This is where validity comes in.
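In practice, test-retest reliability is usually expressed as the correlation between two administrations of the same test to the same people. The sketch below uses Python's statistics.correlation (available in Python 3.10+) with invented scores; as the thermometer example shows, a high coefficient demonstrates consistency, not correctness.

    # Test-retest reliability as a Pearson correlation between two
    # administrations of the same test. All scores are hypothetical.
    from statistics import correlation  # Python 3.10+

    first_round  = [72, 85, 90, 64, 78, 88, 55, 69]   # initial scores
    second_round = [70, 88, 92, 60, 75, 90, 58, 72]   # same people, later

    r = correlation(first_round, second_round)
    print(f"test-retest reliability r = {r:.2f}")
    # High r means consistent scores; it says nothing about whether
    # the test measures the right thing (that is validity).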

Be sure it is valid. It should do what it is supposed to do. The primary types of validity as they concern selection testing are listed below.

  • Content validity refers to the fit between the test and the content of the job. For instance, if you were testing candidates for a financial analyst position and part of the battery included questions about how to interpret information found on balance sheets, that portion of the test would be considered to have high content validity. That is, it provides a good sample of the actual content of the position. Such an item would also be said to have high face validity in that it appears to measure what the job requires. It would make sense to a casual observer that this type of test would be included in a selection battery for this type of job.
  • Criterion-related validity reflects the relationship between a test score and a specific outcome. For instance, if there is a high correlation between a school entrance exam like the SAT and later grade-point average, the test would have high predictive validity. If you give the test to current students and correlate scores with current GPA, you are investigating concurrent validity, another form of criterion-related validity. To validate a selection test in a work setting, predictive validity is the gold standard. However, it is also the hardest to obtain because of the difficulties in getting good performance measures and because, in an ideal (academic) setting, one would give the test to every candidate, hire them all without reference to the test results, then measure everyone on the criteria at a later point. Not many businesses have that luxury. In business, we usually have to settle for good concurrent validity results, but those are strengthened by cross-validating (using the prediction equation from the original correlations between scores and performance to predict performance in a separate or "hold-out" sample). If the prediction holds up reasonably well with the separate sample, you have a good case for validity; a sketch of this check follows this list.
  • Construct validity refers to the extent to which a test measures the theoretical property (or construct) it is supposed to measure. For instance, if you have a test designed to measure math skills and the items are presented in the form of word problems, you may in fact be measuring reading comprehension rather than the construct of math aptitude. To establish construct validity, you need to show not only that a test correlates with the ability or attribute it purports to measure, but also that it does not correlate with unrelated attributes. The two lines of evidence for construct validity are convergent validity (how well the test correlates with other established tests that measure the same type of ability) and discriminant validity (how weakly it correlates with tests that measure some other type of ability). For example, a new test of vocabulary should correlate well with other tests of vocabulary, but not as well with tests of, say, math aptitude (see the second sketch after this list).
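As promised above, here is a minimal cross-validation sketch in Python (the statistics module's correlation and linear_regression require Python 3.10+). All names and numbers are hypothetical: it assumes you already have test scores and performance ratings for current employees, split into a development sample and a hold-out sample.

    # Cross-validating a concurrent validity study: fit a prediction
    # equation on one sample, then test it against a hold-out sample.
    # All scores and ratings below are hypothetical.
    from statistics import correlation, linear_regression  # Python 3.10+

    # Development sample: test scores and performance ratings.
    dev_scores  = [55, 60, 68, 72, 75, 80, 85, 90]
    dev_ratings = [2.1, 2.4, 3.0, 3.2, 3.1, 3.8, 4.0, 4.4]

    slope, intercept = linear_regression(dev_scores, dev_ratings)

    # Hold-out sample: apply the same equation to different people.
    holdout_scores  = [58, 65, 70, 78, 88]
    holdout_ratings = [2.3, 2.8, 3.1, 3.5, 4.2]

    predicted = [slope * s + intercept for s in holdout_scores]
    r = correlation(predicted, holdout_ratings)
    print(f"hold-out validity r = {r:.2f}")
    # If r holds up reasonably well here, the concurrent validity
    # evidence from the original sample is considerably stronger.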
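In the same spirit, here is a sketch of a convergent/discriminant check for the vocabulary example; again, all scores are invented, and old_vocab stands in for any established vocabulary test.

    # Convergent vs. discriminant evidence for a new vocabulary test.
    # All scores below are hypothetical.
    from statistics import correlation  # Python 3.10+

    new_vocab = [60, 72, 81, 55, 90, 68, 77, 84]   # the new test
    old_vocab = [62, 70, 85, 58, 88, 65, 80, 82]   # established vocab test
    math_test = [70, 66, 74, 58, 72, 80, 60, 68]   # unrelated construct

    convergent   = correlation(new_vocab, old_vocab)
    discriminant = correlation(new_vocab, math_test)
    print(f"convergent r   = {convergent:.2f}  (should be high)")
    print(f"discriminant r = {discriminant:.2f}  (should be much lower)")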

Quiz Questions: