Direct Answer

Psychometric validity is the degree to which a measurement tool actually measures what it claims to measure — and whether the conclusions drawn from its scores are supported by evidence. In hiring, it answers the most fundamental question about any assessment: "Does this tool give us information we can trust for the decisions we need to make?"

Validity is not a single number. It is a body of evidence, gathered from multiple sources, that supports (or fails to support) specific interpretations of scores.

Why It Matters

Every hiring tool rests on a claim. A cognitive ability test claims to measure mental aptitude. A personality questionnaire claims to measure behavioral tendencies. A reference check claims to capture how someone actually performs at work. Psychometric validity is how you evaluate whether those claims hold up under scrutiny.

Without validity evidence, you are guessing. A tool might produce consistent scores (that is reliability), but those scores could be measuring something completely irrelevant to job performance. A tool can be perfectly reliable and completely invalid — consistently measuring the wrong thing.
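The reliability-versus-validity distinction can be made concrete with a small simulation. This is a hypothetical sketch (all data are simulated, not from any real assessment): a test whose scores are driven by a stable but job-irrelevant trait will agree with itself across administrations, yet show essentially no correlation with performance.

```python
import random

random.seed(0)

# Simulate a stable trait that is UNRELATED to job performance.
n = 200
irrelevant_trait = [random.gauss(0, 1) for _ in range(n)]
performance = [random.gauss(0, 1) for _ in range(n)]  # independent of the trait

# Two administrations of the same test: mostly the stable trait, plus small noise.
time1 = [t + random.gauss(0, 0.2) for t in irrelevant_trait]
time2 = [t + random.gauss(0, 0.2) for t in irrelevant_trait]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"test-retest reliability: {pearson(time1, time2):.2f}")    # high
print(f"validity vs performance: {pearson(time1, performance):.2f}")  # near zero
```

The test-retest correlation comes out high while the correlation with performance hovers near zero: consistent measurement of the wrong thing.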

The Science Behind It

Modern psychometrics treats validity as a unified concept rather than separate "types." This framework, developed through the foundational work of Cronbach and Meehl (1955) and refined by Messick (1995), holds that validity is not a property of the test itself but of the inferences drawn from its scores (Van Iddekinge et al., 2023).

In practice, evidence for validity comes from several complementary sources:

Content validity examines whether the tool's content adequately represents the domain it is supposed to measure. For a reference check, this means asking: do the questions cover the behaviors and competencies that are actually relevant to the job?

Criterion-related validity examines whether the tool's scores predict meaningful outcomes. This is typically assessed by correlating scores with measures of job performance, turnover, or other criteria of interest. As discussed in our glossary entry on criterion-related validity, structured reference checks show correlations of approximately r = .35 with supervisory performance ratings (Hedricks et al., 2013).

Construct validity examines whether the tool measures the psychological construct it is supposed to measure, and not something else. This involves demonstrating that scores correlate with things they should correlate with (convergent validity) and do not correlate with things they should not (discriminant validity).

Binning and Barrett (1989) unified these three lines of evidence into a single framework, arguing that all validation is fundamentally about gathering empirical and judgmental evidence to support inferences linking psychological constructs to operational measures. This unified view means that content, criterion-related, and construct evidence are not competing approaches — they are complementary pieces of the same puzzle.

Common Misconceptions

The most common misconception is that a tool is either "valid" or "not valid" — as if validity were a binary property. In reality, validity is always a matter of degree, always specific to a particular use, and always contingent on the population and context in which the tool is applied. A tool that is valid for predicting performance in one job may not be valid for another. A tool validated in one country may require revalidation in a different cultural context.

Another misconception is confusing face validity — whether a tool looks like it measures something useful — with actual psychometric validity. Many tools with high face validity have low predictive power, and vice versa.

How This Connects to Better Hiring

Psychometric validity is the standard that separates evidence-based hiring from guesswork. Any organization using assessments, references, or interviews in their selection process should be asking: what is the validity evidence for this tool, with this population, for this purpose? Tools backed by strong validity evidence improve hiring outcomes. Tools without it are, at best, expensive noise.