Direct Answer

Rater training teaches the people who evaluate others — managers conducting performance reviews, interviewers assessing candidates, or reference providers completing questionnaires — how to make more accurate and consistent ratings. The most effective form, called frame-of-reference (FOR) training, gives all raters a shared understanding of what different levels of performance look like on each dimension being rated.

Why It Matters

Even well-designed assessment tools can produce poor data if the people using them rate inconsistently. One manager's idea of "excellent communication" might be another's idea of "adequate." Without a shared frame of reference, the same performance can receive very different ratings depending on who is doing the evaluating — not because raters disagree about what they observed, but because they are applying different internal standards.

Rater training addresses this by aligning everyone's standards before they begin rating. The result is ratings that are more accurate, more consistent across raters, and more useful for making decisions.

The Science Behind It

Frame-of-reference training was originally proposed by Bernardin and Buckley (1981, as cited in Roch et al., 2012) and has become the most widely researched and practiced approach to rater training. Its core components include clearly defined performance dimensions, behavioral examples at each level of performance, practice ratings with feedback, and shared standards that all raters apply consistently.

Roch et al. (2012) conducted an updated meta-analysis — with over four times as many studies as the original Woehr and Huffcutt (1994) review — and confirmed that FOR training is an effective method of improving rating accuracy. They found that FOR training was particularly effective at improving differential accuracy (the ability to distinguish between ratees' actual strengths and weaknesses) and behavioral accuracy (correctly identifying which specific behaviors occurred).

Tsai et al. (2019) refined the method by identifying a limitation in traditional FOR training: the practice-then-feedback procedure can unintentionally create an anchoring effect, where raters fixate on their initial judgments and fail to adjust them sufficiently. Their restructured FOR training, which presents evaluation standards before practice trials, produced accuracy improvements at least twice as large as typical FOR training across five studies with 1,143 participants.

Beyond accuracy, rater training has organizational value. Gorman et al. (2017) found that 61% of the 101 organizations they surveyed used behavior-based rater training approaches, and that companies using behavior-based rater training generated higher revenue than those providing rater error training or no training at all.

The principle extends beyond performance appraisal. Roch et al. (2012) documented that FOR training is directly applicable to assessment centers, employment interviews, job analysis, competency modeling, and selection test scoring — essentially any context where human judgment is used to evaluate people.

Common Misconceptions

An older approach, rater error training (RET), taught raters to avoid common mistakes like leniency bias and halo effects. While well-intentioned, RET was found to reduce visible rating errors without improving accuracy: raters simply learned to spread their ratings out so they looked less biased, not to distinguish performance levels any better. FOR training replaced this approach because it focuses on giving raters the right standards rather than just telling them what mistakes to avoid.

How This Connects to Better Hiring

In reference checking, the "raters" are the reference providers — former managers and colleagues evaluating a candidate's past performance. The principles of rater training apply directly: when reference providers are given clear behavioral definitions, specific rating anchors, and standardized scales, their ratings become more accurate and more consistent. This is why structured reference checks outperform unstructured ones — the structure itself functions as a form of embedded rater training, providing the frame of reference that produces reliable data.
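To make "specific rating anchors and standardized scales" concrete, here is a minimal sketch in Python of how a single anchored reference-check item might be represented. The dimension name, anchor wording, and scale points are hypothetical illustrations, not drawn from any particular instrument or from the studies cited above.

```python
# Hypothetical example of a behaviorally anchored reference-check item.
# All names, wording, and scale values below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class AnchoredItem:
    """One rating dimension with a behavioral anchor defining each scale point."""
    dimension: str
    anchors: dict[int, str]  # scale point -> observable behavior that defines it

    def rate(self, score: int) -> dict:
        """Validate a reference provider's rating against the defined scale."""
        if score not in self.anchors:
            raise ValueError(f"Score must be one of {sorted(self.anchors)}")
        return {
            "dimension": self.dimension,
            "score": score,
            "anchor": self.anchors[score],
        }


# Every reference provider sees the same written definition of each level,
# which is what supplies the shared frame of reference.
communication = AnchoredItem(
    dimension="Written communication",
    anchors={
        1: "Updates were often missing or unclear; colleagues had to ask for clarification.",
        3: "Updates were usually clear and on time, with occasional follow-up needed.",
        5: "Updates were consistently clear, concise, and anticipated readers' questions.",
    },
)

print(communication.rate(3))
```

Because every reference provider rates against the same written anchors, the instrument itself carries the frame of reference that formal rater training would otherwise have to build.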