Why Behavioral Questions

What Rating Scales Actually Measure

A Likert-type rating scale asks a respondent to position themselves on a numerical continuum -- typically from "strongly disagree" to "strongly agree," or from "not at all effective" to "extremely effective." The approach is simple to administer, easy to score, and generates data that is straightforward to analyze statistically. It also has a significant and largely unaddressed problem: the numbers mean different things to different people.

When two respondents both select "4 out of 5" on an accountability question, there is no basis for assuming they are describing similar behavior. One may be anchoring to their best day, the other to their most typical week. One may be comparing themselves to peers they consider weak, the other to an aspirational standard they have never met. The scale captures a subjective sense of relative position; it does not capture a description of what actually happens (Krosnick & Fabrigar, 1997).¹

This matters because the purpose of a diagnostic instrument is not to capture how good someone feels about a dimension of their performance -- it is to identify what is actually happening and whether it is producing good outcomes. A measure of self-satisfaction is not a measure of behavior.

What Behavioral Descriptions Require Instead

Every response option in an Evans Learning Labs assessment is a behavioral description written at a specific and distinct level of effectiveness. Rather than rating yourself on accountability from one to five, you read five descriptions of how accountability actually operates -- at the lowest level, at the highest, and at three meaningfully different points in between -- and select the one that most accurately describes your consistent pattern.

Conventional rating scale

How effectively do you hold your team accountable for commitments?

1 — Not at all effectively

2 — Slightly effectively

3 — Moderately effectively

4 — Very effectively

5 — Extremely effectively

Measures self-satisfaction. A "3" selected here carries no information about what accountability actually looks like on this team.

Behavioral description

Which of the following best describes how accountability operates on your team?

Commitments are rarely tracked. When things are not delivered, the pattern is explained or absorbed rather than addressed.

Accountability exists in principle but is inconsistently applied. High-profile commitments are followed up; others are allowed to drift.

Commitments are generally tracked and followed up. When gaps occur, they are addressed directly, though not always consistently across the team.

Commitments are clear, tracked, and consistently followed up. Gaps are addressed promptly and without blame.

Accountability is a cultural norm. The team holds itself accountable without managerial prompting. Missed commitments are the exception and are addressed immediately.

Requires identifying an actual behavioral pattern. Much harder to answer aspirationally than a numerical scale.

The behavioral description approach draws on the logic of behaviorally anchored rating scales (BARS), which were developed specifically to reduce the subjectivity and evaluative inconsistency of numerical rating scales by replacing abstract numbers with concrete behavioral examples (Smith & Kendall, 1963).² Behavioral anchoring has since been studied extensively as a mechanism for improving rating accuracy and reducing leniency bias (Landy & Farr, 1980).³

Five Levels, Meaningfully Distinct

Each domain in every Evans Learning Labs tool is evaluated across five behavioral levels, calibrated so that the difference between adjacent levels is practically significant -- not merely a gradient of the same theme. A score of 2.0 in a domain describes a genuinely different state than a score of 3.0, and the behavioral descriptions reflect that.

This calibration is what gives domain scores interpretive weight. When a leader scores 2.1 in self-awareness and 4.3 in execution discipline, that pattern carries real meaning: genuine capability in one area coexisting with a significant developmental gap in another. A conventional rating scale cannot support this kind of differentiated reading, because its reference points are subjective and unstandardized.
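To make the idea of a domain profile concrete, here is a minimal sketch of how scores such as 2.1 and 4.3 could arise. It assumes, purely for illustration, that each item in a domain records the behavioral level (1 to 5) the respondent selected and that a domain score is the mean of those levels. The actual Evans Learning Labs scoring method is not described here, and the names `domain_scores`, `widest_gap`, and the sample responses are hypothetical.

```python
from statistics import mean

# Hypothetical responses: the behavioral level (1-5) selected for each item,
# grouped by domain. The item counts and values are invented for illustration.
responses = {
    "self_awareness": [2, 2, 2, 2, 2, 2, 2, 2, 2, 3],
    "execution_discipline": [4, 4, 4, 4, 4, 4, 4, 5, 5, 5],
}


def domain_scores(responses: dict[str, list[int]]) -> dict[str, float]:
    """Average the selected levels per domain (an assumed scoring rule)."""
    scores = {}
    for domain, levels in responses.items():
        if not levels:
            raise ValueError(f"no responses recorded for {domain!r}")
        if any(level not in (1, 2, 3, 4, 5) for level in levels):
            raise ValueError(f"levels for {domain!r} must be integers 1-5")
        scores[domain] = round(mean(levels), 1)
    return scores


def widest_gap(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return the lowest domain, the highest domain, and the gap between them."""
    lowest = min(scores, key=scores.get)
    highest = max(scores, key=scores.get)
    return lowest, highest, round(scores[highest] - scores[lowest], 1)


scores = domain_scores(responses)
print(scores)
print(widest_gap(scores))
```

Because every input is a selected behavioral description rather than a free-floating number, the resulting averages stay tied to the level definitions, which is what allows a gap between two domains to be read as a difference in behavior.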

The behavioral description format also makes aspirational responding harder. It is easy to circle "5 out of 5" on a rating scale without confronting what that actually means. It is much harder to read a description of the highest-performing accountability culture and honestly conclude that it describes your team. The instrument is designed to make honest answering easier and self-serving answering more cognitively costly.

What This Produces

The output of a behavioral-description assessment is a profile with genuine diagnostic specificity. Because every score maps back to a behavioral level, the recommendations associated with a given score are not generic guidance derived from the abstract label -- they are specific to the behavioral pattern that produced that score.

A leader who scores at level 2 in psychological safety is operating in a context where honesty carries real risk. A leader at level 4 has established the conditions but has not yet reached the point where the team self-manages its psychological safety norms. Those two leaders need different development conversations, and behavioral descriptions are what make the distinction visible.
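The mapping from a score back to a behavioral level can be sketched as a simple lookup. This is an illustrative assumption rather than the published method: the sketch rounds a domain score to the nearest level (halves round up) and returns the matching accountability description quoted earlier in this article. The function names `level_for_score` and `describe` are hypothetical.

```python
import math

# The five accountability descriptions from the example question above,
# keyed by behavioral level.
ACCOUNTABILITY_LEVELS = {
    1: "Commitments are rarely tracked. When things are not delivered, the "
       "pattern is explained or absorbed rather than addressed.",
    2: "Accountability exists in principle but is inconsistently applied. "
       "High-profile commitments are followed up; others are allowed to drift.",
    3: "Commitments are generally tracked and followed up. When gaps occur, "
       "they are addressed directly, though not always consistently across "
       "the team.",
    4: "Commitments are clear, tracked, and consistently followed up. Gaps "
       "are addressed promptly and without blame.",
    5: "Accountability is a cultural norm. The team holds itself accountable "
       "without managerial prompting. Missed commitments are the exception "
       "and are addressed immediately.",
}


def level_for_score(score: float) -> int:
    """Map a 1.0-5.0 domain score to the nearest level (assumed rule)."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1.0 and 5.0")
    # floor(score + 0.5) rounds halves up, unlike Python's built-in round().
    return min(5, max(1, math.floor(score + 0.5)))


def describe(score: float) -> str:
    """Return the behavioral description that a score maps back to."""
    return ACCOUNTABILITY_LEVELS[level_for_score(score)]


print(level_for_score(2.1), describe(2.1))
print(level_for_score(4.3), describe(4.3))
```

Under this assumed rule, any recommendation attached to a score can be written against the behavioral pattern itself and not against an abstract number.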

See the difference in practice

The free assessment uses the same behavioral description format as every tool in the toolkit.


¹ Krosnick, J. A., & Fabrigar, L. R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz, & D. Trewin (Eds.), Survey measurement and process quality (pp. 141–164). John Wiley & Sons.

² Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149–155. https://doi.org/10.1037/h0047060

³ Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107. https://doi.org/10.1037/0033-2909.87.1.72
