Most assessments ask you to rate yourself on a scale of one to five. Evans Learning Labs does not. Every response option is a behavioral description -- and that distinction is not cosmetic.
A Likert-type rating scale asks a respondent to position themselves on a numerical continuum -- typically from "strongly disagree" to "strongly agree," or from "not at all effective" to "extremely effective." The approach is simple to administer, easy to score, and generates data that is straightforward to analyze statistically. It also has a significant and largely unaddressed problem: the numbers mean different things to different people.
When two respondents both select "4 out of 5" on an accountability question, there is no basis for assuming they are describing similar behavior. One may be anchoring to their best day. The other to their most typical week. One may be comparing themselves to peers they consider weak. The other to an aspirational standard they have never met. The scale captures a subjective sense of relative position; it does not capture a description of what actually happens (Krosnick & Fabrigar, 1997).¹
This matters because the purpose of a diagnostic instrument is not to capture how good someone feels about a dimension of their performance -- it is to identify what is actually happening and whether it is producing good outcomes. A measure of self-satisfaction is not a measure of behavior.
Every response option in an Evans Learning Labs assessment is a behavioral description written at a specific and distinct level of effectiveness. Rather than rating yourself on accountability from one to five, you read five descriptions of how accountability actually operates -- at the lowest level, at the highest, and at three meaningfully different points in between -- and select the one that most accurately describes your consistent pattern.
The behavioral description approach draws on the logic of behaviorally anchored rating scales (BARS), which were developed specifically to reduce the subjectivity and evaluative inconsistency of numerical rating scales by replacing abstract numbers with concrete behavioral examples (Smith & Kendall, 1963).² The evidence base for behavioral anchoring as a mechanism for improving rating accuracy and reducing leniency bias is well established (Landy & Farr, 1980).³
Each domain in every Evans Learning Labs tool is evaluated across five behavioral levels, calibrated so that the difference between adjacent levels is practically significant -- not merely a gradient of the same theme. A score of 2.0 in a domain describes a genuinely different state than a score of 3.0, and the behavioral descriptions reflect that.
This calibration is what gives domain scores interpretive weight. When a leader scores 2.1 in self-awareness and 4.3 in execution discipline, that pattern carries real meaning: genuine capability in one area coexisting with a significant developmental gap in another. A rating scale can produce different numbers across domains, but it cannot support this kind of differentiated profile, because its reference points are subjective and unstandardized.
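To make the idea of a domain profile concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a domain score is the mean of the behavioral levels (1 to 5) a respondent selected across that domain's items. The actual Evans Learning Labs scoring method is not described in this article, and the item counts, domain keys, and function names below are hypothetical.

```python
from statistics import mean

# Hypothetical data: each number is the behavioral level (1-5) of the
# description the respondent selected for one item in that domain.
responses = {
    "self_awareness": [2, 2, 2, 2, 2, 2, 2, 2, 2, 3],
    "execution_discipline": [4, 4, 4, 4, 4, 4, 4, 5, 5, 5],
}


def domain_scores(responses):
    """Average the selected levels per domain (an assumed scoring rule)."""
    scores = {}
    for domain, levels in responses.items():
        if not levels:
            raise ValueError(f"No responses recorded for {domain}")
        if not all(1 <= level <= 5 for level in levels):
            raise ValueError(f"Levels for {domain} must be between 1 and 5")
        scores[domain] = round(mean(levels), 1)
    return scores


def nearest_level(score):
    """Map a domain score back to the closest of the five behavioral levels."""
    return min(5, max(1, int(score + 0.5)))


profile = domain_scores(responses)
for domain, score in profile.items():
    print(f"{domain}: score {score}, closest behavioral level {nearest_level(score)}")
```

Under these assumptions, the sample data reproduces the pattern described above: 2.1 in self-awareness and 4.3 in execution discipline. Because each score traces back to a written behavioral level rather than to a bare number, the gap between the two domains can be read as a difference in described behavior.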
The behavioral description format also makes aspirational responding harder. It is easy to circle "5 out of 5" on a rating scale without confronting what that actually means. It is much harder to read a description of the highest-performing accountability culture and honestly conclude that it describes your team. The instrument is designed to make honest answering easier and self-serving answering more cognitively costly.
The output of a behavioral-description assessment is a profile with genuine diagnostic specificity. Because every score maps back to a behavioral level, the recommendations associated with a given score are not generic guidance derived from the abstract label -- they are specific to the behavioral pattern that produced that score.
A leader who scores at level 2 in psychological safety is operating in a context where honesty carries real risk. A leader at level 4 has established the conditions but has not yet reached the point where the team self-manages its psychological safety norms. Those two leaders need different development conversations, and behavioral descriptions are what make the distinction visible.
The free assessment uses the same behavioral description format as every tool in the toolkit.
Take the free assessment
1 Krosnick, J. A., & Fabrigar, L. R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz, & D. Trewin (Eds.), Survey measurement and process quality (pp. 141–164). John Wiley & Sons.
2 Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149–155. https://doi.org/10.1037/h0047060
3 Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107. https://doi.org/10.1037/0033-2909.87.1.72