Evans Learning Labs
Research & Methodology

Item Banking and Retake Integrity

When an assessment is designed for repeated use over time, the answer choices themselves become a source of measurement error. Item banking is the mechanism that keeps retake scores honest.

The Retake Problem

An assessment administered once faces a familiar set of measurement challenges: response bias, aspirational self-reporting, question interpretation. An assessment designed for repeated use over time faces all of those, plus one more that is specific to repetition: respondents remember the answers.

This is not a minor problem. When a respondent completes an assessment and then retakes it six months later, they have access to something no first-time respondent has: memory of which options produced which scores the last time. If the same answer choices appear in the same form, a motivated respondent can select the highest-scoring options from memory, independently of whether their behavior has actually changed. The score improves. Nothing else has.

In a platform designed around longitudinal tracking, where the explicit purpose is to measure behavioral change over time, this would be a fundamental failure. A score improvement that reflects memorization rather than development gives individuals false evidence of growth and gives organizations false evidence of program effectiveness. It is worse than no measurement, because it provides a misleading signal with the appearance of rigor.

Item banking is the structural response to this problem.

What an Item Bank Is

An item bank is a collection of multiple, interchangeable phrasings for the same behavioral anchor. For every question in every assessment, each of the five point values has a pool of answer phrasings associated with it. All phrasings in the pool for a given point value describe the same level of behavior and produce the same score. They differ only in how that behavioral level is expressed in language.

Each time a tool is opened, the system randomly selects one phrasing per point value per question from the available pool. The respondent sees five choices, one for each level, but the specific words used to describe each level vary from session to session. A respondent who remembers that a particular sentence described the highest behavioral level on their last attempt will not see that sentence again. They will see a different description of the same level, requiring them to evaluate it on its content.
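The per-session draw can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the platform's implementation. The bank layout, the `draw_session` function, and the phrasing IDs are hypothetical. The source says only that one phrasing per level is chosen at random from the "available pool". The sketch reads that as excluding the phrasing shown in the previous session whenever the pool has an alternative, and that reading is an assumption.

```python
import random
from typing import Optional

# Hypothetical layout (an assumption, not ELL's actual schema):
# question ID -> point value (1-5) -> pool of interchangeable phrasing IDs.
ItemBank = dict[str, dict[int, list[str]]]
# question ID -> point value -> the phrasing ID shown in one session.
Session = dict[str, dict[int, str]]


def draw_session(bank: ItemBank, rng: random.Random,
                 previous: Optional[Session] = None) -> Session:
    """Draw one phrasing per point value for every question in the bank."""
    session: Session = {}
    for question_id, pools in bank.items():
        shown: dict[int, str] = {}
        for points, pool in sorted(pools.items()):
            last = (previous or {}).get(question_id, {}).get(points)
            # Assumed reading of "available pool": skip last session's phrasing
            # when the pool has an alternative; otherwise fall back to the full pool.
            candidates = [p for p in pool if p != last] or pool
            shown[points] = rng.choice(candidates)
        session[question_id] = shown
    return session


# "3A".."3C" and "4A".."4B" stand for Versions A-C at score levels 3 and 4
# in the example item bank; the level 1, 2, and 5 pools are placeholders.
bank: ItemBank = {
    "strengths_fit": {
        1: ["1A", "1B"],
        2: ["2A", "2B"],
        3: ["3A", "3B", "3C"],
        4: ["4A", "4B"],
        5: ["5A", "5B"],
    }
}

rng = random.Random()
first = draw_session(bank, rng)
retake = draw_session(bank, rng, previous=first)
print(first["strengths_fit"])
print(retake["strengths_fit"])
```

If each draw were fully independent instead, a phrasing from a small pool could reappear on a retake. The exclusion rule is one simple way to guarantee a change whenever the pool holds at least two phrasings.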

Example: item bank for one question at score levels 3 and 4. All options at the same level produce the same score.

Score 3
Version A: My work draws on my strengths meaningfully, though a significant portion still sits outside them.
Version B: I engage my core capabilities regularly, but spend more time than I should on work that others could do as well or better.
Version C: There is reasonable overlap between what I do and what I do best, though the match is not as complete as I would like.

Score 4
Version A: My current work regularly engages my strongest capabilities and I feel effective most of the time.
Version B: The work I spend most of my time on plays to my genuine strengths, with limited time required in areas where I am working against the grain.

Every option in the score-3 pool produces a score of 3, and every option in the score-4 pool produces a score of 4. What changes between sessions is which phrasing represents each level, making memorization of specific text useless as a strategy for improving scores.
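Here is a minimal sketch of scoring that is independent of phrasing. It assumes each phrasing carries a hypothetical ID ("3A" for score-3 Version A in the example, and so on) and that a selection's score is looked up from the pool its ID belongs to. `build_score_index` is an illustrative name. Rejecting an ID that appears in two pools is a design choice of the sketch, not something the source describes.

```python
def build_score_index(pools: dict[int, list[str]]) -> dict[str, int]:
    """Map each phrasing ID in one question's bank to its point value.

    Raises ValueError if an ID sits in two pools, since its score would be ambiguous.
    """
    index: dict[str, int] = {}
    for points, pool in pools.items():
        for phrasing_id in pool:
            if index.get(phrasing_id, points) != points:
                raise ValueError(
                    f"{phrasing_id!r} is in both the {index[phrasing_id]} and {points} pools"
                )
            index[phrasing_id] = points
    return index


# Hypothetical IDs for the example: "3A" = score-3 Version A, and so on.
score_of = build_score_index({3: ["3A", "3B", "3C"], 4: ["4A", "4B"]})
print([score_of[p] for p in ["3A", "3B", "3C"]])  # [3, 3, 3]
print([score_of[p] for p in ["4A", "4B"]])        # [4, 4]
```

Scoring reads only the level a phrasing belongs to, so the particular version drawn in a session cannot affect the result.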

Why This Matters for Longitudinal Tracking

The core proposition of the Evans Learning Labs platform is that assessments should be retaken over time, and that score changes over time should be interpretable as evidence of behavioral change. Item banking is what makes that proposition credible.

Without item banking, longitudinal score comparisons are contaminated by a confound that the platform cannot separate from genuine change: the respondent's familiarity with the specific answer choices. A score that increases from 2.8 to 3.6 between an initial assessment and a six-month retake might reflect real behavioral development. Or it might reflect that the respondent remembered which choices produced higher scores and selected them more strategically the second time. Item banking removes the second explanation.

In educational and psychological measurement, the problem of practice effects (improved performance on retesting due to familiarity rather than genuine change) is well documented (Lievens et al., 2007).¹ Item banking is the standard mechanism for managing this effect in high-stakes longitudinal assessment contexts (van der Linden & Hambleton, 1997).² The application here adapts the same logic to behavioral self-report instruments.

This matters most in the specific contexts where ELL tools are most often used. When an individual retakes a tool after a coaching engagement, the retake score is used as evidence of whether the engagement produced change. When an organization tracks aggregate scores across a leadership cohort before and after a development program, those aggregate comparisons are used to evaluate program effectiveness. In both cases, contamination of the score by answer memorization would produce systematically misleading evidence.

Three Properties the Bank Preserves

1. Score equivalence across phrasings. All phrasings within a point-value pool for a given question are calibrated to describe the same behavioral level. A respondent who genuinely operates at the level described by score-4 options will recognize themselves in any score-4 phrasing, regardless of which specific wording they encounter. The recognition is behavioral, not lexical.
2. Comparability of scores across sessions. Because all phrasings for a given point value produce the same score, a score of 3.4 on an initial assessment and a score of 3.4 on a retake mean the same thing, regardless of which specific phrasings appeared in each session. The longitudinal comparison retains its interpretive validity.
3. Resistance to strategic score inflation. A respondent who wants to appear to have improved cannot achieve this by memorizing specific answer text from a prior session, because the text will be different. They can only achieve a higher score by recognizing themselves in higher-scoring behavioral descriptions, which requires that their behavior has actually moved.
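The second property can be illustrated with a short sketch. The source does not say how a composite score such as 3.4 is formed. This sketch assumes an unweighted mean of per-question point values across five hypothetical questions. `composite_score` and the phrasing IDs are illustrative names only.

```python
def composite_score(responses: list[tuple[str, int]]) -> float:
    """Unweighted mean of the point values a respondent selected.

    Each response is (phrasing ID shown, point value of that phrasing's pool);
    the phrasing ID is deliberately ignored when scoring.
    """
    if not responses:
        raise ValueError("no responses to score")
    return sum(points for _, points in responses) / len(responses)


# Five hypothetical questions; IDs name the version drawn from each question's own pool.
initial = [("3A", 3), ("4B", 4), ("3C", 3), ("4A", 4), ("3B", 3)]
retake = [("3C", 3), ("4A", 4), ("3A", 3), ("4B", 4), ("3A", 3)]

print(composite_score(initial))  # 3.4
print(composite_score(retake))   # 3.4
```

The two sessions show different phrasings on every question, but the respondent selects the same levels, so the composites are identical.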

How the Bank Grows

The bank is not static. Additional phrasings can be added to any question's pool at any time without changing the scoring, the structure of the tool, or the interpretability of historical scores.

The bank will be expanded based on what is learned about how different groups interpret and relate to different formulations of the same behavioral level. Phrasings that are consistently misunderstood or that produce unexpected response patterns can be replaced or supplemented. New phrasings that better capture certain behavioral nuances can be added.

This is one of the structural advantages of separating answer content from assessment structure. The questions, the scoring logic, and the interpretive framework can remain stable while the specific language used to present each behavioral level continues to be refined. Respondents benefit from an instrument that improves over time without losing comparability with their own historical results.
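The following is a minimal sketch of how a pool might grow without disturbing historical scores. It assumes stored responses reference hypothetical phrasing IDs. It also assumes that a phrasing being "replaced" is retired from new sessions but kept in the scoring index, so older responses that used it remain scorable. `Pool` and `score_index` are illustrative names, and the retire-but-keep rule is a design choice of the sketch rather than something the source specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical per-level pool: active phrasings are drawn for new sessions;
    retired phrasings are no longer shown but stay scorable for history."""
    points: int
    active: set[str] = field(default_factory=set)
    retired: set[str] = field(default_factory=set)

    def add(self, phrasing_id: str) -> None:
        self.active.add(phrasing_id)

    def retire(self, phrasing_id: str) -> None:
        # Stop showing it to new respondents, but keep it in the scoring index.
        self.active.discard(phrasing_id)
        self.retired.add(phrasing_id)


def score_index(pools: list[Pool]) -> dict[str, int]:
    """Every phrasing ever used, active or retired, maps to its level's point value."""
    return {pid: pool.points for pool in pools for pid in pool.active | pool.retired}


level3 = Pool(3, {"3A", "3B", "3C"})
level4 = Pool(4, {"4A", "4B"})
history = ["3B", "4A"]  # phrasing IDs selected in an earlier session (hypothetical)

index_before = score_index([level3, level4])
before = [index_before[pid] for pid in history]

level4.add("4C")     # supplement a pool
level3.retire("3B")  # replace a phrasing that produced unexpected response patterns
level3.add("3D")

index_after = score_index([level3, level4])
after = [index_after[pid] for pid in history]
print(before, after)  # [3, 4] [3, 4]
```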

See the methodology in practice

The free Organizational Performance Assessment demonstrates item banking, answer randomization, and the full ELL methodology with no purchase required.


1 Lievens, F., Reeve, C. L., & Heggestad, E. D. (2007). An examination of psychometric bias due to retesting on cognitive ability tests in selection settings. Journal of Applied Psychology, 92(6), 1672–1682. https://doi.org/10.1037/0021-9010.92.6.1672

2 van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of Modern Item Response Theory. Springer. https://doi.org/10.1007/978-1-4757-2691-6

Terms of Use and Disclaimer

Informational and Educational Use Only

The diagnostic tools, assessments, profiles, and indexes offered by Evans Learning Labs are designed for informational and educational purposes only. Results do not constitute professional consulting advice, legal advice, psychological assessment, clinical evaluation, or any form of certified professional guidance.

Self-Reported Results

All results are based entirely on the responses provided by the individual completing the assessment. Evans Learning Labs makes no representation that scores or profiles accurately reflect objective organizational conditions or any other measurable external reality.

No Guarantee of Outcomes

Evans Learning Labs does not guarantee that use of these tools will produce any specific organizational, leadership, or performance outcome. Recommendations are general in nature and may not be appropriate for every individual, team, or organizational context.

Limitation of Liability

To the fullest extent permitted by applicable law, Evans Learning Labs, its principals, employees, and affiliates shall not be liable for any direct, indirect, incidental, consequential, or punitive damages arising from the use of or reliance on these tools or their results.

Governing Law

These terms are governed by the laws of the United States and Commonwealth of Kentucky.