Interview guide for hiring managers: a structured playbook

An end-to-end playbook for hiring managers: prep, opening, behavioural questions, scorecard, debrief, and decision. Evidence-led and runnable.

A dark wooden desk with a green leather notebook, fountain pen, glasses, and a paper sheet showing a blank scoring grid.

Why a real interview guide for hiring managers matters

The first thing most hiring managers say after an interview is some version of "I think that went well." The second thing, six months later, is some version of "I'm not sure what happened with that hire." The two sentences are usually about the same person, and the gap between them is what an interview guide for hiring managers is supposed to close.

At five people, gut hiring works, badly. At fifty it produces a quiet mess: uneven panels, decisions no one can defend a year later, and a steady drip of mediocre offers extended to candidates who interviewed well. The cost is hard to see one hire at a time and impossible to ignore in aggregate.

The evidence is not subtle. Dana, Dawes and Peterson (2012) ran the now-famous studies in which interviewees secretly answered questions at random; interviewers came out just as confident in their impressions as they were of the candidates who had answered honestly. Research spanning a century is unkind to the unstructured chat, and Bohnet (2016) revisits the University of Texas Medical School natural experiment, where the fifty applicants who were initially rejected after unstructured interviews went on to perform indistinguishably from those who were accepted.

The honest answer to "we know a good one when we see them" is: most of the time, no, you know who interviews well. This article is the playbook a busy hiring manager actually runs - prep, opening, behavioural questions, scorecard, debrief, decision - rather than a list of clever questions you might or might not get round to asking.

Prep: the work that decides the interview before it starts

Most of the interview is decided before anyone sits down. A good interview guide gives the hiring manager four prep jobs to run through, in order, and refuses to let the interview happen until they are done.

First, agree the role. Not the job advert; the actual job. What will this person spend their time on, what does good look like in six months, and which of the people already on the team should they be able to outperform on at least one dimension. Second, pick five to seven capabilities. Campion, Palmer and Campion (1997) identified job analysis as one of the most influential of the fifteen components of interview structure, because every downstream choice - which questions to ask, how to score, what to argue about in the debrief - depends on knowing which capabilities matter. More than seven and the scorecard becomes a wishlist; fewer than five and the panel ends up rating personality.

Third, write the questions. Two or three behavioural questions per capability, the same set for every candidate, and a fixed time budget. The studies Campion and colleagues reviewed reported a mean interview length of 39 minutes with about 16 to 17 questions, which is a useful budget for a small business: long enough to gather evidence, short enough that the panel will actually run it more than once.

Fourth, brief the panel. Who is asking which capability, in what order, and what scoring scale they are working from. The federal hiring manager's three-step playbook insists on this, and any small business that has had two interviewers ask the same question twice already knows why.

Four numbered cards on a leather desk pad showing the prep tasks: a job page, a list, question marks, and a stick-figure panel.
The four prep tasks every hiring manager owns before the interview opens.

Prep is the unglamorous work that makes the rest of the playbook runnable. It is also where most structured hiring fails in practice. The operational burden on hiring managers is the real reason interview guides end up as PDFs no one opens; cut the prep small enough and it gets done.

The opening: setting up the hour without contaminating it

The first three minutes of an interview matter more than they should. Bohnet (2016) summarises a long line of evidence that interviewers form an impression of the candidate well before the candidate has said anything assessable, and then spend the rest of the conversation seeking confirmation of it. The first thirty seconds of a candidate encounter is where anchoring bias does most of its damage, and a hiring manager who treats the opening as warm-up is leaving the day's most important data point to chance.

Three things to do at the top, in order. Brief the candidate on the format, plainly: who they will speak to, how long the interview will run, what kinds of questions they will get. Tell them the panel will be taking notes and that this is normal, not a verdict. Hold their questions to the close - not because you are uninterested, but because letting them set the agenda contaminates the standardisation that makes the rest of the hour comparable to other candidates' hours.

Two things to avoid. The rapport-building chit-chat about weekend plans and the local football team feels human, and it is, but it becomes the unspoken first data point and the easiest one to misread. And "tell me about yourself" is not a question; it is a substitute for one, and a candidate who has been on the circuit can answer it without telling you anything you can score.

Behavioural questions: the workhorse format

A behavioural question asks the candidate for a specific past episode and then probes it. The difference from "have you ever managed a difficult stakeholder?" is the difference between a yes / no and a story you can score. The structured interview is built on this format because it gives every interviewer the same kind of evidence to weigh - what someone actually did, not what they think they would do.

McClelland (1998) is the parent of the format. His Behavioural Event Interview, designed for executive selection, asked candidates to describe critical incidents in detail and then coded the answers against a competency framework. Coding was reliable and predictive of executive success when the interviewer and the coder were blind to outcomes. The contemporary STAR framework that organises a behavioural answer - situation, task, action, result - is the practitioner-friendly descendant of that approach.

Salgado and Moscoso (2002) matter here too. Their construct-validity meta-analysis showed that conventional interviews - the open-ended chat - tend to load on general mental ability and personality, which the rest of the hiring process can already measure. Behavioural interviews, by contrast, capture experience and situational judgement more cleanly, which is the reason they sit in the playbook at all.

A four-quadrant illustration on parchment of the STAR framework: a landscape, a target, a footprint with arrow, and a bar chart with a tick.
Situation, task, action, result - the four prompts that turn a story into evidence.

Good probing is unglamorous and decisive. Who specifically did what. When did it happen, and over how long. Who else was involved, and what did they do. What changed because of the action, and how do you know. The candidate who can answer those four follow-ups for a single episode has given you something to score; the candidate who keeps generalising is telling you, politely, that the episode either didn't happen or didn't involve them as advertised.

Hypothetical or situational questions - "what would you do if..." - have a place as a complement, particularly for capabilities a junior candidate has not had the chance to demonstrate. They are a thinner kind of evidence than a real episode and a structured interview should not lean on them as the main course. As Campion, Palmer and Campion (1997) put it, better question types raise validity through job-relatedness, and a real story beats a thought experiment most days of the week.

The scorecard: a one-page artefact, not a vibe

A scorecard is a one-page list of the capabilities the role is being hired for, a fixed rating scale, behavioural anchors at each level, and a column for the evidence. That is the whole artefact. Anything more elaborate gets ignored at the speed people read; anything less leaves the panel grading personality.

Cap the capability list at five to seven. A scorecard people will actually fill in stays inside that range, because each capability needs to be paired with at least two behavioural questions and a defensible anchor for each rating level. Twelve capabilities means a scorecard no one finishes. Three means a scorecard that does not discriminate between candidates.

Score per answer, not per interview. Bohnet (2016) is direct on this: vivid examples and recent answers crowd out earlier, less colourful ones if the interviewer waits until the end to rate. Scoring as you go forces the interviewer to weigh each answer on its own evidence, before the next one overwrites it. It also produces the artefact the debrief needs, which the next section is about.

A parchment scorecard on a wooden desk with five blank capability rows and four small rating boxes per row, beside a fountain pen.
A scorecard short enough to fill in and structured enough to defend.

Anchored rating scales are what stop interviewers drifting apart. Campion, Palmer and Campion (1997) singled them out as one of the six most influential structure components, because anchors phrased in job-relevant terms - "would lead the project unprompted and pull others with them" beats "very strong" - give every interviewer the same fixed reference points to rate against. Without anchors, one interviewer's three is another's four and the debrief becomes an argument about scales.

A small practical note: a 1 to 4 scale beats a 1 to 5. There is no middle to hide in. An interviewer who has to commit to "above the bar" or "below it" will go and find the evidence that justifies the call; an interviewer with a comfortable three available will leave the call on the candidate. The scorecard is filled out before anyone speaks at the debrief, which is what makes the debrief possible at all.

Tools that make this guide stick

Everything above is well-known, in the sense that the academic literature has been saying it for forty years and most senior HR practitioners can recite the headlines. What is not well-known is how to install it inside a 30-person company without hiring a consultant, buying an enterprise platform, or asking the founder to read Levashina, Hartwell, Morgeson and Campion (2013). The reason the playbook stays on the shelf is almost never disagreement; it is operational. Someone has to do the work, and that someone is usually the hiring manager who already has a day job.

HireSchool is a self-guided digital programme called the Structured Hiring Method, designed to solve exactly that problem. It is delivered as video content and a learning management system, so a small business can install structured hiring across its team without bringing anyone in. The hiring manager works through the modules at their own pace, codifies the company's own capabilities and standards, and brings the rest of the interviewing team along on the same curriculum. Every interviewer ends up running the same playbook, on the same scorecard, at the same standard.

The contents map directly onto what the article has been describing. Leadership values give the company a fixed set of capabilities to anchor every scorecard against, so a "judgement" rating means the same thing in finance interviews and in engineering interviews. Codified scorecards turn the one-page artefact from the previous section into a template the team can use without rewriting it for every role. Behavioural-interviewing training - video plus practice - is what gets every interviewer asking and probing the same way, which is the part that always slips when a structured guide is left to good intentions. Decision-management runs the debrief and the final call, so the failure modes the next section will describe stop being failure modes.

The real reason structured hiring stalls inside companies, in survey after survey, is that hiring managers do not comply with it - around 40 per cent of HR teams report this as the chief barrier. The answer is not more enforcement. The answer is making the practice cheap enough to follow that there is nothing to enforce. A programme the hiring manager can run on their own is one such answer.

HireSchool is not a consultancy, not an applicant tracking system, and not a recruiting agency. The customer's team builds the muscle; HireSchool gives them the kit. If the playbook in this article reads like the kind of thing you would like running quietly inside your business, you can explore the Structured Hiring Method programme and see how the modules are put together.

The debrief: where most hires go wrong

The interview is not where most hires fail. The debrief is. After every interviewer has done their best to score honestly, the team gets in a room and talks itself out of the evidence in about twelve minutes.

Two failure modes do most of the damage. The first is groupthink: whoever speaks first, or loudest, sets the anchor and the rest of the panel calibrates around it. The second is dilution. Kausel, Culbertson and Madrid (2016) showed that interviewers given more unstructured information about a candidate became more confident in their judgements and less accurate; extra detail does not get ignored, it crowds out the scorecard evidence the team is supposed to be weighing.

The fix is mechanical. Every interviewer submits their scorecard before the debrief opens, and no one sees anyone else's scores until they have submitted their own. Independent scoring before any group discussion is the single change that does the most work, because it is what the whole playbook has been pointing at: a scorecard worth defending, filled out from evidence, before the room talks itself into a different answer.

Three flat illustrated cards with arrows: independent scorecards, a round-table discussion, and a final tick mark in a square.
Independent scoring first, then evidence-led debrief, then the decision.

Bohnet (2016) adds a small but useful trick: read horizontally, not vertically. Go through every interviewer's score on capability one before moving to capability two. This forces the panel to notice that the candidate scored a four on judgement and a two on collaboration, instead of folding both into a general "I liked them". Differences between interviewers on the same capability are interesting; differences between capabilities for the same candidate are more interesting. Both get lost in a "did we like them?" debrief.

The debrief style worth retiring is "let's talk it through and see where we land". It sounds collaborative and is, in practice, where the loudest opinion in the room wins.

The decision and the next interview

The decision rule is the part of the playbook a hiring manager should write down before the first candidate walks in, not after. A workable default: a minimum of three on every must-have capability, and an average above an agreed threshold, computed across all interviewers. Anything below the bar is a no, regardless of how the conversation felt.

Campion, Palmer and Campion (1997) include statistical aggregation - summing or averaging the scores - in their list of the most influential structure components, because it does what panels rarely do voluntarily: it weighs every interviewer's evidence equally and then commits to the result. Clinical prediction, which is the politer name for "let's just discuss it and decide", consistently underperforms the same data run through a sum.

The other half of the rule is what to do with the guide afterwards. Every interview is also data on the guide itself. Levashina, Hartwell, Morgeson and Campion (2013) emphasise that structured interviewing is iterative; the questions that stop discriminating between strong and weak candidates get retired, the ones that do get sharpened, and the scorecard gets a quarterly review like any other piece of operational kit. A hiring manager who does this once a quarter ends up with an interview guide that fits their company specifically, rather than a generic one that fits no one.

Structured hiring is a muscle, not a memo. The first interview run by the playbook will feel awkward. The third will feel deliberate. The tenth will feel obvious, and the team will quietly stop being able to imagine running an interview any other way. That is the point at which the costs the article opened with - the offers extended to candidates who interview well, the decisions no one can defend a year later - stop showing up in the numbers.