About This Series
In Blog 3, we looked at how to find and hire exceptional people. In this post, we look at what happens after they join. Most performance management systems are built on a false assumption about how human performance is distributed. That assumption shapes everything from how people are rated to how managers behave, and it quietly destroys the talent density you worked so hard to build.
The assumption nobody questions
Most performance review systems were designed on a belief that felt intuitive at the time: that human performance follows a normal distribution. Plot your people on a bell curve, and most cluster in the middle. A few are exceptional. A few underperform. Reward accordingly.
This belief underpins almost every performance management system built in the last fifty years:
- Forced ranking and stack ranking systems
- The 20-70-10 performance distribution model (top 20%, middle 70%, bottom 10%)
- Compensation bands tied to position in the curve
- Rating quotas that cap how many people can receive top scores
There is one problem: for knowledge workers, it is empirically wrong.
Ernest O'Boyle Jr. and Herman Aguinis published a landmark study in 2012 covering more than 633,000 individuals across 198 samples of researchers, entertainers, politicians, and athletes. They found that 94% of those samples followed a power law (Paretian) distribution, not a normal one. A small number of exceptional contributors produced a disproportionate share of total output. In complex cognitive roles, the gap between the best and the average is not 20%. It is ten times or more.
When you build a performance management system on the wrong assumption, everything downstream is distorted. You rate exceptional people as good. You protect underperformers because the curve requires a middle. And you drive out your best people, who notice that the system was not designed to see what they actually do.
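To make the distributional point concrete, here is a minimal simulation sketch. It is not taken from the O'Boyle and Aguinis study: the mean, standard deviation, and Pareto shape parameter `alpha` are hypothetical values chosen only for illustration. It compares how much of total output the top 10% of people produce in a bell-curve world and in a heavy-tailed world.

```python
import random

def top_share(outputs, top_fraction=0.10):
    """Share of total output produced by the top `top_fraction` of people."""
    ranked = sorted(outputs, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000               # simulated population size (assumption)

# Bell-curve world: output ~ Normal(mean=100, sd=15), floored at zero.
# The mean and sd are illustrative assumptions.
normal_outputs = [max(0.0, rng.gauss(100, 15)) for _ in range(n)]

# Heavy-tailed world: output ~ Pareto with shape alpha.
# alpha = 1.5 is a hypothetical value, not an estimate from the study.
alpha = 1.5
pareto_outputs = [rng.paretovariate(alpha) for _ in range(n)]

normal_share = top_share(normal_outputs)
pareto_share = top_share(pareto_outputs)

print(f"Top 10% share of output, normal world: {normal_share:.1%}")
print(f"Top 10% share of output, Pareto world: {pareto_share:.1%}")
```

Under these assumptions, the top decile in the bell-curve world produces only slightly more than its headcount share, while in the heavy-tailed world it produces several times that. A rating system calibrated for the first world will systematically understate the contribution of top performers in the second.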
- 94% of studied populations follow a power law, not a bell curve (O'Boyle & Aguinis, 2012)
- 58% of executives say their performance management system drives neither engagement nor high performance (Deloitte Global Human Capital Trends)
- 30% drop in voluntary attrition at Adobe after eliminating annual performance reviews in 2012 (Adobe Internal Research)
What forced ranking does to the people you most want to keep
The most vivid case study in the cost of the bell curve assumption is Microsoft between 2001 and 2012, the period journalist Kurt Eichenwald examined in his 2012 Vanity Fair article 'Microsoft's Lost Decade'.
Under CEO Steve Ballmer, Microsoft used stack ranking: every team, regardless of overall quality, had to designate a fixed percentage as poor performers. The documented consequences:
- Top performers deliberately avoided working with strong colleagues who might outrank them
- Information was withheld from peers who were effectively competitors in the ranking
- Significant energy went into managing the manager's perception rather than doing great work
- Collaboration became a liability. Every colleague was a competitor
One former employee told Eichenwald: “If you were on a team of ten people, you walked in knowing that, no matter how good everyone was, two or three people were going to get a terrible review. It wasn't about performance. It was about politics.”
During this period, Microsoft missed the smartphone revolution, the search engine era, the social media wave, and the tablet market. The company that had been the most valuable in the world stagnated for a decade. Stack ranking was widely cited as a significant cultural contributor.
Microsoft vs. Netflix: two very different models
The contrast between Microsoft's historical approach and Netflix's Keeper Test is one of the most instructive comparisons in modern talent management. Both are serious, high-performance organisations. Both held their people to high standards. The difference is in the design of the system underneath.
Stack Ranking vs. The Keeper Test: A Comparison
| Dimension | Microsoft Stack Ranking (Historical) | Netflix Keeper Test |
|---|---|---|
| Philosophy | Forced ranking / Zero-sum competition. A fixed percentage must always be rated poor. | Professional sports team model: absolute, not relative, standards. No quota on excellence. |
| Mechanics | Pre-determined percentage-based grades. A fixed proportion must be rated poor regardless of actual performance. | Continuous reflection: 'Would I fight hard to keep this person?' Evaluated against role requirements, not peers. |
| Effect on top talent | Forces top performers to avoid strong teammates to protect their relative rating. Collaboration becomes a liability. | Retains and groups elite talent. Encourages collaboration because individual success is not threatened by strong peers. |
| Effect on collaboration | Incentivises withholding help and, in some cases, peer sabotage. Every colleague becomes a competitor. | Encourages high collaboration alongside personal accountability. Strong peers are an asset, not a threat. |
| Primary failure mode | Destroys psychological safety. Contributed to a decade of Microsoft stagnation documented in Vanity Fair, 2012. | Can induce chronic anxiety if implemented without transparency, care, and continuous honest dialogue. |
The critical distinction is between relative and absolute standards. Stack ranking evaluates people against each other. The Keeper Test evaluates people against what the role actually requires. In a stack ranking world, being exceptional on a team of exceptional people is dangerous. In a Keeper Test world, it is the whole point.
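The relative versus absolute distinction can be shown with a toy example. This is a sketch under stated assumptions, not a model of either company's actual process: the team, the scores, the 20% quota, and the bar of 80 are all hypothetical, and the real Keeper Test is a managerial judgment rather than a numeric threshold. The point is only to show what a quota does to a team where everyone is strong.

```python
def stack_rank(scores, bottom_fraction=0.2):
    """Relative standard: the lowest-scoring fraction is rated 'poor',
    no matter how high those scores are in absolute terms."""
    n_poor = max(1, round(len(scores) * bottom_fraction))
    ranked = sorted(scores, key=scores.get)  # lowest score first
    poor = set(ranked[:n_poor])
    return {name: ("poor" if name in poor else "ok") for name in scores}

def absolute_bar(scores, bar=80):
    """Absolute standard: each person is compared with the bar for the role,
    not with teammates. There is no quota in either direction."""
    return {name: ("ok" if score >= bar else "poor")
            for name, score in scores.items()}

# A hypothetical team where everyone clears the bar comfortably.
team = {"Ana": 95, "Ben": 93, "Chen": 92, "Dee": 91, "Eli": 90}

relative = stack_rank(team)
absolute = absolute_bar(team)

print("Stack ranking:", relative)
print("Absolute bar: ", absolute)
```

On this team the quota still has to label someone as poor, so the person scoring 90 is rated poor under stack ranking, while the absolute bar rates all five as meeting the standard. That is the mechanism behind the incentive to avoid strong teammates.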
What the Keeper Test actually is and how it works
The Keeper Test is deceptively simple. Netflix asks managers one question about each person on their team: if this person told me they were leaving for a similar role at a competitor, would I fight hard to keep them?
The answer drives two very different sets of actions:
Yes: I would fight hard to keep them
Invest actively
- Expand scope and responsibility
- Increase compensation to reflect market rate
- Give stretch assignments and senior visibility
Outcome: Talent density maintained or raised. The high performer stays, recruits peers, and elevates team standards.
No: I would not fight to keep them
Have the honest conversation
- Agree a transition timeline with dignity
- Provide a generous severance package
- Reopen the role with a higher bar
Outcome: Role reopened with a higher bar. Talent density protected from the slow drift of gradual dilution.
From Netflix's Culture Memo
“Adequate performance gets a generous severance package.” This line is designed to be provocative. What it means in practice is that Netflix is committed to ensuring every seat is held by someone exceptional, and that when an exit happens, it is handled with genuine care for the person leaving.
Making feedback safe to give and receive: the 4As model
The Keeper Test without a feedback culture is management by surprise, which is one of the most destructive things a leader can do to a team. Netflix's answer to this risk is the 4As model: the person giving feedback should aim to assist and make it actionable, and the person receiving it should appreciate it and then accept or discard it. The model trains both sides of every feedback interaction so that continuous evaluation does not collapse into subjective judgment and fear.
[Table: The 4As Feedback Model: Roles, Principles, and What Each Prevents. Source: Netflix Employee Handbook (AirMason); Netflix Culture Memo]
The 4As model matters because it separates the quality of the feedback from the relationship between the people involved. When both sides understand their role, feedback becomes an act of investment rather than an act of judgment.
When performance improvement is real
One of the most commonly misunderstood elements of high-density performance management is what happens when someone is not performing. In most organisations, a Performance Improvement Plan is a paper exercise. Both parties know the outcome is predetermined.
Gorgias reports something different. In their model, approximately 50% of employees placed on a structured improvement plan return to the expected performance standard when:
- The plan is designed with genuine specificity, not generic targets
- Managerial investment is real, with regular coaching conversations
- The timeframe is honest and clearly communicated
- The criteria for success and failure are defined upfront, not revised after the fact
The remaining employees, roughly half, do not recover and are exited, but the attempt was real. The difference is not in the paperwork. It is in the intent.
Gorgias also runs bi-yearly cross-functional performance reviews, where each person's contributions are evaluated by a panel that includes people from outside their direct team. This reduces individual manager bias, surfaces contributions that might be invisible to a single manager, and creates a richer, more honest picture of actual performance.
What comes next
We have now covered what talent density is, what it costs when it is low, how to hire for it, and how to manage and evaluate for it. The last piece is the one that holds all the others together.
In Blog 5, we address the question founders ask us most often: can I maintain genuinely high standards and still have a culture where people feel safe, take risks, and bring their best ideas? Or do I have to choose?
Up Next in This Series
Blog 5: High standards and psychological safety are not opposites
We close the series with the framework that makes everything else sustainable.
