Introduction
The Net Promoter Score (NPS) is widely used, either as a word of mouth indicator or a satisfaction metric. Thousands of corporations and even small businesses use the metric, aided by a legion of survey providers and software vendors. However, the NPS receives a lot of academic criticism. Three big criticisms are:
- NPS scores are touted as ‘the one number you need to grow’. However, industry examples show the growth comes before the high NPS, not afterwards (e.g. Sharp, 2008; Shaw, 2008).
- NPS says negative word of mouth comes from people who give low willingness to recommend scores. But low willingness to recommend is not the same thing as giving negative word of mouth (East, 2008).
- NPS is based on willingness to recommend – not actual recommendation. Ample research shows claimed intentions are fairly poor predictors of real behaviour. In fact most people who give high willingness to recommend scores do not end up recommending (Kumar, Petersen, & Leone, 2007).
These are serious shortcomings. Businesses use the NPS in the belief that sales growth will come from recommendations (or decline if the NPS drops). That’s the key message from Bain and all the consultancy firms that sell NPS – improve your NPS and you will grow, because highly loyal, ‘fan’ clients will recommend you.
The first problem with tracking willingness to recommend is that it gives a hugely inflated impression of the real amount of word of mouth that occurs for a brand. Second, pinning one’s future on consumer recommendations (or more correctly, their willingness to recommend) can lead to under-investment in advertising and the other important factors that we know actually drive brand growth, such as improving Physical Availability – refer to How Brands Grow 1 and 2.
This study identifies another, as-yet undocumented problem, specifically with the method used to derive the NPS. By discarding information and subtracting low scores from high scores, the NPS method induces unwanted, artificial volatility in results. We report a simulation study showing that NPS scores fluctuate around five times more from wave to wave than a simple average of zero-to-ten willingness to recommend scores does. This volatility makes assessing customer service improvements very difficult. It can also be dangerous, because marketers can end up constantly changing tactics, over-reacting to upticks or downswings in the measure.
The NPS method – and the variation problem
NPS scores are obtained by asking respondents their willingness to recommend on a zero to ten scale. Scores of 7 and 8 are discarded, and the proportion of 0-6 scores is then deducted from the proportion of 9 and 10 scores. This process inflates the variation in the results in three ways. First, throwing away the 7s and 8s simply shrinks the sample of responses – fewer responses, more random error in results. Second, the original scores carry information value – a score of 10 is different to a score of 9, and a score of 6 is far better than a score of 1, 2, or 3. But the NPS throws away this fine-grained information by lumping all the high scores into one group, and all the low scores into another. What a waste!
And third, subtracting the proportion of 0-6 scores from the proportion of 9 and 10 scores means that small differences in the original willingness to recommend scores are blown up into much larger differences in the NPS. If you get fewer high scores, your proportion of low scores generally goes up a little as well (all the scores add to 100 percent, after all) – so the NPS 'double counts' these fluctuations.
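To make the mechanics concrete, here is a minimal Python sketch of the calculation, set against the plain average that keeps all of the information. The function names are our own illustrative choices, not part of any NPS toolkit.

```python
def nps(scores):
    """Net Promoter Score from raw 0-10 willingness to recommend scores.

    Promoters score 9 or 10, detractors score 0-6, and the 7s and 8s are
    discarded. Returns the percentage of promoters minus the percentage
    of detractors.
    """
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def mean_score(scores):
    """Plain average of the raw 0-10 scores - nothing discarded or lumped."""
    return sum(scores) / len(scores)
```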
A simple illustration
Perhaps this variation is what the originators of the NPS wanted. After all, variation in scores makes research clients nervous and anxious to get the next update. To illustrate, suppose we have just 10 responses to an NPS survey and the scores are 10, 9, 8, 8, 8, 7, 6, 5, 4, 4. The average score is 6.9 out of 10 – not bad, but not great either. The NPS is a disappointing -20. At face value this score seems low compared with the mean score, which is nearly 70% of the highest possible score. Now suppose we make one little change: we alter the single score of 9/10 and make it 8/10 instead. This makes little difference to the average (now 6.8), but because we now have only one score at or above 9, the NPS dramatically plunges to -30! A 0.1-point change in the average has translated into a 10-point drop in the NPS. Such a big drop in NPS seems quite serious! But here, it has arisen from a fairly trivial decimal-point change in the average score.
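The same arithmetic, written as a short script (again a sketch with our own names), reproduces the swing:

```python
def nps(scores):
    # Promoters (9-10) minus detractors (0-6), as a percentage of all responses
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

original = [10, 9, 8, 8, 8, 7, 6, 5, 4, 4]
altered  = [10, 8, 8, 8, 8, 7, 6, 5, 4, 4]   # the single 9 becomes an 8

for label, scores in [("original", original), ("altered", altered)]:
    mean = sum(scores) / len(scores)
    print(f"{label}: mean = {mean:.1f}, NPS = {nps(scores):.0f}")
# original: mean = 6.9, NPS = -20
# altered: mean = 6.8, NPS = -30
```

With only ten respondents, moving one person out of the promoter group shifts the NPS by ten points, while the average barely moves.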
A simulation, to investigate further
To investigate further, we ran a simulation to see how much more period-to-period variation the NPS creates, compared to simply using an average willingness to recommend score out of ten. We were shocked to find it has around five times as much variation. This variation makes the NPS an extremely difficult tool for market research clients to use, because someone is left to answer questions like: why did the scores drop so much from last time? Or: our scores are up twenty points, any clues why?
Our simulation comprised a population with an average willingness to recommend score of 8.0 out of 10. We based this figure on the overall average score observed across multiple studies. The figure is also consistent with the fact that average satisfaction scores for businesses are usually quite positive (Peterson & Wilson, 1992); and as stated, satisfaction and NPS scores correlate highly.
The population scores for individual ‘respondents’ in the simulation had a distribution, or spread, closely matching real scores obtained from our own willingness to recommend surveys. The distribution of scores is shown in Figure 1:
Figure 1. Willingness to recommend scores.

We checked whether the shape, or spread, of scores in our simulated population made any difference to the results. We tried different distributions, such as a truncated normal distribution, and the results did not alter very much. We also tested different average willingness to recommend scores; these did not change the results either.
Also note that the population willingness to recommend score stayed the same throughout the simulation. We took randomly selected samples from this population, using a sample size of 200 each time. We repeated this process ten times. In effect, this approach is just like running ten waves of marketplace surveys where the population willingness to recommend is absolutely stable.
The average score for each sample, or ‘survey wave’, is not exactly 8.0 every time, because of random sampling variation. We calculated the mean willingness to recommend score for each sample, and also calculated the NPS each time.
To compare the volatility of the NPS with that of the average willingness to recommend score, we calculated the average percentage change in scores from wave to wave for both metrics. This figure summarises the wave-to-wave variation in each, controlling for the fact that one metric has a mean of 8.0 and the other a mean just under 50.
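For readers who want to replicate the idea, a rough sketch of such a simulation in Python follows. The score distribution below is only an illustrative stand-in (it is not the distribution from Figure 1), chosen so that the population mean is about 8.0 and the population NPS sits just under 50; the sample size of 200, the ten waves, and the wave-to-wave change calculation follow the description above.

```python
import random

# Stand-in population distribution of 0-10 scores: mean about 8.0,
# population NPS about 47. The study's actual distribution will differ.
SCORES  = list(range(11))
WEIGHTS = [0.04, 0.01, 0.02, 0.02, 0.01, 0.01, 0.01, 0.15, 0.14, 0.31, 0.28]

def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

def avg_wave_to_wave_change(series):
    """Average absolute percentage change from each wave to the next."""
    changes = [abs(b - a) / abs(a) * 100 for a, b in zip(series, series[1:])]
    return sum(changes) / len(changes)

random.seed(1)
mean_waves, nps_waves = [], []
for _ in range(10):                                   # ten 'survey waves'
    sample = random.choices(SCORES, WEIGHTS, k=200)   # n = 200 per wave
    mean_waves.append(sum(sample) / len(sample))
    nps_waves.append(nps(sample))

print("mean score, wave-to-wave change: %.1f%%" % avg_wave_to_wave_change(mean_waves))
print("NPS,        wave-to-wave change: %.1f%%" % avg_wave_to_wave_change(nps_waves))
```

Because the population never changes, any movement in either metric across the ten waves is pure sampling noise; with this stand-in distribution the NPS typically shows several times the wave-to-wave variation of the mean score.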
We show the results in two graphs, as well as a table that shows the actual numbers. To make the graphs comparable, we set the range of the Y axis to be plus or minus 20% of the population mean score, for both the average willingness to recommend scores, and the NPS.
The willingness to recommend scores vary from wave to wave by 2%. They all sit between around 7.9 and 8.2 (out of 10). In contrast, the NPS gyrates around much more, ranging from 53 down to 42 – an average wave-to-wave change of 10%. This is far more random variation than the average willingness to recommend score shows. In fact, we can see in the table that sometimes the exact same mean willingness score of, say, 8.0 translates into different NPS scores of 51 and 47. Plainly this ‘noise’ is undesirable from a measurement viewpoint. The reason for the fluctuation is that slight differences in the total number of 9 or 10 scores dramatically change the NPS.
Figures 2 and 3. Comparing average willingness to recommend scores to NPS.
The variation in NPS is far greater than it is for average willingness to recommend scores
Table 1. Comparing NPS with average willingness to recommend scores

From the right column of the table we see that the NPS shows five times as much random variation from wave to wave (10% vs 2%) as a simple average willingness to recommend score. Such variation will make it harder for insights managers to explain what is going on with the NPS results, or to work out whether the results are meaningfully related to marketing interventions designed to lift the business’s NPS. And, of course, market research companies face the difficulty of trying to explain these movements from wave to wave. Spurious variation will force people to waste time guessing or speculating about what the explanation might be.
In summary, we recommend:
- Tracking willingness to recommend is misguided in the first place. We recommend you don’t track it. If recommendation is actually relevant in your industry, why not track actual recommendations given, and actual recommendations received? It’s actual recommendation that should matter more to a business.
- Using the NPS method throws away information and creates unnecessary random variation in scores. Therefore, if your business is wedded to tracking willingness to recommend, don’t use the NPS – use the average willingness to recommend score. At the very least, insist your market research provider gives you both the average willingness to recommend score and the NPS, so you can compare the amount of variation in each.
- Lastly, if you are also paying to track satisfaction, it is already telling you most of what you would get from NPS research, so don’t waste funds tracking NPS as well.