BRANDS SHARE THEIR CUSTOMERS
This report outlines the process of organising data in duplication analyses so that the main structure of competition between brands is clear and easy to see. The technique is very simple, and can be used on survey data as well as on the consumer panel data to which it is more commonly applied.
In most product categories, when consumers make successive purchases over time, most buy more than one brand; few restrict their purchases to a single brand. By studying which pairs of brands consumers buy, we can learn a lot about how the different brands compete with one another. For example, we might expect that brands with a similar positioning, or that target the same customer segment, would share the most customers with one another (that is, compete more closely with one another). Conversely, within a given category, we might expect that few people would buy a cheap brand as well as an expensive one, so that these brands would share customers rather less than normal. However, this raises the question: ‘what is a normal amount of sharing between brands?’ Duplication of Purchase Analysis is a straightforward analytical procedure that produces benchmarks for this normal level of sharing. Using these benchmarks, it is then a simple exercise to check whether the observed level of sharing between brands is in line with the expected, theoretical level, or whether it is more or less than this.
DUPLICATION ANALYSIS
For every pair of brands, we can record how many buyers bought each of the two brands and how many bought both. Suppose that in some time period 100 people bought Brand A and 30 people bought Brand B, and that 10 of them bought both. These 10 duplicate buyers (or ‘switchers’) are only 10% of the larger Brand A’s buyers but a much larger 33% of Brand B’s buyers. Looking only at the different proportions of Brand A and Brand B buyers who are ‘switchers’, one might conclude that the larger brand has a more loyal customer base, and that Brand B should therefore focus on strengthening customer loyalty. But this would be a fool’s errand. The lower proportion of switchers does not mean that Brand A has a more loyal customer base than Brand B; it simply reflects the fact that Brand A is bigger.
Now consider a third brand, C, which has 80 buyers. Some of Brand C’s customers may also have bought Brand A. How might this proportion compare with the proportion of Brand B’s buyers who bought A (which was 33%)? Anywhere from 0 to 100% of Brand C’s buyers could also have bought Brand A, i.e. from none to all 80 of them. Likewise, B could share anywhere from none to all of its 30 customers with C. So it is possible that almost none of Brand C’s customers also bought Brand A, while at the same time nearly all of Brand B’s 30 customers also bought Brand C. That is, Brand C could have a much bigger proportional duplication with the small Brand B (30 of C’s 80 buyers, or 38%, also bought Brand B) than with the bigger Brand A (none of C’s buyers bought Brand A: 0%).
Table 1a and Table 1b illustrate this scenario. Table 1a shows the absolute numbers, and Table 1b converts them into column percentages, each based on the number of buyers of the brand heading that column. B and C overlap heavily (30 people buy both), A and C do not overlap at all, and A and B are somewhere in between (10 people buy both). Table 1a is symmetrical about the diagonal (the number of buyers of A who also buy B is the same as the number of buyers of B who also buy A). Table 1b is not symmetrical (the percentage of buyers of A who buy B is not the same as the percentage of buyers of B who buy A), and shows no obvious structure. Because of multiple brand buying, the column percentages can sum to more than 100%.
In both Tables 1a and 1b, the numbers in the diagonal cells are shown in a lighter shade of grey, as they merely show that “all of Brand A’s buyers bought Brand A” and so say nothing about sharing between brands.
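The conversion from Table 1a to Table 1b can be sketched in a few lines of Python. This is a minimal illustration, assuming the hypothetical counts from the worked example (A = 100, B = 30 and C = 80 buyers) are held in a plain dictionary; the names `counts` and `column_percentages` are ours, not part of any standard tool.

```python
# Hypothetical counts from the worked example (Table 1a layout): diagonal cells
# hold each brand's buyers, off-diagonal cells the buyers of both brands.
brands = ["A", "B", "C"]
counts = {
    ("A", "A"): 100, ("A", "B"): 10, ("A", "C"): 0,
    ("B", "A"): 10, ("B", "B"): 30, ("B", "C"): 30,
    ("C", "A"): 0, ("C", "B"): 30, ("C", "C"): 80,
}

def column_percentages(brands, counts):
    """Table 1b: pct[(y, x)] = % of column brand x's buyers who also bought row brand y."""
    pct = {}
    for x in brands:                  # column brand: its buyers are the base
        base = counts[(x, x)]
        for y in brands:              # row brand: the brand also bought
            if y != x:                # the diagonal says nothing about sharing
                pct[(y, x)] = 100 * counts[(y, x)] / base
    return pct

pct = column_percentages(brands, counts)
for y in brands:
    cells = ["   -" if x == y else f"{pct[(y, x)]:4.0f}" for x in brands]
    print(y, *cells)
```

The printed table is not symmetrical: 33% of B’s buyers bought A, but only 10% of A’s buyers bought B, and column B sums to well over 100% because B’s buyers also bought other brands.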
This sort of table can be easily extended to produce a Duplication Table covering many brands. In practice, there is more structure to duplications in a market than this theoretical “anything goes” example shows. As we will see, the extent to which buyers of one brand buy another brand depends mainly on one thing: the size of the other brand. So, if 10% of Brand A’s buyers buy Brand B, then roughly 10% of C’s buyers and D’s buyers will buy B also.
FINDING THE STRUCTURE IN CUSTOMER SHARING
Table 2 shows data from the UK deodorant market, using the same format of column percentages as in Table 1b¹. For example, 8% of buyers of Adidas (the first column of data) also bought Asda deodorant (the brand in the second row), and 20% of buyers of Adidas also bought Dove, while 31% of Vaseline buyers (the last column) also bought Sure (the third row from the bottom). The diagonal cells, which would merely show that 100% of the buyers of brand X bought brand X, have been omitted altogether to aid visual scanning. This also helps calculation in a spreadsheet because, as we shall see below, the 100 should not be included when calculating the column average.
Table 2 is full of ‘interesting’ numbers, but shows little discernible pattern. Glancing down any column, we can see that the buyers of that brand buy many other brands (those in the rows) in varying proportions. Looking down column 5 (Impulse buyers), for instance, tells us that 38% of Impulse buyers bought Lynx. In the next column (Lynx buyers), however, the duplication with Impulse is only about half of this: just 20% of Lynx buyers also bought Impulse. Confronted with such a diversity of numbers, one might be tempted to resort to some sophisticated multivariate technique to tease out the structure of any relationships. But that is not necessary. Two very simple transformations will reveal the main patterns in the data, and with them the main message about the structure of competition in this and every market.
The first transformation involves ordering the table by the relative size of the brands. The difference in duplication rates between Lynx buyers and Impulse buyers (38% of Impulse buyers bought Lynx, while only 20% of Lynx buyers bought Impulse) looks like a big competitive inequality, until we recall what we learned earlier: the proportion of a large brand’s buyers who also buy a small brand is smaller than the proportion of the small brand’s buyers who buy the large brand. That is the case here, as Lynx has nearly twice as many buyers as Impulse (see the penetration figures in Table 3, below). The two-fold difference in duplication is a simple arithmetic consequence of the two-fold difference in penetration².
If we arrange both the columns and the rows³ in descending order of brand size, the size effect becomes more obvious, as Table 3 shows: the numbers are higher in the columns towards the left and lower in the columns towards the right. Showing the column averages and the brand penetrations makes this easier to see. To sort the table we can use either penetration or brand share (the share of purchase volume, which reflects both penetration and weight of purchase)⁴. Here we have used penetration.
If we look again at each row of Table 2, we can see that the numbers within any row are rather similar to each other. By contrast, there is much more variation between the rows: some rows average about 9 or 10 (for instance, the Asda row), while others (for instance, the Sure row) average 35 or more. Patterns such as this tend to be more obvious when the similar numbers run down columns rather than across rows, because the eye scans down a column more easily than across a row. This leads to the second transformation, which is to interchange the rows and columns. Carrying out both transformations gives Table 3.
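The arithmetic can be written out using the notation of the Duplication of Purchase Law introduced later in this report ($b_{Y|X}$ for the percentage of X’s buyers who also bought Y, and $b_Y$ for Y’s penetration). Writing $n(\cdot)$ for a number of buyers and $N$ for the population size (our notation), with L for Lynx and I for Impulse, the common count of people who bought both brands cancels out:

```latex
b_{L|I} = 100\,\frac{n(L \cap I)}{n(I)}, \qquad
b_{I|L} = 100\,\frac{n(L \cap I)}{n(L)}
\quad\Longrightarrow\quad
\frac{b_{L|I}}{b_{I|L}} = \frac{n(L)}{n(I)} = \frac{b_L}{b_I}
```

With the Table 2 figures, $b_{L|I}/b_{I|L} = 38/20 = 1.9$, which is simply the ratio of the two brands’ penetrations.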
Organising a table as these two simple transformations have done helps to make any patterns and relationships obvious. The patterns in the table then become what John Tukey called interocular: they leap out and hit you between the eyes!
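The two transformations can be sketched as follows. This is a minimal Python illustration, assuming the Table 2 layout is held as a dictionary keyed by (row brand, column brand). The three brands and their figures are hypothetical, invented only to show the mechanics. The column averages skip the omitted diagonal, as noted for Table 2.

```python
def organise(pct, penetration):
    """Apply both transformations to a Table 2-style duplication table.

    pct[(y, x)]: % of column brand x's buyers who also bought row brand y.
    Returns the brand order, the transposed table rows[x][y] (buyers of x
    in rows, the brand y they also bought in columns) and column averages.
    """
    # Transformation 1: order brands by penetration, largest first.
    order = sorted(penetration, key=penetration.get, reverse=True)
    # Transformation 2: interchange rows and columns.
    rows = {x: {y: pct[(y, x)] for y in order if y != x} for x in order}
    # Column averages over the n - 1 off-diagonal entries (no 100s).
    col_avg = {y: sum(rows[x][y] for x in order if x != y) / (len(order) - 1)
               for y in order}
    return order, rows, col_avg

# Hypothetical Table 2-style data for three brands.
penetration = {"Small": 5.0, "Big": 20.0, "Mid": 10.0}
pct = {("Big", "Small"): 31, ("Mid", "Small"): 14,
       ("Small", "Big"): 8, ("Mid", "Big"): 15,
       ("Small", "Mid"): 7, ("Big", "Mid"): 29}

order, rows, col_avg = organise(pct, penetration)
for x in order:
    print(f"{x:>6}", *(f"{rows[x][y]:>4}" if y != x else "   -" for y in order))
print("   avg", *(f"{col_avg[y]:4.1f}" for y in order))
```

After sorting and transposing, the column averages fall from left to right (30, 14.5, 7.5) and the entries in each column sit close to that column’s average, which is the kind of pattern Table 3 shows.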
INTERPRETING THE PATTERNS
In Table 3, it is now obvious that the numbers in any column are quite similar to each other, while the numbers differ considerably from column to column, as the column averages make clear. Furthermore, the numbers tend to decrease systematically as we move from left to right, as is again evident from the decreasing column averages. Whichever group of brand buyers we consider (any of the rows), about 30% of them also buy the brand leader Sure (the second column), whereas only about 8% of each brand’s buyers buy the small brand Asda (the last column). There are some exceptions, which we shall come to, but these are small in the context of the whole table.
Unlike before, the table now reveals clear, discernible patterns. Because of the structure we have imposed, we can interpret them easily as follows:
- (i) The proportion of buyers who also buy a given brand is roughly constant, whichever other brand’s buyers we consider.
- (ii) This proportion decreases systematically with the size of the given brand.
In summary: each brand shares customers more (and therefore competes more) with big brands, and less with small brands.
The Duplication Coefficient: D
The average duplication in each column of Table 3 (the % ‘who also bought’ a particular brand) is very highly correlated with that brand’s penetration (the % of the population who bought it, shown in the last line of Table 3). Where duplication is high, penetration is also high (in the left-hand columns), and where one is low, the other is low (the right-hand columns). In fact, the correlation is almost perfect, with a coefficient of 0.99. For each brand, the average duplication is about 1.5 times its penetration.
We can summarise the relationship by saying that the percentage of buyers of Brand X who also buy Brand Y is approximately equal to some constant (1.5 in our example) times the proportion of the population who have bought Brand Y (Brand Y’s penetration). It turns out that this relationship is found consistently in market after market; so consistently, that it is known as the Duplication of Purchase Law. This is represented by the formula:
$$b_{Y|X} = D \times b_Y$$
where $b_{Y|X}$ is the percentage of buyers of X who have also bought Y, D is the constant known as the duplication coefficient, and $b_Y$ is the percentage of the population who have bought Y.
The D value of 1.5 can be calculated in a number of ways. The most obvious is to convert each entry in the table to a duplication coefficient by dividing the duplication percentage (the raw entry) by the penetration of the brand also bought, and then to average all of these coefficients. However, this is not the best method, as it is subject to larger sampling error if there are many small brands. A technically better way is to take, over all pairs of brands, the sum of the proportions of the population buying both brands in each pair, and divide it by the sum of the products of the two penetrations in each pair. But that is a more complicated calculation. A compromise based on easily available data is to average the duplication percentages (i.e., to average the column averages) and divide that by the average of the penetrations⁵. This is the calculation used in Table 3.
When D is equal to 1, buying one brand makes the purchaser no more or less likely than anyone else in the population to buy another given brand. That is, the purchase of the one brand has had no influence on their purchase of another brand: if, say, 10% of the population buy Brand Y, then 10% of Brand X’s customers would too. If D is greater than 1, then buying X makes someone more likely to buy Y, and if it is less than 1, less likely. The D value is not fixed for a category: it varies with the length of the period analysed. In very short periods, for example, many buyers of a brand may be out of the market for any other brand, as they have already made a purchase and have no need to make another. In such situations, D will be less than 1, possibly even near 0 (e.g., few people buy toothpaste two weeks in a row, so after buying one brand they are unlikely to buy another). But over longer periods, such as a year, D will be greater than 1, reflecting the multi-brand portfolio nature of buying: buying one brand makes one more likely to buy another.
D differs between categories (e.g., the average D for deodorants in Table 3 was 1.5, compared with 1.2 for Spreads in Table 5). This variation has not been widely studied; however, D tends to be higher if the category has many light or non-buyers as well as many heavy ones (pet food, for instance), or if consumers tend to have large portfolios of brands (sweets, for instance) rather than mostly having a distinct favourite.
The main benefit of calculating D is that its values can be compared within a table, to see whether D is roughly constant for all pairs of brands, and then to identify any clear groupings of brands where D is higher or lower. Such groupings indicate brands that compete more or less directly with each other than their size alone would suggest, as we will see in Table 5.
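The three ways of calculating D can be compared on a small example. This is a sketch under stated assumptions: the counts are hypothetical, chosen so that every pair of brands has a D of exactly 1.5, and the function name `d_estimates` is ours. With real data the three estimates will differ somewhat, the cell-by-cell average being the noisiest when there are many small brands.

```python
from itertools import combinations, permutations

def d_estimates(n_pop, buyers, both):
    """Return three estimates of the duplication coefficient D.

    n_pop: households in the sample; buyers[x]: number who bought brand x;
    both[frozenset({x, y})]: number who bought both x and y.
    """
    brands = list(buyers)
    pen = {x: buyers[x] / n_pop for x in brands}      # penetration b_x as a proportion

    def dup(y, x):                                    # b_{Y|X} as a proportion
        return both[frozenset((x, y))] / buyers[x]

    # 1. Average of the cell-by-cell coefficients b_{Y|X} / b_Y.
    cells = [dup(y, x) / pen[y] for x, y in permutations(brands, 2)]
    d_cells = sum(cells) / len(cells)

    # 2. Sum over pairs of the proportion buying both, over the sum of penetration products.
    pairs = list(combinations(brands, 2))
    d_pairs = (sum(both[frozenset(p)] / n_pop for p in pairs)
               / sum(pen[x] * pen[y] for x, y in pairs))

    # 3. Average of the column averages over the average penetration.
    col_avg = [sum(dup(y, x) for x in brands if x != y) / (len(brands) - 1)
               for y in brands]
    d_avg = (sum(col_avg) / len(col_avg)) / (sum(pen.values()) / len(pen))
    return d_cells, d_pairs, d_avg

# Hypothetical sample of 1,000 households in which every pair has D = 1.5.
buyers = {"A": 200, "B": 100, "C": 40}
both = {frozenset(("A", "B")): 30, frozenset(("A", "C")): 12, frozenset(("B", "C")): 6}
print([round(d, 3) for d in d_estimates(1000, buyers, both)])
```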
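The meaning of D = 1, and of D near 0 in a very short period, can be illustrated with two small hypothetical populations in which each household is represented by the set of brands it bought. The helper `duplication_coefficient` is our own illustration of the definition $D = b_{Y|X} / b_Y$ for a single pair, not a standard function.

```python
def duplication_coefficient(households, x, y):
    """D for one pair of brands: b_{Y|X} / b_Y, using proportions."""
    buyers_x = [h for h in households if x in h]
    b_y = sum(y in h for h in households) / len(households)
    b_y_given_x = sum(y in h for h in buyers_x) / len(buyers_x)
    return b_y_given_x / b_y

# Independent buying: 100 households on a 10 x 10 grid, where brand X is
# bought by rows 0-2 (30%) and brand Y by columns 0-1 (20%).
independent = [{b for b, bought in (("X", i < 3), ("Y", j < 2)) if bought}
               for i in range(10) for j in range(10)]

# Very short period: no household buys more than one brand.
one_purchase = ([{"X"} for _ in range(30)] + [{"Y"} for _ in range(20)]
                + [set() for _ in range(50)])

print(duplication_coefficient(independent, "X", "Y"))   # 1.0
print(duplication_coefficient(one_purchase, "X", "Y"))  # 0.0
```

In the first population 20% of X’s buyers also bought Y, exactly Y’s penetration, so D = 1; in the second no X buyer bought Y, so D = 0.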
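A table of individual D coefficients, of the kind shown in Table 5, can be sketched as below. Since $b_{Y|X} = n(X \cap Y)/n(X)$ and $b_Y = n(Y)/N$ (as proportions), each D equals $N\,n(X \cap Y) / (n(X)\,n(Y))$, which is why such a table is symmetrical. The four brands, the two hypothetical groups and the counts are invented for illustration.

```python
def d_table(n_pop, buyers, both):
    """Individual D coefficients: d[(x, y)] = b_{Y|X} / b_Y for every ordered pair."""
    d = {}
    for x in buyers:
        for y in buyers:
            if x != y:
                b_y_given_x = both[frozenset((x, y))] / buyers[x]
                b_y = buyers[y] / n_pop
                d[(x, y)] = b_y_given_x / b_y
    return d

# Hypothetical sample of 1,000 households: two "butters" (B1, B2) and two
# "spreads" (S1, S2), with more sharing within each group than across groups.
buyers = {"B1": 200, "B2": 100, "S1": 200, "S2": 100}
both = {frozenset(p): n for p, n in [
    (("B1", "B2"), 40), (("S1", "S2"), 40),
    (("B1", "S1"), 20), (("B1", "S2"), 10),
    (("B2", "S1"), 10), (("B2", "S2"), 5),
]}

d = d_table(1000, buyers, both)
names = list(buyers)
for x in names:
    print(f"{x:>3}", *("   -" if x == y else f"{d[(x, y)]:4.1f}" for y in names))
```

Here D is 2.0 within each group and 0.5 across groups, so the two groups stand out as partitions even though the brands differ in size.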
INTERPRETING THE DEVIATIONS FROM THE PATTERN
Of course, there are some deviations from the overall pattern in Table 3. For instance, in the second column, we see that 36% of Right Guard’s customers buy Sure, compared with the column average of 30%. What the analysis technique has done is to make such variations clear: they stand out from the main structure. Now that the duplication analysis has revealed them, the task becomes one of determining:
- (i) whether such deviations are sufficiently large to warrant any further investigation;
- (ii) what might be the cause of such deviations;
- (iii) what, if anything, might be done about them.
However, unless we first know the main patterns and organise the data so that they can be clearly seen, these important questions remain obscured.

In Table 4, to emphasise the deviations, we have replaced each cell entry with the deviation of the raw value from the column average, and highlighted all those that deviate by ±5 or more points from the column average⁶ (yellow for positive and grey for negative deviations). We have also slightly changed the order of brands in the table, to gather the three male-targeted brands together in the bottom right. Alternatively, we could simply highlight these deviations in the duplications in Table 3, or produce a table of individual D coefficients, as in Table 5. Whichever metric we use, the same story emerges. The deviations are mostly small, but some are clearly not random:
- The male-targeted brands Right Guard, Adidas and Gillette duplicate (share customers) somewhat more heavily with each other. What is perhaps surprising is not that they do, but how little they do: a buyer of one of these brands is only 30-40% more likely to also buy one of the others than is a buyer of any other brand in the market. Furthermore, they show only a marginal tendency to duplicate less than expected with most other brands. This may, of course, be because many households contain both males and females, who buy different brands. Analysing the data separately for households with no males and for households with no females might add to our understanding. Lynx (known as Axe in other countries), another male-targeted brand, does not seem to be part of this little partition, although Adidas buyers duplicate more than expected with Lynx, as do Impulse buyers.
- Buyers of Impulse (row 6) duplicate more than average with almost all the brands, especially the big ones. This could indicate either that Impulse buyers are less loyal than other brands’ buyers, or that they are heavy buyers of the category. Another analysis shows it to be the latter: they buy the category about 25% more than most brands’ buyers do. This is unusual, as in most categories each brand’s buyers buy about the same amount of the category in total (Ehrenberg, Uncles and Goodhardt). Impulse is a body spray, bridging the deodorant and perfume categories, and may therefore be bought in addition to a more conventional deodorant. A useful facet of duplication analysis is that the category can be defined in any way you choose, and the patterns of competition will reveal themselves. Thus, if data were available, the combined category of perfume and deodorants could be explored.
- The collection of brands making up ‘Other’ shows a varied pattern of over- and under-duplication. This mainly reflects the heterogeneity of that grouping. Separating this catch-all ‘brand’ into individual brands, or into defined groups (e.g. private label brands), might reveal other brands that fit into the groupings already shown, or new partitions.
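The Table 4 transformation can be sketched as follows: subtract each column’s average from its entries, then flag those at least 5 points away. This is a minimal illustration assuming the Table 3 layout (rows are buyers of a brand, columns the brand also bought, diagonal absent). The brands and figures are hypothetical; the 5-point threshold is simply the one used in Table 4.

```python
def deviations_from_column_average(rows, threshold=5.0):
    """Return each entry's deviation from its column average, plus the flagged cells.

    rows[x][y]: % of brand x's buyers who also bought brand y (no diagonal).
    Flagged cells map to "+" (above the average) or "-" (below it).
    """
    columns = {y for row in rows.values() for y in row}
    col_avg = {}
    for y in columns:
        values = [row[y] for row in rows.values() if y in row]
        col_avg[y] = sum(values) / len(values)
    dev = {(x, y): v - col_avg[y] for x, row in rows.items() for y, v in row.items()}
    flagged = {cell: "+" if d > 0 else "-"
               for cell, d in dev.items() if abs(d) >= threshold}
    return dev, flagged

# Hypothetical Table 3-style data for four brands.
rows = {
    "Big":   {"Mid": 15, "Small": 8, "Niche": 4},
    "Mid":   {"Big": 29, "Small": 7, "Niche": 5},
    "Small": {"Big": 20, "Mid": 14, "Niche": 14},
    "Niche": {"Big": 31, "Mid": 15, "Small": 9},
}
dev, flagged = deviations_from_column_average(rows)
print(flagged)
```

Only two cells are flagged: buyers of Small under-duplicate with Big and over-duplicate with Niche, the kind of pairing that hints at a partition.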

SEEING PARTITIONS CLEARLY
Partitions occur where groups of brands duplicate more than expected with each other, and less than expected with the brands outside the partition. This can be highlighted by calculating the D coefficient for every entry in a table, as in Table 5. A table of D values is always symmetrical about the diagonal, because the D for a pair of brands is simply the proportion of the population buying both, divided by the product of the two brands’ penetrations (minor discrepancies in Table 5 are due to rounding). The table is arranged so that D values that are similar to each other, but different from the rest, are grouped together. Constructing the table from D values makes the partitions instantly clear, because the D values are almost the same for all brands within a partition but differ between partitions⁷.
Table 5 shows this for the Butters and Spreads market, where there are three very clear partitions: Butters, “Buttery Taste” Spreads and Healthy Spreads (our labels).
So, in Table 5, compared with expectations based on penetration among all households:
- Buyers of any brand in the Butter partition are more likely to buy another butter and are somewhat less likely to buy a “Buttery Taste” Spread.
- Buttery Taste Spread buyers are much more likely to buy another Buttery Taste Spread and less likely to buy real butters.
- The Healthy Spread buyers mostly show a normal level of duplication, both with each other and with the brands in the other two groups. Similarly, buyers of the other two groups show neither more nor less tendency to buy a brand in the Healthy group.
- Bertolli shows some excess duplication with Butter and under-duplication with Buttery Taste Spreads. In other words, it behaves like a Butter.
- There is a small grouping of Stork, Willow and Country Life, which duplicate a little more than normal with each other.
The main pattern is now quite clear: the brands that are not butters but promote their buttery taste (exemplified by the leader, ‘I Can’t Believe It’s Not Butter’) tend to displace Butter in user portfolios rather than co-exist with it. By contrast, the Healthy (oil-based) spreads tend to substitute for, or complement, brands in both of the other partitions, rather than displace them. A major attitude study would now seem unnecessary to understand this main structure of competition.
We can reduce the data even further, summarising the relationships by the average duplication coefficients within and between partitions, as shown in Table 6. This is particularly useful for summarising more extensive or complicated partitions, as for example in Institute Report 6 on Customer Retention and Switching in the Car Market.
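Averaging the individual D values within and between partitions, as in Table 6, can be sketched as below. The brands, their partition labels and their D values are hypothetical; in practice the D values would come from a table like Table 5, and the partition labels from inspecting it.

```python
def partition_averages(d, partition_of):
    """Average D within and between partitions.

    d[(x, y)]: D coefficient for an ordered pair of different brands.
    partition_of[x]: the partition label assigned to brand x.
    """
    sums, counts = {}, {}
    for (x, y), value in d.items():
        key = (partition_of[x], partition_of[y])
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical D values for each unordered pair, mirrored to give the full table.
pair_d = {("B1", "B2"): 2.0, ("S1", "S2"): 1.6,
          ("B1", "S1"): 0.6, ("B1", "S2"): 0.4, ("B2", "S1"): 0.5, ("B2", "S2"): 0.5}
d = {**pair_d, **{(y, x): v for (x, y), v in pair_d.items()}}
partition_of = {"B1": "Butter", "B2": "Butter", "S1": "Spread", "S2": "Spread"}

for key, avg in partition_averages(d, partition_of).items():
    print(key, round(avg, 2))
```

Because the D table is symmetrical, the Butter-to-Spread and Spread-to-Butter averages come out the same, so a summary like Table 6 needs only one of them.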
I CAN’T BELIEVE IT’S NOT POSITIONING!
The difference between full-fat butters, low-fat buttery taste products and other healthy spreads here is a clear example of a functional difference. These product groups (sub-markets made up of closely competing brands) are not merely positioned distinctly; they are different (although similar) products. However, the positioning and targeting of these brands is far from exclusive: a large proportion of each brand’s customers also buy brands outside its partition. The partitions are not made up of buyers wedded solely to products within that partition. This is typical of the partitioning patterns that we see with sub-markets.
Most partitions that have been found reflect some functional difference between the products in the market. Other examples include cola vs. non-cola soft drinks; sugared vs. diet soft drinks; leaded vs. unleaded petrol/gasoline; budget vs. full-price/featured airlines; and so on. Partitions of brands based on non-functional positioning are rare, although whether they can exist remains an empirical question.
Duplication analyses can be carried out using attributes instead of (or as well as) brands. This might reveal, for example, whether there are partitions by pack size (large-pack vs. small-pack buyers), flavour (exotic vs. traditional flavour buyers), price level, etc. Examples can be found in various Institute reports, such as Leaded and Unleaded Petrol (Report 1: Understanding Dirichlet-Type Markets), TV Programmes (Report 18), Price Levels (Report 32), Car Types (Report 19) and Fabric Conditioner (Report 11).
These patterns of duplication give us an insight into how consumers shift their purchasing back and forth between brands over time. The net result is mostly stability in overall volumes. However, if market shares do change, we can expect a brand that gains share to do so mainly at the expense of all other brands, each losing in proportion to its own share. The exceptions are brands in the same partition as the gaining brand, which, because they compete a little more closely, will tend to lose slightly more.
We conclude with a brief mention of a particular type of duplication analysis known as a Switching Matrix. In a Switching Matrix, the data cover two purchases for every respondent, usually the most recent purchase and the one before it. This is particularly appropriate for expensive products with long purchase cycles, such as cars, but can be applied to any type of product. A further advantage is that the data can be collected in a survey, as consumers often have little problem recalling their last two purchases. This can be very useful in emerging markets where consumer panel data may not be available. We will cover this type of analysis in detail in a future Institute report.
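The baseline expectation for where a share gain comes from can be sketched with a simple proportional rule. This is an assumption for illustration: each competitor loses in proportion to its share of the rest of the market, and the slightly larger losses expected within the gaining brand’s partition are deliberately ignored. The brand names and shares are hypothetical.

```python
def expected_losses(shares, gainer, gain):
    """Spread a share gain across the other brands in proportion to their shares.

    shares: market share of each brand (percentage points); gain: points won by gainer.
    Ignores partitions, so it gives the baseline expectation only.
    """
    others = {b: s for b, s in shares.items() if b != gainer}
    total = sum(others.values())
    return {b: gain * s / total for b, s in others.items()}

shares = {"A": 40.0, "B": 30.0, "C": 20.0, "D": 10.0}
losses = expected_losses(shares, gainer="D", gain=3.0)
print({b: round(v, 2) for b, v in losses.items()})
```

A brand in the same partition as D would be expected to lose somewhat more than this baseline, and the others correspondingly less.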
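A switching matrix can be tabulated directly from survey answers. This is a minimal sketch, assuming each response is recorded as a (previous brand, most recent brand) pair and that each row is expressed as percentages of the previous brand’s buyers. The responses are hypothetical, and the layout is our choice for illustration.

```python
from collections import Counter

def switching_matrix(responses):
    """Row percentages of (previous brand -> most recent brand) switches.

    responses: list of (previous, latest) brand pairs, one per respondent.
    Returns matrix[previous][latest] = % of previous brand's buyers.
    """
    pair_counts = Counter(responses)
    previous_totals = Counter(prev for prev, _ in responses)
    matrix = {}
    for (prev, latest), n in pair_counts.items():
        matrix.setdefault(prev, {})[latest] = 100 * n / previous_totals[prev]
    return matrix

responses = [("A", "A"), ("A", "B"), ("A", "A"), ("B", "A"), ("B", "B"), ("C", "A")]
for prev, row in switching_matrix(responses).items():
    print(prev, {latest: round(p, 1) for latest, p in row.items()})
```

Cells where the previous and latest brands match are repeat purchases; the other cells are switches, which can then be read against brand size in the same way as a duplication table.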
IN CONCLUSION
We have outlined the process of producing duplication analyses. The examples show that the main pattern is that all brands compete more with big brands and less with small ones, with occasional, usually small, deviations where certain brands compete more or less than expected. Brands cannot isolate themselves from competition by portraying themselves as different, or even by being different. Competition for every brand comes more from big brands that have many customers, and less from small ones.