The presence of online reviews has become pivotal in driving consumer behavior. Online reviews are evaluations of products and services posted on websites by consumers who had positive, negative, or neutral experiences or interactions with the product or service (Mudambi & Schuff, 2010). Their influence has grown considerably over the last decade, guiding consumer choice for specific products and services (Mo et al., 2015). Online reviews typically take one of two forms: qualitative reviews in the form of written comments and quantitative reviews in the form of numerical ratings (e.g., number of ‘stars’). The former offers potential buyers comparatively detailed but time-consuming information, whereas the latter offers a quick glimpse of the distribution of review scores (e.g., ‘number of stars out of five’).
Uncertainty and ambiguity about a decision (e.g., the actual quality of a product advertised online) causes people to seek out or rely on information that may or may not be valuable to make a good decision (Samson & Voyer, 2012). Within an online purchasing context, the availability of previous customers’ reviews of a product provides a particularly interesting source of decision-making information. Several studies have examined the links between online reviews and purchase intentions or sales. Floyd and colleagues (2014) performed a meta-analysis of 26 studies and found that the overall positive valence of reviews, product endorsement by trusted critics, and third-party endorsements (e.g., recommendations from an independent website) each boosted product sales. A meta-analysis of 69 studies by Ismagilova and colleagues (2020) found that ‘electronic word of mouth’ communications about products—which include online reviews—influence purchase intentions in many ways. Among these, positivity and perceived trustworthiness of electronic word of mouth communications increased purchase intentions. Furthermore, the overall volume of electronic word of mouth communications also played a significant role, with purchase intentions being higher when volume was high (see also Chevalier & Mayzlin, 2006; Clemons et al., 2006). More recently, a meta-analysis of 156 studies by Qiu and Zhang (2024) found evidence of a positive impact of overall review valence, volume, and specific ratings on purchase intentions, alongside a host of predictors related to the content of reviews and background of reviewers (e.g., emotionality, quality, and credibility).
Focusing on the impact of review ratings in particular, studies thus suggest that products benefit from having many reviews, especially reviews with positive valence. That being said, although positive reviews may generally generate a more positive judgment of the product for a potential buyer, the extent to which this happens varies by the type of product (Mudambi & Schuff, 2010). Other research suggests that negative reviews might outweigh positive reviews for experience products (Yang et al., 2016) as opposed to consumable products (e.g., beer; Clemons et al., 2006).
Dispersion of Online Product Review Ratings
Despite the existence of many studies and several meta-analyses on the links between product review ratings and product preferences, few studies have examined the role of rating dispersion—the degree to which review ratings are clustered more or less closely around an average. Some notable exceptions include work by Chintagunta and colleagues (2010), who found that movies performed better at the box office when review ratings were more dispersed, and by Kim and Lee (2015), who found that hotels performed worse under the same condition. A challenge with these existing studies is that they examined the role of review rating dispersion in naturally occurring reviews, rather than isolating its possible effect experimentally. Few studies have taken such an experimental approach, with a paper by Liu and colleagues (2023) being especially noteworthy. In a series of studies, these researchers showed participants, at random, a histogram of product ratings (‘stars’) that was either high or low in dispersion, keeping mean valence the same. They found that participants high in dialectical thinking—referring to an ability to appreciate contradictions—preferred products with highly (vs. less) dispersed ratings, whereas individuals low in dialectical thinking did not. They attributed this effect to participants ascribing higher credibility to products with highly dispersed ratings. While promising, the authors of this important study called for follow-ups with populations that were not predominantly East Asian university students, and for examination of additional mechanisms related to their findings.
We propose a different (possibly complementary) mechanism through which highly dispersed product ratings may lead to increased product preferences: an asymmetry in the sampling of social information. When individuals develop their judgements, they often try to imitate other people’s behavior (Bandura & Walters, 1977). Indeed, research suggests that exemplars—salient instances of others’ behavior—are used to judge the attractiveness of decision-task alternatives (Glöckner & Witteman, 2010) and shape people’s attitudes towards choices even if the exemplar is rather atypical (Florack et al., 2001). Importantly, research by Van Tilburg and Mahadevan (2020) shows that people focus disproportionately on extremely positive exemplars in risky decision-making tasks (e.g., accidental ‘winners’), and mimic their behavior. Moreover, even when the outcomes are known to be determined by chance, individuals still tend to imitate the behavior of successful exemplars. Explanations for this phenomenon, where people disproportionately focus on positive exemplars (e.g., positive reviews) over negative ones, have focused on the goal with which individuals are tasked. In contexts where people’s goal is to obtain a desirable outcome, positive exemplars have more impact than negative ones (e.g., Lockwood et al., 2002), with the opposite being true in contexts where people are motivated to avoid an undesirable outcome. Similar to people being biased towards the use of theory-confirming evidence or self-enhancing feedback (e.g., Gregg et al., 2017; Szumowska et al., 2023), people motivated to obtain a positive outcome tend to focus especially on positive information (Van Tilburg & Mahadevan, 2020). While negativity biases are generally more common in psychology than positivity biases (Baumeister et al., 2001; Norris, 2021), there is thus a precedent for expecting positive reviews to outweigh negative reviews in settings where individuals are trying to find the most desirable outcome from a set of alternatives.
Applied to the context of online review ratings, we accordingly expect that the availability of extremely positive reviews, keeping mean ratings the same, will positively bias consumers’ preferences towards a product, even if the review ratings contain an equivalent proportion of equally extreme negative reviews. We theorized that products accompanied by a highly dispersed set of ratings would be preferred over products accompanied by a less dispersed set of ratings. Importantly, our reasoning also offers a complementary explanation for the common finding that the volume of reviews is positively related to higher product preferences. Keeping the mean product rating the same, a higher volume of review ratings is more likely to contain extremely positive (and negative) review ratings than a comparatively low volume. A disproportionate focus on extremely positive reviews will thus tend to result in a preference for products that are accompanied by a high (vs. low) volume of reviews.
Overview of the Current Research
We tested experimentally how the dispersion and volume of product ratings affect people’s preferences. Highly dispersed distributions feature more extreme ratings, both positive and negative; given that people seem especially preoccupied with positive exemplars, we hypothesized that products whose ratings were highly dispersed would be preferred over those whose ratings were less dispersed (H1), even when their average ratings were equal. In addition, a high volume of ratings is more likely to feature extreme values (both positive and negative) than a small volume of ratings. Given that people gravitate towards positive extremes, we hypothesized that products which received a high volume of ratings would be preferred over products which received few (H2), even when their average ratings were comparable.
Method
We tested the above hypotheses in an online experiment that was built on the common star review system used by various online marketplaces. We presented participants with four categories of products (a mug, a pen, a clock, and a restaurant). Each category featured four specific products accompanied by a sample of star ratings that varied in dispersion and volume.
Participants
A total of 281 adults living in the USA participated in the study. They were recruited through Amazon’s Mechanical Turk (www.mturk.com). We excluded duplicate participants and those with a completion time below one-third of the median or above three times the median. The final sample consisted of 265 participants (151 women, 114 men) aged between 18 and 72 (M = 38.64, SD = 11.63). A post-hoc sensitivity analysis indicated that this sample size afforded a power of (1 − β) = .80 to detect small-to-medium effects of W = 0.20 (two-tailed α = .05).
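A sensitivity analysis of this kind can be reproduced from the noncentral chi-square distribution. The sketch below is one way to do so in Python with SciPy; it assumes a goodness-of-fit test with df = 3 (a four-cell design), since the paper does not state the exact test family or software used.

```python
from scipy.stats import chi2, ncx2

def gof_power(n, w, df, alpha=0.05):
    """Power of a chi-square goodness-of-fit test for effect size w
    (Cohen's W) with n observations and df degrees of freedom."""
    noncentrality = n * w**2
    critical = chi2.ppf(1 - alpha, df)
    return 1 - ncx2.cdf(critical, df, noncentrality)

# With N = 265 and W = 0.20, power comes out near the reported .80
# under the (assumed) df = 3.
power = gof_power(n=265, w=0.20, df=3)
```

Larger samples yield higher power for the same effect size, so this function can also be inverted numerically to plan a sample size in advance.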
Design
The study employed a 2 (volume of product ratings: high, low) × 2 (dispersion of product ratings: high, low) × 4 (product set: pens, mugs, clocks, restaurants) within-factorial design. Thus, for all four product sets, all participants considered four brands accompanied by a unique combination of rating dispersion and volume. The order of presentation of product sets was random for each participant. This study received ethical approval from the Ethics Committee at the Research Ethics Office of King’s College London, UK (Ethics code: #MR 1718 67).
Procedure
Participants gave informed consent and received the instruction that their task was to select four products of their preference. They then viewed four brands of one of the four product sets (pens, mugs, clocks, or restaurants). Each of these sets contained four product brands, distinguished by names that were randomly allocated to them. The pen brand names were Pendora, Pentagora, Penny, and Penelope; the mug brand names were Marie, Murion, Marlie, and Meddie; the clock brand names were Clerk, Clarie, Cleo, and Clara; and the restaurant brand names were Eat & Enjoy, Delicious Dining, The Meadow Fiddler, and The Mad Drum. The four product brands in a set were displayed in a random order. Each product brand in a set was accompanied by a horizontal histogram of star ratings (see Online Supplement Figure S1). One (a) was accompanied by a highly dispersed, high volume of ratings; one (b) by a little dispersed, high volume of ratings; one (c) by a highly dispersed, low volume of ratings; and one (d) by a little dispersed, low volume of ratings. This allocation was also random.
For the brands accompanied by a high volume of ratings (a and b) we used the ratings displayed in Figure 1; the left one for the high dispersion brand (a) and the right one for the low dispersion brand (b). Each contained 50 symmetrically distributed ratings, with an average rating of 5.50 stars (out of a possible 1 to 10 stars). For the brands accompanied by a low volume of ratings (c and d), we used one of the ratings displayed in Figure 2: one of the left-sided sets for the brands with widely dispersed ratings (c), and one of the right-sided sets for the brands with narrowly dispersed ratings (d). In each case we selected the set of ratings randomly, provided that it had not yet been used in another product set. Note that each set of 20 ratings in Figure 2 was randomly drawn from those in Figure 1: the high dispersion ratings on the left in Figure 2 were drawn from the high dispersion ratings in Figure 1, and the low dispersion ratings on the right in Figure 2 were drawn from the low dispersion ratings in Figure 1. These random draws allowed us to retain, across the low volume ratings, an average valence and dispersion approximately similar to those of the high volume ratings (M = 5.45).
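As an illustration, the subsampling logic can be sketched as follows. The two rating pools here are hypothetical stand-ins (the actual distributions in Figure 1 are not reproduced in the text): both are symmetric around 5.50 stars with n = 50, differing only in dispersion.

```python
import random
import statistics

# Hypothetical rating pools (assumed, not the paper's actual Figure 1 data):
# both have mean 5.5 on a 1-10 star scale, n = 50.
high_dispersion = [star for star in range(1, 11) for _ in range(5)]  # flat 1..10
low_dispersion = [5] * 25 + [6] * 25                                 # clustered

def draw_low_volume(pool, target_mean=5.5, tolerance=0.1, seed=0):
    """Randomly draw 20 of the 50 ratings, redrawing until the subsample's
    mean is close to the full pool's mean (mirroring the aim of keeping
    average valence comparable across the volume conditions)."""
    rng = random.Random(seed)
    while True:
        sample = rng.sample(pool, 20)  # sampling without replacement
        if abs(statistics.mean(sample) - target_mean) <= tolerance:
            return sample

subset = draw_low_volume(high_dispersion)
```

Rejection sampling like this keeps the low-volume sets comparable in mean to the high-volume sets while preserving their dispersion character.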
Figure 1
High Volume of Product Ratings
Figure 2
Low Volume of Product Ratings
The star ratings were presented in ascending or descending order for each participant, determined at random, but consistent across their presented products to avoid confusion. Participants could click on the ratings they viewed to see the associated comments.1 After viewing the four brands of a product set, participants selected their preferred brand and moved to the next product set. Data and codebook can be accessed (see van Tilburg, 2024).
Results
Product Preferences
Each participant expressed four product preferences, each time from a set of four alternatives (Tables 1a through 1d). Each of these preferences was for a product with either a high or low dispersion of ratings, and either a high or low volume of ratings. We calculated for each participant how many times, out of 4, they selected a product with a high (vs. low) dispersion of ratings and how many times, out of 4, they selected a product with a high (vs. low) volume of ratings. If participants were unaffected by either of these factors, then we would expect, by chance alone, that 2 out of 4 of the selected products would feature a high (vs. low) rating dispersion, and 2 out of 4 a high (vs. low) rating volume. Thus, we tested whether the number of selected products with a high (vs. low) rating dispersion and with a high (vs. low) rating volume differed from a chance value of 2 (i.e., H0: μ = 2).
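The scoring logic can be illustrated with a handful of hypothetical participants (the actual analysis used all 265):

```python
from scipy.stats import ttest_1samp

# Each participant's four choices, coded by the condition of the chosen
# brand: (dispersion, volume). These six participants are hypothetical.
choices = [
    [("high", "high"), ("high", "low"), ("low", "high"), ("high", "high")],
    [("low", "high"), ("high", "high"), ("high", "high"), ("low", "low")],
    [("high", "high"), ("high", "high"), ("low", "high"), ("high", "low")],
    [("low", "low"), ("high", "high"), ("high", "high"), ("high", "high")],
    [("high", "high"), ("low", "high"), ("high", "high"), ("high", "high")],
    [("high", "low"), ("high", "high"), ("low", "high"), ("high", "high")],
]

# Count, per participant, how many of the 4 chosen brands had high dispersion.
high_dispersion_counts = [
    sum(dispersion == "high" for dispersion, _ in person) for person in choices
]

# Under H0 (no effect of dispersion), the expected count is 2 out of 4.
t_stat, p_value = ttest_1samp(high_dispersion_counts, popmean=2)
```

The same counting is then repeated for volume, testing each set of counts against the chance value of 2.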
Table 1a
Preference Frequencies for Pens
| | Low Dispersion | High Dispersion |
|---|---|---|
| Low Volume | 38 | 54 |
| High Volume | 72 | 101 |
Table 1b
Preference Frequencies for Mugs
| | Low Dispersion | High Dispersion |
|---|---|---|
| Low Volume | 45 | 34 |
| High Volume | 76 | 110 |
Table 1c
Preference Frequencies for Restaurants
| | Low Dispersion | High Dispersion |
|---|---|---|
| Low Volume | 44 | 41 |
| High Volume | 70 | 110 |
Table 1d
Preference Frequencies for Clocks
| | Low Dispersion | High Dispersion |
|---|---|---|
| Low Volume | 35 | 62 |
| High Volume | 85 | 83 |
Single-sample t-tests indicated that participants significantly preferred brands that were accompanied by a high (vs. low) dispersion of ratings (M = 2.25, SD = 1.18), relative to chance (2), t(264) = 3.377, p = .001, d = 0.207, in support of H1. Participants also significantly preferred brands that were accompanied by a high (vs. low) volume of ratings (M = 2.67, SD = 1.11), relative to chance (2), t(264) = 9.833, p < .001, d = 0.604, in support of H2.
We then zeroed in on the individual product sets and tested whether preference for a product with a high (vs. low) dispersion of ratings was greater than chance (50%), and whether the same was true for products with a high (vs. low) volume of ratings. We found significant preferences for brands featuring a high (vs. low) dispersion of ratings for pens, χ2(1) = 7.642, p = .006, and restaurants, χ2(1) = 5.166, p = .023, though not for mugs, χ2(1) = 1.996, p = .158, or clocks, χ2(1) = 2.358, p = .125. We found significant preferences for brands featuring a high (vs. low) volume of ratings for each of the product sets: pens, χ2(1) = 24.758, p < .001, mugs, χ2(1) = 43.204, p < .001, clocks, χ2(1) = 19.023, p < .001, and restaurants, χ2(1) = 34.057, p < .001.
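As a check, the per-set tests can be reproduced from the preference tables. For example, collapsing Table 1a (pens) over volume gives 155 choices of high-dispersion brands versus 110 of low-dispersion brands, and a goodness-of-fit test against a 50/50 split recovers the reported χ²(1) for pens. A sketch in Python with SciPy (not necessarily the software the authors used):

```python
from scipy.stats import chisquare

# Table 1a (pens), collapsed over volume:
high_dispersion_choices = 54 + 101  # 155 chose a high-dispersion brand
low_dispersion_choices = 38 + 72    # 110 chose a low-dispersion brand

# Goodness-of-fit test against an equal (chance) split.
result = chisquare([high_dispersion_choices, low_dispersion_choices])
# result.statistic is approximately 7.642, matching the value reported for pens.
```

The remaining values in the text follow from the other tables in the same way.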
Discussion
We examined how dispersion and volume of product review ratings affect consumers’ preferences. Based on earlier work that shows that people gravitate to extreme positive exemplars in decision-making contexts where people pursue desirable outcomes (Lockwood et al., 2002; Quimby & De Santis, 2006; Van Tilburg & Mahadevan, 2020), we theorized that products accompanied by a high dispersion or high volume of ratings would be preferred over products accompanied by a low dispersion or low volume of ratings. We tested this in an experiment where we manipulated review rating dispersion and volume. As expected, people preferred products accompanied by highly dispersed and high volume ratings.
Our findings are consistent with past work demonstrating the disproportionate influence of positive exemplars in decision-making contexts (Van Tilburg & Mahadevan, 2020), with work suggesting that positive reviews are more influential than negative ones (Mo et al., 2015), and with work indicating a positive influence of upper-quartile reviews and high variance in ratings in the context of craft beer sales (Clemons et al., 2006). More generally, our findings suggest that the shape and size of review rating distributions matter, and influence preferences independently. Our findings also echo prior research on positivity biases—an asymmetry between positive and negative units in decision-making contexts, with individuals tending to favor positive units over negative ones (Heider, 1946, 1958).
The finding that a high dispersion of review ratings and a high volume of ratings can make products seem more appealing has practical implications for both individual consumers and marketing practices. Individual consumers may benefit from review rating presentations that are less sensitive to less representative extremes, for example, by accompanying ratings with confidence intervals around an average or making negative reviews more salient or accessible. On the other hand, for businesses, increasing diversity in a customer base, and hence in reviews, may increase sales, provided of course that the overall valence of ratings does not diminish.
Our findings also sit well within the broader context of research on the representativeness heuristic, whereby people often judge a group or event based on a stereotypical example, regardless of the small probability that this single example is actually representative of the group (Kahneman & Tversky, 1972). A disproportionate focus on positive exemplars in the context of pursuing desirable outcomes (Van Tilburg & Mahadevan, 2020) may cause positive ratings to become the specific object on the basis of which people tend to form their judgement. In future work on the topic, it would be valuable to test if a representativeness bias accompanies participants’ preferences.
While many psychological phenomena, including in decision-making contexts, display a negativity bias (Baumeister et al., 2001; Norris, 2021), our results suggest that positive product reviews can outweigh negative ones. In our particular setting, participants were tasked with obtaining the best (most desirable) product from a selection. This focus on attaining desirable outcomes (rather than avoiding undesirable outcomes) is known to shift the weight towards positive exemplars (Lockwood et al., 2002). Had participants been tasked to avoid undesirable products instead, we would have predicted the opposite pattern, with the negative reviews outweighing the positive ones, which would make an interesting extension for future work.
In our study we manipulated the dispersion and volume of product review ratings independently of the mean, allowing us to isolate their impact. However, our findings may not generalize to all settings. For example, if a product has a low dispersion of review ratings that is nevertheless highly positive, then it may well be preferred over an alternative with a large dispersion of ratings that is more negative overall. Our results do not tell us the relative importance of rating valence.
Our study examined a rather specific population (USA adults) and it is valuable for future work to extend the focus beyond this particular group. In addition, while we identified the effects of review ratings, we did not examine the role of the content of written reviews, which is known to play a role too (Qiu & Zhang, 2024). Likewise, we did not provide participants with details about the source of the review ratings, and whether these were made by, for example, experts. Given that the credibility of reviewers plays a role in product preferences (Floyd et al., 2014), this is a worthwhile variable to consider in future work.