Friday, February 27, 2015

Data and the User Experience

Post 2: Data and the User Experience
By Matt Wheeler

Last month we shared some of the technology behind Polyvore’s Style Profile and how we’re using machine learning to understand our users' individual style to recommend more personalized products and outfits.

We discovered that our unique set data (our users create over 3 million sets every month) helps improve our recommendations and creates a more engaging shopping and discovery experience for our users. Let’s dive into more detail.

Without divulging too much of the secret sauce, we recently developed three independent algorithms -- or “streams,” as we call them -- that we use to generate product recommendations:
  1. Stream 1: Generates recommendations based on a user’s brand-affinity (a passion for Prada, for example).
  2. Stream 2: Generates recommendations based on collaborative filtering: items that similar users have liked.
  3. Stream 3: Leverages the talent of our awesome community of creators by recommending items frequently paired in sets.

We decided to get a deeper understanding of how users react to the three types of recommendations by launching an experiment that tests each stream. The results are pretty interesting and give us insight into ways that we can improve our user experience.

Which of the three streams do users prefer?
One of the features we make available is the ability to “like” a product. We can use these likes as a tool for measuring the quality of our recommendations, by looking at the like rate: the fraction of recommended items that a user likes. That is, if we show you 20 items and you like 10 of them, then we are doing much better than if we show you 20 items and you like 5 of them.
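As a minimal sketch of that measurement, the like rate is just likes divided by impressions. The function name and the error handling below are illustrative assumptions, not our production code:

```python
def like_rate(likes: int, impressions: int) -> float:
    """Fraction of shown items that the user liked (hypothetical helper)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= likes <= impressions:
        raise ValueError("likes must be between 0 and impressions")
    return likes / impressions


# The two cases from the paragraph above: 10 of 20 liked vs. 5 of 20 liked.
better = like_rate(10, 20)
worse = like_rate(5, 20)
```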

We looked at the distribution of like rates to understand which streams users preferred:

Figure 1 - Overall like rate distribution (median = black horizontal bar)

The horizontal black lines in the middle of the boxes are the median like rates for each recommendation type. The boxes themselves represent the range of like rates in which most users fall. It is clear that the similar-users-based recommendations are the most popular, followed by the Polyvore set-based recommendations, with the brand-based recommendations bringing up the rear. So, we should focus our energies on increasing the number of impressions from user-based recommendations, right? Well, maybe. The overall variance in the individual like rates is relatively high (the boxes cover a lot of area on the graph).

Does this mean that we have a lot of heavy “likers” and a bunch of “non-likers”? Does it mean that users actually have individual preferences for recommendation type? Or does it mean something else entirely?
To answer these questions, we used each user’s likes to create their “perfect mix” of streams and compared it against the average user’s perfect mix. For instance, if we showed you 100 recommendations from each stream and in each stream you liked 10 items, then your perfect mix would be an even 33%/33%/33% mix of recommendations from each stream. If the average Polyvore user’s perfect mix was 40% similar-user-based recommendations, 35% set-based recommendations, and 25% brand-based recommendations, then we would say you have a -7% relative affinity for similar-user-based recommendations, a -2% affinity for set-based recommendations, and a +8% affinity for brand-based recommendations. Plotting these affinities per user in ascending order, we get the following graphs:
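One way to read the example above is that a user’s perfect mix is their per-stream like rate, normalized so the shares sum to 100%, and that relative affinity is the user’s share minus the average share. The sketch below works under that assumption; the function names, the stream keys, and the even-mix fallback for users with no likes are all hypothetical:

```python
def perfect_mix(like_rates: dict) -> dict:
    """Normalize per-stream like rates into shares that sum to 1."""
    total = sum(like_rates.values())
    if total == 0:
        # Assumption: a user with no likes gets an even mix.
        return {stream: 1 / len(like_rates) for stream in like_rates}
    return {stream: rate / total for stream, rate in like_rates.items()}


def relative_affinity(user_mix: dict, average_mix: dict) -> dict:
    """Per-stream difference between a user's mix and the average mix."""
    return {stream: user_mix[stream] - average_mix[stream] for stream in average_mix}


# The worked example: 10 likes out of 100 impressions in every stream.
user = perfect_mix({"similar_user": 0.10, "set": 0.10, "brand": 0.10})
average = {"similar_user": 0.40, "set": 0.35, "brand": 0.25}
affinity = relative_affinity(user, average)

for stream, value in affinity.items():
    print(f"{stream}: {value:+.0%}")
```

Because both mixes sum to 100%, the affinities for a user always sum to zero: a gain in one stream has to come out of another.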

Figure 2 - Distribution of individual preference for each of the three streams. The x-axis is individual users, sorted by increasing stream affinity. A value of +0.1 means that a user’s ideal mix of streams would add an additional 10 percentage points to the average mix (e.g. go from 25% of total individual impressions to 35% of total individual impressions). Note that it is not possible for a stream to be increased in a user’s ideal mix without a decrease in at least one other stream.

In each of these graphs, a large negative value indicates a user who likes the stream much less than the average Polyvore user, while a large positive number indicates someone who likes the stream much more. We can see that, for each stream, there is a non-trivial minority of users who have a strong positive affinity. There is a similarly sized minority who have a strong negative affinity. This suggests that we could improve the individual user experience by showing users more impressions from the streams they like and fewer from those they dislike.

Testing our hypothesis with an experiment:
Another interesting finding concerns the relationship between users’ reactions to the streams. It turns out that there is a pronounced negative correlation between brand-based affinity and similar-user-based affinity:

Figure 3 - 95% prediction interval of similar user affinity regressed on brand affinity

This chart uses the same values as those in Figure 2 and shows that users who have a strong affinity for recommendations based on similar users generally have a strong negative affinity for brand-based recommendations, and vice versa. Interestingly, this relationship is much weaker when comparing similar-user-based affinity with set-based affinity.
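For readers who want to check this kind of relationship on their own data, here is a rough sketch of a Pearson correlation between two affinity columns. The numbers are made up purely for illustration and are not Polyvore data, and the charts here show regression prediction intervals rather than this statistic:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Illustrative, invented affinities for five hypothetical users.
brand_affinity = [-0.20, -0.10, 0.00, 0.10, 0.20]
similar_user_affinity = [0.15, 0.10, 0.00, -0.05, -0.20]

r = pearson(brand_affinity, similar_user_affinity)
print(f"correlation: {r:.2f}")
```

A value near -1 would match the pronounced negative relationship described above, while a value near 0 would look more like the weaker relationship with set-based affinity.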

Figure 4 - 95% prediction interval of similar user affinity regressed on set affinity

This tells us that users who like recommendations from other users also appreciate -- or at least are not overly annoyed by -- recommendations based on sets. We see this same weak relationship when we compare brand-affinity folks to the set-based stream.

Figure 5 - 95% prediction interval of brand affinity regressed on set affinity

This is great news for us! Making recommendations based on sets is something that only Polyvore can do, and the data suggests that investing time to improve these recommendations will only complement the user experience. Our creators’ sets are a rich store of fashion data, and this set-based product recommendation algorithm only scratches the surface of what’s possible with that data. Stay tuned.