Tuesday, October 7, 2014

Software Engineers Wanted...For Style Advice

"Software Engineers Wanted...For Style Advice" - not two things that you'd necessarily expect in the same sentence. But that’s what we do here at Polyvore, using machine learning and data science.

Today we’re excited to announce the launch of personalized recommendations on Polyvore for iPhone. You can download the app here!

Although the user experience of a personalized feed is simple and delightful, what goes on under the hood is quite complex. Personalization is no easy feat, but we were able to create hyper-personalized recommendations at scale that produce a 4x increase in product likes (one of our key measures of engagement).

Over the next few months, we’ll be doing a series of blog posts about how it all works. To get started, here's a quick infographic explaining the problem, our approach, and the results:

So, why is style such a hard problem?

Style is complex and nebulous. It's intangible, highly personal, and constantly evolving. Think about how you would describe your style or the kind of image you want to project. Think about all the details that need to be taken into account: the occasion (wedding vs. vacation), time of year (winter vs. summer), situation (work vs. weekend), all the way through to tiny details that you love or hate (rhinestones and fringe, anyone?). It's not a straightforward problem with clear rules that you can encode.

Our goal is to quickly understand (and constantly learn about) our users' style, and provide a large number of real-time recommendations that match that style. We use hardcore data science to help shoppers get inspired and find great products by personalizing their experiences in our applications.

Big data = big value

Polyvore is uniquely positioned to build a personalized experience around style because of the kinds of data we have:
  • User-generated content (UGC) - 'sets' representing outfits/style/fashion where the combination of items is human and individual
  • Structured product catalog - rich metadata that allows us to have a detailed understanding of the items in the UGC 
  • Shopper behavior information - likes, outbound clicks, engagement (looking at what, and for how long)
  • Editorial content - representing years of curation experience, providing fashion-aware content discovery
These kinds of data, combined with their sheer volume, built up over years by an active community of members (~3M sets/month), enable us to apply machine learning and data mining to understand something as nebulous as 'style'.

To make the problem even more interesting, our notional 'Style Graph' is much more dynamic than other 'Graphs', as we also need to take into account things like seasonality and trends that can come and go within days or weeks. Our focused dataset means that we can get robust results from our algorithms without having to wade through noise and uncertainty.

Constructing the Style Graph

In future blog posts we'll go into details about the systems and algorithms that run at large scale to provide personalized, real-time recommendations. We'll talk about:
  • Platforms: Mahout, EMR, Neo4j, AWS and Cassandra
  • Mathematical techniques: collaborative filtering, association rule discovery, principal component analysis
  • Statistical experiments: what valuable and interesting insights have we gained so far?
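
To give a flavor of one of the techniques listed above before those posts arrive, here is a minimal sketch of item-based collaborative filtering over product likes. It is an illustration under stated assumptions, not a description of our production system: the users, product names, toy data, and the function names item_similarities and recommend are all hypothetical, and cosine similarity over co-likes is just one common choice of similarity measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: the set of products each user has liked.
LIKES = {
    "ana": {"fringe_bag", "ankle_boots", "denim_jacket"},
    "bea": {"fringe_bag", "ankle_boots"},
    "cal": {"ankle_boots", "denim_jacket", "wool_scarf"},
    "dee": {"rhinestone_clutch"},
}


def item_similarities(likes):
    """Cosine similarity between products, based on which users liked both."""
    # Invert the data: product -> set of users who liked it.
    item_users = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                score = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, likes, sims, k=3):
    """Rank products the user has not liked by summed similarity to their likes."""
    seen = likes[user]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda it: (-scores[it], it))[:k]


sims = item_similarities(LIKES)
print(recommend("bea", LIKES, sims))  # ['denim_jacket', 'wool_scarf']
```

In this toy example, "bea" is recommended the denim jacket first because it is co-liked with both of her existing likes. A user like "dee", whose only like is shared with nobody, gets no recommendations from this approach alone. That is one reason to combine it with other signals such as catalog metadata.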

Post a comment if you'd like us to consider covering something specific or answer questions you have. Make sure you subscribe to this blog and watch for our follow-on posts! And for more information on our team, check out our About page.