Friday, September 20, 2013

Launching Home: The Good, The Bad, and The Ongoing

This week we unveiled Polyvore for your Home, a major milestone in our 6-year history! Even though Polyvore is the largest fashion community on the web, our original prototype started in interior design, so we’ve always built our technology as a platform, designed to scale beyond fashion. That said, some parts of launching home were easy, some were hard, and there’s still plenty more work to do!

The Easy Parts

Because we had planned ahead (really far ahead, in some cases), some things were easy to build on top of our existing infrastructure.

Getting products in

We needed to bootstrap the home category with enough great products for our users to play with. Our data pipeline starts with a large network of crawlers running on EC2 that pull products from popular retailers like Nordstrom, West Elm and Fab. At this stage we retrieve each product's brand, price, availability, and more. The same technology we use on fashion sites works just as well on home sites, so it was just a matter of writing new crawler scripts to pull products from home retailer sites and feed them into our existing infrastructure.
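To illustrate why adding a new category can be mostly a matter of new crawler scripts, here is a minimal sketch in Python. It is not Polyvore's actual code: the `Product` record, the `PARSERS` registry, the retailer name `example-home-store`, and the raw field names (`maker`, `cost`, `stock`) are all hypothetical. The idea is only that each retailer gets its own small parser, while everything downstream sees one shared record shape.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Shared record shape that the rest of the pipeline consumes (hypothetical)."""
    retailer: str
    brand: str
    price: float
    available: bool


# Registry mapping a retailer name to its parser function (hypothetical).
PARSERS = {}


def parser(retailer):
    """Decorator that registers a per-retailer parser under the given name."""
    def register(fn):
        PARSERS[retailer] = fn
        return fn
    return register


@parser("example-home-store")
def parse_example_home_store(raw):
    # Field names here are assumptions about what one crawler might return.
    return Product(
        retailer="example-home-store",
        brand=raw["maker"].strip(),
        price=float(raw["cost"].lstrip("$")),
        available=raw["stock"] > 0,
    )


def ingest(retailer, raw_items):
    """Normalize raw crawler output using the parser registered for the retailer."""
    parse = PARSERS[retailer]
    return [parse(raw) for raw in raw_items]
```

Under this assumption, supporting a new home retailer means adding one more decorated parser function; `ingest` and anything after it stay unchanged.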

Tuesday, May 21, 2013

Under the Hood: How we Mask our Images

Polyvore users create over 3 million sets every month, mixing and matching their favorite products to express their style. They clip in images from all over the web, like this teal T-shirt from

As you can see, the shirt image has a light gray background that creates an eyesore when layered with other items in a set:

How do we strip this background away in order to achieve a cleaner look? More importantly, how can we do it for not just this T-shirt, but for any of the 2.2 million images our users clip in every month?

Monday, March 4, 2013

Polyvore Style Tips: CSS, JavaScript and HTML

Shhhh! Don't tell anyone, but the engineers at Polyvore are ... nerds. Most people have the misconception that Polyvore is full of fashion models and editors, but the truth is that Polyvore's a technology company wrapped in a Gucci dress.

Our core web technology enables people to mix and match products from around the web to create works of art or whimsy.
Our front ends have always been heavy in HTML, JS, and CSS. Over the years, we've learned a ton from building those features and from trying to keep up with the latest and greatest HTML5 features in modern browsers. I wanted to share some of our favorite tricks.

Unless otherwise noted, these all work on the latest couple of Safari, Chrome, and Firefox releases (obviously) but also on IE8 and higher. These are Polyvore's supported browsers, a decision based on traffic: we support any browser with a 5% or higher share. YMMV if you are committed to supporting older versions.

Tuesday, January 22, 2013

Worker Queue System

In the good old days, Polyvore started out with a very simple, traditional LAMP architecture. Everything more or less directly accessed a bank of MySQL servers, both to service web requests and to run batch housekeeping jobs. But our traffic and data set have kept growing and growing. We started to experience massive spikes in our DB load when nightly batch jobs were kicked off. Jobs that used to complete in 3 minutes started to take 5 hours.

Fortunately, our infrastructure team has a ton of experience dealing with scalability issues. Their solution was to use a worker queue system that breaks massive jobs into smaller chunks, which are executed by a bank of workers. This approach lets us utilize our machines better by spreading load throughout the day, and it lets us scale jobs by executing chunks in parallel on different worker machines.
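The chunking idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production system: a thread pool in a single process stands in for the bank of worker machines, and `process_chunk` is a hypothetical placeholder for a real housekeeping task.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(item_ids, chunk_size):
    """Split one large job into fixed-size chunks of work."""
    return [item_ids[i:i + chunk_size]
            for i in range(0, len(item_ids), chunk_size)]


def process_chunk(chunk):
    # Hypothetical placeholder: a real task would touch the database
    # for only the rows in this chunk.
    return sum(chunk)


def run_job(item_ids, chunk_size=1000, workers=4):
    """Run every chunk on a pool of workers and collect results in chunk order."""
    chunks = split_into_chunks(item_ids, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

Because each chunk is small and independent, chunks can be spread over time or over machines, which is what smooths out the load spike that one monolithic nightly job would cause.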

Since we had used RabbitMQ before, we considered using it as the building block of our worker queue system as well. However, we quickly found that RabbitMQ falls short in two important respects:

  • RabbitMQ is essentially a generic message bus. This means we would have needed to extend its functionality to make it a full-fledged worker queue system.
  • Once a task is added to a RabbitMQ queue, there is no native way of inspecting that task while it is queued. That means we would not be able to check which worker is handling a task, dedupe tasks, and more.
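To make the second point concrete, here is a minimal in-memory sketch of the kind of capabilities we mean: deduplicating tasks by key and inspecting a task's status and worker. It is purely illustrative. The `InspectableQueue` class and its method names are hypothetical; this is neither RabbitMQ's API nor the system we ended up building.

```python
from collections import OrderedDict


class InspectableQueue:
    """Toy task queue that supports dedupe by key and per-task inspection."""

    def __init__(self):
        # task key -> {"payload", "status", "worker"}, kept in insertion order
        self._tasks = OrderedDict()

    def enqueue(self, key, payload):
        """Add a task; return False if a task with this key already exists."""
        if key in self._tasks:
            return False
        self._tasks[key] = {"payload": payload, "status": "queued", "worker": None}
        return True

    def claim(self, worker_id):
        """Hand the oldest queued task to a worker; return its key or None."""
        for key, task in self._tasks.items():
            if task["status"] == "queued":
                task["status"] = "running"
                task["worker"] = worker_id
                return key
        return None

    def inspect(self, key):
        """Return a copy of the task's state, or None if it is unknown."""
        task = self._tasks.get(key)
        return dict(task) if task else None

    def complete(self, key):
        """Remove a finished task so the same key can be enqueued again."""
        self._tasks.pop(key, None)
```

For example, enqueueing the same key twice returns `False` the second time, and `inspect` reports which worker claimed a task while it is running.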