The Data Science team is responsible for enriching the massive volumes of data that our crawlers collect from fashion-related websites. To do so, you will use our machine learning models. The data pipeline starts with hundreds of spiders (written in Python and run with Scrapyd) that continuously crawl websites. It continues with a series of data quality and data enrichment processes (Python, Celery, RabbitMQ, SQL, Keras, OpenCV) and ends by delivering clean, validated and normalized product information for our customers to consume.
One of the most important pieces of this pipeline is enriching the data with machine learning models. This adds information about each fashion item, such as its category (clothing, footwear, beauty…), gender, attributes and colors. The database already contains more than 500 million products and grows daily, and we process 1 to 2 million new products every week.
Our motto is "We love data". We also love technology that deals with data, because it enables us to do incredible things: things that are valuable to our customers and that can sustain a business.
StyleSage was established 8 years ago and has offices in New York and Madrid. Madrid is home to our core technical team, while New York hosts the business team. It's an open, diverse and inclusive team of very skilled and talented individuals who are happy to collaborate, share knowledge and enjoy building great software together. We look forward to welcoming new members to this team.