The backend team is responsible for collecting massive amounts of data from fashion-related websites and loading it into our analytics data repository. It runs a complete data pipeline. The pipeline starts with hundreds of spiders (Python/Scrapyd) that continuously crawl websites. It continues with a series of data quality and data enrichment processes (Python, Celery, RabbitMQ, machine learning, SQL...). It ends by delivering clean, validated and normalized product information for our customers to analyze. For example, the pipeline enriches data with machine-learned models and uses Spark for massively parallel processing. It consists of many pieces that must be continuously monitored, scaled up and down, maintained, deployed, enhanced and adapted to new business needs. The database already contains more than 500 million products (refreshed daily), and we see 1-2 million new products every week.
Our motto is "We love data." We also love the technology that deals with data, because it enables us to do incredible things: things that are valuable to our customers and that sustain a business.
StyleSage is a company (no longer a startup) founded 6 years ago, with offices in New York and Madrid. Madrid is home to our core technical team of around 20 people. It's an open, diverse and inclusive team of highly skilled and talented individuals who are happy to collaborate, share knowledge and enjoy building great software together. We look forward to welcoming new members to the team.