A large part of the StyleSage product is a complete data pipeline. It starts with hundreds of spiders (Python/Scrapy) that continuously crawl websites, continues with a series of data quality and data enrichment processes (Python, Celery, RabbitMQ, machine learning, SQL...) and ends by providing clean, validated and normalized product information for our customers to analyze in a beautiful platform. At every step of the pipeline, we are obsessed with ensuring the highest data quality, which is what we promise our customers.
The Data Collection (DC) team is responsible for the first step in that pipeline: crawling massive amounts of (raw) data from fashion e-commerce websites as well as other data sources. To achieve this, the DC team works with a dedicated team of QA reviewers and spider developers (a.k.a. spider-wo+men) and supports those teams with tools, frameworks, infrastructure and process automation.
Our motto is "We love data". And we love technology that deals with data because it enables us to do incredible things... things that are valuable for our customers and that sustain a business.
StyleSage was founded 8 years ago and is no longer a startup. We have offices in New York and Madrid, and Madrid is home to our core technical team. It's an open, diverse and inclusive team of very skilled and talented individuals who are happy to collaborate, share knowledge and enjoy building great software together. We look forward to welcoming new members to this team.