Data is the New Lego

When I was a child, I used to love playing with Lego, or “Legos” as my American friends often say; my brothers and I built spaceships and trucks and houses and animals. As time went on, our creations became more ambitious, functional, and lifelike. We could each have insisted our Lego was our own, but by pooling resources, we collectively went further. Family and friends gave us Lego, including unusual and hard-to-find bricks, which enabled us to make more accurate models. We were growing up too, and as our play became more sophisticated, we learned how to build better models.


I’m not young anymore and my bones creak on cold mornings, but I still remember playing with Lego as I go to work each morning and play with data to build models. Using data to solve real-world problems, like style, fit, and size recommendations, is surprisingly like my childhood Lego memories. To build something useful you need lots of data, data diversity, and the knowledge to build the right models in the right way.


If you don’t have enough Lego bricks, the things you build aren’t realistic; the model is crude, the colors don’t match, and there are gaps. It’s the same with machine learning and computer models; if you don’t have enough data, your models are crude, and you have quantitative and qualitative errors. The history of computer modeling is rife with examples of people making bad decisions using models made with incomplete data. In dealing with style, fit, and size recommendations, not enough data means giving bad advice because your models are too crude to accurately model people and garments. This is where pooling data wins; by pooling our Lego, my brothers and I could build what we wanted; in fashion, by pooling data from many retailers, you can build better models because you have a more complete picture of consumers’ behavior and the unique style characteristics, size, and shape of garments.


To build a good quality Lego model you need a diversity of pieces – models built with just the standard 2x4 bricks are crude and inaccurate. This is where getting Lego from friends and family was so useful – we got more diverse bricks that let us build more accurate models. In fashion, you need a diversity of data on people and garments too. Simply extrapolating from the average size to plus sizes is like using 2x4 bricks for everything; one size does not fit all and you end up with something that isn’t accurate for users who aren’t ‘average’.


Simply assuming US consumers and apparel are the same as German consumers and apparel is like using the same few Lego bricks for different models; different markets need different data. Simply believing a $20,000 dress fits the same as a $100 dress is like building Lego models when the special pieces you need are missing; it’s the kind of thing you do when you don’t have the data you need. In fact, having data on $100 and $20,000 dresses lets you build richer models that make better recommendations for all dresses. The key to good modeling is having data on a diverse set of consumers and garments.


Young children make crude Lego models; the colors don’t match and the shapes are wrong. Older children build working models with careful color schemes. A similar thing happens with data and algorithms. As you get to know and manipulate your data, your algorithms, and their interactions, you come to understand their limitations and you strive to build something better. As time goes by, increasing volumes of data point out the flaws in your work and you fix them – your models become better and better. In other words, the learning curve applies to building Lego and computer modeling.


It might be a brutal childhood truth, but the children with the most Lego, the best pieces, and the time to play produce the best models. The same brutal truth applies to any AI-based machine learning or computer modeling project. The projects with the biggest data volumes, the most diverse data, and the best teams to use that data will produce the most accurate models. That’s why it’s fun to play with the massive data set from True Fit’s fashion Genome: it includes data from the largest number of retailers and brands; there’s a diversity of country, people sizes, and garments; and my colleagues know what they’re doing. There’s the added benefit of doing something novel and helping people find clothes they’ll love, that suit their personal style preferences, and will fit and flatter them – Lego models only make a few people happy but style, fit, and size recommendations can make millions of people happier by helping them connect more easily with the clothes and shoes that better express who they are and how they feel. Coming to work each day, it’s like playing with the world’s largest Lego set and it makes me happy.


Sometimes late at night, when it’s quiet and there’s no-one around to judge, I quietly put together Lego models. It’s a consoling and comforting reminder of my childhood, like eating ice cream, playing chase, and England losing in the World Cup. Lego has taught me a lot about data and models and collaboration. But there’s one big difference between building Lego models with my brothers and building computer models with my colleagues: I don’t fight with my colleagues quite so often.


True Fit is determined to improve the customer shopping experience by using its rich data collection from thousands of brands to provide accurate size recommendations. A larger collection of Lego increases the size and scope of projects that can be built, just as a vast data collection increases the scope of customers who are provided with accurate style, fit, and size recommendations. True Fit’s data collection is called the Genome.


Written by: Mike Woodward, Head of Analytics and Insights at True Fit
Mike contributes to True Fit’s continued success as the Head of Analytics and Insights. He has spent a 20-year career analyzing, interpreting, and modeling data in different industries. Mike is published widely in academic journals and trade publications, has spoken at many international conferences, won the British Computer Society’s IT Award for Excellence, and has degrees from the University of Leicester, University of Portsmouth, University of Sheffield, and Harvard University Extension.