How fit recommendation technology works: what to look for when evaluating platforms

Romney Evans

Co-founder & Chief Marketing Officer

May 8, 2026

You're three demos in. The decks look almost identical. Each vendor opens with the same category framing, runs the same kind of clean PDP widget, and lands on the same outcome promises: higher conversion, lower returns, AI-powered recommendations.

By the second hour, the question forming in the back of your head isn't which vendor is better. It's whether there's actually a difference.

There is. The performance gap between fit recommendation platforms is significant, and it traces back to one variable almost no vendor will lead with: the data the system is learning from. The marketing language across the category has converged. The underlying mechanics have not.

This post is a vendor-neutral framework for evaluating fit recommendation technology, sometimes searched as fit analytics or a clothing size predictor. It explains what these systems are actually doing, why some work meaningfully better than others, and the specific questions that surface the real difference between vendors before you've signed a contract.

What fit recommendation technology has to solve

Sizing in apparel and footwear is genuinely hard. Three structural problems combine to make a single, confident size recommendation difficult to produce.

The first is brand inconsistency. A shopper who wears a medium in one brand may need a small or a large in another. There is no universal sizing standard, and even within a single brand, fit varies across product lines, lasts, and silhouettes. The size label on the tag is not a reliable input for a recommendation system.

The second is shopper variation. Two people who wear the same nominal size can have completely different fit preferences and body characteristics. Some prefer relaxed fits, some prefer slim. Some have proportions that map cleanly to standard size grades, some don't. A good fit recommendation has to account for who the shopper is, not just what size they typed in once.

The third is the self-reporting gap. Most shoppers cannot accurately describe their own measurements or fit preferences. The shoppers most uncertain at the size selector, the ones the technology is supposed to help, are also the ones least able to provide accurate inputs to a system that asks them what size they wear.

A fit recommendation platform that doesn't solve all three of these simultaneously will produce inconsistent recommendations. The shoppers most in need of help will get the worst answers, and the retailer's return rate won't move.

The architectural fork in the road: survey-based vs outcome-based

Most fit recommendation platforms fall broadly into two architectural categories. They may look similar from the outside, but they behave very differently in production.

Survey-based platforms generate recommendations from shopper-reported inputs: height, weight, fit preferences, sometimes a few questions about how clothes have fit in the past. Fast to deploy, lightweight, and cheap to operate. They work well for shoppers who already know their size and fit preferences accurately.

Outcome-based platforms learn from real purchase and return data across a network of shoppers and brands. The recommendation is built from what comparable shoppers actually kept, calibrated against the specific brand and style on the page. The system doesn't depend on the shopper to self-report their measurements correctly.

We covered the mechanics of this distinction in detail in our recent post on how fit finder tools actually work. The short version, in evaluation terms: the architecture determines whether the recommendation is grounded in observed behavior or in what the shopper thought to type into a form. Both approaches are sold as AI-powered fit recommendations. They are not the same thing.

Why data scale is the variable that matters most

In an outcome-based system, the size of the underlying dataset is not a feature comparison line item. It is the variable that determines whether the recommendations are accurate.

These systems work by mapping a shopper's history against the histories of comparable shoppers across a network of brands and products. The wider the network, the more often the system has seen "a shopper with this profile, on this kind of product, in this brand, kept size X." When the network is small, the system has to extrapolate. When the network is large, it can ground the recommendation in observed behavior almost every time.
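To make that matching step concrete, here is a deliberately simplified sketch in Python. Nothing in it reflects any vendor's actual model; the outcome records, the similarity rule, and the thresholds are all invented for illustration:

```python
from collections import Counter

def recommend_size(shopper, outcomes, brand, product_type):
    """Toy outcome-based recommendation: among comparable shoppers who
    bought this brand and product type, return the size kept most often.

    `outcomes` is a list of dicts like:
      {"height_cm": 170, "weight_kg": 68, "brand": "A",
       "product_type": "jeans", "size": "M", "kept": True}
    """
    def comparable(o):
        # Crude similarity rule, for illustration only.
        return (abs(o["height_cm"] - shopper["height_cm"]) <= 5
                and abs(o["weight_kg"] - shopper["weight_kg"]) <= 5)

    kept = Counter(
        o["size"] for o in outcomes
        if o["kept"]
        and o["brand"] == brand
        and o["product_type"] == product_type
        and comparable(o)
    )
    if not kept:
        return None  # cold start: no comparable observed outcomes yet
    return kept.most_common(1)[0][0]
```

The point of the sketch is the shape of the logic, not the details: when the network of outcomes is large, the `comparable` pool is rarely empty and the answer is grounded in observed behavior; when it is small, the system falls into the cold-start branch and has to extrapolate.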

For retailers carrying brands that are smaller, newer, or non-standard, this matters disproportionately. A vendor with a narrow dataset can produce reasonable recommendations for the most-shopped silhouettes from the most-shopped brands, but break down on the long tail of your catalog. A vendor with a deep, broad dataset produces consistent recommendations across the full assortment.

When evaluating vendors, this is the kind of data foundation buyers should ask to see in concrete terms. Here is what a scaled outcome-based platform looks like in practice.

True Fit's Fashion Genome is built on 80 million plus registered shoppers, 60 million unique products, 91,000 plus brands, and $616 billion in analyzed transactions across nearly 20 years of purchase and return outcomes. According to Datanyze, True Fit holds 63% market share in fit prediction technology, larger than every other fit tech vendor combined.

Across the retailers we work with, the pattern that holds is this: a dataset of that size takes decades to assemble, and the recommendations it produces can't be approximated by a smaller system using a more sophisticated model.

The data is the model.

What a Fashion Genome is and how cross-brand calibration works

A Fashion Genome is a connected dataset of purchase and return outcomes that lets a fit recommendation system reason about how sizing translates across brands. It is the technical infrastructure that makes "I'm a medium in Brand A, what should I order in Brand B" a tractable question.

Here is how that works in practice. When a shopper has a profile with True Fit, their previous purchase and return outcomes across every retailer in the network are part of the calibration. If the shopper kept a medium in three athletic brands with similar fit characteristics, and is now on a product page for a fourth brand, the Fashion Genome maps the new product to the same region of the dataset and returns the size that comparable shoppers in that region kept most often. The recommendation is not derived from a size chart comparison. It is derived from outcomes.

For a new shopper without a profile, the system uses aggregate patterns from comparable shoppers to produce a starting recommendation, which improves as the shopper's own history builds.
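One way to picture cross-brand calibration is as placing every brand's sizes on a shared scale learned from outcomes, so a shopper's kept sizes in known brands locate them on that scale, and a new brand's nearest size becomes the recommendation. The sketch below is illustrative only; the brand names, scale values, and averaging rule are all invented, and a production system is far richer than a single number per size:

```python
# Hypothetical learned placements: brand -> {size: position on a shared scale}.
# In a real outcome-based system these placements would be learned from
# purchase-and-return data, not hand-written.
BRAND_SCALE = {
    "AthleticCo": {"S": 1.0, "M": 2.0, "L": 3.0},
    "RunnerCo":   {"S": 1.2, "M": 2.2, "L": 3.2},  # runs slightly small
    "NewBrand":   {"S": 0.8, "M": 1.8, "L": 2.8},  # runs slightly large
}

def cross_brand_recommendation(kept_sizes, target_brand):
    """kept_sizes: list of (brand, size) pairs the shopper bought and kept."""
    # Locate the shopper on the shared scale from observed outcomes.
    positions = [BRAND_SCALE[brand][size] for brand, size in kept_sizes]
    shopper_pos = sum(positions) / len(positions)
    # Pick the target brand's size nearest that position.
    sizes = BRAND_SCALE[target_brand]
    return min(sizes, key=lambda s: abs(sizes[s] - shopper_pos))
```

The sketch shows why the question "I'm a medium in Brand A, what should I order in Brand B" becomes tractable once both brands sit on the same outcome-derived scale: the translation is a lookup, not a size-chart comparison.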

Cross-brand calibration is the capability that distinguishes a connected outcome dataset from a single-retailer recommendation engine. Most retailer-side personalization tools can tell you what your existing customers bought before. They can't tell you what a first-time shopper kept across the rest of their wardrobe, or how a brand on your PDP relates to the brands the shopper already trusts. Resolving fashion's inconsistent sizing problem requires data that lives outside any single retailer's four walls.

How fit recommendation technology integrates with your stack

Implementation is where vendor evaluations often go off track. The vendor with the slickest demo can be the wrong fit if the integration model doesn't match how your team operates.

Buyers should ask which integration paths are available: PDP widget for any commerce platform, Shopify app for native Shopify deployments, API for headless and custom stacks, and, increasingly, MCP support for agentic commerce environments where AI shopping agents need fit data through a standard protocol. The right architecture depends on where your stack is today and where it's heading over the next 12 to 24 months.

True Fit supports each of these paths. The JavaScript widget deploys on any PDP and ties into the retailer's existing CMS. The Shopify app is available for merchants on Shopify Plus and standard Shopify. The API supports custom integrations and headless commerce stacks. And MCP Fit Intelligence makes True Fit's recommendations available to AI shopping agents through the Model Context Protocol, so fit data travels with the shopper into the agentic commerce experiences emerging now. Forward-looking buyers should weigh this heavily: an AI agent that can't reason about fit will guess, and the retailer never sees the missed sale.

The integration question that matters most for the buyer is what the vendor needs from your team during deployment, what ongoing maintenance looks like, and how new brand catalogs are onboarded. Some vendors require ongoing data work from the retailer to keep recommendations accurate. Outcome-based systems with broad networks tend to handle new brands and catalog changes more automatically because the system already has data on most brands the retailer is likely to add.
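For teams weighing the API path, the sketch below shows the general shape of a server-side integration in a headless stack. The endpoint, parameter names, and response fields are invented for the example and are not True Fit's actual API; consult the vendor's real API reference for the actual contract:

```python
import json
import urllib.parse

# Hypothetical fit API base URL, for illustration only.
FIT_API_BASE = "https://api.example-fit-vendor.com/v1/recommendations"

def build_fit_request(shopper_id, product_id, brand):
    """Assemble the request URL a PDP backend might send to a fit API."""
    params = urllib.parse.urlencode({
        "shopper_id": shopper_id,
        "product_id": product_id,
        "brand": brand,
    })
    return f"{FIT_API_BASE}?{params}"

def render_recommendation(api_response_json):
    """Turn a (hypothetical) fit API response into PDP widget copy."""
    data = json.loads(api_response_json)
    return f"Recommended size: {data['size']} ({data['confidence']:.0%} confidence)"
```

Whatever the real contract looks like, this is the evaluation point: the thinner the glue code your team has to own, and the less per-brand data work it implies, the lower the total cost of the integration over time.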

The six evaluation questions that separate vendors

If you're evaluating fit recommendation platforms, the marketing decks will sound similar. The answers to these six questions will not. For each one, the kind of answer the vendor gives matters as much as the answer itself.

1. Where does the recommendation data come from?

A strong answer names the data source explicitly: real purchase and return outcomes, the number of retailers contributing data, and how the network is closed-loop. A weak answer talks about "AI" or "machine learning" without identifying what the model is trained on. If the vendor cannot point to outcome data, the system is most likely running on shopper self-report.

Red flag: the vendor describes the model as AI-powered but cannot clearly explain what real outcomes the model is learning from.

2. How large is the dataset, and how is it growing?

Ask for the actual numbers: unique shoppers, brands and products, dollar value of transactions analyzed, years of history. A strong answer gives you those numbers immediately. A weak answer uses adjectives like "extensive" or "comprehensive." In this category, scale is the moat, and any vendor with real scale should be able to tell you exactly what theirs is.

Red flag: the vendor relies on adjectives instead of providing specific shopper, brand, product, and transaction counts.

3. How does the system handle a brand the dataset has never seen before?

This is often where survey-based vendors have the weakest answer. A strong answer describes how the network has already encountered most brands the retailer is likely to add, and the cold-start fallback for genuinely new brands. A weak answer is silence, vagueness, or a request to "send us the size charts" so the vendor can build out the brand manually. A retailer launching new brands quarterly should care a lot about this answer.

Red flag: the vendor needs the retailer to ship size charts so they can manually onboard each new brand.

4. What analytics does the retailer receive?

A strong answer returns shopper insight back to you: who is shopping, what fit preferences they exhibit, which brands convert with which shopper segments, and where return drivers concentrate. A weak answer is a black-box widget on the PDP with no retailer-side data. The vendor that gives you the recommendation but not the underlying intelligence is leaving meaningful value on the table.

Red flag: the vendor offers a recommendation widget on the PDP but no retailer-side dashboard, segmentation, or return-driver insight.

5. What does the privacy architecture look like?

A strong answer clearly explains whether personally identifiable information is used in recommendation logic, what data is stored, what data is excluded from the model, and how shopper privacy is protected end to end. APL, for example, specifically chose True Fit because the system learns from outcome patterns at population scale rather than from storing individual biometric data. As privacy regulations continue to tighten globally and shoppers scrutinize data practices more closely, the vendors that can articulate this architecture in concrete terms will continue to have the easier path.

Red flag: the vendor cannot articulate clearly which inputs are PII, which are excluded from recommendation logic, or how the data is stored and protected.

6. What does the implementation actually require?

Time to first recommendation, ongoing data work, catalog onboarding, and what happens when you add a new brand or product line. A strong answer matches what you'll see in the contract terms and the implementation guide. A weak answer is a frictionless demo with the real complexity hidden in onboarding.

Red flag: a frictionless demo with the real complexity buried in the implementation guide or onboarding contract. Always ask to see the runbook before signing.

These questions don't favor any particular vendor. They surface the real differences between vendor architectures so you can make a decision that matches your business, not the marketing.

The bottom line

Almost every vendor in this category describes themselves the same way. AI-powered. Built on data. Proven outcomes. The decks have converged because the marketing language has matured faster than the underlying technical differences have been explained in plain terms.

The question that actually separates vendors isn't whether they use AI. It's what the AI is learning from. A model trained on shopper self-reports can only recommend what shoppers think they want. A model trained on real purchase and return outcomes recommends what shoppers actually keep. One has a ceiling. The other compounds.

And the next phase of the category, the one that's already arriving, is fit recommendation traveling beyond the retailer's PDP into agentic commerce. The vendor whose data is large enough and structured enough to be the trusted fit layer for AI shopping agents will be the vendor whose recommendations travel with the shopper into wherever the next purchase decision happens. That is a question worth asking on every shortlist now, not three years from now.

Want to pressure-test your shortlist? Bring these six questions to a True Fit demo at truefit.com/get-started and we'll show how the Fashion Genome answers them against your catalog, shoppers, and categories.

Frequently asked questions

What is fit recommendation technology?

Fit recommendation technology is software that delivers a personalized size and fit recommendation to a shopper at the product page level. The output is typically a single recommended size, often with a confidence score. The two architectural approaches in market are survey-based (built on shopper-reported inputs) and outcome-based (built on real purchase and return data across a network of shoppers and brands).

What is fit analytics?

Fit analytics is a category term that refers to the use of data and AI to deliver fit recommendations and analyze fit-related shopper behavior. It also refers to a specific vendor in the category. When evaluating fit analytics platforms, the most important distinction is whether the recommendations are built on real purchase and return outcomes across a large network of shoppers and brands, or on shopper-reported survey inputs.

How is fit recommendation technology different from a clothing size predictor or size chart?

A size chart provides garment measurements. A clothing size predictor or fit recommendation platform provides a personalized size recommendation for a specific shopper, based on either self-reported inputs or outcome data. A size chart tells the shopper what the garment measures. Fit recommendation technology tells the shopper which size to order.

What is the difference between survey-based and outcome-based fit recommendation tools?

Survey-based tools generate recommendations from shopper-reported inputs (height, weight, fit preferences). Outcome-based tools generate recommendations from real purchase and return data across a large network of shoppers and brands. The mechanisms are different and the production accuracy is different, especially for shoppers who can't accurately self-report.

Why does data scale matter for fit recommendation accuracy?

Outcome-based systems work by matching a shopper to comparable shoppers in the network. A larger network means the system has seen more comparable cases and can ground the recommendation in observed behavior. Smaller networks have to extrapolate, which leads to inconsistent accuracy across the catalog, especially on brands and silhouettes outside the most popular tier.

What is a Fashion Genome?

A Fashion Genome is a connected dataset of real purchase and return outcomes across a network of shoppers, brands, and products. True Fit's Fashion Genome includes 80 million plus shoppers, 60 million unique products, 91,000 plus brands, and $616 billion in analyzed transactions. It enables cross-brand calibration: accurate size recommendations when a shopper moves between brands with different sizing conventions.

How does True Fit handle privacy in fit recommendations?

True Fit's recommendations are built on outcome data at population scale, not on personally identifiable information. APL specifically chose True Fit for this reason: the system learns from observed patterns across many shoppers, not from storing individual biometric data. This privacy architecture matters increasingly as privacy regulations tighten globally.

What questions should I ask a fit recommendation vendor?

Six questions surface the differences vendor decks usually conceal: Where does the recommendation data come from? How large is the dataset? How does the system handle new brands? What analytics does the retailer receive? What does the privacy architecture look like? What does implementation actually require? For each one, what matters is not just the answer but the specificity of the answer.

Related reading

How it works: True Fit's AI fit and sizing platform

https://www.truefit.com/how-it-works

How fit finder tools actually work, and what separates the ones that convert

https://www.truefit.com/blog/how-fit-finder-tools-work

True Fit conversational fit agent

https://www.truefit.com/conversation-fit-agent

True Fit MCP fit intelligence for agentic commerce

https://www.truefit.com/mcp-fit-intelligence

Resolving fashion's inconsistent sizing issue through data and AI

https://www.truefit.com/post/resolving-fashions-inconsistent-sizing-issue-through-data-ai