Respondents & Extrapolation

Respondents are not the same as AI twins, and the mechanism that makes surveys statistically meaningful is not the same as what makes a twin intelligent.

Respondents are light twins

Respondents are lightweight, minimal versions of twins used exclusively for quantitative (quant) research: surveys. They are created in bulk, answer structured questions, and are deleted after the survey run. They are not conversational; they do not have the full memory and persona depth of an AI twin.

Qualitative (qual) work, such as conversations, focus groups, and in-depth interviews, is done with AI twins. Quantitative (quant) work, meaning surveys with statistical distributions, is done with respondents.

Why extrapolation is necessary: the denominator problem

Meta covers roughly 60–70% of the US population. When you pull a cohort from Meta (say, a segment of 80 million people), the true US population of that behavioral type is larger: at 60–70% coverage, roughly 115–135 million. The Meta number is the denominator you have; the real-world number is the denominator you need for a representative survey.
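The correction is a scale-up by the coverage rate. Here is a minimal sketch that assumes Meta's 60–70% coverage applies uniformly to the segment. That is a simplification: a segment Meta under-covers would scale up further. `estimate_true_population` is a hypothetical helper, not a platform API.

```python
def estimate_true_population(meta_count: float, meta_coverage: float) -> float:
    """Scale a Meta cohort size up to an estimated US population size.

    Assumes (hypothetically) that Meta's coverage rate applies uniformly
    to the segment being scaled.
    """
    if not 0 < meta_coverage <= 1:
        raise ValueError("meta_coverage must be in (0, 1]")
    return meta_count / meta_coverage


if __name__ == "__main__":
    segment_on_meta = 80_000_000
    for coverage in (0.60, 0.70):
        estimate = estimate_true_population(segment_on_meta, coverage)
        print(f"coverage {coverage:.0%}: about {estimate / 1e6:.0f} million")
```

This prints about 133 million at 60% coverage and about 114 million at 70%.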

If you run a survey against a Meta cohort without correcting for this gap, your results will reflect Meta’s demographic skews rather than the true population distribution. Certain age brackets, income groups, or gender splits may be over- or under-represented on Meta relative to their actual share of the population.
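One way to quantify the skew is a representation index: a group's share of the Meta cohort divided by its ACS share. The following is a minimal sketch. The function name and all shares are hypothetical illustrations, not real Meta or census figures.

```python
from __future__ import annotations


def representation_index(meta_shares: dict[str, float], acs_shares: dict[str, float]) -> dict[str, float]:
    """Return each group's Meta share divided by its ACS share.

    Above 1.0 means over-represented on Meta; below 1.0 means under-represented.
    Groups missing from meta_shares are treated as having a 0.0 share.
    """
    return {group: meta_shares.get(group, 0.0) / acs_shares[group] for group in acs_shares}


if __name__ == "__main__":
    # Hypothetical shares for illustration only; not real Meta or ACS figures.
    meta = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}
    acs = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}
    for group, index in representation_index(meta, acs).items():
        print(f"{group}: {index:.2f}")
```

With these made-up numbers, the indices print as 1.33, 1.06, and 0.68. The youngest bracket is over-represented and the oldest is under-represented, which is the distortion an uncorrected survey would inherit.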

ACS / US census data as the correction mechanism

To correct for the Meta-vs-reality gap, Consumr.AI uses ACS (American Community Survey) data: the US Census Bureau’s ongoing national demographic survey. ACS provides the ground-truth demographic distribution for the US: age brackets, gender, income ranges, and other characteristics at a population level.

When creating respondents for a survey, the platform superimposes the ACS demographic distribution onto the Meta cohort. Instead of accepting whatever age/gender/income mix happens to appear in Meta for a given segment, the system generates respondents in the proportions that match the census, so the survey sample reflects the real population, not the Meta population.
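Superimposing the census mix amounts to setting a quota for each demographic group and generating respondents to fill it. Below is a minimal sketch for a single dimension (age). It uses largest-remainder rounding so the quotas always sum exactly to the respondent count. The rounding rule, the helper names, and the shares are illustrative assumptions. They are not published ACS figures or the platform's actual method.

```python
from __future__ import annotations

import math
import random


def allocate_quotas(shares: dict[str, float], n: int) -> dict[str, int]:
    """Split n respondents across groups in proportion to shares.

    Uses largest-remainder rounding so the quotas always sum to exactly n.
    """
    total = sum(shares.values())
    exact = {group: n * share / total for group, share in shares.items()}
    quotas = {group: math.floor(value) for group, value in exact.items()}
    leftover = n - sum(quotas.values())
    # Hand the remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas


def build_respondents(shares: dict[str, float], n: int, seed: int = 0) -> list[dict[str, object]]:
    """Create n hypothetical respondent records whose age mix matches shares."""
    quotas = allocate_quotas(shares, n)
    brackets = [group for group, count in quotas.items() for _ in range(count)]
    random.Random(seed).shuffle(brackets)  # avoid ordering respondents by bracket
    return [{"respondent_id": i, "age_bracket": b} for i, b in enumerate(brackets)]


if __name__ == "__main__":
    # Hypothetical age shares standing in for an ACS table; not real census figures.
    acs_age_shares = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}
    print(allocate_quotas(acs_age_shares, 65))  # {'18-34': 20, '35-54': 21, '55+': 24}
    respondents = build_respondents(acs_age_shares, 65)
    print(len(respondents))  # 65
```

Real respondents would need to match several dimensions at once (age, gender, income). That requires a joint ACS cross-tabulation or an iterative method such as raking (iterative proportional fitting), which this sketch does not attempt.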

The ACS correction is not used for segment-building, which draws directly from Meta behavioral data. It is applied only when creating respondents for surveys.

How the distribution is determined

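Separately from the ACS demographic mix, respondent creation needs a distribution across customer types (for example, loyal versus likely to switch). That distribution starts as an LLM-based educated guess informed by: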

The category’s market penetration (how much of the population buys this type of product)
The brand’s estimated market share
The segment definition itself (who these people are)

This produces a starting-point distribution: for example, 48% loyal customers, 30% waiting to switch for a better rate. That distribution is editable. If a client has better data about their customer mix, they can adjust the distribution before the survey runs.
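The section does not say how a client edit is reconciled with the rest of the draft. One plausible approach is to pin the groups the client specifies and rescale the untouched groups so the shares still sum to 1. The sketch below follows that assumption. The `other` group, which covers the 22% the example leaves unassigned, and the helper name are both hypothetical.

```python
from __future__ import annotations


def apply_client_override(draft: dict[str, float], override: dict[str, float]) -> dict[str, float]:
    """Pin the client's shares and rescale the remaining groups to fill the rest.

    Untouched groups keep their relative proportions from the draft.
    """
    unknown = set(override) - set(draft)
    if unknown:
        raise ValueError(f"unknown groups: {sorted(unknown)}")
    if any(share < 0 for share in override.values()):
        raise ValueError("override shares must be non-negative")
    pinned = sum(override.values())
    if pinned > 1 + 1e-9:
        raise ValueError("override shares must sum to at most 1")
    free_total = sum(share for group, share in draft.items() if group not in override)
    if free_total == 0 and abs(pinned - 1) > 1e-9:
        raise ValueError("overrides must sum to 1 when no other groups remain")
    remaining = max(0.0, 1 - pinned)
    result = {}
    for group, share in draft.items():
        if group in override:
            result[group] = override[group]
        elif free_total:
            result[group] = remaining * share / free_total
        else:
            result[group] = 0.0
    return result


if __name__ == "__main__":
    # Hypothetical LLM draft; "other" is an assumed catch-all for the remaining 22%.
    draft = {"loyal": 0.48, "waiting_to_switch": 0.30, "other": 0.22}
    # A client with better data pins loyal customers at 60%.
    adjusted = apply_client_override(draft, {"loyal": 0.60})
    print({group: round(share, 3) for group, share in adjusted.items()})
    # {'loyal': 0.6, 'waiting_to_switch': 0.231, 'other': 0.169}
```

Rescaling proportionally keeps the LLM's relative judgment about the groups the client did not touch. Forcing the client to restate every share would be the stricter alternative.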

How many respondents are enough

For pattern detection, 60–70 respondents are sufficient. The standard data-science rule of thumb puts the minimum for a scatterplot at 30 data points; Consumr.AI uses 60–70 to provide headroom, so the distribution is clearly detectable rather than barely present.
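The 30-point rule is about seeing a pattern at all. One standard way to see what additional respondents buy is the normal-approximation margin of error for a proportion, which shrinks with the square root of the sample size. This is offered as an illustration, not necessarily how Consumr.AI sizes its runs, and it treats respondents as a simple random sample, which is an idealization. `margin_of_error` is a hypothetical helper.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation).

    p = 0.5 is the worst case, giving the widest interval for a given n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (30, 65, 10_000):
        print(f"n={n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

This prints roughly ±17.9, ±12.2, and ±1.0 points. The same formula applies within each demographic subgroup using that subgroup's own count, which is why precision across many subgroups calls for far larger runs.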

A full survey run may use up to 10,000 respondents when statistical precision across many demographic sub-groups is required. This is compute-intensive and is not done casually. After the survey completes, respondents are deleted. The survey results and the underlying reports are retained; the individual respondent objects are not.

See the Reports page for why this matters to research methodology.
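The respondent lifecycle described in this section (bulk creation, a survey run, retained results, deleted respondents) maps naturally onto a scoped resource. Below is a minimal sketch that assumes an in-memory store. `RespondentStore`, `survey_run`, and the placeholder answers are hypothetical, not Consumr.AI's API.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Respondent:
    respondent_id: int
    profile: dict[str, str]


@dataclass
class RespondentStore:
    """Hypothetical in-memory stand-in for wherever respondents are persisted."""

    respondents: dict[int, Respondent] = field(default_factory=dict)
    next_id: int = 0

    def create_bulk(self, profiles: Iterable[dict[str, str]]) -> list[Respondent]:
        created = []
        for profile in profiles:
            respondent = Respondent(self.next_id, profile)
            self.respondents[respondent.respondent_id] = respondent
            self.next_id += 1
            created.append(respondent)
        return created

    def delete(self, ids: Iterable[int]) -> None:
        for respondent_id in ids:
            self.respondents.pop(respondent_id, None)


@contextmanager
def survey_run(store: RespondentStore, profiles: Iterable[dict[str, str]]) -> Iterator[list[Respondent]]:
    """Create respondents for one survey run and always delete them afterwards."""
    respondents = store.create_bulk(profiles)
    try:
        yield respondents
    finally:
        store.delete(r.respondent_id for r in respondents)


if __name__ == "__main__":
    store = RespondentStore()
    profiles = [{"age_bracket": "18-34"}, {"age_bracket": "55+"}]
    results = []  # survey results live outside the store, so they survive deletion
    with survey_run(store, profiles) as respondents:
        for r in respondents:
            # Placeholder answer; a real run would collect structured survey responses.
            results.append({"age_bracket": r.profile["age_bracket"], "answer": "yes"})
    print(len(results), len(store.respondents))  # 2 0
```

Because deletion sits in `finally`, respondents are removed even if a run fails partway. The collected results are held outside the store and persist.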