Sampling & Sample Size

A sample is a subset of a population that, if drawn correctly, reflects the behavior of the whole. You do not need to survey every person in the country; you need enough of the right people to get a reliable picture.

Size is relative to the population

There is no single “correct” sample size. The right number depends on the population you are trying to represent. A sample that works well for a national study may be far too small for a niche segment and larger than necessary for a specific city.

Roughly 10,000 respondents is sufficient for a national study in the United States. The same figure applies to India, even though India’s population is approximately 1.4 billion. Once a population is far larger than the sample drawn from it, the size of that population has very little effect on precision. What matters is that the sample is representative of the variation within the population.
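To make the point about population size concrete, here is a minimal sketch using the textbook margin-of-error formula for a proportion with a finite population correction. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function name and the population figures are illustrative assumptions, not values or methods taken from the platform.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion, as a fraction.

    Assumes simple random sampling. The second factor is the finite
    population correction, which approaches 1 as the population grows.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

# Rough, illustrative population figures (assumptions for this example).
populations = {
    "Mid-sized city": 500_000,
    "United States": 335_000_000,
    "India": 1_400_000_000,
}

for label, size in populations.items():
    moe = margin_of_error(10_000, size)
    print(f"{label}: +/-{moe * 100:.2f} percentage points")
```

Under these assumptions, 10,000 respondents give a margin of error of about 0.98 percentage points for both the United States and India, and about 0.97 for the city. The population changes by orders of magnitude while the precision barely moves.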

The 500,000 significance floor for audiences

For audience-based research, as opposed to surveys, the platform requires a minimum audience size before it can extract reliable behavioral signals. That floor is approximately 500,000 people.

Below that threshold, the audience is too small to produce behavioral signals that hold up statistically. This is not an arbitrary cutoff; it reflects the minimum needed to observe consistent patterns across enough data points to distinguish real signal from noise.

When you are building custom audiences from an RFM-segmented database and then expanding them via lookalikes, this floor tells you how far to broaden the lookalike percentage. A 1% lookalike from a small seed list may still fall below 500,000. In that case, you either widen the lookalike or combine tiers.
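The check described above can be sketched as a small helper that picks the narrowest lookalike percentage whose estimated size clears the floor. Everything here is hypothetical: the function name, the percentages, and the size estimates are made up for illustration, and real estimates would come from whatever the ad platform reports for each lookalike tier.

```python
AUDIENCE_FLOOR = 500_000  # approximate floor stated in this section

def smallest_lookalike_meeting_floor(estimated_sizes, floor=AUDIENCE_FLOOR):
    """Return the smallest lookalike percentage whose estimated audience
    size is at least `floor`, or None if no listed percentage qualifies.

    `estimated_sizes` maps lookalike percentage -> estimated audience size.
    """
    for percentage in sorted(estimated_sizes):
        if estimated_sizes[percentage] >= floor:
            return percentage
    return None

# Hypothetical size estimates for one seed list (not real platform data).
estimates = {1: 210_000, 2: 420_000, 3: 630_000, 5: 1_050_000}

print(smallest_lookalike_meeting_floor(estimates))  # prints 3
```

If the helper returns None, none of the listed percentages reaches the floor, which is the situation where widening further or combining tiers comes into play.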

Minimum data points for pattern detection

The significance floor is different from the minimum needed to detect a pattern in a scatterplot or regression. For that kind of analysis, the working minimum is 30 data points, a widely used rule of thumb for the point at which a pattern can be tested statistically. In practice, the platform targets 50 to 60 matched respondents (referred to as “twins”) when building from a scatterplot, which leaves a comfortable margin above that minimum.

30 points → enough to see whether a relationship exists.
500,000 people → enough to call a behavioral audience significant.
~10,000 respondents → enough for a nationally representative survey.
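To see what the 30-point minimum buys, and why 50 to 60 points is more comfortable, the sketch below estimates the weakest correlation that can be told apart from zero at roughly 95% confidence. It uses the Fisher z approximation, a standard textbook shortcut; this is an illustration under that assumption, not a description of how the platform tests scatterplot patterns, and the function name is hypothetical.

```python
import math

def min_detectable_correlation(n, z=1.96):
    """Smallest |r| distinguishable from zero at about 95% confidence.

    Uses the Fisher z approximation: atanh(r) has standard error
    1 / sqrt(n - 3), so the cutoff in r is tanh(z / sqrt(n - 3)).
    """
    if n <= 3:
        raise ValueError("the approximation needs more than 3 data points")
    return math.tanh(z / math.sqrt(n - 3))

for n in (30, 55):
    print(n, round(min_detectable_correlation(n), 2))
```

Under this approximation, 30 points can only confirm relationships with a correlation of about 0.36 or stronger, while 55 points bring that cutoff down to about 0.27. More points let weaker relationships be detected reliably.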

Census representation within samples

A sample is only as good as its composition. If your respondents all come from one demographic group, the sample tells you about that group, not the population. Consumr.AI uses ACS (American Community Survey) and US census distributions to ensure respondent pools match the real demographic breakdown of the population being studied: age, gender, income, and other key dimensions.

This is covered in detail in Extrapolation & Census Data.
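One common way to bring a respondent pool in line with census distributions is post-stratification weighting, where each group is weighted by its population share divided by its sample share. The sketch below illustrates the idea only. It is an assumption that weighting is the relevant technique here (quota sampling is another option), and the age bands, counts, and population shares are hypothetical, not ACS figures.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return a weight per group: population share / sample share.

    A weight above 1 means the group is under-represented in the sample;
    a weight below 1 means it is over-represented.
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, population_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        if sample_share == 0:
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = population_share / sample_share
    return weights

# Hypothetical respondent counts and population shares by age band.
sample_counts = {"18-34": 500, "35-54": 300, "55+": 200}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

for group, weight in poststratification_weights(sample_counts, population_shares).items():
    print(f"{group}: weight {weight:.2f}")
```

In this made-up pool, younger respondents are over-represented and receive a weight of 0.60, while the 55+ group is under-represented and receives a weight of 1.75. A group with no respondents at all cannot be fixed by weighting, which is why the helper raises an error in that case.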

Why this matters for how you work

When a client asks, “How many people did you survey?”, you need to explain why that number is sufficient for the population in question. Being able to explain why 10,000 respondents is statistically sound for a US national study, regardless of how large that population is, is a core part of presenting research results credibly.

See How to Read and Interpret Results for how sample size appears in the platform’s output, and Margin of Error & Confidence Interval for how sample size connects to the error ranges you report.