In short
- A new study shows that LLMs can mimic human purchase intention by mapping free-text responses to Likert ratings through semantic similarity.
- The method achieved 90% of human test-retest reliability when benchmarked against 9,300 real survey responses.
- The study raises questions about bias, generalization, and the extent to which ‘synthetic consumers’ can stand in for real people.
Forget focus groups: A new study finds that large language models can predict whether you’ll buy something with striking accuracy, dramatically outperforming traditional marketing tools.
Researchers from the University of Mannheim and ETH Zurich have found that large language models can mimic human purchase intention – the “How likely are you to buy this?” metric beloved by marketers – by turning free text into structured survey data.
In a paper published last week, the team introduced a method called “semantic similarity rating” (SSR), which converts the model’s open-ended responses into numerical Likert ratings on the five-point scale used in traditional consumer research.
Instead of asking a model to choose a number between one and five, the researchers had it respond naturally – “I would definitely buy this” or “Maybe if it was on sale” – and then measured how semantically similar these statements were to canonical responses like “I would definitely buy this” or “I wouldn’t buy this.”
Each response was mapped to the nearest reference statement in embedding space, effectively converting free-form LLM text into quantitative ratings. “We show that optimizing semantic similarity rather than numerical labels produces purchase intention distributions that closely match human survey data,” the authors wrote. “LLM-generated responses achieved 90% of the reliability of repeated human surveys, while retaining natural variation in attitudes.”
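To make that mapping step concrete, here is a minimal sketch under stated assumptions: a generic sentence-embedding model, illustrative anchor statements, and a hard nearest-anchor assignment. The model name, anchor wording, and argmax rule are placeholders, not the authors’ implementation.

```python
# Minimal sketch (not the paper's code): map a free-text answer onto a
# 1-5 Likert scale by finding the most similar anchor statement in
# embedding space. Model and anchors are illustrative assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model

# Hypothetical anchor statements, one per Likert point (1 = definitely not, 5 = definitely)
anchors = [
    "I would definitely not buy this.",
    "I would probably not buy this.",
    "I might or might not buy this.",
    "I would probably buy this.",
    "I would definitely buy this.",
]
anchor_vecs = model.encode(anchors, normalize_embeddings=True)

def likert_from_text(response: str) -> int:
    """Return the 1-5 rating whose anchor is most similar to the response."""
    vec = model.encode([response], normalize_embeddings=True)[0]
    sims = anchor_vecs @ vec  # cosine similarity, since vectors are normalized
    return int(np.argmax(sims)) + 1

print(likert_from_text("Maybe if it was on sale."))  # prints the Likert point of the nearest anchor
```

A softer variant could keep the full similarity distribution over anchors rather than a single winning point, which would preserve more of the response’s ambivalence.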
When tested against 9,300 real human survey responses about personal care products, the SSR method produced synthetic respondents whose Likert distributions closely mirrored the originals. In other words, when asked to “think like consumers,” the models did just that.
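For illustration, one simple way to quantify how closely two Likert distributions match is the total variation distance between their normalized histograms. The metric choice and the toy ratings below are assumptions for the sketch, not the study’s data or its actual evaluation.

```python
# Sketch of one way to compare synthetic and human Likert distributions.
# The metric (total variation distance) and the toy ratings are illustrative.
import numpy as np

def likert_distribution(ratings, points=5):
    """Normalized histogram over the 1..points Likert scale."""
    counts = np.bincount(ratings, minlength=points + 1)[1:]
    return counts / counts.sum()

human     = likert_distribution(np.array([1, 2, 3, 3, 4, 5, 5, 4, 3, 2]))
synthetic = likert_distribution(np.array([1, 2, 3, 4, 4, 5, 5, 4, 3, 2]))

tv_distance = 0.5 * np.abs(human - synthetic).sum()
print(f"Total variation distance: {tv_distance:.3f}")  # 0 means identical distributions
```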
Why it matters
The finding could change the way companies conduct product testing and market research. Consumer surveys are notoriously expensive, slow and vulnerable to bias. Synthetic respondents – if they behave like real respondents – could let companies screen thousands of products or messages at a fraction of the cost.
It also validates a deeper claim: that the geometry of an LLM’s semantic space encodes not only language comprehension but also attitudinal reasoning. By comparing answers within the embedding space rather than treating them as literal text, the research shows that model semantics can stand in for human judgment with surprising accuracy.
At the same time, the approach carries known ethical and methodological risks. The researchers tested only one product category and did not examine whether the same approach would hold for financial decisions or politically charged topics. And synthetic ‘consumers’ could easily become synthetic targets: the same modeling techniques could be used to optimize political persuasion, advertising, or behavioral nudging.
As the authors put it, “market-driven optimization pressures can systematically erode alignment” – a phrase that extends far beyond marketing.
A note of skepticism
The authors acknowledge that their testing domain – personal care products – is limited and may not generalize to high-stakes or emotionally charged purchases. The SSR mapping also depends on carefully chosen reference statements: small changes in their wording can distort the results. Furthermore, the study relies on human survey data as “ground truth,” even though such data is notoriously noisy and culturally biased.
Critics point out that embedding-based similarity assumes language vectors map neatly onto human attitudes, an assumption that can fail when context or irony enters the mix. The paper’s own reliability figure – 90% of human test-retest consistency – sounds impressive but still leaves room for significant deviation. In short: the method works on average, but it is not yet clear whether those averages reflect true human diversity or simply echo patterns in the model’s training data.
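As a rough illustration of what a figure like “90% of human test-retest reliability” can mean, the sketch below expresses it as a ratio of correlations across repeated survey waves. The numbers are invented placeholders, and the paper’s actual reliability measure may differ.

```python
# Illustrative only: one way to read "X% of human test-retest reliability"
# as a ratio of correlations. Toy numbers, not the study's data.
import numpy as np

def reliability(wave_a, wave_b):
    """Pearson correlation between two waves of ratings for the same items."""
    return np.corrcoef(wave_a, wave_b)[0, 1]

human_wave_1 = np.array([3.2, 4.1, 2.5, 3.8, 4.4])  # mean rating per product, wave 1
human_wave_2 = np.array([3.0, 4.0, 2.7, 3.9, 4.2])  # same respondents, retested
synthetic    = np.array([3.4, 3.8, 3.0, 3.3, 4.2])  # hypothetical LLM-generated ratings

human_reliability = reliability(human_wave_1, human_wave_2)
model_reliability = reliability(synthetic, human_wave_1)
print(f"Relative reliability: {model_reliability / human_reliability:.2f}")  # around 0.9 for these toy numbers
```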
The bigger picture
Academic interest in “synthetic consumer modeling” has surged in 2025 as companies experiment with AI-based focus groups and predictive polling. Similar work from MIT and the University of Cambridge has shown that LLMs can mimic demographic and psychometric segments with moderate reliability, but none of it has previously shown such close statistical agreement with real purchase intention data.
For now, the SSR method remains a research prototype, but it points to a future where LLMs may not only answer questions, but also represent the public itself.
Whether that is progress or a hallucination in the making is still up for debate.