
Body 1: The AI

Models update on schedules you don't control. Behavior drifts between versions in ways that aren't always documented. The product your user tested last Tuesday might respond differently today.

Body 2: The User

People's mental models of AI are recalibrating in real time. Someone confused by ChatGPT six months ago might be a power user now. The "novice" and "expert" categories we rely on are less stable than they used to be.

twobody.ai

AI broke UX research.
Can AI fix it?

Experiments in studying systems that change with every interaction.

A research journal by Drew


The Problem

The thing we're trained to do doesn't quite work anymore

UX research has always assumed you're studying a stable system. You observe users, find patterns, make recommendations, ship changes. The product stays put long enough for your insights to matter.

With AI products, that assumption falls apart. Both sides of the equation are moving. The model gets updated. User expectations shift as they learn what AI can and can't do. By the time you've synthesized your findings, the thing you studied isn't quite the thing that's shipping.

This is a collection of experiments and thoughts on how UX research tools and practices might need to evolve.

Mapping possible approaches


CORE PROBLEM

Both the AI and user are moving targets

APPROACH 1

Scale the N

Compensate for instability with volume

APPROACH 2

Control AI variance

Hold the system constant during testing

APPROACH 3

Rethink what to measure

New metrics for both user and AI

APPROACH 4

Go upstream

Abstract each side, then compare

H1

Synthetic users find the same issues as real users

H2

Passive AI moderators can improve data quality

H3

We can detect when research is invalidated

H4

Trust calibration follows measurable patterns

H5

AI analysis works for structured questions

H6

Surfacing AI confidence improves user calibration

RELATED RESEARCH

📓 NotebookLM deep dive coming soon

Hypotheses in Detail

Experiments & collected thoughts

H1

Testing

AI-simulated users can identify the same major usability issues as real users for most routine evaluations.

Scale the N

If true, this would let us run cheap, fast sanity checks before investing in real user research. The interesting question isn't "does it work" but "where does it break down." My guess is it fails on anything involving trust, emotional response, or domain expertise.


H2

Exploring

AI moderators acting as passive observers, and removing the human moderator entirely in specific contexts, can improve data quality.

Scale the N Rethink what to measure

Two related ideas here. First: some participants perform more naturally without a human researcher watching. Second: AI moderators can adopt passive observer roles that humans simply cannot—infinitely patient, never reacting, intervening only when things go off the rails. The question is whether this captures what matters while reducing observer effects. I suspect the answer depends heavily on task type and participant comfort with AI.


H3

Queued

We can detect when an AI product has changed enough that previous research findings no longer apply.

Control AI variance

Right now, research invalidation is vibes-based. Someone notices the product feels different, maybe. What if we could instrument this—track behavioral signatures over time and flag when drift crosses a threshold? Not sure if it's possible, but worth exploring.
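If we did instrument it, the core loop might be nothing fancier than comparing a logged behavioral signature against a baseline window. A minimal sketch, assuming we log one scalar metric per session (response length, refusal rate, whatever proves stable enough to baseline); the function name, window sizes, and threshold are all illustrative assumptions, not a working system:

```python
from statistics import mean, stdev

def drift_flag(baseline, current, threshold=2.0):
    """Flag drift when the current window's mean metric shifts more than
    `threshold` baseline standard deviations. Purely illustrative: the
    metric, window sizes, and threshold are assumptions, not validated."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(current) - mu) / sigma > threshold

# e.g., mean response length per day, before and after a model update
baseline = [120, 115, 130, 125, 118]
drift_flag(baseline, [122, 127])   # small wobble: no flag
drift_flag(baseline, [300, 310])   # large shift: flag for re-validation
```

A real version would need multiple metrics and a drift measure that compares distributions rather than means, but even a crude check like this beats waiting for someone to notice the product feels different.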


H4

Queued

User trust calibration follows predictable patterns that can be measured longitudinally.

Rethink what to measure

People start with some mental model of what AI can do. They use it, get surprised (positively or negatively), and adjust. Over time, their expectations stabilize—or don't. If there's a pattern here, we could design for trust calibration instead of just measuring satisfaction at a point in time. The tricky part is that people are often bad at introspecting on their own mental models, so we'd need both self-reported measures and behavioral proxies—what people say they expect vs. how they actually behave.


H5

Exploring

AI analysis of qualitative data approaches human-level quality for well-structured research questions.

Scale the N Go upstream

"AI can analyze interviews" is too broad. The real question is: for which types of analysis, with what constraints, and how do you know when it's working? My hunch is it's good at finding patterns in explicit statements and bad at interpreting what people didn't say.


H6

Queued

Surfacing AI confidence signals to users improves their trust calibration.

Rethink what to measure

This connects the "measure the AI" angle with user outcomes. If the AI can tell you when it's uncertain—and you actually expose that to users—does their mental model calibrate better? This is a design intervention that could be tested empirically. The interesting questions: what form should confidence signals take? Do users actually attend to them? Does it help, or just create anxiety?
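One way to make "calibrates better" measurable: bucket users' stated trust against whether the AI was actually right, and track the gap over time. A toy version of expected calibration error; the function name, the 0-to-1 trust scale, and the example data are all hypothetical:

```python
def calibration_gap(trust_ratings, outcomes, n_bins=5):
    """Mean absolute gap between users' stated trust (0-1) and the AI's
    observed success rate within each trust bin. Smaller = better
    calibrated. Hypothetical sketch, not a validated instrument."""
    bins = [[] for _ in range(n_bins)]
    for t, ok in zip(trust_ratings, outcomes):
        i = min(int(t * n_bins), n_bins - 1)
        bins[i].append((t, ok))
    gap, total = 0.0, len(trust_ratings)
    for b in bins:
        if b:
            avg_trust = sum(t for t, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            gap += abs(avg_trust - accuracy) * len(b) / total
    return gap

# well-calibrated users: high trust when the AI succeeds, low when it fails
calibration_gap([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
# over-trusting users: confident even when the AI is usually wrong
calibration_gap([0.9, 0.9, 0.9, 0.9], [0, 0, 0, 1])
```

If surfacing confidence signals works, the gap should shrink for the group that sees them; if it just creates anxiety, trust drops without the gap closing.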
