AI's Next Leap: Thinking Machines Unveils 'Interaction Models' to End 'Turn-Based' Chat
13 May, 2026
Artificial Intelligence
For years, interacting with AI has felt a bit like playing a game of digital tennis. You make a move (ask a question or give a command), you wait for the AI to return the ball (its answer), and then you prepare for the next volley. This "turn-based" system, while functional, often feels clunky and unnatural, especially as AI models become more sophisticated and capable of handling complex, multimodal inputs.
But what if AI could chat, see, and respond with the fluid, real-time rhythm of human conversation? That is the promise of Thinking Machines, a startup founded by AI heavyweights from OpenAI, which has just unveiled a preview of its "interaction models." The company presents this not as an incremental update but as a fundamental rethinking of how we interact with artificial intelligence.
Beyond the Turn: The 'Full-Duplex' AI Experience
The core innovation behind Thinking Machines' approach is "full-duplex" processing: input and output are handled simultaneously. Unlike conventional chat systems, which alternate between taking input and producing output, the new models are designed to listen, speak, and see concurrently. This means an AI could, for example, backchannel (saying "uh-huh" or "I see") while you're still speaking, or interject with a relevant observation based on a visual cue, all without missing a beat.
Thinking Machines tackles the "collaboration bottleneck" of turn-based exchange by moving away from the standard alternating token sequence, in which user input and model output take turns. Instead, it uses a multi-stream, micro-turn design that processes information in 200ms chunks. This allows the AI to perceive and react to the world in real time, mirroring human conversational patterns more closely than turn-based systems can.
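To make the micro-turn idea concrete, here is a minimal sketch of how two streams might be cut into aligned 200ms windows. Only the 200ms figure comes from the announcement; the MicroTurn type, the function names, and the 16 kHz sample rate used in the usage note are hypothetical choices for illustration, not Thinking Machines' actual design.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterator, List

CHUNK_MS = 200  # micro-turn length reported in the article


@dataclass
class MicroTurn:
    """One 200ms window holding both streams (hypothetical structure)."""
    index: int
    start_ms: int
    user_audio: List[int]   # samples heard during this window
    model_audio: List[int]  # samples the model emitted during this window


def chunk(samples: List[int], sample_rate_hz: int,
          chunk_ms: int = CHUNK_MS) -> Iterator[List[int]]:
    """Split a sample list into consecutive windows of chunk_ms each."""
    per_chunk = sample_rate_hz * chunk_ms // 1000
    for i in range(0, len(samples), per_chunk):
        yield samples[i:i + per_chunk]


def interleave(user: List[int], model: List[int],
               sample_rate_hz: int) -> List[MicroTurn]:
    """Pair up the two streams window by window instead of turn by turn."""
    pairs = zip_longest(chunk(user, sample_rate_hz),
                        chunk(model, sample_rate_hz))
    return [
        MicroTurn(i, i * CHUNK_MS, list(u or []), list(m or []))
        for i, (u, m) in enumerate(pairs)
    ]
```

With an assumed 16 kHz sample rate, each window holds 3,200 samples per stream. The point of the sketch is that both streams advance together in every window, so neither side has to wait for the other to finish a full turn.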
The Dual-Model Architecture for Seamless Interaction
To achieve this level of responsiveness without sacrificing deep reasoning capabilities, Thinking Machines has architected a clever two-part system:
The Interaction Model: This is the AI's conversational "face." It handles the moment-to-moment dialogue, maintains awareness of presence (who is interacting and what's happening around them), and manages immediate follow-ups.
The Background Model: This is the powerhouse for complex tasks. It asynchronously handles sustained reasoning, web browsing, or executing complex tool calls, seamlessly feeding its results back to the Interaction Model to be woven into the conversation naturally.
This separation allows the AI to perform tasks like live translation or generating data visualizations while still actively listening to your feedback, a capability vividly demonstrated in their preview video.
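The division of labor described above can be sketched with Python's asyncio: a fast loop keeps acknowledging the user while a slower task runs in the background and reports back when it finishes. Everything here (the function names, the sleep durations, the transcript format) is a hypothetical stand-in; the preview does not publish an API.

```python
import asyncio
from typing import List


async def background_model(task: str) -> str:
    """Stand-in for slow work such as reasoning, browsing, or tool calls."""
    await asyncio.sleep(0.05)
    return f"result for {task!r}"


async def interaction_model(user_chunks: List[str]) -> List[str]:
    """Keep responding to the user while the background task is running."""
    transcript: List[str] = []
    pending = asyncio.create_task(background_model("plot sales data"))

    for chunk in user_chunks:
        await asyncio.sleep(0.02)            # one slice of live conversation
        transcript.append(f"ack: {chunk}")   # immediate, lightweight reply
        if pending is not None and pending.done():
            # Weave the finished result into the ongoing conversation.
            transcript.append(f"background: {pending.result()}")
            pending = None

    if pending is not None:                  # result arrived after the last chunk
        transcript.append(f"background: {await pending}")
    return transcript


def run_demo() -> List[str]:
    return asyncio.run(
        interaction_model(["hi", "make it blue", "thanks", "bye"])
    )
```

Calling run_demo() returns the four acknowledgements plus one background result, placed wherever the slow task happened to finish. The conversation never blocks on the slow task, which is the property the two-model split is meant to provide.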
Performance That Speaks Volumes
Thinking Machines backs these claims with results on FD-bench, a benchmark designed to measure interaction quality. The figures it reports are striking:
Responsiveness: The TML-Interaction-Small model achieved a turn-taking latency of just 0.40 seconds, roughly 30% lower than Gemini-3.1-flash-live (0.57s) and about 66% lower than GPT-realtime-2.0 (1.18s).
Interaction Quality: On FD-bench V1.5, the model scored 77.8, nearly double the scores of its closest rivals.
Visual Proactivity: In specialized visual tests, the model demonstrated an ability to engage with and understand visual input in real-time, a feat where other models faltered.
This leap in performance is attributed to techniques like encoder-free early fusion, which processes raw audio and image data directly through a lightweight embedding layer, rather than relying on massive, separate encoders.
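As a rough illustration of what encoder-free early fusion could look like, the sketch below cuts raw audio samples and raw pixel values into small patches, maps each patch through a single linear projection into a shared embedding size, and concatenates the results into one token sequence. The patch sizes, the embedding width, the random weights, and all function names are assumptions made for the example; the announcement specifies only a "lightweight embedding layer."

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def make_projection(in_dim: int, out_dim: int, seed: int) -> Matrix:
    """Random linear map standing in for a learned embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def embed(patch: Vector, weights: Matrix) -> Vector:
    """Project one raw patch into the shared embedding space."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def patches(values: Vector, size: int) -> List[Vector]:
    """Cut raw values into fixed-size patches, dropping a partial tail."""
    return [values[i:i + size]
            for i in range(0, len(values) - size + 1, size)]


def early_fuse(audio: Vector, pixels: Vector, d_model: int = 8,
               audio_patch: int = 4, image_patch: int = 6) -> List[Vector]:
    """Embed both modalities directly and merge them into one sequence."""
    w_audio = make_projection(audio_patch, d_model, seed=0)
    w_image = make_projection(image_patch, d_model, seed=1)
    tokens = [embed(p, w_audio) for p in patches(audio, audio_patch)]
    tokens += [embed(p, w_image) for p in patches(pixels, image_patch)]
    return tokens
```

The design choice this illustrates is that no large modality-specific encoder sits between the raw signal and the main model: each modality gets only a thin projection, and a single model sees audio and image tokens side by side from its first layer.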
Transforming Industries: The Enterprise Impact
While the models are currently in a limited research preview, their potential impact on enterprises is enormous. Imagine:
Manufacturing & Labs: AI monitoring video feeds and proactively alerting workers to safety violations or protocol deviations in real-time, not after a full turn is completed.
Customer Service: Support bots offering natural, conversational assistance with near-instantaneous responses and seamless live translation, eliminating the frustrating processing delays of current systems.
Time-Sensitive Operations: AI natively understanding and managing time-sensitive processes, crucial for industrial maintenance, pharmaceutical research, and more.
This "native interactivity" is a game-changer, moving AI from a passive tool to a truly integrated collaborator.
A Look Back at Thinking Machines
Founded in early 2025 by a team including former OpenAI CTO Mira Murati, Thinking Machines has rapidly gained significant traction. They previously launched Tinker, an API for fine-tuning language models, and have secured substantial funding, raising approximately $2 billion at a $12 billion valuation. The company has also been active in building its compute infrastructure, forging partnerships with Nvidia and Google Cloud.
While the flow of talent between Thinking Machines and tech giants like Meta has been notable, the company continues to attract top-tier researchers, reinforcing its position at the forefront of AI innovation. Its commitment to open-source components in past releases suggests that broader community access to these new models may follow.
With "interaction models," Thinking Machines is not just building smarter AI; they are building AI that collaborates more effectively, ushering in an era where the boundaries between human and machine interaction blur into a more natural, intuitive, and powerful exchange.