What happened
The company is framing the bottleneck clearly. If advanced systems are generating millions of tokens per hard task, then 40 tokens/sec is not just a latency problem. It becomes a product ceiling, a cost ceiling, and eventually an adoption ceiling.
That is why the stated ambition of moving long-context workloads toward roughly 1,200 tokens/sec matters. Faster inference is not only about snappier chat. It changes what kinds of agentic work are economically possible.
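To make the gap concrete, here is a back-of-the-envelope sketch of how long one task takes at each rate. The one-million-token task size is an illustrative assumption standing in for "millions of tokens per hard task", and the function name is hypothetical. The calculation also ignores prompt processing, queueing, and tool-call time, so treat it as a rough floor, not a benchmark.

```python
def generation_time_seconds(total_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit total_tokens at a steady decode rate (hypothetical helper)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return total_tokens / tokens_per_sec


# Assumption: one hard task generates 1,000,000 tokens in total.
TASK_TOKENS = 1_000_000

for rate in (40, 1200):  # the two rates quoted in the article
    secs = generation_time_seconds(TASK_TOKENS, rate)
    print(f"{rate:>5} tokens/sec: {secs / 3600:.2f} hours ({secs / 60:.1f} minutes)")
```

Under these assumptions the same task drops from roughly seven hours to about fourteen minutes, a 30x difference. That is the difference between a job you leave running overnight and one you can wait for.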
The useful question for builders: which workflows are currently impossible because the answer takes too long to produce?
Source
Reported in "Fractile raises US$ 220 million for AI inference hardware" via w.media, published May 15, 2026.