
The chip that skips the bottleneck: what Olix means for AI founders building in 2026

Andrew Williams · 7 min read

You have shipped an AI-powered product. Usage is growing. So is the bill. Every new user compounds a cost you cannot easily negotiate down, because the price you pay sits on top of a market where the underlying supply is already spoken for.

That is not a vendor problem. It is a supply chain problem. And most founders building on LLMs in 2026 have not fully priced in what that means.

This is a briefing on that problem, and on a credible signal that it has a structural solution coming.

The inference cost problem is worse than it looks

When people talk about AI compute costs, they usually reach for training as the headline. Training a frontier model costs tens of millions. True. But for founders building products on top of existing models, training costs are largely irrelevant. What you are paying for is inference: the compute consumed every time your product calls a model to do something useful.

Inference now accounts for 80 to 90% of total AI compute spend over a model's lifetime in production. Early-stage AI startups hitting meaningful usage are spending $10,000 to $30,000 per month on compute before they have reached Series A. At scale, the number does not plateau. It compounds.

The standard response is to optimise: smaller models, caching, batching, prompt compression. All of that helps. But there is a ceiling on what optimisation can do when the underlying infrastructure is structurally expensive. And right now, it is structurally expensive because of a supply crisis most founders have not looked closely at.

What HBM is and why its shortage is your problem

High Bandwidth Memory, HBM, is the specialised memory that sits on modern AI accelerators. It is what allows a GPU to move vast amounts of data to and from its processing cores fast enough to run matrix multiplications at useful speeds. Without it, the chip throttles. There is no viable substitute in conventional GPU architecture.

The problem is that HBM is extraordinarily difficult to manufacture. SK Hynix and Micron between them produce most of the world's supply, and as of early 2026, their entire year's production is already allocated. Sold out. Memory prices in some segments have more than doubled since February 2025. The hyperscalers locked in supply first. Everyone further down the stack is paying spot prices on a constrained market, and those costs flow directly into the inference pricing you see on your cloud bill.

Your GPU bill is not just expensive because inference is computationally intensive. It is expensive because the chip that does the inference depends on a memory technology that the market cannot produce fast enough to meet demand.

Olix and the OTPU

This is where Olix becomes relevant. The London-based startup raised $220m in February 2026 and crossed a $1bn valuation, making it one of the fastest unicorns the UK has produced. The thing it is building is called the Optical Tensor Processing Unit, the OTPU, and it attacks the inference cost problem from a direction that is architecturally different from anything shipping today.

The OTPU uses photonic interferometers to perform matrix multiplications using light rather than electrical logic gates. That shift in physics produces an efficiency figure that is worth sitting with: 10 watts per PetaFLOP, versus approximately 300 watts for current top-tier GPUs. A 30x improvement in energy efficiency per unit of useful work.
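To see what that ratio means in raw energy terms, a back-of-envelope helps. The 10 W and 300 W figures are the ones quoted above; the sustained throughput and electricity price in the sketch below are illustrative assumptions, not Olix numbers.

```python
# Back-of-envelope: what a 30x efficiency gap means in raw energy terms.
# The 10 W and 300 W per-PetaFLOP figures come from the article; the
# workload size and electricity price below are illustrative assumptions.

GPU_WATTS_PER_PFLOP = 300       # quoted figure for current top-tier GPUs
OTPU_WATTS_PER_PFLOP = 10       # quoted figure for the OTPU
SUSTAINED_PFLOPS = 5            # assumed average inference throughput
ELECTRICITY_USD_PER_KWH = 0.12  # assumed industrial electricity price
HOURS_PER_MONTH = 730

def monthly_energy_cost(watts_per_pflop: float) -> float:
    """Monthly electricity cost for the assumed sustained workload."""
    kw = watts_per_pflop * SUSTAINED_PFLOPS / 1000
    return kw * HOURS_PER_MONTH * ELECTRICITY_USD_PER_KWH

gpu_cost = monthly_energy_cost(GPU_WATTS_PER_PFLOP)
otpu_cost = monthly_energy_cost(OTPU_WATTS_PER_PFLOP)
print(f"GPU energy:  ${gpu_cost:,.0f}/month")
print(f"OTPU energy: ${otpu_cost:,.0f}/month")
print(f"Ratio: {gpu_cost / otpu_cost:.0f}x")
```

Energy is only one line in what a cloud provider charges you, so this is not your bill shrinking 30x overnight. But the ratio is the point: the same useful work, done with light, draws a thirtieth of the power.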

More important for the supply chain argument: the OTPU replaces HBM with SRAM integrated into the photonic architecture. It does not just improve on HBM. It removes the dependency entirely. The bottleneck is not reduced. It is sidestepped.

This is not a GPU killer story. It is a structural alternative story. The compute oligopoly that currently controls AI inference infrastructure has a genuine weakness, and photonics is the credible direction from which that weakness is being exploited.

James Dacombe, who is 25, founded the company (originally registered as Flux Computing in March 2024) while simultaneously running CoMind, a brain monitoring startup that has itself raised $100m. The Sifted profile is worth reading not because Dacombe's age is the story, but because the ambition density is a useful signal about where serious technical talent in the UK is directing itself right now. Deep hardware. Long cycles. Infrastructure bets that only pay if you are right about where compute is going.

What founders should actually do now

The OTPU ships in 2027. That means this piece would be useless if it ended here. The relevant question for a founder managing inference costs today is not whether to wait for photonics. It is what the structural context changes about how you make infrastructure decisions now.

Three things worth doing.

Treat inference spend as a first-class metric. Most early-stage AI products track inference costs as part of COGS and leave it there. That is not granular enough. Break it down by feature, by user segment, by model version. Understand which parts of your product are generating the spend. You cannot make smart architectural decisions about what to optimise or replace without that visibility. If you are not already doing this, the question of how to price your SaaS becomes very difficult to answer when a core input cost is opaque.
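In practice this is less work than it sounds: tag every model call with the dimensions you care about and aggregate. A minimal sketch, assuming hypothetical feature names and placeholder per-token prices (use your provider's actual rate card):

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative per-1K-token prices; replace with your provider's real rate card.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}

@dataclass
class InferenceCall:
    feature: str        # which product feature triggered the call
    user_segment: str   # e.g. "free", "pro", "enterprise"
    model: str
    tokens: int

def cost_of(call: InferenceCall) -> float:
    return call.tokens / 1000 * PRICE_PER_1K_TOKENS[call.model]

def breakdown(calls: list[InferenceCall], key: str) -> dict[str, float]:
    """Aggregate inference spend by any tagged dimension."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += cost_of(call)
    return dict(totals)

calls = [
    InferenceCall("summarise", "free", "large-model", 4200),
    InferenceCall("summarise", "pro", "large-model", 9100),
    InferenceCall("autocomplete", "pro", "small-model", 61000),
]
print(breakdown(calls, "feature"))       # spend per feature
print(breakdown(calls, "user_segment"))  # spend per segment
print(breakdown(calls, "model"))         # spend per model
```

Once the tags exist, the same records answer pricing questions too: you can see exactly which features and segments are eating margin.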

Diversify infrastructure vendors where you can. Most founders pick one cloud AI provider and stay there. That is convenient, but it concentrates supply chain risk. The HBM shortage is already forcing allocation decisions at the hyperscaler level that trickle down to availability and pricing for everyone below. Staying close to multiple providers, even at small scale, keeps options open as the compute landscape shifts. The stack decisions you make at MVP stage tend to harden faster than founders expect.
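One cheap way to keep that optionality is a thin interface between your product code and any single vendor's SDK, so that adding or swapping a provider is a config change rather than a rewrite. A minimal sketch, with placeholder provider classes standing in for real vendor SDKs:

```python
from typing import Protocol

class InferenceProvider(Protocol):
    """The only surface the rest of the product is allowed to see."""
    def complete(self, prompt: str, max_tokens: int) -> str: ...

# Placeholder adapters: in a real codebase each would wrap a vendor SDK.
class PrimaryCloudProvider:
    def complete(self, prompt: str, max_tokens: int) -> str:
        return f"[primary] completion for: {prompt[:30]}"

class SecondaryCloudProvider:
    def complete(self, prompt: str, max_tokens: int) -> str:
        return f"[secondary] completion for: {prompt[:30]}"

PROVIDERS: dict[str, InferenceProvider] = {
    "primary": PrimaryCloudProvider(),
    "secondary": SecondaryCloudProvider(),
}

def get_provider(name: str = "primary") -> InferenceProvider:
    # Swapping vendors becomes a one-line config change, not a refactor.
    return PROVIDERS[name]

print(get_provider("secondary").complete("Draft a reply to this support ticket", 200))
```

The point is not to run two vendors in production from day one. It is to keep the cost of switching small enough that you can actually exercise the option when the landscape shifts.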

Watch the photonics space actively. Olix is not alone. The investment flowing into alternative compute architectures in 2025 and 2026 is not noise. When a coherent technical alternative to GPU-based inference becomes commercially available, adoption will be faster than most people assume, because the demand-side pain is already severe. Founders who understand what they are buying from new providers, rather than defaulting to whatever the hyperscalers offer, will be better positioned to make that switch on their terms.

The broader point

The inference cost problem in 2026 is real, and it is structural. You are not just paying for compute. You are paying for compute that depends on a constrained memory supply chain that the largest players in the world have already claimed. Optimisation helps at the margin. A different architecture solves it at the root.

Olix is early evidence that the root solution is coming. The $220m raise, the technical specificity of the photonic approach, the HBM bypass, the 30x efficiency figure: these are not vaporware signals. They are the profile of a serious hardware bet that has attracted serious capital.

The meta-point is this. The infrastructure layer that your AI product sits on is being actively rebuilt. The teams who understand that, who watch where the compute is going and time their product architecture accordingly, are not just keeping costs down. They are building on the right foundations before the rest of the market catches up.

That is not a prediction about Olix specifically. It is a general principle about how infrastructure transitions work. The bottleneck shifts. The winners are the ones who saw it moving.


Sources: Data Center Dynamics, Sifted, SiliconAngle, TechBytes, EnkiAI.
