Sovereign AI

Case Study: MeinDienstplan Did the Math

Jun 11, 2026

Cloud AI pricing assumes you will never do the math.

The setup

MeinDienstplan runs an AI research pipeline: agents that go out, gather information, and synthesize it into usable answers. The kind of workload that, on a per-request basis, looks cheap until you multiply it by real volume.

They were running that pipeline on the Perplexity API. It worked. The results were good. But every research request was a metered call to someone else's infrastructure, priced on someone else's terms, and the bill grew with usage, the way cloud bills always do.

So they ran the numbers on a different question: what would it cost to run the same workload on their own hardware?

The result

The same research task. The same output quality. Two very different price tags.

Perplexity API: €2.00 per request. On-premise with Xinity: €0.06 per request.

That's a 97% reduction. Not a projection, not an "up to" marketing figure, but the measured cost of the same task, run two ways.

The migration itself was almost anticlimactic: three environment variables changed, zero code changes to the application. The pipeline didn't know the difference. The finance team did.

What's actually running

This wasn't a downgrade to a weaker model to chase savings. MeinDienstplan moved the workload onto Qwen 3.6 35B (A3), running locally on an ASUS Ascent GX10 AI Supercomputer deployed on their own premises through Xinity. Capable model, real hardware, sitting in their building rather than in a hyperscaler's region.

The point isn't that on-prem is cheaper in the abstract. It's that for a high-volume, repetitive AI workload, exactly the kind of thing AI agents generate, the per-request economics of metered cloud APIs stop making sense well before most teams stop to check. Cloud pricing is built for the case where you never do the math. Once you do, the curve crosses.

Why this matters beyond the bill

Cost is the headline, but it isn't the only reason this move makes sense, and for a lot of European companies, it isn't even the main one.

When MeinDienstplan's research pipeline ran through a cloud API, every query left the building. For a workforce-management company handling employee and scheduling data, that's not a neutral fact. It's a data-residency and GDPR question that doesn't go away no matter how good the API is. Running the model on-premise means the data never leaves. There's no third-country transfer to justify, no processor agreement to lean on, no question of whose jurisdiction the request lands in. The data stays where the company can actually account for it.

That's the quiet part of the sovereign-AI argument. The cost savings get attention because they're dramatic. But the structural reason on-prem wins for regulated and data-sensitive companies is that it removes the entire category of "where did our data go" questions, by never sending it anywhere.

If your company needs to go on-prem

If you're reading this because your own company needs GDPR-compliant AI infrastructure, data that stays in your building, run on models you control, at a cost you can predict, that's exactly the problem this solves. The MeinDienstplan migration is one example of a pattern: take a workload you're currently renting from a cloud API, run it on hardware you own, and watch both the bill and the compliance exposure drop at the same time.

You don't have to rebuild your stack to find out whether the math works for you. Most teams are surprised by how little changes in the application, and how much changes in the economics.

Xinity delivers sovereign, on-premise AI infrastructure to European enterprises: the hardware, the open-source software, and the support to run generative AI entirely on your own servers, so your data never leaves your building.