The world is circular. Kubernetes made developers forget about infrastructure and helped enable AI. Now, AI is making them remember. Hardware—which cloud relegated to the back room—is now in the spotlight again.
At KubeCon + CloudNativeCon this week, attendees kept arriving at the same conclusion about Kubernetes and AI: you can't scale generative AI, inference jobs, or agentic systems with an outdated hardware stack dragging behind.
AI training will be the responsibility of the few, but demand for AI inference engines will come from the masses. Inference workloads are everywhere: in customer-facing apps and developer tools. And they are latency sensitive, cost sensitive, and hardware hungry.
How Are Vendors Responding?
Hardware-agnostic pipelines are the new grail: build once, run anywhere. “We’re now in a place where we have to consider 400-gig networking because the models need stuff like that,” said Joep Piscaer, analyst at TLA Tech B.V.
Piscaer and Ned Bellavance, independent consultant and technical educator, spoke with theCUBE’s Rob Strechay and Savannah Peterson at the KubeCon + CloudNativeCon NA event during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the future of Kubernetes and AI and how Kubernetes is increasingly optimizing for AI.
Vendors and Technologists Rethink the Stack for AI
Cloud providers, open-source projects, and platform teams are all lining up to make inference workloads less stressful for infrastructure.
Analysts noted that Google Cloud’s Google Kubernetes Engine and other platforms are becoming hardware-agnostic, capable of running AI on GPUs, TPUs, or edge devices. Frameworks such as SynergAI demonstrate Kubernetes orchestrating AI across heterogeneous hardware, reducing quality-of-service violations by a factor of 2.4.
“Inference is going to happen on hardware that people are touching, but it’s going to be Kubernetes built into all this stuff intuitively,” Peterson said.
In other words, the old “throw GPUs at it” approach no longer cuts it. Also needed are fine-grained control, hardware-aware scheduling, a capable network fabric, and accelerator resource management.
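As a rough illustration of what accelerator-aware scheduling can look like in practice (not a vendor example from the event), the sketch below uses the Kubernetes Python client to request a GPU as an explicit, schedulable resource and pin the pod to a matching node pool. The image name, namespace, and the GKE-style accelerator label value are placeholder assumptions, and it presumes the NVIDIA device plugin is installed on the cluster.

```python
# Minimal sketch: treat the accelerator as a first-class, schedulable resource
# rather than "throwing GPUs at it". Image, namespace, and the node label
# value below are placeholders, not anything announced at KubeCon.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1PodSpec(
        # Pin the pod to nodes that carry the desired accelerator type.
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-l4"},
        containers=[
            client.V1Container(
                name="server",
                image="registry.example.com/llm-server:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # The extended resource makes the GPU visible to the scheduler.
                    limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Swapping the node selector and resource name is, in principle, all it takes to target TPUs or other accelerators with the same workload definition, which is the kind of hardware-agnostic flexibility the analysts described.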
Infrastructure builders are now thinking about optimizing for AI from the ground up. If hardware is coming back, then the eyes and ears of the system have to evolve too.
AI Is Placing Heavy New Demands on Observability
To illustrate, Strechay related his experience at this year’s Infrastructure as Code Conference: “The last question we got was around, ‘How do I do observability for prompts?’ I think the complexity of AI and so many moving parts has everybody coming to the table because everybody’s freaked out,” he said.
Bellavance echoed this uncertainty: “We have these golden signals we normally observe for: ‘What’s my CPU utilization? What’s my response times on things?’ Now there’s a new metric we have to watch, which is the prompt and also the response. That’s going to be tough, and people need to start instrumenting for that.”
Tooling built on OpenTelemetry and eBPF is emerging to track not just CPU and memory, but prompts, responses, token usage, and retrieval accuracy.
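In concrete terms, that can mean attaching the prompt, the response, and token counts to the same trace spans that already carry the golden signals. The snippet below is a minimal sketch using the OpenTelemetry Python SDK with a console exporter; the gen_ai.* attribute names follow an emerging convention rather than a finalized standard, and call_model is a stand-in for whatever inference backend is in use.

```python
# Minimal sketch: instrument an LLM call so prompt, response, and token usage
# become span attributes alongside the usual infrastructure signals.
# Attribute names and the call_model stub are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))  # swap for an OTLP exporter in production
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("inference-service")


def call_model(prompt: str):
    # Stand-in for a real inference backend (vLLM, a hosted API, etc.).
    reply = f"echo: {prompt}"
    return reply, len(prompt.split()), len(reply.split())


def generate(prompt: str) -> str:
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("gen_ai.prompt", prompt)
        reply, prompt_tokens, completion_tokens = call_model(prompt)
        span.set_attribute("gen_ai.completion", reply)
        span.set_attribute("gen_ai.usage.prompt_tokens", prompt_tokens)
        span.set_attribute("gen_ai.usage.completion_tokens", completion_tokens)
        return reply


print(generate("Which KubeCon sessions cover GPU scheduling?"))
```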
AI-native observability enables in-production troubleshooting and performance optimization.
Kubernetes Adapts for AI’s Second Arc
AI is forcing Kubernetes to evolve. It’s no longer just about containers. Inference workloads, agentic applications, and massive model deployments demand GPU scheduling, accelerator-aware orchestration, and high-speed networking.
The Certified Kubernetes AI Conformance Program, which the CNCF launched at the conference, sets standards for GPU and TPU scheduling, telemetry, and cluster orchestration specifically for AI workloads.
Google Cloud’s GKE Pod Snapshots reduce inference startup times by up to 80%.
Back in the “before times,” we asked: “What’s after Kubernetes?” Early Kubernetes contributor Kelsey Hightower has said the platform was designed to last about 20 years—it turned 11 this year.
The present question isn’t “what’s after K8s?” but rather “what happens with K8s when AI becomes the driving workload?” The answer is unfolding at KubeCon: Kubernetes is becoming the central nervous system for the AI stack, with clusters optimized for inference, smarter scheduling, predictive scaling, and advanced observability.
Welcome to the second arc, according to the analysts.
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the KubeCon + CloudNativeCon NA event.
Disclosure: TheCUBE is a paid media partner for the KubeCon + CloudNativeCon NA event. Neither Red Hat Inc., the primary sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.
Photo: SiliconANGLE.