30 May 2026


According to Qumulo, average GPU utilization in enterprises hovers around five percent. The rest of the time, accelerators sit idle while data is copied, staged, and positioned before a workload can even begin. On May 26, Qumulo and Cisco unveiled an architecture designed to close this gap by making existing accelerators productive sooner instead of buying new ones.

Key Takeaways

  • Liquidity over purchase: The Cloud AI Accelerator delivers distributed enterprise data to GPUs in real time, with no replication or weeks of staging required.
  • Connect, don’t copy: On-premises and cloud systems integrate seamlessly with AWS Bedrock, Azure AI Foundry, and Google Vertex AI without data duplication.
  • Cisco as hybrid anchor: Networking, security, and UCS compute underpin the architecture across AWS, Azure, Google Cloud, and OCI.


What Qumulo and Cisco announced

What is GPU liquidity? GPU liquidity is the approach of making existing graphics processors productive faster instead of procuring new ones. Data is ready without lengthy staging, so accelerators start their actual work sooner. The bottleneck shifts from hardware procurement to how quickly existing capacity can actually compute.

On May 26, Qumulo introduced the Cloud AI Accelerator, followed shortly by the CloudBridge architecture, timed ahead of Cisco Live 2026. The core idea behind both announcements is the same: GPU capacity is expensive and scarce, yet it mostly sits idle, not because there’s too little compute power, but because data isn’t where the accelerator needs it in time.

Technically, the architecture combines three existing Qumulo components: Cloud Native Qumulo, the Cloud Data Fabric, and a cache layer called NeuralCache. Together, they present data distributed across on-premises, edge, and multiple clouds to GPUs as a single logical source. Cisco contributes networking, security, and UCS as the on-premises compute foundation. According to the vendors, the solution is available across AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure. The timing isn’t coincidental: ahead of Cisco Live, the vendors are showcasing the building blocks that will define their portfolios for the year.

1. The 95% Gap Is a Data Problem

The headline figure from the announcement is uncomfortable. If GPUs are only utilized 5% of the time on average, then the most expensive line item in the AI budget is the time when nothing happens. In most cases, this isn’t due to model architecture or undersized clusters. The cause is the pipeline in front of the GPU: data is exported from the source system, converted into the required format, loaded into GPU-adjacent flash storage, and only then processed.

5%: average GPU utilization in enterprises. The rest sits idle while data is staged.
Source: Qumulo, Cloud AI Accelerator Announcement, May 26, 2026

This observation aligns with what platform teams see in operations. A cluster that spends three days waiting for training data isn’t an asset; it’s a liability. The conversation around GPU shortages shifts from procurement to how quickly existing capacity can actually be put to work. If you secured budget for additional accelerators last year, the first question should be whether the old ones were even fully utilized.
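The relationship between staging time and utilization can be sketched in a few lines. This is a minimal illustration, assuming that utilization is simply compute time divided by the total time the GPUs are reserved; the function name and the job durations are hypothetical and do not come from the announcement.

```python
def effective_utilization(staging_hours: float, compute_hours: float) -> float:
    """Fraction of reserved wall-clock time in which the GPUs do actual work."""
    total = staging_hours + compute_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return compute_hours / total


# Hypothetical job: three days of staging followed by four hours of training.
util = effective_utilization(staging_hours=72.0, compute_hours=4.0)
print(f"{util:.1%}")  # prints 5.3%
```

Under these assumed numbers, a cluster that waits three days for a four-hour job lands close to the five percent figure quoted above.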

2. Connect, Not Copy: The Real Game-Changer

The technical mandate is clear: no copies. Instead of replicating data into a GPU-adjacent environment, the accelerator directly links Qumulo’s existing systems to the hyperscalers’ inference and training services. Specifically, Qumulo targets AWS Bedrock, Azure AI Foundry, and Google Vertex AI, all accessible without prior data duplication.

The difference isn’t just cosmetic. Every copy introduces storage costs, consistency risks, and delays. Eliminate the copy, and you eliminate the weeks where expensive silicon sits idle, waiting for data. For teams in the DACH region (Germany, Austria, Switzerland) with distributed locations, a second point is almost more important: data that isn’t copied rarely leaves its controlled environment. This directly affects data residency requirements, which already shape every architectural decision in regulated industries.

Dimension      | Traditional Staging          | Cloud AI Accelerator
Data Movement  | Export, copy, replication    | Direct connection without copying
Time-to-GPU    | Days to weeks                | Minutes instead of staging
Idle Costs     | High, dominated by downtime  | Reduced through earlier start
Reach          | Per region, per cloud        | AWS, Azure, GCP, OCI plus UCS
[Figure: schematic diagram of data flowing directly from Qumulo systems to hyperscaler GPU services without copying. Caption: Connect, not copy: direct real-time linkage of on-premises source data to GPU services.]

3. What Cisco Brings to the Hybrid Table

The partnership with Cisco goes beyond a logo on a slide. Cisco contributes the network and security layer essential for data to flow quickly and securely across cloud and location boundaries. With UCS, an on-premises compute foundation is added, pulling the model out of the pure cloud realm and making it appealing to organizations that can’t, or won’t, put everything into a hyperscaler.

The second announcement, CloudBridge, targets a related pain point: the so-called “flash tax.” This refers to the premium for GPU-adjacent flash storage, which Qumulo estimates at up to 400 percent. By eliminating the need to load training data entirely into this expensive tier, teams can bypass some hardware scarcity without purchasing additional capacity. That’s the economic core of the story: not more performance, but less waste.
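The flash-tax argument can be made tangible with a rough calculation. The sketch below assumes that a premium of 400 percent means the flash tier costs the base price plus four times the base price, and that only a fraction of the dataset (a working set) needs to sit in flash at any time. Both assumptions, along with the function name, prices, and sizes, are illustrative and not taken from Qumulo.

```python
def flash_tier_cost(
    dataset_tb: float,
    base_price_per_tb: float,
    premium: float,
    flash_fraction: float = 1.0,
) -> float:
    """Cost of holding part of a dataset in GPU-adjacent flash.

    premium: surcharge over the base tier, e.g. 4.0 for "400 percent" (assumed reading).
    flash_fraction: share of the dataset kept in flash (1.0 = full staging).
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    flash_price_per_tb = base_price_per_tb * (1.0 + premium)
    return dataset_tb * flash_fraction * flash_price_per_tb


# Hypothetical numbers: 100 TB dataset, base tier at 20 currency units per TB.
full_staging = flash_tier_cost(100.0, 20.0, premium=4.0)
working_set = flash_tier_cost(100.0, 20.0, premium=4.0, flash_fraction=0.1)
print(f"full staging: {full_staging:.0f}, working set only: {working_set:.0f}")
```

The point of the comparison is the ratio, not the absolute figures: the smaller the share that has to live in the premium tier, the smaller the surcharge.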

4. Where Architecture Hits Its Limits

As clean as the promise sounds, it shifts problems rather than solving them. Eliminate the copy, and the network becomes the critical path. Latency and bandwidth between data source and GPU then determine whether theory translates into throughput. It’s manageable, but it’s work, and it lands in operations, not in the pitch deck.
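A back-of-the-envelope check shows how quickly the network becomes the limiting factor. The sketch below is an estimate only: it uses decimal gigabytes, ignores latency, and folds protocol overhead into a single assumed efficiency factor. The function names and all numbers are hypothetical.

```python
def transfer_seconds(
    dataset_gb: float, bandwidth_gbit_s: float, efficiency: float = 0.7
) -> float:
    """Time to move a dataset over a link, ignoring latency.

    efficiency: assumed share of nominal bandwidth that is usable in practice.
    """
    bits = dataset_gb * 8e9  # decimal gigabytes to bits
    return bits / (bandwidth_gbit_s * 1e9 * efficiency)


def required_gbit_s(
    samples_per_s: float, bytes_per_sample: float, efficiency: float = 0.7
) -> float:
    """Nominal link speed needed to keep a GPU fed at a given sample rate."""
    return samples_per_s * bytes_per_sample * 8 / 1e9 / efficiency


# Hypothetical workload: 10 TB dataset over a 10 Gbit/s link.
hours = transfer_seconds(10_000, 10) / 3600
print(f"full pass over the data: {hours:.1f} h")

# Hypothetical consumer: 2,000 samples per second at 500 kB each.
print(f"link needed: {required_gbit_s(2_000, 500_000):.1f} Gbit/s")
```

If the required link speed exceeds what the path between data source and GPU can sustain, removing the copy step alone will not raise utilization.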

What Still Needs Scrutiny

  • Network latency becomes the new bottleneck
  • Lock-in to the Qumulo fabric as the foundation
  • Governance across cloud and site boundaries

What Clearly Delivers

  • No recopying, lower consistency risk
  • Faster ramp-up for existing GPUs
  • Multi-cloud plus on-premises option via UCS

Then there’s the dependency. A data fabric that ties everything together becomes the foundation itself, one you can’t easily swap out. That’s not an argument against the architecture, but it’s a point for contract negotiations and exit planning. Anyone adopting the fabric should document from day one what an exit would look like, while the question is still theoretical.

What DACH Teams Should Now Examine in Practice

The most honest test isn’t the datasheet; it’s your own pipeline. If you want to know whether GPU liquidity delivers, first measure your own time-to-GPU: How long does it take from triggering a workload to processing the first batch? If that window spans days, the leverage is real. If it’s minutes, the accelerator solves a problem you don’t have.
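One way to capture time-to-GPU is to wrap the data loader and note when the first batch arrives. The following is a minimal sketch, not a vendor tool: `TimeToGpuProbe` and `slow_loader` are hypothetical names, and the probe treats its own creation as the moment the workload is triggered.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class TimeToGpuProbe:
    """Records the delay between workload trigger and the first delivered batch."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.triggered_at = clock()  # creation counts as the trigger moment
        self.first_batch_at: Optional[float] = None

    def watch(self, batches: Iterable[T]) -> Iterator[T]:
        """Pass batches through unchanged, noting when the first one arrives."""
        for batch in batches:
            if self.first_batch_at is None:
                self.first_batch_at = self._clock()
            yield batch

    @property
    def seconds(self) -> Optional[float]:
        """Time-to-GPU in seconds, or None if no batch has arrived yet."""
        if self.first_batch_at is None:
            return None
        return self.first_batch_at - self.triggered_at


def slow_loader() -> Iterator[int]:
    """Hypothetical stand-in for a data loader with a staging delay."""
    time.sleep(0.05)  # pretend staging
    yield from range(3)


probe = TimeToGpuProbe()
for _batch in probe.watch(slow_loader()):
    pass  # a real job would hand the batch to the GPU here
print(f"time-to-GPU: {probe.seconds:.2f} s")
```

In a real pipeline the probe would be created when the job is submitted, so that queueing, export, and staging all count toward the measured window.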

The second step is the cost question, stripped of marketing. Idle GPU costs become quantifiable once you place utilization and hourly rates side by side. Only this number determines whether the new fabric is an investment or just another layer. A clean pilot with a real workload, measured pre-deployment utilization, and a clear exit condition tells you more than any reference architecture. Measure both, and you’ll negotiate with the vendor eye-to-eye, not slide-to-slide.
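Placing utilization and hourly rates side by side takes only a short calculation. The sketch below assumes a flat hourly rate per GPU and treats every non-utilized hour as idle cost; the function name, cluster size, and rate are hypothetical.

```python
def idle_cost(
    gpu_count: int, hourly_rate: float, hours: float, utilization: float
) -> float:
    """Money spent on GPU hours in which no work was done.

    utilization: measured share of time the GPUs were busy, between 0 and 1.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return gpu_count * hourly_rate * hours * (1.0 - utilization)


# Hypothetical cluster: 8 GPUs at 3.00 per hour over a 720-hour month.
before = idle_cost(8, 3.00, 720, utilization=0.05)
after = idle_cost(8, 3.00, 720, utilization=0.40)
print(f"idle cost at 5% utilization:  {before:.2f}")
print(f"idle cost at 40% utilization: {after:.2f}")
```

Running the same calculation with utilization measured before and during a pilot gives the number that the investment decision should rest on.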

Frequently Asked Questions

What does GPU liquidity mean?

It refers to putting existing GPU capacity to work faster by making data available without lengthy staging. The bottleneck shifts from buying new hardware to the question of how soon existing accelerators can start processing.

Do I need to copy my data to the cloud?

According to the manufacturer, no. The Cloud AI Accelerator connects existing Qumulo systems directly to AWS Bedrock, Azure AI Foundry, and Google Vertex AI without first copying the data.

Which clouds are supported?

Qumulo lists AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure. Cisco UCS adds an on-premises option for hybrid setups.

What role does Cisco play?

Cisco provides networking, security, and the UCS on-premises compute foundation. This layer determines whether data moves quickly enough across cloud and site boundaries to reach the GPUs.

Who benefits from this approach?

Primarily organizations with distributed data and measurable time-to-GPU delays. If your workloads already launch in minutes, your bottleneck lies elsewhere, and you’ll see little gain.

Image source: Cover and article images AI-generated (May 2026), C2PA certificate embedded in image


A magazine by Evernine Media GmbH