AI Strategy

Hosted, Managed, or Sovereign: The AI Deployment Spectrum That Shapes Everything Else

When organisations start deploying AI models seriously, a familiar set of questions surfaces fast — compliance obligations, data sovereignty, cost control, governance gaps, and vendor lock-in. Most teams treat these as separate problems to tackle one by one. They are not. Every one of them flows from a single foundational decision made early: where does the AI workload live?

That decision — made casually or deliberately — shapes the overall cost structure, compliance posture, fallback options, and how much control is retained as the AI landscape continues to evolve.

The Spectrum Is Not a Binary

The conversation is often framed as cloud versus on-premise, or AWS versus open source. That framing is too narrow. The realistic spectrum today looks more like this:

Proprietary hosted — Direct API calls to Anthropic, OpenAI, or Google. Fastest to start, least overhead, but the model, infrastructure, and data path are entirely in someone else’s hands.

Cloud-managed — AWS Bedrock, Amazon SageMaker, and equivalent offerings on GCP and Azure. The hyperscaler manages model serving while the organisation stays within a familiar compliance boundary, but model selection and the serving layer remain vendor-defined. Unless multi-cloud is already embraced, the hyperscaler choice in this scope typically follows the existing infrastructure footprint.

Managed Kubernetes on cloud — EKS, GKE, or AKS with GPU node groups. The cloud provider manages the cluster, but the organisation chooses the models, serving framework, and deployment pipeline. Operational lift is shared; control is meaningfully higher.

Self-hosted Kubernetes — It is up to the organisation to choose the cluster size, GPU node type, and ops team composition. Full control over data, model weights, networking, and upgrade cadence. Maximum sovereignty, maximum responsibility.

Multi-cloud or partner-managed — A solution delivered by an AWS partner, an ISV, or an internal platform team. “Managed” here does not mean the hyperscaler by default — it refers to whichever partner, whether hyperscaler or otherwise, owns the operational responsibility for keeping the platform running.

Each position is a legitimate choice. None is universally correct.

What the Spectrum Decision Actually Controls

This decision shapes the answer to six questions simultaneously — and the quality of those answers reflects how thoroughly the exercise was completed upfront.

Compliance and data residency. The starting point is understanding where prompt data goes and who can access it in transit and at rest. Regulations like the EU’s GDPR and India’s DPDP Act 2023 go beyond PII protection — they impose obligations on where data lives, how long it is retained, and the right to erasure. These are not implementation details. They are legal constraints that determine which deployment options are available before architecture planning begins. An organisation serving EU users may find certain hosted API configurations off the table depending on the provider’s data processing agreements. Under DPDP, India-hosted deployments are not a preference — they are a compliance starting point.

The spectrum decision does not just shape how data is protected inside the system — it determines which deployment options are legally available before architecture begins.

Sovereignty. Sovereignty is not just about data residency. It includes model version control, the ability to audit model behaviour, and the right to keep operating if a vendor changes terms, deprecates a model, or goes down. Open-weight models on self-managed infrastructure give the organisation control across all three scenarios. Proprietary hosted APIs give none of those options by default.

Cost structure. Per-token pricing at scale is a real concern. So is the capital cost of GPU infrastructure and the operational cost of the team running it. Neither end of the spectrum is inherently cheaper — it depends on the organisation’s usage patterns, scale, and internal expertise. In GPU node deployments specifically, cost control comes down to how nodes are terminated during idle periods and re-initialised to serve incoming requests. The spectrum decision determines which cost levers the organisation holds.

Governance. Who called which model, with what prompt, on behalf of which user, and what did it return? Governance is an audit question. Answering it requires a logging and observability layer independent of the model provider. Without it, the governance posture is only as strong as what the vendor exposes to the organisation.

Vendor lock-in. Lock-in at the API level is recoverable — switching providers is painful but possible. Lock-in at the architecture level, where applications are tightly coupled to a specific SDK or endpoint format, is far harder to unwind. The spectrum decision determines how deeply the vendor reaches into the organisation’s technology stack.

Fallback and DR. What happens when a model goes down, a provider has an outage, or a cost spike makes the primary model unviable? A deployment with only one inference path loses its AI capability with it. Every deployment, at any point on the spectrum, should have at least one tested fallback path — defined at conception, not discovered during an incident.

The Gateway Layer Changes the Equation

A unified AI gateway layer — sitting between the organisation’s client applications and its model endpoints — decouples application code from the spectrum decision. Bedrock, a self-hosted model on EKS, or a direct Anthropic API call become configuration entries, not architectural commitments.

This makes the spectrum decision adjustable rather than irreversible. An organisation already on AWS can start on Bedrock, validate its governance and cost model, and plan a migration of its primary workload to a self-hosted cluster without changing application code — whenever the need arises. In that process, a SageMaker endpoint can serve as a fallback for a self-hosted primary, or Bedrock can be used as the DR target for a managed EKS cluster.

The gateway also centralises observability and audit — request logging, PII handling, cost attribution, and trace correlation — regardless of which backend handles a given request. Governance instrumentation is built once and inherited by every workload behind it.

Security deserves the same architectural rigour as the deployment decision itself. Connecting AI agents to existing enterprise systems — banking protocols, EHR platforms, engineering databases — through MCP-based integrations expands capability and attack surface simultaneously. Agent-to-agent and API-based interactions need asymmetric signature validation as a baseline: RSA 4096-bit for legacy-constrained environments, ED25519 for pre-quantum readiness, with keys in a vault rather than embedded in configuration. SBOM, SAST, and DAST belong in the DevSecOps pipeline as standard gates, not periodic audits. OWASP’s Top 10 LLM Vulnerabilities is a practical reference for the new attack surface AI agents introduce — both genuine vulnerability vectors and prompt-level edge cases that become exploitable at scale. Infrastructure-level guardrails (rate limiting, request validation) and application-level guardrails (PII redaction from prompts) are the production readiness bar, not post-launch additions.

Validating this architecture across multiple spectrum positions simultaneously — proprietary API, cloud-managed, and self-hosted in one deployment — demonstrates that the six goals above are not sequential problems. They are a unified configuration concern, addressed once at the gateway layer and inherited across every workload.

The Question to Ask First

Before committing to a deployment model, the useful question is not “which provider has the best model?” or “what is cheapest?” It is: how much of this decision allows a revisit in the twelve to eighteen months after initial deployment?

If the answer is “all of it,” the architecture must be designed for that from the start. The deployment model decision and the gateway architecture decision belong together — not in separate conversations at separate stages.

Every position on the spectrum is a valid starting point. What determines whether it becomes a constraint or an advantage is whether the architecture above it gives the organisation the freedom to move.

Raghuveer Dendukuri is an Enterprise Architect at Minfy Technologies and a former CTO, COO, and startup founder with 21+ years of experience in enterprise architecture, AI-first systems, sovereign AI design, and regulated industry platforms spanning fintech, healthcare, and aerospace. He is a published author on blockchain, cryptography, and payments, and a FIAKS Maven. He writes on system design, technology strategy, and responsible AI adoption.