By Nayana Shetty in Cloud — Mar 29, 2024

The pragmatic cloud agnostic design

Cloud agnostic design means building applications with portability in mind, relying on technologies that the major cloud providers offer as managed services, such as Kubernetes and PostgreSQL, to ensure compatibility across providers. This approach minimises dependencies on provider-specific features. Cloud native design, on the other hand, means building applications to fully leverage the services and features of a particular cloud provider, including managed services unique to that provider, like Step Functions on AWS and Cosmos DB on Azure.

Leaders should think seriously about cloud agnostic design when they set out on the cloud adoption journey, but they should carefully weigh the cost of being cloud agnostic against the risk of being more cloud native.

Photo by Rayson Tan / Unsplash

Evolution of Cloud Computing

Cloud computing as a term was coined in the late 1990s, but it was only from the mid-2000s that cloud adoption gained momentum in companies of all sizes. Advances in the speed and reliability of internet connectivity, improved virtualisation technologies, scalability, and self-healing capabilities enabled this adoption. More importantly, pioneering cloud providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform transformed hosting into a self-service commodity. This led companies to rely on these providers for platform or infrastructure as a service, so they could focus their engineering efforts on capabilities that directly impacted their business.

Understanding Cloud Agnostic Design

Companies adopt the cloud for the numerous benefits of hosting there, especially the flexibility of cloud services and the cost efficiencies. However, the analogy I always think of is that cloud adoption is like being a paying guest in someone’s house. While you pay for the rooms you use, share the utility bills, and do your own laundry, the house owner takes care of the house and its services. What happens if you fall out with the owner, or the owner is not keeping the house to the standards you expect? Can you change where you live as soon as you make the decision, if a new place is available? What about all the places where you then need to change your address? What other dependencies have you created because of where you lived? These are exactly the kind of questions that engineers have to ask themselves when building their services in the cloud. And this is where the topic of cloud agnostic design comes in.

Cloud agnostic design refers to the ability of software applications or services to operate across multiple CSPs (Cloud Service Providers) without being tightly coupled to any specific CSP. In other words, cloud agnostic solutions are designed to work seamlessly on any CSP, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform, without requiring significant modifications or customisations for each platform. This flexibility allows businesses to avoid vendor lock-in, maintain interoperability, and choose the best-suited cloud services based on their specific needs and preferences.

Operating across multiple CSPs is key. It doesn’t necessarily mean that the service is hosted across multiple CSPs, or serves requests from multiple CSPs. It’s far simpler. It means you can move that piece of technology - the one that provides a business capability - somewhere else. It’s business resilience in disguise. As an example, it means that if you run Kubernetes on Amazon EKS, you should be able to move to Google Kubernetes Engine (GKE) without a hitch. We’ll come to this in a bit.

Another term commonly confused with cloud agnostic design is multi-cloud, sometimes referred to as polycloud. Multi-cloud means using services from more than one CSP at the same time, typically to take advantage of specific offerings that are differentiators and could give your business a competitive advantage, for example Azure OpenAI or GCP BigQuery. This is far more common than cloud agnostic design. Anecdotally, most organisations still lean heavily on a single CSP and pick a handful of services from a secondary CSP.

Balancing Act: Cloud Services vs. Concentration Risk

As an organisation, you’re not mitigating any risk just by being multi-cloud. You are taking advantage of the flexibility and pricing of cloud services. Your business might grow, but that doesn’t address any of your cloud agnostic or resilience concerns. Calling yourself cloud agnostic just because you use technologies that exist across multiple providers is not good enough. We have to verify that we can actually operate across multiple cloud providers. If you run Kubernetes on a CSP and think you can just migrate your workloads to a different CSP on a whim, you’re going to have a bad time. Google provides a very well thought out document on the considerations involved in such a migration.

Here’s an edited list from that document of the things that will definitely trip you up:

Secrets management
Managed node groups and self-managed nodes
Amazon Custom AMIs for EKS
Usage of Amazon EKS Fargate
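
One way to verify rather than assume portability is to scan your Kubernetes manifests for provider-specific configuration before you need to move. The sketch below is a minimal illustration, not a complete audit: the marker strings and the sample manifest are assumptions chosen for the example, and the names `PROVIDER_MARKERS`, `Finding` and `find_provider_coupling` are hypothetical. A real check would need a marker list built from your own manifests.

```python
from dataclasses import dataclass

# Assumption: illustrative substrings that hint at AWS-specific coupling.
# This list is not exhaustive; extend it from your own manifests.
PROVIDER_MARKERS = {
    "eks.amazonaws.com/": "EKS-specific annotation or label",
    "service.beta.kubernetes.io/aws-": "AWS load balancer annotation",
    "secretsmanager": "reference to AWS Secrets Manager",
}


@dataclass(frozen=True)
class Finding:
    line_number: int
    marker: str
    reason: str


def find_provider_coupling(manifest_text: str) -> list[Finding]:
    """Return one finding for every marker found on every manifest line."""
    findings = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        lowered = line.lower()
        for marker, reason in PROVIDER_MARKERS.items():
            if marker in lowered:
                findings.append(Finding(number, marker, reason))
    return findings


# Hypothetical manifest: a service account bound to an IAM role on EKS.
SAMPLE_MANIFEST = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/orders
"""

if __name__ == "__main__":
    for finding in find_provider_coupling(SAMPLE_MANIFEST):
        print(f"line {finding.line_number}: {finding.reason}")
```

A plain substring scan like this only surfaces candidates for review. It says nothing about node groups, custom AMIs or Fargate profiles, which live in the cluster configuration rather than in workload manifests.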

Working exclusively with a single Cloud Service Provider (CSP) significantly heightens the risks for your organisation. Your pricing, services, and availability all become heavily dependent on that one provider. In financial services, the term used is concentration risk. It is like putting all your eggs in one basket. The implications are profound: imagine the impact if a majority of financial institutions rely on a single CSP and that CSP suffers disruption, failure, or compromise of its services. This vulnerability poses a systemic risk to the stability of the financial system.

Regulatory bodies such as the European Securities and Markets Authority (ESMA) and the Prudential Regulation Authority (PRA) have highlighted the significance of this risk. Financial institutions are expected to comprehend the operational and financial implications of their reliance on third-party providers and are encouraged to develop and adopt comprehensive risk management strategies.

The overall market's concentration risk in such environments becomes your business's risk.

ESMA offers a straightforward perspective: If CSPs demonstrate significantly higher resilience than individual firms, systemic risk could decrease as the additional resilience gained by using CSPs compensates for concentration risk.

Concentration risk is a critical component of the balancing act that every organisation must understand sooner or later. While CSPs provide Cloud Adoption Frameworks, customers must complement these with robust risk management plans and, ultimately, exit strategies.

In the landscape of cloud services, Software-as-a-Service (SaaS) solutions present another layer of consideration in the risk management strategy. When evaluating a technology to deliver a business capability, you should consider not only functionality and compatibility with existing infrastructure but also the vendor lock-in risks and the availability of alternatives. For instance, assume you have a lot of services hosted on AWS. When you want to build identity and access management, you could choose Amazon Cognito, which offers tight integration with other AWS services, but opting for Okta could provide more flexibility in multi-cloud environments. As organisations navigate the balancing act of cloud adoption, understanding the implications of SaaS choices alongside infrastructure decisions becomes paramount in crafting a resilient and agile cloud strategy.
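
Whichever identity vendor you pick, you can reduce the cost of changing your mind by keeping vendor calls behind a small interface of your own. The following is a minimal sketch of that idea, assuming token verification is the only capability the application needs. `IdentityProvider`, `InMemoryIdentityProvider` and `greet` are hypothetical names for illustration; a real adapter would wrap the vendor's SDK, which is deliberately not shown here.

```python
from typing import Optional, Protocol


class IdentityProvider(Protocol):
    """Hypothetical port: the only identity API the application sees."""

    def verify_token(self, token: str) -> Optional[str]:
        """Return the user id for a valid token, or None otherwise."""
        ...


class InMemoryIdentityProvider:
    """Stand-in adapter for this sketch; a real one would call a vendor SDK."""

    def __init__(self, tokens: dict[str, str]) -> None:
        self._tokens = tokens

    def verify_token(self, token: str) -> Optional[str]:
        return self._tokens.get(token)


def greet(provider: IdentityProvider, token: str) -> str:
    """Business logic depends on the port, not on a specific vendor."""
    user_id = provider.verify_token(token)
    return f"hello {user_id}" if user_id else "access denied"


if __name__ == "__main__":
    provider = InMemoryIdentityProvider({"t1": "alice"})
    print(greet(provider, "t1"))
    print(greet(provider, "unknown"))
```

The abstraction does not remove the migration work, such as moving user records and reissuing credentials, but it confines the code changes to one adapter.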

Exit Strategies: The Building Evacuation Plan for Cloud Services

A good way to bring some balance to this equation is with an exit strategy. An exit strategy is a plan that outlines how teams should act if they have to abandon a cloud provider or technology while continuing to provide the capability to the business. It’s your building’s evacuation plan. You need to have one, understand it, and review it often. In financial services, for example, the Digital Operational Resilience Act (DORA) and its Regulatory Technical Standards mandate the existence of an exit plan, along with an assessment of the impact and of alternative providers.

Again in the financial sector, the Prudential Regulation Authority sets expectations on stressed exits. A stressed exit is an unplanned exit, and the point of planning for one is to identify what would happen if it occurred. The tighter the coupling with the third party, the higher the complexity in disengaging.

Triggering the Plan: When and Why to Exit

A good exit strategy identifies the trigger, which defines why, when, and at what level the plan would be activated. The trigger needs to be in line with the organisation's strategy and direction. The trigger doesn’t have to be irreversible. All we need to outline is our boundaries of tolerance before we decide to leave. If we sense that we could get a lot more value from another provider or tech stack, that’s when we refer back to the strategy.
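
Boundaries of tolerance are easier to review when they are written down as explicit thresholds. The sketch below shows one possible shape, assuming each trigger is a single numeric metric with an agreed limit. The class `ExitTrigger`, the function `breached_triggers`, the metric names and every number are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitTrigger:
    name: str
    tolerance: float              # boundary agreed with the business
    higher_is_worse: bool = True  # False for metrics such as availability

    def breached(self, observed: float) -> bool:
        """True when the observed value is outside the agreed tolerance."""
        if self.higher_is_worse:
            return observed > self.tolerance
        return observed < self.tolerance


def breached_triggers(
    triggers: list[ExitTrigger], observations: dict[str, float]
) -> list[str]:
    """Names of triggers outside tolerance; unobserved metrics are skipped."""
    return [
        trigger.name
        for trigger in triggers
        if trigger.name in observations
        and trigger.breached(observations[trigger.name])
    ]


if __name__ == "__main__":
    # Placeholder thresholds and observations, for illustration only.
    triggers = [
        ExitTrigger("annual_cost_increase_pct", 20.0),
        ExitTrigger("monthly_availability_pct", 99.9, higher_is_worse=False),
    ]
    observations = {
        "annual_cost_increase_pct": 35.0,
        "monthly_availability_pct": 99.95,
    }
    print(breached_triggers(triggers, observations))
```

A breached trigger is a prompt to revisit the strategy, not an automatic decision to leave.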

Identifying the right time to plan an exit strategy is crucial, as is understanding why you need to exit. The rationale behind an exit should align with specific business needs. Consider the example of concentration risk we discussed earlier. There are various reasons for an exit strategy: the prohibitive cost of a support contract from a vendor, the benefits of switching to a more cost-effective technology, failure to meet growth targets due to scalability issues, a diminishing talent pool, or escalating regulatory risks. Without a clear objective, exiting is not the right decision. The reason for an exit must be substantiated.

We should also identify the appropriate level, or granularity, of the exit strategy. Does every capability, solution, or business unit within an organisation need to have one? That depends on the size and complexity of your organisation. If you own 100 web applications that run on identical technology stacks in Azure, having 100 exit strategies is wasteful.

Heavily paraphrasing Corey Quinn, if you place the responsibility of the operations of your business on another company, that company isn’t a vendor, they are a partner.

Setting Expectations: The Realities of Cloud Migration

Finally, you should set the right expectations. Migrating data and rewriting code is costly, time-consuming, and risky. Economies of scale and special rates from your new vendor might be enticing, but data egress and technology transitions are never easy. Maintaining a detailed low-level migration plan is futile: by the time you have an accurate plan, the pricing structure will have changed. Instead, you should have a high-level plan for the cost of your exit, including pragmatic timelines that take into account how versatile and mature your application teams are. Are your apps containerised? How coupled is your CI/CD to your vendor? The answers to these questions will help you set the right expectations for your plan. Your timescale might look arbitrary, but you should have data to back it up.
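
A high-level estimate can be as simple as a few multiplications, as long as the inputs are written down and revisited. The sketch below is a back-of-the-envelope model under stated assumptions: cost is only data egress plus engineering effort, 1 TB is counted as 1000 GB, and the work parallelises evenly across the team. The class `ExitEstimate` and all figures in the usage example are hypothetical placeholders, not real prices or quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitEstimate:
    data_tb: float               # data to move out, in terabytes
    egress_price_per_gb: float   # placeholder price, not a vendor quote
    person_weeks: float          # engineering effort to re-platform
    cost_per_person_week: float  # fully loaded cost, placeholder
    team_size: int               # people working on the exit in parallel

    def egress_cost(self) -> float:
        # Assumption: 1 TB = 1000 GB, flat price with no tiering.
        return self.data_tb * 1000 * self.egress_price_per_gb

    def engineering_cost(self) -> float:
        return self.person_weeks * self.cost_per_person_week

    def total_cost(self) -> float:
        return self.egress_cost() + self.engineering_cost()

    def duration_weeks(self) -> float:
        # Assumption: effort divides evenly across the team.
        return self.person_weeks / self.team_size


if __name__ == "__main__":
    estimate = ExitEstimate(
        data_tb=10,
        egress_price_per_gb=0.5,
        person_weeks=40,
        cost_per_person_week=1000,
        team_size=4,
    )
    print(f"total cost: {estimate.total_cost():.0f}")
    print(f"duration: {estimate.duration_weeks():.1f} weeks")
```

The value of a model like this is less in the output than in forcing each input, such as how much data you hold and how many people could work on the exit, to be backed by data.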

In summary, as organisations dive into cloud computing, it's vital to weigh the advantages and drawbacks of different approaches. Whether embracing cloud-agnostic or cloud-native strategies, leaders must assess their compatibility with business objectives. By grasping the evolution of cloud technology and acknowledging the significance of risk management, companies can make informed decisions. A well-defined exit strategy ensures adaptability amid evolving landscapes. Remember, setting realistic expectations and continuously reassessing strategies are key to maximising the benefits of cloud adoption.
