This article discusses Microsoft Azure financial operations (FinOps) best practices for managed service providers (MSPs). It is especially relevant for MSPs that wish to guide their customers through cost optimization. MSPs that take advantage of Azure FinOps best practices improve their profit margins and increase loyalty by passing savings on to customers.
Summary of Azure FinOps best practices
The table below summarizes the best practices discussed in this article. The discussion that follows elaborates on relevant use cases and provides practical examples and advice to shorten the cost optimization journey.
Category | Best practice | Description |
---|---|---|
Getting started | Build a FinOps team | The success of any FinOps initiative depends on the team. Form a dedicated but small team that can collaborate in short cycles. |
Getting started | Monitor spending with budgets and cost alerts | Avoid surprises in your monthly bill, and react promptly when service consumption grows. |
Planning ahead for savings | Select the right cloud services | Get the most cost-effective service for each task. |
Planning ahead for savings | Estimate growth | The cloud is a great way to start small and scale efficiently. Forecast expenses to make sure your margins grow at scale. |
Tiered services | Save by using lower SLAs for non-mission-critical workloads | Align the level of performance to the requirements of the business. |
Savings with your MSP | Commit to using reserved resources | Get discounts up to 72% by committing for one- or three-year periods. |
Savings with your MSP | Negotiate discounts | MSPs can leverage the size of their deployments to get reductions off Azure’s list price. |
Savings with your MSP | Consolidate billing | Get a single billing report or dashboard across multiple cloud providers to avoid wasted effort and mistakes due to varying terminology and pricing models. |
Non-production workloads | Optimally schedule non-production workloads | Turn off development resources during off-work hours for non-production systems. |
Non-production workloads | Clean up temporary resources | Set up a process for cleaning up unused resources and workloads. |
Production workloads | Address over-provisioned resources | Scale down in areas where usage has declined over time. |
Production workloads | Engage in dedicated resource cleanup | Set up a process for cleaning up after your tenants. |
Getting started: Azure FinOps organizational processes
Activities to reduce cost are iterative by nature, constantly adapting to changes as the business evolves. FinOps processes define a particular lifecycle of gathering information, optimizing cloud spend, and operating according to business objectives, as shown in the figure below. Always look around for services where spending has risen and areas where utilization has decreased.
Build a FinOps team
The FinOps team identifies optimization opportunities and suggests proper alternatives to reduce costs. It should include representatives from finance, product, DevOps, and R&D.
The following describes a typical FinOps team, including member responsibilities:
- Finance: The primary stakeholder for implementing cost savings.
- DevOps: Provides hands-on experience in cloud cost configurations.
- Product management: Pushes priorities within the organization.
- Tech leader: Understands how cloud services support different business tasks.
The most common way for the FinOps team to detect optimization candidates is to review cost “heavyweights” in the monthly bill. Check the bill for changes from previous months, and scan the service instances for under-utilized resources (such as CPU, storage, memory, and network). We will discuss right-sizing resources later in this article under the section covering production workloads.
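The month-over-month review described above can be sketched in a few lines of Python. This is a minimal illustration, not a CloudBolt or Azure API: the function name `biggest_increases` and the per-service totals are hypothetical, and in practice the figures would come from an exported bill.

```python
def biggest_increases(previous, current, top_n=3):
    """Return the services whose monthly cost grew the most, largest first."""
    services = set(previous) | set(current)
    deltas = {
        name: current.get(name, 0.0) - previous.get(name, 0.0)
        for name in services
    }
    # Keep only services that grew; break ties alphabetically for stable output.
    grew = [(name, delta) for name, delta in deltas.items() if delta > 0]
    grew.sort(key=lambda item: (-item[1], item[0]))
    return grew[:top_n]


# Hypothetical monthly totals per service, in USD.
last_month = {"AKS": 700.0, "Blob Storage": 450.0, "Network": 900.0}
this_month = {"AKS": 750.0, "Blob Storage": 500.0, "Network": 880.0, "Redis": 120.0}

for service, delta in biggest_increases(last_month, this_month):
    print(f"{service}: +${delta:,.2f}")
```

A service that appears for the first time (Redis in this made-up data) is treated as growing from zero, which is usually what a reviewer wants to see at the top of the list.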
The team should keep one thing in mind: Every business activity needs to demonstrate a return on investment (ROI), and FinOps processes are no exception. In other words, the FinOps team must justify its existence by continuously identifying new savings every month.
Monitor spending with budgets and cost alerts
Staff members can sometimes create and use cloud resources without considering the cost involved, so it’s important to set budgets to get notified and mitigate cost anomalies before the monthly bill arrives. Azure offers budgets on subscriptions, resource groups, or collections of resources. CloudBolt supports budgets for Azure, AWS, and GCP, providing a standardized approach for designing internal processes within an MSP organization.
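To make the alerting idea concrete, here is a small Python sketch of threshold-based budget checks. It does not call any Azure or CloudBolt API; the function name `triggered_alerts` and the 50%, 80%, and 100% thresholds are assumptions chosen for illustration.

```python
def triggered_alerts(budget, spend_to_date, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget thresholds (as fractions) that spending has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend_to_date / budget
    return [t for t in sorted(thresholds) if used >= t]


# Hypothetical example: a $1,000 monthly budget with $850 spent so far.
for threshold in triggered_alerts(1000, 850):
    print(f"Alert: spending has reached {threshold:.0%} of the budget")
```

Several thresholds below 100% give the team time to react while the month is still in progress, rather than after the budget is exhausted.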
Planning ahead for savings
While it’s true that opportunities for cost optimization reveal themselves only in hindsight, it’s vital to plan ahead. The FinOps team has to take part in project planning and make sure that the early decisions made by the engineering teams are cost-effective at scale.
Select the right cloud services
Cloud services come in different sizes and flavors and with varying fee structures; naturally, the top-tier services are also the most expensive.
The FinOps team must challenge development and engineering teams when they select tools that are overqualified for the task. Such tools are sometimes chosen because they are interesting to technologists who would like to gain experience with them, not because the workload needs them.
For example, a fast NoSQL database like Cosmos DB is a popular selection for its advanced technology but is often overkill for use cases that don’t require a NoSQL database. A conventional Azure SQL service would meet the performance and functionality requirements of most applications at a lower cost.
Estimate growth
Once cloud services have been selected, it’s time to extrapolate into the future and estimate the costs at scale. The average cost per user should decrease as application adoption grows and the infrastructure scales.
Identify which service costs improve with scale and which grow linearly. FinOps teams should collaborate with development teams to estimate the cost per user in the planning phase of the project. Monthly analysis should reinforce the accuracy of those early predictions and identify areas of savings.
The exercise of completing a simple growth estimation table can force the team to plan ahead. Completing this table requires the engineering team to share estimated capacity for the application infrastructure, which can be used to estimate its cost and the number of transactions it can process. In the following example, the milestones are year-over-year growth targets.
Service | Year 1 (0 to 10K users) | Year 2 (up to 100K users) | Year 3 (up to 1M users) |
---|---|---|---|
AKS | $750 | $3,000 | $21,000 |
Network | $900 | $3,600 | $25,200 |
Blob Storage | $500 | $5,000 | $50,000 |
MongoDB | $1,200 | $12,000 | $120,000 |
Total monthly bill | $3,350 | $23,600 | $216,200 |
Cost Per User | $0.34 | $0.24 | $0.22 |
Example growth estimation table with 10X year-over-year growth target.
Platform | Multi Cloud Integrations | Cost Management | Security & Compliance | Provisioning Automation | Automated Discovery | Infrastructure Testing | Collaborative Exchange |
---|---|---|---|---|---|---|---|
CloudHealth | ✔ | ✔ | ✔ | | | | |
Morpheus | ✔ | ✔ | ✔ | | | | |
CloudBolt | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Estimation table notes:
- The AKS and network areas can benefit from scaling through bulk processing and reservations (a topic we will discuss later in this article), while blob storage and database requirements grow linearly with users, since the average storage per user is constant.
- Use the Azure pricing calculator to assign costs to services.
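The arithmetic behind the growth estimation table can be reproduced with a short Python sketch. The figures are the ones from the example table; the variable names are illustrative, and the "growth factor" is simply the Year 3 cost divided by the Year 1 cost.

```python
# Monthly cost per service (USD) at each milestone, from the example table.
monthly_cost = {
    "AKS": [750, 3_000, 21_000],
    "Network": [900, 3_600, 25_200],
    "Blob Storage": [500, 5_000, 50_000],
    "MongoDB": [1_200, 12_000, 120_000],
}
users = [10_000, 100_000, 1_000_000]

# Total monthly bill and cost per user at each milestone.
totals = [sum(costs[year] for costs in monthly_cost.values()) for year in range(3)]
cost_per_user = [total / count for total, count in zip(totals, users)]

# How much each service's cost grows while the user base grows 100x.
growth = {name: costs[2] / costs[0] for name, costs in monthly_cost.items()}

for year, (total, unit) in enumerate(zip(totals, cost_per_user), start=1):
    print(f"Year {year}: total ${total:,}, cost per user ${unit:.2f}")
for name, factor in growth.items():
    print(f"{name}: {factor:.0f}x cost growth for 100x users")
```

In this example, AKS and network costs grow 28x while users grow 100x, whereas blob storage and database costs grow the full 100x. That difference is why the cost per user flattens out around $0.22 instead of continuing to fall.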
Tiered services
Azure services come in different options referred to as tiers, which are pricing options that reflect how critical the service is for the business. Tiers are a great way to configure a cloud service with a lower SLA for non-mission-critical workloads.
Azure offers several pricing tiers for most of its services. Two common examples are provided below.
In-memory cache
In-memory cache solutions like Redis and Memcached are a great way to boost performance, but they come at a hefty price. Try allocating lower tiers for non-critical business flows.
Azure Cache for Redis has a discounted tier for development/testing workloads: Savings start at 50% for non-replicated caches. If lower cache tiers don’t work out, another option is to add a configuration that disables the use of cache in the application.
Storage
Azure supports several storage tiers, each of which fits a different use case, with a trade-off between data volume, access frequency, and read/write speed. While content-centric applications require plenty of storage, moving historical or rarely accessed data to lower storage tiers will optimize spending.
For example, consider an application that keeps a log of all user actions. Optimizing the storage for the log records will reduce costs, as demonstrated in the table below.
Retention period | Data use case | Storage tier |
---|---|---|
Last 30 days | UI display | Hot tier (online) |
30 days – 1 year | BI reports | Cool tier (online) |
More than 1 year | Archive / regulation | Archive tier (offline) |
Example optimization of Azure storage tiers
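The retention rules in the table can be expressed as a simple mapping from record age to storage tier. The Python sketch below is only an illustration of that decision logic; the function name `storage_tier` is hypothetical, and the cutoffs are the ones from the example table. Azure Blob Storage also supports lifecycle management policies that can move data between tiers automatically, which may be preferable to custom code.

```python
def storage_tier(age_days):
    """Pick a storage tier for a log record based on its age in days."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days <= 30:
        return "Hot"      # recent data shown in the UI
    if age_days <= 365:
        return "Cool"     # data used for BI reports
    return "Archive"      # data kept for archive or regulatory reasons


for age in (7, 90, 800):
    print(f"{age} days old -> {storage_tier(age)} tier")
```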
How MSPs can drive savings
Managed service providers can aggregate their clients’ spending and use the funds to negotiate better rates with Azure, passing the savings on to their customers.
First, let’s step through the logistics of the purchasing process. An MSP will onboard Azure customers to a Microsoft Customer Agreement and purchase an Azure Plan. The MSP can then manage onboarded customers in the Partner Center. This setup allows the MSP to purchase reservations for the customer. Being associated with client spending can help the MSP negotiate volume discounts.
Separately, the MSP can use Azure Cost Management APIs to automate FinOps tasks. For example, CloudBolt Cost Report for Azure leverages those APIs to consolidate reports, analyze unit costs, apply discounts, identify savings, calculate margins, and more.
CloudBolt FinOps tools also recommend specific areas of savings, such as purchasing reserved instances, negotiating volume discounts, and consolidating reports across clients and cloud providers.
Commit to using reserved resources
Committing for one or three years is a popular way of saving money. Azure partner MSPs can purchase and manage reservations for their customers.
When considering the minimal ROI period for a reservation, compare the reservation cost to the pay-as-you-go pricing. For example, a three-year reservation at a 66% discount costs about as much as 12 months of pay-as-you-go usage (36 months × 34%), so it pays for itself in just over a year. MSPs can track reservations with the CloudBolt Reserved Instance Report.
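The break-even point of a reservation can be estimated with one multiplication. The Python sketch below assumes the full reservation term is paid for regardless of usage and that the discount is measured against the pay-as-you-go price; the function name `break_even_months` is hypothetical.

```python
def break_even_months(discount, term_months):
    """Months of pay-as-you-go spend that equal the full reservation cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction in [0, 1)")
    if term_months <= 0:
        raise ValueError("term_months must be positive")
    # The reservation costs (1 - discount) of the pay-as-you-go price
    # for every month of the term.
    return term_months * (1 - discount)


months = break_even_months(discount=0.66, term_months=36)
print(f"Break-even after about {months:.1f} months of use")
```

If the workload is expected to run longer than the break-even period, the reservation is cheaper than pay-as-you-go; if it might be retired sooner, the commitment is a risk.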
Negotiate discounts
Large Azure customers, particularly MSPs, can leverage the size of their deployments and the fact that they are generating revenue for Azure to get a reduction off Azure’s list prices. MSPs can pass on discounts to customers and retain some to improve their gross profit margins.
Consolidate billing
Terminology and pricing models are different across cloud providers. MSPs can assist in controlling the monthly bill of multi-cloud deployments, providing a single pane across cloud vendors. CloudBolt lets MSPs monitor spending across Azure, AWS, and Google Cloud, among others, with Cost Report.
How to save with non-production workloads
Non-production environments are a great place to hunt for savings. Their performance requirements are usually low, and they are more tolerant of minor service disruptions.
Optimally schedule non-production workloads
Non-production environments often lie idle with no one using them. It’s a good practice to schedule shutdowns of those resources, with the most common idle times being overnight and on weekends. Turning the resources back on can be either scheduled (typically every workday morning) or done on demand when users need the environment. CloudBolt can do this with Power Schedule.
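A rough estimate of what a shutdown schedule saves can be computed from the hours a resource actually runs. The Python sketch below assumes the resource is billed in proportion to its running hours, which is a simplification: storage and some other charges typically continue while compute is stopped. The function name `scheduled_savings` is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def scheduled_savings(hours_per_day, days_per_week):
    """Fraction of always-on running hours avoided by a weekly schedule."""
    if not 0 <= hours_per_day <= 24 or not 0 <= days_per_week <= 7:
        raise ValueError("schedule is outside the bounds of a week")
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Example: a development environment that runs 12 hours a day on workdays.
print(f"Running hours avoided: {scheduled_savings(12, 5):.0%}")
```

For a 12-hour, five-day schedule, the resource runs 60 of the 168 hours in a week, so roughly 64% of the always-on running hours are avoided.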
Clean up temporary resources
Some things in life are temporary, including non-production workloads. For example, an environment spawned for a proof of concept is only needed for a limited time and then needs to be taken down. Other examples of temporary environments are private developer environments and workloads for a specific customer, project, or product version. CloudBolt’s Cost Service Adviser provides a way to identify and decommission unused resources.
Resource cleanup should cover both extended application environments and temporary resources created on the fly for a tactical purpose. Consider the example of a storage account created for each new tenant to store the user’s content.
Practical tips:
- Establish accountability: Every resource must have an owner in the organization.
- Set an expiration for every cloud resource and regularly destroy expired resources.
- Tag resources to be able to detect zombies: The owner’s email, project name, environment name, and customer ID are just a few examples.
- We should mention Kubernetes, given its rising popularity: Limit the number of nodes in the cluster and the containers’ RAM and CPU allowances. See examples of configuration in Kubernetes, AKS, and the AKS cluster autoscaler.

```yaml
resources:
  limits:
    cpu: "1"
  requests:
    cpu: 500m
```

Example of container CPU limit from kubernetes.io
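The tagging and expiration tips can be combined into a simple cleanup check. The following Python sketch works on plain dictionaries and does not query Azure. The tag names (`owner`, `project`, `environment`, `expires`), the resource names, and the function `find_cleanup_candidates` are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical tagging policy: every resource must carry these tags.
REQUIRED_TAGS = ("owner", "project", "environment")


def find_cleanup_candidates(resources, today):
    """Return (name, reason) pairs for resources that look abandoned."""
    candidates = []
    for resource in resources:
        tags = resource.get("tags", {})
        missing = [tag for tag in REQUIRED_TAGS if not tags.get(tag)]
        expires = tags.get("expires")
        if missing:
            candidates.append((resource["name"], "missing tags: " + ", ".join(missing)))
        elif expires is None:
            candidates.append((resource["name"], "no expiration date"))
        elif date.fromisoformat(expires) < today:
            candidates.append((resource["name"], "expired on " + expires))
    return candidates


# Made-up inventory for demonstration.
inventory = [
    {"name": "poc-vm", "tags": {"owner": "dev@example.com", "project": "poc",
                                "environment": "dev", "expires": "2024-01-31"}},
    {"name": "mystery-disk", "tags": {"project": "unknown"}},
    {"name": "build-agent", "tags": {"owner": "ops@example.com", "project": "ci",
                                     "environment": "dev", "expires": "2099-12-31"}},
]

for name, reason in find_cleanup_candidates(inventory, today=date(2024, 6, 1)):
    print(f"{name}: {reason}")
```

The check only reports candidates. Deleting resources should remain a separate, reviewed step so that an incorrectly tagged resource is not destroyed by accident.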
How to save in production
Production workloads are the most expensive and use the most resources, which makes production a great place to cut costs by scaling down over-provisioned services. However, production is also where it is most important to proceed with caution. As a rule of thumb, customer satisfaction is more valuable than savings.
Address over-provisioned resources
Time flies, and things change: A feature that once got a lot of customer traction may now be rarely used. The resources, however, could still be running—and incurring monthly charges.
It is worth investing time to deal with areas where usage has declined. Use caution and split the scale-down into steps. Following each step, verify that performance and the user experience have not been compromised. CloudBolt’s Cost Service Adviser provides a way to identify and reduce resource consumption.
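One way to split a scale-down into steps is to cap how much capacity is removed at a time. The Python sketch below plans such steps for a resource measured in whole units (for example, nodes or instances). The function name `scale_down_steps` and the 25% cap per step are assumptions, not a recommendation from Azure or CloudBolt.

```python
import math


def scale_down_steps(current, target, max_step=0.25):
    """Plan capacity reductions from current to target, removing at most
    max_step (a fraction, rounded in favor of progress) per step."""
    if not 0 < max_step < 1:
        raise ValueError("max_step must be a fraction between 0 and 1")
    if target < 0 or current < target:
        raise ValueError("target must be between 0 and current")
    steps = []
    size = current
    while size > target:
        # Reduce by the allowed fraction, but never go below the target.
        size = max(target, math.floor(size * (1 - max_step)))
        steps.append(size)
    return steps


# Example: shrink a 16-node pool to 8 nodes.
print(scale_down_steps(16, 8))
```

After each planned step, performance and user experience would be verified before moving on to the next one, and the plan abandoned if either degrades.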
Engage in dedicated resource cleanup
Due to security and isolation requirements, some SaaS solutions allocate dedicated resources to tenants. When a new tenant joins the service, these resources are created on the fly; an example would be a new storage account spawned for each new tenant to store content created by the users. These resources must be deleted when a tenant leaves the service.
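Finding resources left behind by departed tenants amounts to comparing resource ownership against the list of active tenants. The Python sketch below assumes each dedicated resource records its tenant in a `tenant_id` field; that field, the resource names, and the function `orphaned_tenant_resources` are hypothetical.

```python
def orphaned_tenant_resources(resources, active_tenants):
    """Return names of tenant-dedicated resources whose tenant is no longer active."""
    active = set(active_tenants)
    orphans = []
    for resource in resources:
        tenant = resource.get("tenant_id")
        # Shared resources carry no tenant_id and are never flagged.
        if tenant is not None and tenant not in active:
            orphans.append(resource["name"])
    return orphans


# Made-up data: tenant "t-002" has left the service.
resources = [
    {"name": "storage-t001", "tenant_id": "t-001"},
    {"name": "storage-t002", "tenant_id": "t-002"},
    {"name": "shared-cache"},
]
print(orphaned_tenant_resources(resources, active_tenants=["t-001", "t-003"]))
```

Running a check like this on a schedule, and tying it to the tenant offboarding process, helps prevent dedicated resources from lingering on the bill.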
Key Azure FinOps takeaways
In closing, we would like to summarize the core ideas shared in this article as you start on your Azure FinOps journey:
- Form a FinOps team early in the process and set up cost budgets and alerts.
- Have your engineering team select the most affordable services and forecast their usage.
- Take advantage of lower-tier and less expensive services whenever your application service level agreement (SLA) allows it.
- Consolidate billing across your clients, negotiate volume discounts, and purchase reserved resources to deliver savings and improve margins.
- Remember to turn off the resources used for non-production platforms, such as development servers, outside of working hours.
- Right-size over-provisioned resources and remove unused resources.
CloudBolt is designed to help guide you through every step of the FinOps journey by offering analytics and automation across all major cloud providers. Learn more here about our functionality and support program designed for MSPs.