In this blog post, we walk through the continuous optimization process that CloudArmee follows when working on the ‘Cost Optimization’ aspect of a FinOps implementation for a customer.
At CloudArmee, we recognize that effective FinOps isn’t merely about cost reduction; it’s about empowering your business to make informed choices that align cloud spending with broader organizational goals. Our holistic FinOps methodology, honed through extensive experience and a deep understanding of the AWS ecosystem, equips you with the tools, processes, and insights needed to achieve cloud financial excellence.
To navigate the dynamic landscape of cloud financial management, understanding the core stages and principles of FinOps is essential. The following visuals illustrate the key components that guide organizations towards efficient and cost-effective cloud operations.
Pillars of AWS Cost Optimization
Pillar 1: Right size
Right-sizing your resources means matching the size of your resources to your actual needs. This is about AWS cost optimization through avoiding over-provisioning and under-provisioning, which can lead to wasted resources and increased costs.
Key strategies you can follow for AWS cost optimization using this pillar:
Pillar 2: Increase elasticity
This pillar of AWS cloud cost optimization focuses on scaling resources up or down dynamically based on demand. Elasticity enables you to meet fluctuating business demands without over-provisioning or under-utilizing resources.
Key strategies you can follow to optimize AWS costs with this pillar:
Pillar 3: Leverage the right AWS pricing model
This AWS cost optimization pillar advises that you choose the most appropriate pricing model for your workload and usage patterns in order to optimize AWS costs.
The four main AWS pricing models are:
The main idea is to leverage the best pricing model without sacrificing performance or availability.
Pillar 4: Optimize storage
This pillar of AWS cost optimization tells you to manage your storage efficiently while ensuring that your data is available when you need it.
Use the following recommendations to optimize AWS costs with the help of this pillar:
Pillar 5: Measure, monitor, and improve
This AWS cost optimization pillar focuses on continually measuring and monitoring your AWS resources to identify cloud cost optimization opportunities and implement changes to improve cost efficiency.
Apply the following tips to reduce AWS costs using this pillar:
Understanding the AWS bill
Navigating and understanding AWS bills can be difficult due to their complexity and detail. CloudArmee has dedicated teams with years of experience in analyzing customers’ current spend on AWS services. Organizations frequently face challenges in decoding usage and service charges, which are compounded by the sheer volume of services and pricing options AWS offers. This is where CloudArmee steps in to review billing errors or misjudged resource usage that inflate costs unnecessarily.
Accurately forecasting AWS costs:
We forecast AWS costs accurately for AWS cloud services, where workloads can scale based on demand. Predicting future costs requires a deep understanding of current and projected usage patterns, which can fluctuate widely with business activities and market conditions.
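As a rough illustration of forecasting from historical data, the sketch below fits a straight-line trend to past monthly totals and extrapolates it. The function name `linear_forecast` and the sample figures are hypothetical, and a linear trend is a deliberate simplification: it ignores seasonality and demand-driven swings, so treat it as a baseline rather than a production forecast.

```python
def linear_forecast(monthly_costs, months_ahead=1):
    """Extrapolate a least-squares straight line fitted to past monthly costs."""
    n = len(monthly_costs)
    if n < 2:
        raise ValueError("need at least two months of history")
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_costs) / n
    # Ordinary least squares with x = 0, 1, ..., n-1 as the month index.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(monthly_costs))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + months_ahead)


# Hypothetical monthly totals in USD.
history = [100.0, 110.0, 120.0, 130.0]
next_month = linear_forecast(history)
print(f"Forecast for next month: {next_month:.2f}")
```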
Cost accountability issues:
In organizations with multiple teams or divisions, assigning AWS costs can become complex, leading to accountability issues. Without clear visibility and attribution of expenses, it’s challenging to understand which department or project is driving costs. Our team helps establish control around this.
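One common way to attribute spend is to group billing line items by a cost allocation tag. The sketch below assumes each line item is a dictionary with a `cost` and an optional `tags` mapping; that shape, the `team` tag key, and the sample data are all hypothetical. Untagged spend is kept in its own bucket so that it stays visible instead of disappearing from the report.

```python
from collections import defaultdict


def costs_by_tag(line_items, tag_key="team", untagged_label="(untagged)"):
    """Sum cost per value of one tag; items without the tag go to a separate bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items; real data would come from a billing export.
items = [
    {"service": "EC2", "cost": 120.0, "tags": {"team": "payments"}},
    {"service": "S3", "cost": 30.0, "tags": {"team": "analytics"}},
    {"service": "RDS", "cost": 80.0, "tags": {"team": "payments"}},
    {"service": "EC2", "cost": 45.0, "tags": {}},
]
report = costs_by_tag(items)
print(report)
```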
Reducing AWS costs:
Optimizing the configuration and scaling of services to match workload requirements can significantly reduce costs, but this requires technical skill and effort. CloudArmee Engineers enforce policies that ensure teams are accountable for their cloud consumption and adhere to best practices for cost efficiency.
Taking actionable decisions:
Even with a wealth of data, many organizations find it difficult to take actionable decisions regarding AWS costs. We alleviate this challenge by using budget groups, which categorize expenses based on projects, departments, or business units, and forecasting tools which predict future spending based on historical data.
1. Billing and Cost Management Console
The AWS Billing and Cost Management Console is a central dashboard where users can view and manage their AWS costs and usage. It provides access to billing reports that break down expenses by service, geography, and linked accounts, enabling users to track spending patterns and make necessary adjustments.
In addition to standard billing information, the console offers tools for budgeting and forecasts, alerts, and recommendations. Users can set custom spending limits, receive notifications when usage approaches or exceeds these limits, and explore ways to optimize costs based on AWS’s personalized suggestions.
2. AWS Cost Explorer
Cost Explorer also offers the capability to filter and aggregate data based on various parameters such as service, tags, or time periods. This makes it suitable for detailed cost analysis and planning, helping organizations make informed decisions on where to allocate resources.
Right-Sizing Recommendations
Within the Cost Explorer tool, AWS provides the Rightsizing Recommendations tool, which analyzes your current resource usage and suggests changes to your instances and services that could save money. These recommendations are based on your actual usage patterns and can include suggestions to scale down or terminate resources that are consistently underutilized.
3. AWS Budgets
AWS Budgets gives customers the ability to set specific budget limits and receive alerts when their usage or costs exceed or are projected to exceed their budgeted amount. This tool is useful for managing financial compliance and ensuring that spending does not spiral out of control.
By defining expected cost and usage limits, organizations can align their cloud expenditures with their financial planning and operational objectives. AWS Budgets also integrates with other AWS cost management tools, providing a comprehensive view of financial performance and forecasts.
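The alerting behavior described here, on both actual and projected spend, can be sketched as a simple threshold check. This is not the AWS Budgets API; the function name, the 80% and 100% thresholds, and the message wording are assumptions chosen for illustration.

```python
def budget_alerts(budget, actual_to_date, forecast_total, thresholds=(0.8, 1.0)):
    """Return alert messages for actual spend crossing thresholds and for a forecast overrun."""
    alerts = []
    for threshold in thresholds:
        if actual_to_date >= budget * threshold:
            alerts.append(f"ACTUAL spend has reached {threshold:.0%} of budget")
    if forecast_total > budget:
        alerts.append("FORECAST spend is projected to exceed budget")
    return alerts


# Hypothetical month: 1,000 budgeted, 850 spent so far, 1,200 projected.
for message in budget_alerts(budget=1000, actual_to_date=850, forecast_total=1200):
    print(message)
```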
4. AWS Trusted Advisor
AWS Trusted Advisor is an online resource that scans your AWS environment and provides real-time recommendations concerning cost optimization, performance, security, and fault tolerance. By highlighting areas where resources are underutilized or improperly configured, it helps users reduce costs and improve system performance and security.
The tool provides actionable insights and guidelines that can lead to significant cost savings. For example, it suggests where Reserved Instances can be applied to reduce long-term costs, identifies idle resources that are wasting money, and recommends security enhancements that could prevent costly data breaches.
5. AWS Cost Anomaly Detection
AWS Cost Anomaly Detection is a service that uses machine learning to monitor for unusual spikes in spending that may indicate misconfigurations, unintended deployments, or potential security issues. It automatically analyzes historical spending data to detect anomalies and alerts users through integrated notification services when irregular spending patterns occur.
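To give a feel for what flagging an unusual spike means, here is a deliberately simple statistical stand-in: it compares each day's cost with the mean and standard deviation of the preceding week. This is not the machine learning model the service uses; the 7-day window, the z-score threshold of 3, and the sample data are assumptions.

```python
from statistics import mean, stdev


def find_spend_anomalies(daily_costs, window=7, z_threshold=3.0):
    """Flag days whose cost deviates strongly from the trailing window's mean."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            is_anomaly = daily_costs[i] != mu
        else:
            is_anomaly = abs(daily_costs[i] - mu) / sigma > z_threshold
        if is_anomaly:
            anomalies.append((i, daily_costs[i]))
    return anomalies


# Hypothetical daily spend with one obvious spike on day index 7.
costs = [100, 102, 98, 101, 99, 103, 100, 250, 101]
print(find_spend_anomalies(costs))
```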
6. AWS CUDOS
CUDOS (Cost and Usage Dashboards Operations Solution) is a dashboard built on the AWS Cost and Usage Report (CUR) that provides detailed insights into your AWS costs and usage. It consolidates billing data from all your AWS accounts into a single, customizable view, enabling a granular analysis of spending and resource consumption. Organizations can break down costs by dimensions such as service, account, region, and usage type.
7. 3rd Party AWS Cost Management Tools – CloudCheckr
CloudCheckr focuses on seven key areas: cloud visibility, cost optimization, resource utilization, security optimization, compliance monitoring, service enablement, and pricing and billing. It helps you to:
1. Continually optimize cloud spend, by 30% or more, by responding to trends and specific recommendations.
2. Constantly assess and optimize your environment using hundreds of industry and proprietary Best Practice Checks for cost optimization.
3. Assess cloud provider volume discount purchase options and automatically optimize recommendations for Savings Plans and Reserved Instances.
4. Properly analyze consumption and associated costs and identify waste using an advanced cost query engine that surfaces the most detailed level of information.
We use CloudCheckr to gain unique visibility and recommendations to optimize cloud spend, right size resources, and eliminate waste to save 30% or more. It is extensively utilized to assess cost performance and trends to drive organizational alignment on cloud strategies and governance. CloudCheckr helps –
These recommendations highlight cases where it may make sense to do one of the following:
1. Terminate idle instances
2. Right size underutilized instances
3. Upgrade to more modern instance types including Graviton
Idle instances are instances that have lower than 1% maximum CPU utilization. Underutilized instances are instances with maximum CPU utilization between 1% and 40%.
When an idle instance is detected, AWS generates a termination recommendation. When an underutilized instance is identified, AWS simulates covering that usage with a smaller instance within the same family. If bundling several smaller instance sizes within the same family could provide savings, AWS shows three rightsizing options.
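The thresholds above translate directly into a small classifier. The sketch below applies them to maximum CPU utilization figures; the function name, the action wording, and the instance IDs are hypothetical, and the real recommendation engine considers more than CPU alone.

```python
def classify_instance(max_cpu_percent):
    """Classify an instance by maximum CPU utilization using the thresholds above."""
    if max_cpu_percent < 1.0:
        return "idle"            # candidate for termination
    if max_cpu_percent <= 40.0:
        return "underutilized"   # candidate for a smaller size in the same family
    return "ok"


# Hypothetical fleet: instance ID -> maximum CPU utilization (%) over the lookback period.
fleet = {"i-aaa": 0.4, "i-bbb": 22.0, "i-ccc": 71.5}
actions = {"idle": "terminate", "underutilized": "downsize", "ok": "keep"}
for instance_id, cpu in fleet.items():
    status = classify_instance(cpu)
    print(instance_id, status, actions[status])
```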
Dynamic scaling, or elasticity, is a key component of a Well-Architected and cost-effective solution in AWS. To take full advantage of the cloud, your resources should match current demand rather than peak demand, so that you pay only for what you need when you need it.
Elasticity
Elasticity in AWS is the ability to scale your resources up or down automatically or manually based on demand. This can lead to significant cost savings by avoiding over-provisioning resources during periods of low demand and ensuring sufficient capacity during peaks.
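To make the saving concrete, the following sketch compares the instance-hours consumed when capacity follows demand with the instance-hours consumed when capacity is fixed at the daily peak. The demand profile and the per-instance capacity of 100 requests per second are invented for illustration.

```python
import math


def instance_hours(hourly_demand, capacity_per_instance, elastic=True, min_instances=1):
    """Total instance-hours needed to serve the demand, with or without scaling."""
    needed = [
        max(min_instances, math.ceil(demand / capacity_per_instance))
        for demand in hourly_demand
    ]
    if elastic:
        return sum(needed)               # capacity follows demand hour by hour
    return max(needed) * len(needed)     # capacity pinned at the peak all day


# Hypothetical day: 8 quiet hours, 8 peak hours, 8 moderate hours (requests per second).
demand = [200] * 8 + [900] * 8 + [400] * 8
elastic_hours = instance_hours(demand, capacity_per_instance=100, elastic=True)
fixed_hours = instance_hours(demand, capacity_per_instance=100, elastic=False)
print(f"elastic: {elastic_hours} h, fixed at peak: {fixed_hours} h")
print(f"saving: {1 - elastic_hours / fixed_hours:.0%}")
```

For a workload whose demand is flat, both figures come out the same, which matches the point that constant usage may not merit adding elasticity.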
Historical Usage
Examining your historical usage patterns is crucial when implementing elasticity. CloudWatch metrics can provide valuable insights into how your application’s usage fluctuates over time. By analyzing these metrics, you can identify trends and patterns, such as peak usage times or seasonal variations. The characteristics of an application may not merit adding elasticity if the usage is constant. This information allows you to configure auto-scaling policies that dynamically adjust your resources based on actual demand, optimizing costs while maintaining performance. Some relevant metrics are:
Key AWS Services for Elasticity
There are many different storage technologies: block storage, object storage, solid-state disks, memory, and so on. Each of these has a different set of characteristics that perform well in certain scenarios and poorly in others. For example, solid-state devices have high throughput and fast access but are more expensive per byte than spinning disks; if you are building a system that is designed for archiving large volumes of infrequently accessed data, using SSDs may not be cost-effective. Conversely, using a spinning disk for the random-access requests that a database is likely to make could lead to queries taking longer.
As with many parts of a system, there are trade-offs to be made between cost-efficiency, storage reliability, and performance. By choosing the best-suited technology, your system can be as efficient as possible with resources while remaining suitably reliable.
Storage optimization is the collection of processes, frameworks, and technologies that enable the efficient use of storage infrastructure and resources. It is a broad concept that works across all the technological and management layers of storage management to ensure existing storage resources are working in an efficient and cost-effective way. Storage optimization primarily helps to minimize disk and storage use across all storage tiers and resources. Typically, the goals of storage optimization are to reduce storage hardware and administration costs, consolidate existing storage resources through server virtualization, gain system-wide visibility into storage resources, and minimize storage administration. Data classification and visualization, storage virtualization, tiered storage architecture, integration of storage management software, and policy-based storage automation are some of the processes and technologies used in storage optimization.
At CloudArmee, we use the following services and tools to optimize storage costs:
CloudCheckr
CloudCheckr provides the following storage recommendations:
Logical Volume Manager
In some cases, you might want to use Logical Volume Manager and combine (also known as stripe) several disks in order to provide the target input/output operations per second (IOPS) and throughput. This configuration is typically more cost advantageous than a single volume of a higher class. However, this approach can lead to some additional unused space that you should balance against size and cost considerations.
Amazon S3 Lifecycle Policies
S3 Lifecycle policies are the backbone of automated data management within Amazon S3, enabling you to define rules that govern the lifecycle of your objects from creation to expiration based on their age. These policies empower you to optimize storage costs, adhere to compliance requirements, and streamline your storage infrastructure.
Types of Lifecycle Policies
Transition Actions:
Expiration Actions:
By implementing effective lifecycle policies, you can streamline data management, reduce storage costs, and ensure that your valuable data is stored in the most appropriate and cost-efficient manner throughout its lifecycle.
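A lifecycle rule combining transition and expiration actions can be expressed as a plain dictionary. The sketch below builds one in the shape accepted by the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration`; the helper name, rule ID, prefix, and day counts are hypothetical, and the code only constructs and validates the configuration without calling AWS.

```python
def build_lifecycle_rule(rule_id, prefix, transitions, expire_after_days):
    """Build one S3 lifecycle rule from (days, storage_class) transitions and an expiry."""
    days = [d for d, _ in transitions]
    if days != sorted(days):
        raise ValueError("transitions must be listed in increasing order of days")
    if transitions and expire_after_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": sc} for d, sc in transitions],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: logs move to Standard-IA at 30 days, Glacier at 90, and expire at 365.
config = {
    "Rules": [
        build_lifecycle_rule(
            "archive-logs", "logs/", [(30, "STANDARD_IA"), (90, "GLACIER")], 365
        )
    ]
}
print(config)
```

With boto3 installed and credentials configured, this dictionary could be applied with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`.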
AWS S3 Intelligent-Tiering
AWS S3 Intelligent-Tiering automates storage-class transitions and reduces operational work for buckets with unpredictable or changing data access patterns. This storage class adapts to your evolving needs, automatically moving objects between access tiers based on usage, so that you pay only for the tier each object actually needs.
Features:
Usage Profile:
How It Works:
Intelligent-Tiering monitors access patterns and automatically moves objects between two tiers:
If an object in the Infrequent Access tier is accessed again, it’s automatically moved back to the Frequent Access tier without performance impact.
Amazon EBS Underutilized Volume Capacity
Amazon Elastic Block Store (Amazon EBS) volumes are used to store your persistent data such as the operating system, SAP binary files, and your SAP database files. The sizing of these volumes is critical when balanced with cost optimization. It is extremely easy to increase the size of a volume; therefore, you should provision volumes with enough room to hold the actual volume of data plus a small percentage to accommodate growth. Right sizing overprovisioned volumes can be labor intensive and require planned downtime. If this is where a large percentage of your spend is occurring, though, it makes sense to embark on such an activity.
As part of your cost optimization strategy, the review of overprovisioned EBS volumes should be conducted every other review cycle. If there are continuous opportunities in this particular area, automation can be developed to alleviate some of the manual effort involved.
Optimize Volume Types to General Purpose v3
GP3 offers a compelling way to optimize your EBS volumes by providing the flexibility to fine-tune performance and cost independently. Every GP3 volume includes a baseline of 3,000 IOPS and 125 MiB/s of throughput, at a price per GiB up to 20% lower than GP2. For any volume under 1 TB, GP3 is a very safe optimization that will also increase baseline performance.
GP3 Features:
GP3 strikes a balance between cost and performance, making it an ideal choice for various workloads, including databases, web servers, and development environments. By adopting GP3, you can enhance your EBS strategy, achieving optimal performance at a cost-effective price point.
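The reasoning behind the sub-1 TB rule of thumb can be checked with a little arithmetic: GP2 baseline performance scales at 3 IOPS per GiB (minimum 100, maximum 16,000), whereas GP3 starts at a flat 3,000 IOPS. In the sketch below the per-GiB prices are placeholders chosen to be 20% apart, not quoted AWS prices, and the comparison ignores GP2 burst credits.

```python
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib):
    """GP2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return min(16000, max(100, 3 * size_gib))


def compare_gp2_gp3(size_gib, gp2_price_per_gib=0.10, gp3_price_per_gib=0.08):
    """Compare baseline IOPS and monthly storage cost; prices are illustrative assumptions."""
    return {
        "gp2_iops": gp2_baseline_iops(size_gib),
        "gp3_iops": GP3_BASELINE_IOPS,
        "monthly_saving": round(size_gib * (gp2_price_per_gib - gp3_price_per_gib), 2),
    }


for size in (100, 500, 1000, 2000):
    print(size, compare_gp2_gp3(size))
```

Below 1,000 GiB the GP2 baseline is under 3,000 IOPS, so the move improves baseline performance as well as price; for larger volumes, GP3 IOPS may need to be provisioned above the baseline to match GP2.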
Amazon EFS Lifecycle Policy
Amazon Elastic File System (Amazon EFS) automatically scales storage as files are added or removed. In SAP environments, it is typically used for your transport, interface, and cluster file systems. You might also use it to temporarily store software for active projects, although in the long term it is recommended that you store software in an S3 bucket.
For SAP interface file systems, consider configuring an EFS lifecycle policy. With lifecycle policies, you can reduce storage costs by transitioning files with specific retention requirements to a lower-cost storage class. If you use the industry-accepted estimate, 80% of your data is infrequently accessed. Moving this data to lower-cost storage classes is a prime opportunity to reduce your costs. For more information, see Amazon EFS Infrequent Access.
As part of your cost optimization strategy, review EFS lifecycle policies every other review cycle. If there are continuous opportunities, enable EFS lifecycle policies to automate the process and implement a clean-up routine.
Snapshot Management
Amazon Data Lifecycle Manager (DLM) leverages policy-based management to match your snapshot automation to your business and compliance requirements. It automates the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs.
When you automate snapshot and AMI management, it helps you to:
Types of Policies in DLM
DLM can be combined with Amazon EventBridge and AWS CloudTrail to provide a complete backup solution for Amazon EC2 instances. There is no additional cost for using Amazon Data Lifecycle Manager.
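The retention side of such a policy can be pictured with a count-based rule: keep the newest N snapshots and delete the rest. The sketch below is not the DLM API, only an illustration of the logic a count-based retention rule applies; the function name and snapshot IDs are hypothetical.

```python
from datetime import date


def snapshots_to_delete(snapshots, retain_count):
    """Given (snapshot_id, created_date) pairs, return IDs beyond the newest retain_count."""
    newest_first = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    return [snapshot_id for snapshot_id, _ in newest_first[retain_count:]]


# Hypothetical daily snapshots of one volume.
snaps = [
    ("snap-a", date(2024, 1, 1)),
    ("snap-b", date(2024, 1, 2)),
    ("snap-c", date(2024, 1, 3)),
    ("snap-d", date(2024, 1, 4)),
]
print(snapshots_to_delete(snaps, retain_count=2))
```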
CloudCheckr helps identify many unused or idle resources, such as –
Unused Transit Gateways, VPCs, and VPN Gateways
Reserved Instances and Savings Plans enable our customers to commit to a certain capacity for a specified duration, resulting in significant cost savings compared to On-Demand pricing. We analyze each AWS workload’s usage patterns using the recommendation reports in the Billing Console and, in concert with the customer’s business plan and roadmaps, recommend purchasing Savings Plans or RIs for workloads with predictable usage. AWS offers various RI types, including Standard RIs and Convertible RIs, each suited for different scenarios, as well as Compute and EC2 Instance Savings Plans. When a business purchases a Reserved Instance or Savings Plan, it commits to a certain amount of resources for a specified period of time. The business is then billed at a lower rate for that usage than the standard rates for the same services. Reserved Instances and Savings Plans can be purchased for 1- or 3-year terms, and the longer the term, the greater the discount.
Overall, both these methods allow you to save costs while ensuring that you have the computing capacity you need to run your applications and services efficiently.
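Whether a commitment pays off depends on how much of the term the workload actually runs. The sketch below compares paying On-Demand for the hours used with paying a discounted rate for every hour of the term; the 40% discount and the hourly rate are assumed figures for illustration, not published prices.

```python
def breakeven_utilization(discount):
    """Fraction of the term a workload must run for the commitment to match On-Demand."""
    return 1.0 - discount


def commitment_saving(on_demand_rate, discount, hours_used, hours_in_term):
    """Saving (positive) or loss (negative) from committing versus paying On-Demand."""
    committed_cost = on_demand_rate * (1 - discount) * hours_in_term  # paid for every hour
    on_demand_cost = on_demand_rate * hours_used                      # paid only when running
    return on_demand_cost - committed_cost


HOURS_PER_YEAR = 8760
# Hypothetical instance at 0.10 per hour with an assumed 40% commitment discount.
print(breakeven_utilization(0.40))
print(commitment_saving(0.10, 0.40, hours_used=8760, hours_in_term=HOURS_PER_YEAR))
print(commitment_saving(0.10, 0.40, hours_used=4380, hours_in_term=HOURS_PER_YEAR))
```

A workload that runs all year comes out ahead, while one that runs only half the year would have been cheaper On-Demand, which is why commitments suit predictable usage.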
Reserved Instances are available for:
Savings Plans can be purchased for EC2 Instances or as a more general Compute type (covering EC2, Fargate, and Lambda). A separate Savings Plan exists for SageMaker. The EC2 Instance Savings Plan offers a higher discount than the broader Compute Savings Plan.
AWS Billing and Cost Management RI Recommendations
CloudArmee proactively identifies potential cost savings by analyzing instance usage and suggesting suitable RIs. This is done in conjunction with Savings Plan Analysis.
Key Distinctions:
Term: Recommendations cover both one-year and three-year terms, allowing you to balance cost savings with flexibility based on your workload predictability.
Type: Recommendations include both Standard and Convertible RIs. Standard RIs offer the highest discounts for specific instance types, while Convertible RIs can be exchanged during the term for RIs with a different instance family, operating system, or tenancy, though at a slightly lower discount.
Scope:
Account Level: Recommendations can be generated for the current account, giving you targeted insights into your own usage patterns.
Organization Level: For AWS Organizations, recommendations can also be generated at the organization level, considering usage across all member accounts and maximizing cost savings potential across the entire organization.
Reserved Instance Recommendation Console
AWS Reserved Instances are available for several AWS services, including Amazon Elastic Compute Cloud (EC2) and Amazon Relational Database Service (RDS). EC2 Reserved Instances come in three types: Standard, Convertible, and Scheduled, although AWS no longer offers Scheduled Reserved Instances for new purchases.
Each type of AWS Reserved Instance offers unique benefits, and the right choice will depend on your specific needs and usage patterns. It’s essential to evaluate your workload and usage patterns carefully before committing to a specific type of Reserved Instance to ensure that you maximize your savings and optimize your infrastructure usage.
AWS Billing and Cost Management Savings Plan Recommendations
CloudArmee proactively identifies potential cost savings by analyzing instance and workload usage and suggesting a suitable Savings Plan. This is done in conjunction with Reserved Instance Analysis.
Key Distinctions:
Term: Recommendations cover both one-year and three-year terms, allowing you to balance cost savings with flexibility based on your workload predictability.
Type: It includes suggestions for both Compute Savings Plans and EC2 Instance Savings Plans. Compute Savings Plans offer the most flexibility, applying to various EC2 instance families and sizes, as well as Fargate and Lambda. EC2 Instance Savings Plans provide the highest discount but are specific to instance families in a region.
Scope:
Account Level: Recommendations can be generated for the current account, giving you targeted insights into your own usage patterns.
Organization Level: For AWS Organizations, recommendations can also be generated at the organization level, considering usage across all member accounts and maximizing cost savings potential across the entire organization.
Savings Plan Recommendation Console
Savings Plans are a great way to receive a discount: they offer low prices on Amazon EC2, AWS Lambda, and AWS Fargate usage in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a 1- or 3-year term. When you sign up for a Savings Plan, you are charged the discounted Savings Plans price for your usage up to your commitment. AWS offers two types of Savings Plans:
Compute Savings Plans
Compute Savings Plans provide the most flexibility and help to reduce your costs by up to 66%. These plans automatically apply to EC2 instance usage regardless of instance family, size, AZ, Region, OS or tenancy, and also apply to Fargate or Lambda usage. For example, with Compute Savings Plans, you can change from C4 to M5 instances, shift a workload from EU (Ireland) to EU (London), or move a workload from EC2 to Fargate or Lambda at any time and automatically continue to pay the Savings Plans price.
EC2 Instance Savings Plans
EC2 Instance Savings Plans provide the lowest prices, offering savings up to 72% in exchange for commitment to usage of individual instance families in a Region (e.g. M5 usage in N. Virginia). This automatically reduces your cost on the selected instance family in that region regardless of AZ, size, OS or tenancy. EC2 Instance Savings Plans give you the flexibility to change your usage between instances within a family in that region. For example, you can move from c5.xlarge running Windows to c5.2xlarge running Linux and automatically benefit from the Savings Plan prices.
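The way an hourly commitment is applied can be sketched as follows: usage is charged at the discounted rate until the commitment is used up, and anything beyond that is billed On-Demand. The single blended discount below is a simplifying assumption (real discounts vary by instance type, Region, and plan), and the figures are hypothetical.

```python
def hourly_charge(on_demand_usage, commitment, discount):
    """Charge for one hour given usage valued at On-Demand rates and a $/hour commitment."""
    # Amount of On-Demand-priced usage that the commitment covers at the discounted rate.
    covered_on_demand = commitment / (1 - discount)
    overflow = max(0.0, on_demand_usage - covered_on_demand)
    # The commitment is always paid, even in hours where usage falls short of it.
    return commitment + overflow


# Hypothetical plan: 6.00 per hour committed at an assumed 40% discount,
# which covers 10.00 per hour of On-Demand-priced usage.
print(hourly_charge(on_demand_usage=15.0, commitment=6.0, discount=0.40))
print(hourly_charge(on_demand_usage=8.0, commitment=6.0, discount=0.40))
```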
DynamoDB Reserved Capacity is a cost-saving feature that allows you to reserve a specific amount of read and write capacity for your DynamoDB tables at a discounted rate compared to standard provisioned capacity pricing. This discount can range from 54% for a one-year term to 77% for a three-year term.
Program Details:
Purchase: You commit to reserving a certain level of read and write capacity units (RCUs and WCUs) for either one or three years. This commitment can be made through the AWS Management Console, CLI, or SDKs.
Discount: In exchange for your commitment, you receive a significant discount on the hourly rate for provisioned throughput capacity compared to the standard provisioned capacity price.
Application: Once purchased, the reserved capacity is applied to the aggregate capacity of all your DynamoDB tables within the specified AWS Region. Any provisioned capacity beyond your reserved amount will be billed at standard provisioned capacity rates.
Billing: You will be charged a one-time upfront fee (optional) and a discounted hourly rate for the duration of the reserved capacity term. Even if you don’t utilize the full reserved capacity, you will still be charged for the reserved amount.
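The billing rule in this list, where the reserved amount is paid whether or not it is used and anything above it is billed at the standard rate, can be written as a short calculation. The rates below are arbitrary placeholder numbers rather than DynamoDB prices, and the function name is hypothetical.

```python
def hourly_capacity_cost(provisioned_units, reserved_units, reserved_rate, standard_rate):
    """Hourly cost when part of the provisioned capacity is covered by a reservation."""
    reserved_cost = reserved_units * reserved_rate        # charged even if left unused
    excess_units = max(0, provisioned_units - reserved_units)
    return reserved_cost + excess_units * standard_rate   # excess billed at the standard rate


# Placeholder rates: 0.5 per reserved unit-hour, 1.0 per standard unit-hour.
print(hourly_capacity_cost(150, 100, reserved_rate=0.5, standard_rate=1.0))
print(hourly_capacity_cost(60, 100, reserved_rate=0.5, standard_rate=1.0))
```

The second call shows the cost of over-reserving: only 60 units are provisioned, yet all 100 reserved units are still billed.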
Key Considerations:
When to Use DynamoDB Reserved Capacity:
To manually identify usage and unused resources:
Look at the following CloudWatch metrics over a period of 30 days to understand if there are any active reads or writes on a specific table:
ConsumedReadCapacityUnits
The number of read capacity units consumed over the specified time period, which lets you track how much of your provisioned read capacity is actually used. You can retrieve the total consumed read capacity for a table.
ConsumedWriteCapacityUnits
The number of write capacity units consumed over the specified time period, which lets you track how much of your provisioned write capacity is actually used. You can retrieve the total consumed write capacity for a table.
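Once the two metrics have been retrieved, the check itself is simple: a table with no consumed read or write capacity across the 30-day window is a candidate for review. The sketch below assumes datapoints shaped like the `Datapoints` list returned by CloudWatch's `get_metric_statistics` when the `Sum` statistic is requested; the sample values and function name are hypothetical, and no AWS call is made.

```python
def is_table_unused(read_datapoints, write_datapoints):
    """True when no read or write capacity was consumed in the supplied datapoints."""
    total_reads = sum(point["Sum"] for point in read_datapoints)
    total_writes = sum(point["Sum"] for point in write_datapoints)
    # An empty datapoint list also counts as no activity.
    return total_reads + total_writes == 0


# Hypothetical daily sums for two tables.
active_reads = [{"Sum": 120.0}, {"Sum": 0.0}, {"Sum": 35.0}]
active_writes = [{"Sum": 4.0}, {"Sum": 0.0}, {"Sum": 0.0}]
print(is_table_unused(active_reads, active_writes))
print(is_table_unused([{"Sum": 0.0}], []))
```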
An AWS EC2 Spot Instance is an instance that uses spare EC2 capacity, which is available for less than the On-Demand price. Spot Instances are up to 90% cheaper than On-Demand Instances, which can significantly reduce your EC2 costs. The Spot price is the hourly rate for a Spot Instance. AWS sets the Spot price for each instance type in each Availability Zone based on the evolving supply and demand for Spot Instances. Spot Instances are cost-effective when you can be flexible with your application’s availability and when your applications can be interrupted after a two-minute warning notification.
Spot instances are ideal for stateless, error-tolerant, or flexible applications like data analysis, batch jobs, background processing, and optional tasks. These instances are closely integrated with AWS services like Auto Scaling, EMR, ECS, CloudFormation, Data Pipeline, and AWS Batch. You can easily combine Spot instances with On-Demand, RI, and Savings Plans instances to optimize workload costs and performance.
Reasons to Use Spot Instances
1. Low prices
2. Massive scale
3. Easy to use
4. Easy to automate
5. Supports other AWS services
We use 2 key strategies for using Spot Instances:
1. Maintain a minimum number of compute resources by launching a core group of On-Demand Instances and supplementing them with Spot Instances when required.
2. Launch Spot instances with a fixed duration, called Spot blocks, which are designed to be uninterrupted and run continuously for the duration you choose.
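The first strategy, an On-Demand core topped up with Spot capacity, can be sketched as a capacity split plus a blended hourly cost. The instance counts, the hourly price, and the 70% Spot discount below are assumptions for illustration; actual Spot prices vary by instance type and Availability Zone.

```python
def split_capacity(required_instances, on_demand_base):
    """Fill the On-Demand base first, then cover the remainder with Spot."""
    on_demand = min(required_instances, on_demand_base)
    spot = required_instances - on_demand
    return on_demand, spot


def blended_hourly_cost(required_instances, on_demand_base, on_demand_price, spot_discount):
    """Hourly cost of the mixed fleet, with Spot priced at a discount to On-Demand."""
    on_demand, spot = split_capacity(required_instances, on_demand_base)
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand * on_demand_price + spot * spot_price


# Hypothetical fleet: 10 instances needed, 4 kept On-Demand, assumed 70% Spot discount.
print(split_capacity(10, 4))
print(blended_hourly_cost(10, 4, on_demand_price=0.10, spot_discount=0.70))
```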
Recommendations
The AWS Spot Instance Advisor helps you find a Spot Instance type that fits the workload’s specifications, such as Region, operating system, vCPUs, and memory. For each instance type, it lists the historical frequency of interruption, which assists in picking instance types with a reliable historical inventory of Spot capacity.