Cloud Security Orienteering
How to orienteer in a cloud environment, dig in to identify the risks that matter, and put together actionable plans that address short, medium, and long term goals.
This post breaks down a methodology for how to secure an AWS environment with which you’re completely unfamiliar.
This guide will give you the keys to success, covering:
Why applying cloud security best practices is so hard.
Understanding common patterns across cloud adoption, architecture, and implementation.
Identifying the scope and contents of your cloud ecosystem, from environments down to individual resources.
How to identify important risks and prioritize accordingly, including a number of open source tools to help with tasks like finding privilege escalation risks and enforcing least privilege.
Pointers on defining a long term cloud-security strategy.
Most of us are not lucky enough to have architected the perfect cloud environment, according to this month’s best practices, and without any legacy elements or “surprise” assets.
As a security consultant, I had the challenge and opportunity to enter blind into a variety of cloud environments. They were across Azure, GCP, and AWS, some well-architected and others organically sprawling, containing anywhere from a single account/project to hundreds.
This gave me a rapid education in how to find the information necessary to familiarize myself with the environment, dig in to identify the risks that matter, and put together remediation plans that address short, medium, and long term goals. In these engagements, clients were paying for a security assessment, but executing one was an exercise in orienteering.
Over the course of a career in cloud security, you’ll likely find yourself walking into a new environment and needing to rapidly orient yourself to both mitigate the biggest risks and also develop a roadmap towards a sustainable, secure future. There are a variety of ways the methodology I’ve defined can be applied, including:
A new job
A new team
A merger or an acquisition
A fresh assessment of your current environment.
In order to thoughtfully analyze cloud environments, it’s crucial we understand why they tend to be inconsistently architected and secured.
Securosis, a really incredible information security research and advisory firm, has broken out four common patterns of cloud adoption. I highly recommend their whole blog post, but the following is an excerpted version of their analysis:
Note the trend. In three of the four patterns, security lags behind cloud adoption, with high or variable risk.
Unless an organization is believed to have exceptional cloud maturity, it’s a fair expectation that upon gaining access you’ll find a broad array of security risks, misconfigurations, and security debt entwined with technical debt.
Cloud Architecture Security Best Practices
The cloud is a relatively new concept, with standards and best practices that are elastic and evolving. Now, if we’re going to critique the current state of an environment, we need a clear and well-founded definition of the target end state.
This can be a hard question in cloud architecture, due to:
Broad configurability and customization
High complexity ceiling
More services than any person could learn: 200+ in AWS, 150 of which have a security chapter
Limitations on the platform side of the shared responsibility model that are poorly, inconsistently documented (See: Matt Fuller’s The Shared Responsibility Model for Cloud Security is Broken)
Note: From here on out, I’m going to use AWS for all examples. However, we’re going to be talking principles, nothing that shouldn’t be applicable to other cloud providers … even Oracle.
AWS Architecture Security Best Practices
So, what does good look like in AWS? What comes to mind when I say “AWS architecture security best practices”?
Maybe it is the AWS Well-Architected Framework - including the security pillar?
Or maybe the prescriptive AWS Security Reference Architecture, “a holistic set of guidelines for deploying the full complement of AWS security services in a multi-account environment”?
For more recent cloud adopters, maybe it is AWS Control Tower? “The easiest way to set up and govern a secure, multi-account AWS environment” … that only supports a narrow set of requirements and generally is only applicable for greenfield deployments.
“Best Practice” has Evolved over Time
Part of the complexity here is that over the last 10 years, AWS (and every cloud provider) has been rapidly improving and iterating on its recommendations, building new features for security and security orchestration.
You’ll often see permanent architectural evidence of when an organization started its cloud journey.
2010: Multiple AWS Accounts + Consolidated Billing
2017: AWS Organizations 1.0: account management and billing
2020: AWS Organizations 2.0: services are operating at an organization level
Organizations, now a standard recommendation and expectation for cloud security maturity, weren’t even around until 2017. It is therefore very common that organizations pre-date even the first versions of this and many other constructs.
As another example we can break down the AWS Security Best Practices Whitepaper, starting in 2017 and ending with the final release in 2020, at which point it was retired in favor of more dynamic artifacts. This is directly inspired by Scott Piper, who in 2018 published a comparison of the year’s updates, with significant changes including:
Use Shield, WAF, and Firewall Manager
Use Athena to search and analyze logs (not ElasticSearch or EMR)
CloudFormation as a key service
No more Macie (this was right around the time there was extensive discussion of how the pricing model made it inadvisable in almost all cases)
Between 2018 and 2020, major changes include:
Account Management and Separation as a top level concern - AWS Organizations
Recommendation of a federated identity provider
Frequent reference to AWS Security Hub (+ Config Rules)
Automatic remediation with EventBridge and Config triggering Lambdas
Systems Manager, software integrity
SCPs for data protection
Significantly expanded Incident Response section
OK, So Where does that Leave Us?
While there doesn’t seem to be one answer here - for reference I’ll roughly be mapping to three standards:
Alright, now let’s get into how to effectively and rapidly orient yourself in a new and complex cloud environment.
Assumptions about Your Organization
Before we dive in, some assumptions we’ll place on the organization you’re analyzing:
Cooperative (but not omniscient) help.
Good intentions in design - but no prior security architecture.
We’re not talking multi-cloud - you’re on your own there.
Requisite access has been established.
No consideration of an active or historic compromise.
This is not an Incident Response guide.
It is not uncommon for this practice to discover data meriting investigation, so be prepared.
There are some core principles to keep in mind while orienteering in the cloud:
Breadth, then depth
When orienteering, it is important that you gain enough context to inform prioritization. To that end, you should focus your initial efforts on breadth of coverage, and avoid extraneous depth of investigation.
As a corollary, avoiding rabbit holes is crucial, as they are an inefficient use of limited time if pursued before broad organizational and environmental context is established. Once you’ve broadly indexed the environment, you can then make informed decisions on where additional rigor is necessary.
Orienteering is an exercise in pattern matching, based on experience and awareness of common patterns and practices, as well as familiarity with norms of a specific environment. Some of this is gut, but much of it is about gaining an understanding of the standards within an environment, and finding locations that deviate.
To this end, it is important to review every region, every project, and every account.
If an organization has all its infrastructure in us-east-2, it is then likely the lone resource in Bahrain is problematic. Similarly, if you know the organization adopted serverless for all infrastructure multiple years ago, EC2 instances bear extra scrutiny.
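As a rough illustration of this kind of pattern matching, the sketch below counts resources per region and flags regions that hold only a sliver of the estate. The inventory format (a list of region and resource ID pairs), the function name, and the 5% threshold are all assumptions made for the example, not the output of any particular tool.

```python
from collections import Counter


def find_outlier_regions(inventory, max_share=0.05):
    """Return {region: count} for regions holding at most max_share of all resources."""
    counts = Counter(region for region, _resource_id in inventory)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        region: count
        for region, count in counts.items()
        if count / total <= max_share
    }


# Hypothetical inventory: 40 resources in us-east-2 and a single one in Bahrain (me-south-1).
inventory = [("us-east-2", f"i-{n:04d}") for n in range(40)]
inventory.append(("me-south-1", "i-9999"))

outliers = find_outlier_regions(inventory)
print(outliers)
```

A flagged region is only a prompt for a question to the environment’s owners, not a finding in itself.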
Inside out and outside in
A major benefit you have in this process over a malicious actor is your authenticated access to the environment(s). You should take maximum advantage of this, by leveraging read access (e.g., ReadOnlyAccess) to the cloud management plane to comprehensively query and enumerate the state of the environment.
Meanwhile, you should not avoid emulating an attacker’s perspective. There are numerous existing guides to discovering exposed resources, for example on Hacking the Cloud or in Felipe Eposito’s Hunting for AWS Exposed Resources. For an enumeration of possibly exposable resources, see Summit Route’s GitHub repository. At the most basic level, this includes doing attack surface discovery with a tool like OWASP’s Amass. This is the primary way for you to ensure coverage of unknown unknowns.
Orienteering is heavily dependent on the excavation of historic artifacts within an organization, in order to contextualize and guide you within the cloud environment.
There are numerous forms of artifacts an organization might create that can come in handy when trying to acquaint yourself with their cloud posture. If you’re extremely lucky, there is already a detailed asset inventory. If you’re a little lucky, most resources will be configured as code (Terraform, CloudFormation, Pulumi, Chef, Ansible, Puppet). That gives a definitive (and potentially centralized) source of truth for cloud configuration.
Most organizations, especially those with a mature compliance or governance function, should have data classification and designation of scope for varying types of data. Documentation of all forms is also useful, and it is important to discover how the organization naturally generates documentation: Is it via a wiki? Is it in various Google documents? Is it stored in READMEs alongside code?
Standardized tagging practices are another rich source of data, and even inconsistent tagging should be mined for context. (If you haven’t already rolled out standardized tags, check out Yor to automate it!)
All organizations should have subject matter experts on the company and the environment who should be available to pass on tribal knowledge.
Finally, identify and take advantage of any existing cloud security tools. This can include features of the cloud service provider, third-party vendors, or open source tools.
No matter the data available, early in this process you will need to either find or create:
Architecture diagrams or documentation of intended workloads.
Definition of crown jewels: Every organization has a unique set of data that is identified by the business as the most sensitive if compromised. In a regulated industry this can be fairly obvious, while in other cases it requires an acute understanding of the business’ priorities.
Information on intended authentication and identity approach, in order to formulate a strategy to wrangle access.
Hierarchy of Discovery
There is a hierarchical structure to cloud environments.
At the base, there is the overall collection of environments that a business (or business unit) owns. A level up, there are specific environments. Environments are the critical point at which there is a cloud service provider enforced security boundary. Within those environments, there are workloads and regions, where a workload can span multiple regions or a region can host many workloads. Finally, there are the individual services in use, and the resources that compose those services.
This hierarchy is not a directed ordering. For example, once you’ve identified a workload, it should be traced in both directions. First, work your way outwards to discover the containing environment and collection of environments. Then, work your way through to determine the component services and resources, and their regions.
The Discovery Process
Generally, you should start your discovery process by targeting environments. This is because environments are the largest discoverable unit; there is no good way to directly identify the collection without first finding the environments.
Scott Piper (of Summit Route) has already written a comprehensive guide on how to find all the AWS accounts affiliated with an organization. His methodology is, in short:
Inventory known accounts.
Ask your Technical Account Manager to identify accounts associated with your company domain.
Search company emails for “Welcome to Amazon Web Services” account setup emails.
Search network logs for traffic to the AWS Console, and ask the users.
Ask the finance team to find all expenses and payments to cloud providers.
Put out a public request to company employees.
Review trust relationships within identified accounts, to find additional accounts.
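The last step, reviewing trust relationships, lends itself to a small script. The sketch below is a minimal example that assumes you have already exported each role’s trust policy as parsed JSON; the function name, the sample policies, and the account IDs are hypothetical. It only looks at AWS principals, so federated and service principals are ignored.

```python
import re

# Matches IAM/STS ARNs in any partition and captures the 12-digit account ID.
ACCOUNT_ARN = re.compile(r"^arn:aws[a-z-]*:(?:iam|sts)::(\d{12}):")


def trusted_account_ids(trust_policy):
    """Collect AWS account IDs named as principals in a role trust policy."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]
    found = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*", which names no specific account
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        for value in aws:
            match = ACCOUNT_ARN.match(value)
            if match:
                found.add(match.group(1))
            elif re.fullmatch(r"\d{12}", value):
                found.add(value)  # bare account ID form
    return found


# Hypothetical data: one known account and two exported trust policies.
known_accounts = {"111111111111"}
trust_policies = [
    {"Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                    "Principal": {"AWS": "arn:aws:iam::222222222222:root"}}]},
    {"Statement": [
        {"Effect": "Allow", "Action": "sts:AssumeRole",
         "Principal": {"AWS": ["arn:aws:iam::111111111111:role/ci", "333333333333"]}},
        {"Effect": "Allow", "Action": "sts:AssumeRole",
         "Principal": {"Service": "ec2.amazonaws.com"}},
    ]},
]

discovered = set()
for policy in trust_policies:
    discovered |= trusted_account_ids(policy)

new_accounts = sorted(discovered - known_accounts)
print(new_accounts)
```

Any account ID that surfaces this way still needs a human answer: it may be a vendor, a partner, or a forgotten internal account.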
You should also consider setting up incentives to bring accounts under centralized management early, in order to begin to wrangle the problem. I’ve seen success with programs that offer:
Employees can expense the costs of development environments, or have the central team take budgetary ownership of production accounts.
Centralized and automated default configuration and provisioning of new accounts.
Ownership and responsibility for maintenance, and stability of the environment.
At this juncture, you should also inventory the relationship between Accounts and Organizations. This is part of your process of defining the target end state, and seeking answers to questions like:
Is there a need for multiple Organizations?
Are there accounts that are unused or minimally used?
Who is the proper business owner for each account or organization?
After you’ve taken a pass at environment discovery, you then want to target workloads.
The first place to start is within the discovered accounts, starting with billing reports. The account’s bill can be surprisingly useful in indicating architectural patterns.
For example, is there a huge usage of EC2s, or are managed container services a core element of the cost? Corey Quinn breaks this down more extensively in Last Week in AWS.
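To make the billing idea concrete, here is a minimal sketch that ranks services by their share of spend. The input (a mapping of service name to monthly cost) and every figure in it are invented for illustration; in practice you would build that mapping from whatever billing export the organization already has.

```python
def cost_shares(costs_by_service):
    """Return (service, percent_of_total) pairs, largest spend first."""
    total = sum(costs_by_service.values())
    if total <= 0:
        return []
    ranked = sorted(costs_by_service.items(), key=lambda item: item[1], reverse=True)
    return [(service, round(100 * cost / total, 1)) for service, cost in ranked]


# Hypothetical monthly bill, in dollars.
bill = {
    "Amazon Elastic Compute Cloud": 6000.0,
    "Amazon Elastic Container Service": 500.0,
    "Amazon Simple Storage Service": 1500.0,
    "AWS Lambda": 2000.0,
}

shares = cost_shares(bill)
for service, percent in shares:
    print(f"{percent:5.1f}%  {service}")
```

In this made-up bill, the EC2-heavy split suggests a workload built on self-managed instances, which is where you would expect to find SSH access patterns and patching concerns.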
You also should rely heavily on the documentation you’ve collected in the archeology phase.
Infrastructure as code is useful here as well. Generally, workloads are grouped within the code, which provides an obvious relationship between resources that might be widespread upon deployment. Besides finding workloads within the previously identified accounts, this also gives you a second chance to identify novel accounts based on the discovered workloads.
Now, having had two chances to identify all the accounts, you should move on to indexing the resources spread throughout the cloud estate.
It is essential to recognize out of the gate that this is impractical to accomplish manually. I’ve been forced to attempt a manual assessment before due to consulting constraints, and it is slow, painful, and fallible. This fundamentally does not scale beyond small environments, and should be a last resort. Instead, automation is key, with two possible mechanisms:
Leveraging existing company tooling: Many organizations will have existing systems in place that you can use. This may be a Cloud Security Posture Management tool, or native Cloud Service Provider services like Security Hub or AWS Config. In this case, you should be wary of the configuration of the service, specifically in terms of disabled rules or excluded resources. Ensure that exceptions do not disguise true positives, or are at least properly tracked in a risk register. Additionally, be conscious that existing tooling likely does not apply to all the organizational accounts that you discovered, and by no means addresses the problem of unknown unknowns.
Running auditing and/or inventory tooling: Generally, given the pace at which you should plan to be working, you’ll find open-source tooling to be your best option (versus procuring a tool from a vendor). NCC Group’s aws-inventory is a good place to start, however I actually generally prefer to go straight to using auditing tools for my discovery. While targeted at misconfigurations, auditing tools must inherently first collect information on the resources. Some of my preferred tools include Steampipe, Prowler, and ScoutSuite.
At this point, you likely have more data than you know what to do with, but you should also have a decent mental model of the organization’s cloud estate.
The next step is to distill all the information you’ve gathered into actionable guidance, focused on what is most immediately important.
If you try to bring an entire organization up to code, you’ll be unlikely to make progress, you’ll find significant push-back due to lack of direction, and you’ll be leaving your crucial risks unaddressed until you reach a comprehensive resolution.
Prioritization is an exercise in deciding what is most important.
It has become well understood that in the cloud, “identity is the new perimeter.” However, the network perimeter also remains. While there has been a strong push for defense in depth models within cloud environments, the truth of the matter remains that in terms of your immediate risks, footholds remain the most pressing.
One concept for breaking down footholds relies on assessing kill chains. I’m partial to DisruptOps’ Top 10 Cloud Attack Killchains.
We can decompose these threats based on three characteristics:
Are they the initial foothold?
Are they cloud-specific?
What is the impact of a successful compromise?
You can then focus on High impact threats that provide the Initial foothold. Prioritizing those that are cloud-native is important to de-conflict with other security teams and work.
An analysis of DisruptOps’ Top 10 Cloud Attack Killchains
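The three-question filter can be expressed in a few lines. This is only a sketch: the Threat fields mirror the three characteristics above, and the sample entries are hypothetical placeholders rather than the DisruptOps list or its ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    initial_foothold: bool
    cloud_specific: bool
    impact: str  # "high", "medium", or "low"


def prioritize(threats):
    """Keep high-impact initial footholds, listing cloud-specific ones first."""
    shortlist = [t for t in threats if t.initial_foothold and t.impact == "high"]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(shortlist, key=lambda t: not t.cloud_specific)


# Hypothetical entries for illustration only.
threats = [
    Threat("vulnerable public web app", True, False, "high"),
    Threat("leaked static access key", True, True, "high"),
    Threat("privilege escalation via IAM", False, True, "high"),
    Threat("verbose error page", True, False, "low"),
]

priorities = [t.name for t in prioritize(threats)]
print(priorities)
```

Sorting cloud-specific items first reflects the de-confliction point: those are the ones least likely to be covered by another security team already.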
Alternatively, we can assess risk based on the root cause of publicly disclosed AWS client security breaches. I did a talk analyzing this data, with the following outcome summary:
Initial Cloud Breach Access Vectors (source)
There is a distinct benefit to reviewing the oldest and longest running workloads. Inquire after their current usage or necessity. Generally, these resources have the most drift from current best practices, and may predate many controls.
There are a few key concerns on the identity perimeter:
Management plane access model: Enumerate the mechanism(s) used to mediate management plane access.
Companies without a concerted cloud security effort tend to have multiple competing means of management plane access. This can include use of IAM Users, AWS SSO federated access, AWS IAM federated access, adoption of inline and customer managed policies, usage of AWS managed policies (with associated risks), cross-account user access (and MFA configuration), cross-account service access (and ExternalID configuration), and others that are even less common.
In the short term, this information should inform your assessment of key risks. In the long term, you’ll need to reconcile everything identified into a single, efficient and auditable mechanism. My current preference is AWS SSO integrated to your Identity Provider, which also allows you to easily enforce phishing-resilient (FIDO2) MFA within the IdP.
Example architecture of federation between Okta and AWS SSO
Andreas Wittig breaks down the benefits of this model on Cloudonaut, which include:
Integration into AWS Organizations for automated role and IdP provisioning
Permission management based on IdP group membership (as well as SCIM support)
A user-friendly login portal
First-party support for the AWS CLI
SSH/Server access model: Within a cloud environment, there are numerous services that do not generally take advantage of cloud-native identity and authentication. It is therefore important to identify blessed and shadow patterns for SSH and server access (as well as to other systems, these are just most common!).
This will commonly include usage of bastion/jump hosts, or potentially direct exposure of SSH over the Internet. There can also be more cloud-native patterns including remote access tooling or use of native services such as SSM. Segment has published a wonderful guide on how to replace legacy bastion hosts with AWS Systems Manager Session Manager.
Improving these patterns can take significant time and refactoring, so it is essential to be able to distinguish known and semi-secure practices, such as exposure of SSH over the internet with fail2ban in place and only key-based authentication + MFA, from accidental exposures.
Least Privilege and IAM security: There are a few additional controls that are universally applicable, and present significant enough risk to mitigate immediately. The first and most well discussed of these is to secure the root user.
The second is to clean up unused roles and users, generally based on Access Analyzer results. However, be careful and thoughtful about what you remove. disaster_recovery_break_glass might be worth some more consideration before nuking.
Finally, identify all cross-account trusts (a tool like Cloudmapper can help here), and make sure to follow up with your partners across the organization to validate and characterize all of that access.
The primary means of reviewing controls in this domain are:
IAM Credential Report: Identify unused users and roles, as well as authentication patterns, such as MFA usage.
IAM Access Analyzer: Identify resources in your accounts shared with external entities.
Trusted Advisor (free): Multi-factor authentication on root account, AWS IAM use
Open source tools
Cloudsplaining: Provides a comprehensive and digestible risk-prioritized report of violations of least privilege in your AWS IAM.
PMapper: A script (and library) that identifies privilege escalation risks and leverages a local IAM graph to allow querying of principals with access to a specific action or resource, including transitive access.
PolicySentry: A tool that greatly improves the user experience of IAM least privilege policy generation.
RepoKid: A tool that automatically reduces permissions down to least privilege based on usage patterns. (Check out the ENIGMA talk: Least Privilege: Security Gain without Developer Pain.)
ConsoleMe: A Netflix-developed web service that “makes AWS IAM permissions and credential management easier for end-users and cloud administrators.”
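Before reaching for the heavier tools, the IAM Credential Report alone answers a lot. The sketch below is a minimal example of triaging a downloaded report; it uses a handful of the report’s real column names, but the sample rows, the function name, and the 90-day staleness threshold are assumptions for illustration, and it only inspects the first access key.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def review_credential_report(report_csv, now, stale_after_days=90):
    """Flag console users without MFA and active access keys that look unused."""
    findings = []
    cutoff = now - timedelta(days=stale_after_days)
    for row in csv.DictReader(io.StringIO(report_csv)):
        user = row["user"]
        if row["password_enabled"] == "true" and row["mfa_active"] != "true":
            findings.append((user, "console password without MFA"))
        if row["access_key_1_active"] == "true":
            last_used = row["access_key_1_last_used_date"]
            if last_used in ("N/A", "no_information"):
                findings.append((user, "active access key never used"))
            elif datetime.fromisoformat(last_used) < cutoff:
                findings.append(
                    (user, f"active access key unused for over {stale_after_days} days")
                )
    return findings


# Hypothetical report, trimmed to the columns this sketch reads.
SAMPLE_REPORT = (
    "user,password_enabled,mfa_active,access_key_1_active,access_key_1_last_used_date\n"
    "alice,true,true,true,2021-06-20T10:00:00+00:00\n"
    "bob,true,false,false,N/A\n"
    "ci-deploy,false,false,true,2020-11-02T08:30:00+00:00\n"
)

findings = review_credential_report(
    SAMPLE_REPORT, now=datetime(2021, 7, 1, tzinfo=timezone.utc)
)
for user, issue in findings:
    print(f"{user}: {issue}")
```

Treat the output as a list of questions rather than a deletion queue; an apparently stale credential may belong to a break-glass or disaster recovery process.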
The network perimeter brings its own set of concerns:
Public resources in managed services: Managed services in AWS may allow resources to be made public, through either resource policies (S3, ECR repositories, Lambdas), sharing APIs (AMIs, RDS snapshots), or network access (Redshift, ECS, Lightsail). For a more complete list, see aws_exposable_resources on Github.
Public network access to hosted services: Controlling and mediating network access to the services you host on cloud infrastructure is a complicated problem at scale. Leverage the findings from audit tooling or even free Trusted Advisor rules such as the finding for any security group that allows unrestricted access (0.0.0.0/0) to specific sensitive ports.
Default, insecure resources: Significant usage of default VPCs and security groups in AWS is a sign of poor security practices. A VPC’s default security group is configured to deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group.
Adding multiple workloads to this VPC means they have undue cross-instance access. Defaults should be restrictive, limiting all but necessary traffic. Security groups prefixed with launch-wizard indicate the web console was used to deploy an EC2 instance, which automatically exposes port 22 to the Internet. What might a more mature approach look like? Check out DisruptOps’ The Power of the Minimum Viable Network.
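If you have security group data in hand, the two network checks above (unrestricted access to sensitive ports, and launch-wizard groups) are straightforward to script. The sketch below assumes group records shaped like the EC2 DescribeSecurityGroups response (GroupId, GroupName, IpPermissions); the list of sensitive ports and the sample groups are illustrative choices of mine, not an official list.

```python
SENSITIVE_PORTS = {22, 3389, 3306, 5432}  # illustrative, adjust to your environment


def open_to_world(permission):
    """True if the rule allows any IPv4 or IPv6 source."""
    v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", []))
    v6 = any(r.get("CidrIpv6") == "::/0" for r in permission.get("Ipv6Ranges", []))
    return v4 or v6


def exposed_ports(permission):
    """Sensitive ports covered by this ingress rule."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":  # all protocols, all ports
        return set(SENSITIVE_PORTS)
    if protocol not in ("tcp", "udp"):
        return set()
    low, high = permission.get("FromPort"), permission.get("ToPort")
    if low is None or high is None:
        return set()
    return {port for port in SENSITIVE_PORTS if low <= port <= high}


def review_security_groups(groups):
    findings = []
    for group in groups:
        group_id = group["GroupId"]
        if group.get("GroupName", "").startswith("launch-wizard"):
            findings.append((group_id, "created by the console launch wizard"))
        for permission in group.get("IpPermissions", []):
            if open_to_world(permission):
                for port in sorted(exposed_ports(permission)):
                    findings.append((group_id, f"port {port} open to the Internet"))
    return findings


# Hypothetical security groups.
groups = [
    {"GroupId": "sg-0001", "GroupName": "launch-wizard-1", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-0002", "GroupName": "web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-0003", "GroupName": "db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]

sg_findings = review_security_groups(groups)
for group_id, issue in sg_findings:
    print(f"{group_id}: {issue}")
```

Port 443 open to the world is deliberately not flagged here; the point is to surface the exposures that are rarely intentional.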
Hosted Applications and Services
It is important not to disregard the applications and services running within the account while orienteering. These are one of the major initial sources of cloud breaches, as we discussed earlier. Generally, your first pass should only do a basic vulnerability assessment against public resources. You should seek to identify and mark for remediation:
Out of date services, especially ones with known vulnerabilities.
Unauthenticated services that are unintentionally exposed.
Sensitive or internal services that are needlessly public, such as CI/CD tools.
For a lightweight approach, you can use nmap as a vulnerability scanner. If you can afford it, consider instead a third party tool like Qualys or Nessus, or using cloud native capabilities, like AWS Inspector.
Less actionable concerns
Many risk assessment guides, security assessment methodologies, and AWS security best practices speak to some common concerns that are not sufficiently actionable or impactful to pursue while orienteering. These include exposed secrets, such as:
CloudFormation parameter defaults
Unencrypted Lambda environment variables
EC2 instance user data scripts with hardcoded secrets
ECS task definitions with exposed environment variables
Sensitive files on S3
Code repositories, compromised credentials
However, you should identify the planned secrets management pattern, and align on a unified and secure future state. Generally this will be through a service like Secrets Manager or Hashicorp Vault.
Q: Is your organization operating in a regulated industry?
A: No → Congratulations! Please proceed
A: Yes → My apologies. You will be obligated to focus on compliance-impacting controls, documented exceptions, and compensating controls. You will be forced to fight with cloud encryption configuration to meet compliance obligations. For example, there are 44 controls in Security Hub’s PCI DSS security standard, all of which will need to be enforced or compensated for to meet the compliance requirements.
If we accept that most breaches are caused by misconfiguration, then we should address dangerous misconfigurations. However, when orienteering there is a real difficulty in identifying what is a misconfiguration.
My favorite case study is the common example of an exposed S3 bucket, which is likely the largest single reason for cloud data breaches.
In almost any AWS environment, there are going to be intentionally public S3 buckets. For example, they may host static assets or public reports for downloads. Before entering a new environment, you should be forewarned as to the difficulty in determining what is intentional versus accidental. How you effectively determine the data sensitivity of a bucket like s3://project-metero-123artaweg goes back to our earlier discussion of corporate archeology.
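One way to make that archeology usable is to turn it into an allowlist and compare it against what is actually public. The sketch below is deliberately simplified: it assumes you have each bucket’s policy as parsed JSON, treats any unconditioned Allow to a wildcard principal as public, and ignores ACLs and Block Public Access settings. The bucket names and function names are hypothetical.

```python
def _is_anyone(principal):
    """True if the Principal element is a wildcard."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        return "*" in aws
    return False


def policy_allows_anonymous_access(bucket_policy):
    """Simplified check: any Allow to a wildcard principal with no Condition."""
    statements = bucket_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    return any(
        s.get("Effect") == "Allow"
        and _is_anyone(s.get("Principal"))
        and "Condition" not in s
        for s in statements
    )


def unexpected_public_buckets(policies_by_bucket, intended_public):
    """Public buckets that nobody has claimed as intentionally public."""
    return sorted(
        name
        for name, policy in policies_by_bucket.items()
        if policy_allows_anonymous_access(policy) and name not in intended_public
    )


# Hypothetical buckets; the allowlist comes from documentation and owner interviews.
public_read = {"Statement": [{"Effect": "Allow", "Principal": "*",
                              "Action": "s3:GetObject", "Resource": "*"}]}
private = {"Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*",
                          "Principal": {"AWS": "arn:aws:iam::111111111111:root"}}]}
policies_by_bucket = {
    "example-static-assets": public_read,
    "example-finance-exports": public_read,
    "example-app-logs": private,
}
intended_public = {"example-static-assets"}

unexpected = unexpected_public_buckets(policies_by_bucket, intended_public)
print(unexpected)
```

The code is the easy part; the value is in the allowlist, which can only come from the people who know why each bucket exists.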
Defense in Depth is Over Hyped
In modern discussions of cloud security, there is what I observe as a hyperfixation on defense in depth. While there is objective value to defense in depth, addressing the real-world risks of a newly inherited environment requires that establishing a first line of defense across the board is the priority.
The canonical example here is encryption.
Now, I am by no means recommending you don’t leverage available encryption, especially for services where account-wide defaults can be set (e.g., EBS).
However, I am saying that if you can avoid it, it shouldn’t be your top priority looking at a new environment. For a more complete discussion of deprioritizing encryption as a control in the cloud, please refer to Chris Farris’ insightful clickbait (not an oxymoron it turns out!) Cloud Encryption is Worthless, click here to see why….
Misconfigurations abound in the average cloud environment. There are hundreds of configurable security controls that can be tuned and hardened.
For example, just between May 2020 and July 2021, Security Hub’s AWS Foundational Security Best Practices standard expanded from 31 to 141 total security controls.
In order to identify misconfigurations, you can use the tools discussed under The Discovery Process - Resources. In order to prioritize remediation, you should focus on the items discussed above - the Identity and Network perimeters.
Planning for the future 🚀
Once you own an AWS environment, it is essential you begin the process of strategic planning. To start, consider a blanket set of (fairly) universally applicable AWS hardening recommendations, including:
Enable GuardDuty in all accounts, and centralize alerts.
Enable CloudTrail in all accounts; turn on optional security features, including encryption at-rest and file validation; centralize and back up logs.
Ensure security visibility and break-glass access to all accounts.
Configure account-wide security defaults, including S3 block public access, EBS and all other default encryption.
Driving Organizational Change
Working within a company in order to mature its understanding of security and its security maturity is the work of many years. Todd Barnum outlines one methodology in The Cybersecurity Manager’s Guide: The Art of Building Your Security Program. His basic guidelines are composed of seven steps:
Focus on key security domains to build program foundation
Create an evangelism plan
Give away your legos
Build your team
Measure what matters
Overall, fixing things is part marathon, and part whack-a-mole. It is a long road that is unlikely to ever get every resource everywhere up to best practices.
The best approach I’ve found is based on governance. Define a security baseline, document exceptions, and track and measure compliance of each business unit to the company’s security standards.
This leaves room for explicit tracking of legacy risks, pushes security down to individual teams instead of centralizing it in security, and includes metrics and measurement that will help you keep executive buy-in for your initiatives.
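The measurement half of this can start very small. The sketch below is one possible shape, assuming you can reduce audit output to (business unit, control, passed) tuples and keep documented exceptions as (business unit, control) pairs; the unit names, control names, and function name are all hypothetical.

```python
def compliance_by_unit(results, exceptions):
    """Summarize baseline compliance per business unit.

    results:    iterable of (business_unit, control_id, passed)
    exceptions: set of (business_unit, control_id) with a documented exception
    """
    summary = {}
    for unit, control, passed in results:
        stats = summary.setdefault(unit, {"passed": 0, "excepted": 0, "failed": 0})
        if passed:
            stats["passed"] += 1
        elif (unit, control) in exceptions:
            stats["excepted"] += 1  # known legacy risk, tracked rather than hidden
        else:
            stats["failed"] += 1
    for stats in summary.values():
        total = stats["passed"] + stats["excepted"] + stats["failed"]
        stats["compliance_pct"] = round(100 * stats["passed"] / total, 1)
    return summary


# Hypothetical audit results and exception register.
results = [
    ("payments", "root-mfa", True),
    ("payments", "cloudtrail-enabled", True),
    ("payments", "s3-block-public-access", False),
    ("data", "root-mfa", True),
    ("data", "cloudtrail-enabled", False),
]
exceptions = {("payments", "s3-block-public-access")}

summary = compliance_by_unit(results, exceptions)
for unit, stats in summary.items():
    print(unit, stats)
```

Counting exceptions separately from failures is a design choice: it keeps documented legacy risk visible in the numbers without penalizing a team for something leadership has already accepted.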
When working to build a long term strategy for cloud security maturity, it is important to have a clear view of the target end state. To this end, targeting a maturity framework can provide scaffolding for your strategic direction.
While there are numerous maturity curves that have been suggested in cloud security, there are two I find particularly actionable:
Cloud Security Maturity Model (CSMM) - IANS, CSA, Securosis
SecOps (Simple Automation)
Manually executed scripts
Centrally managed automation
Cloud Security Roadmap - Marco Lancini. 5 levels of maturity, covering the following seven domains:
Policies and Standards
Supply Chain Security
Monitoring and Alerting
Incidents and Remediation
Where to Learn More
Cloud security is an evolving discipline, and the nature of cloud adoption has left many organizations with disorganized, ungoverned, and insecure cloud estates.
You will, at some point in your career, inherit or enter a cloud environment you had no part in building. This orienteering methodology will give you the tools to confidently approach that situation.
Most common cloud adoption patterns have left security trailing development, introducing significant risk.
The security impact of cloud service provider’s rapid pace of feature development, combined with their evolving definition of security and architecture best practices, can be directly traced to the environments you will encounter.
Corporate archeology is key to gaining the context for risk-informed decisions.
There are practical means to efficiently discover accounts, workloads, and resources.
You should prioritize resolving the risks that most commonly lead to cloud breaches, which means a focus on the Identity Perimeter, Network Perimeter, and Hosted Applications.
When building the future of your cloud security program and posture, focus on a relationship-driven model, and leverage some of the excellent existing maturity models to benchmark and progress against.
Ready to get started? See my checklist below ✅
I wrote a checklist summarizing the key actions you should take. Check it out here 🚀.