tl;dr sec
Cloud Forensics: Putting The Bits Back Together

Brandon describes an AWS forensics experiment he ran (e.g. does the EBS volume type or instance type matter when recovering data?) and gives some advice on chain of custody and cloud security best practices.

Clint Gibler
January 14, 2020

Brandon Sherman, Cloud Security Tech Lead, Twilio

Motivation

If something bad happens, how can we determine how bad that something was?

Compromises happen, so we need to build detection capabilities and processes for doing forensic investigations. These investigations should follow a rigorous, repeatable process: post-incident is not the time to be developing techniques. Ad hoc forensic processes can lead to important data being lost.

One thing that motivated Brandon to pursue this research was curiosity: how can one determine whether an attacker tried to cover their tracks? Did they delete log files, cause log files to roll over, use a dropper that erased itself, or do something else?

The Cloud

This talk focuses on AWS, and specifically the following services:

Elastic Compute Cloud (EC2): VMs on demand.
Elastic Block Store (EBS): A virtual disk attached over the network, kind of like network-attached storage. You can specify its size and which instance to attach it to, and there are various backing storage options (volume types).
Identity and Access Management (IAM): Fine-grained access controls that enable you to dictate who can perform which API calls on what resources.
Simple Storage Service (S3): A key-value store of binary blobs (objects). EBS snapshots are stored in S3, although they are managed through EC2 rather than appearing as objects in your own buckets.

The slides give other useful info about these services that I’m skipping here for brevity.

Open Questions

In this work, Brandon sought to answer the following questions:

If a snapshot of an EBS volume is taken, will that snapshot only contain in-use blocks, or are deleted blocks also included?
Does it matter what the original EBS volume type is? Has AWS changed their implementation between versions?
Does the instance type matter? Does NVMe vs. SATA make a difference?

Experiment Process

Brandon’s methodology was the following:

Launch a selection of EC2 instances (nano, medium, large, etc.)
Attach one of each EBS volume type to each class of instance
Write files (known seed files)
Delete files
Snapshot disks
Rehydrate snapshot to new disk
Look for files

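```shell
# Write the known seed files to the target volume and flush them to disk
sudo aws s3 sync s3://seed-forensics-files /mnt/forensic_target
sync && sleep 60 && sync && sleep 60
# Delete the seed files (written as root, hence sudo) and flush again
sudo rm -rf /mnt/forensic_target/*
sync && sleep 60 && sync && sleep 60
```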
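The "Snapshot disks" step could be scripted with the AWS CLI roughly as follows. This is a minimal sketch, not Brandon's script: the function name `snapshot_volume` and the description text are hypothetical, and it assumes credentials for the account that owns the volume.

```shell
# Hypothetical helper: snapshot an EBS volume, wait until the snapshot
# has completed, and print the new snapshot ID.
snapshot_volume() {
  local volume_id="$1"
  local snapshot_id
  snapshot_id=$(aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "forensic snapshot of $volume_id" \
    --query SnapshotId --output text) || return 1
  # Block until the snapshot reaches the "completed" state
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1
  echo "$snapshot_id"
}

# Example: snapshot_volume vol-0123456789abcdef0
```

The CLI waiter only polls a bounded number of times, so snapshots of large volumes may need a retry around the wait.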

Tools / Scripts Used

PhotoRec is free software that can find deleted files, originally created for recovering photos from camera SD cards. It does this by reading the raw blocks of a disk and comparing the data to known file signatures (e.g. magic bytes).
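To make the signature idea concrete, here is a toy scan for one magic number in a raw disk image. It is far simpler than PhotoRec: it only reports the byte offsets where a signature such as `%PDF-` starts and makes no attempt to work out where each file ends. The function name and the `disk.img` path are hypothetical, and it assumes GNU grep.

```shell
# Hypothetical helper: print the byte offset of every occurrence of a
# fixed signature (e.g. the "%PDF-" magic bytes) in a raw image file.
find_signature_offsets() {
  local image="$1" signature="$2"
  # -a: treat binary data as text, -b: print byte offsets,
  # -o: print only the match itself, -F: fixed string, not a regex
  grep -aboF -- "$signature" "$image" | cut -d: -f1
}

# Example: find_signature_offsets disk.img '%PDF-'
```

Finding the start of a file is the easy part; deciding where it ends is the guesswork that makes recovered output noisy.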

Brandon wrote a script, forensics.rb, that will rehydrate each snapshot to an EBS volume, attach it to an instance, run PhotoRec and look for deleted files.
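The rehydrate-and-attach part of that workflow might look roughly like this with the AWS CLI. It is an illustrative sketch, not forensics.rb itself: `rehydrate_snapshot`, the `gp2` volume type, and the device name are assumptions. The Availability Zone must match the forensic instance's, and PhotoRec would then be pointed at the attached device.

```shell
# Hypothetical helper: create a volume from a snapshot, attach it to a
# forensic instance, and print the new volume ID.
rehydrate_snapshot() {
  local snapshot_id="$1" az="$2" instance_id="$3" device="$4"
  local volume_id
  # The new volume must live in the same AZ as the forensic instance
  volume_id=$(aws ec2 create-volume \
    --snapshot-id "$snapshot_id" \
    --availability-zone "$az" \
    --volume-type gp2 \
    --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device "$device" > /dev/null || return 1
  echo "$volume_id"
}

# Example: rehydrate_snapshot snap-0123 us-east-1a i-0abc /dev/sdf
```

On NVMe-based instances the attached volume shows up under an NVMe device name (e.g. `/dev/nvme1n1`) rather than the name passed to `--device`.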

Results: Comparing File Hashes

Many files were dumped into the recovery directory, and not all were seed files written to disk during the experiment. One potential cause is that files could be recovered from the AMI: AMIs are built from snapshots and can therefore contain deleted files.

Frequently, more files were returned than were originally seeded to disk, even when a non-root volume was used, as PhotoRec has to guess where files begin and end. Text files, for example, don’t have clearly defined magic numbers or end-of-file markers.
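A hash-based comparison can be sketched with coreutils: hash every seed file and every recovered file, then count the digests that appear in both sets. It also shows why this approach struggles: if PhotoRec guesses a file's end wrong and keeps even one extra byte, the digest no longer matches. The function name and directory arguments are hypothetical.

```shell
# Hypothetical helper: count seed files whose exact SHA-256 digest also
# appears among the recovered files.
count_hash_matches() {
  local seed_hashes recovered_hashes
  seed_hashes=$(mktemp)
  recovered_hashes=$(mktemp)
  # Keep only the digest column, sorted and de-duplicated for comm
  find "$1" -type f -exec sha256sum {} + | awk '{print $1}' | sort -u > "$seed_hashes"
  find "$2" -type f -exec sha256sum {} + | awk '{print $1}' | sort -u > "$recovered_hashes"
  # comm -12 prints only the lines present in both sorted lists
  comm -12 "$seed_hashes" "$recovered_hashes" | wc -l
  rm -f "$seed_hashes" "$recovered_hashes"
}

# Example: count_hash_matches ./seed ./recup_dir.1
```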

So Brandon instead tried comparing the first n bytes of recovered files to the original files, where n can be set based on your tolerance for false positives.
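The prefix comparison might look like the following sketch. The name `count_prefix_matches` is hypothetical, it relies on `cmp -n` (available in GNU diffutils) to limit the comparison to n bytes, and any file shorter than n bytes only matches a file that is identical to it, so n should be no larger than the smallest seed file. A larger n is stricter (fewer false positives), while a smaller n tolerates more damage past the start of a recovered file.

```shell
# Hypothetical helper: count seed files whose first N bytes match the
# first N bytes of at least one recovered file.
count_prefix_matches() {
  local seed_dir="$1" recovered_dir="$2" n="$3"
  local matches=0 seed recovered
  for seed in "$seed_dir"/*; do
    for recovered in "$recovered_dir"/*; do
      # cmp -s: no output; exit status 0 only if the first $n bytes are identical
      if cmp -s -n "$n" "$seed" "$recovered"; then
        matches=$((matches + 1))
        break  # one matching recovered file is enough for this seed
      fi
    done
  done
  echo "$matches"
}

# Example: count_prefix_matches ./seed ./recup_dir.1 512
```

Running it for several values of n is one way to see how sensitive the recovery rate is to the chosen threshold.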

Results: Comparing File Bytes

Source instance type (SATA vs. NVMe) had no detectable effect
Recovery success varied based on source volume type
- Best recovery rates: standard, gp2, io1 (80+%)
- Less good: sc1, st1 (20-50%)
Some weird artifacts were recovered
Recovery of PDFs from sc1/st1 based drives resulted in massive files (but not other drive types)

Why? The root cause of these results was not easy to determine.

On the Research Process

In the Q&A section, one audience member asked whether the observed results reflected PhotoRec specifically, as opposed to what tools purpose-built for this kind of forensics would find. As Brandon only used PhotoRec, he said the results could be a function of the tool itself or the nature of the volumes examined, though some aspects seemed endemic to a given volume type.

I thought this was a good question. In general, when you're doing research projects, it's easy to be focused on the details as you get into the weeds, but it can be useful to periodically step back and ask meta questions about your methodology, like:

Are there other factors or confounding variables that might make my experiment not as conclusive as I'd like?

I really liked how Brandon decided on the research questions he wanted to answer, determined a methodology for answering them, and then ran his process across many combinations of instance and EBS volume types. It would have been easy to do this less rigorously, but that would have made the results and the talk weaker.

My approach when doing research is kind of like agile prototyping a new product: get as much early critical feedback as possible, even in just the ideation phase, to vet the idea. Specifically, I try to run ideas by experienced security colleagues who can help determine:

How useful is this idea? To whom?
Has it already been done before?
Is there related work that I should make sure to examine before starting?
How can I differentiate from prior work?
Would a different angle or tack make this work more impactful, useful, or meaningful?

After this verbal trial by fire, if the idea passes muster, I make a reasonable amount of progress and then run it by colleagues again. These check-ins are valuable checkpoints: the work is more defined (the tools and approach have become concrete rather than theoretical), and there are preliminary results to discuss.

This approach has made every research project I've ever done significantly better. It can be scary opening your idea up for criticism, but I can't emphasize enough how useful this is.

Chain of Custody

An attacker who has gained access to your AWS account could delete snapshots as you take them, causing valuable forensic data to be lost.

It’s best to instead copy all snapshots to another, secured account.

AWS accounts are like “blast walls” - an explosion in one account cannot reach other accounts, limiting the blast radius.
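One way to move a snapshot out of a compromised account is to share it with a dedicated forensics account and then copy it from there, so the copy is owned by the forensics account. This is a sketch under stated assumptions: the function name and the `incident` and `forensics` CLI profile names are hypothetical, and encrypted snapshots additionally require a customer-managed KMS key shared with the forensics account (snapshots encrypted with the default `aws/ebs` key cannot be shared).

```shell
# Hypothetical helper: share a snapshot from the incident account with the
# forensics account, then copy it so the forensics account owns the copy.
# Assumes AWS CLI profiles named "incident" and "forensics" (hypothetical).
copy_snapshot_to_forensics() {
  local snapshot_id="$1" region="$2" forensics_account_id="$3"
  # Allow the forensics account to create volumes from / copy this snapshot
  aws ec2 modify-snapshot-attribute \
    --snapshot-id "$snapshot_id" \
    --attribute createVolumePermission \
    --operation-type add \
    --user-ids "$forensics_account_id" \
    --region "$region" --profile incident || return 1
  # Copy from within the forensics account and print the new snapshot ID
  aws ec2 copy-snapshot \
    --source-region "$region" \
    --source-snapshot-id "$snapshot_id" \
    --description "forensic copy of $snapshot_id" \
    --region "$region" --profile forensics \
    --query SnapshotId --output text
}

# Example: copy_snapshot_to_forensics snap-0123 us-east-1 111122223333
```

Once the copy has completed, the share on the original snapshot can be revoked with `--operation-type remove`.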

Brandon advises enabling AWS CloudTrail, which records most API calls (who did what, and from where), and writing the log files to an account separate from the one that owns the trail.
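A trail that delivers to a bucket in a separate logging account could be created roughly like this. It is a sketch: the function, trail, and bucket names are hypothetical, and the bucket policy in the logging account must already allow `cloudtrail.amazonaws.com` to write to it. Enabling log file validation makes CloudTrail deliver digest files that can later help show the logs were not altered, which is useful for chain of custody.

```shell
# Hypothetical helper: create a multi-region trail in the current account
# that writes to a bucket owned by a separate logging account, then start it.
create_cross_account_trail() {
  local trail_name="$1" log_bucket="$2"
  aws cloudtrail create-trail \
    --name "$trail_name" \
    --s3-bucket-name "$log_bucket" \
    --is-multi-region-trail \
    --enable-log-file-validation > /dev/null || return 1
  # A trail created through the API does not record events until started
  aws cloudtrail start-logging --name "$trail_name"
}

# Example: create_cross_account_trail audit-trail example-central-cloudtrail-logs
```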

Takeaways

What does your threat model look like?
This influences how you structure accounts, what and how much you log, etc. Understand the limitations of the tools and services you’re using.

Consider writing only to non-root EBS volumes
A non-root volume doesn’t carry the many recoverable files that were deleted while the AMI was built, which can make forensic tools’ output less noisy.

Use multiple AWS accounts
That way, a breach of one server doesn’t mean a breach of everything. Again, having separate accounts limits the blast radius. Enable CloudTrail and keep everything related to forensics as isolated as possible.

Be careful what you write to an AMI, especially if it’s going to be shared
If you write sensitive info (such as secrets) to your AMI as part of the baking process, it could be recovered, even if you delete it before the image is created.