tl;dr How to get faster, more complete external attack surface coverage by automatically clustering exposed web apps by visual similarity.
The problem statement:
You’re a large company with thousands to hundreds of thousands of websites online. Some are intentionally public and some aren’t: some were meant to be internal only, behind SSO or a VPN, and others were spun up and forgotten.
How can you figure out:
- Everything on the Internet belonging to your company?
- Which of those assets are high risk and need to be prioritized?
A strength of the approach described in this talk is that you don’t need to know beforehand what you’re looking for. Instead, you can look at groups of similar web applications and focus on the outliers to find likely issues.
This has a significant advantage over signature-based approaches, where you need to write a fingerprint of every type of thing you’re looking for, which requires a massive amount of upfront and ongoing engineering effort.
Applications of this research include:
- Pen testers can assess larger clients’ attack surfaces more efficiently and consistently.
- Companies can use it to find:
  - Potentially high risk assets or leaked sensitive info.
  - Websites that should not be public.
  - Regressions - fixes that have been undone.
  - Other instances of known bad services that need to be patched or taken down.
Robot Powered Bug Bounty
One neat example the speakers described: a company gave them access to all of its bug bounty reports. They extracted the vulnerable paths from those reports, then used their infrastructure to check every subdomain of the company they had enumerated for the same bug, finding other instances that hadn’t been reported through the bug bounty program. A security analyst can easily vet whether the new instances are true positives by simply following the reproduction steps in the original bug bounty submission!
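The replay idea boils down to taking each reported path and requesting it on every enumerated host. Below is a minimal sketch, assuming the report URLs and the subdomain list are already in hand; the function names are hypothetical and this is not the speakers’ code.

```python
from urllib.parse import urlsplit


def extract_path(report_url: str) -> str:
    """Return the path (plus query string, if any) from a URL found in a bug bounty report."""
    parts = urlsplit(report_url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def replay_targets(report_urls, subdomains, scheme="https"):
    """Pair every distinct reported path with every enumerated subdomain."""
    paths = sorted({extract_path(u) for u in report_urls})
    return [f"{scheme}://{host}{path}" for host in sorted(subdomains) for path in paths]


if __name__ == "__main__":
    reports = ["https://shop.example.com/admin/debug?verbose=1"]
    hosts = ["api.example.com", "shop.example.com", "staging.example.com"]
    for url in replay_targets(reports, hosts):
        print(url)
```

The originally reported host stays in the target list on purpose: re-requesting the original location is also a cheap way to catch regressions.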
The talk covers:
- Challenges of current content discovery approaches (technical and process)
- The architecture and approach of the automated tooling built by the speakers
- An evaluation of the approach across every web app using AWS Elastic Beanstalk
- Future plans and audience questions
Content Discovery of Yesteryear
In the olden days (~2008), content discovery was done using tools like OWASP DirBuster. These tools took file and directory wordlists and then requested them on one or more sites to try to find sensitive, hidden paths that could then be exploited.
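To make the wordlist approach concrete, here is a rough sketch of how file and directory wordlists expand into candidate requests, assuming a simple rule where directories get a trailing slash and file names get every extension. The rule and function names are illustrative, not DirBuster’s actual implementation.

```python
from itertools import product


def candidate_paths(dir_words, file_words, extensions):
    """Directory candidates get a trailing slash; file candidates get every extension."""
    dirs = [f"/{word.strip('/')}/" for word in dir_words]
    files = [f"/{name}.{ext.lstrip('.')}" for name, ext in product(file_words, extensions)]
    return dirs + files


def candidate_urls(base_urls, paths):
    """Cross every site with every candidate path."""
    return [base.rstrip("/") + path for base in base_urls for path in paths]


if __name__ == "__main__":
    paths = candidate_paths(["admin", "backup"], ["config", "db"], ["php", ".bak"])
    for url in candidate_urls(["https://www.example.com/"], paths):
        print(url)
```

The number of requests is the product of sites, words, and extensions, so it grows quickly, which is a big part of why single-machine scans tripped IDS/IPS rules and sometimes overwhelmed servers.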
Content Discovery Problems
- Because you were generally running DirBuster from a single machine, you could easily be blocked by one IPS or IDS rule.
- Sometimes you’d DoS the server, as you’re making tens of thousands of requests in rapid succession.
How do you know when requesting a directory or path fails?
After all, you’re probably doing this across thousands of different tech stacks, frameworks, and applications.
Most tools do this by:
- For each domain, make a request to a path that’s known to be invalid (e.g. …).
- Build an on-the-fly signature based on a combination of factors from the server’s response, like HTTP headers, HTML body, HTTP status code, content length, etc.
- Then, in subsequent requests, the server’s response is compared to the known bad request to determine if the path is valid.
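The baseline-signature idea above can be sketched in a few lines. This is a minimal illustration, assuming responses have already been fetched into a small `Response` record (a hypothetical type); the fingerprint here uses status code, `Content-Type`, and a hash of the body after stripping the echoed path and collapsing whitespace.

```python
import hashlib
import re
import uuid
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)


def random_invalid_path() -> str:
    """A path that almost certainly does not exist, used to capture the 'not found' baseline."""
    return "/" + uuid.uuid4().hex


def signature(resp: Response, requested_path: str) -> tuple:
    """Fingerprint a response by status code, content type, and a normalized body hash."""
    # Error pages often echo the requested path back, so remove it before hashing.
    body = resp.body.replace(requested_path, "")
    body = re.sub(r"\s+", " ", body).strip()
    digest = hashlib.sha256(body.encode()).hexdigest()
    return (resp.status, resp.headers.get("Content-Type", ""), digest)


def looks_valid(resp, path, baseline, baseline_path) -> bool:
    """Treat a path as found only if its fingerprint differs from the invalid-path baseline."""
    return signature(resp, path) != signature(baseline, baseline_path)
```

An exact body hash is deliberately strict: pages with timestamps or CSRF tokens would all look "valid", so practical tools tend to compare more loosely, for example with length tolerances or similarity ratios.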
Bringing in the Visual Component
The above process can still be a bit noisy and slow, so a common modern approach is to automatically screenshot every path using a tool like EyeWitness or gowitness. These tools leverage headless Chrome to locally store a screenshot of every page to make it easy to review later.
While this does make human review easier, manually going through hundreds of thousands of screenshots is still prohibitively time-intensive.
In addition to the technical challenges, there are also process challenges when a team of security professionals is doing this work:
Inconsistency – Content discovery steps may be done inconsistently when the scope is large, and it’s not feasible to run large-scale comprehensive dictionaries on every target in a small time window using traditional techniques. Further, different people may have different processes.
Divergence – If multiple people are performing the scanning, they may store the results separately and keeping this info merged/up to date is logistically challenging. As new targets are discovered through an assessment, ensuring all targets were reviewed the same way is tough.
Efficiency – Reviewing results is time consuming when the false positive rate is too high, and the dictionaries used may be outdated or inefficiently used.
Goal: Make it easy for humans to get eyes on interesting screenshots from URL path bruteforcing.
OK, let’s look at the system they devised that addresses these problems.
At a high level, the overall flow is:
- OSINT is used to find target domains and subdomains.
- Threat intel, wordlists, and other sources are used to find interesting paths to check.
- AWS Lambda functions running headless Chrome screenshot these paths.
- Shrunk screenshots are stored in S3, response bodies and headers are stored in Elasticsearch. Screenshots are grouped by similarity using fuzzy hashing.
- Humans review sorted screenshots for leaked sensitive data or promising targets to attack.
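The grouping and review-ordering steps can be sketched as follows, assuming each screenshot has already been reduced to an integer perceptual hash. The greedy threshold clustering and the `max_distance` value are our own illustrative choices, not the speakers’ method.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cluster(hashes, max_distance=6):
    """Greedily group {url: hash}: each URL joins the first group whose representative is close enough.

    Groups are returned smallest first, so one-off pages (the outliers) reach a
    reviewer before the thousands of near-identical default pages.
    """
    groups = []  # list of (representative_hash, member_urls)
    for url, h in hashes.items():
        for rep, members in groups:
            if hamming(rep, h) <= max_distance:
                members.append(url)
                break
        else:
            groups.append((h, [url]))
    return sorted((members for _, members in groups), key=len)


if __name__ == "__main__":
    screenshots = {
        "https://a.example.com/.git/config": 0xFFFF0000FFFF0000,
        "https://b.example.com/": 0x0F0F0F0F0F0F0F0F,
        "https://c.example.com/": 0x0F0F0F0F0F0F0F0E,
        "https://d.example.com/": 0x0F0F0F0F0F0F0F0C,
    }
    for group in cluster(screenshots):
        print(len(group), group)
```

Greedy grouping depends on input order and compares only against each group’s first member; it’s cheap and easy to reason about, but a proper clustering algorithm would give more stable groups at scale.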
Let’s look at a few of these steps in more detail.
OWASP Amass is used to build the initial list of domains and subdomains, as it’s great at aggregating various OSINT info (e.g. certificate transparency, DNS enumeration and brute forcing, scraping, etc.).
Next, we need to decide which paths to request on those domains and subdomains.
Beyond standard wordlists, they’ve incorporated info from Grey Noise, a threat intelligence dataset that shows which paths real attackers are scanning the Internet for.
Though it’s early stages, they’ve also tried scraping paths from all of Exploit DB, CVE data, and getting a list of every web path in Metasploit.
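One way to turn this kind of data into a short request list is to rank paths by how often scanners were seen requesting them and keep the top few. The sketch below assumes you already have a flat list of observed request paths (for example, exported from a threat intel feed); it does not use any Grey Noise API, and the function names are hypothetical.

```python
from collections import Counter


def top_paths(observed_requests, n=10):
    """Rank request paths (query string dropped) by how often scanners requested them; keep the top n."""
    counts = Counter(path.split("?", 1)[0] for path in observed_requests)
    return [path for path, _ in counts.most_common(n)]


def merge_sources(*sources):
    """Combine path lists from several sources (wordlists, exploit data, etc.), keeping first-seen order."""
    return list(dict.fromkeys(p for src in sources for p in src))


if __name__ == "__main__":
    observed = ["/.git/config", "/.env", "/.git/config", "/wp-login.php?x=1", "/.env", "/.git/config"]
    print(merge_sources(top_paths(observed, n=2), ["/server-status", "/.env"]))
```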
The key here is they do a fuzzy hash of images. With cryptographically secure hashes, changing one byte makes the resulting hash totally different. In fuzzy hashing, however, small changes to the original file cause only small changes to the resulting hash, thus allowing you to group “similar” images.
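The fuzzy hashing algorithm they use to detect image similarity is probably something like ssdeep, which operates on raw bytes. Perceptual hashes such as aHash, dHash, or pHash, which operate on pixels, are another common option built specifically for image similarity.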
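A perceptual hash works on pixels rather than file bytes, so visually similar screenshots get hashes that differ in only a few bits. As an illustration (a swapped-in technique, not necessarily what the speakers used), here is a pure-Python difference hash (dHash), assuming the screenshot has already been decoded to a 2D list of grayscale values. Real implementations usually downsample with averaging; nearest-neighbour sampling is a simplification here.

```python
def dhash(pixels, hash_size=8):
    """Difference hash (dHash) of a grayscale image given as a 2D list of 0-255 ints.

    The image is shrunk to hash_size rows x (hash_size + 1) columns by
    nearest-neighbour sampling, then each bit records whether a pixel is
    brighter than its right-hand neighbour.
    """
    rows, cols = len(pixels), len(pixels[0])
    small = [
        [pixels[r * rows // hash_size][c * cols // (hash_size + 1)] for c in range(hash_size + 1)]
        for r in range(hash_size)
    ]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Two synthetic 16x18 "screenshots": a left-to-right fade and the same fade slightly brightened.
    fade = [[255 - 10 * c for c in range(18)] for _ in range(16)]
    brighter = [[min(255, p + 5) for p in row] for row in fade]
    print(hamming(dhash(fade), dhash(brighter)))
```

A uniform brightness shift leaves the hash unchanged, while a mirrored image flips every bit, which is the "small change, small hash difference" property described above.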
Alright, so you built some fancy tooling, now time to take it for a spin in the real world!
The speakers decided to target every web app running on AWS Elastic Beanstalk.
- They had 88,022 total Elastic Beanstalk subdomain targets, and they made 10 path requests to each (based on Grey Noise data) => 880,220 total requests.
- Of these 880,220 requests, 52,586 responded with a 200 OK, which they then screenshotted.
- 4,007 of these screenshots were flagged as “interesting” to manually review.
- 30 minutes of human review time yielded 321 confirmed exposures.
So 880,220 requests
=> 52,586 screenshots (~6% of total)
=> 4,007 interesting screenshots (~0.46% of total). Nice!
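The funnel percentages are easy to recompute from the raw counts reported in the talk. This short snippet expresses each stage both as a share of all requests and as a share of the previous stage; the code is plain arithmetic, only the counts come from the talk.

```python
STAGES = [
    ("requests", 88_022 * 10),
    ("200 OK screenshots", 52_586),
    ("flagged interesting", 4_007),
    ("confirmed exposures", 321),
]


def funnel(stages):
    """Return (name, count, share of all requests, share of the previous stage) per stage."""
    total = stages[0][1]
    rows, previous = [], total
    for name, count in stages:
        rows.append((name, count, count / total, count / previous))
        previous = count
    return rows


for name, count, of_total, of_previous in funnel(STAGES):
    print(f"{name:<20} {count:>8,}  {of_total:>7.2%} of requests  {of_previous:>7.2%} of previous stage")
```

Run it and the flagged screenshots come out at about 0.46% of all requests (about 7.6% of the screenshots), and about 8% of the flagged screenshots were confirmed exposures.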
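They found over 9GB of source code (that’s a lot of plaintext files) through exposed /.git/config paths. A readable /.git/config usually means the whole Git directory is exposed, so the repository can be reconstructed, and these repositories contained sensitive content like encryption keys, database passwords, API tokens, and references to internal systems. And of course, you can hunt for vulnerabilities in the source code.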
One thing they’re planning to do is run OCR on screenshots (e.g. using AWS Rekognition) so they can filter on the text in the image. This should hopefully reduce both false positives and false negatives, because the tool will be able to recognize when contextually important text, such as “404 Error” or “Login”, appears in a screenshot.
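Once OCR text is available, even simple keyword rules can tag screenshots. This is a minimal sketch, assuming the text has already been extracted by Rekognition or another OCR engine; the label names and regular expressions are hypothetical and would need tuning per organization.

```python
import re

# Hypothetical label rules; real deployments would tune these per organization.
RULES = {
    "not_found": re.compile(r"\b(404|not found|page cannot be found)\b", re.IGNORECASE),
    "login": re.compile(r"\b(log ?in|sign ?in|password)\b", re.IGNORECASE),
    "error": re.compile(r"\b(exception|stack trace|traceback|internal server error)\b", re.IGNORECASE),
}


def tag_screenshot(ocr_text: str) -> set:
    """Return every label whose pattern appears in the screenshot's OCR text."""
    return {label for label, pattern in RULES.items() if pattern.search(ocr_text)}


def needs_review(ocr_text: str) -> bool:
    """Drop pages that merely say 'not found'; keep everything else for a human."""
    return "not_found" not in tag_screenshot(ocr_text)


if __name__ == "__main__":
    for text in ["404 Error - Page Not Found", "Please log in to continue", "Index of /backup"]:
        print(text, sorted(tag_screenshot(text)), needs_review(text))
```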
Machine learning has some promising applications here: the system could flag potential issues without waiting on a human analyst, so companies can fix exposures more quickly. This could be unsupervised, or supervised using labeled data from human analysts. Another interesting angle is training a model per organization, so it learns what’s sensitive or dangerous for that particular company to expose, which may differ considerably from what matters at other companies.
- For example, the ML could learn that seeing a specific logo on the page may indicate whether the page should be externally facing or not. Likewise with ads or other potential indicators.
A few months after this talk, at BlackHat USA 2019’s Arsenal, Bishop Fox’s Dan Petro and Gavin Stroy released eyeballer, a convolutional neural network for analyzing pentest screenshots.
So far they’ve been focusing on breadth, but there could also be value in applying these techniques in depth: for example, running the same process against many paths on a single target, similar to Burp Intruder, with paths sourced from Common Crawl data or another crawler. These pages could similarly be screenshotted and compared, arguments could be fuzzed, etc.
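Fuzzing arguments on a single target could start by mutating one query parameter at a time, similar in spirit to Burp Intruder’s Sniper attack type. The sketch below only generates the mutated URLs; the payload list is a hypothetical placeholder, and each resulting URL would then be screenshotted and hashed like any other path.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fuzz_variants(url, payloads):
    """Yield one URL per (parameter, payload) pair, replacing only that parameter's value."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for i, (name, _) in enumerate(params):
        for payload in payloads:
            mutated = params[:i] + [(name, payload)] + params[i + 1:]
            yield urlunsplit(parts._replace(query=urlencode(mutated)))


if __name__ == "__main__":
    for variant in fuzz_variants("https://example.com/search?q=shoes&page=2", ["'", "../../etc/passwd"]):
        print(variant)
```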
Have you tried screenshotting with multiple rendering engines? Using only headless Chrome may cause you to miss things.
Some sites show more or less functionality depending on your user agent (e.g. mobile browsers), and some may not work in headless Chrome at all (“This site requires Internet Explorer with ActiveX version…”). This isn’t a feature they’ve implemented yet, but they may in the future.
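A lightweight first step toward this is to screenshot each URL under several user-agent profiles and flag URLs whose renders differ. The sketch below assumes a screenshotter that honors a custom `User-Agent` header and returns an integer perceptual hash per render; the profile strings are truncated placeholders. Note that this only varies the user agent, not the rendering engine itself.

```python
from itertools import combinations

# Illustrative, truncated User-Agent strings; substitute full, current browser UAs in practice.
PROFILES = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "mobile": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
}


def screenshot_jobs(urls, profiles=PROFILES):
    """One (url, request headers) job per URL per profile."""
    return [(url, {"User-Agent": ua}) for url in urls for ua in profiles.values()]


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def profile_sensitive(hashes_by_url, max_distance=6):
    """Given {url: {profile: screenshot_hash}}, flag URLs whose renders differ between any two profiles."""
    return [
        url
        for url, per_profile in hashes_by_url.items()
        if any(hamming(a, b) > max_distance for a, b in combinations(per_profile.values(), 2))
    ]
```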
What’s the smallest set of assets that would still be useful to do this process with?
If you only have a handful of assets, it might not provide a lot of value, but they’ve used it successfully on some small businesses with just hundreds of assets.