name: inverse
layout: true
class: center, middle, inverse
---
# Automated Bug Finding in Practice
### Daniel DeFreez ([@defreez](https://twitter.com/defreez)), Clint Gibler ([@clintgibler](https://twitter.com/clintgibler))

.footnote[Check out more of our work: https://programanalys.is | [@programanalysis](https://twitter.com/programanalysis)]

---
layout: false

# About Us

## Daniel DeFreez
* PhD student at UC Davis
* Co-founder of Practical Program Analysis, LLC
* Unreasonably enthusiastic about LLVM

## Clint Gibler
* Research Director and Senior Security Consultant at NCC Group
* Co-founder of Practical Program Analysis, LLC
* Midwesterner living in SF

---
layout: false
.left-column[
## Motivation
]
.right-column[
### Life as an AppSec Engineer

* My AppSec team is 1-5 people
* Hundreds to thousands of developers
* Too much code, being developed too quickly, to possibly review by hand
* Waterfall -> Agile: code is released more often, and the security team can't be a gatekeeper
* Want to ensure a security baseline across all repos and services

#### Questions:
* How can we find bugs at scale?
* Can we find some bugs semi-automatically?
]

???

Before we get into the talk, let's first take a step back and discuss the motivation that drives this work.

In most companies, there are a handful of AppSec engineers responsible for helping to secure software written by hundreds to thousands of developers. There's too much code, being developed too quickly, for them to possibly review all the new code by hand.

With the widespread adoption of agile development practices, code is often pushed out daily, and the security team generally can't be a blocker.

At the same time, the AppSec team wants to ensure at least a security baseline across all code repos and services.

So the questions for us are:

1. How can we find bugs at scale?
2. Can we find some bugs semi-automatically?
---
background-image: url("https://imgc.allpostersimages.com/img/print/posters/yes-we-can-rosie-the-riveter_a-G-8179202-0.jpg")

.center[### (Or this would be a pretty boring talk)]

--

.footnote[.red[*] We're going to keep it real though]

???

The answer is "Yes we can!" That's good, because if it weren't, this would be a pretty boring talk on automated bug finding in practice. It would just be a title slide and a second slide that said "No."

---
layout: false
.left-column[
## Technique
### tl;dr
### Pros
### Cons
### Tools
### Learn more
]
.right-column[
### Automated Bug Finding - Big Picture

### Techniques
* #### Taint Analysis
  * Static
  * Dynamic
* #### Symbolic Execution
* #### Fuzzing
* #### Symbolic Execution + Fuzzing

### Cheatsheet / Overview
]

???

Alright, so this is the agenda for the rest of the talk:

1. First, we're going to talk about some big picture concepts and terminology that will be useful in framing the rest of the talk.
2. The core of the talk will be discussing various automated bug finding techniques.
  * For each technique, we'll first give a brief overview of what it is, so you have some intuition about how it works.
  * Then, we'll discuss the approach's strengths and weaknesses: where it excels and what it has trouble with.
  * Finally, we'll link you to some relevant tools as well as books, talks, papers, and blog posts where you can learn more.
3. Lastly, we'll wrap up with a cheatsheet / overview of all of the techniques discussed, summarizing when and where you might want to use them.

---
template: inverse

# "Program Analysis"

--

### A pretentious way of saying:
## "Programs that analyze other programs"

???

Alright, let's get into it. "Program Analysis" is a term that often gets thrown around when talking about this problem domain. It sounds fancy, but basically it's just a pretentious way of saying "programs that analyze other programs."

In general, program analysis doesn't need to focus on security, though we will in this talk. It could also refer to analyzing programs for other properties, such as correctness, or for optimizations.

---
.left-column[
## Terminology
]
.right-column[
## Discussing Tool Findings

| | Tool reports bug | Tool does not report bug |
|---|---|---|
| **Real bug** | True Positive (TP) | False Negative (FN) |
| **Not a real bug** | False Positive (FP) | True Negative (TN) |
]

???

Let's talk about some important terms we use when discussing a tool's findings. There are two important axes here:

1. The first is: does the tool report a bug at a given line of code or piece of functionality?
2. And the second is: as a human, after manually reviewing the tool's output (also called "triaging"), is it actually a real bug?

---
layout: false
.left-column[
## Approach Types .red[*]
]
.right-column[
### Static Analysis
Reason about code based on looking at it

* **Pros**: High coverage, "fast"
* **Cons**: Imprecise (false positives)

### Dynamic Analysis
Run code and observe how it behaves

* **Pros**: "Precise"
  * i.e. Tends to report true positives
* **Cons**: Code coverage is a challenge
  * i.e. Can't find bugs in code that's not run

.footnote[.red[*] These are high level generalizations]
]

???

Just to make sure we're all on the same page, let's briefly give a high level overview of static and dynamic analysis, the two broad categories that analysis techniques fall into. There are also hybrid approaches, and we'll talk about one case of that too.

"Static analysis" is reasoning about code based on looking at it, and "dynamic analysis" is running code and observing how it behaves.

* So "static": we're looking at the code; "dynamic": we're running it.

These approaches tend to have fundamental tradeoffs:

* In static analysis, you get high code coverage, as you can see all of the code.
  * However, it can be imprecise, both for fundamental static analysis reasons and because of the implementation tradeoffs you have to make as someone building the tool, including optimizations and approximations.
  * We'll discuss a number of these cases in a bit.
* In dynamic analysis, you tend to report true positives, as you just observed the bug or attack succeeding.
  * However, it can be tough to get high code coverage. Executing all of the functionality of a complex application isn't easy, and if you miss some parts, that leads to false negatives: you miss a bug that's actually there.

Again, these are just high level generalizations, but they're a useful point of reference for thinking about these techniques.

---
class: center, middle, inverse

# Static Taint Analysis

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## You've probably used static analysis already

--

~~~
$ grep -r "subprocess.run(" .
~~~
]

--

.right-column[
But what about this?
~~~python
# We've re-architected this to avoid having to use
# subprocess.run(cmd)
def foo(args):
    ...
~~~
]

--

.right-column[
or this?
~~~
logger.info("subprocess.run(args) succeeded")
~~~
]

???

Maybe these seem like contrived examples, but when I've used regexes to look for security-relevant patterns on real penetration tests, there are usually a number of uninteresting results for these or related reasons.

--

.right-column[
## We need a parser!
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## AST Matching

When you say
~~~
$ grep -r "subprocess.run(" .
~~~

What you really mean is: "Find me every call to the `run()` function in the `subprocess` module."
]

---
background-image: url("http://redfairyproject.com/wp-content/uploads/2016/02/You-get-in-life-what-you-have-the-courage-to-ask-for_red-fairy-Project_daily-inspiration.jpg")

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## So what do we want?

We want code where:
* Some data I control
* Eventually flows into the arguments of `subprocess.run()`

OWASP calls these [Injection](https://www.owasp.org/index.php/Top_10-2017_A1-Injection) attacks.
]

--

.right-column[
## Attacker-controlled data -> dangerous location

This pattern describes many types of vulnerabilities:
* Buffer overflows
* XSS
* SQL injection
* Command injection
* ...
]

---
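layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Sketch: a toy taint checker

Here is a minimal sketch of "attacker-controlled data -> dangerous location" built on an AST instead of grep. It is not how any tool in this talk works. It uses Python's standard `ast` module and assumes a tiny policy: `request.args.get` is the only source, `subprocess.run` is the only sink, and `shlex.quote` is the only sanitizer. It also only follows straight-line assignments inside a single function.

```python
import ast

# Assumed taint policy for this sketch (real tools ship far larger ones).
SOURCES = {"request.args.get"}
SINKS = {"subprocess.run"}
SANITIZERS = {"shlex.quote"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def is_tainted(expr, tainted_vars):
    """Does this expression carry attacker-controlled data?"""
    if isinstance(expr, ast.Call):
        name = dotted_name(expr.func)
        if name in SOURCES:
            return True   # source: attacker input enters here
        if name in SANITIZERS:
            return False  # sanitizer: the result is treated as safe
    if isinstance(expr, ast.Name):
        return expr.id in tainted_vars
    # Transfer: e.g. "/tmp/" + folder is tainted if any sub-expression is.
    return any(is_tainted(child, tainted_vars)
               for child in ast.iter_child_nodes(expr))


def find_flows(source_code):
    """Report (function name, line) for each sink call given tainted data."""
    findings = []
    for func in ast.walk(ast.parse(source_code)):
        if not isinstance(func, ast.FunctionDef):
            continue
        tainted = set()
        for stmt in func.body:  # straight-line code only, top to bottom
            for call in ast.walk(stmt):
                if (isinstance(call, ast.Call)
                        and dotted_name(call.func) in SINKS
                        and any(is_tainted(a, tainted) for a in call.args)):
                    findings.append((func.name, call.lineno))
            if isinstance(stmt, ast.Assign):
                names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
                if is_tainted(stmt.value, tainted):
                    tainted.update(names)
                else:
                    tainted.difference_update(names)
    return findings


VULNERABLE = '''
@app.route("/list_dir")
def list_dir():
    folder = request.args.get('folder', '')
    target = "/tmp/" + folder
    subprocess.run("ls " + target, shell=True)
'''
SANITIZED = VULNERABLE.replace('"/tmp/" + folder', "shlex.quote(folder)")

print(find_flows(VULNERABLE))  # [('list_dir', 6)]
print(find_flows(SANITIZED))   # []
```
]

???

This checker matches on the AST rather than on raw text, so the comment and log-message examples from the grep slide never produce a finding. Everything a real tool has to handle is deliberately left out: branches, loops, calls into other functions, and aliasing such as `from subprocess import run`.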
---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Core Static Taint Analysis Components

There are a number of core components in every static taint analysis tool.

### Example Flask Endpoint
```python
@app.route("/list_dir")
def list_dir():
    folder = request.args.get('folder', '')
    subprocess.run("ls " + folder, shell=True)
```
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Core Static Taint Analysis Components

There are a number of core components in every static taint analysis tool.

### Example Flask Endpoint
```python
@app.route("/list_dir")
def list_dir():
*   folder = request.args.get('folder', '')
    subprocess.run("ls " + folder, shell=True)
```

#### `Source` - where attacker-controlled input enters the system
]

--

.right-column[
Examples include:
* URL and form params
* Headers
* Environment variables
* Reading from a file or database
* Command line arguments
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Core Static Taint Analysis Components

There are a number of core components in every static taint analysis tool.

### Example Flask Endpoint
```python
@app.route("/list_dir")
def list_dir():
    folder = request.args.get('folder', '')
*   subprocess.run("ls " + folder, shell=True)
```

#### `Sink` - where is it dangerous for attacker input to end up?
]

--

.right-column[
Examples include:
* Unparameterized SQL queries (SQLi)
* Generated HTML without proper encoding (XSS)
* File names to read or write
* Shell exec'd commands (command execution)
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Core Static Taint Analysis Components

There are a number of core components in every static taint analysis tool.

### Example Flask Endpoint
```python
@app.route("/list_dir")
def list_dir():
    folder = request.args.get('folder', '')
*   target = "/tmp/" + folder
    subprocess.run("ls " + target, shell=True)
```

#### `Transfer` - an operation that propagates taint
]

--

.right-column[
Examples include:
```python
foo = bar                # assignment
foo + bar                # string concatenation
"foo %s" % (bar)         # string interpolation
' '.join(some_list)      # create string from list
...                      # tons of other ways
```
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## Core Static Taint Analysis Components

There are a number of core components in every static taint analysis tool.

### Example Flask Endpoint
```python
@app.route("/list_dir")
def list_dir():
    folder = request.args.get('folder', '')
*   target = shlex.quote(folder)
    subprocess.run("ls " + target, shell=True)
```

#### `Sanitizer` - processes user input to make it safe
]

--

.right-column[
Examples include:
* Output encoding
* Parameterized queries
* Path canonicalization
* Validation that enforces expected character sets/structure of input
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
]
.right-column[
## When Static Analysis is Hard

* Missing code
  * Library functions (need to model effects)
  * API calls
* Dynamic language features
  * `eval()`
  * reflection
* Interprocedural analysis is tough to scale
* Supporting many languages and frameworks is miserable.red[*]
  * Life is much easier if you can target a single code base

.footnote[.red[*] [A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World](https://cacm.acm.org/magazines/2010/2/69354-a-few-billion-lines-of-code-later/fulltext) (Coverity)]
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
### Pros
]
.right-column[
## Speed and Scalability

* Tools scale better than people
  * You probably don't have time to manually review millions of LOC
  * Useful for large, legacy code bases
* Human auditors get tired; a tool applies the same security rules consistently
* Can keep up with the rapid pace of development.red[*]

.footnote[.red[*] In theory. Data-flow analysis is slow in practice.]
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
### Pros
### Cons
]
.right-column[
## For AppSec Teams

* Significant initial time investment in setup and tuning
* (Usually) Large recurring time investment triaging findings
  * Boring job for the AppSec team
  * Pushing untriaged findings to devs damages trust
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
### Pros
### Cons
### Tools
]
.right-column[
## Popular Tools

Ruby on Rails: [@presidentbeef/brakeman](https://github.com/presidentbeef/brakeman)

Python:
* [python-security/pyt](https://github.com/python-security/pyt)
* [@PyCQA/bandit](https://github.com/PyCQA/bandit)

Java: [spotbugs/spotbugs](https://github.com/spotbugs/spotbugs) ([find-sec-bugs](http://find-sec-bugs.github.io/))

PHP: [RIPS](http://rips-scanner.sourceforge.net/)

.NET: [pumasecurity/puma-scan](https://github.com/pumasecurity/puma-scan)

C/C++: [Clang Static Analyzer](https://clang-analyzer.llvm.org/)

Massive list of linters, code quality checkers, and security-related tools: https://github.com/mre/awesome-static-analysis
]

---
layout: false
.left-column[
## Static Analysis
### tl;dr
### Pros
### Cons
### Tools
### Learn more
]
.right-column[
## Books

### Reasonably Practical
* [Secure Programming with Static Analysis](https://www.amazon.com/Secure-Programming-Static-Analysis-Brian/dp/0321424778/)

### A bit more academic
* [Compilers: Principles, Techniques, and Tools](https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811)
* [Engineering a Compiler](https://www.amazon.com/Engineering-Compiler-Keith-Cooper/dp/012088478X)

### Abandon all hope
* [Principles of Program Analysis](https://www.amazon.com/Principles-Program-Analysis-Flemming-Nielson/dp/3540654100/)
]

---
layout: false
.left-column[
## Dynamic Taint Analysis
### tl;dr
]
.right-column[
### AKA Taint Tracking

* #### Same goals as static taint analysis
* #### Attach tags to data structures
* #### Low-level instrumentation, e.g. Java bytecode, x86
* #### Exercise the program through test suites, fuzzing, etc.
]
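---
layout: false
.left-column[
## Dynamic Taint Analysis
### tl;dr
]
.right-column[
## Sketch: taint tags on strings

Here is a minimal sketch of the "attach tags to data" idea, assuming a toy policy. Every string returned by the hypothetical `source()` is attacker-controlled, and the hypothetical `run_query()` is the sink. Real taint trackers instrument bytecode or machine code. This sketch only shows a tag that travels with the data through `+` and gets checked at the sink.

```python
class TaintedStr(str):
    """A str tagged as attacker-controlled; the tag survives + concatenation."""

    def __add__(self, other):           # tainted + anything
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):          # plain str + tainted
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(other, self))


class TaintViolation(Exception):
    """Raised when tainted data reaches a sink."""


def source(value: str) -> TaintedStr:
    """Hypothetical source (e.g. a network read): everything it returns is tainted."""
    return TaintedStr(value)


def run_query(sql: str) -> str:
    """Hypothetical sink: checks the tag before doing anything dangerous."""
    if isinstance(sql, TaintedStr):
        raise TaintViolation(f"tainted data reached run_query: {sql!r}")
    return "query ok"  # stand-in for actually running the query


# Same shape as x = y + z; run_query(x)
y = source("1 OR 1=1")
z = "SELECT * FROM users WHERE id = "
x = z + y
print(type(x).__name__)     # TaintedStr
print(run_query(z + "42"))  # query ok
try:
    run_query(x)
except TaintViolation as err:
    print("blocked:", err)
```
]

???

In this sketch only `+` propagates the tag. `%` formatting and `str.join` return plain `str` objects, which is undertaint in miniature: every transfer operation needs its own propagation rule. Implicit flows are missed too. `b = (y == "admin")` yields an untainted `bool` even though its value depends on `y`.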
---
layout: false
.left-column[
## Dynamic Taint Analysis
### tl;dr
### Propagation
]
.right-column[
### Taint Propagation

As with static taint analysis, we need a taint policy:

* Sources, Sinks
* Taint propagation
```
x = y + z
run_query(x)
```
* Implicit flows
  * Control-dependent flow from A to B can lead to information leaks
```
if (A) then B = true, else B = false
```
]

---
layout: false
.left-column[
## Dynamic Taint Analysis
### tl;dr
### Propagation
### Challenges
]
.right-column[
* Overtaint
```javascript
function_table = [fn1, fn2, fn3, fn4, fn5];
x = partially_tainted_source(); // e.g. network header
function_table[x[0]](data);     // tainted?
```
* Requires buildable source and instrumentation of the language runtime
* Handling implicit flows leads to overtaint
* Inherent performance degradation
  * Gets worse as taint tags get more granular. Taint every bit? Every byte? Objects?
* Propagation across native interfaces
]

---
layout: false
.left-column[
## Dynamic Taint Analysis
### tl;dr
### Propagation
### Challenges
### Learn More
]
.right-column[
## Papers

* [All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution](https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf)
* [TaintDroid](https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Enck.pdf)
* [Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software](http://valgrind.org/docs/newsome2005.pdf)

## Tools

* [libdft](https://www.cs.columbia.edu/~vpk/research/libdft/)
* [Triton](https://triton.quarkslab.com)
]

---
class: center, middle, inverse

# Symbolic Execution

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## General Approach

1. Treat inputs as *symbolic*, rather than concrete, values
  * User/network input, file reads, env variables, ...
1. Keep track of conditionals/loop predicates as path constraints
1. To reach a specific LOC, find concrete values of the symbolic inputs that satisfy that path's constraints

Running a program shows you .red[one execution path] through the program.

Symbolic execution aims to reason about .red[all paths] through the program by using a constraint solver.
* The solver generates inputs that cover new branches

## Applications
* Automatic test-case generation
* Finding bugs
* Reversing
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
*name = input()                 # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
*   cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
*       crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
*       run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
*   passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
*       auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Example
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
        crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
*       print("Error")          # name != "admin", len(passwd) >= 20
```
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
]
.right-column[
## Working with Your Constraints
```python
name = input()                  # name = ?
if name == "admin":
    cmd = input()               # name = "admin"
    if cmd == "die":
*       crash()                 # name = "admin", cmd = "die"
    else:
        run_cmd(cmd)            # name = "admin", cmd != "die"
else:
    passwd = input()            # name != "admin", passwd = ?
    if len(passwd) < 20:
        auth(name, passwd)      # name != "admin", len(passwd) < 20
    else:
        print("Error")          # name != "admin", len(passwd) >= 20
```

"Tell me how to get here."

1. Pass the constraints to an SMT solver
1. Supply "`admin`" for the username and "`die`" for the command
1. ???
1. Bugs!
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
### Pros
]
.right-column[
### Pros
* Input generation is amazing
* High-coverage test suites
* Can reason about entire symbolic ranges, not just individual values
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
### Pros
### Cons
]
.right-column[
## Cons

* Path explosion
* Cannot generate inputs if constraints cannot be solved
  * **Concolic** techniques alleviate some of these problems
* Requires implementation of a symbolic interpreter
  * Concolic testing addresses this
    * Concolic testing starts with an initial concrete input
    * Uses a constraint solver to drive execution down new paths
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
### Pros
### Cons
### Tools
]
.right-column[
### Source code
[KLEE](https://klee.github.io/) - a symbolic VM built on top of LLVM

### Binary
[BitBlaze](http://bitblaze.cs.berkeley.edu/) project by UC Berkeley

[angr](http://angr.io) by UCSB / Shellphish

[Triton](https://triton.quarkslab.com/) by Quarkslab - dynamic binary analysis framework
* Includes a dynamic symbolic execution engine, taint engine, SMT solver interface, ...

[illera88/Ponce](https://github.com/illera88/Ponce) - IDA plugin that uses Triton
* Point and click symbolic execution
]

---
layout: false
.left-column[
## Symbolic Execution
### tl;dr
### Pros
### Cons
### Tools
### Learn more
]
.right-column[
## Academic Papers

* [KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs](https://www.doc.ic.ac.uk/~cristic/papers/klee-osdi-08.pdf), Cristian Cadar et al
* [All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but might have been afraid to ask)](https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf), Edward Schwartz et al
* [A Survey of Symbolic Execution Techniques](https://arxiv.org/pdf/1610.00502.pdf), Roberto Baldoni et al

## Talks

* [25 Years of Program Analysis](https://www.youtube.com/watch?v=XL9kWQ3YpLo), Yan Shoshitaishvili, DEF CON 25
]

---
class: center, middle, inverse

# Fuzzing

---
layout: false
.left-column[
## Fuzzing
### tl;dr
]
.right-column[
## Throw pasta at the wall, see what crashes

Core idea: throw a bunch of input at a program and investigate when it crashes.

#### There are three main approaches to fuzzing:
1. Mutation-based
1. Generation-based
1. Evolution-based

Each has strengths and weaknesses.
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
]
.right-column[
## 1. Mutation-based approach

1. Start with known good inputs ("input corpus")
1. Choose a mutation strategy (flipping bits, injecting byte combinations)
1. Give the mutated inputs to the app

### Pros
* Easy to get started
* Requires little to no knowledge of input structure

### Con
* Can struggle to find bugs beyond the surface level
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
]
.right-column[
## 2. Generation-based approach

1. Define a spec for how input should be structured
1. Define a list of bad values for data types in the input
1. Generate all possible inputs for a structure using the bad values
1. Send the generated inputs to the program

### Pros
* Can get better / deeper coverage
* Can handle complex input types with dependencies

### Con
* Requires deep understanding of the input or protocol

```python
# github.com/jtpereyda/boofuzz-ftp/blob/master/ftp.py
def initialize_ftp(session, username, password):
    s_initialize("user")
    s_string("USER")
    s_delim(" ")
    s_string(username.encode('ascii'))
    s_static("\r\n")
    ...
```
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
]
.right-column[
## 3. Evolution-based approach

1. Mutate input from a known good corpus
1. Monitor execution for some property (e.g. branches taken)
1. If this input discovers a new branch, put it back in the corpus

### Pros
* Can be *really* effective at finding bugs
  * Even hard-to-reach ones

### Con
* Generally requires source code to get the full value
* Requires building one or more test harnesses
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
### Pros
]
.right-column[
## Speed and Scalability

* Depending on the type of fuzzing, fast to get set up
* Surprisingly effective at finding bugs
* You can let it run continuously and periodically triage crashes
* Quite useful on C/C++ code bases
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
### Pros
### Cons
]
.right-column[
## Code Coverage is Hard

* Fuzzing can get stuck in shallow, uninteresting parts of the code
  * e.g. Initial input has complicated structure, checksums, length fields, etc.
  * Deep, nested conditionals or complex program state that must hold to reach interesting functionality
* Stateful protocols / fuzzing network services is hard
* Fuzzing complicated input formats or protocols requires codifying them upfront
* How much fuzzing is "good enough"?
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
### Pros
### Cons
### Tools
]
.right-column[
## Popular Fuzzers

#### Mutation-based
* [@akihe/radamsa](https://gitlab.com/akihe/radamsa)
* [samhocevar/zzuf](https://github.com/samhocevar/zzuf) - intercepts file operations and changes random bits in the program's input

#### Generation-based
* [@jtpereyda/boofuzz](https://github.com/jtpereyda/boofuzz)

#### Evolution-based
* [American fuzzy lop (afl)](http://lcamtuf.coredump.cx/afl/) by lcamtuf
* [libFuzzer](https://llvm.org/docs/LibFuzzer.html) by LLVM
* [google/honggfuzz](https://github.com/google/honggfuzz)

## Misc
* [Echidna, a smart fuzzer for Ethereum](https://blog.trailofbits.com/2018/03/09/echidna-a-smart-fuzzer-for-ethereum/) by Trail of Bits
* [googleprojectzero/domato](https://github.com/googleprojectzero/domato) - DOM fuzzer
* [jakobbotsch/Fuzzlyn](https://github.com/jakobbotsch/Fuzzlyn) - Fuzzer for the .NET toolchains
]

---
layout: false
.left-column[
## Fuzzing
### tl;dr
### Pros
### Cons
### Tools
### Learn more
]
.right-column[
## Books
* Fuzzing: Brute Force Vulnerability Discovery, Michael Sutton et al

## Blog Posts
* Technical "whitepaper" for afl-fuzz, Michał Zalewski (lcamtuf)
* Project Triforce: Run AFL on Everything!, Jesse Hertz and Tim Newsham, NCC Group

## Conference Talks
* Follow the White Rabbit: Simplifying Fuzz Testing Using FuzzExMachina, Bhargava Shastry et al, BlackHat USA 2018
]

---
class: center, middle, inverse
# Fuzzing + Symbolic Execution

--

**Fuzzing** is fast and good at finding bugs it can reach

--

**Symbolic execution** is slow but can solve complex constraints
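---
layout: false
.left-column[
## Fuzzing
]
.right-column[
## Sketch: coverage-guided fuzzing

Here is a minimal, pure-Python sketch of the evolution-based loop. It mutates corpus inputs and keeps any mutant that reaches new code. Line coverage collected with `sys.settrace` stands in for the branch instrumentation that tools like AFL or libFuzzer compile in. `target()` is a made-up program that only crashes on `b"FUZZ"`.

```python
import random
import sys


def target(data: bytes) -> None:
    """Made-up program under test: checks one byte at a time, crashes on b"FUZZ"."""
    if len(data) < 4 or data[0] != ord("F"):
        return
    if data[1] != ord("U"):
        return
    if data[2] != ord("Z"):
        return
    if data[3] != ord("Z"):
        return
    raise ValueError("boom")


def run_with_coverage(func, data):
    """Run func(data); return (line numbers executed inside func, crashed?)."""
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__:
            if event == "line":
                covered.add(frame.f_lineno)
            return tracer  # keep receiving line events for this frame
        return None        # ignore every other function

    sys.settrace(tracer)
    try:
        func(data)
        crashed = False
    except Exception:
        crashed = True
    finally:
        sys.settrace(None)
    return covered, crashed


def mutate(data, rng):
    """Mutation strategy: overwrite one random byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def evolutionary_fuzz(func, seeds, max_iters=200_000, seed=0):
    """Keep mutants that reach new lines; stop at the first crash."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    for s in corpus:
        seen |= run_with_coverage(func, s)[0]
    for i in range(1, max_iters + 1):
        candidate = mutate(rng.choice(corpus), rng)
        covered, crashed = run_with_coverage(func, candidate)
        if crashed:
            return candidate, corpus, i
        if not covered <= seen:  # new coverage: put it back in the corpus
            seen |= covered
            corpus.append(candidate)
    return None, corpus, max_iters


crash, corpus, iters = evolutionary_fuzz(target, [b"AAAA"])
print(crash, len(corpus), iters)
```
]

???

The target checks one byte per `if`, so each correct byte reaches a new line and the fuzzer keeps that input to build on. Suppose `target()` instead compared all four bytes at once (`data == b"FUZZ"`). Then no single mutation would produce new coverage, and this loop would have no feedback to follow. Symbolic execution can solve that kind of constraint directly, which is the motivation for combining the two.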
---
layout: false
.left-column[
## Hybrid Approaches
]
.right-column[
## Fuzzing + Symbolic Execution

### Academic Papers
* Automated Whitebox Fuzz Testing, Patrice Godefroid et al (2008)
* Billions and Billions of Constraints: Whitebox Fuzz Testing in Production, Ella Bounimova et al (2013)
* [Driller: Augmenting Fuzzing Through Selective Symbolic Execution](https://www.cs.ucsb.edu/~vigna/publications/2016_NDSS_Driller.pdf), Nick Stephens et al (2016)

### Articles
* Trail of Bits' articles on the [Cyber Grand Challenge](https://blog.trailofbits.com/category/cyber-grand-challenge/)
* [Ask HN: What is the emerging state of the art in fuzzing techniques?](https://news.ycombinator.com/item?id=12078243) (2016)
]

---
class: center, middle, inverse

# Overall Resources

---
layout: false
.left-column[
## Overall Resources
]
.right-column[
## Conference Talks

On integrating static and dynamic analysis into the SDLC:

* How Leading Companies Are Scaling Their Security, Clint Gibler, AppSec EU 2018
  * https://bit.ly/GiblerAppSecEU2018_DevSecOps
* Practical Tips for Defending Web Applications in the Age of DevOps, Zane Lackey, BlackHat USA 2017
  * [slides](https://www.blackhat.com/docs/us-17/thursday/us-17-Lackey-Practical%20Tips-for-Defending-Web-Applications-in-the-Age-of-DevOps.pdf) | [video](https://www.youtube.com/watch?v=IvdKtf3ol2U)
]

---
layout: false
.left-column[
## What should I use, when?.red[*]
]
.right-column[
## Core Take-aways

* Note: None of these will find business logic flaws
* Tune/customize existing tools; try not to roll your own
* There's no silver bullet
  * Different tools can be better or worse on different websites, languages, and frameworks

### Static Taint Analysis (SAST)

Good fit:
* Large, monolithic code bases in Java, C, C++
* Willing to invest significant time in tuning and triaging

Poor fit:
* Dynamic languages (e.g. Ruby, Python, JavaScript)
* Microservice architectures
* Time-constrained AppSec team in an agile dev shop

.footnote[.red[*] These are rules of thumb; every situation is different.]
]

---
layout: false
.left-column[
## What should I use, when?.red[*]
]
.right-column[
### Dynamic Taint Analysis

Good fit:
* Tracking input you control through binaries
* When you can build/run the program to test

Poor fit:
* When large or complex input is touched by many parts of the program and spreads everywhere

### Symbolic Execution

Good fit:
* Input generation to reach a certain program point
* Solving complex constraints placed on inputs

Poor fit:
* Handling massive, complex code bases unassisted

### Fuzzing

Good fit:
* C/C++ code bases (can be used on other languages)
* Targeting code responsible for parsing
* You have a lot of computing resources and/or time

Poor fit:
* Stateful protocols (e.g. networking)
* Very complex input structure requirements
* Complex logic that must be bypassed to reach interesting functionality
]

---
class: center, middle, inverse
# Automated Bug Finding in Practice
### Daniel DeFreez ([@defreez](https://twitter.com/defreez)), Clint Gibler ([@clintgibler](https://twitter.com/clintgibler))

## Questions?

.footnote[Check out more of our work: https://programanalys.is | [@programanalysis](https://twitter.com/programanalysis)]