Lessons Learned from the DevSecOps Trenches
In a sentence: Learn how Netflix, Dropbox, Datadog, Snap, and DocuSign think about security. A masterclass in DevSecOps and modern AppSec best practices.
Panel: Lessons Learned from the DevSecOps Trenches
Clint Gibler, Research Director, NCC Group twitter, linkedin
Dev Akhawe, Director of Security Engineering, Dropbox twitter, linkedin
Doug DePerry, Director of Product Security, Datadog twitter, linkedin
Divya Dwarakanath, Security Engineering Manager, Snap twitter, linkedin
John Heasman, Deputy CISO, DocuSign linkedin
Astha Singhal, AppSec Engineering Manager, Netflix twitter, linkedin
AppSec Cali, Santa Monica, CA. January 25th, 2019
🔖 summary 💬 abstract 📹️ video
You can read the full transcript here.
Though the security teams may have different names at different companies (e.g. AppSec vs ProdSec), they tend to have the same core responsibilities: developer security training, threat modeling and architecture reviews, triaging bug bounty reports, internal pen testing, and building security-relevant services, infrastructure, and secure-by-default libraries.
Everyone built their own internal continuous code scanning platforms that essentially run company-specific greps that look for things like hard-coded secrets, known anti-patterns, and enforcing that secure wrapper libraries are being used (e.g. crypto, secrets management).
SAST and DAST tools were generally not found to be useful due to having too many FPs, being too slow and not customizable, and failing to handle modern frameworks and tech (e.g. single page apps).
Everyone emphasized the importance of building secure-by-default wrapper libraries and frameworks for devs to use, as this can prevent classes of vulnerabilities and keep you from getting caught up in vuln whack-a-mole.
This can be hard if you have a very polyglot environment but it’s worth it.
Determine where to invest resources by a) reviewing the classes of bugs your company has had historically and b) having conversations with dev teams to understand their day-to-day challenges.
Building relationships with engineering teams is essential to knowing relevant upcoming features and services, being able to advise engineering decisions at the outset, and spreading awareness and gaining buy-in for secure wrappers.
When you’re building a tool or designing a new process you should be hyper aware of existing developer workflows so you don’t add friction or slow down engineering. Make sure what you’ve built is well-documented, has had the bugs ironed out, and is easy for devs to use and integrate.
If possible, include features that provide value to devs if they adopt what you’ve built (e.g. telemetry) and try to hitch your security efforts to the developer productivity wagon.
Invest in tooling that gives you visibility - how is code changing over time? What new features and services are in the pipeline? What’s happening to your apps in production?
Netflix has gotten value from an internal security questionnaire tool they’ve built, while Snap and Dropbox had their versions rejected by dev teams. At Snap, devs wanted in-person discussions instead; at Dropbox, the questionnaire lacked collaboration features.
While everyone agreed on the importance of having strong relationships with engineering teams, John argued that individual relationships alone are not sufficient: dev teams grow faster than security teams and people move between teams or leave the company. Instead, you need to focus on processes and tooling (e.g. wrapper libraries and continuous scanning) to truly scale security.
For most of the panel members, the security teams wrote secure wrappers and then tried to get devs to adopt them. The Dropbox AppSec team actually went in and made the code changes themselves. This had the benefit of showing them that what they thought was a great design and solid code actually had poor dev UX and high adoption friction.
Don’t spend too much time trying to ensure you’re working on the perfect task to improve your company’s security. Choose something that makes sense and get started!
This section condenses and summarizes the points made by each speaker in the order they occur in the video.
You can read this instead of the full transcript if you want the core ideas. Don't worry, you can watch the video for the witty banter later.
SDLC / Security Automation
Datadog focuses on visibility, which they call ProdSecOps. Members of Doug’s team do 1-week rotations where they invest in building useful tooling that does things like continuously scan newly submitted source code (for security anti-patterns as well as when specific, critical files are changed, such as those responsible for crypto, authentication, authorization, etc.).
They also scan Trello cards that have a “security” label or other relevant keywords to gain continuous insight into potentially security-relevant features that developers are working on, without having to attend every stand-up meeting, which doesn’t scale.
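The card-scanning idea can be sketched in a few lines. This is only an illustration, not Datadog's tooling: the `Card` class is a simplified stand-in for whatever your tracker exports, and the keyword list is an assumption you would tune to your own product.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword list; tune it to your own product and vocabulary.
SECURITY_KEYWORDS = ("auth", "crypto", "password", "token", "encrypt", "permission")
KEYWORD_RE = re.compile(r"\b(" + "|".join(SECURITY_KEYWORDS) + r")\w*", re.IGNORECASE)


@dataclass
class Card:
    """Simplified stand-in for a card exported from a task tracker."""
    title: str
    description: str = ""
    labels: list = field(default_factory=list)


def is_security_relevant(card: Card) -> bool:
    """Flag a card if it carries a 'security' label or mentions a keyword."""
    if any(label.lower() == "security" for label in card.labels):
        return True
    return bool(KEYWORD_RE.search(f"{card.title} {card.description}"))


def triage(cards: list) -> list:
    """Return only the cards the security team should look at."""
    return [card for card in cards if is_security_relevant(card)]
```

Prefix matching like this is noisy by design (a card about an "author" page would match "auth"), which is usually acceptable when the output is only read by the security team.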
Sidenote: I liked these and other neat tips and tricks I heard from various companies so much that I included a number of them in my AppSec EU 2018 slides.
Divya’s team at Snap has similarly built tooling that scans new code for issues such as likely misconfigured auth and whether the recommended key manager is being used to store secrets. If a developer is implementing functionality that the security team has already built a secure-by-default framework for, the tool comments on the PR to recommend using that framework.
DocuSign built all of their AppSec training in-house, which was a huge time investment but was worth it. The key insight was that at DocuSign it was rare for developers to start building an app completely from scratch; generally they’ll be using existing frameworks and components, so the training primarily focuses on what devs need to know to use those securely.
DocuSign tries to have many lightweight, low friction touch points with developers throughout the SDLC. For example, any developer can mention the AppSec team GitHub handle on a PR if they’d like to ask questions or have security take a look.
The Netflix AppSec team focuses on building a Paved Road for developers (Dianne Marsh at OSCON 2017, Astha Singhal and Patrick Thomas at AppSec Cali 2018), where the most common languages and frameworks receive significant support from the Netflix developer productivity and AppSec teams to make them have a smooth developer experience and significant secure-by-default guardrails (e.g. authn/authz, mTLS between services, etc.).
Previously, much of the tooling the Netflix AppSec team built was designed for them to consume, as they’re careful to only expose the output to devs if they’re highly confident that it’s something that devs should care about. More recently, the AppSec team has been leaning into a developer self-service model, for example, being able to build a service that can advise developers:
Dropbox uses their A/B testing framework to roll out new features early to a population that includes their bug bounty researchers so they get early feedback.
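As a rough illustration of that kind of early rollout (not Dropbox's actual framework), a feature gate can always include known bug bounty researchers and bucket everyone else deterministically. The function name, parameters, and hashing scheme here are all assumptions.

```python
import hashlib


def in_early_rollout(user_id: str, feature: str, percent: int,
                     bounty_researchers: frozenset = frozenset()) -> bool:
    """Return True if this user should see `feature` during early rollout."""
    # Known bug bounty researchers are always included so they can test early.
    if user_id in bounty_researchers:
        return True
    # Everyone else is bucketed deterministically into 0..99 by hashing,
    # so the same user gets the same answer on every request.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```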
Dropbox also spends time reviewing the common bugs that occur through their SDLC to determine where they can write better libraries, build better developer education, or build better static or dynamic analysis tools to detect and/or prevent these flaws.
They’ve gotten a lot of value from working with product managers during the design and ideation phase, as the PMs are thinking about the product even before it gets to developers.
Continuous Code Scanning
Dropbox uses Phabricator, which can scan a code diff and add a blocking security reviewer requirement if certain static analysis issues are triggered. Phabricator comments on the PR to explain why the security reviewer was added, and if the offending code is removed, the dev can then merge it in without speaking to security. This approach has been effective both in empowering developers and in optimizing for the security team’s time.
One example check is they flag any crypto code that does not use their internally developed, blessed framework.
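The decision logic behind such a check might look roughly like the sketch below. It does not use Phabricator's API; the `company_crypto` wrapper name, the regex, and the `ReviewDecision` type are hypothetical and only show the shape of the idea: flag the diff, explain why, and clear the flag once the offending lines are gone.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: importing low-level crypto modules directly instead of
# an internal, blessed wrapper (imagined here as `company_crypto`).
RAW_CRYPTO_RE = re.compile(r"^\s*(?:import|from)\s+(?:hashlib|hmac|Crypto|cryptography)\b")


@dataclass
class ReviewDecision:
    blocking: bool   # True if a security reviewer must approve the diff
    comments: list   # Explanations to post on the diff


def review_diff(added_lines: dict) -> ReviewDecision:
    """Decide whether a diff needs a blocking security reviewer.

    `added_lines` maps a file path to the lines the diff adds to that file.
    """
    comments = []
    for path, lines in added_lines.items():
        for line in lines:
            if RAW_CRYPTO_RE.search(line):
                comments.append(
                    f"{path}: `{line.strip()}` uses raw crypto; "
                    "use company_crypto or remove it to clear this review."
                )
    return ReviewDecision(blocking=bool(comments), comments=comments)
```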
Snap’s tool that comments on diffs generally doesn’t block merges, but it does give the devs a chance to mark an issue as a true or false positive, and the security team tracks these metrics and uses them to determine if they can improve the signal of a given check.
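One simple way to turn that feedback into a per-check signal is to compute precision from the true/false positive marks. This is a sketch under assumptions: the `(check, verdict)` tuple format, the check names, and the thresholds are made up for illustration.

```python
from collections import Counter


def check_precision(feedback) -> dict:
    """Compute the share of true positives per check.

    `feedback` is an iterable of (check_name, verdict) pairs, where verdict
    is "tp" or "fp" as marked by the developer on the diff.
    """
    true_positives = Counter()
    totals = Counter()
    for check, verdict in feedback:
        totals[check] += 1
        if verdict == "tp":
            true_positives[check] += 1
    return {check: true_positives[check] / totals[check] for check in totals}


def noisy_checks(feedback, min_precision: float = 0.5, min_votes: int = 5) -> list:
    """List checks with enough votes whose precision is below the threshold."""
    feedback = list(feedback)
    votes = Counter(check for check, _ in feedback)
    precision = check_precision(feedback)
    return sorted(
        check for check, value in precision.items()
        if votes[check] >= min_votes and value < min_precision
    )
```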
Dropbox - Checks that fire often but are not high risk are moved to audit mode, which runs post-commit rather than during code review. This lets devs move quickly, and the security team can come back later and say, “Hey, this doesn’t follow best practices, can you please change it?”
DocuSign - Using static analysis to confirm the adoption of secure wrapper libraries is a much simpler problem than finding bugs. Their AppSec team wrote a number of wrapper classes around potentially dangerous operations and enforce their usage over the original framework versions using simple lightweight checks.
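To show why this is the simpler problem, here is a minimal sketch of such a check using Python's `ast` module. It is not DocuSign's implementation, and the banned calls and wrapper names (`safe_http`, `safe_serialize`) are hypothetical.

```python
import ast
from typing import Optional

# Hypothetical policy: calls on the left must go through the wrapper on the right.
BANNED_CALLS = {
    "requests.get": "safe_http.get",
    "pickle.loads": "safe_serialize.loads",
}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for a Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_unwrapped_calls(source: str) -> list:
    """Return (line, banned_call, suggested_wrapper) for each violation."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in BANNED_CALLS:
                findings.append((node.lineno, name, BANNED_CALLS[name]))
    return sorted(findings)
```

A check this light misses aliased imports such as `from requests import get`, but it needs no data-flow analysis and its findings are easy for a developer to understand.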
They got adoption of these frameworks by canvassing developers and determining what mattered to them. At DocuSign, they’re big on telemetry-driven design, so the AppSec team built in telemetry into all of the wrappers, for example, for preventing SSRF. Then they could go to devs and say, “Hey, use our component and you get monitoring for free.”
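A wrapper in that spirit could validate outbound URLs and record telemetry on every decision. The sketch below is an assumption-laden illustration, not DocuSign's component: `TELEMETRY` is a plain `Counter` standing in for a real metrics client, and the metric names and `BlockedRequest` exception are invented.

```python
import ipaddress
import socket
from collections import Counter
from urllib.parse import urlparse

# Stand-in for a real metrics client; a Counter keeps the sketch self-contained.
TELEMETRY = Counter()


class BlockedRequest(Exception):
    """Raised when a URL is malformed or resolves to a non-public address."""


def _resolve(host: str) -> list:
    """Resolve a hostname to IP address strings (replaceable in tests)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_url(url: str, resolve=_resolve) -> str:
    """Validate an outbound URL and record telemetry about the decision."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        TELEMETRY["outbound.blocked.bad_url"] += 1
        raise BlockedRequest(f"unsupported URL: {url!r}")
    for addr in resolve(parsed.hostname):
        ip = ipaddress.ip_address(addr)
        # Reject private, loopback, link-local, and other non-public ranges.
        if not ip.is_global:
            TELEMETRY["outbound.blocked.internal"] += 1
            raise BlockedRequest(f"{parsed.hostname} resolves to non-public {ip}")
    TELEMETRY["outbound.allowed"] += 1
    return url
```

A production wrapper would also need to connect to the address it validated, since resolving twice leaves room for DNS rebinding. The point here is only that the same code path gives devs monitoring and gives security a choke point.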
Netflix hasn’t gotten a lot of value from complex static analysis (e.g. data-flow analysis as provided by traditional SAST vendors like Fortify, Checkmarx, etc.).
Grep-ing for anti-patterns has been much more effective.
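For a sense of how little code a grep-style scanner needs, here is a minimal sketch. The three rules are deliberately simple, hypothetical examples; real rule sets are company-specific.

```python
import re

# Hypothetical, deliberately simple rules; real ones are company-specific.
RULES = {
    "hardcoded-aws-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "weak-hash-md5": re.compile(r"\bmd5\s*\("),
    "tls-verify-disabled": re.compile(r"verify\s*=\s*False"),
}


def scan_text(path: str, text: str) -> list:
    """Return (path, line_number, rule_name) for every rule match in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((path, lineno, rule))
    return findings
```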
Secure Wrapper Libraries
DocuSign - Rolling out the secure libraries takes time; you don’t go from 0% to 100% in a week. They started by partnering with dev teams they had close relationships with, who wanted to adopt the libraries. That allowed the security team to iron out the bugs so that when they presented the libraries to teams who were more hesitant to adopt them, it was a smooth dev UX.
Dropbox - They went in and made the changes themselves. It felt right, because it didn’t seem fair to export their pain to the dev teams. This also has the benefit that you might think you’ve built something that’s easy to use and secure, but then you try to use it and it’s terrible. This gives you invaluable insight into how to redesign your approach to make it better and easier to use.
DocuSign chose to write the orchestration framework and glue because it enables you to plug in new tools or remove existing ones depending on how they’re performing. But if you rely on a vendor to provide the orchestration and glue, then you’re a bit dependent on them and it limits your flexibility.
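The value of owning the glue is easier to see in code. This is a minimal sketch, assuming each tool can be wrapped as a function that takes a target and returns a list of finding dicts; the `Orchestrator` class and its method names are hypothetical.

```python
class Orchestrator:
    """Runs whichever scanners are currently registered against a target."""

    def __init__(self):
        self._scanners = {}

    def register(self, name: str, scanner) -> None:
        """Plug in a tool: `scanner` takes a target and returns finding dicts."""
        self._scanners[name] = scanner

    def unregister(self, name: str) -> None:
        """Remove a tool that is no longer earning its keep."""
        self._scanners.pop(name, None)

    def run(self, target: str) -> list:
        """Run every registered scanner and tag findings with the tool name."""
        findings = []
        for name, scanner in self._scanners.items():
            for finding in scanner(target):
                findings.append({"tool": name, **finding})
        return findings
```

Swapping a vendor tool in or out then becomes a one-line change, and everything downstream (ticketing, metrics, PR comments) keeps consuming the same finding format.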
Grep, even though it’s wrong more often than fancier static analysis, is less annoying to devs because it’s clear why it’s wrong.
DocuSign - They’ve had significant trouble getting value from DAST tools, for example, on single page apps. DAST tools seem to poorly support modern web development technologies. They did end up getting some value from partnering with the QA team, who had a great set of end-to-end functional tests, and added some security checks into those tests.
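The panel didn't say which checks were added to the QA tests, so the following is purely an illustrative guess at what piggybacking on functional tests can look like: a helper that an existing end-to-end test calls on each response it already fetches. The header list is an assumption.

```python
# Hypothetical baseline; pick the headers that matter for your app.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
)


def missing_security_headers(response_headers: dict) -> list:
    """Return required headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def assert_secure_response(response_headers: dict) -> None:
    """Call from an existing end-to-end test after it fetches a page."""
    missing = missing_security_headers(response_headers)
    assert not missing, f"missing security headers: {missing}"
```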
The Datadog ProdSecOps team tried scraping dev Slack channels for security-relevant keywords, like “hash,” “crypto,” or “MD5.” They thought they could pop in and be helpful, however, it was perceived as a bit creepy.
Snap tried to scale their security reviews by having devs fill out a customized security questionnaire (built on Google’s vendor questionnaire, VSAQ) that could provide semi-tailored advice, rather than requiring an AppSec team member to manually do every review. However, it turns out devs wanted the person-to-person communication, so they’d just fill out the questionnaire in the quickest way possible.
Netflix built an internal security questionnaire service, Zoltar (see AppSec Cali 2019 video, slides #25), that has been pretty useful. They added features to auto-create Jira tickets based on what devs then had to do, but only 2 or 3 people ever used it.
Netflix spent 6-9 months getting an Internet-facing DAST scanner up and running, and it ended up finding basically nothing, ever.
Dropbox also tried to shift from doing threat modeling in a Dropbox Paper doc (like a Google Doc) to a custom threat modeling questionnaire. Dev teams hated it. The reason is that a doc can be filled out with your team: if you don’t know an answer, you can write comments, ask questions, and collaboratively fill it out with the product manager, designer, your eng manager, tech lead, etc. You can’t do that with a survey.
How to store secrets for your security tools and infra?
A: Use your existing secrets management solution that you use in prod for other purposes.
Astha - Even before getting into security automation, it’s probably more important to build solid foundational security services, such as a secrets management solution, that you can point developers to that they can use to solve the problems you’re finding.
John - Further, these solutions need to be well-documented and easy to use.
For a new or small AppSec team, where do you start?
Astha - First, identify, for your business model, which teams are building the highest risk assets and build relationships with them. Work to understand the problems they’re facing and then use that to prioritize how you want to spend your time, because they can tell you a lot more about the security problems that exist in your ecosystem.
Doug - You have to brutally prioritize. Work on the things that are most likely to bite you the worst, while keeping a list of the other things that you can gradually get to as you have time and as the security team grows.
John - Seconds building relationships, which can be a great way to build out an AppSec team by hiring internally- start with a security champions program, and then persuade people in ops, QA, and engineering who are interested in security to join your team.
Review historical bug data to identify trends, identify systemic vulnerabilities, and try to solve those entire bug classes, for example using wrapper libraries. Don’t play whack-a-mole.
Divya - If you’re the first AppSec hire, you need to really get to know the company’s infra and what you’re dealing with. Ask people a bunch of questions and you’ll find there are some clear gaps that you can immediately start tackling.
Regarding specific teams to partner with, if you’re at a B2C company, you’ll probably interact with the growth and identity teams a lot because their goals can be the opposite of yours, so stay in touch with them so you know the latest things they’re building and advise throughout the process.
Dev - One thing he’s seen others trip up on is analysis paralysis, where you spend too much time deciding what to do because you want to make sure you’re working on the optimal thing.
Instead, like database query optimization engines - give yourself some fixed amount of time to make a decision on what to tackle and then start doing it. Then, after a bit, re-evaluate, and start working on something else if it’s clearly better ROI.
If you had to choose between investing in building security automation (e.g. scanning every commit) vs secure wrapper libraries, which would you focus on and why?
Dev - Dropbox focuses on wrapper libraries because most of their prod code uses a small number of supported frameworks and tech stacks, and they were a monolith for a long time. (Note: Dev’s colleague, Hongyi Hu, gave an excellent talk, also at AppSec Cali 2018, about securing polyglot internal apps.) If you have to support many frameworks and languages this approach may not make sense.
John - You may initially be able to get things done via personal relationships, but as the size of the company and engineering org grows, you’re going to get to the point where it doesn’t scale and you instead need to rely on processes. Also, individuals come and go from a company, while wrapper libraries require an (admittedly large) one-time investment.
Astha finds that the relationship component is crucial in her experience for getting dev teams to be aware of and adopt the secure defaults you’ve built.
Dev - You need to spend the engineering time building the core security building blocks for devs before processes become useful. A process that asks devs, “What are you going to do about secrets management?” isn’t useful if you don’t yet have a good solution for them, so he’d invest in building the solution first.
Divya - It depends on the problems you’re trying to solve, the best way to solve them, the resources you have, and the current maturity of what you’re trying to do.
How do you handle retention?
John - Align their personal interests with the interests of the team and company. For example, encourage them to submit to conferences, but on research that is useful to the company.
Divya - Put people in the right roles, give them the autonomy to think big and to tackle big security problems. If people think they’re working on cool stuff, they won’t leave.
Doug and Astha - Tech moves fast, and healthy attrition is a thing; sometimes people want to move on.
Astha - The most important thing is to build a culture that people want to work in. People leave when they feel like they’re not growing or if they don’t identify with the company’s culture.
Dev - There’s nothing unique about security. Talk with your HR partners and other experienced people about building a good culture.
John - Maybe you want some AppSec engineers to leave the team. He had a member of his team leave to join an engineering team. It’s valuable to have AppSec alumni throughout the company: they know how you work, they’ll build things well, and they’ll come to you when they need to.
How do you automate inventory and discovery?
Astha - The Netflix AppSec team has been realizing more and more that this is a foundational aspect of running a successful security program. They’ve been able to rely on existing cloud infra tooling that’s already used for releasing software. They’re currently working on applying a security lens to it: how do we gather all of the data from different sources, correlate it, and then build intelligence on top like: what is the risk of a given app?
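As a toy illustration of "gather, correlate, then build intelligence on top" (not Netflix's system), the sketch below merges per-app attributes from several sources and computes a naive risk score. The attribute names and weights are invented for the example.

```python
from collections import defaultdict


def correlate(*sources: dict) -> dict:
    """Merge per-app attributes from several data sources into one inventory.

    Each source maps an app name to a dict of attributes about that app.
    """
    inventory = defaultdict(dict)
    for source in sources:
        for app, attrs in source.items():
            inventory[app].update(attrs)
    return dict(inventory)


def risk_score(attrs: dict) -> int:
    """Naive additive score; the factors and weights are made up."""
    score = 0
    if attrs.get("internet_facing"):
        score += 3
    if attrs.get("handles_pii"):
        score += 3
    if not attrs.get("owner"):
        score += 2  # Nobody to call when something goes wrong.
    return score
```

Even a crude score like this is only possible once the inventory exists, which is why discovery tends to come first.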
One Sentence Takeaways
Doug - First, invest in gaining visibility, then start automating once you know exactly the situation you’re in and the data sets you’re dealing with.
Your automation, your solution, and when and where you implement that is going to be very different based on your org.
John - We don’t discuss it much on this panel, but logging and monitoring are also essential: yes, you try to find all bugs early, but ultimately some will make it through. At DocuSign, any time an engineer makes a fix, they encourage them to add some telemetry that will help the security team. An interesting application is then teaching the IR team to understand the importance of this telemetry and what it means in the context of your app.
Productivity engineering teams build tooling that has a big downstream impact, so try to get your security tooling in by default in those tools or try to leverage some of the visibility tooling they’re building. Devs want to be more productive, so they are really committed to that tooling.
Dev - Do things that seem like a good idea and then move onto other things. Keep going.
Stay in Touch!
If you have any feedback, questions, or comments about this post, please reach out! We’d love to chat.