[tl;dr sec] #323 - Anthropic Mythos, Security Program Politics, Vulnerability Research is Cooked
New model finds thousands of 0-days and writes exploits, lessons and how to be influential from decades of being a CISO, why LLMs will democratize elite vuln hunting
Hey there,
I hope you’ve been doing well!
🏫 High School Reflections
As you might guess from the fact that I write a cybersecurity newsletter, I was pretty cool in high school.
This week I randomly came across this YouTube video Learn to Solve an Integral (What Makes You Beautiful Parody), and it took me back.
I remember algebra and calculus being really fun, I would have loved to make a song like this.
I remember being in marching band, playing during football games. The drum line had this one riff where whenever they started playing it the band and the entire student body would start moshing. At one point they got banned from playing it because it made people too rowdy 😂
(“One time, at…”) Band camp before the school year started was a blast, we always had these epic Halo tournaments. 8 vs 8 CTF, four TVs, two in each room. Mountain Dew and smack talking galore.
I took two programming classes, Visual Basic and Java, but I actually didn’t like them, and I didn’t really “get it.” It might not have helped that neither teacher really knew how to program (one was a football coach 🤷). Kind of ironic given what I do now.
I wish I journaled more, it’d be fun to look back on.
I wonder how your childhood and high school formed who you are today 🤔
P.S. My friend and former colleague Peter Greko is open to new opportunities. He worked on the AI red team at Microsoft, and most recently Block.
Sponsor
Register for a brand new research-focused webinar series from Push Security
Join Push Security threat researchers along with incredible guests like John Hammond, Troy Hunt, and Matt Johansen in a brand new webinar series deep-diving into the State of Browser Attacks.
The browser is the place where modern breaches happen, powered by a huge amount of attacker innovation — countless ClickFix variants, new malvertised phishing campaigns intercepting users on search engines, and device code phishing attacks being powered by brand new PhaaS kits and AI tools. And we’re only in April.
Get ahead of this threat evolution and register your spot now!
👉 Register here 👈
Push Security consistently has solid security research content, this series is going to be good. And John Hammond and Matt Johansen are both great and good friends 👍️
AppSec
Remediation at Scale: What High-Performing AppSec Teams Do Differently
My colleagues at Semgrep did some interesting data crunching across 50k+ repos across 400+ organizations, analyzing stats on fix rate and mean time to remediation across SAST, SCA, and severity level, if certain vulnerability classes are harder to remediate, if catching vulnerabilities earlier improves remediation rates, and more.
salesforce/url-content-auditor
A security auditing tool designed to detect sensitive data exposure in publicly accessible web content. It systematically scans, extracts, and audits images, PDFs, and video files using AI-powered analysis (Google Gemini API) to identify potential data leaks, compliance violations, and privacy risks.
Meet Vespasian. It Sees What Static Analysis Can’t.
Praetorian's Blayne Dreier, Nathan Sportsman et al release Vespasian, an open-source tool that generates API specifications (OpenAPI 3.0, GraphQL SDL, WSDL) by observing real HTTP traffic from headless browser crawls (powered by Katana) or by importing existing captures from Burp Suite, HAR files, or mitmproxy. The tool uses a two-stage pipeline: it first captures traffic with full JavaScript execution (or imports existing traffic) to catch dynamically-constructed API calls that static analysis misses, then classifies requests using confidence-based heuristics, deduplicates endpoints via path normalization, and probes for metadata through OPTIONS requests, GraphQL introspection, and WSDL fetching.
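The dedup step is the fun part: two captured requests to /users/123 and /users/456 should collapse into one /users/{id} endpoint template. A minimal sketch of that idea (not Vespasian's actual implementation; the regexes and placeholder names are my assumptions):

```python
import re

# Heuristic patterns for path segments that are almost certainly IDs
# rather than resource names. Illustrative only, not Vespasian's rules.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
NUM_RE = re.compile(r"^\d+$")

def normalize_path(path: str) -> str:
    """Collapse variable path segments into placeholders so that
    /users/123 and /users/456 map to the same template."""
    out = []
    for seg in path.strip("/").split("/"):
        if UUID_RE.match(seg):
            out.append("{uuid}")
        elif NUM_RE.match(seg):
            out.append("{id}")
        else:
            out.append(seg)
    return "/" + "/".join(out)

def dedupe_endpoints(requests):
    """requests: iterable of (method, path) pairs from captured traffic.
    Returns the unique (method, normalized_path) endpoint templates."""
    return sorted({(m, normalize_path(p)) for m, p in requests})
```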
Organizational Politics & The Security Program
Phil Venables describes how organizational politics are a necessary skill for security leaders, arguing it's the application of influence to achieve outcomes, not something inherently negative. Phil shares lessons from decades as a CISO and Chief Risk Officer, including: decisions are pre-ordained outside formal meetings through advance consensus-building, you should embed security into existing business processes and budgets rather than creating separate initiatives, and building broad support across the organization (not just relying on your boss) is critical for program longevity.
Some key tactics: leveraging the Risk = Hazard + Outrage equation to prioritize work, using Force Field Analysis to understand what's preventing change, and connecting disparate teams across the organization to build political capital beyond just security outcomes.
“If you’re going into a meeting or committee and you don’t already feel confident on the outcome then you’ve missed the point and will have likely not done the work to line up support for the outcome you want. Remember, committees are the roots of power structures not the structure themselves.”
“Many of the security programs I’ve run also drove improvements in reliability, development agility, product features, and more. These came from observations of issues in the security processes that we could have ignored as being outside of our lane, but we decided to press and got support in doing so.”
“Remember, published organization charts almost never actually represent the true organization structure in terms of influence. That has to be discovered by you.”
💡 So many good insights, as you’d expect from a Phil Venables post 🤯
Sponsor
📣 Axios. Trivy. LiteLLM. More are coming. Root stops compromised dependencies.
Recent software supply chain attacks just rewrote the playbook. Here's the kicker: they won't have a CVE and they won't be triggered by a scanner. These attacks exploit the one thing every pipeline trusts blindly: upstream dependencies. Compromised maintainers, hijacked registries, silent tag overwrites. By the time you notice, you've already built and shipped it. The only fix is controlling what enters your environment. Root pins every dependency to verified, known-good versions and backports security patches without forced upgrades so you stay secure without breaking your build.
Pinning dependencies to known-good versions would prevent so many recent supply chain attacks, and I could see backported security patches saving weeks to months of dev time, depending on the company. Neat 👍️
Cloud Security
gabrielPav/aws-preflight
By Gabriel Pavel: A security linter for AWS CLI commands that catches misconfigurations before execution, featuring 703 checks across 91 AWS services. The tool analyzes commands for issues like missing IMDSv2 enforcement, unencrypted storage, public accessibility, overly permissive IAM policies, and disabled logging.
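To make the idea concrete, here's what one such check might look like in miniature: flag `aws ec2 run-instances` calls that don't enforce IMDSv2. This is a hypothetical sketch in the spirit of the tool, not aws-preflight's actual rule engine:

```python
import shlex

def check_imdsv2(command: str):
    """Flag `aws ec2 run-instances` calls that don't enforce IMDSv2
    (i.e., no HttpTokens=required in --metadata-options).
    Returns a list of finding strings; empty means clean."""
    tokens = shlex.split(command)
    if tokens[:3] != ["aws", "ec2", "run-instances"]:
        return []  # this particular check only applies to run-instances
    findings = []
    if "--metadata-options" in tokens:
        opts = tokens[tokens.index("--metadata-options") + 1]
        if "HttpTokens=required" not in opts:
            findings.append("IMDSv2 not enforced: set HttpTokens=required")
    else:
        findings.append("Missing --metadata-options; IMDSv1 remains enabled")
    return findings
```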
notyet: Open-Source Tool to Test AWS IAM Credential Revocation Gaps
OFFENSAI's Eduard Agavriloae has released notyet, an open-source tool that exploits AWS IAM's eventual consistency (the ~4 second propagation window where disabled or deleted credentials remain valid) by continuously monitoring for defender actions (key deletion, policy detachment, role removal) and automatically responding with credential rotation, role assumption, policy persistence, and defensive action stripping. Notyet can help IR teams test whether their containment playbooks work against automated adversaries.
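The core loop is simple to sketch: poll for defender actions and dispatch a counter-move for each one. The action names and the checker/responder interfaces below are illustrative stand-ins, not notyet's actual API:

```python
import time

# Maps an observed defender action to a counter-move, mirroring the
# monitor-and-respond behavior described above. Names are illustrative.
RESPONSES = {
    "key_deleted": "rotate_credentials",
    "policy_detached": "reattach_or_persist_policy",
    "role_removed": "assume_alternate_role",
}

def watch_and_respond(detect_action, respond, interval=1.0, max_checks=5):
    """Poll for defender actions; on detection, dispatch the mapped
    response. detect_action() returns an action name or None;
    respond(counter_move) executes it. Returns the actions handled."""
    handled = []
    for _ in range(max_checks):
        action = detect_action()
        if action in RESPONSES:
            respond(RESPONSES[action])
            handled.append(action)
        time.sleep(interval)
    return handled
```

In the real tool the detection side would be watching IAM/CloudTrail state; the point of running something like this against your own playbooks is to see whether containment lands inside that ~4 second window.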
Enforcing AI Governance Across AWS Organizations
Sonrai Security describes how to enforce AI governance across AWS Organizations using Service Control Policies (SCPs) and Bedrock Policies to centrally manage AI service access. The post provides examples for: preventing access to the AWS control plane through AWS’s managed MCP servers, org-wide Bedrock policies for blocking prompt injection attacks, disabling specific AI services like Bedrock AgentCore, controlling model family availability, and preventing long-term Bedrock API key creation/use.
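For flavor, here's the general shape such an SCP might take for one of those controls (restricting Bedrock model invocation to a single model family). The ARN pattern and action list are my assumptions; check the post and AWS docs for the real policies:

```python
import json

# A sketch of an SCP denying Bedrock model invocation outside an
# approved model family. Illustrative only -- verify actions and ARN
# formats against AWS documentation before using anything like this.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyNonApprovedModelFamilies",
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "NotResource": ["arn:aws:bedrock:*::foundation-model/anthropic.*"],
        }
    ],
}
print(json.dumps(scp, indent=2))
```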
Supply Chain
elastic/supply-chain-monitor
By Elastic: Automated monitoring of the top PyPI and npm packages for supply chain compromise. Polls both registries for new releases, diffs each release against its predecessor, and uses an LLM (via Cursor Agent CLI) to classify diffs as benign or malicious.
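The "which releases do we diff" step of a pipeline like this is easy to sketch. A simplified version (real registry metadata is richer than an ordered version list, and Elastic's pipeline then hands each diff to an LLM for classification):

```python
def new_releases(prev_versions, curr_versions):
    """Given version lists from two successive polls of a registry,
    return (new_version, predecessor) pairs to diff. Assumes versions
    are in release order, oldest first -- a simplification of what a
    real registry poll returns."""
    known = set(prev_versions)
    pairs = []
    for i, version in enumerate(curr_versions):
        if version not in known:
            predecessor = curr_versions[i - 1] if i > 0 else None
            pairs.append((version, predecessor))
    return pairs
```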
lirantal/npm-security-best-practices
By Liran Tal: A curated, practical list of security best practices for using npm packages: safe-by-default npm package manager command-line options, hardening against supply chain attacks, deterministic and secure dependency resolution, etc.
DriftlessAF: Introducing Chainguard Factory 2.0
Chainguard’s Matt Moore, Manfred Moser, and Maxime Greau announce DriftlessAF (GitHub), an agentic reconciliation framework that replaced their event-driven Factory 1.0 architecture. DriftlessAF uses AI-powered reconciler bots in a Kubernetes-style reconciliation loop to continuously compare desired state (zero CVEs, latest packages) against actual state across 2,000+ containers and hundreds of thousands of package versions, by reasoning about unstructured data and creating self-healing workflows that can safely discard failed work items.
The framework includes Terraform modules for event-driven reconciliation infrastructure, a multi-regional work queue, and Go packages for GitHub repository, OCI container, and APK package reconciliation. Engineers now review AI-generated pull requests and package updates instead of creating them manually, while the system autonomously manages hundreds of thousands of package versions and CVE patch backports.
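The reconciliation pattern itself is worth internalizing. A single pass looks roughly like this (field names and work-item shapes are a sketch, not DriftlessAF's actual schema):

```python
from dataclasses import dataclass

@dataclass
class PackageState:
    name: str
    version: str
    cve_count: int

def reconcile(desired: dict, actual: dict):
    """One pass of a Kubernetes-style reconciliation loop: compare
    desired state (latest version, zero CVEs) against actual state
    and emit work items for anything that drifted."""
    work = []
    for name, want in desired.items():
        have = actual.get(name)
        if have is None:
            work.append(("build", name, want.version))
        elif have.version != want.version:
            work.append(("upgrade", name, want.version))
        elif have.cve_count > 0:
            work.append(("patch", name, want.version))
    return work
```

In DriftlessAF the interesting twist is that the reconcilers are AI-powered, so the "compare and decide" step can reason over unstructured inputs (build logs, advisories) rather than just structured fields like these.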
💡 This seems like some impressive engineering, and cool that they open sourced it! I think we’ll see more of this going forward, with agents autonomously building and mending software. See OpenAI’s Harness Engineering blog or Simon Willison’s Agentic Engineering Patterns for more.
AI + Security
Quicklinks
step-security/dev-machine-guard - Scan your dev machine for AI agents, MCP servers, IDE extensions, and suspicious packages.
Is Your Identity Security Keeping Up with AI? | Delinea 2026 Report - 87% of organizations say they’re ready for AI—but nearly 50% admit they can’t fully track AI and non-human identities accessing critical systems. That gap creates unmanaged access and standing privileges. Read Delinea’s 2026 Identity Security Report to learn more.*
knostic/AgentSonar - Detect shadow AI agents by monitoring network traffic and classifying process-to-domain pairs.
*Sponsored
Vulnpocalypse: AI, Open Source, and the Race to Remediate
Nice post by my bud Chris Hughes synthesizing a number of stats, posts, related work, and his interviews on AI finding vulnerabilities, time to exploitation, patching challenges, etc.
On LLMs and Vulnerability Research
Devansh Batham offers a number of arguments against the claim that LLMs can't understand code or find meaningful vulnerabilities, and I think makes an interesting case that LLMs + coding harnesses will likely be able to find “novel” or “creative” new vulnerability classes, as these classes are really just combinations of known primitives.
For example, HTTP request smuggling is really: ambiguous protocol specification + inconsistent parsing between components + a security-critical assumption about message boundaries. And prototype pollution RCEs in JavaScript frameworks: injection + type confusion + privilege boundary crossing. This novel composition of primitives is what LLMs are increasingly good at. “Most of what we call novel vulnerability research is creative recombination within a known search space.”
Vulnerability Research Is Cooked
Excellent post by Thomas Ptacek on the history of vulnerability research, and how LLMs are uniquely suited for finding and exploiting vulnerabilities because they already encode vast correlations across source code, understand all documented bug classes (stale pointers, type confusion, allocator grooming), and excel at the pattern-matching and constraint-solving required to chain subtle framework details into exploits.
Thomas argues that this capability will democratize elite exploit development beyond high-value targets like Chrome to everything from databases to printers, overwhelming open source maintainers with verified high severity reports, making closed-source protection irrelevant (agents can reason directly from assembly), and potentially triggering bad AI security regulations that fail to recognize asymmetric costs on defenders.
“We’ve been shielded from exploits not only by soundly engineered countermeasures but also by a scarcity of elite attention.”
“Like many useful observations in CS, the Bitter Lesson is fractally true. It’s about to hit software security like a brick to the face.”
See also their podcast with Nicolas Carlini.
Project Glasswing: Securing critical software for the AI era
Anthropic announces Project Glasswing, a collaboration with AWS, Apple, Google, Microsoft, NVIDIA, and others to use Claude Mythos Preview—an unreleased frontier model that has already discovered thousands of high-severity vulnerabilities across major operating systems and web browsers—for defensive security purposes.
Anthropic is providing access to 40+ organizations building critical infrastructure along with $100M in usage credits and $4M in direct donations to open-source security organizations. Launch partners include CrowdStrike, Palo Alto Networks, and Cisco. They’ve also donated $2.5M to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5M to the Apache Software Foundation.
💡It’s great that Anthropic is gathering a group of partner companies to collaborate with on this, and I appreciate the sizable investment in helping secure the software ecosystem more broadly ($100M in usage credits is no joke).
Prediction: models from multiple labs are going to keep getting better, and specifically better at cybersecurity, but those will mostly not be available except to trusted parties due to the risk of abuse. I’m glad that folks take this risk seriously.
Assessing Claude Mythos Preview’s cybersecurity capabilities
Anthropic’s Nicholas Carlini et al discuss Mythos Preview’s capabilities in finding and exploiting zero-days in open source software, and its ability to reverse engineer exploits on closed-source software, turning N-day (known but not yet widely patched) vulnerabilities into exploits. “Over 99% of the vulnerabilities we’ve found have not yet been patched.”
“In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD’s NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.”
“We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.”
Methodology:
They launched a container (isolated from the Internet and other systems) that runs the project-under-test and its source code.
They invoke Claude Code with Mythos Preview and prompt it to find bugs, and produce a proof-of-concept exploit and reproduction steps if found.
To encourage Claude to focus on different parts of the code base, they first have Claude rank each file in the project from 1-5 on how likely it is to have bugs.
They then invoke many copies of Claude in parallel, tasking each run to focus on one of the most interesting files.
Finally they ask Mythos to triage findings from prior steps.
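The methodology above can be sketched as a rank, fan-out, triage pipeline. The rank/audit/triage callables here are stand-ins for prompted model invocations; this is my sketch of the described scaffold, not Anthropic's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_scaffold(files, rank, audit, triage, top_n=3, parallelism=8):
    """Rank every file for bug-likelihood (1-5), fan out parallel audit
    runs over the most promising files, then triage all findings."""
    # Step 1: rank each file 1-5 on how likely it is to contain bugs
    scores = {f: rank(f) for f in files}
    targets = sorted(files, key=lambda f: scores[f], reverse=True)[:top_n]
    # Step 2: one audit run per interesting file, in parallel
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        reports = list(pool.map(audit, targets))
    # Step 3: triage the non-empty findings together
    return triage([r for r in reports if r])
```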
After a thousand runs of their scaffold (~$20,000 total), they found the most critical bug in OpenBSD, plus several dozen additional findings.
“In 89% of the 198 manually reviewed vulnerability reports, our expert contractors agreed with Claude’s severity assessment exactly, and 98% of the assessments were within one severity level. If these results hold consistently for our remaining findings, we would have over a thousand more critical severity vulnerabilities and thousands more high severity vulnerabilities.”
“For multiple different web browsers, Mythos Preview fully autonomously discovered the necessary read and write primitives, and then chained them together to form a JIT heap spray.”
“We’ve used these capabilities to find vulnerabilities and exploits in closed-source browsers and operating systems. We have been able to use it to find, for example, remote DoS attacks that could remotely take down servers, firmware vulnerabilities that let us root smartphones, and local privilege escalation exploit chains on desktop operating systems.”
Regarding N-days: “We began by providing Mythos Preview a list of 100 CVEs and known memory corruption vulnerabilities that were filed in 2024 and 2025 against the Linux kernel. We asked the model to filter these down to a list of potentially exploitable vulnerabilities, of which it selected 40. Then, for each of these, we asked Mythos Preview to write a privilege escalation exploit that made use of the vulnerability (along with others if chaining vulnerabilities would be necessary). More than half of these attempts succeeded.”
See also:
AI Explained’s video: Claude Mythos: Highlights from 244-page Release.
Heidy Khlaaf also weighs in, specifically around how Project Glasswing and Mythos are not compared against existing tools, do not discuss false positive rates, and the amount of human evaluation (e.g. in triage) is not detailed.
Sean Heelan takes here and here, Tavis Ormandy here, and Alex Matrosov on the market underrating the gap between finding bugs and actually proving exploitability.
💡 This was a great, detailed write-up. Publishing hashes as proof of unfixed vulnerabilities and exploits makes sense, I like it. It’d be nice to know a bit more about model costs, false positive rates, and the level (and cost) of human involvement, but overall a pretty good amount of detail.
Misc
Disgruntled researcher leaks “BlueHammer” Windows zero-day exploit - A researcher released the PoC for an unpatched Windows privilege escalation bug because they were unhappy with MSRC’s disclosure process.
Nick Collins - Introduction to Computer Music - Free ~350 page book.
Contrapunk - Real-time MIDI harmony generator and guitar-to-MIDI converter.
Advanced Search for YouTube - Filters like exact terms, exclude terms, title includes, video length, date before/after.
Alex Hormozi - How to make progress faster than everyone - On being cringe, trying hard, and not comparing your first chapter to someone’s 20th.
AI
voldemortensen/snark-driven-development - A Claude Code skill that wraps development workflows with sharp, substance-backed snarky commentary
GitHub issue: Claude Code is unusable for complex engineering tasks with the Feb updates
Lenny’s Podcast - How Anthropic added $11B in ARR in one month | Amol Avasare (Head of Growth, Anthropic)
Humanoid Robot Accidentally Slaps Boy During Public Demo in China - After reportedly saying, “You best watch yo’ mouth son!”
What is Rizzbot? Meet the AI Robot Rizzing Up Your Girl - Not gonna lie, some of these videos are pretty hilarious
✉️ Wrapping Up
Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.
If you find this newsletter useful and know other people who would too, I'd really appreciate if you'd forward it to them 🙏
Thanks for reading!
Cheers,
Clint
P.S. Feel free to connect with me on LinkedIn 👋