
[tl;dr sec] #305 - AI SAST, Awesome Annual Security Reports, Block Risky Dependencies

Open source AI SAST tools + vendor comparison, huge list of vendor security reports, GitHub Action to block risky dependencies

Hey there,

I hope you’ve been doing well!

🏡 Be a Host

This week I had the privilege of giving a talk to Airbnb’s security champions.

The talk was on secure defaults, applying AI to security, and where software development is headed in an AI-heavy world.

One of the especially cool parts of the experience was that one of the organizers of the event, Aika Sengirbay, is a long-time friend. (Also H/T Mudita Khurana)

I remember meeting Aika about a decade ago when she was just breaking into security. Now she’s “Tech Lead Information Security” 🙌 

And Carlo and Michael Reisinger were there, former NCC Group friends who are now at Airbnb.

I really enjoy seeing people grow, get married and have kids, work all these places, and accomplish all of these amazing things 🤩

One nice thing about security being a small field I suppose.

I wonder how it’s going to feel in another 5 or 10 years.

Michael also makes music videos and movies. Pretty rad!

P.S. Semgrep just launched the private beta of our AI-powered detection, which blends Semgrep's deterministic engine with LLM reasoning. It's already found authorization bugs that companies had previously paid out thousands of dollars for via their bug bounty programs.

Sponsor

📣 Master the OWASP Top 10 for LLM Security

AI applications introduce new risks—especially when they handle sensitive data or operate autonomously. Our interactive experience, based on the OWASP Top 10 for LLMs, walks you through real-world threats and actionable steps across data, identity, and AI security. Whether you’re securing prompts, agents, or model access, this guide helps you strengthen your AI posture from the ground up.

This is a cool visualization. I like how the guide makes it really clear where each risk can occur, and then you can click in for more info. Nice 👍️ 

AppSec

BSides Seattle 2026 CFP
Closes November 21, 2025.

What's changed in the OWASP Top 10 for 2025
It’s not even Christmas and we get a new OWASP Top 10. Broken Access Control remains at #1. New categories: Software Supply Chain Failures and Mishandling of Exceptional Conditions.

Their methodology combines data from 2.8 million applications across 589 CWEs, with community survey input to balance historical testing data with emerging threats. Generally they aim for categories that focus on root causes rather than symptoms. SSRF has been rolled into Broken Access Control (not Injection? 🤔).

jacobdjwilson/awesome-annual-security-reports
HUGE collection of annual cyber security reports by Jacob Wilson from various vendors, categorized by Analysis or Survey, across: Industry Trends, Threat Intelligence, Application Security, Cloud Security, Vulnerabilities, Ransomware, Data Breaches, Physical Security, and AI and Emerging Technologies.

Introducing HTTP Anomaly Rank
PortSwigger’s James Kettle introduces HTTP Anomaly Rank, an algorithm now integrated into Turbo Intruder and Burp Suite's API that automatically identifies the most interesting HTTP responses without manual sorting. This can be useful for finding subtle vulnerabilities (the ol’ “Hm, this looks weird”). The algorithm scores responses based on how different they are from others: it calculates weights for various response attributes (status code, word count, etc.) based on their stability, efficiently highlighting anomalies even in noisy traffic.
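To make that concrete, here's a toy Python sketch of the stability-weighting idea (my own approximation, not PortSwigger's actual implementation): attributes that are nearly constant across responses get high weights, so a response that deviates on a stable attribute scores as highly anomalous.

```python
from collections import Counter

def anomaly_rank(responses):
    """Score each response by how much it deviates from the crowd.

    Each response is a dict of attributes, e.g.
    {"status": 200, "word_count": 312}. Toy sketch only.
    """
    attrs = responses[0].keys()
    weights, common = {}, {}
    for attr in attrs:
        counts = Counter(r[attr] for r in responses)
        value, freq = counts.most_common(1)[0]
        common[attr] = value
        # Stability weight: 1.0 means the attribute never varies, so any
        # deviation on it is a strong anomaly signal.
        weights[attr] = freq / len(responses)
    return [
        sum(weights[a] for a in attrs if r[a] != common[a])
        for r in responses
    ]

responses = [
    {"status": 200, "word_count": 312},
    {"status": 200, "word_count": 312},
    {"status": 200, "word_count": 305},
    {"status": 500, "word_count": 17},  # should rank as most anomalous
]
scores = anomaly_rank(responses)
print(max(zip(scores, responses), key=lambda pair: pair[0]))
```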

Sponsor

📣 5 Critical Microsoft 365 Security Settings You Might Be Missing

Set it and forget it? Not when it comes to M365 security. Configuration drift, admin sprawl, and risky integrations creep in over time, opening up security gaps that attackers love to exploit. This checklist from Nudge Security will help you catch common pitfalls and keep your environment secure.

M365 is complex 😥 Great to have a checklist for common gotchas and pitfalls 👆️ 

Cloud Security

2025 SANS CloudSecNext Summit
20 talk recordings now live.

Deploying to Amazon's cloud is a pain in the AWS younger devs won't tolerate
Corey Quinn describes the painful process and complexity of setting up a simple web app in AWS with some A+ snark. 100% agree. “And then you push your code and realize that, on balance, baby seals get more hits than your website.” 😂 

quinnypig/yeet
Be careful when you make jokes on the Internet, because Corey Quinn may just make them real. Yeet analyzes your project using Claude, figures out what’s in it, and then tells you the commands to run to deploy it. Also comes with yoten, which finds where you yeeted your slop, hits the URL, determines if it’s fire, mid, or cooked, and roasts you accordingly.

Weaponizing the AWS CLI for Persistence
Hector Ruiz Ruiz shows how AWS CLI aliases can be weaponized for stealthy persistence by creating a one-liner that executes malicious code while preserving the original command functionality. The technique works by creating an alias for a common command (e.g. aws sts) that dynamically modifies the alias file at runtime to call the original command so everything seems normal, then runs arbitrary malicious commands (exfiltrating credentials, reverse shell, etc.), and finishes by restoring the malicious alias.
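If you want to hunt for this on your own machines, one cheap check is flagging aliases that shell out, since AWS CLI aliases whose body starts with ! run external commands. A minimal detection sketch, assuming the default alias file location:

```python
import configparser
from pathlib import Path

# AWS CLI aliases live in an INI-style file; alias bodies beginning with
# "!" execute arbitrary shell commands, which is what this technique abuses.
alias_file = Path.home() / ".aws" / "cli" / "alias"

if alias_file.exists():
    config = configparser.ConfigParser()
    config.read(alias_file)
    for section in config.sections():
        for name, body in config.items(section):
            if body.strip().startswith("!"):
                print(f"[!] shell-command alias '{name}': {body.strip()[:100]}")
```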

Supply Chain

Heisenberg: How We Learned to Stop Worrying and Love the SBOM
AppOmni’s Yevhen Grinman and Max Feldman announce Heisenberg (GitHub Action), an open-source tool that automatically scans pull requests to detect risky or newly published dependencies before they merge.

It scans new or changed dependencies (from your lock/manifest files), pulls health and risk signals (deps.dev, Snyk Advisor, Socket + heuristics), flags fresh publishes (<24h) and deprecated/inactive packages, detects postinstall scripts, comments a report on the PR, optionally labels it for security review, and can fail the job on policy hits.

Heisenberg currently supports the PyPI, npm/yarn, and Go ecosystems, and can also be used to quickly determine if/where you’re affected in the case of Yet Another NPM Attack (YANPMA): you generate an SBOM for every repo beforehand, which Heisenberg can then use to search for the known compromised packages.
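The fresh-publish signal is cheap enough to replicate on its own. Here's a rough sketch of that heuristic for PyPI using the public PyPI JSON API (my own approximation, not Heisenberg's actual code):

```python
from datetime import datetime, timedelta, timezone
import requests

def is_fresh_publish(package: str, version: str, hours: int = 24) -> bool:
    """Flag a PyPI release whose files were uploaded < `hours` ago."""
    resp = requests.get(
        f"https://pypi.org/pypi/{package}/{version}/json", timeout=10
    )
    resp.raise_for_status()
    uploads = [
        # e.g. "2024-05-29T15:21:45.000000Z"
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in resp.json()["urls"]
    ]
    if not uploads:
        return False  # release has no uploaded files
    return datetime.now(timezone.utc) - min(uploads) < timedelta(hours=hours)

print(is_fresh_publish("requests", "2.32.3"))  # False: long since published
```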

Split-Second Side Doors: How Bot-Delegated TOCTOU Breaks The CI/CD Threat Model
BoostSecurity’s François Proulx describes a new vulnerability class they call “Bot-Delegated Time-Of-Check to Time-Of-Use” (TOCTOU), where automation bots (often GitHub Apps) with elevated permissions can be tricked into promoting untrusted code from forks into trusted repositories.

The attack exploits a race condition between maintainer approval and code execution: in one case, a maintainer could comment “/ok to test” → malicious code pushed to the approved PR → bot copies the untrusted code into a new in-repo branch for automated testing (code execution within the victim repo → steal secrets).

Recommendations: all approvals must be pinned to an immutable object (e.g. commit SHA), prefer pull_request_review over issue_comment triggers, remove workflow: write permissions if possible, use Repository Rulesets to block any identity from pushing directly to .github/workflows/, use Environment Secrets with Required Reviewers, and more.
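The SHA-pinning recommendation is straightforward to enforce in bot logic: before acting on an approval, confirm the latest approved review still points at the PR's current head commit. A hypothetical sketch against the GitHub REST API (placeholder repo/token names, not BoostSecurity's tooling):

```python
import requests

def approval_is_current(owner: str, repo: str, number: int, token: str) -> bool:
    """Return True only if an APPROVED review is pinned to the PR's
    current head SHA (i.e. nothing was pushed after approval)."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    base = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
    head_sha = requests.get(base, headers=headers, timeout=10).json()["head"]["sha"]
    reviews = requests.get(f"{base}/reviews", headers=headers, timeout=10).json()
    approvals = [r for r in reviews if r["state"] == "APPROVED"]
    # If code landed after the approval, commit_id won't match head_sha,
    # so the bot should demand a fresh review instead of proceeding.
    return any(r["commit_id"] == head_sha for r in approvals)
```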

How Cloudflare’s client-side security made the npm supply chain attack a non-event
Bashyam Anant et al. describe how Cloudflare Page Shield detects malicious JavaScript (e.g. the debug and chalk compromise, crypto drainers, …) by preprocessing JavaScript into Abstract Syntax Trees (ASTs) that are then classified as malicious or benign by a custom-trained model. Inference happens in under 0.3 seconds, with their current evals showing 98% precision and 90% recall. See also their technical blog post on how the custom model was trained.

💡 Detecting malware at the HTTP site delivery level is neat, but it doesn’t protect devs, who often have the access/API keys to the kingdom, from getting compromised when they build locally. I wonder if Cloudflare will extend this capability to protect the development process, not just production (e.g. local dev environments, CI/CD).
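As a toy illustration of the AST-to-features idea: Cloudflare's model consumes JavaScript ASTs, but the same shape of pipeline is easy to show with Python's built-in ast module as a stand-in.

```python
import ast
from collections import Counter

def ast_node_histogram(source: str) -> Counter:
    """Parse source into an AST and count node types; a trained
    classifier would consume feature vectors like this."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

print(ast_node_histogram("print('hello')"))
print(ast_node_histogram("exec(__import__('base64').b64decode(payload))"))
# Heavy eval/exec/decode-style structure is a classic malicious signal,
# and AST features survive the renaming/whitespace obfuscation that
# plain string matching misses.
```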

Blue Team

karlvbiron/MAD-CAT
By Karl Biron: MAD-CAT (Meow Attack Data Corruption Automation Tool) is a security tool designed to simulate data corruption attacks against multiple database systems (MongoDB, Elasticsearch, Cassandra, Redis, CouchDB, and Hadoop HDFS). It supports both single-target attacks and bulk CSV-based attack campaigns, in both credentialed and non-credentialed scenarios.

EvilBytecode/NoMoreStealers
A Windows kernel-mode minifilter driver that monitors file system access to protect against information-stealing malware, specifically targeting browser user data, cryptocurrency wallets, and communications apps (Discord, Telegram, Signal). Note that this is more of a proof of concept: it doesn’t monitor file writes, uses hardcoded paths, and is vulnerable to file name spoofing.

All you need to know about JA3 & JA4 Fingerprints (and how to collect them)
Gabriel Alves explains the key differences between JA3 and JA4 fingerprints and how they can be used for network-based threat detection, for example identifying C2 communications in encrypted traffic. In short, JA[3-4+] fingerprints act as a unique “signature” of TLS clients, allowing the quick identification of malicious communications, such as those originating from C2 infrastructure or botnets, which frequently use custom or specific TLS libraries, resulting in a unique fingerprint.

JA3 is a hash of a number of factors (opaque); JA4(+) is a collection of smaller fingerprints that make it easier to reason about the TLS version, ciphers, protocol, etc. The post provides practical guidance on collecting these fingerprints using Wireshark and implementing multi-fingerprint detection strategies with example YARA rules for identifying C2 communications.
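For intuition, JA3 is simple enough to compute by hand: it’s just the MD5 of a canonical string built from ClientHello fields. A minimal sketch with made-up values:

```python
import hashlib

def ja3(version, ciphers, extensions, curves, point_formats):
    """JA3 = MD5 of "version,ciphers,extensions,curves,point_formats",
    with each list's values joined by "-". (Real implementations also
    strip GREASE values before hashing.)"""
    ja3_string = ",".join([
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Example ClientHello values (illustrative, not from a real client):
print(ja3(771, [4865, 4866, 49195], [0, 11, 10], [29, 23, 24], [0]))
```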

AI + Security

arm/metis
By Arm's Product Security Team: Metis is an open-source AI-powered security code review tool that helps engineers detect subtle vulnerabilities, improve secure coding practices, and reduce review fatigue. It uses LLMs for deep semantic understanding, leverages RAG for context-aware reviews (ChromaDB or PostgreSQL with pgvector as vector store backends), and supports C, C++, Python, Rust, and TypeScript through a plugin-based system. Metis offers both interactive and non-interactive modes.

Introducing SecureVibes: A Multi-Agent Security System
Three-part series by Anshuman Bhartiya describing how he built SecureVibes, an open-source multi-agent security system designed to find vulnerabilities in vibe-coded applications by using four specialized AI agents (Assessment, Threat Modeling, Code Review, Report Generation, and an optional DAST Agent) that work in sequence to provide context-aware vulnerability detection.

The system outperformed single-agent approaches in testing, finding 78-89% more vulnerabilities than Claude Code and 4-4.25x more than Codex, with Claude Sonnet so far offering the best performance-to-cost ratio.

💡 I also found it very interesting that, in part 3, a custom Factory Droid found 35-44% more vulnerabilities than SecureVibes using the same model: “All the work I did over the past few days building a custom multi-agent system essentially got matched by a feature Factory released in their coding agent.”

💡 My friend Scott Behrens (Principal Security Engineer @ Netflix) and I did a webinar in July (recording) on basically the same idea: having separate focused prompts (“agents” or “modes” in Roo Code) that understand the code → threat model → look for bugs → report generation. You can see our prompts in this GitHub repo.
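The underlying pattern in both SecureVibes and our webinar setup is just sequential, focused prompts, where each stage's output feeds the next. A hypothetical skeleton (stage names mirror the post; llm is whatever client you use):

```python
# Each "agent" is a focused prompt; the pipeline threads one artifact
# through all of them. Hypothetical sketch, not SecureVibes' actual code.
STAGES = [
    ("assessment",   "Summarize this codebase: stack, entry points, data flows.\n{x}"),
    ("threat_model", "Given this assessment, enumerate realistic threats:\n{x}"),
    ("code_review",  "Hunt for these threats in the code, cite file:line.\n{x}"),
    ("report",       "Write a findings report with severity and fixes:\n{x}"),
]

def run_pipeline(codebase_summary: str, llm) -> str:
    artifact = codebase_summary
    for _name, template in STAGES:
        artifact = llm(template.format(x=artifact))
    return artifact

# Usage: run_pipeline(open("repo_digest.txt").read(), my_llm_client)
```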

WTF is ... AI-Native SAST?
Great overview and balanced take by Parsia Hakimian on the strengths (e.g. understanding context) and challenges (e.g. cost, non-determinism, context rot) of AI SAST, plus a blueprint for SAST + AI that is broadly composed of a main input, prompt, RAG, and context being fed to the AI.

Parsia outlines a progression of approaches: from simple "prompt + code," to prompt + agent, to tailored prompt + SAST result, to "agent + code graph + SAST MCP" approaches that use tools like tree-sitter, CodeQL, and Semgrep to guide the AI's analysis.
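Here's roughly what the “tailored prompt + SAST result” rung of that ladder can look like, sketched around Semgrep's JSON output (the triage prompt and snippet handling are my own, not Parsia's code):

```python
import json
import subprocess

# Run Semgrep, then hand each finding to an LLM for triage: a middle rung
# between "prompt + code" and full agent + code graph setups.
result = subprocess.run(
    ["semgrep", "--config", "auto", "--json", "src/"],
    capture_output=True, text=True, check=True,
)
findings = json.loads(result.stdout)["results"]

for f in findings:
    prompt = (
        f"Semgrep flagged {f['check_id']} at {f['path']}:{f['start']['line']}:\n"
        f"{f['extra']['lines']}\n"
        "Given the surrounding code, is this exploitable? "
        "Answer TRUE_POSITIVE or FALSE_POSITIVE with a one-line reason."
    )
    # send `prompt` (ideally plus wider file context) to your LLM of choice
    print(prompt)
```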

Can LLMs Detect IDORs? Understanding the Boundaries of AI Reasoning
Semgrep’s Vasilii Ermilov continues prior research evaluating Claude Code and OpenAI Codex on finding vulnerabilities in real applications, this time focusing specifically on insecure direct object reference (IDOR) bugs. The post gives a nice overview of IDOR and some good examples.

What I found interesting is how he broke down the findings into different “buckets” by complexity: 1) no authorization being performed at all, 2) authz being performed in a single function or file, 3) custom RBAC logic/permission checks across files, 4) implicit authz through middleware, frameworks, etc. Vasilii found the models performed best in the “simpler” cases (1-2), which accounted for most of the true positives, with the more complex cases (3-4) having more false positives.
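To make buckets 1 and 2 concrete, here's a contrived Flask example (the in-memory “database” and auth glue are placeholders): the first handler has no authorization at all, the second does the ownership check inline in the same function.

```python
from dataclasses import dataclass
from flask import Flask, abort, g, jsonify

app = Flask(__name__)

@dataclass
class Invoice:
    id: int
    owner_id: int
    total: int

# Toy in-memory "database"; a real app would use an ORM.
INVOICES = {1: Invoice(1, owner_id=42, total=100)}

# Bucket 1: no authorization at all. Any authenticated user can read
# any invoice just by iterating IDs.
@app.get("/invoices/<int:invoice_id>")
def get_invoice(invoice_id):
    invoice = INVOICES.get(invoice_id) or abort(404)
    return jsonify(vars(invoice))  # IDOR: never checks who's asking

# Bucket 2: the authz check lives inline in the handler, the case where
# the models performed best. (g.current_user set by auth middleware.)
@app.get("/v2/invoices/<int:invoice_id>")
def get_invoice_v2(invoice_id):
    invoice = INVOICES.get(invoice_id) or abort(404)
    if invoice.owner_id != g.current_user.id:
        abort(404)  # don't leak existence of other users' invoices
    return jsonify(vars(invoice))
```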

Hacking with AI SASTs: An overview of 'AI Security Engineers' / 'LLM Security Scanners'
Excellently detailed post by Joshua Rogers evaluating a number of AI-powered security code scanners (ZeroPath, Corgea, Almanax, Amplify Security, and Gecko). The post covers scanning and usage (code retrieval and indexing, code scanning, false positive detection, deduplication, and severity rating), viewing results (reviewing the vulnerable code, taint flows), and results from different tools.

Curl maintainer Daniel Stenberg, who was previously critical of AI-found submissions due to the high rate of slop, was positive about some recent AI-powered bug submissions: “Most of them are just plain variable mixups, return code confusions, small memory leaks in weird situations, state transition mistakes and variable type conversions possibly leading to problems etc.”

💡 I love the thoroughness and level of detail in this post. I think it does a great job of giving an overview of the different common stages of AI SAST tools, and comparing a number of vendors is a valuable service to the security community.

Some nitpicks and additional context I would have appreciated in the post, plus updates from Joshua, who kindly answered some questions via LinkedIn DMs:

  • Joshua said he tested the vendors on a number of vulnerable repos: how many, and which ones?

    • Note: many purposefully vulnerable apps (e.g. JuiceShop, WebGoat, …) have comments essentially pointing out where the vulnerabilities are, and their source code as well as blog write-ups on the vulnerabilities are likely in frontier model training data, so in general LLM findings on these repos are not necessarily representative of model or vendor performance.

    • Update: “All JavaScript and TypeScript, including some deliberately vulnerable browser extensions.”

  • For the repos that aren't vulnerable by design: how many were there, what languages were they in, and how big were they?

    • In other words: are tools better/worse at certain languages? Do they still perform well on large repos or does performance tank after a certain size? (e.g. context rot)

    • Re: languages I was curious because LLMs tend to be better at writing code in certain languages (probably based on training data), so I wonder if that similarly correlates with their ability to security review certain languages better than others.

    • Update: “Probably 50 repositories now of open source products and racked up hundreds of vulnerabilities, in the Linux kernel, ffmpeg, Apache, nginx, and lots of others.”

    • Update: “I didn't see any discrimination of results based on the languages I scanned - Rust, Go, C, C++, PHP, Python, Perl, JS/TS, Java, Scala, even some infrastructure as code as YAML files.”

  • What was the distribution of vulnerability classes found? Were tools better/worse at certain vulnerability types?

    • At the end of the scanning results section, most of the examples shared are more logic bugs/quality bugs, not vulnerabilities.

    • Update: “All types of vulnerability classes, but the highlight was just that the way these bugs were formulated were mainly stemming from logic flaws… finding the inconsistency between the intended functionality vs. the "ground truth" of the code itself. But like, SQL injection, timing attacks, IDORs, buffer overflows, invalid frees double frees, reentrant vulns (both in C and in crypto contracts btw!) authentication bypasses, XSS, DoS...”

  • The post references the non-determinism of results, but like, how much? Slightly different, or mostly different between scans? If a security team wanted to have consistent scanning “coverage,” would this require 2 scans? 5 scans? Impossible to tell?

  • Overall, the discussion of findings is not as detailed as I would have liked: how many findings did each vendor have, how many were true positives vs false positives, and how did that break down by vulnerability class, severity, etc.?

    • Note that clearly a number of the results were true positives, as they’ve been reported to maintainers and since patched, but due to volume many of the findings were triaged by LLMs, so it’s unclear what the “ground truth” TP/FP rate is here.

Again, I think this is a great post; there are just a number of open questions that would be great to have more data on.

GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools
Neat update by the Google Threat Intelligence Group (GTIG) on how threat actors are using AI, including: malware that uses LLMs “just-in-time” to dynamically generate malicious scripts, using “social engineering” pretexts to evade AI safety guardrails, a maturing cyber crime marketplace for AI tooling, and state-sponsored actors leveraging AI.

  • First Use of "Just-in-Time" AI in Malware: For the first time, GTIG has identified malware families that use LLMs during execution to dynamically generate malicious scripts, obfuscate their own code to evade detection, and leverage AI models to create malicious functions on demand, rather than hard-coding them into the malware.

  • "Social Engineering" to Bypass Safeguards: Threat actors are adopting social engineering-like pretexts in their prompts to bypass AI safety guardrails, such as “this is for a CTF” or “I’m a cybersecurity researcher.”

  • Maturing Cyber Crime Marketplace for AI Tooling: Multiple offerings of tools designed to support phishing, malware development, and vulnerability research, lowering the barrier to entry for less sophisticated actors.

  • Continued Augmentation of the Full Attack Lifecycle: State-sponsored actors, including from North Korea, Iran, and China, continue to misuse Gemini across the attack lifecycle, from reconnaissance and phishing lure creation to command and control (C2) development and data exfiltration.


✉️ Wrapping Up

Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.

If you find this newsletter useful and know other people who would too, I'd really appreciate if you'd forward it to them 🙏

Thanks for reading!

Cheers,
Clint

P.S. Feel free to connect with me on LinkedIn 👋