[tl;dr sec] #334 - Thinkst's Package Proxy, OpenAI Daybreak, AI Agents & Canaries
OSS tool to prevent supply chain attacks without client-side firewalls; OpenAI announces new GPT-5.5-Cyber, Codex Security plugin updates, and more; can AI agents compromise an AWS cyber range without tripping canaries?
Hey there,
I hope you’ve been doing well!
🖼️ Meme
Unfortunately work’s been too busy this week for me to lovingly write an artisanal, handcrafted intro combining snippets from my week, whimsy, and reflections on life and dare I say, what it means to be human.
So for now, I share a meme:

Shout-out Reader’s Digest
Sponsor
📣 Device code phishing in 2026: live demos, real kits, and where it's headed next
18 kits, a 37x spike in detections, and every major AiTM vendor adding it to their platform. Device code phishing has gone from espionage-grade to criminal commodity.
It’s easy to see why attackers are adopting it at scale: it bypasses passwords, MFA, and passkeys by targeting the authorization layer instead of the login flow.
Join Push Security's VP of R&D Luke Jennings for live attacker-side demos and a breakdown of the kits and campaigns we're tracking in the wild.
👉 Register now 👈
Luke and the Push Security folks share great security research; this will be cool.
AppSec
From SQLi to RCE – Exploiting LangGraph’s Checkpointer
Checkpoint Research's Yarden Porat describes three vulnerabilities in LangGraph's persistence layer, two of which chain into remote code execution against self-hosted deployments. The chain starts with a SQL injection in the SQLite checkpointer where user-controlled filter input is inserted directly into the database query, letting attackers plant fake rows into the results. Because LangGraph deserializes whatever it reads back from the checkpoint table, the planted row triggers an unsafe msgpack deserialization that imports and calls attacker-controlled Python functions, giving them shell access on the server. A parallel SQL injection introduces the same flaw into the Redis checkpointer.
The vulnerabilities require self-hosted LangGraph deployments where the application exposes get_state_history() with a user-controlled filter.
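To make the first link in that chain concrete, here is a minimal sketch of the injection class using Python's built-in sqlite3 module. The table, column names, and helper functions are hypothetical and are not LangGraph's actual schema or code. The sketch only contrasts a filter value spliced into the SQL text with the same value passed as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical checkpoint table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT, source TEXT, blob TEXT)")
    conn.execute("INSERT INTO checkpoints VALUES ('t1', 'loop', 'legit-state')")
    return conn


def history_unsafe(conn: sqlite3.Connection, thread_id: str, source_filter: str) -> list:
    # VULNERABLE: the filter value becomes part of the SQL text itself.
    sql = (
        "SELECT blob FROM checkpoints "
        f"WHERE thread_id = '{thread_id}' AND source = '{source_filter}'"
    )
    return [row[0] for row in conn.execute(sql)]


def history_safe(conn: sqlite3.Connection, thread_id: str, source_filter: str) -> list:
    # Bound parameters keep the filter value as data, never as SQL.
    sql = "SELECT blob FROM checkpoints WHERE thread_id = ? AND source = ?"
    return [row[0] for row in conn.execute(sql, (thread_id, source_filter))]


# A filter value that appends an attacker-chosen row to the result set.
payload = "x' UNION SELECT 'attacker-row' --"

conn = make_db()
unsafe_rows = history_unsafe(conn, "t1", payload)  # contains the planted row
safe_rows = history_safe(conn, "t1", payload)      # empty: no source equals the payload
```

In the unsafe version the planted row comes back looking like any other stored checkpoint, which is what makes a downstream deserializer that trusts the table dangerous.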
Introducing Session Switcher. Swap Burp Sessions with One Click!
Doyensec's Savino Sisco shares Session Switcher, an open-source Burp Suite extension that streamlines authorization testing for privilege escalation and IDORs by letting testers save and swap HTTP sessions with one click from the request editor. Named sessions store a set of cookies and headers captured from any selected request, and a dropdown in the new Sessions tab swaps the active identity instantly. Sessions persist in the project file and work wherever there is an editable request editor, including Repeater and intercepted Proxy requests.
Auto-update rules monitor Burp Proxy traffic and refresh stored sessions when new cookies or headers are detected, so long authorization tests do not break when tokens expire or cookies rotate mid-session. Rules range from simple header matches like X-User: alice to complex conditions like tracking JWTs by payload. Future plans include Auto Inject rules for transparent session switching and macro-based session refresh capabilities.
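As a rough illustration of what "tracking JWTs by payload" can mean, the sketch below decodes a token's payload segment without verifying the signature and checks whether a claim identifies a given user. The function names and the choice of the `sub` claim are assumptions for illustration, not the extension's actual rule engine.

```python
import base64
import json


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT. The signature is NOT verified."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def matches_session(token: str, claim: str, expected: str) -> bool:
    """Return True if the token's payload carries claim == expected."""
    try:
        return jwt_payload(token).get(claim) == expected
    except (IndexError, ValueError, AttributeError):
        # Not a three-part token, not valid base64/JSON, or not a JSON object.
        return False


def _b64(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Unsigned demo token whose payload identifies the user "alice".
token = ".".join([_b64({"alg": "none"}), _b64({"sub": "alice"}), ""])
```

A rule built this way keeps following the same identity even when the token string itself rotates, because it matches on the decoded claim rather than on the raw header value.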
💡 This looks awesome. I’d have loved to have this in my NCC Group consultin’, Burp wieldin’ days.
Sponsor
📣 Adaptive Security: Your Attackers Are Using OSINT. So Should Your Phishing Tests
Phishing attacks have increased +4,150% since ChatGPT's launch, and AI has made them faster, cheaper, and more personalized than ever. Adaptive's OSINT and AI spear phishing engine analyzes your organization's public digital footprint and uses it to generate personalized phishing lures targeting each employee. The same data attackers are already using, now working in your defense. Run automated phishing programs that stay current with evolving threats without manual campaign management.
👉 See it in action 👈
AI is definitely supercharging phishing, Google’s Threat Intelligence Group and others have shared examples. Good to be aware of the latest threats and test against them 👍️
Cloud Security
Navigating Lax Load Balancers: When an Intersection Gets You Inside
Doyensec's Francesco Lacerenza and Mohamed Ouad dig into AWS Application Load Balancers (ALBs) and the gap between how an ALB is configured and what an external request can actually reach. They examine misconfigurations that create unintended routing paths, identifying issues like CloudFront/WAF bypasses via direct ALB access, rule shadowing where broad rules that are evaluated first prevent more restrictive authentication rules from firing, and IP gate bypasses when the same backend targets are reachable through alternate ALBs without source-ip restrictions.
They’ve released ELBaph, a Go CLI that maps ALBs, NLBs, listeners, rules, and targets into one routing model, runs targeted HTTP and HTTPS reachability probes, and reports each finding with its root cause, exploit path, and remediation. See also the corresponding Terraform practice lab.
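Rule shadowing is easy to see in a toy model. The sketch below is not ELBaph and does not call any AWS API. It assumes a simplified listener where each rule has a numeric priority, a single wildcard path pattern, and one action, and it relies on ALB behavior where the rule with the lowest priority number is evaluated first and the first match wins. The `Rule` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is evaluated first
    path_pattern: str  # wildcard pattern such as "/admin/*"
    action: str        # e.g. "forward" or "authenticate-oidc"


def winning_rule(rules: list, path: str) -> Optional[Rule]:
    """Return the first rule, in priority order, whose pattern matches the path."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(path, rule.path_pattern):
            return rule
    return None


def shadowed_paths(rules: list, expected: dict) -> list:
    """Return probe paths where the action that fires differs from the intended one."""
    findings = []
    for path, wanted_action in expected.items():
        hit = winning_rule(rules, path)
        if hit is None or hit.action != wanted_action:
            findings.append(path)
    return findings


# The broad forward rule is evaluated before the authentication rule.
rules = [
    Rule(priority=10, path_pattern="/*", action="forward"),
    Rule(priority=20, path_pattern="/admin/*", action="authenticate-oidc"),
]
findings = shadowed_paths(
    rules,
    {"/admin/users": "authenticate-oidc", "/health": "forward"},
)
```

Swapping the two priorities so the `/admin/*` rule is evaluated first clears the finding, which is the kind of remediation the routing model is meant to surface.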
Mind the Gap: GCP serviceData in Logs Explorer vs. Exported Logs
Permiso Security's Art Ukshini describes an inconsistency in GCP's deprecated serviceData field: audit logs viewed in the native Logs Explorer arrive fully populated, but the same logs exported to downstream analytics platforms arrive with fields stripped. As a result, critical detection fields like policyDelta go missing from high-value security events, such as disabling audit logging across all services via SetIamPolicy.
This creates silent detection failures where security rules appear functional but never fire on critical events, affecting both custom detections and Google Chronicle's community rules. Recommendations: validate telemetry end-to-end through the export pipeline, alert on stripped events as an anomaly signal, cross-reference the newer field for migrated services, and monitor the documentation for changes that affect existing detection coverage.
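The "alert on stripped events" idea can be sketched as a check that runs on the exported copy of each log entry. This is a minimal sketch that assumes exported entries are JSON objects with the audit payload under `protoPayload`, and that `policyDelta` sits under `protoPayload.serviceData` as the field names above suggest. The function name is hypothetical, and the exact field paths in your export sink should be confirmed against real events.

```python
def is_stripped_set_iam_policy(entry: dict) -> bool:
    """Flag SetIamPolicy audit entries whose policyDelta is missing after export."""
    payload = entry.get("protoPayload") or {}
    if payload.get("methodName") != "SetIamPolicy":
        return False
    service_data = payload.get("serviceData") or {}
    # A missing or empty policyDelta means a detection keyed on it can never fire.
    return not service_data.get("policyDelta")


# Illustrative entries only, not real log output.
populated = {
    "protoPayload": {
        "methodName": "SetIamPolicy",
        "serviceData": {"policyDelta": {"auditConfigDeltas": [{"action": "REMOVE"}]}},
    }
}
stripped = {"protoPayload": {"methodName": "SetIamPolicy", "serviceData": {}}}
unrelated = {"protoPayload": {"methodName": "GetIamPolicy"}}
```

Counting how often this returns True in the downstream platform, and comparing against the same events in the native viewer, gives a simple end-to-end test of the export pipeline.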
Supply Chain
Package Manager CWEs
Andrew Nesbitt analyzed roughly two hundred public CVEs and security advisories and found twenty recurring vulnerability patterns across package managers. On the client side, the most common issues include path traversal during archive extraction (often requiring multiple fixes for ../, symlinks, and Windows paths), argument injection into VCS commands like git clone, integrity checks that fail open when signatures are missing, credentials leaked across registry redirects, dependency confusion from incorrect source prioritization, and unsafe YAML/XML deserialization in manifests.
Registry-side vulnerabilities concentrate on authorization bypasses allowing package takeover, account takeover via expired email domains and credential stuffing, stored XSS in rendered package pages, server-side RCE from the same parsing bugs that affect clients, SSRF via repository URLs, and IDOR on admin endpoints in multi-tenant self-hosted registries. Almost every tool in the survey has at least half of these bugs.
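The archive-extraction path traversal pattern from the client-side list is the easiest one to show in code. The sketch below is one common way to check an archive member name before writing it to disk, and is not taken from any specific package manager. The function name is hypothetical, and the check covers `../` sequences, absolute paths, backslash separators, and symlinks already present on disk. It does not cover symlink entries inside the archive itself.

```python
import os


def is_within(dest_dir: str, member_name: str) -> bool:
    """Return True if extracting member_name would stay inside dest_dir."""
    # Treat backslashes as separators so Windows-style names are not missed.
    normalized = member_name.replace("\\", "/")
    dest = os.path.realpath(dest_dir)
    # realpath collapses ".." and resolves symlinks that already exist on disk.
    target = os.path.realpath(os.path.join(dest, normalized))
    try:
        return os.path.commonpath([dest, target]) == dest
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


dest = "/tmp/pkgs"
checks = {
    "pkg/index.js": is_within(dest, "pkg/index.js"),
    "../evil": is_within(dest, "../evil"),
    "/etc/passwd": is_within(dest, "/etc/passwd"),
    "..\\..\\evil": is_within(dest, "..\\..\\evil"),
}
```

For tar archives specifically, recent Python versions also accept `filter="data"` in `tarfile.extractall`, which rejects members that would land outside the destination. The need for several separate checks here mirrors the point above about fixes arriving in multiple rounds.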
Introducing Package Proxy: supply-chain safety checks without client-side software
Thinkst Canary's Jacob Torrey shares Package Proxy, an open-source Cloudflare Workers-based tool that intercepts package manager requests (npm, pip, uv, cargo) to enforce security policies before packages are installed, with no client-side wrapper needed. Package managers use an index URL to fetch metadata, and that URL can be changed through configuration to point at Package Proxy instead of the upstream registry. The proxy sees every metadata request, infers which packages the client wants to install, runs the configured checks, and either returns a 404 to block the install or fetches and serves the package if it passes.
Default checks include a minimum 10-day package age so backdoors get discovered before installation, upload mechanism regression detection on PyPI and npm that blocks packages uploaded differently than previous versions, allow and block lists, and an npm audit bypass for critical fixes. Per-package exceptions are managed via Wrangler CLI, and all installation attempts log to a D1 database for auditing.
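The decision logic behind checks like these can be sketched in a few lines. This is an illustration only: the real tool runs as a Cloudflare Worker, and the function name, the order in which the lists and the age check are applied, and the status codes for allowed packages are assumptions rather than Package Proxy's actual behavior.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(days=10)  # minimum package age from the default checks


def decide(
    package: str,
    uploaded_at: datetime,
    now: datetime,
    allow: frozenset = frozenset(),
    block: frozenset = frozenset(),
) -> int:
    """Return the HTTP status a proxy could answer with: 200 to serve, 404 to block."""
    if package in block:
        return 404
    if package in allow:
        # Assumed: an explicit per-package exception skips the age check.
        return 200
    if now - uploaded_at < MIN_AGE:
        # Too new: give the community time to spot a backdoor first.
        return 404
    return 200


now = datetime(2026, 1, 20, tzinfo=timezone.utc)
fresh = now - timedelta(days=2)
mature = now - timedelta(days=30)

fresh_status = decide("left-pad", fresh, now)
mature_status = decide("left-pad", mature, now)
excepted_status = decide("left-pad", fresh, now, allow=frozenset({"left-pad"}))
blocked_status = decide("left-pad", mature, now, block=frozenset({"left-pad"}))
```

Answering with a 404 means the package manager simply sees a missing package, which is why no client-side wrapper is required.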
“Internally we run a fork which enforces a stronger version of the allow list; we block npm packages by default and developers have to request additions to the allow list.”
💡 Neat approach, and awesome that Thinkst open sourced it 👍️
Blue Team
AI Agents & Canaries
Tracebit benchmarked ten frontier AI models inside a controlled AWS cyber range to determine how fast they could compromise an environment and whether they’d trip canaries along the way. Across 951 attack runs, the models achieved admin privilege escalation in 162, taking an average of 14 minutes per successful run. In those compromising runs, canaries provided advance warning before the attacker's first critical action 95.9% of the time.
Across attack paths, canaries are hit on average 8 minutes ahead of any critical action. In a surprise finding, simply telling models to expect deception reduced the number of accounts fully compromised (admin + persistence) from 20% to 3%.
💡 Great visual layout of the results and stats, replay visualization, methodology description, etc. Nice write-up, excellent security research content marketing example 👌
Mapping out your unknown: A threat hunter’s guide to Salesforce
Datadog's Julie Agnes Sparks describes threat hunting queries for detecting attacker behavior in Salesforce environments, mapping detection opportunities across reconnaissance, initial access, credential access, and discovery phases to MITRE ATT&CK tactics. The queries hunt for malicious OAuth app approvals, compromised third-party integrations, and stolen SSO credentials, drawing on Event Log Files and Real-Time Event Monitoring. Concrete signals to watch for include Guest user account activity, failed MFA attempts using weak verification methods like SMS, email, and TOTP, OAuth authentication anomalies, calls to the LimitSnapshot API endpoint that probe usage thresholds, and broad SOQL queries counting sensitive objects like Account, Contact, and User tables that precede data exfiltration.
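One of those signals, broad SOQL queries that count sensitive objects, lends itself to a simple pattern match. The sketch below is not one of Datadog's queries. It assumes you have already extracted the raw SOQL text from your event logs, and the function name and the choice to treat only unfiltered `COUNT()` queries as "broad" are assumptions.

```python
import re

SENSITIVE_OBJECTS = {"account", "contact", "user"}

# Matches an unfiltered count such as "SELECT COUNT() FROM Account".
BROAD_COUNT_RE = re.compile(
    r"select\s+count\(\s*\)\s+from\s+(\w+)\s*$",
    re.IGNORECASE,
)


def is_broad_sensitive_count(soql: str) -> bool:
    """Return True for an unfiltered COUNT() over a sensitive object."""
    match = BROAD_COUNT_RE.match(soql.strip())
    return bool(match) and match.group(1).lower() in SENSITIVE_OBJECTS


queries = [
    "SELECT COUNT() FROM Account",
    "select count() from Contact",
    "SELECT COUNT() FROM Account WHERE Name = 'Acme'",
    "SELECT Id FROM User",
]
flagged = [q for q in queries if is_broad_sensitive_count(q)]
```

A single hit is weak evidence on its own. The signal gets stronger when several sensitive objects are counted by the same user or connected app in a short window.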
AI + Security
Daybreak: Tools for securing every organization in the world
OpenAI announced an expansion to Daybreak, an initiative to help secure the world’s software. Four main updates: an improved GPT-5.5-Cyber, an updated Codex Security plugin, a Patch the Planet initiative with Trail of Bits, and the new Daybreak Cyber Partner Program, enabling 20+ security vendors including Palo Alto Networks, CrowdStrike, and Wiz to integrate GPT-5.5 with Trusted Access for Cyber into their products.
The Codex Security plugin now provides end-to-end workflows including threat modeling, reachability analysis, patch generation and validation, SARIF export, and integration with existing vulnerability management systems. OpenAI is collaborating with governments including the US, UK, Australia, Canada, France, Germany, Japan, and South Korea to provide Trusted Access for Cyber partnerships and protect critical infrastructure.
💡 My first launch 🙌 I’m not gonna lie, it was super cool getting to be a part of the behind the scenes of making this happen. Lots of work from a ton of people. If you have specific asks for new features in the Codex Security plugin (or anything else we should be building), holla at ya boy 💌
GLM-5.2, not Mythos, is the real security emergency
Joshua Saxe argues that the open-weights model GLM-5.2, not restricted closed-source frontier models, poses the real security threat because it enables attackers to run agentic operations privately on 8 H200s, without logging or guardrails. GLM-5.2's capabilities, matching GPT-5.5 and Opus 4.8 for code and terminal operations, will enable attackers to conduct semi-autonomous kill-chain execution, develop implants and C2 infrastructure, find zero-days, and run long-con scams, while defenders have been denied access to Mythos/GPT-5.5-Cyber, even though those models run on monitored private servers.
Joshua believes our focus should shift from restricting frontier model access to accelerating AI adoption among defenders and security vendors, as the open-weights genie is already out of the bottle and defenders need equivalent capabilities to pay down security debt and build detection-and-response innovations before attackers build out their own automation.
“We can now expect a dark economy to emerge around serving open weights near frontier models via API, just as we have dark economies around malware, zero-day exploits, credential dumps, and initial access into victim networks.”
The Agent Is Not the Scanner: Making AI Security Agents Better
Pratyaksha Beri ran 11 models through three configurations (no scaffolding, skills only, MCP tools enabled) on 20 vulnerability-finding tasks and found that whether scaffolding helps depends almost entirely on how capable the model already is. Weak models gained substantially from skills, while strong models regressed. Weak models need the structure skills provide, an explicit list of what to detect, what counts as evidence, and what shape the output needs to land in. Strong models already have those patterns internally, and the extra structure just costs them tokens they could have spent reasoning.
Other takeaways: different models benefit from different scaffolds, and it pays to split recon, exploit reasoning, and reporting into separate stages, since different models perform better at each (use cheap models for recon and frontier models for exploit reasoning).
💡 I always like an eval/benchmarking post. Intuitively you’d think skills and/or MCP tools would generally improve performance, but it depends on the model and task. In this case I will note, though, that the task seems to have just been to examine a small code snippet for vulnerabilities, which is much different than navigating large, real-world code bases. Also, if the model can dynamically test its hypotheses, that will improve outcomes.
Misc
Anthropic says Alibaba illicitly extracted Claude AI model capabilities
Satya Nadella - A frontier without an ecosystem is not stable
Get-a-Waymo: How a burglar used a robotaxi to flee the scene in a first-of-its kind S.F. case - New #PeakBayArea example. “The getaway car was parked just outside the Marina yoga studio...” 😂
Ycombinator.FYI - Examples of “17 fraud & scandals, 41 exhibits filed, 5 copycats & grifts.”
Fireship - I read every major CS paper of the last 100 years... - Nice overview of work by Turing, Claude Shannon, foundational AI papers, etc.
Fireship - SQLite is being rewritten in Rust
CharactersWelcome - That Song In Every Musical That No One Likes
Walking Slower? Why Your Ears, Not Your Knees, Might Be the Problem - Apple’s hearing study used real-world data from more than 57,000 iPhone users and made a connection between hearing loss, walking speed, and potential longevity implications.
The absurdly optimized pancake - “A systematic investigation of acid-base neutralization, CO2 production kinetics, gluten inhibition, and the Maillard reaction as applied to a 125-gram flour batter, with an interactive stoichiometric calculator that adapts to whatever is in your refrigerator.“
✉️ Wrapping Up
Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.
If you find this newsletter useful and know other people who would too, I'd really appreciate it if you'd forward it to them 🙏
Thanks for reading!
Cheers,
Clint
P.S. Feel free to connect with me on LinkedIn 👋