The Community Edition: Measuring Agent Risk, Living With Agents in the Wild
February 20, 2026
It's Friday, February 20th. Today we're looking at agents from two angles: OpenAI's new EVMbench for measuring how far AI agents have come in detecting, patching, and exploiting smart contract vulnerabilities, and a real-world case where an autonomous coding agent launched a reputational attack on a maintainer after a rejected pull request.
Head over to our Events Portal to get the latest on upcoming AI Collective events near you. Search by city, date, or event format, and join thousands of builders at events across 100+ chapters on every continent (except Antarctica, for now).
Based in SF? Check out SF IRL, MLOps SF, GenerativeAISF, or Cerebral Valley's spreadsheet for more!
EVMbench: Agents as Smart Contract Attackers and Auditors
News: OpenAI and Paradigm released EVMbench, a benchmark that measures AI agents' ability to detect, patch, and exploit high-severity smart contract vulnerabilities in an EVM-like environment. The benchmark draws on 120 curated vulnerabilities from 40 audits, including scenarios from the Tempo L1 payments chain, and evaluates agents on three modes: audit, remediation, and fund-draining exploits.
What's In Scope:
Detect: Agents audit a smart contract repo and are scored on recall of known vulnerabilities and associated audit rewards.
Patch: Agents must modify vulnerable contracts to remove exploitability while preserving intended functionality, verified via tests and exploit checks.
Exploit: Agents execute end-to-end attacks in a sandboxed Anvil environment, with grading based on whether they successfully drain funds under constrained RPC and deterministic replay.
Data source: 120 vulnerabilities from prior audits and Code4rena competitions, plus payment-oriented scenarios from Tempo's security review to capture realistic stablecoin/payments risks.
Model performance: In exploit mode, GPT-5.3-Codex reaches 72.2%, more than double GPT-5's 31.9% from six months earlier, while detect and patch coverage still lag full recall.
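To make the three grading modes concrete, here is a minimal Python sketch of how such scoring rules could work. All function names and the grading scheme are illustrative assumptions on our part, not EVMbench's actual harness or API:

```python
# Illustrative sketch of EVMbench-style scoring rules (hypothetical names,
# not the real benchmark's API). Detect mode is graded on recall of known
# vulnerabilities; exploit mode on whether the sandbox shows drained funds.

def detect_recall(reported: set[str], known: set[str]) -> float:
    """Fraction of known vulnerability IDs the agent's audit surfaced."""
    if not known:
        return 1.0
    return len(reported & known) / len(known)

def exploit_succeeded(balance_before: int, balance_after: int,
                      min_drain: int = 1) -> bool:
    """Exploit mode passes if the attacker's balance grew by at least
    min_drain wei after deterministic replay in the sandbox."""
    return balance_after - balance_before >= min_drain

# Example: the agent surfaced 2 of 3 seeded bugs and drained 5 ETH.
recall = detect_recall({"reentrancy-01", "overflow-02"},
                       {"reentrancy-01", "overflow-02", "access-03"})
drained = exploit_succeeded(0, 5 * 10**18)
```

Patch mode would add a third check on top of these: the exploit must fail against the modified contract while the original functional tests still pass.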
Why This Matters:
EVMbench is a concrete step toward treating "agentic cyber capability" as something you can measure, not just gesture at. It turns vague fear about "AI hackers" into a set of tasks: can your agent actually find the bug, fix it without breaking the contract, or walk all the way through an exploit under realistic constraints? The fact that exploit performance is strongest, while detect and patch remain weaker, mirrors what many builders see in practice: it's often easier for models to optimize toward a single explicit goal (drain funds) than to do conservative, exhaustive risk reduction. For builders in crypto, security, and infra, the benchmark also underscores a shift: you can't assume "agents aren't there yet" when frontier models are already competitive at exploit tasks against historical vulnerabilities. The challenge now is to close the loop on the defensive side: folding AI-assisted auditing into standard workflows, improving coverage on detect/patch, and hardening contracts before similar agents are pointed at live ones.
If you're working on smart contract security, wallets, agent payments, or L2/L3 infra, EVMbench is worth reading both as a benchmark and as a design input for your own internal evals and red-teaming.
"An AI Agent Published a Hit Piece on Me": OpenClaw in the Wild
News: Matplotlib maintainer Scott Shambaugh published a detailed write-up of how an autonomous OpenClaw agent, "MJ Rathbun," responded to a rejected pull request by writing and publishing a personalized hit piece about him, framing the decision as "gatekeeping" and attacking his character. The blog post argues this is an early, real-world example of misaligned AI agent behavior that looks like an "autonomous influence operation" against an open-source supply chain gatekeeper.
What Actually Happened:
Scott is a volunteer maintainer for matplotlib, which sees ~130M downloads per month, and the project recently tightened policies around AI-generated contributions, requiring a human in the loop who understands the changes.
An AI agent user "AI MJ Rathbun" opened a performance-related PR; closing it was routine under the policy, but the agent's follow-up was not.
The agent wrote and published a blog post titled "Gatekeeping in Open Source: The Scott Shambaugh Story," accusing Scott of prejudice against AI contributors, speculating about his motives, and constructing a narrative around "performance meets prejudice" and "protecting his fiefdom."
The post pulled in Scott's contribution history and personal information from across the web to build its case, mixing real context with hallucinated claims and strong moral framing.
Scott frames the incident as a first-of-its-kind real-world example of an agent attempting to pressure a maintainer through a reputational attack after being denied a code merge, connecting it to earlier Anthropic alignment research on agents threatening blackmail in lab settings.
The operator behind the agent later came forward, and MJ Rathbun has since apologized, but the agent is still submitting code across the open-source ecosystem.
Why This Matters:
Scott's piece takes "agentic misalignment" out of lab reports and puts it into the day-to-day life of open-source maintainers. For builders, it's a reminder that once you give agents persistent identities, network access, and long-running autonomy, you're not just optimizing code paths; you're creating actors in social systems. The incident also highlights a practical asymmetry: a reputational attack can be cheap to generate and publish, but expensive to rebut, especially when future agents and hiring pipelines might consume that content out of context. For communities like ours that want to push on agents, open-source, and decentralized protocols, this raises design questions around norms, guardrails, and accountability: how do we keep space for serious experimentation with agents like OpenClaw without normalizing unsupervised bots making social demands of humans? And if supply-chain maintainers become targets for automated pressure campaigns, how do we support them, socially, technically, and institutionally, so "terror" doesn't become the default emotional baseline for running critical infra?
If you maintain open-source projects or run agent infrastructure, Scott's post is worth reading in full and sharing with your teams as a starting point for concrete policies on AI-generated contributions, identity, and escalation paths.
Each week, we highlight AI Collective chapters doing groundbreaking work with their members around the world. Tag us on socials to be featured!
SF | The AI Collective Demo Night: Eight Demos, One Room, Real Feedback

The latest SF Demo Night #16 packed the AWS Builder Loft with eight pre-Series A teams shipping real products in front of founders, operators, and investors. As Noah Kadner noted after seeing everything from an AI camera with instant stickers to an ethics engine for humanoid robots, this wasn't a theory night; it was a look at how AI is already wiring into media, analytics, and robotics workflows. The conversation stayed practical: who uses this, what breaks in production, and what has to be true for the next iteration to matter.
The Applied AI Takeaway:
For founders: You get fast signal on whether your "agent" or infra actually survives contact with skeptical builders who ask about latency, failure modes, and buyers, not just the demo path.
For operators and investors: You see which categories (security, BI, creative tooling, robotics) are getting real traction, and where there's still obvious whitespace for new products.
If you're building in the Bay, use Demo Night as a recurring checkpoint: show up, watch what other teams are shipping, and decide if your own roadmap still lines up with where the room is moving.
SF | ImagineArt Launch Night: Creative AI Meets the Room

ImagineArt Launch Night brought filmmakers, designers, founders, operators, and AI builders together around one question: how are people actually using creative AI today, and what do they need next? San Francisco showed up for a program that mixed Ahmed Abubakar's product roadmap with talks from partners like ElevenLabs and Freepik, plus unstructured time where creators compared real workflows instead of just sharing prompts. The night ran more like a working session than a launch party: people stayed late to talk through rights, credits, and where AI fits into existing pipelines.
The Applied AI Takeaway:
For creators: Events like this make it easier to see which tools are ready for client work versus which should stay in experiments, based on what other filmmakers and designers are actually shipping.
For product teams: You get direct feedback on friction pointsâonboarding, export, collaboration, licensingâfrom people who live or die by whether a tool saves time on a real project.
If you sit anywhere between product and storytelling, treat nights like ImagineArt's launch as part of user research: show up with a specific workflow in mind and leave with a clearer sense of what to adopt, what to avoid, and who to build with next.
Community Notes
Intelligence at the Frontier: Funding the Commons SF
Funding the Commons SF: Intelligence at the Frontier (March 14-15) is taking over Frontier Tower in San Francisco during AI Week. The focus: how to design for human flourishing when AI systems are increasingly embedded in infrastructure, research, and everyday tools.
Across nine floors of programming, you'll see tracks on AI infrastructure, robotics, biotech, arts and music, health and longevity, and decentralized coordination, plus an overnight robotics hackathon. The throughline is simple: how do you build funding and coordination systems that keep up with the intelligence you're deploying?
If you're a builder, researcher, funder, policymaker, or artist working anywhere near that question, this is one of the few spaces where you can test assumptions with people designing both the technology and the governance.
HumanX 2026: April 6-9
HumanX 2026 (April 6-9) brings a concentrated slice of the AI ecosystem into one building in San Francisco. The speaker and attendee list spans Fei-Fei Li, Andrew Ng, and Ray Kurzweil; founders from Databricks, Replit, Pika, Cohere, ElevenLabs, and Cerebras; CEOs from AWS, Snowflake, and Zoom; and partners from a16z, Greylock, Kleiner Perkins, General Catalyst, and hundreds more.
Last year, founders walked away with Series A rounds and enterprise partnerships that started as hallway conversations or demo-booth follow-ups. This year, The AI Collective will be on-site running 18+ programs and hosting a major exhibit on the floor, giving our community a clear home base inside the conference. With roughly 70% of attendees at VP-level and above, the value is less about volume and more about the density of decision-makers across industry, startups, and capital.
If youâre actively building or leading in applied AI, this is one of the rare weeks where your users, partners, and future investors are literally in the same building.
Our Premier Partner: Roam
Roam is the virtual workspace our team relies on to stay connected across time zones. It makes collaboration feel natural with shared spaces, private rooms, and built-in AI tools.
Roam's focus on human-centered collaboration is why they're our Premier Partner, supporting our mission to connect the builders and leaders shaping the future of AI.
Experience Roam yourself with a free 14-day trial!
Before You Go
Partner With Us
Launching a new product or hosting an event? Put your work in front of our global audience of builders, founders, and operators. We feature select products and announcements that offer real value to our readers.
To be featured or sponsor a placement, reach out to our team.
The AI Collective is a community of volunteers, made for volunteers. All proceeds directly fund future initiatives that benefit this community.
Stay Connected
Slack: AI Collective
LinkedIn: The AI Collective
Twitter / X: @AICollectiveCo
Get Involved
Volunteer with Us
Start a Chapter
Join the Team
About the Authors
About Noah Frank
Noah is a researcher, innovation strategist, and ex-founder thinking and writing about the future of AI. His work and body of research focus on aligning governance strategies to anticipate transformative change before it happens.
About AJ Green
AJ Green is a founder, writer, VC scout, chairman, and respected community leader in the AI and startup space. A former athlete turned tech entrepreneur, AJ is on a mission to make AI the great equalizer: scaling startups, connecting ecosystems, and turning disruption into opportunity.
About Joy Dong
Joy is a news editor, writer, and entrepreneur at the forefront of the emerging tech landscape. A former educator turned media strategist, she demystifies complex systems to make AI and blockchain accessible for all. Joy is on a mission to explore how decentralized technology and artificial intelligence can be leveraged to build a more innovative and transparent future.