
Security researchers have hijacked three popular AI agents that integrate with GitHub Actions using a new type of prompt-injection attack to steal API keys and access tokens. The problem is most probably pervasive, they warn, and lament that the major vendors running the agents didn't even think to disclose the issue.
Researcher Aonan Guan originally found the flaw in Claude Code Security Review, Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for vulnerabilities and other security problems.
Guan said he was curious about how user prompts flow into the AI agents and how they take action based on those prompts.
He soon realized that Claude, as well as other AI agents in GitHub Actions, uses the same flow: it reads GitHub data, processes it as part of the task context, and then takes action.
Guan came up with an idea to try to take over the agent after injecting malicious instructions into the data being read by the AI. It worked.
He submitted a pull request and injected malicious instructions in the PR title, telling Claude to execute the whoami command using the Bash tool and return the results as a "security finding."
Claude then executed the injected commands and embedded the output in its JSON response, which got posted as a pull request comment.
"The credentials stolen are the host repository's own GitHub Actions secrets, configured by the project maintainers to power the agent," the researcher wrote on his blog.
"The loop is entirely within GitHub - no external infrastructure needed. The attacker writes a comment, the agent reads it, executes the payload, and writes the result back to another comment or commit. This is Comment and Control."
Comment and Control, a play on Command and Control (C2), is a class of prompt-injection attacks where GitHub comments hijack AI agents running in GitHub Actions.
Guan found this flaw on Google's Gemini CLI Action and Microsoft's GitHub Copilot as well. After he and a team from Johns Hopkins University disclosed the vulnerabilities, they received bug bounties from all three vendors.
Even though the vulnerability seems serious and likely affects many other GitHub-integrated agents, including Slack bots and Jira agents, the bounties were modest: $100 from Anthropic, $1,337 from Google, and $500 from Microsoft.
The problem? None of the vendors assigned CVEs or published public advisories, and this, Guan told The Register, "is a problem."
But if Guan's work is any sign of the future, public disclosure is something that AI firms will still need to work on.
"I know for sure that some of the users are pinned to a vulnerable version. If they don't publish an advisory, those users may never know they are vulnerable - or under attack," said Guan.
Dr. Andrew Bolster, Senior Manager, Research and Development at Black Duck, notes that vulnerabilities of the type identified in Guan's report are becoming increasingly more common, and that vendors should, let's say, respect them more.
"These connections between Large Language Model inputs, context, and tools are just as vital to sanitize and secure as previous generations would secure applications against SQL Injection and cross-site scripting," Bolster told Cybernews.
It might be no accident, Lindsey Cerkovnik, head of vulnerability management at the US Cybersecurity and Infrastructure Security Agency, mused this week, saying that AI companies like OpenAI and Anthropic should play a bigger role in software vulnerability disclosures in the future.
This is, of course, also related to the hype surrounding Anthropic's new LLM Mythos that promises to autonomously find and fix cybersecurity vulnerabilities at scale. In testing, the model allegedly discovered thousands of previously unidentified zero-day vulnerabilities.
But if Guan's work is any sign of the future, public disclosure is something that AI firms will still need to work on. An important caveat, of course, is the fact that AI systems designed to discover software flaws at scale are beginning to outpace the classification and disclosure of vulnerabilities.