Python security developer-in-residence decries use of bots that ‘cannot understand code’

Software vulnerability submissions generated by AI models have ushered in a “new era of slop security reports for open source” – and the devs maintaining these projects wish bug hunters would rely less on results produced by machine learning assistants.

Seth Larson, security developer-in-residence at the Python Software Foundation, raised the issue in a blog post last week, urging those reporting bugs not to use AI systems for bug hunting.

“Recently I’ve noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports to open source projects,” he wrote, pointing to similar findings from the Curl project in January. “These reports appear at first glance to be potentially legitimate and thus require time to refute.”

Larson argued that low-quality reports should be treated as if they’re malicious.

As if to underscore the persistence of these concerns, a Curl project bug report posted on December 8 shows that nearly a year after maintainer Daniel Stenberg raised the issue, he’s still confronted by “AI slop” – and wasting his time arguing with a bug submitter who may be partially or entirely automated.

In response to the bug report, Stenberg wrote:

We receive AI slop like this regularly and at volume. You contribute to [the] unnecessary load of Curl maintainers and I refuse to take that lightly and I am determined to act swiftly against it. Now and going forward.

You submitted what seems to be an obvious AI slop ‘report’ where you say there is a security problem, probably because an AI tricked you into believing this. You then waste our time by not telling us that an AI did this for you and you then continue the discussion with even more crap responses – seemingly also generated by AI.

Spammy, low-grade online content existed long before chatbots, but generative AI models have made it easier to produce the stuff. The result is pollution in journalism, web search, and of course social media.

For open source projects, AI-assisted bug reports are particularly pernicious because they require consideration and evaluation from security engineers – many of them volunteers – who are already pressed for time.

Larson told The Register that while he sees relatively few low-quality AI bug reports – fewer than ten each month – they represent the proverbial canary in the coal mine.

“Whatever happens to Python or pip is likely to eventually happen to more projects or more frequently,” he warned. “I am concerned mostly about maintainers that are handling this in isolation. If they don’t know that AI-generated reports are commonplace, they might not be able to recognize what’s happening before wasting tons of time on a false report. Wasting precious volunteer time doing something you don’t love and in the end for nothing is the surest way to burn out maintainers or drive them away from security work.”

Larson argued that the open source community needs to get ahead of this trend to mitigate potential damage.

“I am hesitant to say that ‘more tech’ is what will solve the problem,” he said. "I think open source security needs some fundamental changes. It can’t keep falling onto a small number of maintainers to do the work, and we need more normalization and visibility into these types of open source contributions.

“We should be answering the question: ‘how do we get more trusted individuals involved in open source?’ Funding for staffing is one answer – such as my own grant through Alpha-Omega – and involvement from donated employment time is another.”

While the open source community mulls how to respond, Larson asks that bug submitters not submit reports unless they’ve been verified by a human – and don’t use AI, because “these systems today cannot understand code.” He also urges platforms that accept vulnerability reports on behalf of maintainers to take steps to limit automated or abusive security report creation.

  • tal@lemmy.today
    link
    fedilink
    English
    arrow-up
    6
    ·
    edit-2
    1 day ago

    So, this isn’t quite the issue being raised by the article – that’s bug reports generated on bug trackers by apparently a bot that they aren’t running.

    However, I do feel that there’s more potential with existing LLMs in checking and flagging potential errors than in outright writing code. Like, I’d rather have something like a “code grammar checker” that highlights potential errors for my examination rather than something that generates code from scratch itself and hopes that I will adequately review it.

    • GenderNeutralBro@lemmy.sdf.org
      link
      fedilink
      English
      arrow-up
      7
      ·
      1 day ago

      I’d rather have something like a “code grammar checker” that highlights potential errors for my examination rather than something that generates code from scratch itself

      Agreed. The other good use case I’ve found is as a faster reference for simple things. LLMs are absolutely great for one-liners and generating troublesome (but logically simple) things like complex xpath queries. But I still haven’t seen one generate a good script of even moderate complexity without hand-holding. In some cases I’ve been able to get usable output with a few shots, saving me a bit of time compared to if I’d written the whole darned thing from scratch.

      I’ve found LLMs very useful for coding, but they aren’t replacing my actual coding, per se. They replace looking things up, like through man pages, language references, or StackOverflow. Something like ffmpeg, for example, has a million options and it is always a little annoying to sift through the docs manually when I just need to do one specific task.

      I’m sure it’ll happen sooner or later. I’m not naive enough to claim that “computers will never be able to do $THING” anymore. I’ll say “not in the next year”, though.