A new trick could block abuse of open-source AI

Mazeika says the approach isn’t perfect, but that it could raise the bar for “decensoring” AI models. “The achievable goal is to make the cost of breaking the model high enough that most adversaries are deterred by it,” he says.

“We hope that this work will jump-start research into tamper-resistant safeguards, and the research community can figure out how to develop more and more robust safeguards,” says Dan Hendrycks, director of the Center for AI Safety.

The new work draws inspiration from a 2023 research paper that showed how smaller machine learning models can be made tamper-resistant. “They tested the [new] approach on much larger models and extended it with some modifications,” says Peter Henderson, an assistant professor at Princeton who led the 2023 work. “Scaling this type of approach is difficult, and it seems to be holding up well, which is great.”

The idea of tamper-proofing open models may become more popular as interest in open source AI grows. Open models now compete with state-of-the-art closed models from companies like OpenAI and Google. The latest version of Llama 3, for example, released in July, is about as powerful as the models behind popular chatbots like ChatGPT, Gemini, and Claude, as measured using popular benchmarks for evaluating the abilities of language models. Mistral Large 2, an LLM from a French startup also released last month, has similar capabilities.

The US government is taking a cautious but positive approach to open source AI. A report released this week by the National Telecommunications and Information Administration, an agency of the US Department of Commerce, “recommends that the US government develop new capabilities to monitor potential risks, but refrain from immediately limiting the widespread availability of open model weights in the largest AI systems.”

However, not everyone is a fan of imposing restrictions on open models. Stella Biderman, director of EleutherAI, a community-run open-source AI project, says the new technique may be elegant in theory but could prove difficult to implement in practice. Biderman says the approach is also antithetical to the philosophy behind free software and openness in AI.

“I think this paper misses the core issue,” Biderman says. “If they are worried about LLMs generating WMD information, the right intervention is on the training data, not the trained model.”
