UC Berkeley researchers have released CyberGym, a benchmark for evaluating AI agents' cybersecurity capabilities. The rate at which agents reproduced known bugs was low (only 11.9%), but this serves as a baseline for measuring improvements in AI agent performance over time.
More interestingly, the evaluation process also uncovered 15 new vulnerabilities that present security risks, a welcome side benefit. Because this technique is new, I’d expect teams to find these tools increasingly helpful over the next few years.