CEO-Bench: Can Agents Play the Long Game? . Contribute to zlab-princeton/ceobench-src development by creating an account on GitHub.
My AI stopped having goldfish syndrome.
Evaluate the effectiveness of Microsoft’s Python Risk Identification Toolkit (PyRIT) for agentic AI red teaming. Address evolving autonomous AI system threats.