Web vulnerability benchmarks
+ ZP · LLM × SECURITY · EST. 2026
ZeroProbe is an independent research lab at the intersection of large language models and security. We design rigorous benchmarks across web, network, host, and cloud, and we build autonomous agents that discover vulnerabilities and write working proofs of concept. The goal: high-quality data, environments, and trajectories that make frontier models genuinely better at security.
Security is where AI capability gets tested for real. A model that can reason about an unfamiliar codebase, chain a series of weak signals into an exploit, and verify its own work is a model that has learned something deep. Measuring that honestly is hard. Most benchmarks are stale, leaky, or easy to game. ZeroProbe exists to fix the measurement problem first — then to push the capability itself.
+ RESEARCH AREAS
Reproducible suites that grade an agent on finding and triaging real web flaws — injection, auth bypass, access control — with file- and function-level localization scoring.
Benchmarks spanning network reconnaissance, service exploitation, and host-level privilege escalation, built on containerized targets with hidden oracles.
IaC misconfiguration and IAM privilege-escalation scenarios graded against ground-truth policy. Built to measure what agents actually catch in the wild.
Expert agents that hunt for vulnerabilities and write proof-of-concept exploits end to end — and the harness that measures them honestly.
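As a rough illustration of what file- and function-level localization scoring could look like for the web suites, here is a minimal sketch. It is not ZeroProbe's harness: the `Location` type, the `localization_score` function, the choice of recall as the metric, and the example paths are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical record: one vulnerable function inside one file."""
    file: str
    function: str


def localization_score(predicted, truth):
    """Return (file_recall, function_recall) for a single target.

    file_recall: share of ground-truth files the agent named.
    function_recall: share of ground-truth (file, function) pairs
    the agent named exactly.
    """
    truth_set = set(truth)
    pred_set = set(predicted)
    if not truth_set:
        # Assumption: a target with no ground truth scores zero.
        return 0.0, 0.0

    truth_files = {loc.file for loc in truth_set}
    pred_files = {loc.file for loc in pred_set}

    file_recall = len(truth_files & pred_files) / len(truth_files)
    function_recall = len(truth_set & pred_set) / len(truth_set)
    return file_recall, function_recall


# Illustrative data only: two known flaws, one localized exactly.
truth = [
    Location("app/auth.py", "check_token"),
    Location("app/db.py", "run_query"),
]
predicted = [
    Location("app/auth.py", "check_token"),
    Location("app/db.py", "build_query"),  # right file, wrong function
]

file_recall, function_recall = localization_score(predicted, truth)
```

Scoring the two levels separately gives partial credit to an agent that finds the right file but names the wrong function. A real harness would likely also penalize false positives, which this sketch ignores.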
+ BENCHMARK SUITE
+ FIELD NOTES
Jun 12, 2026
Leakage, gameable heuristics, and stale targets quietly inflate every number you read. Here's how we design probes that resist all three.
Jun 5, 2026
While validating a web target, one of our discovery agents surfaced a server-side request forgery path that wasn't in the ground-truth set. A short teardown.
+ BUILD WITH US
ZeroProbe partners with model developers who need high-quality security environments, trajectories, and benchmarks. If that's you — or if you just want to compare notes — get in touch.
hello@zeroprobe.com