+ ZP · LLM × SECURITY · EST. 2026

We build the environments and agents that probe software for the flaws no one has found yet.

ZeroProbe is an independent research lab at the intersection of large language models and security. We design rigorous benchmarks across web, network, host, and cloud, and we build autonomous agents that discover vulnerabilities and write working proof-of-concept exploits. The goal: high-quality data, environments, and trajectories that make frontier models genuinely better at security.

Security is where AI capability gets tested for real. A model that can reason about an unfamiliar codebase, chain a series of weak signals into an exploit, and verify its own work is a model that has learned something deep. Measuring that honestly is hard. Most benchmarks are stale, leaky, or easy to game. ZeroProbe exists to fix the measurement problem first — then to push the capability itself.

+ RESEARCH AREAS

Four surfaces, one harness.

Web vulnerability benchmarks

Reproducible suites that grade an agent on finding and triaging real web flaws — injection, auth bypass, access control — with file- and function-level localization scoring.
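As a rough illustration of what file- and function-level localization scoring could look like, here is a minimal sketch. It is not ZeroProbe's actual grader: the `Location` schema, the `localization_scores` function, and the choice of precision, recall, and F1 as the metrics are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical schema for a reported or ground-truth flaw location."""
    file: str
    function: str


def _prf(pred_set, true_set):
    """Precision, recall, and F1 for two sets of locations."""
    hits = len(pred_set & true_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def localization_scores(predicted, truth):
    """Score an agent's reported locations at two granularities.

    File level: did the agent name the right files?
    Function level: did it name the right (file, function) pairs?
    """
    pred_files = {p.file for p in predicted}
    true_files = {t.file for t in truth}
    pred_funcs = {(p.file, p.function) for p in predicted}
    true_funcs = {(t.file, t.function) for t in truth}
    return {
        "file": _prf(pred_files, true_files),
        "function": _prf(pred_funcs, true_funcs),
    }


# Example: the agent finds both vulnerable files but only one exact function.
truth = [
    Location("app/auth.py", "check_token"),
    Location("app/db.py", "run_query"),
]
predicted = [
    Location("app/auth.py", "check_token"),
    Location("app/db.py", "connect"),
]
scores = localization_scores(predicted, truth)
print(scores["file"]["f1"], scores["function"]["f1"])
```

Scoring both granularities separately distinguishes an agent that only narrows a flaw down to the right file from one that pinpoints the vulnerable function.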

Network & host security

Benchmarks spanning network reconnaissance, service exploitation, and host-level privilege escalation, built on containerized targets with hidden oracles.

Cloud configuration security

IaC misconfiguration and IAM privilege-escalation scenarios graded against ground-truth policy. Built to measure what agents actually catch in the wild.

Autonomous discovery agents

Expert agents that hunt for vulnerabilities and write proof-of-concept exploits end to end — and the harness that measures them honestly.

+ BENCHMARK SUITE

Probes, graded against hidden oracles.

ZP-WEB-01 · Web vulnerability discovery & repair · Web · Live
ZP-NET-01 · Network reconnaissance & service exploitation · Network · Building
ZP-HOST-01 · Host security & privilege escalation · Host · Building
ZP-CLOUD-01 · Cloud configuration security · Cloud · Planned

+ FIELD NOTES

From the lab.

Jun 12, 2026

Why most security benchmarks lie to you

Leakage, gameable heuristics, and stale targets quietly inflate every number you read. Here's how we design probes that resist all three.

Jun 5, 2026

An agent found an SSRF chain we didn't plant

While validating a web target, one of our discovery agents surfaced a server-side request forgery path that wasn't in the ground-truth set. A short teardown.

+ BUILD WITH US

Working on frontier models, security data, or agentic evaluation?

ZeroProbe partners with model developers who need high-quality security environments, trajectories, and benchmarks. If that's you — or if you just want to compare notes — get in touch.

hello@zeroprobe.com