Security Data Works

Independent practice

The benchmarks vendors won't run.

An independent lab and capability matrix for security data engineering. Methodology and code in the open.

Trustworthy

Measured, not asserted.

Well-connected

Context resolved cleanly.

Performant

Detection + hunting at scale.

The thesis in detail

What each pillar requires.

Most security data programs trust their vendors, their schemas, and their own past assumptions. The data platform should earn that trust empirically — source by source, claim by claim, query by query — on three properties.

Trustworthy

Data is instrumented, validated, and lineage-traceable. Completeness, freshness, and schema conformance are measured per source. Failures surface before analysts notice them in their queries.
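
The three measured properties can be sketched in a few lines. This is a hypothetical illustration, not the lab's implementation; the field names, the five-minute freshness budget, and the record shape are all invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Illustrative required schema for one source (invented for this sketch).
REQUIRED_FIELDS = {"timestamp", "host", "event_type"}

def source_health(records, now, max_lag=timedelta(minutes=5)):
    """Score one source's batch on conformance, completeness, and freshness."""
    total = len(records)
    # Schema conformance: every required field is present.
    conformant = sum(1 for r in records if REQUIRED_FIELDS <= r.keys())
    # Completeness: required fields are present AND non-null.
    complete = sum(
        1 for r in records
        if REQUIRED_FIELDS <= r.keys()
        and all(r[f] is not None for f in REQUIRED_FIELDS)
    )
    # Freshness: the newest event is within the allowed lag.
    newest = max((r["timestamp"] for r in records if "timestamp" in r), default=None)
    fresh = newest is not None and (now - newest) <= max_lag
    return {
        "schema_conformance": conformant / total if total else 0.0,
        "completeness": complete / total if total else 0.0,
        "fresh": fresh,
    }
```

Run per source on every batch, checks like these surface a silently degrading feed before an analyst's query does.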

Well-connected

Entities resolve cleanly across sources. The catalog knows which source is authoritative for which attribute, with confidence and freshness scoring. Joins do what their JOIN clauses claim they do.
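
A minimal sketch of what "authoritative for which attribute, with confidence and freshness scoring" can mean in practice. The catalog structure, source names, and scoring rule (catalog confidence times freshness) are assumptions made for this example, not the lab's method.

```python
# Hypothetical catalog: per-attribute confidence by source.
AUTHORITY = {
    "owner": {"cmdb": 0.9, "edr": 0.5},   # CMDB is authoritative for ownership
    "os":    {"edr": 0.95, "cmdb": 0.6},  # EDR is authoritative for OS version
}

def resolve(entity_records):
    """Merge per-source records for one entity, picking each attribute's
    value from the candidate with the best confidence * freshness score."""
    merged = {}
    for attr, sources in AUTHORITY.items():
        best = None
        for rec in entity_records:
            if attr not in rec["attrs"]:
                continue
            score = sources.get(rec["source"], 0.1) * rec["freshness"]
            if best is None or score > best[0]:
                best = (score, rec["attrs"][attr])
        if best:
            merged[attr] = best[1]
    return merged
```

The point of scoring rather than hard-coding a winner: a stale authoritative source can lose to a fresh secondary one, and the decision is auditable.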

Performant

The data platform meets two latency regimes on the same data: sub-second detection and response, and petabyte-scale historical hunting. Vendor performance claims are validated against the actual workload, not the brochure.

Why now

Why the SIEM model is breaking.

01

Attackers are faster than your detection cadence.

Mandiant's 2026 numbers show exploitation landing 7 days before patch release. CrowdStrike clocks attacker breakout at 51 seconds. The AI tooling making this possible is now open-weight.

02

Query performance has flipped.

On a 10M-event Zeek workload, ClickHouse runs 145× faster than the dominant schema-on-read SIEM. Same data, same hardware, same queries; methodology in the lab. The hardware economics that schema-on-read indexing was designed around no longer exist.
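
The "same data, same hardware, same queries" discipline comes down to a harness like the following. This is a generic sketch, not the published lab code; `run_query` stands in for an actual engine call, and the warm-up and run counts are arbitrary.

```python
import statistics
import time

def benchmark(run_query, runs=5, warmup=1):
    """Median wall-clock latency of a query callable, after warm-up runs.

    Warm-up separates cold-cache from warm-cache numbers; the median
    resists one-off outliers better than the mean.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

The same callable signature works whether the engine behind it is ClickHouse, Dremio, or a SIEM search API, which is what makes the comparison apples-to-apples.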

03

Storage cost has flipped too.

Object storage plus columnar formats compress 8.2× in our benchmark. Netflix, Huntress, and Insider run multi-petabyte security data lakes at costs SIEM customers can't reach. The tradeoff: data freshness.
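
Why columnar layouts compress telemetry so well can be shown with a toy experiment (this illustrates the mechanism only; it is not the published 8.2× benchmark, and the synthetic event shape is invented). The same events are serialized row-wise and column-wise, then gzip-compressed: grouping similar values together gives the compressor longer runs of redundancy.

```python
import gzip
import json
import random

random.seed(7)
# Synthetic flow-like events with clustered, repetitive values.
events = [
    {"src_ip": f"10.0.0.{random.randint(1, 20)}",
     "proto": random.choice(["tcp", "udp"]),
     "bytes": random.choice([60, 120, 480, 1500])}
    for _ in range(5000)
]

# Row-wise: one JSON object per event (how a log line arrives).
row_wise = "\n".join(json.dumps(e) for e in events).encode()

# Column-wise: one list per field (how a columnar file lays data out).
columns = {k: [e[k] for e in events] for k in events[0]}
col_wise = json.dumps(columns).encode()

row_gz = len(gzip.compress(row_wise))
col_gz = len(gzip.compress(col_wise))
```

Real columnar formats add dictionary and run-length encodings on top of this, which is where ratios in the high single digits come from.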

04

Stream processing closes the freshness gap.

Modern stream engines evaluate thousands of near-real-time detections in the time a SIEM evaluates dozens. The next move, federated query over source-retained data, is closer than vendors will admit.
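
A streaming detection is just per-event state kept in memory. Below is a minimal sketch of one such rule, a sliding-window brute-force check; the event shape, window, and threshold are illustrative assumptions, not a rule from the source text.

```python
from collections import defaultdict, deque

WINDOW = 60     # seconds of history to keep per source (assumed)
THRESHOLD = 5   # failures allowed inside the window (assumed)

class BruteForceDetector:
    """Flag any source exceeding THRESHOLD failed logins in WINDOW seconds."""

    def __init__(self):
        self.failures = defaultdict(deque)  # src -> failure timestamps

    def process(self, event):
        """Consume one event; return an alert dict, or None."""
        if event["type"] != "login_failure":
            return None
        q = self.failures[event["src"]]
        q.append(event["ts"])
        # Evict timestamps that fell out of the sliding window.
        while q and event["ts"] - q[0] > WINDOW:
            q.popleft()
        if len(q) > THRESHOLD:
            return {"alert": "brute_force", "src": event["src"], "count": len(q)}
        return None
```

Because state lives with the stream, the rule fires within one event of the threshold being crossed, rather than on the next scheduled search.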

The lab — proof on this page

Every benchmark is yours to re-run.

Zeek analytical workload · 10M events · single-node Docker

145× faster

ClickHouse vs. schema-on-read SIEM on identical workload. Methodology and caveats published; reference implementation under NDA.

ClickHouse Native 0.19 s
Dremio + Reflections 1.00 s
Schema-on-read SIEM 27.52 s

Reproducible Docker lab · methodology and code shared during engagement scoping · public repository queued for launch

What makes this different

Two products. A method you can audit.

The Lab and the Matrix are the public outputs. Practitioner depth and disclosure-first integrity are what make the outputs defensible.

Ready to see the numbers on your own data?

A 1–2 week POV runs the benchmarks on your data. $15K, credited toward the full assessment.