An AI agent investigated a real compromised domain controller, DC01 (10.42.85.10), and reached a verdict. Don't trust it: recompute its evidence chain yourself, right here in your browser.
DC01 (10.42.85.10), a Windows Server 2012 R2 domain controller, is CONFIRMED COMPROMISED. Memory forensics of citadeldc01.mem (captured 2020-09-19) reveals a multi-stage intrusion: a rogue process named coreupdater.ex (PID 3644) established a confirmed outbound C2 connection to 203.78.103.109:443 before self-terminating after a 15-second execution window.
Accuracy, measured honestly
Scored against the verified answer key for Case 001. We report the unflattering number and exactly why.
Precision: 0.25
Recall: 0.50
F1: 0.33
Hallucination: 0.000
Evidence integrity: INTACT — SHA-256 byte-identical before and after the run.
Zero hallucination, real recall, and a precision cost that comes from over-reporting, not fabrication. The same type-driven rule that finally scores the real malware also surfaces the agent's injection-victim processes (legitimate Windows binaries flagged for RWX regions) as file false positives. These are real anomalies; they are just not in the 4-item memory-visible answer key.
Pre-correction bridge: P 0.33 · R 0.25 · F1 0.29. The agent's output never changed — only the deterministic scorer was fixed, in the open.
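The F1 figures follow from the reported precision and recall by the standard harmonic-mean formula. A minimal sketch for checking them by hand; the function name is ours, not part of the scorer, and the pre-correction precision of 0.33 is assumed to be 1/3 before rounding:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Corrected scorer: P 0.25, R 0.50 as reported on this page.
corrected = f1_score(0.25, 0.50)

# Pre-correction scorer: R 0.25 as reported; P 0.33 assumed to be 1/3 unrounded.
pre_correction = f1_score(1 / 3, 0.25)

print(f"corrected F1:      {corrected:.2f}")
print(f"pre-correction F1: {pre_correction:.2f}")
```

Recall doubled while precision dropped, and the harmonic mean still moved up, which is the trade described above.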
The trust stack
Five pillars. Each one ships with the command that proves it.
Tamper-evident audit
Every tool call is a hash-chained record; every finding cites one.
$ spoor verify-audit <run>/audit.jsonl
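To show what a hash-chained record buys you, here is a minimal sketch of the general technique. It is not the tool's actual audit format: the field names (`prev_hash`, `payload`, `hash`), the all-zero genesis value, and the sample tool names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a record that commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev, "payload": payload, "hash": record_hash(prev, payload)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, dropped, or reordered record fails."""
    prev = GENESIS
    for rec in chain:
        if rec["prev_hash"] != prev:
            return False
        if rec["hash"] != record_hash(prev, rec["payload"]):
            return False
        prev = rec["hash"]
    return True


# Hypothetical records; the ids and tool names are placeholders.
chain = []
append(chain, {"tool_call_id": "tc-1", "tool": "pslist"})
append(chain, {"tool_call_id": "tc-2", "tool": "netscan"})
print(verify_chain(chain))
```

Because each hash covers the previous one, changing any earlier record invalidates every record after it.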
Read-only evidence
Path-jailed file access plus allow-listed execution with no shell. The image's SHA-256 is identical before and after.
$ sha256sum citadeldc01.mem # pre == post
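The same pre/post check can be scripted. A minimal sketch using only the Python standard library; `evidence_intact` and its `analyse` callback are hypothetical names, standing in for whatever runs against the image:

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-gigabyte image never sits in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_intact(path: str, analyse) -> bool:
    """Hash the image, run the analysis callback, hash again, and compare."""
    before = sha256_file(path)
    analyse(path)
    after = sha256_file(path)
    return before == after
```

A single changed byte anywhere in the image produces a different digest, so equality here is the whole claim.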
Citation enforcement
A finding is 'confirmed' only if it cites a real tool_call_id; uncited claims are downgraded.
$ jq .enforcement <run>/report.json
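The enforcement rule is simple enough to sketch. This is an illustration under assumptions, not the tool's code: the finding fields (`status`, `tool_call_id`) and the downgraded label `unverified` are hypothetical.

```python
def enforce_citations(findings: list, known_call_ids: set) -> list:
    """Keep 'confirmed' only for findings that cite a tool call that really ran."""
    enforced = []
    for finding in findings:
        status = finding["status"]
        cited = finding.get("tool_call_id")
        if status == "confirmed" and cited not in known_call_ids:
            status = "unverified"  # hypothetical downgrade label
        enforced.append({**finding, "status": status})
    return enforced


# Hypothetical inputs: ids recorded in the audit log, and the agent's findings.
known = {"tc-1", "tc-2"}
findings = [
    {"claim": "outbound connection to 203.78.103.109:443", "status": "confirmed", "tool_call_id": "tc-2"},
    {"claim": "uncited assertion", "status": "confirmed"},
    {"claim": "cites a call that never ran", "status": "confirmed", "tool_call_id": "tc-9"},
]
result = enforce_citations(findings, known)
print([f["status"] for f in result])
```

The input findings are left unmodified, so the agent's original output and the enforced version can be compared side by side.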
Honest accuracy
A deterministic scorer reports the number before and after every correction.
$ python scripts/rescore_run.py <run>
Self-correction
When a tool fails, the agent recovers — and the recovery is replayable.