Datasets and tools built alongside the writing on LegalRealist AI. The data is pulled from primary sources and cleaned for analysis; the tools are small, focused utilities that answer the kinds of questions the posts raise. Everything is free to use, with sources cited and methods documented.
Data
Public-record datasets — court dockets, enforcement filings, regulatory data — cleaned and structured so they can be queried instead of skimmed. Each release ships with source notes, a schema description, and the script used to assemble it.
| Project | Description | Source |
|---|---|---|
| FinCEN SARs (Synthetic) + FCPA Enforcement | Synthetic SARs joined with Stanford FCPA Clearinghouse enforcement data for anomaly-detection and classification research. See The Government’s Data Advantage. | [coming soon] |
Code
Small, focused utilities — not platforms. Each tool does one thing a lawyer or legal-ops person might want done, with the source open so you can read what it’s doing before you trust the output.
| Project | Description | Source |
|---|---|---|
| Legal AI Cost Calculator | Estimate what an AI document-review workflow actually costs — per document and at scale — across model tiers and iteration rounds. Based on the pricing model from The Foundation. | [coming soon] |
| Law School LLM Wiki | AI-maintained law school knowledge base powered by Claude Code. Adapted from Andrej Karpathy’s LLM Wiki pattern. The example is built from law school outlines, but you can point it at any source material. | GitHub |
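As a rough illustration of the arithmetic the Legal AI Cost Calculator entry describes (per-document cost, cost at scale, model tiers, iteration rounds), here is a minimal Python sketch. It is not the calculator's actual code, which is not yet published, and it does not reproduce the pricing model from The Foundation. The names `ModelTier`, `cost_per_document`, and `cost_at_scale` are hypothetical, as are the prices and token counts in the example. It assumes cost scales linearly with tokens and that every round reprocesses the full document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """A pricing tier. All prices here are placeholders, not real vendor rates."""
    name: str
    input_price_per_mtok: float   # USD per million input tokens (assumed unit)
    output_price_per_mtok: float  # USD per million output tokens (assumed unit)


def cost_per_document(tier: ModelTier, input_tokens: int,
                      output_tokens: int, rounds: int = 1) -> float:
    """Estimated USD cost to review one document over `rounds` passes.

    Assumes each round sends the same number of input tokens and
    receives the same number of output tokens.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    per_round = (
        input_tokens * tier.input_price_per_mtok
        + output_tokens * tier.output_price_per_mtok
    ) / 1_000_000
    return per_round * rounds


def cost_at_scale(tier: ModelTier, n_documents: int, input_tokens: int,
                  output_tokens: int, rounds: int = 1) -> float:
    """Estimated USD cost for a corpus of similarly sized documents."""
    return n_documents * cost_per_document(tier, input_tokens, output_tokens, rounds)


if __name__ == "__main__":
    # Hypothetical tier and workload, chosen only to show the calculation.
    tier = ModelTier("example-tier", input_price_per_mtok=2.0, output_price_per_mtok=10.0)
    per_doc = cost_per_document(tier, input_tokens=4000, output_tokens=500, rounds=2)
    total = cost_at_scale(tier, n_documents=10_000, input_tokens=4000,
                          output_tokens=500, rounds=2)
    print(f"per document: ${per_doc:.4f}")
    print(f"10,000 documents: ${total:.2f}")
```

Comparing tiers is then a matter of calling `cost_at_scale` once per `ModelTier` with the same workload. A real estimate would also need to account for things this sketch leaves out, such as documents of varying length and prompt overhead.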
Suggestions welcome — get in touch.

