Data Analytics and Fraud

The Government Already Has the Data

15 July 2025 · 15 mins

Fraud-Analytics PPP-Fraud Healthcare-Fraud DOJ-Enforcement Data-Fusion-Center PACE

Medicare claims, tax returns, PPP and EIDL applications — the government increasingly holds the structured transaction data that lets fraud enforcement start with a query, not a tip.

Following the Money: Which FCPA Cases Look SAR-Matchable?

12 August 2025 · 10 mins

FCPA FinCEN Suspicious-Activity-Reports Synthetic-Data Enforcement-Intelligence Anti-Corruption

A proof of concept for using AI and synthetic SAR data to estimate how many FCPA enforcement actions describe transaction patterns that would plausibly generate suspicious activity reports — and what that tells us about the hidden plumbing of anti-corruption enforcement.

The Data Miner's Dilemma

29 April 2026 · 16 mins

Data-Miners Qui-Tam False-Claims-Act FOCUS-Initiative DOJ-Enforcement Whistleblower

DOJ's new FOCUS initiative wants better data-driven fraud cases. But it keeps its two best enforcement channels — whistleblower tips and data miner analytics — in separate silos. The real opportunity is connecting them.

Show Your Work

10 December 2025 · 15 mins

PPP-Fraud Data-Analytics Fraud-Scoring DOJ-Enforcement Kabbage Fintech-Lenders

Public PPP data can produce enforcement-relevant anomaly maps. An open-source fraud-scoring system, run against the SBA PPP dataset, surfaced lender and geographic concentrations that overlap with known enforcement patterns — while also showing why public data cannot prove fraud by itself.

From Kaggle to MCP: Open-Source Medicare Fraud Detection

20 December 2025 · 14 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare Open-Source GitHub

The PPP fraud pipeline worked because the SBA released unusually inspectable data. Medicare's public data is fragmented, de-identified, and missing the features detection needs. Here's what exists on GitHub, where it falls short, and what CMS would need to release to make outside healthcare-fraud analysis more practical.

The Backtest: What Excluded Medicare Providers Look Like Before They Get Caught

20 May 2026 · 27 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare LEIE Anomaly-Detection

The previous post described a Medicare fraud backtest nobody had built. Here are the results. 289 excluded providers across 41 states, matched to pre-exclusion billing data, compared against 3.39 million peers. Thirteen of fifteen features showed statistically significant differences — and the same behavioral fingerprint shows up in never-excluded providers who have independent enforcement histories.

Building a Medicare Fraud Backtest in One Claude Code Session

25 May 2026 · 19 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare Claude-Code Agentic-AI

A walkthrough of building a Medicare fraud backtest overnight in Claude Code: from a plain-English spec to 289 matched providers across 41 states, a fraud-similarity model with AUC 0.79, and a manual public-record check of high-scoring peers. It also covers the three times the pipeline failed, the data duplication bug, and the engineering decisions that shaped the final design.
