DOJ's new FOCUS initiative wants better data-driven fraud cases. But it keeps its two best enforcement channels — whistleblower tips and data miner analytics — in separate silos. The real opportunity is connecting them.
Public data can source prosecution leads. An open-source fraud-scoring system, run against the full SBA PPP dataset, identified the same lenders, geographies, and loan populations that DOJ prosecuted — using nothing but a downloadable CSV and a standard laptop.
The PPP fraud pipeline worked because the SBA released everything. Medicare's public data is fragmented, de-identified, and missing the features detection needs. Here's what exists on GitHub, where it falls short, and what CMS would need to release to let outside analysts do for healthcare fraud what one Python repo did for PPP.