<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Analytics and Fraud on LegalRealist AI</title><link>https://legalrealist.ai/series/data-analytics-and-fraud/</link><description>Recent content in Data Analytics and Fraud on LegalRealist AI</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>hi@legalrealist.ai (LegalRealist AI)</managingEditor><webMaster>hi@legalrealist.ai (LegalRealist AI)</webMaster><copyright>© 2026 LegalRealist AI</copyright><lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://legalrealist.ai/series/data-analytics-and-fraud/index.xml" rel="self" type="application/rss+xml"/><item><title>Building a Medicare Fraud Backtest in One Claude Code Session</title><link>https://legalrealist.ai/posts/38-backtest-walkthrough/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/38-backtest-walkthrough/</guid><description>A walkthrough of building a Medicare fraud backtest overnight in Claude Code — from a plain-English spec to 289 matched providers across 41 states, a predictive model with AUC 0.79, and out-of-sample validation. Including the three times the pipeline failed, the data duplication bug, and the engineering decisions that shaped the final design.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/38-backtest-walkthrough/feature.png"/></item><item><title>I Built the Backtest: What Excluded Medicare Providers Look Like Before They Get Caught</title><link>https://legalrealist.ai/posts/37-backtest-results/</link><pubDate>Wed, 20 May 2026 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/37-backtest-results/</guid><description>The previous post described a Medicare fraud backtest nobody had built. I built it. 289 excluded providers across 41 states, matched to pre-exclusion billing data, compared against 3.39 million peers. Thirteen of fifteen features showed statistically significant differences — and the behavioral fingerprint is consistent enough to predict fraud in providers who were never excluded.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/37-backtest-results/feature.png"/></item><item><title>The Data Miner's Dilemma</title><link>https://legalrealist.ai/posts/24-data-miners-dilemma/</link><pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/24-data-miners-dilemma/</guid><description>DOJ&amp;rsquo;s new FOCUS initiative wants better data-driven fraud cases. But it keeps its two best enforcement channels — whistleblower tips and data miner analytics — in separate silos. The real opportunity is connecting them.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/24-data-miners-dilemma/feature.png"/></item><item><title>From Kaggle to MCP: Open-Source Medicare Fraud Detection</title><link>https://legalrealist.ai/posts/40-open-source-fraud-detection/</link><pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/40-open-source-fraud-detection/</guid><description>The PPP fraud pipeline worked because the SBA released everything. Medicare&amp;rsquo;s public data is fragmented, de-identified, and missing the features detection needs. Here&amp;rsquo;s what exists on GitHub, where it falls short, and what CMS would need to release to let outside analysts do for healthcare fraud what one Python repo did for PPP.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/40-open-source-fraud-detection/feature.png"/></item><item><title>Show Your Work</title><link>https://legalrealist.ai/posts/39-show-your-work/</link><pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/39-show-your-work/</guid><description>Public data can source prosecution leads. An open-source fraud-scoring system, run against the full SBA PPP dataset, identified the same lenders, geographies, and loan populations that DOJ prosecuted — using nothing but a downloadable CSV and a standard laptop.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/39-show-your-work/feature.png"/></item><item><title>Following the Money: Can AI Trace FCPA Cases Back to Suspicious Activity Reports?</title><link>https://legalrealist.ai/posts/11-following-the-money/</link><pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/11-following-the-money/</guid><description>A proof of concept for using AI and synthetic SAR data to estimate how many FCPA enforcement actions may have originated from FinCEN suspicious activity reports — and what that tells us about the hidden plumbing of anti-corruption enforcement.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/11-following-the-money/feature.png"/></item><item><title>The Government Already Has the Data</title><link>https://legalrealist.ai/posts/10-the-governments-data-advantage/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><author>hi@legalrealist.ai (LegalRealist AI)</author><guid>https://legalrealist.ai/posts/10-the-governments-data-advantage/</guid><description>Medicare claims, tax returns, PPP applications — the government already holds a closed, mostly clean dataset of every transaction it needs to find fraud. It doesn&amp;rsquo;t need SARs or tips. It just needs to run the query.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://legalrealist.ai/posts/10-the-governments-data-advantage/feature.png"/></item></channel></rss>