Healthcare Data Projects
Real analysis on real data. Showcasing what's possible when you combine domain expertise, data engineering, and modern analytics on 100M+ Medicare records.
Featured Projects
Each project demonstrates different analytical approaches and business applications using CMS public data
Healthcare Cost Analysis
Research Phase
The Question: What does healthcare actually cost to deliver in the United States?
We spend $4.5 trillion annually on healthcare — but how much of that is direct patient care versus administrative overhead, insurance complexity, and systemic waste? This multi-method research project uses hospital cost reports, Medicare utilization data, and national spending tables to triangulate the true cost.
Four Calculation Approaches:
- Provider capacity model (physician count × volume × cost/encounter)
- Facility cost model (hospital beds × occupancy × cost/bed-day)
- Claims bottom-up (Medicare → age-adjust → extrapolate)
- Top-down national accounting ($4.5T → decompose → net out waste)
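The first two approaches above are simple products, so they can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names, the `days` parameter used to annualize the facility model, and every input value in the usage note are hypothetical placeholders rather than real estimates.

```python
def provider_capacity_cost(physician_count, encounters_per_physician, cost_per_encounter):
    """Provider capacity model: physician count x volume x cost/encounter."""
    return physician_count * encounters_per_physician * cost_per_encounter


def facility_cost(beds, occupancy_rate, cost_per_bed_day, days=365):
    """Facility cost model: hospital beds x occupancy x cost/bed-day.

    Multiplying by `days` to get an annual figure is an assumption of this
    sketch; the list above does not spell out the time dimension.
    """
    if not 0.0 <= occupancy_rate <= 1.0:
        raise ValueError("occupancy_rate must be between 0 and 1")
    return beds * occupancy_rate * days * cost_per_bed_day


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the arithmetic.
    print(provider_capacity_cost(10, 100, 50.0))
    print(facility_cost(100, 0.5, 1000.0, days=10))
```

Comparing the totals from independent models like these is what lets the approaches triangulate: a large gap between two estimates points to an assumption worth revisiting.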
Status: Research design complete, data acquisition in progress, pilot calculations starting Q2 2026
View Project Details
Fraud Detection & Anomaly Analysis
Active
The Problem: $60-100 billion in healthcare fraud annually. Can we find it in public data?
Medicare fraud costs taxpayers billions, but detecting it requires finding needles in haystacks of legitimate billing. This project applies statistical, geographic, and network-based techniques to identify billing outliers, prescribing anomalies, and suspicious provider patterns.
Five Detection Approaches:
- Statistical outlier detection (billing 3+ std dev above peers)
- Prescribing pattern analysis (pill mills, opioid red flags)
- Geographic clustering (fraud hotspots by county/zip)
- Network analysis (referral rings, kickback schemes)
- Time series anomalies (COVID scams, pre-retirement fraud)
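The first approach, flagging providers who bill 3+ standard deviations above their peers, might look like the sketch below. It assumes a plain z-score over one peer group; the function name, the provider IDs, and the billing amounts are made up for illustration and do not come from CMS data.

```python
from statistics import mean, pstdev


def flag_billing_outliers(billing_by_provider, threshold=3.0):
    """Return provider IDs whose billing is `threshold`+ std dev above the peer mean.

    `billing_by_provider` maps a provider ID to total billed amount for one
    peer group (for example, same specialty and region; the grouping is an
    assumption of this sketch).
    """
    amounts = list(billing_by_provider.values())
    if len(amounts) < 2:
        return []
    peer_mean = mean(amounts)
    peer_std = pstdev(amounts)
    if peer_std == 0:
        return []  # everyone bills the same amount, so nothing stands out
    return sorted(
        provider
        for provider, amount in billing_by_provider.items()
        if (amount - peer_mean) / peer_std >= threshold
    )


if __name__ == "__main__":
    # Twenty hypothetical peers billing 100 each, plus one billing 1000.
    peers = {f"P{i:02d}": 100.0 for i in range(20)}
    peers["P_high"] = 1000.0
    print(flag_billing_outliers(peers))
```

One caveat with this simple form: an extreme biller inflates the standard deviation it is measured against, so small peer groups can hide outliers. A median-based score is a common, more robust alternative.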
Status: Methodology complete, initial outlier analysis underway, case studies targeted for Q2 2026
View Project Details
What These Projects Showcase
Demonstrating end-to-end capabilities in healthcare data intelligence
🎯 Domain Expertise
Deep understanding of healthcare business models, provider networks, claims data structures, and regulatory frameworks (Medicare, CMS, HIPAA, Stark Law).
📊 Data Engineering
Building production data pipelines: acquiring, cleaning, transforming, and serving 100M+ records. DuckDB, FastAPI, nginx, Docker — the full stack.
🔬 Analytical Rigor
Multiple validation methods, transparent assumptions, peer-reviewable methodology. Academic-quality analysis, business-friendly presentation.
💼 Business Value
Every analysis answers a real business question: How can we reduce costs? Where's the fraud? What's the ROI? Data for decisions, not just dashboards.
Want Custom Analysis for Your Organization?
These projects are proofs of concept for what's possible with healthcare data. If you need similar analysis for your health system, claims data, or provider network, we can build it.
Contact Us
Coming Soon
Projects in the pipeline for 2026
Provider Network Intelligence
Mapping referral patterns, identifying key opinion leaders, and quantifying provider influence within regional healthcare networks.
Q2 2026
Prescribing Pattern Analysis
Deep-dive into Part D prescriber data: brand vs. generic preferences, specialty drug adoption patterns, and physician-pharma relationships.
Q3 2026
Hospital Market Analysis
Competitive dynamics in major metro markets: bed capacity, service lines, patient migration, and market share shifts over time.
Q3 2026
Price Transparency Analysis
What hospital price transparency data reveals about negotiated rates, price variation, and the true cost of common procedures.
Q4 2026