About HealthcareDataAI
A free and open-source platform making CMS healthcare data accessible, analyzable, and actionable for researchers, builders, and healthcare organizations.
The Mission
CMS publishes rich public datasets: 100M+ records on Medicare providers, utilization, payments, and quality. But accessing them means navigating fragmented websites, inconsistent formats, and cryptic documentation. HealthcareDataAI solves this.
🎯 Make Data Accessible
Consolidate 30+ CMS datasets into one place with a modern API and interactive dashboard. No more hunting across data.cms.gov for scattered files.
💡 Showcase Possibilities
Real analysis projects (cost modeling, fraud detection) demonstrate what's possible when you combine domain expertise with data engineering.
🤝 Build Community
Empower researchers, startups, and healthcare organizations with intelligence tools that, until now, only well-resourced health systems could afford.
🔓 Stay Open
Free API, open methodology, transparent analysis. Healthcare data should serve the public good, not just those who can pay.
The Builder
HealthcareDataAI is built by Blake Thomson, a healthcare data strategist with deep experience in provider intelligence and claims analysis.
Professional Background
Cedars-Sinai Health System
Business Development Team • 2+ years
Known as the "data guru" — built intelligence systems for physician liaisons and strategic planners working with 10,000+ LA County providers. Designed data workflows, claims analysis tools, and referral intelligence systems that turned raw data into actionable insights.
Consulting & Startups
Med Tech & Pharma Consulting • Prior experience
Sales consulting and PowerPoint storytelling for complex stakeholder groups. Worked at a biotech startup in the SF Bay Area. Learned to translate technical complexity into business value.
Education
MS, Biomedical Engineering • Systems-level thinking meets healthcare domain
Why This Project?
After 2+ years building data systems inside a health system, I realized two things:
- CMS data is underutilized. Organizations don't know what's available or how to use it.
- Intelligence systems shouldn't be exclusive. Smaller organizations need the same tools as big health systems.
HealthcareDataAI is both a public service (free data access) and a portfolio (demonstrating what's possible for clients who need custom solutions).
The Platform
Built from scratch to serve healthcare data at scale.
Across 30 datasets: providers, claims, payments, quality, hospitals
Compressed DuckDB format for fast queries
Following CMS release schedules
Technology Stack
Database: DuckDB (columnar, analytics-optimized)
API: FastAPI (Python, async)
Frontend: Vanilla JS, no framework bloat
Deployment: Docker + nginx on Hetzner VPS
Pipeline: Python scripts + data.cms.gov
Hosting: $80/month (accessible at scale)
Open Source Approach
The code is available on GitHub. Data pipeline scripts, API endpoints, and dashboard code are all public. If you want to run your own instance or contribute improvements, you're welcome to.
Custom Services
While the platform is free, Blake offers bespoke data intelligence services for healthcare organizations that need more than self-service access.
🤖 Bespoke AI Agents
Custom LLM-powered agents for provider search, claims analysis, and referral intelligence. Deployed in your Azure tenant, so your data never leaves your environment.
📊 Intelligence Platforms
Full-stack data platforms tailored to your needs: dashboards, APIs, automated reports, alerting systems. Built on your data sources, not just CMS.
🔍 Custom Analysis
One-off research projects: market assessments, fraud detection, cost analysis, network intelligence. Academic rigor, business focus.
💼 Data Strategy
Consulting on data architecture, vendor selection, build-vs-buy decisions. Help you think through what's possible and what's worth building.
Interested in Custom Work?
If you need provider intelligence tools, claims analytics, fraud detection, or custom data solutions, let's talk. I've built these systems before and can do it again for you.
Contact Blake
Frequently Asked Questions
Where does the data come from?
All data is from CMS public use files available at data.cms.gov. We download, consolidate, and serve it through a modern API. No private or restricted data.
Is this affiliated with CMS?
No. HealthcareDataAI is an independent project. We use CMS data but are not endorsed by, sponsored by, or affiliated with CMS or the federal government. All analysis and opinions are our own.
Can I use this for commercial purposes?
Yes. CMS data is public domain and can be used commercially. Our API and transformed datasets are also free to use. Just attribute the data source (CMS) and respect our rate limits.
How often is data updated?
Roughly quarterly overall, but each dataset follows its own CMS release schedule: NPPES updates monthly, Open Payments quarterly, and Medicare utilization annually. We track the latest releases and refresh our database accordingly.
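Because each source has its own cadence, a refresh pipeline needs some way to tell which datasets are due. The following is a minimal Python sketch of one approach, assuming the cadences above (NPPES monthly, Open Payments quarterly, Medicare utilization annually) map to fixed month intervals. The dataset keys and the functions `months_between`, `is_stale`, and `datasets_due` are hypothetical illustrations, not part of the HealthcareDataAI codebase.

```python
from datetime import date

# Hypothetical dataset keys mapped to the release cadences listed in the FAQ,
# expressed in months (monthly = 1, quarterly = 3, annually = 12).
CADENCE_MONTHS = {
    "nppes": 1,
    "open_payments": 3,
    "medicare_utilization": 12,
}


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`; day of month is ignored."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_stale(dataset: str, last_refresh: date, today: date) -> bool:
    """True once a full release cycle has elapsed since the last refresh."""
    return months_between(last_refresh, today) >= CADENCE_MONTHS[dataset]


def datasets_due(last_refreshes: dict[str, date], today: date) -> list[str]:
    """Sorted names of datasets whose cadence says a new CMS release is expected."""
    return sorted(
        name for name, last in last_refreshes.items() if is_stale(name, last, today)
    )
```

For example, with `today = date(2024, 7, 15)`, an NPPES snapshot from June 1, 2024 counts as due, while an Open Payments snapshot from the same day does not. Elapsed time only says a release is expected; confirming that CMS actually published one would be a separate step.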
Do you have patient-level data?
No. All our data is provider-level aggregates (total services, total payments, etc.). Individual patient claims require a Data Use Agreement with CMS and are not available through our platform.
Can I download the entire database?
Yes, via Research Access (Tier 2). Request access through our data access page, describe your use case, and we'll send you a download link for the full DuckDB database (6GB).
How can I contribute or report issues?
The project is on GitHub. Open an issue for bugs or feature requests. Pull requests welcome for pipeline improvements or new data sources.
Connect
Questions, feedback, or collaboration ideas? Get in touch.