About HealthcareDataAI

A free and open-source platform making CMS healthcare data accessible, analyzable, and actionable for researchers, builders, and healthcare organizations.

The Mission

CMS publishes incredible datasets: 100M+ records on Medicare providers, utilization, payments, and quality. But accessing them requires navigating fragmented websites, inconsistent formats, and cryptic documentation. HealthcareDataAI solves this.

🎯 Make Data Accessible

Consolidate 30+ CMS datasets into one place with a modern API and interactive dashboard. No more hunting across data.cms.gov for scattered files.

💡 Showcase Possibilities

Real analysis projects (cost modeling, fraud detection) demonstrate what's possible when you combine domain expertise with data engineering.

🤝 Build Community

Empower researchers, startups, and healthcare organizations with the kind of intelligence tools that, until now, only well-resourced health systems could afford.

🔓 Stay Open

Free API, open methodology, transparent analysis. Healthcare data should serve the public good, not just those who can pay.

The Builder

HealthcareDataAI is built by Blake Thomson, a healthcare data strategist with deep experience in provider intelligence and claims analysis.

Professional Background

Cedars-Sinai Health System

Business Development Team • 2+ years

Known as the "data guru" — built intelligence systems for physician liaisons and strategic planners working with 10,000+ LA County providers. Designed data workflows, claims analysis tools, and referral intelligence systems that turned raw data into actionable insights.

Consulting & Startups

Med Tech & Pharma Consulting • Prior experience

Sales consulting and PowerPoint storytelling for complex stakeholders. Worked at a biotech startup in the SF Bay Area. Learned to translate technical complexity into business value.

Education

MS, Biomedical Engineering • Systems-level thinking meets healthcare domain

Why This Project?

After 2+ years building data systems inside a health system, I realized two things:

  1. CMS data is underutilized. Organizations don't know what's available or how to use it.
  2. Intelligence systems shouldn't be exclusive. Smaller organizations need the same tools as big health systems.

HealthcareDataAI is both a public service (free data access) and a portfolio (demonstrating what's possible for clients who need custom solutions).

The Platform

Built from scratch to serve healthcare data at scale.

107M
Total Records

Across 30 datasets: providers, claims, payments, quality, hospitals

6GB
Database Size

Compressed DuckDB format for fast queries

Quarterly
Update Frequency

Following CMS release schedules

Technology Stack

Database

DuckDB (columnar, analytics-optimized)

API

FastAPI (Python, async)

Frontend

Vanilla JS, no framework bloat

Deployment

Docker + nginx on Hetzner VPS

Pipeline

Python scripts + data.cms.gov

Hosting

$80/month (accessible at scale)

Open Source Approach

The code is available on GitHub. Data pipeline scripts, API endpoints, and dashboard code are all public. You're welcome to run your own instance or contribute improvements.

Custom Services

While the platform is free, Blake offers bespoke data intelligence services for healthcare organizations that need more than self-service access.

🤖 Bespoke AI Agents

Custom LLM-powered agents for provider search, claims analysis, and referral intelligence. Deployed in your Azure tenant (your data never leaves your environment).

📊 Intelligence Platforms

Full-stack data platforms tailored to your needs: dashboards, APIs, automated reports, alerting systems. Built on your data sources, not just CMS.

🔍 Custom Analysis

One-off research projects: market assessments, fraud detection, cost analysis, network intelligence. Academic rigor, business focus.

💼 Data Strategy

Consulting on data architecture, vendor selection, build-vs-buy decisions. Help you think through what's possible and what's worth building.

Interested in Custom Work?

If you need provider intelligence tools, claims analytics, fraud detection, or custom data solutions — let's talk. I've built these systems before and can do it again for you.

Contact Blake

Frequently Asked Questions

Where does the data come from?

All data is from CMS public use files available at data.cms.gov. We download, consolidate, and serve it through a modern API. No private or restricted data.

Is this affiliated with CMS?

No. HealthcareDataAI is an independent project. We use CMS data but are not endorsed by, sponsored by, or affiliated with CMS or the federal government. All analysis and opinions are our own.

Can I use this for commercial purposes?

Yes. CMS data is public domain and can be used commercially. Our API and transformed datasets are also free to use. Just attribute the data source (CMS) and respect our rate limits.
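
The actual rate limits are not stated on this page, so the one-request-per-second spacing below is purely an assumption for illustration. As a minimal sketch, a client could space out its own calls with a small throttle like this (the `Throttle` class is a hypothetical helper, not part of the HealthcareDataAI API):

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls at least `min_interval` seconds apart (client-side only)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was allowed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time remaining until min_interval has passed since the last call.
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Assumed limit: one request per second. Adjust to the published limits.
throttle = Throttle(min_interval=1.0)
# Call throttle.wait() immediately before each API request.
```

Keeping the throttle on the client side means a script slows itself down before the server has to reject anything.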

How often is data updated?

Our database is refreshed quarterly, following CMS release schedules. The underlying sources update at different cadences: NPPES monthly, Open Payments quarterly, and Medicare utilization annually. We track the latest releases and refresh our database accordingly.
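
To make the cadences above concrete, here is a rough sketch of how a refresh check could work. It is not the project's actual pipeline code: the source labels, the `sources_due` helper, and the example dates are all assumptions made up for illustration.

```python
from datetime import date

# Release cadence per source, in months, taken from the answer above.
# The keys are illustrative labels, not official dataset identifiers.
CADENCE_MONTHS = {
    "NPPES": 1,
    "Open Payments": 3,
    "Medicare utilization": 12,
}


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def sources_due(last_refresh: dict[str, date], today: date) -> list[str]:
    """Return the sources whose cadence has elapsed since their last refresh."""
    return [
        name
        for name, cadence in CADENCE_MONTHS.items()
        if months_between(last_refresh[name], today) >= cadence
    ]


# Made-up refresh dates, purely for demonstration.
last = {
    "NPPES": date(2024, 1, 15),
    "Open Payments": date(2024, 1, 15),
    "Medicare utilization": date(2024, 1, 15),
}
due = sources_due(last, date(2024, 4, 20))
print(due)
```

With these example dates, three whole months have passed, so the monthly and quarterly sources are flagged while the annual one is not.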

Do you have patient-level data?

No. All our data is provider-level aggregates (total services, total payments, etc.). Individual patient claims require a Data Use Agreement with CMS and are not available through our platform.
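
As an illustration of what "provider-level aggregates" means in practice, the sketch below models one row per provider and derives an average payment per service. The field names (`npi`, `total_services`, `total_payments`) and the sample values are assumptions for demonstration, not the platform's actual schema or real CMS figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderAggregate:
    """One row per provider: totals only, no individual patient claims."""
    npi: str               # provider identifier (hypothetical field name)
    total_services: int    # count of services across all patients
    total_payments: float  # summed payments in dollars


def payment_per_service(row: ProviderAggregate) -> float:
    """Average payment per service; 0.0 when the provider has no services."""
    if row.total_services == 0:
        return 0.0
    return row.total_payments / row.total_services


# Placeholder rows with made-up numbers.
rows = [
    ProviderAggregate("0000000001", 200, 15000.0),
    ProviderAggregate("0000000002", 0, 0.0),
]
averages = {r.npi: payment_per_service(r) for r in rows}
print(averages)
```

Because each row is already a total, derived metrics like this describe a provider's overall activity and cannot be traced back to any single patient.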

Can I download the entire database?

Yes, via Research Access (Tier 2). Request access through our data access page, describe your use case, and we'll send you a download link for the full DuckDB database (6GB).

How can I contribute or report issues?

The project is on GitHub. Open an issue for bugs or feature requests. Pull requests are welcome for pipeline improvements or new data sources.