Why We Built a Model Context Protocol Server for Health Data
Most health platforms treat AI as a chatbot skin over a database. We built an MCP server with a plugin architecture, time-series storage, and multi-step reasoning. Here's why.
Most health platforms treat AI as a chatbot skin over a database. You ask a question, a query runs, an LLM paraphrases the result. It works — until you ask something that requires reasoning across multiple data sources, time ranges, or analytical dimensions.
We took a different approach. Omnio’s AI is powered by a purpose-built Model Context Protocol (MCP) server with a rich toolkit of specialized tools, a plugin architecture for every data source, and a time-series database designed for exactly this kind of workload. This post explains why — and what it means for the insights you get.
The Problem with “Database Layer + LLM Interpreter”
The common pattern in health AI products goes like this:
- User asks a question
- App translates it into a database query
- Database returns numbers
- LLM writes a sentence about those numbers
This works for simple lookups: “What was my sleep score last night?” But it falls apart for the questions that actually matter:
- “Why did my sleep quality drop this week?”
- “How does my training volume affect my recovery the next day?”
- “Compare my sleep on days I lift heavy vs. rest days over the last 90 days.”
These questions require multi-step reasoning — fetching sleep data, then activity data, then computing correlations, then checking for outliers, then contextualizing the results. A static database layer can’t anticipate the chain of analysis an LLM needs to perform.
MCP: Letting the AI Drive the Analysis
The Model Context Protocol is an open standard for connecting AI models to external tools and data sources. Instead of pre-computing answers, MCP gives the AI a toolkit and lets it decide which tools to use, in what order, based on what the user is actually asking.
Our MCP server exposes tools across several categories:
Domain Tools
The foundation. Each tool returns structured, normalized data from one or more wearable sources — covering sleep, recovery, activity, workouts, strength training, heart rate, stress, environment, bloodwork, body composition, blood oxygen, and nutrition.
These tools abstract away the differences between devices. Whether your sleep data comes from an Oura Ring, a Garmin, or a WHOOP, the AI sees a consistent schema.
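As a minimal sketch of what "consistent schema" means here, the snippet below normalizes a vendor payload into one shared record type. The field names (`sleep_score`, `deep_sleep_duration`) and the `SleepRecord` shape are hypothetical, not Omnio's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SleepRecord:
    """Normalized sleep record the AI sees, regardless of device."""
    day: date
    score: int            # 0-100 composite score
    deep_minutes: int
    rem_minutes: int
    source: str           # e.g. "oura", "garmin", "whoop"

def normalize_oura(raw: dict) -> SleepRecord:
    """Hypothetical mapping from one vendor's payload to the shared schema."""
    return SleepRecord(
        day=date.fromisoformat(raw["day"]),
        score=raw["sleep_score"],
        deep_minutes=raw["deep_sleep_duration"] // 60,  # vendor reports seconds
        rem_minutes=raw["rem_sleep_duration"] // 60,
        source="oura",
    )

record = normalize_oura({
    "day": "2024-06-01",
    "sleep_score": 82,
    "deep_sleep_duration": 5400,
    "rem_sleep_duration": 6600,
})
print(record.score, record.deep_minutes)  # 82 90
```

Each device plugin would own one such mapping; downstream tools only ever see `SleepRecord`.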
Cross-Source Analysis Tools
This is where it gets interesting. These tools operate across data sources to find relationships:
- Correlation analysis — Joins all metrics by date and computes correlations across known metric pairs (e.g., training volume vs. deep sleep, air quality vs. sleep score, protein intake vs. recovery)
- Temporal patterns — Groups metrics by day of week to surface behavioral patterns
- Outlier detection — Identifies unusual days and cross-references what else was atypical, connecting cause and effect
- Predictive modeling — Identifies which metrics are the strongest predictors of a target metric using lag analysis
Comparison & Snapshot Tools
Tools for high-level overviews and period-over-period comparisons — things like comprehensive health snapshots, automatic delta computation between time periods, recovery adherence tracking, and analysis of how specific behaviors (like training) affect subsequent recovery and sleep.
Why This Matters in Practice
When you ask “Why did my sleep drop this week?”, the AI doesn’t just fetch your sleep data. It:
- Pulls sleep data for the current and previous week
- Sees the decline, then pulls activity, stress, HRV, environment, and nutrition data for the full period to look for correlations
- Identifies what was unusual on your worst sleep nights
- Synthesizes: “Your sleep score dropped 8 points this week. The data shows three contributing factors: your training volume was 40% higher than your 30-day average, your bedroom PM2.5 was elevated on Tuesday and Wednesday (likely the wildfire smoke), and your protein intake dropped by 25g/day — which correlates with reduced deep sleep in your historical data.”
That’s multiple tool calls, chained by the AI’s reasoning about what to investigate next. No pre-built query could anticipate that chain.
The Plugin Architecture
Every data source in Omnio is a plugin — a self-contained module that declares its capabilities and implements standardized data access methods.
We currently support plugins for Oura, Garmin, WHOOP, strength training apps (LiftLog, Hevy), nutrition trackers (MyFitnessPal, Cronometer), environment sensors (via Home Assistant), smart scales, DEXA scans, bloodwork, and manual entries.
Each plugin declares a capability map — structured metadata about what data types it provides and what fields are available. When the MCP server starts, it auto-discovers plugins — if you have Oura and Garmin configured, those plugins register. If you add WHOOP later, its data automatically appears in every cross-source analysis tool. The AI sees a unified dataset; it doesn’t need to know which device generated which metric.
The aggregation layer runs all plugin queries in parallel, merges results by date, and handles partial failures gracefully — if one source times out, the others still return.
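A minimal sketch of that aggregation layer, assuming a hypothetical `Plugin` interface: queries fan out in parallel, and a failed source becomes an exception object rather than sinking the whole request:

```python
import asyncio

class Plugin:
    """Sketch of a plugin: a name, a capability map, and a fetch method."""
    def __init__(self, name: str, capabilities: dict):
        self.name = name
        self.capabilities = capabilities  # e.g. {"sleep": ["score", "deep_minutes"]}

    async def fetch(self, metric: str, start: str, end: str) -> dict:
        # Real plugins query the time-series DB; here we return canned data.
        return {self.name: {"2024-06-01": 82}}

async def aggregate(plugins, metric, start, end):
    """Run all plugin queries in parallel; tolerate partial failures."""
    results = await asyncio.gather(
        *(p.fetch(metric, start, end) for p in plugins),
        return_exceptions=True,   # a timed-out source doesn't sink the rest
    )
    merged = {}
    for r in results:
        if not isinstance(r, Exception):
            merged.update(r)      # merge per-source results by name
    return merged

plugins = [Plugin("oura", {"sleep": ["score"]}), Plugin("garmin", {"sleep": ["score"]})]
data = asyncio.run(aggregate(plugins, "sleep", "2024-06-01", "2024-06-07"))
print(sorted(data))  # ['garmin', 'oura']
```

`return_exceptions=True` is what makes the partial-failure behavior explicit: every source either contributes data or is skipped, never blocks.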
Why Time-Series, Not Relational
Most health platforms store metric data in PostgreSQL (or worse, SQLite). We use a purpose-built time-series database designed for exactly this workload.
Why it matters:
- Native range queries — Aggregations, rollups, and range queries are first-class operations, not bolted-on SQL window functions
- Efficient storage — Compression ratios of 10–20x for health metrics mean we can store years of high-frequency data (heart rate every few seconds, sleep stages minute-by-minute) without cost concerns
- Downsampling — Automatic retention policies keep high-resolution data for recent periods and progressively aggregate older data
- Sub-second queries on 365-day ranges — Asking the AI to analyze a full year of correlations isn’t a theoretical feature; it’s a fast operation
- Label-based filtering — Sleep type, workout activity type, environment room, sensor source — all queryable via labels without schema changes
Every plugin’s data client speaks the database’s native query language. When the AI asks for 90 days of correlated data, it’s running parallel range queries across every configured source, not scanning relational tables.
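To make the downsampling idea concrete, here is the principle in plain Python: bucket high-frequency samples and keep one aggregate per bucket. The time-series database does this natively via retention policies; this sketch only illustrates the transformation:

```python
from datetime import datetime, timedelta
from statistics import mean

def downsample(samples, bucket: timedelta) -> dict:
    """Group (timestamp, value) samples into fixed buckets, keep the mean."""
    width = bucket.total_seconds()
    buckets: dict = {}
    for ts, value in samples:
        key = datetime.fromtimestamp(ts.timestamp() // width * width)
        buckets.setdefault(key, []).append(value)
    return {k: mean(v) for k, v in sorted(buckets.items())}

# Heart rate every 5 seconds -> one mean per minute
t0 = datetime(2024, 6, 1, 8, 0, 0)
raw = [(t0 + timedelta(seconds=5 * i), 60 + i) for i in range(24)]  # 2 minutes
rolled = downsample(raw, timedelta(minutes=1))
print(len(rolled))  # 2 buckets
```

Applied progressively (5-second data for a week, minute data for a year), this is how years of heart-rate history stay cheap to store and fast to query.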
Security Model: The AI Never Sees What It Shouldn’t
Health data is sensitive. Our architecture enforces strict boundaries:
Hardened System Prompts
The system prompt is structured into prioritized sections with explicit instructions that override any user attempts to manipulate behavior. User messages are delimited to prevent prompt injection.
Tool Result Sanitization
Every tool result passes through a sanitization boundary before reaching the LLM. Results are wrapped in clearly marked delimiters, and content is validated and size-limited to prevent context stuffing.
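A minimal sketch of such a boundary, with an illustrative size limit and delimiter format (not Omnio's actual values):

```python
MAX_RESULT_CHARS = 4000  # illustrative cap, not the production number

def sanitize_tool_result(tool_name: str, result: str) -> str:
    """Wrap a tool result in explicit delimiters and cap its size
    before it enters the LLM context."""
    truncated = result[:MAX_RESULT_CHARS]   # prevent context stuffing
    return (
        f"<tool_result name={tool_name!r}>\n"
        f"{truncated}\n"
        f"</tool_result>"
    )

wrapped = sanitize_tool_result("sleep_summary", '{"score": 82}')
print(wrapped.startswith("<tool_result"))  # True
```

The delimiters let the system prompt instruct the model to treat everything inside them as data, never as instructions.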
Per-User Data Scoping
The MCP server scopes every database query to the authenticated user. There’s no way for the AI — or a malicious prompt — to access another user’s data. The scoping happens at the query client level, below the tool layer.
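The key design point is that the user ID is injected where queries are built, not where tools decide what to fetch. A sketch, with hypothetical names:

```python
class ScopedQueryClient:
    """Every query carries the authenticated user's ID; tools never pass it."""
    def __init__(self, user_id: str):
        self._user_id = user_id

    def build_query(self, metric: str, start: str, end: str) -> dict:
        # The user label is attached here, below the tool layer, so no
        # prompt — malicious or not — can change whose data is fetched.
        return {"metric": metric, "start": start, "end": end,
                "labels": {"user_id": self._user_id}}

client = ScopedQueryClient(user_id="user-123")
q = client.build_query("sleep_score", "2024-06-01", "2024-06-07")
print(q["labels"])  # {'user_id': 'user-123'}
```

Because tools receive an already-scoped client, there is no code path where a tool argument could select a different user.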
Rate Limiting
Per-user daily limits on both messages and tool calls prevent abuse.
Sanitized Error Messages
If a tool fails, the user sees a safe generic message. The full stack trace is logged server-side for debugging — never exposed to the client or the LLM.
The Chat Orchestrator
The MCP server is one half of the system. The other half is the chat service — the orchestrator that manages the conversation loop between the user, the LLM, and the tools.
Multi-Provider Support
We support multiple LLM providers with a protocol-based abstraction that makes them interchangeable. All support streaming responses and function calling, with automatic retry and circuit-breaking for resilience.
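"Protocol-based abstraction" can be sketched with Python's structural typing: any adapter that implements the interface is interchangeable. The interface shape below is an assumption for illustration:

```python
from typing import Iterable, Protocol

class ChatProvider(Protocol):
    """Minimal interface every LLM provider adapter must satisfy."""
    def stream_chat(self, messages: list, tools: list) -> Iterable[str]: ...

class FakeProvider:
    """Stand-in provider; real adapters wrap vendor SDKs behind this shape."""
    def stream_chat(self, messages, tools):
        yield "Hello"
        yield " world"

def run(provider: ChatProvider) -> str:
    # The orchestrator only depends on the Protocol, never on a vendor SDK.
    return "".join(provider.stream_chat([{"role": "user", "content": "hi"}], []))

print(run(FakeProvider()))  # Hello world
```

Swapping providers then means writing one adapter class; retries and circuit-breaking wrap the same interface.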
Autonomous Investigation
When the AI decides it needs data, the chat service executes tool calls, feeds results back to the LLM for interpretation, and allows the AI to request additional tool calls based on what it finds. This means the AI can pursue multi-step investigations — it doesn’t stop after one query; it follows the data.
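That loop can be sketched as follows. The message shapes and the `max_steps` budget are hypothetical simplifications of a real function-calling API:

```python
def investigate(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Conversation loop: let the model request tools until it answers."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = llm(messages)              # either a final answer or a tool request
        if reply.get("tool") is None:
            return reply["content"]        # final synthesized answer
        result = tools[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    return "Investigation budget exhausted."

# Scripted "model": first asks for sleep data, then answers.
script = iter([
    {"tool": "get_sleep", "args": {"days": 7}},
    {"tool": None, "content": "Your sleep dropped after Tuesday's hard session."},
])
answer = investigate(lambda msgs: next(script),
                     {"get_sleep": lambda days: f"{days} days of sleep data"},
                     "Why did my sleep drop?")
print(answer)
```

The loop terminates either when the model stops requesting tools or when the step budget runs out — which is also where per-user tool-call limits naturally plug in.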
Persona System
The chat supports multiple personas — health coach (encouraging, action-oriented), clinical (objective, statistical), and casual (approachable, plain language) — each with carefully tuned system prompts. The health coach persona is specifically instructed to always explain why, not just what — connecting changes in one metric to correlated factors across other data sources.
Real-Time Streaming
Responses stream in real-time via Server-Sent Events. You see the AI thinking, see tool calls happening, and get the response word-by-word. A stop button lets you cancel generation mid-stream.
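For readers unfamiliar with SSE, the wire format is simple: named events with JSON payloads, separated by blank lines. A sketch of how the stream might frame tool calls and tokens (the event names are illustrative):

```python
import json

def sse_event(event_type: str, payload: dict) -> str:
    """Frame one Server-Sent Event as the browser's EventSource expects it."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"

# Illustrative stream: a tool-call notice, then word-by-word tokens
frames = [
    sse_event("tool_call", {"name": "sleep_summary"}),
    sse_event("token", {"text": "Your "}),
    sse_event("token", {"text": "sleep..."}),
]
print(frames[0].splitlines()[0])  # event: tool_call
```

Distinct event types are what let the UI render tool activity and answer text differently as they arrive.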
Observability: We Instrument Everything
Every layer of the system exposes metrics that we collect and visualize:
- Tool metrics — Call count, duration, error rate per tool
- Plugin metrics — Per-source call count and latency by capability type
- Database query metrics — Query count, duration, retries by query type
- Chat metrics — Messages, tool calls, token usage, latency, rate limit hits
- Infrastructure metrics — CPU, memory, disk I/O, network
When a user reports “the AI was slow answering my question,” we can trace it end-to-end — from chat latency to which tool calls were made to which plugin queries were slow to which database queries took longest. Full observability, not guesswork.
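The per-tool trio of count, duration, and error rate can be captured with a small decorator. This is a self-contained sketch, not our actual metrics client (which exports to a real collector):

```python
import time
from collections import defaultdict
from functools import wraps

METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_secs": 0.0})

def instrument(name: str):
    """Record call count, errors, and duration for one tool."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                METRICS[name]["calls"] += 1
                METRICS[name]["total_secs"] += time.perf_counter() - start
        return wrapper
    return decorator

@instrument("sleep_summary")
def sleep_summary(days: int) -> str:
    return f"summary over {days} days"

sleep_summary(7)
print(METRICS["sleep_summary"]["calls"])  # 1
```

The same pattern repeats at every layer — tools, plugins, database queries — which is what makes the end-to-end trace possible.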
What’s Next
The MCP server architecture is designed to grow. Adding a new data source means writing a single plugin — declare capabilities, implement the data methods, and every cross-source tool automatically includes the new data. We’re working on:
- Mobile app — Apple Health & Google Health Connect integration via mobile SDKs
- Fitbit and Polar device support
- Advanced analysis tools — multivariate analysis, seasonal decomposition, and ML-powered anomaly detection
- Proactive insights — scheduled analysis that surfaces notable changes before you ask
The health data space doesn’t need another dashboard that shows you numbers you already saw on your wrist. It needs infrastructure that turns fragmented data into understanding. That’s what we’re building.
Omnio is a health analytics platform that unifies wearable and health data with AI-powered insights. Learn more at getomn.io.
Related reading
- *Your AI Health Assistant Doesn't Know Who You Are* — Your AI health assistant analyzes your sleep, recovery, and training data — but it never sees your name, email, or account ID. Here's how we built privacy into the architecture itself.
- *Can Your Health AI Prove It Isn't Lying to You?* — We asked Oura, WHOOP, and Omnio the same research question. Two gave vague platitudes with zero citations. One gave verifiable evidence. Here's why the difference is architectural.
- *We Asked Oura, WHOOP, and Omnio the Same Sleep Question. Here's What Happened.* — Every major wearable now has an AI assistant. We asked all three to compare a week of sleep data. The difference in depth reveals a fundamental architectural gap.