How is a composite score different from Oura Readiness or Garmin Body Battery?

Functionally they are the same type of metric — weighted combinations of physiological inputs. The difference is transparency. Oura, Garmin, and WHOOP do not publish their formulas, so when the number drops you cannot tell whether HRV, sleep, body temperature, or training load caused it. A transparent composite score exposes the weights, the inputs, and the contribution of each factor so you can act on the right lever.

What makes a composite score actually trustworthy?

Trust comes from being able to decompose the number. You should be able to see which inputs contributed positively, which dragged the score down, and by how much. A score that tells you recovery dropped to 58 because HRV is 15 percent below your 30-day average — contributing minus 12 points — is traceable. A score that just shows a color and a number is an opinion you are asked to trust without evidence.

What happens to a composite score when some of my data is missing?

Most platforms either reuse the last known value or silently drop the missing input and reweight the remaining ones, producing a score that looks confident but is not. A better approach attaches a confidence indicator to every score. When inputs are fresh and complete, confidence is high. When data is stale or missing, confidence degrades visibly, so you know whether a 75 is actionable or a guess dressed up as a measurement.

What Is a Composite Health Score and Why Does It Matter?

Q: What is a composite health score?

A composite health score is a weighted combination of multiple physiological inputs collapsed into a single number. A typical recovery score might weight HRV at 40%, sleep quality at 30%, resting heart rate at 20%, and training load trend at 10%. The weights encode a model of what matters for your specific goal, which is why a powerlifter's readiness score should look very different from a marathon runner's.
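Written as a formula, each input x_i is already normalized to a 0 to 100 scale and the weights sum to 1, so the score stays on the same scale. The input values in the worked line are made up for illustration:

```latex
S = \sum_{i} w_i\, x_i, \qquad \sum_{i} w_i = 1, \qquad x_i \in [0, 100]

S = 0.40\,x_{\mathrm{HRV}} + 0.30\,x_{\mathrm{sleep}} + 0.20\,x_{\mathrm{RHR}} + 0.10\,x_{\mathrm{load}}

% Worked example with hypothetical normalized inputs 70, 40, 80, 60:
S = 0.40(70) + 0.30(40) + 0.20(80) + 0.10(60) = 28 + 12 + 16 + 6 = 62
```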

Your HRV is great. Your sleep was terrible. Should you train today?

The Problem with Single Metrics

Your Oura ring says your HRV is 15% above baseline. Good sign. But you slept four hours because your kid was sick. Your Garmin says your training load is “optimal.” But you’ve increased volume 40% in two weeks and your body hasn’t caught up yet. Your WHOOP recovery score is green. But it doesn’t know you skipped meals yesterday and your fasting glucose is climbing.

No single metric captures your full physiological state. They each measure one dimension of a multi-dimensional problem. And yet, most health platforms present them as independent numbers on a dashboard — a disconnected grid of green, yellow, and red indicators that you’re supposed to synthesize in your head. Apple Health gives you a wall of data. Google Fit gives you a wall of data. Neither synthesizes it into anything actionable. You get the ingredients but not the recipe.

This is the fundamental gap in wearable health technology. The sensors are good enough. The algorithms for individual metrics are solid. What’s missing is the layer that combines them into a single, contextual assessment of how your body is actually doing right now.

What a Composite Score Does

A composite score is a weighted combination of multiple inputs collapsed into a single number. The concept is straightforward. A recovery score might weight HRV at 40%, sleep quality at 30%, resting heart rate at 20%, and training load trend at 10%. The weights encode a model of what matters most for recovery — and that model can be tuned for different goals.

If you’re an endurance athlete in a build phase, you might want training load trend weighted more heavily. If you’re recovering from illness, sleep quality dominates. If you’re tapering for a race, HRV sensitivity matters more than volume.
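Here is a minimal Python sketch that scores one day of inputs under several goal profiles. Only the default profile uses the 40/30/20/10 split from the text. The other weights, the input names, and the normalized input values are hypothetical, chosen only to follow the priorities described above.

```python
# Goal profiles: weight per input. Only "default" comes from the text;
# the other weights are hypothetical illustrations of the priorities above.
PROFILES = {
    "default": {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10},
    "build_phase": {"hrv": 0.30, "sleep": 0.25, "rhr": 0.15, "load_trend": 0.30},
    "illness_recovery": {"hrv": 0.25, "sleep": 0.50, "rhr": 0.20, "load_trend": 0.05},
    "taper": {"hrv": 0.55, "sleep": 0.25, "rhr": 0.15, "load_trend": 0.05},
}


def composite(inputs: dict, weights: dict) -> float:
    """Weighted sum of inputs that are already normalized to 0-100."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    return sum(w * inputs[name] for name, w in weights.items())


# One day of hypothetical normalized inputs: decent HRV, poor sleep.
today = {"hrv": 70, "sleep": 40, "rhr": 80, "load_trend": 60}

for goal, weights in PROFILES.items():
    print(f"{goal:>16}: {composite(today, weights):.1f}")
```

With these made-up numbers, the illness-recovery profile scores the day lowest because poor sleep carries half the weight. The inputs did not change. Only the model of what matters did.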

The score itself isn’t magic. It’s a design decision. The weights determine what the number optimizes for, and different goals need different weights. A “readiness score” for a powerlifter looks nothing like a “readiness score” for a marathon runner. Pretending otherwise — which is what every one-size-fits-all wearable does — produces a number that’s too generic to act on.

What makes a composite score useful isn’t the math. It’s the transparency of the math. Which brings us to the problem.

Why Black-Box Scores Are Useless

Oura gives you a Readiness Score. Garmin gives you a Body Battery. WHOOP gives you a Recovery percentage. All three are composite scores. None of them show you the formula.

When your Oura Readiness Score drops from 85 to 62 overnight, you can’t tell whether it was driven by HRV, sleep, body temperature, or activity. You just see a number and a color. The app might surface a contributing factor — “your HRV was below average” — but it won’t tell you the weight, the threshold, or how that factor interacted with the others.

This matters because you can’t act on what you can’t understand. If your score dropped because of poor sleep, the intervention is obvious: fix your sleep. If it dropped because your training load has been climbing for three weeks, the intervention is a deload. If it dropped because of a temperature spike that might indicate early illness, the intervention is rest and monitoring. These are three completely different responses to the same “your score is low” notification.
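The following is a minimal sketch of that triage logic. It assumes the score already comes with per-input point contributions, where a negative value means the input pulled the score down. The input names, contribution values, and mapping to interventions are hypothetical and simply restate the three cases above.

```python
from typing import Optional

# Hypothetical mapping from the biggest drag on the score to a response.
INTERVENTIONS = {
    "sleep": "Fix sleep",
    "load_trend": "Deload",
    "temperature": "Rest and monitor for illness",
}


def biggest_drag(contributions: dict) -> Optional[str]:
    """Return the input with the most negative contribution, or None."""
    if not contributions:
        return None
    name, points = min(contributions.items(), key=lambda kv: kv[1])
    return name if points < 0 else None


def recommend(contributions: dict) -> str:
    """Map the dominant negative driver to a response."""
    driver = biggest_drag(contributions)
    if driver is None:
        return "No input is dragging the score down"
    return INTERVENTIONS.get(driver, f"Look into {driver}")


# Three days with a similar total drop but different causes.
days = [
    {"hrv": -2, "sleep": -14, "load_trend": 1, "temperature": 0},
    {"hrv": -5, "sleep": 1, "load_trend": -10, "temperature": 0},
    {"hrv": -3, "sleep": -2, "load_trend": 0, "temperature": -11},
]
for day in days:
    print(sum(day.values()), recommend(day))
```

A black-box score would hand you only the totals (-15, -14, -16). The per-input breakdown is what lets the code pick a different response each day.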

Black-box scores optimize for simplicity at the expense of utility. They feel smart because they reduce complexity to a single number. But they’re actually less useful than the raw metrics they’re composed from — because at least with raw metrics, you know what you’re looking at.

There’s also a practical consistency problem. Individual physiological inputs — HRV, sleep stages, resting heart rate — are grounded in decades of peer-reviewed physiology. But the way each vendor combines those inputs into a single recovery or readiness score is proprietary and undocumented, and the same underlying data can yield very different composite scores across devices. The black box introduces noise on top of signals that were fine on their own.

How Omnio Builds Transparent Scores

Omnio’s scoring engine takes the opposite approach. Every composite score is defined in YAML configuration: human-readable, version-controlled, auditable. The engine uses pluggable normalizers to map raw inputs to a 0–100 scale, then applies a weighted combination, evaluated in topological dependency order so that scores can depend on other scores.
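The following Python sketch shows the general shape of such an engine. It is not Omnio's code. The config schema, normalizer names, parameter values, and weights are all hypothetical, and a plain dict stands in for the parsed YAML file. Dependency ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), so a score like readiness can consume recovery once recovery has been computed.

```python
from graphlib import TopologicalSorter


# Pluggable normalizers: each maps a raw value onto a 0-100 scale.
def linear(value, lo, hi):
    """Rescale so lo -> 0 and hi -> 100, clamped. Pass lo > hi to invert."""
    frac = (value - lo) / (hi - lo)
    return 100 * min(1.0, max(0.0, frac))


def baseline_ratio(value, baseline, spread=0.5):
    """50 at baseline; +/- spread (as a fraction of baseline) maps to 100/0."""
    return linear(value / baseline, 1 - spread, 1 + spread)


def identity(value):
    """For inputs that are already scores on 0-100."""
    return value


NORMALIZERS = {"linear": linear, "baseline_ratio": baseline_ratio, "identity": identity}

# Stand-in for a parsed YAML file; schema and numbers are hypothetical.
CONFIG = {
    "readiness": {
        "inputs": {
            "recovery": {"weight": 0.7, "normalizer": "identity"},
            # Week-over-week volume increase: 0% -> 100, 50% or more -> 0.
            "load_increase": {"weight": 0.3, "normalizer": "linear",
                              "params": {"lo": 0.5, "hi": 0.0}},
        }
    },
    "recovery": {
        "inputs": {
            "hrv_ms": {"weight": 0.5, "normalizer": "baseline_ratio",
                       "params": {"baseline": 60}},
            "sleep_h": {"weight": 0.3, "normalizer": "linear",
                        "params": {"lo": 4, "hi": 9}},
            "resting_hr": {"weight": 0.2, "normalizer": "linear",
                           "params": {"lo": 70, "hi": 45}},
        }
    },
}


def evaluate(config, raw):
    """Compute every score, dependencies first, from raw measurements."""
    # A score depends on any of its inputs that is itself a score.
    graph = {name: {i for i in spec["inputs"] if i in config}
             for name, spec in config.items()}
    values = dict(raw)
    for name in TopologicalSorter(graph).static_order():
        total = 0.0
        for inp, cfg in config[name]["inputs"].items():
            normalize = NORMALIZERS[cfg["normalizer"]]
            total += cfg["weight"] * normalize(values[inp], **cfg.get("params", {}))
        values[name] = total
    return {name: values[name] for name in config}


print(evaluate(CONFIG, {"hrv_ms": 51, "sleep_h": 6.5, "resting_hr": 55,
                        "load_increase": 0.4}))
```

Readiness is declared before recovery on purpose, and the topological sort still evaluates recovery first. A circular definition, such as two scores that each consume the other, makes `static_order()` raise `graphlib.CycleError` instead of producing a number.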

Here’s what that means in practice. When your recovery score drops, Omnio tells you exactly why: “Your recovery dropped to 58 because HRV is 15% below your 30-day average (contributing -12 points), which correlates with the 40% training load increase this week.” Not a vague suggestion. A specific, traceable decomposition.

Every score shows its breakdown. You can see which inputs contributed positively, which dragged the score down, and by how much. You can trace the causal chain: training load spiked → HRV suppressed → recovery dropped → readiness gated your next workout. The system isn’t just giving you a number. It’s showing its reasoning. See how this works in practice on our composite health scores page.
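For a weighted sum, that breakdown can be computed exactly. Each input's contribution is its weight multiplied by how far its normalized value moved from a reference. Here, the reference is taken to be the input's 30-day baseline on the same 0 to 100 scale, and the contributions add up to the gap between today's score and the baseline score. In this minimal Python sketch, the normalized values are hypothetical and were chosen so that the HRV line reproduces the -12 points from the example (a 30-point normalized drop multiplied by a 0.40 weight). Whether Omnio measures contributions against the same reference is an assumption.

```python
WEIGHTS = {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10}

# Hypothetical normalized (0-100) inputs: 30-day baseline vs. today.
BASELINE = {"hrv": 75, "sleep": 70, "rhr": 70, "load_trend": 70}
TODAY = {"hrv": 45, "sleep": 70, "rhr": 70, "load_trend": 50}


def score(inputs, weights):
    """Weighted sum of normalized inputs."""
    return sum(w * inputs[k] for k, w in weights.items())


def decompose(today, baseline, weights):
    """Points each input adds or removes relative to the baseline score."""
    return {k: w * (today[k] - baseline[k]) for k, w in weights.items()}


def explain(today, baseline, weights):
    parts = decompose(today, baseline, weights)
    lines = [f"Score {score(today, weights):.0f} "
             f"(baseline {score(baseline, weights):.0f})"]
    # Largest drags first.
    for name, pts in sorted(parts.items(), key=lambda kv: kv[1]):
        lines.append(f"  {name:<10} {pts:+.1f} points")
    return "\n".join(lines)


print(explain(TODAY, BASELINE, WEIGHTS))
```

Because the score is linear in its inputs, nothing is left unexplained. The -12 for HRV and the -2 for training load account for the full 14-point drop from 72 to 58.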

The Confidence Problem

Here’s something no other platform addresses: what happens when the data is incomplete?

If your Garmin hasn’t synced in three days, your training load input is stale. If your Oura died overnight, your sleep data is missing. If you haven’t logged blood work in six months, any biomarker-dependent score is guessing.

Most platforms handle this by either using the last known value (which might be days or weeks old) or silently dropping the input and reweighting the remaining factors. Both approaches produce a score that looks confident but isn’t. You see “Recovery: 82” and have no idea that half the inputs are stale.
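A small Python sketch of both strategies makes the problem visible. The weights are the example 40/30/20/10 split. The input values and the scenario (sleep data missing, training-load trend last synced days ago) are hypothetical. Each function returns a bare number with nothing to indicate that half the inputs were missing or stale.

```python
WEIGHTS = {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10}


def drop_and_reweight(inputs, weights):
    """Ignore missing inputs and rescale the remaining weights to sum to 1."""
    present = {k: w for k, w in weights.items() if inputs.get(k) is not None}
    total = sum(present.values())
    return sum(w / total * inputs[k] for k, w in present.items())


def carry_forward(inputs, last_known, weights):
    """Fill each missing input with its last known value, however old."""
    filled = {k: inputs[k] if inputs.get(k) is not None else last_known[k]
              for k in weights}
    return sum(w * filled[k] for k, w in weights.items())


# Sleep data missing; training-load trend last synced days ago.
today = {"hrv": 85, "sleep": None, "rhr": 80, "load_trend": None}
last_known = {"sleep": 80, "load_trend": 75}  # values that are days old

print(f"Recovery: {drop_and_reweight(today, WEIGHTS):.0f}")
print(f"Recovery: {carry_forward(today, last_known, WEIGHTS):.0f}")
```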

Omnio includes a confidence indicator on every composite score. When inputs are fresh and complete, confidence is high. When data is missing or stale, confidence degrades visibly. A score of 75 with 95% confidence means something very different from a score of 75 with 40% confidence. The first is actionable. The second is a guess dressed up as a measurement.
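The following Python sketch shows one simple way to produce such an indicator, under stated assumptions. Each input's freshness halves every fixed half-life and is zero when the input is missing. Confidence is the weight-weighted average of freshness. The half-life values are hypothetical, and this heuristic is an illustration rather than Omnio's actual method.

```python
from typing import Optional

WEIGHTS = {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10}
# Hypothetical half-lives in hours: how fast each input goes stale.
HALF_LIFE_H = {"hrv": 24, "sleep": 24, "rhr": 48, "load_trend": 72}


def freshness(age_hours: Optional[float], half_life: float) -> float:
    """1.0 for brand-new data, halving every half-life; 0.0 if missing."""
    if age_hours is None:
        return 0.0
    return 0.5 ** (age_hours / half_life)


def confidence(ages: dict, weights: dict = WEIGHTS) -> float:
    """Weight-weighted freshness of all inputs, as a percentage."""
    num = sum(w * freshness(ages.get(k), HALF_LIFE_H[k]) for k, w in weights.items())
    return 100 * num / sum(weights.values())


fresh = {"hrv": 2, "sleep": 2, "rhr": 2, "load_trend": 2}      # synced 2 h ago
stale = {"hrv": 6, "sleep": None, "rhr": 6, "load_trend": 72}  # sleep missing
print(f"fresh: {confidence(fresh):.0f}%  stale: {confidence(stale):.0f}%")
```

Reporting this value next to the score (for example, "75, 57% confidence") lets the reader tell the actionable case from the guess without changing the score itself.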

This is borrowed directly from how Bayesian systems handle uncertainty — and we’ve written about that approach in detail in our post on the Bayesian training engine. The principle is the same: honest uncertainty is more useful than false precision.
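This Python sketch illustrates the principle only and is not the Bayesian training engine itself. It treats each normalized input as a noisy estimate whose variance grows the longer the input goes unmeasured, as in the prediction step of a simple random-walk (local-level) model. It then propagates those variances through the weighted sum, assuming independent Gaussian errors. All variance parameters are hypothetical. Two days can produce the same score of 75 with very different intervals.

```python
import math

WEIGHTS = {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10}
# Hypothetical noise parameters, in squared normalized points.
MEASUREMENT_VAR = {"hrv": 25.0, "sleep": 16.0, "rhr": 9.0, "load_trend": 16.0}
DRIFT_VAR_PER_DAY = {"hrv": 36.0, "sleep": 49.0, "rhr": 9.0, "load_trend": 25.0}


def input_variance(name, days_since_measured):
    """Random-walk model: uncertainty grows linearly with time unobserved."""
    return MEASUREMENT_VAR[name] + DRIFT_VAR_PER_DAY[name] * days_since_measured


def score_with_interval(values, ages_days, weights=WEIGHTS):
    """Weighted score plus an approximate 95% interval.

    Assumes independent Gaussian errors, so Var(sum w_i x_i) = sum w_i^2 Var(x_i).
    """
    score = sum(w * values[k] for k, w in weights.items())
    var = sum(w ** 2 * input_variance(k, ages_days[k]) for k, w in weights.items())
    half_width = 1.96 * math.sqrt(var)
    return score, score - half_width, score + half_width


values = {"hrv": 80, "sleep": 70, "rhr": 75, "load_trend": 70}
for label, ages in [("fresh", {"hrv": 0, "sleep": 0, "rhr": 0, "load_trend": 0}),
                    ("stale", {"hrv": 0, "sleep": 3, "rhr": 3, "load_trend": 3})]:
    s, lo, hi = score_with_interval(values, ages)
    print(f"{label}: {s:.0f} (95% interval {lo:.0f} to {hi:.0f})")
```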

Building Your Own Scores

The default scores in Omnio — recovery, readiness, training load, sleep quality — are designed to work well for most people. But “most people” might not be you.

Maybe you’re a shift worker and sleep timing matters more than sleep duration. Maybe you’re managing a chronic condition and CRP trends should be weighted into your daily readiness. Maybe you’re prepping for a race and want a “race readiness” score that factors in taper progression, carbohydrate loading, sleep debt, and HRV trend.

Omnio’s scoring engine is configurable. The YAML definitions that drive every score are the same ones you can modify. Want to weight sleep at 50% instead of 30%? Change one number. Want to create an entirely new composite score that combines inputs no one else has thought to combine? Define it, set the weights, and the engine handles the rest — normalization, dependency resolution, confidence tracking, all of it.
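The following Python sketch assumes that weights must sum to 1 and that, when one weight changes, the others are rescaled proportionally. Whether Omnio performs that rescaling automatically is an assumption either way. The race-readiness input names and weights are hypothetical placeholders for the inputs listed above.

```python
DEFAULT_RECOVERY = {"hrv": 0.40, "sleep": 0.30, "rhr": 0.20, "load_trend": 0.10}


def set_weight(weights, name, new_weight):
    """Set one weight and rescale the rest proportionally so all sum to 1."""
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    others = {k: w for k, w in weights.items() if k != name}
    scale = (1.0 - new_weight) / sum(others.values())
    return {**{k: w * scale for k, w in others.items()}, name: new_weight}


def check(weights):
    """Reject score definitions whose weights do not sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1")
    return weights


# Sleep at 50% instead of 30%.
sleep_heavy = check(set_weight(DEFAULT_RECOVERY, "sleep", 0.50))

# A new composite from inputs the defaults don't combine (hypothetical weights).
race_readiness = check({
    "taper_progress": 0.30,
    "carb_loading": 0.20,
    "sleep_debt": 0.25,
    "hrv_trend": 0.25,
})

print({k: round(w, 3) for k, w in sleep_heavy.items()})
```

Proportional rescaling keeps HRV weighted exactly twice as heavily as resting heart rate after the change. Taking the extra weight from a single named input instead is an equally valid design choice.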

This isn’t a theoretical feature. It’s how we build our own scores internally. The same engine, the same configuration format, the same runtime. We don’t have a special internal version with more knobs. What ships is what we use.

The Bet

We’re betting that health scores should be tools, not oracles. That the value of a composite metric comes from understanding it, not just reading it. That a number you can decompose, question, and reconfigure is worth more than a number you’re told to trust.

Every wearable company is building toward the same thing: a single daily score that tells you what to do. The difference is whether you can see inside it. We think you should.