Which Wearable Is Most Accurate? What 17 Validation Studies Actually Found

We reviewed 17 peer-reviewed studies comparing Oura, Apple Watch, Fitbit, Garmin, WHOOP, and Samsung across sleep, heart rate, HRV, SpO2, VO2 max, and more. Here are the specific numbers.

Mac DeCourcy · Updated April 3, 2026

Most “which wearable is best?” articles give you the same non-answer: it depends on what you value. Then they link affiliate products and move on.

We think you deserve the actual data. We dug into 17 peer-reviewed validation studies — three independent sleep studies using polysomnography, a 536-night HRV comparison against chest-strap ECG, multi-device SpO2 and VO2 max validations, and several manufacturer-funded studies where the funding itself is part of the story.

What we found is that no device wins everywhere, but the gaps between devices are larger than most people realize — and which device leads depends entirely on which metric you care about. Every study’s funding is disclosed below so you can judge the evidence yourself.


How to Read These Studies

Before the data, three things determine whether a study result means anything:

  1. Reference standard — Sleep studies should use polysomnography (PSG). Heart rate should compare against ECG (e.g., Polar H10). VO2 max needs indirect calorimetry. If a study uses self-report as the reference, treat results skeptically.

  2. Funding and affiliation — An Oura-funded study ranking Oura first for sleep doesn’t mean the data is wrong, but independent studies finding different rankings should carry more weight. We flag funding for every study below.

  3. Error metric — Correlation alone is misleading. A device can correlate well with PSG but still be biased by 30+ minutes. Look for mean absolute error (MAE), bias, concordance correlation coefficient (CCC), and Cohen’s kappa (κ).
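These metrics are straightforward to compute. A minimal sketch with made-up total-sleep-time readings (not data from any study) shows how bias, MAE, and Lin's CCC differ:

```python
# Hypothetical total-sleep-time readings (minutes): wearable vs. PSG reference.
device = [412, 388, 430, 401, 455, 390]
psg    = [400, 395, 410, 399, 440, 385]

n = len(device)
bias = sum(d - p for d, p in zip(device, psg)) / n       # signed mean error
mae  = sum(abs(d - p) for d, p in zip(device, psg)) / n  # mean absolute error

# Lin's concordance correlation coefficient (CCC): penalizes scatter
# AND systematic offset, unlike plain Pearson correlation.
mean_d, mean_p = sum(device) / n, sum(psg) / n
var_d = sum((x - mean_d) ** 2 for x in device) / n
var_p = sum((x - mean_p) ** 2 for x in psg) / n
cov   = sum((d - mean_d) * (p - mean_p) for d, p in zip(device, psg)) / n
ccc   = 2 * cov / (var_d + var_p + (mean_d - mean_p) ** 2)

print(f"bias={bias:+.1f} min, MAE={mae:.1f} min, CCC={ccc:.2f}")
```

Note how a device that always reads 30 minutes high can still have a high Pearson correlation, while its CCC drops because of the offset term in the denominator. That is why the HRV studies later in this article report CCC.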


1. Sleep Staging (4-Stage Classification)

Classifying sleep into light, deep, REM, and wake stages is where wearables struggle most — and where study funding has the most visible impact on results. Three major studies compared devices against polysomnography (PSG), and they produced meaningfully different rankings:

Robbins et al. (2024) — Oura-funded

36 participants, multiple nights, Brigham and Women’s Hospital. Funded by Oura Ring Inc. Lead author Dr. Rebecca Robbins is an Oura scientific advisor.

| Device | Cohen's κ | Notes |
|---|---|---|
| Oura Ring Gen 3 | 0.65 (Substantial) | No significant over/underestimation of any sleep stage |
| Apple Watch Series 8 | 0.60 (Moderate) | Overestimated light sleep by 45 min, underestimated deep sleep by 43 min |
| Fitbit Sense 2 | 0.55 (Moderate) | Moderate accuracy overall |

Park et al. (2023) — Independent

75 participants, 2 centers in Korea, 349,114 epochs analyzed. No industry funding disclosed. With 10× the epoch count and no manufacturer involvement, this study produced notably different rankings — Oura dropped from first to last:

| Device | Cohen's κ |
|---|---|
| Google Pixel Watch | 0.4–0.6 (Moderate) |
| Galaxy Watch 5 | 0.4–0.6 (Moderate) |
| Fitbit Sense 2 | 0.4–0.6 (Moderate) |
| Apple Watch 8 | 0.2–0.4 (Fair) |
| Oura Ring 3 | 0.2–0.4 (Fair) |

Schyvens et al. (2025) — Independent

62 adults, single-night PSG, University of Antwerp. Funded by VLAIO (Flanders Innovation & Entrepreneurship) — no device manufacturer funding.

| Device | Cohen's κ (4-stage) | TST Bias | Deep Sleep Correct | REM Correct | Wake Specificity |
|---|---|---|---|---|---|
| Apple Watch Series 8 | 0.53 (Moderate) | +19.6 min | 50.7% | 68.6% | 52.2% |
| Fitbit Sense | 0.42 (Moderate) | +6.3 min | 48.3% | 55.5% | 39.2% |
| Fitbit Charge 5 | 0.41 (Moderate) | +11.1 min | 43.3% | 47.5% | 42.7% |
| WHOOP 4.0 | 0.37 (Fair) | +24.5 min | 69.6% | 62.0% | 32.5% |
| Withings Scanwatch | 0.22 (Fair) | +39.9 min | 29.8% | 36.5% | 29.4% |
| Garmin Vivosmart 4 | 0.21 (Fair) | +38.4 min | 32.1% | 28.7% | 27.6% |

Clinically acceptable = <30 min TST bias. Only Apple Watch, Fitbit Sense, and Fitbit Charge 5 met this threshold. The Schyvens data reveals a universal pattern: every device defaults to “light sleep” when uncertain, systematically inflating light sleep totals and underestimating how often you actually wake up (by 12–48 minutes).

WHOOP Sleep Staging — University of Arizona (2020)

WHOOP achieved 89% agreement on 2-stage classification (sleep vs. wake) but only 64% on 4-stage classification (light/deep/REM/wake), with κ=0.47 against PSG. That’s moderate agreement — better than chance but far from clinical grade.
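Cohen's κ, the agreement statistic quoted throughout these sleep studies, measures how much device–PSG agreement exceeds what the two raters' label frequencies alone would produce by chance. A minimal sketch with invented 30-second epoch labels (not data from any study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

psg    = ["light", "light", "deep", "deep", "rem", "wake", "light", "rem"]
device = ["light", "light", "deep", "light", "rem", "light", "light", "rem"]
print(round(cohens_kappa(device, psg), 2))  # -> 0.63
```

Raw agreement in this toy example is 75%, yet κ is only 0.63, because both raters label most epochs "light" and some of that agreement would occur by chance. This is exactly why the studies above report κ instead of percent agreement.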

Deep Sleep Detection Sensitivity — Robbins et al. (2024), Oura-funded

| Device | Sensitivity | Bias |
|---|---|---|
| Oura Ring Gen 3 | 79.5% | No significant bias |
| Fitbit Sense 2 | 61.7% | −15 min (underestimates) |
| Apple Watch Series 8 | 50.5% | −43 min (underestimates) |

Wake Detection Sensitivity — Robbins et al. (2024) & Chinoy et al. (2022)

| Device | Sensitivity |
|---|---|
| Oura Ring Gen 3 | 68.6% |
| Fitbit Sense 2 | 67.7% |
| Apple Watch Series 8 | 52.4% |
| Garmin Vivosmart 4 | 27% |

Practical takeaway: Single-night stage breakdowns are unreliable from any device. Use multi-night averages for sleep duration trends. If you frequently wake during the night, know that your wearable is almost certainly undercounting those awakenings — regardless of brand.


2. Nocturnal Heart Rate Variability (HRV)

Dial et al. (2025) — Independent

13 participants, 536 nights, Ohio State University / Air Force Research Lab. No industry funding disclosed. Reference: Polar H10 chest strap.

Nocturnal HRV is where Oura’s ring form factor pays off — continuous finger-based PPG during sleep produces a cleaner signal than wrist-based sensors fighting motion noise. The study measured concordance correlation coefficients (CCC) for nocturnal RMSSD:

| Device | CCC | MAPE | Rating |
|---|---|---|---|
| Oura Ring Gen 4 | 0.99 | 5.96% ± 5.12% | Nearly Perfect |
| Oura Ring Gen 3 | 0.97 | 7.15% ± 5.48% | Substantial |
| WHOOP 4.0 | 0.94 | 8.17% ± 10.49% | Moderate |
| Garmin Fenix 6 | 0.87 | 10.52% ± 8.63% | Poor |
| Polar Grit X Pro | 0.82 | 16.32% ± 24.39% | Poor |
CCC scale: >0.99 = Nearly Perfect, 0.95–0.99 = Substantial, 0.90–0.95 = Moderate, <0.90 = Poor.

Important caveat: The Garmin Fenix 6 tested is 2+ generations old. Current Garmin devices (Fenix 7/8, Forerunner 265/965) may perform differently. The study authors acknowledged this limitation.

Practical takeaway: If nocturnal HRV is your primary recovery metric, Oura Gen 4 is the clear leader. But never compare HRV numbers across devices — the algorithms, sampling windows, and artifact filtering are all different. Pick one device and track your own trend over time.


3. Resting Heart Rate (RHR)

Dial et al. (2025) — Same study as above

| Device | CCC | MAPE | Rating |
|---|---|---|---|
| Oura Ring Gen 4 | 0.98 | 1.94% ± 2.51% | Nearly Perfect |
| Oura Ring Gen 3 | 0.97 | 1.67% ± 1.54% | Substantial |
| WHOOP 4.0 | 0.91 | 3.00% ± 2.15% | Moderate |
| Polar Grit X Pro | 0.86 | 2.71% ± 2.75% | Poor |

Note: Garmin Fenix 6 was excluded from RHR analysis due to timestamp reporting issues that prevented alignment with the Polar H10 reference data.

Resting HR is the one metric where you can mostly trust any device. Even the weakest performer (Polar, CCC 0.86) stays under 3% average error. This is the easiest measurement for an optical sensor — at rest, blood flow is steady, motion artifact is near zero, and the signal-to-noise ratio is high.

Practical takeaway: If your resting HR suddenly jumps 5–10 bpm, it likely reflects a real physiological change (illness, stress, overtraining) regardless of which device you’re wearing. This is the most actionable metric across all wearables.
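That takeaway reduces to a simple rule: compare today's resting HR against your own trailing average rather than population norms. The function name, 5 bpm threshold, and data below are illustrative only, not from any study or vendor API:

```python
from statistics import mean

def rhr_alert(history, today, window=7, threshold_bpm=5):
    """Flag today's resting HR if it exceeds the trailing
    `window`-day average by at least `threshold_bpm`."""
    baseline = mean(history[-window:])
    return today - baseline >= threshold_bpm

week = [52, 53, 51, 52, 54, 52, 53]   # made-up resting HR values, bpm
print(rhr_alert(week, today=60))      # ~7.6 bpm above baseline -> True
print(rhr_alert(week, today=54))      # ordinary day-to-day noise -> False
```

Because every device in the table above stays under 3% average error at rest, a jump this size is far larger than measurement noise on any of them.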


4. Active Heart Rate

WellnessPulse Meta-Analysis (2025)

Active heart rate accuracy (percentage of readings within acceptable error):

| Device | Accuracy |
|---|---|
| Apple Watch | 86.31% |
| Fitbit | 73.56% |
| Garmin | 67.73% |
| TomTom | 67.63% |

Heart rate correlation vs. ECG during activity (WellnessPulse / PubMed Central aggregate):

| Device | Correlation (r) |
|---|---|
| Polar Chest Strap | 0.99 |
| Apple Watch | 0.80 |
| Garmin | 0.52 |

The gap between resting and active HR accuracy is striking. Apple Watch drops from near-perfect resting agreement to 86% accuracy and r=0.80 during exercise — and it’s still the best wrist device. Garmin’s r=0.52 during activity means its readings are barely correlated with actual heart rate — functionally useless for pacing decisions.

The physics explains this: wrist-based optical sensors measure blood volume changes through skin. During exercise, motion artifact, sweat, and reduced peripheral blood flow all degrade the signal. Activities with grip pressure (cycling, rowing, lifting) or rapid arm movement (boxing, CrossFit) are the worst cases.

Practical takeaway: For zone-based cardio training where HR accuracy matters, a chest strap (r=0.99) is in a different league from any wrist sensor. For steady-state running or walking, Apple Watch is adequate.


5. Blood Oxygen (SpO2)

Various validation studies (PLOS, Nature, etc.)

| Device | MAE | MDE | RMSE |
|---|---|---|---|
| Apple Watch Series 7 | 2.2% | −0.4% | 2.9% |
| Garmin Fenix 6 Pro | ~4.5% | | |
| Withings ScanWatch | ~4.8% | | |
| Garmin Venu 2s | 5.8% | 5.5% | 6.7% |

| Device | Within Range | Underestimate | Missing Data |
|---|---|---|---|
| Apple Watch Series 7 | 58.3% | 24.3% | 11% |
| Garmin Fenix 6 Pro | ~44% | ~28% | 28% |
| Withings ScanWatch | ~38% | ~31% | 31% |
| Garmin Venu 2s | 18.5% | 67.4% | 14% |

These numbers are sobering. The best consumer device (Apple Watch) is only “within range” 58% of the time — meaning it’s wrong in some direction for 4 out of 10 readings. The Garmin Venu 2s underestimates SpO2 in two-thirds of readings and misses data entirely 14% of the time. None of these are FDA-cleared for SpO2 — Apple Watch included.

Practical takeaway: Consumer SpO2 is useful for detecting trends (altitude adaptation, possible sleep apnea patterns over weeks) but should never inform a medical decision. If you see consistently low readings, get a medical-grade pulse oximeter before worrying.


6. Step Count

WellnessPulse Meta-Analysis (2025)

| Device | Accuracy |
|---|---|
| Garmin | 82.58% |
| Apple Watch | 81.07% |
| Fitbit | 77.29% |
| Jawbone | 57.91% |
| Polar | 53.21% |
| Oura Ring | Poor (50.3% error real-world, 4.8% controlled) |

Additional MAPE data:

| Device | MAPE |
|---|---|
| Garmin Vivoactive 4 | <2% |
| Fitbit Sense | ~8% |

Step counting is the most commoditized metric — Garmin and Apple Watch are within 1.5% of each other, and even Fitbit’s 77% is serviceable. The real outlier is Oura, which was never designed for step detection (a finger doesn’t swing like a wrist during walking). Edge cases that degrade all devices: slow gait, pushing a stroller or cart, walking with a cane, and arm-intensive activities that get misclassified as steps.

Practical takeaway: For step-based activity goals, any major wrist device works. Don’t obsess over daily counts — look at 7-day rolling averages to smooth out noise.
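The smoothing suggested above is just a trailing mean. A minimal sketch with invented step counts:

```python
def rolling_avg(values, window=7):
    """Trailing `window`-day averages; the first value appears once
    a full window of data exists."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

steps = [9800, 4200, 12100, 7600, 8900, 3100, 10400, 9900, 5200, 11700]
smoothed = rolling_avg(steps)
print([round(s) for s in smoothed])
```

Daily values here swing by thousands of steps, while the rolling averages vary by only a couple hundred. The trend line, not the daily number, is the signal worth watching.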


7. Calories / Energy Expenditure

WellnessPulse Meta-Analysis (2025)

| Device | Accuracy |
|---|---|
| Apple Watch | 71.02% |
| Fitbit | 65.57% |
| Polar | ~50–65% |
| Garmin | 48.05% |

Oura Ring reports ~87% accuracy (13% average error), but this figure reflects resting/basal metabolic estimates — a fundamentally easier problem than tracking active calorie burn. It’s not directly comparable to the active-exercise accuracy figures above.

This is the weakest metric category overall. Apple Watch leads at 71%, which still means nearly a third of calorie readings are off by a meaningful amount. Garmin’s 48% during activity is essentially no better than guessing. All devices use HR + accelerometer data fed into proprietary algorithms, and accuracy drops further during high-intensity or multi-modal exercise.

Practical takeaway: No consumer wearable is a reliable calorie counter. For body composition goals, track nutrition intake and scale trends — don’t trust the burn number on your wrist.


8. VO2 Max Estimation

Caserman et al. (2024), Lambe et al. (2025), Garmin validations

| Device | MAPE | MAE | Bias / Notes |
|---|---|---|---|
| Garmin Forerunner 245 | 5.7% | | Acceptable for runners |
| Garmin Fenix 6 | 7.05% | | CCC=0.73 for 30s averages |
| Apple Watch Series 7 | 15.79% | 6.07 ml/kg/min | Underestimates |
| Apple Watch (2025 study) | 13.31% | 6.92 ml/kg/min | Mixed |

Garmin’s advantage here is substantial — roughly half the error rate of Apple Watch (MAPE 5.7–7% vs. 13–16%). Apple Watch’s MAE of 6–7 ml/kg/min is significant for a metric that typically ranges 30–60 ml/kg/min — that’s a 10–20% relative error. Both devices share a systematic bias: they pull everyone toward the population mean, overestimating sedentary users and underestimating athletes. If your true VO2 max is 55+, expect your watch to lowball it.

Practical takeaway: For VO2 max trending, Garmin is the clear winner. But even Garmin’s number is an estimate — useful for tracking your own trajectory over months, not for comparing against someone else’s device or a lab result.


9. Skin Temperature

Oura Internal Validation (2024)

16 participants, 1 week, 93,571 data points. This is Oura’s own study, not independently peer-reviewed.

| Device | Lab Accuracy | Real-World Accuracy | Precision |
|---|---|---|---|
| Oura Ring | r² > 0.99 | r² > 0.92 | ±0.13°C (0.234°F) per minute |

Independent menstrual cycle tracking studies (Maijala et al., 2019) have validated the utility of nocturnal finger skin temperature for cycle phase detection.

Apple Watch, Garmin, WHOOP, and Samsung all track skin temperature, but there is little independent head-to-head validation data comparing their accuracy, which is why this metric is excluded from the master summary.

Practical takeaway: Skin temperature trends are useful for cycle tracking and illness detection. Oura has the most published data here, but independent comparative studies are still lacking.


10. Respiratory Rate

Respiratory rate is the least validated metric across all consumer wearables. Most manufacturers claim to track it, but independent comparative studies are essentially nonexistent.

Samsung has published validation data (Park et al., 2023 — Samsung-funded), but cross-device comparisons don’t exist in the literature.

Practical takeaway: Treat respiratory rate as experimental. If a sudden change correlates with other signals (elevated RHR, poor sleep), it may be worth noting, but don’t rely on it in isolation.


11. FDA-Cleared Features

| Feature | Device | Status |
|---|---|---|
| ECG / Atrial Fibrillation Detection | Apple Watch (Series 4+) | FDA Cleared |
| ECG / Atrial Fibrillation Detection | Samsung Galaxy Watch (4+) | FDA Cleared |
| Sleep Apnea Notification | Apple Watch (Series 9+, Ultra 2) | FDA Authorized |
| Sleep Apnea Detection | Samsung Galaxy Watch | FDA De Novo Authorized (Feb 2024) |
| Blood Oxygen (SpO2) | Apple Watch | Wellness feature (not FDA cleared) |
| Irregular Rhythm Notification | Fitbit | FDA Cleared |

WHOOP and Garmin have no FDA-cleared features. FDA clearance means the feature has passed validation for a specific clinical use case. “Wellness” features (most HRV, sleep staging, stress scores) have no regulatory oversight.


Important Caveats

These aren’t footnotes — they materially affect how you should interpret everything above:

  1. Study funding matters. The primary sleep study ranking Oura highest (Robbins et al.) was Oura-funded. Two independent studies (Park et al., Schyvens et al.) found different rankings. Weight independent findings more heavily when they conflict.

  2. Device generations matter. The Garmin Fenix 6 and Vivosmart 4 tested in several studies are 2+ generations behind current models. Results may not apply to the Fenix 8 or Forerunner 965.

  3. Small sample sizes. The HRV/RHR study (Dial et al.) had only 13 participants, though 536 nights of data partially compensates. The Antwerp study had 62 participants but only 1 night each.

  4. PSG isn’t perfect either. The “gold standard” polysomnography has inter-rater reliability of κ≈0.75, meaning even human sleep experts disagree ~25% of the time on stage classification.

  5. Skin tone and body composition bias. PPG (optical heart rate) accuracy is affected by skin pigmentation, tattoos, BMI, and wear fit. Most validation studies have predominantly white participants — a critical research gap.

  6. Individual variation is real. Accuracy can differ meaningfully from person to person based on wrist anatomy, skin tone, tattoos, body composition, and how tightly the device is worn. Population-level accuracy figures don’t guarantee your personal experience.

  7. Calorie tracking is weak across all devices. Even the best performer (Apple Watch, 71%) is wrong nearly a third of the time. No consumer wearable should be used as a precise calorie counter.

  8. All wearables default to light sleep when uncertain. Every consumer device tested shows the same conservative algorithmic bias: when in doubt, label it light sleep. This inflates light sleep percentages across the board.

  9. Algorithms update silently. A firmware update can change how your device calculates HRV, sleep stages, or recovery scores. Validation studies test a snapshot in time — your device’s current firmware may produce different results.


Cross-Metric Patterns: What The Data Actually Reveals

Three patterns emerge when you look across all 17 studies together — patterns you won’t see if you only read one study at a time.

Pattern 1: Recovery metrics vs. activity metrics are dominated by different devices

Oura consistently leads metrics measured at rest — nocturnal HRV (CCC 0.99), resting heart rate (CCC 0.98), and even resting calorie estimation (~87%). Apple Watch consistently leads metrics measured during activity — active HR (86.3%), SpO2 (MAE 2.2%), and sleep staging in independent studies (κ=0.53). Garmin leads fitness performance metrics — step counting (82.6%) and VO2 max (MAPE 5.7–7%).

This isn’t coincidental. Oura is a ring — it has excellent skin contact and minimal motion artifact during sleep, but it can’t track wrist movement well (poor step counting) and has no GPS. Apple Watch is a full smartwatch with GPS, accelerometer, and gyroscope — better suited for daytime activity tracking. Garmin’s running-focused algorithms have years of sport-specific tuning.

Pattern 2: Study funding consistently shifts rankings

| Metric | Oura-funded result | Independent result |
|---|---|---|
| Sleep staging | Oura #1 (κ=0.65) | Oura #5 (κ=0.2–0.4) |
| Deep sleep | Oura #1 (79.5%) | WHOOP #1 (69.6%) |
| Wake detection | Oura #1 (68.6%) | Apple Watch #1 (52.2%) |

This doesn’t prove the Oura-funded studies are wrong — but it does mean you should weight independent findings more heavily when the two conflict.

Pattern 3: Every device has the same failure mode for sleep

Across all three sleep studies and all six devices tested, every single one defaults to labeling uncertain epochs as “light sleep.” This inflates light sleep totals and underestimates wake time by 12–48 minutes. It’s a conservative algorithmic choice — manufacturers would rather you think you slept lightly than tell you that you were awake and have their “sleep score” look worse.


Which Device For Your Goal?

Instead of “which is best overall,” the research points to specific devices for specific goals:

If you care most about recovery and readiness: Oura Gen 4 — best-in-class nocturnal HRV (CCC 0.99, MAPE 5.96%), best resting HR (CCC 0.98, MAPE 1.94%). Recovery signals are measured at rest, where Oura’s ring form factor excels.

If you care most about workout accuracy: Apple Watch — leads active HR (86.3% accuracy, r=0.80 vs ECG), best SpO2 (MAE 2.2%), strong sleep staging in independent studies (κ=0.53). For intervals or high-intensity work, pair any wrist device with a Polar chest strap (r=0.99).

If you care most about running/cardio performance: Garmin — leads VO2 max estimation (MAPE 5.7–7% vs Apple Watch’s 13–16%), leads step counting (82.6%), strong activity-specific algorithms. Weak on recovery metrics (HRV CCC 0.87, excluded from RHR analysis).

If you want clinical-grade cardiac screening: Apple Watch or Samsung Galaxy Watch — only devices with FDA-cleared ECG and atrial fibrillation detection.

If you want one device that does everything adequately: Apple Watch — never the worst at anything, top 2 in most activity metrics, only device with FDA-cleared cardiac features. Its main weakness is HRV/recovery tracking, where Oura leads significantly.

The real insight isn’t “buy Device X.” It’s that no single wearable covers every blind spot. Oura can’t tell you how hard your workout was. Apple Watch can’t match Oura’s recovery signal fidelity. Garmin’s VO2 max estimate won’t help you understand why your sleep tanked.

That’s the case for combining sources — not to collect more data for its own sake, but to give yourself enough context to actually interpret what’s happening. When your recovery score drops, you want to know whether it’s the bad sleep, the hard workout, the late meal, or the bedroom temperature. No single wrist (or finger) can see all of that.


Sources

  1. Robbins R, et al. (2024). “Accuracy of Three Commercial Wearable Devices for Sleep Tracking in Healthy Adults.” Sensors, 24(20), 6532. DOI: 10.3390/s24206532. Funded by Oura Ring Inc.

  2. Dial MB, et al. (2025). “Validation of nocturnal resting heart rate and heart rate variability in consumer wearables.” Physiological Reports, 13(16), e70527. DOI: 10.14814/phy2.70527. Independent (Ohio State / Air Force Research Lab)

  3. Park et al. (2023). “Accuracy of 11 Wearable, Nearable, and Airable Consumer Sleep Trackers: Prospective Multicenter Validation Study.” JMIR mHealth and uHealth, 11, e50983. DOI: 10.2196/50983. Independent (Korean multicenter)

  4. Park et al. (2023). “Validating a Consumer Smartwatch for Nocturnal Respiratory Rate Measurements in Sleep Monitoring.” Sensors, 23(18), 7867. DOI: 10.3390/s23187867. Samsung-affiliated, Samsung-funded

  5. Khodr R, et al. (2024). “Accuracy, Utility and Applicability of the WHOOP Wearable Monitoring Device in Health, Wellness and Performance — A Systematic Review.” medRxiv. DOI: 10.1101/2024.01.04.24300784

  6. Oura Internal Validation (2024). Temperature sensor validation study. 16 participants, 93,571 data points. Published on Oura blog — Oura internal study

  7. Maijala et al. (2019). “Nocturnal finger skin temperature in menstrual cycle tracking.” BMC Women’s Health, 19, 150. DOI: 10.1186/s12905-019-0844-9

  8. Lanfranchi et al. (2024). Samsung Galaxy Watch SpO2 validation. Journal of Clinical Sleep Medicine, 20(9), 1479–1488. DOI: 10.5664/jcsm.11178. Samsung-affiliated

  9. WellnessPulse Meta-Analysis (2025). Accuracy of Fitness Trackers — Aggregate data

  10. AIM7. Smartwatch/Wearable Technology Accuracy — Aggregate validation data

  11. Christakis et al. (2025). “A guide to consumer-grade wearables in cardiovascular clinical care.” npj Cardiovascular Health, 2, 82. DOI: 10.1038/s44325-025-00082-6

  12. PMC/JAMA (2025). “Selecting Wearable Devices to Measure Cardiovascular Functions in Community-Dwelling Adults.” DOI: 10.1016/j.jamda.2025.105529

  13. Schyvens AM, et al. (2025). “Performance of six consumer sleep trackers in comparison with polysomnography in healthy adults.” Sleep Advances, 6(1), zpaf016. DOI: 10.1093/sleepadvances/zpaf016. Independent (VLAIO-funded, University of Antwerp)

  14. Caserman P, et al. (2024). “Validity of Apple Watch Series 7 VO2 Max Estimation.” JMIR Biomedical Engineering, 9, e54023.

  15. Lambe RF, et al. (2025). “Validation of Apple Watch VO2 max estimates.” PLOS One, 20(2), e0318498. DOI: 10.1371/journal.pone.0318498

  16. Miller DJ, et al. (2022). “A Validation of Six Wearable Devices for Estimating Sleep, Heart Rate and Heart Rate Variability in Healthy Adults.” Sensors, 22(16), 6317. DOI: 10.3390/s22166317

  17. University of Arizona (2020). WHOOP sleep staging validation vs polysomnography. 89% 2-stage agreement, 64% 4-stage, κ=0.47.


Omnio unifies data from Oura, Apple Watch, Garmin, WHOOP, and more — so you can see what actually matters across all your sources. Join the pre-beta at getomn.io.