Apple Watch Series 10 validation studies
6 peer-reviewed studies in the CircaTest corpus that validated this device against polysomnography or another reference standard.
Apple watch accuracy in monitoring health metrics: a systematic review and meta-analysis
Choe & Kang, 2025 · Physiological Measurement
The most comprehensive published meta-analysis of Apple Watch accuracy: 56 studies, 270 effect sizes. Editorially load-bearing because it gives a definitive answer to two common reader questions: (1) Is Apple Watch heart rate accurate? Yes (mean bias -0.12 bpm; no subgroup exceeds the 10% MAPE threshold). (2) Is Apple Watch energy expenditure accurate? No (every subgroup exceeds the 10% MAPE threshold). Important limitation for CircaTest's editorial focus: this meta-analysis covers heart rate (HR), energy expenditure, and step counts, NOT sleep stage classification. For Apple Watch sleep accuracy, see Schyvens 2025 and Walch 2019.
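For readers unfamiliar with the two statistics quoted above, here is a minimal Python sketch of how mean bias and MAPE are typically computed from paired device and reference readings. The function names and the sample readings are hypothetical illustrations, not data or code from Choe & Kang.

```python
def mean_bias(device, reference):
    """Mean signed error (device minus reference), in the units of the inputs."""
    return sum(d - r for d, r in zip(device, reference)) / len(reference)


def mape(device, reference):
    """Mean absolute percentage error, expressed as a percentage of the reference."""
    ratios = [abs(d - r) / r for d, r in zip(device, reference)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical paired heart-rate readings in bpm (NOT data from the study).
watch_bpm = [62, 75, 88, 101, 119]
ecg_bpm = [60, 76, 90, 100, 120]

bias = mean_bias(watch_bpm, ecg_bpm)
error = mape(watch_bpm, ecg_bpm)

# A result counts as acceptable here if it does not exceed the 10% MAPE threshold.
print(f"bias={bias:+.2f} bpm, MAPE={error:.2f}%, within threshold: {error <= 10}")
```

A small mean bias can hide large errors that cancel out, which is why the meta-analysis reports MAPE alongside it.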
Detection of sleep apnea using only inertial measurement unit signals from Apple Watch: a pilot study with machine learning approach
Hayano et al., 2025 · Sleep & Breathing
Important because it validates Apple Watch IMU-only sleep apnea detection, which is methodologically distinct from Apple's own sleep apnea notifications feature (which uses combined sensors). Hayano et al. demonstrate that even IMU-ONLY data from the Apple Watch can detect apnea/hypopnea events at AUC 0.831 in a held-out test set. CircaTest cites this when discussing the underlying feasibility of consumer-wearable apnea screening. Caveat: the study's Random Forest classifier is not the same as Apple's production algorithm; the AUC figure is for the research classifier, not for what an end user sees on a Series 10 or 11.
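To make the AUC figure concrete: AUC is the probability that a classifier scores a randomly chosen positive case higher than a randomly chosen negative one. The sketch below computes it that way in plain Python. The `roc_auc` helper and the scores are hypothetical and are not the pipeline used by Hayano et al.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six segments (1 = apnea/hypopnea event present).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 means the ranking is no better than chance and 1.0 means every positive outranks every negative, so 0.831 sits well above chance but short of diagnostic-grade separation.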
Performance of consumer wrist-worn sleep tracking devices compared to polysomnography: a meta-analysis
Lee et al., 2025 · Journal of Clinical Sleep Medicine
The most comprehensive recent meta-analysis of consumer wrist-worn sleep trackers vs PSG: 24 studies, 798 patients, 12 different brands including Fitbit, WHOOP, Garmin, Apple Watch, Empatica E4, and Xiaomi Mi Band 5. The headline finding is that, across the entire device set, consumer wrist trackers UNDERESTIMATE total sleep time by ~17 minutes (95% CI -26 to -7) and UNDERESTIMATE sleep efficiency by ~4.7 percentage points, both statistically significant. This is the strongest published quantitative answer to the question 'how wrong are consumer trackers on average' across the wrist-worn category. Important limitation: results are pooled across brands, and no per-device breakdown has been extracted into this record.
A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography
Schyvens et al., 2025 · Sleep Advances
The single most editorially important study in the CircaTest corpus. Six commercial wearables tested against PSG in a uniform protocol means the kappa values are directly comparable in a way most validation studies are not. Drives the head-to-head accuracy figures across CircaTest's comparison content. Limitations: the study tested previous-generation models (Series 8 not 10, Charge 5 not 6, original ScanWatch not 2), so the results are indirect evidence for current models, not a direct measurement of them.
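As a reminder of what a kappa value measures, the sketch below computes Cohen's kappa for epoch-by-epoch sleep stage agreement: observed agreement corrected for the agreement expected by chance. It assumes the kappa referred to above is Cohen's kappa, and the stage labels and epoch sequences are hypothetical, not data from Schyvens et al.

```python
from collections import Counter


def cohens_kappa(device, reference):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(reference)
    observed = sum(d == r for d, r in zip(device, reference)) / n
    dev_counts, ref_counts = Counter(device), Counter(reference)
    # Agreement expected if both raters assigned labels independently at their own base rates.
    expected = sum(dev_counts[k] * ref_counts[k] for k in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical 10-epoch hypnograms: W = wake, L = light, D = deep, R = REM.
psg = ["W", "L", "L", "D", "D", "R", "R", "L", "W", "L"]
watch = ["W", "L", "L", "L", "D", "R", "L", "L", "W", "W"]

kappa = cohens_kappa(watch, psg)
print(f"kappa = {kappa:.3f}")
```

Here the raw agreement is 70%, but kappa is lower because a device that mostly guesses light sleep would already agree with PSG fairly often by chance.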
Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables, relative to wrist actigraphy and polysomnography
Roberts et al., 2020 · Sleep
Important because it directly compares Apple Watch and Oura Ring against ECG and PSG using identical methodology and classifiers built with machine learning. The published abstract reports aggregated ranges across the device set (sensitivity 0.883-0.977, specificity 0.407-0.821, d' 1.827-2.347) but does not break these down per device, so this CircaTest record stores them as range-only with null per-device values. Anyone needing per-device numbers should consult the full paper at the PMC link.
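The d' statistic combines sensitivity and specificity into a single discriminability score. A minimal sketch using the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), is shown below. It assumes the paper uses this standard definition, and the sensitivity/specificity pairing in the example is hypothetical because the abstract reports ranges rather than matched pairs.

```python
from statistics import NormalDist


def d_prime(sensitivity, specificity):
    """Standard signal-detection d': z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(sensitivity) - z(1 - specificity)


# Hypothetical pairing of values taken from within the reported ranges.
example = d_prime(0.883, 0.821)
print(f"d' = {example:.2f}")
```

A classifier at chance (sensitivity 0.5, specificity 0.5) has d' of 0, and higher values mean sleep and wake epochs are easier to tell apart regardless of where the decision threshold sits.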
Sleep stage prediction with raw acceleration and photoplethysmography heart rate data derived from a consumer wearable device
Walch et al., 2019 · Sleep
Important because it demonstrates what an open, peer-reviewed Apple Watch sleep classifier can achieve from raw sensor data, independent of Apple's proprietary algorithm. The accompanying PhysioNet dataset is one of the most widely used open datasets for wearable sleep classification research. A useful counterpoint to Apple's closed-algorithm white papers.