Study record · comparative · 2025
A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography
Schyvens AM, Peters B, Van Oost NC, Aerts JM, Masci F, Neven A, et al.
Sleep Advances, 6(2), zpaf021 · 2025
Why this study matters to CircaTest
The single most editorially important study in the CircaTest corpus. Six commercial wearables were tested against PSG under a uniform protocol, so the kappa values are directly comparable in a way that those from most validation studies are not. The study drives the head-to-head accuracy figures across CircaTest's comparison content. Limitations: it tested previous-generation models (Apple Watch Series 8 rather than Series 10, Fitbit Charge 5 rather than Charge 6, the original ScanWatch rather than ScanWatch 2), so the results are indirect evidence for current models, not direct measurements of them.
Abstract
STUDY OBJECTIVES: The aim of this study is to assess the performance of six different consumer wearable sleep-tracking devices, namely the Fitbit Charge 5, Fitbit Sense, Withings Scanwatch, Garmin Vivosmart 4, Whoop 4.0, and the Apple Watch Series 8, for detecting sleep parameters compared to the gold standard, polysomnography (PSG). METHODS: Sixty-two adults (52 males and 10 females, mean age ± SD = 46.0 ± 12.6 years) spent a single night in the sleep laboratory with PSG while simultaneously using two to four wearable devices. RESULTS: The results indicate that most wearables displayed significant differences with PSG for total sleep time, sleep efficiency, wake after sleep onset, and light sleep (LS). Nevertheless, all wearables demonstrated a higher percentage of correctly identified epochs for deep sleep and rapid eye movement sleep compared to wake (W) and LS. All devices detected >90% of sleep epochs (ie, sensitivity), but showed lower specificity (29.39%-52.15%). The Cohen's kappa coefficients of the wearable devices ranged from 0.21 to 0.53, indicating fair to moderate agreement with PSG. CONCLUSIONS: Our results indicate that all devices can benefit from further improvement for multistate categorization. However, the devices with higher Cohen's kappa coefficients, such as the Fitbit Sense (κ = 0.42), Fitbit Charge 5 (κ = 0.41), and Apple Watch Series 8 (κ = 0.53), could be effectively used to track prolonged and significant changes in sleep architecture.
Source: PUBMED · Licensed under CC-BY 4.0
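The abstract's headline metrics (sensitivity, specificity, and Cohen's kappa) all derive from an epoch-by-epoch comparison of device labels against PSG labels. The sketch below shows the standard definitions on a toy ten-epoch night. The epoch labels, the function names, and the choice to treat every non-wake stage as "sleep" are illustrative assumptions, not the authors' scoring pipeline or data.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both scorers give the same stage.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of each stage's marginal frequencies, summed over stages.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sequences use a single identical label")
    return (observed - expected) / (1.0 - expected)


def sleep_sensitivity_specificity(reference, predicted, wake_label="W"):
    """Sensitivity = share of PSG sleep epochs scored as sleep by the device;
    specificity = share of PSG wake epochs scored as wake by the device."""
    pairs = list(zip(reference, predicted))
    sleep_total = sum(r != wake_label for r, _ in pairs)
    wake_total = sum(r == wake_label for r, _ in pairs)
    if sleep_total == 0 or wake_total == 0:
        raise ValueError("reference must contain both sleep and wake epochs")
    sleep_hits = sum(r != wake_label and p != wake_label for r, p in pairs)
    wake_hits = sum(r == wake_label and p == wake_label for r, p in pairs)
    return sleep_hits / sleep_total, wake_hits / wake_total


# Toy data only (hypothetical): W = wake, LS = light sleep, DS = deep sleep, REM.
psg_epochs = ["W", "W", "LS", "LS", "LS", "DS", "DS", "REM", "REM", "W"]
device_epochs = ["W", "LS", "LS", "LS", "DS", "DS", "DS", "REM", "LS", "LS"]

kappa = cohen_kappa(psg_epochs, device_epochs)
sensitivity, specificity = sleep_sensitivity_specificity(psg_epochs, device_epochs)
print(f"kappa={kappa:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The toy device labels every PSG sleep epoch as some sleep stage but misses two of three wake epochs, which mirrors the pattern the abstract describes: high sensitivity alongside low specificity.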
Population
Sample size
n = 62
Age
46.0 ± 12.6 years
Reference standard
PSG (polysomnography)
62 adults (52 male, 10 female), mean age 46.0 ± 12.6 years, single in-laboratory night with simultaneous PSG and 2-4 wearables.
Devices and metrics
Apple Watch Series 8
| Metric | Value | 95% CI | Note |
|---|---|---|---|
| Cohen's kappa | κ = 0.53 | — | Highest of six devices tested per the published abstract. Tested on Series 8, not Series 10. Stage-specific percentages are reported in the full paper but are not extracted into this record; consult the source link for per-stage breakdowns. |
Fitbit Charge 5 (also Fitbit Sense)
| Metric | Value | 95% CI | Note |
|---|---|---|---|
| Cohen's kappa | κ = 0.41 | — | Charge 5 kappa per the published abstract. The same study also reports Fitbit Sense at κ=0.42. No Charge 6 specific PSG validation has been published. |
Withings ScanWatch
| Metric | Value | 95% CI | Note |
|---|---|---|---|
| Cohen's kappa | see source | — | The published abstract states Cohen's kappa across all six devices ranged from 0.21 to 0.53 but explicitly names only Sense (0.42), Charge 5 (0.41), and Apple Watch S8 (0.53). The per-device kappa for the Withings ScanWatch is not stated in the abstract; consult the full paper for the breakdown. |
Whoop 4.0
| Metric | Value | 95% CI | Note |
|---|---|---|---|
| Cohen's kappa | see source | — | The published abstract reports the kappa range across all six devices (0.21-0.53) but does not state the per-device kappa for Whoop 4.0. Consult the full paper for the breakdown. |
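The abstract describes the kappa range of 0.21 to 0.53 as "fair to moderate agreement". That wording matches the widely used Landis and Koch bands; the sketch below assumes that convention (the abstract does not name it) and applies it to the three per-device values the abstract states. The function name and dictionary are hypothetical.

```python
def agreement_band(kappa):
    """Map a kappa value to a Landis and Koch agreement label (assumed convention)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Upper edge of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise AssertionError("unreachable for kappa in [-1, 1]")


# Per-device values stated in the published abstract; the other devices are
# not named there and are deliberately left out.
reported_kappa = {
    "Apple Watch Series 8": 0.53,
    "Fitbit Sense": 0.42,
    "Fitbit Charge 5": 0.41,
}

device_bands = {device: agreement_band(k) for device, k in reported_kappa.items()}
for device, band in device_bands.items():
    print(f"{device}: {band}")
```

Under this convention all three named devices fall in the "moderate" band, while the bottom of the reported range (0.21) falls in "fair".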
Cite this study
Schyvens AM, Peters B, Van Oost NC, Aerts JM, Masci F, Neven A, et al. (2025). A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography. Sleep Advances, 6(2), zpaf021. https://doi.org/10.1093/sleepadvances/zpaf021
Added to the CircaTest meta-analysis on 2026-04-06.