Study record · meta-analysis · 2019

Accuracy of Wristband Fitbit Models in Assessing Sleep: Systematic Review and Meta-Analysis

Haghayegh S, Khoshnevis S, Smolensky MH, Diller KR, and Castriotta RJ

Journal of Medical Internet Research, 21(11), e16273 · 2019

Why this study matters to CircaTest

The most-cited meta-analysis of Fitbit sleep accuracy. CircaTest uses its 81-91% sleep/wake accuracy range, reported for non-sleep-staging models, as the editorial baseline for any Fitbit claim, particularly because no peer-reviewed validation specific to the Charge 6 has been published. The very low specificity (10-52%) of the non-sleep-staging models is the source of the well-known 'Fitbits overestimate sleep' criticism: a device that rarely detects wake scores wake epochs as sleep.

Abstract

BACKGROUND: Wearable sleep monitors are of high interest to consumers and researchers because of their ability to provide estimation of sleep patterns in free-living conditions in a cost-efficient way.

OBJECTIVE: We conducted a systematic review of publications reporting on the performance of wristband Fitbit models in assessing sleep parameters and stages.

METHODS: In adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, we comprehensively searched the Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane, Embase, MEDLINE, PubMed, PsycINFO, and Web of Science databases using the keyword Fitbit.

RESULTS: 22 articles qualified for systematic review, with 8 providing quantitative data for meta-analysis. In reference to polysomnography (PSG), nonsleep-staging Fitbit models tended to overestimate total sleep time (TST; range 7-67 min) and sleep efficiency (SE; 2-15%), and underestimate wake after sleep onset (WASO; range 6-44 min). Nonsleep-staging Fitbit models correctly identified sleep epochs with accuracy values between 0.81 and 0.91, sensitivity values between 0.87 and 0.99, and specificity values between 0.10 and 0.52. Recent-generation sleep-staging Fitbit models showed no significant difference in measured WASO, TST, and SE versus PSG, and higher sensitivity (0.95-0.96) and specificity (0.58-0.69) than nonsleep-staging models.

CONCLUSIONS: Sleep-staging Fitbit models showed promising performance, especially in differentiating wake from sleep. However, although these models are a convenient and economical means for consumers to obtain gross estimates of sleep parameters and time spent in sleep stages, they are of limited specificity and are not a substitute for PSG.

Source: PUBMED · Licensed under CC-BY 4.0
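The abstract compares three summary parameters against PSG: total sleep time (TST), sleep efficiency (SE), and wake after sleep onset (WASO). The sketch below shows one common way to derive them from an epoch-by-epoch sleep/wake sequence. The function name, the 30-second epoch length, and both example nights are illustrative assumptions, not data or methods taken from the study, and WASO definitions vary between papers (here it counts all wake after the first sleep epoch until the end of the recording).

```python
def sleep_parameters(epochs, epoch_seconds=30):
    """Summarise a time-in-bed recording scored as 1 (sleep) or 0 (wake) per epoch.

    Hypothetical helper for illustration only; 30-second epochs are an assumption.
    """
    epochs = list(epochs)
    if not epochs:
        raise ValueError("epochs must not be empty")

    epoch_minutes = epoch_seconds / 60
    time_in_bed = len(epochs) * epoch_minutes

    # TST: total minutes scored as sleep.
    tst = sum(epochs) * epoch_minutes

    # WASO: minutes scored as wake after the first sleep epoch.
    if 1 in epochs:
        onset = epochs.index(1)
        waso = epochs[onset:].count(0) * epoch_minutes
    else:
        waso = 0.0

    # SE: share of time in bed spent asleep, as a percentage.
    return {"tst_min": tst, "se_pct": 100 * tst / time_in_bed, "waso_min": waso}


# Made-up 8-hour night: the reference scoring has a 60-epoch wake bout mid-night.
psg_night = [0] * 20 + [1] * 400 + [0] * 60 + [1] * 480
# Made-up device scoring: only 20 epochs of that wake bout are detected.
device_night = [0] * 20 + [1] * 400 + [0] * 20 + [1] * 520

psg_params = sleep_parameters(psg_night)
device_params = sleep_parameters(device_night)

for key in psg_params:
    print(key, "device minus PSG:", round(device_params[key] - psg_params[key], 2))
```

In this made-up night the device scores 20 more minutes of sleep and 20 fewer minutes of WASO than the reference, which is the direction of bias the abstract reports for non-sleep-staging models.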

Population

Age

varied across included studies

Reference standard

Polysomnography (PSG)

Systematic review of 22 articles reporting on Fitbit sleep performance, 8 of which provided quantitative data for the meta-analysis.

Devices and metrics

Multiple Fitbit wristband models (Charge, Charge HR, Charge 2, Alta, Alta HR, Inspire, Versa, Ionic)

Metric	Value	95% CI	Note
Accuracy	91%	—	Upper bound of sleep epoch detection accuracy across non-sleep-staging Fitbit models.
Accuracy	81%	—	Lower bound of sleep epoch detection accuracy across non-sleep-staging Fitbit models.
Sensitivity	99%	—	Upper bound sensitivity to sleep, non-staging models.
Sensitivity	87%	—	Lower bound sensitivity to sleep, non-staging models.
Specificity	52%	—	Upper bound specificity for wake, non-staging models.
Specificity	10%	—	Lower bound specificity for wake, non-staging models; this is the well-known Fitbit weakness in detecting wakefulness.
Sensitivity	96%	—	Upper bound sensitivity for sleep-staging models.
Sensitivity	95%	—	Lower bound sensitivity for sleep-staging models.
Specificity	69%	—	Upper bound specificity for sleep-staging models.
Specificity	58%	—	Lower bound specificity for sleep-staging models.
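The table can look contradictory at first: accuracy above 80% alongside specificity as low as 10%. A minimal sketch of the epoch-by-epoch arithmetic, with sleep as the positive class as in the notes above, shows why. The function name and the example night are hypothetical and are not data from the study; the point is only that sleep epochs dominate a night, so missed wake epochs barely move overall accuracy.

```python
def epoch_agreement(psg, device):
    """Compare two equal-length epoch sequences scored 1 (sleep) or 0 (wake).

    PSG is treated as ground truth. Hypothetical helper for illustration only.
    """
    if len(psg) != len(device):
        raise ValueError("sequences must have the same length")
    if not psg:
        raise ValueError("sequences must not be empty")

    pairs = list(zip(psg, device))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)  # sleep scored as sleep
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)  # wake scored as wake
    fp = sum(1 for p, d in pairs if p == 0 and d == 1)  # wake scored as sleep
    fn = sum(1 for p, d in pairs if p == 1 and d == 0)  # sleep scored as wake

    sleep_epochs = tp + fn
    wake_epochs = tn + fp
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / sleep_epochs if sleep_epochs else float("nan"),
        "specificity": tn / wake_epochs if wake_epochs else float("nan"),
    }


# Made-up night: 900 reference sleep epochs followed by 100 reference wake epochs.
psg = [1] * 900 + [0] * 100
# Made-up device: detects 855 of the sleep epochs but only 30 of the wake epochs.
device = [1] * 855 + [0] * 45 + [1] * 70 + [0] * 30

result = epoch_agreement(psg, device)
print(result)
```

Here the device misses 70 of 100 wake epochs (specificity 0.30) yet still agrees with the reference on 885 of 1,000 epochs, because 90% of the night is sleep.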

Cite this study

Haghayegh S, Khoshnevis S, Smolensky MH, Diller KR, and Castriotta RJ (2019). Accuracy of Wristband Fitbit Models in Assessing Sleep: Systematic Review and Meta-Analysis. Journal of Medical Internet Research, 21(11), e16273. https://doi.org/10.2196/16273

AI-assembled record · errors are possible

This page was built and is maintained by AI working from the published abstract and PubMed metadata for the cited study. CircaTest does not have access to the full text of every paper in its corpus. Categories of error that have happened on this site and could happen again include:

Per-device metric values inferred when the abstract reports only an aggregate range
Stage-specific accuracy figures pulled from secondary sources rather than the primary paper
Sample size or population characteristics missing from the abstract
Editorial framing in the curatorial note that overstates the study's actual scope
Year typos and author-name transliteration errors

Where a metric value reads 'see source', the published abstract did not contain that value and CircaTest deliberately left it null rather than guess. For any decision that depends on the underlying numbers, please consult the linked PubMed, PMC, or DOI source directly. Corrections are welcome via the about page.

Added to the CircaTest meta-analysis on 2026-04-06.