Stanford researchers trained an AI model called SleepFM on hundreds of thousands of hours of clinical sleep studies and linked that data with decades of electronic health records to test whether a single night’s sleep can reveal long-term disease risk. The work points to wide-ranging signals in sleep, but it remains strictly research for now.
The team assembled nearly 600,000 hours of polysomnography recordings from more than 60,000 people across sleep centers. Polysomnography is the gold-standard sleep test, tracking brain waves, heart activity, breathing, and limb and eye movements. Using that rich physiological picture, they taught an AI to spot patterns that humans don’t normally read from a single night of sleep. The dataset’s depth is what lets researchers start asking whether sleep itself encodes early signs of disease.
SleepFM was asked to predict outcomes across roughly 1,000 diagnostic categories in the linked medical records, and it predicted about 130 conditions with reasonable accuracy. Those ranged from dementia and heart disease to kidney disorders, along with mortality risk, and predictions were especially strong for cancers, pregnancy complications, circulatory problems and certain mental health conditions. Pairing sleep physiology with up to 25 years of electronic health record history gave the model a long view that few studies achieve.
“Sleep contains far more information about future health than we currently use,” James Zou, Ph.D., said about the work. “By learning the language of sleep, our AI model opens new doors for studying the science and medicine of sleep,” he added, noting that humans spend about one-third of their lives sleeping.
The model does not produce a plain-English explanation for each prediction, a limitation the team acknowledges. “It doesn’t explain that to us in English,” Zou noted. “But we have developed different interpretation techniques to figure out what the model is looking at when it’s making a specific disease prediction.” Those interpretation methods aim to map which sleep features drive specific risk signals so clinicians and scientists can investigate the biology behind the prediction.
Experts caution against immediate clinical use, and that caution is well placed. “A significant signal doesn’t equal ready medicine,” said Dr. Harvey Castro, a board-certified emergency medicine physician who commented on the findings. “SleepFM is a breakthrough, not yet a bedside tool.”
Castro also underscored the difference between ranking patients by risk and predicting what actually happens to them. “Ranking risk isn’t the same as predicting outcomes, and patients live in outcomes,” he said, pointing out that a statistical ranking of risk needs trial-based validation to show that it changes care or improves health. External, prospective testing is the next step before any clinical deployment.
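Castro's distinction can be made concrete with a small sketch. The numbers below are invented for illustration and do not come from the study, and the two metrics shown (a pairwise concordance score and the Brier score) are common choices assumed here, not ones the article attributes to the researchers. The point is only that two sets of risk estimates can rank patients identically while disagreeing badly about how likely each outcome is.

```python
from math import fsum

def concordance(scores, outcomes):
    """Share of (event, non-event) pairs where the event case got the higher score.

    Ties count as half. This measures ranking only, not the size of the scores.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))

def brier(probs, outcomes):
    """Mean squared gap between predicted probability and the observed 0/1 outcome."""
    return fsum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical patients: 1 = developed the condition, 0 = did not.
outcomes = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

# Hypothetical risk estimates from one model.
risk_a = [0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.80]

# Same ordering of patients, but every risk is inflated (a monotone transform).
risk_b = [r ** 0.25 for r in risk_a]

auc_a, auc_b = concordance(risk_a, outcomes), concordance(risk_b, outcomes)
brier_a, brier_b = brier(risk_a, outcomes), brier(risk_b, outcomes)

print("ranking score, A vs B:", auc_a, auc_b)
print("Brier score, A vs B:  ", brier_a, brier_b)
```

In this toy case the ranking score is identical for both sets of estimates, while the inflated version has a much worse Brier score. A ranking metric alone would not reveal that difference, which is one reason prospective validation against real outcomes matters.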
The researchers were transparent about study limits. “There’s still much that we don’t understand … Most analysis focuses on narrow tasks like sleep staging and apnea detection,” Zou noted, stressing that much of sleep AI to date has been task-specific. This study uses multi-modal sleep recordings that pull strong signals from brain, cardiac and respiratory sources, which is powerful in a lab but may not translate directly to simpler devices.
Because polysomnography captures high-fidelity brain and bodily signals, the team wants to test whether the same patterns appear in data from wearable sensors. Moving from clinic-grade equipment to consumer wearables would broaden reach, but it also requires careful work to identify exactly which signals the model depends on. The hope is to isolate features that survive downsampling and the noisier world outside a sleep lab.
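What "surviving downsampling" means can be shown with a toy signal. The sketch below is not the researchers' method: the sampling rates, the two rhythms, and the helper names are all assumptions chosen for illustration. It builds a synthetic trace containing a slow rhythm and a fast one, reduces the sampling rate by averaging blocks of samples, and checks how much of each rhythm is still measurable.

```python
import math

def block_average(signal, factor):
    """Downsample by averaging each run of `factor` samples (crude low-pass plus decimation)."""
    n_blocks = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]

def amplitude_at(signal, rate_hz, freq_hz):
    """Estimate the amplitude of one frequency by correlating with cosine and sine."""
    step = 2 * math.pi * freq_hz / rate_hz
    re = sum(x * math.cos(step * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(step * i) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def surviving_amplitude(signal, rate_hz, freq_hz):
    """Amplitude at freq_hz, or 0.0 when the sampling rate cannot represent that frequency."""
    if freq_hz >= rate_hz / 2:  # at or above the Nyquist limit
        return 0.0
    return amplitude_at(signal, rate_hz, freq_hz)

# All values below are hypothetical, not taken from the study.
LAB_RATE_HZ = 128                      # assumed lab-grade sampling rate
FACTOR = 16                            # downsampling factor
LOW_RATE_HZ = LAB_RATE_HZ // FACTOR    # 8 Hz, standing in for a simpler device
SECONDS = 32
RHYTHMS = {"slow": (0.25, 1.0), "fast": (16.0, 0.5)}  # name: (frequency in Hz, amplitude)

lab_signal = [
    sum(a * math.sin(2 * math.pi * f * i / LAB_RATE_HZ) for f, a in RHYTHMS.values())
    for i in range(LAB_RATE_HZ * SECONDS)
]
low_signal = block_average(lab_signal, FACTOR)

# For each rhythm: (amplitude at lab rate, amplitude after downsampling).
report = {
    name: (
        surviving_amplitude(lab_signal, LAB_RATE_HZ, f),
        surviving_amplitude(low_signal, LOW_RATE_HZ, f),
    )
    for name, (f, _) in RHYTHMS.items()
}

for name, (before, after) in report.items():
    print(f"{name}: before={before:.3f} after={after:.3f}")
```

In this toy setup the slow rhythm comes through nearly unchanged, while the fast one sits above what the lower sampling rate can represent and is lost. A model that leaned on the fast rhythm would not transfer to the simpler device, which is why identifying the signals the model depends on comes first.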
The study was funded in part by the National Institutes of Health, and the findings were published in the peer-reviewed journal Nature Medicine. Despite the promising results and reputable backing, the researchers repeatedly note that this is a research tool, not medical advice, while also stressing that “sleep is very important” for health.
For now, SleepFM remains a research-only technology being tested in controlled settings and is not available to consumers. The path forward looks like careful validation, clearer interpretation of model drivers, and studies showing that early risk detection from sleep actually leads to better outcomes for patients.
