Harvard Electroencephalography Database: A comprehensive clinical electroencephalographic resource from four Boston hospitals.

Bibliographic Details
Title: Harvard Electroencephalography Database: A comprehensive clinical electroencephalographic resource from four Boston hospitals.
Authors: Sun, Chenxi (AUTHOR), Jing, Jin (AUTHOR), Turley, Niels (AUTHOR), Alcott, Callison (AUTHOR), Kang, Wan‐Yee (AUTHOR), Cole, Andrew J. (AUTHOR), Goldenholz, Daniel M. (AUTHOR), Lam, Alice (AUTHOR), Amorim, Edilberto (AUTHOR), Chu, Catherine (AUTHOR), Cash, Sydney (AUTHOR), Junior, Valdery Moura (AUTHOR), Gupta, Aditya (AUTHOR), Ghanta, Manohar (AUTHOR), Nearing, Bruce (AUTHOR), Nascimento, Fábio A. (AUTHOR), Struck, Aaron (AUTHOR), Kim, Jennifer (AUTHOR), Sartipi, Shadi (AUTHOR), Tauton, Alexandra‐Maria (AUTHOR)
Source: Epilepsia (Series 4). Sep 2025, Vol. 66 Issue 9, p3411-3425. 15p.
Subjects: Electroencephalography, Epilepsy, Artificial intelligence, Interdisciplinary research, Medical records, Clinical neurosciences, Information dissemination
Geographic Terms: Boston (Mass.)
Abstract: Objective: This article presents the Harvard Electroencephalography Database (HEEDB), a large‐scale, deidentified, and standardized electroencephalographic (EEG) resource supporting artificial intelligence‐driven and reproducible research in epilepsy and broader clinical neuroscience. Methods: HEEDB aggregates more than 280 000 EEG recordings from more than 108 000 patients across four Harvard‐affiliated hospitals. Data are harmonized using the Brain Imaging Data Structure and hosted on the Brain Data Science Platform. EEG data are linked with clinical notes, International Classification of Diseases, 10th Revision codes, medications, and EEG reports. Deidentification follows Health Insurance Portability and Accountability Act Safe Harbor standards. Results: The database includes routine, epilepsy monitoring unit, and intensive care unit EEGs across all age groups, with 73% linked to deidentified clinical reports and 96% of those matched to recordings. Findings are extracted using expert curation, regular expressions, and medical natural language processing models. Auxiliary data include diagnoses, medications, and hospital course, supporting multimodal analysis. Significance: HEEDB fills a critical gap in EEG data availability for epilepsy research. By enabling large‐scale, privacy‐compliant, and clinically relevant analysis, it accelerates the development of diagnostic tools, improves training datasets for machine learning, and promotes data‐sharing in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) and National Institutes of Health data policies. [ABSTRACT FROM AUTHOR]
Database: Psychology and Behavioral Sciences Collection
Description
ISSN: 0013-9580
DOI: 10.1111/epi.18487