Content-based EEG data retrieval and annotation for timelocked and continuous EEG data
Abstract
Improvements in recording technology and the movement of EEG experiments into real-world settings have resulted in an explosion of EEG recordings. These recordings, which use a range of headset technologies and are made under a variety of experimental conditions, are being assembled into collections for large-scale analysis. Such collections are essential for determining the robustness of observations and for characterizing intra- and inter-subject variability. This dissertation begins to address two new research directions that result from the creation of such repositories: content-based EEG retrieval and automated EEG annotation. Content-based EEG retrieval (CBER) allows researchers to take short time-locked EEG data segments as queries and retrieve similar time-locked EEG segments from large EEG databases. EEG data are recorded under different configurations in many independent research laboratories. Even sessions in which the same individual performs the same task in the same laboratory vary due to factors such as differences in placement of the EEG headset between sessions or changes in the cognitive and physical state of the subject. CBER implementations therefore require the use of configuration-invariant feature sets. We have evaluated a number of such feature sets for CBER and have proposed new EEG-based bag-of-words models to represent EEG data. We have applied these and other configuration-invariant features to the problem of retrieving time-locked epochs representing similar responses across subjects and conditions. However, as EEG imaging moves into more real-world settings, well-defined event time markers are no longer available for time-locked analysis. Further, the question of when particular brain states occur is not necessarily tied to particular identifiable environmental events. We have also developed a prototype system for automated annotation of EEG data and applied it to predict events in one subject based on the labeled events of other subjects.
We introduce the idea of timing slack and timing-tolerant performance measures to deal with the jitter inherent in such non-time-locked systems. This timing-tolerant approach has the added benefit of accommodating inter- and intra-subject timing variability. Using standard domain-adaptation-based classifiers and relatively simple features, we are able to identify non-time-locked events in a subject without using any labeled data from that subject. This approach can be applied to automatically identify patterns in large-scale EEG data collections and may give us insight into the prototypical brain responses that are evoked during real-world imaging.
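The abstract does not define its timing-tolerant performance measures, so the following is only an illustrative sketch of one way such a measure could work, not the dissertation's actual method. It assumes that a predicted event counts as correct when it falls within a fixed slack window of a true event, and that each true event can be matched at most once. The function name `timing_tolerant_scores` and the parameter `slack` are hypothetical.

```python
def timing_tolerant_scores(true_times, predicted_times, slack):
    """Return (precision, recall) when predictions may miss by up to `slack` seconds.

    Assumption (not from the source): a prediction is a hit if it lies within
    +/- slack of a true event, and each true event matches at most one prediction.
    """
    true_sorted = sorted(true_times)
    pred_sorted = sorted(predicted_times)
    matched = 0
    i = j = 0
    # Two-pointer sweep over both sorted lists of event times.
    while i < len(true_sorted) and j < len(pred_sorted):
        diff = pred_sorted[j] - true_sorted[i]
        if abs(diff) <= slack:
            matched += 1  # close enough: pair them and move past both
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # prediction is too early for this and all later true events
        else:
            i += 1  # true event has no remaining prediction within the slack
    precision = matched / len(pred_sorted) if pred_sorted else 0.0
    recall = matched / len(true_sorted) if true_sorted else 0.0
    return precision, recall


# Hypothetical event times in seconds, for illustration only.
true_events = [1.0, 5.0, 9.0]
predicted_events = [1.2, 5.8, 12.0]
strict = timing_tolerant_scores(true_events, predicted_events, slack=0.5)
relaxed = timing_tolerant_scores(true_events, predicted_events, slack=1.0)
```

In this toy example, widening the slack from 0.5 s to 1.0 s lets the prediction at 5.8 s count as a match for the true event at 5.0 s, so both precision and recall rise. A tolerance of this kind is one way jitter between predicted and labeled event times could be absorbed rather than penalized.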