Generating Synthetic Electrocardiograms Using Deep Generative Algorithms
Abstract
Cardiovascular diseases are one of the major causes of death; in 2019, they accounted for 32% of all deaths worldwide. The electrocardiogram (ECG) is widely used in the diagnosis of cardiovascular diseases, mostly because it is non-invasive and painless. Diagnosis is usually performed by human specialists, which is time-consuming, prone to human error, and dependent on a specialist being available. Automatic ECG diagnosis is becoming increasingly accepted, since it not only eliminates random human errors but can also be offered as a bedside test anytime and anywhere using common and affordable wearable heart monitoring devices. Automatic ECG diagnosis algorithms are usually deep neural network classifiers that classify ECG beats according to the general pattern of the heartbeat. The ECG datasets used to train these classifiers tend to be highly class-imbalanced because abnormal cases are scarce and normal cases are abundant. As a result, classifiers trained on class-imbalanced datasets usually perform poorly, especially on minority classes. Additionally, the use of real patients' ECGs is highly regulated due to privacy issues. Therefore, there is always a need for more ECG data. One approach is to generate realistic synthetic ECG signals using Generative Adversarial Networks (GANs) to augment class-imbalanced datasets. The data generated by generative algorithms are not duplicates of the real training data and are unique for distinct latent variables, because generative algorithms map variables from the latent space to the real space.

First Project: We studied the capability of five different models from the unconditional GAN family to generate synthetic ECG signals and compared their performance, focusing only on Normal cardiac cycles (a single class). Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure the quality of the generated beats.
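As a minimal sketch of how two of these distance functions can be computed between a generated beat and a reference beat (not the implementation used in the study), the Python example below implements the Euclidean distance and a discrete Fréchet distance with the standard library only. The function names are hypothetical, and the Fréchet variant shown compares amplitude values only, which is an assumption; a version that compares (time, amplitude) points would behave differently.

```python
import math


def euclidean_distance(a, b):
    """Pointwise Euclidean distance between two equal-length beats."""
    if len(a) != len(b):
        raise ValueError("beats must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discrete_frechet_distance(a, b):
    """Discrete Frechet distance between two 1D amplitude sequences.

    Assumption: only amplitudes are compared, so a pure time shift of a
    feature can be absorbed by the coupling between the two sequences.
    """
    n, m = len(a), len(b)
    # ca[i][j] holds the Frechet distance between a[:i+1] and b[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Toy beats: the same peak, shifted by one sample.
reference_beat = [0.0, 0.0, 1.0, 0.0, 0.0]
shifted_beat = [0.0, 1.0, 0.0, 0.0, 0.0]
print(euclidean_distance(reference_beat, shifted_beat))
print(discrete_frechet_distance(reference_beat, shifted_beat))
```

The toy pair illustrates why elastic measures are useful for beat morphology: the Euclidean distance penalizes the one-sample shift of the peak, whereas the amplitude-only Fréchet distance does not.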
Here, the quality of a beat refers to the presence of the expected morphological patterns in the beat. We proposed and applied five different methods (metrics) for evaluating generated beats. The results show that all the tested models can, to some extent, successfully mass-generate acceptable heartbeats with high similarity in morphological features, and that potentially all of them can be used to augment imbalanced datasets. However, visual inspection of the generated beats favors BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. With regard to the productivity rate metric, the Classic GAN is superior, with a 72% productivity rate. We also designed a simple experiment with the state-of-the-art classifier (ECGResNet34) to show empirically that augmenting the imbalanced dataset with synthetic ECG signals could significantly improve classification performance. This study differs from its predecessors in that it includes WGAN and uses the MLII lead, which had not been done before. This paper has been published in the journal PLOS ONE.

Second Project: We combined a conditional GAN with WGAN-GP and developed AC-WGAN-GP in 1D form, for the first time, to be applied to the MIT-BIH Arrhythmia dataset. We investigated the impact of data augmentation on arrhythmia classification. Two models were employed for ECG generation: (i) an unconditional GAN, namely a Wasserstein GAN with gradient penalty (WGAN-GP) trained on each class individually, and (ii) a conditional GAN, namely a single Auxiliary Classifier WGAN-GP (AC-WGAN-GP) model trained on all classes and then used to generate synthetic beats in all classes. Two scenarios were defined for each case: (a) unscreened, i.e., all the generated synthetic beats were used, and (b) screened, i.e., only high-quality beats were selected and used, based on their Dynamic Time Warping (DTW) distance to a designated approved template.
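The screening scenario can be illustrated with a minimal sketch, assuming that a generated beat is kept when its DTW distance to the template does not exceed a cutoff. The function names, the absolute-difference local cost, and the threshold value are all hypothetical; the cutoff actually used in the study is not stated here.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def screen_beats(beats, template, threshold):
    """Keep only the beats whose DTW distance to the template is <= threshold.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    return [beat for beat in beats if dtw_distance(beat, template) <= threshold]


# Toy data: one beat with the peak slightly shifted, one flat (implausible) beat.
template = [0.0, 0.0, 1.0, 0.0, 0.0]
plausible_beat = [0.0, 1.0, 0.0, 0.0, 0.0]
flat_beat = [1.0, 1.0, 1.0, 1.0, 1.0]

kept = screen_beats([plausible_beat, flat_beat], template, threshold=0.5)
print(len(kept))
```

In the unscreened scenario this filtering step is simply skipped and every generated beat is added to the augmented dataset.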
The state-of-the-art ResNet classifier (EcgResNet34) was trained on each of the four aforementioned study cases (augmented datasets), and the standard classification performance metrics (micro- and macro-averaged precision, recall, and F1-score; confusion matrices; multiclass precision-recall curves) were compared with those of the original imbalanced case. We also used a simple metric called Net Improvement. All three metrics consistently show that the unconditional GAN with raw (unscreened) generated data yields the greatest improvement. This paper has been presented and published at the IEEE BIBM Conference, Las Vegas, 2022.

Third Project: We employed diffusion models to generate synthetic ECG signals. Deep learning image processing models have had remarkable success in recent years in generating high-quality images. In particular, the Improved Denoising Diffusion Probabilistic Models (DDPM) have shown superior image quality compared to state-of-the-art generative models, which motivated us to investigate their capability in generating synthetic ECG signals. In this work, synthetic ECG signals are generated by the Improved DDPM and by the Wasserstein GAN with Gradient Penalty (WGAN-GP) models and then compared. To this end, we devised a pipeline to utilize DDPM in its original 2D form. First, the 1D ECG time series data are embedded into the 2D space, for which we employed the Gramian Angular Summation/Difference Fields (GASF/GADF) as well as Markov Transition Fields (MTF) to generate three 2D matrices from each ECG time series that, when put together, form a 3-channel 2D datum. Then, the 2D DDPM is used to generate 3-channel 2D synthetic ECG images. The 1D ECG signals are reconstructed by de-embedding the generated 2D image files back into the 1D space. This work focuses on unconditional models and the generation of only Normal sinus ECG signals, where the Normal class from the MIT-BIH Arrhythmia dataset is used as the training data.
The quality, distribution, and authenticity (equivalency) of the ECG signals generated by each model are compared. Our results show that, in the proposed pipeline, the WGAN-GP model is consistently and by far superior to DDPM on all the considered metrics. This paper has been published in the journal IEEE Access.
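The embedding and de-embedding steps of the Third Project can be sketched as follows for the Gramian Angular Fields. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the series is min-max rescaled to [0, 1] so that the GASF diagonal can be inverted, the function names are hypothetical, and the Markov Transition Field channel is omitted because it is not invertible in this simple way.

```python
import math


def rescale_unit(series):
    """Min-max rescale a series to [0, 1]; also return the original bounds."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    scaled = [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in series]
    return scaled, lo, hi


def gasf(scaled):
    """Gramian Angular Summation Field: cos(phi_i + phi_j), phi = arccos(x)."""
    phi = [math.acos(v) for v in scaled]
    return [[math.cos(p_i + p_j) for p_j in phi] for p_i in phi]


def gadf(scaled):
    """Gramian Angular Difference Field: sin(phi_i - phi_j), phi = arccos(x)."""
    phi = [math.acos(v) for v in scaled]
    return [[math.sin(p_i - p_j) for p_j in phi] for p_i in phi]


def reconstruct_from_gasf(field, lo, hi):
    """Recover the 1D series from the GASF diagonal.

    On the diagonal, cos(2 * phi_i) = 2 * x_i**2 - 1, so for x_i in [0, 1]
    we get x_i = sqrt((field[i][i] + 1) / 2). The bounds lo and hi undo the
    min-max rescaling.
    """
    scaled = [
        math.sqrt(max(0.0, (field[i][i] + 1.0) / 2.0)) for i in range(len(field))
    ]
    return [lo + v * (hi - lo) for v in scaled]


# Toy "beat" with a positive peak and a small negative deflection.
beat = [0.0, 0.1, 1.2, -0.3, 0.05]
scaled_beat, beat_min, beat_max = rescale_unit(beat)
summation_field = gasf(scaled_beat)
difference_field = gadf(scaled_beat)
recovered = reconstruct_from_gasf(summation_field, beat_min, beat_max)
print(max(abs(r - b) for r, b in zip(recovered, beat)))
```

Note that the original amplitude bounds must be stored or re-estimated separately, since the rescaled fields alone carry no information about the signal's absolute scale.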