
Citation: Torse Dattaprasad, Desai Veena, Khanai Rajashri. An optimized design of seizure detection system using joint feature extraction of multichannel EEG signals[J]. The Journal of Biomedical Research, 2020, 34(3): 191–204. DOI: 10.7555/JBR.33.20190019
Epilepsy is caused by irregular changes in neural activity[1]. Recurrent impulsive seizures arise from intricate processes involving numerous neurotransmitters in the cholinergic, glutamatergic, and GABAergic systems[2]. As the World Health Organization reports[3], of the 450 million people worldwide with neurological or behavioral disorders, 65 million are affected by epilepsy, with nearly 7 million cases in India[4]. Epileptic disorders are diagnosed by medical professionals using EEG signals recorded from the patient's scalp[5]. The two major types of EEG signals in generalized epilepsy are focal and non-focal. Focal EEG signals may be categorized as epileptic or non-epileptic. Epileptic spikes are the interictal indication of a patient in whom a limited area of the brain is involved. Non-epileptic deformities are characterized by variations in normal signals or by the manifestation of abnormal signals. While these two types often coexist, separating them is conceptually useful and helps in the diagnosis of focal epilepsy[6]. The major treatment option for epilepsy is anti-epileptic drugs (AEDs). Even the best available AEDs fail to control seizures in nearly one-third of epileptic patients, in whom seizures persist for a lifetime. Automated seizure detection systems have the potential to improve diagnostic accuracy compared with manual observation of EEG signals. Signal feature computation followed by classification are the major steps in computerized seizure detection[7]. EEG signals are a powerful tool in different clinical applications for the diagnosis of brain disorders. The computational complexity of the signal processing algorithms is a prime factor in real-time seizure detection systems[8].
Recently, several automated EEG classification algorithms based on empirical mode decomposition (EMD) and the tunable-Q wavelet transform (TQWT) have been proposed for single-channel univariate signals in 2-class and 3-class problems. Hassan AR et al proposed an automated epilepsy diagnosis scheme based on TQWT and bootstrap aggregating, which revealed the applicability of spectral features in the TQWT domain in combination with bagging[9]. Based on statistical moment features extracted using EMD, sleep scoring from single-channel EEG using adaptive boosting and decision trees was presented by Hassan AR et al[10]. A study on normal inverse Gaussian (NIG) pdf modeling of TQWT sub-bands for sleep stage classification was also presented by Hassan AR et al[11]. In addition, Hassan AR et al developed an automated sleep staging method using TQWT and different spectral features obtained from the TQWT sub-bands, with classification performed by a random forest (RF) classifier[12]. Hassan AR et al have further demonstrated the efficacy of automated EEG signal processing systems using TQWT and EMD[13–16].
In the above literature, EEG signals are classified using statistical and entropy features, and the decomposition of the EEG signal and the computation of features are performed separately in two steps using either ensemble EMD (EEMD) or TQWT. Such single-stage decomposition increases the computational complexity of the classifiers, since all sub-bands and features from a single decomposition level must be included. The main advantage of EEMD lies in exploiting the statistical characteristics of white noise when deciding the final number of IMFs. TQWT, in turn, can operate on signals with little or no oscillatory behavior using a low Q-factor as well as on highly oscillatory signals using a relatively high Q-factor; by adjusting its Q-factor it may therefore be used as an effective algorithm for EEG signal analysis. In this paper, the advantages of EEMD and TQWT are combined to extract optimum features in two stages and thereby enhance the classification accuracy of a real-time seizure detection system.
The proposed joint EEMD-TQWT scheme is shown in Fig. 1. In the first step, the EEG signals from the normal and seizure classes are decomposed using EEMD. Since most of the oscillatory information about the signals lies in the first few IMFs, four initial IMFs are selected for further decomposition using TQWT, based on optimum values of Kraskov entropy (KraEn), permutation entropy (PermEn), and sample entropy (SampEn). The sub-bands of the signals are obtained by choosing low Q values for IMFs with low oscillatory behavior and high Q values for more oscillatory IMFs. In the second step, experimental results show that the centered correntropy (CenCorrEn) features computed only for the 1st and 16th sub-bands of the selected IMFs yield the highest classification accuracy. The first sub-band of TQWT represents the high-pass sub-band and the last the low-pass sub-band. Since 4 IMFs are taken from the EEMD step, a total of 16 sub-bands are produced by the TQWT step. The features from these sub-bands represent the oscillatory behavior of the decomposed signals. Moreover, with a higher number of sub-bands, a reasonably good resolution can be attained concurrently in the high- and low-frequency regions of the input signals. The selection of optimum features from this set is done using the wrapper method of feature selection[17]. Finally, a comparison of the classification performance of the LS-SVM and RF classifiers is presented.
This research work used two datasets, namely a public dataset and a local dataset. The public dataset (http://www.dtic.upf.edu/~ralph/sc/) contains multichannel intracranial EEG records of 5 patients suffering from pharmacoresistant temporal lobe epilepsy. Each EEG signal was sampled at 512 Hz and contained 10 240 samples, and a band-pass filter with a frequency range of 0.5 to 150 Hz was used to remove noise[18]. In this dataset, 3 750 files belonged to each class, focal and non-focal; 80% of the dataset was used to train the machine learning algorithm, and its performance was also tested on the local dataset. The local dataset contained EEG data recorded from 10 subjects. Feature extraction and classification algorithms usually perform exceedingly well on such online datasets; however, the performance of the same system deteriorates considerably when tested on real-time data recorded from patients, owing to the unbalanced data resulting from the recording and sampling process in real time. In this work, all four entropy values were computed for multiple channels (19 channels) of both EEG datasets for the detection of seizure activity. The local dataset contained EEG from 5 healthy subjects (3 male and 2 female), a few with a history of chronic headache and none with symptoms of epileptic seizures, and 5 epileptic subjects (3 male and 2 female), aged between 10 and 35 years. The extracranial recording was done with eyes open and closed using the standard 10–20 electrode placement. Out of the 5 epileptic subjects, only one child subject, aged 12 years, was recorded with drug-induced sleep EEG. The recording was done in 10-to-15-minute sessions. An expert neurologist was consulted to read the EEG data manually and identify epileptic seizures from the specific channels of the recording system. The system used was a Recorders & Medicare Systems (RMS) data acquisition unit following the 10–20 standard, which enabled recording of EEG from 19 active unipolar channels (FP1, FP2, F7, F3, Fz, F4, F8, T3, C3, CZ, C4, T4, T5, P3, PZ, P4, T6, O1, and O2) with A1 and A2 as references, for a total of 21 electrodes placed on the left and right sides of the subject's head. To maintain uniformity between the data samples taken from the public and local datasets, these 19 channels were selected to match the public dataset. An electrode fixed midway between FP1 and FP2 on the subject's forehead served as the ground. For this pilot research work, a time series of approximately 23 seconds (4 000 samples) from the EEG recording was selected for the 5 healthy and 5 epileptic subjects under the expert observation of the neurologist, as shown in Fig. 2. The recording contained 176 samples per second, i.e., the EEG data were sampled at 176 Hz. From the original recordings and manual observation, only artifact-free EEG segments were extracted from the large datasets.
In the time-scale domain, the sparsity of the IMFs obtained from EEMD is best represented by decomposing them using TQWT. The novel technique of applying TQWT to the IMFs to characterize the oscillatory behavior of the EEG was used on the focal (F) and non-focal (NF) signals. The assessment and selection of the IMFs were based on the optimum values of the Kraskov, sample, and permutation entropies.
EMD is an adaptive joint time-frequency data analysis scheme[18]. The method works on the principle of decomposing the EEG signal into a finite set of oscillatory elements called intrinsic mode functions (IMFs). The IMFs are considered zero-mean, amplitude- and frequency-modulated (AM-FM) components. A correct IMF should satisfy two conditions[18]: (1) in the complete dataset, the total number of extrema and the number of zero-crossings must be either equal or differ at most by one; and (2) the mean of the two envelopes, one defined by connecting the local maxima and the other by the local minima, must be zero at every point.
In EMD the original signal x(t) is expressed as
$x(t)=\sum\nolimits_{j} c_j(t)+r(t)$  (1)

$c_j(t)=\mathrm{Re}\left\{a_j(t)\exp\left[\mathrm{i}f_j(t)\right]\right\}=\mathrm{Re}\left\{a_j(t)\exp\left[\mathrm{i}\int_{-\infty}^{t}\omega_j(t')\,\mathrm{d}t'\right]\right\}$  (2)

where $c_j(t)$ are the IMFs, $r(t)$ is the final residue, and $a_j(t)$, $f_j(t)$, and $\omega_j(t)$ are respectively the instantaneous amplitude, phase, and frequency of the $j$-th IMF.
Although the data-adaptive EMD is a powerful method for decomposing nonlinear signals, it suffers from the frequent emergence of mode mixing, in which a single IMF either consists of signals of widely dissimilar scales or a signal of a similar scale appears in different IMFs. An improved noise-assisted data analysis (NADA) technique, known as EEMD, was suggested to eliminate this drawback[19].
In EEMD, an ensemble of data sets is generated by adding uniformly distributed white noise of finite amplitude 'a' to the original signal x(t). EMD analysis is then applied to each data series of the ensemble, and the IMFs are finally obtained by averaging the individual components of every realization over the ensemble. In the EEMD algorithm, parameters such as the ensemble number (ne), the number of siftings (ns), the amplitude of the white noise (a), the number of IMFs (nimf), and the IMFs themselves (cm) are defined based on the time complexity of the signal being decomposed. The algorithm is initialized by setting the IMF value [cm(t)] to zero; the resulting signal is decomposed into IMFs and the trial loop is executed up to the maximum value of ne. The amplitude of the residual noise remaining in the ensemble mean of the computed IMFs is given by[20]

$a_f = a/\sqrt{n_e}$  (3)
In this work, decomposition of the bivariate EEG signals was done by adding white noise with an optimum amplitude of 0.4 and an ensemble number of 10, based on experimentation performed on both datasets. The IMFs extracted from sample non-focal and focal EEG channel-difference signals using the EEMD technique are depicted in Fig. 3 and 4.
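Below is a minimal sketch of this EEMD decomposition step, assuming the open-source Python package PyEMD (the EMD-signal distribution) as a stand-in for the MATLAB EEMD routine used here; its `trials` and `noise_width` parameters are set to the ensemble number and noise amplitude reported above, and the input signal is a placeholder.

```python
# A minimal sketch of the EEMD decomposition step; PyEMD is assumed as a
# stand-in for the MATLAB EEMD routine used in the study.
import numpy as np
from PyEMD import EEMD

fs = 176                              # sampling rate of the local dataset (Hz)
t = np.arange(4000) / fs              # ~23-second segment, as in the experiments
x = np.random.randn(4000)             # placeholder for a channel-difference EEG signal

# Noise amplitude 0.4 and ensemble size 10, as selected experimentally above.
eemd = EEMD(trials=10, noise_width=0.4)
imfs = eemd.eemd(x, t, max_imf=8)     # decompose into (at most) 8 IMFs

print(imfs.shape)                     # (n_imfs, 4000); the first 4 IMFs feed the TQWT stage
```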
The tunable Q-factor is the prime quality of the TQWT that makes it well suited for the analysis of IMFs with varying oscillatory nature. Ideally, the Q-factor of a wavelet transform, defined as the ratio of the center frequency of an oscillatory signal to its bandwidth, should be selected according to the oscillatory frequency components in the signal[14]. A fixed Q-factor, however, limits the decomposition of nonlinear IMFs: highly oscillatory IMFs should be analyzed with a large value of Q, whereas less oscillatory IMFs require smaller values of Q. The TQWT is parameterized by its Q-factor and its redundancy. For decomposing the selected IMFs, the TQWT parameters are the Q-factor (Q), the total over-sampling or redundancy (r), and the number of decomposition stages (J). The number of oscillations of the wavelet is determined by Q, which can be tuned through the filter-bank scaling parameters α and β.
The parameters r and Q for J decomposition levels are defined as[14]

$r=\dfrac{\beta}{1-\alpha}; \qquad Q=\dfrac{\omega_c}{BW}=\dfrac{2-\beta}{\beta}$  (4)

where $\omega_c$ is the center frequency and BW is the bandwidth of the frequency response; the decomposition results in J sub-bands.
The selection of the Q-factor and 'r' depends on the number of decomposition levels required to cover the IMF's frequency range, and the filter bank parameters α and β are selected to achieve the desired values. For a constant value of Q, the overlap between neighboring frequency responses increases as 'r' increases. Although the construction of the TQWT filters is based on non-rational transfer functions, they are straightforward to implement.
The multi-stage TQWT decomposition of the IMFs is achieved by repeatedly connecting a two-band filter bank to its low-pass channel. A J-level TQWT is depicted in Fig. 5.
At each level of decomposition, the input signal x(n) is split into low-pass and high-pass signal components. For an input sampling rate fs, the resulting low-pass and high-pass sampling frequencies are αfs and βfs, respectively, where α and β are the same scaling constants for all levels of decomposition. The low-pass and high-pass sub-bands are computed using their respective frequency responses and scaling factors, which are determined as[14]
$F_l(\omega)=\begin{cases}1, & |\omega|<(1-\beta)\pi\\ \theta\!\left(\dfrac{\omega+(\beta-1)\pi}{\alpha+\beta-1}\right), & (1-\beta)\pi\leqslant|\omega|<\alpha\pi\\ 0, & \alpha\pi\leqslant|\omega|\leqslant\pi\end{cases}$  (5)

$F_h(\omega)=\begin{cases}0, & |\omega|<(1-\beta)\pi\\ \theta\!\left(\dfrac{\alpha\pi-\omega}{\alpha+\beta-1}\right), & (1-\beta)\pi\leqslant|\omega|<\alpha\pi\\ 1, & \alpha\pi\leqslant|\omega|\leqslant\pi\end{cases}$  (6)
where θ(ω) is expressed as[13]
$\theta(\omega)=\dfrac{1}{2}\left(1+\cos\omega\right)\sqrt{2-\cos\omega}, \qquad |\omega|\leqslant\pi$  (7)
Finally, the reconstruction of the original input signal is accomplished using the same filter banks as in the decomposition stage.
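As a minimal numerical sketch of equations (4)–(7), the following snippet (assuming Q = 3 and r = 3, the values used later in the experiments) computes the scaling factors α and β and evaluates the low-pass and high-pass frequency responses; it illustrates the filter-bank responses only and is not a full TQWT implementation.

```python
# Sketch of the TQWT filter-bank quantities of equations (4)-(7), assuming
# Q = 3 and r = 3 as used later in the experiments.
import numpy as np

Q, r = 3.0, 3.0
beta = 2.0 / (Q + 1.0)          # from Q = (2 - beta) / beta
alpha = 1.0 - beta / r          # from r = beta / (1 - alpha)

def theta(w):
    """Transition function of equation (7)."""
    return 0.5 * (1.0 + np.cos(w)) * np.sqrt(2.0 - np.cos(w))

def lowpass_response(w):
    """Low-pass frequency response F_l(omega) of equation (5)."""
    w = np.abs(w)
    out = np.zeros_like(w)
    passband = w < (1.0 - beta) * np.pi
    transition = (~passband) & (w < alpha * np.pi)
    out[passband] = 1.0
    out[transition] = theta((w[transition] + (beta - 1.0) * np.pi) / (alpha + beta - 1.0))
    return out

def highpass_response(w):
    """High-pass frequency response F_h(omega) of equation (6)."""
    w = np.abs(w)
    out = np.zeros_like(w)
    stopband = w < (1.0 - beta) * np.pi
    transition = (~stopband) & (w < alpha * np.pi)
    out[~stopband & ~transition] = 1.0
    out[transition] = theta((alpha * np.pi - w[transition]) / (alpha + beta - 1.0))
    return out

w = np.linspace(0, np.pi, 512)
print(alpha, beta)                                          # per-level scaling factors
print(lowpass_response(w)[:5], highpass_response(w)[-5:])   # band edges behave as in (5)-(6)
```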
Normally, the classification of seizure and non-seizure EEG signals is achieved using a set of diagnostic features extracted from the signal components. Feature sets with consistent intra-class characteristics and pronounced inter-class fluctuations are well suited for automated systems; however, the proper selection of feature sets and their discrimination ability depend on the signal decomposition stage. Entropy features, originating from Shannon's work, measure the spread of the data: entropy quantifies the unpredictability within a signal and measures the complexity of nonlinear signals. Acharya UR et al presented a review of applications of entropies in the automated diagnosis of epilepsy[21].
The entropy measure is useful in analyzing nonstationary and nonlinear EEG signals. Shannon's entropy for a data source S is defined as
$H(S)=-\int f(s)\log[f(s)]\,\mathrm{d}s$  (8)

where f(s) is the probability density function (pdf) of S. The differential entropy of nonlinear signals is estimated here using KraEn[22]. In this work, KraEn is computed for the EEG signals using the k-nearest neighbor distances of the samples; by default, the Euclidean distance has been used for estimating H(S) from the 'n' samples of signal 'S'. Equation 8 can be regarded as the average of the log-density over the samples, giving the estimate

$\hat{H}_k(S)=-\dfrac{1}{N}\sum\limits_{i=1}^{N}\log\mu(s_i)$  (9)

Estimating the log-density from the k-nearest neighbor distances yields[22]

$\hat{H}_k(S)=-\psi(k)+\psi(N)+\log C_d+\dfrac{d}{N}\sum\limits_{i=1}^{N}\log\varepsilon(i)$  (10)

where d is the dimension of the source S, $C_d$ is the volume of the d-dimensional unit ball, $\psi(\cdot)$ is the digamma function, and $\varepsilon(i)$ is twice the distance from the i-th sample to its k-th nearest neighbor[22].
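A minimal sketch of the Kraskov k-nearest-neighbor entropy estimate of equation (10) is given below, assuming scalar samples and k = 2 as used in this work; SciPy's KD-tree and digamma function are used, and the input IMF is a placeholder.

```python
# Sketch of the Kraskov k-nearest-neighbour entropy estimate of equation (10).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def kraskov_entropy(x, k=2):
    """Estimate H(S) from samples x (shape (N,) or (N, d)) via equation (10)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # Distance to the k-th nearest neighbour (the query returns the point itself first).
    dist, _ = tree.query(x, k=k + 1)
    eps = 2.0 * dist[:, -1]
    eps = np.maximum(eps, np.finfo(float).tiny)       # guard against log(0)
    c_d = np.pi ** (d / 2.0) / gamma(1.0 + d / 2.0)   # volume of the d-dimensional unit ball
    return -digamma(k) + digamma(n) + np.log(c_d) + d * np.mean(np.log(eps))

imf = np.random.randn(4000)          # placeholder for one EEMD intrinsic mode function
print(kraskov_entropy(imf, k=2))
```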
By combining the concepts of entropy and symbolic dynamics, Bandt and Pompe introduced PermEn as a measure of the complexity of nonlinear signals, defined as[23]

$H_p(S)=-\sum\limits_{i=1}^{n!}p'_i\log(p'_i)$  (11)

where $p'_i$ is the relative frequency of the i-th ordinal pattern (permutation) of order n in the signal.
The PermEn per symbol is given by

$h_p(S)=-\dfrac{1}{n-1}\sum\limits_{i=1}^{n!}p'_i\log(p'_i)$  (12)

where n is the permutation order and $p'_i$ are the ordinal pattern frequencies defined above.
The PermEn is computed with permutation order 'n' using a method that treats equal values according to their first occurrence order[23].
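A minimal sketch of the permutation entropy computation of equations (11) and (12) is shown below, assuming order n = 3 and unit time lag, with ties resolved by first occurrence via a stable sort; the input IMF is a placeholder.

```python
# Sketch of permutation entropy (equations (11)-(12)) for a 1-D signal.
import numpy as np

def permutation_entropy(x, order=3, lag=1, per_symbol=False):
    """Permutation entropy using ordinal patterns of length `order`."""
    x = np.asarray(x, dtype=float)
    n_patterns = len(x) - (order - 1) * lag
    idx = np.arange(0, order * lag, lag)
    # Map each delay vector to its ordinal pattern (ties broken by first occurrence).
    patterns = np.array([np.argsort(x[i + idx], kind="stable") for i in range(n_patterns)])
    # Relative frequency p'_i of each observed pattern.
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    h = -np.sum(p * np.log(p))                    # equation (11)
    return h / (order - 1) if per_symbol else h   # equation (12) when per_symbol=True

imf = np.random.randn(4000)                       # placeholder IMF
print(permutation_entropy(imf, order=3))
```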
SampEn, devised by Richman and Moorman[24], is a measure of the unpredictability of a time series, i.e., how much a given data point depends on the values of the k previous data points, averaged over the entire signal. Like approximate entropy, it is a measure of data regularity, but SampEn is largely independent of the record length. Irregularity in the EEG data is associated with larger values of SampEn.
The SampEn is computed by finding points that match within the tolerance 'r', where the numbers of template matches are stored in counters A(dim) and B(dim) for all template lengths dim up to the maximum. SampEn is defined by

$\mathrm{SampEn}(dim,r,N)=-\ln\left[\dfrac{A(dim)}{B(dim-1)}\right]$  (13)

where B(0) = N is the length of the input series, dim is the embedding dimension, and r is the tolerance value. The false nearest neighbor criterion has been used in this work to select the embedding for SampEn extraction; with an appropriate k value the fraction of false neighbors reaches zero. In this work, k = 2 was the appropriate value for the SampEn computation of the IMFs[24].
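The following is a minimal sketch of the sample entropy computation, assuming an embedding dimension of 2 and a tolerance of 0.2 times the standard deviation as common defaults; the template lengths and tolerance actually used in this work are described in the Results section.

```python
# Sketch of sample entropy: -ln(A/B), with A and B the counts of (m+1)- and m-length matches.
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r, N) of a 1-D signal; r defaults to 0.2 * std(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def count_matches(dim):
        # Build overlapping templates of length `dim` and count pairs whose
        # Chebyshev distance is within the tolerance r (self-matches excluded).
        templates = np.array([x[i:i + dim] for i in range(n - dim)])
        count = 0
        for i in range(len(templates)):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist <= r)
        return count

    b = count_matches(m)        # matches of length m
    a = count_matches(m + 1)    # matches of length m + 1
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

imf = np.random.randn(2000)     # placeholder IMF
print(sample_entropy(imf, m=2))
```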
Within a window controlled by the kernel size, correntropy directly specifies the probability density of the closeness of two random variables[25]. Cross-correlation measures the similarity between two signals in the time domain, and its entropy-like generalization is known as correntropy, a similarity measure for multichannel signals defined in a nonlinear feature space. Correntropy is defined by

$C(r,t)=E\left[\left\langle f(x_r),f(x_t)\right\rangle\right]$  (14)

where $E[\cdot]$ denotes statistical expectation and $f(\cdot)$ is a nonlinear mapping into a kernel-induced feature space, so that the inner product can be evaluated through a Mercer kernel

$K(x_r,x_t)=\left\langle f(x_r),f(x_t)\right\rangle$  (15)

This kernel trick yields a relevant feature space without precise knowledge of the nonlinear mapping, so that correntropy reduces to the expected value of a kernel of a particular choice. The most commonly used Mercer kernel is the Gaussian kernel, denoted by

$K(x_r,x_t)=\dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{\left\|x_r-x_t\right\|^2}{2\sigma^2}}$  (16)
where σ is the size of the kernel.
The Fourier transform of the correntropy function has been introduced as a generalized power spectral density, defined as the correntropy spectral density (CSD)[25]:

$P_v(\omega)=\sum\limits_{m=-\infty}^{\infty}V[m]\,e^{-\mathrm{j}\omega m}$  (17)
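A minimal sketch of the centered correntropy (CenCorrEn) between two sub-band signals is given below; a Gaussian kernel is assumed for illustration, whereas the experiments reported later use the Laplacian kernel of the ITL toolbox.

```python
# Sketch of centered cross-correntropy between two sub-band signals,
# assuming a Gaussian kernel (the study itself uses a Laplacian kernel).
import numpy as np

def centered_correntropy(x, y, sigma=1.0):
    """E[k(x, y)] over paired samples minus the all-pairs (centering) term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def gaussian_kernel(a, b):
        return np.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

    v = np.mean(gaussian_kernel(x, y))                       # paired-sample estimate
    centering = np.mean(gaussian_kernel(x[:, None], y[None, :]))  # all-pairs estimate
    return v - centering

sub_band_x = np.random.randn(1000)   # placeholder: 16th TQWT sub-band, channel x
sub_band_y = np.random.randn(1000)   # placeholder: 16th TQWT sub-band, channel y
print(centered_correntropy(sub_band_x, sub_band_y, sigma=1.0))
```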
In machine learning and statistics, classification deals with identifying the class to which a data sample in a database belongs. In this work, the EEG signal is classified into seizure and non-seizure classes. A supervised learning setting is considered, in which a training set of correctly labeled samples is available for assessing the performance of three classifiers.
Multilayer feed-forward neural networks (MFNN)[26] use multiple neurons interconnected in the basic format of an input layer, multiple hidden layers, and an output layer. The basic functionality of a multilayer neural network is the same as that of a traditional one-hidden-layer neural network[26]. In the proposed study, four features were used to train the MLPNN and its performance was evaluated. Training and testing of the MLPNN algorithm begins with configuring the neural network's parameters, such as the hidden neurons, transfer functions, and training algorithms, together with the initialization of the network weights and biases. A structure with 10 hidden neurons was chosen, with the 'tansig' function as the input-hidden transfer function, and an optimal value was chosen for the hidden-output transfer function. Four different supervised training functions were tested to decide the optimum performance. The performance function selected was the MSE, with the number of epochs equal to the number of multivariate recordings (750), a performance goal of 0.001, 10 hidden neurons, a maximum training time, and a maximum of 6 validation failures. The network training and testing used a standard division of 70% training and 30% testing in these cases. The evaluation was based on the simulation results, the computation of the average MSE, and the average classification accuracy (CAcc).
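A rough scikit-learn analogue of the MLPNN configuration described above is sketched below; the study itself uses MATLAB's neural network toolbox, so the solver and activation choices here are approximations and the feature matrix is a placeholder.

```python
# Rough scikit-learn analogue of the MLPNN described above; not the MATLAB toolbox itself.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(750, 4))          # 750 recordings x 4 entropy features (placeholder)
y = rng.integers(0, 2, size=750)       # 0 = normal, 1 = seizure (placeholder labels)

# 70% / 30% split and 10 hidden neurons with a tanh activation, mirroring the
# "tansig" input-hidden transfer function used in the study.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(10,), activation="tanh",
                    solver="lbfgs", max_iter=750, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```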
Support vector machines (SVM) are dominant in nonlinear classification, function estimation, and density estimation in kernel-based schemes. The SVM originates from statistical learning theory and structural risk minimization, where quadratic programming is used to solve the convex optimization problem. The standard SVM is reformulated to obtain the least squares support vector machine (LS-SVM), which has been used successfully in the classification of the Bonn EEG datasets[27]. In this work, the classification performance of the LS-SVM was assessed because of its extensive application in the machine learning domain.
For a given training set $\{(x_n,y_n)\}_{n=1}^{N}$ with inputs $x_n$ and class labels $y_n\in\{-1,+1\}$, the classifier takes the form

$y(x)=\mathrm{sign}\left[W^{T}\varphi(x)+b\right]$  (18)

with $W$ the weight vector, $b$ the bias term, and $\varphi(\cdot)$ the nonlinear mapping into a higher-dimensional feature space.
Thus for separable data, the assumptions are:
$\begin{cases}W^{T}\varphi(x_n)+b\geqslant+1, & \text{if } y_n=+1\\ W^{T}\varphi(x_n)+b\leqslant-1, & \text{if } y_n=-1\end{cases}$  (19)
which is equivalent to
$y_n\left[W^{T}\varphi(x_n)+b\right]\geqslant1, \qquad n=1,\ldots,N$  (20)

In the LS-SVM technique, the optimal separating hyperplane is determined by maximizing the distance from each class to the hyperplane. The optimization problem can be written as[28]

$\min\limits_{\omega,\xi}\ \Im(\omega,\xi)=\dfrac{1}{2}\omega^{T}\omega+c\sum\limits_{k=1}^{N}\xi_k$  (21)

subject to the constraints

$y_k\left[\omega^{T}\varphi(x_k)+b\right]\geqslant1-\xi_k, \qquad \xi_k\geqslant0, \qquad k=1,\ldots,N$  (22)

The corresponding Lagrangian is

$\mathcal{L}(\omega,b,\xi;\alpha,\nu)=\Im(\omega,\xi)-\sum\limits_{k=1}^{N}\alpha_k\left\{y_k\left[\omega^{T}\varphi(x_k)+b\right]-1+\xi_k\right\}-\sum\limits_{k=1}^{N}\nu_k\xi_k$  (23)

with Lagrange multipliers $\alpha_k\geqslant0$ and $\nu_k\geqslant0$; the optimality conditions follow from setting the derivatives of the Lagrangian with respect to $\omega$, $b$, and $\xi_k$ to zero. After solving the resulting system of equations, the LS-SVM classifier is given by

$f(x)=\mathrm{sign}\left[\sum\limits_{k=1}^{N}\alpha_k y_k K(x,x_k)+b\right]$  (24)

where $K(\cdot,\cdot)$ represents a kernel function. In this paper, the radial basis function kernel has been used for the 2-class problem; it is defined by

$K(x,x_k)=e^{-\frac{\left\|x-x_k\right\|^{2}}{2\sigma^{2}}}$  (25)
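A rough stand-in for the LS-SVM classifier with an RBF kernel is sketched below using scikit-learn's standard SVC; the LS-SVM toolbox used in this work solves a least squares variant, so the mapping of its parameters (σ² and gam) onto gamma and C is only approximate, and the feature data are placeholders.

```python
# Rough stand-in for the LS-SVM with an RBF kernel using scikit-learn's SVC.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))        # placeholder: 16 CenCorrEn features per sample
y = rng.integers(0, 2, size=1000)      # 0 = non-seizure, 1 = seizure (placeholder labels)

# sigma^2 = 0.4 corresponds to gamma = 1 / (2 * sigma^2) in scikit-learn's RBF kernel;
# the LS-SVM regularization constant (gam = 10) maps only loosely onto C here.
svc = SVC(kernel="rbf", gamma=1.0 / (2 * 0.4), C=10.0)
scores = cross_val_score(svc, X, y, cv=10)   # 10-fold cross-validation as in the experiments
print("mean CV accuracy:", scores.mean())
```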
RFs are a combination of tree predictors and have emerged as an efficient classification tool because they do not overfit. The accuracy of RF classifiers improves with the injection of the right type of randomness and depends on the strength of the individual predictors and the correlations among them. RFs produce results competitive with boosting and adaptive bagging without changing the training set. Building on the construction of single classification trees, an RF grows several trees for classification. To classify a new sample, its feature values are passed down each tree in the forest; every tree then gives a classification, i.e., it "votes" for that class, and the forest chooses the class with the most votes over all trees[29].
For M input variables, a number m << M is specified so that at each node, m variables are randomly selected out of the M and the best split on these m variables is used to divide the node. The value m = 10 was used and held constant during the growth of the forest.
In this work, a 3-class problem (normal, pre-ictal, and ictal) is addressed using the RF classifier owing to its suitability for multiclass classification compared with the SVM. A random vector ςn is used to grow the nth tree, and the number of feature samples in the training set is xN = 100. The tree classifier C(xN, ςn) for the 3-class problem was developed using the feature samples and the random vector.
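A minimal sketch of such a 3-class random forest classifier is given below using scikit-learn; the tree count, per-split feature count, and terminal-node size follow the values discussed in the Results section, and the feature data are placeholders.

```python
# Sketch of the 3-class random forest classifier using scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 16))               # placeholder CenCorrEn feature vectors
y = rng.integers(0, 3, size=1000)             # 0 = normal, 1 = pre-ictal, 2 = ictal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=250,       # errors stabilize beyond ~250 trees
                            max_features=10,        # m = 10 variables tried at each split
                            min_samples_leaf=15,    # minimum samples at a terminal node
                            random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```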
The experiment used datasets consisting of difference signals between adjacent channels. Differencing stabilizes the mean of the EEG signals by eliminating changes in signal level, thereby removing seasonality and drift. First, signals from the various channels were decomposed into band-limited, oscillatory, and symmetric IMFs. In this work, pairs of seizure and non-seizure EEG signals were decomposed using the EEMD algorithm to obtain 8 IMFs, with a standard deviation error of 0.4 and an ensemble number of 10. Three entropy parameters, namely KraEn, PermEn, and SampEn, were computed for the 8 IMFs. Since most of the oscillatory information about the underlying signal is contained in the first few IMFs[18], the first four IMFs were selected for TQWT decomposition. Accuracy, sensitivity, specificity, and receiver operating characteristic (ROC) curves are standard evaluation parameters for machine learning algorithms and have been used to analyze classifier performance. The maximum values of the 3 entropy features obtained from the 8 IMFs of the real-time dataset are shown in Table 1.
| IMF | Feature parameters | Real-time dataset: FP-channel | Real-time dataset: P-channel |
|---|---|---|---|
| IMF1 | KraEn | 5.09 | 4.92 |
| | PermEn | 1.64 | 1.65 |
| | SampEn | 2.15 | 2.16 |
| IMF2 | KraEn | 2.79 | 2.66 |
| | PermEn | 2.58 | 2.58 |
| | SampEn | 2.19 | 2.18 |
| IMF3 | KraEn | 2.40 | 2.27 |
| | PermEn | 2.29 | 2.29 |
| | SampEn | 2.19 | 2.19 |
| IMF4 | KraEn | 3.03 | 3.04 |
| | PermEn | 1.74 | 1.74 |
| | SampEn | 1.89 | 1.89 |
| IMF5 | KraEn | 4.84 | 4.62 |
| | PermEn | 1.31 | 1.32 |
| | SampEn | 2.14 | 2.13 |
| IMF6 | KraEn | 3.88 | 3.65 |
| | PermEn | 1.21 | 1.21 |
| | SampEn | 2.18 | 2.17 |
| IMF7 | KraEn | 3.31 | 3.16 |
| | PermEn | 1.11 | 1.11 |
| | SampEn | 2.18 | 2.17 |
| IMF8 | KraEn | 3.11 | 2.95 |
| | PermEn | 1.07 | 1.06 |
| | SampEn | 1.94 | 1.94 |

IMF: intrinsic mode functions; KraEn: Kraskov entropy; PermEn: permutation entropy; SampEn: sample entropy.
A total of 2 562 samples of the normal and seizure EEG datasets were considered owing to computational complexity[30]. By employing EEMD, the feature set for the 2 562 samples of the normal and seizure EEG signals was found to be computationally efficient, with an average computation time of 11.63 seconds. The algorithm was tested in MATLAB 9.0.1 on an Intel® Core™ i5-7200U CPU @ 2.5 GHz with 8 GB RAM. The computation time was obtained over all 2 562 samples for each of the entropy features and is depicted in Table 2, which shows the average computation times for 100 randomly chosen data samples from both signal classes. The minimum average time for the computation of KraEn, PermEn, and SampEn was 0.042 seconds for each IMF. These computation times indicate the potential of the proposed EEG classification algorithm for real-time hardware implementation. The CenCorrEn of the 16th sub-band of TQWT is computed separately, and its computation time has not been considered here.
| Features | Computation time (second) |
|---|---|
| KraEn | 0.055 |
| PermEn | 0.043 |
| SampEn | 0.050 |

KraEn: Kraskov entropy; PermEn: permutation entropy; SampEn: sample entropy.
The KraEn estimates the Shannon entropy of the dataset using the k-nearest neighboring samples; computed on the IMFs with the Euclidean distance, it produced a valuable feature set for the 2-nearest-neighbor case. The PermEn computes the permutation entropy H of the IMFs using the permutation order n, the time lags tau, and a method to treat equal values; these parameters describe the accuracy of the values in the IMF data series by the number of decimal places. In this work, values of n = 1, 2, 3 were tested, and the optimum PermEn feature set was obtained for n = 3, for which the maximum classification accuracy was achieved. The SampEn computes the sample entropy of the input signal vector for a maximum template length M, whose default value is M = 5. Experimental values of M = 1, 2, 3, 4, 5 were employed in this work. When SampEn is estimated for M = 1, ..., M−1, the feature set size increases along with the standard error estimates (se), and therefore the numbers of matches for m = 1, ..., M and m = 0, ..., M−1 are computed separately. In this work, the maximum accuracy of the RF classifier was obtained for the SampEn feature vector computed with M = 1.
Initially, 100 EEG signals (50 each from the normal and seizure classes) were selected to compute the entropy features. Using 8 IMFs and 3 entropy parameters, the size of the feature vector became 42×1 000. In the next step, to minimize classifier complexity, the IMFs carrying the best possible entropy values were selected for further decomposition using TQWT by means of Student's t-test (P≤0.05)[31]. The reduced feature set after performing the test, of size 12×1 000, is shown in Table 3; a minimal sketch of this t-test based selection follows the table.
| Feature | IMF | P-value |
|---|---|---|
| KraEn | IMF1 | 0.0413 |
| | IMF2 | 0.0365 |
| | IMF3 | 0.0351 |
| | IMF4 | 0.0257 |
| PermEn | IMF1 | 0.0371 |
| | IMF2 | 0.0194 |
| | IMF3 | 0.000084 |
| | IMF4 | 0.000057 |
| SampEn | IMF1 | 0.000147 |
| | IMF2 | 0.000026 |
| | IMF3 | 0.000054 |
| | IMF4 | 0.000135 |

IMF: intrinsic mode functions; KraEn: Kraskov entropy; PermEn: permutation entropy; SampEn: sample entropy.
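Below is a minimal sketch of this t-test screening step, assuming placeholder per-class entropy values for each IMF; SciPy's two-sample t-test is used and features with P ≤ 0.05 are retained.

```python
# Sketch of the Student's t-test screening used to shortlist IMF entropy features.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_imfs, n_per_class = 8, 500
feature_names = [f"KraEn_IMF{i+1}" for i in range(n_imfs)]

# Placeholder entropy values: one row per recording, one column per IMF.
normal_features = rng.normal(loc=2.0, scale=0.3, size=(n_per_class, n_imfs))
seizure_features = rng.normal(loc=2.2, scale=0.3, size=(n_per_class, n_imfs))

selected = []
for j, name in enumerate(feature_names):
    _, p_value = ttest_ind(normal_features[:, j], seizure_features[:, j])
    if p_value <= 0.05:                   # keep only statistically significant features
        selected.append((name, p_value))

print(selected)                           # features retained for the TQWT stage
```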
Further, the IMF pairs were decomposed using TQWT for various values of Q and J. The maximum classification accuracy was obtained for sub-band 16 of the IMF pairs IMF1, IMF3, and IMF4 with Q = 3 and J = 16, and the minimum value of r was selected as 3. The decomposition of the 1st IMF of the seizure difference signal into the 1st to 8th and 9th to 16th sub-bands is shown in Fig. 6 and 7, respectively.
The CenCorrEn features computed from the 16th sub-band are shown in Table 4.
| Feature | Non-focal EEG: x-channel | Non-focal EEG: y-channel | Focal EEG: x-channel | Focal EEG: y-channel | P-value (average for all four signals of the first decomposed IMF) |
|---|---|---|---|---|---|
| CenCorrEn | 0.1582 | 0.1463 | 0.1763 | 0.1864 | 1.7381×10⁻²¹ |

CenCorrEn: centered correntropy; EEG: electroencephalogram.
The computation of CenCorrEn was done using the Laplacian kernel of the ITL toolbox available online (http://www.sohanseth.com/Home/codes). The MATLAB function computes the parametric CenCorrEn between two IMF vectors, IMFx and IMFy, where 'x' and 'y' are positive integers. In this work, the maximum entropy was obtained for IMF34 with the default parameter of [1, 0] and a kernel size (kSize) of 1. The CenCorrEn was computed for these IMF pairs to form the final feature set, which is of size 16×1 000. To select the optimum values of Q and J using CenCorrEn, experiments were performed to evaluate the classification accuracy for various values of Q and J of the TQWT. The LS-SVM classifier was implemented in MATLAB using the LS-SVM toolbox available online (http://www.esat.kuleuven.be/sista/lssvmlab/).
The optimum values of Q and J were selected experimentally by taking only the CenCorrEn features that produced the highest classification accuracy. First, the value of J was varied while keeping Q = 3. The maximum possible value of J with Q = 3 is 16; the value of J was tested from 4 to 18, CenCorrEn was computed for each sub-band, and the classification was then performed for each value of J from 4 to 18. The plot of classification accuracy versus J is shown in Fig. 8.
It was observed that the maximum classification accuracy is obtained at J = 16. The value J = 16 was then fixed to test the value of Q for maximum accuracy; the variation of accuracy with Q is shown in Fig. 9.
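A minimal sketch of this (Q, J) selection loop is given below, assuming a hypothetical helper extract_cencorren_features() standing in for the TQWT decomposition and CenCorrEn computation; cross-validated accuracy is used to pick the best setting, and all data are placeholders.

```python
# Sketch of the (Q, J) parameter sweep; the feature extractor is a hypothetical placeholder.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def extract_cencorren_features(signals, Q, J):
    # Hypothetical placeholder: the real pipeline would perform the TQWT
    # decomposition with the given Q and J and compute CenCorrEn of the 1st
    # and 16th sub-bands; random features are returned here for illustration.
    rng = np.random.default_rng(Q * 100 + J)
    return rng.normal(size=(len(signals), 2))

signals = [np.random.randn(4000) for _ in range(200)]   # placeholder IMF pairs
labels = np.random.randint(0, 2, size=200)              # placeholder class labels

best = (None, None, 0.0)
for Q in (1, 2, 3, 4):
    for J in range(4, 19):
        X = extract_cencorren_features(signals, Q, J)
        acc = cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean()
        if acc > best[2]:
            best = (Q, J, acc)

print("best (Q, J, accuracy):", best)
```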
The use of various transfer functions in the MLPNN, namely 'tansig', 'logsig', 'purelin', and 'elliotsig', resulted in comparable performance. The features were randomized before being fed into the MLP network, and the network was re-trained 10 times to calculate the average MSE. From the preliminary study, the number of hidden neurons was set to 10 and 'tansig' was set as the input-hidden transfer function. Four training functions, namely conjugate gradient with Powell-Beale restarts (CGB), variable learning rate backpropagation (GDX), Levenberg-Marquardt (LM), and scaled conjugate gradient (SCG), were compared using the neural network toolbox of MATLAB (MathWorks, Natick, USA). Fig. 10 depicts the average classification accuracy for the CGB, GDX, LM, and SCG training functions for the 2-class (normal-seizure) and 3-class (normal-pre-seizure-seizure) problems.
It was inferred from the simulation study that the tan-sigmoid and pure linear transfer functions were optimal for the input-hidden and hidden-output layers, respectively, with the LM training function for network learning. Four different classification tasks were considered to assess the performance of the MLPNN.
In the RF classifier, increasing the number of decision trees decreases the classification error; with fewer than 50 trees the errors were larger, and they varied only slightly once the number of trees exceeded 250. For the RF classifier, the minimum number of samples at a terminal node was set to 15 and the maximum tree depth was tuned by cross-validation. The maximum number of terminal nodes and the maximum number of features considered for a split were selected based on the square root of the total number of features; however, values up to 30%–40% of the total number of features were also tested to check for over-fitting. With 10-fold cross validation and a split of the dataset into 70% training and 30% test groups, the LS-SVM and RF classifiers yielded the classification accuracies shown in Table 5.
| Classifier | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|
| LS-SVM, Gaussian RBF kernel with σ² = 0.4 and gam = 10 | 85.2 | 83.0 | 84.5 |
| RF | 94.5 | 95.6 | 96.2 |

LS-SVM: least squares support vector machine; RF: random forest; RBF: radial basis function.
It was observed that the tuned RF classifier outperformed the LS-SVM classifier on the 3-class problem with the real-time dataset. The receiver operating characteristic (ROC) curves for these two classifiers are shown in Fig. 11.
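A minimal sketch of how ROC curves such as those in Fig. 11 can be generated with scikit-learn is shown below; the decision scores and labels are placeholders.

```python
# Sketch of ROC/AUC computation for the two classifiers; scores are placeholders.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, size=500)                        # placeholder ground-truth labels
scores_rf = y_true * 0.6 + rng.normal(scale=0.4, size=500)   # placeholder RF decision scores
scores_svm = y_true * 0.4 + rng.normal(scale=0.5, size=500)  # placeholder LS-SVM scores

for name, scores in [("RF", scores_rf), ("LS-SVM", scores_svm)]:
    fpr, tpr, _ = roc_curve(y_true, scores)
    print(name, "AUC =", round(auc(fpr, tpr), 3))
```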
The ROC curves show that the proposed method improved the classification performance significantly, from 84% to 96%, for the multichannel data samples. Multichannel EEG analysis over all 19 channels involves a large number of samples and is hence computationally complex; the multichannel signals were therefore grouped into 5 brain regions and the average of the signals from each region was considered, without reducing the number of channels. Table 6 shows the computation times of the 3 entropy parameters for these region-averaged signals.
| Brain region | Channels | KraEn time (second) | PermEn time (second) | SampEn time (second) |
|---|---|---|---|---|
| Region 1 | FP1, FP2, FZ, F3 | 3.57 | 2.52 | 3.34 |
| Region 2 | F7, T6, F7 | 3.41 | 2.49 | 3.25 |
| Region 3 | C3, P3, C4, P4, CZ, PZ | 3.58 | 2.53 | 3.29 |
| Region 4 | F4, F8, T3, T4 | 3.87 | 2.74 | 3.61 |
| Region 5 | O1, O2 | 3.27 | 2.81 | 3.11 |

EEG: electroencephalogram; KraEn: Kraskov entropy; PermEn: permutation entropy; SampEn: sample entropy.
It was observed that average computation times of 3.54, 2.62, and 3.32 seconds were required for KraEn, PermEn, and SampEn, respectively. As shown in Table 7, the computation time for CenCorrEn is relatively small (approximately 0.47 seconds per IMF), as it is computed only for the 1st and 16th sub-bands of the TQWT-decomposed IMFs.
| CenCorrEn feature | Computation time (second) |
|---|---|
| IMF-1 | 0.47 |
| IMF-2 | 0.48 |
| IMF-3 | 0.45 |
| IMF-4 | 0.47 |

CenCorrEn: centered correntropy; IMF: intrinsic mode function; TQWT: tunable-Q wavelet transform.
In Table 8, the results of recent algorithms in EEG research are compared with the proposed method. In this comparison, the proposed joint algorithm shows an improvement of approximately 2% in classification accuracy. In addition, a comparative experiment using a basic feature extraction method, with discrete wavelet transform (DWT) based entropy features fed to an RF classifier, yields 87% classification accuracy, whereas the proposed joint EEMD-TQWT method gives an improvement of 9.2% in accuracy. This study was carried out to verify the effectiveness of the proposed model against a baseline method.
| Authors | Features | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| Rajeev Sharma et al[12] | Entropy | LS-SVM (RBF kernel) | 86.00 | 88.00 | 84.00 |
| Vipin Gupta et al[32] | Entropy | LS-SVM (RBF kernel) | 94.41 | 93.25 | 95.57 |
| Baseline method | DWT-entropy | RF | 87.00 | 86.23 | 83.12 |
| Proposed work | Entropy | RF | 96.20 | 94.50 | 95.60 |

LS-SVM: least squares support vector machine; RBF: radial basis function; DWT: discrete wavelet transform; RF: random forest.
Finally, the joint EEMD-TQWT and RF algorithms for feature extraction and classification respectively have been implemented using an embedded system.
The EEG signal is captured with a low-cost 5-channel EMOTIV INSIGHT EEG sensor. The sensor kit records EEG from 5 channels, namely AF3, AF4, T7, T8, and Pz, with 2 references in the common mode sense (CMS) active electrode/driven right leg (DRL) passive electrode noise cancellation configuration. With a data transmission rate of 128 samples/second/channel, the recorded signal is available for decomposition and entropy feature extraction on a Raspberry Pi (RPi) module. The signals from each channel are processed by algorithms implemented in Python on an RPi model-B V1.2. The system works as an independent seizure detection module employing the proposed algorithms, and the stand-alone system may provide patients and neurologists with first-hand information about the diagnosis of seizure disorders. The limited number of channels is a major challenge of this system: the 5-channel EEG device performs better for generalized epileptic seizures. This limitation can be addressed in future by employing a wired 32-channel EEG recording instrument to acquire EEG signals from the entire scalp.
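A minimal sketch of the on-device processing loop is given below; the acquisition helper read_window(), the feature extractor, and the saved model file are hypothetical stand-ins, since the vendor SDK and the full EEMD-TQWT pipeline are not reproduced here.

```python
# Sketch of a stand-alone detection loop; acquisition, features, and model file are hypothetical.
import time
import numpy as np
import joblib

FS = 128                  # samples/second/channel, as reported for the headset
WINDOW_SECONDS = 4        # hypothetical analysis window length
N_CHANNELS = 5            # AF3, AF4, T7, T8, Pz

def read_window():
    """Hypothetical stand-in for the headset read; returns (channels, samples)."""
    return np.random.randn(N_CHANNELS, FS * WINDOW_SECONDS)

def extract_features(window):
    """Hypothetical stand-in for the EEMD-TQWT-CenCorrEn feature pipeline."""
    return np.array([np.var(ch) for ch in window])

try:
    model = joblib.load("rf_seizure_model.joblib")        # hypothetical pre-trained model
except FileNotFoundError:
    from sklearn.ensemble import RandomForestClassifier   # fallback so the sketch runs
    model = RandomForestClassifier().fit(np.random.randn(20, N_CHANNELS),
                                         np.random.randint(0, 2, 20))

for _ in range(3):                                         # bounded loop for illustration
    window = read_window()
    features = extract_features(window).reshape(1, -1)
    if model.predict(features)[0] == 1:                    # 1 = seizure class (assumed)
        print("possible seizure detected")
    time.sleep(WINDOW_SECONDS)
```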
Epilepsy is a chronic brain disorder characterized by recurrent seizures and associated disorders. AEDs and surgical procedures are the conventional ways to control seizures in the case of drug-resistant epilepsy. In such cases, non-invasive seizure detection plays a vital role in the automated diagnosis of epilepsy and calls for the systematic study and development of signal processing algorithms. In this work, a technique to categorize seizure and normal EEG signals using the tuning of the Q-factor of the wavelet transform is explored. Initially, EEMD is used to decompose the raw EEG signals into band-limited IMFs. Based on the highest values of three entropy parameters, namely KraEn, PermEn, and SampEn, IMFs are selected for further decomposition using TQWT. The CenCorrEn is then computed for the 1st and 16th sub-bands of the selected IMFs, which ensures a more relevant feature space for classifying the bivariate signals. The TQWT parameter values Q = 3, r = 3, and J = 16 produced the optimum classification performance. The joint EEMD-TQWT method generated the highest classification accuracy of 96.2% with the RF classifier when compared to the MLPNN and LS-SVM. A compact, low-cost embedded system developed using the optimum feature extraction and classification algorithms may find application in automated seizure diagnosis.
The authors are grateful to Dr. M. D. Mohire, MD, DM (Neurology), Neuro-physician, Mohire's Neurology Center & Research, Kolhapur, Maharashtra, INDIA for observing the results and providing the EEG dataset for experimentation. We also thank the developers of open source ITL toolbox and TQWT toolbox in MATLAB.
[1] Devinsky O, Penry JK. Quality of life in epilepsy: the clinician's view[J]. Epilepsia, 1993, 34: S4–S7.
[2] Panayiotopoulos CP. A clinical guide to epileptic syndromes and their treatment[M]. 2nd ed. London: Springer, 2007: 185–206.
[3] WHO. Mental health: new understanding, new hope[R]. Geneva, Switzerland: WHO, 2001.
[4] Kumar S, Madaan R, Bansal G, et al. Plants and plant products with potential anticonvulsant activity—a review[J]. Pharmacog Commun, 2012, 2(1): 3–99.
[5] Gargiulo G, Calvo RA, Bifulco P, et al. A new EEG recording system for passive dry electrodes[J]. Clin Neurophysiol, 2010, 121(5): 686–693. doi: 10.1016/j.clinph.2009.12.025
[6] Siegel AM. Presurgical evaluation and surgical treatment of medically refractory epilepsy[J]. Neurosurg Rev, 2004, 27(1): 1–18. doi: 10.1007/s10143-003-0305-6
[7] Giannakakis G, Sakkalis V, Pediaditis M, et al. Methods for seizure detection and prediction: an overview[M]//Sakkalis V. Modern Electroencephalographic Assessment Techniques: Theory and Applications. New York, NY: Humana Press, 2014: 131–157.
[8] Verma N, Shoeb A, Bohorquez J, et al. A micro-power EEG acquisition SoC with integrated feature extraction processor for a chronic seizure detection system[J]. IEEE J Solid-State Circuits, 2010, 45(4): 804–816. doi: 10.1109/JSSC.2010.2042245
[9] Hassan AR, Siuly S, Zhang YC. Epileptic seizure detection in EEG signals using tunable-Q factor wavelet transform and bootstrap aggregating[J]. Comput Methods Programs Biomed, 2016, 137: 247–259.
[10] Hassan AR, Bhuiyan MIH. Automatic sleep scoring using statistical features in the EMD domain and ensemble methods[J]. Biocybern Biomed Eng, 2016, 36(1): 248–255.
[11] Hassan AR, Bhuiyan MIH. An automated method for sleep staging from EEG signals using normal inverse Gaussian parameters and adaptive boosting[J]. Neurocomputing, 2017, 219: 76–87.
[12] Hassan AR, Bhuiyan MIH. A decision support system for automatic sleep staging from EEG signals using tunable Q-factor wavelet transform and spectral features[J]. J Neurosci Methods, 2016, 271: 107–118.
[13] Hassan AR, Bashar SK, Bhuiyan MIH. Automatic classification of sleep stages from single-channel electroencephalogram[C]//Proceedings of 2015 Annual IEEE India Conference. New Delhi, India: IEEE, 2015.
[14] Hassan AR, Huda MN, Sarker F, et al. An overview of brain machine interface research in developing countries: opportunities and challenges[C]//Proceedings of the 2016 5th International Conference on Informatics, Electronics and Vision. Dhaka, Bangladesh: IEEE, 2016.
[15] Rahman M, Bhuiyan MIH, Hassan AR. Sleep stage classification using single-channel EOG[J]. Comput Biol Med, 2018, 102: 211–220. doi: 10.1016/j.compbiomed.2018.08.022
[16] Hassan AR, Subasi A. A decision support system for automated identification of sleep stages from single-channel EEG signals[J]. Knowl-Based Syst, 2017, 128: 115–124.
[17] Kohavi R, John GH. Wrappers for feature subset selection[J]. Artif Intell, 1997, 97(1-2): 273–324. doi: 10.1016/S0004-3702(97)00043-X
[18] Huang NE, Shen Z, Long SR, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis[J]. Proc Roy Soc A Math Phys Eng Sci, 1998, 454(1971): 903–995. doi: 10.1098/rspa.1998.0193
[19] Rato RT, Ortigueira MD, Batista AG. On the HHT, its problems, and some solutions[J]. Mech Syst Signal Process, 2008, 22(6): 1374–1394. doi: 10.1016/j.ymssp.2007.11.028
[20] Wu ZH, Huang NE. Ensemble empirical mode decomposition: a noise-assisted data analysis method[J]. Adv Adapt Data Anal, 2009, 1(1): 1–41.
[21] Acharya UR, Molinari F, Sree SV, et al. Automated diagnosis of epileptic EEG using entropies[J]. Biomed Signal Process Control, 2012, 7(4): 401–408. doi: 10.1016/j.bspc.2011.07.007
[22] Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information[J]. Phys Rev E Stat Nonlin Soft Matter Phys, 2004, 69(6): 066138. doi: 10.1103/PhysRevE.69.066138
[23] Riedl M, Müller A, Wessel N. Practical considerations of permutation entropy: a tutorial review[J]. Eur Phys J Spec Top, 2013, 222(2): 249–262.
[24] Hu M, Liang HL. Intrinsic mode entropy based on multivariate empirical mode decomposition and its application to neural data analysis[J]. Cogn Neurodyn, 2011, 5(3): 277–284. doi: 10.1007/s11571-011-9159-8
[25] Liu WF, Pokharel P, Xu JW, et al. Correntropy for random variables: properties and applications in statistical inference[M]//Principe JC. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives. New York: Springer, 2010: 385–413.
[26] Haykin S. Neural networks: a comprehensive foundation[M]. 2nd ed. Prentice Hall, 1999.
[27] Sharma M, Pachori RB, Acharya UR. A new approach to characterize epileptic seizures using analytic time-frequency flexible wavelet transform and fractal dimension[J]. Pattern Recognit Lett, 2017, 94: 172–179.
[28] Suykens JAK, Vandewalle J. Least squares support vector machine classifiers[J]. Neural Process Lett, 1999, 9(3): 293–300. doi: 10.1023/A:1018628609742
[29] Breiman L. Random forests[J]. Mach Learn, 2001, 45(1): 5–32.
[30] Sharma R, Pachori RB, Acharya UR. Application of entropy measures on intrinsic mode functions for the automated identification of focal electroencephalogram signals[J]. Entropy, 2015, 17(2): 669–691. doi: 10.3390/e17020669
[31] Haynes W. Student's t-test[M]//Dubitzky W, Wolkenhauer O, Cho KH, et al. Encyclopedia of Systems Biology. New York: Springer, 2013: 2023–2025.
[32] Gupta V, Priya T, Yadav AK, et al. Automated detection of focal EEG signals using features extracted from flexible analytic wavelet transform[J]. Pattern Recognit Lett, 2017, 94: 180–188. doi: 10.1016/j.patrec.2017.03.017