Using Machine Learning Methods to Identify Bronchopulmonary System Diseases from Lung Sounds

This study reviews the main modern methods for the digital processing of lung sounds. It is shown that each of the existing methods gives a definite result in solving a particular problem. However, none of the reviewed methods can be called universal or fully convenient for use in real hospital conditions. Each method yields certain numerical parameters. This study shows that machine learning can serve as a unifying mechanism for the considered methods: a set of different parameters can be used as the input arguments of a properly trained classifier. As a result, a preliminary opinion can be presented to the doctor in a convenient and accessible form.


Introduction
Lung sound is a physiological sound signal produced by the human respiratory system during exchange with the external environment; it contains a large amount of physiological and pathological information.
According to the general lung sound classification of the American Thoracic Society, typical lung sounds can be divided into normal lung sounds, bronchial sounds, and continuous or discontinuous adventitious sounds, among others. The continuous and discontinuous adventitious sounds include coarse crackles, fine crackles, wheezes and rhonchi [1].
Different lung diseases can be diagnosed by detecting the corresponding abnormal pulmonary sounds: for example, wheezes can be used to detect asthma [2], rhonchi to diagnose chronic obstructive pulmonary disease [3], and crackles are one of the most important signs for detecting pneumonia and pulmonary fibrosis [4].
Pulmonary diseases represent a large disease burden in terms of morbidity and mortality worldwide. Auscultation, the process of listening to a patient's heart or lung sounds, is perhaps the oldest and most common medical procedure still in use today.
This technique was significantly advanced by Rene Laennec's invention of the stethoscope, exactly 200 years ago.
In the hospital, percussion and auscultation are the most common methods of physical examination. Recently, electronic stethoscopes and computer analysis have become an inevitable trend, driven by the development of telemedicine and home care systems and by the need to help physicians obtain better auscultation results.
Auscultation has thus become useful not only for point-of-care diagnostics, but also as a tool for remote patient monitoring and telemedicine.
The problem of accurate and timely diagnosis is relevant given the increasing number of patients and the persistently high percentage of unsatisfactory treatment outcomes. Applying computer methods to record and analyze lung sounds allows the doctor to remove the subjectivity of hearing and to identify pathological features that are not audible to the human ear.
Modern medicine is developing very fast, so engineers are trying to find new and more accurate diagnostic methods that detect pathology earlier and support the provision of medical care [5].
The analysis of lung sounds collected through auscultation is a fundamental component of pulmonary disease diagnostics in primary care and of general patient monitoring in telemedicine.
With the development of digital signal processing, computer algorithms for the study of lung sounds provide broader research ideas and methods.

Modern technical and electronic means allow medical diagnostics to be conducted at a higher level, faster, more accurately, and more comfortably for both the patient and the doctor.
In the research of respiratory sounds, adventitious sounds are frequently divided into pleural friction rub and dry and moist rales. They differ in frequency content, duration, and the periodicity of their appearance in the breath sound. Each of these phenomena is heard against the background of the basic breath sounds (bronchial and vesicular), whose presence in general does not indicate disease.

Spectral analysis
Today, there is a quite vast array of approaches based on Fourier analysis. Their advantages are ease of calculation and informative results. The first is the classic spectral analysis of respiratory sounds, which is based on estimating the power spectral density [10].
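The spectral density estimate mentioned above can be sketched in Python with NumPy by averaging Hann-windowed periodograms over half-overlapping segments (Welch's method). This is a minimal illustration under stated assumptions: the function name `welch_psd`, the 256-sample segment length, the 4 kHz sampling rate and the synthetic test tone are hypothetical choices, not parameters from the study. SciPy's `scipy.signal.welch` offers a fuller implementation of the same idea.

```python
import numpy as np

def welch_psd(x, fs, nperseg=256):
    """One-sided PSD estimate: average of Hann-windowed periodograms over
    50%-overlapping segments (Welch's method)."""
    x = np.asarray(x, dtype=float)
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)            # normalization to power per Hz
    psd = np.zeros(nperseg // 2 + 1)
    n_segments = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = seg - seg.mean()                  # remove the DC component of each segment
        psd += np.abs(np.fft.rfft(seg * window)) ** 2 / scale
        n_segments += 1
    psd /= n_segments
    psd[1:-1] *= 2                              # fold in negative frequencies (even nperseg)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

# Hypothetical test signal: a 250 Hz tone in white noise, 2 s sampled at 4 kHz
fs = 4000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 250 * t) + 0.5 * rng.standard_normal(t.size)
freqs, psd = welch_psd(x, fs)
peak_hz = freqs[np.argmax(psd)]
```

Because 250 Hz falls exactly on a frequency bin (16 × 4000/256 Hz), the peak of `psd` appears at 250 Hz. With real breath sounds the estimate would instead show wide, overlapping bands, which is what makes adventitious sounds hard to separate from basic breathing.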
The basic breath sounds (bronchial and vesicular) occupy a very wide frequency range, which makes frequency analysis of breath sounds difficult. In most cases it is quite problematic to distinguish adventitious sounds against the background of the basic breath sounds, because their frequency ranges overlap and their amplitudes differ only slightly.
Time-frequency analysis is used for more detailed study. It allows the signal to be examined in more detail because it shows in which time intervals the individual frequency components occur.
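A short-time Fourier transform is one common way to obtain such a time-frequency picture. The sketch below assumes a 128-sample Hann window with a 64-sample hop and uses a synthetic 10 ms, 600 Hz burst as a stand-in for a transient adventitious sound (all hypothetical values). The frame with the highest energy localizes the burst in time, and that frame's spectrum gives its frequency.

```python
import numpy as np

def spectrogram(x, fs, nperseg=128, hop=64):
    """Power spectrogram via the short-time Fourier transform with a Hann window.
    Returns frame-center times (s), frequencies (Hz) and power[frame, frequency]."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(nperseg)
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    frames = np.stack([x[s:s + nperseg] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    times = (starts + nperseg / 2) / fs
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return times, freqs, power

fs = 4000
t = np.arange(fs) / fs                        # 1 s of signal
rng = np.random.default_rng(1)
x = 0.05 * rng.standard_normal(t.size)        # low-level background
burst = (t >= 0.5) & (t < 0.51)               # hypothetical 10 ms transient
x[burst] += np.sin(2 * np.pi * 600 * t[burst])
times, freqs, power = spectrogram(x, fs)
frame_energy = power.sum(axis=1)
best = np.argmax(frame_energy)                # frame that contains most of the burst
```

The window length sets the trade-off: shorter frames localize transients more precisely in time but give coarser frequency resolution (here 4000/128 = 31.25 Hz per bin).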
Time-frequency analysis makes it possible to identify many auscultation phenomena; however, it requires a specific acoustic environment during recording and high-quality recording equipment [11]. An advantage of higher-order autocorrelation analysis is that the frequency distortion and tonal characteristics of a particular person's breathing do not affect the results [13]. This means that lung sounds recorded in different clinics can be analyzed regardless of the recording equipment used. The greatest attention was paid to the study of third-order autocorrelation.
Applying this method makes it possible to detect random artifacts (for example, crackles) during breathing and to determine their degree of non-periodicity. As shown earlier, traditional spectral analysis does not allow breath sounds to be identified because their frequency ranges overlap.
Cumulant analysis makes it possible to examine signals for the presence of random adventitious sounds, for example, crackles. This method evaluates not only the regularity of the signal but also its frequency. A significant advantage of using higher-order autocorrelation to study lung sounds is that identical recording devices are not required.
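The third-order autocorrelation referred to above can be computed directly from its definition. In this minimal sketch (the function name and the synthetic signals are assumptions for illustration), a symmetric Gaussian signal gives a value near zero at lags (0, 0), whereas adding sparse one-sided spikes, a crude stand-in for crackles, gives a clearly positive value.

```python
import numpy as np

def third_order_autocorr(x, tau1, tau2):
    """Sample third-order autocorrelation of the mean-removed signal,
    R3(tau1, tau2) = (1/N) * sum_n x(n) x(n + tau1) x(n + tau2), for lags >= 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x) - max(tau1, tau2)               # number of available products
    return np.sum(x[:n] * x[tau1:tau1 + n] * x[tau2:tau2 + n]) / len(x)

rng = np.random.default_rng(2)
noise = rng.standard_normal(20000)             # symmetric Gaussian background
spiky = noise.copy()
spiky[::500] += 8.0                            # hypothetical one-sided, crackle-like transients
r3_noise = third_order_autocorr(noise, 0, 0)
r3_spiky = third_order_autocorr(spiky, 0, 0)
```

Evaluating the function over a grid of lags (tau1, tau2) gives the full third-order autocorrelation surface rather than the single value at the origin.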

Cepstral analysis
Today it is generally accepted that the cepstrum is the spectrum of the logarithm of the spectrum of the original signal; that is, the primary spectrum is represented on a logarithmic scale. Its main advantage is that it represents spectral information even more compactly: ideally, each harmonic series in the original spectrum is represented by only one component in the cepstrum.
It is important to understand the fundamental difference between the frequency components of the traditional spectrum and the components of the cepstrum. In the first case, each frequency component has a physical meaning: a signal with this frequency and amplitude is actually present in the original time-domain signal. In the second case, a component in the cepstrum does not mean that the original spectrum contains the corresponding frequency; cepstral components are indexed by quefrency, a time-like variable.
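A small NumPy sketch of the real cepstrum illustrates both points: a harmonic series spread over the whole spectrum collapses into one cepstral peak, and that peak sits at a quefrency (a delay in samples), not at a frequency present in the signal. The test signal, an impulse plus an echo of amplitude 0.5 delayed by 20 samples, is a hypothetical example chosen because its cepstrum is known analytically (a peak of height 0.25 at quefrency 20).

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the logarithm of the magnitude spectrum."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset guards against log(0)
    return np.fft.ifft(log_mag).real

N, delay = 256, 20
x = np.zeros(N)
x[0] = 1.0            # direct impulse
x[delay] = 0.5        # echo: adds a periodic ripple to the log spectrum
c = real_cepstrum(x)
q = 1 + np.argmax(c[1:N // 2])   # search positive quefrencies, skipping q = 0
print(q)   # 20
```

The echo produces a periodic ripple across the entire log spectrum, and the cepstrum represents that whole ripple by a single peak at quefrency 20 (with much weaker peaks at 40, 60, ...).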

Higher-order statistical analysis methods
The complex nature of lung sounds is the reason for applying higher-order statistical methods to their analysis.
The power spectrum provides a complete description of a zero-mean Gaussian process. In some cases, however, it is necessary to obtain information about deviations from the Gaussian distribution and about the presence of nonlinear coupling. In such cases it is better to use higher-order spectra (HOS, order > 2), which contain the required information. The power spectrum is in fact a second-order spectrum; the third-order spectrum is the bispectrum, and the fourth-order spectrum is the trispectrum.
There are at least three reasons for using HOS analysis to process biomedical signals:
- Suppression of Gaussian noise and reduction of its variance (dispersion). For Gaussian noise, all spectra of order higher than two are zero, so the higher-order spectra do not contain the noise components, which makes useful signals easier to detect.
- Phase recovery. Higher-order spectra preserve phase information.
- Detection and characterization of nonlinear coupling in biomedical signals. HOS are nonlinear functions of the data, which makes them a convenient tool for detecting nonlinearities [14].
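To illustrate the phase-recovery and nonlinearity-detection properties listed above, the following hypothetical sketch estimates the squared bicoherence (a normalized bispectrum) at a single bifrequency by averaging over records. Three cosines at FFT bins 10, 17 and 27 are generated either with quadratic phase coupling (the third phase equals the sum of the first two) or with independent phases; the bin numbers, record length, number of records and noise level are illustrative assumptions.

```python
import numpy as np

def bicoherence(records, k1, k2):
    """Squared bicoherence at FFT bins (k1, k2) estimated over records (rows):
    |E[X(k1) X(k2) X*(k1+k2)]|^2 / (E|X(k1) X(k2)|^2 * E|X(k1+k2)|^2), in [0, 1]."""
    X = np.fft.fft(records, axis=1)
    triple = X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2])
    num = np.abs(triple.mean()) ** 2
    den = np.mean(np.abs(X[:, k1] * X[:, k2]) ** 2) * np.mean(np.abs(X[:, k1 + k2]) ** 2)
    return num / den

def three_harmonics(n_records, M, k1, k2, coupled, rng):
    """Records of cosines at bins k1, k2 and k1 + k2 with random phases plus weak
    Gaussian noise; if coupled, phi3 = phi1 + phi2 (quadratic phase coupling)."""
    n = np.arange(M)
    records = np.empty((n_records, M))
    for r in range(n_records):
        p1, p2 = rng.uniform(0, 2 * np.pi, 2)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * np.pi)
        records[r] = (np.cos(2 * np.pi * k1 * n / M + p1)
                      + np.cos(2 * np.pi * k2 * n / M + p2)
                      + np.cos(2 * np.pi * (k1 + k2) * n / M + p3)
                      + 0.1 * rng.standard_normal(M))
    return records

rng = np.random.default_rng(4)
M, k1, k2 = 128, 10, 17
b_coupled = bicoherence(three_harmonics(64, M, k1, k2, True, rng), k1, k2)
b_uncoupled = bicoherence(three_harmonics(64, M, k1, k2, False, rng), k1, k2)
```

`b_coupled` comes out close to 1 and `b_uncoupled` close to 0, although both signals have the same power spectrum; only the third-order statistic reveals the coupling.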

The iterative method
Another approach is an iterative method based on the kurtosis coefficient for detecting non-stationary bioacoustic signals. Applying a Blackman window to the signal has been reported to increase the overall classification rate from 56% to 66%.

Support Vector Machine
The Support Vector Machine (SVM) is based on statistical learning theory, specifically on Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle. Using limited sample information, it seeks the best compromise between model complexity (the learning accuracy on the training samples) and learning ability (the ability to classify arbitrary samples without error), in order to obtain the best generalization ability. In a study that classified normal lung sounds, crackles and rhonchi using the average power spectrum and the instantaneous frequency, three feature-extraction methods were compared: the PSD frequency ratio, the average instantaneous frequency, and the instantaneous-frequency switching time. The PSD-based feature extraction gave the highest classification accuracy, reaching 90-95% for rhonchi.
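The study does not describe its SVM configuration (kernel, regularization, solver). Purely as a sketch, assuming a linear kernel, the soft-margin objective can be minimized by full-batch subgradient descent; the hyperparameters and the two synthetic clusters, which stand in for two sound features as in Fig. 2, are hypothetical. In practice a library implementation such as scikit-learn's `SVC` would normally be used.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Soft-margin linear SVM: minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent. Labels y must be +1 or -1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        active = y * (X @ w + b) < 1              # margin violators: hinge loss is active
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Class +1 on the positive side of the hyperplane w.x + b = 0, else -1."""
    return np.where(X @ w + b >= 0, 1, -1)

# Two hypothetical, well-separated clusters standing in for two sound features
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
w, b = train_linear_svm(X, y)
train_accuracy = np.mean(predict(w, b, X) == y)
```

The regularization weight `lam` sets the compromise described above: larger values favor a simpler model with a wider margin, smaller values fit the training samples more closely.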

K-nearest neighbors
The k-nearest neighbors (KNN) classification algorithm is theoretically mature and is also one of the simplest machine learning algorithms. The idea of the method is as follows: if most of the k most similar samples of a given sample (that is, its nearest neighbors in the feature space) belong to a certain category, then the sample also belongs to this category [17].
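A minimal k-nearest-neighbors classifier needs only a distance function and a majority vote. The sketch below assumes Euclidean distance and uses two made-up features (a skewness-like and a kurtosis-like value) for a handful of invented healthy and bronchitis samples; none of these numbers come from the study.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest
    training points (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    preds = []
    for q in np.atleast_2d(np.asarray(X_query, dtype=float)):
        dist = np.linalg.norm(X_train - q, axis=1)
        nearest = np.argsort(dist)[:k]                 # indices of the k closest samples
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Invented (skewness, kurtosis) pairs, for illustration only
X_train = [[0.0, 3.0], [0.1, 3.2], [-0.1, 2.9],
           [-0.8, 4.8], [-0.9, 5.1], [-0.7, 4.6]]
y_train = ["healthy"] * 3 + ["bronchitis"] * 3
predictions = knn_predict(X_train, y_train, [[0.05, 3.1], [-0.85, 4.9]], k=3)
```

Since the method relies on distances, features on very different scales are usually standardized first; otherwise the feature with the largest range dominates the vote.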

Decision tree method
A decision tree (also called a classification tree or regression tree) is a decision support tool used in statistics and data analysis for predictive models.
The structure of the tree consists of "leaves" and "branches". The edges ("branches") of the decision tree correspond to the feature values on which the target function depends, the "leaves" contain the values of the target function, and the remaining nodes contain the features that separate the classes. To classify a new case, one descends the tree to a "leaf" and takes the corresponding value. The goal is to create a model that predicts the value of the target variable from several input variables.
This method is easy to understand and interpret and requires little data preparation; in addition, the model can be evaluated with statistical tests, which makes it possible to assess its reliability.
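The descend-to-a-leaf procedure described above can be sketched as a small recursive tree grown with the Gini impurity criterion. Gini is one common choice and is assumed here; the study does not state which split criterion it used. The data are a few hypothetical two-feature samples.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a binary classification tree, choosing at each node the
    feature/threshold split with the lowest weighted Gini impurity."""
    labels, counts = np.unique(y, return_counts=True)
    leaf = {"leaf": labels[np.argmax(counts)]}        # majority class at this node
    if depth >= max_depth or len(labels) == 1 or len(y) < min_samples:
        return leaf
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for thr in (values[:-1] + values[1:]) / 2:    # midpoints between distinct values
            left = X[:, j] <= thr
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr, left)
    if best is None:                                  # no feature can split this node
        return leaf
    _, j, thr, left = best
    return {"feature": j, "threshold": thr,
            "left": build_tree(X[left], y[left], depth + 1, max_depth, min_samples),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)}

def tree_predict(node, x):
    """Walk from the root down to a leaf and return its class."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

X = np.array([[0.0, 3.0], [0.1, 3.2], [-0.1, 2.9], [-0.8, 4.8], [-0.9, 5.1], [-0.7, 4.6]])
y = np.array(["healthy"] * 3 + ["bronchitis"] * 3)
tree = build_tree(X, y)
```

Because every split compares a single feature with a threshold, rescaling a feature does not change the tree, which is one reason decision trees need little data preparation.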
Deep learning methods make it possible to automatically identify lung sounds from a reasonably large number of patients, significantly larger than in any previously published study.
Such algorithms enable self-contained automated systems that can provide diagnostic guidance for telemedicine and remote patient monitoring, as well as point-of-care diagnostics for low-skilled health workers in many parts of the world [18].

Materials and methods
Using higher-order statistics methods (HOSA) for the analysis is a good idea because breath sounds have a complex nature. Thus, both the spectral components and the phase components of respiratory sounds are of interest.
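The skewness and kurtosis coefficients used for the analysis of respiratory sounds in this study are calculated as

$$\gamma_1 = \frac{c_3}{\sigma^{3}}, \tag{1}$$

$$\gamma_2 = \frac{\mu_4}{\sigma^{4}}, \tag{2}$$

where $\sigma^2$ is the variance, $c_3$ is the third-order cumulant (equal to the third central moment), and $\mu_4$ is the fourth moment about the mean.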
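The skewness and kurtosis coefficients (the third cumulant divided by the cubed standard deviation, and the fourth central moment divided by the squared variance) translate directly into code. The sketch below uses population (1/N) moments; the per-phase helper assumes that breathing-phase boundaries are already available as sample indices, which is a hypothetical input since the phase-segmentation step is not described here.

```python
import numpy as np

def skewness_kurtosis(x):
    """Skewness gamma1 = c3 / sigma^3 and kurtosis gamma2 = mu4 / sigma^4,
    using population (1/N) moments; Gaussian data give gamma2 close to 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    var = np.mean(d ** 2)        # sigma^2
    c3 = np.mean(d ** 3)         # third cumulant = third central moment
    mu4 = np.mean(d ** 4)        # fourth moment about the mean
    return c3 / var ** 1.5, mu4 / var ** 2

def per_phase_skewness(x, phase_bounds):
    """Skewness of each breathing phase. phase_bounds is a hypothetical list of
    (start, stop) sample indices produced by a separate phase-segmentation step."""
    x = np.asarray(x, dtype=float)
    return [skewness_kurtosis(x[a:b])[0] for a, b in phase_bounds]

gamma1, gamma2 = skewness_kurtosis([1, 2, 3, 4, 5])      # symmetric data: zero skewness
signal = np.array([1, 2, 3, 4, 5, 0, 0, 3], dtype=float)
phase_skews = per_phase_skewness(signal, [(0, 5), (5, 8)])
```

Averaging `phase_skews` over all phases gives the whole-signal value; if the phases have opposite signs, that average tends toward zero.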
Non-zero values of the skewness coefficient make it possible to evaluate the nature and extent of the deviation of the process from Gaussian noise within a one-dimensional distribution.
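Kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Under the above definition, the kurtosis of any univariate normal distribution is 3. Assume that the real-valued lung sound sequence $x(n)$ is zero-mean and stationary; its third-order cumulant and bispectrum are then

$$c_3(\tau_1, \tau_2) = E\{x(n)\,x(n+\tau_1)\,x(n+\tau_2)\},$$

$$C_3(\omega_1, \omega_2) = \sum_{\tau_1}\sum_{\tau_2} c_3(\tau_1, \tau_2)\, e^{-j(\omega_1\tau_1 + \omega_2\tau_2)}.$$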
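Consider the model $x(n) = \sum_{k=1}^{3} \cos(\lambda_k n + \phi_k)$ of three harmonics with frequencies $\lambda_k$, $k = 1, 2, 3$. The harmonics are quadratically phase coupled if $\lambda_3 = \lambda_1 + \lambda_2$ and $\phi_3 = \phi_1 + \phi_2$. In this case, ideally, the power spectrum $P(\omega)$ contains impulses at $\pm\lambda_k$, while in the principal domain the bispectrum $C_3(\omega_1, \omega_2)$ has an impulse at $(\lambda_1, \lambda_2)$; if $\phi_3$ varies independently of $\phi_1 + \phi_2$, this impulse vanishes on average. The power spectrum alone therefore cannot reveal the coupling.
Machine learning can be one of the best solutions for the task of diagnosing diseases. Supervised learning ("learning with a teacher") is the most suitable type for this problem. In this study, a database of lung sounds described by a set of specific parameters was used. There are certain dependencies between the parameters and the response that need to be established; a training subset was used for this purpose.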
In this work, classifiers of different types were investigated and analyzed for the detection of lung diseases, namely classifiers based on the k-nearest neighbors method (KNN), on decision trees (DT), and on the support vector machine (SVM).
A database of 134 patients (54 healthy and 80 with bronchopulmonary diseases) was used.
After applying HOSA to the signals, parameters characterizing healthy and sick patients were obtained.
The actual number of parameters was considerably larger; two of them are shown here as examples.
For example, the skewness coefficient was calculated for each breathing phase according to (1). For healthy patients, this coefficient was found to have different signs in the separate phases, so its average value over the whole signal is usually close to zero. For patients with recurrent bronchitis, the skewness coefficient is negative in 84% of all cases, so the distribution function is shifted toward negative values (Fig. 1b).
Another parameter is the kurtosis coefficient. It was found that its deviation can exceed 50% only for signals from patients with bronchitis (Tables 1 and 2). The dataset was divided into training and test subsets in an 85%/15% ratio.
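The text does not say whether the 85%/15% split preserved the class proportions. Assuming a stratified split (an assumption, not the authors' stated procedure), it can be sketched as follows; the labels only mirror the reported group sizes, and scikit-learn's `train_test_split` with its `stratify` argument does the same job in practice.

```python
import numpy as np

def stratified_split(y, test_frac=0.15, seed=0):
    """Return (train_idx, test_idx), drawing about test_frac of each class
    into the test subset so that class proportions are roughly preserved."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train, test = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = max(1, int(round(test_frac * len(idx))))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(sorted(train)), np.array(sorted(test))

# Hypothetical labels with the reported group sizes: 54 healthy (0), 80 patients (1)
y = np.array([0] * 54 + [1] * 80)
train_idx, test_idx = stratified_split(y)
```

With these group sizes the test subset receives 8 healthy and 12 sick cases (20 in total), and the remaining 114 cases form the training subset.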
The results of the various classifiers are presented in Tables 3-5.
The final value was calculated as the average over all four classes. As an example of the SVM method, the relationship between two features of the respiratory sounds is shown in Fig. 2. As can be seen, for the lung sound dataset, the support vector machine and decision tree classifiers proved to be optimal.
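The text does not specify which per-class measure was averaged. As a sketch under the assumption that it is the fraction of correctly classified cases within each class (per-class recall), the four-class average can be computed from a confusion matrix; the labels below are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Counts with rows = true class and columns = predicted class (labels 0..n-1)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_and_average_accuracy(y_true, y_pred, n_classes):
    """Fraction of correctly classified cases within each true class, and the
    unweighted average over classes (every class must occur in y_true)."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    per_class = np.diag(cm) / cm.sum(axis=1)
    return per_class, per_class.mean()

# Hypothetical labels for four classes (0 = Class 1, ..., 3 = Class 4)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
per_class, average = per_class_and_average_accuracy(y_true, y_pred, 4)
```

An unweighted average of this kind gives each class equal influence, so a small class (for example, "other pathology") is not swamped by a large one.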

Conclusion
In this study, methods for analyzing lung sounds and the possibility of using machine learning to optimize and unify these methods were reviewed. Each of the reviewed analysis methods is used in processing, for example, for signal denoising or for finding artifacts in lung sounds. As a result, a sufficiently large number of diagnostically valuable parameters is obtained. On the one hand, this is good, but on the other hand it makes the information rather inconvenient to process and perceive. Combining various digital processing methods with modern machine learning tools will significantly improve the accuracy of the methods and greatly facilitate the work of the doctor. In this study, one of these methods was used to obtain 7 features, which were used in 3 different classifiers. For the lung sound dataset, the SVM classifier and the decision tree classifier turned out to be optimal: the highest accuracy of correct decisions was obtained with these classifiers.
The resulting classifier models can easily be adapted to more features, which could significantly increase accuracy in further research.

In addition, electronic auscultation and the extraction of diagnostically valuable parameters from the signals make it possible to use these signals in telemedicine. The recorded signals and/or their parameters can easily be transferred, for example, to a central server where the patient database is stored, or to another clinic. This study gives an overview of modern auscultation techniques, methods of lung sound analysis, and the use of machine learning to improve processing.

Methods of processing
The ability to record lung sounds allows signal processing and machine learning techniques to analyze the recorded sounds automatically and provide diagnostic support. For over 30 years, a number of different signal processing and machine learning methods have been proposed in the literature for the automated detection of abnormal lung sounds and the diagnosis of pulmonary disease [6-9]. Many of these methods focus on frequency-domain features such as peaks, or compare the ratio of power within certain frequency bands.
A kurtosis-based approach to bioacoustic signals was developed by scientists from the Department of Informatics and Communication, Technological and Educational Institute of Serres, Greece, and the Faculty of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. The purpose of their work was to create a new technique based on the kurtosis coefficient for detecting non-stationary bioacoustic signals, such as lung sounds. For Gaussian signals the excess kurtosis (kurtosis minus 3) is zero, and a significant deviation from this value can be attributed to the presence of non-Gaussian signals, which are of diagnostic interest. Such deviations from zero can be used to formulate a criterion for identifying the presence of non-stationary transient signals. Based on this criterion, an iterative kurtosis detector was adopted, which gradually separates the useful signal from the noise. The experimental results showed that the iterative kurtosis detector clearly detects bioacoustic signals even when the amplitude of the useful signal is high [15].
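The exact steps of the published detector [15] are not reproduced here. The following is only a hypothetical sketch of the general idea described above: while the samples not yet assigned to the signal have an excess kurtosis clearly different from zero, samples far from the mean are moved to the "signal" set and the test is repeated. The 0.5 threshold, the 3-standard-deviation rule and the synthetic bursts are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Kurtosis minus 3, so that Gaussian data give a value near zero."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

def iterative_kurtosis_detector(x, threshold=0.5, k=3.0, max_iter=20):
    """Hypothetical iterative detector: while the not-yet-detected samples are
    clearly non-Gaussian (|excess kurtosis| > threshold), mark samples lying more
    than k standard deviations from their mean as signal and repeat."""
    x = np.asarray(x, dtype=float)
    detected = np.zeros(len(x), dtype=bool)
    for _ in range(max_iter):
        rest = x[~detected]
        if abs(excess_kurtosis(rest)) < threshold:
            break                                  # residual looks like Gaussian noise
        mu, sigma = rest.mean(), rest.std()
        new = ~detected & (np.abs(x - mu) > k * sigma)
        if not new.any():
            break
        detected |= new
    return detected

rng = np.random.default_rng(5)
x = rng.standard_normal(5000)                      # Gaussian background
burst_idx = np.r_[1000:1010, 3000:3010]            # two hypothetical transients
x[burst_idx] += 10.0 * np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
mask = iterative_kurtosis_detector(x)
```

Each pass removes the most extreme samples and re-estimates the statistics of the remainder, so the noise estimate is no longer inflated by the transients that were detected earlier.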
Method of acoustic intensities
The original method of acoustic intensities makes it possible to separate sounds into the spectral components of air-borne and structure-borne conduction of voice and breathing. If sources of wheezes with different frequencies of the spectral maximum are observed at very close distances, this can be interpreted as the presence of airway abnormalities at the same place in the chest; depending on compression in different phases of breathing, they produce a variation in the oscillation frequencies of the airway closure zone. This effect may indicate a focus of pathological changes associated with an inflammatory process in the airways, or with their deformation by adjacent pathologically altered areas of lung tissue. By calculating, simultaneously or sequentially, the distance to the wheeze source from several areas of the chest with difference-distance-finding methods, the location of the wheeze source in the lungs can also be estimated [16].

Machine Learning
Machine learning is the core of artificial intelligence and the basic way of making computers intelligent. For lung sound processing, researchers expect that, with an appropriate algorithm and model, a lung sound classifier can be built on large amounts of lung sound data and can improve its own discriminative ability, finally achieving accurate, efficient automatic recognition and classification. The objective of lung sound classification is to build a classification function or classification model (also called a classifier) that, according to a certain classification method, maps the features extracted from a data item to one of the given categories. Typical classification methods have been tried and applied to lung sound classification to varying degrees; the following methods are the most widely used in research and applications.

Vector Quantization
Vector Quantization (VQ) is an extremely important signal compression method. It holds an important position in speech signal processing and is widely used in speech coding, speech recognition, speech synthesis, and so on. In crackle and wheeze classification experiments, a classifier built on wavelet-packet feature extraction performed better for crackles, whereas for wheezes the result was the opposite.

Artificial Neural Network
An Artificial Neural Network (ANN) is a mathematical model of neuron activity: an information processing system that imitates the structure and function of the brain's neural networks. Artificial neural networks have strong self-learning, self-organizing, adaptive and nonlinear function-approximation abilities and strong fault tolerance. They can perform simulation, prediction, fuzzy control and other functions, and are a powerful tool for dealing with nonlinear systems. A neural network is a computing model consisting of a large number of interconnected nodes (neurons). Each node represents a specific output function, known as the activation function. Each connection between two nodes applies a weighting value, called the weight, to the signal passing through it; the weights act as the memory of the artificial neural network. The output of the network depends on its connection pattern, its weights and its activation functions. The network itself is usually an approximation of some algorithm or function found in nature, or the expression of a logical strategy. When the Fourier power spectrum was used to classify normal lung sounds, wheezes and crackles, up to 95% of the training vectors were classified correctly, but only 43% of the test vectors. The discrete wavelet transform (DWT) was used to classify normal lung sounds and a variety of abnormal ones into 6 categories: normal sound, wheeze, crackle, shrill sound, wheezing and rhonchi. The classification accuracy was 100% on the training set and 94.02% on the validation set. When the average power spectral density was used to classify normal and abnormal lung sounds, signal segmentation raised the best overall classification rate from 60% to 70%.
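The wavelet and decomposition depth used in the cited DWT study are not stated. As a minimal sketch, assuming the Haar wavelet and four levels (both hypothetical choices), the relative energy in each wavelet band can serve as a compact feature vector for a classifier such as an ANN.

```python
import numpy as np

def haar_dwt_energies(x, levels=4):
    """Multilevel Haar discrete wavelet transform; returns the relative energy
    of each detail band (finest first) followed by the final approximation."""
    a = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if len(a) % 2:                        # drop the last sample if the length is odd
            a = a[:-1]
        even, odd = a[0::2], a[1::2]
        detail = (even - odd) / np.sqrt(2)    # high-pass half band
        a = (even + odd) / np.sqrt(2)         # low-pass half band, decomposed further
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(a ** 2))
    energies = np.array(energies)
    return energies / energies.sum()          # assumes the signal is not all zeros

alternating = np.tile([1.0, -1.0], 32)        # all energy at the highest frequency
constant = np.ones(64)                        # all energy at zero frequency
e_alt = haar_dwt_energies(alternating)
e_const = haar_dwt_energies(constant)
```

Because the orthonormal Haar transform preserves energy for signal lengths that are multiples of 2^levels, the unnormalized band energies sum to the signal energy, so the normalized vector describes how that energy is distributed across frequency bands.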

Parametric estimation
Phase relationships of signals are lost in the power spectrum and in the autocorrelation function. Higher-order spectra make it possible to detect and quantitatively describe nonlinearities in signals (not only stochastic ones). Such nonlinearities arise when signals pass through systems with nonlinear characteristics. Owing to its inhomogeneity, the human body can be an example of such a nonlinear system.

The following seven features were obtained and used to set up the classifiers:
- the value of the asymmetry (skewness) coefficient for each respiratory cycle, averaged over the four channels;
- the root-mean-square value of the asymmetry coefficient for each cycle;
- the average frequency over the four channels corresponding to the maximum value of the bicoherence function in each respiratory cycle;
- the rms value of this frequency;
- the frequency corresponding to the maximum of the bicoherence function of the entire signal (not split into respiratory cycles) in each channel;
- the average frequency over all channels;
- the maximum value of the bicoherence function for each channel.
In the study, both two-class classification (healthy vs. sick) and multi-class classification were carried out. As a result, there were 4 classes:
- Class 1: Healthy;
- Class 2: Chronic obstructive pulmonary disease, basal lower lobe pneumofibrosis;
- Class 3: Chronic obstructive pulmonary disease, diffuse pneumofibrosis;
- Class 4: Other pathology.

Fig. 2. Example of the SVM method applied to two parameters.

Table 1
Kurtosis Coefficients for Healthy Patient

Table 2
Kurtosis Coefficients for Patient with Bronchitis

Table 5
Decision Tree Classifier Results