Abstract
Accuracy of glioma grading is fundamental for the diagnosis, treatment planning and prognosis of patients. The purpose of this work was to develop a low cost and easy to implement classification model which distinguishes low grade gliomas (LGGs) from high grade gliomas (HGGs), through texture analysis applied to conventional brain MRI. Different combinations between MRI contrasts (T1Gd and T2) and one segmented glioma region (necrotic and non-enhancing tumor core (NCR/NET)) were studied. Texture features obtained from the Gray Level Size Zone Matrix (GLSZM) were calculated. An under-samplig method was proposed to divide the data into different training subsets and subsequently extract complementary information for the creation of distinct classification models. The sensitivity, specificity and accuracy of the models were calculated. The best model was explicitly reported. The best model included only three texture features and reached a sensitivity, specificity and accuracy of 94.12%, 88.24% and 91.18% respectively. According to the features of the model, when the NCR/NET region was studied, HGGs had a more heterogeneous texture than LGGs in the T1Gd images and LGGs had a more heterogeneous texture than HGGs in the T2 images. These novel results partially contrast with results from literature. The best model proved to be useful for the classification of gliomas. Complementary results showed that heterogeneity of gliomas depended on the studied MRI contrast. The model presented stands out as a simple, low cost, easy to implement, reproducible and highly accurate glioma classifier. What is more important, it should be accessible to populations with reduced economic and scientific resources.
Introduction
Gliomas are tumors formed by the glial cells of the nervous tissue. These can be benign or malignant. Malignant gliomas represent about 80% of all malignant brain tumors [1] and can be classified as low grade gliomas (LGGs) (grade II, according to the World Health Organization (WHO)) or high grade gliomas (HGGs) (grade III and IV, according to the WHO) [2]. LGGs are usually slow-growing and infiltrative tumors. Treatment consists of a complete resection of the tumor and a subsequent follow-up of the patient. In some cases, chemotherapy and radiotherapy may be necessary. On the other hand, HGGs have a rapid evolution and immediate treatment is necessary. It includes complete resection, chemotherapy and radiotherapy [3]. HGG patients have a low life expectancy (approximately 1 to 2 years), while LGG patients have a longer life expectancy (approximately 5 to 10 years) [4]. The 2016 WHO Classification of Tumors of the Central Nervous System showed that the molecular characteristics have greater importance in comparison to the histological characteristics for the diagnosis and management of the patient [2]. However, as in other medical classification problems, a single approach is usually not sufficient to provide all the information necessary for the understanding of a disease and the accuracy of its diagnosis [5]. On the other hand, the latest advances in disease diagnosis are not always accessible to the entire population, as is the case in developing countries. Then, the creation of low cost and relatively easy to implement diagnostic methodologies is useful and necessary.
Different imaging techniques, such as Computed Tomography (CT), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), Magnetic Resonance Imaging (MRI), Infrared Spectroscopic Imaging and a combination of them have been used for the study and classification of gliomas [6–14]. However, intracranial tumors are best evaluated on MRI [15] and conventional MRI is traditionally employing in many works whose objective is to distinguish LGGs from HGGs [16]. Moreover, MRI is first utilized clinically when, after a general medical examination, there is a suspicion of the presence of a brain tumor. Once confirmed its existence, conventional and advanced MRI can be used to delimit the tumor, to follow its evolution, as well as to obtain some evidence about its type and malignancy grade [17]. However, the characterization of gliomas by imaging is difficult, since they can present mixed characteristics of both low and high grade. Currently, the standard diagnosis of gliomas is performed by histopathological tests after performing a surgical resection or a stereotactic biopsy (which can be guided by MRI) [18]. These procedures are invasive and can be risky due to the location of the tumor. Moreover, due to the heterogeneity of the gliomas, a biopsy presents problems such as taking samples that are not representative of the complete tumor as well as variability in the interpretation [19]. Therefore, there is a need in the clinical comunity to develop new non-invasive and prefereably automatic diagnostic methodologies, such that the diagnosis, treatment planning and prognosis of glioma patients are improved.
To date, various computational methodologies have been developed for the classification of gliomas. Some of them study conventional (anatomical) [7, 20] or advanced MRI (Perfusion or Diffusion Weighted Imaging, spectroscopy, etc.) [5, 21–23]. They use qualitative, semiquantitative or quantitative variables, or a combination of them. Some variables are obtained from a specific MRI (for example, diffusion variables). However, there are quantitative analytical methodologies that allow measuring variables of all MRI. One of the most used is texture analysis [20, 23, 24]. It basically consists of quantifying the spatial distribution of pixels (2D images) or voxels (3D images) with different intensities of gray levels and extracting information through statistical variables (for example, correlation, homogeneity, contrast, entropy, etc.) [25]. Also, color images with R(ed) G(reen) B(lue) format can be analyzed by separating their three components as gray level images. Thus, texture analysis is versatile for any type of MRI [26]. In this type of analysis, texture features are extracted from the calculation of texture matrices. Among them there are for example the called Gray Level Size Zone Matrix (GLSZM). The GLSZM measures the number of gray level zones i and size j [27]. Basically it calculates the number of times that gray level voxels i were grouped forming a set (zone) of j voxels considering all the possible directions (26 in 3D). Since its invention, the GLSZM has proven to be useful when the main characteristic to study is heterogeneity [27]. Since different glioma grades are characterized by having different grades of heterogeneity [28], characterizing gliomas through texture features obtained from the GLSZM is convenient.
Concerning classification models, simple mathematical methods (linear regression, logistic regression, etc.) and complex mathematical methods (fuzzy modeling, artificial neural networks, etc.) have been used for glioma grading [29]. Although the most complex models tend to be the most flexible, the computational cost is also higher [30]. On the other hand, the utility of any model is usually measured after validation. For classification models, this commonly consists of taking training samples to create the model and testing samples to ratify its results. However, because usually the available databases contain a small number of samples, it is not always possible to validate the results [19, 21–23]. Moreover, many of the databases used in different studies are private or were acquired using specific protocols, which does not always allow extrapolating their results or reproducing them through independent studies. Also, a common problem present in many databases is the so-called “class imbalance”, which occurs when one or more classes have a greater or lesser number of samples than the rest of them. The consequence is that a classifier will be biased towards the classes with the highest number of samples [31]. There are several strategies to deal with this problem such as under-sampling (elimination of samples from major classes), over-sampling (replication of samples from minority classes), cost sensitive learning (taking into account misclassification costs), among others [32].
The objective of the present work was to develop a non-invasive, semi-automatic, simple and reproducible method to differentiate low and high grade gliomas, with the benefit that it can be used in developing countries with limited access to technology. This was done studying different conventional MRI contrasts by applying texture analysis (which are low cost and easy to implement) and finally reporting explicitly the best classification model. In addition, since an imbalanced database was studied, an under-sampling approach was proposed, in which first different subsets of gliomas with balanced classes were created, and then complementary information from each of them was extracted in order to create a variety of classification models.
Materials and methods
Patient database
The patient database belonging to the Multimodal Brain Tumor Segmentation (BRATS) Challenge 2018 [33–36] was used in this work. It was available online previous data request on the challenge page [33]. The database was comprised of routine clinically-acquired 1.0T, 1.5T and 3.0T pre-operative multimodal MRI scans of 210 glioblastoma (GBM/HGG) and 75 LGG patients, with pathologically confirmed diagnosis. There was information about the age and overall survival of 168 of the 210 HGG patients, having them an average age of 60.33 ± 12.08 years and an average overall survival of 422.96 ± 349 days. The scanners consisted of four conventional MRI contrasts: Native (T1) and post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR). Images of the tumors were segmented and manually labeled in different regions: Gd-enhancing tumor (ET), necrotic and non-enhancing tumor core (NCR/NET) and peritumoral edema (ED). The manual segmentation was performed by one to four raters, and their annotations were approved by experienced neuroradiologists. The database was grouped into three sets identified as: BRATS 2013 (from the 2013 challenge database), with 10 LGGs and 20 HGGs; TCIA (The Cancer Imaging Archive), with 65 LGGs and 102 HGGs; and CBICA (Center for Biomedical Image Computing and Analytics) with 88 HGGs (S1 Table). However, the scanners came from 19 different institutions and were acquired with different clinical protocols as well as various scaning systems.
Pre-processing
The database information was already pre-processed [34]. Each patient’s image volumes were co-registered rigidly to the T1Gd MRI and all images were resampled to 1 mm isotropic resolution in a standardized axial orientation with a linear interpolator. A rigid registration model was used with the mutual information similarity metric through the software Insight Segmentation and Registration Toolkit (ITK) [37] (“VersorRigid3DTransform” with “MattesMutualInformation” similarity metric and three multi-resolution levels). All images were skull stripped.
As the features of study in the texture analysis describe different properties based on the gray level intensity of the images, two extra pre-processing steps were performed before the analysis: Intensity inhomogeneity correction and intensity normalization. Intensity inhomogeneities are mainly produced by imperfections in the radiofrequency coils and object dependent interactions; in the images it is observed as a low frequency variation of the intensity across the image [38]. In any quantitative image analysis, a tissue is considered to be represented by similar gray level intensities, so that intensity inhomogeneities have a high influence on the results obtained. Therefore, it was necessary to include the inhomogeneity correction in the pre-processing of this work. Besides, as images were obtained through different clinical protocols and scaning systems, their intensity ranges were different. Thus, to be able to compare the images, intensity normalization was performed.
Inhomogeneity correction was carried out using the FreeSurfer Software Suite version 6.0 in Linux (Ubuntu 14.04) [39], through the tool “nu_correct”, which apply the N3 (nonparametric non-uniformity intensity normalization method) algorithm developed by the Montreal Neurological Institute (MNI). This analyzes the image intensity distribution in order to find the smooth intensity non-uniformity field that maximizes its frequency content [40]. For each patient, this pre-processing was applied to whole brain images (including the glioma region).
Intensity normalization was performed with an algorithm developed in MATLAB software and available online [41, 42]; it was based on the method proposed by Nyul et al [43], in which landmarks are adjusted on different histograms. This process of normalization required a set of reference volumes whose choice was arbitrary. A choice criterion used exclusively for this work consisted of the following: Considering that the range of intensities of each patient volume was different, then those with the lowest and highest ranges were chosen. This was done by averaging the intensities of their voxels excluding the region corresponding to the tumor; since that the tumor environment is highly heterogeneous, it had to be excluded from the reference volumes used for normalization. For each of the available MRI contrasts (T1, T1Gd, T2 y FLAIR), glioma grades (LGG and HGG) and set in which the database was grouped (BRATS 2013, TCIA and CBICA), two reference volumes were chosen: One with the lowest average of intensities and another with the highest. Thus, a total of 40 reference volumes were chosen (remember that CBICA did not have LGGs). Then, to normalize the rest of T1 volumes, all reference T1 volumes were used, and so on with the other MRI contrasts. The normalization range was selected to vary between 0 and 255 in steps of 1 (0 corresponded to absence of value). Gliomas whose volumes were employed for normalization were excluded from further work. Although in total there were 40 reference volumes (16 LGGs and 24 HGGs), some of them corresponded to the same gliomas (for example, for more than one MRI contrast, the same glioma had the lowest or highest average intensity). Thus, the volumes of 11 different LGGs and 19 different HGGs were used for normalization (S2 Table). At the end, 64 LGGs and 191 HGGs were available, this being a database with imbalanced classes.
Database division
Low and high grade gliomas were divided into two classes: Training gliomas and testing gliomas. As the first part of the proposed under-sampling approach, a unique and independent subset formed by testing gliomas (testing subset) and different subsets formed by training gliomas (training subsets) were created. In each subset the same number of LGGs and HGGs was chosen. Classifiers were created from the training subsets and then these were applied to the testing subset.
For the formation of the subsets with balanced classes the following was done. From the 64 LGGs and 191 HGGs, 34 LGGs and 34 HGGs were randomly chosen to form the testing subset (S3 Table). Then, of the remaining 157 HGGs, 30 were chosen randomly and this was repeated 100 times. Then, along with the remaining 30 LGGs (after having chosen the testing LGGs), 100 training subsets were formed (S4 Table) (Fig 1). The above was carried out in order to extract different but complementary information from different training subsets, even though the training LGGs were the same in each one.
Texture features
In this work the matrix called Gray Level Size Zone Matrix (GLSZM) was calculated by studying the region of interest comprised by the tumors. Subsequently, 13 texture features were obtained (being these the ones that are usually extracted from it) [27, 44]). The above was decided as a first study approach, using a small set of texture features in order to simplify analysis and models. Besides, other texture matrices were not used since the results obtained with the GLSZM were good. Names and notations of the 13 features are shown in the Table 1; their definitions and descriptions can be found in the literature [45].
Names and notations of the thirteen texture features are shown [45].
The calculation of the GLSZM and the texture features, in addition to all work that will be described below, were completed through computational algorithms developed in the lab using MATLAB software Version: 8.5.0.197613 (R2015a) in a normal computer system (Intel Core i7-4790 CPU 3.60 Ghz, 16 GB RAM, Windows 7).
MRI contrasts and tumor regions
For simplicity and in order to reduce the time consumed by the computational algorithms developed, only two of four MRI contrasts (T1Gd and T2) and two of three tumor regions (NCR/NET and ED) were analyzed. The reason for choosing these MRI contrasts was that, in initial versions of the work, better results were obtained when those contrasts were analyzed in comparison to the rest. On the other hand, study regions that were common among LGGs and HGGs were a target. Since the Gd-enhancing tumor region was not present in all LGGs, it was excluded from the work.
All possible combinations between MRI contrasts and tumor regions were studied. Thus, the total number of combinations was equal to 15, varying from one MRI contrast and one tumor region (MRIreg) on their own, to all of them together. In each combination, 13 texture features were calculated for each MRIreg. Therefore, as it can be seen in the Table 2, for the first four combinations, 13 features were calculated; for combinations 5 to 10, 26 were calculated; for combinations 11 to 14, 39 were calculated; and for combination 15, 52 were calculated.
Fifteen different studied combinations between MRI contrasts and glioma regions are listed. In each MRI contrast (T1Gd and T2), the region (reg) of the studied tumor is indicated by a superscript. Superscript 1 corresponds to the NCR/NET region, while superscript 2 corresponds to ED region. For each MRIreg, thirteen different texture features were calculated. The total number of calculated features in each combination is indicated.
Classification models
Once the data division was made, the proposed under-sampling approach continued as follows. For each of the 15 combinations, different classification models were created. In general terms, the models were constructed from the training subsets that had the same texture features with the higher significant differences in an orderly manner (according to their p-values obtained after applying statistical tests). Then, different models were created and averaged. Thus, unique models of classification using from one to more texture features were obtained. The procedure for the creation of the models considering some particular combination is described below. However, this same procedure was followed for all 15 combinations.
Features with significant differences
In each of the 100 training subsets, the texture features (13, 26, 39 or 52, depending on the combination) of the respective 30 LGGs and 30 HGGs were compared. The comparison was made applying the Wilcoxon Rank Sum Test. The features were ordered considering their p-value, putting in the first place the one with the lowest p-value and putting in the last place the one with the highest p-value. Then, in each subset only the features that presented significant differences (p < 0.05) were considered. The number of these features was called Di, with i = 1,2, …, 100. Afterward, the minimum of them was calculated and it was called d (i.e., d = min{Di}). Thus, a set of features {Xis} was obtained, with i = 1,2, …,100 (indicating the training subset) and s = 1,2, …, d (indicating the order). Subsequently, d histograms were created, each of them formed from the features located in the same place (from the 100 training subsets) (Fig 2). That is, one histogram with all features located in the first place, another with those located in the second place, and so on until that one with those located in the d position were created. From each of these histograms, the highest frequency feature was chosen. Then, a set {xs}, with s = 1,2, …, d, of ordered highest frequency features was obtained. In case that in some histogram the chosen feature had been the same of a previous histogram, then the following highest frequency feature was selected, so that in the end d different features were obtained.
Creation of unique classification models
For some t, with t ≤ d, training subsets in which their t first ordered features coincided with the t first ordered highest frequency features were chosen. The total of the subsets that complied with the above was called w. In each of the w training subsets a multiple linear regression was carried out employing the set of t features. In the regressions, the independent variable was chosen arbitrarily to be equal to −10 for the LGGs and equal to 10 for the HGGs (whose choice had no particular reason). Hence, w individual regression models were obtained in the form: in which β’s were the coefficients obtained after performing the linear regression, x’s were the variables or ordered highest frequency features and ŷ’s were predictions of the models. Then, in order to obtain a single model from the w created, coefficients associated to the same variable (including constant term coefficient) were averaged. Thus, a unique classification model of t variables was obtained and expressed as: where the were the averaged coefficients. The above was repeated for all possible values of t (from 1 to d). Then, for the considered combination, d different models were obtained using from 1 to d variables; i.e., one model used only the ordered highest frequency feature located in the first place, another model used the two features located in the first and second place, and so on until the model that occupied all the d features.
Application of models
Unique models created from all combinations were applied to the testing subset (34 LGGs and 34 HGGs). If the prediction ŷ of some model was less than 0, then the glioma was classified as LGG; and if it was greater than 0, then the glioma was classified as HGG. Sensitivity, specificity, accuracy and mean absolute error (mae) were calculated for all the models. However, before calculating the mae of each model, the following was done. For any testing LGG, if its predictions ŷ was less than −10, then it was equalized to −10; while for any testing HGG, if its prediction ŷ was greater than 10, then it was equalized to 10. This was done because a value greater than 10 for some testing HGG, or a value less than −10 for some testing LGG, was not considered a bad result; however, the mae of the respective model would has been negatively influenced by this. Thus, there was only interest on the error of predictions of the LGGs above −10 and in the error of predictions of the HGGs below 10. It should be mentioned that in the results section the actual predictions of each glioma were graphically showed. From all unique models created, in this work only the one that obtained the best results was reported.
Reduced models
Assuming that the best model was created from more than one variable, then reduced models were created using all possible combinations among these variables. This was done with the objective of knowing if any of the variables could be left out, still obtaining good results. To understand how training subsets were chosen (among the 100 available) in order to create the reduced models, the following example can be considered. Suppose that the best model used 3 variables. Then, considering their order, the possible combinations between the variables were: 1, 2, 3, 1-2, 1-3, 2-3 and 1-2-3. For combination 1, the training subsets whose first ordered variable was variable 1, were chosen; for combinations 2 and 1-2, the subsets whose first two ordered variables were variables 1 and 2, were chosen; and for combinations 3, 1-3, 2-3 and 1-2-3, the subsets whose first three ordered variables were variables 1, 2 and 3, were chosen. Once considered the variables and their respective training subsets, a procedure similar to that explained in the previous sections was completed, creating individual reduced models and obtaining unique reduced models. These latest models were applied to the testing subset and their sensitivity, specificity, accuracy and mae were calculated. From all unique reduced models, the one that obtained the best results with the least number of variables and mae value was chosen. Its mathematical expression was then explicitly reported. Further, boxplots of the variables used in the best model and obtained from the testing subset were created; and the Wilcoxon rank sum test was applied.
Having the best classification model, the duration of the entire classification process for each testing glioma was measured and averaged. This consisted of the following procedures: Inhomogeneity correction, intensity normalization, calculation of the GLSZM, calculation of the texture features, application of model and classification (Fig 3).
Results
After calculating the texture features of training HGGs and LGGs (S5 Table, S6 Table, S7 Table and S8 Table), the minimum number of features (called d) with significant differences (p < 0.05) of two MRIreg: T1Gd2 and T22 (numbers of combination 2 and 4 respectively in Table 2), whose study region was ED (indicated by superscript 2), was equal to zero. That is, when the only studied glioma region was ED, in some of the 100 training subsets there were no texture features with significant differences between LGGs and HGGs. Therefore, it was decided to exclude from the subsequent work the combinations that included the two mentioned MRIreg. Then, of the total of 15 combinations between MRI contrasts and tumor regions, only three continued to be studied: T1Gd1, T21 and T1Gd1-T21 (numbers of combination 1, 3 and 6 respectively in Table 2), whose study region was NCR/NET (indicated by superscript 1). For combination 1, the value of d was equal to 5; for combination 3, it was equal to 7; and for combination 6, it was equal to 16. Thus, for combination 1, models with 1 to 5 variables were created; for combination 3, models with 1 to 7 variables were created; and for the combination 6, models with 1 to 16 variables were created. Therefore, in total 28 different unique classification models were created (S9 Table).
Figs 4a, 4b and 4c show the results obtained after applying the classification models of combinations 1, 3 and 6 to the testing subset respectively. The percentages of sensitivity, specificity and accuracy reached are indicated. Figs 4d, 4e and 4f indicate the mae of all created models. The model that showed the best results (signaled in Fig 4 with black arrow heads) corresponded to the combination 6 (T1Gd1-T21) using the first 5 ordered highest frecuency features. In order, the 5 features or variables of the models were: Fszm.z.perc, Fszm.zs.var,, Fszm.zs.lzlge, Fszm.zs.lze and Fszm.zsnu. The first four were measured in T21 and the fifth one in T1Gd1.
Following the methodology for the creation of reduced models, 30 models were created using different combinations among the 5 variables of the aforementioned best model (S10 Table). Fig 5a shows the sensitivity, specificity and accuracy obtained by the 30 models when they were applied to the testing subset. Their respective mae are shown in Fig 5b. As it can be seen in Fig 5a, the model that used only the 3 ordered variables 1-2-5, obtained the same results as the model that used all 5 variables (1-2-3-4-5). This reduced model obtained a sensitivity of 94.12%, a specificity of 88.24%, an accuracy of 91.18% and a mae of 5.03. Table 3 shows information regarding the 3 variables (1-2-5) or texture features.
Taking as reference Eq. 1, the mathematical expression of the 3-variables reduced model was: and considering the data shown in Table 3, Eq. 2 became: being this the mathematical expression of the best classification model.
This model used only 3 variables (1-2-5) obtained from the texture features employed in the 5-variable (1-2-3-4-5) model of combination number 6. The order of the variables used is shown, in addition to the MRI contrast and the glioma region from which they were measured. The average coefficient and the minimum and maximum p-values (pmin and pmax respectively) corresponding to each variable are indicated. The coefficient of the constant term is also shown.
In Fig 6, the predictions made by this model when Eq. 3 was applied to the testing subset are presented graphically.
In Fig 7, boxplots made from the 3 variables of the testing LGGs and HGGs are shown. These variables presented significant differences when both study groups were compared (p = 1.21×10−7 for Fszm.zsnu, and p = 1.58×10−7 for Fszm.z.perc and Fszm.zs.var). In addition, it can be seen that the testing LGGs had relatively higher values of Fszm.zs.var compared to the testing HGGs, and the testing HGGs had relatively higher values of Fszm.zsnu and Fszm.z.perc compared to the testing LGGs.
After having obtained the best model, the computation time of the complete classification process on the testing subset was measured and averaged. The individual processes carried out were the following: Inhomogeneity correction and intensity normalization of the T1Gd and T2 contrasts; calculation of two GLSZM, one from T1Gd and another from T2, considering in both only the NCR/NET region; calculation of the 3 texture features, being one obtained from the GLSZM of T1Gd and two from the GLSZM of T2; application of the model (Eq. 3); and classification of gliomas according to the criteria described above. The average time of classification was 2 min 4 s ± 46 s.
Discussion
Through an under-sampling approach to create testing and training subsets with balanced classes, various classification models were created occupying the highest frequency texture features obtained from the different training subsets. The best model used only 3 texture features (studying two conventional MRI contrasts and only one glioma region), obtaining good classification results. Thus, this model was characterized by its simplicity, in addition to the reduced average computation time employed to classify an individual glioma. Furthermore, as the methodology is thoroughly described and the studied database is publicly available, it is possible to reproduce and corroborate the reported model. Finally, the features used in the model presented significant differences between the testing LGGs and HGGs.
Regarding the interpretation of the variables used in the best model, the feature Fszm.z.perc (calculated from T21) was a measurement of coarseness of the texture, such that higher values of this feature corresponded to a finer (or more homogeneous) texture [46]. On the other hand, the feature Fszm.zsnu (calculated from T21) measured the variability of zone size volumes across the image, with a higher value indicating more heterogeneity in zone size volumes [46]. And regarding the feature Fszm.zs.var (calculated from T1Gd2), it measured the variance of the zone sizes, and similarly to the previous one, higher values of this feature indicated a more heterogeneous texture [27]. From the interpretation of the features and the results described above, it could be deduced that LGGs had a more heterogeneous texture than HGGs, specifically in the T2 contrasts; and HGGs had a more heterogeneous texture than LGGs, specifically in the T1Gd contrasts; in both cases studying the NCR/ NET region. Several works have reported models whose main classification variable was heterogeneity of gliomas [18, 23, 25, 47–49]. For example, through texture analysis applied on diffusion tensor imaging [25, 49] and diffusion kurtosis imaging [49] maps, diverse features that characterized the heterogeneity of gliomas indicated an increased heterogeneity for higher grade gliomas compared to lower grade gliomas. Moreover, Kin et al. [47] studied the texture matrix called Grey Level Co-occurrence Matrix (GLCM) of contrast enhanced T1 MR and ADC maps and reported higher values of entropy (or non-uniformity) as well as reduced values of homogeneity for HGGs when these were compared to LGGs. Also, Skogen et al. [48] applied texture analysis on post-contrast spoiled gradient echo (SPGR) sequences using a filtration histogram technique in order to obtain from fine to coarse features and quantified the heterogeneity of gliomas through standard deviation of the histograms. They reported results that showed a higher heterogeneity for the HGGs compared to the LGGs. Thereby, diverse studies have related a higher heterogeneity to a higher grade glioma. However, the present work showed that one glioma grade had a more heterogeneous texture than the other according to the studied MRI contrast. Therefore, this result is complementary to what is usually reported, since it was more specific after having included the MRI contrast as a variable of the models.
One of the objectives of this work was to present explicitly one classification model, and then apply it on a single and independent testing subset as validation process. Because of this, the database was divided between different training subsets and one testing subset, creating the models from the first and applying them to the last one. The number of 30 LGGs and 30 HGGs was chosen to form the training subsets, because 30 was the minimum number of gliomas per study group such that there were no significant differences in the results obtained by the created models (data not reported). In addition, the same number of LGGs and HGGs were chosen to avoid the problem so-called “class imbalance” using an under-sampling approach. Later, as part of this approach, complementary information obtained from different training subsets was used to create the classification models.
The main contribution of this work, in addition to the proposed under-sampling approach already mentioned, is the simplicity of the best classification model (which obtained high values of accuracy) compared to others recently reported. For instance, Wang et al. [5] analyzed a combination of advanced and conventional MRI (diffusion-weighted, contrast-enhanced T1-weighted and axial T2-weighted images) of 26 LGGs and 26 HGGs divided into a training and validation set. A total of 654 radiomic features were extracted for each subject. Through a LASSON regression 15 features were chosen, from which a nomogram was created. Then, the classification capacity of the nomogram was evaluated using the Harrell’s concordance index (C-index), obtaining a C-index of 0.971 and 0.961 on training and validation data respectively. In another work, Khawaldeh et al. [7] studied 2D slides images from conventional MRI (FLAIR) of 128 subjects including LGGs, HGGs and healthy subjects. The authors proposed a modified version of the convolutional neural network known as AlexNet [50] in 2D, which reached an accuracy of 91.16% by differentiating the 3 study groups. On the other hand, Tian et al. [24] analyzed conventional (T1-weighted before and after contrast-enhanced and T2-weighted images) and advanced (multi-b-value diffusion-weighted and 3D arterial spin labeling images) MRI of 42 LGGs and 111 HGGs by extracting texture features and histogram parameters. SVM-based recursive feature elimination was used to choose the best features for the classification of gliomas, and then create different SVM classifiers by cross-validation. Using 30 texture features they reached an average accuracy of classification of 96.8%. Also, Gupta et al. [20] analyzed conventional MRI (T1-weighted before and after contrast-enhanced, T2-weighted and FLAIR images) of 80 LGGs and 120 HGGs to perform three tasks: detection, location and identification of gliomas. For the third task (identification), they occupied geometric parameters such as area, solidity, perimeter and orientation of the tumor, in addition to the radiologists consultation. They obtained an accuracy to classify LGGs and HGGs of 94.4% and 94% respectively when T1-weighted before and after contrast-enhanced images were studied, and 96.5% and 97% when they studied T2-weighted and FLAIR images. Therefore, in this work conventional MRI (T1Gd and T2 contrasts) was studied, while others have analyzed advanced MRI or a combination of both [5, 21–24, 51–54]. The model was created from a simple mathematical method (a multiple linear regression), in comparison to others in which mathematical tools of higher complexity were utilized [7, 52–54]. The best model was found to use only 3 variables of a single type (quantitative, being also only texture features), instead of a combination of different classes and types of variables [21, 24, 51, 53]. A texture analysis was performed (which is easy to implement for any type of MRI) and a single texture matrix was used instead of different matrices [24], being the chosen one (GLSZM) a suitable texture matrix when heterogeneity is a predominat characteristic of the object of study. In addition, since the studied database is publicly available and the mathematical expression of the best model was explicitly reported, the reproducibility of the presented methodology and the corroboration of the results by other independent studies is feasible. In general, any classifier model has a very strong dependence on the database and image acquisition protocol used to develop them. Usually an institutional database and protocol are used for this purpose. In contrast, the BRATS database was obtained from 19 study centers with different clinical protocols and various scanners. This makes the database heterogeneous and therefore it approaches a more realistic scenario of what could be found in a clinical environment. Hence, there is a possibility that the reported model could be tested on other databases without being limited to a specific clinical protocol. In addition to the simplicity of the reported classification model, since conventional MRI and texture analysis were studied, the diagnostic model presented is low cost and easy to implement, so that it is accessible to populations with reduced economic and scientific resources.
Among the limitations of presented work, the following should be mentioned. Since there was only a single independent testing subset (randomly chosen), there is a possibility that the results may vary according to the chosen subset. Also, the number of gliomas that made up the training and testing subsets were relatively small. It is always preferable and desirable to have a database with a greater number of samples, such that the results obtained have a higher reliability. On the other hand, images of manually segmented gliomas were used, so that the proposed classification method was supervised (not fully automated). Moreover, the criterion for the choice of the texture features was limited to use only statistical tests. This was not enough to ensure good results in all models, even though the texture features used in them showed significant differences. Moreover, the molecular characteristics of the tumors have shown to be more useful than the histological characteristics in the diagnosis, treatment and prognosis of the patients. Taking into account the above, future work should consider the application of the reported classification methodology to other independent databases. An automatic segmentation method must be developed or an existing one must be implemented, such that the glioma classification methodology becomes fully automated. Besides, other criteria for the extraction (Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc.) and selection (filter approach, wrapper approach, etc.) of texture features should be considered. Also, other texture matrices (Gray Level Co-Occurrence Matrix (GLCM), Grey Level Run Length Matrix (GLRLM), etc.) and conventional MRI contrasts (T1, FLAIR, etc.) could be studied. Finally, the work done and the characteristics studied intend to be a complement to other analysis techniques, such as those that study molecular characteristics, so that future work can include the correlation and implementation of results from different work approaches.
In conclusion, the methodology proposed proved to be useful for the classification of low and high grade gliomas obtaining high values of accuracy. The main objective of the authors is that the model can be implemented as a complement in the clinical diagnosis environment of this type of brain tumors.
Supporting information
S1Table. List of HGGs and LGGs, including the key names occupied in the database and the reference numbering used in the work.
S2 Table. Reference gliomas for normalization.
S3 Table. Testing glioma subset.
S4 Table. One hundred training glioma subsets.
S5 Table. Texture features of all gliomas, both training and testing, considering the MRIreg: T1Gd1.
S6 Table. Texture features of all gliomas, both training and testing, considering the MRIreg: T21.
S7 Table. Texture features of all gliomas, both training and testing, considering the MRIreg: T1Gd2.
S8 Table. Texture features of all gliomas, both training and testing, considering the MRIreg: T22.
S9 Table. Models created for combinations 1, 3 and 6, indicating the respective ordered features and their coefficients.
S10 Table. Thirty reduced models created from the first 5 ordered highest frequency features used in the combination 6, indicating order, reference numbering and their coefficients.