Comput Struct Biotechnol J

Accurate brain tumor detection using deep convolutional neural network

Md. Saikat Islam Khan

a Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh

Anichur Rahman

b Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka 1350, Bangladesh

Tanoy Debnath

c Department of CSE, Green University of Bangladesh, 220/D, Begum Rokeya Sarani, Dhaka 1207, Bangladesh

Md. Razaul Karim

Mostofa Kamal Nasir, Shahab S. Band

d Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan

Amir Mosavi

e Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia

f John von Neumann Faculty of Informatics, Obuda University, 1034 Budapest, Hungary

Iman Dehzangi

g Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA

h Department of Computer Science, Rutgers University, Camden, NJ 08102, USA

Graphical abstract

[Graphical abstract image: ga1.jpg]

Detection and classification of brain tumors is an important step toward better understanding their mechanism. Magnetic Resonance Imaging (MRI) is a medical imaging technique that helps the radiologist find the tumor region. However, examining MRI images manually is time-consuming and requires expertise. Nowadays, advances in Computer-assisted Diagnosis (CAD), machine learning, and deep learning in particular allow the radiologist to identify brain tumors more reliably. Traditional machine learning methods used to tackle this problem require handcrafted features for classification, whereas deep learning methods can be designed to achieve accurate classification without any handcrafted feature extraction. This paper proposes two deep learning models to identify both binary (normal and abnormal) and multiclass (meningioma, glioma, and pituitary) brain tumors. We use two publicly available datasets that include 3064 and 152 MRI images, respectively. To build our models, we first apply a 23-layer convolutional neural network (CNN) to the first dataset, since it offers a large number of MRI images for training. However, when dealing with limited volumes of data, which is the case for the second dataset, our proposed “23-layers CNN” architecture overfits. To address this issue, we use transfer learning and combine the VGG16 architecture with the reflection of our proposed “23-layers CNN” architecture. Finally, we compare our proposed models with those reported in the literature. Our experimental results indicate that our models achieve up to 97.8% and 100% classification accuracy on the two datasets, respectively, exceeding all other state-of-the-art models. Our proposed models, employed datasets, and all the source code are publicly available at https://github.com/saikat15010/Brain-Tumor-Detection

1. Introduction

A brain tumor is one of the deadliest illnesses and is caused by sudden, unregulated growth of brain tissue inside the skull. It can be either benign or malignant. Malignant tumors can expand quickly and spread across the surrounding brain tissue, whereas benign tumors tend to grow slowly. However, benign tumors can also be dangerous, as their proliferation may affect surrounding brain tissue. About 70% of tumors are benign, and 30% are malignant [1]. So far, more than 120 different brain tumors have been detected and identified, with meningioma, glioma, and pituitary tumors being the most common. Among these three, meningioma is perhaps the most prominent primary brain tumor; it arises in the meninges and affects the brain and spinal cord [2]. Glioma tumors, on the other hand, grow from glial cells called astrocytes. The most prominent type of glioma is astrocytoma, a low-risk tumor that develops slowly. However, high-risk glioma is one of the most severe brain tumors. Pituitary tumors are another type, caused by excessive growth of cells in the pituitary gland of the brain. Because brain tumors can be deadly, early diagnosis is essential.

According to the International Association of Cancer Registries (IARC), more than 28,000 people are diagnosed with brain tumors every year in India alone, of whom more than 24,000 die [3]. Another study reported approximately 5,250 deaths annually in the United Kingdom due to brain tumors [4]. In the United States, the impact of brain tumors is even more significant than in other countries. In 2019 alone, about 86,970 cases of benign and malignant brain tumors were diagnosed [5]. The radiologist uses different experimental procedures for diagnosing brain tumors, including biopsy, cerebrospinal fluid (CSF) analysis, and X-ray analysis. In a biopsy, a small fragment of tissue is removed by surgery, and the radiologist then determines whether the tissue holds a tumor. However, the biopsy process introduces many risks, including inflammation and severe bleeding. It also has just 49.1% accuracy [6]. CSF is a colorless fluid that circulates inside the brain. The radiologist tests the fluid to detect a brain tumor. However, similar to biopsy, this procedure introduces many risks, including bleeding from the incision site into the bloodstream and possibly an allergic reaction after the treatment [7]. Similarly, using X-rays on the skull can increase the risk of cancer due to the radiation.

Nowadays, imaging modalities are becoming more popular among radiologists since they are more accurate and introduce much less risk to patients. There are different methods for capturing medical imaging data, including radiography, magnetic resonance imaging (MRI), tomography, and echocardiography. Among them, MRI is the most prominent, as it provides higher-resolution images without any radiation. MRI is a non-invasive procedure that provides the radiologist with useful medical image data for diagnosing brain abnormalities [8], [9]. The Computer-Aided Diagnosis (CAD) method, on the other hand, is designed for detecting brain tumors in the early stages without any human intervention. CAD systems can produce diagnostic reports based on MRI images and offer guidance to the radiologist [10].

The CAD process has improved dramatically through machine learning (ML) and deep learning (DL) applications in the medical imaging field [11], [12], [13]. Such techniques lead to better accuracy in detecting brain tumors in the CAD system. Machine learning techniques are based on feature extraction, feature selection, and classification. Different feature extraction techniques, including thresholding-based, clustering-based, contour-based, and texture-based techniques, are used for segmenting the tumor region from the human skull [14]. Such techniques extract features from the MRI images, and the important features are then chosen through the feature selection process. Extracting features with significant discriminatory information leads to high accuracy [15]. However, feature extraction can discard important information from the original image [16].

DL methods, on the other hand, address this issue by using the original image as input [17]. In other words, they do not require handcrafted features for classification. Among DL models, the Convolutional Neural Network (CNN) provides [18] different convolution layers that automatically extract features from the images [19]. A CNN performs well when working with a large dataset, which is not always easy to obtain in the medical imaging field [20]. One method to address this issue is transfer learning. In transfer learning [21], a model that has previously been trained on a large dataset from another domain is used for classification [22]. The knowledge gained from that earlier training helps the model achieve high accuracy on a small dataset [23].

In this paper, we propose a system for automatically classifying brain tumors based on two deep learning models. A “Fine-tuned proposed model with the attachment of the transfer learning based VGG16” architecture is used for classifying normal and abnormal brain images. During the tuning process, four dense layers are employed in place of the original fully connected layers, and the last dense layer, equipped with a softmax activation function, is used to identify brain tumors. To transform the two-dimensional matrix into a vector, we use Global Average Pooling 2D instead of flattening layers. A total of 71 normal and 81 abnormal MRI images are used in this classification to address the data imbalance problem. On the other hand, we propose a “23-layers CNN” architecture for classifying multiclass brain tumors. In this work, a total of 3064 MRI images are used for training the CNN model. A dropout layer is applied to address the overfitting issue. In addition, different kernel sizes are integrated into the model to extract complex features from the MRI images, making the model more robust. Our experimental results indicate that our models reach up to 97.8% and 100% prediction accuracy on our employed datasets, exceeding all previous studies found in the literature.

To summarize, the main contributions of this study are as follows:

  • The “23-layers CNN” framework provides segmentation-free feature extraction that, unlike conventional machine learning methods, does not require any handcrafted feature extraction.
  • In this model, we replace the fully connected layers with four dense layers, which facilitates the tuning process.
  • The data imbalance issue in the Harvard Medical dataset is addressed by taking an almost equal number of MRI slices in the normal and abnormal classes.
  • The overfitting issue is addressed by increasing the number of MRI slices through a data augmentation strategy and by introducing dropout layers in both models.
  • The performance of the proposed “23-layers CNN” framework is evaluated on both large and small datasets. Results indicate that our framework outperforms previous studies found in the literature.
  • To prevent overfitting on a small image dataset, we merge the “23-layers CNN” framework with the transfer learning-based VGG16 model. Results show that the suggested technique performs very well on the test images without any overfitting problems.

Our proposed models, employed datasets, and all the source codes are publicly available at:  https://github.com/saikat15010/Brain-Tumor-Detection .

2. Background

During the past decades, a wide range of machine learning and deep learning models for detecting brain tumors have been proposed. In this section, a summary of such models is presented.

2.1. Brain tumor detection with segmentation based machine learning technique

As a large volume of medical MRI data is gathered through image acquisition, researchers are now proposing different machine learning methods to identify brain tumors. These methods are based on feature extraction, feature selection, dimensionality reduction, and classification techniques. Most of the suggested machine learning models focus on the binary identification of brain tumors. For example, Kharrat et al. proposed a binary classification of brain images using a support vector machine (SVM) and a genetic algorithm (GA) [24]. In that study, the features are extracted using the Spatial Gray Level Dependency Method (SGLDM). In a different study, Bahadure et al. used the Berkeley wavelet transformation (BWT) and SVM to segment and categorize normal and abnormal brain tissues [25]. They achieved 96.5% prediction accuracy on 135 images. In a related study, Rehman et al. applied a Random Forest (RF) classifier to the 2012 BRATS dataset [26]. They compared their model to other classifiers and found that the RF classifier achieved better results in terms of precision and specificity.

Later, to identify brain tumors, Chaplot et al. used a discrete wavelet transform (DWT) as a feature extractor and SVM as a classifier [27]. On 52 images, they achieved 98% prediction accuracy. El-Dahshan et al. then applied the K-nearest neighbor (KNN) classifier to 70 images and reported 98.6% prediction accuracy [28]. For feature extraction and feature reduction, they employed DWT and principal component analysis (PCA), respectively. They also used Particle Swarm Optimization (PSO) and SVM to select and classify textural features. To detect different grades of glioma tumors, Chen et al. used a 3D convolution network to segment the tumor region [29]. The segmented tumors are then classified using the SVM classifier. They also used the recursive feature elimination (RFE) method to select features with significant discriminatory information. More recently, Ranjan et al. proposed a new model using the 2D Stationary Wavelet Transform (SWT) as a feature extractor and AdaBoost and SVM classifiers to detect brain abnormalities.

Although those techniques significantly enhanced brain tumor detection accuracy, they still have several limitations, including:

  • Since all these methods are based on binary classification (normal and abnormal), they do not give the radiologist enough information to decide the patient’s treatment with respect to tumor grading.
  • Those methods are based on different handcrafted feature extraction techniques, which are time-consuming, complex, and in many cases not effective.
  • The techniques used in those studies performed well with a small amount of data. However, working with a large volume of data requires more advanced classifiers.

2.2. Brain tumor detection using convolution neural networks (CNN)

CNN presents a segmentation-free method that eliminates the need for handcrafted feature extraction techniques. For this reason, several researchers have proposed different CNN architectures. Most of the CNN models address multiclass brain tumor detection on large numbers of images. For example, Sultan et al. suggested a CNN model with 16 layers [30]. The model was tested on two publicly available datasets. One dataset labeled tumors as meningioma, glioma, and pituitary tumors, and the other differentiated between three grades of glioma tumors: Grade II, Grade III, and Grade IV. They achieved 96.1% and 98.7% prediction accuracy on datasets with 3064 and 516 images, respectively. Hossain et al. used the Fuzzy C-Means clustering technique to extract the tumor area from the MRI images [31]. They proposed a new CNN-based model and compared it to six other machine learning models. The reported 97.9% prediction accuracy outperforms prior models.

In a different study, Ertosun et al. created a novel hybrid CNN model to identify multiclass glioma tumors [32]. For Grade II, Grade III, and Grade IV glioma tumors, they achieved classification accuracies of 96.0%, 71.0%, and 71.0%, respectively. In a similar study, Anaraki et al. identified glioma tumors with 90.9% prediction accuracy using CNN and GA [33]. They also obtained 94.2% prediction accuracy for the diagnosis of pituitary, meningioma, and glioma tumors. More recently, Özyurt et al. suggested a combined Neutrosophy and CNN model. In this model, the Neutrosophy technique is used to segment the tumor zone, and features of the segmented portion are extracted using the CNN model and then classified using SVM and KNN classifiers [34]. In a different study, Iqbal et al. introduced a 10-layer CNN model to tackle this problem [35]. They carried out their experiment on the BRATS 2015 dataset and achieved promising results. As discussed here, CNN appears to do well on large image datasets. However, it also suffers from two main limitations:

  • A CNN model requires a vast number of images for training, which is often difficult to obtain in the medical imaging field.
  • CNNs perform remarkably well at classifying images that are quite similar to those in the training dataset. However, they struggle to classify images that have a slight tilt or rotation. This can be mitigated by using data augmentation to continuously introduce new variants of the images during training. To address this problem in our research, we employ data augmentation.

2.3. Brain tumor detection through transfer learning

Transfer learning does well when the volume of data is limited, since the model has previously been trained on a large dataset (e.g., the ImageNet database) containing millions of images. In this approach, the pre-trained model with adjusted weights is adopted for the classification task. Another benefit is that it does not require a massive amount of computational resources, since only the model’s fully connected layers need to be trained. Due to such advantages, different transfer learning models have been used for diagnosing brain tumors. For instance, Talo et al. used a pre-trained ResNet34 model to detect normal and abnormal brain MRI images. Large-scale data augmentation was also carried out to reach high prediction accuracy [36]. Furthermore, for detecting multiclass brain tumors, Swati et al. proposed a fine-tuned VGG19 model [37]. Later, Lu et al. suggested a fine-tuned AlexNet structure to diagnose brain abnormalities [38]. In that study, just 291 images were used. In a similar study, Sajjad et al. used a fine-tuned VGG19 model for multiclass brain tumor detection on 121 images [39]. They achieved an overall prediction accuracy of 87.4% before data augmentation and increased it to 90.7% by applying data augmentation. Despite all these benefits, transfer learning has several shortcomings, which are listed below:

  • Pre-trained models fail to obtain satisfactory results when trained on imbalanced datasets. They are biased towards classes with a larger number of samples [36], [38], [56].
  • Pre-trained models require proper fine-tuning. Otherwise, the model will fail to achieve satisfactory results [37], [39].

Although previous studies achieved significant improvements in brain tumor diagnosis, there is still room for improvement. This research mainly concentrates on overcoming those shortcomings by fine-tuning the deep learning models and improving prediction accuracy.

3. Methodology

Our proposed block diagram for automated binary and multiclass brain tumor detection is shown in Fig. 1. The pipeline starts with image extraction and loading labels from the dataset. The extracted images are then preprocessed before being split into training, validation, and test sets. Finally, our proposed “23-layers CNN” and “Fine-tuned VGG16” architectures are applied to the employed datasets. The following sections describe each block of our proposed methods in detail.

Fig. 1. Proposed architecture for brain tumor detection.

3.1. Dataset

In this study, two different datasets are used. The first one (referred to as dataset 1 in this article) is the publicly available CE-MRI Figshare dataset [40]. The data was collected from General Hospital, Tianjin Medical University, and Nanfang Hospital (China) from 2005 to 2010. This dataset contains a total of 3064 T1-weighted contrast MRI slices from 233 patients diagnosed with one of three brain tumors: meningioma, glioma, or pituitary (as shown in Fig. 2). The MRI images in this dataset have three different views: axial, coronal, and sagittal.

Fig. 2. Different samples of brain tumors. Glioma, Metastatic adenocarcinoma, Metastatic bronchogenic carcinoma, Meningioma, and Sarcoma tumors from left to right in the Harvard Medical dataset. The tumor is located within the rectangle.

The second dataset (referred to as dataset 2 in this article) is collected from the Harvard repository [41]. The dataset includes a total of 152 T1- and T2-weighted contrast MRI slices. Among them, 71 slices are healthy images that do not contain any tumor, and 81 are abnormal images containing a tumor. The abnormal brain slices cover five different types of tumors: Glioma, Metastatic adenocarcinoma, Metastatic bronchogenic carcinoma, Meningioma, and Sarcoma (as shown in Fig. 2). Table 1 and Table 2 give detailed information on these two datasets.

Table 2. Number of MRI slices in dataset 2.

Table 1. Number of MRI slices in dataset 1.

3.2. Data preprocessing

We employ several preprocessing techniques before feeding the images into our classifiers. All the MRI images in the Figshare dataset are in .mat format (defined in MATLAB). Hence, to read an image, we need to expand its dimensions. After that, we convert all the images into NumPy arrays (available in Python) so that the data takes up less space. Before splitting the dataset, we shuffle the data so that our model trains on unordered data. After shuffling, we divide the dataset into three sections: train, validation, and test. Approximately 70% of the data is used for training, and the remaining 30% is used for validation and testing (see Table 4).
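The shuffle-then-split step described above can be sketched in a few lines of plain Python. This is an illustration only: the function name `shuffle_and_split`, the fixed seed, and the even 15%/15% division of the remaining 30% between validation and test are assumptions, since the text states only the 70%/30% proportion.

```python
import random

def shuffle_and_split(items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle items, then cut them into train / validation / test lists.

    The 15% validation share is an assumption; the text only says that
    the remaining 30% is shared between validation and testing.
    """
    shuffled = list(items)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains
    return train, val, test

# 3064 integer indices stand in for the Figshare slices.
train, val, test = shuffle_and_split(range(3064))
print(len(train), len(val), len(test))
```

Shuffling before cutting matters because the slices may be stored grouped by patient or class; cutting an ordered list would give the three subsets different class mixes.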

Table 4. Distribution of MRI slices for training, validation, and testing.

All the MRI images in the Harvard Medical dataset, on the other hand, are in .GIF format. To process the dataset, we convert the MRI images to .JPEG format. To reduce the image’s dimensionality, we downsize the original image from 256 × 256 × 1 to 128 × 128 × 3. The pixel intensity value is replicated three times to create three channels, matching the input size expected by the pre-trained VGG16 architecture. Since only 152 images are available in dataset 2, we apply several data augmentation techniques to address overfitting, increase the dataset size, and make the model more robust [42], [49], [50]. Further details of the data augmentation techniques are provided in Table 3. As a result, the number of images increased from 152 to 884 after data augmentation. We use 70% of the data to train the model, and the remaining 30% to validate and test the proposed method (see Table 4).
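As a minimal sketch of two ideas in this paragraph, the snippet below copies a grayscale intensity into three channels and generates simple augmented variants. The helper names and the choice of a horizontal flip and a 90-degree rotation are assumptions for illustration; the transformations actually used in the study are those listed in Table 3.

```python
def gray_to_three_channels(image):
    """Copy each grayscale intensity into three identical channels (H x W -> H x W x 3)."""
    return [[[pixel, pixel, pixel] for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror an image left to right (one simple augmentation)."""
    return [list(reversed(row)) for row in image]

def rotate_90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

# A 2 x 2 toy "slice" with made-up intensities.
tiny = [[0, 50],
        [100, 255]]

rgb = gray_to_three_channels(tiny)
augmented = [tiny, horizontal_flip(tiny), rotate_90(tiny)]
print(rgb[1][1], len(augmented))
```

Each augmented copy keeps the label of its source image, which is how a small set of slices can grow several-fold without new acquisitions.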

Data augmentation strategy used in this study.

3.3. Proposed 23-layers CNN architecture

Fig. 3 demonstrates the proposed “23-layers CNN” architecture used to classify different tumor types, including meningioma, glioma, and pituitary. In the proposed architecture, we take MRI slices as input, process the slices in different layers, and differentiate them from one another. In this study, a total of 23 layers are used to process the slice. Below is the description of each layer:

Fig. 3. Proposed 23-layers CNN architecture.

One of the predominant building blocks of the CNN model is the convolutional layer. It performs a mathematical operation that takes the dot product between two matrices to construct a transformed feature map. One matrix is the kernel, while the other holds the pixel intensity values of the original image. The kernel moves vertically and horizontally over the original image to extract properties such as borders, corners, and shapes. Deeper in the model, the layers begin to capture higher-level features such as blurring, sharpening, texture, and gradient direction [43]. A total of four convolutional layers with different kernel sizes, namely 22 × 22, 11 × 11, 7 × 7, and 3 × 3, are included in the “23-layers CNN” architecture. We move the filter 2 pixels at a time over the input matrix (stride 2). We apply zero padding to preserve the original size of the image and avoid losing detail. The following equation describes the convolutional layer:

where K is the image with a size of (h, d), and (i, j) corresponds to the kernel size, with f filters. Fig. 4 illustrates how the convolution operation generates the feature map.

Fig. 4. Convolution operation on a 5 × 5 image using a 3 × 3 kernel.
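The sliding-window operation can be written out directly. The sketch below is a plain-Python illustration, not the Keras layer used in the study: `conv2d` is a hypothetical helper, it assumes a single-channel image and a square kernel, and, like most CNN frameworks, it computes a cross-correlation (the kernel is not flipped). The all-ones image and kernel are made-up values.

```python
def conv2d(image, kernel, stride=1, pad=0):
    """Slide `kernel` over a zero-padded `image` and return the feature map."""
    h, w = len(image), len(image[0])
    k = len(kernel)                          # square kernel assumed
    # Zero padding: surround the image with `pad` rows and columns of zeros.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    # Standard output-size formula: floor((n + 2p - k) / s) + 1.
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel with the window under it, element by element, and sum.
            row.append(sum(
                kernel[i][j] * padded[r * stride + i][c * stride + j]
                for i in range(k) for j in range(k)
            ))
        feature_map.append(row)
    return feature_map

image = [[1] * 5 for _ in range(5)]          # 5 x 5 image of ones
kernel = [[1] * 3 for _ in range(3)]         # 3 x 3 kernel of ones

valid = conv2d(image, kernel)                      # no padding, stride 1
strided = conv2d(image, kernel, stride=2, pad=1)   # zero padding, stride 2
print(len(valid), len(valid[0]), len(strided), len(strided[0]))
```

With no padding, the 5 × 5 input shrinks to 3 × 3. With one ring of zeros and stride 2, windows at the border overlap the padding, so their sums are smaller than those in the interior.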

As an activation function, we use the Rectified Linear Unit (ReLU), which introduces non-linearity within the convolutional layer. The ReLU activation function helps mitigate the vanishing gradient problem during backpropagation [44]. The ReLU is defined as follows:

$$ f(x) = \max(0, x) $$

The ReLU activation function is graphically presented in Fig. 5 .

Fig. 5. ReLU operation.

Next, pooling layers reduce the dimensions of the transformed feature map. In this architecture, a total of 3 pooling layers are used. Different pooling layers are available in CNN models, including max pooling, min pooling, and average pooling. We choose max pooling with varying pool sizes, such as 4 × 4 and 2 × 2, to retrieve the most prominent features from the transformed feature map [45]. Fig. 7 illustrates the max-pooling procedure on a 4 × 4 feature map. As shown in this figure, max pooling keeps the most dominant feature in every 2 × 2 block.

Fig. 7. Max pooling procedure.
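A short plain-Python sketch of 2 × 2 max pooling on a 4 × 4 feature map follows. The helper `max_pool2d` and the numbers in `feature_map` are hypothetical and are not taken from Fig. 7; the stride is assumed to equal the pool size, so the windows do not overlap.

```python
def max_pool2d(feature_map, pool=2, stride=None):
    """Keep the largest value in each pool x pool window."""
    stride = stride or pool                  # non-overlapping windows by default
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, h - pool + 1, stride):
        row = []
        for c in range(0, w - pool + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(pool) for j in range(pool)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up 4 x 4 feature map.
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)
```

Each 2 × 2 block collapses to its maximum, so the 4 × 4 map becomes 2 × 2 while the strongest responses are retained.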

Batch normalization also plays a vital role in designing an accurate CNN model. It regularizes the model and enables a higher learning rate. It also re-scales the data to normalize the layer inputs. Here we use a total of 7 batch normalization layers to build our model. Before feeding the data into a fully connected layer, GlobalAveragePooling2D is used to convert the multi-dimensional data into a one-dimensional vector. It takes the average of each feature map from the previous layer and builds a one-dimensional vector. Next, this vector is fed into the fully connected layer as the input. We employ a total of four fully connected layers to construct our model, with the classification taking place in the final fully connected layer. We use the softmax function as the activation function in the output layer of our proposed model. It predicts a multinomial probability distribution in which the probability of each value is proportional to the relative scale of that value in the vector. Each output of the softmax activation function lies between 0 and 1, and the function is defined as follows:

$$ \sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K $$

where z is the vector of K class scores produced by the final fully connected layer.
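The two operations at the end of this pipeline, global average pooling and softmax, can be illustrated in plain Python. This is a sketch under simplifying assumptions: feature maps are nested lists with one H × W grid per channel, the helper names are hypothetical, and the input values are made up.

```python
import math

def global_average_pool(feature_maps):
    """Average each H x W feature map down to one number (C maps -> C values)."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two made-up 2 x 2 feature maps (two channels).
maps = [[[1, 2], [3, 4]],
        [[0, 0], [0, 8]]]
vector = global_average_pool(maps)

# Three made-up class scores, e.g. one per tumor type.
probs = softmax([2.0, 1.0, 0.1])
print(vector, probs)
```

Global average pooling produces one value per channel regardless of the spatial size, which is why it adds no trainable parameters, unlike a flatten layer followed by a wide dense layer.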

One of the most challenging issues in building an accurate deep neural network is overfitting. It occurs when the model is over-trained on the training data and, as a result, performs poorly on new data [46]. To avoid overfitting, we use a dropout layer before the classification layer. In the “23-layers CNN” architecture, a dropout rate of 20% is used. Hence, only about 80% of the features are active in each training iteration. Fig. 6 illustrates the dropout procedure.

Fig. 6. Dropout layer.
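The effect of a 20% dropout rate can be sketched as follows. This is a plain-Python illustration of "inverted" dropout, in which the surviving activations are scaled by 1 / (1 − rate) so that their expected sum is unchanged; the helper `dropout` and its `seed` argument are assumptions for illustration rather than the study's code.

```python
import random

def dropout(activations, rate=0.2, training=True, seed=None):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training:
        return list(activations)             # dropout is disabled at inference time
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

activations = [1.0] * 1000                   # made-up layer output
dropped = dropout(activations, rate=0.2, seed=1)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(round(kept_fraction, 2))
```

Because a different random subset of features is silenced in every training step, the network cannot rely on any single feature, which is what limits overfitting.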

3.4. Fine-tuning for proposed CNN

A fine-tuning approach not only replaces the pre-trained model’s layers with a new set of layers to train on a given dataset; it also uses backpropagation to fine-tune all or part of the kernels in the pre-trained convolutional layers. In this study, the fine-tuned pre-trained CNN model is used to identify whether or not a tumor is present in the image. As our pre-trained model, we use VGG16, which was first introduced in 2014 and became the first runner-up in the ILSVRC competition [47]. Overfitting happens when a model fits the training set too well. The model then has a hard time generalizing to new data that are not in the training set. In the case of dataset 2, since the training dataset is small, complex models are very likely to overfit. To address this issue, we combine the reflection of our proposed “23-layers CNN” architecture with the “transfer learning based VGG16 architecture”. The VGG16 architecture was fine-tuned so that it could be integrated with the reflection of the proposed model on the Harvard Medical dataset (as presented in Fig. 8).

Fig. 8. Fine-tuned proposed architecture with the attachment of the “transfer learning based VGG16 architecture”.

Here we use all 13 convolution layers from the VGG16 architecture along with the reflection of the proposed architecture, with a kernel size of 3 × 3 and a total of 5 max-pooling layers with stride 2. In all convolution layers, the ReLU activation function is used. In this study, the fully connected layers are fine-tuned with different sizes: 1024, 1024, 512, and 2 units. A dropout layer placed between two dense layers is also used during fine-tuning to overcome the overfitting problem. Finally, in the classification stage, we use a CNN model and tune its parameters. We also investigate hyper-parameters such as padding, zero-padding, strides, feature map, batch size, and learning rate to build the best-suited model.
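To give a sense of scale for the dense head described above, the following sketch counts the trainable weights and biases of four fully connected layers with 1024, 1024, 512, and 2 units. It assumes that the head receives a 512-dimensional vector, which is what global average pooling over the 512-channel output of the last VGG16 convolution block would yield; the exact input width in the authors' combined model is not stated here, so this figure is an assumption.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def head_params(input_features, layer_units):
    """Total parameters of a stack of fully connected layers."""
    total, n_in = 0, input_features
    for n_out in layer_units:
        total += dense_params(n_in, n_out)
        n_in = n_out                          # this layer's output feeds the next layer
    return total

HEAD_UNITS = [1024, 1024, 512, 2]             # layer sizes given in the text
ASSUMED_INPUT = 512                           # assumption: pooled VGG16 feature width

trainable = head_params(ASSUMED_INPUT, HEAD_UNITS)
print(trainable)
```

Under these assumptions the head has roughly 2.1 million parameters, compared with about 14.7 million in the VGG16 convolutional base, which illustrates why training only the new layers is comparatively cheap.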

4. Experimental setup

The proposed models are implemented in Python using TensorFlow with Keras. The implementation was run on Google Colab, which provides a free online cloud service along with 15 GB of free space in Google Drive.

4.1. Training and parameter optimization

For Study I (using dataset 1), Fig. 9 shows both the training and validation progress for the “23-layers CNN” architecture. The hyper-parameters used for this training are presented in Table 5. As our loss function, we select sparse categorical cross-entropy. We also study different optimizers and batch sizes to train the model. Among them, the Adam optimizer with batch size 32 obtained the best performance. We observe that the convergence of the model depends on the initial learning rate, alpha. Alpha has to be selected very carefully because the CNN does not converge well if alpha is very high, and it takes more time to converge if alpha is very small. Here we set alpha to 0.0001 to avoid these issues.
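The sensitivity to alpha can be seen even on a toy problem. The sketch below minimizes f(x) = x² with plain gradient descent, which is a deliberate simplification: the study uses the Adam optimizer on a CNN, and the three step sizes here are chosen for this toy function only, so they say nothing about suitable values for the actual model.

```python
def gradient_descent(alpha, steps=50, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2x) and return the final distance from the minimum."""
    x = x0
    for _ in range(steps):
        x -= alpha * 2 * x                    # x <- x - alpha * f'(x)
    return abs(x)

too_high = gradient_descent(alpha=1.1)        # each step multiplies x by -1.2: diverges
moderate = gradient_descent(alpha=0.1)        # each step multiplies x by 0.8: converges
too_small = gradient_descent(alpha=0.001)     # each step multiplies x by 0.998: very slow

print(too_high > 1.0, moderate < 1e-4, too_small > 0.85)
```

With the largest step the iterate overshoots the minimum by more each time and moves away from it, while with the smallest step it is still close to its starting point after 50 updates.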

Fig. 9. Training progress for Study I: (a) accuracy during training and validation (higher is better), and (b) loss during training and validation (lower is better).

Optimization of Hyper-Parameters for Study I and Study II.

Fig. 9(a) shows the training and validation progress for each epoch. After the 29th epoch, the CNN model achieves 100% prediction accuracy on the training set, with an overall validation accuracy of 97.0%. Considering the consistency of the results (as shown in this figure), we conclude that the “23-layers CNN” architecture successfully avoids the overfitting problem. Fig. 9(b) shows that the loss value decreases and, right after the 29th epoch, reaches zero for the training phase. Due to the limited batch size, some fluctuations occur in the validation curve. However, the instability vanishes after the 43rd epoch, and the loss curve approaches zero.

5. Performance metrics

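To evaluate the performance of the “23-layers CNN” and “Fine-tuned VGG16” architectures and compare our results with previous studies, we use different evaluation metrics, including accuracy, precision, recall, false-positive rate (FPR), true negative rate (TNR), and F1-score. These metrics are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{FPR} = \frac{FP}{FP + TN}$$

$$\text{TNR} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP stands for true positive, FP stands for false positive, TN stands for true negative, and FN stands for false negative.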
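For a multiclass problem, TP, FP, TN, and FN are usually obtained per class in a one-vs-rest fashion from the confusion matrix. The following is a minimal sketch of that computation using the standard definitions of the metrics named above. The function name and the confusion matrix are hypothetical; the counts are made up for illustration and are not the paper's results.

```python
def one_vs_rest_metrics(confusion, cls):
    """Compute metrics for class index `cls` from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                         # true class, wrong prediction
    fp = sum(confusion[i][cls] for i in range(n)) - tp    # other class, predicted as cls
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),
        "tnr": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical 3-class confusion matrix (made-up counts).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
m = one_vs_rest_metrics(toy, cls=0)
print(m)
```

Averaging the per-class values over all classes gives macro-averaged metrics, which is one common way to report a single number for a multiclass model.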

The confusion matrix and the ROC curve for the Figshare dataset are given in Fig. 10. On the Figshare dataset, the “23-layers CNN” architecture was used for prediction. It can be observed from Fig. 10 that a total of 140, 270, and 180 MRI slices are correctly classified for meningioma, glioma, and pituitary tumors, respectively, while only 20 MRI slices are misclassified by the proposed architecture. The other performance metrics, including accuracy, precision, recall, FPR, TNR, and F1-score, are presented in Table 6. As shown in Table 6, prediction accuracies of 96.7%, 97.2%, and 99.5% are achieved for meningioma, glioma, and pituitary tumors, respectively. Finally, the overall prediction accuracy achieved on the Figshare dataset is 97.8%. For the other performance metrics, we achieve an average precision of 96.5%, a recall of 96.4%, and an F1-score of 96.4%. The false-positive rate is approximately 0, and the true negative rate is close to 1, which demonstrates that the “23-layers CNN” architecture can achieve excellent performance on the Figshare dataset.

Fig. 10. CNN model’s performance: (a) confusion matrix, (b) ROC curve.

Table 6. The results obtained using the CNN model on dataset 1.

From the ROC curve, we can observe that the area under the curve (AUC) is 0.989, which indicates the consistency and generality of our model.

6.1. System validation

We also apply our proposed “23 layers CNN” architecture to the Harvard Medical dataset. We achieved more than 85% training and validation accuracy on this dataset. However, the testing accuracy is less than 55%, indicating that overfitting occurred while training the model. Hence, to validate the system’s performance and to solve the overfitting issue, a generalization technique was applied. As discussed earlier, to build this model, we combine the VGG16 model with some reflection of our proposed “23 layers CNN” architecture, as shown in Fig. 8. In this way, we address the overfitting issue for the small dataset.

Fig. 14 demonstrates both the training and validation processes for the “Fine-tuned VGG16” architecture. The hyper-parameters used for the training process are presented in Table 5. We selected a small batch size of 10 since dataset 2 consists of only 152 MRI images. Additionally, we used categorical cross-entropy as the loss function, which is used in both single-label and multi-class classification problems. We can observe from Fig. 14 (a) that, right after the 33rd epoch, 100% training accuracy is achieved. As shown in Fig. 14 (b), the loss value decreases and, after the 33rd epoch, approaches zero for both training and validation sets.

Fig. 14. Training progress for Study II: (a) accuracy value during training and validation process (preferred higher value), and (b) loss value during training and validation process (preferred lower value).

The confusion matrix and the ROC curves for dataset 2 are given in Fig. 13. On this dataset, the “Fine-tuned VGG16” architecture is tested on 30 images. Among them, 14 images contain no tumor, and 16 images include tumors. Notably, no MRI slices are misclassified by our proposed architecture. As shown in Fig. 13, all 14 and 16 MRI slices are correctly classified as normal and abnormal brain images, respectively. The other performance metrics are shown in Table 7. As shown in this table, we achieve an average accuracy, precision, recall, and F1-score of 100% each. The FPR is 0, and the TNR is 1 for dataset 2. From the ROC curve, we can also observe that the area under the curve is 1, which indicates the model’s consistency and generality. The performance of the proposed framework on both datasets is given in Fig. 11 and Fig. 12.

We have also tested our proposed method using different configurations. Table 9 shows the performance of various activation functions and loss functions when combined with the proposed 23-layers CNN architecture. Among the loss functions, sparse categorical cross-entropy performed best. Binary cross-entropy performed poorly, which is understandable because it is intended for binary class data rather than for categorizing multiclass brain tumor grades. Categorical cross-entropy obtained greater than 90% accuracy, but its performance was still inferior to that of sparse categorical cross-entropy. Additionally, we employed three activation functions in this study; the combination of the softmax activation function and the sparse categorical cross-entropy loss function achieved more than 97% accuracy, outperforming all the other configurations.
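To make the loss functions mentioned above concrete, the sketch below computes categorical and sparse categorical cross-entropy for a single sample from their standard definitions. For one sample, both evaluate the negative log of the probability assigned to the true class; they differ in whether the label is given as a one-hot vector or as an integer class index. The function names and the probability vector are hypothetical and are not taken from the authors' code.

```python
import math

def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy for a one-hot encoded label and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(label, probs):
    """Cross-entropy for an integer class label and predicted class probabilities."""
    return -math.log(probs[label])

# Hypothetical softmax output for three classes (made-up values).
probs = [0.7, 0.2, 0.1]

loss_onehot = categorical_cross_entropy([1, 0, 0], probs)
loss_sparse = sparse_categorical_cross_entropy(0, probs)
print(loss_onehot, loss_sparse)
```

Integer labels avoid building one-hot vectors, which is convenient when class labels are stored as indices.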

Table 7. The results obtained using the reflection of the proposed CNN model on dataset 2.

Fig. 11. Performance of the proposed method on Dataset-1.

Fig. 12. Performance of the proposed method on Dataset-2.

Fig. 13. Fine-tuned model’s performance: (a) confusion matrix, (b) ROC curve.

Table 9. Performance of different configurations on the Figshare dataset.

7. Discussion

In this study, we proposed two individual models to diagnose binary (normal and abnormal) and multiclass (meningioma, glioma, and pituitary) brain tumors (see Fig. 1). The proposed models are compared to the existing state-of-the-art models found in the literature, as shown in Table 8. Those models used the same datasets and tumor types with different architectures. It is evident from Table 8 that our proposed “23-layers CNN” and “Fine-tuned CNN with the attachment of transfer learning based VGG16” architectures demonstrate the best prediction performance for the identification of both binary and multiclass brain tumors compared to other methods found in the literature.

Table 8. Comparison of the proposed framework with other state-of-the-art models.

For the Harvard Medical dataset (dataset 2) and the Figshare dataset (dataset 1), we have obtained 100% and 97.8% prediction accuracies, respectively. However, there are other advantages to our proposed models over the existing models found in the literature. For example, most of the methods require handcrafted feature extraction methods [9], [27], [28], [51], which may not be very effective when dealing with a large number of images. In contrast, the “23-layers CNN” and “Fine-tuned CNN with VGG16” architectures are segmentation-free and do not require handcrafted features.

Previously, Anaraki et al. introduced a genetic algorithm (GA) with CNN to predict brain tumors [33]. GA, however, does not always demonstrate good precision when working with CNN, and it is also computationally expensive. In another study, Afshar et al. used the CapsNets architecture to focus on both the tumor and its surrounding region [48]. However, defining two objects at the same time can compromise the results for each individual problem despite their similarities. Swati et al. and Sajjad et al. both applied the pre-trained VGG19 model to the Figshare dataset and obtained nearly the same performance [37], [39]. However, they did not implement any dropout or regularization strategy to address overfitting.

In another study, Shanaka et al. segmented the tumor region using the active contour approach [52]. Active contour uses energy forces and constraints to extract the crucial pixels from an image for additional processing and interpretation. However, active contouring has drawbacks in segmentation, such as getting stuck in local minima during training or overlooking tiny details while minimizing the energy along the whole path of the contour. Momina et al. applied Mask RCNN along with the ResNet-50 model to locate the tumor region [53]. They achieved 95% classification accuracy. However, more sophisticated object detection algorithms, such as the YOLO model and the Faster RCNN model, perform much better than Mask RCNN. For instance, Eko et al. outperformed Mask RCNN by employing the YOLO model, which has a mAP rate of 80.12%, when segmenting the head and tail of fish [57].

Later on, Francisco et al. and Emrah et al. both used CNN models to obtain detection accuracy of more than 90% [54], [55]. However, both models are computationally expensive and do not offer a method for system validation. Since a specific model may work well on one dataset while having detrimental effects on another, it is crucial to apply system validation techniques. In a similar study, Abiwinanda et al. proposed a CNN model to categorize tumor classes using only 700 MRI images from the Figshare dataset [58]. They also did not employ any data augmentation techniques to increase the number of MRI images. As a result, they only achieved a classification accuracy of 84%, which is quite low compared to similar studies.

To classify the binary class, previous studies used an imbalanced dataset [9], [27], [28]. We addressed this issue by using almost the same number of normal and abnormal brain MRI images. Besides, using a CNN model on the Figshare dataset, Sultan et al. achieved very promising results. However, there was still room for improvement by adding more layers to the network. A comparison between the proposed framework and all the previous studies mentioned above is shown in Fig. 15.

Fig. 15. Performance of the proposed method compared to the latest research.

7.1. Limitations and future work

Although our proposed models achieved promising classification outcomes, there are still a number of issues that can be resolved in future work. For example, one of the key difficulties in using deep learning-based automated detection of brain tumors is the requirement for a substantial amount of annotated images collected by a qualified physician or radiologist. In order to build a robust deep learning model, we would require a large dataset. To the best of our knowledge, the majority of contemporary machine learning tools for medical imaging have this constraint. Although the majority of earlier studies are now making their datasets available to the public in an effort to address this problem, the amount of properly and accurately annotated data is still very limited.

Adopting zero-shot, few-shot, and deep reinforcement learning (DRL) techniques could help us to tackle this problem in the future. Zero-shot learning has the capacity to build a recognition model for unseen test samples that are not labeled for training. Zero-shot learning can thereby address the lack of training data for the tumor classes. Additionally, a deep learning model can learn from a small number of labeled instances per class using the few-shot learning technique. On the other hand, DRL can reduce the need for precise annotations and high-quality images.

Another drawback of this study is that although the proposed method achieved significant performance on two publicly available datasets, the work is not validated in an actual clinical study. This is the case for almost all of the models reviewed in this study as well. Our aim is to test our model on actual clinical data when they become available. In this way, we can directly compare the performance of our proposed models with experimental approaches. Another future direction is to use more layers or other regularization techniques to work with a small image dataset using a CNN model.

8. Conclusion

This research introduces two deep learning models for identifying brain abnormalities as well as classifying different tumor grades, including meningioma, glioma, and pituitary. The proposed “23-layers CNN” architecture is designed to work with a relatively large volume of image data, whereas the “Fine-tuned CNN with VGG16” architecture is designed for a limited amount of image data. A comprehensive data augmentation technique is also applied to enhance the “Fine-tuned CNN with VGG16” model’s performance. Our experimental results demonstrate that both models enhance the prediction performance of brain tumor diagnosis. We achieved 97.8% and 100% prediction accuracy for dataset 1 and dataset 2, respectively, outperforming previous studies found in the literature. Therefore, we believe that our proposed methods are outstanding candidates for brain tumor detection. Our proposed models, employed datasets, and all the source code are publicly available at: https://github.com/saikat15010/Brain-Tumor-Detection .

Authors contributions

SIK, AR, and MKN conceived and initiated this study. SIK, AR, RK, and TD performed the experiments. SIK, AR, TD, SSB, AM, and ID wrote the manuscript. SIK, AR, MKN, SSB, MS, and ID helped with the literature review. AR, SSB, AM, ID, and TD mentored and analytically reviewed the paper. All the authors reviewed the article.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

  • Open access
  • Published: 27 January 2022

Classification of brain tumours in MR images using deep spatiospatial models

  • Soumick Chatterjee,
  • Faraz Ahmed Nizamani,
  • Andreas Nürnberger &
  • Oliver Speck

Scientific Reports volume 12, Article number: 1505 (2022)


  • Cancer imaging
  • Cancer screening
  • Computer science

A brain tumour is a mass or cluster of abnormal cells in the brain, which has the possibility of becoming life-threatening because of its ability to invade neighbouring tissues and also form metastases. An accurate diagnosis is essential for successful treatment planning, and magnetic resonance imaging is the principal imaging modality for diagnosing brain tumours and their extent. Deep Learning methods in computer vision applications have shown significant improvement in recent years, most of which can be credited to the fact that a sizeable amount of data is available to train models, and the improvements in the model architectures yield better approximations in a supervised setting. Classifying tumours using such deep learning methods has made significant progress with the availability of open datasets with reliable annotations. Typically those methods are either 3D models, which use 3D volumetric MRIs, or 2D models, which consider each slice separately. However, by treating one spatial dimension separately or by considering the slices as a sequence of images over time, spatiotemporal models can be employed as “spatiospatial” models for this task. These models have the capabilities of learning specific spatial and temporal relationships while reducing computational costs. This paper uses two spatiotemporal models, ResNet (2+1)D and ResNet Mixed Convolution, to classify different types of brain tumours. It was observed that both these models outperformed the pure 3D convolutional model, ResNet18. Furthermore, it was also observed that pre-training the models on a different, even unrelated dataset before training them for the task of tumour classification improves the performance. Finally, the pre-trained ResNet Mixed Convolution was observed to be the best model in these experiments, achieving a macro F1-score of 0.9345 and a test accuracy of 96.98%, while at the same time being the model with the least computational cost.


A brain tumour is the growth of abnormal cells in the brain. Brain tumours are classified based on their speed of growth and the likelihood of them growing back after treatment. They are mainly divided into two overall categories: malignant and benign. Benign tumours are not cancerous; they grow slowly and are less likely to return after treatment. Malignant tumours, on the other hand, are essentially made up of cancer cells; they have the ability to invade the tissues locally, or they can spread to different parts of the body, a process called metastasis 1. Glioma tumours are the result of glial cell mutations resulting in malignancy of normal cells. They are the most common types of Astrocytomas (tumour of the brain or spinal cord), and account for 30% of all brain and central nervous system tumours and 80% of all malignant tumours 2. The phenotypical makeup of glioma tumours can consist of Astrocytomas, Oligodendrogliomas, or Ependymomas. Each of these tumours behaves differently, and the World Health Organisation (WHO) uses the following grading-based method to categorise each tumour based upon its aggressiveness:

Grade I tumours are generally benign tumours, which means they are mostly curable, and they are commonly found in children.

Grade II includes three types of tumours: Astrocytomas, Oligodendrogliomas, and Oligoastrocytoma—which is a mix of both 3 . They are common in adults. Eventually, all low-grade gliomas can progress to high-grade tumours 3 .

Grade III tumours can include Anaplastic Astrocytomas, Anaplastic Oligodendrogliomas or Anaplastic Oligoastrocytoma. They are more aggressive and infiltrating than grade II.

Grade IV glioma, also called Glioblastoma Multiforme (GBM), is the most aggressive tumour in the WHO category.

In general, grades I and II gliomas are considered low-grade gliomas (LGG), while grades III and IV are known as high-grade glioma (HGG). The LGG are benign tumours, and they can be excised using surgical resection. In contrast, HGGs are malignant tumours that are hard to excise by surgical methods because of their extent of nearby tissue invasion. Figure  1 shows an example MRI of LGG and HGG.

Figure 1. An example MRI of Low-grade glioma (LGG, on the left) and High-grade glioma (HGG, on the right). Source: BraTS 2019.

A Glioblastoma Multiforme (GBM) typically has the following types of tissues (shown in Fig.  2 ):

The Tumour Core : This is the region of the tumour that has the malignant cells that are actively proliferating.

Necrosis : The necrotic region is the important distinguishing factor between low-grade gliomas and GBM 4 . This is the region where the cells/tissue are dying, or they are dead.

Perifocal oedema : The swelling of the brain is caused by fluid build-up around the tumour core, which increases the intracranial pressure; perifocal oedema is caused by the changes in glial cell distribution 5 .

Figure 2. High-grade glioma structure on T1ce, T2 and FLAIR contrast images (from left to right), (red circle) Necrotic core, (blue circle) Perifocal oedema. Source: BraTS 2019.

The prognosis of a brain tumour depends on many factors, such as the tumour’s location, the histological subtype of the tumour, and the tumour margins. In many cases, the tumour reoccurs and progresses to grade IV even after treatment 3 . Modern imaging methods such as MRI can be used for multiple diagnostic purposes; they can be used to identify the tumour location—which is used for investigating tumour progression and surgical pre-planning. MR imaging is also used to study the anatomy of the lesion, physiology, and metabolic activity along with its haemodynamics. Therefore MR imaging remains the primary diagnostic modality for brain tumours.

Detection of cancer, specifically an earlier detection, holds the potential to make a difference in treatment. Earlier detection is vital because lesions in earlier stages are more likely curable; therefore, if intervened early on, this can make the difference between life and death. Deep learning methods can help automate the process of detecting and classifying brain lesions—they can also reduce the radiologists’ burden of reading many images by prioritising only malignant lesions. This will eventually improve the overall efficiency, and it can reduce diagnostic errors 6 . Recent studies have shown that deep learning methods in the field of radiology have already achieved comparable and super-human performance for some pathologies 7 .

Related work

Various deep learning based methods have been proposed in recent times to classify brain tumours. Mzoughi et al. 8 proposed an approach using volumetric CNNs to classify high-grade glioma and low-grade glioma using T1 contrast-enhanced images. Another similar work on glioma classification based on grading was done by Pei et al. 9, where they first segmented the tumour and then classified the tumour between HGG and LGG. Most of the literature on glioma tumour classification and grading used one single MR contrast image at a time, but Ge et al. 10 used a fusion framework that uses T1 contrast-enhanced, T2, and FLAIR images simultaneously for classifying the tumour. Ouerghi et al. 11 used a novel fusion method for the inclusion of multiple MRI contrasts. First, the T1 images are transformed by non-subsampled shearlet transform (NSST) into low frequency (LF) and high frequency (HF) subimages, essentially separating principal information in the source image from edge information. Then the images are fused by predefined rules to include the coefficients, resulting in fusion of T1 and T2 or FLAIR images. Most of the literature only classifies between the different grades of tumour and does not consider healthy brains as an additional class.

Technical background

ResNet or residual network, proposed by He et al. 12, has been shown to be one of the most efficient network architectures for image recognition tasks, dealing with problems of deep networks, e.g. vanishing gradients. That paper introduced residual links: identity mappings, or “skip connections”, whose outputs are added to the outputs of the rest of the stacked layers. These identity connections do not add any complexity to the network while improving the training process. The spatiotemporal models introduced by Tran et al. 13 for action recognition are fundamentally 3D Convolutional Neural Networks based on ResNet. There are two spatial dimensions and one temporal dimension in video data, making the data three dimensional. For handling such data (e.g. action recognition task), using a network with 3D convolution layers is an obvious choice. Tran et al. 13 introduced two variants of spatiotemporal models: ResNet (2+1)D and ResNet Mixed Convolution. The ResNet (2+1)D model consists of 2D and 1D convolutions, where the 2D convolutions are used spatially while the 1D convolutions are reserved for the temporal element. This gives an advantage of increased non-linearity by using non-linear rectification, which allows this kind of mixed model to be more “learnable” than conventional full 3D models. On the other hand, the ResNet Mixed Convolution model is constructed as a mixture of 2D and 3D convolution operations. The initial layers of the model are made of 3D convolution operations, while the later layers consist of 2D convolutions. The rationale behind using this type of configuration is that the motion-modelling occurs mostly at the initial layers, and applying 3D convolution there encapsulates action better.
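The idea of an identity skip connection can be sketched in a few lines. The example below is a simplified illustration rather than the ResNet implementation: it assumes small fully connected layers in place of convolutions, omits batch normalisation, and uses hypothetical function names. It computes relu(F(x) + x), where the input x is added back to the output of the stacked layers F.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def linear(weights, values):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    out = relu(linear(w1, x))
    out = linear(w2, out)
    # Identity skip connection: the input is added without any extra parameters.
    return relu([o + xi for o, xi in zip(out, x)])

# With all-zero weights, F(x) is zero and the block passes a non-negative input through.
zeros = [[0.0, 0.0], [0.0, 0.0]]
y = residual_block([1.0, 2.0], zeros, zeros)
print(y)
```

Because the skip path is a plain addition, the stacked layers only need to learn a correction to the input, and the gradient has a direct route back to earlier layers.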

Apart from trying to improve the network architecture, one frequently used technique to improve the performance of the same architecture is transfer learning 14. This is a technique for re-purposing a model for a task that is different from the task the model was originally trained to perform. Typically, the model parameters are initialised randomly before starting the training. However, in the case of transfer learning, model parameters learned from task one are used as the starting point (called pre-training), instead of random values, for training the model for task two. Pre-training has been shown to be an effective method to improve the initial training process, eventually achieving better accuracy 15, 16.


Spatiotemporal models are typically used for video classification tasks, which are three dimensional in nature. Their potential in classifying 3D volumetric images like MRI, considering them as “spatiospatial” models, has not been explored yet. This paper explores the possibility of applying spatiotemporal models (ResNet (2+1)D and ResNet Mixed Convolution) as “spatiospatial” models by treating one dimension (slice dimension) differently than the other two spatial dimensions of the 3D volumetric images. The “spatiospatial” models were employed to classify brain tumours of the different types of gliomas based on their grading, as well as healthy brains, from 3D volumetric MR images using a single MR contrast, and their performances were compared against a pure 3D convolutional model (ResNet3D). Furthermore, the models are compared with and without pre-training, to judge the usability of transfer learning for this task.


This section explains the network models used in this research, implementation details, pre-training and training methods, data augmentation techniques, dataset information, data pre-processing steps, and finally, the evaluation metrics.

Network models

Spatiotemporal models are mainly used for video-related tasks, where there are two spatial dimensions and one temporal dimension. These models deal with the spatial and temporal dimensions differently, unlike pure 3D convolution-based models. There is no temporal component in 3D volumetric image classification tasks; hence, using a 3D convolution-based model is a frequent choice. At times, the volumes are divided into 2D slices, and 2D convolution-based models are applied to them. For the task of tumour classification, the rationale for using 3D filters is grounded in the morphological heterogeneity of gliomas 17: the aim is to make the convolution kernels invariant to tissue discrimination in all dimensions, learning more complex features spanning voxels, while 2D convolution filters capture the spatial representation within the slices. Spatiotemporal models combine two different types of convolution into one model while having the possibility of reducing the complexity of the model or of incorporating more non-linearity. These advantages might be possible to exploit while working with volumetric data by considering the spatiotemporal models as “spatiospatial” models, which is the motivation behind using such models for a tumour classification task. In this paper, the slice dimension is treated as the pseudo-temporal dimension of spatiotemporal models, and in-plane dimensions are treated as the spatial dimensions. The spatiotemporal models used here as spatiospatial models are based on the work of Tran et al. 13.

Two different spatiospatial models are explored here: ResNet (2+1)D and ResNet Mixed Convolution. Their performances are compared against ResNet3D, which is a pure 3D convolution-based model.

Figure 3. Schematic representations of the network architectures. (a) ResNet (2+1)D, (b) ResNet Mixed Convolution, and (c) ResNet 3D.

ResNet (2+1)D

ResNet (2+1)D uses a combination of 2D convolution followed by 1D convolution instead of a single 3D convolution. The benefit of using this configuration is that it allows an added non-linear activation unit between the two convolutions, compared to using a single 3D convolution 13. This results in an overall increase of ReLU units in the network, giving the model the ability to learn even more complex functions. The ResNet (2+1)D uses a stem that contains a 2D convolution with a kernel size of seven and a stride of two, accepting one channel as an input and providing 45 channels as output, followed by a 1D convolution with a kernel size of three and a stride of one, providing 64 channels as the final output. Next, there are four convolutional blocks; each of them contains two sets of basic residual blocks. Each residual block contains one 2D convolution with a kernel size of three and a stride of one, followed by a 1D convolution with a kernel size of three and a stride of one. Each convolutional layer in the model (both 2D and 1D) is followed by a 3D batch normalisation layer and a ReLU activation function. The residual blocks inside the convolutional blocks, except for the first convolutional block, are separated by a pair of 3D convolution layers with a kernel size of one and a stride of two, to downsample the input by half. The 2D convolutions are applied in-plane, and the 1D convolutions are applied on the slice dimension. After the final convolutional block, an adaptive average pooling layer has been added, with an output size of one for all three dimensions. After the pooling layer, a dropout layer followed by a fully connected layer with n output neurons for n classes was added to obtain the final output. Figure 3(a) portrays the schematic diagram of the ResNet (2+1)D architecture.
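The effect of splitting a 3D convolution into a 2D and a 1D convolution can be checked by counting weights. The sketch below uses the stem dimensions given above (1 input channel, 45 intermediate channels, 64 output channels, kernels of seven in-plane and three along the slice dimension) and compares them with a single 3D convolution with a (3,7,7) kernel and 64 output channels, which is the stem described for the other models in this paper. It counts convolution weights only; biases and batch normalisation parameters are ignored, and the helper name is hypothetical.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a convolution layer, ignoring any bias terms."""
    count = in_channels * out_channels
    for k in kernel:
        count *= k
    return count

# (2+1)D stem: 2D in-plane convolution, then 1D convolution along the slice dimension.
stem_2d = conv_weights(1, 45, (7, 7))
stem_1d = conv_weights(45, 64, (3,))
stem_2plus1d = stem_2d + stem_1d

# Single 3D convolution with the same overall receptive field.
stem_3d = conv_weights(1, 64, (3, 7, 7))

print(stem_2d, stem_1d, stem_2plus1d, stem_3d)
```

For the stem, the two weight counts are of the same order of magnitude, while the factorised version places an additional batch normalisation and ReLU between the two convolutions.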

ResNet mixed convolution

ResNet Mixed Convolution uses a combination of 2D and 3D Convolutions. The stem of this model contains a 3D convolution layer with a kernel size of (3,7,7), a stride of (1,2,2), and padding of (1,3,3)—where the first dimension is the slice dimension and the other two dimensions are the in-plane dimensions, and accepts a single channel as input while providing 64 channels as output. After the stem, there is one 3D convolution block, followed by three 2D convolution blocks. All the convolution layers (both 3D and 2D) have a kernel size of three and a stride of one, identical for all dimensions. Each of these convolution blocks contains a pair of residual blocks, each of which contains a pair of convolution layers. Similar to ResNet (2+1)D, the residual blocks inside the convolutional blocks, except for the first convolutional block, are separated by a pair of 3D convolution layers with a kernel size of one and a stride of two—to downsample the input by half. Each convolutional layer in the model (both 3D and 2D) is followed by a 3D batch normalisation layer and a ReLU activation function. The motivation behind using both modes of convolution in 2D and 3D is that the 3D filters can learn the spatial features of the tumour in 3D space while 2D can learn representation within each 2D slice. After the convolutional blocks, the final pooling, dropout, and fully connected layers are identical to the ResNet (2+1)D architecture. Figure  3 (b) shows the schematic representation of this model.
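The stem parameters above determine how the input volume is resized. As a minimal sketch, the standard convolution output-size formula (dilation of one) can be applied per dimension with the kernel, stride, and padding given in the text. The input size of 32 slices of 128 × 128 pixels is a hypothetical example and is not a size stated in this section.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output length of a convolution along one dimension (dilation of one)."""
    return (size + 2 * padding - kernel) // stride + 1

def stem_output_shape(shape):
    """Apply the stem's (slice, height, width) kernel, stride, and padding."""
    kernel, stride, padding = (3, 7, 7), (1, 2, 2), (1, 3, 3)
    return tuple(
        conv_output_size(n, k, s, p)
        for n, k, s, p in zip(shape, kernel, stride, padding)
    )

# Hypothetical input volume: 32 slices of 128 x 128 pixels.
print(stem_output_shape((32, 128, 128)))

# A downsampling convolution with kernel size one and stride two halves a dimension.
print(conv_output_size(64, kernel=1, stride=2, padding=0))
```

With these settings the stem keeps the number of slices and halves the in-plane resolution, while each later downsampling layer halves every dimension it is applied to.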

The performance of the spatiospatial models is compared against a pure 3D ResNet model, whose schematic diagram is shown in Fig. 3(c). The architecture of the ResNet3D model used here is almost identical to the architecture of ResNet Mixed Convolution (“Network models” section), except for the fact that this model uses only 3D convolutions. The stems of these models are identical, the only difference being that this model uses four 3D convolution blocks, unlike ResNet Mixed Convolution, which uses one 3D convolution block followed by three 2D convolution blocks. This configuration of the ResNet3D architecture results in a 3D ResNet18 model.

Summary and comparison

The general structure of the network models is as follows: the input goes to the stem, which is followed by four convolutional blocks and then the output block, consisting of an adaptive pooling layer, a dropout layer, and finally a fully connected layer. ResNet Mixed Convolution and ResNet 3D share the same stem: a 3D convolutional layer with a kernel size of (3,7,7), followed by a batch normalisation layer and a ReLU. ResNet (2+1)D uses a different stem: a 2D convolution layer with a kernel size of seven, then a 1D convolution with a kernel size of three, which splits the (3,7,7) 3D convolution used by the other models into a pair of 2D and 1D convolutions, (7,7) and (3). Each of the 2D and 1D convolutions inside this stem is followed by a batch normalisation layer and a ReLU. The convolutional blocks in the ResNet3D and ResNet Mixed Convolution architectures follow the same structure: two residual blocks, each consisting of two sub-blocks, where each sub-block is a 3D convolution with a kernel size of three followed by a batch normalisation layer and a ReLU. In contrast, the first convolutional block of the ResNet (2+1)D architecture uses pairs of 2D and 1D convolutions with a kernel size of three instead of the 3D convolutional layers used by the other models. The rest of the architecture is the same. It is noteworthy that this model has more non-linearity because the 3D convolutions are split into pairs of 2D and 1D convolutions, so an additional batch normalisation and ReLU pair can be placed between the 2D and 1D convolutions. There is one difference between the first convolutional block and the other three (applicable to all three models): the second, third and fourth convolutional blocks include a downsampling pair, consisting of a 3D convolutional layer with a kernel size of one and a stride of two followed by a batch normalisation layer, which is not present in the first convolutional block.

The convolution blocks of all three models progressively double the number of features (64 input features to the first block, 512 output features from the fourth and final block). All of these models end with an adaptive average pooling layer, which forces the output to have a shape of 1×1×1 with 512 different features. A dropout with a probability of 0.3 is then applied as regularisation against over-fitting, before the features are supplied to a fully connected linear layer that generates n classes as output. The width and depth of these models are comparable, but they differ in the number of trainable parameters depending on the type of convolution used, as shown in Table 1. It is noteworthy that fewer trainable parameters mean lower computational costs: a model with fewer parameters requires less memory for computation (GPU and RAM) and is less complex, which reduces the overall computational costs of both training and inference. Moreover, fewer trainable parameters also reduce the risk of overfitting.
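
The dependence of the parameter count on the type of convolution can be illustrated with a small calculation. The sketch below counts only convolution weights (biases and batch normalisation parameters are ignored), and the helper names `conv_weights` and `factorised_weights`, the 64-channel layer, and the intermediate channel count are hypothetical choices for illustration; the actual totals for the three models are those reported in Table 1.

```python
from math import prod


def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a convolution layer, ignoring the bias."""
    return in_channels * out_channels * prod(kernel)


def factorised_weights(in_channels, out_channels, mid_channels, k=3):
    """Weights of a (2+1)D pair: in-plane (1,k,k) conv followed by a (k,1,1) conv."""
    in_plane = conv_weights(in_channels, mid_channels, (1, k, k))
    across_slices = conv_weights(mid_channels, out_channels, (k, 1, 1))
    return in_plane + across_slices


# Illustrative layer with 64 input and 64 output channels.
full_3d = conv_weights(64, 64, (3, 3, 3))     # pure 3D convolution
in_plane_2d = conv_weights(64, 64, (1, 3, 3))  # 2D convolution applied per slice
# mid_channels=64 is an arbitrary illustrative value, not the paper's setting.
factorised = factorised_weights(64, 64, mid_channels=64)

print(full_3d, in_plane_2d, factorised)
```

A per-slice 2D convolution needs one third of the weights of the 3D convolution with the same channel counts, which is one way to see why replacing 3D blocks with 2D blocks shrinks a model.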

Implementation and training

The models were implemented using PyTorch 18 , by modifying the Torchvision models 19 and were trained with a batch-size of 1 using an Nvidia RTX 4000 GPU, which has a memory of 8 GB. Models were compared with and without pre-training. Models with pre-training were pre-trained on Kinetics-400 20 , except for the stems and fully connected layers. Images from the Kinetics dataset contain three channels (RGB Images), whereas the 3D volumetric MRIs have only one channel. Therefore, the stem trained on the Kinetics dataset could not be used and was initialised randomly. Similarly, for the fully connected layer, Kinetics-400 has 400 output classes, whereas the task at hand has three classes (LGG, HGG and Healthy)—hence, this layer was also initialised with random weights.

Training was performed using mixed precision 21 with the help of Nvidia’s Apex library 22 . The loss was calculated using the weighted cross-entropy loss function, to compensate for the under-representation of classes with fewer samples during training, and was optimised using the Adam optimiser with a learning rate of \(10^{-5}\) and a weight decay coefficient of \(\lambda = 10^{-3}\). The code of this research is publicly available on GitHub: https://github.com/farazahmeds/Classification-of-brain-tumor-using-Spatiotemporal-models .

Weighted cross-entropy loss

The normalised weight value for each class ( \(W_c\) ) is calculated using:

where \(samples_c\) is the number of samples from class c and \(samples_t\) is the total number of samples from all classes. The normalised weight values from this equation are then used to scale the cross-entropy loss of the respective class:

where \(x_{c}\) is the true distribution and P(c) is the estimated distribution for class c. The total cross-entropy loss is then the sum of the individual class losses.
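
As a minimal sketch of this weighting, the code below assumes that the normalised weight takes the form \(W_c = 1 - samples_c / samples_t\) and that each class loss is \(W_c \cdot (-x_c \log P(c))\). Both forms are assumptions chosen to be consistent with the definitions above, and other normalisations are possible. The function names are hypothetical; the class counts are those given in the dataset description.

```python
import math


def class_weights(samples_per_class):
    """Assumed form W_c = 1 - samples_c / samples_t (rarer classes get larger weights)."""
    total = sum(samples_per_class.values())
    return {c: 1.0 - n / total for c, n in samples_per_class.items()}


def weighted_cross_entropy(true_dist, predicted, weights, eps=1e-12):
    """Sum over classes of W_c * (-x_c * log P(c)); eps guards against log(0)."""
    return sum(
        w * (-true_dist[c] * math.log(predicted[c] + eps))
        for c, w in weights.items()
    )


# Class counts as stated in the dataset description.
counts = {"LGG": 73, "HGG": 259, "Healthy": 259}
weights = class_weights(counts)

# One LGG sample (one-hot true distribution) with an illustrative prediction.
true_dist = {"LGG": 1.0, "HGG": 0.0, "Healthy": 0.0}
predicted = {"LGG": 0.5, "HGG": 0.4, "Healthy": 0.1}
loss = weighted_cross_entropy(true_dist, predicted, weights)

print(weights)
print(loss)
```

Under this assumed form, a misclassified LGG volume contributes more to the loss than a misclassified HGG or healthy volume, which is the intended effect of the weighting.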

Data augmentation

Different data augmentation techniques were applied to the dataset before training the models, using TorchIO 23 . Initial experiments were performed with two levels of augmentation, light and heavy. Light augmentation included only random affine (scale 0.9-1.2, degrees 10) and random flip (L-R, probability 0.25); heavy augmentation included those from light augmentation together with elastic deformation and random k-space transformations (motion, spike, and ghosting). It was observed that training the network with heavily augmented data not only resulted in poorer final accuracy, but also made the loss take much longer to converge. Therefore, only light augmentation was used throughout this research.

Dataset

Two different datasets were used in this work: the pathological brain images were obtained from the Brain Tumour Segmentation (BraTS) 2019 dataset, which includes images with four different MR contrasts (T1, T1 contrast-enhanced, T2 and FLAIR) 6 , 24 , 25 ; and the non-pathological images were collected from the IXI Dataset 26 . Among the four available types of MRI, T1 contrast-enhanced (T1ce) is the most commonly used contrast for single-contrast tumour classification 8 , 27 . Hence, in this research, T1ce images of 332 subjects were used from the BraTS dataset: 259 volumes of Glioblastoma Multiforme (high-grade glioma, HGG) and 73 volumes of low-grade glioma (LGG). In addition, 259 T1-weighted volumes were chosen randomly from the IXI dataset as healthy samples, to have the same number of subjects as HGG. The final combined dataset was then randomly divided into three folds of training and testing splits with a ratio of 7:3.
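
One possible reading of this splitting scheme is three independent random 70/30 partitions of the combined subject list. The sketch below implements that reading only; the function name `repeated_split`, the seed, the subject identifiers, and the absence of stratification are all assumptions, and the published code may split the data differently.

```python
import random


def repeated_split(subject_ids, n_folds=3, train_fraction=0.7, seed=0):
    """Return n_folds independent random (train, test) partitions of subject_ids."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(subject_ids))
    folds = []
    for _ in range(n_folds):
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        folds.append((shuffled[:n_train], shuffled[n_train:]))
    return folds


# Hypothetical identifiers; only the class counts come from the text.
subjects = (
    [f"HGG_{i}" for i in range(259)]
    + [f"LGG_{i}" for i in range(73)]
    + [f"Healthy_{i}" for i in range(259)]
)
folds = repeated_split(subjects)

for train, test in folds:
    print(len(train), len(test))
```

With 591 subjects in total, each fold under this reading contains 414 training and 177 testing volumes.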

Data pre-processing

The IXI images were first pre-processed using the brain extraction tool (BET2) of FSL 28 , 29 . This was done to keep the input data uniform throughout, as the BraTS images are already skull-stripped. Moreover, the intensity values of all the volumes from the combined datasets were normalised by scaling intensities to the [0.5, 99.5] percentile range, as used by Isensee et al. 30 . Finally, the volumes were re-sampled to the same isotropic voxel resolution of 2 mm.
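
The percentile-based normalisation can be sketched as follows, under the assumption that intensities are clipped to the 0.5th and 99.5th percentiles and then rescaled linearly to [0, 1]. The text does not spell out the exact procedure, so the clipping, the target range, and the helper names are assumptions; a flat list of values stands in for a 3D volume.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def normalise_intensities(voxels, lower_q=0.5, upper_q=99.5):
    """Clip to the [lower_q, upper_q] percentile range and rescale to [0, 1]."""
    ordered = sorted(voxels)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    if high == low:  # constant image: nothing to scale
        return [0.0 for _ in voxels]
    return [(min(max(v, low), high) - low) / (high - low) for v in voxels]


# Toy "volume": a ramp of intensities plus one very bright outlier voxel.
voxels = list(range(1000)) + [10_000]
scaled = normalise_intensities(voxels)

print(min(scaled), max(scaled))
```

Clipping at the percentiles keeps a few extreme voxels, such as the outlier in the toy data, from compressing the rest of the intensity range.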

Evaluation metrics

The performance of the models was compared using precision, recall, F1 score, specificity, and testing accuracy. Furthermore, a confusion matrix was used to show class-wise accuracy.
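
For reference, the class-wise metrics listed above can be derived from a multi-class confusion matrix in a one-vs-rest manner. The following sketch uses the standard definitions; the function name and the matrix values are purely illustrative and are not results from this work.

```python
def per_class_metrics(confusion, class_index):
    """One-vs-rest metrics for one class; rows are true labels, columns predictions."""
    n_classes = len(confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp
    fp = sum(confusion[row][class_index] for row in range(n_classes)) - tp
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Illustrative confusion matrix (class order: LGG, HGG, Healthy); not from the paper.
confusion = [
    [18, 4, 0],
    [2, 75, 1],
    [0, 0, 78],
]
lgg = per_class_metrics(confusion, 0)
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))

print(lgg)
print(accuracy)
```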

The performance of the models was compared with and without pre-training. Figures 4, 5, and 6 show the average accuracy over 3-fold cross-validation using confusion matrices, for ResNet (2+1)D, ResNet Mixed Convolution, and ResNet 3D, respectively.

Figure 4. Confusion matrix for 3-fold cross-validation on pre-trained ResNet(2+1)D.

Figure 5. Confusion matrix for 3-fold cross-validation on ResNet mixed convolution.

Figure 6. Confusion matrix for 3-fold cross-validation on ResNet3D18.

Figure  7 shows the class-wise performance of the different models, both with and without pre-training, using precision, recall, specificity, and F1-score.

Figure 7. Heatmaps showing the class-wise performance of the classifiers, compared using precision, recall, specificity, and F1-score: (a) LGG, (b) HGG, and (c) healthy.

Comparison of the models

The mean F1-score over 3-fold cross-validation was used as the metric to compare the performance of the different models. Tables 2, 3 and 4 show the results of the different models for the classes LGG, HGG, and Healthy, respectively, and Table 5 shows the consolidated scores.

For low-grade glioma (LGG), ResNet Mixed Convolution with pre-training achieved the highest F1 score of 0.8949 with a standard deviation of 0.033. The pre-trained ResNet(2+1)D is not far behind, with 0.8739 \({\pm }\) 0.033.

For the high-grade glioma (HGG) class, the highest F1 score was achieved by the pre-trained ResNet Mixed Convolution model, with 0.9123 \({\pm }\) 0.029. This is higher than the best model’s F1 score for the LGG class, which can be expected because of the class imbalance between LGG and HGG. As with low-grade glioma, the second-best model for HGG is the pre-trained ResNet(2+1)D, with an F1 score of 0.8979 \({\pm }\) 0.032.

The healthy brain class achieved the highest F1 score of 0.9998 \({\pm }\) 0.0002, with the pre-trained ResNet 3D model, which can be expected because of the complete absence of any lesion in the MR images making it far less challenging for the model to learn and distinguish it from the brain MRIs with pathology. Even though the pre-trained ResNet 3D model achieved the highest mean F1 score, all pre-trained models achieved similar F1 scores, i.e. all the mean scores are more than 0.9960—making it difficult to choose a clear winner.

ResNet Mixed Convolution with pre-training emerged as the best model for both classes with pathology (LGG and HGG) and achieved a score similar to the other models when classifying healthy brain MRIs. It was also the best model based on macro and weighted F1 scores, making it the clear overall winner. It can also be observed that the spatiospatial models performed better with pre-training, whereas ResNet 3D performed better without pre-training.

Comparison against literature

This sub-section compares the best model from the previous sub-section (i.e. ResNet Mixed Convolution with pre-training) against seven other research papers (in no specific order), where they classified LGG and HGG tumours. Mean test accuracy was used as the metric to compare the results as that was the common metric used in those papers.

Shahzadi et al. 31 used an LSTM-CNN to classify between HGG and LGG, using T2-FLAIR images from the BraTS 2015 dataset. Their work focuses on using a smaller sample size, and they were able to achieve 84.00% accuracy 31 . Pei et al. 9 achieved a lower classification accuracy of 74.9%, although they used all of the available contrasts of the BraTS dataset, and their method performed segmentation using a U-Net-like model before performing classification. Ge et al. 10 used a novel method of fusing the contrasts into multiple streams to be trained simultaneously. Their model achieved an overall accuracy of 90.87% on all the contrasts, and 83.73% on T1ce. Mzoughi et al. 8 achieved 96.59% using deep convolutional neural networks on T1ce images. Their work does not present any metric other than the overall accuracy of their model, which makes it difficult to compare against their results. Yang et al. 27 did similar work; they used a pre-trained GoogLeNet on 2D images, achieving an overall accuracy of 94.5%. They did not use the BraTS dataset, but the purpose of their work was similar: to classify glioma tumours based on LGG and HGG grading. Their dataset had fewer samples of the LGG and HGG classes than this research, with 52 and 61 samples, respectively 27 . Ouerghi et al. 11 used different machine learning methods to train on the fusion images, one of which is the random forest, with which they achieved 96.5% for classification between high-grade and low-grade glioma. Finally, Zhuge et al. 32 achieved an impressive 97.1% using a deep CNN for classification of glioma based on LGG and HGG grading, beating the proposed model by 0.12%. This difference can be explained by two factors: (1) their use of an additional dataset from The Cancer Imaging Archive (TCIA) in combination with BraTS 2018, and (2) their use of four different contrasts. Both of these factors increase the size of the training set significantly. Furthermore, no cross-validation has been reported in their paper. Table 6 shows the complete comparative results.

The F1 scores of all the models in classifying healthy brains were very close to one, as segregating healthy brains from brains with pathology is a comparatively simpler task than classifying the grade of the tumour. Furthermore, using two different datasets for healthy and pathological brain MRIs could also have introduced a dataset bias. In classifying the grade of the tumour, the pre-trained ResNet Mixed Convolution model performed best, while in classifying healthy brains, all three pre-trained models performed similarly. For comparing the models based on consolidated scores, macro and weighted F1 scores were used. However, the macro F1 score is to be given more importance as the dataset was imbalanced. Both metrics identified the pre-trained ResNet Mixed Convolution as the clear winner.
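
The difference between the two consolidated scores can be shown with a short calculation. In the sketch below the per-class F1 values are illustrative placeholders rather than results from this work, and the total class counts stand in for the per-class support; the function names are hypothetical.

```python
def macro_f1(f1_per_class):
    """Unweighted mean of the class-wise F1 scores."""
    return sum(f1_per_class.values()) / len(f1_per_class)


def weighted_f1(f1_per_class, support):
    """Mean of the class-wise F1 scores weighted by the number of samples per class."""
    total = sum(support.values())
    return sum(f1_per_class[c] * support[c] for c in f1_per_class) / total


# Illustrative F1 values (placeholders, not reported results).
f1_per_class = {"LGG": 0.80, "HGG": 0.90, "Healthy": 1.00}
# Class counts from the dataset description, used here as a stand-in for support.
support = {"LGG": 73, "HGG": 259, "Healthy": 259}

macro = macro_f1(f1_per_class)
weighted = weighted_f1(f1_per_class, support)

print(macro, weighted)
```

Because the minority class counts as much as the others in the macro average, a weak LGG score lowers the macro F1 more than the weighted F1, which is why the macro score is the more informative one for an imbalanced dataset.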

One interesting observation from the confusion matrices is that the classification performance of the models for the LGG class was lower than for the other two classes. Even the best performing model reached an accuracy of only 81% for LGG, while achieving 96% for HGG and nearly perfect results for healthy. This might be attributed to the fact that the dataset was highly imbalanced (“Dataset” section), i.e. 259 volumes each for HGG and healthy but only 73 volumes for LGG. Even though weighted cross-entropy loss (“Weighted cross-entropy loss” section) was used in this research to deal with the problem of class imbalance, increasing the number of LGG samples or employing further techniques to address class imbalance might improve the performance of the models for LGG 33 .

It is noteworthy that the pre-trained ResNet Mixed Convolution resulted in the best classification performance, even though it is the model with the fewest trainable parameters (see Table 1). Moreover, both spatiospatial models performed better than the pure 3D ResNet18 model, even though they had fewer trainable parameters than the 3D ResNet18. Fewer trainable parameters can reduce the computational costs as well as the chance of overfitting. The authors hypothesise that the increased non-linearity due to the additional activation functions between the 2D and 1D convolutions in (2+1)D convolutional layers helped the ResNet (2+1)D model to achieve better results than ResNet3D, and that the reduction of trainable parameters while having a similar number of layers, in turn preserving the level of non-linearity, contributed to the success of ResNet Mixed Convolution.

Even though the spatiospatial models performed better, it is worth mentioning that they do not adequately maintain the 3D nature of the data: the spatial relationship between the three dimensions is not preserved within the network as it is in a fully 3D network such as ResNet3D. This is a limitation of this architecture and might have some unforeseen adverse effects. The authors hypothesise that this relationship was indirectly maintained through the channels of the network, and that the network could learn a general representation sufficient to classify appropriately. The experiments have also shown that the spatiospatial models are superior to a fully 3D model for the brain tumour classification problem shown here. Nevertheless, before a common consensus about this finding can be reached, these models should be further evaluated on other tasks.

In this research, the slice dimension in the axial orientation was considered as the “specially-treated” spatial dimension of the spatiospatial models, which can also be seen as the pseudo-temporal dimension of the spatiotemporal models. The authors hypothesise that using the data in sagittal or coronal orientation in a similar way might also make it possible to exploit the advantages of such models, but this is yet to be tested.

It can also be observed that the pre-trained models were the winners for all three different classes. However, the effect of pre-training was not the same on all three models. For both the spatiospatial models, pre-training improved the model’s performance, but in different amounts: 2.24% improvement for ResNet (2+1)D and 8.57% for ResNet Mixed Convolution (based on macro F1 scores). However, pre-training had a negative impact on the 3D ResNet18 model (for two out of three classes), causing it to decrease the macro F1 score by 1.87%. Average macro F1 scores for all the models with and without pre-training (0.9169 with pre-training, 0.8912 without pre-training) show that the pre-training resulted in an overall improvement of 2.88% across models. It is noteworthy that the pre-trained networks were initially trained on RGB videos. Pre-training them on MRI volumes or MR videos (dynamic MRIs) might further improve the performance of the models.
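
The overall figure quoted above is consistent with a relative change in the average macro F1 score, which can be checked directly from the two averages (reading the improvement as relative rather than absolute is an assumption, although the numbers agree with it):

\[
\frac{0.9169 - 0.8912}{0.8912} = \frac{0.0257}{0.8912} \approx 0.0288 = 2.88\%
\]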

Regarding the comparisons to other published works, an interesting point to note is that the previous papers only classified different grades of brain tumours (LGG and HGG), whereas this paper also classified healthy brains as an additional class. Thus, the results are not fully comparable, as more classes increase the difficulty of the task. Even then, the results obtained by the winning model are better than all previously published methods except for one, which reported results comparable to the ResNet Mixed Convolution (that paper reported 0.12% better accuracy and 0.41% lower specificity). However, that paper used four different contrasts and an additional dataset apart from BraTS, giving it a larger dataset for training.

This paper shows that the spatiotemporal models, ResNet(2+1)D and ResNet Mixed Convolution, working as spatiospatial models, could improve the classification of grades of brain tumours (i.e. low-grade and high-grade glioma), as well as the classification of brain images with and without tumours, while reducing the computational costs. A 3D ResNet18 model was used to compare the performance of the spatiospatial models against a pure 3D convolution model. Each of the three models was trained from scratch and also trained using weights from models pre-trained on an action recognition dataset, to compare the effectiveness of pre-training in this setup. The final results were generated using cross-validation with three folds. It was observed that the spatiospatial models performed better than a pure 3D convolutional ResNet18 model, despite having fewer trainable parameters. It can further be observed that pre-training improved the performance of the models. Overall, the pre-trained ResNet Mixed Convolution model was observed to be the best model in terms of F1-score, obtaining a macro F1-score of 0.9345 and a mean test accuracy of 96.98%, while achieving F1-scores of 0.8949 and 0.9123 for low-grade glioma and high-grade glioma, respectively. This study shows that the spatiospatial models have the potential to outperform a fully 3D convolutional model. However, this was only shown here for one specific task, brain tumour classification, using one dataset, BraTS. These models should be compared on other tasks in the future to build a common consensus regarding the spatiospatial models. One limitation of this study is that it only used T1 contrast-enhanced images for classifying the tumours, which already resulted in good accuracy. Incorporating all four available types of images (T1, T1ce, T2, T2-FLAIR) or any combination of them might improve the performance of the model even further.

References

1. Fritz, A. et al. International Classification of Diseases for Oncology Vol. 3 (World Health Organization, Geneva, 2001).
2. Goodenberger, M. L. et al. Genetics of adult glioma. Cancer Genet. 205, 613–621 (2012).
3. Claus, E. B. et al. Survival and low-grade glioma: The emergence of genetic information. Neurosurg. Focus 38, E6 (2015).
4. Raza, S. M. et al. Necrosis and glioblastoma: A friend or a foe? A review and a hypothesis. Neurosurgery 51, 2–13 (2002).
5. Engelhorn, T. et al. Cellular characterization of the peritumoral edema zone in malignant brain tumors. Cancer Sci. 100, 1856–1862 (2009).
6. Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).
7. Rajpurkar, P. et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the chexnext algorithm to practicing radiologists. PLoS Med. 15, e1002686 (2018).
8. Mzoughi, H. et al. Deep multi-scale 3d convolutional neural network (cnn) for mri gliomas brain tumor classification. J. Digit. Imaging 33, 903–915 (2020).
9. Pei, L. et al. Brain tumor classification using 3d convolutional neural network. In International MICCAI Brain lesion Workshop, 335–342 (2019).
10. Ge, C. et al. Deep learning and multi-sensor fusion for glioma classification using multistream 2d convolutional networks. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 5894–5897 (2018).
11. Ouerghi, H. et al. Glioma classification via mr images radiomics analysis. Vis. Comput. 2021, 1–15 (2021).
12. He, K. et al. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
13. Tran, D. et al. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 6450–6459 (2018).
14. Torrey, L. et al. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, 242–264 (2010).
15. Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. IEEE 109, 43–76 (2020).
16. Sarasaen, C. et al. Fine-tuning deep learning model parameters for improved super-resolution of dynamic mri with prior-knowledge. Artif. Intell. Med. 121, 102196 (2021).
17. Pallud, J. et al. Quantitative morphological magnetic resonance imaging follow-up of low-grade glioma: A plea for systematic measurement of growth rates. Neurosurgery 71, 729–740 (2012).
18. Paszke, A. et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026–8037 (2019).
19. Torchvision models. https://pytorch.org/vision/stable/models.html#video-classification . Accessed on 15th December 2021.
20. Kinetics-400 dataset. https://deepmind.com/research/open-source/kinetics . Accessed on 15th December 2021.
21. Micikevicius, P. et al. Mixed precision training. arXiv preprint arXiv:1710.03740 (2017).
22. Nvidia apex. https://github.com/NVIDIA/apex . Accessed on 15th December 2021.
23. Pérez-García, F. et al. Torchio: A python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed. 2021, 106236 (2021).
24. Bakas, S. et al. Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Sci. data 4, 1–13 (2017).
25. Bakas, S. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629 (2018).
26. Ixi dataset. https://brain-development.org/ixi-dataset . Accessed on 15th December 2021.
27. Yang, Y. et al. Glioma grading on conventional mr images: A deep learning study with transfer learning. Front. Neurosci. 12, 804 (2018).
28. Smith, S. M. et al. Advances in functional and structural mr image analysis and implementation as fsl. Neuroimage 23, S208–S219 (2004).
29. Jenkinson, M. et al. FSL. Neuroimage 62, 782–790 (2012).
30. Isensee, F. et al. nnu-net: Self-adapting framework for u-net-based medical image segmentation. arXiv preprint arXiv:1809.10486 (2018).
31. Shahzadi, I. et al. Cnn-lstm: Cascaded framework for brain tumour classification. In 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), 633–637 (IEEE, 2018).
32. Zhuge, Y. et al. Automated glioma grading on conventional mri images using deep convolutional neural networks. Med. Phys. 47, 3044–3053 (2020).
33. Johnson, J. M. et al. Survey on deep learning with class imbalance. J. Big Data 6, 1–54 (2019).


This work was in part conducted within the context of the International Graduate School MEMoRIAL at Otto von Guericke University (OVGU) Magdeburg, Germany, kindly supported by the European Structural and Investment Funds (ESF) under the programme “Sachsen-Anhalt WISSENSCHAFT Internationalisierung” (Project No. ZS/2016/08/80646).

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: Soumick Chatterjee and Faraz Ahmed Nizamani.

Authors and Affiliations

Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Magdeburg, Germany

Soumick Chatterjee & Oliver Speck

Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Magdeburg, Germany

Soumick Chatterjee & Andreas Nürnberger

Faculty of Computer Science, Otto von Guericke University, Magdeburg, Germany

Institute for Medical Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany

Faraz Ahmed Nizamani

Center for Behavioral Brain Sciences, Magdeburg, Germany

Andreas Nürnberger & Oliver Speck

German Center for Neurodegenerative Disease, Magdeburg, Germany

Oliver Speck

Leibniz Institute for Neurobiology, Magdeburg, Germany



S.C. developed the idea and wrote the manuscript. F.N. implemented the idea and performed the experiments. A.N. and O.S. created the concept, designed the work, and finally reviewed the manuscript.

Corresponding author

Correspondence to Soumick Chatterjee .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Chatterjee, S., Nizamani, F.A., Nürnberger, A. et al. Classification of brain tumours in MR images using deep spatiospatial models. Sci Rep 12 , 1505 (2022). https://doi.org/10.1038/s41598-022-05572-6


Received : 28 May 2021

Accepted : 14 January 2022

Published : 27 January 2022

DOI : https://doi.org/10.1038/s41598-022-05572-6


Methods article: Brain Tumor Segmentation and Survival Prediction Using Multimodal MRI Scans With Deep Learning


  • 1 School of Innovation and Entrepreneurship, Southern University of Science and Technology, Shenzhen, China
  • 2 College of Engineering, Peking University, Beijing, China

Gliomas are the most common primary brain malignancies. Accurate and robust tumor segmentation and prediction of patients' overall survival are important for diagnosis, treatment planning and risk factor identification. Here we present a deep learning-based framework for brain tumor segmentation and survival prediction in glioma, using multimodal MRI scans. For tumor segmentation, we use ensembles of three different 3D CNN architectures for robust performance through a majority rule. This approach can effectively reduce model bias and boost performance. For survival prediction, we extract 4,524 radiomic features from segmented tumor regions; a decision tree and cross validation are then used to select potent features. Finally, a random forest model is trained to predict the overall survival of patients. The 2018 MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS) ranked our method 2nd and 5th out of 60+ participating teams for the survival prediction and segmentation tasks, respectively, with a promising 61.0% accuracy on the classification of short-survivors, mid-survivors and long-survivors.

1. Introduction

A brain tumor is a cancerous or noncancerous mass or growth of abnormal cells in the brain. Originating in the glial cells, gliomas are the most common brain tumor ( Ferlay et al., 2010 ). Depending on the pathological evaluation of the tumor, gliomas can be categorized into glioblastoma (GBM/HGG), and lower grade glioma (LGG). Glioblastoma is one of the most aggressive and fatal human brain tumors ( Bleeker et al., 2012 ). Gliomas contain various heterogeneous histological sub-regions, including peritumoral edema, a necrotic core, an enhancing and a non-enhancing tumor core. Magnetic resonance imaging (MRI) is commonly used in radiology to portray the phenotype and intrinsic heterogeneity of gliomas, since multimodal MRI scans, such as T1-weighted, contrast enhanced T1-weighted (T1Gd), T2-weighted, and Fluid Attenuation Inversion Recovery (FLAIR) images, provide complementary profiles for different sub-regions of gliomas. For example, the enhancing tumor sub-region is described by areas that show hyper-intensity in a T1Gd scan when compared to a T1 scan.

Accurate and robust predictions of overall survival, using automated algorithms, for patients diagnosed with gliomas can provide valuable guidance for diagnosis, treatment planning, and outcome prediction ( Liu et al., 2018 ). However, it is difficult to select reliable and potent prognostic features. Medical imaging (e.g., MRI, CT) can provide radiographic phenotype of tumor, and it has been exploited to extract and analyze quantitative imaging features ( Gillies et al., 2016 ). Clinical data, including patient age and resection status, can also provide important information about patients' outcome.

Segmentation of gliomas in pre-operative MRI scans, conventionally done by expert board-certified neuroradiologists, can provide quantitative morphological characterization and measurement of glioma sub-regions. It is also a pre-requisite for survival prediction since most potent features are derived from the tumor region. This quantitative analysis has great potential for diagnosis and research, as it can be used for grade assessment of gliomas and planning of treatment strategies. But this task is challenging due to the high variance in appearance and shape, ambiguous boundaries and imaging artifacts, while automatic segmentation has the advantage of fast speed, consistency in accuracy and immunity to fatigue ( Sharma and Aggarwal, 2010 ). Until now, the automatic segmentation of brain tumors in multimodal MRI scans is still one of the most difficult tasks in medical image analysis. In recent years, deep convolutional neural networks (CNNs) have achieved great success in the field of computer vision. Inspired by the biological structure of visual cortex ( Fukushima, 1980 ), CNNs are artificial neural networks with multiple hidden convolutional layers between the input and output layers. They have non-linear properties and are capable of extracting higher level representative features ( Gu et al., 2018 ). Deep learning methods with CNN have shown excellent results on a wide variety of other medical imaging tasks, including diabetic retinopathy detection ( Gulshan et al., 2016 ), skin cancer classification ( Esteva et al., 2017 ), and brain tumor segmentation ( Çiçek et al., 2016 ; Isensee et al., 2017 ; Wang et al., 2017 ; Sun et al., 2018 ).

In this paper, we present a novel deep learning-based framework for segmentation of a brain tumor and its subregions from multimodal MRI scans, and survival prediction based on radiomic features extracted from segmented tumor sub-regions as well as clinical features. The proposed framework for brain tumor segmentation and survival prediction using multimodal MRI scans consists of the following steps, as illustrated in Figure 1 . First, tumor subregions are segmented using an ensemble model comprising three different convolutional neural network architectures for robust performance through voting (majority rule). Then radiomic features are extracted from tumor sub-regions and total tumor volume. Next, decision tree regression model with gradient boosting is used to fit the training data and rank the importance of features based on variance reduction. Cross validation is used to select the optimal number of top-ranking features to use. Finally, a random forest regression model is used to fit the training data and predict the overall survival of patients.


Figure 1 . Framework overview.

2. Materials and Methods

2.1. Dataset

We utilized the BraTS 2018 dataset ( Menze et al., 2015 ; Bakas et al., 2017a , b , c , 2018 ) to evaluate the performance of our methods. The training set contained images from 285 patients, including 210 HGG and 75 LGG. The validation set contained MRI scans from 66 patients with brain tumors of an unknown grade. It was a predefined set constructed by BraTS challenge organizers. The test set contained images from 191 patients with a brain tumor, in which 77 patients had a resection state of Gross Total Resection (GTR) and were evaluated for survival prediction. Each patient was scanned with four sequences: T1, T1Gd, T2, and FLAIR. All the images were skull-stripped and re-sampled to an isotropic 1 mm³ resolution, and the four sequences of the same patient had been co-registered. The ground truth segmentation masks were obtained from manual segmentations by experts. The evaluation of the model performance on the validation and testing sets was performed on CBICA's Image Processing Portal ( ipp.cbica.upenn.edu ). Segmentation annotations comprise the following tumor subtypes: necrotic/non-enhancing tumor (NCR), peritumoral edema (ED), and Gd-enhancing tumor (ET). Resection status and patient age are also provided. The overall survival (OS) data, defined in days, is also included in the training set. The distribution of patients' age is shown in Figure 2 .


Figure 2 . Overall survival distribution of patients across the training, validation, and testing sets.

2.2. Data Preprocessing

Since the intensity value of MRI is dependent on the imaging protocol and scanner used, we applied intensity normalization to reduce the bias in imaging. More specifically, the mean of the brain region is subtracted from the intensity values of each MRI scan, and the result is divided by the standard deviation of the brain region. In order to reduce overfitting, we applied random flipping and random Gaussian noise to augment the training set.
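The normalization and augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the noise level, the flip probability, and the choice to keep background voxels at zero are all assumptions, and the scan is synthetic.

```python
import numpy as np

def normalize_intensity(volume, brain_mask):
    """Z-score a volume using the mean and std of the voxels inside the brain mask."""
    brain_voxels = volume[brain_mask]
    mean, std = brain_voxels.mean(), brain_voxels.std()
    normalized = (volume - mean) / (std + 1e-8)   # epsilon guards against a constant image
    normalized[~brain_mask] = 0.0                 # assumption: background stays at zero
    return normalized

def augment(volume, rng, noise_std=0.1, flip_prob=0.5):
    """Randomly flip along each axis, then add zero-mean Gaussian noise."""
    for axis in range(volume.ndim):
        if rng.random() < flip_prob:
            volume = np.flip(volume, axis=axis)
    return volume + rng.normal(0.0, noise_std, size=volume.shape)

rng = np.random.default_rng(0)
scan = rng.uniform(50.0, 200.0, size=(8, 8, 8))   # synthetic stand-in for one MRI sequence
mask = np.zeros(scan.shape, dtype=bool)
mask[2:6, 2:6, 2:6] = True                        # hypothetical brain region
scan[~mask] = 0.0                                 # skull-stripped background

norm = normalize_intensity(scan, mask)
aug = augment(norm, rng)
```

When flipping is used for segmentation training, the same flip would also have to be applied to the label volume so that image and mask stay aligned.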

2.3. Network Architecture

In order to perform accurate and robust brain tumor segmentation, we use an ensemble model comprising three different convolutional neural network architectures. A variety of models have been proposed for tumor segmentation. Generally, they differ in model depth, number of filters, connectivity, and other design choices. Different model architectures can lead to different model performance and behavior. By training different kinds of models separately and merging the results, the model variance can be decreased and the overall performance can be improved ( Polikar, 2006 ; Kamnitsas et al., 2017 ). We used three different CNN models and fused the results by voting (majority rule). Each model is described in detail in the following sections.

2.3.1. CA-CNN

The first network we employed was Cascaded Anisotropic Convolutional Neural Network (CA-CNN) proposed by Wang et al. (2017) . The cascade is used to convert multi-class segmentation problem into a sequence of three hierarchical binary segmentation problems. The network is illustrated in Figure 3 .


Figure 3 . Cascaded framework and architecture of CA-CNN.

This architecture also employs anisotropic and dilated convolution filters, which are combined with multi-view fusions to reduce false positives. It also employs residual connections ( He et al., 2016 ), batch normalization ( Ioffe and Szegedy, 2015 ) and multi-scale prediction to boost the performance of segmentation. For implementation, we trained the CA-CNN model using the Adam optimizer ( Kingma and Ba, 2014 ) and set the Dice coefficient ( Milletari et al., 2016 ) as the loss function. We set the initial learning rate to 1 × 10⁻³, the weight decay to 1 × 10⁻⁷, the batch size to 5, and the maximum number of iterations to 30k.

2.3.2. DFKZ Net

The second network we employed was DFKZ Net, which was proposed by Isensee et al. (2017) from the German Cancer Research Center (DFKZ). Inspired by U-Net, DFKZ Net employs a context encoding pathway that extracts increasingly abstract representations of the input, and a decoding pathway used to recombine these representations with shallower features to precisely segment the structure of interest. The context encoding pathway consists of three content modules, each with two 3 × 3 × 3 convolutional layers and a dropout layer with a residual connection. The decoding pathway consists of three localization modules, each containing 3 × 3 × 3 convolutional layers followed by a 1 × 1 × 1 convolutional layer. In the decoding pathway, the outputs of layers at different depths are integrated by elementwise summation, so that supervision can be injected deep in the network. The network is illustrated in Figure 4 .


Figure 4 . Architecture of DFKZ Net.

For implementation, we trained the network using the Adam optimizer. To address the problem of class imbalance, we utilized the multi-class Dice loss function ( Isensee et al., 2017 ):
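Written with the symbols defined in the next sentence, and with the inner sums running over the voxels i of a patch, this loss is commonly stated in the following form. It is given here as a sketch consistent with the cited paper, not as a verbatim copy of the authors' equation:

```latex
L_{dc} = -\frac{2}{K} \sum_{k=1}^{K}
         \frac{\sum_{i} u_{i}(k)\, v_{i}(k)}
              {\sum_{i} u_{i}(k) + \sum_{i} v_{i}(k)}
```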

where u denotes the output probability, v denotes the one-hot encoding of the ground truth, k denotes the class, K denotes the total number of classes and i ( k ) denotes the number of voxels for class k in the patch. We set the initial learning rate to 5 × 10⁻⁴ and used instance normalization ( Ulyanov et al., 2016a ). We trained the model for 90 epochs.
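A multi-class Dice loss of this kind averages, over the classes, the ratio between the overlap of prediction and ground truth and their combined mass, then negates it so that better overlap gives a lower loss. The NumPy sketch below illustrates that idea under stated assumptions: the array layout (classes by voxels), the small epsilon, and the function name are hypothetical, and a real training setup would compute this inside a deep learning framework.

```python
import numpy as np

def multiclass_dice_loss(u, v, eps=1e-8):
    """u: predicted class probabilities, v: one-hot ground truth, both shaped (K, n_voxels)."""
    intersection = (u * v).sum(axis=1)            # per-class overlap
    denominator = u.sum(axis=1) + v.sum(axis=1)   # per-class combined mass
    per_class = intersection / (denominator + eps)
    return -2.0 * per_class.mean()                # mean over the K classes

labels = np.array([0, 0, 1, 2, 2, 2])             # toy ground truth for six voxels
v = np.eye(3)[labels].T                           # one-hot encoding, shape (3, 6)

perfect = multiclass_dice_loss(v, v)                      # prediction equals ground truth
uniform = multiclass_dice_loss(np.full_like(v, 1 / 3), v) # uninformative prediction
```

Because each class contributes equally to the mean regardless of how many voxels it has, small classes are not swamped by large ones, which is why this loss is used against class imbalance.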

2.3.3. 3D U-Net

U-Net ( Ronneberger et al., 2015 ; Çiçek et al., 2016 ) is a classical network for biomedical image segmentation. It consists of a contracting path to capture context and a symmetric expanding path that enables precise localization with extension. Each pathway has three convolutional layers with dropout and pooling. The contracting pathway and expanding pathway are linked by skip-connections. Each layer contains 3 × 3 × 3 convolutional kernels. The first convolutional layer has 32 filters, while each deeper layer contains twice as many filters as the preceding shallower layer.

For implementation, we used the Adam optimizer ( Kingma and Ba, 2015 ) and instance normalization ( Ulyanov et al., 2016b ). In addition, we utilized cross entropy as the loss function. The initial learning rate was 0.001, and the model was trained for 4 epochs.

2.3.4. Ensemble of Models

In order to enhance segmentation performance and to reduce model variance, we used the voting strategy (majority rule) to build an ensemble model without using a weighted scheme. During the training process, different models were trained independently. The number of training iterations was selected based on each model's performance on the validation set. In the testing stage, each model independently predicts the class for each voxel, and the final class is determined by the majority rule.
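Unweighted voxel-wise majority voting can be sketched as below. The label values and tiny 2 × 2 "volumes" are illustrative only, and the tie-breaking rule (the lowest label wins when all three models disagree) is an assumption, since the text does not say how ties are resolved.

```python
import numpy as np

def majority_vote(predictions, num_classes):
    """Fuse integer label maps of identical shape, one per model, by unweighted voting."""
    stacked = np.stack(predictions)                       # (n_models, ...)
    # Count, for every voxel, how many models voted for each class.
    votes = np.stack([(stacked == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)                           # ties resolve to the lowest label (assumption)

# Three hypothetical model outputs for the same four voxels.
model_a = np.array([[0, 1], [2, 3]])
model_b = np.array([[0, 1], [1, 3]])
model_c = np.array([[0, 2], [1, 0]])

fused = majority_vote([model_a, model_b, model_c], num_classes=4)
```

The same function works unchanged on 3D label volumes, because the voting is done independently at every voxel position.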

2.4. Feature Extraction

Quantitative phenotypic features from MRI scans can reveal the characteristics of brain tumors. Based on the segmentation result, we extract radiomics features from the edema, the non-enhancing solid core, the necrotic/cystic core, and the whole tumor region, respectively, using the PyRadiomics toolbox ( Van Griethuysen et al., 2017 ). An illustration of feature extraction is shown in Figure 5 .


Figure 5 . Illustration of feature extraction.

The modality used for feature extraction is dependent on the intrinsic properties of the tumor subregion. For example, edema features are extracted from FLAIR modality, since it is typically depicted by hyper-intense signal in FLAIR. Non-enhancing solid core features are extracted from T1Gd modality, since the appearance of the necrotic (NCR) and the non-enhancing (NET) tumor core is typically hypo-intense in T1Gd when compared to T1. Necrotic/cystic core tumor features are extracted from T1Gd modality, since it is described by areas that show hyper-intensity in T1Gd when compared to T1.

The features we extracted can be grouped into three categories. The first category is the first order statistics, which includes maximum intensity, minimum intensity, mean, median, 10th percentile, 90th percentile, standard deviation, variance of intensity value, energy, entropy, and others. These features characterize the gray level intensity of the tumor region.

The second category is shape features, which include volume, surface area, surface area to volume ratio, maximum 3D diameter, maximum 2D diameter for axial, coronal and sagittal plane respectively, major axis length, minor axis length and least axis length, sphericity, elongation, and other features. These features characterize the shape of the tumor region.

The third category is texture features, which include 22 gray level co-occurrence matrix (GLCM) features, 16 gray level run length matrix (GLRLM) features, 16 gray level size zone matrix (GLSZM) features, five neighboring gray tone difference matrix (NGTDM) features and 14 gray level dependence matrix (GLDM) features. These features characterize the texture of the tumor region.

We extract features not only from the original images but also from Laplacian of Gaussian (LoG) filtered images and from images generated by wavelet decomposition, because LoG filtering can enhance image edges, and possibly the boundary of the tumor, while wavelet decomposition can separate images into multiple levels of detail components (finer or coarser). More specifically, 1131 features are extracted from each region: 99 features from the original image, 344 features from LoG filtered images (we used four filters with sigma values of 2.0, 3.0, 4.0, and 5.0), and 688 features from eight wavelet decomposed images (all possible combinations of applying either a high-pass or a low-pass filter in each of the three dimensions). In total, we extracted 1131 × 4 = 4524 radiomic features for each patient. These features are combined with clinical data (age and resection state) for survival prediction. The values of all features except resection state are normalized by subtracting the mean and scaling to unit variance.
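The feature counts above can be checked with a few lines of arithmetic. The sketch below only re-derives the totals stated in the text; the per-image figures it computes (features per LoG image and per wavelet sub-band) follow from dividing those totals and are not given explicitly by the authors.

```python
from itertools import product

n_regions = 4                       # edema, non-enhancing solid core, necrotic/cystic core, whole tumor
log_sigmas = [2.0, 3.0, 4.0, 5.0]   # one LoG filtered image per sigma
# Every combination of low-pass (L) or high-pass (H) filtering along three dimensions.
wavelet_bands = ["".join(p) for p in product("LH", repeat=3)]

original_features = 99
log_features = 344
wavelet_features = 688

per_log_image = log_features // len(log_sigmas)
per_wavelet_band = wavelet_features // len(wavelet_bands)
per_region = original_features + log_features + wavelet_features
per_patient = per_region * n_regions
```

Running this gives eight wavelet sub-bands (LLL through HHH), 86 features per filtered image in both filter families, 1131 features per region, and 4524 features per patient.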

2.5. Feature Selection

A portion of the features we extracted were redundant or irrelevant to survival prediction. In order to enhance performance and reduce overfitting, we applied feature selection to select a subset of features that have the most predictive power. Feature selection is divided into two steps: importance ranking and cross validation. We ranked the importance of features by fitting a decision tree regressor with gradient boosting to the training data; the importance of a feature is determined by how effectively it reduces the intra-node standard deviation in leaf nodes. The second step is to select the optimal number of best features for prediction by cross validation. In the end, we selected 14 features; their importances are listed in Table 1 . The detailed feature definitions can be found at ( https://pyradiomics.readthedocs.io/en/latest/features.html ), last accessed on 30 June 2018.
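The two-step selection can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the authors' pipeline: the data are synthetic, the candidate feature counts, the number of folds, the scoring metric, and the use of a random forest inside the cross-validation loop are all hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))      # synthetic stand-in for normalized features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Step 1: rank features by importance using gradient-boosted regression trees.
ranker = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(ranker.feature_importances_)[::-1]   # most important first

# Step 2: pick the number of top-ranked features by cross-validation.
def cv_score(k):
    model = RandomForestRegressor(n_estimators=20, random_state=0)
    scores = cross_val_score(model, X[:, ranking[:k]], y, cv=5,
                             scoring="neg_mean_squared_error")
    return scores.mean()

candidates = [1, 2, 5, 10, 20]
best_k = max(candidates, key=cv_score)
selected = ranking[:best_k]
```

In this toy problem only the first two columns carry signal, so they should appear at the top of the ranking; on real radiomic features the ranking is far less clear-cut, which is why the cut-off is chosen by validation rather than by eye.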


Table 1 . Selected most predictive features (WT, edema; TC, tumor core; ET, enhancing tumor; FULL, full tumor volume comprised of edema, tumor core, and enhancing tumor; N/A, not applicable).

Unsurprisingly, age had the most predictive power among all of the features. The rest of the features selected came from both original images and derived images. We also found that most features selected came from images generated by wavelet decomposition.

2.6. Survival Prediction

Based on the 14 selected features, we trained a random forest regression model ( Ho, 1995 ) for final survival prediction. The random forest regressor is a meta regressor of 100 base decision tree regressors. Each base regressor is trained on a bootstrapped sub-dataset in order to introduce randomness and diversity. Finally, the predictions from the base regressors are averaged to improve prediction accuracy and robustness and to suppress overfitting. Mean squared error is used as the loss function when constructing each individual regression model.
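A forest of this shape can be sketched with scikit-learn's `RandomForestRegressor`. The data below are synthetic and the variable names are hypothetical; the sketch only illustrates the structure described in the text (100 bootstrapped trees whose predictions are averaged) and makes no claim about the authors' remaining hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 14))     # 14 selected features, synthetic values
y_train = 400 + 100 * X_train[:, 0] + rng.normal(scale=20, size=100)   # survival in days
X_new = rng.normal(size=(5, 14))

# 100 trees, each fitted on a bootstrap sample; the default split criterion is squared error.
forest = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# The forest prediction is the mean of the individual tree predictions.
tree_preds = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_mean = tree_preds.mean(axis=0)
forest_pred = forest.predict(X_new)
```

Comparing `manual_mean` with `forest_pred` makes the averaging step explicit: the two agree up to floating-point rounding.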

3. Results

3.1. Result of Tumor Segmentation

We trained the model using the 2018 MICCAI BraTS training set using the methods described above. We then applied the trained model for prediction on the validation and test set. We compared the segmentation result of the ensemble model with the individual model on the validation set. The evaluation result of our approach is shown in Table 2 . For other teams' performance, please see the BraTS summarizing paper ( Bakas et al., 2018 ). The result demonstrates that the ensemble model performs better than individual models in enhancing tumor and whole tumor, while CA-CNN performs marginally better on the tumor core.


Table 2 . Evaluation result of ensemble model and individual models.

The predicted segmentation labels are uploaded to the CBICA's Image Processing Portal (IPP) for evaluation. BraTS Challenge uses two schemes for evaluation: Dice score and the Hausdorff distance (95th percentile). Dice score is a widely used overlap measure for pairwise comparison of segmentation mask S and G . It can be expressed in terms of set operations:
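In standard notation, with |·| denoting the number of voxels in a mask, the Dice score is usually written as:

```latex
\mathrm{Dice}(S, G) = \frac{2\,|S \cap G|}{|S| + |G|}
```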

Hausdorff distance is the maximum distance of a set to the nearest point in the other set, defined as:
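A standard way to write the symmetric Hausdorff distance between S and G, assuming d(s, g) is the Euclidean distance between two points, is:

```latex
H(S, G) = \max\left\{
    \sup_{s \in S} \inf_{g \in G} d(s, g),\;
    \sup_{g \in G} \inf_{s \in S} d(s, g)
\right\}
```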

where sup represents the supremum and inf the infimum. In order to have more robust results and to avoid issues with noisy segmentation, the evaluation scheme uses the 95th percentile.
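Both metrics can be illustrated on small binary masks. The sketch below is a simplified stand-in for the portal's evaluation code, which is not described in the text: it measures distances between all foreground voxels by brute force (evaluation tools commonly restrict this to surface voxels), assumes unit voxel spacing, and does not handle empty masks in the distance computation.

```python
import numpy as np

def dice_score(s, g):
    """Overlap between two binary masks: 2|S ∩ G| / (|S| + |G|)."""
    s, g = s.astype(bool), g.astype(bool)
    total = s.sum() + g.sum()
    return 2.0 * np.logical_and(s, g).sum() / total if total else 1.0

def hausdorff(s, g, percentile=100):
    """Symmetric Hausdorff distance; percentile=95 gives the robust variant."""
    ps, pg = np.argwhere(s), np.argwhere(g)               # foreground coordinates
    d = np.linalg.norm(ps[:, None, :] - pg[None, :, :], axis=-1)
    s_to_g = d.min(axis=1)                                # nearest G point for each S point
    g_to_s = d.min(axis=0)                                # nearest S point for each G point
    return max(np.percentile(s_to_g, percentile), np.percentile(g_to_s, percentile))

s = np.zeros((10, 10), dtype=bool)
g = np.zeros((10, 10), dtype=bool)
s[2:6, 2:6] = True          # predicted mask: a 4 x 4 square
g[3:7, 2:6] = True          # ground truth: the same square shifted down by one row

dice = dice_score(s, g)
hd = hausdorff(s, g)
hd95 = hausdorff(s, g, percentile=95)
```

Here the masks share 12 of their 16 voxels each, so the Dice score is 0.75, and every unmatched voxel is one row away from the other mask, so both distance variants equal 1.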

In the test phase, our result ranked 5th out of 60+ teams. The evaluation result of the segmentation on the validation and test set are listed in Table 3 . Examples of the segmentation result compared with ground truth are shown in Figure 6 .


Table 3 . Evaluation result of ensemble model for segmentation.


Figure 6 . Examples of segmentation result compared with ground truth. Image ID: TCIA04_343_1. Green: edema; Yellow: non-enhancing solid core; Red: enhancing core.

3.2. Result of Survival Prediction

Based on the segmentation result of brain tumor subregions, we extracted features from brain tumor sub-regions segmented from MRI scans and trained the survival prediction model as described above. We then used the model to predict patient's overall survival on the validation and test set. The predicted overall survival was uploaded to the IPP for evaluation. We used two schemes for evaluation: classification of subjects as long-survivors (>15 months), short-survivors (<10 months), and mid-survivors (between 10 and 15 months) and median error (in days). In the test phase, we ranked second out of 60+ teams. The evaluation results of our method are listed in Table 4 . For other teams' performance, please see the BraTS summarizing paper ( Bakas et al., 2018 ).
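The two evaluation schemes can be sketched as below. The survival values are invented for illustration, the conversion of 30 days per month is an assumption (the text states the thresholds in months but reports survival in days), and the median error is computed here as the median absolute difference, which is one plausible reading of the text.

```python
import numpy as np

DAYS_PER_MONTH = 30   # assumption: the text does not state the conversion used

def survival_class(days):
    """Map overall survival in days to the three classes used for evaluation."""
    months = days / DAYS_PER_MONTH
    if months < 10:
        return "short"
    if months > 15:
        return "long"
    return "mid"

true_days = np.array([150, 320, 460, 700, 290])   # hypothetical ground truth
pred_days = np.array([200, 280, 500, 430, 310])   # hypothetical regression output

true_cls = [survival_class(d) for d in true_days]
pred_cls = [survival_class(d) for d in pred_days]

accuracy = np.mean([t == p for t, p in zip(true_cls, pred_cls)])
median_error = np.median(np.abs(true_days - pred_days))
```

The example shows why the two numbers can diverge: a regression output a few days on the wrong side of a threshold counts as a full classification error while contributing little to the median error.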


Table 4 . Evaluation result of survival prediction.

4. Discussion

In this paper, we present an automatic framework for the prediction of survival in glioma using multimodal MRI scans and clinical features. First, a deep convolutional neural network is used to segment a tumor region from MRI scans, then radiomics features are extracted and combined with clinical features to predict overall survival. For tumor segmentation, we used ensembles of three different 3D CNN architectures for robust performance through voting (majority rule). The evaluation results show that the ensemble model performs better than individual models, which indicates that the ensemble approach can effectively reduce model bias and boost performance. Although the Dice score for segmentation is promising, we noticed that the specificity of the model is much higher than the sensitivity, indicating an under-segmentation of the model. For survival prediction, we extracted shape features, first order statistics, and texture features from segmented tumor sub-region, then used a decision tree and cross validation to select features. Finally, a random forest model was trained to predict the overall survival of patients. The accuracy for three-class classification is 61.0%, which still leaves room for improvement. Part of the reason is that we only had a very limited number of samples (285 patients) to train the regression model. In addition, imaging and limited clinical features may only explain patients' survival outcome partially, too. In the future, we will explore different network architectures and training strategies to further improve our result. We will also design new features and optimize our feature selection methods for survival prediction.

Data Availability

The datasets analyzed for this study can be found in the BraTS 2018 dataset https://www.med.upenn.edu/sbia/brats2018/data.html .

Author Contributions

LS and SZ performed the analysis and prepared the manuscript. HC helped with the analysis. LL conceived the project, supervised and funded the study, and prepared the manuscript.

Financial support from the Shenzhen Science and Technology Innovation (SZSTI) Commission (JCYJ20180507181527806 and JCYJ20170817105131701) is gratefully acknowledged.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., et al. (2017a). Segmentation labels and radiomic features for the pre-operative scans of the tcga-gbm collection. Cancer Imaging Arch. 286. doi: 10.7937/K9/TCIA.2017.KLXWJJ1Q

Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., et al. (2017b). Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection. Cancer Imaging Arch. 286. doi: 10.7937/K9/TCIA.2017.GJQ7R0EF

Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J. S., et al. (2017c). Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Sci. Data 4:170117. doi: 10.1038/sdata.2017.117

Bakas, S., Reyes, M., et Int, and Menze, B. (2018). Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv arXiv:1811.02629.

Bleeker, F. E., Molenaar, R. J., and Leenstra, S. (2012). Recent advances in the molecular understanding of glioblastoma. J. Neuro Oncol. 108, 11–27. doi: 10.1007/s11060-011-0793-0

Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., and Ronneberger, O. (2016). “3d u-net: learning dense volumetric segmentation from sparse annotation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 , eds S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells (Cham: Springer International Publishing), 424–432.

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., and Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118. doi: 10.1038/nature21056

Ferlay, J., Shin, H.-R., Bray, F., Forman, D., Mathers, C., and Parkin, D. M. (2010). Estimates of worldwide burden of cancer in 2008: Globocan 2008. Int. J. Cancer 127, 2893–2917. doi: 10.1002/ijc.25516

Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernet. 36, 193–202.

Gillies, R. J., Kinahan, P. E., and Hricak, H. (2016). Radiomics: images are more than pictures, they are data. Radiology 278, 563–577. doi: 10.1148/radiol.2015151169

Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Patt. Recogn. 77, 354–377. doi: 10.1016/j.patcog.2017.10.013

Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410. doi: 10.1001/jama.2016.17216

He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Beijing), 770–778.

Ho, T. K. (1995). “Random decision forests,” in Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1 , ICDAR '95 (Washington, DC: IEEE Computer Society), 278.

Ioffe, S., and Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv arXiv:1502.03167.

Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., and Maier-Hein, K. H. (2017). “Brain tumor segmentation and radiomics survival prediction: Contribution to the brats 2017 challenge,” in International MICCAI Brainlesion Workshop (Springer), 287–297.

Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., et al. (2017). “Ensembles of multiple models and architectures for robust brain tumour segmentation,” in International MICCAI Brainlesion Workshop (London: Springer), 450–462.

Kingma, D. P., and Ba, J. (2014). Adam: a method for stochastic optimization. arXiv arXiv:1412.6980.

Kingma, D. P., and Ba, J. (2015). “Adam: a method for stochastic optimization,” in International Conference on Learning Representations (Amsterdam).

Liu, L., Zhang, H., Wu, J., Yu, Z., Chen, X., Rekik, I., et al. (2018). Overall survival time prediction for high-grade glioma patients based on large-scale brain functional networks. Brain Imaging Behav . 1–19. doi: 10.1007/s11682-018-9949-2

Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., et al. (2015). The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imag. 34:1993–2024. doi: 10.1109/TMI.2014.2377694

Milletari, F., Navab, N., and Ahmadi, S.-A. (2016). “V-net: fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV) , (Munich: IEEE) 565–571.

Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circ. Syst. Magaz. 6, 21–45. doi: 10.1109/MCAS.2006.1688199

Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-net: convolutional networks for biomedical image segmentation,” in International Conference on Medical Image computing and Computer-Assisted Intervention (Freiburg: Springer), 234–241.

Sharma, N., and Aggarwal, L. M. (2010). Automated medical image segmentation techniques. J. Med. Phys. Assoc. Med. Phys. India 35:3–14. doi: 10.4103/0971-6203.58777

Sun, L., Zhang, S., and Luo, L. (2018). “Tumor segmentation and survival prediction in glioma with deep learning,” in International MICCAI Brainlesion Workshop (Shenzhen: Springer), 83–93.

Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2016a). Instance normalization: The missing ingredient for fast stylization. arXiv arXiv:1607.08022.

Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2016b). Instance normalization: the missing ingredient for fast stylization. arXiv arXiv:1607.08022.

Van Griethuysen, J. J. M., Fedorov, A., Parmar, C., Hosny, A., Aucoin, N., Narayan, V., et al. (2017). Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77, e104–e107. doi: 10.1158/0008-5472.CAN-17-0339

Wang, G., Li, W., Ourselin, S., and Vercauteren, T. (2017). “Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks,” in International MICCAI Brainlesion Workshop (London: Springer), 178–190.

Keywords: survival prediction, brain tumor segmentation, 3D CNN, multimodal MRI, deep learning

Citation: Sun L, Zhang S, Chen H and Luo L (2019) Brain Tumor Segmentation and Survival Prediction Using Multimodal MRI Scans With Deep Learning. Front. Neurosci. 13:810. doi: 10.3389/fnins.2019.00810

Received: 26 April 2019; Accepted: 22 July 2019; Published: 16 August 2019.

Copyright © 2019 Sun, Zhang, Chen and Luo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lin Luo, luol@pku.edu.cn

This article is part of the Research Topic

Multimodal Brain Tumor Segmentation and Beyond

brain tumor research paper

Brain Tumor Pathology

Journal information.

  • Takashi Komori

Journal metrics

Latest issue.

Issue 4, October 2023

Latest articles

Touch imprint cytology is useful for the intraoperative pathological diagnosis of PitNETs’ surgical margins

Authors (first, second and last of 10).

  • Noriaki Tanabe
  • Naoko Inoshita
  • Shozo Yamada
  • Content type: Original Article
  • Open Access
  • Published: 06 October 2023
  • Pages: 215 - 221

Intraventricular central neurocytoma molecularly defined as extraventricular neurocytoma: a case representing the discrepancy between clinicopathological and molecular classifications

Authors (first, second and last of 7).

  • Daisuke Sato
  • Hirokazu Takami
  • Nobuhito Saito
  • Content type: Case Report
  • Published: 11 September 2023
  • Pages: 230 - 234

Diffuse paediatric-type high-grade glioma, H3-wildtype and IDH-wildtype: case series of a new entity

Authors (first, second and last of 8).

  • Katja Bender
  • Johannes Kahn
  • Published: 10 August 2023
  • Pages: 204 - 214

Integrated analysis of multiple methods reveals characteristics of the immune microenvironment in medulloblastoma

Authors (first, second and last of 4).

  • Published: 09 August 2023
  • Pages: 191 - 203

Spontaneous malignant transformation of trigeminal schwannoma: consideration of responsible gene alterations for tumorigenesis—a case report

Authors (first, second and last of 13).

  • Natsuki Ogasawara
  • Shinji Yamashita
  • Hideo Takeshima
  • Published: 29 July 2023
  • Pages: 222 - 229

Societies, partners and affiliations


For authors

Working on a manuscript.

Avoid the most common mistakes and prepare your manuscript for journal editors.

About this journal

  • Chemical Abstracts Service (CAS)
  • Current Contents/Clinical Medicine
  • EBSCO Discovery Service
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Journal Citation Reports/Science Edition
  • OCLC WorldCat Discovery Service
  • ProQuest-ExLibris Primo
  • ProQuest-ExLibris Summon
  • Science Citation Index Expanded (SCIE)
  • Semantic Scholar
  • TD Net Discovery Service
  • UGC-CARE List (India)

Rights and permissions

Springer policies

© The Japan Society of Brain Tumor Pathology

Open Access


Research Article

Brain tumor detection and segmentation: Interactive framework with a visual interface and feedback facility for dynamically improved accuracy and trust

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliation Department of Computer Science, University of Calgary, Alberta, Canada

Roles Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – review & editing

Affiliation Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Writing – review & editing

* E-mail: [email protected]

Affiliation International School of Medicine, Istanbul Medipol University, Istanbul, Turkey

Roles Conceptualization, Investigation, Methodology, Project administration, Supervision, Validation, Writing – review & editing

Affiliation Department of Computer Engineering, Ankara Medipol University, Ankara, Turkey

Roles Conceptualization, Project administration, Supervision, Writing – review & editing

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing

Affiliations Department of Computer Science, University of Calgary, Alberta, Canada, Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey, Department of Health Informatics, University of Southern Denmark, Odense, Denmark

  • Kashfia Sailunaz, 
  • Deniz Bestepe, 
  • Sleiman Alhajj, 
  • Tansel Özyer, 
  • Jon Rokne, 
  • Reda Alhajj


  • Published: April 17, 2023
  • https://doi.org/10.1371/journal.pone.0284418

Brain cancers caused by malignant brain tumors are among the most fatal cancer types, with a low survival rate mostly due to the difficulties in early detection. Medical professionals therefore use various invasive and non-invasive methods for detecting brain tumors at the earlier stages, thus enabling early treatment. The main non-invasive methods for brain tumor diagnosis and assessment are brain imaging techniques such as computed tomography (CT), positron emission tomography (PET) and magnetic resonance imaging (MRI) scans. In this paper, the focus is on detection and segmentation of brain tumors from 2D and 3D brain MRIs. For this purpose, a complete automated system with a web application user interface is described which detects and segments brain tumors with more than 90% accuracy and Dice scores. The user can upload brain MRIs or can access brain images from hospital databases to check the presence or absence of a brain tumor, to check the existence of a brain tumor from brain MRI features and to extract the tumor region precisely from the brain MRI using deep neural networks like CNN, U-Net and U-Net++. The web application also provides an option for entering feedback on the results of the detection and segmentation, allowing healthcare professionals to add more precise information on the results that can be used to train the model for better future predictions and segmentations.

Citation: Sailunaz K, Bestepe D, Alhajj S, Özyer T, Rokne J, Alhajj R (2023) Brain tumor detection and segmentation: Interactive framework with a visual interface and feedback facility for dynamically improved accuracy and trust. PLoS ONE 18(4): e0284418. https://doi.org/10.1371/journal.pone.0284418

Editor: Gulistan Raja, University of Engineering & Technology, Taxila, PAKISTAN

Received: January 2, 2023; Accepted: March 30, 2023; Published: April 17, 2023

Copyright: © 2023 Sailunaz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Experiment data files. 2022. Available from: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection https://figshare.com/articles/dataset/brain_tumor_dataset/1512427 https://www.kaggle.com/datasets/sanglequang/brats2018?select=MICCAI_BraTS_2018_Data_Training .

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.


The brain is one of the major organs of the human body. It controls most of the nervous system and is responsible for managing most of the functions of the body [ 1 ]. The brain weighs about 3 pounds and contains soft tissues, fat, protein, carbohydrate, water and salts [ 2 ]. The soft tissues (i.e. gray matter and white matter) contain neurons, blood vessels and other cells. The gray matter is the darker outer part of the brain and the white matter is the lighter inner part. This arrangement is reversed in the other major organ of the nervous system, the spinal cord. A tumor, “an abnormal mass of tissue” [ 3 ], may occur in the brain due to a deviation in the regular cell life cycle, in cell growth, or in both. In the normal life cycle, a cell grows, divides into two cells and eventually dies. When this cycle is disrupted, some cells divide uncontrollably and, if they do not die, they form a mass, which is the tumor. Tumors can be benign (i.e. non-cancerous) or malignant (i.e. cancerous). Benign tumors do not invade nearby tissues, nor do they spread to other organs or parts of the body. Malignant tumors may, however, spread to other organs and invade nearby tissues. Cancer is the disease caused by such malignant tumors [ 4 ].

Brain tumors are tumors that start in the brain or in the spinal cord [ 5 ]. They are called primary brain tumors if the tumor originates in the brain or spinal cord. If the tumor originates in another part or organ of the body and then spreads to the brain, it is called a secondary brain tumor or brain metastasis. Fig 1 shows a sample of normal and abnormal cell growth for a brain tumor. Brain cancer, independent of how it originated, is one of the 10 deadliest cancers, with a quite low 5-year relative survival rate of 32.5% [ 6 ]. Worldwide, 308,102 new brain cancer cases were documented in 2020 and 251,329 people died of brain cancer that year [ 7 ]. More recent 2022 statistics for the USA show that 700,000 people there are already suffering from brain cancer, with 88,970 new primary brain tumor cases diagnosed and an estimated 18,200 deaths due to malignant tumors, for a relative survival rate of only 36% [ 8 , 9 ].




A brain tumor diagnosis includes different types of physical exams, blood tests, urine tests, medical images, biopsies and spinal taps [ 10 ]. Medical images are very popular non-invasive diagnosis tools that may include computed tomography (CT), positron emission tomography (PET) and magnetic resonance imaging (MRI) scans. CT scan images are generated by combining X-rays taken from different angles to create a 3D view of an organ, and they can also show fluids (e.g., bleeding, blood vessels etc.) and bone structures, whereas PET scans take pictures of organs and tissues with the help of various, mostly injected, substances. MRI uses magnetic fields to generate detailed images of organs and tissues, and it can also provide information about brain functionalities, chemical composition, blood flow etc. [ 11 ]. Although each type of imaging has its own benefits, MRI images are generally preferred for brain tumor imaging as they are less risky and produce clearer images. The different tissues of the brain are shown with various contrasts in different MRI modalities according to imaging parameters such as echo time, repetition time and flip angle. The four most common modalities for brain MRIs are T1-weighted (T1), T1-weighted contrast-enhanced (T1ce), T2-weighted (T2), and fluid attenuated inversion recovery (FLAIR) [ 12 ]. Generally, there are three anatomical positions for MRIs (axial, coronal and sagittal) [ 13 , 14 ], corresponding to three different plane cross sections.

Medical professionals can assess the brain tumor, tumor location, tumor size, tumor area and other tumor properties from MRI images. Researchers have, however, been trying to automate those tasks due to the high cost of having them done by medical professionals. Initially, various conventional methods like thresholding/filtering, morphology-based models, geometry-based models, contouring, region-based models etc. were used for brain tumor detection and segmentation in automated brain tumor image analysis [ 15 ]. Once machine learning (ML) models became popular and showed higher efficiency in classification and image analysis tasks, researchers started to focus more on ML-based tumor detection and segmentation models using supervised, unsupervised and hybrid models [ 16 ]. With the emergence of more advanced artificial neural networks (ANN), deep neural networks (DNN) and deep learning (DL) models became more popular for brain medical image analysis due to the high performance and accuracy of their outputs [ 17 ]. More recent transfer learning (TL) models and hybrid or ensemble models have also become quite popular in this research field [ 18 ].

Medical image analysis aims at detecting abnormalities in images and then extracting the abnormal region from the images. The first task in brain tumor analysis from brain medical images is therefore called ‘brain tumor detection’. The task is to detect whether or not a brain tumor is present in a brain medical image [ 19 ]. It can also be represented as an image classification problem where the input image is classified as either a healthy/non-tumor image or a tumor image. The second major task in brain medical image analysis is ‘brain tumor segmentation’. Once a tumor has been identified in the brain medical images, the next task is to divide the image into multiple segments or objects based on the similarity and dissimilarity between the different regions of the image. Brain tumor segmentation therefore focuses on segmenting or extracting only the tumor regions from the rest of the image for further analysis of tumor properties [ 20 ]. In this paper, a novel automated system to detect and segment brain tumors from brain medical images is proposed using DL-based approaches.

In our effort to serve healthcare professionals who deal with various types of tumors and infections, we initiated a project to develop a system with a visual interface for each type of disease that is capable of identifying infected spots within an image (e.g., MRI, X-Ray, CT, etc.). Our target is a decision support system that increases the confidence of both experienced and new professionals in assessing MRI images. In this regard, we have already developed an effective visual interface with feedback capabilities for COVID-19 [ 21 ]. In this paper, we describe an automated system for brain tumor detection and segmentation from brain medical images. It is a web application which will help a healthcare professional with an initial screening of the images. The web application based system is able to provide a decision about the presence or absence of a brain tumor in a brain medical image, including a detection probability, and to provide a segmentation of the tumor from the brain image with a tumor confidence score, tumor area score and tumor ratio score. Fig 2 shows the architecture of the proposed system. The proposed model provides a web application to which the user can upload patient data or which can directly access patient data from connected hospital or medical databases. After the data collection process, the data is pre-processed and sent to the brain tumor analysis for brain tumor detection and segmentation. Then the generated outputs are pre-processed and the user has the option to provide feedback on the results as required.



Fig 3 shows the overall framework of the developed system in more detail. Users can upload images (for tumor detection or 2D segmentation), feature sets (for tumor detection) or NIfTI files (for 3D tumor segmentation) to the system based on their requirements. The system first provides the options for detection, 2D segmentation and 3D segmentation. The users can then specify whether they want to use an image or feature sets for the detection part. For both 2D and 3D segmentation, the users have the choice of applying the U-Net or U-Net++ model for the task. Based on the users’ choices and uploaded inputs, the system uses the saved trained models to predict and segment brain tumors from the uploaded input data. The system then provides output decisions (i.e. tumor or non-tumor) or segmentation information (i.e. the segmented tumor) with some performance scores for the users. The users or medical professionals can also provide feedback on the output, and this feedback is included in the training of the models so that professional feedback is incorporated into future detection and segmentation.



The objectives and major contributions of this research are as follows:

  • provide a complete web application for the medical professionals for two major brain tumor analysis tasks—tumor detection and tumor segmentation—that can be connected to existing medical databases for data access or that can operate independently using user input,
  • deliver multiple types of options for each of the two tasks (i.e. detection with image input and detection with image features, 2D segmentation with U-Net and 2D segmentation with U-Net++, 3D segmentation with U-Net and 3D segmentation with U-Net++) for a more user-friendly application,
  • provide medical professionals with the opportunity to review the outputs generated by the system and add their feedback to the application, so that the feedback can be incorporated with the data in future executions to improve the accuracy of the tasks.

The rest of the paper is organized as follows. The next section provides brief summaries of related works on similar research. The methodology of our proposed system is explained after that. Experimental results are then presented, and finally some concluding remarks are provided.

Related works

Although most recent brain medical image analysis systems use DL and ML models due to their high accuracy, a few researchers are still working on improving conventional approaches such as thresholding, geometry, morphology and contouring. Nyo et al. [ 22 ] recently proposed a thresholding and morphology based model for brain tumor segmentation from brain MRIs. After converting the images into grayscale, they removed noise, applied Otsu’s thresholding algorithm to segment the tumor region from the MRI, and then used opening and closing morphological operations to post-process the segmented images. Their method was applied to 110 FLAIR images from the BRATS 2015 [ 23 ] dataset for 2, 3, and 4 class models and achieved around 90% accuracy. Another recent thresholding and morphology based model applied a similar method for brain tumor segmentation from brain MRI, while adding brain tumor severity detection (i.e. benign tumor or normal) [ 24 ]. The authors created a dataset by combining online datasets with images gathered from hospitals and then pre-processed the 2294 images for the threshold-based segmentation that separated the tumor from the background. Morphological operations were then applied and a connected component analysis was performed to obtain the solidity, the area and the bounding box of the tumor. The highest density area was extracted and compared with the maximum area of connected pixels. If both were the same, the tumor was identified and the image was labeled as a ‘benign tumor’ image. Otherwise, the image was considered a normal/healthy image.
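To make the thresholding step concrete, the following minimal Python sketch computes an Otsu threshold by maximizing the between-class variance of the gray-level histogram. It is an illustration only and not the implementation used in [ 22 ] or [ 24 ]; the function names `otsu_threshold` and `binarize` are hypothetical, the input is assumed to be an 8-bit grayscale image stored as nested lists, and the noise removal and morphological post-processing steps are omitted.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of background pixels up to level t
    sum_bg = 0  # sum of background intensities up to level t
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray):
    """Segment a 2D grayscale image into a 0/1 mask with Otsu's threshold."""
    t = otsu_threshold([v for row in gray for v in row])
    return [[1 if v > t else 0 for v in row] for row in gray]


# Toy example: a dark background with a small bright region.
image = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],
    [10, 205, 200, 10],
    [10, 10, 10, 10],
]
mask = binarize(image)
```

In a complete pipeline, the resulting mask would then be cleaned with opening and closing operations, as described in the works above.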

Most recent research on brain tumor detection and segmentation applies conventional approaches as part of the pre-processing or post-processing for ML or DL hybrid or ensemble models. Support vector machine (SVM), random forest (RF), fuzzy C-means (FCM) and K-means clustering are a few of the popular ML models for tumor detection, segmentation and classification [ 25 ]. Hasanah et al. [ 26 ] recently proposed an ML-based brain tumor classification using filtering, contouring and thresholding as part of pre-processing and segmentation of the tumor. The MRIs were filtered with median filtering and the skulls were stripped from the images before binary thresholding was applied for a contouring algorithm. The largest contour was used for tumor feature (i.e. intensity and GLCM features) extraction. An SVM model used these features to classify the tumors into Glioma, Meningioma and Pituitary tumors with 95.83% average accuracy. An FCM clustering with level set method called fuzzy kernel level set (FKLS) was applied for 3D tumor segmentation in [ 27 ] using a combination of conventional and ML methods. The authors applied symmetry analysis to create the bounding box for the volume of interest (VOI) and then used FCM and level set methods to minimize the energy function for tumor segmentation. The proposed model showed a high Dice score (i.e. 97.62%) on the BRATS 2017 [ 28 ] dataset.

Currently, DL-based techniques are the most popular tools for brain tumor image analysis. Convolutional neural network (CNN), recurrent neural network (RNN), visual geometry group (VGG), ResNet, Inception, autoencoders, U-Nets and their variations are the most popular DL models for brain tumor detection, segmentation and classification [ 29 ]. Researchers have also tried to combine conventional methods and ML models with DL models to enhance the performance even more. Chattopadhyay et al. [ 30 ] proposed a DL model with a CNN framework and tried to include an ML method in the CNN for tumor/non-tumor detection. They applied SVM as the activation function at the last layer of the CNN and found that the accuracy was only about 15%; they therefore moved to a Softmax activation function and achieved more than 99% accuracy. Another CNN-SVM hybrid model was used for brain tumor classification into benign and malignant tumor classes in [ 31 ]. After some basic image pre-processing and skull stripping, the tumor images were segmented using thresholding. Then a CNN model extracted the feature maps from the segmented images and the feature maps were fed into an SVM for the final classification. The hybrid CNN-SVM model was compared to the separate CNN and SVM models and outperformed both of them with more than 98% classification accuracy.

Another CNN-based tumor classification model was recently proposed by Ayadi et al. [ 32 ]. They applied a CNN model to three brain tumor datasets (i.e. Figshare [ 33 ], Radiopaedia [ 34 ], and REMBRANDT [ 35 ]) and classified the images into 2 (i.e. tumorous or normal), 3 (i.e. normal, low grade glioma (LGG), high grade glioma (HGG)), 4 (i.e. normal, astrocytoma (AST), oligodendroglioma (OLI), glioblastoma multiforme (GBM)), 5 (i.e. AST grade 2, AST grade 3, OLI grade 2, OLI grade 3, GBM), and 6 (i.e. normal, AST grade 2, AST grade 3, OLI grade 2, OLI grade 3, GBM) classes. Their model achieved an overall accuracy of 90.35% without data augmentation and 93.71% with data augmentation, which was comparable to the accuracy of similar CNN-based brain tumor classifiers. A CNN-based ensemble model was proposed in [ 36 ], using a two-stage ensemble to select the best features for classifying brain MRIs into normal, meningioma, glioma and pituitary tumor classes. Three different brain MRI datasets were merged to create a collection of 10620 MRIs, and the datasets were used both separately and together. The images were pre-processed and fed into five pre-trained CNN models (i.e. VGG-19, EfficientNet-B0, Inception-V3, ResNet-50 and Xception) and five classifiers (i.e. softmax, SVM, RF, K-nearest neighbor (KNN) and AdaBoost) to choose the best feature extractor and the best classifier respectively. The final classifier classified the MRIs into 4 classes with more than 99% accuracy. The authors also created a Python-based UI where users can upload brain MRIs to classify them in real time and obtain confidence percentages for each class.

U-Net [ 37 ], a CNN variation, and its variations are very popular for brain tumor image analysis and brain tumor segmentation from MRIs. Ilhan et al. [ 20 ] proposed a U-Net based brain tumor segmentation model with tumor localization and enhancement models. After pre-processing, the tumor regions were localized and enhanced by using the intensity of the pixels and the standard deviations from the image histogram to separate the tumors from the non-tumor regions. Then the U-Net model was applied to segment the tumor and it achieved Dice scores of 0.85 to 0.94, outperforming similar tumor segmentation models. The localization and enhancement of the region of interest (ROI) noticeably improved the feature extraction and training of the U-Net. Aghalari et al. [ 38 ] proposed a modified U-Net for brain tumor segmentation from the BRATS 2018 dataset. 2D slices containing only background were discarded from the 3D data and the remaining slices from the T1ce, T2 and FLAIR images were normalized. A two-pathway-residual (TPR) block structure was added to the U-Net to extract the global features as well as the local features from the images. The TPR block at every U-Net level sent the extracted global features for concatenation at the next level, hence enhancing the feature maps. The model was trained for segmenting tumors and achieved around 89% Dice scores, which outperformed most similar works and was comparable to other DL models.

An image driven U-Net model was proposed in [ 39 ] for brain tumor segmentation on the BRATS 2018 dataset. As the first and last few slices of the 3D MRIs did not contain much information, only the 30th to 120th slices were used for the analysis. After each slice was cropped from 240 × 240 to 192 × 192 to remove background parts, the Watershed algorithm was used to separate the image into a tumor region and a non-tumor region, and finally a Z-score normalization was applied before feeding the images into the U-Net. The U-Net model was trained and tested and achieved more than 98% Dice scores for both LGG and HGG tumor segmentation. Das et al. [ 40 ] also worked on U-Net but from a different perspective. They experimented with the learning parameters of U-Net for brain tumor segmentation to achieve optimal results on the BRATS 2017 and BRATS 2018 datasets. The input images were pre-processed and normalized with N4ITK bias field correction and then cropped into 192 × 192 slices to be fed into the U-Net. Five different types of activation functions (i.e. Tanh, ReLU, leaky ReLU, parametric ReLU and ELU) were applied in the U-Net to segment the complete tumor, the core tumor and the enhancing parts with 97% to 99% accuracy. Their experiments with the activation functions, filter size, pooling, batch normalization and dropout showed better performance for the ReLU activation function and average pooling.
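The cropping and Z-score normalization steps mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pre-processing code of [ 39 ] or [ 40 ]: the crop is assumed to be centered, the statistics are computed over all pixels of the slice (some pipelines use only non-zero brain pixels), and the helper names `center_crop` and `z_score` are hypothetical.

```python
import math


def center_crop(slice_2d, size):
    """Cut a size x size window from the middle of a 2D slice."""
    rows, cols = len(slice_2d), len(slice_2d[0])
    top = (rows - size) // 2
    left = (cols - size) // 2
    return [row[left:left + size] for row in slice_2d[top:top + size]]


def z_score(slice_2d):
    """Rescale a slice to zero mean and unit standard deviation."""
    values = [v for row in slice_2d for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant slice carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in slice_2d]
    return [[(v - mean) / std for v in row] for row in slice_2d]


# Synthetic 240 x 240 slice standing in for one MRI slice.
mri_slice = [[(r * c) % 7 for c in range(240)] for r in range(240)]
cropped = center_crop(mri_slice, 192)   # 240 x 240 -> 192 x 192
normalized = z_score(cropped)
```

Normalizing each slice in this way puts intensities from different scanners and patients on a comparable scale before they reach the network.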

U-Net++ [ 41 ] is an extension of the U-Net model that aims to improve the performance of medical image analysis, and it was implemented by Hou et al. [ 42 ] for brain tumor segmentation. They applied basic data pre-processing to the BRATS 2018 and BRATS 2019 datasets by normalizing and clipping the slices and then merging all of the modalities together. The resulting U-Net++ model was then trained and tested with a hybrid loss function of binary cross entropy (BCE) and Dice loss. Their proposed model achieved almost 89% Dice scores for segmenting the whole tumor. Another U-Net++ based ensemble model for Glioblastoma segmentation was proposed in [ 43 ]. This ensemble model used EfficientNet [ 44 ] for data pre-processing instead of the common pre-processing methods. A 3D EfficientNet was applied to each of the four MRI modalities of the BRATS 2021 dataset and the average was used for the classification step. Each MRI was sliced to create 2D images along the axial, coronal and sagittal planes and the four modality images were concatenated. A separate 3D U-Net++ was applied on each plane to produce the segmentation and a majority voting model was applied to generate the final output, with a 90% Dice score for the final tumor segmentation.
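Since the Dice score and the hybrid BCE and Dice loss recur throughout this section, a small Python sketch may help to fix the definitions. It operates on flattened masks and is only an illustration: the equal weighting of the two terms (`bce_weight=0.5`) and the smoothing constant `eps` are assumptions, as the weighting used in [ 42 ] is not stated here, and the function names are hypothetical.

```python
import math


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flattened masks or probabilities."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean BCE between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def hybrid_loss(pred, target, bce_weight=0.5):
    """Weighted sum of BCE and Dice loss (1 - Dice); the weight is an assumption."""
    dice_loss = 1.0 - dice_coefficient(pred, target)
    return bce_weight * binary_cross_entropy(pred, target) + (1.0 - bce_weight) * dice_loss


# Flattened ground-truth mask and two candidate predictions.
target = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]
bad = [0.2, 0.1, 0.9, 0.8]
print(hybrid_loss(good, target), hybrid_loss(bad, target))
```

The BCE term penalizes each pixel independently, while the Dice term measures overlap of the whole region, which is one common motivation for combining them when tumor pixels are a small fraction of the image.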

nnU-Net [ 45 ], a new self-configuring DL model for biomedical image segmentation tasks, is another popular DL model used in medical image analysis nowadays. Luu et al. [ 46 ] proposed an extended nnU-Net for brain tumor segmentation in the BRATS 2021 challenge. They replaced batch normalization with group normalization, used an axial attention mechanism in the decoder and doubled the number of filters in the encoder of the U-Net. The larger encoder helped to manage the large amount of data more appropriately. Their proposed model achieved a Dice score of more than 90% by modifying only three properties of the original nnU-Net. Axial attention was also recently used in [ 47 ] for brain tumor segmentation. The authors applied axial attention to extract and use both local and global semantic features from the MRIs more accurately for tumor sub-region segmentation with a hybrid loss function. They chose the 3D U-Net and added the axial attention mechanism at its decoder. The proposed model was tested with the BRATS 2019 and BRATS 2021 datasets and achieved a Dice score of more than 84%, outperforming six U-Net based models and demonstrating the advantage of axial attention in accurate local and global pixel feature extraction. An ensemble model including DeepSeg, nnU-Net and DeepSCAN for segmenting brain tumors was proposed in [ 48 ] for the BRATS 2022 challenge. Both DeepSeg and nnU-Net were inspired by the U-Net architecture and DeepSCAN was inspired by U-Net and DenseNet. The ensemble model combined these three DL models and applied simultaneous truth and performance level estimation (STAPLE), an expectation-maximization method used for medical image segmentation. The ensemble model achieved more than 88% Dice scores, outperforming all three individual models. The performance ranking showed that among the three models, nnU-Net performed the best and DeepSeg achieved the lowest rank.

Vijay et al. [ 49 ] recently proposed an extended U-Net called SPP-U-Net by replacing the residual connection with attention blocks and spatial pyramid pooling (SPP). The attention blocks added at the levels of the decoder enhanced the features by incorporating more local pixel features with their global feature dependencies. The SPP blocks collected the information from all encoder layers to provide more specific data for reconstruction at the decoder. Their experiments on the BRATS 2021 dataset, with and without the SPP blocks, showed that the models achieved comparable results. The results with SPP were better on average and the best result was achieved with one SPP block, with almost 87% Dice scores. A different U-Net based approach with transformers was applied in [ 50 ] for the brain tumor segmentation task on the BRATS 2019, BRATS 2020 and BRATS 2021 datasets. Their hybrid model combining CNN and transformers implemented shifted window based swin transformer blocks that were able to enhance the learning process, and it achieved an 81.15% Dice score, outperforming similar transformer-based brain tumor segmentation models. Lin et al. [ 51 ] also proposed a CNN-transformer hybrid model for brain tumor segmentation called CKD-TransBTS (i.e., clinical knowledge-driven brain tumor segmentation). Their dual-branch hybrid encoder was able to extract the correlations between different MRI modalities and to extract more precise features from the fusion of multimodal MRIs. They also added a hybrid transformer-CNN block for each encoder layer to calibrate the features better. They grouped the inputs into two categories (T1 with T1Gd, and T2 with FLAIR), and their proposed model achieved more than 90% Dice scores, outperforming basic U-Net, U-Net++, transformer based U-Nets and similar networks.

Generally, in recent BRATS dataset based brain tumor segmentation, part of the solution consists of segmenting different tumor tissues such as necrotic tissue and edema. However, a few researchers have also extracted other tumor properties like tumor area, volume, location etc. [ 52 ]. Recently, Nalepa et al. [ 53 ] proposed an end-to-end DL pipeline for tumor sub-region segmentation for both pre-operative and post-operative data and then computed the bidimensional and volumetric properties of the tumors with a new RANO (i.e., response assessment in neuro-oncology) computation. They also proposed an efficient manual annotation process and discussed their experiments on pre-operative and post-operative data, on which their proposed model achieved comparable performance. Their research outputs provided a new direction for extracting various brain tumor properties and for keeping track of a patient’s state before and after surgery. Some of the UI-based works mentioned below also included some tumor property computations.

Some researchers have also worked on basic UI-based systems to automate the tumor detection process. Some of these works focused on brain tumor/non-tumor detection or brain tumor type classification using basic ML or DL models with various image intensity and texture features. Abdullah et al. [ 54 ] worked on a Matlab simulator for tumor/non-tumor detection and tumor area segmentation using a cellular neural network. MRIs collected from KPJ Penang specialists were used to train the network with some modified templates for corner detection (i.e. template 1), edge detection (i.e. template 2) and hole filling (i.e. template 3). The templates helped to detect the presence or absence of a tumor in the uploaded image. An ML-based benign or malignant tumor detection UI for brain MRIs was provided in [ 55 ]. A median filter was used to pre-process the images and then a hybrid of Otsu binarization and K-means clustering was applied to segment the images. Thirteen intensity and GLCM features were then extracted from the segmented images to train an SVM model for classifying the image into benign or malignant tumor classes. The proposed model achieved nearly 100% classification accuracy.

Boudjella et al. [ 56 ] proposed a KNN based prediction model implemented in a graphical user interface (GUI) for brain tumor detection. A dataset of images labeled as tumor or non-tumor was used to extract six features (i.e. mean, variance, standard deviation, entropy, skewness, kurtosis), which were then used to train a KNN model for image classification. The model parameters were adjusted to obtain the optimal outputs with k values between 1 and 20. A GUI was developed where the users can enter the six features, the test size and the k value for the KNN classifier. The GUI can then generate the prediction with more than 80% accuracy and display the relevant patient information. A similar web-based software for tumor classification that provides the UI options in both English and Turkish was proposed in [ 57 ]. The authors applied a CNN with the Python AutoKeras library to T1-weighted brain MRIs to classify the input image into meningioma, glioma and pituitary tumors. The users can upload .jpeg, .jpg or .png T1-weighted brain MRIs to the system and the classification prediction appears as output with 94% to 98% prediction accuracy.
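The KNN classification described above can be sketched as follows. This is a generic k-nearest-neighbor vote with Euclidean distance, not the implementation of [ 56 ]; the function name `knn_predict` is hypothetical and the two-dimensional feature vectors are toy values standing in for the six statistical features.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    neighbors = sorted(
        (math.dist(x, query), label)
        for x, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy training set; real inputs would be six-element feature vectors.
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["non-tumor", "non-tumor", "tumor", "tumor"]
prediction = knn_predict(features, labels, [5.0, 6.0], k=3)
```

In practice the features are usually scaled to a common range first, since Euclidean distance is otherwise dominated by the feature with the largest numeric spread.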

Khan et al. [ 58 ] also proposed a UI-based system with Matlab for brain tumor detection and classification with SVM for three classes: normal scan, benign tumor and malignant tumor. The input images were pre-processed and then first order (i.e. mean, standard deviation, entropy, kurtosis, skewness, energy) and second order (i.e. smoothness, contrast, homogeneity, correlation, inverse difference moment (IDM)) features were collected from both benign and malignant training data. The users can upload an image to the UI and the features are extracted to generate the final decision on the tumor class. Very recently, a mobile application for tumor/non-tumor detection was proposed in [ 59 ]. The image datasets were pre-processed and fed into a simple CNN model for tumor/non-tumor classification. The trained model was then used in the mobile application, where a user can take a photo with the mobile camera or select an existing image to upload. While uploading the image, the user can crop it to remove the background, and the uploaded image is then classified into the tumor or non-tumor class with percentages, where the class with the higher percentage is the prediction. The proposed model achieved more than 77% true positive (TP) and true negative (TN) rates.

Further research has been done in this field using other types of ML, DL, TL and hybrid models for brain tumor detection and segmentation from brain medical images, each with its own advantages and limitations. The works that provide some type of UI have mostly addressed brain tumor detection or classification. Some of them applied tumor segmentation as part of their detection or classification; however, this was not the major output of their algorithms. Most of the existing UI-based works focused on one task with a limited scope and restricted input and output types.


The proposed system represents a complete interactive framework for achieving various brain MRI analysis tasks to assist medical professionals. Although the web application framework is designed so that any trained ML or DL model can be added for both the detection and segmentation tasks, only some well-performing DL models are currently included in the application. More recent DL models will be added in the future to give users more options to choose from for each task. The current implementation includes CNN models for the tumor detection tasks and U-Net and U-Net++ models for the tumor segmentation tasks.

As mentioned earlier, in this paper, CNN models are used for brain tumor detection from brain MRIs and from features collected from MRIs. Recent brain tumor image analysis reviews [ 17 , 60 ] showed that, despite the usage of newer DL models, CNN and its variations, and hybrid models containing CNN models, are still widely used in medical image analysis research. Another recent literature review [ 61 ] showed that CNN models accounted for the largest share (i.e. 32%) of the studies that used DL methods such as CNN, TL, Encoder-Decoder, HDL etc. for brain tumor image analysis. Similarly, U-Net has been broadly used for different medical image segmentation tasks with high performance compared to previous DL models. Although various old and new DL models (i.e. attention-based models, LSTM, encoder-decoders, TL models, cascaded networks, etc.) are used in the literature, U-Net and its variations are still among the most popular DL models for brain tumor image analysis, as mentioned in recent brain tumor analysis literature reviews [ 61 , 62 ]. Hence, the CNN, U-Net and U-Net++ models are chosen for implementation and experiments in this paper, and more DL models will be added to the framework in the future.

The proposed system contains a web application for the automation of tumor detection and tumor area segmentation from 2D and 3D images using a few DL models. The rest of this section includes more details on the image features, DL models (i.e. CNN, U-Net and U-Net++) and the algorithms used for the complete process of the proposed automated system.

Image features

Various features can be extracted from the pixels of an image to understand their characteristics for further analysis. In this paper, we focused on the image intensity features, discrete wavelet transform (DWT) features, gray level co-occurrence matrix (GLCM) features and texture based features. These features are discussed as follows. First order histogram or image intensity based features depend on the pixel values of an image [ 63 ]. Four intensity features such as mean, variance, skewness and kurtosis are extracted for the analysis of images in this research. The mean value can be computed by summing up the pixel values and dividing the summation by the total number of pixels in the image. The variance value is an indication for how much the pixel values are spread out. The skewness refers to a measurement of the asymmetry of the pixel values in the histogram. Finally, the kurtosis represents the flatness or peakedness of the pixel values distribution in the histogram. The intensity values are normally calculated after converting the image into a grayscale image.
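The four intensity features described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the implementation used in this paper: the function name `intensity_features` is hypothetical, the variance is assumed to be the population variance, and the kurtosis is assumed to be the non-excess form (a normal distribution gives 3).

```python
import math


def intensity_features(pixels):
    """Mean, variance, skewness and kurtosis of a grayscale image (list of rows)."""
    values = [float(v) for row in pixels for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance (assumption)
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant image has no asymmetry or peakedness to measure.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n  # non-excess (assumption)
    return {"mean": mean, "variance": variance, "skewness": skewness, "kurtosis": kurtosis}


# Three dark pixels and one bright pixel give a right-skewed histogram.
features = intensity_features([[0, 0], [0, 4]])
print(features)
```

A positive skewness, as in this toy image, indicates that most pixels are darker than the mean with a tail of brighter values.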

The GLCM features represent how frequently different combinations of grayscale levels occur together [ 63 ]. The co-occurrence matrix computes the relative frequencies of the co-occurrences of neighboring pixels. The contrast refers to the amount of local variation that exists in the image, dissimilarity represents the distance between the co-occurrences of two pixels based on their joint probability, whereas homogeneity represents the similarity and increases with low contrast. Angular second moment (ASM) is another measurement of homogeneity, the energy is the frequency of repetition of pixel pairs, and the correlation computes the grey level linear dependency of the image. The DWT transforms an image in order to reduce its dimension by dividing it into four parts: low-low (LL), low-high (LH), high-low (HL) and high-high (HH), containing the low frequency sub-bands, horizontal features, vertical features and diagonal features respectively, which together cover the full frequency spectrum of the original image [ 64 ]. The DWT coefficient represents the difference between the wavelet function and the analyzed signal of the image. A few other texture-based features like entropy, local binary pattern (LBP) and Haralick features were also used in this paper [ 63 , 65 ]. The entropy computes the randomness of the pixels, LBP represents the texture of the image by thresholding the neighbors of each pixel against the value of that pixel, and the Haralick features describe the texture of the image from the normalized GLCM.
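To make the GLCM descriptors concrete, the sketch below builds a normalized co-occurrence matrix for one pixel offset and derives the properties named above. It is an illustration under stated assumptions rather than the code used in this paper: the names `glcm` and `glcm_features` are hypothetical, the matrix is assumed to be symmetric, the formulas are the commonly used textbook definitions, and the image is assumed to hold integer gray levels in the range 0 to `levels - 1`.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # symmetric: count the pair in both orders
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Texture descriptors computed from a normalized co-occurrence matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    dissimilarity = sum(v * abs(i - j) for i, j, v in cells)
    homogeneity = sum(v / (1.0 + (i - j) ** 2) for i, j, v in cells)
    asm = sum(v * v for _, _, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum(v * (i - mu_i) ** 2 for i, _, v in cells))
    sd_j = math.sqrt(sum(v * (j - mu_j) ** 2 for _, j, v in cells))
    if sd_i == 0.0 or sd_j == 0.0:
        correlation = 1.0  # constant image: a convention chosen for this sketch
    else:
        correlation = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells) / (sd_i * sd_j)
    return {
        "contrast": contrast,
        "dissimilarity": dissimilarity,
        "homogeneity": homogeneity,
        "ASM": asm,
        "energy": math.sqrt(asm),
        "correlation": correlation,
    }


# Two gray levels, horizontal neighbor at distance 1.
matrix = glcm([[0, 0, 1], [0, 1, 1]], levels=2)
print(glcm_features(matrix))
```

In practice a library routine would normally replace these loops; the plain-Python form is used here only to keep the arithmetic visible.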

The data pre-processing steps and a few dimensions may need to change depending on the input image dimension (i.e. 2D or 3D) for the CNN, U-Net and U-Net++; these changes are explained in the ‘Experimental setup’ section. The basic structures of CNN, U-Net and U-Net++ are discussed here.

Convolutional neural networks (CNN) are among the most popular models developed in recent years for image classification, segmentation and analysis. The convolution layers of the model extract different aspects of the features at each layer and combine them for an improved analysis of an image [ 66 ]. Medical images need accurate feature extraction and an extensively trained feed-forward ANN to classify or segment pixels for output generation; for this reason, CNN and its variations are the most frequently used models for medical image analysis tasks [ 67 ]. A CNN model uses a structure similar to that of our visual cortex (i.e. the primary region of our brain that receives and processes visual information [ 68 ]). CNN models are trained on large image datasets with class labels so that they learn from the features automatically extracted at the different convolution layers and predict the labels of unseen test data based on the patterns learnt from the training images. Fig 4 shows a sample CNN with a few basic layers and nodes.



A CNN model reduces the total number of parameters by a large amount compared to a fully connected neural network. Instead of collecting data from the whole image at once, a CNN scans the input image in blocks of n x n sliding windows (i.e. filter or kernel) at every convolution layer. The n x n block size is called the kernel size and it varies based on the input and application type. The stride size is the number of pixels the sliding window moves at one step. The convolution reduces the dimension of the input while retaining the information captured by the sliding window. At each convolution layer, an element-wise multiplication between the kernel and the window is computed at every stride and the results are summed up to create the output for each pixel; together these outputs form the feature map of the convolution layer. A pooling layer is applied after the convolution layer. It operates similarly to the convolution, except that it takes the average or maximum of the pixels in each window to generate the feature map. It therefore downsamples the output of the convolution layer, generally with the same kernel size and stride size as the convolution layer. The number of convolution layers, neurons and pooling layers may vary based on the application. The number of hidden layers (i.e. depth of the network) can be varied and tested in order to find the optimal structure. In some cases additional layers such as batch normalization and dropout may be added after each convolution [ 69 ]. Batch normalization normalizes the feature maps passed to the next layer, which speeds up the computation and decreases the possibility of overfitting the model. A dropout layer is also added to avoid overfitting by randomly dropping out some neurons (i.e. setting their outputs to zero) during training. A flatten layer converts the multi-dimensional feature maps generated by the convolution layers into a one-dimensional vector for the fully connected layer (i.e. a dense layer). Every output neuron of the fully connected layer is connected to every input neuron of that layer with a different weight. An activation function is used in the dense layer to apply a non-linear transformation that generates the output by deciding which neurons should be activated.
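The sliding-window computation can be illustrated with a small sketch. It is a simplified, single-channel example and not the layer implementation used in this paper: the function names are hypothetical, the kernel is assumed to be square, and no padding is applied unless stated. The output size along one axis follows the standard relation floor((n + 2p - k) / s) + 1 for input size n, kernel size k, stride s and padding p.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def conv2d(image, kernel, stride=1):
    """Unpadded 2D convolution (cross-correlation) of one channel with a square kernel."""
    k = len(kernel)
    out_rows = output_size(len(image), k, stride)
    out_cols = output_size(len(image[0]), k, stride)
    return [
        [
            # Element-wise multiply the window by the kernel, then sum.
            sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling: keep the largest value in each window."""
    out_rows = output_size(len(feature_map), size, stride)
    out_cols = output_size(len(feature_map[0]), size, stride)
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(conv2d(image, [[1, 0], [0, 1]]))  # 3 x 3 feature map
print(max_pool2d(image))                # 2 x 2 downsampled map
print(output_size(256, 8))              # an 8 x 8 kernel on a 256-pixel axis, no padding
```

Deep learning frameworks compute the same quantities over many channels and filters at once; the loops here only expose the arithmetic.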

A few more hyperparameters are used in CNNs, such as the learning rate, loss function, optimizer, number of epochs, momentum and batch size. A loss function computes the difference between the target output and the predicted output of the network to measure the performance of the model; the goal is to achieve a minimum loss. The optimizer is the method that updates the trainable parameters (i.e. weights) of the network to minimize the loss function and achieve an optimal output. The learning rate controls the size of the modifications applied to the model parameters at each step. A higher learning rate can speed up the learning process of the model but can lead to divergence, whereas a lower learning rate slows down the learning process but converges gradually. The momentum determines how much of the previous update is carried into the current step, which dampens the oscillation of the model and helps it avoid getting stuck in local minima. The number of epochs is the number of times the training process passes through the complete dataset, and the batch size is the number of samples from the dataset passed through the network at a time. Hyperparameter selection is a crucial step for every DNN, as the performance of the network depends largely on these hyperparameters.
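The effect of the learning rate and momentum can be seen on a toy problem. The sketch below applies gradient descent with classical momentum to the one-dimensional loss (w - 3)^2; it is an illustration only, not one of the optimizers used in this paper (RMSprop and Adam), and the function name `sgd_momentum` and all numeric settings are assumptions chosen for the example.

```python
def sgd_momentum(grad, w0, learning_rate, momentum=0.0, steps=100):
    """Gradient descent with classical momentum; returns the parameter history."""
    w, velocity = w0, 0.0
    history = [w]
    for _ in range(steps):
        # The velocity keeps a fraction of the previous update (the momentum term).
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
        history.append(w)
    return history


def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


small_step = sgd_momentum(grad, w0=0.0, learning_rate=0.1)
large_step = sgd_momentum(grad, w0=0.0, learning_rate=1.1)
with_momentum = sgd_momentum(grad, w0=0.0, learning_rate=0.1, momentum=0.9, steps=300)

print(abs(small_step[-1] - 3.0))     # close to zero: converged
print(abs(large_step[-1] - 3.0))     # very large: diverged
print(abs(with_momentum[-1] - 3.0))  # close to zero after damped oscillation
```

On this loss a learning rate above 1.0 makes every step overshoot by more than it corrects, which is the divergence described above.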

U-Net is a variation of CNN proposed and developed specifically for biomedical image segmentation and abnormality detection in medical images [ 37 ]. U-Net has a symmetrical U-shaped architecture consisting of a contracting path with convolution layers and an expansive path with transposed convolution layers, as shown in Fig 5 . The contracting path has four levels of downsampling for the feature maps and the expansive path has four upsampling levels, with a bridge connecting them. Each downsampling level includes two consecutive convolution layers for deep feature extraction and a max pooling layer to prepare the input for the next level. After the four downsampling levels, a bridge with just the two convolution layers passes the feature maps to the expansive path. Each level of the expansive path has a similar structure: a transposed convolution upsamples the feature map, the result is concatenated through a skip connection with the corresponding feature map generated at the same level of the contracting path, and two consecutive convolutions then extract features from the concatenated maps. The concatenation enhances the feature maps by combining them with the features of previous levels. The kernel size of U-Net is 3X3, the stride size is 2X2, the max pooling size is 2X2, the activation function used for all levels is a rectified linear unit (ReLU) and the output layer activation function is sigmoid. The hyperparameters are tuned or modified based on the type of application and inputs.
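The four-level structure can be traced with a short shape calculation. This sketch only tracks the spatial size and channel count at each stage and is not a trainable model: the function name `unet_shapes` is hypothetical, the convolutions are assumed to use "same" padding so that only pooling and transposed convolution change the spatial size, the filter counts are the ones listed in the experimental setup of this paper, and the bridge width is left as a parameter.

```python
def unet_shapes(input_size, filters=(64, 128, 256, 512), bridge_filters=1024):
    """List (stage, spatial size, channels) for a U-Net with len(filters) levels."""
    if input_size % (2 ** len(filters)) != 0:
        raise ValueError("input size must be divisible by 2 ** number_of_levels")
    shapes, size = [], input_size
    # Contracting path: two convolutions keep the size, 2 x 2 max pooling halves it.
    for level, f in enumerate(filters, start=1):
        shapes.append((f"down{level}", size, f))
        size //= 2
    shapes.append(("bridge", size, bridge_filters))
    # Expansive path: a stride-2 transposed convolution doubles the size, then the
    # result is concatenated with the same-level encoder map before two convolutions.
    for level in range(len(filters), 0, -1):
        size *= 2
        shapes.append((f"up{level}", size, filters[level - 1]))
    return shapes


for stage, size, channels in unet_shapes(128):
    print(f"{stage:>6}: {size} x {size} x {channels}")
```

Because every decoder level returns to the spatial size of the matching encoder level, the skip-connection concatenation is always between maps of equal height and width.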



U-Net++, a variation of U-Net intended to improve abnormality detection in medical images, was proposed in 2018 [ 41 ]. U-Net++ is a nested U-Net architecture that adds convolution layers on the skip paths to reduce the semantic gap between the feature maps of the same level of the contracting and expansive paths, together with additional skip connections and deep supervision, for a denser U-Net structure. Fig 6 shows the basic U-Net++ framework. The backbone of the U-Net++ (in black in Fig 6 ) is the basic U-Net structure, whereas the components on the skip pathways (in green and blue in Fig 6 ) and the deep supervision (in red in Fig 6 ) are the additions to the U-Net backbone that create the U-Net++. Each added convolution block on the skip pathways applies a transposed convolution to its input feature map, merges the result with the feature maps generated by the previous nodes and previous levels, and then applies two consecutive convolutions to produce the feature map for the next node. The same computation is followed for all the nodes present in the skip pathway. At each node, the feature maps from the previous nodes and levels are merged, so the gap between the feature maps of the contracting and expansive paths on the same level is reduced. As the nodes now have more similar feature maps, this structure improves the training time and performance. Unlike the U-Net structure, the deep supervision allows U-Net++ to combine the outputs from all branches to produce a more accurate prediction. The hyperparameters used in the U-Net++ are the same as those used in the U-Net.



Brain tumor analysis tasks

As noted earlier in this paper, the two major brain tumor image analysis tasks are: i) brain tumor detection, and ii) brain tumor segmentation. Both of these tasks are implemented in our web application. The details of each task are now discussed.

Tumor detection.

The tumor detection models use the Kaggle dataset [ 70 , 71 ] for training and testing. The dataset includes images with names ‘Yxx’ or ‘Nxx’, where xx are numbers. ‘Y’ represents yes or tumor and ‘N’ refers to no or non-tumor. The first task before applying the detection models to the images is to clearly label each image for classification. Algorithm 1 shows the steps of the data labeling process. The images with tumors are labeled as 1 and the other images are labeled as 0, with the labels being stored in a file together with the image location. After labeling the images, some basic pre-processing steps are applied to the images. If the input image is a DICOM file, it is converted into a .jpg file for further analysis. Then the input image is converted into a 2D format and resized to the dimension 256X256. Finally, the image is normalized so that the pixel values are in the range 0 to 255. Algorithm 2 shows the image pre-processing steps for the tumor detection model.

Algorithm 1 Tumor/non-tumor image labeling

Require : Input image, Image name

  image_class ← image_name[0]    ▹ Gets the first character of the image_name

  if image_class == ‘Y’ then

   image_label ← 1     ▹ 1 = Tumor

  else

   image_label ← 0    ▹ 0 = Non-Tumor

  end if
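Algorithm 1 can be sketched as a small function. This is an illustration, not the code from this paper: the function name `label_from_name` is hypothetical, taking the base name of a path and upper-casing its first character are assumptions, and, unlike the algorithm, the sketch rejects names that start with neither ‘Y’ nor ‘N’ instead of labeling them as non-tumor.

```python
from pathlib import Path


def label_from_name(image_name):
    """Return 1 (tumor) for 'Yxx' file names and 0 (non-tumor) for 'Nxx' names."""
    image_class = Path(image_name).name[0].upper()  # first character of the file name
    if image_class == "Y":
        return 1
    if image_class == "N":
        return 0
    # Deviation from Algorithm 1 (assumption): unknown prefixes are reported, not labeled 0.
    raise ValueError(f"cannot infer a label from {image_name!r}")


print(label_from_name("Y12.jpg"))       # 1
print(label_from_name("data/N3.png"))   # 0
```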

Algorithm 2 Image pre-processing

Require : Input image

  if image_type == DICOM then

   Convert DICOM to jpg

  end if

  Convert image into 2D format

  Resize image into dimension 256 X 256

  Normalize each pixel score between 0 and 255
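The resizing and normalization steps of Algorithm 2 can be sketched as follows. This is a dependency-free illustration under stated assumptions: the function names are hypothetical, nearest-neighbor interpolation and min-max scaling are choices made for the sketch (the paper does not state which interpolation or normalization formula is used), and the DICOM conversion step is omitted.

```python
def resize_nearest(image, height, width):
    """Resize a 2D image (list of rows) with nearest-neighbor sampling (assumption)."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // height][c * cols // width] for c in range(width)]
        for r in range(height)
    ]


def normalize_0_255(image):
    """Min-max scale pixel values so that they span the range 0 to 255 (assumption)."""
    flat = [v for row in image for v in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in image]  # constant image
    scale = 255.0 / (high - low)
    return [[(v - low) * scale for v in row] for row in image]


small = [[10, 20], [30, 40]]
resized = resize_nearest(small, 4, 4)   # the paper resizes to 256 x 256
print(resized)
print(normalize_0_255(small))
```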

The tumor detection model allows the user to choose either the image or the features as input for the tumor/non-tumor detection. Based on the user’s choice, the corresponding model is applied to the input to classify it as tumor or non-tumor, and the evaluation scores are then calculated as described in Algorithm 3. If the user chooses the image as input, the model trained with Algorithm 4 is applied; if the user chooses features, the trained model from Algorithm 5 is used. The image-based training model uses a simple CNN that classifies the image by extracting deep features with the convolution layers. The feature maps extracted by the network are used to classify the image. The CNN model is trained with the pre-processed images and the image labels. The training model splits the dataset into training, validation and testing datasets. Then the ImageDataGenerator [ 72 ] process is used for data augmentation. The augmented data is then used to train the CNN model and the trained model is saved for future detection. The model is then tested with the test dataset. The model predicts a value; if the predicted value is greater than 0.5, the test image is assigned to class 1, otherwise it is assigned to class 0. Class 1 refers to a tumorous image and class 0 refers to a healthy/non-tumorous image. The feature-based tumor detection uses a similar CNN model, which is trained with the image features. The trained model is saved for predictions on user inputs and follows similar steps to the image-based tumor detection process.

Algorithm 3 Tumor/non-tumor detection

Require : Input image, Image features

  if user_choice == image then

   Apply tumor/non-tumor detection from image trained model

  else if user_choice == features then

   Apply tumor/non-tumor detection from features trained model

  end if

  Provide detection output

  Generate evaluation scores

Algorithm 4 Model for tumor/non-tumor detection from image

Require : Pre-processed image, Image label

  Split dataset into training , and testing set ( i . e . 80 : 20)

  Split training set into training and validation set ( i . e . 80 : 20)

  Apply image augmentation on training set using ImageDataGenerator

  Apply image augmentation on validation set using ImageDataGenerator

  Apply augmented data to train CNN model

  Save CNN model weights, performance scores

  Test trained model with testing dataset

  if predicted_value > 0.5 then

   test_class ← 1

  else

   test_class ← 0

  end if

  if test_class == 1 then

   test image is tumorous

  else

   test image is non-tumorous

  end if

  Generate classification report
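The thresholding and reporting steps at the end of Algorithm 4 can be illustrated with a short sketch. The function name `classification_report` here is a hypothetical helper written for this example (it is not a library call), and the choice of accuracy, precision and recall as the reported scores is an assumption; only the 0.5 threshold comes from the algorithm.

```python
def classification_report(probabilities, labels, threshold=0.5):
    """Threshold sigmoid outputs and summarize them against the true labels."""
    predicted = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(predicted, labels))
    tp = sum(1 for y, t in pairs if y == 1 and t == 1)  # tumor, predicted tumor
    tn = sum(1 for y, t in pairs if y == 0 and t == 0)  # non-tumor, predicted non-tumor
    fp = sum(1 for y, t in pairs if y == 1 and t == 0)
    fn = sum(1 for y, t in pairs if y == 0 and t == 1)
    return {
        "predicted": predicted,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


report = classification_report([0.9, 0.2, 0.5, 0.7], [1, 0, 1, 0])
print(report)
```

Note that a prediction of exactly 0.5 falls into class 0, because the algorithm uses a strict "greater than" comparison.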

Algorithm 5 Model for tumor/non-tumor detection from features

Require : Image name, Image features, Image label

  Shuffle dataset randomly

  Apply data to train CNN model

  if test_class == 1 then

   test image is tumorous

  else

   test image is non-tumorous

  end if

Algorithm 6 is used for extracting the features from the images. The input image is converted into 2D format and resized into the dimension 256X256. Then the intensity, DWT, GLCM, entropy, LBP and Haralick features are extracted separately. After saving each type of features against the image label, they are combined to have the complete feature set and saved with the corresponding image label. The features can be used separately or together to classify images as tumorous or non-tumorous.

Algorithm 6 Feature extraction from image

Require : Input image, image Label

  Extract intensity features

  Extract DWT features

  Extract GLCM features

  Extract Entropy features

  Extract LBP features

  Extract Haralick features

  Save every feature against image _ name , image _ label

  Combine all features

  Save combined feature set against image _ name , image _ label

Tumor segmentation.

The tumor segmentation process is applied based on the user’s choice. As described in Algorithm 7, the user can choose between 2D segmentation and 3D segmentation. If the user chooses 2D segmentation, they can select either U-Net or U-Net++ when uploading the image. The same process is followed for 3D segmentation. The only difference is that the 3D segmentation requires four nifti files as inputs for the T1, T2, T1ce and FLAIR modalities. The chosen model is applied to the input, and the tumor segmentation and performance scores are shown as outputs.

Algorithm 7 Tumor segmentation

Require : Input image (2D) or Input nifti files (3D)

  if user_choice == 2D segmentation then

   Apply 2D segmentation image pre-processing

   if user_sub_choice == U-Net then

    Apply 2D U-Net tumor segmentation model

   else if user_sub_choice == U-Net++ then

    Apply 2D U-Net++ tumor segmentation model

   end if

  else if user_choice == 3D segmentation then

   Upload nifti files for T1, T2, T1ce and FLAIR modalities

   Apply 3D segmentation image pre-processing

   if user_sub_choice == U-Net then

    Apply 3D U-Net tumor segmentation model

   else if user_sub_choice == U-Net++ then

    Apply 3D U-Net++ tumor segmentation model

   end if

  end if

  Provide segmentation output

The 2D segmentation is applied after a few basic pre-processing steps on the image, as described in Algorithm 8. If the input image is in DICOM format, it is converted into .jpg or .png format and transformed into grayscale. The image is then resized to 512X512 and the pixels are normalized between 0 and 1. Then the image is expanded to 3D and downsampled to 128X128, and the chosen DL model is applied to it. Algorithm 9 describes the pre-processing needed for the 2D segmentation training model in the training and testing phase discussed in Algorithm 10. The training pre-processing is similar to the pre-processing of the automated system input, with the addition of the tumor mask input and the tumor mask pre-processing required for training.

Algorithm 8 2D segmentation image pre-processing

  Convert image into grayscale

  Resize image into dimension 512 X 512

  Normalize image pixels between 0 to 1

  Expand image to 3 D by adding 1 as channel

  Downsample image to 128 X 128

Algorithm 9 2D segmentation image pre-processing for training model

Require : Input image, tumor Mask

  Extract .jpg files for MRI and tumor from .mat files

  Expand tumor mask to 3D by adding 1 as channel

  Downsample image and mask to 128 X 128

The training and testing models for U-Net and U-Net++ for 2D tumor segmentation are shown in Algorithm 10. The pre-processed dataset and tumor masks are divided into training, validation and testing sets, and then both the MRIs and tumor masks are flipped right and left and added to the training data for data augmentation. The updated training data is then processed by randomly changing the brightness level and zoom range to create random changes for data augmentation. The augmented data is used to train U-Net and U-Net++, and the models are saved and tested to generate performance evaluations.

Algorithm 10 Model for 2D brain tumor segmentation

Require : Pre-processed image, Tumor mask

  Flipping training data

  for image in images, mask in tumor_masks do

   rflip_image ← right_flip(image)

   rflip_mask ← right_flip(mask)

   lflip_image ← left_flip(image)

   lflip_mask ← left_flip(mask)

  end for

  training_image_new ← add(training_image, rflip_image, lflip_image)

  training_mask_new ← add(training_mask, rflip_mask, lflip_mask)

  Data generator for training data

  for random image in training_image_new, random mask in training_mask_new do

   bright_image ← brightness_range(image)

   bright_mask ← brightness_range(mask)

   zoom_image ← zoom_range(image)

   zoom_mask ← zoom_range(mask)

  end for

  Use augmented data for model training & validation

  Train U − Net / U − Net ++ model

  Store model weights , evaluations

The 3D segmentation process uses a very similar structure as the 2D segmentation process. The user input is pre-processed with Algorithm 11 then U-net or U-Net++ is applied to generate the tumor segmentation and performance scores, whereas Algorithm 12 is used to pre-process the input files and tumor masks for training the 3D U-Net and U-Net++ models as shown in Algorithm 13. For the 3D segmentations, the user needs to upload four nifti files, one for each modality (i.e. T1, T2, T1ce and FLAIR). The system then computes the mean and standard deviation for each of them and applies standardization on the images. As the first few slices and last few slices of the images do not contain much information, only the middle 70 slices from 155 slices in total (i.e. from slice 60 to 130) are stored for computation. Then they are resized to the dimension 128X128 and expanded. All four modality slices are then concatenated to create one 3D image for the DL models. After the U-Net or U-Net++ model is applied, the system generates the segmentation output with performance scores. The 3D segmentation training process uses the same training, validation and testing dataset divisions as the 2D process. After pre-processing the four modality files and the tumor mask files (according to Algorithm 12), the DL models are trained and tested following the steps in Algorithm 13.

Algorithm 11 3D segmentation image pre-processing

Require : Input nifti files for 4 modalities

  for each modality file do

   mean_image ← mean(image)

   std_image ← std(image)

   standard_image ← standardization(mean_image, std_image)

   for image_slice in range(60,130) do    ▹ Taking middle 70 slices from 155

    Resize slice to 128 X 128

    Expand slice dimension

   end for

  end for

  preProcessed_image ← Concatenate(T1, T2, T1ce, FLAIR)
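The main steps of Algorithm 11 can be sketched on nested lists. This is an illustration under stated assumptions rather than the code from this paper: the function names are hypothetical, a volume is assumed to be a list of 2D slices, standardization is assumed to be the z-score (value minus mean, divided by standard deviation), the modalities are assumed to be stacked along a trailing channel axis, and the resize to 128 X 128 is omitted.

```python
import math


def standardize(volume):
    """Z-score standardize a volume (list of 2D slices) with its own mean and std."""
    values = [v for sl in volume for row in sl for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [[[(v - mean) / std for v in row] for row in sl] for sl in volume]


def middle_slices(volume, start=60, stop=130):
    """Keep slices start..stop-1, i.e. 70 of the 155 slices, as in range(60, 130)."""
    return volume[start:stop]


def stack_modalities(*modalities):
    """Stack volumes so that each pixel holds one value per modality (channel-last)."""
    return [
        [[list(pixel) for pixel in zip(*rows)] for rows in zip(*slices)]
        for slices in zip(*modalities)
    ]


# Toy 155-slice volume with 1 x 1 slices, reused for all four modalities.
volume = [[[float(i)]] for i in range(155)]
prepared = middle_slices(standardize(volume))
stacked = stack_modalities(prepared, prepared, prepared, prepared)
print(len(stacked), len(stacked[0][0][0]))  # 70 slices, 4 channels per pixel
```

With real data each slice would be 128 X 128 after resizing, which gives the (128 X 128 X 4) input size used for the 3D models.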

Algorithm 12 3D segmentation image pre-processing for training model

Require : T1, T2, T1ce, FLAIR, Tumor mask

   for image_slice in range(60,130) do    ▹ Taking the middle 70 slices from 155

  mask [ mask ! = 0] ← 1

  for mask_slice in range(60,130) do    ▹ Taking the middle 70 slices from 155

   Resize mask _ slice to 128 X 128

   Expand mask _ slice dimension

Algorithm 13 Model for 3D brain tumor segmentation

Require : Pre-processed Image, Tumor Mask

The post-processing for the detection and segmentation tasks is simple and follows the steps in Algorithm 14. The detection post-processing simply computes the probability of the output class and shows it as the performance score. For the segmentation, the confidence score, the tumor area and the ratio of the tumor area to the brain area are calculated together with the segmentation image output.

Algorithm 14 Post-processing results

Require : Input image, Segmented tumor

  Compute detection probability scores

  Compute segmentation confidence scores

  Compute tumor area

  Compute tumor ratio
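The tumor area and tumor ratio steps can be sketched as pixel counts. The paper does not spell out how these quantities are computed, so everything here is an assumption made for illustration: the function name `tumor_statistics` is hypothetical, the tumor area is taken as the number of positive mask pixels times an optional per-pixel area, and the brain area is taken as the number of non-zero image pixels (i.e. the background is assumed to be exactly zero).

```python
def tumor_statistics(image, mask, pixel_area=1.0):
    """Tumor area and tumor-to-brain ratio from an image and its binary tumor mask."""
    tumor_pixels = sum(1 for row in mask for v in row if v > 0)
    brain_pixels = sum(1 for row in image for v in row if v > 0)  # non-zero = brain (assumption)
    ratio = tumor_pixels / brain_pixels if brain_pixels else 0.0
    return {"tumor_area": tumor_pixels * pixel_area, "tumor_ratio": ratio}


image = [[0, 5, 5], [0, 5, 5]]   # first column is background
mask = [[0, 1, 0], [0, 0, 0]]    # one tumor pixel
print(tumor_statistics(image, mask))
```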

Web application

The web application developed for the proposed system can use user input (i.e. data uploaded by the user) or can access hospital data from a picture archiving and communication system (PACS). PACS can be integrated into the system, and through it the system can connect to the imaging system of any hospital to access data. Any user or healthcare professional can also use the browser to access the system from the client side. At the server side, a Gunicorn WSGI server is used to run the main Flask application and a PostgreSQL database (i.e. application DB) is used to store the data for the complete system. The detection, segmentation and PACS communication at the server side are designed as subprocesses so that they can be edited, added or removed easily. The subprocesses are independent, so they can use any programming language or format without disrupting the main application. The PACS communication process uses a C-ECHO request to verify the connection with the PACS, a C-FIND request to search for specific data, and a C-MOVE request to retrieve the selected medical images from the system. The Jinja2 [ 73 ] template engine is used with Flask to generate the HTML pages for the client side, whereas the PACS and feedback functionalities include additional HTML and JavaScript content, and the asynchronous PACS queries use jQuery. Currently, the web application is deployed to a local development server (Intel Xeon Gold 6134 CPU, Nvidia P100 16GB GPU, 128GB RAM). The hospital system (i.e., PACS) options are added to the proposed architecture to enable medical professionals and/or institutes to incorporate our medical image analysis system into their existing databases, applications etc. The real-time usage of the proposed system in any healthcare organization will require some additional functionalities for the anonymization of patient information according to the rules and regulations of the organization and the ethical obligations of the state/country. The procedure for implementing these functionalities will depend on the conditions of the organization and/or state/country, and hence may vary for every organization. We will update the anonymization process according to these conditions during real-time usage of our proposed system. Fig 7 shows the web application architecture components and the connections between them.



Experimental results

The datasets, the experimental setup for all models, the web application outputs and the model results are discussed in detail in this section.

Different datasets were used to train and test the aforementioned models. The tumor/non-tumor detection models were trained and tested with the Kaggle tumor dataset [ 70 , 71 ]. The dataset has both 2D and 3D images in .jpg, .jpeg and .png formats. The axial plane is visible in all 253 files. Of these, 155 are in the ‘Yes’ folder, marking them as tumorous images, and 98 are in the ‘No’ folder, marking them as images without tumors. The image dimensions and sizes are not consistent, so the images need to be resized before they are used in this implementation. The 2D segmentation models were trained and tested with the CjData [ 33 , 74 , 75 ]. The dataset includes 3064 T1-weighted contrast-enhanced MRIs of all three anatomical planes (i.e. axial, coronal, sagittal) from 233 patients suffering from three types of brain tumors: meningioma (708 images), glioma (1426 images) and pituitary tumors (930 images). The dataset contains 3064 .mat files, each including the patient ID, tumor type, tumor border, MRI and tumor mask (i.e. ground truth). The dataset was transformed by extracting the images and masks as 2D 512 X 512 images in .jpg format from the .mat files before applying it to the segmentation models. The 3D segmentation models used the most popular benchmark dataset for brain tumor image analysis, the BRATS dataset from the MICCAI brain tumor segmentation (BraTS) challenges [ 76 ]. In this paper, we applied the BRATS 2021 [ 77 – 82 ] dataset for our analysis. The BRATS 2021 training dataset contains 1251 folders with files in .nii.gz format. Each folder contains four 3D MRIs of the same patient for four modalities (i.e. T1, T2, T1ce and FLAIR) and one segmentation file (i.e. tumor ground truth) in the axial plane. The MRIs are in 3D nifti format and they all have the size 240 X 240 X 155. Table 1 shows a summary of the datasets used in this paper.



Experimental setup

The DL models used for the tumor detection and segmentation are now discussed in detail. In all implemented models, the datasets were divided with an 80-20 distribution. First, 20% of the data was set aside as the testing data. The remaining 80% was then divided again with an 80-20 ratio, so that 80% of it was used as training data and 20% as validation data. The test data for all models were therefore completely new to the trained models. Python [ 83 ] was used as the programming language to implement the tumor detection and segmentation models for this paper.
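The two-stage 80-20 division can be sketched as follows; it yields roughly 64% training, 16% validation and 20% testing data. This is an illustration only: the function name `split_dataset` is hypothetical, and the random shuffle, the fixed seed and the rounding of the subset sizes are assumptions, since the paper does not report the exact subset counts.

```python
import random


def split_dataset(items, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Hold out a test set, then split the remainder into training and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)         # shuffling and seed are assumptions
    n_test = round(len(items) * test_fraction)
    test, rest = items[:n_test], items[n_test:]
    n_val = round(len(rest) * val_fraction)    # 20% of the remaining 80%
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# 253 is the number of images in the Kaggle detection dataset.
train, val, test = split_dataset(range(253))
print(len(train), len(val), len(test))
```

Because the test items are removed before the second split, no test image can appear in the training or validation data.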

The tumor detection task was implemented using a simple CNN with two different input types. In one implementation, the original brain MRI was used as input and the deep features were generated by the model for the detection task. In the other implementation, the image features were extracted beforehand from the MRI and the features were used as inputs to the CNN model for the final tumor/non-tumor classification task. For the CNN that used the image as input, we implemented a CNN with input size (256, 256, 3) and trained the model for 50 epochs. The first convolution layer was included with filter size 32, kernel size 8 X 8 and activation function ReLU. Then a dense layer with unit 32 and activation function ReLU was included. Then a 2 X 2 maxpooling layer and a dropout layer with a dropout of 0.2 was added. The second convolution layer had filter size 64, kernel size 8 X 8, activation function ReLU and was followed by a dense layer with 64 units and ReLU activation function, a 2 X 2 maxpooling layer and a dropout layer with 0.2 dropout. Lastly, a flatten layer was added before the final dense layer to generate the output. The final dense layer had unit size 1 and the sigmoid function as the activation function. The model used binary cross entropy as loss function and the RMSprop optimizer with a learning rate of 0.0001. The total number of parameters for the model was 352,609 where all of them were trainable. The CNN structure is shown in Table 2 .



The CNN model that used the MRI features had a similar structure. In the first convolution layer with the ReLU activation function, the filter size was 64 and the kernel size was 2. Then a dense layer with 32 units and ReLU activation function was added, followed by a dropout layer with 0.2 dropout. Then another dense layer with 16 units and ReLU activation function was added to the network. A maxpooling layer was included before another dropout layer with 0.2 dropout. Finally, a flatten layer was used before the final dense layer with unit size 1 and sigmoid activation function. The model had 2,817 parameters in total, all of them trainable. The model was trained for 100 epochs with an Adam optimizer and a binary cross entropy loss function. Table 3 shows the details of the model structure. The Kaggle dataset [ 70 , 71 ] and the features extracted from the dataset were used for the detection tasks.
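The reported total of 2,817 parameters can be reproduced with the standard counting rules for convolution and dense layers. The sketch below is a consistency check under stated assumptions, not a statement of the exact layer configuration in Table 3: the helper names are hypothetical, the convolution is assumed to be one-dimensional with a single input channel, and the flattened vector entering the output layer is assumed to have 16 values.

```python
def conv_params(kernel_elements, in_channels, filters):
    """Weights plus one bias per filter: (k * c_in + 1) * filters."""
    return (kernel_elements * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit: (in + 1) * units."""
    return (in_features + 1) * units


layers = [
    ("conv, 64 filters, kernel 2", conv_params(2, 1, 64)),  # 1 input channel (assumption)
    ("dense, 32 units", dense_params(64, 32)),
    ("dense, 16 units", dense_params(32, 16)),
    ("output dense, 1 unit", dense_params(16, 1)),          # 16 flattened values (assumption)
]
for name, count in layers:
    print(f"{name}: {count}")
total = sum(count for _, count in layers)
print("total:", total)
```

Dropout, maxpooling and flatten layers contribute no parameters, which is why only four layers appear in the count.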



The tumor segmentation was applied with four different models: 2D U-Net, 2D U-Net++, 3D U-Net and 3D U-Net++. Both of the 2D models had the same sets of hyperparameters and both of the 3D models applied the same sets of hyperparameters. The 2D U-Net model followed the U-Net structure [ 37 ] and the 2D U-Net++ model used the U-Net++ structure [ 41 ] as published. For the 2D U-Net, each layer of the contracting path included two consecutive convolution blocks with a kernel size 3 X 3 and ReLU activation function, followed by a maxpooling layer with pool size 2 X 2 and stride size 2 X 2. Then a batch normalization with momentum 0.8 and a dropout layer with 0.1 dropout were added. The same structure was repeated for the complete contracting path and the filter sizes were 64, 128, 256, 512 respectively. The bridge layer between the encoder and decoder had two consecutive convolution blocks with kernel size 3 X 3 and filter size 512. On the expansive path, the filter sizes were 512, 256, 128, 64 respectively. Each layer of the expansive path had a transposed convolution block with kernel size 3 X 3 and stride size 2 X 2. After the concatenation layer, a dropout layer with 0.1 dropout was added. Finally, two consecutive convolution blocks with kernel size 3 X 3 and activation function ReLU were included. The final output convolution layer of the 2D U-Net had filter size 1, kernel size 1 X 1 and activation function sigmoid. The model was tested with variations of parameter values and the optimal parameter set was used for training the model. The model was trained for 60 epochs with batch size 8, using an Adam optimizer with a learning rate of 0.001. The U-Net++ layers had the same sets of hyperparameters as the 2D U-Net. Both models used a hybrid loss function that computed the Dice loss and the binary cross entropy separately and then added them together with a weight of 0.5 each. 
The 2D U-Net model used total 22,718,529 parameters and 22,718,529 of them were trainable whereas 1,920 of them were non-trainable. Similarly, the 2D U-Net++ model applied 22,498,881 parameters in total (22,496,961 trainable and 1,920 non-trainable). Tables 4 and 5 show the structures of the models. The 2D segmentation models were applied to the CjData [ 33 , 74 , 75 ].
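The hybrid loss described for the 2D models (Dice loss and binary cross entropy, each weighted 0.5) can be sketched in plain Python on flattened masks. This is a minimal illustration, not the authors' implementation: the smoothing constant, the clipping value `eps` and all function names are assumptions.

```python
import math


def dice_loss(y_true, y_pred, smooth=1.0):
    # 1 - Dice coefficient; `smooth` (assumed) avoids division by zero.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denominator = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def bce_loss(y_true, y_pred, eps=1e-7):
    # Mean binary cross entropy; predictions are clipped to keep log() finite.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def hybrid_loss(y_true, y_pred, w_dice=0.5, w_bce=0.5):
    # Weighted sum of the two losses, 0.5 each as described in the text.
    return w_dice * dice_loss(y_true, y_pred) + w_bce * bce_loss(y_true, y_pred)


mask = [0, 0, 1, 1]            # flattened ground-truth mask
good = [0.1, 0.2, 0.9, 0.8]    # prediction close to the mask
bad = [0.9, 0.8, 0.1, 0.2]     # prediction far from the mask

print("good prediction:", hybrid_loss(mask, good))
print("bad prediction: ", hybrid_loss(mask, bad))
```

The Dice term rewards overlap on the whole mask, while the cross entropy term penalizes each pixel independently, so a prediction close to the mask gives a lower combined loss than one far from it.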





The 3D U-Net and 3D U-Net++ models used the BRATS 2021 [ 77 – 82 ] dataset and followed the same structure as the original U-Net and U-Net++, with some additional modifications to accommodate the dimension changes for the 3D images. Again, the same set of hyperparameters was used for the 3D U-Net and the 3D U-Net++ models. The input size for the models was (128 X 128 X 4). Each layer of the contracting path had two convolution layers with kernel size 3 X 3 and the ReLU activation function, where each convolution layer was followed by a batch normalization layer with a momentum of 0.8. Then the contracting layer had a final maxpooling with pool size 2 X 2 to generate the input feature map for the next layer. The filter sizes of the contracting layers were 64, 128, 256, 512 respectively. The bridge layer between the contracting and expansive paths had a filter size of 1024, kernel size 3 X 3 and the ReLU activation function. Each layer of the expansive path had a transposed convolution layer with kernel size 2 X 2 and stride size 2 X 2. After the concatenation, there were two consecutive convolution layers with kernel size 3 X 3 and the ReLU activation function, each followed by a batch normalization layer. The filter sizes for the expansive path were 512, 256, 128 and 64 respectively. The final convolution layer of the model had filter size 1, kernel size 1 X 1 and used the sigmoid activation function. The 3D models were trained for 50 epochs with batch size 8, the Adam optimizer, a learning rate of 0.0001 and the Dice loss function. The 3D U-Net and 3D U-Net++ models had 31,055,873 (31,044,097 trainable and 11,776 non-trainable) and 24,266,977 (24,266,737 trainable and 240 non-trainable) parameters in total respectively. Tables 6 and 7 show the structures of the models.
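To make the encoder and decoder dimensions easier to follow, the sketch below traces the spatial size and channel count through the stages described above. It assumes that the (128 X 128 X 4) input means a 128 x 128 spatial grid with 4 channels and that the convolutions use "same" padding so only pooling and transposed convolutions change the spatial size. Both are assumptions, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(size=128, filters=(64, 128, 256, 512), bridge=1024):
    """Return (stage, spatial size, channels) along encoder, bridge and decoder."""
    trace = []
    for i, f in enumerate(filters, start=1):
        # 'same' padding (assumed) keeps the spatial size within a stage.
        trace.append((f"encoder {i}", size, f))
        size //= 2  # 2 x 2 max pooling halves each spatial side
    trace.append(("bridge", size, bridge))
    for i, f in enumerate(reversed(filters), start=1):
        size *= 2  # stride-2 transposed convolution doubles each side
        trace.append((f"decoder {i}", size, f))
    trace.append(("output", size, 1))  # 1 x 1 convolution with a single filter
    return trace


trace = unet_shapes()
for stage, size, channels in trace:
    print(f"{stage:10s} {size:>3d} x {size:<3d} channels={channels}")
```

With four pooling steps the 128 x 128 input shrinks to 8 x 8 at the bridge, and the four transposed convolutions restore it to 128 x 128 for the single-channel sigmoid output.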





Web application UI

The web application for the proposed brain tumor detection and segmentation system can be browsed from the home page as shown in Fig 8 . The home page includes a short description of the system and the options to either directly upload an image for evaluation or use the PACS to access medical images from hospitals. Registered users can log in to the system with their ID and password as shown in Fig 9 . Fig 10 shows the process for new users, who can register with an ID, given name, surname and password to use the system. If the user wants to directly access a medical image from the hospital system, they can choose the ‘Evaluate with PACS’ option on the home page, which leads to the PACS page as shown in Fig 11 . The user can search with a valid patient ID and select the study type to choose a medical image. Then they can enter the study ID and choose the model they want to apply to the image. Finally, after entering all fields, they can click on the ‘Evaluate’ button to evaluate the image with the chosen DL model and get the results.









After the user logs in to the system, they have three options—‘Detection’, ‘2D Segmentation’ and ‘3D Segmentation’. If they choose ‘Detection’, they can upload the image (in .jpg or .png or .jpeg or .dcm formats) for applying the brain tumor detection models and the result shows whether there is a tumor present in the image or not as shown in Fig 12 . After clicking on ‘Evaluate’, the chosen model is applied to the uploaded image and the prediction result is shown in the evaluation results as in Fig 13 with the decision (i.e. tumor or non-tumor) and the probability score of the prediction result. The user can click on ‘Show’ to check the result details in results as shown in Fig 14 and they can also provide feedback on the result using the ‘Feedback’ button. The feedback page is shown in Fig 15 and it allows the user to enter their feedback on the existence of tumor in the ‘State’ option and it allows them to include more details in the ‘Comment’ if necessary.
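The decision and probability score shown on the results page can be derived from the sigmoid output of the detection model. The sketch below is one plausible mapping, not the application's documented behavior: it assumes the model output is the probability of the tumor class, that the decision threshold is 0.5 and that the displayed score is the probability of the predicted class. The function name is hypothetical.

```python
def detection_decision(probability, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a label and the score of that label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= threshold:
        # Output is assumed to be P(tumor).
        return "tumor", probability
    # For the negative class, report P(non-tumor) = 1 - P(tumor).
    return "non-tumor", 1.0 - probability


for p in (0.93, 0.50, 0.12):
    label, score = detection_decision(p)
    print(f"model output {p:.2f} -> {label} (score {score:.2f})")
```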









If the user chooses the ‘2D segmentation’ option, they can upload the image (in .jpg, .png, .jpeg or .dcm format) and choose one of the two DL models (i.e. U-Net or U-Net++) as shown in Fig 16 . The evaluation results page shows the segmented tumor, the tumor area colored in red in the original image, the tumor to total brain area ratio and the segmentation confidence score. The user can evaluate another image, reevaluate the same image with another model or provide feedback on the evaluation as shown in Fig 17 . They can click on the ‘Show’ button to see the results in detail, as in Fig 18 . The same approach can be followed for 3D segmentation. The user can go to the 3D segmentation option, upload the images and choose one of the two models for segmentation as shown in Fig 19 . The only difference is that the 3D segmentation accepts four input files, one for each modality (i.e. T1, T2, T1ce and FLAIR), in NIfTI (i.e. .nii or .nii.gz) format. Fig 20 shows the results of 3D segmentation.
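The text does not describe how the tumor to total brain area ratio is computed. A minimal sketch, assuming both the predicted tumor and the brain region are available as binary masks of the same size, counts tumor pixels inside the brain and divides by the number of brain pixels. The function name and the toy masks are illustrative only.

```python
def tumor_to_brain_ratio(tumor_mask, brain_mask):
    """Both masks are equal-sized 2D lists of 0/1 values."""
    tumor_pixels = 0
    brain_pixels = 0
    for tumor_row, brain_row in zip(tumor_mask, brain_mask):
        for t, b in zip(tumor_row, brain_row):
            if b:
                brain_pixels += 1
                if t:
                    # Count tumor pixels only where they fall inside the brain.
                    tumor_pixels += 1
    if brain_pixels == 0:
        raise ValueError("brain mask is empty")
    return tumor_pixels / brain_pixels


brain = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]  # 12 brain pixels
tumor = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]  # 3 tumor pixels

ratio = tumor_to_brain_ratio(tumor, brain)
print(f"tumor covers {ratio:.0%} of the brain area")
```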











The user can provide feedback using the ‘Feedback’ option. They can mark the state of the result as ‘Area should be found (FN)’ for false negative outputs and ‘Area should not be found (FP)’ for false positive results. They can also provide or draw a contour of the tumor area in case the segmented area is not completely accurate. They can include any other comments they might have in the ‘Comment’ section as shown in Fig 21 . The user can also see all the evaluations they have run on the ‘My evaluations’ page, as shown in Fig 22 , with the evaluation ID, date and time of the evaluation, the models used and the number of images used. They can also check the results from the ‘Show’ option.






Table 8 shows the tumor detection performance for MRI image input and image feature inputs when classifying MRIs into tumor and non-tumor classes. The model with direct MRI image input achieved more than 95% accuracy in the training phase and more than 82% accuracy in the validation phase. The accuracy in the testing phase declined to 70%, but that was still a high accuracy for tumor detection. The results for tumor detection with each feature set separately and with the combination of all features collected from the images showed a different outcome. The training and validation accuracies varied between 68% and 80%, with the validation accuracy sometimes higher and the training accuracy higher at other times. But in all cases, the prediction accuracy for the testing data (i.e. new data that was not used during training or validation) was much higher and varied between 80% and 93%. The features separately collected from the images were able to classify the tumor/non-tumor images better than the features automatically generated from the image by the CNN model.



Although the training and validation accuracies for the model that directly used the MR image as input were visibly higher than those of the feature-based models, the prediction accuracy shows that the CNN models with feature inputs were able to predict the tumorous and non-tumorous images with at least 4% to 16% higher accuracy. The detection models with extracted feature inputs performed better than the model with direct image input because the amount of data was not sufficient and the quality of the input images was not consistent. Some of the input images have visible distinguishing characteristics (i.e., edges, boundaries, solid areas, etc.) between the tumor area and the rest of the image, but that was not the case for all images. Hence, the prediction accuracy of the CNN model using the images as inputs for the detection task was slightly lower than the prediction accuracy of the model with separate and/or all features. The model was able to predict the existence or absence of a brain tumor from all features more accurately than from separate feature sets. For this reason, the trained model for image detection with all features was added as one of the user choices for classifying the image. The prediction accuracy comparisons also show that the intensity features contributed to the prediction task better than the other separate sets of features, which is why the intensity feature based prediction model was also added as an option to the UI. Fig 23 shows the comparison in a graph where the red line represents the prediction accuracy, the green line represents the training accuracy and the yellow line represents the validation accuracy for the MRI CNN features, intensity features, GLCM features, DWT features, other features and all (i.e. intensity, GLCM, DWT, other) features.



Table 9 and Fig 24 show performance evaluations for the 2D and 3D MRI segmentations with U-Net and U-Net++ models. The 2D segmentations achieved training and validation Dice scores of more than 84% and 61% respectively. The prediction Dice scores on the test dataset were higher than the validation scores and varied from 80% to almost 82%. The 3D segmentation models achieved higher Dice scores for training, validation and testing data; all scores were higher than 90%, reaching more than 96%. The 3D U-Net models achieved Dice scores on the test dataset that were about the same as or up to 2% higher than those for training and validation, whereas the 3D U-Net++ models showed almost no difference between training, validation and testing data. The results showed that the U-Net models performed slightly better in tumor segmentation than the U-Net++ models for both 2D and 3D data. The training and validation Dice scores were very similar in most cases, and the prediction Dice scores showed the differences between the models more clearly for both 2D and 3D data; nevertheless, the Dice scores for all implemented models were comparable to those reported for DL-based brain tumor segmentation. Fig 25 shows a more comprehensive plot based on the average Dice scores of the test data prediction of the brain tumor segmentation models. The bar plot represents the performance differences between the 2D and the 3D models, which varied between 14% and 16%, showing that the 3D models performed better. Another interesting observation on the U-Net and U-Net++ model performances was the similarity between them. Although the U-Net model performed better than the U-Net++ in both 2D and 3D images, the difference was less than 2%.







The differences in results between the 2D DNNs and 3D DNNs may be caused by the differences between the 2D and 3D datasets. The 2D dataset contains 3064 T1-weighted 2D MRIs combining images of three different anatomical planes. The images also have different intensities for tumor and non-tumor regions, and these intensities are not consistent. In some images the tumor region intensity is higher than the rest of the image, in some images it is lower, and in others it is not distinguishable. The 2D dataset has images that include the skull, brain and tumor, and the differences between these are not prominent. The 3D dataset is a processed benchmark dataset containing 3D MRIs that are all in axial view with skull-stripped brain images, and the tumor regions are distinguishable in almost all files. The 2D slices collected from the 3D files also provide more specific and consistent information on the tumor region. Hence, the tumor segmentation from the 3D dataset with U-Net and U-Net++ achieved noticeably higher performance than U-Net and U-Net++ on the 2D dataset. Table 10 compares the Dice scores of whole tumor segmentation for the models implemented in this research with those of the baseline U-Net [ 84 ] and U-Net++ [ 51 ] on the BRATS 2021 dataset. The results show that the implemented models achieved almost 3% to 10% higher Dice scores in segmenting the whole tumor from the BRATS 2021 dataset for both U-Net and U-Net++ models. The hyperparameter tuning and hybrid loss function substantially improved the performance of the implemented models compared to the baseline models.



Figs 26 and 27 show some sample input MRIs, the tumor and the predicted tumor marked in red for 2D U-Net and 2D U-Net++ respectively. Similarly, Figs 28 and 29 show some sample input MRIs, the tumor and the predicted tumor for 3D U-Net and 3D U-Net++ respectively. Although most of the segmentation outputs segmented the tumors properly, there were also some cases where the models i) were not able to detect the tumor, ii) detected a tumor area even though there was no tumor, iii) segmented the tumor and some extra area, or iv) segmented only a part of the tumor area. Cases i and ii were the worst case scenarios, where the models were unable to focus on the tumor region of the test images. For example, in Fig 26 , (a) and (b) show proper tumor area segmentation, whereas (c) shows some missing part in the segmented tumor compared to the ground truth and (d) shows some extra region at the opposite side of the brain segmented as tumor area. The examples for 2D U-Net++ show some more variations. Fig 27 shows proper segmentation in (a) and (b). But the segmentation example in (c) shows that the model could not detect the tumor area and detected another nearby but non-overlapping area as the tumor. The example in (d) shows one of the worst case scenarios, where the model detected the complete image as the tumor, which is obviously incorrect. These examples support the Dice score results showing that the 2D U-Net model performed better in tumor segmentation than the 2D U-Net++ model. Figs 28 and 29 were also consistent with the Dice scores. Both 3D segmentation models clearly provided better segmentation than the 2D models in most cases. The images show some slices from the 3D test images with their segmentations and, in line with their Dice scores, the 3D U-Net segmented the tumors more accurately than the 3D U-Net++ model. Fig 28 (a) and (b) show that the segmented tumor areas are exactly the same as the ground truth. (c) shows that although there was no tumor in the MRI, the segmentation output marks a few nearby pixels at the lower left part of the image as the tumor. In (d), the 3D U-Net model segmented a small nearby area as tumor in addition to the original tumor. Finally, Fig 29 (a) shows an accurate tumor segmentation output of 3D U-Net++, but the segmented tumor regions in (b) and (c) are slightly rounder at the edges, including a few more nearby pixels as part of the tumors. So, for these two examples, the model detected very similar but slightly bigger supersets of the tumor area pixels. On the other hand, the pixels present in the segmented output of (d) are mostly different from the ground truth pixels. Although there are a few overlaps between the pixels of the ground truth and the segmented output, the segmented output mostly marked different pixels from a similar area of the image as tumor. Each test dataset had hundreds of test images and corresponding outputs, and these are just some examples. But these examples show most of the variations seen in the complete test outputs for all 2D and 3D segmentation models.
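The failure cases i) to iv) map onto the Dice score in a simple way. The sketch below uses toy one-dimensional binary masks, not data from the paper, and a hypothetical `dice_score` helper that treats two empty masks as perfect agreement, which is a convention rather than something stated in the text.

```python
def dice_score(truth, pred):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for t, p in zip(truth, pred) if t and p)
    total = sum(1 for t in truth if t) + sum(1 for p in pred if p)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed)
    return 2.0 * intersection / total


tumor = [0, 0, 1, 1, 1, 1, 0, 0]   # toy ground truth with a 4-pixel tumor
no_tumor = [0] * 8                 # toy tumor-free ground truth

cases = [
    ("exact match", tumor, [0, 0, 1, 1, 1, 1, 0, 0]),
    ("i) tumor not detected", tumor, [0] * 8),
    ("ii) tumor found in tumor-free image", no_tumor, [0, 1, 0, 0, 0, 0, 0, 0]),
    ("iii) tumor plus extra area", tumor, [0, 1, 1, 1, 1, 1, 1, 0]),
    ("iv) only part of the tumor", tumor, [0, 0, 1, 1, 0, 0, 0, 0]),
    ("whole image marked as tumor", tumor, [1] * 8),
]

scores = {name: dice_score(truth, pred) for name, truth, pred in cases}
for name, score in scores.items():
    print(f"{name:38s}{score:.3f}")
```

Cases i and ii score 0 because the prediction and the ground truth share no pixels, which matches their description as the worst case scenarios, while over- and under-segmentation lower the score only in proportion to the mismatched area.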









Medical image analysis is a popular non-invasive way of diagnosing diseases and hence helps medical professionals in their assessments. Researchers from various fields have been trying to improve this analysis process by applying different methods to develop more accurate automated processes and systems using medical images from different organs and body parts. As the brain is the most complex organ and controls most of the functions of our bodies, neurological disease analysis from medical images of the brain (i.e. MRI, CT, PET etc.) is a well-explored research area. In this paper, we propose a complete web application to detect the existence of a brain tumor and to segment the tumor area from medical images such as brain MRIs, providing a primary and precise scanning phase to help medical professionals. The proposed web application provides a complete automated system to upload brain medical images, analyze the uploaded images with different types of operations implemented by DL models, and allow feedback from medical professionals to be incorporated in the future training of the models.

The web application allows users to upload medical images directly or to use existing medical images from hospital databases through PACS, and to apply three types of operations: tumor detection, 2D tumor segmentation and 3D tumor segmentation. Tumor detection can be used with an image or with image features, and CNN models process the inputs to generate a decision (tumor or no tumor) with a probability score indicating the confidence of the prediction. For both 2D and 3D tumor segmentation, the user can upload 2D or 3D brain images and choose either the U-Net or the U-Net++ model to apply the segmentation. The segmentation results show the segmented tumor, the ratio of the tumor area and the confidence score of the tumor segmentation process. The tumor detection achieved more than 90% accuracy for some features and a few of the segmentation models achieved Dice scores of around 96%. The application also provides a feedback option for healthcare professionals to comment on the detection and segmentation through text, contouring and checkbox inputs, to reduce the limitations of the results. The current application has some restrictions on the input image types for each operation due to the training of the DL models. The system works with a few popular medical image based DL models such as CNN, U-Net and U-Net++. But the architecture of the application allows any detection or segmentation model to be added as needed. So, possible future work for the proposed system would be to extend it to accept images of any dimension and format, to add more DL models as options for the users to apply to the images and to include more detailed properties of the tumors in the results. Further analysis of the MRIs to detect and segment separate tumor tissues, computation of various tumor features (i.e. spatial, biological, etc.) and tumor/cancer severity prediction will be added in future extensions of this research.

  • 1. Houston Methodist. Is Your Brain a Muscle?. 2022. Available from: https://www.houstonmethodist.org/blog/articles/2021/may/is-your-brain-a-muscle/ .
  • 2. The Johns Hopkins University, The Johns Hopkins Hospital, and Johns Hopkins Health System. Brain Anatomy and How the Brain Works. 2022. Available from: https://www.hopkinsmedicine.org/health/conditions-and-diseases/anatomy-of-the-brain#:~:text=The%20brain%20is%20a%20complex,central%20nervous%20system%2C%20or%20CNS .
  • 3. National Cancer Institute. tumor. 2022. Available from: https://www.cancer.gov/publications/dictionaries/cancer-terms/def/tumor .
  • 4. National Institutes of Health. What Is Cancer?. 2022. Available from: https://www.cancer.gov/about-cancer/understanding/what-is-cancer .
  • 5. American Society of Clinical Oncology (ASCO). Brain Tumor: Statistics. 2022. Available from: https://www.cancer.net/cancer-types/brain-tumor/statistics .
  • 6. Live Science. The 10 deadliest cancers, and why there’s no cure. 2022. Available from: https://www.livescience.com/11041-10-deadliest-cancers-cure.html#:~:text=Pancreatic%20cancer%20begins%20in%20the,the%20deadliest%20of%20the%20bunch .
  • 7. International Agency for Research on Cancer. Brain, central nervous system—Global Cancer Observatory. 2022. Available from: https://gco.iarc.fr/today/data/factsheets/cancers/31-Brain-central-nervous-system-fact-sheet.pdf .
  • 8. National Brain Tumor Society. Brain Tumor Facts. 2022. Available from: https://braintumor.org/brain-tumors/about-brain-tumors/brain-tumor-facts/ .
  • 10. American Cancer Society, Inc. Tests for Brain and Spinal Cord Tumors in Adults. 2022. Available from: https://www.cancer.org/cancer/brain-spinal-cord-tumors-adults/detection-diagnosis-staging/how-diagnosed.html#:~:text=Magnetic%20resonance%20imaging%20(MRI)%20and,tumor%2C%20if%20one%20is%20present .
  • 11. American Society of Clinical Oncology (ASCO). Brain Tumor: Diagnosis. 2022. Available from: https://www.cancer.net/cancer-types/brain-tumor/diagnosis .
  • 13. Strobbe G. Advanced forward models for EEG source imaging. Doctoral dissertation, Ghent University, 2015.
  • 23. SICAS Medical Image Repository. BRATS2015. 2022. Available from: https://www.smir.ch/BRATS/Start2015 .
  • 24. Khan MU, Khan H, Arshad A, Baloch NK, Shaheen A, Tariq F. Brain tumor detection based on magnetic resonance imaging analysis Using segmentation, thresholding and morphological operations. In 2021 6th International multi-topic ICT conference (IMTIC) 2021 Nov 10 (pp. 1-6). IEEE.
  • 26. Hasanah U, Sigit R, Harsono T. Classification of brain tumor on magnetic resonance imaging using support vector machine. In 2021 International electronics symposium (IES) 2021 Sep 29 (pp. 257-262). IEEE.
  • 28. Kaggle. BraTS 2017. 2022. Available from: https://www.kaggle.com/datasets/abdullahalmunem/brats17 .
  • 33. Figshare. brain tumor dataset. 2022. Available from: https://figshare.com/articles/dataset/brain_tumor_dataset/1512427 .
  • 34. Radiopaedia. Radiopaedia. 2022. Available from: https://radiopaedia.org/ .
  • 37. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention 2015 Oct 5 (pp. 234-241). Springer, Cham.
  • 41. Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J. Unet++: A nested u-net architecture for medical image segmentation. In Deep learning in medical image analysis and multimodal learning for clinical decision support 2018 Sep 20 (pp. 3-11). Springer, Cham.
  • 42. Hou A, Wu L, Sun H, Yang Q, Ji H, Cui B, et al. Brain segmentation based on UNet++ with weighted parameters and convolutional neural network. In 2021 IEEE International conference on advances in electrical engineering and computer applications (AEECA) 2021 Aug 27 (pp. 644-648). IEEE.
  • 43. Roth J, Keller J, Franke S, Neumuth T, Schneider D. Multi-plane UNet++ ensemble for glioblastoma segmentation. In International MICCAI brainlesion workshop 2022 (pp. 285-294). Springer, Cham.
  • 44. Google Research. EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling. https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html .
  • 46. Luu HM, Park SH. Extending nn-UNet for brain tumor segmentation. In Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries: 7th international workshop, BrainLes 2021, held in conjunction with MICCAI 2021, virtual Event, September 27, 2021, Revised Selected Papers, Part II 2022 Jul 15 (pp. 173-186). Cham: Springer international publishing.
  • 48. Zeineldin RA, Karar ME, Burgert O, Mathis-Ullrich F. Multimodal CNN networks for brain tumor segmentation in MRI: a BraTS 2022 challenge solution. arXiv preprint arXiv:2212.09310. 2022 Dec 19.
  • 55. Abd-Ellah MK, Awad AI, Khalaf AA, Hamed HF. Design and implementation of a computer-aided diagnosis system for brain tumor classification. In 2016 28th International conference on microelectronics (ICM) 2016 Dec 17 (pp. 73-76). IEEE.
  • 56. Boudjella A, Boudjella MY, Bellebna B. Machine learning KNN classification an approach on detecting abnormality in brain tissues MRI-graphic user interface. In 2022 7th International conference on image and signal processing and their applications (ISPA) 2022 May 8 (pp. 1-6). IEEE.
  • 57. Ucuzal H, YAŞAR Ş, Çolak C. Classification of brain tumor types by deep learning with convolutional neural network on magnetic resonance images using a developed web-based interface. In 2019 3rd international symposium on multidisciplinary studies and innovative technologies (ISMSIT) 2019 Oct 11 (pp. 1-5). IEEE.
  • 58. Khan IU, Akhter S, Khan S. Detection and classification of brain tumor using support vector machine based GUI. In 2020 7th International conference on signal processing and integrated networks (SPIN) 2020 Feb 27 (pp. 739-744). IEEE.
  • 59. Surya S, Aurelia S. A machine learning entrenched brain tumor recognition framework. In 2022 International conference on electronics and renewable systems (ICEARS) 2022 Mar 16 (pp. 1372-1376). IEEE.
  • 65. Deepa G, Mary GL, Karthikeyan A, Rajalakshmi P, Hemavathi K, Dharanisri M. Detection of brain tumor using modified particle swarm optimization (MPSO) segmentation via haralick features extraction and subsequent classification by KNN algorithm. Materials today: proceedings. 2022 Jan 1;56:1820-6.
  • 68. Huff T, Mahabadi N, Tadi P. Neuroanatomy, visual cortex. In StatPearls [internet] 2021 Jul 31. StatPearls publishing.
  • 71. Kaggle. Brain MRI Images for Brain Tumor Detection. 2022. Available from: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection .
  • 72. TensorFlow. tf.keras.preprocessing.image.ImageDataGenerator. 2022. Available from: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator .
  • 73. Jinja Pallets. 2023. Available from: https://palletsprojects.com/p/jinja/
  • 76. CBICA. RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge 2021. 2022. Available from: http://braintumorsegmentation.org/ .
  • 77. BraTS Continuous Evaluation. 2023 Sage Bionetworks. 2023. Available from: https://www.synapse.org/#!Synapse:syn27046444/wiki/616992 .
  • 79. Baid U, Ghodasara S, Mohan S, Bilello M, Calabrese E, Colak E, et al. The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv preprint arXiv:2107.02314. 2021 Jul 5.
  • 83. Python Python software foundation. 2023. Available from: https://www.python.org/
  • Open access
  • Published: 26 April 2023

An early detection and segmentation of Brain Tumor using Deep Neural Network

  • Mukul Aggarwal 1 ,
  • Amod Kumar Tiwari 2 ,
  • M Partha Sarathi 3 &
  • Anchit Bijalwan 4  

BMC Medical Informatics and Decision Making, volume 23, Article number: 78 (2023)


Magnetic resonance image (MRI) brain tumor segmentation is crucial in the medical field, as it can help in diagnosis and prognosis, overall growth predictions, Tumor density measures, and care plans needed for patients. The difficulty in segmenting brain Tumors is primarily due to the wide range of structures, shapes, frequency, position, and visual appearance of Tumors, such as intensity, contrast, and visual variation. With recent advancements in Deep Neural Networks (DNN) for image classification tasks, intelligent medical image segmentation is an exciting direction for Brain Tumor research. DNNs require a lot of time and processing capability to train because of gradient diffusion difficulties and their complexity.

To overcome the gradient issue of DNN, this research work provides an efficient method for brain Tumor segmentation based on the Improved Residual Network (ResNet). Existing ResNet can be improved by maintaining the details of all the available connection links or by improving projection shortcuts. These details are fed to later phases, due to which improved ResNet achieves higher precision and can speed up the learning process.

The proposed improved ResNet addresses all three main components of the existing ResNet: the flow of information through the network layers, the residual building block, and the projection shortcut. This approach minimizes computational costs and speeds up the process.

An experimental analysis of the BRATS 2020 MRI sample data reveals that the proposed methodology achieves competitive performance relative to traditional methods such as CNN and the Fully Convolutional Network (FCN), with improvements of more than 10% in accuracy, recall, and F-measure.



Brain Tumor segmentation and detection are very challenging in the medical imaging area. Various DNN methods are used for Tumor segmentation, utilizing multiple deep-learning network architectures. The processing of medical images plays a crucial role in assisting humans in identifying different diseases [ 1 ]. Classification of brain Tumors is a significant task that depends on the expertise and knowledge of the physician. An intelligent system for detecting and classifying brain Tumors is essential to help physicians. Gliomas, which have irregular shapes and ambiguous boundaries, are the most challenging Tumors to detect. Various authors have studied deep learning networks for healthcare applications, e.g., Convolutional neural networks (CNNs), LinkNet, Visual Geometry Group (VGG) networks, UNet, and SegNet [ 2 ].

Image segmentation poses significant challenges, including categorization, image processing, object recognition, and explanation. For example, an image classification model must function with high precision even when subjected to occlusion, lighting changes, varying viewing angles, and other factors [ 3 ].

The conventional object detection process, with its primary feature extraction step, is unsuitable for rich environments. Sometimes even domain experts cannot provide a single feature or a set of features capable of achieving accurate results under varying conditions. The concept of model training emerged in response to this kind of problem: the appropriate features for working with image data are learned automatically [ 4 ].

Content-based image retrieval (CBIR) covers various imaging modalities, such as CT, MR, PET, X-rays, and Ultrasound. The large amount of image data available because of different scan parameter settings and multiple views of the same pathology makes image retrieval in the medical domain challenging. However, at the same time, it is one of the essential applications [ 5 ]. MR images are taken from three different directions. These views are called sagittal, axial, and coronal [ 6 ]. For CBIR to be used in healthcare as a diagnostic aid, the medical information framework must be robust in various scenarios to be accepted by clinicians and medical practitioners [ 7 ].

First, case-based reasoning will be more acceptable to the medical community when the retrieval engine results in cases with exact locations and similar pathology responding to a query (new) case [ 8 ].

This will significantly help the medical expert have more information about the case and aid the expert in monitoring. Secondly, the database formed for testing purposes should be carefully built consisting of cases from multiple views, different scanning parameters, and acquired from different imaging modalities. CNN has been used to segment Tumors in multi-modal Imaging [ 8 ].

The CNN architecture is sophisticated, combining segmentation and classification into a single pipeline. Current segmentation methods have been designed to solve the reduplication issue of CNNs by assigning a target class to each pixel. A CNN model has been transformed into an FCN (Fully CNN). The key contributions of this article to brain Tumor research are as follows:

This research develops the ResNet Model to address the weaknesses of CNN and FCN methodologies and to reduce computational costs. The principle of ResNet is to add a layer’s output to its input.

The simple transformation used in Enhanced ResNet mainly improves the training process of Convolutional models by utilizing the “shortcut links.” These links provide all the possible route details in a single place and provide access in a single click reducing the accessing time.
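The shortcut idea can be illustrated with a toy residual block. The sketch below is not the Improved ResNet proposed in the article: it uses a single fully connected transformation instead of convolutions, and the function names and weights are illustrative. It shows an identity shortcut, where the input is added unchanged, and a projection shortcut, where a matrix maps the input to the output dimension before the addition.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def residual_block(x, weights, projection=None):
    """Compute relu(F(x) + shortcut(x)) with F(x) = weights @ x."""
    fx = matvec(weights, x)
    # Identity shortcut by default; projection shortcut when dimensions differ.
    shortcut = x if projection is None else matvec(projection, x)
    if len(fx) != len(shortcut):
        raise ValueError("F(x) and the shortcut must have the same length")
    return relu([a + b for a, b in zip(fx, shortcut)])


# With zero weights the block reduces to relu(x): the input still flows through.
print(residual_block([1.0, -2.0], [[0.0, 0.0], [0.0, 0.0]]))

# Projection shortcut: map a 2-vector to 3 dimensions before the addition.
zero_weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(residual_block([1.0, 2.0], zero_weights, projection))
```

Because the input is added back after the transformation, the block only has to learn a correction to its input, and gradients can pass through the addition directly, which is the usual motivation given for shortcut connections.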

The complete research article is organized as follows: Section 1 covers the introduction, Section 2 covers existing Tumor segmentation work related to this research, Section 3 covers materials and methods, Section 4 covers results, Section 5 covers the discussion and Section 6 covers the conclusion and future direction of the research.

Related works

The field of Tumor segmentation is under continuous investigation. Deep learning has recently proven effective in healthcare image segmentation and information extraction, and pixel-based classification is the latest trend among deep learning techniques. Various researchers have suggested different methods for brain Tumor segmentation. This section reviews some of the key studies.

Research [ 9 ] presents brain Tumor segmentation using a DNN. Brain Tumors are segmented on magnetic resonance images of the brain using a deep convolutional encoder model. This approach enhances learning by extracting features from complete images, eliminating patch selection, and improving calculations at adjacent intersections. Research [ 10 ] presented a technique for the early detection of brain cancers. Magnetic resonance images were examined to identify Tumor-bearing areas and classify them into various categories. Because deep learning performs efficiently in image classification, the Fully Convolutional Network technique was applied and implemented with the TensorFlow library in that research. The newer CNN technique was reported to reach a precision of 91 percent, which is better than previous research.

Research [ 11 ] developed a model that uses brain imaging to recognize the nature of brain Tumors. A two-dimensional CNN was used to identify malignant Tumors with an accuracy of 93 percent. The analysis includes data for the four most frequently detected brain Tumors.

Research [ 12 ] proposed a responsive and efficient Tumor segmentation framework. Within a cascaded classification model, this strategy reduces computation time and addresses the problem of overfitting. The CNN architecture extracts global and local features in two separate ways. Additionally, the Tumor detection precision is significantly enhanced compared to current algorithms. The proposed approach achieved average Dice scores of 92.3%, 94.5%, and 93.2% for the whole Tumor (WT), enhancing Tumor, and Tumor core, respectively.

Research [ 13 ] developed a model to evaluate Tumors using an MRI dataset. It entails finding cancer, grading it by size and type, and determining the Tumor’s position. Instead of using a separate approach for each classification task, this strategy used a single model to classify MRI images across several classification tasks.

Research [ 14 ] proposed brain Tumor identification and segmentation by integrating two methods. The first was a local binary pattern method based on the distance relation between neighbors, termed “nLBP.” The second was based on the angle between neighbors, called “αLBP.” The two techniques were developed to process and analyze MRI images of the most prevalent cancers: Glioblastoma, malignant Tumors, and gland Tumors. For feature extraction, the statistics of the pre-processed images were employed. The proposed model outperformed conventional feature extraction strategies.

Research [ 15 , 16 ] applied brain Tumor segmentation using the RELM (“Regularized Extreme Learning Machine”). The procedure first normalized the images to make learning easier for the framework. The framework used a min-max strategy in the pre-processing phase, which significantly improved the brightness of the original images.

Research [ 17 ] proposed a convolutional perceptron neural network-based segmentation approach built on an improved Whale Optimization Algorithm (WOA). For better feature extraction and segmentation, the hybrid algorithm used an updated form of WOA. Mean filtering was first used to remove noise from the data. The improved WOA (IWOA) was used to select features from the extracted features. The MLP-IWOA-based classifier was used to classify Tumors and outperformed several existing approaches.

Research [ 18 ] combined significant statistical features with CNN architectures to create a technique for the segmentation of brain Tumors. The architecture concentrated on the Tumor’s boundary. Two-dimensional wavelet decomposition, Gabor filters, and similarity measures were used to extract features from the image. A significant feature for further classification was developed by combining these statistical properties.

Research [ 19 ] noted that cancer is among the most severe diseases and is considered challenging to treat. Pancreatic cancer is a malignancy that develops in the cells of the pancreas, an organ located behind the lower part of the stomach that aids digestion. The therapy for this Tumor depends on its stage of growth. The Tumor is detected by identifying the afflicted region in the CT scan data. The approach predicts the Tumor region using a Gaussian Mixture Model, the Expectation-Maximization method, and a CNN [ 20 ].

Materials & Methods

This section covers the essential methods used in this research and explains how the proposed improved ResNet method works.

Convolution Neural Network

A CNN is a deep learning approach mainly used to classify images. It is an artificial neural network designed to analyze input arranged in a grid. The convolution process inside the convolution layer is a mathematical matrix operation that multiplies the filter matrix with the region of the image being analyzed. This convolution operation is the first and most significant processing phase [ 21 ].

Figure 1 shows the architecture of a CNN, which has three types of layers: convolutional, pooling, and fully connected. The pooling layer takes the maximum or average value of the pixels in each image region. A CNN is capable of learning advanced features by creating feature maps.

figure 1

Architecture of Convolution Neural Network (CNN)

The network constructs many feature maps: each convolution kernel is slid across its input, and the features it detects are recorded on the feature map. These maps are sent to the max-pooling layer, which keeps the most important features while discarding the rest. In the fully connected layer, the features from the max-pooling layer are turned into a 1-D feature vector, which is used to determine the output [ 22 ]. Image scalability is not possible in a traditional neural network model.

However, in a CNN model, the image can be scaled (that is, it can go from a 3-dimensional input volume to a 3-dimensional output volume). The CNN model comprises input, convolution, rectified linear unit (ReLU), pooling, and fully connected layers. The input images are split into small regions inside the convolution operation. The ReLU layer performs element-wise activation. The pooling layer is optional and can be used or skipped; when present, it is mainly used for down-sampling.

In the last stage (i.e., the fully connected layer), a category or class score is produced, based on 0 and 1. The CNN-based brain Tumor segmentation process is divided into training and testing rounds. All images are classified into categories such as Tumor and non-Tumor images [ 23 ].

Algorithm 1: CNN-based brain Tumor segmentation process.

Input: Brain Tumor images dataset

Output: Images are segmented into Tumor and non-Tumor images.

Step 1: Apply a convolutional filter in the first layer.

Step 2: Smooth the convolutional filter output to lower its sensitivity; this is called “sub-sampling.”

Step 3: All signal transmissions from one layer to the next are regulated by activation blocks.

Step 4: Use the rectified linear unit to shorten the training process.

Step 5: Each neuron in the previous layer is linked to every neuron in the subsequent layer.

Step 6: At the end of the learning process, a loss layer is applied to provide feedback to the CNN architecture.
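The forward steps of Algorithm 1 (convolution, rectified linear activation, sub-sampling, and a fully connected layer) can be sketched in NumPy. This is a minimal illustration under stated assumptions and not the implementation used in this research: the function names, the 8×8 random input, the single 3×3 filter, and the two-class output are all hypothetical choices made for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter with the image region element-wise and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling (sub-sampling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(image, kernel, weights, bias):
    """Convolution -> ReLU -> pooling -> flatten -> fully connected -> class scores."""
    features = max_pool(relu(conv2d(image, kernel)))
    flat = features.reshape(-1)
    return softmax(weights @ flat + bias)

# Hypothetical toy input: 8x8 image -> 6x6 after conv -> 3x3 after pooling -> 9 features.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
weights = rng.standard_normal((2, 9))   # two classes: Tumor / non-Tumor
bias = np.zeros(2)
probs = forward(image, kernel, weights, bias)
print(probs)
```

The loss layer of Step 6 and the weight updates are omitted here; the sketch only shows how data flows through the layers.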

Fully Convolutional Network (FCN)

In research [ 24 ], the FCN was proposed as a solution to semantic segmentation. The researchers considered AlexNet, VGGNet, and GoogLeNet as candidate networks. They converted these networks from classifiers to dense FCNs by replacing the fully connected layers with convolutional layers and appending a (1 × 1) convolution with channel dimension 21 to predict scores for each class, including background. An FCN can efficiently learn to make dense predictions for per-pixel tasks such as semantic segmentation [ 24 ].

Figure 2 shows the working of the FCN architecture for image segmentation. Each layer in an FCN is a 3-D array whose size is given by its height, width, and feature (channel) dimension. The image is the first layer, with the pixel information arranged by height, width, and colour channels. Locations in higher layers correspond to the image regions they are path-connected to, which are called their receptive fields.

figure 2

FCN Architecture

The significant elements of FCN that contributed to state-of-the-art outcomes are the VGG16 backbone, bilinear interpolation for up-sampling the final feature map, and skip connections that combine low-layer and high-layer features in the final layer for fine-grained segmentation. FCN uses only local information for segmentation.

However, relying only on local information makes semantic segmentation ambiguous because the image’s global semantic context is lost. Contextual information from the entire image helps reduce this ambiguity. U-Net and V-Net are the most popular FCN architectures widely used in image segmentation [ 25 , 26 ].

Proposed model based on Residual Learning Network

This work describes the freely available MRI brain Tumor datasets for medical image analysis and outlines the performance indicators for evaluating deep learning segmentation models.

To address existing challenges, the proposed method uses an advanced pre-processing approach to eliminate much irrelevant data, resulting in impressive outcomes compared with the current convolutional neural network.

The proposed strategy avoids complicated segmentation methods for locating the brain Tumor and extracting features, since such methods are time-consuming and have a high error rate.

ResNet was chosen for the proposed work because it avoids the gradient issues that affect various deep learning models. The vanishing gradient problem occurs during the training of a CNN: as learning continues, the gradients of the earlier layers shrink toward zero. A ResNet can be used to address this problem. In ResNet, the output of a residual layer is combined with its direct input to form the input of the next layer [ 27 , 28 , 29 ]. Let H(RX) denote a residual mapping to establish a deep residual block, as shown in Fig. 3 .

figure 3

ResNet working structure

Consider a CNN block with RX as input whose main objective is to learn the true distribution H (RX). The difference between the output and the input is the “Residual learning value (RL),” as described in equation 2 :

\(RL = H(RX) - RX\)

where H (RX) represents the actual outcome, RL represents the Residual learning value, and RX represents the input. To overcome the gradient issue of DNNs, this research provides an efficient method for brain Tumor segmentation.
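Reading the residual learning value as the difference between the block output H(RX) and the input RX, the block output is RL + RX. The NumPy sketch below illustrates this with a hypothetical residual branch made of two linear layers and a ReLU; the function name, weights, and shapes are assumptions chosen for the example and are not taken from the proposed model.

```python
import numpy as np

def residual_block(rx, w1, w2):
    """Return H(RX) = RL + RX for a toy two-layer residual branch."""
    rl = np.maximum(rx @ w1, 0.0) @ w2   # residual branch output, RL
    return rl + rx                        # shortcut link adds the input back

rx = np.array([[1.0, -2.0, 3.0]])

# With zero weights the residual branch contributes nothing,
# so the block reduces to an identity mapping of its input.
zeros = np.zeros((3, 3))
identity_out = residual_block(rx, zeros, zeros)

# With identity weights the branch outputs ReLU(RX), which is added to RX.
eye = np.eye(3)
out = residual_block(rx, eye, eye)
print(identity_out, out)
```

Because the input is added back unchanged, the gradient always has a direct path through the shortcut, which is the property the text relies on to address the gradient issue.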

The Proposed Improved ResNet Model Working

Segmentation is based on the Improved Residual Learning Network (ResNet). The existing ResNet can be improved by maintaining the details of all the available connection links. The proposed ResNet uses a skip connection in which the initial input data is combined with the output of the convolution block. This addresses the vanishing gradient problem by providing an additional path for the gradient to flow through. The proposed method also uses an identity function that allows a higher layer to perform as well as a lower layer. The proposed model uses pre-processing, data segmentation, and post-processing phases [ 30 , 31 , 32 ].

Figure 4 presents the working of the proposed ResNet model. In the improved ResNet, the complete process is divided into four phases.

figure 4

( A ) Long Skip Connection process in ResNet, ( B ) ResNet Bottleneck Block process, ( C ) ResNet Basic Block Working, and ( D ) ResNet Simple Block Working

In past research, numerous ResNet configurations were suggested, including ResNet-18, ResNet-34, ResNet-50, and ResNet-152. Each stage of a ResNet consists of several building blocks. In such implementations, identity and convolutional blocks are combined to produce the Improved ResNet structure. This research uses an improved ResNet-50 model for segmentation because it is deeper than ResNet-34 and has fewer parameters than other ResNet models, resulting in a shorter training period. Figure 4 shows the ResNet-50 architectures [ 33 ].

where \({\mathrm{L}}_{\mathrm{bce}}\) represents the standard binary cross-entropy loss and \({L}_{dice}\) represents the Dice loss commonly used in image segmentation.
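For illustration, the two loss terms named above can be sketched in NumPy. The way the terms are combined is not shown in the text, so the unweighted sum below is an assumption, as are the function names, the smoothing constant, and the clipping value.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    p = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def dice_loss(y_true, y_pred, smooth=1.0):
    """One minus the (smoothed) Dice overlap between prediction and mask."""
    inter = np.sum(y_true * y_pred)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth))

def combined_loss(y_true, y_pred):
    # Assumption: an unweighted sum of the two terms; the source does not
    # state how L_bce and L_dice are combined.
    return bce_loss(y_true, y_pred) + dice_loss(y_true, y_pred)

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
good = combined_loss(mask, mask)         # prediction equal to the mask
bad = combined_loss(mask, 1.0 - mask)    # fully inverted prediction
print(good, bad)
```

The cross-entropy term penalizes each pixel independently, while the Dice term measures region overlap, which is why the two are often used together when the Tumor region is small relative to the background.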

The complete process of the proposed Improved ResNet is as follows:

Step 1: A two-dimensional convolution with 64 filters of size (7*7) and a stride of (2*2) is applied, followed by batch normalization along the channel axis and the ReLU activation function. Finally, max pooling with a window of (2*2) is used.

Step 2: It includes one convolutional block and two identity blocks, each with three sets of filters [64, 64, 256] and a stride of (1*1).

Step 3: It comprises one convolutional block and three identity blocks, each with three sets of filters [128, 128, 512] and a stride of (2*2).

Step 4: It contains one convolutional block and five identity blocks, each with three sets of filters [256, 256, 1024], a kernel size of (3*3), and a stride of (2*2).

Step 5: It comprises one convolutional block and two identity blocks, each with three sets of filters [512, 512, 2048] and a stride of (2*2).

Step 6: The output is flattened, and a fully connected layer with a softmax activation reduces it to the number of classes.
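As a quick check on the steps above, the layer count of the described network can be worked out with a few lines of Python. The variable names are illustrative, and the count follows the usual convention of ignoring the projection convolution on the shortcut of each convolutional block.

```python
# Blocks per stage as listed in Steps 2-5:
# one convolutional block plus N identity blocks.
stage_blocks = [1 + 2, 1 + 3, 1 + 5, 1 + 2]          # -> [3, 4, 6, 3]
stage_filters = [(64, 64, 256), (128, 128, 512),
                 (256, 256, 1024), (512, 512, 2048)]

convs_per_block = 3   # each block has three sets of filters
stem_convs = 1        # the (7*7) convolution of Step 1
fc_layers = 1         # the fully connected layer of Step 6

weighted_layers = stem_convs + convs_per_block * sum(stage_blocks) + fc_layers
print(stage_blocks, weighted_layers)   # [3, 4, 6, 3] 50
```

The total of 50 weighted layers is consistent with the ResNet-50 name used in this research.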

Proposed work model description

The Residual Network with Long Skip Connections is represented by Phase 1. It contains down-sampling (shown in blue in Figure 4 ), which forms a contracting path. Similarly, up-sampling (shown in orange in Figure 4 ) forms an expanding path. Long skip connections link the contracting path to the expanding path, shown with arrows from left to right in Figure 4 A.

The second phase uses various (1*1) and (3*3) Conv; these blocks are called bottlenecks. Batch normalization (BN) and ReLU are used in this phase [ 34 , 35 , 36 ]. The concept behind Pre-Activation ResNet is to apply BN-ReLU just before a Conv, as shown in Figure 4 B. The benefits of using these bottleneck blocks are less training time and improved performance. The use of a bottleneck reduces the number of parameters and matrix multiplications; for example, 9 operations would be reduced to 6. The idea is to make residual blocks as thin as possible so that the network can be deeper while having fewer parameters.
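The parameter saving of a bottleneck can be illustrated with a short calculation. The channel widths below (256 reduced to 64) are assumptions chosen for the example, and biases and batch normalization parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k*k convolution, ignoring biases."""
    return k * k * c_in * c_out

channels, reduced = 256, 64   # illustrative widths, assumed for this example

# Basic block: two (3*3) convolutions at full width.
basic = 2 * conv_params(3, channels, channels)

# Bottleneck block: (1*1) reduce -> (3*3) at reduced width -> (1*1) restore.
bottleneck = (conv_params(1, channels, reduced)
              + conv_params(3, reduced, reduced)
              + conv_params(1, reduced, channels))

print(basic, bottleneck)      # 1179648 69632
```

Under these assumed widths the bottleneck needs roughly one seventeenth of the weights of the basic block, which is what allows a deeper network at a similar cost.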

The third phase is the basic block phase, which uses only (3*3) blocks and no (1*1) blocks. A basic ResNet block comprises two layers of (3*3) Conv/BatchNorm/ReLU. In the figure, the lines represent the residual operation. A dotted line means that the shortcut was applied to match the input and output dimensions.

The last phase is the simple block phase, which uses (3*3) blocks. Max pooling is used in this phase; it discards a large part of the data and keeps only the most salient features. Max pooling therefore restricts the system to the most important features and might miss some details.

Dataset description

This research utilized the BraTS2020 dataset [ 37 ]. BraTS consistently evaluates state-of-the-art brain Tumor segmentation approaches on multimodal MRI scan data. BraTS 2020 uses multi-institutional pre-operative image data and concentrates on segmenting inherently heterogeneous (in shape, location, and cell biology) brain Tumors, such as gliomas. As described in Fig. 5 , the dataset provides T1-weighted (T1), post-contrast T1-weighted (T1ce), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR) sequences. Each image has a size of (240*240*155) [ 38 ]. The dataset was collected from the Kaggle website. It includes 369 brain Tumor MR images; 125 are utilized for training and 169 MRI images for testing. Figure 5 shows the MRI sequences available in the BraTS 2020 dataset.

figure 5

Brain Tumor images in BraTS2020: (1) T1, (2) T2, (3) T1ce, and (4) FLAIR sequences

Performance measuring parameters

The following metrics were used to measure the performance of the proposed method and the existing ones [ 39 , 40 , 41 ].

Mean Square Error (MSE)

MSE is the average of the squared differences between the predicted and actual values. Equation 5 denotes the cumulative squared estimation error between the actual image and the output image as the MSE.

Peak Signal Noise Ratio (PSNR)

PSNR reflects an image’s robustness to noise from external interference signals. When the PSNR is higher, the effect of the noisy interference signal on the MR image is smaller. PSNR is expressed in terms of the MSE and should be between 40 and 60 dB. It is calculated by Eq. 6 :

\(PSNR = 10\,{\log}_{10}\left(\frac{{Max}_{I}^{2}}{MSE}\right)\)

where \({Max}_{I}\) is usually 255 and MSE is the mean square error.
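A minimal NumPy sketch of these two measures is given below, using the standard definitions: MSE as the mean of squared pixel differences and PSNR as ten times the base-10 logarithm of the squared peak value over the MSE. The function names and the tiny 2×2 example images are hypothetical.

```python
import numpy as np

def mse(reference, output):
    """Mean of squared pixel differences between two images."""
    diff = reference.astype(np.float64) - output.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference, output, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, output)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

# Hypothetical 8-bit images that differ in a single pixel by 5 grey levels.
reference = np.array([[0, 255], [128, 64]], dtype=np.uint8)
output = np.array([[0, 250], [128, 64]], dtype=np.uint8)

print(mse(reference, output))    # 6.25
print(psnr(reference, output))   # about 40.17 dB
```

The images are converted to floating point before subtraction so that unsigned 8-bit values do not wrap around.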

Computation Time

The time taken to complete the segmentation procedure is measured in milliseconds or seconds and reported as elapsed time.

Jaccard Coefficient (JC)

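The Jaccard coefficient also serves as a metric for evaluating segmentation strategies. Eq. 7 computes the similarity of two sets, Q1 and Q2, by normalizing the size of their overlap by the size of their union:

\(JC\left({Q}_{1},{Q}_{2}\right)=\frac{\left|{Q}_{1}\cap {Q}_{2}\right|}{\left|{Q}_{1}\cup {Q}_{2}\right|}\)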

Dice Similarity Coefficient (DSC)

The DSC is now the most common indicator for assessing segmentation results against their ground truth. It measures the overlap of two sets, Q1 and Q2, normalized by the average of their sizes. DSC is presented in Eq. 8 :

\(DSC\left({Q}_{1},{Q}_{2}\right)=\frac{2\left|{Q}_{1}\cap {Q}_{2}\right|}{\left|{Q}_{1}\right|+\left|{Q}_{2}\right|}\)
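Both overlap measures can be computed directly from binary masks. The sketch below uses the standard set definitions (overlap over union for JC, twice the overlap over the summed sizes for DSC); the function names and the small example masks are assumptions made for illustration.

```python
import numpy as np

def jaccard(q1, q2):
    """Size of the overlap divided by the size of the union."""
    q1, q2 = q1.astype(bool), q2.astype(bool)
    union = np.logical_or(q1, q2).sum()
    if union == 0:
        return 1.0   # both masks empty: treated as a perfect match
    return float(np.logical_and(q1, q2).sum() / union)

def dice(q1, q2):
    """Twice the overlap divided by the sum of the two set sizes."""
    q1, q2 = q1.astype(bool), q2.astype(bool)
    total = q1.sum() + q2.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(q1, q2).sum() / total)

# Hypothetical predicted and ground-truth masks.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])

jc = jaccard(pred, truth)   # 2 overlapping pixels / 4 in the union = 0.5
dsc = dice(pred, truth)     # 2 * 2 / (3 + 3), about 0.667
print(jc, dsc)
```

The two scores are related by DSC = 2·JC / (1 + JC), so they always rank segmentations in the same order even though their values differ.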

Sensitivity and Specificity

Eqs. 9 and 10 calculate sensitivity and specificity as decision-theory measures:

\(Sensitivity=\frac{TP}{TP+FN}\)

\(Specificity=\frac{TN}{TN+FP}\)

where TP is True Positive, FP is False Positive, TN is True Negative, and FN is False Negative.
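As a small worked example, the four counts and the resulting measures can be obtained from a pair of binary masks. The sketch uses the standard definitions (sensitivity as TP over TP plus FN, specificity as TN over TN plus FP); the function names and example masks are hypothetical.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (TP, FP, TN, FN) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of true Tumor pixels that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of non-Tumor pixels that were correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical predicted and ground-truth masks.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])

tp, fp, tn, fn = confusion_counts(pred, truth)
print(tp, fp, tn, fn)                                  # 2 1 2 1
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```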

Training results

In this research, the BraTS2020 dataset, collected from Kaggle [ 35 ], has been used. This dataset mainly contains 369 brain Tumor patient MR images, where 125 are utilized for training and 169 MRI images for testing. The proposed improved ResNet model, the existing CNN model, and the FCN (model type U-Net) are implemented in Python (TensorFlow) in the Anaconda environment. The complete experimental process is divided into two phases: training and testing. The training phase comes first and is used to train the model.

In the first phase, normalization is applied. The dataset was corrected in the initial stage because the images had some intensity bias-field distortion, which was handled with the N4ITK technique. This technique is applied to all four MRI brain Tumor image sequences of each patient, which helps in Tumor growth and sequencing analysis.

This work has presented an improved Residual Network-based approach for Tumor segmentation from multi-modal 3-dimensional MRI images and uses the BraTS 2020 brain Tumor dataset for performance validation. Several possible configurations were tried while experimenting with CNN models. Table 1 shows the parameters of the proposed improved ResNet used for training. After normalization, the Stochastic Gradient Descent optimization method (SGDOM) minimizes the loss function by updating the parameters along the negative gradient towards the model minima. The training performance of the proposed improved ResNet and existing CNN and FCN is described in Figure 6 .
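The negative-gradient update used by stochastic gradient descent can be illustrated on a toy problem. The sketch below fits a single weight to noise-free data with mini-batches; the learning rate, batch size, number of epochs, and the data itself are assumptions chosen for the example and are not the training parameters of the proposed model.

```python
import numpy as np

def sgd_fit(x, y, lr=0.1, epochs=200, batch_size=4, seed=0):
    """Fit y = w * x by mini-batch stochastic gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))            # shuffle samples each epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) with respect to w on this batch.
            grad = np.mean(2.0 * (w * x[idx] - y[idx]) * x[idx])
            w -= lr * grad                         # step along the negative gradient
    return w

# Hypothetical noise-free data generated with a true weight of 2.
x = np.linspace(-1.0, 1.0, 16)
y = 2.0 * x
w = sgd_fit(x, y)
print(w)
```

Each step moves the weight a small distance against the gradient of the loss on one mini-batch, so the loss decreases toward its minimum over the epochs.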

figure 6

Experimental outcomes for training accuracy of proposed improved ResNet and existing CNN and FCN

The proposed enhanced ResNet model shows a lower error rate and higher accuracy in the training phase than existing methods. The proposed improved ResNet model is validated using thirty percent of the training dataset in this experiment.

Testing results

Figure 7 represents the performance validation of the proposed improved ResNet model with 50 epochs. The experimental outcomes show that the training error rate decreases linearly and the accuracy increases with each epoch. In the testing phase, the test dataset is applied to the proposed and existing models to identify brain Tumor cells in MRI images. The proposed improved ResNet model is compared to other existing methods across the Tumor regions (CT, ET, WT) to analyze the performance of Tumor segmentation. All performance measures were computed for each patient in the dataset, and their mean values were then calculated over all patients. Figure 8 shows the experimental results of the proposed Improved ResNet model.

figure 7

Experimental outcomes for training Error Rate of proposed improved ResNet and existing CNN and FCN

figure 8

Experimental results of the proposed Improved ResNet model


Discussion

Brain Tumor segmentation and detection is a widely studied area of research. Various deep learning models have been applied to all brain Tumor regions: the core Tumor region (CT), the enhanced Tumor region (ET), and the whole Tumor region (WT).

The proposed Improved ResNet model is based on shortcut links that perform identity mapping, and their output is merged with the output of the convolution layer without using any additional model parameters. This implies that a layer in the ResNet model tries to learn the residual of its input.

In contrast, layers in CNN and FCN (U-Net) methods learn the full output mapping. Consequently, the gradients can flow back quickly, leading to faster computation than in CNN and FCN models. The shortcut links in the proposed Improved ResNet model mitigate the vanishing gradient issue.

Tables 2 , 3 , and 4 compare the proposed ResNet and existing models (CNN and FCN) on JC, DICE score, sensitivity, specificity, and accuracy for CT, ET, and WT, respectively, on the BraTS2020 dataset.

According to the assessment, the proposed model achieves 0.658, 0.924, 0.7613, 0.835, and 0.854 for JC, DICE score, sensitivity, specificity, and accuracy, respectively, for CT. Similarly, it achieves 0.6328, 0.945, 0.7989, 0.926, and 0.913 for ET and 0.6308, 0.864, 0.7365, 0.923, and 0.879 for WT.

These results show improvement over CNN and FCN due to the four-phase process of the proposed model. The proposed Improved ResNet model has better outcomes for all three Tumor cases (ET, CT, and WT). This indicates that the proposed Improved ResNet model performs well in brain Tumor segmentation. Table 5 demonstrates that the proposed Improved ResNet model has the lowest computation time and the best PSNR and MSE among the compared CNN and FCN methods. A lower MSE value indicates better performance. The proposed method’s MSE and PSNR figures, 26.898% and 21.457%, both exceed 20% and are far better than those of CNN and FCN.

Conclusion & future work

Deep Neural Networks (DNNs) are very useful for image segmentation. However, they encounter a vanishing gradient issue during training. To address this issue, the Improved ResNet is proposed in this research. A “connection link” inside a ResNet allows the gradient to propagate backwards to earlier layers. These links keep all the possible route details in a single place, which reduces access time. This paper also presents a pre-processing approach that eliminates much irrelevant data, resulting in impressive outcomes.

The proposed Improved ResNet and existing CNN and FCN models are implemented using TensorFlow and tested on the BraTS2020 dataset. Experimental results demonstrate the strength of the proposed method in terms of better accuracy, lower computation time and MSE, and better PSNR, DSC, and JC. The strength of the proposed improved ResNet model is that users do not require the assistance of an expert to manually find the Tumor pixel by pixel, which is a complex and time-consuming operation. The proposed model tackles these issues by utilizing shortcut connection links in ResNet.

The experimental outcomes achieve better performance and a remarkable result compared with conventional techniques. In the binary classification problem, accuracy and precision were examined, as was the Dice coefficient score throughout the segmentation experiment. Future research can improve current outcomes and leverage deeper architectures to improve the overall effectiveness of segmentation output.

Availability of data and materials

This work utilizes the publicly available brain Tumor dataset from the Kaggle BraTS2020 competition. The following is the link: https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation (accessed on 13 March 2022).


Abbreviations

MRI: Magnetic resonance image

DNN: Deep Neural Networks

ResNet: Residual Network

FCN: Fully Convolution Neural Network

VGG: Visual Geometry Group

RL: Residual learning value

CT: Core Tumor Region

MSE: Mean Square Error

JC: Jaccard Coefficient

MR: Magnetic Resonance

PET: Positron emission tomography

TP: True Positive

FP: False Positive

TN: True Negative

FN: False Negative

WT: Whole Tumor Region

ET: Enhanced Tumor Region

PSNR: Peak Signal Noise Ratio

DSC: Dice Similarity Coefficient

SGDOM: Stochastic Gradient Descent optimization method

RELM: Regularized Extreme Learning Machine

Tiwari A, Srivastava S, Pant M. Brain Tumor segmentation and classification from magnetic resonance images: Review of selected methods from 2014 to 2019. Pattern Recognition Letters. 2020;131:244–60. https://doi.org/10.1016/j.patrec.2019.11.020

Munir K, Frezza F, Rizzi A. Brain Tumor segmentation using 2D-UNET convolutional neural network. Deep Learning for Cancer Diagnosis. 2021:239–48. https://doi.org/10.1007/978-981-15-6321-8_14

Aher P, Lilhore U. Survey of brain Tumor image quarrying techniques. Int J Sci Eng Dev Res, ISSN. 2020:2455–631.

Zhang D, Huang G, Zhang Q, Han J, Han J, Yu Y. Cross-modality deep feature learning for brain Tumor segmentation. Pattern Recogn. 2021;1(110).  https://doi.org/10.1016/j.patcog.2020.107562

Silva CA, Pinto A, Pereira S, Lopes A. Multi-stage deep layer aggregation for brain Tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke, and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part II. Springer International Publishing; 2021. p. 179–188. https://doi.org/10.1007/978-3-030-72087-2_16

Zhou T, Canu S, Vera P, Ruan S. Feature-enhanced generation and multi-modality fusion based deep neural network for brain Tumor segmentation with missing MR modalities. Neurocomputing. 2021;27(466):102–12. https://doi.org/10.1016/j.neucom.2021.09.032 .

Lin F, Wu Q, Liu J, Wang D, Kong X. Path aggregation U-Net model for brain Tumor segmentation. Multimedia Tools Appl. 2021;80:22951–64. https://doi.org/10.1007/s11042-020-08795-9 .

Das S, Swain MK, Nayak GK, Saxena S. Brain Tumor segmentation from 3D MRI slices using cascaded convolutional neural network. In: Advances in Electronics, Communication, and Computing: Select Proceedings of ETAEERE 2020. Springer Singapore; 2021. p. 119–126. https://doi.org/10.1007/978-981-15-8752-8_12

Zhang Y, Lu Y, Chen W, Chang Y, Gu H, Yu B. MSMANet: a multi-scale mesh aggregation network for brain Tumor segmentation. Appl Soft Comput. 2021;1(110):107733. https://doi.org/10.1016/j.asoc.2021.107733

Munir K, Frezza F, Rizzi A. Deep learning for brain Tumor segmentation. Deep Learning for Cancer Diagnosis. 2021:189–201.  https://doi.org/10.1007/978-981-15-6321-8_11

Vaibhavi P, Rupal K. Brain Tumor Segmentation Using K-means–FCM Hybrid Technique. In: Ambient Communications and Computer Systems: RACCCS 2017. Springer Singapore; 2018. p. 341–352. https://doi.org/10.1007/978-981-10-7386-1_30

Sharif MI, Li JP, Amin J, Sharif A. An improved framework for brain Tumor analysis using MRI based on YOLOv2 and convolutional neural network. Complex Intell Syst. 2021;7:2023–36. https://doi.org/10.1007/s40747-021-00310-3 .

Saueressig C, Berkley A, Munbodh R, Singh R. A joint graph and image convolution network for automatic brain Tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke, and Traumatic Brain Injuries: 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021, Virtual Event, September 27, 2021, Revised Selected Papers, Part I. Cham: Springer International Publishing; 2022. p. 356–65. https://doi.org/10.1007/978-3-031-08999-2_30 .

Zeineldin RA, Karar ME, Coburger J, Wirtz CR, Burgert O. DeepSeg: deep neural network framework for automatic brain Tumor segmentation using magnetic resonance FLAIR images. Int J Computer-Assisted Radiol Surg. 2020;15:909–20. https://doi.org/10.1007/s11548-020-02186-z .

Abd El Kader I, Xu G, Shuai Z, Saminu S, Javaid I, Salim Ahmad I. Differential deep convolutional neural network model for brain Tumor classification. Brain Sci. 2021;11(3):352. https://doi.org/10.3390/brainsci11030352 .

Deng W, Shi Q, Luo K, Yang Y, Ning N. Brain Tumor segmentation based on improved convolutional neural network in combination with non-quantifiable local texture feature. J Med Syst. 2019;43:1–9. https://doi.org/10.1007/s10916-019-1289-2 .

Bodapati JD, Shaik NS, Naralasetti V, Mundukur NB. Joint training of two-channel deep neural network for brain Tumor classification. SIViP. 2021;15(4):753–60. https://doi.org/10.1007/s11760-020-01793-2 .

Zhou Z, He Z, Jia Y. AFPNet: A 3D fully convolutional neural network with atrous-convolution feature pyramid for brain Tumor segmentation via MRI images. Neurocomputing. 2020;18(402):235–44. https://doi.org/10.1016/j.neucom.2020.03.097 .

Jiang Y, Ye M, Huang D, Lu X. AIU-Net: An Efficient Deep Convolutional Neural Network for Brain Tumor Segmentation. Math Probl Eng. 2021;4(2021):1–8. https://doi.org/10.1155/2021/7915706 .

Díaz-Pernas FJ, Martínez-Zarzuela M, Antón-Rodríguez M, González-Ortega D. A deep learning approach for brain Tumor classification and segmentation using a multi-scale convolutional neural network. Healthcare. 2021;9(2):153. https://doi.org/10.3390/healthcare9020153 . MDPI.

Saleem H, Shahid AR, Raza B. Visual interpretability in 3D brain Tumor segmentation network. Comput Biol Med. 2021;1(133):104410. https://doi.org/10.1016/j.compbiomed.2021.104410

Gupta S, Gupta M. Deep learning for brain Tumor segmentation using magnetic resonance images. In: 2021 IEEE conference on computational intelligence in bioinformatics and computational biology (CIBCB) 2021 (pp. 1–6). IEEE. https://doi.org/10.1109/CIBCB49929.2021.9562890

Kamnitsas K, Ferrante E, Parisot S, Ledig C, Nori AV, Criminisi A, Rueckert D, Glocker B. DeepMedic for brain Tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Second International Workshop, BrainLes 2016, with the Challenges on BRATS, ISLES and mTOP 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, October 17, 2016, Revised Selected Papers 2 2016 (pp. 138–149). Springer International Publishing. https://doi.org/10.1007/978-3-319-55524-9_14

Hao K, Lin S, Qiao J, Tu Y. A generalised pooling for brain Tumor segmentation. IEEE Access. 2021;23(9):159283–90. https://doi.org/10.1109/ACCESS.2021.3130035 .

Iqbal S, Ghani MU, Saba T, Rehman A. Brain Tumor segmentation in multi-spectral MRI using convolutional neural networks (CNN). Microsc Res Tech. 2018;81(4):419–27. https://doi.org/10.1002/jemt.22994 .

Isensee F, Jäger PF, Full PM, Vollmuth P, Maier-Hein KH. nnU-Net for brain Tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part II 6 2021 (pp. 118–132). Springer International Publishing. https://doi.org/10.1007/978-3-030-72087-2_11

Liu H, Li Q, Wang IC. A deep-learning model with learnable group convolution and deep supervision for brain Tumor segmentation. Math Probl Eng. 2021;10(2021):1–1. https://doi.org/10.1155/2021/6661083 .

Ramesh TR, Lilhore UK, Poongodi M, Simaiya S, Kaur A, Hamdi M. Predictive analysis of heart diseases with machine learning approaches. Malays J Comput Sci. 2022;31:132–48. https://doi.org/10.22452/mjcs.sp2022no1.10 .

Chen S, Ding C, Liu M. Dual-force convolutional neural networks for accurate brain Tumor segmentation. Pattern Recogn. 2019;1(88):90–100. https://doi.org/10.1016/j.patcog.2018.11.009 .

Wadhwa A, Bhardwaj A, Verma VS. A review on brain Tumor segmentation of MRI images. Magn Reson Imaging. 2019;1(61):247–59. https://doi.org/10.1016/j.mri.2019.05.043 .

Lilhore U, Kumar S, Simaiya D, Prasad K. A Hybrid Tumor detection and classification based on machine learning. J Comput Theor Nanosci. 2020;17(6):2539–44. https://doi.org/10.1166/jctn.2020.8927 .

Wang Y, Peng J, Jia Z. Brain Tumor segmentation via c-dense convolutional neural network. Progress in Artificial Intelligence. 2021;10:147–56. https://doi.org/10.1007/s13748-021-00232-8 .

Punn NS, Agarwal S. Multi-modality encoded fusion with 3D inception U-net and decoder model for brain Tumor segmentation. Multimedia tools and applications. 2021;80(20):30305–20. https://doi.org/10.1007/s11042-020-09271-0 .

Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin PM, Larochelle H. Brain Tumor segmentation with deep neural networks. Med Image Anal. 2017;1(35):18–31. https://doi.org/10.1016/j.media.2016.05.004 .

Online Kaggle Brain Tumor dataset. BraTS2020 Dataset (Training + Validation). 2022. p. 13.

Sharif MI, Li JP, Khan MA, Saleem MA. Active deep neural network features selection for segmentation and recognition of brain Tumors using MRI images. Pattern Recogn Lett. 2020;1(129):181–9. https://doi.org/10.1016/j.patrec.2019.11.019 .

Singh K, Lilhore U, Agrawal N. Survey on different Tumor detection methods from MR images. Int J Sci Res Comput Sci Eng Inf Technol. 2017;5:589–94.

Ghassemi N, Shoeibi A, Rouhani M. Deep neural network with generative adversarial networks pre-training for brain Tumor classification based on MR images. Biomed Signal Process Control. 2020;1(57):101678. https://doi.org/10.1016/j.bspc.2019.101678

Saouli R, Akil M, Kachouri R. Fully automatic brain Tumor segmentation using end-to-end incremental deep neural networks in MRI images. Comput Methods Programs Biomed. 2018;1(166):39–49. https://doi.org/10.1016/j.cmpb.2018.09.007 .

Simaiya S, Lilhore UK, Prasad D, Verma DK. MRI brain Tumor detection & image segmentation by hybrid hierarchical K-means clustering with FCM-based machine learning model. Ann Roman Soc Cell Biol. 2021;28:88–94.

Jia Q, Shu H. Bitr-unet: a cnn-transformer combined network for MRI brain Tumor segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021, Virtual Event, September 27, 2021, Revised Selected Papers, Part II. Cham: Springer International Publishing; 2022. p. 3–14. https://doi.org/10.1007/978-3-031-09002-8_1 .


We sincerely thank all cited researchers.

No external funding was received for this research from any international or national body.

Author information

Authors and affiliations.

Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India

Mukul Aggarwal

Rajkiya Engineering College, Sonbhadra, Uttar Pradesh, India

Amod Kumar Tiwari

Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India

M Partha Sarathi

Faculty of Electrical and Computer Engineering, Arba Minch University, Arba Minch, Ethiopia

Anchit Bijalwan


MA: writing and implementation of the proposed algorithm, results gathering, manuscript writing, analysis and interpretation of data. AKT: Supervision, formal analysis, validation, editing. MPS: formal analysis, critical manuscript revision, investigation, editing. AB: BraTS data set analysis, investigation, validation, writing literature—review and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anchit Bijalwan .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The corresponding author declares that neither they nor any of the co-authors have a conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Aggarwal, M., Tiwari, A.K., Sarathi, M. et al. An early detection and segmentation of Brain Tumor using Deep Neural Network. BMC Med Inform Decis Mak 23 , 78 (2023). https://doi.org/10.1186/s12911-023-02174-8

Received : 16 December 2022

Accepted : 12 April 2023

Published : 26 April 2023

DOI : https://doi.org/10.1186/s12911-023-02174-8

  • Brain tumor
  • Segmentation
  • Deep neural network
  • Prediction models

BMC Medical Informatics and Decision Making

ISSN: 1472-6947
