
Advanced Image Preprocessing & Augmentation Pipeline for Brain Tumor MRI Datasets (Freely available)

Working with medical images, especially MRI scans, can be tricky because different scanners, resolutions, intensity ranges, and noise levels create inconsistent datasets. When the dataset isn't uniform, even powerful CNN architectures like VGG, ResNet, or EfficientNet struggle during training.

To address this, here's a complete preprocessing and augmentation workflow that automatically prepares MRI brain images for deep-learning models. The code is kept simple and readable, and it is well suited to multi-class tumor classification projects.


1. Installing the Required Libraries

We begin by installing essential packages like OpenCV, Pillow, and tqdm. These tools handle image processing, file conversions, and progress visualization.

# Install Required Packages
!pip install opencv-python-headless Pillow tqdm

2. Importing Libraries & Connecting Google Drive

Since MRI datasets are usually stored in Google Drive, the first step is to mount it inside Google Colab.

# Import Libraries
import os
import cv2
import numpy as np
import shutil
from tqdm import tqdm
from PIL import Image
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

We then define paths for the input dataset and the output folder where the processed images will be stored:

input_path = '/content/drive/MyDrive/dataset-agumented'
output_path = '/content/processed_brain_dataset'
os.makedirs(output_path, exist_ok=True)
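Before processing anything, it can help to confirm that Drive mounted correctly and that the class folders are where you expect them. A minimal sanity check, assuming the dataset root contains one sub-folder per class:

# Quick sanity check: list the class folders and how many files each contains
for name in sorted(os.listdir(input_path)):
    folder = os.path.join(input_path, name)
    if os.path.isdir(folder):
        print(f"{name}: {len(os.listdir(folder))} files")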

3. Advanced Preprocessing Function Explained

The main part of this project is a custom function called advanced_preprocess_image(). It generates multiple augmented versions of each MRI image, making the dataset richer and more model-friendly.

Here’s what the function does step-by-step:

  • Grayscale conversion – simplifies MRI images without losing structure
  • Binary thresholding – highlights strong boundaries
  • HSV jitter – adds brightness/contrast variations
  • Gamma correction – randomly brightens or darkens the scan
  • Random zooming – simulates cropping variations
  • Rotation (+30° and −30°) – helps the model handle orientation changes
def advanced_preprocess_image(img):
    processed = []
    h, w = img.shape[:2]

    # 1. Grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray_3ch = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    processed.append(gray_3ch)

    # 2. Black & White Threshold
    _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    bw_3ch = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
    processed.append(bw_3ch)

    # 3. Random Color Jitter (HSV)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h_, s_, v_ = cv2.split(hsv)
    s_ = cv2.add(s_, np.random.randint(-20, 20))
    v_ = cv2.add(v_, np.random.randint(-20, 20))
    hsv_jittered = cv2.merge([h_, s_, v_])
    jittered_img = cv2.cvtColor(hsv_jittered, cv2.COLOR_HSV2BGR)
    processed.append(jittered_img)

    # 4. Gamma Correction
    gamma = np.random.uniform(0.5, 1.5)
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
                      for i in np.arange(256)]).astype("uint8")
    gamma_corrected = cv2.LUT(img, table)
    processed.append(gamma_corrected)

    # 5. Random Zoom
    zoom_factor = np.random.uniform(0.8, 0.95)
    nh, nw = int(h * zoom_factor), int(w * zoom_factor)
    startx = np.random.randint(0, w - nw)
    starty = np.random.randint(0, h - nh)
    zoomed = img[starty:starty + nh, startx:startx + nw]
    zoomed = cv2.resize(zoomed, (224, 224))
    processed.append(zoomed)

    # 6. Rotation ±30 degrees
    for angle in [30, -30]:
        M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
        rotated = cv2.warpAffine(img, M, (w, h))
        rotated = cv2.resize(rotated, (224, 224))
        processed.append(rotated)

    return processed
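Before running the full dataset through it, you can sanity-check the function on a single image and confirm it returns the expected seven variants. A small sketch, assuming at least one class folder with a readable image exists under input_path:

# Pick the first image from the first class folder (purely for a quick test)
first_class = sorted(os.listdir(input_path))[0]
first_image = sorted(os.listdir(os.path.join(input_path, first_class)))[0]

sample = cv2.imread(os.path.join(input_path, first_class, first_image))
sample = cv2.resize(sample, (224, 224))

variants = advanced_preprocess_image(sample)
print(f"Generated {len(variants)} variants")   # grayscale, threshold, jitter, gamma, zoom, +30°, -30°
for i, v in enumerate(variants, start=1):
    print(i, v.shape)                          # each should be (224, 224, 3)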

4. Processing All Four MRI Categories

The dataset contains four folders, one per class:

  • glioma_tumor
  • meningioma_tumor
  • no_tumor
  • pituitary_tumor

The script loops through each category, resizes every image to 224×224, applies the preprocessing function, and saves the resized original along with every augmented version.

categories = ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']
image_count = 0

for category in categories:
    input_folder = os.path.join(input_path, category)
    output_folder = os.path.join(output_path, category)
    os.makedirs(output_folder, exist_ok=True)

    for img_name in tqdm(os.listdir(input_folder), desc=f"Processing {category}"):
        img_path = os.path.join(input_folder, img_name)
        img = cv2.imread(img_path)
        if img is None:
            continue

        img = cv2.resize(img, (224, 224))
        processed_imgs = advanced_preprocess_image(img)

        # Save the resized original
        out_filename = f"{category}_{image_count:04d}_orig.jpg"
        cv2.imwrite(os.path.join(output_folder, out_filename), img)

        # Save each augmented version
        for i, p_img in enumerate(processed_imgs):
            out_filename = f"{category}_{image_count:04d}_aug{i+1}.jpg"
            cv2.imwrite(os.path.join(output_folder, out_filename), p_img)

        image_count += 1
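Once the loop finishes, a quick count per class confirms that each original produced the expected eight files (the resized original plus seven augmented versions). A short sketch reusing the categories list above:

# Count the processed images written for each class
total = 0
for category in categories:
    n = len(os.listdir(os.path.join(output_path, category)))
    print(f"{category}: {n} images")
    total += n
print(f"Total images in processed dataset: {total}")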

5. Exporting the Final Processed Dataset

Once all images are processed, we zip the dataset so it's easy to download and use for training.

shutil.make_archive('/content/processed_brain_images', 'zip', output_path)
print("Brain tumor dataset preprocessing + augmentation complete and zipped.")
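Colab's local storage disappears when the runtime resets, so you will usually want to move the archive somewhere permanent. Two common options, sketched below (the Drive destination path is only an example):

# Option 1: copy the zip back to Google Drive (destination path is an example)
shutil.copy('/content/processed_brain_images.zip',
            '/content/drive/MyDrive/processed_brain_images.zip')

# Option 2: download the zip directly to your machine
from google.colab import files
files.download('/content/processed_brain_images.zip')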

Final Thoughts

This preprocessing pipeline turns a raw, inconsistent MRI collection into a uniform, augmented dataset. It standardizes image size, varies brightness and contrast, adds useful geometric variations, and produces a much richer dataset for training.

If you're planning to use models like ResNet, VGG, EfficientNet, or MobileNet, this pipeline gives them clean, consistent, and augmented images, which typically leads to better accuracy and faster convergence.
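As a rough illustration (not part of the pipeline itself), here is a minimal Keras transfer-learning sketch that trains on the processed folder. It assumes TensorFlow is available in Colab, that output_path from earlier still points at the processed dataset, and that the hyperparameters are placeholders:

import tensorflow as tf

# Load the processed dataset straight from the class folders
train_ds = tf.keras.utils.image_dataset_from_directory(
    output_path, validation_split=0.2, subset='training',
    seed=42, image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    output_path, validation_split=0.2, subset='validation',
    seed=42, image_size=(224, 224), batch_size=32)

# Example backbone: EfficientNetB0 pretrained on ImageNet (it normalizes inputs internally)
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights='imagenet',
    input_shape=(224, 224, 3), pooling='avg')
base.trainable = False  # freeze the pretrained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation='softmax')  # four tumor classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=5)

One caveat: a simple random split like this can place augmented copies of the same scan in both training and validation, so for a rigorous evaluation, split by original image before augmenting.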

If you want a more advanced version with CLAHE, denoising, edge detection, or auto-segmentation, just let me know—I can create that for you too.
