Advanced Image Preprocessing & Augmentation Pipeline for Brain Tumor MRI Datasets (Freely available)

Working with medical images, especially MRI scans, can get tricky because different scanners, resolutions, intensity ranges, and noise levels create inconsistent datasets. When the dataset isn't uniform, even powerful CNN models like VGG, ResNet, or EfficientNet can struggle during training.

To address this, here's a complete preprocessing and augmentation workflow that prepares MRI brain images for deep-learning models. It is aimed at multi-class tumor classification projects.


1. Installing the Required Libraries

We begin by installing essential packages like OpenCV, Pillow, and tqdm. These tools handle image processing, file conversions, and progress visualization.

    # Install Required Packages
    !pip install opencv-python-headless Pillow tqdm

2. Importing Libraries & Connecting Google Drive

This walkthrough assumes the MRI dataset is stored in Google Drive, so the first step is to mount the drive inside Google Colab.

    # Import Libraries
    import os
    import cv2
    import numpy as np
    import shutil
    from tqdm import tqdm
    from PIL import Image
    from google.colab import drive

    # Mount Google Drive
    drive.mount('/content/drive')

We then define paths for the input dataset and the output folder where the processed images will be stored:

    input_path = '/content/drive/MyDrive/dataset-agumented'
    output_path = '/content/processed_brain_dataset'
    os.makedirs(output_path, exist_ok=True)
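Before processing, it can help to confirm that the input folder has the expected layout. The sketch below is a hypothetical helper (`count_images_per_class` is not part of the original script) that assumes one subfolder per class and recognizes images by file extension; the extension list is an assumption.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')


def count_images_per_class(root):
    """Return {subfolder_name: image_count} for each class folder under root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip stray files next to the class folders
        counts[name] = sum(
            1 for f in os.listdir(folder)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling `count_images_per_class(input_path)` after mounting Drive should list every class folder with a non-zero count; an empty result usually points to a wrong path.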

3. Advanced Preprocessing Function Explained

The main part of this project is a custom function called advanced_preprocess_image(). It generates seven augmented versions of each MRI image, which enlarges the dataset and adds variation for the model to learn from.

Here’s what the function does step-by-step:

  • Grayscale conversion – simplifies MRI images without losing structure
  • Binary thresholding – highlights strong boundaries
  • HSV jitter – adds saturation and brightness variations
  • Gamma correction – randomly brightens or darkens the scan
  • Random zooming – simulates cropping variations
  • Rotation (+30° and −30°) – helps the model handle orientation changes

    def advanced_preprocess_image(img):
        """Return a list of seven augmented variants of a BGR uint8 image."""
        processed = []
        h, w = img.shape[:2]

        # 1. Grayscale (converted back to 3 channels so all outputs share a shape)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray_3ch = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
        processed.append(gray_3ch)

        # 2. Black & White Threshold
        _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        bw_3ch = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
        processed.append(bw_3ch)

        # 3. Random Color Jitter (HSV): shift saturation and value by about ±20.
        # Note: gray pixels have saturation 0, so a positive saturation shift
        # adds a slight color tint to a grayscale scan.
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h_, s_, v_ = cv2.split(hsv)
        s_ = cv2.add(s_, np.random.randint(-20, 20))
        v_ = cv2.add(v_, np.random.randint(-20, 20))
        hsv_jittered = cv2.merge([h_, s_, v_])
        jittered_img = cv2.cvtColor(hsv_jittered, cv2.COLOR_HSV2BGR)
        processed.append(jittered_img)

        # 4. Gamma Correction via a 256-entry lookup table
        gamma = np.random.uniform(0.5, 1.5)
        invGamma = 1.0 / gamma
        table = np.array([((i / 255.0) ** invGamma) * 255
                          for i in np.arange(256)]).astype("uint8")
        gamma_corrected = cv2.LUT(img, table)
        processed.append(gamma_corrected)

        # 5. Random Zoom: crop 80-95% of the image, then resize to 224x224
        zoom_factor = np.random.uniform(0.8, 0.95)
        nh, nw = int(h * zoom_factor), int(w * zoom_factor)
        startx = np.random.randint(0, w - nw)
        starty = np.random.randint(0, h - nh)
        zoomed = img[starty:starty + nh, startx:startx + nw]
        zoomed = cv2.resize(zoomed, (224, 224))
        processed.append(zoomed)

        # 6. Rotation ±30 degrees around the image center
        for angle in [30, -30]:
            M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
            rotated = cv2.warpAffine(img, M, (w, h))
            rotated = cv2.resize(rotated, (224, 224))
            processed.append(rotated)

        return processed
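To make the gamma step concrete, the sketch below rebuilds the same lookup table with NumPy alone and applies it by array indexing, which for `uint8` images is equivalent to the `cv2.LUT` call. The helper names `gamma_table` and `apply_gamma` are illustrative and not part of the original script.

```python
import numpy as np


def gamma_table(gamma):
    """256-entry lookup table for out = 255 * (in / 255) ** (1 / gamma)."""
    inv_gamma = 1.0 / gamma
    levels = np.arange(256) / 255.0
    return (levels ** inv_gamma * 255).astype(np.uint8)


def apply_gamma(img, gamma):
    """Apply gamma correction to a uint8 image via table lookup."""
    return gamma_table(gamma)[img]


# With this formula, gamma > 1 brightens mid-tones and gamma < 1 darkens them;
# pure black (0) and pure white (255) are left unchanged.
demo = np.array([[0, 64, 128, 255]], dtype=np.uint8)
brighter = apply_gamma(demo, 1.5)
darker = apply_gamma(demo, 0.5)
```

Because the function samples gamma uniformly from 0.5 to 1.5, each scan is either darkened or brightened while its black background and brightest pixels keep their values.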

4. Processing All Four MRI Categories

The dataset contains four folders:

  • glioma_tumor
  • meningioma_tumor
  • no_tumor
  • pituitary_tumor

The script loops through each category, resizes images to 224×224, applies preprocessing, and saves every version.

    categories = ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']
    image_count = 0

    for category in categories:
        input_folder = os.path.join(input_path, category)
        output_folder = os.path.join(output_path, category)
        os.makedirs(output_folder, exist_ok=True)

        for img_name in tqdm(os.listdir(input_folder), desc=f"Processing {category}"):
            img_path = os.path.join(input_folder, img_name)
            img = cv2.imread(img_path)
            if img is None:
                continue  # skip files OpenCV cannot read

            img = cv2.resize(img, (224, 224))
            processed_imgs = advanced_preprocess_image(img)

            # Save the resized original
            out_filename = f"{category}_{image_count:04d}_orig.jpg"
            cv2.imwrite(os.path.join(output_folder, out_filename), img)

            # Save each augmented version
            for i, p_img in enumerate(processed_imgs):
                out_filename = f"{category}_{image_count:04d}_aug{i+1}.jpg"
                cv2.imwrite(os.path.join(output_folder, out_filename), p_img)

            image_count += 1
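Each readable input image should therefore yield eight files: one `_orig` copy plus `_aug1` through `_aug7`, since the preprocessing function returns seven variants. A rough sanity check for one output folder might look like the sketch below, which assumes the filename pattern used in the loop; `find_incomplete_images` is a hypothetical helper, not part of the original script.

```python
import os
import re
from collections import defaultdict

# Matches names such as "no_tumor_0012_aug3.jpg" written by the loop.
NAME_PATTERN = re.compile(r'^(?P<stem>.+_\d{4,})_(?P<tag>orig|aug\d+)\.jpg$')
EXPECTED_TAGS = {'orig'} | {f'aug{i}' for i in range(1, 8)}


def find_incomplete_images(folder):
    """Return {image_stem: missing_tags} for images lacking an expected file."""
    seen = defaultdict(set)
    for name in os.listdir(folder):
        match = NAME_PATTERN.match(name)
        if match:
            seen[match['stem']].add(match['tag'])

    incomplete = {}
    for stem, tags in seen.items():
        missing = EXPECTED_TAGS - tags
        if missing:
            incomplete[stem] = sorted(missing)
    return incomplete
```

An empty dictionary from `find_incomplete_images(os.path.join(output_path, 'no_tumor'))` suggests every image in that class was written in full.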

5. Exporting the Final Processed Dataset

Once all images are processed, we zip the dataset so it's easy to download and use for training.

    shutil.make_archive('/content/processed_brain_images', 'zip', output_path)
    print("Brain tumor dataset preprocessing + augmentation complete and zipped.")
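Before downloading, the archive can be checked with the standard-library `zipfile` module. This is an optional sketch: `summarize_zip` is a hypothetical helper, and it assumes the archive holds one top-level folder per class, which is what `shutil.make_archive` produces when run on `output_path`.

```python
import zipfile
from collections import Counter


def summarize_zip(zip_path):
    """Check archive integrity and count files under each top-level folder."""
    with zipfile.ZipFile(zip_path) as archive:
        bad_member = archive.testzip()  # None when every CRC check passes
        if bad_member is not None:
            raise ValueError(f'Corrupt member in archive: {bad_member}')
        counts = Counter(
            info.filename.split('/')[0]
            for info in archive.infolist()
            if not info.is_dir()  # ignore directory entries
        )
    return dict(counts)
```

For example, `summarize_zip('/content/processed_brain_images.zip')` should return one entry per tumor category with the number of images stored for it.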

Final Thoughts

This preprocessing pipeline gives every image the same 224×224 size and adds grayscale, threshold, intensity, zoom, and rotation variants, producing a larger and more varied dataset for training.

If you're planning to use models like ResNet, VGG, EfficientNet, or MobileNet, this pipeline gives them consistently sized, augmented images, which can help accuracy and convergence.

A more advanced version of this pipeline could add CLAHE, denoising, edge detection, or auto-segmentation.
