Advanced Image Preprocessing & Augmentation Pipeline for Brain Tumor MRI Datasets (Freely available)
Working with medical images, especially MRI scans, can get tricky because different scanners, resolutions, intensity ranges, and noise levels create inconsistent datasets.
When the dataset isn’t uniform, even powerful CNN models like VGG, ResNet, or EfficientNet struggle during training.
To solve this, here’s a complete preprocessing and augmentation workflow that automatically prepares MRI brain images for deep-learning models.
Everything runs in Google Colab and is well suited to multi-class tumor classification projects.
1. Installing the Required Libraries
We begin by installing essential packages like OpenCV, Pillow, and tqdm. These tools handle image processing, file conversions, and progress visualization.
# Install Required Packages
!pip install opencv-python-headless Pillow tqdm
2. Importing Libraries & Connecting Google Drive
Since MRI datasets are usually stored in Google Drive, the first step is to mount it inside Google Colab.
# Import Libraries
import os
import cv2
import numpy as np
import shutil
from tqdm import tqdm
from PIL import Image
from google.colab import drive
# Mount Google Drive
drive.mount('/content/drive')
We then define paths for the input dataset and the output folder where the processed images will be stored:
input_path = '/content/drive/MyDrive/dataset-agumented'
output_path = '/content/processed_brain_dataset'
os.makedirs(output_path, exist_ok=True)
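Before running the full pipeline, it can save time to confirm that the expected category folders actually exist under the input path. Here is a minimal, stdlib-only sketch; the helper name `verify_dataset_layout` and the temporary-directory demo are illustrative additions, not part of the original script:

```python
import os
import tempfile

def verify_dataset_layout(root, categories):
    """Return the list of expected category folders missing under root."""
    return [c for c in categories if not os.path.isdir(os.path.join(root, c))]

# Demo against a throwaway directory instead of the real Drive path
categories = ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']
with tempfile.TemporaryDirectory() as root:
    for c in categories[:3]:          # create only three of the four folders
        os.makedirs(os.path.join(root, c))
    missing = verify_dataset_layout(root, categories)
    print(missing)                    # reports the one folder we did not create
```

Running the same check on the real `input_path` before the processing loop turns a confusing mid-run crash into a clear error message.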
3. Advanced Preprocessing Function Explained
The main part of this project is a custom function called advanced_preprocess_image(). It generates multiple augmented versions of each MRI image, making the dataset richer and more model-friendly.
Here’s what the function does step-by-step:
- Grayscale conversion – simplifies MRI images without losing structure
- Binary thresholding – highlights strong boundaries
- HSV jitter – adds brightness/contrast variations
- Gamma correction – randomly brightens or darkens the scan
- Random zooming – simulates cropping variations
- Rotation (+30° and −30°) – helps the model handle orientation changes
def advanced_preprocess_image(img):
    """Return a list of augmented variants of a 224x224 BGR image."""
    processed = []
    h, w = img.shape[:2]

    # 1. Grayscale (replicated to 3 channels so the CNN input shape stays consistent)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray_3ch = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    processed.append(gray_3ch)

    # 2. Black & white threshold
    _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    bw_3ch = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
    processed.append(bw_3ch)

    # 3. Random color jitter (HSV) -- int16 arithmetic with np.clip so negative
    # offsets don't wrap around in uint8 (randint's upper bound is exclusive,
    # so 21 makes the range a symmetric -20..+20)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h_, s_, v_ = cv2.split(hsv)
    s_ = np.clip(s_.astype(np.int16) + np.random.randint(-20, 21), 0, 255).astype(np.uint8)
    v_ = np.clip(v_.astype(np.int16) + np.random.randint(-20, 21), 0, 255).astype(np.uint8)
    hsv_jittered = cv2.merge([h_, s_, v_])
    jittered_img = cv2.cvtColor(hsv_jittered, cv2.COLOR_HSV2BGR)
    processed.append(jittered_img)

    # 4. Gamma correction via a 256-entry lookup table
    gamma = np.random.uniform(0.5, 1.5)
    inv_gamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** inv_gamma) * 255
                      for i in np.arange(256)]).astype("uint8")
    gamma_corrected = cv2.LUT(img, table)
    processed.append(gamma_corrected)

    # 5. Random zoom: crop a random 80-95% window, then resize back to 224x224
    zoom_factor = np.random.uniform(0.8, 0.95)
    nh, nw = int(h * zoom_factor), int(w * zoom_factor)
    starty = np.random.randint(0, h - nh)
    startx = np.random.randint(0, w - nw)
    zoomed = img[starty:starty + nh, startx:startx + nw]
    zoomed = cv2.resize(zoomed, (224, 224))
    processed.append(zoomed)

    # 6. Rotation +/-30 degrees around the image center
    for angle in [30, -30]:
        M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
        rotated = cv2.warpAffine(img, M, (w, h))
        rotated = cv2.resize(rotated, (224, 224))
        processed.append(rotated)

    return processed
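To see exactly what the gamma-correction step is doing, here is a numpy-only sketch of the same lookup table with a fixed gamma instead of a random one. Applying the table with fancy indexing is equivalent to cv2.LUT on a single channel; the fixed value 0.5 is my choice for illustration, not something the pipeline uses:

```python
import numpy as np

gamma = 0.5                 # fixed for reproducibility; the pipeline samples 0.5-1.5
inv_gamma = 1.0 / gamma
# Map each possible uint8 intensity i through (i/255) ** (1/gamma), rescaled to 0-255
table = np.array([((i / 255.0) ** inv_gamma) * 255 for i in range(256)]).astype("uint8")

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
print(table[pixels])        # with gamma < 1 the mid-tones are pushed darker
```

Black and white stay fixed at 0 and 255 while everything in between shifts, which is why the augmentation changes brightness without clipping the extremes.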
4. Processing All Four MRI Categories
The dataset contains four folders: glioma_tumor, meningioma_tumor, no_tumor, and pituitary_tumor.
The script loops through each category, resizes every image to 224×224, applies the preprocessing function, and saves each generated version.
categories = ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']
image_count = 0
for category in categories:
    input_folder = os.path.join(input_path, category)
    output_folder = os.path.join(output_path, category)
    os.makedirs(output_folder, exist_ok=True)
    for img_name in tqdm(os.listdir(input_folder), desc=f"Processing {category}"):
        img_path = os.path.join(input_folder, img_name)
        img = cv2.imread(img_path)
        if img is None:  # skip anything that isn't a readable image
            continue
        img = cv2.resize(img, (224, 224))
        processed_imgs = advanced_preprocess_image(img)
        # Save the resized original
        out_filename = f"{category}_{image_count:04d}_orig.jpg"
        cv2.imwrite(os.path.join(output_folder, out_filename), img)
        # Save each augmented variant
        for i, p_img in enumerate(processed_imgs):
            out_filename = f"{category}_{image_count:04d}_aug{i+1}.jpg"
            cv2.imwrite(os.path.join(output_folder, out_filename), p_img)
        image_count += 1
5. Exporting the Final Processed Dataset
Once all images are processed, we zip the dataset so it's easy to download and use for training.
shutil.make_archive('/content/processed_brain_images', 'zip', output_path)
print("Brain tumor dataset preprocessing + augmentation complete and zipped.")
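As a quick sanity check after zipping, you can count how many image files actually made it into the archive. This is a pure-stdlib sketch; the demo builds a tiny throwaway archive rather than touching the real dataset, and the names `src` and `demo_dataset` are my own:

```python
import os
import shutil
import tempfile
import zipfile

# Build a tiny stand-in for the processed dataset: 2 categories x 3 files
src = tempfile.mkdtemp()
for category in ['glioma_tumor', 'no_tumor']:
    os.makedirs(os.path.join(src, category))
    for i in range(3):
        open(os.path.join(src, category, f'{category}_{i}.jpg'), 'w').close()

archive = shutil.make_archive(os.path.join(tempfile.gettempdir(), 'demo_dataset'),
                              'zip', src)
with zipfile.ZipFile(archive) as zf:
    files = [n for n in zf.namelist() if n.endswith('.jpg')]
print(len(files))  # all six .jpg files across both categories
```

Pointing the same `zipfile` check at `/content/processed_brain_images.zip` confirms nothing was silently skipped before you download it.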
Final Thoughts
This preprocessing pipeline gives a medical image dataset a solid foundation for training: it normalizes image sizes, evens out intensity differences, improves contrast, and adds useful variations that make the dataset larger and more diverse.
If you're planning to use models like ResNet, VGG, EfficientNet, or MobileNet, this pipeline gives them clean, consistent, and augmented images, which typically improves accuracy and speeds up convergence.
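If you plan to feed the processed folder straight into a training framework, one common extra step is splitting each category into train and validation subsets. Here is a hedged, stdlib-only sketch; the 80/20 ratio and the helper name `split_dataset` are my own choices, not part of the original pipeline:

```python
import os
import random
import shutil
import tempfile

def split_dataset(src_root, dst_root, categories, val_ratio=0.2, seed=42):
    """Copy each category into dst_root/train/<cat> and dst_root/val/<cat>."""
    rng = random.Random(seed)
    counts = {}
    for category in categories:
        files = sorted(os.listdir(os.path.join(src_root, category)))
        rng.shuffle(files)
        n_val = int(len(files) * val_ratio)
        splits = {'val': files[:n_val], 'train': files[n_val:]}
        for split, names in splits.items():
            out_dir = os.path.join(dst_root, split, category)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_root, category, name),
                            os.path.join(out_dir, name))
        counts[category] = (len(splits['train']), len(splits['val']))
    return counts

# Demo on synthetic files: 10 images in one category -> 8 train / 2 val
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
os.makedirs(os.path.join(src, 'no_tumor'))
for i in range(10):
    open(os.path.join(src, 'no_tumor', f'img_{i}.jpg'), 'w').close()
counts = split_dataset(src, dst, ['no_tumor'])
print(counts)
```

The fixed seed keeps the split reproducible across runs, and the resulting train/ and val/ layout is the directory structure most Keras and PyTorch folder-based loaders expect.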
If you want a more advanced version with CLAHE, denoising, edge detection, or auto-segmentation, let me know in the comments and I can cover that too.