
Advanced Image Preprocessing & Augmentation Pipeline for Agriculture Crop Disease Datasets (15 Techniques Applied)


Working with agricultural images—especially fruit and leaf disease datasets—comes with its own set of challenges. Lighting variations, inconsistent contrast, noise, and different camera positions can easily confuse a machine-learning model. To handle this, here is a complete preprocessing and augmentation workflow designed for agriculture datasets such as apples, mangoes, grapes, wheat leaves, rice leaves, and more.

This pipeline writes 15 versions of each image (the resized original plus 14 transformed variants), giving you a larger, more varied dataset for training deep-learning models such as CNNs, Vision Transformers, and hybrid SVM-ViT models.


1. Install the Required Python Packages

We install only the necessary tools such as OpenCV, Pillow, and tqdm. These handle image processing, resizing, and progress visualization.

    # 📦 Install Required Packages
    !pip install opencv-python-headless Pillow tqdm

2. Importing Libraries

Next, we import OpenCV, NumPy, PIL, and Google Drive utilities. These are essential for reading images, preprocessing them, and saving the outputs.

    # 📚 Import Libraries
    import os
    import shutil

    import cv2
    import numpy as np
    from tqdm import tqdm
    from PIL import Image, ImageOps
    from google.colab import drive

3. Mount Google Drive

Most users keep datasets inside Drive, so we mount it here for easy access.

    # 🔗 Mount Google Drive
    drive.mount('/content/drive')
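The mount call only works inside Google Colab, because the `google.colab` module does not exist elsewhere. If you also want to run the script on a local machine, one option is to guard the mount. This is a minimal sketch; the helper name `mount_drive_if_colab` is hypothetical and not part of the original script.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere.

    Returns True if the mount call was issued, False otherwise.
    """
    try:
        # google.colab is only importable inside the Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)
    return True
```

Outside Colab the function returns False, and you would point `input_path` at a local folder instead of a Drive path.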

4. Define Input and Output Paths

You can replace the folder names based on your own dataset directory. The script creates a clean output folder to store all processed images.

    # 📁 Define Paths
    input_path = '/content/drive/MyDrive/Apple_Fruit_Dataset'
    output_path = '/content/processed_apple_dataset'
    os.makedirs(output_path, exist_ok=True)
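Before processing, it can help to confirm that the input folder exists and to read the class names from its subfolders rather than typing them by hand. The sketch below assumes the dataset uses one subfolder per class; `list_class_folders` is a hypothetical helper, not something the original script defines.

```python
from pathlib import Path


def list_class_folders(root):
    """Return the sorted names of the immediate subfolders of `root`.

    Assumes one subfolder per class (for example healthy/ and rot/).
    Raises FileNotFoundError if `root` is not an existing directory.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Input folder not found: {root}")
    # Plain files in the root are ignored; only directories count as classes.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

For example, `list_class_folders(input_path)` fails early with a clear message if the Drive path is mistyped.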

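We resize every image to 256×256, a common input size for CNN-based models. Change TARGET_SIZE if your model expects a different size, such as 224×224.

    TARGET_SIZE = (256, 256)  # (width, height), as expected by cv2.resize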
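Passing `TARGET_SIZE` straight to `cv2.resize` stretches non-square photos to a square. If that distortion matters for your data, one alternative is to scale the image to fit and pad the remainder (letterboxing). The sketch below computes only the geometry in plain Python; `letterbox_geometry` is a hypothetical helper, and centering the image is an assumption.

```python
def letterbox_geometry(width, height, target=(256, 256)):
    """Compute an aspect-preserving resize and the padding needed to reach `target`.

    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    target_w, target_h = target
    # Use the smaller scale factor so the image fits in both dimensions.
    scale = min(target_w / width, target_h / height)
    new_w = min(target_w, max(1, round(width * scale)))
    new_h = min(target_h, max(1, round(height * scale)))
    pad_w = target_w - new_w
    pad_h = target_h - new_h
    # Split the padding evenly; any odd pixel goes to the right or bottom.
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top
```

The result can be fed to `cv2.resize(img, (new_w, new_h))` followed by `cv2.copyMakeBorder(resized, pad_top, pad_bottom, pad_left, pad_right, cv2.BORDER_CONSTANT, value=(0, 0, 0))`. The black padding color is an arbitrary choice.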

5. The 15-Step Preprocessing & Augmentation Function

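This function returns 15 versions of every image: the resized original plus 14 transformed variants. These are grayscale, binary and adaptive thresholding, CLAHE, Gaussian, median, and bilateral filtering, Canny and Sobel edge detection, gamma correction, a fixed HSV saturation and brightness shift, a 30° rotation, and a horizontal flip.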

This diversity helps the model learn real-world variations in crops and diseases.

    # 🔧 Define 15 Preprocessing Functions
    def advanced_preprocess_15(img):
        processed = []

        # 1. Original resized
        processed.append(img)

        # 2. Grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        processed.append(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))

        # 3. Black & White Threshold
        _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        processed.append(cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR))

        # 4. CLAHE (Contrast Enhancement)
        lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        cl = cv2.createCLAHE(clipLimit=2.0).apply(l)
        limg = cv2.merge((cl, a, b))
        processed.append(cv2.cvtColor(limg, cv2.COLOR_LAB2BGR))

        # 5. Gaussian Blur
        processed.append(cv2.GaussianBlur(img, (5, 5), 0))

        # 6. Median Blur
        processed.append(cv2.medianBlur(img, 5))

        # 7. Bilateral Filter
        processed.append(cv2.bilateralFilter(img, 9, 75, 75))

        # 8. Adaptive Mean Threshold
        th_mean = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                        cv2.THRESH_BINARY, 11, 2)
        processed.append(cv2.cvtColor(th_mean, cv2.COLOR_GRAY2BGR))

        # 9. Adaptive Gaussian Threshold
        th_gauss = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                         cv2.THRESH_BINARY, 11, 2)
        processed.append(cv2.cvtColor(th_gauss, cv2.COLOR_GRAY2BGR))

        # 10. Canny Edge Detection
        canny = cv2.Canny(gray, 100, 200)
        processed.append(cv2.cvtColor(canny, cv2.COLOR_GRAY2BGR))

        # 11. Sobel Edge Detection
        sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=5)
        sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=5)
        sobel = cv2.magnitude(sobelx, sobely)
        sobel = np.uint8(np.clip(sobel, 0, 255))
        processed.append(cv2.cvtColor(sobel, cv2.COLOR_GRAY2BGR))

        # 12. Gamma Correction
        gamma = 1.5
        invGamma = 1.0 / gamma
        table = np.array([((i / 255.0) ** invGamma) * 255
                          for i in np.arange(256)]).astype("uint8")
        gamma_corrected = cv2.LUT(img, table)
        processed.append(gamma_corrected)

        # 13. HSV Jitter
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h_, s_, v_ = cv2.split(hsv)
        s_ = cv2.add(s_, 20)
        v_ = cv2.add(v_, 20)
        hsv_merged = cv2.merge([h_, s_, v_])
        processed.append(cv2.cvtColor(hsv_merged, cv2.COLOR_HSV2BGR))

        # 14. Rotation (+30°)
        M1 = cv2.getRotationMatrix2D((TARGET_SIZE[0] // 2, TARGET_SIZE[1] // 2), 30, 1.0)
        rot1 = cv2.warpAffine(img, M1, TARGET_SIZE)
        processed.append(rot1)

        # 15. Horizontal Flip
        processed.append(cv2.flip(img, 1))

        return processed
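With `ksize=5`, Sobel magnitudes on 8-bit images often exceed 255, so the clipping in step 11 can turn many edge pixels fully white. One possible alternative is to rescale the magnitude to the full 0 to 255 range before converting to `uint8`. The NumPy-only sketch below shows that idea; `scale_to_uint8` is a hypothetical helper, and min-max scaling is an assumption rather than the original author's method.

```python
import numpy as np


def scale_to_uint8(values):
    """Linearly rescale an array to the range 0..255 and return it as uint8.

    A constant array maps to all zeros to avoid dividing by zero.
    """
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros(values.shape, dtype=np.uint8)
    scaled = (values - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```

In the function above, this would replace the clipping line with `sobel = scale_to_uint8(cv2.magnitude(sobelx, sobely))`. Note that min-max scaling makes edge brightness relative to each image, which may or may not suit your model.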

6. Processing Agriculture Dataset Categories

Here, our dataset has two categories, healthy and rot, each stored in its own subfolder of the input path.

You can add more classes easily—like scab, leaf spot, canker, anthracnose, etc.

    # ⚙️ Process Each Image
    categories = ['healthy', 'rot']
    image_count = 0

    for category in categories:
        input_folder = os.path.join(input_path, category)
        output_folder = os.path.join(output_path, category)
        os.makedirs(output_folder, exist_ok=True)

        for img_name in tqdm(os.listdir(input_folder), desc=f"Processing {category}"):
            img_path = os.path.join(input_folder, img_name)
            img = cv2.imread(img_path)
            if img is None:
                # Skip files that OpenCV cannot read as images
                continue

            img_resized = cv2.resize(img, TARGET_SIZE)
            processed_imgs = advanced_preprocess_15(img_resized)

            for i, proc_img in enumerate(processed_imgs):
                out_filename = f"abf_{category}_{image_count:04d}_v{i+1}.jpg"
                cv2.imwrite(os.path.join(output_folder, out_filename), proc_img)

            image_count += 1
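All 15 variants of one photo are near-duplicates, so a random split over the output files can place variants of the same photo in both the training and validation sets. The sketch below keeps each source image's variants together. It assumes the `abf_{category}_{NNNN}_v{k}.jpg` naming used by the loop; `split_by_source` and the 20% validation fraction are illustrative choices, not part of the original script.

```python
import random
import re
from collections import defaultdict

# Matches names such as "abf_healthy_0007_v3.jpg" and captures the source id.
_NAME_RE = re.compile(r"^(abf_.+_\d+)_v\d+\.jpg$")


def split_by_source(filenames, val_fraction=0.2, seed=0):
    """Split variant filenames into (train, val) lists, grouped by source image."""
    groups = defaultdict(list)
    for name in filenames:
        match = _NAME_RE.match(name)
        if match:  # files that do not follow the naming scheme are skipped
            groups[match.group(1)].append(name)

    # Shuffle source ids (not individual files) with a fixed seed.
    sources = sorted(groups)
    random.Random(seed).shuffle(sources)
    n_val = int(len(sources) * val_fraction)
    val_sources = set(sources[:n_val])

    train = [n for s in sources if s not in val_sources for n in groups[s]]
    val = [n for s in sources if s in val_sources for n in groups[s]]
    return train, val
```

You would call it once per class folder, for example `train, val = split_by_source(os.listdir(output_folder))`, so that every class appears in both sets.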

7. Exporting the Final Dataset

The complete processed dataset is compressed into a ZIP file for easy download and training.

    # 📦 Zip the Folder
    shutil.make_archive('/content/processed_apple_dataset', 'zip', output_path)
    print("✅ Done: Resized, preprocessed, and zipped 15-image variants per raw input.")
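A quick sanity check after zipping is to count the images in the archive and compare the result with 15 times the number of readable input images. This is a standard-library sketch; `count_zip_images` is a hypothetical helper.

```python
import zipfile


def count_zip_images(zip_path, suffix=".jpg"):
    """Count archive members whose names end with `suffix` (case-insensitive)."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(
            1 for name in archive.namelist() if name.lower().endswith(suffix)
        )
```

With the paths used here, the call would be `count_zip_images('/content/processed_apple_dataset.zip')`, since `shutil.make_archive` appends the `.zip` extension to the base name.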

Final Thoughts

This 15-step pipeline applies contrast enhancement, noise-reducing filters, thresholding, edge extraction, and simple color and geometric changes to every image, so each input photo yields 15 output files at a consistent size.

Transformations like these can help deep-learning models generalize across the lighting, contrast, and camera-angle differences found in field photos. Whether you're working on apple fruit rot detection, leaf disease classification, or smart farming applications, this workflow gives you consistently sized, augmented images ready for training.

