Ensemble and Multi-Scale Inference Strategies for Improving Tile-Based Defect Segmentation in Industrial X-Ray Images

Background and Motivation

The current defect segmentation pipeline for industrial X-ray images applies a pretrained U-Net model tile by tile. Each large image is split into overlapping tiles, inference is run on each tile, and the per-tile results are merged into a full-size prediction mask. Early observations show that small changes in tile position, downsampling, or image preprocessing can lead to different segmentation outcomes. This suggests that the system may benefit from ensemble-style inference, in which predictions from several slightly different versions of the same input are combined. This project explores whether multi-scale inference, tile-shift ensembles, or merged probability maps can improve defect detection performance or reduce false positives, all without retraining the model.
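
As a rough illustration of the tile-based procedure described above, the following Python sketch splits an image into overlapping tiles, runs a prediction function on each tile, and averages the probabilities wherever tiles overlap. It is a minimal sketch under stated assumptions, not the pipeline's actual code: predict_fn is a hypothetical stand-in for the pretrained U-Net, the tile size and stride are placeholder values, and averaging in the overlap regions is only one possible merge rule.

```python
import numpy as np


def tile_starts(length, tile, stride, offset=0):
    """Start indices of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(offset, length - tile + 1, stride))
    # Always keep a tile at the leading edge and one at the trailing edge,
    # so a non-zero offset never leaves border pixels uncovered.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def predict_tiled(image, predict_fn, tile=256, stride=192, offset=(0, 0)):
    """Run predict_fn on overlapping tiles and average the overlaps.

    predict_fn is a hypothetical callable that maps a 2-D tile to a
    probability map of the same shape (standing in for the U-Net).
    """
    h, w = image.shape
    if h < tile or w < tile:
        raise ValueError("image must be at least one tile in each dimension")
    prob_sum = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, tile, stride, offset[0]):
        for x in tile_starts(w, tile, stride, offset[1]):
            patch = image[y:y + tile, x:x + tile]
            prob_sum[y:y + tile, x:x + tile] += predict_fn(patch)
            count[y:y + tile, x:x + tile] += 1.0
    # Every pixel is covered by at least one tile, so count is never zero.
    return prob_sum / count
```

Calling predict_tiled several times with different offset values yields the individual members of a tile-shift ensemble, which can then be combined in a separate step.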

Problem Definition

Tile-based segmentation can be sensitive to tile alignment and image scale. Different tiling offsets or scales may reveal additional defect pixels or change the false positive rate. However, it is not yet known how these predictions are best combined. The goal is to systematically evaluate multiple ensemble and merging strategies and determine which gives the best trade-off between defect detection and false positives.
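
To make the scale dependence concrete, the sketch below runs a prediction function on a downsampled copy of the image and brings the result back to the original resolution. It is only an illustration under assumptions: predict_fn is again a hypothetical stand-in for the model, downsampling is done by block averaging, and upsampling is nearest-neighbor; the project may well use a different resampling method.

```python
import numpy as np


def downsample_mean(image, factor):
    """Downsample a 2-D image by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are cropped.
    """
    h, w = image.shape
    h_c, w_c = h - h % factor, w - w % factor
    blocks = image[:h_c, :w_c].reshape(h_c // factor, factor, w_c // factor, factor)
    return blocks.mean(axis=(1, 3))


def upsample_nearest(prob, factor, out_shape):
    """Nearest-neighbor upsampling, edge-padded back to out_shape."""
    up = np.repeat(np.repeat(prob, factor, axis=0), factor, axis=1)
    pad_h = out_shape[0] - up.shape[0]
    pad_w = out_shape[1] - up.shape[1]
    return np.pad(up, ((0, pad_h), (0, pad_w)), mode="edge")


def predict_at_scale(image, predict_fn, factor):
    """Predict on a downsampled image and return a full-resolution map.

    predict_fn is a hypothetical callable mapping a 2-D image to a
    probability map of the same shape.
    """
    if factor == 1:
        return predict_fn(image)
    small = downsample_mean(image, factor)
    return upsample_nearest(predict_fn(small), factor, image.shape)
```

Because the output has the same shape as the input, maps produced at different scales can be compared pixel by pixel or passed to a common merging step.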

Objectives

1. Test whether shifting tile positions (tile-offset inference) leads to changes in predicted defect masks.

2. Evaluate multi-scale inference by comparing predictions from original and downsampled images.

3. Implement merging strategies for combining probabilistic outputs from multiple inference runs.

4. Benchmark all strategies using quantitative metrics: Dice coefficient, intersection over union (IoU), precision, recall, and false positive (FP) count.

5. Identify which ensemble or merging technique improves detection quality without retraining.
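
Objectives 3 and 4 could be prototyped along the following lines. This is a hedged sketch rather than a prescribed design: the three merge rules (mean, max, and vote fraction) are common candidates and not a list taken from the project, the 0.5 threshold is a placeholder, and the FP count is assumed here to mean false positive pixels; a count of false positive regions would need connected-component analysis instead.

```python
import numpy as np


def merge_probabilities(prob_maps, method="mean", threshold=0.5):
    """Combine same-shaped probability maps from several inference runs."""
    stack = np.stack(prob_maps, axis=0)
    if method == "mean":
        return stack.mean(axis=0)
    if method == "max":
        return stack.max(axis=0)
    if method == "vote":
        # Fraction of runs that flag each pixel as defect.
        return (stack >= threshold).mean(axis=0)
    raise ValueError(f"unknown merge method: {method}")


def segmentation_metrics(pred_mask, true_mask):
    """Pixel-level Dice, IoU, precision, recall and false positive count."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & true))
    fp = int(np.sum(pred & ~true))
    fn = int(np.sum(~pred & true))

    def ratio(num, den):
        # Undefined ratios (empty masks) are reported as NaN, not 0 or 1.
        return num / den if den > 0 else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fp_pixels": fp,  # assumption: FP counted per pixel, not per region
    }
```

A benchmark would then threshold each merged map, for example merged >= 0.5, and pass the resulting mask together with the ground-truth mask to segmentation_metrics, repeating this for every strategy under comparison.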