Florin Ghesu
Artificial Intelligence for Medical Image Understanding
Abstract
Robust and fast detection and segmentation of anatomical structures in medical image data is an important component of medical image analysis technologies. Current solutions are typically based on machine-learning techniques that exploit large annotated image databases to learn the appearance of the captured anatomy. These solutions are subject to several limitations, including the use of suboptimal image-feature engineering and, most importantly, computationally inefficient search schemes for anatomy parsing, e.g., exhaustive hypothesis scanning. In particular, such techniques do not effectively handle incomplete data, i.e., scans acquired with a partial field-of-view.

To address these challenges, this thesis introduces marginal space deep learning, a framework for medical image parsing that exploits the automated feature design of deep learning models and an efficient object parametrization in hierarchical marginal spaces. To support the efficient evaluation of solution hypotheses under complex transformations, such as rotation and anisotropic scaling, we propose a novel cascaded network architecture, called the sparse adaptive neural network. Experiments on detecting and segmenting the aortic root in 2891 3D ultrasound volumes from 869 patients demonstrate a high level of robustness, with an accuracy increase of 30-50% over the state-of-the-art. Nevertheless, using a scanning routine to explore large parameter subspaces remains computationally expensive, produces false-positive predictions, and scales poorly to high-resolution volumetric data.

To overcome these limitations, we propose a novel paradigm for medical image parsing based on principles of cognitive modeling and behavior learning. The anatomy-detection problem is reformulated as a behavior-learning task for an intelligent artificial agent: using deep reinforcement learning, the agent is taught how to search for an anatomical structure. This amounts to learning to navigate search trajectories through the image space that converge to the locations of the sought anatomical structures. To support the effective parsing of high-resolution volumetric data, we apply elements of scale-space theory and extend the framework to learn multi-scale search strategies through the scale-space representation of medical images. Finally, to accurately recognize whether anatomical landmarks are missing from the field-of-view, we exploit prior knowledge about the anatomy and enforce the spatial coherence of the agents using statistical shape modeling and robust estimation theory.

Comprehensive experiments demonstrate a high level of accuracy compared to state-of-the-art solutions, without failures of clinical significance. In particular, our method achieves 0% false-positive and 0% false-negative rates at detecting whether anatomical structures are captured in the field-of-view (excluding border cases), on a dataset of 5043 3D computed tomography volumes from over 2000 patients, totaling over 2,500,000 image slices. A significant increase in accuracy over reference solutions is also achieved on additional 2D ultrasound and 2D/3D magnetic resonance datasets containing up to 1000 images. Most importantly, this paradigm improves the detection speed of previous solutions by two to three orders of magnitude, achieving real-time performance on high-resolution volumetric scans.
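To illustrate the agent-based search paradigm, the sketch below shows how a trained action-value (Q) network might drive a greedy landmark search in a 3D volume. This is a minimal illustration, not the thesis implementation: the q_network callable, the six-action move set, the 25-voxel patch size, and the oscillation-based stopping rule are assumptions made for the example.

```python
import numpy as np

# Hypothetical action set: one-voxel steps along each axis (+/- x, y, z).
ACTIONS = np.array([[1, 0, 0], [-1, 0, 0],
                    [0, 1, 0], [0, -1, 0],
                    [0, 0, 1], [0, 0, -1]])

def extract_patch(volume, pos, size=25):
    """Crop a cubic intensity patch centered at the agent's position,
    clamping the center so the patch stays inside the volume
    (assumes the volume is larger than the patch)."""
    r = size // 2
    lo = [min(max(int(p), r), s - r - 1) for p, s in zip(pos, volume.shape)]
    return volume[lo[0]-r:lo[0]+r+1, lo[1]-r:lo[1]+r+1, lo[2]-r:lo[2]+r+1]

def greedy_search(volume, q_network, start, max_steps=200):
    """Follow the learned policy greedily: at each step, take the action
    with the highest predicted Q-value. Revisiting a recent position
    (a small cycle) is treated as convergence to the landmark."""
    pos = np.asarray(start, dtype=int)
    visited = [tuple(pos)]
    for _ in range(max_steps):
        state = extract_patch(volume, pos)      # agent's local view
        q_values = q_network(state)             # six action-values (assumed)
        pos = pos + ACTIONS[int(np.argmax(q_values))]
        if tuple(pos) in visited[-4:]:          # oscillation => converged
            break
        visited.append(tuple(pos))
    return pos
```

The multi-scale extension can be sketched in the same spirit: detection starts on the coarsest level of a Gaussian scale-space pyramid, and the estimate is projected to successively finer levels, each searched by that level's agent (the per-level q_networks list is again an assumption):

```python
from scipy.ndimage import gaussian_filter, zoom

def scale_space_search(volume, q_networks, start_coarse, n_levels=3):
    """Coarse-to-fine search through a discrete scale-space: detect on the
    coarsest level first, then double the position estimate and refine it
    on the next finer level."""
    # Build a Gaussian pyramid: smooth, then downsample by 2 per level.
    pyramid = [volume]
    for _ in range(n_levels - 1):
        pyramid.append(zoom(gaussian_filter(pyramid[-1], sigma=1.0), 0.5, order=1))
    pos = np.asarray(start_coarse, dtype=int)
    for level in reversed(range(n_levels)):     # coarsest -> finest
        pos = greedy_search(pyramid[level], q_networks[level], pos)
        if level > 0:
            pos = pos * 2                       # project onto finer grid
    return pos
```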
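The shape-based coherence check can likewise be illustrated with a simplified stand-in. Where the thesis relies on statistical shape modeling and robust estimation theory, the sketch below substitutes a deliberately reduced version: an iteratively reweighted least-squares fit of scale and translation (rotation omitted for brevity) that maps a mean shape onto the detected landmark positions, flags detections that contradict the shape model as outliers, and marks landmarks whose predicted position falls outside the volume as missing from the field-of-view. The function names and the residual threshold are assumptions.

```python
import numpy as np

def fit_similarity(mean_shape, points, weights):
    """Weighted least-squares fit of scale + translation mapping the mean
    shape onto detected landmarks (a full fit would also estimate rotation,
    e.g., via weighted Procrustes analysis)."""
    w = weights / weights.sum()
    mu_m = (w[:, None] * mean_shape).sum(axis=0)
    mu_p = (w[:, None] * points).sum(axis=0)
    mc, pc = mean_shape - mu_m, points - mu_p
    s = (w * (mc * pc).sum(axis=1)).sum() / (w * (mc * mc).sum(axis=1)).sum()
    t = mu_p - s * mu_m
    return s, t

def check_coherence(mean_shape, detections, volume_shape, n_iters=5, thresh=10.0):
    """Iteratively reweighted fitting: down-weight detections that disagree
    with the shape model, then flag incoherent detections as outliers and
    mark landmarks whose predicted position leaves the field-of-view."""
    weights = np.ones(len(detections))
    for _ in range(n_iters):
        s, t = fit_similarity(mean_shape, detections, weights)
        residuals = np.linalg.norm(s * mean_shape + t - detections, axis=1)
        weights = 1.0 / (1.0 + residuals)       # simple robust reweighting
    predicted = s * mean_shape + t
    outlier = residuals > thresh                # detection contradicts shape
    outside = ((predicted < 0) |
               (predicted >= np.array(volume_shape))).any(axis=1)
    return predicted, outlier, outside
```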