End-to-End SMPL Parameter Estimation and Template-to-Scan Registration Tailored Towards Automatic X-ray to CT Initialization

Abstract: In interventional image fusion, an automatic initialization process is highly desirable to reduce reliance on manual user input. Several recent techniques use anatomical landmarks to provide an efficient initialization for 2D/3D registration [1, 2]. Alternatively, the 6D pose of the CT volume can be regressed directly from the input X-ray image. Each of these approaches, however, faces difficulties in gathering enough ground truth data to train a sufficiently robust system [3].
In addition to these methods, a growing line of research investigates the use of statistical shape models to provide an effective initialization for 2D/3D registration algorithms. Given X-ray images paired with ground truth statistical shape model parameters, several approaches can be employed to estimate these parameters. The challenge, however, lies in establishing a ground truth pairing between the CT volumes and the statistical shape model, so that a network can be learned that is directly applicable to 2D/3D initialization. In essence, we aim to consolidate the current two-step process, which first establishes pseudo ground truth parameters and then trains a network to predict parameters matching this pseudo ground truth, into a single end-to-end step. This has the potential to reduce the systematic errors incurred while establishing the pseudo ground truth, providing a more robust initialization that is directly suitable for 2D/3D registration tasks.
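To make the first step of the two-step process concrete: for a shape model whose shape term is linear in its coefficients (as SMPL's shape blend shapes are in β), fitting the coefficients to a scan whose points are already in correspondence with the template is an ordinary least-squares problem. The sketch below assumes known correspondences and uses synthetic data; `fit_shape_coefficients` is a hypothetical helper, and the nonlinear, pose-dependent part of SMPL is deliberately left out.

```python
import numpy as np


def fit_shape_coefficients(mean_shape, shape_basis, scan_points):
    """Least-squares fit of linear shape coefficients (illustrative sketch).

    mean_shape:  (V, 3) template vertices
    shape_basis: (K, V, 3) shape displacement directions (e.g. PCA modes)
    scan_points: (V, 3) scan points assumed to correspond to template vertices
    Returns beta (K,) minimizing ||mean + sum_k beta_k * B_k - scan||^2.
    """
    num_modes = shape_basis.shape[0]
    # Flatten each mode into a column so the model reads x = t + A @ beta.
    A = shape_basis.reshape(num_modes, -1).T           # (3V, K)
    residual = (scan_points - mean_shape).reshape(-1)  # (3V,)
    beta, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return beta


# Synthetic example: build a "scan" from known coefficients plus small noise.
rng = np.random.default_rng(0)
V, K = 500, 10
mean_shape = rng.normal(size=(V, 3))
shape_basis = rng.normal(size=(K, V, 3))
true_beta = rng.normal(size=K)
scan = mean_shape + np.tensordot(true_beta, shape_basis, axes=1)
scan += 0.01 * rng.normal(size=scan.shape)
beta_hat = fit_shape_coefficients(mean_shape, shape_basis, scan)
print("max coefficient error:", np.abs(beta_hat - true_beta).max())
```

Any error in such a fit (wrong correspondences, model mismatch) is baked into the pseudo ground truth that the second step then learns to reproduce, which is the systematic error an end-to-end formulation aims to avoid.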
Going forward, we plan to build the proposed method on a pseudo-linear inverse kinematic solver [4], combined with a self-supervised template-to-scan registration process [5]. This will enable end-to-end learning of both the SMPL parameters and the parameters that register the model to the input scan (CT volume). As a result, the CT-volume-specific 3D initialization can be predicted directly from a given X-ray image, without an additional explicit step to register the CT volume with the statistical shape model.
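Part of the appeal of a closed-form (linear-algebraic) solver in an end-to-end pipeline is that the registration result follows directly from the network's predicted correspondences, so the same operations written in an autodiff framework let gradients flow through the solve. The solver of [4] is not reproduced here. As a simpler stand-in, the sketch below uses the classic Kabsch/Umeyama solution (without scale) to recover a rigid 6-DoF pose between template points and corresponding scan points, such as correspondences a network in the spirit of [5] might predict. The point sets are synthetic and `rigid_align` is a hypothetical helper name.

```python
import numpy as np


def rigid_align(source, target):
    """Closed-form rigid alignment (Kabsch / Umeyama without scale).

    Finds rotation R (3x3) and translation t (3,) minimizing
    sum_i ||R @ source[i] + t - target[i]||^2 for corresponding points.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean
    # Cross-covariance between the centered point sets.
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular direction if needed so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


# Synthetic example: rotate and translate template points, then recover the pose.
rng = np.random.default_rng(1)
template_pts = rng.normal(size=(200, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1  # make Q a proper rotation (det = +1)
true_t = np.array([10.0, -5.0, 2.0])
scan_pts = template_pts @ Q.T + true_t
R_hat, t_hat = rigid_align(template_pts, scan_pts)
print(np.round(R_hat, 3))
print(np.round(t_hat, 3))
```

A rigid transform is only the global part of the problem; the articulated SMPL pose and shape parameters are what an inverse kinematic solver such as [4] must additionally recover.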

References:
[1] Bier, B., Unberath, M., Zaech, J. N., Fotouhi, J., Armand, M., Osgood, G., … & Maier, A. (2018, September). X-ray-transform invariant anatomical landmark detection for pelvic trauma surgery. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 55-63). Springer, Cham.
[2] Grupp, R. B., Unberath, M., Gao, C., Hegeman, R. A., Murphy, R. J., Alexander, C. P., … & Taylor, R. H. (2020). Automatic annotation of hip anatomy in fluoroscopy for robust and efficient 2D/3D registration. International Journal of Computer Assisted Radiology and Surgery, 15(5), 759-769.
[3] Grimm, M., Esteban, J., Unberath, M., & Navab, N. (2021). Pose-dependent weights and Domain Randomization for fully automatic X-ray to CT Registration. IEEE Transactions on Medical Imaging.
[4] Shetty, K., et al. (2023). PLIKS: A pseudo-linear inverse kinematic solver for 3D human body estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Bhatnagar, B. L., et al. (2020). LoopReg: Self-supervised learning of implicit surface correspondences, pose and shape for 3D human mesh registration. Advances in Neural Information Processing Systems, 33, 12909-12922.