Metal-conscious Transformer Enhanced CBCT Projection Inpainting

Computed tomography (CT) is a tomographic imaging technology that has developed rapidly. Because of the
beam hardening effect, metallic objects produce artifacts that degrade the quality of CT images. Metal
artifacts remain a central and difficult topic in CT imaging research because they directly affect clinical
diagnosis and because their manifestations and causes are diverse [1]. Inpainting the metal-affected regions
is therefore an essential step in reconstructing metal-free CT images.
The traditional inpainting method replaces the metal-affected region of the projection data by interpolation
[2][3]. Recently, deep convolutional neural networks (CNNs) have shown strong potential in many computer
vision tasks, including image inpainting. Several approaches to image restoration use a CNN-based
encoder-decoder network. Shift-Net, which is based on the U-Net architecture, is one of them and restores
structure and texture accurately [4]. Zeng et al. [5] built a pyramid-context encoder network called PEN-Net
for high-quality image inpainting. Liao et al. [6] proposed a generative mask pyramid network for CT/CBCT
metal artifact reduction. Although CNNs have many advantages, their receptive field is usually small, which
makes it hard to capture global features. In contrast, the Vision Transformer (ViT) uses attention to model
long-range dependencies among image patches [7]. The shifted-window Transformer (Swin Transformer) was
proposed to adapt Transformers to the high resolution of images in vision tasks [8]; it takes into account
the translation invariance of CNNs, the receptive field, and hierarchical feature relationships.
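The interpolation-based baseline mentioned at the start of this paragraph can be illustrated with a minimal
sketch. It assumes that each projection is stored as a list of detector rows and that a binary metal mask of
the same shape is already available; the function names inpaint_row_linear and inpaint_projection_linear are
hypothetical and do not come from [2] or [3]. It shows plain linear interpolation along each row, not the
normalized variant of [3].

```python
from typing import List, Sequence


def inpaint_row_linear(row: Sequence[float], metal: Sequence[bool]) -> List[float]:
    """Fill metal-affected samples of one detector row by linear interpolation
    between the nearest unaffected samples on either side (illustrative only)."""
    n = len(row)
    if len(metal) != n:
        raise ValueError("row and metal mask must have the same length")
    out = [float(v) for v in row]
    i = 0
    while i < n:
        if not metal[i]:
            i += 1
            continue
        # Locate the contiguous metal segment [i, j).
        j = i
        while j < n and metal[j]:
            j += 1
        left, right = i - 1, j  # nearest clean neighbours (may be out of range)
        if left < 0 and right >= n:
            raise ValueError("row contains no unaffected samples")
        for k in range(i, j):
            if left < 0:        # segment touches the left border: copy right value
                out[k] = out[right]
            elif right >= n:    # segment touches the right border: copy left value
                out[k] = out[left]
            else:               # interior segment: linear blend of both neighbours
                t = (k - left) / (right - left)
                out[k] = (1.0 - t) * out[left] + t * out[right]
        i = j
    return out


def inpaint_projection_linear(
    projection: Sequence[Sequence[float]], mask: Sequence[Sequence[bool]]
) -> List[List[float]]:
    """Apply the row-wise interpolation to every detector row of a projection."""
    if len(projection) != len(mask):
        raise ValueError("projection and mask must have the same number of rows")
    return [inpaint_row_linear(r, m) for r, m in zip(projection, mask)]
```

For example, calling inpaint_row_linear([1.0, 2.0, 99.0, 99.0, 5.0], [False, False, True, True, False])
replaces the two masked samples with values on the straight line between 2.0 and 5.0. Such a baseline ignores
image structure inside the metal trace, which is the gap that learned inpainting models aim to close.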
To address the shortage of medical image data and the domain shift problem in deep learning, this research is
based on simulated X-ray images and uses a ViT as the encoder and a CNN as the decoder for image inpainting.
To further improve inpainting performance, variants of the backbone network are considered, such as
replacing the ViT with a Swin Transformer and adding an adversarial loss.
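One common way to combine a reconstruction objective with an adversarial term is sketched here for
illustration only; the loss actually used in this work is not specified above. All symbols are assumptions
introduced for this sketch: G is the inpainting network, D a discriminator, x the metal-corrupted projection,
y the metal-free target, M the binary metal mask, and \lambda_{\mathrm{adv}} a weighting factor.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{rec}} &= \frac{\lVert M \odot (G(x) - y) \rVert_1}{\lVert M \rVert_1}, \\
\mathcal{L}_{\mathrm{adv}} &= -\,\mathbb{E}_{x}\big[\log D(G(x))\big], \\
\mathcal{L}_{G} &= \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}.
\end{aligned}
```

In this formulation the reconstruction term is evaluated only inside the metal trace, and the adversarial
term is the non-saturating generator loss; other choices for either term are equally possible.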
The thesis will cover the following points:
• Literature review of inpainting and metal artifact reduction.
• Implementation of the traditional method and a CNN-based model.
• ViT-based model construction; parameter optimization and incorporation of an adversarial loss; evaluation
of results.
• Thesis writing.

References
[1] Netto, C., Mansur, N., Tazegul, T., Lalevee, M., Lee, H., Behrens, A., Lintz, F., Godoy-Santos, A., Dibbern,
K., Anderson, D. Implant Related Artifact Around Metallic and Bio-Integrative Screws: A CT Scan 3D
Hounsfield Unit Assessment. Foot & Ankle Orthopaedics. 7, 2473011421S00174 (2022)
[2] Kalender WA, Hebel R, Ebersberger J. Reduction of CT artifacts caused by metallic implants. Radiology.
1987 Aug;164(2):576-7. doi: 10.1148/radiology.164.2.3602406. PMID: 3602406.
[3] Meyer E, Raupach R, Lell M, Schmidt B, Kachelriess M. Normalized metal artifact reduction (NMAR) in
computed tomography. Med Phys. 2010 Oct;37(10):5482-93. doi: 10.1118/1.3484090. PMID: 21089784.
[4] Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, and Shiguang Shan. Shift-net: Image inpainting via
deep feature rearrangement, 2018.
[5] Yanhong Zeng, Jianlong Fu, Hongyang Chao, and Baining Guo. Learning pyramid-context encoder
network for high-quality image inpainting, 2019.
[6] Haofu Liao, Wei-An Lin, Zhimin Huo, Levon Vogelsang, William J Sehnert, S Kevin Zhou, and Jiebo Luo.
Generative mask pyramid network for CT/CBCT metal artifact reduction with joint projection-sinogram
correction. In International Conference on Medical Image Computing and Computer-Assisted Intervention,
pages 77–85. Springer, 2019.
[7] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas
Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth
16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[8] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin
transformer: Hierarchical vision transformer using shifted windows, 2021.