Learning Object Skeletons with Differentiable Vector Primitives
Erlernen von Objektskeletten mittels differenzierbarer Vektorprimitiven
Thesis description
The aim of this thesis is to develop a method for object skeletonisation "in the wild" by exploiting differentiable
rendering of vector primitives. An object skeleton is a small set of strokes, linked as a graph, that captures how
object parts connect and where they bend [1]. It is a compact, editable representation that is useful for tasks in
visual recognition, tracking and image-based manipulation. Current skeletonisation systems, however, often fail
in cluttered scenes, break under occlusion, and confuse surface texture with the actual structure. In this thesis,
we propose to recover a short list of parametric strokes whose differentiable rendering explains the image.
This analysis-by-synthesis view lets us state explicitly what we expect of a skeleton, in particular smoothness,
connectivity, and efficiency, and then optimise for it. The approach is inspired by differentiable drawing of simple vector
primitives [2, 3], which provides a way to render curves with gradients, and is here adapted to skeletal topology
and natural imagery.
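One way to make this analysis-by-synthesis view concrete is to write it as an optimisation problem. The formulation below is only a sketch: the symbols \theta (stroke parameters), R (the differentiable renderer), E(I) (evidence extracted from the image I), \lambda (a weighting factor), and the names of the two loss terms are notation assumed here for illustration and are not taken from the cited works.

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\;
  \mathcal{L}_{\mathrm{img}}\bigl(R(\theta),\, E(I)\bigr)
  \;+\; \lambda\, \mathcal{L}_{\mathrm{struct}}(\theta)
```

Here the first term would measure how well the rendered strokes explain the image, and the second would collect the expectations on the skeleton itself (smoothness, connectivity, and efficiency).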
Methodologically, the project will use a differentiable rasteriser for curves and line segments as the core
tool [2, 4]. Candidate strokes, with learnable geometry and width, are rendered to soft images that can be
compared to evidence from the photograph, such as edges or a rough mask. Because the renderer is differentiable,
we can refine the stroke parameters by gradient-based optimisation. The training objective aims to balance two
aspects: (1) image agreement (the render should look like the observed structure) and (2) structural agreement
(the skeleton should be sparse, smooth, and well-connected). Strokes are encouraged to meet at junctions at
sensible angles, and unnecessary small loops are discouraged. To handle clutter and occlusion, the system will
allow simple layering, guided by features from a compact deep-learning-based encoder network. Inference will
proceed in two complementary ways. A first path proposes a small set of stroke seeds and tentative links,
then improves them through the renderer. A second, end-to-end path trains the proposal network using the
same rendering objective, so that the whole pipeline learns to produce strokes that render well. Robustness
will be improved with standard augmentation and a light auxiliary signal that maintains consistency between
the rendered skeleton features and the photograph. Evaluation will primarily measure centre-line accuracy and
may test whether the skeletons help with downstream tasks.
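The render-and-refine loop described above can be illustrated with a minimal, dependency-free sketch for a single line segment. Everything in it is an assumption made for illustration: the Gaussian falloff `exp(-d^2 / sigma^2)` of the pixel-to-segment distance, the mean-squared-error image term, the function names, and the step sizes. Gradients are approximated by central finite differences purely to avoid external libraries; in an actual implementation an automatic-differentiation framework would supply them, and the target would be evidence from a photograph rather than a synthetic render.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment from a to b."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def render_segment(params, size=16, sigma=1.0):
    """Soft raster of one segment: intensity decays smoothly with distance."""
    ax, ay, bx, by = params
    return [
        [
            math.exp(
                -point_segment_distance(x + 0.5, y + 0.5, ax, ay, bx, by) ** 2
                / sigma ** 2
            )
            for x in range(size)
        ]
        for y in range(size)
    ]


def image_loss(params, target):
    """Mean squared difference between the soft render and the target image."""
    render = render_segment(params, size=len(target))
    total = sum(
        (r - t) ** 2
        for render_row, target_row in zip(render, target)
        for r, t in zip(render_row, target_row)
    )
    return total / (len(target) * len(target[0]))


def numerical_gradient(params, target, eps=1e-4):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(params)):
        upper, lower = list(params), list(params)
        upper[i] += eps
        lower[i] -= eps
        grad.append((image_loss(upper, target) - image_loss(lower, target)) / (2 * eps))
    return grad


def fit_segment(init, target, steps=150, lr=50.0):
    """Gradient descent on the endpoints; a step is kept only if the loss drops."""
    params = list(init)
    loss = image_loss(params, target)
    for _ in range(steps):
        grad = numerical_gradient(params, target)
        candidate = [p - lr * g for p, g in zip(params, grad)]
        candidate_loss = image_loss(candidate, target)
        if candidate_loss < loss:
            params, loss = candidate, candidate_loss
        else:
            lr *= 0.5  # the step overshot: shrink it and retry from the same point
    return params, loss


# Synthetic "evidence": a soft render of a known segment (x0, y0, x1, y1).
target_params = [3.0, 4.0, 12.0, 11.0]
target = render_segment(target_params)

initial = [4.0, 6.0, 10.0, 9.0]
initial_loss = image_loss(initial, target)
fitted, final_loss = fit_segment(initial, target)
print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
print("fitted endpoints:", fitted)
```

Because the soft raster varies smoothly with the endpoint coordinates, small parameter changes produce small, informative changes in the loss, which is what makes gradient-based refinement possible. The structural terms (sparsity, smoothness, connectivity) are omitted here and would be added to `image_loss` as further penalties.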
Throughout the thesis, the following steps should be taken into consideration:
- Literature survey: read core related work on skeletonisation and differentiable drawing; set the problem
  scope, success criteria, etc.
- Datasets and metrics: choose training/validation/test sets, define quantitative evaluation metrics, and
  decide on baselines for comparison.
- Method development and implementation: implement a differentiable renderer based on the method
  described above, and adapt it to the skeletonisation problem at hand.
- Experiments and ablation study: evaluate the method quantitatively and qualitatively against the
  baselines, and study the effects of various settings in the rendering pipeline (types of primitives, specific
  combination of learning objectives, etc.).
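The choice of evaluation metric is left open in the steps above. As one hypothetical candidate for centre-line accuracy, the sketch below computes a symmetric Chamfer distance between points sampled on a predicted skeleton and points sampled on a reference skeleton. The metric choice, the function names, and the toy coordinates are assumptions for illustration only; the brute-force nearest-neighbour search is adequate for small point sets but would need a spatial index for large ones.

```python
import math


def directed_chamfer(source, reference):
    """Mean distance from each source point to its nearest reference point."""
    return sum(
        min(math.hypot(sx - rx, sy - ry) for rx, ry in reference)
        for sx, sy in source
    ) / len(source)


def symmetric_chamfer(predicted, ground_truth):
    """Average of both directed distances, so misses and extras both count."""
    return 0.5 * (
        directed_chamfer(predicted, ground_truth)
        + directed_chamfer(ground_truth, predicted)
    )


# Toy example: a horizontal reference centre line and a prediction shifted by one pixel.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(symmetric_chamfer(predicted, ground_truth))
```

Lower values indicate a prediction closer to the reference; identical point sets give zero.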
References
[1] H. Blum. A Transformation for Extracting New Descriptors of Shape. M.I.T. Press, 1967.
[2] Daniela Mihai and Jonathon Hare. Differentiable drawing and sketching. arXiv preprint arXiv:2103.16194,
2021.
[3] Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl, and Adrien
Gaidon. Differentiable rendering: A survey. arXiv preprint arXiv:2006.12057, 2020.
[4] Xingzhe He, Bastian Wandt, and Helge Rhodin. Autolink: Self-supervised learning of human skeletons and
object outlines by linking keypoints. Advances in Neural Information Processing Systems, 35:36123–36141,
2022.
Master Thesis Neda Zargar Talebi