A clean speech signal typically consists of two components: a locally periodic component and a stochastic component. If a speech signal has only a stochastic component, the difference between the clean speech signal and the signal enhanced with the corresponding ideal ratio mask is barely perceptible. However, if the speech has a perfectly periodic component, the signal enhanced with the corresponding ideal ratio mask is still affected by inter-harmonic noise.
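For orientation, the ideal ratio mask can be sketched as follows. This is a minimal illustration that assumes one common definition (the square root of the speech-to-mixture power ratio per time-frequency bin); the exact definition used in the thesis is not stated here, and the function name, argument names, and the `eps` constant are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ideal ratio mask per time-frequency bin (assumed definition).

    speech_mag, noise_mag: magnitude spectrograms of the clean speech
    and the noise, with identical shapes. eps avoids division by zero
    in bins where both are silent.
    """
    speech_pow = np.asarray(speech_mag, dtype=float) ** 2
    noise_pow = np.asarray(noise_mag, dtype=float) ** 2
    return np.sqrt(speech_pow / (speech_pow + noise_pow + eps))

# Toy example with a single bin: speech magnitude 3, noise magnitude 4.
mask = ideal_ratio_mask(np.array([[3.0]]), np.array([[4.0]]))
```

Because the mask applies one gain per frequency bin, it cannot separate speech from noise inside a bin, and with limited frequency resolution the noise lying between pitch harmonics is only partially suppressed.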
A comb filter based on the speech signal’s pitch period can attenuate the noise between the pitch harmonics, so a robust pitch estimate is of fundamental importance. This work investigates a deep learning-based method for robust pitch estimation in noisy environments.
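The effect of a pitch-based comb filter can be illustrated with a small sketch. It assumes the simplest feedforward form, y[n] = 0.5 * (x[n] + x[n - T]), with T the pitch period in samples; this is not necessarily the filter constructed in the thesis, and the pitch is taken as known here rather than estimated. The sample rate, pitch value, and all names are illustrative assumptions.

```python
import numpy as np

def comb_filter(x, period):
    """Feedforward comb filter: y[n] = 0.5 * (x[n] + x[n - period]).

    period is the pitch period in samples and must be a positive integer.
    Samples before the start of the signal are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    delayed = np.zeros_like(x)
    delayed[period:] = x[:-period]
    return 0.5 * (x + delayed)

fs = 16000                      # assumed sample rate in Hz
f0 = 200.0                      # assumed (known) pitch in Hz
period = int(round(fs / f0))    # pitch period in samples
n = np.arange(fs)

# A component exactly on the first harmonic and one halfway between harmonics.
harmonic = np.sin(2 * np.pi * f0 * n / fs)
inter_harmonic = np.sin(2 * np.pi * 1.5 * f0 * n / fs)

harmonic_out = comb_filter(harmonic, period)
inter_out = comb_filter(inter_harmonic, period)
```

After the initial `period` samples, the on-harmonic component passes unchanged while the component halfway between two harmonics cancels. If the pitch estimate is wrong, the passbands shift away from the true harmonics, which is why the filter depends on a robust pitch estimate.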
Deep Learning-based Pitch Estimation and Comb Filter Construction
Type: MA thesis
Status: finished
Date: November 2, 2020 - April 30, 2021
Supervisors: Hendrik Schröter, Andreas Maier