Emotional states strongly influence human choices, activities, and desires. They can be assessed from facial expressions, self-report questionnaires, and, as this thesis focuses on, speech. While some research exists on speech emotion recognition, deep learning approaches remain underexploited there, owing to the field's recency and the only recent advances in computation and optimization. Moreover, the difficulty of collecting improvised data (i.e., data not produced by professional adult actors) persists in the state-of-the-art literature. The goal of this thesis is therefore to explore speech emotion recognition in children by testing the predominant neural-network approaches based on temporal prosody as well as the rapidly expanding family of Transformer methods. We investigate the potential of transfer learning from adults' to children's data as a mechanism for coping with data scarcity. The results show that the benefits of transfer learning improve when gender and cultural aspects are incorporated into emotion classification. Emotionally intelligent systems built on the experiments described in this thesis can benefit remote monitoring and telemedicine for psychologists and pediatricians, teaching emotional intelligence to autistic children, and improving children's health diagnostics and screening procedures.