This thesis investigates how federated learning can be applied to train vision-language models in the medical domain while preserving patient privacy. It focuses on enabling multi-institutional collaboration in which each institution keeps its sensitive data local rather than sharing it, supporting the development of secure and scalable AI solutions for healthcare.