Leveraging Vision-Language Models to Generate Realistic Synthetic Echocardiography Data for Improved Downstream Task Performance
By leveraging a joint representation of anatomical semantic label maps and text prompts, this work shows that diffusion-based models can generate high-fidelity, diverse synthetic echocardiography images, which can in turn improve performance on downstream medical segmentation and classification tasks.