Abstract
Multimodal data are increasingly common in modern biomedical and machine learning applications, yet learning useful representations from heterogeneous modalities remains challenging. A central issue is that different modalities may contain complementary information, but the extent and pattern of information sharing can vary substantially across modalities. In this talk, I will present two recent works that develop statistical foundations for contrastive learning in multimodal settings. The first focuses on electronic health records and studies how structured clinical codes and unstructured clinical notes can be jointly embedded through a multimodal contrastive framework. This approach connects the contrastive objective to a pointwise mutual information matrix, yielding an interpretable and privacy-preserving algorithm based on summary-level co-occurrence information. The second work moves beyond the conventional shared-versus-private decomposition and introduces a hierarchical framework that learns globally shared, partially shared, and modality-specific representations within a unified model. I will discuss the key modeling ideas, identifiability results, recovery guarantees, and implications for downstream prediction. Together, these works highlight how principled statistical modeling can improve both the interpretability and effectiveness of multimodal representation learning.
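The connection between a contrastive objective and a pointwise mutual information (PMI) matrix can be illustrated with a minimal sketch: compute PMI from summary-level co-occurrence counts, then factorize it to obtain joint embeddings. The toy count matrix `C` and the rank choice `k` below are illustrative assumptions, not the actual method or data from the talk.

```python
import numpy as np

# Toy co-occurrence counts: C[i, j] = how often clinical code i
# co-occurs with note concept j (hypothetical summary-level data).
rng = np.random.default_rng(0)
C = rng.integers(1, 50, size=(6, 8)).astype(float)

total = C.sum()
p_ij = C / total                       # joint probabilities
p_i = p_ij.sum(axis=1, keepdims=True)  # marginals over codes
p_j = p_ij.sum(axis=0, keepdims=True)  # marginals over note concepts

# Pointwise mutual information: log p(i,j) - log p(i) - log p(j)
pmi = np.log(p_ij) - np.log(p_i) - np.log(p_j)

# A low-rank factorization of the PMI matrix gives embeddings for
# both modalities, using only aggregate counts (no patient-level data).
U, s, Vt = np.linalg.svd(pmi, full_matrices=False)
k = 3  # illustrative embedding dimension
code_emb = U[:, :k] * np.sqrt(s[:k])
note_emb = Vt[:k].T * np.sqrt(s[:k])
print(code_emb.shape, note_emb.shape)  # (6, 3) (8, 3)
```

Because the factorization operates on co-occurrence summaries rather than individual records, this style of algorithm is privacy-preserving in the sense described in the abstract.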
About the speaker
Doudou Zhou is an Assistant Professor of Statistics & Data Science at the National University of Singapore. His research lies at the intersection of statistics, machine learning, and artificial intelligence, with a focus on statistical learning theory, multimodal data integration, electronic health records, and the evaluation of large language models. He develops principled methods for learning from noisy, heterogeneous, and partially observed data, with applications in biomedicine and modern AI systems.
