The School of Computing and Data Science (https://www.cds.hku.hk/) was established by the University of Hong Kong on 1 July 2024, comprising the Department of Computer Science, the Department of Statistics and Actuarial Science, and the Department of AI and Data Science.

Past Seminars and Events
May 12, 2026
  • Title: Anti-concentration inequalities for the difference of maxima of Gaussian random vectors

    Time: 10:30am 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Shuting Shen

    Remark(s): 

    Abstract

    We derive novel anti-concentration bounds for the difference between the maximal values of two Gaussian random vectors across various settings. Our bounds are dimension-free, scaling with the dimension of the Gaussian vectors only through the smaller expected maximum of the Gaussian subvectors. Moreover, our bounds hold under degenerate covariance structures, which previous results do not cover. We also show that our conditions are sharp under the homogeneous component-wise variance setting, while we impose only mild assumptions on the covariance structures under the heterogeneous variance setting. We apply the new anti-concentration bounds to derive the central limit theorem for the maximizers of discrete empirical processes. Finally, we back up our theoretical findings with comprehensive numerical studies.

    About the speaker

    Shen Shuting is an Assistant Professor of Statistics & Data Science at the National University of Singapore. Before joining NUS, she was a postdoctoral fellow at the Fuqua School of Business and the Department of Biostatistics & Bioinformatics at Duke University, jointly supervised by Dr. Alexandre Belloni and Dr. Ethan X. Fang. Prior to her postdoctoral position, she obtained her PhD in Biostatistics from Harvard University in 2023, where she was jointly supervised by Dr. Xihong Lin and Dr. Junwei Lu. She earned a B.A. and a B.S. in Mathematics (dual) from Peking University in 2018. Her research interests primarily include large-scale inference, combinatorial inference, choice model asymptotics, operations research theories, applied probability, and distributed computing.

May 11, 2026
  • Title: From cross-modal alignment to hierarchical sharing: statistical foundations of contrastive learning for multimodal data

    Time: 02:30pm 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Doudou Zhou

    Remark(s): 

    Abstract

    Multimodal data are increasingly common in modern biomedical and machine learning applications, yet learning useful representations from heterogeneous modalities remains challenging. A central issue is that different modalities may contain complementary information, but the extent and pattern of information sharing can vary substantially across modalities. In this talk, I will present two recent works that develop statistical foundations for contrastive learning in multimodal settings. The first focuses on electronic health records and studies how structured clinical codes and unstructured clinical notes can be jointly embedded through a multimodal contrastive framework. This approach connects the contrastive objective to a pointwise mutual information matrix, yielding an interpretable and privacy-preserving algorithm based on summary-level co-occurrence information. The second work moves beyond the conventional shared-versus-private decomposition and introduces a hierarchical framework that learns globally shared, partially shared, and modality-specific representations within a unified model. I will discuss the key modeling ideas, identifiability results, recovery guarantees, and implications for downstream prediction. Together, these works highlight how principled statistical modeling can improve both the interpretability and effectiveness of multimodal representation learning.

    About the speaker

    Doudou Zhou is an Assistant Professor of Statistics & Data Science at the National University of Singapore. His research lies at the intersection of statistics, machine learning, and artificial intelligence, with a focus on statistical learning theory, multimodal data integration, electronic health records, and the evaluation of large language models. He develops principled methods for learning from noisy, heterogeneous, and partially observed data, with applications in biomedicine and modern AI systems.

  • Title: Generative AI for Drug Discovery: From High-Resolution Proteomics to Autonomous Scientific Workflows

    Time: 10:00am 

    Venue: CB308, 3/F, Chow Yei Ching Building, HKU (Zoom broadcasting)

    Speaker(s): Prof. Siqi Sun

    Remark(s): 

    Abstract

    The integration of generative AI into drug discovery is moving beyond simple structure prediction toward a more comprehensive and autonomous pipeline. In this talk, I will focus on our recent efforts to accelerate AI-driven drug discovery (AIDD) through a multi-layered approach. I will first present our work on de novo protein and peptide sequencing, which enables the high-resolution data acquisition necessary for identifying novel targets. I will then delve into our core research on biomolecular structure prediction, discussing how we optimize these models for the specific challenges of therapeutic design. Finally, I will briefly explore how these generative tools are setting the stage for agentic science, where autonomous systems begin to orchestrate complex discovery workflows.

    About the speaker

    Siqi Sun is an associate professor at Fudan University and a researcher at the Shanghai AI Lab. He previously served as a researcher at Microsoft Research, Redmond. He holds a PhD from the Toyota Technological Institute at Chicago (TTIC) and a bachelor's degree in Mathematics from Fudan University. His research focuses on AI for science, specifically developing generative models and standardized benchmarks for proteomics and structural biology.

May 07, 2026
  • Title: Choosing the right stochastic block model

    Time: 04:00pm 

    Venue: CB 328

    Speaker(s): Dr. Max Jerdee

    Remark(s): 

    Abstract

    Many types of stochastic block models (SBMs) have been proposed and used to model community structure in networks. In a social network, for example, these methods can reveal tightly-knit friend groups. Across the literature, these models variously appear in canonical and microcanonical, degree-corrected and non-degree-corrected, assortative and non-assortative forms. When applied to the same network, variants of the model often yield markedly different groupings of nodes and so produce competing interpretations and predictions. We introduce a parametric model that directly generalizes many of these forms, allowing us, for instance, to interpolate between a non-degree-corrected and a degree-corrected SBM. We discuss how the posterior distribution of the parameter that bridges these models not only reveals which endpoint better represents the network, but also itself measures something meaningful about the network, in this case the inequality of degrees within communities. While individual SBMs can identify interpretable groups of nodes under restricted assumptions, we demonstrate that in an unsupervised, purely data-driven sense (model evidence and predictive power), our generalized model routinely adjudicates between and outperforms existing SBM variants on real-world networks. This unified picture allows us to precisely identify the assumptions latent within each of these models and select between them as appropriate for empirical networks.

    About the speaker

    Max is an Omidyar Postdoctoral Fellow at the Santa Fe Institute, where he works on various problems in math, physics, and statistics related to network science. He aims to understand the mechanisms driving the formation of observed network structures and to explore the fundamental limits of what such methods can reveal. Max holds a B.A. in Physics from Princeton University and a Ph.D. in Physics from the University of Michigan.

April 29, 2026
  • Title: When Quantum Causal Structures Diverge from their Classical Counterparts

    Time: 11:00am 

    Venue: CB 308

    Speaker(s): Dr. Elie Wolfe

    Remark(s): 

    Abstract

    In classical causal modelling it is conventional to group together “indistinguishable” scenarios; that is, to use a single graphical model to represent all the different latent-variable structures that generate the same operationally testable predictions. Equivalence rules which hold in the classical setting, however, can break down in the quantum setting. I will discuss my group’s recent work regarding causal scenarios with intermediate latent variables, where different quantum structures can be distinguished in ways that have no classical analogue. I will summarize prior work establishing that replacing classical hidden common causes by quantum systems often broadens the set of correlations admitting causal explanation. I will then highlight that such “causal quantum-ization” fundamentally reorganizes the landscape of which causal structures are operationally distinguishable. To capture these new distinctions, we will leverage tools such as monogamy of nonlocal correlations and semidefinite-programming hierarchies. The talk will summarize arXiv:2412.10238, will introduce (unpublished!) results regarding the (astonishing!) causal utility of quantum secret sharing codes, and will conclude with some (tantalizing!) open questions.

    About the speaker

    Elie Wolfe is a Research Scientist at the Perimeter Institute for Theoretical Physics. His research lies at the intersection of quantum foundations, information, and causality. He studies diverse topics such as causal modelling, quantum networks, and contextuality, all through the unifying theme of distinguishing classical, quantum, and post-quantum operational theories.

April 22, 2026
  • Title: AI + Data: A Match Made in Heaven?

    Time: 01:00pm 

    Venue: HW312, Haking Wong Building, The University of Hong Kong (Zoom broadcasting) Lecture theater 1A, G/F, CDS-1 Building, HK

    Speaker(s): Prof. C. Mohan

    Remark(s): 

    Abstract

    Artificial Intelligence (AI) and Data Management (DM) emerged as distinct computer science disciplines about 6-7 decades ago, with DM playing a non-significant role in the first symbolic wave of AI, which relied on handcrafted knowledge in rule-based or expert systems. Big Data (BD) became a hot topic about two decades ago, initially driven by Web 2.0 (e-commerce and social media) companies, with technical, non-technical, and open-source factors fueling its rapid development. This BD wave, combined with hardware advances and AI algorithmic inventions, like neural networks, has fueled the major strides made in the second wave of AI. After a long dormancy, AI reemerged in the past decade, initially driven by Deep Learning (DL), which leveraged vast labeled data to train models faster with less human intervention. More recently, AI has surged with the rise of Large Language Models (LLMs), Generative AI (GenAI), and startups like Anthropic, DeepSeek, OpenAI, and xAI. Major vendors, including Alibaba, AWS, Google, IBM, Meta, and Nvidia, have pivoted their focus toward AI. In this talk, Prof. C. Mohan will survey historical developments, explore AI’s implications for DM, and provide a status report on the AI landscape across different regions.

    About the speaker

    Prof. C. Mohan is currently a Distinguished Professor of Science at Hong Kong Baptist University, a Distinguished Visiting Professor at Tsinghua University, and a member of the inaugural Board of Governors of Digital University Kerala. He retired in 2020 as an IBM Fellow after 38.5 years at the IBM Almaden Research Center in Silicon Valley, where he worked on database, blockchain, and AI technologies. He is known for inventing the ARIES family of database locking and recovery algorithms and the Presumed Abort distributed commit protocol. He is a Fellow of IBM (1997–2020), ACM, and IEEE, and served as IBM India Chief Scientist (2006–2009). Prof. C. Mohan received the ACM SIGMOD Edgar F. Codd Innovations Award and the VLDB 10 Year Best Paper Award and was elected to the U.S. and Indian National Academies of Engineering. He is also a Distinguished Alumnus of IIT Madras, received his PhD from the University of Texas at Austin, and holds 50 patents. In recent years, he has focused on data, cloud, blockchain, and AI technologies. He has held visiting and consulting roles at the National University of Singapore, Google, and Microsoft, and has spoken in 43 countries.

April 21, 2026
  • Title: Early Pension Withdrawal for Homeownership

    Time: 11:00am 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Hamza Hanbali

    Remark(s): 

    Abstract

    This research examines whether individuals should utilise their individual pension savings for housing. The primary challenge is that such a policy could result in a liquidity shock, manifesting as housing inflation, and potentially benefiting only early homebuyers (e.g. high-income earners) while reducing accessibility for others. The first part presents a numerical analysis calibrated to Australian data, motivated by recent government proposals. This analysis identifies key factors influencing the outcomes on accessibility and retirement financial security. The second part investigates the problem analytically, modelling households’ purchases as a system of first-passage time with price feedback from aggregate demand.

    About the speaker

    Hamza has been a Senior Lecturer in Actuarial Science at the University of Melbourne since August 2024. His research interests include risk management and measurement, pricing, and dependence modelling. He is interested in studying issues related to insurance and pension, as well as reconciling consumer welfare and providers' financial sustainability or solvency. Hamza holds a Bachelor's degree in Mathematics from Université Pierre et Marie Curie (Paris, France) and a Master's degree in Actuarial Science from Université catholique de Louvain (Louvain-la-Neuve, Belgium). He completed a PhD in Actuarial Science at KU Leuven (Leuven, Belgium), and joined Monash University in 2020 as a Lecturer.

April 17, 2026
  • Title: Using Optimal Transport To Mitigate Unfair Predictions and Quantify Counterfactual Fairness

    Time: 10:30am 

    Venue: CB 308

    Speaker(s): Prof. Pinjia He

    Remark(s): 

    Abstract

    Large Language Models (LLMs) excel at software development, but can they troubleshoot post-deployment failures? This talk explores the limitations of how we evaluate LLMs for Root Cause Analysis (RCA) in software systems.

    Our study reveals that existing RCA benchmarks are too simple, allowing basic rule-based methods to outperform state-of-the-art models. To address this, we introduce OpenRCA, a benchmark dataset and evaluation framework for assessing LLMs' RCA ability, showing substantial room for model improvement. In addition, by implementing step-wise causal process supervision, we reveal that even top LLMs often guess the correct root cause following entirely flawed reasoning paths. Finally, we discuss the transition towards agentic software engineering, outlining future research directions such as building dynamic benchmarks and enhancing process-level reasoning via self-play.

    About the speaker

    Dr. Pinjia He is an Assistant Professor at The Chinese University of Hong Kong, Shenzhen. His research interests include software engineering, AI for SE, large language models, and trustworthy AI. He has published 70+ papers in top-tier conferences and journals such as ICSE, FSE, ICLR, NeurIPS, and CSUR. He received the IEEE TCSE Rising Star Award and the IEEE Open Source Software Services Award. His work has been cited over 9,000 times according to Google Scholar. The open-source projects he leads have been starred 7,000+ times on GitHub and have been downloaded 100K+ times by 450+ organizations.

April 10, 2026
  • Title: Advancing Exploration in Reinforcement Learning

    Time: 02:00pm 

    Venue: CB 328

    Speaker(s):  Prof. Leong Hou U

    Remark(s): 

    Abstract

    Exploration remains a key barrier to deploying reinforcement learning in realistic embodied settings, where agents must act under high-dimensional visual observations, sparse and delayed rewards, and often overactuated control interfaces. This talk presents a line of research that makes exploration more practical and scalable by progressively introducing structure into both representation and intrinsic motivation. We first revisit metric-based intrinsic bonuses and propose an effective discrepancy metric with adaptive scaling to improve robustness on hard exploration benchmarks. We then move beyond raw novelty by learning compact representations in a behavioral metric space and rewarding value-diverse, behaviorally distinct trajectories for scalable exploration in high-dimensional environments. To address long-horizon embodied tasks, we introduce latent “foresight” via diffusion-based self-prediction and a latent-space exploration reward, demonstrating gains in navigation/manipulation and real-world indoor deployment. Finally, for overactuated musculoskeletal control, we discover disentangled synergy patterns and learn policies entirely in a synergy-aware latent action space, improving efficiency and generalization.

    About the speaker

    Leong Hou U is currently an Associate Professor in the Department of Computer and Information Science at the University of Macau and Director of the Data Science Center. His research focuses on interdisciplinary areas at the intersection of artificial intelligence and data engineering, including traffic data optimization, spatiotemporal databases, large-scale data visualization, graph neural networks, and reinforcement learning. His team has published over 80 papers in leading journals and conferences such as SIGMOD, VLDB, ICDE, NeurIPS, AAAI, ICLR, IJCAI, and KDD. In recent years, the team has led and participated in multiple national and regional key R&D projects, including the National Key R&D Program on efficient integration and dynamic cognition technologies for urban public services, the Macau Science and Technology Development Fund key project on collaborative intelligence–driven autonomous driving, and a 2024 project on urban traffic perception fusion and intelligent reasoning that received the Second Prize of the Science and Technology Invention Award. He is also actively engaged in the international research community, serving on program and organizing committees for major conferences such as BigData, IJCAI, ICDE, DASFAA, and PAKDD. He has been a committee member of the China Association of Young Scientists (Information and Electronic Science) and the Urban Planning Committee of the Macao SAR Government since 2020, promoting the integration of scientific research with urban development policy.

  • Title: Using Optimal Transport To Mitigate Unfair Predictions and Quantify Counterfactual Fairness

    Time: 11:00am 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Arthur Charpentier

    Remark(s): 

    Abstract

    Many industries are heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of such models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, elimination, or at least mitigation, is desirable. With the shift from more traditional models to machine-learning-based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective.

    In the first part of this seminar, we propose to mitigate possible discrimination (related to so-called group fairness, that is, to discrepancies in score distributions) through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach, we employ it on real data and discuss its implications.

    In the second part, we will focus on another aspect of discrimination usually called counterfactual fairness, where the goal is to quantify a potential discrimination if that person had not been Black or if that person had not been a woman. The standard approach, called ceteris paribus (everything else remains unchanged), is not sufficient to take into account indirect discrimination, and therefore we consider a mutatis mutandis approach based on optimal transport. With multiple features, optimal transport becomes more challenging, and we suggest a sequential approach based on probabilistic graphical models.

    About the speaker

    Professor Arthur Charpentier
    Department of Mathematics
    University of Quebec at Montreal




Division of AI & Data Science, School of Computing and Data Science
Rm 207 Chow Yei Ching Building
The University of Hong Kong
Pokfulam Road, Hong Kong
香港大學計算與數據科學學院,人工智能與數據科學系
香港薄扶林道香港大學周亦卿樓207室

Email: aienq@hku.hk
Telephone: 3917 3146

Copyright © School of Computing and Data Science, The University of Hong Kong. All rights reserved.