Research Statement

BRIEF SUMMARY OF MAJOR RESEARCH ACCOMPLISHMENTS

Critical real-world open problems motivate my efforts toward new generic techniques, which in turn lead to technical contributions that benefit a broader range of real-world problems. This iterative process embodies my research philosophy, which spans complex data modeling, intelligent learning, scalable computing, and critical applications.

For data modeling, my work focuses on generative modeling and knowledge discovery in complex data involving geometry, time, networks, and texts. I am a pioneer in several niches, such as deep generative models for spatial networks, temporal graphs, and textual edge graphs. My NSF CAREER Award, CIFellow Mentorship, and ICDM’19 Best Paper Award evidence these contributions. I published a 700-page Springer book on graph neural networks, widely regarded as the most comprehensive book in this area, which has attracted over 700 citations since its publication two years ago.

For intelligent learning, I debuted the area of visual explanation-guided learning, which goes beyond explainable AI toward “AI explanation correction” for guiding AI models toward correct reasoning. These works have been recognized by a series of top AI conferences and a Cisco Faculty Research Award, and are supported by two new NIH grants totaling $5M to advance medical imaging for cancer diagnosis.

For scalable computing, in KDD’19 I paved a new pathway for training deep neural networks without gradient descent, overcoming its inability to support layer-wise parallelism, which is crucial for training huge, deep models. My Amazon Research Award and Meta Research Award recognize this achievement. My recent NeurIPS’24 work extended gradient-free optimization to enable, for the first time, the global pruning of large language models.
Over the past decade, I have been actively collaborating with experts in different fields to solve critical problems in healthcare, biology, sociology, chemistry, geography, astronomy, engineering, and the social sciences.

A SUMMARY OF ONGOING AND FUTURE RESEARCH GOALS

My Research Philosophy. Critical open problems in the real world constantly motivate my efforts toward new generic techniques, which in turn lead to technical advancements that benefit more real-world problems, as shown in Fig. 1. My detailed research directions follow.

1. Complex Data Modeling

Representation Learning for Complex Data. I am interested in complex data that can be spatial, temporal, networked, and textual. One of my focuses is representation learning for spatial networks, a complex data structure in which nodes and edges are embedded in geometric space. Spatial network data is important in various domains, ranging from the micro-scale (e.g., protein structures), to the middle scale (e.g., biological neural networks), to the macro-scale (e.g., mobility networks). I pioneered new graph neural networks (GNNs) and inference strategies that encode spatial networks through their constituent components, so that their embeddings can be learned independently and in combination. This scheme proved qualitatively and quantitatively superior on downstream tasks such as spatial network property prediction. This work is supported by my NSF CAREER Award on deep learning for spatial networks.

I published the Springer book “Graph Neural Networks: Foundations, Frontiers, and Applications” with other prestigious researchers, namely Dr. Jian Pei, Dr. Peng Cui, and Dr. Lingfei Wu. It is deemed the most comprehensive book in this area and is endorsed by world-renowned researchers including Dr. Jiawei Han, Dr. Jure Leskovec, Dr. Charu Aggarwal, Dr. Heung-Yeung Shum, and Dr. Bo Zhang. Since its publication in 2022, the book has received over 700 citations.
Its Chinese version won the Best Seller Award from its publisher, Posts & Telecommunications Press.

Deep Generative Models for Complex Data. This is a very promising area that benefits crucial applications with spatial, temporal, and/or networked data, such as molecule design and brain network synthesis. I have contributed substantially to the research community in several aspects: 1) Debuting new deep generative model directions and techniques for spatial and temporal graphs. We have also debuted deep generative models for other data such as trajectories (Best Paper Candidate at ACM SIGSPATIAL 2022) and periodic graphs. 2) Developing benchmark datasets and model evaluation infrastructure. I have made fundamental contributions to this area by publishing a large benchmark dataset repository named GraphGT at NeurIPS 2021, as well as the earliest and most comprehensive survey paper in TPAMI (Impact Factor: 24.31), which classifies the techniques and unifies the evaluation scenarios. 3) Pioneering a new research direction called deep graph transformation, which generates a target graph conditioned on a source graph, with important applications such as molecule structure optimization and circuit obfuscation. Our initial work, which jointly transforms both node and edge embeddings into a new graph, was recognized with the Best Paper Award at ICDM 2019.

Future Work. My pioneering work will lead to property-controllable complex data generation and design. For example, Merck has funded my research on designing molecules that satisfy specific properties required for new medicines, in collaboration with their chemists. In addition, my group is pioneering a promising new area of textual graphs, where nodes and edges can be texts, by synergizing the advancements of large language models (LLMs) and graph deep learning.
Our recently published works led the area of exploring and transferring knowledge across LLMs and GNNs by advancing knowledge and graph distillation. We have also released a first-of-its-kind benchmark dataset and infrastructure for textual edge graphs at NeurIPS 2024.

2. Intelligent Learning Strategies

Harnessing and Predicting Concept Drifts. I focus on developing models for learning and prediction across seen and unseen tasks. I debuted the research direction of spatial multitask learning, which, for the first time, seamlessly maps multitask learning principles (i.e., the trade-off between task relatedness and task difference) to laws of geography (i.e., the trade-off between spatial correlation and spatial heterogeneity). This series of works has led to over a thousand citations. I also target more challenging, real-world tasks where the data distribution varies over time and the machine learning model must anticipate and adapt to future unseen distributions without retraining. We coined this domain “temporal domain generalization” and published the first work at ICLR 2023 as an oral paper (top 5% of accepted papers). In that paper, we proposed a new Bayesian treatment that learns the evolution of the model parameter distribution from data, enabling interpolation and extrapolation of new models for new (and future) times. In our recent NeurIPS 2024 paper, we extended this work to continuous time domains by learning a continuous dynamical system in Koopman space to capture the underlying dynamics.

Visual Explanation-Guided Learning. Our group goes beyond existing explainable AI, moving from “how to generate explanations” to “how to identify and correct wrong explanations”. This area, called explanation-guided learning, has been well developed for structured data such as text tokens and tabular data, but not for unstructured data such as images and geometric data.
Hence, we laid the foundation of visual explanation-guided learning by developing techniques that jointly minimize the prediction and explanation errors in our KDD 2022 paper. This work, together with a series of follow-up works at top AI conferences, established this niche, which has been further standardized and consolidated by our recent survey and benchmarking paper in ACM Computing Surveys.

Future Work. I aim to establish a generic framework for human-AI interaction systems that support iterative mutual learning between humans and AI models. Such a system has great potential in critical applications such as cancer diagnosis and molecule design, where humans and AI have complementary strengths and motivations to learn from and understand each other. Another direction is to address the knowledge staleness of large foundation models and how to update them against concept drifts. It is crucial to keep large foundation models up to date with real-time data, but it is prohibitively costly to retrain them frequently. I will leverage and extend my unique investigation of the relation between data dynamics and model dynamics toward a shortcut for efficiently updating large foundation models over time.

3. Scalable Optimization Methods

Scalable Optimization Methods for Large and Deep Models. Although treated as the routine optimization methods for deep learning, gradient-based methods are well recognized to suffer from inherent drawbacks such as vanishing gradients, poor conditioning, low concurrency, and biological implausibility. To address them, my group has established a different theoretical framework of alternating optimization for deep learning. For the first time, in our KDD 2019 work, we showed that gradient-free optimization methods achieve state-of-the-art performance compared to the best gradient-based methods such as ADAM.
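As a toy illustration of the alternating-optimization principle (not the actual KDD 2019 method, whose details are beyond this sketch), the example below fits a two-factor linear model by alternately solving for each factor in closed form, with no gradient steps; because each subproblem is solved exactly and independently, the updates naturally suggest the layer-wise parallelism discussed here. The model, data, and iteration count are all illustrative.

```python
import random

# Toy alternating optimization: fit y = a * b * x by alternately
# solving for `a` and `b` in closed form. Each subproblem is a 1-D
# least-squares problem, so no gradient descent is needed.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [6.0 * x for x in xs]          # ground truth: a * b = 6

a, b = 1.0, 1.0
for _ in range(20):
    # Solve min_a sum (y - a*(b*x))^2 with b fixed (exact update).
    za = [b * x for x in xs]
    a = sum(y * z for y, z in zip(ys, za)) / sum(z * z for z in za)
    # Solve min_b sum (y - b*(a*x))^2 with a fixed (exact update).
    zb = [a * x for x in xs]
    b = sum(y * z for y, z in zip(ys, zb)) / sum(z * z for z in zb)

print(round(a * b, 4))  # prints 6.0
```

The scalar example only conveys the alternating structure; the published framework handles nonlinear deep networks, where the same idea of decoupled per-block subproblems is what enables gradient-free, parallelizable training.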
This groundbreaking work, together with our follow-up works on this topic, has been covered by well-known technical news outlets such as Synced Review. Our alternating optimization provides, for the first time, an intuitive route to layer-wise model parallelism, which is extremely important for training large, deep models. For example, one recent area is LLM pruning, which aims to minimize the difference between the outputs of the sparsified LLM and the original one. Directly solving this minimization requires loading the whole LLM into memory, which is prohibitive, so existing work resorts to a workaround: segmenting the LLM into independent layers and performing only local pruning. To address this dilemma, we debuted the first-of-its-kind work on LLM global pruning, called SparseLLM, a novel framework that decomposes the global pruning process into manageable, coordinated subproblems, enabling resource-efficient alternating optimization with global optimality.

Combinatorial Optimization in Dynamic Graphs. Combinatorial optimization problems in dynamic systems, such as influence maximization, network tomography, and source localization, have been intensively studied for decades; for each problem, even brilliant human researchers must devote sophisticated effort to algorithm design to pursue (close-to-)optimal solutions. The designed algorithms require substantial changes whenever major problem settings change, such as node/edge attribute quantities and types, network types, graph dynamics, constraints, or optimization objectives. To scale with fast-accumulating data and knowledge, and to quickly generalize knowledge from existing algorithms to similar problems, my team has debuted novel data-driven frameworks that leverage GNNs to learn the unknown graph dynamics underlying different graph optimization problems, which has stimulated the fast growth of this thriving research area in recent years.
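To make the contrast concrete, below is a minimal sketch of the kind of hand-designed algorithm that such data-driven frameworks aim to generalize: greedy influence maximization under the independent-cascade model. The toy graph, propagation probability, simulation count, and seed budget are all illustrative, not from any specific paper.

```python
import random

def simulate_spread(graph, seeds, p, rng):
    """One independent-cascade simulation; returns #activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, []):
            if nbr not in active and rng.random() < p:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def greedy_im(graph, k, p=0.2, runs=200, seed=0):
    """Greedily pick k seeds maximizing the estimated expected spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for cand in graph:
            if cand in chosen:
                continue
            val = sum(simulate_spread(graph, chosen + [cand], p, rng)
                      for _ in range(runs)) / runs
            if val > best_val:
                best, best_val = cand, val
        chosen.append(best)
    return chosen

toy = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
print(greedy_im(toy, k=2))
```

Note how the procedure is specific to one objective and one diffusion model; changing the dynamics, constraints, or attributes would require redesigning it, which is exactly the rigidity that learning the graph dynamics with GNNs is meant to remove.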
For example, our pioneering ICML 2023 paper on influence maximization via graph neural networks attracted over a hundred citations in 2024.

Future Work. We aim to build a new physics-guided deep dynamic graph optimization framework that unifies combinatorial optimization problems in dynamic graphs under a shared “language”, within which we will propose a suite of techniques for modeling, inference, acceleration, scaling up, theoretical analysis, and explanation of the problem-solving. Additionally, we will extend our LLM global pruning work from unstructured to structured pruning. We can further accelerate the pruning of very large LLMs, such as those over 60B parameters, by distributing the computation of different parts across different yet coordinated devices.

4. Critical Real-World Problem-Solving

I have been actively collaborating with domain experts on critical applications, exemplified as follows.

Societal event forecasting: I am a leading expert on event prediction, with over 40 publications in this direction. I laid the technical foundation and standardized the benchmarking scenarios in collaboration with several sociologists.

Medical imaging: My team has developed groundbreaking techniques, benchmarks, and evaluation scenarios for responsible and explainable AI in health by leveraging human-annotated data in cancer imaging, such as tumor annotations and diagnosis reports. I am collaborating with radiologists, oncologists, and biostatisticians from medical schools.

Molecule design and chemistry: I am an advocate of bridging “AI experts”, “computational chemists”, and “experimental chemists” in a loop for molecule design. Our interdisciplinary team is pursuing this idea via our recent NSF grants. My lab is financially and logistically supported by one of the biggest pharma companies in a collaboration on AI-driven molecule design.
Neuroscience and neuroimaging: I have been collaborating with neuroscientists from universities and the NIH for over six years on “AI for neuroscience” and “neuroscience for AI”, and we are pushing forward AI for aging in our recently funded R01 grant from the National Institutes of Health.