Professor, Computer Science, Computing
Thesis Advisor, Integrative Sciences and Engineering
Doctor of Philosophy, Simon Fraser University, Canada
Bachelor of Science (Comp & Info Sci) Hons Class 2A, National University of Singapore, Singapore
Bachelor of Science (Comp & Info Sci) with Merit, National University of Singapore, Singapore
Anthony K. H. Tung builds Prudent, Just-in-Time, and white-box AI systems—linking diversified evidence retrieval with small-data event modeling and DeepConnect-style collective intelligence for trustworthy deployment.
—————————————————————————————————————————————————————————–
Anthony K. H. Tung is Professor of Computer Science at the National University of Singapore (NUS). He currently serves as Director of the Google–NUS Joint Research & Innovation Center and as AI Lead for the AI + Urban Sustainability domain of the NUS Artificial Intelligence Institute (NAII). Across academic leadership and industry-collaborative research, his work is driven by a practical question: how can AI systems be engineered to be right-sized, explainable, and governable while still delivering strong performance in real operational environments?
Technically, Prof. Tung’s research spans (i) hybrid information retrieval and RAG indexing/search over complex data types (including time series and other structured or high-dimensional objects), (ii) white-box / interpretable analytics for decision support (e.g., explainable ranking and rare-event/anomaly detection), and (iii) trustworthy AI foundations such as privacy technologies, selective data removal/unlearning, and auditable governance mechanisms. A defining emphasis in his recent agenda is to avoid “oversized AI by default” and instead design systems whose capabilities match the deployment context, cost, and risk profile.
This perspective is crystallized in his Prudent AI direction: developing lightweight, transparent, and resource-efficient AI that can be deployed as modular, plug-and-play components, often framed as Just-in-Time AI boxes. These approaches are motivated by challenging realities such as rare and fragmented events, limited labels, operational latency requirements, and privacy constraints, especially in sensor-rich settings and with tabular or longitudinal data. The goal is not only early and accurate detection but also clear interpretability of signals and causes, so that model behavior is inspectable rather than opaque.
At the same time, Prof. Tung is advancing diversified-view retrieval and explainable diversity control in RAG systems, addressing the tendency of similarity-only retrieval to reinforce dominant perspectives. Complementing these technical directions, he proposed DeepConnect, a sociotechnical framework that deploys AI as shared connective infrastructure. DeepConnect supports interdisciplinary discovery, meaning alignment, provenance-linked boundary objects, and prudent execution pathways through two integrated programs: Synapse (People↔Ideas) and AI Prudent Technologies (APT) (Ideas→Execution).
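The contrast between similarity-only and diversified retrieval can be illustrated with the classic Maximal Marginal Relevance (MMR) re-ranking rule. The Python sketch below is a generic stand-in, not the method used in Prof. Tung's systems: it assumes the query and documents are already embedded as vectors, and the function name `mmr_select` and the trade-off weight `lam` are hypothetical. Setting `lam=1.0` recovers similarity-only ranking; lowering it penalizes candidates that closely resemble documents already selected.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k, lam=0.5):
    """Greedy Maximal Marginal Relevance re-ranking (illustrative sketch).

    Each step picks the candidate maximizing
        lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected).
    lam=1.0 reduces to plain similarity ranking.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(np.asarray(query_vec, dtype=float))
    d = normalize(np.asarray(doc_vecs, dtype=float))
    relevance = d @ q   # cosine similarity of each document to the query
    pairwise = d @ d.T  # cosine similarity between documents

    selected, candidates = [], list(range(len(d)))
    while candidates and len(selected) < k:
        if selected:
            # Redundancy = similarity to the closest already-selected document.
            redundancy = pairwise[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    docs = [[0.9, 0.1, 0.0], [0.88, 0.12, 0.0], [0.7, 0.0, 0.7]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, lam=1.0))  # [0, 1]
    print(mmr_select(query, docs, k=2, lam=0.3))  # [0, 2]
```

With two near-duplicate documents, similarity-only ranking returns both, while the diversity-aware setting swaps the second duplicate for a less similar but distinct document. Real systems would tune `lam` and may measure diversity along explicit viewpoints rather than raw embedding distance.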
My research interest is anchored in time series as a first-class data modality for real-world decision-making—because in many operational domains (urban systems, finance, cyber-physical infrastructure, industrial sensing), the most consequential signals arrive as multivariate, high-velocity, noisy time series. The core question I work on is how to build AI systems that can understand, generate, and act on time series under practical constraints: limited labels, shifting distributions, tight latency budgets, and strong requirements for interpretability and auditability. This motivates a “Prudent Time-Series Intelligence” approach: models should be right-sized (cost-effective and deployable), transparent (white-box when possible), and actionable (supporting early detection and intervention rather than only retrospective analytics).
A major thread of my work is small-data event modeling through time series generation (TSG)—not just to synthesize data, but to systematically improve robustness when real events are rare, fragmented, or expensive to observe. This includes building rigorous evaluation infrastructure (benchmarks and metrics) so that progress in TSG is measurable and reproducible, and extending TSG toward controllability so that synthetic sequences can reflect external conditions and targeted scenarios. Representative examples include TSGBench, a comprehensive benchmark for time series generation, and CTS, a framework that formalizes controllable time series generation for data-scarce regimes.
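Benchmarks for generated time series commonly compare statistical features of real and synthetic data. The sketch below shows two such feature-based fidelity checks, an autocorrelation difference and a marginal-distribution (histogram) difference. It is an illustrative assumption, not TSGBench's implementation: the function names, the array layout `(n_series, length, n_channels)`, and the defaults for `max_lag` and `bins` are hypothetical choices. Lower scores mean the synthetic data is closer to the real data.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D series at lags 1..max_lag.
    Undefined (division by zero) for a constant series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag], x[lag:]) / denom for lag in range(1, max_lag + 1)])

def acf_difference(real, synth, max_lag=5):
    """Mean absolute gap between average per-channel autocorrelations.
    real, synth: arrays of shape (n_series, length, n_channels)."""
    gaps = []
    for c in range(real.shape[2]):
        acf_r = np.mean([autocorr(s[:, c], max_lag) for s in real], axis=0)
        acf_s = np.mean([autocorr(s[:, c], max_lag) for s in synth], axis=0)
        gaps.append(np.abs(acf_r - acf_s).mean())
    return float(np.mean(gaps))

def marginal_difference(real, synth, bins=20):
    """Mean absolute gap between per-channel value histograms (densities),
    computed on bin edges shared by the real and synthetic data."""
    gaps = []
    for c in range(real.shape[2]):
        r, s = real[..., c].ravel(), synth[..., c].ravel()
        edges = np.histogram_bin_edges(np.concatenate([r, s]), bins=bins)
        h_r, _ = np.histogram(r, bins=edges, density=True)
        h_s, _ = np.histogram(s, bins=edges, density=True)
        gaps.append(np.abs(h_r - h_s).mean())
    return float(np.mean(gaps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(100)

    def noisy_sines(n):  # hypothetical "real" process: random-phase sine waves
        phase = rng.uniform(0, 2 * np.pi, size=(n, 1, 1))
        return np.sin(2 * np.pi * t[None, :, None] / 20 + phase) + 0.1 * rng.normal(size=(n, 100, 1))

    real, good = noisy_sines(32), noisy_sines(32)
    collapsed = 2.0 + 0.1 * rng.normal(size=(32, 100, 1))  # a generator stuck at one level
    for name, synth in [("good", good), ("collapsed", collapsed)]:
        print(name, acf_difference(real, synth), marginal_difference(real, synth))
```

Both scores are exactly 0.0 when a dataset is compared with itself, and a collapsed generator scores worse on both than one sampling the same process. Feature-level checks like these are typically complemented by model-based scores, since matching a few summary statistics does not guarantee realistic sequences.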
A second major thrust is early anomaly detection and white-box time series reasoning, especially for sensor networks, where failures often first appear as changes in correlation structure before any individual sensor shows an obvious outlier. I have worked on correlation-aware methods that convert multivariate time series into time-series graphs and track unusual correlation variations, enabling early detection and actionable localization. This is paired with systems work (e.g., interactive, analyzable tooling) that makes early-detection pipelines more usable, interpretable, and operationally relevant in predictive maintenance settings.
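The correlation-tracking idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method from the ICDE paper or the EADS system: each sliding window becomes a sensor graph whose edge weights are Pearson correlations, the first `n_baseline` windows are assumed to be normal, and a window is flagged when the Frobenius distance between its graph and the baseline graph exceeds a hypothetical threshold `tau`. The sensor whose edges changed most is reported as a localization hint.

```python
import numpy as np

def correlation_graphs(X, window, step):
    """Slide a window over X (shape: time x sensors) and return one Pearson
    correlation matrix per window (a weighted sensor graph) plus window starts."""
    starts = list(range(0, X.shape[0] - window + 1, step))
    graphs = np.stack([np.corrcoef(X[s:s + window].T) for s in starts])
    return graphs, starts

def correlation_drift(X, window=50, step=10, n_baseline=5, tau=0.5):
    """Flag windows whose correlation structure drifts from a baseline.

    Assumes the first n_baseline windows are normal; tau is a hypothetical,
    data-dependent threshold on the Frobenius distance to the baseline graph.
    Returns (alarms, drift, culprit):
      alarms  - start indices of flagged windows
      drift   - per-window distance between its graph and the baseline graph
      culprit - per-window index of the sensor whose edges changed most
    """
    graphs, starts = correlation_graphs(X, window, step)
    baseline = graphs[:n_baseline].mean(axis=0)
    diff = graphs - baseline
    drift = np.linalg.norm(diff, axis=(1, 2))          # Frobenius norm per window
    culprit = np.abs(diff).sum(axis=2).argmax(axis=1)  # sensor with largest total edge change
    alarms = [s for s, d in zip(starts, drift) if d > tau]
    return alarms, drift, culprit

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(600)
    shared = np.sin(2 * np.pi * t / 40)
    X = np.column_stack([shared + 0.1 * rng.normal(size=600) for _ in range(3)])
    X[400:, 2] = 0.7 * rng.normal(size=200)  # sensor 2 decouples but keeps a similar value range
    alarms, drift, culprit = correlation_drift(X)
    print("first alarm at t =", alarms[0], "culprit sensor:", culprit[-1])
```

In this toy setting, three sensors track a shared signal and one silently decouples while keeping a similar value range, so no single reading looks like an outlier; the drift score still jumps once windows cover the decoupling, and the culprit index points at the decoupled sensor. A practical system would also need to handle gradual baseline drift and choose `tau` from data.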
Finally, I connect time series intelligence with retrieval and RAG to reduce friction between research and deployment and to support diversified evidence discovery rather than single-view retrieval. One example brings LLMs and RAG into the time series workflow itself (recommendation, benchmarking, and decision support); another is diversified-view retrieval for complex information environments such as news and fact verification. Across these themes, the unifying aim is to combine diversified retrieval to broaden evidence, just-in-time and small-data time series modeling to act under uncertainty, and white-box, governable components that can be integrated into reliable systems.
TSGBench: Time Series Generation Benchmark (PVLDB) – PDF: https://www.vldb.org/pvldb/vol17/p305-huang.pdf
TSGAssist: LLMs + RAG for Time Series Generation Recommendations & Benchmarking (PVLDB Demo) – PDF: https://www.vldb.org/pvldb/vol17/p4309-huang.pdf
Towards Controllable Time Series Generation (CTS) (arXiv) – PDF: https://arxiv.org/pdf/2403.03698.pdf
CTBench: Cryptocurrency Time Series Generation Benchmark (arXiv) – PDF: https://arxiv.org/pdf/2508.02758.pdf
EADS: An Early Anomaly Detection System for Sensor-based Multivariate Time Series (ICDE Demo) – PDF: https://www.comp.nus.edu.sg/~huangzy/icde2024demoEADS.pdf
A Stitch in Time Saves Nine: Enabling Early Anomaly Detection with Correlation Analysis (ICDE) – Full text: https://www.researchgate.net/publication/372669739_A_Stitch_in_Time_Saves_Nine_Enabling_Early_Anomaly_Detection_with_Correlation_Analysis
Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback (TS-Agent) (arXiv) – PDF: https://arxiv.org/pdf/2508.13915.pdf
DiversiNews: Relevant Yet Diverse News Articles Retrieval (PVLDB Demo) – PDF: https://www.vldb.org/pvldb/vol17/p4277-huang.pdf
The Missing Parts: Augmenting Fact Verification with Half-Truth Detection (EMNLP) – PDF: https://aclanthology.org/2025.emnlp-main.1724.pdf
PRISM: A Framework for Producing Interpretable Political … (ACL) – PDF: https://aclanthology.org/2025.acl-long.1344.pdf
Don’t Reinvent the Wheel: Efficient Instruction-Following Text Embedding (ACL) – PDF: https://aclanthology.org/2025.acl-long.1196.pdf
A General Framework for Producing Interpretable Semantic Text Embeddings (ICLR) – PDF: https://proceedings.iclr.cc/paper_files/paper/2025/file/fa5617c176e76fee83f3f9947fdf9f3f-Paper-Conference.pdf
Detecting Leaked Data through Synthetic Data Injection and Model Querying (PVLDB) – PDF: https://www.vldb.org/pvldb/vol17/p1898-huang.pdf