Title: Watermarking of Generative Tabular Data
Abstract: In the first half of this talk, we provide an overview of synthetic data generation and argue, through concrete examples, that "creating something out of nothing" is both possible and beneficial. This motivates the exploration of "Generative Data Science," which elucidates the principles underlying generative AI; we also highlight how it differs from statistical machine learning.
The latter half of this talk showcases our recent research on watermarking, an essential technique for establishing ownership of generative data, as an embodiment of generative data science. Specifically, we illustrate how to embed and detect (invisible) watermarks in generative tabular data, and we establish their resilience against attacks through rigorous statistical analysis and theoretical validation.
Bio: Guang Cheng is a Professor of Statistics and Data Science at UCLA and leads the Trustworthy AI Lab (https://www.stat.ucla.edu/~guangcheng/). He received his BA in Economics from Tsinghua University in 2002 and PhD in Statistics from the University of Wisconsin-Madison in 2006. His research interests include generative data science, deep learning theory, and statistical machine learning. Cheng is an Institute of Mathematical Statistics Fellow, Simons Fellow in Mathematics, NSF CAREER awardee, and a member of the Institute for Advanced Study, Princeton.