[Academic Talk] From Dirt to Shovels: Automatic Tools Generation from Ad Hoc Data

Posted: 2009-05-09

5.20 Academic Talk Series, Talk 1

Speaker: Kenny Zhu, Postdoctoral Researcher, Princeton University

Time: 3:00 PM, May 11
Venue: Room 109, Meng Minwei Building

Abstract:
Ad hoc data is any non-standard, semi-structured data for which no useful data analysis and transformation tools are readily available. Such data is pervasive: it appears in scientific repositories, financial data, system logs and configuration files, sensor outputs, and more. In this work, we demonstrate that it is possible to generate a suite of useful data-processing tools directly from the ad hoc data itself, without any human intervention, thereby improving the productivity of data analysts.

The key technical contribution of this work is a multi-phase algorithm that automatically infers the structure of an ad hoc data source and produces a format specification in a declarative data description language called PADS. From such a specification, one can generate parsing and printing libraries as well as other useful tools for processing the data. At the end of the talk, I will briefly introduce some exciting new ideas from ongoing work that aim to further improve the productivity of ad hoc data users.
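The abstract does not spell out the phases of the inference algorithm. As a rough illustration only, here is a Python sketch (not PADS, and not the speaker's code) of the kind of early step such a pipeline might take, assuming a histogram-based heuristic. It tokenizes each record (line) into coarse token classes, then builds, for every token, a histogram of how many times it occurs per record. Tokens that occur the same number of times in every record are candidates for the fixed fields and separators of a record-level struct. The token classes and the names `tokenize`, `token_histograms` and `struct_candidates` are hypothetical. A real system would go on to discover unions and arrays, refine the result, and emit a PADS description.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical coarse token classes; a real system would use a richer set.
TOKEN_PATTERNS = [
    ("INT", r"\d+"),
    ("WORD", r"[A-Za-z]+"),
    ("WS", r"[ \t]+"),
    ("PUNCT", r"[^\w\s]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(record: str) -> List[str]:
    """Map one record to its sequence of token classes.

    Punctuation is kept literally (e.g. '|' or ':') because separators
    are strong hints about record structure."""
    tokens = []
    for m in TOKEN_RE.finditer(record):
        kind = m.lastgroup
        tokens.append(m.group() if kind == "PUNCT" else kind)
    return tokens


def token_histograms(records: List[str]) -> Dict[str, Counter]:
    """For each token, count how many records contain it 0, 1, 2, ... times."""
    per_record = [Counter(tokenize(r)) for r in records]
    all_tokens = set().union(*per_record)
    # Counter returns 0 for missing keys, so absent tokens land in bucket 0.
    return {tok: Counter(c[tok] for c in per_record) for tok in all_tokens}


def struct_candidates(records: List[str]) -> List[Tuple[str, int]]:
    """Tokens that occur the same (non-zero) number of times in every record."""
    result = []
    for tok, hist in token_histograms(records).items():
        if len(hist) == 1:
            (count,) = hist.keys()
            if count > 0:
                result.append((tok, count))
    return sorted(result)
```

For the records `"alice|200|GET"`, `"bob|404|POST"` and `"carol|500|GET"`, every token class has a single-bucket histogram, so `struct_candidates` reports `INT`, `WORD` and `|` as regular, consistent with a three-field, `|`-separated record. Appending an optional field to one record (`"carol|500|GET|retry"`) spreads the `WORD` and `|` histograms over two buckets, which is the kind of irregularity a later phase would need to explain with a union or an optional field.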

Bio:
Kenny Zhu is a Postdoctoral Researcher at Princeton University. He received a B.Eng. in Electrical Engineering and a Ph.D. in Computer Science, both from the National University of Singapore. Before joining Princeton in 2007, he was a software design engineer at Microsoft in Seattle. His main research interests are languages and systems for data processing, artificial intelligence, and concurrent/distributed systems. He has published in top-tier conferences such as POPL, SIGMOD, ICDE and ICLP, and actively reviews for various conferences and journals. His current research centers on the PADS data description language.