ICDE Influential Paper Awards
Justin J. Levandoski, David B. Lomet, Sudipta Sengupta
The Bw-Tree: A B-tree for new hardware platforms. ICDE 2013
Rui Li, Kin Hou Lei, Ravi Khadiwala, Kevin Chen-Chuan Chang
TEDAS: A Twitter-based event detection and analysis system. ICDE 2012
Alfons Kemper, Thomas Neumann
HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. ICDE 2011
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Anthony, Hao Liu, Raghotham Murthy
Hive – a petabyte scale data warehouse using Hadoop. ICDE 2010
Citation: A path-finding paper towards supporting SQL in the unstructured Hadoop environment, making the DB community relevant to big data. It has been cited heavily by MapReduce (MR) proponents. The MR fever has since faded, and the early MR execution engine described in the paper was replaced by Tez, a more sophisticated parallel execution engine. Selecting this paper as the ICDE 2020 Ten-Year Influential Paper gives the community an opportunity to review the progress of the past ten years and to revisit the MR controversy.
Archana Ganapathi, Harumi A. Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael I. Jordan, David A. Patterson
Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009
Bin Zhou and Jian Pei
Preserving Privacy in Social Networks Against Neighborhood Attacks. ICDE 2008
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer and Muthuramakrishnan Venkitasubramaniam
l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006
Citation: L-diversity is a method for sharing sensitive data in a privacy-preserving way. The method sanitizes the dataset in order to protect the confidentiality of individuals in the data, while still preserving aggregate statistics. L-diversity was one of the first methods to demonstrate that user-level information can be protected from attackers without having to precisely specify or know the background knowledge of the attackers. The work started a line of foundational research into formal privacy definitions and algorithms for privacy-preserving data publication. Today, l-diversity is often used when solutions with stronger privacy constraints do not leave sufficient utility in the published data.
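As a rough illustration only, the sketch below checks the simplest reading of the idea, often called distinct l-diversity: every group of records that share the same quasi-identifier values must contain at least l different sensitive values. The function name, the column names, and the sample records are hypothetical, and the paper itself discusses stronger variants that this check does not cover.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every quasi-identifier group has >= l distinct sensitive values.

    This is only the "distinct" variant, used here as an assumption for illustration.
    """
    groups = defaultdict(set)
    for row in records:
        # Records with identical quasi-identifier values fall into the same group.
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already-generalized records.
records = [
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
]

ok_for_1 = is_distinct_l_diverse(records, ["zip", "age"], "condition", 1)
ok_for_2 = is_distinct_l_diverse(records, ["zip", "age"], "condition", 2)
```

Here the second group contains only one distinct condition, so the table passes for l = 1 but fails for l = 2.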
Ninghui Li, Tiancheng Li and Suresh Venkatasubramanian
t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. ICDE 2007
Citation: This paper introduces a novel privacy notion called t-closeness for preventing the disclosure of a sensitive attribute. It treats as privacy-preserving the hypothetical situation in which all potentially identifying attributes are removed and only the distribution of the sensitive attribute in the overall population is published; t-closeness then limits any additional information one can learn. The proposed privacy notion is elegant and thought-provoking, and has significantly influenced subsequent research in data privacy.
Michael Stonebraker, Ugur Çetintemel
“One Size Fits All”: An Idea Whose Time Has Come and Gone. ICDE 2005
Citation: This paper asks whether we should continue to build general-purpose database systems or start building special-purpose systems that each address a specific class of workloads. This question has sparked intense, ongoing debate in both industry and academia since 2005. The paper makes a case for special-purpose systems because they can achieve orders of magnitude better performance for their specific target workload.
Jeffrey Considine, Feifei Li, George Kollios, John W. Byers
Approximate aggregation techniques for sensor databases. ICDE 2004
Citation: The paper describes novel methods to handle duplicate-sensitive aggregates over distributed datasets. It carefully extends the duplicate-insensitive Flajolet-Martin method, adapting it to require little computation and communication effort and making it robust to link losses. This work has been highly impactful in the area of sensor networks, and has been shown to be applicable to any setting with multiple data sources that may suffer network failures, such as today's distributed data centers.
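To make "duplicate-insensitive" concrete, here is a minimal sketch of a basic Flajolet-Martin style bitmap, not the paper's extended method. Each item sets one bit chosen by the lowest set bit of its hash, and two sketches merge with a bitwise OR, so an item that arrives twice (for example over two network paths) changes nothing. The function names, the use of SHA-256, and the 32-bit width are assumptions made for this example.

```python
import hashlib


def fm_sketch(items, bits=32):
    """Build a Flajolet-Martin style bitmap for a collection of items."""
    sketch = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        # Index of the lowest set bit of the hash; capped at the sketch width.
        r = bits - 1 if h == 0 else min((h & -h).bit_length() - 1, bits - 1)
        sketch |= 1 << r
    return sketch


def merge(a, b):
    """Combine two sketches; OR makes the result independent of duplicates."""
    return a | b


def estimate(sketch):
    """Rough distinct-count estimate from the position of the lowest unset bit."""
    r = 0
    while sketch & (1 << r):
        r += 1
    return (2 ** r) / 0.77351


node_a = fm_sketch(["s1", "s2", "s3"])
node_b = fm_sketch(["s3", "s4"])  # "s3" is reported by both nodes
combined = merge(node_a, node_b)
```

A single bitmap gives only a coarse estimate; practical uses average many such sketches, which this example leaves out.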
Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov
Schema Mediation in Peer Data Management Systems. ICDE 2003
Sergey Melnik, Hector Garcia-Molina, Erhard Rahm
Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching. ICDE 2002
Citation: Together, these two papers describe techniques to match and mediate schemas. They show how to exploit schema structures for matching, how peer data management forms a next logical step for data integration research, and how to mediate among schemas in peer-to-peer settings. The proposed techniques are scalable and elegant, and have significantly influenced subsequent research in schema matching and peer data management.
Sanjay Agrawal, Surajit Chaudhuri, Gautam Das
DBXplorer: A System for Keyword-Based Search over Relational Databases. ICDE 2002
Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan
Keyword Searching and Browsing in Databases using BANKS. ICDE 2002
Citation: Together, these papers from ICDE 2002 laid the foundations for keyword search over relational databases, paving the way for a significant body of follow-on work in the area of Information Retrieval and Databases. The solutions presented in these papers are elegant and highly effective.
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker
The Skyline Operator, ICDE 2001
Citation: Skyline computation (a.k.a. the maximum vector problem) is a fundamental concept in multi-criteria decision making. This highly influential paper opened a new research topic in the database community. It framed the skyline concept in a database setting and offered a study of fundamental techniques for skyline query processing. The paper laid a solid foundation for a multitude of studies that have refined the concept of skylining and proposed efficient implementations in a variety of settings.
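For readers unfamiliar with the concept, a skyline is the set of points not dominated by any other point, where one point dominates another if it is at least as good in every dimension and strictly better in at least one. The sketch below is a naive quadratic version, assuming smaller values are better in every dimension; it is not one of the paper's algorithms, and the hotel data (price, distance to the beach in km) is made up.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance_km); lower is better for both.
hotels = [(50, 8.0), (80, 2.0), (120, 0.5), (90, 3.0), (60, 9.0)]
best = skyline(hotels)
# (90, 3.0) is dominated by (80, 2.0), and (60, 9.0) by (50, 8.0).
```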
Kin-pong Chan, Ada Wai-Chee Fu
Efficient Time Series Matching by Wavelets, ICDE 1999.
Citation: This paper proposed the first efficient time-series indexing method based on the discrete wavelet transform (DWT) and greatly influenced subsequent work on indexing of time series. It also showed that the discrete Fourier transform (DFT) may not be the best representation for dimensionality reduction in time series, leading to significant research into alternative representations as well as wavelet-based scalable data analysis.
Rakesh Agrawal and Ramakrishnan Srikant
Mining Sequential Patterns, ICDE 1995.
Citation: This paper launched a new area in data mining. Sequential pattern mining has since become an important and active area with a variety of applications and much published work. The paper is a milestone in the field of data mining.
Kenneth Salem and Hector Garcia-Molina
Disk Striping, ICDE 1986.
Citation: This early paper on disk striping significantly influenced subsequent work on RAID storage.
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, ICDE 1996
Citation: This seminal paper defined a simple SQL construct that enables one to efficiently compute aggregations over all combinations of group-by columns in a single query, where previous approaches required multiple queries. This feature has had significant impact on industry and is now incorporated in all major database systems.
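The following sketch illustrates what the operator computes, not how a database engine computes it efficiently: a SUM for every combination of grouping columns, with the placeholder "ALL" standing in for each column that has been aggregated away. The function name, the column names, and the sales rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube_sum(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; collapsed columns show "ALL"."""
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            for row in rows:
                # Keep the value for grouped columns, "ALL" for the rest.
                key = tuple(row[d] if d in subset else "ALL" for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sales rows.
sales = [
    {"model": "Chevy", "year": 1994, "units": 10},
    {"model": "Chevy", "year": 1995, "units": 20},
    {"model": "Ford", "year": 1994, "units": 5},
]

cube = cube_sum(sales, ["model", "year"], "units")
grand_total = cube[("ALL", "ALL")]
chevy_total = cube[("Chevy", "ALL")]
```

This version rescans the rows once per column subset, which is the repeated work that a single-query construct lets the database avoid.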
Goetz Graefe, William J. McKenna
The Volcano Optimizer Generator: Extensibility and Efficient Search. ICDE 1993.
Citation: This seminal paper laid the foundation for transformation-based query optimizers. Volcano was the first optimizer framework based on this approach and inspired several others. Multiple commercial database systems rely on transformation-based query optimizers.
Award Selection Process
Committee: A small group of anonymous members, who will be recognized at the end of their service (three-year terms, renewable). Neither TCDE nor the ICDE General Chairs/PC Chairs are involved in the selection process.
Chair: Facilitates reaching a unanimous decision
First Round: Identify top candidate papers
- Each member recommends a small number of papers to the chair
- The chair relays recommendations and further communications
- Top candidates emerge
Final Round: Review self-assessment
- The chair asks the authors of top candidate papers to submit their own self-assessment
- Preferably no longer than three paragraphs
- Within 24 hours: if the paper is influential, a couple of hours will be sufficient to write three paragraphs of self-assessment