We study problems at the intersection of pricing and data management in emerging cloud-computing environments.
Relational Data Market in the Cloud
Cloud computing is transforming many aspects of data management. Most recently, the cloud is seeing the emergence of digital markets for data and associated services. We observe that our community has a lot to offer in building successful cloud-based data markets. In this project, we investigate some of the key challenges that such markets face and we build tools for supporting them.
The following paper discusses a framework for pricing relational data, along with several interesting open problems and challenges.
Current mechanisms for pricing data are very simple: buyers can choose only from a set of explicit views, each with a specific price. In the following work, we propose a framework for pricing data on the Internet that, given the price of a few views, allows the price of any query to be derived automatically. We call this capability query-based pricing.
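To make the idea of query-based pricing concrete, here is a minimal sketch in Python. It is a toy model built on assumptions that are not part of the framework described above: each priced view is reduced to the set of data partitions it reveals, a query is reduced to the set of partitions it needs, and the price of a query is taken to be the cheapest combination of priced views that covers it. The names `query_price`, `view_prices`, and `view_covers`, and the example views, are hypothetical.

```python
from itertools import combinations

def query_price(view_prices, view_covers, needed):
    """Return the cheapest total price of a set of views covering `needed`.

    view_prices: dict mapping view name -> explicit price set by the seller
    view_covers: dict mapping view name -> set of partitions the view reveals
    needed:      set of partitions the buyer's query requires
    Returns None when no combination of priced views covers the query.
    Brute force over all subsets, so only suitable for a handful of views.
    """
    views = list(view_prices)
    best = None
    for r in range(len(views) + 1):
        for subset in combinations(views, r):
            covered = set().union(*(view_covers[v] for v in subset))
            if needed <= covered:
                cost = sum(view_prices[v] for v in subset)
                if best is None or cost < best:
                    best = cost
    return best

# Hypothetical price list: two single-state views and one regional view.
view_prices = {"V_WA": 10, "V_OR": 8, "V_WEST": 15}
view_covers = {
    "V_WA": {"WA"},
    "V_OR": {"OR"},
    "V_WEST": {"WA", "OR", "CA"},
}

price_both = query_price(view_prices, view_covers, {"WA", "OR"})
price_one = query_price(view_prices, view_covers, {"WA"})
```

In this sketch a query over both WA and OR is priced at the regional view's price, because buying the two state views separately would cost more. The point is only that the seller prices a few views and every other price follows from them.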
An implementation of the pricing framework was presented as a demo at VLDB 2012.
We will be presenting a paper on our data pricing system (QueryMarket) at SIGMOD 2013.
Pricing Private Data
Personal data has huge value, both to its owner and to institutions that would like to analyze it. As awareness of the value of personal data increases, there is a drive in industry to compensate end users for their private information. This paper proposes a theory of how to price private data.
Data Use Management
When valuable data is exchanged or bought, it is frequently encumbered by restrictions on how it may be used. For example, clinical data must not be used in such a way as to expose the patients’ identities. To date, these restrictions are enforced only contractually and compliance is checked only manually, if at all. To meet the needs of this growing set of applications, we explore the design of a Data Use Manager and research efficient algorithms for its implementation as a component of a database system. The Data Use Manager enables the declarative specification of sophisticated data use policies and provides capabilities for both their online enforcement and offline audit.
Collaborative Data Management in the Cloud
Data-management-as-a-service systems are increasingly used in collaborative settings, where multiple users access common data sets. Cloud providers can implement various optimizations, such as indexes or materialized views, to accelerate queries over these data sets. Each optimization carries a cost and may benefit multiple users. This creates a major challenge: how to select which optimizations to perform and how to share their cost among users. The problem is especially challenging when users are selfish and will report their true values for different optimizations only if doing so maximizes their utility. We study mechanism-design-based techniques for addressing this challenge.
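As an illustration of what a mechanism-design approach can look like, the sketch below implements the Shapley value cost-sharing mechanism for a single optimization. This is a standard mechanism from the cost-sharing literature and is shown here only as an example; the text above does not say which mechanisms the project uses. The function name `shapley_cost_share`, the bids, and the cost are hypothetical. The mechanism splits the cost equally among the users still being served, drops anyone whose bid is below that share, and repeats until everyone remaining can afford an equal share.

```python
def shapley_cost_share(cost, bids):
    """Decide who is served by one optimization and what each user pays.

    cost: total cost of building the optimization (e.g., an index)
    bids: dict mapping user -> value the user reports for the optimization
    Returns (serviced_users, price_per_user). If no group of users can
    cover the cost with equal shares, returns (set(), 0.0) and the
    optimization is not built.
    """
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        # Keep only the users whose reported value covers an equal share.
        keep = {u for u in serviced if bids[u] >= share}
        if keep == serviced:
            return serviced, share
        serviced = keep
    return set(), 0.0

# Hypothetical example: an index costing 60 and three users' reported values.
users, price = shapley_cost_share(60, {"a": 50, "b": 40, "c": 15})
```

Here user c cannot afford the initial three-way share of 20, so c is dropped and a and b split the cost at 30 each. Because the price a served user pays depends only on who else is served and not on the size of that user's own bid, overbidding or underbidding does not help the user, which is the property that matters when users are selfish.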
The Data Eco$y$tem project is partially supported by the National Science Foundation and Microsoft through NSF CiC grant CCF 1047815 and NSF grant IIS-0915054 and additional gifts from Microsoft Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.