Skyline Computation
Atish Das Sarma () - Skyline computation is a central area of research that lies in the intersection of databases, theory, and web data management. There has been a surge in skyline research (particularly in the database community) over the last decade. A skyline of a database is a subset of undominated entries in the database where an entry $i$ is said to dominate entry $j$ if $i$'s value on each attribute is at least as much as $j$'s (and strictly greater in at least one).
In joint work with A. Lall, D. Nanongkai, and J. Xu, we prove near-optimal bounds for computing the set of skyline data points under the streaming model. Given a set of $n$ points with a skyline of size $m$ and a memory constraint of space $s$, we show how to compute all skyline points in $\tilde{O}(m/s)$ passes over the entire point stream. For $s=m$ this is near optimal and is an exponential improvement over the best known results previously. Additionally, we extend these techniques for I/O-efficient algorithms and show via extensive experiments that our approach compares favorably to previous known methods. We now, along with R. J. Lipton, have work in progress on computing representative skylines where only a subset of the skylines can be shown to the users; the task is to decide which ones to show (a problem that hotel/flight booking websites have to deal with on a daily basis). We adopt a regret-minimizing approach to maximize user preferences and prove upper and lower bounds on the utility guarantees achievable depending on the size of the representation allowed.
The first part will appear at VLDB 2009 (and paper will be made from http://www.cc.gatech.edu/~atish/full-pubs-new.html within two weeks).
