Estimating the maximum

Bibliographic Details
Title: Estimating the maximum
Authors: Gum, Ben (gum@cs.grinnell.edu); Lipton, Richard J. (rjl@cc.gatech.edu); LaPaugh, Andrea (aslp@cs.princeton.edu); Fich, Faith (fich@cs.toronto.edu)
Source: Journal of Algorithms. Jan2005, Vol. 54 Issue 1, p105-114. 10p.
Subjects: Statistical sampling, Algorithms, Algebra, Graph theory
Abstract: Estimating the maximum of a sampled dataset is an important and daunting task. We give a sampling algorithm for general datasets which gives estimates strictly better than the largest sample for an infinite family of datasets. Our algorithm overshoots the true maximum of the worst case dataset with probability at most (1/e)+O(1/k), where k is the size of our sample, which is much smaller than the size of the dataset. Our proof is the result of a new extremal graph coloring theorem: given any red/green coloring of the edges of a complete graph of n vertices, the probability that the edges among k randomly sampled vertices have a certain property is at most (1/e)+O(1/k). In addition, we show that if an algorithm gives an estimate strictly better than the largest sample for some dataset, then the algorithm overshoots the maximum on some other dataset with probability at least (1/e)-O(1/k). [Copyright © Elsevier]
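The abstract judges an estimator by how often its estimate exceeds the true maximum when it sees only k values sampled from the dataset. The sketch below is a hypothetical Monte Carlo harness for measuring that overshoot probability empirically; it is not the authors' algorithm, which the abstract does not spell out. It assumes the sample is k distinct positions drawn uniformly without replacement. The names `overshoot_probability`, `largest_sample`, and `gap_extrapolation` are our own: `largest_sample` is the baseline the abstract compares against, and `gap_extrapolation` is a made-up illustrative estimator that tries to do better than the largest sample and therefore risks overshooting.

```python
import random
from typing import Callable, Sequence

# An estimator maps a sample (a sequence of numbers) to an estimate of the dataset maximum.
Estimator = Callable[[Sequence[float]], float]


def largest_sample(sample: Sequence[float]) -> float:
    """Baseline: report the largest sampled value. It can never overshoot."""
    return max(sample)


def gap_extrapolation(sample: Sequence[float]) -> float:
    """Hypothetical illustrative estimator (NOT the paper's algorithm):
    extend the largest sample by its gap to the second-largest sample.
    Requires at least two sampled values."""
    s = sorted(sample)
    return s[-1] + (s[-1] - s[-2])


def overshoot_probability(
    data: Sequence[float],
    estimator: Estimator,
    k: int,
    trials: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate Pr[estimator(sample) > max(data)] by repeated sampling.

    Assumption: each trial draws k distinct positions of `data` uniformly
    at random without replacement.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    true_max = max(data)
    overshoots = 0
    for _ in range(trials):
        sample = rng.sample(data, k)
        if estimator(sample) > true_max:  # strictly above the true maximum
            overshoots += 1
    return overshoots / trials


if __name__ == "__main__":
    data = list(range(1, 10_001))
    for name, est in [("largest sample", largest_sample),
                      ("gap extrapolation", gap_extrapolation)]:
        p = overshoot_probability(data, est, k=100)
        print(f"{name}: estimated overshoot probability {p:.3f}")
```

The baseline always reports 0.000, since the largest sample can never exceed the true maximum; any estimator that sometimes returns a value above the largest sample accepts some overshoot risk, which is the trade-off the paper's upper and lower bounds quantify.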
Copyright of Journal of Algorithms is the property of Academic Press Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
ISSN: 0196-6774
DOI: 10.1016/j.jalgor.2004.04.005