From: Kendra Smith
Sent: Tuesday, January 25, 2000 3:24 AM
To: M?crosöft Research Tech Talk, Sem. Notice
Cc: Kendra Smith
Subject: UW-CSE Colloq / 2-1-2000 / Charikar / Stanford / Algorithms for Clustering
UW-CSE Colloq / 2-1-2000 / Charikar / Stanford / Algorithms for Clustering
*NOTE* This lecture will be broadcast live via the Internet. See
http://www.cs.washington.edu/news/colloq.info.html for more information.
UNIVERSITY OF WASHINGTON
Seattle, Washington 98195
Department of Computer Science and Engineering
Box 352350
(206) 543-1695
COLLOQUIUM
SPEAKER: Moses Charikar, Stanford
TITLE: Algorithms for Clustering
DATE: Tuesday, February 1, 2000
TIME: 3:30 pm
PLACE: 134 Sieg Hall
HOST: Anna Karlin
ABSTRACT:
Clustering problems arise in various contexts including classification,
information retrieval and data mining. Roughly speaking, clustering
refers to partitioning a set of objects into groups of similar
objects. The objects (e.g. documents or images) are usually represented
as points in some space with a distance measure and the objective is to
obtain clusters of points that are close to each other. Problems of this
flavor also occur in discrete location theory, where the goal is to locate
a set of facilities (e.g. factories or warehouses) so as to serve a given
set of clients. This talk describes several algorithmic aspects of
clustering problems.
The first part of the talk will focus on the approximability of several
natural clustering objectives. The quality of a clustering is typically
measured by an objective function and the goal of an algorithm is to
minimize this. For most natural objective functions, the corresponding
optimization problem turns out to be NP-hard. In the face of this
fundamental intractability, researchers have shifted their focus from
exact solutions to obtaining approximate solutions that are guaranteed to
be close to the optimal. I will describe approximation algorithms for
classical clustering and location problems such as k-median and facility
location. Both linear programming based approaches as well as purely
combinatorial approaches such as local search can be used to obtain
constant factor approximations for these problems. Further, the ideas
behind these algorithms extend to other problems such as min-sum
clustering and clustering with excluded outliers.
In the second part of the talk, I will examine other algorithmic issues
that arise in clustering problems. I will describe min-wise independent
permutations that play an important role in efficiently estimating
similarity of documents. They are used in clustering documents to
eliminate near duplicates in the AltaVista search index. I will also
formulate problems and describe results related to the incremental
maintenance of clusters in a dynamic point set.
Refreshments to follow.
Email: talk-info@cs.washington.edu
Info: http://www.cs.washington.edu