From:                     Kendra Smith

Sent:                      Tuesday, January 25, 2000 3:24 AM

To:                         M?crosöft Research Tech Talk, Sem. Notice

Cc:                         Kendra Smith

Subject:                 UW-CSE Colloq / 2-1-2000 / Charikar / Stanford / Algorithms for Clustering

UW-CSE Colloq / 2-1-2000 / Charikar / Stanford / Algorithms for Clustering

 

*NOTE* This lecture will be broadcast live via the Internet. See

http://www.cs.washington.edu/news/colloq.info.html for more information.

 

UNIVERSITY OF WASHINGTON

Seattle, Washington 98195

 

Department of Computer Science and Engineering

Box 352350

(206) 543-1695

 

COLLOQUIUM

 

SPEAKER:      Moses Charikar, Stanford

 

TITLE:          Algorithms for Clustering

 

DATE:           Tuesday, February 1, 2000

 

TIME:           3:30 pm

 

PLACE:                   134 Sieg Hall

 

HOST:           Anna Karlin

 

ABSTRACT:

 

Clustering problems arise in various contexts including classification,

information retrieval and data mining.  Roughly speaking, clustering

refers to partitioning a set of objects into groups of similar

objects.  The objects (e.g. documents or images) are usually represented

as points in some space with a distance measure and the objective is to

obtain clusters of points that are close to each other.  Problems of this

flavor also occur in discrete location theory, where the goal is to locate

a set of facilities (e.g. factories or warehouses) so as to serve a given

set of clients. This talk describes several algorithmic aspects of

clustering problems.

 

The first part of the talk will focus on the approximability of several

natural clustering objectives.  The quality of a clustering is typically

measured by an objective function and the goal of an algorithm is to

minimize this.  For most natural objective functions, the corresponding

optimization problem turns out to be NP-hard.  In the face of this

fundamental intractability, researchers have shifted their focus from

exact solutions to obtaining approximate solutions that are guaranteed to

be close to the optimal.  I will describe approximation algorithms for

classical clustering and location problems such as k-median and facility

location.  Both linear programming based approaches as well as purely

combinatorial approaches such as local search can be used to obtain

constant factor approximations for these problems.  Further, the ideas

behind these algorithms extend to other problems such as min-sum

clustering and clustering with excluded outliers.

 

In the second part of the talk, I will examine other algorithmic issues

that arise in clustering problems. I will describe min-wise independent

permutations that play an important role in efficiently estimating

similarity of documents.  They are used in clustering documents to

eliminate near duplicates in the AltaVista search index. I will also

formulate problems and describe results related to the incremental

maintenance of clusters in a dynamic point set.

 

Refreshments to follow.

 

Email: talk-info@cs.washington.edu

Info: http://www.cs.washington.edu