From: Kendra Smith
Sent: Wednesday, April 19, 2000 8:38 PM
To: Research Division (FULL)
Cc: Kendra Smith
Subject: New MSR Technical Reports - Jan/Feb/March 2000
Categories: Completed
The following M?crosöft Research technical reports were published January through March, 2000:
MSR-TR-2000-25
Explicit Isoperimetric Constants, Phase Transitions in the Random-cluster
and Potts Models, and Bernoullicity
Häggström, Olle ; Jonasson, Johan ; Lyons, Russell
March 2000
31 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-25
Please follow link above to view abstract.
MSR-TR-2000-24
Managing Adjacency in Triangular Meshes
Loop, Charles
January 2000
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-24
The problem of efficiently accessing and maintaining adjacency information for triangulations over general surface domains is addressed. Rapid access to adjacent vertices, edges, and triangles is an important aspect of multiresolution techniques, from subdivision surfaces to mesh simplification. Novel data structures and algorithms for the construction, manipulation, and traversal of triangulations suitable for dynamic multiresolution framework are presented.
MSR-TR-2000-23
Statistical Learning and Kernel Methods
Schölkopf, Bernhard
February 2000
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-23
We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces.
MSR-TR-2000-22
Kernel Method for Percentile Feature Extraction
Schölkopf, Bernhard ; Platt, John C. ; Smola, Alex J.
February 2000
12 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-22
A method is proposed which computes a direction in a dataset such that a specified fraction of a particular class of all examples is separated from the overall mean by a maximal margin. The projector onto that direction can be used for class-specific feature extraction. The algorithm is carried out in a feature space associated with a support vector kernel function, hence it can be used to construct a large class of nonlinear feature extractors. In the particular case where there exists only one class, the method can be thought of as a robust form of principal component analysis, where instead of variance we maximize percentile thresholds. Finally, we generalize it to also include the possibility of specifying negative examples.
MSR-TR-2000-20
Efficient Discovery of Error-Tolerant Frequent Itemsets in High Dimensions
Yang, Cheng ; Fayyad, Usama ; Bradley, Paul S.
February 2000
25 p.
Download:
http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-20
We present a generalization of frequent itemsets allowing the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). This efficient algorithm exploits sparsity of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation and collaborative filtering. Results show that we consistently uncover structure in large sparse databases that other more traditional clustering algorithms in data mining fail to find.
MSR-TR-2000-18
Visualization of Navigation Patterns on a Web Site Using Model Based
Clustering
Cadez, Igor ; Heckerman, David ; Meek, Christopher ; Smyth, Padhraic ; White,
Steven
March 2000
21 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-18
We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is a model based (as opposed to distance based) and partitions users according to the order in which they request Web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. Our algorithm scales linearly with both number of users and number of clusters, and our implementation easily handles millions of users and thousands of clusters. In the paper, we describe the details of our technology and a tool based on it called WebCANVAS. We illustrate the use of our technology on user-traffic data from msnbc.com.
MSR-TR-2000-17
A Decision Theoretic Approach to Targeted Advertising
Chickering, David Maxwell ; Heckerman, David
February 2000
11 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-17
A simple advertising strategy that can be used to help increase sales of a product is to mail out special offers to selected potential customers. Because there is a cost associated with sending each offer, the optimal mailing strategy depends on both the benefit obtained from a purchase and how the offer affects the buying behavior of the customers. In this paper, we describe two methods for partitioning the potential customers into groups, and show how to perform a simply cost-benefit analysis to decide which, if any, of the groups should be targeted. In particular, we consider two decision-tree learning algorithms. The first is an "off the shelf" algorithm used to model the probability that groups of customers will buy the product. The second is a new algorithm that is similar to the first, except that for each group, it explicitly models the probability of purchase under the two mailing scenarios: (1) the mail is sent to members of that group and (2) the mail is not sent to members of the group. Using data from a real-world advertising experiment, we compare the algorithms to each other and to a naive mail-to-all strategy.
MSR-TR-2000-16
Dependency Networks for Density Estimation, Collaborative Filtering, and
Data Visualization
Heckerman, David ; Chickering, David Maxwell ; Meek, Christopher ; Rounthwaite,
Robert ; Kadie, Carl
February 2000
19 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-16
We describe a graphical representation of probabilistic relationships - an alternative to the Bayesian network - called a dependency network. Like a Bayesian network, a dependency network has a graph and probability component. The graph component is a (cyclic) directed graph such that a node's parents render that node independent of all other nodes in the network. The probability component consists of the probability of a node given its parents for each node (as in a Bayesian network). We identify several basic properties of this representation, and describe its use in density estimation, collaborative filtering (the task of predicting preferences), and the visualization of predictive relationships.
MSR-TR-2000-15
Fast Learning from Sparse Data
Chickering, David Maxwell ; Heckerman, David
February 1999 (revised May 1999)
13 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-15
We describe two techniques that significantly improve the running time of several standard machine-learning algorithms when data is sparse. The first technique is an algorithm that efficiently extracts one-way and two-way counts - either real or expected - from discrete data. Extracting such counts is a fundamental step in learning algorithm for constructing a variety of models including decision trees, decision graphs, Bayesian networks, and naive-Bayes clustering models. The second technique is an algorithm that efficiently performs the E-step of the EM algorithm (i.e., inference) when applied to a naive-Bayes clustering model. Using real-world data sets, we demonstrate a dramatic decrease in running time for algorithms that incorporate these techniques.
MSR-TR-2000-14
Boolean Programs: A Model and Process for Software
Analysis
Ball, Thomas ; Rajamani, Sriram K.
February 2000 (Revised March 2000)
29 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-14
A fundamental issue in model checking of software is the choice of a model for software. We present a model called boolean programs that is expressive enough to represent features in common programming languages and is amenable to model checking. We present a model checking algorithm for boolean programs using context-free-language reachability. The model checking algorithm allows procedure calls with unbounded recursion, exploits locality of variable scopes, and gives short error traces. Furthermore, we give a process for incrementally refining an initial skeletal boolean program B (representing a source program P) with respect to a particular reachability query in P. The presence of infeasible paths in P may lead to the model checker reporting false positive errors in B. We show how to refine B by introducing boolean variables to rule out the infeasible paths. The process uses ideas from model checking, symbolic execution, and program slicing.
MSR-TR-2000-09
Language-Agnostic Program Rendering for Presentation, Debugging and
Visualization
Collberg, Christian ; Davey, Sean ; Proebsting, Todd A.
February 2000
8 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-09
We describe a language-independent and specification-driven program rendering tool that is able to produce high-quality code renderings of arbitrary complexity. The tool can incorporate arbitrary types of information together with the program code, allowing it to be used for debugging and profiling as well as for producing beautiful renderings of programs for publication.
We also present a model for the rendering of programs and apply it to the design of a rendering of Java control flow.
MSR-TR-2000-08
An Experimental Study of Internet Routing Convergence
Labovitz, Craig ; Ahuja, Abha ; Bose, Abhijit ; Jahanian, Farnam
February 2000
22 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-08
This paper examines the latency in Internet path failure, failover and repair due to the convergence properties of inter-domain routing. Unlike switches in the public telephony network which exhibit failover on the order of milliseconds, we show that inter-domain routers in the packet switched Internet may take several minutes to reach a consistent view of the network topology after a fault. These delays stem from temporary routing table oscillations formed during operation of the BGP path selection process on Internet backbone routers. During these periods of delayed convergence, end-to-end Internet paths will experience intermittent loss of connectivity, as well as increased packet loss and latency. We present a two-year study of Internet routing convergence through the experimental instrumentation of key portions of the Internet infrastructure, including both passive data collection and fault-injection machines at major Internet exchange points. Based on data from the injection and measurement of several hundred thousand inter-domain routing faults, we describe several unexpected properties of convergence and show that the measured upper bound on Internet inter-domain routing convergence delay is an order of magnitude slower than previously thought. Our analysis also shows that the upper computational bound on the number of router states and control messages exchanged during the process of BGP convergence is exponential with respect to the number of autonomous systems on the Internet. Finally, we demonstrate that much of the observed convergence delay stems from both specific router vendor implementation decisions, as well as ambiguity in the BGP specification.
MSR-TR-2000-07
A Toolkit for Building Dependable and Extensible Home Networking
Applications
Wang, Yi-Min ; Russell, Wilf ; Arora, Anish
February 2000
10 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-07
Dependability and extensibility are two of the key requirements to successful home networking. In this paper, we describe the design and implementation of a software development toolkit for building dependable and extensible home networking applications. A Soft-State Store (SSS) is implemented as a shared infrastructure to simplify robust distributed programming against device and object failures. SSS supports multi-timescale refreshes and selectively uses persistence to accommodate the battery power and network bandwidth constraints in the home networking environment. A publish/subscribe event system allows any changes in the SSS to be propagated to interested subscribers, which then perform appropriate adaptive, corrective, alerting, or cleanup actions. An Attributed-Based Lookup Service (ABLS) and a Name-Based Lookup Service (NBLS), both implemented on top of the SSS for robustness, provide one level of indirection for supporting extensibility as well as allow user-friendly, natural language-based access. We demonstrate the use of the toolkit for building home networking applications in an actual deployment and present performance numbers.
MSR-TR-2000-06
AlgoVista - A Search Engine for Computer Scientists
Collberg, Christian ; Proebsting, Todd A.
February 2000
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-06
We describe AλgoVista, a web-based search engine designed to allow applied computer scientists to classify problems and find algorithms and implementations that solve these problems. Unlike other search engines, AλgoVista is not keyword based. Rather, users provide a set of input output samples that describe the behavior of the problem they wish to classify. This type of query-by-example requires no knowledge of specialized terminology, only an ability to formalize the problem.
The search mechanism of AλgoVista is based on a novel application of program checking, a technique developed as an alternative to program verification and testing.
MSR-TR-2000-05
Cubism and Cameras: Free-form Optics for Computer Graphics
Glassner, Andrew S.
January 2000
16 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-05
The camera model used to create synthetic 3D computer graphics is usually based on physical cameras. Cubist painters explored the process of creating images from multiple, simultaneous points of view. We believe that cubist principles can lead to a rich variety of interesting and dramatically useful idioms for illustration and storytelling both in still images and in motion. We present the general idea, show some drawn images of the idioms, discuss the applications, and describe our implementation. We also discuss our plans for an improved interface for designers, directors, and other image-makers who do not wish to work directly with 3D modeling systems. We conclude with some preliminary results.
MSR-TR-2000-03
Programming Shorthands
Proebsting, Todd A. ; Zorn, Benjamin G.
January 2000
12 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-03
We propose programming language mechanisms to reduce redundancy in program source code. These abbreviation mechanisms, shorthands, make programs shorter and easier to write and read. In addition, we provide a framework for describing language abbreviation mechanisms.
MSR-TR-2000-02
Freedman, Michael H.
Poly-locality in quantum computing
December 1999
12 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-02
A polynomial depth quantum circuit effects, by definition a poly-local unitary transformation of tensor product state space. It is a physically reasonable belief [Fy][L][FKW] that these are precisely the transformations which will be available from physics to help us solve computational problems. The poly-locality of discrete Fourier transform on cyclic groups is at the heart of Shor's factoring algorithm. We describe a class of poly-local transformations, including all the discrete orthogonal wavelet transforms in the hope that these may be helpful in constructing new quantum algorithms. We also observe that even a rather mild violation of poly-locality leads to a model without one-way functions, giving further evidence that poly-locality is an essential concept
MSR-TR-2000-01
Tracking Self-Occluding Articulated Objects in Dense Disparity Maps
Jojic, Nebojsa ; Turk, Matthew ; Huang, Thomas S.
January 2000
9 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-01
In this paper, we present an algorithm for real-time tracking of articulated structures in dense disparity maps derived from stereo image sequences. A statistical image formation model that accounts for occlusions plays the central role in our tracking approach. This graphical model (a Bayesian network) assumes that the range image of each part of the structure is formed by drawing the depth candidates from a 3-D Gaussian distribution. The advantage over the classical mixture of Gaussians is that our model takes into account occlusions by picking the minimum depth (which could be regarded as a probabilistic version of z-buffering). The model also enforces articulation constraints among the parts of the structure. The tracking problem is formulated as an inference problem in the image formation model. This model can be extended and used for other tasks in addition to the one described in the paper and can also be used for estimating probability distribution functions instead of the ML estimates of the tracked parameters. For the purposes of real-time tracking, we used certain approximations in the inference process, which resulted in a real-time two-stage inference algorithm. We were able to successfully track upper human body motion in real time and in the presence of self-occlusions.
MSR-TR-99-101
Understanding Data Through Analysis of Structured Markup and Usage
Altschuler, Steven J. ; Jung, Edward ; Wu, Lani F.
November 1999
10 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-99-101
Data and documents with semantic markup (such as in XML) allows for the construction of usage models that are human interpretable. A framework for performing analysis over semantic markup data in conjunction with usage information is described along with applications for performing automated clustering and “find-similar”. A strong emphasis on “human interpretability” is placed on the results of these analytical techniques.
MSR-TR-99-84
From Polymorphic Subtyping to CFL Reachability: Context-Sensitive Flow
Analysis Using Instantiation Constraints
Fahndrich, Manuel ; Rehof, Jakob ; Das, Manuvir
March 2000
77 p.
Download: http://research.M?crosöft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-99-84
We present a novel approach to computing context-sensitive flow of values through procedures and data structures. Our approach combines and extends techniques from two seemingly disparate areas: polymorphic subtyping and interprocedural dataflow analysis based on context-free language reachability. The resulting technique offers several advantages over previous approaches: it works directly on higher-order programs, provides demand-driven interprocedural queries, and improves the asymptotic complexity of a known algorithm based on polymorphic subtyping from O(n8) to O(n3) for computing all queries. For intra-procedural flow restricted to equivalence classes, our algorithm yields linear inter-procedural flow queries.