research

These days I spend my time helping people build and launch new pieces of technology, and don't do much academic research. This page mostly chronicles old work from my previous life as an academician. Most of the links here need to be fixed; you can find some that are more likely to work over at http://research.google.com/pubs/DavidCohn.html.

(Past) Interests

Learning structure from document bases – the vast majority of available electronic documents reside in unstructured document bases. Hypertext remains woefully inadequate: it is labor-intensive, static, one-directional, and limited to the perspective of the hypertext author. And when we move beyond the traditional text document to audio, video, and other forms of recorded information, even the rudimentary benefits of hypertext are missing.

Can we automate the extraction of relationships between documents to generate a dynamic structure that lets users navigate and manipulate unstructured document bases at the conceptual level?

Active learning - many learning problems in pattern recognition and information retrieval afford chances for the learner to select or influence the training data it receives. How should it select training data to learn most quickly, for the least cost?

Also dabbled in: thin-client robotics, decision support and optimization.

The Slightly-More Recent Stuff

David Cohn, Deepak Verma and Karl Pfleger (2006). Recursive Attribute Factoring, in B. Scholkopf and J. Platt and T. Hoffman, eds., Advances in Neural Information Processing Systems 19. < abstract >

David Cohn (2003). Informed Projections, in S. Becker et al., eds., Advances in Neural Information Processing Systems 15. < abstract >

David Cohn and Thomas Hofmann (2001). The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity, in T. Leen et al., eds., Advances in Neural Information Processing Systems 13. < abstract >

Adam Berger, Rich Caruana, David Cohn, Dayne Freitag, and Vibhu Mittal (2000). Bridging the lexical chasm: Statistical approaches to answer-finding, Proceedings of the 23rd Annual Conference on Research and Development in Information Retrieval (ACM SIGIR). Athens, Greece. < abstract >

David Cohn and Huan Chang (2000). Probabilistically Identifying Authoritative Documents, Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, CA. < abstract >

Huan Chang, David Cohn and Andrew McCallum (2000). Creating Customized Authority Lists, Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, CA. < abstract >

Greg Schohn and David Cohn (2000). Less is More: Active Learning with Support Vector Machines, Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, CA. < abstract >

Brigham Anderson, Andrew Moore and David Cohn (2000). A Nonparametric Approach to Noisy and Costly Optimization, Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, CA. < abstract >

Earlier Research

Satinder Singh and David Cohn (1998). How to dynamically merge Markov decision processes, in M. Jordan et al., eds., Advances in Neural Information Processing Systems 10. < abstract >

David Cohn (1997). Minimizing Statistical Bias with Queries, in M. Mozer et al., eds., Advances in Neural Information Processing Systems 9. Also appears as AI Lab Memo 1552, CBCL Paper 124. < abstract >

David Cohn and Satinder Singh (1997). Predicting lifetimes in dynamically allocated memory, in M. Mozer et al., eds., Advances in Neural Information Processing Systems 9. < abstract >

David Cohn, Zoubin Ghahramani, and Michael Jordan (1996). Active learning with statistical models, Journal of Artificial Intelligence Research (4): 129-145. < abstract >

David Cohn (1996). Neural network exploration using optimal experiment design, Neural Networks 9(6): 1071-1083. Preliminary version available online as AI Lab Memo 1491, CBCL Paper 99. < abstract >

David Cohn, Les Atlas and Richard Ladner (1994). Improving generalization with active learning, Machine Learning 15(2): 201-221. < abstract >

David Cohn, Eve Riskin and Richard Ladner (1994). The theory and practice of vector quantizers trained on small training sets, IEEE Transactions on Pattern Analysis and Machine Intelligence 16(1): 54-65. < abstract >

David Cohn (1994). Queries and exploration using optimal experiment design, in J. Cowan et al., eds., Advances in Neural Information Processing Systems 6. < abstract >

Selected paper abstracts

David Cohn, Deepak Verma and Karl Pfleger (2006).

Clustering, or factoring, of a document collection attempts to "explain" each observed document in terms of one or a small number of inferred prototypes. Prior work demonstrated that when links exist between documents in the corpus (as is the case with a collection of web pages or scientific papers), building a joint model of document contents and connections produces a better model than that built from contents or connections alone.

Many problems arise when trying to apply these joint models to corpora at the scale of the World Wide Web, however; one of these is that the sheer overhead of representing a feature space with billions of dimensions becomes impractical.

We address this problem with a simple representational shift inspired by probabilistic relational models: instead of representing document linkage in terms of the identities of linking documents, we represent it by the explicit and inferred attributes of the linking documents. Several surprising results come with this shift: in addition to being computationally more tractable, the new model produces factors that more cleanly decompose the document collection. We discuss several variations on this model and show how some can be seen as exact generalizations of the PageRank algorithm.
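
A minimal numpy sketch may make the representational shift concrete (this is not the paper's full factoring model; the matrix names, sizes, and random data below are illustrative assumptions):

    # Sketch of the representational shift described above, under assumed toy data.
    import numpy as np

    n_docs, n_terms = 5, 8
    rng = np.random.default_rng(0)

    # A[d, t] = count of term t in document d (explicit attributes)
    A = rng.poisson(1.0, size=(n_docs, n_terms)).astype(float)

    # L[d, e] = 1 if document e links to document d
    L = (rng.random((n_docs, n_docs)) < 0.3).astype(float)

    # Identity-based link representation: each document is described by
    # *which* documents link to it -- one dimension per potential linker.
    identity_features = L                      # shape (n_docs, n_docs)

    # Attribute-based representation: each document is instead described by
    # the summed term attributes of the documents that link to it.  The
    # feature space now grows with the vocabulary, not with the collection.
    attribute_features = L @ A                 # shape (n_docs, n_terms)

    # A joint representation could stack own-content and inlink-attribute
    # features before factoring (e.g. with SVD or NMF).
    X = np.hstack([A, attribute_features])
    print(X.shape)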

David Cohn (2003).

Low rank approximation techniques are widespread in pattern recognition research; they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysis (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected.

Such techniques are generally "unsupervised," which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets.

David Cohn and Thomas Hofmann (2001).

We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as authoritative documents within those topics. Furthermore, the relationships between topics are mapped out in order to build a predictive model of link content. Among the many applications of this approach are information retrieval and search, topic identification, query disambiguation, focused web crawling, web authoring, and bibliometric analysis.

Adam Berger, Rich Caruana, David Cohn, Dayne Freitag, and Vibhu Mittal (2000).

This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing the relation between question and answer with a statistical model. For the purpose of learning this relation, we propose two sources of data: Usenet FAQ documents and customer service call-center dialogues from a large retail company. We will show that the task of "answer-finding" differs from both document retrieval and traditional question-answering, presenting challenges different from those found in these problems. The central aim of this work is to discover, through theoretical and empirical investigation, those statistical techniques best suited to the answer-finding problem.

David Cohn and Huan Chang (2000).

We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that in general, the identified authoritative documents correspond better to human intuition.
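
For reference, the HITS algorithm that this model is contrasted with can be written as a short power iteration. The sketch below is Kleinberg's algorithm, not the probabilistic model described above, and the toy adjacency matrix is an illustrative assumption:

    # Power-iteration form of Kleinberg's HITS algorithm (for comparison only).
    import numpy as np

    # adjacency[i, j] = 1 if document i links to document j (toy example)
    adjacency = np.array([[0, 1, 1, 0],
                          [0, 0, 1, 0],
                          [1, 0, 0, 1],
                          [0, 0, 1, 0]], dtype=float)

    hubs = np.ones(adjacency.shape[0])
    for _ in range(50):
        authorities = adjacency.T @ hubs          # pointed to by good hubs
        authorities /= np.linalg.norm(authorities)
        hubs = adjacency @ authorities            # point to good authorities
        hubs /= np.linalg.norm(hubs)

    print("authority scores:", np.round(authorities, 3))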

Huan Chang, David Cohn and Andrew McCallum (2000).

The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about an increased interest in link analysis. While HITS and its older relatives from the Bibliometrics literature provide a method for finding authoritative sources on a particular topic, they do not allow individual users to inject their own opinions about what sources are authoritative. This paper presents a learning technique which incorporates user feedback to adjust its model of document authority so that it better corresponds to the user's internal model. We demonstrate how this technique can be used to generate user-specific authority lists. We present experimental results based on a database of about one million references collected as part of the Cora on-line index of the computer science literature.

Greg Schohn and David Cohn (2000).

We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification tasks. We observe a number of benefits, the most surprising of which is that an SVM trained on a well-chosen subset of the available corpus frequently performs better than one trained on all available data. The heuristic for choosing this subset is simple to compute, and makes no use of information about the test set. Given that the training time of SVMs depends heavily on the training set size, our heuristic not only offers better performance with fewer data, it frequently does so in less time than the naive approach of training on all available data.
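
The paper's exact heuristic isn't reproduced here, but a common margin-based selection rule in the same spirit (label the unlabeled example closest to the current decision boundary) can be sketched in a few lines; the synthetic data and pool sizes below are illustrative assumptions:

    # Margin-based active learning with a linear SVM (a sketch, not the paper's rule).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Small initial labeled pool containing both classes.
    labeled = [int(i) for i in np.where(y == 0)[0][:5]] + \
              [int(i) for i in np.where(y == 1)[0][:5]]
    pool = [i for i in range(len(y)) if i not in labeled]

    clf = SVC(kernel="linear")
    for _ in range(40):                          # label 40 more examples
        clf.fit(X[labeled], y[labeled])
        # Distance to the separating hyperplane for every pool example;
        # the smallest |margin| is the most informative under this heuristic.
        margins = np.abs(clf.decision_function(X[pool]))
        pick = pool.pop(int(np.argmin(margins)))
        labeled.append(pick)                     # oracle supplies y[pick]

    print("training-set size:", len(labeled))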

Brigham Anderson, Andrew Moore and David Cohn (2000).

This paper describes pairwise bisection, a nonparametric approach to optimizing a noisy function with few function evaluations. The algorithm uses nonparametric reasoning about simple geometric relationships to find minima efficiently. Two factors often frustrate optimization: noise and cost. Output can contain significant quantities of noise or error, while time or money allow for only a handful of experiments. Pairwise bisection is used here to attempt to automate the process of robust and efficient experiment design. Real-world functions also tend to violate traditional assumptions of continuity and Gaussian noise. Since nonparametric statistics do not depend on these assumptions, this algorithm can optimize a wide variety of phenomena with fewer restrictions placed on noise. The algorithm's performance is compared to that of three competing algorithms, Amoeba, PMAX and Q2, on several different test functions. Results on these functions indicate competitive performance and superior resistance to noise.

Satinder Singh and David Cohn (1998).

We are frequently called upon to perform multiple tasks that compete for our attention and resources. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem.

David Cohn (1997).

I describe a querying criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.

David Cohn and Satinder Singh (1997).

Predictions of lifetimes of dynamically allocated objects can and have been used to improve time and space efficiency of dynamic memory management in computer programs. Barrett and Zorn (1993) used a simple lifetime predictor and demonstrated this improvement on a variety of computer programs. In this paper, we use decision trees to do lifetime prediction on the same programs and show significantly better prediction. Our method also has the advantage that during training we can use a large number of features and let the decision tree automatically choose the relevant subset.
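
A hedged sketch of the general idea (not the paper's actual features or trace data): predict at allocation time whether an object will be long-lived, using a decision tree over whatever features are cheaply available. The feature set and the synthetic labeling rule below are invented for illustration.

    # Decision-tree lifetime prediction on synthetic allocation features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 5000
    # Assumed features: requested size, call-site id, stack depth at allocation.
    size      = rng.integers(8, 4096, n)
    call_site = rng.integers(0, 50, n)
    depth     = rng.integers(1, 30, n)
    X = np.column_stack([size, call_site, depth])

    # Synthetic rule: large allocations from a few call sites tend to live long.
    long_lived = ((size > 1024) & (call_site < 10)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, long_lived, random_state=0)
    tree = DecisionTreeClassifier(max_depth=5).fit(X_tr, y_tr)
    print("held-out accuracy:", tree.score(X_te, y_te))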

David Cohn, Zoubin Ghahramani, and Michael Jordan (1996).

For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

David Cohn, Les Atlas, and Richard Ladner (1994).

Active learning differs from passive "learning from examples" in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this paper, we consider the problem of learning a binary concept in the absence of noise (Valiant 1984). We describe a formalism for active concept learning called selective sampling, and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers "useful." We test our implementation, called an SG-network, on three domains, and observe significant improvement in generalization.

David Cohn, Eve Riskin, and Richard Ladner (1994).

We examine how the performance of a memoryless vector quantizer changes as a function of its training set size. Specifically, we study how well the training set distortion predicts test distortion when the training set is a randomly drawn subset of blocks from the test or training image(s). Using the Vapnik-Chervonenkis dimension, we derive formal bounds for the difference of test and training distortion of vector quantizer codebooks. We then describe extensive empirical simulations that test these bounds for a variety of bit rates and vector dimensions, and give practical suggestions for determining the training set size necessary to achieve good generalization from a codebook. We conclude that, by using training sets comprised of only a small fraction of the available data, one can produce results that are close to the results obtainable when all available data are used.
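
The flavor of such an experiment can be sketched as follows, with k-means standing in for the codebook trainer and synthetic Gaussian vectors standing in for image blocks; none of this reproduces the paper's actual data, bit rates, or bounds.

    # Training vs. test distortion of a codebook as a function of training-set size.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    data = rng.normal(size=(20000, 16))            # 16-dimensional "blocks"
    train_all, test = data[:10000], data[10000:]

    for n in (100, 500, 2000, 10000):
        subset = train_all[rng.choice(len(train_all), n, replace=False)]
        vq = KMeans(n_clusters=64, n_init=1, random_state=0).fit(subset)
        train_distortion = vq.inertia_ / n
        nearest = vq.cluster_centers_[vq.predict(test)]
        test_distortion = ((test - nearest) ** 2).sum(axis=1).mean()
        print(n, round(train_distortion, 3), round(test_distortion, 3))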

David Cohn (1994).

Consider the problem of learning input/output mappings through exploration, e.g. learning the kinematics or dynamics of a robotic manipulator. If actions are expensive and computation is cheap, then we should explore by selecting a trajectory through the input space which gives us the most information in the fewest steps. I discuss how results from the field of optimal experiment design may be used to guide such exploration, and demonstrate its use on a simple kinematics problem.

David Cohn (1996). (expanded version of the above paper)

I consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov and MacKay, I apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. I demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. I conclude that, while not a panacea, OED-based query/action selection has much to offer, especially in domains where its high computational costs can be tolerated.
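
To illustrate the variance-based criterion in its simplest setting (ordinary linear regression rather than the neural network learners discussed above): the learner's output variance at a candidate query x is proportional to x^T (X^T X)^{-1} x, and querying where that variance is largest shrinks the predictive variance fastest. The polynomial features, query range, and ridge term below are illustrative assumptions.

    # Variance-based query selection for a linear-in-features learner (a sketch).
    import numpy as np

    rng = np.random.default_rng(0)

    def design_features(x):
        """Simple polynomial features for a 1-d input (an illustrative choice)."""
        return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

    queried_x = rng.uniform(-1, 1, 5)                     # initial random queries
    candidates = np.linspace(-1, 1, 201)                  # possible next queries

    for _ in range(10):
        X = design_features(queried_x)
        cov = np.linalg.inv(X.T @ X + 1e-6 * np.eye(3))   # small ridge for stability
        C = design_features(candidates)
        variances = np.einsum("ij,jk,ik->i", C, cov, C)   # x^T cov x per candidate
        queried_x = np.append(queried_x, candidates[np.argmax(variances)])

    print(np.round(np.sort(queried_x), 2))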