Homework 2 - Paper Review
In this homework, you have to review a paper from the list given below.
Paper selection is on a first-come-first-served basis.
You will have to submit a small bibtex annotation for your paper, and make a
poster presentation.
The bibTeX is due by Friday Feb 7.
The poster presentations will be on the morning
of Saturday Feb 8. Presentations will be in four batches of 15 each,
from 9:00 AM till 1:00 PM. Attendance is mandatory.
The list of papers is given below this writeup.
Clicking on each [pdf] link will get the .pdf.
Clicking on "bibTeX" will get the
bibliography details and in many cases, the abstract, and perhaps
some rudimentary annotation.
For part (a) you will have to write a review in the "annote" entry. Note that
the annote field can include LaTeX commands and can be shown by creating an
annotated bibliography. You can also link images, etc. Please put your name
at the bottom of your annote like this:
-- YOUR NAME, userid, year
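As a sketch, an entry with a filled-in annote might look like this (the key, fields, and review text below are placeholders, not a required format):

```bibtex
@article{youruserid-hw2-choice,
  title   = {Title of Your Chosen Paper},
  author  = {Author, First and Coauthor, Second},
  journal = {Journal or Conference Name},
  year    = {2013},
  annote  = {
    Your review goes here. The annote field may contain LaTeX commands,
    e.g. \emph{emphasis}, a formula such as $y = Wx + b$, or a linked
    image via \includegraphics.

    -- YOUR NAME, userid, year
  }
}
```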
What to write in your review (and in your poster)
Short paragraphs or a few lines on each of these:
- Describe the problem and why it is important. Don't say banal things like "understanding language is a critical task in AI".
- State the work that has been done before. In all cases, you will need to read up on a good bit of the background to understand these papers. What are the main "claims" of novelty in the paper?
- Describe the approach in not more than a paragraph or two. You should not put too many equations, but some key ideas and formulae should be stated (in LaTeX). In the presentation you should have a bit more detail, should people ask.
- Do the results justify the "claims" made? What are the assumptions used in actually doing the work? Do they weaken the claims?
- Is it a system that is likely to revolutionize AI, or just a small step? Is the code being made available? Or a useful dataset?
Selection based on early choice: Project groups
The first people to make their choices will get them. All choices should be made by Friday (tomorrow) evening. Latecomers may find many choices gone. Those who have not chosen by then will be assigned a paper at random. Project group formation will take some input from the papers you select, but will largely be random. Feel free to choose papers from areas that are not where you want to do a project.

Submission:
All submissions will be in yourarea/cs365/hw2/
a. A short writeup in the bibTeX annote format, giving your review of the paper, by FRIDAY FEB 7.
   Filename: youruserid.bib. Additionally, you may wish to upload the .pdf from your bibtex as youruserid-bib.pdf (just convert your annote into a .tex and compile it).
b. We will have a poster presentation on FEB 8, where each of you will make a brief presentation on your chosen paper. Please upload your posters BEFORE the session:
   Filename: youruserid-hw2.pdf

Dates:
- Selection: by Friday Jan 31
- 1-page bibTeX review : by Friday Feb 7
- Presentation at mini-workshop: Feb 8
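One possible way to produce the optional youruserid-bib.pdf from your annote (a minimal sketch; the wrapper document and filenames here are placeholders, and any standard LaTeX toolchain works):

```latex
\documentclass{article}
\begin{document}

\section*{Review: Title of Your Chosen Paper}

% Paste the body of your annote field here; it is ordinary LaTeX,
% so \emph{...}, equations such as $y = Wx + b$, and linked images
% via \includegraphics carry over unchanged.
Your review text goes here.

\medskip
\noindent -- YOUR NAME, userid, year

\end{document}
```

Compiling with, e.g., pdflatex youruserid-bib.tex then yields the PDF to upload alongside youruserid.bib.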
Papers Selected for Review
BibTeX Entries
@inproceedings{cox-pinto-11_beyond-simple-features-face-recog,
  title={Beyond simple features: A large-scale feature search approach to unconstrained face recognition},
  author={Cox, David and Pinto, Nicolas},
  booktitle={Automatic Face \& Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on},
  pages={8--15},
  year={2011},
  annote = { ABSTRACT: Many modern computer vision algorithms are built atop of a set of low-level feature operators (such as SIFT [1], [2]; HOG [3], [4]; or LBP [5], [6]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [7] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.
}}

@inproceedings{naseer-sturm-13_followme-person-following-quadcopter,
  title={Followme: Person following and gesture recognition with a quadrocopter},
  author={Naseer, Tayyab and Sturm, J{\"u}rgen and Cremers, Daniel},
  booktitle={Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on},
  pages={624--630},
  year={2013},
  organization={IEEE}
}

@inproceedings{kallman-mataric-04_motion-planning-dynamic-roadmap,
  title={Motion planning using dynamic roadmaps},
  author={Kallman, M and Mataric, Maja},
  booktitle={Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE International Conference on},
  volume={5},
  pages={4399--4404},
  year={2004},
  organization={IEEE}
}

@article{schapiro-rogers-13_neural-events-from-temporal-community-structure,
  title={Neural representations of events arise from temporal community structure},
  author={Schapiro, Anna C and Rogers, Timothy T and Cordova, Natalia I and Turk-Browne, Nicholas B and Botvinick, Matthew M},
  journal={Nature neuroscience},
  year={2013},
  annote = { Compares response-time and fMRI studies on human subjects with an ANN model that sequences the same input. Also compares GLM (General Linear Model) simulations on brain region models. Temporal sequences are shown to subjects, e.g. 15 rotated patterns then 15 straight; subjects are asked to segment the sequence by pressing a key. Expt 1: response-time based. Expt 3: fMRI.
ABSTRACT: Our experience of the world seems to divide naturally into discrete, temporally extended events, yet the mechanisms underlying the learning and identification of events are poorly understood. Research on event perception has focused on transient elevations in predictive uncertainty or surprise as the primary signal driving event segmentation. We present human behavioral and functional magnetic resonance imaging (fMRI) evidence in favor of a different account, in which event representations coalesce around clusters or "communities" of mutually predicting stimuli.
Through parsing behavior, fMRI adaptation and multivoxel pattern analysis, we demonstrate the emergence of event representations in a domain containing such community structure, but in which transition probabilities (the basis of uncertainty and surprise) are uniform. We present a computational account of how the relevant representations might arise, proposing a direct connection between event learning and the learning of semantic categories.
Expt 1: the sequence alternated between blocks of 15 images generated from a random walk on the graph and blocks of 15 images generated from a randomly selected Hamiltonian path through the graph (a path visiting every node exactly once). The purpose of interspersing Hamiltonian paths was to ensure that parsing behavior could not be explained by local statistics of the sequence (for example, after seeing items within a cluster repeat several times, participants might use the relative novelty of an item from a new cluster as a parsing cue).
}}

@article{qiu-G-liu-Bing-11_opinion-word-double-propagation,
  title={Opinion word expansion and target extraction through double propagation},
  author={Qiu, Guang and Liu, Bing and Bu, Jiajun and Chen, Chun},
  journal={Computational linguistics},
  volume={37},
  number={1},
  pages={9--27},
  year={2011},
  annote = { ABSTRACT: Analysis of opinions, known as opinion mining or sentiment analysis, has attracted a great deal of attention recently due to many practical applications and challenging research problems. In this article, we study two important problems, namely, opinion lexicon expansion and opinion target extraction. Opinion targets (targets, for short) are entities and their attributes on which opinions have been expressed. To perform the tasks, we found that there are several syntactic relations that link opinion words and targets. These relations can be identified using a dependency parser and then utilized to expand the initial opinion lexicon and to extract targets.
This proposed method is based on bootstrapping. We call it double propagation as it propagates information between opinion words and targets. A key advantage of the proposed method is that it only needs an initial opinion lexicon to start the bootstrapping process. Thus, the method is semi-supervised due to the use of opinion word seeds. In evaluation, we compare the proposed method with several state-of-the-art methods using a standard product review test collection. The results show that our approach outperforms these existing methods significantly.
}}

@article{klapaftis-manandhar-13_word-sense-induction,
  title={Evaluating Word Sense Induction and Disambiguation Methods},
  author={Klapaftis, Ioannis P and Manandhar, Suresh},
  journal={Language Resources and Evaluation},
  pages={1--27},
  year={2013},
  publisher={Springer},
  doi={10.1007/s10579-012-9205-0},
  annote = { ABSTRACT: Word Sense Induction (WSI) is the task of identifying the different uses (senses) of a target word in a given text in an unsupervised manner, i.e. without relying on any external resources such as dictionaries or sense-tagged data. This paper presents a thorough description of the SemEval-2010 WSI task and a new evaluation setting for sense induction methods. Our contributions are two-fold: firstly, we provide a detailed analysis of the SemEval-2010 WSI task evaluation results and identify the shortcomings of current evaluation measures. Secondly, we present a new evaluation setting by assessing participating systems' performance according to the skewness of target words' distribution of senses, showing that there are methods able to perform well above the Most Frequent Sense (MFS) baseline in highly skewed distributions.
}}

@article{boyd-blei-10_syntactic-topic-models,
  title={Syntactic topic models},
  author={Boyd-Graber, Jordan and Blei, David M},
  journal={arXiv preprint arXiv:1002.4665},
  year={2010},
  annote = { When we read a sentence, we use two kinds of reasoning: one for understanding its syntactic structure and another for integrating its meaning into the wider context of other sentences, other paragraphs, and other documents. Both mental processes are crucial, and psychologists have found that they are distinct. A syntactically correct sentence that is semantically implausible takes longer for people to understand than its semantically plausible counterpart (Rayner et al. 1983). Furthermore, recent brain imaging experiments have localized these processes in different parts of the brain (Dapretto and Bookheimer 1999). Both of these types of reasoning should be accounted for in a probabilistic model of language.
[Dapretto and Bookheimer 1999] Mirella Dapretto and Susan Y. Bookheimer. 1999. Form and content: Dissociating syntax and semantics in sentence comprehension. Neuron, 24(2):427--432.
To see how these mental processes interact, consider the following sentence from a travel brochure: "Next weekend, you could be relaxing in ____." How do we reason about filling in the blank? First, because the missing word is the object of a preposition, it should act like a noun, perhaps a location like "bed," "school," or "church." Second, because the document is about travel, we expect travel-related terms. This further restricts the space of possible terms, leaving alternatives like "Nepal," "Paris," or "Bermuda" as likely possibilities. Each type of reasoning restricts the likely solution to a subset of words, but the best candidates for the missing word are in their intersection.
ABSTRACT: The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent.
The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
}}

@inproceedings{yang-ramanan-12_proxemics-in-personal-photos,
  title={Recognizing proxemics in personal photos},
  author={Yang, Yi and Baker, Simon and Kannan, Anitha and Ramanan, Deva},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on},
  pages={3522--3529},
  year={2012},
  organization={IEEE}
}

@inproceedings{parkHSS-sheikh-13iccv_3D-reconstruction-articulation,
  title={3D reconstruction of a smooth articulated trajectory from a monocular image sequence},
  author={Park, Hyun Soo and Sheikh, Yaser},
  booktitle={Computer Vision (ICCV), 2011 IEEE International Conference on},
  pages={201--208},
  year={2011},
  organization={IEEE},
  annote = { http://www.cs.cmu.edu/~hyunsoop/articulated_trajectory.html
ABSTRACT: An articulated trajectory is defined as a trajectory that remains at a fixed distance with respect to a parent trajectory.
In this paper, we present a method to reconstruct an articulated trajectory in three dimensions given the two dimensional projection of the articulated trajectory, the 3D parent trajectory, and the camera pose at each time instant. This is a core challenge in reconstructing the 3D motion of articulated structures such as the human body because endpoints of each limb form articulated trajectories. We simultaneously apply activity-independent spatial and temporal constraints, in the form of fixed 3D distance to the parent trajectory and smooth 3D motion. There exist two solutions that satisfy each instantaneous 2D projection and articulation constraint (a ray intersects a sphere at up to two locations) and we show that resolving this ambiguity by enforcing smoothness is equivalent to solving a binary quadratic programming problem. A geometric analysis of the reconstruction of articulated trajectories is also presented and a measure of the reconstructibility of an articulated trajectory is proposed.
}}

@inproceedings{karthikeyan-manjunath-13iccv_where-what-we-see,
  title={From Where and How to What We See},
  author={Karthikeyan, S and Jagadeesh, Vignesh and Shenoy, Renuka and Eckstein, Miguel and Manjunath, BS},
  booktitle={IEEE International Conference on Computer Vision},
  year={2013},
  annote = { Predicts face and text regions in images using eye tracking data from multiple subjects. }
}

@article{kennedy-balzano-13_online-factorization-SVD,
  title={Online Algorithms for Factorization-Based Structure from Motion},
  author={Kennedy, Ryan and Balzano, Laura and Wright, Stephen J and Taylor, Camillo J},
  journal={arXiv preprint},
  year={2013},
  annote = { Low-rank matrix completion is the problem of recovering a low-rank matrix from an incomplete sample of the entries.
It was shown in [3], [22] that under assumptions on the number of observed entries and on incoherence of the singular vectors of this matrix with respect to the canonical coordinate axes, the nuclear norm minimization convex optimization problem solves the NP-hard rank minimization problem exactly. Since this breakthrough, a flurry of research activity has centered around developing faster algorithms to solve this convex optimization problem, both exact and approximate; see [23], [24] for two examples. The online algorithm GROUSE [6] (Grassmannian Rank-One Update Subspace Estimation) outperforms all nonparallel algorithms in computational efficiency, often by an order of magnitude, while remaining competitive in terms of estimation error.
[3] B. Recht, "A simpler approach to matrix completion," Journal of Machine Learning Research, vol. 12, pp. 3413--3430, 2011.
[6] L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Communication, Control, and Computing (Allerton). IEEE, 2010, pp. 704--711.
[22] E. Cand{\`e}s and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717--772, December 2009.
ABSTRACT: We present a family of online algorithms for real-time factorization-based structure from motion, leveraging a relationship between incremental singular value decomposition and recently proposed methods for online matrix completion. Our methods are orders of magnitude faster than previous state of the art, can handle missing data and a variable number of feature points, and are robust to noise and sparse outliers. We demonstrate our methods on both real and synthetic sequences and show that they perform well in both online and batch settings. We also provide an implementation which is able to produce 3D models in real time using a laptop with a webcam.
}}

@inproceedings{srivastava-salakhutdinov-12-nips_multimodal-learning-deep,
  title={Multimodal learning with deep Boltzmann machines},
  author={Srivastava, Nitish and Salakhutdinov, Ruslan},
  booktitle={Advances in Neural Information Processing Systems 25},
  pages={2231--2239},
  year={2012},
  abstract = { A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains. },
  annote = { Inputs are text and images separately; these are then merged after a 3-layer initial set. Can have different data-flow models between the layers, resulting in different computational (training) costs. Enables the system to learn correlations between images and text labels, so that searching by new labels returns images, etc. Based on the MIR Flickr data set: 1 million images retrieved from Flickr along with their user-assigned tags.
[10] Mark J. Huiskes and Michael S. Lew. The MIR Flickr retrieval evaluation.
In MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.
}}

@incollection{monner-reggia-11_systematically-grounding-language-deep,
  title={Systematically grounding language through vision in a deep, recurrent neural network},
  author={Monner, Derek D and Reggia, James A},
  booktitle={Artificial General Intelligence},
  pages={112--121},
  year={2011},
  publisher={Springer}
}

@inproceedings{hsiao-hebert-13_gradient-networks-shape-matching,
  title={Gradient Networks: Explicit Shape Matching Without Extracting Edges},
  author={Hsiao, Edward and Hebert, Martial},
  booktitle={Proceedings AAAI '13},
  year={2013},
  annote = { ABSTRACT: We present a novel framework for shape-based template matching in images. While previous approaches required brittle contour extraction, considered only local information, or used coarse statistics, we propose to match the shape explicitly on low-level gradients by formulating the problem as traversing paths in a gradient network. We evaluate our algorithm on a challenging dataset of objects in cluttered environments and demonstrate significant improvement over state-of-the-art methods for shape matching and object detection. }
}

@article{pezzulo-barsalou-cangelosi-11_mechanics-of-embodiment-computational,
  title={The mechanics of embodiment: a dialog on embodiment and computational modeling},
  author={Pezzulo, G. and Barsalou, L.W. and Cangelosi, A. and Fischer, M.H. and McRae, K. and Spivey, M.J.},
  journal={Frontiers in psychology},
  volume={2},
  year={2011},
  publisher={Frontiers Media SA},
  annote = { ABSTRACT: Embodied theories are increasingly challenging traditional views of cognition by arguing that conceptual representations that constitute our knowledge are grounded in sensory and motor experiences, and processed at this sensorimotor level, rather than being represented and processed abstractly in an amodal conceptual system.
Given the established empirical foundation, and the relatively underspecified theories to date, many researchers are extremely interested in embodied cognition but are clamoring for more mechanistic implementations. What is needed at this stage is a push toward explicit computational models that implement sensorimotor grounding as intrinsic to cognitive processes. In this article, six authors from varying backgrounds and approaches address issues concerning the construction of embodied computational models, and illustrate what they view as the critical current and next steps toward mechanistic theories of embodiment. The first part has the form of a dialog between two fictional characters: Ernest, the "experimenter," and Mary, the "computational modeler." The dialog consists of an interactive sequence of questions, requests for clarification, challenges, and (tentative) answers, and touches the most important aspects of grounded theories that should inform computational modeling and, conversely, the impact that computational modeling could have on embodied theories. The second part of the article discusses the most important open challenges for embodied computational modeling.
}}

@article{little-sommer-13_learning-action-perception-loops,
  title={Learning and exploration in action-perception loops},
  author={Little, Daniel Y and Sommer, Friedrich T},
  journal={Frontiers in neural circuits},
  volume={7},
  year={2013},
  instn = {UC Berkeley-Molecular and Cell Biology},
  date = {22 March},
  doi = {10.3389/fncir.2013.00037},
  annote = { ABSTRACT: Discovering the structure underlying observed data is a recurring problem in machine learning with important applications in neuroscience. It is also a primary function of the brain. When data can be actively collected in the context of a closed action-perception loop, behavior becomes a critical determinant of learning efficiency.
Psychologists studying exploration and curiosity in humans and animals have long argued that learning itself is a primary motivator of behavior. However, the theoretical basis of learning-driven behavior is not well understood. Previous computational studies of behavior have largely focused on the control problem of maximizing acquisition of rewards and have treated learning the structure of data as a secondary objective. Here, we study exploration in the absence of external reward feedback. Instead, we take the quality of an agent's learned internal model to be the primary objective. In a simple probabilistic framework, we derive a Bayesian estimate for the amount of information about the environment an agent can expect to receive by taking an action, a measure we term the predicted information gain (PIG). We develop exploration strategies that approximately maximize PIG. One strategy based on value-iteration consistently learns faster than previously developed reward-free exploration strategies across a diverse range of environments. Psychologists believe the evolutionary advantage of learning-driven exploration lies in the generalized utility of an accurate internal model. Consistent with this hypothesis, we demonstrate that agents which learn more efficiently during exploration are later better able to accomplish a range of goal-directed tasks. We will conclude by discussing how our work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as in closed-loop neurophysiology studies.
1. Introduction
Computational models of exploratory behavior have largely focused on the role of exploration in the acquisition of external rewards (Thrun, 1992; Kaelbling et al., 1996; Sutton and Barto, 1998; Kawato and Samejima, 2007).
In contrast, a consensus has emerged in behavioral psychology that learning represents the primary drive underlying explorative behaviors (Archer and Birke, 1983; Loewenstein, 1994; Silvia, 2005; Pisula, 2009). The computational principles underlying learning-driven exploration, however, have received much less attention. To address this gap, we introduce here a mathematical framework for studying how behavior affects learning and develop a novel model of learning-driven exploration. Machine learning techniques for extracting the structure underlying sensory signals have often focused on passive learning systems that can not directly affect the sensory input. Exploration, in contrast, requires actively pursuing useful information and can only occur in the context of a closed action-perception loop. Learning in closed action-perception loops differs from passive learning both in terms of "what" is being learned as well as "how" it is learned (Gordon et al., 2011). In particular, in closed action-perception loops:
* Sensorimotor contingencies must be learned.
* Actions must be coordinated to direct the acquisition of data.
Sensorimotor contingencies refer to the causal role actions play on the sensory inputs we receive, such as the way visual inputs change as we shift our gaze or move our head. They must be taken into account to properly attribute changes in sensory signals to their causes. This tight interaction between actions and sensation is reflected in the neuroanatomy, where sensory-motor integration has been reported at all levels of the brain (Guillery, 2005; Guillery and Sherman, 2011). We often take our implicit understanding of sensorimotor contingencies for granted, but in fact they must be learned during the course of development (the exception being contingencies for which we are hard-wired by evolution).
This is eloquently expressed in the explorative behaviors of young infants (e.g., grasping and manipulating objects during proprioceptive exploration and then bringing them into visual view during intermodal exploration) (Rochat, 1989; O'Regan and Noë, 2001; Noë, 2004). Not only are actions part of "what" we learn during exploration, they are also part of "how" we learn. To discover what is inside an unfamiliar box, a curious child must open it. To learn about the world, scientists perform experiments. Directing the acquisition of data is particularly important for embodied agents whose actuators and sensors are physically confined. Since the most informative data may not always be accessible to a physical sensor, embodiment may constrain an exploring agent and require that it coordinates its actions to retrieve useful data.
}}

@article{emonet-varadarajan-13pami_temporal-motif-mixtures-dirichlet-process,
  title={Temporal Analysis of Motif Mixtures using Dirichlet Processes},
  author={Emonet, R{\'e}mi and Varadarajan, Jagannadan and Odobez, Jean-Marc},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2013},
  annote = { ABSTRACT: In this paper, we present a new model for unsupervised discovery of recurrent temporal patterns (or motifs) in time series (or documents). The model is designed to handle the difficult case of multivariate time series obtained from a mixture of activities, that is, our observations are caused by the superposition of multiple phenomena occurring concurrently and with no synchronization. The model uses nonparametric Bayesian methods to describe both the motifs and their occurrences in documents. We derive an inference scheme to automatically and simultaneously recover the recurrent motifs (both their characteristics and number) and their occurrence instants in each document.
The model is widely applicable and is illustrated on datasets coming from multiple modalities, mainly videos from static cameras and audio localization data. The rich semantic interpretation that the model offers can be leveraged in tasks such as event counting or for scene analysis. The approach is also used as a means of doing soft camera calibration in a camera network. A thorough study of the model parameters is provided and a cross-platform implementation of the inference algorithm will be made publicly available.
Earlier version from AVSS-11: emonet-varadarajan-11avss_multi-camera-anomaly-detection
}}

@article{perfors-tenenbaum-regier-11_11learnability-of-syntax,
  title={The learnability of abstract syntactic principles},
  author={Perfors, Amy and Tenenbaum, Joshua B and Regier, Terry},
  journal={Cognition},
  volume={118},
  number={3},
  pages={306--338},
  year={2011},
  publisher={Elsevier},
  annote = { ABSTRACT: Children acquiring language infer the correct form of syntactic constructions for which they appear to have little or no direct evidence, avoiding simple but incorrect generalizations that would be consistent with the data they receive. These generalizations must be guided by some inductive bias, some abstract knowledge, that leads them to prefer the correct hypotheses even in the absence of directly supporting evidence. What form do these inductive constraints take? It is often argued or assumed that they reflect innately specified knowledge of language. A classic example of such an argument moves from the phenomenon of auxiliary fronting in English interrogatives to the conclusion that children must innately know that syntactic rules are defined over hierarchical phrase structures rather than linear sequences of words (e.g., Chomsky, 1965, Chomsky, 1971, Chomsky, 1980 and Crain and Nakayama, 1987).
Here we use a Bayesian framework for grammar induction to address a version of this argument and show that, given typical child-directed speech and certain innate domain-general capacities, an ideal learner could recognize the hierarchical phrase structure of language without having this knowledge innately specified as part of the language faculty. We discuss the implications of this analysis for accounts of human language acquisition.
}}

@inproceedings{tahri-youcef-13_efficient-pose-estimation-from-set-of-points,
  title={Efficient decoupled pose estimation from a set of points},
  author={Tahri, Omar and Araujo, Helder and Mezouar, Youcef and Chaumette, Fran{\c{c}}ois and others},
  booktitle={IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS'2013},
  year={2013}
}

@inproceedings{salah-coenen-13_extracting-debate-graphs-UK,
  title={Extracting debate graphs from parliamentary transcripts: a study directed at UK house of commons debates},
  author={Salah, Zaher and Coenen, Frans and Grossi, Davide},
  booktitle={Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law},
  pages={121--130},
  year={2013},
  annote = { ABSTRACT: The paper proposes a framework---the Debate Graph Extraction (DGE) framework---for extracting debate graphs from transcripts of political debates. The idea is to represent the structure of a debate as a graph with speakers as nodes and "exchanges" as links. Links between nodes are established according to the semantic similarity between the speeches and indicate an alignment of content between them. Nodes are labelled according to the "attitude" (sentiment) of the speakers, positive or negative, using a lexicon-based technique founded on SentiWordNet. The attitude of the speakers is then used to label the graph links as being either "supporting" or "opposing".
If both speakers have the same attitude (both negative or both positive) the link is labelled as supporting; otherwise the link is labelled as opposing. The resulting graphs capture the abstract representation of a debate as two opposing factions exchanging arguments on related content. }} ====ICML 2013 @inproceedings{livni-lehavi-13-icml_vanishing-component-analysis, title={Vanishing Component Analysis}, author={Livni, Roi and Lehavi, David and Schein, Sagi and Nachliely, Hila and Shalev-Shwartz, Shai and Globerson, Amir}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={597--605}, year={2013}, annote = { ABSTRACT: The vanishing ideal of a set of points $S \subseteq \mathbb{R}^n$ is the set of all polynomials that attain the value of zero on all the points in S. Such ideals can be compactly represented using a small set of polynomials known as generators of the ideal. Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal. Our procedure is numerically stable, and can be used to find approximately vanishing polynomials. The resulting polynomials capture nonlinear structure in data, and can for example be used within supervised learning. Empirical comparison with kernel methods shows that our method constructs more compact classifiers with comparable accuracy.
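The core idea (reading approximately vanishing polynomials off a monomial evaluation matrix) can be illustrated naively with an SVD; this is only a sketch of the idea, not the paper's numerically stable generator-construction procedure, and the function name is invented:

```python
import numpy as np

def vanishing_coeffs(S, tol=1e-8):
    # Sketch of the idea behind VCA, not the paper's algorithm:
    # evaluate all monomials up to degree 2 on the 2-D point set S,
    # then read approximately vanishing polynomials off the SVD.
    x, y = S[:, 0], S[:, 1]
    M = np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Rows of Vt with near-zero singular values are coefficient vectors
    # c with sum_j c_j * monomial_j(p) ~ 0 for every point p in S.
    return Vt[s < tol * s[0]]

# Points on the unit circle: x^2 + y^2 - 1 vanishes on all of them.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
S = np.stack([np.cos(t), np.sin(t)], axis=1)
C = vanishing_coeffs(S)   # one approximately vanishing polynomial
```

On the circle example the single recovered coefficient vector is proportional to the polynomial x^2 + y^2 - 1.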
}} @inproceedings{balasubramanian-yu-K-13-icml_smooth-sparse-coding, title={Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations}, author={Balasubramanian, Krishnakumar and Yu, Kai and Lebanon, Guy}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={597--605}, note={arXiv preprint arXiv:1210.1121}, year={2012} } @inproceedings{maurer-pontil-13-icml_sparse-coding-for-multitask-and-transfer-learning, title={Sparse coding for multitask and transfer learning}, author={Andreas Maurer and Massi Pontil and Bernardino Romera-Paredes}, booktitle={International Conference on Machine Learning (ICML)}, year={2013} } @inproceedings{muandet-scholkopf-13-icml_domain-generalization-via-invariant-features, title={Domain Generalization via Invariant Feature Representation}, author={Muandet, Krikamol and Balduzzi, David and Sch{\"o}lkopf, Bernhard}, booktitle={International Conference on Machine Learning (ICML)}, year={2013} } @inproceedings{zhangX-chuD-13icml_sparse-uncorrelated-LDA, title={Sparse Uncorrelated Linear Discriminant Analysis}, author={Zhang, Xiaowei and Chu, Delin}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={45--52}, year={2013}, annote = { ABSTRACT: We develop a novel approach for sparse uncorrelated linear discriminant analysis (ULDA). Our proposal is based on characterization of all solutions of the generalized ULDA. We incorporate sparsity into the ULDA transformation by seeking the solution with minimum $\ell_1$-norm from all minimum dimension solutions of the generalized ULDA. The problem is then formulated as an $\ell_1$-minimization problem and is solved by the accelerated linearized Bregman method. Experiments on high-dimensional gene expression data demonstrate that our approach not only computes extremely sparse solutions but also performs well in classification.
Experimental results also show that our approach can help for data visualization in low-dimensional space. }} @inproceedings{hennig-13-icml_fast-probabilistic-optimization-w-noise, title={Fast probabilistic optimization from noisy gradients}, author={Hennig, Philipp}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={62--70}, year={2013} } @inproceedings{zhu-J-chen-N-13-icml_gibbs-max-margin-topic-models, title={Gibbs Max-Margin Topic Models with Fast Sampling Algorithms}, author={Zhu, Jun and Chen, Ning and Perkins, Hugh and Zhang, Bo}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={124--132}, year={2013}, annote={ ABSTRACT: Existing max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents Gibbs max-margin supervised topic models by minimizing an expected margin loss, an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables, we develop simple and fast Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems for both classification and regression. Empirical results demonstrate significant improvements in time efficiency. The classification performance is also significantly improved over competitors.
}} @inproceedings{menon-tamuz-13-icml_learning-to-program-by-example, title = {A Machine Learning Framework for Programming by Example}, url = {http://jmlr.csail.mit.edu/proceedings/papers/v28/menon13.pdf}, author = {Aditya Menon and Omer Tamuz and Sumit Gulwani and Butler Lampson and Adam Kalai}, number = {1}, pages = {187-195}, volume = {28}, editor = {Sanjoy Dasgupta and David McAllester}, year = {2013}, booktitle = {Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, annote = { ABSTRACT: Learning programs is a timely and interesting challenge. In Programming by Example (PBE), a system attempts to infer a program from input and output examples alone, by searching for a composition of some set of base functions. We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program. This generic learning framework lets us address problems beyond the scope of earlier PBE systems. Experiments on a prototype implementation show that learning improves search and ranking on a variety of text processing tasks found on help forums. }} @inproceedings{song-darrell-13_discriminatively-activated-sparselets, title={Discriminatively Activated Sparselets}, author={Song, Hyun O and Darrell, Trevor and Girshick, Ross B}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={196--204}, year={2013} } @inproceedings{anandkumar-adel-13-icml_linear-bayesian-networks-latent, title = {Learning Linear Bayesian Networks with Latent Variables}, author = {Animashree Anandkumar and Adel Javanmard and Daniel J. Hsu and Sham M.
Kakade}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages = {249-257}, url = {http://jmlr.csail.mit.edu/proceedings/papers/v28/anandkumar13.pdf}, abstract = { This work considers the problem of learning linear Bayesian networks when some of the variables are unobserved. Identifiability and efficient recovery from low-order observable moments are established under a novel graphical constraint. The constraint concerns the expansion properties of the underlying directed acyclic graph (DAG) between observed and unobserved variables in the network, and it is satisfied by many natural families of DAGs that include multi-level DAGs, DAGs with effective depth one, as well as certain families of polytrees. }} @inproceedings{zuluaga-sergent-13-icml_active-learning-multi-objective-optimization, title={Active Learning for Multi-Objective Optimization}, author={Zuluaga, Marcela and Sergent, Guillaume and Krause, Andreas and P{\"u}schel, Markus}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={462--470}, year={2013}, annote = { ABSTRACT: In many fields one encounters the challenge of identifying, out of a pool of possible designs, those that simultaneously optimize multiple objectives. This means that usually there is not one optimal design but an entire set of Pareto-optimal ones with optimal tradeoffs in the objectives. In many applications, evaluating one design is expensive; thus, an exhaustive search for the Pareto-optimal set is infeasible. To address this challenge, we propose the Pareto Active Learning (PAL) algorithm, which intelligently samples the design space to predict the Pareto-optimal set.
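For reference, the Pareto-optimal set over an already fully evaluated design pool (the exhaustive computation PAL is designed to avoid) is just a non-domination filter; a minimal sketch, assuming both objectives are minimized and using made-up data:

```python
import numpy as np

def pareto_optimal(Y):
    # Y: (n_designs, n_objectives), all objectives to be minimized.
    # A design is Pareto-optimal if no other design is at least as good
    # in every objective and strictly better in at least one.
    n = Y.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(Y, i, axis=0)
        dominated = np.any(np.all(others <= Y[i], axis=1) &
                           np.any(others < Y[i], axis=1))
        keep[i] = not dominated
    return keep

Y = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = pareto_optimal(Y)   # design [3, 3] is dominated by [2, 2]
```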
Key features of PAL include (1) modeling the objectives as samples from a Gaussian process distribution to capture structure and accommodate noisy evaluation; (2) a method to carefully choose the next design to evaluate to maximize progress; and (3) the ability to control prediction accuracy and sampling cost. We provide theoretical bounds on PAL's sampling cost required to achieve a desired accuracy. Further, we show an experimental evaluation on three real-world data sets. The results show PAL's effectiveness; in particular it improves significantly over a state-of-the-art evolutionary algorithm, saving in many cases about 33%. }} ==== @inproceedings{quang-bazzani-13_unifying-manifold-regularization, title={A unifying framework for vector-valued manifold regularization and multi-view learning}, author={Quang, Minh H and Bazzani, Loris and Murino, Vittorio}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={100--108}, year={2013}, annote = { ABSTRACT: This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) formulation for the problem of learning an unknown functional dependency between a structured input space and a structured output space, in the Semi-Supervised Learning setting. Our formulation includes as special cases Vector-valued Manifold Regularization and Multi-view Learning, thus providing in particular a unifying framework linking these two important learning approaches. In the case of the least squares loss function, we provide a closed-form solution with an efficient implementation. Numerical experiments on challenging multi-class categorization problems show that our multi-view learning formulation achieves results which are comparable with the state of the art and are significantly better than single-view learning.
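The flavour of the closed-form least-squares solution can already be seen in the scalar kernel ridge regression special case, where the coefficients are alpha = (K + lam*I)^-1 y; this sketch shows only that special case, not the paper's vector-valued multi-view formulation:

```python
import numpy as np

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * d2)
    # Closed-form coefficients of regularized kernel least squares:
    # alpha = (K + lam * I)^-1 y.
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha, K

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
alpha, K = kernel_ridge_fit(X, y)
pred = K @ alpha   # in-sample predictions, shrunk toward y
```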
}} ====ICML TOP CITES @article{leeH-grosse-ng-11_unsupervised-hierarchical-representation-deep-learning, title={Unsupervised learning of hierarchical representations with convolutional deep belief networks}, author={Lee, H. and Grosse, R. and Ranganath, R. and Ng, A.Y.}, journal={Communications of the ACM}, volume={54}, number={10}, pages={95--103}, year={2011}, annote = { [feature hierarchies are discovered via convolutional RBM with "max-pooling" to generate compact descriptors.] based on ICML-09 paper (315 cites): Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations ABSTRACT: There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks (DBNs); however, scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model that scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique that shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
}} @inproceedings{duchi-chandra-T-08_projections-onto-L1-ball-for-high-dimensions, title={Efficient projections onto the $\ell_1$-ball for learning in high dimensions}, author={Duchi, John and Shalev-Shwartz, Shai and Singer, Yoram and Chandra, Tushar}, booktitle={Proceedings of the 25th international conference on Machine learning}, pages={272--279}, year={2008}, annote = { ABSTRACT: We describe efficient algorithms for projecting a vector onto the $\ell_1$-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the $\ell_1$-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with $\ell_1$ projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity. 1. Introduction: A prevalent machine learning approach for decision and prediction problems is to cast the learning task as penalized convex optimization. In penalized convex optimization we seek a set of parameters, gathered together in a vector w, which minimizes a convex objective function in w with an additional penalty term that assesses the complexity of w. Two commonly used penalties are the 1-norm and the square of the 2-norm of w. An alternative but mathematically equivalent approach is to cast the problem as a constrained optimization problem.
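The exact projection described in the abstract amounts to soft-thresholding at a data-dependent level theta; the sketch below is the standard O(n log n) sort-based variant, whereas the paper's first method achieves O(n) expected time by replacing the sort with randomized pivoting:

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    # Euclidean projection of v onto the l1-ball of radius z,
    # via sorting (the paper's O(n) method avoids the full sort).
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
    cssv = np.cumsum(u)
    # Largest index rho with u_rho > (cssv_rho - z) / rho (1-indexed).
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cssv - z))[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)
    # Soft-threshold each coordinate by theta, keeping signs.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

w = project_l1_ball(np.array([0.5, -1.0, 2.0]), z=1.0)
```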
In this setting we seek a minimizer of the objective function while constraining the solution to have a bounded norm. Many recent advances in statistical machine learning and related fields can be explained as convex optimization subject to a 1-norm constraint on the vector of parameters w. Imposing an $\ell_1$ constraint leads to notable benefits. First, it encourages sparse solutions, i.e., a solution for which many components of w are zero. When the original dimension of w is very high, a sparse solution enables easier interpretation of the problem in a lower dimension space. For the usage of the $\ell_1$-based approach in statistical machine learning see for example (Tibshirani, 1996) and the references therein. Donoho (2006b) provided sufficient conditions for obtaining an optimal $\ell_1$-norm solution which is sparse. Recent work on compressed sensing (Candes, 2006; Donoho, 2006a) further explores how $\ell_1$ constraints can be used for recovering a sparse signal sampled below the Nyquist rate. The second motivation for using $\ell_1$ constraints in machine learning problems is that in some cases it leads to improved generalization bounds. For example, Ng (2004) examined the task of PAC learning a sparse predictor and analyzed cases in which an $\ell_1$ constraint results in better solutions than an $\ell_2$ constraint. }} @inproceedings{mairal-bach-09icml_online-dictionary-learning-sparse-coding, title={Online dictionary learning for sparse coding}, author={Mairal, Julien and Bach, Francis and Ponce, Jean and Sapiro, Guillermo}, booktitle={Proceedings of the 26th Annual International Conference on Machine Learning (ICML-09)}, year={2009} } @inproceedings{vincent-larochelle-bengio-08icml_robust-features-w-denoising-autoencoders, title={Extracting and composing robust features with denoising autoencoders}, author={Vincent, Pascal and Larochelle, Hugo and Bengio, Yoshua and Manzagol, Pierre-Antoine}, booktitle={Proceedings of the 25th international conference on Machine learning}, pages={1096--1103}, year={2008}, annote = { ABSTRACT: Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations.
We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite. }} @article{wang-komodakis-paragios_13-markov-random-field-modeling-Comp-vision, title={Markov Random Field modeling, inference \& learning in computer vision \& image understanding: A survey}, author={Wang, Chaohui and Komodakis, Nikos and Paragios, Nikos}, journal={Computer Vision and Image Understanding}, volume={117}, number={11}, pages={1610--1627}, year={2013}, annote = { ABSTRACT: In this paper, we present a comprehensive survey of Markov Random Fields (MRFs) in computer vision and image understanding, with respect to the modeling, the inference and the learning. While MRFs were introduced into the computer vision field about two decades ago, they started to become a ubiquitous tool for solving visual perception problems around the turn of the millennium following the emergence of efficient inference methods. During the past decade, a variety of MRF models as well as inference and learning methods have been developed for addressing numerous low, mid and high-level vision problems. While most of the literature concerns pairwise MRFs, in recent years we have also witnessed significant progress in higher-order MRFs, which substantially enhances the expressiveness of graph-based models and expands the domain of solvable problems. This survey provides a compact and informative summary of the major literature in this research topic. 
--- Mathematically, let D denote the observed data and x a latent parameter vector that corresponds to a mathematical answer to the visual perception problem. Visual perception can then be formulated as finding a mapping from D to x, which is essentially an inverse problem [1]. Mathematical methods usually model such a mapping through an optimization problem as follows: $x_{opt} = \arg\min_x E(x, D; w)$, where the energy (or cost, objective) function E(x, D; w) can be regarded as a quality measure of a parameter configuration x in the solution space given the observed data D, and w denotes the model parameters. Hence, visual perception involves three main tasks: modeling, inference and learning. The modeling has to accomplish: (i) the choice of an appropriate representation of the solution using a tuple of variables x; and (ii) the design of the class of energy functions E(x, D; w) which can correctly measure the connection between x and D. The inference has to search for the configuration of x leading to the optimum of the energy function, which corresponds to the solution of the original problem. The learning aims to select the optimal model parameters w based on the training data. The main difficulty in the modeling lies in the fact that most of the vision problems are inverse, ill-posed and require a large number of latent and/or observed variables to express the expected variations of the perception answer. Furthermore, the observed signals are usually noisy, incomplete and often only provide a partial view of the desired space. Hence, a successful model usually requires a reasonable regularization, a robust data measure, and a compact structure between the variables of interest to adequately characterize their relationship (which is usually unknown).
In the Bayesian paradigm, the model prior, the data likelihood and the dependence properties correspond respectively to these terms, and the maximization of the posterior probability of the latent variables corresponds to the minimization of the energy function in Eq. (1). Probabilistic graphical models (usually referred to as graphical models) combine probability theory and graph theory towards a natural and powerful formalism for modeling and solving inference and estimation problems in various scientific and engineering fields. In particular, one important type of graphical models – Markov Random Fields (MRFs) – has become a ubiquitous methodology for solving visual perception problems, in terms of both the expressive potential of the modeling process and the optimality properties of the corresponding inference algorithm, due to their ability to model soft contextual constraints between variables and the significant development of inference methods for such models. Generally speaking, MRFs have the following major useful properties that one can benefit from during the algorithm design. First, MRFs provide a modular, flexible and principled way to combine regularization (or prior), data likelihood terms and other useful cues within a single graph-formulation, where continuous and discrete variables can be simultaneously considered. Second, the graph theoretic side of MRFs provides a simple way to visualize the structure of a model and facilitates the choice and the design of the model. Third, the factorization of the joint probability over a graph could lead to inference problems that can be solved in a computationally efficient manner. In particular, development of inference methods based on discrete optimization enhances the potential of discrete MRFs and significantly enlarges the set of visual perception problems to which MRFs can be applied. 
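As a toy instance of the energy-minimization formulation above, a 1-D chain MRF with quadratic data terms and a Potts smoothness prior can be locally minimized by iterated conditional modes (ICM); this example is illustrative only and is not drawn from the survey:

```python
import numpy as np

def icm_denoise(D, labels, w=1.0, iters=10):
    # Locally minimize E(x; D) = sum_i (x_i - D_i)^2
    #                          + w * sum_i [x_i != x_(i+1)]
    # by iterated conditional modes: greedily re-label each site.
    x = np.array([min(labels, key=lambda l: (l - d) ** 2) for d in D], float)
    n = len(x)
    for _ in range(iters):
        for i in range(n):
            def local_energy(l):
                e = (l - D[i]) ** 2          # data term
                if i > 0:
                    e += w * (l != x[i - 1])  # Potts smoothness, left
                if i < n - 1:
                    e += w * (l != x[i + 1])  # Potts smoothness, right
                return e
            x[i] = min(labels, key=local_energy)
    return x

D = np.array([0.0, 0.1, 0.9, 0.0, 0.0])   # noisy observations
x = icm_denoise(D, labels=[0.0, 1.0], w=1.0)
```

With this smoothness weight, the isolated 0.9 observation is smoothed away and all sites are labelled 0.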
Last but not least, the probabilistic side of MRFs gives rise to potential advantages in terms of parameter learning (e.g., [2], [3], [4] and [5]) and uncertainty analysis (e.g., [6] and [7]) over classic variational methods [8] and [9], due to the introduction of probabilistic explanation to the solution [1]. }} @article{sagha-chavarriaga-13-prl_online-anomaly-in-classifier-ensembles, title={On-line anomaly detection and resilience in classifier ensembles}, author={Sagha, Hesam and Bayati, Hamidreza and Mill{\'a}n, Jos{\'e} del R and Chavarriaga, Ricardo}, journal={Pattern Recognition Letters}, year={2013}, publisher={North-Holland}, annote = {++} } @article{bilen-namboodri-vanGool-13_object-action-classify-latent-windows, title={Object and Action Classification with Latent Window Parameters}, author={Bilen, Hakan and Namboodiri, Vinay P and Van Gool, Luc J}, journal={International Journal of Computer Vision}, pages={1--15}, year={2013}, annote = { Use Crop and Split operations to identify rectangles in the image where salient info about activity may lie. These are learnt as latent variables. ABSTRACT: We propose a generic framework to incorporate unobserved auxiliary information for classifying objects and actions. This framework allows us to automatically select a bounding box and its quadrants from which best to extract features. These spatial subdivisions are learnt as latent variables. The paper is an extended version of our earlier work Bilen et al. (Proceedings of The British Machine Vision Conference, 2011), complemented with additional ideas, experiments and analysis. We approach the classification problem in a discriminative setting, as learning a max-margin classifier that infers the class label along with the latent variables.
Through this paper we make the following contributions: (a) we provide a method for incorporating latent variables into object and action classification; (b) these variables determine the relative focus on foreground versus background information that is taken account of; (c) we design an objective function to more effectively learn in unbalanced data sets; (d) we learn a better classifier by iterative expansion of the latent parameter space. We demonstrate the performance of our approach through experimental evaluation on a number of standard object and action recognition data sets. }} @article{liang-jordan-klein-13_learning-dependency-based-semantics, title={Learning dependency-based compositional semantics}, author={Liang, Percy and Jordan, Michael I and Klein, Dan}, journal={Computational Linguistics}, volume={39}, number={2}, pages={389--446}, year={2013}, annote = { ABSTRACT: Suppose we want to build a system that answers a natural language question by representing its semantics as a logical form and computing the answer given a structured database of facts. The core part of such a system is the semantic parser that maps questions to logical forms. Semantic parsers are typically trained from examples of questions annotated with their target logical forms, but this type of annotation is expensive. Our goal is to learn a semantic parser from question-answer pairs instead, where the logical form is modeled as a latent variable. Motivated by this challenging learning problem, we develop a new semantic formalism, dependency-based compositional semantics (DCS), which has favorable linguistic, statistical, and computational properties. We define a log-linear distribution over DCS logical forms and estimate the parameters using a simple procedure that alternates between beam search and numerical optimization. On two standard semantic parsing benchmarks, our system outperforms all existing state-of-the-art systems, despite using no annotated logical forms.
}} @article{luG-kudo-toyama-12_temporal-segmentation-actions-in-video, title={Temporal Segmentation and Assignment of Successive Actions in a Long-Term Video}, author={Lu, Guoliang and Kudo, Mineichi and Toyama, Jun}, journal={Pattern Recognition Letters}, year={2012}, annote = { ABSTRACT: We exploit a novel learning-based framework for temporal segmentation of successive actions in a long-term video. Given a video sequence, only a few characteristic frames are selected by the proposed selection algorithm; the likelihood to trained models is then calculated in a pair-wise way, and finally segmentation is obtained as the optimal model sequence realizing the maximum likelihood. The average accuracy on the IXMAS dataset reached 80.5% at the frame level, using only 16.5% of all frames, with a computation time of 1.57 s per video (1160 frames on average). }} @inproceedings{burghouts-hove-13_action-recog-multiple-views-bag-of-words, title={Improved action recognition by combining multiple 2D views in the Bag-of-Words model}, author={Burghouts, Gertjan and Eendebak, Pieter and Bouma, Henri and ten Hove, Johan-Martijn}, booktitle={Advanced Video and Signal Based Surveillance (AVSS), 2013 10th IEEE International Conference on}, pages={250--255}, year={2013}, annote = { ABSTRACT: Action recognition is a hard problem due to the many degrees of freedom of the human body and the movement of its limbs. This is especially hard when only one camera viewpoint is available and when actions involve subtle movements. For instance, when looked at from the side, checking one's watch may look very similar to crossing one's arms. In this paper, we investigate how much the recognition can be improved when multiple views are available. The novelty is that we explore various combination schemes within the robust and simple bag-of-words (BoW) framework, from early fusion of features to late fusion of multiple classifiers.
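The two ends of the fusion spectrum just mentioned can be written down in a few lines; the histograms and per-view weight vectors below are made up for illustration, and a real pipeline would of course train the classifiers:

```python
import numpy as np

# Hypothetical two-view BoW histograms for one clip (already normalized).
h_view1 = np.array([0.2, 0.5, 0.3])
h_view2 = np.array([0.6, 0.1, 0.3])

# Early fusion: concatenate per-view features, feed one classifier.
h_early = np.concatenate([h_view1, h_view2])

# Late fusion: score with a per-view linear classifier, average scores.
w1 = np.array([1.0, -1.0, 0.5])   # toy per-view weight vectors
w2 = np.array([0.5, 2.0, -1.0])
score_late = 0.5 * (h_view1 @ w1 + h_view2 @ w2)
```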
In new experiments on the publicly available IXMAS dataset, we learn that action recognition can be improved significantly already by only adding one viewpoint. We demonstrate that the state-of-the-art on this dataset can be improved by 5% - achieving 96.4% accuracy - when multiple views are combined. Cross-view invariance of the BoW pipeline can be improved by 32% with intermediate-level fusion. }} @article{wang-gould-roller-13_discriminative-learning-cluttered-indoor-scenes-w-latent-var, title={Discriminative learning with latent variables for cluttered indoor scene understanding}, author={Wang, Huayan and Gould, Stephen and Koller, Daphne}, journal={Communications of the ACM}, volume={56}, number={4}, pages={92--99}, year={2013}, annote = { original: ECCV-12 Stephen Gould - PhD Stanford } ====IJCAI 13 @article{hadjinikolis-modgil-13-ijc_opponent-modeling-dialogues, title={Opponent Modelling in Persuasion Dialogues}, author={Hadjinikolis, Christos and Yiannis Siantos and Sanjay Modgil and Elizabeth Black and Peter McBurney}, annote = { ABSTRACT: A strategy is used by a participant in a persuasion dialogue to select locutions most likely to achieve its objective of persuading its opponent. Such strategies often assume that the participant has a model of its opponents, which may be constructed on the basis of a participant's accumulated dialogue experience. However, in most cases the fact that an agent's experience may encode additional information which, if appropriately used, could increase a strategy's efficiency, is neglected. In this work, we rely on an agent's experience to define a mechanism for augmenting an opponent model with information likely to be dialectically related to information already contained in it. Precise computation of this likelihood is exponential in the volume of related information. We thus describe and evaluate an approximate approach for computing these likelihoods based on Monte-Carlo simulation.
}} ==== @article{gongD-zhao-medioni-12icml-multiple-manifold-structure-learning, title={Robust Multiple Manifolds Structure Learning}, author={Gong, D. and Zhao, X. and Medioni, G.}, journal={ICML-12}, year={2012}, annote = { Combines local manifold construction and merges the manifolds obtained based on a new curvature-level similarity measure. A terrific idea. ??project? is code available? ABSTRACT: We present a robust multiple manifold structure learning (RMMSL) scheme to robustly estimate data structures under the multiple low intrinsic dimensional manifolds assumption. In the local learning stage, RMMSL efficiently estimates the local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on the local structure learning results. The proposed clustering method is designed to get the flattest manifold clusters by introducing a novel curved-level similarity function. Our approach is evaluated and compared to state-of-the-art methods on synthetic data, handwritten digit images, human motion capture data and motorbike videos. We demonstrate the effectiveness of the proposed approach, which yields higher clustering accuracy, and produces promising results for the challenging tasks of human motion segmentation and motion flow learning from videos.
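The local learning stage (tangent-space estimation) can be approximated with plain PCA over k-nearest neighbourhoods; this is a simplified stand-in for the paper's weighted low-rank matrix factorization, and the function name is invented:

```python
import numpy as np

def local_tangent(X, i, k=8, d=1):
    # Estimate the d-dim tangent space at point i from its k nearest
    # neighbours via PCA (simplified stand-in for the paper's weighted
    # low-rank factorization).
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:d]          # rows span the estimated tangent space

# Noisy samples along a line in the plane: tangent should be ~(1, 0).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = np.stack([t, 0.01 * rng.standard_normal(40)], axis=1)
T = local_tangent(X, i=20)
```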
ICML site has discussion+video http://icml.cc/discuss/2012/191.html }} @InProceedings{boots-gordon-12icml_two-manifold-merging-from-separate-views, author = {Byron Boots and Geoff Gordon}, title = {Two-Manifold Problems with Applications to Nonlinear System Identification}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {623--630}, url = {http://arxiv.org/abs/1206.4648}, annote = { ABSTRACT: Recently, there has been much interest in spectral approaches to learning manifolds—so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data. discussion+video on ICML site. 
http://icml.cc/discuss/2012/338.html }} @InProceedings{varoquaux-gramfort-12icml_small-sample-fmri-spatial-clustering, author = {Gael Varoquaux and Alexandre Gramfort and Bertrand Thirion}, title = {Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {1375--1382}, annote = { ABSTRACT: Functional neuroimaging can measure the brain’s response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and hierarchical clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two publicly available fMRI datasets. discussion+video: http://icml.cc/discuss/2012/688.html }} @InProceedings{jawanpuria-nath-12icml_convex-feature-learning-for-latent-task-structure, author = {Pratik Jawanpuria and J. 
Saketha Nath}, title = {A Convex Feature Learning Formulation for Latent Task Structure Discovery}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages = {137--144}, annote = { ABSTRACT: This paper considers the multi-task learning problem in the setting where some relevant features could be shared across a few related tasks. Most of the existing methods assume the extent to which the given tasks are related or share a common feature space to be known a priori. In real-world applications, however, it is desirable to automatically discover the groups of related tasks that share a feature space. In this paper we aim at searching the exponentially large space of all possible groups of tasks that may share a feature space. The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers a few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group. The regularizer encodes an important structure among the groups of tasks leading to an efficient algorithm for solving it: if there is no feature space under which a group of tasks has close-by task parameters, then there does not exist such a feature space for any of its supersets. An efficient active set algorithm that exploits this simplification and performs a clever search in the exponentially large space is presented. The algorithm is guaranteed to solve the proposed formulation (within some precision) in a time polynomial in the number of groups of related tasks discovered. Empirical results on benchmark datasets show that the proposed formulation achieves good generalization and outperforms state-of-the-art multi-task learning algorithms in some cases.
video: http://icml.cc/discuss/2012/90.html pratik.j, saketh@cse.iitb.ac.in }} @InProceedings{takeda-mitsugi-kanamori-12-icml_unified-robust-classification, author = {Akiko Takeda and Hiroyuki Mitsugi and Takafumi Kanamori}, title = {A Unified Robust Classification Model}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {129--136}, annote = { good review of supervised classification + combine main algorithms ABSTRACT: A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this paper is to provide a unified classification model that includes the above models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVM become applicable to MPM and FDA, and vice versa. Another benefit is to provide theoretical results to above learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and propose a non-convex optimization algorithm that can be applied to non-convex variants of existing learning methods. 
discussion+video : http://icml.cc/discuss/2012/87.html }} ====SUPERVISED LEARNING @inproceedings{maX-luoP-11ijc_combining-supervised-unsupervised-via-embedding, title={Combining supervised and unsupervised models via unconstrained probabilistic embedding}, author={Xudong Ma and Ping Luo and Fuzhen Zhuang and Qing He and Zhongzhi Shi and Zhiyong Shen}, booktitle={Proceedings of the 22nd IJCAI, Volume Two}, pages={1396--1401}, year={2011}, annote = { Unsupervised learning is used to improve learning based on several (conflicting) supervised learners. Given a data set, an ensemble categorization system traditionally consists of several supervised learners, each of which assigns some class ID; in the case of conflicts one may use voting etc. In this work, they additionally use several unsupervised clustering algorithms, each of which creates somewhat different clusters. The idea in this ensemble of learners is that items in the same unsupervised cluster should, as far as possible, belong to the same final class; this is used to tune the result of the supervised learning. ABSTRACT: Ensemble learning with output from multiple supervised and unsupervised models aims to improve the classification accuracy of a supervised model ensemble by jointly considering the grouping results from unsupervised models. In this paper we cast this ensemble task as an unconstrained probabilistic embedding problem. Specifically, we assume both objects and classes/clusters have latent coordinates without constraints in a D-dimensional Euclidean space, and consider the mapping from the embedded space into the space of results from supervised and unsupervised models as a probabilistic generative process. The prediction of an object is then determined by the distances between the object and the classes in the embedded space. A solution of this embedding can be obtained using the quasi-Newton method, resulting in the objects and classes/clusters with high co-occurrence weights being embedded close.
We demonstrate the benefits of this unconstrained embedding method by three real applications. }} @inproceedings{xiaoY-liuB-11ijc_similarity-based-positive-and-unlabeled-learning, title={Similarity-based approach for positive and unlabelled learning}, author={Yanshan Xiao and Bo Liu and Jie Yin and Longbing Cao and Chengqi Zhang and Zhifeng Hao}, booktitle={Proceedings of the 22nd IJCAI Volume Two}, pages={1577--1582}, year={2011}, abstract = { Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only the positive examples and the unlabelled examples are available. Most of the previous works focus on identifying some negative examples from the unlabelled data, so that the supervised learning methods can be applied to build a classifier. However, for the remaining unlabelled data, which can not be explicitly identified as positive or negative (we call them ambiguous examples), they either exclude them from the training phase or simply enforce them to either class. Consequently, their performance may be constrained. This paper proposes a novel approach, called similarity-based PU learning (SPUL) method, by associating the ambiguous examples with two similarity weights, which indicate the similarity of an ambiguous example towards the positive class and the negative class, respectively. The local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity-weights are thereafter incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets have shown that SPUL outperforms state-of-the-art PU learning methods. 
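A toy sketch of the weighting idea (not the SPUL algorithm itself: SPUL derives both local and global weights and feeds them into a weighted SVM; the mean-cosine "global" weight and the data below are invented for illustration):

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def global_weight(x, positives):
    """Similarity of an ambiguous example toward the positive class,
    taken here as mean cosine similarity to the labelled positives
    (a stand-in for SPUL's global mechanism)."""
    return sum(cosine(x, p) for p in positives) / len(positives)

positives = [[1.0, 0.0], [0.9, 0.1]]   # labelled positive examples (made up)
ambiguous = [0.8, 0.2]                 # an unlabelled, ambiguous example
w_pos = global_weight(ambiguous, positives)
```

An analogous weight toward the negative class would use the identified negatives; the pair of weights then scales the ambiguous example's contribution to each side of the SVM training objective.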
}} @inproceedings{liYF-HuJH-12aaai_what-patterns-trigger-what-labels, title={Towards Discovering What Patterns Trigger What Labels}, author={Yu-Feng Li and Ju-Hua Hu and Yuan Jiang and Zhi-Hua Zhou}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, pages = {1012--1018}, annote = { Multiple labels are associated with each object, with many overlaps; perhaps a label relates to some subset of features in each object. How does one learn models of the categories from this? The authors formulate the problem as a convex optimization problem. ABSTRACT: In many real applications, especially those involving data objects with complicated semantics, it is generally desirable to discover the relation between patterns in the input space and labels corresponding to different semantics in the output space. This task becomes feasible with MIML (Multi-Instance Multi-Label learning), a recently developed learning framework, where each data object is represented by multiple instances and is allowed to be associated with multiple labels simultaneously. In this paper, we propose KISAR, an MIML algorithm that is able to discover what instances trigger what labels. By considering the fact that highly relevant labels usually share some patterns, we develop a convex optimization formulation and provide an alternating optimization solution. Experiments show that KISAR is able to discover reasonable relations between input patterns and output labels, and achieves performances that are highly competitive with many state-of-the-art MIML algorithms. }} @inproceedings{caragea-silvescu-mitra-12_hashing-and-abstraction-sparse-high-D-features, title={Combining Hashing and Abstraction in Sparse High Dimensional Feature Spaces}, author={Cornelia Caragea and Adrian Silvescu and Prasenjit Mitra}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, annote = { A popular approach to information retrieval from documents involves "bag of words"; with a large vocabulary this becomes extremely high-dimensional and computationally intractable. In this work, one applies hashing and agglomerative clustering to obtain a smaller set of sparse features. ABSTRACT: With the exponential increase in the number of documents available online, e.g., news articles, weblogs, scientific documents, the development of effective and efficient classification methods is needed. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The commonly used “bag of words” and n-gram representations can result in prohibitively high dimensional input spaces. Data mining algorithms applied to these input spaces may be intractable due to the large number of dimensions. Thus, dimensionality reduction algorithms that can process data into features fast at runtime, ideally in constant time per feature, are greatly needed in high throughput applications, where the number of features and data points can be in the order of millions. One promising line of research in dimensionality reduction is feature clustering. We propose to combine two types of feature clustering, namely hashing and abstraction based on hierarchical agglomerative clustering, in order to take advantage of the strengths of both techniques. Experimental results on two text data sets show that the combined approach uses a significantly smaller number of features and gives similar performance when compared with the “bag of words” and n-gram approaches.
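The hashing half of the combination is easy to sketch (a generic hashing-trick sketch, not the authors' code; the bucket count and hash function are arbitrary choices here):

```python
from collections import Counter
from zlib import crc32

def hashed_features(tokens, n_buckets=1024):
    """Map a bag of words into a fixed-size count vector via the hashing
    trick: each token is hashed to one of n_buckets indices, so the
    feature space stays bounded regardless of vocabulary size and each
    feature is computed in constant time. Collisions simply add up."""
    vec = [0] * n_buckets
    for tok, count in Counter(tokens).items():
        vec[crc32(tok.encode("utf-8")) % n_buckets] += count
    return vec

doc = "the quick brown fox jumps over the lazy dog the end".split()
v = hashed_features(doc)   # total mass equals the number of tokens
```

The paper's abstraction step would then run hierarchical agglomerative clustering over these (far fewer) hashed features rather than over the raw vocabulary.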
}} @InProceedings{tangY-salakhutdinov-hinton-12icml_deep-lambertian-albedo-learning, author = {Yichuan Tang and Ruslan Salakhutdinov and Geoffrey Hinton}, title = {Deep Lambertian Networks}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {1623--1630}, annote = { ABSTRACT: Visual perception is a challenging problem in part due to illumination variations. A possible solution is to first estimate an illumination invariant representation before using it for recognition. The object albedo and surface normals are examples of such representation. In this paper, we introduce a multilayer generative model where the latent variables include the albedo, surface normals, and the light source. Combining Deep Belief Nets with the Lambertian reflectance assumption, our model can learn good priors over the albedo from 2D images. Illumination variations can be explained by changing only the lighting latent variable in our model. By transferring learned knowledge from similar objects, albedo and surface normals estimation from a single image is possible in our model. Experiments demonstrate that our model is able to generalize as well as improve over standard baselines in one-shot face recognition. discussion+video: http://icml.cc/discuss/2012/791.html }} @inproceedings{levy-markovitch-12_machine-learning-from-metaphor, title={Teaching Machines to Learn by Metaphors}, author={Omer Levy and Shaul Markovitch}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, annote = { ABSTRACT Humans have an uncanny ability to learn new concepts with very few examples. Cognitive theories have suggested that this is done by utilizing prior experience of related tasks. 
We propose to emulate this process in machines, by transforming new problems into old ones. These transformations are called metaphors. Obviously, the learner is not given a metaphor, but must acquire one through a learning process. We show that learning metaphors yields better results than existing transfer learning methods. Moreover, we argue that metaphors give a qualitative assessment of task relatedness. }} @phdthesis{frank-13_bayesian-models-of-syntactic-category-acquisition, title={Bayesian models of syntactic category acquisition}, author={Frank, Stella Christina}, year={2013}, school={The University of Edinburgh}, annote = { Both unsupervised morphological analysis and POS-tagging. Including the sentence type improves performance. Uses the EVE corpus from CHILDES. POS TAGGING: models with local context (MORPHTAG, MORPHTAGNOSEG, BHMM) do dramatically better than models clustering words using only morphological information. The Pitman-Yor model of data statistics (MORPHTAGNOSEG) does slightly better than the Dirichlet-multinomial (BHMM). MORPHOLOGY: 3 evaluation measures: tagVM, suffixVM, EMMA [either over-segmentation or misses; harder to evaluate]. Using SuffixVM or EMMA to evaluate morphological segmentation performance, MORPHTAGTRUETAGS outperforms the others, esp. those without local syntactic constraints. Expts in Spanish are similar. }} ====??? Smita Sirker; Can we infer the non-Observable Mind without Language? Language: English; Subject: Philosophy; Issue: 1/2009; Page Range: 129-134. Summary: We know our minds through introspection and others through inference. The occult perception of one’s “mind” is dependent on the “mental activity”; dependent on the “awareness of one’s mental states” itself. One finds difficulty in separating the distinct roles of inference and perception in the case of self-knowledge.
The life of philosophers, brain scientists and of course the ordinary folks sails through the stormy debate concerning whether “mind and its states exist” quite peacefully. The discourse between the philosopher and the brain scientist; the philosopher and the ordinary folk; the brain scientist and ordinary folk presupposes that our minds exist and we share our thoughts and doubts through our ordinary language, which in a big way helps us in the inference of “other minds”. This brief article explores the role of ordinary language in our discourse to discover the enigma of the “mind”. Keywords: Descartes’ myth; introspection; ordinary language; mental activity; inference of mind; privileged access; phenomenal experience. }} ====ROBOTICS @article{fangY-liuX-zhangX-12_adaptive-visual-servoing-nonholonomic, title={Adaptive active visual servoing of nonholonomic mobile robots}, author={Fang, Yongchun and Liu, Xi and Zhang, Xuebo}, journal={Industrial Electronics, IEEE Transactions on}, volume={59}, number={1}, pages={486--497}, year={2012}, publisher={IEEE}, annote = { ABSTRACT: This paper presents a novel two-level scheme for adaptive active visual servoing of a mobile robot equipped with a pan camera. In the lower level, the pan platform carrying an onboard camera is controlled to keep the reference points lying around the center of the image plane. On the higher level, a switched controller is utilized to drive the mobile robot to reach the desired configuration through image feature feedback. The designed active visual servoing system presents such advantages as follows: 1) a satisfactory solution for the field-of-view problem; 2) global high servoing efficiency; and 3) free of any complex pose estimation algorithm usually required for visual servoing systems. The performance of the active visual servoing system is proven by rigorous mathematical analysis.
Both simulation and experimental results are provided to validate the effectiveness of the proposed active visual servoing method. }} ==== @inproceedings{jia-darrell-13_latent-task-adaptation-w-hierarchies, title={Latent Task Adaptation with Large-scale Hierarchies}, author={Jia, Yangqing and Darrell, Trevor}, booktitle={The IEEE International Conference on Computer Vision (ICCV)}, year={2013} } @inproceedings{azary-savakis-13cvpr_grassmannian-sparse-representation-3D-actions, title={Grassmannian Sparse Representations and Motion Depth Surfaces for 3D Action Recognition}, author={Azary, Sherif and Savakis, Andreas}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on}, pages={492--499}, year={2013}, annote = { ABSTRACT: Manifold learning has been effectively used in computer vision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited for computer vision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss L1-norm minimization for optimal classification. We further introduce a new descriptor that we term Motion Depth Surface (MDS) and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
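The basic manifold ingredient can be made concrete: a point on the Grassmann manifold is a subspace, and distances come from principal angles. A minimal numpy sketch (illustrative only; the paper's GSR additionally solves an L1 sparse-coding problem over such subspaces fitted to action videos):

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between span(A) and span(B) on the Grassmann
    manifold, via principal angles. A and B are matrices whose columns
    span the two subspaces (toy inputs; real uses would fit them to
    image or depth sequences)."""
    Qa, _ = np.linalg.qr(A)          # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles
    return float(np.linalg.norm(theta))

# Identical subspaces are at distance ~0; orthogonal 2-planes in R^4 are not.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
d = grassmann_distance(A, B)
```

For two fully orthogonal 2-planes both principal angles are pi/2, so d equals sqrt(2)*pi/2.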
}} @inproceedings{vemulapalli-chellappa-13cvpr_kernel-learning-manifolds, title={Kernel learning for extrinsic classification of manifold features}, author={Vemulapalli, Raviteja and Pillai, Jaishanker K and Chellappa, Rama}, booktitle={Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on}, pages={1782--1789}, year={2013}, annote={ ABSTRACT: In computer vision applications, features often lie on Riemannian manifolds with known geometry. Popular learning algorithms such as discriminant analysis, partial least squares, support vector machines, etc., are not directly applicable to such features due to the non-Euclidean nature of the underlying spaces. Hence, classification is often performed in an extrinsic manner by mapping the manifolds to Euclidean spaces using kernels. However, for kernel based approaches, poor choice of kernel often results in reduced performance. In this paper, we address the issue of kernel selection for the classification of features that lie on Riemannian manifolds using the kernel learning approach. We propose two criteria for jointly learning the kernel and the classifier using a single optimization problem. Specifically, for the SVM classifier, we formulate the problem of learning a good kernel-classifier combination as a convex optimization problem and solve it efficiently following the multiple kernel learning approach. Experimental results on image set-based classification and activity recognition clearly demonstrate the superiority of the proposed approach over existing methods for classification of manifold features. 
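The object being learned here is a combination of base kernels. A sketch of the fixed-weight baseline (the paper's contribution is to learn the weights jointly with the SVM via a convex problem; in this sketch the weights are simply given and the data are random):

```python
import numpy as np

def combined_kernel(kernels, weights):
    """Convex combination of base kernel (Gram) matrices; any such
    combination of PSD kernels is again a valid PSD kernel."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or abs(w.sum() - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    return sum(wi * K for wi, K in zip(w, kernels))

X = np.random.default_rng(0).normal(size=(5, 3))     # 5 random points in R^3
K_lin = X @ X.T                                      # linear kernel
sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)    # pairwise squared dists
K_rbf = np.exp(-0.5 * sq)                            # Gaussian kernel
K = combined_kernel([K_lin, K_rbf], [0.3, 0.7])      # a valid kernel again
```

K could then be handed to any kernel machine that accepts a precomputed Gram matrix; the multiple-kernel-learning step of the paper replaces the hand-picked [0.3, 0.7] with learned weights.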
}} @article{harandi-sanderson-13prl_kernel-on-grassmann-manifold-for-action-recog, title={Kernel analysis on Grassmann manifolds for action recognition}, author={Harandi, Mehrtash T and Sanderson, Conrad and Shirazi, Sareh and Lovell, Brian C}, journal={Pattern Recognition Letters}, year={2013}, annote = { ABSTRACT: Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces are able to accommodate the effects of various image variations and can capture the dynamic properties of actions. Subspaces form a non-Euclidean and curved Riemannian manifold known as a Grassmann manifold. Inference on manifold spaces usually is achieved by embedding the manifolds in higher dimensional Euclidean spaces. In this paper, we instead propose to embed the Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To achieve efficient machinery, we propose graph-based local discriminant analysis that utilises within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy in comparison to several state-of-the-art methods, such as the kernel version of affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words and hierarchy of discriminative space-time neighbourhood features. }} UPDATE: @inproceedings{blasiak-rangwala-11ijc_hmm-variant-for-sequence-classification, title={A hidden Markov model variant for sequence classification}, author={Sam Blasiak and Huzefa Rangwala}, } @inproceedings{ciresan-meier-11ijc_convolutional-NN-for-image-classification, title={Flexible, high performance convolutional neural networks for image classification}, author={Dan C. Cire{\c{s}}an and Ueli Meier and Jonathan Masci and Luca Maria Gambardella and Jürgen Schmidhuber}, } @InProceedings{matuszek-fitzgerald-zettlemoyer-12icml_joint-language-and-perception-learning, } (review: ~vedant/cs365/hw2/report.pdf) @inproceedings{chambers-jurafsky-11_template-script-extraction-from-text, title = {Template-based information extraction without the templates}, author = {Nathanael Chambers and Dan Jurafsky}, } RELATED: ICML-12 abstract: Learning the Central Events and Participants in Unlabeled Text, Nathanael Chambers and Dan Jurafsky. @InProceedings{mnih-tehYW-12icml_neural-probabilistic-language-models, author = {Andriy Mnih and Yee Whye Teh}, title = {A fast and simple algorithm for training neural probabilistic language models}, } (rohitangsu das review: ~rohitdas/cs365/hw2/paper_cs365.pdf) @inproceedings{kalakrishnan-righetti-11iros_learning-force-control-policies-compliant-manipulation, title = {Learning force control policies for compliant manipulation}, author = {Mrinal Kalakrishnan and Ludovic Righetti and Peter Pastor and Stefan Schaal}, booktitle = {Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ}, pages = {4639--4644}, } @article{hinton-srivastava-12_NN-prevents-co-adaptation-of-feature-detectors, title={Improving neural networks by preventing co-adaptation of feature detectors}, author={Hinton, G.E. and Srivastava, N. and Krizhevsky, A. and Sutskever, I. and Salakhutdinov, R.R.}, journal={arXiv preprint arXiv:1207.0580}, year={2012}, } @article{silveira-malis-12_direct-visual-servoing-control-nonmetric, title={Direct visual servoing: Vision-based estimation and control using only nonmetric information}, author={Silveira, Geraldo and Malis, Ezio}, journal={Robotics, IEEE Transactions on}, volume={28}, number={4}, pages={974--980}, year={2012}, annote = { ABSTRACT: This paper addresses the problem of stabilizing a robot at a pose specified via a reference image.
Specifically, this paper focuses on six degrees-of-freedom visual servoing techniques that require neither metric information of the observed object nor precise camera and/or robot calibration parameters. Not requiring them improves the flexibility and robustness of servoing tasks. However, existing techniques within the focused class need prior knowledge of the object shape and/or of the camera motion. We present a new visual servoing technique that requires none of the aforementioned information. The proposed technique directly exploits 1) the projective parameters that relate the current image with the reference one and 2) the pixel intensities to obtain these parameters. The level of versatility and accuracy of servoing tasks are, thus, further improved. We also show that the proposed nonmetric scheme allows for path planning. In this way, the domain of convergence is greatly enlarged as well. Theoretical proofs and experimental results demonstrate that visual servoing can, indeed, be highly accurate and robust, despite unknown objects and imaging conditions. This naturally encompasses the cases of color images and illumination changes. this paper focuses on visual servoing techniques that do not require metric information of the observed target and can control all 6 DOF of a robot. The fact of not requiring metric information improves the flexibility and robustness of visual servoing tasks [6]. Indeed, recent studies in the domain of biological vision have suggested that the brain processes visual information nonmetrically [6]. Surprisingly, only few works have been conducted on the full 6 DOF nonmetric visual servoing. Moreover, these existing works require prior knowledge of the object shape and/or of the camera motion. [6] L. Thaler and M. A. Goodale, “Beyond distance and direction: The brain represents target locations non-metrically,” J. Vis., vol. 10, no. 3, pp. 1–27, 2010. 
}} @article{thaler-goodale-10-j-vision_beyond-distance-brain-non-metrically, title={Beyond distance and direction: The brain represents target locations non-metrically}, author={Thaler, Lore and Goodale, Melvyn A}, journal={Journal of Vision}, volume={10}, number={3}, year={2010}, publisher={Association for Research in Vision and Ophthalmology}, annote = { ABSTRACT: In their day-to-day activities human beings are constantly generating behavior, such as pointing, grasping or verbal reports, on the basis of visible target locations. The question arises how the brain represents target locations. One possibility is that the brain represents them metrically, i.e. in terms of distance and direction. Another equally plausible possibility is that the brain represents locations non-metrically, using for example ordered geometry or topology. Here we report two experiments that were designed to test if the brain represents locations metrically or non-metrically. We measured accuracy and variability of visually guided reach-to-point movements (Experiment 1) and probe-stimulus adjustments (Experiment 2). The specific procedure of informing subjects about the relevant response on each trial enabled us to dissociate the use of non-metric target location from the use of metric distance and direction in head/eye-centered, hand-centered and externally defined (allocentric) coordinates. The behavioral data show that subjects' responses are least variable when they can direct their response at a visible target location, the only condition that permitted the use of non-metric information about target location in our experiments. Data from Experiments 1 and 2 correspond well quantitatively. Response variability in non-metric conditions cannot be predicted based on response variability in metric conditions. We conclude that the brain uses non-metric geometrical structure to represent locations. 
}} @article{tahri-youcef-13ras_robust-visual-servoing-invariant, title={Robust image-based visual servoing using invariant visual information}, author={Tahri, Omar and Araujo, Helder and Chaumette, Fran{\c{c}}ois and Mezouar, Youcef}, journal={Robotics and Autonomous Systems}, volume={61}, number={12}, pages={1588--1600}, year={2013}, annote = { Catadioptric camera: a camera plus mirrors with a single optical center. A unified model for central imaging systems has been proposed in [9]. It consists in modeling the central imaging systems by two consecutive projections: spherical and then perspective ... [9] C. Geyer and K. Daniilidis. A Unifying Theory for Central Panoramic Systems and Practical Implications. In Computer Vision - ECCV 2000 (pp. 445-461). Springer Berlin Heidelberg. ABSTRACT: This paper deals with the use of invariant visual features for visual servoing. New features are proposed to control the 6 degrees of freedom of a robotic system with better linearizing properties and robustness to noise than the state of the art in image-based visual servoing. We show that by using these features the behavior of image-based visual servoing in task space can be significantly improved. Several experimental results are provided and validate our proposal. }} @article{candes-li-X-11-jacm_robust-PCA-noisy-matrix, title={Robust principal component analysis?}, author={Cand{\`e}s, Emmanuel J and Li, Xiaodong and Ma, Yi and Wright, John}, journal={Journal of the ACM (JACM)}, volume={58}, number={3}, pages={11}, year={2011}, annote = { ABSTRACT: This article is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually?
We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces. }} @article{vandenBerg-abbeel-11ijrr_lqg-mp-motion-planning-uncertainty, title={LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information}, author={Van Den Berg, Jur and Abbeel, Pieter and Goldberg, Ken}, journal={The International Journal of Robotics Research}, volume={30}, number={7}, pages={895--913}, year={2011}, publisher={SAGE Publications} } ==== @inproceedings{jurgens-stevens-11_impact-of-sense-similarity-on-WSD, title={Measuring the impact of sense similarity on word sense induction}, author={Jurgens, David and Stevens, Keith}, booktitle={Proceedings of the First Workshop on Unsupervised Learning in NLP}, pages={113--123}, year={2011}, annote = { ABSTRACT: We describe results of a word sense annotation task using WordNet, involving half a dozen well-trained annotators on ten polysemous words for three parts of speech. One hundred sentences for each word were annotated. 
Annotators had the same level of training and experience, but interannotator agreement (IA) varied across words. There was some effect of part of speech, with higher agreement on nouns and adjectives, but within the words for each part of speech there was wide variation. This variation in IA does not correlate with number of senses in the inventory, or the number of senses actually selected by annotators. In fact, IA was sometimes quite high for words with many senses. We claim that the IA variation is due to the word meanings, contexts of use, and individual differences among annotators. We find some correlation of IA with sense confusability as measured by a sense confusion threshold (CT). Data mining for association rules on a flattened data representation indicating each annotator’s sense choices identifies outliers for some words, and systematic differences among pairs of annotators on others. }} @inproceedings{goldwasser-roth-13acl_leveraging-domain-independent-semantics, author = {Dan Goldwasser and Dan Roth}, title = {Leveraging Domain-Independent Information in Semantic Parsing}, booktitle = {ACL}, year = {2013}, url = {http://cogcomp.cs.illinois.edu/papers/GoldwasserRoth13.pdf}, annote = { ABSTRACT: Semantic parsing is a domain-dependent process by nature, as its output is defined over a set of domain symbols. Motivated by the observation that interpretation can be decomposed into domain-dependent and independent components, we suggest a novel interpretation model, which augments a domain-dependent model with abstract information that can be shared by multiple domains. Our experiments show that this type of information is useful and can reduce the annotation effort significantly when moving between domains.
}}

==== ICCV

@inproceedings{ordonez-berg-13iccv_large-scale-image-entry-level-categories,
  title={From Large Scale Image Categorization to Entry-Level Categories},
  author={Ordonez, Vicente and Deng, Jia and Choi, Yejin and Berg, Alexander C and Berg, Tamara L},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2013},
  annote = { MARR PRIZE 2013 }}

@inproceedings{cinbis-verbeek-13iccv_segmentation-driven-object-detection,
  title={Segmentation Driven Object Detection with Fisher Vectors},
  author={Cinbis, Ramazan Gokberk and Verbeek, Jakob and Schmid, Cordelia and others},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2013},
  annote = { ABSTRACT: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results. }}

@inproceedings{faragasso-oriolo-13-icra_vision-based-humanoid-corridor-walk,
  title={Vision-based corridor navigation for humanoid robots},
  author={Faragasso, Angela and Oriolo, Giuseppe and Paolillo, Antonio and Vendittelli, Marilena},
  booktitle={Robotics and Automation (ICRA), 2013 IEEE International Conference on},
  pages={3190--3195},
  year={2013},
  organization={IEEE},
  annote = { Walks and turns (Nao) along a plain-wall artificial corridor environment.
ABSTRACT: We present a control-based approach for visual navigation of humanoid robots in office-like environments. In particular, the objective of the humanoid is to follow a maze of corridors, walking as close as possible to their center to maximize motion safety. Our control algorithm is inspired by a technique originally designed for unicycle robots and extended here to cope with the presence of turns and junctions. The feedback signals computed for the unicycle are transformed to inputs that are suited for the locomotion system of the humanoid, producing a natural, human-like behavior. Experimental results for the humanoid robot NAO are presented to show the validity of the approach, and in particular the successful extension of the controller to turns and junctions.

[6] J. M. Toibero, C. M. Soria, F. Roberti, R. Carelli, and P. Fiorini, “Switching visual servoing approach for stable corridor navigation,” in 14th International Conference on Advanced Robotics, pp. 1–6, 2009. }}

@inproceedings{vandenberg-lin-manocha-08_reciprocal-velocity-obstacles,
  title={Reciprocal velocity obstacles for real-time multi-agent navigation},
  author={Van den Berg, Jur and Lin, Ming and Manocha, Dinesh},
  booktitle={Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on},
  pages={1928--1935},
  year={2008},
  organization={IEEE},
  annote={ ABSTRACT: In this paper, we propose a new concept, the “Reciprocal Velocity Obstacle”, for real-time multi-agent navigation. We consider the case in which each agent navigates independently without explicit communication with other agents. Our formulation is an extension of the Velocity Obstacle concept [3], which was introduced for navigation among (passively) moving obstacles. Our approach takes into account the reactive behavior of the other agents by implicitly assuming that the other agents make a similar collision-avoidance reasoning. We show that this method guarantees safe and oscillation-free motions for each of the agents.
We apply our concept to navigation of hundreds of agents in densely populated environments containing both static and moving obstacles, and we show that real-time and scalable performance is achieved in such challenging scenarios. }}
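NOTE (annotation sketch, not from the abstract above): the key definition in the RVO paper can be stated in one line. Writing $VO^A_B(\mathbf{v}_B)$ for the ordinary velocity obstacle (the set of velocities for agent $A$ that lead to a collision with agent $B$ moving at velocity $\mathbf{v}_B$), the reciprocal velocity obstacle of $B$ for $A$ is

\[
RVO^A_B(\mathbf{v}_B, \mathbf{v}_A) = \left\{ \mathbf{v}'_A \;\middle|\; 2\mathbf{v}'_A - \mathbf{v}_A \in VO^A_B(\mathbf{v}_B) \right\}
\]

i.e., a new velocity $\mathbf{v}'_A$ is forbidden exactly when its average with the current velocity $\mathbf{v}_A$ lies inside the ordinary velocity obstacle, so each agent takes half the responsibility for avoidance; this symmetry is what removes the oscillations of the naive VO scheme.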