Homework 2 - Paper Review
In this homework, you have to review a paper from the list given below.
Paper selection is on a first-come-first-served basis.
You will have to submit a small bibtex annotation for your paper, and make a
poster presentation.
The bibTeX is due by Friday Feb 7.
The poster presentations will be on the morning
of Saturday Feb 8. Presentations will be in four batches of 15 each,
from 9:00 AM till 1:00 PM. Attendance is mandatory.
The list of papers is given below this writeup.
Clicking on each [pdf] link will get the .pdf.
Clicking on "bibTeX" will get the
bibliography details and in many cases, the abstract, and perhaps
some rudimentary annotation.
For part (a) you will have to write a review in the "annote" entry. Note that
the annote field can include LaTeX commands and can be shown by creating an
annotated bibliography. You can also link images, etc. Please put your name
at the bottom of your annote like this:
-- YOUR NAME, userid, year
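As a sketch, an entry with a filled-in annote might look like this (the key, fields, and review text below are placeholders, not a required format):

```bibtex
@article{youruserid-hw2-choice,
  title   = {Title of Your Chosen Paper},
  author  = {Author, First and Coauthor, Second},
  journal = {Journal or Conference Name},
  year    = {2013},
  annote  = {
    Your review goes here. The annote field may contain LaTeX commands,
    e.g. \emph{emphasis}, a formula such as $y = Wx + b$, or a linked
    image via \includegraphics.

    -- YOUR NAME, userid, year
  }
}
```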
What to write in your review (and in your poster)
Short paragraphs or a few lines on each of these:
- Describe the problem and why it is important. Don't say banal things like "understanding language is a critical task in AI".
- State the work that has been done before. In all cases, you will need to read up on a good bit of the background to understand these papers. What are the main "claims" of novelty in the paper?
- Describe the approach in not more than a paragraph or two. You should not put too many equations, but some key ideas and formulae should be stated (in LaTeX). In the presentation you should have a bit more detail, should people ask.
- Do the results justify the "claims" made? What are the assumptions used in actually doing the work? Do they weaken the claims?
- Is it a system that is likely to revolutionize AI, or just a small step? Is the code being made available? Or a useful dataset?
Selection based on early choice: Project groups
The first people to make their choices will get them. All choices should be made by Friday (tomorrow) evening. Latecomers may find many choices gone. Those who have not chosen by then will be assigned a paper at random. Project group formation will take some input from the papers you select, but will largely be random. Feel free to choose papers from areas that are not where you want to do a project.

Submission:
All submissions will be in yourarea/cs365/hw2/
a. A short writeup in the bibTeX annote format, giving your review of the paper, by FRIDAY FEB 7.
   Filename: youruserid.bib. Additionally, you may wish to upload the .pdf from your bibtex as youruserid-bib.pdf (just convert your annote into a .tex and compile it).
b. We will have a poster presentation on FEB 8, where each of you will make a brief presentation on your chosen paper. Please upload your posters BEFORE the session:
   Filename: youruserid-hw2.pdf

Dates:
- Selection: by Friday Jan 31
- 1-page bibTeX review : by Friday Feb 7
- Presentation at mini-workshop: Feb 8
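One possible way to produce the optional youruserid-bib.pdf from your annote (a minimal sketch; the wrapper document and filenames here are placeholders, and any standard LaTeX toolchain works):

```latex
\documentclass{article}
\begin{document}

\section*{Review: Title of Your Chosen Paper}

% Paste the body of your annote field here; it is ordinary LaTeX,
% so \emph{...}, equations such as $y = Wx + b$, and linked images
% via \includegraphics carry over unchanged.
Your review text goes here.

\medskip
\noindent -- YOUR NAME, userid, year

\end{document}
```

Compiling with, e.g., pdflatex youruserid-bib.tex then yields the PDF to upload alongside youruserid.bib.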
Papers Selected for Review
BibTeX Entries
@inproceedings{cox-pinto-11_beyond-simple-features-face-recog,
  title={Beyond simple features: A large-scale feature search approach to unconstrained face recognition},
  author={Cox, David and Pinto, Nicolas},
  booktitle={Automatic Face \& Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on},
  pages={8--15},
  year={2011},
  annote = { ABSTRACT: Many modern computer vision algorithms are built atop of a set of low-level feature operators (such as SIFT [1], [2]; HOG [3], [4]; or LBP [5], [6]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [7] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.
}}

@inproceedings{naseer-sturm-13_followme-person-following-quadcopter,
  title={Followme: Person following and gesture recognition with a quadrocopter},
  author={Naseer, Tayyab and Sturm, J{\"u}rgen and Cremers, Daniel},
  booktitle={Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on},
  pages={624--630},
  year={2013},
  organization={IEEE}
}

@inproceedings{kallman-mataric-04_motion-planning-dynamic-roadmap,
  title={Motion planning using dynamic roadmaps},
  author={Kallman, M and Mataric, Maja},
  booktitle={Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE International Conference on},
  volume={5},
  pages={4399--4404},
  year={2004},
  organization={IEEE}
}

@article{schapiro-rogers-13_neural-events-from-temporal-community-structure,
  title={Neural representations of events arise from temporal community structure},
  author={Schapiro, Anna C and Rogers, Timothy T and Cordova, Natalia I and Turk-Browne, Nicholas B and Botvinick, Matthew M},
  journal={Nature neuroscience},
  year={2013},
  annote = { Compares response-time and fMRI studies on human subjects with an ANN model that sequences the same input. Also compares GLM (General Linear Model) simulations on brain region models. Temporal sequences are shown to subjects, e.g. 15 rotated patterns then 15 straight; subjects are asked to segment the sequence by pressing a key. Expt 1: response-time based. Expt 3: fMRI.
ABSTRACT: Our experience of the world seems to divide naturally into discrete, temporally extended events, yet the mechanisms underlying the learning and identification of events are poorly understood. Research on event perception has focused on transient elevations in predictive uncertainty or surprise as the primary signal driving event segmentation. We present human behavioral and functional magnetic resonance imaging (fMRI) evidence in favor of a different account, in which event representations coalesce around clusters or "communities" of mutually predicting stimuli.
Through parsing behavior, fMRI adaptation and multivoxel pattern analysis, we demonstrate the emergence of event representations in a domain containing such community structure, but in which transition probabilities (the basis of uncertainty and surprise) are uniform. We present a computational account of how the relevant representations might arise, proposing a direct connection between event learning and the learning of semantic categories.
Expt 1: the sequence alternated between blocks of 15 images generated from a random walk on the graph and blocks of 15 images generated from a randomly selected Hamiltonian path through the graph (a path visiting every node exactly once). The purpose of interspersing Hamiltonian paths was to ensure that parsing behavior could not be explained by local statistics of the sequence (for example, after seeing items within a cluster repeat several times, participants might use the relative novelty of an item from a new cluster as a parsing cue).
}}

@article{qiu-G-liu-Bing-11_opinion-word-double-propagation,
  title={Opinion word expansion and target extraction through double propagation},
  author={Qiu, Guang and Liu, Bing and Bu, Jiajun and Chen, Chun},
  journal={Computational linguistics},
  volume={37},
  number={1},
  pages={9--27},
  year={2011},
  annote = { ABSTRACT: Analysis of opinions, known as opinion mining or sentiment analysis, has attracted a great deal of attention recently due to many practical applications and challenging research problems. In this article, we study two important problems, namely, opinion lexicon expansion and opinion target extraction. Opinion targets (targets, for short) are entities and their attributes on which opinions have been expressed. To perform the tasks, we found that there are several syntactic relations that link opinion words and targets. These relations can be identified using a dependency parser and then utilized to expand the initial opinion lexicon and to extract targets.
This proposed method is based on bootstrapping. We call it double propagation as it propagates information between opinion words and targets. A key advantage of the proposed method is that it only needs an initial opinion lexicon to start the bootstrapping process. Thus, the method is semi-supervised due to the use of opinion word seeds. In evaluation, we compare the proposed method with several state-of-the-art methods using a standard product review test collection. The results show that our approach outperforms these existing methods significantly.
}}

@article{klapaftis-manandhar-13_word-sense-induction,
  title={Evaluating Word Sense Induction and Disambiguation Methods},
  author={Klapaftis, Ioannis P and Manandhar, Suresh},
  journal={Language Resources and Evaluation},
  pages={1--27},
  year={2013},
  publisher={Springer},
  doi={10.1007/s10579-012-9205-0},
  annote = { ABSTRACT: Word Sense Induction (WSI) is the task of identifying the different uses (senses) of a target word in a given text in an unsupervised manner, i.e. without relying on any external resources such as dictionaries or sense-tagged data. This paper presents a thorough description of the SemEval-2010 WSI task and a new evaluation setting for sense induction methods. Our contributions are two-fold: firstly, we provide a detailed analysis of the SemEval-2010 WSI task evaluation results and identify the shortcomings of current evaluation measures. Secondly, we present a new evaluation setting by assessing participating systems' performance according to the skewness of target words' distribution of senses, showing that there are methods able to perform well above the Most Frequent Sense (MFS) baseline in highly skewed distributions.
}}

@article{boyd-blei-10_syntactic-topic-models,
  title={Syntactic topic models},
  author={Boyd-Graber, Jordan and Blei, David M},
  journal={arXiv preprint arXiv:1002.4665},
  year={2010},
  annote = { When we read a sentence, we use two kinds of reasoning: one for understanding its syntactic structure and another for integrating its meaning into the wider context of other sentences, other paragraphs, and other documents. Both mental processes are crucial, and psychologists have found that they are distinct. A syntactically correct sentence that is semantically implausible takes longer for people to understand than its semantically plausible counterpart (Rayner et al. 1983). Furthermore, recent brain imaging experiments have localized these processes in different parts of the brain (Dapretto and Bookheimer 1999). Both of these types of reasoning should be accounted for in a probabilistic model of language.
[Dapretto and Bookheimer 1999] Mirella Dapretto and Susan Y. Bookheimer. 1999. Form and content: Dissociating syntax and semantics in sentence comprehension. Neuron, 24(2):427--432.
To see how these mental processes interact, consider the following sentence from a travel brochure: "Next weekend, you could be relaxing in ____." How do we reason about filling in the blank? First, because the missing word is the object of a preposition, it should act like a noun, perhaps a location like "bed," "school," or "church." Second, because the document is about travel, we expect travel-related terms. This further restricts the space of possible terms, leaving alternatives like "Nepal," "Paris," or "Bermuda" as likely possibilities. Each type of reasoning restricts the likely solution to a subset of words, but the best candidates for the missing word are in their intersection.
ABSTRACT: The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent.
The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
}}

@inproceedings{yang-ramanan-12_proxemics-in-personal-photos,
  title={Recognizing proxemics in personal photos},
  author={Yang, Yi and Baker, Simon and Kannan, Anitha and Ramanan, Deva},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on},
  pages={3522--3529},
  year={2012},
  organization={IEEE}
}

@inproceedings{parkHSS-sheikh-13iccv_3D-reconstruction-articulation,
  title={3D reconstruction of a smooth articulated trajectory from a monocular image sequence},
  author={Park, Hyun Soo and Sheikh, Yaser},
  booktitle={Computer Vision (ICCV), 2011 IEEE International Conference on},
  pages={201--208},
  year={2011},
  organization={IEEE},
  annote = { http://www.cs.cmu.edu/~hyunsoop/articulated_trajectory.html
ABSTRACT: An articulated trajectory is defined as a trajectory that remains at a fixed distance with respect to a parent trajectory.
In this paper, we present a method to reconstruct an articulated trajectory in three dimensions given the two dimensional projection of the articulated trajectory, the 3D parent trajectory, and the camera pose at each time instant. This is a core challenge in reconstructing the 3D motion of articulated structures such as the human body because endpoints of each limb form articulated trajectories. We simultaneously apply activity-independent spatial and temporal constraints, in the form of fixed 3D distance to the parent trajectory and smooth 3D motion. There exist two solutions that satisfy each instantaneous 2D projection and articulation constraint (a ray intersects a sphere at up to two locations) and we show that resolving this ambiguity by enforcing smoothness is equivalent to solving a binary quadratic programming problem. A geometric analysis of the reconstruction of articulated trajectories is also presented and a measure of the reconstructibility of an articulated trajectory is proposed.
}}

@inproceedings{karthikeyan-manjunath-13iccv_where-what-we-see,
  title={From Where and How to What We See},
  author={Karthikeyan, S and Jagadeesh, Vignesh and Shenoy, Renuka and Eckstein, Miguel and Manjunath, BS},
  booktitle={IEEE International Conference on Computer Vision},
  year={2013},
  annote = { Predicts face and text regions in images using eye tracking data from multiple subjects. }
}

@article{kennedy-balzano-13_online-factorization-SVD,
  title={Online Algorithms for Factorization-Based Structure from Motion},
  author={Kennedy, Ryan and Balzano, Laura and Wright, Stephen J and Taylor, Camillo J},
  journal={arXiv preprint},
  year={2013},
  annote = { Low-rank matrix completion is the problem of recovering a low-rank matrix from an incomplete sample of the entries.
It was shown in [3], [22] that under assumptions on the number of observed entries and on incoherence of the singular vectors of this matrix with respect to the canonical coordinate axes, the nuclear norm minimization convex optimization problem solves the NP-hard rank minimization problem exactly. Since this breakthrough, a flurry of research activity has centered around developing faster algorithms to solve this convex optimization problem, both exact and approximate; see [23], [24] for two examples. The online algorithm GROUSE [6] (Grassmannian Rank-One Update Subspace Estimation) outperforms all nonparallel algorithms in computational efficiency, often by an order of magnitude, while remaining competitive in terms of estimation error.
[3] B. Recht, "A simpler approach to matrix completion," Journal of Machine Learning Research, vol. 12, pp. 3413--3430, 2011.
[6] L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Communication, Control, and Computing (Allerton). IEEE, 2010, pp. 704--711.
[22] E. Cand{\`e}s and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717--772, December 2009.
ABSTRACT: We present a family of online algorithms for real-time factorization-based structure from motion, leveraging a relationship between incremental singular value decomposition and recently proposed methods for online matrix completion. Our methods are orders of magnitude faster than previous state of the art, can handle missing data and a variable number of feature points, and are robust to noise and sparse outliers. We demonstrate our methods on both real and synthetic sequences and show that they perform well in both online and batch settings. We also provide an implementation which is able to produce 3D models in real time using a laptop with a webcam.
}}

@inproceedings{srivastava-salakhutdinov-12-nips_multimodal-learning-deep,
  title={Multimodal learning with deep Boltzmann machines},
  author={Srivastava, Nitish and Salakhutdinov, Ruslan},
  booktitle={Advances in Neural Information Processing Systems 25},
  pages={2231--2239},
  year={2012},
  abstract = { A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains. },
  annote = { Inputs are text and images separately; these are then merged after a 3-layer initial set. Can have different data-flow models between the layers, resulting in different computational (training) costs. Enables the system to learn correlations between images and text labels, so that searching by new labels returns images, etc. Based on the MIR Flickr data set: 1 million images retrieved from Flickr along with their user-assigned tags.
[10] Mark J. Huiskes and Michael S. Lew. The MIR Flickr retrieval evaluation.
In MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.
}}

@incollection{monner-reggia-11_systematically-grounding-language-deep,
  title={Systematically grounding language through vision in a deep, recurrent neural network},
  author={Monner, Derek D and Reggia, James A},
  booktitle={Artificial General Intelligence},
  pages={112--121},
  year={2011},
  publisher={Springer}
}

@inproceedings{hsiao-hebert-13_gradient-networks-shape-matching,
  title={Gradient Networks: Explicit Shape Matching Without Extracting Edges},
  author={Hsiao, Edward and Hebert, Martial},
  booktitle={Proceedings AAAI '13},
  year={2013},
  annote = { ABSTRACT: We present a novel framework for shape-based template matching in images. While previous approaches required brittle contour extraction, considered only local information, or used coarse statistics, we propose to match the shape explicitly on low-level gradients by formulating the problem as traversing paths in a gradient network. We evaluate our algorithm on a challenging dataset of objects in cluttered environments and demonstrate significant improvement over state-of-the-art methods for shape matching and object detection. }
}

@article{pezzulo-barsalou-cangelosi-11_mechanics-of-embodiment-computational,
  title={The mechanics of embodiment: a dialog on embodiment and computational modeling},
  author={Pezzulo, G. and Barsalou, L.W. and Cangelosi, A. and Fischer, M.H. and McRae, K. and Spivey, M.J.},
  journal={Frontiers in psychology},
  volume={2},
  year={2011},
  publisher={Frontiers Media SA},
  annote = { ABSTRACT: Embodied theories are increasingly challenging traditional views of cognition by arguing that conceptual representations that constitute our knowledge are grounded in sensory and motor experiences, and processed at this sensorimotor level, rather than being represented and processed abstractly in an amodal conceptual system.
Given the established empirical foundation, and the relatively underspecified theories to date, many researchers are extremely interested in embodied cognition but are clamoring for more mechanistic implementations. What is needed at this stage is a push toward explicit computational models that implement sensorimotor grounding as intrinsic to cognitive processes. In this article, six authors from varying backgrounds and approaches address issues concerning the construction of embodied computational models, and illustrate what they view as the critical current and next steps toward mechanistic theories of embodiment. The first part has the form of a dialog between two fictional characters: Ernest, the "experimenter," and Mary, the "computational modeler." The dialog consists of an interactive sequence of questions, requests for clarification, challenges, and (tentative) answers, and touches the most important aspects of grounded theories that should inform computational modeling and, conversely, the impact that computational modeling could have on embodied theories. The second part of the article discusses the most important open challenges for embodied computational modeling.
}}

@article{little-sommer-13_learning-action-perception-loops,
  title={Learning and exploration in action-perception loops},
  author={Little, Daniel Y and Sommer, Friedrich T},
  journal={Frontiers in neural circuits},
  volume={7},
  year={2013},
  instn = {UC Berkeley-Molecular and Cell Biology},
  date = {22 March},
  doi = {10.3389/fncir.2013.00037},
  annote = { ABSTRACT: Discovering the structure underlying observed data is a recurring problem in machine learning with important applications in neuroscience. It is also a primary function of the brain. When data can be actively collected in the context of a closed action-perception loop, behavior becomes a critical determinant of learning efficiency.
Psychologists studying exploration and curiosity in humans and animals have long argued that learning itself is a primary motivator of behavior. However, the theoretical basis of learning-driven behavior is not well understood. Previous computational studies of behavior have largely focused on the control problem of maximizing acquisition of rewards and have treated learning the structure of data as a secondary objective. Here, we study exploration in the absence of external reward feedback. Instead, we take the quality of an agent's learned internal model to be the primary objective. In a simple probabilistic framework, we derive a Bayesian estimate for the amount of information about the environment an agent can expect to receive by taking an action, a measure we term the predicted information gain (PIG). We develop exploration strategies that approximately maximize PIG. One strategy based on value-iteration consistently learns faster than previously developed reward-free exploration strategies across a diverse range of environments. Psychologists believe the evolutionary advantage of learning-driven exploration lies in the generalized utility of an accurate internal model. Consistent with this hypothesis, we demonstrate that agents which learn more efficiently during exploration are later better able to accomplish a range of goal-directed tasks. We will conclude by discussing how our work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as in closed-loop neurophysiology studies.
1. Introduction
Computational models of exploratory behavior have largely focused on the role of exploration in the acquisition of external rewards (Thrun, 1992; Kaelbling et al., 1996; Sutton and Barto, 1998; Kawato and Samejima, 2007).
In contrast, a consensus has emerged in behavioral psychology that learning represents the primary drive underlying explorative behaviors (Archer and Birke, 1983; Loewenstein, 1994; Silvia, 2005; Pisula, 2009). The computational principles underlying learning-driven exploration, however, have received much less attention. To address this gap, we introduce here a mathematical framework for studying how behavior affects learning and develop a novel model of learning-driven exploration. Machine learning techniques for extracting the structure underlying sensory signals have often focused on passive learning systems that can not directly affect the sensory input. Exploration, in contrast, requires actively pursuing useful information and can only occur in the context of a closed action-perception loop. Learning in closed action-perception loops differs from passive learning both in terms of "what" is being learned as well as "how" it is learned (Gordon et al., 2011). In particular, in closed action-perception loops:
* Sensorimotor contingencies must be learned.
* Actions must be coordinated to direct the acquisition of data.
Sensorimotor contingencies refer to the causal role actions play on the sensory inputs we receive, such as the way visual inputs change as we shift our gaze or move our head. They must be taken into account to properly attribute changes in sensory signals to their causes. This tight interaction between actions and sensation is reflected in the neuroanatomy, where sensory-motor integration has been reported at all levels of the brain (Guillery, 2005; Guillery and Sherman, 2011). We often take our implicit understanding of sensorimotor contingencies for granted, but in fact they must be learned during the course of development (the exception being contingencies for which we are hard-wired by evolution).
This is eloquently expressed in the explorative behaviors of young infants (e.g., grasping and manipulating objects during proprioceptive exploration and then bringing them into visual view during intermodal exploration) (Rochat, 1989; O'Regan and Noë, 2001; Noë, 2004). Not only are actions part of "what" we learn during exploration, they are also part of "how" we learn. To discover what is inside an unfamiliar box, a curious child must open it. To learn about the world, scientists perform experiments. Directing the acquisition of data is particularly important for embodied agents whose actuators and sensors are physically confined. Since the most informative data may not always be accessible to a physical sensor, embodiment may constrain an exploring agent and require that it coordinates its actions to retrieve useful data.
}}

@article{emonet-varadarajan-13pami_temporal-motif-mixtures-dirichlet-process,
  title={Temporal Analysis of Motif Mixtures using Dirichlet Processes},
  author={Emonet, R{\'e}mi and Varadarajan, Jagannadan and Odobez, Jean-Marc},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2013},
  annote = { ABSTRACT: In this paper, we present a new model for unsupervised discovery of recurrent temporal patterns (or motifs) in time series (or documents). The model is designed to handle the difficult case of multivariate time series obtained from a mixture of activities, that is, our observations are caused by the superposition of multiple phenomena occurring concurrently and with no synchronization. The model uses nonparametric Bayesian methods to describe both the motifs and their occurrences in documents. We derive an inference scheme to automatically and simultaneously recover the recurrent motifs (both their characteristics and number) and their occurrence instants in each document.
The model is widely applicable and is illustrated on datasets coming from multiple modalities, mainly videos from static cameras and audio localization data. The rich semantic interpretation that the model offers can be leveraged in tasks such as event counting or for scene analysis. The approach is also used as a means of doing soft camera calibration in a camera network. A thorough study of the model parameters is provided and a cross-platform implementation of the inference algorithm will be made publicly available.
Earlier version from AVSS-11: emonet-varadarajan-11avss_multi-camera-anomaly-detection
}}

@article{perfors-tenenbaum-regier-11_11learnability-of-syntax,
  title={The learnability of abstract syntactic principles},
  author={Perfors, Amy and Tenenbaum, Joshua B and Regier, Terry},
  journal={Cognition},
  volume={118},
  number={3},
  pages={306--338},
  year={2011},
  publisher={Elsevier},
  annote = { ABSTRACT: Children acquiring language infer the correct form of syntactic constructions for which they appear to have little or no direct evidence, avoiding simple but incorrect generalizations that would be consistent with the data they receive. These generalizations must be guided by some inductive bias, some abstract knowledge, that leads them to prefer the correct hypotheses even in the absence of directly supporting evidence. What form do these inductive constraints take? It is often argued or assumed that they reflect innately specified knowledge of language. A classic example of such an argument moves from the phenomenon of auxiliary fronting in English interrogatives to the conclusion that children must innately know that syntactic rules are defined over hierarchical phrase structures rather than linear sequences of words (e.g., Chomsky, 1965, Chomsky, 1971, Chomsky, 1980 and Crain and Nakayama, 1987).
Here we use a Bayesian framework for grammar induction to address a version of this argument and show that, given typical child-directed speech and certain innate domain-general capacities, an ideal learner could recognize the hierarchical phrase structure of language without having this knowledge innately specified as part of the language faculty. We discuss the implications of this analysis for accounts of human language acquisition.
}}

@inproceedings{tahri-youcef-13_efficient-pose-estimation-from-set-of-points,
  title={Efficient decoupled pose estimation from a set of points},
  author={Tahri, Omar and Araujo, Helder and Mezouar, Youcef and Chaumette, Fran{\c{c}}ois and others},
  booktitle={IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS'2013},
  year={2013}
}

@inproceedings{salah-coenen-13_extracting-debate-graphs-UK,
  title={Extracting debate graphs from parliamentary transcripts: a study directed at UK house of commons debates},
  author={Salah, Zaher and Coenen, Frans and Grossi, Davide},
  booktitle={Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law},
  pages={121--130},
  year={2013},
  annote = { ABSTRACT: The paper proposes a framework---the Debate Graph Extraction (DGE) framework---for extracting debate graphs from transcripts of political debates. The idea is to represent the structure of a debate as a graph with speakers as nodes and "exchanges" as links. Links between nodes are established according to the semantic similarity between the speeches and indicate an alignment of content between them. Nodes are labelled according to the "attitude" (sentiment) of the speakers, positive or negative, using a lexicon-based technique founded on SentiWordNet. The attitude of the speakers is then used to label the graph links as being either "supporting" or "opposing".
If both speakers have the same attitude (both negative or both positive) the link is labelled as supporting; otherwise the link is labelled as opposing. The resulting graphs capture the abstract representation of a debate as two opposing factions exchanging arguments on related content. }} ====ICML 2013 @inproceedings{livni-lehavi-13-icml_vanishing-component-analysis, title={Vanishing Component Analysis}, author={Livni, Roi and Lehavi, David and Schein, Sagi and Nachliely, Hila and Shalev-Shwartz, Shai and Globerson, Amir}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={597--605}, year={2013}, annote = { ABSTRACT: The vanishing ideal of a set of points $S \subseteq \mathbb{R}^n$ is the set of all polynomials that attain the value of zero on all the points in S. Such ideals can be compactly represented using a small set of polynomials known as generators of the ideal. Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal. Our procedure is numerically stable, and can be used to find approximately vanishing polynomials. The resulting polynomials capture nonlinear structure in data, and can for example be used within supervised learning. Empirical comparison with kernel methods shows that our method constructs more compact classifiers with comparable accuracy.
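The core idea (reading approximately vanishing polynomials off a monomial evaluation matrix) can be illustrated naively with an SVD; this is only a sketch of the idea, not the paper's numerically stable generator-construction procedure, and the function name is invented:

```python
import numpy as np

def vanishing_coeffs(S, tol=1e-8):
    # Sketch of the idea behind VCA, not the paper's algorithm:
    # evaluate all monomials up to degree 2 on the 2-D point set S,
    # then read approximately vanishing polynomials off the SVD.
    x, y = S[:, 0], S[:, 1]
    M = np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Rows of Vt with near-zero singular values are coefficient vectors
    # c with sum_j c_j * monomial_j(p) ~ 0 for every point p in S.
    return Vt[s < tol * s[0]]

# Points on the unit circle: x^2 + y^2 - 1 vanishes on all of them.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
S = np.stack([np.cos(t), np.sin(t)], axis=1)
C = vanishing_coeffs(S)   # one approximately vanishing polynomial
```

On the circle example the single recovered coefficient vector is proportional to the polynomial x^2 + y^2 - 1.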
}} @inproceedings{balasubramanian-yu-K-13-icml_smooth-sparse-coding, title={Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations}, author={Balasubramanian, Krishnakumar and Yu, Kai and Lebanon, Guy}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={597--605}, note={arXiv preprint arXiv:1210.1121}, year={2012} } @inproceedings{maurer-pontil-13-icml_sparse-coding-for-multitask-and-transfer-learning, title={Sparse coding for multitask and transfer learning}, author={Andreas Maurer and Massi Pontil and Bernardino Romera-Paredes}, booktitle={International Conference on Machine Learning (ICML)}, year={2013} } @inproceedings{muandet-scholkopf-13-icml_domain-generalization-via-invariant-features, title={Domain Generalization via Invariant Feature Representation}, author={Muandet, Krikamol and Balduzzi, David and Sch{\"o}lkopf, Bernhard}, booktitle={International Conference on Machine Learning (ICML)}, year={2013} } @inproceedings{zhangX-chuD-13icml_sparse-uncorrelated-LDA, title={Sparse Uncorrelated Linear Discriminant Analysis}, author={Zhang, Xiaowei and Chu, Delin}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={45--52}, year={2013}, annote = { ABSTRACT: We develop a novel approach for sparse uncorrelated linear discriminant analysis (ULDA). Our proposal is based on characterization of all solutions of the generalized ULDA. We incorporate sparsity into the ULDA transformation by seeking the solution with minimum $\ell_1$-norm from all minimum dimension solutions of the generalized ULDA. The problem is then formulated as an $\ell_1$-minimization problem and is solved by the accelerated linearized Bregman method. Experiments on high-dimensional gene expression data demonstrate that our approach not only computes extremely sparse solutions but also performs well in classification.
Experimental results also show that our approach can help for data visualization in low-dimensional space. }} @inproceedings{hennig-13-icml_fast-probabilistic-optimization-w-noise, title={Fast probabilistic optimization from noisy gradients}, author={Hennig, Philipp}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={62--70}, year={2013} } @inproceedings{zhu-J-chen-N-13-icml_gibbs-max-margin-topic-models, title={Gibbs Max-Margin Topic Models with Fast Sampling Algorithms}, author={Zhu, Jun and Chen, Ning and Perkins, Hugh and Zhang, Bo}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={124--132}, year={2013}, annote={ ABSTRACT: Existing max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents Gibbs max-margin supervised topic models by minimizing an expected margin loss, an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables, we develop simple and fast Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems for both classification and regression. Empirical results demonstrate significant improvements in time efficiency. The classification performance is also significantly improved over competitors.
}} @inproceedings{menon-tamuz-13-icml_learning-to-program-by-example, title = {A Machine Learning Framework for Programming by Example}, url = {http://jmlr.csail.mit.edu/proceedings/papers/v28/menon13.pdf}, author = {Aditya Menon and Omer Tamuz and Sumit Gulwani and Butler Lampson and Adam Kalai}, number = {1}, pages = {187-195}, volume = {28}, editor = {Sanjoy Dasgupta and David McAllester}, year = {2013}, booktitle = {Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, annote = { ABSTRACT: Learning programs is a timely and interesting challenge. In Programming by Example (PBE), a system attempts to infer a program from input and output examples alone, by searching for a composition of some set of base functions. We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program. This generic learning framework lets us address problems beyond the scope of earlier PBE systems. Experiments on a prototype implementation show that learning improves search and ranking on a variety of text processing tasks found on help forums. }} @inproceedings{song-darrell-13_discriminatively-activated-sparselets, title={Discriminatively Activated Sparselets}, author={Song, Hyun O and Darrell, Trevor and Girshick, Ross B}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={196--204}, year={2013} } @inproceedings{anandkumar-adel-13-icml_linear-bayesian-networks-latent, title = {Learning Linear Bayesian Networks with Latent Variables}, author = {Animashree Anandkumar and Adel Javanmard and Daniel J. Hsu and Sham M.
Kakade}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages = {249-257}, url = {http://jmlr.csail.mit.edu/proceedings/papers/v28/anandkumar13.pdf}, abstract = { This work considers the problem of learning linear Bayesian networks when some of the variables are unobserved. Identifiability and efficient recovery from low-order observable moments are established under a novel graphical constraint. The constraint concerns the expansion properties of the underlying directed acyclic graph (DAG) between observed and unobserved variables in the network, and it is satisfied by many natural families of DAGs that include multi-level DAGs, DAGs with effective depth one, as well as certain families of polytrees. }} @inproceedings{zuluaga-sergent-13-icml_active-learning-multi-objective-optimization, title={Active Learning for Multi-Objective Optimization}, author={Zuluaga, Marcela and Sergent, Guillaume and Krause, Andreas and P{\"u}schel, Markus}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={462--470}, year={2013}, annote = { ABSTRACT: In many fields one encounters the challenge of identifying, out of a pool of possible designs, those that simultaneously optimize multiple objectives. This means that usually there is not one optimal design but an entire set of Pareto-optimal ones with optimal tradeoffs in the objectives. In many applications, evaluating one design is expensive; thus, an exhaustive search for the Pareto-optimal set is infeasible. To address this challenge, we propose the Pareto Active Learning (PAL) algorithm, which intelligently samples the design space to predict the Pareto-optimal set.
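For reference, the Pareto-optimal set over an already fully evaluated design pool (the exhaustive computation PAL is designed to avoid) is just a non-domination filter; a minimal sketch, assuming both objectives are minimized and using made-up data:

```python
import numpy as np

def pareto_optimal(Y):
    # Y: (n_designs, n_objectives), all objectives to be minimized.
    # A design is Pareto-optimal if no other design is at least as good
    # in every objective and strictly better in at least one.
    n = Y.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(Y, i, axis=0)
        dominated = np.any(np.all(others <= Y[i], axis=1) &
                           np.any(others < Y[i], axis=1))
        keep[i] = not dominated
    return keep

Y = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = pareto_optimal(Y)   # design [3, 3] is dominated by [2, 2]
```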
Key features of PAL include (1) modeling the objectives as samples from a Gaussian process distribution to capture structure and accommodate noisy evaluation; (2) a method to carefully choose the next design to evaluate to maximize progress; and (3) the ability to control prediction accuracy and sampling cost. We provide theoretical bounds on PAL's sampling cost required to achieve a desired accuracy. Further, we show an experimental evaluation on three real-world data sets. The results show PAL's effectiveness; in particular it improves significantly over a state-of-the-art evolutionary algorithm, saving in many cases about 33%. }} ==== @inproceedings{quang-bazzani-13_unifying-manifold-regularization, title={A unifying framework for vector-valued manifold regularization and multi-view learning}, author={Quang, Minh H and Bazzani, Loris and Murino, Vittorio}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, pages={100--108}, year={2013}, annote = { ABSTRACT: This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) formulation for the problem of learning an unknown functional dependency between a structured input space and a structured output space, in the Semi-Supervised Learning setting. Our formulation includes as special cases Vector-valued Manifold Regularization and Multi-view Learning, thus providing in particular a unifying framework linking these two important learning approaches. In the case of the least squares loss function, we provide a closed-form solution with an efficient implementation. Numerical experiments on challenging multi-class categorization problems show that our multi-view learning formulation achieves results which are comparable with the state of the art and are significantly better than single-view learning.
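The flavour of the closed-form least-squares solution can already be seen in the scalar kernel ridge regression special case, where the coefficients are alpha = (K + lam*I)^-1 y; this sketch shows only that special case, not the paper's vector-valued multi-view formulation:

```python
import numpy as np

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * d2)
    # Closed-form coefficients of regularized kernel least squares:
    # alpha = (K + lam * I)^-1 y.
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha, K

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
alpha, K = kernel_ridge_fit(X, y)
pred = K @ alpha   # in-sample predictions, shrunk toward y
```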
}} ====ICML TOP CITES @article{leeH-grosse-ng-11_unsupervised-hierarchical-representation-deep-learning, title={Unsupervised learning of hierarchical representations with convolutional deep belief networks}, author={Lee, H. and Grosse, R. and Ranganath, R. and Ng, A.Y.}, journal={Communications of the ACM}, volume={54}, number={10}, pages={95--103}, year={2011}, annote = { [feature hierarchies are discovered via convolutional RBM with "max-pooling" to generate compact descriptors.] based on ICML-09 paper (315 cites): Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations ABSTRACT: There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks (DBNs); however, scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model that scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique that shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
}} @inproceedings{duchi-chandra-T-08_projections-onto-L1-ball-for-high-dimensions, title={Efficient projections onto the $\ell_1$-ball for learning in high dimensions}, author={Duchi, John and Shalev-Shwartz, Shai and Singer, Yoram and Chandra, Tushar}, booktitle={Proceedings of the 25th international conference on Machine learning}, pages={272--279}, year={2008}, annote = { ABSTRACT: We describe efficient algorithms for projecting a vector onto the $\ell_1$-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the $\ell_1$-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with $\ell_1$ projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity. 1. Introduction: A prevalent machine learning approach for decision and prediction problems is to cast the learning task as penalized convex optimization. In penalized convex optimization we seek a set of parameters, gathered together in a vector w, which minimizes a convex objective function in w with an additional penalty term that assesses the complexity of w. Two commonly used penalties are the 1-norm and the square of the 2-norm of w. An alternative but mathematically equivalent approach is to cast the problem as a constrained optimization problem.
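The exact projection described in the abstract amounts to soft-thresholding at a data-dependent level theta; the sketch below is the standard O(n log n) sort-based variant, whereas the paper's first method achieves O(n) expected time by replacing the sort with randomized pivoting:

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    # Euclidean projection of v onto the l1-ball of radius z,
    # via sorting (the paper's O(n) method avoids the full sort).
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
    cssv = np.cumsum(u)
    # Largest index rho with u_rho > (cssv_rho - z) / rho (1-indexed).
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cssv - z))[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)
    # Soft-threshold each coordinate by theta, keeping signs.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

w = project_l1_ball(np.array([0.5, -1.0, 2.0]), z=1.0)
```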
In this setting we seek a minimizer of the objective function while constraining the solution to have a bounded norm. Many recent advances in statistical machine learning and related fields can be explained as convex optimization subject to a 1-norm constraint on the vector of parameters w. Imposing an $\ell_1$ constraint leads to notable benefits. First, it encourages sparse solutions, i.e., a solution for which many components of w are zero. When the original dimension of w is very high, a sparse solution enables easier interpretation of the problem in a lower dimension space. For the usage of the $\ell_1$-based approach in statistical machine learning see for example (Tibshirani, 1996) and the references therein. Donoho (2006b) provided sufficient conditions for obtaining an optimal $\ell_1$-norm solution which is sparse. Recent work on compressed sensing (Candes, 2006; Donoho, 2006a) further explores how $\ell_1$ constraints can be used for recovering a sparse signal sampled below the Nyquist rate. The second motivation for using $\ell_1$ constraints in machine learning problems is that in some cases it leads to improved generalization bounds. For example, Ng (2004) examined the task of PAC learning a sparse predictor and analyzed cases in which an $\ell_1$ constraint results in better solutions than an $\ell_2$ constraint. }} @inproceedings{mairal-bach-09icml_online-dictionary-learning-sparse-coding, title={Online dictionary learning for sparse coding}, author={Mairal, Julien and Bach, Francis and Ponce, Jean and Sapiro, Guillermo}, booktitle={Proceedings of the 26th Annual International Conference on Machine Learning (ICML-09)}, year={2009} } @inproceedings{vincent-larochelle-bengio-08icml_robust-features-w-denoising-autoencoders, title={Extracting and composing robust features with denoising autoencoders}, author={Vincent, Pascal and Larochelle, Hugo and Bengio, Yoshua and Manzagol, Pierre-Antoine}, booktitle={Proceedings of the 25th international conference on Machine learning}, pages={1096--1103}, year={2008}, annote = { ABSTRACT: Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations.
We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite. }} @article{wang-komodakis-paragios_13-markov-random-field-modeling-Comp-vision, title={Markov Random Field modeling, inference \& learning in computer vision \& image understanding: A survey}, author={Wang, Chaohui and Komodakis, Nikos and Paragios, Nikos}, journal={Computer Vision and Image Understanding}, volume={117}, number={11}, pages={1610--1627}, year={2013}, annote = { ABSTRACT: In this paper, we present a comprehensive survey of Markov Random Fields (MRFs) in computer vision and image understanding, with respect to the modeling, the inference and the learning. While MRFs were introduced into the computer vision field about two decades ago, they started to become a ubiquitous tool for solving visual perception problems around the turn of the millennium following the emergence of efficient inference methods. During the past decade, a variety of MRF models as well as inference and learning methods have been developed for addressing numerous low, mid and high-level vision problems. While most of the literature concerns pairwise MRFs, in recent years we have also witnessed significant progress in higher-order MRFs, which substantially enhances the expressiveness of graph-based models and expands the domain of solvable problems. This survey provides a compact and informative summary of the major literature in this research topic. 
--- Mathematically, let D denote the observed data and x a latent parameter vector that corresponds to a mathematical answer to the visual perception problem. Visual perception can then be formulated as finding a mapping from D to x, which is essentially an inverse problem [1]. Mathematical methods usually model such a mapping through an optimization problem as follows: $x_{opt} = \arg\min_x E(x, D; w)$, where the energy (or cost, objective) function E(x, D; w) can be regarded as a quality measure of a parameter configuration x in the solution space given the observed data D, and w denotes the model parameters. Hence, visual perception involves three main tasks: modeling, inference and learning. The modeling has to accomplish: (i) the choice of an appropriate representation of the solution using a tuple of variables x; and (ii) the design of the class of energy functions E(x, D; w) which can correctly measure the connection between x and D. The inference has to search for the configuration of x leading to the optimum of the energy function, which corresponds to the solution of the original problem. The learning aims to select the optimal model parameters w based on the training data. The main difficulty in the modeling lies in the fact that most of the vision problems are inverse, ill-posed and require a large number of latent and/or observed variables to express the expected variations of the perception answer. Furthermore, the observed signals are usually noisy, incomplete and often only provide a partial view of the desired space. Hence, a successful model usually requires a reasonable regularization, a robust data measure, and a compact structure between the variables of interest to adequately characterize their relationship (which is usually unknown).
In the Bayesian paradigm, the model prior, the data likelihood and the dependence properties correspond respectively to these terms, and the maximization of the posterior probability of the latent variables corresponds to the minimization of the energy function in Eq. (1). Probabilistic graphical models (usually referred to as graphical models) combine probability theory and graph theory towards a natural and powerful formalism for modeling and solving inference and estimation problems in various scientific and engineering fields. In particular, one important type of graphical models – Markov Random Fields (MRFs) – has become a ubiquitous methodology for solving visual perception problems, in terms of both the expressive potential of the modeling process and the optimality properties of the corresponding inference algorithm, due to their ability to model soft contextual constraints between variables and the significant development of inference methods for such models. Generally speaking, MRFs have the following major useful properties that one can benefit from during the algorithm design. First, MRFs provide a modular, flexible and principled way to combine regularization (or prior), data likelihood terms and other useful cues within a single graph-formulation, where continuous and discrete variables can be simultaneously considered. Second, the graph theoretic side of MRFs provides a simple way to visualize the structure of a model and facilitates the choice and the design of the model. Third, the factorization of the joint probability over a graph could lead to inference problems that can be solved in a computationally efficient manner. In particular, development of inference methods based on discrete optimization enhances the potential of discrete MRFs and significantly enlarges the set of visual perception problems to which MRFs can be applied. 
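As a toy instance of the energy-minimization formulation above, a 1-D chain MRF with quadratic data terms and a Potts smoothness prior can be locally minimized by iterated conditional modes (ICM); this example is illustrative only and is not drawn from the survey:

```python
import numpy as np

def icm_denoise(D, labels, w=1.0, iters=10):
    # Locally minimize E(x; D) = sum_i (x_i - D_i)^2
    #                          + w * sum_i [x_i != x_(i+1)]
    # by iterated conditional modes: greedily re-label each site.
    x = np.array([min(labels, key=lambda l: (l - d) ** 2) for d in D], float)
    n = len(x)
    for _ in range(iters):
        for i in range(n):
            def local_energy(l):
                e = (l - D[i]) ** 2          # data term
                if i > 0:
                    e += w * (l != x[i - 1])  # Potts smoothness, left
                if i < n - 1:
                    e += w * (l != x[i + 1])  # Potts smoothness, right
                return e
            x[i] = min(labels, key=local_energy)
    return x

D = np.array([0.0, 0.1, 0.9, 0.0, 0.0])   # noisy observations
x = icm_denoise(D, labels=[0.0, 1.0], w=1.0)
```

With this smoothness weight, the isolated 0.9 observation is smoothed away and all sites are labelled 0.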
Last but not least, the probabilistic side of MRFs gives rise to potential advantages in terms of parameter learning (e.g., [2], [3], [4] and [5]) and uncertainty analysis (e.g., [6] and [7]) over classic variational methods [8] and [9], due to the introduction of probabilistic explanation to the solution [1]. }} @article{sagha-chavarriaga-13-prl_online-anomaly-in-classifier-ensembles, title={On-line anomaly detection and resilience in classifier ensembles}, author={Sagha, Hesam and Bayati, Hamidreza and Mill{\'a}n, Jos{\'e} del R and Chavarriaga, Ricardo}, journal={Pattern Recognition Letters}, year={2013}, publisher={North-Holland}, annote = {++} } @article{bilen-namboodri-vanGool-13_object-action-classify-latent-windows, title={Object and Action Classification with Latent Window Parameters}, author={Bilen, Hakan and Namboodiri, Vinay P and Van Gool, Luc J}, journal={International Journal of Computer Vision}, pages={1--15}, year={2013}, annote = { Use Crop and Split operations to identify rectangles in the image where salient info about activity may lie. These are learnt as latent variables. ABSTRACT: We propose a generic framework to incorporate unobserved auxiliary information for classifying objects and actions. This framework allows us to automatically select a bounding box and its quadrants from which best to extract features. These spatial subdivisions are learnt as latent variables. The paper is an extended version of our earlier work Bilen et al. (Proceedings of The British Machine Vision Conference, 2011), complemented with additional ideas, experiments and analysis. We approach the classification problem in a discriminative setting, as learning a max-margin classifier that infers the class label along with the latent variables.
Through this paper we make the following contributions: (a) we provide a method for incorporating latent variables into object and action classification; (b) these variables determine the relative focus on foreground versus background information that is taken account of; (c) we design an objective function to more effectively learn in unbalanced data sets; (d) we learn a better classifier by iterative expansion of the latent parameter space. We demonstrate the performance of our approach through experimental evaluation on a number of standard object and action recognition data sets. }} @article{liang-jordan-klein-13_learning-dependency-based-semantics, title={Learning dependency-based compositional semantics}, author={Liang, Percy and Jordan, Michael I and Klein, Dan}, journal={Computational Linguistics}, volume={39}, number={2}, pages={389--446}, year={2013}, annote = { ABSTRACT: Suppose we want to build a system that answers a natural language question by representing its semantics as a logical form and computing the answer given a structured database of facts. The core part of such a system is the semantic parser that maps questions to logical forms. Semantic parsers are typically trained from examples of questions annotated with their target logical forms, but this type of annotation is expensive. Our goal is to learn a semantic parser from question-answer pairs instead, where the logical form is modeled as a latent variable. Motivated by this challenging learning problem, we develop a new semantic formalism, dependency-based compositional semantics (DCS), which has favorable linguistic, statistical, and computational properties. We define a log-linear distribution over DCS logical forms and estimate the parameters using a simple procedure that alternates between beam search and numerical optimization. On two standard semantic parsing benchmarks, our system outperforms all existing state-of-the-art systems, despite using no annotated logical forms.
}} @article{luG-kudo-toyama-12_temporal-segmentation-actions-in-video, title={Temporal Segmentation and Assignment of Successive Actions in a Long-Term Video}, author={Lu, Guoliang and Kudo, Mineichi and Toyama, Jun}, journal={Pattern Recognition Letters}, year={2012}, annote = { ABSTRACT: We exploit a novel learning-based framework for temporal segmentation of successive actions in a long-term video. Given a video sequence, only a few characteristic frames are selected by the proposed selection algorithm; the likelihood to trained models is then calculated in a pair-wise way, and finally segmentation is obtained as the optimal model sequence realizing the maximum likelihood. The average accuracy on the IXMAS dataset reached 80.5% at the frame level, using only 16.5% of all frames, with a computation time of 1.57 s per video (1160 frames on average). }} @inproceedings{burghouts-hove-13_action-recog-multiple-views-bag-of-words, title={Improved action recognition by combining multiple 2D views in the Bag-of-Words model}, author={Burghouts, Gertjan and Eendebak, Pieter and Bouma, Henri and ten Hove, Johan-Martijn}, booktitle={Advanced Video and Signal Based Surveillance (AVSS), 2013 10th IEEE International Conference on}, pages={250--255}, year={2013}, annote = { ABSTRACT: Action recognition is a hard problem due to the many degrees of freedom of the human body and the movement of its limbs. This is especially hard when only one camera viewpoint is available and when actions involve subtle movements. For instance, when looked at from the side, checking one's watch may look very similar to crossing one's arms. In this paper, we investigate how much the recognition can be improved when multiple views are available. The novelty is that we explore various combination schemes within the robust and simple bag-of-words (BoW) framework, from early fusion of features to late fusion of multiple classifiers.
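The two ends of the fusion spectrum just mentioned can be written down in a few lines; the histograms and per-view weight vectors below are made up for illustration, and a real pipeline would of course train the classifiers:

```python
import numpy as np

# Hypothetical two-view BoW histograms for one clip (already normalized).
h_view1 = np.array([0.2, 0.5, 0.3])
h_view2 = np.array([0.6, 0.1, 0.3])

# Early fusion: concatenate per-view features, feed one classifier.
h_early = np.concatenate([h_view1, h_view2])

# Late fusion: score with a per-view linear classifier, average scores.
w1 = np.array([1.0, -1.0, 0.5])   # toy per-view weight vectors
w2 = np.array([0.5, 2.0, -1.0])
score_late = 0.5 * (h_view1 @ w1 + h_view2 @ w2)
```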
In new experiments on the publicly available IXMAS dataset, we learn that action recognition can be improved significantly already by only adding one viewpoint. We demonstrate that the state-of-the-art on this dataset can be improved by 5% - achieving 96.4% accuracy - when multiple views are combined. Cross-view invariance of the BoW pipeline can be improved by 32% with intermediate-level fusion. }} @article{wang-gould-roller-13_discriminative-learning-cluttered-indoor-scenes-w-latent-var, title={Discriminative learning with latent variables for cluttered indoor scene understanding}, author={Wang, Huayan and Gould, Stephen and Koller, Daphne}, journal={Communications of the ACM}, volume={56}, number={4}, pages={92--99}, year={2013}, annote = { original: ECCV-12 Stephen Gould - PhD Stanford } ====IJCAI 13 @article{hadjinikolis-modgil-13-ijc_opponent-modeling-dialogues, title={Opponent Modelling in Persuasion Dialogues}, author={Hadjinikolis, Christos and Yiannis Siantos and Sanjay Modgil and Elizabeth Black and Peter McBurney}, annote = { ABSTRACT: A strategy is used by a participant in a persuasion dialogue to select locutions most likely to achieve its objective of persuading its opponent. Such strategies often assume that the participant has a model of its opponents, which may be constructed on the basis of a participant's accumulated dialogue experience. However, in most cases the fact that an agent's experience may encode additional information which, if appropriately used, could increase a strategy's efficiency, is neglected. In this work, we rely on an agent's experience to define a mechanism for augmenting an opponent model with information likely to be dialectically related to information already contained in it. Precise computation of this likelihood is exponential in the volume of related information. We thus describe and evaluate an approximate approach for computing these likelihoods based on Monte-Carlo simulation.
}} ==== @article{gongD-zhao-medioni-12icml-multiple-manifold-structure-learning, title={Robust Multiple Manifolds Structure Learning}, author={Gong, D. and Zhao, X. and Medioni, G.}, journal={ICML-12}, year={2012}, annote = { Combines local manifold construction and merges the manifolds obtained based on a new curvature-level similarity measure. A terrific idea. ??project? is code available? ABSTRACT: We present a robust multiple manifold structure learning (RMMSL) scheme to robustly estimate data structures under the multiple low intrinsic dimensional manifolds assumption. In the local learning stage, RMMSL efficiently estimates the local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on the local structure learning results. The proposed clustering method is designed to get the flattest manifold clusters by introducing a novel curved-level similarity function. Our approach is evaluated and compared to state-of-the-art methods on synthetic data, handwritten digit images, human motion capture data and motorbike videos. We demonstrate the effectiveness of the proposed approach, which yields higher clustering accuracy, and produces promising results for the challenging tasks of human motion segmentation and motion flow learning from videos.
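The local learning stage (tangent-space estimation) can be approximated with plain PCA over k-nearest neighbourhoods; this is a simplified stand-in for the paper's weighted low-rank matrix factorization, and the function name is invented:

```python
import numpy as np

def local_tangent(X, i, k=8, d=1):
    # Estimate the d-dim tangent space at point i from its k nearest
    # neighbours via PCA (simplified stand-in for the paper's weighted
    # low-rank factorization).
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:d]          # rows span the estimated tangent space

# Noisy samples along a line in the plane: tangent should be ~(1, 0).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = np.stack([t, 0.01 * rng.standard_normal(40)], axis=1)
T = local_tangent(X, i=20)
```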
ICML site has discussion+video http://icml.cc/discuss/2012/191.html }} @InProceedings{boots-gordon-12icml_two-manifold-merging-from-separate-views, author = {Byron Boots and Geoff Gordon}, title = {Two-Manifold Problems with Applications to Nonlinear System Identification}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {623--630}, url = {http://arxiv.org/abs/1206.4648}, annote = { ABSTRACT: Recently, there has been much interest in spectral approaches to learning manifolds—so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data. discussion+video on ICML site. 
http://icml.cc/discuss/2012/338.html }} @InProceedings{varoquaux-gramfort-12icml_small-sample-fmri-spatial-clustering, author = {Gael Varoquaux and Alexandre Gramfort and Bertrand Thirion}, title = {Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {1375--1382}, annote = { ABSTRACT: Functional neuroimaging can measure the brain’s response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and hierarchical clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two publicly available fMRI datasets. discussion+video: http://icml.cc/discuss/2012/688.html }} @InProceedings{jawanpuria-nath-12icml_convex-feature-learning-for-latent-task-structure, author = {Pratik Jawanpuria and J. 
Saketha Nath}, title = {A Convex Feature Learning Formulation for Latent Task Structure Discovery}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages = {137--144}, annote = { ABSTRACT: This paper considers the multi-task learning problem in the setting where some relevant features could be shared across a few related tasks. Most of the existing methods assume the extent to which the given tasks are related or share a common feature space to be known a priori. In real-world applications, however, it is desirable to automatically discover the groups of related tasks that share a feature space. In this paper we aim at searching the exponentially large space of all possible groups of tasks that may share a feature space. The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers a few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group. The regularizer encodes an important structure among the groups of tasks leading to an efficient algorithm for solving it: if there is no feature space under which a group of tasks has close-by task parameters, then there does not exist such a feature space for any of its supersets. An efficient active set algorithm that exploits this simplification and performs a clever search in the exponentially large space is presented. The algorithm is guaranteed to solve the proposed formulation (within some precision) in a time polynomial in the number of groups of related tasks discovered. Empirical results on benchmark datasets show that the proposed formulation achieves good generalization and outperforms state-of-the-art multi-task learning algorithms in some cases.
video: http://icml.cc/discuss/2012/90.html pratik.j, saketh@cse.iitb.ac.in }} @InProceedings{takeda-mitsugi-kanamori-12-icml_unified-robust-classification, author = {Akiko Takeda and Hiroyuki Mitsugi and Takafumi Kanamori}, title = {A Unified Robust Classification Model}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {129--136}, annote = { good review of supervised classification + combine main algorithms ABSTRACT: A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this paper is to provide a unified classification model that includes the above models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVM become applicable to MPM and FDA, and vice versa. Another benefit is to provide theoretical results to above learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and propose a non-convex optimization algorithm that can be applied to non-convex variants of existing learning methods. 
discussion+video : http://icml.cc/discuss/2012/87.html }} ====SUPERVISED LEARNING @inproceedings{maX-luoP-11ijc_combining-supervised-unsupervised-via-embedding, title={Combining supervised and unsupervised models via unconstrained probabilistic embedding}, author={Xudong Ma and Ping Luo and Fuzhen Zhuang and Qing He and Zhongzhi Shi and Zhiyong Shen}, booktitle={Proceedings of the 22nd IJCAI, Volume Two}, pages={1396--1401}, year={2011}, annote = { Unsupervised learning is used to improve learning based on several (conflicting) supervised learners. Given a data set, an ensemble categorization system traditionally consists of several supervised learners, each of which assigns some class ID; in the case of conflicts one may use voting etc. In this work, they additionally use several unsupervised clustering algorithms, each of which creates somewhat different clusters. The idea in this ensemble of learners is that items in the same unsupervised cluster should, as far as possible, belong to the same final class; this is used to tune the result of the supervised learning. ABSTRACT: Ensemble learning with output from multiple supervised and unsupervised models aims to improve the classification accuracy of a supervised model ensemble by jointly considering the grouping results from unsupervised models. In this paper we cast this ensemble task as an unconstrained probabilistic embedding problem. Specifically, we assume both objects and classes/clusters have latent coordinates without constraints in a D-dimensional Euclidean space, and consider the mapping from the embedded space into the space of results from supervised and unsupervised models as a probabilistic generative process. The prediction of an object is then determined by the distances between the object and the classes in the embedded space. A solution of this embedding can be obtained using the quasi-Newton method, resulting in the objects and classes/clusters with high co-occurrence weights being embedded close.
We demonstrate the benefits of this unconstrained embedding method by three real applications. }} @inproceedings{xiaoY-liuB-11ijc_similarity-based-positive-and-unlabeled-learning, title={Similarity-based approach for positive and unlabelled learning}, author={Yanshan Xiao and Bo Liu and Jie Yin and Longbing Cao and Chengqi Zhang and Zhifeng Hao}, booktitle={Proceedings of the 22nd IJCAI Volume Two}, pages={1577--1582}, year={2011}, abstract = { Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only the positive examples and the unlabelled examples are available. Most of the previous works focus on identifying some negative examples from the unlabelled data, so that the supervised learning methods can be applied to build a classifier. However, for the remaining unlabelled data, which can not be explicitly identified as positive or negative (we call them ambiguous examples), they either exclude them from the training phase or simply enforce them to either class. Consequently, their performance may be constrained. This paper proposes a novel approach, called similarity-based PU learning (SPUL) method, by associating the ambiguous examples with two similarity weights, which indicate the similarity of an ambiguous example towards the positive class and the negative class, respectively. The local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity-weights are thereafter incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets have shown that SPUL outperforms state-of-the-art PU learning methods. 
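A toy sketch of the weighting idea (not the SPUL algorithm itself: SPUL derives both local and global weights and feeds them into a weighted SVM; the mean-cosine "global" weight and the data below are invented for illustration):

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def global_weight(x, positives):
    """Similarity of an ambiguous example toward the positive class,
    taken here as mean cosine similarity to the labelled positives
    (a stand-in for SPUL's global mechanism)."""
    return sum(cosine(x, p) for p in positives) / len(positives)

positives = [[1.0, 0.0], [0.9, 0.1]]   # labelled positive examples (made up)
ambiguous = [0.8, 0.2]                 # an unlabelled, ambiguous example
w_pos = global_weight(ambiguous, positives)
```

An analogous weight toward the negative class would use the identified negatives; the pair of weights then scales the ambiguous example's contribution to each side of the SVM training objective.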
}} @inproceedings{liYF-HuJH-12aaai_what-patterns-trigger-what-labels, title={Towards Discovering What Patterns Trigger What Labels}, author={Yu-Feng Li and Ju-Hua Hu and Yuan Jiang and Zhi-Hua Zhou}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, pages = {1012--1018}, annote = { Multiple labels are associated with each object, with many overlaps; perhaps a label relates to some subset of features in each object. How does one learn models of the categories from this? The authors formulate the problem as a convex optimization problem. ABSTRACT: In many real applications, especially those involving data objects with complicated semantics, it is generally desirable to discover the relation between patterns in the input space and labels corresponding to different semantics in the output space. This task becomes feasible with MIML (Multi-Instance Multi-Label learning), a recently developed learning framework, where each data object is represented by multiple instances and is allowed to be associated with multiple labels simultaneously. In this paper, we propose KISAR, an MIML algorithm that is able to discover what instances trigger what labels. By considering the fact that highly relevant labels usually share some patterns, we develop a convex optimization formulation and provide an alternating optimization solution. Experiments show that KISAR is able to discover reasonable relations between input patterns and output labels, and achieves performances that are highly competitive with many state-of-the-art MIML algorithms. }} @inproceedings{caragea-silvescu-mitra-12_hashing-and-abstraction-sparse-high-D-features, title={Combining Hashing and Abstraction in Sparse High Dimensional Feature Spaces}, author={Cornelia Caragea and Adrian Silvescu and Prasenjit Mitra}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, annote = { A popular approach to information retrieval from documents involves "bag of words"; with a large vocabulary this becomes extremely high-dimensional and computationally intractable. In this work, one applies hashing and agglomerative clustering to obtain a smaller set of sparse features. ABSTRACT: With the exponential increase in the number of documents available online, e.g., news articles, weblogs, scientific documents, the development of effective and efficient classification methods is needed. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The commonly used “bag of words” and n-gram representations can result in prohibitively high dimensional input spaces. Data mining algorithms applied to these input spaces may be intractable due to the large number of dimensions. Thus, dimensionality reduction algorithms that can process data into features fast at runtime, ideally in constant time per feature, are greatly needed in high throughput applications, where the number of features and data points can be in the order of millions. One promising line of research in dimensionality reduction is feature clustering. We propose to combine two types of feature clustering, namely hashing and abstraction based on hierarchical agglomerative clustering, in order to take advantage of the strengths of both techniques. Experimental results on two text data sets show that the combined approach uses a significantly smaller number of features and gives similar performance when compared with the “bag of words” and n-gram approaches.
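The hashing half of the combination is easy to sketch (a generic hashing-trick sketch, not the authors' code; the bucket count and hash function are arbitrary choices here):

```python
from collections import Counter
from zlib import crc32

def hashed_features(tokens, n_buckets=1024):
    """Map a bag of words into a fixed-size count vector via the hashing
    trick: each token is hashed to one of n_buckets indices, so the
    feature space stays bounded regardless of vocabulary size and each
    feature is computed in constant time. Collisions simply add up."""
    vec = [0] * n_buckets
    for tok, count in Counter(tokens).items():
        vec[crc32(tok.encode("utf-8")) % n_buckets] += count
    return vec

doc = "the quick brown fox jumps over the lazy dog the end".split()
v = hashed_features(doc)   # total mass equals the number of tokens
```

The paper's abstraction step would then run hierarchical agglomerative clustering over these (far fewer) hashed features rather than over the raw vocabulary.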
}} @InProceedings{tangY-salakhutdinov-hinton-12icml_deep-lambertian-albedo-learning, author = {Yichuan Tang and Ruslan Salakhutdinov and Geoffrey Hinton}, title = {Deep Lambertian Networks}, booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, series = {ICML '12}, year = {2012}, editor = {John Langford and Joelle Pineau}, location = {Edinburgh, Scotland, GB}, isbn = {978-1-4503-1285-1}, month = {July}, publisher = {Omnipress}, address = {New York, NY, USA}, pages= {1623--1630}, annote = { ABSTRACT: Visual perception is a challenging problem in part due to illumination variations. A possible solution is to first estimate an illumination invariant representation before using it for recognition. The object albedo and surface normals are examples of such representation. In this paper, we introduce a multilayer generative model where the latent variables include the albedo, surface normals, and the light source. Combining Deep Belief Nets with the Lambertian reflectance assumption, our model can learn good priors over the albedo from 2D images. Illumination variations can be explained by changing only the lighting latent variable in our model. By transferring learned knowledge from similar objects, albedo and surface normals estimation from a single image is possible in our model. Experiments demonstrate that our model is able to generalize as well as improve over standard baselines in one-shot face recognition. discussion+video: http://icml.cc/discuss/2012/791.html }} @inproceedings{levy-markovitch-12_machine-learning-from-metaphor, title={Teaching Machines to Learn by Metaphors}, author={Omer Levy and Shaul Markovitch}, booktitle={Twenty-Sixth AAAI Conference on Artificial Intelligence}, year={2012}, annote = { ABSTRACT Humans have an uncanny ability to learn new concepts with very few examples. Cognitive theories have suggested that this is done by utilizing prior experience of related tasks. 
We propose to emulate this process in machines, by transforming new problems into old ones. These transformations are called metaphors. Obviously, the learner is not given a metaphor, but must acquire one through a learning process. We show that learning metaphors yields better results than existing transfer learning methods. Moreover, we argue that metaphors give a qualitative assessment of task relatedness. }} @phdthesis{frank-13_bayesian-models-of-syntactic-category-acquisition, title={Bayesian models of syntactic category acquisition}, author={Frank, Stella Christina}, year={2013}, school={The University of Edinburgh}, annote = { Both unsupervised morphological analysis and POS-tagging. Including the sentence type improves performance. Uses the EVE corpus from CHILDES. POS TAGGING: models with local context (MORPHTAG, MORPHTAGNOSEG, BHMM) do dramatically better than models clustering words using only morphological information. The Pitman-Yor model of data statistics (MORPHTAGNOSEG) does slightly better than the Dirichlet-multinomial (BHMM). MORPHOLOGY: 3 evaluation measures: tagVM, suffixVM, EMMA [either over-segmentation or misses; harder to evaluate]. Using SuffixVM or EMMA to evaluate morphological segmentation performance, MORPHTAGTRUETAGS outperforms the others, esp. those without local syntactic constraints. Expts in Spanish are similar. }} ====??? Smita Sirker; Can we infer the non-Observable Mind without Language? Language: English; Subject: Philosophy; Issue: 1/2009; Page Range: 129-134. Summary: We know our minds through introspection and others through inference. The occult perception of one’s “mind” is dependent on the “mental activity”; dependent on the “awareness of one’s mental states” itself. One finds difficulty in separating the distinct roles of inference and perception in the case of self-knowledge.
The life of philosophers, brain scientists and of course the ordinary folks sails through the stormy debate concerning whether “mind and its states exist” quite peacefully. The discourse between the philosopher and the brain scientist; the philosopher and the ordinary folk; the brain scientist and ordinary folk presupposes that our minds exist and we share our thoughts and doubts through our ordinary language, which in a big way helps us in the inference of “other minds”. This brief article explores the role of ordinary language in our discourse to discover the enigma of the “mind”. Keywords: Descartes’ myth; introspection; ordinary language; mental activity; inference of mind; privileged access; phenomenal experience. }} ====ROBOTICS @article{fangY-liuX-zhangX-12_adaptive-visual-servoing-nonholonomic, title={Adaptive active visual servoing of nonholonomic mobile robots}, author={Fang, Yongchun and Liu, Xi and Zhang, Xuebo}, journal={Industrial Electronics, IEEE Transactions on}, volume={59}, number={1}, pages={486--497}, year={2012}, publisher={IEEE}, annote = { ABSTRACT: This paper presents a novel two-level scheme for adaptive active visual servoing of a mobile robot equipped with a pan camera. In the lower level, the pan platform carrying an onboard camera is controlled to keep the reference points lying around the center of the image plane. On the higher level, a switched controller is utilized to drive the mobile robot to reach the desired configuration through image feature feedback. The designed active visual servoing system presents such advantages as follows: 1) a satisfactory solution for the field-of-view problem; 2) global high servoing efficiency; and 3) free of any complex pose estimation algorithm usually required for visual servoing systems. The performance of the active visual servoing system is proven by rigorous mathematical analysis.
Both simulation and experimental results are provided to validate the effectiveness of the proposed active visual servoing method. }} ==== @inproceedings{jia-darrell-13_latent-task-adaptation-w-hierarchies, title={Latent Task Adaptation with Large-scale Hierarchies}, author={Jia, Yangqing and Darrell, Trevor}, booktitle={The IEEE International Conference on Computer Vision (ICCV)}, year={2013} } @inproceedings{azary-savakis-13cvpr_grassmannian-sparse-representation-3D-actions, title={Grassmannian Sparse Representations and Motion Depth Surfaces for 3D Action Recognition}, author={Azary, Sherif and Savakis, Andreas}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on}, pages={492--499}, year={2013}, annote = { ABSTRACT: Manifold learning has been effectively used in computer vision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited for computer vision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss L1-norm minimization for optimal classification. We further introduce a new descriptor that we term Motion Depth Surface (MDS) and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
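The basic manifold ingredient can be made concrete: a point on the Grassmann manifold is a subspace, and distances come from principal angles. A minimal numpy sketch (illustrative only; the paper's GSR additionally solves an L1 sparse-coding problem over such subspaces fitted to action videos):

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between span(A) and span(B) on the Grassmann
    manifold, via principal angles. A and B are matrices whose columns
    span the two subspaces (toy inputs; real uses would fit them to
    image or depth sequences)."""
    Qa, _ = np.linalg.qr(A)          # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles
    return float(np.linalg.norm(theta))

# Identical subspaces are at distance ~0; orthogonal 2-planes in R^4 are not.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
d = grassmann_distance(A, B)
```

For two fully orthogonal 2-planes both principal angles are pi/2, so d equals sqrt(2)*pi/2.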
}} @inproceedings{vemulapalli-chellappa-13cvpr_kernel-learning-manifolds, title={Kernel learning for extrinsic classification of manifold features}, author={Vemulapalli, Raviteja and Pillai, Jaishanker K and Chellappa, Rama}, booktitle={Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on}, pages={1782--1789}, year={2013}, annote={ ABSTRACT: In computer vision applications, features often lie on Riemannian manifolds with known geometry. Popular learning algorithms such as discriminant analysis, partial least squares, support vector machines, etc., are not directly applicable to such features due to the non-Euclidean nature of the underlying spaces. Hence, classification is often performed in an extrinsic manner by mapping the manifolds to Euclidean spaces using kernels. However, for kernel based approaches, poor choice of kernel often results in reduced performance. In this paper, we address the issue of kernel selection for the classification of features that lie on Riemannian manifolds using the kernel learning approach. We propose two criteria for jointly learning the kernel and the classifier using a single optimization problem. Specifically, for the SVM classifier, we formulate the problem of learning a good kernel-classifier combination as a convex optimization problem and solve it efficiently following the multiple kernel learning approach. Experimental results on image set-based classification and activity recognition clearly demonstrate the superiority of the proposed approach over existing methods for classification of manifold features. 
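The object being learned here is a combination of base kernels. A sketch of the fixed-weight baseline (the paper's contribution is to learn the weights jointly with the SVM via a convex problem; in this sketch the weights are simply given and the data are random):

```python
import numpy as np

def combined_kernel(kernels, weights):
    """Convex combination of base kernel (Gram) matrices; any such
    combination of PSD kernels is again a valid PSD kernel."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or abs(w.sum() - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    return sum(wi * K for wi, K in zip(w, kernels))

X = np.random.default_rng(0).normal(size=(5, 3))     # 5 random points in R^3
K_lin = X @ X.T                                      # linear kernel
sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)    # pairwise squared dists
K_rbf = np.exp(-0.5 * sq)                            # Gaussian kernel
K = combined_kernel([K_lin, K_rbf], [0.3, 0.7])      # a valid kernel again
```

K could then be handed to any kernel machine that accepts a precomputed Gram matrix; the multiple-kernel-learning step of the paper replaces the hand-picked [0.3, 0.7] with learned weights.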
}} @article{harandi-sanderson-13prl_kernel-on-grassmann-manifold-for-action-recog, title={Kernel analysis on Grassmann manifolds for action recognition}, author={Harandi, Mehrtash T and Sanderson, Conrad and Shirazi, Sareh and Lovell, Brian C}, journal={Pattern Recognition Letters}, year={2013}, annote = { ABSTRACT: Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces are able to accommodate the effects of various image variations and can capture the dynamic properties of actions. Subspaces form a non-Euclidean and curved Riemannian manifold known as a Grassmann manifold. Inference on manifold spaces usually is achieved by embedding the manifolds in higher dimensional Euclidean spaces. In this paper, we instead propose to embed the Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To achieve efficient machinery, we propose graph-based local discriminant analysis that utilises within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy in comparison to several state-of-the-art methods, such as the kernel version of affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words and hierarchy of discriminative space-time neighbourhood features. }} UPDATE: @inproceedings{blasiak-rangwala-11ijc_hmm-variant-for-sequence-classification, title={A hidden Markov model variant for sequence classification}, author={Sam Blasiak and Huzefa Rangwala}, } @inproceedings{ciresan-meier-11ijc_convolutional-NN-for-image-classification, title={Flexible, high performance convolutional neural networks for image classification}, author={Dan C. Cire{\c{s}}an and Ueli Meier and Jonathan Masci and Luca Maria Gambardella and Jürgen Schmidhuber}, } @InProceedings{matuszek-fitzgerald-zettlemoyer-12icml_joint-language-and-perception-learning, } (review: ~vedant/cs365/hw2/report.pdf) @inproceedings{chambers-jurafsky-11_template-script-extraction-from-text, title = {Template-based information extraction without the templates}, author = {Nathanael Chambers and Dan Jurafsky}, } RELATED: ICML-12 abstract: Learning the Central Events and Participants in Unlabeled Text, Nathanael Chambers and Dan Jurafsky. @InProceedings{mnih-tehYW-12icml_neural-probabilistic-language-models, author = {Andriy Mnih and Yee Whye Teh}, title = {A fast and simple algorithm for training neural probabilistic language models}, } (rohitangsu das review: ~rohitdas/cs365/hw2/paper_cs365.pdf) @inproceedings{kalakrishnan-righetti-11iros_learning-force-control-policies-compliant-manipulation, title = {Learning force control policies for compliant manipulation}, author = {Mrinal Kalakrishnan and Ludovic Righetti and Peter Pastor and Stefan Schaal}, booktitle = {Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ}, pages = {4639--4644}, } @article{hinton-srivastava-12_NN-prevents-co-adaptation-of-feature-detectors, title={Improving neural networks by preventing co-adaptation of feature detectors}, author={Hinton, G.E. and Srivastava, N. and Krizhevsky, A. and Sutskever, I. and Salakhutdinov, R.R.}, journal={arXiv preprint arXiv:1207.0580}, year={2012}, } @article{silveira-malis-12_direct-visual-servoing-control-nonmetric, title={Direct visual servoing: Vision-based estimation and control using only nonmetric information}, author={Silveira, Geraldo and Malis, Ezio}, journal={Robotics, IEEE Transactions on}, volume={28}, number={4}, pages={974--980}, year={2012}, annote = { ABSTRACT: This paper addresses the problem of stabilizing a robot at a pose specified via a reference image.
Specifically, this paper focuses on six degrees-of-freedom visual servoing techniques that require neither metric information of the observed object nor precise camera and/or robot calibration parameters. Not requiring them improves the flexibility and robustness of servoing tasks. However, existing techniques within the focused class need prior knowledge of the object shape and/or of the camera motion. We present a new visual servoing technique that requires none of the aforementioned information. The proposed technique directly exploits 1) the projective parameters that relate the current image with the reference one and 2) the pixel intensities to obtain these parameters. The level of versatility and accuracy of servoing tasks are, thus, further improved. We also show that the proposed nonmetric scheme allows for path planning. In this way, the domain of convergence is greatly enlarged as well. Theoretical proofs and experimental results demonstrate that visual servoing can, indeed, be highly accurate and robust, despite unknown objects and imaging conditions. This naturally encompasses the cases of color images and illumination changes. this paper focuses on visual servoing techniques that do not require metric information of the observed target and can control all 6 DOF of a robot. The fact of not requiring metric information improves the flexibility and robustness of visual servoing tasks [6]. Indeed, recent studies in the domain of biological vision have suggested that the brain processes visual information nonmetrically [6]. Surprisingly, only few works have been conducted on the full 6 DOF nonmetric visual servoing. Moreover, these existing works require prior knowledge of the object shape and/or of the camera motion. [6] L. Thaler and M. A. Goodale, “Beyond distance and direction: The brain represents target locations non-metrically,” J. Vis., vol. 10, no. 3, pp. 1–27, 2010. 
}} @article{thaler-goodale-10-j-vision_beyond-distance-brain-non-metrically, title={Beyond distance and direction: The brain represents target locations non-metrically}, author={Thaler, Lore and Goodale, Melvyn A}, journal={Journal of Vision}, volume={10}, number={3}, year={2010}, publisher={Association for Research in Vision and Ophthalmology}, annote = { ABSTRACT: In their day-to-day activities human beings are constantly generating behavior, such as pointing, grasping or verbal reports, on the basis of visible target locations. The question arises how the brain represents target locations. One possibility is that the brain represents them metrically, i.e. in terms of distance and direction. Another equally plausible possibility is that the brain represents locations non-metrically, using for example ordered geometry or topology. Here we report two experiments that were designed to test if the brain represents locations metrically or non-metrically. We measured accuracy and variability of visually guided reach-to-point movements (Experiment 1) and probe-stimulus adjustments (Experiment 2). The specific procedure of informing subjects about the relevant response on each trial enabled us to dissociate the use of non-metric target location from the use of metric distance and direction in head/eye-centered, hand-centered and externally defined (allocentric) coordinates. The behavioral data show that subjects' responses are least variable when they can direct their response at a visible target location, the only condition that permitted the use of non-metric information about target location in our experiments. Data from Experiments 1 and 2 correspond well quantitatively. Response variability in non-metric conditions cannot be predicted based on response variability in metric conditions. We conclude that the brain uses non-metric geometrical structure to represent locations. 
}} @article{tahri-youcef-13ras_robust-visual-servoing-invariant, title={Robust image-based visual servoing using invariant visual information}, author={Tahri, Omar and Araujo, Helder and Chaumette, Fran{\c{c}}ois and Mezouar, Youcef}, journal={Robotics and Autonomous Systems}, volume={61}, number={12}, pages={1588--1600}, year={2013}, annote = { Catadioptric camera: a camera plus mirrors with a single optical center. A unified model for central imaging systems has been proposed in [9]. It consists in modeling the central imaging systems by two consecutive projections: spherical and then perspective ... [9] C. Geyer and K. Daniilidis. A Unifying Theory for Central Panoramic Systems and Practical Implications. In Computer Vision - ECCV 2000 (pp. 445-461). Springer Berlin Heidelberg. ABSTRACT: This paper deals with the use of invariant visual features for visual servoing. New features are proposed to control the 6 degrees of freedom of a robotic system with better linearizing properties and robustness to noise than the state of the art in image-based visual servoing. We show that by using these features the behavior of image-based visual servoing in task space can be significantly improved. Several experimental results are provided and validate our proposal. }} @article{candes-li-X-11-jacm_robust-PCA-noisy-matrix, title={Robust principal component analysis?}, author={Cand{\`e}s, Emmanuel J and Li, Xiaodong and Ma, Yi and Wright, John}, journal={Journal of the ACM (JACM)}, volume={58}, number={3}, pages={11}, year={2011}, annote = { ABSTRACT: This article is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually?
We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces. }} @article{vandenBerg-abbeel-11ijrr_lqg-mp-motion-planning-uncertainty, title={LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information}, author={Van Den Berg, Jur and Abbeel, Pieter and Goldberg, Ken}, journal={The International Journal of Robotics Research}, volume={30}, number={7}, pages={895--913}, year={2011}, publisher={SAGE Publications} } ==== @inproceedings{jurgens-stevens-11_impact-of-sense-similarity-on-WSD, title={Measuring the impact of sense similarity on word sense induction}, author={Jurgens, David and Stevens, Keith}, booktitle={Proceedings of the First Workshop on Unsupervised Learning in NLP}, pages={113--123}, year={2011}, annote = { ABSTRACT: We describe results of a word sense annotation task using WordNet, involving half a dozen well-trained annotators on ten polysemous words for three parts of speech. One hundred sentences for each word were annotated. 
Annotators had the same level of training and experience, but interannotator agreement (IA) varied across words. There was some effect of part of speech, with higher agreement on nouns and adjectives, but within the words for each part of speech there was wide variation. This variation in IA does not correlate with number of senses in the inventory, or the number of senses actually selected by annotators. In fact, IA was sometimes quite high for words with many senses. We claim that the IA variation is due to the word meanings, contexts of use, and individual differences among annotators. We find some correlation of IA with sense confusability as measured by a sense confusion threshold (CT). Data mining for association rules on a flattened data representation indicating each annotator’s sense choices identifies outliers for some words, and systematic differences among pairs of annotators on others. }} @inproceedings{goldwasser-roth-13acl_leveraging-domain-independent-semantics, author = {Dan Goldwasser and Dan Roth}, title = {Leveraging Domain-Independent Information in Semantic Parsing}, booktitle = {ACL}, year = {2013}, url = {http://cogcomp.cs.illinois.edu/papers/GoldwasserRoth13.pdf}, annote = { ABSTRACT: Semantic parsing is a domain-dependent process by nature, as its output is defined over a set of domain symbols. Motivated by the observation that interpretation can be decomposed into domain-dependent and independent components, we suggest a novel interpretation model, which augments a domain-dependent model with abstract information that can be shared by multiple domains. Our experiments show that this type of information is useful and can reduce the annotation effort significantly when moving between domains.
}}

==== ICCV

@inproceedings{ordonez-berg-13iccv_large-scale-image-entry-level-categories,
  title={From Large Scale Image Categorization to Entry-Level Categories},
  author={Ordonez, Vicente and Deng, Jia and Choi, Yejin and Berg, Alexander C and Berg, Tamara L},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2013},
  annote = { MARR PRIZE 2013 }}

@inproceedings{cinbis-verbeek-13iccv_segmentation-driven-object-detection,
  title={Segmentation Driven Object Detection with Fisher Vectors},
  author={Cinbis, Ramazan Gokberk and Verbeek, Jakob and Schmid, Cordelia and others},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2013},
  annote = { ABSTRACT: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results. }}

@inproceedings{faragasso-oriolo-13-icra_vision-based-humanoid-corridor-walk,
  title={Vision-based corridor navigation for humanoid robots},
  author={Faragasso, Angela and Oriolo, Giuseppe and Paolillo, Antonio and Vendittelli, Marilena},
  booktitle={Robotics and Automation (ICRA), 2013 IEEE International Conference on},
  pages={3190--3195},
  year={2013},
  organization={IEEE},
  annote = { Walks and turns (Nao) along a plain-wall artificial corridor environment.
ABSTRACT: We present a control-based approach for visual navigation of humanoid robots in office-like environments. In particular, the objective of the humanoid is to follow a maze of corridors, walking as close as possible to their center to maximize motion safety. Our control algorithm is inspired by a technique originally designed for unicycle robots and extended here to cope with the presence of turns and junctions. The feedback signals computed for the unicycle are transformed to inputs that are suited for the locomotion system of the humanoid, producing a natural, human-like behavior. Experimental results for the humanoid robot NAO are presented to show the validity of the approach, and in particular the successful extension of the controller to turns and junctions.

[6] J. M. Toibero, C. M. Soria, F. Roberti, R. Carelli, and P. Fiorini, “Switching visual servoing approach for stable corridor navigation,” in 14th International Conference on Advanced Robotics, pp. 1–6, 2009. }}

@inproceedings{vandenberg-lin-manocha-08_reciprocal-velocity-obstacles,
  title={Reciprocal velocity obstacles for real-time multi-agent navigation},
  author={Van den Berg, Jur and Lin, Ming and Manocha, Dinesh},
  booktitle={Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on},
  pages={1928--1935},
  year={2008},
  organization={IEEE},
  annote={ ABSTRACT: In this paper, we propose a new concept, the “Reciprocal Velocity Obstacle”, for real-time multi-agent navigation. We consider the case in which each agent navigates independently without explicit communication with other agents. Our formulation is an extension of the Velocity Obstacle concept [3], which was introduced for navigation among (passively) moving obstacles. Our approach takes into account the reactive behavior of the other agents by implicitly assuming that the other agents make a similar collision-avoidance reasoning. We show that this method guarantees safe and oscillation-free motions for each of the agents.
We apply our concept to navigation of hundreds of agents in densely populated environments containing both static and moving obstacles, and we show that real-time and scalable performance is achieved in such challenging scenarios. }}
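NOTE (annotation sketch, not from the abstract above): the key definition in the RVO paper can be stated in one line. Writing $VO^A_B(\mathbf{v}_B)$ for the ordinary velocity obstacle (the set of velocities for agent $A$ that lead to a collision with agent $B$ moving at velocity $\mathbf{v}_B$), the reciprocal velocity obstacle of $B$ for $A$ is

\[
RVO^A_B(\mathbf{v}_B, \mathbf{v}_A) = \left\{ \mathbf{v}'_A \;\middle|\; 2\mathbf{v}'_A - \mathbf{v}_A \in VO^A_B(\mathbf{v}_B) \right\}
\]

i.e., a new velocity $\mathbf{v}'_A$ is forbidden exactly when its average with the current velocity $\mathbf{v}_A$ lies inside the ordinary velocity obstacle, so each agent takes half the responsibility for avoidance; this symmetry is what removes the oscillations of the naive VO scheme.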