Learning image segmentation and recognition
Learning spectral graph segmentation. Timothee Cour, Nicolas Gogin, Jianbo Shi.
Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2005.

A learnable spectral memory graph for recognition and segmentation. Timothee Cour, Jianbo Shi. 
Technical Report MS-CIS-04-12, University of Pennsylvania CIS Technical Reports, Philadelphia, PA, June 2004.


Image segmentation is often treated as an unsupervised task. Segmentation by human, in contrast, relies heavily on memory to produce an object-like clustering, through a mechanism of controlled hallucination. This paper presents a learning algorithm for memory-driven object segmentation and recognition. We propose a general spectral graph learning algorithm based on gradient descent in the space of graph weight matrix using derivatives of eigenvectors. The gradients are efficiently computed using the theory of implicit functions. This algorithm effectively learns a graph network capable of memorizing and retrieving multiple patterns given noisy inputs. We demonstrate the validity of this approach on segmentation and recognition tasks, including geometric shape extraction, and hand-written digit recognition.

Encoding Recognition and Segmentation with Spectral Graph Cuts

A Spectral Memory Graph consists of a set nodes (pixels, features, class labels), a set of edges indicating which nodes are connected, and a weight function W specifying their pair-wise similarity. Given an input image I, we detect a subset of “on” features, inducing the weight submatrix W(I). The output for both segmentation and recognition in image I is encoded with Ncut eigenvector Xncut, defined as the second eigenvector of the system W(I)X = µ DW(I)X, where DW = diag(W1). The aim is to learn W either directly in the coefficient space (Wij) or through a parameterization.
Three applications are detailed below to illustrate the concept.

Learning algorithm

We use the following error functions for learning images {I}:
 equation 1
    equation 5  

where X*(I) is the target and Xncut[W(I)] is the output (Ncut eigenvector of W(I)). Our goal is to minimize the error function by gradient descent :

equation 2             equation 3

The main difficulty is to compute this gradient term, which involves the partial derivatives of eigenvectors w.r.t. all the coefficients Wij. We show in the paper how to do this efficiently in exact analytical form.


1. Segmentation and Recognition of noisy digits

We show how to apply the algorithm to segmentation and recognition of noisy digits (taken from MNIST database).
Spectral memory Graph for digit segmentation and recognition

input images
learning noisy digits
Results: Top row = Input images ; middle row = target images ; bottom row = learned segmentation

2. Learning boolean functions

We demonstrate that we can learn non-linearly separable functions such as XOR. Here a network is trained to learn simultaneously AND, OR, XOR. The graph nodes are {x, ¬x, y, ¬y, zAND, zOR, zXOR}, and the weight matrix W is of size 7x7 (49 parameters to learn). Note that a seperate network was also trained with just
{x, ¬x, y, ¬y, zXOR}.
 boolean learning
Learning functions AND, OR, XOR. Graph nodes:
{x, ¬x, y, ¬y, zAND, zOR, zXOR}.
Ncut on a subgraph extracted
from the learned graph W encodes the desired logical operations.

3. Learning geometrical shapes

We learned a parameterization of the weights that is invariant by translation, allowing us to learn square shapes with a very limited training set. The graph nodes are pairs of edgelet position xi and edglet angle µj. The graph weights W(xi,µj  ; xi′ ,µj′ ) = f(xi′ −xi, µj′ −µj) are learned with 50 training examples on image size 100 × 100. f has a total of 4096 free parameters to be learned (edgelet position displacements ±dx=±dy =16, edgelet orientation displacements ±dµ = 4).






training set

testing set

testing on real images


Back to my home page