J-Linkage

Posted by Tong on February 3, 2020

J-linkage [1]

  • Problem: fitting multiple instances of a model to data corrupted by noise and outliers.

  • Theory: Based on random sampling and conceptual data representation. Each point is represented with the characteristic function of the set of random models that fit the point.

  • A tailored agglomerative clustering, called J-linkage, is used to group points belonging to the same model.

  • Advantages

    • no prior specification of the number of models is required
    • no parameter tuning is required

Definitions

  • Pseudo-outlier: a point that is an outlier to the structure of interest but an inlier to a different structure.

  • Minimal sample sets (MSS): The minimal sets of data points necessary to estimate the model (e.g., two points for a line).

  • Consensus set (CS): The CS of a model is the set of points whose distance from the model is less than a threshold \(\sigma\).

  • Preference set (PS): Imagine building an \(N \times M\) matrix where entry \((i, j)\) is \(1\) if point \(i\) belongs to the CS of model \(j\), and 0 otherwise. Each column of that matrix is the characteristic function of the CS of a model hypothesis. Each row indicates which models a point has given consensus to, i.e., which models it prefers. We call this the preference set (PS) of a point (see the sketch after this list).

  • Conceptual representation: The characteristic function of the preference set of a point can be regarded as a conceptual representation of that point. Points belonging to the same structure will have similar conceptual representations, in other words, they will cluster in the conceptual space.

  • Jaccard distance: Given two sets A and B, the Jaccard distance is \(d_{\mathrm{J}}(A, B)=\frac{|A \cup B|-|A \cap B|}{|A \cup B|}\).

The Jaccard distance measures the degree of overlap of the two sets and ranges from 0 (identical sets) to 1 (disjoint sets).
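To make the conceptual representation concrete, here is a minimal sketch in Python of how the \(N \times M\) preference matrix and the Jaccard distance could be computed for 2-D line fitting. The helper names (sample_line_hypotheses, build_preference_matrix, jaccard_distance) and the line model are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def sample_line_hypotheses(points, num_models, rng):
    """Fit a line to each randomly drawn minimal sample set (MSS of size 2)."""
    models = []
    for _ in range(num_models):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        # Implicit line a*x + b*y + c = 0 through p and q, with unit normal (a, b).
        a, b = q[1] - p[1], p[0] - q[0]
        norm = np.hypot(a, b) + 1e-12
        a, b = a / norm, b / norm
        c = -(a * p[0] + b * p[1])
        models.append((a, b, c))
    return models

def build_preference_matrix(points, models, sigma):
    """Entry (i, j) is True iff point i lies within distance sigma of model j (its CS)."""
    pref = np.zeros((len(points), len(models)), dtype=bool)
    for j, (a, b, c) in enumerate(models):
        dist = np.abs(points @ np.array([a, b]) + c)  # point-to-line distance ((a, b) is unit length)
        pref[:, j] = dist < sigma
    return pref

def jaccard_distance(A, B):
    """d_J(A, B) = (|A ∪ B| - |A ∩ B|) / |A ∪ B| for preference sets stored as boolean vectors."""
    union = np.logical_or(A, B).sum()
    if union == 0:
        return 1.0  # convention: two empty preference sets are treated as maximally distant
    return (union - np.logical_and(A, B).sum()) / union
```

For instance, two points whose preference sets share 2 of 5 hypotheses overall have \(d_{\mathrm{J}} = (5 - 2)/5 = 0.6\); points lying on the same structure tend to share many hypotheses and therefore have a small Jaccard distance.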

Clustering

  • The general agglomerative clustering algorithm proceeds in a bottom-up manner: starting from all singletons, each sweep of the algorithm merges the two clusters with the smallest distance. The way the distance between clusters is computed produces different flavours of the algorithm, namely single linkage, complete linkage and average linkage (see the sketch below).
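For reference, a minimal sketch of these standard flavours (assuming NumPy and SciPy are available; this is ordinary spatial agglomerative clustering, not yet J-linkage):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs of 20 points each.
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # bottom-up merges, closest pair of clusters first
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into two clusters
    print(method, np.bincount(labels)[1:])           # number of points per cluster for each flavour
```

J-linkage follows the same bottom-up scheme but measures distances between preference sets rather than between spatial coordinates.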

Algorithm: J-linkage

  • Input: the set of data points, each point represented by its preference set (PS)

  • Output: clusters of points belonging to the same model

  • Process (a code sketch follows this list)

    1. Put each point in its own cluster.
    2. Define the PS of a cluster as the intersection of the PSs of its points.
    3. Among all current clusters, pick the two clusters with the smallest Jaccard distance between the respective PSs.
    4. Replace these two clusters with the union of the two original ones.
    5. Repeat from step 3 while the smallest Jaccard distance is lower than 1.
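A compact sketch of this loop, reusing the hypothetical jaccard_distance helper and the boolean preference matrix from the earlier sketch (a plain \(O(n^3)\) version for clarity, not the authors' optimised implementation):

```python
def j_linkage(pref):
    """Agglomerative clustering of points represented by their preference sets (rows of pref)."""
    clusters = [[i] for i in range(pref.shape[0])]               # 1. every point starts as a singleton
    cluster_ps = [pref[i].copy() for i in range(pref.shape[0])]  #    whose cluster PS is its own PS

    while True:
        best_d, best_a, best_b = 1.0, None, None
        for a in range(len(clusters)):                           # 3. pair of clusters with the smallest
            for b in range(a + 1, len(clusters)):                #    Jaccard distance between their PSs
                d = jaccard_distance(cluster_ps[a], cluster_ps[b])
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        if best_a is None:                                       # 5. stop once every distance reaches 1
            break
        clusters[best_a].extend(clusters[best_b])                # 4. merge the two clusters and
        cluster_ps[best_a] = cluster_ps[best_a] & cluster_ps[best_b]  # 2. intersect their PSs
        del clusters[best_b], cluster_ps[best_b]
    return clusters
```

Putting the pieces together: `clusters = j_linkage(build_preference_matrix(points, sample_line_hypotheses(points, 500, rng), sigma))`, where the number of hypotheses (here 500) is an arbitrary illustrative choice. In this sketch, points whose preference sets share no hypothesis with any other cluster remain singletons, which is typically how gross outliers end up.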

T-linkage [2]

T-linkage is a continuous relaxation of J-linkage for multi-model fitting [2].

Literature

  1. R. Toldo and A. Fusiello. Robust multiple structures estimation with J-linkage. In European Conference on Computer Vision, pages 537–547, 2008. 

  2. L. Magri and A. Fusiello. T-linkage: A continuous relaxation of J-linkage for multi-model fitting. In IEEE Conference on Computer Vision and Pattern Recognition, 2014.