- Leibe, B.; Leonardis, A. & Schiele, B.: Robust Object Detection with Interleaved Categorization and Segmentation. IJCV, Kluwer Academic Publishers, 2008, 77, 259-289
  - the original work presenting the RNN clustering idea
- López-Sastre, R. J.; Oñoro-Rubio, D.; Gil-Jiménez, P. & Maldonado-Bascón, S.: Fast Reciprocal Nearest Neighbors Clustering. Signal Processing, 2012, 92, 270-275
  - a variant of RNN clustering which focuses on accelerating it even further

- in Leibe's paper you will find pseudo-code for the RNN clustering algorithm
- but: the main point is not mentioned! what are the final clusters? quite funny that they mention it neither in the pseudo-code nor in the paper…
- for this issue have a look into López-Sastre's paper
- it looks similar to the pseudo-code in Leibe's paper, doesn't it? but compare the `discard NN chain` lines…
- in López-Sastre's paper you will find the important hint that each time you discard an NN chain, you can put all the clusters currently in the NN chain into the list of final clusters

- both papers don't tell you what to do in special cases!
- here are the two most important special cases, which are covered neither by Leibe's nor by López-Sastre's pseudo-code
- a cluster can be in exactly one of these three lists:
  - the NN-chain list `L`
  - the list `R` of *open clusters* still to consider
  - the final cluster list `C`

- special case #1: the algorithm stops with `R` empty, but there are still some clusters in the NN-chain list `L` –> we have to check whether we can still merge them further; if not, we discard the chain and put the remaining clusters into the final cluster list `C`

- special case #2: the algorithm stops with an empty NN chain `L` and `R` has exactly one cluster `v` inside –> we cannot find an NN `v2` for this cluster in the list of open clusters `R` (since there is only `v` inside)! so we put `v` into the final cluster list `C`
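Putting the chain logic and both special cases together, here is a minimal runnable sketch. It is my own illustrative implementation, not the papers' reference code: clusters are plain lists of point tuples, `Sim()` is the group average criterion with negative Euclidean distance as `VecSim()`, and the names (`rnn_cluster`, `L`, `R`, `C`, `S`) mirror the lists above.

```python
import math

def sim(a, b):
    # group average criterion: mean VecSim() over all cross pairs,
    # with VecSim() = negative Euclidean distance (illustrative choice)
    return -sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

def rnn_cluster(points, t):
    R = [[p] for p in points]   # open clusters still to consider
    C = []                      # final clusters
    L = []                      # NN chain
    S = []                      # S[i] = sim(L[i], L[i+1])
    while R or L:
        if not L:
            if len(R) == 1:     # special case #2: lone cluster, no NN for it
                C.append(R.pop())
                continue
            L.append(R.pop())   # start a new chain from an arbitrary cluster
        if not R:
            # special case #1: R is empty but the chain is not; check
            # whether the chain tail can still be merged, otherwise
            # flush the remaining chain into the final cluster list
            if len(L) >= 2 and S[-1] > t:
                b, a = L.pop(), L.pop()
                del S[-2:]
                R.append(a + b)
            else:
                C.extend(L)
                L, S = [], []
            continue
        v = max(R, key=lambda c: sim(L[-1], c))   # NN of the chain tail in R
        s = sim(L[-1], v)
        if not S or s > S[-1]:
            L.append(v)          # similarity still increasing: extend chain
            S.append(s)
            R.remove(v)
        elif S[-1] > t:
            # L[-2] and L[-1] are reciprocal NNs: agglomerate them
            b, a = L.pop(), L.pop()
            del S[-2:]
            R.append(a + b)      # merged cluster becomes an open cluster again
        else:
            # RNN pair below threshold: discard the chain and, following
            # López-Sastre's hint, move ALL its clusters to C
            C.extend(L)
            L, S = [], []
    return C
```

For example, `rnn_cluster([(0, 0), (0, 1), (10, 0), (10, 1)], t=-5.0)` yields the two expected clusters `{(0, 0), (0, 1)}` and `{(10, 0), (10, 1)}`.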

- RNN + fast-RNN reference implementation by Roberto J. López-Sastre

- RNN clustering is based on the construction of reciprocal nearest neighbor pairs (RNN pairs), that is, pairs of clusters `a` and `b` such that `a` is `b`'s nearest neighbor and vice versa
- why are RNNs so cool?
- the merging of the two clusters `a, b` of a reciprocal nearest-neighbor pair into a new cluster `ab` does not alter the nearest-neighbor relations of any other clusters `c` and `d`
- i.e. after merging `a, b` into a new cluster `ab` we just have to recompute the distances between `ab` and all other clusters `c`, but that's it!
- BUT: this is only true if we make sure *Bruynooghe's reducibility property* is fulfilled!
- this criterion demands a cluster similarity measure `Sim()` such that when two clusters `a` and `b` are agglomerated, the similarity of the merged cluster `ab` to any third cluster `c` may only decrease – compared to the state before the merging action
- and which cluster similarity measures `Sim()` fulfill this criterion?
- among others:
  - group average criterion – regardless of the used `VecSim()` measure
  - centroid criterion based on correlation – but not on Euclidean distances!
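Why the group average criterion always works can be seen from the fact that the group-average similarity of the merged cluster `ab` to any third cluster `c` is a size-weighted mean of `Sim(a, c)` and `Sim(b, c)`, so it can never exceed the larger of the two. A small numerical sketch of this (using cosine similarity as `VecSim()` and random vectors – both my own illustrative choices):

```python
import math
import random

def cos_sim(x, y):
    # cosine similarity between two vectors
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny)

def group_avg_sim(X, Y):
    # group average criterion: mean VecSim() over all cross pairs
    return sum(cos_sim(x, y) for x in X for y in Y) / (len(X) * len(Y))

random.seed(42)
rand_vec = lambda: [random.uniform(-1.0, 1.0) for _ in range(3)]
a = [rand_vec() for _ in range(4)]
b = [rand_vec() for _ in range(3)]
c = [rand_vec() for _ in range(5)]

# the merged similarity is a size-weighted mean of the two parts ...
merged = group_avg_sim(a + b, c)
expected = (4 * group_avg_sim(a, c) + 3 * group_avg_sim(b, c)) / 7
assert abs(merged - expected) < 1e-12
# ... and can therefore only decrease relative to the better part
assert merged <= max(group_avg_sim(a, c), group_avg_sim(b, c)) + 1e-12
```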

- the basic idea here is to construct a nearest-neighbor chain
- what's that?
- it consists of an arbitrary start cluster, which is followed by its nearest neighbor, which is again followed by its nearest neighbor from among the remaining points, and so on
- note that when the nearest neighbor for the last chain element n is the element n-1, we stop generating the NN chain
- we have found an RNN pair and can agglomerate it
- and now comes the actual point!
- the nearest-neighbor assignments stay valid for the remaining chain members, which can thus be reused for the next iteration of agglomeration
- if we do not find any further RNN pair in the remaining NN chain, we create a new one from – again – a random start point
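The chain construction just described can be sketched in isolation like this. It is a minimal sketch on raw 2-D points; the function name `build_nn_chain` and the use of negative Euclidean distance as similarity are my own choices.

```python
import math

def build_nn_chain(start, R):
    """Grow an NN chain from `start`, consuming clusters from `R`.
    Stops as soon as the similarity along the chain stops increasing,
    i.e. the last two chain elements are reciprocal nearest neighbors."""
    chain = [start]
    last_sim = -math.inf
    while R:
        v = min(R, key=lambda p: math.dist(chain[-1], p))  # NN of chain tail
        s = -math.dist(chain[-1], v)    # similarity = negative distance
        if s <= last_sim:
            break   # NN of chain[-1] is chain[-2]: RNN pair found
        chain.append(v)
        R.remove(v)
        last_sim = s
    return chain

# the chain walks (0,0) -> (1,0) and stops: those two are reciprocal NNs,
# since (5,0) is farther from (1,0) than (0,0) is
points = [(1.0, 0.0), (5.0, 0.0), (5.5, 0.0)]
chain = build_nn_chain((0.0, 0.0), points)
```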

- we can recompute the new distances very fast – 'fast' means: not depending on the number of vectors assigned to a cluster, but in constant time
- for this we use the group average `Sim()` measure and represent the 'centroid' of a cluster by the mean and the variance, which can be updated iteratively when merging clusters
- for details see Leibe et al.'s paper
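A sketch of how such a constant-time representation can look: each cluster stores only its size, mean vector, and mean squared norm (which together carry the same information as mean plus variance). Class and function names are my own, and the exact statistics Leibe et al. track may differ.

```python
import math

class Cluster:
    """Fixed-size summary of a cluster: size, mean vector, and the
    mean squared norm E[||x||^2] (equivalent to mean + variance)."""
    def __init__(self, points):
        self.n = len(points)
        dim = len(points[0])
        self.mean = [sum(p[i] for p in points) / self.n for i in range(dim)]
        self.sq = sum(sum(x * x for x in p) for p in points) / self.n

    def merge(self, other):
        """Merge two cluster summaries in O(dim), independent of how
        many vectors are assigned to either cluster."""
        m = object.__new__(Cluster)
        m.n = self.n + other.n
        m.mean = [(self.n * a + other.n * b) / m.n
                  for a, b in zip(self.mean, other.mean)]
        m.sq = (self.n * self.sq + other.n * other.sq) / m.n
        return m

def avg_sq_dist(X, Y):
    """Group-average squared Euclidean distance between two clusters,
    computed from the summaries alone:
    E[||x - y||^2] = E[||x||^2] + E[||y||^2] - 2 * <mean_X, mean_Y>."""
    dot = sum(a * b for a, b in zip(X.mean, Y.mean))
    return X.sq + Y.sq - 2.0 * dot
```

Merging two summaries gives exactly the summary one would get from recomputing over all member points, which is why the per-merge cost stays constant.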

This demo will show you the RNN clustering procedure for a larger set of sample vectors to be clustered. Note that the resulting cluster sizes heavily depend on the threshold `t` defined by the user when launching the clustering process.

public/reciprocal_nearest_neighbour_clustering_rnn.txt · Last modified: 2012/06/21 16:51 (external edit)