3.2.1 Algorithm

In the extension of Neural Gas to NGPCA, a code-book vector is replaced by a hyper-ellipsoid. An ellipsoid has a center *c*, and axes given by a local PCA, which extracts *q* eigenvectors *w*_{l} and eigenvalues λ_{l}, with *l* = 1,..., *q*. The size of the ellipsoid in the direction of the *d* − *q* minor components is given by the mean residual variance σ² in these directions. The algorithm has the same annealing scheme as Neural Gas, and it also has the same parameters. Again, at the beginning of each annealing step, a data point *x* is randomly drawn from the training set. After presenting a point, the centers *c*_{j} are updated as the code-book vectors in Neural Gas,

$$\Delta \mathbf{c}_j = \varepsilon\, a_j\, (\mathbf{x} - \mathbf{c}_j) \;. \tag{3.1}$$

The weight *a*_{j} is defined by *a*_{j} = exp(−*k*_{j}/ρ), where *k*_{j} is the rank of unit *j* and ρ is the annealed neighborhood range, as in Neural Gas.

The ranking of the units cannot depend on the Euclidean distance anymore, since this would ignore the shape of the ellipsoids. Instead, an elliptical error measure is chosen,

$$E(\mathbf{x}) = \boldsymbol{\xi}^T \mathbf{W} \boldsymbol{\Lambda}^{-1} \mathbf{W}^T \boldsymbol{\xi} + \frac{1}{\sigma^2} \left( \boldsymbol{\xi}^T \boldsymbol{\xi} - \boldsymbol{\xi}^T \mathbf{W} \mathbf{W}^T \boldsymbol{\xi} \right) \;. \tag{3.2}$$

This measure is a normalized Mahalanobis distance plus reconstruction error (Hinton et al., 1997). ξ = *x* − *c* is the deviation of the vector *x* from the center *c* of a unit. The matrix *W* contains in its columns the eigenvectors *w*_{l}, and the diagonal matrix Λ contains the eigenvalues λ_{l}. It can be shown (appendix C.1) that this error measure is the same (up to a constant) as the double negative logarithm of the local probability density given in probabilistic PCA. Thus, a unit can be interpreted as a local density

$$p(\mathbf{x}) = (2\pi)^{-d/2} \left( \prod_{l=1}^{q} \lambda_l \right)^{-1/2} \sigma^{-(d-q)} \exp\left( -\frac{E(\mathbf{x})}{2} \right) \;. \tag{3.3}$$

The second term in (3.2) is the reconstruction error divided by σ². σ² depends on the estimate of the total residual variance *v*_{res}, which is updated according to

$$\Delta v_{\text{res}} = \varepsilon\, a \left( \boldsymbol{\xi}^T \boldsymbol{\xi} - \boldsymbol{\xi}^T \mathbf{W} \mathbf{W}^T \boldsymbol{\xi} - v_{\text{res}} \right) \;. \tag{3.4}$$

The total residual variance is evenly distributed among all *d* − *q* minor directions,

$$\sigma^2 = \frac{v_{\text{res}}}{d - q} \;. \tag{3.5}$$

This equation is the same as for the noise σ² in probabilistic PCA (2.11). To adjust the principal axes and their lengths, we do one step with an on-line PCA method.
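As a minimal sketch of the elliptical error and the residual-variance bookkeeping, assuming *W* has orthonormal columns; the function and variable names (`elliptical_error`, `eps_a`, `recon_err`) are illustrative, not from the original:

```python
import numpy as np

def elliptical_error(x, c, W, lam, sigma2):
    """Normalized Mahalanobis distance plus reconstruction error.
    W: d x q matrix with orthonormal eigenvector columns, lam: q eigenvalues,
    sigma2: mean residual variance in the d - q minor directions."""
    xi = x - c                       # deviation from the unit's center
    y = W.T @ xi                     # projection onto the principal axes
    mahalanobis = np.sum(y**2 / lam)
    recon_err = xi @ xi - y @ y      # squared distance to the principal subspace
    return mahalanobis + recon_err / sigma2

def update_residual_variance(v_res, recon_err, eps_a, d, q):
    """Leaky average of the reconstruction error; sigma^2 spreads the total
    residual variance evenly over the d - q minor directions."""
    v_res = v_res + eps_a * (recon_err - v_res)
    return v_res, v_res / (d - q)

# toy usage: a 3-D unit with a 2-D principal subspace
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
E = elliptical_error(np.array([2.0, 1.0, 3.0]), np.zeros(3),
                     W, np.array([4.0, 1.0]), sigma2=1.0)
```

For the toy point, the Mahalanobis term is 2²/4 + 1²/1 = 2 and the reconstruction error is 3² = 9, so E = 11.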

We use a PCA algorithm similar to the robust recursive least square algorithm (RRLSA) (Ouyang et al., 2000). RRLSA is a sequential network of single-neuron principal component analyzers based on deflation of the input vector (Sanger, 1989). While the *w*_{l} are normalized to unit length, internally the algorithm works with unnormalized vectors *ŵ*_{l}, which are updated according to

$$\Delta \hat{\mathbf{w}}_l = \varepsilon\, a \left( y_l\, \boldsymbol{\xi}_l - \hat{\mathbf{w}}_l \right), \qquad y_l = \mathbf{w}_l^T \boldsymbol{\xi}_l\,, \qquad \boldsymbol{\xi}_{l+1} = \boldsymbol{\xi}_l - y_l\, \mathbf{w}_l\,, \qquad \boldsymbol{\xi}_1 = \boldsymbol{\xi} \;. \tag{3.6}$$

After each step, the eigenvalue estimates and the normalized eigenvectors are obtained as

$$\lambda_l = \|\hat{\mathbf{w}}_l\|\,, \qquad \mathbf{w}_l = \frac{\hat{\mathbf{w}}_l}{\|\hat{\mathbf{w}}_l\|} \;. \tag{3.7}$$
Since the orthogonality of the *w*_{l} is not fully preserved in each step, the algorithm has to be combined with an orthogonalization method; here, we use Gram-Schmidt (Möller, 2002). Orthogonality is essential for the computation of the error (3.2). This orthogonalization concludes one annealing step.
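The orthogonalization can be sketched with classical Gram-Schmidt on the columns of *W*; this is a generic illustration, not necessarily the exact procedure of (Möller, 2002):

```python
import numpy as np

def gram_schmidt(W):
    """Orthonormalize the columns of W in order: from each column, remove
    the components along the previous (already orthonormal) columns, then
    renormalize it to unit length."""
    W = W.astype(float).copy()
    for l in range(W.shape[1]):
        for m in range(l):
            W[:, l] -= (W[:, m] @ W[:, l]) * W[:, m]
        W[:, l] /= np.linalg.norm(W[:, l])
    return W

Q = gram_schmidt(np.array([[1.0, 1.0], [0.0, 1.0]]))
```

Processing the columns in order of decreasing eigenvalue keeps the best-estimated directions unchanged and corrects the weaker ones against them.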

The unit centers are initialized by randomly chosen examples from the training set. The eigenvector estimates are initialized with random orthogonal vectors. The eigenvalues λ_{l} and the residual variances *v*_{res} are initially set to one. To avoid zero variance in any direction during the computation of the local PCA, uniform noise randomly chosen from the interval [−0.0005, 0.0005] is added to each component of a randomly drawn data point (in all experiments).
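The initialization just described can be sketched as follows; the number of units `m` and the helper names (`init_units`, `jitter`) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_units(X, m, q):
    """Initialize m units from a training set X (n x d): centers are random
    training points, eigenvector estimates are random orthogonal vectors
    (via QR), eigenvalues and residual variances are set to one."""
    n, d = X.shape
    centers = X[rng.choice(n, size=m, replace=False)].copy()
    Ws = [np.linalg.qr(rng.standard_normal((d, q)))[0] for _ in range(m)]
    lams = np.ones((m, q))
    v_res = np.ones(m)
    return centers, Ws, lams, v_res

def jitter(x):
    """Add uniform noise from [-0.0005, 0.0005] to each component of a
    drawn data point, avoiding zero variance in any direction."""
    return x + rng.uniform(-0.0005, 0.0005, size=x.shape)

X = rng.standard_normal((10, 3))
centers, Ws, lams, v_res = init_units(X, m=2, q=2)
```

The QR decomposition is just a convenient way to draw a random orthonormal basis for each unit's principal subspace.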
