Next: 4.7 Discussion
Up: 4. Abstract recurrent neural
Previous: 4.5.2 Results
4.6 Dependence on the number of input dimensions
As shown in section 4.5, the performance of the abstract recurrent neural network depends on the number of input dimensions chosen. A higher number of input dimensions results in a higher sum of square errors per pattern. This relationship is investigated theoretically on a simplified abstract RNN.
The first simplification is that the model consists of spheres instead of ellipsoids. Thus, the distribution of training data is approximated by a set of m codebook vectors c^{j}, and for each of them the potential field is given by the Euclidean distance. The second simplification is that the codebook vectors are uniformly randomly distributed. This is justified for distributions that spread over the whole pattern space and are not restricted to embedded hyperplanes, as is likely the case for the kinematic arm model. Recall works as in section 4.2 (see also figure 4.14).
Figure 4.14:
The trained abstract network consists of a set of codebook vectors, here illustrated as circles. These vectors are distributed randomly. Recall happens by finding the closest vector to a constraint space (gray line). The
offset of this line from the origin is the input.

We assume that the codebook vectors lie inside a d-dimensional cube of side length two, centered at the origin. Since the codebook vectors are distributed uniformly, the average error is independent of the specific value of the input (the offset of the constrained space)^{4.3}. Thus, we arrange the constrained spaces such that they all go through the origin. If d - 1 input dimensions are given, the constraint space is the x_{d}-axis. For d - 2 input dimensions, the axes x_{d} and x_{d-1} span the constraint space, and so on (see figure 4.15 for an example). Instead of choosing different input values to compute the sum of square errors, we draw a new set of codebook vectors from a random distribution for each test trial.
Figure 4.15:
Example using r = 1 input dimension in an n = 3-dimensional space. The x_{2}-x_{3} plane is the constrained space. Codebook vectors are illustrated as circles. Possible codebook locations are enclosed by the cube drawn with dotted lines. The dashed lines show the distances between the constraint space and the codebook vectors.

Given r input dimensions, the squared distance E^{j} of codebook vector c^{j} to the constrained space is

E^{j} = \sum_{i=1}^{r} (c^{j}_{i})^{2} .
(4.8)
We define the square error E as the minimum squared distance to the data approximation (in the kinematic arm model, this matches the computation of the square error in the case of arbitrary directions, see section 4.5.1). Thus,
E = \min\left\{ \sum_{i=1}^{r} (c^{1}_{i})^{2}, \sum_{i=1}^{r} (c^{2}_{i})^{2}, \dots, \sum_{i=1}^{r} (c^{m}_{i})^{2} \right\} .
(4.9)
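As a small sketch, the square error of equations (4.8) and (4.9) can be computed for one random draw of codebook vectors. The function name `square_error` and the sampling code are illustrative assumptions, not code from the thesis:

```python
import random

def square_error(codebook, r):
    """Equations (4.8)/(4.9): minimum, over all codebook vectors, of the
    squared distance to the constraint space through the origin, i.e. the
    minimum over j of the sum of squares of the first r coordinates."""
    return min(sum(c[i] ** 2 for i in range(r)) for c in codebook)

# One test trial: m codebook vectors drawn uniformly from the d-cube [-1, 1]^d
random.seed(0)
d, r, m = 10, 3, 200
codebook = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(m)]
E = square_error(codebook, r)   # one sample of the square error E
```

Averaging E over many such trials gives the sum of square errors per pattern that is analyzed below.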
The sum of square errors per pattern is the expectation value of E given a random distribution of the codebook vectors. To compute the expectation of a minimum, we use the following trick (Wentzell, 2003). The cumulative probability P_{c}(T) that all E^{j} are larger than a threshold T is computed. P_{c}(T) is monotonically decreasing, starting with P_{c}(0) = 1. The negative derivative of P_{c}(T) can be interpreted as a probability density function of T. For a given T, this function provides the probability density that T equals the smallest member of the set {E^{j}}. Thus, the expectation value of T (the first moment of this probability density function) equals the expectation value of E, the minimum of {E^{j}}. Therefore,

\langle E \rangle = -\int_{0}^{\infty} T \, \frac{dP_{c}(T)}{dT} \, dT .
(4.10)
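This expectation-of-the-minimum trick can be illustrated on a case with a known answer: for the minimum of m independent Uniform(0, 1) draws, the survival function is P_{c}(T) = (1 - T)^{m} and the expectation is 1/(m + 1). The sketch below (function names are mine, not the author's) checks that integrating the survival function reproduces this value:

```python
import random

def expected_min_via_survival(m, steps=10_000):
    """Midpoint-rule integral of the survival function P_c(T) = (1 - T)^m
    over [0, 1]; by the trick above, this equals the expected minimum of
    m independent Uniform(0, 1) variables."""
    dT = 1.0 / steps
    return sum((1.0 - (k + 0.5) * dT) ** m * dT for k in range(steps))

m = 4
analytic = 1.0 / (m + 1)              # known expectation of the minimum
integral = expected_min_via_survival(m)

# Monte Carlo cross-check of the same expectation
rng = random.Random(1)
mc = sum(min(rng.random() for _ in range(m)) for _ in range(100_000)) / 100_000
```

Both the integral of the survival function and the direct Monte Carlo average converge to 1/(m + 1), which is the identity exploited in (4.10) and (4.15).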
The probability p that a point has a squared distance E^{j} larger than or equal to T is the cube volume outside the r-sphere^{4.4} with radius \sqrt{T} centered at the origin, divided by the cube volume (figure 4.16). The cumulative probability P_{c} is p to the power of m.
Figure 4.16:
The gray area relative to the total area of the square is the
probability that a point lies outside the circle. Two examples with different radii are shown.

To make the function p over T analytically integrable, we make an approximation. The r-cube, the space of all possible codebook vectors (in the first r dimensions), is replaced by an r-sphere with radius \sqrt{r} centered at the origin. This sphere tightly encloses the r-cube. To compensate for this step, we multiply the number of codebook vectors m by the r-sphere volume divided by the r-cube volume. The volume of an r-sphere with unit radius can be written as

V_{r} = \frac{\pi^{r/2}}{\Gamma(r/2 + 1)} .
(4.11)
\Gamma is the Gamma function. It is related to the factorial by \Gamma(n) = (n - 1)! for positive integers n. For r = 2, for example, the above volume V_{r} equals \pi. Hence, the resulting volume relation v^{r} (here, for the sphere radius \sqrt{r}) is

v^{r} = \frac{(\pi r / 4)^{r/2}}{\Gamma(r/2 + 1)} .
(4.12)
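The volume relation (4.12) is straightforward to evaluate with the Gamma function; a minimal sketch (the function name is an assumption of mine):

```python
from math import pi, gamma

def volume_ratio(r):
    """Equation (4.12): volume of the r-sphere with radius sqrt(r),
    divided by the volume of the r-cube with side length 2."""
    return (pi * r / 4.0) ** (r / 2.0) / gamma(r / 2.0 + 1.0)
```

For r = 1 the sphere and the cube are both the interval [-1, 1], so the ratio is 1; for r = 2 it is \pi/2; and it grows quickly with r, which is why the compensated number of codebook vectors becomes large in high dimensions.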
The above step is justified by the uniform distribution of the codebook vectors. However, it is not an equivalence transformation (for r > 1). The quality of this approximation will be tested later. The resulting number of vectors, \tilde{m} = int(v^{r} m), is rounded to an integer value. With the approximation, the probability p(T) that a vector has a squared distance E^{j} \ge T can be expressed as

p(T) = 1 - \left(\frac{T}{r}\right)^{r/2} ,
(4.13)

since the volume ratio of the r-spheres with radii \sqrt{T} and \sqrt{r} is (\sqrt{T}/\sqrt{r})^{r} = (T/r)^{r/2}.
The total probability P_{c} that all \tilde{m} vectors fulfill the above condition is

P_{c}(T) = p(T)^{\tilde{m}} = \left( 1 - \left(\frac{T}{r}\right)^{r/2} \right)^{\tilde{m}} .
(4.14)
The value of T extends from 0 to r. Using (4.10), we obtain for the expectation value of E

\langle E \rangle = -\int_{0}^{r} T \, \frac{dP_{c}(T)}{dT} \, dT = \int_{0}^{r} P_{c}(T) \, dT - \Big[ P_{c}(T) \, T \Big]_{0}^{r} = \int_{0}^{r} P_{c}(T) \, dT .
(4.15)

The second equality sign uses integration by parts; the last equality sign uses (4.14), which gives P_{c}(r) = 0. The final integration, using (4.14), gives

\langle E \rangle = \frac{r \, \Gamma(\tilde{m} + 1) \, \Gamma(2/r + 1)}{\Gamma(\tilde{m} + 1 + 2/r)} .
(4.16)
The integral was solved with the help of MATLAB® and its symbolic toolbox. Using the equality \Gamma(x) = (x - 1) \Gamma(x - 1), the expression can be simplified to

\langle E \rangle = r \prod_{j=1}^{\tilde{m}} \frac{j}{j + 2/r} .
(4.17)

For large \tilde{m}, the latter is more feasible for numerical evaluation than (4.16).
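The two forms (4.16) and (4.17) can be compared numerically. The sketch below (function names are mine) also shows why the product form is preferable for large compensated vector counts, where the Gamma function overflows:

```python
from math import gamma

def expected_E_gamma(r, m_tilde):
    """Closed form (4.16): <E> = r Gamma(m+1) Gamma(2/r+1) / Gamma(m+1+2/r)."""
    return (r * gamma(m_tilde + 1) * gamma(2.0 / r + 1.0)
            / gamma(m_tilde + 1 + 2.0 / r))

def expected_E_product(r, m_tilde):
    """Simplified form (4.17): <E> = r * prod_{j=1}^{m} j / (j + 2/r),
    evaluated iteratively so no intermediate value overflows."""
    value = float(r)
    for j in range(1, m_tilde + 1):
        value *= j / (j + 2.0 / r)
    return value
```

`math.gamma` overflows a double for arguments above about 171, so the closed form fails long before realistic vector counts are reached, while the product form still evaluates without difficulty.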
We further investigate the quality of our approximation. The case m = 1 can be evaluated exactly. For one codebook vector, the expectation of its squared distance can be calculated directly:

\langle E \rangle = 2^{-r} \int_{-1}^{1} \cdots \int_{-1}^{1} \sum_{i=1}^{r} c_{i}^{2} \; dc_{1} \cdots dc_{r} = 2^{-r} \, r \, \frac{2}{3} \, 2^{r-1} = \frac{r}{3} .
(4.18)
Figure 4.17 shows the result of this comparison. The mismatch
increases with the number of input dimensions r.
Figure 4.17:
Comparison between the approximation, replacing the r-cube by an r-sphere and resulting in (4.12) and (4.17), and the exactly evaluated result (4.18) for one codebook vector.

We also check our approximation for larger m. The result from (4.12) and (4.17) was compared to a simulation in which codebook vectors were drawn randomly from an r-cube with side length 2. Figure 4.18 shows the result, using r = 10 input dimensions and 100 000 trials for each m value in the simulation. The approximation improves as more codebook vectors are used. The number of codebook vectors needed for a good approximation depends on the value of r: the higher r, the more vectors are needed.
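The comparison can be reproduced in outline with far fewer trials. The following is a sketch under the stated assumptions (uniform codebook vectors, constraint space through the origin), with hypothetical function names, not the original MATLAB code:

```python
import random
from math import pi, gamma

def theory(r, m):
    """<E> from (4.12) and (4.17): compensate m by the volume relation,
    round to an integer, and evaluate the product form."""
    m_tilde = int((pi * r / 4.0) ** (r / 2.0) / gamma(r / 2.0 + 1.0) * m)
    value = float(r)
    for j in range(1, m_tilde + 1):
        value *= j / (j + 2.0 / r)
    return value

def simulate(r, m, trials=1000, seed=0):
    """Monte Carlo estimate of <E>: per trial, draw m codebook vectors
    uniformly from [-1, 1]^r (only the first r coordinates matter) and
    take the minimum squared distance to the origin."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(r))
                     for _ in range(m))
    return total / trials
```

With r = 10 and m = 200, the theoretical value and the Monte Carlo estimate should lie close together, consistent with the overlap reported in figure 4.19.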
Figure 4.18:
Comparison between the theory, using (4.12) and
(4.17), and the result from a simulation, with r = 10. The number of codebook vectors is m.

Finally, the dependence on the number of input dimensions is demonstrated for the case m = 200. As above, the theory is compared to a simulation, using 100 000 trials for each r value. In this test, the theory and simulation results overlap (figure 4.19). Between r = 5 and r = 8, the increase is approximately exponential with exponent 0.69.
Figure 4.19:
Dependence of the mean square error per output dimension on the number of input dimensions r for m = 200 (d = 10). Theory, using (4.12) and (4.17), and simulation results are shown.

Heiko Hoffmann
2005-03-22