This section explains why a multilayer perceptron that is trained to map data points within a sensory manifold may map data points outside its training domain closer to the manifold (section 7.3.2, figure 7.11, left). This phenomenon depends on the structure of the training domain; it is not a general property of MLPs.
First, I show that all image vectors have approximately the same length, independent of the position of the robot. Second, I give a two-dimensional synthetic example with the same property. Third, I explain theoretically why, in this example, data points outside the training domain are mapped closer to the domain. Last, I show that the abstract RNN does not exhibit this property in the example.
We estimate the length of an image vector (the sensory representation). Although the world-to-camera mapping was nonlinear, the image of the obstacle circle was still close to circular (figure 7.3). Moreover, its area was almost independent of the robot's position. Thus, we assume that the obstacles also form a circle with fixed area on the camera image. Within this region, the robot can be located at any point. To obtain the sensory representation, the circle is subdivided into ten sectors centered at the robot's position (figure 7.15).

Let s_{i} be the length of sector i, and let α be the opening angle of each sector (figure 7.15). If α is small enough, then the area of a sector is well approximated by

A_{i} = (α/2) s_{i}^{2} .  (7.2)

The sensory representation is the vector s = (s_{1},...,s_{10}) of sector lengths. Its squared length is

||s||^{2} = Σ_{i} s_{i}^{2} = (2/α) Σ_{i} A_{i} = (2/α) A_{o} ,  (7.3)

where A_{o} = Σ_{i} A_{i} is the total obstacle area. Since A_{o} is approximately independent of the robot's position, all image vectors have about the same length.
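That ||s||^{2} is nearly independent of the robot's position can be checked numerically. The following sketch (an illustration, not the original simulation) places a viewpoint inside a unit circle of obstacles, measures the ten sector lengths along the sector-center directions, and verifies that Σ_{i} (α/2) s_{i}^{2} reproduces the circle area π regardless of the viewpoint:

```python
import numpy as np

def sector_lengths(p, n_sectors=10, R=1.0):
    """Distances from interior point p to the circle of radius R,
    measured along the center directions of n_sectors equal sectors."""
    angles = (np.arange(n_sectors) + 0.5) * 2.0 * np.pi / n_sectors
    u = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit directions
    # Solve ||p + t*u|| = R for t > 0: t = -p.u + sqrt((p.u)^2 + R^2 - |p|^2)
    pu = u @ p
    return -pu + np.sqrt(pu**2 + R**2 - p @ p)

alpha = 2.0 * np.pi / 10            # opening angle of one sector
areas = []
for p in ([0.0, 0.0], [0.3, -0.2], [0.5, 0.4]):
    s = sector_lengths(np.array(p))
    areas.append(float(np.sum(alpha / 2.0 * s**2)))  # approximation (7.2)
print(areas)  # each close to pi, so ||s||^2 = 2*A_o/alpha is nearly constant
```

The sum deviates from π only through the finite sector width, so the ten-dimensional image vectors indeed have almost constant length.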
In the synthetic example discussed in the following, a circle is mapped onto a circle; that is, input and output are two-dimensional, and the training domain is a circle both in the input and in the output space. The two circles would coincide if the input and output coordinate systems were put on top of each other. Each point x_{i} on the input circle has a target point y_{i} on the second circle that is rotated relative to x_{i} by 23° around the origin. 200 training points uniformly distributed around the circle were generated. An MLP learned the mapping from x_{i} to y_{i} for all i = 1,...,200. The MLP had a three-layer structure composed of two input neurons, h = 5 hidden neurons, and two output neurons. In the hidden layer, the activation function was sigmoidal (tanh); in the other layers, it was the identity function. Initially, the weights were drawn uniformly from the interval [-0.1; 0.1]. Using back-propagation in online mode, the network was trained until convergence.
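The experiment is easy to reproduce. The sketch below follows the described architecture (2–5–2, tanh hidden layer, identity output, weights drawn uniformly from [-0.1, 0.1]) but, for brevity, uses batch gradient descent instead of online back-propagation; the learning rate and iteration count are assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 training points on the unit circle; targets rotated by 23 degrees
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.stack([np.cos(theta), np.sin(theta)])          # 2 x 200 inputs
phi = np.deg2rad(23.0)
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
Y = R @ X                                             # 2 x 200 targets

h = 5                                                 # hidden neurons
U = rng.uniform(-0.1, 0.1, (h, 2))                    # input -> hidden weights
W = rng.uniform(-0.1, 0.1, (2, h))                    # hidden -> output weights

lr, n = 0.1, X.shape[1]
for _ in range(40_000):                               # batch gradient descent
    H = np.tanh(U @ X)                                # hidden activations
    E = W @ H - Y                                     # output errors
    gW = (2.0 / n) * (E @ H.T)
    gU = (2.0 / n) * (((W.T @ E) * (1.0 - H**2)) @ X.T)
    W -= lr * gW
    U -= lr * gU

loss = np.mean(np.sum((W @ np.tanh(U @ X) - Y) ** 2, axis=0))
r_train = np.linalg.norm(W @ np.tanh(U @ X), axis=0).mean()
r_far = np.linalg.norm(W @ np.tanh(U @ (2.0 * X)), axis=0).mean()
print(loss, r_train, r_far)
```

In this sketch, training inputs are mapped near the unit circle, while inputs of length 2.0 end up with a noticeably shorter output, consistent with the contraction described above.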
Figure 7.16 shows the result after training. Points outside the training domain (distance to the origin: 2.0) were mapped closer to the origin in the output space (distance around 1.5), and points inside the training domain (distance: 0.66) were mapped closer to the unit circle (distance around 0.75).

In the following, this finding is studied theoretically. The MLP maps an input x to an output y,

y = W tanh(U x) ,  (7.4)

where U is the weight matrix of the hidden layer, W is the weight matrix of the output layer, and tanh is applied component-wise.
In the example with the two-dimensional circle, it was observed that in the trained network, the column vectors of U were approximately orthogonal and had unit length; the same held for the row vectors^{7.2} of W. Thus, we assume that

U^{T} U = I and W W^{T} = I .  (7.5)

With this assumption, it can be shown (appendix C.4) that points outside the circle are mapped closer to the circle,
||y|| < ||x|| for ||x|| > 1 .  (7.6)
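Inequality (7.6) can be illustrated numerically: with U^{T}U = I we have ||Ux|| = ||x||, tanh strictly shrinks every nonzero component, and a matrix with orthonormal rows cannot increase length. The sketch below uses random orthonormal matrices with W = U^{T} (an idealization, not weights taken from a trained network):

```python
import numpy as np

rng = np.random.default_rng(1)
h = 5
# Random h x 2 matrix with orthonormal columns (U.T @ U = I)
U, _ = np.linalg.qr(rng.standard_normal((h, 2)))
W = U.T                                              # orthonormal rows

angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
X = 2.0 * np.stack([np.cos(angles), np.sin(angles)])  # inputs at radius 2
Y = W @ np.tanh(U @ X)
radii = np.linalg.norm(Y, axis=0)
print(radii.min(), radii.max())   # all output radii lie strictly below 2
```

The contraction here is caused entirely by the saturation of tanh; the two orthonormal linear maps on their own would preserve length.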
The assumption U^{T} U = I further predicts that the contraction effect decreases with an increasing number of neurons h in the hidden layer. The assumption implies that Σ_{j=1}^{h} u_{jk}^{2} = 1. Thus, the expectation value of u_{jk}^{2} equals 1/h. The argument of tanh at hidden neuron j is Σ_{k} u_{jk} s_{k}. Here, the only random variables are {u_{jk}}, since the statement should hold for all inputs s. Further, we assume that the expectation value of u_{jk} is zero and that different u_{jk} are uncorrelated. Then, for all inputs s with length ||s||, the expectation value of the squared tanh argument can be written as

E[(Σ_{k} u_{jk} s_{k})^{2}] = Σ_{k} s_{k}^{2} E[u_{jk}^{2}] = ||s||^{2}/h .

Hence, the larger h, the smaller the tanh arguments become; tanh then operates closer to its linear range, and the contraction weakens.
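This step can be sanity-checked in a few lines: for a matrix with exactly orthonormal columns, the squared tanh arguments average to ||s||^{2}/h not just in expectation but identically, since Σ_{j} (Σ_{k} u_{jk} s_{k})^{2} = ||U s||^{2} = ||s||^{2}:

```python
import numpy as np

rng = np.random.default_rng(2)
h = 5
# Random h x 2 matrix with orthonormal columns (U.T @ U = I)
U, _ = np.linalg.qr(rng.standard_normal((h, 2)))

s = np.array([1.7, -0.6])          # arbitrary input vector
args = U @ s                       # tanh arguments of the h hidden neurons
mean_sq = float(np.mean(args**2))
print(mean_sq, float(s @ s) / h)   # identical up to rounding
```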
This prediction was tested with the above experiment for different values of h. The result is shown in table 7.4. The values were averaged over three separately trained networks and over 360 trials each. The length of the input vectors was set to 2.0. The experiment is in agreement with the above theoretical prediction.
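The same trend already shows up with idealized weights (random U with orthonormal columns and W = U^{T}, an assumption rather than trained networks): for inputs of length 2.0, the mean output length approaches 2.0 as h grows, i.e., the contraction weakens:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_output_radius(h, n_dirs=360):
    # Idealized network: orthonormal U (U.T @ U = I), output weights U.T
    U, _ = np.linalg.qr(rng.standard_normal((h, 2)))
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    X = 2.0 * np.stack([np.cos(angles), np.sin(angles)])  # inputs, length 2
    Y = U.T @ np.tanh(U @ X)
    return float(np.linalg.norm(Y, axis=0).mean())

radii = {h: mean_output_radius(h) for h in (5, 20, 80)}
print(radii)   # output radius grows toward the input length 2.0 with larger h
```

With more hidden units, each tanh argument is smaller (its mean square is ||s||^{2}/h), so tanh stays closer to its linear range.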

In contrast to the MLP, the abstract RNN maintains the scale in the circle task (figure 7.17). The 200 pairs of circle points (x_{i}, y_{i}) were approximated using a mixture of five units, each with two principal components (using MPPCAext for training). The centers of the ellipsoids turned out to be evenly distributed around the circle. Figure 7.17 shows that the distance to the origin is consistent between input and output pairs. As in (7.5), the local linear mappings do not change the length of the input patterns.
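The scale-preserving behavior of local linear mappings can be sketched without the actual MPPCAext training. In the simplified stand-in below, five least-squares linear fits on arcs of the circle play the role of the five mixture units; because the underlying relation is an exact rotation, each local fit recovers an orthogonal map, and scaled inputs keep their length:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # 200 x 2 inputs
phi = np.deg2rad(23.0)
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
Y = X @ R.T                                            # rotated targets

# One least-squares linear map per arc (stand-in for the five mixture units)
units = []
for u in range(5):
    idx = slice(40 * u, 40 * (u + 1))                  # contiguous arc
    A, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)  # y ≈ x @ A
    units.append((idx, A))

# Inputs scaled to radius 2 keep their length under each local linear map
radii = np.concatenate(
    [np.linalg.norm((2.0 * X[idx]) @ A, axis=1) for idx, A in units])
print(radii.min(), radii.max())   # both ≈ 2.0
```

Unlike the MLP, nothing here saturates, so points outside the training domain are neither pulled toward the origin nor toward the circle.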
