This section shows that a multilayer perceptron maps data points outside its training domain closer to its domain if the perceptron is trained to map data distributed on a circle onto the same circle (see section 7.4). Let x = α u be the input to the trained network. Here, u has unit length and α is a scalar.
We study the effect of α on the network output x_out. Let W_1 be a h×2 matrix containing the weights between the input and the hidden layer, and W_2 be a 2×h matrix with the weights between the hidden and the output layer. Further, let c_1 and c_2 be the column vectors of W_1, and v_1 and v_2 be the row vectors of W_2. We assume that all threshold values equal zero, and that the weights fulfill: c_1ᵀ c_2 = 0 and ‖c_1‖ = ‖c_2‖.
We first look at the case α = 1, that is, x = u. The network output is

x_out = W_2 tanh(W_1 x) = W_2 tanh(y) ,  with y = W_1 x .
Let t be the vector with components tanh(y_j). A larger number h of hidden units leads to smaller components of y (section 7.4: y_j equals on average 1/h). Therefore, we approximate tanh(y_j) ≈ y_j, and thus t ≈ y = W_1 x. Since the columns c_1 and c_2 are orthogonal and of equal length, it follows that t also lies on the circle in the span of {c_1, c_2}.
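This linearization can be checked numerically. The following sketch (my own construction with made-up weights, not taken from the book) builds a W_1 with small, orthogonal columns of equal norm and verifies that the hidden-layer image of the unit circle is, to high accuracy, a circle lying in the span of the two columns:

```python
import numpy as np

# Hypothetical check of the tanh linearization (made-up weights): with small,
# orthogonal, equal-norm columns in W_1, the hidden activations t = tanh(W_1 x)
# for unit-length inputs x lie almost exactly on a circle in the plane
# spanned by the two columns of W_1.
h = 100
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((h, 2)))  # orthonormal columns
W1 = Q / h                                        # small, equal-norm columns

angles = np.linspace(0.0, 2.0 * np.pi, 200)
U = np.stack([np.cos(angles), np.sin(angles)])    # unit-circle inputs
T = np.tanh(W1 @ U)                               # hidden activations

radii = np.linalg.norm(T, axis=0)                 # nearly constant radius
rest = T - Q @ (Q.T @ T)                          # part outside the span
print(radii.max() / radii.min())                  # close to 1
print(np.linalg.norm(rest) / np.linalg.norm(T))   # close to 0
```

The smaller the weights (larger h), the closer tanh is to the identity on the hidden activations, and the smaller both deviations become.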
Next, we look at the effect of the weight matrix W_2. After training, all inputs x (which have unit length) are mapped (C.15) onto a circle with radius one. Thus, W_2 needs to project the circle in the span of {c_1, c_2} onto the unit circle in the two-dimensional output space. This is only achieved if both row vectors v_1 and v_2 lie in the span of {c_1, c_2} (otherwise, the projection would be an ellipse). It follows that any component orthogonal to the span of {c_1, c_2} is annihilated by W_2, and that any vector in the span of {c_1, c_2} can be written as W_1 p with a two-dimensional vector p.
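As an illustration of this projection property (my own choice of W_2 for the sketch; the text only requires its rows to lie in the span), a W_2 that inverts W_1 on the span of its columns maps the hidden-space circle back onto the unit circle and annihilates everything orthogonal to that span:

```python
import numpy as np

# Illustration (made-up weights): with orthonormal columns in W_1, the
# pseudoinverse W_2 = pinv(W_1) has both row vectors in the span of W_1's
# columns. It maps the hidden-space circle W_1 u back onto the unit circle
# and sends any vector orthogonal to the span to zero.
h = 20
rng = np.random.default_rng(1)
W1, _ = np.linalg.qr(rng.standard_normal((h, 2)))  # orthonormal columns
W2 = np.linalg.pinv(W1)                            # rows lie in the span

angles = np.linspace(0.0, 2.0 * np.pi, 100)
U = np.stack([np.cos(angles), np.sin(angles)])     # unit circle, input space
Z = W1 @ U                                         # circle in hidden space
out = W2 @ Z                                       # projected back
print(np.allclose(out, U))                         # circle -> same circle

n = rng.standard_normal(h)
n -= W1 @ (W1.T @ n)                               # n orthogonal to the span
print(np.allclose(W2 @ n, 0.0))                    # annihilated by W2
```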
Next, we look at the case α > 1. Let t(α) be the vector with components tanh(y_j), where now y = W_1 x = α W_1 u. Here, the above tanh approximation is generally not valid, and t(α) might protrude out of the plane spanned by {c_1, c_2}. Thus, we need to write t(α) = W_1 p(α) + n, with n orthogonal to the span of {c_1, c_2}. The squares of this equation fulfill ‖t(α)‖² = ‖W_1 p(α)‖² + ‖n‖², from which follows:
‖W_1 p(α)‖² ≤ ‖t(α)‖² .  (C.17)
Since |tanh(z)| < |z| for z ≠ 0, each component fulfills |tanh(y_j)| < |y_j|, and therefore ‖t(α)‖ < ‖y‖ = α ‖W_1 u‖ = α ‖c_1‖. Because the columns c_1 and c_2 are orthogonal and of equal length, ‖W_1 p(α)‖ = ‖c_1‖ ‖p(α)‖, so together with (C.17):

‖p(α)‖ < α .  (C.19)

Since W_2 annihilates n, the output is x_out = W_2 W_1 p(α). The network thus responds to the input α u as it would to the shorter input p(α): data points outside the unit circle are mapped closer to it.
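The whole argument can be checked numerically. The sketch below (made-up weights under this section's assumptions, not the book's code) decomposes t(α) into its in-span part W_1 p(α) and the orthogonal rest n, and verifies both (C.17) and (C.19):

```python
import numpy as np

# Numerical check of (C.17) and (C.19) with made-up weights: for alpha > 1,
# the hidden vector t(alpha) splits into W_1 p(alpha) plus an orthogonal
# rest n, and the coefficient vector p(alpha) is shorter than alpha.
h = 50
rng = np.random.default_rng(2)
W1, _ = np.linalg.qr(rng.standard_normal((h, 2)))  # orthonormal columns
W2 = np.linalg.pinv(W1)                            # rows lie in the span

u = np.array([0.6, 0.8])                           # unit-length input
alpha = 3.0
t = np.tanh(W1 @ (alpha * u))                      # t(alpha)

p = np.linalg.pinv(W1) @ t                         # in-span coefficients
n = t - W1 @ p                                     # orthogonal rest

print(np.linalg.norm(W1 @ p) <= np.linalg.norm(t))  # (C.17)
print(np.linalg.norm(p) < alpha)                    # (C.19)
print(np.allclose(W2 @ t, W2 @ (W1 @ p)))           # W2 annihilates n
```

The last line confirms that the output for the input α u coincides with the output for the in-span part alone, i.e. the network effectively sees the shorter input p(α).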