M. Sugiyama (Germany, Japan), Y. Okabe, and H. Ogawa (Japan)
Keywords: Machine Learning, Generalization Error, Input Noise, Measurement Noise, Perturbation, Model Selection
Estimating the generalization capability is one of the most important problems in supervised learning. Various generalization error estimators have therefore been proposed, typically assuming that noise is present only in the output values. However, noise often exists in the input values as well. In this paper, we investigate the influence of input noise on a generalization error estimator. We focus on a particular estimator called the subspace information criterion (SIC), which is known to be unbiased in the absence of input noise. Intuitively, small input noise should not severely affect the unbiasedness of SIC, since small input noise perturbs the output values only slightly when the learning target function is continuous. Contrary to this intuition, we show that even small input noise can totally corrupt the unbiasedness of SIC. This fact casts doubt on the use of SIC in the presence of input noise. To cope with this problem, we provide a sufficient condition under which SIC remains unbiased in the limit of small input noise. We finally show that this condition is always fulfilled when the standard ridge estimator is used for learning, which allows us to use SIC without concern even in the presence of small input noise.
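As a rough illustration of the setting (not the paper's experiment), the sketch below fits the standard ridge estimator to samples whose inputs carry small measurement noise, and compares it with the fit obtained from the noise-free inputs. The target function, noise levels, polynomial basis, and regularization constant are all illustrative assumptions; only the ridge formula itself is the standard one mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: learn f(x) = sin(x) from n samples (assumed for illustration).
n = 50
x_clean = np.linspace(0.0, np.pi, n)                 # noise-free inputs
x_noisy = x_clean + 0.01 * rng.standard_normal(n)    # small input noise added
y = np.sin(x_clean) + 0.1 * rng.standard_normal(n)   # outputs with output noise


def design(x, degree=5):
    # Polynomial basis [1, x, ..., x^degree] (an assumed choice of model).
    return np.vander(x, degree + 1, increasing=True)


def ridge_fit(X, y, lam=1e-3):
    # Standard ridge estimator: (X^T X + lam I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


w_clean = ridge_fit(design(x_clean), y)
w_noisy = ridge_fit(design(x_noisy), y)

# Small input noise perturbs the fitted predictions only slightly here,
# consistent with the intuition the abstract describes; the paper's point
# is that the *bias analysis* of SIC can nevertheless break down.
pred_diff = design(x_clean) @ (w_clean - w_noisy)
print("RMS prediction difference:", np.sqrt(np.mean(pred_diff ** 2)))
```

The small prediction difference matches the continuity intuition stated in the abstract; the paper's contribution is the subtler question of whether the estimator of the generalization error stays unbiased under such perturbations.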