From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Dec 03 2003 - 08:38:14 MST
Mitchell Porter wrote:
> 
> Noise can prevent invalid generalization ("overfitting"):
> http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-2.html
> http://www.inf.ethz.ch/~schraudo/NNcourse/overfitting.html
> http://citeseer.nj.nec.com/bishop94training.html
> http://citeseer.nj.nec.com/448824.html
While I don't know enough about this topic to be an expert, Bayesian 
methods are supposed to automatically prevent overfitting - or rather, 
overfitting results from violation of Bayesian first principles.  On the 
other hand, true Bayesian methods might be too expensive.  My point is 
that noise is not the only possible way of preventing overfitting; there 
are deeper, more powerful, mathematically more elegant ways of doing so, 
to which the injection of noise is only an approximation.  A cheap, good 
approximation?  Perhaps.  But it isn't magic.  Nor is the relation 
between noise and preventing overfitting as deep as it seems.  According 
to Bishop94 above, for example, training with noise amounts to adding an 
extra regularization term to the loss function, and the same benefit can 
be had, without some of the attendant dangers, by modifying the loss 
function directly instead of training with noise.
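To make the Bishop94 point concrete, here is a minimal sketch (mine, not 
from the paper; the linear model, numpy, and every parameter value below 
are assumptions chosen for illustration).  For a linear model, fitting on 
inputs corrupted by zero-mean Gaussian noise of variance sigma^2 is 
equivalent, in expectation, to fitting the clean inputs with an explicit 
penalty of n * sigma^2 * ||w||^2 added to the squared-error loss; the 
noise is doing nothing an ordinary regularizer could not do.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.5          # illustrative sizes and noise level

X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

# (a) "Train with noise": stack many noisy copies of the inputs and do one
#     ordinary least-squares fit, approximating the expected noisy loss.
copies = 500
X_noisy = np.vstack([X + sigma * rng.normal(size=X.shape)
                     for _ in range(copies)])
y_rep = np.tile(y, copies)
w_noise, *_ = np.linalg.lstsq(X_noisy, y_rep, rcond=None)

# (b) No noise at all, just the explicit extra term the noise is
#     equivalent to: ridge regression with lambda = n * sigma^2.
lam = n * sigma ** 2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("fit with injected noise :", np.round(w_noise, 3))
print("fit with explicit term  :", np.round(w_ridge, 3))

The two weight vectors come out nearly identical, which is the sense in 
which training with noise just amounts to an extra term in the loss.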
> And noise can make a weak signal detectable ("stochastic resonance"):
> http://mathworld.wolfram.com/StochasticResonance.html
> http://flux.aps.org/meetings/YR9596/BAPSMAR96/mar96/vpr/A5.04.html
> http://www.umbrars.com/sr/introsr.htm
One of these papers describes how adding a noise term to a weak input 
signal to a cricket *neuron* resulted in more information in the spike 
train emerging *from the cell*; it doesn't mean that adding noise to a 
weak signal actually adds information to it!  Furthermore it seems fairly 
obvious how adding to noise to a weak input signal could result in more 
information from the spike train, given the cell's *lossy* processing 
properties; if the noise boosts the weak signal into the steepest part of 
the slope of the cell's response threshold, the resulting spike train may 
contain more information or better temporal information (if previously 
weak signals needed to accumulate before the cell fired).  Perhaps the 
cricket cells evolved to work in the presence of noise, or in the presence 
of sound at a particular threshold value, but that does not mean the noise 
"boosts processing efficiency"; an ab initio algorithm could extract more 
data from the signal without the noise.
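For what it's worth, here is a toy simulation of the mechanism I just 
described (my own sketch; the threshold, amplitude, and noise levels are 
arbitrary assumptions, not values from the cricket papers).  A 
subthreshold sine wave drives a hard threshold.  With no noise the 
"neuron" never fires, so its output carries nothing; moderate noise 
pushes the input over threshold mostly near the signal's peaks, so the 
output starts to track the signal; large noise drowns it again.

import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 20.0, 5000)
signal = 0.8 * np.sin(2.0 * np.pi * t)   # weak input, always below threshold
threshold = 1.0

for noise_std in (0.0, 0.3, 1.0, 5.0):
    noisy = signal + noise_std * rng.normal(size=t.size)
    spikes = (noisy > threshold).astype(float)   # crude threshold "neuron"
    if spikes.std() == 0.0:
        corr = 0.0        # never fires: output is constant, no signal in it
    else:
        corr = np.corrcoef(signal, spikes)[0, 1]
    print(f"noise std {noise_std:4.1f} -> output/signal correlation {corr:.3f}")

The correlation rises from zero, peaks at moderate noise, and falls off 
again at large noise, which is the standard stochastic-resonance curve. 
But all of the information was already in the signal; an algorithm that 
read the subthreshold input directly would beat every noise level.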
Noise is not magic, and for noise to result in any fundamental algorithmic 
improvement would, I maintain, violate the second law of thermodynamics.
-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence