Estimating the Probability of Sampling a Trained Neural Network at Random

We present and analyze an algorithm for estimating the size, under a Gaussian or uniform measure, of a localized neighborhood in neural network parameter space with behavior similar to an ``anchor'' point. We refer to this as the "local volume" of the anchor. We adapt an existing basin-volume estimator, which is very fast but in many cases only provides a lower bound. We show that this lower bound can be improved with an importance-sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of local volume can also be interpreted as a measure of the anchor network's information content. As expected for a measure of complexity, this quantity increases during language model training. We find that overfit, badly-generalizing neighborhoods are smaller, indicating a more complex learned behavior. This smaller volume can also be interpreted in an MDL sense as suboptimal compression. Our results are consistent with a picture of generalization we call the "volume hypothesis": that neural net training produces good generalization primarily because the architecture gives simple functions more volume in parameter space, and the optimizer samples from the low-loss manifold in a volume-sensitive way. We believe that fast local-volume estimators are a promising practical metric of network complexity and architectural inductive bias for interpretability purposes.

Previous
Previous

Mechanistic Anomaly Detection for "Quirky" Language Models

Next
Next

Examining Two Hop Reasoning Through Information Content Scaling