ExNAS: Experiential neural architecture selection for real-time inference optimization
Abstract
Operational amnesia constitutes both a limiting condition and an emergent cost in neural networks: models process each input as if encountering it for the first time, unable to recall which computational patterns proved effective in similar contexts. This stateless processing wastes computational resources by re-solving familiar problems and prevents networks from leveraging their own operational history. We introduce ExNAS (Experiential Neural Architecture Selection), a system that addresses operational amnesia through real-time, fine-grained substructure selection (channels, neurons, or heads, depending on architecture) during inference. ExNAS maintains a lightweight experiential memory that records layer-wise activation fingerprints—not input content, but process signatures from guardrail-satisfying runs. Given a new input, ExNAS retrieves similar past contexts and applies structurally guided selection across non-consecutive layers, effectively allowing the network to remember how it solved comparable problems. We evaluate ExNAS on compact CNNs and Transformer language models. On CNNs (CPU), ExNAS achieves up to 7.9% wall-clock time reduction without retraining. On Transformers (Qwen2-1.5B, DistilGPT-2), we observe throughput gains of 2–4% with at most 1% perplexity degradation via experience-guided FFN selection. All configurations respect hardware alignment constraints for compatibility with optimized kernels. Results are reported with 95% confidence intervals over repeated runs. ExNAS demonstrates that addressing operational amnesia through inference-time experiential selection is feasible and effective under explicit quality guardrails—providing reproducible gains on modern architectures without retraining. Code and reproducibility harness are publicly released.
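The record–retrieve–select loop described above can be illustrated with a minimal sketch. All names here (`ExperientialMemory`, `record`, `retrieve`, the cosine-similarity threshold) are illustrative assumptions, not the released API; the sketch only shows the idea of matching a new input's activation fingerprint against fingerprints from past guardrail-satisfying runs and reusing the associated substructure mask.

```python
import numpy as np

class ExperientialMemory:
    """Hypothetical sketch of an experience store: maps layer-wise
    activation fingerprints (process signatures, not input content)
    to substructure keep-masks that satisfied the quality guardrail
    on past runs."""

    def __init__(self):
        self.fingerprints = []  # unit-normalized fingerprint vectors
        self.masks = []         # binary masks over channels / neurons / heads

    def record(self, fingerprint, mask):
        # Store the fingerprint of a guardrail-satisfying run together
        # with the substructure selection that was used.
        f = np.asarray(fingerprint, dtype=float)
        self.fingerprints.append(f / (np.linalg.norm(f) + 1e-8))
        self.masks.append(np.asarray(mask))

    def retrieve(self, fingerprint, threshold=0.9):
        """Return the mask of the most similar past context by cosine
        similarity, or None if nothing is similar enough (the network
        then falls back to its full, unselected computation)."""
        if not self.fingerprints:
            return None
        f = np.asarray(fingerprint, dtype=float)
        q = f / (np.linalg.norm(f) + 1e-8)
        sims = np.stack(self.fingerprints) @ q
        best = int(np.argmax(sims))
        return self.masks[best] if sims[best] >= threshold else None
```

In a full system the mask sizes would additionally be rounded to hardware-friendly group sizes (the alignment constraints the abstract mentions), which this sketch omits.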