Nested Learning: How Your Neural Network Already Learns at Multiple Timescales
17 min read
#deep-learning #neural-networks #optimization #memory-consolidation #language-models #transformers #continual-learning
Modern language models suffer from anterograde amnesia. A new framework reveals that deep learning isn't about stacking layers, but about nested optimization running at different frequencies.