Learning in continuously changing environments
Rethinking stability and plasticity in continual reinforcement learning. This project was accepted as a poster at the main conference of ICML 2026.
Paper: https://arxiv.org/abs/2605.26357
Code [3D Four Rooms / Jax]: https://github.com/raymondchua/multi-timescale-successor-features-fourrooms
Code [MuJoCo / PyTorch]: https://github.com/raymondchua/multi-timescale-successor-features-mujoco
This blog post is based on our recent work, "Balancing Plasticity and Stability with Fast and Slow Successor Features", which explores how reinforcement learning (RL) agents can remain robust in environments that evolve continuously over time. A preliminary version of this work was also presented at the Computational and Systems Neuroscience conference (Cosyne) in Lisbon, Portugal in 2021.
The challenge of learning in a changing world
Imagine learning to walk on ice during winter, only for the ground to slowly become dry pavement again by spring. Biological systems continuously adapt to environments that never remain fixed. Yet most Artificial Intelligence (AI) systems struggle when the world changes over time.
One of the central challenges in both neuroscience and AI is balancing:
- Plasticity - the ability to adapt to new experiences
- Stability - the ability to preserve previously learned knowledge
Artificial Neural Networks (ANNs) are known to be highly plastic, but this often comes at the cost of catastrophic forgetting, where learning new information disrupts previously acquired knowledge.
This problem becomes even more severe in continual reinforcement learning, where:
- Policies evolve during training
- The data distribution shifts over time
- The environmental dynamics themselves may change
The problem with how continual reinforcement learning is usually studied
Most prior work in continual RL studies abrupt task switches:
- One environment suddenly becomes another
- Dynamics change instantaneously
However, real-world environments rarely evolve this way. Instead:
- Terrain gradually becomes slippery
- Robots slowly wear down
- Sensors drift over time
- Bodies change continuously
These forms of non-stationarity are gradual, persistent, and often smooth. This distinction turns out to matter. Abrupt changes place enormous demands on rapid adaptation. But under gradual environmental drift, excessive plasticity may actually become harmful because the agent overwrites useful knowledge.
This raises an important question: under continuous environmental change, is the main bottleneck really insufficient plasticity, or is it instability?
Most researchers assume continual learning systems fail because they cannot adapt quickly enough.
Surprisingly, we found the opposite. Under gradual environmental change, the dominant problem is not insufficient plasticity, but insufficient stability.
Introducing naturalistic continuous dynamics drift
To study this problem, we introduced naturalistic continuous changes into standard MuJoCo embodiments. Rather than switching abruptly between tasks, we continuously perturbed the embodiment mass of the agents by sampling from a noisy sinusoidal process. This produced environments whose dynamics evolve smoothly over time.
Figure 1 shows an example based on the humanoid embodiment. This setup (right panel of Figure 1) allowed us to investigate continual learning under persistent dynamics drift rather than discrete task boundaries.
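As a rough illustration of what a noisy sinusoidal mass perturbation could look like, here is a minimal Python sketch. The function name `mass_scale`, the period, amplitude, noise level, clamping floor, and nominal mass are all hypothetical values chosen for illustration; they are not the settings used in the paper.

```python
import math
import random


def mass_scale(step, period=100_000, amplitude=0.5, noise_std=0.02, rng=None):
    """Multiplicative factor applied to a body's nominal mass at a training step.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = rng or random
    # Slow, smooth drift: one full sine cycle every `period` steps.
    drift = amplitude * math.sin(2.0 * math.pi * step / period)
    # Small Gaussian jitter on top of the drift.
    noise = rng.gauss(0.0, noise_std)
    # Clamp so the resulting mass always stays strictly positive.
    return max(0.1, 1.0 + drift + noise)


rng = random.Random(0)
nominal_mass = 8.0  # hypothetical nominal body mass (kg)
masses = [nominal_mass * mass_scale(t, rng=rng) for t in range(0, 200_000, 1_000)]
```

In a real experiment, the perturbed mass would be written into the simulator's model before stepping the environment; how exactly that is done depends on the simulator bindings and is left out here.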
Surprisingly, stability matters more than plasticity
We first compared two families of approaches:
- Approaches designed to increase plasticity
- Approaches designed to preserve stability
Plasticity-oriented methods periodically reset subsets of the network parameters to encourage continued adaptation. Stability-oriented methods instead rely on consolidation mechanisms that preserve previously learned knowledge by:
- Protecting important synaptic parameters, or
- Modeling synaptic weight changes across multiple timescales
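To make the contrast concrete, the two families can be caricatured in a few lines of Python. This is a simplified sketch under stated assumptions: `reset_subset` re-initialises a random fraction of a flat parameter list, and `importance_penalty` is a generic quadratic penalty weighted by per-parameter importance. Both function names and all constants are hypothetical, and the actual baselines in the paper may differ in their details.

```python
import random


def reset_subset(params, fraction, rng, init_scale=0.1):
    """Plasticity-oriented: re-initialise a random fraction of the parameters."""
    n_reset = int(len(params) * fraction)
    idx = rng.sample(range(len(params)), n_reset)
    new_params = list(params)
    for i in idx:
        new_params[i] = rng.uniform(-init_scale, init_scale)
    return new_params, sorted(idx)


def importance_penalty(params, anchor, importance, strength=1.0):
    """Stability-oriented: quadratic cost for moving important parameters
    away from previously learned (anchor) values."""
    return 0.5 * strength * sum(
        w * (p - a) ** 2 for p, a, w in zip(params, anchor, importance)
    )


rng = random.Random(0)
params = [1.0, -2.0, 0.5, 3.0]

# Plasticity: half of the parameters are thrown away and re-drawn.
reset_params, reset_idx = reset_subset(params, fraction=0.5, rng=rng)

# Stability: moving the second parameter (importance 2.0) by 1.0 is penalised.
penalty = importance_penalty(
    [1.0, -1.0, 0.5, 3.0], anchor=params, importance=[0.0, 2.0, 1.0, 1.0]
)
```

The reset discards whatever the selected parameters had learned, while the penalty is added to the training loss and only discourages changes to parameters marked as important.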
Key insight
Much of the continual learning literature has focused on restoring plasticity. We expected these methods to excel under changing environments.
Instead, the opposite happened. Under gradual and continuous environmental drift, the dominant bottleneck was not insufficient plasticity but instability. Methods designed to preserve stability consistently outperformed methods designed to increase plasticity.
Below we show results for the humanoid embodiment:
Why predictive representations might matter
Among the stability-preserving approaches, the most effective mechanism was a neuro-inspired synaptic consolidation model that stabilizes learning across multiple timescales. Initially, this mechanism was applied directly to the parameters of the Q-value function.
If stability matters, then preserving memory becomes critical. But not all memories are equally useful. This led us to ask a deeper question:
If we can only preserve some aspects of past experience, what should we preserve? From a computational neuroscience perspective, this is a particularly compelling question because the brain appears to build reusable predictive maps of the world, often referred to as cognitive maps, rather than merely storing cached action values (Q-values).
If predictive representations such as Successor Features (SFs) capture aspects of these cognitive maps, this raises the possibility that biological memory systems may preferentially consolidate predictive structure instead of task-specific value estimates, while still maintaining the behavioral flexibility needed for survival.
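For readers unfamiliar with Successor Features, the standard textbook formulation is summarised below. The notation is an assumption for this post (features \phi, reward weights w, discount \gamma, policy \pi) and may differ from the paper's; it also assumes rewards are linear in the features.

```latex
\begin{align}
r(s, a) &= \phi(s, a)^\top w, \\
\psi^\pi(s, a) &= \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t \, \phi(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a \right], \\
Q^\pi(s, a) &= \psi^\pi(s, a)^\top w.
\end{align}
```

The Successor Features \psi^\pi summarise which features the agent expects to encounter in the future, while w says how rewarding those features are. The Q-value collapses both into a single number, which is why consolidating \psi^\pi rather than Q^\pi amounts to preserving predictive structure rather than a cached value estimate.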
Are Successor Features a better target for consolidation?
Interestingly, our results suggest that the answer depends on the severity of the environmental drift. When the environmental changes were relatively mild or moderate, consolidating Q-values remained surprisingly effective.
However, when the drift became severe, for example at perturbation levels just short of where the physics simulation became unstable, consolidating Successor Features became substantially more effective.
Under mild drift, the environment remains relatively close to stationary, making cached value estimates sufficiently reliable. But as the dynamics increasingly evolve over time, preserving predictive structure becomes more important for robust adaptation.
Why multiple timescales matter
The synaptic consolidation mechanism we used models memory across multiple timescales. Fewer consolidation variables correspond to shorter memory horizons, while more variables produce increasingly long-timescale memory traces.
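One way to picture a multi-timescale memory is a chain of coupled variables in which each deeper variable is harder to move than the one before it. The Python sketch below is a generic illustration of that idea, not the paper's implementation: the function name `consolidation_step`, the geometric spacing of capacities and couplings, and the constants `g0` and `dt` are all assumptions.

```python
def consolidation_step(u, grad_step, g0=0.25, dt=1.0):
    """One update of a chain u[0..m-1] of consolidation variables.

    u[0] is the fast, visible weight that receives the learning update;
    deeper variables change more slowly. Constants are illustrative only.
    """
    m = len(u)
    new_u = list(u)
    for k in range(m):
        capacity = 2.0 ** k  # deeper variables are harder to move
        flow = 0.0
        if k > 0:
            # Coupling to the faster neighbour.
            flow += g0 * 2.0 ** (-(k - 1)) * (u[k - 1] - u[k])
        if k < m - 1:
            # Coupling to the slower neighbour (weaker for deeper links).
            flow += g0 * 2.0 ** (-k) * (u[k + 1] - u[k])
        drive = grad_step if k == 0 else 0.0  # only the fast weight learns directly
        new_u[k] = u[k] + dt * flow / capacity + drive
    return new_u


# Push the fast weight in the same direction for 50 steps.
u = [0.0, 0.0, 0.0, 0.0]
for _ in range(50):
    u = consolidation_step(u, grad_step=0.1)

# Capacity-weighted total: the couplings only move this quantity along the
# chain, so it equals the total learning signal applied to the fast weight.
stored = sum((2.0 ** k) * u_k for k, u_k in enumerate(u))
```

In this toy chain the learning signal gradually seeps from the fast variable into the slower ones, and adding more variables extends the chain toward longer timescales.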
Across the different embodiments, we consistently observed that longer-timescale consolidation improved robustness under continual environmental drift.
But this raised another question:
What roles do the individual timescales actually play during learning?
To investigate this, we introduced a cross-attention mechanism over the multi-timescale Successor Features.
The reward representation was treated as the query, while SFs operating at different timescales served as keys and values. This allowed us to measure how much each timescale contributed throughout training.
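The attention readout described above can be sketched as plain scaled dot-product attention. This is a minimal illustration in pure Python that assumes the query, keys, and values share a dimension and omits the learned projection matrices a real cross-attention layer would typically have; the function names and example numbers are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention with a single query vector."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)  # one weight per timescale
    mixed = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return mixed, weights


# Hypothetical reward representation (query) and SFs at three timescales,
# ordered from fastest to slowest.
reward_repr = [1.0, 0.0]
sfs = [[0.2, 0.9], [1.0, 0.1], [2.0, 0.0]]

mixed_sf, weights = attend(reward_repr, keys=sfs, values=sfs)
```

Because the weights are non-negative and sum to one, tracking them over training gives a direct readout of how much each timescale contributes to the combined representation.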
This analysis provides evidence that long-timescale memories are not merely passive storage: even while the environment changes, the agent continues to draw on slowly accumulated predictive knowledge. These results suggest that slowly changing predictive structure remains useful despite persistent non-stationarity.
Is it simply about having more parameters?
One possible explanation for the improved performance is that the consolidation mechanism simply introduces additional parameters. To test this possibility, we scaled the baseline models so that their parameter count matched the consolidation-based models.
Surprisingly, simply increasing model capacity was insufficient.
Even with comparable numbers of parameters, the baseline models still struggled under continuous dynamics drift. This suggests that the robustness arises from the consolidation mechanism itself rather than from additional capacity alone.
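The parameter-matching control can be illustrated with a bit of arithmetic. The sketch below counts the parameters of a fully connected network and finds the widest baseline that fits a given budget. The layer sizes, the assumption of four consolidation variables per weight, and the function names are hypothetical and chosen only to show the calculation.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(
        (n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def matched_hidden_width(n_in, n_out, n_hidden_layers, target_params):
    """Largest hidden width whose parameter count stays within the target."""
    width = 1
    while (
        mlp_param_count([n_in] + [width + 1] * n_hidden_layers + [n_out])
        <= target_params
    ):
        width += 1
    return width


# Hypothetical base network: 17 inputs, two hidden layers of 256, 6 outputs.
base_params = mlp_param_count([17, 256, 256, 6])

# Assumption: the consolidation model keeps 4 variables per base parameter.
target_params = 4 * base_params

baseline_width = matched_hidden_width(17, 6, n_hidden_layers=2, target_params=target_params)
baseline_params = mlp_param_count([17, baseline_width, baseline_width, 6])
```

With these illustrative numbers, the widened baseline has roughly the same parameter budget as the consolidation model, so any remaining performance gap cannot be attributed to capacity alone.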
Towards lifelong learning systems
For years, continual learning research has largely focused on restoring plasticity. Our results suggest that under gradual and persistent change, the more fundamental challenge may be stability.
More importantly, not all knowledge is equally worth preserving. Predictive representations such as Successor Features appear to provide a particularly effective target for long-term memory, especially under severe non-stationarity.
More broadly, building AI systems capable of lifelong adaptation may require moving beyond purely plastic learners toward architectures that preserve predictive structure across multiple timescales, a principle that biological memory systems may have exploited all along.