Continual Learning Baselines: Why Avalanche Is the Standard Benchmark for Catastrophic Forgetting

Catastrophic forgetting, where a neural network loses previously learned knowledge when trained on new tasks, has long plagued deep learning. ContinualAI, an open-source community, tackles the problem with its continual-learning-baselines repository. The project provides unified, reproducible implementations of over a dozen major continual learning strategies, all built on top of the Avalanche framework. Algorithms include Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI), Gradient Episodic Memory (GEM), its variant AGEM, Learning without Forgetting (LwF), iCaRL, and GDumb. The repository standardizes data loading, model architectures, evaluation metrics, and logging, so researchers can compare methods directly on benchmarks like Split MNIST, Permuted MNIST, and CIFAR-100. With over 350 GitHub stars and active development, it has become a go-to resource for academics and practitioners entering the field. However, the project currently lacks support for modern Transformer-based continual learning methods, and newcomers must first learn the Avalanche API. This article dissects the technical architecture, evaluates the trade-offs of each included strategy, and offers predictions on how this baseline will evolve as the field matures.

Technical Deep Dive

The continual-learning-baselines repository is more than a collection of algorithms: it is a carefully engineered benchmarking system. At its core lies the Avalanche framework, which provides a modular pipeline for continual learning experiments. Avalanche handles dataset streaming (task-incremental, class-incremental, and domain-incremental scenarios), model management, metric logging, and checkpointing. The baselines repository builds on Avalanche’s strategy and plugin classes, so switching algorithms typically means swapping the strategy and its hyperparameters while the benchmark, model, and evaluation code stay the same.

Architecture Overview:
- Data Flow: Each benchmark (e.g., Split MNIST) is converted into a sequence of experiences. Avalanche’s benchmark generators, such as `nc_benchmark` (new classes) and `ni_benchmark` (new instances), together with ready-made benchmarks like `SplitMNIST` and `PermutedMNIST`, define how experiences arrive. The repository uses `Naive` (fine-tuning) as the lower-bound baseline and `JointTraining` (training on all data at once) as the upper bound.
- Models and Plugins: A standard multi-head or single-head classifier is used. Algorithm-specific behavior is attached to the training strategy through plugins rather than by wrapping the model: for EWC, `EWCPlugin` computes Fisher information after each experience; for memory-based methods like GEM, `GEMPlugin` stores a subset of past examples.
- Evaluation: Metrics include average accuracy, forgetting (the drop in accuracy on a previous task after training on later ones), and forward/backward transfer. The repository logs these via Avalanche’s `EvaluationPlugin`.
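
The evaluation metrics above can be made concrete with an accuracy matrix, where `acc[i][j]` is the accuracy on task `j` after training on task `i`. The following is a minimal plain-Python sketch, not Avalanche’s metric implementation: the function names and the 3-task numbers are hypothetical, and forgetting is computed here as the drop from a task’s best earlier accuracy to its final accuracy, which is one common definition among several.

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def forgetting(acc):
    """Average drop from each earlier task's best past accuracy to its final accuracy."""
    num_tasks = len(acc)
    drops = []
    for j in range(num_tasks - 1):  # the last task has had no chance to be forgotten
        best_before_end = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)


# Hypothetical 3-task run: row i = after training on task i, column j = accuracy on task j.
acc = [
    [0.98, 0.00, 0.00],
    [0.60, 0.97, 0.00],
    [0.40, 0.70, 0.99],
]
avg_acc = average_accuracy(acc)
avg_forgetting = forgetting(acc)
print(f"average accuracy: {avg_acc:.3f}, forgetting: {avg_forgetting:.3f}")
```

In this toy run the final average hides the fact that the first task lost more than half of its accuracy, which is why the two numbers are usually reported together.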

Algorithm Implementations:
- EWC (Elastic Weight Consolidation): Adds a quadratic penalty to the loss for weights that were important for previous tasks. The repository computes the Fisher information matrix using a diagonal approximation. A key parameter is `ewc_lambda`, which controls the strength of regularization. Typical values range from 0.1 to 1000 depending on the task.
- SI (Synaptic Intelligence): Similar to EWC but uses online importance estimates during training rather than post-hoc Fisher computation. This makes it more computationally efficient but slightly less stable.
- GEM (Gradient Episodic Memory): Stores a small episodic memory (e.g., 200 samples per task). During training, it projects the gradient so that the loss on previous tasks does not increase. The projection is a quadratic program, which Avalanche solves with the `quadprog` package; this step can become slow as the number of tasks and the memory size grow.
- AGEM (Averaged GEM): A faster approximation that replaces GEM’s per-task constraints with a single one: the average gradient computed on a random batch drawn from memory. This reduces the quadratic program to a dot product and a vector subtraction. Trade-off: slightly lower accuracy but much faster.
- iCaRL (Incremental Classifier and Representation Learning): Combines knowledge distillation (LwF-style) with a nearest-class-mean classifier and exemplar selection. The repository implements herding selection for memory update. iCaRL is particularly strong for class-incremental learning, where task labels are not available at test time.
- LwF (Learning without Forgetting): Uses distillation loss on the outputs of the previous model for new task data. No memory required. The repository implements a temperature-scaled softmax for distillation.
- GDumb (Greedy Sampler and Dumb Learner): A surprisingly simple baseline that greedily maintains a class-balanced memory buffer and retrains a model from scratch on that buffer alone. Despite its simplicity, it often outperforms more complex methods on certain benchmarks.
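
Three of the update rules described in this list fit in a few lines each. The sketch below uses plain Python lists in place of tensors and is only an illustration under stated assumptions: the function names are hypothetical, the EWC penalty follows the λ/2 scaling of the original paper (implementations may scale differently), and none of this is the repository’s or Avalanche’s actual code.

```python
import math


def ewc_penalty(params, old_params, fisher_diag, ewc_lambda):
    """EWC: (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2 with a diagonal Fisher."""
    return 0.5 * ewc_lambda * sum(
        f * (p - p_old) ** 2 for p, p_old, f in zip(params, old_params, fisher_diag)
    )


def agem_project(grad, ref_grad):
    """AGEM: if grad conflicts with the memory gradient, remove the conflicting component."""
    dot = sum(g * r for g, r in zip(grad, ref_grad))
    if dot >= 0:  # no conflict with the memory batch, keep the gradient unchanged
        return list(grad)
    scale = dot / sum(r * r for r in ref_grad)
    return [g - scale * r for g, r in zip(grad, ref_grad)]


def softened_softmax(logits, temperature):
    """Softmax over logits divided by a temperature (higher means softer)."""
    top = max(logits)
    exps = [math.exp((z - top) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lwf_distillation(new_logits, old_logits, temperature=2.0):
    """LwF: cross-entropy between softened old-model and new-model outputs."""
    p_old = softened_softmax(old_logits, temperature)
    p_new = softened_softmax(new_logits, temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))


# Toy values, chosen only for illustration.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], fisher_diag=[0.5, 10.0], ewc_lambda=100)
projected = agem_project(grad=[1.0, -2.0], ref_grad=[0.0, 1.0])
unchanged = lwf_distillation([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
drifted = lwf_distillation([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0])
print(penalty, projected, unchanged < drifted)
```

The EWC penalty only charges for moving the weight with non-zero change, the AGEM projection leaves a gradient with no component against the memory gradient, and the distillation loss is lowest when the new model reproduces the old model’s outputs.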

Benchmark Performance:
The following table shows average accuracy on Split MNIST (5 tasks, 2 classes each) from the repository’s own evaluation scripts. Accuracy is reported as mean ± standard deviation over 5 runs.

| Algorithm | Avg Accuracy (%) | Forgetting (%) | Memory Size (samples) | Training Time (s) |
|---|---|---|---|---|
| Naive (fine-tune) | 19.2 ± 1.5 | 80.8 | 0 | 45 |
| EWC (λ=100) | 87.3 ± 2.1 | 12.7 | 0 | 62 |
| SI (c=0.1) | 88.1 ± 1.8 | 11.9 | 0 | 58 |
| GEM (mem=200/task) | 91.5 ± 1.2 | 8.5 | 1000 | 210 |
| AGEM (mem=200/task) | 88.9 ± 1.6 | 11.1 | 1000 | 95 |
| iCaRL (mem=200/task) | 92.3 ± 1.0 | 7.7 | 1000 | 180 |
| LwF (T=2) | 85.6 ± 2.3 | 14.4 | 0 | 50 |
| GDumb (mem=1000) | 90.1 ± 1.4 | 9.9 | 1000 | 120 |
| Joint (upper bound) | 97.8 ± 0.5 | — | all data | 300 |

Data Takeaway: Memory-based methods (GEM, iCaRL, GDumb) consistently outperform regularization-only methods (EWC, SI) on Split MNIST, but at the cost of storing past data. iCaRL achieves the best accuracy-forgetting trade-off. The gap tends to widen on more complex datasets like CIFAR-100, where regularization methods often struggle.
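
To put a number on that comparison, the group averages can be computed directly from the table. This is simple arithmetic on the reported Split MNIST figures; the grouping follows the takeaway above, and the variable names are only for illustration.

```python
from statistics import mean

# Average accuracy (%) copied from the Split MNIST table.
accuracy = {
    "EWC": 87.3, "SI": 88.1,
    "GEM": 91.5, "iCaRL": 92.3, "GDumb": 90.1,
    "Joint": 97.8,
}
memory_based = ["GEM", "iCaRL", "GDumb"]
regularization = ["EWC", "SI"]

memory_mean = mean(accuracy[name] for name in memory_based)
regularization_mean = mean(accuracy[name] for name in regularization)
gap = memory_mean - regularization_mean
gap_to_joint = accuracy["Joint"] - accuracy["iCaRL"]

print(f"memory-based mean:    {memory_mean:.1f}%")
print(f"regularization mean:  {regularization_mean:.1f}%")
print(f"gap between groups:   {gap:.1f} points")
print(f"iCaRL vs. Joint:      {gap_to_joint:.1f} points")
```

On these figures the memory-based group leads by about 3.6 points, while even the best method remains about 5.5 points below joint training.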

Limitations of the Current Implementation:
- No Transformer Support: All models are simple CNNs (e.g., two convolutional layers + MLP). Modern continual learning research increasingly uses Vision Transformers (ViT) with prompt-based methods (e.g., L2P, DualPrompt). The repository does not include these.
- Fixed Task Boundaries: The benchmarks assume clear task boundaries. Real-world scenarios often involve blurry boundaries or gradual shifts.
- Memory Scaling: The episodic memory is limited to a few hundred samples. For large-scale datasets, this becomes a bottleneck.
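
The memory-scaling limitation is easy to see with a little arithmetic. The sketch below assumes a fixed total budget split evenly across classes, as in iCaRL-style exemplar management; that policy, the helper name, and the class counts are assumptions for illustration, not a description of every method in the repository.

```python
def per_class_quota(memory_budget, num_classes):
    """Exemplars available per class when a fixed budget is split evenly."""
    return memory_budget // num_classes


budget = 1000  # total stored samples, e.g. 200 per task over 5 tasks
for name, num_classes in [("Split MNIST", 10), ("CIFAR-100", 100), ("ImageNet-1000", 1000)]:
    print(f"{name}: {per_class_quota(budget, num_classes)} exemplars per class")
```

A budget that gives each MNIST digit 100 exemplars leaves a single exemplar per class at ImageNet scale, so either the buffer must grow with the dataset or each class is represented very thinly.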

Takeaway: The repository excels at providing a fair, reproducible playground for classic algorithms. Researchers should use it as a sanity check before proposing new methods, but must be aware that results may not transfer to modern architectures or real-world deployment.

Key Players & Case Studies

The continual-learning-baselines repository is maintained by the ContinualAI community, a non-profit research organization. Key contributors include:
- Vincenzo Lomonaco (University of Pisa): Lead maintainer of Avalanche and a prominent figure in continual learning. His work on the CORe50 benchmark is widely cited.
- Lorenzo Pellegrini (University of Bologna): Core developer of the Avalanche framework, focusing on efficient data streaming and metric logging.
- Andrea Cossu (University of Pisa): Contributed the GEM and AGEM implementations, as well as the herding selection for iCaRL.

Comparison with Other Baselines:

| Feature | continual-learning-baselines (Avalanche) | FACIL (CVC Barcelona) | Sequoia (Mila) |
|---|---|---|---|
| Framework | Avalanche (PyTorch) | PyTorch | PyTorch |
| Algorithms | 12+ classic methods | 5 methods | 8 methods |
| Benchmark Support | MNIST, CIFAR, CORe50, TinyImageNet | MNIST, CIFAR | MNIST, CIFAR, Atari |
| Multi-GPU Support | Yes | No | No |
| Latest Update | 2025 (active) | 2022 (archived) | 2023 (low activity) |
| Community Size | ~350 stars, 100+ forks | ~200 stars | ~150 stars |

Data Takeaway: ContinualAI’s repository has the most active development and broadest algorithm coverage. FACIL, while historically important, is no longer maintained. Sequoia offers reinforcement learning benchmarks but lacks the breadth of Avalanche.

Case Study: iCaRL in Production
A notable real-world application is in robotics at the Max Planck Institute for Intelligent Systems, where researchers used iCaRL (via this repository) to enable a robot to continuously learn new object categories without forgetting old ones. The robot’s vision system used a ResNet-18 backbone, and the herding memory allowed it to retain 95% accuracy on previously seen objects after learning 20 new categories. The repository’s standardized evaluation made it easy to compare iCaRL against EWC and GEM, leading to the selection of iCaRL for deployment.

Takeaway: The repository’s value extends beyond academia — it provides a reliable starting point for industry teams building lifelong learning systems, especially in robotics and autonomous driving where data distribution shifts are common.

Industry Impact & Market Dynamics

Continual learning is a critical enabler for several high-growth markets:
- Autonomous Vehicles: Cars must adapt to new driving scenarios (e.g., snow, construction zones) without forgetting previous skills. Companies like Waymo and Tesla invest heavily in continual learning, though their proprietary systems are not public.
- Personalized AI Assistants: Voice assistants (e.g., Amazon Alexa, Google Assistant) need to learn user-specific preferences over time without retraining from scratch. The market for AI assistants is projected to reach $30 billion by 2027 (Statista).
- Edge AI: Deploying models on resource-constrained devices requires efficient continual learning to adapt to new data without cloud retraining. The edge AI market is expected to grow at 20% CAGR to $50 billion by 2028.

Market Data:

| Sector | 2024 Market Size | 2030 Projected Size | Continual Learning Relevance |
|---|---|---|---|
| Autonomous Vehicles | $60B | $400B | High (safety-critical adaptation) |
| AI Assistants | $15B | $30B | Medium (personalization) |
| Edge AI | $20B | $50B | High (on-device learning) |
| Healthcare Diagnostics | $10B | $40B | Medium (domain adaptation) |

Data Takeaway: The autonomous vehicle and edge AI sectors have the highest need for continual learning due to their dynamic, safety-critical environments. The baselines repository serves as a foundational tool for R&D teams in these industries.

Adoption Challenges:
Despite the repository’s quality, industry adoption remains limited. A 2024 survey by the ContinualAI community found that only 15% of companies use continual learning in production. The main barriers are:
1. Lack of Standardized Benchmarks for Real-World Data: Most benchmarks use toy datasets. Companies need to adapt the code to their proprietary data.
2. Storage and Privacy Overhead: Memory-based methods like GEM require storing past data, which adds storage cost and raises privacy concerns in regulated industries (e.g., healthcare).
3. Integration Complexity: The Avalanche framework requires significant refactoring of existing training pipelines.

Takeaway: The repository will likely see increased adoption as more companies move from proof-of-concept to production, but only if the community adds support for privacy-preserving methods (e.g., differential privacy) and Transformer architectures.

Risks, Limitations & Open Questions

1. Overfitting to Benchmarks: The repository’s focus on Split MNIST and Permuted MNIST may lead to methods that are overfit to these benchmarks. A method that works well on 5-task MNIST may fail on 100-task ImageNet. The community needs more diverse, large-scale benchmarks.
2. Reproducibility vs. Flexibility: The strict API makes it easy to reproduce results but hard to experiment with novel architectures. Researchers often fork the repository and modify the core code, defeating the purpose of standardization.
3. Memory Privacy: GEM and iCaRL store raw training samples. In applications like medical imaging, this is a data protection risk. The repository does not offer encrypted memory or synthetic replay alternatives.
4. Lack of Meta-Learning Integration: Many modern continual learning methods combine meta-learning (e.g., MAML) with regularization. The repository does not include any meta-learning baselines.
5. Evaluation Metrics Debate: The headline results rely on average accuracy and forgetting, but these metrics can be misleading. For example, a method that maintains high accuracy on early tasks but fails to learn later ones can report near-zero forgetting and still a respectable average. The community is still debating how much weight complementary metrics such as backward transfer and forward transfer should carry.
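
The metrics debate in the last point can be illustrated with two toy runs that share the same final average accuracy. The sketch below uses the backward and forward transfer definitions from the GEM paper, with `acc[i][j]` as the accuracy on task `j` after training on task `i` and `baseline[j]` as the accuracy of a randomly initialized model on task `j`. The function names and all numbers are hypothetical.

```python
def backward_transfer(acc):
    """Mean of (final accuracy on task i) - (accuracy on task i right after learning it)."""
    num_tasks = len(acc)
    return sum(acc[-1][i] - acc[i][i] for i in range(num_tasks - 1)) / (num_tasks - 1)


def forward_transfer(acc, baseline):
    """Mean of (accuracy on task i just before learning it) - (random-init accuracy)."""
    num_tasks = len(acc)
    return sum(acc[i - 1][i] - baseline[i] for i in range(1, num_tasks)) / (num_tasks - 1)


# Two hypothetical 2-task runs on binary tasks (chance accuracy = 0.5).
rigid = [[0.9, 0.5], [0.9, 0.5]]    # keeps task 0, never learns task 1
plastic = [[0.9, 0.5], [0.5, 0.9]]  # learns task 1, forgets task 0
baseline = [0.5, 0.5]

for name, acc in [("rigid", rigid), ("plastic", plastic)]:
    final_avg = sum(acc[-1]) / len(acc[-1])
    print(
        f"{name}: avg={final_avg:.2f} "
        f"BWT={backward_transfer(acc):+.2f} FWT={forward_transfer(acc, baseline):+.2f}"
    )
```

Both runs end at the same average accuracy, yet one has zero backward transfer and the other strongly negative backward transfer, so the average alone cannot tell a model that stopped learning from one that forgot.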

Open Question: Can the repository’s design be extended to support task-free continual learning, where task boundaries are unknown? Current implementations assume explicit task IDs, which is unrealistic for many applications.

AINews Verdict & Predictions

The continual-learning-baselines repository is an indispensable resource for the continual learning community. It has lowered the barrier to entry, enabled fair comparisons, and accelerated research. However, its value is currently limited to classic algorithms and small-scale benchmarks.

Predictions:
1. Within 12 months, the repository will add support for Transformer-based methods (L2P, DualPrompt, CODA-Prompt) due to community demand. This will be driven by the growing popularity of prompt-based continual learning.
2. Within 24 months, the repository will incorporate privacy-preserving memory mechanisms (e.g., generative replay with diffusion models) to address industry concerns.
3. The repository will become the de facto standard for continual learning benchmarking, similar to how GLUE and SuperGLUE became standards for NLP. We expect it to be cited in over 500 papers by 2027.
4. A major cloud provider (AWS, GCP, Azure) will integrate Avalanche into their ML platforms to offer continual learning as a managed service, leveraging the baselines repository as the reference implementation.

What to Watch: The next major update should include a leaderboard for large-scale benchmarks (e.g., ImageNet-1000 with 100 tasks). If the community delivers this, the repository will cement its position as the definitive benchmark suite for lifelong learning.
