Highlights

  • Today, we officially release ERNIE 5.1. While inheriting the pre-training foundation of ERNIE 5.0, it compresses total parameters to approximately one-third and active parameters to approximately one-half, achieving leading foundational performance at its model scale using only about 6% of the pre-training cost of comparable models.
    To advance the evolution of large models toward autonomous decision-making agents, we built an entirely new disaggregated fully-asynchronous reinforcement learning infrastructure, specifically addressing the global optimization challenges posed by training-inference divergence, low resource utilization, and long-tail effects.
    On this foundation, through scaled agentic post-training combined with an end-to-end synergy strategy across environment, expert, and integration stages, we achieved a dual leap in both training efficiency and model capability, ensuring that the model maintains exceptional stability and outstanding performance even when handling complex long-tail tasks.
    As one of the current cost-performance benchmarks among Chinese-developed large models, ERNIE 5.1 achieves a leap forward in parameter efficiency and training cost optimization while maintaining flagship-level intelligence. Its performance has been validated on internationally authoritative leaderboards: on May 9, ERNIE 5.1 scored 1,223 to claim 4th place globally and 1st among Chinese models on the Arena Search leaderboard.
    Figure: ERNIE 5.1 on the Arena Search leaderboard
  • ERNIE 5.1: Outstanding Agent and Reasoning Capabilities, with World Knowledge Ranking Among Top-Tier Models
    ERNIE 5.1 delivers strong results across multiple authoritative industry benchmarks, particularly in agentic capabilities, knowledge, reasoning, and deep search:
    1. Outstanding agentic capabilities on par with the world’s top models: On the τ³-bench and SpreadsheetBench-Verified agent evaluation tasks, ERNIE 5.1 surpasses DeepSeek-V4-Pro, with agentic capabilities approaching those of leading closed-source models. It also performs exceptionally well on the Search Arena leaderboard.
    2. Leading world knowledge and creative writing capabilities: On GPQA and MMLU-Pro evaluations, ERNIE 5.1 approaches the performance of leading closed-source models. In internal evaluations, ERNIE 5.1’s creative writing capabilities approach those of Gemini 3.1 Pro.
    3. Reasoning capabilities approaching leading closed-source models: On AIME26 (with tool use), a challenging mathematical competition benchmark, ERNIE 5.1 scores 99.6 — second only to Gemini 3.1 Pro.
    Figure: ERNIE 5.1 Benchmark
  • Multi-Dimensional Elastic Pre-Training: Pre-training Compute Cost at Only 6% of Comparable Models
    ERNIE 5.1 is derived from ERNIE 5.0: it extracts the optimal sub-network architecture from ERNIE 5.0’s multi-dimensional elastic sub-model matrix, inheriting the knowledge and capabilities encoded in ERNIE 5.0 while significantly reducing pre-training cost. The R&D team proposed a Once-For-All elastic training framework. Whereas traditional approaches require separate pre-training runs for models at different scales, ERNIE 5.0 uses a dynamic sampling mechanism to jointly optimize a large number of sub-models with varying depths, expert capacities, and routing sparsity levels within a single pre-training run, constructing a sub-model matrix that spans diverse parameter scales and computational budgets. Throughout this process, the model supports elastic compression and expansion along three dimensions:
    Elastic depth: During training, the number of active Transformer layers is randomly varied, enabling sub-models at different depths to share weights and adaptively learn a balance between deep and shallow representations.
    Elastic width / expert capacity: The effective expert capacity in MoE layers is elastically controlled by varying the number of experts participating in routing. By dynamically sampling subsets of experts, the model learns to operate under both full and reduced expert-pool configurations, thereby improving expert utilization efficiency.
    Elastic sparsity: Through a variable Top-k routing mechanism, the number of activated experts is flexibly adjusted. Activating fewer experts reduces inference cost and improves decoding efficiency, while activating more enhances model capability, achieving a dynamic trade-off between inference overhead and performance.
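The three elastic dimensions above can be illustrated with a small sketch. This is not ERNIE's implementation: the names `SubModelConfig`, `sample_submodel`, and `route_top_k` are hypothetical, and uniform sampling of depth, expert subset, and Top-k is an assumption, since the source does not specify the sampling distribution.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubModelConfig:
    """One sampled sub-model (hypothetical structure, for illustration only)."""
    num_layers: int        # elastic depth: how many Transformer layers are active
    active_experts: tuple  # elastic width: expert ids allowed to take part in routing
    top_k: int             # elastic sparsity: experts activated per token


def sample_submodel(rng, max_layers, max_experts, max_top_k):
    """Draw one sub-model configuration per training step (uniform sampling is assumed)."""
    num_layers = rng.randint(1, max_layers)
    num_experts = rng.randint(1, max_experts)
    active = tuple(sorted(rng.sample(range(max_experts), num_experts)))
    # Top-k can never exceed the size of the sampled expert pool.
    top_k = rng.randint(1, min(max_top_k, num_experts))
    return SubModelConfig(num_layers, active, top_k)


def route_top_k(router_logits, active_experts, top_k):
    """Select the top_k active experts for one token and softmax-normalize their logits."""
    chosen = sorted(active_experts, key=lambda e: router_logits[e], reverse=True)[:top_k]
    peak = max(router_logits[e] for e in chosen)
    exps = {e: math.exp(router_logits[e] - peak) for e in chosen}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}
```

Calling `sample_submodel(random.Random(0), max_layers=24, max_experts=16, max_top_k=4)` yields one configuration; the limits 24, 16, and 4 are placeholder values, not ERNIE's actual dimensions. Because every configuration indexes into the same full set of layers and experts, all sub-models share weights, which is the property the text attributes to the elastic training run.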
  • A Multi-Stage Reinforcement Learning Training Pipeline Centered on OPD, Ensuring Comprehensive Capability Integration
    The post-training of conventional large language models (LLMs) typically follows a sequential pipeline, progressing from supervised fine-tuning (SFT) to multi-stage mixed reinforcement learning (Mixed RL). However, as model capabilities continue to scale, this sequential training paradigm has increasingly become a bottleneck, severely hindering the efficiency of research, development, and iteration. Moreover, attempting to fuse all capabilities within a single training stage introduces severe multi-objective optimization conflicts, making it extremely difficult to balance performance across different domain tasks and achieve Pareto optimality — improvements in one capability often come at the cost of regressions in another (i.e., the “seesaw” effect).
    To overcome these fundamental challenges, we propose a multi-stage reinforcement learning training pipeline centered on Multi-Teacher On-Policy Distillation (MOPD). This pipeline significantly accelerates the R&D cycle through parallelized expert model training while ensuring comprehensive and conflict-free capability integration. Specifically, the post-training pipeline of ERNIE 5.1 is a four-stage process that decouples expert training from unified capability fusion:
    Stage 1: Unified Supervised Fine-Tuning (SFT). High-quality multi-domain instruction data is leveraged for fine-tuning, establishing the model’s foundational capabilities in instruction following and tool invocation, which serve as the initialization checkpoint for subsequent capability expansion.
    Stage 2: Domain Expert Model Training. Multiple domain-specific expert models (e.g., code, reasoning, agentic tasks) are trained in parallel. Each direction independently customizes its dedicated reward signals and training algorithms, fundamentally eliminating mutual interference across heterogeneous tasks.
    Stage 3: On-Policy Distillation (OPD). With the unified SFT model as the student and multiple domain expert models as teachers, the student samples from its own policy distribution and concurrently learns from multiple teachers’ capabilities via token-level reverse KL divergence, efficiently consolidating the capabilities of diverse experts into a unified parameter space.
    Stage 4: General Online Reinforcement Learning (General-RL). Following the initial OPD stage, we deliberately introduce an online RL phase tailored for general-purpose conversational scenarios. Our experiments reveal that not all tasks are amenable to capability fusion via token-level KL-based OPD. Specifically, tasks characterized by high-entropy output distributions, such as open-ended chat or creative writing, tend to suffer from low distillation efficiency, and distilling them may over-smooth the output probability distribution. To address this, we forgo distillation for this domain and instead apply online RL on top of the post-OPD model. This stage strengthens the model’s instruction following, generation diversity, and alignment with human preferences, substantially enhancing general-purpose competence while preserving the expert capabilities acquired in earlier stages.
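The token-level reverse KL objective named in Stage 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training code: the function names are hypothetical, logits are plain Python lists over a toy vocabulary, and the sketch scores a student-sampled sequence against a single teacher, because the source does not say how each sample is matched to a teacher.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position.

    The expectation is taken under the student's distribution, which is what
    makes this the reverse direction of the divergence.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def opd_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL along one sequence sampled from the student.

    Both arguments hold one logit vector per generated token; the teacher's
    logits are assumed to be computed on the student's own sampled prefix.
    """
    if not student_steps or len(student_steps) != len(teacher_steps):
        raise ValueError("need equal-length, non-empty logit sequences")
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)
```

The loss is zero when the student already matches the teacher at every position and positive otherwise. Because the weighting comes from the student's own probabilities, tokens the student rarely produces contribute little, which is one common reading of why this objective suits on-policy samples.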
    Figure: Illustration of ERNIE 5.1 Post-Training Pipeline
  • Outstanding Creative Capabilities
    Through iterative optimization of the technical architecture and targeted refinement of core technologies, ERNIE 5.1 delivers a comprehensive upgrade in foundational capabilities while also excelling in creative performance.
    Whether the task calls for precise alignment of inspiration, emotion, and expression in creative writing, coordinated control of logic, character, and pacing in long-form narrative, or a balance between knowledge accuracy and stylistic adaptability in professional content, ERNIE 5.1 consistently looks past users’ surface-level requests to capture their core intent, producing work that is warm, deep, and logical. This closed-loop capability, from intent insight to content creation, not only achieves precise synergy between comprehension and generation at the technical level but has also earned recognition from creative enterprises, content platforms, and professional writers, who regard it as a benchmark creative model that understands users, content, and context.
    Figure: ERNIE 5.1 Creative Capabilities