[RL4LLM] 异步RL框架: Areal
https://github.com/inclusionAI/AReaL 纯异步RL方案 异步PPO训练调用流程 graph TD A[用户执行: examples/run_async_ppo.sh] --> B[training/main_async_ppo.py] B --> C[AsyncPPOMATHConfig配置解析] C --> D[training/utils.py: run_experiment] D --> E[Ray初始化] E --> F[exp_cfg.initial_setup] F --> G[AsyncRLExperimentConfig.initial_setup] G --> H[创建ExperimentConfig] H --> I[启动Workers] I --> J[MasterWorker] I --> K[ModelWorker] I --> L[GenerationServer] I --> M[GserverManager] I --> N[RolloutWorker] %% MasterWorker训练流程 J --> J1[MasterWorker._poll_async] J1 --> J2[FunctionExecutor.execute_step] J2 --> J3[执行数据流图遍历] J3 --> J4[发送训练请求到ModelWorker] %% ModelWorker处理流程 K --> K1[ModelWorker._poll] K1 --> K2[接收MasterWorker请求] K2 --> K3[处理训练/推理请求] K3 --> K4[执行模型前向/反向传播] %% Rollout流程 N --> N1[RolloutWorker._poll_async] N1 --> N2[load_next_data] N2 --> N3[allocate_new_rollout] N3 --> N4[agent.collect_trajectory] N4 --> N5[env.step计算奖励] N5 --> N6[推送数据到训练端] %% 生成服务器流程 L --> L1[GenerationServer._poll] L1 --> L2[启动SGLang子进程] L2 --> L3[处理生成请求] %% 生成服务器管理器 M --> M1[GserverManager._poll] M1 --> M2[HTTP服务线程] M2 --> M3[请求调度和权重更新] %% 数据流 N6 --> O[stream_dataset.py] O --> J4 %% 异步通信 J4 -.->|异步请求| K2 N3 -.->|HTTP请求| M2 M2 -.->|调度请求| L3 %% 权重更新 K4 --> P[参数更新] P --> Q[权重同步] Q --> M3 M3 --> R[更新生成服务器权重] style A fill:#e1f5fe style J fill:#f3e5f5 style K fill:#e8f5e8 style L fill:#fff3e0 style M fill:#fce4ec style N fill:#f1f8e9 用户入口到配置解析 examples/run_async_ppo.sh → training/main_async_ppo.py ...