LLM Architecture Calculator

From config.json to inference memory: a quantitative breakdown of 8 open-source SOTA models (Nemotron-3-Ultra / MiniMax-M3 / Kimi-K2.5 / DeepSeek-V4-Flash / GLM-5.1 / MiMo-V2-Flash / Qwen3.5-MoE / MiniMax-M2.7)
Architecture
Attention Mamba MoE/FFN Dense DeltaNet
Model Info
Vendor / Year
Params (Total / Active)
Context Window
Hidden Dim (d)
Layers (L)
Attention Type
Heads (Q / KV)
MoE Experts
License
Parameter Decomposition
Module | Params | %
Sum
FLOPs / Token
Component | FLOPs | %
Total
KV Cache
Per-sample KV cache
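
The per-sample KV cache figure can be approximated with the standard formula for multi-head or grouped-query attention. This is a minimal sketch and not necessarily the formula this calculator uses: the function name and arguments are hypothetical, and it assumes every counted layer is a regular attention layer storing one K and one V tensor. Mamba or DeltaNet layers, and compressed schemes such as MLA, would need a different accounting.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes for standard MHA/GQA attention.

    Assumption: each attention layer caches two tensors (K and V) of shape
    [batch, kv_heads, seq_len, head_dim]. bytes_per_elem=2 means FP16/BF16.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical config (not one of the listed models):
# 32 attention layers, 8 KV heads, head_dim 128, 4096-token context.
per_sample = kv_cache_bytes(attn_layers=32, kv_heads=8, head_dim=128, seq_len=4096)
print(per_sample / 2**30, "GiB")  # 0.5 GiB
```

Because the size scales linearly with `kv_heads`, grouped-query attention with fewer KV heads than Q heads shrinks the cache proportionally.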
Inference Memory Footprint
Weights
KV Cache (× Batch)
Activations
Total Footprint
Minimum Deployment
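
The footprint rows above add up in a straightforward way: weights, plus the per-sample KV cache multiplied by the batch size, plus activations. The sketch below illustrates that sum and a naive minimum device count. All names are hypothetical, the activation size is taken as an input rather than derived, and the device count ignores parallelism overhead and fragmentation, so it is a lower bound and not the page's own deployment rule.

```python
import math


def inference_footprint(total_params, bytes_per_param,
                        kv_bytes_per_sample, batch, activation_bytes=0):
    """Sum the three footprint components (all sizes in bytes)."""
    weights = total_params * bytes_per_param
    kv_cache = kv_bytes_per_sample * batch
    return {
        "weights": weights,
        "kv_cache": kv_cache,
        "activations": activation_bytes,
        "total": weights + kv_cache + activation_bytes,
    }


def min_devices(total_bytes, device_mem_bytes):
    """Lower bound on device count: total memory divided by per-device memory."""
    return math.ceil(total_bytes / device_mem_bytes)


# Hypothetical example: 7B parameters in 16-bit precision,
# 0.5 GiB of KV cache per sample, batch size 4, 80 GB devices.
fp = inference_footprint(7_000_000_000, 2, 536_870_912, batch=4)
print(fp["total"])                               # 16147483648
print(min_devices(fp["total"], 80_000_000_000))  # 1
```

For MoE models the weight term uses total parameters, not active parameters, since all experts must be resident in memory.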

Training Cost Estimation

Select a model size and hardware configuration to estimate training time and cost.
15% / 25% / 35% / 45% / 60%
Training Tokens
Total Training FLOPs
Estimated Training Time
Estimated Cost
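
A common back-of-the-envelope way to produce these four numbers is the 6·N·D rule: training FLOPs ≈ 6 × parameters × tokens, divided by the cluster's effective throughput. The sketch below assumes that rule; it is not confirmed to be the formula this page uses. The function name, the peak throughput, the utilization factor `mfu`, and the hourly price are all hypothetical inputs. For MoE models, N would typically be the active parameter count.

```python
def training_estimate(params, tokens, num_gpus, peak_flops_per_gpu,
                      mfu, price_per_gpu_hour):
    """Rough training cost estimate using the 6*N*D approximation.

    params: parameters used per token (active parameters for MoE)
    tokens: number of training tokens
    peak_flops_per_gpu: peak FLOP/s of one accelerator (assumed value)
    mfu: fraction of peak throughput actually achieved, between 0 and 1
    """
    total_flops = 6 * params * tokens
    effective_flops_per_sec = num_gpus * peak_flops_per_gpu * mfu
    hours = total_flops / effective_flops_per_sec / 3600
    gpu_hours = hours * num_gpus
    return {
        "total_flops": total_flops,
        "hours": hours,
        "gpu_hours": gpu_hours,
        "cost": gpu_hours * price_per_gpu_hour,
    }


# Hypothetical run: 7B parameters, 1T tokens, 1024 GPUs at 1e15 FLOP/s peak,
# 35% utilization, 2.00 currency units per GPU-hour.
est = training_estimate(7e9, 1e12, 1024, 1e15, 0.35, 2.0)
print(f"{est['total_flops']:.2e} FLOPs, {est['hours']:.1f} h, cost {est['cost']:.0f}")
```

Doubling the utilization halves both wall-clock time and cost, which is why the estimate is so sensitive to that single input.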

Note on formula accuracy: