Simplified MLA Architecture Diagram: Key Parameters & Tensor Flow

MLA (Multi-head Latent Attention): Simplified Data Flow Diagram

Non-absorbed path, using DeepSeek-V3 parameters as an example (h=128, d_c=512, d_rope=64, d_head=128)

Tensor (shape)

Linear projection

Operation node

KV cache storage

Attention computation
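
The parameters in the caption are easier to read with the arithmetic spelled out. The following is a minimal sketch, not taken from the diagram itself, of the per-token, per-layer cache size and the expanded K/V shapes on the non-absorbed path. It assumes that the value head dimension equals d_head, and it uses a standard multi-head attention cache (full K and V for every head) purely as a comparison baseline. All variable names are illustrative.

```python
# Parameters from the caption (DeepSeek-V3 example).
h, d_c, d_rope, d_head = 128, 512, 64, 128

# MLA caches the compressed KV latent (d_c) plus one RoPE key shared by all heads (d_rope).
mla_cache_elems = d_c + d_rope

# Comparison baseline (assumption): standard MHA caching full K and V per head,
# with the value head dimension taken to be d_head.
mha_cache_elems = 2 * h * d_head

# Non-absorbed path: K and V are re-expanded from the latent before attention.
# Each head's key is its no-RoPE part (d_head) concatenated with the shared RoPE key (d_rope).
k_shape = (h, d_head + d_rope)
v_shape = (h, d_head)  # assumes value head dim == d_head

print("MLA cache elements per token per layer:", mla_cache_elems)
print("MHA cache elements per token per layer:", mha_cache_elems)
print("Expanded K shape per token:", k_shape)
print("Expanded V shape per token:", v_shape)
```

Under these assumptions the cache holds 576 elements per token per layer, against 32768 for the baseline, while the expanded keys seen by the attention computation have 192 dimensions per head.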