DeepSeek open-sources DeepEP – a library for MoE training and inference

DeepEP is a communication library tailored for Mixture-of-Experts (MoE) models and expert parallelism. It provides high-throughput, low-latency all-to-all GPU kernels for the MoE dispatch and combine operations, with support for low-precision FP8. DeepEP also ships kernels aligned with the group-limited gating algorithm from the DeepSeek-V3 paper, which forward data efficiently across asymmetric bandwidth domains such as NVLink to RDMA. For latency-sensitive inference decoding, it offers low-latency kernels, control over the number of SMs used, and a hook-based method for overlapping communication with computation. Note that some kernels rely on instructions whose behavior is undefined on non-Hopper architectures; within those limits, DeepEP is a versatile tool for optimizing MoE training and inference.
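To make the dispatch/combine terminology concrete, here is a minimal single-GPU PyTorch sketch of the pattern that DeepEP's kernels accelerate. This is not DeepEP's API; the function name and structure are illustrative assumptions. Dispatch routes each token to its top-k experts (an all-to-all across GPUs under expert parallelism), and combine gathers the gate-weighted expert outputs back into token order.

```python
# Illustrative reference only -- not DeepEP's API. Shows what the
# dispatch/combine communication kernels compute, on a single GPU.
import torch

def moe_dispatch_combine(x, gate_logits, experts, top_k=2):
    # x: [num_tokens, hidden]; gate_logits: [num_tokens, num_experts]
    weights, expert_ids = torch.topk(gate_logits.softmax(dim=-1), top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k gates

    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        # Find (token, slot) pairs routed to expert e.
        token_idx, slot = (expert_ids == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        # "Dispatch": gather the tokens routed to expert e.
        # (With expert parallelism this is an all-to-all across GPUs.)
        expert_out = expert(x[token_idx])
        # "Combine": scatter-add gate-weighted outputs back to token order.
        out.index_add_(0, token_idx,
                       expert_out * weights[token_idx, slot].unsqueeze(-1))
    return out

# Usage: 4 tiny linear "experts" over 8 random tokens.
experts = [torch.nn.Linear(16, 16) for _ in range(4)]
x = torch.randn(8, 16)
gate_logits = torch.randn(8, 4)
y = moe_dispatch_combine(x, gate_logits, experts)
print(y.shape)  # torch.Size([8, 16])
```

DeepEP replaces the gather/scatter loop above with fused communication kernels, which is where the FP8 support and communication-computation overlap pay off.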

https://github.com/deepseek-ai/DeepEP
