Transformer-related optimizations, including BERT and GPT
Medusa: A Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
A high-throughput and memory-efficient inference and serving engine for LLMs
A framework for few-shot evaluation of autoregressive language models.
New blog site using Hexo.
Development repository for the Triton language and compiler
PArallel Distributed Deep LEarning (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment)
PyTorch/TorchScript compiler for NVIDIA GPUs using TensorRT
Optimized primitives for collective multi-GPU communication