Open deep learning compiler stack for CPU, GPU and specialized accelerators
Yet another reimplementation of jetson-containers, targeting Jetson Thor, Spark, and x86.
Hackable and optimized Transformer building blocks, supporting composable construction.
A high-throughput and memory-efficient inference and serving engine for LLMs
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications...
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Wan: Open and Advanced Large-Scale Video Generative Models
FlashInfer: Kernel Library for LLM Serving
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling