Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Yet another reimplementation of jetson-containers, targeting Jetson Thor, Spark, and x86.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications...
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
A high-throughput and memory-efficient inference and serving engine for LLMs
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Fast and memory-efficient exact attention