Yet another re-implementation of jetson-containers, targeting Jetson Thor, Spark, and x86.
A high-throughput and memory-efficient inference and serving engine for LLMs
Hackable and optimized Transformers building blocks, supporting a composable construction.
FlashInfer: Kernel Library for LLM Serving
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications...