Python AxmlParser
Fast CUDA kernels for ResNet inference, using the Winograd algorithm to optimize the efficiency of co...
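The Winograd convolution trick mentioned above can be illustrated with its minimal 1D case, F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of the 6 a direct computation needs. This is a generic sketch of the algorithm (function and variable names are illustrative), not code from the repo, which applies the same idea to 2D tiles in CUDA.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap filter with 4 multiplies.

    d -- 4 input values (one input tile)
    g -- 3 filter taps
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The filter-side factors (g0+g1+g2)/2 etc. depend only on g and are
    # precomputed once per filter in real kernels, amortized over all tiles.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Reference: sliding 3-tap correlation producing 2 outputs (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For example, `winograd_f23([1, 2, 3, 4], [5, 6, 7])` matches `direct_conv` on the same inputs. The multiply savings (4 vs. 6, or 2.25x for the 2D F(2x2, 3x3) case) is what makes Winograd attractive for 3x3 convolutions in ResNet-style networks, at the cost of extra additions and some numerical error.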
My configuration files repo.
A high-throughput and memory-efficient inference and serving engine for LLMs
📚LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑, 200+ CUDA kernels, Tensor Cores, ...
SGLang is a fast serving framework for large language models and vision language models.
Benchmarking code for running quantized kernels from vLLM and other libraries
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop va...
IQ of AI