Python AxmlParser
Fast CUDA Kernels for ResNet Inference. Using the Winograd algorithm to optimize the efficiency of co...
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, ...
SGLang is a fast serving framework for large language models and vision language models.
IQ of AI
Recipes for creating AI applications with APIs from DashScope (and friends)!
Benchmarking code for running quantized kernels from vLLM and other libraries
Open standard for machine learning interoperability