Python AxmlParser
Fast CUDA Kernels for ResNet Inference. Using the Winograd algorithm to optimize the efficiency of co...
AI-powered drum removal tool using Meta's Demucs. Drop in any song, get a drumless backing track ...
SGLang is a fast serving framework for large language models and vision language models.
Simple app for learning the lure finish technique
AI Tensor Engine for ROCm
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, ...
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA d...
tabnotes
Benchmarking code for running quantized kernels from vLLM and other libraries