Python AxmlParser
Fast CUDA kernels for ResNet inference, using the Winograd algorithm to improve the efficiency of convolution.
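For context, a minimal NumPy sketch of the 1D Winograd F(2,3) transform that underlies this kind of kernel: it produces two convolution outputs with 4 elementwise multiplies instead of 6. The matrices and values follow the standard minimal-filtering construction and are not taken from this repository.

```python
import numpy as np

# Winograd F(2,3) transform matrices (standard minimal-filtering construction).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

d = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # input tile (4 samples)
g = np.array([0.5, 1.0, -1.0], dtype=np.float32)      # 3-tap filter

m = (G @ g) * (BT @ d)   # 4 elementwise multiplies in the transform domain
y = AT @ m               # inverse transform -> 2 outputs

# Direct sliding-window reference: y[i] = sum_j d[i+j] * g[j]
ref = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y, ref)
print(y, ref)
```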
A high-throughput and memory-efficient inference and serving engine for LLMs
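As a quick orientation, a minimal offline-inference sketch with vLLM's Python API; the model name, prompt, and sampling values below are placeholders.

```python
from vllm import LLM, SamplingParams

# Load a small placeholder model and run a batched generation.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is PagedAttention?"], params)
for out in outputs:
    print(out.outputs[0].text)
```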
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures.
IQ of AI
Recipes for creating AI applications with APIs from DashScope (and friends)!
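As one hedged example of the kind of recipe involved, a minimal call through the dashscope Python SDK; the model name, prompt, and response fields here are assumptions for illustration, not taken from the cookbook itself.

```python
import dashscope

# Assumes DASHSCOPE_API_KEY is set in the environment; "qwen-turbo" is a placeholder model.
response = dashscope.Generation.call(
    model="qwen-turbo",
    messages=[{"role": "user", "content": "Give me one sentence about ModelScope."}],
    result_format="message",
)
print(response.output.choices[0].message.content)
```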
ModelScope: bring the notion of Model-as-a-Service to life.
SGLang is a fast serving framework for large language models and vision language models.
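A minimal sketch of talking to such a server through its OpenAI-compatible endpoint; the model path, port, and model name below are placeholders, assuming an SGLang server has already been launched separately.

```python
from openai import OpenAI

# Assumes an SGLang server was started beforehand, e.g.:
#   python -m sglang.launch_server --model-path <model> --port 30000
# and exposes the usual OpenAI-compatible /v1 endpoint.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",  # placeholder; the server reports the actual served model name
    messages=[{"role": "user", "content": "Hello from SGLang"}],
)
print(resp.choices[0].message.content)
```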