FlashComm EP Kernels - Elegant Python host-side binding for LittleKernel. This module mirrors FlashComm's EPKernels class but uses LittleKernel's build() API instead of C++ extension modules. All CUDA ...
Launching pyptx — a Python DSL for writing NVIDIA PTX kernels directly. https://lnkd.in/e2yZSjs9 Today I'm open-sourcing a project I've been building on personal time: pyptx, a Python DSL where the ...
Python ≥ 3.10, PennyLane ≥ 0.34, NumPy ≥ 1.24, scikit-learn ≥ 1.3, matplotlib ≥ 3.7. Use these APIs when you already have application data, for example features from a physics simulator, sensor pipeline, ...
I hit cuBLAS-level performance on Ada Lovelace using CuTe DSL kernels in Python. I walk through a series of 6 kernels and show how to profile them and understand profiling bottlenecks, with tons of visuals ...