Show HN: Stop GPU pod placement from getting bottlenecked by reserved VRAM
We are looking for teams to give it a try.
Details on getting a trial license: https://www.woolyai.com.
We’ve come up with a different model, similar to how operating systems schedule tasks. Instead of carving up the GPU, we run multiple ML jobs inside a single shared GPU context and schedule their kernels directly. No slices, no preemption windows: just a deterministic, SLA-style kernel scheduler that decides which job’s kernels run when.
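To make "a deterministic scheduler deciding which job's kernels run when" a bit more concrete, here is a toy Python simulation. It is a simplified illustration under our own assumptions, not the actual WoolyAI scheduler: each job is modeled as a queue of kernels with an estimated duration, plus a priority and a weight (the names `Job`, `schedule`, `priority`, `weight`, and `vtime` are all hypothetical). At every kernel boundary it picks the next kernel by strict priority, then by least weighted GPU time, then by job name, so the same inputs always produce the same launch order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical ML job: a queue of kernels plus scheduling attributes."""
    name: str
    priority: int                 # higher = more latency-sensitive (assumption)
    weight: float = 1.0           # GPU share among equal-priority jobs (assumption)
    kernels: deque = field(default_factory=deque)  # (kernel_name, est_duration_us)
    vtime: float = 0.0            # weighted GPU time consumed so far


def schedule(jobs):
    """Return a deterministic kernel launch order for jobs sharing one GPU.

    Illustrative rule only:
      1. pick the highest-priority job that still has pending kernels;
      2. among equal priorities, pick the job with the least weighted GPU time;
      3. break any remaining tie by job name so the order is reproducible.
    """
    timeline = []
    while True:
        ready = [j for j in jobs if j.kernels]
        if not ready:
            return timeline
        job = min(ready, key=lambda j: (-j.priority, j.vtime, j.name))
        kernel, duration = job.kernels.popleft()
        job.vtime += duration / job.weight   # heavier weight -> slower vtime growth
        timeline.append((job.name, kernel))


jobs = [
    Job("infer", priority=1, kernels=deque([("attn", 50), ("mlp", 30)])),
    Job("train_a", priority=0, weight=2.0,
        kernels=deque([("fwd", 100), ("bwd", 200), ("opt", 50)])),
    Job("train_b", priority=0, weight=1.0,
        kernels=deque([("fwd", 100), ("bwd", 200)])),
]
timeline = schedule(jobs)
print(" -> ".join(f"{name}:{kernel}" for name, kernel in timeline))
# infer:attn -> infer:mlp -> train_a:fwd -> train_b:fwd -> train_a:bwd -> train_b:bwd -> train_a:opt
```

In this toy run the latency-sensitive inference kernels launch first, and the two training jobs then interleave according to their weighted GPU time instead of each owning a fixed slice of the device. A real kernel scheduler would also have to handle jobs arriving over time and kernels running concurrently on different SMs, which this sketch deliberately ignores.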
As a result, the GPU behaves more like an always-on compute fabric than a dedicated device. SMs stay busy, memory stays warm, and high-priority jobs still get predictable latency. More details at https://woolyai.com/blog/a-new-approach-to-gpu-kernel-scheduling-for-higher-utilization/. Check out our technology at https://www.woolyai.com.
- Higher GPU utilization & lower cost: pack many jobs per GPU with WoolyAI’s server-side scheduler, VRAM deduplication, and SLO-aware controls.
- GPU portability: run the same ML container on NVIDIA and AMD backends, with no code changes.
- Hardware flexibility: develop/run on CPU-only machines; execute kernels on your remote GPU pool.
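One common way VRAM deduplication can work is content addressing: identical read-only buffers (for example, the same base-model weights loaded by several jobs) are stored once and reference-counted. The Python sketch below only illustrates that general idea and is not the WoolyAI implementation; `DedupPool`, `upload`, `release`, and `bytes_resident` are hypothetical names, and a Python `bytes` object stands in for a device allocation.

```python
import hashlib


class DedupPool:
    """Hypothetical content-addressed pool for read-only buffers (e.g. weights).

    Identical byte blobs uploaded by different jobs map to one stored copy,
    tracked with a reference count. Illustration only.
    """

    def __init__(self):
        self._buffers = {}   # digest -> bytes (stands in for a VRAM allocation)
        self._refs = {}      # digest -> number of holders

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._buffers:
            self._buffers[digest] = data     # first copy: "allocate" it
            self._refs[digest] = 0
        self._refs[digest] += 1
        return digest                        # same handle for identical content

    def release(self, handle: str) -> None:
        self._refs[handle] -= 1
        if self._refs[handle] == 0:          # last holder gone: "free" it
            del self._refs[handle]
            del self._buffers[handle]

    def bytes_resident(self) -> int:
        return sum(len(b) for b in self._buffers.values())


pool = DedupPool()
weights = b"\x00" * 1024                     # same weights loaded by two jobs
h1 = pool.upload(weights)
h2 = pool.upload(weights)
print(h1 == h2, pool.bytes_resident())       # True 1024
pool.release(h1)
pool.release(h2)
print(pool.bytes_resident())                 # 0
```

The design choice here is to key buffers by a hash of their contents rather than by which job allocated them, so two jobs packed onto one GPU pay the memory cost of shared weights only once.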