Use your Nvidia GPU's VRAM as swap space on Linux

(github.com)

472 points · tanelpoder · 22d ago · 126 comments

100 comments · 36 top-level

yjftsjthsd-h22d ago· 10 in thread

> Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

Well, that does at least answer my immediate question about why I would ever swap from expensive RAM to really expensive RAM:) Feels niche, but when you want it it's a good idea.

Wowfunhappy22d ago

Another possible reason that occurred to me: what if you have VRAM but you're not using it all the time? For example, let's say you bought a GPU because you like to play video games. When you're not actively gaming, you probably don't need 16 GB of VRAM just to render the desktop. Might as well use it for something else, right?

Edit: Although, this is predicated on the system being able to release VRAM that is acting as swap when it's time to start a game. Can it do that?

c0dejedi21d ago

I am catching up on comments

The reason I wrote this is that I run this laptop in hybrid mode (AMD display + NVIDIA as swap), so all that VRAM was going to waste.

On your question re: switchable swap. It's on my to-do list ;)

Saris22d ago

It's easy enough to 'offline' swap space on Linux normally so I suspect that would work fine, as long as you didn't instantly run out of RAM when doing so.

nuccy21d ago

Best case is if gaming and productivity (with high memory use) activities are not concurrent, and productivity applications are stopped before gaming starts, then `swapoff` can easily release swap device without restart.
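
Before running `swapoff` on the VRAM device, it is worth checking that RAM can absorb whatever is currently swapped out to it. A minimal Python sketch of that check, assuming the standard `/proc/meminfo` and `/proc/swaps` formats (sizes in KiB); the device name `/dev/nbd0` and the 512 MiB safety margin are illustrative assumptions, not values taken from the project:

```python
import re


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values in KiB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"^([\w()]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def parse_swaps(text: str) -> dict[str, tuple[int, int]]:
    """Parse /proc/swaps into {device: (size_kib, used_kib)}, skipping the header row."""
    devices: dict[str, tuple[int, int]] = {}
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) >= 5:
            devices[parts[0]] = (int(parts[2]), int(parts[3]))
    return devices


def can_swapoff(meminfo_text: str, swaps_text: str, device: str,
                margin_kib: int = 512 * 1024) -> bool:
    """True if MemAvailable minus a safety margin covers the device's used swap."""
    available = parse_meminfo(meminfo_text)["MemAvailable"]
    _size, used = parse_swaps(swaps_text)[device]
    return available - margin_kib >= used
```

On a live system you would pass in the contents of `/proc/meminfo` and `/proc/swaps` and, only if the check passes, run `swapoff /dev/nbd0` as root. MemAvailable is an estimate, which is why the sketch keeps a margin.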

ornornor21d ago

> you probably don't need 16 GB of VRAM just to render the desktop

Microsoft: hold my beer

ErroneousBosh21d ago

In the olden days we called that a "RAM Disk" and it made our Atari STs go really fast!

On the old Amstrad PCWs that were everywhere at least in the UK in the mid 80s to mid 90s you could have up to 512kB of RAM, a fair chunk of which could be a RAM disk. This made compiling stuff in Turbo Pascal really fast too :-)

3form21d ago

Except swap is, like, opposite of RAM disk.

That said, still a nice and fun concept. Though caching has gotten better since then, I assume :)

Phelinofist21d ago

So can VRAM actually be used like regular RAM? E.g. if I have a 16GB module and my GPU has 16GB VRAM, could it be made so that my system reports 32GB RAM? What would be the implications of that?

tobyhinloopen21d ago

It behaves like slower ram I assume, due to the increased distance from the CPU and overhead. Still, it’s much faster than normal SWAP which uses a disk or SSD.

How is it reported? As SWAP space, not as RAM.

Tuna-Fish21d ago

Typical desktop GPU ram does not support being write-back cached by the CPU. With PCIe resizable BAR, you could map the area into ram, so you could technically fit 32GB to memory, but it would have to be uncached (or write-combine cached), which would make it really, really slow.

There are a bunch of datacenter GPUs that support full cache coherency, but if you used them like that the VRAM would be very high latency from the CPU. So it would only be really slow.

dragontamer22d ago· 9 in thread

Remember how 16GBs used to be an enterprise level database mainframe?

Well, GPUs also have stupid amounts of compute on them. I have to imagine that there is some kind of database format that's useful with GPU compute attached.

Since the data is already in VRAM, the GPU can sort, join, or otherwise manipulate data as needed.

tmostak22d ago

GPU-accelerated databases have a long history. I founded HeavyAI (previously MapD/OmniSci) in 2013, but there are or have been many other startups in this space, such as Voltron Data, Kinetica, Sqream, etc. And now you have major players like IBM, Starburst, and Microsoft (which just announced Fabric SQL on GPU today) working on their own GPU-accelerated systems. GPUs have a huge advantage in terms of compute, memory, and interconnect bandwidth over CPU, as long as you can keep them fed with data.

I believe within 2-3 years databases and data warehouses on GPU will be common. The widespread use of agents to query data will be a part of this, as there will be a need to run far more queries at lower latency than needed for the ETL and BI workloads of the past.

myself24821d ago

And smart NICs are moving significant amounts of compute directly onto the network interface, though I haven't seen anyone combining a GPU and a 100GbE NIC into a single part yet.

Where do a few more steps of evolution take us? A wide path between a few heavy devices, and then the CPU off to the side just orchestrating the data flow?

c0dejedi21d ago

Insightful take, looking into these

einichi22d ago

oh god please don't create more demand for GPUs

giancarlostoro22d ago

Can we somehow make them work with 1 TB PCIes so we can churn through way more data?

dragontamer22d ago

Have you heard of the "Radeon Pro SSG" ??

It must have failed because I never heard of an update to this GPU. But AMD definitely made a GPU with 4x NVMe SSDs attached to the GPU.

strictnein22d ago

You are able to use GPU Direct Storage to communicate between the GPU and PCIE storage devices. It's nice, but it's not typically as performant as one would like, in comparison to the onboard memory.

https://docs.nvidia.com/gpudirect-storage/

https://github.com/microsoft/DirectStorage/tree/main

the847221d ago

linux has P2P-DMA for this. The drivers, devices and bus topology need to support it though.

https://docs.kernel.org/driver-api/pci/p2pdma.html

Nate75Sanders22d ago

Possibly LSM compaction.

dlt71370522d ago· 8 in thread

Does anyone these days really use swap for anything other than S4 suspend?

kccqzy22d ago

https://news.ycombinator.com/item?id=40697318

This HN comment and the linked post brought up a lot of good points. The main takeaway is that swap should primarily be considered a mechanism for equality of reclamation, not for emergency extra memory, where equality of reclamation means file-backed pages and anonymous pages are subject to similar criteria for being evicted from physical memory.
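
The "equality of reclamation" point can be shown with a toy model. This is a deliberately simplified Python sketch, not the kernel's actual algorithm (Linux uses active/inactive LRU lists weighted by `vm.swappiness`): without swap, a cold anonymous page has nowhere to go, so reclaim has to fall back to a warmer file-backed page instead. The page names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    name: str
    kind: str        # "file" (backed by a file on disk) or "anon" (heap, stack, ...)
    last_used: int   # larger means more recently touched


def pick_victim(pages: list[Page], swap_available: bool) -> Optional[Page]:
    """Return the least recently used page that reclaim is allowed to evict.

    File-backed pages can always be dropped or written back to their file;
    anonymous pages can only leave RAM if there is swap to put them in.
    """
    candidates = [p for p in pages if p.kind == "file" or swap_available]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.last_used)
```

With swap, the idle anonymous page is evicted first; without it, the file-backed page that was used more recently is sacrificed even though the anonymous page is colder.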

I used to have zero swap on my Linux desktop and this convinced me to add at least a small swap partition.

dlt71370521d ago

My point is not to say that swap should not be configured on a Linux system. On bare-metal machines, I personally always set a swap partition equal in size to the amount of RAM because I usually want to be able to put the machine into S4 (suspend to disk).

I don't consider swap to be emergency RAM storage. I know that the kernel will decide by itself to use swap even if it has plenty of available RAM and the swappiness threshold is not reached.

Nevertheless, my two decent laptops (one with 16 GB RAM, the other with 64 GB RAM) never swap, even with Docker Swarm and multiple stacks, multiple VMs, desktop activities, and gaming.

It's been a while since I last saw a physical machine actively swapping.

I understand that some limited hardware may need swap, but I can't see such hardware having a GPU with plenty of VRAM.

That said, hacking things is always fun :)

sidewndr4622d ago

I just set swappiness to zero years ago and never looked back.

Saris22d ago

It's useful on lower RAM systems as the least frequently used memory can be moved to swap, freeing up more RAM for stuff that needs it. Even when using zram it works out pretty well on my laptop with 8GB of RAM, it'll often have 4GB+ in zram swap space compressed down to only 1GB or so of physical RAM usage.
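
The kind of ratio described here can be read from zram's own statistics. A small Python sketch, assuming the `mm_stat` layout from the kernel's zram documentation, where the first three fields are `orig_data_size`, `compr_data_size` and `mem_used_total` in bytes; it divides the uncompressed size by the RAM actually consumed, allocator overhead included:

```python
def zram_effective_ratio(mm_stat: str) -> float:
    """Uncompressed bytes stored per byte of RAM consumed, from a zram mm_stat line."""
    orig_data_size, _compr_data_size, mem_used_total = (
        int(field) for field in mm_stat.split()[:3]
    )
    if mem_used_total == 0:
        return 0.0  # empty device: nothing stored yet
    return orig_data_size / mem_used_total
```

On a running system the input would be the contents of `/sys/block/zram0/mm_stat` (assuming the device is `zram0`). Dividing by `compr_data_size` instead would give the pure compression ratio without allocator overhead.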

yjftsjthsd-h22d ago

It really depends on what you run and how much RAM you have to do it in. I run some machines into swap just by running a couple browsers and some containers in the background on a 16GB laptop. I've also run a single light browser and essentially nothing else on 4GB and been fine:)

dlt71370522d ago

I'm very surprised. I run a Docker Swarm with multiple Docker stacks on my 16GB RAM laptop, which is also my main machine. I have multiple browsers, each with multiple tabs. I also run multiple VMs (QEMU/KVM), and even with gaming on top of all of that, I cannot make it swap.

Edit: Typo

xxs21d ago

>S4 suspend

Is not popular in general, so yes. But also no - I don't use swap ever, if I have to go over the RAM (32GB being low, with 64GB the norm), might as well consider the system dead.

the847221d ago

For me opening huge datasets, e.g. many gigabytes worth of profiling data, combined with other stuff running on the system, can end up pushing things to swap.

RachelF22d ago· 7 in thread

Nice idea, but something has gone very wrong here:

>Sequential throughput: ~1.3 GB/s

[on a RTX 3070 Laptop]

This RTX 3070 chip is on PCIe 4.0 x16 which should give 64GB/s. The 8GB of GDDR6 is 448GB/s.

Swapping to an NVMe drive would be twice as fast, but with higher latency.
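
The bandwidth figures being compared here can be reproduced with simple arithmetic. A Python sketch: the PCIe 4.0 parameters (16 GT/s per lane, 128b/130b line encoding) come from the PCIe spec, while the 14 Gbps, 256-bit GDDR6 configuration is an assumption about the RTX 3070 Laptop that happens to match the 448 GB/s quoted above. Packet and protocol overhead are ignored, so real PCIe throughput is lower still.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Per-direction bandwidth in GB/s for PCIe 3.0 to 5.0 (128b/130b line encoding)."""
    return gt_per_s * lanes * (128 / 130) / 8


def gddr_gbytes_per_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width."""
    return gbps_per_pin * bus_width_bits / 8


per_direction = pcie_gbytes_per_s(16, 16)   # PCIe 4.0 x16, one direction
both_directions = 2 * per_direction         # a "64 GB/s" figure counts both directions
vram = gddr_gbytes_per_s(14, 256)           # assumed RTX 3070 Laptop GDDR6 configuration

print(f"PCIe 4.0 x16: {per_direction:.1f} GB/s per direction, "
      f"{both_directions:.1f} GB/s total; GDDR6: {vram:.0f} GB/s")
```

Either way, the link, not the VRAM, is the ceiling, and a swap path that reaches ~1.3 GB/s is far below both.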

Teknoman11722d ago

Gen 4.0 x16 is 32 GB/s in each direction, but the way this is implemented is not the way you'd go about this if you wanted high performance.

Edit: Their benchmarks are also run using ZRAM, which compresses pages before writing to swap. Not sure what the performance overhead of that is, but it's probably quite a bit.

First of all, it's a userspace program hooking the nbd driver, which is known for being slow. It also uses a bounce buffer in userspace before transferring to the GPU. So when the kernel needs to swap a page, it first has to copy it into a userspace-facing buffer. The userspace program then has to wake back up and issue the CUDA operation to copy the page into device memory.

nbd also doesn't really do a good job of supporting high queue depth or merging adjacent accesses. So if the kernel is issuing a bunch of 4K page swaps without any coalescing, you're going to end up with at least a million kernel/userspace context switches per second just to handle 4 GB/s (4 GB / 4K per page), let alone 64 GB/s. And that's just the NBD portion; forget the mess that is the NVIDIA driver. PCIe can move a lot of data, but to get anything even resembling the full bandwidth, you have to use DMA engines with long page lists. Setting up a separate transfer for every 4K page over PCIe will not saturate the bus.

Swapping to NVMe is a very optimized path -> the swapper can submit lists of pages directly to the NVMe driver and the controller can DMA them directly out of RAM, no copies or context switches CPU side at all.

This could probably be improved by migrating to the ublk driver as it might let you avoid the userspace bounce buffer. It'd also be able to have multiple write queues to at least set up CUDA copies in parallel.
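
The per-request arithmetic above, plus the kind of merging of adjacent pages that keeps the request count down, as a Python sketch. The `coalesce` helper is a hypothetical illustration of the idea, not code from nbd, ublk or this project; the kernel block layer performs its own request merging.

```python
from typing import Optional


def requests_per_second(bytes_per_s: int, request_bytes: int) -> float:
    """How many independent requests a given throughput implies."""
    return bytes_per_s / request_bytes


def coalesce(page_indices: list[int],
             max_pages: Optional[int] = None) -> list[tuple[int, int]]:
    """Merge page indices into (first_page, page_count) runs of consecutive pages.

    max_pages caps a run's length, standing in for a maximum transfer size.
    """
    extents: list[tuple[int, int]] = []
    for index in sorted(set(page_indices)):
        if extents:
            start, count = extents[-1]
            contiguous = start + count == index
            if contiguous and (max_pages is None or count < max_pages):
                extents[-1] = (start, count + 1)  # extend the current run
                continue
        extents.append((index, 1))  # start a new run
    return extents


# 4 GiB/s of uncoalesced 4 KiB pages is about a million requests per second.
print(requests_per_second(4 * 2**30, 4096))
print(coalesce([7, 1, 2, 3, 8, 10]))
```

Six scattered page requests collapse into three transfers here; with long sequential runs the reduction in per-request overhead is much larger.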

tumblestick21d ago

It's true that the Linux kernel is the throughput bottleneck. Unfortunately, the optimizations described above aren't sufficient to get within even 10% of hardware bandwidth.

Even if the swap system overhead drops to just a data copy, the memory management layer prevents swap from scaling to higher bandwidths. The issue is not data movement; it is in the page unmapping step (which requires expensive TLB shootdowns). Larger kernel changes are required.

My group wrote a paper on this: https://dl.acm.org/doi/10.1145/3731569.3764842

Linux's swap system is undergoing some large refactors lately. Hopefully some insights either from our work or Hermit (NSDI '23) can make it into the mainline. I think Hermit's `rmap` optimization in particular is a candidate for upstream use.

lstodd22d ago

yup. it's nbd and userspace making it slow. zram on the other hand adds little.

one can get rid of zram and just reimplement some compression in shaders but I think that would be a pointless optimization.

dannyw21d ago

Swapping to a NVMe will also consume PE cycles on your NAND, ie wearing it out over time.

RAM/VRAM don’t degrade from use.

markhahn21d ago

flash is a consumable, yes.

but flash endurance isn't a strong argument here. you probably have O(TB) of flash, and aren't going to produce PB of swap writes any time soon. if you do a lot of swapping to a small flash device, it'll happen sooner.

I'm typing from a quite old 4GB laptop, which swaps heavily to a 250G SATA ssd. sure, it's not great, but it also costs zero. currently 9GB of swap is used, and it's not really noticeable. if I open 20 more tabs, it can introduce pauses.

google says this drive was released in 2014, and SMART says POH is about 10 years.

SMART also says wear leveling count is 665 and total written is 165327189538 LBAs (78834 GiB, or 338 drive-writes). I'm not expecting it to die soon, though using a 4G laptop is a bit of a stunt these days...

the point is that a system that has sustained heavy swapping for years has not generated enough writes to worry about. a modern system with 10x speed and 10x capacity (and probably less RAM deficit) would see even less effect, even for QLC with its few-hundred-cycle endurance spec...
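
The SMART numbers above can be checked with a short conversion. A Python sketch, assuming the drive reports its LBA counter in 512-byte units (common, but vendor-specific) and that "250G" means 250 × 10^9 bytes:

```python
def smart_write_totals(lbas_written: int, drive_bytes: int,
                       sector_bytes: int = 512) -> tuple[float, float]:
    """Return (GiB written, full-drive writes) from a SMART total-LBAs-written counter."""
    written = lbas_written * sector_bytes
    return written / 2**30, written / drive_bytes


gib, drive_writes = smart_write_totals(165_327_189_538, 250 * 10**9)
print(f"{int(gib)} GiB written, {int(drive_writes)} full drive-writes")
```

That works out to roughly 84.6 TB, about 339 full writes of a 250 GB drive over a decade, which is in line with the point being made.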

LtdJorge21d ago

I guess you haven’t tried AMD’s composable kernel on Gentoo, or qtwebkit. I have a special env for the former called half-the-threads because it eats 2.5GB per thread. I removed the latter as soon as I was able to. I even add 32GB (half my RAM) of ZRAM for CK, and the Gentoo ebuild now has a check for enough RAM per thread that stops the build if unmet. It wasn’t there before, and I’ve had my system lock up from an OOM that OOMD wasn’t quick enough to catch.

All of this is to say that it does have a potential impact on flash if you rebuild often, which tends to happen on Gentoo.

c0dejedi21d ago

This was a consideration when I wrote this

simonask22d ago· 5 in thread

I mean, cool, but I’d rather not?

margalabargala22d ago

So don't. Not everything is for you.

TurkTurkleton22d ago

Didn't you hear? The author of this daemon is going around and forcibly installing it on anyone's computer that has soldered memory and an Nvidia GPU. I heard even he brings a Ludovico-technique chair with him and straps you in and pins your eyelids open like A Clockwork Orange so you have to watch.

gchamonlive22d ago

Wouldn't it be faster to swap to VRAM, if you have 8 gigs of it sitting there unused, than to swap to an SSD and burn its write cycles, assuming you absolutely need swap?

monkpit21d ago

The HN equivalent of “1 star - I’ve never eaten at this restaurant” type of reviews.

dspillett22d ago

So, erm, don't?

xfalcox22d ago· 3 in thread

Given my dev machine has 32GB of RAM and 32GB of VRAM that sits mostly idle when I'm not running AI models, this is not that bad of an idea.

mathisfun12322d ago

this is the pcmasterrace equivalent of being all upper body and with scrawny legs lol

zamadatix21d ago

Actually not that crazy of a spread. E.g. I have 48 GB + 32 GB in my gaming PC because if you go beyond 48 GB you start having to trade off more and more performance to keep the memory controller from falling over, so you really have to have a good reason to want to load more. On server platforms like Epyc it tends not to matter as much, because you have so many channels for bandwidth and a beefier memory controller to handle them. Then on the VRAM side it's more about what makes sense for the GPU and how you plan on using it there (games or AI or modeling or whatever), and for a lot of cases the 5090 is just a good card to get for one reason or another (it just has a ton of compute + bandwidth for a consumer part).

tempoponet22d ago

It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to ram. But 32/32 is pretty good for both in the popular ~30b range right now.

bobsmooth22d ago· 3 in thread

RAM disks have always fascinated me. In a different timeline every PC has 100GB of RAM and 50TB HDDs are the norm.

pixl9722d ago

Back when HDDs were all there was ramdisks were interesting, but SSDs pretty much killed most of that as they have massively increased IOPS over disks.

Hard drives that huge scare me as it would take days to backup all the data off them.

bobsmooth22d ago

In my fantasy, RAM was the predominant technology over flash.

freedomben21d ago

Having 128 GB in my desktop, I can never, ever go back. It truly unlocks a whole different computing experience. I've only had one OOM in the last 5 years and it was in my own code where I had a bad memory leak. It's the only way to live.

hardwaresofton22d ago· 3 in thread

You want to waste VRAM, in this economy?

kevin_thibedeau22d ago

It's 1GiB. What could it cost, $10?

hardwaresofton22d ago

can't stand the thought of people not understanding the above reply[0]

[0]: https://knowyourmeme.com/memes/its-one-banana-michael-what-c...

theandrewbailey21d ago

I wish it cost $10.

(Kinda goes against the original spirit of the reference)

effnorwood22d ago· 3 in thread

use your car for an anchor on a big boat!

bandrami22d ago

There's probably somebody in Monaco who does that

SV_BubbleTime22d ago

I mean, if you aren’t using the car while using the boat and it won’t really damage the car… yes?

oneshtein21d ago

And when you use your car, then what?

mmastrac22d ago· 2 in thread

I seriously looked at this as a way to improve the RAM situation in a QNAP 2U unit that I was having trouble sourcing RAM for. It's somewhat annoying that legit memory-over-PCIe is gated on PCIe5 and chipset support.

In the end I just had to bite the bullet and take a gamble on finding ECC DDR4 RAM that would work with the ancient AMD chipset...

This particular implementation seems to be running over too many layers to be particularly performant. Why not a custom block driver instead?

Teknoman11722d ago

Memory on an expansion card isn't gated on PCIe 5, it's gated on CXL support. CXL and PCIe use the same electrical/physical layer but the protocol is very different.

The problem with putting (system) RAM on a PCIe card is that PCIe is not a cache-coherent interconnect. If you have a cache line that resides on your GPU sitting inside your processor's cache, a remote modification to that memory by either the GPU, another CPU core or some other PCIe device will NOT invalidate the CPU cache line. You also have the fun situation that if it's modified on both ends simultaneously the resulting state will be non-deterministic.

Device drivers have to be very careful about synchronization when accessing memory-like areas on PCIe. CXL adds a cache coherency protocol among other things, so that invalidations and snoops can be exchanged over the interconnect.

wang_li22d ago

It’s deterministic. But as the user you don’t know enough to know what was determined.

ProllyInfamous21d ago· 2 in thread

Why does my Apple Silicon Mac with 32GB of RAM use (or even create?!) a swapfile, when 20GB is still unused/"free"?

Why can I not just enter a simple command to entirely disable the swapfile, like with Linux's:

`swapoff -a`

Seems kind of silly, unless the point is intentionally to wear down the SSD's lifespan.

Having a GUI swapfile-disable system preference would be awesome. It would also be awesome if Apple finally abandoned this system settings/layout "phase" – it's still word-salad (compared to decades of preference panes).

#Apple #Feedback #swapfile

netbsdusers21d ago

The principle of a paging system is that main memory is just a cache for secondary memory, and the concept of "free memory" ideally means something more like "memory that can be quickly reclaimed for another purpose". Sometimes anonymous memory occupying main memory at a given time will be of less use than cached file contents taking its place would be.

ProllyInfamous20d ago

You've described what the purpose of a swapfile is – thanks? My argument is simply that you don't need a swapfile with enough real memory; that swapfiles can unnecessarily shorten the lives of SSDs, entirely unnecessarily.

----

>I have 20GB of RAM free (me)

>>~need to have quick access to main memory

>I have 20GB of RAM free (still me)

>>~yeah but "quickly reclaimed for another purpose"

>I have 20GB of RAM free (!m!e!)

//of//32GB//ttl//

----

In linuxland, I'd just type `sudo swapoff -a` and be done with it. That machine has 96GB of RAM, so it would have ~84GB of RAM free (if, hypothetically, the same hardware/configuration were operating that system).

Does. not. need. a swapfile.

The operating system, during bootup, should think "hey I have dozens of gigabytes of RAM, won't be needing any swapfiles" – behind the scenes and without input.

tlb21d ago· 2 in thread

Building a swap device at user level used to be one of those classic unsolvable problems, because what if your daemon needs to swap in a page in order to swap in a page? Or at least it was discussed as a reason why microkernels will never work. I’m not sure what the solution is here.

kccqzy21d ago

Your daemon can be smart enough to know which are its own pages and prevent them from being swapped out. The Linux kernel also prevents its own text pages from being swapped out, so the solution exists and I don’t see why it doesn’t apply to microkernel designs.

netbsdusers21d ago

The general principle is that what is involved in paging should not be paged itself. Wiring the memory of that whole daemon is then a trivial solution to the problem.
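
On Linux, "wiring" the daemon's memory means `mlockall(2)`. A minimal Python sketch via `ctypes`; the flag values are the Linux ones for x86 and ARM (a few architectures such as PowerPC and SPARC use different numbers), and the call needs `CAP_IPC_LOCK` or a large enough `RLIMIT_MEMLOCK`. Whether the project itself does this is not stated in the thread.

```python
import ctypes
import ctypes.util

# Linux mlockall(2) flags on x86/ARM; check <sys/mman.h> on other architectures.
MCL_CURRENT = 1  # lock every page currently mapped
MCL_FUTURE = 2   # also lock pages mapped from now on


def lock_all_memory() -> int:
    """Pin this process's memory so it can never be swapped out.

    Returns 0 on success, otherwise the errno reported by mlockall(2)
    (typically ENOMEM or EPERM without sufficient privilege).
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        return ctypes.get_errno()
    return 0
```

A swap daemon would call this once at startup, before it starts serving the block device, and refuse to run if it returns non-zero (`os.strerror` turns the errno into a readable message).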

enthus1ast_21d ago· 2 in thread

And now I want HDMI as a direct-connection, high-speed network socket.

voxadam21d ago

HDMI is largely unidirectional.

enthus1ast_21d ago

But it's 48 Gbps unidirectional.

kimixa22d ago· 1 in thread

I remember this being a thing done a while back using linux's MTD/phram drivers - https://wiki.archlinux.org/title/Swap_on_video_RAM - not sure if that's still relevant though as I don't know how it'll interact with DRM and how it handles reserving some of the vram - the suggested limit using xorg.conf is probably pretty obsolete now.

That page also has a fuse filesystem implementation on top of opencl - https://github.com/Overv/vramfs - which may be more compatible.

aa-jv21d ago

Yeah, I used to map my 8 megabytes of video memory through the mtd back in the day, it helped build those .. you know .. X11 drivers .. ;)

Man, that brings back memories.

molticrystal22d ago· 1 in thread

For windows I saw something similar to this years ago. An experimental proof of concept driver that allows the creation of a ram drive from vram for NVIDIA cards. Sequential is fast as you'd expect, random has lots of room for improvement.

>GpuRamDrive

>Create a virtual drive backed by GPU RAM.

https://github.com/prsyahmi/GpuRamDrive

Fork with AMD support:

https://github.com/brzz/GpuRamDrive/

c0dejedi21d ago

Thanks for sharing this, good read :-)

rwmj21d ago· 1 in thread

Similar but using OpenCL APIs, so it works on AMD (for some definition of "works" since their drivers are quite buggy): https://libguestfs.org/nbdkit-vram-plugin.1.html

c0dejedi21d ago

Thank you for pointing me to this

willis93622d ago· 1 in thread

I'm more interested in the opposite. Nvidia linux drivers crash when you try to address more VRAM than you have. It'd be nice if they didn't.

SV_BubbleTime22d ago

They already do that on windows and it kinda sucks. If you are targeting something like LMStudio or ComfyUI, both of those have superior methods to do exactly this.

nialv722d ago· 1 in thread

I mean, you prompted something useful out of an AI, good job. But then use that to ask for donation? Feels weird, man.

dspillett21d ago

As much as I'm avoiding GenAI myself⁰ I think your reaction is what feels a bit weird. You wouldn't be sending a tip for simply prompting the LLM, but for having the original idea and verifying/testing the result. If you don't feel right donating for that, then don't. Seeing a “buy me a coffee” link is hardly onerous, and it isn't exactly in-your-face here (I didn't notice at all until your comment mentioned it).

--------

[0] I want to code, I like the nitty-gritty, and if I want to outsource I'd prefer to outsource to a human¹ than GenAI

[1] they might outsource to GenAI of course, that is their choice and as long as they properly verify the output before handing it on to me I shouldn't have to care

drdaeman22d ago

What about backpressure, how does it handle requirements for VRAM allocation when VRAM is used for swap space?

With X11 it's not that bad (buffers are pre-allocated), but with Wayland allocations are a lot more dynamic, so running low on VRAM can easily crash the whole desktop. I just had a few of such crashes with Hyprland+llama-server+KVM switching between computers without freeing VRAM.

sgjohnson22d ago

>Sequential throughput: ~1.3 GB/s

sounds VERY low, also, wouldn't random read/write speed be MUCH more relevant here?

theblazehen21d ago

I've implemented the same idea with OpenCL: https://github.com/theblazehen/vramblk

There is originally https://github.com/Overv/vramfs however that has the overhead of a FUSE filesystem + loop device when using as a swap device.

The performance is rather lackluster however, it's far from a miracle "now you effectively have more ram for a 90% performance drop" - it definitely feels like traditional swapping

c0dejedi12d ago

The nbd-vram daemon got a thread pool, ~4x concurrent 4K IOPS, and the swap-pressure freeze is finally gone.

Fresh benchmarks against NVMe included:

https://www.seanlobjoit.com/posts/2026-06-12-vram-swap-two-w...
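
A thread pool like the one mentioned follows a common pattern: keep several block requests in flight at once instead of handling them strictly one after another. A hypothetical Python sketch of that pattern, with a `bytearray` standing in for VRAM and striped locks so requests to different pages do not serialize. It is not the nbd-vram code, and in CPython the copies themselves still contend for the GIL, so in a real daemon the gain would come from overlapping blocking device copies.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096


class PagedStore:
    """Hypothetical page store: a bytearray split into 4 KiB pages."""

    def __init__(self, pages: int, stripes: int = 64) -> None:
        self.buf = bytearray(pages * PAGE_SIZE)
        # Striped locks: requests for pages in different stripes never block each other.
        self.locks = [threading.Lock() for _ in range(stripes)]

    def _lock_for(self, page: int) -> threading.Lock:
        return self.locks[page % len(self.locks)]

    def write(self, page: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("writes must be exactly one page")
        offset = page * PAGE_SIZE
        with self._lock_for(page):
            self.buf[offset:offset + PAGE_SIZE] = data

    def read(self, page: int) -> bytes:
        offset = page * PAGE_SIZE
        with self._lock_for(page):
            return bytes(self.buf[offset:offset + PAGE_SIZE])


def serve(store: PagedStore, requests: list[tuple], workers: int = 4) -> list:
    """Handle ("r", page) and ("w", page, data) requests concurrently.

    Results come back in request order, but the order in which overlapping
    requests within one batch execute is not guaranteed.
    """
    def handle(request: tuple):
        if request[0] == "w":
            store.write(request[1], request[2])
            return None
        return store.read(request[1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, requests))
```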

londons_explore21d ago

It seems obvious to me that this should be a built in functionality of the kernel.

The kernel's job is to manage resources - and GPU ram is one such resource, and it can be used for many of the same uses as regular ram.

LouisvilleGeek22d ago

Finally a use for the expensive ram when it's not needed in workloads!

Now if it could be dynamically used and vacated on other GPU workloads?

NortySpock21d ago

Nice, I might try using this as I'm currently on 16 GB of RAM / 11 GB VRAM and feel like the VRAM is usually idle except for when I game or try a local LLM.

It would be nice to have dynamic scaling or even just auto-shutoff on VRAM pressure if I forget I have this enabled and then fire up a game or LLM.
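
An auto-shutoff like this could start from a simple watchdog. A hypothetical Python sketch: it reads free VRAM through `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` (values in MiB, one line per GPU) and only recommends releasing the swap when the GPU is short on memory and RAM can take the swapped pages back. The threshold is an assumption, and actually handing the VRAM back would also need support in the daemon itself.

```python
import subprocess


def parse_free_vram_mib(csv_text: str) -> list[int]:
    """Parse nvidia-smi CSV output (no header, no units): one MiB value per GPU."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]


def query_free_vram_mib() -> list[int]:
    """Ask nvidia-smi for free memory on every GPU (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_free_vram_mib(result.stdout)


def should_release_swap(free_vram_mib: int, threshold_mib: int,
                        swap_used_kib: int, mem_available_kib: int) -> bool:
    """Release VRAM swap only if the GPU is under pressure and RAM can absorb the pages."""
    gpu_under_pressure = free_vram_mib < threshold_mib
    ram_can_absorb = mem_available_kib > swap_used_kib
    return gpu_under_pressure and ram_can_absorb
```

A loop polling `query_free_vram_mib()` every few seconds could run `swapoff` on the VRAM device when this returns True, with the swap and RAM figures taken from `/proc/swaps` and `/proc/meminfo`. If RAM cannot absorb the pages, `swapoff` would fail or push the system toward OOM, which is why both conditions are required.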

UnfitFootprint22d ago

No software benchmarks? BAR for RAM is cool but I want to see how much it _actually_ beats pcie nvme

tgtweak21d ago

I think you can definitely improve the throughput/iops by using BAR vs treating it like a file store/mount through cuda which adds a lot of overhead.

steeve21d ago

sidenote: it is possible to use Vulkan to map GPU memory to CPU space and even map it back to CUDA: https://x.com/steeve/status/2055042304344231978?s=20

jcmfernandes22d ago

Q: Why? A: Why not?

mrwizrd21d ago

I have long wished it was possible to do this. What a great bit of code. Thanks.

1matin21d ago

Nice idea, but I'm sure a ton of things can go wrong with it. It needs extensive edge case handling in order to be usable widely.

AI207019d ago

The catch is volatility: one CUDA process reclaims the VRAM, and your swap just evaporates.

hearstcastle820d ago

Iterating VRAM as swap space. Reliable tracker .iso installs on SSD drive.

zx808021d ago

NVMe SSDs weigh much less than GPUs, and that matters for a laptop.

usxr151522d ago

Nice

lowbloodsugar22d ago

This is why I read HN.

j / k navigate · click thread line to collapse

126 comments

100 comments · 36 top-level

yjftsjthsd-h22d ago· 10 in thread

> Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

Well, that does at least answer my immediate question about why I would ever swap from expensive RAM to really expensive RAM:) Feels niche, but when you want it it's a good idea.

Wowfunhappy22d ago

Edit: Although, this is predicated on the system being able to release VRAM that is acting as swap when it's time to start a game. Can it do that?

c0dejedi21d ago

I am catching up on comments

The reason I wrote this is I run this laptop in hybrid (AMD display + NVIDIA as swap). So all at VRAM was going to waste.

On your question re: switchable swap. It's on my to-do list ;)

1 more reply

Saris22d ago

It's easy enough to 'offline' swap space on Linux normally so I suspect that would work fine, as long as you didn't instantly run out of RAM when doing so.

1 more reply

nuccy21d ago

ornornor21d ago

> you probably don't need 16 GB of VRAM just to render the desktop

Microsoft: hold my beer

1 more reply

ErroneousBosh21d ago

In the olden days we called that a "RAM Disk" and it made our Atari STs go really fast!

3form21d ago

Except swap is, like, opposite of RAM disk.

That said, still an nice and fun concept. Though caching got better since I assume :)

1 more reply

Phelinofist21d ago

So can VRAM actually be used like regular RAM? E.g. if I have a 16GB module and my GPU has 16GB VRAM, could it be made so that my system reports 32GB RAM? What would be the implications of that?

tobyhinloopen21d ago

It behaves like slower ram I assume, due to the increased distance from the CPU and overhead. Still, it’s much faster than normal SWAP which uses a disk or SSD.

How it is reported? As SWAP space, not as RAM.

Tuna-Fish21d ago

There are a bunch of datacenter GPUs that support full cache coherency, but if you used them like that the VRAM would be very high latency from the CPU. So it would only be really slow.

3 more replies

dragontamer22d ago· 9 in thread

Remember how 16GBs used to be an enterprise level database mainframe?

Well, GPUs also have stupid amounts of compute on them. I have to imagine that there is some kind of database format that's useful with GPU compute attached.

Since the data is already in VRAM, the GPU can sort, join, or otherwise manipulate data as needed.

tmostak22d ago

myself24821d ago

And smart NICs are moving significant amounts of compute directly onto the network interface, though I haven't seen anyone combining a GPU and a 100GbE NIC into a single part yet.

Where does a few more steps of evolution take us? A wide path between a few heavy devices, and then the CPU off to the side just orchestrating the data flow?

c0dejedi21d ago

Insightful take, looking into these

einichi22d ago

oh god please don't create more demand for GPUs

giancarlostoro22d ago

Can we somehow make them work with 1 TB PCIes so we can churn through way more data?

dragontamer22d ago

Have you heard of the "Radeon Pro SSG" ??

It must have failed because I never heard of an update to this GPU. But AMD definitely made a GPU with 4x NVMe SSDs attached to the GPU.

strictnein22d ago

You are able to use GPU Direct Storage to communicate between the GPU and PCIE storage devices. It's nice, but it's not typically as performant as one would like, in comparison to the onboard memory.

https://docs.nvidia.com/gpudirect-storage/

https://github.com/microsoft/DirectStorage/tree/main

the847221d ago

linux has P2P-DMA for this. The drivers, devices and bus topology need to support it though.

https://docs.kernel.org/driver-api/pci/p2pdma.html

1 more reply

Nate75Sanders22d ago

Possibly LSM compaction.

dlt71370522d ago· 8 in thread

Does anyone these days really use swap for anything than S4 suspend ?

kccqzy22d ago

https://news.ycombinator.com/item?id=40697318

I used to have zero swap on my Linux desktop and this convinced me to add at least a small swap partition.

dlt71370521d ago

I don't consider swap to be emergency RAM storage. I know that the kernel will decide by itself to use swap even if it has plenty of available RAM and the swappiness threshold is not reached.

Nevertheless, my two decent laptops (one with 16 GB RAM, the other with 64 GB RAM) never swap, even with Docker Swarm and multiple stacks, multiple VMs, desktop activities, and gaming.

It's been a while since I last saw a physical machine actively swapping.

I understand that some limited hardware may need swap, but I can't see such hardware having a GPU with plenty of VRAM.

That said, hacking things is always fun :)

sidewndr4622d ago

I just set swappiness to zero years ago and never looked back.

Saris22d ago

yjftsjthsd-h22d ago

dlt71370522d ago

Edit: Typo

xxs21d ago

>S4 suspend

It's not popular in general, so yes. But also no: I never use swap. If I have to go beyond physical RAM (32GB being low, with 64GB the norm), I might as well consider the system dead.

the847221d ago

For me, opening huge datasets (e.g. many gigabytes' worth of profiling data) combined with other stuff running on the system can end up pushing things to swap.

RachelF22d ago· 7 in thread

Nice idea, but something has gone very wrong here:

>Sequential throughput: ~1.3 GB/s

[on a RTX 3070 Laptop]

This RTX 3070 chip is on PCIe 4.0 x16 which should give 64GB/s. The 8GB of GDDR6 is 448GB/s.

Swapping to an NVMe drive would be twice as fast, but with higher latency.

Teknoman11722d ago

Gen 4.0 x16 is 32 GB/s in each direction, but the way this is implemented is not the way you'd go about this if you wanted high performance.

Edit: Their benchmarks are also run using ZRAM, which compresses pages before writing to swap. Not sure what the performance overhead of that is, but it's probably quite a bit.
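The bandwidth figures in this exchange can be reproduced from the link parameters. PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, so x16 works out to roughly 31.5 GB/s per direction, or about 63 GB/s counting both directions together (which is where the ~64 GB/s figure comes from). A minimal back-of-the-envelope sketch: it ignores packet and protocol overhead, so achievable throughput is lower, and the GDDR6 line assumes a 256-bit bus at 14 Gbps per pin, chosen because it matches the 448 GB/s quoted above.

```python
def pcie_bandwidth_gbs(transfer_rate_gt: float, lanes: int,
                       payload_bits: int = 128, total_bits: int = 130) -> float:
    """Raw usable PCIe bandwidth in GB/s (1e9 bytes/s), one direction.

    transfer_rate_gt: per-lane rate in GT/s (PCIe 3.0 = 8, PCIe 4.0 = 16).
    payload_bits/total_bits: line encoding, 128b/130b for PCIe 3.0 and later.
    Protocol overhead (TLP headers, DLLPs) is ignored.
    """
    bits_per_s = transfer_rate_gt * 1e9 * lanes * payload_bits / total_bits
    return bits_per_s / 8 / 1e9


def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width times per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


link = pcie_bandwidth_gbs(16, 16)
print(f"PCIe 4.0 x16, one direction:   {link:.1f} GB/s")       # 31.5
print(f"PCIe 4.0 x16, both directions: {2 * link:.1f} GB/s")   # 63.0
print(f"GDDR6, 256-bit @ 14 Gbps:      {gddr_bandwidth_gbs(256, 14):.0f} GB/s")  # 448
print(f"~1.3 GB/s swap throughput = {1.3 / link:.1%} of one direction")
```

Either way, ~1.3 GB/s is a small fraction of what the bus can carry, which fits the replies attributing the bottleneck to the software path rather than the hardware.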

tumblestick21d ago

It's true that the Linux kernel is the throughput bottleneck. Unfortunately, the optimizations described above aren't sufficient to get within even 10% of hardware bandwidth.

My group wrote a paper on this: https://dl.acm.org/doi/10.1145/3731569.3764842

lstodd22d ago

yup. it's nbd and userspace making it slow. zram on the other hand adds little.

one can get rid of zram and just reimplement some compression in shaders but I think that would be a pointless optimization.

dannyw21d ago

Swapping to an NVMe drive will also consume P/E (program/erase) cycles on your NAND, i.e. wear it out over time.

RAM/VRAM don’t degrade from use.

markhahn21d ago

flash is a consumable, yes.

google says this drive was released in 2014, and SMART says POH is about 10 years.

LtdJorge21d ago

All of this is to say that it does have a potential impact on flash if you rebuild often, which tends to happen on Gentoo.

c0dejedi21d ago

This was a consideration when I wrote this

simonask22d ago· 5 in thread

I mean, cool, but I’d rather not?

margalabargala22d ago

So don't. Not everything is for you.

TurkTurkleton22d ago

gchamonlive22d ago

Assuming you absolutely need swap, wouldn't it be faster to swap to VRAM if you have 8 GB of it sitting unused, rather than swap to an SSD and burn its write cycles?

monkpit21d ago

The HN equivalent of “1 star - I’ve never eaten at this restaurant” type of reviews.

dspillett22d ago

So, erm, don't?

xfalcox22d ago· 3 in thread

Given my dev machine has 32GB of RAM and 32GB of VRAM that sits mostly idle when I'm not running AI models, this is not that bad of an idea.

mathisfun12322d ago

this is the pcmasterrace equivalent of being all upper body and with scrawny legs lol

zamadatix21d ago

tempoponet22d ago

It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to RAM. But 32/32 is pretty good for both in the popular ~30B range right now.

bobsmooth22d ago· 3 in thread

RAM disks have always fascinated me. In a different timeline every PC has 100 GB of RAM and 50TB HDDs are the norm.

pixl9722d ago

Back when HDDs were all there was, RAM disks were interesting, but SSDs pretty much killed most of that since they have massively higher IOPS than hard disks.

Hard drives that huge scare me as it would take days to backup all the data off them.

bobsmooth22d ago

In my fantasy RAM was the predominant technology over flash.

freedomben21d ago

hardwaresofton22d ago· 3 in thread

You want to waste VRAM, in this economy?

kevin_thibedeau22d ago

It's 1GiB. What could it cost, $10?

hardwaresofton22d ago

can't stand the thought of people not understanding the above reply[0]

[0]: https://knowyourmeme.com/memes/its-one-banana-michael-what-c...

theandrewbailey21d ago

I wish it cost $10.

(Kinda goes against the original spirit of the reference)

effnorwood22d ago· 3 in thread

use your car for an anchor on a big boat!

bandrami22d ago

There's probably somebody in Monaco who does that

SV_BubbleTime22d ago

I mean, if you aren’t using the car while using the boat and it won’t really damage the car… yes?

oneshtein21d ago

And when you use your car, then what?

mmastrac22d ago· 2 in thread

In the end I just had to bite the bullet and take a gamble on finding ECC DDR4 RAM that would work with the ancient AMD chipset...

This particular implementation seems to be running over too many layers to be particularly performant. Why not a custom block driver instead?

Teknoman11722d ago

Memory on an expansion card isn't gated on PCIe 5, it's gated on CXL support. CXL and PCIe use the same electrical/physical layer but the protocol is very different.

wang_li22d ago

It’s deterministic. But as the user you don’t know enough to know what was determined.

ProllyInfamous21d ago· 2 in thread

Why does my Apple Silicon Mac with 32 GB of RAM use (or even create?!) a swapfile when 20 GB is still unused/"free"?

Why can't I just enter a simple command to disable the swapfile entirely, like Linux's:

swapoff -a

Seems kind of silly, unless the point is to intentionally wear down the SSD's lifespan.

#Apple #Feedback #swapfile

netbsdusers21d ago

ProllyInfamous20d ago

----

>I have 20GB of RAM free (me)

>>~need to have quick access to main memory

>I have 20GB of RAM free (still me)

>>~yeah but "quickly reclaimed for another purpose"

>I have 20GB of RAM free (!m!e!)

//of//32GB//ttl//

----

Does. not. need. a swapfile.

The operating system, during bootup, should think "hey I have dozens of gigabytes of RAM, won't be needing any swapfiles" – behind the scenes and without input.

tlb21d ago· 2 in thread

kccqzy21d ago

netbsdusers21d ago

The general principle is that what is involved in paging should not be paged itself. Wiring the memory of that whole daemon is then a trivial solution to the problem.

enthus1ast_21d ago· 2 in thread

And now I want HDMI as a direct-connection, high-speed network socket.

voxadam21d ago

HDMI is largely unidirectional.

enthus1ast_21d ago

But it's 48 Gbit/s unidirectional.

kimixa22d ago· 1 in thread

That page also has a fuse filesystem implementation on top of opencl - https://github.com/Overv/vramfs - which may be more compatible.

aa-jv21d ago

Yeah, I used to map my 8 megabytes of video memory through the mtd back in the day, it helped build those .. you know .. X11 drivers .. ;)

Man, that brings back memories.

molticrystal22d ago· 1 in thread

>GpuRamDrive

>Create a virtual drive backed by GPU RAM.

https://github.com/prsyahmi/GpuRamDrive

Fork with AMD support:

https://github.com/brzz/GpuRamDrive/

c0dejedi21d ago

Thanks for sharing this, good read :-)

rwmj21d ago· 1 in thread

Similar but using OpenCL APIs, so it works on AMD (for some definition of "works" since their drivers are quite buggy): https://libguestfs.org/nbdkit-vram-plugin.1.html

c0dejedi21d ago

Thank you for pointing me to this

willis93622d ago· 1 in thread

I'm more interested in the opposite. Nvidia's Linux drivers crash when you try to address more VRAM than you have. It'd be nice if they didn't.

SV_BubbleTime22d ago

They already do that on Windows and it kinda sucks. If you are targeting something like LM Studio or ComfyUI, both of those have superior methods to do exactly this.

nialv722d ago· 1 in thread

I mean, you prompted something useful out of an AI, good job. But then use that to ask for donation? Feels weird, man.

dspillett21d ago

--------

[0] I want to code, I like the nitty-gritty, and if I want to outsource I'd rather outsource to a human¹ than to GenAI

[1] they might outsource to GenAI of course, that is their choice and as long as they properly verify the output before handing it on to me I shouldn't have to care

drdaeman22d ago

What about backpressure, how does it handle requirements for VRAM allocation when VRAM is used for swap space?

sgjohnson22d ago

>Sequential throughput: ~1.3 GB/s

Sounds VERY low. Also, wouldn't random read/write speed be MUCH more relevant here?
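On the random-vs-sequential point: swap-in traffic tends to be small, scattered page-sized (4 KiB) requests, so random-access numbers are usually the more telling ones for a swap device. Below is a minimal sketch for comparing the two access patterns against a file or block device with `os.pread`. The function name and defaults are illustrative only; it runs single-threaded at queue depth 1, and reads go through the page cache, so repeated runs (and random offsets that land on blocks the sequential pass already read) mostly measure RAM. For serious measurements a dedicated tool such as fio is the usual choice.

```python
import os
import random
import time

BLOCK = 4096  # 4 KiB, the usual page size on x86-64


def read_throughput(path: str, n_blocks: int = 2048, seed: int = 0) -> dict:
    """Time n_blocks 4 KiB reads at sequential and at random offsets.

    Returns MB/s per pattern. Reads go through the page cache, so a second
    run over the same data largely measures RAM rather than the device.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end works for regular files and block devices alike
        # (st_size reports 0 for block devices).
        total_blocks = os.lseek(fd, 0, os.SEEK_END) // BLOCK
        if total_blocks == 0:
            raise ValueError("target is smaller than one block")
        n = min(n_blocks, total_blocks)
        rng = random.Random(seed)
        patterns = {
            "sequential": [i * BLOCK for i in range(n)],
            "random": [rng.randrange(total_blocks) * BLOCK for _ in range(n)],
        }
        results = {}
        for name, offsets in patterns.items():
            start = time.perf_counter()
            done = 0
            for off in offsets:
                done += len(os.pread(fd, BLOCK, off))
            elapsed = time.perf_counter() - start
            results[name] = done / elapsed / 1e6 if elapsed > 0 else float("inf")
        return results
    finally:
        os.close(fd)
```

Usage would be something like `print(read_throughput("/path/to/testfile"))`; pointing it at a raw block device needs read permission on that device, usually root.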

theblazehen21d ago

I've implemented the same idea with OpenCL: https://github.com/theblazehen/vramblk

The original is https://github.com/Overv/vramfs, but that has the overhead of a FUSE filesystem plus a loop device when used as a swap device.

The performance is rather lackluster, though. It's far from a miracle "now you effectively have more RAM for a 90% performance drop"; it definitely feels like traditional swapping.

c0dejedi12d ago

The nbd-vram daemon got a thread pool, ~4x concurrent 4K IOPS, and the swap-pressure freeze is finally gone.

Fresh benchmarks against NVMe included:

https://www.seanlobjoit.com/posts/2026-06-12-vram-swap-two-w...

londons_explore21d ago

It seems obvious to me that this should be a built in functionality of the kernel.

The kernel's job is to manage resources, and GPU RAM is one such resource; it can be used for many of the same purposes as regular RAM.

LouisvilleGeek22d ago

Finally a use for the expensive ram when it's not needed in workloads!

Now if it could be dynamically used and vacated on other GPU workloads?

NortySpock21d ago

Nice, I might try using this as I'm currently on 16 GB of RAM / 11 GB VRAM and feel like the VRAM is usually idle except for when I game or try a local LLM.

It would be nice to have dynamic scaling or even just auto-shutoff on VRAM pressure if I forget I have this enabled and then fire up a game or LLM.
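The auto-shutoff idea can be approximated from outside the project: poll `nvidia-smi` for VRAM usage and run `swapoff` on the VRAM-backed swap device once usage crosses a threshold. This is a hypothetical sketch, not a feature of the project. The device path (`/dev/nbd0`), the 80% threshold, and the poll interval are assumptions. It needs root, `swapoff` only succeeds if RAM (or other swap) can absorb the evicted pages, and the VRAM the swap device itself occupies counts toward the reported usage, so the threshold has to sit above that baseline.

```python
import subprocess
import time

SWAP_DEVICE = "/dev/nbd0"  # hypothetical: whichever device backs the VRAM swap
THRESHOLD = 0.80           # hypothetical: fraction of VRAM in use that triggers swapoff
POLL_SECONDS = 2.0         # hypothetical polling interval


def parse_vram_usage(csv_text: str, gpu_index: int = 0) -> float:
    """Parse 'used, total' lines (MiB) as printed by nvidia-smi; return used/total."""
    lines = [ln for ln in csv_text.strip().splitlines() if ln.strip()]
    used, total = (float(x) for x in lines[gpu_index].split(","))
    return used / total


def query_vram_usage(gpu_index: int = 0) -> float:
    """Ask nvidia-smi for current VRAM usage of one GPU as a fraction."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_usage(out, gpu_index)


def should_release(usage: float, threshold: float = THRESHOLD) -> bool:
    return usage >= threshold


def watch() -> None:
    """Poll VRAM usage; disable the VRAM swap device once usage crosses the threshold."""
    while True:
        if should_release(query_vram_usage()):
            # swapoff migrates swapped-out pages back into RAM (or other swap)
            # and fails if there is not enough memory to hold them.
            subprocess.run(["swapoff", SWAP_DEVICE], check=True)
            print(f"VRAM pressure detected; disabled swap on {SWAP_DEVICE}")
            return
        time.sleep(POLL_SECONDS)
```

You would call `watch()` as root before launching a game or a local LLM. Note that disabling the swap device does not by itself free the VRAM; the daemon backing the device would still have to release its allocation.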

UnfitFootprint22d ago

No software benchmarks? BAR for RAM is cool, but I want to see how much it _actually_ beats PCIe NVMe.

tgtweak21d ago

I think you can definitely improve the throughput/IOPS by using BAR vs treating it like a file store/mount through CUDA, which adds a lot of overhead.

steeve21d ago

sidenote: it is possible to use Vulkan to map GPU memory to CPU space and even map it back to CUDA: https://x.com/steeve/status/2055042304344231978?s=20

jcmfernandes22d ago

Q: Why? A: Why not?

mrwizrd21d ago

I have long wished it was possible to do this. What a great bit of code. Thanks.

1matin21d ago

Nice idea, but I'm sure a ton of things can go wrong with it. It needs extensive edge case handling in order to be usable widely.

AI207019d ago

The catch is volatility: one CUDA process reclaims the VRAM, and your swap just evaporates.

hearstcastle820d ago

Iterating VRAM as swap space. Reliable tracker .iso installs on SSD drive.

zx808021d ago

An NVMe SSD weighs much less than a GPU, and that matters for a laptop.

usxr151522d ago

Nice

lowbloodsugar22d ago

This is why I read HN.