
0 points · cfn · 2y ago · 0 comments

It is strange that it does that, given that there's plenty of free memory available in the system (it has 256 GB of RAM and wasn't running anything else).


1 comment · 1 top-level

dTal · 2y ago

Not really, it's just a question of accounting. Pages of an mmap'd file live in the kernel's page cache, so they get counted as cache rather than as memory the process has allocated; mmap is functionally the same as disk cache. As long as you've got the RAM, it'll run from RAM. If you really want, you can force llama.cpp not to use mmap (the --no-mmap flag) and explicitly load everything into RAM, but there's not really any performance reason to do that - if the kernel keeps dropping your pages, you're under memory pressure anyway and "locking" that memory will probably end up either thrashing or invoking the OOM killer.
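
To make the accounting point concrete, here is a minimal sketch in Python (not llama.cpp's own code; the function names and the temporary file are hypothetical, chosen only for illustration). It reads the same file once through a read-only memory map and once through an ordinary read into process memory. The bytes come out identical either way; what differs is where the kernel accounts for them, since the mapped pages stay in the page cache and can be dropped and re-read from disk, while the explicit read makes a private copy.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str) -> bytes:
    """Map the file read-only and copy its contents out through the mapping.

    The mapped pages are file-backed: the kernel can evict them under
    memory pressure and fault them back in from disk later.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must be non-empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            return mapping[:]


def read_via_read(path: str) -> bytes:
    """Load the whole file into ordinary process memory with read()."""
    with open(path, "rb") as f:
        return f.read()


def demo() -> bool:
    """Write a hypothetical 1 MiB stand-in for a model file and compare both paths."""
    payload = os.urandom(1 << 20)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        return read_via_mmap(path) == read_via_read(path) == payload
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("same bytes via mmap and read():", demo())
```

On Linux, watching a tool like free or top while something like this runs on a large file should show the mapped data under cache rather than as process-private memory, though the exact columns vary by tool and kernel version.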
