I tried mark's OMP_NUM_THREADS suggestion (https://news.ycombinator.com/item?id=35018559) and did not see an obvious change that would make it parallel. Also, the huggingface patch (https://github.com/huggingface/transformers/pull/21955), once it gets in, is supposed to allow streaming from RAM to the GPU. So, for me it was not worth the effort to keep working on the CPU version, since even the best-case ~30X speedup would still take around a minute to run the 7B.
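For anyone wanting to try the same thing, a minimal sketch of what I mean by the OMP_NUM_THREADS approach (the thread count and script name here are just placeholders; the env var has to be set before the framework is imported, since libraries like PyTorch read it at startup):

```shell
# Cap OpenMP worker threads for CPU inference; adjust 8 to your core count.
export OMP_NUM_THREADS=8
# Verify the setting is visible to the child process (your_inference_script.py
# is hypothetical -- substitute whatever you're running).
python3 -c "import os; print(os.environ['OMP_NUM_THREADS'])"
```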