Ask HN: How much money do you burn on AI APIs with coding agents?
I was easily able to burn $1 a minute with deepinfra. But I have just tried it only a couple of times.
Interests: AR/VR, Climate Tech, Cycling, Digital Nomad, Entrepreneurship, Fintech, Freelancing, Hacking, Hardware, IoT, Open Source, Programming, Remote Work, Robotics, Science, Space Tech, Startups, Technology, Travel
---
I was easily able to burn $1 a minute with deepinfra. But I have just tried it only a couple of times.
From what I dig so far it looks like dual Arc A770 is supported by llama.cpp. And saw some reports that llama.cpp on top of IPEX-LLM is fastest way for inference on intel card.
On the other end there is more expensive 7900 XTX on which AMD claims (Jan '25) that inference is faster than on 4090.
So - what is the state of the art as of today, how does one compare to another (apple to apple)? What is tokens/s diff?
What tools do you use with llama.cpp?
Is there anything you recommend to avoid when it comes to llama.cpp?
Want to collect your best practices/experiences and advice around llama.cpp. Eg. if you work with Visual Studio Code - what plugins you recommend, and what not. Etc...
[ 6488.577727] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout, signaled seq=1715593, emitted seq=1715595
[ 6488.577881] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process firefox pid 756 thread firefox:cs0 pid 831
[ 6488.578005] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
And wondering - is it linux driver or is it firefox? Anyone have any experience/insights?