They are asking about LLMs. There's a confusion, it seems -- you're thinking of the object detection model (YOLO), which runs fine in (near) real time on a Coral or other NPU. The parent is referring to the LLaVA part, which is a full-fledged language model with a vision projector glued onto it for vision capability. Large language models are generally quantized (converted from full-precision floats to lower-precision floats or ints, e.g. F16, Q8, Q4) because they would otherwise be extremely large and slow and require a ton of RAM: the model has to read the entire set of weights for every token generated, so unless you have a gigantic amount of VRAM you end up pushing many tens of gigabytes of model weights over the system bus for each token, which is slow.
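Back-of-the-envelope, that bandwidth bound is easy to sketch: each generated token requires one full pass over the weights, so decode speed is capped at roughly bandwidth divided by model size. The sizes and bandwidth figures below are illustrative assumptions, not benchmarks:

```python
def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when reading the weights dominates:
    every token reads all weights once, so tok/s <= bandwidth / size."""
    return bandwidth_gb_s / model_size_gb

# Approximate weight sizes for a 7B-parameter model at each quantization
# (assumed round numbers: 2 bytes/param for F16, 1 for Q8, ~0.56 for Q4):
sizes_gb = {"F16": 14.0, "Q8": 7.0, "Q4": 3.9}

for quant, size in sizes_gb.items():
    # ~50 GB/s is in the ballpark of dual-channel desktop DDR4 (assumption);
    # ~900 GB/s is in the ballpark of high-end GPU VRAM (assumption).
    cpu = max_tokens_per_sec(size, 50.0)
    gpu = max_tokens_per_sec(size, 900.0)
    print(f"{quant}: up to ~{cpu:.1f} tok/s from system RAM, ~{gpu:.0f} tok/s from VRAM")
```

This ignores compute, cache effects, and the KV cache, but it shows why quantizing from F16 to Q4 roughly triples the ceiling on token rate for the same memory bus.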