
scottlamb 3y ago

Maybe. Running inference at 10 fps is probably plenty. But that doesn't mean you only have to do 10 fps of H.264/H.265 decoding. I think the most common scenario is for the input video to be e.g. 30 fps with mostly P frames that each depend on the prior frame in a chain. In that case, you need to decode almost [1] 30 fps to get 10 fps of evenly spaced frames to process.

[1] You could skip the last P frame before an IDR frame, but that doesn't buy you much.
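To make the dependency argument concrete, here is a minimal sketch that counts which frames must be decoded to output every Nth frame. It assumes the simplest structure described above: each GOP is one IDR frame followed by P frames that each reference only the immediately preceding frame (no B frames, no temporal SVC). The function name and its parameters are hypothetical, chosen for illustration only.

```python
def frames_to_decode(gop_len: int, step: int, num_gops: int = 1, phase: int = 0) -> list[int]:
    """Return the indices of frames that must be decoded to output every
    `step`-th frame, starting at frame `phase`.

    Assumption: every GOP is IDR, P, P, ... where each P frame depends
    only on the frame immediately before it.
    """
    total = gop_len * num_gops
    needed: set[int] = set()
    for target in range(phase, total, step):
        # Decoding a target requires the whole chain back to its GOP's IDR frame.
        gop_start = (target // gop_len) * gop_len
        needed.update(range(gop_start, target + 1))
    return sorted(needed)


# 30 fps input with a 30-frame GOP, sampled at every third frame (10 fps).
needed = frames_to_decode(gop_len=30, step=3)
print(f"decode {len(needed)} of 30 frames")
```

With these example numbers, only the trailing P frames after the last sampled frame in each GOP can be skipped, so the decoder still handles nearly the full input rate. How many frames that saves depends on how the sampling lines up with the IDR frames.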


zamadatix 3y ago

If your source is 96 YouTube videos, sure; if it's 96 CCTV cameras, it's different.

scottlamb (OP) 3y ago

Still depends. As it happens, I'm developing my own open source NVR software, [1] so I know a bit about this. Some cameras are fairly good about this, supporting the following features:

* "Temporal SVC", in which the frame dependencies are structured so you can discard frames down to 1/2 or 1/4 of the nominal frame rate and still decode the remainder.

* Three output streams, which you could configure for say forensics (high-bandwidth/high-resolution/high-fps), inference (mid-bandwidth/mid-resolution/low-fps), and viewing multiple streams / over mobile networks (low-bandwidth/low-resolution/mid-fps).

* On-camera ML tasks too. (Although I haven't seen one that lets you upload your own model.)
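As a rough illustration of the temporal SVC item, the sketch below assigns frames to layers in a generic dyadic hierarchy and then drops the upper layers. This is an assumed textbook-style layout, not the scheme of any particular camera; real encoders may structure their layers differently. The names `temporal_layer` and `keep_up_to_layer` are hypothetical.

```python
def temporal_layer(index: int, num_layers: int = 3) -> int:
    """Return the temporal layer of a frame in an assumed dyadic hierarchy.

    With 3 layers the pattern repeats every 4 frames: layers 0, 2, 1, 2.
    Frames only reference frames in the same or a lower layer, so higher
    layers can be dropped without breaking the rest.
    """
    period = 1 << (num_layers - 1)
    pos = index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def keep_up_to_layer(indices, max_layer: int, num_layers: int = 3) -> list[int]:
    """Keep only the frames whose layer is at most `max_layer`."""
    return [i for i in indices if temporal_layer(i, num_layers) <= max_layer]


frames = range(8)
print(keep_up_to_layer(frames, 2))  # full rate
print(keep_up_to_layer(frames, 1))  # 1/2 rate
print(keep_up_to_layer(frames, 0))  # 1/4 rate
```

Under this layout, an inference pipeline would only need to decode layer 0 (and perhaps layer 1), instead of the whole chain of P frames.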

But other cameras are less good. E.g. some Reolinks [2] only support two streams, and the "sub" stream is fixed at 640x352, which is uncomfortably low. Your inference network may not take more resolution than that, but even if not, you might want to crop down to the area of interest (where there's motion and/or where the user has configured an alert) to improve quality. (You probably wouldn't pair that cheap Reolink camera with this expensive inference card, but the point stands in general.)

Even the "better" cameras' timestamp handling is awful, so it's hard to reliably match up the main stream, sub stream, analytics output, and wall clock time. Given that limitation, it'd be desirable to just use the main stream for everything, but the on-NVR transcoding is likely unaffordable.

[1] https://github.com/scottlamb/moonfire-nvr

[2] https://github.com/scottlamb/moonfire-nvr/wiki/Cameras:-Reol...

zamadatix 3y ago

Price-wise, if this card is $5,000, that's about $52 per channel at 96 channels, and you don't need any onboard smarts in the camera. That's in a space where commercial cameras cost hundreds of dollars to buy or replace just to get the particular smarts you're looking for that day. I've done a few PoCs in the smart city/smart retail space they're advertising here, and they pretty much end up falling into either the "everything must be pre-processed as much as possible and sent to the cloud" bucket or the "everything must be dumb and sent to the central recorder" bucket, as anything in the middle creates a bad cost balance where you're optimising neither hardware+simplicity costs nor data+cloud costs. I'll admit, though, that I don't normally go out to sell cameras all day; it's just something we've added for clients as part of a larger connectivity rework (CBRS/LTE/Wi-Fi/GPON/traditional wired), and we typically partner up with some specialized company on the video processing use case. The onboard camera processing is usually about justifying a cloud pitch ("we use data to send video when something interesting happens" or "we send only the best picture of the face in HD to save bandwidth but still be able to ID them later"), not so much letting you go in and solve your own problem. One exception I ran into was license plates at a car wash outfit, where they were able to send the plate numbers back to their main app, but that probably came from being a pre-baked solution for road tolls.

I also have a sneaking suspicion that using lower channel counts lets you raise the FPS, and that the max of 96 channels is a hard limit tuned to allow use cases like recognition from unprocessed feeds. Documentation access seems to require manual approval, though, so I can't verify that for sure.
