The major deep learning libraries are Python wrappers around C/C++ cores, which in turn call into CUDA. If you call the C++ layer directly, you get control over the memory operations applied to your data. The biggest wins come from reducing the number of copies, reducing the number of transfers between CPU and GPU memory, and moving operations to whichever processor runs them faster (usually from the CPU to the GPU, occasionally the other way).
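The copy-reduction point is visible even before you drop to C++: in NumPy (standing in here for any tensor library's buffer semantics), views and in-place operations reuse an existing allocation, while ordinary arithmetic silently allocates a fresh buffer. A minimal sketch:

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float32)
buf = x.__array_interface__["data"][0]  # address of x's underlying buffer

# Slicing produces a view: no new buffer is allocated.
view = x[::2]
print(np.shares_memory(x, view))        # True

# Ordinary arithmetic allocates a fresh array -- an extra copy.
y = x * 2.0
print(np.shares_memory(x, y))           # False

# The in-place variant reuses x's existing buffer instead.
x *= 2.0
print(x.__array_interface__["data"][0] == buf)  # True
```

The same distinction shows up in the GPU libraries (e.g. out-of-place vs. in-place ops, or staging tensors before a host-to-device transfer), just with a much higher cost per unnecessary copy.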
That is basically what the article does, but if you want to squeeze out every last bit of performance, the Python layer is still an abstraction that stands between you and direct control over what happens to the memory.