Show HN: Neuropod – Uber ATG's open source deep learning inference engine

(github.com)

80 points · vpanyam · 6y ago · 25 comments

vpanyam (OP) · 6y ago

Hey everyone! I lead the development of Neuropod. Happy to answer any questions.

There's also a blog post that has more detail: https://eng.uber.com/introducing-neuropod/

Super excited to open-source it!

m0zg · 6y ago

TensorRT offers significant advantages wrt inference and it takes ONNX files. Best I can tell this does not have a TensorRT backend (https://github.com/uber/neuropod/search?q=nvinfer.h&unscoped...). Why not?

vpanyam (OP) · 6y ago

Adding backends for TensorRT, ONNX, JAX, etc. is on our TODO list (and we'd love to see PRs to add support for these and others)!

We actually do use TensorRT with several of our models, but our approach is generally to do all TRT-related processing before the Neuropod export step. For example, we might do something like:

    TF model -> TF-TRT optimization -> Neuropod export

    PyTorch model
    -> (convert subset of model to a TRT engine)
    -> PyTorch model + custom op to run TRT engine
    -> TorchScript model + custom op to run TRT engine
    -> Neuropod export

Since Neuropod wraps the underlying model (including custom ops), this approach works well for us.

emef · 6y ago

I wrote our internal lightweight version of Neuropod at another SDC startup where we did use TensorRT. Our ML researchers worked in PyTorch and, more often than not, the PyTorch -> ONNX -> TensorRT conversion did not work. We ended up needing to replicate the network architecture using the TensorRT library and manually convert the weights from PyTorch. Then we'd use the TensorRT serialization to compile the models so they could be run in C++. I imagine that they may have tried this in Neuropod and saw the same conversion problems. TensorRT was a big investment to get running smoothly, but it did shave 20% or so off our inference latency.

jaredtn · 6y ago

What will the continued support for this project be, given that Uber has shuttered their AI Labs?

vpanyam (OP) · 6y ago

Neuropod was created at Uber ATG (not AI Labs) and powers hundreds of models across the company (ATG and the core business). It's been used in production for over a year and we're continuing to actively work on it.

The blog post I linked above goes into more detail, but here's a relevant quote about usage within Uber:

> Neuropod has been instrumental in quickly deploying new models at Uber since its internal release in early 2019. Over the last year, we have deployed hundreds of Neuropod models across Uber ATG, Uber AI, and the core Uber business. These include models for demand forecasting, estimated time of arrival (ETA) prediction for rides, menu transcription for Uber Eats, and object detection models for self-driving vehicles.

mysterEFrank · 6y ago

I'm a former Uber AI Labs member - they're different orgs. Begs the question though, who at Uber will use this now?

tourist_on_road · 6y ago

From the looks of it, it is from Uber ATG, not AI Labs. I believe those are two different orgs. Someone from Uber can clarify.

yshvrdhn · 6y ago

Any possible support for TensorRT?

voz_ · 6y ago

Consider using ONNX instead.

shockinglytrue · 6y ago

It always strikes me as uncannily brave to see a post like this. So many statements associated with one username..

- I no longer work at $company, and their stuff sucks

- ergo, they fired me, or I left on bad terms

- I clearly didn't get on well with my coworkers, as I'm happy to shit on their work from across the pond

- ergo, I have some deep attitude problem I'm likely to bring to my next placement

dang · 6y ago

This is not a good thread for a disgruntled grudge post.

voz_ · 6y ago

That is fair. It does not contribute to curiosity, discovery, or good conversation. I will remove the negative bits.

damvigilante · 6y ago

How does this compare to ONNX (https://github.com/onnx/onnx) in terms of feature completeness and performance, and what made you develop your own runtime?

vpanyam (OP) · 6y ago

This is a good question. I want to write a more detailed post about this in the future, but here are a few points for now:

- Neuropod is an abstraction layer, so it can do useful things on top of just running models locally. For example, we can transparently proxy model execution to remote machines. This can be super useful for running large-scale jobs with compute-intensive models. Including GPUs in all our cluster machines doesn't make sense from a resource-efficiency perspective. If we instead proxy model execution to a smaller cluster of GPU-enabled servers, we can get higher GPU utilization while using fewer GPUs. The "Model serving" section of the blog post ([1]) goes into more detail on this. We can also do interesting things with model isolation (see the "Out-of-process execution" section of the post).

- ONNX converts models while Neuropod wraps them. We use TensorFlow, TorchScript, etc. under the hood to run a model. This is important because we have several models that use custom ops, TensorRT, etc. We can use the same custom ops that we use at training time during inference. One of the goals of Neuropod is to make experimentation, deployment, and iteration easier so not having to do additional "conversion" work is useful.

- When we started building Neuropod, ONNX could only do trace-based conversions of PyTorch models. We've generally had lots of trouble with correctness of trace-based conversions for non-trivial models (even with TorchScript). Removing intermediate conversion steps (and their corresponding verification steps) can save a lot of time and make the experimentation process more efficient.

- Being able to define a "problem" interface was important to us (e.g. "this is the interface of a model that does 2d object detection"). This lets us have multiple implementations that we can easily swap out because we concretely defined an interface. This capability is useful for comparing models across frameworks without doing a lot of work. The blog post ([1]) talks about this in more detail.
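To make the "problem" interface point concrete, here is a minimal sketch in plain Python. It is not Neuropod's actual API: `TensorSpec`, `check`, `run`, and the two toy detectors are hypothetical names invented for illustration, and nested lists stand in for real tensors. The idea is only that once the named outputs (with dtypes and shapes, where a string is a symbolic dimension) are fixed, any implementation that satisfies the spec can be swapped in.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Dim = Union[int, str]  # int = fixed size, str = symbolic (any size, but consistent)


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical description of one named tensor in an interface."""
    name: str
    dtype: type
    shape: Tuple[Dim, ...]


def shape_of(value) -> Tuple[int, ...]:
    """Shape of a rectangular nested list (an empty list reports (0,))."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def leaves(value):
    """Yield every scalar element of a nested list."""
    if isinstance(value, list):
        for item in value:
            yield from leaves(item)
    else:
        yield value


def check(spec: List[TensorSpec], tensors: Dict[str, list]) -> None:
    """Raise ValueError if `tensors` does not match `spec`."""
    expected = {s.name for s in spec}
    if set(tensors) != expected:
        raise ValueError(f"expected {sorted(expected)}, got {sorted(tensors)}")
    symbols: Dict[str, int] = {}
    for s in spec:
        actual = shape_of(tensors[s.name])
        if len(actual) != len(s.shape):
            raise ValueError(f"{s.name}: expected rank {len(s.shape)}")
        for want, got in zip(s.shape, actual):
            if isinstance(want, str):
                # A symbolic dim may be any size but must agree across tensors.
                if symbols.setdefault(want, got) != got:
                    raise ValueError(f"{s.name}: dim {want!r} is inconsistent")
            elif want != got:
                raise ValueError(f"{s.name}: expected dim {want}, got {got}")
        if not all(isinstance(x, s.dtype) for x in leaves(tensors[s.name])):
            raise ValueError(f"{s.name}: expected dtype {s.dtype.__name__}")


# Hypothetical output interface for "2D object detection".
DETECTION_OUTPUTS = [
    TensorSpec("boxes", float, ("num_boxes", 4)),
    TensorSpec("scores", float, ("num_boxes",)),
]


def detector_a(image):
    """Stand-in for an implementation in one framework."""
    return {"boxes": [[0.0, 0.0, 10.0, 10.0]], "scores": [0.9]}


def detector_b(image):
    """Stand-in for an implementation in another framework."""
    return {"boxes": [[1.0, 1.0, 5.0, 5.0], [2.0, 2.0, 8.0, 8.0]],
            "scores": [0.7, 0.4]}


def run(model, image):
    """Run any implementation and verify it honors the shared interface."""
    outputs = model(image)
    check(DETECTION_OUTPUTS, outputs)
    return outputs
```

With this in place, `run(detector_a, image)` and `run(detector_b, image)` are interchangeable from the caller's point of view, which is what makes side-by-side comparison cheap.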

The blog post ([1]) goes into a lot more detail about our motivations and use cases so it's worth a read.

[1] https://eng.uber.com/introducing-neuropod/
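The first point above, about transparently proxying model execution to remote machines, can be sketched as follows. This is an illustration under stated assumptions, not how Neuropod implements it: `LocalExecutor`, `ProxyExecutor`, `make_fake_gpu_worker`, and `add_model` are hypothetical names, JSON stands in for whatever wire format a real system would use, and the "remote" worker is just an in-process function that takes and returns bytes.

```python
import json
from typing import Callable, Dict

Tensors = Dict[str, list]


class LocalExecutor:
    """Runs the model in the calling process."""

    def __init__(self, model: Callable[[Tensors], Tensors]):
        self._model = model

    def infer(self, inputs: Tensors) -> Tensors:
        return self._model(inputs)


class ProxyExecutor:
    """Same `infer` interface, but ships the inputs elsewhere as bytes."""

    def __init__(self, send: Callable[[bytes], bytes]):
        # `send` is an assumption: any transport that maps request bytes
        # to response bytes (a socket, an RPC stub, ...).
        self._send = send

    def infer(self, inputs: Tensors) -> Tensors:
        request = json.dumps(inputs).encode("utf-8")
        response = self._send(request)
        return json.loads(response.decode("utf-8"))


def make_fake_gpu_worker(model: Callable[[Tensors], Tensors]):
    """Stand-in for a GPU server: deserialize, run the model, serialize."""
    def handle(request: bytes) -> bytes:
        inputs = json.loads(request.decode("utf-8"))
        return json.dumps(model(inputs)).encode("utf-8")
    return handle


def add_model(inputs: Tensors) -> Tensors:
    """Toy model: elementwise addition of two vectors."""
    return {"out": [a + b for a, b in zip(inputs["x"], inputs["y"])]}


def caller_code(executor, inputs: Tensors) -> Tensors:
    """The caller does not know or care where the model actually runs."""
    return executor.infer(inputs)


if __name__ == "__main__":
    data = {"x": [1.0, 2.0], "y": [3.0, 4.0]}
    print(caller_code(LocalExecutor(add_model), data))
    print(caller_code(ProxyExecutor(make_fake_gpu_worker(add_model)), data))
```

Because both executors expose the same `infer` method, a batch job on CPU-only machines could switch to the proxy without touching the code that calls the model.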

dtrailin · 6y ago

Yes, this seems very similar to the ONNX Runtime https://github.com/microsoft/onnxruntime. I'm not sure why they needed to reinvent the wheel here.

j88439h84 · 6y ago

How does it differ from pyro?

mysterEFrank · 6y ago

Neuropods can wrap pyro models

manicksurya · 6y ago

How does inference performance compare to the native serving solutions provided by frameworks, such as TFServing?

aloknnikhil · 6y ago

Found it interesting that most of the commits are under 1 contributor (OP). Are you the most active contributor or was this an artifact of open-sourcing it? Just wondering if you get hit by a bus tomorrow, what would we do? :)

Thanks for this, btw!

leonfedden · 6y ago

This looks great, thanks for open sourcing it.

Have you had a chance to try running your models on bare-metal devices such as the ARM Cortex-M4?

Is there a list of ops that are supported or, crucially, unsupported?

xbsd98 · 6y ago

Are there any examples of demand forecasting? Thanks.
