Prototool – A Swiss Army Knife for Protocol Buffers

(github.com)

215 points · mway · 8y ago · 62 comments


33 comments · 8 top-level

throwaway84742 · 8y ago · 21 in thread

In another decade or so the world might replicate half of the very nice internal tools Google has.

Suggestion for a project: make a tool that, given a proto description and a file that contains concatenated proto messages stored as binary strings (sort of like RecordIO at Google) lets you run simple SQL queries on the data and extract a subset of the fields from messages matching a predicate, and maybe even do simple aggregations. That was pretty handy. I really wish Google would open source some or most of this stuff. It’s not like keeping it closed source creates any kind of insurmountable competitive advantage, especially compared to the advantages that would accrue from broader adoption of protobufs.
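A minimal sketch of the suggested tool, assuming records are framed with a varint length prefix (the convention used by protobuf's delimited-message helpers such as Java's `writeDelimitedTo`; Google's internal RecordIO format is not public and may differ). To stay self-contained it decodes the wire format without a schema, so fields are addressed by field number rather than name. `query`, `field_varint`, `field_bytes` and the field layout (1 = path, 2 = status) are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple, Union

Value = Union[int, bytes]
Message = Dict[int, List[Value]]  # field number -> values; repeated fields keep every value


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def parse_message(buf: bytes) -> Message:
    """Schema-less decode; only varint (0) and length-delimited (2) wire types handled."""
    msg: Message = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        msg.setdefault(number, []).append(value)
    return msg


def read_records(data: bytes) -> Iterator[bytes]:
    """Split a buffer of varint-length-prefixed records into raw message bytes."""
    pos = 0
    while pos < len(data):
        length, pos = decode_varint(data, pos)
        yield data[pos:pos + length]
        pos += length


def query(data: bytes, select: List[int], where: Callable[[Message], bool]) -> List[dict]:
    """Roughly SELECT <select> FROM records WHERE <where>, with fields named by number."""
    rows = []
    for raw in read_records(data):
        msg = parse_message(raw)
        if where(msg):
            rows.append({number: msg.get(number, []) for number in select})
    return rows


# Usage: two hand-encoded records with field 1 = path (string), field 2 = status (varint).
def field_varint(number: int, n: int) -> bytes:
    return encode_varint(number << 3 | 0) + encode_varint(n)


def field_bytes(number: int, b: bytes) -> bytes:
    return encode_varint(number << 3 | 2) + encode_varint(len(b)) + b


records = [field_bytes(1, b"/search") + field_varint(2, 200),
           field_bytes(1, b"/login") + field_varint(2, 500)]
log = b"".join(encode_varint(len(r)) + r for r in records)
errors = query(log, select=[1], where=lambda m: m.get(2, [0])[0] >= 500)
print(errors)  # [{1: [b'/login']}]
```

With a real `.proto` descriptor, the same loop would map field numbers to names and types; aggregations are ordinary Python over the returned rows.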

puzzle · 8y ago

Other tools and features that don't exist outside:

- a tee loadbalancer for gRPC, forwarding the same requests to both A and B backend pools, but only returning results from A. I don't think Envoy has this, but it should.

- load balancing dashboards showing traffic between frontends and backends

- load balancer support for dynamic sharding

- gnubbyd under ChromeOS: https://groups.google.com/a/chromium.org/forum/m/#!msg/chrom... (I think most of this is doable these days, but the initial setup requires a Linux system)

- Kubernetes: server-specific custom hyperlinks on dashboards (e.g. links to POD_IP:PORT/stats, /debug, etc. for each individual pod you are looking at)

- Kubernetes: multiple Docker images in the same container or pod. E.g. the first container could be your code, while the second one might be data or the JVM runtime, etc., without having to bundle them together or doing costly copies in init containers.

- Kubernetes: canaries and automatic rollbacks

necubi · 8y ago

> - a tee loadbalancer for gRPC, forwarding the same requests to both A and B backend pools, but only returning results from A. I don't think Envoy has this, but it should.

Envoy can do this, via its shadowing feature. See the docs here: https://www.envoyproxy.io/docs/envoy/v1.6.0/api-v2/api/v2/ro....


kodablah · 8y ago

> Kubernetes: canaries and automatic rollbacks

Hot off the presses: https://cloudplatform.googleblog.com/2018/04/introducing-Kay.... Though you have to use Spinnaker.


philsnow · 8y ago

> a tee loadbalancer for gRPC, forwarding the same requests to both A and B backend pools, but only returning results from A

I would call that a "(live) traffic replayer" rather than a load balancer. "load balance" implies to me that the upstream traffic is divvied up among the downstream sinks, not that the upstream traffic gets copied to multiple downstreams.
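The tee / traffic-replay behavior discussed above can be sketched in a few lines: every request goes to both pools, only the primary's response (or error) reaches the caller, and the shadow's result is discarded. This is a hypothetical illustration with plain Python callables standing in for backend pools; it is not how Envoy implements its shadowing feature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class TeeForwarder:
    """Send every request to a primary and a shadow backend; only the primary's reply is returned."""

    def __init__(self, primary: Callable[[Any], Any], shadow: Callable[[Any], Any],
                 workers: int = 4) -> None:
        self.primary = primary
        self.shadow = shadow
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _send_shadow(self, request: Any) -> None:
        try:
            self.shadow(request)  # response is discarded
        except Exception:
            pass  # shadow failures must never affect the caller

    def __call__(self, request: Any) -> Any:
        # Fire the shadow copy in the background without waiting for it.
        self.pool.submit(self._send_shadow, request)
        return self.primary(request)


# Usage with plain functions standing in for backend pools A and B.
seen_by_b = []


def backend_a(request):
    return f"A:{request}"


def backend_b(request):
    seen_by_b.append(request)
    return f"B:{request}"


tee = TeeForwarder(backend_a, backend_b)
result = tee("ping")
tee.pool.shutdown(wait=True)  # wait for the shadow call so the example is deterministic
print(result, seen_by_b)  # A:ping ['ping']
```

Note that `ThreadPoolExecutor` queues work without bound, so a real proxy would also cap or sample shadow traffic so a slow B pool cannot pile up work.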


smarterclayton · 8y ago

Image-based volumes (the second-to-last bullet) have long been blocked on the container runtime having a really clean way to enable and keep the container filesystems mounted. Definitely something I want to see fixed, since otherwise you just end up doing hacky copies via emptyDir.

akhilcacharya · 8y ago

Should you really be detailing the functionality of internal tools like this?


zellyn · 8y ago

When I was at Google, I kept an eye on the open sourcing of RecordIO. Apparently there was no desire not to open source it: it was simply that nobody had the time to disentangle and/or clean it up for release.

Looks like some parts of it have escaped… https://github.com/eclesh/recordio

haberman · 8y ago

I think the open-source equivalent of RecordIO is the leveldb log format:

https://github.com/google/leveldb/blob/master/doc/log_format...

https://github.com/google/leveldb/blob/master/db/log_reader....

https://github.com/google/leveldb/blob/master/db/log_writer....

I think the decision not to open-source RecordIO is likely related to legacy baggage that's baked into the format. The LevelDB format above avoids that.

It doesn't appear that the headers for this are public though.
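To give a feel for the leveldb log format linked above, here is a simplified Python sketch of a writer and reader based on the format described in leveldb's log_format document: the file is cut into 32 KiB blocks, every fragment has a 7-byte header (masked CRC32C of type plus data, 2-byte little-endian length, 1-byte type), and a record that does not fit in the current block is split into FIRST/MIDDLE/LAST fragments. It skips leveldb's corruption recovery, the function names are hypothetical, and byte-for-byte compatibility with leveldb has not been verified here.

```python
import struct
from typing import List

BLOCK_SIZE = 32768   # the log is written in 32 KiB blocks
HEADER_SIZE = 7      # checksum (4) + length (2) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4
MASK_DELTA = 0xA282EAD8


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate and offset the CRC before storing it, as the format specifies."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + MASK_DELTA) & 0xFFFFFFFF


def write_log(records: List[bytes]) -> bytes:
    out = bytearray()
    for record in records:
        left, begin = record, True
        while True:
            leftover = BLOCK_SIZE - len(out) % BLOCK_SIZE
            if leftover < HEADER_SIZE:
                out += b"\x00" * leftover  # zero trailer: a header never straddles blocks
                leftover = BLOCK_SIZE
            fragment = left[:leftover - HEADER_SIZE]
            end = len(fragment) == len(left)
            rtype = FULL if begin and end else FIRST if begin else LAST if end else MIDDLE
            # The checksum covers the type byte followed by the fragment's data.
            crc = masked_crc(bytes([rtype]) + fragment)
            out += struct.pack("<IHB", crc, len(fragment), rtype) + fragment
            left, begin = left[len(fragment):], False
            if end:
                break
    return bytes(out)


def read_log(data: bytes) -> List[bytes]:
    records, pending = [], b""
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        pos = 0
        while len(block) - pos >= HEADER_SIZE:  # fewer bytes left means trailer padding
            crc, length, rtype = struct.unpack_from("<IHB", block, pos)
            fragment = block[pos + HEADER_SIZE:pos + HEADER_SIZE + length]
            pos += HEADER_SIZE + length
            if masked_crc(bytes([rtype]) + fragment) != crc:
                raise ValueError("checksum mismatch")
            if rtype == FULL:
                records.append(fragment)
            elif rtype == FIRST:
                pending = fragment
            elif rtype == MIDDLE:
                pending += fragment
            elif rtype == LAST:
                records.append(pending + fragment)
                pending = b""
    return records


records = [b"hello", b"", b"x" * 70000]  # the last record spans three blocks
print(read_log(write_log(records)) == records)  # True
```

Each record would typically hold one serialized protobuf message; the block structure lets a reader resynchronize at the next 32 KiB boundary after a corrupted region.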

vinkelhake · 8y ago

If you were interested in RecordIO, then this project might also be of interest to you: https://github.com/google/riegeli


dekhn · 8y ago

TFRecords are the closest thing to RecordIO that has Google support.


Willson50 · 8y ago

You might be interested in KSQL, SQL queries that run on Kafka streams. https://www.confluent.io/product/ksql/

throwaway84742 · 8y ago

Nah. I’m interested in quickly querying on-disk data specifically, i.e. proto-based application logs and the like (another thing the world needs to adopt more broadly, imo).


reacharavindh · 8y ago

I have limited protobuf knowledge.

Why not use SQLite[1] for storing this data? Storing structured data in a binary format, and being able to run SQL queries on it, is already possible with SQLite, right?

[1] - https://www.sqlite.org/appfileformat.html
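One way to combine the SQLite suggestion with protobuf records is to store each serialized message as a BLOB and copy the fields you want to filter or aggregate on into ordinary indexed columns at write time. A minimal sketch using Python's built-in `sqlite3` module; the `requests` table, its columns, and the two hand-encoded payloads (field 1 = path, field 2 = status) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests ("
    " id INTEGER PRIMARY KEY,"
    " path TEXT,"        # copied out of the message for querying
    " status INTEGER,"   # copied out of the message for querying
    " payload BLOB)"     # the full serialized protobuf message
)
conn.execute("CREATE INDEX requests_status ON requests(status)")

rows = [
    ("/search", 200, b"\x0a\x07/search\x10\xc8\x01"),
    ("/login", 500, b"\x0a\x06/login\x10\xf4\x03"),
]
conn.executemany("INSERT INTO requests (path, status, payload) VALUES (?, ?, ?)", rows)

# Predicates and aggregations run in SQL; the BLOB is only fetched when asked for.
errors = conn.execute(
    "SELECT path, payload FROM requests WHERE status >= 500").fetchall()
counts = conn.execute(
    "SELECT status, COUNT(*) FROM requests GROUP BY status ORDER BY status").fetchall()
print(counts)  # [(200, 1), (500, 1)]
```

The catch, raised elsewhere in this thread, is that repeated and nested fields do not map onto flat columns, so this works best when only a few top-level scalar fields need to be queryable.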


SOLAR_FIELDS · 8y ago

While not exactly what you are describing, I work for another company that uses protobufs extensively and we have some nice internal tools similar to what you describe. I really wish we could open source those too. I feel like the wheel is reinvented a lot with protobuf in several of the large companies who use it.

endymi0n · 8y ago

I suspect the SQL case could be thrown together fairly easily with PostgreSQL and a custom Foreign Data Wrapper based on protobuf-c (prior art: cstore_fdw by the Citus folks). Proto definitions should then compile rather cleanly to table definitions, at least one level down (PG isn't so good with nested structures).

The main thing stopping this endeavour is probably that, to the best of my knowledge, there isn't any standardization in the Protobuf community of file formats that serialize multiple messages together like RecordIO does. That, and my C skills are pretty rusty by now :)

grandinj · 8y ago

You could pretty easily add a TableEngine extension to H2 (h2database.com), which would give you full SQL query functionality over such a file.

throwaway84742 · 8y ago

Nope. Protos have repeated fields and can be hierarchical (that is, can contain other protos) and even recursive (that is, contain themselves, possibly as repeated fields). H2 is not going to work.


chrissnell · 8y ago

I would also love to see a protobuf/gRPC decoder for Wireshark. Bonus: the ability to filter sniffed packets based on a field value.

lobster_johnson · 8y ago

How does RecordIO compare with Parquet and Arrow? Different use cases?

throwaway84742 · 8y ago

Don’t know about Arrow, but Parquet is a columnar format. Such formats can’t write record by record; they need a large number of records to shred into columns in order to realize their columnar benefits. In contrast, appending to RecordIO is little more than writing a binary string. The downside of RecordIO is that you can’t read just some fields of a message and skip the others: you have to deserialize the whole message. RecordIO is cheap to write and well suited to cases where reading the entire message is not that big a deal. Columnar formats suit cases where it’s OK to pay a relatively substantial up-front encoding cost for vastly greater performance in analytical workloads. Advanced ones contain additional metadata (such as range and hash constraints; range constraints can be kept both per file and per block), which the analytical runtime can use to avoid work that doesn’t need to be done.
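The trade-off described above can be shown with a toy model: a row-oriented log appends each record immediately, while a columnar file buffers records into row groups, shreds them into per-field columns, and keeps per-column min/max statistics so a range query can skip whole groups and touch only one column. This is a simplified illustration in the spirit of Parquet's row groups and column statistics; the class names and layout are hypothetical and are not how Parquet or RecordIO actually encode data.

```python
from typing import Dict, List, Tuple


class RowLog:
    """Row-oriented, RecordIO-style: each record is appended the moment it arrives."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def append(self, record: dict) -> None:
        self.records.append(record)  # one cheap append per record

    def column(self, name: str) -> list:
        # Reading a single field still visits every whole record.
        return [r.get(name) for r in self.records]


class ColumnarFile:
    """Columnar: records are buffered and shredded into per-field columns in row groups."""

    def __init__(self, row_group_size: int) -> None:
        self.row_group_size = row_group_size
        self.buffer: List[dict] = []
        self.row_groups: List[Tuple[Dict[str, list], Dict[str, tuple]]] = []

    def append(self, record: dict) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.row_group_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        names = {k for r in self.buffer for k in r}
        columns = {n: [r.get(n) for r in self.buffer] for n in names}
        # Per-column min/max stats let readers skip whole row groups.
        stats = {}
        for n, values in columns.items():
            present = [v for v in values if v is not None]
            if present:
                stats[n] = (min(present), max(present))
        self.row_groups.append((columns, stats))
        self.buffer = []

    def scan(self, name: str, lo, hi) -> list:
        """Values of one column within [lo, hi]; other columns are never touched."""
        out = []
        for columns, stats in self.row_groups:
            if name not in stats or stats[name][1] < lo or stats[name][0] > hi:
                continue  # row group ruled out by its stats alone
            out.extend(v for v in columns[name] if v is not None and lo <= v <= hi)
        return out


row_log, col_file = RowLog(), ColumnarFile(row_group_size=3)
for i in range(6):
    record = {"path": f"/p{i}", "latency_ms": i * 10}
    row_log.append(record)
    col_file.append(record)
col_file.flush()  # no-op here; needed when the last row group is partial
print(row_log.column("latency_ms"))          # [0, 10, 20, 30, 40, 50]
print(col_file.scan("latency_ms", 35, 100))  # [40, 50]
```

In the scan, the first row group (latencies 0 to 20) is skipped purely from its stats, which is the kind of avoided work the comment refers to.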

erik_seaberg · 8y ago

Sounds a lot like a Hive query over self-describing Avro files.

mabynogy · 8y ago · 4 in thread

I can't use something with a CoC.

durkie · 8y ago

this seems like it only applies to people wishing to contribute to prototool. also, it's Uber, so it's nice/expected that they would have this sort of thing.

also the code is basically about not being a jerk to other people. seems like a low bar to meet.

n42 · 8y ago

Why not?

recursive · 8y ago

I'm guessing the code disallows their conduct.

mabynogy · 8y ago

It's a political tool. It means they are into politics.


adam_gyroscope · 8y ago

We built https://github.com/GyroscopeHQ/grpcat at my company, which takes text-format protos as input and sends them to a gRPC endpoint. Looking at Prototool I think I should just merge the functionality into Prototool. This is cool!

kodablah · 8y ago

It says "Handle installation of protoc [...] behind the scenes in a platform-independent manner without any work on the part of the user", but it doesn't support Windows yet [0]. Granted, it's pre-1.0, so I should probably read the features as goals.

0 - https://github.com/uber/prototool/issues/9

flippmoke · 8y ago

Here are some other great tools that are quite useful with protobufs, one in C++ and one in pure JavaScript.

https://github.com/mapbox/protozero

https://github.com/mapbox/pbf

grizzles · 8y ago

danby, gRPC for the browser, is looking for testers: https://github.com/ericbets/danby. There are two upcoming features. The first is streaming support. The second is a callback API template that mirrors the gRPC Node API exactly; you will still have the choice to stick with the current promise API. It's not a priority for us at the moment, but adding a simple load balancer that distributes traffic randomly across a set of servers would be a ~5-line patch.

hurricaneSlider · 8y ago

I've always wished there were a tool for protobuf that could test whether changes to any .proto files were backwards compatible and, if not, raise an error.
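A much-reduced sketch of such a check, assuming each message has already been parsed into a map from field number to (name, type). It flags removed messages, field numbers deleted without being reserved, and any type change; it conservatively treats every type change as breaking, even though some (such as int32 to int64) are wire-compatible. Real schemas need far more (enums, services, oneofs, JSON names), and `MessageSchema`, `breaking_changes` and `check_compatible` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MessageSchema:
    """Hypothetical simplified schema: field number -> (field name, type name)."""
    fields: Dict[int, Tuple[str, str]]
    reserved: Set[int] = field(default_factory=set)


def breaking_changes(old: Dict[str, MessageSchema], new: Dict[str, MessageSchema]) -> List[str]:
    problems = []
    for msg_name, old_msg in old.items():
        new_msg = new.get(msg_name)
        if new_msg is None:
            problems.append(f"{msg_name}: message removed")
            continue
        for number, (name, type_name) in old_msg.fields.items():
            if number not in new_msg.fields:
                if number not in new_msg.reserved:
                    # The number could later be reused with a different meaning.
                    problems.append(f"{msg_name}.{name} (#{number}): removed without reserving its number")
            elif new_msg.fields[number][1] != type_name:
                problems.append(f"{msg_name}.{name} (#{number}): type changed "
                                f"{type_name} -> {new_msg.fields[number][1]}")
    return problems


def check_compatible(old: Dict[str, MessageSchema], new: Dict[str, MessageSchema]) -> None:
    """Raise if any breaking change is found (e.g. to fail a CI step)."""
    problems = breaking_changes(old, new)
    if problems:
        raise ValueError("incompatible .proto change:\n" + "\n".join(problems))


old = {"User": MessageSchema({1: ("id", "int64"), 2: ("email", "string")})}
new = {"User": MessageSchema({1: ("id", "string"), 3: ("nickname", "string")})}
print(breaking_changes(old, new))
# ['User.id (#1): type changed int64 -> string', 'User.email (#2): removed without reserving its number']
```

Adding field #3 is not reported, since new fields are backwards compatible on the wire.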

ris · 8y ago

Yet another tool trying to "manage" "packages" on my machine!
