Dependency inference in Pants 2.0.0: Precise caching without the boilerplate

(blog.pantsbuild.org)

8 points · stuhood · 5y ago · 3 comments

stuhood (OP) · 5y ago · 2 in thread

Hey folks! As mentioned in the post: file-level dependency inference allows for dependency lists that are 30% smaller (improving cache hit rates), even while removing 90% of the boilerplate from `BUILD` files.

Happy to answer any questions!

laurentlb · 5y ago

Nice article, and congratulations on Pants v2! What's the performance impact of dependency inference on `pants dependencies`?

Instead of just evaluating BUILD files, you also need to read every source file, if I understand correctly. I wonder how this affects performance at scale.

For comparison, users of Bazel sometimes use tools to generate/update the BUILD files when they modify a source file. You could view the BUILD file as a cache layer in that case.

stuhood (OP) · 5y ago

Thanks for the kind words!

> What's the performance impact of dependency inference on `pants dependencies`?

There are two components to inferring dependencies:

1) Extracting dependencies from the code being built (in Python this means extracting `import` statements). This involves reading source files, but its runtime is proportional to the amount of code being built, rather than to the repository size.

2) Determining which files throughout the repository provide the various packages. This part is proportional to repository size, but in many languages it requires only file existence checks (in Python, this is possible because the module system dictates how modules are laid out in directories).
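
To make the two steps concrete, here is a minimal Python sketch. It is not how Pants implements inference; the function names (`extract_imports`, `candidate_paths`, `infer_dependencies`) are hypothetical, and it assumes a single source root, absolute imports only, and a pre-listed set of repository file paths.

```python
from __future__ import annotations

import ast


def extract_imports(source: str) -> set[str]:
    """Step 1: collect the absolute module names imported by one source file."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            modules.add(node.module)
    return modules


def candidate_paths(module: str) -> list[str]:
    """Paths that could provide `module`, assuming one source root at the repo top."""
    base = module.replace(".", "/")
    return [f"{base}.py", f"{base}/__init__.py"]


def infer_dependencies(source: str, repo_files: set[str]) -> set[str]:
    """Step 2: map each import to a file using only existence checks."""
    deps: set[str] = set()
    for module in extract_imports(source):
        for path in candidate_paths(module):
            if path in repo_files:  # existence check against the file listing
                deps.add(path)
                break
    return deps
```

Note that step 1 only touches the file being built, while step 2 only needs the listing of file names in the repository, never their contents. Imports with no matching file (such as the standard library) simply resolve to nothing here.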

Since they're proportional to different things, the amount of time spent on each step depends on what is being built. But the output of both steps is very stable, so with the `pantsd` daemon running (the default), you pay a first-build cost that is approximately "listing all files in the repository, and parsing the ASTs of only the files you asked to build". After that, edits to files incur a small incremental cost to re-parse their ASTs and update imports. Because Pants implements "early cutoff"/cleaning (see https://github.com/pantsbuild/pants/blob/master/src/rust/eng...), if you haven't edited the import statements of a file, the dependencies of that file will not be invalidated.
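
As a rough illustration of the early-cutoff idea (a toy sketch, not the Pants engine, which is written in Rust): re-parse an edited file, fingerprint only its imports, and skip downstream work when the fingerprint is unchanged. The `DependencyCache` class and `imports_fingerprint` function are hypothetical names made up for this example.

```python
from __future__ import annotations

import ast
import hashlib


def imports_fingerprint(source: str) -> str:
    """Hash only the import statements of a file, ignoring all other code."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Prefix with dots so relative imports stay distinct.
            names.add("." * node.level + (node.module or ""))
    return hashlib.sha256("\n".join(sorted(names)).encode()).hexdigest()


class DependencyCache:
    """Tracks per-file import fingerprints to decide when to invalidate."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}
        self.recomputations = 0

    def on_edit(self, path: str, source: str) -> bool:
        """Return True if the file's dependencies must be recomputed."""
        fingerprint = imports_fingerprint(source)
        if self._fingerprints.get(path) == fingerprint:
            return False  # early cutoff: imports unchanged, nothing downstream is dirty
        self._fingerprints[path] = fingerprint
        self.recomputations += 1
        return True
```

Every edit still pays for one parse of the edited file, but an edit that leaves the imports alone stops there.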

> For comparison, users of Bazel sometimes use tools to generate/update the BUILD files when they modify a source file. You could view the BUILD file as a cache layer in that case.

Yeah. Because it isn't integrated, it's a slightly error-prone arrangement: you must always use wrapper tools (with their own daemons) to invoke Bazel.

How Pants accomplishes the second step above depends on the particular inference plugin that is in use: it's entirely possible to cache the indexing, but it would provide little benefit for Python due to the index only requiring file existence checks.
