I think this could be hugely useful to very large open source projects (like databases or operating systems) that may be intimidating for contributors to build and test.
Using Bazel (aka. Blaze) every day is one of the things that has made me dread ever leaving Google. Fast, reproducible builds are amazing. Once you have used this tool, it is very hard to go back. Personally, I'm thrilled that it has been open sourced.
Nice to see a bunch of projects that've been generalizable and heavily used internally finally see the light of the outside world. Now, to start evangelizing them.
A little too optimistic :) You can't build Android, Chrome, ChromeOS, iOS apps, etc. via blaze.
EDIT #1: I see support for building Objective-C apps is already present in Bazel. EDIT #2: Bazel uses Skylark, a Python-like language, which could be used to implement all sorts of extensions, including the one I was referring to.
For Java and C++ binaries, yes, assuming you do not change the toolchain. If you have build steps that involve custom recipes (eg. executing binaries through a shell script inside a rule), you will need to take some extra care:
Do not use dependencies that were not declared. Sandboxed execution (--spawn_strategy=sandboxed, only on Linux) can help find undeclared dependencies.
Avoid storing timestamps in generated files. ZIP files and other archives are especially prone to this.
Avoid connecting to the network. Sandboxed execution can help here too.
Avoid processes that use random numbers, in particular, dictionary traversal is randomized in many programming languages."
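To make the timestamp and traversal-order points concrete, here is a minimal Python sketch of an archive step whose output depends only on file names and contents. It is not how Bazel packs archives; `build_deterministic_zip` and `FIXED_DATE_TIME` are names made up for illustration, and pinning `create_system` and the permission bits is my own assumption about what else tends to vary between machines.

```python
import io
import zipfile

# Fixed timestamp for every entry (1980-01-01 is the earliest date ZIP can store).
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_deterministic_zip(files):
    """Return ZIP bytes that depend only on the names and contents in `files`.

    `files` maps archive member names to bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sort the names so the result does not depend on dict iteration order.
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_STORED  # store bytes, no compressor
            info.create_system = 3                   # pin the "created on" field
            info.external_attr = 0o644 << 16         # pin the permission bits
            archive.writestr(info, files[name])
    return buffer.getvalue()
```

Calling it twice with the same files, in any insertion order, should give byte-identical archives, which is what a build cache needs in order to hash outputs reliably.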
5 or 6 years ago I had to have Windows to run CAD software, but I found it easier to have a virtualbox install w/ Ubuntu in it for software development than trying to write code on Windows. The performance was good enough and the usability was pretty good. I imagine it has only gotten better since then.
Not having a Unix-specific build system work on Windows seems to be pretty much the expected behaviour. As opposed to a firewall that runs Linux internally requiring Windows or OS X to talk to it...
If you've got the expertise to add it, sounds like it would be welcomed.
The FAQ is pretty clear about their reasons. It talks about tools, not other dependencies, but I'm sure the reasoning is the same: "Your project never works in isolation... To guarantee builds are reproducible even when we upgrade our workstations, we at Google check most of these tools into version control, including the toolchains and Bazel itself."
It's a sensible policy and one I use myself. Do you have a better reason for disliking this policy than a knee-jerk "yuck?"
Some reasons are the bloat, the possibility of "accidental" forks when a non-upstream version is compiled and checked-in binary-only, crufty old versions hanging around, and security problems. It adds extra work for downstream packagers having to pick it apart for distros.
Bundling gets particularly bloaty for git repos, since the history is always included in each clone. For perforce or SVN it doesn't matter so much as you only get the latest version of everything. In git each time there's a dependency update, it will pretty much add the size of the new jar to the .git directory. Over time it's going to grow huge. If at a later date the repository owner decides on a new policy where the third party files are not bundled, then even removing the directory from the current head doesn't shrink the repo size.
There are binaries in there for Mac, Linux and Windows (.exe file at least). You either need one or the other, not all at the same time.
This sort of thing is fine for proprietary software used in a controlled environment, but for open source it looks kludgy.
An alternative could be to have a "dependencies" repository that would be shallow-cloned as needed. At least that way the source code repo only would have source in it, not jars or executables. It'd ensure separation was enforced and you could still track requirements per version or change the policy later.
Google has a legendarily awesome centralized version control system.
I thought it was just perforce.
It seems like a stricter, huge make-like harness (in fact it reminds me of the mozilla firefox python build system a bit).
It's not bad by any means, but it seems to me that it doesn't "magically" fix the "be reproducible" problem at all (which is what it seems to claim).
Am I missing something?
What Bazel does, however, is to make it possible to run build steps in a sandbox (although the current one is kinda leaky) so that your build is isolated from the environment and thus behaves in the same way on any computer. It also tracks dependencies correctly so that it knows when a specific action needs to be re-run.
This makes it possible to diagnose non-reproducible build steps easily. At Google, the hit rate of our distributed build cache usually floats around 99%, and this would be impossible without reproducible build steps.
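One simple way to picture "diagnosing a non-reproducible build step" is to run the step twice in clean directories and compare output hashes. The Python sketch below does only that; it is not Bazel's mechanism, and `output_digest`, `is_reproducible` and the two toy commands are hypothetical names for illustration.

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def output_digest(command, output_name):
    """Run `command` in a fresh temp directory and hash the file it writes."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(command, cwd=workdir, check=True)
        with open(os.path.join(workdir, output_name), "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()


def is_reproducible(command, output_name):
    """True if two independent runs produce byte-identical output."""
    first = output_digest(command, output_name)
    second = output_digest(command, output_name)
    return first == second


# Two toy "build steps": one stable, one that embeds random bytes in its output.
STABLE = [sys.executable, "-c",
          "import pathlib; pathlib.Path('out.txt').write_text('hello')"]
FLAKY = [sys.executable, "-c",
         "import os, pathlib; pathlib.Path('out.txt').write_text(os.urandom(16).hex())"]
```

A step that fails this check cannot be served from a shared cache safely, because the cached bytes would differ from what a local rebuild produces.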
https://wiki.debian.org/ReproducibleBuilds
Would Bazel help with the remaining long tail of packages in Debian?
[0]: https://github.com/google/bazel/tree/master/tools/cpp [1]: https://github.com/google/bazel/tree/master/src/java_tools/b...
If you run a script that outputs intermediate files, Bazel needs to know about that script's inputs and outputs. And it works better if it knows them ahead of time.
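A rough illustration of why knowing inputs and outputs ahead of time helps: with the declarations alone, a tool can work out the order of steps before running anything. This is a Python sketch under my own assumptions, not Bazel code; the `STEPS` table and the `plan` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical build steps: name -> (declared inputs, declared outputs).
STEPS = {
    "link": (["app.o"], ["app"]),
    "compile": (["gen.c", "main.c"], ["app.o"]),
    "generate": (["schema.txt"], ["gen.c"]),
}


def plan(steps):
    """Order steps so each one runs after the steps that produce its inputs."""
    # Map every declared output file to the step that produces it.
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # A step depends on whichever steps produce its inputs; plain source files
    # (inputs with no producer) add no dependency.
    graph = {
        name: {producer[path] for path in ins if path in producer}
        for name, (ins, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

If a script writes files nobody declared, nothing in this graph can tell which later step needs them, which is why undeclared intermediate outputs cause trouble.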
There are a handful of Blaze derivatives built by Xooglers. Pants and Buck come to mind. They also share the trait of using sandboxed Python to define a build configuration. I'll take it over make syntax any day!
Writing generators to run this way is kind of a pain, actually, sort of like writing code to run in a sandbox. Also, the generators themselves must be checked in, and often built from source. But we consider the results worth it.
It never explains any of this explicitly, but there are hints. [1], [2], [3].
[1] "Many rules also have additional attributes for rule-specific kinds of dependency, e.g. 'compiler'" -- http://bazel.io/docs/build-ref.html#types_of_dependencies
[2] http://bazel.io/docs/build-encyclopedia.html#cc_binary.hdrs_...
[3] "The build system runs tests in an isolated directory where only files listed as 'data' are available" -- http://bazel.io/docs/build-ref.html#data
Edit: A comment below seems to suggest that this is not the case: "Within Google we use a form of sandboxing to enforce that" (emphasis mine). -- https://news.ycombinator.com/item?id=9259147
Is Bazel developed fully in the open?
Unfortunately not. We have a significant amount of code
that is not open source; in terms of rules, only ~10% of
the rules are open source at this point. We did an
experiment where we marked all changes that crossed the
internal and external code bases over the course of a few
weeks, only to discover that a lot of our changes still
cross both code bases.
What they mean is that changes to the internal source of Blaze often involve changes to both the open sourced part, which is Bazel, and the closed parts, which are additional rules that are neither open sourced, nor included in Bazel (Blaze has about 5x as many rules as Bazel).
It's best to make atomic changes, so rather than split the changes, review and submit the open source changes externally, and the closed rules changes internally (which would complicate reviews, testing, syncing and rollbacks), then pull in the external changes, they submit these cross-code-base changes internally, then dump the change into the external repo. The next paragraph on that page makes it clear that the code is open, even if not all of the development process is.
To be clear, all of Bazel is open source and the source is available here: https://github.com/google/bazel
EDIT: I fully understand that this is a build tool for multiple languages. But its raison d'etre is speed. So I'm asking what techniques does Bazel use to accelerate builds and how do they differ from those used by sjavac, which is also designed to accelerate builds of huge projects?
Bazel also builds other languages, such as C++ and Objective-C.
We do invoke the Java compiler through a wrapper of our own. We think we can make that work as a daemon process to benefit from a hot JVM, but haven't gotten round to that.
>> "Gradle: Bazel configuration files are much more structured than Gradle's, letting Bazel understand exactly what each action does. This allows for more parallelism and better reproducibility"
The value of "more parallelism" depends on the complexity of your Java source code base. I can easily imagine why this extra structure can lead to more parallelism.
However, I am not buying "better reproducibility" without justification or explanation. I've had very reproducible Maven builds for years (and I don't see how Gradle would be different). So I would love to know which aspects are improved upon with this structure, if someone could expand or explain.
Finally, I'm very wary of "much more structure". The worst thing about Maven is its extreme insistence on structure and schema and very specific architecture of your build tasks and components. In contrast, with Gradle, you can freely shape your build scripts to reflect the "build architecture" of your source tree in a minimal, maintainable way. Furthermore, when your application's needs change, refactoring your build is far easier in Gradle, thanks to its internal-DSL style (the build script is code).
If the structure isn't "free", you pay for structure with reduced build script development speed. For Google, it's a tradeoff worth having with that massive source tree.
We've put a bunch of work into making sure that we know about every file that goes into the Java compilation, and if any of them changes (and only then) do we recompile. Within Google, we use a form of sandboxing to enforce that.
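For readers wondering how a sandbox can enforce "we know about every file", here is a toy Python sketch: copy only the declared inputs into a scratch directory and run the action there, so a read of an undeclared file fails loudly. This is an assumption-laden illustration, not the sandbox used inside Google or in Bazel, and it gives no real isolation (an action could still open absolute paths). `run_in_sandbox` and `concat_action` are hypothetical names.

```python
import os
import shutil
import tempfile


def run_in_sandbox(source_dir, declared_inputs, action):
    """Copy only `declared_inputs` into a temp dir, then call `action(sandbox_dir)`."""
    with tempfile.TemporaryDirectory() as sandbox:
        for rel_path in declared_inputs:
            dest = os.path.join(sandbox, rel_path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(source_dir, rel_path), dest)
        return action(sandbox)


def concat_action(sandbox):
    """Toy action that reads a.txt and b.txt relative to the sandbox."""
    parts = []
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(sandbox, name)) as f:
            parts.append(f.read())
    return "".join(parts)
```

If `b.txt` is left out of `declared_inputs`, `concat_action` raises `FileNotFoundError` instead of silently picking up a file the build system does not know about.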
You're also right that it isn't free - we have reason to believe that larger projects and larger teams will see benefits from using Bazel. Use your best judgement.
If you're interested, hanwen wrote a bunch of rules with semantics similar to the internal rules, see https://github.com/google/bazel/tree/master/base_workspace/e... .
It would be nice to make these semantics match the external ones better, but it requires us to open up more tooling, so people won't need to write BUILD files.
BTW, thanks for the release! Will have a fun time digging through this over the next few days. I heard some murmurs that Blaze was going to be open sourced from around the watercooler but didn't think it'd be so soon.
* If I have a Maven-based project with heavy reliance on pre-built jars from Maven Central, what's the recipe to port it to Bazel?
* Related, if I have multiple github repos, say a couple open source libraries and a couple private repos, what's a good recipe in conjunction with Bazel?
For multiple Github repos, use http://bazel.io/docs/build-encyclopedia.html#http_archive or http://bazel.io/docs/build-encyclopedia.html#new_http_archiv... (depending on if it's a Bazel repository or not). Let us know if you have any questions or issues!
A couple more questions :)
* Any pointers for adding Scala (sbt?) support? I'd start here: http://bazel.io/docs/skylark/rules.html.
* Suppose I develop using multiple repos and http_archive. I'd like to make changes both to a library and to a project that depends on it simultaneously, without committing the library patches to master github repo just yet. Is there a way to configure the http_archive, let's say by saying "bazel --mode=local", and have it customize the remote archive http to use a different url (say, my github's fork instead of the master github) for that build?
I just wish that I had a high-performance replacement for linking that was cross-platform (deterministic mode for ar), and for non-C/++ flows. Writing a deterministic ar is about 20 lines of C-code, but then I have to bake that into the tool in awkward ways. For generalized flows, I've looked at fabricate.py as a ccache replacement, but the overhead of spinning up the Python VM always nukes performance.
Do you have some kind of way to verify that your makefile dependencies conform to your source dependencies? Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?
Have you considered tup[0]? Or djb-redo[1]? Both seem infinitely better than Make if you are paranoid. tup even claims to work on Windows, although I have no idea how they do that (or what the slowdown is like). Personally, I'm in the old Unix camp of many-small-executables, none of which goes over 1M statically linked (modern "small"), so it's rarely more than 3 secs to rebuild an executable from scratch.
> (deterministic mode for ar)
Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?
Nope. I explicitly use a conservative approximation—this guarantees correctness, over speed. Building everything every time with a clean tree is where I begin; I start optimizing after that.
> Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?
Self-rewriting Makefiles (to consume the .d files), combined with the cleaning necessary for them, become a large technical debt, especially given the complexity of the Makefile needed to generate them. Modern CCen just aren't capable of this. Perhaps Doug Gregor's module system will land in C21/C++21, and we'll see some good, then.
> Have you considered tup[0]? Or djb-redo[1]?
Yes. Neither provides significantly better correctness guarantees combined with sufficiently better performance to justify the cost of porting to older Unixen. (This is a consensus opinion at my shop; I, personally, enjoy tup.)
> Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?
Determinism lets me cache *.o/a/so/dylib/exe/whatnot without getting false-positives due to time-stamp changes and owner/group permissions in the obj/ar files (see ar(1)). ld is deterministic under all the CCen I use by setting the moral-equivalent of -frandom-seed.
1. Binaries are checked in to source 2. It's more structured than Gradle 3. It's for very large code bases 4. It's *nix only
But...
1. We've already had the "chuck it in a lib directory" approach. The distributed approach (maven/ivy etc.) seems to be working for the millions of developers out there who just have to get through the end of the day without production going up in flames. I suppose it's like moving a portion of Maven Central into your code base. Checked in. Feels very odd, and kinda against one of the pillars of the JVM: Maven. Love it or hate it, it's one of the most mature build/repository types out there. npm, bower anyone?
2. Got to agree with astral303. This isn't really something to shout about. Better reproducibility? Gradle/SBT have had incremental builds for quite a while. We all know there's no silver bullet: if you don't declare your inputs and outputs to Gradle/Blaze tasks, or you seed with random values, then you're only going to get unreproducible builds.
3. Very large, I get that.
4. Very large code bases tend to be enterprise systems. Enterprise systems tend to have a plethora of platforms/OSs, so it being *nix only is a drawback. However, I suppose that if I were in charge of a 10MLOC code base then I could mandate *nix-only builds? However, in my experience they also tend to gravitate towards standards that seem to have longevity.
I'm yet to give it a go so I'll reserve final judgement. However, I will say that I do wonder how far we'd be if Google threw their brightest minds at Maven/Gradle/SBT etc. and worked with them to scale their builds. (Yes, I realise it's multi-lang; so is Gradle.) Perhaps the whole community would benefit from the performance gains.
Anyway, hats off, Google guys. It looks impressive and no doubt I'll be jumping all over it in 12 months. In the mean time I'm off to go read up on Angular 2.0, or Typescript or ES6 or ES7 or whatever else I need to know to get me through the day.
Really I'm just jealous I don't have a 10MLOC code base :D
The problem with maven and gradle is that their build actions/plugins can have unobservable side effects.
This approach is more 'pure functional'. You have rules which take inputs, run actions, produce outputs and memoize them. If inputs don't change, then you use memoized outputs and don't run the action.
As long as your actions produce observable side effects in the outputs (and don't produce side effects that are not part of the outputs but create state that something else depends on in some manner), then you can do a lot of optimizations on this graph.
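The "memoize on inputs" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Bazel's cache; `ActionCache` and `word_count` are hypothetical names, and a real system would also fold things like the command line and toolchain into the key.

```python
import hashlib


class ActionCache:
    """Memoize action outputs, keyed by the action name and its input contents."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0  # how many times an action actually ran

    def run(self, name, inputs, action):
        """`inputs` maps paths to bytes; `action` is a pure function of `inputs`."""
        hasher = hashlib.sha256(name.encode() + b"\0")
        for path in sorted(inputs):  # sorted so dict order cannot change the key
            hasher.update(path.encode() + b"\0")
            hasher.update(hashlib.sha256(inputs[path]).digest())
        key = hasher.hexdigest()
        if key not in self._outputs:
            # Cache miss: some input changed (or this is the first run).
            self.executions += 1
            self._outputs[key] = action(inputs)
        return self._outputs[key]


def word_count(inputs):
    """Toy action: total number of whitespace-separated words in all inputs."""
    return sum(len(data.split()) for data in inputs.values())
```

Running the same action again with unchanged inputs returns the stored output without executing it; changing a single byte of any input changes the key and forces a re-run. This only stays correct if the action has no hidden inputs or side effects, which is the point made above.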
In my experience with maven and gradle, they are way way slower, and that's on relatively small projects
I look forward to trying it out. The ObjectiveC rules sound interesting especially given the state of XCode which is a laughable IDE.
So they started with the use cases likely to be the most popular.
Additionally, there are definitely cases where the implementations of rules at Google are a morass, and rather than dump it on the open source community, it makes more sense to clean them up when they get rebuilt.
Presumably it will also make open-sourcing internal projects easier. That can't be a bad thing :)
I read in the "Getting started":
> You can now create your own targets and compose them.
So does this mean it is a replacement for `make`? => Yes
Found the answer here: http://bazel.io/docs/FAQ.html
> Users interact with Bazel on a higher level. For example, it has built-in rules for "Java test", "C++ binary", and notions such as "target platform" and "host platform". The rules have been battle tested to be foolproof.
But does it give the optional custom level of control that, for example, CMake + Ninja provide? Or is it only high-level rules?
You can [at least internally] define custom rules to handle pretty much anything, in almost-but-not-quite-python.
I know it as Blaze, which Bazel is an anagram of. Many files in the source have references to Blaze.
Multi-language support: Bazel supports Java, Objective-C and C++ out of the box, and can be extended to support arbitrary programming languages.
c'mon, not even the Go language from Google itself ?
When I first saw the headline I thought they'd open-sourced it.
What does the convergence among these projects, if any, look like longevity-wise?