Is this for cross-language doc generation? Refactoring tools? Something else?
Are there any concrete examples of a tool built on top of this that would otherwise be impossible / very difficult?
Google uses this approach internally to generate cross-references for a huge, heterogeneous multi-language codebase. Linking across generated code, connecting documentation to its references, and exposing all those features in editors, code browsers, code-review tools, and so forth, are all a lot easier when that information has a common representation.
And of course, those problems exist even in much smaller codebases. Kythe isn't really a "product", but rather an interlanguage for tools that manipulate source code.
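To make the "common representation" idea concrete, here is a toy sketch. It is not Kythe's actual schema or API: the `Node` fields, the `"ref"` edge kind, and the file names are all made up for illustration. The point is only that once every language's indexer emits the same kind of nodes and edges, a "find references" query no longer cares which language a reference came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical language-agnostic identifier for a code entity."""
    language: str
    path: str
    signature: str


class Graph:
    """Toy cross-reference store: edges are (source, kind, target) triples."""

    def __init__(self):
        self._incoming = {}  # (target, kind) -> set of source nodes

    def add_edge(self, source, kind, target):
        self._incoming.setdefault((target, kind), set()).add(source)

    def references_to(self, target):
        # Same query regardless of which language produced the edges.
        sources = self._incoming.get((target, "ref"), set())
        return sorted(sources, key=lambda n: (n.language, n.path, n.signature))


# A definition in one language, referenced from two others
# (for example, via generated code). All names are illustrative.
user_msg = Node("protobuf", "api/user.proto", "message User")
java_use = Node("java", "src/UserService.java", "UserService.load")
go_use = Node("go", "cmd/server/main.go", "main.handle")

graph = Graph()
graph.add_edge(java_use, "ref", user_msg)
graph.add_edge(go_use, "ref", user_msg)

refs = graph.references_to(user_msg)
for ref in refs:
    print(f"{ref.language}: {ref.path} ({ref.signature})")
```

An editor plugin, a code browser, and a code-review tool could all call something like `references_to` without knowing anything about Java or Go specifically.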
That's a concise explanation. Thanks.
Of course, the bottleneck is always in achieving widespread integration with existing tooling. Your overview lists requirements for compiler and build system instrumentation alike, as well as tools that then consume and filter the graph data. It'll be interesting to see if Kythe gains the needed mindshare for this.
Shared IR (graph schema) and optimizers. Pluggable front ends (syntax viewers, editors, etc.) and backends (languages).
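A rough sketch of that plug-in shape, under heavy assumptions: the `INDEXERS` registry, the `(path, kind, name)` fact tuples, and the line-based "parsers" below are invented for illustration and have nothing to do with how Kythe's real indexers work. Each language plugs in its own indexer, and every consumer reads the same facts.

```python
from typing import Callable, Dict, List, Tuple

# Shared schema (hypothetical): a fact is (file path, edge kind, entity name).
Fact = Tuple[str, str, str]

INDEXERS: Dict[str, Callable[[str, str], List[Fact]]] = {}


def indexer(language: str):
    """Register a per-language indexer under the shared schema."""
    def register(fn):
        INDEXERS[language] = fn
        return fn
    return register


def _definitions_with_keyword(path: str, source: str, keyword: str) -> List[Fact]:
    # Toy stand-in for a real compiler front end: scan for "<keyword> name(".
    facts = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(keyword + " "):
            name = stripped[len(keyword) + 1:].split("(")[0]
            facts.append((path, "defines", name))
    return facts


@indexer("python")
def index_python(path: str, source: str) -> List[Fact]:
    return _definitions_with_keyword(path, source, "def")


@indexer("go")
def index_go(path: str, source: str) -> List[Fact]:
    return _definitions_with_keyword(path, source, "func")


def list_definitions(facts: List[Fact]) -> List[str]:
    """A consumer (editor, viewer, etc.) that only knows the shared schema."""
    return sorted(name for _, kind, name in facts if kind == "defines")


inputs = [
    ("python", "svc/load.py", "def load(user_id):\n    return user_id\n"),
    ("go", "cmd/main.go", "func Handle(w Writer) {\n}\n"),
]

facts: List[Fact] = []
for language, path, source in inputs:
    facts.extend(INDEXERS[language](path, source))

definitions = list_definitions(facts)
print(definitions)
```

Adding a language means registering one more indexer; none of the consumers have to change.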
For those interested in code analysis and dev tools, another library you might want to check out is srclib (I'm one of the authors). srclib is an open-source polyglot code analysis library designed for editors and code explorers. Its mission, providing a common language-independent schema for building better language-aware tools, is closely aligned with Kythe's. There's documentation and a succinct description of the problem we're trying to solve at https://srclib.org.
srclib currently supports Go, Java, Python, JavaScript, Ruby, Haskell, and soon PHP. There's a simple command line API that editor plugins can call, and currently there are srclib plugins for Emacs, Sublime, and Atom. srclib also powers https://sourcegraph.com.
I'm looking forward to seeing where Kythe goes and hopefully integrating Kythe and srclib. I think this is a huge step forward toward better tools for programmers. Just ask anyone who works/used to work at Google about the quality of their internal dev tools vs. the outside world. Thanks to the Kythe team for sharing this with the world!
If there's a lot of custom logic, then it might be better to write an ad hoc tool that checks the AST of the Java against the AST of the C#. srclib and, I think, Kythe as well are designed for building tools that want to be language-agnostic, rather than digging into specific language behavior.
I think an interesting possible application of this tool would be source-to-source compilation between languages. For example, once Objective-C support is added, could Kythe be the basis for something like j2objc?
"Kythe
A pluggable, (mostly) language-agnostic ecosystem for building tools that work with code."
For me, this is the best of all the listed "overviews", albeit still not fully clear.
The big features I'd like to see are more around collaboration and remote execution. The ability to share, search, and remotely debug a big stack easily would be great. GitHub has taken some big steps forward on that, but I'd love to wrap that up into the editor. I'm thinking of use cases like natively connecting to a coworker's editor to see what is failing or to review some code.
Is there search built in or planned? I see some discussion in the storage format section, but only as a negative statement.