How GitHub Uses GitHub to Document GitHub (opens in new tab)

(github.com)

256 pointstmm111y ago53 comments

53 comments

42 comments · 9 top-level

conorgil14511y ago· 16 in thread

I personally found this write up extremely interesting and exciting.

I have always been interested in documentation and its order in the priority list of tasks which a development team has to tackle. It is not an original observation that documentation is critically important to the success of a project/code-base and yet it is often the last artifact produced (and many skip it altogether). I have recently been extremely interested in the idea that documentation should be moved to the top of the priority list and, rather than being a duplicative post-processing step, should be the "ground-truth" for generating lots of the follow on artifacts. For example, write API documentation first and use that to generate client side libraries, an API test suite, and server boiler plate code/skeleton.

In my search for existing projects and approaches, I came across many interesting things.

Swagger: https://helloreverb.com/developers/swagger

API Doc: http://apidoc.me/doc/gettingStarted

Slate: https://github.com/tripit/slate

Write the Docs: http://docs.writethedocs.org/

It was very interesting to read this GitHub post because they presented yet another approach to treating documentation as a first class citizen with different methods to write docs, host docs, and keep the docs updated.

I recently updated the API docs at my workplace to use the Slate tool I referenced above. We manually write docs in a Markdown file, manually use Slate to compile the MD file into HTML, and then manually deploy it to our host. This is approach is incredibly basic and non-scalable, but is light years better than what we had previously, which was API docs directly in the repo's README file.

I hope to learn more about the projects listed above (and many others!) as I explore different approaches for treating docs as a first class citizen and pick the approach which meets the requirements of my current team.

[EDIT] I am also anxiously awaiting a beta invite for http://readthedocs.com

lapfi11y ago

I always thought it would be a good idea to generate API documentation from tests. Kind of the same idea you have but the other way around.

The problem I used to have is that if you write your documentation by hand it tends to get out of sync with the code. You make a quick change to the code and forget to update the docs. After a while it's a mess unless you stay vigilant.

But if you generate documentation from tests they can't get out of sync. The example output that gets written to the docs comes from the application so it can't be wrong. And if a test fails, documentation doesn't get written. It also forces you to write tests which is a good thing. If you don't, you don't have docs.

I don't know if there are tools like this. I created one for Ruby/Rack apps that I have used in some of my projects. I think this approach works pretty well.

gjtorikian11y ago

I wholeheartedly agree with everything above. Two additional and related thoughts:

1. The original documentation tool I wrote for Atom, Biscotto, kept track of the number of undocumented classes / methods: https://github.com/gjtorikian/biscotto/blob/59f48ba2621a92ae... . It was hooked up to CI, and if the count fell below a certain threshold, the test failed.

2. Right now, we're exploring into working with JSON schema as a means of providing both testing validation and accurate documentation. If the schema says "This REST method expects a parameter of this type," it becomes very easy to write a test to enforce that behavior; documentation can be easily generated from it; and of course your production code is safer for it.

I'm a huge fan of introducing more cross-overs between testing and documentation. I think a lot of time is spent on "clever" (and subjective) validations like http://www.hemingwayapp.com/, but not enough time is spent on basic content checks. It's very easy to drift code, tests, and docs apart. We need to start thinking about all three of them working together.

1 more reply

conorgil14511y ago

That is an interesting idea.

I do like the idea of starting by writing docs because they are very lightweight and it forces you to think through the design of the API. For example, "the params on that route don't look correct" or "this is really difficult to explain, so I think we just over complicated this end point. Let's make it simpler".

One could make the TDD argument that writing tests first would accomplish the same thing, but I think tests are much heavier than docs because certain routes could have dependencies and other similar gotchas. For example, a route may require authentication so you either need to have a fake auth token in the test DB or you need to have your test first obtain an auth token via the API and then hit the end point you really want to test.

If you start by writing docs in something like Swagger, then it should be straight forward to generate API tests from those docs (given, you will still have some gotchas like in the authentication scenario, but if you are generating the tests, then you solve that issue once in your test generator and then the boiler plate is created for you). For example, if you add a required param to a route, you first update your docs to define this new require param and then re-generate the automated test suite. A generated test suite like that could be used for TDD purposes to develop against.

Can you expand on your idea of generating docs from tests? Your tool sounds very interesting. Is it open source for me to look at? If not, how did you have to format/structure your tests so that you knew what information to include in the docs? How did the generation work?

1 more reply

epidemian11y ago

RSpec does something along that vein with the documentation on https://relishapp.com/rspec. E.g., here's the documentation for predicate matchers: https://relishapp.com/rspec/rspec-expectations/v/3-1/docs/bu..., and here's the executable Cucumber file from which that page is generated: https://github.com/rspec/rspec-expectations/blob/master/feat....

Maybe an unnecessarily meta example, given that it's the executable documentation/test suite for a testing framework itself. But it's wicked cool!

pc8611y ago

To your knowledge, is there a tool like this for the C# and/or .NET ecosystem?

forsaken11y ago

Another option is Read the Docs, which is how we host the Write the Docs docs: http://docs.readthedocs.org/en/latest/ -- It's basically the above workflow that GitHub describes, but with Sphinx (http://sphinx-doc.org/) -- which is a mature and powerful documentation system.

conorgil14511y ago

I forgot to mention that, but it is already on my list of things to investigate. I came across readthedocs.com in the last few months (probably on HN) and sent a passionate email requesting access to the private beta. I haven't been selected yet, but I am anxiously waiting to try it out.

How would you say that readthedocs.com differs from the tool mentioned in a sibling comment (Readme: https://readme.io/)?

crdoconnor11y ago

>I have always been interested in documentation and its order in the priority list of tasks which a development team has to tackle. It is not an original observation that documentation is critically important to the success of a project/code-base and yet it is often the last artifact produced (and many skip it altogether). I have recently been extremely interested in the idea that documentation should be moved to the top of the priority list and, rather than being a duplicative post-processing step, should be the "ground-truth" for generating lots of the follow on artifacts. For example, write API documentation first and use that to generate client side libraries, an API test suite, and server boiler plate code/skeleton.

I wish people would stop viewing documentation as an additional formal step needed to 'package' the software (a nasty chore) and instead viewed it as an exercise in communicating as clearly and concisely as possible what the project is about and how it works.

The number one thing on my wishlist for any project's documentation is simply a glossary, which is trivial to create but is almost never done. Virtually every project has special terms for artefacts, usage, features, etc. which are burned into developers' skulls so deeply that they often forget that outsiders do not use this special terminology.

It gets worse when multiple terms refer to the same thing and the same terms refer to multiple (often ever so slightly different) things.

No fancy technology needed to write one of these things either. A simple text file will do.

The other thing that bugs me is how much documentation is simply describing code that should exist but doesn't. Scripts to build a project or deploy it, for instance. Or mindless test scripts.

conorgil14511y ago

> I wish people would stop viewing documentation as an additional formal step needed to 'package' the software (a nasty chore) and instead viewed it as an exercise in communicating as clearly and concisely as possible what the project is about and how it works.

Yes 1000x. One of the first things I do when evaluating a new project is read the docs and tutorials.

A glossary for each project would also be super useful. I have also run into the issues that you outline (one term to many definitions and vice versa). Do you think the glossary would help solve that by forcing folks to rename things to have clearer terms (if one is already used in the glossary, for example) or do you think that it is just a barrier for some developers to even create the glossary in the first place (out of sight, out of mind mentality).

Would you keep the glossary in git (or similar) so that it versions along with the code? It could be a required part of a code review to make sure that reasonable terms and/or acronyms introduced in a PR are added to the glossary.

NhanH11y ago

Isn't what you describe (documentation first, then code) basically a poor man version of literate programming ;) ?

conorgil14511y ago

I wasn't familiar with the term, but a quick search reveals Literate Programming on Wikipedia [1]. I assume that is what you were referencing. I skimmed the summary of the article and it sounds interesting (and similar to what I'm describing).

[1] https://en.wikipedia.org/wiki/Literate_programming

codezero11y ago

You may also want to check out Readme: https://readme.io/

conorgil14511y ago

Thanks for suggesting another tool to look into! I only glanced at their homepage, but it looks like they may be using Swagger under the hood because some visual elements are similar and the functionality is also similar. It does appear to have many of the features that I am looking for, so it definitely warrants further investigation.

2 more replies

zeroDivisible11y ago

If you like swagger, I'd also check RAML - http://raml.org/

I found it recently, but it looks quite interesting.

conorgil14511y ago

Thanks for suggesting this. I glanced over the home page and I am definitely adding it to my list of projects to explore. It claims to support many of the features that I am looking for and it will be fun to kick the tires!

rpgmaker11y ago

Whatever happened to that "git-based" document format Linus said he was going to work on?

jondot11y ago· 6 in thread

I'm planning to build a stack for internal company domain knowledge, and I've been thinking about middleman (http://middlemanapp.com) instead of Jekyll.

Middleman has impressive workflows and markdown processing (I'm guessing parallel to that of the Github/Jekyll solution or better). Also conrefs can be implemented by simple partials (which makes less contention for the probably huge conref file)

Though I have to be convinced by trying the Github/Jekyll stack, this does open my mind regarding Jekyll 2.0. I'm happy to see Github tell us their Jekyll story :)

technoweenie11y ago

I think the main take-away with this post is that static sites for docs are awesome. GitHub uses [nanoc](http://nanoc.ws/) for the [API documentation](https://developer.github.com/).

The only downside of hosting static text on GitHub Pages without Jekyll is that you have to push the generated HTML too.

gjtorikian11y ago

> which makes less contention for the probably huge conref file

We're planning on splitting up the conref files by section. So for example, we'd get a separate conref file for Pages, one for UI stuff, one for Enterprise, etc. My only compliant with partials is that it's one piece of content per file, but it's a trade-off vs one file with several conrefs (gotta CTRL-F for that text you want to change).

conorgil14511y ago

Ironcially, just earlier today my co-worker and I were also discussing plans to setup a domain knowledge service internally. He came across a tool called Raneto [1] which looks very promising, but we have not had time to play with it extensively yet. Perhaps it could be useful for your use-cases too.

[1] http://raneto.com/

mtmail11y ago

you probably mean https://middlemanapp.com/

jondot11y ago

Much thanks, fixed :)

nahname11y ago

Middleman has almost no automated testing suite. Important to note if you are going to invest in something.

Animats11y ago· 3 in thread

Github's convention that web pages for a project are in a different branch of the same project is kind of strange.

Also, those things they call "conrefs" are just "macros".

masklinn11y ago

> Github's convention that web pages for a project are in a different branch of the same project is kind of strange.

It does have the advantage of 0-configuration 0-conflict.

But because it namespaces through the branch, if you're using the repository for something other than just the pages you can't have the gh-pages simply follow/trail master unless you want a bunch of site crap at the root of your repository, and interacting with both code and documentation at the same time is more painful than it needs be.

> Also, those things they call "conrefs" are just "macros".

Macros have a wider implied range of behavior, possibly completely arbitrary.

A content reference attribute is just a placeholder or a very small textual include[0] (XML calls them "named entities", rST calls them "substitutions")

[0] usually not of a complete document

gjtorikian11y ago

I think a macro implies something that can be executed, and (rightly) ought to cause security-minded folks to double-take.

Conref isn't something we invented, it's straight out of DITA: http://dita.xml.org/arch-conref

snogglethorpe11y ago

> I think a macro implies something that can be executed

That isn't true... Traditionally a macro just refers to a substitution, maybe (but not necessarily) with parameter replacement, rescanning, etc. I'd say that lisp-style macros which can execute arbitrary code are actually rather rare historically....

bcRIPster11y ago· 3 in thread

Yo dawg!

bcRIPster11y ago

Awe, negative points? Really? You know you were thinking it when you saw the link title.

scrollaway11y ago

If everyone was already thinking it, what makes you think they want to see it in the comments?

1 more reply

stefs11y ago

yes, but it's still not in any way adding to the discussion.

1 more reply

SEJeff11y ago· 2 in thread

This title alone is gitception!

myared11y ago

If you like this, also see the talk titled "How Github uses Github to build Github".

http://zachholman.com/talk/how-github-uses-github-to-build-g...

bcRIPster11y ago

Welcome to the downmod train brother. Seems the two of us are being persecuted for trying to add some levity amongst the Hipsters.

afarrell11y ago· 1 in thread

I'm curious, how they write internal-facing documentation and how that effects the development experience for new github engineers.

Source diving through open source libraries, I've often wished for a "spelunker's guide": a text file laying out where things were and what I should read first to build a mental model I could use in understanding the rest of the source. I'm currently trying to figure out what the best way is for someone to write a spelunker's guide, especially if they've forgotten what it's like to be a beginner.

gjtorikian11y ago

> I'm curious, how they write internal-facing documentation and how that effects the development experience for new github engineers.

I wish I could show up a sample, but I can't, because it's internal. ;)

Honestly, I think a lot of the engineering documentation started organically. When you have a small team working on a feature, it's difficult to scale explanations to the rest of the company. One day someone sits down and starts writing all their thoughts out in Markdown, and just checks it into a docs folder. That's it. It's easy-to-read, short on code, and usually full of ASCII, like this: https://i.imgur.com/KTbyhyq.png

Writing documentation is the best way to get outside contributors involved with minimal investment on your part. It also forces you to try and explain what you've built.

If you can't pretend to go back and look at things like a beginner, grab someone unfamiliar with the project, and have them describe to you what they would expect, and how they think they should proceed. They may be able to provide you with insights on what needs to be described.

waldir11y ago· 1 in thread

Unfortunately the repository (as suggested by the screenshot[1]) seems to be private: https://github.com/github/help-docs

I assume that's because they may be documenting upcoming features before they are announced.

1. https://cloud.githubusercontent.com/assets/64050/5449088/7ad...

fidz11y ago

I wonder if they host everything on their own site (Github.com) or their own Github Enterprise site, which is inaccessible from outside network.

bostonvaulter211y ago· 1 in thread

Isn't the three second load page they list on the slow side?

gjtorikian11y ago

I wasn't super thrilled with it either, until I dug in and discovered that it's the global average. We have a ton of international and mobile traffic, which factors into this sum:

1. United States (average load: 1.97 s) 2. United Kingdom (average load: 2.29 s) 3. India (average load: 7.48 s) 4. China (average load: 12.11 s)

Average load of these countries is 6 seconds, which seems absolutely horrid...until I tell you that the US has about seven times more traffic than the UK.

I didn't want to fudge the graph and take out those slow outliers--the truth's the truth.

forrestthewoods11y ago

GitHub Pages is one of the most shocking hacks I've ever come across. Not the worst mind you, just the most shocking. Most of GitHub is clean and good. But making a magic gh-pages branches is simply horrific. I'm still somewhat dumb founded that's the best method they could come up with.

j / k navigate · click thread line to collapse

53 comments

42 comments · 9 top-level

conorgil14511y ago· 16 in thread

I personally found this write up extremely interesting and exciting.

In my search for existing projects and approaches, I came across many interesting things.

Swagger: https://helloreverb.com/developers/swagger

API Doc: http://apidoc.me/doc/gettingStarted

Slate: https://github.com/tripit/slate

Write the Docs: http://docs.writethedocs.org/

[EDIT] I am also anxiously awaiting a beta invite for http://readthedocs.com

lapfi11y ago

I always thought it would be a good idea to generate API documentation from tests. Kind of the same idea you have but the other way around.

I don't know if there are tools like this. I created one for Ruby/Rack apps that I have used in some of my projects. I think this approach works pretty well.

gjtorikian11y ago

I wholeheartedly agree with everything above. Two additional and related thoughts:

1 more reply

conorgil14511y ago

That is an interesting idea.

1 more reply

epidemian11y ago

Maybe an unnecessarily meta example, given that it's the executable documentation/test suite for a testing framework itself. But it's wicked cool!

pc8611y ago

To your knowledge, is there a tool like this for the C# and/or .NET ecosystem?

forsaken11y ago

conorgil14511y ago

How would you say that readthedocs.com differs from the tool mentioned in a sibling comment (Readme: https://readme.io/)?

crdoconnor11y ago

It gets worse when multiple terms refer to the same thing and the same terms refer to multiple (often ever so slightly different) things.

No fancy technology needed to write one of these things either. A simple text file will do.

The other thing that bugs me is how much documentation is simply describing code that should exist but doesn't. Scripts to build a project or deploy it, for instance. Or mindless test scripts.

conorgil14511y ago

Yes 1000x. One of the first things I do when evaluating a new project is read the docs and tutorials.

NhanH11y ago

Isn't what you describe (documentation first, then code) basically a poor man version of literate programming ;) ?

conorgil14511y ago

[1] https://en.wikipedia.org/wiki/Literate_programming

codezero11y ago

You may also want to check out Readme: https://readme.io/

conorgil14511y ago

2 more replies

zeroDivisible11y ago

If you like swagger, I'd also check RAML - http://raml.org/

I found it recently, but it looks quite interesting.

conorgil14511y ago

rpgmaker11y ago

Whatever happened to that "git-based" document format Linus said he was going to work on?

jondot11y ago· 6 in thread

I'm planning to build a stack for internal company domain knowledge, and I've been thinking about middleman (http://middlemanapp.com) instead of Jekyll.

Though I have to be convinced by trying the Github/Jekyll stack, this does open my mind regarding Jekyll 2.0. I'm happy to see Github tell us their Jekyll story :)

technoweenie11y ago

I think the main take-away with this post is that static sites for docs are awesome. GitHub uses [nanoc](http://nanoc.ws/) for the [API documentation](https://developer.github.com/).

The only downside of hosting static text on GitHub Pages without Jekyll is that you have to push the generated HTML too.

gjtorikian11y ago

> which makes less contention for the probably huge conref file

conorgil14511y ago

[1] http://raneto.com/

mtmail11y ago

you probably mean https://middlemanapp.com/

jondot11y ago

Much thanks, fixed :)

nahname11y ago

Middleman has almost no automated testing suite. Important to note if you are going to invest in something.

Animats11y ago· 3 in thread

Github's convention that web pages for a project are in a different branch of the same project is kind of strange.

Also, those things they call "conrefs" are just "macros".

masklinn11y ago

> Github's convention that web pages for a project are in a different branch of the same project is kind of strange.

It does have the advantage of 0-configuration 0-conflict.

> Also, those things they call "conrefs" are just "macros".

Macros have a wider implied range of behavior, possibly completely arbitrary.

A content reference attribute is just a placeholder or a very small textual include[0] (XML calls them "named entities", rST calls them "substitutions")

[0] usually not of a complete document

gjtorikian11y ago

I think a macro implies something that can be executed, and (rightly) ought to cause security-minded folks to double-take.

Conref isn't something we invented, it's straight out of DITA: http://dita.xml.org/arch-conref

snogglethorpe11y ago

> I think a macro implies something that can be executed

bcRIPster11y ago· 3 in thread

Yo dawg!

bcRIPster11y ago

Awe, negative points? Really? You know you were thinking it when you saw the link title.

scrollaway11y ago

If everyone was already thinking it, what makes you think they want to see it in the comments?

1 more reply

stefs11y ago

yes, but it's still not in any way adding to the discussion.

1 more reply

SEJeff11y ago· 2 in thread

This title alone is gitception!

myared11y ago

If you like this, also see the talk titled "How Github uses Github to build Github".

http://zachholman.com/talk/how-github-uses-github-to-build-g...

bcRIPster11y ago

Welcome to the downmod train brother. Seems the two of us are being persecuted for trying to add some levity amongst the Hipsters.

afarrell11y ago· 1 in thread

I'm curious, how they write internal-facing documentation and how that effects the development experience for new github engineers.

gjtorikian11y ago

> I'm curious, how they write internal-facing documentation and how that effects the development experience for new github engineers.

I wish I could show up a sample, but I can't, because it's internal. ;)

Writing documentation is the best way to get outside contributors involved with minimal investment on your part. It also forces you to try and explain what you've built.

waldir11y ago· 1 in thread

Unfortunately the repository (as suggested by the screenshot[1]) seems to be private: https://github.com/github/help-docs

I assume that's because they may be documenting upcoming features before they are announced.

1. https://cloud.githubusercontent.com/assets/64050/5449088/7ad...

fidz11y ago

I wonder if they host everything on their own site (Github.com) or their own Github Enterprise site, which is inaccessible from outside network.

bostonvaulter211y ago· 1 in thread

Isn't the three second load page they list on the slow side?

gjtorikian11y ago

I wasn't super thrilled with it either, until I dug in and discovered that it's the global average. We have a ton of international and mobile traffic, which factors into this sum:

1. United States (average load: 1.97 s) 2. United Kingdom (average load: 2.29 s) 3. India (average load: 7.48 s) 4. China (average load: 12.11 s)

Average load of these countries is 6 seconds, which seems absolutely horrid...until I tell you that the US has about seven times more traffic than the UK.

I didn't want to fudge the graph and take out those slow outliers--the truth's the truth.

forrestthewoods11y ago

j / k navigate · click thread line to collapse