Docs as code (2017) (opens in new tab)

(writethedocs.org)

111 pointssumnole1y ago99 comments

99 comments

78 comments · 27 top-level

simonw1y ago· 13 in thread

A subset of this idea is a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm talking about API documentation here - for both code-level APIs (how to use these functions and classes) as well as HTTP/JSON/GRPC/etc APIs that the codebase exposes to others.

If you keep the documentation in the same repo as the code you get so many benefits for free:

1. Automatic revision control. If you need to see documentation for a previous version it's right there in the repo history, visible under the release tag.

2. Documentation as part of code review: if a PR updates code but forgets to update the accompanying documentation you can catch that at review time.

3. You can run documentation unit tests - automated tests that check that the documentation at least mentions specific pieces of the code (discovered via introspection). I wrote about that a few years ago and it's been working great for me: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

4. Most important: your documentation can earn trust. Most documentation is out of date and everyone knows that, which means people default to not trusting documentation. If anyone who looks at the commit log can see that the documentation is being actively maintained alongside the code it documents they are far more likely to learn to trust it.

The exception to this rule for me is user-facing documentation describing how end users should use the features provided by the software. I'd ideally love to keep this in the repo too, but there are rational reasons not to - it might be maintained by the customer support team who may want to work in more of a CMS environment, for example.

xorcist1y ago

Love your blog, but in this case I want to take a more nuanced, if not opposite, stance:

There are many things closely related to code, that shouldn't necessarily live in the same repository. First, we need a common understanding of what should live together in a repository. This is much like the discussion about mono vs. multi-repo. A good rule of thumb is that if it is branched together, it lives together.

Effective documentation is not only a strict API reference, and not something that can be generated from docstrings alone. It offers a high level overview to understand the problem being solved, the architecture of the software, and a general roadmap of how it is developed. Effective documentation should cover both backwards and forwards revisions and how those migrations should be handled.

But this is also true on a reference level. Reading the documentation of a specific function I want to know if something relevant happens to this function in the next revision. There is nothing worse than checking out documentation for current production revision 34.5 and follow best practice there only to discover I should have checked out revision 34.6 instead because best practice changes there. Specific revisions should be documented, but documentation should not be limited to a specific revision.

There is a scale of how closely other artifacts follow code revisions: Tests is mostly branched with code, and should probably live together. Documentation can sometimes be branched with code, some should and some shouldn't live together with code. Deployment code and configuration management must be able to deploy old and new code from the same code base, and is even less likely to benefit from living with it. Then there's application state and test data which is something else entirely.

simonw1y ago

If the deployment code needs to be able to ship different versions, I would keep that deployment code in a separate repository - with its documentation bundled there.

The other form of documentation that I am passionate about is documentation that lives in issues, and then linked to from commit messages.

The great thing about issues and issue comments is that they have a clear timestamp attached to them, and there is no expectation that they will be kept up-to-date in the future.

This makes them the ideal place to keep documentation about how the code evolved overtime, and the design decisions that were made along the way.

1 more reply

NomDePlum1y ago

"Specific revisions should be documented, but documentation should not be limited to a specific revision."

It's unclear to me what this is trying to argue. So apologies if the below entirely misses your point.

Technical documentation that refers to a codebase should live and be maintained with it. Otherwise there will certainly be drift. It can obviously still happen but at least it is provable it shouldn't have.

Not maintaining accurate documents is like disabling tests because they don't pass. It's easy to do but not right.

A checked in codebase to me should be as current and correct as possible. That includes accurate documentation.

I've rarely seen documentation that isn't tied to the codebase being maintained/valued.

1 more reply

godelski1y ago

> a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm a big fan of this and treating documentation like a first class citizen.

There's also another benefit I think should be explicitly mentioned. It makes debugging, onboarding, and solving things much faster. We all know and have experienced the joke where you question who wrote this pile of garbage to find out that it was you all along. But at the core of this joke is the fact that we can't even remember what we ourselves did. So while things make sense at the time and might even seem obvious, that does not mean it'll continue to make sense nor that it'll be obvious to others. Especially to people who are onboarding into a new codebase.

Yes, documenting while you code takes "longer." But it only takes longer in the short run. It is much faster in the long run. The question you have to ask is if you're doing a sprint or a marathon. But then again there's very ill advised and self-contradictory advice on well known sites[0] and some companies perform back to back sprints. But I don't think people realize we're the ones creating our own messes. As anyone with anxiety will tell you, when you are rushing around it becomes easy to overlook small mistakes that will compound and only accumulate to make your anxiety worse than it was had you just slowed down in the first place. Creating a negative feedback loop where you only get more stressed to end up creating more problems than you solve.

There's times to move fast and break things, but if you don't also dedicate time to clean up your house will be filled with garbage and inhabited by a Lovecraftian entities made of spaghetti and duct tape.

[0] https://www.codecademy.com/resources/blog/what-is-a-sprint/

tivert1y ago

5. The documentation won't get lost in a botched wiki migration or something like that.

The documentation in the repo should not be restricted to relatively low-level stuff about APIs, it should also include design documents and cover the higher level concepts the developers use to make sense of the app and its APIs. I can't tell you how many times I've seen these concepts lost after the original developers move on, and then get violated in ways that make the app much harder to comprehend.

ranger_danger1y ago

The "documentation" for Lemmy consists merely of an auto-generated JavaScript library API dump with no real explanation for what most of the endpoints do (and are often named ambiguously) or how the general flow of things is supposed to work, or even how to do common things like find a user's comments or posts (would you have guessed they're both under "/user"? Because they sure don't tell you that). Especially if you don't know Javascript you're going to have a bad time trying to use that API. And the devs defend it if you tell them this, claiming "it defines everything perfectly, it's so easy."

One time my company purchased a $5k commercial license for x264 and were met with "the code is the documentation." That set us back literal weeks.

gofreddygo1y ago

  > the documentation for a codebase should live in the same repository as the codebase itself

This! 100%. Emphasis - codebase documentation. Not user guides.

After doing this a couple times, it's a no brainer. The benefits are significant, the effort minimal. Just add a docs dir at the project root and go to town.

The docs dir has some very interesting stuff - how to run parts of the api locally, tricks to make auth bearable for local development, commands that get new team members going at hyperspeed, what parts talk to what parts, which files are important for what flows, why some refactoring was attempted but abandoned, high level limitations and benchmarks, history on how some monstrosity came to be with some jokes sprinkled about.

everything just one cmd+shift+f away.

hitchstory1y ago

It works for user facing documentation too. There are actually pretty good reasons for this - e.g. you can use the test to autogenerate up to date screenshots with playwright to put in the documentation.

I'm pretty convinced that there should be a single source of truth for specifications, tests and documentation but I think the industry will take a while to catch up to this idea.

I built a testing library centered around this (same as my username) but it's hard to get people to stop writing unit tests :)

simonw1y ago

I actually built my own Playwright screenshotting software with this idea in mind too: https://shot-scraper.datasette.io/ - I wrote about using that for my project documentation here: https://simonwillison.net/2022/Oct/14/automating-screenshots...

Really it comes down to the team you are working with. If you have user-facing documentation authors who are happy with Markdown and Git you can probably get this to work.

1 more reply

strken1y ago

I would be very happy if as much developer documentation as possible was actually executed as part of the code.

For example, a diagram of how different services interact can go out of date. It would be better if there was a config file describing which services can be called, and this config file was used to generate firewall rules (for the case where dependencies on services are missing) and alert rules (for the case where unnecessary dependencies are never removed). Another example might be OpenAPI docs that you use to validate requests and responses.

I think that when you enforce a common source of truth behind both your docs and the functionality of your system, those docs can never become outdated. If you just shove docs into git without using them for anything they can easily rot away.

clumsysmurf1y ago

I have often wondered why Android's javadoc is so awful ... and thought, maybe precisely because its its embedded in such a large codebase it doesn't get updated for risk vs perceived benefit reasons (becuase of proximity of javadoc to code). Of course, it could be cultural or other things ... Perhaps the tooling sees changed sources, false positives for code changes, and there is a desire to eliminate this to help downstream consumers etc?

cedws1y ago

True. Confluence or whatever corp shitware is where technical documentation goes to die.

klysm1y ago

I’ll happily die on that hill with you

RandallBrown1y ago· 11 in thread

One of the main bullet points on the page is automated tests.

How do you write automated tests for documentation? Somehow require that blocks of code have documentation linked to them?

thesuperbigfrog1y ago

>> How do you write automated tests for documentation? Somehow require that blocks of code have documentation linked to them?

It could be tests to ensure documentation "builds" into all of the desired formats (e.g. web, pdf, ebooks, etc.) correctly.

Some programming languages have the idea of "documentation tests". In Rust, tests that are part of the documentation will run as part of the documentation build:

https://doc.rust-lang.org/rustdoc/write-documentation/docume...

Nzen1y ago

If we treat specifications written in gherkin syntax [0] as documentation, then the cucumber framework can match a line or stanza of gherkin to a test function [1].

I admit that, while I write instructions for how to test specific functionality in gherkin, our company would not countenance publishing a non-narrative description of the system's behavior to our client's employees.

[0] https://www.manning.com/books/writing-great-specifications

  Given a work order xx
  and xx isExpedite
  When an operator prints the jobcard
  Then expect a label in the footer that says Expedite

[1] https://cucumber.io/docs/cucumber/step-definitions/?lang=jav...

darknavi1y ago

Loads of things!

- Making sure example snippets still compile

- Checking if links are dead

- Check for standardized/proper formatting

Basically anything you'd want to enforce manually, try to enforce with CI.

0cf8612b2e1e1y ago

I believe Rust and Python have the ability to run tests defined in docstrings.

Spasnof1y ago

Yes there are certain libraries that can handle this. Essentially asserting that functions documented are valid / return the proper results.

See https://docs.python.org/3/library/doctest.html#module-doctes... as an example.

simonw1y ago

I wrote about my way of doing that here: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

Short version: have tests that use introspection (listing functions and classes in a module, iterating over JSON API endpoints in the codebase etc) and then run regular expressions against your documentation searching for relevant headings or other pre-determined structures.

tpoacher1y ago

In matlab, where functions are "always" defined in their own file, there is a tool that checks if function documentations have all the right headers expected conventionally by matlab's documentation system (e.g. header, usage, examples, see-also links, etc). So this would be one example, I guess.

But I too would be interested to hear other people's insights who subscribe to this Docs as Code model.

spondylosaurus1y ago

Linters like Vale are pretty common for docs(-as-code).

blowski1y ago

> Somehow require that blocks of code have documentation linked to them

The Symfony (PHP) framework now does this. Code and config examples in the docs have automated regression tests.

brobdingnag_pp1y ago

Or just require file/function level comments. Requiring them to be helpful can be managed interpersonally, like someone was slacking off (they are)

fmbb1y ago

Yeah that’s one way. And you can test that docs don’t link to code that does not exist.

Here are some other Good Ideas in a blog post I stumbled upon the other week: https://azdavis.net/posts/test-repo/

WillAdams1y ago· 5 in thread

Why not just put forth/use Literate Programming?

https://www-cs-faculty.stanford.edu/~knuth/lp.html

corysama1y ago

I've been working on a personal project for what I call "semiliterate programming" ;) because I think "Write a book about your code that happens to contain all of your code" is a bridge too far for nearly everyone.

So, I'm trying to find the place between Doxygen and full-blown literate programming. Encouraging disjoint prose documentation rather than parameter-by-parameter docs or chapter-by-chapter docs.

Doxygen Markdown Support made my system largely unnecessary. But, I still use mine because it has real-time preview, is based on https://casual-effects.com/markdeep/, and I personally don't care for classic Doxygen style documentation.

Meanwhile, this article sounds like it's about literate programming stuff. But, it's actually about using code-oriented tools to write documentation.

WillAdams1y ago

I'm finding that future-self really appreciates the effort past-self made to document things in book form:

https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...

and

https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...

for my current project.

Moreover, it seems to me that there would be a great deal of synergy in using Literate Programming techniques when:

>using code-oriented tools to write documentation.

miohtama1y ago

Literate programming only works for small scripts and narrative documentation, not for e.g. API documentation.

nocman1y ago

not true:

"This book describes pbrt, a physically based rendering system based on the ray-tracing algorithm." ( https://www.pbr-book.org/3ed-2018/Introduction )

and:

"This book (including the chapter you’re reading now) is a long literate program." ( https://www.pbr-book.org/3ed-2018/Introduction/Literate_Prog... )

2 more replies

WillAdams1y ago

OIC.

I guess that the books at:

https://www.goodreads.com/review/list/21394355-william-adams...

which include a typesetting system, a font design language, a 3D renderer, and an MP3 implementation qualify as "small scripts"? What is the threshold for such? TeX.web outputs some 20,619 lines of Pascal code for conversion to C and compiling.

1 more reply

gwern1y ago· 4 in thread

An entertaining outcome here is that LLMs may render the docs vs code debate largely moot: as LLM coding capabilities increase and the cost per token plummets, it becomes increasingly possible to simply stop writing code at all, and instead write docs which are 'compiled' each time by a LLM to code which is then compiled normally and the code thrown away. The code can never get out of sync with the docs because it is always generated from the docs, in a way that previous brittle fragile complicated 'generate code from docs' approaches could only vaguely dream of.

To do bug fixes, one simply updates the docs to explain the new behavior and intentions, and perhaps include an example (ie. unit test) or a property. This is then reflected in the new version of the codebase - the codebase as a whole, not simply one function or module. So the global refactoring or rewrites happen automatically, simply from conditioning on the new docs as a whole.

This might sound breathtaking inefficient and expensive, but it's just the next step in the long progression from the raw machine ops to assembler to low-level languages like C or LLVM to high-level languages to docs/specifications... I'm sure at each step, the masters of the lower stage were horrified by the profligacy and waste of just throwing away the lower stage each time and redoing everything from scratch.

davidthewatson1y ago

Reacting to this and the previous comment, I've pursued the idea of iterative human-computer interaction via GPT as perhaps not the ultimate solution, but an extant solution to the problem posed by Knuth in literate programming before we had the magic to do it in HCI where the spectrum extends from humans to machines and assumes that the humans are as tolerant of the machines as the machines are of the human (Postel's law), in a frame that Peter Pirolli described here:

https://www.efsa.europa.eu/sites/default/files/event/180918-...

Which is to say that with an iterative, human-computer interaction (HCI), that is back-ended by a GPT (API) algorithm which can learn from the conversation and perhaps be enhanced by RAG (retrieval augmented generation) of code AND documentation (AKA prompt engineering), results beyond the average intern-engineer pair are not easily achievable, but increasingly probable given how both humans and computers are learning iteratively as we interact with emergent technology.

The key is that realizing that the computer can generate code, but that code is going to be frequently bad, if not hallucinatory in its compilability and perhaps computability and therefore, the human MUST play a DevOps or SRE or tech writer role pairing with the computer to produce better code, faster and cheaper.

Subtract either the computer or the human and you wind up with the same old, same old. I think what we want is GPT-backed metaprogramming produce white box tests precisely because it can see into the design and prove the code works before the code is shared with the human.

I don't know about you, but I'd trust AI a lot further if anything it generated was provable BEFORE it reached my cursor, not after.

The same is true here today.

Why doesn't every GPT interaction on the planet, when it generates code, simply generate white box tests proving that the code "works" and produces "expected results" to reach consensus with the human in its "pairing"?

I'm still guessing. I've posed this question to every team I've interacted with since this emerged, which includes many names you'd recognize.

Not trivial, but increasingly straightforward given the tools and the talent.

gwern1y ago

> Why doesn't every GPT interaction on the planet, when it generates code, simply generate white box tests proving that the code "works" and produces "expected results" to reach consensus with the human in its "pairing"? I'm still guessing. I've posed this question to every team I've interacted with since this emerged, which includes many names you'd recognize.

My guess would be that it's simply rare in the training data to have white box tests right there next to the new snippet of code, rather than a lack of capability. Even when code does have tests, it's usually in other modules or source code files, written in separate passes, and not the next logical thing to write at any given point in an interactive chatbot assistant session. (Although Claude-3.5-sonnet seems to be getting there with its mania for refactoring & improvement...)

When I ask GPT-4 or Claude-3 to write down a bunch of examples and unit-test them and think of edge-cases, they are usually happy to oblige. For example, my latex2unicode.py mega-prompt is composed almost 100% of edge cases that GPT-4 came up with when I asked it to think of any confusing or uncertain LaTeX constructs: https://github.com/gwern/gwern.net/blob/f5a215157504008ddbc8... There's no reason they couldn't do this themselves and come up with autonomous test cases, run it in an environment, change the test cases and/or code, come up with a finalized test suite, and add that to existing code to enrich the sample. They just haven't yet.

euroderf1y ago

I'd think that some combination of user-facing documentation (for the outside of the software) and requirements specs (for the inside of the software) oughta do the trick.

flunhat1y ago

Not sure why you got downvoted, this is basically the logical conclusion of programming in some sense. Sure, generating code from docs via an LLM will be riddled with bugs, but it's not like the sloppy Python code some postdoc in a biology lab writes is much better. A lot of their code gets to be correct via trial and error anyway.

"Professional" programmers won't rely on this level of abstraction, but that's similar in principle to how professional programmers don't spend their time doing data analysis with Python & pandas. i.e. the programming is an incidental inconvenience for the research analyst or data scientist or whatever and being able to generate code by just writing english docs and specs makes it much easier.

The real issue is debuggability, and in particular knowing your code is "generally" correct and not overfit on whatever specs you provided. But we are discussing a tractable problem at this point.

natpalmer17761y ago· 4 in thread

Personally I'm a fan of writing your first draft of documentation before writing the first line of code.

vlod1y ago

Readme Driven Development https://tom.preston-werner.com/2010/08/23/readme-driven-deve...

eesmith1y ago

Or from the 1990s, "User manual as spec" - https://archive.org/details/rapiddevelopment00mcco/page/324/...

For example, the Excel Basic spec: https://www.joelonsoftware.com/2006/06/16/my-first-billg-rev...

> Then I sat down to write the Excel Basic spec, a huge document that grew to hundreds of pages. I think it was 500 pages by the time it was done. (“Waterfall,” you snicker; yeah yeah shut up.)

On the page above "user manual as spec" is "point of departure spec", which would be more like the iterative prototyping style.

tracker11y ago

I'll often do similar if I'm exposing a library... I usually want to work out the semantics and API for how to use the library before actually writing the interface.

natpalmer17761y ago

Interestingly enough, my personal philosophy is to write all backend code as if it is a library for my future self.

That is to say, I want to be able to forget everything about a project and still have the resources I need to use the project code as if it were a black box consumable library.

fjni1y ago· 3 in thread

This is such an ignorantly engineering centric perspective.

There is value in the larger organization being able to consume documentation and commenting on it and contributing to it.

There is conceptual value in some of these things, but I find it to be overstated and the downsides entirely ignored.

Most documentation systems have a version history.

And most documentation systems are far easier adopted by people other than engineers.

This is the equivalent of pointing out that figma has x, y, and z benefits and designers are fluent in it, so we should be using that for documentation.

MetaWhirledPeas1y ago

> This is such an ignorantly engineering centric perspective.

I gather this is for technical documentation. For people who either are engineers or who work closely with engineers.

> There is value in the larger organization being able to consume documentation and commenting on it and contributing to it.

Agreed! One benefit of "docs as code" as this person calls it is that you can pile tools and metadata on top of it. People have created excellent tools to comment on and make suggestions to Git pull requests, for instance.

> And most documentation systems are far easier adopted by people other than engineers.

That really will depend. And no matter how good the software is, you're likely going to be locked into one corporate service provider. If you instead treat documentation like you do code, you'll have access to a wide variety of wholly interoperable UI alternatives with no threat of lock-in.

MilStdJunkie1y ago

> And most documentation systems are far easier adopted by people other than engineers.

Whew, gonna have to have a hard disagree with you there. DaC is several times - nay, orders of magnitude - less complicated than standing up a S1000D, a DITA, or even a DocBook publishing system. For anyone.

Count the layers of configuration.

S1000D, you have to worry about issue (which has zero compatibility, and the Technical Steering says they have zero intention of releasing any guide to matching the different issues up), you have to worry about BREX, then you have to worry about bespoke DMC schemes, and then you have all the many ways the PDF or IETM build can get built out to Custom Solution X, since the TS/SGs offer absolutely bupkiss for guidance in that department (it's a publication specification that doesn't specify the publication, what can I say?). The DITA side's not a lot better: you have multiple DITA schemas, DTD customization, specialization, and you have a very very very diverse batch of DITA-OT versions to pick from, then on top of that you have the wide wide world of XSL interpreters, again with very little interplay. DocBook is probably the sanest of the bunch, here, but we're still going to be wrestling with external entities, profiles, XSL, and whether we're doing 4.X or 5 or whatever is in DBNG.

Not to mention all of this stuff costs money. Sometimes a whole lot of it. Last time I shopped round, just the reviewer per seat licenses for the S1000D system were 13k per seat per year, the writer seats were over 50k per year.

DaC, on the other hand, I want to get re-use and conditionals, so I get Visual Studio Code. I get Asciidoc. I get some extensions. I get gitlab, set up whatever actions I want to use, set up the build machine if I want one, and if I'm feeling adventurous, Antora. I'm literally writing an hour later. I'll probably spend more time explaining to the reviewers what a Pull Request is.

sam_bristow1y ago

You might be interested in this project I came across a few months ago. This person is trying to build a S1000D style system based on Asciidoc.

https://github.com/lopsotronic/Ascii1000D

MilStdJunkie1y ago· 3 in thread

The DaC debates grow increasingly grim as the overall employment situation worsens across industries. It's pretty hard to get people to react authentically, rather than see the discussion as an attack on how they do their jobs[0].

I'm going to head all this off at the pass, and say instead that DaC[1] is a technological tool for a limited number of business use cases. It's not a panacea, no more than XML publishing in a CCMS (component content management system) was seen as the Alpha and the Omega (and indeed still is by a whoooooole lot of people). I say this as a heartfelt believer in the DaC approach vs a big heavy XML approach.

Your first question - really, this should always be your first question - is, "how do people do their jobs today?". If you work in a broom factory, and the CAD guy reads word documents, the pubs guys use Framemaker, the reviews are in PDF, and the final delivery is a handful of PDF documents....well, using DaC is going to be a jump.

Now, is that jump worth it? Well, it might be. Your CAD guy might know his way around gitlens, your pubs folks probably have some experience with more complex publishing build systems, and, most important of all, you might have a change tempo that really recommends the faster-moving flows of DaC. If you're going the Asciidoc route, you could even try out some re-use via the `include` and `conditional` directives. But it also could be a disaster, with no one using VCS, no one planning out re-use properly[2], people passing reviews around in whatever format, and PDF builds hand-tooled each time. It's not something you dive into because it's what the cool kids are doing. Some places, maybe even most places legacy industry wise, it's just not going to work. Your task - if your job is consulting about such things - is to be able to read the room real fast, and recognize where it's a good fit, and where you might need to back off and point to a heavier solution.

[0] Big traditional XML publishing systems are also in the crosshairs, as they're quite frankly usuriously expensive, also writer teams have started noticing the annoying tendency of vendors to sell a big CCMS and then - once the content's migrated - completely disappearing, knowing that the costs of migration will keep you paying the bill basically forever.

[1] DaC defined as : lightweight markup (adoc, md, rst, etc), written/reviewed with a general-purpose text editor, where change/review/publish is handled on generic version control (git, hg, svn, etc), and the consumable "documents" are produced as part of a build system.

[2] Which crashes ANY CCMS, regardless of how expensive or how DaC-y it is.

hu31y ago

Perhaps there's a market for a WYSIWYG markdown editor that reads/saves to git for non techies so they can keep README.md and similar files updated.

MilStdJunkie1y ago

Also a market for putting a more doc-focused UI on git, integrating that too[1]. Pull Requests are basically gold for doc review, but the process of getting to the PR is something that always seems to need a bit of training. Nothing like the training that's needed to grok the basic graph-based change model, and how it's going to work for natural language (ish) documents, but that's a whole other kettle of fish.

[1] GitLens comes pretty close to this, however.

euroderf1y ago

Bonus points for adding a WYSIWYG HTML editor that can work with rendered Markdown and then write its edits back out as Markdown, to Markdown (altho maybe in the worst cases falling back to embedding simple HTML).

K0nserv1y ago· 2 in thread

This is focused on people whose job it is to write documentation, but I think it applies generally. A previous company I worked at moved away from Read The Docs to Confluence and it was terrible. This decision was resisted by much of engineering because we recognised that disconnecting documentation from code would make it worse, it did.

esafak1y ago

did it happen because nontechnical stakeholders did not want to read the code?

tikhonj1y ago

I've seen pressure to move to Confluence in a different setting because some non-technical users did not want to use git, and (thanks to big company bureaucracy) some of them did not have access to GitHub at all.

That said, GitHub has okay(ish) ways to edit files right from the web UI now, so having to use git should not be a complete blocker any more.

1 more reply

pavel_lishin1y ago· 2 in thread

The landing page doesn't really explain anything, except a tangential quickstart into Github hosting.

godelski1y ago

Simonw[0] and 10 minutes later ChrisArchitect[1] mentioned another HN thread which it looks like dang __just__ merged. But that other post has a different link that is probably the intended one[2].

Though it is quite interesting to see how many comments are responding to (presumably) the title (and thus their interpretation of the title) and also didn't read each other's comments. Because when I hit reply, there were at least a dozen and I began writing this before dang merged.

[0] https://news.ycombinator.com/item?id=40920767

[1] https://news.ycombinator.com/item?id=40920876

[2] https://www.writethedocs.org/guide/docs-as-code/

pavel_lishin1y ago

I think regardless of where the original link linked, it's a very weird choice for a landing page.

"Follow these quickstart instructions!"

"Why, what are we quickstarting?"

"First, make a new repo!"

1 more reply

eysgshsvsvsv1y ago· 2 in thread

Why hoard random sentences. Let go. Your time is more valuable.

Zambyte1y ago

I hoard "random sentences" because I see my time as valuable. Instead of processing the same thoughts over and over and concluding the same thing (or worse, the wrong thing and failing as I previously have), I just write things down. Recalling notes on my computer takes seconds at most, where I may have to think about something for minutes or hours to come to the same conclusion.

hju22_-31y ago

Why have a door? Remove it. You're going to enter anyway, your time is more valuable.

But seriously; what do you write to have this opinion? Just random, pointless drivel fit for Twitter?

Having some—any—kind of history has saved my ass a lot of work, and time in the process, by simply having either a restore point or earlier reference. Notes that were removed, but helped me remember something relevant or useful at the time, that I couldn't directly remember, but remembered having written at least something about.

Heck, even Office's history in documents have helped for restoring from errors caused by collaboration, or whatever else. And sure, I don't like Microsoft, and a lot of it is their fault for just shitty in-document synchronization, but a lot of it hasn't been too.

jkaptur1y ago· 1 in thread

This is a really interesting topic, and it has complexity I didn't consider until I became deeply involved in some similar systems.

For example, in code, you can generally use feature flags, A/B testing, etc. to show different things to different people quite flexibly, but (depending on how the documentation is actually published) you might have very different capabilities.

MilStdJunkie1y ago

Lots of DaC shops use feature flags for their conditional content. "Conditional Content" is a huge hobbyhorse in component content, because you need conditionals to re-use chunks. How else could the chunk be made applicable to multiple people? In doculandia, it's more common to run into conditional handling that's inline with the document markup - ifdef/ifeval/ifndef in Asciidoc, some stuff in Jekyll, S1000D applic, DocBook profiles, DITA class/ditaval - but I'm not one hundred percent sold that's a solid practice. Moving conditionals into the document layer might have been a mistake. I dunno! I'd love to kick off a conversation.

MilStdJunkie1y ago· 1 in thread

As a pretty die-hard enthusiast for this approach - even for legacy, hard industries - let's take a close look at some of the limitations of this approach.

First, code is formal language, and docs are natural language. That's a lot of jargon; what does it mean? It means that the chunks inside of a piece of code are consistently significant; a method is a method, a function is a function. Chunks in a document are, woo boy, good luck with that one. XML doesn't even have line breaks normalized. Again, no matter what the XML priesthood natters about, it's natural language.

A consequence of this is that the units of change are much, much smaller in a repo of code vs a corpus of documents. This, well, is can be ok, but it also means that a PR in a docs as code arrangement can be frickin' terrifying. What this means, is that you have to have a pretty good handle on controlling the scope of change. Don't branch based on doc revisions, but rather on much more incremental change, like an engineering change order or a ticket number.

Your third problem is that the review format will never - can never - be completely equivalent to the deliverable. The build process will always stand in the way, because doing a full doc build for every read is too much overhead for basically any previewer or system on the planet. This is a hard stop for a lot of DaC adopters, as many crusty managers insist that the review format has to be IDENTICAL to the format as it's delivered. Of course, that means when you use things like CIRs (common information repositories) that you end up reviewing hundreds of thousands of books because an acronym changed....but I call 'em "crusty" for a reason. They're idiots.

taeric1y ago

It can be intimidating. And it probably isn't worth the investment for many projects. Especially not small ones. But https://www.amazon.com/gp/product/1541259335/ is a very compelling example of something in this vein.

Therenas1y ago

This is exactly what we do for the Factorio modding API docs. The docs are embedded inside the codebase, alongside the classes and methods that implement the functionality the docs describe.

So they are written and adjusted as the functionality is implemented, they can be reviwed alongside the code PRs. The CI builds the docs and makes sure there are no issues.

The format is a custom one, which is parsed and converted into JSON for language servers and into the API website. Not sure how you‘d test the docs content, but this parser is tested for sure.

Works great for us in general.

bluGill1y ago

Developers are too close to the code to write effective documentation for it. They will go into great detail about things that nobody else cares about, while skipping important parts because to them it is obvious.

While it is possible to do okay anyway, it only happens if there is effort over time.

I'm convinced that the best thing to do it when someone asks you a question about your APIs the response should be to go (now that you are not so close to the code you can better to this) write the answer that person needs, and have them review it until they understand. You are not allowed to talk to that person except via new documentation, while they can pester you as much as they want until you make the documentation usable. It will still take some rounds, but if nobody is reading the documentation there is no point in writing it either.

avg_dev1y ago

I believe I remember reading that for merging branches to the Postgres project, you need to update the docs too in order to pass code review. A nice way of doing it, I thought. Pg has some great docs that I have been reading for some years.

scoot1y ago

MUI has always done this (since 2014), but goes one or two steps further than the bullet list at the beginning of the article. Most significantly, API documentation is generated from the code of the components being documented, so is always accurate and up to date.

https://mui.com

Spivak1y ago

Love the concept, hate the article. The article doesn't actually say anything other than "store your docs in git" which... yeah, obviously. You don't need anyone to tell you that being able go to a snapshot of the docs as they were at the time of the commit/release you're looking at is a powerful feature.

But that's not really treating your docs as code, more like "storing your docs in the same place as your code." A system like Sphinx with autosummary and autodoc where the docs are generated from your code and human-readable details like examples are pulled from the relevant docstrings is very much docs as code. Same with FastAPI's automatic OpenAPI generation and automatic Swagger. Pulling the examples section for your functions directly from your tests, now that's docs as code.

igtztorrero1y ago

Love this approach DBC Doc Before Code, very useful when working with Jr Developer

rickydroll1y ago

For me, this should be the end goal of the AI pair programming. I write documentation for APIs, data structures, etc., hand it to the AI, and it should crank out functional code that meets the requirements spelled out in the documentation.

We are close, but it's not there yet. I will always need to run a validation test against the code and eyeball to ensure it's not insane. But today, it's clear to me that if ChatGPT/copilot doesn't generate correct code quickly and easily from what I wrote, I didn't understand the problem and couldn't express it clearly.

kkfx1y ago

Well...

I agree having a shared doc repo so anyone can commit changes/patches to docs, witch is well... Not much different than what most wikis offer already, and while useful wikis prove that's not enough to have good docs and little to no garbage in them...

But I will NEVER "host my docs" on someone else platform depending from their services (if you host code/docs as a mere repo, GH and alike are just mirror of something most dev have, if you use their features your workflow hardly depend on them) and I also never use MD as my default choice.

bllchmbrs1y ago

I've thought about this problem set for years, I've written many docs and technical books.

Version control is the best for documentation.

But maintaining it is hard - lots of great comments here.

For anyone interested, I'm working on https://hyperlint.com/ (disclaimer: bootstrapped founder). To help automate the toil around documentation.

fucalost1y ago

From my limited interactions with document-intensive sectors (i.e. legal), I think they’re sorely lacking something like this.

When the same document is edited by two separate individuals and diverges, it is a nightmare to reconcile the two.

I truly wish (i.) Microsoft Word was a nicer format for VCS, or (ii.) Markdown was more suitable for “formal” legal texts and specifications — probably in that order (!)

xixixao1y ago

In recomputer[0] I put the docs sources directly next to the relevant implementation, and also tested the examples.

For this to work well (not just like an API reference), the implementation itself had to be structured well.

[0] https://github.com/xixixao/recomputer

batterylow1y ago

Similar for the PlotAPI docs [1] which are all Jupyter notebooks!

[1] https://plotapi.com/docs/

benrutter1y ago

I felt myself agreeing hard with this until I read it!

I thought it was gonna be all about ensuring your api documentation is closely coupled with your code. But it's more about using code tools to write docs.

I'm kinda two ways on it, doesn't it depend on what "docs" actually are? (I couldn't find a definition on the page). Wikipedia is a kind of documentation, but tieing it to version contril tools would massively restrict the number of people contributing and therefore the quality if the docs.

I dunno, maybe I'm missing the point.

wruza1y ago

tl;dr: commit index.md to github repo and use github pages to host it.

lkdfjlkdfjlg1y ago

Is documentation that important? Even when I think is excellent like postgres, I've only ever had a few pages of it. Which leads me to think, who's reading the other thousands (?) of pages?

I think the amount of effort you should put into documentation varies wildly on the scope of the project.

j / k navigate · click thread line to collapse

99 comments

78 comments · 27 top-level

simonw1y ago· 13 in thread

A subset of this idea is a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm talking about API documentation here - for both code-level APIs (how to use these functions and classes) as well as HTTP/JSON/GRPC/etc APIs that the codebase exposes to others.

If you keep the documentation in the same repo as the code you get so many benefits for free:

1. Automatic revision control. If you need to see documentation for a previous version it's right there in the repo history, visible under the release tag.

2. Documentation as part of code review: if a PR updates code but forgets to update the accompanying documentation you can catch that at review time.

xorcist1y ago

Love your blog, but in this case I want to take a more nuanced, if not opposite, stance:

simonw1y ago

If the deployment code needs to be able to ship different versions, I would keep that deployment code in a separate repository - with its documentation bundled there.

The other form of documentation that I am passionate about is documentation that lives in issues, and then linked to from commit messages.

The great thing about issues and issue comments is that they have a clear timestamp attached to them, and there is no expectation that they will be kept up-to-date in the future.

This makes them the ideal place to keep documentation about how the code evolved overtime, and the design decisions that were made along the way.

1 more reply

NomDePlum1y ago

"Specific revisions should be documented, but documentation should not be limited to a specific revision."

It's unclear to me what this is trying to argue. So apologies if the below entirely misses your point.

Not maintaining accurate documents is like disabling tests because they don't pass. It's easy to do but not right.

A checked in codebase to me should be as current and correct as possible. That includes accurate documentation.

I've rarely seen documentation that isn't tied to the codebase being maintained/valued.

1 more reply

godelski1y ago

> a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm a big fan of this and treating documentation like a first class citizen.

[0] https://www.codecademy.com/resources/blog/what-is-a-sprint/

tivert1y ago

5. The documentation won't get lost in a botched wiki migration or something like that.

ranger_danger1y ago

One time my company purchased a $5k commercial license for x264 and were met with "the code is the documentation." That set us back literal weeks.

gofreddygo1y ago

  > the documentation for a codebase should live in the same repository as the codebase itself

This! 100%. Emphasis - codebase documentation. Not user guides.

After doing this a couple times, it's a no brainer. The benefits are significant, the effort minimal. Just add a docs dir at the project root and go to town.

everything just one cmd+shift+f away.

hitchstory1y ago

I'm pretty convinced that there should be a single source of truth for specifications, tests and documentation but I think the industry will take a while to catch up to this idea.

I built a testing library centered around this (same as my username) but it's hard to get people to stop writing unit tests :)

simonw1y ago

Really it comes down to the team you are working with. If you have user-facing documentation authors who are happy with Markdown and Git you can probably get this to work.

1 more reply

strken1y ago

I would be very happy if as much developer documentation as possible was actually executed as part of the code.

clumsysmurf1y ago

cedws1y ago

True. Confluence or whatever corp shitware is where technical documentation goes to die.

klysm1y ago

I’ll happily die on that hill with you

RandallBrown1y ago· 11 in thread

One of the main bullet points on the page is automated tests.

How do you write automated tests for documentation? Somehow require that blocks of code have documentation linked to them?

thesuperbigfrog1y ago

>> How do you write automated tests for documentation? Somehow require that blocks of code have documentation linked to them?

It could be tests to ensure documentation "builds" into all of the desired formats (e.g. web, pdf, ebooks, etc.) correctly.

Some programming languages have the idea of "documentation tests". In Rust, tests that are part of the documentation will run as part of the documentation build:

https://doc.rust-lang.org/rustdoc/write-documentation/docume...

Nzen1y ago

If we treat specifications written in gherkin syntax [0] as documentation, then the cucumber framework can match a line or stanza of gherkin to a test function [1].

[0] https://www.manning.com/books/writing-great-specifications

  Given a work order xx
  and xx isExpedite
  When an operator prints the jobcard
  Then expect a label in the footer that says Expedite

[1] https://cucumber.io/docs/cucumber/step-definitions/?lang=jav...

darknavi1y ago

Loads of things!

- Making sure example snippets still compile

- Checking if links are dead

- Check for standardized/proper formatting

Basically anything you'd want to enforce manually, try to enforce with CI.

0cf8612b2e1e1y ago

I believe Rust and Python have the ability to run tests defined in docstrings.

Spasnof1y ago

Yes there are certain libraries that can handle this. Essentially asserting that functions documented are valid / return the proper results.

See https://docs.python.org/3/library/doctest.html#module-doctes... as an example.

simonw1y ago

I wrote about my way of doing that here: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

tpoacher1y ago

But I too would be interested to hear other people's insights who subscribe to this Docs as Code model.

spondylosaurus1y ago

Linters like Vale are pretty common for docs(-as-code).

blowski1y ago

> Somehow require that blocks of code have documentation linked to them

The Symfony (PHP) framework now does this. Code and config examples in the docs have automated regression tests.

brobdingnag_pp1y ago

Or just require file/function level comments. Requiring them to be helpful can be managed interpersonally, like someone was slacking off (they are)

fmbb1y ago

Yeah that’s one way. And you can test that docs don’t link to code that does not exist.

Here are some other Good Ideas in a blog post I stumbled upon the other week: https://azdavis.net/posts/test-repo/

WillAdams1y ago· 5 in thread

Why not just put forth/use Literate Programming?

https://www-cs-faculty.stanford.edu/~knuth/lp.html

corysama1y ago

So, I'm trying to find the place between Doxygen and full-blown literate programming. Encouraging disjoint prose documentation rather than parameter-by-parameter docs or chapter-by-chapter docs.

Meanwhile, this article sounds like it's about literate programming stuff. But, it's actually about using code-oriented tools to write documentation.

WillAdams1y ago

I'm finding that future-self really appreciates the effort past-self made to document things in book form:

https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...

and

https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...

for my current project.

Moreover, it seems to me that there would be a great deal of synergy in using Literate Programming techniques when:

>using code-oriented tools to write documentation.

miohtama1y ago

Literate programming only works for small scripts and narrative documentation, not for e.g. API documentation.

nocman1y ago

not true:

"This book describes pbrt, a physically based rendering system based on the ray-tracing algorithm." ( https://www.pbr-book.org/3ed-2018/Introduction )

and:

"This book (including the chapter you’re reading now) is a long literate program." ( https://www.pbr-book.org/3ed-2018/Introduction/Literate_Prog... )

2 more replies

WillAdams1y ago

OIC.

I guess that the books at:

https://www.goodreads.com/review/list/21394355-william-adams...

1 more reply

gwern1y ago· 4 in thread

davidthewatson1y ago

https://www.efsa.europa.eu/sites/default/files/event/180918-...

I don't know about you, but I'd trust AI a lot further if anything it generated was provable BEFORE it reached my cursor, not after.

The same is true here today.

I'm still guessing. I've posed this question to every team I've interacted with since this emerged, which includes many names you'd recognize.

Not trivial, but increasingly straightforward given the tools and the talent.

gwern1y ago

euroderf1y ago

I'd think that some combination of user-facing documentation (for the outside of the software) and requirements specs (for the inside of the software) oughta do the trick.

flunhat1y ago

The real issue is debuggability, and in particular knowing your code is "generally" correct and not overfit on whatever specs you provided. But we are discussing a tractable problem at this point.

natpalmer17761y ago· 4 in thread

Personally I'm a fan of writing your first draft of documentation before writing the first line of code.

vlod1y ago

Readme Driven Development https://tom.preston-werner.com/2010/08/23/readme-driven-deve...

eesmith1y ago

Or from the 1990s, "User manual as spec" - https://archive.org/details/rapiddevelopment00mcco/page/324/...

For example, the Excel Basic spec: https://www.joelonsoftware.com/2006/06/16/my-first-billg-rev...

> Then I sat down to write the Excel Basic spec, a huge document that grew to hundreds of pages. I think it was 500 pages by the time it was done. (“Waterfall,” you snicker; yeah yeah shut up.)

On the page above "user manual as spec" is "point of departure spec", which would be more like the iterative prototyping style.

tracker11y ago

I'll often do similar if I'm exposing a library... I usually want to work out the semantics and API for how to use the library before actually writing the interface.

natpalmer17761y ago

Interestingly enough, my personal philosophy is to write all backend code as if it is a library for my future self.

That is to say, I want to be able to forget everything about a project and still have the resources I need to use the project code as if it were a black box consumable library.

fjni1y ago· 3 in thread

This is such an ignorantly engineering centric perspective.

There is value in the larger organization being able to consume documentation and commenting on it and contributing to it.

There is conceptual value in some of these things, but I find it to be overstated and the downsides entirely ignored.

Most documentation systems have a version history.

And most documentation systems are far easier adopted by people other than engineers.

This is the equivalent of pointing out that figma has x, y, and z benefits and designers are fluent in it, so we should be using that for documentation.

MetaWhirledPeas1y ago

> This is such an ignorantly engineering centric perspective.

I gather this is for technical documentation. For people who either are engineers or who work closely with engineers.

> There is value in the larger organization being able to consume documentation and commenting on it and contributing to it.

> And most documentation systems are far easier adopted by people other than engineers.

MilStdJunkie1y ago

> And most documentation systems are far easier adopted by people other than engineers.

Count the layers of configuration.

sam_bristow1y ago

You might be interested in this project I came across a few months ago. This person is trying to build a S1000D style system based on Asciidoc.

https://github.com/lopsotronic/Ascii1000D

MilStdJunkie1y ago· 3 in thread

[2] Which crashes ANY CCMS, regardless of how expensive or how DaC-y it is.

hu31y ago

Perhaps there's a market for a WYSIWYG markdown editor that reads/saves to git for non techies so they can keep README.md and similar files updated.

MilStdJunkie1y ago

[1] GitLens comes pretty close to this, however.

euroderf1y ago

K0nserv1y ago· 2 in thread

esafak1y ago

did it happen because nontechnical stakeholders did not want to read the code?

tikhonj1y ago

That said, GitHub has okay(ish) ways to edit files right from the web UI now, so having to use git should not be a complete blocker any more.

1 more reply

pavel_lishin1y ago· 2 in thread

The landing page doesn't really explain anything, except a tangential quickstart into Github hosting.

godelski1y ago

Simonw[0] and 10 minutes later ChrisArchitect[1] mentioned another HN thread which it looks like dang __just__ merged. But that other post has a different link that is probably the intended one[2].

[0] https://news.ycombinator.com/item?id=40920767

[1] https://news.ycombinator.com/item?id=40920876

[2] https://www.writethedocs.org/guide/docs-as-code/

pavel_lishin1y ago

I think regardless of where the original link linked, it's a very weird choice for a landing page.

"Follow these quickstart instructions!"

"Why, what are we quickstarting?"

"First, make a new repo!"

1 more reply

eysgshsvsvsv1y ago· 2 in thread

Why hoard random sentences. Let go. Your time is more valuable.

Zambyte1y ago

hju22_-31y ago

Why have a door? Remove it. You're going to enter anyway, your time is more valuable.

But seriously; what do you write to have this opinion? Just random, pointless drivel fit for Twitter?

jkaptur1y ago· 1 in thread

This is a really interesting topic, and it has complexity I didn't consider until I became deeply involved in some similar systems.

MilStdJunkie1y ago

MilStdJunkie1y ago· 1 in thread

As a pretty die-hard enthusiast for this approach - even for legacy, hard industries - let's take a close look at some of the limitations of this approach.

taeric1y ago

Therenas1y ago

This is exactly what we do for the Factorio modding API docs. The docs are embedded inside the codebase, alongside the classes and methods that implement the functionality the docs describe.

So they are written and adjusted as the functionality is implemented, they can be reviwed alongside the code PRs. The CI builds the docs and makes sure there are no issues.

The format is a custom one, which is parsed and converted into JSON for language servers and into the API website. Not sure how you‘d test the docs content, but this parser is tested for sure.

Works great for us in general.

bluGill1y ago

While it is possible to do okay anyway, it only happens if there is effort over time.

avg_dev1y ago

scoot1y ago

https://mui.com

Spivak1y ago

igtztorrero1y ago

Love this approach DBC Doc Before Code, very useful when working with Jr Developer

rickydroll1y ago

kkfx1y ago

Well...

bllchmbrs1y ago

I've thought about this problem set for years, I've written many docs and technical books.

Version control is the best for documentation.

But maintaining it is hard - lots of great comments here.

For anyone interested, I'm working on https://hyperlint.com/ (disclaimer: bootstrapped founder). To help automate the toil around documentation.

fucalost1y ago

From my limited interactions with document-intensive sectors (i.e. legal), I think they’re sorely lacking something like this.

When the same document is edited by two separate individuals and diverges, it is a nightmare to reconcile the two.

I truly wish (i.) Microsoft Word was a nicer format for VCS, or (ii.) Markdown was more suitable for “formal” legal texts and specifications — probably in that order (!)

xixixao1y ago

In recomputer[0] I put the docs sources directly next to the relevant implementation, and also tested the examples.

For this to work well (not just like an API reference), the implementation itself had to be structured well.

[0] https://github.com/xixixao/recomputer

batterylow1y ago

Similar for the PlotAPI docs [1] which are all Jupyter notebooks!

[1] https://plotapi.com/docs/

benrutter1y ago

I felt myself agreeing hard with this until I read it!

I thought it was gonna be all about ensuring your api documentation is closely coupled with your code. But it's more about using code tools to write docs.

I dunno, maybe I'm missing the point.

wruza1y ago

tl;dr: commit index.md to github repo and use github pages to host it.

lkdfjlkdfjlg1y ago

Is documentation that important? Even when I think is excellent like postgres, I've only ever had a few pages of it. Which leads me to think, who's reading the other thousands (?) of pages?

I think the amount of effort you should put into documentation varies wildly on the scope of the project.

j / k navigate · click thread line to collapse