Write plain text files - https://news.ycombinator.com/item?id=30521545 - March 2022 (345 comments)
The author invokes the concept of "authenticity", and that's where it gets interesting.
I used to set my students a question about information content in a class on the philosophy of procedural representation.
We had a very high resolution photo of the aviation pioneer Amelia Earhart, and a short grainy video clip of her getting into a plane and smiling and waving.
My question was: Which one of these two media conveys more information about Amelia?
One gave extraordinary detail of her face and eyes, and seemed to many to be a much better "fidelity" document. Others noticed that although you couldn't see her face in the video, you could feel _much more_ about her from her gait, waving, body language and the way she shook hands than from the static photo.
Both files are the same size in bytes.
So which one has more "information"? Which one is more "authentic"?
Without attempting an answer here via a deep dive into phenomenology, I'll just say that each carries a different kind of information, which can be static, dynamic, or meta-dynamic in higher orders, relative to a matrix of assumptions that must be carried forward in parallel by the culture that wants to decode the message later.
I like that Miris tries to explore this by questioning the richness of text. But maybe the question doesn't hold up well under those conditions of investigation: one might say that a great poet using only a few words can capture a landscape better than a painting, but if our culture drifts toward a visual one where poetry is no longer understood, we cannot say that the medium itself degraded.
(It is also worth noting that these higher fidelity sources are often left to decay or are intentionally destroyed due to the difficulty and expense of maintaining them.)
All those monks slaving away doing doodles in the margins weren't just staving off carpal tunnel. There's meaning there you can't get any other way. Before Man could write, Man made art.
The other HN article about "plaintext only" I also agree with. HTML is the synthesis of the two. Sometimes I forget what a great idea and a blessing HTML is. Even if you don't have a browser that can render it, reading an HTML document isn't difficult if it isn't festooned with auto-generated nonsense.
That has little to do with the medium. If the painter takes as much effort as the poet, and the viewer as much time and effort as the reader, just as much information and emotion can be gleaned from the painting as the poem.
I would posit that while a painting can be very high context, poetry tends to be even more context dependent. Transplanted outside its native culture, I suspect visual works (again, on the margin) can be grasped with more depth by the viewer than the (on the margin) poem can by its reader.
I thought photo quality was rather meh until the 50s or at least the 40s? Even with large films the results are often muddy in olden shots—while 70 mm movie film from the 60s will probably still be redigitized into super-duper-hd formats in the late 21st century (e.g. https://youtu.be/sCv-dIFGcd0).
Or maybe, what changes are behind systems that change?
I don't think anyone really argues that everything should be plain text, even if that's an easy shorthand. The real argument is "use the simplest, most open format possible."
Nobody is suggesting you go through all of your photos, transcribe your emotional reaction to each picture, and then delete the image. But, if you want to view those same photos when you're fifty years old, or seventy-five, you're better off storing them as a JPEG than a PSD, and you're better off storing them on a hard drive you have access to in addition to whatever cloud they're currently occupying.
"Write plain text" is a shorthand for "use open formats." Because so much of what this audience does is text-based, plain text is the most common format we use, from source code to journaling, but that message applies to pretty much anything: if you lock yourself into a proprietary format, or a proprietary editor, you will almost certainly lose data over the long term.
OTOH there are many photos I have, taken a decade or two ago, where I wish I'd written down my thoughts and reactions at the time, rather than just taken the picture. A picture may be worth a thousand words, but having lots of pictures and no contemporaneous words leaves more of a gap the longer ago it was.
Also, the 70s and early 80s were a bit more orange than I remember.
A few years ago, on a hike, if I came across some beautiful scenery, out came the camera. I'd spend most of the time capturing images with the device rather than taking in the landscape through my own senses.
Later, I'd look at those photos and notice that they failed to convey much of the emotional dimension. Now I spend more time looking at the landscape, trying to notice all the details, and take only one or two snapshots. The idea of writing down my thoughts and reactions is worthwhile, or, for practicality, maybe just recording them as audio and transcribing them later.
The author mentions converting to other open, text-based formats like HTML and LaTeX for publishing and writes:
> Keep your graphics files alongside your text files. But keep your text as plain text.
Seems more like a misunderstanding of it than a response. As you quote explicitly from the Sivers article, he is talking about keeping text as plain text, not about keeping images as plain text. And the Miris article is basically saying the same thing (at the end he even says plain text is still his first choice), yet appears to think he's giving some kind of opposing viewpoint.
"Write plain text" is definitely not a shorthand for "use open formats".
PDF is an open format.
Approximately nobody who says "write plain text" thinks putting everything in PDF is an acceptable alternative.
They don't even want you writing in HTML, for that matter. They want Markdown.
They really do mean something fairly close to "plain text".
> The real argument is "use the simplest, most open format possible."
For most collections of words, that means Markdown, not PDF. But if the words you're saving are a mortgage document or power of attorney, PDF is actually a better choice.
Thus HTML, Markdown and LaTeX make sense:
\begin{document}
Blah
...
Is completely understandable to a reader even 50 years down the line, even if they don't have LaTeX on hand.
But it does bring up an interesting counterpoint: what does $$\frac{1}{n}$$ mean (not to even bring up more complex examples)? It's probably no surprise that LaTeX is the lingua franca of math input, because it brings terseness, simplicity and some readability to plain text. Still, it's a programming language, so literally all bets are off in a document (you can redefine \frac to mean something else entirely).
I guess both articles, as noted elsewhere, attempt to nail down one familiar truth: use the simplest expression possible, but not simpler. One thinks that's always plain-text except for images, but there are just more contexts where this applies.
Pretty sure this article is a rebuttal to the front page post on HN yesterday, which said exactly that.
* It may be an attempt at a rebuttal, but in actuality it mostly agrees. But yeah, rather obviously in reaction to that article.
* That article didn't say exactly that everything should be plain text; only that most text should be plain text.
Plain text just works, everywhere, all the time.
-- https://news.ycombinator.com/item?id=30525605
https://en.wikipedia.org/wiki/Photo_CD#Converting_Photo_CD_i...
The image format was jpeg if I remember correctly, wasn't it?
But maybe we should all use monochrome bitmap files for everything? That would be very simple.
In addition to UTF-8, my language happens to have ~2 additional code pages/Latin-based encodings. Some websites still serve (or until very recently served) text files in such broken encodings, so I have to convert those files before use. It's deeply unpleasant. Windows has supported UTF-8 in some fashion for over 15 years; get with the program, people.
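A minimal sketch of that conversion chore in Python: decode the bytes with the legacy codec and write them back as UTF-8. The comment doesn't name the language or code page, so `cp1250` (Windows Central European) is purely an assumed default, and the "already valid UTF-8 means leave it alone" check is a heuristic, not a guarantee. Decoding with the wrong single-byte codec usually doesn't fail loudly; it just yields the wrong letters, so eyeball the output.

```python
from pathlib import Path


def to_utf8(raw: bytes, legacy_encoding: str = "cp1250") -> bytes:
    """Re-encode bytes from a legacy code page into UTF-8.

    cp1250 is an assumed default; pass the code page your files actually use.
    """
    try:
        # Heuristic: bytes that already decode as UTF-8 (incl. ASCII) are kept as-is.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    # errors="strict" raises on undefined bytes instead of silently dropping them.
    return raw.decode(legacy_encoding, errors="strict").encode("utf-8")


def convert_file(src: Path, dst: Path, legacy_encoding: str = "cp1250") -> None:
    """Write a UTF-8 copy of src to dst; the byte-exact original is left untouched."""
    dst.write_bytes(to_utf8(src.read_bytes(), legacy_encoding))
```

Writing to a separate destination rather than overwriting keeps the original bytes around, in the spirit of the point about preserving historical files as they were.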
(I would make an exception for preserving historical non-UTF-8 files in their original byte-exact form, for the same reason that I wouldn't digitise an analogue photograph and then burn the original - but let's be real, all such files have been created by now)
File longevity wins over grammatical correctness most of the time for me. I have text files going back to the 80s, so I'm glad I didn't use any fancier software to write them as they'd be completely unreadable today.
Will re-re-re-revise it again with fresh eyes after resting 'em!
Thankfully I'm using Org-mode these days, which is reasonably ‘plain text’ under practical definitions, but I make dozens of new headings every week, and each of them is stamped with the creation time. But boy do I miss having modification times too; I should probably finally set up automatic commits to Git. I also need to mess with Orgzly so that it marks notes that are created on the phone.
https://www.dublincore.org/specifications/dublin-core/dcmi-t...
or 'Dublin core' which is RDF.
(Evernote went to shit over the years, so don't take this as an endorsement.)
Sometimes it's also useful to figure out what I was doing when writing a note, by placing the time among my other activities. This gives some context for the thoughts.
Now that I migrated to outlines and the notes are much more granular, plus I started making more of them—they can often serve as a timestamped log of my day. When did I eat the breakfast—so I can put the dinner in the stomach before it begins an acid-fest? Well, I logged watching an episode of the series during the breakfast, so the creation time tells me the answer.
I'm scatterbrained, okay. Or rather, the notes are part of my ‘brain’ now.
In fact, I do miss granular times in other logs of my activity—ironically, in regard to privacy. I watched a video on a particular topic around last summer, and would like to find it now—but YT's ‘watch history’ is crude and just leafing through all of it is infeasible. (Actually, perhaps I should look into the ‘takeout’ dumps of activity for the timestamps, and make a list of the vids in a better format.)
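The "make a list of the vids in a better format" idea might look roughly like this in Python. It assumes the watch history was requested as JSON from Google Takeout and that each entry carries `title`, `titleUrl` and an ISO 8601 `time` ending in "Z"; that matches how such exports have commonly been laid out, but it is an assumption, not a documented contract, so check your own file. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_time(stamp: str) -> datetime:
    # Pythons before 3.11 reject a trailing "Z" in fromisoformat, so spell out UTC.
    dt = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # treat naive stamps as UTC
    return dt


def videos_between(history_json: str, start: datetime, end: datetime) -> list:
    """Return sorted (timestamp, title, url) tuples for entries watched in [start, end)."""
    rows = []
    for entry in json.loads(history_json):
        when = parse_time(entry["time"])
        if start <= when < end:
            # Removed videos may lack a URL, hence .get with a default.
            rows.append((when.isoformat(), entry.get("title", ""), entry.get("titleUrl", "")))
    rows.sort()
    return rows


def to_tsv(rows) -> str:
    """One tab-separated line per video: easy to grep, diff and keep as plain text."""
    return "\n".join("\t".join(row) for row in rows)
```

Narrowing to a window like "around last summer" and dumping it as TSV turns an unbrowsable history into a short, searchable text file.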
Oh, sweet summer child. Scribe/mss. Koalapad. A bunch of Apple 2GS, Apple 3, and Lisa formats. Lotus Improv.
The points about semantics and authenticity are wonderful, but I think the presumption that all formats can be opened is mistaken exactly because those that can’t be opened become effectively invisible and lost.
I don't see why pure plain text is better in any way than plain text with formatting, like a simplified form of HTML (<a>, <b>, <sup>, some kind of table formatting, etc). The latter is non-proprietary, easily read and diffed, and communicates better than pure text.
Images have their own value, as do animations and video on occasion. Here matters become more complicated - image formats are generally non-human-readable and non-diffable (though SVG or a similar format could solve those problems for schematic-type images) and image conversions generally involve data loss. For starters, though, one should at least use a non-proprietary format for images and video.
Yes, but when it comes to future use, the problem typically isn't that a format is proprietary, but that it is closed, non-standard, and unknown.
Yet you're creating a new standard here, with your own rules, which no one will understand, and which no automated tools can convert to another format.
(Eg some kind of table formatting)
Better to be 100% html than this.
(Maybe you meant that, but regardless, this is a good place for me to comment on standards being more important than anything else.)
If they had needed to convey an image or contextual information like some rich API spec, they would presumably have used something else.
On the serious side, ASCII art diagrams are splendid and I very often use them myself, though they can get quite complex and thus messy to maintain. There comes a certain point where they lose their simplicity, sadly.
I've been using computers daily for about 35 years now and I have a _lot_ of plain text files that I regularly use -- notes, lists, outlines, quotes, links, etc. Does anyone who has been around a while, have a large multi-decade collection of texts that are _not_ plain text? What formats do you use? How do you maintain access to those files over time?
Do Wordstar files open in modern Word applications, even on iOS? That's part of the access aspect over the long term -- files that can be used, everyday, with your daily-driver tools with minimal special software needed.
Mine as well (maybe not quite 25 but close). But music isn't written word, clearly it wouldn't be in an ASCII text file.
The key is universal, non-proprietary formats that are supported by thousands of open source applications. Those are the formats that will last a lifetime and beyond. So, plain text for the written word (HTML counts as plain text, you can read and write it in any plain text editor), JPG for pictures, MP3 for music.
For video there doesn't seem to be an answer that is fully satisfactory, that I feel confident I can still view in 50 years. So I mostly take photos, not much video, since I can't trust the longevity of video.
I would contend that capturing a picture is absolutely a massive distortion of reality because reality is three dimensional, exists in many spectra beyond visible light, has sounds, smells, taste, and feeling, and exists in a historical context. The selection of framing, distance, focus, all of these are biases of the photographer. A photo is a lie, too. Just because it's higher resolution doesn't mean it has indeed captured the right information.
Text is a lie too, granted. But in our current digitization zeitgeist, we have forgotten that our media (pictures, video, recordings, not just the TV, cable, and internet) lie to us. Our own bias towards slicing apart the world into computer-digestible bits is just us lying more convincingly to ourselves.
I take issue with that. This strips the word "lie" of its time-honoured meaning (~"distorting or fabricating truths to influence decision making or perception") and dilutes it for when we actually need to call out lies.
Some examples are sorely needed. How is a Word/InDesign file more authentic than a plain text file? Or is the author talking about media? Is a ProTools session more authentic than Wav files?
Dunno about 'authentic', but since the part you've quoted specifically talks about "loss of information", the WAV files indeed incur loss of information compared to a ProTools session.
E.g. if it's a single stereo wav file render, it would miss all the individual channels, for starters.
If it's multiple wav files with all the channels as stems, it will still miss the effect chain settings (and hardcode them in the final result), the MIDI notes (hardcoded as the rendered VST output), session markers, tempo change tracks, and other such things.
A DAW session is like notes for writing a book. Not everything is going to make it in, and the choice of what does make it from the notes to the book, and how it's changed, is quite intentional. And I, personally, don't consider a book to be "lossy" or "unauthentic" because it doesn't also come with all the author's notes.
So, if it's not in the final mix, it's because it's not supposed to be in the final mix; it's not that the data is lost because of technical limitations. And like notes from a book, unless you throw them away, they're not going anywhere.
On a more technical note, underneath the hood, the recorded items are all stored as .wav files too...
I would have preferred PDF/A
Real archivists use a lot of data :)
Or, you need to become a better writer.
My timeline thing [0] keeps the original archives, stores the timeline entries in a database, and exports them hourly as JSON + files. If the code stops working or the database crashes, the files are still there. The automated backups are there too. No information is lost.
However, the richness is not lost in the process. This timeline has geolocation history, notebook scans and a bunch of other things that don't really translate to plain text.
The most important difference is that I can write to my timeline from my phone. Managing text files across devices is quite troublesome by comparison. If I want plain text out of it, I can write a new Destination that pipes entries to plain text files or to a fax machine.
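The "database plus exported plain files" arrangement can be sketched in a few lines of Python. The commenter's project [0] isn't shown, so the `entries` table, its columns and both export functions below are hypothetical; scheduling the hourly run (cron, a timer loop) is left out.

```python
import json
import sqlite3

# Hypothetical schema; the real timeline schema is not described in the comment.
SCHEMA = "CREATE TABLE IF NOT EXISTS entries (id INTEGER PRIMARY KEY, ts TEXT, kind TEXT, body TEXT)"


def export_json(conn: sqlite3.Connection) -> str:
    """Dump every entry as a JSON array, oldest first, so the data outlives the database."""
    rows = conn.execute("SELECT id, ts, kind, body FROM entries ORDER BY ts, id").fetchall()
    return json.dumps(
        [{"id": i, "ts": ts, "kind": kind, "body": body} for i, ts, kind, body in rows],
        ensure_ascii=False,
        indent=2,
    )


def export_plaintext(conn: sqlite3.Connection) -> str:
    """A plain-text 'destination': one tab-separated line per entry.

    Bodies containing tabs or newlines would need escaping; that is skipped here.
    """
    rows = conn.execute("SELECT ts, kind, body FROM entries ORDER BY ts, id").fetchall()
    return "".join(f"{ts}\t{kind}\t{body}\n" for ts, kind, body in rows)
```

The database stays the convenient read/write layer (including from a phone), while the periodic exports are what you'd actually rely on decades later.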
I have a daemon that watches for binary changes in writing documents.
If changes are identified then it runs:
$ libreoffice --headless --convert-to txt <CHANGED_FILES>
Then it commits the plain text to a git repo. This allows for diffs, text search, and "longevity" across "authentic" docs.
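The daemon itself isn't shown, so here is one way it might look in Python, using plain polling with SHA-256 content hashes instead of filesystem events (a real daemon might use inotify instead). Assumptions: the sources are `.odt`/`.docx` files in one directory, `repo` is an already-initialised git working tree, and `libreoffice` is on PATH (the binary is `soffice` on some installs). `--outdir` is an existing LibreOffice option added here so the `.txt` output lands inside the repo; all function names are hypothetical.

```python
import hashlib
import subprocess
import time
from pathlib import Path


def snapshot(paths) -> dict:
    """Map each path (as str) to the SHA-256 of its current bytes."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(before: dict, after: dict) -> list:
    """Paths that are new or whose binary content changed since the last poll."""
    return sorted(p for p, h in after.items() if before.get(p) != h)


def convert_cmd(files, outdir: str) -> list:
    """argv for LibreOffice's headless text export, writing results into outdir."""
    return ["libreoffice", "--headless", "--convert-to", "txt", "--outdir", outdir, *files]


def watch(doc_dir: str, repo: str, interval: float = 60.0) -> None:
    before = {}  # empty at start, so the first pass exports every document once
    while True:
        docs = sorted(Path(doc_dir).glob("*.odt")) + sorted(Path(doc_dir).glob("*.docx"))
        after = snapshot(docs)
        changed = changed_files(before, after)
        if changed:
            subprocess.run(convert_cmd(changed, repo), check=True)
            subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
            # git commit exits non-zero when the exported text didn't change, so don't raise.
            subprocess.run(["git", "-C", repo, "commit", "-q", "-m", f"text export: {len(changed)} file(s)"], check=False)
        before = after
        time.sleep(interval)
```

Keeping change detection and command building as pure functions makes them easy to test without LibreOffice or git installed; deleted documents are not handled in this sketch.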
Despite text being fully portable, it is limited when you need to link an image or other files. People often forget how useful this concept is.
HTML is not a viable option, as it is awfully verbose for taking a simple note.
Markdown adds just enough semantics while staying perfectly readable in anything from a hex editor to Microsoft Word.
We're in a somewhat critical moment, where markdown can either stay as it is, then dominate and become a godsend format of solid usability for decades, or a harmful feature is added that would slowly drag the whole thing down until the next Just Write Plain Text blog post.
That would give you great "authenticity" (in his definition) and great longevity.
Not practical for reading back, but that was not the point. With the help of a few simple scripts, writing is easy. So, in the end, not really an argument against storing information exclusively in plaintext.
Text+ is compelling because you can have images and some kind of formatting. You want to store metadata and have backlinks and tags. Ideally with the possibility of collaborative editing.
There should be a way to fuse these two.
This would be great for many reasons. At the top of that list for example, is getting a lot more use out of those hard drives you paid for.
Would you kindly clarify this? Did you mean scan in handwritten material but save it in a scalable image format like SVG? I'm quite interested, but maybe I'm not capturing what you mean here, because I have not had my breakfast. :-)
For the state of the art, look up "image tracing".
Paraphrased: Make your information capture format as simple as possible, but no more so.