Linus Torvalds: “I'm happily hacking on a new save format using ‘libgit2’” (opens in new tab)

(plus.google.com)

303 pointshebz0rl12y ago259 comments

259 comments

139 comments · 25 top-level

mrcharles12y ago· 32 in thread

On the game I'm currently working on, it's built very heavily around Lua. So for the save system, we simply fill a large Lua table, and then write that to disk, as Lua code. The 'save' file then simply becomes a Lua file that can be read directly into Lua.

This is absolutely amazing for debugging purposes. Also you never have to worry about corrupt save files or anything of it's ilk. Development is easier, diagnosing problems is easier, and using a programmatic data structure on the backend means that you can pretty much keep things clean and forward compatible with ease.

(Oh, also being able to debug by altering the save file in any way you want is a godsend).

JackC12y ago

You probably know this, but remember that storing user data as code is a place where you (general "you") have to think very carefully about security.

Is there any way that arbitrary code in the file could compromise the user's system? If so, does the user know to treat these data files as executables? Is there any way someone untrusted could ever edit the file without the user's knowledge? Even in combination with other programs the user might be running? Are you sure about all of that?

Maybe Lua in particular is sandboxed so that's not a problem (beats me), but in general this is an area where safe high-level languages can all of a sudden turn dangerous. Personally I would rarely find it worth it.

TelmoMenezes12y ago

This is a good point, but I feel that discouraging this type of approach is not the way to go.

I apologise in advance for ranting... I hope this is not too off-topic, but instead a "zoom out" on the issue.

This touches on something deep and wrong about how we use computers these days. Computers are really good at being computers, and the amplification of intellectual capabilities they afford is tremendous, but this is reserved for a limited few that were persistent enough and learned enough to rediscover the raw computer buried underneath, and what it can do.

For example, I dream of a world where everything communicates through s-expressions, all code is data and all data is code. Everything understandable all the way down. Imagine what people from all fields could create with this level of plug-ability and inter-operability. We had a whiff of that with the web so far, but it could be so much more powerful, so much simpler, so much more elegant. All the computer science is there, it's just a social problem.

I understand the security issues, but surely limiting the potential of computers is not the solution. There has to be a better way.

5 more replies

ufo12y ago

Lua sandboxing is relatively straightfoward. You can choose what functinos from the standard library the script you are evaluating will see in its global scope. By passing an empty scope the only thing the evaluated script can do is build tables, concatenate strings, do arithmetic, etc. You only need to worry about DOS due to infinite loops but there are also workarounds for that).

In Loa 5.1 you can use setfenv http://www.lua.org/manual/5.1/manual.html#pdf-setfenv

And in Lua 5.2 the functions that eval strings receive the global scope as an optional parameter. http://www.lua.org/manual/5.2/manual.html#pdf-loadfile

chalst12y ago

I trust Lua sandboxing. See, e.g.:

1. http://stackoverflow.com/questions/1224708/how-can-i-create-...

2. http://stackoverflow.com/questions/4134114/capabilities-for-...

I find it easier to trust Lua than similar facilities in other programming languages because the kernel of the language has a relatively simple semantics, so the TCB of a sandbox is lower, and the source is easier to understand than most other languages.

Note that sandboxing in Lua 5.2 has a still simpler semantics than for Lua 5.1 - few other languages evolve in a way that makes the language easier to trust.

1 more reply

unsigner12y ago

Lua can be sandboxed so your data file can't call arbitrary functions (but can still call a controlled subset, e.g. a function called RGB that does r255255+g*255+b so your colors are somewhat human-readable in the file, yet 24-bit integers in memory).

But it's still code, so you can e.g. inject an infinite loop and the loader will hang. (You can protect against this, you can install a debug hook that gets called after every N instructions executed, and kill the loader.)

1 more reply

Guvante12y ago

Typically sandboxing is stage one of any lua implementation. You don't need raw IO access and rarely need to print to the screen for instance.

mrcharles12y ago

Aware of all the issues and already have a plan. But Lua generally only has access to the APIs you give it; from game code our Lua VM has no access to the OS at all, just game functions, and those game functions are never system related.

The biggest 'concern' would be save hacking, but at the end of the day that will happen no matter what so it doesn't bother me much.

mattgreenrocks12y ago

Preach it. My favorite persistence code: stuff that has nothing to do with SQL/NoSQL.

I leaned heavily on Python's pickle module for serializing a few thousand entities to disk a few years ago. By streaming them to the application at startup time, it remained plenty fast for all datasets it'd encounter. I intended to replace it with SQLite one day, but I never had to. I could just keep them all in memory.

I'd probably choose something a bit safer now, but it was hard to beat the simplicity.

3pt1415912y ago

I used to do that, but pickle bit me once. I think it changes between versioning or something. I had to start the statistical model from scratch.

2 more replies

dbaupp12y ago

Why does using a Lua-basef format stop the files being corrupted?

politician12y ago

Maybe he means that he doesn't have to deal with bugs in a custom binary serializer.

mrcharles12y ago

It can only become corrupted by external factors; a lot of games I've worked on, in-game bugs could lead to corrupted saves being written out to disk. Since in this case we are just serializing lua data, unless the serializer itself has a bug, it will always write out correctly, and any issues become issues of game logic rather than anything else.

Igglyboo12y ago

I don't think it does. I think he meant that if a save became corrupted it wouldn't do so silently, it would violently crash the game because of a syntax error.

Zecc12y ago

It doesn't. But it makes them much easier to fix.

Edit: Igglyboo has a point too.

TheEzEzz12y ago

I did this with C# in my last game. All the map/object editors output C# code on save, which was then included in the compiled code on the next build. The beauty is that your "data" files get automatically updated when you refactor your regular code! On top of that loading is faster, because you don't need to worry about fetching a file and parsing it, the whole thing is just compiled code embedded in your executable.

Tanner12y ago

That Lua was originally designed as a configuration language becomes really clear when you start doing things like this. Having my code and configuration being separate but equal was really a paradigm shift for me.

Also, the Tiled Map Editor exports directly to Lua.

agumonkey12y ago

IIRC that's how Office and Photoshop file format started. I think it's a nightmare for compatibility in the end.

frik12y ago

So, it's similar to JSON (JavaScript), but valid Lua syntax.

  local t = {}
  t = {["foo"] = "bar", [123] = 456}
  t.foo2 = "bar2"

ilovecookies12y ago

While one could say that this is about savefiles for games, I would say it's implications could more be about savefiles for software projects. If you are building the game in LUA, of course LUA is going to be the preferred way to save your game in since you are already using LUA objects and interpreting files in that language will be easy to integrate.

If you ever used maven xml configs, java object marshalling or c# xml you would understand the pains of using xml as a file format for software projects and data representation. You have to find a solution that is language agnostic, neither LUA or JSON is.

balls18712y ago

I did something similar, but used JSON instead (pretty trivial to (de)/serialize LUA tables to JSON. This made it easy to send data to the server, and inspect with standard tools as well.

seivan12y ago

This sounds a lot like NSCoding for Objective-C (Cocoa). Though you'd still have to define the types/classes and name for each property you want to save. But you could technically save it in a big blob, and then read it into memory as you resume.

Could persist to disk as a binary, sql or a plist (xml).

I guess the only downside is, that if you got a lot of composite classes all with their own properties and associations (say a graph), there's a lot of manual work to be done.

hootener12y ago

I've had to write output save file formats for various projects on several occasions, and it never occurred to me to take this approach.

Thanks for sharing this, it's one of those ideas that (to me) seems so brilliant in its simplicity that I probably would've never thought of it.

Any hiccups in the day-to-day work using this approach? I'm just trying to get a better idea of the workflow since I'm very seriously considering applying it to my next project.

mrcharles12y ago

The biggest hiccup is almost a literal one; serializing large lua structures and then writing them to disk can take a lot of time. But this can largely be mitigated by just saving compiled lua instead of text lua.

Touche12y ago

That's how people are going to cheat at your game.

roryokane12y ago

I had a lot more fun recently playing the free game Boson X for PC (http://www.boson-x.com/) than I would have otherwise because I discovered that the game folder contains editable Lua scripts. The scripts control the game physics, scoring system, controls, level data, and more.

I’ve created mods of the game where you fun faster but gravity is stronger, and where all levels are randomly mixed into one level, and where the dangerous falling platforms also give you energy while you’re on them, and where the sound effects give the player clearer feedback on what they’re doing. And though I could cheat by multiplying my score by 1000 and submitting it online, I actually have been careful to always comment out the high-score saving and submission code in each of my mods.

I like the game much more than if the developers had obfuscated the Lua files so I couldn’t read and edit them.

outworlder12y ago

The save format does not matter at all. It wouldn't matter even if it were an obscure, made-up format. All it would do is slow down 'cheaters' by half an hour.

The only argument against human-editable text files is parsing speed, not security.

1 more reply

phn12y ago

Cheating is good, I remember having tons of fun with age of empires and sim city because I used cheat codes.

If the player has fun, it's a nice feature! :D

1 more reply

jethro_tell12y ago

Plot twist, It's actually a 'teach yourself lua game'

1 more reply

jiggy201112y ago

Does it matter unless the game is multiplayer, in which case you should assume that client files are untrustworthy anyways.

mrcharles12y ago

There are ways around it but if people want to cheat their own SP experience who am I to stop them? We'll obfuscate a bit to dissuade casual users but I don't know that I've ever encountered a game that didn't have some level of save hacking available.

Hell, I've used it myself more than a few times.

6d0debc07112y ago

Hash the information and include the hash in the file. If the hash and the contents don't match when you try to load it, you can refuse it.

If not loading things is important to you, mind.

1 more reply

saucetenuto12y ago

It's only cheating if the developer disapproves.

bhaak12y ago· 26 in thread

What's with all the XML hate? Of course, doing everything in XML is a stupid idea (e.g. XSLT and Ant) and thanks heaven that hype is over.

But if I want something that is able to express data structures customized by myself, usually with hierarchical data that can be verified for validity and syntax (XML Schemas or old-school DTD), what other options are there?

Doing hierarchical data in SQL is a bitch and if you want to transfer it, well good luck with a SQL dump. JSON and other lightweight markup languages fail the verification requirement.

rbehrends12y ago

XML is unnecessarily verbose, for the supposed sake of human readability. But used as a serialization format, it isn't really readable or editable by humans (except in the sense that a Turing machine is programmable): remember that the ML in XML stands for "markup language", and SGML, its predecessor, was designed as a way of marking up normal text, not littering data with angular brackets and identifiers. (XML/SGML arguably isn't that hot as a markup language, either.)

If you really need a hierarchical serialization format that is "verified for validity and syntax", the problem is that XML has prevented the adoption of something better (because it was "good enough").

If you don't need that, then XML is overkill and bloat and makes your format less readable than it could be. And you rarely need it, because either your data is computer-generated and -read, so there's little point in putting in extra schema checks, or schema verification is woefully insufficient (because it can't verify the contents of fields, relations between fields, or a ton of other stuff that can accidentally go wrong).

_pmf_12y ago

You fail to address OP's question:

> But if I want something that is able to express data structures customized by myself, usually with hierarchical data that can be verified for validity and syntax (XML Schemas or old-school DTD), what other options are there?

2 more replies

k__12y ago

> the problem is that XML has prevented the adoption of something better

What would be better?

3 more replies

lmm12y ago

If used sensibly XML isn't too bad. But there's a whole lot of cruft in the standard that seems to do nothing except make it harder to use. Part of this is a problem with popular libraries rather than inherent to the format, but we judge a thing by its ecosystem rather than in isolation. So: namespaces are a pain, making it much harder than it should be to just make my xpath work. DTDs are annoying, especially when a production system breaks because a remote server that was hosting a DTD goes down so now your parser refuses to load a file. User-defined entities seem pointless, and though most parsers can handle the billion laughs these days it wasn't always so. The handling of text nodes is confusing; whitespace is irrelevant except when it isn't. Specifying the encoding inside the document itself seems wrong, and supporting multiple encodings at all causes trouble (e.g. sometimes it's simply impossible to include one document in another inline).

Is XML schema really so much better than e.g. JSON schema?

To me it feels like there's an impedance mismatch between the kind of structures XML lends itself to and the kind of structures programs are good at dealing with. So for program-to-program communications with a certain level of validation I find Protocol Buffers is a much better fit. Conversely in cases where human readability is really important, XML isn't good enough compared to JSON.

saurik12y ago

> So: namespaces are a pain, making it much harder than it should be to just make my xpath work.

Namespaces exist to solve a real-world problem that happens in real-world use cases (SVG embedded in HTML, HTML embedded in RSS). While it would be nice to look at things that are complex and say "it would be less complex for these trivial cases without this feature", in reality there are then common use cases that become more complex or even impossible in the general case, which seems like a very short-sighted benefit. Namespace prefixes are really not that difficult to configure, and once configured XPath makes them very easy to use :/.

2 more replies

rcxdude12y ago

Indeed. XML gets a lot of hate because it's so difficult to use. It would be fine if you could use it without having to care about the 100 features you don't care about and just use the ones you need, but pretty much every library I've seen makes parsing (or generating) a document a huge and complicated task, and most of it is completely irrelevant to the problem I'm trying to solve.

And because of this almost no-one bothers to actually handle it properly so you often can't actually use the advanced features even if you wanted to.

1 more reply

Sharlin12y ago

The issue is probably that 99.999% of all XML use cases don't use (or need) the verification aspect. For all of those, XML is overkill. Besides, surely it would be possible to design a verification layer on top of JSON, for instance - the fact that one does not currently exist does not mean that XML (and abuse of XML!) should not be criticized.

bananas12y ago

One of the core aspects of XML that is really important is that no typing is inferred by the structure of the file unlike JSON. JSON is by nature tied to the JavaScript type system which is sparse and inaccurate. For example, if you look at the following:

   { "name": "bob", "salary": 1e999 }

Ah crap! Deserializer blew (in most cases silently converting the number to null)

   <person>
      <name>bob</name>
      <salary>1e999</salary>
   </person>

No problem. The consumer can throw that at their big decimal deserialiser.

And the following is not acceptable as it breaks the semantics of JSON and requires a secondary deserialisation step as strings ain't numbers...

   { "name": "bob", "salary": "1e999" }

JSON is a popular format but it's awful.

8 more replies

_delirium12y ago

People seem to prefer JSON, but I don't find it any better to hand-write/hand-edit than XML. If anything it's slightly worse, because it has more syntax edge cases.

3 more replies

icebraining12y ago

It does exist: http://json-schema.org/

nly12y ago

I personally miss having schemas and XSLT in JSON.

> doing everything in XML is a stupid idea (e.g. XSLT and Ant)

XSLT actually made a lot of sense. If everyone writes code to transform format1 to format2 then what you end up with a lot of slightly different transformations. Its main downfall, just like XML itself, was that it was annoying and time consuming to write.

How would you replace all this if you moved away from XML?

http://git.hohndel.org/?p=subsurface.git;a=tree;f=xslt;hb=HE...

_pmf_12y ago

> Its main downfall, just like XML itself, was that it was annoying and time consuming to write.

And impossible to debug. Write once, do something else for some weeks, and trying to understand what you were doing at a later point is nearly impossible.

allochthon12y ago

There are schemas in JSON (see, for example, Kwalify) although they are not something that is built into the specification. I don't think the equivalent of XSLT is as necessary when the document readily translates to data structures in a scripting language.

eponeponepon12y ago

> it was annoying and time consuming to write.

It remains annoying and time-consuming :) But there's no better option for reliably creating valid EPUBs to a predetermined business specification.

sparkie12y ago

The problem with XSD and DTD is they only offer primitive ways to validate data, and it takes significant effort to validate some data (eg,[ https://stackoverflow.com/questions/3382944 ]). As a result, there have been a bunch of other XML schema validators created to counter these problems, but we should really ask why we need to keep inventing new languages when the existing ones turn out to be insufficient.

If we start out instead with something that's turing complete and simple to begin with (perhaps S-expressions?), we can (often trivially) write our own validators/type-checkers, or any other processing tool to verify the document structure, with few or no constraints, and without requiring the effort and expertise to parse complex syntax.

tomphoolery12y ago

Unfortunately, XML is way too open-ended for my tastes. You end up getting entire rows of DB content (with full text paragraphs and everything) entirely in one tag, with attributes and values. There are so many options that you typically get a lot of idiot programmers who don't understand the purpose of all the shit in XML, so they fuck up their implementation.

Simply put, XML does not correctly model the data by which we intend to interchange. It was a noble effort, but it didn't come from a place of innovation. It came from corporate needs for standardization.

hardwaresofton12y ago

S-expressions are also a reasonable choice, with some well-placed carriage returns and a serialization implementer that names things well

seanalltogether12y ago

My biggest hatred of xml as a data structure, and believe me I've seen this in production systems more then once, is that it allows for the following.

  <customer>
	<account>
		<type>Personal</type>
		...
	</account>
	<account>
		<type>Business</type>
		...
	</account>
	<custid>496F3AB</custid>
  </customer>

This may seem innocuous, but XML allows mixing of arrays and objects too liberally, and makes automatic parsing overly complex. At first <customer> appears to be an array of account objects, but wait now that we reach the end we find that <customer> is an object with multiple keys and must create an unnamed array key to hold accounts.

XML is a document markup language, not a data format.

tokenizerrr12y ago

Well yes. The problem there is that someone made a bad decision on how to structure their XML. If the same was done like this:

      <customer custid="496F3AB">
    	<account>
    		<type>Personal</type>
    		...
    	</account>
    	<account>
    		<type>Business</type>
    		...
    	</account>
      </customer>

it would make a lot more sense, I think.

2 more replies

mahyarm12y ago

People dislike XML because it's way overkill for %99 of people's use cases but it still gets used anyway! Most people who use it should of been using something simpler like JSON to create their configuration file or return their list of strings in some HTTP API. You can have bloody security vulnerabilities with XML, like you had recently with facebook: https://www.facebook.com/BugBounty/posts/778897822124446

The likelihood of a JSON feature biting you in the ass like that is far lower. Don't use XML until you actually need something XML SPECIFICALLY provides.

Also JSON easily translates with easy to work with dictionaries and lists, XML parsers take more code to work with equivalent items.

throwaway575212y ago

He hates it, but he's using it. Take it for what it's worth.

wtbob12y ago

S-expressions work great. Syntax checking is far simpler, and validity checking is hence something you can roll yourself (and writing an S-expression schema checker ain't tough).

lucian190012y ago

Similarly, what's with the binary format/protocol hate? That's what brought us our current extremely common mishandling of text encoding.

Sir_Cmpwn12y ago

I haven't used it myself, but lisp seems well suited to the task. I've also heard good things about yaml, which is more well-supported by your language of choice.

norswap12y ago

I'd say use JSON, but JSON doesn't have adequate schemas yet (JSON-Schemas is crap).

outworlder12y ago

Because XML solves a 'problem' in the worst possible way. It is not that easy to parse for machines and only the simplest XML files are readable by humans.

Besides, since 1960 or thereabouts we have S-Expressions. The world should just have used that without reinventing the wheel once again.

oneeyedpigeon12y ago· 11 in thread

I don't quite get Linus' problem with XML for document markup (for anything else - config files, build scripts - sure, XML is horrible). Does anyone know any more details about what his specific gripe is? For me, asciidoc (which looks very similar, conceptually, to markdown) suffers from one huge problem: it's incomplete. Substituting symbols for words results in a more limited vocabulary, if that vocabulary is to remain at all memorable.

Sure, XML can be nasty, but thats very much a function of the care taken to a) format the file sensibly b) use appropriate structure (i.e. be as specific as necessary, and no more).

eponeponepon12y ago

Document markup is the one place XML is a no-brainer - more specifically, long-form, highly structured documents (i.e., essentially books).

Without it, publishing would be stuck in a morass of nebulous, ill-documented proprietary messes, and a great deal of current learning would be at risk of being lost to posterity. The fact that there are associated open standards such as XSLT with which to transform it is just the icing on the cake as far as publishing is concerned.

This is why there's so much distaste for XML - people try to use it for applications where it isn't ideal (and there are many more of those than there are applications where it is ideal) because they've swallowed someone else's hype, and as a consequence they have a bad time. If not for the unbelievable exaggeration a few years back (I heard people claim without irony that XML - a markup language for god's sake - would literally change the world), the divisiveness wouldn't exist, and it would be a technology used by experts quietly getting on with the jobs it's best for.

ternaryoperator12y ago

>Document markup is the one place XML is a no-brainer

That's only true for minimally formatted documents. For anything that approaches professional typesetting requirements, XML is a nightmare.

By far the biggest problem, it the requirement that inner elements must be closed before outer ones can be. This frequently means that the software must do a huge amount of read-ahead to figure out which aspect of the formatting changes first to make that formatting element innermost.

Sometimes, that's simply not possible to arrange and so you have to close a whole bunch of elements and then reopen all but one of them.

All this because a constraint of the format.

Ideal formats, such as used by typesetting systems that don't use XML, allow you to say: keep this formatting trait on until it's switched off. There is no concept of every element needing to be a subset of its encompassing element.

1 more reply

bct12y ago

Yeah, we're still well in the backlash phase of XML's hype cycle.

I just hope that opinion of it as a markup language can be rehabilitated before someone reinvents it and kicks off a new hype cycle.

babarock12y ago

For those who missed it, here's what Linus wrote in the comments:

"+Aaron Traas no, XML isn't even good for document markup.

Use 'asciidoc' for document markup. Really. It's actually readable by humans, and easier to parse and way more flexible than XML.

XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist.

As to JSON, it's certainly a better format than XML both for humans and computers, but it ends up sharing a lot of the same issues in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.

INI files are fine for simple config stuff. I still think that "git config" is a good implementation."

bhaak12y ago

Linus' adversion to XML explains also why parsing git's output is so abysmal inconsistent.

Subversion has a really good XML output for its log command which is a joy to use (and that's something to say if you work with XML) whereas with git you always have ugly format options that are most of the time underdocumented.

1 more reply

edoloughlin12y ago

Use 'asciidoc' for document markup.

I just had a quick scan of the user guide. It's very impressive. Looks like markdown but with all the edge cases thought out.

1 more reply

underwater12y ago

I like that the syntax for features is illustrative, so that the raw text representation doesn't end up as a mess of ugly tags. However it looks completely inflexible. It's a mishmash of special cases. How would I implement a new feature without breaking existing implementations? Or without having to write a new parser that in all likelihood will break on some subtle edge case?

At its core XML (if you ignore all the DTD, namespace and entity rubbish) is both simpler and more powerful than this. You have text, tags and attributes. What those tags and attributes mean is up to the application, but at the very least you can be sure that the document can always be reliably parsed into a form you can work with.

saraid21612y ago

> in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.

I'd really like to hear more about this perspective, if anyone feels like they can elaborate.

72deluxe12y ago

I think he overuses the single-word sentence "really" too much. Really.

adobriyan12y ago

This link needs to be posted again and again and again.

I'm sure quite a lot of people will easily recognize it. :^)

Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)

https://groups.google.com/forum/message/raw?msg=comp.lang.li...

oneeyedpigeon12y ago

That's a wonderful rant - I particularly appreciate the digression into anti-bush rhetoric - but:

1. There's very little detail here; it's a nicely worded, emotionally charged piece that leaves a lot of detail unaddressed, e.g. "'I would like to hear why you think it is so bad, can you be more specific please?' If you really need more information, search the Net, please." That's not very helpful.

2. It argues for 'simpler' markup via the removal of attributes. Where possible, I totally agree, as at least hinted at in my original post. Sometimes, though, this would be impossible or unwieldy (e.g. HREF attribute on an A element).

3. Character entities vs. unicode - totally agree. Wherever possible, I use proper unicode characters rather than ugly character entities in my markup.

4. "But the one thing I would change the most from a markup language ... is to go for a binary representation." Linus would vehemently disagree on this point.

2 more replies

tzury12y ago· 8 in thread

I just realized that Linus' posts are the only reason I ever go to Google Plus.

cbsmith12y ago

The question nobody is asking, but actually should is: I wonder what other good G+ content you are missing?

G+ is largely misunderstood. It is a lousy tool for interaction with people connected to you purely socially. It's a very good way to find and interact with people connected to you by interest.

icefox12y ago

The really sad thing is that I have tried several times to search for content that I know exists on G+, but I can't find it, even when I knew the author. After the third time failing at this my usage of G+ dropped significantly. Of all of the things that you would think would work search would be at the top... :|

1 more reply

ChikkaChiChi12y ago

This is exactly how I explain Google+ to folks. It's built for communities, not cliques.

jan_g12y ago

For me it's not just G+, but also Facebook and Twitter. Only reason I ever visit those sites is indirectly through HN posts and similar.

npsimons12y ago

I know this is completely off-topic, and I'll happily be downvoted for it, but why in the world does Google+ capture keyboard shortcuts that are already bound to other well known browser functions? (C-PgUp, C-PgDn, C-w, etc).

unsigner12y ago

Linus : G+ :: notch : Java

xentronium12y ago

That's unfair. Lots of infrastructural projects are done in java. E.g. my personal favorite: lucene (+ solr, elasticsearch).

1 more reply

kurrent12y ago

I wonder if Linus ever reads Hacker News.....

bananas12y ago· 7 in thread

I think this title is wrong.

Firstly some clarification - this appears to just be about the persistence format for his dive log. It was XML, now it's git based with plain text.

As someone who had to manage a system which worked with plain text files structured in a filesystem for a number of years in the 1990s, this is done to death already.

You now end up with the following problems: locking, synchronising filesystem state with the program, inode usage, file handles to manage galore and concurrency. All sorts.

Basically this is a "look I've discovered maildir and stuffed it in a git repo".

Not saying there is a better solution but this isn't a magic bullet. It's just a different set of pain.

e12e12y ago

> You now end up with the following problems: locking, synchronising filesystem state with the program, inode usage, file handles to manage galore and concurrency. All sorts.

Which is why he's reusing git for resolving those pain points? Well presumably all except "synchronizing filesystem state with the program" -- where he's gone from using some kind of xml parser to marshal xml to objects/structs in ram to using a (simple(r)?) text parser to do the same.

I'm guessing he just writes/reads a full (part) of a log (a branch of the full tree, or whatever is used in the program. Maybe a list anchored at a date?) -- and lets git sort the history/backup thing.

So, yes, it's a different format, but I think the argument you're making is off -- seeing as he already has git for that? It's more like combining Maildir (or mboxes, only commited when valid) and git.

xsace12y ago

Maybe you want to wait till he release something. Cause you know, if he took months to get the big picture in mind, I doubt you grasp what he envision just by reading his comment.

bananas12y ago

If it's not that, I'll eat my hat, and my pyjamas.

There's not much more to infer from the comment.

Unless he's invented a new ASN.1 encoding which plugs into libgit or something or a new text serialisation format (both unlikely).

bsder12y ago

Yes, because his design of git was so well-formed.

Git is so well-designed that expert users manage to trash their repositories and propagate the damage.

Maybe that's not a problem of libgit. But tools are both the infrastructure and the UI.

1 more reply

crucialfelix12y ago

I took this to mean that what he is replacing is a single XML file whose content is a tree of element nodes. Every time you have to make a change to that file (changing, removing or adding children nodes within the file) you would have to store a new copy of the file. The most efficient you can get is to store just the text diffs using git or something.

But what he replaces it with is a git object store. Each xml-node becomes a git object. They each point to a parent (just as git commits point to a parent commit).

Now writing to this datastore means adding a new node to the git object database and changing the parent references.

Where git stores commits that are related sequentially in time, this stores nodes in a tree relationship that IS the document.

If he's not talking about this then I'd like to officially take credit for my weird idea right now.

dangoor12y ago

The impression I got was that he was going to store his data in a git object database and that the files would be virtual in there. It would be like the .git directory without the working files on disk. It's all just conjecture until his code his out.

Regardless, I would think that some applications are simple enough (store few enough separate objects in the file system) that the issues you cite are not likely to cause a problem.

mixedbit12y ago

What you describe is quite similar to how gollum wiki uses git for storage: https://github.com/gollum/gollum

WalterBright12y ago· 5 in thread

Back in the bad old DOS days, instead of creating a file format for saving/loading the configuration of the text editor, I simply wrote out the image in memory of the executable to the executable file. (The configuration was written to static global variables.)

Running the new executable then loaded the new configuration. This worked like a champ, up until the Age of Antivirus Software, which always had much grief over writing to executable files.

It's a trick I learned from the original Fortran version of ADVENT.

picomancer12y ago

Readers may be familiar with the TI-83 programmable graphing calculator's assembly language functionality (especially those who took high school math classes in the mid-to-late 1990's). The TI-83's only user-writable storage was 32K of RAM (there was a small lithium battery to keep it powered when you changed the AAA's; also some of the RAM was used for system stuff so somewhat less than 32K was actually available for user purposes).

You could write hex values in the program text editor, then you could tell the calculator to execute the hex codes as machine code. I understand the previous models, TI-82 and TI-85, were hacked / backdoored to run user-supplied assembly language, so TI responded by including an official entry point and developer documentation for the TI-83.

People later wrote loaders which allowed programs to be stored as binary instead of text (using half the space). Some loaders also had the capability to run binary programs by swapping them into the target address rather than copying them (theoretically a third option would be possible, running programs in-place if they weren't written to depend on their load address, but this wasn't a direction the community went in. gcc users may be familiar with -fPIC which produces code which can run from any address, and this flag is necessary when compiling code for shared libraries.)

This allowed people to create massive 20K+ applications (an RPG called Joltima comes to mind), that used most of the available RAM.

The fact that this loading scheme made static variables permanent was also quite convenient. (And most variables were static; stack-based addressing would be tough because the Z80 only has two index registers, one of which is used -- or perhaps I should say "utterly wasted" -- by the TI-83 OS.)

The next generation, the TI-83+, included I think 256K of flash ROM, and a special silver edition was released which contained 2 MB.

chongli12y ago

Reminds me of the approach taken by Xmonad where the configuration is compiled into a new executable and then run.

strictfp12y ago

Thank you for that anecdote, it made my day. Simply awesome.

WalterBright12y ago

I learned a heckuva lot from reading the ADVENT Fortran source code. I was floored when I figured out how it was saving its configuration - such a brilliant idea. And in DOS it could be implemented in about 5 lines of simple C code. (Code size was critical in the old 64Kb days.)

The other huge thing I learned from ADVENT was polymorphism. The comment in the source code "the troll is a modified dwarf" was an epiphany for me.

WalterBright12y ago

I forgot to mention, on a floppy disk system, saving the configuration in the exe file made for fast loading of the program, since it didn't need to do extra floppy file operations to load the config.

aashishkoirala12y ago· 4 in thread

This is what Linus does. He has strong opinions and he throws them around. You can't let that get to you. Both XML and JSON are just fine if used properly.

icebraining12y ago

http://harmful.cat-v.org/software/xml/

theandrewbailey12y ago

This is the first profanity-free Linus rant that I've read in a long time.

vacri12y ago

Almost all of Torvalds' "profanity rants" that get passed around are the result of frustration at an existing conversation, and you can find profanity-free comments by him simply by checking out a slightly earlier one.

aashishkoirala12y ago

Haha, right. I play in the .NET space, so it's never going to happen, but God help us both if I ever have to end up working for this guy.

jmnicolas12y ago· 3 in thread

From the comments (Tristan Colgate) :

"XML is what you do to a sysadmin if waterboarding him would get you fired."

Made my day :-)

Ygg212y ago

That's just mean. Waterboarding isn't that bad...

jmnicolas12y ago

But it gets you fired ... on the other end, nobody has ever been fired for using XML.

nzp12y ago

With my occasional sysadmin hat on, until a few weeks ago I had the luck to never have had to deal with XML configuration files. Then came Solr and now I know what horror is. (To be clear, Solr itself is great, but those god damn config files...)

fuzzix12y ago· 3 in thread

> "I actually want to have a good mental picture of what I'm doing before I start prototyping. And while I had a high-level notion of what I wanted, I didn't have enough of a idea of the details to really start coding."

This I like. The race away from the waterfall straw man has also stripped us of the advantages of BDUF.

While rigid phase-driven project management helps nobody, I think there's still room for speccing as much as we can upfront within iterative processes.

Or you could run to the IDE and start ramming design pattern boilerplate down its throat the second you're out of the first meeting ;)

hvidgaard12y ago

You should be speccing what you want to achieve: the goals, the why, the impact, the external limitations, measures of success and so forth. This also allows you to describe and plan testing up front. The "how" is best handled in an iterative manner.

A lot of people use AGILE to avoid planning at all, which is a particular destructive anti-pattern, and the exact opposite of what you need.

fuzzix12y ago

> "A lot of people use AGILE to avoid planning at all"

Yup, I've seen this a lot.

In one instance "Agile" meant I could finish a major task using an unfamiliar language, framework and code base in short order.

Genuinely, the customer was told "Of course, fuzzix here is familiar with Agile processes so you should have this in 3 weeks".

edit of course this also meant there was no formal spec for the task, though I did have a photo of the whiteboard.

pessimizer12y ago

>The "how" is best handled in an iterative manner.

I think that the first "how" should be planned as much as anything else. I understand how you refactor from v0.0.1 to v5.34.2 iteratively, but I think that getting from vNothing to v0.0.1 is qualitatively different.

If I don't have a complete idea of how my minimally functional thing will work that is small enough that I can completely hold it in my head, and instead just architect by agglutination and test writing, 1) my results are going to be hacky garbage, 2) my first 50 iterations are going to be devoted to replacing it all haphazardly to fix bugs, and 3) the code and interface will become increasingly more complex, harder to work with, and strewn with special cases.

When v0.0.1 is well planned, v2.5.2 may not look anything like the plan anymore, but in my experience it becomes shorter, cleaner, and more correct rather than a giant ball of band-aids propped up with tests.

1 more reply

tedchs12y ago· 3 in thread

Why reinvent on-disk data formats when you can just make a file of protocol buffers? https://code.google.com/p/protobuf/

sparkie12y ago

Why reinvent binary serialization when you could use ASN.1, or any of the thousand binary serialization formats that pre-date protobufs?

lern_too_spel12y ago

For that specific example, you can find a good discussion here: https://groups.google.com/forum/m/#!topic/protobuf/eNAZlnPKV...

McP12y ago

Ironically that has already been reinvented in the form of Cap'n Proto: http://kentonv.github.io/capnproto/

(other than that I agree it's a good solution)

lifeisstillgood12y ago· 2 in thread

What I like is the "I dont start prototyping till I have a good mental picture"

I am currently stuck on a project I want to start becasue I cannot get it to fit right in my (future) head. And I am glad I am not an idiot for not being able to knock out my next great project in between lattes.

(Ok, in direct comparison terms I am an idiot, but at least its not compounded)

specialist12y ago

  "A change in perspective is worth 80 IQ points."
  
  -- Alan Kay

My biggest hurdle solving new problems is divining a unifying, simplifying metaphor. Once you have the right notion, that Eureka! moment, everything falls into place, like magic.

Like how Kepler was able to fully explain Bache's astronomical data once he realized the planets orbits the sun.

Personal example: I used to write print production software. Placing pages onto much larger sheets of paper that get folded and bound into a book. A task called image positioning aka imposition. It took me years to figure out how to model the problem. Key insight was simulating the work backwards, from binding back to the press. Then when I showed the new solution to my coworkers, the response was "Well, duh."

tim33312y ago

Yeah, I noted that too, also that it took him months to to get his good mental picture. It makes me feel not so bad about spending months trying to get clear on some of my stuff.

splitbrain12y ago· 2 in thread

he talks about a save file format, not a file system. or do we have different concepts of "file system"?

sp33212y ago

I agree it's confusing, I think the submitter just meant "system for files" or something.

Pxtl12y ago

That would be excusable if we were talking abuot somebody who writes higher-level programs that would be excusable, but not for a kernel developer.

1 more reply

twic12y ago· 2 in thread

Title is entirely misleading. Tech support! TECH SUPPORT!!

anon412y ago

Have you tried turning it off and on again?

theandrewbailey12y ago

Is your title plugged in?

k2enemy12y ago· 1 in thread

I don't really understand what he's talking about here (my ignorance, not his fault.) Is it something like https://camlistore.org/ that is a content addressable (the git part) datastore?

saljam12y ago

Yep, I thought it sounded like Camlistore, but as a library.

pcj12y ago· 1 in thread

>>So I've been thinking about this for basically months, but the way I work, I actually want to have a good mental picture of what I'm doing before I start prototyping. And while I had a high-level notion of what I wanted, I didn't have enough of a idea of the details to really start coding.

This might be a tangential discussion. Earlier, I used to have a similar approach. Can't code until I have the complete picture. But, it's tough to do in a commercial world and you have deliverables. So, nowadays, I start with what I know and scramble my way until I get a better picture. There are times when that approach works. But, there have been days where I was like - "wish I had spent some more time thinking about this".

I am curious how folks on HN handle this "coding block".

tonyarkles12y ago

I've got a few strategies that might help, depending on the circumstances.

A notebook: I'll write down some notes and just kind of free write whatever thoughts come to mind. If there's something that I think is important to come back to, I'll draw an empty box in the left margin (to be filled with a check mark later)

Readme: start writing the Readme for the project, even if you're not entirely sure of the details. Include code examples. If you don't like how the API is coming together, change it. It's way less work to modify the API now than it will be later.

Write a test: I don't always unit test, but when I do I test first :). This works well on projects that already have a decent test suite. It's kind of an executable version of the Readme.

Branch and Hack: branches are cheap. Make one and start playing. Don't like how it's turning out? Make a new branch and try again!

Ctrl-Z: maybe the answer won't come to you right away. Let it sit and run in the background for a while and come back to it. If I'm worried about forgetting details, I'll write it down in a notebook first.

senthilnayagam12y ago· 1 in thread

why do you need to view filesystem and make it readable for humans, you would interact it via commands "ls" or some gui

git as the basis of filesystem is interesting, hope we don't need to manually make branches and commits to use it

oneeyedpigeon12y ago

Did you read the article? It's not really about the filesystem. 1 part your fault for seemingly not reading the article you're commenting about, 1 part the submitter's fault for choosing such a misleading title.

joelhaasnoot12y ago· 1 in thread

Worked on a project a few years ago where we needed distributed sync capability. Using git (or bazaar or mercurial) was one of the options - store everything in it versus a database. Interesting to see the same thought "coming back".

fit2rule12y ago

I've also used libgit as a means to a similar end - providing versioned data across a local filesystem. Its an idea whose time has come ..

hardwaresofton12y ago· 1 in thread

Why not sqlite or sexpressions? Linus states that databases can't hold previous state but that's not really true...

I'm not sure why git is the best tool for the job in this case, even after reading the post & some of the contents.

tmzt12y ago

They can, if you recreate the primary feature of Git on top of them.

meapix12y ago· 1 in thread

xml haters!!! using other formats how can I define DTDs?

1ris12y ago

https://news.ycombinator.com/item?id=7333354

josephlord12y ago

https://github.com/torvalds/subsurface

I didn't really know what he was talking about but I think this is it.

The title does need changing though as it is definitely file formats under discussion not file systems.

vfclists12y ago

What is it with HN commenters and their demented ability to send topics completely of track? I would have thought someone might have examined the code or what Linus is trying to implement and comment about it.

But here we have threads about Lua, why people hate XML and love JSON and all kinds if irrelevant issues which have been well hashed elsewhere ad nauseam. Why not restrict to an analysis of whatever it is Linus developing?

HN is getting truly annoying and sucky, if it isn't so already.

beagle312y ago

And the actual description is here: http://lists.hohndel.org/pipermail/subsurface/2014-March/010...

Gonzih12y ago

Current title that I see "Linus Torvalds on implementation of human-readable file system" is off. It's about file formats, not file systems.

signa1112y ago

erik-naggum's most excellent xml rant: http://www.schnada.de/grapt/eriknaggum-xmlrant.html

sam_bwut12y ago

At work we have a git backed document store that just saves as json - versioning makes keeping track of audit points nice and easy.

j / k navigate · click thread line to collapse

259 comments

139 comments · 25 top-level

mrcharles12y ago· 32 in thread

(Oh, also being able to debug by altering the save file in any way you want is a godsend).

JackC12y ago

You probably know this, but remember that storing user data as code is a place where you (general "you") have to think very carefully about security.

TelmoMenezes12y ago

This is a good point, but I feel that discouraging this type of approach is not the way to go.

I apologise in advance for ranting... I hope this is not too off-topic, but instead a "zoom out" on the issue.

I understand the security issues, but surely limiting the potential of computers is not the solution. There has to be a better way.

5 more replies

ufo12y ago

In Loa 5.1 you can use setfenv http://www.lua.org/manual/5.1/manual.html#pdf-setfenv

And in Lua 5.2 the functions that eval strings receive the global scope as an optional parameter. http://www.lua.org/manual/5.2/manual.html#pdf-loadfile

chalst12y ago

I trust Lua sandboxing. See, e.g.:

1. http://stackoverflow.com/questions/1224708/how-can-i-create-...

2. http://stackoverflow.com/questions/4134114/capabilities-for-...

Note that sandboxing in Lua 5.2 has a still simpler semantics than for Lua 5.1 - few other languages evolve in a way that makes the language easier to trust.

1 more reply

unsigner12y ago

1 more reply

Guvante12y ago

Typically sandboxing is stage one of any lua implementation. You don't need raw IO access and rarely need to print to the screen for instance.

mrcharles12y ago

The biggest 'concern' would be save hacking, but at the end of the day that will happen no matter what so it doesn't bother me much.

mattgreenrocks12y ago

Preach it. My favorite persistence code: stuff that has nothing to do with SQL/NoSQL.

I'd probably choose something a bit safer now, but it was hard to beat the simplicity.

3pt1415912y ago

I used to do that, but pickle bit me once. I think it changes between versioning or something. I had to start the statistical model from scratch.

2 more replies

dbaupp12y ago

Why does using a Lua-basef format stop the files being corrupted?

politician12y ago

Maybe he means that he doesn't have to deal with bugs in a custom binary serializer.

mrcharles12y ago

Igglyboo12y ago

I don't think it does. I think he meant that if a save became corrupted it wouldn't do so silently, it would violently crash the game because of a syntax error.

Zecc12y ago

It doesn't. But it makes them much easier to fix.

Edit: Igglyboo has a point too.

TheEzEzz12y ago

Tanner12y ago

Also, the Tiled Map Editor exports directly to Lua.

agumonkey12y ago

IIRC that's how Office and Photoshop file format started. I think it's a nightmare for compatibility in the end.

frik12y ago

So, it's similar to JSON (JavaScript), but valid Lua syntax.

  local t = {}
  t = {["foo"] = "bar", [123] = 456}
  t.foo2 = "bar2"

ilovecookies12y ago

balls18712y ago

I did something similar, but used JSON instead (pretty trivial to (de)/serialize LUA tables to JSON. This made it easy to send data to the server, and inspect with standard tools as well.

seivan12y ago

Could persist to disk as a binary, sql or a plist (xml).

I guess the only downside is, that if you got a lot of composite classes all with their own properties and associations (say a graph), there's a lot of manual work to be done.

hootener12y ago

I've had to write output save file formats for various projects on several occasions, and it never occurred to me to take this approach.

Thanks for sharing this, it's one of those ideas that (to me) seems so brilliant in its simplicity that I probably would've never thought of it.

Any hiccups in the day-to-day work using this approach? I'm just trying to get a better idea of the workflow since I'm very seriously considering applying it to my next project.

mrcharles12y ago

Touche12y ago

That's how people are going to cheat at your game.

roryokane12y ago

I like the game much more than if the developers had obfuscated the Lua files so I couldn’t read and edit them.

outworlder12y ago

The save format does not matter at all. It wouldn't matter even if it were an obscure, made-up format. All it would do is slow down 'cheaters' by half an hour.

The only argument against human-editable text files is parsing speed, not security.

1 more reply

phn12y ago

Cheating is good, I remember having tons of fun with age of empires and sim city because I used cheat codes.

If the player has fun, it's a nice feature! :D

1 more reply

jethro_tell12y ago

Plot twist, It's actually a 'teach yourself lua game'

1 more reply

jiggy201112y ago

Does it matter unless the game is multiplayer, in which case you should assume that client files are untrustworthy anyways.

mrcharles12y ago

Hell, I've used it myself more than a few times.

6d0debc07112y ago

Hash the information and include the hash in the file. If the hash and the contents don't match when you try to load it, you can refuse it.

If not loading things is important to you, mind.

1 more reply

saucetenuto12y ago

It's only cheating if the developer disapproves.

bhaak12y ago· 26 in thread

What's with all the XML hate? Of course, doing everything in XML is a stupid idea (e.g. XSLT and Ant) and thanks heaven that hype is over.

Doing hierarchical data in SQL is a bitch and if you want to transfer it, well good luck with a SQL dump. JSON and other lightweight markup languages fail the verification requirement.

rbehrends12y ago

_pmf_12y ago

You fail to address OP's question:

2 more replies

k__12y ago

> the problem is that XML has prevented the adoption of something better

What would be better?

3 more replies

lmm12y ago

Is XML schema really so much better than e.g. JSON schema?

saurik12y ago

> So: namespaces are a pain, making it much harder than it should be to just make my xpath work.

2 more replies

rcxdude12y ago

And because of this almost no-one bothers to actually handle it properly so you often can't actually use the advanced features even if you wanted to.

1 more reply

Sharlin12y ago

bananas12y ago

   { "name": "bob", "salary": 1e999 }

Ah crap! Deserializer blew (in most cases silently converting the number to null)

   <person>
      <name>bob</name>
      <salary>1e999</salary>
   </person>

No problem. The consumer can throw that at their big decimal deserialiser.

And the following is not acceptable as it breaks the semantics of JSON and requires a secondary deserialisation step as strings ain't numbers...

   { "name": "bob", "salary": "1e999" }

JSON is a popular format but it's awful.

8 more replies

_delirium12y ago

People seem to prefer JSON, but I don't find it any better to hand-write/hand-edit than XML. If anything it's slightly worse, because it has more syntax edge cases.

3 more replies

icebraining12y ago

It does exist: http://json-schema.org/

nly12y ago

I personally miss having schemas and XSLT in JSON.

> doing everything in XML is a stupid idea (e.g. XSLT and Ant)

How would you replace all this if you moved away from XML?

http://git.hohndel.org/?p=subsurface.git;a=tree;f=xslt;hb=HE...

_pmf_12y ago

> Its main downfall, just like XML itself, was that it was annoying and time consuming to write.

And impossible to debug. Write once, do something else for some weeks, and trying to understand what you were doing at a later point is nearly impossible.

allochthon12y ago

eponeponepon12y ago

> it was annoying and time consuming to write.

It remains annoying and time-consuming :) But there's no better option for reliably creating valid EPUBs to a predetermined business specification.

sparkie12y ago

tomphoolery12y ago

hardwaresofton12y ago

S-expressions are also a reasonable choice, with some well-placed carriage returns and a serialization implementer that names things well

seanalltogether12y ago

My biggest hatred of xml as a data structure, and believe me I've seen this in production systems more then once, is that it allows for the following.

  <customer>
	<account>
		<type>Personal</type>
		...
	</account>
	<account>
		<type>Business</type>
		...
	</account>
	<custid>496F3AB</custid>
  </customer>

XML is a document markup language, not a data format.

tokenizerrr12y ago

Well yes. The problem there is that someone made a bad decision on how to structure their XML. If the same was done like this:

      <customer custid="496F3AB">
    	<account>
    		<type>Personal</type>
    		...
    	</account>
    	<account>
    		<type>Business</type>
    		...
    	</account>
      </customer>

it would make a lot more sense, I think.

2 more replies

mahyarm12y ago

The likelihood of a JSON feature biting you in the ass like that is far lower. Don't use XML until you actually need something XML SPECIFICALLY provides.

Also JSON easily translates with easy to work with dictionaries and lists, XML parsers take more code to work with equivalent items.

throwaway575212y ago

He hates it, but he's using it. Take it for what it's worth.

wtbob12y ago

S-expressions work great. Syntax checking is far simpler, and validity checking is hence something you can roll yourself (and writing an S-expression schema checker ain't tough).

lucian190012y ago

Similarly, what's with the binary format/protocol hate? That's what brought us our current extremely common mishandling of text encoding.

Sir_Cmpwn12y ago

I haven't used it myself, but lisp seems well suited to the task. I've also heard good things about yaml, which is more well-supported by your language of choice.

norswap12y ago

I'd say use JSON, but JSON doesn't have adequate schemas yet (JSON-Schemas is crap).

outworlder12y ago

Because XML solves a 'problem' in the worst possible way. It is not that easy to parse for machines and only the simplest XML files are readable by humans.

Besides, since 1960 or thereabouts we have S-Expressions. The world should just have used that without reinventing the wheel once again.

oneeyedpigeon12y ago· 11 in thread

Sure, XML can be nasty, but thats very much a function of the care taken to a) format the file sensibly b) use appropriate structure (i.e. be as specific as necessary, and no more).

eponeponepon12y ago

Document markup is the one place XML is a no-brainer - more specifically, long-form, highly structured documents (i.e., essentially books).

ternaryoperator12y ago

>Document markup is the one place XML is a no-brainer

That's only true for minimally formatted documents. For anything that approaches professional typesetting requirements, XML is a nightmare.

Sometimes, that's simply not possible to arrange and so you have to close a whole bunch of elements and then reopen all but one of them.

All this because a constraint of the format.

1 more reply

bct12y ago

Yeah, we're still well in the backlash phase of XML's hype cycle.

I just hope that opinion of it as a markup language can be rehabilitated before someone reinvents it and kicks off a new hype cycle.

babarock12y ago

For those who missed it, here's what Linus wrote in the comments:

"+Aaron Traas no, XML isn't even good for document markup.

Use 'asciidoc' for document markup. Really. It's actually readable by humans, and easier to parse and way more flexible than XML.

XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist.

INI files are fine for simple config stuff. I still think that "git config" is a good implementation."

bhaak12y ago

Linus' adversion to XML explains also why parsing git's output is so abysmal inconsistent.

1 more reply

edoloughlin12y ago

Use 'asciidoc' for document markup.

I just had a quick scan of the user guide. It's very impressive. Looks like markdown but with all the edge cases thought out.

1 more reply

underwater12y ago

saraid21612y ago

> in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.

I'd really like to hear more about this perspective, if anyone feels like they can elaborate.

72deluxe12y ago

I think he overuses the single-word sentence "really" too much. Really.

adobriyan12y ago

This link needs to be posted again and again and again.

I'm sure quite a lot of people will easily recognize it. :^)

Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)

https://groups.google.com/forum/message/raw?msg=comp.lang.li...

oneeyedpigeon12y ago

That's a wonderful rant - I particularly appreciate the digression into anti-bush rhetoric - but:

3. Character entities vs. unicode - totally agree. Wherever possible, I use proper unicode characters rather than ugly character entities in my markup.

4. "But the one thing I would change the most from a markup language ... is to go for a binary representation." Linus would vehemently disagree on this point.

2 more replies

tzury12y ago· 8 in thread

I just realized that Linus' posts are the only reason I ever go to Google Plus.

cbsmith12y ago

The question nobody is asking, but actually should is: I wonder what other good G+ content you are missing?

G+ is largely misunderstood. It is a lousy tool for interaction with people connected to you purely socially. It's a very good way to find and interact with people connected to you by interest.

icefox12y ago

1 more reply

ChikkaChiChi12y ago

This is exactly how I explain Google+ to folks. It's built for communities, not cliques.

jan_g12y ago

For me it's not just G+, but also Facebook and Twitter. Only reason I ever visit those sites is indirectly through HN posts and similar.

npsimons12y ago

unsigner12y ago

Linus : G+ :: notch : Java

xentronium12y ago

That's unfair. Lots of infrastructural projects are done in java. E.g. my personal favorite: lucene (+ solr, elasticsearch).

1 more reply

kurrent12y ago

I wonder if Linus ever reads Hacker News.....

bananas12y ago· 7 in thread

I think this title is wrong.

Firstly some clarification - this appears to just be about the persistence format for his dive log. It was XML, now it's git based with plain text.

As someone who had to manage a system which worked with plain text files structured in a filesystem for a number of years in the 1990s, this is done to death already.

You now end up with the following problems: locking, synchronising filesystem state with the program, inode usage, file handles to manage galore and concurrency. All sorts.

Basically this is a "look I've discovered maildir and stuffed it in a git repo".

Not saying there is a better solution but this isn't a magic bullet. It's just a different set of pain.

e12e12y ago

> You now end up with the following problems: locking, synchronising filesystem state with the program, inode usage, file handles to manage galore and concurrency. All sorts.

xsace12y ago

Maybe you want to wait till he release something. Cause you know, if he took months to get the big picture in mind, I doubt you grasp what he envision just by reading his comment.

bananas12y ago

If it's not that, I'll eat my hat, and my pyjamas.

There's not much more to infer from the comment.

Unless he's invented a new ASN.1 encoding which plugs into libgit or something or a new text serialisation format (both unlikely).

bsder12y ago

Yes, because his design of git was so well-formed.

Git is so well-designed that expert users manage to trash their repositories and propagate the damage.

Maybe that's not a problem of libgit. But tools are both the infrastructure and the UI.

1 more reply

crucialfelix12y ago

But what he replaces it with is a git object store. Each xml-node becomes a git object. They each point to a parent (just as git commits point to a parent commit).

Now writing to this datastore means adding a new node to the git object database and changing the parent references.

Where git stores commits that are related sequentially in time, this stores nodes in a tree relationship that IS the document.

If he's not talking about this then I'd like to officially take credit for my weird idea right now.

dangoor12y ago

Regardless, I would think that some applications are simple enough (store few enough separate objects in the file system) that the issues you cite are not likely to cause a problem.

mixedbit12y ago

What you describe is quite similar to how gollum wiki uses git for storage: https://github.com/gollum/gollum

WalterBright12y ago· 5 in thread

Running the new executable then loaded the new configuration. This worked like a champ, up until the Age of Antivirus Software, which always had much grief over writing to executable files.

It's a trick I learned from the original Fortran version of ADVENT.

picomancer12y ago

This allowed people to create massive 20K+ applications (an RPG called Joltima comes to mind), that used most of the available RAM.

The next generation, the TI-83+, included I think 256K of flash ROM, and a special silver edition was released which contained 2 MB.

chongli12y ago

Reminds me of the approach taken by Xmonad where the configuration is compiled into a new executable and then run.

strictfp12y ago

Thank you for that anecdote, it made my day. Simply awesome.

WalterBright12y ago

The other huge thing I learned from ADVENT was polymorphism. The comment in the source code "the troll is a modified dwarf" was an epiphany for me.

WalterBright12y ago

aashishkoirala12y ago· 4 in thread

This is what Linus does. He has strong opinions and he throws them around. You can't let that get to you. Both XML and JSON are just fine if used properly.

icebraining12y ago

http://harmful.cat-v.org/software/xml/

theandrewbailey12y ago

This is the first profanity-free Linus rant that I've read in a long time.

vacri12y ago

aashishkoirala12y ago

Haha, right. I play in the .NET space, so it's never going to happen, but God help us both if I ever have to end up working for this guy.

jmnicolas12y ago· 3 in thread

From the comments (Tristan Colgate) :

"XML is what you do to a sysadmin if waterboarding him would get you fired."

Made my day :-)

Ygg212y ago

That's just mean. Waterboarding isn't that bad...

jmnicolas12y ago

But it gets you fired ... on the other end, nobody has ever been fired for using XML.

nzp12y ago

fuzzix12y ago· 3 in thread

This I like. The race away from the waterfall straw man has also stripped us of the advantages of BDUF.

While rigid phase-driven project management helps nobody, I think there's still room for speccing as much as we can upfront within iterative processes.

Or you could run to the IDE and start ramming design pattern boilerplate down its throat the second you're out of the first meeting ;)

hvidgaard12y ago

A lot of people use AGILE to avoid planning at all, which is a particular destructive anti-pattern, and the exact opposite of what you need.

fuzzix12y ago

> "A lot of people use AGILE to avoid planning at all"

Yup, I've seen this a lot.

In one instance "Agile" meant I could finish a major task using an unfamiliar language, framework and code base in short order.

Genuinely, the customer was told "Of course, fuzzix here is familiar with Agile processes so you should have this in 3 weeks".

edit of course this also meant there was no formal spec for the task, though I did have a photo of the whiteboard.

pessimizer12y ago

>The "how" is best handled in an iterative manner.

1 more reply

tedchs12y ago· 3 in thread

Why reinvent on-disk data formats when you can just make a file of protocol buffers? https://code.google.com/p/protobuf/

sparkie12y ago

Why reinvent binary serialization when you could use ASN.1, or any of the thousand binary serialization formats that pre-date protobufs?

lern_too_spel12y ago

For that specific example, you can find a good discussion here: https://groups.google.com/forum/m/#!topic/protobuf/eNAZlnPKV...

McP12y ago

Ironically that has already been reinvented in the form of Cap'n Proto: http://kentonv.github.io/capnproto/

(other than that I agree it's a good solution)

lifeisstillgood12y ago· 2 in thread

What I like is the "I dont start prototyping till I have a good mental picture"

(Ok, in direct comparison terms I am an idiot, but at least its not compounded)

specialist12y ago

  "A change in perspective is worth 80 IQ points."
  
  -- Alan Kay

My biggest hurdle solving new problems is divining a unifying, simplifying metaphor. Once you have the right notion, that Eureka! moment, everything falls into place, like magic.

Like how Kepler was able to fully explain Bache's astronomical data once he realized the planets orbits the sun.

tim33312y ago

Yeah, I noted that too, also that it took him months to to get his good mental picture. It makes me feel not so bad about spending months trying to get clear on some of my stuff.

splitbrain12y ago· 2 in thread

he talks about a save file format, not a file system. or do we have different concepts of "file system"?

sp33212y ago

I agree it's confusing, I think the submitter just meant "system for files" or something.

Pxtl12y ago

That would be excusable if we were talking abuot somebody who writes higher-level programs that would be excusable, but not for a kernel developer.

1 more reply

twic12y ago· 2 in thread

Title is entirely misleading. Tech support! TECH SUPPORT!!

anon412y ago

Have you tried turning it off and on again?

theandrewbailey12y ago

Is your title plugged in?

k2enemy12y ago· 1 in thread

I don't really understand what he's talking about here (my ignorance, not his fault.) Is it something like https://camlistore.org/ that is a content addressable (the git part) datastore?

saljam12y ago

Yep, I thought it sounded like Camlistore, but as a library.

pcj12y ago· 1 in thread

I am curious how folks on HN handle this "coding block".

tonyarkles12y ago

I've got a few strategies that might help, depending on the circumstances.

Write a test: I don't always unit test, but when I do I test first :). This works well on projects that already have a decent test suite. It's kind of an executable version of the Readme.

Branch and Hack: branches are cheap. Make one and start playing. Don't like how it's turning out? Make a new branch and try again!

senthilnayagam12y ago· 1 in thread

why do you need to view filesystem and make it readable for humans, you would interact it via commands "ls" or some gui

git as the basis of filesystem is interesting, hope we don't need to manually make branches and commits to use it

oneeyedpigeon12y ago

joelhaasnoot12y ago· 1 in thread

fit2rule12y ago

I've also used libgit as a means to a similar end - providing versioned data across a local filesystem. Its an idea whose time has come ..

hardwaresofton12y ago· 1 in thread

Why not sqlite or sexpressions? Linus states that databases can't hold previous state but that's not really true...

I'm not sure why git is the best tool for the job in this case, even after reading the post & some of the contents.

tmzt12y ago

They can, if you recreate the primary feature of Git on top of them.

meapix12y ago· 1 in thread

xml haters!!! using other formats how can I define DTDs?

1ris12y ago

https://news.ycombinator.com/item?id=7333354

josephlord12y ago

https://github.com/torvalds/subsurface

I didn't really know what he was talking about but I think this is it.

The title does need changing though as it is definitely file formats under discussion not file systems.

vfclists12y ago

HN is getting truly annoying and sucky, if it isn't so already.

beagle312y ago

And the actual description is here: http://lists.hohndel.org/pipermail/subsurface/2014-March/010...

Gonzih12y ago

Current title that I see "Linus Torvalds on implementation of human-readable file system" is off. It's about file formats, not file systems.

signa1112y ago

erik-naggum's most excellent xml rant: http://www.schnada.de/grapt/eriknaggum-xmlrant.html

sam_bwut12y ago

At work we have a git backed document store that just saves as json - versioning makes keeping track of audit points nice and easy.

j / k navigate · click thread line to collapse