The problem with such a strongly hierarchical system is that it fails if there is some document, note, picture, etc. that would be useful to keep in multiple locations. Obviously we can introduce links between objects, but I believe tags are more comfortable to use.
Hierarchical systems and folders are artifacts of the physical world, in which a single object (tool, pipe, screw, book) cannot be in two places at the same time. In the abstract world of computers a note about a new game could be in #games, #fun, #to-check, #interesting-ideas, #great-graphics, etc.
"Tags fucking suck."
They are literally the worst possible way to store and organize your information, and they are only useful when you just want a random sampling of a category - not a specific document or piece of information. Ex: Great for social media or looking at old photos or just playing a song from a genre you like, bad (fucking terrible) for organization and structure.
---
Hierarchical structures have downsides, but the exact thing you complain about (artifacts of the physical world) is exactly their strength... You have a body that is adapted to the physical world - routing and navigation through a series of ordered steps is a VERY well developed human skill. We are primed to be able to remember things like:
- Go left at the tree,
- Straight until you hit road
- Right at the road
- continue until you hit a red house with a big garden
- etc...
That skill set maps directly onto the hierarchical system of folders:
- Find the "documents" folder on the desktop
- scroll down to "my super sweet project"
- open that folder
- Find the "icons" folder
- open it and double click "exactly_the_thing_you_wanted.jpg"
------
You can absolutely still make horrible, unorganized messes - but if done well (ex: this article is actually a fairly good system) it's a much, much better system than tags.
https://en.wikipedia.org/wiki/A_City_Is_Not_a_Tree
Your brain doesn't organize information hierarchically. Let's say I ask you:
1. Name a band that starts with "B".
2. Name a band from England.
3. Name a rock band.
If your brain stored bands in a hierarchy, you'd only be able to come up with "The Beatles" as an answer for one of those questions. You'd have to figure out whether to categorize the Beatles by name, location, or genre and it would be absent from the other categories.
For instance, there are ideas from OWL where you could define a category in terms of other categories and their attributes: tag D could be the union of tag A and tag B and the complement of tag C.
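A minimal sketch of that kind of derived tag, using plain Python sets. The item names are made up, and reading "the union of A and B and the complement of C" as "(A or B) and not C" is my assumption, not something OWL or the comment pins down.

```python
# Hypothetical tag assignments: tag name -> set of item ids carrying that tag.
tags = {
    "A": {"note1", "note2"},
    "B": {"note2", "note3"},
    "C": {"note3", "note4"},
}


def derived_d(tags):
    """Derived tag D = (A union B) minus C.

    Subtracting C is the same as intersecting with the complement of C,
    without needing to enumerate every item in the collection.
    """
    return (tags["A"] | tags["B"]) - tags["C"]


d_items = derived_d(tags)
print(sorted(d_items))
```

D is never stored; it is recomputed from A, B and C, so it stays correct when those tags change.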
Implication is also useful both as a way to implement subclassing but also containment relationships. For instance on Danbooru a character that has several forms would have the various forms of the character imply that character and the character would imply the media property that the character comes from.
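Roughly how tag implication could work, as a sketch: applying one tag pulls in everything it implies, transitively. The tag names and the `implies` table are invented for illustration and are not Danbooru's actual data or API.

```python
# Hypothetical implication rules: tag -> set of tags it directly implies.
implies = {
    "character_form_x": {"character"},
    "character": {"series"},
}


def expand(applied, implies):
    """Return the applied tags plus everything they transitively imply."""
    result = set(applied)
    stack = list(applied)
    while stack:
        tag = stack.pop()
        for implied in implies.get(tag, ()):
            if implied not in result:  # also protects against cycles
                result.add(implied)
                stack.append(implied)
    return result


print(sorted(expand({"character_form_x"}, implies)))
```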
I am looking at what a tagging system looks like in the transformer age, and one key idea is a kind of three-valued logic around tags, which can be in a “positive”, “indeterminate” or “negative” state. If you are training a machine learning system to auto-tag you will need (1) a number of examples where a tag does not apply (the tag not being applied is not evidence that the tag doesn’t apply; poor coverage of negative examples is one reason why YouTube recommendation is worse than TikTok) and (2) a way to deal with cases where the ML model tags something incorrectly. If the model tagging something puts it in an indeterminate polarity, and that result can later be switched to negative or positive, that is a great way to manage the situation.
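To make the three states concrete, here is a small sketch. `TagStore` and its method names are hypothetical; the one design choice it illustrates is that a model suggestion only ever produces the indeterminate state and never overwrites a human decision.

```python
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    INDETERMINATE = "indeterminate"
    NEGATIVE = "negative"


class TagStore:
    """Hypothetical store of (item, tag) -> Polarity."""

    def __init__(self):
        self._state = {}

    def model_suggests(self, item, tag):
        # A model guess is only a candidate; keep any existing decision.
        self._state.setdefault((item, tag), Polarity.INDETERMINATE)

    def human_decides(self, item, tag, applies):
        self._state[(item, tag)] = (
            Polarity.POSITIVE if applies else Polarity.NEGATIVE
        )

    def get(self, item, tag):
        # None means "no information", which is different from NEGATIVE.
        return self._state.get((item, tag))

    def training_examples(self, tag):
        """Return (positives, negatives); indeterminate items are excluded."""
        pos = sorted(i for (i, t), p in self._state.items()
                     if t == tag and p is Polarity.POSITIVE)
        neg = sorted(i for (i, t), p in self._state.items()
                     if t == tag and p is Polarity.NEGATIVE)
        return pos, neg


store = TagStore()
store.model_suggests("img1", "cat")      # indeterminate until reviewed
store.human_decides("img1", "cat", False)  # reviewer says the tag is wrong
store.model_suggests("img1", "cat")      # does not undo the review
```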
Everywhere where you have a lot of stuff to manage (photos, music, videos, documents, links) hierarchies don't work and only tags can tame all the chaos.
The analogy to "path finding" doesn't hold, imho. That's not how our brains organize information! We organize memories by association and not by some hierarchical structures.
some of the most common reasons
- things exist in multiple categories that aren't in the same branch of the tree
- different state of mind during data retrieval means you expect the same item to be in different categories.
- different humans think the same thing belongs in different hierarchical locations
there's also been a LOT of scientific research around informational organization. It all came to the same conclusion. Hierarchies have interesting promises but fail when they meet the practical reality of the human brain.
in the end hierarchical organization of knowledge is a terrible solution except in VERY restricted cases.
Links are great as part of that too, they can provide shortcuts.
Real-world use: I am an artist, and I have found that the best way to organize my work is with a series of yearly directories. If I begin a large, multi-year project, it goes in a directory within the year I start it; I'll make a link to it that lives next to all the yearly directories.
I also use OSX's tags a ton. Files get marked as 'in progress', 'complete', 'paid for', 'commission', and 'experiment' (and a few other things). When I want to decide what to work on in any particular day it's super easy to open up the saved search for "everything in progress" that I keep on my desktop; this shows me everything in those yearly directories that's marked as 'in progress', whether it's personal work, client work, whether it's part of a large multi-file project with its own folder hierarchy or just a single file in the yearly directory. I also have a saved search for 'commission'+'in progress' for those days when I know I want to work on clearing the commission queue. And whenever I spend some time just fooling around with different effects to create interesting looks, I'll save my scribblings with the 'experiment' tag; when I decide to use it later I can easily tell Illustrator to open a file, and look through the 'experiment' tag to find the file full of some crazy procedural explorations, regardless of how long ago I did it. This habit has saved me hours of digging for that one file where I did that cool trick once.
Trying to organize all the files in my artwork directory with just tags would be a total fucking nightmare, the subdirectory for a multi-year graphic novel has its own folder hierarchy that's several levels deep, and when I know that what I want to work on today is "getting the prepress files together for book 3 of the graphic novel" it's definitely great to be able to just hit the top-level link to the graphic novel directory, then go into "books", then "3", and have its own little file hierarchy in there.
Tags by themselves are not very good for serious organization, but they can be very good for pulling things out of a hierarchical structure. They take work - I have to remember to mark a new file as 'in progress' and possibly a 'commission', though that's become routine, and changing something from 'in progress' to 'complete' is a pleasure. But it's work well worth doing to create a nice little network of shortcuts and secret passages through the terrain of your thoughtfully-laid-out tree of folders.
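The saved searches described above boil down to "every file, anywhere in the tree, that carries all of these tags". A toy sketch of that query follows; the paths are invented, and this is not how macOS actually stores or indexes tags.

```python
# Hypothetical catalogue: file path -> set of tags on that file.
files = {
    "2021/graphic-novel/books/3/cover.ai": {"in progress"},
    "2023/portrait-for-client.ai": {"in progress", "commission"},
    "2022/noise-brushes.ai": {"experiment"},
    "2020/poster.ai": {"complete", "commission", "paid for"},
}


def saved_search(files, *required_tags):
    """Return paths whose tag set contains every required tag."""
    needed = set(required_tags)
    return sorted(path for path, tags in files.items() if needed <= tags)


print(saved_search(files, "in progress"))
print(saved_search(files, "commission", "in progress"))
```

The folder hierarchy stays untouched; the query just cuts across it.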
I find that this skill is better utilized with a system that has hyperlinks like Obsidian.
Also, purely hierarchical systems break down over time; they can be supported with tags. https://karl-voit.at/2022/01/29/How-to-Use-Tags/
> To my surprise, we tend to think in hierarchical categories all the time. As I have written in my article on Logical Disjunct Categories Don't Work, the real world does not fit into disjunct categories.
> Therefore, we should embrace multi-classification more often. If you do want to learn more about the rationale, you may as well read the first chapters of my PhD thesis or the book "Everything is Miscellaneous" by David Weinberger, just to give you two resources of many.
> Long story short: tagging does take away the burden of finding one single spot in a strict hierarchy of entities which is actually a heavily intertwined network of concepts we do find in the real world. It's far from being a neat hierarchy. Everybody who tries to put "the world" into a strict hierarchy will fail.
They created this so the hierarchy is unambiguous (as much as possible), you want a document, you are two steps away from it in an easy to find way.
tag systems have far too much maintenance and adding a new tag is almost impossible to do exhaustively so you have a lot of partial tags.
This isn't a response to the parent commenter's point, right? They were describing how many projects have items where a resource easily fits within the scope of N different categories, at which point they become max N steps away from it, not max 2 steps.
I think there's a good way forward that uses typical hierarchical Johnny.Decimal filesystems, with an overlay filesystem with tags that can update the tags every so often based on the content in the files. Obviously letting the user have a hand in this via a TUI/gui would be helpful for choosing tags for which they're comfortable.
Unfortunately I haven't settled on a good filesystem with tags (how to do this with ZFS?) or how to interact with it as a network filesystem served to many different OS (cifs with tags?).
> you want a document, you are two steps away from it in an easy to find way
This is not how people work in general. This kind of thing might be OK for an institution, or for taxonomy-like collections.
Many think hierarchies come from limits in the physical world but that's not what's happening. Yes, that's some of the cause but does not explain all of it.
The deeper rooted reason is that hierarchies are a convenience to aid the human mind. Even without any limitations of physical shelves, the brain likes to:
- notice the relationships from the general-to-specific and navigate them with spatial cues of dirs parent-->child-->grandchild-->etc
- group related items together -- using spatial cues of moving file icons into a file system folder
The world the blog essay is working in is the OS file system. The various files have to be put somewhere on the file system. Since putting hundreds/thousands of files into a single flat folder is useless, one creates some child subfolders to organize it in some way.
The tagging system assumes a different mechanism (e.g. a separate "database" of tags which filesystems like Microsoft NTFS and Linux ext4 do not have natively.) This happens above the native filesystem. (Incidentally, by placing a file into a subfolder, the name of that folder and the names of parent folders above it act as an "implied set of tags" for free.)
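The "implied set of tags for free" point can be shown in a few lines: the folder names between some root and the file are already usable as tags. The root and path below are made up, and lowercasing the names is just one possible convention.

```python
from pathlib import PurePosixPath


def implied_tags(path, root):
    """Return the folder names between root and the file, as lowercase tags."""
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    # .parent drops the file name; .parts splits the remaining folders.
    return [part.lower() for part in relative.parent.parts]


print(implied_tags("/home/me/Documents/Projects/Icons/logo.jpg", "/home/me"))
```

The catch is that a file gets exactly one such tag set, which is the limitation the tagging side of this thread keeps pointing at.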
That said, both hierarchical folders and tags solve different needs. Also, hierarchies simulate/approximate "tags" by "virtual folders" and 1-to-n softlinks. Likewise, tagging can simulate "hierarchies" via compound-multi-word-tags.
Of course the other problem with tags is management. Placing something into multiple relevant categories involves more effort. Failing to place something into a relevant category makes it harder to find since you are now dealing with either a flat file namespace (worse yet, a disorganized one) or a flat tag namespace. In theory, some of this can be handled by letting someone else handle the tags (e.g. the creator, the publisher, or the seller), but that has its own problems since there is frequently a conflict of interest (e.g. irrelevant tags are applied to increase the visibility of a product).
At the end of the day, we have to accept there is no perfect system of categorization. Some will prefer hierarchies. Some will prefer tags. From the tone of the article, it is clear that they prefer hierarchies.
I’m the Johnny who wrote Johnny.Decimal and this is basically it.
The OP clearly isn’t one of the people for whom finding JD is a massive mental relief. I know those people exist: they write and tell me.
Others find the idea baffling. Stupid, even. That’s fine. If this helps you, enjoy it. If it doesn’t, use something else.
Grep is self-explanatory. Linking works like hard links in Unix, where the same note appears as a child of multiple different parents (added a command to find "orphans" in case you unlink it from everywhere).
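A rough sketch of that hard-link style of linking, with an orphan finder. `NoteTree` and its methods are hypothetical names, not the commenter's actual tool, and the orphan check is deliberately simple: it only reports notes with no parent at all.

```python
class NoteTree:
    """Notes that may appear under several parents, like Unix hard links."""

    def __init__(self):
        self.children = {"root": set()}  # parent name -> set of child names

    def add(self, name):
        self.children.setdefault(name, set())

    def link(self, parent, child):
        self.add(parent)
        self.add(child)
        self.children[parent].add(child)

    def unlink(self, parent, child):
        self.children[parent].discard(child)

    def orphans(self):
        """Notes (other than root) that are no longer a child of anything."""
        linked = set().union(*self.children.values())
        return sorted(
            name for name in self.children
            if name != "root" and name not in linked
        )


tree = NoteTree()
tree.link("root", "games")
tree.link("root", "ideas")
tree.link("games", "zelda")
tree.link("ideas", "zelda")   # same note, second parent
tree.unlink("games", "zelda")  # still reachable via "ideas"
```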
At this point I might not even bother adding tags.
What you described with hard links is exactly how I use tags, so that would satisfy my need for tags as an organizational tool.
As an example, if you’re organizing your toolbox, you don’t mark a drawer “hand tools” because it’s a useless categorization. You mark one “socket tools” which will include everything from the sockets and wrenches themselves to adapters that connect a socket to an impact wrench (but an impact wrench does not go in there because it is not exclusively a socket tool). If it really does come down to something that may really fit in two categories (hey, there’s always exceptions), you put your mindset in the place of yourself when you want to look it up: what’s the most common situation in which you’ll be looking this thing up?
* Allocate space up front in the form of containers
* Position containers around workspaces
* Use containers appropriate to the type of object and its use (e.g. "rounds in rounds" - put round bottles on turntable racks so you can spin to access)
* Duplicate objects you need to use in multiple locations, e.g. scissors for the kitchen and for the office
* Label spaces where things belong
And the key thing to it is that this isn't a hard rule like always organizing hierarchically or always labelling. The hierarchy helps compress space(that's why books and folders are powerful) and the labels help define uses, but in many instances, the level of organization you need is an open bin with some dividers - the drawer organizer, cube storage, cardboard box, book bin, cafe tray etc.
Computer file systems are somewhat resistant to unlabelled open-bin storage because that means you're allocating with less precision, but I think everyone in practice knows that they will shove things in "Documents" or "Downloads" and just periodically purge it.
Edit: Some corrections. I forgot to mention which OS: GNU/Linux and/or BSDs.
https://github.com/jbruchon/jdupes
Though that works fine from a script perspective I'd like some more interactive way of sorting directories etc. Identifying is just the first step, jdupes helps with linking the files (both soft and hard links comes with caveats though!) but that is mostly to save space, not to help in reorganisation.
I've tried many, but rmlint is the most flexible and reliable. Esp. the tagging works really well.
And in the mean time, all my stuff is searchable, browsable, findable, and tidy.
I'm not saying it will work equally well in all environments or for all purposes, but for mine, it solved many years' worth of stress.
This is an important point. A person’s interests and areas of responsibility evolve over time; so refactoring is not only permissible; it’s probably also helpful to unload accumulated organizational cruft that’s no longer relevant.
When it comes to indecision about where a file goes, I’ll often just place a .txt file in the “wrong” location pointing to the correct spot. Or an alias.
(I'm not a user of this; just guessing)
He does seem to address this at least somewhat[0], but the justification is so flimsy it's hardly worth addressing. In essence, he doesn't like alphabetical ordering because the index can change when something new is added. He would prefer new folders to be inserted at the end of the list. He is evidently unaware that folders can be sorted by creation date.
[0] https://johnnydecimal.com/10-19-concepts/11-core/11.02-areas...
It forces you to whittle your categories down to ten (and sub categories). I would argue that in and of itself is a useful constraint.
It's basically the same as the described system except you are forced to categorize your files even more severely since every level of the hierarchy only allows two subcategories.
It must therefore be superior, right?
The 03.65 like naming can indeed be switched out for something with words, but I believe the best of both worlds is to make the words "unix-like", i.e. small, and explanatory.
For instance *~* 10 main directories (code, doc, vid, etc) with *~* 10 subdirectories (note, tv, movie, etc) is nice to try to fit your data into, but if one of the subdirectories has only 8 things, it's not the end of the world. This tends to work extremely well for "longer term" storage (a drive mounted beside your OS for data when 'finalized' or 'semi-finalized') but the mess of OS and everyday files isn't as appropriate for it.
I use SimpleNote a lot for JD content and put the category in the title of each note. I type a piece of the JD number in the search box and it instantly filters down to relevant notes. Sort by title sorts by topic.
And yet, we've all been to a library. Information organized by topic, then by author, and inside the books everything is further organized into chapters, and then there's an index referencing all of that (plus a card catalog/search system).
I use something similar to the Johnny Decimal system described at work, except the high level is by project not by topic. I find chronological filing split into projects (i.e. chunks of time/money/effort) matches my workday better.
Companies run on mental models that are occasionally partly solidified (and ultimately ossified) in a textual format.
Seems like better tech could improve this.
Arguably Whatsapp history already has, because at least stuff tends to collect in one place and be searchable, as opposed to being on desktops sent to individuals on request and forgotten.
Start -> Open Folder -> End
In reality, natural language uses synonyms that often start with different letters. So without numbers, I still need to scan every directory one by one. With numbers, I assign categories according to the phase of the process in which the item occurs. For example,
1 plans
|- A first draft
|- B Lisa's notes
`- C design
2 analysis
|- A exploratory
`- B design implementation
3 deliverables
|- A May 2023 report
|- B June 2023 presentation
`- C August 2023 report
I can limit my search to folders and items that are in the low/medium/high range, according to what I am looking for. But alphabetically sorted, this directory structure would look much more ad hoc:
analysis
|- design implementation
`- exploratory
deliverables
|- August 2023 report
|- June 2023 presentation
`- May 2023 report
plans
|- Lisa's notes
|- design
`- first draft
Way back in 2010 or so I published a series of instructions for the 36th Wing that followed this kind of naming/information numbering convention which was frustrating to fit into, but ultimately once you understand the framework it's faster to write.
That isn't to say it isn't confusing and complicated - which happens to everything at scale - simply that this kind of structure for documentation is pretty common and literally battle tested.
[1]https://www.esd.whs.mil/Portals/54/Documents/DD/iss_process/...
But there is a good reason why I navigate to news.ycombinator.com and not 209.216.230.240.
For digital resources like URLs or file systems, using numbers as prefixes or primary IDs only makes sense if their ordinal values represent the most important and intuitive way to browse through the hierarchy.
But in most cases, the name rather than the number is the most important thing, and it's very easy to sort or filter by name -- whereas sorting or filtering by number is only useful if there's an inherent ordering (e.g. date modified) to the numbers.
Names can also be difficult if not done correctly / uniformly. For instance, "Category Name", "CategoryName", "category_name", and "category-name" can all return differently through search.
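One way around the naming mismatch is to compare names in a normalized form. The sketch below assumes a made-up convention (lowercase words joined by hyphens); the only point is that all four spellings collapse to the same key.

```python
import re


def normalize(name):
    """Collapse spacing, underscore, hyphen and CamelCase variants to one key."""
    # Insert a space at lower-to-upper boundaries: "CategoryName" -> "Category Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Treat whitespace, underscores and hyphens as equivalent separators.
    words = re.split(r"[\s_\-]+", spaced.strip())
    return "-".join(word.lower() for word in words if word)


variants = ["Category Name", "CategoryName", "category_name", "category-name"]
print({normalize(v) for v in variants})
```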
I don't think the key is names vs. numbers vs. whatever else, I think it's more important to pick a system that works for the use case, then define / document / communicate it as wide and loud as possible.
What an opening sentence.
This method of only being two levels deep is interesting. If it works, that's great, but there's nothing to stop you from going three if required, e.g. 10.20.30. But keeping everything constrained has value in itself, if only in that it forces you to think in larger discrete chunks.
Any sufficiently complicated library management system contains an ad-hoc, informally-specified, bug-ridden, inconsistent implementation of half of the Dewey System [2].
[1] https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule
[2] https://en.wikipedia.org/wiki/Dewey_Decimal_Classification
It's also not informally specified. The shared link is literally the specification document. It's written in a kind of informal style, sure, but that's a different kind of informal - Greenspun's informal means "not written down at all".
My top level relations:
* Fun: Sex, drugs, rock & roll
* Home: Rent, buy, interior, yard, cars, places
* Meta: This system
* Mind: Philosophy, language, math, art, music, science
* Money: Accounts, investments, Bitcoin
* People: Family, friends, everyone
* Self: Fitness, health & illness, spirit, food, fashion
* Tools: Computing, devices, productivity, maker, crafts
* Work: Career, job
Roget's original thesaurus, which divides every word into 6 (or something) top-level relations was also an inspiration.
These are my root items in Workflowy (with its infinitely nested bullets).
I star active projects so they show up in the sidebar. I shift-drag (to mirror) items out of projects into the root (above the relations) to serve as my daily todo list. All in all, simple, efficient, and comprehensive.
I have a template with empty folders and files (like Notes.md, Todo.md, etc.), and I can copy-paste this template for each new project. As long as I improve my template, every future project will have the new structure.
It's like the GTD system (which I also enjoy), but for organizing your thoughts, notes, and files in different projects. It's weird because I'm not fond of naming folders with numbers, but this time it seems to work. Every project has the same structure and I'm not lost. I guess it's good for people who need a serious structure, as it forces you to have a good organization.
Interestingly, I had a boss 10 years ago that was using an equivalent method with a template and numbered directories. He was successful at managing projects and I think I discovered his secret.
Last but not least, once a project is done, I can zip it and reuse its number.
It helps with two things:
1. A little easier to be consistent across projects so as not to reinvent the wheel every time
2. The prefix increments as new folders are added during a project, painting a convenient picture of “progress” as things move along.
We tend to have: 10 to 19 reserved for admin stuff, like Admin, Incoming, Outgoing, Documentation, Meeting notes, etc.
Then anything from 20 onwards is ad-hoc per project
We also timestamp children of Incoming and Outgoing, with an ISO prefix. This is very useful to keep track of what was received and shared and when.
Overall the goal is to have as little protocol as possible to prevent total chaos. Anything more than that is usually too much to ask or doesn’t stick longer than a single project.
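The reason an ISO date prefix is handy: plain alphabetical sorting of the names is also chronological sorting, so any file browser shows the items in order. A small sketch with made-up entries ("Kickoff" is invented for the example):

```python
from datetime import date


def dated_name(day, label):
    """Build a folder name like '2023-09-01 Estimate'."""
    return f"{day.isoformat()} {label}"


names = [
    dated_name(date(2023, 10, 12), "sender, subject"),
    dated_name(date(2023, 9, 1), "Estimate"),
    dated_name(date(2022, 12, 30), "Kickoff"),
]

# Lexicographic order equals chronological order for zero-padded ISO dates.
ordered = sorted(names)
print(ordered)
```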
10. Admin
11. Incoming
2023-10-12 sender, subject
12. Outgoing
2023-09-01 Estimate
13. Documentation
20. Design
30. Production
40. Blah
I hate it.
The problems I have with it (some of them implementation details that can probably be fixed)
- On smaller projects, you have a big directory tree of nothing, with maybe a quarter of the directories being populated. This is because it starts from a template.
- You tend to get long directory paths, enough to get over MAX_PATH in some instances, don't fit in a single line, etc...
- Remembering arbitrary numbers is hard. Try using arbitrary numbers in your code for your variables, I am sure it will be appreciated...
- And especially when there are several number based systems in place. So you have the software version number, the ticket number, the number system used by your customer, etc... Do you really want another number system on top of that?
- The article says there is no overlap. There is never "no overlap" in the real life. For example, as a dev, I should have nothing to do in the "sales" folder, except that the technical specifications are here because they are part of the contract. It really belongs in both "sales" and "dev".
- I still use search as my primary tool.
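On the MAX_PATH point: the classic Windows limit is 260 characters including the terminating NUL, so a quick check over a list of candidate paths is easy to sketch. The function name and sample paths here are made up.

```python
MAX_PATH = 260  # classic Windows limit, counting the terminating NUL


def too_long(paths, limit=MAX_PATH):
    """Return the paths that would not fit in `limit` characters plus NUL."""
    return [p for p in paths if len(p) + 1 > limit]


short_path = "C:\\10-19 Admin\\11 Incoming\\a.txt"
long_path = "C:\\" + "x" * 260
print(too_long([short_path, long_path]) == [long_path])
```

Running something like this over a freshly copied template tree shows how much of the budget the numbered prefixes already use before any real file names are added.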
Note that someone mentioned the military. I have worked on defense contracts, they are the worst. Acronyms and codes everywhere, I guess they are too special to name things with regular words. And I am talking about the unclassified stuff, it is even worse when confidential information is involved: "The name should follow the ZB4455 convention, ZB4455 is in document L45.34c, can I have L45.34c? No it is classified, but actually, it just means it should be lowercase and start with an underscore." So I wouldn't take what the military does as a good example.
Besides, if you named everything with regular words your poor MAX_PATH would be working overtime. There's a time and a place for abbreviations and codes, and if a multi-theater, technically-advanced military force with global and extraplanetary reach isn't one of those times and places, then I don't know what would be.
But I do agree with you about assigning an arbitrary number to a project. 773.0034 is not that helpful a descriptor and I wouldn't want to see a whole "Downloads" folder full of those. But it does help you find things quickly.
Neither here nor there but god I need a drink. And it’s only 8am. Reading that sentence reminds me of the nationalistic radio broadcasts mentioned in A Canticle for Leibowitz before everyone is annihilated in nuclear fire…
Interestingly, we have a similar "BASIC line numbering" system in our company. Allows for easy traversing the directories if you can remember the numbers (I cannot), such as "05_Contracts/15_Employees/041_John_Doe/07_Testemonies".
I like how simple the core concept is explained, but I feel it would box me into categories when I like tags more (categorizing items in multiple orthogonal domains). OTOH maybe well thought-out categories would bring more structure than tags.
My current notes strategy is to prefix the date to markdown filenames (for example '2023-05-31 canvas scan transform matrix.md') and put them into a single dir. These are active journal-style notes that I'm free to update over the next days while they are still in focus. Every few weeks the list of notes gets busy and I 'archive' older notes into sub-dirs (personal, hobby project, work project) and back up the whole structure. The method requires minimal maintenance and the full text search works well for my needs.
Edit: I like how the author leverages the CLI auto-completion and I try to do the same, but I think Johnny would work against my brain. When naming the directory or a script, I put myself in mind frame where I'd want to use it and I'm trying to recall its name. So I give semantic names like 'build-android.sh'. If it's a new thing I try to come up with a short catchy name for it. Having to recall the `10-19` category each time I want to access specific subscope seems like too much cognitive burden. Just theorizing, haven't given it a shot so far.
> An important restriction of the system is that you’re not allowed to create any folders inside a Johnny.Decimal folder.
This being said immediately after a screenshot with three levels of directories confuses me. One problem I immediately identified with this system is that I would have to take extra steps to peek into the applicable directory to see what the current index is...
I'm always looking for a good organizational methodology. This seems to be per project, no? Any suggestions for a system for overall data organization?
Me too, but reading more I understand this now. A "Johnny.Decimal folder" is a folder that starts with a name like 12.04, meaning it represents a unique item. It will already be inside two other folders, the 12 folder and the 10-19 folder. The point is that while 12.04 can be a folder if the unique item is actually multiple files, you can't have more folders inside 12.04, because that's considered too much nesting.
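The nesting described above can be written down as a tiny function: given an ID like 12.04, work out the category folder and the area folder it sits in. The exact folder-name format (for example "10-19" with nothing after it) is an assumption for illustration, not the site's specification.

```python
import re


def jd_path(item_id):
    """Split an ID like '12.04' into (area, category, id)."""
    match = re.fullmatch(r"(\d\d)\.(\d\d)", item_id)
    if match is None:
        raise ValueError(f"not a two-digit.two-digit ID: {item_id!r}")
    category = match.group(1)
    tens = int(category[0]) * 10          # 12 -> 10, 64 -> 60
    area = f"{tens:02d}-{tens + 9:02d}"   # 10 -> "10-19"
    return area, category, item_id


print("/".join(jd_path("12.04")))
```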
> This seems to be per project, no? Any suggestions for a system for overall data organization?
Multi-project organization is covered later on: https://johnnydecimal.com/10-19-concepts/13-multiple-project...
I do allow myself subsubdirs wherever it makes sense though. E.g. right now I have a file browser open to "64.05 TV Shows" (60 - 69 is "Media"; 64 is "Video"), and within 64.05 I have one subdir per TV show. I don't feel obliged to give each show a special number, and I also don't feel troubled by each show being a sub(sub)dir. This system is searchable and browsable within my tolerances.
The other day I looked at my DEVONthink database I’ve populated over the last 15 years or so, and what do ya know. It has a couple dozen top-level folders, each with a handful of folders inside, and that’s about it. I didn’t deliberately set out to do this, but “Banking/{Bank1|Bank2|Bank3}”, “Medical/{Me,wife,kid}”, “Taxes/{2020,2021,2022}”, and so on evolved that way anyway.
I love the idea of tagging, but turns out nearly all the information I care to store long-term can be filed more easily than it can be tagged. It’s rare that I want to have the same doc in 2 places, mainly limited to when I’m collecting information to send to someone else (e.g. filing taxes, applying for a business loan). When that happens, I just - shocker! - make copies of those docs in a new folder I’ve created to collect everything I need. DEVONthink makes the copy a zero-sized reference to the original doc and gives each copy a special icon so you know it’s a duplicate.
So basically, Johnny Decimal couldn’t possibly work for me, and yet I ended up with a sad version of the exact same thing on my own naturally. Well, huh. Maybe it’s not so silly after all.
(Also, regarding tagging: the idea of a database with a few tens of thousands of files in the same namespace, searchable by tagging, gives me hives. I know people do this all the time, and it’s a “me problem” that it bothers me, but oh, how it bothers me.)
My organization also evolved to a simple hierarchy over time, but the fact that files can live in several directories at the same time is very useful in some cases. When there is ambivalence where a file should go, it can just go in two places – but it‘s not a duplicate, so you don‘t run into uncertainties which one is the latest version, etc. So it’s a bit like tagging (which in DT you can additionally do), but also not quite…
Plus, it's so freaking good at finding stuff wherever you might have happened to have squirreled it away.
Let the experts handle this stuff. How many times have you found some super important production piece being handled in a disaster of Excel and 400 different versions all named ridiculous things, and nobody knows which is the right one to use? Why? Because they didn't bring software development in soon enough.
0: Librarian is our commonly understood word for the broad profession of information management, but the experts tend to have many different job titles for their discipline; get a subject matter expert (I'm not one) to help you track down the right job title for your specific project.
Firstly, I strongly recommend just reading up on Dewey Decimal[0] (which is what JD cribs almost everything conceptually from), there's a decent explanation about it on Wikipedia. Should help you "get" the categories you might want to make a bit more.
Secondly, don't marry yourself to JD's limitations. The site likes to evangelize about some things that really aren't as important as you might think. Feel free to ignore something if it doesn't work for you - in particular the "no subfolders" rule might just... not be worthwhile to follow.
Personally I've always pretty much ignored this rule - if you look at Dewey, the left hand of the number is meant to be a classification for the broad category while the number on the right is meant for the broad project. In other words, applying a decimal organization system to specific files? Yeah not what it's meant for, don't do that.
Even in a library, where Dewey is used, an individual book's Dewey Classification isn't actually unique to that book. For example, all books on MySQL will have the same Dewey Class.
Build it as a system that works for you, don't try to forcefully refit your system to match the explanation of this website. Also, don't use it for small projects. That'll just make it a bigger mess than it's worth. Stick a small project in a bigger folder system, it'll work way better that way.
As for mental mapping - keep a readme file to just list the broad categories in the top of the structure, it'll help a lot. The site recommends spreadsheets but really, that's wayy overkill and will just cause dumb overhead each time you have to add a file.
[0]: https://en.wikipedia.org/wiki/Dewey_Decimal_Classification
“ Dewey Decimal[0] (which is what JD cribs almost everything conceptually from)”
seems a little uncharitable! It’s pretty openly a specialization/variation on DD, and I’d be surprised if many people on here (or really in the culture at all) weren’t vaguely aware of DD from their school days. So “crib” seems a little pejorative imo
Re: substance, I’d be interested in a clarification if you find the time: why do codes for individual files bother you so?
You need to differentiate them somehow, and the first pure-DD solution I found doesn’t apply at all:
“ we also add to the end the first three letters of the author's last name (or, if no author is given, then the first three letters of the title). In our example, the author is James Brock, so BRO is added to the end of the Dewey call number to get 595.789/BRO.” - https://www.oakland.edu/Assets/upload/docs/SEHS/ERL/Document...
It just seems plainly helpful to have numbers before files, especially for ones that you’ll be returning to and/or recreating for other projects a lot, e.g. documents within your usual project management system.
The problem is that rather than being descriptive (as in "this works for me, see what works for you"), lots of these organization guides are prescriptive, which helps pretty much only the person who wrote them to begin with. It gets really grating after a while, especially if they offer things like templates that are a pain to actually refit for personal use. (Which to be fair, JD doesn't do, but the author very clearly has that type of workflow in mind - older versions of the JD website straight up recommended using airtable for organizing stuff, template iirc included.)
My annoyance with numbering individual files in JD in this case is pretty much the result of "nobody else works in your Dewey decimal system". Like, start working with any kinda enterprise-y management tool and you'll very quickly learn that a lot of software is not written with JD in mind, because it assumes control over an entire folder and organizes it in a way that makes sense to it. That problem often compounds when you start receiving external files that are a folder of dependencies with one file you can open in the aforementioned tool. Yes, you can often spend time editing the internals to "correct" that document to the Dewey decimal system, but that creates extra overhead and can also sometimes gravely annoy the other person if the document has to be sent back and forth a couple of times.
In that case, it's just way more straightforward to assign a unique ID to the parent folder instead of spending upwards of 30 minutes fiddling with every incoming file.
As for adding author last name - that's just for shelf organization in libraries, libraries sort all books on author/title alphabetical level. DDC just adds another organizational layer on top of that for scientific books (most fiction and (auto)biographies usually ends up organized outside Dewey entirely for practical reasons). You can have multiple 595.789/BRO in a single library (dictionaries for example with multiple books will have the same DDC).
One organizational system many programmers may appreciate is keeping your git/GitHub repos in the same place, under `.../g/<username>/<reponame>`. Huge fan of this method.
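A rough Python sketch of that layout, in case it helps. The `repo_dir` helper and the `base` argument are made up for illustration (the comment elides whatever sits above `g/`), and the regex only handles the common `https://host/user/repo` and `git@host:user/repo.git` remote shapes.

```python
import os
import re

# Last two path components of a remote: ".../<user>/<repo>" or "...:<user>/<repo>.git".
REMOTE = re.compile(r"[:/](?P<user>[^/:]+)/(?P<repo>[^/]+?)(?:\.git)?/?$")

def repo_dir(base, remote):
    """Map a git remote to <base>/g/<username>/<reponame>."""
    match = REMOTE.search(remote)
    if match is None:
        raise ValueError(f"cannot find user/repo in {remote!r}")
    return os.path.join(base, "g", match["user"], match["repo"])
```

The result can be handed to `git clone <remote> <path>` so every checkout lands in a predictable place.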
Probably just missing something obvious!
For instance, in the site's own structure, we have
11-core/11.01-introduction
But that would leave two digit categories at the top level. The top level is organized by groups of ten and so we need 10-19-concepts/11-core/11.01-introduction
One question is what if 10/11 gets more than ten items, so there is a 10/11/11, 10/11/12? Isn't there a division into ten needed there?
If the bottom level never goes beyond 00-09, the zero is redundant. It's actually a three level system with a branching factor of 10, and you might as well just have
10-concepts/11-core/1-introduction
I would just have 10/11/1
and have symlinks concepts -> 10
10/core -> 11
10/11/introduction -> 1
Using the numbers as prefixes for the symbolic names means that someone who remembers the symbolic name but not the number cannot use tab completion nicely. They have to use tab completion to scan the entire directory level, then type the number, then tab complete again.
Symlinks going from symbolic to numeric is probably the right direction. The OS symlink resolution then teaches the users what the categories are:
$ realpath --relative-to=. concepts/core/introduction
10/11/1
There could be accelerator symlinks at the top level: 11.1 -> 10/11/1
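For anyone who wants to try the idea, here is a minimal Python sketch of the layout above, assuming a POSIX filesystem where unprivileged symlinks work. `build_tree` and `numeric_path` are names invented for this example, not part of any tool.

```python
import os
import tempfile

def build_tree(root):
    """Create 10/11/1 plus the name -> number symlinks described above."""
    os.makedirs(os.path.join(root, "10", "11", "1"))
    # Symbolic names point at numeric directories, never the other way round.
    os.symlink("10", os.path.join(root, "concepts"))
    os.symlink("11", os.path.join(root, "10", "core"))
    os.symlink("1", os.path.join(root, "10", "11", "introduction"))
    # Accelerator at the top level: 11.1 -> 10/11/1
    os.symlink(os.path.join("10", "11", "1"), os.path.join(root, "11.1"))

def numeric_path(root, symbolic):
    """Resolve a symbolic path to its numeric form, relative to root."""
    real_root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root, symbolic))
    return os.path.relpath(target, real_root)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        print(numeric_path(root, "concepts/core/introduction"))
        print(numeric_path(root, "11.1"))
```

Both lookups resolve to the same numeric directory, which is the "teaching" effect: type the name, get shown the number.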
Now you get the full benefit. If you remember that introduction is 11.1, you actually have that as an instantly navigable identifier in the system.
I’m not following this (and thus, I think, your entire point). I think you might be slightly misunderstanding something: the files inside a category (11-core in the example) would never have a prefix other than the category - 10/11/11 is the only option - 10/11/12 would be breaking the system.
Once you’re inside a category, there is no division into 10 anymore. The 11 category would allow documents from 11.01 to 11.99. And as I believe is mentioned in the spec, if you need more than .99 you likely have too broad of a category or area.
For what it’s worth, I’ve used this system at work and in my own notes for around 2 years and haven’t run into this problem (yet).
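To make the numbering concrete, here is a small Python sketch of how an ID like 11.01 breaks down. `parse_id` is a hypothetical helper, and accepting `.00` is my assumption (the comment above only mentions 11.01 to 11.99).

```python
import re

# Two-digit category, a dot, two-digit item: e.g. "11.01".
ID_PATTERN = re.compile(r"([0-9])([0-9])\.([0-9]{2})")

def parse_id(text):
    """Split an ID like '11.01' into (area, category, item number)."""
    match = ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a category.item ID: {text!r}")
    tens, units, item = match.groups()
    area = f"{tens}0-{tens}9"   # the group of ten the category belongs to
    category = tens + units     # no further division into ten below this
    return area, category, int(item)
```

So `parse_id("11.01")` gives `("10-19", "11", 1)`, and something like `"11.100"` is rejected, which is the "your category is too broad" signal.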
I rely on it a lot for my personal data and projects. The simplicity and constraints have a positive impact on the usability of the organized information
At least based on my priorities from five years ago.
To me though the overwhelming benefit of the process is the act of bucketing. Another strategy then would be to bucket down to 8 categories instead of 10 — like line numbering in BASIC you allow yourself a bit of space if needed in the future.
I'm pretty sure Microsoft will integrate LLMs to automate file naming, and I hope other systems follow suit.
More interestingly, LLMs will easily organize data hierarchically based on the contents. I hope this becomes a reality this or next year.
I hate manually organizing a filesystem.
e-Discovery applications like Relativity have been doing this for years. You run a PCA against a bunch of OCR'd documents, look for correlations between words or phrases within documents, look for repetitions of those particular correlations, call them 'issues' or 'motifs' and slap a label on them. Attorneys used to use it to scan millions of documents in a discovery set and auto-flag them for possible privilege issues for further review, and even automatically mark them as such.
When he first got a computer, back in Windows 3.11 days, it only seemed natural to use what he was familiar with. So he would store documents and emails in directories based on the Dewey decimal system.
However a problem quickly arose. A document might pertain to multiple topics. With index cards this was simple, you just noted the book or document on each of the relevant index cards.
With files however it was less clear. The only way he found was to save the same file in multiple directories. With the obvious nightmare of keeping it all in sync.
It got somewhat better when I taught him how to make shortcuts to the documents, but still...
[1]: https://en.wikipedia.org/wiki/Index_card
[2]: https://en.wikipedia.org/wiki/Dewey_Decimal_Classification
And I have so far always found important emails (notably because important topics are easily found in email chains and, far more often than not, in the dedicated meeting report).
Structuring data is cultural so you should rather learn to use the system used by your organization. Only super small teams and solo-founders need to think about how to store data. Most workers should follow their community to let other people find the information.
Folders, drawers, and cabinets have been around for 3 centuries at least and imho, we are not gonna reinvent the wheel with this or that way to structure information.
The whole point of Johnny.Decimal is that most organizations have absolutely no system to organize information. It’s tossed into a huge pile.
Even organizations that have systems concern themselves only with organization-wide needs. Individuals still have needs that the organization does not address.
"It’s very unlikely you will end up with a hundred categories." -the page
Exactly this will result in about 20-30 folders for most, with any real amount of documents some folders might hold 100-1000 docs.
The advice you should take from this is that forcing structure is useful. Look at large code repos for example.
(Also, it would force me to consider ... do I need 1000 files here? I've certainly been known to join related documents into a single PDF, Uber-document, if you will.)
It really grates on me when people offer solutions that work for them, as if they will work for everyone.
No.
My top level categories are `inbox` (stuff that isn't sorted yet), `Media` (stuff that other people made), and `Vault` (stuff that I made).
`Media` contains `Audiobooks`, `Books`, `Courses`, `Films`, `TV`, `Music`, and `Broadway`.
`Vault` contains `Backups`, `Projects`, `Audio`, `Video`, and `Photos`.
Anything one layer deeper is either a file of the type described by the parent folder name or a folder containing related files (ex: `Video/2023-06-12 makers.dev 119` is a folder containing the raw recordings and processed end video and audio for my podcast).
I've got about 10TB and tens of millions of files organized in this system. It works better than anything else I've tried.
It's still not perfect, because ultimately the subcategories of "shared" need to actually be accessible, or mirrored, or it's not actually true. And sometimes, a project goes into "shared" aspirationally, even if I have no collaborators yet, as a subtle reminder that I might share it someday, so I don't want to put anything in that folder that I'm not comfortable being public or semi-public.
Backporting old docs to this system is a real chore and honestly, I haven't been very disciplined about that part, besides moving old Project folders under the top-level Projects folder. But this is always going to be an issue with any new filing system, and I don't think there's a lot of value in doing it. Maybe would be an interesting programmatic exercise. But I, hotsauceror at his keyboard, am NOT going to go and retroactively assign a 753.0026 etc identifier to every document lol...
My rough, rough hierarchy is as follows:
100 - Administrative
- 110 Interview Notes
- 110.001-eng-john-smith.md
- 120 Onboarding
- 130 Performance
- 140 Training + Certification
- 150 Travel + Expense
200 - Analysis
- 210 Code Review
- 220 Performance Tuning
- 230 Technical Specs
300 - Documentation
- 310 HOWTOs and Runbooks
- 320 Technical Specifications
- 330 Environment
- 340 Processes
400 - Meetings (this is a catchall)
- YYYY-MM-DD-annual-project-plan.md
- YYYY-MM-DD-budget.md
- YYYY-MM-DD-new-policy-rollout.md
500 - Operations
- 510 Stack #1
- 510.001-turn-it-off-and-back-on-again.md
- 520 Stack #2
- 520.001-reset-proxysql-after-network-partition.md
- 530 ...
600 - Troubleshooting (another outlier)
- yyyy-mm-dd-stack-2345-bad-plan
- yyyy-mm-dd-stack-1234-cpu-peg
- yyyy-mm-dd-stack-3456-non-yielding-scheduler
700 - Projects
- 701 Project 01
- 702 Project 02
- 703 Project 03...
800 - Reports
900 - Training
- 901 Brown Bags / Lunch+Learn
- 902 Terraform Certification
- 903 AWS Certification
I have recently added a 000 - Logs folder for places like coding journals, another trendy suggestion that pops up here on HN from time to time that I may or may not stick with...
Use this for your own files where no one else has to find anything.
Avoid reorganizing other people's files.
If you do the organizing, it may make sense to you, but may not for other people.
Adding the decimals has the primary benefit of nothing being recognizable from before, so that new brain maps can be made, not horribly and painfully mangled, warped and twisted from the old maps.
If you have to navigate one of these systems and you didn't create it, use search and hope files are named well, and hope the creator didn't go overboard with making folders. Otherwise, welcome to a little hell of clicking into a million empty folders and never being able to find anything.
Has anyone mentioned Aristotle yet? His abstraction of categorizability works, but is so obviously wrong once you have to accomplish any practical task.
For us, organic folder structure development for as long as possible, or avoiding folders as much as possible is better. Then, some intelligent and pragmatic decision making, and no hard and fast rules. We are human friendly first, where file systems are primarily intended for human navigation.
But I just enjoy the speed of feeling like I can cd to any directory at any time in like... 8 keypresses (`jd 20.21` is an alias I use to cd).
https://github.com/bpevs/johnny_decimal
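Not how the linked repo does it (I haven't checked), but the lookup behind an alias like `jd 20.21` could be sketched in Python roughly like this. `find_dir` is a made-up name, and it assumes folders are named `<ID> <title>` as in the hierarchy in this comment.

```python
import os

def find_dir(root, ident):
    """Return the first directory under root named '<ident>' or '<ident> <title>'."""
    for parent, dirs, _files in os.walk(root):
        dirs.sort()  # walk in a stable, numeric-prefix order
        for name in dirs:
            if name == ident or name.startswith(ident + " "):
                return os.path.join(parent, name)
    return None  # no folder carries that ID
```

A shell function would still have to do the actual `cd` on the returned path, since a child process cannot change the parent shell's directory.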
Edit: I had a separate hierarchy I used on my work machine when I was still working at a larger company, but this is the one from my personal machine (with some redacted)...
10-19 Notes
  10 Quick [Daily-life kind of stuff]
    10.01 Daily Notes
    10.02 Cooking
    10.03 Listening Notes
    ...
  11 Research
    11.00 Device Setup
    11.01 Project Name 1
    11.02 Project Name 2
  12 Reference [Basically categorizing random notes]
    12.00 Unsorted
    12.05 History and Current Events
    ...
    12.28 Spatial Audio
    12.29 Music, Cognition, and Computerized Sound
  13 Travel
    13.01 中文
    ...
    13.10 Maps
  18 bpev.me
  19 Documents
    [Various documents here]
20-29 Projects [Active Projects]
  20 Code
    20.00 gists
    20.01 bpev.me
    [insert projects I am committing to often]
  21 Media
    21.01 Music
      [insert Music album work here]
30-39 Archives
  30 Code
    30.03 favioli
    30.04 johnny_decimal
    .....
    basically, maintenance-mode projects.
    If I start committing on a more regular cadence, I move to `20 Code`
  31 Media
    I have a separate, date-based hierarchy within these...
    31.01 Music
    31.02 Photos
    31.03 Videos
    31.04 Memes
    31.05 Screenshots
  39 Backups
    39.01 Contacts
    39.03 bpev.me
    39.04 Savefiles
    39.05 Applications
How does this handle inter-project files? What exactly is a project even in this context? How does it handle things which can be in multiple categories? This smells like someone pressing everything into a hard form to circumvent the flaws of their tools, instead of getting better tooling.
(1) Although it's just a hierarchy/tree, which is nothing new, its size and shape is (supposed to be) a sweet spot. There are trade-offs with hierarchy sizes and shapes, so a sweet spot is a plausible idea.
(2) By limiting the size of the tree, you force people all across the organization to share the same parts of it rather than giving them private spaces they control exclusively. This means they are forced to work together on how information is organized. This could encourage there being one coherent idea of how information is organized. Everyone will have to agree on how it's organized, and everyone will be more familiar with how others' stuff is organized.
(3) The numbers are small enough that you can remember them and talk about them. When you ask someone where something is, they can give you the answer directly instead of promising to send you a link. (It's like how you can read an IPv4 address off one screen and go type it into a config file on another computer, whereas unfortunately this is not easy with IPv6.) This increases the odds of success in finding the info.
I cannot follow any of those organizational, rigidly structured methods. They make me anxious, I much rather live in my mess and let it automatically prioritize stuff for me.
Things I don't know where I left are likely unimportant, and no energy should be wasted on them.
I think I finally made peace with my mess.
This whole discussion of hierarchy vs. tags feels like discussing if hammers are better or worse than screwdrivers, with each side assuming nails or screws out of nowhere. Some things for example organise themselves naturally into hierarchies, such as biological species (both the "old" taxonomy and cladistics are tree-based models); odds are that the same applies to tags, with some junk out there being especially well suited for tagging.
There's also the possibility that different people do work better with one or another.
It would be especially useful to identify corner cases where each fails to deliver. Both systems are bound to have flaws; the "right" one is not the perfect one, but the one with the flaws that are easier to address and/or tolerate.
A few people mentioned items that could be assigned to multiple nodes as a shortcoming of the hierarchical system, but isn't this rather easy to solve with a disambiguation rule? e.g. for Johnny Decimal, "if an item can be reasonably assigned to two numbers, pick the smaller one." I also don't see much of a problem with synonyms, or in this case links.
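That rule is small enough to write down. A Python sketch, with `pick_home` as a hypothetical name and IDs compared numerically part by part rather than as strings:

```python
def pick_home(candidates):
    """Given the IDs an item could reasonably live under, pick the smallest."""
    def as_numbers(ident):
        # "12.05" -> (12, 5), so comparison is numeric, not alphabetical
        return tuple(int(part) for part in ident.split("."))
    return min(candidates, key=as_numbers)
```

For an item that fits both 21.03 and 12.05, this picks 12.05 every time, so two people filing the same thing end up in the same place.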
Fine-grained categories take up a lot of space and involve a lot of containers, which then creates more objects to manage. They also take more effort to put things away, for only small gains in retrieval speed unless your memory is good enough to find the category box right away.
My theory is an organization system should be optimized for storage rather than retrieval as that is what takes time and effort.
But I have a lot of trouble with numbers or abstract symbols and don't want to spend forever learning them, so I use three-letter abbreviations.
All the categories are based on observation of what is already close together, rather than by trying to create a system logically, to take advantage of things I've seen in one place long enough to remember, and not have to relearn the location.
So, I have a category BAM, for bulk artificial materials. This exists because there was a bunch of paint, some cleaning supplies, and paper towels stored together.
There's also TAM, tapes, attachments, and materials. This has some screws, some ratchet straps, and some balsa wood and steel wire, some foam tape, and some keyring split rings, and a bunch of other stuff.
If things overflow a container, I split them into subcategories.
At the root I have a small number of ALLCAPS folders. Those each have a small number of ALLCAPS folders themselves, and nothing else. In a few cases the hierarchy goes a little deeper than that, but not much deeper.
An ALLCAPS folder can either be part of this ALLCAPS hierarchy and contain other ALLCAPS folders and nothing else, or it can be an ALLCAPS leaf: contain normal folders and nothing else.
The final rule is that nothing is allowed to depend on any relative hierarchy until you get into one of the folders inside an ALLCAPS leaf.
What this means is I can reorganize any time I want by moving or renaming any ALLCAPS folders at any time, or any of the folders inside an ALLCAPS leaf. I find this distinction relieving. I don't have to get the perfect organization forever, I just have to organize it in a way that works for me right now, and I can reorganize any part of it at any time without worrying about it.
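The two structural rules above are easy to check mechanically. A Python sketch under my own assumptions: `check_allcaps` is a made-up name, "ALLCAPS" is approximated with `str.isupper()`, and stray files such as hidden OS metadata would be reported as violations.

```python
import os

def check_allcaps(folder):
    """Return a list of rule violations found in and below an ALLCAPS folder."""
    problems = []
    entries = sorted(os.listdir(folder))
    dirs = [e for e in entries if os.path.isdir(os.path.join(folder, e))]
    files = [e for e in entries if e not in dirs]
    if files:
        problems.append(f"{folder}: contains files {files}")
    caps = [d for d in dirs if d.isupper()]
    # Either every child is ALLCAPS (a branch) or none is (a leaf).
    if caps and len(caps) != len(dirs):
        problems.append(f"{folder}: mixes ALLCAPS and normal folders")
    for d in caps:
        problems.extend(check_allcaps(os.path.join(folder, d)))
    return problems
```

It deliberately never descends into the normal folders inside a leaf, since anything goes in there. The third rule (nothing depends on relative paths above a leaf) is not something a directory listing can verify.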
I was mid-reply and I realised I was typing out my problem statement, so I’ll just paste it here. This is a work in progress.
---
# The problem
When we kept everything on paper, organised people had these things called filing cabinets. They stored all of their documents in them in a structured way so that they could find them again.
Now those same people store all of their files in arbitrarily named folders on their company’s shared drive and wonder why they can’t find anything.
## Information wasn’t always free
When we kept everything on paper, generating information came with a cost. Paper cost money. Typing out a document took real effort. Duplicating a document meant a trip to the photocopier.
Every document produced was a tangible thing. It was there, on your desk. You couldn’t ignore it.
Now anyone can duplicate anything, instantly, invisibly, for free. We assume this is an improvement.
Is it?
## You had to be organised
When we kept everything on paper, you had to be organised. There was no other option.
If you weren’t organised, the information was lost. Not lost as in ‘it’ll take me a while to find it’: lost as in ‘gone forever’.
Now you can be disorganised, but at what cost? The cost is the time it takes you to find a thing; it is the risk that the thing that you find is a duplicate or an old version. It is the constant frustration that comes from knowing that something exists, but having no idea where it is.
We all feel this every day and we have come to believe that it is normal.
It is not normal.
## Why aren’t we given training?
When we kept everything on paper, it was someone’s job to organise it. This was an occupation: you were trained. You became an expert.
Now we employ Gen Z’s who didn’t grow up with the concept of ‘a file’ yet we expect them to navigate the byzantine hierarchy of the company’s SharePoint.[genz]
[genz]: https://www.theverge.com/22684730/students-file-folder-direc...
You work at a keyboard all day, so we make you sit through a module so you know to bend your knees when you lift a box.
But when it comes to information management: you’re on your own.
It seems too hard to memorize the numbers for first time placement.
So let's make a program that asks us when moving it into our collections?
`dewey <file to organize>`
Will then lead you down a tree of decisions. Insta-organized. It's so good I just might try it.
(The file will move to wherever your organized files are specified in your .config/dewey.conf)
On Windows this could be a right-click -> Dewey, where it then pops up a small window to pick the categorization.
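A minimal Python sketch of what such a `dewey` command might do, with everything here hypothetical: the function names, the prompt, and passing the collection root as an argument instead of reading it from `.config/dewey.conf` (whose format isn't specified).

```python
import os
import shutil

def choose_destination(root, choose):
    """Walk down from root, asking `choose` at each level which subfolder to enter."""
    current = root
    while True:
        subdirs = sorted(
            name for name in os.listdir(current)
            if os.path.isdir(os.path.join(current, name))
        )
        if not subdirs:
            return current          # reached a leaf
        picked = choose(current, subdirs)
        if picked is None:
            return current          # user decided to file it here
        if picked not in subdirs:
            raise ValueError(f"unknown folder: {picked!r}")
        current = os.path.join(current, picked)

def ask(current, subdirs):
    """Interactive chooser: print numbered options and read one from stdin."""
    for number, name in enumerate(subdirs, 1):
        print(f"{number}. {name}")
    answer = input(f"{current}: pick a number, or press Enter to file here: ")
    return subdirs[int(answer) - 1] if answer else None

def file_away(path, root, choose=ask):
    """Move `path` into the folder selected by walking the decision tree."""
    return shutil.move(path, choose_destination(root, choose))
```

Passing the chooser in as a function means the same walk could sit behind a terminal prompt or the right-click window mentioned above.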
I had previously implemented it on my personal Nextcloud instance, but found it to be less impactful, as I already tended to over-organize my digital files.
Johnny.Decimal – A System to Organize Projects - https://news.ycombinator.com/item?id=36300472 - June 2023 (1 comment)
Johnny.Decimal - https://news.ycombinator.com/item?id=25398027 - Dec 2020 (187 comments)
Johnny.Decimal – A system to organise projects - https://news.ycombinator.com/item?id=13770827 - March 2017 (2 comments)
(2) "TagTrees: Improving Personal Information Management Using Associative Navigation" at https://karl-voit.at/tagstore/en/papers.shtml
(3) "TagTree: Storing and Re-finding Files Using Tags" at https://karl-voit.at/tagstore/downloads/Voit2011.pdf
One thing I have learned to do which bends the rules a bit is to use date stamped folders in the lowest level instead of XX.YY.
Examples of places where I use this with success are folders containing: meeting minutes, travel documents, receipts, etc.
So,
1. notes a-la gists use tags:
'notes/Rsync notes #cli #foss #notes #x-platform.md'
'notes/Windows initialization #windows #powershell.md'
'notes/Modafinil notes #medical #nootropic.md'
2. event-like things use both dates and tags 'work/meetings/2023-01-03 Project XYZ meeting #project-xyz.md'
3. stuff I just collect doesn't use anything, or uses some of the above 'dms/wallpapers/w1.png; w2.png ...'
'dms/shopping/2023-06-13 Dyson Absolute 15/README.md; receipt.jpg'
I keep basic folder hierarchy very limited for now.
I use vscode to commit any change on save and pull git on folder open, making this behave like an always-in-sync cloud a la GitHub Gists, especially together with vscode sync that brings my plugins, configs and shortcuts everywhere.
CTRL+P to quickly find stuff by name or tags, and vscode's very fast ripgrep search to get files containing any content - so I just need to remember any word or phrase to find it. If I can't remember anything, I browse over tags (I have a handy script to display all of them) or dates (since I usually know a time range). As another mechanism, I use the Double Commander file manager with its fuzzy file name search to get interactive lists by typing tags or keywords while in a particular folder.
To encrypt some pages I use GPG with vscode extension.
This serves me well, and I don't get lost, either when searching for previous knowledge or when trying to find where the single one is.
I evaluated Johnny Decimal prior to this, and it didn't fit this workflow - seems ad hoc enough so I can live without it and has nothing tags or good search can't solve. Also, it feels not flexible enough, particularly as stuff can't have multiple categories. Tags are a much better mechanism for information organization; you just need to keep them organized, keep their number relatively low, and have a mechanism for delete/merge/move/rename, which is simple enough here as it is all on the file system and is a few shell commands away.
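The actual tag-listing script isn't shown, so this is only a guess at the idea in Python: scan file names for `#tag` tokens like the ones in the examples above and count them. `collect_tags` and the tag pattern (word characters and hyphens) are assumptions.

```python
import os
import re
from collections import Counter

# A tag is '#' followed by letters, digits, underscores or hyphens.
TAG = re.compile(r"#([\w-]+)")

def collect_tags(root):
    """Count every #tag that appears in a file name under root."""
    counts = Counter()
    for _parent, _dirs, files in os.walk(root):
        for name in files:
            stem, _ext = os.path.splitext(name)  # ignore ".md" etc.
            counts.update(TAG.findall(stem))
    return counts
```

`collect_tags("notes").most_common()` then gives the tags sorted by use, which also makes near-duplicates that should be merged easy to spot. Tags in folder names are ignored here.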
Says someone who’s never worked at Google and used Moma. I still don’t understand why Google doesn’t offer Moma as an on-prem thing to replace JIRA’s suite. Is the market too small? They used to have an on-prem appliance way back when but surely a container package is all you need these days?
But a first visit to their website shows numberings exceeding 10.
Ok.
Still a novel idea worth pursuing.