Building personal search infrastructure for your knowledge and code (opens in new tab)

(beepb00p.xyz)

661 pointsoctober_sky6y ago157 comments

157 comments

I've given up with trying to find The One True Note Taking Tool, so have ended up writing my own thing that I tinker with now and again to tune it to exactly what I need.

It's essentially a simple web server that sits on top of a bunch of markdown files.

The frontend renders the markdown using markdown-it and supports KaTeX for simple inline mathy things, along with the extended markdown stuff like tables etc. I've even made it so that you can drag and drop files (including images) into the edit box and it will upload them to the server and render the correct markdown syntax so they can be rendered when you look at the note.

Alongside the files, the data is also stored in a SQLite database file with some metadata, and I'm using the Full Text Search (FTS5) engine to support search which seems to work ok.

If the database gets corrupted it can just be rebuilt, it's really just there to augment the notes. If I stop developing it or want to move on, the notes are there as text files.

It works well enough in a mobile browser, although admittedly a bit rubbish if you need offline access.

Works well enough for me. I might open source it one day but I think I'd need to clean up the code a bit first :)

EDIT: the core of the tool was mostly inspired by this article https://golang.org/doc/articles/wiki/

gwgundersen6y ago

This sounds a lot like a tool I built for myself [1], sans the database. I agree that Markdown + Katex with a local server seems like the right move for most technical people. Lots of things like encryption, backups, and basic text search can be done via other Unix tools. I also agree that the big win is owning your data long-term, even if you get tired of maintaining the software.

[1] https://github.com/gwgundersen/anno

archontes6y ago

Sir! I have to say seeing you here that I appreciate your contributions.

haddr6y ago

Same here: I'm taking a lot of notes and find SimpleNote/Notepad++/NV work for me for note taking but not for note management. And I also build several tools to manage it. For now I use very simple full-text search (qgrep) and fzf (with my own modifications) to perform search-as-you-type to find notes/source code. qgrep is good for really quick index and it has incremental indexing that works for my byt i'm hitting some problems as it works good for code search but not so good for notes. However i don't feel like using anything that start a server. I just don't think I have that many notes.

hammerbrostime6y ago

I did something similar for a few years... then my web app was hacked. I realized what a big liability it was to have such precious stuff inside an app with such limited security. Since then I’ve learned to make do with simpler tools. Omni Outliner is my favorite!

coleifer6y ago

I have a similar tool I've been using for years now. It is built on sqlite and uses the fts extension to provide full-text search.

Some nice features I've added over the years: bookmarklet and automatic page-screenshotting, tags (and smart auto-tagging), everything markdown supports, file upload and attach, media embed (YouTube link becomes player, eg). Oh, I can also attach email reminders and make to-do lists (with little checkboxes and everything). It started out very simple and has grown over time. Sqlite is a great foundation for projects like this. Strongly recommend.

sqs6y ago

Sourcegraph CEO here. I see the doc mentions Sourcegraph for code search (cool!). Something like ripgrep is indeed better for your case, a single person who just needs to search code in local directories on their own machine. I made a PR for our docs at https://github.com/sourcegraph/sourcegraph/pull/8075 that should clarify this.

Sourcegraph is a web-based code search tool that automatically syncs and indexes many repositories from your organization's code host(s). It's intended for every developer at an organization to use for searching across all of the organization's code (and for navigating/cross-referencing with code intelligence). It's self hosted and usually there is 1 Sourcegraph instance per organization. If you love local+personal code search, I bet you and your teammates would love organization-wide code search, so give Sourcegraph a try (https://docs.sourcegraph.com/#quickstart). :)

karlicoss6y ago

Thank you for replying and updating the docs, appreciate it!

I still wish it was easier, it's such a cool tool :) In theory it should be possible to set up inotify watches on local repositories and reindex on changes (perhaps with some throttling logic if it's too heavy), although I understand it's harder than it sounds and my usecase is probably somewhat marginal. I might set it up anyway if my personal infrastructure ever settles.

mikepurvis6y ago

Another great option for local code/repo search is Hound. I maintain an instance of it at my workplace, but it's so lightweight and easy to deploy that I could easily imagine running an instance of it on my laptop for offline personal use.

https://github.com/hound-search/hound

blyry6y ago

YES! We have been using hound for several years now, having all hundreds of our org repos searchable in one spot, in a LIGHTNING FAST manner has been an invaluable tool to help our various teams keep up with the legacy sprawl and effectively remove old features and all their dependencies from our sprawly systems. I even wrote a microservice that uses gitlab global hooks to keep hound up to date without polling, and a little c# config generator that runs as a cron job on our gitlab instance and redeploys hound with the newest repos included.

Hound falls short on access control front (we wrapped our instance with a saml proxy), but it's still a 'you either can search every piece of software for \'password\'' or you don't have any access at all. Having to index a specific branch instead of all of them kinda stinks too; for those two specific reasons we have been eyeing sourcegraph, esp. as the gitlab integration matures.

I can't emphasize enough how fast hound is and how pleasurable it is having a regex based code search that doesn't make me wait.

1 more reply

avisaven6y ago

A similar but more structured (though perhaps Hound supports a similar feature set) code searching tool is OpenGrok [1]. It's a bit more setup as it uses Apache Tomcat, but once it is setup it has an incredibly fast and useful code querying tool with really useful abilities to x-ref functions/structures, highlight uses of variables, and integrates git info as well. If you've ever used elixir.bootlin.com to go through the Linux source code, opengrok is effectively a more feature packed open source version of that. I highly recommend taking a look to anyone who spends a lot of time digging through code.

[1] https://oracle.github.io/opengrok/

pouta6y ago

Recently interviewed for a PM role at Sourcegraph, so I read everything Sourcegraph shares online and it was amazing to see your plans and OKR's for the future being laid out in the open. Kudos on running an open, successful organization.

After reading about your masterplan I would love to know your thoughts on the question presented regarding phase 2.

Will coding in the future be more like writing a novel or like knowing how to read+write? I feel the latter will eventually be true as the the human-machine interface becomes more 'native'.

otakucode6y ago

I'm not familiar with your product, so this question might be either overly simple or entire outside your wheelhouse, but does Sourcegraph have any integration with IDEs? Like if my organization had a module in a repository that did image compression (random example), could I hit a hotkey and search for it, then have the plugin pull the module out of a different project and insert it into what I'm working on?

sqs6y ago

Sourcegraph has editor plugins (https://docs.sourcegraph.com/integration/editor) that give you editor hotkeys for (1) multi-repository search and (2) go-to-file on Sourcegraph (in your web browser, so you can read the code without ruining your editor state or share the URL of your current file with a teammate).

That advanced use case you mentioned isn't supported, but it sounds very cool. It's in the realm of things we'd like to offer someday. If anyone's interested in hacking on that (and making a PR to https://github.com/sourcegraph/sourcegraph), I'd be happy to screenshare with them and give them some pointers.

scarejunba6y ago

I was in an Uber with one of your engineers who was heading to Gophercon. He seemed cool so I'm going to assume you're all cool people.

mlthoughts20186y ago

Wow, the prices seem extremely high to me for a search engine across code repos.

$30/person is almost double what Stack Overflow charges, and that product can act as a frontend to search not just code but any type of documents, with voting, tagging, analytics on what confuses people the most and more.

It would be hard for me to justify even $10/person for something like Sourcegraph in my company (a Fortune 500 ecommerce brand), for the highest enterprise tier of functionality.

$30/person per month for the lowest tier? Boy, I wish I knew of companies willing to pay that. None in my experience ever have been.

seagreen6y ago

> in my company (a Fortune 500 ecommerce brand)

My strategic advice is to get whatever's best in class, and not worry about $X0/month. Compared to what you should be spending on devs that rounds to free.

1 more reply

capableweb6y ago

Seems like more than half of this comment is just speaking about Sourcegraph, something sqs themselves acknowledge is not the right tool here. I know you're the CEO, but maybe can avoid pushing your product when it's not relevant :)

Thank you for updating the documentation to clarify the use case though!

ssivark6y ago

Meta-observation. This topic seems to be getting a lot of attention on HN over the last few months, indicating massive interest. Further, looking at the landscape of developments in this space (past all the me-too Markdown note taking apps): Evernote seems to have a fading presence on the landscape, Notion seems to be a (too?) well-funded behemoth startup, Roam is trying some exciting things, and Tiago Forte is putting together some interesting things under the BASB banner. (Any others? Oh btw, there’s also Perkeep)

It’s amazing for how long Emacs’ Org-mode has been largely unparalleled! Apart from the revered desktop setup, there are now a bunch of mobile offerings including Organice — not quite slick, but definitely useful.

I‘m sincerely rooting for more experiments in this area. I would love to be able to write by hand or speak to my memex (multi-modal interaction). Vannevar Bush’s “As we may think” has languished uncourted for pitifully long. In some ways, this was supposed to be the first “killer app” for personal computing.

jhoechtl6y ago

> It’s amazing for how long Emacs’ Org-mode has been largely unparalleled!

I use org-mode all the day but frankly OneNote is great too!

If OneNote would save in plain text and have a cross-platform gui I would use it (even if it's resource-sucking electron)

0x262d6y ago

I use onenote extensively and the mobile app has gotten steadily a little slower and worse. I would love to have a simpler, faster mobile app that would sync to my computer in plaintext so I could use vim with it....

Basically, onenote is almost there but I would love to leave it

beerandt6y ago

OneNote is the only note taking app that I've come remotely close to "successfully" useing.

I especially love how it automatically cites and links to whatever you copy and past from the web. That alone is so valuable for documenting workflow and how-to write-ups.

However the combination of me using a desktop less and mobile more, plus Microsoft's attempts to turn Office into a web app have soured me to it. That and the limitations mentioned above. I'd love to be able to export to a wiki style interface, but I cringe at thinking about what that html would look like (a la Word's html export).

But I have yet to find anything that I like better. Or will consistently use as much.

voltagex_6y ago

The OneNote format is documented. If I knew Haskell I might have a go at adding support to OneNote, but if anyone wants to have a crack at it I'd support you.

tra36y ago

> It’s amazing for how long Emacs’ Org-mode has been largely unparalleled!

After jumping into org last year, I think it because orgmode has a solid foundation for organizing stuff that's infinitely customizable with elisp. Roam, evernote, onenote; they just don't have the flexibility. The lack of customizability is a feature in itself: it's easy to pick up.

On the other hand, orgmode has a fairly vibrant community that will keep improving orgmode for many years to come.

gallegojaime6y ago

My hackerspace is working on a tool to put this knowledge-in-a-computer to work.

Essentially, it's a knowledge management system that makes input almost frictionless. This is then mapped into a shareable ontology graph on which algorithms can be executed. Valuable data can be extracted from here.

For example: do you need to find a team with a specialized couple of skills? Have applicants send their verified graphs and use those relations to find the best fit.

Or, alternatively, someone who's learned a trade/skill can share their dense knowledge with a community, to direct learning more effectively.

It's on a very early stage, for now purely for the fun of it. But if there's interest or suggestions (definitely some hard problems to solve) we could focus more efforts towards that.

rhizome6y ago

>Evernote seems to have a fading presence on the landscape

My understanding is that they're throwing their presence away? Maybe they pivoted to enterprise, I don't know, but for at least a couple years all I've heard about them was people talking about what to use instead.

macintux6y ago

I’ve started retiring my use of Evernote because I simply don’t have any idea whether they’re going to be around for the long run. I’ve been using Emacs for nearly 30 years; it’s about time I learned org-mode.

slightwinder6y ago

I think for a long time they simply had no idea what road they should walk down. So they tried a bunch of stuff, walked in circles and wasted money left and right. Now they settle down on what they always were and try to maintain the established userbase while not scaring them away with too much innovation, and instead delivering conservative improvments.

This came with a shutdown of freeusers heaven, and focusing more on the paying customers. Because of which many people seem to be cranky over evernote. Similar think seems to happen at the moment with dropbox too BTW.

marviel6y ago

It's a ripe space. I'm using notion mostly right now, but I've also used:

-Coda.io (big, more scriptable player)

-Hypernote (super new player, but with a cool new take on inter-note relationships)

-Tiddlywiki (super customizable, really fast -- but also has a fair amount of footguns)

-Airtable (only played with it a few times but it's usually mentioned in the same breath as notion, I notice)

Hopefully someday we'll achieve Alan Kay's dream :)

DavideNL6y ago

If only Notion was private (like for example Omnifocus.) I can't imagine uploading all my private data to the cloud of a "free" app.

OmniFocus is more expensive but i gladly pay to prevent my data from being analyzed & sold.

1 more reply

wowtip6y ago

Airtable is fantastic for so many things, I only wish there was a cheaper "personal" plan with more records, for a couple of bucks / month.

I can't really justify $10 / month for "just for fun" personal projects, and the 1200 records / base is too limited for many ideas (and also 5000 records for $10/month is on the low side as well, even if putting it as a company expense)

Yes, I know, they got to eat and everything, and maybe cost vs income is not feasible for personal accounts.

elbear6y ago

I've only started using Tiddlywiki, so I didn't get the chance to dive deep. Can you mention some of the footguns?

1 more reply

thiagomgd6y ago

Have you looked at Wiki.js? I played around with it a little bit and it seemed nice

bachmeier6y ago

> Evernote seems to have a fading presence on the landscape

They've made massive changes over the past year. They'll even have a Linux app coming out soon!

pottertheotter6y ago

Really? What massive changes? I use it regularly but haven't noticed anything massive.

klft6y ago

(1) For note taking I stumbled across anno[1] via[2] two weeks ago. It's a python flask application which you run on your localhost. You write markdown which gets stored locally as file and is rendered as html using pandoc[3]. It's really basic but I love it.

(2) For physical documents I use a Fujitsu ScanSnap iX500[4] for scanning. A runtime-licencse of ABBYY FineReader for OCR is included. The resulting PDF has embedded text which I extract using pdftotext[5]. I wrote a python application to search and tag this documents. It loads all the text in-memory which is perfecty fine as I have < 10,000 documents. I use it since 5 years and it works OK.

[1] https://github.com/gwgundersen/anno

[2] https://news.ycombinator.com/item?id=22033792

[3] https://pandoc.org/

[4] https://www.fujitsu.com/global/products/computing/peripheral...

[5] https://en.wikipedia.org/wiki/Pdftotext

tomatocracy6y ago

I have a ScanSnap scanner too (mine's an S1500 - I have had it for c10 years or so and it still works perfectly) and it's great to be able to search what used to be paper documents quickly and easily. It saves a lot of physical space as well, most documents I scan then shred immediately once I've verified the scan is good and backed up.

There are some reasonably good OCR tools on Linux now as well - I've been pretty happy with Tesseract[0]. It was an absolute pain to script everything to "just work" when I press the button on my scanner though.

Recoll[1] works very well for indexing documents for me including my OCRd scans. When that's not enough, I revert to pdfgrep.

0. https://github.com/tesseract-ocr/tesseract 1. https://www.lesbonscomptes.com/recoll/

lifeisstillgood6y ago

Actually, what has been bugging me recently is the inability to "tag" photos on my iphone - all I want is to snap a copy of my bill / invoice whatever, tag it with "gas bill" and let it upload to icloud / dropbox. from there I am sure I can onwards process looking for "gas bill" but actually there seems to be no obvious way to do it, (even looked into EXIf data), and I guess it will age to wait till i learn ios coding

jamiek886y ago

Touch and hold , then tap an option. Custom: Tap , tap Enter New Tag, type a customtag, and tap Done. Create additional custom tags: Tap , tap Enter New Tag, type a custom tag, and tap Done. Add more than one custom tag to a photo: Tap , and tap each tag you want to add (so a checkmark appears next to it).

1 more reply

ubercow6y ago

Have you looked into apps like Scanbot [1]

1: https://scanbot.io/en/index.html

lifeisstillgood6y ago

Totally unrelated but I love these "how I built my version of" threads - I learn about tech and projects I never knew existed

ok carry on please, diversion over :/)

PenguinCoder6y ago

I've been looking for a good multi-document feed scanner. Do you have experience using the iX500 with Linux, or gscan2pdf?

My usecase would be scanning multi page documents with minimal effort, and saving to PDF somewhere.

klft6y ago

I thought about Linux but while it should be possible to use the iX500 with Linux you would lose OCR. I did some tests and compared the OCR of the included ABBYY FineReader with Tesseract[1]. Tesseract was not good enough for my use case. So I still use the iX500 on Windows.

[1] https://en.wikipedia.org/wiki/Tesseract_(software)

stillwater566y ago

Does anyone else find that the simple act of writing notes helps them remember and process better? I spent forever trying to find an ideal note-taking solution, but now I just write things in a single notebook. I rarely review my notes, but I find that simply writing thoughts down consistently has improved my memory and understanding of new concepts.

Terr_6y ago

Yes, the effectiveness of note-taking (particularly handwritten) on memory has been a subject of scientific interest for a while.

I feel it shares aspects with Rubber Duck Debugging: The effort of taking something you "know" and forcing it back out through other brain-circuits (i.e. language and/or simulating a social interaction) helps to fill gaps that your brain would otherwise skip over. The act of hearing/seeing your output also causes other parts of your brain to analyze it as if it were someone else's thought.

I suspect our consciousness isn't nearly as unified as we like to believe.

hinkley6y ago

This has changed back and forth over time for me.

Tests in middle school, I could recall writing things down, even the part of the page I wrote them in.

By college I would write TODO's down and lose them, and not be able to recall what I wrote down. Misplacing the note was more likely than forgetting the task, so I stopped writing them down.

I should try to measure this again because right now I couldn't tell you which works better.

One of the most uncomfortable things about getting older is that in your teens and 20's you spent all this time figuring out who you are, what you like, what you're good at and what you struggle with. Age, changes in health, coping mechanisms, changes in perspective all fuck around with this and you can find yourself in situations you should avoid or avoiding situations you could embrace.

It's like a weird mid-life crisis.

otakucode6y ago

This certainly applies for me personally. My theory is that it ties in with the sort of 'geographic' memory where when you think of something, you might not be able to remember exactly what it is, but you can remember pretty precisely that it's in the middle of a certain notebook, on a heavily marked-up page, in the bottom left corner. By tying things to a location which you can remember, placing it in a bit of a context, its easier to hold on to. I also find, and for this I have no explanation at all, that I can remember sequences of numbers and code very well, better than anything else. I couldn't tell you the date I started or left my job 2 employers ago, but I could rattle off my 7-digit numeric security code for the door no problem. The brain is weird.

lazyasciiart6y ago

Well, you also used that 7 digit code a lot more often than you ever had to recall your start or end dates.

slightwinder6y ago

I think notetaking is way to "materialize" thoughts. In which I mean a thought is bunch of uncertain values within a rough area, and they come a go fast. But writting it down makes the values clear and specific and puts and gives you something to remember, instead of something to think.

Hm, basically it's making real final decision, instead of playing with a bunch of potenial possible decisions which are all somewhat equal, but also kinda fuzzy,

_y5hn6y ago

This is well-known. Hand-writing are found in research to work very well, though digital note-taking also works. Those dilligent students were rewarded, though very good to learn this yourself!

I've found Freemind to work well enough for me. Search not needed as I browse the graph easily enough.

nik_0_06y ago

Very much so. For me, note-taking also forces basic internal thought (I guess what you called process) as well, which is great. As I'm writing down the note, the next line has a relatively high percentage chance of being "wait, no that can't be right because X".

I am now dabbling with reviewing them, although not sure what that will lead to, as they are so unstructured. There are generally a few gems in there to be remembered, but mostly spur of the moment gibberish!

semitext6y ago

I think there is a lot of truth to this, however I also (when I remember to) like to review notes I wrote a month or so later, and check to see if they make any sense to me. If they don't it means I didn't understand the concept as well as I thought I did and it is worth going back over the source material.

lcall6y ago

I wrote and use daily http://onemodel.org (AGPL, uses postgres), for many reasons listed there :) . One way to think of its current state is a text-mode, easy-to-learn (i hope) infinite mind map of things, where I store and can query effectively everything: calendar, reminders, quasi-anki-like knowledge review, journal, automatic activity log, notes on subjects, very efficiently for the user. (It also stores documents, but that is not very smooth compared to other document systems, nor is browser integration smooth at all.)

Edit: It also has a very basic security model (private, public, unspecified), and with that in mind, can export trees of notes as html or as outline documents (text), with or w/o indentation & numbering, which I've found very useful. And anything can be in as many places in the tree as is helpful. The export to simple html, I use to generate my 2 web sites.

(I plan to move it to Rust, and maybe sqlite, eventually, as well as add features like anki, internal code attached to entity classes for cheap internal customization/automation, etc, but have been slow lately.)

(Edit: it is currently only self-hosted by each user. Have considered doing hosting for other users, and might some day.)

oliv__6y ago

Looks interesting but honestly I had trouble keeping my attention focused enough to read through the intro page.

A little CSS (max-width: 700px; margin: 0 auto;) on the body would go very far.

lcall6y ago

Thanks for the comment. That was debated slightly, in a previous HN discussion, where some pointed out that using browser defaults appeals more (to some people, I don't know if the majority), especially if they have particular needs. I admit to insensitivity to such things, but I will make a note to try your suggestion sometime. :)

slightwinder6y ago

I remember this one. Looked interessting, but was lacking the "big picture" of data. It seems to be easy to get lost in data and loose your trail.

lcall6y ago

Thanks. Can you elaborate, including on what would be a fix for you?

For me, the big picture is I organize everything in ways that work well for me, which I have tried to mention on the web site (in screen shots and some org ideas somewhere). Like, todos, historical things, documents, contacts I have (orgs and people), calendar + tickler file (so I dont have to think about things until the date I should start thinking about it, but I don't forget, if I check it habitually), habit reminders and other review/study material, and notes by topics organized in ways I can find things. I have a top level list/hierarchy/outline (actually a few of them, and anything can link to anything else for quick reference, depending on the convenience of the moment for lookup), or I can remember some search terms (<x-company> main" to get a phone # for x-company). I also have standard patterns (with some support in the software for making data look like templates) for details about contacts or other things, logging journal notes, conversation notes with businesses or doctors or whatever, and it then becomes easy to refer to history. Then anything is basically available via a few to several keystrokes, to get exactly what I want. There is also text search, or queries by date. It seems like one would have to do that with any kind of mind map, org-mode, or note system: organize things and/or search for them in a way that helps oneself as the user. Maybe some pre-fabricated forms or examples of that would help someone get started though...

(some edits for clarity above, and)

Edit: Also, when navigating in to one's data, one can then hit 0 or ESC to go back out the way you came, even holding down ESC to go back to the top level. I also tried to make it so the UI shows what can be done at any given time, if one reads the screen.

Is any of that relevant, or do you have something else in mind? Thanks again for the feedback.

1 more reply

lcall6y ago

(If one has possible future interest, there is an announcements list, and feedback is also appreciated.)

gotts6y ago

telnet demo seems to be down at the moment: Trying 52.37.29.12...

lcall6y ago

True; sorry about that. Maybe I should remove that from the web site until I decide better. But the best thing is probably to check the screen shots via the web site, then install/try it if you like...

Edit: I have removed mention of the telnet demo from the site. If there were sufficient real interest I would put it back (or consider hosting the system for others). If so, email me via the mailing list at the site, or via the address at the site footer. Thanks.

lcall6y ago

If you have possible future interest, there is an announcements list.

gricardo996y ago

A great time saver for me was simply setting up better bash history and search capabilities[1].

I wrote a wrapper function, sbh (search bash history) that allows me to input date strings like "2 months ago", or "last week", which narrows the search. Linux 'date' function with --date string arg is pretty powerful[2].

1 - https://spin.atomicobject.com/2016/05/28/log-bash-history/

2 - https://www.thegeekstuff.com/2013/05/date-command-examples/

dchichkov6y ago

Reminds me somewhat similar - CEO of Wolfram developed a nice way of record keeping: https://writings.stephenwolfram.com/2019/02/seeking-the-prod...

By the way, is there, by chance, a "note taking/indexing tool from photo"? I'd like to be able to take a photo of an title/abstract of computer science paper with my phone. And then be able to find it, by approximate date and keywords. (I use Android. Seems like something relatively easy to hack, actually, on top of Google photos.)

tibu6y ago

Evernote does character recognition quite well. I don't know if there are any others but would be good to have something else too so I can leave Evernote for Notion.

napoleond6y ago

I've been thinking a lot about how I manage my own data lately (notes, photos, code, reference material, etc) and have concluded that the primary feature I'm looking for is longevity. I'm saddened by the amount of data I've lost over the years, either because of hard disk failures or third-party services going out of business/making it difficult to extract things/getting too expensive.

In light of this, I'm biasing toward simple file formats managed by tools I write myself, and optimizing for cost in a way that I otherwise don't, since any recurring costs incurred by the system are effectively a lifelong commitment. I am relying on S3 for primary storage (so that it is accessible anywhere) but with a sync to offline backup.

So far, I've implemented a personal Zettelkasten tool (with built-in spaced repetition, so doubles as an Anki replacement) and a search engine that's based on Presto (via AWS Athena) so that I don't need to keep an Elasticsearch instance alive. I'm planning to build out other repository tools as I go.

It's been very liberating to build tools that are never meant to be used by anyone other than myself, and with the confidence that the tools don't matter too much anyway since the underlying files are stored in evergreen formats.

silicon24016y ago

what's the optimal setup for long-term, large-scale (personal) data storage?

I want to build one big Backup. Some initial research has pointed me to something like Bacula to manage the data backup process from a machine. With the 3-2-1 rule, I know I also need my Backup itself to have at least 3 copies, in at least 2 different forms (cloud/hard disk), at least one of which is off-site from me.

As an individual, do you or anybody else know the best way to implement such a system? Should I buy one giant hard drive, use many hard drives to create a RAID array, something else?

kortex6y ago

Oooh. I've been wrestling with this problem for a while now.

Basically I'm working on a tiered system. Files/dirs are categorized by size (<10MB, <25GB, >25GB) , and by sensitivity (public, confidential, secure. And importance is usually proportional to security). I have fortunately found that security is usually inverse to size. Github/lab anything which makes sense. Confidential small stuff (sans keys) is just stored in gmail/drive. Big, boring stuff (music, ebooks) is just kept on external hard drives.

Secure, ultra-important stuff, I don't really have a system for.

The system I'm leaning towards is just encrypt archives and store the key/password securely, and store it like you would any boring data, with a local NAS and a cloud backup service of some sort, or just stored on drives offsite.

1 more reply

philsnow6y ago

You mention S3 and Athena, but also that you're building for longevity. Are you planning for the future obsolescence of AWS, or going to cross that bridge when you get to it?

napoleond6y ago

The S3 files are mirrored to a local drive as a collection of plain .md, .jpg, etc. The Athena search index is secondary in importance to the source data and not necessarily permanent (presumably the options for "take this folder full of files and let me search it" will only improve over time).

That being said, one of the reasons I chose S3 vs. other AWS services or other companies is because I expect it to be around for a very long time. (Just because I've preserved the option of migrating away doesn't mean I relish the idea.)

spdustin6y ago

I'd really like a personal "correlate all the things!" setup that has a plugin architecture for any source and creates a time series and document-based store of whatever I want. Tweets, e-mails, text messages, time tracking, etc.

There are lots of tools that do the individual moving parts, but a personal aggregator of everything would be interesting. Basically, a tool that lets you become your own personal data broker—just for your own personal data.

karlicoss6y ago

I'm kind of working on that too :) https://github.com/karlicoss/my

I wrote a post on some data that I collect and have/will integrate: https://beepb00p.xyz/my-data.html#consumers

K0SM0S6y ago

I only skimmed through and the combined breadth + intent of your projects seems very, very interesting — I mean it speaks to me. So, way to go! Mad props, please keep it up!

If you ask me, this is the shape of things to come.

1 more reply

capableweb6y ago

I (and many others I'm sure) have been thinking about similar things as well. Not really sure how it work though. Any one care to brainstorm up an architecture that can support this?

user00012-ab6y ago

My problem with a lot of services listed below, is they all eventually go away, and all your data is off somewhere else. Unless you store your data locally in a human readable format (markdown) you are just putting all your data into a system that WILL go away at some point in the future.

Google has already had 2-3 services to manage your data that they have closed down. Maybe they are the ones that taught me not to trust your data with anything on the web.

Even something like Evernote is iffy, they seem like they are constantly on the verge of shutting down.

Although I do find it sad that that the human race as a whole puts so little value into this type of software, and so much value into sports and politics.

lcall6y ago

http://onemodel.org , described briefly elsewhere in this discussion page and more at that site, is self-hosted, which today means installing postgres and editing one config file, doing backups & upgrades (but I might be able to help some).

Maybe I could host for others sometime if there were sufficient interest. And/or move it to sqlite.

capableweb6y ago

Yeah, seems neither self-host (onemodel) or letting someone else (you or Evernote) is particularly attractive, because the chance of data loss is always there.

Is it possible there is a solution that makes the data more permanent and allows multiple parties to backup the same sources, or something similar? Some sort of federation protocol maybe.

1 more reply

ramses06y ago

https://en.wikipedia.org/wiki/Blosxom ... I got started in 2004 with that setup: http://www.robertames.com/blog.cgi/entries/bloxsom-started.h...

...a bit contrarian compared to the WordPress and BlogSpot frenzy at the time, but I've been happy with it.

[rames@...:~/blog/entries]$ find . -type f | wc -l 331 [rames@...:~/blog/entries]$ find . -type f | xargs -n1 cat | wc -c 574481

It's been very stable over ~15 years, but I think it might be time to adopt SQLite, at least as a caching layer. ;-)

nabnob6y ago

Someone shared this on HN yesterday - https://labs.tomasino.org/gnu-recutils/

It's a set of unix-style tools that let you treat text files as databases.

theshrike796y ago

This is what I've moved to: https://joplinapp.org

It's just plain markdown and syncs to any cloud provider or a webdav share. Butt-ugly especially on iOS, but it works and there is no vendor lock-in.

Kiro6y ago

Honestly, the risk of me losing my local data is much higher than a note service shutting down. It has happened countless of times and I'm just too sloppy with the backups. Purely personal of course but for me your argument is reversed.

ketzo6y ago

It's been mentioned a few times in these comments, but I want to add a +1 for Roam[1]. Note-taking/personal knowledge tool that's very, very different from anything I've seen before -- closest thing I can compare it to is Wikipedia. It's still in beta with some rough edges, but VERY worth checking out.

[1] roamresearch.com

qot6y ago

Worth mentioning the pricing [0]:

$30 / month

$10,000 / lifetime

[0]: https://twitter.com/Conaw/status/1214855473876201472

K0SM0S6y ago

"trackcmp.net" keeps breaking navigation for me on roamresearch.com, and that tracker is not even https (classic unsafe warnings from Chrome). Weird. Unfortunately, that makes the whole thing look shady, and I can't even get to the create account screen after signing up. :/

Maybe they'd do better to ease up on the tracking, especially for a "give us all your documentation" service.

jefurii6y ago

Looks neat but it's a service, which is kinda the opposite of rolling something for yourself.

slightwinder6y ago

Isn't this just a desktop-wiki with auto-linking?

Fiveplus6y ago

> all digital trace I'm leaving (tweets, internet comments, annotations)

I would be open to the idea of a tool which combines the entirety of my digital presence at any point in time in a single platform. Kinda like a dynamically updated list which updates itself - every time a linked account makes a comment, 'likes' a post or performs any activity that may link it back to me.

kirubakaran6y ago

I'm building this https://histre.com/ It has Hacker News, Telegram, and web browsing (notes, bookmarks, history) integrations already. Up next: Emacs org-mode exports, integrations with Pocket and Pinboard.

Here is a bit longer comment on that which I made earlier today: https://news.ycombinator.com/item?id=22160026

swozey6y ago

This is cool, I'd dig a $2-5/mo unlimited account for 1 person/team with the same unlimited settings.

1 more reply

matlin6y ago

We've been building something to solve this exact problem. We started looking at how much we could derive from email notifications but parsing unstructured emails is very error-prone and typically incomplete so we decided to lean on external APIs where possible. Essentially what we made was a platform to index your data across services in one searchable document store or build new applications natively on this database where all of your data is one place. If you're interested in getting early access (we're still in early alpha) you can sign up on http://www.aspen.cloud

mikepurvis6y ago

It's barely a year old, but I think Timeliner is kind of trying to be this, as something you run yourself to protect against disappearing cloud services:

https://github.com/mholt/timeliner

wtracy6y ago

This have me a hairbrained idea for a browser extension that drops every web page you visit into a private Lucene database.

mariushn6y ago

Try out https://booxia.wensia.com/ or https://historio.us/

cobby6y ago

You might want to look into WorldBrain's Memex extension. I think it crawls every page you visit into IndexedDB so you can do full-text searches later.

ryanfox6y ago

I’ve been building essentially that, but more than just websites: https://apse.io. I am really pleased with how it works - just released v2.0!

user00012-ab6y ago

I was kind of having the same idea, except any site you bookmark gets added to a personal web crawler, and then you have your own search site for things you find interesting.

walterbell6y ago

This exists on iPhone/iPad! DevonThink2Go, local crawl/search + optional encrypted sync over self-hosted WebDAV or public cloud services. Can also take/search markdown notes.

jmakov6y ago

No mentions of https://tiddlywiki.com/?

user00012-ab6y ago

tiddlywiki was great until all the browsers stop supporting writing to local files, now saving changes is a pain, making me find something else.

ahnick6y ago

Maybe this solves your problems? It creates a database in your browser's LocalStorage.

https://noteself.github.io/

1 more reply

type-26y ago

I used to run it on node but then I switched to notion I prefer the notion way much more. However I'm looking to move away from notion to something selfhosted.

hinkley6y ago

My coworker used to swear by TW and every once in a while when I read browser release notes I have wondered if it still works for him.

It sounds like you're saying that nobody bothered to modify it to use LocalStorage, which is a surprise.

fsiefken6y ago

I run it on a webdav server like caddy, there is also this ruby script you can start. Works ok, set it up once and you forget about it.

capableweb6y ago

Everything I write about (journal + other things, task lists and what not) is written in plain markdown files currently (about to move it to TiddlyWiki, one of these days...) and to get search, I just use `the-silver-searcher` which searches the entire directory of my files. Simple and scalable (got around 9k documents by now)

insomniacity6y ago

My eternal frustration in this space is that my employer has strict firewalls, web filtering and data-loss prevention software, and remote access is over Citrix with no copy-paste. Consequently, if I build a knowledge base, it is stuck inside the firewall. Equally, if I build it outside, I can't use it at work.

fctorial6y ago

Why don't you host it on an ec2 instance? They won't be blocking amazon ip. Where do you work?

insomniacity6y ago

There's definitely no external access without going through the web proxy. And a new uncategorized site would be blocked by the web proxy - and it wouldn't pass review.

I work in a highly regulated industry...

Any workaround would be grounds for termination. So there's no point to my comment really - just curious if anyone else is in the same boat.

5 more replies

karlicoss6y ago

Hey, author here. Happy to answer any questions!

saadalem6y ago

Is there a way we can subsribe to the blog ?

porker6y ago

> Ideally I want to be able to do fulltext realtime search over anything that I ever had in my visual field. Not even necessarily text, but audio and video as well.

Where I find all these systems break down is recall. They're designed for someone who can recall a word or phrase that was in the content. I can usually recall "It was about X" or "The document/web page/image looked like Y". But an actual word? The author's name? Not a chance.

While a more difficult problem, if the tool is to live up to the "Future" section of this page, it's got to go a long way beyond what's in the source data, to what's thought of by the user.

albertzeyer6y ago

This topics comes up again and again. I collected some notes about this here: https://github.com/albertz/wiki/blob/master/personal-knowled...

E.g. one software I started to use is nvALT, via: https://www.macstories.net/links/organizing-everything-with-...

But I'm nowhere near a perfect and complete solution yet...

computronus6y ago

The successor to nvALT, nvUltra, is currently in private beta. I'm looking forward to its release!

https://nvultra.com/

porker6y ago

And still Mac only :(

ajphdiv6y ago

I self host a confluence server. All my content is available to me offline. Might be a bit overkill, but I have knowledge bases for all my work. If there is a web page I come across I can just copy/paste the content into a new post. Everything is searchable. It really is great. They offer a starter license, which is $10 per year:

https://www.atlassian.com/licensing/starter

tomerbd6y ago

I have less notes after being fed up with nites. It's really time consing to manage notes so - I manage logs. I just log everything I do each task in it's new page. It's append only.

For notes which I mutate I just keep a personal web site and I tried to keep this as cheatsheet and as compact as possible so I don't need to manage it.

So append only log in quip new folder for each task.

Mutative cheatsheet super compact pages in personal website.

Oh and for quick sniper's alfred.

That's it.

glinkot6y ago

I use a few things for this (on windows):

- For notes, OneNote, though I'm always on the lookout for an alternative with decent UI and syncing, but using open file formats. Full text search simple enough with this. Code formatting isn't good but there's an addin where the free version formats it as it was copied.

- To search local files, Voidtools Everything is great. Searching instantly by filename is a real time saver.

- If I want full text search of a large base of documents, I used Likasoft Archivarius which cost me $30 about 10 years ago and is still handy. It's the only local desktop search I've found that supports full text indexing of tons of formats like outlook .ost, etc and can look inside archive files

- For backups I've continued to stick with external drives, mirrored periodically with Freefilesync. 3 copies - one as master, two mirrors ensuring one is offsite.

seized6y ago

Take a look at Standard Notes. It is privacy focussed with encryption but has markdown and code editors and can be self hosted

glinkot6y ago

Thanks, looks interesting. I find Markdown a great idea in theory, but have found very few examples of wysiwyg markdown editors that work 'as you'd expect'. For me that means:

- Bullets with multiple indents going from 1 to 1) a. etc - Table handling - Usual formatting like heading levels etc

And there seem to be lots of flavours of markdown too, just to add another layer to things.

NetOpWibby6y ago

I love SN and have been using it for a few months. No complaints thus far. I will add that for me, the Simple Markdown Editor module is better the other one (forget the name).

flaque6y ago

If you're into this sort of thing, you might want to checkout Roamresearch: https://roamresearch.com/

losteric6y ago

Seems similar to ZIM (https://zim-wiki.org/), except proprietary/hosted? I've just started using zim - can someone more experienced compare the two?

dapithor6y ago

I wish things like https://piggydb.net/ had more momentum or competitors... personal knowledge databases seem to be such a tough niche to tackle.

Edit: since there is a new project here is more details years back: http://www.linux-magazine.com/Issues/2014/160/Workspace-Pigg...

jslakro6y ago

We could fill a whole internet with each personal method for storing, classifying and accesing. We're missing a OS for our own memory.

jefurii6y ago

I wish there was a method for printing QR codes or URLs on paper that would be the reverse of scanning a QR code. This would make it easy to write complex URLs in your paper diary/techo/commonplace book/notebook.

andreygrehov6y ago

I keep my knowledge in a private Git repo managed by https://www.gitbook.com/. So far it works out great. Going to make it public soon.

karlicoss6y ago

That's cool, please drop me an email (or just share here?) when you release it, I'm collecting (https://beepb00p.xyz/tags.html#exobrain) other people's wikis!

spoontoeat6y ago

Thanks for making your notes public. It inspired some further thinking for my org-mode setup.

ziyadb6y ago

The holy grail [https://beepb00p.xyz/pkm-search.html#future] of this really resonated with me and fully mirrors what I've been thinking about the past few months. In my observations, it's input capture, information organization, and subsequent retrieval:

Information Capture:

Input Capture - You’re going to have all-encompassing tracking and recording of all activity, but want configurable privacy on the extent to which you want your daily conversations and observations of external things you encounter and are exposed to. Capturing input needs to be holistic and incorporate all properties of encounters and new information.

Potential sources of input:

Vision — point of view recording, see snapchat spectacles, etc as primitive examples. Audio (voice notes and multi-party conversations) - voice calls, video, etc. and other forms of audio transmission where there is more than a single party in the interaction. Digital interactions You will need to keep track of web pages you visit at what times Conversations you see on Twitter, etc.

Properties and cues must be extrapolated from the information that is captured on input, in the case of audio, transcriptions are sufficient for transcription and retrieval purposes, however since video is a visual medium, it includes significantly more properties that need to be accounted for.

The aim here is to identify sufficient data points (cues) that are subsequently represented in such a way that they are easy to search across things you have encountered but only seem to recall a certain property or cue from. This is because of the fact that human beings tend to remember things in fragments, for instance, you might remember a certain color on a page that you visited within the last 6 months and nothing else.

So long as you are capturing sufficient input and actions then you should be able to go back to any given point in time. How and where are you going to store this information? Storing everything is going to be a large amount of data. The essence of the information and context must be preserved. If you want to wind back to an arbitrary position in time with the original context intact, you want to retain as much as you can in the most efficient manner possible, so determining which data points to retain is essential. (Once the content structure has been figured out, this will be viable).

Examples of Primary Cues:

Time - humans generally keep track of things in a linear time-based fashion. Color - invokes emotion and is memorable. Physical Location - the efficiency of information retrieval is highly influenced by the location at which it is originally synthesized, encountered, and stored. Keywords - the default conventional mode. Can and should be extracted from video/imagery and audio. Imagery - search for images based on their contents and ambience.

Potential Secondary Cue — Music - see historical associated input and actions while certain music was played. (What else?)

Meta Cues — Subjects - Automated tagging of keywords/encountered content.

Any combination of these queries is possible, but ultimately the killer feature is the ability to backtrack through time to find a certain piece of information that is made available thanks to the always-on recorded nature of your interactions with the physical and digital worlds combined.

Knowing what to store, and how, + displaying it needs to be worked on further.

lcall6y ago

http://onemodel.org, described elsewhere here and moreso at that site, tries to model arbitrary knowledge and has a vision encompassing any kind of info one wanted to be tracked (again, more at the site). (Edit: If you have possible future interest, there is an announcements list.)

maurits6y ago

I've been pondering on building something like this for a while.

For now, I've settled on sphinx because it can be easily exported to dash, and tied in to an alfred workflow for search.

Unsimplified6y ago

Tried the custom webapp and DB solution for a while. Wasn't publicly portable enough (for others to copy paste/export easily).

Currently using markdown files in git repos.

hvasilev6y ago

I use a vim plugin called vimwiki and I export my todos and notes into HTML. Works fine for me.

JabavuAdams6y ago

I basically live in Evernote. Will gradually transition to personal tooling.

rawoke0836006y ago

Most stuff (links, photos, docs, etc) I just email it to myself

voltagex_6y ago

Is there anything for people who don't use Vim?

executesorder666y ago

emacs has org-mode.

jacquesm6y ago

A search infrastructure for my knowledge would require access to wetware. Code I can see working.

marv3lls6y ago

Ya lost me at $(emacs)!

chimichangga6y ago

I just email links, code, docs, etc. to myself with descriptive subjects and tags.

j / k navigate · click thread line to collapse

157 comments

djhworld6y ago

I've given up with trying to find The One True Note Taking Tool, so have ended up writing my own thing that I tinker with now and again to tune it to exactly what I need.

It's essentially a simple web server that sits on top of a bunch of markdown files.

Alongside the files, the data is also stored in a SQLite database file with some metadata, and I'm using the Full Text Search (FTS5) engine to support search which seems to work ok.

If the database gets corrupted it can just be rebuilt, it's really just there to augment the notes. If I stop developing it or want to move on, the notes are there as text files.

It works well enough in a mobile browser, although admittedly a bit rubbish if you need offline access.

Works well enough for me. I might open source it one day but I think I'd need to clean up the code a bit first :)

EDIT: the core of the tool was mostly inspired by this article https://golang.org/doc/articles/wiki/

gwgundersen6y ago

[1] https://github.com/gwgundersen/anno

archontes6y ago

Sir! I have to say seeing you here that I appreciate your contributions.

haddr6y ago

hammerbrostime6y ago

coleifer6y ago

I have a similar tool I've been using for years now. It is built on sqlite and uses the fts extension to provide full-text search.

sqs6y ago

karlicoss6y ago

Thank you for replying and updating the docs, appreciate it!

mikepurvis6y ago

https://github.com/hound-search/hound

blyry6y ago

I can't emphasize enough how fast hound is and how pleasurable it is having a regex based code search that doesn't make me wait.

1 more reply

avisaven6y ago

[1] https://oracle.github.io/opengrok/

pouta6y ago

After reading about your masterplan I would love to know your thoughts on the question presented regarding phase 2.

Will coding in the future be more like writing a novel or like knowing how to read+write? I feel the latter will eventually be true as the the human-machine interface becomes more 'native'.

otakucode6y ago

sqs6y ago

scarejunba6y ago

I was in an Uber with one of your engineers who was heading to Gophercon. He seemed cool so I'm going to assume you're all cool people.

mlthoughts20186y ago

Wow, the prices seem extremely high to me for a search engine across code repos.

It would be hard for me to justify even $10/person for something like Sourcegraph in my company (a Fortune 500 ecommerce brand), for the highest enterprise tier of functionality.

$30/person per month for the lowest tier? Boy, I wish I knew of companies willing to pay that. None in my experience ever have been.

seagreen6y ago

> in my company (a Fortune 500 ecommerce brand)

My strategic advice is to get whatever's best in class, and not worry about $X0/month. Compared to what you should be spending on devs that rounds to free.

1 more reply

capableweb6y ago

Thank you for updating the documentation to clarify the use case though!

ssivark6y ago

jhoechtl6y ago

> It’s amazing for how long Emacs’ Org-mode has been largely unparalleled!

I use org-mode all the day but frankly OneNote is great too!

If OneNote would save in plain text and have a cross-platform gui I would use it (even if it's resource-sucking electron)

0x262d6y ago

Basically, onenote is almost there but I would love to leave it

beerandt6y ago

OneNote is the only note taking app that I've come remotely close to "successfully" useing.

I especially love how it automatically cites and links to whatever you copy and past from the web. That alone is so valuable for documenting workflow and how-to write-ups.

But I have yet to find anything that I like better. Or will consistently use as much.

voltagex_6y ago

The OneNote format is documented. If I knew Haskell I might have a go at adding support to OneNote, but if anyone wants to have a crack at it I'd support you.

tra36y ago

> It’s amazing for how long Emacs’ Org-mode has been largely unparalleled!

On the other hand, orgmode has a fairly vibrant community that will keep improving orgmode for many years to come.

gallegojaime6y ago

My hackerspace is working on a tool to put this knowledge-in-a-computer to work.

For example: do you need to find a team with a specialized couple of skills? Have applicants send their verified graphs and use those relations to find the best fit.

Or, alternatively, someone who's learned a trade/skill can share their dense knowledge with a community, to direct learning more effectively.

It's on a very early stage, for now purely for the fun of it. But if there's interest or suggestions (definitely some hard problems to solve) we could focus more efforts towards that.

rhizome6y ago

>Evernote seems to have a fading presence on the landscape

macintux6y ago

slightwinder6y ago

marviel6y ago

It's a ripe space. I'm using notion mostly right now, but I've also used:

-Coda.io (big, more scriptable player)

-Hypernote (super new player, but with a cool new take on inter-note relationships)

-Tiddlywiki (super customizable, really fast -- but also has a fair amount of footguns)

-Airtable (only played with it a few times but it's usually mentioned in the same breath as notion, I notice)

Hopefully someday we'll achieve Alan Kay's dream :)

DavideNL6y ago

If only Notion was private (like for example Omnifocus.) I can't imagine uploading all my private data to the cloud of a "free" app.

OmniFocus is more expensive but i gladly pay to prevent my data from being analyzed & sold.

1 more reply

wowtip6y ago

Airtable is fantastic for so many things, I only wish there was a cheaper "personal" plan with more records, for a couple of bucks / month.

Yes, I know, they got to eat and everything, and maybe cost vs income is not feasible for personal accounts.

elbear6y ago

I've only started using Tiddlywiki, so I didn't get the chance to dive deep. Can you mention some of the footguns?

1 more reply

thiagomgd6y ago

Have you looked at Wiki.js? I played around with it a little bit and it seemed nice

bachmeier6y ago

> Evernote seems to have a fading presence on the landscape

They've made massive changes over the past year. They'll even have a Linux app coming out soon!

pottertheotter6y ago

Really? What massive changes? I use it regularly but haven't noticed anything massive.

klft6y ago

[1] https://github.com/gwgundersen/anno

[2] https://news.ycombinator.com/item?id=22033792

[3] https://pandoc.org/

[4] https://www.fujitsu.com/global/products/computing/peripheral...

[5] https://en.wikipedia.org/wiki/Pdftotext

tomatocracy6y ago

Recoll[1] works very well for indexing documents for me including my OCRd scans. When that's not enough, I revert to pdfgrep.

0. https://github.com/tesseract-ocr/tesseract 1. https://www.lesbonscomptes.com/recoll/

lifeisstillgood6y ago

jamiek886y ago

1 more reply

ubercow6y ago

Have you looked into apps like Scanbot [1]

1: https://scanbot.io/en/index.html

lifeisstillgood6y ago

Totally unrelated but I love these "how I built my version of" threads - I learn about tech and projects I never knew existed

ok carry on please, diversion over :/)

PenguinCoder6y ago

I've been looking for a good multi-document feed scanner. Do you have experience using the iX500 with Linux, or gscan2pdf?

My usecase would be scanning multi page documents with minimal effort, and saving to PDF somewhere.

klft6y ago

[1] https://en.wikipedia.org/wiki/Tesseract_(software)

stillwater566y ago

Terr_6y ago

Yes, the effectiveness of note-taking (particularly handwritten) on memory has been a subject of scientific interest for a while.

I suspect our consciousness isn't nearly as unified as we like to believe.

hinkley6y ago

This has changed back and forth over time for me.

Tests in middle school, I could recall writing things down, even the part of the page I wrote them in.

By college I would write TODO's down and lose them, and not be able to recall what I wrote down. Misplacing the note was more likely than forgetting the task, so I stopped writing them down.

I should try to measure this again because right now I couldn't tell you which works better.

It's like a weird mid-life crisis.

otakucode6y ago

lazyasciiart6y ago

Well, you also used that 7 digit code a lot more often than you ever had to recall your start or end dates.

slightwinder6y ago

Hm, basically it's making real final decision, instead of playing with a bunch of potenial possible decisions which are all somewhat equal, but also kinda fuzzy,

_y5hn6y ago

This is well-known. Hand-writing are found in research to work very well, though digital note-taking also works. Those dilligent students were rewarded, though very good to learn this yourself!

I've found Freemind to work well enough for me. Search not needed as I browse the graph easily enough.

nik_0_06y ago

semitext6y ago

lcall6y ago

(Edit: it is currently only self-hosted by each user. Have considered doing hosting for other users, and might some day.)

oliv__6y ago

Looks interesting but honestly I had trouble keeping my attention focused enough to read through the intro page.

A little CSS (max-width: 700px; margin: 0 auto;) on the body would go very far.

lcall6y ago

slightwinder6y ago

I remember this one. Looked interessting, but was lacking the "big picture" of data. It seems to be easy to get lost in data and loose your trail.

lcall6y ago

Thanks. Can you elaborate, including on what would be a fix for you?

(some edits for clarity above, and)

Is any of that relevant, or do you have something else in mind? Thanks again for the feedback.

1 more reply

lcall6y ago

(If one has possible future interest, there is an announcements list, and feedback is also appreciated.)

gotts6y ago

telnet demo seems to be down at the moment: Trying 52.37.29.12...

lcall6y ago

If you have possible future interest, there is an announcements list.

gricardo996y ago

A great time saver for me was simply setting up better bash history and search capabilities[1].

1 - https://spin.atomicobject.com/2016/05/28/log-bash-history/

2 - https://www.thegeekstuff.com/2013/05/date-command-examples/

dchichkov6y ago

Reminds me somewhat similar - CEO of Wolfram developed a nice way of record keeping: https://writings.stephenwolfram.com/2019/02/seeking-the-prod...

tibu6y ago

Evernote does character recognition quite well. I don't know if there are any others but would be good to have something else too so I can leave Evernote for Notion.

napoleond6y ago

silicon24016y ago

what's the optimal setup for long-term, large-scale (personal) data storage?

As an individual, do you or anybody else know the best way to implement such a system? Should I buy one giant hard drive, use many hard drives to create a RAID array, something else?

kortex6y ago

Oooh. I've been wrestling with this problem for a while now.

Secure, ultra-important stuff, I don't really have a system for.

1 more reply

philsnow6y ago

You mention S3 and Athena, but also that you're building for longevity. Are you planning for the future obsolescence of AWS, or going to cross that bridge when you get to it?

napoleond6y ago

spdustin6y ago

karlicoss6y ago

I'm kind of working on that too :) https://github.com/karlicoss/my

I wrote a post on some data that I collect and have/will integrate: https://beepb00p.xyz/my-data.html#consumers

K0SM0S6y ago

I only skimmed through and the combined breadth + intent of your projects seems very, very interesting — I mean it speaks to me. So, way to go! Mad props, please keep it up!

If you ask me, this is the shape of things to come.

1 more reply

capableweb6y ago

I (and many others I'm sure) have been thinking about similar things as well. Not really sure how it work though. Any one care to brainstorm up an architecture that can support this?

user00012-ab6y ago

Google has already had 2-3 services to manage your data that they have closed down. Maybe they are the ones that taught me not to trust your data with anything on the web.

Even something like Evernote is iffy, they seem like they are constantly on the verge of shutting down.

Although I do find it sad that that the human race as a whole puts so little value into this type of software, and so much value into sports and politics.

lcall6y ago

Maybe I could host for others sometime if there were sufficient interest. And/or move it to sqlite.

capableweb6y ago

Yeah, seems neither self-host (onemodel) or letting someone else (you or Evernote) is particularly attractive, because the chance of data loss is always there.

Is it possible there is a solution that makes the data more permanent and allows multiple parties to backup the same sources, or something similar? Some sort of federation protocol maybe.

1 more reply

ramses06y ago

https://en.wikipedia.org/wiki/Blosxom ... I got started in 2004 with that setup: http://www.robertames.com/blog.cgi/entries/bloxsom-started.h...

...a bit contrarian compared to the WordPress and BlogSpot frenzy at the time, but I've been happy with it.

[rames@...:~/blog/entries]$ find . -type f | wc -l 331 [rames@...:~/blog/entries]$ find . -type f | xargs -n1 cat | wc -c 574481

It's been very stable over ~15 years, but I think it might be time to adopt SQLite, at least as a caching layer. ;-)

nabnob6y ago

Someone shared this on HN yesterday - https://labs.tomasino.org/gnu-recutils/

It's a set of unix-style tools that let you treat text files as databases.

theshrike796y ago

This is what I've moved to: https://joplinapp.org

It's just plain markdown and syncs to any cloud provider or a webdav share. Butt-ugly especially on iOS, but it works and there is no vendor lock-in.

Kiro6y ago

ketzo6y ago

[1] roamresearch.com

qot6y ago

Worth mentioning the pricing [0]:

$30 / month

$10,000 / lifetime

[0]: https://twitter.com/Conaw/status/1214855473876201472

K0SM0S6y ago

Maybe they'd do better to ease up on the tracking, especially for a "give us all your documentation" service.

jefurii6y ago

Looks neat but it's a service, which is kinda the opposite of rolling something for yourself.

slightwinder6y ago

Isn't this just a desktop-wiki with auto-linking?

Fiveplus6y ago

> all digital trace I'm leaving (tweets, internet comments, annotations)

kirubakaran6y ago

Here is a bit longer comment on that which I made earlier today: https://news.ycombinator.com/item?id=22160026

swozey6y ago

This is cool, I'd dig a $2-5/mo unlimited account for 1 person/team with the same unlimited settings.

1 more reply

matlin6y ago

mikepurvis6y ago

It's barely a year old, but I think Timeliner is kind of trying to be this, as something you run yourself to protect against disappearing cloud services:

https://github.com/mholt/timeliner

wtracy6y ago

This have me a hairbrained idea for a browser extension that drops every web page you visit into a private Lucene database.

mariushn6y ago

Try out https://booxia.wensia.com/ or https://historio.us/

cobby6y ago

You might want to look into WorldBrain's Memex extension. I think it crawls every page you visit into IndexedDB so you can do full-text searches later.

ryanfox6y ago

I’ve been building essentially that, but more than just websites: https://apse.io. I am really pleased with how it works - just released v2.0!

user00012-ab6y ago

I was kind of having the same idea, except any site you bookmark gets added to a personal web crawler, and then you have your own search site for things you find interesting.

walterbell6y ago

This exists on iPhone/iPad! DevonThink2Go, local crawl/search + optional encrypted sync over self-hosted WebDAV or public cloud services. Can also take/search markdown notes.

jmakov6y ago

No mentions of https://tiddlywiki.com/?

user00012-ab6y ago

tiddlywiki was great until all the browsers stop supporting writing to local files, now saving changes is a pain, making me find something else.

ahnick6y ago

Maybe this solves your problems? It creates a database in your browser's LocalStorage.

https://noteself.github.io/

1 more reply

type-26y ago

I used to run it on node but then I switched to notion I prefer the notion way much more. However I'm looking to move away from notion to something selfhosted.

hinkley6y ago

My coworker used to swear by TW and every once in a while when I read browser release notes I have wondered if it still works for him.

It sounds like you're saying that nobody bothered to modify it to use LocalStorage, which is a surprise.

fsiefken6y ago

I run it on a webdav server like caddy, there is also this ruby script you can start. Works ok, set it up once and you forget about it.

capableweb6y ago

insomniacity6y ago

fctorial6y ago

Why don't you host it on an ec2 instance? They won't be blocking amazon ip. Where do you work?

insomniacity6y ago

There's definitely no external access without going through the web proxy. And a new uncategorized site would be blocked by the web proxy - and it wouldn't pass review.

I work in a highly regulated industry...

Any workaround would be grounds for termination. So there's no point to my comment really - just curious if anyone else is in the same boat.

5 more replies

karlicoss6y ago

Hey, author here. Happy to answer any questions!

saadalem6y ago

Is there a way we can subsribe to the blog ?

porker6y ago

> Ideally I want to be able to do fulltext realtime search over anything that I ever had in my visual field. Not even necessarily text, but audio and video as well.

While a more difficult problem, if the tool is to live up to the "Future" section of this page, it's got to go a long way beyond what's in the source data, to what's thought of by the user.

albertzeyer6y ago

This topics comes up again and again. I collected some notes about this here: https://github.com/albertz/wiki/blob/master/personal-knowled...

E.g. one software I started to use is nvALT, via: https://www.macstories.net/links/organizing-everything-with-...

But I'm nowhere near a perfect and complete solution yet...

computronus6y ago

The successor to nvALT, nvUltra, is currently in private beta. I'm looking forward to its release!

https://nvultra.com/

porker6y ago

And still Mac only :(

ajphdiv6y ago

https://www.atlassian.com/licensing/starter

tomerbd6y ago

I have less notes after being fed up with nites. It's really time consing to manage notes so - I manage logs. I just log everything I do each task in it's new page. It's append only.

For notes which I mutate I just keep a personal web site and I tried to keep this as cheatsheet and as compact as possible so I don't need to manage it.

So append only log in quip new folder for each task.

Mutative cheatsheet super compact pages in personal website.

Oh and for quick sniper's alfred.

That's it.

glinkot6y ago

I use a few things for this (on windows):

- To search local files, Voidtools Everything is great. Searching instantly by filename is a real time saver.

- For backups I've continued to stick with external drives, mirrored periodically with Freefilesync. 3 copies - one as master, two mirrors ensuring one is offsite.

seized6y ago

Take a look at Standard Notes. It is privacy focussed with encryption but has markdown and code editors and can be self hosted

glinkot6y ago

Thanks, looks interesting. I find Markdown a great idea in theory, but have found very few examples of wysiwyg markdown editors that work 'as you'd expect'. For me that means:

- Bullets with multiple indents going from 1 to 1) a. etc - Table handling - Usual formatting like heading levels etc

And there seem to be lots of flavours of markdown too, just to add another layer to things.

NetOpWibby6y ago

I love SN and have been using it for a few months. No complaints thus far. I will add that for me, the Simple Markdown Editor module is better the other one (forget the name).

flaque6y ago

If you're into this sort of thing, you might want to checkout Roamresearch: https://roamresearch.com/

losteric6y ago

Seems similar to ZIM (https://zim-wiki.org/), except proprietary/hosted? I've just started using zim - can someone more experienced compare the two?

dapithor6y ago

I wish things like https://piggydb.net/ had more momentum or competitors... personal knowledge databases seem to be such a tough niche to tackle.

Edit: since there is a new project here is more details years back: http://www.linux-magazine.com/Issues/2014/160/Workspace-Pigg...

jslakro6y ago

We could fill a whole internet with each personal method for storing, classifying and accesing. We're missing a OS for our own memory.

jefurii6y ago

andreygrehov6y ago

I keep my knowledge in a private Git repo managed by https://www.gitbook.com/. So far it works out great. Going to make it public soon.

karlicoss6y ago

That's cool, please drop me an email (or just share here?) when you release it, I'm collecting (https://beepb00p.xyz/tags.html#exobrain) other people's wikis!

spoontoeat6y ago

Thanks for making your notes public. It inspired some further thinking for my org-mode setup.

ziyadb6y ago

Information Capture:

Potential sources of input:

Examples of Primary Cues:

Potential Secondary Cue — Music - see historical associated input and actions while certain music was played. (What else?)

Meta Cues — Subjects - Automated tagging of keywords/encountered content.

Knowing what to store, and how, + displaying it needs to be worked on further.

lcall6y ago

maurits6y ago

I've been pondering on building something like this for a while.

For now, I've settled on sphinx because it can be easily exported to dash, and tied in to an alfred workflow for search.

Unsimplified6y ago

Tried the custom webapp and DB solution for a while. Wasn't publicly portable enough (for others to copy paste/export easily).

Currently using markdown files in git repos.

hvasilev6y ago

I use a vim plugin called vimwiki and I export my todos and notes into HTML. Works fine for me.

JabavuAdams6y ago

I basically live in Evernote. Will gradually transition to personal tooling.

rawoke0836006y ago

Most stuff (links, photos, docs, etc) I just email it to myself

voltagex_6y ago

Is there anything for people who don't use Vim?

executesorder666y ago

emacs has org-mode.

jacquesm6y ago

A search infrastructure for my knowledge would require access to wetware. Code I can see working.

marv3lls6y ago

Ya lost me at $(emacs)!

chimichangga6y ago

I just email links, code, docs, etc. to myself with descriptive subjects and tags.

j / k navigate · click thread line to collapse