SQLite Is a Library of Congress Recommended Storage Format

(sqlite.org)

648 points | whatisabcdefgh | 3d ago | 192 comments

tnelsond43d ago

I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.

So I made a format that will never surpass SQLite, except that it's much lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.

The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compare that to SQLite's 1.2mb of wasm and glue code: it's 3% the size, and searching and loading are much faster. My program isn't really column based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.

I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.

https://github.com/tnelsond/peakslab

SQLite is very well engineered, PeakSlab is very simple.

sgbeal3d ago

> Compare that to SQLite's 1.2mb of wasm and glue code

The current trunk is actually 1.7mb in its canonical unminified form (which includes very nearly as much docs as JS code), split almost evenly between the WASM and JS pieces :/. Edit: it is 1.2mb in minified form, though.

Disclosure: i'm its maintainer.

Edit: current trunk, for the sake of trivia:

    sqlite3.wasm 896745
    sqlite3.mjs  816270 # unminified w/ docs
    sqlite3.mjs  431388 # unminified w/o docs
    sqlite3.mjs  310975 # minified
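
As a quick check of how these byte counts line up with the 1.7mb and 1.2mb totals mentioned above, here is a minimal sketch. The dictionary keys are labels made up for illustration, and "mb" is assumed to mean 1,000,000 bytes.

```python
# Byte counts copied from the listing above; the key names are illustrative labels.
SIZES = {
    "wasm": 896745,
    "mjs_with_docs": 816270,
    "mjs_without_docs": 431388,
    "mjs_minified": 310975,
}


def total_mb(js_key):
    """Return the combined size of sqlite3.wasm plus one JS variant, in decimal megabytes."""
    return round((SIZES["wasm"] + SIZES[js_key]) / 1_000_000, 1)


for key in ("mjs_with_docs", "mjs_without_docs", "mjs_minified"):
    print(key, total_mb(key))
```

The unminified build with docs comes to about 1.7mb and the minified build to about 1.2mb, which matches both figures quoted in this exchange.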

smartmic3d ago

Many comments here on your creation, PeakSlab, but no dedicated praise yet. I didn't know it, but I have to say it is really cool and innovative! The performance of the dictionary is indeed superb and I will definitely bookmark this for future reuse. So, in a nutshell: thanks for sharing!

pjc503d ago

I think actually this competes with the old BerkeleyDB: https://en.wikipedia.org/wiki/Berkeley_DB - which I now see is no longer BSD-licensed, and in any case has been rendered almost extinct by SQLite. It was used for basic on-disk key-value store work.

tnelsond43d ago

Even BerkeleyDB tries to be mutable. What I'm doing doesn't need the mutability so it's much more similar to dictionary formats (though probably simpler) than it is to a database. Though a lot of people do use full databases for immutable dictionary key-value stuff. I just couldn't get any database to work well enough for a pwa dictionary.

raxxorraxor2d ago

SQLite is simple in its own way and I like the design principle of their SQL dialect.

"Right joins are just left joins in the wrong direction, you don't need that crap"

Of course it always gets simpler or more specialised. I think many apps using databases would run with SQLite just as well. And some would probably run just as well with a textfile instead of any db like SQLite.

luckystarr2d ago

For the love of god, don't do plain text files anymore. In the end you have software that has 20 (or more) individual files for each program section, which works fine until you want the files to be consistent. Boom. And then you add a lock to fix it and suddenly your whole program can only run sequentially. And then your customers ask why it's so slow in ingress. I won't name any names here, but this is a real commercial product.

BenjiWiebe2d ago

We use a cheap invoicing program. It works fine except it gets very slow when dealing with large numbers of invoices. Turns out each invoice (or payment record, or customer record, or whatever) is a separate text file with form-urlencoded data. No indices.

gpvos3d ago

A more standard solution would be cdb.[0] Although that doesn't support compressed data.

[0] https://cdb.cr.yp.to/ , https://en.wikipedia.org/wiki/Cdb_(software)

giza1823d ago

Perhaps a dumb question, but how do you get data into it if you’re not doing writes?

andrelaszlo3d ago

I think it's just immutable once you've generated it. No need to update indexes or check consistency on writes, no need for transactions, etc.

tnelsond43d ago

Generate it one time from a source tsv file or folder of media.

pfortuny3d ago

Think historical records of, say, share values for past years. You might have a single db for 1900-2000, for instance. Things like that.

Not everything needs to be real-time updated.

meindnoch3d ago

It is crashing Safari.

zoky3d ago

something something XKCD competing standards something something

lpln34523d ago

Creating something new for a different use case isn't pointless. It's like comparing inline skates to ice skates.

tnelsond43d ago

Believe me, I tried sticking to SQLite or aard2 or stardict, they just were fundamentally inadequate with no good pwa cross platform tooling.

bbkane3d ago

Does this remain true now that SQLite has a WASM build?

keybored3d ago

Doesn’t even apply unless someone says that (1) there are too many “standards”, and (2) so we are making this standard (neither apply here). Someone made something.

We should really consider eventually retiring memes because they just end up as thought-terminating cliches.

This is of course referring to xkcd #927. How do I know that?

alexpotato3d ago

I have always loved SQLite.

I have also heard that some firms ban its use.

Why?

Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.

DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.

I still love SQLite though.

faangguyindia3d ago

I went from thinking "SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything".

SQLite is very good if you can fit into the single writer, multiple readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google search to figure out.

Today, most of my apps are simply go binary + SQLite + systemd service file.

I've yet to lose data. Performance is great and plenty for most apps

michaelchisari3d ago

The single writer is less of an issue in practice than it's made out to be. Modern nvme drives are incredible and it's trivial to get 5k writes per second in an optimized WAL setup. Way more than most apps could ever dream.

And even then, I've used a batch writer pattern to get 180k writes per second on a commodity vps.

0123456789ABCDE3d ago

all* of that + sharding -> https://sqlite.org/lang_attach.html

ex: main.db + fts.db. reading and writing to main.db is always available; updating the fts index can be done without blocking the main database — it only needs to read, the reads can be chunked, and delayed. fts.db keeps the index + a cursor table — an id or last change ts

could also use a shard to handle tables for metrics, or simply move old data out of main.db
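
A minimal sketch of the main.db + fts.db idea using `ATTACH DATABASE`. The table names (`notes`, `indexed`, `sync_state`) and both function names are hypothetical, and a plain table stands in for a real full-text index to keep the example short.

```python
import os
import sqlite3
import tempfile


def open_with_shard(main_path, fts_path):
    """Open the main database and attach the index shard under the schema name 'fts'."""
    conn = sqlite3.connect(main_path)
    conn.execute("ATTACH DATABASE ? AS fts", (fts_path,))
    conn.execute("CREATE TABLE IF NOT EXISTS main.notes (id INTEGER PRIMARY KEY, body TEXT)")
    # Plain table standing in for a real full-text index (hypothetical schema).
    conn.execute("CREATE TABLE IF NOT EXISTS fts.indexed (note_id INTEGER PRIMARY KEY, body TEXT)")
    # Single-row cursor table remembering the last id that was indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fts.sync_state "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_id INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO fts.sync_state (id, last_id) VALUES (1, 0)")
    conn.commit()
    return conn


def index_chunk(conn, chunk_size=100):
    """Copy up to chunk_size not-yet-indexed notes into the shard; return how many were copied."""
    (last_id,) = conn.execute("SELECT last_id FROM fts.sync_state WHERE id = 1").fetchone()
    rows = conn.execute(
        "SELECT id, body FROM main.notes WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, chunk_size),
    ).fetchall()
    if rows:
        with conn:  # one transaction per chunk
            conn.executemany(
                "INSERT OR REPLACE INTO fts.indexed (note_id, body) VALUES (?, ?)", rows
            )
            conn.execute("UPDATE fts.sync_state SET last_id = ? WHERE id = 1", (rows[-1][0],))
    return len(rows)


workdir = tempfile.mkdtemp()
conn = open_with_shard(os.path.join(workdir, "main.db"), os.path.join(workdir, "fts.db"))
with conn:
    conn.executemany("INSERT INTO main.notes (body) VALUES (?)", [("a",), ("b",), ("c",)])
copied = [index_chunk(conn, chunk_size=2) for _ in range(3)]
```

Each call handles one small chunk, so the indexing work can be delayed or spread out, and the cursor row records where the next call should resume.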

* some examples:

  import sqlite3

  conn = sqlite3.connect("data.db")
  conn.execute("PRAGMA journal_mode=WAL")        # concurrent reads (see above)
  conn.execute("PRAGMA synchronous=NORMAL")      # fsync at checkpoint, not every commit
  conn.execute("PRAGMA cache_size=-62500")       # ~61 MiB page cache (negative = size in KiB)
  conn.execute("PRAGMA temp_store=MEMORY")       # temp tables and indexes in RAM
  conn.execute("PRAGMA busy_timeout=5000")       # wait 5s on lock instead of failing

edit: orms will obliterate your performance — use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.

my replies are being ratelimited, so let me add this

the heavy duty server other databases have is doing the load-bearing work that folks tend to complain sqlite can't do

the real dbms's are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.

by default sqlite gives you a stable database file: when you see the transaction complete, the changes have been committed to storage and cannot be lost even if the machine crashes right after.

you can decide to waive some, or all, of those guarantees in exchange for performance, and this doesn't even have to be an all-or-nothing situation.

hparadiz3d ago

Oh fun something I have some metrics on. I just made this benchmark for every php orm a few weeks ago for fun.

https://the-php-bench.technex.us/

There's a huge performance difference between memory and file storage within sqlite itself. Not even getting into tuning specifics.

Ringz3d ago

I usually try to explain it like this: “Single writer” is rarely a real problem, because a writer is not slow. It writes exclusively, but very quickly.

"Batch writer pattern" is a good idea to get rid of expensive commits.
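
A minimal sketch of what a batch writer can look like with Python's `sqlite3` module. The `events` table and the `write_batch` helper are made up for illustration, and this is not a claim about how the 180k writes per second figure above was achieved.

```python
import sqlite3


def write_batch(conn, rows):
    """Insert all rows inside a single transaction, so the batch costs one commit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO events (kind, payload) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")

batch = [("click", f"item-{i}") for i in range(1000)]
written = write_batch(conn, batch)
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The point is that the commit is the expensive part, so collecting rows and committing once per batch spreads that cost over many writes.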

srcreigh3d ago

2026 recommended storage formats: https://www.loc.gov/preservation/resources/rfs/data.html

nashashmi3d ago

Taking a minute to appreciate the level of long term thinking required for storing data, to plan for 300-500 years into the future, to be able to withstand all kinds of innovations, and survive basic obsolescence.

What is the longest surviving paper medium?

rmunn3d ago

> As of this writing (2018-05-29) ...

So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.

(Thanks for the correction. Brief brain malfunction in the math department there).

tehlike3d ago

Sir, it's 2026. It's 8 years old.

harrouet3d ago

Not if the GP was written 2 years ago :)

rmunn3d ago

Corrected; thanks.

frollogaston3d ago

Was going to say, was having deja vu reading this

akihitot3d ago

For public-sector data preservation, it may be one of the best options.

- The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk

From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.

Spooky233d ago

Archivists also love formats close to native. SQLite lets the relational relationships be present in a way that csv cannot.

b40d-48b2-979e2d ago

Foreign keys are not enforced unless you enable enforcement, and even then only for that connection.
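
A small sketch of that per-connection behaviour, assuming a stock SQLite build where enforcement is off by default. The `author`/`book` schema and the helper names are invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author (id INTEGER PRIMARY KEY);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id)
);
"""


def connect_with_fk(path):
    """Open a connection and switch on foreign key enforcement for it."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # applies to this connection only
    return conn


def orphan_insert_rejected(conn):
    """Try to insert a book whose author does not exist; report whether SQLite refused."""
    try:
        conn.execute("INSERT INTO book (author_id) VALUES (999)")
    except sqlite3.IntegrityError:
        return True
    return False


plain = sqlite3.connect(":memory:")      # enforcement left at the build default
plain.executescript(SCHEMA)
enforcing = connect_with_fk(":memory:")  # enforcement explicitly enabled
enforcing.executescript(SCHEMA)

print("plain connection rejected orphan:", orphan_insert_rejected(plain))
print("enforcing connection rejected orphan:", orphan_insert_rejected(enforcing))
```

Because the setting lives on the connection and not in the file, anyone opening an archived database has to enable it again to get the checks.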

akihitot3d ago

That's certainly true. The ability to define table relationships is a major difference from CSV.

afshinmeh3d ago

I love SQLite and thanks for sharing it, but there should be a "(2018)" at the end of the title:

> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.

maxloh3d ago

FYI, they added a lot more formats to the list after that.

  Preferred
  
  1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
    a. Formats using well known schemas with public validation tool available
    b. Line-oriented, e.g. TSV, CSV, fixed-width
    c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
  
  2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
  
  3. Character Encoding, in descending order of preference:
    a. UTF-8, UTF-16 (with BOM),
    b. US-ASCII or ISO 8859-1
    c. Other named encoding
  
  ---
  
  Acceptable
  
  For data (in order of preference):
  
  1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
  2. Text-based data formats with available schema
  
  For aggregation or transfer:
  
  1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.

https://www.loc.gov/preservation/resources/rfs/data.html

xxs3d ago

.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.

[0]: https://7-zip.org/7z.html

[1]: CVE-2025-0411

tnelsond43d ago

I love using zstd, it's so fast to decompress. I especially like that the JavaScript decoder is 8kb and still really fast. Though the 25kb wasm decoders are about twice as fast.

What are the advantages or reasons to use zstd in a 7z container versus just .zst?

tombert3d ago

On a recent project I have needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which had the possibility to corrupt files if there were a power interruption or something.

I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.

And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and then all the hard parts I was reinventing were probably faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to SQLite and it worked fine.

I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
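
A minimal sketch of the all-or-nothing behaviour being relied on here. The `kv` table and the helper are hypothetical, and this only illustrates transaction rollback, not what happens at the filesystem level on exFAT.

```python
import sqlite3


def save_all_or_nothing(conn, items):
    """Write every (key, value) pair, or none of them if any pair is invalid."""
    with conn:  # commits on success, rolls back on any exception
        for key, value in items:
            if value is None:
                raise ValueError(f"missing value for {key!r}")
            conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

try:
    # The second pair is invalid, so the first insert is rolled back as well.
    save_all_or_nothing(conn, [("a", "1"), ("b", None)])
except ValueError:
    pass
after_failure = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

save_all_or_nothing(conn, [("a", "1"), ("b", "2")])
after_success = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

A half-finished batch never becomes visible, which is the property that is tedious to rebuild by hand with append-only files.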

topham3d ago

The problem with it is you didn't solve your biggest actual problem, you just haven't had a problem bite you in the ass yet so you think your problem is solved.

tombert3d ago

I am not sure the problem is actually fully solvable. I think SQLite helps at least a little.

mmooss3d ago

> I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"

Where exactly is everywhere? Win32? All of Linux? BSDs? MacOS? IOS? ...

tombert3d ago

Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.

pbhjpbhj3d ago

Presumably Microsoft fear making it easy to swap OSes and access the same data.

"I can use Linux because if I get stuck I can just switch to Windows and still access my data" is a comfort that probably keeps people from even trying Linux (or other OSes)?

Why else would MS not support BTRFS/ZFS/Ext or whatever?

{I'm not saying that I think this works.}

ghrl3d ago

Something MacOS and Windows support natively would be a good start, it could grow from there.

Ringz3d ago

Looking at *all* my external drives now... that would be great.

tracker12d ago

I've used line-delimited, gzipped JSON for archive formats on several projects myself, which is a pretty good option... If I wanted more flexibility, would definitely consider SQLite.
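
For reference, a minimal sketch of line-delimited, gzipped JSON using only the Python standard library. The function names and the sample records are made up for illustration.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write one JSON document per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Yield the records back one at a time, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


archive_path = os.path.join(tempfile.mkdtemp(), "archive.jsonl.gz")
records = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
write_jsonl_gz(archive_path, records)
restored = list(read_jsonl_gz(archive_path))
```

Reading is streamed line by line, so a large archive never has to be held in memory all at once.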

In fact, I've worked on several projects where I heavily advocated that even the primary app storage was SQLite, and that archival was simply copying the database after an event. Specifically, elections, petition verification, etc. It's kind of difficult coming up with complex schemas to handle multiple events as well as the state of data at those events... by separating the database itself, using SQLite, that simplifies a lot of things. Though it does practically limit scale a bit. The main thing would be to archive the application and the database after a given event. If the application is containerized, you could create an image of the source, the container and the database after the event.

I think this kind of structure would work well for a lot of things... especially if you're considering data sharding anyway.
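
A minimal sketch of the "copy the database after an event" step, using `Connection.backup` from Python's `sqlite3` module (available since Python 3.7) instead of a raw file copy, so the snapshot is taken consistently even while the live file is open. The file names and the `votes` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def snapshot(live_path, archive_path):
    """Copy the live database into a separate archive file using SQLite's backup API."""
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(archive_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")
archive_path = os.path.join(workdir, "event-archive.db")

live = sqlite3.connect(live_path)
live.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, choice TEXT)")
live.execute("INSERT INTO votes (choice) VALUES ('yes')")
live.commit()
live.close()

snapshot(live_path, archive_path)
archived = sqlite3.connect(archive_path)
archived_rows = archived.execute("SELECT COUNT(*) FROM votes").fetchone()[0]
```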

testermelon3d ago

I'm surprised they included proprietary formats that are a de facto standard in a profession or supported by multiple tools (.xls, .xlsx) in the preferred section [1]. I wonder if "well-known enough" is as good as "open" from a preservation standpoint.

[1] https://www.loc.gov/preservation/resources/rfs/data.html

mort963d ago

Especially when Office 365 shows that not even Microsoft is capable of making software which can display Office files anymore... if you have a Word file which was created or has ever been modified by the Word application, working with it through Office 365 in a browser is such a pain. I've literally had images which are impossible to delete or move in the web version, and they will absolutely render in the wrong place.

acdha3d ago

Archivists and librarians have to think in terms of practicality: if many tools exist to read something and it’s a mainstream software product, the odds are good that they’ll be able to use those files 50 years from now. Not certain, but good, and that matters with limited budget and ability to tell the rest of the world what format to provide things in.

This can require nuance: for example, PDF has profiles because the core format is widely supported but you could do things like embed plugin content from now-defunct vendors and they would only want the former for long-term preservation.

pletnes3d ago

You can unzip the xlsx and read the xml inside. It’s not the worst format by far.
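
A minimal sketch of doing that with Python's `zipfile` module. An .xlsx file is a ZIP container of XML parts, so no spreadsheet library is needed to look inside. The helper names are made up, and the tiny stand-in archive built here is not a valid workbook; it only mimics the container layout so the example runs on its own.

```python
import os
import tempfile
import zipfile


def list_xml_parts(xlsx_path):
    """Return the names of the XML parts stored inside the container."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".xml"))


def read_part(xlsx_path, name):
    """Return one part of the container decoded as UTF-8 text."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return zf.read(name).decode("utf-8")


# Stand-in archive with the same kind of layout; replace with a real .xlsx path.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
with zipfile.ZipFile(sample_path, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")

parts = list_xml_parts(sample_path)
workbook_xml = read_part(sample_path, "xl/workbook.xml")
```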

perching_aix3d ago

What would you reckon is the worst format? I'm very curious about your standards given this.

ray_v3d ago

It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!

llagerlof3d ago

I used SQLite for a few applications several years ago. One time, the database got corrupted and all the data was lost. That was the day I stopped using SQLite.

Also, the lack of enforced column data types was always a negative for me.

jjice3d ago

No matter the medium, backups are a must.

llagerlof2d ago

A hard lesson learned...

benhurmarcel2d ago

For column types there are STRICT tables now
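
A small sketch of what a STRICT table does, assuming the SQLite library linked into Python is 3.37.0 or newer, which is the release that introduced STRICT. The `reading` table and both helper names are made up for the example.

```python
import sqlite3


def strict_supported():
    """STRICT tables need SQLite 3.37.0 or newer in the linked library."""
    return sqlite3.sqlite_version_info >= (3, 37, 0)


def rejects_wrong_type(conn):
    """Create a STRICT table and report whether a text value in a REAL column is refused."""
    conn.execute(
        "CREATE TABLE reading (id INTEGER PRIMARY KEY, celsius REAL NOT NULL) STRICT"
    )
    try:
        conn.execute("INSERT INTO reading (celsius) VALUES (?)", ("warm",))
    except sqlite3.DatabaseError:
        return True
    return False


if strict_supported():
    print("wrong type rejected:", rejects_wrong_type(sqlite3.connect(":memory:")))
else:
    print("this SQLite build predates STRICT tables:", sqlite3.sqlite_version)
```

Without the STRICT keyword the same insert would be accepted and the text stored as-is, which is the flexible typing the parent comment was complaining about.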

llagerlof2d ago

Thank you!

justin662d ago

> the database got corrupted

What caused that?

llagerlof2d ago

I don't know why that happened, but one fine day I tried to open the file using the vanilla SQLite client, and it didn't open.

danborn262d ago

It is great to see SQLite getting this level of institutional recognition. The single file format makes archival storage incredibly straightforward compared to traditional database dumps.

lenwood2d ago

Just yesterday it occurred to me that it had been a while since I last saw an SQLite post at the top of HN.

I really like the simplicity and speed of SQLite, I've used it in both personal and professional projects. For day-to-day work I still end up in Excel, not because I like it more (I don't), but because its ubiquity makes it the lowest friction way to share & explore datasets with less technical stakeholders and execs.

fpj2d ago

I don't know much about the LoC use case, but my initial reaction to the post is to ask why they are not building a data lake with open formats. I'm sure there are reasons for discarding open-table formats. Claude keeps telling me that the issue is that they don't address preservation properly.

xiaod2d ago

The operational complexity is worth comparing here. The migration path and schema evolution story often matter more than raw performance numbers for teams choosing between these options.

semiquaver2d ago

It certainly will be in the toolkits of data archeologists hundreds of years from now. Must be a weird feeling to create something so potentially long-lasting.

imrozim2d ago

I use postgresql for my startup, but every time I need quick local testing I wish it was as simple as sqlite. No config, just works.

infogulch2d ago

SQLite is remarkably versatile. Just a couple weeks ago an extension to do cross-process queues, streams, pub/sub etc in SQLite was released:

Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite | 327 points | 94 comments | https://news.ycombinator.com/item?id=47874647

Live notifications were one of the big missing pieces to implement whole apps on a sqlite backend, and now there's a decent solution.

amai2d ago

Which version of SQLite?

fragmede2d ago

Yes! Can it replace CSV, please?

butterNaN3d ago

(US)

GeorgeTirebiter2d ago

Now, if only the LoC would recognize the brilliance of the Fossil SCM ....

guelo3d ago

I get annoyed at all the other DBs that require their own heavy duty server process when for 90% of my projects there is only one client, my app server. Is there a DB that combines sqlite's embedded simplicity with higher concurrent write throughput?

TeriyakiBomb2d ago

I think the concurrent write thing is not as much of an issue nowadays with the speed of NVMEs and WAL.

graemep2d ago

Firebird, maybe?
