Next paragraph mentions TOAST, and this byte is related to that. The low-order bits (on little-endian platforms) determine how the value is stored: 00 means an inline value with a 4-byte header (the first 4 bytes are the total length), 10 means a 4-byte header with the data compressed, the whole byte being 0x01 marks a pointer into the TOAST table, and any other byte with the low bit set is a 1-byte header, used for values shorter than 127 bytes, where the total length (including the header byte) is the byte >> 1. 0x25 has the low bit set, so the length is 0x25 >> 1 = 18 = 0x12: that byte followed by the 17 bytes of "Equatorial Guinea".
Edit: the reason endianness matters is that the same representation is also used in memory, where the whole first word is interpreted as one length value. The TOAST tag bits have to be in the first byte, which on big endian is most easily done by placing them in the two highest-order bits of that word, i.e. the two highest bits of that byte.
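To make the byte-twiddling concrete, here's a small Python sketch (my own illustration, assuming the little-endian 1-byte-header layout described above, not actual PostgreSQL code) that decodes a short varlena value:

```python
def decode_short_varlena(buf: bytes) -> str:
    # Low bit set => 1-byte header; total length (header byte included)
    # is the header byte shifted right by one.
    hdr = buf[0]
    assert hdr & 0x01, "not a 1-byte (short) varlena header"
    total_len = hdr >> 1
    return buf[1:total_len].decode()

raw = bytes([0x25]) + b"Equatorial Guinea"
print(decode_short_varlena(raw))  # -> Equatorial Guinea
```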
In this case the substring is part of the author's name. Such names are not at all uncommon.
You may have heard of him more recently with the Hexagonal Architecture approach.
Apparently it might be about dang's UX and not against the mod. ¯\_(ツ)_/¯
“The rumors are true!”
(Although less amusing, you could also just ask the IT guys and gals)
Is "tube" on a blocklist as well?
> The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport. This document is intended to provide adequate detail to create a new implementation of the columnar format without the aid of an existing implementation. We utilize Google’s Flatbuffers project for metadata serialization, so it will be necessary to refer to the project’s Flatbuffers protocol definition files while reading this document. The columnar format has some key features:
> Data adjacency for sequential access (scans)
> O(1) (constant-time) random access
> SIMD and vectorization-friendly
> Relocatable without “pointer swizzling”, allowing for true zero-copy access in shared memory
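The "O(1) random access" point falls out of the columnar layout itself: variable-length values live in one contiguous data buffer with a separate offsets buffer, so no per-value pointers are needed. A toy Python sketch of the idea (my illustration, not Arrow's actual buffers, which use int32/int64 offsets plus validity bitmaps):

```python
# Arrow-style variable-length string column: one data buffer, one offsets buffer.
values = ["Antigua and Barbuda", "AG", "ATG"]
data = "".join(values).encode()
offsets = [0]
for v in values:
    offsets.append(offsets[-1] + len(v.encode()))

def get(i: int) -> str:
    # O(1): two offset lookups and one slice, no pointer chasing.
    return data[offsets[i]:offsets[i + 1]].decode()

print(get(1))  # -> AG
```

Because the buffers contain no absolute pointers, the same bytes can be mapped into another process's address space and read as-is, which is the "no pointer swizzling" / zero-copy property the spec mentions.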
Are the major SQL file formats already SIMD optimized and zero-copy across TCP/IP?
Arrow doesn't do full or partial indexes.
Apache Arrow supports Feather and Parquet on-disk file formats. Feather is on-disk Arrow IPC, now with default LZ4 compression or optionally ZSTD.
Some databases support Parquet as their flat file format, i.e. the format that a DBMS process like PostgreSQL or MySQL wraps in a logged, permissioned, and cached query interface with query planning.
IIUC, with Parquet it's possible both to query data tables offline, as files on disk with normal tools, and to query them online through a persistent process with tunable parameters, optionally with centrally enforced schema and referential integrity.
From https://stackoverflow.com/questions/48083405/what-are-the-di... :
> Parquet format is designed for long-term storage, where Arrow is more intended for short term or ephemeral storage
> Parquet is more expensive to write than Feather as it features more layers of encoding and compression. Feather is unmodified raw columnar Arrow memory. We will probably add simple compression to Feather in the future.
> Due to dictionary encoding, RLE encoding, and data page compression, Parquet files will often be much smaller than Feather files
> Parquet is a standard storage format for analytics that's supported by many different systems: Spark, Hive, Impala, various AWS services, in future by BigQuery, etc. So if you are doing analytics, Parquet is a good option as a reference storage format for query by multiple systems
Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?
Edit: What are some of the DLT solutions for indexing given a consensus-controlled message spec designed for synchronization?
- cosmos/iavl: a Merkleized AVL+ tree (a balanced search tree with Merkle hashes and snapshots to prevent tampering and enable synchronization) https://github.com/cosmos/iavl/blob/master/docs/overview.md
- google/trillian has Merkle-hashed edges between rows, in order, in the table, but is centralized
- "EVM Query Language: SQL-Like Language for Ethereum" (2024) https://news.ycombinator.com/item?id=41124567 : [...]
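Common to all of these is a Merkle tree at the core: any change to a leaf changes the root hash, so two replicas can compare roots to detect divergence or tampering. A minimal Python sketch of that core idea (not the iavl or trillian implementation):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Distinct prefixes for leaf vs. inner nodes avoid trivial collisions.
    level = [h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3"])
print(root.hex())
```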
While logical decoding is about WAL, it is not related to the recovery process. Logical decoding is a mechanism to convert the WAL entries back into the high-level operations that caused the WAL entries, for example for replication or audit.
I never entirely got it. Either your WAL is on more reliable media, or duplicated. If it's just "easier" to write the WAL and faster to read off properly indexed state, OK, that's a local optimisation.
If your WAL is on the same filesystem behind a vendor-specific RAID controller, you're still stuffed if that RAID card dies.
It doesn't guarantee you don't lose data written during the crash, but it does guarantee you can get the database back into a usable state.
Logical decoding (which needs wal_level=logical which extends the WAL format with additional metadata) is about parsing the WAL for other purposes than performing the recovery (or physical replication, which is essentially the same thing as recovery, but performed on another instance of the same cluster). The name "logical decoding" is certainly intended to emphasize that there are other uses for that than logical replication, but these are not that different from logical replication on this level (get a stream of changed tuples in tables).
printf '{HEX STRING}' | xxd -r -p | xxd
Make sure to omit the leading `\x`. E.g.:
printf '0a00000029416e746967756120616e64204261726275646107414709415447093032381d49534f20333136362d323a414713416d657269636173414c6174696e20416d657269636120616e64207468652043617269626265616e1543617269626265616e093031390934313909303239' | xxd -r -p | xxd
00000000: 0a00 0000 2941 6e74 6967 7561 2061 6e64 ....)Antigua and
00000010: 2042 6172 6275 6461 0741 4709 4154 4709 Barbuda.AG.ATG.
00000020: 3032 381d 4953 4f20 3331 3636 2d32 3a41 028.ISO 3166-2:A
00000030: 4713 416d 6572 6963 6173 414c 6174 696e G.AmericasALatin
00000040: 2041 6d65 7269 6361 2061 6e64 2074 6865 America and the
00000050: 2043 6172 6962 6265 616e 1543 6172 6962 Caribbean.Carib
00000060: 6265 616e 0930 3139 0934 3139 0930 3239 bean.019.419.029

docker: Error response from daemon: create ./pg-data: "./pg-data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
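Returning to the hex dump: after the leading 0a 00 00 00 word (whose role I'm not certain about from the dump alone, so it's skipped here), the row is just a sequence of 1-byte-header varlenas, each header byte >> 1 giving the total field length. A Python sketch that decodes it:

```python
def decode_fields(data: bytes) -> list[str]:
    fields, i = [], 0
    while i < len(data):
        hdr = data[i]
        assert hdr & 0x01, f"not a 1-byte varlena header at offset {i}"
        total = hdr >> 1                      # length includes the header byte
        fields.append(data[i + 1:i + total].decode())
        i += total
    return fields

row = bytes.fromhex(
    "0a000000"                                  # 4-byte prefix (skipped below)
    "29416e746967756120616e642042617262756461"  # "Antigua and Barbuda"
    "074147"                                    # "AG"
    "09415447"                                  # "ATG"
    "09303238"                                  # "028"
    "1d49534f20333136362d323a4147"              # "ISO 3166-2:AG"
    "13416d657269636173"                        # "Americas"
    "414c6174696e20416d657269636120616e6420"
    "7468652043617269626265616e"                # "Latin America and the Caribbean"
    "1543617269626265616e"                      # "Caribbean"
    "093031390934313909303239"                  # "019", "419", "029"
)
print(decode_fields(row[4:]))  # -> ['Antigua and Barbuda', 'AG', 'ATG', ...]
```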
>Something really important about tables which isn’t obvious at first is that, even though they might have sequential primary keys, tables are not ordered.
This was very surprising to read.
It is weird that "--data-checksums" isn't the default for new databases, even though it costs a bit of performance. Integrity should be more important than performance.
Here's a benchmarking exercise I found: https://www-staging.commandprompt.com/uploads/images/Command...
With a tidy summary:
> Any application with a high shared buffers hit ratio: little difference.
> Any application with a high ratio of reads/writes: little difference.
> Data logging application with a low ratio of reads/inserts, and few updates and deletes: little difference.
> Application with an equal ratio of reads/inserts, or many updates or deletes, and a low shared buffers hit ratio (for example, an ETL workload), especially where the rows are scattered among disk pages: expect double or greater CPU and disk I/O use.
> Run pg_dump on a database where all rows have already been previously selected by applications: little difference.
> Run pg_dump on a database with large quantities of rows inserted to insert-only tables: expect roughly double CPU and disk I/O use.
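To illustrate what page checksums buy you, here's a toy Python sketch; the checksum function is a CRC stand-in for illustration only (PostgreSQL's real per-page checksum is an FNV-based hash mixed with the block number):

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default page size

def page_checksum(page: bytes) -> int:
    # Toy stand-in for PostgreSQL's per-page checksum.
    return zlib.crc32(page)

page = bytearray(PAGE_SIZE)
page[0:5] = b"hello"
stored = page_checksum(bytes(page))          # computed when the page is written

page[100] ^= 0xFF                            # simulate a bit flip on disk
assert page_checksum(bytes(page)) != stored  # mismatch detected on read
```

The cost profile in the summary above follows from this: the checksum is recomputed on every write and verified on every read from disk, so workloads that churn many scattered pages pay the most, while workloads served from shared buffers pay almost nothing.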
Most often it is. But not always. There certainly are cases where speed is far more important than integrity in databases. I cannot think of a case where this would be true for an RDBMS or even a document DB (though MongoDB had different opinions on this...).
But e.g. Redis as a caching server, or memcached, or even the non-normalized data that I have in a PG that can be reproduced from other sources easily in case of corruption or staleness: it's fine to trade integrity for speed there.
The issue is that page size caps row size (for in-row storage). Also, if you have a smart clustering index, larger pages can make more efficient use of index addressing. So it's a trade-off.
https://docs.oracle.com/en/database/oracle/oracle-database/1...
> Default value 8192
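A back-of-the-envelope sketch of that trade-off (my own illustration; the 16-byte entry size and the neglect of page headers are simplifying assumptions, not real B-tree accounting):

```python
def index_fanout(page_size: int, entry_size: int = 16) -> int:
    # Rough fan-out of a B-tree node: how many child entries fit per page.
    return page_size // entry_size

for ps in (4096, 8192, 32768):
    print(ps, index_fanout(ps))
```

Bigger pages mean higher fan-out and thus shallower trees, but every row stored in-row must still fit in a single page, which is how page size caps row size.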
```
const n = document.getElementById('nav-header');
document.addEventListener(
  'click',
  s => {
    u.hidden ||
      s.target === null ||
      n === null ||
      n.contains(s.target) ||
      r();
  }
);
```
Above, in the same function, there exists the call `e.addEventListener('click', r);`, which is likely closer to what the author intended. The quoted handler fires any time the page is clicked, which opens the nav menu when it shouldn't.

I disagree. SQLite does a good job of uniting the two worlds: complex SQL queries with excellent data consistency, and simple file(s). Although SQLite is for sure not a one-size-fits-all solution.
Nor is Postgres. But PG is surprisingly versatile. E.g. with some extensions it can be used as key-value storage (hashtable), a document database, a time-series DB and so on. And it works quite well, beyond "good enough" for many use cases. An added benefit, aside from having to run only one DB server, is that you can mix it: part relational, part document, etc.
But the PG versions hardly ever get as good as focused, dedicated solutions. Which makes sense if you think about it: a team developing a dedicated key-value store that does that and only that, for years, will always produce a better key-value store than one bolted onto a generic RDBMS.
A practical example was where we used the ltree extension to store ever-growing hierarchies. We needed access control over subtrees (so that the X report for John only includes the entities of John's division and lower). While it worked in PG, it turned out that "simply replacing" it with OpenLDAP, which had all this built in, made it faster, simpler, and above all easier to maintain.
(Enjoyed the post)