Create a functional index.
If you have rows like pid, {name: 'val', ...} and an index on name, with a million Johns and one Jane, good luck getting fast results.
Fast results: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=dd9370966b1528....
Postgres does actually keep statistics on json columns, but if you've got a functional index on the table and the query uses it, then it doesn't matter whether there is one "Jane" and a million "Johns". You're looking up a key in a btree index.
There are ways around the statistics issue in some cases, e.g. defining a functional index on a jsonb property will collect proper statistics.
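A minimal sketch of that, with hypothetical table and column names. After `ANALYZE`, PostgreSQL gathers statistics on the indexed expression itself, so the planner can estimate selectivity for a given name:

```sql
-- Hypothetical schema: a jsonb document keyed by pid
CREATE TABLE people (
    pid bigint PRIMARY KEY,
    doc jsonb NOT NULL
);

-- Functional (expression) index on one jsonb property
CREATE INDEX people_name_idx ON people ((doc ->> 'name'));

-- Collects statistics, including on the indexed expression
ANALYZE people;

-- The predicate matches the index expression, so it can use the index
SELECT pid FROM people WHERE doc ->> 'name' = 'Jane';
```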
* Extract the attributes you're interested in into their own columns, index these. With the extraction happening outside the database, this is the most flexible option.
* Similar to above, use a trigger to automatically extract these attributes.
* Also similar to above, use a generated column[0] to automatically extract these attributes.
* Create an index on the expression[1] you use to extract the attributes.
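The last two options might look something like this (table and column names are illustrative; generated columns need PostgreSQL 12+):

```sql
-- Option 3: a stored generated column extracted from the jsonb document
CREATE TABLE docs (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body jsonb NOT NULL,
    name text GENERATED ALWAYS AS (body ->> 'name') STORED
);
CREATE INDEX docs_name_idx ON docs (name);

-- Option 4: no extra column, just an index on the extraction expression
CREATE INDEX docs_name_expr_idx ON docs ((body ->> 'name'));
```

With either approach, updating `body` keeps the indexed value in sync automatically.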
My use of JSON in PostgreSQL tends towards the first option. This works well enough for cases where documents are ingested and queried, but not updated. The last three options are automatic - add or change the JSON document and the extracted/indexed values are automatically updated.
[0] https://www.postgresql.org/docs/12/ddl-generated-columns.htm...
[1] https://www.postgresql.org/docs/12/indexes-expressional.html
There is always a trade-off. If the column is important enough, then you are right, it should stand on its own, but then you lose the JSON flexibility. I personally use jsonb almost exclusively when I only care about the object as a whole and rarely need to poke around to find an exact value. As the grandparent comment mentions, if you do need a particular value, it might be slower if your JSON records are too different (if you think about it, how can you calculate selectivity stats on a value when you have no idea how wide or how varied the JSON records are?).