Some years ago we commissioned a developer to make CultureObject[0], a free and open source WordPress plugin to make it easier to ingest collections data for display on the web. At the heart it's a glorified data importer, and many people just use the CSV mode to sync and import collections data.
It requires some dev effort - we've built an add-on which makes this easier but there's no denying that search, faceting and display need knowledge of WordPress development.
Three years ago we then launched The Museum Platform[1] which is a more SaaS based model - we take away the need for dev skills and ask clients to just send us a CSV and any related media and we do the hard work. It's WordPress again but a modified version where we also facilitate storytelling and narrative around the ingested collections.
The interesting thing about this journey is that the requirement to "get a collection online" is apparently and theoretically easy. But the reality is it gets hard quite quickly as the need for search / filtering appears, and it gets harder still as scale comes into it. 1000 records is fine. 100,000 gets quite a bit harder.
There are also many subtleties - particularly with museum collections. "Location" of a record could be where it was collected, or where it is now, or where it's on display. Relational stuff is hard, as are taxonomies and authority terms. It's hard to generalise and it's hard to scale.
[0] https://cultureobject.co.uk/ [1] https://themuseumplatform.com/
We chose WordPress because of its ubiquity and power - plus it's insanely easy to host and use as a non technical editor, which (last time I looked) can't be said of Drupal.
That being said, I found it much, much easier to develop than WordPress.
It seems like the data storage / search / filtering aspects of your software would be really fun and interesting to develop flexible solutions to. The WordPress aspects probably wouldn't be so fun to maintain, but it's always pick-your-poison when it comes to CMSs unless you develop your own in-house.
That being said, a collection CMS doesn't necessarily need to have all the plugins and doodads that a WordPress site does. It could be something bare-bones and extensible that was written to be more tightly coupled to a layer that interpreted the underlying data structure. Just toying with the idea, maybe even something that flattened the data views of the collection into static webpages for deployment so that at least some of the indexing could be handled by naming conventions and directory structure without recourse to database searches.
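To make the static-pages idea a bit more concrete, here is a rough Python sketch, not a real tool. It assumes a CSV with `id` and `title` columns plus one facet column (here a made-up `material` column), and writes one HTML page per record into a folder named after the facet value, so listing a directory stands in for a database query. The function names and the sample data are invented for illustration.

```python
import csv
import html
import io
import re
from pathlib import Path

# Invented sample data; a real export would come from a collections system.
SAMPLE = """id,title,material
A1,Bronze axe head,Bronze
A2,Clay lamp,Ceramic
A3,Bronze brooch,Bronze
"""


def slugify(value: str) -> str:
    """Turn a free-text value into a safe, lowercase path segment."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return slug or "unknown"


def build_static_site(csv_text: str, out_dir: Path, facet: str) -> list[Path]:
    """Write one HTML page per record under out_dir/<facet value>/<id>.html."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The folder name doubles as the index for this facet.
        folder = out_dir / slugify(row[facet])
        folder.mkdir(parents=True, exist_ok=True)
        page = folder / f"{slugify(row['id'])}.html"
        fields = "".join(
            f"<dt>{html.escape(key)}</dt><dd>{html.escape(value or '')}</dd>"
            for key, value in row.items()
        )
        page.write_text(
            f"<h1>{html.escape(row['title'])}</h1><dl>{fields}</dl>",
            encoding="utf-8",
        )
        written.append(page)
    return written
```

Calling `build_static_site(SAMPLE, Path("site"), facet="material")` would leave `site/bronze/` and `site/ceramic/` folders behind. This only covers one facet at a time; anything like free-text search would still need an index of some kind.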
The world could definitely use an open source kit along these lines, with a GUI backend that would let non-developers build their own table structure and search parameters, draw up some page layouts, and just generate a searchable site that collated CSV records with images.
Some of this actually reminds me of what HyperCard could do... it allowed some really interesting experiments with user-classified data. Like this, from 1989: https://core.ac.uk/download/pdf/225955134.pdf
Relational stuff is hard, as you say, but in a structure built around a collection it seems like you could come up with a DSL that defined which columns needed to relate to other tables (any column with repeating data, for instance), suggest making that column "normalized", and automatically generate a linked table.
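As a rough illustration of the "suggest normalizing" step (plain Python rather than a DSL, and only a sketch): flag any column whose distinct values make up at most half of the rows, then split it into a lookup table plus integer keys. The 0.5 threshold, the function names and the sample rows are all assumptions made up for this example.

```python
def repeating_columns(rows: list[dict], max_ratio: float = 0.5) -> list[str]:
    """Return columns whose distinct-value count is at most max_ratio of the row count."""
    if not rows:
        return []
    return [
        col
        for col in rows[0]
        if len({row[col] for row in rows}) / len(rows) <= max_ratio
    ]


def normalize(rows: list[dict], column: str) -> tuple[dict, list[dict]]:
    """Split `column` into a lookup table and replace its values with integer keys."""
    lookup: dict[str, int] = {}
    linked = []
    for row in rows:
        # Reuse the key if the value was seen before, otherwise assign the next one.
        key = lookup.setdefault(row[column], len(lookup) + 1)
        rest = {k: v for k, v in row.items() if k != column}
        linked.append({**rest, f"{column}_id": key})
    return lookup, linked


# Invented sample rows.
ROWS = [
    {"id": "1", "title": "Teapot", "maker": "Wedgwood"},
    {"id": "2", "title": "Vase", "maker": "Wedgwood"},
    {"id": "3", "title": "Plate", "maker": "Spode"},
    {"id": "4", "title": "Jug", "maker": "Wedgwood"},
]
```

Here `repeating_columns(ROWS)` flags only `maker`, and `normalize(ROWS, "maker")` returns a two-entry maker table plus rows carrying a `maker_id`. A real tool would need to cope with spelling variants and authority terms, which is where it gets hard.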
One of my favorite small databases is https://hiregoats.com/ - it's a simple site showing goat herds for rent (for clearing brush in a sustainable way, etc.), monetized with a $35 listing fee and nothing else. There's no e-commerce, no attempt to insert the site into the transaction or funds flow, no bells and whistles. Certainly this doesn't scale to other niches where suppliers are less incentivized to pay a listing fee, but I'd love to see this kind of thing be more common, and incentivize people to curate.
Or if you want to go super-niche, Panorama is still around, and (they say) the longest-running Mac software developer apart from Microsoft. https://www.provue.com
Either one makes it easy to build a database+interface.
Odd pricing though: pay-in-advance credits. Ummm, not something I'd like to use for work when I'm in the middle of an important analysis with a deadline and I (inevitably) run out of credits and have to start faffing about with in-app purchases. Maybe it's not that bad and I'm being unfair.
- DB platforms (Best for more advanced DB) : Airtable, getgrist.com
- wikis+DB platforms (Best for building a site around the DB) : notion.so, coda.io
- Airtable/GSheet publishing (Best for simple/custom UI) : glideapps.com, siteoly.com
- Bookmarks/Collections (Best for links/References) : Zotero (online groups), are.na
- List sharing (Best for open collaboration?) : listium.com, (ranker.com ?)
- BI platforms (Best for advanced filters/charts) : polymersearch.com, Google Data Studio
- Data Set Hosting (Best for downloading?) : data.world, kaggle.com
All these allow publishing, and some collaboration
I spend almost all of my time thinking about this class of problems and hanging out with other people who do, and sadly it's vanishingly rare to run into anyone outside of academia who's trying to use the classic semantic web stack (RDF and suchlike) to build this kind of thing.
The commercial community of practice is small for sure.
Excel doesn’t cover the publishing and discovery aspect. It is absolutely atrocious from a machine usability and schema perspective, never mind performance, etc.
Even if you think Excel does address those, I think the shortcomings of the format should rule it out. It is better to have a more powerful tool, and fix the usability aspects, rather than trying to proverbially rub glitter on what amounts to a turd of a format.
It's very simple. If your small database was about cars, your structure might look something like this:
  database/
    grammar/
      engine.grammar
      interior.grammar
    things/
      model3.car
      camry.car
The `grammar` files are written in a Tree Language called Grammar. Those are your schema files. You basically create a new syntax-free plain text "language" for storing your data, in this case 1 "car" file per model of car. It was a pipedream of mine until the M1's came out. Those changed everything, because then it became fast enough to actually do it.
We have a new release coming out soon with a new query language that will change everything. Here is the source code: https://github.com/breck7/jtree/tree/main/treeBase
Amen. I'm surprised the post doesn't mention sqlite3 WASM/JS (https://sqlite.org/wasm/doc/trunk/about.md). That, paired with an easy-to-use faceting library, would go a long way.
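For a sense of how little SQL faceting needs, here is a sketch using Python's standard `sqlite3` module instead of the WASM/JS build (the SQL itself should carry over). The `objects` table, its columns and the sample rows are invented for illustration, and the facet name is checked against an allowlist because it is interpolated into the query.

```python
import sqlite3

# Columns that may be used as facets; anything else is rejected.
FACETS = ("material", "period")


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a few invented records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "id INTEGER PRIMARY KEY, title TEXT, material TEXT, period TEXT)"
    )
    conn.executemany(
        "INSERT INTO objects (title, material, period) VALUES (?, ?, ?)",
        [
            ("Bronze axe head", "Bronze", "Iron Age"),
            ("Clay lamp", "Ceramic", "Roman"),
            ("Bronze brooch", "Bronze", "Roman"),
            ("Glass bead", "Glass", "Roman"),
        ],
    )
    return conn


def facet_counts(conn: sqlite3.Connection, facet: str, title_like: str = "%"):
    """Count matching records per facet value, most common first."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    sql = (
        f"SELECT {facet}, COUNT(*) FROM objects "
        f"WHERE title LIKE ? GROUP BY {facet} "
        f"ORDER BY COUNT(*) DESC, {facet}"
    )
    return conn.execute(sql, (title_like,)).fetchall()
```

`facet_counts(build_db(), "material")` gives the counts you would show next to each checkbox, and passing `title_like="%bronze%"` narrows them to the current search. One `GROUP BY` per facet is fine at small scale; it is an assumption that this stays fast enough for larger collections.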
Imagine if there was a niche search engine for everything, and the search engine was customized for that niche.
I think the main problems here are:
- Data format and ingestion
- Domain-specific indexing/relevance
Most data is super messy and is not accessible through nice APIs, which presents a problem. You might need custom ingestion for each niche and it's pretty likely you'll need some rules to standardize data from multiple sources, neither of which seems easy to generalize and automate because they're very domain-specific.
The other part to this is indexing/relevance so the search feels good to use. Some fields are obviously going to be more important than others and people are going to want to utilize search for things that are hard to predict ahead of time.
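A toy version of "some fields matter more than others", just to show where the domain knowledge ends up. The field names and weights are made-up assumptions, and this is simple term counting, not what a real search engine does.

```python
# Hypothetical per-field weights; choosing these is the domain-specific part.
FIELD_WEIGHTS = {"title": 3.0, "maker": 2.0, "description": 1.0}


def score(record: dict, query: str) -> float:
    """Sum weighted counts of query terms across the weighted fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


def search(records: list[dict], query: str) -> list[dict]:
    """Return matching records, best score first; non-matches are dropped."""
    scored = [(score(record, query), record) for record in records]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [record for value, record in ranked if value > 0]
```

With these weights a record with "bronze" in its title outranks one that only mentions it in the description. Every niche would need its own table of weights, which is part of why this is hard to generalize.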
To use the author's example of artists in Brooklyn, people might want to search for artists near them. Now you have to gather location data, format it, ingest it, index it and add it to the search UI.
The fact that adding another field to index on is a vertical integration adds a lot of overhead.
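For the "artists near them" case, the ranking itself is the easy part. A minimal sketch, assuming each record already carries `lat` and `lon` keys (hypothetical names) and using the standard haversine great-circle formula with a mean Earth radius of 6371 km:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; good enough for "near me" ranking


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (
        sin(dlat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest(records: list[dict], lat: float, lon: float, limit: int = 10) -> list[dict]:
    """Return up to `limit` records with coordinates, closest first."""
    located = [
        r for r in records
        if r.get("lat") is not None and r.get("lon") is not None
    ]
    located.sort(key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))
    return located[:limit]
```

Records without coordinates simply drop out of the results, which hints at the real cost: gathering and cleaning the location data, not the distance calculation.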
All of this stuff in isolation is not difficult, but when you put it together it becomes quite a lot of work that generally isn't easily scalable.
https://assets.amazon.science/c4/11/de2606884b63bf4d95190a3c...
The "small database" in question is, well, an HTML page. It can be shared and passed around by selecting the portions of it that you need and pressing Ctrl+C/Ctrl+V. Search is accomplished by the browser using Ctrl+F. Collaboration can take many forms - wikis, comments, forums, live editing. Links between databases are what URL links are. The database that OP is looking for is a page of text (for unstructured data) or somewhat structured solutions like CSV, JSON, or YAML.
Now, yes, there are certain participants on the WWW who make poor web design choices that cause agreed-upon functionality to break. E.g. unnecessary pagination or accordions breaking Ctrl+F, not offering data for download, not having useful URL paths etc.
Substack is an interesting example. It's great for written content with a few images, which mostly looks the same everywhere. But it lacks great customisation features that I think a database would need, because that stuff is hard to do.
If I had to propose a solution, it would be this: if you want to do a small database, do it. Experimentation in cyberspace is very cheap. These days you have lots of resources for everything online. It can be intimidating, and can lead to analysis paralysis. I'm supposed to be a professional developer and still struggle with that. But one thing that has helped me a lot recently is to try stuff, see if it works, and if it fails, ask questions (to either real people or ChatGPT/Copilot; Copilot is especially valuable to get in a "just keep writing, editing comes later" mood). It's not always fun, in fact it can be quite frustrating, but that's how things are.
In the end, this is about decentralisation and you can't have proper decentralisation if you don't also decentralise the skills, the know-how. For example, there has been a lot of talk about Mastodon as a decentralised alternative to Twitter. And it is one. But if you simply go from being a user on Twitter to being a user on Mastodon, well you don't regain much control. On the other hand if you try running a small instance, even just a local instance to see how it works, or maybe add a few features to your preferred client (it can be code, but it could also be helping translation, or maybe a color scheme (you wouldn't believe how many color schemes are barely usable when you're colorblind)), well then you start being in control.
In the meantime I've made a big update to the Airtable with links to tools, examples and further reading:
https://airtable.com/shrYY94GrqVB4HUsi/tblHPrdomiPbLpod6/viw...
What’s missing is the added search + UI capabilities.
I think about SaaS ideas a lot and this is actually quite a common one (though I’m generally thinking of a specific niche). Enabling people to craft and expose datasets would surely be a great startup.
Even with advanced views offered by tools like ERDLab.io it is a pain in the ass to collaborate on large schemas at various stages of development.
We just need to somehow tie it together so anyone can explain their use case, and show an example of the data in plain English, then lock in a schema and feed everything in.
For more complex data to be shared, maybe it can be csv/md/mdx shared over git as well?
It can have a stable URL and be searchable from GitHub, search engines, and third-party indices.
Funnily enough, a friend and I have been building https://Trayja.com, a tool which does this exact thing, with a focus on the "community" aspect. There's a huge amount of wisdom in communities, whose value could be multiplied if it would be aggregated in a structured, indexable, searchable way. This article articulated so much of what I've been trying to explain about my project.