Boolean & multi-choice indices are just companion arrays where position i corresponds to a property of document i in the index: boolean for binary attributes (for example, whether the item has free shipping or not), or using a bigger integer space to encode more options, like say an int8 coupled with helper functions that check which bit is set, maybe for some set of 8 categories the items can be filtered by).
The “index” is just the serialized arrays backing the sparse matrix, the arrays backing the filters, and helper functions for decoding what the filter bits mean.
A query is then just applying the filters followed by performing the sparse matrix inner product and sorting.
It’s very basic, but allows you to heavily optimize it, whether optimizing for deletes, writes, certain heavily used filters, etc.
And you can of course add whatever fancy NLP stuff on top of or in place of the sparse matrix as well.
If you think of these as sparse row vectors (the columns correspond to all vocabulary entries), then you store them as a matrix where you stack on another row for each “document” in your data set.
Later on when you get a new “document” at query time, you transform it into the same bag of words vector format, and then an inner product between the matrix and the query vector corresponds to a type of relevance / similarity useful for sorting into a ranked order of results.
In practical situations you have to work harder, because you need more units of text than just words (such as n-grams), and the raw term counts usually need to be weighted (e.g. matching a 3-gram probably means more than matching a single word) or normalized (e.g. longer documents happen to have more words, but that doesn’t mean they are more similar), and you need to account for results that are historically more popular or results that are newer.
It’s a very simple approach to document search, but it works well and there are extensions that utilize word embeddings or models that predict rankings of results.
Once you get a system running with the term-document matrix, it is a nice platform for more advanced experimentation and machine learning feature development.