Okay, so looking at the first two tests - "Retrieve all documents" and "Get 1000 documents by ID" ...
If you switch the order around, does it make a difference to the benchmark? Because I suspect that the first test preloads all records into RAM, and the second test simply searches RAM, which is not what we usually do with SQLite. We don't cache all records before searching.
Switch those first two tests around, and let's see if it makes a difference.
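
To make the suggestion concrete, here is a rough sketch of what I mean, in Python with the standard `sqlite3` module. It is not the benchmark's actual code: the `docs` table, the function names and the document counts are all made up for illustration. Each ordering gets a fresh connection so SQLite's own page cache starts empty, but the OS file cache is not flushed, so the second ordering may still benefit from the first. Treat the timings as a hint only.

```python
import os
import random
import sqlite3
import tempfile
import time


def build_db(path, n_docs):
    # Hypothetical schema: one table of small text documents keyed by integer id.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO docs (id, body) VALUES (?, ?)",
        ((i, f"document {i}") for i in range(n_docs)),
    )
    conn.commit()
    conn.close()


def retrieve_all(conn):
    # Stand-in for the "Retrieve all documents" test.
    return len(conn.execute("SELECT id, body FROM docs").fetchall())


def get_by_id(conn, ids):
    # Stand-in for the "Get 1000 documents by ID" test: one query per id.
    found = 0
    for doc_id in ids:
        row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
        if row is not None:
            found += 1
    return found


def run_order(path, order, ids):
    # Fresh connection per ordering, so the connection-level cache starts cold.
    conn = sqlite3.connect(path)
    timings, counts = {}, {}
    try:
        for name in order:
            start = time.perf_counter()
            if name == "retrieve_all":
                counts[name] = retrieve_all(conn)
            else:
                counts[name] = get_by_id(conn, ids)
            timings[name] = time.perf_counter() - start
    finally:
        conn.close()
    return timings, counts


def compare_orders(n_docs=5000, n_lookups=1000, seed=0):
    # Same random ids for both orderings, so only the test order changes.
    rng = random.Random(seed)
    ids = [rng.randrange(n_docs) for _ in range(n_lookups)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        build_db(path, n_docs)
        original = run_order(path, ["retrieve_all", "get_by_id"], ids)
        swapped = run_order(path, ["get_by_id", "retrieve_all"], ids)
    return original, swapped


if __name__ == "__main__":
    for label, (timings, _counts) in zip(("original", "swapped"), compare_orders()):
        print(label, {name: f"{secs:.4f}s" for name, secs in timings.items()})
```

If the by-ID test gets noticeably slower when it runs first, that would support the idea that the "retrieve all" test is warming a cache for it.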