Every time new issues cropped up we would put new data in the test data designed to reproduce it. Every edge case we would run into.
It provided so much confidence because it would catch and trigger so many edge cases that testing with mocks or by hand would miss.
Edit: also, it’s great for test/dev environments. You don’t have to worry about losing important data or filling new environments with data. Just start with the full test data and you’re good to go. It’s got stuff for all the corner cases already. Something got screwed up? Blow it away and reload, it’s not precious.
at my company they just told us to stop reporting edge cases. much easier, much cheaper.
- we create a template database using the migrations
- for *every* integration test we do `CREATE DATABASE test123 TEMPLATE test_template;`
- we tune the PostgreSQL instance inside Docker to speed up things, for exampling disabling synchronous_commit
On a successful test, we drop the test123 database. On a failed test, we keep the database around, so we can inspect it a bit.The really great thing about this approach (IMHO), is that you can validate certain constraint violations.
For example, exclusion constraints are great for modelling certain use cases where overlapping ranges should be avoided. In our (go) code, the test cases can use the sqlstate code, or the constraint name to figure out if we hit the error we expect to hit.
This approach is pretty much as fast as our unit tests (your mileage may vary), but it prevents way more bugs from being merged into our codebase.
> Testcontainers for .NET is a library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions. The library is built on top of the .NET Docker remote API and provides a lightweight implementation to support your test environment in all circumstances.
Edit: Ok, example helps. https://dotnet.testcontainers.org/examples/aspnet/
I can declare docker infrastructure from my code, right?
Our overall strategy is to create a master "test" DB with a test dataset, and for each test, copy this master DB to a test specific DB (CREATE DATABASE <testdb> TEMPLATE <master>) so that tests can run in parallel without interfering with each other and without the significant overhead of a "from scratch" DB initialization.
For schema migrations, we build the "branch/PR" version and the "main" version, then we check that 'init DB with "main" + migration with the "branch/PR" version' results in the same schema as 'init the DB directly with the "branch/PR" version' using apgdiff.
This strategy could probably be extended to migrating from every older version by building each tag, but we don't have that need.
We could also probably improve checks on the data itself however as for now, we only check the schemas.
Few things to note:
* it's still possible to run the tests outside of docker and use a local DB instead with some light setup (it's faster than running everything in docker when developing)
* docker argument --tmpfs <dir/of/db> is quite good, assuming you have enough ram for your dataset
* few configuration tweaks on the DB, like max connection might be necessary.
Overall, we are quite happy with this setup as it permits to implement end to end integration tests quite easily without spending too much time mocking dependencies.
As a general pattern, I find instantiating dependencies internal to your service (like a DB or Queue) to be the way to go, with mocking only for external dependencies (like external APIs) or exceptionally to reach a specific code branch (specially error handling sections).
Or along similar lines you could divide it such that developers can test things locally on their machines with SQLite, but once it gets pushed into CI (and passes code review etc.) it's tested against the heavy db.
Since OP mentioned DBT (kind of weird, hopefully, it's at least DBT2, since DBT is very old), they mean to test the productivity of the system rather than correctness of some queries (typical tests that deal with workloads similar to DBT2 are, eg. pgbench). Running pgbench over a database setup in a container will tell you nothing / might just confuse you, if you don't understand the difference between how database is meant to be setup and what happens in container.
Took about a week of duplicating happy path tests, investigating gaps, and switching inputs/assertion types to get everything passing, but less than a week later we had our first verifiable test failure.
I think the answers would depend on the types of tests that the term "this" encompasses. From how I understand it, calling something a unit test or an integration test depends on the context of what is is being tested. For example, if a developer is writing a unit test for a HTTP handler, and the handler implementation includes calls to an external database, then one would have to use a mock for the database, in order for the test to be deemed a true unit test [1]. If the developer is writing an integration test for the same HTTP handler, then the database would have to be a real database implementation [2].
On other hand, if the developer were testing SQL queries for syntactical or logical correctness against a database, these tests would want to use a real database implementation. Note that though the test uses a real database, it is still a true unit test [3]. Additionally, note that using a mocked database here would not serve the purpose of the tests, which is to catch syntactical and logical errors against a database engine. This can, of course, only be achieved by using a real database—or, if you insisted on using a "mock", then, by implementing an entire SQL engine, with the exact same quirks as the real database's engine, inside the mock!
On the original question:
> Do you do this in place of unit tests (where you have to mock/stub the DB interactions) or do you do both?
I guess the answer would be: It would depend on the objectives of and types of tests. Do both of them, because some tests, such as unit tests on the HTTP handler, would use use mocks, while other tests, such as the SQL query correctness tests, would use the real database.
[1] A true unit test is one that has no external interactions, neither directly nor transitively, besides interactions with the system under test (SUT). The SUT here is the HTTP handler.
[2] An integration test should include external system interactions, if any. That's what integration means.
[3] The SUT is each SQL query and the its interaction with the database. There are no interactions in the test with systems outside the SUT, so it is still a true unit test.
jokes aside, redshift is based on pg^1, you can try an older version to get some semblance of it running locally.
1. https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-an...
I have seen teams give each developer a personal schema on a dev cluster, to ensure their Redshift SQL actually works. The downside is that now your tests are non-local, so it's a real tradeoff. In CI you probably connect to a real test cluster.
If you use DuckDB to run the tests, you can reference those files as if they were tables (select * from 'in.parquet'), and the tests will run extremely fast
One challenge if you're using Spark is that test can be frustratingly slow to run. One possible solution (that I use myself) is to run most tests using DuckDB, and only e.g. the overall test using Spark SQL.
I've used the above strategy with PyTest, but I'm not sure conceptually it's particularly sensitive to the programming language/testrunner you use.
Also I have no idea whether this is good practice - it's just something that seemed to work well for me.
The approach with csvs can be nice because your customers can review these files for correctness (they may be the owners of the metric), without them needing to be coders. They just need to confirm in.csv should result in expected_out.csv.
If it makes it more readable you can also inline the 'in' and 'expected_out' data e.g. as a list of dicts and pass into DuckDB as a pandas dataframe
One gotya is SQL does not guarantee order so you need to somehow sort or otherwise ensure your tests are robust to this
If you build your unit tests so that a thread will not commit until it's finished, or clear it before/after, it's pretty ideal for that. You can feed in data for that test and it won't be visible anywhere else.
Potentially if you scripted your table creates you could add an additional rule that GLOBAL TEMPORARY TABLE gets added to any other tabledefs but only for unit tests. Or just update both in parallel.
The other useful use-case I've found for this is that you can do an unlimited-size "WHERE myId IN (:1, :2 ...)" by doing a GTT expressed as "INNER JOIN myId = myGtt.myid", and this has the advantage of not thrashing the query planner for every literal sqltext resulting from different lengths of myList (this will fill up your query cache!). Since it's one literal string it always hits the same query plan cache.
I am told this has problems in oracle's OCI when using PDBs, apparently GTTs will shit up the redo-log tablespace (WAL equivalent). But so do many many things, PDBs are apparently a much different and much less capable and incredibly immature implementation that seems essentially abandoned/not a focus from what I've been told. I was very surprised having originally written that code in regular Oracle and not having had problems but PDBs just aren't the same (right down to nulls being present in indexes!).
I used this trick (join temporaryFoo instead of where foo in ...) in production fifteen years ago, using MySQL. The gain was really astonishing. Several instructions can be optimized using joins on specialty craft tables (I know of LIMIT for instance).
This is one of the worst drawbacks of orm everywhere: nobody even seems to think about those optimisations anymore.
* reuse the same Spark session throughout the test suite
* Set shuffle partitions to 2 (instead of default which is 200)
* Use dependency injection to avoid disk I/O in the test suite
* Use fast DataFrame equality when possible. assertSmallDataFrameEquality is 4x faster than assertLargeDataFrameEquality. Some benchmarks here: https://github.com/MrPowers/spark-fast-tests#why-is-this-lib...
* Use column equality to test column functions. Don't compare DataFrames unless you're testing custom DataFrame transformations. See the spark-style-guide for definitions for these terms: https://github.com/MrPowers/spark-style-guide/blob/main/PYSP...
Spark is an underrated tool for testing SQL. Spark makes it really easy to abstract SQL into unit testable chunks. Configuring your tests properly takes some knowledge, but you can make the tests run relatively quickly.
You can structure your code to read from paths when run in the production environment, but inject DataFrames you build in memory for your test suite. Spark is designed to read multiple files in parallel, so it's not optimized to read a single tiny file. That's why it's best to avoid I/O in Spark test suites whenever possible.
[0] https://www.postgresql.org/docs/current/queries-with.html#id...
A common table expression, or CTE, is a temporary named result set created from a simple SQL statement that can be used in subsequent SELECT, DELETE, INSERT, or UPDATE statements.
I find this helps a heck of a lot with maintainability + debugging as well
1. Make sure you're using a shared session between the test suite. So that spin-up only has to occur once per suite and not per test. This has the drawback of not allowing dataframe name reuse across tests, but who cares.
2. If you have any kind of magic-number N in the SparkSQL or dataframe calls (e.g coalesce(N), repartition(N)) change N to be parameterized, and set it to 1 for the test.
3. Make sure the master doesn't have any more than 'local[2]' set. Or less depending on your workstation.
Not much more to say, just observing, sorry if this is irrelevant commentary.
The title is "Ask HN: How do you test SQL"
declare @debug bit = 1;
;with cte1 as (
select
@debug AS Debug1,
...
),
cte2 as (
select
@debug AS Debug2,
... from cte1
),
cte3 as (
select
@debug AS Debug3,
... from cte2
)
select
-- dump intermediate if debug
(
select * from cte1 where Debug1=1 for xml raw ('row'), root ('cte1'), type
)
,(
select * from cte2 where Debug2=1 for xml raw ('row'), root ('cte2'), type
)
,(
select * from cte3 where Debug3=1 for xml raw ('row'), root ('cte2'), type
)
-- final results
,(
select ...
from cte3 for xml raw ('row'), type
)
for xml raw ('results'), type;Could you or anyone else on the post provide an example?
if you are storing fully denormalized concrete data about the value of salary/medical/retirement both pre- and post-tax that was actually paid to each pay period (because this can vary!), then you can define a view that does salary-per-employee (taxable, untaxable, etc), and then a view that rolls up employees-per-department. And you can write unit tests for all of those.
that's a super contrived example but basically once group aggregate or window functions and other complex sub-queries start coming into the picture it becomes highly desirable to write those as their own views. And you can write some simple unit tests for those views. there are tons of shitty weird sql quirks that come from nullity/etc and you can have very weird specific sum(mycol where condition) and other non-trivial sub-subquery logic, and it's simple to just write an expression that you think is true and validate that it works like you think, that all output groups (including empty/null groups etc) that you expect to be present or not present actually are/aren't, etc.
I'm not personally advocating for writing those as CTEs specifically as a design goal in preference to views, personally I'd rather write views where possible. But recursive CTEs are the canonical approach for certain kinds of queries (particularly node/tree structures) and at minimum a CTE certainly is a "less powerful context" than a generalized WHERE EXISTS (select 1 from ... WHERE myVal = outerVal) or value-select subquery. it's desirable to have that isolation from the outer SQL query cursor imo (and depending on what you're asking, it may optimize to something different in terms of plan).
Writing everything as a single query, where the sub-sub-query needs to be 100% sure not to depend on the outer-outer-cursor, is painful. What even is "DEEP_RANK()" in the context of this particular row/window? If you've got some bizarre (RANK(myId order by timestamp) or whatever, does it really work right? Etc. It's just a lot easier to conceptually write each "function" as a level with its own unit tests. Same as any other unit-testable function block, it's ideal if it's Obviously Correct and then you define compositions of Obvious Correctness with their own provable correctness.
And if it's not Obviously Correct then you need the proof even more. Encapsulate whatever dumb shit you have to do to make it run correctly and quick into a subquery and just do a "inner join where outerQuery.myId = myView.myId". Hide the badness.
I used this exact same method. Not only does it help me but those who come after trying to understand what's going on.
Quick web search confirms suspicions, it is not easy
The other tests from your link are just silly. You did not write code to terminate or to provide an answer for all inputs, you wrote code to provide the right answer.
To test some HTML for instance, think about what information you want the page to convey, load up a browser, and check that the info is displayed. Easy as that.
Definitely not an expert. But to my eyes checking for non-termination seems very close to things that happen in SQL all the time.
For example, a parent query joins two subqueries that both map data, and then maps on that data. To test, I could just test one specific scenario with specific values in all columns. That would be more of a concrete test case.
But I might want something more robust checking the outer bounds of acceptable data (like a min and a max). That seems much closer to the non-termination test.
Of course, I'm open to being corrected on this.
SQL is far and away understood as a declarative language for what data you want out of a relational database. I challenge you to find any literature that does not describe it as a declarative language.
Now, can we munge definitions and pull in an odd true scotsman argument about it not being a "true declarative" language? I mean, yeah. But, this is like arguing that LISP is not a functional language by some specific modern view of that term. Certainly true, but far from useful. And almost certainly not what anyone you would talk to expects from those terms.
"That is, it is not generating anything." if you speak about code generation like code behind in VS, it has nothing do with the paradigm. If you speak about underlying technology, it does not matter. In case of declarative languages, the framework determines what the language is capable of. And that's the point of them, they are generalizing the solution for common problems. The browser do a lot of things in the background when it draws a rectangle the same way how SQL server is doing a lot things when you execute a query, but these things have nothing to do with programming paradigm, which can be understood without computers.
Imperative: cooking recipe
Functional: assembly line in a factory
Declarative: your assistant
(compare the real world examples to the things i mentioned)
> SQL is far and away understood as a declarative language for what data you want out of a relational database. I challenge you to find any literature that does not describe it as a declarative language.
Ok, and I challenge you to compare HTML and SQL. Are they the same by nature. Are the steps, the approach of problem solving is the same? Because if SQL is declarative, than it is really similar to HTML
> Now, can we munge definitions and pull in an odd true scotsman argument about it not being a "true declarative" language? I mean, yeah. But, this is like arguing that LISP is not a functional language by some specific modern view of that term. Certainly true, but far from useful. And almost certainly not what anyone you would talk to expects from those terms.
We are speaking about semantics and taxonomy here. These kind of a questions are always debatable, so i can't really say anything about this, because you are speaking of ways of discussion. What i'm saying that my taxonomy makes the most sense to me, and if someone truly can challenge it, it's gonna make my mind change, but "literature mentions it somehow else" is not an argument makes me change my mind.
SQL : CSV :: GraphQL : JSON :: React : HTMLSELECT CASE WHEN employee.type = 'contract' THEN salary CASE WHEN employee.type = 'full-time' THEN salary + benefit_costs END CASE FROM employee
Tell me how is this declarative?
-SQL is functional -CSV is not a language, it's a data format -GraphQL, im not familiar -JSON is literally executable javascript code, arguable -React is a javascript framework, binding is a declarative nature (if it has it, i dont know it too well), but it is not a language, it's a framework -HTML is absolutely, 100% declarative, yes
On the other hand, the details of query execution are left to the planner and optimizer.
What's the case that it's functional, but not declarative?
But the part that is declarative is sure difficult to test.
DBT, specifically, DBT-2 is a suit of tests designed to benchmark a database system. These tests aren't interested in, eg. correctness of an application that is using the database. They are meant to be testing the system as a whole by modeling some sort of an "average business" and defining some sort of an "average business operation" and estimating how many of such operations can a particular deployment of a system perform.
Such tests are rarely of any interest to application developer, and are more geared towards DBAs who execute such tests to estimate the efficiency of a system they deploy or to estimate the amount of hardware necessary to support a business.
MySQL DBT2 suit: https://dev.mysql.com/downloads/benchmarks.html
PostgreSQL DBT2 suit: https://wiki.postgresql.org/wiki/DBT-2
Those tools are typically modeled on TPC-B... And, it would require a separate discussion to describe why these tests are obsolete and why there isn't really any replacement.
----
However, from the rest of your question it seems that you may use DBT acronym in some other way... So, what exactly are you testing? Are you interested in performance? A benchmark? Schema correctness? Are you perhaps trying to simply test the application that is using a SQL database and you want to avoid dealing with the database setup as much as possible?
Well... I don't know anything about this tool, but from cursory reading, this is the correctness kind of testing that is meant to examine whether the way you defined the schema actually leads to the results you expect in the data. Well... it's not an interesting kind of testing, at least not for me. So, I don't know much about it. Also, while, obviously, I'm not familiar with this newer tool, I wouldn't trust much someone who reused the name of an existing tool to do something completely unrelated in the same domain. It just feels like these people don't have a very good familiarity with the subject because they caused this confusion.
Actually happened to me twice (different distros at different times did include Docker, the container management tool as "docker" while more commonly naming it something else, like "docker-engine").
As for dbt. I mean... imagine you are selecting the name for your company, that's in SQL business, and you don't even bother to type into your favorite Web search engine "dbt sql"... I mean, it's definitely a business about DataBases, so, certainly a 3-letter acronym with "D" and "B" in it in that order would be truly unique, right?
I'd bet is that what happened is that these people, just like you, thought "I don't know what that is, so it must be unimportant", and they never stopped to consider that they might not know something important about the field they are in.
Best ideas IMO (no particular order):
- make SQL dumber, move logic that needs testing out of SQL
- use an ORM that allows composing, disconnect composition & test (ie EF for .NET groups, test the LINQ for correct filtering etc, instead of testing for expected data from a db) (I see this has already been recommended elsewhere)
* edited formatting
1. Moving logic out of SQL can break transactionality and hurt performance (more data leaves the DB)
2. ORMs hide the SQL from you, making all sorts of other things harder
My favoured approach is to test my application against a real local database, built from a fresh snapshot each time.
I do not change my application code to make testing easier. The application code is optimised for maintainability - so simplicity and ease of reading.
Those sorts of things are programmed in EF by LINQ queries. All I'm saying is, test the LINQ queries the same way you would test non-EF LINQ queries. LINQ doesn't require a database.
Kinda, but personally I describe as using LINQ queries. The dbcontext just isn't hooked up. It's a method that takes in an IQueryable<T> (there's the interface I suppose) and outputs a filtered IQueryable<T>. The unit test (see my next response) provides a test collection and expects a certain result.
> and integration tests
No, unit tests
Here's [2] a slide deck by David Wheeler giving an introduction into how it works.
[2] https://www.slideshare.net/justatheory/unit-test-your-databa...
I'm using it w/ supabase, and it works really well.
They have a small doc on it that's a better primer than the pgtap docs: <https://supabase.com/docs/guides/database/extensions/pgtap>.
Pretty easy to get started, I'm doing something like this in a Makefile.
supabase_tests := $(call rfind, supabase/tests/*_tests.sql)
DB_URL := "postgresql://postgres:postgres@localhost:54322/postgres"
test-supabase: $(supabase_tests)
@echo "Testing supabase..." && \
echo "${supabase_tests}" && \
psql -f supabase/tests/setup.sql ${DB_URL} && \
$(foreach t,$(supabase_tests),\
echo "Test: $(t)..." && psql -f $(t) ${DB_URL} $(__EXEC)) && \
psql -f supabase/tests/teardown.sql ${DB_URL}Guide is here: https://supabase.com/docs/guides/database/testing
When the tests pass, we can change from DuckDB to Spark. This helps decouple testing Spark pipelines from the SparkSession and infrastructure, which saves a lot of compute resources during the iteration process.
This setup requires an abstraction layer to make the SQL execution agnostic to platforms and to make the data sources mockable. We use the open source Fugue layer to define the business logic once, and have it be compatible with DuckDB and Spark.
It is also worth noting that FugueSQL will support warehouses like BigQuery and Snowflake in the near future as part of their roadmap. So in the future, you can unit test SQL logic, and then bring it to BigQuery/Snowflake when ready.
For more information, there is this talk on PyData NYC (SQL testing part): https://www.youtube.com/watch?v=yQHksEh1GCs&t=1766s
Fugue project repo: https://github.com/fugue-project/fugue/
Straight TDD with spark is perfectly fine if you know what you're doing. I'm not saying it's easy or there's an easy guide somewhere, but it's possible.
If you're using Pyspark via the API, it's likely an incredibly important part of your process.
Our CICD platform and their owners get unhappy if we spawn an ad hoc spark session for testing purposes.
There is also a general expectation that unit tests are self contained and portable. So you could execute them in mac, linux, and arm ISA without much effort.
Another point was that we need to make this mocking or test setup easy because data scientist and ML Modellers are the most important persona who needs to write these tests ideally.
So mocking the data source with an abstraction layer and passing pandas dataframes, worked reasonably well for our use case.
But I find myself running one given test, making some code changes, and then wanting to run it again, over and over. Instantiating a local spark session takes several seconds every iteration. Enough for me to often want to "alt tab" into something else instead of waiting. It's very disruptive.
I did not know about Fugue but will definitely give it a try. Looks almost too good to be true.
Stored procedures are a different beast though. Having significantly struggled to debug stored procedures running in MSSQL on a Macbook (on Windows SQL Management Studio lets you set breakpoints, on Mac you're SOL), if I was building an application based on them I'd definitely try to spin up some kind of testing framework around them. I guess what I'd probably do is have a temporary database and some regular testing framework that nukes the db, then calls the stored proc(s) with different inputs and checks what's in the tables after each run. Sounds slow and clunky?
Good news! Since SSMS18 you're SOL on Windows too, as Microsoft completely removed that feature :)
Sounds like an opportunity though…
For my own use-cases, I usually test this at the application level and not the DB level. This is admittedly not unit-testing my SQL (or stored procs or triggers) but integration-testing it.
I follow those steps on my pipeline:
Every time I commit changes the CI/CD pipeline follow those steps, on this order:
- I use sqitch for the database migration (my DB is postgresql).
- Run the migration script `sqitch deploy`. It runs only the items that hasn't been migrated yet.
- Run the `revert all` feature of sqitch to check if the revert action works well too.
- I run `sqitch deploy` again to test if the migration works well from scratch.
- After the schema migration has been applied, I run integration tests with Typescript and a test runner, which includes a mix of application tests and database tests too.
- If everything goes well, then it runs the migration script to the staging environment, and eventually it runs on the production database after a series of other steps on the pipeline.
I test my database queries from Typescript in this way:
-in practice I'm not strict on separating the tests from the database queries and the application code, instead, I test the layers as they are being developed, starting from simple inserts on the database, where I test my application CRUD functions that is being developed, plus to the fixtures generators (the code that generate synthetic data for my tests) and the deletion and test cleanup capabilities.
-having those boilerplate code, then I start testing the complex queries, and if a query is large enough (and assuming there are no performance penalties using CTE for those cases), I write my largue queries on small chunks on a cte, like this (replace SELECT 1 by your queries):
export const sql_start = `
WITH dummy_start AS (
SELECT 1
)
export const step_2 = `${sql_start},
step_2 AS (
SELECT 1
)
`;
export const step_3 = `${step_2},
step_3 AS (
SELECT 1
)
`;
export const final_sql_query_to_use_in_app = ` ${step_3},
final_sql_query_to_use_in_app AS(
SELECT 1
)
SELECT \* FROM final_sql_query_to_use_in_app
`;Then on my tests I can quickly pick any step of the CTE to test it
import {step_2, step_3, final_sql_query_to_use_in_app} from './my-query';
test('my test', async t => {
//
// here goes the code that load the fixtures (testing data) to the database
//
//this is one test, repeat for each step of your sql query
const sql = `${step_3}
SELECT * FROM step_3 WHERE .....
`;
const {rows: myResult} = await db.query(sql, [myParam]);
t.is(myResult.length, 3);
//
// here goes the code that cleanup the testing data created for this test
//
});
and on my application, I just use the final query: import {final_sql_query_to_use_in_app} from './my-query';
db.query(final_sql_query_to_use_in_app)
The tests start with an empty database (sqitch deploy just ran on it), then each test creates its own data fixtures (this is the more time consuming part of the test process) with UUIDs as synthetic data so I don't have conflicts between each test data, which makes it possible to run the tests concurrenlty, which is important to detect bugs
on the queries too. Also, I include a cleanup process after each tests so after finishing the tests the database is empty of data again.For sql queries that are critical pieces, I was be able to develop thounsands of automated tests with this approach and in addition to combinatorial approaches. In cases where a column of a view are basically a operation of states, if you write the logic in sql directly, you can test the combination of states from a spreadsheet (each colum is an state), and combining the states you can fill the expectations directly on the spreadsheet and give it to the test suites to run the scenarios and expectations by consuming the csv version of your spreadsheets.
If you are interested on more details just ping me, I'll be happy to share more about my approach.
There is another pattern too, when I implemented a RLS based multi-tenancy with RBAC support, which needed an relatively large sql codebase and needed to be battle tested because it was critical, I've splited a big part of the sql code in a lot of sql functions instead of views to test the code units or integrations (using something similar to dependency injection but for the data, to switch the tenant RBAC's contexts), because for the sql functions I can pass different Postgresql's Configuration Parameters to test different tenants for example.
Plenty of constraints, uniques and foreign keys and not nulls. Enum types.
Visuals, dump to csv and plot some graphs. Much easier to find gaps and strange distributions visually.
Asserts in DO blocks, mostly counts being equal.
Build tables in a a _next suffix schema and swap when done.
Never mutating the source data.
Using psqls ON_ERROR_STOP setting.
Avoid all but the most trivial CTEs, preferring intermediate tables that can be inspected. Constraints and assertions on the intermediate tables.
“Wasting” machine resources and always rebuilding from scratch when feasible. CREATE TABLE foo AS SELECT is much simpler than figuring out which row to UPDATE. Also ensures reproducibility, if you’re always reproducing from scratch it’s always easy. State is hard.
Overall i’m quite happy with the workflow and very rarely do we make mistakes that unit tests would have caught. Our source data is complex and not always well understood (10+ years of changing business logic) so writing good tests would be very hard. Because we never touch the raw source data any errors we inevitably make are recoverable.
This talk by Dr Martin Loetzsch helped a lot: https://youtu.be/whwNi21jAm4
I also have a few 'test' queries that insert there results into a 'test_results' table. Most of the queries check the cardinality of the table since to me wrong cardinality is where the biggest errors come from. I do something like :
insert into test_table
select case when count(*) = count(distinct users) then 'pass' else 'fail' end as result, 'test_cardinality_temporary_table_a' as test_name from temporary_table_a
You can hook up dbt tests to your CI and Git(hub|lab) for data PRs.
Depending on your needs, you can also look into data observability tools such as Datafold (paid) or re_data (free)
OP (or others) - If you've used dbt tests, I'm curious where it fell short? Tt doesn't cover everything, but it's pretty good in my experience.
Basically, treat the query and database as a black-box for testing like you would another third party API call.
I would strongly suggest having a layer of code in your application that is exclusively your data access and keeping any logic you can out of it. Data level tests are pretty onerous to write in the best circumstances and the more complexity you allow to grow around the raw SQL the worse of a time you'll have - swapping out where clauses and the like dynamically is a cost you'll need to eat, and sometimes having a semi-generic chunk that you reuse with some different joins can be more efficient than writing ten completely different access functions with completely different internal logic so judgement is required.
At the end of the day a database is like any other third party software component - data goes in, data comes out... the nice thing is that SQL is well defined and you've got all the definitions so it's easier to find the conditional cases you need to really closely tests... but databases are complex beasties and it'll never be easy.
Overall data integrity is hard.
I would love testing to work.
Have set up and maintained several unit test suites in Jest.
Wrote several large e2e test suites in Cypress.
I don't think anyone won time from simply having a manual checklist and testing manually.
Maybe me and my former teammates are doing it wrong. Talking 8+ teams, from corporate to startup.
But loved de proven wrong. E2e def. catched most issues.
Also testing with a checklist by human is easy to outsource to cheap labor. Whereas testing engineers are pretty expensive
not a unique problem with sql, btw.
basically you cannot test queries against a big database. you just have to hope for the best.
This is easier if you have the same input every time the tests run, like a frozen database image, because then you can basically have snapshot tests.
It's a lot faster and easier than dealing with containers and the like.
Example:
db = Fake().expect_query("SELECT * FROM users", result=[(1, 'Bob'), (2, 'Joe')])
Then you do:
db.query("SELECT * FROM users")
and get back the result.
In Python if you do this in a context manager, you can ensure that all expected queries actually were issued, because the Fake object can track which ones it already saw and throw an exception on exit.
The upside of this is, you don't need any database server running for your tests.
update: This pattern is usually called db-mock or something like this. There are some packages out there. I built it a few times for companies I worked for.
And that can be done by dumping the database (possibly verifying the content of that dump), taking a backup, restoring the backup to a fresh container, then comparing dump of that freshly restored database to the one you took at the start.
I dont know the technical detail of how to set it up, it was already setup when I worked there
But basically, we wrote SQL script that included statements to
1. create the db structure, tables or views
2. insert statement to enter test data (you can insert corner cases etc..)
3. ran the function or procedure
4. ran an assert to confirm if results are to our expectation
test script were ran by the CI/CD process
I ought to produce unit tests that prove that tuples from each join operation produces the correct dataset. I've only ever tested with 3 join operations in one query.
For a user perspective, I guess you could write some tooling that loads example data into a database and does an incremental join with each part of the join statement added.
You can achieve the same thing with "docker commit"-ing data into docker images of your dB engine of choice, and firing your queries on them, but that only really works with smaller datasets.
https://github.com/allaboutapps/integresql
disclaimer: author
On the back on that professional use I wrote a blog post [2] explaining why you might choose to go down this route as it wasn't the way database was developed way back then (SQL wasn't developed in the same way as the other front-end and back-end code).
A few years later I gave a short 20-minute talk (videoed) to show what writing SQL using TDD looked like for me. It's hard to show all the kinds of tests we wrote in practice at the bank but the talk is intended to show how rapid the feedback loop can be using a standard DB query tool and two code windows - production code and tests.
Be kind, it was a long time ago and I'm sure the state of the art has improved a lot in the intervening years :o).
Chris Oldwood
---
[1] SQL Server Unit: https://github.com/chrisoldwood/SS-Unit
[2] You Write Your SQL Unit Tests in SQL?: https://chrisoldwood.blogspot.com/2011/04/you-write-your-sql...
[3] Test-Driven SQL: https://www.youtube.com/watch?v=5-MWYKLM3r0
We use Microsoft SQL's docker image and spin it up in the background on our laptop/CI server so port 1433 has a database.
Then we have our homegrown migration file runner that will compute a hash of the migrations, make a database template_a5757f7e, and run the hundreds of migrations on it, whenever we add a new SQL migration (todo: make one template build on the previous).
Then we use the BACKUP command to dump the db to disk (within the docker image)
Finally, each test function is able to make a new database and restore that backup from file in less than a second. Populate with some relevant test data, run code, inspect results, drop database.
So our test suite uses hundreds of fresh databases and it still runs in a reasonable time.
(And..our test suite is written in Go, with a lot of embedded SQL strings, even if a lot of our business logic is in SQL)
1)Set up a test db instance with controlled data in it as the basis for your test cases. Ideally this data is taken from real data that has caused pipeline problems in the past but scrubbed for PII etc. You can also use or write generators to pad this out with realistic-looking fake data. If you do this the same dataset can be used for demos (once you add data for your demo paths).
2)Write test cases using whatever test framework you use in your main language. Say you code in python, you write pytest cases, java -> junit etc. You can help yourself by writing a little scaffolding that takes a sql query and a predicate, runs the query and asserts the predicate over the result. If you don't have a "main language", just write these test cases in a convenient language.
3)Consider resetting the state of the database (probably by reloading a controlled dump before each test batch) so any tests which involve inserts/deletes etc work. You may actually want to create an entirely new db and load it before each test run so that you can run multiple test batches concurrently against different dbs without contention messing up your results. Depending on your setup you may be able to achieve a similar effect using schemas or (sometimes but not always) transactions. You want each test run to be idempotent and isolated though.
Doing it this way has a number of benefits because it's easy to add your sql test cases into your CI/CD (they just run the same as everything else).
1) The same way you'd write any other tests. Use your favourite testing framework to write fixtures and tests for the SQL queries:
- connect to the database
- create tables
- load test data
- run the query
- assert you get the results you expect
For insert or update queries, that assertion step might involve running another query.2) DBT has support for testing! It's quite good. See https://docs.getdbt.com/docs/build/tests
First, by “testing SQL pipelines”, I assume you mean testing changes to SQL code as part of the development workflow? (vs. monitoring pipelines in production for failures / anomalies).
If so:
1 – assertions. dbt comes with a solid built-in testing framework [1] for expressing assertions such as “this column should have values in the list [A,B,C]” as well checking referential integrity, uniqueness, nulls, etc. There are more advanced packages on top of dbt tests [2]. The problem with assertion testing in general though is that for a moderately complex data pipeline, it’s infeasible to achieve test coverage that would cover most possible failure scenarios.
2 – data diff: for every change to SQL, know exactly how the code change affects the output data by comparing the data in dev/staging (built off the dev branch code) with the data in production (built off the main branch). We built an open-source tool for that: https://github.com/datafold/data-diff, and we are adding an integration with dbt soon which will make diffing as part of dbt development workflow one command away [2]
We make money by selling a Cloud solution for teams that integrates data diff into Github/Gitlab CI and automatically diffs every pull request to tell you the how a change to SQL affects the target table you changed, downstream tables and dependent BI tools (video demo: [3])
I’ve also written about why reliable change management is so important for data engineering and what are key best practices to implement [4]
[1] https://docs.getdbt.com/docs/build/tests [2] https://github.com/calogica/dbt-expectations [3] https://github.com/datafold/data-diff/pull/364 [4] https://www.datafold.com/dbt [5] https://www.datafold.com/blog/the-day-you-stopped-breaking-y...
Be wary of too many techniques that are supposed to be making it easier to test, but also make it hard for you to leave a query pipeline. In particular, SQL should be very easy in the "with these as our base inputs, we expect these as our base outputs." Trying to test individual parts of the queries is almost certainly doomed to massive bloat of the system and will cause grief later.
We've written our own tool to compare different data sources against each other. This allows, for example, to test for invariants (or expected variations) between and after a transformation.
The tool is open source: https://github.com/QuantCo/datajudge
We've also written a blog post trying to illustrate a use case: https://tech.quantco.com/2022/06/20/datajudge.html
As others have mentioned, you want to compare the results of your queries against a previously known 'good' state of the data. So, as you're making data model changes, you can regularly check your development environment against production to see how your changes affect the data.
Data profiling is the perfect tool for this, especially when your pipeline reaches a certain size, or you're dealing with very large datasets.
I work on the team creating PipeRider.io, which uses data profiling comparisons as a method of "code review for data".
It becomes particularly useful when you automate generating data profiles of development and production environments in CI, and attach the data profile comparison to the pull request comment. It makes seeing the impact of changes so much easier.
Here's an article that discusses the benefits of this: https://blog.infuseai.io/why-you-lack-confidence-merging-dbt...
We have two flavours of test: one that drops the transaction each time, ensuring a clean, known state. And one that doesn’t, allowing your tests to avoid lots of overhead by “walking through a series of incremental states”.
Yes, some might call the latter heresy. But it works great.
During test: - At the start of the test (fixture), run a new DB instance - Apply DB schema. - possibly: Remove constraints that that would disturb your tests (eg. unimportant foreign keys) - possibly: Add default values for columns that are not important for your test (but do with caution) - run you test - Assert results (maybe also directly as access to databse or via a dump of tables). - Tear down database possibly removing all data (except error logs).
I used this pattern to test software that uses MySQL or MariaDB server. For Microsoft SQL server it may be enough to create a new database instead of running a new instance (possible but not as easy as for MySQL/MariaDB).
On CI server this can be used to run tests against all required DB server types and versions.
Look at this JetBrains survey: https://www.jetbrains.com/lp/devecosystem-2021/databases/
Around half of the people never debug stored procedures. Three quarters of people don't have tests in their databases. Only half of the people version their database scripts.
Personally: the answer is containers. Spin up a database in a container (manually or on CI server) and do whatever you need with it. Seed it with some test data, connect an app to it, check that app tests pass when writing to and reading from a live database (as opposed to mock data stores or something like H2), then discard the container.
Even if you don't have a traditional app, throwaway instances of the real type of DB that you'll be using are great, both for development and testing.
* It has a language level module support, similar to other languages. Thus SQL functions are reusable across multiple codebases without depending on code generation tricks. One of the major blocker for SQL adoption has been complex domain specific business logic and now the situation is better.
* It has an official unit test support. Google use Blaze (which is known as Bazel externally), so adding a unit test for SQL code is as simple as adding a SQL module (and its test input) dependency to SQL test target, write a test query and its expected output in a format of approval testing. Setting up the DB environment is all handled by the testing framework.
* It has an official SQL binary support. It's just a fancy name for handling lots of tedious stuffs for running a SQL query (e.g. putting everything needed into a single package, performing type checks, handling input parameters, managing native code dependencies for FFI etc etc).
None of those are technically too sophisticated at least in theory, actually these combined together become pretty handy. Now I can write a simple SQL module which mostly depends on other team's SQL module, do a simple unit test for it then run a SQL binary just as other languages. I haven't worried a single time on how to set up a DB instance. This loop is largely focused on OLAP so it's a bit different for OLTP, which has another type of established testing patterns.As the meme say: App worked before. App work afterwards. Can’t explain that.
[1] https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.html#module-sq...
Rails for Ruby comes with some pretty nice setups for testing the database code. There's a test DB by default with the same schema as Production, and the usual test frameworks (FactoryBot and RSpec) make it easy to set up some data in the actual DB for each spec, run model code that makes actual SQL queries, and assert against the results.
I would have hoped most other web hosting frameworks would make as much effort to making it straightforward to test your database code, but it doesn't really seem to be the case.
In Rust, there's a very handy crate called sqlx. What it does is, at compile time, it runs all of the SQL in your codebase against a copy of your database to both validate that it runs without errors and map the input and output types to typecheck the Rust code.
When it comes to stuff like validating that your queries are performant against production datasets or that there isn't any unexpected data in production that breaks your queries, well I pretty much got nothing. Maybe try a read replica to execute against?
More trivial example:
{%
call dbt_unit_testing.test(
'REDACTED',
'Should replace nullish values with NULL'
)
%}
{% call dbt_unit_testing.mock_source('REDACTED', 'REDACTED', opts) %}
"id" | "industry"
1 | 'A'
2 | 'B'
3 | ''
4 | 'Other'
5 | 'C'
6 | NULL
{% endcall %}
{% call dbt_unit_testing.expect(opts) %}
"history_id" | "REDACTED"
1 | 'A'
2 | 'B'
3 | NULL
4 | NULL
5 | 'C'
6 | NULL
{% endcall %}
{% endcall %}Starting with a framework that is programming language first (IE Spark) can help you build your own tooling to help you actually build unit tests. It's frustrating though, that this isn't just common across other ETL tooling.
A list of best practices: https://docs.getdbt.com/guides/legacy/best-practices
And shameless plug but there's a chapter on modeling in my book: https://theinformedcompany.com
A ref() concept like dbt's is sufficient. When testing, have ref output a different (test-x) name for all your references.
The backbone for this is that we spin up a DB per unit test, so we don't have to worry about shared state.
https://news.ycombinator.com/item?id=34580675
I am always baffled by why this ins't more popular way of writing SQL.
Most of the times there is a layer around your sql (a repository, a bash script or whatever) that you can use for integration testing.
Anyone have any recommendations on testing SSIS ?
One of the premises that we have is the ability to instantly create a test environment by creating a branch. I'd love to hear what you think about it.
This way i don't waste time with unit tests that quickly get old and no one wants to maintain and run
Excellent for checking delete queries before running them.
in postgresql a cool tool for performance is"hypothetical indexing", which predicts how the optimizer will use indexes in any sql query. i could see an automated testing tool written around "hypothetical indexing".
also, i believe MSServer supports HI.
for real though I love tools like SequelPro or TablePlus that let me work out a query before I bake logic or stuff into my apps. Also sometimes I use it to work out the data needed for reports. I am working with salesforce for the first time in my life and apparently there are tools that let me treat it like I'm used to SequelPro.
But my app is for six users at one site, it’s not mission critical, and the sqlite DB is backed up hourly.
Life’s too short for (unnecessary) testing.
Learn to use IMPORT TABLESPACE in MySQL or just dump and import SQL.
Every time you run a test you set up the mock databases again.
There are a few types of tests one would like from a SQL pipeline, each with a different value add:
- Quality assurance tests: these are things like DBT tests, they mainly test the accuracy of the result after the tables are produced. Examples of this would be tests like "this column should not contain any `null` values" or "it should have only X, Y and Z values". They are valuable checks, but the matter of the fact is that there are many cases where running this sorts of tests after the data is produced is a bit too late.
- Integration tests: specify an input table and your expected output, and run your queries against it, the end result must match with the expectations at all times. This is useful for running them regularly and serve as "integration tests" for your SQL assets. They allow validating the logic inside the query, provided that the input is covering the cases that needs to be covered, they can be executed in CI/CD pipelines. We are exploring a new way of doing this with Blast CLI, effectively running a BigQuery compatible database in-memory and running tests against every asset in the pipeline locally.
- Validation tests: these tests aim to ensure that the query is syntactically correct on the production DWH, usually using tricks like `EXPLAIN` or dry-run in BigQuery. These sorts of tests would ensure that the tables/fields referenced actually exist, the types are valid, the query has no syntax errors, etc.. These are very useful for running in CI after every change, effectively allowing catching many classes of bugs.
- Sanity checks: these are similar to the quality assurance tests described above, but with a bigger focus on making sense out of the data. They range from "this table has no more rows than this other table" to business-level checks such as "the conversion rate for this week cannot be more than 20% lower compared to last week". They are executed after the data is produced as well, and they would serve as an alerting layer.
There is no silver bullet when it comes to testing SQL, because in the end what is being tested is not just the SQL query but the data asset itself, which makes things more complicated. The fact that SQL has no standardized way of testing things and the language has a lot of dialects make this harder than it could have been. In my experience, I have found the combination of the strategies above to have a very good coverage when it comes to approximating how accurate the queries are and how trustworthy the end result is, provided that a healthy mix of them is being used throughout the whole development lifecycle.
For testing:
Run your query/pipeline against synthetic/manual data that you can easily verify the correctness of. This is like a unit test.
Run your query/pipeline on sampled actual data (eg 0.1% of the furthest upstream data you care about). This is like an integration test or a canary. Instead of taking 0.1% of all records you might instead want to sample 0.1% of all USERID so that things like aggregate values can be sanity checked.
Compare the results of the new query to the results from the old query/pipeline. This is like a regression test. You may think this wouldn’t help for many changes because the output is expected to change, but you could run this only on e.g. a subset of columns.
Take the output of the new query (or sampled query, or the manual query) and feed it to whatever is downstream. This is like a conformance test.
For reliability:
If the cost is not prohibitive, consider persisting temporary query results (eg between stages of your pipeline) for 1-2 weeks. This way if you catch a bug from a recent change you only need to rerun the part of your pipeline after the breakage. May not make sense to do if your pipeline is not big
If the cost is not prohibitive you could also run both the new and old versions of the pipeline for ~a week so that you can quickly “rollback”. Ofc whether this is viable depends on what you’re doing.
The big failure modes with SQL pipelines IME are
1. unexpected edge cases and bad data causing queries to fail (eg you manually test the new query and it works fine, but in production it fails when handling Unicode)
2. not having a plan for what to do when a bug gets caught after the fact
3. barely ever noticing bugs or lost data because nobody is validating the output (for example, if you have a pipeline that aggregates a user’s records over a day, any USERID that’s in the input data for that day should also be in the output data for that day).
4. This can be very hard to solve depending on your circumstances, but upstream changes in data are the most annoying and intractable to solve. The best case here is you either spec out the input data closely OR have some kind of testing in place that the upstream folks run before shipping changes.
To address these, you need to take the approach of expecting things to fail, rather than hoping they don’t. This is common practice in many SWE shops these days but the culture in the data world hasn’t quite caught up. I think part of the problem is that automating this testing usually requires at least some scripting/programming which is outside the comfort zone for many people who “just write SQL.”
Other languages are too complicated. :(“
Everyone today: “tries using sql
Oh wow, the tooling is quite basic, and you can’t express complex data structures and imperative code. :(“
What did you expect?
Look, I spent 4 years in this rabbit hole, and here’s my advice:
Don’t try to put the square peg in the round hole.
You want easy to write, simple code and pipelines? Just use sql.
Have a dev environment and run everything against that to verify it.
Do not bother with unit testing your CTEs, it’s hard to do, there are no good tools to do it.
If you want Strong Engineering TM, use python and spark and all the python libraries that exist to do all that stuff.
It won’t be as quick to write, or make changes to, but it will be easier to write more verifiably robust code.
If you treat either as something it is not (eg. Writing complex data structures and frameworks in sql) you’re using the wrong tool for the outcome you’re trying to achieve.
It’ll take longer and “feel bad”, not because the tool is bad, but because you’re using it in a bad way.