I'm a big advocate of understand your data model at the database level. Need to join on too many tables is too easy to do with an ORM.
My go-to strategy for SQL is simple. Abstract your SQL as far away from your application as possible. Not just by using interfaces in your application, but by using stored procedures in your database.
In my mind SQL is just another service your application depends on. And you want that interface to be as simple as possible for your application. If you need to tweak SQL performance it should not need any input from your application. I could completely migrate to a new table relationship structure without the application realising if that was what was needed. You could even go as far as to unit test just your SQL code if you wanted, something you can't do too easily when it's in your application.
Yes, if you need to return new data then you need to update your stored proc and code. But that's so worth it in my opinion for that extra layer of abstraction.
My opinion is slightly skewed from a decently sized business perspective, but I do still follow this pattern in personal projects. When migrating applications to different tech stacks (like Java to Go, or C# to Go) this abstraction has meant the world to us.
Stored Procedures correctly managed are a more secure option than an ORM executing arbitrary SQL code, but the reason why is misunderstood consistently (I've had experienced DBAs get this wrong and insist to me that stored procedures prevent SQL injection attacks which is completely false).
To pick an arbitrary example out of thin air:
If my application requires 10 different calls to the database with variables, then I can create 10 stored procedures. I can set the permissions for the account that the calling application is using to only have EXEC privileges and only on those 10 procedures. This means that if the credentials were to leak the damage is limited to tasks the application could conceivably have performed anyway, albeit without any application-enforced validations on the variables passed to the calls.
If I use an ORM, I have to give the SQL account the client application is using more-or-less carte-blanche access to the database as it could conceivable read from or write to anywhere in the database through arbitrary SQL. Sure, maybe I scope the ORM to a schema and restrict on that, but it's not nearly as granular and fundamentally misses the point that I've opened a massive attack surface unnecessarily by creating a SQL account with generally free reign on the database that exists in the application layer.
On a recent project, mostly to see how practical it was, I built an application that used stored procedures with all of the validation being done directly inside the procedures in SQL. This had quite a few benefits:
I only had to maintain validation in one place. I could rely on the formatting constraints I already needed in the database anyway. The validation rules were beside the data and were much faster to edit and maintain over time. The validation messages had to be pulled as keys making globalisation of the application much simpler later within the client application.
Different strokes for different folks, but there are literally decades of reasoning behind why RDBMS systems are the way they are that is completely bath-watered by ORMs.
If there is logic or metadata that is common across all those applications, there are a couple choices. You could duplicate that logic across all the applications, and rewrite that logic every time you do a rewrite of the application, or you could keep it one place.
If you keep it one place, one way to do that is to have a service in front of the database that every application that uses the data calls instead of hitting the database directly. That has some disadvantages in that it requires upfront design and planning, the service in front of the database will likely be rewritten in new technologies over time, and it has performance implications. And for the developer, instead of having to deal with logic in an application and a database, now they have to deal with logic in two applications.
Another way to do it is to have common logic for the data implemented in key constraints, check constraints, triggers, stored procedures and other similar tools in the database so that any application that uses the data doesn't need to rewrite that logic, and can't intentionally or unintentionally violate the rules of the common domain. That does have its own disadvantages, and it makes things more complicated for developers who will need to be familiar with an additional set of technologies, but it is a valid use case for stored procedures.
The other way I’ve seen data-transport stored procs used is to allow multiple code versions to work on a single database - where you need to support version N and N+1 one the same data (I once worked for a saas company that provided a Preview version on live data before going to prod). Before version N+1 code goes live, deploy N+1 database schema, updated procs, new data, etc, and put in place versioned stored procs which let a version N application run on a version N+1 database, setting default values for new fields, etc. It did require some extra thought to make it work smoothly, but very rarely caused problems for customers.
The complexity it adds to release management alone makes it not worthwhile in a lot of situations.
It’s like doing full blown OOP design. Your codebase has to pass a certain (rather large) size before the complexity trade offs can even begin to make sense, and even then they often don’t.
I think this is a very valuable and important point and wanted to highlight it.
>> Any stored proc codebase is guranteed over time to become a maze of complexity and footguns.
There's no absolute law that states that SP using environments will always end up as a mess, but there should be such a law, because they do!
I've worked across dozens to hundreds of enterprise environments, and the pattern is clear.
I suppose it comes from personality and culture with the people working on db coding more often in the 'get it done' end of the spectrum vs programmers where you find a significant number interested in 'get it right'.
An even stronger reason is probably that db coders are more likely to be autonomous instead of in large centrally driven teams. That central hierarchical organisation is a necessary but not sufficient condition for working to tame the entropy of large complex systems, without which they evolve into "a maze of complexity and footguns".
You already did, you just chose to ignore that the DB is actually a separate service. Ignoring this has meant that I've spent a not-insignificant amount of time in my last few jobs cleaning up bad ORM code because the queries performed terribly. I've cleaned such messes as pulling in all rows and then looping/filtering on them client-side, just generally querying for too many things/too many columns, querying for data and then looping over it to do more queries (instead of figuring out how to do it in a join), etc. When people treat the database as if its part of your monolith, because all the query logic is locally part of your codebase in your ORM, there's a temptation to be lazy and treat it that way. And it works fine in development because the database is small with only test data, but can be painfully slow in production.
I'm sure plenty of people do it properly, even with an ORM, but I've had to clean up the mess too many times now, across multiple companies and dev teams.
I'm not advocating for putting all of this logic into stored procedures, I think that's going too far in the other direction, only that you already have a distributes system, if you have a database, where the boundaries do matter.
Just call your procedures "micro services" and you are in the clear ;)
I have noticed that over the years the knowledge of SQL eroded a lot in population of Java developers. Not only that, but the design of application seems to be lacking. Where problems could easily be solved with a little bit of efficient SQL people mindlessly accept huge performance losses due to ORM as if they just did not see other possibility.
The problem is exacerbated though by the amount of churn on styles of hammers, and newer developers are often too hammered to pickup other tools ..
There can be issues with package versions compiled to the db. It makes continuously integration harder. We do automated builds and testing in the application layer but we don't have anything in the db layer. It's harder to create a new environment (for a developer who does not have full db permissions). Business logic ends up in the stored procedure layer AND the application layer.
For instance:
1) You want to implement idempotency in your backend by using uniqueness constraints in your DB 2) That's important to test 3) That's not really tested unless you include a real DB
After including a real DB in our CI setup I will never move back.
Are you using a commercial database (SQL Server, Oracle, or similar?). With open source databases this is non issue as developers can run a complete instance locally with full permissions, completely isolated from prod/staging servers.
This used to be the standard when clients used to connect directly to the database. Now that the world depends on web services, things are a little different and there's less incentive to maintain stored procs as an interface to your database.
It's still a good idea though, with unparalleled performance. I suppose developers don't do it for a variety of reasons: They may lack the SQL knowledge, they may not want to maintain the extra code, they may fantasize about database portability, etc.
- They may realise value in having a compiler provide guarantees, as a bonus immediately in a tight feedback look while editing code as well as on compilation
- They have favour expressive languages with robust error handling, integrated IDE and VC etc
- They may want to reduce the number of moving parts
These are all along the themes of a) delivering results faster, and b) improving the maintenance lifecycle of systems - unlike those parts of the industry that don't really plan to maintain their systems and instead just wanna do rewrites in the latest hotness which is usually a hot mess from a long term maintainability perspective.
Where I work we have a custom system which manages releasing immutable SQL snapshots between environments and they get merged to master once in prod. The only thing needed is a process, and then version control is easy.
Portability across databases rarely buys you anything but hard to reproduce bugs.
https://github.com/go-gorm/gorm/issues/1436
Note that there is some work for v2 in allowing users to create JOIN queries but without JOIN support it's effectively unusable in anything other than very small databases.
Not to mention, as computing expands, the average capability of people is dropping. Seats need to be filled, and those seats don't all need to be high quality devs.
So ORMs are there to reduce the workload, in terms of what needs to be learned to be productive. Database tuning is becoming a lost art.
(For DEVs not getting this, on a largish DB, I can literally write a single query which can take 10 seconds in MySQL's console, or hours. Yes, hours.
This isn't a flaw, any more than doing a tree search wrong is a flaw in <insert language here> when it is slow.
And most ORMs don't come close to properly handling all edge cases here.)
Beyond all of that, there is another bonus to using stored procedures or functions. Security.
There are many tables I want to give read only access to, yet want to allow writing in very limited scenarios. A stored procedure can have different user permissions (at least under most DBMS), which means you can:
- make a table read only for a web application user - then, allow a stored procedure to perform INSERT/UPDATES, via a different user - however, let the web user execute the stored procedure
The stored procedure thereby limiting the INSERT/UPDATE to a very specific action.
Mostly, I've used this when dealing with tables/databases which have a financial component. Said stored procedures can even perform additional checks on other tables, whatever is required to ensure that the update == OK.
Beyond that, it also allows for tiered security, in case you have someone in house, maybe a new dev, who decides they'd like to modify that financial table a bit.
And beyond that, it also allows logging to be done at the DB level.
While you are more limited here, in that you really do not want to be burdening the DB with more writes, you can definitely log all modifications to said financial table.
(You could use triggers for this last bit too, of course, but still...)
Also, a lot of businesses have analysts/data scientists that know SQL and have access to the DBs: stores procedures make business domain-level functionality more available to them because they don’t have to guess what application code is actually doing.
The problem is (which you seem to recognize but most seem to miss) is that the opposite happens: the ORM lets you shoot yourself in the foot quickly, but that's about it. In the end, you lose more time using an ORM than you actually gain.
And there is nothing wrong with that. It's like recommending against the use of HTTP before people learn TCI/IP, or recommending against PHP/Apache just because people aren't really writing servers.
Learning how to write a web app and learning database management are 2 separate topics. Furthermore, GORM is rather barebone, compared to Entity Framework or Hibernate, due to Go type limitations...
Of course sometimes the task is to write load/edit/store logic for individual records and their relationships. There's nothing wrong with using an ORM for that.
But extending this sort of thinking to all data processing is extremely detrimental to both productivity and performance. If you're using an RDBMS you're well advised to apply a sets first approach to data handling.
That's the danger of ORMs. They encourage people to do the wrong thing.
They also add considerable complexity to a project. So after having written my own ORMs many years ago and using some of the popular ones I would stay away. The cases where they are the right tool for the job are just not worth the friction they create.
Using any abstraction is a gamble that it doesn't leak before you have the resources to fix it.
HTTP is a terrible comparison because in almost every domain you will either make millions or (far more likely) fail for other reasons before you need to understand how TCP/IP works.
I bet there's loads of companies that have died a slow death, releasing everything a month later than they want to, because the back-end developers aren't comfortable with whatever the stack of the day is and they were too busy learning yesterday's stack of the day instead of the intricacies of the language they're working with, or the standard library.
This is a false dichotoy. SQL is not TCP/IP because it's nearly on the same level as ORMs. Unless you normalize and understand your entities properly it'll be difficult to write performant systems. ORMs are rightly called the Vietnam of Computer science¹ for this reason.
[^1]: http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...
But it's not just the suggestion of GORM. It's that it is often suggested as the solution and anything else is unviable.
I think writing a web app and knowing how the database works are interlinked. It's incredibly valuable knowledge as a web developer IMO.
I have made many mistakes in my career in terms of systems that were either overly complicated or inefficient or both, and it has almost always ended up boiling down to either trusting the SQL that an ORM produces, or putting too much of the business logic in hand crafted stored procedures. I have done both, regretted both, and nowadays advocate for sticking to simple, predictable, and dumb stored procedures so that you can both have the benefit of knowing/controlling exactly what will execute in the database and having things like validation occur code side whenever possible.
In the real world, that simplification of stored procedures isn't always possible. But if it's your starting assumption, then you can fight for it until you are certain that there isn't a practical alternative to pushing the logic into the stored procedure. And if a compromise of these principles needs to happen, I would push complexity into a stored procedure before trusting the SQL produced by an ORM every time.
But where I fall on that spectrum may just be due to the severity and types of problems I have seen happen with each approach, do YMMV. And I wouldn't go so far as to call myself an expert, so also YMMV on that front. I am just a guy who has made a lot of mistakes. But if it'll help others, this is my takeaway.
I also use stored procedures, and treat them as function calls from my application code. So I effectively build a "data storage API" in the database, that I then call (as you say, like any other service) from my code.
I also have views for all the data sets that will be returned to the application, so I can use them easily as a type for function returns, and enforce consistency.
I'm reluctant to go to straight SQL because then I have to:
- Write a ton of 'mapping' code between resultsets and data model
- Do a lot of diffing to determine which relations to add, update or remove
- Manage my own database migrations - challenging because I'm stuck with using sqlite
- etc
I want my tools to make things easier for me, but gorm seems to be the "now you have two problems" direction - for now. Maybe I just don't get it yet. Maybe I just don't get databases anymore either.
How do you unit test SQL code? You could integration test against something like sqlite but unit testing stored procs sounds very difficult.
I worked with a big C# .net core ecommerce platform that had a big stored proc used for ingest. It was an absolute pain in the ass to debug and fix issues because testing it was so difficult. Eventually we replaced it with C# code. Access to full software dev tools (debugger, automated tests) is really something you don't want to be without on large projects...
e.g. https://pgtap.org/documentation.html
There are similar solutions for Oracle's PL/SQL and SQL Server
Things like search interfaces might offer filters over multiple columns... your choices here:
1. Use an ORM
2. Write SQL for each permutation of filter
3. Write a helper that will compose the SQL based on the filters needed
And that 3rd one is the only time I stray from plain SQL towards something that looks like a lightweight abstraction over the SQL.
Similarly... stored procs, I avoid these but there are times when they are necessary. If you need the boundary of a transaction to span multiple SQL calls then it is safer to do so within a stored proc. Yes you can use a transaction and rollback, but from a maintainance perspective I've seen refactors break transactions that were not fully encapsulated in a single Go (insert other language) function .
Yes keep things simple with SQL and do everything in the application... but also break those rules rather than do something daft or with side effects.
Stored procedures as far as I know were first implemented in Sybase SQL Server. We tended to think of them as enabling RPC (remote procedure call) access to the DBMS. Sybase was also the first major relational database to offer client access through the network. Our connectivity was influenced by distributed system concepts like RPCs that were top-of-mind in the 1980s.
Made a tool to achieve just that: https://github.com/sashabaranov/pike/
That said, its easy for me to push out new code versions. I realize it doesn't work for everyone.
I've made the opposite observation. Compared to Python, C#, Java and NodeJS, ORMs are actually not a big thing in Go. Which is good - I agree with you.
interface MyRepository {
Page<SomeObject> findEntitiesByIdAndCreatedAtAfterOrderByCreatedAtDesc(int id, Instant after, Pageable page);
}
This abstracts away executing a query along the lines of:
select * from entities where id = $1 and created_at >= $2 order by created_at descI'll take the latter any day.
When app starts up, Java compiles these declarations and will abort if the declared function actually does not match the entities. This is a huge safety net, which you would not have when you write SQL. And also when you write SQL you have to take care and unpack the "select *" into your fields etc etc. I suspect this is much more important downside than you might realize.
I think it depends on the use case, as always. I use Hibernate for our platform, which is used at most by 10-20 people at once. There ORM is a huge help because I can just easily view and edit my model and all its relations and I accept performance hit in this case and if something is too slow, I rewrite it in sql.
Then we have API that uses data of the platform, and there I dropdown to plain handcrafted sql with a little help of JDBI and so far it has been working great and responses are fast.
So I think it is just about stepping back a bit to think and not to blindly accept either side of the argument. Each use case is different.
Meanwhile, JPA, the standard which Hibernate implements, has a textual query language [1] which is basically SQL but a little more objecty. For example, you can express joins through properties concisely:
SELECT c.capital.name FROM Country AS c WHERE c.name = :name
And use constructor expressions [2] to map results into Java objects: SELECT NEW example.CountryAndCapital(c.name, c.capital.name)
FROM Country AS c
[1] https://www.objectdb.com/java/jpa/query/jpql/structure[2] https://www.objectdb.com/java/jpa/query/jpql/select#Result_C...
@twic Thanks for the correction. Didn't know this wasn't a Hibernate but instead a Spring Data feature :-)
@watt You're right, I could've added an offset/limit. Wasn't the main point though (readability was).
@vitro Depends on the approach, but it won't be done /for you/, the mapping is a manual step (grab field X, etc.).
@piokoch, nice suggestion. Of course in the real world we work in teams and some people will prefer this ;-)
The advantage of using findEntitiesByIdAndCreatedAtAfterOrderByCreatedAtDesc is that you will get for free paging support (which is not trivial and has to be done properly for each database engine, simple approach with using limit typically is very inefficient).
Also, most of the queries will be something simple to fetch some entity using some business identifier or primary key, not being forced to write down all those simple queries is a productivity boost.
For more complicated queries you can fallback to writing query manually.
There is also maturing Spring Data JDBC project that kind of has "best of both worlds".
Pageable adds of course a bit of functionalities, as does the convenience of passing an Instant instead of a String :)
I find both have valid points and there isn't really a compromise. Whichever approach is taken you end up with half the team feeling not listened to and disenfranchised.
I have found few things to be more divisive than the ORM vs No ORM debate and I am not sure what the answer is.
We do manage the SQL code (view definitions, mviews, and the remaining procstocs) right next to the application code with Flyway [1]. We also have SQL-ORM integration tests, spinning a database container with testcontainers [2].
We've had some issues with business rule duplication (updating the ORM and forgetting to do the same in SQL), but so far I'd say it's successful.
This being said, I'm on the SQL boat; and I remain convinced that the CRUD could be done comfortably with an SQL query builder such as jOOQ [3]; and that it would help solving the business rule duplication issue. But hey it's working right now and everybody is happy about it, so why change it?
• Your data makes sense as objects;
• Your data is small enough to fit in memory;
• Most of your operations are CRUD.
If you can answer all three with yes, then use an ORM or some other kind of SQL abstraction layer. Otherwise don't.
They're great for storing complicated queries, but the actual buiness logic? No.
Have some stored procs for updating and querying data, then a lightweight abstraction around them in the application.
[1] "business logic or domain logic is the part of the program that encodes the real-world business rules that determine how data can be created, displayed, stored, and changed.", https://en.wikipedia.org/wiki/Business_logic
Just about anything you can do with stored procedures, you can do within an SQL query block in your favourite language. And when all your queries are fully parameterised, your database server can even use the same execution plan cache as "real" stored procedures.
The language plays a big role in whether an ORM is actually useful or not. In the .Net world, ORMs work quite well because querying capabilities are integrated into the language. Queries and schema are verified at compile-time, and I wouldn't trade it for slightly better performance or control. On the other hand, with Python (or say Node), or with Java and Hibernate the wins are smaller.
Of course, there will be some queries which are just better written in plain SQL. If you're willing to accept that, ORMs are a good tool in your toolbox.
jOOQ needs to be mentioned in this context. With jOOQ you basically write Java-SQL-DSL, and with some code generation from your schema you have really neat type safety (and code completion).
String SQL queries are not the answer to ORMs, imho, we need a level in between. jOOQ provides a really cool set of features in that space.
You can use an ORM and it works flawless for most cases. There might be some cases where I need to write SQL to generate a custom function or some weird edge case but that is why the ORM gives you the possibility to write your own SQL if you want.
I forgot to mention that having to write raw SQL queries and mantain them, doing migrations and keeping up with the changes is kind of a pain when most of the time the ORM takes care of everything.
There's contexts where good knowledge of the ORM, and of SQL, is part of the bar of working in the environment. This often happens in stacks where there aren't a zillion different choices but rather curated default choices for things like ORMS.
Combined with having a large and sophisticated system (or many similar small ones), then the complexity saving of being able to use a good ORM (and LINQ) provides a significant savings in time, and complexity.
This helps the systems to last, and be passed on to other developers.
With most of the stuff I do, an ORM is a non-starter for my databases. The data would make no sense as objects and I tend to avoid using data objects in programming anyway. It's probably impractical or impossible to build what I build using one and even if you could it would be more difficult to write and performance would certainly be atrocious.
In non-ORM codebasese I am seeing patterns where sql queries are "copy-pasted" together and I am not sure I like it.
Just imagine the following example had hundreds of lines of SQL and several optional filters, some of which could themselves be several lines long:
def get_data(table1, table1, use_subset=True):
if use_subset:
filter_query = "AND column2="subset_value"
else:
filter_query = ""
query = f"""
SELECT
*
FROM
{table1}
JOIN
{table2}
WHERE
column1="value"
{filter_query}
""" example.sql
SELECT
*
FROM
%(table1)s
JOIN
%(table2)s
WHERE
column1 = "value"
% if use_subset:
AND column2=%(filter_value)s
% endif
def get_data(table1, table1, use_subset=True):
if use_subset:
filter_value = "subset_value"
else:
filter_query = ""
data = cur.example(table1=table1,
table2=table2,
use_subset=use_subset,
filter_value=filter_value).all()
https://github.com/ebenefuenf/quma/
https://quma.readthedocs.io/en/latest/templates.htmlIn go I use rql: https://github.com/a8m/rql
Which I ported to c# for use in .NET: https://github.com/Ashtonian/RQL.NET
A good alternative (if you are using Java) is to use something like Spring's JDBCTemplate. This provides the following benefits over using Hibernate/JPA:
1. You craft the queries yourself and so there is no intermediate ORM language which you need to use.
2. It parameterises the queries so that you don't have to append strings to get to the final query.
https://dzone.com/tutorials/java/spring/spring-simple-jdbc-t...
<code>
@Override public void insertForum(Forum forum) {
String query = "INSERT INTO FORUMS (FORUM_ID, FORUM_NAME, FORUM_DESC) VALUES (?,?,?)";
jdbcTemplate.update(query, forum.getForumId(), forum
.getForumName(), forum.getForumDesc());
}
@Override
public Forum selectForum(int forumId) { String query = "SELECT * FROM FORUMS WHERE FORUM_ID=?";
return jdbcTemplate.queryForObject(query, new ParameterizedRowMapper<Forum>() {
@Override
public Forum mapRow(ResultSet resultSet, int rowNum) throws SQLException {
return new Forum(resultSet.getInt("FORUM_ID"), resultSet.getString("FORUM_NAME"),
resultSet.getString("FORUM_DESC"));
}
}, forumId);
}
</code>3. It fetches only the columns that you need it to fetch and across multiple tables (if need be) without fetching the "relationships" between tables unnecessarily.
The example given below shows how to use RowMappers and to fetch only the columns you need:
<code> String query = "SELECT FORUM_ID, FORUM_NAME, FORUM_DESC FROM FORUMS WHERE FORUM_ID=?";
return (Forum) jdbcTemplate.queryForObject(query, new Object[] { Integer.valueOf(forumId) },
new RowMapper() {
public Object mapRow(ResultSet resultSet, int rowNum) throws SQLException {
return new Forum(resultSet.getInt("FORUM_ID"), resultSet.getString("FORUM_NAME"),
resultSet.getString("FORUM_DESC"));
}
});I authored a complete rewrite of an ancient and rotting PHP+MySQL web ticket reservation app in Python+SQLAlchemy+PostgreSQL. I use an ORM - except where it doesn't make sense because a SQL query expresses what I need to do more concisely and effectively. I don't use stored procedures - except where I do because I need one specific atomic DB operation to be performant and not bottlenecked on the app. I use relational storage - except almost every table in the database has a big JSON column for everything I don't need to ever join, filter on, or index in production codepaths (though I can still do that with PG's native json support, which is great for the rare case I have to move something to a real column). And I use triggers, stored procedures, and notifies to implement live change notifications for a table, that eventually get fed via WebSockets server to users.
This hybrid approach has served me extremely well, resulting in very readable and maintainable code, minimal DB schema migration pain (most upgrades only touch JSON fields and thus require no migration), and much better performance than the old app, especially in that hot path using the SP, while keeping table column bloat down, and avoiding the join spam that results from keeping everything religiously normalized even in cases where that doesn't buy you anything.
Of course, that does mean you need to know all the relevant technologies involved, SQL, ORMs, etc.
YMMV, but consider that if you think a single solution is the right solution in all cases, you're most likely wrong.
This is sort of the meta-principle that I apply to all of software development: no principle or methodology (DRY, YAGNI, SOLID, TDD, etc...) applies 100% of the time (except for this principle. ;))
CREATE TABLE some_table (
id SERIAL PRIMARY KEY,
uuid DEFAULT uuid_generate_v1(),
user_id INTEGER,
details JSONB,
updated_on timestamp DEFAULT current_timestamp,
created_on timestamp DEFAULT current_timestamp
)They push as much of the business logic as possible to the database. Their reasoning being that if the client insists on a separation between the DB & application servers: you should do as much computation as close to the data as possible. Then just send the end result over the wire.
Due to my own ORM-induced brain damage I found it hard to wrap my head around this at first: a data type in the application no longer represented a table, but the result of a query. Once you realize the database is just another API though it clicks really nicely into your architecture.
I think I still prefer things like Linq, jOOq, Arel, Ecto, etc. where you can write the query in your programming language and have it translated to SQL. It's just nice to see your query right next to the code when debugging. The author is absolutely right though you still have to know SQL to use tools like that effectively, so you might as well just learn it early instead of wasting effort learning the quirks of a specific ORM.
Whereas people who are "imperative thinkers," who just treat the database like a giant excel spreadsheet, will use flow control for branches, cursors for iteration, tons of temporary variables, etc. Without fail the iterative thinkers wonder why they need monster CPUs for their DB server and it's always pegging one core. Well of course the query planner can't optimize your code: you told it how to get the data, rather than asking for what data you wanted.
It has its charms but it mostly fell out of favor because of versioning. It's very hard to manage multiple versions and migrations for the database and it had a negative affect on development velocity.
It's always struck me as odd that, as a profession, we generally agree with, and can be quite fanatical about, the idea that different modules and services should try to hide their implementation details as much as possible, and instead speak over well-defined, constrained protocols such as APIs or interfaces; but as soon as an SQL database comes into the mix, we happily throw all that hard-earned discipline out the window and go back to directly swizzling the internal state of external collaborators.
That said, there are some use cases where I'm not sure how you get around something like an ORM. One is when you need to allow users to execute arbitrary searches against the data. If that's your situation, then, any way you cut it, you're going to end up with some system that takes an abstract representation of a query and compiles it to SQL. The only question is if you want to use something off-the-shelf to do it, or if you'd rather hack it together yourself. In my experience, there are few things on this earth that present a greater maintenance burden than a homegrown ORM-type library does once the original author has moved on to other things.
And there are others where avoiding an ORM is over-engineering. If your database is just an honest entity store, and you're just doing fairly straightforward CRUD operations against it, and it belongs to a single application, go ahead and punch the easy button and have a happy life.
> but as soon as an SQL database comes into the mix, we happily throw all that hard-earned discipline out the window
The data belongs to the database, not to whatever program you happen to work on at the moment.
IMO, the impedance mismatch situation is a result of poor programming language design. Saying that the database abstraction "leaks" into your code is backwards; the shortcomings of your programming language "leak" into how you're handling your data.
It's far from the only way to do domain modeling in an object-oriented language. It's not necessarily even a particularly good way, and it creates problems even if your app comes nowhere near a database.
If it's a performance-critical in-house service where you have full control of the deployment and can tailor it to your use-case, it's a completely different story.
Though I have to admit I do check what queries Hibernate produces to make sure it's not doing some funky stuff that's not really needed (as a result of a mistake of my own, mostly)
Edit: It has come to my attention that it is ofcourse Spring Data that offers those named queries. Still see nothing wrong with hibernate since I still only have to write it once and I'll be relatively safe supporting both db systems.
This is an extremely niche view, ORMs save you a huge amount of time, and the days before ORMs were a real pain in the arse.
As a noob you are much more likely to introduce a massive security hole by rolling your own solution. Don't do it!
So this article is sorta true, and you should learn SQL, but for trivial queries ORMs are actually a massive time saver and really useful in maintaining a system, especially in strongly typed languages (Java/C#/etc.).
Look at his description of the things he was making.
You do not use ORMs for anything but the most basic reports.
Let me tell you... readability and type saftey are a boons. Not a curse.
Things still get convoluted but boy the errors that can sneak into a complex sql statement are painful to debug.
As always the pendulum in this "article/rant" is swinging in the other direction ("everything was better in the past"). Also it's fricking (almost) 7 years old. That's a lifetime in software development.
If we were talking about something like Kubernetes best practices, I'd definitely agree with this point. In this case, though, I don't see what substantially changed about ORMs or SQL in the last 7 years.
The worst thing that ORMs try to do is handling relations (eager / lazy loading) and handling sessions / object lifetime. This is mainly because it's impossible to have one-fit-all solution for all usecases even in one application.
But even with relatively small and simple things you can run into issues very quickly if you don't know what kind of queries the ORM will create. ORMs are very leaky abstractions, they're useful but unless you understand SQL and understand their quirks you're likely to create some weird and monstrous queries at times.
You should know how your ORM handles related entities if you query them, there are some big footguns there with some strategies. And of course you should know how to use the ORM so that it doesn't do a "SELECT *" everywhere (which can be trickier than I'd like in some cases).
I would also not hesitate to drop to plain SQL for some cases, if your query doesn't fit neatly into the capabilities of your ORM.
I think this is where a lot of the anti-ORM crowd trip up. They start with the idea that an ORM completely abstracts away a database, show that isn't true, and then condemn ORMs. But ORMs don't do that, aren't advertised as doing that, and aren't useless because they don't do that.
So it doesn't matter that ORMs admit that they are a leaky abstraction. I still pretty much hate them for it.
It just feels pointless when I know I'm going to have to understand and use the underlying tech anyway. For ORMs specifically, I much rather use a query builder that's basically just a type-checked SQL DSL. Some of those aren't complete either, though, which also enrages me.
And what's not to say you can't write your own lightweight ORM to abstract the database if you can't find a tool that suits your goals - and as always it all depends on your use case.
Instead of fizz buzz maybe testing basic SQL abilities in an interview would be better.
In 'typical' Java application (since many comments here as well as the original article mentions Hibernate...) you will likely use annotations like @Transactional to mark your transaction boundaries (likely with default propagation and isolation levels...) and then Hibernate will track any changes ('dirty checking') to objects you asked him to fetch and then at the end the transaction Hibernate will issue whatever DML commands (INSERT, UPDATE, DELETE) needs to be issued in an appropriate order.
In a galaxy far, far away i.e. before Java 1.5 instead using @Transactional you would maybe use (write) some object like TransactionManager which provides an execute() method that receives a block of code in a form of a interface implementation (no closures for you!). This part is relatively straightforward. Tracking changes in any semi-automatic way was always messy...
Re: identities. Use of db generated identity has the downside, but n/hibernate has a bunch of other Id generators to mitigate the problems. You can use sequential guid, hilo, guid, or whatever. I use sequential guid because it helps with a bunch of other things. So it's not really a leaky abstraction. It's not really an orm problem, you need to do that regardless.
Re queries: I think linq showed the true power of orm in some sense. Query your database as if you are querying your objects. It has problems like n+1 or exploding cartesian but postgres and the likes fixes it nicely with json_agg. I have to agree though that I like graphql way too. For nodejs compile time linq is not an option, so graphql it is.
Transactions: I don't know how this is related to orms?
Disclaimer: former nhibernate developer that was around when ayende was building linq for nhibernate and my view is probably dated and biased.
If you use a more RPC approach to your data layer, partial updates based on an entity id are easier to groc.
"When you have foreign keys, you refer to related identities with an identifier."
"What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier."
This(and the sentences before it) basically saying if you have if you have a foreign key you have to first save the main object to get it's id. Id doesn't have to be responsibility of the database. In fact, I'd argue an identifier is an application layer concern. Orm could still solve this problem by simply cascading, or you can use one of the identity strategies that don't rely on database assigning it. You can generate them in either application layer or orm layer through identity generators.
(a lot of what ORMs give you is just a reduction in boilerplate code, manually populating 30 different fields on an object and so on.)
You're talking about query builders. ORM often come with a query builder but the goal is really about mapping object relations(1:n,1:1,n:n) to SQL queries in an automatic fashion, nothing more. today, most of the time it can easily be done with with JSON queries on most RDMS. But advanced ORM also come with a bunch of useful stuff like unity of work, caching and co...
[0] https://www.postgresql.org/docs/current/sql-createview.html
[1] https://www.postgresql.org/docs/current/rules-materializedvi...
I think a large part of the problem here is that developers who learn to leverage a database only through an ORM are missing out, and really they should also learn SQL (literally the only part of the article that is still objectively correct is the author's advice to learn SQL) to gain a better understanding of when (and when not) to use an ORM. Every other complaint in the article is either classic misuse of the ORM, or else a shortcoming of the ORM in question.
Unfortunately, even if you choose to reject ORMs and go SQL, you're still going to have solve this problem at some point, and it will not be pleasant.
I find value in libraries that occupy a middle ground between nothing but raw SQL and a full-blown ORM.
In TypeScript (and with apologies for hawking my own project): https://jawj.github.io/zapatos/
The tooling would statically analyze your queries, and produce types for exactly what you were requesting.
Then you would just have a bunch of raw queries laying around, but you’d be confident the data retuned is the right data at compile time.
That made just writing raw queries a lot more simple and feasible than building abstractions on top of them.
Now TS and sql doesn’t really have a robust lib for that I think, last time I checked it was only https://github.com/adelsz/pgtyped But haven’t looked at how far it has progressed.
they are there to take care of all of the crap that you don't want or need to worry about. for instance, parametrizing string against sql injections, handling transactions properly, concatenating joins, proper pagination and sql syntax discrepancies to name a few. and let's not to forget to mention things that go BEYOND the database interactions that are built on top of the ORM like data validations, custom properties, callbacks and so on.
there are way more benefits to using an ORM than not using one. a good ORM (like ActiveRecord) will let you break out of it and write raw sql when you need the perform boost while preventing yourself from shooting yourself in the foot.
* i've written and contributed to a couple of ORMs in my lifetime
I currently use dotNet Core and Entity Framework + Dapper.
It has these benefits: - Strong types! - My schema+migrations are part of the source code and I never have to worry about if my schema is in sync - with migrations it just is - Simple, straightforward tasks as super easy (insert, update, delete, simple select) - I can fall back to Dapper (straight SQL) when I need to do something that doesnt fit well with Entity Framework.
I recently worked on a project where EF was not allowed - and we spent so much time doing all the simple stuff, especially when the data model was still not completely locked down.
To be fair, I have never really done a full blown project with entity framework.. but I seem to always need a bit more control than it's offering. So I just created a simple repository framework (inspired by https://www.youtube.com/watch?v=rtXpYpZdOzM ) overlay in which the functions end up calling dapper.
The beauty of this is it's pretty simple to setup the overlay, and allows complete control of what is going on. I have a need for "dynamic" table names. I don't think there is even a way to do this in EF (at least not a simple way), but when I have my own repo overlay it's really a piece of cake. I just tack on whatever specific function input I need to help define what the resulting table name is that its great.
This is probably not the most common use, but a simple repo overlay to a micro orm is not much work and gives full control.
There are various arguments that seem to have flaws of their own. For example, "use ORMs for all the simple things, and raw SQL for the complex things". The problem with this is that even simple things like "insert this into the database" often require checking that the foreign keys you insert are within the domain of the user trying to insert them, and various other constraints like that. So with SQL, you can do this all in one query, but with an ORM, you often create a query for each check.
Related to that is the idea of "premature optimisation". The problem with this concept is that all queries add up together to determine how much hardware you need, which determines your costs. You can argue the opposite of premature optimisation, depending on your case. Why not spend 5 minutes extra writing a manual query that will be run hundreds of thousands of times for years?
Are your goals about reducing infrastructure costs as much as possible? Is "developer time" really a thing, or are you doing this in your own time for a startup or something?
Then there is the fact that doing simple things faster isn't much of a selling point because they are already simple. It is very obvious though that ORMs are much, much more readable than raw SQL.
Then another question starts to come up, if you are using this abstraction like an ORM or GraphQL as an ORM, why are you even using a SQL database when none of the features are really available to you?
SQL has has interesting new features in that you can do the object mapping inside the queries now. For example:
select
post.id,
post.content,
json_agg(comments),
json_build_object('id', a.id, 'name', a.name) as "author"
from
posts,
comments,
authors
group by
etc...
Still, it is nowhere near as easy as using an ORM. The other thing is that when you start using an ORM, you really do end up having a different approach to querying your data in every situation. You don't use all of the various features of SQL. It ends up being a lot slower, but maybe in some cases that is worth it.Use the right tool for the job.
i have done two event-sourced project in a row and i am trying to make a reusable library that i can use from now on, which will allow me to write projections so i am not tied to any schema and can build any data structures i'll need.
I put it along the discussions about which OS is the best. Which does not matter at all. Please stop wasting everyone time with those arguments. Do what is best for your project and shut the f up.
Yet we don't really live in that kind of ideal world and there are many factors that people have to consider. If everybody else in the company is using ORM, why shouldn't you. Or, if you just need to ship products and don't necessarily enjoy learning SQL. Or, if you don't know how to properly abstract the structured SQL queries without making it a huge mess. Saying that ORM is better/worse is like saying everybody likes music.
If I'm building a Nodejs app I would choose not to use ORM. For other languages and frameworks I'd probably reconsider and evaluate my options.
To me however, the biggest benefit of using no ORM is the learning of SQL. Time and time again that proves itself so invaluable, and I'm immensely happy I've learnt the quirks of SQL instead of some language specific API. If I can code faster with ORM that's great, but in ideal world I would much rather learn SQL and become a master of it rather than of some ORM.
Everything else that does more than mapping SQL results 1-1 to a query result and maybeeeee helps you with atomic actions like INSERT, UPDATE and DELETE ends up being a chore. They all have their own quirks like around threading and usually end up forcing you to design your application around it unless you take some measures to isolate it from your main application, which means you're now writing code while you wanted to type less code.
Add to insult a ton of the them seem really resistant to map the results of hand-written queries to an object. Also SQL dependency trees and object dependency trees never really seem to map that well. And don't try to do more complex queries with the built in query language in Django or .Net. I have sometimes spent hours trying to get an SQL query working in an optimal way that would not cost me more than 5 minutes two write and map with simpler ORM's.
The other day I talked to a dev that was otherwise a very strong iOS developer in a discussion why I always used SQLite directly with a simple wrapper that said "I really should learn SQL sometimes".
There probably hasn't been a skill more universally useful in my career as a developer than SQL apart from HTTP? But it also made me think about why I never really learned to write C (I can read and futz with it).
For anybody new in the trade reading this, the following things have been or could have been useful at any step in my career:
* HTTP
* HTML
* SQL
* JavaScript (blegh but true)
* C
* Unix
The rest were just passing by for a while, like Windows, Flash, PHP or any god damn JS framework out there.
it's probably a thousand pages. likely more. i understand it isn't the most thrilling reading of your life. it might just be once you have an issue in prod, though.
if you really don't have time for that, i also understand. i was you. but really please read the table of contents in that case.
Selecting * from a table with 200 columns isn't performant but if the table had 200 columns, I won't be blindly relying on the ORM generated query anyway since most provide the escape hatch of executing direct SQL.
This approach has worked so far.
When using Object Oriented Languages, like C# / Java / C++ etc., the programmer is forced to represent data into objects. True, representing table and relations between tables as objects is difficult, but do OO programmers have any choice at all?
Afterall, the issue is not a single table (a row can very well be represented in OO constructs), it's the relation between tables that causes headaches.
Even when using languages like Python, I have to inevitable gather related pieces of data together, in a dict say, and it still feels like mapping related tables to objects.
Another contention with the "vietnam" article is that the author assumes that OO programmers will want to use inheritance to model relations between tables. This seems outdated in my view. Composition is more suitable and powerful and is almost always used in ORMs, for ex Entity Framework.
The original post also claims that ORMs tend to gravitate towards "select *" queries. This is true, but has many solutions. Many ORMs optimize queries based on required columns. But there is also another issue with this. Say you have a table with 100 columns, but are only querying 5 at a time, isn't it prudent to separate out those 5 columns into another table? Or perhaps create a view? If the use case demands just 5 columns, but the table has 100, isn't there a problem with the data modeling rather than the ORM?
But the source is not all there: http://rupy.se/util.zip (bunch of helpers) http://rupy.se/memory.zip (I called the ORM memory!?) and http://rupy.se/test.zip (my first test of any software, more of a tutorial) is where you find all the details.
Also the thing has a graphical editor: http://move.rupy.se/file/logic.html
All of this is very complex for something I later solved with JSON files over HTTP instead: http://root.rupy.se but parts of it can be reused for other useful things; like logic was used to build game dialogue trees.
Some symptoms that I've seen in multiple projects:
- requests are slow because every request triggers dozens to hundreds of joins.
- overuse of the @Transactional annotation (spring/hibernate) because of developers slapping it on anything that looks like it might be doing anything with a database while neither understanding transactional semantics or aspect oriented programming (which causes some funny behavior)
- Attempts to implement class inheritance via database tables and corresponding hacks and complexity to query because of that.
- Lack of a coherent database design. IMHO, a good use of ORM should start with a good old database design. A 1 to 1 mapping of your domain to tables is typically not it.
- Over and under use of database constraints, indices, etc. because of a lack of knowledge of how databases actually work resulting in unenforced referential integrity constraints, poor performance, and weak transactional semantics.
I've used lots of different styles of databases over the years. I generally break things down into:
- simple key value stores like redis, memcached, etc.
- document databases like couchdb, elasticsearch. These tend to have schemas and fields
- SQL databases (mysql, postgres, mssql, oracle, ....)
- Nosql object databases (mongodb, firestore, etc.
I tend to mix these styles and would happily use postgres as a document store. IMHO if I'm not going to query on it, putting it in separate column adds little or no value. If I am going to query on it; it should have an index.
Years of using document stores have taught me the value of de-normalizing stuff like user names and other things that rarely change but are a part of pretty much everything. I've ripped out hibernate in favor of JDBCTemplate + TransactionTemplate on several projects where hibernate was causing more problems than it solved. Hand crafted joins are pretty easy to do.
If your data model is small, they save you very little work.
If your data model is large, they collapse and you have to abandon them.
There is no middle ground.
I'm not criticising anybody here (I am literally not in a position of critiquing anything); SQL was created probably 40 years ago and it probably made sense back in the times.
My point is that you can always build a human interface from a machine one; the reverse is not that easy. ORMs are a (failed?) attempt to do so.
I would very much push for computer-language integrated query support — (such as Datalog or Linq), and that is what the database server should be accepting as input, instead of a raw string.
Although a lot of people hate MongoDB, I've personally felt way more productive expressing queries with their query document system rather than using SQL.
Fixing design flaws with other software ain't gonna bring us that far maybe.
Of course not all ORMs will protect against SQL Injection completely and some of them will only implement partial protection. But it is a good start.
What I recommend strongly is not to mix ORM with plain SQL in the same line of code.
The code should either be full ORM or plain SQL so that it is clear where the responsibility of protecting against SQL injection should be.
Edit:
Also it is easier to add extra layers of protection after a while if using an ORM then when using plain SQL. Because with an ORM I can redefine lets say the Repo.all or Query.all method to add extra layers of protection. But with plain SQL I need to edit every single line where the SQL statement is present.
I've used it on several projects and it seems to be the perfect balance between 'close to the DB' and 'close to the domain/application'.
Following this principle will definitely make your life easier. I have been using Slick[1] and the Play Framework[2] for my backends for a good while and having the freedom to define my migration scripts in pure SQL is a big plus. My application models are very lean and don't get mixed up with DDL. I can also fully understand what goes in the db migrations, zero magic there.
- [1] http://scala-slick.org/ - [2] https://www.playframework.com/
Languages could be placed on a spectrum of how easy it is to build a DSL over the top of the language. Lisp is on one extreme, where it is easy to write a new language over the top of it.
SQL is pretty close to the other end of the spectrum. I've never seen a DSL that can compile to the full breadth of SQL dialects out there. DSLs generally handle the trivial well then fall apart when they hit complicated SQL statements.
If an ORM just had to fit over the top of a relational model it would probably work fine. The problem is, in the middle of all this, something has to be constructed in performant & parseable SQL.
Also, ORM caching always finds a way to become a huge pain. It's easy to wind up with multiple objects representing the same underlying DB row, but updating one in memory won't update the others, so you update them all from DB just to be safe... You might say "find a better ORM" but I'm pretty sure they all run into this problem at some point. IMO you're better off doing it yourself and making it explicit.
You stop using an orm when you're wasting time going through it's documentation trying to figure out how to do something tricky for some obscure ad-hoc execution.
I do agree that for large, complex performant queries, ORMs might not work out, but I would say that those queries are better written as SQL procedures / functions because they are bound to be modified or changed anyway.
Having said that, most LOB apps, small apps, apps that have in-frequent access of the DB, etc, can benefit from using ORMs.
One problem I've yet to find a good solution to is automatic reconnect on database errors with proper transaction support.
But that's less of an ORM and more of a data mapper?
Now of course learning SQL is still a good thing to do, but for some a good ORM can be a way of doing that
Is an ORM good to help onramp beginners, or is it some syntactic sugar for experts who already know SQL very well? Or both? Or neither?
JOOQ is a wunderful SQL abstraction that is very nice to work with.
What I would really like is a Datomic style Database and have a JOOQ like Datalog, but I can't have everything at the moment.
The reason is that, not all our Rails devs understand well ActiveRecord, but they know SQL enough to make things work.
Not all ORMs are created equally.
I enjoy the Django ORM and it's composability, OTOH I was really not keen on earlier ORMs.
2)Use the minimum amount of technology necessary to solve the problem.
Putting core business logic in stored procs though is a big no no.