Challenges students face when learning to work with relational databases and SQL

(growkudos.com)

138 points · gousiosg · 4y ago · 126 comments

78 comments · 20 top-level

sdevonoes · 4y ago · 20 in thread

It takes days/weeks to pick up the core of SQL:

- create tables, update the schema, insert rows, add an index

- select, filters, joins, order by, limit, inner queries

It takes forever to be comfortable with:

- anything that involves summarizing, grouping, having, min, max, windows

fifilura · 4y ago

I think one tip here is to always work with CTEs "WITH", and not nest queries. That way you can always go back and check "what is it I join with what", by querying the individual steps with LIMIT 10.
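A minimal sketch of that tip, assuming SQLite through Python's `sqlite3` module. The `orders` table, the CTE names, and the `peek` helper are all made up for illustration. The CTE chain stays untouched and only the final SELECT changes, so any intermediate step can be inspected with a LIMIT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('ann', 10.0), ('ann', 25.0), ('bob', 5.0), ('cy', 40.0);
""")

# The CTE chain stays fixed; only the final SELECT changes while debugging.
CTES = """
WITH per_customer AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (
    SELECT customer, total
    FROM per_customer
    WHERE total > 20
)
"""

def peek(step, limit=10):
    """Return up to `limit` rows of one named CTE step (hypothetical helper)."""
    # `step` is interpolated as an identifier, so only pass trusted names.
    return conn.execute(f"{CTES} SELECT * FROM {step} LIMIT ?", (limit,)).fetchall()

step_one = sorted(peek("per_customer"))
step_two = sorted(peek("big_spenders"))
print(step_one)
print(step_two)
```

SQLite accepts a WITH clause that defines CTEs the final SELECT never references, which is what makes the swap cheap here; other engines may differ.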

The other tip is to sketch the problem in excel/google sheets when it gets hairy. Not the actual code (I don't have a clue how to do that, others have), just the values in the different steps. In the end it is only about rows and columns.

But that said, these days a lot of it happens intuitively for me, I pretty much know the solution before I can spell it out. It certainly was not like that when I started.

When you begin, "programming without for loops" feels like programming with your right hand tied behind your back. But in hindsight you get a lot of exercise in the immutable paradigms of functional programming, and working with comprehensions, sets, maps, and folds comes very naturally.

eyelidlessness · 4y ago

Be careful with blanket CTE recommendations. They tend to deoptimize with writes in really surprising ways. If you’re just reading data, they’re great.

DemocracyFTW · 4y ago

This is what I thought would be a good tip when I first learned about CTEs. Turns out I use them sparingly and much prefer to create views to represent steps of data refinement. The benefit of doing it this way is that I can inspect intermediate results. Often I use *.sql files for this so I can keep the SQL formulations for later; also, as long as stuff is still experimental / under development, I start out with `begin transaction;` because then I'm guaranteed I do not permanently change anything in the DB I'm working on and also I can repeat all steps without having to care about `create or replace view ...`.
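A rough sketch of that workflow, with SQLite via Python's `sqlite3` standing in for whatever database the commenter uses. The `readings` table and the view names are hypothetical. Both views exist only inside the transaction, so the intermediate results can be inspected and the whole thing then rolled back:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/ROLLBACK below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("a", 1.0), ("a", 3.0), ("b", 10.0)])

conn.execute("BEGIN")
# Each view is one step of data refinement and can be queried on its own.
conn.execute("CREATE VIEW step1_valid AS SELECT * FROM readings WHERE value > 0")
conn.execute("""
    CREATE VIEW step2_avg AS
    SELECT sensor, AVG(value) AS avg_value FROM step1_valid GROUP BY sensor
""")
intermediate = conn.execute("SELECT COUNT(*) FROM step1_valid").fetchone()[0]
final = conn.execute("SELECT * FROM step2_avg ORDER BY sensor").fetchall()
conn.execute("ROLLBACK")

# After the rollback no view is left behind, so the script can be re-run as is.
views_left = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'view'").fetchone()[0]
print(intermediate, final, views_left)
```

This relies on the engine rolling back DDL. SQLite and PostgreSQL do; MySQL commits implicitly on DDL statements, so the trick does not carry over there.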

Oh and lest I forget you can't just re-use a CTE in another query. But you can of course re-use a view. Also given what another user here remarked, Postgres might internally treat (and optimize) CTEs like views, so to me that makes views superior to CTEs in more or less all respects.

moonchrome · 4y ago

Have some links on using CTEs to replace nested queries and how it helps?

I've used SQL enough to have to write nested queries, haven't dove further than that.

prpl · 4y ago

This is good advice but sometimes you will need to nest things depending on database and optimizer and how it handles CTEs, especially if you ever mess with recursive things. An additional problem here is that there’s no standard way to represent EXPLAIN queries across systems so that makes an additional barrier to entry unfortunately.

odipar · 4y ago

Yep, CTEs are a huge boon to structure your SQL - use them where you can.

KronisLV · 4y ago

I've always found it hard to articulate the problems that i have with SQL and with the "WITH" CTEs, but let me try anyways.

For starters, i can never actually test parts of those queries without rewriting the query up to the part that i want to test, for example:

  WITH 
    query_one AS (SELECT ...),
    query_two AS (SELECT ...),
    query_three AS (SELECT ...)
    SELECT ... /* main query */

If i want to test the second query, i need to take the first and second ones, copy them into a new worksheet and then rewrite the second one not to have the alias but instead be the main query. This is annoying when you have 5-10 CTEs and you need to test something in the middle.

Then, working with SQL and CTEs feels like going back from a language where functions are first class citizens to one where no such thing exists, just in regards to querying data. It would be nice if i could store parts of queries under packages, to be able to write dynamic SQL more easily, instead of having to use tools like myBatis for this purpose: https://mybatis.org/mybatis-3/sqlmap-xml.html (see the bit about SQL fragments)

So i'd like to do the following:

  PACKAGE my_snippets BODY IS
    SNIPPET query_one
      SELECT ... /* probably 500 lines long but often used snippet */
    END query_one;
  END my_snippets;
  
  /* and then, somewhere in code */
  WITH
    query_one AS my_snippets.query_one,
    query_two AS (SELECT ...),
    query_three AS (SELECT ...)
    SELECT ... /* main query */

Now, you might suggest that using views works for this intent, but what about most DBMSes out there having silly naming rules and restrictions? I don't want to work with v_mtz_wg_priv_prod_attr because someone thought that having just a few dozen characters makes sense as a restriction. Furthermore, you really can't group views into logical packages based on their intent, now can you? So, with views you end up with something that's very much like your cluttered list of tables, which gets really hard to get a good overview of when you have about 300 of them.

Next up, debugging in databases is just really bad. How am i supposed to put logging in the queries, without mixing the logging code with the other triggers and tables? What about debugging long running processes? What about adding breakpoints that i can trigger when a particular view or table is accessed? What about doing this on the server while i have a local app instance connected to the DB, or maybe even another app server? Why can't i step through the query execution and see how the filtered record count changes with each "step"?

Apart from that, my problems are largely with the tooling around databases. There are relatively few universal (cross language) DB migration solutions out there, for example dbmate, every framework seems to have its own approach. There seems to be this odd division between procedural SQL and regular SQL statements, where what you can do differs based on context, which is inconsistent. Procedural languages as a whole vary wildly in what they can do - you won't be doing complex logic with custom types on MySQL/MariaDB anytime soon, whereas Oracle or PostgreSQL will suffice. But even those two have different dialects, it's never "just SQL". There are oddities with selecting certain kinds of data, only pgAdmin seems to work nicely with geospatial data, but apart from that i've also seen problems with using lower level JDBC logic which you can't really test outside of the app, in something like SQL Developer. But even apart from that, as much as we like ER diagrams, MySQL Workbench is the only tool that i've seen which allows you to actually do model driven development properly and synchronize schemas and do forward/backward engineering - even pgAdmin fails at doing this. Oh, and the tools themselves are really inconsistent - you'll see a world of difference between MySQL Workbench, pgAdmin, SQL Developer, JetBrains DataGrip and others.

And now those DBMSes are attempting to add more functionality, such as exposing REST interfaces, instead of fixing the underlying and dated problems, because people out there are relying on those and therefore the logic is set into stone. It's no wonder that every year there's a new product or two that attempt to improve upon these, even if most of the time those products die out.

Perhaps the above is a stream of consciousness with some annoying things that i've dealt with over the years, but personally, relational databases are something that i use because they're often the least horrible tool for the job, even if they are not pleasant or easy to use, at least as easy as they should be. That's where i think the main problem lies - tools should be good for solving the problems on which they'll be used, these ones aren't.

Someone with 20 years of experience might have a different outlook, but personally i'd suggest that you utilize DBMSes for what they're good for - storing, retrieving and manipulating data - and don't get too carried away with in-database processing otherwise, since doing certain things within the app code seems to scale horizontally far more easily in some situations, has better auditability, debugging etc.

branko_d · 4y ago

Not so sure about aggregation, but what definitely takes forever is performance.

To make a non-trivial SQL query scale to non-trivial amounts of data, you have to understand the physical data organization and how the query optimizer is likely to use it, which is kind of contradictory to the idea of SQL as a "declarative" language where you just say what you want, and let the query optimizer figure out how to get it.

Instead, you have to design your indexes carefully to coax the optimizer into choosing a reasonable access path for your particular query. And do the same for all queries where performance is important.

Indexes are fundamentally not about data, but about access patterns. Which is what the developers are responsible for. That's why physical database design is a development task, not a database administration task.
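To make the "access path" point concrete, here is a small sketch using SQLite's `EXPLAIN QUERY PLAN` from Python. The `events` table, the index name, and the `plan` helper are invented for the example, and the exact plan wording differs between SQLite versions and between database engines. The query text never changes; only the index does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany("INSERT INTO events (user_id, kind) VALUES (?, ?)",
                 [(i % 100, "click") for i in range(1000)])

QUERY = "SELECT kind FROM events WHERE user_id = 42"

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # no usable index yet: the planner scans the whole table
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(QUERY)   # same query, now an index search on user_id

print(before)
print(after)
```

The same declarative statement gets two very different executions, which is why the index design ends up being part of writing the query.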

fomine3 · 4y ago

I think the RDB is one of the leakiest abstractions. Despite that, it's still very useful.

myspy · 4y ago

Something else that is hard is writing performant queries, for example statements using subqueries or the IN syntax.

And I always forget which join does what.

AdrianB1 · 4y ago

Remembering which join does what is easy: an inner join keeps only the rows that match in both tables, a left join keeps all rows from the left (first) table, a right join keeps all rows from the right (second) table, and full outer and cross joins are so rarely used you don't need to memorize them.
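A tiny illustration of those rules, assuming SQLite via Python's `sqlite3` and two made-up tables, `customer` and `purchase`. One customer has no purchase and one purchase points at a customer that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (pid INTEGER PRIMARY KEY, cid INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO purchase VALUES (10, 1, 'book'), (11, 3, 'pen');
""")

# INNER JOIN: only rows with a match on both sides survive.
inner = conn.execute("""
    SELECT c.name, p.item FROM customer AS c
    INNER JOIN purchase AS p ON p.cid = c.cid
    ORDER BY c.name
""").fetchall()

# LEFT JOIN: every customer is kept; missing purchases show up as NULL (None).
left = conn.execute("""
    SELECT c.name, p.item FROM customer AS c
    LEFT JOIN purchase AS p ON p.cid = c.cid
    ORDER BY c.name
""").fetchall()

# CROSS JOIN: every combination, 2 customers x 2 purchases.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customer CROSS JOIN purchase").fetchone()[0]

print(inner, left, cross_count)
```

A right join gives the same rows as a left join with the two tables swapped, so the sketch sticks to LEFT JOIN, which also keeps it runnable on older SQLite builds.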

MeinBlutIstBlau · 4y ago

same with the joins. on paper it makes sense, in practice it does not. if its more than a "select * from dbo.whatever where column abc = 'thing'" i have to refer back to notes and play with it.

btilly · 4y ago

Meh, summarizing, grouping, etc aren't that hard.

However WINDOW queries definitely have a learning curve. Not the least because useful examples almost always require you to use a nested query.
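A sketch of the kind of example meant here, assuming SQLite 3.25 or newer (the first version with window functions) via Python's `sqlite3`, and a hypothetical `sale` table. Picking the top seller per region needs the ranking in an inner query and the filter in an outer one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sale VALUES
        ('north', 'ann', 50), ('north', 'bob', 70),
        ('south', 'cy', 30), ('south', 'dee', 20);
""")

# A window function cannot appear in WHERE, so the ranking happens in an
# inner query and the filter on it happens in the outer one.
top_per_region = conn.execute("""
    SELECT region, rep, amount
    FROM (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sale
    )
    WHERE rn = 1
    ORDER BY region
""").fetchall()

print(top_per_region)
```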

magicalhippo · 4y ago

I learned SQL on a need to know basis. For me, recursive queries were the ones that needed the most time to click.

Another one that caught me by surprise was NULL vs unknown[1]. That bit me in a couple of queries.

[1]: https://learnsql.com/blog/understanding-use-null-sql/
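One way the NULL surprise typically shows up, sketched with SQLite via Python's `sqlite3` and a made-up `person` table. A comparison against NULL is neither true nor false, and WHERE keeps only rows where the condition is true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT, city TEXT);
    INSERT INTO person VALUES ('ann', 'Oslo'), ('bob', 'Rome'), ('cy', NULL);
""")

# NULL = NULL evaluates to NULL (unknown), which sqlite3 returns as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# 'cy' is dropped: NULL <> 'Oslo' is unknown, and WHERE keeps only true rows.
not_oslo = conn.execute(
    "SELECT name FROM person WHERE city <> 'Oslo' ORDER BY name").fetchall()

# Handling the unknown case explicitly brings the row back.
not_oslo_or_missing = conn.execute("""
    SELECT name FROM person
    WHERE city <> 'Oslo' OR city IS NULL
    ORDER BY name
""").fetchall()

print(null_equals_null, not_oslo, not_oslo_or_missing)
```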

SilverRed · 4y ago

A lot of people try to do stuff that doesn't actually make sense when it comes to groups. Like selecting a column which isn't contained in the group by. And they are confused by the error but when you talk them through it "What did you actually want to see? There are multiple values for this column now" it starts to become clear to them.
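The "multiple values for this column" point can be shown directly. This is a sketch with SQLite via Python's `sqlite3` and a hypothetical `employee` table. Note that SQLite itself does not raise the error being described (it accepts a bare column and returns the value from an arbitrary row of the group), whereas PostgreSQL rejects such a query, so the example answers the question explicitly with aggregates instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (dept TEXT, name TEXT);
    INSERT INTO employee VALUES ('eng', 'ann'), ('eng', 'bob'), ('ops', 'cy');
""")

# Each dept collapses to one row, so `name` has to be reduced to a single
# value somehow: here by counting, and by taking the smallest and largest.
summary = conn.execute("""
    SELECT dept,
           COUNT(name) AS names,
           MIN(name)   AS first_name,
           MAX(name)   AS last_name
    FROM employee
    GROUP BY dept
    ORDER BY dept
""").fetchall()

print(summary)
```

The 'eng' group holds two names, which is exactly why a plain `SELECT dept, name ... GROUP BY dept` has no single right answer.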

AdrianB1 · 4y ago

Not really. I used to teach SQL not a long time ago and about 1/4 of the trainees were getting up to speed fast, about half in a reasonable time, the rest were there only because they were sent there by their managers.

I found that the most important success factors in learning SQL are the analytical thinking of the trainee and the way the trainer is explaining the concepts, in what order and what examples are used (the best examples are the ones the trainees meet in their regular work).

The functions are simple, the only difficulty is to remember the ones that are not used often enough (ex: some window functions). Even in that case, a quick check in the documentation is enough to get up to speed. The major difficulty with SQL is to write efficient queries on large data volumes, covered by the right indexes. This is very specific to each RDBMS, especially because the tools helping with the work are specific (ex: SSMS, SQL Sentry Plan Explorer, statistics parser etc).

MeinBlutIstBlau · 4y ago

If all you do is SQL, anybody can learn it quick. If you do full stack, you're only gonna care about what gives you the data you need at that time.

542458 · 4y ago

Personally I feel that it took me a while to get really comfortable with more complex joins. There’s a problem they used in the study that required joining a table with itself, and honestly I would probably take a while to come up with that answer, if at all.

weaksauce · 4y ago

A basic approach (probably what they are going for in a basics study) would be something like this, off the top of my head:

    select c.cid, c2.cid
    from customer as c
    inner join customer as c2 on c.street = c2.street
    where c.city <> c2.city

though that has reflective duplicates: say, (1, 5) would also have (5, 1) in the output. So I'm not sure if that's "allowed".
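If the mirrored pairs are not wanted, one common fix is to require an ordering between the two ids. A sketch with SQLite via Python's `sqlite3`; the `customer` columns follow the query above, the rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, street TEXT, city TEXT);
    INSERT INTO customer VALUES
        (1, 'Main St', 'Oslo'), (5, 'Main St', 'Rome'), (7, 'High St', 'Oslo');
""")

BASE = """
    SELECT c.cid, c2.cid
    FROM customer AS c
    INNER JOIN customer AS c2 ON c.street = c2.street
    WHERE c.city <> c2.city
"""

# The query as posted returns each pair twice, once in each direction.
mirrored = conn.execute(BASE + " ORDER BY c.cid").fetchall()

# Requiring c.cid < c2.cid keeps exactly one ordering of each pair.
unique_pairs = conn.execute(
    BASE + " AND c.cid < c2.cid ORDER BY c.cid").fetchall()

print(mirrored)
print(unique_pairs)
```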

pge · 4y ago

and worst of all, anything that involves vendor-specific keywords…

dehrmann · 4y ago · 7 in thread

> For example, some students wrote queries containing ≠ instead of != or <>.

Was this done on paper? Typing ≠ takes some doing.

nightpool · 4y ago

Yeah, IMO counting this as a syntax error is a pretty low blow. It's completely clear what the person intended, and they would probably have no problems clarifying if the researchers asked how they would type that query in.

turnerc · 4y ago

From the study:

> Participants wrote their notes and answers on paper, which they showed in front of the webcam.

Yes it seems they did

dehrmann · 4y ago

That's pretty flawed methodology since you'd want to know what problems people encounter in the real world and how quickly they solve them.

jve · 4y ago

I mean, who writes queries on paper? And by pressing Execute, SQL will gladly provide an error message that the student will, I think, quickly resolve. Is this a contest that measures how well students can write compile-able code without executing it or what?

Moreover, IDE would have helped those students for sure: SQL keywords are highlighted in different color + autocomplete.

pcblues · 4y ago

Easy if you have an APL keyboard :) (Hint, it's on the 8) https://www.dyalog.com/uploads/images/Business/products/us_r...

Wevah · 4y ago

Option-= on a Mac with the US layout, fwiw.

JadeNB · 4y ago

Man, Mac's keyboard shortcuts for special characters irritate me so much.

First, they're there, and it's absolutely wonderful! I use far more semantically accurate Unicode rather than lossy ASCII approximations than I did back in my old Windows days. (If you don't know the special characters you can get, turn on Keyboard Viewer and whack your keyboard, especially modifier keys, a bit.)

But … I can't customise them. Even back in the days when macOS was OS X and believed in user customisation, these specific shortcuts were frozen and un-customizable. (Like the folder shortcuts in Finder. Maybe it makes sense to you for CMD-SHIFT-D to open the Downloads folder, not the Desktop. Too bad!)

(Boy, I hope I'm wrong and someone will come along and explain my stupidity to me.)

DaiPlusPlus · 4y ago · 7 in thread

I noticed that the article doesn't mention relational-calculus at all, only relational-algebra. That's a huge oversight, imo - as I feel one needs to understand both RA and RC in order to grok SQL and other RC-like systems, like Linq in C#/.NET and List-comprehensions in Python (or even use those before RC/RA and SQL).

-------

Rather than improve how SQL is taught (which seems to be the paper's objective), why not improve SQL so it isn't as horrible to try to learn in the first place?

The barriers to grokking SQL could be lowered considerably if SQL made minor adjustments like moving the projection part of a SELECT query to being below or syntactically after the WHERE clause instead of being at the top, and making SQL more "natural" to write in without needing excessively verbose inner-derived-table expressions when all you want is to perform some repetitive calculation which will be reused in later query steps.

Also, the GROUP BY clause really needs to be renamed to "AGGREGATE BY" or similar, because when normal people think "group" they're probably thinking of sorting/ORDER BY or PARTITION BY and they certainly don't imagine "don't display these rows at all, lol".
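The difference being pointed at can be seen side by side. A sketch assuming SQLite 3.25+ via Python's `sqlite3` and an invented `payment` table: GROUP BY collapses each group into one row, while a window with PARTITION BY keeps every row and attaches the group total to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (customer TEXT, amount INTEGER);
    INSERT INTO payment VALUES ('ann', 10), ('ann', 20), ('bob', 5);
""")

# GROUP BY: three input rows become two output rows, one per customer.
grouped = conn.execute("""
    SELECT customer, SUM(amount)
    FROM payment
    GROUP BY customer
    ORDER BY customer
""").fetchall()

# PARTITION BY: all three rows remain, each carrying its customer's total.
windowed = conn.execute("""
    SELECT customer, amount, SUM(amount) OVER (PARTITION BY customer)
    FROM payment
    ORDER BY customer, amount
""").fetchall()

print(grouped)
print(windowed)
```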

I just don't understand what drives the ISO SQL language design committee - I'd have thought that the newer revisions (e.g. SQL-2003) would have improved the language's ergonomics - on the contrary: the language's grammar and verbosity gets worse every release, and the team has strange priorities: apparently they feel needing to generate in-memory XML is more important than deferrable constraints - and I only ever see ISO SQL's XML features being abused to make-up for a lack of decent string-aggregation functions.

(...I could talk for hours about everything wrong with SQL.)

de6u99er · 4y ago

The trick of becoming really good at SQL (applies to all areas of IT) is having a certain ambition to produce high performance beautiful (readable) code.

This requires experience which can only be gained by rolling up your sleeves and working on stuff until the high ambition has been satisfied. Sometimes when I see old code from myself, and I can follow what I have been doing I get really proud of myself. Many times I end up slightly improving it based on new knowledge I have acquired since I initially wrote it.

AdrianB1 · 4y ago

Readable code can be easily done via good formatting, but performance requires a combination of writing the query in the right way and the indexes to support it. The second part is not even visible from the query and most of the time is not self-explanatory, but the best part is that indexing is not even universally valid, statistics decide execution plans and the same query with the same indexes can result in very different performance on 2 different instances.

minism · 4y ago

High performance and readable certainly. Not sure why beautiful would be something to strive for though

jimbob45 · 4y ago

If the general computing community can agree on anything over the last 20 years, it’s that Python 2->3 was a disaster. Even though Python 3 made several highly necessary (and irreversibly transformative) changes to Python 2, no one liked it because it fundamentally changed the language to something unfamiliar.

I’m guessing the SQL and C++ committees looked at that transition and decided that such transformative changes really need to be done in new languages (like the Perl -> Raku change) rather than in a new version which risks alienating your existing base.

DaiPlusPlus · 4y ago

Oh of course - I have no doubt the reason the ISO SQL committee is so conservative (no... they're regressive) is the sheer collective industry investment in not only SQL tooling and SQL-compatible databases, but also the energy spent teaching non-CS/SE/programmer types in businesses how to express their data queries in SQL. It's very, very difficult to get the kind of industry cohesiveness around any technical standard, so the fact that SQL is so widely supported is a miracle (though it probably has something to do with US federal government requirements for information systems to support it, just like how POSIX is a thing because of the fed pushing for it).

To be clear: I am not advocating for a brand new query-language syntax or any kind of Python3-style overhaul, but I'd like to see SQL start to take small steps towards integrating the lessons learned from the past 60+ years of language design rather than doing the complete opposite.

btilly · 4y ago

The Python 2 to Python 3 migration was such a disaster that Python 3 is now used by both more programmers, and a higher proportion of programmers, than Python 2 ever managed.

This is not to minimize the pain of switching. But it does not seem to have limited the success of the language.

darksaints · 4y ago

Python 2->3 was a disaster, but I'd refrain from extrapolation because a lot of that difficulty was very specific to dynamic typing or python itself. There are tons of languages that have gone through far more transformative changes in the core semantics of the language, and have gone a lot smoother.

pcblues · 4y ago · 6 in thread

I have been developing software that includes SQL for twenty years, and watched my own mental progress from misunderstanding to understanding. I found the biggest initial problem is that I used to imagine SQL queries as an imperative language rather than as expressions of data. Maybe in the teaching of SQL, this should be highlighted so absolute beginners can have that mental model when they are formulating solutions and grappling with the syntax.

k__ · 4y ago

I don't know if that's enough.

Understanding the difference between declarative and imperative programming is rather hard with all the abstractions we have today.

People always say, declarative programming is defining what you want, not doing the steps needed to get it. But today no imperative interface requires you to do all the steps either, plus, most programming languages use both paradigms at the same time.

btilly · 4y ago

> Understanding the difference between declarative and imperative programming is rather hard with all the abstractions we have today.

The distinction is whether you are telling the computer *what* result you want, or telling it *how* to get there, step by step.

If, even with access to all of the code, you'd have to ask the computer how it chose to do it to figure out what it did, you have a declarative system. If the code reads like instructions for a recipe, it is imperative.
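As a small illustration of that test (not from the thread; the data and table name are made up), here is the same per-customer total written both ways in Python, with SQLite providing the declarative side:

```python
import sqlite3

rows = [("ann", 10), ("bob", 5), ("ann", 20)]

# Imperative: the code reads like a recipe, one explicit step after another.
totals = {}
for customer, amount in rows:
    totals[customer] = totals.get(customer, 0) + amount

# Declarative: the query describes the result; how the engine groups and
# sums (hashing, sorting, using an index) is not visible in the code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payment VALUES (?, ?)", rows)
sql_totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM payment GROUP BY customer"))

print(totals, sql_totals)
```

To learn what the database actually did for the second version you would have to ask it, for example with its EXPLAIN facility.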

The complications come with the fact that these two paradigms do not describe all of the possibilities. Notably object oriented and functional designs are neither imperative nor declarative. (But may share some features with both.)

pcblues · 4y ago

I guess my point was that if you are trying to achieve results in a language paradigm that isn't the one the language was designed for, the learning curve is _really_ steep, and to use the declarative features of any language still requires you to understand the paradigms' differences. When I was at uni it wasn't until the final year that Programming Paradigms was a course, but even a rough introduction to them in any of the languages I studied earlier would have helped. Something like, "This language is used like this. It is not used like this, for example."

dboreham · 4y ago

I've made significant $$ over the years being a person who can understand the imperative code the database server will likely execute to make that declarative goodness happen.

tanin · 4y ago

This is actually quite important.

Some tasks that are seemingly simple in a normal programming language can sometimes be impossible to achieve in SQL (e.g. dynamically generated columns...)

SilverRed · 4y ago

That's where you create a massive beast of code which dynamically generates SQL from fragments with string interpolation!

listenallyall · 4y ago · 4 in thread

SQL is hard not because of terrible syntax, but because the underlying logic of defining exactly what you want, is difficult.

SQL, while not perfect, is very compact and direct. It allows you to express what you want succinctly and without boilerplate. No classes, no variable declarations (sure you can DECLARE a variable, but it is rarely needed), no dependencies or imports.

There's a reason why, despite the promises of every BI tool that it will "simplify" your database and "empower users", none of them have toppled SQL or even added anything useful that SQL could incorporate.

Graphical query designers are nice but have limited capabilities. SQL could occasionally be less verbose and IDEs could probably do better in reducing keystrokes (better autocomplete), but SQL itself, overall, is pure and brilliant.

Groxx · 4y ago

Yeah - my class covered the language and meanings of the terms in a couple days. It's wonderfully simple. Which makes it hard in the same ways programming is hard - arrays are trivial! People still screw up bounds checks routinely!

Figuring out what you want, and understanding your data well enough to know what's viable and what's nonsense, is infinitely harder. And it changes every time.

Getting good at that part is "expertise" in a nutshell - gradually learning what strategies work and when, and getting better and better at your guesses. That takes more than a few weeks; that's an entire career.

nojito · 4y ago

> There's a reason why, despite the promises of every BI tool that it will "simplify" your database and "empower users", none of them have toppled SQL or even added anything useful that SQL could incorporate.

Pivoting data is still extremely painful using raw SQL
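For reference, the usual portable workaround is conditional aggregation: one `SUM(CASE ...)` per output column. A sketch with SQLite via Python's `sqlite3`; the `sales` table and the `q0`/`q1` column aliases are invented. The pain is visible in the fact that the list of pivot columns has to be known before the query text can be built:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 'Q1', 10), ('north', 'Q2', 15),
        ('south', 'Q1', 7),  ('south', 'Q2', 3), ('south', 'Q2', 4);
""")

quarters = ["Q1", "Q2"]  # the pivot columns must be known up front

# One SUM(CASE ...) per output column. Only the generated aliases are
# interpolated into the SQL text; the quarter values are bound as parameters.
columns = ", ".join(
    f"SUM(CASE WHEN quarter = ? THEN amount ELSE 0 END) AS q{i}"
    for i, _ in enumerate(quarters)
)
pivot = conn.execute(
    f"SELECT region, {columns} FROM sales GROUP BY region ORDER BY region",
    quarters,
).fetchall()

print(pivot)
```

Getting the column list itself from the data would take a first query (for example `SELECT DISTINCT quarter`) followed by this generated one.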

listenallyall · 4y ago

That's a semi-valid objection but I'd argue that's not really within the scope of SQL (and the PIVOT keyword is not an official part of the language, I believe). SQL is a language to interact with databases, SELECT queries are specifically for extracting data out of a database. Pivoting is typically something you do with data you've already extracted, that currently resides in memory, which is why it is fast, there's no additional data retrieval each time you change the aggregations or filters in a pivot table. Put it this way -- a SQL-based pivot table engine, that re-queries the database with every change, would perform awfully compared to a simple Excel PivotTable.

Similarly, you can't use SQL to apply color formatting to any of the result rows or values -- but that was never a goal SQL intended to achieve.

nmz · 4y ago

and also because you have to pay to use Q and kdb+

tracyhenry · 4y ago · 4 in thread

SQL has a steep learning curve. It exposes almost zero insights into the underlying query execution. As a result, an increasing number of inefficient queries are being written by ML engineers, who in general care little about query efficiency. The solution right now seems to be to have a team of data engineers optimize the queries.

Should we think about an alternative, at least for ML ETL workloads?

tester756 · 4y ago

>SQL has a steep learning curve.

Does it? I think SQL just sucks and its tooling sucks too

Even SQL Management Studio which felt way better than PGAdmin is miles behind IntelliSense that's offered by Visual Studio for C# (when it comes to reliability)

SQL would benefit a lot from being like C#'s LINQ (Query syntax) e.g:

    var result = from s in stringList
                 where s.Contains("Tutorials")
                 select s;

some SQLv2 is something we need

keithnz · 4y ago

try DataGrip, it's got really nice intellisense and autocomplete. I'm not sure how your example from linq is any better than SQL

   select s from stringList where s like '%Tutorials%'

da39a3ee · 4y ago

https://opensource.google/projects/logica

dspillett · 4y ago

> SQL has a steep learning curve.

Overall I don't think it is that steep, though maybe I'm blinded by having worked with various implementations of it for more than two decades. The key sticking point is jumping to thinking in a set based manner to get best results. The rest of the difficult parts are when you need to think about implementation details because the query planners are not perfect (index hints and such) or being aware of limitations (like postgres before the latest major version having optimisation fences around CTEs).

> It exposes almost zero insights into the underlying query execution.

That is pretty much by design. It is intended that you say what you want and let the query planner worry about implementation details. Of course how you tell it what you want involves learning to express those intentions in SQL. It does fall apart a bit when implementation limitations become an issue, at which point you are forced to think about the underlying implementation and how you might prod this more imperative code so that it interprets and process your relational descriptions most efficiently.

> As a result, an increasing number of inefficient queries are being written by ML engineers

That isn't specific to ML. I see a lot of inefficient data interaction from code written by other devs. This seems to be for two reasons:

1. People seem to have taken “make it work, make it work correctly, only then worry about making it work fast” to heart but tend to skip that last part and assume that because all is well with their test sets of data at hundreds or thousands of rows (or sometimes tens and singles), it'll scale just fine to the hundreds of thousands or more that the clients' datasets will eventually contain.

2. People using further abstractions without much care for how they implement their directives (again, in an ideal world they shouldn't have to), resulting in massively overcomplex queries as the framework tries to be clever and helpful and preempt what might be needed, getting everything whether needed or not (effectively `SELECT *`) meaning the query planner can't apply families of its internal tricks for better performance, or getting many rows individually instead of as a set which sometimes means a lot of extra work for each row.

There is a definite “we'll worry about that when it happens” attitude in both cases which is dangerous. While a live system has practically ground to a halt and the client needs their report by EOP or someone will get it in the neck (and be sure: they will pass that on to you!) is not a good time to be optimising data access, or worse finding out the structure just doesn't support efficient generation of the required data. Another common failing is applying what would ideally be UI or BLL concerns (timezone conversions etc) in the SQL statements in a way that blocks index use.
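The last point, about conversions that block index use, can be sketched with SQLite's `EXPLAIN QUERY PLAN` from Python. The `log` table, the fixed two-hour offset, and the helper names are assumptions for the example, and plan wording varies by SQLite version. Both queries return the same rows, but only the one that leaves the indexed column bare can search the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created_at TEXT, msg TEXT)")
conn.execute("CREATE INDEX idx_log_created ON log (created_at)")
conn.executemany("INSERT INTO log (created_at, msg) VALUES (?, ?)", [
    ("2021-05-31 21:59:59", "a"), ("2021-05-31 22:00:00", "b"),
    ("2021-06-01 21:59:59", "c"), ("2021-06-01 22:00:00", "d"),
])

# Converting the column: the index on created_at cannot be used for lookup.
WRAPPED = "SELECT msg FROM log WHERE date(created_at, '+2 hours') = '2021-06-01'"

# Converting the constants instead: the column stays bare and indexable.
BARE = """
    SELECT msg FROM log
    WHERE created_at >= datetime('2021-06-01 00:00:00', '-2 hours')
      AND created_at <  datetime('2021-06-02 00:00:00', '-2 hours')
"""

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def run(sql):
    return sorted(row[0] for row in conn.execute(sql))

wrapped_plan, bare_plan = plan(WRAPPED), plan(BARE)
same_rows = run(WRAPPED) == run(BARE)
print(wrapped_plan, bare_plan, same_rows)
```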

> Should we think about an alternative, at least for ML ETL workloads?

I don't work with ML so that is a little outside my day-to-day expertise, but I'd wager ETL there has the same problem as everywhere: the basics are all well known and very well optimised for already. The rest differ so much between applications that no one abstraction would be optimal for more than a small portion of real world needs.

I'd be wary of a separate team for optimising queries. I suggest a reasonable understanding in the whole dev team with a data expert embedded who is involved in design work and code reviews so issues are caught early and junior devs can be tutored as needed so by the time they are seniors they don't need the data expert except for really gnarly problems or long-term planning.

da39a3ee · 4y ago · 3 in thread

I've done backend web development with a relational DB via an ORM for 10 years. I'm OK at that, but I'm fucking hopeless at SQL. I know that my opinions are thus undermined, but I really wish we could get rid of SQL and replace it with something like logica [1] like today.

SQL's pseudo-natural language syntax is an embarrassment and its lack of composability is even more of an embarrassment.

[1] https://opensource.google/projects/logica

simonw · 4y ago

Have you used CTEs much (aka the WITH statement)?

I find them to be a huge step forwards in terms of adding composability to complex queries.

odipar · 4y ago

Yes I concur: CTEs are closer to the spirit of relational algebra: every step/expression should yield a table/relation.

As data munging is about combining/correlating/sorting/grouping data, why not have a sound (bag) algebra to do that? Such algebra would give us equational reasoning, proofs, etc.

And consequently: students would be learning an algebra which is easier to learn IMO.

da39a3ee · 4y ago

Thanks yes I have learned to use the WITH statement, and I agree it gives more composability. But still, a half way house wouldn't you say?

odipar · 4y ago · 2 in thread

My first encounter with 'SQL' was a course on relational algebra that was taught at my university.

It started out with defining relations as a mathematical construct, and continued with various operators on such relations. Then they continued explaining the various normal forms up to the fifth normal form. I was completely out of my depth, but at least it was good and solid theory that could be learned.

What really messed with my head is they then introduced SQL as a 'practical' implementation of relational algebra. I'm still having nightmares where I try to understand nested HAVING statements that where asked at the exams.

Hey relations don't contain duplicates! But that's OK. We should call (modern) SQL 'BAG ALGEBRA'.

geophile4y ago

Yes, SQL is about bags, not sets.

And anything having to do with aggregation cannot be expressed in relational algebra.
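
A small illustration of the bag point, sketched with Python's built-in sqlite3 module. The `visits` table is invented for the example. A plain projection keeps duplicate rows, and set semantics only appear when DISTINCT is asked for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (city TEXT);   -- hypothetical table with no key
    INSERT INTO visits VALUES ('Oslo'), ('Oslo'), ('Rome');
""")

# A plain projection keeps duplicates: the result is a bag (multiset).
bag = conn.execute("SELECT city FROM visits ORDER BY city").fetchall()

# DISTINCT has to be requested explicitly to get set semantics.
as_set = conn.execute("SELECT DISTINCT city FROM visits ORDER BY city").fetchall()

print(bag)
print(as_set)
```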

melony4y ago

Don't forget loops

hahamrfunnyguy4y ago· 2 in thread

In my experience, inexperienced database developers pick up SQL fairly quickly under the guidance of an experienced mentor.

Spooky234y ago

This.

Usually I see people struggling to formulate questions. They know what they want, but don’t understand how to get there. Left to their own devices, they hack up some nightmare in Excel.

I worked with a summer intern on creating reports and learning SQL. She was a really smart business major who ended up with the wrong work assignment. I was getting 5-7 questions a day from her in June, 1-2 a week in July and by the time I got back from vacation in August, she had basically done about 90% of a project that was going to be hired out and was showing me some features of the database we were using that I didn’t know!

It inspired her to switch majors and she is a fancy data scientist somewhere! Awesome mentor experience.

edoceo4y ago

+1 for mentoring and pairing. Or mentoring via pairing. Getting with an expert, or even just another set of experienced eyes, is a big help.

ttfkam4y ago· 2 in thread

Step 1: "SQL sucks!"

Step 2: Let's make a database engine that doesn't use SQL.

Step 3: "This is hard!"

Step 4: Make SQL access layer.

Wash. Rinse. Repeat.

See: PartiQL

Those who ignore the lessons of SQL are doomed to reimplement them...poorly.

ttfkam4y ago

Corollary:

1. "SQL doesn't scale!"

2. We made this database engine that's "web scale"!

3. "This is hard to use!"

4. Make SQL access layer.

See: Spanner

Those who blame SQL for their performance problems are doomed to repeat them using a proprietary syntax.

int_19h4y ago

This is completely orthogonal to having a saner query language for a relational database. SQL is like C - it does the job, but it has so many legacy warts, it's not all that hard to do better. The only real problem is overcoming how entrenched it is in the industry, which requires any replacement to be substantially better than SQL - so that the gains can justify the logistical costs of switching.

TrackerFF4y ago· 1 in thread

I see they mentioned previous course knowledge - this is something you see in many (programming) classes.

Students that have zero prior knowledge in programming are able to pick up functional programming pretty easily. Students that have studied and used paradigms like OOP seem to have a hard time grokking functional programming, as they see everything through the lens of OOP (and the languages they've used).

btilly4y ago

Functional and OOP techniques do not seem to be best friends.

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom... uses Java to discuss what this can look like in an extreme case.

patrakov4y ago

Direct link to the research paper, instead of the summary: https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 (pdf)

ipaddr4y ago

One of the best ways to learn advanced Oracle specifically is through the Ask TOM Q&A. The questions are difficult and the answers teach more than any course.

https://asktom.oracle.com/pls/apex/f?p=100:1000::::::

agumonkey4y ago

To me the most difficult part is learning sql before knowing what can be done with a computer, both on the complexity and the language design part.

Before learning interpreters/compilers/Prolog, I'd spend a lot of time trying to figure out naming/namespaces in queries. After doing some PLT it all becomes very obvious: you can focus on the operators, and since you know how far programming can go, you see faster how nested queries make sense, what aggregate functions mean, etc.

zxcvbn40384y ago

I can’t count how many times I’ve found developers doing “select *” and doing all of the logic and sorting themselves. That goes horribly wrong when you get to production and the database is larger than RAM. Then they just stare at you and blink in total disbelief that such a thing is possible.
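
For anyone who hasn't seen the difference side by side, a rough sketch with Python's built-in sqlite3 module. The `events` table and its data are made up. Both versions return the same two rows, but the second lets the database filter, sort and limit before anything is sent to the application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO events (kind, score) VALUES (?, ?)",
    [("click", 3), ("view", 9), ("click", 7), ("click", 1)],
)

# Anti-pattern: pull every row and column, then filter and sort in the app.
rows = conn.execute("SELECT * FROM events").fetchall()
clicks = sorted((r for r in rows if r[1] == "click"), key=lambda r: -r[2])[:2]
in_app = [(r[0], r[2]) for r in clicks]

# Same result, but only the two wanted rows ever leave the database.
in_db = conn.execute(
    "SELECT id, score FROM events WHERE kind = ? ORDER BY score DESC LIMIT 2",
    ("click",),
).fetchall()

print(in_app, in_db)
```

With four rows it makes no difference. With a table larger than memory, the first version has to load all of it just to throw most of it away.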

chmod7754y ago

> This was not always the case for our participants, who wrote queries with synonyms of the correct words, leading to queries that will not be executed.

This shows that those participants lack very basic foundational knowledge. It doesn't surprise me, because in my experience all programming courses that taught SQL early have been terrible.

Nobody who already has a basic understanding of computer science would make this mistake.

On the flip side there's really no point in devoting much time to teaching SQL later, because once you have a good understanding of data structures and algorithms, it is rather easy to make educated guesses about what is happening behind the scenes in a database, and you would have no trouble teaching yourself SQL if necessary at some point.

Not to mention that teaching databases before what makes up their implementation is teaching software development in precisely the opposite way it is practiced: the composition of lower level concepts into higher level abstractions.

Last but not least, when you're teaching future software engineers, at the end you don't want them to just say "I can use this", you want them to say "I could build this". Teaching SQL early smells like surrender.

tanin4y ago

I've built a desktop app where you can load a CSV and start writing SQL on it. A lower barrier to try it out might be helpful in learning SQL.

See: https://superintendent.app

gnat4y ago

https://dl.acm.org/doi/pdf/10.1145/3446871.3469759 has the actual paper.

monkeydust4y ago

Have been using OpenAI Codex for a week and it's shockingly good at SQL with well-defined prompts.

jimjimjim4y ago

gone are the days of:

- SELECT name, address

- FROM user, location

- WHERE user.locationid = location.id
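
For comparison, a small sketch of that comma-style join next to the explicit JOIN form, run through Python's built-in sqlite3 module. The column layout (`name` on `user`, `address` on `location`) is an assumption, since the snippet above doesn't say which table owns which column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE user (name TEXT, locationid INTEGER);   -- assumed schema
    INSERT INTO location VALUES (1, '1 Main St'), (2, '2 Side Rd');
    INSERT INTO user VALUES ('ann', 2), ('bob', 1);
""")

# Old style: list the tables, put the join condition in WHERE.
old_style = conn.execute("""
    SELECT name, address
    FROM user, location
    WHERE user.locationid = location.id
    ORDER BY name
""").fetchall()

# Explicit style: the join condition lives in the ON clause.
explicit = conn.execute("""
    SELECT name, address
    FROM user
    INNER JOIN location ON user.locationid = location.id
    ORDER BY name
""").fetchall()

print(old_style)
print(explicit)
```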


moonchrome4y ago

Do you have some links on using CTEs to replace nested queries, and how that helps?

I've used SQL enough to have to write nested queries, but haven't dived further than that.

1 more reply

prpl4y ago

1 more reply

odipar4y ago

Yep, CTEs are a huge boon to structure your SQL - use them where you can.

KronisLV4y ago

I've always found it hard to articulate the problems that i have with SQL and with the "WITH" CTEs, but let me try anyways.

For starters, i can never actually test parts of those queries without rewriting the query up to the part that i want to test, for example:

  WITH 
    query_one AS (SELECT ...),
    query_two AS (SELECT ...),
    query_three AS (SELECT ...)
    SELECT ... /* main query */

So i'd like to do the following:

  PACKAGE my_snippets BODY IS
    SNIPPET query_one
      SELECT ... /* probably 500 lines long but often used snippet */
    END query_one;
  END my_snippets;
  
  /* and then, somewhere in code */
  WITH
    query_one AS my_snippets.query_one,
    query_two AS (SELECT ...),
    query_three AS (SELECT ...)
    SELECT ... /* main query */
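
As far as I know, standard SQL has no `SNIPPET` construct like the one wished for above, but a view gets part of the way there: it gives a query a name that can be tested on its own and then referenced from a WITH clause. A rough sketch with Python's built-in sqlite3 module; the `sales` table and the `region_totals` view are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);     -- hypothetical data
    INSERT INTO sales VALUES ('north', 10), ('north', 5), ('south', 7);

    -- The reusable "snippet": defined once, queryable on its own.
    CREATE VIEW region_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# Test the snippet in isolation...
snippet_rows = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()

# ...then reuse it by name inside a larger WITH query.
top = conn.execute("""
    WITH query_one AS (SELECT region, total FROM region_totals),
         query_two AS (SELECT MAX(total) AS best FROM query_one)
    SELECT region FROM query_one, query_two WHERE total = best
""").fetchall()

print(snippet_rows, top)
```

This is a workaround, not the package-style reuse described above: views live in the schema, so they need create permissions and some naming discipline.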

2 more replies

branko_d4y ago

Not so sure about aggregation, but what definitely takes forever is performance.

fomine34y ago

I think the RDB is one of the leakiest abstractions. Despite that, it's still very useful.

myspy4y ago

Something else that is hard is writing performant queries, for example when using statements with subqueries or IN syntax.

And I always forget which join does what.

AdrianB14y ago

1 more reply

MeinBlutIstBlau4y ago

same with the joins. on paper it makes sense, in practice it does not. if it's more than a "select * from dbo.whatever where column abc = 'thing'" i have to refer back to notes and play with it.

1 more reply

btilly4y ago

Meh, summarizing, grouping, etc aren't that hard.

However WINDOW queries definitely have a learning curve. Not least because useful examples almost always require you to use a nested query.
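
A typical case of that, sketched with Python's built-in sqlite3 module (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The `scores` table is made up. A window function can't be referenced in WHERE at the same query level, so the ranking goes in an inner query and the filter in the outer one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (team TEXT, player TEXT, points INTEGER);  -- hypothetical
    INSERT INTO scores VALUES
        ('red', 'ann', 12), ('red', 'bob', 20),
        ('blue', 'cy', 8), ('blue', 'dee', 15);
""")

# Rank players within each team in the inner query, then keep rank 1 outside.
top_per_team = conn.execute("""
    SELECT team, player, points
    FROM (
        SELECT team, player, points,
               ROW_NUMBER() OVER (PARTITION BY team ORDER BY points DESC) AS rn
        FROM scores
    ) AS ranked
    WHERE rn = 1
    ORDER BY team
""").fetchall()

print(top_per_team)
```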

magicalhippo4y ago

I learned SQL on a need to know basis. For me, recursive queries were the ones that needed the most time to click.

Another one that caught me by surprise was NULL vs unknown[1]. That bit me in a couple of queries.

[1]: https://learnsql.com/blog/understanding-use-null-sql/
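
A quick way to see how this bites, sketched with Python's built-in sqlite3 module; the table `t` is made up. A comparison against NULL evaluates to unknown, and WHERE only keeps rows where the predicate is true, so the NULL row is matched by neither `flag = 1` nor `flag <> 1`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, flag INTEGER);   -- hypothetical table
    INSERT INTO t VALUES (1, 1), (2, 0), (3, NULL);
""")

# NULL = NULL evaluates to NULL (unknown), not true; sqlite3 returns it as None.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The row with a NULL flag matches neither predicate, because both are unknown.
is_one = conn.execute("SELECT id FROM t WHERE flag = 1").fetchall()
not_one = conn.execute("SELECT id FROM t WHERE flag <> 1").fetchall()

# IS NULL is the test that actually finds it.
missing = conn.execute("SELECT id FROM t WHERE flag IS NULL").fetchall()

print(null_eq, is_one, not_one, missing)
```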

2 more replies

SilverRed4y ago

AdrianB14y ago

MeinBlutIstBlau4y ago

If all you do is SQL, anybody can learn it quick. If you do full stack, you're only gonna care about what gives you the data you need at that time.

1 more reply

5424584y ago

weaksauce4y ago

A basic approach (probably what they are going for in a basics study) would be something like this off the top of my head

    select c.cid, c2.cid
    from customer as c
    inner join customer as c2 on c.street = c2.street
    where c.city <> c2.city

though that has mirrored duplicates: if (1, 5) is in the output, (5, 1) would be too. So I'm not sure if that's "allowed"
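
One common way to drop the mirrored pairs is to also require an ordering on the ids. A sketch with Python's built-in sqlite3 module; the sample rows are invented, and the task (same street, different city) is only inferred from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, street TEXT, city TEXT);
    INSERT INTO customer VALUES
        (1, 'High St', 'Leeds'), (5, 'High St', 'York'), (7, 'Low Rd', 'York');
""")

pairs = conn.execute("""
    SELECT c.cid, c2.cid
    FROM customer AS c
    INNER JOIN customer AS c2 ON c.street = c2.street
    WHERE c.city <> c2.city
      AND c.cid < c2.cid      -- keep (1, 5), drop the mirrored (5, 1)
    ORDER BY c.cid, c2.cid
""").fetchall()

print(pairs)
```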

1 more reply

pge4y ago

and worst of all, anything that involves vendor-specific keywords…

dehrmann4y ago· 7 in thread

> For example, some students wrote queries containing ≠ instead of != or <>.

Was this done on paper? Typing ≠ takes some doing.

nightpool4y ago

turnerc4y ago

From the study:

> Participants wrote their notes and answers on paper, which they showed in front of the webcam.

Yes it seems they did

dehrmann4y ago

That's pretty flawed methodology since you'd want to know what problems people encounter in the real world and how quickly they solve them.

jve4y ago

Moreover, IDE would have helped those students for sure: SQL keywords are highlighted in different color + autocomplete.

pcblues4y ago

Easy if you have an APL keyboard :) (Hint, it's on the 8) https://www.dyalog.com/uploads/images/Business/products/us_r...

Wevah4y ago

Option-= on a Mac with the US layout, fwiw.

JadeNB4y ago

Man, Mac's keyboard shortcuts for special characters irritate me so much.

(Boy, I hope I'm wrong and someone will come along and explain my stupidity to me.)

1 more reply

DaiPlusPlus4y ago· 7 in thread

-------

Rather than improve how SQL is taught (which seems to be the paper's objective), why not improve SQL so it isn't as horrible to try to learn in the first place?

(...I could talk for hours about everything wrong with SQL.)

de6u99er4y ago

The trick of becoming really good at SQL (applies to all areas of IT) is having a certain ambition to produce high performance beautiful (readable) code.

AdrianB14y ago

minism4y ago

High performance and readable certainly. Not sure why beautiful would be something to strive for though

1 more reply

jimbob454y ago

DaiPlusPlus4y ago

btilly4y ago

The Python 2 to Python 3 migration was such a disaster that Python 3 is now used by both more programmers, and a higher proportion of programmers, than Python 2 ever managed.

This is not to minimize the pain of switching. But it does not seem to have limited the success of the language.

darksaints4y ago

pcblues4y ago· 6 in thread

k__4y ago

I don't know if that's enough.

Understanding the difference between declarative and imperative programming is rather hard with all the abstractions we have today.

btilly4y ago

> Understanding the difference between declarative and imperative programming is rather hard with all the abstractions we have today.

The distinction is whether you are telling the computer *what* result you want, or telling it *what* steps to take.
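
One way to see the difference on a toy case, sketched with Python's built-in sqlite3 module; the `items` table is made up. The loop spells out the steps, while the SQL version only states the result wanted and leaves the steps to the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("pen", 2), ("book", 12), ("lamp", 30)],
)

# Imperative: spell out the steps (scan every row, test it, accumulate).
imperative_total = 0
for name, price in conn.execute("SELECT name, price FROM items"):
    if price > 10:
        imperative_total += price

# Declarative: state the result wanted; the engine chooses the steps.
declarative_total = conn.execute(
    "SELECT SUM(price) FROM items WHERE price > 10"
).fetchone()[0]

print(imperative_total, declarative_total)
```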

2 more replies

pcblues4y ago

1 more reply

dboreham4y ago

I've made significant $$ over the years being a person who can understand the imperative code the database server will likely execute to make that declarative goodness happen.

tanin4y ago

This is actually quite important.

Some tasks that are seemingly simple in a normal programming language can sometimes be impossible to achieve in SQL (e.g. dynamically generated columns...)

SilverRed4y ago

That's where you create a massive beast of code which dynamically generates SQL from fragments with string interpolation!

1 more reply

listenallyall4y ago· 4 in thread

SQL is hard not because of terrible syntax, but because the underlying logic of defining exactly what you want, is difficult.

Groxx4y ago

Figuring out what you want, and understanding your data well enough to know what's viable and what's nonsense, is infinitely harder. And it changes every time.

nojito4y ago

Pivoting data is still extremely painful using raw SQL
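
For reference, the usual portable workaround is conditional aggregation, sketched here with Python's built-in sqlite3 module on a made-up `sales` table. It shows where the pain comes from: there is one CASE expression per output column, so the column list has to be known when the query is written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER);  -- hypothetical
    INSERT INTO sales VALUES
        ('north', 'Q1', 10), ('north', 'Q2', 20),
        ('south', 'Q1', 5),  ('south', 'Q2', 8);
""")

# Turn quarter values into columns: one CASE expression per pivoted column.
pivoted = conn.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print(pivoted)
```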

listenallyall4y ago

Similarly, you can't use SQL to apply color formatting to any of the result rows or values -- but that was never a goal SQL intended to achieve.

2 more replies

nmz4y ago

and also because you have to pay to use Q and kdb+

tracyhenry4y ago· 4 in thread

Should we think about an alternative, at least for ML ETL workloads?

tester7564y ago

>SQL has a steep learning curve.

Does it? I think SQL just sucks and its tooling sucks too

Even SQL Management Studio which felt way better than PGAdmin is miles behind IntelliSense that's offered by Visual Studio for C# (when it comes to reliability)

SQL would benefit a lot from being like C#'s LINQ (Query syntax) e.g:

    var result = from s in stringList
                 where s.Contains("Tutorials")
                 select s;

some SQLv2 is something we need

keithnz4y ago

try DataGrip, it's got really nice intellisense and autocomplete. I'm not sure how your example from linq is any better than SQL

   select s from stringList where s like '%Tutorials%'

2 more replies

da39a3ee4y ago

https://opensource.google/projects/logica

dspillett4y ago

> SQL has a steep learning curve.

> It expose almost zero insights into the underlying query execution.

> As a result, increasingly amount of inefficient queries are being written by ML engineers

That isn't specific to ML. I see a lot of inefficient data interaction from code written by other devs. This seems to be for two reasons:

> Should we think about an alternative, at least for ML ETL workloads?
