Plus there is the whole, "My report wants to use the same logic as this web form", oh it's in Java? Guess we'll just pull the data out, process it, populate a temp table, report it, clear those values....
If you have business data in SQL, just write the logic in SQL.
That said, the single biggest performance drain I've seen is people not using the database, and instead pulling data out to process in code. I don't think I've ever seen a case where it was faster to process data outside of the database - though there has also been plenty of times when the pertinent SQL to had to be written "smarter." Think in sets, not iteratively! There's a terrifying amount of loops in a lot of legacy SQL statements.
I think the problem is maintenance instead of performance. When you really want performance (in data reads) youll be precomputing and storing the values. Then your report can filter and be indexed properly.
I dont think that any tech solution will give you fast reads and fast writes you need to pick one.
Your server probably has alot of down time ... use it , make a cron job and make a seperate reporting table ... reporting on raw data does not scale , window functions do not scale
Using window functions is equivalent to using a better cpu. Maybe the environment you work in is faster but you have not factored the logic to make the best use of the resources.
You can take a crap approach in java and port it to sql. But the approach is crap and you should create a seperate reporting process or at least use hadoop.
For Postgres, for example, you can write stored procedures using any of these: https://www.postgresql.org/docs/9.1/static/external-pl.html
One good reason to avoid it and stored procedures is for parallelism. But you need a lot of data to make that worthwhile. Especially in these days of 3+Tb RAM machines.
Also, the site is blocked for me currently...
I guess I need to rethink my design a little.
I think it's due to lack of reading the question properly and impatience.
But it also might be due to how programmers sometimes think/work. Often, when I am trying to solve a problem I like writing something (anything) down in broad strokes first so that I can then go back in and 'fix' each smaller part of the larger problem.
Thank you for making this.
Q6 has a flaw. You request "Order by: ranking, name" but the accepted solution sorts by rank only (Alfie should come before Ashes)
Nice quiz, by the way, and I like the topic. In my experience, too few devs / dbas make good use of window functions.
But I think the design was good. I also noticed the option to look at the schema, if needed.
Their answer:
select age, sum(weight) as total_weight
from cats
group by age
having sum(weight) > 12;
Which is equivalent to: select age, sum(weight) as total_weight
from cats
group by age
having total_weight > 12;Also, optimisers aren't stupid, the actual sum() function will not be recalculated.
Answer from core PostgreSQL dev, Tom Lane
There's also this: https://bugs.mysql.com/bug.php?id=48652 - still open after all these years (and yes I know there are much longer open bugs).
However, depending on your application, you may be enforcing these constraints elsewhere and not need them at the engine level, even if it's a nice-to-have.
[1] http://download.nust.na/pub6/mysql/doc/refman/5.4/en/storage...
(And yes, the workflow does include actions that involve multiple tables in one go.)
https://dev.mysql.com/doc/refman/8.0/en/window-functions.htm...
However, you really really need to be careful about SQL injection. I can see that you tried to lock it down as much as possible, as far as I can tell the account the queries are running under only has SELECT permission (no update/delete/etc). However I was still able to get some data you probably don't want me to have...
This query returns a list of all tables in your database:
SELECT c.relname||'' FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
And this query returns a list of users and their privileges:
SELECT usename||'', usecreatedb||'', usesuper||'', usecatupd||'' FROM pg_user
I guess your name is Andy based on your HN username, but who is Rusty? :) And that's just what I've been able to get in 5 minutes of trying, let alone a determined attacker.
1. Q0's success messages says "Group by and Having are commonly used aggregate functions". This terminology is incorrect. SUM() is an aggregate function. GROUP BY and HAVING are clauses. 2. Without an ORDER BY clause, there's no guarantee of the order of the result, but the quiz appears to require the output rows to be in the same (in the case of Q0, undefined) order as the expected rows. While this is unlikely to spontaneously change, it's like depending on the order of an unordered hash map. As long as you're teaching people things, it's good to instill the habit of always ordering their results.
Q7 includes output:
Misty 5.7 0.2
Oscar 6.1 0.4
Smokey 6.1 0.0
How would one show 0.4 for both Oscar and Smokey?The site is down right now, so I had to test using my own db instance. Column and table names might not match.
WITH
distinct_weights AS (
SELECT DISTINCT weight
FROM cats),
weights AS (
SELECT
weight,
lag(weight)
OVER (
ORDER BY weight ) AS lag
FROM distinct_weights)
SELECT
cats.name,
cats.weight,
cats.weight - weights.lag
FROM cats, weights
WHERE cats.weight = weights.weight
ORDER BY weight;
yields Any 5.5 null
Misty 5.7 0.2
Oscar 6.1 0.4
Smokey 6.1 0.4 SELECT c.name, c.weight, c.weight - lw.lower_weight AS weight_to_lose
FROM cats c
JOIN (SELECT weight, lag(weight, 1, weight) over (order by weight)
lower_weight FROM cats
GROUP BY weight
ORDER BY weight) lw ON c.weight = lw.weight
ORDER BY weight> select name, sum(weight) OVER (PARTITION BY name) as running_total_weight from cats order by name
Anyway, overall very nice!
and Joe Celko https://www.red-gate.com/simple-talk/sql/t-sql-programming/w...
One question / bug report? Q7 rejects this:
select name, weight, weight - (min(weight) over (order by weight asc rows between 1 preceding and current row) ) as weight_to_lose from cats
is that much worse than lag() or is it just non-idiomatic?
(edit: and Q10 asks about "window", but it was quite hard to google. if you could provide another one of those links to the relevant docs, that could help a bunch :D)
Yep I need to fix up Q10 a bit.
And here is a small suggestion, you might want to change the name of output column "unique" in Q3 because.
*expression* as unique --does work and is recommended answer
*expression* unique --fail because of use of a restricted word
*expression* "unique" --work because of double quote
Using a restricted word that need quotation in some context is globally discouraged and in this context can keep someone stuck independently of the problem at stack.PS: also apply to "position" in Q4 maybe thats intentional?
* add questions about range mode
* have questions which discuss default behavior regarding ordering and partitioning
* Have questions which rely on things like RANGE 3 PRECEDING or other constructsSidenote: either HN killed your backend or I did with a recursive CTE. If so: sorry :)
Its a brand new project so I may need to do some hardening.
PostgreSQL 9.3.20 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bi
I wonder why so old version is being used. I will be not supported by the community pretty soon.