If you give all your ID fields unique names, e.g. by calling your field "reservation_id" instead of "id", even in the reservation table, you can do stuff like:
SELECT * FROM reservation JOIN guest USING (reservation_id);
By doing "USING (reservation_id)" instead of "ON reservation.id = guest.reservation_id", the field will be automatically deduplicated, so you don't have to qualify it elsewhere in the query, and "SELECT *" will return only one copy.
But I can see arguments in both directions. I think the big thing is consistency. If you consistently use <table name>_id as the PK and your tooling doesn't fight it, then cool.
I do feel like this benefit breaks down rather quickly though. For example, if you wanted to track who created a guest record, you probably wouldn't name the column user_id as it's ambiguous. So you might use creator_id to signify it points to the user who created the guest, not to the user invited to the reservation – thus negating your ability to join to a user table via USING.
That's not to say it's a bad tip, just because there are cases where it doesn't fit. Definitely worth knowing. Thanks for sharing.
All just to shorten join clauses with a natural join or USING? You already get shortened joins - less so, but still a gain - by using table aliases so as not to repeat the long table names in each query multiple times. IMO it also takes an extra moment to parse the ambiguous "id" columns to resolve them mentally to which id is being referred to.
Apparently, it made joins more clear, I seriously wonder how often people fucked that up for them to think it was a good idea to prefix every fucking column with its table name.
It's like when I see unit tests for setters and think, gee, setters seem pretty straightforward to me, how often to people fuck them up?
SELECT * FROM long_table_name l LEFT JOIN another_table_here a ON l.id = a.rel_id
Also, most tables end up being used to populate objects. It's simpler to reference an object with `Reservation.id` than `Reservation.reservation_id`Regardless of the design choices made it's more important to be consistent in naming conventions and to document why a particular design/schema was chosen.
Nobody ever uses the class "Reservation" with a property named "reservation_id" in their code. They will name their column "reservation_id" in the database, but then work to undo that at the application level by telling the ORM to map "reservation_id" to an object property named "id". It creates the situation where developers fight against the database schema in their code, and adds an unnecessary level of complexity when comparing code to database. "Why is it 'id' in code but 'reservation_id' in the database?".
A common counterargument from pure DBAs, who are not themselves writing code that integrates with the schemas they create, is that databases should be approached as a completely standalone component that should not be designed with applications' structures in mind. That applications are free to bastardize the database schema in any way they please, so it really doesn't matter anyway. I personally prefer to see it as a reality that applications will be the primary consumer of the database resource, and deserve to have their intended integrations analyzed. Once the database is finalized, it is the application developers who will be studying the schema on a weekly basis. Anything to ease that constant inspection process, the better.
Of course, this is all personal preference. I have learned over the years to finally stop arguing so much over these little details. In real life with coworkers, anyway. It's still fun to battle for your point of view with forum strangers! :D
The database already knows that guest.reservation_id is a foreign key referencing reservation.id. Why should you have to repeat yourself?
Problem is that it becomes problematic as soon as you keep such queries around (in stored procedures, source code, or batch files), and then modify your schema by adding or removing a foreign key.
Added:
Also SQL language itself has nothing to do with keys, relationships and other details of database design. It's different level of abstraction.
Various query builders/SQL generators do what you want.
This is almost exactly what "NATURAL JOIN" does. It is like "JOIN USING" with a pre-filled column list.
However, this isn't decided based on the schema, but based on equal column names. So you need to apply the dicipline as noted by the parent comment.
See also the sibling discussion thread at: https://news.ycombinator.com/item?id=11450187
You still have in outer joins. Also, SELECT * is a bad practice (with some exceptions) when used in code, suitable only for ad-hoc queries.
SELECT
table1.*,
table2.field
FROM table1
JOIN table2
ON table2.rel_id = table1.idSELECT * FROM reservation NATURAL INNER JOIN guest;
Imagine two two-column tables, A with id and name, and B with id and value. (Doesn't matter what the columns are; this is just for concreteness.)
select * from A JOIN B using (id) will give you a three column result set, id, name, value.
Now, suppose you want to do a schema modification to add "last _login" to A. Suddenly, your query is returning a 4 column result set, id, name, last_login, value and code using ordinal positions is now reading last_login into the "value" slot.
Maybe it's a corner-case for our use case and zero-downtime release processes, but it's something to be aware of anyway.
left.id = 'Something' COLLATE case insensitive
right.id = 'SOMETHING' COLLATE case insensitive
SELECT id FROM left JOIN right USING (id);
Will the value be 'Something' or 'SOMETHING'? Better avoid the surprise and be specific...This way, my db fetch is always on id and not a longer name ( reservation_id), which on the end of an webapp, makes me save some reading / writing :)
SELECT * FROM reservation r JOIN guest USING (r.id);As someone in the healthcare space, looking at some of these models gives me a better idea of how various aspects of the healthcare industry work, and the things they interact with. Ex) http://www.databaseanswers.org/data_models/patient_data_ware...
https://www.hl7.org/implement/standards/product_brief.cfm?pr...
the ones in these books are better:
https://dba.stackexchange.com/questions/12991/ready-to-use-d...
http://dbpatterns.com/documents/5091f74289cbad03bc958bc0/
It has the "let's put a UUID on every row" disease common to designers who have never really learned anything other than object oriented design. Price is a string (I guess so you can put "market price" on the fish?), and there's a currency symbol on every "delivery". The whole thing just makes no sense. 18 stars. Unbelievable.
Example: t0: 7458e3a9-716b-4352-b2e4-b5b67d0c089b t1: 4d8d753c-1777-439d-8725-b093b1bd8430
Using this as a PK in any relational database will mean the rows are stored in the clustered index order, which causes extreme fragmentation because the db engine constantly has to find "holes" in the data pages to insert more recent records instead of adding those at the bottom of the table as would happen with any continuously increasing key.
Others embrace surrogate keys because: They make joins between tables easier (usually one field rather than a compound key where one might accidentally not include all the join predicates). They save space since an integer is almost always smaller than a compound key.
So if you are in the camp that favors surrogate keys, then consider these points:
Some quick downsides for UUIDs: If your table has a clustered index, UUIDs are generally not created in a regular order, and so you'll be constantly inserting into a part of your index rather than the end. Some databases allow sequential UUID generation to help mitigate this. UUIDs are usually 16 bytes, but an int is usually 1-8 bytes. Also keep in mind you pay this price multiple times since the primary key (generally) becomes the foreign key in other tables.
Some upsides for UUIDs: If you have a process that has no master, being able to tie disparate things together with a shared UUID is very useful sometimes.
Often developers use UUIDs/GUIDs because: Their app wants to perform a 'Create' operation and they want to generate the key in advance.
This can usually be mitigated by creating a routine that generates a range of reserved surrogate integer keys, and gives a starting value to the application layer. Then the application layer can use those values with knowledge they won't already exist. For instance, the app layer say, "I'm going to make 100 customers", and the response comes back "10232". Meaning that 10132-10232 is reserved for that process to use for customer primary keys. This allows larger (chunkier) requests which can be interrelated without constantly making requests to the Database layer. One could extent this to be a true cache for entity reserved numbers that gets dolled out as needed by the app layer.
Some popular ways to do PK: natural key, sequentially generated key, and UUID. Personally I would prefer natural key if I can find an immutable natural key for the table. However, natural keys are hard to find. The food example in this case doesn't have natural key. Also if the natural key requires a compound key, it's just not worth the pain.
Sequential auto generated key is good when you need to hand it off to users, like order id or ticket number. It's short and simple and it's auto generated. The downsize is when you migrate databases, you need to seed the new database carefully or it would start from beginning again. Also in record creation, you need to do extra read to get back the newly generated key.
UUID is mostly worry free. It can be generated anywhere and doesn't need to be in the database. For a setup with distributed databases, I would use UUID just to have global uniqueness. For an offline app, I would use UUID to create the data records locally and later sync them with the main database. UUID is good when it's used internally and not exposed to the users.
Using random UUIDs as an invoice number wouldn't cut it, you'd have to have a sequential ID in some form. You could of course have both but that's adding unnecessary complexity.
Another example, if you are designing a website and lets say give each blog post a UUID, a URL of:
website.com/blog/my-post-34
Looks better than
website.com/blog/my-post-d173affb-6bb8-4435-bef1-4e29409dff4c
Unique sequential Id's also give some clues about the relative age of rows (eg ID 100 is older than ID 50). This doesn't come up often but can sometimes be useful.
Regardless, I always find using a unique int ID for pretty much every record makes your job 10x easier and I fail to see why you would ever want to do anything different. If you want to 'hide' the ID in special cases (eg order numbers) better to generate a unique random int ID.
The only price is on the 'delivery' table, which seems to imply that the price is only set when someone orders something. I guess this is a delivery-only place? But because the 'delivery' table can only have one food_uuid, you'd have to put in for a bunch of deliveries to order more than one thing.
Maybe the 'price' on the menu is in that 'Additional Info' table?
SELECT A.NAME, A.DESCRIPTION, B.VALUE FROM FOOD A LEFT OUTER JOIN "Additional Info" B ON (A.UUID = B.UUID AND B.TYPE = 'Price');
There's so much wrong here...http://dbpatterns.com/documents/5707794a1514b4252236dcbe/
better, but not finished...
Like any open advice community like this, its important there are checks in place to not perpetuate bad ideas...
http://www.wiley.com/WileyCDA/WileyTitle/productCd-047138023...
http://www.wiley.com/WileyCDA/WileyTitle/productCd-047135348...
By far one of the most useful things a resource can do.
It doesn't just broaden your understanding of a domain, but it can teach you how to think about design problems.
I can't imagine using it for anything real but it was fun to play with.
If the creator is reading this, here's a bug report: If the user hasn't hit "Save" when they export the schema, they get an empty text file and they'll think the app is broken. It should either auto-save or prompt the user "You have unsaved content, do you want to save before exporting?"
UPDATE: One more bug report, the export doesn't escape the columns names at all. It just naively adds double quotes around them. This breaks with double quotes in field names (yeah yeah which is stupid but still..):
CREATE TABLE "foo" (
"id" int,
"baz" varchar(255),
"bar" varchar(255),
""test"" varchar(255)
);But the actual modelling functionality itself is really lacking. The lines joining the entities don't even indicate cardinality (does a ninja have many ninja_nelts? Or is it the other way around?). That's pretty much 101 for any visual database modelling.
Also the lines overlap, meaning you can't see whether ninja_weapon is related to ninja, or to ninja_belt.
I can answer these questions myself by perusing the foreign key columns - but that makes the lines redundant. Since several decades ago, Chen, Bachman, Grady Booch and a bunch of others have given ways to denote cardinality etc., and these are widely used. This tool should use one.
(apologies in advance if I have this wrong and the tool does support this but the diagram authors elected not to use them).
How do I browse these models? Is there a taxonomy of organization? How many models are in the system? Are there models relevant to general domains of interest to me? None of this is made apparent.
There's something very therapeutic about organizing data using a system of rules, and the relational data model is a powerful one.
It hurts my pragmatism to solve a distributed systems problem before we even had a distributed system! Don't be a sheep and use UUIDs, rather than a simple integer primary key or composite key that is natural to the table. People say it's just an id, who cares? But I say people making these kinds of decisions are probably making other poor choices backed by something they read and don't understand.
Here's a great article on locality and the need for uniqueness and why UUIDs should only be introduced when needed and to solve very specific distributed systems problems. The reality is almost all applications will never need to have multiple concurrent writers.
The only time I use UUIDs is when I need to pass that id out, but I want to hidr how many rows there are or the rate that they are being created. However, I usually use COMBs to prevent fragmentation.
Most of the patterns I've seen are obvious simple things but they are missing a lot of content like : - What was the design's intended properties? - Expected volumes, access patterns? - Good places for indices? Maybe the comments could be of some help but really haven't seen much. I've actually found more interesting content in the HN comments than on the site itself.
This should be called something like database designs.
If you apply some recipe to structure code, that's a "design pattern"; if we apply some recipe for structuring data, it's a "data(base) design"?
But code is data. But sometimes it's not clear whether something is code or data, or whether it's closer to being code than to data.
Design patterns incorporate data. The Observer pattern requires a list of observers that are notified; that is a data structure which we can have in a database: an observer table joined to an event source by event source ID.
If I make such a database (say for a large number of users to set up persistent notifications about some interesting interesting things), am I doing "database design" or implementing the Observer pattern? :)
- it could allow sql exports for various platforms - it could feature an API so people can write their own framework drivers (for example creating migrations, importing existing schemas etc)
Lets hope that it gains users, so that the voting system (the star) can become more useful to filter out the garbage.
Edit: please remove the login wall to see the starred items, it raises the entry barrier quite a bit. Do you really want new visitors to see low rated schemas as the first thing?
Most egregious example for me is probably the prevalence of a lot of "type" tables when a simple enum column would do. Or maybe the sheer number of UUIDs that are being thrown around.
I even saw a circular ID chain in one. Would be fun setting up foreign key constraints for that!
How do you see the forks?
Simple things like dialogs getting stacked on top of each other, using `prompt("...")` sometimes and modals other times, foreign key relationship arrows not indicating cardinality, etc --
Looks good but please don't consider it "finished" yet.
> jnichols created new pattern
> Penis
> 24 seconds ago
Ok... maybe they need to start with some patterns about filtering spam and noise.