How I Write SQL, Part 1: Naming Conventions (2014)

(launchbylunch.com)

164 points | sehrope | 8y ago | 78 comments

meritt 8y ago

I personally prefer person_id to be the primary key name (instead of id) in both the person table and any table which has it as a foreign key. One reason is for join syntax:

    select * from person join team_member using (person_id)

The other reason is that person_id now unambiguously refers to the same field regardless of whether we're looking at the PK or an FK. It's always person_id.
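For anyone who wants to try the USING form, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the team_name column, and the sample rows are made up for illustration; only the person_id convention and the query shape come from the comment above.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE team_member (
        team_member_id INTEGER PRIMARY KEY,
        person_id      INTEGER NOT NULL REFERENCES person (person_id),
        team_name      TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO team_member VALUES (10, 1, 'compilers'), (11, 2, 'compilers');
""")

# Because the key has the same name on both sides, USING works directly.
cur = conn.execute(
    "SELECT * FROM person JOIN team_member USING (person_id) ORDER BY person_id"
)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)
print(rows)
```

In SQLite the shared column appears only once in the `SELECT *` output of a USING join, which is part of why the convention reads cleanly. Other engines may differ in details.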

rodelrod 8y ago

I'm with you but in my experience it's a lost fight. Most of the projects I come across these days follow the id convention.

The advantages of using person_id are even more obvious in multiple joins, such as a star schema where you can using(person_id) all the things, reducing both the typing and the cognitive load.

I suspect that if this convention were more pervasive, programmers would be a little bit less afraid of diving into SQL.

Well, at least we're settling on some convention, so it's not all bad.

EDIT: typo

jermaustin1 8y ago

I use T-SQL, and this is actually why I always join with the table name:

    SELECT * FROM Person JOIN TeamMember on PersonId = Person.Id

mtone 8y ago

I'd bring it a little further and would write:

    SELECT * FROM Person as P INNER JOIN TeamMember as TM on TM.PersonId = P.Id

I have:

- Aliased each table and prefixed every field name with its table alias in my join conditions.

- Made the JOIN type explicit.

The above:

- Reduces mistakes due to ambiguities that tend to generate unwanted duplicate rows in SQL.

- Increases the likelihood of getting an error at parse time, instead of run-time or analysis-time, thanks to added scoping.

- Works in any schema, no matter what naming conventions are followed.

- Keeps working as the query becomes more complex with multiple table aliases or self-joins, and similar field names appearing in the set.

- Better expresses intent. Sure JOIN defaults to INNER JOIN, but writing "INNER JOIN" shows that you genuinely expect any row not matching your condition to be removed from the result set.
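A rough sketch of two of the points above, using Python's sqlite3 module. The schema (including the manager_id column) is hypothetical, and the exact error text is SQLite's; other engines word it differently.

```python
import sqlite3

# Hypothetical schema where every table has a column named "id".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    CREATE TABLE team_member (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1, 'Ada', NULL), (2, 'Grace', 1);
    INSERT INTO team_member VALUES (10, 2);
""")

# An unqualified "id" exists in both tables, so the statement is rejected
# when it is prepared, before any row is read.
try:
    conn.execute("SELECT id FROM person JOIN team_member ON person_id = person.id")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# With aliases on every column, a self-join stays readable and unambiguous.
rows = conn.execute("""
    SELECT p.name, m.name
    FROM person AS p
    INNER JOIN person AS m ON m.id = p.manager_id
""").fetchall()

print(ambiguous_error)
print(rows)
```

The self-join is the case where aliases stop being optional: without p and m there is no way to say which copy of person a column belongs to.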

3 more replies

dspillett 8y ago

I prefer to keep ID column names descriptive even if it does lead to repetition like Person.PersonID. That way columns that identify a person always carry the same name and you are never left guessing what a more anonymous "ID" refers to, nor do you fall into one of a couple of traps where the parser disambiguates one in a way you were not expecting (though this is also caught by consistently using two+ part names when referring to columns, which I also prefer to do). It is particularly useful if the same entity is joined into a query multiple times with different aliases.

There are cons, of course. This is a matter that divides people and when working with other people's projects you have to ignore your own preference and follow the "local" convention.

1 more reply

elchief 8y ago

Joe Celko, as well as ISO-11179, tell us to use collective names ("personnel") or plural names ("employees") for tables

As well, fewer keywords are plural, compared to singular, so there's less chance of accidentally using a keyword if you use plurals

Haven't yet seen an "octopus" table in production...

danso 8y ago

I did a Google search for elaboration on this and apparently there is disagreement that ISO-11179 says this at all:

https://social.msdn.microsoft.com/Forums/vstudio/en-US/d5f2f...

> Yes, this is the same version as I found, but the closest thing I could find to addressing table names in the paper itself was an "Object Class name", something like an OOP Class or something you'd find in a UML diagram, but not really the same as a table name, and in any case all the examples were singular.

> Was actually kinda hoping Celko would deign to comment on this himself as he seems to be the chief proponent of the "collective identifiers as specified by ISO 11179" meme.

elchief 8y ago

"To remind users that tables are sets of entities, ISO-11179 Standard likes to use collective or plural nouns that describe the set of those entities for the names of tables. Thus 'Employee' is a bad name because it is singular"

Page 10, SQL For Smarties (Celko), 5th Ed

If Celko says it's right, it's right

2 more replies

gwbas1c 8y ago

This sounds like a brace style argument. The worst thing is to have a strong opinion.

daigoba66 8y ago

Also important is adapting to the existing naming conventions of the database, even if you don't like them. (Unless the existing naming conventions cause more trouble than they're worth, like requiring quoted identifiers or redundant prefixes/suffixes.)

ysleepy 8y ago

I agree, consistency is worth a lot more than using a slightly better convention. Mixing in a different convention would also create surprises, going against the "principle of least surprise", which is in general a good guideline when designing APIs, schemas, and so on.

gwbas1c 8y ago

Probably the worst thing I've encountered is a junior engineer trying to encourage me to change a coding style by having two coding styles coexist.

I just kept repeating, over and over, that I expected the coding-style to be consistent. It was totally over his head, and he totally didn't even bother looking to find a code formatting utility to do a One-Shot style change.

daphneokeefe 8y ago

For naming stored procedures, there were a lot of helpful answers to my question "What is your naming convention for stored procedures?" on StackOverflow a few years ago. https://stackoverflow.com/questions/238267/what-is-your-nami...

0xffff2 8y ago

Without a rigorous attempt at justifying each of these rules, I don't find this article particularly useful. For example, can someone link to or provide a formal explanation for why table names should be singular? I actually really wanted to read the full relational algebra rationale for that one.

cwbrandsma 8y ago

My own view is they should either be all plural or all singular. Just pick one. But I do gravitate to singular because all nouns have naturally consistent singular words. The same cannot be said of plural. e.g. Moose, Cactus, and any other word that ends in an 's'.

wvenable 8y ago

I used to do all plural but then I read a good argument online for doing singular so I switched to that for the next project and I've done singular ever since.

The few advantages of singular:

1. It's not always clear what the plural of something should be.

2. Chances are the singular maps better to your application layer (Person class <=> Person table).

chias 8y ago

That's a fair point, but keep in mind that the author isn't suggesting these are the "correct" ways to write SQL. They're just his ways.

I don't think one standard is necessarily better than another, but the important thing is to have rules. Over the years I've adopted similar rules for myself, and just internal consistency is so much better. I have old projects with table names including: "logs", "log_requests", "log_users", "game_logs", etc., not to mention mixing of plural and singular, to the point where I need to `show tables` before writing any query just to remember what I even called the table I need.

zzzeek 8y ago

I'm sure this SO answer will satisfy you: https://stackoverflow.com/a/4703155/34549

kbenson 8y ago

That answer satisfied me initially, but less and less so the more I read. It became very obvious by the end that what is being represented is one "standard", but presented as the only possible correct solution.

It starts off with "Yes. Beware of the heathens. Plural in the table names are a sure sign of someone who has not read any of the standard materials and has no knowledge of database theory." I thought the author was being flippant, but it became increasingly obvious that this is a true reflection of their dogmatic view with regard to this topic.

Even if this is the same view I would settle on with all the knowledge, being presented with what is obviously a single perspective with no acknowledgement whatsoever of any positive aspects of alternatives causes me to instinctively distrust quite a bit of the reasoning presented.

1 more reply

qilo 8y ago

Relation definitions define types. Types have names. A tuple in a relation is an instantiation of the corresponding type. And we use singulars for class/type names in most programming systems (e.g. class Person vs class Persons).

davvolun 8y ago

I've been through the relational algebra, I found this to be a nice, quick re-cover. More like a checklist than a full inspection.

jcadam 8y ago

I know a reasonable amount of SQL but am by no means a database guru. Naturally, this makes me the local database expert at my current place of employment :/

The other devs' eyes glaze over when I say things like 'stored procedures' and 'trigger functions.' Bah.

wvenable 8y ago

I don't understand this one:

> Mixed case identifier names means that every usage of the identifier will need to be quoted in double quotes

I've used quite a few RDBMS engines, including most mentioned by the author, and I've never had to quote mixed-case identifier names. They work just the same as all lower-cased names or as any other case-sensitive language.

Most of the programming languages I use typically have the convention of using PascalCase for classes and public fields/properties so I prefer to use that convention for tables and columns (and then everything else for consistency). When doing operations between the application and the database, the name is exactly the same without the need for translation.

Otherwise, I think it's a good list.

stubish 8y ago

If you are not quoting your mixed-case identifiers, then they are not mixed-case. They are being implicitly converted to either lowercase or uppercase depending on your database, and your capitalization is lost. It normally doesn't matter, until you are introspecting your database schema, at which point your code generation generates an Organizationrole protobuf message instead of an OrganizationRole, or your ORM fails to find OrganizationRole because PostgreSQL stored it as organizationrole.

wvenable 8y ago

This seems like a terrible feature of Postgres and Oracle (which apparently works the same but oppositely). All other database engines I've used have retained the case of unquoted identifiers.

I haven't used Postgres enough to notice this, it's almost a deal breaker.

I might be tempted to mandate that all identifiers be quoted rather than deal with half the possible characters for names. Although more likely all code generation would happen on the application side with DB migrations, so the database wouldn't be the source of truth for identifier names anyway.

wfriesen 8y ago

In Oracle this is kind of hidden away since it is case sensitive, but unquoted identifiers are silently converted to uppercase, quoted identifiers are used as-is. So, for queries against something like

  create table foo (
    Bar integer,
    "Foobar" integer
  );

Referring to Bar, BAR, bar, and "Foobar" will work, but foobar and Foobar will not.
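The folding rule can be modelled in a few lines of Python. This is only a toy simulation of the behaviour described in this subthread (unquoted names fold to uppercase in Oracle and to lowercase in PostgreSQL, quoted names are kept as written); the function names are invented here, and it ignores details such as escaped quotes inside quoted identifiers.

```python
def fold_identifier(identifier: str, dialect: str) -> str:
    """Return the name an engine would store or look up (simplified model)."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]  # quoted: case is preserved exactly
    if dialect == "oracle":
        return identifier.upper()  # unquoted names fold to uppercase
    if dialect == "postgresql":
        return identifier.lower()  # unquoted names fold to lowercase
    raise ValueError(f"unknown dialect: {dialect}")


def column_exists(declared, reference, dialect):
    """Check whether a reference resolves against the declared column names."""
    stored = {fold_identifier(name, dialect) for name in declared}
    return fold_identifier(reference, dialect) in stored


# The columns from the create table statement above.
declared = ["Bar", '"Foobar"']
references = ["Bar", "BAR", "bar", '"Foobar"', "foobar", "Foobar"]
results = {ref: column_exists(declared, ref, "oracle") for ref in references}

for ref, ok in results.items():
    print(f"{ref}: {'resolves' if ok else 'does not resolve'}")

# The PostgreSQL case from earlier in the thread.
print(fold_identifier("OrganizationRole", "postgresql"))
```

Under this model the first four references resolve and the last two do not, which matches the Oracle description above.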

walshemj 8y ago

How come I have never in a multi-decade career come across "i18n"?

I thought the canonical way of doing this was to write KEYWORDS in caps and use camel case for Variables.

Also never really bought into adding the type as part of a name - your type is already defined in your schema.

cryptonector 8y ago

Really? I have. In some circles (e.g., the IETF), i18n is an ancient acronym. People who've worked on operating systems (e.g., OS X, Solaris, RHEL, whatever) have to deal with L10N (localization). G11N (globalization) is I18N + L10N.

And then there's a11y: accessibility. This is all about making user interfaces accessible for people with low or no vision, low or no hearing, difficulty typing, and so on.

There are generally applicable laws requiring G11N and A11Y, and these fall heavily on OS vendors, which is why people who've worked on OSes tend to know these acronyms.

I18N -> dealing with Unicode in general, codeset conversions, font issues, ...

L10N -> dealing with translating system/application messages to the users' preferred languages (and how to even know their preferences) (think locales)

G11N -> I18N and L10N.

Localization is damned difficult. There are all sorts of little bothersome things, like how to format numbers (which varies quite a lot) and dates (can't we all just use ISO-8601?!). And translating printf-like format strings is often non-trivial, especially when the coder doesn't stop to think about just how hard they might be for a translator as they write their code.

kbenson 8y ago

> G11N -> I18N and L10N.

That's a new one to me, but makes sense. I've been lucky enough to have heard of i18n and l10n for years (almost decades, at this point) but not had to deal with them much beyond tracking down a string in some open source webapp I was patching before deploying.

> can't we all just use ISO-8601?!

Preach on. I sometimes find myself filling out date fields in paper forms in YYYY-MM-DD without thinking. The elementary school my kids attend probably thinks I'm a weirdo. I know my wife does...

dvh 8y ago

V10N --> Velociraptor

T15X --> Tyrannosaurus Rex

D11S --> Dilophosaurus

B11S --> Brachiosaurus

T9S --> Triceratops

S9S --> Stegosaurus

2 more replies

swirepe 8y ago

Also p13n -> personalization

irrational 8y ago

You've worked on English only applications?

reaperducer 8y ago

That could be it. Like him, I didn't run into that abbreviation for my first 20 years of coding. It wasn't until I needed to do a dual English/Czech project that it came up.

We're always learning.

sytelus 8y ago

A lot of these are debatable. For example, I have preferred FirstName or even “[First Name]” instead of first_name in sql because a lot of tooling uses these names to generate UX. Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming. I have used both approaches, each with its own pros and cons.

zzzeek 8y ago

> For example, I have preferred FirstName or even “[First Name]” instead of first_name in sql because a lot of tooling uses these names to generate UX.

those tools are wrong (and I know roughly which ones those are).

> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.

it would be: person.id and the foreign key column that refers to it person_thing.person_id. This is much preferable to person.person_id and person_thing.person_person_id.

sbov 8y ago

> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.

I assume you mean you just use PersonID as the foreign key. This oftentimes introduces ambiguity into what the relationship actually is. I prefer names that describe the actual relationship (e.g. author, owner, approver, etc) rather than letting other people guess what it is.

always_good 8y ago

I usually want to reserve the noun like "author" as the embedded record after a join. That way "author_id" is always the key and then "author" is the json_agg joined object that embeds the whole record.

Otherwise you're actually introducing ambiguity imo.
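A small sketch of the role-named foreign key idea from this subthread, again with Python's sqlite3 module. The article table and its columns are invented for illustration. The foreign keys are named for the relationship (author_id, approver_id) and the bare nouns are kept for the joined records, here as table aliases rather than json_agg objects.

```python
import sqlite3

# Hypothetical schema: two different relationships to the same person table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE article (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        author_id   INTEGER NOT NULL REFERENCES person (id),
        approver_id INTEGER REFERENCES person (id)
    );
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO article VALUES (1, 'Naming things', 1, 2);
""")

# "author_id" is the key; "author" names the joined person record.
rows = conn.execute("""
    SELECT article.title, author.name, approver.name
    FROM article
    JOIN person AS author   ON author.id   = article.author_id
    JOIN person AS approver ON approver.id = article.approver_id
""").fetchall()

print(rows)
```

A single person_id column could not express both relationships, which is the ambiguity being argued about here.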

2 more replies

zo1 8y ago

I think he's referring to using "[Table]ID" as the primary key on [Table]?

So now you have to join with the Person table on Person.PersonID from your local column PersonID. I much prefer the other way around: Table.ID with foreign keys being "TableID".

1 more reply

matte_black 8y ago

> For example, I have preferred FirstName or even “[First Name]” instead of first_name in sql

Do not do this in Postgres, it will be a pain in the ass since you will have to use quotes around everything.

cryptonector 8y ago

Right. PG has appropriated square brackets for array notations, so you really just have to use double-quotes.

At least PG tries really hard to not add new reserved keywords, which means you mostly don't have to worry about your schema element names possibly conflicting with new keywords in future releases.

autokad 8y ago

If you need “[First Name]” so it shows up in a UI, you could always do first_name as 'First Name'. But I would say that's still generally bad practice. In many cases, you shouldn't be exposing your column names through a UI, and most UIs allow aliases.
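A minimal sketch of the alias approach with Python's sqlite3 module; the person table is hypothetical. It uses the standard double-quoted form for the alias, since single quotes for aliases are not accepted by every engine.

```python
import sqlite3

# Hypothetical table with a plain snake_case column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name TEXT)")
conn.execute("INSERT INTO person VALUES ('Ada')")

# The stored column stays first_name; only the result set gets the display label.
cur = conn.execute('SELECT first_name AS "First Name" FROM person')
header = cur.description[0][0]
rows = cur.fetchall()

print(header)
print(rows)
```

The schema keeps its unquoted name, and the label lives in the one query that feeds the display.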

arez 8y ago

Person.PersonID stutters: you already know that you are querying the person table, so you don't have to repeat it. Naming it ID is just as consistent.

default-kramer 8y ago

I like "Person.PersonID" because then "alias.PersonID" will produce an error if "alias" does not have a "PersonID" column. If every table has an "ID" column then "alias.ID" pretty much never fails, even if you typed the wrong alias - you just end up joining on the wrong thing and getting the wrong result set.
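This failure mode is easy to reproduce. The sketch below uses Python's sqlite3 module with two made-up copies of the same schema: the `_a` tables name every primary key `id`, the `_b` tables use descriptive key names. Both queries contain the same typo (alias `p` where `t` was meant).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Convention A (hypothetical): every primary key is called "id".
    CREATE TABLE person_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE team_a   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member_a (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    INSERT INTO person_a VALUES (1, 'Ada');
    INSERT INTO team_a   VALUES (1, 'compilers'), (2, 'databases');
    INSERT INTO member_a VALUES (100, 1, 1);

    -- Convention B (hypothetical): keys carry the table name.
    CREATE TABLE person_b (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE team_b   (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member_b (member_id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    INSERT INTO person_b VALUES (1, 'Ada');
    INSERT INTO team_b   VALUES (1, 'compilers'), (2, 'databases');
    INSERT INTO member_b VALUES (100, 1, 1);
""")

# Convention A: the typo still names a real column, so the query runs and
# quietly returns every team instead of the member's one team.
wrong_rows = conn.execute("""
    SELECT t.name
    FROM member_a AS m
    JOIN person_a AS p ON m.person_id = p.id
    JOIN team_a   AS t ON m.team_id   = p.id  -- typo: "p" where "t" was meant
    ORDER BY t.name
""").fetchall()

# Convention B: the same typo refers to a column that does not exist.
try:
    conn.execute("""
        SELECT t.name
        FROM member_b AS m
        JOIN person_b AS p ON m.person_id = p.person_id
        JOIN team_b   AS t ON m.team_id   = p.team_id  -- same typo
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(wrong_rows)
print(error)
```

With the `id` convention the mistake only shows up as an extra row in the output; with descriptive key names the statement is rejected outright.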

barrkel 8y ago

You should follow the conventions that make life easier in the rest of your tooling.

The fact is, you're probably going to be issuing more SQL via abstractions like ORMs or querying libraries than raw SQL. If you need to work against the grain of those libraries to map your model, what upside are you getting?

If most of your data is queried via ActiveRecord, for example, you should use plural table names.

btilly 8y ago

It is true that most of my SQL goes through ORMs. However my most complicated SQL is always constructed by hand. Furthermore note the point about how often applications get rewritten against the same database. You should not assume that future code will use the same ORM that you are using now.

rockostrich 8y ago

It depends on the situation. If the queries you're writing are heavily integrated with application models and logic, then using the ORM is probably the way to go (especially if the ORM does some client-side caching and other optimizations). Of course, if you start to see that the ORM's queries underperform compared to raw SQL then you should check to see what SQL the ORM is spitting out. I've seen SQLAlchemy create some very poor queries compared to what would be expected and ended up writing parts of those queries in raw SQL, but those cases are pretty uncommon when most of the logic for the application is simple gets/updates with filter/join conditions.

1 more reply

barrkel 8y ago

Hand-constructed SQL, though, can cope with any naming scheme. And I'd argue a consistent naming scheme, using tooling as a forcing function, is better than inconsistencies you'd get with a big team of people writing their own SQL.

walshemj 8y ago

Don't use an ORM is the answer; just put in the effort to learn SQL.

barrkel 8y ago

This injunction doesn't really scale across a team.

I've spent weeks of my life tuning SQL, to the point of writing SQL generation libraries to effectively override the database optimizer when it consistently makes poor decisions in specific use cases. But I don't expect the rest of the team to know SQL as well as I do.

When I'm writing or generating SQL, I don't really care much what the naming convention is. If it's consistent, then SQL is easier to write. Consistency is more important than the specifics of any conventions.

1 more reply

matthewmacleod 8y ago

That is a weird middlebrow dismissal of a response with essentially no value to anybody.

Use an ORM when appropriate; when using one, follow its conventions. Don't use an ORM if it's not appropriate. This is much better advice.

4 more replies

iblaine 8y ago

> Avoid reserved words

Glad you cleared this up for the rest of us.

FWIW, naming conventions are like opinions. Everyone has them, and they usually differ from person to person. The best naming convention is a consistent naming convention. Also, naming conventions differ greatly by environment. A group of SQL Server engineers are going to have different standards than those of people working on mysql.

youpassbutter 8y ago

No. That's how you write sql within your organization. Also there are syntactic differences between SQL flavors ( postgres, mysql, mssql, oracle, etc ) that make a SQL standard unrealistic.

The only generic rule is "be consistent". Whatever convention/style you choose, it should be consistent.
