A Bad Privacy Bug (opens in new tab)

(blog.pinboard.in)

110 pointsstilist13y ago46 comments

46 comments

30 comments · 13 top-level

davidmp13y ago· 7 in thread

  But I do this from my own Pinboard account, which has 
  superpowers. Specifically, it lets me see private bookmarks
  on all accounts. And since I already see everything, I don't
  notice when everybody else starts to see everything, too.

This is not okay.

thaumaturgy13y ago

I'm a drooling Pinboard fanboy, so I'm probably cutting him more slack here than I would for other services. However: it's been obvious to me since the beginning that "private" in Pinboard-context just means, "this bookmark is not publicly available on my account page." I've had absolutely no expectation that it wasn't accessible by a site admin -- although that would be nice too.

I can't think of a way to store retrievable data, like a bookmark, on his servers without leaving some way for him to access the data if he wanted to.

gyardley13y ago

Yes, retrievable data is obviously accessible by a site admin somehow. But the way he's got things set up:

1) It's trivial for him to inadvertently see something deeply personal to someone just by browsing the 'recent' list or doing a search.

UPDATE: I overstated this one - Maciej let me know by email that he can only access private data on the search / recent page if he intentionally masquerades a user. He can only inadvertently see private data when viewing individual user pages.

2) If his account's ever compromised (let's hope he's not reusing that password elsewhere!) then someone else gets that ability as well, accessible from any browser anywhere.

It's one thing when you have to ssh into a server somewhere and do a SQL query to access someone's private information. It's another thing to set up your admin account so you're casually exposed to it.

I like Pinboard's service too, but this isn't remotely cool.

1 more reply

davidmp13y ago

I'm a paying user and this won't make me stop using the service, but there should be a reasonable expectation of privacy here. Obviously the database has to store this information in a retrievable way, but to expose it so carelessly on a regular basis is completely unnecessary.

lsc13y ago

this is... a hard choice. I'm not saying I know which way is right, but I think that either way, revealing to your users which choice you make is the right thing.

The choice, really, is "do I look at my user's personal stuff and have a good idea that my service is working for them" vs. "do I not look at my user's stuff, and have much less of an idea if the service is working for them."

(yes, yes, good automated tests are the right way to solve the problem, but good automated test are difficult, especially when a broad range of behaviors could be 'correct')

In my own case, there are two places where I'm erroring on the "don't look at the user's data, even if it would allow you to provide better service" side. First? I don't mount a user's disk. it's a block device, as far as I am concerned.

As you might imagine, this makes backups way, way more difficult. Moves, too. It's possible that if I were to ignore this rule, I'd have enough spare disk bandwidth to do at least half-assed backups (I'm not doing any backups of customer data right now, which is pretty scary for me, because I'm in a business where if you lose the customer's data, you lose the customer in the worst way.)

Now, the right way to solve this problem is some kind of 'snapshot over the network' like zfs has or some other mechanisim that only transfers the change in block devices. Unfortunately, Linux has no such tools. (yes, yes, ZFS on linux is a possibility. So is a NAS. Rsync or something rsync like won't work because the problem is not network bandwidth but disk bandwidth. I mean, it's a problem that can be solved, a better way, but solving it the better way is more work.)

Next, from a technical support perspective? I could provide dramatically better service if I logged all the serial consoles. Dramatically better service. but last time I asked, people seemed uncomfortable with it, so I don't. (the thing is, it's only easy to log all consoles or log none of the consoles; I'd have to spend time and effort building a 'log consoles except for users that opt out' mechanisim, which is the right thing to do here, but due to time constraints that hasn't been done.) (to be clear, when a customer asks for help, or even if soemthing happens with a domain and I'm not sure it's working, I look at the serial console as is. My job would be nigh impossible without serial consoles. Also, by default I log the serial consoles on dedicated servers, though unlike Xen, it is easy for me to opt people out of that. Conserver is the tool I use there.)

Personally, I think I made the right choice when it comes to block devices (I have had good luck with my RAIDs, and treating disks as block devices means my customers can use whatever freaky filesystem they like.) but I think I made the wrong choice on the serial console; there really isn't that much personal stuff that comes up on the serial console, and it gives me a lot of clues.

I guess what I'm saying is that from a testing perspective? being able to see the user's bookmarks has a lot of advantages. It's a valid choice; and he's sharing that choice with the customers, which is the right thing to do.

(the interesting thing here, of course, was that if his account couldn't see the users bookmarks, while usually that would have given him much less diagnostic information, it would have given him more diagnostic information this time.)

davidmp13y ago

Of course it's a tradeoff, but there's only one choice guaranteed not to piss off your users. It's an easy decision. Don't look at private data.

1 more reply

sedev13y ago

He recently said on Twitter that correcting that is on his to-do list.

hopeless13y ago

As an admin, having the ability to masquerade as a particular user is invaluable for debugging & support.

PascalW13y ago· 6 in thread

  Instead of automated test suites, I use checklists before deploying major changes,
  performing a series of actions (like creating and editing bookmarks)
  to make sure everything works as expected.

Somehow I always shiver when I read stuff like this. Some manual smoke testing is never bad, but I really can't understand how you can feel comportable about not having automated test suites on (larger) projects.

Stuff like this is so easy to catch in a unit or integration test.

EvilTerran13y ago

Genuinely curious - if you find a project in that sort of condition, how do you go about fixing it? Are there respected guides to introducing proper unit and integration tests to established projects (that may not be structured amenably to them)?

marc_omorain13y ago

The first step is to create an empty test suite with at least one test that runs. This will mean that when you go to add a test later you have no barriers — you already have a suite.

You then need to follow two rules. First, write tests for all new features. Second, add a failing test that exposes the bug before you fix any bug.

By doing these two things you will end up testing two very important areas of your code — new code (which will always be buggiest) and code that has recently proven to be buggy.

1 more reply

calpaterson13y ago

Essentially, my understand of best practice is to write high level functional tests for the features that appear to work and then use them to ensure there are no regressions as a result of your changes. Someone people even define legacy code as "code without tests".

http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

xentronium13y ago

There were a couple Destroy All Software screencasts on that topic [1].

In a nutshell, you start by writing a lot of high level integration tests, which will serve you as some guarantee you don't screw up. Then you start refactoring, writing fine-grained unit tests along the way. Once you settle with code structure, you might want to start removing/rewriting test cases you wrote initially in case they seem redundant.

[1] https://www.destroyallsoftware.com/screencasts/catalog/untes... and further parts 2-4. This screencast is done mostly in ruby.

1 more reply

wickedchicken13y ago

> Stuff like this is so easy to catch in a unit or integration test.

Unit and integration tests are great, but you can't rely on them for everything. There is always a temptation to trust the tests instead of the real world, and that's where you get in trouble. The caption for this photo might as well have read 'all tests are green' http://25.media.tumblr.com/tumblr_m7v0yrMrJ21rzupqxo1_500.jp...

PascalW13y ago

I agree, that's why I said that smoke testing is never bad and neither is a good QA guy.

But as a developer unit and integration tests are what gives me trust in what I ship. I wouldn't feel comfortable shipping something without having a proper test suite, let alone making a change to such codebase.

tdavis13y ago· 2 in thread

As much as anything else, this is another great example of MySQL letting you shoot yourself in the face by default. I also love that you can save "y" to an integer field without complaint. It seems obvious that accurate data validation and comparison would be the cornerstone of a successful relational database, yet MySQL defaults fail to provide any domain-relevant value whatsoever.

ars13y ago

bin is not an integer field. It's a binary field - it can save anything.

And no, you can not save 'y' to an integer field, so your rant is misplaced.

mbrubeck13y ago

I think tdavis is referring to this example from maciej's post, where inserting 'y' into a tinyint field silently converts it to 0 instead of throwing an error:

     mysql> create table demo(int tinyint(1), bin binary(1));
     mysql> insert into demo(int, bin) values ('y', 'y');
     mysql> select * from demo;

     +------+------+
     | int  | bin  | 
     +------+------+
     |    0 | y    |
     +------+------+

1 more reply

vlucas13y ago· 2 in thread

Man, the more I see and experience things like this in MySQL, the more impressed (horrified?) I am at how it ever became popular. So glad I've made the switch to Postgres.

ars13y ago

This isn't really a MySQL issue though. He used bin instead of int, and MySQL did exactly what he asked - stored the raw binary data as is.

drivebyacct213y ago

Why are rows with 'y' in them being returned when he asked for that value equal to zero?

1 more reply

biot13y ago

The schema should really be:

  private bit not null default 0

If you use something like tinyint and have a range of values that are acceptable, bake this into the database with a check constraint:

  myfield tinyint not null default 0 check (myfield between 0 and 3)

Regardless of how comprehensive your unit tests are, you should prevent as much bad data as you can at the DB layer.

brlewis13y ago

I respect maciej more for the candor about this incident. My wife had private bookmarks exposed, but I recommended she continue to use the service. These particular mistakes are not ones I would have made myself, but the important thing is that he's examining them intelligently and taking steps to prevent their recurrence.

overgard13y ago

I don't think I saw it mentioned anywhere, but another thought that occurs to me is it might have been initially useful when he created the table to have the "private" field actually be called "public" instead.

The idea being that if the field is likely to be initialized to a default, you want the safer default. (IE, in that case things would be created as accidentally private, rather than as accidentally public)

Kudos to the author for the transparency. I actually have never used pinboard, but that blog post actually makes me more likely to use it rather then less.

jrochkind113y ago

> So even though the row has (from our perspective) a non-zero value, MySQL sees it as matching zero.

What? Why does MySQL return the row with 'y' in it (in a 'binary' column) when you ask for "where row = 0'? wtf is going on here?

That seems to be the real culprit here. I assume it's MySQL's 'helpful' conversion routines again, but I'm just assuming, I don't understand the details.

rev08713y ago

If only other companies would handle issues with this level of transparency...

mbq13y ago

So you browse "private" bookmarks of your users on a daily basis...

8ig813y ago

I was affected by this. I received the email notification from Maciej and was offered compensation for the mistake. I declined the compensation and instead upgraded my account to include archival service. I did this partially because I've been considering the upgrade for a while, but mainly because I was so impressed by the way the incident was handled.

Keep in mind that I don't have state secrets bookmarked in Pinboard. I'm just an anti-social type of user as their tag line notes and I appreciate how Maciej runs his business.

hopeless13y ago

I recently watched this video which is mostly about Postgres but also pokes some fun at MySQL's weirdness. I didn't realise how scary MySQL can be

http://vimeo.com/m/43536445

themonk13y ago

Implicit type conversion can be evil : http://bhati.livejournal.com/3352.html

j / k navigate · click thread line to collapse

46 comments

30 comments · 13 top-level

davidmp13y ago· 7 in thread

  But I do this from my own Pinboard account, which has 
  superpowers. Specifically, it lets me see private bookmarks
  on all accounts. And since I already see everything, I don't
  notice when everybody else starts to see everything, too.

This is not okay.

thaumaturgy13y ago

I can't think of a way to store retrievable data, like a bookmark, on his servers without leaving some way for him to access the data if he wanted to.

gyardley13y ago

Yes, retrievable data is obviously accessible by a site admin somehow. But the way he's got things set up:

1) It's trivial for him to inadvertently see something deeply personal to someone just by browsing the 'recent' list or doing a search.

2) If his account's ever compromised (let's hope he's not reusing that password elsewhere!) then someone else gets that ability as well, accessible from any browser anywhere.

I like Pinboard's service too, but this isn't remotely cool.

1 more reply

davidmp13y ago

lsc13y ago

this is... a hard choice. I'm not saying I know which way is right, but I think that either way, revealing to your users which choice you make is the right thing.

(yes, yes, good automated tests are the right way to solve the problem, but good automated test are difficult, especially when a broad range of behaviors could be 'correct')

davidmp13y ago

Of course it's a tradeoff, but there's only one choice guaranteed not to piss off your users. It's an easy decision. Don't look at private data.

1 more reply

sedev13y ago

He recently said on Twitter that correcting that is on his to-do list.

hopeless13y ago

As an admin, having the ability to masquerade as a particular user is invaluable for debugging & support.

PascalW13y ago· 6 in thread

  Instead of automated test suites, I use checklists before deploying major changes,
  performing a series of actions (like creating and editing bookmarks)
  to make sure everything works as expected.

Stuff like this is so easy to catch in a unit or integration test.

EvilTerran13y ago

marc_omorain13y ago

The first step is to create an empty test suite with at least one test that runs. This will mean that when you go to add a test later you have no barriers — you already have a suite.

You then need to follow two rules. First, write tests for all new features. Second, add a failing test that exposes the bug before you fix any bug.

By doing these two things you will end up testing two very important areas of your code — new code (which will always be buggiest) and code that has recently proven to be buggy.

1 more reply

calpaterson13y ago

http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

xentronium13y ago

There were a couple Destroy All Software screencasts on that topic [1].

[1] https://www.destroyallsoftware.com/screencasts/catalog/untes... and further parts 2-4. This screencast is done mostly in ruby.

1 more reply

wickedchicken13y ago

> Stuff like this is so easy to catch in a unit or integration test.

PascalW13y ago

I agree, that's why I said that smoke testing is never bad and neither is a good QA guy.

tdavis13y ago· 2 in thread

ars13y ago

bin is not an integer field. It's a binary field - it can save anything.

And no, you can not save 'y' to an integer field, so your rant is misplaced.

mbrubeck13y ago

I think tdavis is referring to this example from maciej's post, where inserting 'y' into a tinyint field silently converts it to 0 instead of throwing an error:

     mysql> create table demo(int tinyint(1), bin binary(1));
     mysql> insert into demo(int, bin) values ('y', 'y');
     mysql> select * from demo;

     +------+------+
     | int  | bin  | 
     +------+------+
     |    0 | y    |
     +------+------+

1 more reply

vlucas13y ago· 2 in thread

Man, the more I see and experience things like this in MySQL, the more impressed (horrified?) I am at how it ever became popular. So glad I've made the switch to Postgres.

ars13y ago

This isn't really a MySQL issue though. He used bin instead of int, and MySQL did exactly what he asked - stored the raw binary data as is.

drivebyacct213y ago

Why are rows with 'y' in them being returned when he asked for that value equal to zero?

1 more reply

biot13y ago

The schema should really be:

  private bit not null default 0

If you use something like tinyint and have a range of values that are acceptable, bake this into the database with a check constraint:

  myfield tinyint not null default 0 check (myfield between 0 and 3)

Regardless of how comprehensive your unit tests are, you should prevent as much bad data as you can at the DB layer.

brlewis13y ago

overgard13y ago

Kudos to the author for the transparency. I actually have never used pinboard, but that blog post actually makes me more likely to use it rather then less.

jrochkind113y ago

> So even though the row has (from our perspective) a non-zero value, MySQL sees it as matching zero.

What? Why does MySQL return the row with 'y' in it (in a 'binary' column) when you ask for "where row = 0'? wtf is going on here?

That seems to be the real culprit here. I assume it's MySQL's 'helpful' conversion routines again, but I'm just assuming, I don't understand the details.

rev08713y ago

If only other companies would handle issues with this level of transparency...

mbq13y ago

So you browse "private" bookmarks of your users on a daily basis...

8ig813y ago

Keep in mind that I don't have state secrets bookmarked in Pinboard. I'm just an anti-social type of user as their tag line notes and I appreciate how Maciej runs his business.

hopeless13y ago

I recently watched this video which is mostly about Postgres but also pokes some fun at MySQL's weirdness. I didn't realise how scary MySQL can be

http://vimeo.com/m/43536445

themonk13y ago

Implicit type conversion can be evil : http://bhati.livejournal.com/3352.html

j / k navigate · click thread line to collapse