The DNC data breach

(blog.ngpvan.com)

89 points · shorodei · 10y ago · 84 comments

41 comments · 9 top-level

sethbannon10y ago· 18 in thread

For those that are not familiar with the space, campaigns typically use voter contact software to record the results of the conversations they have with potential voters on the phones, at the doors, and over the Internet. In this case, the voter contact software that both the Hillary and Sanders campaigns were using, NGP VAN, had a bug which allowed both campaigns to access each other's private, proprietary data (in this case, I believe, modeling data).

The Data Director on the Sanders campaign discovered the error and (he claims) was verifying and documenting the bug, which was then reported to the Democratic National Committee (DNC) and NGP VAN. The DNC claims these actions were not in good faith, and as a reaction cut the Sanders campaign off from the system.

This is a BIG deal for a campaign, so close to the first elections. Campaigns rely on that data to inform nearly everything they do, and rely on access to such tools to conduct their voter outreach program. Being cut off from the system is crippling for a campaign, likely why the Sanders campaign so quickly sued to get its access reinstated [1].

[1] - http://www.politico.com/story/2015/12/sanders-campaign-threa...

edit: typos

luckynonce10y ago

For an alternative perspective:

"The database logs created by NGP VAN show that four accounts associated with the Sanders team took advantage of the Wednesday morning breach. Staffers conducted searches that would be especially advantageous to the campaign, including lists of its likeliest supporters in 10 early voting states, including Iowa and New Hampshire. Campaigns rent access to a master file of DNC voter information from the party, and update the files with their own data culled from field work and other investments. After one Sanders account gained access to the Clinton data, the audits show, that user began sharing permissions with other Sanders users. The staffers who secured access to the Clinton data included Uretsky and his deputy, Russell Drapkin. The two other usernames that viewed Clinton information were “talani" and "csmith_bernie," created by Uretsky's account after the breach began. The logs show that the Vermont senator’s team created at least 24 lists during the 40-minute breach, which started at 10:40 a.m., and saved those lists to their personal folders. The Sanders searches included New Hampshire lists related to likely voters, "HFA Turnout 60-100" and "HFA Support 50-100," that were conducted and saved by Uretsky. Drapkin's account searched for and saved lists including less likely Clinton voters, "HFA Support <30" in Iowa, and "HFA Turnout 30-70"' in New Hampshire. Despite audit logs, Weaver said at the news conference that NGP VAN has told the campaign that no Clinton data was printed or downloaded."

http://www.bloomberg.com/politics/articles/2015-12-18/sander...

datapolitical10y ago

"Saving the list" entails creating a copy of the list on the VAN servers (technically, creating an SQL query). It does not mean copying any of the data locally where it could be kept.

It demonstrates the ability of the Sanders campaign to access the Clinton data without actually having the ability to use it once the breach was sealed, which, like the previous breach, it would inevitably be.
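A minimal sketch of that distinction, assuming a saved list is stored as filter criteria that the server re-evaluates on each use, rather than as a copy of the matching rows. The `SavedList` class, the `visible_fields` permission set, and the field names are hypothetical illustrations, not NGP VAN's actual design:

```python
from dataclasses import dataclass

# Hypothetical voter rows; "hfa_support" stands in for another campaign's score.
VOTERS = [
    {"id": 1, "state": "NH", "hfa_support": 80},
    {"id": 2, "state": "NH", "hfa_support": 20},
    {"id": 3, "state": "IA", "hfa_support": 65},
]

@dataclass(frozen=True)
class SavedList:
    """Stores only the query criteria, never the matching rows."""
    name: str
    field: str
    minimum: int

    def run(self, voters, visible_fields):
        # The query is re-evaluated on every use, so it only works while
        # the user can still see the field it filters on.
        if self.field not in visible_fields:
            raise PermissionError(f"no access to {self.field!r}")
        return [v["id"] for v in voters if v.get(self.field, 0) >= self.minimum]

saved = SavedList("HFA Support 50-100", "hfa_support", 50)

# While the bug exposes the field, the saved list returns matches.
during_breach = saved.run(VOTERS, visible_fields={"state", "hfa_support"})
```

Under this assumption, running the same saved list after the field is hidden again raises `PermissionError`, which is the sense in which a saved list is not an export.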

It's like making a copy of the personnel files left in the mailroom and sticking them in your mailbox. Lets you demonstrate they got left out in case VAN tries to say the breach wasn't serious.

cbps10y ago

"Despite audit logs, Weaver said at the news conference that NGP VAN has told the campaign that no Clinton data was printed or downloaded."

The phrasing here strikes me as somewhat vague. Are they implying that Weaver's statements are in conflict with the audit logs, or are they (somewhat ineffectively) implying that "saving lists" merely equates to bookmarking a certain query?

NGP stated:

"So for voters that a user already had access to, that user was able to search by and view (but not export or save or act on) some attributes that came from another campaign."

What exactly do they mean by "view", let alone "act on"? If someone was truly dedicated to extracting data through their browser, are the terms truly mutually exclusive?

jdnier10y ago

Here's an interview the Sanders campaign staffer who was fired gave explaining just what you describe: http://www.msnbc.com/thomas-roberts/watch/fired-sanders-camp...

vvanders10y ago

Wow, that interview is incredibly hostile.

coderjames10y ago

That's interesting, thanks for adding that explanation for us not in the space.

What I'm surprised about is that the campaigns are willing to let this data be stored in the cloud on shared systems. I would have expected all proprietary data to be stored locally by each campaign on private in-house servers, probably with periodic data dumps of updates from the data provider.

uxp10y ago

Why?

Why put forth the expense of obtaining (purchase or rent) hardware and staff to maintain that hardware? Additionally, why put forth the time and expense to write or compose a CRM-like software solution that integrates with voter data, what sounds like a dialer/call center, and "big data" tools (Spark, Hadoop, Tableau, SSIS/SSRS) that probably needs a good 6 months lead time before the candidate even announces a run for office? Also, why would every potential candidate do this every 4 years?

Sounds like a perfect choice for a hosted solution that can be iterated on outside of the election cycle.

harryh10y ago

How many businesses use Google for email and document storage and run their entire system on AWS?

Private in-house servers are very expensive to set up and maintain. Nearly everyone stores vital personal information on someone else's servers.

occsceo10y ago

> is that the campaigns are willing to let this data be stored in the cloud

Not the campaigns...the parties.

toomuchtodo10y ago

Difficulty level in replicating this dataset from secretary of state rolls?

occsceo10y ago

I've been working on a project like this for some time now - and wrestling with whether I want to go with the community-based or the closed-source model.

The problems listed below are pretty exact: huge data sets, lots of cleaning and normalizing, and the snail mail/CD problem is real. Additionally, I'd note that ~40% of the states [somehow] charge for the data...it takes six digits to get a snapshot of all 50 states - and certain states (looking at you FL) say that they do not store the historical data, meaning you have to connect with the local BoEs to aggregate the data.

A part of me [now] wants to open source this because of the DNC's actions.

sethbannon10y ago

Not technically difficult but incredibly tedious. First, you have to go out and collect it from all 50 Secretaries of State, and in some cases county officials. Some states send you the data on a CD (no joke). You then have to clean the data, which is often not in great shape, and then normalize it.
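To give a feel for the clean-and-normalize step, here is a small sketch that maps two differently shaped state exports onto one schema. The sample data, column names, and the `normalize` helper are invented for illustration; real state files vary far more than this:

```python
import csv
import io

# Hypothetical exports: each state uses its own column names and delimiters.
STATE_A = "VOTER_ID,LAST,FIRST,ZIP\n001, smith ,JOHN,03101-1234\n"
STATE_B = "id|name|zip5\n77|Doe, Jane|50309\n"

def normalize(state, voter_id, first, last, zip_code):
    """Map one raw record onto a common schema."""
    return {
        "state": state,
        # Drop padding zeros so IDs compare consistently.
        "voter_id": voter_id.strip().lstrip("0") or "0",
        # title() is a crude choice: it mangles names such as "McDonald".
        "first": first.strip().title(),
        "last": last.strip().title(),
        # Keep only the five-digit ZIP, discarding any ZIP+4 suffix.
        "zip5": zip_code.strip()[:5],
    }

def load_state_a(text):
    for row in csv.DictReader(io.StringIO(text)):
        yield normalize("NH", row["VOTER_ID"], row["FIRST"], row["LAST"], row["ZIP"])

def load_state_b(text):
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        # This export packs "Last, First" into a single column.
        last, first = row["name"].split(",", 1)
        yield normalize("IA", row["id"], first, last, row["zip5"])

records = list(load_state_a(STATE_A)) + list(load_state_b(STATE_B))
```

Each state needs its own loader, which is where most of the tedium comes from.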

Even then, you only have a snapshot, because the states typically don't keep historical data. What this means is that your dataset won't be as good as that of someone who's been collecting this data for years, and who therefore knows things you won't, like where someone used to live, how often they voted there, who recently dropped off the registered voter rolls, etc.

In this case, even this data wouldn't be enough, because the Sanders team had made likely hundreds of thousands of contacts with voters, and recorded what issues they cared about and who they planned to vote for. This data, which they personally collected, is now inaccessible to them.

edit: expounded

jdnier10y ago

My understanding is that the DNC contracts with VAN to manage the voter files for all fifty states. It's a shared database, with candidates able to build up their own data on top. All the campaigns can see the underlying voter data, but the additions they make are private to the individual campaign. The Sanders campaign staffers realized they were able to see Clinton campaign data they should not have had access to. That's all that happened. The problem was fixed within a few hours. The Sanders people didn't abuse the bug in any significant way, in fact they reported it. But then the DNC cut off Sanders campaign access completely -- the nation-wide voter file, all their additions, inaccessible. At this point, the DNC response seems more punitive than security-related.
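A rough sketch of that layout, assuming a shared voter file plus per-campaign additions keyed by owner. All names and values here are made up, and nothing below reflects how VAN is actually implemented:

```python
# Shared voter file: every campaign may read these rows.
VOTER_FILE = {101: {"name": "A. Voter", "state": "IA"}}

# Private additions, keyed by the campaign that owns them (hypothetical fields).
CAMPAIGN_DATA = {
    "sanders": {101: {"support_score": 72}},
    "clinton": {101: {"hfa_support": 40}},
}

def view_voter(requesting_campaign, voter_id):
    """Return shared fields plus only the requester's own additions."""
    record = dict(VOTER_FILE[voter_id])
    own = CAMPAIGN_DATA.get(requesting_campaign, {}).get(voter_id, {})
    record.update(own)
    return record

def leaked_fields(requesting_campaign, voter_id):
    """Return fields owned by other campaigns that the requester can see."""
    seen = set(view_voter(requesting_campaign, voter_id))
    others = set()
    for owner, rows in CAMPAIGN_DATA.items():
        if owner != requesting_campaign:
            others |= set(rows.get(voter_id, {}))
    return seen & others
```

In this toy model, a bug of the kind described would show up as `leaked_fields` returning a non-empty set for some campaign.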

epaulson10y ago

Very. Expensive, and many states have legal restrictions on what you can use it for and who can get it.

Nationbuilder, a sorta-competitor to NGP, has put together a national voter file and it's reasonably priced. https://elections.nationbuilder.com/about/faq

The DNC/NGP voter file, however, is significantly enhanced - for one, it's got a lot of phone numbers, which most states don't include in their lists. There's a lot of other survey and consumer data associated.

For a national voter file that's good enough to use for, say, a City Council race anywhere in the country, where all you want to know is "who is likely to vote in this non-presidential election", it's doable but very expensive. For anything serious, it's pretty much outside the capabilities of anyone but the parties and some of the very big SuperPACs/very big orgs.

dragonwriter10y ago

As I understand it, the data set contains proprietary information from the campaigns using it, so it is impossible to reconstruct it from any public source, or really any source unless the campaign has retained separate copies of all the data (which is probably impractical.)

snarfy10y ago

Bad faith seems like a pretty nefarious claim. For all we know Hillary's campaign was accessing Sanders' data this whole time. The breach went both ways.

smadge10y ago

Well the press release states: "Our team removed access to the affected data, and determined that only one campaign took actions that could possibly have led to it retaining data to which it should not have had access."

luckynonce10y ago

The Sanders campaign did not report the recent issue to the DNC or NGP VAN.

The Sanders campaign had reported a different issue with a different vendor's software in the past.

slg10y ago· 5 in thread

If you believe the Sanders camp, this sounds a lot like the Instagram bug bounty issue [1] that appeared on HN recently. Someone from the Sanders campaign identified a bug and, to prove there was an issue, grabbed private data that they should never have had the ability to access. That is questionable ethically whether they looked at the data or not. The DNC also can't immediately tell if it is the truth or if the data was taken maliciously. Given that, I don't think it is unreasonable to temporarily shut out the Sanders campaign until it was fixed. Although if I were in charge, I would shut out all campaigns until the matter is fully investigated. It isn't fair to disable one campaign if there was nothing malicious happening. (Never mind, see edit)

EDIT: Actually, on second reading, the Sanders lockout was not for security reasons and was only done by the DNC while awaiting full details from the campaign. In that instance it wouldn't make sense to suspend any other campaign's access. They are punishing the Sanders campaign in hopes that it causes a quick confession of the exact details of what data the campaign accessed and retained. I still don't think that response is as unreasonable as some Sanders supporters are alleging.

[1] - https://news.ycombinator.com/item?id=10754194

smadge10y ago

I think it is pretty unreasonable. As you note, there is no technical reason to deny the Bernie campaign access to their data. The Bernie campaign has fully indicated they want and are willing to cooperate with a third party investigation into the data breach, which would require investigating both campaigns, the DNC, and NGP VAN. Given they are already willing to share everything they know about the incident, there is absolutely no legitimate reason for the DNC to intentionally sabotage the campaign.

slg10y ago

There is no technical reason, but that doesn't mean there is no reason. Sanders campaign may have violated rules. The DNC has thrown them in jail without bail in hopes that it gets things resolved quickly. I have no problem with that. If the DNC drags this process out that would be a very different story.

Zikes10y ago

Because of the audit, they were quickly able to identify who was accessing data they shouldn't have been able to access. Presumably the temporary lockout is not to prevent further data breaches, as that bug was already fixed, but to minimize the potential damage of the data breach they did identify.

noobermin10y ago

Every time something like this happens, non-technical people don't know how to respond to it. Just look at the DNC Chairwoman's response[0],

"That is just like if you walked into someone's home when the door was unlocked and took things that don't belong to you in order to use them for your own benefit."

Essentially, "gray-hat" hacking isn't always seen as a friendly warning to the vulnerable as much as it might be an attack. One has to wonder, if one could draw a physical parallel between trespassing and a gray-hat style hack, if you did enter someone's house, take their gold watch from their bedroom, then walk down to you sitting at your breakfast table and tap you on the shoulder, and then say, "Hey bro, your door was open, and you didn't even secure your jewelry in a safe with a key in case someone did break in. I did this to demonstrate your house's vulnerabilities, you should be grateful! Maybe even give me a little something for my troubles..."

Of course, the parallel might not be fair, since one can't draw a parallel between a private house and a server with a public facing access point to sensitive material, so the closest proxy I can think of is a bank. Still, a similar parable can be drawn here: You rob a bank without tripping alarms and hand the manager $30000 of stolen money, and claim you did it to warn him/er of issues with the vault's security. In that case, it's plausible to assume s/he might not be that receptive.

I think it's great that penetration testers and people of the like are very willing to do the hard work of finding holes in security systems--and not use it for nefarious purposes, but actually disclose it to companies so that they can bolster their systems--but how exactly is the hacked party supposed to take it?

[0] http://www.cnn.com/2015/12/18/politics/bernie-sanders-campai...

noobermin10y ago

can't edit, but my "you's" got switched up in my story about breaking into someone's house.

justinzollars10y ago· 4 in thread

I'm sure Sanders was just polling well, and this is the perfect opportunity for the DNC to pull the rug out from under his campaign.

NGP-VAN is crap hack software anyways.

diyorgasms10y ago

Right? Bernie just got some endorsements (which he has been sorely lacking), and all of a sudden this company (the CEO of which is a public Clinton supporter) has a problem that affects the Sanders campaign but not the Clinton campaign, on word from the company that there was a bad actor in the Sanders campaign but not the Clinton campaign.

Sure it's possible that the Sanders campaign did exploit this and the Clinton campaign did not. But I'm skeptical as hell given the political allegiances of the company's leadership.

morninj10y ago

Having actually used NGP-VAN, I think it's far more likely that this was a bug, not a conspiracy. The VAN is a real clunker in many ways, so it's not surprising that a bug like this would appear; and public exposure of a conspiracy to sabotage Sanders would be so catastrophic to the Clinton campaign that I think it's highly unlikely that NGP-VAN would do it.

pool10y ago

When I need a reminder that I'm not like most people, I just need to look at how something in human psychology means that being the president / prime minister / dictator's son/brother/wife means that you get your turn as well.

Maybe it really is nothing more than "Oh, I've heard of pepsi, so I'd better buy a fucking ton of blue-labeled sugar water every week of my life", but there might be something else evolutionary about power and loyalty and reward.

brown9-210y ago

If so then this would be a non-event right?

toufka10y ago· 2 in thread

A significant problem with 'dynasties' is that you start to get perceived, if not real, conflicts of interest above and beyond governance itself.

As was pointed out in this reddit thread [1], the CEO of NGP VAN (Stu Trevelyan) is a strong supporter of Hillary Clinton and worked on the 1992 Clinton-Gore "War Room," and then in the Clinton White House [2].

[1] https://www.reddit.com/r/technology/comments/3xbt3w/bernie_s...

[2] https://personaldemocracy.com/stu-trevelyan

joshdick10y ago

It's irrelevant who the NGP VAN CEO supports: The decision to cut off the Sanders campaign was made by the DNC, not him.

dragonwriter10y ago

Even without dynasties, it's not at all uncommon (in fact, it's rather routine) for either party committee chairs or firms serving parties or campaigns to have close ties to one or more of the candidates.

digitalzombie10y ago· 2 in thread

That bug seems to be setting back Bernie Sanders, which sucks.

The media is going to have a field day with this.

Zikes10y ago

Well, at least the media blackout is over.

gingerrr10y ago

Right? I thought I was still asleep when I opened the news to see a Sanders headline, until I read the story

thieving_magpie10y ago· 1 in thread

A bug of that nature, completely bypassing all permissions, made it past testing (I presume they test). Whatever happened afterward is noise to me. How the hell do you let that happen?

Hardly getting any blame is a neat trick. I wish I had that luxury.

sneak10y ago

Close enough for government work.