PRISM: The Amazingly Low Cost of Using Big Data to Know More About You

(highscalability.com)

138 points · JeffDClark · 12y ago · 28 comments

skwirl · 12y ago

According to the PRISM slides, the program costs $20 million per year. This article doesn't mention that, although it should, because it is telling. If the author believes that this program could be implemented at a minimum of $187 million per year, then that $20 million claim is problematic.

Either the $20 million claim is wrong, and then all the information on the slides is suspect, or it is correct, and the scope of PRISM is much smaller than is widely believed, including by the author of this article. Or the author of this article properly understands the scope and is in error in his calculation.

mtgx · 12y ago

PRISM is probably just a small part of the all-encompassing spying program.

sp332 · 12y ago

Right, PRISM is just the project that gives the NSA access to data stored on other companies' (Google, Facebook et al.) servers. The active eavesdropping and other systems aren't included.

diminoten · 12y ago

One of the fundamental issues with these discussions is that things such as what you're saying get thrown into the mix when we're talking about things we've actually seen evidence for.

We've, for a very long time, said things like, "this is probably happening". That is in no way whatsoever a novel idea. What is novel, and why these discussions are happening so frequently now, is that we have evidence that a 20mil/year program is actually happening.

So when we're talking about things we have evidence for, let's please avoid throwing in conjecture.

tokenadult · 12y ago

Multiple replies below have already questioned this either-or choice you present. From what I know about government agency presentations to higher-level authorities who set budgets, the likely claim on the slide is that the marginal cost of PRISM-as-such in an environment in which NSA already has other programs and the facilities to run them is just an insubstantial $20 million. And on the more extravagant assumptions of the submitted article, that might very well be a true claim for a PRISM program that gathers and analyzes quite a lot of data. That's especially likely if NSA has low-cost in-house software development capabilities, as it surely does.

ra · 12y ago

It's funny, when I first saw the slide I assumed that $20mm was the annual fee for other intelligence organisations to gain access to PRISM.

yk · 12y ago

If you assume that the servers only retain data for one month, then the server costs are cut by a factor of 12 and you end up with €168M/12 = €14M (roughly $18M), for a total cost of $22M.

Additionally, the post assumes that all the data is stored, and that is a lot of cat videos. With decent preprocessing you can probably cut the data rate by a rather large factor (I would assume at least 100, since you do not need to store warez or the NYT homepage). Then, to do the opposite estimate by assuming that the system is CPU bound, one needs hardware to process 120 GB/s. With roughly $10M you can then buy a few thousand machines, and your PRISM software needs to handle something like 50 MB/s per machine (which may or may not be a reasonable data rate, depending on the sophistication of the algorithms and how much can be discarded very easily).
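The back-of-envelope numbers in this comment can be checked with a short sketch. The €168M, 120 GB/s, 50 MB/s and $10M figures come from the comment; the exchange rate and the function names are assumptions made for illustration, and the 120 GB/s rate is taken as given rather than derived.

```python
import math

# Figures quoted in the comment.
SERVER_COST_EUR_PER_YEAR = 168e6   # yearly server cost used in the comment
INGEST_RATE_MB_S = 120 * 1000      # 120 GB/s, in decimal units (1 GB = 1000 MB)
PER_MACHINE_MB_S = 50              # throughput each machine would need to handle
HARDWARE_BUDGET_USD = 10e6         # "roughly $10M"

# Assumption, not stated in the thread: a rate consistent with the
# comment's "EUR 14M is roughly USD 18M".
USD_PER_EUR = 1.3


def server_cost_for_retention(annual_cost: float, retention_months: int) -> float:
    """Scale a yearly storage cost linearly to a shorter retention window."""
    return annual_cost * retention_months / 12


def machines_needed(total_mb_s: float, per_machine_mb_s: float) -> int:
    """Machines required if the workload is purely throughput bound."""
    return math.ceil(total_mb_s / per_machine_mb_s)


one_month_retention_eur = server_cost_for_retention(SERVER_COST_EUR_PER_YEAR, 1)
one_month_retention_usd = one_month_retention_eur * USD_PER_EUR
machine_count = machines_needed(INGEST_RATE_MB_S, PER_MACHINE_MB_S)
budget_per_machine_usd = HARDWARE_BUDGET_USD / machine_count

print(f"One-month retention: EUR {one_month_retention_eur / 1e6:.1f}M "
      f"(about USD {one_month_retention_usd / 1e6:.1f}M)")
print(f"Machines at {PER_MACHINE_MB_S} MB/s each: {machine_count}")
print(f"Budget per machine: about USD {budget_per_machine_usd:,.0f}")
```

Under these assumptions, 120 GB/s at 50 MB/s per machine works out to 2,400 machines, which matches the "few thousand machines" estimate and leaves a little over $4,000 per machine from a $10M budget.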

reeses · 12y ago

Big Data is just data before grep is applied.

bigiain · 12y ago

Right - I'm off to write my "email stored steganographically in cat videos" service...

jlgaddis · 12y ago

> This is a worst case scenario that does not include potential discounts due to renting such a high volume of hardware and traffic or acquiring the aforementioned hardware (which incurs a higher initial investment but lower recurring costs).

"worst case scenario" is emphasized in the article.

miahi · 12y ago

The author counts the storage on a yearly basis (servers to store a year of data). If you allow an expiration date for the records (let's say 4 years), after that period you can spend less on hardware, as you can free space from the old records. Then you only need to spend money on the traffic difference (as the traffic would increase in 4 years).

As the storage boxes in the article also have a nice CPU, the collected data can be indexed and then compressed, saving a lot of space.

samatman · 12y ago

Given that the Internet grows exponentially year by year, while the cost to store a bit of information drops in a similar fashion, I doubt there is any money to be saved by deleting old data. The savings in system complexity are likely to handily outweigh the additional cost of storage, not to mention that it isn't worth one iota of frustration and bad reviews if data an analyst wants is not available.

dreamfactory · 12y ago

$20M doesn't necessarily include infrastructure (which is already covered, e.g. in Utah); it could just be program running costs.

vehementi · 12y ago

Author says it is worst case upper bound.

droz · 12y ago

The developer salary costs seem really outlandish. 500k euro for a "top notch developer" and 250k euro for "supporting developers". Where are these estimates coming from?

lemming · 12y ago

500k is not that outlandish for a system like this. Cameron Purdy, who developed Tangosol (now Oracle) Coherence, said that he regularly turned down $500k job offers. Bear in mind that their idea of "top notch" is not what gets bandied around here as "rock star". They're not programming Rails. They're talking about people who have had their shit together since day one, have always been top of their class even after they got into the best universities, and have applied themselves their whole lives to the theory and practice of building big systems. I've built some biggish systems but I still spent a large amount of my university years and youth generally studying beer, skirts and house music. It makes a difference; I'm a long, long way from what they're talking about.

Here's how I would think about it if I were building this. The total hardware costs are 168M, and the total personnel costs are 4M. Say I pay $500K instead of $200k and in doing so I get Jeffrey Dean instead of someone like me (I suspect I might have to pay more than $500k for Jeffrey Dean but bear with me). My costs have doubled but the efficiency of the system might be 5x or 10x better because I'm just quite good at my job and he's a total legend. That efficiency scales the total hardware cost, which dwarfs the personnel cost. I'd say $500k starts to look pretty cheap at that scale.
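A rough sketch of this trade-off, using the 168M hardware and 4M personnel figures from the comment. The assumptions are loud ones: hardware cost shrinks in direct proportion to the efficiency gain, "doubled" personnel cost is modelled as a 2x multiplier, and all amounts are treated as millions in a single currency even though the thread mixes euros and dollars. The function names are hypothetical.

```python
def total_cost(hardware: float, personnel: float,
               personnel_multiplier: float = 1.0,
               efficiency_gain: float = 1.0) -> float:
    """Total cost if pricier staff make the system `efficiency_gain` times
    more efficient. Assumes hardware cost scales as 1 / efficiency_gain."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return hardware / efficiency_gain + personnel * personnel_multiplier


def break_even_gain(hardware: float, extra_personnel: float) -> float:
    """Efficiency gain at which the extra personnel spend pays for itself."""
    if extra_personnel >= hardware:
        raise ValueError("extra personnel cost must be below hardware cost")
    return hardware / (hardware - extra_personnel)


HARDWARE_M = 168.0   # from the comment, in millions
PERSONNEL_M = 4.0    # from the comment, in millions

baseline = total_cost(HARDWARE_M, PERSONNEL_M)
with_5x = total_cost(HARDWARE_M, PERSONNEL_M, personnel_multiplier=2.0, efficiency_gain=5.0)
with_10x = total_cost(HARDWARE_M, PERSONNEL_M, personnel_multiplier=2.0, efficiency_gain=10.0)
needed_gain = break_even_gain(HARDWARE_M, extra_personnel=PERSONNEL_M)

print(f"Baseline: {baseline:.1f}M")
print(f"Doubled personnel, 5x efficiency: {with_5x:.1f}M")
print(f"Doubled personnel, 10x efficiency: {with_10x:.1f}M")
print(f"Break-even efficiency gain: {needed_gain:.3f}x")
```

In this simplified model, doubling the personnel bill pays for itself once the better team improves efficiency by only a few percent, which is the point about hardware cost dwarfing personnel cost.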

rdouble · 12y ago

It's the cost of renting a developer with security clearance from BAH. Snowden would likely be in the same salary band as a "supporting developer." He was taking home $122K, thus his actual cost to the government was likely $250-$300K.

angersock · 12y ago

This is what's referred to as "blood money".

I wouldn't sell out my fellow man for anything less.

bigiain · 12y ago

If anybody's buying, I'd consider selling out my fellow man for $490K (with suitable benefits). ;-)

LordHumungous · 12y ago

Considering the scale and importance of the system, that seems low.

capkutay · 12y ago

I would assume the government would require services and large teams to help them make sense of the data, along with the infrastructure described by the article.

Sven7 · 12y ago

I am pretty sure they can't make sense of the data, nor can they afford the sort of talent that might be able to make sense of the data. So they do what's easy: just keep collecting more.

It's much more efficient as an employment/pension guarantee scheme than as an intelligence tool.

Reminds me of the quote from Yes Minister: "Something must be done. This is something. Therefore we must do it."

dreamfactory · 12y ago

According to an ex-Stasi chief on these revelations: “It is the height of naivete to think that once collected this information won’t be used”.

http://news.cnet.com/8301-1009_3-57591551-83/ex-stasi-boss-g...

ww520 · 12y ago

The low cost is probably because the partner commercial companies are "donating" their vast collecting resources to help the program.

AsymetricCom · 12y ago

Well, eliminating redundant or spurious data further down the stack can make these kinds of big data operations very cheap. It's the difference between parsing unstructured logs and a flat file. There are no resources being spent if the data is proactively being maintained and curated.

sneak · 12y ago

From TFA:

> Do you think that PRISM can be built using a different tech stack?

From https://en.wikipedia.org/wiki/Apache_Accumulo :

> Apache Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms.

> Accumulo was created in 2008 by the National Security Agency and contributed to the Apache Foundation as an incubator project in September 2011.

mitchi · 12y ago

Please, our CANADIAN Firearms Registry program cost 66M a year. And that's Canada, the gun land!

I would expect PRISM to at LEAST cost more than Instagram, no? :)
