This is not apples to apples, but WhatsApp is a product that ran entirely on 16 servers at the time of acquisition (1.5 billion users). It really raises the question of why Twitter uses so much compute when other companies have operated significantly more efficiently. Twitter was unprofitable at the time of its acquisition and spent around half its revenue on compute; maybe some of these features were not really necessary (and were just burning money)?
WhatsApp is not an applicable comparison because messages and videos are stored on the client device. Better to look at Pinterest and Snap, which also spend a lot on infra.
The issue is storage, ads, and ML to name a few. For example, from 2015:
“Our Hadoop filesystems host over 300PB of data on tens of thousands of servers. We scale HDFS by federating multiple namespaces.”
You can also see their hardware usage broken down by service, as laid out on their engineering blog.
https://blog.twitter.com/engineering/en_us/topics/infrastruc...
https://blog.twitter.com/engineering/en_us/a/2015/hadoop-fil....
- 450m DAUs at the time of the Facebook acquisition [0]
- Twitter is not just DMs or Group Chat.
> It really begs the question why Twitter uses so much compute if there are companies that have operated significantly more efficiently.
A fairer comparison might have been Instagram: while Systrom did run a relatively lean eng org, they never had to monetize and were acquired before they grew much beyond ~50m users.
[0] https://www.sequoiacap.com/article/four-numbers-that-explain...
Twitter, while still not profitable (maybe it was in some recent quarters?), was much closer to it, having all the components necessary for a reasonable ad business. For ads, analytics is critical, plus all the ad serving; and it's a totally different scale of compute, being many-to-many rather than one-to-~one.
> This is not apples to apples but Whatsapp
And yeah, WhatsApp isn't even close to an apt comparison. It's a completely different business model with vastly different engineering requirements.
Is Twitter bloated? Perhaps, but it's probably driven by business reasons, not (just) by engineers wanting to make a bunch of toys and promo projects (though that obviously always plays some role).
In terms of what Twitter uses compute on, I'd guess analytics (Twitter measures "everything" for ad serving - go explore ads.twitter.com and analytics.twitter.com) and non-chronological timeline mixing both take orders of magnitude more resources than the basic functionality.
1. Lots of batch jobs. Sometimes it's unclear how much value they produce / whether they're still used.
2. Twitter probably made a mistake early on in taking a fanout-on-write approach to populate feeds. This is super expensive and necessitates a lot of additional infrastructure. There is a good video about it here: https://www.youtube.com/watch?v=WEgCjwyXvwc
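To make the cost concrete, here's a minimal in-memory sketch of fanout-on-write (all names here are hypothetical, nothing like Twitter's actual implementation): every post is copied into each follower's precomputed timeline, so reading a feed is cheap, but one tweet from an account with millions of followers turns into millions of writes.

```python
from collections import defaultdict

# Toy fanout-on-write model. Illustrative only.
followers = defaultdict(set)   # author -> set of follower ids
timelines = defaultdict(list)  # user -> list of tweet ids (newest first)

def follow(follower, author):
    followers[author].add(follower)

def post(author, tweet_id):
    # Write amplification: one timeline write per follower.
    # A celebrity account turns a single tweet into millions of writes,
    # which is why this approach gets so expensive at scale.
    for f in followers[author]:
        timelines[f].insert(0, tweet_id)

follow("alice", "bob")
follow("carol", "bob")
post("bob", "t1")
print(timelines["alice"])  # ['t1']
print(timelines["carol"])  # ['t1']
```

The alternative (fanout-on-read) stores each tweet once and merges followees' tweets at read time, trading cheap writes for expensive reads; the video linked above covers the hybrid approaches used in practice.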
WhatsApp is mostly a bent pipe connecting user devices and relying on device storage. If WhatsApp had to implement Twitter's functionality and business model (with the same engineers and stack), they'd need a lot more servers too. I'd hazard the number of servers would be of the same order of magnitude.
I don’t know enough about Twitter to assess their infrastructure, but I know that it’s easy to run lean until there’s a problem, and then you get trapped.
Erlang is particularly well-suited to building distributed systems because it was designed to handle failures at the process level rather than the hardware level. This means that if one part of a distributed system built with Erlang fails, the rest of the system can continue to operate without interruption.
This is critical for a messaging platform like WhatsApp, which needs to be able to handle millions of users sending messages simultaneously without experiencing any downtime. Additionally, Erlang's concurrency features allow it to support many thousands of simultaneous connections on a single machine, which is also important for a messaging platform that needs to be able to handle a large volume of traffic.
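As a rough illustration of the "let it crash" idea (a toy Python sketch, not Erlang and not WhatsApp's actual design): a supervisor runs each handler in isolation, and when one fails it restarts only that handler while the others are unaffected. In real Erlang/OTP this is done with lightweight processes arranged in supervisor trees, which is far cheaper and more robust than anything sketched here.

```python
# Toy supervisor loop: handlers that raise are restarted up to a limit,
# and a crash in one handler never takes down the others.
def run_supervised(handlers, max_restarts=3):
    """Run each named handler, restarting any that raise."""
    results, restarts = {}, {}
    for name, fn in handlers.items():
        for attempt in range(max_restarts + 1):
            try:
                results[name] = fn(attempt)
                break  # handler finished cleanly
            except Exception:
                restarts[name] = restarts.get(name, 0) + 1

    return results, restarts

def flaky(attempt):
    # Simulated failure: crashes on the first attempt, then recovers.
    if attempt == 0:
        raise RuntimeError("connection handler crashed")
    return "ok"

def stable(attempt):
    return "ok"

results, restarts = run_supervised({"conn-1": stable, "conn-2": flaky})
print(results)   # {'conn-1': 'ok', 'conn-2': 'ok'}
print(restarts)  # {'conn-2': 1}
```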