(Disclosure: I used to work at Oracle on Financials Cloud.)
"Any sufficiently complicated financial reporting system contains an ad-hoc, informally-specified, bug-ridden implementation of double-entry accounting."
It's such a powerful yet simple way of thinking about money that anyone who builds a billing or accounting system should be intimately familiar with it.
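The core idea really is simple enough to fit in a few lines. A minimal sketch (plain Python, hypothetical account names, amounts in cents) of the one invariant every double-entry system enforces: a journal entry's debits must equal its credits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    account: str
    debit: int = 0   # amounts in cents
    credit: int = 0


def post(ledger, lines):
    """Append a journal entry, rejecting it unless debits equal credits."""
    if sum(l.debit for l in lines) != sum(l.credit for l in lines):
        raise ValueError("unbalanced journal entry")
    ledger.extend(lines)


ledger = []
# Guest pays 100.00: cash comes in, the receivable is cleared.
post(ledger, [Line("cash", debit=10000),
              Line("receivables", credit=10000)])
```

Because every entry balances, the whole ledger balances by construction, which is exactly the property ad-hoc billing systems tend to lose.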
Though whether SQL is a functional language (or a programming language at all, if you're talking ANSI SQL) is a subtle question, I would at the very least not describe it as a traditional imperative programming language. I think this is an important distinction for the article because, contrary to what it suggests, I've found SQL to be quite helpful for understanding functional and declarative programming concepts. That said, it might be a lot easier to express the kinds of tasks in the article as straightforward functions rather than getting wrapped up in all this set-based SQL talk.
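To make the contrast concrete, here is the same toy aggregation expressed both ways, using Python's built-in `sqlite3` and an invented `bookings` table: the SQL states *what* to compute over the whole set, while the function spells out *how* to loop.

```python
import sqlite3

# Hypothetical bookings table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (host TEXT, amount INTEGER)")
conn.executemany("INSERT INTO bookings VALUES (?, ?)",
                 [("a", 100), ("a", 50), ("b", 70)])

# Declarative: describe the result set, let the engine pick the plan.
declarative = dict(conn.execute(
    "SELECT host, SUM(amount) FROM bookings GROUP BY host"))

# The same task as a straightforward function over an iterable.
def totals(rows):
    out = {}
    for host, amount in rows:
        out[host] = out.get(host, 0) + amount
    return out

assert declarative == totals([("a", 100), ("a", 50), ("b", 70)])
```

Both produce the same mapping; which one is clearer arguably depends on whether the reader thinks in sets or in loops.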
When you program directly against Spark, you're effectively building SQL plans explicitly. It's both more indirect (instead of writing a program that does stuff, you write a program that creates a data-flow graph that does stuff) and it gives you more responsibility for performance, for good and for bad.
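A toy illustration of that indirection (plain Python, not the Spark API): the "program" below only builds a graph of deferred operations, and nothing executes until an action like `collect()` forces the whole pipeline.

```python
class Node:
    """A deferred operation: holds a function plus its upstream nodes."""
    def __init__(self, fn, *parents):
        self.fn, self.parents = fn, parents

    def collect(self):
        # Only an "action" like collect() actually runs the graph.
        return self.fn(*(p.collect() for p in self.parents))


source  = Node(lambda: [1, 2, 3, 4])
doubled = Node(lambda xs: [x * 2 for x in xs], source)          # no work yet
evens   = Node(lambda xs: [x for x in xs if x % 4 == 0], doubled)  # still no work

result = evens.collect()  # the whole pipeline executes here -> [4, 8]
```

Spark's real win is that, given the whole graph up front, it can reorder, fuse, and distribute the work, which is also why you end up owning more of the performance story.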
I think to get good performance, you simply can't think on a per-item basis. You need to orient your thinking towards what can be efficiently performed at the bulk level. Whether it's column scanning in HDFS, or index scanning in a RDBMS, you need to be aware of the engineering properties of the operators you're applying. Doing lots of things per-item is a recipe for blowing your budgets, whether it's cache, memory, I/O, whatever. You want to iteratively do a little work to lots of items, and then join, rather than lots of work to each item one at a time.
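The join-versus-per-item point can be shown with a tiny, hypothetical example: probing a list once per item is quadratic, while building a hash index once and joining against it does a little work to lots of items.

```python
orders = [("o1", "alice"), ("o2", "bob"), ("o3", "alice")]
users  = [("alice", "US"), ("bob", "DE")]

# Per-item: scan the users list for every order -> O(len(orders) * len(users)).
slow = [(o, next(c for u, c in users if u == name)) for o, name in orders]

# Bulk: build the index once, then do cheap hash probes -> O(n + m).
index = dict(users)
fast = [(o, index[name]) for o, name in orders]

assert slow == fast
```

At three rows the difference is invisible; at billions of rows the per-item version is exactly the cache/memory/I/O budget-blower described above.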
I've written something similar to the first part - extract raw data out of various source DBs using SQL queries, then push it to our organisation's ERP product (SAP) using A2A messaging.
From my view SAP is a black box, but it handles the actual accounting/financial logic part, i.e. ledgers, product tracking, inventory management, etc. Our accountants all seem pretty comfortable using it.
We have been using it at the finance company I work for to maintain customers' ledgers.
Can it?
This post is a fascinating look at how things can be run at a big corp. I always wondered how big companies did accounting. This stuff can't be outsourced, since it's so tightly coupled to your business logic.
Are there any tech conferences geared toward this stuff?
Any accounting SaaS that can scale indefinitely? As opposed to Xero?
Need to wait for AWS to release a version of Airbnb's system. ;)
Many big corporations use SAP or Oracle.
The one thing I hate about AirBnB.... Looking to rent a nice place for 10k 9 months in advance? Ok...we'll charge it RIGHT NOW.
Why not do it like Amazon and charge when the package is shipped (trip date has arrived)? I get the cashflow thing but come on... (100 bookings and counting...)
Of course AirBnB could forward the money to the host, but doing it the way they do cuts out the problem entirely.
If you were talking about exporting the data from the event based system, then that's still possible, and I think it's something that our finance team may still be evaluating, but I can't speak for them.
In other words - because they can and should (and can afford to, and not be hooked up to some third-party vendor milking them for the rest of their company's existence).
In my opinion Superset actually fills a niche that was open for far too long.
AirBnB is using an extract, load and transform architecture. There's no mention of the hardware, the data throughput, or whether they have a message broker/queue to ease the burden of peak volume, but it works.
I have a strong feeling that they could have 1) kept the system exactly how it is and done some performance tuning. But that's not sexy anymore. Things are just supposed to scale. Which brings me to
2) moved the transformation logic to its own server, or multiple servers, using a message broker and queue to aid the transfer of data between systems. It would have been more readable and could have been done in a month or less.
In summary, I believe they should have put some effort in to keep SQL, especially for the purpose of accounting, because Spark does not lend itself to readable logic.
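The broker-and-queue idea in 2) can be sketched with nothing but the standard library; in production this would be Kafka, RabbitMQ, or similar rather than an in-process `queue.Queue`, and the transform shown is an invented placeholder.

```python
import queue
import threading

work = queue.Queue(maxsize=1000)  # bounded: applies back-pressure at peak volume
results = []

def transformer():
    # Dedicated transformation worker, decoupled from the extract step.
    while True:
        row = work.get()
        if row is None:          # sentinel: producer is done
            break
        results.append({"amount_cents": row["amount"] * 100})
        work.task_done()

t = threading.Thread(target=transformer)
t.start()
for row in [{"amount": 10}, {"amount": 25}]:  # the "extract" side
    work.put(row)
work.put(None)
t.join()
```

The bounded queue is the key design choice here: when the transform side falls behind, `put()` blocks, so peak load degrades gracefully instead of overwhelming the consumer.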
Booking date (I understand that on that day, the booking is confirmed, but the guest's payment method has not yet been charged)
Dr Receivables from guests 100
Cr Future host payout 90 (problematic: how can you pre-recognise a liability on your balance sheet? Unless this is an off-balance-sheet account...)
Cr Deferred income 10 (problematic: deferred revenue arises when you receive a pre-payment from customers and have a standing obligation to them to render the services)

Payment made by guest
Dr Cash 100
Cr Receivable 100

Check-in date
Dr Deferred revenue 10
Dr ??? ??? (seems to be missing to balance the double-entry)
Cr Payable 90
To me, the natural way to do this would be:

Booking date - no accounting entries, no effect on the books. You did not render a service to the customer yet, nor fulfil or incur any obligations yet.
Payment received from guest - you received a prepayment for future services to be rendered, so now have a liability to the customer to fulfil this obligation, i.e., deferred revenue.
Dr Cash 100
Cr Deferred revenue 10
Cr Payables to host 90 (unsure about this one, as this goes into the whole gross vs. net revenue recognition discussion for marketplace-type businesses)
Day after check-in - recognising the revenue on one single day might work now, but it's only a makeshift solution: what if your customer stays for a longer period of time, or across two accounting periods, and this becomes material? By definition, you recognise revenue proportionally, but I get that the cost vs. benefit of doing this now might be unfavourable.
Dr Deferred revenue 10
Cr Revenue 10
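The proposed scheme can be checked mechanically; a small sketch using the 100/90/10 amounts from this thread, with entries as (account, debit, credit) tuples, verifies that each event leaves the books balanced.

```python
def balanced(entry):
    """True if an entry's total debits equal its total credits."""
    return (sum(d for _, d, _ in entry) ==
            sum(c for _, _, c in entry))

# Payment received from guest: prepayment creates liabilities.
payment = [("cash",              100, 0),
           ("deferred revenue",    0, 10),
           ("payables to host",    0, 90)]

# Day after check-in: the service was rendered, so recognise the fee.
recognition = [("deferred revenue", 10, 0),
               ("revenue",           0, 10)]

assert balanced(payment) and balanced(recognition)
```

Notably, the booking-date event needs no entry at all under this scheme, which is exactly what makes it easier to defend than pre-recognising a host liability.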