One of the biggest tech stories so far this year was Facebook paying $19bn for the messaging app Whatsapp. Quite why Facebook would pay such an eye-watering sum for a chat app is an interesting question in itself, and a bit outside the scope of this article, but Benedict Evans’s analysis is well worth reading.
The most interesting point from my perspective is that Whatsapp may now be handling more messages than the entire global SMS system combined – and all with just 32 engineers.
WhatsApp message volume growth is still accelerating. May have overtaken global SMS. pic.twitter.com/KsR85Mplrt
— Benedict Evans (@BenedictEvans) February 19, 2014
Chat is hard, let’s go shopping
Now that’s no mean feat, because chat applications are notoriously difficult to execute well over the web’s lingua franca, HTTP. This is because HTTP is, at heart, a document retrieval system. Think of Tim Berners-Lee, designing a system to view academic papers at CERN. Think about yourself reading BBC News: you see a link and click on it; the browser opens a connection, requests the document and downloads it; the server closes the connection.
The key point here is how HTTP manages connections: the server closes them once a response has been served (or, with HTTP/1.1 keep-alive, shortly afterwards, once the connection has sat idle). This means that, although many people may be reading BBC News at any one time, only a small proportion are actually connected to the server simultaneously, namely those that are in the process of downloading a page. And this in turn has profound implications for server architecture, as we shall see.
Now a chat workflow is decidedly different. Think about what you want from a chat app:
- you want to be able to send multiple messages in a rapid fire burst
- you don’t necessarily want or expect to receive an immediate response
- you do however want the server to deliver your messages instantaneously to their intended recipients
- you want messages intended for you to be delivered tout de suite in order to maintain the chat flow
All quite different from a simple request/response.
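To make the contrast concrete, here is a minimal sketch of the push model that list describes. It assumes every connected client is represented by an in-memory `asyncio.Queue` standing in for an open socket; `ChatServer`, `connect` and `send` are hypothetical names for illustration, not how Whatsapp or any real service is actually built.

```python
from __future__ import annotations

import asyncio


class ChatServer:
    """Hypothetical in-memory chat hub: one queue per connected client."""

    def __init__(self) -> None:
        # Each open "connection" is modelled as a queue the server can write to at any time.
        self.connections: dict[str, asyncio.Queue[str]] = {}

    def connect(self, user: str) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self.connections[user] = queue
        return queue

    def send(self, sender: str, recipient: str, text: str) -> bool:
        # Push straight onto the recipient's open connection; no database round trip.
        queue = self.connections.get(recipient)
        if queue is None:
            return False  # recipient offline: a real system would have to store the message
        queue.put_nowait(f"{sender}: {text}")
        return True


async def demo() -> list[str]:
    server = ChatServer()
    alice_inbox = server.connect("alice")
    server.connect("bob")
    # Bob fires off a rapid burst without waiting for any reply.
    for text in ("hi", "you there?", "ok, call me"):
        server.send("bob", "alice", text)
    # Alice receives each message as soon as it has been pushed.
    return [await alice_inbox.get() for _ in range(3)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the server holds a handle to every recipient, delivery is a direct write; the catch is that it has to hold one of those handles for every online client at once.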
It’s all about the concurrency
And if you do try to implement chat over HTTP, you’re in for all sorts of trouble; basically, the ‘instantaneous’ nature of the chat experience is lost. Since you don’t have open connections with every client, you can’t send messages instantaneously; you’ll need to write messages to the database first, then have separate worker processes reading them back and ‘pushing’ them to the clients, which means more resources and a worse user experience.
There is a solution, of course: just ditch HTTP and maintain an open connection with every client you’re talking to. But this opens a huge can of worms. Having been used to handling connections from only a small fraction of clients at any one time, you now have to handle connections with every single client at the same time; it wouldn’t be surprising to see the number of active connections rise tenfold or a hundredfold.
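The latency cost of that store-then-poll workaround can be estimated with a toy model. It assumes a worker that wakes every `poll_interval` seconds and delivers whatever has been written since its last pass; `polling_delays` and the timings are hypothetical. A message that lands just after a poll waits almost a full interval, so on average each message waits about half an interval, on top of the database write and read.

```python
import math
import random


def polling_delays(arrivals: list[float], poll_interval: float) -> list[float]:
    """Extra delay each message suffers before a polling worker picks it up.

    Assumes the worker polls at t = 0, poll_interval, 2 * poll_interval, ...
    and delivers every message stored since its previous poll.
    """
    delays = []
    for t in arrivals:
        next_poll = math.ceil(t / poll_interval) * poll_interval
        delays.append(next_poll - t)
    return delays


if __name__ == "__main__":
    rng = random.Random(42)
    # 10,000 messages arriving at random times over one hour (hypothetical load).
    arrivals = [rng.uniform(0, 3600) for _ in range(10_000)]
    delays = polling_delays(arrivals, poll_interval=2.0)
    print(f"mean extra latency: {sum(delays) / len(delays):.2f}s")  # roughly half the interval
```

Shortening the interval cuts the delay but multiplies the number of database reads, which is exactly the "more resources" trade-off.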
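Little’s law (average connections open = arrival rate × average time each connection is held) makes that jump easy to see. Here is a back-of-the-envelope sketch in which every figure is a hypothetical assumption rather than a measurement from any real site:

```python
def concurrent_connections(arrival_rate_per_s: float, mean_hold_s: float) -> float:
    """Little's law: mean connections open L = arrival rate (lambda) * mean hold time (W)."""
    return arrival_rate_per_s * mean_hold_s


# Hypothetical audience: 100,000 people actively using the service.
users_online = 100_000

# Document-style HTTP: each user fetches a page every 20 s (hypothetical) and the
# connection is open only for the ~0.2 s download before the server closes it.
http_open = concurrent_connections(users_online / 20, 0.2)

# Persistent chat connections: every online user holds a socket for the whole
# session, so the open-connection count is simply the number of users online.
chat_open = users_online

print(f"HTTP: ~{http_open:,.0f} open, chat: {chat_open:,} open, "
      f"ratio ~{chat_open / http_open:.0f}x")
# HTTP: ~1,000 open, chat: 100,000 open, ratio ~100x
```

The ratio is driven almost entirely by how long each connection is held, which is why holding every connection open changes the server problem so much.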
In other words, it’s not about the speed of your CPU; chat messages are relatively small. It’s not about your database either; you should be focusing on delivering those messages rather than storing them.
No, chat is all about concurrency: the ability to handle many connections simultaneously. And it’s not an easy thing to do well.
Enough about systems for the moment; I just had a heretical sports-related thought:
A betting exchange is really just a chat application – in place of messages, you have bets; in place of chat groups, you have markets; and in place of the chat transcript, you have the order book
and started to wonder just how many bets Betfair are processing these days:
— Biegemartin (@beigemartin) March 7, 2014
OK, well, that was four years ago; a lot can happen in four years.
Looking at an active betting account, I noticed that bets placed on Jan 31st had ids in the region of 33,800,000,000 and bets on Jan 1st had ids in the region of 32,900,000,000. Assuming ids are allocated sequentially across the whole exchange, we can reasonably conclude that some 900m bets are currently being placed each month, or some 30mm per day.
That’s Whatsapp-style growth! In which case, let’s make an unfair comparison:
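The arithmetic is simple enough to check. This sketch assumes bet ids are allocated sequentially across the whole exchange with one id per bet (an assumption, not something Betfair documents here), and assumes the dates fall in January 2014, although any January gives the same day count:

```python
from datetime import date

# Approximate bet ids read off one account (figures from the text above).
id_jan_1 = 32_900_000_000
id_jan_31 = 33_800_000_000

days = (date(2014, 1, 31) - date(2014, 1, 1)).days  # 30 days between the two samples
bets_in_period = id_jan_31 - id_jan_1               # ids consumed in that window
bets_per_day = bets_in_period / days

print(f"{bets_in_period:,} bets over {days} days = {bets_per_day:,.0f} per day")
# 900,000,000 bets over 30 days = 30,000,000 per day
```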
I’m a big fan of comparisons that are unfair but relevant. That’s what disruption means. The more unfair, the more disruptive.
— Benedict Evans (@BenedictEvans) February 16, 2014
| | Whatsapp | Betfair |
|---|---|---|
|Messages sent / trades per day|20bn|30mm|
|Number of employees|32|2,000 (1)|
|Market value (GBP)|12bn|1.2bn (1)|
|Messages / trades per day per employee|625mm|15,000|
|Market value per employee (GBP)|375mm|600,000|
So it looks as if the average Whatsapp employee handles roughly 41,667 times as many messages as the average Betfair employee handles bets.
That’s quite a multiple.
Is that unfair? Well, you could probably argue that not all of Betfair’s employees are engineers; but even if you say only 1 in 10 is an engineer, Betfair’s numbers still don’t look very good.
Maybe bet messages are larger than chat messages? Well, the text portion is probably a bit bigger, but then Whatsapp is carrying a lot more binary data in the form of photos, so I’d argue that the average Whatsapp message is in fact larger, even if only 1 in 10 people are sending photos.
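For anyone who wants to check the table, here are the same per-head figures and the multiple, plus the 1-in-10 sensitivity from the paragraph above. The inputs are just the table’s numbers; `per_head` is a hypothetical helper.

```python
def per_head(volume_per_day: float, market_value: float, headcount: float) -> tuple[float, float]:
    """Daily volume (messages or bets) and market value, divided by headcount."""
    return volume_per_day / headcount, market_value / headcount


wa_volume, wa_value = per_head(20e9, 12e9, 32)      # Whatsapp: 625mm and 375mm per head
bf_volume, bf_value = per_head(30e6, 1.2e9, 2_000)  # Betfair: 15,000 and 600,000 per head

print(f"volume multiple: {wa_volume / bf_volume:,.0f}x")  # ~41,667x
print(f"value multiple: {wa_value / bf_value:,.0f}x")     # 625x

# Sensitivity check: count only 1 in 10 Betfair staff as engineers.
bf_eng_volume, _ = per_head(30e6, 1.2e9, 2_000 / 10)
print(f"volume multiple vs engineers only: {wa_volume / bf_eng_volume:,.0f}x")  # ~4,167x
```

Even on the generous engineers-only basis, the gap is still in the thousands.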
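The photo argument is really an expected-value calculation. A quick sketch, where the byte sizes are purely hypothetical placeholders chosen for illustration and `mean_message_bytes` is a made-up helper:

```python
def mean_message_bytes(text_bytes: float, photo_bytes: float, photo_share: float) -> float:
    """Expected message size when a fraction `photo_share` of messages carry a photo."""
    return (1 - photo_share) * text_bytes + photo_share * photo_bytes


# Hypothetical sizes: a short chat text, a compressed phone photo, a bet instruction.
chat = mean_message_bytes(text_bytes=100, photo_bytes=100_000, photo_share=0.1)
bet = mean_message_bytes(text_bytes=300, photo_bytes=0, photo_share=0.0)

print(f"chat ~{chat:,.0f} bytes vs bet ~{bet:,.0f} bytes")
```

Even if a bet’s text is several times longer than a chat line, a small share of photos dominates the average.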
So perhaps the comparison isn’t that unfair after all. In which case it’s quite clear that one of these kids is doing it different.
So what exactly are our friends at Whatsapp doing under the hood?
Well, they chose very wisely with Erlang, a language almost exclusively focused on concurrency. Erlang was designed to run telephone exchanges, which somewhat unsurprisingly turn out to have a lot in common with chat systems. And betting exchanges.
Erlang is not new. It was developed in the 90’s by Ericsson to run their AXD range of telephone exchanges. It just turns out that the computer scientists at Ericsson were trying to solve problems in the 90’s that a lot of modern systems have today – Bet365
Erlang meanwhile has no problem handling millions of connections. At the time of writing there are application servers written in Erlang that can handle more than two million connections on a single server in a real production application, with spare memory and CPU – ninenines.eu
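Erlang’s style is one lightweight process per connection, each with its own mailbox. Python is not Erlang, but asyncio tasks give a rough single-threaded analogue of that style. The sketch below, with hypothetical names, spawns one task per simulated connection, each reading from its own queue as a mailbox; it illustrates the programming model only and is not a benchmark of Erlang or of any real server.

```python
import asyncio


async def connection_process(mailbox: asyncio.Queue, replies: asyncio.Queue) -> None:
    """One lightweight 'process' per connection: wait for a message, reply, repeat."""
    while True:
        msg = await mailbox.get()
        if msg is None:  # shutdown signal
            return
        await replies.put(msg.upper())


async def run(n_connections: int) -> int:
    replies: asyncio.Queue = asyncio.Queue()
    mailboxes = [asyncio.Queue() for _ in range(n_connections)]
    tasks = [asyncio.create_task(connection_process(mb, replies)) for mb in mailboxes]

    # Deliver one message to every connection's mailbox, then collect all the replies.
    for i, mb in enumerate(mailboxes):
        mb.put_nowait(f"msg {i}")
    received = [await replies.get() for _ in range(n_connections)]

    # Tell every process to stop and wait for them all to exit.
    for mb in mailboxes:
        mb.put_nowait(None)
    await asyncio.gather(*tasks)
    return len(received)


if __name__ == "__main__":
    print(asyncio.run(run(50_000)))  # 50000
```

Each task is just a small object waiting on its queue, so tens of thousands fit in one process; Erlang pushes the same idea much further, with preemptive scheduling across all cores.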
The most interesting part of the story is perhaps that Erlang is neither a fast language nor a particularly developer-friendly one; yet with it Whatsapp are able to post performance figures that are double-digit multiples of those posted by folks in similar businesses using different stacks.
And what do we know about Betfair’s tech setup? Well, we know that in 2004 they made a well-publicised switch from VB to Java:
The company said they took a lot of time to evaluate the strengths and weaknesses of both Microsoft’s .Net and J2EE at the early stages of the project, and ultimately decided to go with Java due to the platform’s “proven enterprise track record, security, and maintainability” – The Insight
We also know they were still using Java at the IPO date, since Oracle can rarely resist crowing about their ‘enterprise’ systems:
But the core of Betfair’s business logic is written in PL/SQL. The company has more than 250,000 lines of PL/SQL code, and the betting engine that runs that core exchange system is written entirely in PL/SQL.
“We really push the envelope of Oracle and PL/SQL with our exchanges,” says Alex De Vergori, database architect, Betfair. “We take in excess of 5 million transactions a day, and it all goes straight through to the database, where those bets match with other transactions.” – Oracle
A betting exchange written in 250,000 lines of PL/SQL. That sounds to me like a really bad idea. It sounds awfully like Betfair have a very DB-centric system, one that writes to the database before pushing updates to clients, which would tally with the less-than-real-time experience their website generally provides (there’s even a ‘Refresh’ button; ever seen that on a chat app?!)
Betfair have gone from being the plucky upstart to a kind of corporate fat cat, parts of which could potentially be disrupted at relatively low cost.
We’ve covered the technology angle; but better technology alone probably isn’t enough to guarantee startup success. Fortunately in this case, there are two other factors at work which make me think the time for disruption is probably ripe.
High Frequency Trading
Think about that growth in bets again (5mm to 30mm over four years). Has Betfair’s profitability grown by the same 6x factor over the same period? Not a chance. So it’s highly unlikely that growth in bets is driven by retail customers, since there’s no associated increase in profitability.
Nope, it’s much more likely that bet growth is being driven by a massive increase in high-frequency trading by technology-savvy syndicates; remember that the marginal cost of submitting an extra bet electronically is close to zero. My guess is that there is some kind of Pareto principle at work here, with 80% of the growth in bets being supplied by 20% of the users.
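For scale, a six-fold rise over four years is a compound annual growth rate of well over 50%; a quick check using only the two figures quoted above:

```python
# Daily bet volumes quoted in the text: ~5mm four years ago, ~30mm now.
start, end, years = 5e6, 30e6, 4

growth_multiple = end / start                       # overall multiple
annual_growth = growth_multiple ** (1 / years) - 1  # compound annual growth rate

print(f"{growth_multiple:.0f}x overall, ~{annual_growth:.0%} a year")
# 6x overall, ~57% a year
```

Retail-driven growth at that pace would have to show up in the profit line, which is the point of the argument.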
In fact it’s probably even more extreme than that, with only a handful of syndicates being responsible for that growth; target the top 100 syndicates and I bet you could take away 50%+ of Betfair’s volume.
Now those syndicates are almost certainly highly profitable (why else submit all those bets ?) and almost certainly very unhappy given that 40% of their profits are being sucked away via the Betfair Premium Charge.
Effectively Betfair have decided to assume the position of the Inland Revenue here; since there’s no such thing as a professional gambler in the eyes of the IR (how long can that last?!), Betfair have decided they may as well help themselves to the tax rake that a professional in any other industry would have to pay.
But this opens them up to tax arbitrage: an exchange with decent liquidity and a lower tax rake is suddenly a very compelling proposition to a big and profitable syndicate.
Now I’m not saying that Betfair are likely to be disrupted overnight; for a start, they have massive marketing clout, which means you can forget about competing with their retail business. But their wholesale exchange business does look ripe for disruption, through a combination of their own strategic missteps and cheaper open-source technology, as Whatsapp have shown.
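The size of that arbitrage is easy to put a number on. This sketch treats the charge as a flat share of profit, which simplifies how the Premium Charge is actually calculated; the 40% comes from the text, while the profit figure and the rival’s 10% rate are hypothetical assumptions:

```python
def take_home(gross_profit: float, rake: float) -> float:
    """Profit a syndicate keeps after the exchange takes `rake` as a share of profit."""
    return gross_profit * (1 - rake)


gross = 1_000_000                     # hypothetical annual gross profit in GBP
betfair_net = take_home(gross, 0.40)  # the 40% figure quoted above
rival_net = take_home(gross, 0.10)    # hypothetical rival exchange charging 10%

print(f"Betfair: £{betfair_net:,.0f}, rival: £{rival_net:,.0f}, "
      f"uplift: {rival_net / betfair_net - 1:.0%}")
```

On these assumptions the syndicate keeps half as much again by moving, provided the rival offers enough liquidity to absorb its volume.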