MsgPack vs. JSON: Cut your client-server exchange traffic by 50% with one line of code

As the previous two articles were about optimizing performance and keeping server costs as low as possible, this article stays in the same area and explains why I’m using MsgPack instead of JSON in ZeroPilot.

In a web app there is always data being sent between the client and the server. Would you believe me if I told you that after optimizing all your server calls, bundling them together etc., you can still save up to 40% (in some cases even 60%) of the traffic caused by client-server communication with no more than one line of code?

Well, the magic word is “MsgPack”, a ‘binary-based efficient object serialization library’. The difference to JSON is that MsgPack is binary-based, which gives two advantages: a) the exchanged data is smaller and uses fewer bytes, and I guess we all know the advantages of that; b) the even bigger advantage is that it is faster to parse and encode, since the parser has less to read: roughly speaking, parsing 40 bytes takes about twice as long as parsing 20 bytes. I know those advantages might seem marginal. However, if you scale this up to a couple of thousand (or even more) users, you will see quite a difference. And even if you don’t have a couple of thousand users, why would you not want to take advantage of something like this? ;-)

To make this difference more visible, I put together a small example:

JSON (29 bytes)

{"name":"John Doe","age":12}

7B 22 6E 61 6D 65 22 3A 22 4A 6F 68 6E 20 44 6F 65 22 2C 22 61 67 65 22 3A 20 31 32 7D

MsgPack (20 bytes)

‚¤name¨John Doe£age.

82 A4 6E 61 6D 65 A8 4A 6F 68 6E 20 44 6F 65 A3 61 67 65 0C

(Note: the JSON hex dump still contains one space, the 20 after the second 3A. Without it the JSON is 28 bytes.)

As you can see, the MsgPack-encoded data is about 2/3 the size of the JSON (20 vs. 29 bytes, a saving of roughly 30%).
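To show where those 20 bytes come from, here is a minimal sketch of an encoder for just the three MsgPack types used in the example (small maps, short strings and small non-negative integers). The names encodeTiny and toHex are made up for this illustration and are not part of any MsgPack library; for real use, take a proper implementation.

```javascript
// Hypothetical helper: encodes only the tiny subset of MsgPack needed for the example.
function encodeTiny(value) {
  if (Number.isInteger(value) && value >= 0 && value <= 0x7f) {
    return [value]; // positive fixint: the byte is the value itself
  }
  if (typeof value === "string") {
    const utf8 = Array.from(new TextEncoder().encode(value));
    if (utf8.length > 31) throw new RangeError("string too long for this sketch");
    return [0xa0 | utf8.length, ...utf8]; // 101xxxxx (length in low 5 bits), then the UTF-8 bytes
  }
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    const keys = Object.keys(value);
    if (keys.length > 15) throw new RangeError("map too large for this sketch");
    const out = [0x80 | keys.length]; // 1000xxxx (pair count in low 4 bits), then key/value pairs
    for (const key of keys) {
      out.push(...encodeTiny(key), ...encodeTiny(value[key]));
    }
    return out;
  }
  throw new TypeError("type not covered by this sketch");
}

// Hypothetical helper: formats bytes like the hex dump in the example.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

const person = { name: "John Doe", age: 12 };
const packed = encodeTiny(person);
const jsonBytes = new TextEncoder().encode(JSON.stringify(person)).length;

console.log(toHex(packed)); // 82 A4 6E 61 6D 65 A8 4A 6F 68 6E 20 44 6F 65 A3 61 67 65 0C
console.log(packed.length, jsonBytes); // 20 28
```

The saving comes from the missing quotes, colons, commas and braces: each key or value costs one type/length byte plus its content. JSON.stringify emits no whitespace, so it produces 28 bytes here.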

(Edit: I previously had some unnecessary bytes in the JSON code, which made the example too much in favour of MsgPack. I removed those. Thanks to charles leifer for noticing.)

Additionally, the whole thing is less readable when you look at it, so as a small bonus this will probably repel some script kiddies who are trying to intercept your JSON or XML calls.

One line of code you said?


That is correct, assuming that you are currently using JSON (JavaScript example):

// Before: encode and decode with JSON
var myJSONString = JSON.stringify(myObject);
myObject = JSON.parse(myJSONString);

// After: only the encoding and decoding calls change
var myByteArray = msgpack.pack(myObject);
myObject = msgpack.unpack(myByteArray);

MsgPack returns Objects/Hashes, Arrays, Numbers and Strings just like JSON does, so there is NO need to change anything beyond the encoding and decoding lines.
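Because only the encode and decode calls differ, they can be hidden behind a small wrapper so that the format is switched in one place, for example JSON while debugging and MsgPack in production. The following is only a sketch: createSerializer, jsonCodec and the DEBUG flag are names I made up for the illustration, and it assumes your MsgPack library exposes pack() and unpack() as in the snippet above.

```javascript
// Hypothetical wrapper: the rest of the application only calls encode() and decode().
function createSerializer(codec) {
  return {
    encode: (object) => codec.pack(object),
    decode: (payload) => codec.unpack(payload),
  };
}

// JSON codec, handy while debugging because the payload stays human-readable.
const jsonCodec = {
  pack: (object) => JSON.stringify(object),
  unpack: (text) => JSON.parse(text),
};

// Assumption: a global `msgpack` object with pack()/unpack() is loaded in production.
// If it is missing, or DEBUG is set, this sketch falls back to JSON.
const DEBUG = true; // hypothetical flag
const useJson = DEBUG || typeof msgpack === "undefined";
const serializer = createSerializer(useJson ? jsonCodec : msgpack);

const payload = serializer.encode({ name: "John Doe", age: 12 });
const roundTripped = serializer.decode(payload);
console.log(roundTripped.name, roundTripped.age); // John Doe 12
```

The application code never sees whether the payload is a String or a byte array, which is the reason nothing else has to change.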

Why haven’t I heard of MsgPack before?


As with all good things, it needs time. MsgPack was first introduced about 2 1/2 years ago, I believe (at least according to its GitHub repo). It might be older, but without a GitHub repository it is rather difficult to get noticed nowadays.

I can only encourage you to visit the MsgPack site and download the implementation for your programming language (every major language has one). Even if your desired language has no official implementation, there are usually unofficial projects to be found via Google or GitHub. If you can’t find one, contribute to the community: write your own implementation and share it. :-)

Edit #1: Looking at JavaScript, it is true that parsing JSON natively is faster than any custom parser. However, the field of use is more than just JavaScript, and server load probably matters more than client load: it won’t really matter whether your clients need 0.1ms or 2ms to decode/encode something, but on the server side MsgPack parsing is usually faster than JSON. In the case of Ruby, MsgPack is faster by a factor of ~5.

Edit #2: Since there was quite a bit of criticism here and on Hacker News, Sadayuki “Sada” Furuhashi, the creator of MessagePack, posted some thoughts on MessagePack in a gist: https://gist.github.com/2908191 clarifying a few things. Thank you for that, Sadayuki.

  • http://2questions Sai Nayagar

    I liked your post about message pack and it got me thinking. I have two questions for you:

    We already run gzip compression on our web servers, so how much bandwidth will we be really saving by switching to message pack?

    How does debugging work with message pack? I very often use the HTTP network traffic debugger in Chrome, and it is super easy to understand what is being sent and received. Would I still be able to do that? Is there built-in support for decoding these messages on the fly?

    • olsn

      1) Saving bandwidth: Well, this depends on how much data you exchange in one request. It often happens that the protocol headers etc. are bigger than the data itself; in that case the percentage saved is not too high.

      2) Yes, that is correct. Debugging can be a pain, however: You can build a serialization-layer, and then switch between JSON for debugging purposes and MsgPack for production.

    • Anonymous

      Don’t use Gzip if you use a binary formatter or serializer, it’s much higher CPU and actually adds more bytes (due to the deflate table headers) for small sizes.

  • Nathan

    Interesting idea, but the article is rife with spelling mistakes. I can tolerate a few, but, this is excessive. You’re probably saving a bit of bandwidth with all the missing characters, too.

    • olsn

      :-) – Sorry, my English is not the best, I WILL try to improve, I hope you can forgive me! ;-)

  • http://charlesleifer.com charles leifer

    Your Json example has at least 6 unnecessary bytes used for CR/LF.

    • olsn

      You are right, I previously had the bytes of the JSON pretty-print included in the example, which made it a little bit unfair for JSON. I changed that.
      Thank you.

  • Fred

    On mobile, anything that relies on JS execution runs 20-50 times slower. I bet JSON parse/eval will be much faster than MsgPack decoding. Gzipped JSON should be good enough in most cases.

    • Lol

      Even on desktop the difference between JSON.parse and custom parsing is a factor of 10

      • olsn

        That is not quite true. You are right that custom parsing is slower, but the factor is (depending on the browser) 3 at most. Also keep in mind that MsgPack is still young; maybe someday there will be a native implementation for a binary serialization format.
        Also, I was focusing more on what your server has to calculate. It probably won’t really matter whether it takes your clients 0.2ms or 1ms to decode/encode data, you are right about that, but if your server can cut the number of bytes to be parsed by that factor, it can be quite a gain.

  • http://www.clove.com mike

    Nice presentation, but I think you’re looking in the wrong place to evaluate efficiencies.

    Your percentages are off because you’ve forgotten to take into account the network transport layers. At the IP level you’re not gaining nearly as much because of the TCP and IP headers, checksums, routing info, etc which are added prior to leaving the client and then stripped after reception. In addition, packets are often transmitted in minimum sizes, etc. If – as one of the commenters said – your transported data is gzipped, that adds additional cruft, processing, standard block sizes into which the data is embedded, and yadda yadda yadda.

    Anyway, the difference between a 29 byte string and a 20 byte string is trivial, so the place to look is in the impact on processing overhead, maintenance [JSON is a printable representation which makes it much easier to debug than a binary one (does anybody just point 'telnet' at an address:port and pretend to be the client any more?)]

    Now if your argument was between XML and JSON (or any other printable, efficient data serialization method), I’m on your side. But that’s because the savings are probably significant and JSON is readable, while XML isn’t.

    • http://mishak.net Michal Gebauer

      At least I and an ex-colleague of mine use telnet to see it pure :)

      To the point: there is already http://bsonspec.org/, the Binary JSON format/initiative. So using MsgPack is not reasonable.

  • http://rakeshpai.me/ Rakesh Pai

    You also forever lose all the ability to look at your data by viewing it in any common tool, since the data isn’t text-based anymore. This makes debugging very hard. So, the part where you say that the data is not readable on the wire is not a benefit but a downside. In fact, it’s a serious enough downside to not consider this approach.

    All the benefits you mention can be achieved by turning on gzip instead, without any of the downsides. You might even get better compression.

    Protip: When in doubt, use a plain-text data exchange format.

    • olsn

      You can just build in a serialization-layer and then switch between JSON and MsgPack for debugging and production.

  • Manne

    Do you have any server stats, like before and after graphs? Would be interesting to see how much the actual saving is.

    • olsn

      Nope sorry – I used MsgPack in this project right away, after doing some benchmarks – my project does not have that many users anyways, but as I said: on Ruby MsgPack parses about 5x faster

  • Originalgeek

    Brilliant! By downloading 69k worth of JavaScript, I can save 20-30 bytes per ajax operation.

  • http://koeniglich.ch Patrick Stadler

    What happened to the 12 (age) in the example?

  • ziad

    It’s the 0C at the end.

  • Bob

    Have you not heard of the Hessian protocol? It’s been out for years and it’s ported to more or less every language. Personally I prefer gzipped JSON, but if I were to use binary I would use Hessian.

  • Francis

    According to https://github.com/msgpack/msgpack/issues/26 MsgPack doesn’t handle strings at all, only byte arrays.

    This happens to work if you only send numbers (like coordinates and game scores), and hand-picked values (like “John Doe”), but not for user-entered data.

    How many lines of code does it take to do something like JSON.stringify that works for (arbitrary) strings?

    • olsn

      I think you misinterpreted something here. You can pack Strings, Numbers, Hashes etc. alike; the only difference is the output:
      JSON: Object <-> String
      MsgPack: Object <-> ByteArray

      This means, that the OUTPUT is no String, but within your code you only work with the OBJECT anyways, that’s why it shouldn’t matter. If you are trying to generate a JSON String yourself, you probably have a design-flaw somewhere in your architecture ;-)

  • Poul

    Note that gzip compression can only be used to send data from the server to the client, not the other way (as it has to be negotiated)

  • https://gist.github.com/2946474 Nathan Aschbacher

    I did a little speed comparison of the node.js msgpack implementations here: https://gist.github.com/2946474

    It’s a very basic benchmark, but it’s pulled from the node-msgpack project that makes some speed claims of its own vs. JSON.

    Clearly JSON parsing and encoding got much, much faster at some point in V8, but msgpack is about the same speed as it always was. JSON is 4+ times as fast now.

    I need to do a test on MsgPack vs. JSON in Erlang next.

    • Pavel

      Thanx for the benchmark, nathan!

  • Thiago Z S

    From my point of view, there is a problem with MessagePack: I think the API doesn’t support compressing the whole payload.
    I don’t understand why MessagePack doesn’t compress the JSON. Maybe I’m doing something wrong.
    Just for fun, Will and I are running some tests against Redis with a list containing one serialized object. Check out our tests.

    https://gist.github.com/thiagozs/6701223

    Thanks all