Use streaming JSON to reduce latency on mobile

(instantdomainsearch.com)

189 points · beau · 8y ago · 84 comments

66 comments · 22 top-level

pkulak8y ago· 27 in thread

I agree that a streaming response is cool, but why the dismissal of returning valid JSON, streamed? Why invent a new protocol when JSON already exists? Streaming JSON parsers aren't unicorns, they are horses (sorry).

With this new-line-delimited JSON format all your clients HAVE to know about your new protocol. They have to stream the response bytes, split on new lines, unescape new lines in the payload (how are we doing that, btw?), etc. If a client doesn't care about streaming, it can't just sit on the response and parse it when it's done coming in. Or, how about if later on you upgrade the system so that the response is instant and streaming is no longer necessary? Then you move on to a new API and have to keep supporting this old streaming-but-not-really endpoint forever.
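For reference, the client-side work described here (buffer the response bytes, split on newlines, parse each complete line) is fairly small. A minimal sketch in Python, assuming UTF-8 input and one compact JSON value per line; the class name `NDJSONDecoder` and the sample fields are made up for illustration:

```python
import json


class NDJSONDecoder:
    """Incrementally decode newline-delimited JSON from arbitrary byte chunks."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk and return the values whose lines are now complete."""
        self._buffer += chunk
        # Everything before the last newline is complete; the tail is kept
        # in the buffer until the rest of that line arrives.
        *lines, self._buffer = self._buffer.split(b"\n")
        return [json.loads(line) for line in lines if line.strip()]


decoder = NDJSONDecoder()
first = decoder.feed(b'{"tld": "com", "free": fal')          # partial line
second = decoder.feed(b'se}\n{"tld": "ca", "free": true}\n')  # completes two lines
```

A client that does not care about streaming can still wait for the whole body and parse it with `[json.loads(l) for l in body.split("\n") if l]`, but a single `json.loads(body)` call will fail, which is the compatibility cost raised above.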

greglindahl8y ago

I've been using jsonlines for logfiles from about 5 years before it acquired a name.

With line delimited logfile objects, it's easy to grep for a string of interest and then only parse the lines that match -- much more efficient than parsing an entire logfile to pick out 0.01% of the lines.
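A rough Python sketch of that grep-then-parse pattern, assuming one JSON object per line; the function name `matching_records` and the sample log lines are invented for the example:

```python
import json


def matching_records(lines, needle):
    """Yield parsed records only for lines that contain `needle`."""
    for line in lines:
        if needle in line:          # cheap substring test, like grep
            yield json.loads(line)  # pay the parsing cost only for candidates


log = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "disk full"}',
    '{"level": "info", "msg": "stopped"}',
]
errors = list(matching_records(log, '"error"'))
```

The substring test can match the wrong field, so a real filter would still check the parsed record afterwards.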

That's not the usecase talked about in this article, but it is a usecase that's important to me.

ivanhoe8y ago

Jsonlines is also an excellent intermediary format for working with huge datasets that are too big to be loaded into memory all at once, and where the data is hierarchical so CSV is not an option. Compared to XML you save a lot in total file size, and it's just as easy to parse.

beauOP8y ago

It's also easy to load them into Google's BigQuery: https://cloud.google.com/bigquery/docs/loading-data#supporte...

bringtheaction8y ago

http://jsonlines.org/

> Each Line is a Valid JSON Value

This seems like the better option I agree.

laumars8y ago

I've just googled what jsonlines is and it turns out I've been using it for a while in log files too, where I wanted JSON objects but needed it to behave like random access memory (exactly like you described with grep). I hadn't even realised it was a formal thing. Thank you

pkulak8y ago

> That's not the usecase talked about in this article, but it is a usecase that's important to me.

Yeah, and I agree with you 100%. But lots of things that are great for log storage aren't appropriate for an API.

hiccuphippo8y ago

Together with jq, jsonlines make for great logging.

beached_whale8y ago

it should be trivial to write a tool to stream a json file with proper newlines

mikeash8y ago

Yeah, that’s really weird. They mention streaming parsers, then immediately say “ignore them.” Why? They apparently think this solution is better somehow, but it would be nice if they’d explain why.

Also, if you’re going to reinvent the wheel and make a custom framing format, why would you choose a delimiter that can legally appear in your content? Separating your JSON with newlines is complete madness. If you’re sending UTF-8 then you can trivially use a byte that cannot appear in the data, like 0xff, as the divider.
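A small Python sketch of the 0xff idea, under the assumption that every frame is UTF-8 encoded JSON (the byte 0xff never occurs in valid UTF-8). The helper names `frame` and `unframe` are hypothetical, not part of any library:

```python
import json

DELIMITER = b"\xff"  # never occurs in valid UTF-8, so it cannot collide with content


def frame(values, **dumps_options):
    """Encode each value as UTF-8 JSON and terminate it with the delimiter byte."""
    return b"".join(
        json.dumps(v, ensure_ascii=False, **dumps_options).encode("utf-8") + DELIMITER
        for v in values
    )


def unframe(data):
    """Split on the delimiter byte and parse every non-empty frame."""
    return [json.loads(part) for part in data.split(DELIMITER) if part]


values = [{"name": "\u00ff\nmultiline"}, {"tld": "ca"}]
payload = frame(values, indent=2)  # pretty-printed, so it contains raw newlines
decoded = unframe(payload)
```

With this framing the encoder is free to pretty-print, because newlines between tokens no longer carry any meaning for the framing layer.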

zamalek8y ago

> it would be nice if they’d explain why.

Just to fill in the picture here: it's because the built-in parser has a very good chance of being faster than anything you could write[1]. In addition, it's code that you don't own and don't have to maintain.

[1]: with the exception of https://news.ycombinator.com/item?id=16413917

icebraining8y ago

Newlines can't appear in JSON strings (they must be encoded as \n).

beauOP8y ago

I made this choice because so many tools support newline-delimited text. A quick search showed that others (Twitter, eBay) made the same choice, so I went with the flow.

*Thank you for NSBlog, it's great!

macspoofing8y ago

>I agree that a streaming response is cool, but why the dismissal of returning valid JSON, streamed?

JSON hasn't been designed for chunked interpretation. How would the client know when to start interpreting the received message? How could you tell the difference between a valid chunk or malformed response?

>With this new-line-delimited JSON format all your clients HAVE to know about your new protocol

Yes. It's probably not a good option for a public API. It should be a complement to a more general API. I would put this in the bucket of micro-optimization for very specific use-cases. Regardless, if you publish your API and document it, other developers should be able to consume it just fine. It's not rocket science.

beauOP8y ago

It works for Twitter: https://developer.twitter.com/en/docs/tutorials/consuming-st...

masklinn8y ago

> JSON hasn't been designed for chunked interpretation. How would the client know when to start interpreting the received message?

Neither has XML and yet we have plenty of working streaming XML parsers.

> How could you tell the difference between a valid chunk or malformed response?

Same as usual, when it breaks.

andrewprock8y ago

Streaming JSON is troublesome because the base object is unbounded. It's true that there are workhorses that handle this, but when debugging responses, ad hoc tooling is very useful. If you can't examine a response without using a streaming parser, the cost of maintenance goes up significantly.

It's important to remember why we use JSON. It's not because it's well suited to transmission. It's because it's easy to reason about. If you want to move to a streaming format that is not easy to reason about, you may as well move to a binary format.

beauOP8y ago

curl is a streaming parser. Most tools know what to do with a newline.

makmanalp8y ago

Last time I tried streaming parsing it was with Oboe.js which wasn't bad but I remember it having approximately a 5x speed penalty. So you get your first item in your dataset faster, but the whole thing loads slower - a tradeoff that's not a no-brainer. I wonder how getline plus many repeated calls to json.parse work in comparison to that - I suspect the native parser probably has some startup overhead but still better than a JS parser.

The other thing was that to get the full benefits of something like this (not necessarily about streaming parsers vs not) you had to rearrange the way your /whole/ stack works, streaming all the way from the database through all the backend layers to the frontend. It's satisfying when it works, but definitely a non-trivial amount of change.

beauOP8y ago

It is a pragmatic choice. Most languages can consume newline-delimited text natively. Do you really think getline() is going away?

euyyn8y ago

His point isn't that consuming data line by line is hard.

ricardobeat8y ago

JS (and JSON by extension) does not have multi-line strings, so line endings are already escaped. All you need to do is output ‘non-pretty’ json.

pkulak8y ago

Nice catch, thanks! I didn't realize that. Looks like no control characters are allowed in JSON string literals.
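A quick way to check this with Python's standard `json` module (a sketch only; the sample record is made up). Note that newlines are still legal as whitespace between tokens, which is why the encoder has to emit compact output:

```python
import json

record = {"note": "line one\nline two"}

compact = json.dumps(record)           # default output: a single line
pretty = json.dumps(record, indent=2)  # pretty output: newlines between tokens

try:
    # A raw (unescaped) newline inside a string literal is not valid JSON.
    json.loads('{"note": "line one\nline two"}')
    raw_newline_accepted = True
except json.JSONDecodeError:
    raw_newline_accepted = False
```

So the newline inside the value survives as the two characters `\n` in the compact form, and the only newlines a line-splitting client sees are the delimiters.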

jandrewrogers8y ago

Let me preface by saying that I don't have an immediate solution for this case, the details seem to be application specific. However, I will note that JSON is brutally inefficient by almost every computational metric that exists and it therefore is quite expensive unless your problem domain is trivial. I used to be in the position of spending -- literally -- millions of dollars per year on JSON because that was how people wanted to format the data. You can solve many engineering efficiency problems for millions of dollars/euros.

Which is to say I don't have an answer for why streaming JSON isn't valid in this case, but I can also say that if it was up to me I would never use it in an application that mattered. It is much too expensive for many (most?) applications.

skybrian8y ago

This makes more sense when working with files (rather than an API). You can use traditional Unix commands.

Another reason might be that when working with hand-written files, the delimiter adds a bit of redundancy. This makes errors like unbalanced parentheses easier to diagnose.

gg38y ago

I agree. Keeping this backward compatible would not only simplify client design but also accelerate the adoption. I am not really sure how this would be implemented, though.

boubiyeah8y ago

It's impossible to stream a JSON Array

tobyhinloopen8y ago

Embedding newlines..... hmmmmm \nI’m not sure how one would do that

Figs8y ago· 6 in thread

As someone who actually regularly uses a slow mobile connection (8 kilobytes per second!) with somewhat high ping (~90ms to Google), please don't do this thinking you're making my life significantly better. It barely makes a difference in performance once loaded, and the initial load time is ridiculously worse. I'd much rather you made your page work without JavaScript, kept the design light (as this page has otherwise done), and made your CSS cacheable.

Right now, it takes over 5 seconds(!!) for this page to load because of all the freaking JavaScript it has to download! With JS off, the page loads almost immediately. With a keep-alive connection, subsequent loads over HTTPS are not particularly long, unlike what this article seems to think. (Hacker News is one of the FASTEST sites I can access, for example. Even on my crappy connection, pages load nearly instantly.)

Simply letting me type, press enter, and wait 0.1~0.3 seconds for a new page response would not be a significantly worse experience -- however, due to the way the site is written, search doesn't work AT ALL with JS disabled.

So, lots of engineering effort (compared to just serving up a new page) for little to no actual speed improvement, and a more brittle website that breaks completely on unusual configurations... Yeah. Please don't do this!

beauOP8y ago

You should get the iOS app: https://itunes.apple.com/us/app/instant-domain-search/id1068...

It uses the streaming API, and will work well for you.

Figs8y ago

Thank you for the link, but I have Android, and I'm looking at the performance by tethering to my PC.

ricardobeat8y ago

You wouldn’t wait 0.3s for the next page - that’d be 0.3s + a few seconds waiting for all the queries to return before showing anything. Streaming lets you show results earlier.

Most likely the page loads instantly because the server is not doing any real work, which is offloaded to the client/js.

Figs8y ago

I'm looking at the behavior of the search tool on the page, which makes a request to https://instantdomainsearch.com/services/all/<query parameters> through AJAX and gets streamed lines of JSON back as a result. It's not doing DNS queries from JS or whatever.

Looking at the behavior with curl and wireshark, what I see is that a full, new connection to the service does spend most of its time in DNS lookup and HTTPS handshake. It takes about 0.1~0.3s for the actual data to transfer.

What the article is recommending is basically, don't make a request per JSON object. (It streamed back 69 objects for the query I tried.) Using one connection to transfer the information saves a lot of overhead -- and I don't have a problem with that part of the advice.

What I mean is, instead of using JS at all to do this (and consequently triggering a 5 second initial page load, etc etc.), have the server build the page the traditional way, and send that -- that's still one connection for that data transfer, and with a light page design and keep alive connection, the page load time does not seem like it would be significantly different here (most of the time is going to be in that 0.1~0.3s for the query to execute regardless) -- but the initial page load time would be significantly faster on slow connections.

If your queries do actually take many seconds, sure, maybe there would be a benefit there, but I'm not seeing the value on a page like this, and I really don't want people to take away the idea here that they should redesign their sites to use AJAX to "reduce latency on mobile" by default as it won't help, and in fact, tends to make things worse.

boubiyeah8y ago

I share your sentiment. But isn't it just their experiment page that uses too much JS as opposed to the approach requiring a ton of JS by nature?

Figs8y ago

I haven't tried to unpack their JS -- but you're right that it might not need to be as heavy as it actually is here.

zeger8y ago· 3 in thread

You say that using websockets was less reliable than streaming HTTPS, can you elaborate why? In my experience websockets are perfect to use for the use case you described, are there disadvantages?

beauOP8y ago

I tried this a long time ago. At the time, certain queries were hard for a server to answer. Other clients connected to the zombie server stalled until something timed out. With HTTPS, a load balancer can direct new queries to servers that are responsive.

thinkloop8y ago

I was also curious about this (and thanks for the article, very informative!)

Are you saying that at one point the servers would crash relatively often, which would leave sockets clients hanging, unless some complicated client-side code was written - whereas without sockets, a load-balancer could automatically switch clients to functional servers, without extra coding, and mitigate the issue? Isn't the problem the crashy servers?

greenleafjacob8y ago

Websockets are also bidirectional.

jannes8y ago· 2 in thread

Does anyone know a JSON parser that parses an ArrayBuffer instead of strings? [1]

JSON.parse() only accepts strings.

The library that the article recommends also uses XMLHttpRequest with strings. [2]

The reason I'm asking is the maximum string length in 32-bit Chrome.

[1]: https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequ...

[2]: https://github.com/eBay/jsonpipe/blob/master/lib/net/xhr.js

matharmin8y ago

If you want to parse more than 4GB of JSON in a browser, ArrayBuffer versus string isn't the most important criterion - you shouldn't parse a 4GB ArrayBuffer as a single chunk either. You can always convert between an ArrayBuffer and a string, but you'll want to compare parsers according to the ability to stream, as well as performance and memory efficiency.

DvdGiessen8y ago

You could look into streaming parsers such as Oboe.js, which specifically support the use case of parsing JSON trees larger than the available RAM[0]. Then again, when you're loading such huge JSON files into a 32-bit instance of Chrome, it is likely you should look for a totally different solution to your problem.

[0]: http://oboejs.com/examples#loading-json-trees-larger-than-th...

slig8y ago· 1 in thread

I have nothing but praise for this service. It's fast and efficient, and does the job without bloat.

For instance, their iOS app weighs 888.8 KB! When it's common for simple apps to be 50 MB monsters, it's very refreshing to use something that has been developed with proper care.

beauOP8y ago

Thanks!

bshacklett8y ago· 1 in thread

How is this handled from a UI perspective? As more applications are built around the idea of streaming data, I've found that UI elements tend to jump around, and I find myself clicking/tapping the wrong item more and more, because the item which had been under my thumb/cursor has jumped away just before I could activate it.

beauOP8y ago

This is a good point. De-bounce, static results ordering, and static elements with placeholders help. Instant Domain Search has room to improve here.

JensRantil8y ago· 1 in thread

Let's talk about reliability: The network is unreliable; firewalls might be broken, packets are dropped, IP addresses may change, cellphones lose connection in subway tunnels. Simply calling streaming "reliable" without even defining what the "reliability" is protecting against makes "reliable" an overstatement.

IMHO the most reliable way to get data from point A to point B is likely to have the client actively poll for data, using a strict socket timeout. Data should be delivered at least once. If JSONS is to be called anywhere near as "reliable" as periodic polling, it should at least have a strict timeout (not mentioned in the article) for receiving the next newline, and it should handle replaying non-acked messages. Otherwise I would call it far from "reliable".
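One way the "strict timeout for the next newline" could look on the client, sketched in Python over a raw socket. This is an assumption about how such a timeout might be built, not something the article describes; `read_records` and `line_timeout` are hypothetical names, and the local socket pair only stands in for a real connection:

```python
import json
import socket
import time


def read_records(sock, line_timeout):
    """Yield one JSON value per line; fail if no complete line arrives in time."""
    buffer = b""
    deadline = time.monotonic() + line_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no complete line within the timeout")
        sock.settimeout(remaining)   # recv raises a timeout error when it expires
        chunk = sock.recv(4096)
        if not chunk:                # peer closed the connection
            return
        buffer += chunk
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
        if lines:                    # a full line arrived, so restart the clock
            deadline = time.monotonic() + line_timeout


sender, receiver = socket.socketpair()
sender.sendall(b'{"n": 1}\n{"n"')    # one full line, then a stalled partial line
records = read_records(receiver, line_timeout=0.05)
first = next(records)
try:
    next(records)
    timed_out = False
except (socket.timeout, TimeoutError):
    timed_out = True
sender.close()
receiver.close()
```

The deadline is tracked per line rather than per `recv` call, so a server that trickles bytes without ever finishing a line still trips the timeout. Replaying non-acked messages is a separate problem that this sketch does not address.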

beauOP8y ago

Each request takes a second or two to complete. If you lose the connection, the client can send the same request again (with exponential backoff).
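A minimal sketch of that retry loop in Python. The function name, the retry count and the base delay are illustrative assumptions, and `fetch` stands in for whatever call issues the request:

```python
import time


def fetch_with_backoff(fetch, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call `fetch` until it succeeds, doubling the wait after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == retries - 1:
                raise                         # out of attempts, give up
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Demo with a fake request that fails twice, then succeeds.
attempts = []
delays = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("connection dropped")
    return "ok"


result = fetch_with_backoff(flaky, sleep=delays.append)
```

Resending the same request is safe here only because the query is idempotent; production clients commonly add random jitter to the delay as well.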

sajal838y ago· 1 in thread

If the concern is HTTPS overhead, why not use HTTP/2 and send multiple requests?

I think streaming would be useful only if the responses are stateful and it's hard to share it across requests.

beauOP8y ago

Even with HTTP/2, sending a GET request is not free. Most of the benefit is that we can show the user results as they come in. Each query gets over 50 responses in random order from DNS, in-memory indexes of zone files, slower fuzzy searches over other data, and so on. Why wait to show them the .com result while .ca resolves?
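To illustrate the "show results as they come in" part, here is a hedged server-side sketch in Python that emits one JSON line per backend as soon as that backend answers. It is not the site's actual implementation; `stream_results` and the two fake lookups are invented for the example:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_results(lookups):
    """Yield one JSON line per lookup, in completion order, not submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(lookups))) as pool:
        futures = [pool.submit(lookup) for lookup in lookups]
        for future in as_completed(futures):
            yield json.dumps(future.result()) + "\n"


def slow_ca_lookup():
    time.sleep(0.2)  # stands in for a slow DNS or fuzzy-search backend
    return {"tld": "ca", "available": True}


def fast_com_lookup():
    return {"tld": "com", "available": False}


lines = list(stream_results([slow_ca_lookup, fast_com_lookup]))
```

Because each result is a complete line, the client can render the `.com` answer without waiting for the slower `.ca` lookup to finish.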

delaaxe8y ago· 1 in thread

Feels to me like the response content type shouldn't be "application/json" anymore (it is what's returned on that first example).

beauOP8y ago

IIRC, earlier versions of WebKit wouldn't emit data from each chunk when I sent a custom content type. Or maybe there was some conflict with gzip, I forget. Now that those browsers are gone, maybe ndjson? jsons?

nategri8y ago· 1 in thread

Why not just gzip the JSON? Should make complicated JSON around an order of magnitude smaller, and be more portable to boot.

beauOP8y ago

I do gzip the streamed JSON. Try:

$ curl -H "Accept-Encoding: gzip" --trace - "https://instantdomainsearch.com/services/vanity/apple?hash=8...
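Compression and streaming can coexist as long as the compressor is flushed after each line. A sketch with Python's `zlib`, offered as one plausible approach rather than a description of what this server does; `gzip_lines` is a hypothetical helper:

```python
import json
import zlib

GZIP_WBITS = zlib.MAX_WBITS | 16  # ask zlib for a gzip container


def gzip_lines(values):
    """Yield gzip chunks, flushing after every line so each one can be decoded on arrival."""
    compressor = zlib.compressobj(-1, zlib.DEFLATED, GZIP_WBITS)
    for value in values:
        line = (json.dumps(value) + "\n").encode("utf-8")
        # Z_SYNC_FLUSH pushes out everything compressed so far without ending the stream.
        yield compressor.compress(line) + compressor.flush(zlib.Z_SYNC_FLUSH)
    yield compressor.flush()  # final flush writes the gzip trailer


chunks = list(gzip_lines([{"tld": "com"}, {"tld": "ca"}]))
inflater = zlib.decompressobj(GZIP_WBITS)
first_line = inflater.decompress(chunks[0])
second_line = inflater.decompress(chunks[1])
```

Flushing after every line costs some compression ratio, since the compressor cannot batch as much input, in exchange for lower latency per result.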

rhacker8y ago

Somewhat related is the JSON Lines "spec": http://jsonlines.org/

erikrothoff8y ago

Really interesting stuff! What about simply opening a Websocket connection and using that for all requests if connection latency is such an issue?

fenwick678y ago

I did this a few years ago before I knew it was a "thing" and felt really proud that it actually worked.

The use-case was that we had a slow database query for basically map pins. The first pins came back in milliseconds, but the last ones would take seconds. The UI was vastly improved by streaming the data instead of waiting for it all to finish, and the server code was easy to implement.

A different delimiter would have worked, but newlines are easy to see in a debugger.

tuukkah8y ago

I'd like to see this streaming JSON parser incorporated into GraphQL clients: http://oboejs.com

MonkeyDan8y ago

Any benefit to using this over Server-Sent Events? (other than IE/Edge support)

iamd3vil8y ago

I think websockets is a much better use case for this if you don't want the reconnection overhead. Also since websockets are bidirectional, you can keep the connection open and send all requests through the connection as well as receive responses from the connection. Also you can send binary on websockets if you want to save bandwidth as well. We do this at work and it works pretty nicely.

Osiris8y ago

Isn't this pretty similar to how you would use WebSocket frames to transfer individual JSON elements when a client is subscribed?

At one job I had several years ago we came up with the same idea and used \n separated JSON elements as a streaming response. We also tossed around the idea of using WebSockets to stream large responses between services.

wybiral8y ago

Is there an advantage to using chunked streams like this over using WebSockets like this: https://github.com/wybiral/hookah (this will take newlines from a program's stdout and send them over WebSockets, even aggregating multiple streams into one).

gumby8y ago

I feel like I'm missing something here. Isn't that the point of using a JSON SAX parser instead of a DOM parser?

cwt1378y ago

How is this different than SSE https://html.spec.whatwg.org/multipage/comms.html#server-sen... ? Or, why would someone choose this over SSE?

osrec8y ago

Why would you use this over web sockets?

stringham8y ago

Should have (2017) in the title.
