bf.lines().forEach(s -> sb.append(s));
However, this ends up concatenating all the lines into one giant line, since the Strings that lines() produces have the newline character stripped. That makes the second lines() call read a single 23 MB line (for the file produced by gen.py). This is less than optimal.

The fastest version I managed to write was:
    public void readString5(String data) throws IOException {
        int lastIdx = 0;
        for (int idx = data.indexOf('\n'); idx > -1; idx = data.indexOf('\n', lastIdx)) {
            parseLine(data.substring(lastIdx, idx));
            lastIdx = idx + 1;
        }
        parseLine(data.substring(lastIdx));
    }
Not the prettiest thing, but it went from 0.594 GB/s to 1.047 GB/s. Also, it doesn't quite do the same as the lines() method, but that's easily changed.

    public void readString(String data) throws IOException {
        int lastIdx = 0;
        for (int idx = data.indexOf('\n'); idx > -1; idx = data.indexOf('\n', lastIdx)) {
            parseLine(subSequenceView(data, lastIdx, idx));
            lastIdx = idx + 1;
        }
        parseLine(subSequenceView(data, lastIdx, data.length()));
    }
    CharSequence subSequenceView(CharSequence base, int beginIndex, int endIndex) {
        return new StringView(base, beginIndex, endIndex - beginIndex);
    }

    static class StringView implements CharSequence {
        final CharSequence base;
        final int offset;
        final int length;

        StringView(CharSequence base, int offset, int length) {
            if (length < 0) throw new IllegalArgumentException("length must be >= 0");
            this.base = base;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public char charAt(int n) {
            if (n < 0 || n >= length)
                throw new IndexOutOfBoundsException(n);
            return base.charAt(offset + n);
        }

        @Override
        public int length() {
            return length;
        }

        @Override
        public CharSequence subSequence(int beginIndex, int endIndex) {
            return new StringView(base, offset + beginIndex, endIndex - beginIndex);
        }
    }

Where is the second lines() call?
What does it say about you that you're asking a leading question in this way?
Nothing. Mistakes happen.
Erlang is another language where "naive" IO is kind of slow. https://github.com/bbense/beatwc/ is a project someone did to test various methods of doing IO in Erlang/Elixir, and their performance for a line-counting task, relative to the Unix wc(1) command.
It's interesting to see which approaches are faster. Yes, parallelism gains you a bit, but a much larger win comes from avoiding the stutter-stop effect of cutting the read buffer off whenever you hit a newline. Instead, the read buffer should be the same size as your IO source's optimal read-chunk size (a disk block; a jumbo network packet), and you should grab a whole buffer-ful of lines at a time, do a pattern-matching binary scan to collect all the indices of the newlines, and then use those indices to part the buffer out as slice references.
This achieves quite a dramatic speedup, since most of the time you don't need movable copies of the lines, and can copy the line (or more likely just part of it) yourself when you need to hold onto it.
This approach is probably also already built into Java's "better" IO libraries, like NIO.
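For illustration, here is a rough Java sketch of that scan-then-slice idea (class and method names are mine; the "slices" here are just offset/length pairs, copied into Strings only at the point of use):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: scan one big read buffer for newline positions in a single pass,
// then hand out (offset, length) slices instead of copying every line.
public class SliceScan {
    // Collect the index of every '\n' in the first len bytes of buf.
    static List<Integer> newlineIndices(byte[] buf, int len) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < len; i++) {
            if (buf[i] == '\n') idx.add(i);
        }
        return idx;
    }

    public static void main(String[] args) {
        byte[] buf = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.US_ASCII);
        int start = 0;
        for (int nl : newlineIndices(buf, buf.length)) {
            // A "slice" is just (start, nl); copy only when you need to keep it.
            System.out.println(new String(buf, start, nl - start, StandardCharsets.US_ASCII));
            start = nl + 1;
        }
    }
}
```

The point of collecting the indices first is that the scan is a tight, predictable loop over the whole buffer, and the slicing step allocates nothing until a line is actually retained.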
edit: Lemire is showing a lack of what Fowler describes as "mechanical sympathy".
https://martinfowler.com/articles/lmax.html#QueuesAndTheirLa...
This means that, even when your data set compresses so well that you'd theoretically gain a ton of speed by streaming the data from disk compressed and then decompressing it during parsing, this doesn't apply in practice, since you're introducing an artificial bottleneck in your IO reads.
I'm not actually sure if this is a bug in Erlang, per se, or if it's just the intended behavior and compressed file IO was never intended to be used for performance, only for e.g. embedded devices with tiny ROMs.
(If people here think it's a bug, I'll probably go to the effort at some point to profile the performance impact and submit it as a bug on https://bugs.erlang.org.)
† What it's actually doing, is that all zlib compression/decompression passes get sent to a zlib "port driver" as messages. Port drivers can handle multiple requests in-flight at once (they're the in-process equivalent to sockets—each Erlang process's port against the port-driver is its own "connection") but the zlib port driver shim is coded to expose zlib as a single-threaded, blocking, request-response style of server, rather than one that accepts connections in parallel and instantiates a separate zlib context for each separate connection it receives.
Some of the blame for that probably lies with his headline choice, but he clearly states at the end of this post:
""" This is not the best that Java can do: Java can ingest data much faster. However, my results suggest that on modern systems, Java file parsing might be frequently processor-bound, as opposed to system bound. That is, you can buy much better disks and network cards, and your system won’t go any faster. Unless, of course, you have really good Java engineers.
Many firms probably just throw more hardware at the problem. """
It's not about this piece of code. It's not even about Java. In the previous post, which he mentions at the start of this one, he pointed out:
""" These results suggest that reading a text file in C++ could be CPU bound in that sense that buying an even faster disk would not speed up your single-threaded throughput. """
So, I take his point to be that one shouldn't make assumptions about performance. Rough performance scales -- such as have been posted here many times (e.g. [1]) -- make great rules of thumb for implementation choices or as a guide for where to look first for bottlenecks. To optimize in the real world, though, you're best served using real measurements.
[1] https://www.prowesscorp.com/computer-latency-at-a-human-scal...
It has no baseline and no specs. For all I know, he could have got his 0.5 GB/sec on an old Pentium II processor.
There is no analysis.
I am perplexed.
edit: fine, so instead maybe click on the links in the post to see that this article is just one of a series. He's probably tired of copy-pasting the specs of his reference hardware (Skylake https://arxiv.org/pdf/1902.08318.pdf) since all he's concerned about is the relative performance of different software.
There is a difference between "I'm being dumb because I don't know what I'm doing" and "I'm being lazy because I've done it 1,000 times and the target audience knows what I mean".
I'm actually a stickler about good benchmarks - it riles me when people draw sweeping conclusions from poorly-designed experiments. Lemire is actually one of the good ones. If you want something more fully developed than a blog post, read one of his papers.
I personally really enjoy his blog because of this - he's good at picking interesting exploratory experiments that provide some insight, without trying to over-generalize from the results. If you read his conclusion, the point is that there is a good probability that even relatively simple programs are CPU-bound. His experiment supports that point. My experience also matches that - I've seen a lot of data processing code that could be I/O bound in theory (i.e. a perfect implementation could max out CPU or network) but is CPU bound in practice. Usually because of string manipulation, regexes, or any number of other things.
> This is not the best that Java can do: Java can ingest data much faster. However, my results suggest that on modern systems, Java file parsing might be frequently processor-bound, as opposed to system bound. That is, you can buy much better disks and network cards, and your system won’t go any faster. Unless, of course, you have really good Java engineers.
Java NIO channel should have been used for this. It was demonstrated back in the early 2000s with the "Grand Canyon" demo achieving very good throughput for its time, and it's still the gold standard.
- at least one heap allocation for every line. After it finds the EOL it first uses 'new String' followed by '.toString()'
- the C++ version will almost certainly be backing onto memchr() behind the scenes, which will be using SIMD instructions where it makes sense (e.g. large enough scan size, probably true in this case). The Java version is a manual bytewise-coded loop.
- the C++ version is reusing its output buffer, no reallocations assuming the same string length or less
No idea about encodings in Java, maybe that is playing a role too
As a result, std::string was disgustingly slow compared to C code.
I made a similar benchmark. The idea is as follows: we have a 2 GB byte array (because arrays in Java have a 32-bit limit, LoL) filled with values 32..126, imitating ASCII text, and the value 13, imitating newlines.
The first test simply XORs the whole array. It's the ideal result, which should correspond to memory bandwidth.
The second test wraps this array into a ByteArrayInputStream, converts it into a Reader using InputStreamReader with UTF-8 encoding, reads lines using BufferedReader, and in the end also XORs every char value.
For 2 GB I have 516 ms as an ideal time (3.8 GB/s, which is still almost an order of magnitude less than the theoretical 19.2 GB/s DDR4 speed) and 3566 ms for the BufferedReader version, so you can get almost a 7x speed improvement with a better implementation.
Benchmark: https://pastebin.com/xMD4W8mn
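Not the linked benchmark itself, but a condensed sketch of the two tests described (names are mine, and it uses '\n' rather than the byte value 13 for simplicity; timing harness omitted):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Two passes over the same data: a raw XOR over the bytes (the memory-bandwidth
// "ideal") vs. the BufferedReader line-by-line path.
public class XorBench {
    // Test 1: XOR every byte directly.
    static long xorBytes(byte[] data) {
        long acc = 0;
        for (byte b : data) acc ^= b;
        return acc;
    }

    // Test 2: decode as UTF-8, read lines, XOR every char value.
    static long xorViaReader(byte[] data) throws IOException {
        long acc = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) acc ^= line.charAt(i);
            }
        }
        return acc;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ab\ncd\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(xorBytes(data) + " " + xorViaReader(data));
    }
}
```

Note the two results differ in general, since the reader path strips line terminators before XORing; the XOR is only there to keep the JIT from eliminating the loops.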
Read a file using something like Vert.X, which is optimized for speed. I'm 100% confident it will be faster than the naive C approach.
I do a lot of work with large GZIP files that are read line by line using the standard IO (i.e. GZIPInputStream(FileInputStream), etc.), but your comment has really made me second-guess my choice of doing that...
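For reference, the "standard IO" pattern being described is roughly this (a sketch, not the commenter's actual code; the main method just builds a throwaway gzip file to run against):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// GZIPInputStream over a FileInputStream, read line by line via BufferedReader.
public class GzipLines {
    static long countLines(String file) throws IOException {
        long count = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file))))) {
            while (r.readLine() != null) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".gz");
        try (var out = new GZIPOutputStream(new FileOutputStream(f))) {
            out.write("one\ntwo\nthree\n".getBytes());
        }
        System.out.println(countLines(f.getPath())); // prints 3
        f.delete();
    }
}
```

Whether this is actually a bottleneck depends on where the time goes: for gzip-heavy workloads the inflate step often dominates, in which case the per-line String allocations matter less than they would on uncompressed input.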
You should be able to hack something together that feeds zero copy buffers into Netty compression handler. Maybe using Netty or Vert.X file API, or maybe just raw NIO2.
I'm not sure how fast this would be, but my gut says "very". Netty can easily saturate 40 Gigabit Ethernet lines, and file IO should have less overhead. That's ~5 gigabytes a second.
It's going to be a good bit of coding for sure. Vert.X/Netty/NIO2 are all async and pretty low level. They're generally 1 thread per core, and along with SSD read patterns you're probably best off reading files in parallel, one per core. Might not be worth the effort.
You may want to look into ZStandard as well. It's superior to Gzip in most ways when you need something fast but decent.
    Path path = ...
    var count = 0;
    try (var channel = FileChannel.open(path)) {
        var buffer = ByteBuffer.allocateDirect(8192);
        while (channel.read(buffer) != -1) {
            buffer.flip(); // switch the buffer from filling to draining before scanning
            while (buffer.hasRemaining()) {
                if (buffer.get() == '\n') {
                    count++;
                }
            }
            buffer.clear();
        }
    }
    System.out.println(count);
but in your particular case, I don't think there is a Gzip decoder that works on ByteBuffer.

I'm not saying that's happening here, but it's a basic fact when writing benchmarks that you have to actually test something real and not a transient property of the program after the compiler has had its chance to be really smart.
The correct way to do it is using collect(Collectors.joining("\n")) or straightforward imperative style (without streams).
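A minimal illustration of that joining approach (class and method names are mine), which puts the newlines back instead of silently concatenating the stripped lines:

```java
import java.util.List;
import java.util.stream.Collectors;

// Join lines with Collectors.joining("\n") rather than appending
// newline-stripped Strings into one giant StringBuilder blob.
public class JoinLines {
    static String join(List<String> lines) {
        return lines.stream().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("one", "two", "three")));
    }
}
```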
I don't think the general statement holds (that Java or BufferedReader is CPU-bound in particular).
It is like asserting something about C based on GCC specific behaviour.
Java is not a single language implementation.
So now I have to go out and do benchmarks across all these implementations just to prove they aren't the same?!?
https://en.m.wikipedia.org/wiki/List_of_Java_virtual_machine...
I was once working on an Android app on a cheap custom board with 128 MB of RAM (don't ask why Android on a single-function custom board, wasn't my decision).
Among other things, I had to parse an 80,000-line CSV file. Splitting and the rest of the processing created so many temporary strings that the system ran out of RAM. We eventually gave up.
- Copies and pastes the top answer, which buffers the whole thing into the heap before parsing
- Blames Java/hardware when this doesn't scale
Let us all hope that Project Valhalla, the effort to add value types to Java, comes to a swift completion. It would be very helpful in these sorts of scenarios.
Though at this point I'd wonder (as you did) why I'm using Java in the first place given the resource constraints of the system, I've had some success using byte[] arrays and using `yourInputStreamHere.read(someBuffer, yourOffset, YOUR_PAGE_SIZE)` in a few less constrained but relatively more performance critical areas.
Depending on your exact needs, this can sometimes result in more or less constant memory consumption; it's possible to reuse the array on each read. You can also take it a step further by finding the indexes where you would split the array (e.g. the index of the next comma for a CSV) and using methods that operate on an array plus start and end indexes. That sort of strategy can allow you to avoid additional copies of the array, at the cost of somewhat annoying and less maintainable code that is definitely not Java-esque.
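A toy sketch of that reused-buffer, index-based splitting idea (the buffer size and the field-counting task are illustrative, not from the original app):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Read into the same byte[] each time and split on commas by index,
// never materializing a substring unless a field is actually needed.
public class CsvScan {
    // Count comma-separated fields in the first len bytes without allocating.
    static int countFields(byte[] buf, int len) {
        int fields = (len > 0) ? 1 : 0;
        for (int i = 0; i < len; i++) {
            if (buf[i] == ',') fields++; // field boundary: just an index, no String
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4096]; // reused across reads
        InputStream in = new ByteArrayInputStream("a,b,c".getBytes(StandardCharsets.US_ASCII));
        int n = in.read(buf, 0, buf.length);
        System.out.println(countFields(buf, n)); // prints 3
    }
}
```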
There are also less severe solutions that won't buffer the whole file in memory, using StringBuilder or other techniques, etc. depending on what you're doing.
This all might be for naught though; I'm assuming the records have to be parsed into some structure, so tricks like the above might not be good enough given Java's seemingly insatiable hunger for heap space.
While I'm not the language's greatest fan, this is one area where Go has been able to shine. Having value types as a possibility makes solving this sort of problem a typically less expensive proposition.
https://mail.openjdk.java.net/pipermail/valhalla-dev/2019-Ju...
1. 128 MB of RAM total, 20-30 MB free at best. The CPU was slow and we were running off an SD card. Not exactly an enterprise monster.
2. Of course I was parsing it line by line. But running OOM triggered the garbage collector so many times that it took like 3-4 minutes to finish. I didn't mean that the system actually stopped because of running out of RAM.
3. Why didn't you do it with the NDK/some other solution? Because the PM just gave up on the feature when it wasn't done in a day. It was just real-time search completion, which we didn't really need there; it was just "nice to have".
You just have to use NIO and use direct buffers instead of naively reading and allocating strings.
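One hedged sketch of what that might look like, decoding through a reused CharBuffer so no per-line Strings are allocated (names and buffer sizes are mine; the final end-of-input decoder flush is omitted for brevity, which is fine for counting newlines in well-formed text):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Direct-buffer reading: decode bytes into one reused CharBuffer and scan it,
// instead of allocating a String per line.
public class DirectRead {
    static int countNewlines(Path path) throws IOException {
        var decoder = StandardCharsets.UTF_8.newDecoder();
        int newlines = 0;
        try (FileChannel ch = FileChannel.open(path)) {
            ByteBuffer bytes = ByteBuffer.allocateDirect(8192);
            CharBuffer chars = CharBuffer.allocate(8192); // reused every pass
            while (ch.read(bytes) != -1) {
                bytes.flip();                       // drain what was just read
                decoder.decode(bytes, chars, false);
                chars.flip();
                while (chars.hasRemaining()) {
                    if (chars.get() == '\n') newlines++;
                }
                chars.clear();
                bytes.compact();                    // keep any partial UTF-8 sequence
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".txt");
        Files.writeString(p, "one\ntwo\nthree\n");
        System.out.println(countNewlines(p)); // prints 3
        Files.delete(p);
    }
}
```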