I was working in C, and looking back I came up with a quite performant solution mostly by accident: all the memory was allocated up front in a very cache-friendly way.
The first time I ran the program, it finished in a couple seconds. I was sure something must have failed, so I looked at the output to try to find the error, but to my surprise it was totally correct. I added some debug statements to check that all the data was indeed being read, and it was working totally as expected.
I think before then I had a mental model of a little person inside the CPU looking over each line of code and dutifully executing it, and that was a real eye-opener about how computers actually work.
At one of my first jobs I was a DBA supporting a CRUD app in the finance industry. The app had one report that took forever and usually timed out, and I was told to take a look at it. The DB query was just missing a couple of indexes, so I added those.
After I added them, my boss told one of the users of the app to try out the report and she said it was still broken. He asked what she meant and she said she clicked the button and the page with the results came up right away. She thought it was broken because it didn't take forever.
I blame all the ads, tracking, and bloatware that is prevalent now most of all.
It's bananas how slow the web is today on average when you're on a symmetric gigabit connection.
I've done video4linux stuff in Go, and passing an unsafe.Pointer to a Go struct in an ioctl() worked fine, which tells me that Go structs are isomorphic to C structs. Even though Go has garbage collection, it allocates everything it can on the stack, so only long-lived shared-between-goroutines objects are subject to garbage collection.
Go abstracts concurrency down to just three features: the "go" keyword (which launches a goroutine, basically a tiny virtual thread), channels (which are selectable queues), and the "select" keyword, which waits for the first "input" from a static set of channels.
Julia is fast, easy to read, and easy to write. But it's not easy to maintain. There is a direct tradeoff between dynamism on one hand making things easier to read/write and static enforceability on the other making it easier to maintain.
Then there were BLISS, Mesa and PL/I, but the OSes that made use of them lost to UNIX, so.
With the exception of classic Mac OS, which was written in Object Pascal and later ported to a mix of Object Pascal and C++.
Having said this, plenty of alternatives with AOT compilers exist nowadays.
The only things C has going for it are historical weight, the UNIX/POSIX ecosystem, and some domains that are closed to any alternative suggestions due to tooling or cargo cult against alternatives.
Code being easier to read and maintain is a function of how close it is to human semantics. The more the algorithm is presented in terms and notations humans like and find familiar, the easier. Code being performant is a function of how close it is to machine semantics, the more the algorithm is presented as steps that the machine likes and finds familiar, the faster it will run, as the machine is doing less to execute each step.
There is a fundamental tension between the two, even if compilation from high-level languages might, at first glance, give us the illusion that we can have both. We can't, not in general. We can only do it for a class of human semantics that C++ folks call "Zero-Cost Abstractions": the set of abstractions that can be completely erased without a trace by the time you get to the executable.
But otherwise, there is a fundamental cost to making code more readable by humans: making it less readable by the machines that will execute it. This is a reflection of the fundamental alienness of computers; what they find quite easy you find quite hard, and vice versa. Optimizing for humans means generality and ruthless hiding of details; optimizing for machines is all about special cases and ruthless exploitation of assumptions.
(Incidentally, C is not all it's cracked up to be. Generic containers, off the top of my head, resort to using void* pointers for data and function pointers for operations, which has a runtime cost besides being unsafe and error-prone. C++ templates, on the other hand, can aggressively inline types and operations for you, as if you hadn't written generic code at all; no wonder templates are the poster child for C++'s zero-cost abstractions. Another example I hear often is how pointer aliasing in C and C++ makes it extraordinarily difficult for the compiler to optimize array and memory operations, whereas a language like Fortran makes it easier by restricting such aliasing.)
I use JavaScript and C++ for different things, sometimes in the same day. (And python and PHP and others, but this is not relevant.)
Believe me, JavaScript can be a real head scratcher compared to C++.
And now for the purists: No, I don't use all features of C++, only the minimal necessary ones for the problem I have to solve. This ridiculous idea that you are not using C++ if you are not using every single language feature is what makes programs difficult to write and maintain.
The nice thing about D for me is that you can generally banish the unreadable metaprogramming code to a library.
Swap out { } for Begin End, and make a few other changes, and you've got Pascal. Single-pass Pascal compilers have been faster (at compiling) than almost anything out there since Turbo Pascal 3.0 for MS-DOS.
Modern versions, such as Free Pascal, Delphi and Lazarus also deal with strings in a manner that totally avoids needing to manually manage memory. The GUI builders are awesome as well.
Fortran is fine. Also lua (using the luajit interpreter you get really close to C speed) and julia (except for the atrocious startup time).
If you hold up a sign with, say, a multiplication, a CPU will produce the result before light reaches a person a few metres away.
The latency of a multiplication (register input to register output) is about 5 clock cycles, and many CPUs run at 4 GHz or 5 GHz these days.
5 clock cycles at 5 GHz is 1 ns, which is 30 centimeters of light travel.
If we include an L1 cache read and an L1 cache write, IIRC it's 4 clock cycles for the read + 4 more for the write. So 13 clock cycles, which is almost 80 centimeters.
------------
A DDR4 read and L1 cache write will add 50 nanoseconds (~250 cycles) of delay, and we're up to almost 16 meters.
And now you know why cache exists; otherwise computers would be waiting on DDR4 RAM all day rather than doing work.
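For the curious, the arithmetic behind these figures can be sanity-checked in a few lines (assuming a 5 GHz clock; exact cache latencies vary by microarchitecture):

```python
# Back-of-the-envelope: how far light travels while the CPU works.
C = 299_792_458  # speed of light, m/s

def light_metres(cycles, ghz=5.0):
    """Distance light covers while the CPU spends `cycles` at `ghz` GHz."""
    return C * cycles / (ghz * 1e9)

mul = light_metres(5)          # one multiply: ~0.30 m
mul_l1 = light_metres(13)      # L1 read + multiply + L1 write: ~0.78 m
dram = light_metres(13 + 250)  # same, but the read misses to DDR4: ~15.8 m
```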
Are there any directives to the Operating System to say - “here keep this data in the fastest accessible L[1,2,3] please”?
It's interesting to consider this paired against how technologically primitive we ostensibly must be, given that digital computers didn't even exist 90 years ago.
So the processor in my hand can compute a multiplication faster than light can cross the room?
2. But, there is a real limit to the speed of a particular piece of code. You can try finding it with a roofline model, for example. This post didn't do that. So we don't know if 201ms is good for this benchmark. It could still be very slow.
I don't use a high-end laptop, and I'm not eager to upgrade, because I can relate to the average user of the software I develop. I've seen plenty of popular web apps feel really sluggish.
Thank you so so much. It's insane how it feels like the speed of much of our software hasn't improved, or has even regressed, despite the gigantic advancements made over the years. People really don't seem to care about this.
I had an argument about it with a senior colleague regarding some industry software. He figured it wasn't worthwhile to improve the speed of some table fetching and calculations that people actually had to wait on since it would only amount to a bit more than a second or so on top of the regular slowness of it all.
A second multiplied across at least 20 PCs, each going through it at least 100 times a day, more than 260 days each year, over at least 10 years so far. It turns out more than 5 million seconds is a lot of man-hours, which, whilst cheaper than ours, amount to many times what it would have taken to fix it.
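The arithmetic checks out; a quick sketch using the figures from the comment:

```python
# One extra second, 20 PCs, ~100 runs a day, ~260 working days a year, 10 years.
wasted_seconds = 1 * 20 * 100 * 260 * 10
wasted_hours = wasted_seconds / 3600
workdays = wasted_hours / 8  # eight-hour days

# 5,200,000 seconds, roughly 1,444 hours, roughly 180 full workdays.
```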
It's much easier to reason about state updates if all you have is pure functions. It allows you to avoid very annoying and hard-to-catch bugs. I've seen this personally, when replacing a spaghetti component with a straightforward `useReducer` hook.
Unfortunately, we don't really have a performant way to express this pattern in JS (or even in other languages?). You could use something like elm-lang, but it's not as widespread.
So from your post it follows that if a developer can reason about the state changes of their app without redux, they should do so if there are performance concerns. Right?
I say this as a webdev who has written pure vanilla Js SPAs a decade ago, and someone who often uses Redux now on most projects today. So I know it’s totally possible to have performant mutable state management on a project that isn’t a mess - that’s how we always did stuff before redux.
For the cases when it's not, use memo.
Developer time is spent once. Users will always have to pay the price of additional run time. For. Each. Single. User. Always.
It scales!
Due to the scale of, e.g., slow front-ends with millions of users, this wastes a HUGE amount of time, only to save the few hours or days it would have taken to develop it better.
Having 1 million users each wait a single second is already 11 days. If they have to wait that single second for each interaction, it quickly adds up.
It is also bad for the environment due to scaled up inefficiency and resulting increase of power usage.
Don't get me wrong, pandas is a nice library ... but the odd thing is, numpy already has, like, 99% of that functionality built in in the form of structured arrays and records, is super-optimised under the hood, and it's just that nobody uses it or knows anything about it. Most people will have never heard of it.
To me pandas seems to be the sort of library that became popular because it mimics the interface of a popular library from another language that people wanted to migrate from (namely dataframes from R), but that's about it.
Compounding this, it is now becoming the effective library for doing these things, even if backwards, because the network effect means that people are building stuff on top of pandas rather than on top of numpy.
The only times I've had to use pandas in my personal projects was either:
a) when I needed a library that 'used pandas rather than numpy' to hijack a function I couldn't be bothered to write myself (most recently seaborn heatmaps, and exponentially weighted averages; both relatively trivial things to do with pure numpy, and probably faster, but, eh. Leftpad mentality etc ...)
b) when I knew I'd have to share the code with people who would then be looking for the pandas stuff.
I'm probably wrong, but ...
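For readers who have never seen them, numpy's structured arrays look like this (a minimal sketch; the field names and data are made up):

```python
import numpy as np

# A structured array: named, typed columns in one contiguous block of memory.
people = np.array(
    [("alice", 30, 55.0), ("bob", 25, 80.5), ("carol", 35, 68.2)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

# Column access by field name, much like a dataframe column.
ages = people["age"]

# Boolean-mask filtering, entirely within numpy.
over_28 = people[people["age"] > 28]

# Sorting by a named field.
by_weight = np.sort(people, order="weight")
```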
Respectfully, this is pretty wrong. Pandas does vastly more out of the box than numpy. Off the top of my head: I/O from over a dozen data formats, joins/merges, SQL queries directly to dataframes, SQL-like queries on dataframes, index slicing by time, multi-indexes, much more ergonomic grouping/aggregation functions, ergonomic wrappers around common graphing use-cases, rolling windows.
I'm not even really a power user of it, so there are probably a zillion more things it does that numpy can't do out of the box, and I don't wanna spend time writing and validating implementations that already exist.
Because I needed these operations, I wanted to work with Numpy directly, and didn’t want to write custom implementations each time, I created a library to do it. It also has constructor methods for Python Dicts, any kind of Iterable, CSV, SQL query, pandas DataFrames and Series, or otherwise. As well as destructor methods to generate whatever you need when done. It tries its best to maintain the types you specify, and offers a means to cast as easily as possible. All functions return a single type to allow static type checking. And for performance, there is a “trust me I know what I’m doing” mode for extremely fast access to the data which achieves about a 10x speed up by skipping all data validation steps.
Everything it does outperforms pandas, except for the joins. It does allow inequality joins and multiple join conditions, but the general solution used isn't very fast. Anyone reading this who would be interested in improving this component would be welcome to contribute!
It gets even better when people start switching between percentages and "percentage points" referring to a measure that's in percentages originally.
Unfortunately, most of those things are easier to communicate and harder to get wrong if you speak in a more natural way. This is why "twice as fast" or "2.1x faster" is much clearer and can't go past zero :)
Similarly, I think it'd help to switch back from percentages to actual factors (119% = 1.19), and saying "we reduced the time for the computation by 1.19 of original time" would clearly show what's wrong (and saying "by 1.19x" would signal how it's a small reduction, so it's wrong as well).
Finally, I am 94.8% certain people will keep using percentages even where inappropriate, and with too much precision too!
It's about language use and what those per-cents (per-hundredths) are of.
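A tiny pair of hypothetical helpers (the function names are mine, not from the thread) makes the distinction concrete:

```python
def speedup_factor(old_time, new_time):
    """Factor style: '2.1x faster'. Unbounded and hard to misread."""
    return old_time / new_time

def percent_reduction(old_time, new_time):
    """Percentage style: capped at 100%, and easy to confuse with
    'percent faster', which is a different number entirely."""
    return (old_time - new_time) / old_time * 100

# A run dropping from 420 ms to 200 ms is "2.1x faster",
# or equivalently "takes 52.4% less time".
```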
Of course multiply this by the sheer number of calculations and even that little misprediction results in huge differences. The reality is actually quite sobering: a computer mostly calculates the same thing over and over.
I have a hard time using (pure) Python anymore for any task where speed is even remotely a consideration. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it.
Related anecdote: My blog used to be written using Jekyll with Pygments for syntax highlighting. As the number of posts increased, builds got slower and slower. Eventually, it took about 20 seconds to refresh a simple text change in a single blog post.
I eventually decided to just write my own damn blog engine completely from scratch in Dart. Wrote my own template language, build graph, and syntax highlighter. By having a smart build system that knew which pages actually needed to be regenerated based on what data actually changed, I hoped to get very fast incremental rebuilds in the common case where only text inside a single post had changed.
Before I got the incremental rebuild system working, I worked on getting it to just do a full build of the entire blog: every post page, pages for each tag, date archives, and RSS support. I diffed it against the old blog to ensure it produced the same output.
Once I got that working... I realized I didn't even need to implement incremental rebuilds. It could build the entire blog and every single post from scratch in less than a second.
I don't know how people tolerate slow frameworks and build systems.
I've also worked in Python shops for the entirety of my career. There are a lot of Python programmers who don't have experience with and thus can't quite believe how much faster many other languages are (100X-1000X sounds fast in the abstract, but it's really, really fast). I've seen engineering months spent trying to get a CPU-bound endpoint to finish reliably in under 60s (yes, we tried all of the "rewrite the hot path in X" things), while a naive Go implementation completed in hundreds of milliseconds.
Starting a project in Python is a great way to paint yourself into a corner (unless you have 100% certainty that Python [and "rewrite hot path in X"] can handle every performance requirement your project will ever have). Yeah, 3.11 is going to get a bit faster, but other languages are 100-1000X faster--too little, too late.
Conversely when 99.9% of the software you use in your daily life is blazing fast C / C++, having to do anything in other stacks is a complete exercise in frustration, it feels like going back a few decades in time
And a product which is designed inefficiently where the engineer has figured out clever ways to get it to be more performant is most likely a product that is more complicated under the hood than it would be if performance were a design goal in the first place.
Also those languages show you don't actually have to give up modern features or even that much convenience in order to get blazing fast speeds.
At all my recent jobs, I grow frustrated with how slow running a single unit test is locally on a codebase. We are talking 5+ seconds for even the most trivial of trivial unit tests (say, purely functional arithmetic unit test).
And this is even with dynamic languages like Python (you see pytest reporting that your unit test completed in 0.00s, while wall time is 7s).
And then I get grumpy if they don't let me go and fix it because I am the only one who is that annoyed with this :D
Well, it felt slower after the "upgrade". Clicking the start menu and opening something like the Downloads or Documents folder was basically instant before. Now, with Windows 10 and the new SSD there was a noticeable delay when opening and browsing folders.
It really made me wonder how it would be running something like Windows 98 and websites of the past on modern hardware.
It's probable the old Windows 7 install was 32-bit, while your fresh install of 10 would have defaulted to 64-bit. That, combined with 10's naturally higher memory requirements, means the system has less headroom to work with.
This should involve absolutely zero disk reads or anything of the sort; it's a window that runs a command. And it used to work reliably in past years. It feels like keyboard input simply isn't buffered like it used to be. Calculator is even worse, as it loses input if you start typing the formula too soon. It used to be very easy for casual calculations; now I have to wait for the computer.
A lot of things remained slow though.
I still remember how fast console based computing, an old gameboy or a 90's macintosh would be - click a button and stuff would show up instantly.
There was a tactility present with computers that's gone today.
Today everything feels sluggish. Just writing this comment on my $3000 MacBook Pro, I can feel the latency; sometimes there are even small pauses. A little when I write stuff, a lot when I drag windows.
Hopefully the tech industry's focus on 100Hz+ screens will bring more attention to latency from click to screen, now that resolution and interface graphics in general are close to biological limits.
I'm asking because I've been thinking of getting a MacBook Air in the future with the intent to use it for writing.
Come again? I think anything beyond 60hz still qualifies as niche. Vendors are still selling 720p laptops.
I'd also prefer the sluggishness gone if I had my choice between the two.
What do those tools even do for that long? They can read enough data from the disk to overflow my computer's main memory a few times during it.
By the way, Apple isn't much better. Xcode takes around 15 seconds to launch on an M1 Max.
edit: probably this video https://youtu.be/j_4iTovYJtc?t=282
But yeah. I agree. Why does Lightroom take forever to load, when I can query its backing SQLite in no time at all?
And that's not even mentioning the RAM elephant in the room: chrome.
Younglings today don't understand what a mindbogglingly large amount of data a GB is.
But here's the thing: it's cheaper to waste thousands of CPU cores on bad performance than to have an engineer spend a day optimizing it.
The result is usually one CPU core running at 40% with sporadic disk access while you stare at Loading progress bar.
Nobody has created a language that is both thousands of times faster than Python and nearly as straightforward to learn and to use. The closest thing I know of might be Julia, but that has its own performance problems and is tied closely to its AI/ML niche. Even within that niche I'm certainly not going to get most data scientists to write their code in C or C++ (or heaven forbid Rust) to solve a performance impediment that they've generally been able to work around.
It's great that you've been able to switch to higher-performance languages, but not everyone can do that easily enough to make it worth doing.
Some data scientists I know like (or even love) Scala, but that tends to blow up once it's handed over to the data engineers as Scala supports too many paradigms and just a couple DSs will probably manage to find all of them in one program.
We use Go extensively for other things, and most data scientists I've worked with sketching ideas in Go liked it a lot, but the library support just isn't there, and it's not really a priority for any of the big players who are all committed to Python wrapper + C/C++/GPU core, or stock Java stacks. (The performance also isn't quite there yet compared to the top C and C++ libraries, but it's improving.)
Not Python-based, but Lua-based is Nelua [1]
If you like Lua's syntax, LISP's metaprogramming abilities, and C's performance, well there you have it!
Python allows you to program as if you’re a jazz pianist. You can improvise, iterate and have fun.
And when you've found a solution, you just refactor it and use numba. Boom, it runs at the same speed as a compiled language.
I once wrote one little program that ran in 24 min without numba and ca. 8 seconds with numba.
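A minimal sketch of that workflow, assuming numba is installed (with a no-op fallback so the code still runs without it); the function is my own toy example, not the commenter's program:

```python
# numba's @njit compiles a numeric Python loop to machine code on first call.
try:
    from numba import njit
except ImportError:
    def njit(f):  # fallback: run as plain Python if numba is absent
        return f

@njit
def sum_of_squares(n):
    total = 0.0
    for i in range(n):
        total += i * i
    return total

# Same source either way; with numba the loop typically runs orders of
# magnitude faster than the pure-Python interpreter.
result = sum_of_squares(1000)
```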
At least with Chrome's V8, the difference is not that big.
Sure, it loses to C/C++, because it can't vectorize and uses orders of magnitude more memory, but at least in the Computer Language Benchmarks Game it's "just" 2-4x slower.
I remember getting a faster program doing large matrix multiplication in JavaScript than in C with -O1, because V8 figured out that I was reading from and writing to the same cell and optimised that out, which gave it an edge, because in both cases memory bandwidth limited the speed of execution.
As for Electron and the like: half of the reason why they're slow is that document reflows are not minimized, so the underlying view engine works really, really hard to re-render the same thing over and over again.
It's not nearly as visible in web apps, because these in turn are often slowed down by the HTTP connection limit (hardcoded to six in most browsers).
For as many orders of magnitude as I am talking about, you have to be screwing up algorithms, networks, and a whole bunch of other things too.
Python and similar languages like Ruby really do make it easy to accidentally pile things on top of each other, but you can screw up in pure assembler with enough work put into it. Assembler doesn't stop you from being accidentally quadratic or using networks in a silly way.
For most tasks, modern mid-level statically typed languages like C#, Go, Kotlin really are the sweet spot for productivity. Languages like Python, Ruby and JS are a false economy that appear more productive than they really are.
IOW you lose both. It's not a huge size either.
How do you know whether or not speed is a consideration?
Yes, OP delivered impressive efficiency gains. I'm sure he could improve the efficiency even more by dropping into pure Assembly.
But is it worth it?
The prime consideration is not execution speed but maintainability. The further that OP got away from pure Python, the more difficult to maintain the code became. That's a downside.
Now, OP describes an important technique because in the real world, you have a performance budget. Code needs to execute at speeds that return quickly enough to the user, or long execution is financially expensive (i.e. cloud computing resources), etc. But optimizing beyond what the budget requires is wasteful in terms of time needed to do the optimization as well as harmful in terms of negatively impacting future maintainability.
Why? And how did you measure this drop in maintainability? I'm asking because I see developers prioritize _perceived_ maintainability over _measurable_ things that matter to the user (like performance).
Most of the time it's not that you need a faster language, it's that you need to write faster code. I was working on a problem recently where random.choices was slow but I realized that due to the structure of my problem I could convert it to numpy and get a 100X speedup.
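A hypothetical reconstruction of that kind of conversion (not the commenter's actual code), assuming numpy is available: repeated single draws via `random.choices` versus one vectorized batch draw.

```python
import random
import numpy as np

population = np.arange(1000)
weights = np.linspace(1, 2, 1000)
p = weights / weights.sum()  # numpy wants normalized probabilities

# Pure-Python style: one weighted draw per call; the per-call overhead
# dominates when you need many samples.
slow_draws = [random.choices(population, weights=weights, k=1)[0]
              for _ in range(100)]

# NumPy style: one call draws the whole batch from the same distribution.
rng = np.random.default_rng(0)
fast_draws = rng.choice(population, size=100, p=p)
```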
As a fun exercise this year I've been doing Advent of Code 2020 in C, and my god it's crazy how much faster my solutions seem to execute. These are just little toy problems, but even still the speed difference is night and day.
Although, I still find Python much easier to read and maintain, but that may just be I'm more experienced with the language.
Python is definitely easier to read and maintain if you have loads of dependencies. C dependency management is a pain.
If you can read and write a little C, you should consider giving C#/Java/Kotlin/Swift a try. They're probably an order of magnitude slower than C if you write them in a maintainable style, but they're still much faster than Python. If you're doing stuff like web APIs then ASP.NET/Spring will perform very admirably without manually optimizing code, for example. You might find that these languages are C-like enough to understand and Python-like enough to be productive in. Or you might not, but it's worth a shot!
I personally believe that C is difficult if not impossible to properly maintain long term, at least compared to the faster alternatives. On the other hand, my experience with Python is that it's one of the slowest mainstream languages out there, relying heavily on C libraries to get acceptable performance.
If there isn't a compiler in the box (JIT or AOT), I won't be using language XYZ, unless forced by customers.
The only reason I use Python is for UNIX scripting.
This kind of blanket comment that "scripting languages are too slow" makes it sound like you shouldn't use them for anything, but they are perfectly adequate for many tasks. I'm more likely to have network and DB slowdowns than problems with scripting languages.
So you don't pretty much ever need to reinvent or even use a HackerRank algorithm; you need to understand that the database compute instance has a fast CPU and lots of RAM too.
I wonder what would be the software engineering landscape today if hardware specs were growing like 10% per year...
Nowadays you need vector operations, you need to utilise GPU, you need to utilise various accelerators. For me it is black magic.
Another perspective on premature opt: When my software tool is used for an hour in the middle of a 20-day data pipeline, most optimization becomes negligible unless it's saving time on the scale of hours. And even then, some of my coworkers just shrug and run the job over the weekend.
The pure C++ version is so fast, it finishes before you even start it!
I once owned a small business server with a Xeon processor, Linux installed. Just for kicks I wrote a C program that would loop over many thousands of files, read their content, sort in memory, dump into a single output file.
I ran the program, and as soon as I ran it, it was done. I kept upping the scope and load, but it seemed I could throw anything at it and the response time was zero, or something perceived as zero.
Meanwhile, it's 2022 and we can't even have a text editor place a character on screen without noticeable lag.
Shit performance is even ingrained in our culture. When you have a web shop with a "submit order" button, if you clicked it and it instantly said "thanks for your order", people are going to call you. They'd wonder if the order got through.
Habit is a very powerful force.
Performance is somewhat abstract, as in "just throw more CPUs at it" / it works for me (on my top of the line PC). But people will happily keep on using unergonomic tools just because they've always done so.
I work for a shop that's mainly Windows (but I'm a Linux guy). I won't even get into how annoying the OS is and how unnecessary, since we're mostly using web apps through Chrome. But pretty much all my colleagues have no issue with using VNC for remote administration of computers.
It's so painful, it hurts to see them do it. And for some reason, they absolutely refuse to use RDP (I'm talking about local connections, over a controlled network). And they don't particularly need to see what the user in front of the computer is seeing, they just need to see that some random app starts or something.
I won't even get into Windows Remote Management and controlling those systems from the comfort of their local terminal with 0 lag.
But for some reason, "we've always done it this way" is stronger than the inconvenience through which they have to suffer every day.
If you stick to only doing arithmetic and avoid making lots of small objects, javascript engines are pretty fast (really!). The tricky part with doing performance-sensitive work in JS is that it’s hard to reason about the intricacies of JITs and differences between implementations and sometimes subtle mistakes will dramatically bonk performance, but it’s not impossible to be fast.
People building giant towers of indirection and never bothering to profile them is what slows the code down, not running in JS per se.
JS, like other high-level languages, offers convenient features that encourage authors to focus on code clarity and concision by building abstractions out of abstractions out of abstractions, whereas performance is best with simple for loops working over pre-allocated arrays.
I would like to see all of the actual code he omitted, because I am skeptical how that would happen. It's been a while since I've used pandas for anything, but it should be pretty fast. The only thing I can think is he was maybe trying to run an apply on a column where the function was something doing Python string processing, or possibly the groupby is on something that isn't a categorical variable and needs to be converted on the fly.
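For illustration, here is the kind of pattern that produces that slowdown, assuming pandas is installed; the data and column names are made up:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# Slow pattern: apply calls back into Python for every group.
slow = df.groupby("key")["val"].apply(lambda s: s.sum())

# Fast pattern: the built-in aggregation stays on pandas' compiled code path.
fast = df.groupby("key")["val"].sum()
```

The same split shows up with string-processing lambdas in `apply`, which is one plausible reading of where the article's pandas version lost its time.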
Though I will never understand webpages that use more code than you'd reasonably need to implement a performant Lisp compiler and build the webpage in that (not that I'm saying that's what they should have done; I just don't understand how they use more code).
Implementations of languages like javascript, ruby - and I would presume python and php - are a lot faster than they used to be.
I think most slowness is architectural.
It's hilarious how quickly things work these days if you just use the 90s-era APIs.
It's also fun to play with ControlSpy++ and see the dozens, maybe hundreds, of messages that your Win32 windows receive, and imagine all the function calls that occur in a short period of time (e.g. moving your mouse cursor over a button and moving it around a bit).
Think of a mobile game that could last 8 hours instead of 2 if it wasn't doing unnecessary linear searches on a timer in JavaScript.
By the next morning, I'd found it was doing an O(n^2) operation that, while probably sensible when the app had first been released, was now totally unnecessary and which I could safely remove. That alone reduced the 20 minutes to 200 milliseconds.
(And this is despite that coworker repeatedly emphasising the importance of making the phone battery last as long as possible).
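An illustrative, made-up example of the same class of bug: a list membership scan inside a loop is O(n^2), while the identical loop over a set is O(n).

```python
def dedupe_quadratic(items):
    """Accidentally O(n^2): each `in` check scans the whole list."""
    seen, out = [], []
    for x in items:
        if x not in seen:   # O(n) scan per element
            seen.append(x)
            out.append(x)
    return out

def dedupe_linear(items):
    """Same logic, O(n): set membership is O(1) on average."""
    seen, out = set(), []
    for x in items:
        if x not in seen:   # O(1) average lookup
            seen.add(x)
            out.append(x)
    return out
```

On a few hundred items both are instant, which is why this kind of thing survives in shipped apps until the data grows and 200 ms becomes 20 minutes.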
Nim should be part of the conversation.
Typically, people trade slower compute time for faster development time.
With Nim, you don't need to make that trade-off. It allows you to develop in a high-level language but get C-like performance.
I'm surprised it's not more widely used.
It's a ~community language without the backing of an 800lb gorilla to offer up both financial and cheerleading support.
I love the idea of Nim, but it is in a real chicken-and-egg problem where it is hard for me to dedicate time to a language I fear will never reach a critical mass.
And on a slightly ranty note, Apple's A12z and A14 are still apparently "too weak" to run multiple windows simultaneously :)
https://appleinsider.com/articles/22/06/11/stage-manager-for...
"The function looks something like this:"
And then he shows some grouping and sorting functions using pandas. Then he says:
"I replaced Pandas with simple python lists and implemented the algorithm manually to do the group-by and sort."
I think the point of the first optimization is that you can do the relatively expensive group/sort operations without pandas and improve performance. For the rest of the article it's just "algorithm_wizardry", which no longer deals with that portion of the code.

For all this decreased performance, what new features do we have to show for it? Oh great, I can search my Start menu and my taskbar has had a shiny gradient for a decade.
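The article's code isn't reproduced here, but a hedged sketch of that kind of replacement (plain lists plus `itertools.groupby` instead of a pandas group-by/sort) might look like this, with made-up sample rows:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for the real data: (key, value) rows.
rows = [("b", 3), ("a", 1), ("b", 1), ("a", 2)]

# groupby only merges *adjacent* equal keys, so sort by key first.
rows.sort(key=itemgetter(0))
grouped = {key: sorted(value for _, value in group)
           for key, group in groupby(rows, key=itemgetter(0))}
print(grouped)  # {'a': [1, 2], 'b': [1, 3]}
```

For small and mid-sized inputs this avoids pandas' per-call overhead, which is plausibly where that first win came from.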
My usual 1-to-1 translations result in C++ taking 1-5% of the Python execution time, even on combinatorial stuff.
+----------------------------------------------------+
| People really do love Python to death, don't they? |
+----------------------------------------------------+
I find that extremely weird. As a bystander who never relied on Python for anything important, and as a person who regularly had to wrestle with it and tried to use it several times: the language is non-intuitive in terms of syntax, ecosystem, package management, language-version management, probably 10+ ways to install dependencies by now, a subpar standard library, and an absolute cosmic-scale Wild West state of things in general. Not to mention people keep making command-line tools with it, ignoring the fact that it often takes 0.3 seconds just to boot.

Why a programmer who wants semi-predictable productivity would choose Python today (or even 10 years ago) remains a mystery to me. (Example: I don't like Go that much, but it seems to do everything Python does, and better.)
Can somebody chime in and give me something better than "I got taught Python in university and never moved on since" or "it pays the bills and I don't want to learn more"?
And please don't give me the fabled "Python is good, you are just biased" crap. Python is, technically and factually and objectively, not that good at all. There are languages out there that do everything that it does much better, and some are pretty popular too (Go, Nim).
I suppose it's the well-trodden path on integrating with pandas and numpy?
Or is it a collective delusion and a self-feeding cycle of "we only ever hired for Python" from companies and "professors teach Python because it's all they know" from universities? Perhaps this is the most plausible explanation: inertia. Maybe people just want to believe because they are scared they'd have to learn something else.
I am interested in what people think about why Python is popular despite a lot of objective evidence that, as a tech, it's not impressive at all.
This talk by the creator of micropython [0] gives his reasoning for why to implement python on microcontrollers despite it being hundreds of times slower than C. Starts @ 3:00
- it has nice features like list comprehension, generators, and good exception handling
- it has a big, friendly, helpful community with lots of online learning resources
- it has a shallow but long learning curve. It's easy to get started as a beginner, but you never get bored of the language, there's always more advanced features to learn.
- it has native bitwise operations
- it has a good distinction between ints and floats, and ints are arbitrary precision: you're not restricted to long longs or any fixed width. (I'll add that built-in complex numbers are a plus)
- compiled language, so it can be optimized to improve performance
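A few of the bullets above in action (list comprehensions, generators, arbitrary-precision ints, built-in complex numbers):

```python
# List comprehension, then a generator that lazily filters it.
squares = [n * n for n in range(10)]
even_squares = (s for s in squares if s % 2 == 0)
print(sum(even_squares))  # 0 + 4 + 16 + 36 + 64 = 120

# Ints are arbitrary precision: this doesn't overflow.
print(2 ** 100)  # 1267650600228229401496703205376

# Complex numbers are built in.
print((1 + 2j) * (1 - 2j))  # (5+0j)
```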
Emotionally though (once you have the environment set up), it’s just such a breeze to write it. It’s like executable pseudo code with zero boilerplate. You can focus purely on the algorithms and business logic. Compared to many other languages the line count is often 50-80%, even if you include type annotations! This doesn’t only apply to plain imperative code, using the dynamic features you can also turn it into your own DSL where needed.
Then there is obviously the huge ecosystem around it: there is not a single service, file format or database that doesn't have a good Python library for it. While Go might have equally wide library choices, I wouldn't be so sure about Nim; Go, on the other hand, has a lot of other WTFs, even though it provides a lot of good fresh tech.
Would I use it for a big service with potentially lots of performance requirements? No. But there is no doubt why it's so popular. For many applications where the outcome of the program is more important than the performance or environment, like glue code, simple intranet applications or exploratory coding, it is still the perfect choice. You also have to consider what it is replacing; often the alternative would be even worse: bash scripts, Excel or Matlab.
Another way to put it is that it’s a very good Swiss Army knife that is good at everything but not best at anything.
Software/System Developers using 'good enough' stacks/solutions are externalising costs for their own benefit.
Making those externalities transparent will drive a lot of the transformation needed.
You could have had those discussions at any time since upgraded computers and microprocessors became compatible with the previous generation (i.e. the x86 and PC lines).
The point is that software efficiency measurement has never changed: it is human patience. The developers and their bosses decide the user can wait a reasonable time for the provided service. It is one-to-five seconds for non-real-time applications, it is often about a target framerate or refresh in 3D or real-time applications... The optimization stops when the target is met with current hardware, no matter how powerful it is.
This measure drives the use of programming languages, libraries, data load... all getting heavier and heavier when more processing power gets available. And that will probably never change.
Not sure about it? Just open your browser debugger on the Network tab and load the Google homepage (a field, a logo and 2 buttons). I just did: 2.2 MB, loaded in 2 seconds. It is sized for current hardware and 100 Mbps fiber, not for the actually provided service!
Using Pandas in production might make sense if your production system only has a few users. Who cares if 3 people have to wait 20 minutes 4 times a year? But if you're public facing and speed equals user retention then no way can you be that slow.
Almost always yes, because software is almost always used many more times than it is written. Even if you doubled your dev time to only get a 5% increase of speed at runtime, that's usually worth it!
(Of course, capitalism is really bad at dealing with externalities and it makes our society that much worse. But that's an argument against capitalism, not an argument against optimization.)
No. O3 is fine. -ffast-math is dangerous.
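One concrete reason -ffast-math is dangerous: among other things it licenses the compiler to reassociate floating-point arithmetic (and to assume no NaNs or infinities), but FP addition is not associative. The effect is easy to demonstrate in any language:

```python
a, b, c = 1e16, -1e16, 1.0

# Evaluated left to right, the huge terms cancel before the 1.0 is added.
print((a + b) + c)  # 1.0

# Reassociated, as -ffast-math would permit a C compiler to do:
# 1.0 is smaller than one ulp of 1e16, so it vanishes in rounding.
print(a + (b + c))  # 0.0
```

Same expression, two different answers; -O3 never makes that trade on its own.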
https://codegolf.stackexchange.com/questions/215216/high-thr...
An optimized assembler implementation is 500 times faster than a naive Python implementation.
By the way, it is still missing a Javascript entry!
Then rewrite it in a more performant language, or with Cython hooks.
Developing features quickly is greatly aided by nice tools like Python and Pandas. And these tools make it easy to drop into something better when needed.
Eat your cake and have it too!
I doubt the author's C++ implementations beat BLAS/LAPACK, but since they're not shown I can only guess.
I've done stuff like this before but the tooling is really no fun, somewhere between 2 and 3 I'd just write it all in C++.
Changing the interface just to get parallelism out seems not great - give it to the user for free if the array is long enough - but maybe it was more reasonable for the non-trivial real problem.
Note that I'm not saying that their second version of the code wasn't faster, just that this has nothing to do with python vs. pandas.
This is for normal computer tasks-- browser, desktop applications, UI. The exception to this seem to be tasks that were previously bottlenecked by HDD speeds which have been much improved by solid state disks.
It amazes me, for example, that keeping a dozen miscellaneous tabs open in Chrome will eat roughly the same amount of idling CPU time as a dozen tabs did a decade ago, while RAM usage is 5-10x higher.
/s
Sorry for the rude sarcasm, but isn't this a post truly just about the efficiency pitfalls of Python? (or any language / framework choice for that matter)
Of course modern computers are lightning fast. The overhead of every language, framework, and tool will add significant additional compute however, reducing this lightning speed more and more with each complex abstraction level.
I don't know, I guess I'm just surprised this post is so popular, this stuff seems quite obvious.
For instance running unoptimised code can eat a lot of energy unnecessarily, which has an impact on carbon footprint.
Do you think we are going to see regulation in this area akin to car emission bands?
Even to an extent that some algorithms would be illegal to use when there are more optimal ways to perform a task? Like using BubbleSort when QuickSort would perform much better.
it has thankfully started: https://www.blauer-engel.de/en/productworld/resources-and-en...
I think KDE's Okular has been one of the first certified software :-)
To some extent they can claim to deliver a unique feature where there is no replacement for the algorithm they are using.
I overheard this quote recently: 'I'd rather have today's algorithms on an old computer, than a new computer with old algorithms'
I agree, though. I used these tricks a lot in scientific computing; go out into the wider world and people are just unaware. That said, there is a cost to introducing those tricks: your team has to learn new tools and techniques, you have to maintain the build process across different operating systems, etc. Python extension modules on Windows, for example, are still a PITA if you're not able to use Conda.
[0] -- https://www.pola.rs/
As an example, with ILP of ~4 instructions/cycle at 5 GHz we get 20 billion instructions executed each second on a single core. That number is not really tangible, but it is shocking.
Nothing really happened in the end, but it's a funny story around the office.
[…]
Took ~8 seconds to do 1000 calls. Not good at all :(
Isn’t that 8ms per call, way faster than the target performance? Or should that “500ms” be “*500 μs”?
no surprise pandas was "slow"
Believe me I do. This is why my backends are single file native C++ with no Docker/VM/etc. The performance on decent hardware (dedicated servers rented from OVH/Hetzner/Selfhost) is nothing short of amazing.
Every cloud / SaaS is throwing free tier compute capacity at people and it’s just overwhelming (in a good way I suppose)
It could be a bit overkill, but whenever I'm writing code, on top of optimizing data structures and memory allocations I always try to minimize the use of if statements to reduce the chance of branch mispredictions. Seeing woefully unoptimized Python code being used in a production environment just breaks my heart.
That is not to say aiming for generally unbranchy code is not a good thing; it often implies well-designed code and well-chosen data structures anyway.
>double score_array[]
E.g.: call a "ping" function that does no computation using different styles.
- In-process function call.
- In-process virtual ("abstract") function call.
- Cross-process RPC call in the same operating system.
- Cross-VM call on the same box (2 VMs on the same host).
- Remote call across a network switch.
- Remote call across a firewall and a load balancer.
- Remote call across the above, but with HTTPS and JSON encoding.
- Same as above, but across Availability Zones.
In my tests these scenarios have a performance range of about 1 million from the fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case start at around 1.5 milliseconds and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.
To put it another way, a synchronous/sequential stream of remote RPC calls in the typical case can only provide about 300-600 calls per second to a function that does nothing. Performance only goes downhill from here if the function does more work, or calls other remote functions.
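The arithmetic behind that ceiling, using the latencies quoted above:

```python
# A blocking caller can issue at most one RPC per round trip,
# so round-trip latency directly caps sequential throughput.
for rtt_ms in (1.5, 3.0):
    print(f"{rtt_ms} ms round trip -> {1000 / rtt_ms:.0f} sequential calls/sec")
```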
Yet, every enterprise architecture you will ever see, without exception has layers and layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.
I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, and then having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...
If you provide an end result response from your web app to a user's browser in 50ms-100ms (before external latency) then things like 200 microseconds vs 4 milliseconds have less of a meaningful difference. If your app makes a couple of internal service calls (over HTTP inside of the same Kubernetes cluster) it's not breaking the bank in terms of performance even if you're using "slow" frameworks like Rails and get a few million requests a month.
I'm not defending microservices and using Kubernetes for everything but I could see how people don't end up choosing raw performance over everything. Personally my preference is to keep things as a monolith until you can't and in a lot of cases the time never comes to break it up for a large class of web apps. I also really like the idea of getting performance wins when I can (creating good indexes, caching as needed, going the extra mile to ensure a hot code path is efficient, generally avoiding slow things when I have a hunch it'll be slow, etc.) but I wouldn't choose a different language based only on execution speed for most of the web apps I build.
This is a provocative framing but I'm not sure it makes sense. Functions aren't resources; they don't have throughput or utilization. It would be bad if a core could only call the function 300-600 times per second, but that is why we have async programming models, lightweight threads, etc. So that the core can do other stuff during the waiting-on-IO slices of the timeline. Which, as you mention, dominate.
It would also be bad if a user had to wait on 300-600 sequential RPCs to get back a single request, but like... don't do that. Remote endpoints are not for use in tight loops. There are cases where pathological architectures lead to ridiculous fanout/amplification, but even then we are usually talking about parallel tasks.
There is overhead to doing things remotely vs. locally. But the waiting isn't the interesting part. It's serialization, deserialization, copying, tracking which tasks are waiting, etc. A lot of performance work goes on around these topics! Compact and efficient binary wire protocols, zero-copy network stacks, epoll, green threads, async function coloring schemes, etc. The upshot of this work is also, as is typical in web/enterprise backend world, not so much about the latency of individual requests (those are usually simple) but about the number of concurrent requests/users you can serve from a given hardware footprint. That is normally what we're optimizing for. It's a different set of constraints vs. few but individually expensive computations. So of course the solution space looks different too.
Granted, this is exacerbated when architectures don't make a good division between control/compute/data planes.
Control plane, which is exposed to users, should almost certainly be limited to a single (or handful, at most) microservice calls. Preferably to the fastest storage mechanism that you have, such that what latency it does add is minimized entirely.
Converted the list to a sorted array and performed a simple binary search to find it.
A basic Python script could handle about 4,000 records a second.
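A hedged sketch of that approach (the actual script isn't shown): sort once, then use `bisect` for O(log n) lookups instead of a linear scan per record:

```python
import bisect

# Hypothetical data: the reference list records are matched against.
reference = sorted("id_%06d" % i for i in range(100_000))

def contains(key):
    # Binary search: O(log n) per lookup after the one-time sort.
    i = bisect.bisect_left(reference, key)
    return i < len(reference) and reference[i] == key

print(contains("id_012345"))  # True
print(contains("missing"))    # False
```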
Corporate IT reached out to Oracle, which built a custom solution that probably cost a couple hundred thousand.
They tried to force us to use it. They were a little upset when I asked if they could up the performance by a few thousand percent.
I was on their shit list after that until I left.
And then your actual function starts, and returns after roughly 10s.
I think you underestimate just how inefficient enterprise can be. The extra time taken in connections between layers is not even a consideration.
Should be the opposite. Overhead as a proportion of total time goes down the more useful work is involved.
Doesn't this mean it's less of a problem? Like, isn't that good?
With Nixos I switch between Gnome 40 (I do like the Gnome workflow) and i3 w/ some Xfce4 packages, but lately on my older machine the performance of Gnome (especially while running Firefox) is so sluggish in comparison that I may have switched back permanently now.