We rewrote our Rust WASM parser in TypeScript and it got faster (opens in new tab)

(openui.com)

293 pointszahlekhan3mo ago212 comments

212 comments

126 comments · 43 top-level

rented_mule3mo ago· 24 in thread

Something not unlike this happened to me when moving some batch processing code from C++ to Python 1.4 (this was 1997). The batch started finishing about 10x faster. We refused to believe it at first and started looking to make sure the work was actually being done. It was.

The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.

A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.

We immediately started moving the rest of our back end to Python. Most things were slower, but not by much because most of our back end was i/o bound. We soon found out that we could make algorithmic improvements so much more quickly, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.

ameixaseca3mo ago

My experience is the exact opposite.

This was particularly true for one of the projects I've worked with in the past, where Python was chosen as the main language for a monitoring service.

In short, it proved itself to be a disaster: just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.

In the end, the project went ahead for a while more, and we had to do all sorts of mitigations to get the performance impact to be less of an issue.

We did consider replacing it all by a few open source tools written in C and some glue code, the initial prototype used few MBs instead of dozens (or even hundreds) of MBs of memory, while barely registering any CPU load, but in the end it was deemed a waste of time when the whole project was terminated.

czhu123mo ago

Ditto for me. I had gotten so used to building web backends in Ruby and running at 700MB minimum. When I finally got around to writing a rust backend, it registered in the metrics as 0MB, so I thought for sure the application had crashed.

Turns out the metrics just rounded to the nearest 5MB

wiseowise3mo ago

> but in the end it was deemed a waste of time when the whole project was terminated.

The main lesson of the story. Just pick Python and move fast, kids. It doesn’t matter how fast your software is if nobody uses it.

7 more replies

serial_dev3mo ago

Another anecdote, the team couldn’t improve concurrency reliably in Python, they rewrote the service in about a month (ten years ago) in Go, everything ran about 20x faster.

naasking3mo ago

> just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.

Just write the parsing loop in something faster like C or Rust, instead of the whole thing.

Traubenfuchs3mo ago

He struggled with the algorithms, you struggled with the runtime.

You are not the same.

asveikau3mo ago

> After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.

Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, that ends up in a critical path.

andrewflnr3mo ago

Given the context, I'm thinking bad cache keys resulting in spurious cache misses, where the keys are built in some low-level way. Cache misses almost certainly have a bigger asymptotic impact than extra copies, unless that copy constructor is really heavy.

1 more reply

NooneAtAll33mo ago

good ol' shallow-vs-deep copy

branko_d3mo ago

My guess would be bad hashing, resulting in too many collisions.

tda3mo ago

Ome advantage of python is that it is so slow that if you choose the wrong algorithm or data structure that soon gets obvious. And for complicated stuff this is exactly where I find the LLMs struggle. So I make a first version in Python, and only when I am happy with the results and the speed feels reasonable compared to the problem complexity, I ask Claude Code to port the critical parts to Rust.

rabisg3mo ago

The last part is really interesting. It feels like the whole world will soon become Python/JS because thats what LLMs are good at. Very few people will then take the pain of optimizing it

3 more replies

WalterBright3mo ago

> We soon found out that we could make algorithmic improvements so much more quickly

It's true that writing code in C doesn't automatically make it faster.

For example, string manipulation. 0-terminated strings (the default in C) are, frankly, an abomination. String processing code is a tangle of strlen, strcpy, strncpy, strcat, all of which require repeated passes over the string looking for the 0. (Even worse, reloading the string into the cache just to find its length makes things even slower.)

Worse is the problem that, in order to slice a string, you have to malloc some memory and copy the string. And then carefully manage the lifetime of that slice.

The fix is simple - use length-delimited strings. D relies on them to great effect. You can do them in C, but you get no succor from the language. I've proposed a simple enhancement for C to make them work https://www.digitalmars.com/articles/C-biggest-mistake.html but nobody in the C world has any interest in it (which baffles me, it is so simple!).

Another source of slowdown in C is I've discovered over the years that C is not a plastic language, it is a brittle one. The first algorithm you select for a C project gets so welded into it that it cannot be changed without great difficulty. (And we all know that algorithms are the key to speed, not coding details.) Why isn't C plastic?

It's because one cannot switch back and forth between a reference type and a value type without extensively rewriting every use of it. For example:

    struct S { int a; }
    int foo(struct S s) { return s.a; }
    int bar(struct S *s) { return s->a; }

If you want to switch between reference and value, you've got to go through all your code swapping . and ->. It's just too tedious and never happens. In D:

    struct S { int a; }
    int foo(S s) { return s.a; }
    int bar(S *s) { return s.a; }

I discovered while working on D that there is no reason for the C and C++ -> operator to even exist, the . operator covers both bases!

rented_mule3mo ago

What an honor to have Walter Bright respond to my comment! I used Zortech C++ extensively in the late 1980s and early 1990s on OS/2 and Windows. That beautiful black and purple cube-shaped box sat prominently on my bookshelf for many years. Thanks Walter!

garaetjjte3mo ago

Well clearly there is use for these - how do you distinguish what you are accessing in smart-pointer-like types.

1 more reply

casey23mo ago

This is the difference between scripting and programming. If you use C++ as a scripting language you're gonna have a bad time.Of course a scripting language is faster for scripting! That doesn't mean you go full Graham and throw away real programming languages, it just means you aren't writing systems software.

The usual strategy is to write a script then if it's slow see how you could design a program that would

The usual strategy in the real world is to copy paste thousands of lines of C++ code until someone comes along and writes a proper direct solution to the problem.

Of course there are ideas on how to fix this, either writing your own scripting libraries (stb), packages (go/rust/ts), metaprogramming (lisp/jai). As for bugs those are a function of how you choose to write code, the standard way of writing shell it bug prone, the standard way of writing python is less so, not using overloading & going wider in c++ generally helps.

deterministic3mo ago

I use C++ instead of so called 'scripting' languages all the time. I have zero problems doing that and it is lightning fast.

asa4003mo ago

Fun story! Performance is often highly unintuitive, and even counterintuitive (e.g. going from C++ to Python). Very much an art as well as a science.

Crazy how many stories like this I’ve heard of how doing performance work helped people uncover bugs and/or hidden assumptions about their systems.

staticassertion3mo ago

It doesn't come off as unintuitive by my read. They had a bug that led to a massive performance regression. Rewriting the code didn't have that bug so it led to a performance improvement.

They found that they had fewer bugs in Python so they continued with it.

1 more reply

zeroonetwothree3mo ago

I ported Python to C++ one time and it ran 10c faster with 10x less memory usage with no architectural changes

peter_retief3mo ago

I suspect that you used highly optimized algorithms written for python, like the vector algorithms in numpy? You will struggle to write better code, at least I would.

masklinn3mo ago

Python 1.4 would be mid-late 90s long before numpy and vector algorithms would have been available.

I suspect it’s more likely to be something like passing std::string by value not realising that would copy the string every time, especially with the statement that the mistake would be hard to express in Python.

1 more reply

shevy-java3mo ago

> We immediately started moving the rest of our back end to Python. Most things were slower, but not by much because most of our back end was i/o bound.

Would be kind of cool if e. g. python or ruby could be as fast as C or C++.

I wonder if this could be possible, assuming we could modify both to achieve that as outcome. But without having a language that would be like C or C++. Right now there is a strange divide between "scripting" languages and compiled ones.

nubg3mo ago

@dang this is an ai slop account, check his other comments

blundergoat3mo ago· 15 in thread

The real win here isn't TS over Rust, it's the O(N²) -> O(N) streaming fix via statement-level caching. That's a 3.3x improvement on its own, independent of language choice. The WASM boundary elimination is 2-4x, but the algorithmic fix is what actually matters for user-perceived latency during streaming. Title undersells the more interesting engineering imo.

zahrevsky3mo ago

They even directly conclude at the end of the article that improvements in algorithm are more important than the choice of language:

> Algorithmic complexity improvements dominate language-level optimisations. Going from O(N²) to O(N) in the streaming case had a larger practical impact than switching from WASM to TypeScript.

Yet they still have chosen to put the “Rust rewrite” part in the title. I almost think it's a click bait.

nulltrace3mo ago

Yeah the algorithmic fix is doing most of the work here. But call that parser hundreds of times on tiny streaming chunks and the WASM boundary cost per call adds up fast. Same thing would happen with C++ compiled to WASM.

1 more reply

azakai3mo ago

O(N²) -> O(N) was 3.3x faster, but before that, eliminating the boundary (replacing wasm with JS) led to speedups of 2.2x, 4.6x, 3.0x (see one table back).

It looks like neither is the "real win". both the language and the algorithm made a big difference, as you can see in the first column in the last table - going to wasm was a big speedup, and improving the algorithm on top of that was another big speedup.

1 more reply

socalgal23mo ago

same for uv but no one takes that message. They just think "rust rulez!" and ignore that all of uv's benefits are algo, not lang.

estebank3mo ago

Some architectures are made easier by the choice of implementation language.

2 more replies

coldtea3mo ago

Just the fact that I can install a single binary is 10x better than an equally fast Python implementation.

rowanG0773mo ago

That's a pretty big claim. I don't doubt that a lot of uv's benefits are algo. But everything? Considering that running non IO-bound native code should be an order of magnitude faster than python.

2 more replies

catlifeonmars3mo ago

You’re not wrong, but that win would not get as many views. It’s not clickbaity enough

wolvesechoes3mo ago

> The real win here isn't TS over Rust

Kinda is. We came up with abstractions to help reason about what really matters. The more you need to deal with auxillary stuff (allocations, lifetimes), more likely you will miss the big issue.

coldtea3mo ago

The opposite: the more you rely on abstractions the more you miss the lower level optimization opportunities and loose understanding of algorithms and hardware.

1 more reply

sroussey3mo ago

Yeah, though the n^2 is overstating things.

One thing I noticed was that they time each call and then use a median. Sigh. In a browser. :/ With timing attack defenses build into the JS engine.

fn-mote3mo ago

For those of us not in the know, what are we expecting the results of the defenses to be here?

1 more reply

adastra223mo ago

No AI generated comments on HN please.

Aurornis3mo ago

> Title undersells the more interesting engineering imo.

Thanks for cutting through the clickbait. The post is interesting, but I'm so tired of being unnecessarily clickbaited into reading articles.

shmerl3mo ago

More like a misleading clickbait.

nine_k3mo ago· 8 in thread

"We rewrote this code from language L to language M, and the result is better!" No wonder: it was a chance to rectify everything that was tangled or crooked, avoid every known bad decision, and apply newly-invented better approaches.

So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.

MiddleEndian3mo ago

Now they just need a third party who's never seen the original to rewrite their TypeScript solution in Rust for even more gains.

nine_k3mo ago

Indeed! But only after a year or so of using it in production, so that the drawbacks would be discovered.

azakai3mo ago

You're generally right - rewrites let you improve the code - but they do have an actual reason the new language was better: avoiding copies on the boundary.

They say they measured that cost, and it was most of the runtime in the old version (though they don't give exact numbers). That cost does not exist at all in the new version, simply because of the language.

necovek3mo ago

It's doing copies and (de)serialization on both sides into native data types.

If they used raw byte structures, implemented the caching improvements on the wasm side, the copies might not be as bad.

But they still have an issue with multi-language stack: complexity also has a cost.

Python/C combo does not have this issue because you can work with Python types natively in C, but otherwise, this is a cross-language conversion issue, and not a Rust issue at all.

awesome_dude3mo ago

I think that they were honest about that to a degree, they pointed out that one source of the speed up was caused by the python fixing a big they hadn't noticed in the C++

Edit: fixed phone typos

rabisg3mo ago

One of the authors here. While that’s generally true, in this case it wasn’t time that helped us learn what worked. It was a nagging sense that the architecture wasn’t right, just days before launch, along with heavy instrumentation to test our assumptions.

baranul3mo ago

Truth. You can see improvement, even rewriting code in the same language.

johnisgood3mo ago

I have been saying this for a while now (thought it was obvious), and often I get downvoted when I point this out.

slowhadoken3mo ago· 5 in thread

Am I mistaken or isn’t TypeScript just Golang under the hood these days?

jeremyjh3mo ago

There is too much wrong here to call it a mistake.

iainmerrick3mo ago

Hmm, there's an in-progress rewrite of the TypeScript compiler in Go; is that what you mean?

I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).

koakuma-chan3mo ago

npm i -D @typescript/native-preview

You can use it today.

1 more reply

wiseowise3mo ago

Yes, you've uncovered grand conspiracy.

slowhadoken3mo ago

It’s funny because a decade ago people said I was crazy for thinking that Oracle owning JS was going to become an issue in the future.

simonbw3mo ago· 4 in thread

Yeah if you're serializing and deserializing data across the JS-WASM boundary (or actually between web workers in general whether they're WASM or not) the data marshaling costs can add up. There is a way of sharing memory across the boundary though without any marshaling: TypedArrays and SharedArrayBuffers. TypedArrays let you transfer ownership of the underlying memory from one worker (or the main thread) to another without any copying. SharedArrayBuffers allow multiple workers to read and write to the same contiguous chunk of memory. The downside is that you lose all the niceties of any JavaScript types and you're basically stuck working with raw bytes.

You still do get some latency from the event loop, because postMessage gets queued as a MacroTask, which is probably on the order of 10μs. But this is the price you have to pay if you want to run some code in a non-blocking way.

osullivj3mo ago

Strongly agree from an Emscripten C++ wasm pov: it's key to minimise emscripten::val roundtrips. Caches must be designed for rectilinear data geometry, and SharedArrayBuffers are the way for bulk data. But only JS allows us to express asynchrony, so we need an on_completion callback design at the lang boundary.

tankenmate3mo ago

Indeed a whole class of issues become moot if you just don't use javascript anywhere. In the browser world this is obviously difficult/impossible; I look forward to the day when WASM can run natively in a browser and doesn't need javascript at all, DOM, network, etc, etc. On the server side? Just steer clear of the javascript ecosystem altogether.

fHr3mo ago

So the actual processing is faster in rust/c/c++ but the marshaling costs are so big so ts is faster in this case? No vlue how something like swc does this but there it's way faster then babel.

jesse__3mo ago

This should be the top comment

joaohaas3mo ago· 3 in thread

God I hate AI writing.

That final summary benchmark means nothing. It mentions 'baseline' value for the 'Full-stream total' for the rust implementation, and then says the `serde-wasm-bindgen` is '+9-29% slower', but it never gives us the baseline value, because clearly the only benchmark it did against the Rust codebase was the per-call one.

Then it mentions: "End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."

But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.

I really think the guy just prompted claude to "get this shit fast and then publish a blog post".

chvish3mo ago

This. It’s so annoying to read these types of blogs now where the writer clearly didn’t put the effort to understand things fully or atleast review the blog their LLM wrote. Who is this useful for?

JimDabell3mo ago

The article as a whole makes no sense. They are generating UI with an LLM. How fast the UI appears to the user is going to be completely dictated by the speed of the LLM, not the speed of the serialisation.

rabisg3mo ago

as an author of the blog - ouch did a little bit more than prompt claude but a lot of claude prompting was definitely involved

I understand your frustration with AI writing though. We are a small team and given our roadmap it was either use LLMs to help collate all the internal benchmark results file into a blog or never write it so we chose the former. This was a genuinely surprising and counterintuitive result for us, which is why we wanted to share it. Happy to clarify any of the numbers if helpful.

bulbar3mo ago· 3 in thread

Is this an outlier or has Rust started to be part of the establishment and being 'old' so that people want to share their "moving away from Rust" stories?

I didn't mind reading articles that are not about how Rust is great in theory (and maybe practice).

quotemstr3mo ago

There's a certain segment of the industry that's always chasing the newest thing. Many of them like Zig for some ghastly reason.

That said, Rust does have real problems. Manual memory management sucks. People think GC is expensive? Well, keep in mind malloc() and free() take global locks! People just have totally bogus mental models of what drives performance. These models lead them to technical nonsense.

zozbot2343mo ago

This story is about moving away from WASM for an application that's unsuitable for it. It's not really about Rust.

notnullorvoid3mo ago

It's not an unsuitable application for WASM. They could've drastically reduced the WASM boundary impact if instead of mapping to JSON in Rust they streamed out structured bytes to JS then mapped to JSON there. And the streaming fix was language independent.

So it's more so a story about architectural mistakes.

pjmlp3mo ago· 2 in thread

This is why, when a programming language already has tooling for compilers, being it ahead of time, or dynamic, it pays off to first go around validating algorithms and data structures before a full rewrite.

Additionally even after those options are exhausted, only a key parts might need a rewrite, not the whole thing.

However, I wonder how many care about actually learning about algorithms, data structures and mechanical sympathy in the age of Electron apps.

It feels quite often that a rewrite is chosen, because knowing how to actually apply those skills is the CS stuff many think isn't worthwhile learning about.

coldtea3mo ago

>However, I wonder how many care about actually learning about algorithms, data structures and mechanical sympathy in the age of Electron apps.

Never mind the age of Electron apps, even fewer care about those in the age of agents.

pjmlp3mo ago

Agreed, however I would assert that in the age of agents, programming languages will become irrelevant to most, other those lucky enough druids to write AI runtime stack, at the AI overlords.

And those will still care about CS.

szmarczak3mo ago· 2 in thread

> Attempted Fix: Skip the JSON Round-Trip > We integrated serde-wasm-bindgen

So you're reinventing JSON but binary? V8 JSON nowadays is highly optimized [1] and can process gigabytes per second [2], I doubt it is a bottleneck here.

[1] https://v8.dev/blog/json-stringify [2] https://github.com/simdjson/simdjson

kam3mo ago

No, serde-wasm-bindgen implements the serde Serializer interface by calling into JS to directly construct the JS objects on the JS heap without an intermediate serialization/deserialization. You pay the cost of one or more FFI calls for every object though.

https://docs.rs/serde-wasm-bindgen/

szmarczak3mo ago

Indeed, you're right. However, it still needs to encode and decode strings. WASM just needs native interop.

vmsp3mo ago· 2 in thread

Not directly related to the post but what does OpenUI do? I'm finding it interesting but hard to understand. Is it an intermediate layer that makes LLMs generate better UI?

rabisg3mo ago

Its the library that bridges the gap between LLMs and live UI. Best example would be to imagine you want to build interactive charts within your AI agent (like Claude)

The most obvious approach would be to let LLMs generate code and render it but that introduces problems like safety, UI consistency and speed. OpenUI solves those problems and provides a safe, consistent and token optimized runtime for the LLMs to render live UI

aquariusDue3mo ago

Is it kinda similar to the new GenUI SDK for Flutter in that sense?

https://docs.flutter.dev/ai/genui

1 more reply

nallana3mo ago· 2 in thread

Why not a shared buffer? Serializing into JSON on this hot path should be entirely avoidable

mavdol043mo ago

I think a shared array just avoids the copy, not the serialization which is the main problem as they showed with serde-wasm-bindgen test

notnullorvoid3mo ago

You can avoid the serialization in WASM by pushing structured bytes to the SharedArrayBuffer, then do serialization in JS which should be relatively cheap compared to pushing JSON strings across the boundary.

measurablefunc3mo ago· 2 in thread

I tried a similar experiment recently w/ FFT transform for wav files in the browser and javascript was faster than wasm. It was mostly vibe coded Rust to wasm but FFT is a well-known algorithm so I don't think there were any low hanging performance improvements left to pick.

wintermute42823mo ago

It looks like FFTW3 is working on wasm support: https://github.com/FFTW/fftw3/issues/293

You could also try pretty fast fft: https://github.com/JorenSix/pffft.wasm

measurablefunc3mo ago

It was just an experiment in vibe coding. It's easy enough to try different architectures w/ AI coding but what I wanted to see was whether naive numeric calculations were faster w/ wasm or javascript & it turned out that javascript was faster so the performance trade-off between wasm & javascript is not as simple as between a high-level language like python & SIMD optimized C/assembly.

dmix3mo ago· 2 in thread

That blog post design is very nice. I like the 'scrollspy' sidebar which highlights all visible headings.

Claude tells me this is https://www.fumadocs.dev/

sroussey3mo ago

Interesting, thanks. I need make some good docs soon.

dmix3mo ago

Good documentation is always worth the effort. Markdown explaining your products is gold these days with LLMs.

mohsen13mo ago· 2 in thread

When there is a solid test harness, AI Coding can do magic!

It was able to beat XZ on its own game by a good margin:

https://github.com/mohsen1/fesh

applfanboysbgon3mo ago

> I had no idea how any of this works.

This is apparent. xz's own game is not "a specialized compression pre-processor for x86_64 ELF binaries.". xz's own game is a general-purpose compression utility suited for a range of tasks, not optimized for one ridiculously specific domain. Also, any compression benchmark really ought to include speed of de/compression, not only compression ratio, as compression algorithms occupy along a scale trying to maximize one trade-off or another.

mohsen13mo ago

I never claimed to beat xz as a general-purpose compressor. .tar.xz is the dominant format for Linux source tarballs and distro packages. So optimizing for ELF + x86_64 is optimizing for a very real and common case, not some toy benchmark.

btw goal of the project was not building a production ready solution. It was curious case of black box software development. Compression is great because input and output are precise bits. As for speed, I think it's comparable since it's using most of XZ infra anyways.

nssnsjsjsjs3mo ago· 2 in thread

Rewrite bias. Yoy want to also rewrite the Rust one in Rust for comparison.

jeremyjh3mo ago

It would be surprising if rewriting in Rust could change the WASM boundary tax that the article identified as the actual problem.

rabisg3mo ago

(author here) We'd be really surprised if a rewrite could fix the boundary tax but if it does, we'd happily move over to it. People (including me) really underestimate how insanely fast browser's JSON.parse is

bluelightning2k3mo ago· 2 in thread

Great write up. It feels like craft in the age of slop.

Not sold about the fundamental idea of OpenUI though. XML is a great fit for DSLs and UI snippets.

rabisg3mo ago

We tried all formats - XML, json, jsonl, even toon - before deciding that we need to invest in OpenUI Lang

The primary motivation was speed and schema cohesion. We were running a JSON based format, Thesys C1, in production for a year before we realized we cannot add features fast enough because we were fighting the LLMs at multiple levels. It's probably too much to write in a comment but we'd like to write about the motivation and all the things we tried ona a separate blog soon

twoodfin3mo ago

Are you kidding? To the extent this was “crafted” it was by an LLM from somebody’s notes in a prompt.

The other day, someone linked back to this 2018 post on finding a cache coherency bug in the Xbox 360 CPU:

https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...

So much more genuinely engaging than any of the AI-“enhanced” sloppy, confused, trite writing that gets to the front page here daily because it’s been hyper-optimized for upvotes.

evmar3mo ago· 1 in thread

By the way, I did a deeper dive on the problem of serializing objects across the Rust/JS boundary, noticed the approach used by serde wasn’t great for performance, and explored improving it here: https://neugierig.org/software/blog/2024/04/rust-wasm-to-js....

slopinthebag3mo ago

Did you try something like msgpack or bebop?

slopinthebag3mo ago· 1 in thread

This article is obviously AI generated and besides being jarring to read, it makes me really doubt its validity. You can get substantially faster parsing versus `JSON.parse()` by parsing structured binary data, and it's also faster to pass a byte array compared to a JSON string from wasm to the browser. My guess is not only this article was AI generated, but also their benchmarks, and perhaps the implementation as well.

StilesCrisis3mo ago

It's vibe code all the way down!

caderosche3mo ago· 1 in thread

What is the purpose of the Rust WASM parser? Didn't understand that easily from the article. Would love a better explanation.

joshuanapoli3mo ago

They use a bespoke language to define LLM-generated UI components. I think that this is supposed to prevent exfiltration if the LLM is prompt-injected. In any case, the parser compiles chunks streaming from the LLM to build a live UI. The WASM parser restarted from the beginning upon each chunk received. Fixing this algorithm to work more incrementally (while porting from Rust to TypeScript) improved performance a lot.

spankalee3mo ago

I was wondering why I hadn't heard of Open UI doing anything with WASM.

This new company chose a very confusing name that has been used by the Open UI W3C Community Group for over 5 years.

https://open-ui.org/

Open UI is the standards group responsible for HTML having popovers, customizable select, invoker commands, and accordions. They're doing great work.

moomin3mo ago

“We saw huge speed-ups when changing technology.”

Looks inside

“The old implementation had some really inappropriate choices.”

Every time.

gavinray3mo ago

Why weren't you able to use WASM shared heaps to get zero-copy behavior?

AFAIK, you can create a shared memory block between WASM <-> JS:

https://developer.mozilla.org/en-US/docs/WebAssembly/Referen...

Then you'd only need to parse the SharedArrayBuffer at the end on the JS side

jeremyjh3mo ago

> The openui-lang parser converts a custom DSL emitted by an LLM into a React component tree.

> converts internal AST into the public OutputNode format consumed by the React renderer

Why not just have the LLM emit the JSON for OutputNode ? Why is a custom "language" and parser needed at all? And yes, there is a cost for marshaling data, so you should avoid doing it where possible, and do it in large chunks when its not possible to avoid. This is not an unknown phenomenon.

mpajares3mo ago

Had the opposite experience. Our JS FEM solver (~550ms per load case) was rewritten in Rust and dropped to ~270ms. But we compile to native.exe, not WASM — we call it via stdin/stdout with JSON from a Node.js compute engine. Tried the WASM route first but the serialization overhead for large stiffness matrices ate the gains, exactly like this article describes. Native binary + stdin/stdout turned out to be the sweet spot: no boundary tax, no FFI, and you get full native SIMD. The sparse solver variant (sprs crate, COO/CSC assembly) scales even better for larger models.

envguard3mo ago

The WASM story is interesting from a security angle too. WASM modules inheriting the host's memory model means any parsing bugs that trigger buffer overreads in the Rust code could surface in ways that are harder to audit at the JS boundary. Moving to native TS at least keeps the attack surface in one runtime, even if the theoretical memory safety guarantees go down.

athrowaway3z3mo ago

Its also worth underlining that it's not just "The parsing computation is fast enough that V8's JIT eliminates any Rust advantage", but specifically that this kind of straight-forward well-defined data structures and mutation, without any strange eval paths or global access is going to be JITed to near native speed relatively easily.

horacemorace3mo ago

I’m more of a dabbler dev/script guy than a dev but Every. single. thing I ever write in javascript ends up being incredibly fast. It forces me to think in callbacks and events and promises. Python and C (or async!) seem easy and sorta lazy in comparison.

ivanjermakov3mo ago

Good software is usually written on 2nd+ try.

jesse__3mo ago

This somehow reminds me of the days when the fastest way to deep copy an object in javascript was to round trip through toString. I thought that was gross then, and I think this is gross now

Dwedit3mo ago

JS and WASM share the main arraybuffer. It's just very not-javascript-like to try to use an arraybuffer heap, because then you don't have strings or objects, just index,size pairs into that arraybuffer.

Anyway, Javascript is no stranger to breaking changes. Compare Chromium 47 to today. Just add actual integers as another breaking change, then WASM becomes almost unnecessary.

sakesun3mo ago

I heard a lot of similar stories in the past when I started using Python 20+ years ago. A number of people claimed their solutions got faster when develop in Python, mainly because Python make it easier to quickly pivot to experiment with various alternative methods, hence finally yield at more efficient outcome at the end.

mwcampbell3mo ago

I hope we can still get to a point where wasm modules can directly access the web platform APIs and get JS out of the picture entirely. After all, those APIs themselves are implemented in C++ (and maybe some Rust now).

fHr3mo ago

I almost can't believe this swc for example is 80x faster then babeljs.

owenpalmer3mo ago

So this is an issue with WASM/JS interop, not with Rust per se?

gettingoverit3mo ago

In ye olden days of WASM just added to the browser, the difference between native JS and boost::spirit in WASM was x200.

In their worst case it was just x5. We clearly have some progress here.

LunaSea3mo ago

This has been known by Node.js developers for a while with many C++ core and NPM modules being rewritten in JavaScript to improve performance.

kennykartman3mo ago

I dream of the day in which there is no need to pass by JS and Wasm can do all the job by itself. Meanwhile, we are stuck.

hackwaly_new3mo ago

You don't need rewriting if you are using MoonBit. It gives you wasm, wasm-gc, js at once.

marcosdumay3mo ago

It would be great if people stopped dismissing the problem that WASM not being a first-class runtime for the web causes.

shevy-java3mo ago

So ...

Rust.

WASM.

TypeScript.

I am slowly beginning to understand why WASM did not really succeed.

rpodraza3mo ago

Press x to doubt

SCLeo3mo ago

They should rewrite it in rust again to get another 3x performance increase /s

neuropacabra3mo ago

This is very unusual statement :-D

j / k navigate · click thread line to collapse

212 comments

126 comments · 43 top-level

rented_mule3mo ago· 24 in thread

ameixaseca3mo ago

My experience is the exact opposite.

This was particularly true for one of the projects I've worked with in the past, where Python was chosen as the main language for a monitoring service.

In short, it proved itself to be a disaster: just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.

In the end, the project went ahead for a while more, and we had to do all sorts of mitigations to get the performance impact to be less of an issue.

czhu123mo ago

Turns out the metrics just rounded to the nearest 5MB

wiseowise3mo ago

> but in the end it was deemed a waste of time when the whole project was terminated.

The main lesson of the story. Just pick Python and move fast, kids. It doesn’t matter how fast your software is if nobody uses it.

7 more replies

serial_dev3mo ago

Another anecdote, the team couldn’t improve concurrency reliably in Python, they rewrote the service in about a month (ten years ago) in Go, everything ran about 20x faster.

naasking3mo ago

> just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.

Just write the parsing loop in something faster like C or Rust, instead of the whole thing.

Traubenfuchs3mo ago

He struggled with the algorithms, you struggled with the runtime.

You are not the same.

asveikau3mo ago

Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, that ends up in a critical path.

andrewflnr3mo ago

1 more reply

NooneAtAll33mo ago

good ol' shallow-vs-deep copy

branko_d3mo ago

My guess would be bad hashing, resulting in too many collisions.

tda3mo ago

rabisg3mo ago

The last part is really interesting. It feels like the whole world will soon become Python/JS because thats what LLMs are good at. Very few people will then take the pain of optimizing it

3 more replies

WalterBright3mo ago

> We soon found out that we could make algorithmic improvements so much more quickly

It's true that writing code in C doesn't automatically make it faster.

Worse is the problem that, in order to slice a string, you have to malloc some memory and copy the string. And then carefully manage the lifetime of that slice.

It's because one cannot switch back and forth between a reference type and a value type without extensively rewriting every use of it. For example:

    struct S { int a; }
    int foo(struct S s) { return s.a; }
    int bar(struct S *s) { return s->a; }

If you want to switch between reference and value, you've got to go through all your code swapping . and ->. It's just too tedious and never happens. In D:

    struct S { int a; }
    int foo(S s) { return s.a; }
    int bar(S *s) { return s.a; }

I discovered while working on D that there is no reason for the C and C++ -> operator to even exist, the . operator covers both bases!

rented_mule3mo ago

garaetjjte3mo ago

Well clearly there is use for these - how do you distinguish what you are accessing in smart-pointer-like types.

1 more reply

casey23mo ago

The usual strategy is to write a script then if it's slow see how you could design a program that would

The usual strategy in the real world is to copy paste thousands of lines of C++ code until someone comes along and writes a proper direct solution to the problem.

deterministic3mo ago

I use C++ instead of so called 'scripting' languages all the time. I have zero problems doing that and it is lightning fast.

asa4003mo ago

Fun story! Performance is often highly unintuitive, and even counterintuitive (e.g. going from C++ to Python). Very much an art as well as a science.

Crazy how many stories like this I’ve heard of how doing performance work helped people uncover bugs and/or hidden assumptions about their systems.

staticassertion3mo ago

It doesn't come off as unintuitive by my read. They had a bug that led to a massive performance regression. Rewriting the code didn't have that bug so it led to a performance improvement.

They found that they had fewer bugs in Python so they continued with it.

1 more reply

zeroonetwothree3mo ago

I ported Python to C++ one time and it ran 10c faster with 10x less memory usage with no architectural changes

peter_retief3mo ago

I suspect that you used highly optimized algorithms written for python, like the vector algorithms in numpy? You will struggle to write better code, at least I would.

masklinn3mo ago

Python 1.4 would be mid-late 90s long before numpy and vector algorithms would have been available.

1 more reply

shevy-java3mo ago

> We immediately started moving the rest of our back end to Python. Most things were slower, but not by much because most of our back end was i/o bound.

Would be kind of cool if e. g. python or ruby could be as fast as C or C++.

nubg3mo ago

@dang this is an ai slop account, check his other comments

blundergoat3mo ago· 15 in thread

zahrevsky3mo ago

They even directly conclude at the end of the article that improvements in algorithm are more important than the choice of language:

> Algorithmic complexity improvements dominate language-level optimisations. Going from O(N²) to O(N) in the streaming case had a larger practical impact than switching from WASM to TypeScript.

Yet they still have chosen to put the “Rust rewrite” part in the title. I almost think it's a click bait.

nulltrace3mo ago

1 more reply

azakai3mo ago

O(N²) -> O(N) was 3.3x faster, but before that, eliminating the boundary (replacing wasm with JS) led to speedups of 2.2x, 4.6x, 3.0x (see one table back).

1 more reply

socalgal23mo ago

same for uv but no one takes that message. They just think "rust rulez!" and ignore that all of uv's benefits are algo, not lang.

estebank3mo ago

Some architectures are made easier by the choice of implementation language.

2 more replies

coldtea3mo ago

Just the fact that I can install a single binary is 10x better than an equally fast Python implementation.

rowanG0773mo ago

That's a pretty big claim. I don't doubt that a lot of uv's benefits are algo. But everything? Considering that running non IO-bound native code should be an order of magnitude faster than python.

2 more replies

catlifeonmars3mo ago

You’re not wrong, but that win would not get as many views. It’s not clickbaity enough

wolvesechoes3mo ago

> The real win here isn't TS over Rust

Kinda is. We came up with abstractions to help reason about what really matters. The more you need to deal with auxillary stuff (allocations, lifetimes), more likely you will miss the big issue.

coldtea3mo ago

The opposite: the more you rely on abstractions the more you miss the lower level optimization opportunities and loose understanding of algorithms and hardware.

1 more reply

sroussey3mo ago

Yeah, though the n^2 is overstating things.

One thing I noticed was that they time each call and then use a median. Sigh. In a browser. :/ With timing attack defenses build into the JS engine.

fn-mote3mo ago

For those of us not in the know, what are we expecting the results of the defenses to be here?

1 more reply

adastra223mo ago

No AI generated comments on HN please.

Aurornis3mo ago

> Title undersells the more interesting engineering imo.

Thanks for cutting through the clickbait. The post is interesting, but I'm so tired of being unnecessarily clickbaited into reading articles.

shmerl3mo ago

More like a misleading clickbait.

nine_k3mo ago· 8 in thread

So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.

MiddleEndian3mo ago

Now they just need a third party who's never seen the original to rewrite their TypeScript solution in Rust for even more gains.

nine_k3mo ago

Indeed! But only after a year or so of using it in production, so that the drawbacks would be discovered.

azakai3mo ago

You're generally right - rewrites let you improve the code - but they do have an actual reason the new language was better: avoiding copies on the boundary.

necovek3mo ago

It's doing copies and (de)serialization on both sides into native data types.

If they used raw byte structures, implemented the caching improvements on the wasm side, the copies might not be as bad.

But they still have an issue with multi-language stack: complexity also has a cost.

Python/C combo does not have this issue because you can work with Python types natively in C, but otherwise, this is a cross-language conversion issue, and not a Rust issue at all.

awesome_dude3mo ago

I think that they were honest about that to a degree, they pointed out that one source of the speed up was caused by the python fixing a big they hadn't noticed in the C++

Edit: fixed phone typos

rabisg3mo ago

baranul3mo ago

Truth. You can see improvement, even rewriting code in the same language.

johnisgood3mo ago

I have been saying this for a while now (thought it was obvious), and often I get downvoted when I point this out.

slowhadoken3mo ago· 5 in thread

Am I mistaken or isn’t TypeScript just Golang under the hood these days?

jeremyjh3mo ago

There is too much wrong here to call it a mistake.

iainmerrick3mo ago

Hmm, there's an in-progress rewrite of the TypeScript compiler in Go; is that what you mean?

I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).

koakuma-chan3mo ago

npm i -D @typescript/native-preview

You can use it today.

1 more reply

wiseowise3mo ago

Yes, you've uncovered grand conspiracy.

slowhadoken3mo ago

It’s funny because a decade ago people said I was crazy for thinking that Oracle owning JS was going to become an issue in the future.

simonbw3mo ago· 4 in thread

osullivj3mo ago

tankenmate3mo ago

fHr3mo ago

So the actual processing is faster in rust/c/c++ but the marshaling costs are so big so ts is faster in this case? No vlue how something like swc does this but there it's way faster then babel.

jesse__3mo ago

This should be the top comment

joaohaas3mo ago· 3 in thread

God I hate AI writing.

Then it mentions: "End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."

But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.

I really think the guy just prompted claude to "get this shit fast and then publish a blog post".

chvish3mo ago

JimDabell3mo ago

rabisg3mo ago

as an author of the blog - ouch did a little bit more than prompt claude but a lot of claude prompting was definitely involved

bulbar3mo ago· 3 in thread

Is this an outlier or has Rust started to be part of the establishment and being 'old' so that people want to share their "moving away from Rust" stories?

I didn't mind reading articles that are not about how Rust is great in theory (and maybe practice).

quotemstr3mo ago

There's a certain segment of the industry that's always chasing the newest thing. Many of them like Zig for some ghastly reason.

zozbot2343mo ago

This story is about moving away from WASM for an application that's unsuitable for it. It's not really about Rust.

notnullorvoid3mo ago

So it's more so a story about architectural mistakes.

pjmlp3mo ago· 2 in thread

Additionally even after those options are exhausted, only a key parts might need a rewrite, not the whole thing.

However, I wonder how many care about actually learning about algorithms, data structures and mechanical sympathy in the age of Electron apps.

It feels quite often that a rewrite is chosen, because knowing how to actually apply those skills is the CS stuff many think isn't worthwhile learning about.

coldtea3mo ago

>However, I wonder how many care about actually learning about algorithms, data structures and mechanical sympathy in the age of Electron apps.

Never mind the age of Electron apps, even fewer care about those in the age of agents.

pjmlp3mo ago

Agreed, however I would assert that in the age of agents, programming languages will become irrelevant to most, other those lucky enough druids to write AI runtime stack, at the AI overlords.

And those will still care about CS.

szmarczak3mo ago· 2 in thread

> Attempted Fix: Skip the JSON Round-Trip > We integrated serde-wasm-bindgen

So you're reinventing JSON but binary? V8 JSON nowadays is highly optimized [1] and can process gigabytes per second [2], I doubt it is a bottleneck here.

[1] https://v8.dev/blog/json-stringify [2] https://github.com/simdjson/simdjson

kam3mo ago

https://docs.rs/serde-wasm-bindgen/

szmarczak3mo ago

Indeed, you're right. However, it still needs to encode and decode strings. WASM just needs native interop.

vmsp3mo ago· 2 in thread

Not directly related to the post but what does OpenUI do? I'm finding it interesting but hard to understand. Is it an intermediate layer that makes LLMs generate better UI?

rabisg3mo ago

Its the library that bridges the gap between LLMs and live UI. Best example would be to imagine you want to build interactive charts within your AI agent (like Claude)

aquariusDue3mo ago

Is it kinda similar to the new GenUI SDK for Flutter in that sense?

https://docs.flutter.dev/ai/genui

1 more reply

nallana3mo ago· 2 in thread

Why not a shared buffer? Serializing into JSON on this hot path should be entirely avoidable

mavdol043mo ago

I think a shared array just avoids the copy, not the serialization which is the main problem as they showed with serde-wasm-bindgen test

notnullorvoid3mo ago

measurablefunc3mo ago· 2 in thread

wintermute42823mo ago

It looks like FFTW3 is working on wasm support: https://github.com/FFTW/fftw3/issues/293

You could also try pretty fast fft: https://github.com/JorenSix/pffft.wasm

measurablefunc3mo ago

dmix3mo ago· 2 in thread

That blog post design is very nice. I like the 'scrollspy' sidebar which highlights all visible headings.

Claude tells me this is https://www.fumadocs.dev/

sroussey3mo ago

Interesting, thanks. I need make some good docs soon.

dmix3mo ago

Good documentation is always worth the effort. Markdown explaining your products is gold these days with LLMs.

mohsen13mo ago· 2 in thread

When there is a solid test harness, AI Coding can do magic!

It was able to beat XZ on its own game by a good margin:

https://github.com/mohsen1/fesh

applfanboysbgon3mo ago

> I had no idea how any of this works.

mohsen13mo ago

nssnsjsjsjs3mo ago· 2 in thread

Rewrite bias. Yoy want to also rewrite the Rust one in Rust for comparison.

jeremyjh3mo ago

It would be surprising if rewriting in Rust could change the WASM boundary tax that the article identified as the actual problem.

rabisg3mo ago

bluelightning2k3mo ago· 2 in thread

Great write up. It feels like craft in the age of slop.

Not sold about the fundamental idea of OpenUI though. XML is a great fit for DSLs and UI snippets.

rabisg3mo ago

We tried all formats - XML, json, jsonl, even toon - before deciding that we need to invest in OpenUI Lang

twoodfin3mo ago

Are you kidding? To the extent this was “crafted” it was by an LLM from somebody’s notes in a prompt.

The other day, someone linked back to this 2018 post on finding a cache coherency bug in the Xbox 360 CPU:

https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...

So much more genuinely engaging than any of the AI-“enhanced” sloppy, confused, trite writing that gets to the front page here daily because it’s been hyper-optimized for upvotes.

evmar3mo ago· 1 in thread

slopinthebag3mo ago

Did you try something like msgpack or bebop?

slopinthebag3mo ago· 1 in thread

StilesCrisis3mo ago

It's vibe code all the way down!

caderosche3mo ago· 1 in thread

What is the purpose of the Rust WASM parser? Didn't understand that easily from the article. Would love a better explanation.

joshuanapoli3mo ago

spankalee3mo ago

I was wondering why I hadn't heard of Open UI doing anything with WASM.

This new company chose a very confusing name that has been used by the Open UI W3C Community Group for over 5 years.

https://open-ui.org/

Open UI is the standards group responsible for HTML having popovers, customizable select, invoker commands, and accordions. They're doing great work.

moomin3mo ago

“We saw huge speed-ups when changing technology.”

Looks inside

“The old implementation had some really inappropriate choices.”

Every time.

gavinray3mo ago

Why weren't you able to use WASM shared heaps to get zero-copy behavior?

AFAIK, you can create a shared memory block between WASM <-> JS:

https://developer.mozilla.org/en-US/docs/WebAssembly/Referen...

Then you'd only need to parse the SharedArrayBuffer at the end on the JS side

jeremyjh3mo ago

> The openui-lang parser converts a custom DSL emitted by an LLM into a React component tree.

> converts internal AST into the public OutputNode format consumed by the React renderer

mpajares3mo ago

envguard3mo ago

athrowaway3z3mo ago

horacemorace3mo ago

ivanjermakov3mo ago

Good software is usually written on 2nd+ try.

jesse__3mo ago

This somehow reminds me of the days when the fastest way to deep copy an object in javascript was to round trip through toString. I thought that was gross then, and I think this is gross now

Dwedit3mo ago

Anyway, Javascript is no stranger to breaking changes. Compare Chromium 47 to today. Just add actual integers as another breaking change, then WASM becomes almost unnecessary.

sakesun3mo ago

mwcampbell3mo ago

fHr3mo ago

I almost can't believe this swc for example is 80x faster then babeljs.

owenpalmer3mo ago

So this is an issue with WASM/JS interop, not with Rust per se?

gettingoverit3mo ago

In ye olden days of WASM just added to the browser, the difference between native JS and boost::spirit in WASM was x200.

In their worst case it was just x5. We clearly have some progress here.

LunaSea3mo ago

This has been known by Node.js developers for a while with many C++ core and NPM modules being rewritten in JavaScript to improve performance.

kennykartman3mo ago

I dream of the day in which there is no need to pass by JS and Wasm can do all the job by itself. Meanwhile, we are stuck.

hackwaly_new3mo ago

You don't need rewriting if you are using MoonBit. It gives you wasm, wasm-gc, js at once.

marcosdumay3mo ago

It would be great if people stopped dismissing the problem that WASM not being a first-class runtime for the web causes.

shevy-java3mo ago

So ...

Rust.

WASM.

TypeScript.

I am slowly beginning to understand why WASM did not really succeed.

rpodraza3mo ago

Press x to doubt

SCLeo3mo ago

They should rewrite it in rust again to get another 3x performance increase /s

neuropacabra3mo ago

This is very unusual statement :-D

j / k navigate · click thread line to collapse