A pure NumPy implementation of Mamba (opens in new tab)

(github.com)

100 pointsjulius2y ago35 comments

35 comments

28 comments · 5 top-level

thoronton2y ago· 8 in thread

Why is it so difficult to write a short description what the project does? With too many open source projects people, who are not familiar with it, have to play detective to figure out what it actually is doing. "Wait a package manager based on numphy? That doesn't make any sense. Oh they mention LLM? So it must have something to do with AI"

Tomte2y ago

The author did not post it to HN to confuse you. He did not post it here, at all.

Why are you entitled to have every single GitHub repo explained, tailored to your individual knowledge?

Many other people understood exactly what this is.

Maybe the submitter could add a comment on HN with an explanation, but the author owes you nothing.

bartread2y ago

Mmmmmmm... you have a valid point about entitlement and OSS but I'm going to also agree with GP here. Not particularly poking at this project, and the work that's gone into it, but too many projects don't have a short paragraph explaining what their purpose is and why it matters.

I'm not going to name names because I don't want to throw shade at what are essentially good or even great projects but, as a recent example, I encountered a library in our codebase the other day where I simply didn't get what the point was, and the corresponding project page and documentation - whilst really detailed in some ways - didn't help. In the end I asked ChatGPT and also found a series of video tutorials that I watched at 1.75x speed to understand it.

It was worth doing that because the thing is already used in our codebase, and it's important in that context for me to understand why and the value it adds.

But if I run across something reading an article or whatever, and it mentions some library or project in passing, I'm semi-regularly left a bit baffled as to what and why and I probably don't have the time to go digging. Nowadays I probably would ask ChatGPT for a short summary because it's so convenient and it's often quicker than Googling, and maybe I'll start submitting PRs against readme.md files to add those summaries (with a bit of editing) to the beginning of them.

edflsafoiewq2y ago

The doc comment at the top of the .py file is sufficiently descriptive

    """Simple, minimal implementation of Mamba in one file of Numpy adapted from (1) and inspired from (2).

    Suggest reading the following before/while reading the code:
        [1] Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Albert Gu and Tri Dao)
            https://arxiv.org/abs/2312.00752
        [2] The Annotated S4 (Sasha Rush and Sidd Karamcheti)
            https://srush.github.io/annotated-s4

mint22y ago

No, I can see the commenters frustration. Unless one is versed in Llm space, one is more likely to know mamba as the package manager and find the headline and also the GitHub page confusing. The markdown read me is supposed to provide the info the commenter wanted.

Even that first line you posted is unhelpfully circular, defining mamba as an implementation of mamba.

Call me old fashioned, but a best practice read me should concisely provide: what the thing is, and why it is, aka the problem it solves. (And not with circular definition.)

1024core2y ago

> The doc comment at the top of the .py file is sufficiently descriptive

Which is the purpose of these doc comments.

If you have the time to gripe on HN, you have the time to click on the link and do some reading. The "Usage" section in the link above is enough to help one disambiguate; if not, then there's always the doc comment.

1 more reply

ktm5j2y ago

Okay, so why not just put that in the readme??

grandma_tea2y ago

That's a fair criticism of many open source projects, however this one does link to the Mamba paper at the bottom of the (short) readme.

arthurcolle2y ago

realistically, it's like a classification problem

at this moment, in this time, if you see Mamba, either you know or you don't

rowanG0772y ago· 5 in thread

Totally unclear what this is. I scrolled through the readme and it didn't even mention once what it does.

sva_2y ago

It's an LLM architecture competing with transformers: https://arxiv.org/abs/2312.00752

Proponents of it usually highlight it's inference performance, in particular linear scaling with the input tokens.

szvsw2y ago

I really disagree with pigeonholing it as an LLM architecture! It is much more general than that as I mentioned in another comment in this post [1] (and of course as mentioned in the original paper which you linked).

[1] https://news.ycombinator.com/item?id=40616181

wodenokoto2y ago

It totally mentions what it does. It takes the sentence "I have a dream that" and extends it to: "I have a dream that I will be able to see the sunrise in the morning."

It's an LLM.

szvsw2y ago

It’s much more than just an LLM. The mamba architecture is often used in the backbone of an LLM but you can use it more generally as a linear-time (as opposed to quadratic-time) sequence modeling architecture (as per the original paper’s title, which is cited in the linked repo). It is much closer to a convolutional network or an RNN (it has bits of both) than to a transformer architecture. It is based off the notion of state spaces (with a twist).

I use Mamba for instance to build surrogate models of physics-based building energy models which can generate 15-min interval data for heating, cooling, electricity, and hot water usage of any building in the US from building characteristics, weather timeseries, and occupancy time series.

It has many other non-NLP applications.

2 more replies

piqufoh2y ago

Completely - I assumed it was an implementation of https://github.com/mamba-org/mamba

I also assumed that "a pure NumPy implementation" meant that it was built purely with numpy, which it isn't smh

ptspts2y ago· 4 in thread

Contrary to what the title says, this is note a pure Pyhon + numpy implementation: in fact it also imports einops, transformers and torch.

For me pure X means: to use this, all you have to install is X.

qwertox2y ago

I was struggling with the fairness of your comment because the libraries are not used as a replacement to NumPy, but to ease dealing with the data. This made me check and it turns out that:

"Yes, the comment you mentioned is fair and reflects a common perspective in the programming and data science communities regarding the usage of "pure" implementations. When someone refers to a "pure X implementation," the typical expectation is that the implementation will rely solely on the functionalities of library X, without introducing dependencies from other libraries or frameworks."

TIL.

rsfern2y ago

I don’t see a PyTorch import, and the transformers import is just for the tokenizer which I don’t really consider a nontrivial part of mamba

So it’s just numpy and einops, which is pretty cool. I guess you could probably rewrite all the einops stuff in pure numpy if you want to trade readable code for eliminating the einops dependency

Edit: found the torch import, but it’s just for a single torch.load to deserialize some data

TeMPOraL2y ago

> Edit: found the torch import, but it’s just for a single torch.load to deserialize some data

Torch is quite heavy though, isn't it? All for that one deserialization call?

nobodywillobsrv2y ago

Yes exactly. It's classic mis-selling.

mint22y ago· 3 in thread

Okay so is Mamba also an llm? There’s too much name overloading!

I’m familiar with mamba, the conda like thing in python, but a numpy implementation of that makes no sense.

nerdponx2y ago

I'm not normally one to gripe about name conflicts, but I knew this was going to get confusing. You could install an implementation of the Mamba LLM using the Mamba package manager!

exe342y ago

maybe we should get rid of names entirely, and call it all software. "software: a super fast and fun retro-encabulator written in rust!"

volemo2y ago

Assign UUIDs to everything!

1 more reply

Hugsun2y ago· 3 in thread

Is there a benefit to implementing it in numpy over pytorch or tf?

blagie2y ago

Yes.

A numpy program will work tomorrow.

ALL of the machine learning frameworks have incredible churn. I have code from two years ago which I can't make work reliably anymore -- not for lack of trying -- due to all the breaking changes and dependency issues. There are systems where each model runs in its own docker, with its own set of pinned library versions (many with security issues now). It's a complete and utter trainwreck. Don't even get me started on CUDA versions (or Intel/AMD compatibility, or older / deprecated GPUs).

For comparison, virtually all of my non-machine-learning Python code from the year 2010 all still works in 2024.

There are good reasons for this. Those breaking changes aren't just for fun; they're representative of the very rapid rate of progress in a rapidly-changing field. In contrast, Python or numpy are mature systems. Still, it makes many machine learning models insanely expensive to maintain in production environments.

If you're a machine learning researcher, it's fine, but if you have a system like an ecommerce web site or a compiler or whatever, where you'd like to be able to plug in a task-specific ML model, your downpayment is a weekend of hacking to make it work, but your ongoing rent of maintenance costs might be a few weeks each year for each model you use. I have a million places I'd love to plug in a little bit of ML. However, I'm very judicious with it, not because it's hard to do, but because it's expensive to maintain.

A pure Python + numpy implementation would mean that you can avoid all of that.

Hugsun2y ago

That makes sense. I imagine that there are significant performance tradeoffs but those are probably worth it in many cases. I would be somewhat surprised if Mamba can be made usefully fast with NumPy, but it would be a pleasant surprise.

1 more reply

7thpower2y ago

This was a very helpful explanation. I appreciate it.

j / k navigate · click thread line to collapse

35 comments

28 comments · 5 top-level

thoronton2y ago· 8 in thread

Tomte2y ago

The author did not post it to HN to confuse you. He did not post it here, at all.

Why are you entitled to have every single GitHub repo explained, tailored to your individual knowledge?

Many other people understood exactly what this is.

Maybe the submitter could add a comment on HN with an explanation, but the author owes you nothing.

bartread2y ago

It was worth doing that because the thing is already used in our codebase, and it's important in that context for me to understand why and the value it adds.

edflsafoiewq2y ago

The doc comment at the top of the .py file is sufficiently descriptive

    """Simple, minimal implementation of Mamba in one file of Numpy adapted from (1) and inspired from (2).

    Suggest reading the following before/while reading the code:
        [1] Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Albert Gu and Tri Dao)
            https://arxiv.org/abs/2312.00752
        [2] The Annotated S4 (Sasha Rush and Sidd Karamcheti)
            https://srush.github.io/annotated-s4

mint22y ago

Even that first line you posted is unhelpfully circular, defining mamba as an implementation of mamba.

Call me old fashioned, but a best practice read me should concisely provide: what the thing is, and why it is, aka the problem it solves. (And not with circular definition.)

1024core2y ago

> The doc comment at the top of the .py file is sufficiently descriptive

Which is the purpose of these doc comments.

1 more reply

ktm5j2y ago

Okay, so why not just put that in the readme??

grandma_tea2y ago

That's a fair criticism of many open source projects, however this one does link to the Mamba paper at the bottom of the (short) readme.

arthurcolle2y ago

realistically, it's like a classification problem

at this moment, in this time, if you see Mamba, either you know or you don't

rowanG0772y ago· 5 in thread

Totally unclear what this is. I scrolled through the readme and it didn't even mention once what it does.

sva_2y ago

It's an LLM architecture competing with transformers: https://arxiv.org/abs/2312.00752

Proponents of it usually highlight it's inference performance, in particular linear scaling with the input tokens.

szvsw2y ago

[1] https://news.ycombinator.com/item?id=40616181

wodenokoto2y ago

It totally mentions what it does. It takes the sentence "I have a dream that" and extends it to: "I have a dream that I will be able to see the sunrise in the morning."

It's an LLM.

szvsw2y ago

It has many other non-NLP applications.

2 more replies

piqufoh2y ago

Completely - I assumed it was an implementation of https://github.com/mamba-org/mamba

I also assumed that "a pure NumPy implementation" meant that it was built purely with numpy, which it isn't smh

ptspts2y ago· 4 in thread

Contrary to what the title says, this is note a pure Pyhon + numpy implementation: in fact it also imports einops, transformers and torch.

For me pure X means: to use this, all you have to install is X.

qwertox2y ago

I was struggling with the fairness of your comment because the libraries are not used as a replacement to NumPy, but to ease dealing with the data. This made me check and it turns out that:

TIL.

rsfern2y ago

I don’t see a PyTorch import, and the transformers import is just for the tokenizer which I don’t really consider a nontrivial part of mamba

So it’s just numpy and einops, which is pretty cool. I guess you could probably rewrite all the einops stuff in pure numpy if you want to trade readable code for eliminating the einops dependency

Edit: found the torch import, but it’s just for a single torch.load to deserialize some data

TeMPOraL2y ago

> Edit: found the torch import, but it’s just for a single torch.load to deserialize some data

Torch is quite heavy though, isn't it? All for that one deserialization call?

nobodywillobsrv2y ago

Yes exactly. It's classic mis-selling.

mint22y ago· 3 in thread

Okay so is Mamba also an llm? There’s too much name overloading!

I’m familiar with mamba, the conda like thing in python, but a numpy implementation of that makes no sense.

nerdponx2y ago

I'm not normally one to gripe about name conflicts, but I knew this was going to get confusing. You could install an implementation of the Mamba LLM using the Mamba package manager!

exe342y ago

maybe we should get rid of names entirely, and call it all software. "software: a super fast and fun retro-encabulator written in rust!"

volemo2y ago

Assign UUIDs to everything!

1 more reply

Hugsun2y ago· 3 in thread

Is there a benefit to implementing it in numpy over pytorch or tf?

blagie2y ago

Yes.

A numpy program will work tomorrow.

For comparison, virtually all of my non-machine-learning Python code from the year 2010 all still works in 2024.

A pure Python + numpy implementation would mean that you can avoid all of that.

Hugsun2y ago

1 more reply

7thpower2y ago

This was a very helpful explanation. I appreciate it.

j / k navigate · click thread line to collapse