SpikingBrain 7B – More efficient than classic LLMs

(github.com)

150 points · somethingsome · 9mo ago · 45 comments

33 comments · 12 top-level

augment_me · 9mo ago · 9 in thread

To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.

The supposedly dynamic/temporal nature of the model does not seem to be applied during GPU execution, which collapses it into a single static computation equivalent to applying a pre-calculated sparsity mask.

Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex, biological jargon...

GregarianChild · 9mo ago

The 'brain-inspired' community has always been doing this, since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as a new great insight. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success. Failure to make progress in AI and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: In this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said, they don't fund 'neuromorphic'. I am not sure about details ...)

The whole 'brain talk' malarkey goes back way longer. In particular psychology and related subjects, since their origins as a specialty in the 19th century, have heavily used brain-inspired metaphors that were intended to mislead. That was criticised already in the 19th century. See [3] for an interesting discussion.

There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.

[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...

[2] https://en.wikipedia.org/wiki/MetaX

[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6

janalsncm · 9mo ago

> I was told by someone who applied to YC with a 'neuromorphic' startup that YC said, they don't fund 'neuromorphic'.

There is something refreshingly consistent in a VC that is laser focused on enterprise CRM dashboard for dashboards workflow optimization ChatGPT wrappers that also filters out the neuromorphicists.

Reminds me of how the Samurai were so used to ritual dueling and reading their lineages before battle but when the Mongolians encountered them they just shot the samurai mid-speech.

justthisguy8578 · 9mo ago

This is 100% true. It's just very effective and profitable PR. There IS a minority stream in the field that uses the branding to get funding and then builds real tech. But you'll never know it as neuromorphic as the label comes off once it works. Look up Synaptics (touch pad) history.

cpldcpu · 9mo ago

I believe the argument is that you can also encode information in the time domain.

If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, consider that encoding the number 7 will require seven consecutive pulses on a single spiking line. Encoding the number in binary will require one pulse on three parallel lines.

Binary encoding wins 7x in speed and 7/3=2.333x in power efficiency...

On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
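The pulse-count arithmetic above can be sketched as follows. This is an illustrative toy, not anything from the SpikingBrain code: the function names are hypothetical, and the gap-based scheme (a start pulse, `n` idle steps, then a stop pulse) is just one assumed way to "encode information in the gaps between pulses".

```python
def unary_cost(n: int) -> tuple[int, int]:
    """(time steps, pulses) to send n as n consecutive pulses on one line."""
    return n, n


def binary_cost(n: int) -> tuple[int, int]:
    """(time steps, pulses) to send n in parallel binary.

    All bit lines fire in the same time step; only set bits cost a pulse.
    """
    return 1, bin(n).count("1")


def gap_cost(n: int) -> tuple[int, int]:
    """(time steps, pulses) for an assumed inter-pulse-interval code.

    One start pulse, n idle steps, one stop pulse: the value lives in the gap.
    """
    return n + 2, 2


if __name__ == "__main__":
    for name, cost in (("unary", unary_cost), ("binary", binary_cost), ("gap", gap_cost)):
        steps, pulses = cost(7)
        print(f"{name:>6}: {steps} steps, {pulses} pulses")
```

For the number 7, the unary code needs 7 steps and 7 pulses and the binary code 1 step and 3 pulses, which gives the 7x and 7/3 ratios quoted above. The gap code is slower still but spends only 2 pulses regardless of the value, which is where the trade-off starts to shift.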

HarHarVeryFunny · 9mo ago

I think the main benefit of a neuromorphic design would be to make it dataflow driven (asynchronous event driven - don't update neuron outputs unless their inputs change) rather than synchronous, which is the big power efficiency unlock. This doesn't need to imply a spiking design though - that seems more of an implementation detail, at least as far as dataflow goes. Nature seems to use spike firing rates to encode activation strength.

In the brain the relative timing/ordering of different neurons asynchronously activating (A before B, or B before A) is also used (spike-timing-dependent plasticity - STDP) as a learning signal to strengthen or weaken connection strengths, presumably to learn sequence prediction in this asynchronous environment.

STDP also doesn't imply that spikes or single neuron spike train inter-spike timings are necessary - an activation event with a strength and timestamp would seem to be enough to implement a digital dataflow design, although ultimately a custom analog design may be more efficient.
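A minimal sketch of the dataflow idea described above, under the assumption that an event is just an (input index, strength, timestamp) triple. The class and its names are hypothetical and only illustrate "don't update neuron outputs unless their inputs change"; it models neither spikes nor STDP.

```python
class EventDrivenLayer:
    """Keeps output sums up to date incrementally instead of recomputing them."""

    def __init__(self, weights):
        # weights[j][i] is the connection from input i to output j.
        self.weights = weights
        self.inputs = [0.0] * len(weights[0])
        self.sums = [0.0] * len(weights)
        self.macs = 0               # multiply-accumulates performed so far
        self.last_event_time = None

    def on_event(self, i, value, timestamp):
        """Apply one activation event: input i now has strength `value`."""
        self.last_event_time = timestamp
        delta = value - self.inputs[i]
        if delta == 0.0:
            return  # input unchanged, so no downstream work is done
        self.inputs[i] = value
        for j, row in enumerate(self.weights):
            self.sums[j] += row[i] * delta
            self.macs += 1


if __name__ == "__main__":
    layer = EventDrivenLayer([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    layer.on_event(1, 2.0, timestamp=0)
    layer.on_event(1, 2.0, timestamp=1)  # repeated value: costs nothing
    layer.on_event(0, 1.0, timestamp=2)
    print(layer.sums, layer.macs)
```

A synchronous design would spend 6 multiply-accumulates per tick on this 2x3 layer whether or not anything changed; here the cost scales with the number of inputs that actually change.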

dist-epoch · 9mo ago

> you can also encode information in the time domain.

Also known as a serial interface. They are very successful: PCIe lane, SATA, USB.

nickpsecurity · 9mo ago

"I believe the argument is that you can also encode information in the time domain."

Brain research showed that's happening, too. You'll see many models like this if you DuckDuckGo for "spiking" "temporal" "encoding", or substitute "time" for "temporal". You can further add "neural" "network" or "brain" to focus it on sub-fields.

CuriouslyC · 9mo ago

https://en.wikipedia.org/wiki/Frequency-division_multiplexin...

The brain is doing shit like this.

drob518 · 9mo ago

Never underestimate the power of marketing.

ziofill · 9mo ago · 4 in thread

https://github.com/BICLab/SpikingBrain-7B/blob/main/assets/t...

Shouldn’t one bold the better numbers?

rpunkfu · 9mo ago

Inspired by GPT-5 presentation :)

doph · 9mo ago

Did they ever address that? I have not been able to stop thinking about it, it was so bizarre.

daveguy · 9mo ago

Well, then none of their model's numbers would be bold and that's not what they/AIs usually see in publications!

cubefox · 9mo ago

They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat in benchmarks but it is quadratic in time and memory complexity.
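To make the complexity point concrete, here is a hedged sketch of generic kernelized (linear) attention, not the SpikingBrain architecture. It assumes the feature map `elu(x) + 1`, a common choice in the linear attention literature, and compares a pairwise O(n²) evaluation against a running-state O(n) evaluation of the same kernel. Note that neither function is softmax attention; the sketch only shows why the linear form avoids the quadratic cost.

```python
import math


def phi(x):
    # Positive feature map elu(x) + 1 (assumed; other maps are possible).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(qs, ks, vs):
    """Causal kernel attention computed pairwise: O(n^2) in sequence length."""
    out = []
    for t, q in enumerate(qs):
        fq = phi(q)
        scores = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        norm = sum(scores)
        out.append([sum(scores[s] * vs[s][d] for s in range(t + 1)) / norm
                    for d in range(len(vs[0]))])
    return out


def linear_attention(qs, ks, vs):
    """Same result from a fixed-size running state: O(n) time, constant state."""
    dk, dv = len(ks[0]), len(vs[0])
    state = [[0.0] * dv for _ in range(dk)]  # running sum of phi(k) v^T
    z = [0.0] * dk                           # running sum of phi(k)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(dk):
            z[i] += fk[i]
            for d in range(dv):
                state[i][d] += fk[i] * v[d]
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[i] * state[i][d] for i in range(dk)) / norm
                    for d in range(dv)])
    return out
```

The running state has size `dk x dv` no matter how long the sequence gets, which is the memory advantage; the price, as noted above, is that such models tend to trail conventional attention on benchmarks.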

VeejayRampay · 9mo ago · 3 in thread

it's funny to observe how picky and cynical the HN crowd suddenly becomes when the disruptive technology is from china

bastawhiz · 9mo ago

What part of this is disruptive? It kind of has to work well to be disruptive, doesn't it?

ramon156 · 9mo ago

You can't be critical anymore?

izabera · 9mo ago

deepseek is from china and all their papers have been very well received

cpldcpu · 9mo ago · 2 in thread

> The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.

Isn't that in essence very similar to Quantization Aware Training (QAT)?

spwa4 · 9mo ago

Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.

cpldcpu · 9mo ago

It was also a question from my side. :)

But I understand that they simulate the spikes as integer events in the forward pass (as described here https://github.com/BICLab/Int2Spike) and calculate a continuous gradient based on high resolution weights for the backward pass.

This seems to be very similar to the straight-through estimator (STE) approach that is usually used for quantization-aware training. I may be wrong though.
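For readers unfamiliar with STE, a minimal sketch of the generic technique, assuming a single scalar weight and a squared-error loss. It is not taken from Int2Spike or the SpikingBrain report, and the function names and hyperparameters are made up for illustration: the forward pass rounds to an integer "spike count", and the backward pass pretends the rounding was the identity.

```python
def quantize_forward(pre, max_count=15):
    """Forward pass: round an activation to an integer count in [0, max_count]."""
    return max(0, min(max_count, round(pre)))


def quantize_backward(pre, upstream_grad, max_count=15):
    """Backward pass (STE): pass the gradient through unchanged inside the
    clamp range, and block it outside."""
    return upstream_grad if 0 <= pre <= max_count else 0.0


def train(x=2.0, target=6, w=0.5, lr=0.05, steps=20):
    """Fit a high-resolution weight w so that quantize(w * x) hits `target`."""
    for _ in range(steps):
        pre = w * x                                # continuous pre-activation
        q = quantize_forward(pre)                  # integer seen by the next layer
        grad_q = 2.0 * (q - target)                # d(loss)/dq for loss = (q - target)^2
        grad_pre = quantize_backward(pre, grad_q)  # rounding treated as identity
        w -= lr * grad_pre * x                     # update the full-precision weight
    return w


if __name__ == "__main__":
    w = train()
    print(w, quantize_forward(w * 2.0))
```

The true derivative of `round` is zero almost everywhere, so without the straight-through substitution the weight would never move.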

imtringued · 9mo ago · 2 in thread

In a few years China will be completely independent from Nvidia.

https://en.wikipedia.org/wiki/MetaX

They have GPU manufacturers that nobody in the west has ever heard of.

astrange · 9mo ago

They need TSMC for that.

weregiraffe · 9mo ago

Then they'll have no reason to conquer Taiwan.

bob1029 · 9mo ago · 1 in thread

https://news.ycombinator.com/item?id=45206420

cpldcpu · 9mo ago

Well, it would still allow deploying the trained model to SNN hardware, if it existed.

asdfasdf1 · 9mo ago

SpikingBrain Technical Report: Spiking Brain-inspired Large Models https://arxiv.org/abs/2509.05276

cgadski · 9mo ago

The technical report says (page 7):

> Our architectural choices are closely aligned with principles observed in biological brains.

How? They point out three design choices: linear attention, MoE layers, and spike coding.

Apparently linear attention is brain-inspired because it can be viewed as a "simplified abstraction of dendritic dynamics with multi-branch morphology." Who knows what that means exactly [1]. They don't discuss it further. MoE layers apparently reflect "a principle of modular specialization." Fine, whatever.

Now, using a dozen attention variants + MoE is bog standard. The real novelty would be spike coding. Page 11 is dedicated to the different ways they could turn signals into spike trains, including such biologically-inspired mechanisms as using two's complement. However, they don't actually do spike coding in a time domain. In their implementation, "spike coding" apparently means to turn activations into integers. Section 3.3.3 claims that this lets us simulate an underlying spiking neural network, so we can validate the spiking approach without using special hardware. But if your SNN can be simulated faithfully on a GPU by turning things into integers, isn't that a bit of a depressing SNN?

Either I'm missing something, or this is just dressing standard techniques with loads of meaningless jargon. Of course that’s a very popular way to operate in deep learning nowadays.

[1] Like, attention can draw from multiple tokens, sort of like how different spines of a dendrite can draw from multiple axons? Can’t make this stuff up.
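The point that integer activations can stand in for a spike train is easy to see in a toy example. This sketch assumes the simplest possible count code (an integer `c` becomes `c` unit spikes inside a fixed time window); the helper names are hypothetical and this is not the encoding from the technical report.

```python
def to_spike_train(count, steps):
    """Expand an integer count into `steps` time slots holding 0/1 spikes."""
    if not 0 <= count <= steps:
        raise ValueError("count must fit in the time window")
    return [1] * count + [0] * (steps - count)


def integer_neuron(weights, counts):
    """One output neuron computed directly from integer activations."""
    return sum(w * c for w, c in zip(weights, counts))


def spiking_neuron(weights, counts, steps):
    """The same neuron, accumulating a weight only when a spike arrives."""
    trains = [to_spike_train(c, steps) for c in counts]
    total = 0.0
    for t in range(steps):
        for w, train in zip(weights, trains):
            if train[t]:
                total += w  # additions only; silent inputs cost nothing
    return total


if __name__ == "__main__":
    weights, counts = [0.5, -1.0, 2.0], [3, 0, 2]
    print(integer_neuron(weights, counts), spiking_neuron(weights, counts, steps=4))
```

Both functions return the same number, which is exactly why the GPU version can skip the time dimension: with a pure count code, spike timing carries no information that the integer does not.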

torotoki · 9mo ago

They use MetaX GPUs instead of NVIDIA's...? This point is actually more surprising.

gunalx · 9mo ago

So significantly worse than qwen2.5, kinda useless in the current landscape. But always fun with more architectures.

janalsncm · 9mo ago

They compare to Llama3.1 which is 13 months old and qwen 2.5 which is 9 months old. And they don’t beat qwen.

RLAIF · 9mo ago

SpikingBrain treats 'spikes' as 1-bit quantization stickers. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If a new circuit diagram cannot 'grow' with every forward pass, then don't blame everyone for treating it as Another Sparse Marketing - oh wait, Neuromorphic Marketing.
