Attention Residuals: Rethinking depth-wise aggregation [pdf]
(github.com)
17 points
salkahfi
10d ago
1 comment
krackers
10d ago
In [1], I think a commenter actually speculated about a design just like this, where later layers can directly access the outputs of previous layers instead of having to store them in the residual stream.
[1] https://news.ycombinator.com/item?id=46362579