Backing up Spotify

(annas-archive.li)

1980 points · vitplister · 6mo ago · 701 comments

223 comments · 114 top-level

crazygringo6mo ago· 24 in thread

This is insane.

I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?

Aurornis6mo ago

> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

I wouldn’t be so sure. There are already tools that automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.

> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.

The Anna’s Archive group is ideologically motivated. They’re definitely not doing this for AI companies.

gorbachev6mo ago

Flippant response: If it's ok for Meta for commercial use, why not for researchers for legitimate research work?

More serious response: research is explicitly included in fair use protections in US copyright law. News organizations regularly use leaked / stolen copyrighted material in investigative journalism.

VanTheBrand6mo ago

The metadata is arguably more useful than the music files themselves.

zuspotirko6mo ago

Are you aware Anna's Archive already solved the exact same problem with books?

thiht6mo ago

> this doesn't even seem particularly useful for average consumers/listeners

I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.

IshKebab6mo ago

I dunno, if they publish something like a 10 TB torrent of the most popular music, I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs, which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.

It's probably going to make the AI music generation problem worse anyway...
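
The 10 TB estimate above can be sanity-checked with a little arithmetic. This is only a rough sketch: the 160 kbit/s bitrate is the figure quoted from the article elsewhere in this thread, and the 3.5-minute average track length is purely an assumption.

```python
def track_bytes(duration_s: float, bitrate_kbps: float = 160.0) -> float:
    """Approximate size of one track, ignoring container overhead."""
    return duration_s * bitrate_kbps * 1000 / 8


def tracks_that_fit(capacity_tb: float, duration_s: float,
                    bitrate_kbps: float = 160.0) -> int:
    """How many tracks of the given length fit in capacity_tb (decimal TB)."""
    return int(capacity_tb * 1e12 // track_bytes(duration_s, bitrate_kbps))


if __name__ == "__main__":
    # Assumed average track length: 3.5 minutes (210 seconds).
    print(track_bytes(210))          # 4200000.0 bytes, about 4.2 MB
    print(tracks_that_fit(10, 210))  # 2380952
```

Under those assumptions 10 TB holds roughly 2.4 million tracks, so "about 3 million" is the right order of magnitude and would hold exactly if the average track were closer to 2 minutes 47 seconds.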

sowbug6mo ago

A little off topic, but I remain naively hopeful that the horror you describe will keep Spotify from going down the same road Netflix did once content owners decided to get into the streaming business themselves, so that streaming a movie today requires you to "change the channel" to whichever service offers that movie.

Can you imagine your favorite playlist needing to swap among 10 apps, each requiring a $10/month subscription?

fsckboy6mo ago

>The thing is, this doesn't even seem particularly useful for average consumer

it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?

the average consumer will flock when Spotify ultimately enshittifies

basisword6mo ago

>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.

hugholousk6mo ago

This makes me think that after the crack, they probably had to come up with a formula to statistically calculate how fast they could download Spotify songs without Spotify realizing that they were scraping the company's data and blocking their access. Reminds me of Alan Turing's formula after cracking the Enigma.

Forgeties796mo ago

Just cite facebook getting busted training its AI on torrents proven to contain unlicensed material lol

stefan_6mo ago

DRM aside, Spotify clearly should have logic that throttles your account based on requests (only so many minutes in a day..), making it entirely impractical to download the entirety of it unless you have millions of accounts.
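
The "only so many minutes in a day" idea can be sketched as a per-account playback budget. This is a minimal illustration, not a claim about how Spotify actually throttles anything: the class name, the helper and the 3.5-minute average track length are all hypothetical, and the ~86 million track count is the figure quoted elsewhere in this thread.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class PlaybackBudget:
    """Refuses playback once an account has used up a day's worth of audio."""
    daily_limit_s: float = SECONDS_PER_DAY
    used: dict = field(default_factory=lambda: defaultdict(float))

    def allow(self, account: str, day: str, track_duration_s: float) -> bool:
        key = (account, day)
        if self.used[key] + track_duration_s > self.daily_limit_s:
            return False  # over budget for this account on this day
        self.used[key] += track_duration_s
        return True


def account_days_needed(track_count: int, avg_duration_s: float,
                        daily_limit_s: float = SECONDS_PER_DAY) -> int:
    """Account-days required to fetch everything at real-time speed."""
    return math.ceil(track_count * avg_duration_s / daily_limit_s)


if __name__ == "__main__":
    # Assumed: ~86 million tracks averaging 210 seconds each.
    print(account_days_needed(86_000_000, 210))  # 209028
```

With those assumed numbers a strict real-time cap means roughly 209,000 account-days, so the number of accounts needed depends heavily on how long the scrape is allowed to run.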

troupo6mo ago

Just like with anything digital you (and Spotify) are fully at the mercy of the rights holders. When (not if) they pull their stuff, or replace their stuff, or change their stuff, you can never get the original back unless you preserve it.

Largest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war, and Spotify pulling out of Russia. So they don't have the licenses to a lot of stuff because that belongs to companies operating within Russia.

larodi6mo ago

This, indeed, mostly has implications for ML, training, etc., as the whole catalog is otherwise available to partners but costs a lot. So Anna did indeed liberate the content, but I'm definitely not switching off my Spotify subscription, even though, to my personal taste, neither the quality nor the UI matches Apple Music. It is still useful to have someone serve the content for you.

firefax6mo ago

>I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

What's stopping someone from sticking a microphone next to their speaker?

Slow, but effective.

thaumasiotes6mo ago

> I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

Do they have DRM at all? Youtube and Pandora don't.

cm20126mo ago

This leak will also be really useful to bad actors who will resell the music from this list without paying royalties to the artists.

londons_explore6mo ago

> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?

ccppurcell6mo ago

>The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Curious why not? Assuming you only used the metadata. I think they would be considered raw facts and not copyrightable.

madduci6mo ago

The first users of this dataset will be Big Tech corps. Meta, Alphabet, OpenAI, Microsoft, Apple will all be happy to use this dataset for training their LLMs.

For them, 300TB is just cheap

1dry6mo ago

Thank god we are taking care of the “researchers working on things like music classification and generation” ! As long as we can convince ourselves we have a sound analysis of it, no need to support and defend people making actual art right. So much already made, who needs more?

This is not to defend Spotify (death to it), but to state that opening all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify to more fairly compensate the musicians who create the works they train their slop generators with.

robtherobber6mo ago

I believe that we need to distinguish between convenience and preservation here. It is indeed convenient for consumers to use Spotify now whilst it exists and operates the way it does. They could go under, they could change their business model, they could decide to purge everything that is not easily justifiable commercially.

As a society, we should do our best to preserve this trove.

hkt6mo ago

I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI. Even if AA themselves don't know it and just take the cash.

shevy-java6mo ago

> The thing is, this doesn't even seem particularly useful for average consumers/listeners

Yeah. To me it is not really relevant. I actually was not using Spotify, and if I need to have songs I use yt-dlp for YouTube, but even that is becoming increasingly rare. Today's music just doesn't interest me as much and I have the songs I listen to regularly. I do, however, also listen to music on YouTube in the background; in fact, that is now my primary use case for YouTube, even surpassing watching movies or anything else. (I do use YouTube for getting some news too though; it is so sad that Google controls this.)

Etheryte6mo ago· 10 in thread

To put this into perspective, What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth. What.CD had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.

[0] https://en.wikipedia.org/wiki/What.CD

flxy6mo ago

I think what earned what.cd that title wasn't necessarily just the amount but the quality, as you mentioned, as well as the obscurity of a lot of the offered material. I remember finding an early EP of an unknown local band on there, and I live in the middle of nowhere in Europe. There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown. It was the equivalent of vinyl crate digging without physical restrictions.

Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.

VanTheBrand6mo ago

True but What.cd had a tremendous amount of notable music not available on Spotify though because it was also sourced from cds, bootlegs, vinyl, tape etc whereas Spotify only includes music explicitly licensed for streaming.

rckclmbr6mo ago

You can’t talk about what.cd without talking about its precursor, OiNK's Pink Palace. Even Trent Reznor was public about what an amazing place it was. Music aside, the community existing just for the shared love of music and not for any other kind of monetary or influencer gain is what set it apart. We just don’t have those kinds of communities for music online anymore.

josteink6mo ago

> What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth.

It was quality in terms of the technical quality of the audio in the files, but also in the organization and sourcing of the material and the QA process of the encoding, down to the specific release the audio file was from.

There was quantity, sure, but that was secondary to the quality. The quantity was just a side-effect of the place being known for quality, making it an attractive arena to participate in.

And it also had all the "weird"/non-standard things you don't find on mainstream streaming-services precisely because that is what independent curators are good at and often driven by.

This Anna's release... While in itself impressive in many ways, it does not compare to the things What.CD represented. It's almost the exact opposite:

- focus on most popular content - niche content (even by mainstream Spotify-standards) is not included

- quality is 160kbps ogg files, which is far from lossless, it's not tightly coupled to a release, and as far as the audio grading goes, there's no transparent QA process for the content, nor is it available in audiophile fidelity.

This is definitely Apples vs Oranges.

layer86mo ago

That being said, I have a lot of non-mainstream tracks in my playlists on YouTube Music that have YouTube comments along the lines of “I wish this was available on Spotify :’(”. I bet the same goes for What.CD.

So there’s some way to go for a comprehensive music archive.

b86mo ago

Redacted, their replacement, now has more records than they had.

rldjbpin6mo ago

about the scale, the same album in the tracker had several submissions, for dedicated format and regional editions.

while one can compare in terms of number of tracks, the quality used to be in another level altogether. from the article:

> The quality is the original OGG Vorbis at 160kbit/s.

meanwhile the tracker had 16/24-bit flac rips of vinyl, with decent quality control where the track's metadata was verified for any artifacts. for the given quality, one could rip youtube music (maybe not as easily anymore) and achieve a larger scale in a similar quality level.

now if hypothetically tidal had all the music of the world and was accessible this way, then it would be a comparable resource. insane regardless.

WadeGrimridge6mo ago

anna's rip has ~86m tracks, not ~186. ~186m is metadata, specifically ISRCs.

laughingcurve6mo ago

Wow, I have not thought about OiNK in ages... great memories! OiNK and WhatCD did something very special for the musical community

SSLy6mo ago

Well, what.cd counted any album as one torrent. While current spotify has also podcasts and AI slop.

lelouch90996mo ago· 7 in thread

How legal is this with regards to copyright laws?

Aurornis6mo ago

Not legal. This group does not concern themselves with copyright law.

toomuchtodo6mo ago

Adherence to the legal framework is a function of your risk appetite.

luke-stanley6mo ago

Currently it says they have released metadata and album art. Is archiving and sharing the textual track metadata alone (no images, no audio) legal in the US, or Europe? By what basis is it legal or illegal?

ronsor6mo ago

Very, if we delete copyright like we're supposed to.

phainopepla26mo ago

Not legal

layer86mo ago

Completely illegal.

basisword6mo ago

It's not. It's awful people justifying awful behaviour. And it's why we can't have nice things. There are always assholes ready to exploit others.

virtualritz6mo ago· 6 in thread

I just found out that https://annas-archive.li/ is masked by my German internet provider (SIM.de/Drillisch). I usually use a VPN but I had it switched off temp. to watch Fallout (Prime Video won't let you watch through a VPN). Only when I switched Mullvad back on could I open the site.

I didn't know German providers do this.

oarfish6mo ago

Yeah this is actually quite nefarious, as it is a private organization that decides what sites get blocked, with no legal oversight.

- https://de.wikipedia.org/wiki/Clearingstelle_Urheberrecht_im...

- https://netzpolitik.org/2024/cuii-liste-diese-websites-sperr...

It's a DNS-based block, so overriding your default DNS server is enough to circumvent it. I think DNS over HTTPS also works.
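
For anyone curious what a DNS over HTTPS lookup looks like, here is a minimal sketch using only the standard library. The choice of resolver is an assumption (Cloudflare's JSON endpoint is used as an example; other DoH resolvers would serve equally well), and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a DoH resolver that offers a JSON API at this endpoint.
DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"


def build_doh_request(hostname: str, record_type: str = "A") -> urllib.request.Request:
    """Build the HTTPS request that asks the resolver for a DNS record."""
    query = urllib.parse.urlencode({"name": hostname, "type": record_type})
    return urllib.request.Request(
        f"{DOH_ENDPOINT}?{query}",
        headers={"Accept": "application/dns-json"},
    )


def parse_doh_answers(payload: str) -> list:
    """Extract the record data from the resolver's JSON reply."""
    document = json.loads(payload)
    return [answer["data"] for answer in document.get("Answer", [])]


def resolve(hostname: str) -> list:
    """Perform the lookup over HTTPS instead of the ISP's DNS server."""
    with urllib.request.urlopen(build_doh_request(hostname), timeout=10) as response:
        return parse_doh_answers(response.read().decode("utf-8"))
```

Because the query travels inside ordinary HTTPS, the ISP's own resolver never sees it, which is why a block implemented only in that resolver would not apply.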

croemer6mo ago

I think it's a DNS level block. I've been using NextDNS (free plan) and one side effect (besides auto ad block) is that it doesn't have those blocks. Highly recommend - there are alternative services as well, just saw NextDNS recommended here.

Alternative: https://archive.ph/2025.12.21-050644/https://annas-archive.l...

iknowstuff6mo ago

In that vein, I am trying to find out why searching for

    alextud popcorntime

which should trivially yield http://github.com/alextud/PopcornTimeTV results in anything but that one particular URL in every search engine: Google, Kagi, DuckDuckGo, Bing

They even find a fork of that particular repo, which in turn links back to it, but refuse to show the result I want. Haven't found any DMCA notices. What is going on?

polytely6mo ago

Also true in the Netherlands, I hate these copyright freaks constantly trying to restrict access.

junon6mo ago

Was also shocked to see that (Berlin, Telekom here).

sva_6mo ago

They also block some foreign "news" like Russia Today last time I checked.

vlaaad6mo ago· 6 in thread

Unrelated, but I just can't stop myself from saying that I absolutely hate Spotify even though I'm a paying customer. Fuck you Spotify. You were supposed to be a convenient way to discover and listen to music. Now you are only convenient for listening to music, and absolutely terrible for any recommendations. This is sad really. Spotify had good recommendations. It's absolutely in a position where it can provide good recommendations — it has both a vast music library and a vast amount of data on user preferences. And it chooses to push procedural/ai-generated slop instead to earn more money. I thought that maybe buying $SPOT stock will make me more at peace with its greed, but it didn't work. Spotify fucking deserves to crash and burn because it sees paying customers as idiots who might not notice they are fed garbage. Fuck you Spotify, fuck you.

xyzzy_plugh6mo ago

I always find these takes curious because they could not be further from my experience. I'm still discovering tons of good music. Perhaps it's specific to genres, but I haven't encountered any generated junk tracks.

gck16mo ago

When they launched Discover Weekly thing, I used to add at least 1 track from it to my library - it was insanely good. Now it's all junk - not even close to what I listen to.

They also removed a lot of discovery features - Playlist Radio - for example. And they still do have some version of it on the backend, but you have to go through some weird mechanisms to trigger it - like play the last song in playlist, wait till it ends (or rewind) and you get the playlist radio. But it's also a crippled version of it - prefers playing the exact same popular songs for some reason.

Then they released this DJ thing, which is laughably bad. No Spotify, I don't want someone talking to me with useless information in between songs. Who thought that was a good idea? Who actually uses that?

There hasn't been a change in Spotify in last 7 years or so that wasn't negative.

layer86mo ago

YouTube Music works pretty well for me. One great feature is that it includes not just a commercial music streaming catalog, but all user uploads of music on YouTube.

eastbound6mo ago

This is more frequent than you would assume. I’ve neither subscribed to Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.

Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.

wintermutestwin6mo ago

Why do you want a megacorp to tell you what to listen to!?? There are a million ways to do discovery where some enshittified corp isn’t incentivized to push something at you.

venturecruelty6mo ago

Why haven't you unsubscribed then?

ipsum26mo ago· 5 in thread

Can someone explain why C#/Db (major/minor) is the third most popular key? Very unexpected for me, since it's relatively more difficult to play.

ghostie_plz6mo ago

Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.

Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
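
The black-key claim above is easy to check by listing which scale degrees land on white keys. A small sketch, assuming the plain major and natural minor scales (the function name is made up for illustration):

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK_KEYS = {1, 3, 6, 8, 10}  # pitch classes of C#, D#, F#, G#, A#

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]


def white_key_degrees(root: str, steps: list) -> list:
    """Return the 1-based scale degrees that fall on white keys."""
    root_pitch_class = NOTE_NAMES.index(root)
    return [
        degree
        for degree, step in enumerate(steps, start=1)
        if (root_pitch_class + step) % 12 not in BLACK_KEYS
    ]


if __name__ == "__main__":
    # Db major is spelled here as C# major; the keys pressed are the same.
    print(white_key_degrees("C#", MAJOR_STEPS))          # [3, 7]
    print(white_key_degrees("C#", NATURAL_MINOR_STEPS))  # [3, 6, 7]
```

So in Db major the 3rd and 7th degrees are white keys, and in C# natural minor the 3rd, 6th and 7th are. Staying on black keys alone means skipping a bit more than just the 3rd, though most of each scale is still there.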

adzm6mo ago

For electronic music, it's around the lowest bass root note that most systems can play well without a subwoofer. C pretty much requires a sub and things rarely go lower than that.
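
To put rough numbers on the bass-note point, equal-temperament frequencies can be computed from the note name. This sketch assumes A4 = 440 Hz and that the notes in question sit in octave 1 of scientific pitch notation; the comment above does not say which octave is meant, so that part is a guess.

```python
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}


def note_frequency(name: str, octave: int, a4_hz: float = 440.0) -> float:
    """Frequency in Hz of a note in 12-tone equal temperament."""
    midi_number = 12 * (octave + 1) + NOTE_OFFSETS[name]  # C4 is MIDI note 60
    return a4_hz * 2 ** ((midi_number - 69) / 12)         # A4 is MIDI note 69


if __name__ == "__main__":
    for name in ("C", "Db", "D"):
        print(name, round(note_frequency(name, 1), 2))
```

Under those assumptions C1 comes out at about 32.7 Hz and Db1 at about 34.6 Hz, so the two roots are only a couple of hertz apart at the very bottom of what small speakers reproduce.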

kzrdude6mo ago

Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.

klysm6mo ago

Difficult to play in what instrument?

RickyLahey6mo ago

i believe the most popular reason is capo on 1st fret when writing songs, other factors coming 2nd or 3rd (electronic music, sped up old samples, etc)

tjoff6mo ago· 5 in thread

I just want to be able to back up my playlists. Maybe that's possible, but last time I looked I could only find sites that wanted your login, not gonna happen.

lelandfe6mo ago

https://developer.spotify.com/documentation/web-api/referenc...

I bet you can whip up a super simple script with an LLM to do this!
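
As a rough idea of how small such a script can be, here is a sketch of the export half only. It assumes you have already fetched the playlist and flattened it into a list of dictionaries with `name`, `artists` and `uri` keys. That shape is hypothetical and is not a claim about what the Web API actually returns.

```python
import csv
import io


def playlist_to_csv(tracks: list) -> str:
    """Serialize already-fetched playlist tracks to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "artists", "uri"])
    for track in tracks:
        # Multiple artists are joined into one cell to keep one row per track.
        writer.writerow([track["name"], "; ".join(track["artists"]), track["uri"]])
    return buffer.getvalue()


if __name__ == "__main__":
    example = [{"name": "Song, A", "artists": ["X", "Y"], "uri": "id:1"}]
    print(playlist_to_csv(example), end="")
```

A plain CSV like this is enough to rebuild a playlist elsewhere later, since the track identifier travels with the human-readable names.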

hn1116mo ago

This works nicely: https://github.com/spotDL/spotify-downloader

Eckter26mo ago

There are a few tools that can export your spotify playlists into folders of audio files. That's what I used a few years ago for my initial spotify -> navidrome migration.

But they're not that good. They look for the songs on youtube, and the versions uploaded there are often modified (or just very low quality). And I've had some issues with metadata. I'd say about 5% of my songs had some issues, and 1% were completely off.

Once they release the actual torrents and not just the metadata, I'm assuming that new playlist export tools will soon show up, and they'll use these new torrents as source instead of youtube. They'll be a lot more reliable. I'd wait for that to happen. In fact I may end up re-exporting my old spotify playlist.

crazygringo6mo ago

This is where ChatGPT shines. Just ask it to write you a script, it'll give you all the instructions.

I've used ChatGPT to write a whole bunch of playlist logic scripts (e.g. create a playlist that takes tracks from playlists A, B and C, but exclude tracks in playlist D.)
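
The "A, B and C but not D" logic from the example above boils down to an ordered union minus an exclusion set. A minimal sketch, treating each playlist as a list of track identifiers (the function name is made up for illustration):

```python
def combine_playlists(include: list, exclude: list) -> list:
    """Merge playlists in order, dropping duplicates and excluded tracks."""
    excluded = set(exclude)
    seen = set()
    result = []
    for playlist in include:
        for track in playlist:
            if track in excluded or track in seen:
                continue  # skip unwanted tracks and repeats
            seen.add(track)
            result.append(track)
    return result


if __name__ == "__main__":
    a = ["t1", "t2", "t3"]
    b = ["t3", "t4"]
    c = ["t5", "t1"]
    d = ["t2", "t5"]
    print(combine_playlists([a, b, c], d))  # ['t1', 't3', 't4']
```

Keeping a list for the result and a set for membership checks preserves the original track order while still deduplicating quickly.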

emsixteen6mo ago

Exactly the same here, I just wanna back up my playlists and liked songs, in an organised and tagged manner, at a non-potato quality.

krick6mo ago· 5 in thread

Uh, cool, I guess? I want to applaud that, but, first off, unless you are OpenAI or Facebook, it is not exactly easy to participate in the festivities. Even if I had a spare 300 TB lying around, how the fuck do I download that?

But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.

pjerem6mo ago

BitTorrent protocol doesn’t force you to download all of the files of a torrent :)

Now imagine a dedicated music client that will download and stream (and share, because we are polite) only the needed files :)
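
The reason a client can fetch one file out of a huge torrent is that each file maps to a contiguous range of pieces. A minimal sketch of that mapping, assuming the original (v1) BitTorrent layout in which all files are concatenated before being cut into fixed-size pieces; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TorrentFile:
    path: str
    length: int  # size in bytes


def piece_range(files: list, wanted_path: str, piece_length: int) -> range:
    """Return the piece indices that must be downloaded to get one file."""
    offset = 0  # byte position of the current file in the concatenated stream
    for entry in files:
        if entry.path == wanted_path:
            if entry.length == 0:
                return range(0)  # empty file: nothing to download
            first = offset // piece_length
            last = (offset + entry.length - 1) // piece_length
            return range(first, last + 1)
        offset += entry.length
    raise KeyError(wanted_path)


if __name__ == "__main__":
    files = [TorrentFile("a.ogg", 100), TorrentFile("b.ogg", 250), TorrentFile("c.ogg", 50)]
    print(list(piece_range(files, "b.ogg", 128)))  # [0, 1, 2]
```

Pieces at the edges of the range are shared with neighbouring files, so a client ends up downloading slightly more than the file itself.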

Spivak6mo ago

I am in no way saying that this is cheap but 300 TB will set you back a little less than $6k with tax. Very attainable for people other than OpenAI and Facebook. And it's not crazy at all to snag a server with enough bays to house all those.

chrneu6mo ago

think popcorn time for mp3s/flac instead of mp4.

a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.

it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted jellyfin". Where a person hosts a thin client on a machine that then fetches the data from the torrent, then allows the user to "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.

I don't really see why it wouldn't, from an end user perspective, be any different than a self hosted jellyfin or plexamp.

killingtime746mo ago

You can download torrents selectively. I think if they adopted that cautious attitude they wouldn't exist in the first place

Gander57396mo ago

Anna's archive mirrors z-lib and libgen, so those are the main alternatives. But it's unlikely anna's archive would go down so easily, they take a lot of precautions.

WD-426mo ago· 4 in thread

Incredible.

> A while ago, we discovered a way to scrape Spotify at scale.

They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!

DUDOS6mo ago

How they manage to transfer 300TB of data while remaining anonymous is also astonishing.

derkades6mo ago

It is not hard. But please don't misuse it and ruin the fun for everyone. It is nice to be able to use the music relatively easily for hobby projects. My music server has functionality to play tracks from Spotify this way:

https://codeberg.org/raphson/music-server/src/branch/main/sp...

bambax6mo ago

"at scale" could mean they had direct access to a server or to storage, maybe because they had an insider giving them access, or they found secrets that had leaked somewhere?

bmikaili6mo ago

they're probably just using something like https://github.com/nor-dee/spotizerr-spotify

yegle6mo ago· 4 in thread

Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, and selectively download the relevant part of the torrent in response to on-demand streaming requests from the client.

uhfraid6mo ago

spotify used to do just that (stream p2p) until 2014 or so

https://www.scribd.com/document/56651812/kreitz-spotify-kth1...

willio586mo ago

I recently got into the whole homelab *arr stack for things like movies and tv and while I know options exist for music I just don’t see the need yet price-wise. Spotify is still just cheap enough for me to not care enough. We’ll see how long this holds.

That being said it’s no secret Spotify and other streaming services barely pay even popular artists. Artists make money from live shows and merch. The fact that their music is behind a paywall at all could mean they make less money from some lack of exposure.

I do hope one day self-hosting music with an extremely easy setup with torrenting for sourcing is set up again. What I’m talking about exists to some extent, but it’s not trivial for most people.

pjerem6mo ago

Yeah we shouldn’t. But we may.

nness6mo ago

a la "Popcorn Time."

peterburkimsher6mo ago· 3 in thread

For a fully-legal alternative of metadata archiving, I suggest the iTunes EPF (Enterprise Partner Feed). https://performance-partners.apple.com/epf

The best metadata I've found, though, is the MySpace Dragon Hoard: https://archive.org/details/myspace_dragon_hoard_2010

That included the artist location, allowing me to tag songs based on their country. I then created playlists such as "NERAS" Non-English Rock Artist Sample, where the one most popular song for a particular artist was chosen, and only when the country of origin was not English-speaking, and the genre was Rock. I like listening to music while working, but English lyrics distract me because I understand what they're saying.

After discovering music via the MySpace archive, I've since purchased 73 songs from 35 artists that I'd never heard of before digging into the data. I rebuilt my playlist on Spotify, but got greyed out tracks, and YouTube Music, but got "unavailable video". So I still prefer purchasing tracks via the iTunes Music Store, Qobuz, Bandcamp, and 7digital.

Other data sources such as the MP3.com rescue barge, PureVolume archive, and Anna's Spotify archive lack the country-of-origin metadata, so are of less interest to me. It may be possible to use an LLM to guess the language of each track title, but someone else will have to do that.

Meanwhile, if you're interested in the genre-by-country MySpace data, or have questions about the iTunes EPF, feel free to reach out and we can discuss your research.

squigz6mo ago

> Other data sources such as the MP3.com rescue barge, PureVolume archive, and Anna's Spotify archive lack the country-of-origin metadata, so are of less interest to me. It may be possible to use an LLM to guess the language of each track title, but someone else will have to do that.

I would guess that combining these sources, along with info from MusicBrainz, would help quite a bit? Still, I'm rather surprised Spotify doesn't provide more information about artists.

realdeal796mo ago

With the MySpace stuff, where are you seeing the metadata? None of the zips I’ve downloaded from the Dragon Hoard have any metadata.

o_____________o6mo ago

> Please note that Apple Music and iTunes Music data will be migrating away from the Enterprise Partner Feed (EPF). Starting July 16, 2024

827a6mo ago· 3 in thread

Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.

rightbyte6mo ago

Wasn't all data available to users though?

okokwhatever6mo ago

Who cares now, it's already downloaded and ready to be torrented... God is good

potwinkle6mo ago

I mean... not really? Not much music is Spotify exclusive (at least from the 99.6% of what people listen to mentioned in the article), and from friends in the industry I can guarantee you all major content platforms (Netflix, Disney+, Prime Video, a large chunk of YouTube) have already been completely copied without a business agreement with the rightsholders by AI startups and big-name players.

mvkel6mo ago· 2 in thread

This work is so critical.

Read an article that was published just 10 years ago, and witness the bit rot as most external links will 404, gone forever.

I think it's worth questioning the value of preserving -everything-, but it seems like if we can, we should.

larodi6mo ago

You know, I had the (at time of writing) 600-something comments run through Opus 4.5 to summarize the sentiments. It couldn't find a single comment that genuinely defends Spotify or expresses sympathy for the company.

HN crowd is, of course, biased in the technocratic sense, but you see - everyone seems to actually rejoice the move.

The closest to remorse is `linhns` and `locusofself` expressing concern about artists getting hurt (not Spotify itself), but locusofself prefaces with "I hate spotify as a company but..."

(disclaimer: this text is NOT LLM generated, I wrote myself a summary of the summary. here's the Claude thread should anyone care https://claude.ai/share/cfc4ca63-2b9e-47ac-a360-202025d1a134)

mycall6mo ago

Are those 404 links available on web.archive.org?

472828476mo ago· 2 in thread

Hmmm I don’t like this. There are sources for music with better quality out there and all this will do is paint them a bigger target for takedowns/prosecution. I am worried about losing their ebook library. Quoting from the announcement: “Generally speaking, music is already fairly well preserved.“ They should have done this as a separate identity.

xandrius6mo ago

The main difference is that people can re-host and seed part of the data by offering space in their own servers.

If AA goes down, it's not the end of it all, a new one comes back up and the seeders are still there.

lukan6mo ago

"and all this will do is paint them a bigger target for takedowns/prosecution"

They are based in Russia. And they currently do not work together so well with the West.

So it is imaginable that some people could give Trump quite some money to make the Anna's takedown part of some deal to lift sanctions after a ceasefire in Ukraine, but it does not seem likely. I rather suspect more effort in the West to block access to unwanted sites like this. My ISP in Germany is already blocking it.

3 more replies

gorbachev6mo ago· 2 in thread

Quoting from their page:

--------------

This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.

--------------

If they truly are on a mission to protect the world's information from disappearing, they should work with MusicBrainz to get this data on it.

Alternatively, it would be amazing if they built a MusicBrainz-like service around it.

In either case, to make the data truly useful, they'd need to solve the problem on how to match the metadata to a fingerprint used to identify the music tracks, assuming that data is not part of the metadata they collected.

aerozol6mo ago

It would be reasonably trivial to set up a bot that mass-imports metadata from Spotify to MusicBrainz (note that MB rules do not allow this, community cleanup from a single user doing this with another source, years ago, is still ongoing).

The value that MusicBrainz adds is the community editor who spent a few hours going through YouTube videos and wayback machine social links to figure out that Fog (Wellington, NZ, punk/post-punk) and Fog (Auckland, NZ, Post-Punk) are different bands - even if they share a Spotify profile. The editor that hunted down and listened to 5 compilations that have mixed up a radio edit and an original mix of a track, to find out which is which, and separate them in MB and make notes. [these are made up examples]

That's not to imply that these two projects are 'competing', or that the ISRC figure comparison isn't useful and correct. But community database + scraped data is apples and oranges. And a mixed fruit bowl is wonderful.

2 more replies

472828476mo ago

> In either case, to make the data truly useful, they'd need to solve the problem on how to match the metadata to a fingerprint used to identify the music tracks

How is that a problem?

    for each track in collection do extract_fingerprint
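To spell that one-liner out a bit, here is a minimal sketch of the loop. It assumes the audio files are already on disk and that Chromaprint's `fpcalc` command-line tool is installed and on `PATH`; the function names `fpcalc_fingerprint` and `fingerprint_collection` are made up for illustration. The extractor is pluggable, so any other fingerprinting routine could be swapped in.

```python
import json
import subprocess


def fpcalc_fingerprint(path):
    """Return the Chromaprint fingerprint string for one audio file.

    Assumption: the `fpcalc` CLI from Chromaprint is installed and on PATH.
    """
    result = subprocess.run(
        ["fpcalc", "-json", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)["fingerprint"]


def fingerprint_collection(paths, extract=fpcalc_fingerprint):
    """Map each file path to its fingerprint, or None if extraction failed."""
    index = {}
    for path in paths:
        try:
            index[str(path)] = extract(path)
        except (OSError, subprocess.CalledProcessError, ValueError, KeyError):
            # Unreadable or undecodable file: record the failure and move on.
            index[str(path)] = None
    return index
```

The loop itself is the easy part; the cost is in having the audio at all and in the compute time across hundreds of millions of files.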

djfergus6mo ago· 2 in thread

Anna’s Archive has largely flown under the radar by focusing on books.

Even perceived involvement in music piracy puts a much bigger target on their back from far more aggressive actors (RIAA, major labels)

pmdr6mo ago

The bulk of today's customers have no idea how to pirate music, so they're not really a threat anymore. Music streaming has been rather convenient: you pretty much get the same content across all services. Video streaming platforms have, unfortunately, become fragmented and, as of late, ad-ridden.

reassess_blind6mo ago

“Good luck, we don’t care.” is their stance, as far as I can tell.

frereubu6mo ago· 2 in thread

Site is down for me. Archive link: https://archive.is/jf3HW

mawax6mo ago

Probably not down, but blocked by your ISP. Try a VPN. Same thing happens here.

1 more reply

ipsum26mo ago

Ironic. But it's working for me.

walthamstow6mo ago· 2 in thread

Very interesting that a white noise track for babies is the 4th most popular track on Spotify.

cluckindan6mo ago

Interesting if that is considered to be copyrightable. Any white noise track is perceptually indistinguishable from another, but none have the exact same sequence of samples except by chance, or if the noise generator happens to be deterministic as a function of time.
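A small sketch of the "deterministic generator" point, using only Python's standard library. The function name `white_noise` and the seed values are illustrative: two runs with the same seed give sample-for-sample identical noise, while different seeds give different sequences that would still sound the same.

```python
import random


def white_noise(n_samples, seed=None):
    """Generate uniform white noise samples in [-1.0, 1.0].

    With a fixed seed the generator is deterministic, so the exact same
    sequence of samples comes out every time.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


same_a = white_noise(1000, seed=1)
same_b = white_noise(1000, seed=1)
other = white_noise(1000, seed=2)

print(same_a == same_b)  # identical samples: same seed
print(same_a == other)   # different samples: different seed
```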

1 more reply

al_borland6mo ago

I find it so odd that people turn to streaming services for stuff like this. I have a dedicated white noise machine, and when I travel, I use the white noise (bright noise actually) built into the iPhone.

Relying on an external hosted service would never cross my mind, and surely wouldn’t be something I go to on a daily basis.

2 more replies

xandrius6mo ago· 1 in thread

Truly amazing work. I couldn't help but be sad about the less popular songs not currently being stored, as those are definitely the ones more at risk of being lost forever.

If you like the goal and you have even a few 100gb available on your server, consider "donating" some of that space to seeding the data (music or books). It's absolutely how we can fight the system, even if just a tiny bit. https://annas-archive.org/torrents

squigz6mo ago

Going off the blog post, archiving the rest of Spotify (which only represents 0.4% of total listens) would bring the total size up to something like 1PB, and would likely include a huge amount of AI generated stuff, which I don't think is worth it. I'd rather see them focus resources on archiving other stuff.

1 more reply

p0w3n3d6mo ago· 1 in thread

This is something really important, especially in the days when music and film vanish from platforms one by one. I myself have three playlists with greyed out titles (titles are missing so there's no possibility for me to find out what was there).

That's why I divide music into the kind that I want to have forever - I buy it on CDs - and dance music that I can live without one day

eightys3v3n6mo ago

I really appreciate platforms that still show the titles and metadata after something is removed. Then at least I can go find it again to maintain my collection. Tidal does this.

yellow_lead6mo ago· 1 in thread

Is the music torrent not up yet? Only see the metadata one here: https://annas-archive.li/torrents/spotify

artninja19886mo ago

Yeah, in the article they write:

The data will be released in different stages on our Torrents page:

[X] Metadata (Dec 2025)

[ ] Music files (releasing in order of popularity)

[ ] Additional file metadata (torrent paths and checksums)

[ ] Album art

[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)

1 more reply

ZeWaka6mo ago· 1 in thread

Since the article asks:

> We're curious about the peaks at whole minutes (particularly 2:00, 3:00, 4:00). If you know why this is, please let us know!

As a hobby video/audio editor, people will start with their track taking up a preset amount and fill up the time - even if it means having some dead space at the end.

The other alternative is algorithmically created music.

nemomarx6mo ago

I've heard 2:00 is some kinda sweet spot for the Spotify algorithm and payouts? You get paid per play so you don't want it to be too long, but if your track is much shorter than two minutes you get penalized or something. I know they've had to remove ambient tracks that were cut into 40 second clips as part of this.

So you might see a lot of anchoring just like YouTube videos kept stretching to almost exactly ten minutes?

syntaxing6mo ago· 1 in thread

Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.

robotbikes6mo ago

This already exists and is interesting to play around with - https://github.com/ASLP-lab/DiffRhythm

nighthawk4546mo ago· 1 in thread

Amazing! I wonder if the Every Noise At Once[1] site could be updated with the metadata from this?

[1] https://everynoise.com/

iggldiggl6mo ago

Thanks for linking that page, interesting rabbit hole that I hadn't heard about until today…

throwaway6137456mo ago· 1 in thread

I wonder how deep the hole they're gonna put whoever runs this site into is gonna be?

urbandw311er6mo ago

I heard they’re based in Russia so one assumes they probably will be welcomed by the current government (or even aided) rather than prosecuted.

sneak6mo ago· 1 in thread

199GB, only metadata released for now.

Magnet link found here: https://annas-archive.li/torrents/spotify

Are magnet links allowed on HN?

cranberryturkey6mo ago

that is only 199gb, the real one is 300TB

636mo ago· 1 in thread

Attracting the ire of the music industry seems like a huge, unnecessary risk. I wish they had performed this as some kind of other entity to try to keep the ebook archive protected from the fallout. I fear this will not end well.

urbandw311er6mo ago

They can’t be touched by the music industry; they’re based in Russia.

userbinator6mo ago· 1 in thread

Music files (releasing in order of popularity)

Increasing or decreasing? IMHO increasing would make more sense, as the most popular music is already mirrored in countless other places. It's the rare stuff that is most in need of preservation.

I wonder how much of the content there is AI-generated. Honestly, even as someone who was initially skeptical, I've found some of it to be rather good --- not knowing that it was AI-generated at first. Now if they could only reverse-engineer the prompt and only store the model, that would be an extremely efficient form of "compression".

reassess_blind6mo ago

Same model and same prompt won’t necessarily create the same result, unless I misunderstand how these audio models work.

1 more reply

aftbit6mo ago· 1 in thread

Has anyone tried to add up the track file size from the metadata dump?

In spotify_clean_track_files.sqlite3:

    SELECT count(*), sum(filesize_bytes) FROM track_files;

    255966403|15970064861274

That's only 14.5 TiB, nowhere near 300 TiB. What makes up the other 285 TiB of content?
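For anyone who wants to repeat this, here is a rough Python version of the same check using the standard `sqlite3` module. The table and column names come from the query above; the helper names and the extra `count(filesize_bytes)` (which counts only non-NULL rows) are my additions. The 160 kbit/s figure is the bitrate quoted from the article elsewhere in the thread, used here only as a sanity check on the average row.

```python
import sqlite3

TIB = 2**40  # bytes per tebibyte


def track_file_totals(conn):
    """Return (all rows, rows with a non-NULL size, total bytes)."""
    rows, sized_rows, total = conn.execute(
        "SELECT count(*), count(filesize_bytes), sum(filesize_bytes) "
        "FROM track_files"
    ).fetchone()
    return rows, sized_rows, total or 0


def summarize(row_count, total_bytes, bitrate_kbps=160):
    """Convert a byte total into TiB, average bytes per row, and the
    seconds of audio an average row would hold at a constant bitrate."""
    tib = total_bytes / TIB
    avg_bytes = total_bytes / row_count if row_count else 0.0
    avg_seconds = avg_bytes * 8 / (bitrate_kbps * 1000)
    return tib, avg_bytes, avg_seconds


# Figures reported above: 255966403 rows, 15970064861274 bytes.
tib, avg_bytes, avg_seconds = summarize(255_966_403, 15_970_064_861_274)
print(f"{tib:.1f} TiB, {avg_bytes / 1024:.0f} KiB per row, "
      f"about {avg_seconds:.1f} s of audio at 160 kbit/s")

# Against the real file (not run here):
# conn = sqlite3.connect("spotify_clean_track_files.sqlite3")
# print(track_file_totals(conn))
```

The average works out to roughly 61 KiB per row, which is only a few seconds of audio at that bitrate. So either many rows have no size recorded or the column does not mean what it looks like; comparing `count(*)` with `count(filesize_bytes)` would be the first thing to check.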

squigz6mo ago

That's curious and changes things pretty dramatically. It's a lot easier to host 15TB than 300. I wonder what's up here.

TheAceOfHearts6mo ago· 1 in thread

I wonder if they'll explore other music services as well. As I understand it, Deezer, Qobuz, and Tidal can all get ripped easily enough. Although I'm not sure if they rate limit downloads past a certain point.

I'm a bit sad that they chose to focus on music rather than audiobooks. Creating an archive of audiobooks seem like it would be more aligned with their mission.

TechSquidTV6mo ago

The metadata is gold, but I was immediately curious why they wouldn't go for Tidal first. Though whatever they have on Spotify I think is unique.

romanovcode6mo ago· 1 in thread

`spotdl download "https://open.spotify.com/user/{username}" --user-auth --output '{list-name}/{title} - {artists}.{output-ext}'`

This is literally all you need to back up Spotify.

Philpax6mo ago

spotdl downloads from YouTube, not Spotify, afaik

artninja19886mo ago· 1 in thread

Wow. Anna is a godsend. Hopefully now we get some really good open source music models

brcmthrowaway6mo ago

First we need good stem splitting

1 more reply

markstos6mo ago· 1 in thread

> ≥70% of songs are ones almost no one ever listens to (stream count < 1000).

So much interesting but undiscovered music is out there!

halperter6mo ago

It would be interesting to find out how that has changed with the growth of the music industry over the years. I suspect that many of these <1000 streamed could be artificially generated for monetary purposes but I'm not entirely sure. That being said, there is a lot of good music with less than 1000 streams. I've been looking myself and I've definitely found some hidden gems.

bob10296mo ago

I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.

There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.

Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?

shevy-java6mo ago

Hmm. This is actually not really something I need, I think; but I consider anna's archive etc... as about as important as the internet web archive. We need to preserve data, at the least important data, also historic data - how the original websites looked. Creativity of past generations. Same for games and books.

Webpages may have emerged only ~30 years ago, but there are also many young people who are too young to have experienced that. There is always a generational change; our generation has the opportunity to store more things.

yoan92246mo ago

The metadata alone is incredibly valuable for researchers. Having 186 million ISRCs catalogued with associated genre, tempo, and popularity data is a goldmine for music analysis that doesn't even require touching the audio files.

I've always found it interesting how streaming services have become the de facto music library of record, yet they can and do remove content at will. When Spotify pulled out of Russia, entire catalogs became inaccessible. Physical media and personal archives suddenly matter again in ways we thought were obsolete.

The copyright discussion is complex, but from a pure preservation standpoint, I'm glad someone is doing this work.

tristanc6mo ago

This is some of the greatest news I've ever heard for the digital preservation community. Just so many projects over the years could have used resources like this. Thank you for contributing to humankind!

Fizzadar6mo ago

I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.

frytaped6mo ago

It seems to be that the metadata doesn't include the lyrics, probably because they are provided by Musixmatch. It would have been nice to have a database of lyrics linked to ISRCs. AFAIK Lrclib doesn't support downloading lyrics for a given ISRC.

zzzeek6mo ago

great. Spotify just removes things all the time (things I actively listen to and work on for my jazz practices, one day just go "poof" because they didn't want to pay the record company anymore), and they are not as a company deserving of the role of "keeper of all the world's music". They don't give a shit and they'd vastly prefer we all listen to their AI generated royalty free crap and Joe Rogan.

DoctorOetker6mo ago

I'd rather see them use AI to convert all the scanned scientific articles into proper PDF or other formats.

Also sort and classify the articles by binary size, vs page count, plot count, raster image count etc, in order to compress the outliers and detect when a raster image should have been a plot and convert it to vectorized images etc.

How compact can we get the collective human scientific corpus?

acjohnson556mo ago

This is incredible. I once assembled a collection of 100,000 tracks for research on exploration of large music libraries. Essentially vector search. I was limited in storage and processing power to a single machine.

If I were to do it today, I could get so much farther with hyperscaler products and this dataset.

bguberfain6mo ago

We can finally search for playlists with a given song! A basic feature that Spotify is missing!

nmz6mo ago

This might be the perfect time to do archiving before the entire internet gets inundated by sub-par AI generated content.

Motorbytes6mo ago

Does the Spotify backup contain any so far grayed out or unavailable songs on their list?

I'm a music archivist & preservationist, I've archived and found several formerly lost or on the verge of becoming lost albums, EPs, and Singles, and I've been wondering if the backup of Spotify so far, even with the available info, contain any taken down, region limited, or no longer available songs?

any response is appreciated!

pekkag6mo ago

Extremely useful statistics. However, users need to know that ISRC codes are not really unique identifiers. The code was created to identify unique digital tracks (recordings). When older analog recordings (there are millions of them) are reissued digitally, the publisher assigns each one an ISRC code, which shows the year of reissue. If the recording is in the public domain, anyone can reissue it and assign it a new ISRC code. Even if the recording is still in copyright, the company can assign each new rerelease a new code - all with a different year. So be careful with interpreting statistics based on these codes.
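To make the caveat concrete, here is a small sketch that splits an ISRC into its four standard parts (two-letter prefix, three-character registrant, two-digit year of reference, five-digit designation) and groups rows by artist and title to spot recordings carrying more than one code. The row layout and the helper names are hypothetical, and the sample code is only an illustration; the dump's actual schema may differ.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class ISRC(NamedTuple):
    prefix: str       # two letters (originally a country code)
    registrant: str   # three alphanumeric characters
    year: str         # two digits: year the code was assigned
    designation: str  # five digits


_ISRC_RE = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})$")


def parse_isrc(code):
    """Parse an ISRC written with or without hyphens."""
    match = _ISRC_RE.match(code.replace("-", "").upper())
    if match is None:
        raise ValueError(f"not a valid ISRC: {code!r}")
    return ISRC(*match.groups())


def recordings_with_multiple_isrcs(rows):
    """rows: iterable of (artist, title, isrc). Returns the (artist, title)
    keys that appear under more than one ISRC."""
    groups = defaultdict(set)
    for artist, title, isrc in rows:
        groups[(artist.lower(), title.lower())].add(isrc)
    return {key: codes for key, codes in groups.items() if len(codes) > 1}


print(parse_isrc("US-RC1-76-07839"))
```

Note that the year field only says when the code was issued, so a 1930s recording reissued in 2015 would show "15". Matching on artist and title is itself crude; it is just enough to show why counting ISRCs overstates the number of distinct recordings.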

junon6mo ago

TIL Anna's Archive is blocked in Germany (by a rather obtrusive MitM, I might add). Get redirected to a "Copyright Clearing House" or something.

Jumpmanlives6mo ago

Good stuff Anna's Archive. The Anchormen, premium sea shanty crew from Western Australia, officially endorsed you sharing our salty tunes. https://www.facebook.com/theanchormenwa/

https://open.spotify.com/album/07IyzOA9jJWPZcLDysQwpo?si=KZO...

xnx6mo ago

Merry Christmas!

Mr_Minderbinder6mo ago

> Over-focus on the highest possible quality

This is not an issue in my view. I like the fact that I can download 100 MiB ultra-high resolution TIFF files of scans of photographs from the original negative from the Library of Congress and 24-bit/96kHz FLAC files of captures of 78 RPM records from the Internet Archive. In addition to maintaining completeness and quality of information, one of the main goals of preservation is to guard against further degradation and information loss. You should try to preserve the highest quality copies available (because they contain more information) and re-encoding (deliberate degradation) should only be used to create convenient access copies.

Inferior copies, in addition to being less informative, have the potential to misinform. Only the archivist will enjoy space savings. All the readers who might consult your library in the infinite future will bear the cost.

> ...(e.g. lossless FLAC). This inflates the file size...

This is entirely the wrong view. The file size of a raw capture compressed to FLAC should be thought of as the “true” or “correct” size. It is roughly the most efficient (balancing various trade-offs) representation of sampled audio data that we can presently achieve. In preservation we seek to preserve the item or signal itself and not simply what we might perceive thereof. This human-centric perception view is just wrong. There is data in film photographs which cannot be perceived visually yet can be of interest to researchers and be revealed with digital image analysis tools.

As an example of how much information celluloid can contain see: https://vimeo.com/89784677 (context: he is comparing a Blu-ray and a scan of a 35mm print)

HawkEyeSpaceMan5mo ago

Not worth the risk imo. This might backfire at some point and ruin a good thing with the book libraries.

Kerollmops6mo ago

So nice! That's an excellent extract and looks useful for benchmarking Meilisearch. I'll probably spend my Christmas holidays importing the tracks, albums, and artists into Meilisearch, while my CEO builds a beautiful front-end for it. I'll probably replace [the current music search demo](https://music.meilisearch.com) we have with this much higher-quality dataset!

That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
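For readers who haven't met the idea: a posting list is the sorted list of document IDs that contain a term, and delta encoding stores the gaps between consecutive IDs instead of the IDs themselves. The sketch below is a generic illustration with made-up numbers and a simple 7-bits-per-byte varint size estimate; it is not the code from the linked PR.

```python
def delta_encode(doc_ids):
    """Turn a sorted list of non-negative IDs into first ID plus gaps."""
    deltas = []
    previous = 0
    for doc_id in doc_ids:
        if doc_id < previous:
            raise ValueError("doc_ids must be sorted and non-negative")
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def delta_decode(deltas):
    """Rebuild the original IDs by taking a running sum of the gaps."""
    doc_ids = []
    total = 0
    for delta in deltas:
        total += delta
        doc_ids.append(total)
    return doc_ids


def varint_size(n):
    """Bytes needed for n in a 7-bits-per-byte variable-length encoding."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


postings = [1000, 1003, 1010, 1500, 1501]
deltas = delta_encode(postings)
raw_bytes = sum(varint_size(i) for i in postings)
delta_bytes = sum(varint_size(d) for d in deltas)
print(deltas, raw_bytes, delta_bytes)
```

Small gaps fit in a single byte, which is where the saving comes from; how much it helps on real data depends on how dense the posting lists are.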

new_hair5mo ago

Rookie Question, but how do i access all this metadata especially in a cleaner way, or genre-wise for my project development.

Yeri6mo ago

wow. Blocked in Belgium.

Error HTTP 451 - Unavailable For Legal Reasons

https://lumendatabase.org/notices/71398835

krackers6mo ago

New multimodal training set just dropped.

pranavm276mo ago

Miss anna, next time please scale down image dimensions so that us on mobile can read properly haha

Jokes aside, I always thought the best way to deal with piracy was to understand or convince the demand not to do it over dealing with the supply.

machloof6mo ago

That's huge, altho as a musician myself i am kinda scared of ai just taking all this data so they could make music better than me, i dunno maybe drop in there an anti ai trap zipbomb or something, that way it will work for normal users but not for ai

Aldipower6mo ago

Oh, just noticed my provider "Vodafone Germany" is blocking the domain annas-archive.li on DNS level.

puffpuff123456mo ago

Amazing!

Is there any way to search this spotify database without downloading the currently available metadata torrent?

Uninen6mo ago

I hope someone builds an open API around this metadata. I'd love to have alternatives to the big player APIs.

tolerance6mo ago

I am not enthused by this news. Let us entertain the possibility that similar institutions will eschew this catalog.

soundsgoodman6mo ago

You need to seriously re-think this...

Releasing indie music, like really low-level indie music, for free in the name of "preservation" is so misguided.

Don't do this. You will only end up hurting the artists who rely on paid downloads.

thenthenthen6mo ago

Full circle! Thank you! (https://torrentfreak.com/how-the-pirate-bay-helped-spotify-b...)

performative6mo ago

this is a really incredible effort. but, for the developers and analysts currently working with music metadata in a world where so much of music is being consumed thru streaming services that keep a tight hold on how their metadata and album art can be used, i am constantly yearning for a way to link streaming releases to public metadata sources that can be manipulated, embedded, and queried. i've done my best to build my own w/o a background in data science, but it's a hole that desperately needs filling to enable the new generation of scrobbling/music listening habit exploration.

ewzimm6mo ago

The data analysis here is interesting. One thing that stood out to me is that black metal is the 6th most common musical genre for bands, right after rockabilly. I would never have expected that.

htx80nerd6mo ago

>Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.

There is a ton of good bands with under 10k or even 1k monthly listeners.

ThinkBeat6mo ago

Can this last?

I envision an army of lawyers and cyber security companies being prepared to unleash a scorched earth campaign that book publishers might want to be part of as well.

At the end it may take down more than just this publication but most others as well.

meysamazad6mo ago

I wonder if Spotify will pursue any legal actions to take this archive or the site down!

baxuz6mo ago

> The quality is the original OGG Vorbis at 160kbit/s.

Yeah, the original quality is either a 320kbps OGG or lossless. Not 160.

While this is _a_ backup, it's a pretty lossy one.

none149886mo ago

Downloading of individual files to Anna’s Archive Please

haghiri756mo ago

I guess having an API to do search on metadata may be cool. Anyone thought of that?

schmuckonwheels6mo ago

I want to time-travel back to 2000 like Old Biff with the sports almanac so I can tell Shawn Fanning to use the "it's for historical preservation" defense.

nutjob26mo ago

I wonder how definitive their collection is and how much ripping Google Music/YouTube would improve on this.

A distributed ripping project to do that would be a fine thing.

damnitbuilds6mo ago

Well done !

Until we have reasonable copyright terms, Pirate On !

hmokiguess6mo ago

What an early christmas gift for humanity. Now, asking for a friend, what's the ideal setup for torrenting this? Mullvad / Tailscale?

lanalanabobana6mo ago

these guys are 100% selling that data to "AI" companies for thousands of dollars so the internet and world at large can get a little more shitty. awesome -_-

gorbachev6mo ago

I want to peek in that metadata collection to see if it could be used to identify the AI slop that's infecting Spotify.

If you could identify a track supposedly by artist X was actually AI slop not created by artist X, you could use that information to skip tracks on (web) music players, for example.

shomp6mo ago

If only Spotify paid musicians their fair share

wartywhoa236mo ago

https://annas-archive.li/llm

_vqpz6mo ago

I really don't understand how focusing on source quality files is supposed to be a "major issue" with the music preservation community. It's bizarre for them to talk about these being barriers to creating a "full archive of all music that humanity has ever produced" and have their answer be scraping Spotify to end up with a music library composed of many AI and bulk-produced songs at 75/160kbps.

dmix6mo ago

I hope they get the new lossless versions

thih96mo ago

This is conspiracy theory territory but I wonder if big tech is sponsoring efforts like this as an easy way to get training data.

littlecranky676mo ago

For some reason, the link does not work for me (Spain). Works perfectly at the same time in Tor Browser.

fungonimus6mo ago

I would like a downloader! :D this is such an awesome project

m00dy6mo ago

Congrats! I’m sure the Spotify lawyers are gonna have some sleepless nights ahead.

gverrilla6mo ago

GREAT DAY

none149886mo ago

Downloading of individual files to Anna’s Archive Please!

eastoncrafter6mo ago

Plans to upload all this to musicbrainz soundid program?

eastoncrafter6mo ago

Plans to upload all of this to music brainz soundid?

msephton6mo ago

Is this all regions? I'm assuming so but I can't be sure

marstall6mo ago

the top 10,000 songs seem to be 99.9% top-40 corporate pop, which surprised me. thought a list that broad would pick up more that was outside the mainstream ...

1 more reply

reactordev6mo ago

Oh this is going to go over real well in Nashville, TN.

gyrgtyn6mo ago

is there a torrent client already that is good at partial downloads? I didn't realize how popcorn time worked until I read this thread.

1 more reply

BaudouinVH6mo ago

error 451 https://postimg.cc/QFddnW41

siquick6mo ago

Is there a way to see the shape of the metadata?

rendaw6mo ago

Looking at the analysis, I'm totally surprised opera and psytrance are so prolific.

Psy-trance... I thought it was the same as any other electronic genres, but do people get high and just start shoveling psy-trance tracks out or something?

Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?

I mean, I'm sure there's some misclassification, but chamber music is basically a couple people with any sort of music training on classical instruments so that doesn't surprise me nearly as much... I can easily imagine there being _lots_ of those, and you might come up with a different artist name for each unique set of people you collaborate with.

5 more replies

iqandjoke6mo ago

That’s why Spotify would lose against Apple. Spotify may need to pay a fortune for this scraper behaviour while Apple Music does not.

simmo90006mo ago

We need insane for culture to survive.

7ero6mo ago

free the music

RickyLahey6mo ago

This will be great to train AI on.

rldjbpin6mo ago

the metadata alone is a staggering couple hundred gb, however it contains quite handy information to play with. consider the following:

> /audio-features/{id} "Get audio feature information for a single track identified by its unique Spotify ID."

this combined with track metadata can finally allow those motivated enough to create their own personalized shuffle. potentially better than the slop we get nowadays. no generative ai required*.
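a rough sketch of what such a shuffle could look like, no ai involved: pick a random starting track, then keep jumping to the closest remaining track in feature space. `energy`, `valence` and `danceability` are field names from spotify's audio-features endpoint (each between 0 and 1); whether the dump stores them under those exact names is an assumption, and `smooth_shuffle` plus the track dicts are made up for illustration.

```python
import math
import random

# Audio-feature fields used for similarity (assumed to be present per track).
FEATURES = ("energy", "valence", "danceability")


def distance(a, b):
    """Euclidean distance between two tracks in feature space."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


def smooth_shuffle(tracks, seed=None):
    """Random start, then greedily play the most similar unplayed track."""
    rng = random.Random(seed)
    remaining = list(tracks)
    if not remaining:
        return []
    current = remaining.pop(rng.randrange(len(remaining)))
    order = [current]
    while remaining:
        nearest = min(remaining, key=lambda t: distance(current, t))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest
    return order


library = [
    {"id": "a", "energy": 0.10, "valence": 0.10, "danceability": 0.10},
    {"id": "b", "energy": 0.15, "valence": 0.15, "danceability": 0.15},
    {"id": "c", "energy": 0.90, "valence": 0.90, "danceability": 0.90},
]
print([t["id"] for t in smooth_shuffle(library, seed=0)])
```

the greedy scan is quadratic in the number of tracks, so it is fine for a personal library but not for the whole dump.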

Varaldar6mo ago

im thinking about the consolidation around minute marks. its at every minute mark below 10 minutes, albeit dropping precipitously after 4 minutes. i have 2 guesses. guess one is that people like even numbers so if a track was already going to be within so many seconds of exactly a minute mark that they are more likely to push it to that number. with people caring less above 4 minutes because you are already making a long song, i could imagine caring less at that point. but my second guess is that along with the vast increase of ai slop posted to spotify both by spotify themselves and by other people, some of the programs they use probably fix on minute increments. like how a lot of ai videos are 10 seconds long or a series of 10 second videos. just a guess, however. i have no information or facts to back this up
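if anyone wants to poke at this in the metadata, here is a minimal sketch for counting how many tracks land near each whole minute. it assumes durations are available in milliseconds (spotify's api calls that field `duration_ms`; i have not checked the dump's column name), and the function names and the 500 ms tolerance are arbitrary choices.

```python
from collections import Counter

MINUTE_MS = 60_000


def near_whole_minute(duration_ms, tolerance_ms=500):
    """True if the duration is within tolerance of an exact minute mark."""
    remainder = duration_ms % MINUTE_MS
    return min(remainder, MINUTE_MS - remainder) <= tolerance_ms


def minute_peak_counts(durations_ms, tolerance_ms=500):
    """Count tracks near each whole-minute mark, keyed by minute."""
    counts = Counter()
    for duration in durations_ms:
        if near_whole_minute(duration, tolerance_ms):
            counts[round(duration / MINUTE_MS)] += 1
    return counts


sample = [120_000, 119_800, 180_400, 200_000, 60_700]
print(minute_peak_counts(sample))
```

as a baseline, if track lengths were spread evenly within each minute, a 500 ms window on either side would catch about 1 in 60 tracks, so anything well above that at 2:00, 3:00 or 4:00 is the peak the article is asking about.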

verisimi6mo ago

Yes, but do they have the one that goes like: to-to-to dotodoo? Hmmm? Do they?

udoyxyz6mo ago

yo, this is insane!! why would anyone do that? I think it is for AI music generation models, like training them. Maybe ai labs people did it?? yeah that is likely

dbacar6mo ago

Now, anyone with some decent info on signal processing and machine learning can build his/her own Shazam.

shmerl6mo ago

Just buy music DRM-free in the first place.

snoozebutton6mo ago

is this not highly illegal?

1 more reply

throw-12-166mo ago

I love coming to these threads to read the pearl clutching of "technologists" who suddenly care about IP and copyright law.

sma3in6mo ago

spotify undressed

bekindtoartists6mo ago

I’m hugely disappointed in Anna’s archive. As much as they believed they were doing this for good, they have now allowed bad faith actors to obtain all music for AI gen. This is just horrific for all artists out there who are fighting against so many issues that impact their creativity and sustainability. Why not just digest the data and not allow the music out there. As usual artists get fucked over.

1 more reply

zoklet-enjoyer6mo ago

Wow. Now I just need some hard drives and a way to download that without my ISP doing something about it. That's amazing.

1 more reply

haryj6mo ago

wow

1dry6mo ago

Yuck. Just to make it easier to train slop machines. The point of art is not to have completionist archives of EVERYthing that’s ever been made! Let it die. Death is the most natural part of life. Art is about the human experience, not “for researchers”.

The point is human connection. Art is a living reflection and record of human experience. Art will persevere- the kinds of folks who prioritize what they like based on popularity were never the supporters artists (contrast with craftspeople trying to make a buck) counted on in the first place. Enjoy your derivative slop - we’ll continue on our imperfect, messy, individual, human artistic lives.

1 more reply

linhns6mo ago

Unlike books, which are massively overpriced, this will hurt artists a lot as they need the fees paid by Spotify to make ends meet.

3 more replies

crazygringo6mo ago· 24 in thread

This is insane.

I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

Aurornis6mo ago

> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.

The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.

11 more replies

gorbachev6mo ago

Flippant response: If it's ok for Meta for commercial use, why not for researchers for legitimate research work?

More serious response: research is explicitly included in fair use protections in US copyright law. News organizations regularly use leaked / stolen copyrighted material in investigative journalism.

2 more replies

VanTheBrand6mo ago

The metadata is probably more useful than the music files themselves arguably

2 more replies

zuspotirko6mo ago

Are you aware Annas Archive already solved the exact same problem with books?

1 more reply

thiht6mo ago

> this doesn't even seem particularly useful for average consumers/listeners

I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.

IshKebab6mo ago

It's probably going to make the AI music generation problem worse anyway...

1 more reply

sowbug6mo ago

Can you imagine your favorite playlist needing to swap among 10 apps, each requiring a $10/month subscription?

fsckboy6mo ago

>The thing is, this doesn't even seem particularly useful for average consumer

it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?

the average consumer will flock when Spotify ultimately enshitifies

2 more replies

basisword6mo ago

Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.

1 more reply

hugholousk6mo ago

Forgeties796mo ago

Just cite facebook getting busted training its AI on torrents proven to contain unlicensed material lol

stefan_6mo ago

1 more reply

troupo6mo ago

larodi6mo ago

firefax6mo ago

>I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

What's stopping someone from sticking a microphone next to their speaker?

Slow, but effective.

4 more replies

thaumasiotes6mo ago

> I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

Do they have DRM at all? Youtube and Pandora don't.

5 more replies

cm20126mo ago

This leak will also be really useful to bad actors who will resell the music from this list without paying royalties to the artists.

5 more replies

londons_explore6mo ago

> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?

ccppurcell6mo ago

>The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Curious why not? Assuming you only used the metadata. I think they would be considered raw facts and not copyrightable.

madduci6mo ago

The first users of this dataset will be Big Tech corps. Meta, Alphabet, OpenAI, Microsoft, Apple will all be happy to use this dataset for training their LLMs.

For them, 300TB is just cheap

1 more reply

1dry6mo ago

4 more replies

robtherobber6mo ago

As a society, we should do our best to preserve this trove.

hkt6mo ago

I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI. Even if AA themselves don't know it and just take the cash.

shevy-java6mo ago

> The thing is, this doesn't even seem particularly useful for average consumers/listeners

Etheryte6mo ago· 10 in thread

[0] https://en.wikipedia.org/wiki/What.CD

flxy6mo ago

6 more replies

VanTheBrand6mo ago

4 more replies

rckclmbr6mo ago

2 more replies

josteink6mo ago

> What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth.

There was quantity, sure, but that was secondary to the quality. The quantity was just a side-effect of the place being known for quality, making it an attractive arena to participate in.

And it also had all the "weird"/non-standard things you don't find on mainstream streaming-services precisely because that is what independent curators are good at and often driven by.

This Anna's release... While in itself impressive in many ways does not compare to the things What.CD represented. It's almost the exact opposite:

- focus on most popular content - niche content (even by mainstream Spotify-standards) is not included

This is definitely Apples vs Oranges.

layer86mo ago

So there’s some way to go for a comprehensive music archive.

b86mo ago

Redacted, their replacement, now has more records than they had.

rldjbpin6mo ago

about the scale, the same album in the tracker had several submissions, for dedicated format and regional editions.

while one can compare in terms of number of tracks, the quality used to be on another level altogether. from the article:

> The quality is the original OGG Vorbis at 160kbit/s.

now if hypothetically tidal had all the music of the world and was accessible this way, then it would be a comparable resource. insane regardless.

1 more reply

WadeGrimridge6mo ago

anna's rip has ~86m tracks, not ~186. ~186m is metadata, specifically ISRCs.

laughingcurve6mo ago

Wow, I have not thought about OiNK in ages... great memories! OiNK and WhatCD did something very special for the musical community

SSLy6mo ago

Well, what.cd counted any album as one torrent. While current spotify has also podcasts and AI slop.

lelouch90996mo ago· 7 in thread

How legal is this with regards to copyright laws?

Aurornis6mo ago

Not legal. This group does not concern themselves with copyright law.

1 more reply

toomuchtodo6mo ago

Adherence to the legal framework is a function of your risk appetite.

luke-stanley6mo ago

ronsor6mo ago

Very, if we delete copyright like we're supposed to.

phainopepla26mo ago

Not legal

layer86mo ago

Completely illegal.

1 more reply

basisword6mo ago

It's not. It's awful people justifying awful behaviour. And it's why we can't have nice things. There are always assholes ready to exploit others.

7 more replies

virtualritz6mo ago· 6 in thread

I didn't know German providers do this.

oarfish6mo ago

Yeah this is actually quite nefarious, as it is a private organization that decides what sites get blocked, with no legal oversight.

- https://de.wikipedia.org/wiki/Clearingstelle_Urheberrecht_im...

- https://netzpolitik.org/2024/cuii-liste-diese-websites-sperr...

It's a DNS-based block, so overriding your default DNS server is enough to circumvent it. I think DNS over HTTPS also works.

1 more reply

croemer6mo ago

Alternative: https://archive.ph/2025.12.21-050644/https://annas-archive.l...

1 more reply

iknowstuff6mo ago

In that vein, I am trying to find out why searching for

    alextud popcorntime

which should trivially yield http://github.com/alextud/PopcornTimeTV results in anything but that one particular URL in every search engine: Google, Kagi, DuckDuckGo, Bing

They even find a fork of that particular repo, which in turn links back to it, but refuse to show the result I want. Haven't found any DMCA notices. What is going on?

3 more replies

polytely6mo ago

Also true in the Netherlands, I hate these copyright freaks constantly trying to restrict access.

junon6mo ago

Was also shocked to see that (Berlin, Telekom here).

sva_6mo ago

They also block some foreign "news" like Russia Today last time I checked.

vlaaad6mo ago· 6 in thread

xyzzy_plugh6mo ago

2 more replies

gck16mo ago

When they launched Discover Weekly thing, I used to add at least 1 track from it to my library - it was insanely good. Now it's all junk - not even close to what I listen to.

There hasn't been a change in Spotify in last 7 years or so that wasn't negative.

layer86mo ago

YouTube Music works pretty well for me. One great feature is that it includes not just a commercial music streaming catalog, but all user uploads of music on YouTube.

2 more replies

eastbound6mo ago

This is more frequent than you would assume. I’ve neither subscribed to Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.

wintermutestwin6mo ago

Why do you want a megacorp to tell you what to listen to!?? There are a million ways to do discovery where some enshitified corp isn’t incentivized to push something at you.

1 more reply

venturecruelty6mo ago

Why haven't you unsubscribed then?

ipsum26mo ago· 5 in thread

Can someone explain why C#/Db (major/minor) is the third most popular key? Very unexpected for me, since it's relatively more difficult to play.

ghostie_plz6mo ago

Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point

1 more reply

adzm6mo ago

For electronic music, it's around the lowest bass root note that most systems can play well without a subwoofer. C pretty much requires a sub and things rarely go lower than that.

kzrdude6mo ago

1 more reply

klysm6mo ago

Difficult to play in what instrument?

1 more reply

RickyLahey6mo ago

i believe the most popular reason is capo on 1st fret when writing songs, other factors coming 2nd or 3rd (electronic music, sped up old samples, etc)

tjoff6mo ago· 5 in thread

I just want to be able to back up my playlists. Maybe that's possible, but last time I looked I could only find sites that wanted your login. Not gonna happen.

lelandfe6mo ago

https://developer.spotify.com/documentation/web-api/referenc...

I bet you can whip up a super simple script with an LLM to do this!

1 more reply

hn1116mo ago

This works nicely: https://github.com/spotDL/spotify-downloader

Eckter26mo ago

There are a few tools that can export your spotify playlists into folders of audio files. That's what I used a few years ago for my initial spotify -> navidrome migration.

crazygringo6mo ago

This is where ChatGPT shines. Just ask it to write you a script, it'll give you all the instructions.

I've used ChatGPT to write a whole bunch of playlist logic scripts (e.g. create a playlist that takes tracks from playlists A, B and C, but exclude tracks in playlist D.)
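
A minimal sketch of that kind of playlist logic, assuming the track IDs have already been exported into plain lists. The function name `combine_playlists` and the IDs are hypothetical, and nothing here talks to Spotify:

```python
def combine_playlists(sources, excluded):
    """Merge track IDs from several playlists, keeping first-seen order,
    dropping duplicates and anything present in the excluded playlist."""
    blocked = set(excluded)
    seen = set()
    merged = []
    for playlist in sources:
        for track_id in playlist:
            if track_id in blocked or track_id in seen:
                continue
            seen.add(track_id)
            merged.append(track_id)
    return merged


# Hypothetical track IDs, purely for illustration.
playlist_a = ["t1", "t2", "t3"]
playlist_b = ["t3", "t4"]
playlist_c = ["t5", "t2"]
playlist_d = ["t2", "t5"]

result = combine_playlists([playlist_a, playlist_b, playlist_c], playlist_d)
print(result)  # ['t1', 't3', 't4']
```

Keeping a list plus a `seen` set, rather than a plain set union, preserves the order in which tracks first appear.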

2 more replies

emsixteen6mo ago

Exactly the same here, I just wanna back up my playlists and liked songs, in an organised and tagged manner, at a non-potato quality.

krick6mo ago· 5 in thread

pjerem6mo ago

BitTorrent protocol doesn’t force you to download all of the files of a torrent :)

Now imagine a dedicated music client that will download and stream (and share, because we are polite) only the needed files :)
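
To make the selective-download point concrete, here is a rough sketch of the arithmetic a client would need, assuming the original (v1) BitTorrent layout in which all files are concatenated and cut into fixed-size pieces. The function name and the file sizes are made up for illustration:

```python
def pieces_for_file(file_lengths, file_index, piece_length):
    """Return (first_piece, last_piece) covering one file in a multi-file
    torrent, assuming files are laid out back to back (BitTorrent v1)."""
    if file_lengths[file_index] == 0:
        return None  # an empty file needs no pieces
    start = sum(file_lengths[:file_index])       # byte offset of the file
    end = start + file_lengths[file_index] - 1   # offset of its last byte
    return start // piece_length, end // piece_length


# Hypothetical torrent: three files, 1 MiB pieces.
file_lengths = [1_000_000, 3_500_000, 2_000_000]
piece_length = 1_048_576

wanted = pieces_for_file(file_lengths, 1, piece_length)
print(wanted)  # (0, 4): five of the seven pieces
```

Pieces at a file boundary are shared with the neighbouring file, so fetching a single track can pull in a little of the tracks next to it.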

Spivak6mo ago

3 more replies

chrneu6mo ago

think popcorn time for mp3s/flac instead of mp4.

I don't really see why it wouldn't, from an end user perspective, be any different than a self hosted jellyfin or plexamp.

killingtime746mo ago

You can download torrents selectively. I think if they adopted that cautious attitude they wouldn't exist in the first place

Gander57396mo ago

Anna's archive mirrors z-lib and libgen, so those are the main alternatives. But it's unlikely anna's archive would go down so easily, they take a lot of precautions.

1 more reply

WD-426mo ago· 4 in thread

Incredible.

> A while ago, we discovered a way to scrape Spotify at scale.

They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!

DUDOS6mo ago

How they manage to transfer 300TB of data while remaining anonymous is also astonishing.

5 more replies

derkades6mo ago

https://codeberg.org/raphson/music-server/src/branch/main/sp...

1 more reply

bambax6mo ago

"at scale" could mean they had direct access to a server or to storage, maybe because they had an insider giving them access, or they found secrets that had leaked somewhere?

bmikaili6mo ago

they're probably just using something like https://github.com/nor-dee/spotizerr-spotify

2 more replies

yegle6mo ago· 4 in thread

uhfraid6mo ago

spotify used to do just that (stream p2p) until 2014 or so

https://www.scribd.com/document/56651812/kreitz-spotify-kth1...

2 more replies

willio586mo ago

I do hope one day self-hosting music with an extremely easy setup with torrenting for sourcing is set up again. What I’m talking about exists to some extent, but it’s not trivial for most people.

3 more replies

pjerem6mo ago

Yeah we shouldn’t. But we may.

nness6mo ago

a la "Popcorn Time."

peterburkimsher6mo ago· 3 in thread

For a fully-legal alternative of metadata archiving, I suggest the iTunes EPF (Enterprise Partner Feed). https://performance-partners.apple.com/epf

The best metadata I've found, though, is the MySpace Dragon Hoard: https://archive.org/details/myspace_dragon_hoard_2010

Meanwhile, if you're interested in the genre-by-country MySpace data, or have questions about the iTunes EPF, feel free to reach out and we can discuss your research.

squigz6mo ago

I would guess that combining these sources, along with info from MusicBrainz, would help quite a bit? Still, I'm rather surprised Spotify doesn't provide more information about artists.

realdeal796mo ago

With the MySpace stuff, where are you seeing the metadata? All of the the zips I’ve downloaded from the Dragon Hoard don’t have any metadata.

1 more reply

o_____________o6mo ago

> Please note that Apple Music and iTunes Music data will be migrating away from the Enterprise Partner Feed (EPF). Starting July 16, 2024

827a6mo ago· 3 in thread

Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.

rightbyte6mo ago

Wasn't all data available to users though?

1 more reply

okokwhatever6mo ago

Who cares now, it's already downloaded and ready to be torrented... God is good

potwinkle6mo ago

mvkel6mo ago· 2 in thread

This work is so critical.

Read an article that was published just 10 years ago, and witness the bit rot as most external links will 404, gone forever.

I think it's worth questioning the value of preserving -everything-, but it seems like if we can, we should.

larodi6mo ago

HN crowd is, of course, biased in the technocratic sense, but you see - everyone seems to actually rejoice the move.

The closest to remorse is `linhns` and `locusofself` expressing concern about artists getting hurt (not Spotify itself), but locusofself prefaces with "I hate spotify as a company but..."

(disclaimer: this text is NOT LLM generated, I wrote myself a summary of the summary. here's the Claude thread should anyone care https://claude.ai/share/cfc4ca63-2b9e-47ac-a360-202025d1a134)

mycall6mo ago

Are those 404 links available on web.archive.org?

472828476mo ago· 2 in thread

xandrius6mo ago

The main difference is that people can re-host and seed part of the data by offering space in their own servers.

If AA goes down, it's not the end of it all, a new one comes back up and the seeders are still there.

1 more reply

lukan6mo ago

"and all this will do is paint them a bigger target for takedowns/prosecution"

They are based in Russia, which currently does not cooperate much with the West.

3 more replies

gorbachev6mo ago· 2 in thread

Quoting from their page:

--------------

If they truly are on a mission to protect world's information from disappearing, they should work with MusicBrainz to get this data on it.

Alternatively, it would be amazing, if they built a MusicBrainz like service around it.

aerozol6mo ago

2 more replies

472828476mo ago

> In either case, to make the data truly useful, they'd need to solve the problem on how to match the metadata to a fingerprint used to identify the music tracks

How is that a problem?

    for each track in collection do extract_fingerprint

djfergus6mo ago· 2 in thread

Anna’s Archive has largely flown under the radar by focusing on books.

Even perceived involvement in music piracy puts a much bigger target on their back from far more aggressive actors (RIAA, major labels)

pmdr6mo ago

reassess_blind6mo ago

“Good luck, we don’t care.” is their stance, as far as I can tell.

frereubu6mo ago· 2 in thread

Site is down for me. Archive link: https://archive.is/jf3HW

mawax6mo ago

Probably not down, but blocked by your ISP. Try a VPN. Same thing happens here.

1 more reply

ipsum26mo ago

Ironic. But it's working for me.

walthamstow6mo ago· 2 in thread

Very interesting that a white noise track for babies is the 4th most popular track on Spotify.

cluckindan6mo ago

1 more reply

al_borland6mo ago

Relying on an external hosted service would never cross my mind, and surely wouldn’t be something I go to on a daily basis.

2 more replies

xandrius6mo ago· 1 in thread

Truly amazing work. I can't help being sad that the less popular songs are not currently stored, as those are definitely the ones most at risk of being lost forever.

squigz6mo ago

1 more reply

p0w3n3d6mo ago· 1 in thread

That's why I divide music into the kind I want to keep forever, which I buy on CD, and dance music that I could live without one day.

eightys3v3n6mo ago

I really appreciate platforms that still show the titles and metadata after something is removed. Then at least I can go find it again to maintain my collection. Tidal does this.

yellow_lead6mo ago· 1 in thread

Is the music torrent not up yet? Only see the metadata one here: https://annas-archive.li/torrents/spotify

artninja19886mo ago

Yeah, in the article they write:

The data will be released in different stages on our Torrents page:

[X] Metadata (Dec 2025)

[ ] Music files (releasing in order of popularity)

[ ] Additional file metadata (torrent paths and checksums)

[ ] Album art

[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)

1 more reply

ZeWaka6mo ago· 1 in thread

Since the article asks:

> We're curious about the peaks at whole minutes (particularly 2:00, 3:00, 4:00). If you know why this is, please let us know!

As a hobby video/audio editor, I've seen that people will start with their track taking up a preset amount of time and then fill it up, even if it means having some dead space at the end.

The other alternative is algorithmically created music.

nemomarx6mo ago

So you might see a lot of anchoring just like YouTube videos kept stretching to almost exactly ten minutes?

syntaxing6mo ago· 1 in thread

Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.

robotbikes6mo ago

This already exists and is interesting to play around with - https://github.com/ASLP-lab/DiffRhythm

nighthawk4546mo ago· 1 in thread

Amazing! I wonder if the Every Noise At Once[1] site could be updated with the metadata from this?

[1] https://everynoise.com/

iggldiggl6mo ago

Thanks for linking that page, interesting rabbit hole that I hadn't heard about until today…

throwaway6137456mo ago· 1 in thread

I wonder how deep the hole they're gonna put whoever runs this site into is gonna be?

urbandw311er6mo ago

I heard they’re based in Russia so one assumes they probably will be welcomed by the current government (or even aided) rather than prosecuted.

sneak6mo ago· 1 in thread

199GB, only metadata released for now.

Magnet link found here: https://annas-archive.li/torrents/spotify

Are magnet links allowed on HN?

cranberryturkey6mo ago

that is only 199gb, the real one is 300TB

636mo ago· 1 in thread

urbandw311er6mo ago

They can’t be touched by the music industry; they’re based in Russia.

userbinator6mo ago· 1 in thread

Music files (releasing in order of popularity)

Increasing or decreasing? IMHO increasing would make more sense, as the most popular music is already mirrored in countless other places. It's the rare stuff that is most in need of preservation.

reassess_blind6mo ago

Same model and same prompt won’t necessarily create the same result, unless I misunderstand how these audio models work.

1 more reply

aftbit6mo ago· 1 in thread

Has anyone tried to add up the track file size from the metadata dump?

In spotify_clean_track_files.sqlite3:

    SELECT count(*), sum(filesize_bytes) FROM track_files;

    255966403|15970064861274

That's only 14.5 TiB, nowhere near 300 TiB. What makes up the other 285 TiB of content?
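
A small sketch for re-checking that figure, assuming the table and column names from the query above (`track_files`, `filesize_bytes`). The helper names are made up, and the byte total is simply the number reported in this comment:

```python
import sqlite3


def total_size(conn):
    """Run the same aggregate as above and return (row_count, total_bytes)."""
    return conn.execute(
        "SELECT count(*), sum(filesize_bytes) FROM track_files"
    ).fetchone()


def to_tib(num_bytes):
    """Binary terabytes: 1 TiB = 2**40 bytes."""
    return num_bytes / 2**40


def to_tb(num_bytes):
    """Decimal terabytes: 1 TB = 10**12 bytes."""
    return num_bytes / 10**12


reported_bytes = 15_970_064_861_274  # total quoted in the comment above
print(round(to_tib(reported_bytes), 2))  # 14.52
print(round(to_tb(reported_bytes), 2))   # 15.97
```

To run it against the real dump, something like `total_size(sqlite3.connect("spotify_clean_track_files.sqlite3"))` should work, assuming the file sits in the current directory. Either unit leaves the total far below 300 TB, so the gap is not a TB versus TiB mix-up.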

squigz6mo ago

That's curious and changes things pretty dramatically. It's a lot easier to host 15TB than 300. I wonder what's up here.

TheAceOfHearts6mo ago· 1 in thread

I'm a bit sad that they chose to focus on music rather than audiobooks. Creating an archive of audiobooks seems like it would be more aligned with their mission.

TechSquidTV6mo ago

The metadata is gold, but I was immediately curious why they wouldn't go for Tidal first. Though whatever they have on Spotify I think is unique.

romanovcode6mo ago· 1 in thread

`spotdl download "https://open.spotify.com/user/{username}" --user-auth --output '{list-name}/{title} - {artists}.{output-ext}'`

This is literally all you need to back up Spotify.

Philpax6mo ago

spotdl downloads from YouTube, not Spotify, afaik

artninja19886mo ago· 1 in thread

Wow. Anna is a godsend. Hopefully now we get some really good open source music models

brcmthrowaway6mo ago

First we need good stem splitting

1 more reply

markstos6mo ago· 1 in thread

> ≥70% of songs are ones almost no one ever listens to (stream count < 1000).

So much interesting but undiscovered music is out there!

halperter6mo ago

bob10296mo ago

I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.

Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?

shevy-java6mo ago

yoan92246mo ago

I've always found it interesting how streaming services have become the de facto music library of record, yet they can and do remove content at will. When Spotify pulled out of Russia, entire catalogs became inaccessible. Physical media and personal archives suddenly matter again in ways we thought were obsolete.

The copyright discussion is complex, but from a pure preservation standpoint, I'm glad someone is doing this work.

tristanc6mo ago

Fizzadar6mo ago

I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.

frytaped6mo ago

zzzeek6mo ago

DoctorOetker6mo ago

I'd rather see them use AI to convert all the scanned scientific articles into proper PDF or other formats.

How compact can we get the collective human scientific corpus?

acjohnson556mo ago

If I were to do it today, I could get so much farther with hyperscaler products and this dataset.

bguberfain6mo ago

We can finally search for playlists with a given song! A basic feature that Spotify is missing!

nmz6mo ago

This might be the perfect time to do archiving before the entire internet gets inundated by sub-par AI generated content.

Motorbytes6mo ago

Does the Spotify backup contain any so far grayed out or unavailable songs on their list?

any response is appreciated!

pekkag6mo ago

junon6mo ago

TIL Anna's Archive is blocked in Germany (by a rather obtrusive MitM, I might add). Get redirected to a "Copyright Clearing House" or something.

Jumpmanlives6mo ago

Good stuff Anna's Archive. The Anchormen, premium sea shanty crew from Western Australia, officially endorsed you sharing our salty tunes. https://www.facebook.com/theanchormenwa/

https://open.spotify.com/album/07IyzOA9jJWPZcLDysQwpo?si=KZO...

xnx6mo ago

Merry Christmas!

Mr_Minderbinder6mo ago

> Over-focus on the highest possible quality

> ...(e.g. lossless FLAC). This inflates the file size...

As an example of how much information celluloid can contain see: https://vimeo.com/89784677 (context: he is comparing a Blu-ray and a scan of a 35mm print)

HawkEyeSpaceMan5mo ago

Not worth the risk imo. This might backfire at some point and ruin a good thing with the book libraries.

Kerollmops6mo ago

new_hair5mo ago

Rookie Question, but how do i access all this metadata especially in a cleaner way, or genre-wise for my project development.

Yeri6mo ago

wow. Blocked in Belgium.

Error HTTP 451 - Unavailable For Legal Reasons

https://lumendatabase.org/notices/71398835

krackers6mo ago

New multimodal training set just dropped.

pranavm276mo ago

Miss anna, next time please scale down image dimensions so that us on mobile can read properly haha

Jokes aside, I always thought the best way to deal with piracy was to understand or convince the demand not to do it over dealing with the supply.

machloof6mo ago

Aldipower6mo ago

Oh, just noticed my provider "Vodafone Germany" is blocking the domain annas-archive.li on DNS level.

puffpuff123456mo ago

Amazing!

Is there any way to search this spotify database without downloading the currently available metadata torrent?

Uninen6mo ago

I hope someone builds an open API around this metadata. I'd love to have alternatives to the big player APIs.

tolerance6mo ago

I am not enthused by this news. Let us entertain the possibility that similar institutions will eschew this catalog.

soundsgoodman6mo ago

You need to seriously re-think this...

Releasing indie music, like really low-level indie music, for free in the name of "preservation" is so misguided.

Don't do this. You will only end up hurting the artists who rely on paid downloads.

thenthenthen6mo ago

Full circle! Thank you! (https://torrentfreak.com/how-the-pirate-bay-helped-spotify-b...)

performative6mo ago

ewzimm6mo ago

The data analysis here is interesting. One thing that stood out to me is that black metal is the 6th most common musical genre for bands, right after rockabilly. I would never have expected that.

htx80nerd6mo ago

>Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.

There is a ton of good bands with under 10k or even 1k monthly listeners.

ThinkBeat6mo ago

Can this last?

I envision an army of lawyers and cyber security companies being prepared to unleash a scorched earth campaign that book publishers might want to be part of as well.

At the end it may take down more than just this publication but most others as well.

meysamazad6mo ago

I wonder if Spotify will pursue any legal actions to take this archive or the site down!

baxuz6mo ago

> The quality is the original OGG Vorbis at 160kbit/s.

Yeah, the original quality is either a 320kbps OGG or lossless. Not 160.

While this is _a_ backup, it's a pretty lossy one.

none149886mo ago

Downloading of individual files to Anna’s Archive Please

haghiri756mo ago

I guess having an API to do search on metadata may be cool. Anyone thought of that?

schmuckonwheels6mo ago

I want to time-travel back to 2000 like Old Biff with the sports almanac so I can tell Shawn Fanning to use the "it's for historical preservation" defense.

nutjob26mo ago

I wonder how definitive their collection is and how much ripping Google Music/YouTube would improve on this.

A distributed ripping project to do that would be a fine thing.

damnitbuilds6mo ago

Well done !

Until we have reasonable copyright terms, Pirate On !

hmokiguess6mo ago

What an early christmas gift for humanity. Now, asking for a friend, what's the ideal setup for torrenting this? Mullvad / Tailscale?

lanalanabobana6mo ago

these guys are 100% selling that data to "AI" companies for thousands of dollars so the internet and world at large can get a little more shitty. awesome -_-

gorbachev6mo ago

I want to peek in that metadata collection to see if it could be used to identify the AI slop that's infecting Spotify.

If you could identify a track supposedly by artist X was actually AI slop not created by artist X, you could use that information to skip tracks on (web) music players, for example.

shomp6mo ago

If only Spotify paid musicians their fair share

wartywhoa236mo ago

https://annas-archive.li/llm

_vqpz6mo ago

dmix6mo ago

I hope they get the new lossless versions

thih96mo ago

This is conspiracy theory territory but I wonder if big tech is sponsoring efforts like this as an easy way to get training data.

littlecranky676mo ago

For some reason, the link does not work for me (spain). Works perfect at the same time in tor browser.

fungonimus6mo ago

I would like a downloader! :D this is such an awesome project

m00dy6mo ago

Congrats! I’m sure the Spotify lawyers are gonna have some sleepless nights ahead.

gverrilla6mo ago

GREAT DAY

none149886mo ago

Downloading of individual files to Anna’s Archive Please!

eastoncrafter6mo ago

Plans to upload all this to musicbrainz soundid program?

eastoncrafter6mo ago

Plans to upload all of this to music brainz soundid?

msephton6mo ago

Is this all regions? I'm assuming so but I can't be sure

marstall6mo ago

the top 10,000 songs seem to be 99.9% top-40 corporate pop, which surprised me. thought a list that broad would pick up more that was outside the mainstream ...

1 more reply

reactordev6mo ago

Oh this is going to go over real well in Nashville, TN.

gyrgtyn6mo ago

is there a torrent client already that is good at partial downloads? I didn't realize how popcorn time worked until I read this thread.

1 more reply

BaudouinVH6mo ago

error 451 https://postimg.cc/QFddnW41

siquick6mo ago

Is there a way to see the shape of the metadata?

rendaw6mo ago

Looking at the analysis, I'm totally surprised opera and psytrance are so prolific.

Psy-trance... I thought it was the same as any other electronic genres, but do people get high and just start shoveling psy-trance tracks out or something?

Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?

5 more replies

iqandjoke6mo ago

That’s why Spotify would lose against Apple. Spotify may need to pay a fortune for this scraper behaviour while Apple Music does not.

simmo90006mo ago

We need insane for culture to survive.

7ero6mo ago

free the music

RickyLahey6mo ago

This will be great to train AI on.

rldjbpin6mo ago

the metadata alone is a staggering couple hundred gb, however it contains quite handy information to play with. consider the following:

> /audio-features/{id} "Get audio feature information for a single track identified by its unique Spotify ID."

this combined with track metadata can finally allow those motivated enough to create their own personalized shuffle. potentially better than the slop we get nowadays. no generative ai required*.
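
one way this could look, as a rough sketch only: order tracks so each one is close to the previous one in audio-feature space. it assumes the features have already been loaded into dicts; `energy` and `valence` are real audio-feature fields, but the function name, track ids and values below are invented for illustration.

```python
import math


def smooth_order(tracks, start_id, keys=("energy", "valence")):
    """Greedy ordering: always jump to the remaining track whose features
    are closest (Euclidean distance) to the track currently playing."""
    remaining = {t["id"]: t for t in tracks}
    current = remaining.pop(start_id)
    order = [current["id"]]
    while remaining:
        here = [current[k] for k in keys]
        current = min(
            remaining.values(),
            key=lambda t: math.dist(here, [t[k] for k in keys]),
        )
        del remaining[current["id"]]
        order.append(current["id"])
    return order


# Invented feature values, not taken from the dataset.
tracks = [
    {"id": "a", "energy": 0.9, "valence": 0.8},
    {"id": "b", "energy": 0.2, "valence": 0.3},
    {"id": "c", "energy": 0.8, "valence": 0.7},
    {"id": "d", "energy": 0.3, "valence": 0.2},
]

print(smooth_order(tracks, "a"))  # ['a', 'c', 'd', 'b']
```

this is deterministic, so it is not really a shuffle. a real one would pick randomly among the few nearest candidates instead of always taking the closest.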

Varaldar6mo ago

verisimi6mo ago

Yes, but do they have the one that goes like: to-to-to dotodoo? Hmmm? Do they?

udoyxyz6mo ago

yo, this is insane!! why would anyone do that? I think it is for AI music generation models, like training them. Maybe ai labs people did it?? yeah that is likely

dbacar6mo ago

Now, anyone with some decent info on signal processing and machine learning can build his/her own Shazam.

shmerl6mo ago

Just buy music DRM-free in the first place.

snoozebutton6mo ago

is this not highly illegal?

1 more reply

throw-12-166mo ago

I love coming to these threads to read the pearl clutching of "technologists" who suddenly care about IP and copyright law.

sma3in6mo ago

spotify undressed

bekindtoartists6mo ago

1 more reply

zoklet-enjoyer6mo ago

Wow. Now I just need some hard drives and a way to download that without my ISP doing something about it. That's amazing.

1 more reply

haryj6mo ago

wow

1dry6mo ago

1 more reply
