Self hosted YouTube media server

(tubearchivist.com)

245 points · 037 · 2y ago · 45 comments

39 comments · 12 top-level

codetrotter · 2y ago · 8 in thread

Does it save the video thumbnail as well? Video description? Comments? Channel name? Channel avatar? etc

Currently I use yt-dlp to manually download individual videos that I want to keep. At the moment I only save the video itself. And most of the time I then also paste the URL of the video into archive.is save page and web.archive.org/save so that there is a snapshot of what the video page itself looked like at the time. But this is still incomplete, and relies on those services continuing to exist. Locally saving a snapshot of the page like that, and then also saving the thumbnail and perhaps more of the comments would be nice.
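For reference, yt-dlp can write several of those extras next to the video file. Below is a minimal sketch that only builds the command line, assuming yt-dlp is installed and on PATH. The helper name `build_archive_command`, the `archive` directory, and the output template are made up for illustration, and this does not cover channel avatars or a snapshot of the page itself.

```python
import shlex


def build_archive_command(url: str, out_dir: str = "archive") -> list[str]:
    """Build a yt-dlp command that keeps page metadata next to the video.

    The function name and output layout are hypothetical; the flags are
    standard yt-dlp options.
    """
    return [
        "yt-dlp",
        "--write-thumbnail",    # save the thumbnail as a separate image file
        "--write-description",  # save the description as a .description file
        "--write-info-json",    # save title, channel, counts, etc. as .info.json
        "--write-comments",     # add comments to the .info.json (can be slow)
        "-o", f"{out_dir}/%(channel)s/%(title)s [%(id)s].%(ext)s",
        url,
    ]


if __name__ == "__main__":
    cmd = build_archive_command("https://www.youtube.com/watch?v=VIDEO_ID")
    # Print the command for inspection; to actually run it, pass the list
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the list first makes it easy to inspect what would run before handing it to `subprocess.run`.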

submeta · 2y ago

Check this code fragment:

    youtube_details = {
        "youtube_id": vid_id,
        "channel_name": vid["channel"],
        "vid_thumb_url": vid["thumbnail"],
        "title": vid["title"],
        "channel_id": vid["channel_id"],
        "duration": duration_str,
        "published": published,
        "timestamp": int(datetime.now().timestamp()),
        # Pulling enum value out so it is serializable
        "vid_type": vid_type.value,
    }

abracadaniel · 2y ago

I’ve been using the ytdl-sub docker container for managing automated download of channels and importing them into Plex https://github.com/jmbannon/ytdl-sub

brettwilcox · 2y ago

You should also check out https://archivebox.io/

moritzruth · 2y ago

It saves all the things you mentioned plus the number of views and likes.

CrampusDestrus · 2y ago

You can embed both the thumbnail and the description of a video with ytdl, if you use the appropriate containers
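A minimal sketch of what that could look like with yt-dlp, assuming it is installed along with ffmpeg. The helper name `build_embed_command` is hypothetical, and limiting the choice to mkv and mp4 is an assumption made here to keep the example small, not a complete list of containers that support embedding.

```python
def build_embed_command(url: str, container: str = "mkv") -> list[str]:
    """Build a yt-dlp command that embeds thumbnail and metadata in the file.

    Hypothetical helper: the container whitelist below is an assumption for
    this sketch, not yt-dlp's full list of supported formats.
    """
    if container not in {"mkv", "mp4"}:
        raise ValueError(f"unsupported container for this sketch: {container}")
    return [
        "yt-dlp",
        "--embed-thumbnail",  # store the thumbnail as cover art inside the file
        "--embed-metadata",   # store title, description, etc. as file metadata
        "--merge-output-format", container,  # container used when merging streams
        url,
    ]


if __name__ == "__main__":
    print(build_embed_command("https://www.youtube.com/watch?v=VIDEO_ID"))
```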

tjfl · 2y ago

I didn't know this, neat! Going to try this out:

https://www.reddit.com/r/youtubedl/comments/mcvgmr/download_...

progman32 · 2y ago

Yes to all of those. It even indexes most of those so you can do full text searches if you want.

snthd · 2y ago

https://github.com/TheFrenchGhosty/TheFrenchGhostys-Ultimate...

DCKing · 2y ago · 6 in thread

Tube Archivist is quite heavyweight as it's meant to do heavy full archiving of YouTube channels and search through positively huge libraries. I'm getting the sense that it's a data hoarding tool, not a casual web video watching tool. I found that I just want to add a few channels to my media library, for which I use Jellyfin already.

For people looking for a more lightweight option of that kind, I run the following script hourly [1]. This script uses yt-dlp to go through a text file full of YouTube RSS urls (either a channel RSS or a playlist RSS works for channels where you're only interested in a subset of videos) [2] and downloads the latest 5 videos organized in folders based on channel name. I watch these files by adding the output folder in a Jellyfin "Movies" type library sorted by most recent. The script contains a bunch of flags to make sure Jellyfin can display video metadata and thumbnails without any further plugins, and repackages videos in a format that is 1080p yet plays efficiently even in web browsers on devices released in at least the last 10 years.

It uses yt-dlp's "archive" functionality to keep track of videos it's already downloaded such that it only downloads a video once, and I use a separate script to clean out files older than two weeks once in a while. Running the script depends on ffmpeg (just used for repackaging videos, not transcoding!), xq (usually comes packaged with jq or yq) and yt-dlp being installed. You sometimes will need to update yt-dlp if a YouTube side change breaks it.

For my personal usage it's been honed for a little while and now runs reliably for my purposes at least. Hope it's useful to more people.

[1]: https://pastebin.com/s6kSzXrL

[2]: E.g. https://danielmiessler.com/p/rss-feed-youtube-channel/
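The general flow described in this comment (read a feed, take the newest five entries, let yt-dlp's archive file skip anything already downloaded) can be sketched in Python. This is not the linked bash script, only an illustration: the function names, the `downloaded.txt` archive file, and the sample feed are made up, and it assumes the feed is Atom with newest entries first.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def latest_video_urls(feed_xml: str, limit: int = 5) -> list[str]:
    """Return the first `limit` entry links from an Atom feed string."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        if link is not None and link.get("href"):
            urls.append(link.get("href"))
    return urls[:limit]


def build_download_command(urls: list[str],
                           archive_file: str = "downloaded.txt") -> list[str]:
    """Build a yt-dlp command; --download-archive records finished video IDs
    so a later run skips them."""
    return ["yt-dlp", "--download-archive", archive_file, *urls]


# Hypothetical feed content, standing in for a fetched channel RSS.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example channel</title>
  <entry><title>Newest</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=aaa"/></entry>
  <entry><title>Older</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=bbb"/></entry>
  <entry><title>Oldest</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=ccc"/></entry>
</feed>"""

if __name__ == "__main__":
    print(build_download_command(latest_video_urls(SAMPLE_FEED, limit=2)))
```

In a real run the feed text would come from fetching each URL in the text file, and the command would be passed to `subprocess.run`.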

jchw · 2y ago

Hope you don't mind that I adapted this into a quick container image[1]. (Feel free to scoff at the idea of taking a simple bash script and making it into a massive Docker image; you're right, I just wanted a convenient way to run it in Linux environments that I don't have great control over.) I know it's not a huge script, but nonetheless if you want I can add a LICENSE/copyright notice in my fork/adaptation, if you want to pick a license for this script.

[1]: https://github.com/jchv/ytdl-pvr

DCKing · 2y ago

Oh great! Yeah as you can probably tell from the script I'm using it as a (locally built) container in my own setup. Feel free to pretend it's BSD licensed if that helps :)

mcpackieh · 2y ago

Downloading whole channels and searching them shouldn't be heavy weight. I do that with yt-dlp and an index stored in a SQLite db.

inb4 dropbox/rsync reference. yeah yeah, I'm not saying everybody should do it like this, I'm just saying that archiving and indexing/searching needn't be heavyweight. I'm sure there's plenty of utility in a nice GUI for it, but it could easily be a light weight GUI.
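As a rough illustration of how small such an index can be, here is a sketch using only Python's standard `sqlite3` module. The schema and function names are assumptions for this example, not the commenter's actual setup; the dictionary keys mirror fields found in yt-dlp's `.info.json` files.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny video index. The schema is a made-up example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "id TEXT PRIMARY KEY, channel TEXT, title TEXT, description TEXT)"
    )
    return conn


def add_video(conn: sqlite3.Connection, info: dict) -> None:
    """Insert or update one video from an info dict (e.g. a parsed .info.json)."""
    conn.execute(
        "INSERT OR REPLACE INTO videos VALUES (?, ?, ?, ?)",
        (info["id"], info.get("channel", ""),
         info.get("title", ""), info.get("description", "")),
    )
    conn.commit()


def search(conn: sqlite3.Connection, term: str) -> list[tuple]:
    """Substring search over title and description, ordered by video id."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, title FROM videos "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_index()
    add_video(conn, {"id": "aaa", "channel": "Example",
                     "title": "Sourdough basics", "description": "Feeding a starter"})
    print(search(conn, "starter"))
```

A plain `LIKE` scan is fine for a personal library; a full-text index only starts to matter with much larger collections.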

aftergibson · 2y ago

Looks great, but I couldn't get it working. What version of xq are you using this against? 1.2.1 does not seem to have a -r argument.

jchw · 2y ago

You need to use yq, which has an xq binary.

(edit: removed my "plug" since I mentioned it elsewhere.)

philsnow · 2y ago

yt-dlp also has flags to write subtitle files and .info.json files, which at least Emby can automatically pick up and use, if not Jellyfin.

I haven’t yet wired up the bits to use whispercpp to automatically generate subtitles for downloads, but I have done so on an ad-hoc basis in the past and gotten (much) better results than the YouTube auto-generated subtitles.

nullcipher · 2y ago · 6 in thread

I couldn't find docs for installing from source. Is Docker really mandatory?

Also, "Tube Archivist depends on Elasticsearch 8." Wow, why?

Jolter · 2y ago

Probably because it’s fast, scales well in terms of size, and decently easy to build apps around. Shouldn’t the question be, why not?

hkt · 2y ago

Why not?

Well, if you're self hosting for yourself, friends, and family, it isn't likely to be a thing you'll want to care about fixing when it eventually breaks in mysterious ways.

Better to use sqlite or just a blob of yaml in a file if self hosting might be involved.

nullcipher · 2y ago

Maybe I misunderstood this, but why does a personal YouTube listing / downloading app need ES? Seems so heavyweight for such a lightweight use.

moritzruth · 2y ago

IIRC it uses Elasticsearch as its only database.

pkulak · 2y ago

Seems to require Redis too. Maybe just for cache.

tibbydudeza · 2y ago

Why ???.

indus · 2y ago · 3 in thread

Interesting idea.

I always dream of writing a proxy server where all videos, irrespective of device, get stored in a local cache and served without going outside on subsequent requests.

Gonna try this one, and gonna take that direction.

WirelessGigabit · 2y ago

You can. And you can do fun stuff.

Look at the upside-down-ternet: http://www.ex-parrot.com/pete/upside-down-ternet.html

But here are the problems you'll run into today:

Cache hit rate. How big is your cache? Large enough to get a hit rate that economically saves you money vs the cost of the SAN?

Can you cache YouTube videos? Can you intercept YouTube videos? You'll need a root cert installed on your client devices. And, here's the worst: many applications do cert pinning so they'll refuse to load, even if the signer is in the root store. They require a specific signer.
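To get a feel for the hit-rate question, one can replay a log of requested video IDs through a simulated cache. The sketch below assumes an LRU eviction policy and counts capacity in number of videos rather than bytes, both simplifications chosen for this example; the function name and the sample request log are made up.

```python
from collections import OrderedDict


def lru_hit_rate(requests: list[str], capacity: int) -> float:
    """Replay `requests` through an LRU cache holding `capacity` items and
    return the fraction of requests served from the cache."""
    cache: OrderedDict[str, bool] = OrderedDict()
    hits = 0
    for video_id in requests:
        if video_id in cache:
            hits += 1
            cache.move_to_end(video_id)    # mark as most recently used
        else:
            cache[video_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(requests) if requests else 0.0


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b"]  # hypothetical request log
    for size in (1, 2, 3):
        print(size, round(lru_hit_rate(log, size), 3))
```

Running this against a real proxy log at a few cache sizes shows where extra storage stops buying a meaningfully higher hit rate.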

zo1 · 2y ago

I had this same idea around 2012ish. And had the naive impression that you could cache youtube videos on a proxy level (using Squid). Boy was I wrong, even then YT was employing heavy tactics to prevent people from caching videos at the native http cache header level. And it was at that point I realized something was wrong with the internet.

ComputerGuru · 2y ago

https has put the kibosh on a lot of this type of stuff, unless you’re willing to set up a local CA and trust the root cert on ~every device that will use your network.

EGreg · 2y ago · 2 in thread

How does it download videos? I thought YouTube blocked ripping videos?

simonw · 2y ago

https://github.com/ytdl-org/youtube-dl and https://github.com/yt-dlp/yt-dlp (which this project uses, see https://github.com/tubearchivist/tubearchivist/blob/f848e732... ) are great at ripping videos, and not just from YouTube, they support 100+ different media providers.

They are actively updated every time a new blocking technique comes along.

EGreg · 2y ago

OK but this is self-hosted

simonw · 2y ago · 1 in thread

I saw this was a Django app so I dug around to look at their models. As far as I can tell this is all they have: https://github.com/tubearchivist/tubearchivist/blob/master/t... - just an `Account` model.

It looks like Django + SQLite is used for user accounts, but all other data storage happens in Elasticsearch.

It's an interesting design decision. I would have gone all-in on the database, and used SQLite FTS in place of Elasticsearch for simplicity, but that's my own personal favourite stack. Not saying their design is bad, just different.
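For readers unfamiliar with SQLite FTS, a minimal sketch of full-text search over titles and subtitles follows. It assumes the SQLite library bundled with Python was compiled with the FTS5 extension, which is common but not guaranteed; the table name, columns, and sample rows are hypothetical and unrelated to Tube Archivist's actual data model.

```python
import sqlite3


def build_fts_index(rows: list[tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table and load (video_id, title, subtitles) rows.

    Requires an SQLite build with FTS5 enabled (an assumption of this sketch).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE video_fts "
        "USING fts5(video_id UNINDEXED, title, subtitles)"
    )
    conn.executemany("INSERT INTO video_fts VALUES (?, ?, ?)", rows)
    return conn


def search_fts(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return matching video ids, best match first (FTS5's built-in rank)."""
    cur = conn.execute(
        "SELECT video_id FROM video_fts WHERE video_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_fts_index([
        ("v1", "Sourdough basics", "today we feed the starter"),
        ("v2", "Bike repair", "fixing a flat tire"),
    ])
    print(search_fts(conn, "starter"))
```

`video_id` is marked `UNINDEXED` so it is stored and returned but not searched.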

037 (OP) · 2y ago

Perhaps Elasticsearch was chosen because they also index video comments and subtitles, making full-text search a key feature. But I agree, SQLite FTS might suffice, and much of the metadata could be better managed using a traditional Django structure.

It would be great to add embeddings to the index, possibly using one of your Python tools.

freefaler · 2y ago · 1 in thread

Looks great, I will try it, since YouTube broke my scripts a while ago...

The way I was using them was to create a playlist named "save" and pull from it once a day. It worked for a while, but YT somehow started to ban my script. Tube Archivist looks like it would be ideal for that.

Thanks for sharing this!

Adverblessly · 2y ago

It might just be that you were banned because you were doing it exactly once a day.

I use YT's RSS feature to follow channels and playlists I'm interested in and discovered that (somewhat ironically) if I have it query the RSS periodically, Google will decide that I am a bot, will return errors for all reads, and will force me to pass a captcha next time I try to use any Google product (presumably connecting the two activities via IP).

So now my RSS reader does not periodically query YT and instead I manually click the update button when I'm interested...

c0brac0bra · 2y ago

I've had significant problems running this for extended periods.

It will crash and then restoration will fail internally with corruption errors, requiring reading through docker logs or just starting over from scratch completely.

renegat0x0 · 2y ago

Ha, I have also written something similar:

https://github.com/rumca-js/Django-link-archive

I support not only youtube, but also any RSS source.

It functions as link aggregation software. I can also fetch metadata for all videos in a channel, and download video and audio.

I am using standard Django auth module.

It still lacks polish, and it is under development. I am not a webdev, so I am still struggling with the overall architecture.

snthd · 2y ago

As a less sophisticated alternative there's a metadata plugin for jellyfin https://github.com/ankenyr/jellyfin-youtube-metadata-plugin

MaikaDiHaika · 2y ago

I tried installing it half a year ago and the setup and documentation were really bad. Maybe I'll try it again sometime.

ocdtrekkie · 2y ago

I like prologic's Tube. Way simpler, single Golang binary. Hosts video, not much else.
