Beyond the hashing algorithm, some important additions that were previously proposals without widespread use (e.g. merkle tree for hashing pieces) are becoming required. The focus has mostly been on optimizing latency for the P2P protocol and making sane improvements to the file spec. I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Ideally, BitTorrent would be broken down into separate specifications that could be used together or in separate systems: one for the file format and piece representation for sharing files, one for the P2P protocol, and one for discovery (trackers, DHTs). I want to believe that there would be far more interesting P2P projects if you could just lift robust primitives from BitTorrent.
> I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Yes, we did not pay much attention to trackers. BEP52 basically seized the opportunity to make some incompatible changes we had always wanted to make anyway (quite a few accumulated over the years), and there were no comparable open issues with the HTTP tracker protocol.
This is because the HTTP protocol carries so much overhead that most trackers don't really run it anymore. I think promoting UDP to the spec would've been a step in the right direction. Modern trackers use a bunch of tricks like BEP34[0] to avoid getting pounded, and it would be great if every client conformed to them.
I hope I'm not coming off as aggressive. I really appreciate this work and I'm really glad to see a spec revision. It's just as you said: there have been many years and many good improvements that I'd like to see made while there's still a chance to break things.
HTTP trackers are considered fine for medium-scale torrent deployments. UDP trackers were originally introduced to cope with the traffic caused by running an open tracker that manages 100k+ infohashes for the whole world.
Also, both BEP3 and 52 already forward-reference the tracker extensions (compact and UDP), so someone who writes a new bittorrent implementation should already be aware of them.
Maybe we could make it more clear that some BEPs are almost-mandatory.
Yeah, if I remember correctly the bittorrent DHT ultimately just maps 20-byte hashes to peer lists (IP + port pairs). It's obviously designed to be convenient for bittorrent swarm discovery, but nothing about it limits it to bittorrent usage. Indeed, I'm surprised it's not more widely exploited for p2p bootstrapping.
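As a toy illustration of that point (the function names and the in-memory dict are my own stand-ins, not the real Kademlia routing or message format), the abstract interface the DHT exposes really is just a 20-byte-key-to-peer-list mapping:

```python
import hashlib

def make_key(data: bytes) -> bytes:
    """Any 20-byte value works as a DHT key; BitTorrent happens to use
    SHA-1 infohashes, but nothing checks where the key came from."""
    return hashlib.sha1(data).digest()

# Toy in-memory stand-in for the distributed table:
dht: dict[bytes, list[tuple[str, int]]] = {}

key = make_key(b"any application-defined identifier")
dht.setdefault(key, []).append(("203.0.113.5", 6881))  # like announce_peer
peers = dht.get(key, [])                               # like get_peers
```

Anything that can agree on a 20-byte identifier out of band could use this for bootstrapping, which is the point being made above.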
Did you guys talk with the IPFS team? Do both of you have a desire to start bringing both families of protocols and technologies closer together?
I feel in this age we must make de-fragmentation of efforts our topmost priority.
In particular, I note that there's nothing in there regarding which infohash should be used in the tracker updates. Should traffic with v1/v2 clients be reported separately, or should it be consolidated under the v2 infohash?
What are the security implications of doing this? It seems it wouldn't increase the strength beyond the original 160 bits, no? Was there anything preventing redesigning the protocol to use full 32-byte SHA256 hashes throughout?
80 bits of collision resistance is usually the number accepted for legacy cryptosystems or for lightweight crypto. It's not great, but it's not "too bad".
By removing 96 bits from the state you also prevent length-extension attacks (which SHA-256 is vulnerable to, see [1]). Or rather, you get 96 bits of security against them, which should be enough.
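To make the truncation concrete, here's a small sketch (standard hashlib; the input bytes are made up, and the helper name is mine):

```python
import hashlib

def truncated_infohash(data: bytes) -> bytes:
    """SHA-256 truncated to 20 bytes (160 bits), as BEP52 does for
    backward-compatible infohashes. Dropping 96 bits of the final
    state is also what blocks classic length-extension attacks,
    which need the full digest to resume hashing."""
    return hashlib.sha256(data).digest()[:20]

full = hashlib.sha256(b"example info dict bytes").digest()
short = truncated_infohash(b"example info dict bytes")
assert len(full) == 32 and len(short) == 20
assert full[:20] == short  # same hash, just the first 160 bits
```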
This is better than using SHA-1 because SHA-1 has "efficient" chosen-prefix algorithms to find collisions while SHA-2 currently does not.
Now if it were me I would have chosen a hash function like KangarooTwelve which is faster, provides parallelization for large inputs, allows you to customize the output length and has received a substantial amount of cryptanalysis.
[1]: https://cryptologie.net/article/417/how-did-length-extension...
The hash only gets truncated where it is used as a unique identifier. When you start from a v2 magnet link or torrent file you get the full 32-byte hash, which means your integrity checking is unaffected.
Replacing SHA-1 with SHA-2, what are they thinking? Blake2 is faster and more secure than either.
Moving bittorrent away from its present image could be achieved by making p2p useful beyond Blu-ray rips.
But I used it a lot when running bigger downloads like install discs for Linux distros, OpenOffice etc., and it made a difference when there was some major release and half of the plain old HTTP mirrors were painfully slow or down entirely. Admittedly, that situation got a lot better compared to 10 years ago, but I'm still delighted by how natural it felt to use, since it seamlessly integrated with the browser's download manager. And you didn't have this "uh, I need to start an external program for this" kind of reluctant thought when you saw a website offering download via torrent. Today I just wonder if BT would have evolved differently if all browsers had included a client.
https://en.wikipedia.org/wiki/Brave_(web_browser) https://brave.com/
There are plenty of things besides piracy that people could be doing with torrents and related tech, and it seems like such a waste of an idea: a Linux package manager, an open-source Acestream alternative, collaborative work on large scales.
So yes, it's the way it was originally meant to be used.
Developers will code it into their download pages, decentralized systems like a p2p wikipedia will be possible and always accessible by anyone with a browser.
here you go https://en.wikipedia.org/wiki/Twister_%28software%29
I wonder why not SHA-512? It's actually faster to compute than SHA-256 on 64-bit architectures.
It is an arms race that is not won by updating a slowly evolving core protocol.
2) SHA1 is replaced with SHA2-256 (2x longer hashes and not broken).
3) Files are represented by a tree structure instead of a list of dictionaries with paths-- this reduces duplication in deeply-nested hierarchies.
4) Backwards compatible-- you can make a .torrent file with both old and new pieces, and a swarm can speak either. This requires padding files from BEP47, which most clients probably don't support.
Per-file metadata increases pretty significantly, from ~19B (just length) to ~68B (length + hash).
The .torrent file only stores the merkle tree's root hash for each file, and the torrent client queries its peers for the rest of the merkle tree (verifiable against the root hash). The leaves of the merkle tree are the hashes of the 16 KiB blocks.
Interesting consequences of this:
Piece size isn't baked into the file anymore (and I've seen torrents with 16 MiB pieces); the client can dynamically choose its verification piece size by requesting only so many layers of the merkle tree, or it could skip requesting the tree and verify the whole file at once.
Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".
Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size to prevent huge bitsets: a 16 KiB piece length on a 64 GiB torrent would mean a 4-million-item / 512 KiB bitset (!), so it could take 512 KiB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?
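The bitset arithmetic above is easy to check (a quick sketch; the function name is mine):

```python
def bitset_bytes(total_size: int, piece_length: int) -> int:
    """Bytes needed for a have-bitfield with one bit per piece."""
    pieces = -(-total_size // piece_length)  # ceiling division
    return -(-pieces // 8)

size = 64 * 1024**3  # the 64 GiB torrent from the comment above
print(bitset_bytes(size, 16 * 1024))    # 524288 bytes (512 KiB), ~4.2M pieces
print(bitset_bytes(size, 4 * 1024**2))  # 2048 bytes, 16384 pieces
```

So a sane piece length keeps the per-peer state tiny, which is the reason to scale it with total size.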
v2: O(log(path-depth) * number of files)
that is assuming some constantish branching factor in your directory structure
> Merkle tree roots will only be unique for each piece length.
Merkle trees are independent of piece size, which means you can use them to dedup across torrents.
This is one of the biggest things I feel is missing from the current protocol and I'm very glad it's in v2 draft. Now when a group of related torrents are repacked into a single torrent all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack instead of the individual files, just do what you want and the whole swarm still benefits.
To clarify, this works by the client deterministically reconstructing the tree once they have the whole file, then checking the root's hash, correct?
Each leaf is the hash of a 16 KiB chunk. On the next layer up, each node is the hash of the two leaves below it concatenated together.
You keep adding layers until you're left with a single hash at the root of the tree.
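That construction can be sketched in a few lines of Python. One hedge: the real BEP52 tree pads the leaf layer with zero hashes up to a power of two, while this toy version just duplicates a trailing odd node, so roots won't match real clients for non-power-of-two files:

```python
import hashlib

BLOCK = 16 * 1024  # 16 KiB leaf blocks, per the description above

def merkle_root(data: bytes) -> bytes:
    """Toy merkle root: hash each 16 KiB block, then repeatedly hash
    adjacent pairs together until a single root remains."""
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # simplification, not BEP52's zero-padding
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

root = merkle_root(b"x" * (40 * 1024))  # 3 leaves -> 4 -> 2 -> 1
assert len(root) == 32
```

Verification then works exactly as described upthread: rebuild the tree from the blocks you have and compare against the root from the .torrent file.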
If torrent A and B both contain the exact same file, but torrent A only has the first half available and torrent B has the second half available, could I combine both torrents to download that file? This could help fix old dead torrents, or at least make the file searchable elsewhere by its SHA-256, for example.
But can't you already download one file? I suppose if a chunk spans two files, you may get a few extra KB of another file you don't want, but it's not noticeable from a user perspective.
I never really thought about the details of how it works, or the really really impressive feats that were accomplished to get it to work. I knew it was a really good technology, but reading this and the comments here puts it on a whole other level.
Why isn't this technology talked about more? Why are blockchains the big "thing" right now with people trying to use them everywhere to see where they fit best, but torrent networks are kind of just... ignored?
The decentralized nature of it seems to open so many possibilities at first glance, is there a reason they aren't being taken advantage of? Is there some kind of "great filter" kind of thing that is preventing widespread usage of something like a torrent network?
Similarly, I heard that Skype used to do something similar. I'm not sure exactly how it worked, and apparently it was a pain to maintain, so I think it has been scrapped by now as well. I think some software updaters do still use BitTorrent, though.
If I were to guess, the really big reason for the lack of interest from big corporations is that collecting as much data as possible for use in machine learning is very much in vogue, while at the same time bandwidth seems to be very much a non-issue. Thus there is not much to gain and possibly something to lose from employing bittorrent.
Streaming wants us to download A, B, C, D just in time.
BitTorrent (simplified) wants me to download piece P, you to download piece G, then I get G from you and you get P from me.
There are BitTorrent streaming apps, but they kind of mess with the nature of BT.
OTOH, for things like RPM/Deb/WindowsUpdate etc. it would make great sense.
The BitTorrent DHT is great for storing and exchanging metadata, but a DHT is not something most people associate with BitTorrent (Bitcoin also uses a DHT for client discovery, as do countless other services).
Blockchain technology on the other hand offers verifiable distributed timestamping (with ok-ish resolution). That has much wider applicability than just payment tracking (which is essentially all bitcoin does), which is why there's plenty of people exploring what's possible.
In this case, I was trying to use it to ask if there is some kind of "unsolved problem", inherent limitation or issue/problem with torrent networks that prevents their widespread usage.
Combining BEPs 46 and 50 enables rapid updates of torrents, but they are fairly new and there are no implementations designed with low latency in mind. Most bittorrent implementations focus on large amounts of data and throughput, so this use case is not well served in practice even though the protocol could now support it.
On the other hand, an uncensorable imageboard would profit from the verifiable timestamping of a blockchain, with just the images distributed via a bittorrent-like mechanism. That also gives you a decent anti-spam mechanism (you can post in exchange for mining blocks, similar to the original idea of hashcash).
On the other hand, there has to be a way to avoid downloading (and sharing) certain parts of the chain. For example, if someone uploads illegal content, other users should have the option to never download that data, so I like the idea of keeping images separate.
For posting to be feasible, the time to mine has to be low, though of course it'll increase over time. That means shorter blockchains are favoured for ease of use (nobody wants to wait 5 minutes and waste a lot of power just to make a post), but they have to be long enough to be hard to forge.
There's also the issue of segmentation; there's an interest in certain users wanting not to share certain posts, for example people against political issue X may not want to share posts about X. With a small number of peers, this could mean that only one or two peers keeps track of the posts talking about issue X. And then you'd have to trust that you're not downloading illegal content from those people, so if you are committed to anti-censorship but also don't want to download illegal content, you have to trust those peers to only remove illegal content.
In the end, I'm not sure if it comes out better than NNTP, or even centralised discussion boards with multiple independent archive sites available (which can archive posts before they are deleted by moderators).
Discussion of other changes: https://github.com/bittorrent/bittorrent.org/pull/59
This means in large multi-file torrents you don't have to download (and store) the two extra 1-4 MB pieces at the start/end of each file anymore.
Huh, where do you see that? Not seeing any ctrl-f hits for webtorrent or webrtc.
I guess this is why they say when you assume you make an ass out of you and me.
Users hated it for general use, even when downloading big files. 1) They didn't like having to install/run some special software to download a file. 2) They didn't like uploading to others and it slowing down their connections.
Consumer networks are asymmetric, having far more download capacity than upload capacity. This makes sense since 1) most users download and want the available bandwidth for faster downloads, and 2) it prevents commercial applications on consumer circuits. That is far from ideal for applications like BitTorrent.
I'm not saying there isn't an application for this technology, I'm saying all the good applications don't want to ask the users to pay for distribution to other users. Thus it's relegated to mostly piracy, open source, etc.
Bittorrent Inc. has been trying to commercialize this for a decade now; I just don't see it happening. If there was anyone who could commercialize it, it was Travis Kalanick, and while he exited for $20M, he was very lucky (and happy) to get out of that market.
It already is though.
Merkle trees allow torrents to start faster from magnet links since only the tree roots need to be front-loaded while the tree can be fetched incrementally.
Is it considered the spiritual successor to the original uTorrent?
>The qBittorrent project aims to provide an open-source software alternative to µTorrent.
Though in my experience it is more of a memory hog and buggier than uTorrent. But that doesn't stop me from using it.
Now it's full of ads and performs poorly.
[1] Like all the different Linux distro install images over and over again.
It fits the efficient, small memory-footprint and no ads requirements. "Full Featured" is subjective as it depends upon what you consider "Full Featured".