Beyond the hashing algorithm, some important additions that were previously proposals without widespread use (e.g. merkle tree for hashing pieces) are becoming required. The focus has mostly been on optimizing latency for the P2P protocol and making sane improvements to the file spec. I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Ideally, BitTorrent would be broken down into separate specifications that could be used together or in separate systems: one for the file format and piece representation for sharing files, one for the P2P protocol, and one for discovery (trackers, DHTs). I want to believe that there would be far more interesting P2P projects if you could just lift robust primitives from BitTorrent.
> I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Yes, we did not pay much attention to trackers. BEP52 basically seized the opportunity to make some incompatible changes we had always wanted to make anyway (quite a few accumulated over the years), and there were no comparable open issues with the HTTP tracker protocol.
This is because the HTTP protocol carries so much overhead that most trackers don't really run it anymore. I think promoting UDP to the spec would've been a step in the right direction. Modern trackers use a bunch of tricks like BEP34[0] to avoid getting pounded, and it would be great if every client conformed to them.
I hope I'm not coming off as aggressive. I really appreciate this work and I'm really glad to see a spec revision. It's just as you said: there have been many years and many good improvements that I'd like to see made while there's still a chance to break things.
HTTP trackers are considered fine for medium-scale torrent deployments. UDP trackers were originally introduced to cope with the traffic caused by running an open tracker that manages 100k+ infohashes for the whole world.
Also, both BEP3 and 52 already forward-reference the tracker extensions (compact and UDP), so someone who writes a new bittorrent implementation should already be aware of them.
Maybe we could make it more clear that some BEPs are almost-mandatory.
Yeah, if I remember correctly the bittorrent DHT ultimately just maps 20-byte hashes to peer lists (IP + port pairs). It's obviously designed to be convenient for bittorrent swarm discovery, but nothing about it limits it to bittorrent usage. Indeed, I'm surprised it's not more widely exploited for p2p bootstrapping.
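As a toy illustration of that point (the function names and the in-memory dict are my own stand-ins, not the real Kademlia routing or message format), the abstract interface the DHT exposes really is just a 20-byte-key-to-peer-list mapping:

```python
import hashlib

def make_key(data: bytes) -> bytes:
    """Any 20-byte value works as a DHT key; BitTorrent happens to use
    SHA-1 infohashes, but nothing checks where the key came from."""
    return hashlib.sha1(data).digest()

# Toy in-memory stand-in for the distributed table:
dht: dict[bytes, list[tuple[str, int]]] = {}

key = make_key(b"any application-defined identifier")
dht.setdefault(key, []).append(("203.0.113.5", 6881))  # like announce_peer
peers = dht.get(key, [])                               # like get_peers
```

Anything that can agree on a 20-byte identifier out of band could use this for bootstrapping, which is the point being made above.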
Did you guys talk with the IPFS team? Do both of you have a desire to start bringing both families of protocols and technologies closer together?
I feel in this age we must make de-fragmentation of efforts our topmost priority.
In particular, I note that there's nothing in there regarding which infohash should be used in the tracker updates. Should traffic with v1/v2 clients be reported separately, or should it be consolidated under the v2 infohash?
What are the security implications of doing this? It seems it wouldn't increase the strength beyond the original 160 bits, no? Was there anything preventing redesigning the protocol to use full 32-byte SHA256 hashes throughout?
80 bits of collision resistance is usually the number accepted for legacy cryptosystems or for lightweight crypto. It's not great, but it's not "too bad".
By removing 96 bits from the state you also prevent length-extension attacks (which SHA-256 is vulnerable to, see [1]). Or rather, you get 96 bits of security against them, which should be enough.
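To make the truncation concrete, here's a small sketch (standard hashlib; the input bytes are made up, and the helper name is mine):

```python
import hashlib

def truncated_infohash(data: bytes) -> bytes:
    """SHA-256 truncated to 20 bytes (160 bits), as BEP52 does for
    backward-compatible infohashes. Dropping 96 bits of the final
    state is also what blocks classic length-extension attacks,
    which need the full digest to resume hashing."""
    return hashlib.sha256(data).digest()[:20]

full = hashlib.sha256(b"example info dict bytes").digest()
short = truncated_infohash(b"example info dict bytes")
assert len(full) == 32 and len(short) == 20
assert full[:20] == short  # same hash, just the first 160 bits
```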
This is better than using SHA-1 because SHA-1 has "efficient" chosen-prefix algorithms to find collisions while SHA-2 currently does not.
Now if it were me I would have chosen a hash function like KangarooTwelve which is faster, provides parallelization for large inputs, allows you to customize the output length and has received a substantial amount of cryptanalysis.
[1]: https://cryptologie.net/article/417/how-did-length-extension...
The hash only gets truncated where it is used as a unique identifier. When you start from a v2 magnet link or torrent file you get the full 32-byte hash, which means your integrity checking is unaffected.
Replacing SHA-1 with SHA-2, what are they thinking? Blake2 is faster and more secure than either.
Moving bittorrent away from its present image could be achieved by making p2p useful beyond Blu-ray rips.
But I used it a lot when running bigger downloads like install discs for Linux distros, OpenOffice etc., and it made a difference when there was some major release and half of the plain old HTTP mirrors were painfully slow or down entirely. Admittedly, that situation got a lot better compared to 10 years ago, but I'm still delighted by how natural it felt to use, since it seamlessly integrated with the browser's download manager. And you didn't have this "uh, I need to start an external program for this" kind of reluctant thought when you saw a website offering download via torrent. Today I just wonder if BT would have evolved differently if all browsers had included a client.
https://en.wikipedia.org/wiki/Brave_(web_browser) https://brave.com/
There are plenty of things besides piracy that people could be doing with torrents and related tech, and it seems like such a waste of an idea: a Linux package manager, an open-source Acestream alternative, collaborative work on large scales.
So yes, it's the way it was originally meant to be used.
Developers will code it into their download pages, decentralized systems like a p2p wikipedia will be possible and always accessible by anyone with a browser.
here you go https://en.wikipedia.org/wiki/Twister_%28software%29
I wonder why not SHA-512? It's actually faster to compute than SHA-256 on 64-bit architectures.
It is an arms race that is not won by updating a slowly evolving core protocol.
2) SHA1 is replaced with SHA2-256 (2x longer hashes and not broken).
3) Files are represented by a tree structure instead of a list of dictionaries with paths-- this reduces duplication in deeply-nested hierarchies.
4) Backwards compatible-- you can make a .torrent file with both old and new pieces, and a swarm can speak either. This requires padding files from BEP47, which most clients probably don't support.
Per-file metadata increases pretty significantly, from ~19B (just length) to ~68B (length + hash).
The .torrent file only stores the merkle tree's root hash for each file, and the torrent client queries its peers for the rest of the merkle tree (verifiable against the root hash). The leaves of the merkle tree are the hashes of the 16 KiB blocks.
Interesting consequences of this:
Piece size isn't baked into the file anymore (and I've seen torrents with 16 MiB pieces); the client can dynamically choose its verification piece size by requesting only so many layers of the merkle tree, or it could skip requesting the tree and verify the whole file at once.
Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".
Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size to prevent huge bitsets: a 16 KiB piece length on a 64 GiB torrent would mean a 4-million-item / 512 KiB bitset (!), so it could take 512 KiB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?
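The bitset arithmetic above is easy to check (a quick sketch; the function name is mine):

```python
def bitset_bytes(total_size: int, piece_length: int) -> int:
    """Bytes needed for a have-bitfield with one bit per piece."""
    pieces = -(-total_size // piece_length)  # ceiling division
    return -(-pieces // 8)

size = 64 * 1024**3  # the 64 GiB torrent from the comment above
print(bitset_bytes(size, 16 * 1024))    # 524288 bytes (512 KiB), ~4.2M pieces
print(bitset_bytes(size, 4 * 1024**2))  # 2048 bytes, 16384 pieces
```

So a sane piece length keeps the per-peer state tiny, which is the reason to scale it with total size.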
v2: O(log(path-depth) * number of files)
that is assuming some constantish branching factor in your directory structure
> Merkle tree roots will only be unique for each piece length.
Merkle trees are independent of piece size, which means you can use them to dedup across torrents.
This is one of the biggest things I feel is missing from the current protocol and I'm very glad it's in v2 draft. Now when a group of related torrents are repacked into a single torrent all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack instead of the individual files, just do what you want and the whole swarm still benefits.
To clarify, this works by the client deterministically reconstructing the tree once they have the whole file, then checking the root's hash, correct?
Each leaf is the hash of a 16 KiB chunk. On the next layer up, each node is the hash of the two leaves below it concatenated together.
You keep adding layers until you're left with a single hash at the root of the tree.
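That construction can be sketched in a few lines of Python. One hedge: the real BEP52 tree pads the leaf layer with zero hashes up to a power of two, while this toy version just duplicates a trailing odd node, so roots won't match real clients for non-power-of-two files:

```python
import hashlib

BLOCK = 16 * 1024  # 16 KiB leaf blocks, per the description above

def merkle_root(data: bytes) -> bytes:
    """Toy merkle root: hash each 16 KiB block, then repeatedly hash
    adjacent pairs together until a single root remains."""
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # simplification, not BEP52's zero-padding
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

root = merkle_root(b"x" * (40 * 1024))  # 3 leaves -> 4 -> 2 -> 1
assert len(root) == 32
```

Verification then works exactly as described upthread: rebuild the tree from the blocks you have and compare against the root from the .torrent file.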
If torrent A and B both contain the exact same file, but torrent A only has the first half available and torrent B has the second half available, could I combine both torrents to download that file? This could help fix old dead torrents, or at least make the file searchable elsewhere by its SHA-256, for example.
But can't you already download one file? I suppose if a chunk spans two files, you may get a few extra KB of another file you don't want, but it's not noticeable from a user perspective.
I never really thought about the details of how it works, or the really really impressive feats that were accomplished to get it to work. I knew it was a really good technology, but reading this and the comments here puts it on a whole other level.
Why isn't this technology talked about more? Why are blockchains the big "thing" right now with people trying to use them everywhere to see where they fit best, but torrent networks are kind of just... ignored?
The decentralized nature of it seems to open so many possibilities at first glance, is there a reason they aren't being taken advantage of? Is there some kind of "great filter" kind of thing that is preventing widespread usage of something like a torrent network?
Similarly, I heard that Skype used to do something similar. I'm not sure exactly how it worked, and apparently it was a pain to maintain, so I think it has been scrapped by now as well. I think some software updaters do still use BitTorrent, though.
If I were to guess, the really big reason for the lack of interest from big corporations is that collecting as much data as possible for use in machine learning is very much in vogue, while at the same time bandwidth seems to be very much a non-issue. Thus there is not much to gain and possibly something to lose from employing bittorrent.
Streaming wants us to download A, B, C, D just in time.
BitTorrent (simplified) wants me to download piece P, you to download piece G, then I get G from you and you get P from me.
There are BitTorrent streaming apps, but they kind of mess with the nature of BT.
OTOH, for things like RPM/Deb/WindowsUpdate etc. it would make great sense.
The BitTorrent DHT is great for storing and exchanging metadata, but a DHT is not something most people associate with BitTorrent (Bitcoin also uses a DHT for client discovery, as do countless other services).
Blockchain technology on the other hand offers verifiable distributed timestamping (with ok-ish resolution). That has much wider applicability than just payment tracking (which is essentially all bitcoin does), which is why there's plenty of people exploring what's possible.
In this case, I was trying to use it to ask if there is some kind of "unsolved problem", inherent limitation or issue/problem with torrent networks that prevents their widespread usage.
Combining BEPs 46 and 50 enables rapid updates of torrents, but they are fairly new and there are no implementations designed with low latency in mind. Most bittorrent implementations focus on large amounts of data and throughput, so this use case is not well served in practice even though the protocol could now support it.
On the other hand, an uncensorable imageboard would profit from the verifiable timestamping of a blockchain, with just the images distributed via a bittorrent-like mechanism. That also gives you a decent anti-spam mechanism (you can post in exchange for mining blocks, similar to the original idea of hashcash).
On the other hand, there has to be a way to avoid downloading (and sharing) certain parts of the chain. For example, if someone uploads illegal content, other users should have the option to never download that data, so I like the idea of keeping images separate.
For posting to be feasible, the time to mine has to be low, though of course it'll increase over time. That means shorter blockchains are favoured for ease of use (nobody wants to wait 5 minutes and waste a lot of power just to make a post), but they have to be long enough to be hard to forge.
There's also the issue of segmentation; there's an interest in certain users wanting not to share certain posts, for example people against political issue X may not want to share posts about X. With a small number of peers, this could mean that only one or two peers keeps track of the posts talking about issue X. And then you'd have to trust that you're not downloading illegal content from those people, so if you are committed to anti-censorship but also don't want to download illegal content, you have to trust those peers to only remove illegal content.
In the end, I'm not sure if it comes out better than NNTP, or even centralised discussion boards with multiple independent archive sites available (which can archive posts before they are deleted by moderators).
Discussion of other changes: https://github.com/bittorrent/bittorrent.org/pull/59
This means in large multi-file torrents you don't have to download (and store) the two extra 1-4 MB pieces at the start/end of each file anymore.
Huh, where do you see that? Not seeing any ctrl-f hits for webtorrent or webrtc.
I guess this is why they say when you assume you make an ass out of you and me.
Users hated it for general use, even when downloading big files. 1) They didn't like having to install/run some special software to download a file. 2) They didn't like uploading to others and it slowing down their connections.
Consumer networks are asymmetric, having far more download capacity than upload capacity. This makes sense since 1) most users download and want the available bandwidth for faster downloads, and 2) it prevents commercial applications on consumer circuits. That is far from ideal for applications like BitTorrent.
I'm not saying there isn't an application for this technology, I'm saying all the good applications don't want to ask the users to pay for distribution to other users. Thus it's relegated to mostly piracy, open source, etc.
Bittorrent Inc. has been trying to commercialize this for a decade now; I just don't see it happening. If there was anyone who could commercialize it, it was Travis Kalanick, and while he exited for $20M, he was very lucky (and happy) to get out of that market.
It already is though.
Merkle trees allow torrents to start faster from magnet links since only the tree roots need to be front-loaded while the tree can be fetched incrementally.
Is it considered the spiritual successor to the original uTorrent?
>The qBittorrent project aims to provide an open-source software alternative to µTorrent.
Though in my experience it is more of a memory hog and buggier than uTorrent. But that doesn't stop me from using it.
Now it's full of ads and performs poorly.
[1] Like all the different Linux distro install images over and over again.
It fits the efficient, small memory-footprint and no ads requirements. "Full Featured" is subjective as it depends upon what you consider "Full Featured".