How we built a new, fast file transfer protocol

(trytachyon.com)

90 points · mcharawi · 4y ago · 63 comments

mcharawi (OP) · 4y ago · 7 in thread

Hey HN! I'm Mahamad, co-founder of Tachyon Transfer, where we're building faster file transfer tools for developers. We've spent the last year building an ultra-fast FTP replacement, and we thought we'd show you guys what our technical process was like. Let me know if you have any questions!

KennyBlanken · 4y ago

Please show performance tests versus hpn-ssh, GridFTP (aka the de facto tool of the particle physics and genetics research communities), and simpler systems like wget2's multi-threaded mode.

eps · 4y ago

It would also be nice to compare against the different standard TCP congestion avoidance algorithms, of which there are plenty.

It is, after all, a very well-researched area.

Can I tunnel this over SSH and use it the same way, as a faster drop-in replacement for SFTP? (Why not?)

mcharawi (OP) · 4y ago

Standard SSH uses TCP over port 22 by default, so it wouldn't be possible without modifying SSH to use a different protocol. That said, our protocol uses TLS over UDP via the OpenSSL libraries, so it is secure by default. We also offer a BSD-style socket interface that you can use if you want a drop-in replacement for TCP sockets. Shoot me a note at mahamad _at_ trytachyon _dot_ com if you want to chat!

tener · 4y ago

Can you share some actual performance numbers across whatever are the key metrics that you observe?

rsync · 4y ago

Is this software that one licenses and uses on any arbitrary network, or do you run a network of some kind that users pay to access?

Or both?

I think this is a software package but the tl;dr doesn’t make that clear to me…

mcharawi (OP) · 4y ago

At the moment we offer both options. We offer our own network with a pricing plan similar to massive.io (though at 10c per GB vs. 25c). Our licensing is cheaper but requires large volumes.

fn-mote · 4y ago · 5 in thread

This article was interesting and also frustrating to read.

1. There are very few numbers. In particular, improvement in performance under various circumstances is _not_ given! If you dig around you can find their transfer time application [1], but there is no discussion on that page.

2. The basis for the improvement is not spelled out. (References are given, but you have to know the field - "acronyms only".) If I understand correctly, their contribution is the improved measures of congestion used. Their landing page just touts "don't use TCP"... which sounds like Step 0 of a very long process.

I admit, the title is basically accurate: "how to build" not "the performance of".

tl;dr: Start with existing work, simulate and improve incrementally.

I don't know anything about the field, but this article didn't lead me to understand any better. I'd love to know the real numbers they observed, which approaches didn't pan out, and whether they are effectively using an error correcting code.

Anyway, it's certainly not an academic paper - just an advertisement.

[1] https://www.trytachyon.com/file-transfer-calculator

I just tried the calculator. It seems that if you're in "US Metro" or "Europe", then the transfer protocol is just as fast as TCP, is this correct? I wonder why this is the case. Is it because the routers play more fairly?

I would expect it means your service provider isn't dropping packets. Their protocol seems to just be more aggressive about not backing off in the face of packet loss, which is helpful if one of your links is a marginal radio connection.

The cynic in me thinks they achieve better throughput because they don't play nice with TCP and monopolize the link while everybody else gets backed off.

mcharawi (OP) · 4y ago

Thanks for taking the time to read it! To address your concerns:

1. To give you an idea of the speed improvements, we transferred a 2GB file between Ohio and Singapore on AWS and were able to transfer it in 0:26 (seconds) using our protocol, vs 2:15 for SCP.

2. The basis for improvement is taking into account the changes in round-trip-time for a particular network path; these temporary increases are used as the primary congestion signal.

We are not using error correcting codes, which are good for preventing the retransmission of packets but do not address the underlying problem of avoiding congestion in a network.
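For scale, the two timings in point 1 can be turned into average throughput. This is only a back-of-the-envelope sketch: it assumes "2GB" means 2 × 10^9 bytes and reads 0:26 and 2:15 as 26 and 135 seconds. The function name is made up for illustration.

```python
def throughput_mbps(size_bytes: float, seconds: float) -> float:
    """Average throughput in megabits per second."""
    return size_bytes * 8 / seconds / 1e6


SIZE_BYTES = 2e9             # assumption: 2 GB = 2 * 10**9 bytes
TACHYON_SECONDS = 26         # "0:26" from the comment
SCP_SECONDS = 2 * 60 + 15    # "2:15" from the comment

tachyon = throughput_mbps(SIZE_BYTES, TACHYON_SECONDS)
scp = throughput_mbps(SIZE_BYTES, SCP_SECONDS)
speedup = SCP_SECONDS / TACHYON_SECONDS

print(f"Tachyon: {tachyon:.0f} Mbit/s, SCP: {scp:.0f} Mbit/s, speedup: {speedup:.1f}x")
```

Under those assumptions this works out to roughly 615 Mbit/s versus 119 Mbit/s, about a 5.2x difference on that one path.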

koprulusector · 4y ago

Can I ask a dumb question? Why SCP and not rsync?

Straw · 4y ago

How much was SCP affected by TCP buffer size tuning?

AitchEmArsey · 4y ago · 3 in thread

Interesting, but somewhat misses the point; the reason people want an alternative to Aspera is that no-one wants to pay for file transfer tools.

mcharawi (OP) · 4y ago

Thanks for the feedback. We're actually planning to open-source a version of our work that significantly improves on the original UDT project: https://udt.sourceforge.io/

AitchEmArsey · 4y ago

Look forward to it. I'd be interested to hear how your tool compares with Facebook WDT[1], as that would be my go-to right now if someone asked me for a fast point-to-point data transfer solution.

[1] https://github.com/facebook/wdt

rurban · 4y ago

Did you describe your UDT improvements somewhere? Why not push them upstream?

dochtman · 4y ago · 3 in thread

Does UDT come with encryption? If so, how does it compare to QUIC?

mcharawi (OP) · 4y ago

The canonical UDT implementation does not come with encryption; however, there are some older open-source GitHub repos that have attempted to add TLS to UDT. The original author of UDT, Yunhong Gu, has a project called Sector/Sphere that adds some application-level encryption to file transfer if you want to check it out: http://sector.sourceforge.net/. We've added encryption for our algorithm, though!

With respect to QUIC, I believe it was designed specifically to reduce the latency of HTTP connections by using multiple UDP flows and building the reliability/ordering guarantees at the application layer.

The problem with getting performance increases out of multiple, distinct traffic flows is that you become more and more unfair to other packet traffic as you increase the number of flows you are using. For example, if you use 9 TCP (or any other AIMD) flows to send a file over some link, and a tenth connection is started, you now are taking up to 90% of the available bandwidth (because AIMD flows are designed to be fair amongst themselves).
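The 90% figure follows from AIMD flows converging toward equal shares of a bottleneck. Here is a toy simulation of that effect. It assumes equal round-trip times, synchronized loss (every flow backs off whenever the link is over capacity), unit additive increase, and halving on loss. The rates and capacity are arbitrary illustrative values, not measurements.

```python
def simulate_aimd(initial_rates, capacity, steps, increase=1.0, decrease=0.5):
    """Run a toy AIMD loop and return the final per-flow rates."""
    rates = list(initial_rates)
    for _ in range(steps):
        # Additive increase: every flow probes for more bandwidth.
        rates = [r + increase for r in rates]
        # Multiplicative decrease: if the link is overloaded, all flows back off.
        if sum(rates) > capacity:
            rates = [r * decrease for r in rates]
    return rates


# Nine flows belonging to one application, plus one newcomer starting from zero.
rates = simulate_aimd([10.0] * 9 + [0.0], capacity=100.0, steps=2000)
app_share = sum(rates[:9]) / sum(rates)

print(f"per-flow rates: {[round(r, 2) for r in rates]}")
print(f"share held by the 9-flow application: {app_share:.1%}")
```

Each backoff halves the gap between any two flows, so all ten end up with equal rates and the nine-flow application holds about 90% of the link.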

klabb3 · 4y ago

> The problem with getting performance increases out of multiple, distinct traffic flows

Is this in response to QUIC or just multiple TCP streams? AFAIK QUIC multiplexes everything over a single connection on a single port.

As for fairness, the reality (aiui) is that neither TCP nor UDP is guaranteed more bandwidth. It's all up to middleboxes, and I think there are quite a few that assume UDP is "less important"/"can deal better with degraded performance" (e.g. video conferencing, live streams) and instead prefer TCP, in which case there's no recourse. Did you ever observe any conditions like this?

> AIMD

Additive Increase Multiplicative Decrease (for others wondering)

> a feedback control algorithm best known for its use in TCP congestion control. AIMD combines linear growth of the congestion window when there is no congestion with an exponential reduction when congestion is detected.

-- https://en.wikipedia.org/wiki/Additive_increase/multiplicati...

ncmncm · 4y ago · 2 in thread

The article has essentially no technical information on how the protocol works, besides the unlabeled flow rate graphs which show TCP rate popping up and down and theirs more or less constant. Their rate appears much less stable than FASP's. OTOH, FASP is extremely expensive.

I have updated the Wikipedia page on the FASP protocol they compete with, to provide more detail on how it works. Theirs might work similarly.

Curiously, FASP usage only really took off after the product got a comprehensive GUI file management app, with scheduling and a speed control you could drag up and down. Being able to transfer files overseas several times as fast as the competition was not enough.

FASP was on HN a couple of years ago, https://news.ycombinator.com/item?id=21898072

mcharawi (OP) · 4y ago

Hey, thanks for your comment. You're right that there aren't too many details on the protocol changes we made, in part because we will follow up on that in another post. This blog post was getting too long as it is, and we wanted to focus more on the need to simulate and test. Your comments on the Hacker News post you link to actually partially served to inspire us!

ncmncm · 4y ago

I am glad to have somebody acting on this stuff.

The original design for FASP's flow control was developed in China and applied in a router: you would have one at each end, and it would spoof TCP for clients. That was a huge flop, because it didn't say Cisco on the nameplate.

The Aspera principals realized they could implement it in user space using UDP, bypassing the network infrastructure purchasing cabal, and sell directly to the users who had data to move.

I always wanted to get the astronomy community using it (e.g. to Antarctica and Atacama, Chile) under a free license, but never quite got there.

rsync · 4y ago · 2 in thread

I actually read the entire article and was specifically looking for a reference to hpn-ssh, which I think is the most standard way to approach this… Can OP comment here on that tooling and how it compares and contrasts?

mcharawi (OP) · 4y ago

Thanks for reading!

I haven't seen hpn-ssh before, but from a cursory look at the project page it looks like the main improvements are targeted at improving the speed of the encryption using multi-threading, and increasing ssh/scp buffer sizes. These are certainly good improvements over standard ssh/scp (and setting TCP buffers to the value of the bandwidth delay product for a particular network path is a well known way to squeeze some perf out of TCP) but do not address the root cause of slowdown in window-based, loss-based congestion control.

In order to be fair to other flows, exponential back-off is required on detection of congestion, and packet loss as an indicator of congestion is both a lagging indicator of congestion and has a very low signal to noise ratio on high throughput, lossy networks.
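The buffer-sizing point can be made concrete with the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a path full. A small sketch follows; the 1 Gbit/s link, 200 ms round-trip time, and 64 KiB window are hypothetical values chosen for illustration, not figures from the article.

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to fill the path (bandwidth * RTT)."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def window_limited_bps(window_bytes: float, rtt_ms: float) -> float:
    """Best-case throughput when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


LINK_BPS = 1_000_000_000   # hypothetical 1 Gbit/s link
RTT_MS = 200               # hypothetical intercontinental round-trip time
SMALL_WINDOW = 64 * 1024   # hypothetical untuned 64 KiB window

print(f"BDP: {bdp_bytes(LINK_BPS, RTT_MS) / 1e6:.0f} MB")
print(f"64 KiB window cap: {window_limited_bps(SMALL_WINDOW, RTT_MS) / 1e6:.2f} Mbit/s")
```

Under these assumed numbers the path holds 25 MB in flight, while a 64 KiB window caps a single flow at roughly 2.6 Mbit/s regardless of link speed. Raising buffers to the BDP removes that cap but does not change how the flow reacts to loss.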

KennyBlanken · 4y ago

hpn-ssh is specifically designed for high latency, high bandwidth file transfer and is more than just "big buffers and multi-threaded." And the question remains: how does your solution compare in simulated and real-world testing?

It's a little strange that you "conducted an extensive literature review" of congestion algorithms but you aren't aware of basic common tools like hpn-ssh, wget2's multithreading mode, or GridFTP which is used extensively in particle physics and genetics research communities.

kkfx · 4y ago · 2 in thread

IMVHO the main issue in file transfer today is that in 2022 most people still do not have a public IP (like a global IPv6 one), so most people still have NAT traversal issues and need to rely on third parties or not-so-performant, more or less distributed networks...

The second main issue is that most do not own a personal domain name with a subdomain per personal host (like {desktop,craphone,laptop}.mydomain.tld etc).

Those two issues are so big IMVHO that they push all others aside...

Yep. Most people do not have an internet connection. They only have a web+ connection from a mobile telco further restricted by not having control of their hardware. So, if we care about the internet we can ignore them and just develop as we always have for actual computers on the internet.

But if you're only profit motivated then this isn't reasonable and you should only target smartphones without internet access and gimp your software to make it possible to run on such limited platforms.

kkfx · 4y ago

That's why some use FLOSS and others have invented the concept of public research, and that's why they are fought, parasitized, hindered, denatured from the inside.

The trick is to know how much "little" critical mass is needed to succeed...

kevinherron · 4y ago · 2 in thread

> If your internet connection is 1Gps and you are transferring a 10Gb file, it should theoretically take 10 seconds to transfer

Err, what? I don't know what this "Gps" unit is, but if it's 1Gbps (gigabit per second) and a 10GB (gigabyte) file, that's not how it works... it would be 80 seconds.

It should be OK. It says "10Gb", not "10GB", i.e. it is a 10 gigabit file. (While it is untraditional to measure file size in bits, it should be perfectly fine.)

mcharawi (OP) · 4y ago

Sorry about the typo; you are right, it should be Gbps. As for the transfer time, we are just using bits for the file size to make the mental math easier.
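Both readings in this sub-thread are easy to check. The sketch below assumes decimal prefixes (1 G = 10^9) and an ideal link with no protocol overhead; the helper name is illustrative.

```python
GIGA = 10**9  # assumption: decimal prefixes


def transfer_seconds(size_bits: int, link_bps: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return size_bits / link_bps


link = 1 * GIGA                  # 1 Gbps link
ten_gigabits = 10 * GIGA         # "10Gb" as written in the article
ten_gigabytes = 10 * GIGA * 8    # "10GB" as the commenter read it

print(transfer_seconds(ten_gigabits, link))   # 10 Gb over 1 Gbps
print(transfer_seconds(ten_gigabytes, link))  # 10 GB over 1 Gbps
```

The first case gives 10 seconds and the second 80 seconds, so the article's figure holds as long as the file size really is stated in bits.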

metadat · 4y ago · 1 in thread

> It took us a little while to build UDT..

> Building this infrastructure took a substantial amount of time..

> If anyone is interested in trying the Tachyon Transfer Algorithm we offer a storage transfer acceleration API like AWS does. Our SDK includes node, c++ and objc and could be used in a wide variety of applications

So it was a lot of effort, and now they're inviting Big-G and Cloudflare to contact them to possibly achieve a paltry 30%-ish speed increase for certain scenarios? Or are they inviting app devs who want faster video uploads to reach out? What is the actual use case where the sometimes 30% improvement matters and actually moves the needle?!

Why hasn't Tachyon been working with their prospective customers and warming them up the whole time, or at least working the social and investor nets and reaching out proactively already?

This strategy is kind of like being a dweeb at a poorly lit school dance and hoping the most popular girl at the dance somehow notices you're wearing shoes that let you float a centimeter in the air. Cool trick, bud.

Presumably it's not a $10/mo service contract. Is this really an effective strategy when building and selling to enterprise these days? To me it sounds like a risky and hard way to make less money than what is possible using tried and true product development strategies. To be fair, I have also made this mistake before. It was embarrassing enough as a solo-founder, and seems less forgivable with larger founding group sizes, because it means more folks agreed to support and follow such a sub-optimal harebrained scheme :)

You all sound like very capable software engineers, and I know it's both fun and satisfying to build and make The Thing.

Good luck, sincerely.

p.s. You may also consider pursuing some of the medium-sized targets like Backblaze, Rackspace, Larry Ellison's Oracle OCI, or Microshaft Azure.

(sorry, I couldn't resist having some fun at the end, though the suggestion is serious!)

hansel_der · 4y ago

sarcasm aside, FANGs have had something similar for some time now.

this is aimed at consumers (scientists)

amaccuish · 4y ago · 1 in thread

Never understood, once SMB gets going it's pretty fast, but it takes agessss to list a directory. Like why can't it just pipe the output of dir() or ls (when samba) out over the network.

hansel_der · 4y ago

comparing a ford-fiesta with a hypersonic rocket-car; totally different applications.

parallelism, integrity/reliability and efficiency get somewhat more important once you regularly have to shuffle petabytes around the globe.

ac130kz · 4y ago · 1 in thread

Some basic transfer based on UDP with forward error correction is a really good solution to tackle packet loss and avoid TCP congestion entirely.
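The simplest form of forward error correction is one XOR parity packet per group of equal-sized packets, which lets the receiver rebuild any single lost packet without a retransmission. A minimal sketch of that idea follows. The function names are illustrative, and real deployments typically use stronger codes such as Reed-Solomon or fountain codes that tolerate more than one loss per group.

```python
def xor_parity(packets):
    """XOR equal-length packets together byte by byte."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) from the rest plus parity."""
    missing_index = received.index(None)
    present = [p for p in received if p is not None]
    return missing_index, xor_parity(present + [parity])


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)            # sent as a fourth packet

received = [b"abcd", None, b"ijkl"]   # the second packet was lost in transit
index, rebuilt = recover_missing(received, parity)
print(index, rebuilt)
```

Note that the sender's rate is unchanged here: the parity packet repairs a loss but adds traffic of its own, so it does nothing to relieve a congested link.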

mcharawi (OP) · 4y ago

So congestion and packet loss are different problems; it is true that forward error correction could be a good way to avoid retransmitting lost packets, but the only way to avoid congestion is to adjust the congestion window (for window based congestion control) or packet sending rate (for rate based congestion control) based on some indicator of congestion.
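To illustrate what "adjust the sending rate based on some indicator of congestion" can look like, here is a generic rate-based controller that treats a rise in round-trip time above the lowest RTT seen so far as its congestion signal. This is a sketch of the general technique only, not Tachyon's algorithm. The tolerance, step, and backoff constants are arbitrary assumptions.

```python
def next_rate(rate_mbps, min_rtt_ms, rtt_ms,
              tolerance=1.25, step_mbps=5.0, backoff=0.85):
    """Back off when RTT rises well above the path minimum, otherwise probe upward."""
    if rtt_ms > min_rtt_ms * tolerance:
        return rate_mbps * backoff   # queues are building: slow down
    return rate_mbps + step_mbps     # path looks idle: speed up


def run(rtt_samples_ms, start_rate_mbps=100.0):
    """Feed RTT samples through the controller and return the rate after each one."""
    rate = start_rate_mbps
    min_rtt = float("inf")
    history = []
    for rtt in rtt_samples_ms:
        min_rtt = min(min_rtt, rtt)  # running estimate of the uncongested RTT
        rate = next_rate(rate, min_rtt, rtt)
        history.append(rate)
    return history


print(run([200, 200, 300, 210]))
```

The third sample (300 ms against a 200 ms minimum) triggers a rate cut before any packet has to be dropped, which is the practical difference from a purely loss-based signal.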

Scaevolus · 4y ago

Always good to see more in this space! Long fat networks (LFNs or "elephants") are everywhere, especially once you start moving data between continents.

I've had success personally with UFTP, but you explicitly set the transmit rate. Don't forget to enable encryption/authentication if you want the downloads to be verified! You'll get silent UDP corruption otherwise: http://uftp-multicast.sourceforge.net/

JZL003 · 4y ago

I tried a couple of different programs for personal use and udt/wft wouldn't compile for me even after 2 hours+ messing with it

CERN's fdt https://github.com/fast-data-transfer/fdt was way better. I did use it over ssh port forwarding which artificially slowed things down, but it saturated my uplink to 300-400 MB/s

mypalmike · 4y ago

I worked at a tier 2 ISP about 15 years ago that developed multiple products trying to sell accelerated transfers as a service. They worked similarly to what this article describes. The problem was that there were very few buyers. It's easier to sell transparent acceleration boxes as an appliance, and even then it's very niche.

bradknowles · 4y ago

You need to show detailed benchmark examples with other protocols, including s3.

Otherwise, it's empty paper-ware.

charcircuit · 4y ago

Where is the download for the SDK?
