Infinite Git repos on Cloudflare workers (opens in new tab)

(gitlip.com)

144 pointsplesiv1y ago90 comments

90 comments

> We’re building Gitlip - the collaborative devtool for the AI era. An all-in-one combination of Git-powered version control, collaborative coding and 1-click deployments.

Did they get a waiver from the git team to name it as such?

Per the trademark policy, new “git${SUFFIX}” names aren’t allowed: https://git-scm.com/about/trademark

>> In addition, you may not use any of the Marks as a syllable in a new word or as part of a portmanteau (e.g., "Gitalicious", "Gitpedia") used as a mark for a third-party product or service without Conservancy's written permission. For the avoidance of doubt, this provision applies even to third-party marks that use the Marks as a syllable or as part of a portmanteau to refer to a product or service's use of Git code.

WorkerBee284741y ago

You don't need their permission to make a portmanteau, all you need is to follow trademark law (which may or may not allow it). The policy page can go kick sand.

saurik1y ago

While true, using someone else's trademark as a prefix of your name when you are actively intending it to reference the protected use seems egregious.

1 more reply

fumplethumb1y ago

What about… GitHub, Gitlab, Gitkraken, GirButler (featured on HN recently)? The list goes on forever!

afiori1y ago

Supposedly they got written permission

1 more reply

plesivOP1y ago

OP here. Oops, thank you for pointing that out! We weren’t aware of it. We will investigate ASAP. In the worst case, we’ll change our name.

benatkin1y ago

Doesn't sound worse case to me. It could use a better name anyway.

1 more reply

rzzzt1y ago

What about an old word? Agitator, legitimate, cogitate?

singron1y ago

Is the syllable still the Mark if you pronounce it with a soft G instead of a hard G?

ecshafer1y ago

Github doesn't stop me from making an infinite number of git repos. Or maybe they do, but I have never hit the limit. And if I am hitting that limit, and become a large enterprise customer, I am sure they would work with me on getting around that limit.

Where does this fit into a product? Maybe I am blind, but while this is cool, I don't really see where I would want this.

aftbit1y ago

Github would definitely reach out if you tried to make 100k+ Github repos. We once automatically opened issues in response to exceptions (sort of a ghetto Bugsnag / Sentry) and received a nice email from an engineer asking us if we really needed to do that when we hit around the 200k mark.

no_wizard1y ago

Oh here’s an interesting idea.

What if these bug reporting platforms could create a branch and tag it for each issue.

This would be particularly useful for point and time things where you have an immutable deployment branch. So it could create a branch off that immutable deployment branch and tag it, so you always have a point in time code reference for bugs.

Would that be useful? I feel like what you’re doing here isn’t that different if I get what’s going on (basically creating one repository per bug?)

2 more replies

foota1y ago

In some ways, you could imagine repos might be more scalable than issues within a repo, since you could reasonably assume a bound on the number of issues in a single repo.

plesivOP1y ago

OP here. We’re building a new kind of Git platform. "Infinity" is more beneficial for us as platform builders (simplifying infrastructure) but less relevant to our customers as users.

creatonez1y ago

Read the article. It's not literally a foray into creating endless git repositories or anything fancy like highly strategic git-specific data compression, they are just using "infinite" as a buzzword for 'highly horizontally scalable. The product is something like Github.

shivasaxena1y ago

Imagine every notion docs or every airtable base being a a git repo. Imagine the PR workflow that we developers love being available to everyone.

yjftsjthsd-h1y ago

> It allows us to easily host an infinite number of repositories

I like this system in general, but I don't understand why scaling the number of repos is treated as a pinch point? Are there git hosts that struggle with the number of repos hosted in particular? (I don't think the "Motivation" section answers this, either.)

icambron1y ago

Seems like it enables you do things like use git repos as per-customer or per-some-business-object storage, which you otherwise wouldn't even consider. Like imagine you were setting up a blogging site where each blog was backed by a repo

beachy1y ago

Or perhaps a SaaS product where individual customers had their own fork of the code.

There are many reasons not to do this, perhaps this scratches away at one of them.

plesivOP1y ago

OP here.

It’s unlikely any Git providers struggle with the number of repos they're hosting, but most are larger companies.

Currently, we're a bootstrapped team of 2. I think our approach changes the kind of product we can build as a small team.

rad_gruchalski1y ago

How? What makes it so much more powerful than gitea hosted on a cheap vps with some backup in s3?

Unless, of course, your product is infinite git repos with cf workers.

1 more reply

bhl1y ago

Serverless git repos would be useful if you wanted to make a product like real-time collaboration + offline support code editing in the browser.

You can still sync to a platform like GitHub or BitBucket after all users close their tabs.

A long time ago, I looked into using isomorphic-git with lightning-fs to build light note-taking app in the browser: pull your markdown files in, edit them in a rich-text-editor a la Notion, stage and then commit changes back using git.

aphantastic1y ago

That’s essentially what github.dev and vscode.dev do FWIW.

jauntywundrkind1y ago

> After extensive research, we rewrote significant parts of Emscripten to support asynchronous file system calls.

> We ended up creating our own Emscripten filesystem on top of Durable Objects, which we call DOFS.

> We abandoned the porting efforts and ended up implementing the missing Git server functionality ourselves by leveraging libgit2’s core functionality, studying all available documentation, and painstakingly investigating Git’s behavior.

Using a ton of great open source & taking it all further. Would sure be great if ya'll could contribute some of this forward!

Libgit2 is GPL with Linking Exception, and Emscripten MIT so I think legally everything is in the clear. But it sure would be such a boon to share.

plesivOP1y ago

Definitely! We're focused on launching right now, but once we have more bandwidth, we'd be happy to do it.

I believe our changes are solid, but they’re tailored specifically to our use case and can’t be merged as-is. For example, our modifications to libgit2 would need at least as much additional code to make them toggleable in the build process, which requires extra effort.

abstractbeliefs1y ago

No free software no support. You don't have to merge it upstream right away, but publish it for others to study and use as permitted by the license.

sluongng1y ago

@plesiv could you please elaborate on how repack/gc is handled with a libgit2 backend? I know that Alibaba has done something similar in the past based on libgit2, but I have yet to see another implementation in the wild like this.

Very cool project. I hope Cloudflare workers can support more protocols like SSH and GRPC. It's one of the reasons why I prefer Fly.io over Cloudflare worker for special servers like this.

plesivOP1y ago

Great question! By default, with libgit2 each write to a repo (e.g. push) will create a new pack file. We have written a simple packing algorithm that runs after each write. It works like this:

Choose these values:

* P, pack "Planck" size, e.g. 100kB

* N, branching factor, e.g. 8

After each write:

1. iterate over each pack (pack size is S) and assign each pack a class C which is the smallest integer that satisfies P * N^C > S

2. iterate variable c from 0 to the maximum value of C that you got in step 2

* if there are N packs of class c, repack them into a new pack, new pack is going to be at most of class c+1

betaby1y ago

Somewhat related question. Assume I have ~1k ~200MB XML files that get ~20% of their content changed. What are my best option to store them? While using vanilla git on a SSD raid10 works, that's quite slow in retrieving historical data dating back ~3-6 months. Are there other options for a quickie back-end? I'm fine with it being not that storage efficient to a degree.

adobrawy1y ago

I don't know what your "best" criterion is (implementation costs, implementation time, maintainability, performance, compression ratio, etc.). Still, the easiest way to start is to delegate it to the file system, so zfs + compression. Access time should be decent. No application-level changes are required to enable that.

betaby1y ago

It is already on ZFS with compression.

a_e_k1y ago

Crazy thought: something like BorgBackup.

If only 20% of the content gets changed, the rolling hash that Borg does to chunk files could identify the 80% common parts and then with its deduplication it would store just a single compressed copy of those chunks. And as a bonus, it's designed for handling historical data.

nomel1y ago

If you can share, but I'd be curious to know what that large of an XML file might be used for, and what benefits it might have over other formats. My persona and professional use of XML has been pretty limited, but XSD was super powerful, and the reason we choose it when we did.

betaby1y ago

Juniper routers configs, something like below.

adamc@router> show arp | display xml <rpc-reply xmlns:JUNOS="http://xml.juniper.net/JUNOS/15.1F6/JUNOS"> <arp-table-information xmlns="http://xml.juniper.net/JUNOS/15.1F6/JUNOS-arp" JUNOS:style="normal"> <arp-table-entry> <mac-address>0a:00:27:00:00:00</mac-address> <ip-address>10.0.201.1</ip-address> <hostname>adamc-mac</hostname> <interface-name>em0.0</interface-name> <arp-table-entry-flags> <none/> </arp-table-entry-flags> </arp-table-entry> </arp-table-information> <cli> <banner></banner> </cli> </rpc-reply>

hobs1y ago

it's a good question because my answer for a system like this which had very little schema changing was just dump it into a database and add historical tracking per object that way, hash, compare, insert and add historical record.

1 more reply

o11c1y ago

I'm not sure if this quite fits your workload, but a lot of times people use `git` when `casync` would be more appropriate.

hokkos1y ago

You can compress in EXI, it's a format for XML and if it is informed by the schema can give a big boost in compression.

tln1y ago

> get ~20% of their content changed

...daily? monthly? how many versions do you have to keep around?

I'd look at a simple zstd dictionary based scheme, first. Put your history/metadata into a database. Put the XML data into file system/S3/BackBlaze/B2, zstd compressed against a dictionary.

Create the dictionary : zstd --train PathToTrainingSet/* -o dictionaryName Compress with the dictionary: zstd FILE -D dictionaryName Decompress with the dictionary: zstd --decompress FILE.zst -D dictionaryName

Although you say you're fine with it being not that storage efficient to a degree, I think if you were OK with storing every version of every XML file, uncompressed, you wouldn't have to ask right?

betaby1y ago

If one stores a whole versions of the files that defeats the idea of git, and would consume too much space. I suppose I don't even need zstd if I have ZFS with compression, although compression levels won't be as good.

1 more reply

skybrian1y ago

Not having a technical limit is nice, because then it’s a matter of spending money. But whenever I see “infinite,” I ask what it will cost. How expensive is it to host git repos this way?

As a hobbyist, “free” is pretty appealing. I’m pretty sure my repos on GitHub won’t cost me anything, and that’s unlikely to change anytime soon. Not sure about the new stuff.

jsheard1y ago

With CloudFlare at least when you overstay your welcome on the free plan they just start nagging you to start paying, and possibly kick you out if you don't, rather than sending you a surprise bill for $10,000 like AWS or Azure or GCP might do.

VoidWhisperer1y ago

Not the main purpose of the article but they mention they were working on a notetaking app oriented towards developers - did anything ever come of that? If not, does anyone know products that might fit this niche? (I currently use obsidian)

plesivOP1y ago

OP here. Not yet - it's about 50% complete. I plan to open-source it in the future.

nbbaier1y ago

Definitely interested in seeing this as well. What are the key features?

tln1y ago

Congrats, you've done a lot of interesting work to get here.

This could be a fantastic building block for headless CMS and the like.

plesivOP1y ago

OP here. Thank you and good catch! :-) We have a blog post planned on that topic.

seanvelasco1y ago

this leverages Durable Objects, but as i remember from two years ago, DO's way of guaranteeing uniqueness is that there can only be once instance of that DO in the world.

what if there are two users who wants to access the same DO repo at the same time, one in the US and the other in Singapore? the DO must live either in US servers or SG servers, but not at the same time. so one of the two users must have high latency then?

then after some time, a user in Australia accesses this DO repo - the DO bounces to AU servers - US and SG users will have high latency?

but please correct me if i'm wrong

skybrian1y ago

Last I heard, durable objects don’t move while running. It doesn’t seem worse than hosting in US-East, though.

tredre31y ago

> Wanting to avoid managing the servers ourselves, we experimented with a serverless approach.

I must be getting old but building a gigantic house of card of interlinked components only to arrive to a more limited solution is truly bizarre to me.

The maintenance burden for a VPS: periodically run apt update upgrade. Use filesystem snapshots to create periodic backups. If something happens to your provider, spin up a new VM elsewhere with your last snapshot.

The maintenance burden for your solution: Periodically merge upstream libgit2 in your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with emscripten, deploy it. Rotate API keys to make sure your database service can talk to your storage service and your worker service. Then I don't even know how you'd backup all this to get it back online quickly if something happened to cloudflare. And all that only to end up with worse latency than a VPS, and more size constraints on the repo and objects.

But hey, at least it scales infinitely!

notamy1y ago

> The maintenance burden for a VPS: periodically run apt update upgrade. Use filesystem snapshots to create periodic backups. If something happens to your provider, spin up a new VM elsewhere with your last snapshot.

And make sure it reboots for kernel upgrades (or set up live-patching), and make sure that service updates don't go wrong[0], and make sure that your backups work consistently, and make sure that you're able to vertically or horizontally scale, and make sure it's all automated and repeatable, and make sure the automation is following best-practices, and make sure you're not accidentally configuring any services to be vulnerable[1], and ...

Making this stuff be someone else's problem by using managed services is a lot easier, especially with a smaller team, because then you can focus on what you're building and not making sure your SPOF VPS is still running correctly.

[0] I self-host some stuff for a side-project right now, and packages updates are miserable because they're not simply `apt-get update && apt-get upgrade`. Instead, the documented upgrade process for some services is more/less "dump the entire DB, stop the service, rm -rf the old DB, upgrade the service package, start the service, load the dump in, hope it works."

[1] Because it's so easy to configure something to be vulnerable because it makes it easier, even if the vulnerability was unintentional.

kentonv1y ago

> Periodically merge upstream libgit2 in your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with emscripten, deploy it.

There's only a difference here because there exist off-the-shelf git packages for traditional VPS environments but there do not yet exist off-the-shelf git packages for serverless stacks. The OP is a pioneer here. The work they are doing is what will eventually make this an off-the-shelf thing for everyone else.

> Rotate API keys to make sure your database service can talk to your storage service and your worker service.

Huh? With Durable Objects the storage is local to each object. There is no API key involved in accessing it.

> Then I don't even know how you'd backup all this

Durable Object storage (under the new beta storage engine) automatically gives you point-in-time recovery to any point in time in the last 30 days.

https://developers.cloudflare.com/durable-objects/api/storag...

> And all that only to end up with worse latency than a VPS

Why would it be worse? It should be better, because Cloudflare can locate each DO (git repo) close to whoever is accessing it, whereas your VPS is going to sit in one single central location that's probably further away.

> and more size constraints on the repo and objects.

While each individual repo may be more constrained, this solution can scale to far more total repos than a single-server VPS could.

(I'm the tech lead for Cloudflare Workers.)

Spunkie1y ago

I've been wondering what to do to backup our github repos other than keeping a local copy and/or dumping them on something like S3.

I would love to use this to serve as a live/working automatic backup for my github repos on CF infrastructure.

yellow_lead1y ago

The latency on the examples seems quite slow, around 7 seconds to a full load for me.

https://gitlip.com/@nataliemarleny/test-repo

plesivOP1y ago

OP here. That’s expected for now, and we’re working on a solution. We didn’t explain the reason in the post because we plan to cover it in a separate write-up.

yellow_lead1y ago

I see you haven't launched yet so that's fair. Looking forward to trying it

ericyd1y ago

Engaging read! For me, just the right balance of technical detail and narrative content. It's a hard balance to strike and I'm sure preferences vary widely which makes it an impossible target for every audience.

csomar1y ago

This picked my interest as I am working on a Git product and using Cloudflare Workers for most of my back-end. I looked through the options, but the hard limit for Cloudflare workers and the fact that most interesting repos (that is companies you want to sell to) have repos in the Gbs means the platform is not fit for this.

I am ending up with AWS lambdas. Not only that solves the Wasm issue but you can have up to 10Gb of memory on a single instance. That is close to enough for most use cases. 100Mb? Not really.

gkoberger1y ago

This is really cool! I've been building something on libgit2 + EFS, and this approach is really interesting.

Between libgit2 on emscripten, the number of file writes to DO, etc, how is performance?

markphip1y ago

I wonder if they considered or looked at using JGit? https://github.com/eclipse-jgit/jgit

It provides client and server API. The latter is used by Gerrit for its server. https://www.gerritcodereview.com

Not sure what the Java to WASM story is if that is a requirement for what they need.

stavros1y ago

This is a very impressive technical achievement, and it's clear that a lot of work went into it.

Unfortunately, the entrepreneur in me continues that thought with "work that could have gone into finding customers instead". Now you have a system that could store "infinite" git repos, but how many customers?

deadbunny1y ago

TFA is literally marketing for them. HN is their target audience and a good way to capture that audience is to show them something technically interesting.

akerl_1y ago

I agree. That’s a really unfortunate way to view somebody’s project.

stavros1y ago

Is it a project, or is it a company?

1 more reply

scosman1y ago

Serverless git repos: super cool

But I can't figure out what makes this an AI company. Seems like a collaboration tool?

plesivOP1y ago

OP here. We're not an AI company; we're aiming to be AI-adjacent and simplify the practical application of AI models.

ijamj1y ago

Honest question: how is this "AI-adjacent"? How does it specifically "simplify the practical application of AI models"? Focus of the question being on "AI"...

iampims1y ago

Some serious engineering here. Kudos!

nathants1y ago

this is very cool!

i prototyped a similar serverless git product recently using a different technique.

i used aws lambda holding leases in dynamo backed by s3. i zipped git binaries into the lambda and invoked them directly. i used shallow clone style repos stored in chunks in s3, that could be merged as needed in lambda /tmp.

lambda was nice because for cpu heavy ops like merging many shallow clones, i could do that in a larger cpu lambda, and cache the result.

other constraints were similar to what is described here. mainly that an individual push/pull cannot exceed the api gateway max payload size, a few MB.

i looked at isomorphic, but did not try emscripten libgit2. using cloudflare is nice because of free egress, which opens up many new use cases that don’t make sense on $0.10/GB egress.

i ended up shelving this while i build a different product. glad to see others pursuing the same thing, serverless git is an obvious win! do you back your repos with r2?

for my own git usage, what i ended up building was a trustless git system backed by dynamo and s3 directly. this removes the push/pull size limit, and makes storage trustless. this uses git functionality i had no idea about prior, git bundle and unbundle[1]. they are used for transfer of git objects without a server, serverless git! this one i published[2].

good luck with your raise and your project. looking forward to the next blog. awesome stuff.

1. https://git-scm.com/docs/git-bundle

2. https://github.com/nathants/git-remote-aws

gavindean901y ago

I really like the idea if file system over durable objects

eastdakota1y ago

I do too.

bagels1y ago

Infinite sounds like a bug happened. It's obviously not infinite, some resource will eventually be exhausted, in this case, memory.

j / k navigate · click thread line to collapse

90 comments

koolba1y ago

> We’re building Gitlip - the collaborative devtool for the AI era. An all-in-one combination of Git-powered version control, collaborative coding and 1-click deployments.

Did they get a waiver from the git team to name it as such?

Per the trademark policy, new “git${SUFFIX}” names aren’t allowed: https://git-scm.com/about/trademark

WorkerBee284741y ago

You don't need their permission to make a portmanteau, all you need is to follow trademark law (which may or may not allow it). The policy page can go kick sand.

saurik1y ago

While true, using someone else's trademark as a prefix of your name when you are actively intending it to reference the protected use seems egregious.

1 more reply

fumplethumb1y ago

What about… GitHub, Gitlab, Gitkraken, GirButler (featured on HN recently)? The list goes on forever!

afiori1y ago

Supposedly they got written permission

1 more reply

plesivOP1y ago

OP here. Oops, thank you for pointing that out! We weren’t aware of it. We will investigate ASAP. In the worst case, we’ll change our name.

benatkin1y ago

Doesn't sound worse case to me. It could use a better name anyway.

1 more reply

rzzzt1y ago

What about an old word? Agitator, legitimate, cogitate?

singron1y ago

Is the syllable still the Mark if you pronounce it with a soft G instead of a hard G?

ecshafer1y ago

Where does this fit into a product? Maybe I am blind, but while this is cool, I don't really see where I would want this.

aftbit1y ago

no_wizard1y ago

Oh here’s an interesting idea.

What if these bug reporting platforms could create a branch and tag it for each issue.

Would that be useful? I feel like what you’re doing here isn’t that different if I get what’s going on (basically creating one repository per bug?)

2 more replies

foota1y ago

In some ways, you could imagine repos might be more scalable than issues within a repo, since you could reasonably assume a bound on the number of issues in a single repo.

plesivOP1y ago

OP here. We’re building a new kind of Git platform. "Infinity" is more beneficial for us as platform builders (simplifying infrastructure) but less relevant to our customers as users.

creatonez1y ago

shivasaxena1y ago

Imagine every notion docs or every airtable base being a a git repo. Imagine the PR workflow that we developers love being available to everyone.

yjftsjthsd-h1y ago

> It allows us to easily host an infinite number of repositories

icambron1y ago

beachy1y ago

Or perhaps a SaaS product where individual customers had their own fork of the code.

There are many reasons not to do this, perhaps this scratches away at one of them.

plesivOP1y ago

OP here.

It’s unlikely any Git providers struggle with the number of repos they're hosting, but most are larger companies.

Currently, we're a bootstrapped team of 2. I think our approach changes the kind of product we can build as a small team.

rad_gruchalski1y ago

How? What makes it so much more powerful than gitea hosted on a cheap vps with some backup in s3?

Unless, of course, your product is infinite git repos with cf workers.

1 more reply

bhl1y ago

Serverless git repos would be useful if you wanted to make a product like real-time collaboration + offline support code editing in the browser.

You can still sync to a platform like GitHub or BitBucket after all users close their tabs.

aphantastic1y ago

That’s essentially what github.dev and vscode.dev do FWIW.

jauntywundrkind1y ago

> After extensive research, we rewrote significant parts of Emscripten to support asynchronous file system calls.

> We ended up creating our own Emscripten filesystem on top of Durable Objects, which we call DOFS.

Using a ton of great open source & taking it all further. Would sure be great if ya'll could contribute some of this forward!

Libgit2 is GPL with Linking Exception, and Emscripten MIT so I think legally everything is in the clear. But it sure would be such a boon to share.

plesivOP1y ago

Definitely! We're focused on launching right now, but once we have more bandwidth, we'd be happy to do it.

abstractbeliefs1y ago

No free software no support. You don't have to merge it upstream right away, but publish it for others to study and use as permitted by the license.

sluongng1y ago

Very cool project. I hope Cloudflare workers can support more protocols like SSH and GRPC. It's one of the reasons why I prefer Fly.io over Cloudflare worker for special servers like this.

plesivOP1y ago

Great question! By default, with libgit2 each write to a repo (e.g. push) will create a new pack file. We have written a simple packing algorithm that runs after each write. It works like this:

Choose these values:

* P, pack "Planck" size, e.g. 100kB

* N, branching factor, e.g. 8

After each write:

1. iterate over each pack (pack size is S) and assign each pack a class C which is the smallest integer that satisfies P * N^C > S

2. iterate variable c from 0 to the maximum value of C that you got in step 2

* if there are N packs of class c, repack them into a new pack, new pack is going to be at most of class c+1

betaby1y ago

adobrawy1y ago

betaby1y ago

It is already on ZFS with compression.

a_e_k1y ago

Crazy thought: something like BorgBackup.

nomel1y ago

betaby1y ago

Juniper routers configs, something like below.

hobs1y ago

1 more reply

o11c1y ago

I'm not sure if this quite fits your workload, but a lot of times people use `git` when `casync` would be more appropriate.

hokkos1y ago

You can compress in EXI, it's a format for XML and if it is informed by the schema can give a big boost in compression.

tln1y ago

> get ~20% of their content changed

...daily? monthly? how many versions do you have to keep around?

I'd look at a simple zstd dictionary based scheme, first. Put your history/metadata into a database. Put the XML data into file system/S3/BackBlaze/B2, zstd compressed against a dictionary.

Although you say you're fine with it being not that storage efficient to a degree, I think if you were OK with storing every version of every XML file, uncompressed, you wouldn't have to ask right?

betaby1y ago

1 more reply

skybrian1y ago

Not having a technical limit is nice, because then it’s a matter of spending money. But whenever I see “infinite,” I ask what it will cost. How expensive is it to host git repos this way?

As a hobbyist, “free” is pretty appealing. I’m pretty sure my repos on GitHub won’t cost me anything, and that’s unlikely to change anytime soon. Not sure about the new stuff.

jsheard1y ago

VoidWhisperer1y ago

plesivOP1y ago

OP here. Not yet - it's about 50% complete. I plan to open-source it in the future.

nbbaier1y ago

Definitely interested in seeing this as well. What are the key features?

tln1y ago

Congrats, you've done a lot of interesting work to get here.

This could be a fantastic building block for headless CMS and the like.

plesivOP1y ago

OP here. Thank you and good catch! :-) We have a blog post planned on that topic.

seanvelasco1y ago

this leverages Durable Objects, but as i remember from two years ago, DO's way of guaranteeing uniqueness is that there can only be once instance of that DO in the world.

then after some time, a user in Australia accesses this DO repo - the DO bounces to AU servers - US and SG users will have high latency?

but please correct me if i'm wrong

skybrian1y ago

Last I heard, durable objects don’t move while running. It doesn’t seem worse than hosting in US-East, though.

tredre31y ago

> Wanting to avoid managing the servers ourselves, we experimented with a serverless approach.

I must be getting old but building a gigantic house of card of interlinked components only to arrive to a more limited solution is truly bizarre to me.

But hey, at least it scales infinitely!

notamy1y ago

[1] Because it's so easy to configure something to be vulnerable because it makes it easier, even if the vulnerability was unintentional.

kentonv1y ago

> Periodically merge upstream libgit2 in your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with emscripten, deploy it.

> Rotate API keys to make sure your database service can talk to your storage service and your worker service.

Huh? With Durable Objects the storage is local to each object. There is no API key involved in accessing it.

> Then I don't even know how you'd backup all this

Durable Object storage (under the new beta storage engine) automatically gives you point-in-time recovery to any point in time in the last 30 days.

https://developers.cloudflare.com/durable-objects/api/storag...

> And all that only to end up with worse latency than a VPS

> and more size constraints on the repo and objects.

While each individual repo may be more constrained, this solution can scale to far more total repos than a single-server VPS could.

(I'm the tech lead for Cloudflare Workers.)

Spunkie1y ago

I've been wondering what to do to backup our github repos other than keeping a local copy and/or dumping them on something like S3.

I would love to use this to serve as a live/working automatic backup for my github repos on CF infrastructure.

yellow_lead1y ago

The latency on the examples seems quite slow, around 7 seconds to a full load for me.

https://gitlip.com/@nataliemarleny/test-repo

plesivOP1y ago

OP here. That’s expected for now, and we’re working on a solution. We didn’t explain the reason in the post because we plan to cover it in a separate write-up.

yellow_lead1y ago

I see you haven't launched yet so that's fair. Looking forward to trying it

ericyd1y ago

csomar1y ago

I am ending up with AWS lambdas. Not only that solves the Wasm issue but you can have up to 10Gb of memory on a single instance. That is close to enough for most use cases. 100Mb? Not really.

gkoberger1y ago

This is really cool! I've been building something on libgit2 + EFS, and this approach is really interesting.

Between libgit2 on emscripten, the number of file writes to DO, etc, how is performance?

markphip1y ago

I wonder if they considered or looked at using JGit? https://github.com/eclipse-jgit/jgit

It provides client and server API. The latter is used by Gerrit for its server. https://www.gerritcodereview.com

Not sure what the Java to WASM story is if that is a requirement for what they need.

stavros1y ago

This is a very impressive technical achievement, and it's clear that a lot of work went into it.

deadbunny1y ago

TFA is literally marketing for them. HN is their target audience and a good way to capture that audience is to show them something technically interesting.

akerl_1y ago

I agree. That’s a really unfortunate way to view somebody’s project.

stavros1y ago

Is it a project, or is it a company?

1 more reply

scosman1y ago

Serverless git repos: super cool

But I can't figure out what makes this an AI company. Seems like a collaboration tool?

plesivOP1y ago

OP here. We're not an AI company; we're aiming to be AI-adjacent and simplify the practical application of AI models.

ijamj1y ago

Honest question: how is this "AI-adjacent"? How does it specifically "simplify the practical application of AI models"? Focus of the question being on "AI"...

iampims1y ago

Some serious engineering here. Kudos!

nathants1y ago

this is very cool!

i prototyped a similar serverless git product recently using a different technique.

lambda was nice because for cpu heavy ops like merging many shallow clones, i could do that in a larger cpu lambda, and cache the result.

other constraints were similar to what is described here. mainly that an individual push/pull cannot exceed the api gateway max payload size, a few MB.

i looked at isomorphic, but did not try emscripten libgit2. using cloudflare is nice because of free egress, which opens up many new use cases that don’t make sense on $0.10/GB egress.

i ended up shelving this while i build a different product. glad to see others pursuing the same thing, serverless git is an obvious win! do you back your repos with r2?

good luck with your raise and your project. looking forward to the next blog. awesome stuff.

1. https://git-scm.com/docs/git-bundle

2. https://github.com/nathants/git-remote-aws

gavindean901y ago

I really like the idea if file system over durable objects

eastdakota1y ago

I do too.

bagels1y ago

Infinite sounds like a bug happened. It's obviously not infinite, some resource will eventually be exhausted, in this case, memory.

j / k navigate · click thread line to collapse