2.1 We're starting work on this in the next couple of months. We plan to ship it in early 2022.
2.2 We want to speed up the pace of GitHub Enterprise Server releases, but I don't have more info to share.
2.3 We're looking at ways to not require a GHEC account or "unified" license.
2.4 The limits are much higher with the GitHub hosted runners, but this is a current limit of self-hosted runners.
3.1 It's on our backlog. No date to share.
3.2 I haven't heard this before and thanks for sharing the scenario. We'll think about it.
3.3 This will ship in October.
3.4 We're doing some performance optimizations for GHES 3.4 that should fix this.
3.5 This shipped recently - https://github.blog/changelog/2021-08-25-github-actions-redu...
3.6 We have a couple of API improvements coming later this year.
3.7 We're looking into this, but no dates to share.
We're dedicated to making Actions a great experience. As you would assume, I'm very excited about the future of Actions and getting feedback like this helps us make it better.
I want to share whole jobs across my repos, including matrices and all. I could wrap some common workflows into one action, but then all the complexity would be hidden in a single step.
So I resorted to creating GHA templates and manually syncing them with a CLI tool[1], which isn't great, but at least I don't have to copy-paste changes across repos, and I can keep a "main version" of the workflow.
If you need to lock down reusable deployment workflows, these features will enable that. https://github.com/github/roadmap/issues/249 https://github.com/github/roadmap/issues/220
To clarify this further, in https://github.com/xmonad/xmonad/actions/runs/1201348600, GITHUB_RUN_ID=1201348600, GITHUB_RUN_NUMBER=168, but there's no way to get the 3514203530 out of https://github.com/xmonad/xmonad/runs/3514203530 (that's one of the jobs of that run).
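For anyone hitting the same wall, one workaround is the REST endpoint `GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`, which lists the jobs of a run with their numeric `id` and `name`. A minimal sketch, assuming you already fetched the JSON (authentication, pagination and the HTTP call are left out; the sample payload is made up). Each job object also carries an `html_url`, which you can compare against the `/runs/<id>` URL you see to confirm you have the right number.

```python
import json
from typing import Optional


def jobs_url(owner: str, repo: str, run_id: int) -> str:
    """Build the REST URL that lists the jobs of one workflow run."""
    return f"https://api.github.com/repos/{owner}/{repo}/actions/runs/{run_id}/jobs"


def job_id_for(response_body: str, job_name: str) -> Optional[int]:
    """Return the numeric job ID for job_name from a jobs-list response, or None.

    Only jobs[].id and jobs[].name are read; pagination is not handled.
    Matrix jobs have names like "build (ubuntu-latest)".
    """
    payload = json.loads(response_body)
    for job in payload.get("jobs", []):
        if job.get("name") == job_name:
            return job["id"]
    return None


if __name__ == "__main__":
    # Made-up response trimmed to the two fields used above.
    sample = '{"total_count": 1, "jobs": [{"id": 3514203530, "name": "build"}]}'
    print(jobs_url("xmonad", "xmonad", 1201348600))
    print(job_id_for(sample, "build"))
```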
That said, I have frustrations with the management of the GitHub Actions repositories: which is to say, there isn’t any. Most issues and Pull Requests languish, without acknowledgement or engagement.
I appreciate that managing issues and contributions is a full-time job, and your team probably doesn’t have capacity right now. If you could magic up some capacity, that would be great, but… a more realistic request would be to evaluate capacity and follow Terraform’s lead in being open about pausing contributions, rather than keeping READMEs that encourage contributions which (despite the best of intentions) will never be reviewed.
e.g.: https://www.hashicorp.com/blog/terraform-community-contribut...
Thanks :-)
Sometimes the checkout action fetches the wrong commit! The SHA in the GitHub environment variable says one commit, but the actual code is different(!). Because we don't know why this happens we basically need to do an unshallow fetch of the whole repo to be sure we have what we expect.
Using autoscaling self-hosted runners, it is not currently possible to instruct the agent to finish the current job but accept no new jobs after it. This is essential to avoid broken workflows while scaling in. Gitlab supports this via a signal, but there is no equivalent.
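A minimal sketch of the drain behaviour being asked for, assuming a hypothetical runner loop (the `DrainableWorker` class and the choice of SIGTERM are illustrative, not part of any real runner): the signal only sets a flag, the job in progress always runs to completion, and the loop stops pulling new jobs.

```python
import signal
import threading
from collections import deque
from typing import Callable, Deque, List


class DrainableWorker:
    """Hypothetical runner loop: after a drain request it finishes the job
    in progress but accepts no new ones."""

    def __init__(self, jobs: Deque[str], run_job: Callable[[str], None]) -> None:
        self.jobs = jobs
        self.run_job = run_job
        self.draining = threading.Event()
        self.completed: List[str] = []

    def request_drain(self, signum=None, frame=None) -> None:
        # Signal handlers receive (signum, frame); the defaults allow direct calls too.
        self.draining.set()

    def serve(self) -> None:
        while self.jobs and not self.draining.is_set():
            job = self.jobs.popleft()
            self.run_job(job)  # the current job is never interrupted
            self.completed.append(job)


if __name__ == "__main__":
    worker = DrainableWorker(deque(["job-1", "job-2", "job-3"]), run_job=print)
    # Assumption: the autoscaler sends SIGTERM before removing the machine.
    signal.signal(signal.SIGTERM, worker.request_drain)
    worker.serve()
```

The autoscaler then only has to wait for the process to exit before terminating the instance.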
I'm a product manager at GitHub, and I'm investigating this now. Feel free to drop me an email (ethomson@github.com) if you can share more information about the workflow runs that were affected by this.
> Using autoscaling self hosted runners, it is not currently possible to instruct the agent to finalise the current job but to accept none after it.
We're building this functionality out right now, so that you can have self-hosted runners that will run a single job and we will let you subscribe to webhooks that will let you scale up/down your fleet of runners.
I know it's a fairly recent platform, but I was expecting much more compared to what other services offer.
As far as the limitations in the blog post, they are real, but most of them are limitations in features that are not available to begin with in Jenkins or Travis.
Interesting - I find GitLab docs easy to use while I hate GH ones. It takes a long time for me to find the relevant part, even more to determine that this is really all there is, and then some extra to guess which interpretation of the text suits the reality. It has never occurred to me that someone might prefer GH docs to GL ones. :)
On par? Gitlab CI is lightyears ahead of GHA and shows no sign of stopping: child pipelines, dynamic pipeline generation, very flexible conditionals, composable pipeline definitions, security, merge trains.
GHA is very basic; even the simplest things, like restarting an individual job, are not implemented.
However, having also used Buildkite, I'm not sure I could go back to GHA - it's slightly nicer to target, and at least as large a step towards reliability & predictability as Travis -> GHA was.
I think what GitHub was going for is some kind of ecosystem of proprietary third-party applications, used as a revenue source. That approach only crippled the whole functionality.
Then you're much, much more portable. Want to run tests? Run ".ci/tests.sh". Want to generate artifacts? Run "make" or ".ci/build.sh".
All systems, be they github actions, jenkins, gitlab-runners, and everything else allow you to clone/update your repository and run something from within it. Which keeps things mostly portable.
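A minimal sketch of that layout, assuming a hypothetical `ci.py` at the repo root: every CI system only calls `python ci.py test` or `python ci.py build`, and the task-to-command mapping lives in the repository next to the scripts it runs. The script paths are the ones from the comment above.

```python
import subprocess
import sys
from typing import Dict, List

# Task name -> command. Any CI system (GHA, Jenkins, gitlab-runner, ...)
# only needs to know how to invoke this file.
TASKS: Dict[str, List[str]] = {
    "test": [".ci/tests.sh"],
    "build": ["make"],
}


def command_for(task: str) -> List[str]:
    """Look up the command for a task, exiting with a message on unknown names."""
    try:
        return TASKS[task]
    except KeyError:
        raise SystemExit(f"unknown task {task!r}; expected one of {sorted(TASKS)}")


def main(argv: List[str]) -> int:
    """Run the requested task and return its exit code (2 on bad usage)."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <task>", file=sys.stderr)
        return 2
    # check=False: hand the script's own exit code back to the CI system.
    return subprocess.run(command_for(argv[1]), check=False).returncode
```

To use it as a command-line entry point, add `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Passing the exit code through, instead of raising, leaves failure handling to whichever CI system is calling it.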
I put together a simple github action a long time ago, but now of course I realize it is overkill:
I really wonder why anyone would self-host GitHub. Gitlab has a much more feature-rich, mature and cheap offering (there's a perfectly usable free version). Yeah, someone might prefer Github's UX, but is it really worth paying for a worse product?
That's subjective. Gitlab does many more things, but nobody is forcing you to use them, and unused menu buttons don't matter much for the main workflows (commits, MRs, repo visualisation, wikis, CI/CD).
Before I understood your comment, I was going to reply "security, data security & control" as a reason orgs might choose to self-host rather than use SaaS GitHub.
But you're probably arguing for self-hosted gitlab over self-hosted github.
Good question! I've worked somewhere with self-hosted GitHub, but they were using it prior to GitHub Actions / workflows, so it was a combination of self-hosted GitHub and N different CI and deployment automation tools (bamboo! GoCD! udeploy! jenkins! google cloud build! poor bloody operator manually copying deployment payload onto jump box as the bureaucracy won't approve automation!)
If, however, you've been running GHE for years and have it integrated with other tooling and workflows, and your developers understand it, the question is,
"Why would anyone rip all that out, replace it, reintegrate it and retrain everyone over a handful of features, most of which you don't care about?"
That's why we migrated to GitLab CI. We made some pretty complicated pipelines, with tens of stages, many of them dynamic, and it worked with minimal hassle. It was a great success story internally.
Functioning CI/CD for free is certainly a huge feature to care about. For a user, moving between Github and Gitlab is at worst a slight annoyance; the UX and DX are pretty similar. Most third-party tooling that isn't tied to github.com (SaaS) supports both self-hosted Github and self-hosted Gitlab (I'd even wager more tools support self-hosted Gitlab than self-hosted Github, since the former is much more popular).
- Fallback cache keys don't work (because they don't compose with changing cache IDs). So each time someone clears the runner caches, the fallback cache stops working.
- `if`/`rules` and `needs` don't work together, and cause the build to fail with a *yaml build failure*. Wat? This is even a documented failure mode now. This is a huge issue for monorepos.
- gitlab-runner is full of weird behaviors: `gitlab-runner unregister <runner-name>` fails if that runner was already deleted (through the gitlab UI), and won't remove it from the local config. But gitlab-runner has a separate command, `gitlab-runner verify --delete`, which does just that...
It feels like every time I use it, I run into a bug. I've heavily used github actions in many projects of similar sizes, and have yet to come across a single bug.
> `if`/`rules` and `needs` don't work together, and cause the build to fail with a yaml build failure. Wat? This is even a documented failure mode now. This is a huge issue for monorepos
Could you elaborate on this? I have a fairly complex pipeline which has rules and needs, and it works without any yaml errors.
why not? is that a cost issue, perceived security, administration overhead?
Our main gotchas are roughly:
- GH-hosted runners have too little RAM/HD for big docker software. They push you to self-hosted runners for that, which is fine in theory, but GHA/Azure doesn't actually support serverless runners, so that falls flat in practice. We don't want to be turning machines on/off, that's GHA's job. We experimented with GHA -> Packer -> Azure for serverless, but it was slow and Packer frequently leaves zombie machines, so we went back to tweaking the low-RAM runners provided by our enterprise plan.
- Security: We want contractors etc. to be able to run limited GHA CI jobs and use that quota, but not higher-trust GHA CD ones. This is tricky at a configuration level. Ex: It seems like we'd need to do funny things like a main repo for CI w/ CI secrets and a separate repo for CD w/ CD secrets, and only give trusted folks access to the CD-cred repo. We've thought of other possibilities as well, but in general, it's frightening.
- Big Docker images: We do spend more time than I'd like messing with optimizing Docker caching as GPU containers are embarrassingly big (we use dockerhub vs github's due to sizes/pricing/etc), think both multi-stage containers + multi-step jobs (monorepo/microservices). I think they're in a good position to speed that up!
I'm optimistic about these, but tricky to align with MS/GH PM personal team priorities :)
They improved this quite a bit last June. https://github.blog/changelog/2021-06-22-github-actions-envi...
Basically you can define deployment environments which have their own secrets and configure it so only authorized users can approve workflows which access those environments.
You use a policy CRD called ImagePolicy to declare what versions should be matched and deployed automatically. You can have them deployed directly to production, or if it makes you more comfortable with your contractors, to an "auto-PR" branch which simply queues them up for a release engineer with the required juice to approve and merge the changes to prod.
There is no deploy job in CI, since the deployment in prod is whatever happens to be on the main branch in the config repo. This is a protected branch which only releng can merge changes into, usually only when the release checks are satisfied. Flux (which scans any image repos) updates those manifests when there is a release, either directly or via pull request.
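To make the selection step concrete: an ImagePolicy can choose tags by a semver range such as `>=1.0.0 <2.0.0`. The sketch below only mimics that choice for plain `MAJOR.MINOR.PATCH` tags with a half-open range; it is not Flux's implementation, it ignores pre-release tags, and the tag list in the test is made up.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, ...]
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[Version]:
    """Parse 'v1.2.3' or '1.2.3'; anything else (e.g. 'latest') returns None."""
    m = _SEMVER.match(tag)
    return tuple(int(g) for g in m.groups()) if m else None


def latest_in_range(tags: Iterable[str], lower: Version, upper: Version) -> Optional[str]:
    """Return the highest tag with lower <= version < upper, or None."""
    candidates = []
    for tag in tags:
        v = parse(tag)
        if v is not None and lower <= v < upper:
            candidates.append((v, tag))
    return max(candidates)[1] if candidates else None
```

Comparing parsed tuples rather than strings is the point: `1.10.2` correctly beats `1.9.0`, which a plain string sort would get wrong.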
This strategy I think would be able to address your Security concerns regarding contractors, by reducing the responsibility of your CI system to only CI, and not to handle CD anymore.
This is the subject of my talk at KubeCon[1], although you might not be able to get that from the title, (I'm presenting this topic for a Jenkins audience, but the focus is on Flux and how Flux works) so I'm hoping it should be applicable to a broader audience, certainly inclusive of GitHub users :)
[1]: https://kccncna2021.sched.com/event/lV0V/gitopsjenkins-ci-wi...
Ex: Network segmentation for CI. We only expect CI to communicate with dockerhub, conda/pypi, and the CI service (ex: azure). Something similar for CD. That _should_ be settable via GHA, but isn't. In a world of weird npm/python scripts and github action marketplace... scary we can't. Self-hosted runners can in theory do this via custom network policies, but it's a PITA for something ~everyone should be doing out-of-the-box.
Edit: HN won't let me respond to the below. Imagine something like IP theft, wanting to mine our repo for everything labeled "security", dig into our version #'s for viable CVEs, or force-push some git commits (incl. history rewriting). Defense-in-depth says we shouldn't make that unnecessarily easy for anyone who plants a backdoor that runs during CI's `npm install` / `RUN xyz` phase. Network + RBAC are basically table stakes for almost everyone building enterprise software, so GHA doesn't have to reinvent the wheel here, just do it + make it friendly.
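For self-hosted runners, the allowlist idea can be sketched as a hostname check that an egress proxy might apply. This is illustrative only: the domain list is an assumption (and incomplete; Docker Hub, for instance, also serves layers from other hosts), and real enforcement belongs in the firewall or proxy, not in a script running inside the job it is supposed to constrain.

```python
from typing import Iterable

# Illustrative CI allowlist (assumption): container registry and package indexes.
CI_ALLOWLIST = ("docker.io", "pypi.org", "files.pythonhosted.org", "conda.anaconda.org")


def is_allowed(host: str, allowlist: Iterable[str] = CI_ALLOWLIST) -> bool:
    """True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    for domain in allowlist:
        # The "." prefix stops "evildocker.io" from matching "docker.io".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```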
In our case we've got some cypress tests that we want to run on a specific on-premise server every time we create a pull request. They take about 20 minutes to run, and we're creating a lot of pull requests, so you have to carefully check what github actions are executing before you create a pull request or push new changes to it. I'd love support for proper queues like what teamcity and other CI systems have.
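GitHub Actions' `concurrency` key gets partway there, but it is not a queue: per group there is at most one running and one pending run, and a newly queued run cancels the previously pending one. A toy model of those semantics (class and method names are hypothetical; `cancel-in-progress` is assumed off) shows why a burst of PRs loses runs.

```python
from typing import List, Optional


class ConcurrencyGroup:
    """Toy model of one concurrency group: one running run, at most one pending."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.pending: Optional[str] = None
        self.cancelled: List[str] = []

    def submit(self, run: str) -> None:
        if self.running is None:
            self.running = run
            return
        if self.pending is not None:
            # The previously pending run is cancelled, not queued behind the new one.
            self.cancelled.append(self.pending)
        self.pending = run

    def finish_running(self) -> None:
        """Mark the running run as done and promote the pending one, if any."""
        self.running, self.pending = self.pending, None
```

With three pushes during one 20-minute Cypress run, the middle one is cancelled rather than queued, which is the gap a FIFO queue like TeamCity's would fill.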
We've had similar issues with Terraform deployments.
Maybe they need to reinvent JES and JCL. :)
When I left my last job and started working on my own, I set up things like CI/CD, JIRA, Jenkins, etc. These were the bread and butter of development in my old shop.
But they are "Concrete Galoshes"[0], and work very well for teams, as opposed to ICs. As a single developer, working alone, the infrastructure overhead just slowed me down, and, ironically, interfered with Quality.
When GH Actions were first announced (I can't remember, but they may have been beta, then), I set up several of them, on my busier projects. They worked great, until I started to introduce some pivots, and I realized that there was actually no advantage to them. I ran the tests manually, anyway, and the Actions just gave me one more thing to tweak. It was annoying, getting the failure messages, when I knew damn well, the project was fine. I'd just forgotten to tweak the Action. I introduce frequent changes, in my work, and that is great.
[0] https://littlegreenviper.com/miscellany/concrete-galoshes/
You can also run actions on top of Jenkins. https://github.com/DontShaveTheYak/jenkins-std-lib
For example, some open source code I need to publish to package indexes or marketplaces has an action that does that whenever I publish a release. I’ve found it to be very useful, and it saves a lot of time.
There's a brilliant trick for doing that described here but it's a lot of work to setup: https://blog.thea.codes/building-a-stateless-api-proxy/
You can also create a brand new GitHub user account, grant it access to just a single repository and then create a PAT for that user account - annoying but it does at least let you scope down the permissions a bit.
The biggest issue I have is around self-hosted runners.
1. There's no official auto-scaling runner option, so even if you're paying Github (aka Microsoft) for Enterprise - they're not going to support your auto-scaling EKS/GKE/EC2/whatever runners.
2. You can't register self-hosted runners without a Personal Access Token - the key word being _Personal_. Your automation code for provisioning runners should not rely on an individual's Github access token just to register; they need a system like GitLab's, where you can generate a registration token per organisation/team/repo that allows you to programmatically register runners.
It seemed decent but I hit two problems:
1. I didn't manage to get autoscaling to work - I suspect my helm templates might have been incorrect.
2. Docker-in-Docker (DIND) I know works, but one of the clients I'm working with has switched to containerd and the controller got a little confused by Docker-in-Containerd (DINC - you heard it here first!), I know really they should be using Kaniko/Buildah etc... but their devs aren't ready to make the change yet.
With the default GITHUB_TOKEN, you can't push to protected branches. If you decide to use personal access tokens, you can push to protected branches, BUT that will trigger other workflows. That can cause an infinite loop of workflows.
We still couldn't figure out how to push code to a protected branch without triggering the same/other workflows.
If you include `[skip ci]` anywhere in the commit message, that commit won't trigger any github actions. This is a built-in behavior; you don't have to manually check for that string in any downstream actions. I spent a long time trying to work around the same constraint before my colleague pointed out that `[skip ci]` was a thing.
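The check GitHub performs can be approximated as a substring test on the commit message. The marker list below reflects GitHub's documentation at the time of writing (it applies to `push` and `pull_request` workflows); treat it as an assumption and check the current docs. This sketch also ignores case handling. For the bot-push loop described above, the bot's commit message would carry one of these markers.

```python
# Markers GitHub documents for skipping push/pull_request workflows (assumption: current list).
SKIP_MARKERS = ("[skip ci]", "[ci skip]", "[no ci]", "[skip actions]", "[actions skip]")


def would_skip(commit_message: str) -> bool:
    """Approximate GitHub's check: a marker anywhere in the message skips the workflows."""
    return any(marker in commit_message for marker in SKIP_MARKERS)
```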
if: "!contains(github.event.head_commit.message, 'DO_NOT_TRIGGER_ACTIONS')"
- can't manually delete cache (although they're considering changing that[0])
- only saves cache at the end of a workflow, and only if the workflow succeeds. This could be solved with CircleCI's approach of having a save-cache step and a restore-cache step.
- Cache is super slow for self-hosted runners, so it makes more sense to have a local cache instead of using the action
- only 5 GB of storage. This was supposed to be increased via billable cache storage[1], but it's been on the back burner since July 2020
- In addition to the above, you can't use a different storage backend on the official action (which would allow storing over 5GB of cache via your own storage). The best workaround is to use a user-provided action which utilizes the s3 api[2].
0: https://github.com/actions/cache/issues/632
1: https://github.com/github/roadmap/issues/66
2: https://github.com/actions/cache/issues/354#issuecomment-854...
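For reference, restore-keys fallback in actions/cache works roughly like this simplified model: exact match on `key` first, then each restore key in order as a prefix, with the newest matching entry winning. Branch scoping and eviction are left out, and the `CacheEntry` type is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CacheEntry:
    key: str
    created_at: float  # e.g. a Unix timestamp


def resolve(entries: List[CacheEntry], key: str,
            restore_keys: Sequence[str]) -> Optional[CacheEntry]:
    """Exact match on key; otherwise the newest entry matching the first
    restore-key prefix that matches anything; otherwise None."""
    for entry in entries:
        if entry.key == key:
            return entry
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    return None
```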
Caching is a joke. But then again, comparing GHA to GitLab CI is like comparing a pogo stick to a bicycle. Sure, both can be used to get there, but one ride is a bit... bumpy.
(I use both and I am not affiliated to either.)
Because of a bug (in my action) I was pushing/pulling gigabytes of data into the cache. On large files I was seeing ~125MB/s (that's Byte) download speeds.
I haven't used the GitHub hosted runners much so I'm not sure how it compares though.
It was so bad I ended up scheduling my cron jobs to run 4 - 5 hours early and just sit there idling until the actual execution time came around haha.
Ultimately, jobs not running at all became a real pain point, so I gave up and paid $5 a month for a box that just sits there and runs crond, to save on the headache.
I have a few suspicions that it's intentionally done this way for balancing demand spikes and that jobs scheduled for the top of hour are worst affected.
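The early-start workaround boils down to: trigger well ahead of time, then sleep until the real target. A minimal sketch with a hypothetical target time; keep the head start below the job time limit (6 hours per job on GitHub-hosted runners), or the job is killed before it does any work.

```python
import time
from datetime import datetime, timezone


def seconds_until(target: datetime, now: datetime) -> float:
    """Seconds from now until target (both timezone-aware); 0 if already past."""
    return max(0.0, (target - now).total_seconds())


def wait_until(target: datetime) -> None:
    """Sleep until target, then return so the real work can start."""
    time.sleep(seconds_until(target, datetime.now(timezone.utc)))
```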
2.1 Caching isn’t available: GitLab has this everywhere.
2.2 GitHub Enterprise Server is behind GitHub Enterprise Cloud: GitLab ships the same code to GitLab.com as it does to our self-managed customers. This was a tough decision but has a lot of benefits, the central one being feature parity and scalability for self-managed folks.
2.3 Using Public GitHub.com Actions: This is a symptom more than the problem itself - relying on third-party plugins for build jobs is scary, and leads to many of the same issues we’ve seen in the Jenkins ecosystem - easy to get started, hard to maintain.
2.4 Dockerhub pull rate limiting: for self-hosted runners, you can use a registry mirror or Dependency Proxy to reduce your number of pulls from Docker Hub. The key is the entire platform has to be there to enable the right workflows.
3.1 No dropdowns for manually triggered jobs: GitLab also doesn’t have drop downs, but does have the ability to pre-fill these values.
3.2 Self-hosted runner default labels: I think this is also more of a symptom than a problem.
3.3 Being able to tag and use runners for specific tasks is key, so I understand the frustration, and we’ve spent a lot of time on this.
3.4 You can’t restart a single job of a workflow: You can do this with GitLab.
3.5 Slow log output: I haven’t seen this be a problem, and it's a benefit of our scalability features being built into the self-managed code.
3.6 You can’t have actions that call other actions: There are lots of ways to relate pipelines (parent/child, triggers. etc.) in GitLab.
3.7 Metrics and observability: The GitLab runner has Prometheus built in, and the dashboards we use to manage GitLab.com are partially public: https://dashboards.gitlab.com
3.8 Workflow YAML Syntax can be confusing: This can be really hard to get right. I learned to stop worrying and love the YAML long ago, and I know we’ve gone through a lot of iterations to try and get this right.
I'd love to know where folks think I got this assessment wrong. And is there value in writing more about it?
(edited for line spacing)
Kudos to Gitlab for not utilizing the GHE model in order to convert users to their hosted solutions.
> GitHub Actions make our pipelines accessible to any developer on the team
Do you reckon this accessibility is a combination of (i) storing the pipeline definitions in the application's source repo, where application developers can find them easily, not hidden/scattered elsewhere in other repos or behind management UIs, and (ii) a relatively simple and documented pipeline syntax?
The earliest tool I can think of that supported this workflow was Travis CI, around 2011-2012. AppVeyor offered similar capabilities quite early as well. The same workflow can be done with Gitlab or Google Cloud Build.
> we can even deploy each pull request to the cloud rather inexpensively and leverage GitHub's environments to help manage the cleanup for us. This allows our team members to review and test changes in their browser before we pull them into our development branch
Yeah, this kind of workflow is great. Another way this kind of workflow can be done is to create simple command line tools that developers can use to create and destroy temporary test environments running their speculative changes. In some cases, for rapid experimentation, it can be great to be able to spin up N temporary environments in parallel with different changes without tying it to pull requests. But I can see that tying the temporary environment lifecycle to the lifecycle of a PR might make it easier to share demos of proposed changes with reviewers.
Out of curiosity, how reliable do you find the environment cleanup is? I remember building a similar create-temp-environment / destroy-temp-environment workflow for ephemeral databases running in AWS RDS driven by jenkins pipelines. It took a few months of tweaking to figure out how to ensure the RDS databases got torn down correctly and not "leaked" even if the jenkins master or workers failed midway through pipeline execution. From memory we had a bunch of exception handling in a jenkins groovy scripted pipeline that would run on the master jenkins to try to do cleanup, and even that wouldn't work all of the time, so we had a second cleanup job on a cron-schedule to detect and kill leaked resources.
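The second-line cron cleanup described above can be reduced to a pure function that decides what to delete. Everything here is hypothetical: the `Resource` type, the `pr_number` tag written at creation time, and the one-hour grace period; listing and deleting the actual RDS instances is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass
class Resource:
    resource_id: str
    pr_number: int        # hypothetical tag written when the environment is created
    created_at: datetime  # timezone-aware creation time


def find_leaked(resources: Iterable[Resource], open_prs: Set[int], now: datetime,
                grace: timedelta = timedelta(hours=1)) -> List[str]:
    """IDs of resources whose PR is no longer open and that are older than grace.

    The grace period avoids racing a pipeline that is still creating the resource.
    """
    return [
        r.resource_id
        for r in resources
        if r.pr_number not in open_prs and now - r.created_at > grace
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [Resource("db-pr-42", 42, now - timedelta(hours=3))]
    print(find_leaked(demo, open_prs=set(), now=now))
```

Keeping the decision separate from the cloud calls makes the sweeper easy to dry-run before letting it delete anything.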
Yes, exactly. All of our build pipelines for a repository live in the .github folder at the root of the repo. That makes it easier for team members to feel comfortable making changes and submitting a PR for them. You can also set up an act container to test GitHub Action changes locally before pushing them (see https://github.com/nektos/act )
> Out of curiosity, how reliable do you find the environment cleanup is?
So far, environment cleanup has been reliable, but I have noticed where it failed to cleanup some provisioned resources once in a blue moon. I blame this more on our code than GitHub Actions. I periodically review our sandbox environments to ensure we didn't miss deleting anything.
Apparently it’s still better than Github though xD
Actually, most points in the article are the reason we created BuildJet.
We initially tried to solve these annoyances by creating a CI with speed and the YAML config as a USP. We got 4x speed and a much better YAML config structure, but despite these improvements we noticed that people had a mental barrier to migrating to a new, unknown CI.
But like OP, we always enjoyed the experience of using GitHub Actions, so with this in mind we decided to build BuildJet for GitHub Actions[1], which uses the same infrastructure but plugs right into GitHub Actions as a "self-hosted" runner that is automatically set up for you with OAuth. This resulted in, on average, a 2x speed improvement at half the cost (due to us being close to the metal). Easy to install and easy to revert.
The biggest problem is that I don't feel comfortable putting our IP out on a server managed by a small provider. If you had an offering where I could self-host, I'd be very interested.
BTW. You have a broken link to "Privacy" from the "Terms" page.
Once you get into more complex things - like building docker images, storing into an artifact repository, baking amis, running integration or end to end tests, etc, it can be a pain.
It was a great place for us to start but we've since moved to BuildKite.
Among my small, open source work, probably my biggest complaint is actions running in forks. Wastes a lot of resources on their side and limits my concurrent runners for projects in my personal space. For companies, depending on the setup, this would eat their compute minutes.
Also annoying that PR actions can't post to the PR. I can understand there are security limitations but it makes it so a lot of nice features don't exist for most people.
Everything else is managed via a custom tool we use for packaging & deploying our product.
Even our simple "run this 1 build command and ensure exit code == 0" action seems to have a semi-weekly issue like stuck "waiting for status" and other unexplained failures throughout. We don't want to put any more eggs into that particular basket right now.
My wishlist item would be more variants of Windows server versions so that we could build Windows containers for more versions of Windows. I realize the fault lies with Windows containers pinning the container base version to the host version, but I'm still stuck with the burden.
I think GitHub Actions got the model correct, using everything as events to trigger any number of workflows. This is far simpler to maintain than a single workflow with conditionals and wait states that you see with other systems.
The verbosity of accessing output has the added benefit of making it much clearer that the 2 workflow steps are tightly interdependent on each other.
As of August 25, you can! https://github.blog/changelog/2021-08-25-github-actions-redu...
The problem is that I cannot make the image build action happen IFF both testing actions pass. I had to combine all three actions into one.
It works. But now there’s a “skipped” step that’s skipped 99% of the time and makes no sense for a lot of PRs. It also means I have a Frankenstein monster action that does three long lists of very different things. All just so I can make 3 depend on 1 and 2.
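If the three can live in one workflow as separate jobs, `needs: [test-a, test-b]` on the build job expresses this dependency directly. Across separate workflows, one option is for the build workflow to check the commit's check runs first. A sketch, assuming hypothetical check names `test-a`/`test-b` and the response shape of `GET /repos/{owner}/{repo}/commits/{ref}/check-runs` (`check_runs[].name`, `.status`, `.conclusion`); fetching, pagination and retries are left out.

```python
import json
from typing import Iterable


def all_passed(check_runs_json: str, required: Iterable[str]) -> bool:
    """True only if every required check run is completed with conclusion 'success'.

    If a check was re-run and appears twice, the last entry in the list wins;
    this sketch does not sort by time.
    """
    runs = json.loads(check_runs_json).get("check_runs", [])
    by_name = {run["name"]: run for run in runs}
    for name in required:
        run = by_name.get(name)
        if run is None or run.get("status") != "completed" or run.get("conclusion") != "success":
            return False
    return True
```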
The other problem is that to develop and test an action, I have to push to origin a thousand times. My kingdom for a CI system that _trivially_ lets me install a single one-liner program to test my actions locally at near 1:1 compatibility.
* Full clean build including dependencies (support Carthage, Cocoapods, SPM)
* Running multiple test suites that takes maybe 5+ hours for full suite?
* Running simulators for screenshot testing
It lets you run actions on top of Jenkins.
Oh jesus christ. I feel for you dude.
We evaluated GHA, and we are still trying to use it, but there is a barrage of problems and limitations, including cost, lack of functionality, and technical issues. It's really only suitable (at scale) for linting, generating changelogs, or something else trivial. I use it in my OSS projects to run tests, and it's okay for that (though it's impossible to just tail a build log when it's large).
Drone.io is still an amazingly effective system that matches GHA (and has _crazy_ features like build parameters) but is more flexible. Of course you'll have to pay for commercial licenses, but if it's between paying for GHA or Drone, I highly recommend Drone instead. Drone is stupidly easy to maintain (infrastructure-wise).