The cynic in me thinks they like having the extra revenue.
In fact, the incentives are diametrically opposed: almost every one of them makes more money when our builds take longer to run, regardless of the reason. So they are financially disincentivized to build anything that could make builds faster, or even to make it easier to limit their duration, as is the case here. When improvements do happen, it's a rare triumph of some combination of genuinely good intentions, customer demand, and competitive pressure over the demand for financial returns that every company eventually has to come to terms with, and it's not sustainable over the long term.
The ones that let us host our own runners at least offer an escape hatch: we can opt out of the diametrically opposed incentive structure and back into a still-not-aligned but neutral one. But then we give up much of the benefit of CI as a SaaS and have to spend engineering hours building and maintaining our own build infrastructure at significant cost.
Let's not forget that traditional CI is itself already a commodity: providers sell us dumb CI minutes, we spend our own engineering hours building deployment and testing solutions on top of them, and eventually we sink entire full-time engineering teams' worth of hours into fighting the natural tendency of these systems to get slower as we add more code and people.
I believe the solution is deployment & testing platforms tailored to specific technologies, meticulously engineered to be so ridiculously fast that they can reasonably be offered as an all-you-can-eat plan for a fixed monthly price per seat, instead of the industry-standard usage-based pricing of traditional CI providers. This aligns incentives much better: slow builds hurt the provider's bottom line as much as they hurt customers' engineering productivity. On the flip side, it financially incentivizes constant investment in making the system even faster, since faster builds mean the provider can serve more customers on the same hardware and pocket the difference as profit.
Shameless plug: I've been building one of these platforms at https://reflame.app. Reflame can deploy client-rendered React web apps in milliseconds, fast enough to make deploying to the internet feel like local dev.
Github should send a bunch of money to the act developer - I know I wouldn't have used Github actions at all without act existing, I'm sure other people must be in the same situation. (Though I'm not paying Github either, so perhaps I'm not a target customer...)
I've found it especially useful for fixing complex workflows or working with custom actions. It's not strictly needed, but it does speed up your workflow once you figure out the kinks.
* Scheduled actions basically never run anywhere close to on schedule. If you schedule something to run every 13 minutes, it may just run 1-3 times an hour, with random 30-minute to 1-hour waits between executions.
* Triggering a workflow as a result of an action from another workflow doesn't work if you're using the GITHUB_TOKEN as part of the action. Github does this to prevent accidental recursion, but it forces you to either use insecure PATs or rearchitect how to handle chained events: https://docs.github.com/en/actions/using-workflows/triggerin...
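One supported way to chain workflows without resorting to a PAT is the workflow_run trigger, which fires when another workflow completes. A sketch (the "Build" workflow name is illustrative):

    # downstream.yml: runs after the "Build" workflow finishes,
    # without needing a PAT to re-trigger events
    on:
      workflow_run:
        workflows: ["Build"]
        types: [completed]

    jobs:
      deploy:
        runs-on: ubuntu-latest
        # only proceed if the upstream workflow succeeded
        if: ${{ github.event.workflow_run.conclusion == 'success' }}
        steps:
          - run: echo "upstream build succeeded"

It doesn't cover every chaining scenario, but it avoids the GITHUB_TOKEN recursion restriction for the common "run B after A" case.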
I miss the days of setting the clock/date to avoid time bombs in software builds. "Back in my day, things were so much easier!" Now, I would not be surprised if the teams working on these kinds of lockouts are larger than the teams building the product.
I welcome their anti-recursion measures because I fought in the recursive clone wars and no one should have to support any systems that allow that.
I'm not talking about using this on a free tier or something. Github actions are billed monthly. This goes way beyond just not having a tight SLA. Precision isn't even the ask here. It's one thing if a job scheduled to run every 10 minutes occasionally takes 12 or 13 in between runs. It's a completely different matter if it takes an hour.
Having some safe-guards against unbounded recursion is one thing, but the escape hatch for it right now is to use less secure credentials. That's just madness.
As much intelligence as possible ought to be pushed down to the script level, where it can be tested and debugged, so that you're left with little more than a linear sequence of 4-5 commands in your YAML.
The debugging tooling on GitHub Actions is, frankly, abysmal. You need a third-party ngrok action just to get an SSH session.
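For interactive debugging, one workaround I've seen is a third-party tmate action that pauses the job and prints SSH connection details in the log. A sketch (the action is a community one, not something GitHub provides):

    jobs:
      debug:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          # pauses the job and prints an SSH address you can
          # connect to for poking around on the runner
          - uses: mxschmitt/action-tmate@v3

Remember to remove (or gate) the step before merging, since it holds the runner open until you disconnect.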
Also, I like that you build the hypothetical merge of branch + main. But that commit SHA is gone after that successful build. Give me a way to track this. I need to store artifacts related to this build, as I don’t want to build those again!
https://docs.github.com/en/actions/using-workflows/storing-w...
They do seem to be capable of saving most things people call artifacts. If you're looking for something more along the lines of caching parts of the build for future builds, you can adjust that pretty easily by changing what the cache key is based on.
example:
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
which lets you cache based on the hash of a specific dependency lock file instead of the commit SHA.

https://docs.github.com/en/actions/using-workflows/caching-d...
https://github.com/actions/cache
The one note here is that clearing the cache / cache management isn't straightforward currently (although they are improving it); there are a few acceptable workarounds, though.
Not sure if you were aware of these already.
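For context, a typical actions/cache step using that key, plus restore-keys as a fallback. A sketch for a Rust project (paths are illustrative):

    - uses: actions/cache@v3
      with:
        path: |
          ~/.cargo/registry
          target
        key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
        # fall back to the most recent cache for this OS when
        # the lock file hash has no exact match
        restore-keys: |
          ${{ runner.os }}-cargo-

The restore-keys prefix match is what keeps a lock file change from throwing away the entire cache.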
I believe YAML supports non-string keys, so your key would be parsed as the corresponding boolean value (true). If the pipeline then goes through JSON, where only string keys are supported, the serializer could simply stringify the key rather than raise an error, leading to "True".
And that’s one of the billion reasons why barewords are bad.
I think this has been fixed in YAML 1.2, but there are a lot of YAML 1.1 libraries out there, and they can't just switch since that could break user code.
That's when I found out the YAML spec explicitly says it's human-readable, not human-writeable. Our mistake was assuming YAML was a configuration format, when actually it's a data serialization format (again, spec explicitly says this) that is easy to read.
Now I only write YAML files with a YAML generator, because just running a hand-edited file through a parser may fall victim to a parser quirk.
More likely they hacked their YAML parser to treat "on" as a string.
At least that's what Travis CI folks did:
In fact, YAML does that terrible substitution for both keys and values.
In New projects I tend to use scripts to perform any required task for the ci and have github actions only run the script. Way easier to reason about.
Gitlab CI definitely handles this better with its "script" concept.
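A minimal workflow in that style, where the YAML is only a shim around a version-controlled script, might look like this (the script path is illustrative):

    name: ci
    on: [push]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          # all the real logic lives in the script, which can be
          # run and debugged locally without any CI involvement
          - run: ./scripts/ci.sh

The payoff is that a failing build can usually be reproduced with one local command instead of trial-and-error pushes.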
1. for Github to natively allow CI management for several repos in a centralized way, so repo setup can just be "select this CI config" instead of "copy this YAML file and change the project name in some places"
2. to mandate certain CI steps at the organization level (such as running `black`) so it isn't opt-in
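Reusable workflows (workflow_call) get partway to #1: each repo keeps only a thin caller file that delegates to a definition maintained centrally. A sketch (the org/repo and file names are illustrative):

    # .github/workflows/ci.yml in each repo: a thin shim that
    # delegates to a workflow maintained in a central repo
    on: [push]
    jobs:
      ci:
        uses: my-org/ci-workflows/.github/workflows/standard-ci.yml@main

It's still opt-in per repo, though, so it doesn't address #2's mandatory-steps requirement.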
If they allowed config to come from an internal setting not visible in the repo, I'm sure repos I collaborate with would start using that feature, and I would not be able to find their Actions configs.
(I work mostly on open source, which may lead to different patterns of access and such).
I haven't tried it yet though.
However, that then exposed me to the up thread bug about files. So now I also have to delete the file before creating it. Sigh.
If you're only free to run those workflows when they land in the default branch, does that also mean that the workflow that runs is the one from the default branch and if you change the workflow in a PR, it will only run the new workflow on merge?
I know there's something in here to permit non-owned commits (from an external contributor) to be tested against a trusted workflow from the main branch, but I don't think it has anything to do with workflow_dispatch. I would expect that if you're able to run workflows and target any branch, then if the workflow you run is the one contained in that branch, you'd be able to select any workflow that is named and defined in the branch's configuration.
I'm not saying that's how it works, I'm saying that's how I'd imagine it to work. If someone knows "the rule" that we can use to disambiguate this and understand it from the precepts that went into the design, maybe speak up here? I don't get it.
The premise of your question is wrong: you can trigger workflow_dispatch workflows in any branch via the UI if a workflow by that name also exists in the default branch, and only via the API if no workflow by that name exists in the default branch.
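For reference, a workflow_dispatch trigger, with the REST endpoint for the API path noted in a comment (sketch; the input is illustrative):

    # trigger via the UI (if this file exists on the default branch),
    # or via the API for any ref the file exists on:
    #   POST /repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches
    #   with a JSON body like {"ref": "my-branch"}
    on:
      workflow_dispatch:
        inputs:
          environment:
            description: 'Target environment'
            required: false
            default: 'staging'

    jobs:
      run:
        runs-on: ubuntu-latest
        steps:
          - run: echo "ref=${{ github.ref }} env=${{ github.event.inputs.environment }}"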
- [1] https://github.com/actions-runner-controller/actions-runner-...
My employer used some code from philips-labs to support ephemeral runners. Works great after a few customizations.
I wrote a shell script and a very small Go program to support ephemeral MacOS runners on-premise.
These things are so fun to work on.
I'm 100% sure they don't use this internally, as these are glaring issues that impact anyone using the self-hosted runner. They also recommend running the container as root [1] instead of designing something more secure and sane.
1: https://github.com/actions/runner/issues/434#issuecomment-61...
The result is that root (or another user) inside the container can write root-owned files, because they have the same UID as root on the container host.
My employer runs an orchestrator and destroys each runner VM after a single job, so this only bites the user who causes it, not anyone else.
The checks associated with the workflow don’t run and stay in a pending state, preventing the PR from being merged.
The only workaround I’m aware of is to use an action such as paths-filter [3] instead at the job level.
A further, related frustration/limitation: you can _only_ set the "paths" property [2] at the workflow level (i.e. not per-job), so those rules apply to all jobs in the workflow. Given that you can only build a DAG of jobs (i.e. "needs") within a single workflow, it makes it quite difficult to do anything non-trivial in a monorepo.
[1]: https://docs.github.com/en/repositories/configuring-branches...
[2]: https://docs.github.com/en/actions/using-workflows/workflow-...
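A job-level filter along the lines of the paths-filter action mentioned above might look like this (sketch; the filter name and paths are illustrative):

    jobs:
      changes:
        runs-on: ubuntu-latest
        outputs:
          docs: ${{ steps.filter.outputs.docs }}
        steps:
          - uses: actions/checkout@v3
          - uses: dorny/paths-filter@v2
            id: filter
            with:
              filters: |
                docs:
                  - 'docs/**'
      build-docs:
        needs: changes
        # each filter is exposed as a 'true'/'false' string output
        if: needs.changes.outputs.docs == 'true'
        runs-on: ubuntu-latest
        steps:
          - run: echo "docs changed"

Unlike the workflow-level "paths" key, the skipped jobs still report a conclusion, so required checks don't hang in a pending state.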
Of course, running various commands on virtual machines and shells is inherently messy, but GHA could have done a lot to hide that. Instead I feel like I'm forced to mix YAML, bash, PowerShell, and various higher-level scripting languages (that came with the projects) into an unholy concoction that is hard to get right (return codes, passing values, and escaping come directly to mind) and even harder to debug, because it all runs somewhere else (act helps, a little, but it doesn't properly replicate the GHA environment).
I kind of wish I could write all my workflows cross-platform from the start in some well-known but full-fledged scripting language. (Which of course I could, and just use GHA to call that script.) What options are out there to make this whole thing less brittle?
"ubuntu-latest" isn't necessarily the latest Ubuntu; it's the latest version that has been fixed up to the point of having no known workflow-breaking issues, I believe.
https://github.com/actions/runner-images/
I rely on that repo to build my own images, and it is a frequent cause of failed builds. I'm going to convert almost all of it to installation via Homebrew instead, I think. Works well for macOS, anyway.
> While this behavior can be changed by passing ignoreReturnCode as the third argument ExecOptions, the default behavior is very surprising.
This is the same behavior as Node's child_process exec when wrapped by util.promisify [1]. If something returns a promise (async func), it should be expected that it has the possibility of being rejected.
[1] https://nodejs.org/api/child_process.html#child_processexecc...
https://github.com/rhysd/actionlint
> oops.yaml:20:24: property "jobone" is not defined in object type {jobtwo: {outputs: {}; result: string}} [expression]
As it turns out, images are pulled at the start of the run, which means your docker login will have no effect if you're currently bumping into these pull limits. This is made worse by the fact that the images themselves are controlled in the remote actions you're using, not something in your own codebase.
So you're left with either: forking the action and controlling it yourself, or hoping the maintainer will push to the Github registry.
For example, I'd like to build an action that triggers a documentation update based on the path and filename that is changed.
on:
push:
branches:
- main
paths:
- */README.md
But there does not appear to be a way to pass a list of changed paths to the job.

on:
push:
branches:
- main
paths:
- docs/**
- README.md
I use something similar for triggering different app workflows in a monorepo.

EDIT: Or in multiple directories, but grouped into multiple documentation directories.
on:
push:
branches:
- main
paths:
- package1/docs/**
- package2/docs/**
- package3/docs/**
- README.md