Round-trip times for GitHub Actions are too high: sometimes you're waiting for 10 minutes just to run into a dumb typo, empty string-evaluated variable or other mishap. There's zero IDE support for almost anything beyond getting the YAML syntax itself right.
We have containerization for write-once-run-anywhere and languages like Python for highly productive, imperative descriptions of what to do (without the footguns Bash has). The main downside I see is things getting messy and cowboy-ish. That's where frameworks can step in. If the Dagger SDK were widely adopted, it'd be as exchangeable and as widely understood/supported as, say, GitHub Actions itself.
We currently have quite inefficient GHA pipelines (repeated actions etc.) simply because the provided YAML possibilities aren't descriptive enough. (Are Turing-complete languages a bad choice for pipelines?)
What's unclear to me from the article and video is how this can replace e.g. GitHub Actions. Their integration with e.g. PR status checks and the like is a must, of course. Would Dagger just run on top of a `ubuntu-latest` GHA runner?
I'd say that non-Turing-complete languages are a bad fit for pipelines. Even a mildly complex pipeline will eventually need loops and conditionals.
Better python than some originally-a-config-YAML language turned into an imperative monstrosity with loops and conditionals bolted on.
I don't understand this move to define infra and CI imperatively, with tool vendors moving to support umpteen languages for their users... Say what the world should look like, not how to get there?
You loop and branch in ansible/terraform
I hate that setup, but I also have a hard time thinking of something else.
You guessed correctly: Dagger does not replace Github Actions, they are complementary. The Dagger project itself uses Github Actions and Dagger together :)
> Would Dagger just run on top of a `ubuntu-latest` GHA runner?
Sure, you can do that. The only dependency to run Dagger is Docker or any OCI-compatible runtime. So, assuming the `ubuntu-latest` runner has docker installed, you can just execute your Dagger-enabled tool, and it should work out of the box.
For example here's our own github workflow for testing the Python SDK (you can look around for other workflows): https://github.com/dagger/dagger/blob/main/.github/workflows...
Note that the word "dagger" doesn't even appear, since Dagger is embedded as a library (in this case, using the Go SDK). As far as GHA is concerned, it's just executing a regular binary.
Yes and no. If you write “steps” in yaml, you are doing it wrong and might as well be using a Turing complete imperative language.
On the other hand, linear steps aren't always the best fit for a pipeline to begin with. Better to have a dependency tree, like Makefiles but more advanced, that the CI engine can execute in the most optimal order by itself, retrying failing steps without restarting from the beginning. Just keep the number of transitions between declarative and Turing-complete layers few; nobody likes templating strings to inject data into a small script.
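A minimal sketch of that idea in Python (a toy runner for illustration, not any real CI engine's API): resolve dependencies first, then retry each failing step in place instead of restarting the whole pipeline.

```python
def run_dag(steps, deps, retries=2):
    """Run `steps` (name -> callable) respecting `deps` (name -> prerequisite
    names), retrying each failing step instead of restarting the pipeline."""
    done = set()

    def run(name):
        if name in done:
            return
        # prerequisites first, depth-first
        for dep in deps.get(name, []):
            run(dep)
        # retry just this step on failure
        for attempt in range(retries + 1):
            try:
                steps[name]()
                break
            except Exception:
                if attempt == retries:
                    raise
        done.add(name)

    for name in steps:
        run(name)
```

A real engine would also parallelize independent branches; the point here is only the shape: steps plus a dependency relation, with retries scoped to a single node.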
CI paths are underdeveloped, which IMO is a huge miss: you pay a developer premium and potentially risk every iteration. Keep the GHA glue light and invest in your code, not your lock-in.
Pretty much everything is just a @task decorated python function or a KubernetesPodOperator but sometimes I imagine I'll write CI-focused operators, e.g. terraform.
Personally, I find it pretty interesting to see how different tools in the space handle things, even without getting into where and how your pipelines run. It's nice to see the increasing move towards defining everything in "code", regardless of whether it's a DSL or a Turing-complete language - I'll take a versioned Jenkinsfile over messing about in the UI most days.
> I've long wished to be just writing (ideally) Python to define and run CI/CD pipelines. YAML is simply hell, `if` keys in such data description languages are cruel jokes.
I'd say that YAML is passable on its own, but it gets more and more inadequate the harder the things you're trying to do become.
Thankfully, I've been able to keep most of my CI pipelines relatively simple nowadays, along the lines of:
- initialize some variables that don't come out of the box from the CI solution
- parse or process any files that are needed for the build/action, such as attaching metadata to project description
- do the build (nowadays typically just building an OCI container), or whatever else the pipeline needs to do (since you can do more than just build applications)
- save any build artefacts, push containers, do logging or whatever else is needed
Even navigating between build steps is mostly taken care of by the DAG (directed acyclic graph) functionality, and choosing whether a build needs to run is typically done declaratively in the step description (though I haven't found any solution that handles this well, e.g. complex conditions).

That said, there's basically nothing preventing me or anyone else from including a Python script, a Go program, or even Bash scripts (if you don't want to think about getting an environment where most of the other languages are available, at the cost of Bash's footguns) and just running those. Then control flow, looping, and using additional libraries or tools suddenly become much easier.
> Round-trip times for GitHub Actions are too high: sometimes you're waiting for 10 minutes just to run into a dumb typo, empty string-evaluated variable or other mishap. There's zero IDE support for almost anything beyond getting the YAML syntax itself right.
In regards to writing correct pipelines, I really liked how GitLab CI lets you validate your configuration and even shows what the pipeline would look like, without executing anything, in their web UI: https://docs.gitlab.com/ee/ci/lint.html I think most tools should have something like that, as well as pipeline visualizations - anything to make using them more user-friendly!
As for the cycle times, if most of what the build or CI action (whatever it might be) needs is already described as "code", you should be able to run the steps locally as well, either with a separate wrapper script for the stuff that you won't get locally (like CI injected environment variables, which you can generate yourself), or with a local runner for the CI solution.
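As a sketch of the wrapper-script idea: fake the variables the CI server would inject, then delegate to the same commands the pipeline runs. The variable names below are made up for illustration.

```python
import os
import subprocess
import sys


def run_step(cmd, extra_env=None):
    """Run one pipeline step locally with the env vars CI would inject."""
    env = {
        **os.environ,
        "CI": "true",
        "CI_COMMIT_SHA": "local-dev",  # stand-in for the server-injected value
        **(extra_env or {}),
    }
    # check=True makes a failing step raise, like a failing CI job
    return subprocess.run(cmd, env=env, check=True).returncode
```

With that in place, `run_step(["make", "build"])` locally and the CI step itself can share the same entry point.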
This is why I mentioned Drone, which has something nice in this regard: https://docs.drone.io/cli/drone-exec/
But generally, for most simpler pipelines (like the example above), you can even just set up an IDE run profile. In my case, I typically version a few run configurations for JetBrains IDEs, that can build containers for me, run tests and do other things. Sometimes the local experience can be a bit better than what you get on the CI server: if you have any integration tests that automate a browser with Selenium (or a more recent solution), you can essentially sit back and watch the test execute on your machine, instead of having to rely on screenshots/recordings on the server after execution.
Of course, much of this would gradually break down, the more complicated your CI pipelines would become. The only thing I can recommend is KISS: https://en.wikipedia.org/wiki/KISS_principle Sadly, this is a non-solution when you're not in control of much of the process. Here's hoping that Dagger and other solutions can incrementally iterate on these aspects and make CI/CD easier in the future!
I have a few examples in https://github.com/helderco/dagger-examples and I plan to add more.
There's also reference documentation at https://dagger-io.readthedocs.io/ so you can get a bird's-eye view of what's possible.
You can do some very fancy things, and I'm sure Dagger would be a nice addition to that. But honestly the thing I'm missing most is a nice UI above all that. How to present realtime and historic logs (and state) for multiple concurrent actions is still not as easy as I think it could be.
We have plans to solve that :)
We also released an update to the Go SDK a few days ago: https://dagger.io/blog/go-sdk-0.4
> Using the SDK, your program prepares API requests describing pipelines to run, then sends them to the engine. The wire protocol used to communicate with the engine is private and not yet documented, but this will change in the future. For now, the SDK is the only documented API available to your program.
Does it mean the sdk is making a round trip to the dagger API remotely somewhere, or is the round trip to a locally running docker container?
The short answer, for now, is: "it's complicated" :) There's a detailed explanation of the Dagger Engine architecture here: https://github.com/dagger/dagger/issues/3595
To quote relevant parts:
> The engine is made of 2 parts: an API router, and a runner.
>
> - The router serves API queries and dispatches individual operations to the runner.
> - The runner talks to your OCI runtime to execute actual operations. This is basically a buildkit daemon + some glue.
>
> The router currently runs on the client machine, whereas the runner is on a worker machine that will run the containers. This could be the same machine but typically isn't.
> Eventually we will move the router to a server-side component, tightly coupled and co-located with the runner. This will be shipped as an OCI image which you will be able to provision, administer and upgrade yourself to your heart’s content. This requires non-trivial engineering work, in order to make the API router accessible remotely, and multi-tenant.
Also, from the post:
> Get started with the Dagger Go SDK, Dagger Python SDK, or let us know which SDK you're looking for.
Are you guys seeing many requests for a Dagger Rust SDK yet? :)
As someone who writes mostly in Rust, I’d love to get rid of yaml definitions of CI pipelines and instead define pipelines using Rust
How is progress of builds, observability, etc being tackled?
Dagger is meant for both development and production. Note that Dagger doesn't run your application itself: only the pipelines to build, test and deploy it. So, although the project is still young and pre-1.0, we expect it will be production-ready more quickly because of the nature of the workloads (running a pipeline is easier than running an app).
> How is progress of builds, observability, etc being tackled?
All Dagger SDKs target the same Dagger Engine. End-users and administrators can target the engine API directly, for logging, instrumentation, etc. The API is not yet publicly documented, but will be soon.
We're also building an optional cloud service, Dagger Cloud, that will provide a lot of these features as a "turnkey" software supply chain management platform.
B) Can you mock pipeline events?
We support both sync and async mode. The Python ecosystem is in a state of flux at the moment between sync and async, so it seemed like the best approach to offer both and let developers choose.
> B) Can you mock pipeline events?
Could you share a bit more details on what you mean, to make sure I understand correctly?
It's not a requirement, but it's simpler to default to one and mention the other. You can see an example of sync code in https://github.com/helderco/dagger-examples/blob/main/say_sy... and we'll add a guide in the docs website to explain the difference.
Why async?
It's more inclusive. If you want to run dagger from an async environment (say FastAPI), you don't want to run blocking code. You can run the whole pipeline in a thread, but then you're not really taking advantage of the event loop. It's simpler to do the opposite: if you run in a sync environment (like all our examples, running from the CLI), it's much easier to just spin up an event loop with `anyio.run`.
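To illustrate the "spin up an event loop" point, here's a minimal sketch using the stdlib's `asyncio.run`, which plays the same role as the `anyio.run` mentioned above (the function names are invented; real code would await Dagger API calls inside the coroutine):

```python
import asyncio


async def pipeline() -> str:
    # placeholder for real async pipeline work (e.g. awaiting a Dagger client)
    await asyncio.sleep(0)
    return "done"


def run_pipeline() -> str:
    # sync entry point: spin up an event loop just for this one call
    return asyncio.run(pipeline())
```

Sync callers invoke `run_pipeline()`; async callers simply `await pipeline()` on their existing loop.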
It's more powerful. For most examples the difference is probably small, unless you're using a lot of async features: just remove the async/await keywords and the event loop. But you can easily reach for concurrency when there's a benefit. While the dagger engine ensures most of the parallelism and efficiency, some pipelines can benefit from doing this at the language level. See this example where I'm testing a library (FastAPI) with multiple Python versions: https://github.com/helderco/dagger-examples/blob/main/test_c.... It has an obvious performance benefit compared to running "synchronously": https://github.com/helderco/dagger-examples/blob/main/test_m...
Dagger has a client and a server architecture, so you're sending requests through the network. This is an especially common use case for using async.
Async Python is on the rise. More and more libraries are supporting it, more users are getting to know it, and sometimes it feels very transitional. It's very hard to maintain both async and sync code. There's a lot of duplication because you need blocking and non-blocking versions for a lot of things like network requests, file operations and running subprocesses. But I've made quite an effort to support both and meet you where you're at. I especially took great care to hide the sync/async classes and methods behind common names so it's easy to change from one to another.
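One common way to hide both flavors behind common names, sketched with toy classes (illustrative only, not the SDK's real API): same method name, two classes, so switching is mostly a matter of adding or removing `await`.

```python
import asyncio


async def _do_run(task: str) -> str:
    # shared implementation; real code would make network requests here
    await asyncio.sleep(0)
    return f"ran {task}"


class Client:
    """Blocking facade: spins up a private event loop per call."""

    def run(self, task: str) -> str:
        return asyncio.run(_do_run(task))


class AsyncClient:
    """Non-blocking facade: awaits on the caller's event loop."""

    async def run(self, task: str) -> str:
        return await _do_run(task)
```

A sync caller writes `Client().run("build")`; an async caller writes `await AsyncClient().run("build")` with the same method name.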
I'm very interested to know the community's adoption or preference of one vs the other. :)
For this reason, I love using the AWS CDK. Being able to model things in TypeScript was so much nicer than the janky flow of always feeling lost in some Ruby/JSON/YAML IaC template monstrosity.
Curious how Dagger differentiates itself from AWS CDK or cdktf.
The main difference is that Dagger focuses on running your entire CI/CD pipeline as a DAG of operations running in containers; whereas IaC tools focus on managing infrastructure state. The typical integration is Dagger would run your IaC tool in a container, and integrate that into a broader pipeline with well-defined inputs and outputs. Dagger itself is stateless except for its cache: infrastructure state management is handed off to your IaC tool.
One of the reasons we use proprietary pipelines is the automatic 'service principal' login benefits that exist on e.g. Azure Devops, where the pipeline doesn't need to authenticate via secrets or tokens, and instead the running machine has the privileges to interact directly with Azure. (See https://learn.microsoft.com/en-us/azure/devops/pipelines/tas... particularly "addSpnToEnvironment" parameter). I'm sure other clouds have something similar.
Running the same pipeline locally, there are ways to synthetically inject this, but there's no ready support in your framework yet for this (as ideally you'd have an 'authentication' parameter that you can set the details for). Is something like this planned?
> Running the same pipeline locally, there are ways to synthetically inject this, but there's no ready support in your framework yet for this (as ideally you'd have an 'authentication' parameter that you can set the details for). Is something like this planned?
Yes, we are watching this new pattern of authorizing CI runners very closely. We fully intend to support it, as it seems inevitable that this model will become the standard eventually.
You may actually be able to implement this pattern now, with the current API.
I'm not familiar with this Azure-specific feature, but in the case of OIDC tokens, it's typically as simple as retrieving the ephemeral token from a file or environment variable, and injecting it as an input to your pipeline.
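As a sketch of that pattern (the environment variable name here is hypothetical; real runners document their own):

```python
import os


def ambient_token() -> str:
    """Fetch the ephemeral token a CI runner exposes to the job, to be injected
    as a secret input to the pipeline instead of a long-lived credential."""
    token = os.environ.get("CI_OIDC_TOKEN", "")  # hypothetical variable name
    if not token:
        raise RuntimeError("no ambient credentials; fall back to explicit secrets")
    return token
```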
Would you be able to share some more reading material on the "ways to synthetically inject" that you mentioned? We could use that information to devise a plan for supporting it. Also happy to discuss this directly on our Discord server, if you're interested!
Thanks again for the feedback.
So, no matter what language we had chosen for our first SDK, we would have eventually hit the same problem. The only way to truly solve the "CI/CD as code" problem for everyone, is to have a common engine and API that can be programmed with (almost) any language.
Complete gamechanger to be able to move away from jenkins groovy CPS hell
We have extensive jenkins pipelines, and the "platform team" supports multiple types of build/release models. Most of the infrastructure is written in TF.

- 3 languages
- 5 "types" (http/cron/flink/etc)
As such we've got huge groovy libraries that codify the build/release process.
I'm using dagger to re-implement most of these libraries, but in a way that I can test and develop locally without getting into tiny commit/rebuild hell.
Using a language like golang with an engine like buildkit behind it, allows me to more easily test each step of the pipeline, without running the whole thing.
I sat down and extracted the compiler/link flags, and then wrote a Python script to do the build. The code was smaller, and built faster.
Every “build” engine evolves from being a simple recipe processor into the software equivalent of a 5-axis CNC mill. Some things should not succumb to one-size-fits-all.
Maintainability? As soon as you start using any OO features, you gradually end up in a modern-day goto scenario. Then another round of guessing starts, especially if you have globals.
I find it easier to read a Lisp or a (mostly functional) C source than Python. But I guess we have to use Python these days.
Btw, what you're saying sounds like the usual Lisp meme: for every … there is a tiny untested Lisp engine in there.
what if I told you it was every machine
Doesn't this introduce a whole dimension of extra stateful complexity compared to configuration YAML?
If you've already found yourself integrating a Makefile in a CI job, and figuring out the best mapping of Make rules to CI job/step/workflow: this is exactly the same. Ultimately you're just executing a tool which happens to depend on the Dagger engine. How and when you execute it is entirely up to you.
For example, here's the Github Actions job we use to test the Dagger Python SDK. It executes a custom tool written in Go: https://github.com/dagger/dagger/blob/bd75d17f9625f837d7a2f9...
However, it is not clear to me what the benefit of using it is over calling commands like docker, pytest and kubectl from Python with Plumbum or a similar library. Add Fire and it is trivial to create a complete DevOps CLI for your app that can run locally or be called from GitHub Actions.
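For reference, the subprocess-based approach the parent describes can be as small as this sketch (using `sys.executable` as a stand-in for tools like docker or kubectl):

```python
import subprocess
import sys


def sh(*cmd: str) -> str:
    """Run one CLI tool and return its stdout, raising on a non-zero exit."""
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout


# stand-in invocation; in practice this would be e.g. sh("docker", "build", ".")
out = sh(sys.executable, "-c", "print('hello')")
```

The trade-off Dagger adds on top of this is the container-level caching and DAG execution, which a plain wrapper like `sh` doesn't give you.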
I recently used act[0] to locally test my GitHub Actions pipelines and it worked okay. The fact that I could interact with the Dagger API via a Python SDK could be even more convenient, will definitely try!
I'm working on a similar project, portable pipelines built with Nim :)
The idea behind this project is to:
1. Dev and prod pipelines run the same code; the only difference is where they are executed. This allows easy troubleshooting and faster development.
2. Each script's output is tested by default.
3. Decouples the code from the image/container. This allows it to be embedded into any CI stack.
4. Complex retry rules for specific scripts, rather than the whole pipeline or even the step (stage).
A typical pipeline definition looks like this:
```yaml
kind: pipeline
version: 1
policy: strict
steps:
  - hello
hello:
  - script: echo "hello world"
    expected_output: "hello world"
    expected_return_code: 0
    retries: 5
```
```bash
./takito --tasks tasks.yml --step hello
```

It goes step by step through the getting started guide from the Dagger Python SDK docs.
Proprietary CI/CD systems are a waste of time. If it's more complicated than a shell script, you need to strip it down.