Once you have that under your belt it's not hard to work out how Docker itself works and how you can use it to fulfill the sort of CI/CD objectives you have outlined. Docker itself isn't important, the semantics of containerization are.
Something that Docker (and Docker-like things) takes massive advantage of is overlay filesystems like AUFS and overlayfs; you would do well to understand these (at least skin deep).
Finally, networking becomes really important when you start playing with network namespaces; you should be somewhat familiar with at least the Linux bridge infrastructure and how Linux routing works.
Good luck!
It's the most roundabout way - and since OP is conflating Docker with CI/CD and referencing PHP and Node, it's probably safe to say they aren't looking for a deep dive.
Plus - knowing how it runs under the hood doesn't mean you know how to use docker itself.
I think it's safe to say they do want a deep dive but might be forgetting to mention some of their reasons.
For someone familiar with docker, maybe it is good to start from the other side and work backwards.
You would think that it would still be advantageous to have a detailed understanding of what is going on in the stack, but that actually causes problems when you make a suggestion that no one else understands.
And this would be great advice!
Docker implemented in around 100 lines of bash: https://github.com/p8952/bocker
This is the most mindblowing example for enterprise security teams that think Docker is a new threat on a single tenant Linux host.
No, buddies, all this stuff is already there. If you were fine with your visibility before*, you're still fine. Go find a real problem while people play with their developer dopamine.
* NARRATOR: They shouldn't have been.
For example, Docker has absolutely zero knowledge of a branch's lifetime, or even of branches at all. This is something you have to design using the existing capabilities of Docker together with features or existing integrations provided by GitHub or Bitbucket.
Of course knowing docker deeper will help you understand these boundaries better and use them.
One secret is that there is actually not much to it, most things are just variations of docker run and various tricks within docker build, sprinkled with some volume and image management like tagging and pruning. Other orchestrators like GH Actions, Compose, Kubernetes etc can be seen as building around these basic blocks.
If you already know these basics, you are probably going to learn faster by getting your hands dirty, trying to solve the scenarios you need, rather than binge watching tutorial#187 on YouTube.
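To make that concrete, the day-to-day surface area is roughly this handful of commands (the image name, port, and registry below are just placeholders):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t myapp:dev .

# Run it, publishing a port and removing the container on exit
docker run --rm -p 8000:8000 myapp:dev

# Tag and push the same image under a release name
docker tag myapp:dev registry.example.com/myapp:1.0.0
docker push registry.example.com/myapp:1.0.0

# Housekeeping: remove dangling images and unused volumes
docker image prune -f
docker volume prune -f
```

Everything else - Compose files, CI jobs, Kubernetes manifests - is largely a way of declaring and orchestrating these same operations.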
1) your reverse proxy, Nginx/Caddy
2) your "app", or API, whatever. pick whatever you want, a Rails API, a Phoenix microservice, a Django monolithic app, whatever you want.
3) your database. Postgres, whatever
4) Redis - not just for caching. Can use it for anything that requires some level of persistence, or any message bus needs. They even have some plugins you can use (iirc, limited to enterprise plans... maybe?) like RedisGraph.
5) elasticsearch, if you need real-time indexing and search capabilities. Alternatively you can just spin up a dedicated API that leverages full text search for your database container from 3)
6) ??? (sky is the limit!)
I prefer docker compose to Kubernetes because I am not a megacorp. You just define your different services, let them run, expose the right ports, and then things should "just work"
Sometimes you need to specifically name your containers (like naming redis container `redis`, and then in your code you will have to use `redis` as the hostname instead of `localhost` for example).
basically That's It (tm)
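A minimal Compose file for the stack sketched above might look like this (service names, ports, and images are illustrative; note how the app reaches Postgres and Redis by their service names, per the hostname point above):

```yaml
version: "3.8"
services:
  proxy:
    image: nginx:alpine
    ports:
      - "80:80"
  app:
    build: .
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
      REDIS_URL: redis://redis:6379
    depends_on:
      - db
      - redis
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
  redis:
    image: redis:7
```

`docker compose up` brings the whole stack up on one machine with the services on a shared network.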
Kubernetes scales down pretty well, there's a few canned single-machine versions. I've been playing with k3s lately at work (if this works out, it should lead to things running on a proper HA cluster), and I'll probably also move over some of the standalone containers I have at home (which are all single-instance standalone things, currently using podman as a systemd service).
In which cases would you prefer Kubernetes? Or rather, why would a megacorp prefer it over standard Docker?
"Docker Compose" (not the same thing as Docker) works great at single-machine scale and for local development environments, but isn't really designed to scale much beyond this to production environments, multiple servers, data centers, etc., which Kubernetes is. This isn't to say you couldn't deploy something to production with Compose; it's just not very likely outside of small personal projects - there are heaps of features in Kubernetes that simply don't exist in Compose.
Generally you'd find a Docker Compose configuration for easy local development environment deployments and a Kubernetes configuration for managing the production environments, although there are no hard and fast rules here. Compose works best where the services all fit on the same box, which is rare for a business of almost any size in production, but common for local dev work.
I also prefer Compose for personal projects and local development, but it simply wouldn't work at any place I've worked for production deployments.
that's just my guess though :) Happy hacking!
How did you gather this? OP only mentions CI/testing, nothing about deployment. That's a long distance from needing kubernetes.
btw, imo, nearly all commercial shops' use cases are a long distance from needing kubernetes, but that doesn't seem to be stopping very many people... :)
https://github.com/nickjj/docker-node-example is an up-to-date Node example app[0] that's ready to go for development and production and sets up GitHub Actions. Its readme links to a DockerCon talk from about a year ago that covers most of the patterns used in that project, and if not, some of my more recent blog posts cover the rest.
None of my posts cover feature branch deployments tho. That's a pretty different set of topics mostly unrelated to learning Docker. Implementing this also greatly depends on how you plan to deploy Docker. For example, are you using Docker Compose or Kubernetes, etc..
[0]: You can replace "node" in the GitHub URL with flask, rails, django and phoenix for other example apps in other tech stacks.
1. Rebuilding the entire container which often involves stopping and starting it, etc.
2. Manually running commands that copy the files into the container. This is irritating because if I forget which files I changed or forget to run the copy command I end up with a "half updated" container.
3. SSHing into the container. This is irritating because I have to modify the port layout and permissions of the container and later remember to "restore" them when I'm "done" making the container.
Thanks!
Of course, this means that you'll not just stop, but completely destroy the old container and start a new one (created from the new image). You can get this to happen without service downtime by using the very same techniques that you'd use in any high availability / multi-server environment (rolling upgrades, canary deployments, etc.).
If you need to have some files that persist across these upgrades, then you use volumes and/or bind mounts. These allow you to have folders that persist independently of the container's lifecycle. They are typically used to store things like a sqlite database that the container uses, the set of configuration files that you can edit on a per-instance basis, etc.
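As a sketch of the volume point, a named volume surviving an upgrade looks like this (volume, container, and image names are made up):

```shell
docker volume create app-data

# First version of the app writes its state into the volume
docker run -d --name app -v app-data:/var/lib/app myapp:1.0

# Upgrade: destroy the container, keep the volume, start the new image
docker rm -f app
docker run -d --name app -v app-data:/var/lib/app myapp:1.1
```

The container is disposable; the volume carries the sqlite database, config files, etc. across the replacement.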
Finally, there's a big case where you ignore all of the above: when you use containers as a development tool. In that case, particularly for "interpreted" languages (Python, PHP, Ruby, etc.), it becomes extremely useful to bind-mount your programs' sources inside a development container. You can then develop normally but also change the entire system where your app runs extremely easily. You can also keep different environments (language version, libraries, configuration of all those) for different projects without any chance of conflicts between them, etc.
For development, you mount a location on your host hard drive to a location inside the container. There are different mount options that control which changes propagate in which direction, but two-way propagation is available and works fine.
Once you have this set up, any changes you make on your host system are automatically reflected in your containers volume. It's very low stress and low overhead, honestly.
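With the plain CLI, this is just a bind mount of the working directory (the paths and base image here are illustrative):

```shell
# Mount the current source tree into the container at /app; edits on
# the host show up inside the container immediately, and vice versa
docker run --rm -it -v "$PWD":/app -w /app python:3.12 python main.py
```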
This works, however, only if you know beforehand which files you’re going to update frequently.
The data you work with should be stored externally (volume, database, accessed via API, …). You don't keep persistent state of your workload in the container.
RUN adduser -u $UID...
USER node
Then in the docker-compose you bind mount your current directory:

volumes:
  - ./:/app/

You want to learn more about your CI system and then try things out until you hit the harder / edge cases.
Some things to try or think about
- Push two commits quickly, so the second starts while the first is running.
- Rebuild a commit while the current build is executing. Which one writes the final image to the registry? How do you know?
- How do you tag your images? If by branch name, how do you know which build produced an image? If by commit, how do you know which branch?
- Do you want to run the entire system per commit, shutting it down at the end of a build? Do you want to run supporting systems for the life of a branch? How do you clean up resources and not blow up your cloud budget? Do you clean up old containers each build (from old commits on this branch)? How do you clean up containers after a branch is deleted?
- Build a CI process that triggers subjobs, because eventually you may want to split things up. If you push a commit before the last build's subjob triggers, does it get the original commit or the latest commit? CI systems have nuances, Jenkins always fetches the latest commit when a job starts for a branch, so you may not be testing the code you think you are.
- Do you use a polyrepo or monorepo setup? For poly, how do you gather the right version of components for your commit? For mono, how do you build only what is necessary while still running a full integration test?
- Should you be doing integration testing inside or outside of the build system?
One of the reasons content that addresses these questions is harder to find is that the answers are highly dependent on the situation and tools. My solutions to many of these are handled with a mix of CUE and Python. You'll be writing code in most solutions.
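On the tagging question above, one practical wrinkle: Docker image tags can't contain slashes, so branch names like feature/login need sanitizing before they can become tags. The naming scheme here is just an illustration:

```shell
BRANCH="feature/login"

# Replace characters that are invalid in image tags
SAFE_TAG=$(echo "$BRANCH" | tr '/' '-')
IMAGE="myapp:$SAFE_TAG"
echo "$IMAGE"

# In CI you would then run something like:
#   docker build -t "$IMAGE" . && docker push "$IMAGE"
```

Whether you then also tag by commit SHA (and how you map one back to the other) is exactly the kind of design decision the list above is pushing you to make explicitly.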
Start step by step.
Before building on GitHub Actions, build locally.
See if you can build and tag an image with the git SHA. Then run your automated test command against the image/container.
Then see if you can write a GitHub Action doing exactly what you did locally.
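A sketch of that first local step; the app name and test command are placeholders:

```shell
# In a real checkout this would come from git:
#   SHA=$(git rev-parse --short HEAD)
SHA="a1b2c3d"
IMAGE="myapp:$SHA"
echo "building and testing $IMAGE"

# Then, with a Docker daemon available:
#   docker build -t "$IMAGE" .
#   docker run --rm "$IMAGE" npm test
```

Once this runs locally, the CI job is mostly a matter of transcribing the same commands into the workflow file.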
Random blog posts have been more helpful in my experience vs youtube videos.
This is the reason I gave up on learning Docker properly. I had 3 devices at my disposal - an M1 Mac, a Windows 10 PC, and an RPi. The random errors I was getting made me quite frustrated. Keep a code diary and document your mistakes and solutions.
Also get a VPS. Never ever try a serverless solution when trying to properly learn docker. Also, do not try to do anything that involves GPU processing.
It all boils down to that step 0: make sure you have a "Docker-compatible" device. People are successfully running GPU processes using Docker. However, Docker is not "run everything, everywhere" as some people may think it is.
Really understanding the idea of containerization is fundamental. If someone tries to dabble in GPU processing in their first or second week of learning docker, they will be surprised how difficult troubleshooting docker is.
Like do you just run e.g. nodejs or javac locally and then "deploy" to a container, or do you have a development container where you code "in it", or is a new container built on every file change and redeployed?
At my current place of work, all of this is totally abstracted away so no idea how real world people do it!
There are certainly patterns you can use to run things like auto-compiling code in a container as you save (think Angular app in dev-server mode) to ensure a consistent dev environment. However, I usually find that you're fighting the operating system and dev tooling on things like volume syncing, which makes it not actually a net benefit. Though if I were to tackle getting my 100 devs consistent, and I knew they all have the same OS layout, it probably would be worth it.
The only time I tend to code "in" a container is when it's a very large code base with complicated dev tooling, or tooling I need to containerize to avoid clashing with my OS. In that case I build a "build container" and run it with a volume mount pointing to my working directory, rather than do a docker build. For a large code base, the purist "build with docker" approach requires copying that whole build context each time, causing the build to take forever and thrashing disk space.
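A sketch of that build-container pattern (the image name, Dockerfile.build, and make target are all assumptions):

```shell
# Build the toolchain image once; it contains compilers and tools,
# not your source code
docker build -f Dockerfile.build -t myapp-buildenv .

# Mount the working tree instead of copying it in as build context
docker run --rm -v "$PWD":/src -w /src myapp-buildenv make
```

The key difference from `docker build` is that no build context is sent to the daemon; the container just operates on your checkout in place.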
There's a shared repository we use that hosts all our migrations and has a script to refresh a set of schema files that define our current schema by running the migrations against a fresh, empty database container; this repository is referenced by our code repositories as a git submodule.
We have one shared database that multiple apps use, and the above setup has worked out reasonably well for us. It's not terribly fancy, and I'm sure there are better ways out there we haven't discovered, but we have a good amount of flexibility.
Also, that's all just for local manual testing/exploration -- we also use Java's "testcontainers", which spins up its own containers (though basically the same idea) for automated tests to run against. Testcontainers lets you specify how it restarts the containers -- after each test class, after each test run, or not at all. Restarting the container is pretty slow, so we have it set up to just drop and recreate the relevant databases within the container.
For deployment, we've used different platforms that deploy apps in containers, but we don't manage the containers directly.
Also, I've tried the "develop in a container" thing but only on rare occasions, such as to work around a MacOS bug that makes CIFS file sharing horribly slow. Technically that was a Vagrant VM, not a container. Haven't had much luck with it otherwise -- it seemed like more trouble than it was worth (it's the awkward file syncing that's the barrier for me).
IMHO there’s still room for high-quality blog posts about containers. E.g. there are lots of gotchas that could be explained, like how ordering the commands in your Dockerfile suboptimally means you will not get the benefit of layer caching when building the image. And why use multi-stage builds, etc.
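For instance, here is the caching gotcha for a Node app (file names are just the usual convention): copying the whole source tree before installing dependencies invalidates the dependency layer on every code change, whereas this order keeps it cached. The second stage then keeps the toolchain out of the final image:

```dockerfile
# Stage 1: build with the full toolchain
FROM node:20 AS build
WORKDIR /app
# Copy only the dependency manifests first, so this layer stays
# cached until package.json actually changes
COPY package*.json ./
RUN npm ci
# Now copy the (frequently changing) source
COPY . .
RUN npm run build

# Stage 2: ship only the built artifacts on a slim base
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```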
PS. See also https://xkcd.com/1053/ :)
It is even hard to find undoubtedly, holistically good examples of Docker usage. Many people do many things in different ways, some better, some not so good. One can often find good aspects of Docker usage in projects though, like "What kind of environment variables should one let the user pass in, to avoid hardcoding them in the image and to keep things configurable?", or "How to use multi-stage builds?". It is up to the thoughtful observer to identify those and adapt one's own process of creating Docker images.
I don't see Docker as some kind of thing that one sits down with for a few evenings and then fully knows. More like a thing one picks up over time. One runs into a problem, then searches for answers for how to solve this problem in a Docker scenario, finds several answers and picks one that seems appropriate, then learns later on whether that choice was a good one. Until then it works for as long as that solution works. It is not like Docker is some kind of scientific thing, where there is one correct answer to every question. Many things in Docker are rather ad-hoc developed solutions to problems. Just look at the language that makes up a Dockerfile and you will see the ad-hoc-ness of it all. Then there are limitations that seem just as arbitrary: for example, the limited number of layers (stemming from fear of too much recursion not being supported by Go and not "externalizing the stack"), or not being able to change most of a container's attributes (like labels) while the container is running.
As for questions of CI and so on: I think they are separate issues, which are solved by having a good workflow for the version control system of choice. One could for example configure the CI to do things for specific branches, like deploying only the master branch or deploying a test branch to another machine/server. But this has nothing to do with Docker.
- https://softwaremill.com/preview-environment-for-code-review...
While the examples use GitLab, it shouldn't be very hard to port the same idea to Bitbucket.
It demystified a lot of docker features for me.
I'm in the process of making a follow-up to this that covers more advanced topics. Stay tuned.
I also have a course that shows you how to use Docker for the build-test-deploy loop, though some of it is a little stale. Check that out here: https://www.linkedin.com/learning/devops-foundations-your-fi...
We use this mechanism with AWS, the serverless framework and some terraform. It works well. With us, the only thing remotely container related is the runtime context for the CI/CD pipeline.
That being said, you could make this work against a k8s cluster, fargate, or just some build servers.
An easy way to get ephemeral envs starting from your docker-compose definition is Bunnyshell.com. It uses Kubernetes behind the scenes, but it's all pretty much abstracted away from the user. There is a free tier, so you can experiment.
Disclosure: I'm part of the Bunnyshell team.
I use it several times a week. Buildx, Dockerfile, etc.
Good luck!
Ask it something like “Explain how to get started with Docker” and it will give you a bunch of steps in a reasonable order. Then ask it for details for each step, like:
“How do I install Docker on macOS?”
“Write a commented Dockerfile for an application written in $WHATEVER”
“Now write a commented Docker Compose file for this application and a Postgres database”
etc
What does ChatGPT say for something like
"How do I clean up old docker images in a registry after I merge a branch?"
I think this statement is not as precise as it should be. I'm running docker produced images in a k8s cluster and had to google what you are talking about here.
https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-...
"Docker-produced images will continue to work in your cluster with all runtimes, as they always have."
Dockerfile is a very common file to see in projects, and I'm thankful when I see it.
Although a pattern I commonly see these days with a lot of OSS projects is that they often provide a docker-compose.yaml even though no one at any reasonable scale is running Docker Compose in production. This is simply because Kubernetes setups are complicated, weighty, and heterogeneous, and often not running on your local machine, while Docker Compose is a great way to do a hello-world-style demo of your container-orchestrated app (because everyone still uses Docker Desktop).
Not sure about this, Docker is still widely used in CI systems and local development in my experience.
It's also widely used in open-source projects for "reproducible builds".