A Turing-complete language lets you write programs that never terminate. That is not something a config file should be capable of.
If it depends on conditional logic or iteration, it probably belongs in a proper programming language with a linter, type checker, debugger and unit test framework.
For me I'd guess maybe... 0.1%? It's definitely under 1%.
Given that, it makes no sense to me that I'd want to make myself jump through hoops to express some basic coding patterns [1], just to rule out that single class of bugs. It seems like a solution in search of a problem.
[1] https://github.com/dhall-lang/dhall-lang/wiki/How-to-transla...
How is XML coupling data and logic? The only kind of "processing" it does by itself I can think of is composing documents from pieces and "processing instructions" as a generic extension mechanism. That is, features to support its original use case of authoring and capturing structured text. Now SGML has more processing features (tag inference, stylesheets/link processes, notations), but is still far away from Turing-completeness.
That's not really true or even sensible, since XML doesn't combine data and logic. Sure, there were XML-based logic languages (most notably XSLT) as well as XML-based data languages, but while all were applications of XML they were separate languages.
XML lost ground to JSON, etc., as the fashion pendulum swung away from heavyweight tooling and detailed specs for most things (though it's swinging back again), and to some closer-to-memory-layout binary formats as efficiency became a concern in some of the places where rigid specs remained important.
- Jsonnet (https://jsonnet.org/) - simpler syntax and fewer concepts to learn, just an extension of JSON. But no type checking. An open source offspring of Google's internal config language (GCL/BCL).
- Cue (https://github.com/cuelang/cue) - a more ambitious attempt to fix GCL/BCL by replacing inheritance as the fundamental compositional primitive with constraint unification.
Great thread comparing them against each other by the authors of both: https://github.com/cuelang/cue/issues/33
Cue seems kind of similar to Dhall on first sight, but I haven't used either enough for an informed opinion yet.
The language is simple and remarkably well specified, enough that we implemented our own intellij plugin (https://plugins.jetbrains.com/plugin/10852-jsonnet) and even our own faster compiler (https://github.com/databricks/sjsonnet) without much effort at all.
There are odd corners in the language, but not something that most people will end up bumping into in typical usage. The templates certainly get messy in large configurations, but no more messy than any other code, and the hermeticity/purity greatly helps in managing the messiness. It's certainly less messy/odd than the copy-paste configs or bespoke JSON/YAML templating systems that inevitably appear in messy deployment environments!
The last thing of note is the lack of static types: this definitely affects usability to some extent, and especially hinders IDE support from being as useful as it is in e.g. Java. But having a useful/ergonomic type system that fits this specific problem space is probably still an unsolved research question.
It was a real shame because ops then implemented some features of Jsonnet via scripts to parse and merge YAML. What was 0 LOC in Jsonnet is now about 300 LOC plus custom CI checkers, all because of a marketing problem.
What I don't understand is the following: config files are read by text editors, and in the end, by human beings. Because of the latter they should have certain traits. We must agree on the importance of these traits before we can settle on a standard.
For me, the important features are that they must be readable and easily editable. They must be readable with a certain text editor (vi) for backwards compatibility, which means they shouldn't require syntax highlighting or a schema. Well, these 2 simple requirements of mine rule out anything remotely resembling JSON.
It just appears to me that JSON is for JavaScript developers, YAML for Python developers, and Dhall for ML (the whole family I suppose, not just Haskell) developers.
Well then, if we're going that route, perhaps all we need is some kind of glue between text config and binary config (which reminds me of Systemd...), i.e. something that accepts multiple config file formats.
it works with yaml structures (hence avoids text templating problems) and uses a familiar python-like language, starlark, making it quite easy to get started. it makes use of yaml comments to assign metadata/templating directives to yaml nodes, so it looks something like this:
#@ load("@ytt:data", "data")
#@ def labels():
app: echo
org: test
#@ end
kind: Pod
apiVersion: v1
metadata:
  name: echo-app
  labels: #@ labels()
spec:
  containers:
  #@ for/end echo in data.values.echos:
  - name: #@ echo.name
    image: hashicorp/http-echo
    args:
    - #@ "-text=" + echo.text
it doesn't include type checking; however, it does have a system to "overlay" structures on top of each other via its overlay feature -- https://github.com/k14s/ytt/blob/master/docs/lang-ref-ytt-ov.... merge/replace/remove operations expect to find one node by default, so map key typos or wrong structural nesting problems are caught easily in common cases.
You either need a simple list of items (e.g. dependencies) or key/value pairs. Use a text file or yml or json or whatever.
Or you need templating, the use of functions, etc, like dhall provides. But then, why not use the language you're already using for the rest of your project, or a bash script to export some variables?
Might sound like I'm throwing sourness around, but I just don't see the niche for this, except inventing a new thing for the joy of it?
Well, for the same reason that people use a JavaScript framework to build a web app instead of vanilla JS. Both can do the job, and for simple cases there's little reason to go with a framework or, respectively, a specialized config language.
But as your app or config gets larger and more complex, a framework or config language tends to get the job done more efficiently by giving you structure and a toolbox of solutions to common pain points.
Config generators themselves tend to be a rather heavyweight all-or-nothing solution which leads people to compromise on some adhoc middle-ground solutions like YAML with jinja templates with unclear evaluation semantics. A good config language designed from the ground up can be so much better than this unholy yaml/jinja mess!
Finally, one of the key selling points of specifically Dhall is type checking. Implementing that in config generators in a generic untyped scripting language would be a nontrivial amount of boilerplate, and boilerplate elimination is what config languages are all about.
You clearly haven't wrestled with kubernetes configuration files.
Reams and reams of YAML, heavily indented to represent umpteen nested objects.
Templating, bash scripts, using your favourite language to roll your own config generator - these are all well trodden approaches that fail to scale.
Official shitshows like Helm have come along with nothing more innovative to offer than templated YAML. The next version uses lua to generate config, but I remain skeptical given previous design choices.
We can question whether kubernetes has pushed the dial too far towards necessitating mountains of config, but for now it is most definitely a problem for us users.
Overall, reason enough is that human-friendly languages have different priorities than parser-friendly ones.
Honestly, that is the reason I like TOML: with the exception of the date data type it maps cleanly to JSON (which everyone agrees is a good enough serialization format), and it is specifically focused on human friendliness (except for uniform lists) and readability.
As an underappreciated feature, scoped key/value pairs let you define nested tables with flat statements.
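To make "nested tables from flat statements" concrete, here is a minimal Python sketch. It is not a TOML parser: the `nest` helper and the sample keys are made up for illustration and only mimic how scoped (dotted) keys expand into nested tables.

```python
from typing import Any, Dict


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys such as "server.tls.port" into nested dicts."""
    root: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        table = root
        for name in parents:
            # Create intermediate tables on demand, much like a scoped key would.
            table = table.setdefault(name, {})
        table[leaf] = value
    return root


# Hypothetical flat settings: each statement stands alone, yet the result is nested.
flat_config = {
    "server.host": "example.org",
    "server.tls.enabled": True,
    "server.tls.port": 8443,
}
nested_config = nest(flat_config)
```

Each flat statement can be read, moved or deleted on its own, which is roughly the editing convenience being described.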
Okay thanks, that's a good one. Readability might be a big deal.
I wonder whether we'd see such a proliferation of config systems if JSON had allowed comments. At least IME, that seems to be the biggest pain point with JSON.
Non-Turing-completeness is certainly very important in many cases (e.g., in DTrace and eBPF), but I'm not sure that it's so important for configuration. Assuming for a moment that I don't need non-Turing-completeness for configuration, my choice of DSL would be jq[0]! Using jq for configuration means that I can use JSON, TOML-style, and other ways of expressing complex data, including combinations of them, all with "interpolation" (not quite) and complex computation being available.
[0] https://stedolan.github.io/jq/
So you could describe e.g. a cluster of machines entirely in Dhall and derive Ansible YAML scripts (with all their boilerplate), derive DNS config files, etc. etc. all from a single strongly typed description.
- Why use '=' instead of ':' for attributes? If you used ':', then '=' could be variable assignment and eliminate the need for 'let'.
- Why is there a need for commas?
- Why quote via ticks?! Gee!
- What's with the '{-' and '-}' for comments?! It's like its author decided to differ at any price!
In general, good ideas, but it's too weird and unnecessarily deviates from common syntax.
Colons are used for type signatures.
Commas are presumably required because you can have multi line and nested records. (don't quote me on this, not a parser expert)
The comment syntax is from Haskell.
Not saying this syntax is familiar to everyone, but it is familiar to some. The lineage of the syntax might help you understand where the language is coming from
{ -- Unlike YAML, Dhall does not accept YES|NO|ON|OFF
  validDhallBools = [ True, False ]
, someNumbers = [ 1
  ,
    -- Dhall is not indentation-sensitive
  2, 3 ]
  -- Field names that conflict with reserved identifiers must be quoted
, `True` = True
, version = "9.3"  {- Strings must be quoted
                      All Dhall literals have unambiguous types -}
}
I hate commas at the start of lines and I would prefer not to have curly braces in a human editable/readable format.
Neither reason is terribly rational but my first impressions weren't great.
are not a requirement.
There are already kubernetes bindings available: https://github.com/dhall-lang/dhall-kubernetes
The syntax in the examples looks a bit more verbose and less readable than yaml but I think building sensible abstractions on top of it will alleviate the pain (abstractions here are innocuous since you can 'normalize' the code and they disappear)
I'm not too happy with the default formatting though. I think if the formatter indented nested values similar to yaml that would look better to the human eye.
I read about dhall’s imports, and I don’t think I like it. If I add a text configuration mechanism to software, I do not want it accessing the network by default, full stop. To me, a “safe” configuration language means that parsing terminates, does not have side effects, does not touch the network, and that parsing the same file twice gives the same output unless I explicitly change an input. Pulling a prelude off of github misses several of these requirements.
(Having your config file fail to parse if your network is down is bad, bad news if that config is needed to bring your network up. It’s also bad news if a parsing failure due to a transient network issue leaves your system in a state where it won’t quickly recover if the network comes back.)
You may also just download any imports yourself and source them locally.
Additionally, Dhall supports import fallbacks: for example, you may try a remote import first, and if it fails it will look in another place, which could be remote or local. This is a good strategy for developing locally and then committing imports for production use.
You can also, of course, host the files in your local network.
Also, if you were to import over the network, by running `dhall freeze` a semantic hash of the content is computed so you are 100% sure that what you are importing is not going to change. Moreover, files that have a hash value will be cached by dhall.
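A rough Python sketch of the two ideas above, fallbacks and pinned hashes. This is not how Dhall implements them: Dhall hashes the normalized expression, whereas this sketch simply hashes raw bytes, and every name here (`fetch_first`, `verify_frozen`, the stand-in sources) is hypothetical.

```python
import hashlib
from typing import Callable, Sequence


def fetch_first(sources: Sequence[Callable[[], bytes]]) -> bytes:
    """Return the content of the first source that can be read."""
    last_error: Exception = LookupError("no import sources given")
    for source in sources:
        try:
            return source()
        except OSError as error:  # e.g. network down or file missing
            last_error = error
    raise last_error


def verify_frozen(content: bytes, expected_sha256: str) -> bytes:
    """Reject content whose SHA-256 digest differs from the pinned one."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return content


def unreachable_remote() -> bytes:
    # Stand-in for a remote import while the network is down.
    raise ConnectionError("network is down")


def local_copy() -> bytes:
    # Stand-in for a copy of the import committed next to the config.
    return b"{ replicas = 3 }"


# The hash pinned at "freeze" time; any later change to the content is rejected.
pinned = hashlib.sha256(b"{ replicas = 3 }").hexdigest()
content = verify_frozen(fetch_first([unreachable_remote, local_copy]), pinned)
```

With a pinned hash, it no longer matters which source answered: either the bytes are the ones you froze, or loading fails loudly.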
If you don't want to bother with copying over Prelude and you don't trust the cache, you can also normalize the code before pushing it to the network. This will flatten all your imports and reduce your file to normal form.
You might be interested in what they say about imports here: https://github.com/dhall-lang/dhall-lang/blob/master/standar...
Well... I'd argue that when using Python I don't feel the need for a config file language in the first place... it's human friendly enough and I don't have to learn another syntax, use another parser, etc. I had to work on a Symfony project recently and I wish it wasn't sprinkled with all those yaml files.
> I’ve never caught myself thinking “if only there was a nice way to limit myself to a non Turing complete subset of python/Haskell”
Ditto... seems like bloat to me. There may be some use cases I don't know about but these config file languages tend to repel me.
Generating assembly from C is an implementation detail, and many C compilers don't do that.
On the other hand, Dhall really is a YAML generator: the available tools allow only one-way conversion (in particular, there is no interpreter/library to ingest Dhall from the configured application itself).
If it's truly for end-users (read: non-admin/dev types), you probably shouldn't have them touching configuration files _at all_.
So it's not so surprising to see it here, seeing as Dhall is written in Haskell.
Dhall elegantly solves a major challenge: configuration management at scale. We build a multi-cloud management platform, which serves DevOps teams, IT Governance, Controlling and IT Management in large enterprises. That means we're an integration solution for a lot of things, so we need to be highly configurable. Because we also manage private clouds (a la OpenStack, Cloud Foundry, OpenShift etc.), we often run on-premises and operate our software as a managed service. Using dhall allows us to _compile and type check_ all our configuration for all our customers before rolling things out. We use dhall to compile everything from terraform/ansible, kubernetes templates, spring config, to concourse ci pipelines and customer-specific reference data to load into our product. Since adopting dhall earlier this year, we measurably reduced our deployment defect rate and re-gained the ability to safely refactor configuration.
It takes a little time to get used to, but we appreciate that it's highly opinionated around formatting and "how to do things" - somewhat in the same way as golang is. It has certainly helped that we had a member with haskell experience on the team, as dhall is built in haskell and the syntax feels familiar.
Plug: if you're looking for a job working with dhall, reach out :-)
- 0: https://meshcloud.io
- 1: https://github.com/Meshcloud/ejs-compiler
Common example: let's say I want to set up a PostgreSQL database for a service running in Kubernetes in AWS. How best to get it done?
Well, it turns out there's a number of different options: you can set up a DB through RDS, and a service in Kubernetes which directs to it through an externalName, which is probably what you want in production; you can set up Postgres as a StatefulSet, which is probably what you want in an ephemeral testing environment; or maybe you have a customer with a full-time DBA who will create the database for you and give you a connection string.
With Dhall, you set up a union type with each of these scenarios as options, and then you have a Dhall function for your Terraform and Kubernetes configurations. In your Terraform configuration, you have an RDS module where the count is set to 1 for the RDS/production scenario and 0 otherwise. In your Kubernetes configurations, you set up a service with an external name appropriately when you need to, set up a StatefulSet when that's relevant, etc.
Because they all use the same type in their function's parameter, they're guaranteed to stay consistent. You're guaranteed to never have an RDS instance set up alongside a Postgres StatefulSet. If you need to make changes (add options, change options, etc.) then you will get type errors in each and every place affected, forcing you to address them, including in places you forgot about.
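For readers who don't know Dhall, the shape of this can be sketched in Python. This is an illustration rather than the actual setup: the class names, fields and resource labels are invented, and Python only gives the exhaustiveness guarantee if you run a static type checker on top, whereas Dhall checks it by construction.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Rds:
    instance_class: str  # production: managed database


@dataclass(frozen=True)
class StatefulSet:
    storage_gb: int  # ephemeral testing environment


@dataclass(frozen=True)
class External:
    connection_string: str  # customer DBA hands over a connection string


# The "union type": exactly one scenario is chosen per deployment.
Database = Union[Rds, StatefulSet, External]


def terraform_rds_count(db: Database) -> int:
    """Count for the RDS module: 1 in the RDS scenario, 0 otherwise."""
    return 1 if isinstance(db, Rds) else 0


def kubernetes_resources(db: Database) -> List[str]:
    """Kubernetes resources to render for the chosen scenario."""
    if isinstance(db, Rds):
        return ["Service(externalName)"]
    if isinstance(db, StatefulSet):
        return ["StatefulSet", "Service"]
    if isinstance(db, External):
        return []
    raise TypeError(f"unhandled database scenario: {db!r}")
```

Both functions take the same `Database` value, so there is no input for which the RDS count is 1 and a StatefulSet is also rendered.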
We started to adopt Dhall more than half a year ago now and we've barely scratched the surface of what the language makes possible. Purity in infrastructure and operations is a powerful drug.
- This is human readable, contrary to the JSON family and its {} abuses.
- It is not space/indentation-based, contrary to YAML, which very quickly becomes a mess to write and a mess to parse.
TOML is good for data laid out with TOML. Representing arbitrary nested arrays and tables gets messy.
Also, the constraint on homogeneous shallow types has impacted me in some cases. Originally, I was all on board. Arrays should be homogeneous. The problem is logically homogeneous vs syntactically homogeneous.
Cargo uses tables to declare dependencies. The values are logically homogeneous: they are declarations. Syntactically, some values are strings while the rest are sub-tables. The string is just shorthand for a table though.
This feature can't be implemented in arrays like it can with tables.
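The "logically homogeneous, syntactically mixed" point can be sketched in Python. The `normalize` helper is hypothetical and the dependency entries are only illustrative; the sketch assumes, as described above, that a bare string is shorthand for a table with just a version.

```python
from typing import Any, Dict, Union

# A dependency value is either a version string or a full table.
DependencySpec = Union[str, Dict[str, Any]]


def normalize(spec: DependencySpec) -> Dict[str, Any]:
    """Expand the string shorthand into the full table form."""
    if isinstance(spec, str):
        return {"version": spec}
    return dict(spec)


# Illustrative entries: one shorthand string, one explicit table.
dependencies: Dict[str, DependencySpec] = {
    "serde": "1.0",
    "rand": {"version": "0.8", "features": ["small_rng"]},
}
normalized = {name: normalize(spec) for name, spec in dependencies.items()}
```

After normalization every value has the same shape, which is the logical homogeneity that the syntactic rule on arrays cannot express.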
So the authors claim that their language is guaranteed to terminate for all well-typed programs. That is actually a nice spot for configuration languages. Yet, I wonder how
a) they guarantee it, as I have seen no obvious link to the language's semantics
b) useful this is in practice.
Nevertheless, very nice approach, indeed.
What matters is that you can analyze the code quickly. To find that out, one way is to try it and kill the process if it takes too long.
Or perhaps better would be to come up with a portable definition of what "takes too long" means that you can put in a presubmit check. Something like "running out of gas" in Ethereum.
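A minimal Python sketch of the "gas" idea, assuming the evaluator can report each step it takes. Here a generator stands in for the evaluator and every `yield` counts as one step; `run_with_gas`, `OutOfGas` and the sample programs are invented for illustration.

```python
from typing import Any, Generator


class OutOfGas(Exception):
    """Raised when a program uses more steps than its budget allows."""


def run_with_gas(program: Generator[None, None, Any], gas: int) -> Any:
    """Run a program to completion, or abort once the step budget is spent."""
    steps = 0
    while True:
        try:
            next(program)
        except StopIteration as done:
            return done.value  # the program finished within budget
        steps += 1
        if steps > gas:
            raise OutOfGas(f"exceeded budget of {gas} steps")


def sum_below(n: int) -> Generator[None, None, int]:
    total = 0
    for i in range(n):
        total += i
        yield  # one unit of work
    return total


def spin_forever() -> Generator[None, None, None]:
    while True:
        yield  # never terminates on its own
```

Unlike a wall-clock timeout, a step budget gives the same verdict on a laptop and on a loaded CI machine, which is what makes it usable as a presubmit check.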
https://github.com/dhall-lang/dhall-lang/wiki/Safety-guarant...
In Rust, we're solving this via SANE and SCL:
https://gitlab.com/bloom42/sane-rs
I'm not sure how much need there is for an additional programming layer, especially within config (the part of a program with the simplest syntactic requirements).
for my projects where "ahead-of-time validation" is needed, we're currently using SCL's parser for safety guarantees:
https://github.com/foundpatterns/contentdb
https://github.com/foundpatterns/lighttouch/blob/d7ada4576a6...
https://github.com/foundpatterns/torchbear/blob/4dd2b9ea76ba...
- functions
- a powerful typesystem
- remote (HTTP) imports with sha256 checksums
Programmatic generation of static configuration files can be very useful.
Sufficiently complex examples of the latter might as well be the former as far as maintenance is concerned.
If you need to write a program to configure your program, you're probably doing it wrong.
Generating the config during deployment, eh... often necessary. Best done with transforms and templates because they're simple.
Executable config, run during startup or, worse, on each request? NO.
[edit] I think that's the main disconnect here: 'past compile time'. The whole point of testing, strong type systems, etc is to lock down the set of states the system can be in. If your configuration is so 'dynamic' you are essentially abandoning all those benefits and saying 'yeah, do what you like to our live servers'.
In short, configuration which is that powerful is indistinguishable from running untested code in production.
How small is a static binary to run this in my containers?
What are some ways to integrate the typed config in a language?
The following languages natively bind to Dhall:
* Haskell
* Clojure
* Ruby
... and the following language bindings are in progress:
* Rust
* Go
* Python
* PureScript
In the absence of a native language binding, you can convert Dhall to YAML or JSON and read that in.
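For the convert-then-read route, the consuming side can look roughly like this in Python. It assumes the Dhall config has already been converted to JSON by some external step; the `Config` fields and the sample text are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    service: str
    ports: List[int]


def load_config(text: str) -> Config:
    """Parse JSON text and re-check the shape the application relies on."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("config must be a JSON object")
    service = raw.get("service")
    ports = raw.get("ports")
    if not isinstance(service, str):
        raise ValueError("service must be a string")
    if not isinstance(ports, list) or not all(type(p) is int for p in ports):
        raise ValueError("ports must be a list of integers")
    return Config(service=service, ports=list(ports))


# Hypothetical JSON, as it might look after conversion from Dhall.
config = load_config('{"service": "echo", "ports": [8080, 8443]}')
```

The types were already checked on the Dhall side; the checks here only guard against the application being handed a file that never went through that step.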
let input =
      { relative = "daughter"
      , movies = [ "Boss Baby", "Frozen", "Moana" ]
      }
We don't frequent the same kind of "non-technical users" I guess.