This probably sounds like a strawman, but it's not. It's how a lot of projects (in Python, for example) are configured - the "config" file is just a normal bit of code that gets run to produce a value. Unless you're using a programming language that absolutely sucks at expressing plain values (e.g. C or Java), it's much better than separate config files, IMO.
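A minimal sketch of what this looks like in Python (all the names here - `DEBUG`, `CONFIG`, the keys inside it - are made up for illustration, not from any real project):

```python
# config.py -- the "config file" is ordinary Python that evaluates to a
# plain value. The consuming program just imports CONFIG.

DEBUG = False

CONFIG = {
    "workers": 1 if DEBUG else 4,          # computed, not hand-duplicated
    "listen": ("0.0.0.0", 8080),
    "log_level": "DEBUG" if DEBUG else "INFO",
}
```

The consumer does `from config import CONFIG` and never needs a parser, a schema, or an interpolation syntax; the host language already provides variables, conditionals, and comments.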
Ideological answer: for the same reason HTTP/2’s binary protocol didn’t instantly obviate or deprecate HTTP/1.1’s text protocol. Text has advantages: it’s debuggable and prototypable. If the interface between two programs is a text-based declarative language, you can audit that text, diff that text, edit that text to see how changes affect the result, mock one side or the other by producing or consuming that text, etc. “GitOps”-style config management would never work if config were all opaque binary blobs. These are all reasons that major software projects standardize on YAML or other widely-supported textual data serialization formats for their config.
Pragmatic answer: because we’re talking about production configuration management here, which is, 99% of the time, about configuring and managing the third-party black-box components in your stack, not your own components. Your own business layer can usually be configured conventionally, with minimal explicit config, since you built it to work idiomatically for your use case. It’s all the third-party stuff that has an impedance mismatch with your assumptions, translating into needing tons of config to do what you need.
And, obviously, if you don’t control the other end, you don’t decide how the other end does its config. Usually, these days, it’s YAML (or TOML) — for the ideological reasons mentioned above.
Example: Kubernetes. Big consumer of complex YAML. Many people try to template that YAML. Much simpler and less error-prone to just write a program to generate said YAML. No reason to assume you’re writing in whatever language the k8s orchestrator is written in. (In fact, there are multiple orchestrators, written in different languages, and the shared YAML resource spec is the only formal interface they share.)
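A rough sketch of that "write a program, not a template" approach. Since every JSON document is also valid YAML (and `kubectl` accepts JSON directly), the stdlib `json` module is enough here; the function name and its parameters are illustrative, not from any real tool:

```python
# Generate a Kubernetes Deployment spec from code instead of templating
# YAML strings. The structure below follows the apps/v1 Deployment schema.
import json

def deployment(name: str, image: str, replicas: int) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

print(json.dumps(deployment("web", "nginx:1.25", 3), indent=2))
```

The point is that `name` appears once as a function argument instead of being string-interpolated into four different places in a template, so it can't drift out of sync.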
I can see the argument for using a textual format (although I think it's weaker than you say; if we're generating this config with code then we don't want to diff or edit the generated config), but YAML seems like a singularly poor choice if you want reliable diffs and editing; it's like picking tag-soup HTML. Straight JSON (ideally with a schema), TOML or even XML seems like a better bet if you're generating it programmatically.
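One concrete version of the "reliable diffs" point, sketched with the stdlib `json` module: if the serializer is deterministic (sorted keys, fixed indentation), regenerating unchanged config yields byte-identical output, so any diff reflects a real change rather than incidental ordering:

```python
# Deterministic serialization: two semantically equal configs produce
# identical bytes, regardless of key order in the generating code.
import json

def dump_config(cfg: dict) -> str:
    return json.dumps(cfg, indent=2, sort_keys=True) + "\n"

a = dump_config({"replicas": 3, "image": "nginx"})
b = dump_config({"image": "nginx", "replicas": 3})
assert a == b  # key order in the source code doesn't pollute the diff
```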
> And, obviously, if you don’t control the other end, you don’t decide how the other end does its config.
Right, in that case it's all moot. I took GP to be talking about what formats these tools should use. IMO if the tool is intended to consume a machine-generated config then it would be better to use a machine-oriented config format. I think the option of something like protobuf (which is language-independent) is underappreciated, but even restricting ourselves to textual options, something stricter than YAML seems like a better bet.
Other examples of formats like this, that are hand-authored in the small but generated in the large: RSS, SQL, CSV.
Again, Kubernetes is a prime example of this. K8s config YAML is designed with the intention of being hand-authored and hand-edited. It’s only when devs or their tools need to auto-generate entire k8s cluster definitions that you begin needing to machine-generate this YAML. This generated YAML is expected to still be audited by eye and patched by hand after insertion, though, so it still needs to be in a format amenable to those cases, rather than in a format optimal for machine consumption.
> if we're generating this config with code then we don't want to diff or edit the generated config
Look more into GitOps. The idea behind it is that whatever tooling you’re using to generate config is run and the resulting config is committed to a “deployment” repo as a PR; ops staff (who don’t necessarily trust the tooling that generated the config) can then audit the PR, and the low-level changes it describes, before accepting it as the new converged system state. It puts a human veto in the pipeline between machine-generated config and continuous deployment; and allows for debugging when upstream tweaks aren’t having the low-level side-effects on system state one would expect.
I think the real issue is reproducibility, and that boils down to purity. Fully fledged languages all come with lots of APIs and features for interacting with the rest of the world, and it's quite unclear which APIs have such dependencies and which do not. It's seductively easy to do something actually useful in a "real" programming language that will make the whole configuration process unwieldy later - like, say, reading parts of the config from disk, getting some service's public key off the internet, embedding a timestamp, or even writing some computed config like a random key to a bit of storage for a later config process to consume. And once you do that, the whole thing gets flaky, fast.
If you can rigorously avoid that, there's not too much advantage to a static config language.
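The purity point can be sketched in a few lines (both generator functions here are hypothetical examples, not real tooling): an impure generator smuggles in ambient state like the clock, so two runs disagree; a pure one is a function of its explicit inputs only, so its output is reproducible:

```python
# Impure vs. pure config generation.
import json
import time

def impure_config() -> str:
    # Depends on the wall clock -- two runs produce different bytes.
    return json.dumps({"key": "value", "generated_at": time.time()})

def pure_config(version: str) -> str:
    # Depends only on its arguments -- stable across runs and machines.
    return json.dumps({"key": "value", "version": version}, sort_keys=True)

assert pure_config("1.2.3") == pure_config("1.2.3")
# impure_config() typically != impure_config() -- the timestamp leaks in.
```

A static config language gets this property for free; a general-purpose language only gets it if you rigorously police what the generator is allowed to touch.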
But writing the YAML is fiddly and annoying, so that's a good example of something where it's better to generate it via troposphere (a Python module) or some similar system.
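A rough stdlib-only sketch of the idea behind troposphere (which targets AWS CloudFormation templates): build the template as Python objects and serialize once at the end. Real troposphere adds typed resource classes and validation; the helper function and resource names below are made up for illustration:

```python
# Build a CloudFormation-style template programmatically instead of
# hand-writing its YAML/JSON.
import json

def s3_bucket(logical_id: str, bucket_name: str) -> dict:
    return {
        logical_id: {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": bucket_name},
        }
    }

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {},
}
template["Resources"].update(s3_bucket("LogsBucket", "my-app-logs"))

print(json.dumps(template, indent=2, sort_keys=True))
```

Once resources are plain values returned by functions, adding twenty similar buckets is a loop rather than twenty copy-pasted YAML stanzas.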
To be less specific, I guess the answer is that sometimes you don't control both ends - the part that emits and the part that consumes. And having fought Ansible and similar tools, I'd never want to write YAML by hand for non-trivial purposes if I could script it instead.
Which is the root of a ton of different problems and issues, and generally regarded as a bad idea. See PEP 518 and pyproject.toml vs. setup.py.