Isn’t writing code and using zod the same thing? The difference being who wrote the code.
Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
Whether you do that with Zod or manually or whatever isn't important, the important thing is having a preprocessing step that transforms the data and doesn't just validate it.
For instance, if validating parameter values requires multiple trips to a DB or other external system, weaving the calls into the logic can avoid duplicating those round trips. Light "surface" validation can still be applied, but that's not what we're talking about here, I think.
That said, I fully agree with the article content itself. It basically just boils down to:
When you create a program, eventually you'll need to process input data and check whether it is valid or not. In a C-like language, you have two options:
void validate(struct Data d);
or

struct ValidatedData;
ValidatedData validate(struct Data d);
"Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate" either; in `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the procedure `void` counterpart.
Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.
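A minimal TypeScript sketch of that idea (the "brand" property fakes a nominal type; all names here are invented):

// Structurally a Data, but only obtainable by going through validate().
type Data = { port: number };
type ValidatedData = Data & { readonly __brand: "ValidatedData" };

function validate(d: Data): ValidatedData {
  if (!Number.isInteger(d.port) || d.port < 1 || d.port > 65535) {
    throw new Error("invalid port");
  }
  return d as ValidatedData; // the single cast, at the single checkpoint
}

function startServer(d: ValidatedData): void {
  // no re-checking needed: the type proves validate() already ran
}

startServer(validate({ port: 3000 })); // ok
// startServer({ port: 3000 });        // compile error: Data is not ValidatedData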
I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to folks outside the FP community what the distinction between "parse" and "validate" is, let alone what a "parser combinator" is. Yet somehow other articles keep reusing this same catchphrase.
Then instead of validating a loose type & still using the loose type, you're parsing it from a loose type into a strict type.
The key point is you never need to look at a loose type and think "I don't need to check this is valid, because it was checked before"; the type system tracks that for you.
The difference is (a) where and how validation happens, and (b) the type of the final result.
A parser is a function producing structured values - values of some type, usually different from the input type. In contrast, a validator is a predicate that only checks constraints on existing values.
For example, a parser can parse an email address into a variable of type EmailAddress. If the parser succeeds at doing that, assuming you're using a language with a decent type system, you now have a variable which is statically guaranteed to be an email address - not a string which you have to trust has passed validation at some point in the past.
This is part of the "Make illegal states unrepresentable" approach which allows for static debugging - debugging your code at compile time. It's a very powerful way to produce reliable systems with robust, statically proven guarantees.
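In TypeScript it might look like this (a sketch; the brand stands in for a proper nominal type, and the regex is deliberately simplistic):

type EmailAddress = string & { readonly __brand: "EmailAddress" };

// The parser is the only place an EmailAddress can come from.
function parseEmail(raw: string): EmailAddress | null {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(raw) ? (raw as EmailAddress) : null;
}

function sendWelcome(to: EmailAddress): void {
  // no defensive re-checking here: the type carries the guarantee
}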
But as Alexis King (who coined the phrase "Parse, don't validate") wrote, "Unless you already know what type-driven design is, my catchy slogan probably doesn’t mean all that much to you."
function parse(x: Foo): Bar { ... }
const y = parse(x);

and

function validate(x: Foo): void { ... }
validate(x);
const y = x as Bar;
Zod has a parser API, not a validator API. But I suppose it isn't as catchy.
The point is you don’t check that your string only contains valid characters and then continue passing that string through your system. You parse your string into a narrower type, and none of the rest of your system needs to be programmed defensively.
To describe this advice as “vacuous” says more about you than it does about the author.
> Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
Yes, judgement is required to make depending on zod (or any library) worthwhile. This is not different in principle from trusting those same things hold for TypeScript, or Node, or V8, or the C++ compiler V8 was compiled with, or the x86_64 chip it's running on, or the laws of physics.
Also - don't write CLI programs in languages that don't compile to native binaries. I don't want to have to drag around your runtime just to execute a command line tool.
$ ldd /usr/bin/rg
linux-vdso.so.1 (0x00007fff45dd7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000070764e7b1000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000070764e6ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000070764de00000)
/lib64/ld-linux-x86-64.so.2 (0x000070764e7e6000)
The worst is compiling a C program with a compiler that uses a more recent libc than is installed on the installation host.

$ wget 'https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ tar -xvf 'ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ ldd ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ldd (0x7f1dcb927000)
$ file ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ripgrep-14.1.1-x86_64-unknown-linux-musl/rg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, stripped

This is only a problem when the program USES a symbol that was only introduced in the newer libc. In other words, when the program made a choice to deliberately need that newer symbol.
My most comfortable tool is Java, but I'm not going to persuade most of the HN crowd to install a JVM unless the software I'm offering is unbearably compelling.
Internal tooling at work? Yeah, Java's going to be an easy sell.
I don't think OP necessarily meant it as a political statement.
And don't write programs with languages that depend on CMake and random tarballs to build and/or shared libraries to run.
I usually have a lot less issues with dragging a runtime than fighting with builds.
Pretty much agreed - once any sort of complicated logic enters a shell script it's probably better off written in C/Rust/Go or something akin to that.
Well that's confused me. I write a lot of scripts in BASH specifically to make it easy to move them to different architectures etc. and not require a custom runtime. Interpreted scripts also have the advantage that they're human readable/editable.
One of the things I love about clap is that you can configure it to automatically spit out --help info, and you can even get it to generate shell autocompletions for you!
I think there are some other libraries that are challenging it now (fewer dependencies or something?) but clap sets the standard to beat.
Go programs compile to native executables, but they're still rather slow to start, especially if you just want to do --help
The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?
He even gives the example of zod, which is a validation library he defines to be a parser.
What he wants to say is: "I don't want to write my own validation in a CLI; give me a good API already that first validates and then converts the inputs into my declared schema"
But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain your code is about to work with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a POST request shouldn't be parsed into a user object just because it shares a lot of fields in common with a user. Instead it should become a DTO with the invariants fulfilled that make sense for that DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects, not strings), and some will be more complex, like the "if the server is active, then the port also needs to be provided" restriction from the article.
This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.
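A sketch of that idea (field names invented here, mirroring the article's server/port rule):

import { z } from "zod";

const Config = z
  .object({
    serverActive: z.boolean(),
    port: z.coerce.number().int().min(1).max(65535).optional(),
    createdAt: z.coerce.date(), // parsed into a Date, not left as a string
  })
  .refine((c) => !c.serverActive || c.port !== undefined, {
    message: "port is required when the server is active",
  });

type Config = z.infer<typeof Config>;
// { serverActive: boolean; port?: number; createdAt: Date }

const raw = '{"serverActive": true, "port": 8080, "createdAt": "2024-01-01"}';
const config: Config = Config.parse(JSON.parse(raw)); // throws ZodError on bad input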
Or in Haskell!
So maybe the reason why they were able to reduce the code is because they lost the ability to do good error reporting.
For parsing specifically, there's literature on error recovery to try to make progress past the error.
After parsing you check if `error_message` exists and raise that error.
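For what it's worth, fail-fast isn't forced by the parsing style; a sketch of error accumulation in TypeScript (types and helpers are hypothetical):

// Each field parser returns either a value or a list of errors, and the
// combinator collects every error instead of stopping at the first one.
type Parsed<T> = { ok: true; value: T } | { ok: false; errors: string[] };

function combine<A, B>(a: Parsed<A>, b: Parsed<B>): Parsed<[A, B]> {
  if (a.ok && b.ok) return { ok: true, value: [a.value, b.value] };
  return {
    ok: false,
    errors: [...(a.ok ? [] : a.errors), ...(b.ok ? [] : b.errors)],
  };
}
// e.g. combine(parsePort(argv), parseHost(argv)) reports both problems at once.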
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va... (2019, using Haskell)
https://www.lelanthran.com/chap13/content.html (April 2025, using C)
args:
username str # Required string
password str? # Optional string
token str? # Optional auth token
age int # Required integer
status str # Required string
username requires password // If username is provided, password must also be provided
token excludes password // Token and password cannot be used together
age range [18, 99] // Inclusive range from 18 to 99
status enum ["active", "inactive", "pending"]
Rad will handle all the validation for you; you can just write the rest of your script assuming the constraints you declared are met. Make the usage string the specification!
A criminally underused library.
A CLI and an API should indeed occupy the same layer of a program architecture, namely they are entry points that live on the periphery. But really all you should be doing there is lifting the low-level byte stream you are getting from users into something higher level you can use to call your internals.
So "CLI validation" should be limited to just "I need an int here, one of these strings here, optionally" etc. Stuff like "is this port out of range" or "if you give me this I need this too" should be handled by your internals by e.g. throwing an exception. Your CLI can then display that as an error message in a nice way.
But this is wrong. Programmers should be writing parsers all the time!
Don't get me wrong, I actually love writing parsers. It's just not required all that often in my day-to-day work. 99% of the time when I need to write a parser myself it's for an Advent of Code problem; usually I just import whatever JSON or YAML parser is provided for the platform and go from there.
Sometimes it's going down to machine code, or rolling your own hash table, or writing your own recursive-descent parser from first principles. But most of the time you don't have to reach that low, and things like parsing are but a minor detail in the grand scheme. The engineer should not spend time on building them, but should be able to competently choose a ready-made part.
I mean, creating your own bolts and nuts may be fun, but most of the time, if you want to build something, you just pick a few from an appropriate box, and this is exactly right.
TFA links to Alexis King’s Parse, Don’t Validate article, which explains this well. Did you not read it?
>> const port = option("--port", integer());
I don't understand. Why is this a parser? Isn't it just a way of enforcing a type in a language that doesn't have types?
I was expecting something like a state machine that takes the command line text and parses it to validate the syntax and values.
That might sound messy but to the author's point about parser combinators not being complicated, they really don't take much time to get used to, and they're quite simple if you wanted to build such a library yourself. There's not much code (and certainly no magic) going on under the hood.
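To give a sense of scale, here's a toy version (hypothetical, and nowhere near as capable as the article's actual library) of an argv-based parser combinator:

// A parser consumes tokens and either succeeds with a value plus the
// remaining tokens, or fails with an error message.
type Result<T> = { ok: true; value: T; rest: string[] } | { ok: false; error: string };
type Parser<T> = (argv: string[]) => Result<T>;

// Parse a named option followed by its value, e.g. ["--port", "3000"].
const option = (name: string): Parser<string> => (argv) => {
  const i = argv.indexOf(name);
  if (i === -1 || i + 1 >= argv.length) return { ok: false, error: `missing ${name} VALUE` };
  return { ok: true, value: argv[i + 1], rest: [...argv.slice(0, i), ...argv.slice(i + 2)] };
};

// Combinator: refine a parser's output, e.g. narrow string -> number.
const map = <A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> => (argv) => {
  const r = p(argv);
  return r.ok ? { ok: true, value: f(r.value), rest: r.rest } : r;
};

const port = map(option("--port"), Number);
console.log(port(["--port", "3000"])); // { ok: true, value: 3000, rest: [] }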
The advantage of that parsing approach:
It's reasonably declarative. This seems like the author's core point. Parser-combinator code largely looks like just writing out the object you want as a parse result, using your favorite combinator library as the building blocks, and everything automagically works, with amazing type-checking if your language has such features.
The disadvantages:
1. Like any parsing approach, you have to actually consider all the nuances of what you really want parsed (e.g., conditional rules around whitespace handling). It looks a little to me (just from the blog post, not having examined the inner workings yet) like this project side-stepped that by working with the `Stream` type as just the `argv` list, allowing you to be able to say things like "parse the next blob as a string" without also having to encode whitespace and blob boundaries.
2. It's definitely slower (and more memory-intensive) than a hand-rolled parser, and usually also worse in that regard than other sorts of "auto-generated" parsing code.
For CLI arguments, especially if they picked argv as their base stream type, those disadvantages mostly don't exist. I could see it performing poorly for argv parsing for something like `cp` though (maybe not -- maybe something like `git cp`, which has more potential parse failures from delimiters like `--`?), which has both options and potentially ginormous lists of files; if you're not very careful in your argument specification then you might have exponential backtracking issues, and where that would be blatantly obvious in a hand-rolled parser it'll probably get swept under the rug with parser combinators.
This approach IMNSHO is much cleaner than the entanglement of cmdline parser libraries with application logic and application-domain-related types.
Then one can specify validation logic declaratively, and apply it generically.
This has the added benefit - for a compiled rather than an interpreted library - of not having to recompile the CLI parsing library for each different app and each different definition of options.
It's a genuine pleasure to use, and I use it often.
If you dig a little deeper into it, it does all the type and value validation, file validation, it does required and mutually exclusive args, it does subargs. And it lets you do special cases of just about anything.
And of course it does the "normal" stuff like short + long args, boolean args, args that are lists, default values, and help strings.
The result is that you often still see this kind of defensive programming, where argparse ensures that an invariant holds, but other functions still check the same invariant later on, because they might have been called a different way or just because the developer isn't sure whether everything was already checked by the time control reaches where they are in the program.
What I think the author is looking for is a combination of argparse and Pydantic, such that when you define a parser using argparse, it automatically creates the relevant Pydantic classes that define the type of the parsed arguments.
You need a boundary to convert nice opts into nice types. Like pydantic models could take argparse namespace and convert it to something manageable.
Not quite that, but https://typer.tiangolo.com/ is fully type driven.
This function parses a number in 6502 asm. So `255` in dec or `$ff` in hex: https://github.com/geon/dumbasm/blob/main/src/parsers/parseN...
I looked at several typescript libraries but they all felt off. Writing my own at least ensured I know how it works.
Why CLIs in particular? Because they usually are smaller tools. For a big, important tool, you might be willing to jump through more hoops (installing the right runtime), but for a smaller, less important tool, it's just not worth it.
This project looks neat, I've never thought to use parser combinators for something other than left-to-right string/token stream parsing.
And I like how it uses TypeScript's metaprogramming to generate types from the parser code. I think that would be much harder (or impossible) in other languages, making the idiomatic design of a similar library very different.
I use Effect CLI https://github.com/Effect-TS/effect/tree/main/packages/cli for the same reasons. It has the advantage of fitting within the ecosystem. For example, I can reuse existing schemas.
"Well, I already know this is a valid uuid, so I don't really need to worry about sql injection at this point."
Sure, this is a dumb thing to do in any case, but I've seen this exact thing happen.
Typesafety isn't safety.
The quote here — which I suspect is a straw man — is such a weird non sequitur. What would logically follow from “I already know this is a valid UUID” is “so I don’t need to worry about this not being a UUID at this point”.
Even in languages like Haskell, "safety" is an illusion. You might create a NumberGreaterThanFive type with smart constructors but that doesn't stop another dev from exporting and abusing the plain constructor somewhere else.
For the most part it's fine to assume the names of types are accurate, but for safety critical operations it absolutely makes sense to revalidate inputs.
"options that depend on options" should not be a thing. Every option should be optional. Even if you have working code that can handle some complex situation, this doesn't make the situation any less unintuitive for the users.
If you need more complex relationships, consider using arguments as well. Top level, or under an option. Yes, they are not named, but since they are mandatory anyway, you are likely to remember their meaning (spaced repetition and all that). They can still be optional (if they come last). Sometimes an argument may need to have multiple parts, like user@host:port. You can still parse it instead of validating, if you want.
> mutually exclusive --json, --xml, --yaml.
Use something like -t TYPE instead, where TYPE can be one of json, xml, or yaml. (Make illegal states unrepresentable.)
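Sketched in TypeScript (the option name and values are just for illustration): the value is a closed union, so "two formats at once" simply cannot be expressed and never needs checking for.

type OutputFormat = "json" | "xml" | "yaml";

// Parse the -t TYPE value into the union, rejecting everything else.
function parseFormat(raw: string): OutputFormat {
  if (raw === "json" || raw === "xml" || raw === "yaml") return raw;
  throw new Error(`invalid -t value: ${raw} (expected json|xml|yaml)`);
}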
> debug: optional(option("--debug")),
Again, I believe it's called "option" because it's meant to be optional already.
optional(optional(option("--common-sense")))
EOR

What would you do for "top level option, which can be modified in two other ways"?
(--option | --option-with-flag1 | --option-with-flag2 | --option-with-flag1-and-flag2)
would solve invalid representation, but is unwieldy.

Something that results in the usage string

[--option [--flag1 --flag2]]

doesn't seem so bad at that point.

--option flag1,flag2

(Maybe with another separator, as long as it doesn't need to be escaped.)

Another possibility is to make the main option an argument, like the subcommands in git, systemctl, and others:

command option --flag1 --flag2

This depends on the specifics, though.

In Python this was a motivating factor for letting functions demand their arguments be passed as named keywords. Something like send("foo", "bar") is easier to understand and call correctly when you have to say send(channel="foo", message="bar")
Makes sense, I think a lot of developers would want to complect this problem with their runtime type system of choice without considering the set of downsides for the users
I was recently thinking about how type safety and validation strategies are particularly thorny in languages where the typings are just annotations, e.g. the TypeScript/Zod or Python/Pydantic universes, especially in IO cases where the data doesn't originate in the same type system.
In a language like Go (just an example, not endorsing) if you parse something into say a struct you know worst case you're getting that struct with all the fields set to zero, and you just have to handle the zero values. In typescript-likes you can get a totally different structure and run into all sorts of errors.
All that is to say, the runtime validation is always somewhere (perhaps in the library, as they often are?), and the feature here isn't no runtime validation but typed cli arguments. Which is cool and great.
In the field I work in, zero values are valid, and doing it in Go would be a nightmare.
":3000" -> use port 3000 with a default host.
"some-host" -> use host with a default port.
"some-host:3000" -> you guess it.
It also allows you to extend it to other sources/destinations, like Unix domain sockets and other stuff, without cluttering your CLI options.
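A sketch of that endpoint idea in TypeScript (names and defaults are made up; IPv6 literals are ignored for brevity):

interface Endpoint { host: string; port: number }

// ":3000" -> default host; "some-host" -> default port; "some-host:3000" -> both.
function parseEndpoint(raw: string, defaults: Endpoint): Endpoint {
  const i = raw.lastIndexOf(":");
  if (i === -1) return { host: raw, port: defaults.port };
  const host = raw.slice(0, i) || defaults.host;
  const port = Number(raw.slice(i + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in ${raw}`);
  }
  return { host, port };
}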
Also, please consider using a DSN or URI to define database configurations. Host, port, dbname, and credentials as separate options or environment variables are quite painful to use.
In short, a great article.
"Invalid data? The parser rejects it. Done."
"That validation logic that used to be 30% of my CLI code? Gone."
"Mutually exclusive groups? Sure. Context-dependent options? Why not."
For me this really piled on at the end of the blog post. But maybe it's just personal style too.