Compile-time JSON deserialization in C++

(medium.com)

108 points · dctwin · 1y ago · 88 comments

49 comments · 9 top-level

stephc_int13 · 1y ago · 13 in thread

I am afraid of the compile-time cost.

For this kind of thing I tend to prefer using a simpler program (written in anything you like) to generate C or C++ instead of having the compiler do the same thing much more slowly.

Meta programming can be good, but it is even better done with an actual meta program, IMO.

chipdart · 1y ago

> I am afraid of the compile-time cost.

Even though compilation time is the bane of C++, I think this concern regarding this specific usage is grossly overblown. I'm going to tell you why.

With incremental builds you only rebuild whatever has changed in your project. Embedding JSON documents in a C++ app is the kind of thing that is rarely touched, especially if all your requirements are met by deserializing docs at compile time. This means that this deserialization will only rarely be rebuilt, and only under two scenarios: full rebuild, and touching the file.

As far as full rebuilds go, there is no scenario where deserializing JSON represents a relevant task in your build tree.

As for touching the file, if for some weird and unbelievable reason the build step for the JSON deserialization component is deemed too computationally expensive, it's trivial to move this specific component into a subproject that's built independently. This means that the full cost of an incremental build boils down to a) rebuilding your tiny JSON deserialization subproject, b) linking. Step a) runs happily in parallel with any other build task, thus its impact is meaningless.

To read more on the topic, google for "horizontal architecture", a concept popularized by the book "Large-Scale C++: Process and Architecture, Volume 1" by John Lakos.

Mountain out of a molehill.

OskarS · 1y ago

There is another scenario where this is an issue: if this code ends up in a header which is included in a lot of places. You might say "that's dumb, don't do that", but there is a real tendency in C++ for things to migrate into headers (because they're templates, because you want them to be aggressively inlined, for convenience, whatever), and then headers get included into other headers, then without knowing it you suddenly have disastrous compile times.

Like, for this particular example, you might start out with a header that looks like:

    SomeData get_data_from_json(std::string_view json);

with nothing else in it, everything else in a .cpp file.

Then somebody comes around and says "we'd like to reuse the parsing logic to get SomeOtherData as well" and your nice, one-line header becomes

    template<typename Ret>
    Ret get_data_from_json(std::string_view json) {
        // .. a gazillion lines of template-heavy code
    }

which ends up without someone noticing it in "CommonUtils.hpp", and now your compiler wants to curl up in a ball and cry every time you build.

It takes more discipline than you think across a team to prevent this from happening, mostly because a lot of people don't take "this takes too long to compile" as a serious complaint if it involves any kind of other trade-off.
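One way to keep the templated version out of every includer, sketched under the assumption that the set of return types is known up front: leave only the declaration in the header and explicitly instantiate the template in the .cpp file. The struct layouts, file names and the stand-in body are made up for illustration, and all three "files" are shown in one block.

```cpp
#include <cassert>
#include <string_view>

// ---- json_data.hpp (hypothetical): cheap to include ----
struct SomeData      { int    value; };
struct SomeOtherData { double value; };

template <typename Ret>
Ret get_data_from_json(std::string_view json);  // declaration only

// ---- json_data.cpp (hypothetical): the expensive part, compiled once ----
template <typename Ret>
Ret get_data_from_json(std::string_view json) {
    // Stand-in for the "gazillion lines": just stores the input length.
    return Ret{static_cast<decltype(Ret::value)>(json.size())};
}

// Explicit instantiations: the only versions other files can link against.
template SomeData      get_data_from_json<SomeData>(std::string_view);
template SomeOtherData get_data_from_json<SomeOtherData>(std::string_view);

// ---- caller.cpp (hypothetical): needs only the header part ----
inline void caller_example() {
    assert(get_data_from_json<SomeData>("{}").value == 2);
}
```

The trade-off is that every new return type needs its own instantiation line in the .cpp file, which is exactly the kind of friction that tempts people to move the body back into the header.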

adolph · 1y ago

Brings to mind the old story about a JSON DSL

https://thedailywtf.com/articles/the-inner-json-effect

threatripper · 1y ago

Is this real? It can't be real. Nobody can be this stupid. But then again it takes a special kind of person who doesn't understand satire to actually do something like that. Somebody about whom they would say "we trained him wrong on purpose as a kind of a joke".

kazinator · 1y ago

> generate C or C++ instead of having the compiler do the same thing much more slowly

That's a wild-assed guess. A JSON decoder right in the compiler could easily be faster than generation involving extra tool invocations and multiple passes.

Also, if you use ten code generators for ten different features in a pipeline instead of ten compile-time things built into the language, will that still be faster? What if most files use just one or two features? You have to pass them through all the generators just in case; each generator decides whether the file contains anything that it knows how to expand.

pjc50 · 1y ago

> You have to pass them through all the generators just in case; each generator decides whether the file contains anything that it knows how to expand.

The C# approach for this is that code generators operate as compiler plugins (and therefore also IDE plugins, so if you report an error from the code generator it goes with all the other compile errors). There is a two-pass approach where your plugin gets to scan the syntax tree quickly for "might be relevant" and then another go later; the first pass is cached.

A limitation of the plugin approach is that your codegen code itself has to be in a separate project that gets compiled first.

An argument in favor of separate-codegen is that if it breaks you can inspect the intermediate code, and indeed do things like breakpoints, logging and inspection in the code generator itself. The C++ approach seems like it might be hard to debug in some situations.

Joker_vD · 1y ago

> A JSON decoder right in the compiler could easily be faster than generation involving extra tool invocations and multiple passes.

It also can easily be slower: C++ templates are not exactly known for their blazingly fast compilation speed. Besides, the program they encode in this case is effectively being interpreted by the C++ compiler which, I suppose, is not really optimized for that: it's still mostly oriented around emitting optimized machine code.

jayd16 · 1y ago

I'm not taking sides but I don't think a code-gen tool necessitates re-scanning the entire codebase every compile. gRPC would be a good example.

dctwin (OP) · 1y ago

Yes, I agree. I don't see much practical use in this. I was just surprised how (relatively) straightforward this is to do, and thought it was more cool than useful.

silon42 · 1y ago

Often I also find the opposite problem ... sure, you can do some stuff in (c++) metaprogramming, but can you (at compile time) generate a JSON/XML/YAML file that can be fed to some other part of the system?

chipdart · 1y ago

> Yes, I agree. I don't see much practical use in this.

Me too. The best example I can come up with is loading test data in automated tests, but even then I wouldn't use this sort of approach.

ulrikrasmussen · 1y ago

I like how code generation is typically done in Kotlin using KSP. Here you write your code generator as a plugin to the compiler, so you have the full expressivity of any JVM language you like. It also operates on the parsed and resolved AST, so you can analyze even derived types. It also allows code generators to run on code which has type errors or even fails to resolve some symbols which is very useful when you generate code from class annotations and then proceed to use the generated code later in the same file.

Another advantage of using KSP is that it also handles caching for you and will avoid running code generators again if the output already exists.

ranger_danger · 1y ago

> I am afraid of the compile-time cost.

Still better than Rust /s

dctwin (OP) · 1y ago · 9 in thread

Hello! I wrote this short blog post about using pattern-matching-like template metaprogramming to deserialize JSON at build time - please let me know what you think (especially if you see improvements)

worstspotgain · 1y ago

As someone who used to have to do this sort of compile-time stuff with previous versions of the standard, I'm jealous of how much more can be done now that I don't have to.

If you're looking for an interesting follow-up project, here's something I had to do once that's now become much easier: compute a compile-time hash of the compilation for the current translation unit, e.g. __BASE_FILE__ hashed together with __TIMESTAMP__ or the equivalents for each platform.

This allows you to dynamically invalidate on-disk caches and trigger new-build tripwires based on ongoing revisions. Development and release builds are handled identically: if source file X handles a cache and X was recompiled, discard the cache.
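A minimal sketch of that idea, assuming FNV-1a as the hash (any constexpr hash would do) and the standard `__FILE__` in place of `__BASE_FILE__`. `__TIMESTAMP__` is a compiler extension available in GCC, Clang and MSVC that expands to the last modification time of the current source file. `fnv1a`, `kBuildHash` and `cache_is_stale` are hypothetical names, and the snippet is meant to live in the .cpp file that owns the cache.

```cpp
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a; the seed parameter lets two strings be hashed together.
constexpr std::uint64_t fnv1a(std::string_view text,
                              std::uint64_t hash = 0xcbf29ce484222325ULL) {
    for (char c : text) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 0x100000001b3ULL;
    }
    return hash;
}

// Changes whenever this source file is renamed or modified.
constexpr std::uint64_t kBuildHash = fnv1a(__TIMESTAMP__, fnv1a(__FILE__));

// Hypothetical check: compare against a hash stored next to the on-disk cache.
constexpr bool cache_is_stale(std::uint64_t stored_hash) {
    return stored_hash != kBuildHash;
}
```

The cache writer would store `kBuildHash` alongside the data, and the reader would discard the cache whenever `cache_is_stale` returns true.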

dctwin (OP) · 1y ago

Thanks for the idea! Yes, constexpr std::vector feels like cheating

whizzter · 1y ago

It's cute and neat to be able to do it 100% constexpr, however as you mention the indexers feel a tad inelegant.

I've written 2 iterations of a reflection library where you needed to annotate structs slightly with an ugly macro but once done you could just do: Message msg; if (parse_json(str,msg)) { ..process msg struct.. }

The previous iterations were for C++11 and C++17 but it seems that with C++20 features you don't even need the macro ugliness, so I personally think libraries need to move in the direction of plain old structs.

svalorzen · 1y ago

I recently actually tried to do a very similar thing, although a bit tighter in scope. What stopped me was that deserializing floating-point numbers cannot currently be done at compile time; the only utility available to do so is `from_chars`, and it is only constexpr for ints.

I did not see any mention of this in the post; so are you actually simply extracting the string versions of the numbers, without verifying or deserializing them?

dctwin (OP) · 1y ago

I was able to do the primitive

    long double result = 0.0;
    while (...) {
      if (json[head] == '.') ...
      result *= 10; result += json[head] - '0';
    }

in a constexpr function with no problem :)
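A fuller version of that loop, as a minimal sketch: `parse_decimal` is a hypothetical name, not the article's code, and it assumes plain decimal input with no exponent. It collects all digits into one value and divides once by a power of ten, so short inputs such as `-1.5` come out exact. It makes no attempt at correctly rounding long inputs, which is the hard part that `from_chars` solves.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not taken from the article. Handles an optional '-',
// integer digits and an optional fraction; exponents ("1e5") are not handled.
constexpr long double parse_decimal(std::string_view json, std::size_t head = 0) {
    bool negative = false;
    if (head < json.size() && json[head] == '-') {
        negative = true;
        ++head;
    }

    long double digits = 0.0L;   // all digits read so far, ignoring the '.'
    long double divisor = 1.0L;  // 10^(number of fractional digits)
    bool in_fraction = false;

    for (; head < json.size(); ++head) {
        const char c = json[head];
        if (c == '.' && !in_fraction) {
            in_fraction = true;
        } else if (c >= '0' && c <= '9') {
            digits = digits * 10 + (c - '0');
            if (in_fraction) divisor *= 10;
        } else {
            break;  // first character that is not part of the number
        }
    }

    const long double value = digits / divisor;
    return negative ? -value : value;
}

// Evaluated by the compiler; a wrong result would fail the build.
static_assert(parse_decimal("42") == 42.0L, "integers are exact");
static_assert(parse_decimal("-1.5") == -1.5L, "15 / 10 is exactly representable");
```

Because the function is an ordinary `constexpr` function, the same code also works on runtime strings.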

emmanueloga_ · 1y ago

The concept reminds me of F#'s "Type Providers" [1].

In terms of the implementation ... I feel like C++ is best when used in an "orthodox style" and minimizing the use of templates as much as possible.

1: https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/ty...

goodcanadian · 1y ago

My experience with template meta-programming: it is hardly ever useful, but on those rare occasions when it is, it is magical!

cobbal · 1y ago

small note: "JSON in its pure form anarchic" is missing a verb

dctwin (OP) · 1y ago

Thank you!

forrestthewoods · 1y ago · 6 in thread

I think the value of compile-time JSON deserialization is... well I was going to say zero but really it's negative. It's a cute trick, but please don't ever do this in a real project.

dctwin (OP) · 1y ago

Despite my writing the article I agree

beached_whale · 1y ago

So I have owned a library for 6 years or so that does constexpr JSON to data structures, JSON Link. There are a few benefits and in the near future with #embed it gets even better. The big benefit is that we can now get earlier errors and do testing in constexpr that gives more guarantees around the areas of core UB, and most implementations add constexpr-checked preconditions to the std library too. But just because it is marked constexpr doesn't mean it will be run at compile time. This also limits the shenanigans that the library dev can do to get potential perf and work around design limitations.

In JSON Link's case, since it was using C++17 at the time, it forced me to think around the problem of allocation and who does it. The library does not allocate, but potentially the data structures being deserialized to will. In C++20 you can get limited constexpr allocations but they are good for things like stacks and eliminating the fixed buffers many devs have used in the past; which is a good thing on its own but isn't really allowing one to parse to a vector at compile time (as in OP's example) for things that persist.

Where this will get really interesting, though, is when #embed is in the major compilers. It's mostly there in clang, with gcc on the way I believe. It will open the door for DSLs and compile-time configs in human-readable formats or interop with other tools (maybe GUI designers).

As for OP's library, I am not a fan of the json_value-like library approach that treats JSON as a thing to care about when it is usually just an implementation detail on the way to one's business objects.

TL;DR The big benefit though, is the ability to reason about the quality of the code in the library and have stronger testing.
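The caveat above, that `constexpr` alone does not force compile-time evaluation, can be sketched like this. `count_commas` is a made-up stand-in for a parser; the same function is evaluated by the compiler only where the language demands a constant expression. (C++20's `consteval` would make a compile-time-only intent explicit, but it is left out to keep the sketch valid C++17.)

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a parser: counts commas in its input.
constexpr std::size_t count_commas(std::string_view json) {
    std::size_t n = 0;
    for (char c : json) {
        if (c == ',') ++n;
    }
    return n;
}

// A constexpr variable needs a constant initializer, so this call is
// evaluated by the compiler.
constexpr std::size_t kCommas = count_commas("[1,2,3]");

// A static_assert doubles as a test that runs on every build.
static_assert(kCommas == 2, "three elements are separated by two commas");

// Same function, but the argument is only known at run time, so this is an
// ordinary runtime call despite the constexpr keyword.
inline std::size_t count_at_runtime(std::string_view input) {
    return count_commas(input);
}
```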

forrestthewoods · 1y ago

That's a slightly more interesting use case. But "detect data errors at code compile-time via ultra complex comptime template metaprogramming" does not strike me as a particularly good idea. There are much better, easier ways to detect data errors. And the value of baking JSON data into an executable (even if it's transformed) is highly suspect imho.

pragma_x · 1y ago

Since you've done this for real in a library, I have to ask: how would you decide to use a compile-time template solution like this versus a code generator or some other "outboard" tool to generate code?

I'm curious since I've gone back and forth on this in my own career. Both approaches come with their own pros and cons, but each gets us to the same place.

dctwin (OP) · 1y ago

You're right to point out that this is really 'first class JSON', rather than the Pydantic/Jackson type thing where the json barely exists and is immediately transformed into your models and classes.

Thanks for reading the article though, that's cool. I am a daw_json_link fan

delfinom · 1y ago

You clearly never wrote a Fizz Buzz enterprise grade application ;)

nikeee · 1y ago · 3 in thread

Could this be leveraged to emit a parser that is specialized for the provided type that can be used at runtime? Afaik .NET does something like that using code generators.

The advantage being that the parser is tailored to the specific type that is deserialized and it writes directly to the struct's fields instead of going through some dictionary.

leni536 · 1y ago

This lib does something like that:

https://github.com/beached/daw_json_link

nikeee · 1y ago

It seems that you have to maintain hand-coded mappings for each type. Maybe this could be solved by using compile-time reflection, which is proposed for C++26 rather than being part of C++23.

dctwin (OP) · 1y ago

Yes, the non-constexpr version does just that, unless I misunderstood your question. See also boost::spirit for a 'big' version of this

anothername12 · 1y ago · 3 in thread

Not a C++ user, but is this the same as the #. reader macro in Common Lisp?

kazinator · 1y ago

Also, this:

  (defmacro macro-time (&rest forms)
    `(quote ,(eval `(progn ,@forms))))

forms are evaluated at macro-expansion-time, and their result is quoted, and substituted for the (macro-time ...) invocation.

For instance, if we have a snarf-file function which reads a text file and returns the contents as a string, we can do:

  (macro-time (snarf-file "foo.txt"))

and we now have the contents of foo.txt as a string literal.

heisig · 1y ago

Yes, the #. reader macro is one of the ways you can achieve this in Common Lisp. Using the reader macro is also way more efficient because you don't awkwardly use your compiler as an interpreter for a weird subset of your actual language - you simply call compiled code.

Seeing Greenspun's tenth rule [1] in action again and again is one of the weird things we Common Lisp programmers have to endure. I wish we would have more discussions on how to improve Lisp even further instead of trying to 'fix' C or C++ for the umpteenth time.

[1] https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

ykonstant · 1y ago

>I wish we would have more discussions on how to improve Lisp even further instead of trying to 'fix' C or C++ for the umpteenth time.

I agree one million percent; projects like SBCL are great, but my impression is that there are tons of improvements to be had in producing optimized code for modern processors (cache friendliness, SIMD, etc), GPU programming etc. I asked about efforts in those directions here and there, but did not get very clear answers.

nikki93 · 1y ago · 2 in thread

I use this static reflection hack in C++ -- https://godbolt.org/z/enh8za4ja

You do have to tag struct fields with a macro, but you can attach constexpr-visitable attributes. There's also a static limit to how many reflectable fields you can have, all reflectable fields need to be at the front of the struct, and the struct needs to be an aggregate.

gpderetta · 1y ago

that forEachProp function... it brings back nightmares of when, before variadics, we used to macro generate up-to N-arity functions (with all the const/non-const permutations).

Now I use the same trick in our code base to generically hash aggregates, but I limit it to 4 fields for sanity.

asguy · 1y ago

Holy crap; that's pretty epic. Did you come up with that yourself?

abbeyj · 1y ago · 2 in thread

Could you use something like `template <StringLiteral str> constexpr inline Key<str> key;`? Then you could write `key<"myKey">` instead of `Key<"myKey">{}`, saving you from needing the `{}` each time.

dctwin (OP) · 1y ago

Oh I think I finally grokked what you suggested! Something like

    template <StringLiteral str> static constexpr Key<str> key

in the class namespace - I think this would work, but if I understand this correctly, you would need to do something like

    User user {...}; user[User::key<"myKey">]

Which actually is not so bad...

dctwin (OP) · 1y ago

Hm - so this would instantiate a variable for each key in the class namespace? I admit I haven't seen anything like this but sounds very interesting

fsloth · 1y ago · 2 in thread

What a beautiful example of abuse of C++ templates. I love it.

But please don’t do this in production.

Whatever you need to do, use C++ templates as the last resort because you’ve figured out all other approaches suck even more. Maintaining template-heavy code is absolutely horrible and wasteful (and if it’s C++ production code we measure its lifetime in decades). And no, there is no way “to do it correctly so it doesn’t suck”.

Templates belong to the lowest abstraction levels - as the STL mostly does. Anything more prevalent is an abomination.

If the schema is fixed, have types with the data and if you have default data, provide it using initializer lists.

I.e., have a struct or structs with explicit serializeToJson and deserializeFromJson functions.

It’s faster to write than figuring out the correct template gymnastics and about 100x easier to maintain and extend.
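A minimal sketch of that plain-struct style, assuming a made-up `WindowConfig` schema. Only the two function names come from the comment above; the `sscanf`-based parser is a deliberately naive stand-in that accepts nothing but the exact layout the serializer writes, where a real project would call a JSON library.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical fixed schema; the defaults live in the type itself.
struct WindowConfig {
    int width  = 800;
    int height = 600;
};

inline std::string serializeToJson(const WindowConfig& c) {
    return "{\"width\":" + std::to_string(c.width) +
           ",\"height\":" + std::to_string(c.height) + "}";
}

// Stand-in parser: only understands the key order written by serializeToJson.
inline std::optional<WindowConfig> deserializeFromJson(const std::string& json) {
    WindowConfig c;
    if (std::sscanf(json.c_str(), "{\"width\":%d,\"height\":%d}",
                    &c.width, &c.height) != 2) {
        return std::nullopt;
    }
    return c;
}

// Usage: a round trip through the two explicit functions.
inline void round_trip_example() {
    const WindowConfig original{1024, 768};
    const auto parsed = deserializeFromJson(serializeToJson(original));
    assert(parsed.has_value());
    assert(parsed->width == 1024 && parsed->height == 768);
}
```

Every field appears by name in both functions, which is verbose but keeps the mapping easy to step through in a debugger.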

qsdf38100 · 1y ago

This reminds me of some coworkers I had that moaned anytime they saw 'template' in some code. They were convinced that templates were just bad, and that they should stay in the STL. Some of these people would then proceed to use void* and enums to perform the same computations (the C++ haters kind), or use virtual functions all over the place (the java-background kind). Not only was the result much more fragile (compile errors are now runtime errors), but it would also prevent inlining, not to mention the dynamic memory allocation fest.

fsloth · 1y ago

No, don’t abuse the language.

Wrap everything in types and all is fine.

Verbose C++ is the only good kind of C++. Why?

C++ code needs to be debuggable and modifiable so when a profiler shows hotspots, you know where they are coming from and react appropriately.

How do you fix a hotspot on one line of code somewhere in the middle of a template thingamajig?

You don’t. You need to unroll the template code to untemplated code and then fix the hotspot.

Cases where you don’t need to fix one-line hotspots because the resources consumed by the code are irrelevant are fine. But if performance does not matter it likely means you should use a better language than C++.

Using C++ and not caring about performance finetuning is the worst of both worlds - you are using a cumbersome language AND it’s not even for any practical benefit.

actionfromafar · 1y ago

Where are the functional language programmers so I can hold their beer?
