For this kind of thing I tend to prefer using a simpler program (written in anything you like) to generate C or C++ instead of having the compiler do the same thing much more slowly.
Meta programming can be good, but it is even better done with an actual meta program, IMO.
Even though compilation time is the bane of C++, I think this concern regarding this specific usage is grossly overblown. I'm going to tell you why.
With incremental builds you only rebuild whatever has changed in your project. Embedding JSON documents in a C++ app is the kind of thing that is rarely touched, especially if all your requirements are met by serializing docs at compile time. This means that this deserialization will only rarely be rebuilt, and only under two scenarios: a full rebuild, and touching the file.
As far as full rebuilds go, there is no scenario where deserializing JSON represents a relevant task in your build tree.
As for touching the file, if for some weird and unbelievable reason the build step for the JSON deserialization component is deemed too computationally expensive, it's trivial to move this specific component into a subproject that's built independently. This means that the full cost of an incremental build boils down to a) rebuilding your tiny JSON deserialization subproject, b) linking. Step a) runs happily in parallel with any other build task, thus its impact is meaningless.
To read more on the topic, google for "horizontal architecture", a concept popularized by the book "Large-Scale C++: Process and Architecture, Volume 1" by John Lakos.
Mountain out of a molehill.
Like, for this particular example, you might start out with a header that looks like:
SomeData get_data_from_json(std::string_view json);
with nothing else in it, everything else in a .cpp file.
Then somebody comes around and says "we'd like to reuse the parsing logic to get SomeOtherData as well" and your nice, one-line header becomes
template<typename Ret>
Ret get_data_from_json(std::string_view json) {
// .. a gazillion lines of template-heavy code
}
which ends up, without anyone noticing, in "CommonUtils.hpp", and now your compiler wants to curl up in a ball and cry every time you build.
It takes more discipline than you think across a team to prevent this from happening, mostly because a lot of people don't take "this takes too long to compile" as a serious complaint if it involves any kind of other trade-off.
That's a wild-assed guess. A JSON decoder right in the compiler could easily be faster than generation involving extra tool invocations and multiple passes.
Also, if you use ten code generators for ten different features in a pipeline instead of ten compile-time things built into the language, will that still be faster? What if most files use just one or two features? You have to pass them through all the generators just in case; each generator decides whether the file contains anything that it knows how to expand.
The C# approach for this is that code generators operate as compiler plugins (and therefore also IDE plugins, so if you report an error from the code generator it goes with all the other compile errors). There is a two-pass approach where your plugin gets to scan the syntax tree quickly for "might be relevant" and then another go later; the first pass is cached.
A limitation of the plugin approach is that your codegen code itself has to be in a separate project that gets compiled first.
An argument in favor of separate-codegen is that if it breaks you can inspect the intermediate code, and indeed do things like breakpoints, logging and inspection in the code generator itself. The C++ approach seems like it might be hard to debug in some situations.
It also can easily be slower: C++ templates are not exactly known for their blazingly fast compilation speed. Besides, the program they encode in this case is effectively being interpreted by the C++ compiler which, I suppose, is not really optimized for that: it's still mostly oriented around emitting optimized machine code.
Me too. The best example I can come up with is loading test data in automated tests, but even then I wouldn't use this sort of approach.
Another advantage of using KSP is that it also handles caching for you and will avoid running code generators again if the output already exists.
Still better than Rust /s
If you're looking for an interesting follow-up project, here's something I had to do once that's now become much easier: compute a compile-time hash of the compilation for the current translation unit, e.g. __BASE_FILE__ hashed together with __TIMESTAMP__ or the equivalents for each platform.
This allows you to dynamically invalidate on-disk caches and trigger new-build tripwires based on ongoing revisions. Development and release builds are handled identically: if source file X handles a cache and X was recompiled, discard the cache.
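A rough sketch of that idea, assuming FNV-1a as the hash (the comment above doesn't say which one was used) and with `fnv1a`, `BUILD_HASH_FILE`, `BUILD_HASH_STAMP` and `kBuildHash` being names made up for illustration. `__BASE_FILE__` and `__TIMESTAMP__` are compiler extensions, so the sketch falls back to the standard `__FILE__`, `__DATE__` and `__TIME__` where they are missing:

```cpp
#include <cstdint>

// 64-bit FNV-1a over a NUL-terminated string, usable in constant expressions.
// The second parameter lets one hash be chained into the next.
constexpr std::uint64_t fnv1a(const char* s,
                              std::uint64_t h = 14695981039346656037ull) {
    while (*s != '\0') {
        h ^= static_cast<unsigned char>(*s++);
        h *= 1099511628211ull;
    }
    return h;
}

// __BASE_FILE__ is a GCC/Clang extension; fall back to the standard __FILE__.
#if defined(__BASE_FILE__)
#define BUILD_HASH_FILE __BASE_FILE__
#else
#define BUILD_HASH_FILE __FILE__
#endif

// __TIMESTAMP__ is also an extension; fall back to __DATE__ and __TIME__.
#if defined(__TIMESTAMP__)
#define BUILD_HASH_STAMP __TIMESTAMP__
#else
#define BUILD_HASH_STAMP __DATE__ " " __TIME__
#endif

// One value per translation unit, computed entirely at compile time.
constexpr std::uint64_t kBuildHash =
    fnv1a(BUILD_HASH_STAMP, fnv1a(BUILD_HASH_FILE));
```

A cache header could then store `kBuildHash` and be discarded whenever the stored value no longer matches. Note that `__TIMESTAMP__` reflects the source file's modification time rather than the time of compilation, which may or may not be what you want.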
I've written 2 iterations of a reflection library where you needed to annotate structs slightly with an ugly macro but once done you could just do: Message msg; if (parse_json(str,msg)) { ..process msg struct.. }
The previous iterations were for C++11 and C++17, but it seems that with C++20 features you don't even need the macro ugliness, so I personally think libraries need to move in the direction of plain old structs.
I did not see any mention of this in the post; so are you actually simply extracting the string versions of the numbers, without verifying nor deserializing them?
long double result = 0.0;
while (...) {
if (json[head] == '.') ...
result *= 10; result += json[head] - '0';
}
in a constexpr function with no problem :)
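For anyone who wants to try it, here is a minimal self-contained sketch along the same lines. It is not the article's code: `parse_decimal` is a made-up name, and it assumes plain decimals only (optional leading minus, optional fraction, no exponent, no validation beyond stopping at the first unexpected character).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: parses text like "-12.5" in a constant expression.
constexpr long double parse_decimal(std::string_view json) {
    std::size_t head = 0;
    bool negative = false;
    if (head < json.size() && json[head] == '-') {
        negative = true;
        ++head;
    }
    long double result = 0.0L;
    long double scale = 1.0L;  // 10^(number of digits after the point)
    bool seen_point = false;
    for (; head < json.size(); ++head) {
        const char c = json[head];
        if (c == '.' && !seen_point) {
            seen_point = true;
            continue;
        }
        if (c < '0' || c > '9') break;  // stop at the first non-digit
        result *= 10;
        result += c - '0';
        if (seen_point) scale *= 10;
    }
    result /= scale;
    return negative ? -result : result;
}

// Evaluated by the compiler, not at run time.
static_assert(parse_decimal("42") == 42.0L);
```

Accumulating the digits as an integer and dividing once at the end keeps values like 12.5 or 0.25 exact; a real parser would also need exponents and proper error reporting.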
In terms of the implementation ... I feel like C++ is best when used in an "orthodox style" and minimizing the use of templates as much as possible.
--
1: https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/ty...
In JSON Link's case, since it was using C++17 at the time, it forced me to think around the problem of allocation and who does it. The library does not allocate, but potentially the data structures being deserialized to will. In C++20 you can get limited constexpr allocations, but they are good for things like stacks and eliminating the fixed buffers many devs have used in the past, which is a good thing on its own but doesn't really allow one to parse to a vector at compile time (as in OP's example) for things that persist.
Where this will get really interesting, though, is when #embed is in the major compilers. It's mostly there in clang, with gcc on the way I believe. It will open the door for DSLs and compile-time configs in human-readable formats, or interop with other tools (maybe GUI designers).
As for OP's library, I am not a fan of the json_value-like library approach that treats JSON as a thing to care about when it is usually just an implementation detail on the way to one's business objects.
TL;DR: the big benefit, though, is the ability to reason about the quality of the code in the library and have stronger testing.
I'm curious since I've gone back and forth on this in my own career. Both approaches come with their own pros and cons, but each gets us to the same place.
Thanks for reading the article though, that's cool. I am a daw_json_link fan
The advantage being that the parser is tailored to the specific type that is deserialized and it writes directly to the struct's fields instead of going through some dictionary.
(defmacro macro-time (&rest forms)
`(quote ,(eval `(progn ,@forms))))
forms are evaluated at macro-expansion-time, and their result is quoted, and substituted for the (macro-time ...) invocation.
For instance, if we have a snarf-file function which reads a text file and returns the contents as a string, we can do:
(macro-time (snarf-file "foo.txt"))
and we now have the contents of foo.txt as a string literal.
Seeing Greenspun's tenth rule [1] in action again and again is one of the weird things we Common Lisp programmers have to endure. I wish we would have more discussions on how to improve Lisp even further instead of trying to 'fix' C or C++ for the umpteenth time.
I agree one million percent; projects like SBCL are great, but my impression is that there are tons of improvements to be had in producing optimized code for modern processors (cache friendliness, SIMD, etc), GPU programming etc. I asked about efforts in those directions here and there, but did not get very clear answers.
You do have to tag struct fields with a macro, but you can attach constexpr-visitable attributes. There's also a static limit to how many reflectable fields you can have, all reflectable fields need to be at the front of the struct, and the struct needs to be an aggregate.
Now I use the same trick in our code base to generically hash aggregates, but I limit it to 4 fields for sanity.
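For the curious, a sketch of how that kind of trick can look in plain C++17. This is my own guess at the general technique (count fields by probing brace-initialization, then unpack with structured bindings), not the code from either library mentioned above. All names (`AnyField`, `BraceInit`, `field_count`, `for_each_field`, `hash_aggregate`, `Point`) are hypothetical, and it assumes flat aggregates with integral fields only; nested aggregates and arrays would confuse the counting.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Pretends to convert to any field type; only used in unevaluated contexts.
struct AnyField {
    template <typename T>
    operator T() const;  // declared only, never called
};

// True when T{Args...} is a valid brace-initialization.
template <typename T, typename = void, typename... Args>
struct BraceInit : std::false_type {};
template <typename T, typename... Args>
struct BraceInit<T, std::void_t<decltype(T{std::declval<Args>()...})>, Args...>
    : std::true_type {};

// Number of fields of a flat aggregate; 5 stands for "more than 4".
template <typename T>
constexpr std::size_t field_count() {
    static_assert(std::is_aggregate_v<T>, "only aggregates are supported");
    using A = AnyField;
    if constexpr (BraceInit<T, void, A, A, A, A, A>::value) return 5;
    else if constexpr (BraceInit<T, void, A, A, A, A>::value) return 4;
    else if constexpr (BraceInit<T, void, A, A, A>::value) return 3;
    else if constexpr (BraceInit<T, void, A, A>::value) return 2;
    else if constexpr (BraceInit<T, void, A>::value) return 1;
    else return 0;
}

// Calls f on each field in declaration order, via structured bindings.
template <typename T, typename F>
constexpr void for_each_field(const T& v, F&& f) {
    constexpr std::size_t n = field_count<T>();
    static_assert(n <= 4, "limited to 4 fields for sanity");
    if constexpr (n == 1) { const auto& [a] = v; f(a); }
    else if constexpr (n == 2) { const auto& [a, b] = v; f(a); f(b); }
    else if constexpr (n == 3) { const auto& [a, b, c] = v; f(a); f(b); f(c); }
    else if constexpr (n == 4) { const auto& [a, b, c, d] = v; f(a); f(b); f(c); f(d); }
}

// FNV-1a style mixing of every field (integral fields assumed).
template <typename T>
constexpr std::uint64_t hash_aggregate(const T& v) {
    std::uint64_t h = 14695981039346656037ull;
    for_each_field(v, [&h](const auto& field) {
        static_assert(std::is_integral_v<std::decay_t<decltype(field)>>,
                      "this sketch only handles integral fields");
        h ^= static_cast<std::uint64_t>(field);
        h *= 1099511628211ull;
    });
    return h;
}

struct Point { int x; int y; int z; };  // example aggregate
static_assert(field_count<Point>() == 3);
```

The counting has to start from the largest arity, because an aggregate with three fields also accepts one or two initializers.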
template <StringLiteral str> static constexpr Key<str> key
in the class namespace - I think this would work, but if I understand this correctly, you would need to do something like
User user {...}; user[User::key<"myKey">]
Which actually is not so bad...
But please don’t do this in production.
Whatever you need to do, use C++ templates as the last resort because you’ve figured out all other approaches suck even more. Maintaining template-heavy code is absolutely horrible and wasteful (and if it’s C++ production code we measure its lifetime in decades). And no, there is no way “to do it correctly so it doesn’t suck”.
Templates belong to the lowest abstraction levels - as the STL mostly does. Anything more prevalent is an abomination.
If the schema is fixed, have types with the data and if you have a default data, provide it using initializer lists.
Ie. have a struct or structs with explicit serializeToJson and deserializeFromJson functions.
It’s faster to write than figuring out the correct template gymnastics and about 100x easier to maintain and extend.
Wrap everything in types and all is fine.
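To make that concrete, a deliberately boring sketch of what I mean. The `User` type and its two fields are invented for the example, and the parsing assumes a fixed key order, no string escapes and a non-empty name shorter than 64 characters, which is only reasonable when you control the schema:

```cpp
#include <cstdio>
#include <string>

// Plain type holding the data; defaults come from member initializers.
struct User {
    int id = 0;
    std::string name = "anonymous";
};

// Explicit, hand-written serialization: one line per field.
std::string serializeToJson(const User& u) {
    return "{\"id\":" + std::to_string(u.id) +
           ",\"name\":\"" + u.name + "\"}";
}

// Explicit deserialization for the fixed schema above.
// Returns false and leaves `out` untouched when the input does not match.
bool deserializeFromJson(const std::string& json, User& out) {
    int id = 0;
    char name[64] = {};
    char closing = '\0';
    // Spaces in the format match any amount of whitespace, including none.
    const int matched = std::sscanf(
        json.c_str(), " { \"id\" : %d , \"name\" : \"%63[^\"]\" %c",
        &id, name, &closing);
    if (matched != 3 || closing != '}') return false;
    out.id = id;
    out.name = name;
    return true;
}
```

Nothing clever happens here, which is the point: a breakpoint or a profiler sample lands on a line you can read and change directly.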
Verbose C++ is the only good kind of C++. Why?
C++ code needs to be debuggable and modifiable so when a profiler shows hotspots, you know where they are coming from and react appropriately.
How do you fix a hotspot on one line of code somewhere in the middle of a template thingamajig?
You don’t. You need to unroll the template code to untemplated code and then fix the hotspot.
Cases where you don’t need to fix one-line hotspots because the resources consumed by the code are irrelevant are fine. But if performance does not matter it likely means you should use a better language than C++.
Using C++ and not caring about performance finetuning is the worst of both worlds - you are using a cumbersome language AND it’s not even for any practical benefit.