I made a mistake in the article, where I meant to write "there are at least 2 solutions and here is one example". I will fix this. Nice catch.
3821+0468=04289 as well.
If anyone is interested: An accessible python module for this is python-constraint. Gecode is a powerful, modern C++ CP Framework.
I recommend reading Knuth's The Art of Computer Programming: Volume 4 Fascicle 5. Knuth covers the creation of word-puzzles such as crossword puzzles (embedding words into a 2-dimension plane that fits).
Knuth also covers the "Create a comma-free code" problem to a high amount of detail, which probably is most related to your question of "creating puzzles of a certain type".
TAOCP Fascicle 5 is pretty short but is a very difficult read due to the difficulty of the material. But it really does cover what you're asking for.
Given that one is typically limited to at most 10 different letters (since each letter must have a unique corresponding digit), the number of possible word combinations is reasonably limited as well.
According to the website below, there are 5,454 four letter words, 12,478 five letter words and 22,157 six letter words (assuming one stops at six letter words).
https://www.bestwordlist.com/6letterwords.htm
If we allow well known names (sports stars, movie stars, presidents etc) then this list might become very large.
"Maintaining Arc Consistency" is actually asymptotically equivalent (in the same O(n^2) complexity class as FC), but textbooks only list AC3 typically (which is suboptimal but pretty easy to understand. I'd compare it to "insertion sort"). AC4 is difficult to program and understand (especially difficult to extend beyond binary constraints), but worst-case asymptotically optimal.
Today: there's AC2001, a simple extension to AC3, which has best-case attributes similar to AC3 but is worst-case asymptotically optimal. Its also "obvious" how to extend into GAC (aka: the GAC2001 algorithm) to handle non-binary constraints.
This is an NP-complete problem, so there's really no way to know if FC, MAC (with AC2001), or even Path-consistency (a consistency level above MAC) is optimal. I guess the rule of thumb is that if there's "lots of solutions" out there, then you want to spend less time checking: and maybe FC is better.
If there are "very very few solutions", then MAC (and even Path-consistency) may be superior. A stronger check leads to cutting more of the search tree away, but requires a bigger-and-bigger preprocessor at every step.
The textbooks I've read seem to default to FC. But given the discovery of AC2001, I think the modern default probably should be MAC using AC2001.
----------------
Or in the "Sudoku terms", how long do you want to spend searching the tree, vs how long do you want to spend proving that the current search isn't "hopeless" ??
The longer you ensure that your "current choice" is "not hopeless", the longer your best-case search takes. The less time you spend on checking for "hopelessness", the longer the worst-case search takes (because you'll run into more hopeless situations but still spend tons of time searching through it).
FC, MAC (aka 2-consistency), Path Consistency (aka 3-consistency), and 4-Consistency, 5-consistency, 6-consistency... you can arbitrarily spend as much time as you like on checking for hopelessness. If you have 99-variables (like the 99-locations in Sudoku), checking for 99-consistency is equivalent to solving the puzzle in every possible way (If multiple solutions are possible, 99-consistency would find them all!!!). As such, 100-consistency cannot exist if you only have 99-variables, so 99-consistency is the limit on Sudoku.
Your "Big O" is something like (O(d^k)), where d is the size of domains (9 in Sudoku), and k is the consistency level you're aiming for. So O(9^2-ish) is what you're looking at for FC and MAC. O(9^99) is what you're looking at for 99-consistency. So aiming for higher levels of consistency scales poorly to say the least... and textbooks only briefly mention Path (3-consistency) and k-consistency in passing.
In practice, if you're in fact trying to "find all solutions", the brute force 99-variable == 99-consistency is suboptimal. In practice, you might find a 55-consistency relationship that's backtrack free (exponentially faster than 99-consistency), but this involves solving some rather nasty graph problems. Its absolutely worth the tradeoff but there be dragons here. (Finding the optimal ordering of variables means solving for the minimum-induced width of an induced graph of the variables, which is an NP-complete problem)
--------
There's also DAC: Directional Arc Consistency. Which is "more consistent" than FC but "less consistent" than full Arc Consistency (aka: MAC/Maintaining Arc Consistency). A naive implementation of DAC will be asymptotically optimal.
Needless to say: researchers have spent a lot of time looking at various "consistency" benchmarks for these backtracking problems.
With that said, sometimes shaving (also called singleton arc consistency) is used, by testing all literal assignments and pruning those that directly give failure. See https://www.inf.tu-dresden.de/content/institutes/ki/cl/study... for an application of this to Sudoku
I'm not too familiar with many kinds of constraints. But the constraints I am familiar with such as "Global Cardinality Constraint" (which is a generalization upon all_different) reaches arc-consistency after a basic "maximal flow" algorithm is applied to it.
Indeed: proving that the maximum-flow algorithm over global-cardinality-constraint (aka all_different) reaches arc-consistency proves that you can reach arc-consistency over the whole problem set by iterating over the fixed point.
-------
In effect, we quickly learn to reach arc-consistency using a different algorithm (ex: maximum flow) over the "all_different(x1, …, xn)".
You're right in that "arc-consistency classic" is not being done here. But the effect is the same. Maximum-flow(all_different(x1, x2... xn)) will lead to arc-consistency with the all_different constraint.
Now that you're "locally arc-consistent", you iterate over "a+b<=c" and make that constraint arc-consistent. If the domains are restricted again, you run the maximum-flow algorithm over all_different(x1, x2... xn) and achieve arc-consistency again. (Bonus points: the "last time" you ran maximum_flow is still relevant and serves to accelerate the 2nd run of the maximum_flow problem. That residual graph is still correct!! You just need to slightly update it given the new restrictions on the domains)
-------
In case you're curious: the all_different constraint being arc-consistent with maximum flow is pretty fun to think about. All variables are on one-side of a bipartite graph, while all assignments are on the 2nd side of a bipartite graph. Connect all variables to assigments that match their domains.
Connect a "super-source" to all variables, and "super-sink" to all values. Super-sink has a value of "1" for each connection to each variable (aka: each variable can only be assigned once).
Solve for maximum flow from super-source to the super-sink. Now that you have one particular maximum-flow (a possible assignment), it is possible to determine all other assignments by iterating the possibilities where the flow goes but remains maximum.