I'm curious, what kind of error are we talking about? Words that don't exist, or another solution to a problem that may not have a unique solution?
- There was one with a name that suggested an Alice in Wonderland connection, and it had an answer "THE LOOKING GLASS" (no spaces) running vertically down the full length of the center of the grid.
Every across answer that was entirely to the left of that was written normally. Every across answer entirely to the right was written backwards. Every across answer that crossed the center was a palindrome centered on the center.
- There was one where several answers were triple Spoonerisms of well known phrases.
For example, the answer "THE STUCK HOPS BEER" for the clue "Tagline in an ad for Elmer's Glue-Ale". Rotate the ST from STUCK, the H from HOPS, and the B from BEER and you get "THE BUCK STOPS HERE".
- I remember one that had a few isolated black squares, and a theme that suggested the puzzle had something to do with roundabouts.
Those black squares were roundabouts. Answers would hit the roundabout and continue after a 90 degree turn.
- I remember one where the theme was something like "What goes up must come down". It had several answers that, like the roundabout one above, took a 90 degree turn from across, but it was a left turn, so after the turn they went up. Then there would be a down answer whose start was the reversed end of that across answer; it would come down, also make a left turn, and continue across to the right.
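To make the looking-glass rule from the first example concrete, here is a minimal Python sketch of how an across answer would be entered. The function name, the 0-based column indexing, and the sample words are my own assumptions for illustration, not anything from the actual puzzle.

```python
def looking_glass_entry(answer: str, start_col: int, center_col: int) -> str:
    """Return the letters as they appear in the grid, read left to right.

    Hypothetical helper: columns are 0-based and `center_col` is the
    column holding the vertical "mirror" answer.
    """
    end_col = start_col + len(answer) - 1
    if end_col < center_col:
        return answer          # entirely left of the mirror: written normally
    if start_col > center_col:
        return answer[::-1]    # entirely right of the mirror: written backwards
    # The answer touches the mirror column, so it has to be a palindrome
    # whose middle letter sits exactly on that column.
    centered = start_col + end_col == 2 * center_col
    if not centered or answer != answer[::-1]:
        raise ValueError("answer crossing the center must be a centered palindrome")
    return answer


# In a 15-wide grid the center column is index 7.
print(looking_glass_entry("ALICE", 0, 7))    # left side, unchanged
print(looking_glass_entry("ALICE", 9, 7))    # right side, reversed
print(looking_glass_entry("RACECAR", 4, 7))  # centered palindrome
```

The palindrome case is what makes the grid look the same in the "mirror": reversing a centered palindrome leaves every square unchanged.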
I'm interested in your magical solution because this is a hard problem.
My favorite example of using this is the NY Times crossword from 1996-11-05, the day of the presidential election, with the clue "Lead story in tomorrow's newspaper (!), with 43-Across". The puzzle worked with either "CLINTON" or "BOBDOLE" in the grid.
However, with a combination of a bad clue and a bit of bad luck it does sometimes happen that an experienced crossword solver has to take a guess before sending in their solution. I'm not an experienced crossword solver, but I've seen them at work, and I would guess that finding a good alternative solution happens less than one time in 50. That's just a guess, though. For a better estimate one should systematically compare the solutions produced by competent crossword solvers with the official solutions.
Of course, people don't like to play crosswords like sudoku, blindly matching letters, so there does have to be some skill in the clue design to keep the number of plausible answers relatively small, even if the answer isn't entirely unique.
I wrote it because I wanted to make my dad (a huge nyt crossword fan) a custom crossword for his birthday. I put in a bunch of phrases related to him and our family and let the program fill in the rest. It was a huge hit, never really went back to it though. Anyone know if anything else like this exists?
Especially if you can generate them from arbitrary lists of clues:words.
The trick is to have a good word list… I’m working on one for German, but it’s a bit of an undertaking and I guess pretty subjective.
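For a sense of why the word list matters so much, here is a rough Python sketch of the lookup a fill program does for each slot. This is not the program mentioned above; the function name, the scores, and the `min_score` cutoff are all made up for illustration.

```python
import re


def candidates(pattern: str, scored_words: dict[str, int], min_score: int = 50) -> list[str]:
    """Words that fit a slot pattern ('.' = empty square), best score first.

    Hypothetical helper: `scored_words` maps each word to a subjective
    quality score, and anything below `min_score` is never offered.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = [
        word
        for word, score in scored_words.items()
        if score >= min_score and regex.fullmatch(word)
    ]
    # Highest score first, alphabetical order as a tie-breaker.
    return sorted(hits, key=lambda w: (-scored_words[w], w))


# Invented scores: the obscure entry matches the pattern but is filtered out.
WORDS = {"AREA": 60, "ARIA": 70, "ALOE": 55, "ATOM": 80, "ANOA": 20}
print(candidates("A..A", WORDS))  # ['ARIA', 'AREA']
```

A real filler would call something like this inside a backtracking search over all slots, so the scores decide both which fills are possible and which ones get tried first. That is where the subjectivity comes in.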
This is really hard for supervised learning - the reward is quite sparse (e.g. did you get it right / how many characters did you get right) but the task is reasonably complex (e.g. it would have to learn how to spot/execute arbitrary length anagrams on its own, already something that is nontrivial for ML). Sparse + complex usually means gradient descent will fail or converge to a more trivial minimum, e.g. only looking for synonyms in the clue.
I reckon you would have to codify the different types of cryptic clues manually for this to work well.
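As a rough idea of what codifying one clue type by hand could look like, here is a Python sketch for anagram clues only. The indicator list, the function name, and the sample clue are all invented for illustration, and it assumes the fodder sits directly next to the indicator.

```python
# Illustrative only: real indicator lists are far longer.
ANAGRAM_INDICATORS = {"drunk", "mixed", "broken", "confused", "scrambled"}


def anagram_candidates(clue: str, length: int, word_list: list[str]) -> list[str]:
    """Word-list entries that are anagrams of the words next to an indicator."""
    # Lowercase each word and drop punctuation such as commas.
    words = ["".join(c for c in w if c.isalpha()) for w in clue.lower().split()]
    words = [w for w in words if w]
    found = []
    for i, word in enumerate(words):
        if word not in ANAGRAM_INDICATORS:
            continue
        # Possible fodder: runs of words ending just before the indicator,
        # or starting just after it.
        runs = [words[s:i] for s in range(i)]
        runs += [words[i + 1:e] for e in range(i + 2, len(words) + 1)]
        for run in runs:
            fodder = "".join(run)
            if len(fodder) != length:
                continue
            for cand in word_list:
                c = cand.lower()
                if c != fodder and sorted(c) == sorted(fodder) and cand not in found:
                    found.append(cand)
    return found


# Made-up clue: "broken" is the indicator, "tinsel" the fodder.
print(anagram_candidates("Quiet, broken tinsel", 6, ["SILENT", "LISTEN", "STREAM"]))
# ['SILENT', 'LISTEN']
```

Note that the rule alone still leaves two candidates. Picking SILENT over LISTEN needs the definition half of the clue ("Quiet"), which is the part a hand-written rule doesn't give you.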
https://unlikely.ai/cryptic-crossword-genius-unlikely-ai-art... has a little more detail.
https://crosswordgenius.com/clue/who-may-get-drunk-with-real... example solution with anagram.
An unsupervised learner could probably do reasonably well at picking up on those patterns, even to the point of considering that a word it hasn’t seen used as an anagram signifier before might be doing that job in a particular clue.
On the other hand, machine translation has moved away from using tagged parts of speech as far as I’m aware, and it has nonetheless managed to develop sufficient internal modeling that it is as if it has learned parts-of-speech tagging; it’s possible that an ML model trained on a cryptic clue corpus could develop those same hidden models.
yet from the wiki:
> Probabilities for individual words or phrases in the puzzle are computed using relatively simple statistical techniques based on features such as previous appearances of the clue, number of Google hits for the fill. https://en.wikipedia.org/wiki/Dr.Fill
I don't think it's fair to describe it as a closed system if it's Googling stuff, even for pruning.
Then an editor will refine the clues to fit whatever criteria they may have for publication.