Instead you could do this as
\documentclass{article}
\usepackage{xparse}
\NewDocumentCommand \LambdaCalc {u{.} r()} {%
[arg:(#1) body:(#2)]
}
\DeclareUnicodeCharacter {03BB} {\LambdaCalc}
\begin{document}
λx.(2x)
\end{document}

> these are predefined as `\@firstoftwo` and `\@secondoftwo`
I do wish LaTeX kernel commands (which I'm assuming these are) were more widely documented. As it stands, it's pretty hard to keep track of what already exists. Is there a nice reference for those?
> Also the Unicode bytes are already active so setting their catcode is useless.
This is true for LaTeX and not TeX, correct? Originally, I'd `\expandafter\let\expandafter\@firstoct\@firstoftwoλ`, but I decided not to assume that that character was already active.
> Also redefining the first octet breaks LaTeX's UTF-8 handling...
How so? (If the else case wasn't broken)
> ...and the else case forms an infinite loop.
If `\Firstλ` was not an active character, would this still be true? Since I store `\Firstλ` in `\lambda@first@oct` before it's declared an active character.
> and it breaks other uses of `(` and `)` in the argument.
This is not a concern for the DSL, but...
> Changing the catcodes of `(` and `)` means that this command doesn't work in the arguments of other commands
...this is. Thanks.
> Instead you could do this as
Damn :)
Thanks for the nice feedback. I suppose I should read up on xparse. In any case I feel it's still worthwhile to try to achieve the same results with primitives, to have some idea of what's breaking when a given program doesn't compile (usually at that point the primitives surface).
Not really; the traditional commands are rather messy. Of course you can read source2e, but that's not really documentation. For new stuff it often makes sense to write the more programmy parts in expl3, which is much better documented in interface3. (It contains these commands as `\use_i:nn` and `\use_ii:nn`.)
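For the record, these selectors are one-liners; roughly (the actual kernel definitions live in source2e, and the expl3 names are documented in interface3):

```latex
% The classic kernel two-argument selectors, essentially:
\long\def\@firstoftwo#1#2{#1}
\long\def\@secondoftwo#1#2{#2}

% expl3 equivalents:
%   \use_i:nn  {A}{B}  expands to  A
%   \use_ii:nn {A}{B}  expands to  B
```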
> > Also the Unicode bytes are already active so setting their catcode is useless.
>
> This is true for LaTeX and not TeX, correct?
Right, this is LaTeX specific.
> > Also redefining the first octet breaks LaTeX's UTF-8 handling...
>
> How so? (If the else case wasn't broken)
LaTeX's definition of the first byte handles arbitrary valid UTF-8 continuation bytes, either using the corresponding codepoint's definition or printing a correct error. Even a replacement definition which didn't re-trigger the active character would just print the two bytes: that gives no useful error message, probably prints two random characters from the font, and completely ignores any definitions made via LaTeX's mechanism for other codepoints starting with this byte.
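To make this concrete (the byte values below are standard UTF-8; the dispatching behaviour is LaTeX's inputenc mechanism):

```latex
% λ (U+03BB) is the byte pair 0xCE 0xBB; μ (U+03BC) is 0xCE 0xBC.
% LaTeX makes the lead byte ^^ce active, and its definition reads the
% continuation byte to select the right codepoint (or report a proper
% "invalid UTF-8" error). Redefining ^^ce for λ alone therefore also
% hijacks μ, ν, ξ, ... — every codepoint in U+0380–U+03BF shares that
% lead byte.
```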
> > ...and the else case forms an infinite loop.
>
> If `\Firstλ` was not an active character, would this still be true? Since I store `\Firstλ` in `\lambda@first@oct` before it's declared an active character.
You are correct: if the first byte weren't already an active character (e.g. in plain TeX), then it wouldn't loop. It wouldn't expand to anything particularly useful, but that wouldn't be any worse than without the definition, so it would be "correct".
> I suppose I should read up on xparse.
Normally `xparse` is preloaded and no longer a package, so its documentation has also been moved into usrguide3. In this case you still need the package, though, since the `d` argument type has not been added to the kernel (and therefore also not to usrguide3), because delimited arguments are not recommended for LaTeX commands. It's still documented in the old `xparse` manual. Just in case you're wondering about the split.
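A minimal sketch of the `d` type, assuming the standalone `xparse` package is loaded (the command name here is made up for illustration):

```latex
\usepackage{xparse}
% d() = optional argument delimited by ( and ); yields -NoValue- if absent
\NewDocumentCommand \Pair { d() } {%
  \IfNoValueTF {#1} {no parenthesized argument}{got: #1}%
}
% \Pair(a,b)  typesets "got: a,b"
% \Pair x     typesets "no parenthesized argument", then x
```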
Half of the post is about handling UTF-8, which AFAIK both LuaTeX and XeTeX (you really should use either) do natively.
LuaTeX and XeTeX usually aren't an option where LaTeX comes up, i.e., in academic submissions. This is a common discussion, see [the comments under my previous post].
> LaTeX is great for typesetting math.
Q: Ok, great! So how do I typeset this bit of common math?
A: a 20-line barrage of import statements, `\makeatletter`s, and definitions that you copy-paste into your preamble, crossing your fingers that it won't conflict with the half-dozen other barrages you copied there to do other bits of common math, often hidden among other Google results with wildly different answers.
About the posted article: if all one wanted to do was "typeset this bit of common math", one can just type `\lambda x.(2x)` in math mode. Or, if not constrained to keep it old school (i.e. pdfTeX), use XeTeX/LuaTeX with `\usepackage{unicode-math}` to type `λ x.(2x)` directly.
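For comparison, a minimal sketch of the unicode-math route (compile with XeLaTeX or LuaLaTeX):

```latex
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
$λ x.(2x)$
\end{document}
```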
The posted article is actually about doing some parsing using TeX, namely the author wants to type "λ x.(2x)" into their .tex file and have it be parsed into, say, [arg:(x) body:(2x)] to be used later for whatever they're building. This is not related to typesetting at all, so why do they want to do such a thing in TeX, instead of doing it outside and using TeX just for typesetting? The motivation seems to be, as their footnote 2 indicates, that some people just enjoy being perverse. That's fine!
Even there, if you compare the author's approach with that in the comment here https://news.ycombinator.com/item?id=33296527 (by someone who knows what they're doing; cf. https://www.latex-project.org/about/team/), you'll see how the "right" way is less forbidding-looking, and also less breakage-prone. What's going on is that the author has just learned something new (how Unicode is handled in pdfTeX even though it only works with 8-bit bytes), become excited at the possibilities, and hacked their own solution using the primitives, without bothering to integrate with the broader ecosystem of other packages and conventions — which is also fine; TeX will let you do that and not get in the way.
The really interesting question raised by your comment, IMO, is not about the posted article at all but about experiences such as yours: I can easily imagine many people doing what you did (not understanding the context, and possibly even copying ad-hoc code like this into their document and crossing their fingers). Here we start to get into the actual problems with the LaTeX ecosystem and the mismatch between users' mental models and those of the (too many!) pieces of software involved. But I've exceeded the time limit I set for commenting here, so I'll stop :)
And as I said, doing things one's own way with the primitives is perfectly fine… I too have participated in my share of TeX perversity and doing things it wasn't designed for (example: https://tex.stackexchange.com/a/403353); I guess it's one of the things that attracts people like us to such an old system. :-)
Too much to ask for I guess. Continue waiting.
There are very few bits of software that are more arcane and broken by default than this absolute crapstraction of a platform.
And that is exactly why a more programmable platform would be good! These issues arise in the first place largely because TeX is not easily programmable, so people have to find arcane workarounds to do anything complex in it.
My understanding, from reading some of the history, is that in the early 1980s, when Lamport wrote the set of TeX macros that became LaTeX, TeX had already spread like wildfire among math/CS departments, and it so happened that TeX itself was a more widely available and reliable (see "trip test") programming "language"/platform than any actual programming language. (This was when C was yet to become available outside of a handful of labs and universities, and what most people had was a variety of mediocre Pascal compilers that supported differing subsets of Pascal and extensions, which is why Knuth wrote TeX in a very tiny subset of Pascal extended with his own preprocessing system (WEB/tangle/weave) — and still it is said that compiling and testing TeX uncovered at least one bug in every Pascal compiler encountered.)
So at that time, it made sense to "do everything in TeX", and LaTeX's approach of hiding a lot of insane macro-hell complexity behind innocent-looking boxes, an approach since taken up by zillions of "packages" of varying quality, leads us to the situation we're in today.
But there's hope: with LuaTeX inside the TeX world, and things like markdown/pandoc outside of it, people are slowly beginning to get used to doing the programming part in an actual programming language (Lua or whatever), and using TeX merely for typesetting, in which area it is still good / a reasonable and not-too-weird piece of software.
This has allowed both to grow in functionality, year after year. Code written by users wanting particular capabilities has led to so many fantastic enhancements. No one user or committee could anticipate the many ways that these programs could be extended. Although it would be nice if these systems could be cleaned up and modernized, this is only a dream. In practice, nobody has the time to start over on these enormous bodies of work comprising millions of lines of code.
It's too bad, because it's a joy to use either of these systems once you've spent enough time reading their extensive documentation and puttered around with them for a few years, but their eccentricities and anachronisms keep them out of the hands of many users.