But presuming that wasn't the critical point you wanted to make:
Like I said, a language model can know that "1" "is less than" "2" — and it can also know (if it's either trained with characters as lexemes, or is given access to a pre-parse output to second-chance analyze unknown tokens) that "10" is the same thing as (1 tens). Which then means that it can know that "23" "is less than" "48" because it can do linguistic deductive tricks between the terms (2 tens plus 3 ones) and (4 tens plus 8 ones).
But those tricks are tricks. It isn't doing math; it's applying "2" as an adjective to "tens", constructing a verb phrase whose verb is "plus", and then (likely) interpreting your question as a question about analogy. It knows that (2 pineapples) "is less than" (3 pineapples) by analogy — (N of some unit) "is analogous to" N-the-number. But it doesn't know that "tens" is a special unit distinct from "pineapples" in that it changes the meaning of the number-token it's attaching to.
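To make the kind of trick I mean concrete, here is a rough sketch (purely illustrative, and not a claim about how any real model works internally). The names `LESS_THAN`, `UNITS`, `decompose` and `is_less_than` are all made up for this example. The only "knowledge" is a memorized table of single-digit facts; everything else is rewriting tokens into (digit, unit) phrases and comparing them term by term. Note that the sketch has to be handed the ordering of the units from outside, which is exactly the part that distinguishes "tens" from "pineapples".

```python
# Memorized single-digit facts: the only "is less than" knowledge assumed.
DIGITS = "0123456789"
LESS_THAN = {(a, b) for i, a in enumerate(DIGITS) for b in DIGITS[i + 1:]}

# Hypothetical unit names, smallest first. Their ordering is supplied by us,
# not deduced by the "model".
UNITS = ["ones", "tens", "hundreds", "thousands"]


def decompose(token):
    """Rewrite a digit string like "23" as [("2", "tens"), ("3", "ones")]."""
    if not token or len(token) > len(UNITS) or any(d not in DIGITS for d in token):
        raise ValueError("unsupported token: %r" % (token,))
    n = len(token)
    return [(digit, UNITS[n - 1 - i]) for i, digit in enumerate(token)]


def is_less_than(a, b):
    """Compare two equal-length digit strings using table lookups only."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles tokens of equal length")
    for (digit_a, _), (digit_b, _) in zip(decompose(a), decompose(b)):
        if (digit_a, digit_b) in LESS_THAN:
            return True   # first differing term decides the answer
        if (digit_b, digit_a) in LESS_THAN:
            return False
    return False          # every term matched, so the tokens are equal


print(decompose("23"))           # the (2 tens plus 3 ones) phrase
print(is_less_than("23", "48"))  # decided by looking up ("2", "4")
```

At no point does this turn "23" into a quantity. It gets the comparison right without anything you could add to, multiply, or feed into another algorithm.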
To put it another way: a (pure) language model has no way of encoding numbers that allows it to actually do math and get correct results out. It can memorize tables of answers for well-known numbers, and it can try to use language tricks to combine those tables, but it can't perform an algorithm on a number, because no part of its architecture allows the nodes in its model to act as a register that encodes an (arbitrarily large) number in a form that numeric operations can actually be performed on.
A model that is really modelling numbers should be able to apply any arbitrary algorithm it knows about to those numbers, just like a regular CPU can apply any instruction sequence it reads to its registers. Not just add/sub, or mul/div, but arbitrarily complex things like iterated modular exponentiation should just be a matter of saying "hey LLM, you remember the algorithm for doing MOD-EXP, right? So tell me...."
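For contrast, this is roughly what "performing an algorithm on a number" looks like when the values really do live in registers. It is a minimal sketch using the standard square-and-multiply method. The function names are mine, and reading "iterated" as "feed each result back in as the next base" is my assumption.

```python
def mod_exp(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by square-and-multiply."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("need modulus > 0 and exponent >= 0")
    result = 1 % modulus      # the "accumulator register"
    base %= modulus
    while exponent > 0:
        if exponent & 1:      # low bit set: multiply the accumulator in
            result = (result * base) % modulus
        base = (base * base) % modulus  # square for the next bit
        exponent >>= 1
    return result


def iterated_mod_exp(base, exponent, modulus, rounds):
    """Apply mod_exp repeatedly, using each result as the next base."""
    value = base
    for _ in range(rounds):
        value = mod_exp(value, exponent, modulus)
    return value


print(mod_exp(4, 13, 497))
print(iterated_mod_exp(3, 5, 1000003, 4))
```

Every step here reads a number, transforms it, and writes it back, and it works the same for 3-digit or 300-digit inputs. No amount of memorized answer tables gives you that.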
(Note that humans can't do this kind of math purely "in our heads" any more than LLMs can, because we don't have any low-level accelerative infrastructure for modelling and working with numeric data either! We need an external buffer that inherently embeds sequencing/positioning info — like our auditory sensory "loop" memory from [sub]verbally repeating the working data; or our visual sensory persistence-of-vision memory, from writing the data down onto a piece of paper and staring at it as we work.)