His code here demonstrates why. It is both shorter and much easier to understand than anything the LLMs generated. It is not always as efficient as the LLM solutions (which often skip the third loop by computing the last factor directly), but it is definitely the code I would prefer to work with in most situations.
[1] https://www.udacity.com/course/design-of-computer-programs--...
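For the curious, the "skip the third loop" optimization mentioned above can be sketched like this (a minimal sketch; the function name is mine): enumerate the two smaller factors `a < b` and derive the third directly.

```python
def product_triples(n=108):
    # Enumerate the two smaller factors a < b, then compute the
    # third factor c = n // (a * b) directly, skipping the third loop.
    triples = []
    for a in range(1, n + 1):
        if n % a:
            continue
        for b in range(a + 1, n // a + 1):
            if (n // a) % b:
                continue
            c = n // (a * b)
            if c > b:  # keep only strictly increasing triples
                triples.append((a, b, c))
    return triples
```

This trades a little readability for dropping the innermost loop, which is exactly the trade-off discussed above.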
WITH all_numbers AS
(
    SELECT generate_series AS n
    FROM generate_series(1, 108) AS n
),
divisors AS
(
    SELECT *
    FROM all_numbers
    WHERE 108 % n = 0
),
permutations AS
(
    SELECT a.n AS n1, b.n AS n2, c.n AS n3
    FROM divisors AS a
    CROSS JOIN divisors AS b
    CROSS JOIN divisors AS c
)
SELECT *
FROM permutations
WHERE n1 * n2 * n3 = 108
  AND n1 < n2 AND n2 < n3
ORDER BY n1, n2, n3
https://codapi.org/embed/?sandbox=duckdb&code=data%3A%3Bbase...

It's amazing how he never uses the word "AI" once in this course, despite the fact that it is a straight-up AI course.
I revisit the course notes at least once a year and I still walk away with something new every time.
I think this makes sense. LLMs are trained on average code, so, on average, they produce average code.
from math import prod
from itertools import combinations

def find_products(k=3, N=108) -> set[tuple[int, ...]]:
    """The set of all ways in which `k` distinct positive integers have a product of `N`."""
    factors = {i for i in range(1, N + 1) if N % i == 0}
    return {ints for ints in combinations(factors, k) if prod(ints) == N}

find_products()

Edit: I see that he imported the typing Set, which is deprecated, instead of collections.abc.Set, which is still useful, so your comment is correct.
You can increase the compute by allowing more tokens for it to use as a "scratch pad", so the total compute available will be num_tokens * ops_per_token, but there is still an infinite number of problems you can ask that will not be computable within that constraint.
But you can offload computation by asking for a description of the computation instead of asking the LLM to compute it. I'm no mathematician, but I would not be surprised to learn that the above limit applies here as well in some sense (maybe there are solutions to problems that can't be represented in a reasonable number of symbols given our constraints; Kolmogorov complexity and all that). Still, for most practical (and beyond) purposes this is a huge improvement and should be enough for most things we care about. Letting the system describe the computation steps to solve a problem, executing that computation separately offline, and then feeding the result back if necessary is a necessary component if we want to do more useful things.
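A hypothetical sketch of that offloading loop, where `ask_model` stands in for any real LLM API call (the name and wiring are mine, not from any particular library):

```python
def solve_by_offloading(ask_model, problem):
    # Ask the model to *describe* the computation as code
    # rather than to perform it token by token.
    code = ask_model(f"Write a Python function solve() that answers: {problem}")
    namespace = {}
    exec(code, namespace)        # execute the described computation offline
    return namespace["solve"]()  # the result can be fed back to the model
```

In practice you would run the generated code in a sandbox rather than a bare `exec`, but the shape is the same: the model emits a program, and a conventional interpreter supplies the unbounded compute.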
llama3.1-70b, llama3.1-405b, deepseekcoder2.5, gemma-27b, mistral-large, qwen2.5-72b. https://gist.github.com/segmond/8992a8ec5976ff6533d797caafe1...
I like how the solution varies somewhat across most of them, though mistral and qwen look really similar.
There is a math term, "formal language", and both math and programming fit it perfectly: https://en.wikipedia.org/wiki/Formal_language
> Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or chemical equations, or musical notation, and sometimes a programming language is best. Written language is an amazing invention that has enabled human culture to build over the centuries (and also enabled LLMs to work). But human ingenuity has divised [sic] other notations that are more specialized but very effective in limited domains.
Of course, this is what Chomsky calls a “terminological dispute”, so I mean no offense and you’re ofc free to stand your ground that the only language is what appears in human brains! But if mathematical notation and programming languages aren’t languages, what are they…? Protocols? Recursively generative patterns? Maybe just grammars?
The best move in any terminological dispute is "my terms are more useful", so this seems like a good reason to keep language as it's defined by generative linguistics. Or, more broadly:
Saussure approaches the essence of language from two sides. For the one, he borrows ideas from Steinthal and Durkheim, concluding that language is a 'social fact'. For the other, he creates a theory of language as a system in and for itself which arises from the association of concepts and words or expressions. Thus, language is a dual system of interactive sub-systems: a conceptual system and a system of linguistic forms. Neither of these can exist without the other because, in Saussure's notion, there are no (proper) expressions without meaning, but also no (organised) meaning without words or expressions.
https://en.wikipedia.org/wiki/Theory_of_language

But if I must, I suppose I am indeed assuming a stance by taking potshots at this narrower (albeit more precise) use of the word language by (obliquely) pointing to counterexamples that could be considered languages in their own right; the sweeping claim that "language is not essential for thought" seems far broader than the narrow sense in which the term is construed in the actual paper.
I don't think that this is actually true at all (and modern neuroscience, sociolinguistics, etc. disagree pretty heavily with Chomsky), but it is impressive how far you can get with modelling natural language grammar with formal methods.
import itertools

def factors(n):
    result = set()
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0:
            result.add(i)
            result.add(n // i)
    return sorted(result)

def main():
    n = 108
    factors_list = factors(n)
    triplets = []
    for a_idx in range(len(factors_list) - 2):
        a = factors_list[a_idx]
        for b_idx in range(a_idx + 1, len(factors_list) - 1):
            b = factors_list[b_idx]
            for c_idx in range(b_idx + 1, len(factors_list)):
                c = factors_list[c_idx]
                if a * b * c == n and a < b < c:
                    triplets.append((a, b, c))
    print("All ways in which three distinct positive integers have a product of 108:")
    for triplet in triplets:
        print(triplet)

if __name__ == "__main__":
    main()
Run: $ python a.py
All ways in which three distinct positive integers have a product of 108:
(1, 2, 54)
(1, 3, 36)
(1, 4, 27)
(1, 6, 18)
(1, 9, 12)
(2, 3, 18)
(2, 6, 9)
(3, 4, 9)
Too smart for its own good, maybe? Still, apparently correct on the first try.

There's an import for `itertools` which isn't used, just as noted in the article for 4o.
Can someone who knows what it's about say how optimal this version is, compared to other answers?
Note the added section compared to the leetcode question (https://leetcode.com/problems/two-sum-ii-input-array-is-sort...) is "And note also that index1 + 1 == index2."
If you enter that into 4o it'll just keep repeating leetcode 167 answers and insist binary search has nothing to do with it.
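For what it's worth, binary search does apply here: with the array sorted ascending, the adjacent sums nums[i] + nums[i+1] are non-decreasing in i, so you can binary-search over i. A minimal sketch (the function name is mine):

```python
def adjacent_two_sum(nums, target):
    # nums is sorted ascending, so the adjacent sums
    # nums[i] + nums[i+1] are non-decreasing in i,
    # which makes a binary search over i valid.
    lo, hi = 0, len(nums) - 2
    while lo <= hi:
        mid = (lo + hi) // 2
        s = nums[mid] + nums[mid + 1]
        if s == target:
            return (mid + 1, mid + 2)  # 1-based indices, LeetCode style
        elif s < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

That extra "index1 + 1 == index2" constraint is exactly what restores the monotone structure binary search needs, which is presumably why the question was modified that way.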
Tangent: o1-preview isn't going to invent things quite yet (I tried to get it to come up with something original in https://chatgpt.com/share/6715f3fe-cfb8-8001-8c7c-10e970c7d3... out of curiosity after watching https://www.youtube.com/watch?v=1_gJp2uAjO0), but it can sort of reason about things that are more well-established.
On a dark note, I wonder if increasing AI reasoning capability could have dangerous results. Currently, LLMs are relatively empathetic and seem to factor the complex human experience into their responses. Would making LLMs more logical, cold, and calculating result in them stepping on things which humans care about?
There's definitely anecdata supporting this. For some time, ChatGPT was better at a lot of arithmetic/logic-type tasks if you asked it to generate a Python program to do X than if you just asked it to do X, for example. I haven't tested this specifically on the latest generation, but my feeling is it has caught up a lot.
Because the "intelligence" is borrowed from language (lower entropy)
I would place English (or all spoken/written languages in general) first, Math (as discovered) second, and programming languages last.
If I understand correctly, Peter Norvig's argument is about the relative expressivity and precision of Python and natural language with respect to a particular kind of problem. He's saying that Python is a more appropriate language to express factorisation problems, and their solutions, than natural language.
Respectfully, very respectfully, I disagree. The much simpler explanation is that there are many more examples, in the training set of most LLMs, of factorisation problems and their solutions in Python (and other programming languages) than in natural language. Examples in Python etc. are also likely to share more common structure, even down to function and variable names [1], so there are more statistical regularities for a language model to overfit to during training.
We know LLMs do this. We even know how they do it, to an extent. We've known since the time of BERT. For example:
Probing Neural Network Comprehension of Natural Language Arguments
https://aclanthology.org/P19-1459/
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
https://aclanthology.org/P19-1334/
Given these and other prior results Peter Norvig's single experiment is not enough, and not strong enough, evidence to support his alternative hypothesis. Ideally, we would be able to test an LLM by asking it to solve a factorisation problem in a language in which we can ensure there are very few examples of a solution, but that is unfortunately very hard to do.
______________
[1] Notice for instance how Llama 3.1 immediately identifies the problem as "find_factors", even though there's no such instruction in the two prompts. That's because it's seen that kind of code in the context of that kind of question during training. The other LLMs seem to take terms from the prompts instead.
Not immediately clear what this means or if it is good or bad or something else.
"But some of them forgot that 1 could be a factor of 108"
I struggle to take them seriously. The anthropomorphization of AI into something that can "know" or "forget" is ridiculous and shows a serious lack of caution and thinking when working with them.
Likewise it leads people into wasting time on "prompt engineering" exercises that produce overfit and worthless solutions to trivial problems because it makes them feel like they're revealing "hidden knowledge."
Genuinely disappointing use of human time and of commercial electricity.