If this was a "true" regular expression, in the sense that it describes a regular language (http://en.wikipedia.org/wiki/Regular_language), then checking whether the number is prime or not, by checking if the expression matches, would run in linear time. This would be worth a few nobel prizes at least.
Unfortunately, it is an "extended" regular expression, i.e. not regular at all in the strictest sense, and (as the article says) requires backtracking.
First of all there is no Nobel in math/CS.
But more importantly, when we talk about the running time of algorithms, we talk about the time in relation to how many bits it takes to describe the number. So linear in the size of the number is exponential in the number of bits, and that is not very interesting.
And a better primality test would not rock the world of cryptography. At the moment http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_... is demonstrably good enough as a prime test that nobody really is hurting for better ones. The thing that you want is a better factoring algorithm. And the General Number Sieve is already subexponential, and so better than any regular expression could be.
I know. An algorithm this efficient would still be worth that much. :)
But more importantly, when we talk about the running time of algorithms, we talk about the time in relation to how many bits it takes to describe the number. So linear in the size of the number is exponential in the number of bits, and that is not very interesting.
No, the regular expression described here works on numbers in unary representation. Linear to the size of the unary representation is linear to the number itself.
He didn't say it would win a Nobel prize, just that it would be worth a few Nobel prizes. A major breakthrough in algorithmic theory could easily result in someone winning a Fields medal, and I'd say that's easily worth more than a Nobel prize.
Better primality testing would be interesting to the crypto community, but it could rock the math world - like "Primes is in P" introducing AKS as the first deterministic polynomial time primality test.
As you said, though, better factoring would be game-changing for both cryptography AND mathematics.
Proving this is a common assignment in automata theory classes.
Unfortunately, quite a few very-well-defined terms have been made much-less-defined by use in practice.
"regular expression" has come to mean "anything that can be specified by a Perl 'RE' expression, which can backtrack and may take exponential time", rather than the original precise definition (that I will not give here) equivalent to "anything that can be accepted by a memoryless finite state automaton"
Similarly, "real time" has come to mean "on-line" or "computation happening more or less at the same time input is generated", as opposed to the original meaning of "guaranteed response to input within a pre-specified time frame" (which has been relegated to being "hard real time", with the amortized version often called "soft real time").
One could think software construction, being somewhat of an engineering discipline, would be more precise with terms - but one would be wrong; software construction as practiced today is something between an art and a craft, and is mostly not related to engineering.
It's probably time to give up on that terminology nit-pick.
Many Perl coders are self-taught programmers with no CS background. If a man page calls something a regular expression, that is good enough for them.
Why drop to that level if you know better?
I don't want to appear "gratuitously ignorant" in front of someone who might be a decent computer scientist.
There are already ways in which I will appear ignorant due to actual ignorance; why go out of my way to dumb down the way I speak about something that I do understand?
If we are going to wreck math terminology why don't we just redefine "prime number". Let's see: 3 is prime, 5 is prime, 7 is prime, oh ... so 9 is prime 11, is prime, 13 is prime, 15 is prime ... And, what do you know: now we can use a regular regular regex for primality testing over the binary representation: why, it's just (0|1)*1. As a bonus, we can restore 1 to its proper status as a prime number and kick out the oddball 2.
Say, if Perl called an odd test function "isprime", would you start calling odd numbers prime?
Perl standardized a good enough regular expression dialect that everyone else copied it. But the divergence between "regular expression" and "strictly only regular languages" was already a lost cause.
However while cute, it is very slow. It tries every possible factorization as a pattern match. When it succeeds, on a string of length n that means that n times it tries to match a string of length n against a specific pattern. This is O(n^2). Try it on primes like 35509, 195341, 526049 and 1030793 and you can observe the slowdown. (Some regular expression engines may employ tricks that speed this up over the naive analysis, but none will be super speedy about it.)
Usual arithmetic is much more efficient. :-)
here a JS implementation
function isPrime(n) {
var re = /^1?$|^(11+?)\1+$/;
var s = [];
while (n -- > 0)
s.push(1);
return !re.test(s.join(''));
} /this|is|awesome/.examples # => ["this", "is", "awesome"]
So, I tried it out on this "prime number validator" regex: /1?|(11+?)\1+/.examples
#=> ["", "1", "1111", "111111", "11111111"]
/1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 100).uniq.sort.map(&:length).first(10)
#=> [0, 1, 4, 6, 8, 9, 10, 12, 14, 15]
require 'prime'
Array(1..100) - /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 3000).uniq.sort.map(&:length) == Prime.first(25)
#=> true
Horribly inefficient, but pretty cool! Here's my gem if case anyone's interested: https://github.com/tom-lord/regexp-examplesVery usefull!
It's still "under development", while I add a few missing features, but as you can see from the README these are only for fairly obscure parts of the regex language! - E.g. POSIX groups, named properties, character set intersection, ...
AFAICT the way the string is going to be parsed is equivalent to this:
def isprime(n):
if n == 0 or n == 1:
return False
i = 2
while i <= n:
if (n - i) % i == 0:
return False
i += 1
return True
A parser implementation may optimize and do "while (n - i) >= 1" instead, which cuts in half the number of iterations.A high-level description helps makes obvious what's happening:
To check the primality of a number N with string matching,
repeat a string (say, "1") N times
and declare N prime:
UNLESS the repetitions match the string 0 or 1 times
OR the repetitions can be EVENLY matched into groups of MORE than one string.Edit: that's not quite correct. The above would imply some simple structure in repunit primes. If you add the known asymptotical behaviour of the 'number of primes < N)' function, I think it would imply it.
Also, in binary, it would either imply there are finitely many Mersenne primes, or that there is a N such that all Mersenne numbers larger than N are prime (edit: Mersenne numbers are repunits in binary)
It's cool, but it's regexp, not regular expression -- regular expressions cannot test for primeness. The proof is a simple application of pumping lemma[1]: we want to show that language { a^p: p prime } is not regular. Suppose it is. By pumping lemma we have numbers N, n >= 1, such that for every k >= 0, we have a^(N+nk) in our language. But that means that for every k, N+nk is prime, and it fails for k = N, since N+nN = N(1+n) is not a prime, since both N and 1+n are greater than 1.
[1] - http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_langu... [2] - https://news.ycombinator.com/item?id=3391547 [3] - https://news.ycombinator.com/item?id=3391572
https://github.com/fxn/math-with-regexps/blob/master/one-lin...
/^(11+?)\1+$/ is a prime test on whether a unary string > 2 is composite
then people have a real chance of figuring it out. (go ahead and try, if you haven't read the article.)As it stands that 1 x shift is super confusing. who would have thought 1 would become a literal! we wouldn't even write it that way in english.
def is_prime(n); str = "5"*n; str !~ /^5?$|^(55+?)\1+$/; end
The regex essentially says, for integer n, "does 2 ones fit evenly into n ones", then "does 3 ones fit into n ones", etc., which is the same as "does 2 fit into n", "does 3 fit into n", etc.
When I am explaining the basics of regular expressions to people, I always refer them to this post to make them realize that regular expressions are not only glorified search terms. In general, they are amazed :-)