digits: digit: charset "0123456789"
rule: [
thru "$"
some digits
"."
digit
digit
]
parse "$10.00" rule ;; true
pattern: [
some "p"
2 "q" any "q"
]
new-rule: [
2 pattern
]
parse "pqqpqq" new-rule ;; true
Rebol doesn't have regular expressions; instead it comes with a parse dialect, which is a TDPL - http://en.wikipedia.org/wiki/Top-down_parsing_language
Some parse refs: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Feat... | http://www.rebol.net/wiki/Parse_Project | http://www.rebol.com/r3/docs/concepts/parsing-summary.html
TIL
Although Rebol can be used for programming,
writing functions, and performing processes,
its greatest strength is the ability to
easily create domain-specific languages or
dialects.
— Carl Sassenrath [Rebol author]
https://en.wikipedia.org/wiki/Rebol
http://reference.wolfram.com/language/ref/StringExpression.h...
Something like that would be
StringExpression[
"$",
Repeated[DigitCharacter],
".",
DigitCharacter,
DigitCharacter
]
or StringExpression[
"$",
Repeated[DigitCharacter],
".",
Repeated[DigitCharacter, {2}]
]
or StringExpression[
"$",
NumberString
]
and the other is StringExpression[
Repeated[
StringExpression[
Repeated["p", {1, Infinity}],
Repeated["q", {2, Infinity}]
],
{2}
]
]
This can be made more concise since StringExpression has an infix form (~~) and Repeated can sometimes be replaced by postfix ..
Always, not sometimes. ;-)
'$10.00' ~~ rx{ \$ \d+ \. \d\d };
my $pat = rx{ p+ q**2..Inf };
'pqqpqq' ~~ rx{ <$pat>**2 }
Note that these "regexes" are syntax, not strings, checked and converted into a hybrid DFA/NFA at compile-time.
Regex may be ugly, but you lose something important when you move from declarative to imperative.
I've "learned" regular expressions multiple times but it just never sticks, I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).
I'd prefer to write RegX's in this style; however, I would pay attention to performance (not that regular expressions are high performance, but I wouldn't want to see a large performance loss either).
Modern regular expression engines in a lot of languages actually go beyond the expressiveness of a regular language. This is what damages performance.
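To make the "beyond a regular language" point concrete, here is a minimal sketch in Python (my choice for illustration only; nothing in the thread depends on it). A backreference such as \1 asks for "the same text again", and "some string followed by itself" is not a regular language, so an engine that supports it cannot be a plain finite automaton and typically falls back to backtracking. The function name find_doubled_word is made up for this example.

```python
import re

# (\w+) captures a word; \1 requires exactly the same text again.
# Matching "a string followed by itself" is beyond regular languages,
# which is why engines with backreferences usually backtrack.
doubled = re.compile(r"\b(\w+) \1\b")

def find_doubled_word(text):
    """Return the first immediately repeated word, or None."""
    match = doubled.search(text)
    return match.group(1) if match else None

print(find_doubled_word("this is is a test"))  # is
print(find_doubled_word("nothing repeated"))   # None
```

If you only use features like literals, classes, alternation and repetition, an automaton-based engine can in principle handle the pattern without this cost.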
There is no reason why this would reduce performance... if it's not doing anything crazy.
If anything, you're taking work away from it. You're building the tree directly here, whereas a parser would normally build a tree from the string. But since this is integrating into the language's RE library, I'm guessing it's writing that tree as a string, which is then passed into the regular expression engine, to be turned into a tree again :)
If a regular expression runs too often, even pre-compiled (as it should be), you'll want to replace it with code written in the native language. I've gone in and replaced a one line search/replace written in RegX (compiled), with just a C-style for() loop over the wchar array, and had the memory usage drop by near 80% and performance increase by over 60%.
So high performance is all relative. However RegX isn't something I'd describe that way, even compiled. It is a nice way to write complex string parsing code quickly however.
If your regex is complicated, it will probably beat any naive attempt to write it into conventional string processing, short of reimplementing regexes in the first place. Especially since in many languages, "conventional string processing" may involve the creation of lots of copies and sub-copies.
Probably not true for JavaScript (and other scripted languages) - matching a regex uses a native and highly optimized regex lib, which will usually be orders of magnitude faster than implementing this in the language.
It is highly dependent on the regular expression engine you use; most don't use automata because of extra features.
As the name suggests though, the focus was on passphrase criteria and it wasn't to produce a DSL for general regex building. The library also supports named templates and a few utility methods.
As for syntax, there's the fluent syntax (chained methods), and there's the query syntax which is syntactic sugar that gets compiled to the methods. The query syntax is probably the biggest reason people mistake LINQ for being SQL specific since it resembles SQL.
E.g.,
var results = SomeCollection.Where(c => c.SomeProperty < 10)
.Select(c => new { c.SomeProperty, c.OtherProperty });
The same thing in query syntax:
var results = from c in SomeCollection
where c.SomeProperty < 10
select new { c.SomeProperty, c.OtherProperty };
Then you can iterate over both the same way:
foreach (var result in results)
{
Console.WriteLine(result);
}
```
(?xi)
\b
( # Capture 1: entire matched URL
(?:
[a-z][\w-]+: # URL protocol and colon
(?:
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or '%'
# (Trying not to match e.g. "URI::Escape")
)
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
```
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or "%"
https://www.debuggex.com/r/EpocMU_7Fq_B_p9z
edit:
wait, I thought about it for a second and I see what you meant. You're not saying it's wrong, you're saying it's obvious.
I wasn't sure if it was obvious because I wasn't sure if {1,3} was supposed to be {1-3} and there was a mistake in the expression, or if there was some kind of unexpected error in the [a-z0-9%] expression.
Because even in this simple example, there is room for error.
(?xi)
\b
( # Capture 1: entire matched URL
(?:
[a-z][\w-]+: # URL protocol and colon
(?:
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or '%'
# (Trying not to match e.g. "URI::Escape")
)
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
https://github.com/perl6-community-modules/uri/blob/master/l...
e.g.
(: (or (in ("az")) (in ("AZ")))
(* (uncase (in ("az09")))))
Look, I know it takes a while, but once you get the hang of it, you won't need any crutches to write regular expressions. The only tool that's really needed is a way to rigorously test a regular expression to make sure it does what it needs to do and there are a ton of those around.
If you think characters and logic need to be on different quoting levels, you're not taking the right perspective on regular expressions. \d or \w are not an escaped d or w, they are their own atoms (or "the keywords of the language", if you will), distinct from the atoms that match the ASCII characters 0x64 and 0x77. The thing to remember with regular expressions is always the first lesson presented: (non-meta) characters match themselves, the regular expression /a/ matches the letter a. What's implied here, but rarely said, is that that's not really the letter a in there, but rather an expression that matches the letter a—it just so happens to also look like the thing it matches. This distinction is subtle, but important. This can also be made more evident by using the /x modifier if it's available to spread out the individual expressions (put space between the keywords).
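A minimal sketch of that last point, using Python's re.VERBOSE flag as a stand-in for /x (Python is my choice for illustration; the dollar-amount pattern is borrowed from earlier in the thread). With whitespace ignored, each atom sits on its own line and it is easier to see that \$, \d and \. are separate expressions rather than escaped characters:

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts
# a comment, so every atom can be spelled out on its own line.
price = re.compile(r"""
    \$      # the atom that matches a literal dollar sign
    \d+     # the "any digit" atom, one or more times
    \.      # the atom that matches a literal dot
    \d \d   # two more digit atoms
""", re.VERBOSE)

print(bool(price.fullmatch("$10.00")))  # True
print(bool(price.fullmatch("$10.0")))   # False
```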
The primary difference in regular expression languages is often how "logic", as you call it, is expressed. PCRE considers, for example, [ to be the character for opening a character class and \[ to match the byte 0x5b. Admittedly, this is confusing when switching engines because 1) not every character matches itself (the expression that matches a character and the character it matches are not visually the same) and 2) other RE engines have taken the opposite approach depending on if that engine was meant, by the author, to have more literal atoms or more logic in its most common use (that is, you save typing if you mean to match the byte 0x5b more frequently than if you mean to open a character class).
As for "quoting", you almost NEVER should be using things like PCRE's \Q…\E (or the quotemeta function) unless you're building regular expressions dynamically from user-input. quotemeta and friends are not readability tools, but safety tools.
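For the safety use case, here is a small sketch using Python's re.escape, which plays roughly the same role as quotemeta (the helper name count_literal is made up for this example). The idea is that user-supplied text gets escaped before it is spliced into a pattern, so metacharacters in the input match only themselves:

```python
import re

def count_literal(needle, haystack):
    """Count occurrences of user-supplied text, treated literally."""
    # re.escape backslash-escapes metacharacters, so a "." or "(" in
    # the input cannot change the meaning of the pattern.
    return len(re.findall(re.escape(needle), haystack))

print(count_literal("a.c", "a.c abc a.c"))    # 2
# Without escaping, "." matches any character, so "abc" counts too.
print(len(re.findall("a.c", "a.c abc a.c")))  # 3
```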
I see regex like this: if you have to use it often enough, better to learn it as it is - it will be more helpful in the long run. If you don't use regex too often then just google your question - there's a very high chance that somebody already wrote a regex for your problem or a similar one.
The only tools I ever use are regex testers (like regexr.com) when I need to make sure that a pattern works correctly.
While I prefer writing regexes, a regex DSL isn't fundamentally better or worse, just different. In addition, it allows non-computer people to write, or at least specify, regexes in a way that makes more sense to non-developers.
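To show how little machinery such a DSL needs, here is a rough sketch in Python that just assembles pattern strings. All the helper names (lit, seq, one_or_more, at_least, exactly, DIGIT) are invented for this example and do not come from any library mentioned in the thread. It rebuilds the two patterns from the Rebol example at the top:

```python
import re

DIGIT = r"\d"  # hypothetical name for the "any digit" atom

def lit(text):
    """Literal text, escaped so metacharacters match themselves."""
    return re.escape(text)

def seq(*parts):
    """Concatenate sub-patterns in order."""
    return "".join(parts)

def group(pattern):
    """Wrap a sub-pattern in a non-capturing group."""
    return "(?:" + pattern + ")"

def one_or_more(pattern):
    return group(pattern) + "+"

def at_least(n, pattern):
    return group(pattern) + "{%d,}" % n

def exactly(n, pattern):
    return group(pattern) + "{%d}" % n

# "$", one or more digits, ".", two digits
price = seq(lit("$"), one_or_more(DIGIT), lit("."), DIGIT, DIGIT)
# (one or more "p", then two or more "q"), twice
pq = exactly(2, seq(one_or_more(lit("p")), at_least(2, lit("q"))))

print(bool(re.fullmatch(price, "$10.00")))  # True
print(bool(re.fullmatch(pq, "pqqpqq")))     # True
```

This version round-trips through a string, so the engine still has to parse it again; a real library could build the engine's internal tree directly instead.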
(setq imenu-generic-expression
(let ((ident '(1+ (any "A-Za-z0-9_"))))
`(("plugin" ,(rx line-start
(0+ space) "plugin"
(1+ space) (eval ident)
(1+ space) (group (eval ident)))
1))))
Of course, you can do this with string concatenation, but I think this syntax makes it clearer what's going on.
The particular syntax we use (which is not that great) is not THE "regular expressions"; it is just one syntax we arrived at.
That is, the "regular expressions" name doesn't refer to the syntax, but to the concept.
These web based tools can do it:
https://www.debuggex.com/r/Yxqws81Uif-BGBN8
Important note - this is built up programmatically, it's not just a string dumped in a parser!
I get that some people have a hard time understanding regexps with all the backtracking and greediness. Yes, the syntax is a bit complicated. Maybe a simplified, predictable default mode could help. But there is no problem with a DSL being used as an abstraction. In fact, we need more DSLs, for everything!
(compound "$" (1+ :digit) "." :digit :digit)
Run:
$ txr -p "(regex-compile '(compound \"$\" (1+ :digit) \".\" :digit :digit))"
#/$\d+\.\d\d/