Lost much of that knowledge within a couple of years after leaving that job... and was shocked at how little I'd retained when I stumbled into another project that required a fair bit of regex work. There was some muscle memory involved and I was able to ramp up quickly, but now, 20 years after that initial job, I'm just like you.
The thing that finally forced me to dig a little deeper was an assignment at work that involved Apache HTTPD and mod_proxy and the need to define some really complex routing rules that were imposed on us by something upstream of our service. We wound up having to peek into the incoming URL and route things differently based on sub-elements of the overall path. So I finally had to learn to use capture groups and get into the difference between "greedy" and "non-greedy" matching, yadda, etc. And the thing is, when I figured it out and got all that working, I felt like I'd acquired a new superpower.
For about 3 weeks. Now, I'm pretty sure all of the new stuff I learned has totally escaped my memory again, because - once again - I haven't had any call to touch a regex in almost 2 years.
Sigh. I should probably look for an Anki deck on regexes and start doing spaced repetition on them, just to try to finally get this stuff locked in.
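For anyone who hasn't run into capture groups and greedy vs. non-greedy matching yet, here is a minimal Python sketch. The path is made up and Apache's own config syntax isn't shown; only the quantifier behaviour, which works the same way in PCRE-style engines.

```python
import re

# A made-up incoming path, as a reverse proxy might see it.
path = "/tenant-a/api/v2/orders/123"

# Greedy ".*" runs to the end of the string, then backtracks to the LAST "/".
greedy = re.match(r"^/(.*)/", path)
# Non-greedy ".*?" stops at the FIRST "/" that lets the match succeed.
lazy = re.match(r"^/(.*?)/", path)

print(greedy.group(1))  # tenant-a/api/v2/orders
print(lazy.group(1))    # tenant-a

# Named capture groups keep routing rules readable.
route = re.match(r"^/(?P<tenant>[^/]+)/api/(?P<version>v\d+)/(?P<rest>.*)$", path)
print(route.group("tenant"), route.group("version"), route.group("rest"))
# tenant-a v2 orders/123
```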
Sometimes I'm using .Net. Sometimes I'm using Python. Sometimes I'm using whatever oddball engine the developer chose.
I know how regex works. I can use forward and backward references. I can combine complex patterns. I can match, extract, replace, transform, etc. Sometimes I have even used nested patterns (though I'd probably need half an hour to re-learn it well enough to read one). But I'm not sitting down and memorizing the difference between \d and \D, or \S and \w, or any of that. Frankly, I'm very lucky if I remember the difference between ^ and $. I will have the .Net[0] and the Python[1] docs in my bookmarks forever.
I'm not remotely ashamed of it, either. The codes are completely arbitrary with absolutely no intrinsic meaning. Worse, it's not easy to tell the difference at a glance between literal characters, character classes, operators, wildcards, special constructs, etc. More than once I've been confused by a regex only to discover it does something I didn't know they could even do. Regex patterns are meant to be concise and comprehensible to the regex engine, not to the programmer.
Don't feel bad because you don't memorize an arbitrary and complex syntax. Memorizing syntax is not the job of a programmer. The job of a programmer is to compose the logic and design the system and know that a syntax exists to compose it in. A programmer is an author, not a linguist.
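For the bookmark folder anyway, here is a quick refresher using Python's re. Other flavors mostly agree on these escapes, but Unicode handling and line-anchor details vary, so the docs still win.

```python
import re

sample = "a1 b_2!"

digits = re.findall(r"\d", sample)      # digits
non_digits = re.findall(r"\D", sample)  # anything that is NOT a digit
word_chars = re.findall(r"\w", sample)  # letters, digits, underscore
non_space = re.findall(r"\S", sample)   # anything that is NOT whitespace

print(digits)      # ['1', '2']
print(non_digits)  # ['a', ' ', 'b', '_', '!']
print(word_chars)  # ['a', '1', 'b', '_', '2']
print(non_space)   # ['a', '1', 'b', '_', '2', '!']

lines = "first line\nsecond line"
first_only = re.findall(r"^\w+", lines)                  # ^ = start of the string
line_starts = re.findall(r"^\w+", lines, re.MULTILINE)   # ^ = start of each line
line_ends = re.findall(r"\w+$", lines, re.MULTILINE)     # $ = end of each line

print(first_only)   # ['first']
print(line_starts)  # ['first', 'second']
print(line_ends)    # ['line', 'line']
```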
[0]: https://docs.microsoft.com/en-us/dotnet/standard/base-types/...
If our experiences are typical, I'd argue people new to regexes should focus more on learning that they exist and when they can be useful, and not worry too much about actually learning their mechanics.
I don't understand the goodwill toward regexes. It's basically an embedded BrainFuck in your programming language.
I'm 14 years into my dev career, and there has never been a moment where not using regexes turned out to be a problem.
Because a well-written regex performs extremely well (regex engines are often very highly optimized).
It gives you all the benefits of using a domain-specific language and using an extremely mature software library. Just like a domain-specific language, it will have a baked-in philosophy involving the exact task you want to accomplish, so it will not suffer from language vs. algorithm impedance. Just like using a mature library, it will probably have accounted for weird oddball cases that you're not even thinking of and have enough features to do everything you will want.
> It's basically an embedded BrainFuck in your programming language.
I don't disagree. It's not easy to read and can be hard to maintain. There are ways to write regex such that it's easier to understand, but the syntax generally doesn't make it easy to do that and doesn't encourage you to spend the time on it.
However, when you see a regex, you do know that it's 100% used to manipulate strings. That alone tells you quite a bit about what is going on.
My thought was "Oh, I think this is a job for that regex thing", and 35 minutes of googling syntax plus a handful of passes later I had all the dates in a workable table. I have no idea how much code that would have taken otherwise. Admittedly, I am a novice programmer.
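For the curious, this is roughly the general shape of that kind of job in Python. The "Month D, YYYY" date format and the sample text are assumptions, since the actual data isn't shown.

```python
import re

# Made-up input: free text containing dates written as "Month D, YYYY".
text = """Kickoff was March 3, 2021. The review slipped to April 15, 2021,
and the launch finally happened on November 2, 2022."""

date_re = re.compile(
    r"(?P<month>January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+(?P<day>\d{1,2}),\s+(?P<year>\d{4})"
)

# One table row per match: (year, month, day).
rows = [(int(m["year"]), m["month"], int(m["day"])) for m in date_re.finditer(text)]
for row in rows:
    print(row)
# (2021, 'March', 3)
# (2021, 'April', 15)
# (2022, 'November', 2)
```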
[0] Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
I love regular expressions and figuring out the syntax to solve problems I encounter where I can use regex, such as searching code for calls with specific named params that may appear in a different order across the code base. But I always have this quote in the back of my mind if I'm thinking about using regex in code, especially in areas that get hit a lot.
[0] https://blog.codinghorror.com/regular-expressions-now-you-ha...
It's just a riff on tech-debt, imho
And that's how I always think of Regex now. It's an incredibly powerful tool that I pull out pretty often, but if I'm doing anything beyond the most basic pattern matching I know I'm about to lose a bit of time questioning my abilities and sanity.
My problem with it (other than all the different flavors—that’s just history’s fault) is that you don’t tend to get integrated help/tool support for them. They are often just plain strings—how is e.g. Intellij supposed to spot them reliably and help you?
It’s too bad if we have to copy and paste regex strings into websites in order to figure out what they do.
The number one most useful tool support in my book would just be to highlight metacharacters. Are parens a metacharacter in the dialect that I am using now? Is `+` significant?
A bigger ask would be more verbose regular expression declarations and compilers that can translate to and from them. It’s nice if you can use comments in a regex dialect, but comments should be treated like they are treated in code writ large; don’t use them if you can make the code “self-describing”. Imagine a more verbose declaration language where you can make local aliases, for example `let ident = [a-z][a-z0-9_]*`.
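Plain string composition already gets part of the way to local aliases. Here is a Python sketch where the hypothetical `ident` and `number` aliases are ordinary variables spliced into a larger pattern. With rf-strings, any literal regex braces such as {1,4} would need doubling.

```python
import re

# Hypothetical "local aliases": ordinary string variables.
ident = r"[a-z][a-z0-9_]*"
number = r"\d+"

# Compose them into a bigger pattern: "name = value", value is an ident or a number.
assignment = re.compile(rf"^(?P<name>{ident})\s*=\s*(?P<value>{ident}|{number})$")

m = assignment.match("max_retries = 5")
print(m["name"], m["value"])          # max_retries 5
print(assignment.match("2x = 1"))     # None: names must start with a letter
```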
However, I would realllyyy not want to have a more verbose regex dialect. For me at least, it would make recognizing common regex patterns a lot harder since they won't be succinct. It would also be yet _another_ regex variety that I would have to remember.
For me, Perl's DEFINE feature and 'x' switch are definitely close enough[1] to variables and comments. Pardon, I don't remember what they are officially called. What we really need is for more non-PCRE engines to implement these features.
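Python's built-in re does have the equivalent of the 'x' switch (re.VERBOSE), although as far as I know it has no DEFINE. Here is a sketch with a made-up ISO-date pattern:

```python
import re

# re.VERBOSE (Perl's /x): whitespace in the pattern is ignored and "#" starts a comment.
iso_date = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

m = iso_date.fullmatch("2021-03-15")
print(m.group("year"), m.group("month"), m.group("day"))  # 2021 03 15
```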
WebStorm or any IDE that has WebStorm functionality built-in can detect `/pattern/` in Javascript and give you an option to test it and warn you of syntax errors. PyCharm will help you with functions in `re` module, Rider with `Regex.IsMatch` and so on, all JetBrains IDEs have some sort of contextual help that gives you guidance on stdlib functions that accept regular expressions.
https://discourse.julialang.org/t/ann-readableregex-jl/43450
It depends on the language. JavaScript has a dedicated RegExp object. If you use the literal notation or constructor, you will always get IntelliJ support (unless you decide to use the constructor and extract the regex string for whatever reason). I _think_ you could also mark a standalone string variable with JSDoc to get IDE support, too.
[dbf]eer
When the answer required was (if I remember correctly):
[bdf]eer
I think as long as you're correct you should be allowed to move on. By all means show your preferred solution.
Apart from that I love it! I normally use regexr.com and just mess around with it until I get the desired result. It also helps with learning, but you end up never truly understanding the concepts.
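Order inside a character class doesn't matter, so the two answers are equivalent. Here is a quick Python check (the word list is made up):

```python
import re

words = ["beer", "deer", "feer", "peer", "leer"]

# Both classes contain exactly the letters b, d and f.
a = [w for w in words if re.fullmatch(r"[dbf]eer", w)]
b = [w for w in words if re.fullmatch(r"[bdf]eer", w)]
print(a == b, a)  # True ['beer', 'deer', 'feer']
```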
A nice example was a big Cloudflare outage in 2019: https://blog.cloudflare.com/details-of-the-cloudflare-outage...
So for anyone using regex (1) in production, (2) as part of an automation / regular process (i.e. not a one-time search), on (3) sizeable or recurring volumes of data, I'd really advise gaining a deeper understanding of what the various options do.
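Here is a minimal illustration of the failure mode usually involved, catastrophic backtracking. It uses the textbook (a+)+ pattern rather than anything from the Cloudflare postmortem. Timings will vary by machine; the point is that the nested version's time grows roughly exponentially with n while the flat one stays tiny.

```python
import re
import time

# Nested quantifiers: "(a+)+" can split a run of a's into groups in
# exponentially many ways, and on failure the engine tries them all.
evil = re.compile(r"^(a+)+$")
# Same language, no nesting: only one way to match, so it fails fast.
safe = re.compile(r"^a+$")


def time_match(pattern, text):
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing "b" forces every attempt to fail
    ok_evil, t_evil = time_match(evil, text)
    ok_safe, t_safe = time_match(safe, text)
    print(f"n={n}: evil {t_evil:.4f}s, safe {t_safe:.6f}s")
```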
An example from 21/55:
"To express at least a certain number of occurrences of a character, we write the end of the character at least how many times we want it to occur, with a comma , at the end, and inside curly braces {n, }. For example, indicate that the following letter e can occur at least 3 time."
It should read something like:
"To express a match of a minimum number of character occurrences, we put a comma after the number of occurrences, within the curly braces {n,}. In the following, try building a regex to match the letter e occurring a minimum of 3 times sequentially."
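In Python's re at least, the space in the tutorial's {n, } notation matters: {3,} is a quantifier, but {3, } is not recognized as one and is matched as literal text. Here is a quick check:

```python
import re

text = "bee beee beeeee"

at_least_three = re.findall(r"e{3,}", text)  # quantifier: 3 or more e's
spaced = re.findall(r"e{3, }", text)         # literal "e{3, }" in Python's re

print(at_least_three)  # ['eee', 'eeeee']
print(spaced)          # []
```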
I always look at these intros/descriptions of Regex with a heavy heart. They describe what regexes are, but none of the info is going to make much sense to someone who doesn't already know why they would want to learn them.
The best motivation for regexes that I've read is actually from a Python Tutorial [0] where the author gives an example of writing a lot of nested 'if' statements that could all be solved by a single regex. On the whole, I think regexes are one of the most powerful tools that don't get enough publicity, in large part due to this Catch-22 of trying to explain what they are.
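Here is a made-up example in the same spirit (not the tutorial's own code): checking a 24-hour "HH:MM" time by hand versus with one pattern.

```python
import re


def is_time_with_ifs(s):
    # Hand-written ladder of checks for "HH:MM", 00:00 through 23:59.
    if len(s) != 5 or s[2] != ":":
        return False
    hh, mm = s[:2], s[3:]
    if not all(c in "0123456789" for c in hh + mm):
        return False
    if int(hh) > 23 or int(mm) > 59:
        return False
    return True


# Hours 00-19 or 20-23, a colon, then minutes 00-59.
TIME_RE = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")


def is_time_with_regex(s):
    return TIME_RE.fullmatch(s) is not None


for s in ["09:30", "23:59", "24:00", "7:30", "12:60"]:
    print(s, is_time_with_ifs(s), is_time_with_regex(s))
```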
I frequently use the journalist's "5 Ws and H" framework as a checklist procedure for ensuring my technical communication covers fundamental questions/ideas:
* Who
* What
* Where
* Why
* When
* How
The slightly tricky thing is that you have to formulate a question for each W based on your domain. For example what is a fruitful "where" question for RegEx? Nonetheless the checklist makes me less likely to miss very key ideas, such as "why" one would use RegEx.
To make this idea more procedural maybe we could just formulate it as ungrammatical questions where you put the key topic after each W:
* Who RegEx?
* What RegEx?
* Where RegEx?
* Why RegEx?
* When RegEx?
* How RegEx?
And then just let your mind flesh them out into more complete questions...
That said, where I see most tech sites / products fail is on addressing benefits. Why should I care? (As opposed to the brand or product why.)
I wish I had $20 for every "Looks cool. But it's not clear to me my life will be any better."
But then you're just memorizing things like 'start_of_line' instead of '^'. Perhaps easier to read, but no easier to write.
I do prefer using parser combinators for more complex tasks.
Sure, if they were, we’d have discovered them already. All of the regex criticism boils down to a few simple statements, one per category of case:
1) I didn’t learn regex and have no cheatsheet
Learn it or at least print a cheatsheet and stick it to the wall.
2) The problem that this specific regex solves is a hell of a regular problem under any representation.
Any particular regex is only as terrible as a ladder of corresponding if’s and for’s would be. Deal with it.
3) The problem that this specific regex solves is not a regular language.
Use a proper XML parser.
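Here is a small illustration of case 3 in Python. A regex has no way to track nesting, while the standard library's ElementTree parser does. The document is a toy example.

```python
import re
import xml.etree.ElementTree as ET

doc = "<list><item>a<item>nested</item></item><item>b</item></list>"

# The lazy match stops at the FIRST closing tag, so nesting gets mangled.
regex_items = re.findall(r"<item>(.*?)</item>", doc)
print(regex_items)  # ['a<item>nested', 'b']

# A real parser builds the tree; iter() walks every <item> in document order.
root = ET.fromstring(doc)
parsed = [item.text for item in root.iter("item")]
print(parsed)  # ['a', 'nested', 'b']
```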
You have some text to process, open your text editor, you will probably use a dozen regular expressions for that - this is very frequent for many. Can you conceive a better syntax, at least for the simple cases?
In general I would say ~70% of regexes are highly readable. With tools like the above, you can probably go to like ~85%? There are some regexes that are super complicated and likely should be refactored into a composition of simpler regexes. But that's just a guess. I wonder if there are any studies done about this...
They're very effective at what they do as long as you don't make insane brainteasers that make people curse your name.
> Catch 22 of trying to explain what they are.
Any teachers, or people who explain/document things for a living, have some good tips or templates to avoid this?
"With regex you can search for any combination of characters in a string or return any such combo or modification you like"
1) Encourage whoever you're teaching to stop you immediately if they don't feel like they understand something you're saying, even if it's a single word that's throwing them off, and especially if they're not rock-solid about a simple concept they "should already know". Modern school teaches people that "returning to the basics" is a waste of time; but as Feynman says, you should return to the basics often, as masters do. Pianists don't stop playing scales once they're famous. This means that if your student wants to review what an "expression" is, or what a "string" is, or what "returning" means, you've got to encourage them to do it. If a 10-minute explanation of RegEx turns into a 45-minute review of how the string variable type was invented, that will be more useful for the student in their pursuit of RegEx mastery than will a technically accurate but shallow regurgitation of your 10-minute spiel about what RegEx is. This is because they need to lay the mental framework of how they're going to think about RegEx; you are able to explain it in 10 minutes because you already have that built in your head, but they need to build those background pathways and connections themselves before analogies and summarizations make sense.
2) Try to figure out how you can make them experience the problem that led to the invention of RegEx. A student will never truly understand why a solution is valuable until they really, deeply understand the problem that the solution is solving. Note that I'm not saying that you need to teach the problem before the solution--not every student needs them in that order--just that they won't master the solution until they understand the problem.
3) In lieu of "testing" a student, have them take many breaks to re-explain what they've learned to you, even if you haven't reached a real conclusion about anything and are just checking that they understand a sentence you said. Many students, especially if they have a good teacher, will experience the sensation of comprehension even if it's not actually there. This is the "it makes sense when he says it, but when I try to explain it I can't find the words" phenomenon. Taking frequent breaks to have them explain things back to you in their own words will reveal their conceptual weaknesses, and those are what you focus on.
4) Don't try to get it all done in a single session. Learning requires both forgetting and sleep. First, you should tell them to expect to forget, and that they will need to come back over and over again to topics that seem basic or simple; forgetting is part of the process of learning, like painting multiple layers on a wall. Second, they need to sleep in between sessions, which means that you can't teach everything in one day and you can't learn everything in one day, and multiple days may need to be spent reviewing the same material.
This all makes a lot more sense when you treat learning like sports. Learning <programming topic> is like learning a slice serve in tennis. You don't need to serve slice, especially if you can hit flat serves at 115 mph, but serving slice is an invaluable technique when you're playing someone who can't return slice serves at all--that's a near-guaranteed 3/6 games out of every set. But in order to learn it, you need to focus on your tennis fundamentals (stay loose, eye on the ball, toss correctly), practice the same basic movements over and over again, get lots of sleep, and understand why you're learning the skill in the first place.
> Regular expressions (commonly known as "regex") are used for advanced pattern matching in strings. They can also be used to replace text, transform strings, or extract substrings. It's a very powerful domain-specific language that is purpose-built for string patterns and manipulation. Many general-purpose programming languages include regex engines that use similar, but often slightly different syntaxes to support the use of regex.
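Here are the uses named in that definition (match, extract, replace, transform), sketched with Python's re on a made-up log line:

```python
import re

log = "user=alice id=42; user=bob id=7"

# Match: does the string contain the pattern at all?
found = bool(re.search(r"id=\d+", log))
# Extract: capture groups pull out the interesting substrings.
pairs = re.findall(r"user=(\w+) id=(\d+)", log)
# Replace: rewrite every match with fixed text.
redacted = re.sub(r"id=\d+", "id=<redacted>", log)
# Transform: the replacement is computed from each match.
shouted = re.sub(r"user=(\w+)", lambda m: "user=" + m.group(1).upper(), log)

print(found)     # True
print(pairs)     # [('alice', '42'), ('bob', '7')]
print(redacted)  # user=alice id=<redacted>; user=bob id=<redacted>
print(shouted)   # user=ALICE id=42; user=BOB id=7
```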
I'm neither of those, but I frequently explain things to my friends and they say I explain well. So I will throw my two cents anyway and hope you don't find them trivial self-help platitudes.
(1) Start with Concrete things
No learning ever starts from generalities. Never start with something like "Regular Expressions is a declarative language to describe strings of a certain general form blah blah blah". I call this the Wikipedia style of teaching, an utterly useless word-swapping game where you explain things and constructs in terms of even more complicated (or equivalently complicated) things and constructs till the learner runs out of stack space and comes out learning nothing and feeling like a failure on top of that. Remember that learning is a process of building up: you start from familiar questions, problems, specifics, themes or worldviews of the learner, then gradually introduce generalizations and solutions to get them to where you want them to be.
(This is generally a two-way street, the learner also has to know something about the teacher and where they are coming from and what are they trying to do, it's like telling a story: The author can't simply say "because I say so!" to explain every detail of the plot, but the reader can't also say "I don't know, feels too unbelievable" in response to every plot detail.)
The bare essence of regex is using meta characters to encode several string characters. The fact that the regex
"meta.*"
so powerfully and succinctly encodes string-recognizing logic that would be imperatively expressed as
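def metastar(s):
    # Imperative equivalent of re.match(r"meta.*", s): does s start with "meta"?
    if len(s) < 4:
        return False
    if s[0:4] != "meta":  # the slice must cover all 4 characters
        return False
    return True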
Makes the case concretely and perfectly: a single string (two letters longer than the simplest string it matches) versus 3 bug-hiding branches (e.g. what if the "!=" operator in the implementation language actually compares string-identity, not string-equality?). This is even more generous than most languages allow, the ':' array slicing operator for example is saving us a loop. (possibly inefficiently, if it's copying the slice from the string. Not a problem now for "meta", but who knows when it will be?)
Regexes are patterns, which are things that resemble the things they are describing, but aren't any of those things specifically. It's like a dark silhouette of a man: it doesn't describe any specific man, it's a pattern that can match any man of the same general body plan and height. Regexes are silhouettes; the dark parts are the meta characters that act as placeholders for arbitrary strings.
(2) Examples from real life
Don't just take the "menu approach" of reading all the features and meta characters and thinking you're explaining, actually take the time with examples. Again, examples are all that matters for the human brain, it's literally useless to tell somebody to imagine a golden mountain if they have never seen a mountain or gold before.
Our world is awash in strings of certain identifiable structures (money, dates, times, names in formal settings, equations, etc.). Take the time to obtain several real-life examples, ideally from sources like Wikipedia or other publicly available datasets. After demonstrating how each of those 3 or 4 general forms of strings can be described powerfully by these meta characters, give 3 or 4 more general forms to the learner to try on their own.
(3) Visualize executions, introducing debugging tools in the process
Just because regexes are declarative doesn't mean the matching process can't be described in imperative terms, especially initially.
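One way to describe matching imperatively is to walk through a tiny matcher. This is a Python adaptation of the small backtracking matcher Rob Pike wrote for Kernighan and Pike's The Practice of Programming, which supports only literals, '.', '^', '$' and '*'. The `log` parameter is my own addition so the steps can be printed as a trace.

```python
def match(regexp, text, log=None):
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text, log)
    # Try every starting position, including the empty string at the end.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:], log):
            return True
    return False


def match_here(regexp, text, log=None):
    """Does regexp match at the very beginning of text?"""
    if log is not None:
        log.append((regexp, text))  # record every (pattern left, text left) step
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text, log)
    if regexp == "$":
        return text == ""
    if text and regexp[0] in (".", text[0]):
        return match_here(regexp[1:], text[1:], log)
    return False


def match_star(c, regexp, text, log=None):
    """Match c* followed by regexp, trying the shortest run of c first."""
    i = 0
    while True:
        if match_here(regexp, text[i:], log):
            return True
        if i < len(text) and c in (".", text[i]):
            i += 1
        else:
            return False


steps = []
print(match("ab*c", "xabbc", steps))  # True
for remaining_pattern, remaining_text in steps:
    print(f"{remaining_pattern!r:8} vs {remaining_text!r}")
```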
Later on, introduce tools like https://regex101.com/ or https://www.debuggex.com/ and always draw "railroad diagrams" that show what a given regex matches in terms of easily verbalized diagrams.
(4) Disadvantages, subtleties, and other approaches
The learning process isn't a sales pitch; there are plenty of things that suck about regexes. They are non-standard and designed ad hoc; the runtime engine that runs them can be inefficient (unlikely if the host programming language is popular and more than 20 years old, but a thing to keep in mind nonetheless: regexes are a whole other language, requiring a separate interpreter or compiler from the one for the surrounding code); and the equivalent imperative code might not be so bad in comparison for simple cases, and is much more debuggable.
The name "regex" is something of a misnomer. The original "regular expressions" are a mathematical formalism equivalent to finite-state machines, and originally contained only alternation, sequencing and the Kleene star (the '|' and '*' operators, plus putting letters next to each other; that's it, those were the original regex capabilities). When programming languages and command-line utilities started to implement them in the 70s and 80s, each started to experiment with features that break this model. For example, backreferences, the ability to require that a later part of the string repeat exactly what an earlier capture group copied, trivially break the model: remembering an arbitrarily long captured string takes unbounded memory, which a finite-state machine doesn't have.
This increases power but weakens efficiency guarantees (Perl's regexes are dangerously close to Turing-completeness [https://www.perlmonks.org/?node_id=809842]! The language is hiding a whole other language inside a single feature). It also complicates the notation with symbols for new capabilities it wasn't designed for, and the result is the mess that regex syntax is now. It also means you can never "learn" regex; you can only learn (to whatever accuracy you care about) Perl's regex, or Java's regex, or Python's regex. There is a vague set of commonalities, but don't rely on remembering which features are common and which differ when there are so many features implemented in so many ways.
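Backreferences are the cleanest example of an extension that leaves the finite-state model behind. In Python's re, (.+)\1 matches a string immediately followed by a copy of itself, and the set of such strings is not a regular language:

```python
import re

# (.+) captures some non-empty string; \1 demands the exact same text again.
square = re.compile(r"(.+)\1")

results = {s: bool(square.fullmatch(s)) for s in ["abcabc", "abcabd", "aa", "a"]}
print(results)  # {'abcabc': True, 'abcabd': False, 'aa': True, 'a': False}
```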
Don't let the learner come away thinking that "declarative" is synonymous with regexes. For example there is the parser combinator style, which can encode the above example as something like:
the_specific_string("meta").followed_by(ANY_CHARACTER).repeated(ZERO_OR_MORE_TIMES).build_pattern().recognize("meta-circular")
The key idea at play here is a sort of "builder pattern". There is an abstract "parser" object that has a single recognize(str) method, and you build your pattern by composing together the many customizable children that implement this abstract interface. The composition happens through "combinator methods", which take two or more parsers and build a parser that performs a mixture of their functionalities, as indicated by the name (e.g. followed_by() takes several parsers and sequences them next to each other, and repeated() takes a list of parsers and iterates the last one any number of times, including skipping it entirely). The things being built to represent parsers are generally (in functional languages at least) closures, but there is no reason why this pattern can't be built on top of regexes: each step simply generates the equivalent meta-character, and build_pattern returns the final pattern string.
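Here is a sketch of the "built on top of regexes" variant in Python. All names (PatternBuilder, Parser, ANY_CHARACTER and so on) are hypothetical. To keep it short, repeated() takes a quantifier rather than a list of parsers. ANY_CHARACTER stands for '.', since the "-" in "meta-circular" isn't a letter.

```python
import re

# Hypothetical atoms: each is a self-contained regex fragment.
ANY_CHARACTER = "."
ZERO_OR_MORE_TIMES = "*"


class Parser:
    """What build_pattern() returns: a recognizer wrapping a compiled regex."""

    def __init__(self, regex_source):
        self.regex = re.compile(regex_source)

    def recognize(self, text):
        # The whole string must fit the pattern.
        return self.regex.fullmatch(text) is not None


class PatternBuilder:
    """Each combinator returns a new builder with one more regex fragment."""

    def __init__(self, atoms):
        self.atoms = list(atoms)

    def followed_by(self, atom):
        return PatternBuilder(self.atoms + [atom])

    def repeated(self, quantifier):
        # Quantify only the last atom; the group keeps multi-char atoms intact.
        *rest, last = self.atoms
        return PatternBuilder(rest + ["(?:" + last + ")" + quantifier])

    def build_pattern(self):
        return Parser("".join(self.atoms))


def the_specific_string(literal):
    # re.escape makes any metacharacters in the literal match themselves.
    return PatternBuilder([re.escape(literal)])


parser = (the_specific_string("meta")
          .followed_by(ANY_CHARACTER)
          .repeated(ZERO_OR_MORE_TIMES)
          .build_pattern())
print(parser.regex.pattern)               # meta(?:.)*
print(parser.recognize("meta-circular"))  # True
print(parser.recognize("metro"))          # False
```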
There are tons of those "Parser approaches", formalisms, tools, patterns and libraries to express strings and string-recognition and parsing declaratively. Regexes are merely the most famous and widespread, which is a sad state of affairs IMO.
Imo replacing several nested if statements with a single esoteric regex is not necessarily a win. It depends on if pattern matching is really the best tool for the job.
An endless source of off-by-one errors, not to mention buffer overflows, index out of bounds exceptions, accidental negative indexing.
Having said this, I use tools that make regexes easy to use and readily available - I think in many programming languages the syntax means that other solutions are just as easy to devise and implement.
I wonder why there are no context-free parsers in standard libraries. The Earley parser can take grammars as input without having to generate code; it would be a great algorithm for a standard context-free parser.
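Here is a compact Earley recognizer in Python, as a sketch of the "grammar in, no code generation" point. It only answers yes/no (no parse tree), and the grammar format is my own. It assumes no empty productions, which the classic formulation needs extra care to handle.

```python
from collections import namedtuple

# Hypothetical grammar for arithmetic like "2+3*(4+5)": each nonterminal maps
# to a list of alternatives (tuples of symbols). Any symbol that is not a key
# is a terminal, matched against one input character. No empty productions.
GRAMMAR = {
    "Sum": [("Sum", "+", "Product"), ("Product",)],
    "Product": [("Product", "*", "Factor"), ("Factor",)],
    "Factor": [("(", "Sum", ")"), ("Digit",)],
    "Digit": [(d,) for d in "0123456789"],
}

# An Earley item: rule lhs -> rhs, with `dot` symbols already matched,
# started at input position `start`.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognize(grammar, start_symbol, tokens):
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {Item(start_symbol, rhs, 0, 0) for rhs in grammar[start_symbol]}

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot == len(item.rhs):
                # Complete: advance items at item.start that were waiting for lhs.
                for waiting in list(chart[item.start]):
                    if waiting.dot < len(waiting.rhs) and waiting.rhs[waiting.dot] == item.lhs:
                        new = waiting._replace(dot=waiting.dot + 1)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                continue
            symbol = item.rhs[item.dot]
            if symbol in grammar:
                # Predict: start every rule for the nonterminal after the dot.
                for rhs in grammar[symbol]:
                    new = Item(symbol, rhs, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif i < len(tokens) and tokens[i] == symbol:
                # Scan: the terminal matches the next token, so move to chart[i + 1].
                chart[i + 1].add(item._replace(dot=item.dot + 1))

    return any(
        item.lhs == start_symbol and item.dot == len(item.rhs) and item.start == 0
        for item in chart[len(tokens)]
    )


print(earley_recognize(GRAMMAR, "Sum", "2+3*(4+5)"))  # True
print(earley_recognize(GRAMMAR, "Sum", "2+*3"))       # False
```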
Seems like you are supporting JS flavor, that should be mentioned prominently.
Suggestions for the cheatsheet:
1) Multiline example regex is missing the anchors
2) Negative lookbehind should be `(?<!)` not `(?!)`
3) `+` and `*` examples are using `()` around a character, which isn't needed (the `?` example doesn't use it)
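For point 2, here is a quick way to see the difference. The cheatsheet is described as JS-flavored; this sketch uses Python's re, where the lookaround syntax is the same.

```python
import re

text = "price: $30, qty: 4"

# Positive lookbehind (?<=...): digits preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
# Negative lookbehind (?<!...): digits NOT preceded by "$".
# The \b stops the engine from matching the "0" inside "$30" on its own.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)
# (?!...) is negative LOOKAHEAD: it checks what comes after, not before.
not_before_percent = re.findall(r"\b\d+\b(?!%)", "5 apples, 20% off")

print(after_dollar)        # ['30']
print(not_after_dollar)    # ['4']
print(not_before_percent)  # ['5']
```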
I used to use it a lot, although I've not used it in quite a while. I don't recall the last time I even needed to write a regex. Somehow it's still stuck in my head.
Well worth having it if you want a tool to hack around large blocks of text and play with regex in a "live" environment.
Compare that with https://regexone.com/ . No fluff. I immediately know what I'm looking at and am given a simple practice problem, with a place to type smack in the middle of the page, and a responsive green coloring when I get something right. The problems advance to exactly the kind of stuff I need, with a quick view on all the lessons on the right-hand side. I'm visiting that website for the n^th time because I need to quickly refresh my regex to accomplish a task.
One day at my second job out of college, I wrote a quick regex for something, and one of my more senior colleagues looked at me like I was a wizard. It was amazing to be able to get some street cred for that. To me, it validated the effort I went through previously.
I'm trying to learn about SAT/SMT solvers now. Not because I have a pressing need for them but because it's a completely foreign thing that - who knows? - maybe I'll be able to put to good use today. The problem with SAT/SMT is that there are nowhere near as many clear learning resources as there are for, say, regexes.
> Write the expression using curly brackets {} to select the numbers from 0 to 9 in the text that is at least between 1 and 4.
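My reading of what that exercise wants is runs of one to four digits, which is an assumption since the wording is ambiguous. Note that without word boundaries, longer numbers get chopped into pieces:

```python
import re

text = "7 42 999 2021 123456"

# {1,4}: between 1 and 4 repetitions of the preceding class.
chunks = re.findall(r"[0-9]{1,4}", text)
# \b on both sides: only whole numbers of 1 to 4 digits.
whole = re.findall(r"\b[0-9]{1,4}\b", text)

print(chunks)  # ['7', '42', '999', '2021', '1234', '56']
print(whole)   # ['7', '42', '999', '2021']
```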
Discussed on HN as well (78 comments): https://news.ycombinator.com/item?id=7370622
I have no affiliation with these guys, I’ve just been impressed over the years
there's a bug when entering text into the regex area: new padded elements show up, which makes the text on the page shift, e.g. https://i.imgur.com/AFBccIn.gif
the line height in the subtitle area could be increased so the text isn't so cramped ( https://i.imgur.com/HReADcS.png )
the background color of the entire page should be darker (or use some other solution) to make the <code> looking text more distinct as code text. ( dark grey on dark blue doesn't really stand out https://i.imgur.com/8Nt9YNl.png )
Edit: After lesson 8 you start to get problems to solve
Edit 2: I take it back, this is the first time I've understood how lookarounds work. Great stuff!
Btw: Lexer-less derivative parsers are interesting because of how simple they are to write and maintain. IIRCBIMW, they are neither LL nor LR.
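Parsing with derivatives extends Brzozowski's derivatives of regular expressions to context-free grammars. The regular case alone shows why the approach is so compact. Here is a sketch in Python; it does no simplification of the derived terms, so it's only suitable for short inputs.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regex syntax trees."""


@dataclass(frozen=True)
class Empty(Regex):  # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Regex):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Regex):  # matches one literal character
    c: str


@dataclass(frozen=True)
class Alt(Regex):  # a | b
    a: Regex
    b: Regex


@dataclass(frozen=True)
class Seq(Regex):  # a followed by b
    a: Regex
    b: Regex


@dataclass(frozen=True)
class Star(Regex):  # r*
    r: Regex


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Seq):
        return nullable(r.a) and nullable(r.b)
    return False  # Empty, Chr


def deriv(r, c):
    """Brzozowski derivative: matches w exactly when r matches c + w."""
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Seq):
        first = Seq(deriv(r.a, c), r.b)
        return Alt(first, deriv(r.b, c)) if nullable(r.a) else first
    if isinstance(r, Star):
        return Seq(deriv(r.r, c), r)
    return Empty()  # Empty and Eps cannot consume a character


def matches(r, s):
    # Take one derivative per input character, then ask whether "" is left.
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)


# (ab)*c
pattern = Seq(Star(Seq(Chr("a"), Chr("b"))), Chr("c"))
print([matches(pattern, s) for s in ["c", "ababc", "abac", ""]])
# [True, True, False, False]
```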
> The basic matcher is to type as is to choose a character or word. For example, to select the word curious in the text, type in the same way.