Show HN: Regex Cheatsheet

(ihateregex.io)

499 points · geongeorgek · 6y ago · 129 comments

111 comments · 36 top-level

robert_tweed6y ago· 14 in thread

OK, these kinds of regex tools get posted quite often. I get it, regex is very confusing at first. And some of these use-cases result in rather complex expressions nobody should be forced to write from scratch (you are still remembering to write unit tests for them though, right?)

But as someone who actually knows [some flavours of] regex fairly well, what I would really like, is a reference that covers all the subtle differences between the various regex engines, along with community-managed documentation (perhaps wiki pages) of which applications & API versions use which flavour of regex.

For example, the other day I wanted to run a find on my NAS. I needed to use a regex, but the Busybox version of find doesn't support the iregex option, so all expressions are case-sensitive. With some googling, I was able to find out that the default regex type is Emacs, but I wasn't able to find either a good reference for exactly what Emacs regex does and doesn't support, nor any information about how to set the "i" flag. In the end I had to manually convert every character into a class (like [aA] for "a") which was tedious, but quicker than trying to find a better solution or resorting to grep.

A related, annoyingly common pattern is that the documentation for `find` states that `-regex` specifies a regex, but it does not state which flavour of regex. The documentation for certain versions of `find`, which support alternative engines, notes that the default is Emacs. From this I was able to infer (perhaps wrongly) that the Busybox `find` uses Emacs-flavoured regex, but ultimately I still had to resort to some trial-and-error. This problem is all too common in API documentation.
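
The manual workaround described above (turning every letter into a class like [aA]) can be automated. A minimal Python sketch, assuming the input is a pattern in which every ASCII letter is meant literally; the helper name `to_case_classes` is made up for illustration, and it does not understand escapes such as \w or existing bracket expressions:

```python
import re


def to_case_classes(pattern: str) -> str:
    """Expand each ASCII letter into a two-character class, e.g. 'a' -> '[aA]'.

    Assumption: letters in `pattern` are literals (no \\w, \\d, or existing
    [...] classes), so every letter can be rewritten independently.
    """
    parts = []
    for ch in pattern:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(ch)  # digits, dots, backslashes etc. pass through unchanged
    return "".join(parts)


expanded = to_case_classes(r"img\.jpg")
print(expanded)
print(bool(re.fullmatch(expanded, "IMG.Jpg")))
```

The resulting pattern uses only bracket expressions, which practically every regex flavour supports, so it avoids needing a case-insensitivity flag at all.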

justaj6y ago

Honestly, as a noob, this is one of the biggest reasons I have such a hard time deciding to learn regex.

Python flavor would probably be different than PCRE, which is probably different than JS flavor.

Even worse is that it might be too late to standardize all the regex flavors because there is already so much written in different regex flavors that it just costs too much for them to become obsolete in the future.

This is really demotivating.

chirss6y ago

Honestly don't let this get you down, here's a learning plan (use regex101 to learn)

1) Learn PCRE regex. 2) Try regex golf or crosswords to learn PCRE regex. 3) Take the quiz on regex101.

Once you're done with all 3:

Learn the minor/major differences in the other languages. There aren't many. For example this named capture group:

(?<somename>someregex)

Would look like this in a different language:

(?P<somename>someregex)

There are some differences in what each language can and cannot do, like recursion, because someone thought it was a great idea to make JavaScript awful at regex, but that's beside the point. Regex is totally worth learning.
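
To make the named-group difference concrete, here is a small sketch using Python's standard `re` module, which is one engine that requires the (?P<name>...) spelling and rejects the (?<name>...) spelling. The group names and sample text are made up for illustration:

```python
import re

# Python's re module spells named groups with a "P".
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2020-01")
print(m.group("year"), m.group("month"))

# The angle-bracket-only spelling is rejected by Python's re module.
try:
    re.compile(r"(?<year>\d{4})")
    angle_form_ok = True
except re.error:
    angle_form_ok = False
print("(?<name>...) accepted by re:", angle_form_ok)
```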

new_guy6y ago

> Honestly, as a noob, this is one of the biggest reasons I have such a hard time deciding to learn regex.

Clear your afternoon, and just learn it. Seriously, it takes a couple of hours at best and then - BOOM - you're done for the rest of your life.

celeritascelery6y ago

The O'Reilly book "Mastering Regular Expressions" has a whole section dedicated to it, as well as several tables. But it would be nice to have an online version.

wyclif6y ago

And it's one of the best O'Reilly books. I went and checked because of your comment and just noticed there was a third edition that I missed; I have the second. Still a book worth studying.

alexhutcheson6y ago

RE2 syntax[1] is a pretty good option to learn, because it's mostly a "lowest common denominator" - if it works in RE2, it should work in PCRE, Python, Javascript, etc. The reverse isn't true - there is a bunch of syntax that RE2 doesn't support by design, often to constrain performance bounds.

Emacs regexps are unfortunately their own weird beast - they handle parentheses differently than other regexp engines, because Emacs assumes that you'll be running regexps on Lisp code a lot and want to easily match parentheses. The best documentation on that syntax is (confusingly) in the Elisp reference manual: https://www.gnu.org/software/emacs/manual/html_node/elisp/Sy....

[1] https://github.com/google/re2/wiki/Syntax

useragent866y ago

IME Emacs provides a very pleasant way to write regexps using the rx library. ELPA also has the package xr, which converts Elisp regexps to rx format, and pcre2el converts PCRE to Elisp. So a regexp like

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Can easily be converted like:

    (->> "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b"
         pcre-to-elisp xr)

To:

    (seq word-boundary
         (one-or-more
          (any "0-9A-Z" "%+._-"))
         "@"
         (one-or-more
          (any "0-9A-Z" ".-"))
         "."
         (repeat 2 4
                 (any "A-Z"))
         word-boundary)

mklein9946y ago

I tend to go to https://www.regular-expressions.info when I need to find out which features are supported between dialects. Not always up-to-date, but has some good info.

zaptheimpaler6y ago

It's like SQL - everyone has a dialect. For most things where a SQL/regex engine/parser isn't the core of what they do, it will never be a priority. The best approach IMO is something like this in priority order:

1. Stick to using the lowest common denominator like you did for case insensitivity.

2. If that becomes too cumbersome, then consider whether regex is the right tool for the job. Maybe you can use e.g. Python/your favorite language with a known regex standard.

3. If there are no other tools and you're stuck with whatever flavor of regex one particular thing supports, only then invest time in learning the details. There is probably a book out there with the details even if there's no webpage.

Then pray you never get to step 3 :)

geongeorgekOP6y ago

You're totally right. Right now this tool only supports the javascript flavor of regex. That said, for all the simple expressions shown there it's more or less the same for most other engines. I guess that makes it okay.

8bitsrule6y ago

By coincidence, I found this link a bit earlier today. It tries to avoid flavors and exotic syntax.

https://rexegg.com/regex-quickstart.html

AlchemistCamp6y ago

To me, the divide is pre and post-Perl.

It's not so bad going between JS, Ruby and Elixir regex (possibly due to my use of a smaller set of features), but Vim regexes disappoint me time after time.

waz0wski6y ago

if you're on osx, the app Patterns is really good for testing regex, and also has quick references for a variety of regex 'engines' and also has decent matching explanations

https://krillapps.com/patterns/

chirss6y ago

regex101 does a good job at showing you what the selected variant can do.

darau16y ago· 8 in thread

Nobody pointed it out, but there's also https://regexr.com/

It's how I learned regex years ago, and I still use it today to test/build more complex patterns.

strig6y ago

My go-to is https://regex101.com/

huseyinkeles6y ago

I've been using regex101 for many years and love it! The debugger [0] that it has is amazing!

[0] - https://regex101.com/debugger

darau16y ago

Didn't know about this. Thanks!

bepvte6y ago

I love regex101. It uses webassembly for some of its engines

52-6F-626y ago

I use the same as a default. It's been a great help.

jve6y ago

Well, there is a whole list of useful regex links posted 5 months ago when someone posted url to RegExr.

Enjoy: https://news.ycombinator.com/item?id=20614847

smartmic6y ago

Here is my goto resource for checking regexpr with railroad diagrams: https://regexper.com/

imafish6y ago

I love regexr. Has been a constant tab in my browser for years now.

kitd6y ago· 7 in thread

This is really cool!

2 points:

1. it fiddled with my back button which is a bit annoying

2. a better email sample is

    ^[^@]+@[^@]+\.[^@]+$

which removes the 2 ampersands problem.

laumars6y ago

Even that is wrong because you can have privately owned TLDs (I forget what they're technically called) like .google

So sundar.pichai@google is technically a valid address (whether .google has any MX records is another matter)

Regex shouldn't really be used for email addresses anyway because the only reliable way to authenticate an email address is to literally send an email to that address.

bduerst6y ago

AFAIK none of the TLDs allow for MX records on just the TLD

i.e. johndoe@com will never exist

donalhunt6y ago

.google does not have any MX records

skrebbel6y ago

You'll probably want to add \s to those negated character classes as well, or it matches "it's an @ sign. not an ampersand."
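
The whitespace point is easy to check. A minimal Python sketch, assuming the pattern from the parent comment and a hypothetical stricter variant that also excludes whitespace via \s inside each negated class (neither is meant as a real email validator):

```python
import re

# Pattern from the parent comment: anything but "@" on each side.
loose = re.compile(r"^[^@]+@[^@]+\.[^@]+$")
# Hypothetical stricter variant: additionally exclude whitespace.
strict = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

sentence = "it's an @ sign. not an ampersand."
address = "user@example.com"

for name, pattern in (("loose", loose), ("strict", strict)):
    print(name, bool(pattern.match(sentence)), bool(pattern.match(address)))
```

The loose pattern accepts the whole sentence, while the stricter one rejects it and still accepts the plain address. As the next reply points out, quoted local parts may legally contain whitespace, so the stricter variant trades one kind of error for another.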

anamexis6y ago

Escaped or quoted whitespace is allowed in the local part of email addresses.

geongeorgekOP6y ago

Thank you!

I think I know what's wrong with your back button. I will fix it.

And for the regex. will try it out and see if I can add it.

bmn__6y ago

That's not how the spec works. Compliant solution: https://stackoverflow.com/a/1917982

blauditore6y ago· 7 in thread

Would be nice to have a regex for parsing HTML...

grabs popcorn

bmn__6y ago

Easy with a sufficiently powerful engine: https://stackoverflow.com/a/4234491

Relies on ?(DEFINE): http://p3rl.org/perlre#(DEFINE)

quickthrower26y ago

There is a good comment on that answer:

> To sum up: RegEx's are misnamed. I think it's a shame, but it won't change. Compatible 'RegEx' engines are not allowed to reject non-regular languages. They therefore cannot be implemented correctly with only Finite State Machines. The powerful concepts around computational classes do not apply. Use of RegEx's does not ensure O(n) execution time. The advantages of RegEx's are terse syntax and the implied domain of character recognition. To me, this is a slow moving train wreck, impossible to look away, but with horrible consequences unfolding.

arkh6y ago

With subroutines and recursive patterns I think you could do something parsing valid HTML.

Your sanity won't be left intact tho.

asicsp6y ago

how about this "match "A B C" where A+B=C"[1] for sanity?

[1] http://www.drregex.com/2018/11/how-to-match-b-c-where-abc-be...

chirss6y ago

boom. https://regex101.com/r/PxSY4U/1 technically it does parse it. :P

bmn__6y ago

Nope. <h1 class="foo>bar">My First Heading</h1> will misparse. (This is valid HTML 5.) You really need recursive regex or something equivalent in power, otherwise you will always fail.
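
The misparse is easy to reproduce. A rough Python sketch, assuming a deliberately naive "tag" regex (not the one in the regex101 link, which is not reproduced here) and the standard library's `html.parser` as the comparison:

```python
import re
from html.parser import HTMLParser

html = '<h1 class="foo>bar">My First Heading</h1>'

# Naive approach: a tag is "<", then anything that is not ">", then ">".
naive_first_tag = re.search(r"<[^>]+>", html).group(0)
print(naive_first_tag)  # cut short at the ">" inside the attribute value


class StartTags(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


parser = StartTags()
parser.feed(html)
parser.close()
print(parser.seen)  # the attribute value keeps its embedded ">"
```

The naive regex has no notion of being inside a quoted attribute value, which is exactly the kind of state a real parser tracks.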

geongeorgekOP6y ago

Haha..careful. someone might take this seriously

crispyambulance6y ago· 5 in thread

I use regex a lot but deliberately keep it simple.

One thing that confounded me often was positive and negative look-arounds. I always got the expressions mixed up, until I just put the expressions into a table like this...

              look-behind  |  look-ahead
    ------------------------------------
    positive    (?<=a)b    |    a(?=b)
    ------------------------------------
    negative    (?<!a)b    |    a(?!b)

It's not hard, but for whatever reason my brain had trouble remembering the usage because every time I looked it up, each of those expressions was nested in a paragraph of explanation, and I could not see the simple intuitive pattern.

Putting it into a simple visualization helps a lot.

Now, if I can find a similar mnemonic for backreferences !?
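
The four cells of the table can be tried out directly. A small Python sketch, assuming the sample text "ab cb ac" (chosen only so that each pattern matches at exactly one position); the helper `match_starts` is illustrative:

```python
import re

text = "ab cb ac"  # indexes: a=0 b=1 ' '=2 c=3 b=4 ' '=5 a=6 c=7


def match_starts(pattern: str) -> list[int]:
    """Return the start index of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


print("positive look-behind (?<=a)b ->", match_starts(r"(?<=a)b"))  # "b" preceded by "a"
print("negative look-behind (?<!a)b ->", match_starts(r"(?<!a)b"))  # "b" not preceded by "a"
print("positive look-ahead  a(?=b)  ->", match_starts(r"a(?=b)"))   # "a" followed by "b"
print("negative look-ahead  a(?!b)  ->", match_starts(r"a(?!b)"))   # "a" not followed by "b"
```

In every case the match itself is a single character; the lookaround only checks the neighbour without consuming it.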

wahern6y ago

Maybe it's easier to remember that lookbehinds are evil from an implementation standpoint, and even in Perl have arbitrary limitations. If you see lookbehinds, look away! If you see lookaheads, go ahead.

glangdale6y ago

Oddly, lookbehinds are evil only in a specific backtracking world. We never got around to implementing arbitrary lookarounds in Hyperscan (https://github.com/intel/hyperscan) but if we had done something in the automata world to handle lookaround, lookbehinds are way easier than lookaheads.

To handle a lookbehind, you really only need to occasionally 'AND' together some states (not an operation you would normally do in a standard NFA whether Glushkov or Thompson). To handle lookaheads... well, it gets ugly.

ygra6y ago

It's something I really like about .NET's regular expressions. Lookbehind has no limitations and will just match backwards with all features you can use in other parts.

So depending on the language or flavor you're working in, running away isn't really necessary.

lonelappde6y ago

Lookbehinds stay behind.

geongeorgekOP6y ago

This is really intended for beginners. but I can confirm more content is coming soon <3

superasn6y ago· 4 in thread

Regex are quite simple and useful but my only issue is with those recursive things. Like how do you match balanced brackets? I have a regex (PCRE) copy-pasted for it but for the life of me I don't get it, or maybe I nod my head but instantly ununderstand it. I wish there was a simple-to-understand doc that teaches me how I can match something like:

    "(this is inside a bracket (and this is nested or (double nested)))"

P.S. I know token parsing is better for these things but still I just want to learn the other thing too.

gizmo6866y ago

Balanced parentheses are not a regular language, so it is theoretically impossible to match them with regular expressions.

In practice, most regexp implementations you see are more powerful than regular expressions. For instance, .NET has a balancing groups feature [0] for exactly this use case.

[0] https://regular-expressions.mobi/balancing.html?wlr=1

superasn6y ago

The regex I've copy-pasted is this:

    $str = "(this is inside a bracket (and this is nested or (double nested)))";
    do {
        preg_match_all('~\(((?:[^\(\)]++|(?R))*)\)~', $str, $matches);
        echo $str = $matches[1][0] ?? '', "\n";
    } while($str);

Outputs this [1]:

    > this is inside a bracket (and this is nested or (double nested))
    > and this is nested or (double nested)
    > double nested

You're right that there is more processing involved (e.g. while loop) but I still don't understand this part

    '~\(((?:[^\(\)]++|(?R))*)\)~'

[1] https://rextester.com/MEH86820
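
Reading the pattern piece by piece may help: the ~ characters are just PHP's delimiters; \( and \) match the literal outer parentheses; [^\(\)]++ consumes a run of non-parenthesis characters (possessively, so it never gives characters back); (?R) re-applies the whole pattern at that point, which is what handles a nested group; the surrounding (?:...)* repeats "plain run or nested group" any number of times; and the outer capturing group collects everything between the outermost pair. Python's standard `re` module has no (?R), so the sketch below is not a translation of the regex but a hand-written equivalent with a depth counter, under the assumption that the goal is the same peeling loop as the PHP snippet. The function names are made up for illustration:

```python
def first_balanced_group(text: str):
    """Return the contents of the first balanced (...) group in text, or None."""
    start = text.find("(")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1          # entering a nested group (the regex recurses here)
            elif text[i] == ")":
                depth -= 1          # leaving a group
                if depth == 0:      # the outermost pair just closed
                    return text[start + 1:i]
        # This "(" never closed; try the next one, as a regex engine would.
        start = text.find("(", start + 1)
    return None


def peel(text: str) -> list[str]:
    """Repeatedly extract the first balanced group, like the do/while loop above."""
    layers = []
    inner = first_balanced_group(text)
    while inner:
        layers.append(inner)
        inner = first_balanced_group(inner)
    return layers


sample = "(this is inside a bracket (and this is nested or (double nested)))"
for layer in peel(sample):
    print(">", layer)
```

The depth counter is the piece a true regular expression cannot express, which is why engines need an extension such as recursion to do this.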

chirss6y ago

Can you explain the problem further?

superasn6y ago

please see my reply to @gizmo686

__tk__6y ago· 3 in thread

I'm loving the graphs which for the first time in years are giving me an idea of what an expression is actually doing. Just because the visualization is kept in a form that is easy to understand with a programming background but can also be translated to the expression itself in a straightforward manner.

noxToken6y ago

Graphs for these really hammer home the point that regular expressions aren't magic. Parsers have so many abilities that when starting out, my expressions were horribly inefficient and missed many corner cases. Learning to graph them just like automata immediately made things easier.

When green devs are having trouble with regular expressions (and don't have a formal computer science background), I like to give them a crash course in DFAs.

geongeorgekOP6y ago

I can't take credit for the visualizations although implementing it was a pain in the ass. It was originally created by: https://regexper.com/

leibnitz276y ago

I knocked up a silly dynamic regex grapher a while back as a little teaching aid - mildly fun

https://www.benf.org/other/regexview/

philshem6y ago· 3 in thread

I have a secret hobby of answering python + regex questions on stackoverflow with pure python.

geongeorgekOP6y ago

I'm gonna pretend I didn't read this

johnnylambada6y ago

Examples?

philshem6y ago

_secret_

Glench6y ago· 3 in thread

Plug for Verbal Expressions (no affiliation), which has an alternate way of compiling more human-readable regexes for a dozen languages: http://verbalexpressions.github.io/

linusjs_6y ago

I remember that library. A year after I made regexpbuilder https://www.npmjs.com/package/regexpbuilder that library suddenly appeared, and was basically a rip-off of the concept I appear to have created (there was no such other library before regexpbuilder), but is also fairly useless because it doesn't look like it could represent more than about 10% of the possible regular expressions. Yet there was no mention of my library at all in the readme of verbal expressions.

certifiedloud6y ago

A CLI version of this would be pretty useful to me.

geongeorgekOP6y ago

This looks nice

Amarok6y ago· 3 in thread

^[a-z0-9_-]{3,15}$

The username reference doesn't match 16 characters as claimed

geongeorgekOP6y ago

It should match. The number 15 there means that x is repeated up to 15 times, so 1+15=16.

looks good to me

aratauto6y ago

That is not correct. 15 is total maximum number of repeats including the first one. Even the diagram on https://ihateregex.io/expr/username correctly says that loop can be taken between 2 and 14 times.

asicsp6y ago

where does the extra 1 come from? a{2,5} means match 'a' two to five times
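
The bounds can be checked in a few lines. A quick Python sketch using the username pattern quoted at the top of this subthread, with test strings that are simply repeated "a" characters:

```python
import re

username = re.compile(r"^[a-z0-9_-]{3,15}$")

# {3,15} is the total number of repetitions: at least 3, at most 15.
results = {length: bool(username.match("a" * length)) for length in (2, 3, 15, 16)}
for length, matched in results.items():
    print(length, "characters ->", "match" if matched else "no match")
```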

StavrosK6y ago· 2 in thread

I love regex and have no trouble reading them, but still love this tool, great job. I especially like the railroad diagrams, for those cases where I brainfarted on a regex and it's doing something other than what I intended. Thanks for this.

geongeorgekOP6y ago

I'm glad you like the tool <3 It will have a lot more content soon :)

chirss6y ago

If you want some help swing by #regex on efnet, happy to help.

dana3216y ago· 2 in thread

One thing i've always missed from the Perl programming language is the regex operators.

You could do:

    my $var='foo foo bar and more bar foo!!!';
    if($var=~/(foo|bar)/g){  # does the variable contain foo or bar?
      print "foo! $1 removing foo..\n";
      # remove our value..
      $var=~s/$1//g;
    }

radiac6y ago

So did I: https://github.com/radiac/python-perl/

dana3216y ago

Awesome job, i did a hack bootstrapping the tokenizer to do the same thing in php, didn't release it though.

dan_hawkins6y ago· 2 in thread

Is there a bug? In regexp for IPv4: https://ihateregex.io/expr/ip expression ends with {3} but the diagram states "2 times" in lower right - shouldn't it say "3 times"?

jve6y ago

I think it says "repeat 2 times". So basically you've already gone through the group once, and then you go through it 2 more times.

Because if I specify x{0,3}, i have 2 paths - around x and thru x + at most 2 more times

geongeorgekOP6y ago

Yep you are right

axegon6y ago· 2 in thread

This is awesome but.... I don't hate regex. Matter of fact, I love regex.

geongeorgekOP6y ago

check out the ipv6 one :)

axegon6y ago

I've had to write regex for deeply proprietary SQL-like (the word "like" is a big BIG stretch) language. This really is nothing. The regex itself was 4 pages long. AFAIK they still use it in production, almost 10 years later with 0 modifications.

¯\_(ツ)_/¯

lfglopes6y ago· 1 in thread

I used to use this site, http://txt2re.com, which is now off the grid, at least since yesterday. :(

Unlike most regex helpers, in this one you would start with the text you want to filter/parse and then it would suggest you possible extractions.

Do you know any alternatives?

deadliftpro6y ago

same, looking for an alternative to txt2re.

rubyn00bie6y ago· 1 in thread

Nice work on this!

Something subtle that I quite loved: the email regex is, IMHO, close to perfect: \S+@\S+\.\S+

Because the "perfect" one is just absurd, and no one realizes it's going to be so fucking absurd until they start getting support cases and then go read something like this: https://stackoverflow.com/a/201378/931209

> If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective.

p4lindromica6y ago

Even this regexp has false positives.

The `ai` ccTLD ran their own mail server at the root, so an address like `a@ai` was a valid email address.

They serve a website at the tld root: http://ai./

vzidex6y ago· 1 in thread

Very cool! The site that worked best for me to learn regex was https://regexcrossword.com/ - after solving my way through all of them (I got really hooked when I discovered the site) I found I was alright at regex.

geongeorgekOP6y ago

Thank you for sharing that. looks good

adambowles6y ago· 1 in thread

>/h.llo/ the '.' matches any one character other than a new line character... matches 'hello', 'hallo' but not 'h llo'

in the cheatsheet is false. (https://regexr.com/4tc48)

`.` can match any character except linebreaks (including whitespace)

jodrellblank6y ago

`.` "can" match any character including linebreaks if the regex engine is in re.DOTALL mode (Python) or Singleline mode (.NET).
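
Both points in this subthread can be confirmed with Python's `re` module. A short sketch using the cheatsheet's own h.llo example:

```python
import re

# "." matches a space, contrary to the cheatsheet's claim.
space_ok = bool(re.fullmatch(r"h.llo", "h llo"))
# By default "." does not match a newline...
newline_default = bool(re.fullmatch(r"h.llo", "h\nllo"))
# ...unless DOTALL is switched on.
newline_dotall = bool(re.fullmatch(r"h.llo", "h\nllo", flags=re.DOTALL))

print(space_ok, newline_default, newline_dotall)
```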

asicsp6y ago· 1 in thread

neat site! clicking an example opens up a playground with live update and explanation and railroad diagrams, similar to sites like regex101[1] and regulex[2]

one suggestion would be to mention clearly which tool/language is being used, since regex has no unified standard. Based on the "Cheatsheet adapted" message at the bottom, I think it is for JavaScript. I wrote a book on JS regexp last year, and I have a cheatsheet post too [3]

[1] https://regex101.com/

[2] https://jex.im/regulex

[3] https://learnbyexample.github.io/cheatsheet/javascript/javas...

geongeorgekOP6y ago

Totally agreed! Right now I only support javascript. But for everything shown there, it's pretty much the same for most flavors

mimixco6y ago· 1 in thread

This is awesome! Thank you! I hate regex, too, but I love your inline railroad diagramming tool.

geongeorgekOP6y ago

Haha thank you <3

Diti6y ago· 1 in thread

For the love of god, PLEASE DON’T USE REGEX TO VALIDATE EMAIL. The RegEx of this website ignores plus-addressing, for example. All you need to do to validate email is send a verification email.

xiconfjs6y ago

not just the email regex is simplified (and at the end plain wrong). Also the one for phone numbers is highly simplified and will not match all valid phone numbers...

binarysneaker6y ago· 1 in thread

These regexs are garbage. Others have suggested better sites for learning how to construct regexs, and stackoverflow has plenty of great examples.

geongeorgekOP6y ago

Why don't you link them with the comment

olalonde6y ago· 1 in thread

Thumbs up for the relatable domain name.

geongeorgekOP6y ago

Glad you find it that way

ape46y ago· 1 in thread

The IPv6 regex is surprisingly complicated.

geongeorgekOP6y ago

Yeah. this is when you start to have 2 problems

geongeorgekOP6y ago

I used to spend hours trying to craft the perfect expression for my scraping projects not realizing that I don't really know regex.

This tool is a cheat sheet that also explains the commonly used expressions so that you understand it.

- There is a visual representation of the regular expression (thanks to regexper)

- The application shows matching strings which you can play around with

- Expressions can be edited and these are instantly validated

xxsaculxx6y ago

Nice tool! I personally use https://regex101.com/ as I like the explanations and quick reference.

sylvanaar6y ago

Nothing will ever beat RegexBuddy when it comes to Regex tools. It is an entire IDE just for regex, and has been my not-so-secret weapon for a decade or more.

KenanSulayman6y ago

I don't understand why the Github repository lists regexper as the source of the visual graph code but the frame only shows iHateRegex as watermark?

If the only thing that is embedded in that frame was taken entirely from a different project, that project should at least be mentioned in the frame.

hyperpape6y ago

Really nice idea.

I found that you can see your own regex with railroad diagram by going to one of the prepopulated examples and editing it. However, it wasn't clear to me that's the intended use of the tool. It's either a little side-effect, or not super-discoverable.

mNovak6y ago

I always refer back to http://rexegg.com/ Not a tool as such, but a good reference if you know how it works and just need to refresh on syntax.

kazinator6y ago

There is no way I would just plop that IPv6 regex into any serious program. :)

chenster6y ago

For email specific regular expression, it's all covered on https://emailregex.com

esaym6y ago

Either I'm a regex wizard and don't know it, or perhaps I think I know something but know nothing at all, but I've never complained about using regular expressions. I use them all the time without thought. I never quite figured out the need for a cheatsheet either; your language of choice should have a good documentation page for any specific supported syntax.

hamid_ra6y ago

love the idea! I would crowdsource it so people can add their regexes and vote on other people's regexes!

samat6y ago

This is very neat, thank you!

shawnyou6y ago

Good tool

j / k navigate · click thread line to collapse

129 comments

111 comments · 36 top-level

robert_tweed6y ago· 14 in thread

justaj6y ago

Honestly, as a noob, this is one of the biggest reasons I have such a hard time deciding to learn regex.

Python flavor would probably be different than PCRE, which is probably different than JS flavor.

This is really demotivating.

chirss6y ago

Honestly don't let this get you down, here's a learning plan (use regex101 to learn)

1) Learn PCRE regex. 2) Try regex golf or cross words to learn PCRE regex. 3) Take the quiz on regex101.

Once you're done with all 3:

Learn the minor/major differences in the other languages. There aren't many. For example this named capture group:

(?<somename>someregex)

Would look like this in a different language:

(?P<somename>someregex)

new_guy6y ago

> Honestly, as a noob, this is one of the biggest reasons I have such a hard time deciding to learn regex.

Clear your afternoon, and just learn it. Seriously, it takes a couple of hours at best and then - BOOM - you're done for the rest of your life.

2 more replies

celeritascelery6y ago

The O’Riley book “mastering regular expressions” has a whole section dedicated to it. As well as several tables. But it would be nice to have an online version.

wyclif6y ago

And it's one one the best O'Reilly books. I went and checked because of your comment and just noticed there was a third edition that I missed, I have the second. Still a book worth studying.

alexhutcheson6y ago

[1] https://github.com/google/re2/wiki/Syntax

useragent866y ago

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Can easily be converted like:

    (->> "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\\b"
         pcre-to-elisp xr)

To:

    (seq word-boundary
         (one-or-more
          (any "0-9A-Z" "%+._-"))
         "@"
         (one-or-more
          (any "0-9A-Z" ".-"))
         not-newline
         (repeat 2 4
                 (any "A-Z"))
         word-boundary)

1 more reply

mklein9946y ago

I tend to go to https://www.regular-expressions.info when I need to find out which features are supported between dialects. Not always up-to-date, but has some good info.

zaptheimpaler6y ago

1. Stick to using the lowest common denominator like you did for case insensitivity.

2. If that becomes too cumbersome, then consider whether regex is the right tool for the job. Maybe you can use e.g Python/your favorite language with a known regex standard.

Then pray you never get to step 3 :)

geongeorgekOP6y ago

8bitsrule6y ago

By coincidence, I found this link a bit earlier today. It tries to avoid flavors and exotic syntax.

https://rexegg.com/regex-quickstart.html

AlchemistCamp6y ago

To me, the divide is pre and post-Perl.

It's not so bad going between JS, Ruby and Elixir regex (possibly due to my use of a smaller set of features), but VIM regex disappoint me time after time.

waz0wski6y ago

if you're on osx, the app Patterns is really good for testing regex, and also has quick references for a variety of regex 'engines' and also has decent matching explanations

https://krillapps.com/patterns/

chirss6y ago

regex101 does a good job at showing you what the selected variant can do.

darau16y ago· 8 in thread

Nobody pointed it out, but there's also https://regexr.com/

It's how I learned regex years ago, and I still use it today to test/build more complex patterns.

strig6y ago

My go-to is https://regex101.com/

huseyinkeles6y ago

I've been using regex101 for many years and love it! The debugger [0] that it has is amazing!

[0] - https://regex101.com/debugger

darau16y ago

Didn't know about this. Thanks!

1 more reply

bepvte6y ago

I love regex101. It uses webassembly for some of its engines

52-6F-626y ago

I use the same as a default. It's been a great help.

jve6y ago

Well, there is a whole list of useful regex links posted 5 months ago when someone posted url to RegExr.

Enjoy: https://news.ycombinator.com/item?id=20614847

smartmic6y ago

Here is my goto resource for checking regexpr with railroad diagrams: https://regexper.com/

imafish6y ago

I love regexr. Has been a constant tab in my browser for years now.

kitd6y ago· 7 in thread

This is really cool!

2 points:

1. it fiddled with my back button which is a bit annoying

2. a better email sample is

    ^[^@]+@[^@]+\.[^@]+$

which removes the 2 ampersands problem.

laumars6y ago

Even that is wrong because you can have privately owned TLDs (I forget what they're technically called) like .google

So sundar.pichai@google is technically a valid address (whether .google has any MX records is another matter)

Regex shouldn't really be used for email addresses anyway because the only reliable way to authenticate an email address is to literally send an email to that address.

bduerst6y ago

AFAIK none of the TLDs allow for MX records on just the TLD

i.e. johndoe@com will never exist

2 more replies

donalhunt6y ago

.google does not have any MX records

1 more reply

skrebbel6y ago

You'll probably want to add \S to those character classes as well, or it matches "it's an @ sign. not an ampersand."

anamexis6y ago

Escaped or quoted whitespace is allowed in the local part of email addresses.

geongeorgekOP6y ago

Thank you!

I think I know what's wrong with your back button. I will fix it.

And for the regex. will try it out and see if I can add it.

bmn__6y ago

That's not how the spec works. Compliant solution: https://stackoverflow.com/a/1917982

blauditore6y ago· 7 in thread

Would be nice to have a regex for parsing HTML...

grabs popcorn

bmn__6y ago

Easy with a sufficiently powerful engine: https://stackoverflow.com/a/4234491

Relies on ?(DEFINE): http://p3rl.org/perlre#(DEFINE)

quickthrower26y ago

There is a good comment on that answer:

arkh6y ago

With subroutines and recursive patterns I think you could do something parsing valid HTML.

Your sanity won't be left intact tho.

asicsp6y ago

how about this "match "A B C" where A+B=C"[1] for sanity?

[1] http://www.drregex.com/2018/11/how-to-match-b-c-where-abc-be...

chirss6y ago

boom. https://regex101.com/r/PxSY4U/1 technically it does parse it. :P

bmn__6y ago

Nope. <h1 class="foo>bar">My First Heading</h1> will misparse. (This is valid HTML 5.) You really need recursive regex or something equivalent in power, otherwise you will always fail.

1 more reply

geongeorgekOP6y ago

Haha..careful. someone might take this seriously

crispyambulance6y ago· 5 in thread

I use regex a lot but deliberately keep it simple.

One thing that confounded me often was positive and negative look-arounds. I always got the expressions mixed up, until I just put the expressions into a table like this...

              look-behind  |  look-ahead
    ------------------------------------
    positive    (?<=a)b    |    a(?=b)
    ------------------------------------
    negative    (?<!a)b    |    a(?!b)

Putting it into a simple visualization helps a lot.
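One way to sanity-check the table is to run all four forms against the same string. A small Python sketch, where the sample text and the `starts` helper are made up for illustration:

```python
import re

text = "ab cb ac"


def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


print(starts(r"(?<=a)b"))  # positive look-behind: "b" preceded by "a"     -> [1]
print(starts(r"(?<!a)b"))  # negative look-behind: "b" not preceded by "a" -> [4]
print(starts(r"a(?=b)"))   # positive look-ahead:  "a" followed by "b"     -> [0]
print(starts(r"a(?!b)"))   # negative look-ahead:  "a" not followed by "b" -> [6]
```

In every case only the single letter is consumed; the look-around part just checks the neighbour.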

Now, if I can find a similar mnemonic for backreferences !?

wahern6y ago

glangdale6y ago

ygra6y ago

It's something I really like about .NET's regular expressions. Lookbehind has no limitations and will just match backwards with all features you can use in other parts.

So depending on the language or flavor you're working in, running away isn't really necessary.

lonelappde6y ago

Lookbehinds stay behind.

geongeorgekOP6y ago

This is really intended for beginners. but I can confirm more content is coming soon <3

superasn6y ago· 4 in thread

    "(this is inside a bracket (and this is nested or (double nested)))

P.S. I know token parsing is better for these things but still I just want to learn the other thing too.

gizmo6866y ago

Balanced parentheses are not a regular language, so it's theoretically impossible to match them with regular expressions.

In practice, most regexp implementations you see are more powerful than regular expressions. For instance, .NET has a balancing groups feature [0] for exactly this use case.

[0] https://regular-expressions.mobi/balancing.html?wlr=1
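For engines without such extensions (Python's built-in `re` has neither balancing groups nor recursion), the usual fallback is a small scanner that tracks open parentheses. A minimal sketch, with `nested_groups` being a hypothetical helper name:

```python
def nested_groups(text):
    """Return the contents of every balanced (...) group,
    ordered by the position of its opening parenthesis."""
    groups = []
    stack = []  # indices of the "(" characters that are still open
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            start = stack.pop()
            groups.append((start, text[start + 1:i]))
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return [body for _, body in sorted(groups)]


sample = "(this is inside a bracket (and this is nested or (double nested)))"
for body in nested_groups(sample):
    print(body)
```

The stack is exactly the unbounded memory that a true regular expression lacks, which is why the pure-regex version needs an engine extension.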

superasn6y ago

The regex I've copy-pasted is this:

    $str = "(this is inside a bracket (and this is nested or (double nested)))";
    do {
        preg_match_all('~\(((?:[^\(\)]++|(?R))*)\)~', $str, $matches);
        echo $str = $matches[1][0] ?? '', "\n";
    } while($str);

Outputs this [1]:

    > this is inside a bracket (and this is nested or (double nested))
    > and this is nested or (double nested)
    > double nested

You're right that there is more processing involved (e.g. while loop) but I still don't understand this part

    '~\(((?:[^\(\)]++|(?R))*)\)~'

[1] https://rextester.com/MEH86820

chirss6y ago

Can you explain the problem further?

superasn6y ago

please see my reply to @gizmo686

__tk__6y ago· 3 in thread

noxToken6y ago

When green devs are having trouble with regular expressions (and don't have a formal computer science background), I like to give them a crash course in DFAs.
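As a rough idea of what such a crash course can look like, here is a hand-written DFA for the pattern `ab*c` in Python. The pattern, the state names and the `accepts` function are all chosen for illustration:

```python
# Transition table for a DFA that recognises the regex ab*c.
# A missing (state, character) pair means the input is rejected.
TRANSITIONS = {
    ("start", "a"): "middle",
    ("middle", "b"): "middle",
    ("middle", "c"): "done",
}
ACCEPTING = {"done"}


def accepts(text):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


for candidate in ("ac", "abbbc", "ab", "abca"):
    print(candidate, accepts(candidate))
```

Each character is looked at exactly once and the only memory is the current state, which is the whole point of the exercise.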

geongeorgekOP6y ago

I can't take credit for the visualizations although implementing it was a pain in the ass. It was originally created by: https://regexper.com/

leibnitz276y ago

I knocked up a silly dynamic regex grapher a while back as a little teaching aid - mildly fun

https://www.benf.org/other/regexview/

philshem6y ago· 3 in thread

I have a secret hobby of answering python + regex questions on stackoverflow with pure python.

geongeorgekOP6y ago

I'm gonna pretend I didn't read this

johnnylambada6y ago

Examples?

philshem6y ago

_secret_

Glench6y ago· 3 in thread

Plug for Verbal Expressions (no affiliation), which has an alternate way of compiling more human-readable regexes for a dozen languages: http://verbalexpressions.github.io/

linusjs_6y ago

certifiedloud6y ago

A CLI version of this would be pretty useful to me.

geongeorgekOP6y ago

This looks nice

Amarok6y ago· 3 in thread

^[a-z0-9_-]{3,15}$

The username reference doesn't match 16 characters as claimed

geongeorgekOP6y ago

It should match. The number 15 there means repeat x up to 15 times, so 1+15=16.

looks good to me

aratauto6y ago

asicsp6y ago

where does the extra 1 come from? a{2,5} means match 'a' two to five times
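This is quick to check. A small Python sketch using the username pattern from the top of this thread (the test strings are just runs of "a"):

```python
import re

username = re.compile(r"^[a-z0-9_-]{3,15}$")

# {3,15} means "between 3 and 15 repetitions", both bounds inclusive.
for length in (2, 3, 15, 16):
    print(length, bool(username.match("a" * length)))
# 2 False
# 3 True
# 15 True
# 16 False
```

So 15 characters is the longest username the pattern accepts; a 16-character one fails.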

StavrosK6y ago· 2 in thread

geongeorgekOP6y ago

I'm glad you like the tool <3 It will have a lot more content soon :)

chirss6y ago

If you want some help swing by #regex on efnet, happy to help.

dana3216y ago· 2 in thread

One thing i've always missed from the Perl programming language is the regex operators.

You could do:

  my $var = 'foo foo bar and more bar foo!!!';
  if ($var =~ /(foo|bar)/g) {  # does the variable contain foo or bar?
    print "foo! $1 removing foo..\n";
    # remove our value..
    $var =~ s/$1//g;
  }
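For comparison, a rough Python equivalent using only the standard `re` module. It is a sketch of the same steps rather than a line-for-line port; the `found` variable and the `re.escape` call are my additions:

```python
import re

var = "foo foo bar and more bar foo!!!"

match = re.search(r"(foo|bar)", var)  # does the variable contain foo or bar?
if match:
    found = match.group(1)
    print(f"foo! {found} removing foo..")
    # Remove every occurrence of the captured value. re.escape guards
    # against the captured text containing regex metacharacters.
    var = re.sub(re.escape(found), "", var)

print(repr(var))  # '  bar and more bar !!!'
```

Without the `=~` and `s///` operators, the match object and the substitution both have to be spelled out as function calls.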

radiac6y ago

So did I: https://github.com/radiac/python-perl/

dana3216y ago

Awesome job, i did a hack bootstrapping the tokenizer to do the same thing in php, didn't release it though.

dan_hawkins6y ago· 2 in thread

Is there a bug? In regexp for IPv4: https://ihateregex.io/expr/ip expression ends with {3} but the diagram states "2 times" in lower right - shouldn't it say "3 times"?

jve6y ago

I think it says "repeat 2 times". So basically you've already gone through the group once, and then 2 more times.

Because if I specify x{0,3}, i have 2 paths - around x and thru x + at most 2 more times

geongeorgekOP6y ago

Yep you are right

axegon6y ago· 2 in thread

This is awesome but.... I don't hate regex. Matter of fact, I love regex.

geongeorgekOP6y ago

check out the ipv6 one :)

axegon6y ago

¯\_(ツ)_/¯

lfglopes6y ago· 1 in thread

I used to use this site http://txt2re.com which is now off the grid, at least since yesterday. :(

Unlike most regex helpers, in this one you would start with the text you want to filter/parse and then it would suggest you possible extractions.

Do you know any alternatives?

deadliftpro6y ago

same, looking for an alternative to txt2re.

rubyn00bie6y ago· 1 in thread

Nice work on this!

Something subtle that I quite loved: the email regex is, IMHO, close to perfect: \S+@\S+\.\S+

p4lindromica6y ago

Even this regexp has false negatives.

The `ai` ccTLD ran their own mail server at the root, so an address like `a@ai` was a valid email address.

They serve a website at the tld root: http://ai./
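The `a@ai` case is easy to reproduce. A minimal Python sketch, where the `example.com` addresses are made-up samples:

```python
import re

simple_email = re.compile(r"\S+@\S+\.\S+")

samples = [
    "johndoe@example.com",
    "user+tag@example.com",
    "a@ai",           # no dot after the "@", so the pattern rejects it
    "not an email",
]
results = {s: bool(simple_email.fullmatch(s)) for s in samples}

for address, ok in results.items():
    print(f"{address!r}: {ok}")
```

The first two pass, including the plus-addressed one; `a@ai` and the plain sentence do not.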

vzidex6y ago· 1 in thread

geongeorgekOP6y ago

Thank you for sharing that. looks good

adambowles6y ago· 1 in thread

>/h.llo/ the '.' matches any one character other than a new line character... matches 'hello', 'hallo' but not 'h llo'

in the cheatsheet is false. (https://regexr.com/4tc48)

`.` can match any character except linebreaks (including whitespace)

jodrellblank6y ago

`.` "can" match any character including linebreaks if the regex engine is in re.DOTALL mode (Python) or SingleLine Mode (.Net).
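Both points can be checked in a few lines of Python. A small sketch using the `h.llo` example from the cheatsheet (the variable names are mine):

```python
import re

# By default "." matches a space, but not a newline.
space_match = bool(re.fullmatch(r"h.llo", "h llo"))
newline_default = bool(re.fullmatch(r"h.llo", "h\nllo"))

# With re.DOTALL, "." matches a newline as well.
newline_dotall = bool(re.fullmatch(r"h.llo", "h\nllo", re.DOTALL))

print(space_match, newline_default, newline_dotall)  # True False True
```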

asicsp6y ago· 1 in thread

neat site! clicking an example opens up a playground with live update and explanation and railroad diagrams, similar to sites like regex101[1] and regulex[2]

[1] https://regex101.com/

[2] https://jex.im/regulex

[3] https://learnbyexample.github.io/cheatsheet/javascript/javas...

geongeorgekOP6y ago

Totally agreed! Right now I only support javascript. But for everything shown there, it's pretty much the same for most flavors

mimixco6y ago· 1 in thread

This is awesome! Thank you! I hate regex, too, but I love your inline railroad diagramming tool.

geongeorgekOP6y ago

Haha thank you <3

Diti6y ago· 1 in thread

For the love of god, PLEASE DON’T USE REGEX TO VALIDATE EMAIL. The RegEx of this website ignores plus-addressing, for example. All you need to do to validate email is send a verification email.

xiconfjs6y ago

It's not just the email regex that is simplified (and ultimately plain wrong). The one for phone numbers is also highly simplified and will not match all valid phone numbers...

binarysneaker6y ago· 1 in thread

These regexs are garbage. Others have suggested better sites for learning how to construct regexs, and stackoverflow has plenty of great examples.

geongeorgekOP6y ago

Why don't you link them with the comment

olalonde6y ago· 1 in thread

Thumbs up for the relatable domain name.

geongeorgekOP6y ago

Glad you find it that way

ape46y ago· 1 in thread

The IPv6 regex is surprisingly complicated.

geongeorgekOP6y ago

Yeah. this is when you start to have 2 problems

geongeorgekOP6y ago

I used to spend hours trying to craft the perfect expression for my scraping projects not realizing that I don't really know regex.

This tool is a cheat sheet that also explains the commonly used expressions so that you understand it.

- There is a visual representation of the regular expression (thanks to regexper)

- The application shows matching strings which you can play around with

- Expressions can be edited and these are instantly validated

xxsaculxx6y ago

Nice tool! I personally use https://regex101.com/ as I like the explanations and quick reference.

sylvanaar6y ago

Nothing will ever beat RegexBuddy when it comes to Regex tools. It is an entire IDE just for regex, and has been my not-so-secret weapon for a decade or more.

KenanSulayman6y ago

I don't understand why the Github repository lists regexper as the source of the visual graph code but the frame only shows iHateRegex as watermark?

If the only thing that is embedded in that frame was taken entirely from a different project, that project should at least be mentioned in the frame.

hyperpape6y ago

Really nice idea.

mNovak6y ago

I always refer back to http://rexegg.com/ Not a tool as such, but a good reference if you know how it works and just need to refresh on syntax.

kazinator6y ago

There is no way I would just plop that IPv6 regex into any serious program. :)

chenster6y ago

For email specific regular expression, it's all covered on https://emailregex.com

esaym6y ago

hamid_ra6y ago

love the idea! I would crowdsource it so people can add their regex and vote on other people's regexes!

samat6y ago

This is very neat, thank you!

shawnyou6y ago

Good tool
