8 Regular Expressions You Should Know

Aug 10th in Other by Vasili

Regular expressions are a language of their own. When you learn a new programming language, they're this little sub-language that makes no sense at first glance. Many times you have to read another tutorial, article, or book just to understand the "simple" pattern described. Today, we'll review eight regular expressions that you should know for your next coding project.

Author: Vasili

Background Info on Regular Expressions

This is what Wikipedia has to say about them:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Now, that doesn't really tell me much about the actual patterns. The regexes I'll be going over today contain characters such as \w, \s, \1, and many others that represent something totally different from what they look like.

If you'd like to learn a little about regular expressions before you continue reading this article, I'd suggest watching the Regular Expressions for Dummies screencast series.

The eight regular expressions we'll be going over today will allow you to match a(n): username, password, hex value (like #fff or #000), slug, email, URL, IP address, and an HTML tag. As the list goes down, the regular expressions get more and more confusing. The pictures for each regex in the beginning are easy to follow, but the last four are more easily understood by reading the explanation.

The key thing to remember about regular expressions is that they are almost read forwards and backwards at the same time. This sentence will make more sense when we talk about matching HTML tags.

Note: The delimiters used in the regular expressions are forward slashes, "/". Each pattern begins and ends with a delimiter. If a forward slash appears in a regex, we must escape it with a backslash: "\/".
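
The delimiters belong to the host language rather than to the pattern itself. As a rough sketch, assuming Python's standard re module (a language this article does not otherwise use), the same kind of pattern is written as a raw string without the surrounding slashes. There the forward slash is an ordinary character, so the escaped and unescaped forms behave the same:

```python
import re

# The pattern "/^https?:\/\//" from a delimiter-based language, written two ways.
escaped = re.compile(r"^https?:\/\/")  # escaping "/" is allowed but unnecessary
plain = re.compile(r"^https?://")      # no delimiters, so no escaping needed

for url in ["http://example.com", "https://example.com", "ftp://example.com"]:
    # Both patterns agree on every input.
    print(url, bool(escaped.match(url)), bool(plain.match(url)))
```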

Matching a Username

Pattern:

/^[a-z0-9_-]{3,16}$/

Description:

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).

String that matches:

my-us3r_n4m3

String that doesn't match:

th1s1s-wayt00_l0ngt0beausername (too long)
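
Here is a minimal sketch of the username check, assuming Python's re module. The helper name is_valid_username is made up for illustration and is not part of the original article.

```python
import re

# The article's username pattern, without the "/" delimiters.
USERNAME = re.compile(r"^[a-z0-9_-]{3,16}$")

def is_valid_username(candidate: str) -> bool:
    """Return True when the string is 3 to 16 allowed characters."""
    return USERNAME.match(candidate) is not None

print(is_valid_username("my-us3r_n4m3"))                     # True
print(is_valid_username("th1s1s-wayt00_l0ngt0beausername"))  # False (31 characters)
print(is_valid_username("My-Name"))                          # False (uppercase letters)
```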

Matching a Password

Pattern:

/^[a-z0-9_-]{6,18}$/

Description:

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).

String that matches:

myp4ssw0rd

String that doesn't match:

mypa$$w0rd (contains a dollar sign)
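
To see where the {6,18} quantifier draws the line, here is a small sketch (again assuming Python's re module) that tries strings just inside and just outside the length limits, plus the two examples above:

```python
import re

# The article's password pattern, without the "/" delimiters.
PASSWORD = re.compile(r"^[a-z0-9_-]{6,18}$")

candidates = [
    "abc12",       # 5 characters: one too few
    "abc123",      # 6 characters: the minimum
    "a" * 18,      # 18 characters: the maximum
    "a" * 19,      # 19 characters: one too many
    "myp4ssw0rd",  # the matching example
    "mypa$$w0rd",  # contains dollar signs
]

for candidate in candidates:
    print(repr(candidate), bool(PASSWORD.match(candidate)))
```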

Matching a Hex Value

Pattern:

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case a number sign) is optional, but to be "greedy" and capture it if it's there. Next, inside the first group (first group of parentheses), we can have two different situations. The first is any lowercase letter between a and f or a number, six times. The vertical bar tells us that we can also have three lowercase letters between a and f or numbers instead. Finally, we want the end of the string ($).

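The reason that I put the six-character alternative first is so that the parser will capture a full hex value like #ffffff. If I had reversed it so that the three characters came first, and there were no end-of-string anchor, the parser would only pick up #fff and not the other three f's. With the $ in place the parser backtracks and tries the second alternative, so the order no longer changes which strings match, but putting the longer alternative first is still a good habit.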

String that matches:

#a3c113

String that doesn't match:

#4d82h4 (contains the letter h)
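
The effect of the alternation order is easiest to see by experiment. This is a sketch assuming Python's re module; the two unanchored patterns (three_first and six_first) are variations made up for the comparison, not patterns from the article.

```python
import re

# The article's hex pattern, without the "/" delimiters.
HEX = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

print(bool(HEX.match("#a3c113")))  # True
print(bool(HEX.match("#4d82h4")))  # False: "h" is outside a-f

# Without the trailing "$", the first alternative that fits wins.
three_first = re.compile(r"^#?([a-f0-9]{3}|[a-f0-9]{6})")
six_first = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})")

print(three_first.match("#ffffff").group(1))  # fff
print(six_first.match("#ffffff").group(1))    # ffffff
```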

Matching a Slug

Pattern:

/^[a-z0-9-]+$/

Description:

You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) lowercase letters, numbers, or hyphens. Finally, we want the end of the string ($).

String that matches:

my-title-here

String that doesn't match:

my_title_here (contains underscores)
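
A quick sketch of the slug check, assuming Python's re module. It also shows one thing the pattern does not guard against: a string made only of hyphens still passes.

```python
import re

# The article's slug pattern, without the "/" delimiters.
SLUG = re.compile(r"^[a-z0-9-]+$")

print(bool(SLUG.match("my-title-here")))  # True
print(bool(SLUG.match("my_title_here")))  # False: underscores are not allowed
print(bool(SLUG.match("")))               # False: "+" needs at least one character
print(bool(SLUG.match("---")))            # True: nothing requires a letter or digit
```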

Matching an Email

Pattern:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name, which must be one or more lowercase letters, numbers, dots, or hyphens. Then another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).

String that matches:

john@doe.com

String that doesn't match:

john@doe.something (TLD is too long)
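
The three pairs of parentheses are capturing groups, so a match also hands back the pieces. The sketch below assumes Python's re module; the plus-addressed sample is one readers raised in the comments, included to show a limit of the pattern.

```python
import re

# The article's email pattern, without the "/" delimiters.
EMAIL = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

match = EMAIL.match("john@doe.com")
print(match.groups())  # ('john', 'doe', 'com')

print(EMAIL.match("john@doe.something"))   # None: "something" is longer than 6
print(EMAIL.match("dan+tag@example.org"))  # None: "+" is not in the first group
```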

Matching a URL

Pattern:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Description:

This regex is almost like taking the ending part of the above regex, slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with "http://", "https://", or neither of them. I have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens followed by another dot then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. Pretty much this allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark were used there, only one file/directory could be matched.

Then a trailing slash is matched, but it can be optional. Finally we end with the end of the line.

String that matches:

http://net.tutsplus.com/about

String that doesn't match:

http://google.com/some/file!.html (contains an exclamation point)
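
Here is a sketch of how the first three groups split up a URL, assuming Python's re module. Because the domain group has to leave a dot and an extension for the rest of the pattern, it ends up holding everything before the last dot.

```python
import re

# The article's URL pattern, without the "/" delimiters.
URL = re.compile(r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$")

match = URL.match("http://net.tutsplus.com/about")
print(match.group(1))  # http://
print(match.group(2))  # net.tutsplus
print(match.group(3))  # com

# The scheme is optional, so group 1 is None when it is absent.
print(URL.match("google.com").group(1))  # None

print(URL.match("http://google.com/some/file!.html"))  # None: "!" is not allowed
```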

Matching an IP Address

Pattern:

/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/

Description:

Now, I'm not going to lie, I didn't write this regex; I got it from here. Now, that doesn't mean that I can't rip it apart character for character.

The first capture group really isn't a captured group because ?: was placed inside, which tells the parser to not capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times (the {3} at the end of the group). This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup then a dot to move on.

The subgroup is also another non-capture group. It's just a bunch of character sets (things inside brackets): the string "25" followed by a number between 0 and 5; or the string "2" and a number between 0 and 4 and any number; or an optional zero or one followed by two numbers, with the second being optional.

After we match three of those, it's onto the next non-capturing group. This one wants: the string "25" followed by a number between 0 and 5; or the string "2" with a number between 0 and 4 and another number at the end; or an optional zero or one followed by two numbers, with the second being optional.

We end this confusing regex with the end of the string.

String that matches:

73.60.124.136 (no, that is not my IP address :P)

String that doesn't match:

256.60.124.136 (the first group must be "25" and a number between zero and five)
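
The structure described above (one subgroup for a number from 0 to 255, used four times with dots in between) is easier to see when the pattern is assembled from its parts. This is a sketch assuming Python's re module; the names OCTET and IPV4 are made up for illustration.

```python
import re

# One number from 0 to 255: 250-255, or 200-249, or 0-199.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three "octet plus dot" repetitions, then a final octet.
# The doubled braces produce a literal {3} inside the f-string.
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

print(bool(IPV4.match("73.60.124.136")))   # True
print(bool(IPV4.match("256.60.124.136")))  # False: 256 is out of range
print(bool(IPV4.match("73.60.124")))       # False: only three numbers
```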

Matching an HTML Tag

Pattern:

/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Description:

One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usual, we begin with the start of the line.

First comes the tag's name. It must be one or more letters long. This is the first capture group; it comes in handy when we have to grab the closing tag. Next come the tag's attributes. This is any character but a less than sign (<). Since this is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star says as many attributes as you want.

Next comes the third group, which is non-capturing. Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters, and the closing tag. \1 is used, which represents the content that was captured in the first capturing group. In this case it was the tag's name. Now, if that couldn't be matched we want to look for a self closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by "/>".

The regex is ended with the end of the line.

String that matches:

<a href="http://net.tutsplus.com/">Nettuts+</a>

String that doesn't match:

<img src="img.jpg" alt="My <image>" /> (attributes can't contain less than signs)
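
To watch the backreference at work, here is a sketch assuming Python's re module. The closing tag only matches when it repeats whatever the first group captured, and the self closing form needs at least one space before "/>".

```python
import re

# The article's HTML tag pattern, without the "/" delimiters.
TAG = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$")

match = TAG.match('<a href="http://net.tutsplus.com/">Nettuts+</a>')
print(match.group(1))  # a         (the tag's name)
print(match.group(3))  # Nettuts+  (the content between the tags)

print(TAG.match("<a>text</b>"))   # None: the closing tag must repeat group 1
print(bool(TAG.match("<br />")))  # True: the self closing alternative
print(bool(TAG.match("<br/>")))   # False: no space before the slash
```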

Conclusion

I hope that you have grasped the ideas behind regular expressions a little bit better. Hopefully you'll be using these regexes in future projects! Many times you won't need to decipher a regex character by character, but sometimes if you do this it helps you learn. Just remember, don't be afraid of regular expressions, they might not seem it, but they make your life a lot easier. Just try and pull out a tag's name from a string without regular expressions! ;)

User Comments

  1.

    Pierre August 10th

    Awesome post.

  2.

    Chad Hietala August 10th

    great reference.

  3.

    Alexander Högberg August 10th

    Thanks a bunch! I finally got a real grip of this now, great reference pictures.

  4.

    Jeff Adams August 10th

    Great tutorial, simple yet for folks like me who are designers SLASH developers but not one or the other this is great.

    particularly liked the images with these, they explain things with example – top tut!

  5.

    mitchell bryson August 10th

    Matching a Hex Value – graphic doesn’t match expression

  6.

    Simon August 10th

    Regex: something I know I should know but I know I don’t know well enough.

    This is a great post, bookmarked, thank you.

    Also, thanks for the link to the screencast.

    1.

      Simon August 10th

      Also, sorry to be clogging this up, just a question to the general public. I’ve used rubular before to test my regular expressions with ruby, but is there something like it – ease of use, etc – for other languages (I’m thinking php and javascript here).

      1.

        Way August 10th

        For sure there are a lot of regex testers out there.
        An easy-to-use one is the following:
        http://www.supercrumbly.com/assets/html/phpregextester/

        And for everybody: here is a really, really helpful tool for regex:
        http://www.txt2re.com/

  7.

    ru83n August 10th

    What about IPv6?

  8.

    MGK August 10th

    Hello,

    nice idea to do such a tuts :)

    however, I have some things to add

    concerning the email regex, and the url regex, I personnaly use a php function : filter_var.

    The result for an email would be

    filter_var($email_to_verify, FILTER_VALIDATE_EMAIL)

    it returns true or false.

    for an url, it would be

    filter_var($websiteurl_to_verify, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED)

    the FILTER_FLAG_SCHEME_REQUIRED is optional if you need to force the http/https/…

    concerning the password, I wanted to get a password of 6 to 15 characters, with letters and numbers ONLY, and with at least 1 number, 1 lowercase letter, AND 1 uppercase letter..

    so the regex I made is :

    $pattern_pass = "/(?=^.{6,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*[\W_\x7B-\xFF]).*$/";

    all other characters are denied of course.

    finally, as for Names/Surnames, (not usernames), I wanted something to allow letters, space and accents only (Note, for the accents and the space, it is because I’m french and we have names such as De Lamberté for example), with 2 to 30 characters

    so I made something like this :

    $pattern_name = "/^([\p{L}\s]{2,30})$/";

    voila ! I hope it was as useful as your tuts, and if anybody can verify my regex because I would like to have a “foreign eye” to catch any error I’ve made that I cannot see !

    Regards :)

    1.

      Lane August 10th

      filter_var("a@a", FILTER_VALIDATE_EMAIL);

      That validates. I’m not sure if it’s intended, but I wouldn’t wanna validate an e-mail without a TLD.

      1.

        Peter August 11th

        Just like the regular expression from this article would happily allow an email address of the form “.@….” — I wouldn’t wanna validate an e-mail with only punctuation!

      2.

        Steve August 12th

        If you want to get really strict with e-mails:

        ^(?:[a-zA-Z0-9_'^&/+-])+(?:\.(?:[a-zA-Z0-9_'^&/+-])+)*@(?:(?:\[?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.){3}(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]?)|(?:[a-zA-Z0-9-]+\.)+(?:[a-zA-Z]){2,}\.?)$

        Or try regexlib.com

  9.

    Erik August 10th

    Great reference! However, at first glance I don’t see why the hex value ‘#4d82h4′ wouldn’t match. It’s indeed not a valid hex value due to the ‘h’, but it should match the expression displayed because it matches the entire alphabet (in lowercase).

    1.

      MGK August 10th

      there is an error on the picture indeed, but the correct pattern which is

      /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

      is written just below the picture…

      1.

        Peter August 11th

        An arguably improved pattern would be:

        /^#?[0-9a-f]{3}(?:[0-9a-f]{3})?$/iD

        1. Much neater (IMO) specification of optional 4th-6th hex digit
        2. D modifier to anchor to actual end of string
        3. i modifier to allow upper- and lower-case hex digits
        4. No need for the capturing group around the hex digits (which in the article should at least be non-capturing)

    2.

      Gallery Savant August 10th

      I was initially confused by this as well but if you look closely, the image representation says ‘a-z’ while the pattern given in text is ‘a-f’. So image appears to be wrong and should also read ‘a-f’.

  10.

    Andrew August 10th

    This is really helpful; will definitely come in handy!

  11.

    Emil August 10th

    Unfortunately, many of the regexps here shouldn’t be used.

    Password one is just plain dumb. Why would you actually want to limit the security to a lower level? Let the user use all kinds of special characters – hell, let them use Chinese!

    The e-mail regexp is just plain wrong in many ways. For instance, a dot can not follow another dot, nor can an email start or end with a dot. An e-mail address can also contain all kinds of different characters, way more than the ones allowed in this regexp. You’re free to use an exclamation mark, hash sign or even a tilde if you want to. 1+1=2@host.domain IS a valid e-mail address. The host can also be an IP-number (including an IP6-number)… and the list goes on… Check the full RFC here: http://tools.ietf.org/html/rfc5322

    Same with the URL regexp. It doesn’t follow the standard at all. What happened to IP numbers?

    1.

      Torkild Dyvik Olsen August 10th

      Good someone else noticed this!

      On the other side, it’s a good tutorial to get an idea of how regexp works, and adapt it to other needs.

      But it should be noted that regexp should only be used when absolutely necessary, as it’s slow and there is internal functions for much of this in most programming languages.

    2.

      jford August 10th

      absolutely. the limit on six characters after the domain means that .com.au would not be accepted, a perfectly legal TLD.

      1.

        Emil August 13th

        Actually, au is the TLD. But what about Internationalized domain names, like ten characters of Cyrillic?

        Don’t know if Tutsplus will print this in the right way, but this domain for instance: http://пример.испытание/

    3.

      Hendrik August 11th

      “my name\@”@host.domain would be a valid e-mail address as well if following the IETF standard. It’s up to you how much effort you put in validating, though. It has been quite standard to only allow e-mails as presented here (e.g. in phpbb and other bulletin board systems). However, I would recommend including the + character as well. It might be used in the name part of an e-mail address to do automatic sorting of e-mails on an imap server for example.

    4.

      Henrik August 13th

      I’m glad to see that someone actually mentioned that it’s a stupid thing to “check” a password like this example.
      The biggest bummer in this example is that it doesn’t even match uppercase letters, making the password magnitudes less effective.

  12.

    Dustin August 10th

    Great tutorial, there is a whole slew of uses for regular expressions. A nice reference cheat sheet can be found at:

    http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

    Has been a tremendous help in some circumstances for me.

    1.

      James Hogan August 10th

      I do love a good cheat sheet, Ive jQuery 1.3 as my laptop background for yonks now,

  13.

    Muhammad Adnan August 10th

    Good job vasilli.

  14.

    Daniel Schmitz August 10th

    COOL

  15.

    Jeff Deibel August 10th

    Wow, excellent article and visual detail.

  16.

    Deoxys August 10th

    Probably the best explanation of RegEx I’ve ever seen, well done!

  17.

    Lew A August 10th

    Your “Matching a Hex Value” image is messed up, it has a-z not a-f. Also, you may want to change the title to “Matching an HTML Color Hex Value”.

    F is a perfectly valid Hex value, but wouldn’t match the 6|3 character requirement in your regexp :) .

  18.

    Clinton Montague August 10th

    You made REGEX look easy! A great post for beginners and reference for the rest of us!

  19.

    Chris Pratt August 10th

    My two biggest pet peeves in the world of regex got honorable mention here.

    1) I’ve never understood why people restrict password characters. Usernames and such are used for other purposes than plain authentication and understandably need to be constrained to certain degrees, but passwords have no limitations other than those arbitrarily imposed upon them by developers. I use passwords that contain symbolic characters (SHIFT+Number Keys) for greater password security, and there is nothing more annoying than being handed an error message saying that I’m only allowed to use a-z and 0-9 for a password of all things.

    2) Granted, emails are hard things to match properly. Some of the best email matching regular expressions are insanely long, but you should at least allow for any valid email address, even at the cost of allowing some fake ones past the guardpost. Namely, most email fields will error out with plus-addressing, where you can tack a tag onto your email address (after the username but before the @, preceded by a +). This is fantastic for finding the source of spam, as you can tag different sites and see which one fed your email address based on what address the spam gets sent to.

  20.

    -S August 10th

    The email regexp is a little too simple. I would recommend you check Cal Henderson (of Flickr fame)’s RFC822 Email Parser:
    http://www.iamcal.com/publish/articles/php/parsing_email

    Same for the URL. Check this one instead:
    http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url/190405#190405

  21.

    argg August 10th

    Great stuff, very helpful!

  22.

    Dan Langille August 10th

    It appears as if your Matching an Email regex does not allow emails of the form dan+tag@example.org

  23.

    Jason August 10th

    Great article! I’m always struggling with regular expressions and have to look up code each time I need one. I’ve saved this one for future reference.

  24.

    Hank Scorpio August 10th

    these comments are so random. scary.

    Image for “Matching a Hex Value” is wrong. It says a-z0-9 instead of a-f0-9

  25.

    David Moreen August 10th

    Very good references to have.

  26.

    Jay Kimble August 10th

    Concerning email regex.. it’s broken… please read the RFCs or at least read this –> http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx

    Here’s an excerpt:
    These are all valid email addresses!

    "Abc\@def"@example.com
    "Fred Bloggs"@example.com
    "Joe\\Blow"@example.com
    "Abc@def"@example.com
    customer/department=shipping@example.com
    $A12345@example.com
    !def!xyz%abc@example.com
    _somename@example.com

    I tried
    myemail++someTag@gmail.com
    and it didn’t validate. Just so you understand, gmail has an auto tagging feature where if I give you an email like the above I can instantly move the email to a folder/category. When I see posts like this it really angers me because yet again developers are being misinformed, and then they write email validators that are too harsh on validating!

    1.

      someone August 10th

      Internet is full of crap! (No offense), And developers who use these regex are not doing a favor to them selfs, to begin with. Would have been tons more useful a simple guide to regex, using your nice graphs, and elaborate how complicate & elegant is to match an email full email with a regex.

  27.

    James August 10th

    Regexing HTML is very dangerous. HTML has a recursive structure and is essentially impossible to capture with regular expressions (even recursive expressions). Your regular expression may work in some cases but it’d be foolish to implement it into a live project; edge cases DO matter. Since HTML cannot be *reliably* scrutinized with regexes the only way left is to the parse the HTML (manually)… or to use a pre-made parser.

    All in all, this was a very well put-together post; thank you Vasili!

    1.

      Vasili August 10th

      Yup, I had thought about that before hand, but thought that I should include it anyway. When I was trying to grab the tags I ended up breaking the program I was using to test the expressions because it was like an infinite loop!

      Thanks for the comment. :)

  28.

    crysfel August 10th

    Regular expressions rules!! you can do a lot of thing with this :D

  29.

    sean August 10th

    The “+” sign is valid in the first part of an email address.

  30.

    Sebastiaan August 10th

    This is indeed wonderful, the illustrations make it much more clear!

    I am, however, a little miffed at all the sites telling me (in caps) to “USE A SECURE PASSWORD” and then I am not allowed to use: !@#$^%&*()

    So, if you could, please update the password regex, for the sake of security? Thanks so much.

  31.

    Jeffrey August 10th

    Your email validating regexp is incorrect. Here’s a painfully thorough article that provides a better one. I consider this to be canonical since it follows RFC822 closely:

    http://www.iamcal.com/publish/articles/php/parsing_email/

  32.

    Vasili August 10th

    Thanks for the comments everyone, I thought that this would confuse you even more! I’m glad you found the pictures helpful. :)

    As for the hex value, yes the image is incorrect, I will fix that and email it to Jeffrey so he can update the post. For now, the text version is correct! :)

    1.

      zly August 14th

      how to make these pictures?so nice ~

  33.

    Ozh August 10th

    wrong regexp for matching an email.

    joe+cool@site.com is a perfectly valid email address, and way too many sites on the internet are not allowing the + sign in emails.

    1.

      MGK August 10th

      it’s for this kind of reason (always having to get ALL the possibilities) that I use the filter_var function that I pointed out previously

      1.

        waveclaw August 10th

        Good introduction and the highlighting is very nice. As has been said: a programmer that doesn’t know regular expressions is like a carpenter that doesn’t know how to use a hammer.

        Personally, I’d have left the email topic alone. There is another old adage: not everyone is running Outlook on Windows with IE6 from a user@company.com address.

        While some people think of the world wide web as the Internet, it’s really just a small part. email has been around since the beginning of the Internet and is probably the biggest network of networks that lives on it (interconnected by the Internet so to pun.) Due to this age and complexity, the addressing scheme is so complicated a sane person should not try to home cook an email detector.

        Just look at the RFC compliant expression at http://www.regular-expressions.info/email.html

        And that site doesn’t even consider Usenet’s special email addresses. (Usenet is the global email bulletin board – another network of networks – you probably have accessed through Google Groups.)

        But still, a good start for people unfamiliar with the Programmer’s Hammer.

    2.

      phil August 10th

      plenty of people have apostrophes in their name as well. the logic used to verify the domain has issues as well. for example it fails for anyone who owns a .museum tld.

    3.

      Shahways Romani August 10th

      Please note in addition that even john.doe@73.60.124.136 (no, that is not my IP address neither :P ) is a valid email address

  34.

    Sebastian Bratu August 10th

    Really useful tutorial !

  35.

    Max August 10th

    Very helpful! Thank you for writing this up

  36.

    Rick Hambrook August 10th

    I only read the first two then lost interest… the first one still allows usernames like “—” etc.

    And I hate it oh so much when websites restrict me to weak passwords that can’t be too long or contain special characters. Even my bank does this.

    Kudos for trying though.

  37.

    Jason August 10th

    The email regex should be removed entirely. Sites like this provide an invaluable resource for beginners for most development issues. However, with respect to regexes that generally end up just copy/pasted, you really shouldn’t be posting something that isn’t completely fleshed out.
    The email regex has already prevented everyone with a gmail address from signing in with a ‘+’ sign. You’ve kicked out half of the world for those with country domains (@example.co.uk) as you only match one dot after the @. Not to mention the legal email addresses that use IP addresses (email@21.192.56.78). Further, the regex completely ignores quoted values.

    1.

      Anonymouse August 11th

      I agree completely with Jason. Please do not encourage bad regex usage. Read the RFC for what’s allowed as an email address before writing something like this. At the very least warn your users not to copy and paste from here!

  38.

    Vasili August 10th

    Regarding the email regex:

    I knew it would be almost impossible to write a regex (with my knowledge) that doesn’t take years to write. You can use filter_var, but seeing as how that’s only available in PHP5.2 and up, I thought it would be nice to show how that function worked (to some extent).

    1.

      Jeffrey August 10th

      It’s not impossible. Just use the regex that Cal Henderson wrote that I linked to above. It’s complex, but it has the benefit of having been written to the RFC standard for email addresses.

  39.

    Justin August 10th

    although i know all these by now, still a great reference for beginners. nice post

  40.

    dreake August 10th

    Hy,
    This is my URL validation regex:
    ‘/^(ftp|http|https|gopher|mailto|news|nntp|telnet|wais|file|prospero|aim|webcal|www){1}(\.|:\/\/).{2,}\..{2,6}$/’

  41.

    Ruben August 10th

    I just keep wondering why people prevent special characters in passwords. This only reduces the searchspace for brute force password cracking. And really that saves a huge amount of time if you only have to take a look at a basic alphabet.

    I would really suggest that developers not prevent users from making passwords that are more difficult to guess. You could however prevent more simple passwords. For example you might require users to use at least one special character, or you might want them to use at least one number. Do note that requiring more from a password may prevent users from filling it in at all, so don’t require too much if the risks aren’t too high on your site or application.

    1.

      Michael Mokrysz August 11th

      My favourite technique is, after checking 6 <= length 3 characters, then check there’s an uppercase & lowercase char (and for admin panels and sites that need greater security, a number & possibly symbol as well).

      I’m interested if anyone has any major criticism of my approach – always good to have people look for improvements in your code.

  42.

    Adrian Bloomer August 10th

    i use /^[^@:\/]+@[^:@\/]\..{2,4}/ for emails.

    1.

      Adrian Bloomer August 10th

      oops i meant /^[^@:\/\s\?]+@[^:@\/\s\?]\.\d{2,4}$/

      1.

        Jonathan August 10th

        Can you please explain this example? I have used quite a few regular expressions in my day and can’t for the life of me make sense of this one. This still has the domain issue, as well as the IP issue that was brought up earlier.

        It drives me crazy, as someone who has waded through countless message boards and tutorials to learn about regular expressions to see this kind of sloppy code posted without a thought to whether or not it is a worthwhile solution

  43. PG

    Shibi Kannan August 10th

    Very useful article. Just what I was looking for to do some form customizations. Regular expressions are a bit complicated for new folks, but in the long run they save a lot of time, help validate input fields, and also help prevent spam. I like the ongoing discussion on this subject, and it seems there are many ways of doing the same thing with regular expressions.

    ( Reply )
  44. PG

    Lane August 10th

    Please do not manually write e-mail validation regex. It’s more complicated than it seems. And it’s impractical to come up with a single regex call that can do the job flawlessly. Good implementations involve tens of lines of code.

    I recommend using a trusted third party implementation. Even the PHP filter_var function is not so good. It will verify ‘a@a’ as a valid e-mail.

    ( Reply )
    1. PG

      David Jordan August 10th

      a@a can be valid. For example on intranets you can set up mail that is something like: admin@localhost

      So because it is on an intranet there is no TLD so it is a valid email address for that particular intranet

      ( Reply )
  45. PG

    HorusKol August 10th

    Nice patterns, and some are useful…

    Unfortunately, you fell into the email trap:

    /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

    for one thing – you don’t want to include a . in the TLD at the end
    also, some people insist on capitalising personal names in emails
    finally, new ICANN regulations mean that there is no limit on the length of TLDs

    about the only suitable validation for an email is to check that there is an @ and at least one . after it – anything else is a waste of processor cycles, especially as a ‘well-formed’ email address doesn’t guarantee existence…
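
    The deliberately loose check described above (an @, then at least one dot somewhere after it) could be sketched in PHP like this. The function name looksLikeEmail is hypothetical, and the pattern is only one possible reading of the comment:

```php
<?php
// Hypothetical helper: accepts anything of the shape "something@something.something".
// It does not try to follow the RFC; it only rules out obviously malformed input.
function looksLikeEmail(string $address): bool
{
    // [^@\s]+  one or more characters that are neither @ nor whitespace
    // D        makes $ match only at the very end (no trailing newline allowed)
    return preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/D', $address) === 1;
}

var_dump(looksLikeEmail('me+tag@example.com'));     // plus sign is fine
var_dump(looksLikeEmail('Mandela@state.gov.za'));   // capitals and multi-part domains are fine
var_dump(looksLikeEmail('no-at-sign.example.com')); // rejected: no @
var_dump(looksLikeEmail('admin@localhost'));        // rejected: no dot after the @
```

    Note the last case: intranet addresses such as admin@localhost are rejected by this rule, so whether the dot requirement is wanted depends on where the form is used.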

    ( Reply )
  46. PG

    concerned... August 10th

    While being a good tutorial in understanding regular expressions in general, the thought of these being “canned solutions” to regular problems is greatly concerning. As several people have pointed out, the email regex is flat out wrong (doesn’t follow the RFC), and the username doesn’t follow standard UNIX requirements (e.g. user names must start with a lower case letter). Here’s the actual regex filter from /etc/adduser.conf on my Debian system:

    NAME_REGEX="^[a-z][-a-z0-9]*\$"

    ( Reply )
  47. PG

    HorusKol August 10th

    also – why bother pattern matching a password input?

    you should be encrypting passwords (preferably with salting) – so you know you are going to be testing against a 32 or 40 character hex string.

    and because you are encrypting – ‘special’ characters are not an issue.

    ( Reply )
    1. PG

      Vasili August 10th

      For the scripts I’ve written, I check the user’s password and store a salted/md5ed/hashed password. :)

      ( Reply )
    2. PG

      ken August 10th

      You check a PW to make sure it meets complexity requirements, or do you prefer to let users use “aaaaaa” for a PW?

      It makes no sense to test an encrypted PW — when would you do this?

      ( Reply )
  48. PG

    ken August 10th

    Use \w!

    And you have \w annotated wrong in the graphic — it should be “any letter, number, or underscore”.

    ( Reply )
  49. PG

    David Jordan August 10th

    A better username one would be:

    /^[a-z]{1,}[a-z0-9 _-]{3,20}$/i

    This would allow only an a-z character as the first letter. This will stop a space or underscore ( _ ) or hyphen ( - ) from being first.

    Also I noticed on most of the regexes that it did not have the /i at the end to say that it is case-insensitive, so on most of the regexes only lowercase alpha characters are matched and not uppercase.

    So I think the html hex one should be:

    /^#([a-f0-9]{6}|[a-f0-9]{3})$/i

    I agree that there doesn’t really need to be one for a password. Passwords should be allowed to contain anything, it should get encrypted anyway so it doesn’t matter to the website what it is.

    The URL one is flimsy as well as others have said.

    ( Reply )
  50. PG

    freerf August 10th

    Thanks, very interesting and easy to understand

    ( Reply )
  51. PG

    philips August 10th

    excellent post

    ( Reply )
  52. PG

    Harro August 10th

    The e-mail one works, but it’s not according to the official email spec.
    You can have a & before the @, but that would be rejected by that regex.

    Upside is that spambots might make the same mistake ;-)

    ( Reply )
    1. PG

      Michael Mokrysz August 11th

      I think it’s probably excessive to try and perfectly match the full range of possibilities for email addresses, otherwise you’d waste a lot of CPU cycles with little purpose. The best thing as I see it is to make sure you disallow most/all of what shouldn’t be allowed, and only allow all the commonly used characters.

      ( Reply )
  53. PG

    emwa August 11th

    two months earlier and i would have been very grateful, now i’m only normally grateful ;-)

    ( Reply )
  54. PG

    Steven August 11th

    interesting and well explained. thanks Vasili

    ( Reply )
  55. PG

    Stephen Ainsworth August 11th

    Interesting post. More for the fact that your expressions have raised comments and debates of things I would have not accounted for.

    The basics of regular expressions are fairly simple, but the more you look at them the more complex they become.

    ( Reply )
    1. PG

      Stephen Ainsworth August 11th

      P.S Nice images! Nice and clear to follow.

      ( Reply )
  56. PG

    Peter August 11th

    I’ve added a few comments in reply to others’ comments above, but would like to address a few points separately. Some of the points address the author/article directly, others are more broad in their concerns.

    Matching a Hex Value:
    1. I believe it is good practice to not capture groups unnecessarily, there is no reason why you could not have used a non-capturing group around the hexadecimal digits.
    2. “The reason that I put the six character before is that parser will capture a hex value like #ffffff. If I had reversed it so that the three characters came first, the parser would only pick up #fff and not the other three f’s.” Sorry but I’m calling bull on that. If the pattern looked for three hex digits before the six, then that matching would fail after matching three and not finding the end of the string but would then backtrack and continue on looking for the remaining three hex digits of six.
    3. This next point goes for most (all?) of your regular expressions that anchor the match using $: use the D modifier unless you also want any trailing newline character to be matched. E.g. /^foo$/ will happily match “foo\n”. If you don’t want to use the D modifier, then you can also use \z within the pattern to match only the end of the string.
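
    The three points above can be checked directly with preg_match. A small PHP sketch; the variable names are only for illustration:

```php
<?php
// Points 1 and 2: a non-capturing group, with the three-digit branch listed FIRST.
// The $ anchor fails after three digits, so the engine backtracks into the six-digit branch.
$threeFirst = preg_match('/^#(?:[a-f0-9]{3}|[a-f0-9]{6})$/i', '#FFFFFF');

// Point 3: without D, $ also matches just before one trailing newline.
$plainDollar = preg_match('/^foo$/', "foo\n");
// With the D modifier, $ matches only at the very end of the subject.
$dollarEndOnly = preg_match('/^foo$/D', "foo\n");
// \z always means "very end of the subject", regardless of modifiers.
$backslashZ = preg_match('/^foo\z/', "foo\n");

var_dump($threeFirst);    // int(1)
var_dump($plainDollar);   // int(1)
var_dump($dollarEndOnly); // int(0)
var_dump($backslashZ);    // int(0)
```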

    Matching an Email:
    1. A brave choice indeed for an introductory article. The existing comments here cover a number of issues and suggested improvements.
    2. For those recommending the (sole?) use of filter_var with FILTER_VALIDATE_EMAIL please note that even that may well fall short of what you want out of your email validation. For example it claims emails like +@0 (plus at zero) as valid, which your application may not want.

    Matching a URL:
    1. Not specific to URLs but perhaps that might have been a good place to show that alternative pattern delimiters can be used if the pattern itself contains forward slashes. Popular alternatives appear to be tilde (~) and the hash/pound/number symbol (#).
    2. As with any pattern, you have to be careful to not only match what you want to match but also try and not match what you do not want to match! In this case, your URL matching pattern would match the string “hello.today is a fine day.” which I’m sure you can agree is not a URL!
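
    Both URL points can be seen in a few lines of PHP. The sketch below uses the article's URL pattern as it is quoted elsewhere in this thread, once with / and once with ~ as the delimiter; the variable names are made up for the example:

```php
<?php
// The article's URL pattern with / as the delimiter: every literal slash needs a backslash.
$withSlashes = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';
// The same pattern with ~ as the delimiter: slashes can be written as they are.
$withTildes = '~^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$~';

$notAUrl = 'hello.today is a fine day.';

var_dump(preg_match($withSlashes, $notAUrl)); // int(1): a false positive
var_dump(preg_match($withTildes, $notAUrl));  // int(1): same pattern, same false positive
```

    The false positive comes from the path part, which allows spaces and dots, combined with a "TLD" part that accepts any two to six letters.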

    Matching an HTML tag:
    1. As with other patterns, please allow upper-case characters!
    2. The image and description both tell a different story to the pattern when looking for the tag’s attributes: a less-than character is used in the pattern but greater-than in the image/description.
    3. For the attributes, it would make more sense to use ? rather than * to denote their optional nature. The description for this attribute capturing portion is incorrect. The plus sign makes up as many attributes as you want, the star in this case just says zero-or-one series of attributes (there will only ever be one or zero with your pattern).
    4. The “some content” portion of the pattern is greedy and would happily eat up much more than you might want. E.g. given the string “blah blah foo bar” the entire string will be matched! Now imagine this on the scale of an entire HTML document and it’s probably not the desired behaviour.
    5. Why anchor the pattern? When matching HTML tags that often won’t make much practical sense.
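
    Point 4 (greediness) is easy to reproduce. A minimal PHP sketch with a hypothetical input string, since the example string in the comment lost its tags when it was posted:

```php
<?php
$html = '<em>blah</em> blah <em>foo</em> bar'; // hypothetical input

// Greedy: .* grabs as much as possible, so the match runs to the LAST </em>.
preg_match('~<em>(.*)</em>~', $html, $greedy);
// Lazy: .*? grabs as little as possible, so the match stops at the FIRST </em>.
preg_match('~<em>(.*?)</em>~', $html, $lazy);

echo $greedy[1], "\n"; // blah</em> blah <em>foo
echo $lazy[1], "\n";   // blah
```

    The lazy quantifier fixes this particular case, but it still does not cope with nested tags of the same name.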

    That’s all for now.

    P.S. Sorry for the long comment folks.
    P.P.S. Hopefully the comment system doesn’t eat any backslashes as there are a few important ones up there.

    ( Reply )
    1. PG

      Peter August 11th

      Looks like the backslashes escaped unscathed, but the <em> tags (for HTML tag point 4) were turned into HTML rather than encoded into entities. D’oh.

      ( Reply )
  57. PG

    Grimnir August 11th

    First of all, I don’t understand why a password shouldn’t be allowed to contain capital letters and special characters such as parentheses and a few others. That’s IMO too strict.

    Actually, not allowing capital letters is kind of consistent throughout this tutorial, and that’s an unnecessary restriction. Why not accept hexadecimal numbers or tags with capital letters? IMO it’s too strict not to allow them, unless you have some really good reason.

    There is an error in the email address and url. The last part which accepts TLD is not really good enough. It would accept addresses such as something@something.a.b.c. or even something@hello..a.b.c
    If you want to capture the “extended” TLD such as co.uk you could do it like this:
    ((?:[a-zA-Z]{,2}\.)?[a-zA-Z]{2,3})

    I would probably just use ([a-zA-Z]{2,3}) and then parse the hostname and look out for an “extended” TLD if I really need it.

    I can see others have commented on this one, and apparently TLDs can be any length.

    I’m not sure exactly how many special characters a URL is allowed to contain, but I do know, that the one in this tutorial is not good enough. It won’t accept URLs such as http://something.is.wrong.com/mylittlescript?firstparameter=hello&secondparameter=helo

    That’s way too strict.

    The IP address is correct, but it could be a bit more compact, if you like :) You could be using \d instead of [0-9]. Maybe it’s just a matter of habit, but I always use \d.

    For matching an HTML tag the same goes for capital letters.

    I only have one comment for the HTML tag regular expression. I don’t really understand the part with ([^>]+)*
    Is there any special reason not to use ([^>]*) instead?

    I can see other people have pointed out these issues as well.

    ( Reply )
  58. PG

    Cerium August 11th

    The pattern for the email is wrong: it doesn’t allow something like john.smith+folder@gmail.com

    And the pattern for the password doesn’t allow “secure characters” like @/#& or something like that…

    The URL one doesn’t allow ftp:// or mailto: or any other protocol/pseudo-protocol different from http/https…

    ( Reply )
  59. PG

    Mubarak Ali August 11th

    The most marvelous post I’ve ever seen, easy to grasp

    ( Reply )
  60. PG

    Mujtaba August 11th

    this was just what i needed…
    i used to scratch my head at seeing these mysterious strings in some php code… it was only recently that i came to know that they are called “regex” …

    ( Reply )
  61. PG

    Stu August 11th

    Pretty much all of these could do with the /i modifier to make them case-insensitive, or stick the A-Z range in the character classes.

    Limiting a password to lower case letters, numbers, and a -_ seems like pretty poor practice to me.

    ( Reply )
  62. PG

    Mujtaba August 11th

    wow…. Vasili you are just 14? just checked out your site and i couldnt believe it!!!!! Good job…

    ( Reply )
    1. PG

      Vasili August 11th

      Woops, forgot to change that, 15. ;)

      ( Reply )
  63. PG

    André Faria Gomes August 11th

    Very nice post. Regex is a really good time saver.

    ( Reply )
  64. PG

    Miguel Ventura August 11th

    The article is very good and it explains well the expressions.

    However i’d like you to add the “+” character to the email username part as it’s perfectly valid. Other weirder characters are also valid, but the “+” is quite used nowadays due to Gmail allowing aliases for username+anythingYouWant@gmail.com to forward to username@gmail.com.

    Best regards

    ( Reply )
  65. PG

    kangax August 11th

    <img src=”img.jpg” alt=”My image>” /> (attributes can’t contain greater than signs)

    This is not really true. HTML4 (at least) allows “>” and “<” in attribute values, so this tag is actually a valid one.

    Besides the fact that your regex utilizes greedy match (and so will capture as many same-named tags as it can), it fails to match things like – “<h2>foo</h2>” and “<br/>”

    ( Reply )
  66. PG

    Dwayne Clarke August 11th

    i love this website. great post

    ( Reply )
  67. PG

    Michael August 11th

    Great tut! I just started working on reg expressions after watching the video series by themeforest. There’s also a great tool at http://www.gskinner.com/RegExr Check it out! As a side note to newbies to the expression world, anywhere you have 0-9, you can use \d. :-D Thanks!

    ( Reply )
  68. PG

    eric August 11th

    A brilliant article for those who, like me, understand nothing about regular expressions! Thanks

    ( Reply )
  69. PG

    DemoGeek August 11th

    Very well explained! “A picture is worth a thousand words” is absolutely true with this, I’ve never got comfortable with RegEx as I never understood the details of it but this shot the heck out of it and now I know what I’m doing with my RegExes. Great!

    ( Reply )
  70. PG

    Matthew August 11th

    Can the email address section be removed or at least a big old warning be added to it. As many others have said above, it’s broken and I’d hate to see this misconception of how to properly validate an email address perpetuated.

    I’m sick of me+tag@example.com and other perfectly valid email addresses being rejected by broken validation routines written by ignorant coders.

    ( Reply )
  71. PG

    Jimpsson August 11th

    Much appreciated article; I would appreciate a deeper dive into Regular Expressions even more! :)

    ( Reply )
  72. PG

    Oğuz ÇELİKDEMİR August 11th

    @Vasili

    could you please give me an idea to block automatically “Out of Office” auto-reply messages?

    ( Reply )
  73. PG

    TK August 11th

    The patterns for email and url are buggy as heck. There are a lot of valid emails and URLs they won’t match. The formal specs for these entities have character length limits for the parts. Using unbounded + operators is poor form.

    If you want to match properly with regular expressions I recommend reading the O’Reilly book “Mastering Regular Expressions”… then read it again. It will change your life and make many tasks easier across many different technologies and platforms.

    ( Reply )
    1. PG

      Vasili August 12th

      I’m currently reading that book and recommend it 100%.

      ( Reply )
  74. PG

    Ary Mega August 11th

    Excellent article. Keep writing more, Vasili!

    ( Reply )
  75. PG

    Ethan August 11th

    Great job Vasili! This is the only post about regex I’ve ever actually understood!

    ( Reply )
  76. PG

    Dale Watkins August 11th

    Love this post. Regular Expressions have always been something I never took the time to learn. The illustrations made it easy, Thanks!

    ( Reply )
  77. PG

    Lam Nguyen August 11th

    Awesome list and very well explained. Great!

    ( Reply )
  78. PG

    Arun Mariappa K August 11th

    Great Post… Well Explained…. Thanks

    ( Reply )
  79. PG

    jQueryGlobe August 12th

    I don’t get why someone should limit passwords to a max length and allow only [a-z0-9_-]

    Also I noticed that you allow dots for TLD, so, emails like aaa@bbaa… will match your regexp

    ( Reply )
  80. PG

    David Ferguson August 12th

    Good description and break down of regular expressions and how they work.. but your examples dont exactly cut it, as some of the other posts have indicated..

    ( Reply )
  81. PG

    Mike Schell August 12th

    Here’s one I use to check for a U.S. Zip Code *or* a Canadian Postal Code:

    /\s?(^\d{5}(?:[\s-]\d{4})?$|[a-z]\d[a-z]\s?\d[a-z]\d)/i
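
    As written, only the ZIP branch of that pattern is anchored, so a Canadian code is also found in the middle of a longer string. If the goal is to validate a whole form field (an assumption; the comment does not say), a fully anchored variant could look like this PHP sketch:

```php
<?php
// The pattern from the comment above, unchanged.
$original = '/\s?(^\d{5}(?:[\s-]\d{4})?$|[a-z]\d[a-z]\s?\d[a-z]\d)/i';
// A variant with both branches anchored; D stops $ from accepting a trailing newline.
$anchored = '/^(?:\d{5}(?:[\s-]\d{4})?|[a-z]\d[a-z]\s?\d[a-z]\d)$/iD';

foreach (['90210', '90210-1234', 'K1A 0B1', 'xxK1A 0B1yy'] as $input) {
    printf(
        "%-12s original=%d anchored=%d\n",
        $input,
        preg_match($original, $input),
        preg_match($anchored, $input)
    );
}
// Only the last input differs: the original finds the postal code inside it, the anchored variant rejects it.
```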

    ( Reply )
  82. PG

    Marius August 12th

    Just terrible regular expressions.

    Propagating these exceptionally bad regular expressions so that unknowing developers implement them is a crime.

    Things it will not match:

    mandela@state.gov.za – valid e-mail
    127.1 – valid IP address
    http://www.google.com. – valid address (full stop at end is valid)

    the list of faults goes on, but to my mind the fact that password characters are limited at all is the worst and is such a bad security problem you should be drawn and quartered.

    see: http://en.wikipedia.org/wiki/Password_fatigue

    ( Reply )
  83. PG

    Leonardo França August 12th

    very cool!!! congratulations!!

    ( Reply )
  84. PG

    Shane August 12th

    Be kind to some of us that use subaddressing and include a “+” in the username portion of the email.

    What am I talking about?

    fakeemail+mytag@gmail.com

    is a legit email in Gmail for fakeemail@gmail.com. Many mail systems support subaddressing by using a “+” to signify this. Your regular expression for emails would not match the subaddress.

    ( Reply )
  85. PG

    TheBug August 12th

    Typo in the last “Matching an HTML Tag” expression..
    /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
    should be
    /^<([a-z]+)([^>]+)*(?:>(.*)<\/\1>|\s+\/>)$/

    Not matching with the image above the expression

    ( Reply )
  86. PG

    Daweed August 12th

    Hello .

    Thanks for this RegEx summary. The article is really clear to understand how regEx works.

    Just one comment about the regEx for detecting email.

    The pattern is not working properly for all emails.

    For example, emails ending with .gouv-qc-ca won’t match with the regEx pattern.

    ( Reply )
  87. PG

    Erik Ostrom August 12th

    While {3,16} will constrain the length of a username, in practice you won’t want to rely on this for the most common usage, validating user input. You want not only to rule out invalid usernames, but also explain to the user why they’re invalid. “Doesn’t match this regular expression” is a terrible explanation; “‘JR’ is too short” is a good one. So for this purpose you need to test length separately anyway, and might as well leave it out of the regular expression.
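
    The idea of testing length separately so that each failure gets its own message might look like the following PHP sketch. The function name usernameError is hypothetical, and the character class [a-z0-9_-] is an assumption about the username rule; only the {3,16} bounds are taken from the comment:

```php
<?php
// Hypothetical helper: returns a human-readable error, or null if the username is acceptable.
function usernameError(string $name): ?string
{
    $length = strlen($name);
    if ($length < 3) {
        return "'{$name}' is too short (minimum 3 characters)";
    }
    if ($length > 16) {
        return "'{$name}' is too long (maximum 16 characters)";
    }
    // The regex now only has to describe the allowed characters, not the length.
    if (!preg_match('/^[a-z0-9_-]+$/D', $name)) {
        return "'{$name}' may only contain lowercase letters, digits, underscores and hyphens";
    }
    return null;
}

var_dump(usernameError('JR'));           // the "too short" message
var_dump(usernameError('my-us3r_n4m3')); // NULL, the name is fine
var_dump(usernameError('bad name!'));    // the "may only contain" message
```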

    ( Reply )
  88. PG

    Greg Jastrab August 12th

    Yes, if you’re going to publicize Email Regexp’s, get it right and AT LEAST include a plus sign being valid before the @ !! I love using that for gmail to create custom filters, and hate when websites tell me that user+something@domain.com is an invalid email!

    ( Reply )
  89. PG

    STVerschoof August 12th

    You might have forgotten one important regular expression! Does checking a date ring a bell? Maybe you could still add it to this article or something! Great article anyway, really helpful when one is not totally sure about the expression check! So thank you!!!

    ( Reply )
    1. PG

      Vasili August 12th

      I didn’t include this because there are just SO MANY different formats for dates. The regex would be way too long. :P

      An easy way to check a date: strtotime() :)

      ( Reply )
      1. PG

        STVerschoof August 13th

        You are right ;) there are many formats indeed! :P true true ;)

  90. PG

    aguaesolutions August 13th

    Many thanks for this excellent post

    ( Reply )
  91. PG

    Nic Wise August 13th

    Yup – great visual explanation, but the regex’s are way too limited. eg password (makes them a LOT less secure and easier to hack!), email (wouldn’t handle whatever@immigration.dol.govt.nz or me+other@gmail.com), and the hex one matches hex COLOURS, but not hex VALUES.

    Close tho – and a great start for people wanting to know more about RegEx’s.

    ( Reply )
  92. PG

    MP August 13th

    nice stuff, but I can’t really get a hold on these things.

    If I use the reg exp for validating URLs, and want to check that the url contains a certain part for example “amazon” , should I use (amazon) and where to put it?

    /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

    ( Reply )
  93. PG

    Chandra August 13th

    What is the regular expression to match a file path?

    ( Reply )
  94. PG

    fahri August 13th

    nice tutorial, i think it will be useful for everyday using.

    keep posting man…. you are the man

    ( Reply )
  95. PG

    Scott Elkin August 14th

    Can you clarify, “Finally, we want the end of the string ($)”. What would happen w/out the “$”.

    I always use the “$” but not sure I really know why.

    ( Reply )
    1. PG

      Peter August 18th

      The $ symbol anchors the regular expression to the end of the subject (or a line of the subject in multiline mode, but best ignore that for now). If the subject does not end at that point in the pattern, the match fails. To give discrete examples:

      preg_match('/^foo$/', 'foo'); // Match
      preg_match('/^foo$/', 'foobar'); // No match
      preg_match('/^foo/', 'foo'); // Match
      preg_match('/^foo/', 'foobar'); // Match

      ( Reply )
  96. PG

    Tomáš Fejfar August 16th

    The slug regexp would validate slugs like “--test---”, which is bad :)

    ( Reply )
  97. PG

    myname August 17th

    thanks a lot.. very useful

    ( Reply )
  98. PG

    Mike August 17th

    Thanks for a great explanation of how RegExes work. It refreshed my memory just when I needed it!

    My favorite RegEx tester is Rubular: http://rubular.com

    Great tester that tells you how your regexen will work in Ruby (on Rails)

    ( Reply )
  99. PG

    Constantin Tovisi August 23rd

    I see that this post generated a lot of reactions, both pro and cons. This only shows that Regular Expressions are a hot topic that a lot of posts can be made about.

    ( Reply )
  100. PG

    amir August 25th

    really useful tips abt regx

    ( Reply )
  101. PG

    rizza August 27th

    This is what I am looking for

    Great post

    ( Reply )
  102. PG

    Benjamin Reid August 28th

    I keep coming back to these regex’s for a class I’m writing, they’re so helpful!

    :)

    ( Reply )
  103. PG

    Hussain Cutpiecewala August 31st

    Awesome..

    ( Reply )
  104. PG

    Brianary September 2nd

    Good introduction to regular expressions, but as production code, not so good. :(

    1. Simple username: /^[a-z0-9]+([_.-][a-z0-9]+)*$/i since you probably don’t want consecutive underscores, dots, or dashes and don’t want them to start the username. It really depends on whether you really need to create 7-bit ASCII usernames, though, which isn’t particularly international-friendly. Also reasonable: /^\w+(\S\w+)*$/ .
    2. Don’t validate passwords. That’s not a good idea.
    3. Remember to match the hex values case-insensitively.
    4. “Slug” is OK.
    5. Same problems as the username match. Try /^[a-z0-9]+([_.+-][a-z0-9]+)*@([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{3,6}$/i . This will also match address+tag@gmail.com, john.q.public@us-1.example.org, &c., but not ---@---.-- or _@_.org . You should also be sure to exclude /@example\.(org|net|com)$/ so that you don’t get any phony RFC 2606 addresses (you could also do this with a negative lookahead assertion).
    6. Same problems as the email matching. Try /([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{3,6}(\/([^ \t\n!*"'();:@&=+$,\/?%#\[\]]|%[a-f0-9]{2}|%u[a-f0-9]{4})+)*(\?([^ \t\n!*"'();:@&=+$,\/?%#\[\]]|%[a-f0-9]{2}|%u[a-f0-9]{4})*)?(#([^ \t\n!*"'();:@&=+$,\/?%#\[\]]|%[a-f0-9]{2}|%u[a-f0-9]{4})*)?$/i . You don’t want _.com or --_.info or __.___ . You should also check against RFC 2606 here, too. You also want to make sure there is no whitespace, and that the correct characters are used according to the spec.
    7. IP address is OK, though you could also allow for integer notation.
    8. Don’t use regular expressions to parse HTML or XML. These are not regular languages, and this will always lead to code that will eventually break. It’s OK for the occasional ad-hoc Perl one-liner, but you shouldn’t be maintaining regex code for markup parsing. There are libraries for that.
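
    For point 8, PHP ships with a parser that can stand in for the regex. A minimal sketch using the DOM extension, assuming it is enabled (it usually is) and using a hypothetical input string:

```php
<?php
// Hypothetical input: mixed-case tags and a ">" inside an attribute value,
// two of the things a simple tag regex tends to trip over.
$html = '<p>See <a href="/one" title="a > b">one</a> and <A HREF="/two">two</A>.</p>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

// Let the parser find the elements, then read the attribute from each one.
$links = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

print_r($links); // the two href values, /one and /two
```

    The parser normalizes tag and attribute case and understands quoting, so none of that has to be encoded in a pattern.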

    ( Reply )
  105. PG

    mycrazydream September 4th

    These are very helpful to someone that has no clue about regex, but some fall short of accuracy. Rather, they will match what each determines to match, but other things as well. In other words, they are not specific enough. And that is what regex is all about. I want to capture what I want. Only that. Nothing else.

    But all in all good work

    ( Reply )
  106. PG

    mycrazydream September 4th

    But I have to continue in light of the comment to not “parse HTML or XML.” Seriously? Not only would that be seriously limiting but almost every great javascript framework in use on the web relies on the highest degree of pattern matching in the DOM to return the elements one needs to control. In other words, I take that warning to be, “don’t try to use regex where data on the internet is concerned.” Laughable.

    ( Reply )
    1. PG

      Brianary September 9th

      It’s just not the ideal tool for the job. Parsers are available in any language for markup. And the beauty of the DOM is that it *is* a parser. You may want to double-check the claim that “almost every great javascript framework” uses regex to parse the DOM.

      Give me any regex and I will find either valid HTML/XML or working tag soup (stuff that isn’t valid, but looks right in current browsers) that won’t match correctly.

      Sure, it’ll usually work, and you can keep patching it each time you run across breaking code or the code changes unexpectedly, but eventually that will become your full-time job, and that’s not laughable.

      ( Reply )
  107. PG

    vovkin September 8th

    Just great!
    It’s really what I need.

    ( Reply )
  108. PG

    RedPyxll September 10th

    Great!
    Linked to it on my new code blog here: http://redpyxll.com/archives/408

    ( Reply )
  109. PG

    Paw September 11th

    One of the best posts EVER!

    ( Reply )
  110. PG

    hysia September 28th

    Awesome post. I like it very much!

    ( Reply )
  111. PG

    Allan October 5th

    The email isn’t really working when you have
    IT does not consider it as a tag. But its good though.

    ( Reply )
  112. PG

    Hamman Samuel October 6th

    This is the best tutorial on regular expressions on the whole world wide web! I just begun learning them, and this place helped me the most

    ( Reply )
  113. PG

    MR October 8th

    Actually validating an email with a regexp is a bit more complicated, as explained here : http://www.regular-expressions.info/email.html.

    For instance, your expression wouldn’t match john+doe@mymail.com though it’s a perfectly valid and RFC compliant address.

    And don’t forget the /i (case-insensitive) modifier.

    ( Reply )
  114. PG

    Sumit November 2nd

    Great Dude…

    I got details which I could get before today.
    Keep It Up.
    Also publish some more tutorials on PHP and OOPS with some basic examples of web applications implemented in 3-tier or N-tier architectures. I want to know basic of it. If any body of you all have links to those websites please mail me on my mail id: joshisumitnet@yahoo.com
    Thanks in advance. I want more about Tiered Architecture examples.

    ( Reply )
  115. PG

    David Moreen November 11th

    Vasili I love you man, you just saved my booty on a project!

    ( Reply )
  116. PG

    faraz November 15th

    very nice, it saved me time

    ( Reply )
  117. PG

    Debo November 18th

    great post, Thank you!! Vasili.

    ( Reply )