8 Regular Expressions You Should Know

Regular expressions are a language of their own. When you learn a new programming language, they’re this little sub-language that makes no sense at first glance. Many times you have to read another tutorial, article, or book just to understand the “simple” pattern described. Today, we’ll review eight regular expressions that you should know for your next coding project.


Background Info on Regular Expressions

This is what Wikipedia has to say about them:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Now, that doesn’t really tell me much about the actual patterns. The regexes I’ll be going over today contain characters such as \w, \s, and \1 that represent something totally different from what they look like.

If you’d like to learn a little about regular expressions before you continue reading this article, I’d suggest watching the Regular Expressions for Dummies screencast series.

The eight regular expressions we’ll be going over today will allow you to match a(n): username, password, hex value (like #fff or #000), slug, email, URL, IP address, and HTML tag. As the list goes down, the regular expressions get more and more confusing. The pictures for the first four regexes are easy to follow, but the last four are more easily understood by reading the explanation.

The key thing to remember about regular expressions is that they are almost read forwards and backwards at the same time, since a pattern can refer back to text it has already matched. This sentence will make more sense when we talk about matching HTML tags.

Note: The delimiters used in the regular expressions are forward slashes, “/”. Each pattern begins and ends with a delimiter. If a forward slash appears in a regex, we must escape it with a backslash: “\/”.
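If you port these patterns to Python (used for the short sketches here, as an assumption; the article itself is language-neutral), note that Python has no regex delimiters: a pattern is an ordinary string, usually a raw string r"...". A forward slash therefore needs no escaping, but keeping the \/ from a slash-delimited pattern is harmless, because Python’s re module treats a backslash before punctuation as a literal.

```python
import re

# Slash-delimited form, as in the article: /^https?:\/\//
with_escapes = re.compile(r"^https?:\/\/")   # escaped slashes kept
without_escapes = re.compile(r"^https?://")  # no delimiters, so no escaping needed

url = "https://example.com"
print(bool(with_escapes.match(url)), bool(without_escapes.match(url)))  # True True
```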


1. Matching a Username

Pattern:

/^[a-z0-9_-]{3,16}$/

Description:

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), underscore, or hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).

String that matches:

my-us3r_n4m3

String that doesn’t match:

th1s1s-wayt00_l0ngt0beausername (too long)
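A minimal sketch of the username check, assuming Python’s standard re module; is_valid_username is a hypothetical helper name. re.fullmatch is used so the whole string must match, which also rejects a trailing newline that a bare $ would let through in Python.

```python
import re

# The article's username pattern; ^ and $ anchor it to the whole string.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,16}$")

def is_valid_username(name: str) -> bool:
    # fullmatch requires the entire string to match, including no trailing "\n"
    return USERNAME_RE.fullmatch(name) is not None

print(is_valid_username("my-us3r_n4m3"))                     # True
print(is_valid_username("th1s1s-wayt00_l0ngt0beausername"))  # False: 31 characters
```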


2. Matching a Password

Pattern:

/^[a-z0-9_-]{6,18}$/

Description:

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).

String that matches:

myp4ssw0rd

String that doesn’t match:

mypa$$w0rd (contains a dollar sign)
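The same idea in Python, as a sketch assuming the standard re module: only the length range changes from the username pattern.

```python
import re

# Same shape as the username pattern, but 6 to 18 characters
PASSWORD_RE = re.compile(r"^[a-z0-9_-]{6,18}$")

for candidate in ["myp4ssw0rd", "mypa$$w0rd", "MyP4ssw0rd"]:
    print(candidate, PASSWORD_RE.fullmatch(candidate) is not None)
```

This prints True for myp4ssw0rd and False for the other two: the dollar signs and the capital letters both fall outside the character class. In other words, the pattern checks format only and says nothing about how strong a password is.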


3. Matching a Hex Value

Pattern:

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case a number sign) is optional, but to be “greedy” and match it if it’s there. Next, inside the first group (the first set of parentheses), we can have two different situations. The first is six characters, each a lowercase letter between a and f or a digit. The vertical bar tells us that we can instead have three such characters. Finally, we want the end of the string ($).

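I put the six-character alternative first because alternatives are tried from left to right. In an unanchored search, reversing them would make the parser pick up only #fff from #ffffff and leave the other three f’s behind. With the ^ and $ anchors in place, both orders work: if the three-character branch leaves characters over, $ fails and the parser backtracks into the six-character branch.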
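A quick Python check of where alternative order matters, assuming the standard re module; six_first and three_first are illustrative names. re.search runs without anchors, so the first alternative that succeeds is kept, while an anchored full match forces a backtrack into the longer branch.

```python
import re

six_first = r"#?([a-f0-9]{6}|[a-f0-9]{3})"
three_first = r"#?([a-f0-9]{3}|[a-f0-9]{6})"

# Unanchored: the first alternative that matches wins
print(re.search(six_first, "#ffffff").group(0))    # #ffffff
print(re.search(three_first, "#ffffff").group(0))  # #fff

# Anchored: {3} leaves "fff" over, $ fails, and the engine backtracks into {6}
print(re.fullmatch("^" + three_first + "$", "#ffffff") is not None)  # True
```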

String that matches:

#a3c113

String that doesn’t match:

#4d82h4 (contains the letter h)
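A small Python sketch of the hex pattern, assuming the standard re module. The last two lines show one possible tweak, not part of the article’s pattern: compiling with re.IGNORECASE, the equivalent of the /i modifier, so uppercase digits such as #FFF are accepted.

```python
import re

HEX_RE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

print(HEX_RE.fullmatch("#a3c113") is not None)  # True
print(HEX_RE.fullmatch("#4d82h4") is not None)  # False: "h" is not a hex digit
print(HEX_RE.fullmatch("fff") is not None)      # True: the number sign is optional
print(HEX_RE.fullmatch("#FFF") is not None)     # False: the pattern is lowercase-only

# Case-insensitive variant (the /i modifier in slash-delimited syntax)
HEX_RE_I = re.compile(HEX_RE.pattern, re.IGNORECASE)
print(HEX_RE_I.fullmatch("#FFF") is not None)   # True
```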


4. Matching a Slug

Pattern:

/^[a-z0-9-]+$/

Description:

You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) lowercase letters, numbers, or hyphens. Finally, we want the end of the string ($).

String that matches:

my-title-here

String that doesn’t match:

my_title_here (contains underscores)
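A Python sketch of the slug check, assuming the standard re module. The slugify helper is hypothetical and not from the article: it shows one way to turn a title into something the pattern accepts, by lowercasing and collapsing every run of other characters into a single hyphen.

```python
import re

SLUG_RE = re.compile(r"^[a-z0-9-]+$")

def slugify(title: str) -> str:
    # Hypothetical helper: lowercase, replace runs outside [a-z0-9] with "-", trim the ends
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

print(SLUG_RE.fullmatch("my-title-here") is not None)  # True
print(SLUG_RE.fullmatch("my_title_here") is not None)  # False: underscores
print(slugify("My Title Here!"))                       # my-title-here
```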


5. Matching an Email

Pattern:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. The dot is escaped here; outside a character class an unescaped dot means any character, although inside square brackets a dot is already literal, so the escape is optional there. Directly after that, there must be an at sign. Next is the domain name, which must be one or more numbers, lowercase letters, dots, or hyphens. Then comes another (escaped) dot, with the extension being two to six letters or dots. I allow 2 to 6 characters, including dots, because of country-specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).

String that matches:

john@doe.com

String that doesn’t match:

john@doe.something (TLD is too long)
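A Python sketch of the email pattern, assuming the standard re module. It prints the three captured groups and also shows a limitation worth knowing: plus-addressing such as dan+tag@example.org is rejected, because + is not in the first character class.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

m = EMAIL_RE.fullmatch("john@doe.com")
print(m.groups())                                 # ('john', 'doe', 'com')
print(EMAIL_RE.fullmatch("john@doe.something"))   # None: "something" is 9 letters
print(EMAIL_RE.fullmatch("dan+tag@example.org"))  # None: "+" is not allowed
```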


6. Matching a URL

Pattern:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Description:

This regex is almost like taking the ending part of the email regex above and slapping it between “http://” and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with “http://”, “https://”, or neither of them. The question mark after the s allows both http and https. To make the entire group optional, I added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens, followed by another dot and then two to six letters or dots. The following section is the optional files and directories. Inside the group, we match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens, and the star after the group lets it repeat. This allows multiple directories to be matched along with a file at the end. Strictly speaking, the outer star is not needed: because the character class already contains the forward slash, a single pass of the group can match a whole path such as /some/dir/file.html, so ([\/\w \.-]*), ([\/\w \.-]*)? and ([\/\w \.-]*)* all accept the same strings. Nesting one quantifier inside another like this can also make the parser backtrack for a very long time on strings that fail to match.

Then an optional trailing slash is matched. Finally, we end with the end of the line.

String that matches:

http://net.tutsplus.com/about

String that doesn’t match:

http://google.com/some/file!.html (contains an exclamation point)
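A Python sketch of the URL check, assuming the standard re module. One deliberate change from the article’s pattern: the path group is written ([/\w .-]*) instead of ([\/\w \.-]*)*. Since the class already contains the slash, both accept the same strings, and the simpler form avoids a nested quantifier.

```python
import re

# Article's URL pattern, path group simplified from ([\/\w \.-]*)* to ([/\w .-]*).
# In Python 3, \w also matches non-ASCII letters unless re.ASCII is passed.
URL_RE = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)/?$")

m = URL_RE.fullmatch("http://net.tutsplus.com/about")
print(m.groups())  # ('http://', 'net.tutsplus', 'com', '/about')
print(URL_RE.fullmatch("http://google.com/some/file!.html"))  # None: "!" is not allowed
```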


7. Matching an IP Address

Pattern:

/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/

Description:

I’m not going to lie: I didn’t write this regex; I got it from here. That doesn’t mean I can’t rip it apart character by character.

The first capture group really isn’t a captured group, because ?: was placed inside it, which tells the parser not to capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times, hence the {3} at the end of the group. This group contains a subgroup and a literal dot. The parser looks for a match in the subgroup, then a dot, before moving on.

The subgroup is also a non-capturing group. It’s just a bunch of character sets (things inside brackets): the string “25” followed by a digit between 0 and 5; or the string “2”, a digit between 0 and 4, and any digit; or an optional zero or one followed by a digit and an optional second digit. In other words, each octet is a number from 250 to 255, 200 to 249, or 0 to 199.

After we match three of those, it’s on to the final non-capturing group, which is identical to the subgroup but is not followed by a dot.

We end this confusing regex with the end of the string.

String that matches:

73.60.124.136 (no, that is not my IP address :P)

String that doesn’t match:

256.60.124.136 (a first octet that starts with “25” must be followed by a digit between 0 and 5)
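A Python sketch that builds the IP pattern from one octet piece, assuming the standard re module; OCTET and IP_RE are illustrative names. Joined together, the string is exactly the article’s pattern, and compiling it confirms that it has no capturing groups.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (the non-capturing subgroup)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Three "octet + dot" repetitions, then a final octet with no dot
IP_RE = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")

print(IP_RE.groups)                                   # 0: every group is non-capturing
print(IP_RE.fullmatch("73.60.124.136") is not None)   # True
print(IP_RE.fullmatch("256.60.124.136") is not None)  # False
```

Note that leading zeros such as 010.0.0.1 are accepted. If a regex is not required, Python’s standard ipaddress.ip_address raises ValueError for 256.60.124.136 and also understands IPv6.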


8. Matching an HTML Tag

Pattern:

/^<([a-z]+)([^>]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Description:

One of the more useful regexes on the list. It matches an HTML tag along with the content inside it. As usual, we begin with the start of the line.

First comes the tag’s name. It must be one or more letters long. This is the first capture group; it comes in handy when we have to grab the closing tag. Next come the tag’s attributes: any characters except a greater than sign (>). Since attributes are optional but I want to match more than one character, the group is followed by a star: the plus sign covers an attribute and its value, and the star allows as many attributes as you want.

Next comes the non-capturing group (the third group in the pattern). Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters and the closing tag. It uses \1, which represents the content captured by the first capturing group; in this case, the tag’s name. If that can’t be matched, we look for a self-closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by “/>”.

The regex is ended with the end of the line.

String that matches:

<a href="http://net.tutsplus.com/">Nettuts+</a>

String that doesn’t match:

<img src="img.jpg" alt="My image>" /> (attributes can’t contain greater than signs)
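A Python sketch of the tag pattern, assuming the standard re module. Two deliberate changes, which are assumptions rather than the article’s exact pattern: the attribute class is written [^>], matching the description, and ([^>]+)* is collapsed into ([^>]*), which accepts the same strings. A quantifier nested inside another can make a backtracking engine try an exponential number of ways to split the attributes before giving up on a non-matching string such as the img example.

```python
import re

# Group 1: tag name; group 2: attributes; group 3: content. \1 must repeat the tag name.
# (.*) does not cross newlines unless re.DOTALL is passed.
TAG_RE = re.compile(r"^<([a-z]+)([^>]*)(?:>(.*)</\1>|\s+/>)$")

m = TAG_RE.fullmatch('<a href="http://net.tutsplus.com/">Nettuts+</a>')
print(m.group(1))  # a
print(m.group(3))  # Nettuts+
print(TAG_RE.fullmatch("<br />") is not None)                      # True: self-closing
print(TAG_RE.fullmatch('<img src="img.jpg" alt="My image>" />'))  # None
```

This only suits a single, simple element; for real documents, an HTML parser such as Python’s html.parser module is the safer tool.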


Conclusion

I hope that you have grasped the ideas behind regular expressions a little bit better. Hopefully you’ll be using these regexes in future projects! Often you won’t need to decipher a regex character by character, but doing so now and then helps you learn. Just remember: don’t be afraid of regular expressions. They might not seem like it, but they make your life a lot easier. Just try and pull out a tag’s name from a string without regular expressions! ;)


  • Pierre

    Awesome post.

  • Chad Hietala

    great reference.

  • Alexander Högberg

    Thanks a bunch! I finally got a real grip of this now, great reference pictures.

  • http://www.jeffadams.ci.uk Jeff Adams

    Great tutorial, simple yet for folks like me who are designers SLASH developers but not one or the other this is great.

    particularly liked the images with these, they explain things with example – top tut!

  • http://www.mitchbryson.com mitchell bryson

    Matching a Hex Value – graphic doesn’t match expression

  • http://simon.vansintjan.org Simon

    Regex: something I know I should know but I know I don’t know well enough.

    This is a great post, bookmarked, thank you.

    Also, thanks for the link to the screencast.

    • http://simon.vansintjan.org Simon

      Also, sorry to be clogging this up, just a question to the general public. I’ve used rubular before to test my regular expressions with ruby, but is there something like it – ease of use, etc – for other languages (I’m thinking php and javascript here).

  • ru83n

    What about IPv6?

  • http://www.uni4ya.com MGK

    Hello,

    nice idea to do such a tuts :)

    however, I have some things to add

    concerning the email regex and the url regex, I personally use a php function: filter_var.

    The result for an email would be

    filter_var($email_to_verify, FILTER_VALIDATE_EMAIL)

    it returns true or false.

    for an url, it would be

    filter_var($websiteurl_to_verify, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED)

    the FILTER_FLAG_SCHEME_REQUIRED is optional if you need to force the http/https/…

    concerning the password, I wanted to get a password of 6 to 15 characters, with letters and numbers ONLY, and with at least 1 number, 1 lowercase letter, AND 1 uppercase letter..

    so the regex I made is :

    $pattern_pass = "/(?=^.{6,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*[\W_\x7B-\xFF]).*$/";

    all other characters are denied of course.

    finally, as for Names/Surnames (not usernames), I wanted something to allow letters, space and accents only (Note, for the accents and the space, it is because I’m French and we have names such as De Lamberté for example), with 2 to 30 characters

    so I made something like this :

    $pattern_name = "/^([\p{L}\s]{2,30})$/";

    voila ! I hope it was as useful as your tuts, and if anybody can verify my regex because I would like to have a “foreign eye” to catch any error I’ve made that I cannot see !

    Regards :)

    • http://www.phpandstuff.com Lane

      filter_var("a@a", FILTER_VALIDATE_EMAIL);

      That validates. I’m not sure if it’s intended, but I wouldn’t wanna validate an e-mail without a TLD.

      • Peter

        Just like the regular expression from this article would happily allow an email address of the form “.@….” — I wouldn’t wanna validate an e-mail with only punctuation!

      • http://www.steveoliveira.com Steve

        If you want to get really strict with e-mails:

        ^(?:[a-zA-Z0-9_'^&/+-])+(?:\.(?:[a-zA-Z0-9_'^&/+-])+)*@(?:(?:\[?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.){3}(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]?)|(?:[a-zA-Z0-9-]+\.)+(?:[a-zA-Z]){2,}\.?)$

        Or try regexlib.com

  • Erik

    Great reference! However, at first glance I don’t see why the hex value ‘#4d82h4’ wouldn’t match. It’s indeed not a valid hex value due to the ‘h’, but it should match the expression displayed because it matches the entire alphabet (in lowercase).

    • http://www.uni4ya.com MGK

      there is an error on the picture indeed, but the correct pattern which is

      /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

      is written just below the picture…

      • Peter

        An arguably improved pattern would be:

        /^#?[0-9a-f]{3}(?:[0-9a-f]{3})?$/iD

        1. Much neater (IMO) specification of optional 4th-6th hex digit
        2. D modifier to anchor to actual end of string
        3. i modifier to allow upper- and lower-case hex digits
        4. No need for the capturing group around the hex digits (which in the article should at least be non-capturing)

    • http://www.gallerysavant.com Gallery Savant

      I was initially confused by this as well but if you look closely, the image representation says ‘a-z’ while the pattern given in text is ‘a-f’. So image appears to be wrong and should also read ‘a-f’.

  • http://andrewburgess.posterous.com Andrew

    This is really helpful; will definitely come in handy!

  • Emil

    Unfortunately, many of the regexps here shouldn’t be used.

    Password one is just plain dumb. Why would you actually want to limit the security to a lower level? Let the user use all kinds of special characters – hell, let them use Chinese!

    The e-mail regexp is just plain wrong in many ways. For instance, a dot can not follow another dot, nor can an email start or end with a dot. An e-mail address can also contain all kinds of different characters, way more than the ones allowed in this regexp. You’re free to use an exclamation mark, hash sign or even a tilde if you want to. 1+1=2@host.domain IS a valid e-mail address. The host can also be an IP-number (including an IP6-number)… and the list goes on… Check the full RFC here: http://tools.ietf.org/html/rfc5322

    Same with the URL regexp. It doesn’t follow the standard at all. What happened to IP numbers?

    • Torkild Dyvik Olsen

      Good someone else noticed this!

      On the other side, it’s a good tutorial to get an idea of how regexp works, and adapt it to other needs.

      But it should be noted that regexp should only be used when absolutely necessary, as it’s slow and there are internal functions for much of this in most programming languages.

    • jford

      absolutely. the limit on six characters after the domain means that .com.au would not be accepted, a perfectly legal TLD.

    • http://xDest.com Hendrik

      "my name\@"@host.domain would be a valid e-mail address as well if following the IETF standard. It’s up to you how much effort you put in validating, though. It has been quite standard to only allow e-mails as presented here (e.g. in phpbb and other bulletin board systems). However, I would recommend including the + character as well. It might be used in the name part of an e-mail address to do automatic sorting of e-mails on an imap server for example.

    • Henrik

      I’m glad to see that someone actually mentioned that it’s a stupid thing to “check” a password like this example.
      The biggest bummer in this example is that it doesn’t even match uppercase letters, making the password magnitudes less effective.

  • http://www.circuitbomb.com Dustin

    Great tutorial, there is a whole slew of uses for regular expressions. A nice reference cheat sheet can be found at:

    http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

    Has been a tremendous help in some circumstances for me.

    • http://threadbarecanvas.com James Hogan

      I do love a good cheat sheet, I’ve had jQuery 1.3 as my laptop background for yonks now.

  • http://www.imblog.info Muhammad Adnan

    Good job Vasili.

  • http://www.flex.etc.br Daniel Schmitz

    COOL

  • http://www.mindfulpracticesyoga.com Jeff Deibel

    Wow, excellent article and visual detail.

  • Deoxys

    Probably the best explanation of RegEx I’ve ever seen, well done!

  • http://fullthrottledevelopment.com/ Lew A

    Your “Matching a Hex Value” image is messed up, it has a-z not a-f. Also, you may want to change the title to “Matching an HTML Color Hex Value”.

    F is a perfectly valid Hex value, but wouldn’t match the 6|3 character requirement in your regexp :).

  • http://sonergonul.com Soner Gönül
  • http://slightlymore.co.uk Clinton Montague

    You made REGEX look easy! A great post for beginners and reference for the rest of us!

  • http://www.chrisdpratt.com Chris Pratt

    My two biggest pet peeves in the world of regex got honorable mention here.

    1) I’ve never understood why people restrict password characters. Usernames and such are used for other purposes than plain authentication and understandably need to be constrained to certain degrees, but passwords have no limitations other than those arbitrarily imposed upon them by developers. I use passwords that contain symbolic characters (SHIFT+Number Keys) for greater password security, and there is nothing more annoying than being handed an error message saying that I’m only allowed to use a-z and 0-9 for a password of all things.

    2) Granted, emails are hard things to match properly. Some of the best email matching regular expressions are insanely long, but you should at least allow for any valid email address, even at the cost of allowing some fake ones past the guardpost. Namely, most email fields will error out with plus-addressing, where you can tack a tag onto your email address (after the username but before the @, preceded by a +). This is fantastic for finding the source of spam, as you can tag different sites and see which one fed your email address based on what address the spam gets sent to.

  • -S

    The email regexp is a little too simple. I would recommend you check Cal Henderson (of Flickr fame)’s RFC822 Email Parser:
    http://www.iamcal.com/publish/articles/php/parsing_email

    Same for the URL. Check this one instead:
    http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url/190405#190405

  • http://www.larryruckman.com argg

    Great stuff, very helpful!

  • http://langille.org/ Dan Langille

    It appears as if your Matching an Email regex does not allow emails of the form dan+tag@example.org

  • http://pantheon.org/ Jason

    Great article! I’m always struggling with regular expressions and have to look up code each time I need one. I’ve saved this one for future reference.

  • Hank Scorpio

    these comments are so random. scary.

    Image for “Matching a Hex Value” is wrong. It says a-z0-9 instead of a-f0-9

  • http://spotdex.com David Moreen

    Very good references to have.

  • http://theruntime.com/blogs/jaykimble Jay Kimble

    Concerning email regex.. it’s broken… please read the RFCs or at least read this –> http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx

    Here’s an excerpt:
    These are all valid email addresses!

    "Abc\@def"@example.com
    "Fred Bloggs"@example.com
    "Joe\\Blow"@example.com
    "Abc@def"@example.com
    customer/department=shipping@example.com
    $A12345@example.com
    !def!xyz%abc@example.com
    _somename@example.com

    I tried
    myemail++someTag@gmail.com
    and it didn’t validate. Just so you understand: gmail has an auto-tagging feature where, if I give you an email like the above, I can instantly move the email to a folder/category. When I see posts like this it really angers me, because yet again developers are being misinformed, and then they write email validators that are too harsh!

    • someone

      Internet is full of crap! (No offense.) And developers who use these regexes are not doing themselves a favor, to begin with. A simple guide to regex using your nice graphs would have been tons more useful, explaining how complicated, and how elegant, it is to match a full email address with a regex.

  • http://james.padolsey.com James

    Regexing HTML is very dangerous. HTML has a recursive structure and is essentially impossible to capture with regular expressions (even recursive expressions). Your regular expression may work in some cases but it’d be foolish to implement it into a live project; edge cases DO matter. Since HTML cannot be *reliably* scrutinized with regexes, the only way left is to parse the HTML (manually)… or to use a pre-made parser.

    All in all, this was a very well put-together post; thank you Vasili!

    • http://vasili.duove.com/ Vasili

      Yup, I had thought about that beforehand, but thought that I should include it anyway. When I was trying to grab the tags I ended up breaking the program I was using to test the expressions because it was like an infinite loop!

      Thanks for the comment. :)

  • http://www.quizzpot.com crysfel

    Regular expressions rule!! you can do a lot of things with this :D

  • http://seanhess.net sean

    The “+” sign is valid in the first part of an email address.

  • Sebastiaan

    This is indeed wonderful, the illustrations make it much more clear!

    I am, however, a little miffed at all the sites telling me (in caps) to “USE A SECURE PASSWORD” and then I am not allowed to use: !@#$^%&*()

    So, if you could, please update the password regex, for the sake of security? Thanks so much.

  • http://blog.jeffreymcmanus.com/ Jeffrey

    Your email validating regexp is incorrect. Here’s a painfully thorough article that provides a better one. I consider this to be canonical since it follows RFC822 closely:

    http://www.iamcal.com/publish/articles/php/parsing_email/

  • http://vasili.duove.com/ Vasili

    Thanks for the comments everyone, I thought that this would confuse you even more! I’m glad you found the pictures helpful. :)

    As for the hex value, yes the image is incorrect, I will fix that and email it to Jeffrey so he can update the post. For now, the text version is correct! :)

    • zly

      How do you make these pictures? So nice ~

  • http://planetozh.com/ Ozh

    wrong regexp for matching an email.

    joe+cool@site.com is a perfectly valid email address, and way too many sites on the internet are not allowing the + sign in emails.

    • http://www.uni4ya.com MGK

      it’s for this kind of reason (always having to get ALL the possibilities) that I use the filter_var function that I pointed out previously

      • http://www.waveclaw.net waveclaw

        Good introduction and the highlighting is very nice. As has been said: a programmer that doesn’t know regular expressions is like a carpenter that doesn’t know how to use a hammer.

        Personally, I’d have left the email topic alone. There is another old adage: not everyone is running Outlook on Windows with IE6 from a user@company.com address.

        While some people think of the world wide web as the Internet, it’s really just a small part. email has been around since the beginning of the Internet and is probably the biggest network of networks that lives on it (interconnected by the Internet so to pun.) Due to this age and complexity, the addressing scheme is so complicated a sane person should not try to home cook an email detector.

        Just look at the RFC compliant expression at http://www.regular-expressions.info/email.html

        And that site doesn’t even consider Usenet’s special email addresses. (Usenet is the global email bulletin board – another network of networks – you probably have accessed through Google Groups.)

        But still, a good start for people unfamiliar with the Programmer’s Hammer.

    • phil

      plenty of people have apostrophes in their name as well. the logic used to verify the domain has issues as well. for example it fails for anyone who owns a .museum tld.

    • Shahways Romani

      Please note in addition that even john.doe@73.60.124.136 (no, that is not my IP address either :P) is a valid email address

  • http://cbesslabs.com Sebastian Bratu

    Really useful tutorial !

  • http://everythinglikesuchasblah.com Max

    Very helpful! Thank you for writing this up

  • http://www.firstclick.co.nz Rick Hambrook

    I only read the first two then lost interest… the first one still allows usernames like “—” etc.

    And I hate it oh so much when websites restrict me to weak passwords that can’t be too long or contain special characters. Even my bank does this.

    Kudos for trying though.

  • http://jason.karns.name Jason

    The email regex should be removed entirely. Sites like this provide an invaluable resource for beginners for most development issues. However, with respect to regexes that generally end up just copy/pasted, you really shouldn’t be posting something that isn’t completely fleshed out.
    The email regex has already prevented everyone with a gmail address from signing in with a ‘+’ sign. You’ve kicked out half of the world for those with country domains (@example.co.uk) as you only match one dot after the @. Not to mention the legal email addresses that use IP addresses (email@21.192.56.78). Further, the regex completely ignores quoted values.

    • Anonymouse

      I agree completely with Jason. Please do not encourage bad regex usage. Read the RFC for what’s allowed as an email address before writing something like this. At the very least warn your users not to copy and paste from here!

  • http://vasili.duove.com/ Vasili

    Regarding the email regex:

    I knew it would be almost impossible to write a regex (with my knowledge) that doesn’t take years to write. You can use filter_var, but seeing as how that’s only available in PHP5.2 and up, I thought it would be nice to show how that function worked (to some extent).

    • http://blog.jeffreymcmanus.com/ Jeffrey

      It’s not impossible. Just use the regex that Cal Henderson wrote that I linked to above. It’s complex, but it has the benefit of having been written to the RFC standard for email addresses.

  • Justin

    although i know all these by now, still a great reference for beginners. nice post

  • dreake

    Hi,
    This is my URL validation regex:
    '/^(ftp|http|https|gopher|mailto|news|nntp|telnet|wais|file|prospero|aim|webcal|www){1}(\.|:\/\/).{2,}\..{2,6}$/'

  • http://toolskyn.nl Ruben

    I just keep wondering why people prevent special characters in passwords. This only reduces the search space for brute-force password cracking. And really, that saves a huge amount of time if you only have to look at a basic alphabet.

    I would really urge developers not to prevent users from making passwords that are more difficult to guess. You could, however, prevent simpler passwords. For example, you might require users to use at least one special character, or at least one number. Do note that requiring more from a password may stop users from filling it in at all, so don’t require too much if the risks aren’t too high on your site or application.

    • http://46bit.com Michael Mokrysz

      My favourite technique is, after checking 6 <= length 3 characters, then check there’s an uppercase & lowercase char (and for admin panels and sites that need greater security, a number & possibly symbol as well).

      I’m interested if anyone has any major criticism of my approach – always good to have people look for improvements in your code.

  • Adrian Bloomer

    i use /^[^@:\/]+@[^:@\/]\..{2,4}/ for emails.

    • Adrian Bloomer

      oops i meant /^[^@:\/\s\?]+@[^:@\/\s\?]\.\d{2,4}$/

      • Jonathan

        Can you please explain this example? I have used quite a few regular expressions in my day and can’t for the life of me make sense of this one. This still has the domain issue, as well as the IP issue that was brought up earlier.

        It drives me crazy, as someone who has waded through countless message boards and tutorials to learn about regular expressions to see this kind of sloppy code posted without a thought to whether or not it is a worthwhile solution

  • http://www.vectormesh.info Shibi Kannan

    Very useful article. Just what I was looking for to do some form customizations. Regular expressions are a bit complicated for new folks, but in the long run they save a lot of time, help validate input fields, and also help prevent spam. I like the ongoing discussion on this subject; it seems there are many ways of doing the same thing with regular expressions.

  • http://www.phpandstuff.com Lane

    Please do not manually write e-mail validation regex. It’s more complicated than it seems. And it’s impractical to come up with a single regex call that can do the job flawlessly. Good implementations involve tens of lines of code.

    I recommend using a trusted third party implementation. Even the PHP filter_var function is not so good. It will verify ‘a@a’ as a valid e-mail.

    • David Jordan

      a@a can be valid. For example on intranets you can set up mail that is something like: admin@localhost

      So because it is on an intranet there is no TLD so it is a valid email address for that particular intranet

  • http://horuskol.net HorusKol

    Nice patterns, and some are useful…

    Unfortunately, you fell into the email trap:

    /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

    for one thing – you don’t want to include a . in the TLD at the end
    also, some people insist on capitalising personal names in emails
    finally, new ICANN regulations mean that there is no limit on the length of TLDs

    about the only suitable validation for an email is to check that there is an @ and at least one . after it – anything else is a waste of processor cycles, especially as a ‘well-formed’ email address doesn’t guarantee existence…

  • concerned…

    While being a good tutorial in understanding regular expressions in general, the thought of these being “canned solutions” to regular problems is greatly concerning. As several people have pointed out, the email regex is flat out wrong (doesn’t follow the RFC), and the username doesn’t follow standard UNIX requirements (e.g. user names must start with a lower case letter). Here’s the actual regex filter from /etc/adduser.conf on my Debian system:

    NAME_REGEX="^[a-z][-a-z0-9]*\$"

  • http://horuskol.net HorusKol

    also – why bother pattern matching a password input?

    you should be encrypting passwords (preferably with salting) – so you should know that you are going to be testing a 32 or 40 character hex string against.

    and because you are encrypting – ‘special’ characters are not an issue.

    • http://vasili.duove.com/ Vasili

      For the scripts I’ve written, I check the user’s password and store a salted/md5ed/hashed password. :)

    • ken

      You check a PW to make sure it meets complexity requirements, or do you prefer to let users use “aaaaaa” for a PW?

      It makes no sense to test an encrypted PW — when would you do this?

  • ken

    Use \w!

    And you have \w annotated wrong in the graphic — it should be “any letter, number, or underscore”.

  • David Jordan

    A better username one would be:

    /^[a-z]{1,}[a-z0-9 _-]{3,20}$/i

    This would allow only an a-z character as the first letter. This will stop a space or underscore ( _ ) or hyphen ( – ) from being first.

    Also I noticed on most of the regexes that it did not have the /i at the end to say that it is case-insensitive, so on most of the regexes only lowercase alpha characters are matched and not uppercase.

    So I think the html hex one should be:

    /^#([a-f0-9]{6}|[a-f0-9]{3})$/i

    I agree that there doesn’t really need to be one for a password. Passwords should be allowed to contain anything, it should get encrypted anyway so it doesn’t matter to the website what it is.

    The URL one is flimsy as well as others have said.