Those (?=...) bits are positive lookahead assertions:
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
The (?!...) one is a negative lookahead assertion.
The $& variable doesn’t really matter outside of Perl. It contains the text that the last pattern matched, but even within Perl, capture groups are preferred. On Perl versions before 5.20, using it anywhere also slows down every regex match in your program, which is especially bad in long-running web server environments.
What really matters is that the lookaheads don’t consume any text. In other words, the pointer that shows where in the text we are doesn’t increment; once we’re outside of the lookahead, we’re still right back in the same place.
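To make the "doesn’t consume any text" point concrete, here is a minimal sketch in Python rather than Perl (Python’s `re` module supports the same `(?=...)` and `(?!...)` syntax). The sample strings and variable names are made up for illustration.

```python
import re

text = "abc123"

# A lookahead on its own matches at position 0 and has zero width:
# it peeked ahead to find a digit, but the match itself is empty.
zero_width = re.match(r"(?=.*[0-9])", text)
print(zero_width.span())   # (0, 0)

# Once the lookahead succeeds, the rest of the pattern still starts
# at position 0, so the letters are matched from the beginning.
after = re.match(r"(?=.*[0-9])([a-z]+)", text)
print(after.group(1))      # abc

# A negative lookahead succeeds only when its subpattern cannot match.
no_space = re.match(r"(?!.*\s).*", "no_spaces") is not None
has_space = re.match(r"(?!.*\s).*", "has space") is not None
print(no_space, has_space)  # True False
```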
So let’s break this down using the /x modifier to make it somewhat sane.
/^
  (?!.*\s)        # no whitespace allowed
  (?=.{8,256}$)   # between 8 and 256 characters (the '$' here indicating the end of the string)
  (?=.*[a-z])     # has to be a lowercase ASCII letter somewhere
  (?=.*[A-Z])     # has to be an uppercase ASCII letter somewhere
  (               # need a number, or one of a list of special chars on a US keyboard
      (?=.*[0-9])
    | (?=.*[~!@#$%^&*()-=_+[\]{}|;:,./<>?])
  )
  .*              # consumes the whole string
$/x
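For anyone who wants to poke at this outside of Perl, below is a rough Python port using `re.VERBOSE` (Python’s equivalent of /x). It is a sketch, not the original author’s code. The names `PASSWORD_RE` and `is_valid` are made up, the group is made non-capturing, and the `-` and `[` inside the character class are escaped. In the original, `)-=` is technically a range from `)` to `=`, but the extra characters it pulls in are digits and symbols the pattern already allows, so the accepted set should be the same.

```python
import re

PASSWORD_RE = re.compile(
    r"""
    ^
    (?!.*\s)            # no whitespace allowed
    (?=.{8,256}$)       # between 8 and 256 characters
    (?=.*[a-z])         # at least one lowercase ASCII letter
    (?=.*[A-Z])         # at least one uppercase ASCII letter
    (?:                 # a digit, or a US-keyboard special character
        (?=.*[0-9])
      | (?=.*[~!@\#$%^&*()\-=_+\[\]{}|;:,./<>?])
    )
    .*                  # consumes the whole string
    $
    """,
    re.VERBOSE,
)


def is_valid(password: str) -> bool:
    """Return True if the password satisfies every lookahead above."""
    return PASSWORD_RE.match(password) is not None


print(is_valid("Passw0rd"))      # True: lower, upper, digit, 8 chars
print(is_valid("password1"))     # False: no uppercase ASCII letter
print(is_valid("Pass w0rd"))     # False: contains whitespace
print(is_valid("Пароль12345"))   # False: no ASCII letters at all
```

The last case illustrates the first note below: non-ASCII letters are not rejected outright, but they never count toward the uppercase/lowercase requirements.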
Notes:
Doesn’t make any allowances for non-English characters, or even non-US characters (like the “£” character in the UK)
There’s a whole slew of Unicode characters out there that you should let through
There’s no reason to deny whitespace; let people use passphrases if they want (but then, you also don’t want to block those people for not using symbols)
Putting a limit at 256 is questionable, but may not necessarily be wrong
That last one has some nuance. We often say you shouldn’t put any upper limit, but that’s generally not true in the real world. You don’t want someone flooding an indefinite amount of data into any field, password or not. A large limit like this is defensible.
Also, lots of devs are surprised to learn that bcrypt has an input length limit of 72 bytes (scrypt, which often gets lumped in with it, does not). A way around this is to run your input through SHA256 before giving it to bcrypt.
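A minimal sketch of that pre-hashing step, using only the Python standard library. The function name `prehash` is made up for illustration. The digest is base64-encoded here rather than passed raw, on the assumption that some bcrypt implementations treat a NUL byte as the end of the input. The actual bcrypt call is left as a comment because it needs a third-party package.

```python
import base64
import hashlib

BCRYPT_MAX_BYTES = 72  # bcrypt's input limit, as discussed above


def prehash(password: str) -> bytes:
    """Collapse a password of any length to 44 ASCII bytes."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)


# Two passwords that only differ after the 72nd byte.
a = "x" * 72 + "A"
b = "x" * 72 + "B"

# Cut off at 72 bytes, they are indistinguishable...
same_prefix = a.encode()[:BCRYPT_MAX_BYTES] == b.encode()[:BCRYPT_MAX_BYTES]
# ...but their pre-hashes differ and fit comfortably within the limit.
distinct = prehash(a) != prehash(b)
print(same_prefix, distinct, len(prehash(a)))  # True True 44

# Hypothetical usage with the third-party `bcrypt` package:
# hashed = bcrypt.hashpw(prehash(password), bcrypt.gensalt())
```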
I don’t see a reason to limit the length as long as the password hash can handle large values. I am green when it comes to the inner workings of password hashing, so I may be wrong.
Being able to handle it, and being able to handle it efficiently enough are two very distinct things. The hash method might be able to handle long strings, but it might take several seconds/minutes to process them, slowing down the application significantly. Imagine a malicious user being able to set a password with millions (or billions!) of characters.
Therefore, restricting it to a modest, but still sufficiently large, number of characters might help prevent DoS attacks without any notable reduction in security for regular users.
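One way to act on this, sketched in Python: check the length before the input ever reaches the (deliberately slow) hash function. The 8 and 256 bounds are borrowed from the regex earlier in the thread, and the names `MAX_PASSWORD_CHARS` and `accept_for_hashing` are made up. Note that this version lets whitespace and non-ASCII characters through, so passphrases pass.

```python
MIN_PASSWORD_CHARS = 8     # assumption: same lower bound as the regex
MAX_PASSWORD_CHARS = 256   # assumption: same upper bound as the regex


def accept_for_hashing(password: str,
                       max_chars: int = MAX_PASSWORD_CHARS) -> bool:
    """Cheap length gate to run before any expensive hashing."""
    return MIN_PASSWORD_CHARS <= len(password) <= max_chars


print(accept_for_hashing("correct horse battery staple"))  # True
print(accept_for_hashing("x" * 1_000_000))                 # False
```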
Nerd!
As someone who spent many years as a Perl developer, I immediately recognized the incantations to the regex gods of old, heh. Great explanation!
Honestly, white space is a character, and adds extra entropy to passwords. I do not understand why people do not want to promote using white space in passwords/passphrases. If I’m missing something intrinsically bad about white space in passwords, I’d love to know.
You’re right on. As long as you’re otherwise following best practices for storing passwords, there’s no downside.
But but but if I add it to the queryparams for my rest endpoint the space will break my URL!
Always urlencode your passwords!
Wait, that doesn’t seem right…