Regular expressions... most of us don't know what they are or why they're included in common text editors.
Today I want to show those who have never used them what this powerful mechanism for managing strings is good for.
Why do so few people use them?
Probably because the syntax looks difficult, although the same is true of every programming language, and because they seem enigmatic and therefore hard to learn.
The funny thing is that most of us don't realize how much time they can save.
Backreference
For me the most powerful feature of regular expressions is the so-called backreference. It is used in search/replace operations: you enclose part of the regex in parentheses (a group) so that the text it matched can be reused, by number, in the replacement.
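To make the idea concrete outside the editor, here is a minimal sketch in Python (my choice of language for these side examples, using its standard re module). The sample name is just an assumption for illustration.

```python
import re

# Hypothetical input: a "first last" name pair.
name = "Ada Lovelace"

# Each pair of parentheses captures one word (groups 1 and 2);
# \2 and \1 in the replacement reuse the captured text in swapped order.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", name)
print(swapped)
```

The same \1, \2 numbering is what the Replace field uses in the examples of this post.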
Notepad++
My favorite free text editor. Simple, fast and very rich in plugins. One that is related to this post is the "Python scripts" plugin, which makes it possible to script more than one regex replace job. I'll dedicate a future post to it.
Notepad++ natively supports regex, and this page is very rich in descriptions and explanations: obviously better than my post.
To open the find/replace window simply press CTRL-H:
Remember to check the Regular expression checkbox as indicated by the lower arrow.
Instead of boring you with theory, I prefer to show you the importance of regex with the help of concrete examples:
Example #1.1:
We have a balance report with thousands and thousands of transactions whose dates are in the US style (MM/DD/YYYY), and we have a new tool that only imports transactions with dates in the European format (DD/MM/YYYY).
12/27/2013 - transaction 1 €12500,00
07/11/2014 - transaction 2 €500,00
07/11/2015 - transaction 3 €37500,00
...
Find: (\d{1,2})(\/)(\d{1,2})
Explanation:
Find:
(\d{1,2}) : a number with at least one digit and no more than two, preferring the maximum. It appears twice: the first occurrence (the month) becomes group 1 and the second (the day) becomes group 3 for backreference.
(\/) : followed by a "slash" char. This becomes group 2 for backreference.
Replace: \3\2\1
The result is the following:
27/12/2013 - transaction 1 €12500,00
11/07/2014 - transaction 2 €500,00
11/07/2015 - transaction 3 €37500,00
...
Elapsed time: ~30s
Saved time: haahhaha... days
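If you prefer to see the same swap outside Notepad++, here is a rough Python equivalent. It is only a sketch: the count=1 argument is my own addition, to limit the swap to the first match on each line, and is not part of the Notepad++ replace above.

```python
import re

lines = [
    "12/27/2013 - transaction 1 €12500,00",
    "07/11/2014 - transaction 2 €500,00",
    "07/11/2015 - transaction 3 €37500,00",
]

# Same three groups as in the Find field; "/" needs no escaping in Python.
pattern = re.compile(r"(\d{1,2})(/)(\d{1,2})")

# \3\2\1 writes day, slash, month: the two numbers trade places.
converted = [pattern.sub(r"\3\2\1", line, count=1) for line in lines]
for line in converted:
    print(line)
```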
Example #1.2:
Starting from the data of example #1.1, we need to move the transaction number to the front and keep all the columns aligned. The word "transaction" is no longer necessary.
Find: (^\d{1,2}\/\d{1,2}\/\d{4}).*transaction\h{1,}(\d{1,}).*(€.*$)
Explanation:
Find:
(^\d{1,2}\/\d{1,2}\/\d{4}) : starting from the beginning of the line ('^' char), a number with at least one digit and no more than two, preferring the maximum, followed by a "slash" char, another one- or two-digit number, another "slash" char and a number with exactly four digits: in other words, a date. This becomes group 1 for backreference.
.*transaction\h{1,} : any sequence of chars followed by the word "transaction" and by one or more horizontal spaces (blank or tab).
(\d{1,}) : a number with one or more digits, preferring the maximum. This becomes group 2 for backreference.
.* : any chars until...
(€.*$) : an € symbol followed by any chars until the end of the line. This becomes group 3 for backreference.
Replace: \2\t\1\t\3 <- note the tabs for aligning the columns
The result is the following:
1 12/27/2013 €12500,00
2 07/11/2014 €500,00
3 07/11/2015 €37500,00
...
Elapsed time: ~60s
Saved time: haahhaha... days
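The same reordering can be sketched in Python. One assumption to note: Python's re module has no \h shorthand, so I replaced it with the character class [ \t], which covers the blank and the tab.

```python
import re

lines = [
    "12/27/2013 - transaction 1 €12500,00",
    "07/11/2014 - transaction 2 €500,00",
    "07/11/2015 - transaction 3 €37500,00",
]

# Group 1: the date, group 2: the transaction number, group 3: the amount.
# [ \t]+ stands in for \h{1,}, which Python's re does not support.
pattern = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}).*transaction[ \t]+(\d+).*(€.*)$")

# Number first, then date, then amount, separated by tabs.
reordered = [pattern.sub(r"\2\t\1\t\3", line) for line in lines]
for line in reordered:
    print(line)
```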
Example #1.3:
Continuing from the previous example, we need to remove all the transactions with an amount lower than €1000,00.
Find: (^.*€\d{1,3}\p{punct}.*\v{1,})
Explanation:
Find:
(^.*€ : any chars from the beginning of the line ('^' char) up to the '€' symbol.
\d{1,3}\p{punct}.* : followed by a number with at least one digit and no more than three (999 at most), followed by a punctuation character and by any chars until...
\v{1,}) : ... one or more vertical whitespace chars. These include the line feed (0x0A), vertical tab (0x0B), form feed (0x0C) and carriage return (0x0D) control characters.
Replace: <- nothing, we need to delete everything!
The result is the following:
1 12/27/2013 €12500,00
3 07/11/2015 €37500,00
...
Elapsed time: ~30s
Saved time: hours
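Here is a hedged Python sketch of the same deletion. Python's re module has neither \p{punct} nor the vertical-whitespace meaning of \v, so two stand-ins are my own assumptions: [,.] for the punctuation char and (?:\r?\n)+ for the line breaks.

```python
import re

# Tab-separated lines, as produced by example #1.2.
text = (
    "1\t12/27/2013\t€12500,00\n"
    "2\t07/11/2014\t€500,00\n"
    "3\t07/11/2015\t€37500,00\n"
)

# After the € sign, at most three digits may come before the punctuation
# char, so only amounts up to 999 match. re.MULTILINE lets ^ match at the
# start of every line, not only at the start of the text.
pattern = re.compile(r"^.*€\d{1,3}[,.].*(?:\r?\n)+", re.MULTILINE)

# Replacing with the empty string deletes the matching lines,
# line break included.
kept = pattern.sub("", text)
print(kept, end="")
```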
Example #2.1:
In the following example, suppose you have a website with password authentication and you want to check whether a registering user is entering a good password.
For this exercise a good password has:
1- at least eight chars
2- at least one lowercase char
3- at least one uppercase char
4- at least one digit
We test our exercise on the following examples:
QD6Lb5Lm
kLTa9ek
F2JVyWcD
zsCh4SJ73e
cjxyg9
hTduy6TXc
6fngt
eJz4AdRR
C9Xj
srm3T
aaaaaaaa
BBBBBBBB
77uhb3HYccee
5PEbm36U
2LXJKESK
v9XRNGzZ
6mtFbHhRdew
UTd4jrf2
XFQsADdRs
t3A3avmvd
JHe2vTPv
Find: ((?=.*\p{lower})(?=.*\p{upper})(?=.*\p{digit}).{8,})
Replace: \1\t\t\t\tgood password
Explanation:
Find:
(?=.*\p{lower}) : a lookahead that checks that at least one lowercase letter follows somewhere ahead; it consumes nothing, so the match position stays where it was
(?=.*\p{upper}) : the same check for at least one uppercase letter
(?=.*\p{digit}) : the same check for at least one digit
.{8,} : once all the checks have passed, match at least 8 chars of any kind.
(...) : the whole pattern becomes group 1 for backreference
Replace:
group 1 (the password) followed by 4 tabs and the "good password" string
Here's the result:
QD6Lb5Lm good password
kLTa9ek
F2JVyWcD good password
zsCh4SJ73e good password
cjxyg9
hTduy6TXc good password
6fngt
eJz4AdRR good password
C9Xj
srm3T
aaaaaaaa
BBBBBBBB
77uhb3HYccee good password
5PEbm36U good password
2LXJKESK
v9XRNGzZ good password
6mtFbHhRdew good password
UTd4jrf2 good password
XFQsADdRs
t3A3avmvd good password
JHe2vTPv good password
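For a website you would more likely run this check in code than in a text editor, so here is a minimal Python sketch of the same lookahead idea. Python's re module does not support \p{lower}, \p{upper} or \p{digit}, so I assume plain ASCII passwords and use [a-z], [A-Z] and \d instead. The function name is_good_password is hypothetical.

```python
import re

# Three lookaheads check the requirements without consuming any chars,
# then .{8,} demands at least eight chars in total.
GOOD_PASSWORD = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}")

def is_good_password(candidate: str) -> bool:
    """Return True if the whole candidate satisfies all four rules."""
    return GOOD_PASSWORD.fullmatch(candidate) is not None

# A few of the sample passwords from the list above.
samples = ["QD6Lb5Lm", "kLTa9ek", "aaaaaaaa", "2LXJKESK", "XFQsADdRs", "zsCh4SJ73e"]
good = [p for p in samples if is_good_password(p)]
print(good)
```

fullmatch is used so that the whole string has to match, which plays the role of working line by line in the editor.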
Now we want to cut the bad passwords:
Find: (^\w{1,}\v{1,})
Replace: <- nothing, we need to delete all!
Explanation:
Find:
(^\w{1,}\v{1,}) : from the beginning of the line, one or more word chars (letters, digits or underscore) followed by one or more vertical spaces (line feed and carriage return). Lines marked as good do not match, because a tab follows the password instead of a line break.
And here's the result:
QD6Lb5Lm good password
F2JVyWcD good password
zsCh4SJ73e good password
hTduy6TXc good password
eJz4AdRR good password
77uhb3HYccee good password
5PEbm36U good password
v9XRNGzZ good password
6mtFbHhRdew good password
UTd4jrf2 good password
t3A3avmvd good password
JHe2vTPv good password
Elapsed time: ~60s
Saved time: it depends on your site ;)
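The clean-up step can be sketched in Python too. In Python's re module \v only means the vertical tab, so (?:\r?\n)+ is my stand-in for \v{1,}; the short input text is a shortened, assumed copy of the tagged list.

```python
import re

# A shortened copy of the tagged list: good lines carry four tabs and a label.
tagged = (
    "QD6Lb5Lm\t\t\t\tgood password\n"
    "kLTa9ek\n"
    "F2JVyWcD\t\t\t\tgood password\n"
    "cjxyg9\n"
)

# A line made only of word chars and then a line break is an untagged
# (bad) password. Tagged lines do not match because a tab follows the
# password instead of a line break.
bad_line = re.compile(r"^\w+(?:\r?\n)+", re.MULTILINE)

cleaned = bad_line.sub("", tagged)
print(cleaned, end="")
```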
That's all for today. I'll keep this post updated with fresh examples, and if you have a problem we can try to solve it together.