Regular Expression
Teacher: Today we will learn regex and how to use it in Java.
Jonny & Alice (screaming in fear, in chorus): Sir, it is the most confusing thing there is, and it makes our lives as programmers miserable.
Teacher: (Smiles.) Yes, I used to think the same way when I was a student :). But it is not as hard as you think. You just need to remember some key points, the way you remember History and Geography.
I will try to point out those key points, and they will break your fear.
Before that, tell me why you find regex confusing, and share your confusion with me.
Alice: Sir, the common problem is that it is hard to read and write. If we have to validate a string against a complex pattern, say email validation, the regex for it is a one-liner full of backslashes, square brackets, parentheses and so on, so we are often perplexed about what it says. In simple words, just by looking at a regex solution we don’t understand what it is trying to say.
Teacher: So you mean readability, right? You prefer to write more code to avoid regex because that code feels more readable. But let me tell you, regex is a Holy Grail: if you unleash its power, you can write concise and readable code.
Jonny: Sir, another problem is that there is no fixed solution for a problem. Take email validation: if you search for it on Google, you see a ton of different solutions. So it is hard to pick the right one.
Teacher: That is because you have not understood the crux of regex. Any other problems?
Students: (Pin-drop silence.)
The teacher slowly steps towards the board and starts his lesson.
What is Regex?
The teacher said: A Regular Expression is a technique for searching for a pattern in a String. The search pattern can range from very simple to very complex: a word, a sentence, or an expression built from the meta-characters and symbols that regex defines.
To understand regex correctly we need to know the meta-characters and symbols and their meanings. This is the only thing you need to remember.
We find regex hard because we do not understand how the symbols are used.
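To make "searching for a pattern in a String" concrete, here is a minimal Java sketch using the standard java.util.regex classes. The class name RegexSearchSketch, the helper findAll and the sample sentence are made up for illustration; they are not part of the lesson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the idea of a pattern search.
final class RegexSearchSketch {

    // Returns every substring of text that matches the regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {          // find() looks for the next match anywhere in the text
            hits.add(matcher.group());    // group() is the text of the current match
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Jonny scored 42 and Alice scored 97";
        // A plain word is already a valid search pattern.
        System.out.println(findAll("scored", text)); // [scored, scored]
        // "\\d+" in Java source is the regex \d+ : one or more digits.
        System.out.println(findAll("\\d+", text));   // [42, 97]
    }
}
```

The same helper works for a simple word and for an expression built from symbols, which is the range of patterns the definition above describes.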
Let us take a look at the different symbols used in regex.
We can classify regex symbols into three groups.
- Meta-Characters.
- Ranges & reserved symbols.
- Quantifiers.
 
Meta-Characters: In regex, some reserved meta-characters have pre-defined meanings and express common patterns such as a digit or a whitespace in a compact way.
Meta Character | Expression | Alternate Expr. | Definition
To express a digit | \d | [0-9] or [^\D] | Represents a single digit character
To express anything but a digit | \D | [^0-9] or [^\d] | Represents a single non-digit character
To express a word character | \w | [a-zA-Z_0-9] or [^\W] | Represents a single word character (letter, digit or underscore)
To express anything but a word character | \W | [^a-zA-Z_0-9] or [^\w] | Represents a single non-word character
To express a whitespace | \s | [ \t\n\x0B\f\r] or [^\S] | Represents a single whitespace character such as a space, \r, \t or \n
To express anything but a whitespace | \S | [^ \t\n\x0B\f\r] or [^\s] | Represents a single non-whitespace character
To express a word boundary | \b | none (it has no character-class equivalent) | Represents the position between a word character (\w) and a non-word character, or the start or end of the input. It consumes no character
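The meta-characters can be tried out directly in Java. The following is a small sketch, not a definitive reference: the class name MetaCharacterSketch and the helper found are invented for this example, and the sample inputs are assumptions. Note that in Java source every backslash of the regex is doubled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the meta-characters \d \D \w \W \s \S and \b.
final class MetaCharacterSketch {

    // True when the regex matches somewhere inside the input (partial match).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // String.matches requires the WHOLE string to match the regex.
        System.out.println("7".matches("\\d"));   // true: a digit
        System.out.println("x".matches("\\D"));   // true: not a digit
        System.out.println("_".matches("\\w"));   // true: underscore is a word character
        System.out.println("-".matches("\\W"));   // true: hyphen is not a word character
        System.out.println("\t".matches("\\s"));  // true: tab is whitespace
        System.out.println("a".matches("\\S"));   // true: not whitespace

        // \b consumes nothing; it only checks that a word starts or ends here.
        System.out.println(found("\\bcat\\b", "a cat sat"));   // true
        System.out.println(found("\\bcat\\b", "concatenate")); // false: cat is inside a word
    }
}
```

The last two lines show why \b cannot be replaced by a character class: it matches a position, not a character.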
Ranges & reserved symbols: When we match a pattern, we often have to state extra information: whether the match must sit at the beginning or the end of the line, which set or range of characters is allowed at a position, or how alternatives and sub-patterns are grouped. We define these using ranges and reserved symbols.
Symbol | Description | Example | Example Definition
. | Any character | .ha. | Any character, followed by ha, then any character. sham: Matched; gyan: Not Matched
^ | Checks the beginning of the line | ^sha | Matches if the line starts with sha. sham: Matched; Aha: Not Matched
$ | Checks the end of the line | tra$ | Matches if the line ends with tra. Mitra: Matched; Chakra: Not Matched
[xyz] | Matches either x or y or z | a[xyz] | ax: Matched; aa: Not Matched
[xyz][abc] | Matches x, y or z followed by a, b or c | s[hwo][abc] | sha: Matched; sou: Not Matched
XA | Exactly X followed by A | sm | sm: Matched; Sa: Not Matched
X|A | X or A | s(X|A) | sX: Matched; sZ: Not Matched. Inside square brackets | is a literal character, so the alternation goes in parentheses
[^abc] | Any character except a, b or c. Remember: when ^ is used inside square brackets it negates the set | s[^abc]m | shm: Matched; sam: Not Matched
[a-c1-9] | Matches a character from a to c or a digit from 1 to 9. Remember: a range spans single characters, so [1-10] means 1 or 0, not one to ten | s[x-z1-9] | sy: Matched; sb: Not Matched
() | Used for grouping | (s[^yz])(a|b)([a-c1-9]) | sab1: Matched; shac: Matched; syab: Not Matched; sabb: Matched
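The rows above can be checked with a short Java sketch. The class name RangeSymbolSketch and the helpers found and group are invented for illustration. The sketch assumes that "Matched" means the whole string matches, except for ^ and $, where a partial search is used. It also shows two easy mistakes: | is an ordinary character inside square brackets, and a range such as 1-10 is read as the range 1-1 plus the single character 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for ranges, anchors, alternation and grouping.
final class RangeSymbolSketch {

    // True when the regex matches somewhere inside the input (partial match).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Text captured by the numbered group, or null when the whole input does not match.
    static String group(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        System.out.println("sham".matches(".ha."));        // true
        System.out.println("gyan".matches(".ha."));        // false
        System.out.println(found("^sha", "sham"));         // true
        System.out.println(found("^sha", "Aha"));          // false
        System.out.println(found("tra$", "Mitra"));        // true
        System.out.println(found("tra$", "Chakra"));       // false
        System.out.println("sha".matches("s[hwo][abc]"));  // true
        System.out.println("sam".matches("s[^abc]m"));     // false: a is excluded

        // Alternation needs a group; inside [...] the | is just a character.
        System.out.println("sX".matches("s(X|A)"));        // true
        System.out.println("s|".matches("s[X|A]"));        // true

        // A range covers single characters only: [1-10] is the set {1, 0}.
        System.out.println("5".matches("[1-10]"));         // false
        System.out.println("0".matches("[1-10]"));         // true

        // Parentheses also capture: group 1 is the part matched by (s[^yz]).
        System.out.println(group("(s[^yz])(a|b)([a-c1-9])", "shac", 1)); // sh
    }
}
```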
Quantifiers: A quantifier says how many times the pattern just before it may occur in a String.
Quantifier | Description | Example | Example Definition
* | Pattern can occur zero or more times | s(\s)*m | sm: Matched; s    m: Matched; s:m: Not Matched
+ | Pattern can occur one or more times | s(\s)+m | s    m: Matched; sm: Not Matched
? | Pattern can occur zero or one time | s(h)?a | sha: Matched; sa: Matched; shha: Not Matched
{X} | Pattern must occur exactly X times | s(\d){3} | s123: Matched; s1234: Not Matched; s1: Not Matched
{X,Y} | Pattern must occur at least X and at most Y times | s(\d){2,4} | s12: Matched; s1: Not Matched; s12345: Not Matched
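Here is a small Java sketch of the quantifier rows. The class name QuantifierSketch and its helper are made up for this example. It assumes, as the table does, that a string counts as "Matched" only when the whole string fits the pattern, which is how String.matches behaves.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the quantifiers * + ? {X} and {X,Y}.
final class QuantifierSketch {

    // Full-string match, the meaning of "Matched" assumed in the table.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("s\\s*m", "sm"));         // true: zero whitespace is allowed
        System.out.println(matches("s\\s*m", "s    m"));     // true
        System.out.println(matches("s\\s*m", "s:m"));        // false
        System.out.println(matches("s\\s+m", "sm"));         // false: at least one whitespace needed
        System.out.println(matches("sh?a", "sa"));           // true
        System.out.println(matches("sh?a", "shha"));         // false: h may occur once at most
        System.out.println(matches("s\\d{3}", "s123"));      // true
        System.out.println(matches("s\\d{3}", "s1234"));     // false: one digit too many
        System.out.println(matches("s\\d{2,4}", "s12"));     // true
        System.out.println(matches("s\\d{2,4}", "s12345"));  // false

        // A partial search is more lenient: s\d{3} is found INSIDE s1234.
        System.out.println(Pattern.compile("s\\d{3}").matcher("s1234").find()); // true
    }
}
```

The last line is the reason the difference between a full match and a partial search matters when reading "Not Matched" in the table.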
Email Validation:
Teacher: So Jonny, earlier you said that email validation is confusing. Now can you two tell me what the email validation below says? It is written as it appears in a Java string, so every backslash is doubled.
^[A-Za-z0-9]+(\\.[A-Za-z0-9-]+)*
@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$
Jonny: Yes sir. The first part, ^[A-Za-z0-9]+, says the email must start with a letter or a digit. ^ denotes the start of the line and + means one or more, so the email starts with one or more letters or digits, of any length.
Teacher: Very good. Alice, you tell me the second part.
Alice: (\\.[A-Za-z0-9-]+)* says that the first part can be followed by a dot and then at least one letter, digit or hyphen. This part is optional and can repeat, because * comes at the end.
Teacher: Impressive.
Jonny: @[A-Za-z0-9-]+ strictly matches @ and then at least one letter, digit or hyphen, as + is there.
Alice: (\\.[A-Za-z0-9]+)* is again a dot followed by at least one letter or digit, and it is optional again.
Jonny: (\\.[A-Za-z]{2,})$ says the email ends ($) with a dot followed by letters in a-z or A-Z, at least two of them and with no upper limit.
Teacher: Great. Now Alice, tell me a valid email according to this regex.
Alice: shamik.mitra@gmail.co.in or shamik@gmail.com
Teacher: Good. Jonny, tell me an invalid one.
Jonny: shamik.mitra@co.i or .mitra@gmail.co.uk
Teacher: Well, it seems you are learning regex very quickly. Before we finish today's lesson, here is one tip: stick the tables above on your desk so you can go through the regex symbols every day. Then you will remember regex easily.
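For readers who want to run the lesson's regex, here is a minimal Java sketch. The class name EmailRegexSketch and the method isValid are invented for this example; the pattern is the one discussed above, and the test addresses are the ones Alice and Jonny gave. It checks only what this particular regex checks and is not a complete email validator.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the email regex from the lesson.
final class EmailRegexSketch {

    // The pattern is compiled once and reused; each \\ below is a single \ in the regex.
    private static final Pattern EMAIL = Pattern.compile(
            "^[A-Za-z0-9]+(\\.[A-Za-z0-9-]+)*"                      // local part
          + "@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");  // domain and final suffix

    // True when the whole address fits the lesson's pattern; null is treated as invalid.
    static boolean isValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("shamik.mitra@gmail.co.in")); // true
        System.out.println(isValid("shamik@gmail.com"));         // true
        System.out.println(isValid("shamik.mitra@co.i"));        // false: suffix has one letter
        System.out.println(isValid(".mitra@gmail.co.uk"));       // false: starts with a dot
    }
}
```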





