The confusing RegEx

Regular Expression




Teacher: Today we will learn regex and how to use it in Java.
Jonny and Alice cried out in fear and said in chorus: Sir, regex is the most confusing thing there is; it makes our lives as programmers miserable.
Teacher (smiling): Yes, I used to think the same way when I was a student :). But it is not as hard as you think. You just need to remember a few key points, the way you remember facts in History and Geography.
I will point out the keys that will break your fear.
Before that, tell me why you find regex confusing, and share your confusion with me.
Alice: Sir, the common problem is that regex is hard to read and write. If we have to validate a string against a complex pattern, say an email address, the regex for it is a one-liner full of backslashes, square brackets, parentheses and so on, so we are often perplexed about what it says. In simple words, just by looking at a regex solution we cannot understand what it is trying to say.
Teacher: So you mean readability, right? You prefer to write more code to avoid regex because the longer code feels more readable. But let me tell you, regex is a Holy Grail: if you unleash its power, you can write concise and readable code.

Jonny: Sir, another problem is that there is no fixed solution for a problem. Take email validation: if you search for it on Google, you will find a ton of different solutions. So it is hard to pick the right one.
Teacher: That is because you have not understood the crux of regex. Any other problems?
Students: (pin-drop silence)
The teacher slowly steps towards the board and starts his lesson.

What is Regex?
The teacher said: A regular expression is a technique for searching for a pattern in a String. The search pattern can range from very simple to very complex: a word, a sentence, or an expression built from the different metacharacters and symbols used in regex.
To understand regex correctly we need to know the metacharacters/symbols and their meanings. This is the only thing you need to remember.
We find regex hard because we do not understand how the symbols are used.
Let us take a look at the different symbols used in regex.
We can classify regex symbols into three categories.

  1. Meta-Characters.
  2. Ranges & reserved symbols.
  3. Quantifiers.



Meta-Characters: In regex, some reserved metacharacters have predefined meanings and express common patterns such as a digit or a whitespace in a compact way.



Meta Character                            Expression  Alternate Expr.            Definition
To express a digit                        \d          [0-9] or [^\D]             A digit character
To express anything but a digit           \D          [^0-9] or [^\d]            A non-digit character
To express a word character               \w          [a-zA-Z_0-9] or [^\W]      A word character (letter, digit or underscore)
To express anything but a word character  \W          [^a-zA-Z_0-9] or [^\w]     A non-word character
To express a whitespace                   \s          [ \t\n\x0B\f\r] or [^\S]   Any whitespace character such as a space, \t, \n or \r
To express anything but a whitespace      \S          [^ \t\n\x0B\f\r] or [^\s]  Any non-whitespace character
To express a word boundary                \b          none (matches a position)  The zero-width position between a word character and a non-word character
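
To see the metacharacters in action, here is a minimal Java sketch. The class name MetaCharacterDemo, its two helper methods and the sample strings are only illustrative assumptions, not part of the lesson. Note that in Java source every regex backslash is written twice, so \d becomes "\\d".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
class MetaCharacterDemo {

    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d", "7"));   // true: 7 is a digit
        System.out.println(fullMatch("\\D", "7"));   // false: 7 is not a non-digit
        System.out.println(fullMatch("\\w", "_"));   // true: underscore is a word character
        System.out.println(fullMatch("\\W", "#"));   // true: # is a non-word character
        System.out.println(fullMatch("\\s", "\t"));  // true: tab is whitespace
        System.out.println(fullMatch("\\S", " "));   // false: space is whitespace
        // \b matches a position, not a character.
        System.out.println(found("\\bcat\\b", "a cat sat"));    // true: cat stands alone
        System.out.println(found("\\bcat\\b", "concatenate"));  // false: cat is inside a word
    }
}
```

The last two lines show why \b has no character-class equivalent: it consumes nothing and only checks what sits on either side of the position.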




Ranges & reserved symbols: When we match a pattern, we often have to state extra information, such as whether the match must start at the beginning of the string or finish at its end, which characters are allowed at a given position, or how alternatives are grouped. We express these using ranges and reserved symbols.




Symbol      Description                              Example                  Example Definition
.           Any single character                     .ha.                     Any character, then ha, then any character: sham (matched), gyan (not matched)
^           Start of the line                        ^sha                     Matches if the line starts with sha: sham (matched), Aha (not matched)
$           End of the line                          tra$                     Matches if the line ends with tra: Mitra (matched), Chakra (not matched)
[xyz]       Either x, y or z                         a[xyz]                   ax (matched), aa (not matched)
[xyz][abc]  x, y or z followed by a, b or c          s[hwo][abc]              sha (matched), sou (not matched)
XA          Exactly X followed by A                  sm                       sm (matched), Sa (not matched)
X|A         X or A                                   s(X|A)                   sX (matched), sZ (not matched)
[^abc]      Any one character except a, b or c       s[^abc]m                 shm (matched), sam (not matched)
[a-c1-9]    Any one character from a to c or 1 to 9  s[x-z1-9]                sy (matched), sb (not matched)
()          Grouping                                 (s[^yz])(a|b)([a-c1-9])  sab1 (matched), shac (matched), syab (not matched), sabb (matched)

Remember: inside square brackets, ^ negates the class and | is an ordinary character, so alternation needs parentheses, as in s(X|A).
Remember: a character class always matches a single character, so a range cannot end at 10. Written as [a-c1-10], the class means a to c, 1 to 1, or 0; use [a-c1-9] or [a-c0-9] for digit ranges.
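
The anchors, character classes and groups can be tried out with a short Java sketch like the one below. The class RangeSymbolDemo and its helpers are hypothetical names chosen for this illustration. It uses find() for the ^ and $ examples, because those only look at one end of the line, and matches() where the whole string has to fit the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
class RangeSymbolDemo {

    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(found("^sha", "sham"));           // true: starts with sha
        System.out.println(found("^sha", "Aha"));            // false
        System.out.println(found("tra$", "Mitra"));          // true: ends with tra
        System.out.println(found("tra$", "Chakra"));         // false: ends with kra
        System.out.println(fullMatch("s[^abc]m", "shm"));    // true: h is not a, b or c
        System.out.println(fullMatch("s[^abc]m", "sam"));    // false
        // Alternation needs a group; inside [] the | is a literal character.
        System.out.println(fullMatch("s(X|A)", "sX"));       // true
        System.out.println(fullMatch("s[X|A]", "s|"));       // true: | matched literally
        // A class matches one character: [a-c1-10] is a-c, 1, or 0.
        System.out.println(fullMatch("[a-c1-10]", "0"));     // true
        System.out.println(fullMatch("[a-c1-10]", "5"));     // false
        System.out.println(fullMatch("(s[^yz])(a|b)([a-c1-9])", "sab1")); // true
        System.out.println(fullMatch("(s[^yz])(a|b)([a-c1-9])", "syab")); // false
    }
}
```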






Quantifiers: A quantifier says how many times a pattern may occur in a String.



Quantifier  Description                                          Example     Example Definition
*           Pattern can occur zero or more times                 s(\s)*m     sm (matched), s    m (matched), s:m (not matched)
+           Pattern can occur one or more times                  s(\s)+m     s    m (matched), sm (not matched)
?           Pattern can occur zero or one time                   s(h)?a      sha (matched), sa (matched), shha (not matched)
{X}         Pattern must occur exactly X times                   s(\d){3}    s123 (matched), s1234 (not matched), s1 (not matched)
{X,Y}       Pattern must occur at least X and at most Y times    s(\d){2,4}  s12 (matched), s1 (not matched), s12345 (not matched)
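
The quantifier examples can be checked with a small Java sketch. QuantifierDemo and its helpers are hypothetical names used only here. The "not matched" results assume the whole string must fit the pattern, which is what Pattern.matches checks; the last line shows that find() is more lenient.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
class QuantifierDemo {

    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("s(\\s)*m", "sm"));         // true: zero spaces allowed
        System.out.println(fullMatch("s(\\s)*m", "s    m"));     // true: many spaces allowed
        System.out.println(fullMatch("s(\\s)*m", "s:m"));        // false: : is not whitespace
        System.out.println(fullMatch("s(\\s)+m", "sm"));         // false: at least one space needed
        System.out.println(fullMatch("s(h)?a", "sa"));           // true: h is optional
        System.out.println(fullMatch("s(h)?a", "shha"));         // false: h at most once
        System.out.println(fullMatch("s(\\d){3}", "s123"));      // true: exactly three digits
        System.out.println(fullMatch("s(\\d){3}", "s1234"));     // false: four digits
        System.out.println(fullMatch("s(\\d){2,4}", "s12"));     // true: two to four digits
        System.out.println(fullMatch("s(\\d){2,4}", "s12345"));  // false: five digits
        // find() accepts a match inside a longer string.
        System.out.println(found("s(\\d){3}", "s1234"));         // true: s123 is found
    }
}
```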



Email Validation :

Teacher: So Jonny, earlier you said that email validation is confusing. Now, can you two tell us what the email validation below says?

^[A-Za-z0-9]+(\\.[A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$

Jonny: Yes sir. The first part, ^[A-Za-z0-9]+, says that the email must start with a letter or a digit. ^ denotes the start of the line and + means one or more, so the email starts with one or more letters or digits.
Sir: Very good. Alice, you tell me the second part.
Alice: (\\.[A-Za-z0-9-]+)* says that the first part may be followed by a dot and then at least one letter, digit or hyphen. This whole group is optional and may repeat, because * comes at the end.
Sir: Impressive.
Jonny: @[A-Za-z0-9-]+ then strictly matches @ followed by at least one letter, digit or hyphen, because + is there.
Alice: (\\.[A-Za-z0-9]+)* is again a dot followed by at least one letter or digit, and again it is optional.
Jonny: (\\.[A-Za-z]{2,})$ says the email ends ($) with a dot followed by letters in a-z or A-Z, at least two of them and with no upper limit.
Sir: Great. Now Alice, tell me a valid email according to this regex.
Alice: shamik.mitra@gmail.co.in or shamik@gmail.com
Sir: Good. Jonny, tell me an invalid one.
Jonny: shamik.mitra@co.i or .mitra@gmail.co.uk
Sir: Well, it seems you are learning regex very quickly. Before we finish today's lesson, here is one tip: stick the tables above on your desk so that you can go through the regex symbols every day; then you will remember them easily.
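
The answers from Alice and Jonny can be verified with a few lines of Java. This is only a sketch: the class EmailRegexDemo and the method isValid are hypothetical names, and the pattern is the one from the lesson, not a complete validator for every legal email address. The doubled backslashes are Java string escapes, so \\. in the source is the regex \. (a literal dot).

```java
import java.util.regex.Pattern;

// Hypothetical helper built around the regex discussed in the lesson.
class EmailRegexDemo {

    // Compiled once and reused; split over two lines only for readability.
    private static final Pattern EMAIL = Pattern.compile(
            "^[A-Za-z0-9]+(\\.[A-Za-z0-9-]+)*"
            + "@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    // True when the whole string fits the lesson's email pattern.
    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("shamik.mitra@gmail.co.in")); // true
        System.out.println(isValid("shamik@gmail.com"));         // true
        System.out.println(isValid("shamik.mitra@co.i"));        // false: last part has one letter
        System.out.println(isValid(".mitra@gmail.co.uk"));       // false: starts with a dot
    }
}
```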


















The Hollywood Principle






The Hollywood Principle says: "Don't call us, we'll call you"

This little sentence opens up a new viewpoint in the software industry. It is one of the important principles every developer should know.

In this article, we will discuss this principle.

"Don't call us, we'll call you": what does it mean?

In layman's terms, you don't have to worry about when your turn will come; when it comes, they will call you.

But what do "you" and "they" denote here?

To explain "you" and "they" in technical terms, we first need to understand how software design works.

When we design software, we try to implement one of two things.

  1. API
  2. Framework.


  1. API: An API publishes methods/functions that the caller/user of the API calls to get some useful information. The caller has no other action to take: it only calls the methods and gets the output.
  2. Framework: A framework is a little more involved than an API. The framework maintains an algorithm but expects some of the values to be produced by the caller of the framework. To put it simply, the framework takes a strategy or business implementation from the caller and calls it when required.



The Hollywood Principle is what makes a framework work: "you" means the strategy or business implementation that needs to be fed in, and "they" denotes the framework engine/implementation, which calls the fed strategy when required.
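
The difference between the two styles can be sketched in a few lines of Java. Everything here is hypothetical and invented for illustration (ReportFramework, run, HollywoodSketch); it is not taken from any real framework. Math.max plays the role of an API that we call, while ReportFramework owns the loop and calls our strategy back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical mini framework: "they" in "we'll call you".
class ReportFramework {

    // The framework owns the algorithm (iterate and collect) and calls the
    // caller-supplied strategy whenever it needs a value.
    static List<String> run(List<Integer> inputs, Function<Integer, String> strategy) {
        List<String> lines = new ArrayList<>();
        for (Integer input : inputs) {
            lines.add(strategy.apply(input));
        }
        return lines;
    }
}

class HollywoodSketch {
    public static void main(String[] args) {
        // API style: we call the library and use the result ourselves.
        int max = Math.max(3, 7);

        // Framework style: we hand over a strategy ("you") and wait to be called.
        List<String> lines = ReportFramework.run(Arrays.asList(1, 2, max), n -> "item-" + n);
        System.out.println(lines); // [item-1, item-2, item-7]
    }
}
```

The caller never decides when the lambda runs; that decision belongs to ReportFramework.run, which is the inversion of control the principle describes.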

Real-world examples:

Spring DI: Think about Spring, where we declare beans in XML. The Spring container creates these beans, injects other beans into them, and returns fully configured beans. So with the help of XML we feed in the strategy, and the Spring container calls it when required. We usually call this Dependency Injection, and another name for the Hollywood Principle is IoC (Inversion of Control).

Struts 1.x: Pay attention to the Struts 1.x implementation, where the user of Struts extends the Action class and provides business implementations in it, and the Struts framework calls those Action classes based on the URL mappings in the Struts config file. So here the Action class is the strategy, and the Struts framework invokes it.

Observer pattern/Listener in Swing: Think about Swing's listeners such as ActionListener. We subscribe to an event such as a button click or a loss of focus, and when the event occurs Swing calls our code, for example the code written in the actionPerformed method.


Apart from these, Servlets and EJBs maintain lifecycles, so the underlying server calls the appropriate lifecycle method when the state of the servlet or EJB changes: init, service, destroy, or ejbActivate, ejbPassivate and so on.

We call these callback methods because the framework calls them. We don't have to call them ourselves, but we may provide implementations of them if we want to push some strategy into the framework.
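
A lifecycle with callback methods can be imitated with a very small container. The sketch below is only an assumption-laden illustration: LifecycleComponent, MiniContainer and LoggingComponent are hypothetical names loosely modelled on the init/service/destroy idea, not the real Servlet or EJB API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical lifecycle contract with three callback methods.
interface LifecycleComponent {
    void init();
    void service(String request);
    void destroy();
}

// Hypothetical container: it, not the component author, decides when each callback runs.
class MiniContainer {
    static void run(LifecycleComponent component, List<String> requests) {
        component.init();
        try {
            for (String request : requests) {
                component.service(request);
            }
        } finally {
            // destroy is called even if a request fails.
            component.destroy();
        }
    }
}

// Our code only implements the callbacks and records when they were called.
class LoggingComponent implements LifecycleComponent {
    final List<String> log = new ArrayList<>();

    public void init() { log.add("init"); }
    public void service(String request) { log.add("service:" + request); }
    public void destroy() { log.add("destroy"); }
}

class LifecycleDemo {
    public static void main(String[] args) {
        LoggingComponent component = new LoggingComponent();
        MiniContainer.run(component, Arrays.asList("a", "b"));
        System.out.println(component.log); // [init, service:a, service:b, destroy]
    }
}
```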



Let us see a use case where we can use the Hollywood Principle.

Say Cognizant has a resume upload portal where job seekers upload their resumes. When a campus recruitment drive (campusing) takes place at Cognizant, the portal contacts them by sending a mail to their inbox.

We can implement this with the Hollywood Principle:
Job seekers upload their resumes to the job portal, and Cognizant sends mail to them when a campusing occurs.
Cognizant says: Don't call me, I will call you. :)

Let us see the implementation.

Assumption: For the sake of simplicity I have not introduced any interface, nor added complex scenarios such as priority, reset or group-wise mail sending, as I just want to show how the Hollywood Principle works.


Resume.java

/**
*
*/
package com.example.hollywood;

/**
* @author Shamik Mitra
*
*/
public class Resume {
    private String email;
    private String name;
    private String content;
    public String getEmail() {
        return email;
    }
    public void setEmail(String email) {
        this.email = email;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getContent() {
        return content;
    }
    public void setContent(String content) {
        this.content = content;
    }
    @Override
    public String toString() {
        return "Resume [email=" + email + ", name=" + name + ", content=" + content + "]";
    }
   
   

}


CognizantJobPortal.java

/**
*
*/
package com.example.hollywood;

import java.util.ArrayList;
import java.util.List;

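/**
* Singleton job portal: it stores uploaded resumes and mails every
* job seeker when a campusing is triggered.
*
* @author Shamik Mitra
*/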
public class CognizantJobPortal {
   
    private static CognizantJobPortal portal = new CognizantJobPortal();
   
    public static CognizantJobPortal get(){
        return portal;
    }
   
    private CognizantJobPortal(){
       
    }
   
   
    // Resumes uploaded so far, in upload order.
    private List<Resume> resumeList = new ArrayList<Resume>();
   
    // Called by job seekers: only stores the resume, nothing is sent yet.
    public void uploadContent(String mail, String name, String content) {
        Resume resume = new Resume();
        resume.setName(name);
        resume.setEmail(mail);
        resume.setContent(content);
        resumeList.add(resume);
    }

    // Called when a campusing occurs: the portal contacts every stored job seeker.
    public void triggercampusing() {
        for (Resume resume : resumeList) {
            System.out.println("Sending mail to " + resume.getName() + " at " + resume.getEmail());
        }
    }
}


HollywoodTest.java

/**
*
*/
package com.example.hollywood;

/**
* @author Shamik Mitra
*
*/
public class HollywoodTest {
   
    public static void main(String[] args) {
       
        CognizantJobPortal.get().uploadContent("shamik@xyz.com", "Shamik Mitra", "A java developer");
        CognizantJobPortal.get().uploadContent("Ajanta@vvv.com", "Ajanta Mitra", "A PHP developer");
        CognizantJobPortal.get().uploadContent("Swastika@vvv.com", "Swastika Mitra", "A Microservice developer");
        CognizantJobPortal.get().uploadContent("Mayukh@vvv.com", "Mayukh Mitra", "A Network engineer");
        CognizantJobPortal.get().uploadContent("Samir@123.com", "Samir Mitra", "A java Architect");   
        // Now trigger campusing
        CognizantJobPortal.get().triggercampusing();
    }

}

Output:

Sending mail to Shamik Mitra at shamik@xyz.com
Sending mail to Ajanta Mitra at Ajanta@vvv.com
Sending mail to Swastika Mitra at Swastika@vvv.com
Sending mail to Mayukh Mitra at Mayukh@vvv.com
Sending mail to Samir Mitra at Samir@123.com


Please note that the CognizantJobPortal class maintains a list to which uploaded resumes are added. When Cognizant triggers a campusing, the job portal/framework sends mail to all job seekers who uploaded their CV to Cognizant.
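
The example above deliberately leaves out interfaces. As a possible next step, here is a minimal sketch of how a callback interface could make the "we'll call you" part explicit. CampusListener, CallbackJobPortal and CallbackDemo are hypothetical names that are not part of the original example, and the "mail" is just a string added to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback contract; not part of the original example.
interface CampusListener {
    void onCampusing(String company);
}

// Hypothetical portal variant that calls registered listeners back.
class CallbackJobPortal {
    private final String company;
    private final List<CampusListener> listeners = new ArrayList<>();

    CallbackJobPortal(String company) {
        this.company = company;
    }

    // "Don't call us": a job seeker only registers interest once.
    void register(CampusListener listener) {
        listeners.add(listener);
    }

    // "We'll call you": the portal decides when every callback runs.
    void triggerCampusing() {
        for (CampusListener listener : listeners) {
            listener.onCampusing(company);
        }
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        CallbackJobPortal portal = new CallbackJobPortal("Cognizant");
        List<String> inbox = new ArrayList<>();
        portal.register(c -> inbox.add("Shamik was called by " + c));
        portal.register(c -> inbox.add("Ajanta was called by " + c));

        System.out.println(inbox.size()); // 0: nothing happens until the portal calls back
        portal.triggerCampusing();
        System.out.println(inbox); // [Shamik was called by Cognizant, Ajanta was called by Cognizant]
    }
}
```

With this shape each job seeker supplies its own reaction to the call, while the portal still controls when the call happens.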