Perl Regular Expressions




Metacharacters

char   meaning
^      beginning of string
$      end of string
.      any character except newline
*      match 0 or more times
+      match 1 or more times
?      match 0 or 1 times; or: shortest match
|      alternative
( )    grouping; "storing"
[ ]    set of characters
{ }    repetition modifier
\      quote or special

Repetition

a*     zero or more a's 

a+     one or more a's 

a?     zero or one a's (i.e., optional a) 

a{m}   exactly m a's 

a{m,}  at least m a's 

a{m,n} at least m but at most n a's

repetition? same as repetition, but the shortest match is taken
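
The difference between a plain quantifier and its shortest-match form is easiest to see side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample text invented for this illustration.
my $text = "aaa<b>one</b> and <b>two</b>";

my ($greedy) = $text =~ /<b>(.*)<\/b>/;     # .* takes as much as possible
my ($lazy)   = $text =~ /<b>(.*?)<\/b>/;    # .*? stops at the first </b>
my ($run)    = $text =~ /^(a{2,3})/;        # at least 2, at most 3 a's

print "greedy: $greedy\n";    # greedy: one</b> and <b>two
print "lazy:   $lazy\n";      # lazy:   one
print "run:    $run\n";       # run:    aaa
```

The greedy match runs to the last closing tag, while the `?` suffix makes the same pattern stop at the first one.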



 

\t     tab 

\n     newline 

\r     return (CR) 

\xhh   character with hex. code hh 

\b     "word" boundary 

\B     not a "word" boundary 





\w     matches any single character classified as a "word" character (alphanumeric or _) 

\W     matches any non-"word" character 

\s     matches any whitespace character (space, tab, newline) 

\S     matches any non-whitespace character  

\d     matches any digit character, equiv. to [0-9] 

\D     matches any non-digit character 





[characters] matches any of the characters in the sequence  

[x-y]        matches any of the characters from x to y (inclusively) in the ASCII code  

[\-]         matches the hyphen character - 

[\n]         matches the newline; other single character denotations with \ apply normally, too  
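
As a rough illustration of the escapes and character classes listed above, the sketch below pulls fields out of a single line of text. The sample line and variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# The sample line is invented for this illustration.
my $entry = "2004-04-13\tuser_42 logged in from host-7";

my ($year, $month, $day) = $entry =~ /^(\d{4})-(\d\d)-(\d\d)/;
my ($user)  = $entry =~ /\t(\w+)/;            # \w also matches digits and _
my ($host)  = $entry =~ /\b(host[\-\d]+)/;    # [\-\d]: a hyphen or a digit
my @tokens  = $entry =~ /(\S+)/g;             # runs of non-whitespace

print "date: $year/$month/$day\n";        # date: 2004/04/13
print "user: $user\n";                    # user: user_42
print "host: $host\n";                    # host: host-7
print "tokens: ", scalar(@tokens), "\n";  # tokens: 6
```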

 



Examples

How do I extract everything between the words “start” and “end”?

$mystring = "The start text always precedes the end of the end text.";
if($mystring =~ m/start(.*)end/) {
    # .* is greedy, so $1 runs up to the last "end" in the string
    print $1;
}

How do I extract a complete number, like the year?

$mystring = "[2004/04/13] The date of this article.";
if($mystring =~ m/(\d+)/) {
    print "The first number is $1.";
}

# find word that is bolded
# returns: $1 = 'text'
$line = "This is some <b>text</b> with HTML and ";
$line =~ m/<b>(.*)<\/b>/i;

Perl Subroutine


sub mysubroutine
{
	print "Not a very interesting routine\n";
	print "This does the same thing every time\n";
}

This subroutine does the same thing regardless of any parameters that we may want to pass to it. All of the following will work to call this subroutine. Notice that a subroutine is called with an & character in front of the name (in Perl 5 the & is optional when the call uses parentheses):


&mysubroutine;		# Call the subroutine

&mysubroutine($_);	# Call it with a parameter

&mysubroutine(1+2, $_);	# Call it with two parameters

Parameters

In the above case the parameters are acceptable but ignored. When the subroutine is called any parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with. It is followed by a couple of examples of its use.


sub printargs
{
	print "@_\n";
}



&printargs("perly", "king");	# Example prints "perly king"

&printargs("frog", "and", "toad"); # Prints "frog and toad"

Just like any other list array the individual elements of @_ can be accessed with the square bracket notation:


sub printfirsttwo
{
	print "Your first argument was $_[0]\n";
	print "and $_[1] was your second\n";
}

Again it should be stressed that the indexed scalars $_[0] and $_[1] and so on have nothing to do with the scalar $_ which can also be used without fear of a clash.

Returning values

The result of a subroutine is always the last thing evaluated. This subroutine returns the maximum of two input parameters. An example of its use follows.


sub maximum

{

	if ($_[0] > $_[1])

	{

		$_[0];

	}

	else

	{

		$_[1];

	}

}



$biggest = &maximum(37, 24);	# Now $biggest is 37

The &printfirsttwo subroutine above also returns a value, in this case 1. This is because the last thing that subroutine did was a print statement and the result of a successful print statement is always 1.
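
Relying on the last evaluated expression works, but the built-in return keyword makes the intent explicit. The following sketch rewrites the maximum subroutine that way; the parameter names $first and $second are chosen here for illustration only.

```perl
use strict;
use warnings;

# Same logic as the maximum subroutine, with named parameters
# and explicit return statements.
sub maximum {
    my ($first, $second) = @_;    # copy the arguments out of @_
    return $first if $first > $second;
    return $second;
}

my $biggest = maximum(37, 24);
print "biggest is $biggest\n";    # biggest is 37
```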

Local variables

The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on. Other variables can be made local too, and this is useful if we want to start altering the input parameters. The following subroutine tests to see if one string is inside another, spaces notwithstanding. An example follows.


sub inside

{

	local($a, $b);                  # Make local variables

	($a, $b) = ($_[0], $_[1]);      # Assign values

	$a =~ s/ //g;                   # Strip spaces from

	$b =~ s/ //g;                   # local variables

	($a =~ /$b/ || $b =~ /$a/);     # Is $b inside $a

					# or $a inside $b?

}



&inside("lemon", "dole money");		# true

In fact, it can even be tidied up by replacing the first two lines with

local($a, $b) = ($_[0], $_[1]);
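
In Perl 5, lexical variables declared with my are generally preferred over local for this purpose, because they are visible only inside the enclosing block. The sketch below is one possible rewrite of the inside subroutine under that assumption. It uses $x and $y rather than $a and $b (which sort treats specially), and \Q...\E so that any regex metacharacters in the arguments are matched literally.

```perl
use strict;
use warnings;

sub inside {
    my ($x, $y) = @_;    # lexical copies, invisible outside this sub
    $x =~ s/ //g;        # strip spaces from the copies only
    $y =~ s/ //g;
    # \Q...\E quotes regex metacharacters in the interpolated text
    return ($x =~ /\Q$y\E/ || $y =~ /\Q$x\E/) ? 1 : 0;
}

print inside("lemon", "dole money"), "\n";    # 1
print inside("lemon", "lime"), "\n";          # 0
```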

Perl References

I’m happiest writing Perl code that does not use references because they always give me a mild headache. Here’s the short version of how they work. The backslash operator (\) computes a reference to something. The reference is a scalar that points to the original thing. The ‘$’ dereferences to access the original thing. Suppose there is a string…

$str = "hello"; ## original string

And there is a reference that points to that string…

$ref = \$str; ## compute $ref that points to $str

The expression to access $str is $$ref. Essentially, the alphabetic part of the variable, ‘str’, is replaced with the dereference expression ‘$ref’…

print "$$ref\n"; ## prints "hello", identical to "$str\n"

Here’s an example of the same principle with a reference to an array…

@a = (1, 2, 3); ## original array

$aRef = \@a; ## reference to the array

print "a: @a\n"; ## prints "a: 1 2 3"

print "a: @$aRef\n"; ## exactly the same

Curly braces { } can be added in code and in strings to help clarify the stack of @, $, …

print "a: @{$aRef}\n"; ## use { } for clarity

Here’s how you put references to arrays in another array to make it look two dimensional…

@a = (1, 2, 3); @b = (4, 5, 6);

@root = (\@a, \@b);

print "a: @a\n"; ## a: 1 2 3

print "a: @{$root[0]}\n"; ## a: 1 2 3

print "b: @{$root[1]}\n"; ## b: 4 5 6

scalar(@root) ## root len == 2

scalar(@{$root[0]}) ## a len == 3

For arrays of arrays, the [ ] operations can stack together so the syntax is more C like…

$root[1][0] ## this is 4
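
Because the outer array holds references rather than copies, a change made through a reference shows up in the original array. The sketch below illustrates this with the same @a, @b and @root layout; the value 99 is arbitrary.

```perl
use strict;
use warnings;

my @a = (1, 2, 3);
my @b = (4, 5, 6);
my @root = (\@a, \@b);          # array of references

my $stacked = $root[1][0];      # 4
my $arrow   = $root[1]->[0];    # 4, the same element with an explicit arrow

push @{ $root[0] }, 99;         # modifies @a through the reference

for my $row (@root) {
    print join(" ", @$row), "\n";
}
# prints:
# 1 2 3 99
# 4 5 6
```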

Perl if else

The if…else Statement

This statement uses a relational expression to check a condition and executes one of two sets of statements enclosed in braces. The condition evaluates to a Boolean value, true or false. The syntax of the if…else statement is:


if(condition)

{

	block of statement(s);

}

else

{

	block of statement(s);

}

In this syntax, condition is a relational expression. If the result of this expression is true, then the block of statements following the if statement is executed. Otherwise, the block of statements following the else statement is executed.

In Perl, unlike C and some other languages, all loops and conditional constructs require their statements to be enclosed in braces, even for single statements.


#! /usr/bin/perl
print "Enter a value for a: ";
$a = <>;
print "Enter a value for b: ";
$b  = <>;
if ($a>$b)
{
   print "a is greater than b\n";
}
else
{
   # also reached when the two values are equal
   print "b is greater than or equal to a\n";
}

In this example, the if clause checks whether $a is greater than $b. If the value of $a is greater than $b, then the result is: a is greater than b. Otherwise, the control transfers to the code following the else clause and the statement associated with the else clause is printed as a result.

The if…elsif…else Statement

This statement is used when there is more than one condition to be checked. The syntax of the if…elsif…else statement is:


if (condition)

{

	block of statement(s);

}

elsif (condition)

{

	block of statement(s);

}

else

{

	block of statement(s);

}

In this syntax, if the condition associated with the if clause is false, control transfers to the elsif clause, which checks the next condition. The code associated with the elsif clause is executed only if its condition is true; otherwise, the code associated with the else clause is executed.


#! /usr/bin/perl
print "Enter the score of a student: ";
$score = <>;
if($score>=90)
{
  print "Excellent Performance\n";
}
elsif($score>=70 && $score<90)
{
  print "Good Performance\n";
}
else
{
  print "Try hard\n";
}
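
The same decision can be wrapped in a subroutine so it can be tried without typing input. This is a minimal sketch; the name performance is hypothetical. Because the elsif branch is only reached when the first condition is false, the extra check $score<90 can be dropped without changing the result.

```perl
use strict;
use warnings;

# performance() is a hypothetical helper that mirrors the thresholds above.
sub performance {
    my ($score) = @_;
    if ($score >= 90) {
        return "Excellent Performance";
    }
    elsif ($score >= 70) {    # only reached when $score < 90
        return "Good Performance";
    }
    else {
        return "Try hard";
    }
}

for my $score (95, 70, 42) {
    print "$score: ", performance($score), "\n";
}
```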

Perl Loop

The Loop Statements

Perl provides the following loop statements:

  • The for Loop
  • The foreach Loop
  • The while Loop
  • The do-while Loop
  • The until Loop

The process of executing a code block repetitively is known as iteration. To perform iteration in applications, use the loop statements, such as for and while.

The loop statements check a condition and repeatedly execute the enclosed statements as long as the condition is true (or, for the until loop, as long as it is false). The loop terminates when the condition no longer holds.

The for Loop

This loop is used to execute a given set of statements for a fixed

number of times. The syntax of the for loop is:


for(initialization;testing;updation)

{

	block of statement(s);

}

In these statements:

initialization: Is the code to declare and initialize the loop counter. The loop counter is a variable that keeps track of the number of iterations. Initialization happens only once, before the beginning of the loop.

testing: Is the code that specifies the condition controlling the number of iterations. The loop executes as long as the result of testing is true. When the condition becomes false, control passes to the statement following the loop.

updation: Is the code to modify the loop counter after each iteration. It can increment or decrement the loop counter, according to the program requirements. Updation occurs at the end of each iteration.

block of statement(s): Is the code to be executed iteratively. This code must be enclosed within curly braces.

Note: The three expressions for initialization, testing, and updation are optional. If you leave the testing expression empty, the for loop will be an infinite loop.
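
A for loop with all three expressions left empty runs until something inside the block stops it. The sketch below uses Perl's built-in last statement for that; the counter name and the limit of 5 are arbitrary choices for the example.

```perl
use strict;
use warnings;

my $count = 0;
for (;;) {                   # no initialization, testing, or updation
    $count++;
    last if $count >= 5;     # leave the loop explicitly
}
print "stopped after $count iterations\n";    # stopped after 5 iterations
```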


#! /usr/bin/perl
print "Enter a digit to create its table: ";
$a = <>;
chomp($a);
for($b=1;$b<=10;$b++)
{
	print $a.' x '.$b.' = '. $a*$b."\n";
}

  • $b=1 is the initialization statement, which initializes the loop counter $b to 1.
  • $b<=10 is the testing statement, which checks whether the value stored in $b is less than or equal to 10.
  • $b++ is the updation statement, which increments the value of $b by 1.

    The Nested for Loop

    A for loop contained inside another for loop is called a nested for

    loop. This is used when the data is to be stored and printed in a

    tabular format having multiple rows and columns.

    
    #! /usr/bin/perl
    
    for($a=0;$a<=9;$a++){
    
            for($b=0;$b<=$a;$b++){
    
                    print "*";
    
            }
    
            print "\n";
    
    } 
    
    

    In this program:

    • The outer loop runs as long as the value of $a is less than or equal to 9.

    • The inner loop runs as long as the value of $b is less than or equal to the value of $a.

    • The newline character \n is printed after every row to start a new line.

    Note: Nested for loops are often used with multidimensional arrays. For more information on arrays, see the Perl Arrays section.

    The foreach Loop

    This loop operates on arrays. An array stores multiple related

    values in a row that can be accessed easily using the foreach loop.

    The syntax of the foreach loop is:

    
    foreach $var_name (@array_name)
    
    {
    
    	block of statement(s);
    
    }
    
    

    In this syntax:

    • @array_name is the array whose elements are accessed using the

      foreach loop.

    • $var_name is the scalar variable that stores the value of

      element of @array_name for each iteration.

    • This loop is repeated for all the elements in the array. The

      code used for the foreach loop is:

    
    #! /usr/bin/perl
    
    @names = ("George", "Jack", "Davis");
    
    foreach $word(@names)
    
    {
    
        print "$word\n";
    
    }
    
    

    This example, when executed, prints values of all the elements in

    the @names array one-by-one using the foreach loop. The output of

    the example is shown in Figure 4-5:

    The while Loop

    There may be situations when you do not know the number of times a

    loop is to be executed. For example, an application accepts and

    stores the scores of students in a class, and you do not know the

    number of students in a class. In this example, you can use the

    while loop.


    The while loop executes as long as the condition specified is true.

    The condition can be any valid relational expression, which returns

    true or false.

    This loop is also known as Pre-Check or Pre-Tested Looping

    Construct because the condition is checked before executing the

    statement(s) in the block.

    The syntax of the while loop is:

    
    while (condition)
    
    {
    
    	block of statement(s);
    
    }
    
    

    In these statements, block of statement(s) is executed only if the

    condition is true.

    Note The code block must be enclosed within curly braces.

    
    #! /usr/bin/perl
    
    $a = 1;
    
    while ($a <= 10)
    
    {
    
    	print "$a\n";
    
    	$a++;
    
    }
    
    

    In this example:

    • $a, the loop counter is initialized to 1.
    • The condition checks whether $a is less than or equal to 10.
    • The value of $a is incremented by 1 in each iteration using the

      post increment operator (++).

    • The loop prints the numbers from 1 to 10, running as long as the value of $a is less than or equal to 10. When $a is incremented to 11, the condition becomes false and the loop terminates.

    The do-while Loop

    The do-while loop is used when you want to execute a code block at

    least once unconditionally, and then iteratively on the basis of a

    condition.

    In this loop, condition is tested at the end of the loop. Because

    of this, this loop is also known as Post-Check or Post Tested

    Looping Construct.

    The syntax of the do-while loop is:

    
    do 
    
    {
    
    	block of statement(s);
    
    }
    
    while (condition);
    
    

    
    #! /usr/bin/perl
    
    $a = 2;
    
    do
    
    {
    
    	print "$a\n";
    
    	$a+=2;
    
    } while ($a<=20);
    
    

    In this example:

    • $a, the loop counter is initialized to 2.
    • The loop prints the value of $a and also increments it by 2.
    • The condition associated with while tests whether the value of

      $a is less than or equal to 20.

    • The loop prints even numbers as long as the value of $a is less than or equal to 20. When $a reaches 22, the loop terminates.

    The until Loop

    In case of the while loop, the code that follows condition is

    executed only if the condition is true. In the case of the until

    loop, code associated with the condition is executed only if the

    condition is false. The syntax of the until loop is:

    
    until(condition)
    
    {
    
    	block of statement(s);
    
    }
    
    

    In this syntax, the block of statement(s) is executed only when the

    condition returns false.

    
    #! /usr/bin/perl
    
    $a = 1;
    
    until($a == 11)
    
    {
    
    	print $a."\n";
    
    	$a++;
    
    }
    
    

    In this program:

    • $a, the loop counter is initialized to 1.
    • The condition checks whether $a is equal to 11.
    • The loop prints and increments the value of $a.
    • The condition remains false as long as $a is less than 11, so the loop prints the numbers from 1 to 10. The moment $a is equal to 11, the condition is met and the loop terminates.
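
An until loop behaves like a while loop with the condition negated. The sketch below runs both forms and collects the values each one visits; the array names are made up for the comparison.

```perl
use strict;
use warnings;

my (@with_until, @with_while);

my $i = 1;
until ($i == 11) {           # runs while the condition is false
    push @with_until, $i;
    $i++;
}

my $j = 1;
while (!($j == 11)) {        # the same loop with the condition negated
    push @with_while, $j;
    $j++;
}

print "@with_until\n";    # 1 2 3 4 5 6 7 8 9 10
print "@with_while\n";    # 1 2 3 4 5 6 7 8 9 10
```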

    Perl ARGV

    @ARGV and %ENV

    The built-in array @ARGV contains the command line arguments for a Perl program. The following run of the Perl program critic.pl will have @ARGV set to ("-poetry", "poem.txt").

    unix% perl critic.pl -poetry poem.txt

    %ENV contains the environment variables of the context that launched the Perl program.

    @ARGV and %ENV make the most sense in a Unix environment.

    For example, suppose you have a script named test.pl:

    #!/usr/bin/perl
    use strict;
    my $column=$ARGV[0];
    my $database=$ARGV[1];

    Run the program:

    $ ./test.pl ssn employees

    In the above example we pass two values as command-line arguments, so $column will hold "ssn" and $database will hold "employees".
    Note: Put values in quotes if they contain spaces.
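
A script usually needs to cope with missing arguments as well. The following is a minimal sketch of one way to do that; parse_args is a hypothetical helper, not a Perl built-in, and the usage message format is an assumption.

```perl
use strict;
use warnings;

# parse_args is a hypothetical helper for this sketch: it expects exactly
# two values and returns an empty list otherwise.
sub parse_args {
    my @args = @_;
    return unless @args == 2;
    my ($column, $database) = @args;
    return ($column, $database);
}

my ($column, $database) = parse_args(@ARGV);
if (defined $column) {
    print "column=$column database=$database\n";
}
else {
    print "usage: $0 column database\n";
}
```

Passing @ARGV into a subroutine keeps the checking logic separate from the command line, so it can be exercised with any list of values.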

    Perl Hash Maps

    Perl Hash Maps/associative arrays

    ASSIGNING HASH VALUES

  • Hash tables consist of key/value pairs.
  • Every key is followed by a value.

    Values can be assigned to hash tables as follows:

    
    %states=
    
    ("California","Sacramento","Wisconsin","Madison","New York","Albany");
    
    

    We can also use the => operator to identify the key to the left and the value to the right. If the => operator encounters bare words in key positions, they will be automatically quoted (note “New York”, however, which consists of two words and MUST be quoted):

    
    %states=
    
    (California=>"Sacramento",Wisconsin=>"Madison","New York"=>"Albany");
    
    

    In the above example, California is a key and Sacramento is its value.

    Similarly, Wisconsin is a key and Madison is its value.

    Printing:

    
    print "Capital of California is " . $states{"California"} . "\n\n";
    
    

    Printing all values (using a foreach loop):

    #!/usr/bin/perl
    %states=
    (California=>"Sacramento",Wisconsin=>"Madison","New York"=>"Albany");
    foreach my $keys(keys %states)
    {
      print "KEY:$keys VALUE:$states{$keys}\n";
    }

    Output (the order in which keys are returned is not guaranteed):

    KEY:Wisconsin VALUE:Madison
    KEY:New York VALUE:Albany
    KEY:California VALUE:Sacramento
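
Since the key order is not fixed, sorting the keys gives repeatable output. The sketch below also shows the built-in exists and delete functions on the same %states hash; the Texas entry is added purely for illustration.

```perl
use strict;
use warnings;

my %states = (California => "Sacramento", Wisconsin => "Madison", "New York" => "Albany");

# Sorting the keys makes the output order predictable.
for my $state (sort keys %states) {
    print "$state: $states{$state}\n";
}

print "Texas is not in the hash yet\n" unless exists $states{Texas};

$states{Texas} = "Austin";     # add a key/value pair
delete $states{Wisconsin};     # remove a key/value pair

my $count  = keys %states;     # scalar context: number of keys
my @sorted = sort keys %states;
print "$count states: @sorted\n";    # 3 states: California New York Texas
```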
    
    
    Perl Arrays

    Arrays @

    Array constants are specified using parentheses ( ) and the elements are separated with

    commas. Perl arrays are like lists or collections in other languages since they can grow

    and shrink, but in Perl they are just called “arrays”. Array variable names begin with the

    at-sign (@). Unlike C, the assignment operator (=) works for arrays — an independent copy

    of the array and its elements is made. Arrays may not contain other arrays as elements.

    Perl has sort of a “1-deep” mentality. Actually, it’s possible to get around the 1-deep

    constraint using “references”, but it’s no fun. Arrays work best if they just contain

    scalars (strings and numbers). The elements in an array do not all need to be the same

    type.

    
    @array = (1, 2, "hello");  ## a 3 element array 
    @empty = (); ## the array with 0 elements
    $x = 1; $y = 2; @nums = ($x + $y, $x - $y); ## @nums is now (3, -1)

    Just as in C, square brackets [ ] are used to refer to elements, so $a[6] is the element

    at index 6 in the array @a. As in C, array indexes start at 0. Notice that the syntax to

    access an element begins with ‘$’ not ‘@’ — use ‘@’ only when referring to the whole

    array (remember: all scalar expressions begin with $).

    
    @array = (1, 2, "hello", "there"); 
    
    $array[0] = $array[0] + $array[1];## $array[0] is now 3 
    
    

    Perl arrays are not bounds checked. If code attempts to read an element outside the array

    size, undef is returned. If code writes outside the array size, the array grows

    automatically to be big enough. Well written code probably should not rely on either of

    those features.

    
    @array = (1, 2, "hello", "there"); 
    
    $sum = $array[0] + $array[27];  
    
    ## $sum is now 1, since $array[27] returned undef 
    
    $array[99] = "the end";
    
    ## array grows to be size 100 
    
    

    When used in a scalar context, an array evaluates to its length. The “scalar” operator

    will force the evaluation of something in a scalar context, so you can use scalar() to get

    the length of an array. As an alternative to using scalar, the expression $#array is the

    index of the last element of the array which is always one less than the length.

    
    @array = (1, 2, "hello", "there"); 
    
    $len = @array;                
    
    ## $len is now 4 (the length of @array) 
    
    $len = scalar(@array);
    
    ## same as above, since assigning to the scalar $len already
    
    ## provides scalar context, but this is more explicit 
    
    
    
    @letters = ("a", "b", "c"); 
    
    $i = $#letters;## $i is now 2 
    
    

    That scalar(@array) is the way to refer to the length of an array is not a great moment in

    the history of readable code. At least I haven’t shown you the even more vulgar forms

    such as (0 + @a).

    The sort operator (sort @a) returns a copy of the array sorted in ascending alphabetic

    order. Note that sort does not change the original array. Here are some common ways to sort…

    
    (sort @array)
    
    ## sort alphabetically, with uppercase first 
    
    (sort {$a <=> $b} @array)            
    
    ## sort numerically 
    
    (sort {$b cmp $a} @array)            
    
    ## sort reverse alphabetically 
    
    (sort {lc($a) cmp lc($b)} @array)    
    
    ## sort alphabetically, ignoring case (somewhat inefficient) 
    
    

    The sort expressions above pass a comparator block {…} to the sort operator, where the special variables $a and $b are the two elements to compare. cmp is the built-in string compare, and <=> is the built-in numeric compare.
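
Running the different sort forms on concrete data shows why the comparator matters. This is a small sketch; the sample words and numbers are invented for the example.

```perl
use strict;
use warnings;

# Sample data invented for this illustration.
my @words = ("pear", "Banana", "apple");
my @nums  = (10, 9, 100, 1);

my @default    = sort @words;                         # uppercase first
my @nocase     = sort { lc($a) cmp lc($b) } @words;   # ignoring case
my @as_strings = sort @nums;                          # compared as strings
my @numeric    = sort { $a <=> $b } @nums;            # compared as numbers

print "@default\n";       # Banana apple pear
print "@nocase\n";        # apple Banana pear
print "@as_strings\n";    # 1 10 100 9
print "@numeric\n";       # 1 9 10 100
print "@nums\n";          # 10 9 100 1 (the original is unchanged)
```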

    There’s a variant of array assignment that is used sometimes to assign several variables

    at once. If an array on the left hand side of an assignment operation contains the names

    of variables, the variables are assigned the corresponding values from the right hand

    side.

    ($x, $y, $z) = (1, 2, "hello", 4);

    ## assigns $x=1, $y=2, $z="hello", and the 4 is discarded

    This type of assignment only works with scalars. If one of the values is an array, the

    wrong thing happens (see “flattening” below).

    Array Add/Remove/Splice Functions

    These handy operators will add or remove an element from an array. These operators change

    the array they operate on…

    Operating at the “front” ($array[0]) end of the array…

    shift(array)
    returns the frontmost element and removes it from the array. Can be used in a loop to gradually remove and examine all the elements in an array left to right. The foreach operator is another way to examine all the elements.

    unshift(array, elem)
    inserts an element at the front of the array. Opposite of shift.

    Operating at the “back” ($array[$len-1]) end of the array…

    pop(array)
    returns the endmost element (right hand side) and removes it from the array.

    push(array, elem)
    adds a single element to the end of the array. Opposite of pop.

    splice(array, index, length, array2)
    removes the section of the array defined by index and length, and replaces that section with the elements from array2. If array2 is omitted, splice() simply deletes. For example, to delete the element at index $i from an array, use splice(@array, $i, 1).
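
The sketch below applies each of these operators to one small array, with the array contents noted after every step. The array name and values are arbitrary.

```perl
use strict;
use warnings;

my @queue = ("b", "c", "d");

unshift @queue, "a";            # a b c d
push @queue, "e";               # a b c d e

my $first = shift @queue;       # returns "a"; @queue is now b c d e
my $last  = pop @queue;         # returns "e"; @queue is now b c d

# Replace one element at index 1 with two new elements.
my @removed = splice(@queue, 1, 1, "x", "y");    # removes "c"; b x y d

splice(@queue, 0, 1);           # delete the element at index 0; x y d

print "first=$first last=$last removed=@removed\n";    # first=a last=e removed=c
print "@queue\n";                                      # x y d
```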

    Perl Operators

    In Perl, the comparison operators are divided into two classes:

    • Comparison operators that work with numbers
    • Comparison operators that work with strings

    Integer-Comparison Operators

    
    Operator Description  
    
    <        Less than
    
    >        Greater than
    
    ==       Equal to
    
    <=       Less than or equal to
    
    >=       Greater than or equal to
    
    !=       Not equal to
    
    <=>      Comparison returning 1, 0, or -1
    
    

    Each of these operators yields one of two values:

    • True, or nonzero
    • False, or zero

    The <=> operator is a special case. Unlike the other integer comparison operators, <=> returns one of three values:

    • 0, if the two values being compared are equal
    • 1, if the first value is greater
    • -1, if the second value is greater

    String-Comparison Operators

    For every numeric-comparison operator, Perl defines an equivalent string-comparison operator.

    
    String operator Comparison operation 
    
    lt              Less than
    
    gt              Greater than
    
    eq              Equal to
    
    le              Less than or equal to
    
    ge              Greater than or equal to
    
    ne              Not equal to
    
    cmp             Compare, returning 1, 0, or -1
    
    

    Example

    
    #!/usr/bin/perl
     
    $a="coding-school.com";
    
    $b="www.coding-school.com";
    
    if($a eq $b)
    
    {
    
    print "Both are the same\n";
    
    }
    
    else
    
    {
    
    print "Both are different\n";
    
    }
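
Choosing the wrong class of operator can silently change a result, because the same two values can compare differently as numbers and as strings. A minimal sketch, with values picked to show the difference:

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "9");
my $numeric = $x <=> $y;    #  1: as numbers, 10 is greater than 9
my $string  = $x cmp $y;    # -1: as strings, "1" sorts before "9"

my ($p, $q) = ("1.0", "1");
my $num_equal = ($p == $q) ? "yes" : "no";    # yes: both are the number 1
my $str_equal = ($p eq $q) ? "yes" : "no";    # no: the characters differ

print "numeric: $numeric, string: $string\n";      # numeric: 1, string: -1
print "== says $num_equal, eq says $str_equal\n";  # == says yes, eq says no
```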
    
    

    Perl List

    List

    A list is a sequence of scalar values enclosed in parentheses. The following is a simple example of a list:

    (1, 5.3, "hello", 2)

    This list contains four elements, each of which is a scalar value: the numbers 1 and 5.3, the string hello, and the number 2.

    Lists can be as long as needed, and they can contain any scalar value. A list can have no elements at all, as follows:

    ()

    This list also is called an empty list.

    NOTE

    A list with one element and a scalar value are different entities. For example, the list

    (43.2)

    and the scalar value

    43.2

    are not the same thing. This is not a limitation because one can be converted to or assigned to the other.
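
The conversion in each direction can be sketched as follows, assuming the one-element list is stored in an array; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my @list  = (43.2);       # a list with one element, stored in an array
my $value = 43.2;         # a plain scalar

my ($first) = @list;      # list assignment: $first gets the element, 43.2
my $count   = @list;      # scalar context: $count gets the length, 1
my @wrapped = ($value);   # a scalar placed into a one-element list

print "first=$first count=$count wrapped=@wrapped\n";
# first=43.2 count=1 wrapped=43.2
```

The parentheses around $first matter: they make the assignment a list assignment, whereas without them the array is evaluated in scalar context and yields its length.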