Regular Expressions in Ruby

Regular expressions are a powerful tool for processing text and extracting information from it. A regular expression describes a pattern: you can check whether an input string matches it, and then manipulate or extract pieces of the string according to your needs. This is very useful because a single expression can often replace a long, hand-written program for manipulating a string.
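As a quick taste of "extract" and "manipulate", here is a minimal sketch. The log line and the field names (user, id) are made up for illustration; String#match and String#gsub are standard Ruby.

```ruby
# Hypothetical input, invented for this sketch.
log = "user=alice id=42"

# Extract: named captures pull out the pieces we care about.
m = log.match(/user=(?<name>\w+) id=(?<id>\d+)/)
name = m[:name]          # "alice"
id   = m[:id].to_i       # 42

# Manipulate: gsub rewrites every match of the pattern.
masked = log.gsub(/\d/, "#")   # "user=alice id=##"

puts name, id, masked
```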

In Ruby, the =~ operator is used for matching regular expressions. Example:

>> sentence = "This is just a test."
=> "This is just a test."
>> sentence =~ /^[A-Z].*[?!.]$/
=> 0

In the example above, the regular expression matches the string provided. The =~ operator returns the index where the match starts (0 here), or nil when there is no match. Remember that in Ruby zero evaluates to true, so the result can be used directly in a condition. The regular expression in the example is /^[A-Z].*[?!.]$/. A regular expression literal is written between slashes. This expression checks the following pattern:

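  • ^: this character anchors the pattern to the beginning of a line. For a single-line string like this one, that is the beginning of the string (use \A to always anchor to the start of the string).
  • [A-Z]: the first character in the sentence has to be in the range A-Z (uppercase letters only).
  • .*: the dot matches any character except a newline, and the asterisk means that this pattern can repeat zero or more times. Therefore, it will match any sequence of characters, or no characters at all.
  • [?!.]: matches one occurrence of any of these punctuation marks.
  • $: anchors the pattern to the end of a line. For a single-line string, that is the end of the string (use \z to always anchor to the end of the string).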

So, in summary, this regular expression will match any string that starts with an uppercase letter (A-Z) and finishes with a punctuation mark (period, question mark or exclamation mark). Note that the square brackets are used to define ranges or lists of characters in the expression.
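To see the pattern at work on more than one input, here is a small sketch. The variable names and the extra test strings are mine, not part of the original example. It also shows that Ruby's ^ and $ work line by line, and how \A and \z differ.

```ruby
# Pattern from the text: uppercase first letter, closing punctuation mark.
sentence_pattern = /^[A-Z].*[?!.]$/

# =~ returns the index where the match starts, or nil when nothing matches.
hit  = ("This is just a test." =~ sentence_pattern)   # 0
miss = ("this is just a test" =~ sentence_pattern)    # nil

# ^ and $ match at line boundaries, so one fitting line is enough.
multi    = "oops\nThis is fine."
line_hit = (multi =~ sentence_pattern)                # 5

# \A and \z anchor to the whole string instead.
strict_pattern = /\A[A-Z].*[?!.]\z/
strict_hit = (multi =~ strict_pattern)                # nil

puts hit.inspect, miss.inspect, line_hit.inspect, strict_hit.inspect
```

If you are validating a whole string rather than searching line by line, the \A ... \z form is probably what you want.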

You can find more information about Ruby’s regular expressions at http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm.

Ruby

After a long time dealing with boring programming languages, I’m really excited to have been introduced to Ruby. In my opinion, programming languages have to be compact, or, to use a better word: minimalist. And this is a great characteristic of Ruby. You don’t have to spend ages learning it, especially if you already know how to program in another object-oriented language.

I’m gonna list some of the great characteristics of Ruby:

  1. It’s a minimalist language, as I already noted. The syntax is easy to learn.
  2. Strong Object-Orientation. Everything is an object!
  3. Since everything is an object, every operation is a method call. Even basic arithmetic operations (+, -, etc.) are method calls.
  4. Metaprogramming: you can alter methods and classes at any time, even when the program is running. You can even change the methods of the core classes of the language, such as String or Integer.
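Points 3 and 4 are easy to try for yourself. The sketch below is only an illustration: shout is a hypothetical method name I picked, and I use class_eval here, although writing class String ... end is the more common way to reopen a class.

```ruby
# Operators are ordinary methods, so these three lines are equivalent.
a = 1 + 2
b = 1.+(2)
c = 1.send(:+, 2)

# Add a method to the core String class while the program is running.
# `shout` is a made-up name for this sketch.
String.class_eval do
  def shout
    upcase + "!"
  end
end

d = "hello".shout   # "HELLO!"

puts a, b, c, d
```

Changing core classes affects every piece of code in the program, so it is best used sparingly.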

Other positive points (in my subjective opinion) are that it’s extremely compatible with Linux, BSD or any other Unix-based operating system; you can use an interactive interpreter to develop, which makes it really easy to test and run pieces of code; and there is an excellent framework for agile web development available for Ruby, called Rails. It’s more usual to see the term Ruby on Rails; just remember that Ruby is the language, and Rails is the framework.

Arrays in Ruby

Let’s start this post by creating an array in Ruby (notice we are using the irb environment for our examples):

>> array=["a","b","c","d"]
=> ["a", "b", "c", "d"]

You just put a list of items separated by commas between square brackets, assign it to a variable, and the array is created. In Ruby, the array indexes start at 0, so if you want to retrieve the first item from the array you just created, simply type:

>> array[0]
=> "a"

Let’s try to access an index out of range in the array.

>> array[4]
=> nil

You will get the nil object as return value.
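If a silent nil is not what you want, Array#fetch is one alternative. This is a small sketch that goes slightly beyond the irb session above; the "none" default is just a value I picked.

```ruby
array = ["a", "b", "c", "d"]

last     = array[-1]                # negative indexes count from the end: "d"
missing  = array[4]                 # out of range: nil, no error
fallback = array.fetch(4, "none")   # fetch with a default value: "none"

# fetch without a default raises IndexError for an out-of-range index.
error_class = nil
begin
  array.fetch(4)
rescue IndexError => e
  error_class = e.class
end

puts last, missing.inspect, fallback, error_class
```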

You can retrieve several items at once using array[i,n], in which i is the starting index for the items you want, and n is the number of items you want.

>> array[2,2]
=> ["c", "d"]
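The same slice can be written in a few equivalent ways. A short sketch, with variable names of my own choosing:

```ruby
array = ["a", "b", "c", "d"]

pair      = array[2, 2]       # start index and length: ["c", "d"]
by_range  = array[2..3]       # inclusive range of indexes: ["c", "d"]
by_method = array.slice(2, 2) # slice is the method behind the bracket syntax

# Asking for more items than remain simply returns what is there.
clipped = array[2, 10]        # ["c", "d"]

puts pair.inspect, by_range.inspect, by_method.inspect, clipped.inspect
```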

You can make assignments to your array, replacing multiple items at once. Let’s start by modifying the array we created above.

>> array[1,2]=["B","C"]
=> ["B", "C"]
>> array
=> ["a", "B", "C", "d"]

It’s possible to replace multiple items with a different number of items.

>> array[1,2]=["X","Y","Z"]
=> ["X", "Y", "Z"]
>> array
=> ["a", "X", "Y", "Z", "d"]

It’s also possible to delete them using the same technique.

>> array[2,3]=[]
=> []
>> array
=> ["a", "X"]
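Assigning [] is not the only way to delete. As a sketch of an alternative, slice! removes the same span and hands back what it removed, and delete_at removes a single index:

```ruby
array = ["a", "X", "Y", "Z", "d"]

# Remove 3 items starting at index 2; the removed items are returned.
removed = array.slice!(2, 3)    # ["Y", "Z", "d"]
after_slice = array.dup         # ["a", "X"]

# Remove one item by index; the removed item is returned.
one = array.delete_at(1)        # "X"

puts removed.inspect, after_slice.inspect, one, array.inspect
```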

You can even insert new elements between the other elements of the array using array[i,n] with n=0. Let’s create another array with symbols and insert one element between the others.

>> a=[:a,:b,:d,:e]
=> [:a, :b, :d, :e]
>> a[2,0]=[:c]
=> [:c]
>> a
=> [:a, :b, :c, :d, :e]
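For comparison, Array#insert does the same job with a method call. A minimal sketch, where the second array and its symbols are made up for the example:

```ruby
a = [:a, :b, :d, :e]
a.insert(2, :c)                # same effect as a[2,0] = [:c]

# insert accepts several values at once.
letters = [:a, :d]
letters.insert(1, :b, :c)      # [:a, :b, :c, :d]

puts a.inspect, letters.inspect
```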

You can read the array[i,n] syntax as: retrieve n elements from the array, starting at index i. If you try to access elements from the array using n=0, you will get [] as a result (as long as i is within the range of the array).

>> a[1,0]
=> []

Actually, what is happening in the situation above is that n=0 defines a place between elements of the array, right between the indexes i-1 and i.

For retrieving elements, you may see no reason to use array[i,n] with n=0. But for inserting elements it can be really useful. As you saw in the examples above, instead of replacing elements, you can insert new elements into the array using n=0. So, to insert a new element at the end of an array, we can do the following:

>> b=[:x,:y]
=> [:x, :y]
>> b[2,0]=:z
=> :z
>> b
=> [:x, :y, :z]
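Two things are worth trying here, sketched below with arrays I made up for the purpose. First, << and push are the usual ways to append. Second, when the right-hand side is itself an array, its elements are spliced in, so you need an extra pair of brackets to insert the array as a single element.

```ruby
b = [:x, :y]
b[b.length, 0] = :z      # same idea as b[2,0] = :z, without hard-coding the index
b << :w                  # append operator
b.push(:v)               # append method

# An array on the right-hand side is spliced in element by element.
spliced = [1, 2]
spliced[2, 0] = [3, 4]   # [1, 2, 3, 4]

# Wrap it in another array to insert it as one nested element.
nested = [1, 2]
nested[2, 0] = [[3, 4]]  # [1, 2, [3, 4]]

puts b.inspect, spliced.inspect, nested.inspect
```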

Now, one interesting point to finish this article. Observe the behaviours shown below:

>> b[3]
=> nil

In the example above the index 3 is out of range, since the array b has three elements.

>> b[3,0]
=> []

In the example shown above, we are at the tail of the array: index 3 is the place just after the last element, so we’re not out of range yet.

>> b[4,0]
=> nil

Now we are out of range, so we received a nil object as return value.
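To see the whole picture at once, this sketch walks every index from 0 to 4 and records both b[i] and b[i,0]. The results variable and the output format are my own.

```ruby
b = [:x, :y, :z]

# For each index, collect the single element and the zero-length slice.
results = (0..4).map { |i| [i, b[i], b[i, 0]] }

results.each do |i, element, gap|
  puts "#{i}: b[i]=#{element.inspect}  b[i,0]=#{gap.inspect}"
end
# 0..2: an element and []
# 3:    nil and []   (the place right after the last element)
# 4:    nil and nil  (past the end)
```

So b[i] has three valid positions, while b[i,0] has four: one place before each element, plus the one at the tail.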