I’ve decided to start a new series of blog posts focused on the analysis of existing code that all developers depend on and, in some cases, take for granted. I know this exercise will uncover useful programming practices and “tricks” to help me become a better programmer. My hope is that by sharing this information in this blog, other developers will be helped as well. My goal is to keep these posts short enough to digest in a few minutes of reading while still providing useful insights to the reader.

Installment 1 – ActionView::Base.word_wrap

The word_wrap method within ActionView is a very useful method for reformatting text to a specific column width (default is 80 characters). A high level description of how it works is that it breaks up the original text into chunks based on any existing EOL characters found in the text. For each chunk or paragraph, additional EOL characters are inserted at appropriate locations so as to limit the line width to the desired number of columns without cutting any words in half. The resulting chunks are then glued back together with EOL characters between them.

A simple example using a line_width of 10 characters is as follows:

  word_wrap('Level Five Solutions', :line_width => 10)
  # => Level Five\nSolutions

Let’s look at how this is implemented in text_helper.rb, the file that defines the TextHelper module bundled with ActionView:

  def word_wrap(text, *args)
    options = args.extract_options!
    unless args.blank?
      options[:line_width] = args[0] || 80
    end
    options.reverse_merge!(:line_width => 80)

    text.split("\n").collect do |line|
      line.length > options[:line_width] ?
            line.gsub(
                  /(.{1,#{options[:line_width]}})(\s+|$)/,
                  "\\1\n").strip :
            line
    end * "\n"
  end

I had to do some creative “wrapping” of the code to make it fit our blog layout. How ironic!

The relevant parts of this method begin on line 8, where it splits the original text on any existing “\n” characters and then operates on each “line” within the body of the block passed in to the collect method. A simple ternary operator separates the lines that are longer than the line_width from those that are not. The lines that are longer are “embedded” with EOL characters through a call to gsub with a very creative regular expression:

  /(.{1,#{options[:line_width]}})(\s+|$)/

The first pair of parentheses forms a capturing group. The period matches any character other than a newline, and the curly brace expression that follows says to repeat that match between 1 and line_width times. The curly brace section is greedy in that it tries to match up to the line_width first and then starts falling back from there. It decides to fall back based on the trailing group, (\s+|$), which requires that the captured text be followed by whitespace or the end of the line. Note that this second group is not a look-ahead: it consumes the whitespace it matches, and because the replacement refers only to the first group, that whitespace is dropped from the result. After the regular expression matcher extracts a particular subset of the text it is replaced by the gsub method with the following expression:

  "\\1\n"

This expression builds a new string containing the substring captured by the regular expression (\\1) and a trailing EOL character (\n). The “g” in gsub means global so that the substitution is applied across the entire string. This means that the regular expression is repeated as many times as it takes to fully modify the original text to include EOL characters at all the “appropriate” locations based on the line_width provided.
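To watch the matching and substitution happen outside of Rails, here is a small plain-Ruby sketch. The sample sentence and the line_width of 10 are my own illustrative choices, not anything taken from the Rails source. It uses scan to list what each match captures, then applies the same gsub call to a single line.

```ruby
# Illustrative values only: neither the sentence nor the width comes from Rails.
line_width = 10
line = 'The quick brown fox jumps'
pattern = /(.{1,#{line_width}})(\s+|$)/

# scan returns one [group 1, group 2] pair per match, so we can see both the
# captured chunk and the separator that the match consumed.
matches = line.scan(pattern)
matches.each do |chunk, separator|
  puts "#{chunk.inspect} followed by #{separator.inspect}"
end

# gsub rewrites every match as "chunk + newline". The last match gets a
# newline too, which is why the original method calls strip afterwards.
wrapped = line.gsub(pattern, "\\1\n")
puts wrapped.inspect
puts wrapped.strip.inspect
```

The first match is a nice illustration of the fall-back behavior: the greedy ten-character attempt ends in a space and is followed by the “b” of “brown”, so the matcher gives back one character and settles on “The quick”.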

The tail end of this method “multiplies” the result of the collect statement by a string containing the EOL character (\n). The result of the collect method is an array containing each “line” from the original text that was broken up by the call to split, wrapped where necessary. The multiplication operator (*) on the Array class has special logic when its argument is a string: it behaves like Array#join, concatenating the elements into one long string with the given string placed between each pair of adjacent elements. It does not add a final trailing occurrence of the EOL character. Here’s an example to clarify this point:

  ['1','2','3'] * 'a'
  # => 1a2a3

Notice that the ‘a’ character only shows up between elements (between ‘1’ and ‘2’, and between ‘2’ and ‘3’). It does not show up at the end.
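As a quick check of that behavior, the following plain-Ruby snippet compares the * operator with join, and also shows what * does when it is given an integer instead of a string. The variable names are just for illustration.

```ruby
elements = ['1', '2', '3']

# With a String argument, Array#* is equivalent to Array#join.
joined_with_star = elements * 'a'
joined_with_join = elements.join('a')
puts joined_with_star                      # same string as the join call below
puts joined_with_join
puts joined_with_star == joined_with_join

# With an Integer argument, Array#* repeats the array instead of joining it.
repeated = elements * 2
puts repeated.inspect
```

Given the equivalence, join("\n") would arguably read more clearly at the end of word_wrap, but the * form is shorter and does the same job.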

It looks like I’ve gotten to the end of this method, and therefore, the end of this blog post. Some of the lessons I’m taking away from this post include additional regular expression knowledge and a better understanding of how the multiplication operator of the array class behaves when operating on strings.
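For anyone who wants to experiment without loading Rails, the pieces covered in this post can be pulled together in a plain-Ruby sketch. The name simple_word_wrap and its keyword argument are my own invention for illustration. It leaves out the options handling of the real helper and is not the Rails implementation itself.

```ruby
# Hypothetical stand-alone version for experimenting; not part of Rails.
def simple_word_wrap(text, line_width: 80)
  pattern = /(.{1,#{line_width}})(\s+|$)/

  # Split on existing newlines, wrap only the lines that are too long,
  # then glue everything back together with a newline between lines.
  text.split("\n").map do |line|
    line.length > line_width ? line.gsub(pattern, "\\1\n").strip : line
  end.join("\n")
end

puts simple_word_wrap('Level Five Solutions', line_width: 10)
puts simple_word_wrap("short\nThe quick brown fox", line_width: 10)
```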

I have not yet decided what my next post will cover (or when I’ll get it done) but I hope to continue the series by diving into other useful code looking for interesting techniques and cool “tricks”. If anyone reading this series has an idea for some code that would make a good topic, just post it to the comments and I’ll try to cover it in a future post.
