ben hoskings

ruby 2.0 by example

Since it’s ruby’s 20th birthday today, I thought I’d write about what we can expect in the imminent 2.0 release. Version 2.0 is a great release with a strong theme, and that’s convention—2.0 formalises useful conventions that us rubyists have grown to use, and it judiciously adds language features to address bad conventions.

There are four big front-facing changes in 2.0—keyword arguments, refinements, lazy enumerables, and prependable modules. Here’s my take on all four, using lots of code samples from my RubyConf AU talk. I’ve also written up some of the lesser-known changes, which together with these make for a very promising release.


Keyword arguments

Keyword args are the perfect example of convention in 2.0. They very neatly address the pattern of passing optional named args in a trailing hash.

Some have criticised the feature for not being a true named argument implementation, like python has. That’s true, but I don’t think it’s a fair criticism because they’re not intended as such: they’re quite focused on solving just the optional trailing argument problem.

def render(source, opts = {})
  opts = {fmt: 'html'}.merge(opts)
  r = Renderer.for(opts[:fmt])
  r.render(source)
end

render(template, fmt: 'json')
This is the essence of the design we’re used to: pass a hash as the final argument, merge it into the defaults, and then pull keys out of it as required.

In ruby 2.0, this convention has a language-level version:

def render(source, fmt: 'html')
  r = Renderer.for(fmt)
  r.render(source)
end

render(template, fmt: 'json')
Beautiful!

A couple of things to notice here. Firstly, no default handling is required, because it’s part of the definition. Secondly, both the definition and the calls are written using the 1.9 hash syntax, so keyword args already feel familiar.

Here’s a real-world example from within actionpack.

def accepts_nested_attributes_for(*attr_names)
  options = {
    :allow_destroy => false,
    :update_only => false
  }
  options.update(attr_names.extract_options!)
  options.assert_valid_keys(
    :allow_destroy,
    :reject_if,
    :limit,
    :update_only
  )
  # ...
end
As I said on Friday, I need a cup of tea and a lie down by the time I get to the method body.

Notice that there are three jobs being done by that code: extraction from the splatted args, rejection of invalid arguments, and default handling. Here’s the 2.0 equivalent:

def accepts_nested_attributes_for(*attr_names,
  allow_destroy: false,
  update_only: false,
  reject_if: nil,
  limit: nil
)
  # ...
end
Much better! But a tea would be great.

This is about as condensed as that information can get, and all three jobs the manual version was doing—splat extraction, unexpected args, and defaults—are taken care of. (Defaults are mandatory, and unexpected keys raise an ArgumentError.) All in all a great feature that I think will clean up a lot of codebases.


Refinements

Matz announced a new feature for ruby 2.0 called classboxing, “non-global monkey patching”, at RubyKaigi ‘10. Since then the idea has evolved and the result, refinements, are an experimental feature in 2.0. They’re a way of patching a class only within a certain scope.

module Patches
  refine Array do
    def collapse pattern
      grep(pattern).map {|i| i.sub(pattern, '') }
    end
  end
end

using Patches

`git branch`.split("\n").collapse(/\* /)
  #=> ['master']
This is a refinement. The patch is only active in a given lexical scope after the using call.

At present, you can only call using at the top level of a file, but in future it will most likely be callable within a class or module too.

Note also that refinements don’t involve any syntax changes: both refine and using are new methods in the standard library, not new keywords. Because no syntax has changed, we could build a fallback mechanism for 1.9 that instead applied the refinement as a traditional monkey patch:

class Module
  def refine(klass, &block)
    klass.module_eval(&block)
  end
end

class Object
  def using(refinement)
    # Nothing to do!
  end
end
The golden rule of monkey patching applies: just because you could, doesn’t mean you should.

Anyhow, there’s been a lot of discussion about how refinements should work, most of it around the merits of local rebinding. (Local rebinding means that a binding’s active refinements would also be active in a proc run against that binding, even if the refinements weren’t active in the proc’s defining scope.) Refinements were originally going to include local rebinding, but as it stands today, they don’t—refinements are lexically (i.e. statically) scoped.

The local rebinding discussion is about the choice between lexical and dynamic scoping, and boils down to this: should the refinements in the current runtime scope, or those from the lexical scope where the code was defined, be the ones that count?

Charles Nutter argued strongly that lexical scoping is the right choice. Yehuda Katz wrote a great post on the subject too, arguing that dynamic scoping would allow code to be run against unexpected refinements after the fact, which is the exact problem with monkey patching that refinements aim to solve.

I agree with Charles and Yehuda that lexical scoping is the right choice. It does have a surprising implication, though, that it’s important to be aware of.

class Dict
  def word_list
    File.open('/usr/share/dict/words')
  end

  def long_words
    word_list.sort_by(&:length).reverse
  end
end

Dict.new.long_words.take(10)
  #=> ['scientificophilosophical', ...]
The simple case: #long_words calls #word_list to obtain the full list of words to operate on.

Suppose that we refine this class to reimplement #word_list.

module NameDict
  refine Dict do
    def word_list
      File.open('/usr/share/dict/propernames')
    end
  end
end

using NameDict

Dict.new.long_words.take(10)
  #=> ['scientificophilosophical', ...]
This returns the longest words from the original dictionary, not the longest names as the refinement would. Huh?

This is the implication of strict lexical scoping: only refinements that were active at the static callpoint count. In this case, we’re not calling #word_list in a refined scope.

To see why, trace where the call to Dict.new.long_words goes. At the callpoint, the NameDict refinement is active. But the refined method, #word_list, isn’t called in that scope: it’s called by the #long_words method up there in the original class, and in that scope the refinement isn’t active.

This seems surprising and a bit limiting at first, but it actually makes a lot of sense, because what looks like a limitation here is actually the exact constraint we need in order to make monkey patching safe. Along with this constraint comes a powerful guarantee: code will always call the version of the method it was written to call. This is good isolation, and means that refinements won’t cause collateral damage.


Lazy enumerables

Declarative list programming is even nicer in 2.0 thanks to lazy lists. Expensive and even infinite lists are cheaply useable now, because their elements are only evaluated as they’re requested.

def natural_numbers
  (1..Float::INFINITY).lazy
end

def primes
  natural_numbers.select {|n|
    (2..(n**0.5)).all? {|f|
      n % f > 0
    }
  }
end

primes.take(10).force
  #=> [1, 2, 3, 5, 7, 11, 13, 17, 19, 23]
Here we’re filtering the natural numbers (an infinite list) to just those that are prime (another infinite but less dense list). The #force method is just an alias to #to_a.

This is great. Suddenly we can deal with unweildy lists the way ruby does best: by giving them clear names and chaining them meaningfully.

Custom enumerators can be defined for other types of data, too. It’s already possible to read a file line-by-line using #gets, but still, here’s an example:

def lazy_lines(io)
  Enumerator::Lazy.new(io) do |enum, i|
    enum << i
  end
end

def words
  lazy_lines(File.open('/usr/share/dict/words'))
end

words.select {|l| l.length > 12 }.take(10).force
  #=> ['abalienation', 'abarticulation', ...]
This is an example of enumerating a custom type (in this case an IO), by defining a new lazy enumerator. This would be an expensive operation, but laziness means we can stop reading as soon as we have 10 words that are long enough.

And now, a bit more on refinements

Here’s a rewrite of our infinite list example, this time pushing the prime number logic into a refinement on Fixnum. Check the two select styles at the bottom for another lexical scoping surprise.

module Maths
  refine Fixnum do
    def prime?
      (2..(self**0.5)).all? {|f|
        self % f > 0
      }
    end
  end
end

def natural_numbers
  (1..Float::INFINITY).lazy
end

using Maths

natural_numbers.select {|i| i.prime? }.take(10).force
  #=> [1, 2, 3, 5, 7, 11, 13, 17, 19, 23]

natural_numbers.select(&:prime?).take(10).force
  #=> no superclass method 'prime?' for 1:Fixnum
 

In the second example, the & in &:prime? invokes Symbol#to_proc to define a block equivalent to the literal one in the first example. But that process happens away inside the Symbol class, where no refinement is active. That is, at the point those values actually receive the #prime? method, there’s no refinement active.

My guess is that this will be added as a special case, because at this point &:method is a language feature in its own right.

Either way, the integration of refinements and the way they feel are definitely unfinished, but I think they’re looking very promising.


Module prepending

Like keyword arguments, module prepending addresses a convention that’s evolved by solving it at the language level. In this case, the convention is around modules that wrap method calls in their host classes.

class Template
  def initialize(erb)
    @erb = erb
  end
  def render values
    ERB.new(@erb).result(binding)
  end
end

module RenderProfiler
  def self.included base
    base.send :alias_method, :render_without_profiling, :render
    base.send :alias_method, :render, :render_with_profiling
  end
  def render_with_profiling values
    start = Time.now
    render_without_profiling(values).tap {
      $stderr.puts "Rendered in #{Time.now - start}s."
    }
  end
end

class Template
  include RenderProfiler
end
To wrap a method call from an included module, you have to alias the original out of the way, then call it from your module method, which you alias into the original’s place. It works, but it’s not great.

Ideally, you’d wrap calls at the language level, by calling super, but this isn’t possible from an included module. Super punts the method call to the next class in the ancestry, and an included module is already behind its host class.

Template.ancestors
  #=> [Template, RenderProfiler, Object, Kernel, BasicObject]
The RenderProfiler module was included in Template, and so is behind Template in its ancestry.

And that’s the key to how Module#prepend works. Prepending a module is no different to including it, with one exception: the prepended module comes before its host in the resulting ancestry. This means that modules in 2.0 can cleanly wrap methods in their hosts.

module RenderProfiler
  def render values
    start = Time.now
    super(values).tap {
      $stderr.puts "Rendered in #{Time.now - start}s."
    }
  end
end

class Template
  prepend RenderProfiler
end

Template.ancestors
  #=> [RenderProfiler, Template, Object, Kernel, BasicObject]
Much cleaner! We don’t need any roundabout aliasing anymore, because calling super from the module calls up the ancestry to the corresponding method in the host (notice the module at the front of the ancestry, ahead of its host class).

Well there you have it. There’s lots to like about 2.0. It’ll be a while before frameworks and gems can take full advantage of it, since using these features will break backwards compatibility in those gems. But I’m looking forward to putting ruby 2.0 into production. What could possibly go wrong?