polishing ruby by ryan davis

What are block args?

Published 2012-08-25 @ 10:00

Tagged ruby_parser

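In ruby we have a bunch of different kinds of arguments. We have call arguments, method arguments, and block arguments, plus multiple assignment, which behaves enough like arguments that it belongs in the same conversation.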

Let’s take a look at how the ruby_parser treats these things:

Call arguments

Call arguments are the easiest to understand, so they go first. Arguments in a call are just expressions to be evaluated before the actual function call is made. There is nothing special or unusual about them:

f 1, 2, 3

parses as:

s(:call, nil, :f, s(:lit, 1), s(:lit, 2), s(:lit, 3))

Remember that yield is just another method call.
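
To make "evaluated before the actual function call is made" concrete, here is a minimal sketch in plain ruby. The `note` helper and the `LOG` and `RESULT` constants are names invented for this illustration; nothing here touches ruby_parser.

```ruby
# Record the order in which things happen.
LOG = []

# Hypothetical helper: logs that an argument expression ran, then returns it.
def note(value)
  LOG << "arg #{value}"
  value
end

def f(a, b, c)
  LOG << "call f"
  a + b + c
end

# Same shape as `f 1, 2, 3`, but each argument is an expression with a side effect.
RESULT = f note(1), note(2), note(3)

p LOG    # the three argument expressions are logged before "call f"
p RESULT
```

Each argument is just an ordinary expression in the call sexp, which is why the parsed form needs nothing more than a flat list of sub-expressions.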

Method Arguments

Method arguments are pretty much just declared slots, with some minor exceptions. So this:

def f a, b = 1, *c; end

parses as:

s(:defn, :f, s(:args, :a, :b, :"*c",
               s(:block, s(:lasgn, :b, s(:lit, 1)))),
  s(:nil))

Special arg types like splat args and block args just have their name modified. This was cheap and easy and for most purposes works fine.

For optional arguments, a block gets added to the end of the args sexp that has lasgns for each of the optional arguments. This is how 1.8 treated it, so this is how I did it in ParseTree and therefore ruby_parser.

Without optional arguments, there is no structure to the args.
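
To show what consuming that layout involves, here is a rough sketch of pulling an args sexp apart. It uses plain nested arrays as a stand-in for real sexps, and `s` and `classify_args` are hypothetical helpers written for this example, not part of ruby_parser. It assumes the 1.8-style layout described above: modified names for splat and block args, and a trailing block of lasgns for defaults.

```ruby
# Stand-in for the sexp gem's s() helper: just build a nested Array.
def s(*items)
  items
end

# Hypothetical helper: split an s(:args, ...) node into its kinds of slots.
def classify_args(args)
  _type, *rest = args

  # A trailing s(:block, ...) holds one lasgn per optional argument.
  defaults = {}
  if rest.last.is_a?(Array) && rest.last.first == :block
    rest.pop.drop(1).each do |(_lasgn, name, value)|
      defaults[name] = value
    end
  end

  out = { required: [], optional: {}, splat: nil, block: nil }
  rest.each do |name|
    text = name.to_s
    if text.start_with?("*")
      out[:splat] = text[1..-1].to_sym       # :"*c" => :c
    elsif text.start_with?("&")
      out[:block] = text[1..-1].to_sym       # :"&blk" => :blk
    elsif defaults.key?(name)
      out[:optional][name] = defaults[name]
    else
      out[:required] << name
    end
  end
  out
end

ARGS = s(:args, :a, :b, :"*c", s(:block, s(:lasgn, :b, s(:lit, 1))))
CLASSIFIED = classify_args(ARGS)
p CLASSIFIED
```

Note how much string poking is needed to recover what kind of slot each name is. That is the cost of the cheap and easy encoding.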

Multiple Assignment

Multiple assignments deserve attention as well, which will be evident soon.

a, b, c = f

parses as:

s(:masgn,
  s(:array,
    s(:lasgn, :a),
    s(:lasgn, :b),
    s(:lasgn, :c)),
  s(:to_ary, s(:call, nil, :f)))

There is a left-hand side (LHS) and a right-hand side (RHS).

The LHS is what we’re interested in. There is an implied set of parentheses on the outside of the LHS and each set of parens implies an extra set of masgn/array nodes in the sexp. Each masgn node represents a destructuring event during the assignment. A splatted arg on the LHS is represented with a splat node. Otherwise, the LHS is static and unsurprising.

The RHS is the source of the data and completely irrelevant to this discussion except to note that the RHS evaluates its expression and the result destructures into the LHS to populate the variables.
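
The "each set of parens is a destructuring event" idea is easy to see at runtime. This is a small sketch in plain ruby; the return value of `f` and the constant names are made up for the illustration.

```ruby
def f
  [1, [2, 3], 4, 5]
end

# Only the implied outer parens: one destructuring step.
# The inner array lands in b untouched and the extra element is dropped.
a, b, c = f
FLAT = [a, b, c]

# An explicit inner set of parens adds a second, nested destructuring step,
# and the splat collects whatever is left over.
x, (y, z), *rest = f
NESTED = [x, y, z, rest]

p FLAT
p NESTED
```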

Block Arguments

The conundrum. On one hand, a block is essentially an anonymous function, so the args should be treated like method arguments. On the other hand, block arguments are provided by evaluating the arguments to a yield, so they should be treated like multiple assignment.
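
Both views show up in ruby's runtime behavior. The sketch below is plain ruby with made-up names: it uses a `proc` to stand in for an ordinary block and a `lambda` for the method-like case, as they behave in 1.9 and later.

```ruby
pair = [1, 2]

# An ordinary block (wrapped with proc here) takes its arguments like a
# multiple assignment: one array is destructured, missing values become nil.
as_block = proc { |a, b, c| [a, b, c] }
BLOCK_STYLE = as_block.call(pair)

# The same thing written as an actual multiple assignment.
x, y, z = pair
MASGN_STYLE = [x, y, z]

# A lambda takes its arguments like a method: the arity is enforced.
as_method = lambda { |a, b, c| [a, b, c] }
LAMBDA_STYLE =
  begin
    as_method.call(pair)
  rescue ArgumentError
    :argument_error
  end

p BLOCK_STYLE
p MASGN_STYLE
p LAMBDA_STYLE
```

Same parameter list, two different sets of rules, which is exactly why neither existing sexp shape is an obvious fit.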

I have no fucking clue how block arguments should be represented. Here is how they are represented in ruby_parser for 1.8:

lambda { |a, b, c| ... }

parses as:

s(:iter,
  s(:call, nil, :lambda),
  s(:masgn, s(:array, s(:lasgn, :a), s(:lasgn, :b), s(:lasgn, :c))),
  ...)

So you can see, there is an masgn, so in many ways it is just a multiple assignment…

Except thanks to 1.9 it isn’t.

f { |a, b=1, *c| }

parses to? There is no analog in ruby_parser yet… so this is where I’m stuck.

If I use ruby19 --dump parsetree and pass it through a processor that I wrote and then squint really hard, it mostly looks like:

s(:scope,
  s(:iter, s(:call, nil, :f),
    s(:args, :a, :b, :"*c",
      s(:block, s(:lasgn, :b, s(:lit, 1)))),
    ...))

Only that looks like method declaration args…
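
For what it's worth, ruby's own reflection describes block parameters in method-like terms too. This is a sketch using `Proc#parameters` and `Method#parameters` (available since 1.9.2); the method `m` and the constant names are invented for the comparison, and none of this says how ruby_parser should represent anything.

```ruby
# The same parameter list on a plain block, a lambda, and a method.
BLOCK_PARAMS  = proc   { |a, b = 1, *c| }.parameters
LAMBDA_PARAMS = lambda { |a, b = 1, *c| }.parameters

def m(a, b = 1, *c); end
METHOD_PARAMS = method(:m).parameters

p BLOCK_PARAMS   # a plain block reports a as optional, since blocks are lenient
p LAMBDA_PARAMS  # a lambda reports a as required
p METHOD_PARAMS  # identical to the lambda
```

The shape is the same in all three cases. Only the required-versus-optional flag on `a` differs, which hints that the masgn-style leniency is a property of how the block is called rather than of how its parameters are declared.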

So what should it be?

Honestly, I have no fucking clue. That’s where you come in. I need to do something to normalize the following constructs:

# call args
s(:call, nil, :f, s(:lit, 1), s(:lit, 2), s(:lit, 3))

# method args
s(:defn, :f, s(:args, :a, :b, :"*c",
               s(:block, s(:lasgn, :b, s(:lit, 1)))),
  ...)

# masgn
s(:masgn,
  s(:array,
    s(:lasgn, :a),
    s(:lasgn, :b),
    s(:lasgn, :c)),
  ...)

# regular boring block args
s(:iter,
  s(:call, nil, :lambda),
  s(:masgn, s(:array, s(:lasgn, :a), s(:lasgn, :b), s(:lasgn, :c))),
  ...)

# and the new crazy block args:
s(:scope,
  s(:iter, s(:call, nil, :f),
    s(:args, :a, :b, :"*c",
      s(:block, s(:lasgn, :b, s(:lit, 1)))),
    ...))

How do you think it should look?