polishing ruby by ryan davis

2013-04-08

Safari Split Window

Published 14:56 - Safari Split Window

Save this in ~/Library/Scripts/Applications/Safari/ and name it something like “Split Window at Tab”. It isn’t perfect, as it seems to reopen the url instead of just moving the tab. That might interrupt any transient data in your current session. But it is good enough for me to fire up a bunch of urls and then put them into their own window in one fell swoop.

tell application "Safari"
  set n to index of current tab of window 1
  set m to count of tabs of window 1
  
  make new document
  
  repeat with i from n to m
    move tab n of window 2 to window 1
  end repeat
  
  delete tab 1 of window 1
end tell

2013-03-28

Fuzzy Duplication Detection in Flay

Published 14:56 - Fuzzy Duplication Detection in Flay

def a
  f1; f2; f3; f4; f5; f6
end

def b
      f2; f3; f4; f5; f6
end

def c
      f2; f3; f4; f5; f6; f7
end

Given these three methods: a, b, and c, flay will currently report that a and c are structurally similar.

    % flay -m 5 bogus_example.rb

    Total score (lower is better) = 16

    1) Similar code found in :defn (mass = 16)
      bogus_example.rb:1
      bogus_example.rb:9

In c, the call to f1 has been removed and a call to f7 has been added. Structurally, the two methods are the same. What does “structurally the same” mean? It means that if you strip all the names and values out, the code has the same tree shape (and node types, but that’s a digression). Flay is just an intelligent tree matcher.
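Here is a minimal sketch of what “strip all the names and values out” could look like, assuming ASTs are plain nested arrays whose first element is the node type. The `shape` helper and the toy trees are hypothetical illustrations, not flay's actual code:

```ruby
# Hypothetical helper: keep each node's type and recurse into child nodes,
# dropping every leaf (method names, literals, nil receivers).
def shape(node)
  return nil unless node.is_a?(Array)
  [node.first, *node.drop(1).map { |child| shape(child) }.compact]
end

# Toy trees, shortened to two calls each to keep the example small.
a = [:defn, :a, [:args], [:call, nil, :f1], [:call, nil, :f2]]
c = [:defn, :c, [:args], [:call, nil, :f2], [:call, nil, :f7]]
b = [:defn, :b, [:args], [:call, nil, :f2]]

same_ac = shape(a) == shape(c) # names differ, tree shape does not
same_ab = shape(a) == shape(b) # b has one fewer statement
```

With the names gone, a and c collapse to the same tree, while b, being one statement shorter, does not.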

But what about when they’re not structurally the same? What happens when a sloppy developer copies a method and comments out one or two expressions (as in b above)? That gets past flay currently because the trees aren’t the same shape anymore. What flay needs is to be able to say that they’re a “fuzzy” match and report them.

I can quickly think of 2 ways to detect that b is similar to a & c. The first is to apply some sort of tree-based distance algorithm to the diagonal matrix of all applicable nodes and report any combinations that have a distance under a certain threshold. Quite frankly, I’m hesitant to go this route simply because of the complexity of it. Wikipedia hints that the problem of graph isomorphism falls into NP but it is unknown whether it falls into NP-complete or P (tho this is a special case because ASTs are planar graphs and that restricts a lot of the complexity). It sounds like an interesting problem, but not necessarily one I want to take on for flay. Flay is currently conceptually really “simple” and I’d like to keep it that way.

The second way is to try to piggyback on flay’s current main data structure and algorithm and let it do all of the work the way it currently does. Flay stores a bunch of “buckets” of structurally similar nodes that look something like this:

all_nodes.group_by(&:structural_hash).delete_if { |_,v| v.size == 1 }

(boy do I wish I could actually write it that simply!)

If I could get subset matches into that structure, then I’m done. It turns out that this isn’t that hard and only took me an afternoon to get something scratched up that felt right and was efficient enough to use against real code bases. Running flay rails vs flay -f rails only adds 30 seconds and 19 megs of RSS, but it also detected a lot more, resulting in a flay score increase of 9464 and ~700 extra lines of report output.

So… what’s a subset? Well… in this case, I’m mostly interested in detecting when a developer copies a method (or whatever) and simply adds or removes one or two things. Much more than that and you’re going to wind up with a mess in your reports. By using Array#combination I can generate a bunch of variations on an AST and then structurally compare those to the original ASTs.

Something like this:

(ast.size - 1).downto(ast.size - difference) do |n|
  ast.combination(n).each do |subast|
    self.hashes[subast.structural_hash] << subast
  end
end

Of course, it isn’t that clean and simple. Much more went into it to produce performant code and to prevent nonsensical comparisons… but that’s the gist of it. The nice thing about this is that it uses ruby built-ins to do 90% of the work and then simply folds into flay’s current data structure. I didn’t really have to do much more than that. You can see it work here:

    % flay -m 5 -f bogus_example.rb

    Total score (lower is better) = 37

    1) Similar code found in :defn (mass = 21)
      bogus_example.rb:1 (FUZZY)
      bogus_example.rb:5
      bogus_example.rb:9 (FUZZY)

    2) Similar code found in :defn (mass = 16)
      bogus_example.rb:1
      bogus_example.rb:9
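
To make the subset idea concrete, here is a toy sketch that files each method body, plus its one-statement-shorter variations, into the same kind of buckets. It assumes every call has an identical shape, so a body is just a list of `[:call]` nodes. `fuzzy_buckets` and its arguments are hypothetical names, and the `uniq` is my own shortcut rather than anything flay is known to do:

```ruby
# Toy sketch, not flay's implementation.
def fuzzy_buckets(methods, difference = 1)
  buckets = Hash.new { |h, k| h[k] = [] }

  methods.each do |name, body|
    buckets[body] << name # exact structural match

    (body.size - 1).downto(body.size - difference) do |n|
      next if n < 1
      # uniq keeps identical-looking subsets from being filed twice
      body.combination(n).to_a.uniq.each do |sub|
        buckets[sub] << "#{name} (FUZZY)"
      end
    end
  end

  # A bucket with a single entry has nothing to be similar to.
  buckets.delete_if { |_, v| v.size == 1 }
end

call    = [:call]
methods = { "a" => [call] * 6, "b" => [call] * 5, "c" => [call] * 6 }
result  = fuzzy_buckets(methods)
```

For this input two buckets survive: the six-call bucket holding a and c, and the five-call bucket holding b alongside fuzzy variations of a and c, which lines up with the two groups in the report above.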

I just committed my initial implementation. Take a look if you’d like.

2013-01-23

One Way 1.9 Drives Me Nuts

Published 14:56 - One Way 1.9 Drives Me Nuts

I raised a fuss when Matz proposed adding the ability to define ! and != on a class. The idea that you can contradict simple logic was befuddling and seemed like a really bad design choice. Despite many of my other proposals getting shot down with “that might confuse a developer” or “that could cause problems in [obscure edge case]”, this one was defended with “I trust the developer to be smart”.

In 1.8, an unless statement is normalized to a negated if statement, such that the following are all equivalent:

a unless b

a if ! b

if b then
  nil
else
  a
end

When ruby parses code, they all wind up being treated like the 3rd form. This makes sense. You apply simple logical transformations and normalize the code.

But, this has been thrown out the window in 1.9.

Instead, in 1.9 the first and the third are equivalent:

a unless b

# becomes:

if b then
  nil
else
  a
end

but in the second ! goes down an entirely different code path:

a if ! b

# becomes:

if b.!() then
  a
end

So if you have a mix of programming styles (or a mix of programmers) you can have entirely different results. Won’t that be fun to debug?
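
A small sketch of the divergence, using a hypothetical `Contrary` class whose only job is to override `!`:

```ruby
# Hypothetical class: the object itself is truthy, but its ! method
# also returns true.
class Contrary
  def !
    true
  end
end

b = Contrary.new

via_unless = (:ran unless b) # tests b's truthiness directly
via_if_not = (:ran if !b)    # dispatches to b.!
```

On 1.9 and later, `via_unless` ends up nil because b is truthy, while `via_if_not` ends up `:ran` because the overridden `!` returned true. Same intent, two styles, two answers.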

The same is true for != vs ==. What was normalized in 1.8 as:

a != b

!(a == b)

# both become:

!(a.==(b))

but in 1.9:

a != b

!(a == b)

# become:

a.!=(b)

(a.==(b)).!

Again… a debugging nightmare. I don’t see why we have this feature. It simply seems like trouble waiting to happen.

If we’re going to allow you to contradict logic, we should at least do it in a consistent manner. Everything should normalize towards !. Such that these are all equivalent:

a unless b

a if ! b

# become:

if b.!() then
  a
end

The != vs == case makes less sense, honestly, since you can also normalize towards == with !… The fact that you can contradict yourself 3 ways in a single class means that there isn’t any one way to normalize. I don’t think there is a clean solution.

require "minitest/autorun"

class HaHa
  def == o
    HaHa === o
  end

  alias != ==

  def !
    true
  end
end

describe HaHa do
  it "must be as confusing as possible" do
    assert HaHa.new == HaHa.new
    assert HaHa.new != HaHa.new
    assert HaHa.new
    assert !HaHa.new
  end
end

passes with:

# Running tests:

.

Finished tests in 0.000615s, 1626.0163 tests/s, 6504.0650 assertions/s.

1 tests, 4 assertions, 0 failures, 0 errors, 0 skips

2012-12-12

Minitest Parallelization and You

Published 14:56 - Minitest Parallelization and You

As announced at RubyConf 2012, minitest 4.2.0 was released with parallelization added. As I said then, I don’t care in the slightest about trying to make your tests run faster. You deserve your pain if you write slow tests. What I do care about greatly is making your tests hurt and this will do that.

Minitest added randomization back in 2008 (afaik, before anyone else; it even took rspec 4 years to follow suit) as an easy means of ensuring your tests were actually independent of each other. Parallelization will take that a step further and make sure you don’t have any dependencies across test suites.

So, how do you use it? Parallelization is opt-in only but very easy to do:

class TestMyPain < Minitest::Unit::TestCase
  parallelize_me!

  # ...
end

describe MyPain do
  parallelize_me!

  # ...
end

All parallelize_me! does is override test_order to return :parallel. The runner will take care of the rest. All parallel suites will be run in parallel. Remaining serial suites will be run after.
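
The mechanism can be sketched without minitest at all. Everything below is a toy stand-in: `ToySuite`, `FastSuite`, and `StdoutSuite` are hypothetical classes, and the partition step is only my guess at the general shape of what a runner does, not minitest's actual source:

```ruby
# Toy stand-in for a test case base class.
class ToySuite
  def self.test_order
    :random
  end

  # Opting in just redefines test_order on the class that calls it.
  def self.parallelize_me!
    define_singleton_method(:test_order) { :parallel }
  end
end

class FastSuite < ToySuite
  parallelize_me!
end

class StdoutSuite < ToySuite
  # stays serial
end

# A runner can then split suites by what they report.
parallel, serial = [FastSuite, StdoutSuite].partition do |suite|
  suite.test_order == :parallel
end
```

Because the override lives on the class that called `parallelize_me!`, sibling suites that never opted in keep the default order.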

If you want to try it out globally, you can shortcut via:

require "minitest/hell"

It changes the default order from :random to :parallel. Expect stuff to break; it is named appropriately.

What Should You Make Parallel?

Ideally, everything… But that’s not terribly realistic. Some things are just better left serialized. Tests that touch $stdout, $stderr, and $stdin are a good example: because those are global, they’re perfectly happy to print whatever you want whenever you want, but capture_io and assert_output won’t work right. You should move those tests to a serial suite so everything else can run parallel.
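
Here is a sketch of why that happens, assuming a capture helper that works by swapping the global `$stdout` for a StringIO. The `capture` method below is a hypothetical simplification, not minitest's capture_io:

```ruby
require "stringio"

# Hypothetical helper: temporarily replace the process-wide $stdout.
def capture
  old, $stdout = $stdout, StringIO.new
  yield
  $stdout.string
ensure
  $stdout = old
end

out = capture do
  # Stands in for some other test running on another thread meanwhile.
  Thread.new { puts "from another test" }.join
  puts "mine"
end
```

The other thread's output lands in `out` right next to "mine", because there is only one `$stdout` for the whole process. The join makes it deterministic here; in a real parallel run it would be a race.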

Try it out. Have fun. Find and fix bugs. Enjoy.

2012-12-10

assert_in_delta

Published 14:56 - assert_in_delta

“So, supposedly Test::Unit was too big. lib/ruby/1.8/test/unit (without ui classes): 1,124 LOC. lib/ruby/1.9.1/minitest: 1,091 LOC.”

jcoglan made this snarky comment about the size of minitest vs test/unit. Let’s ignore the fact that the original minitest was <99 lines long and implemented enough of test/unit (which I was maintaining at the time) to run all the unit tests on all of my projects. Let’s also ignore the fact that minitest is a lot more than just a test/unit replacement at this point. But apparently we need to re-re-revisit the fact that LOC doesn’t mean anything and complexity means everything.

Let’s revisit assert_in_delta from my Size doesn’t Matter talk. test/unit’s:

public
def assert_in_delta(expected_float, actual_float, delta, message="")
  _wrap_assertion do
    {expected_float => "first float", actual_float => "second float", delta => "delta"}.each do |float, name|
      assert_respond_to(float, :to_f, "The arguments must respond to to_f; the #{name} did not")
    end
    assert_operator(delta, :>=, 0.0, "The delta should not be negative")
    full_message = build_message(message, <<EOT, expected_float, actual_float, delta)
<?> and
<?> expected to be within
<?> of each other.
EOT
    assert_block(full_message) { (expected_float.to_f - actual_float.to_f).abs <= delta.to_f }
  end
end

versus minitest’s:

def assert_in_delta exp, act, delta = 0.001, msg = nil
  n = (exp - act).abs
  msg = message(msg) { "Expected |#{exp} - #{act}| (#{n}) to be < #{delta}"}
  assert delta >= n, msg
end

Personally, I think that speaks for itself. If that doesn’t convince you, look at the rest of my Size doesn’t Matter talk starting around slide 70 “Minitest Design Rationale”.
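
For anyone who hasn't needed the assertion before, here is a standalone sketch of the comparison it performs. `in_delta?` is a hypothetical predicate written for this example, not part of either library:

```ruby
# Hypothetical predicate mirroring the check: is |exp - act| within delta?
def in_delta?(exp, act, delta = 0.001)
  (exp - act).abs <= delta
end

sum   = 0.1 + 0.2
exact = (sum == 0.3)        # floating point rounding makes this false
close = in_delta?(0.3, sum) # well within the default delta
```

That is the whole job: floats that should be equal usually aren't, so you compare within a tolerance instead.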
