November 2008 Archives

fat code is slow code

| | Comments (0)

I've been poking around in a lot of gems lately and I found this beauty (I added the 1 to the name for obvious reasons below):

def load_tags1()
  x = Array.new() # Group and Element ID
  y = Array.new() # Value representation
  z = Array.new() # Name
  # E.1 Registry of DICOM command elements
  # Group 0000
  x+=["0000,0000"] and y+=[["UL"]] and z+=["Command Group Length"]
  x+=["0000,0001"] and y+=[["UL"]] and z+=["Length to End"] # RET
  x+=["0000,0002"] and y+=[["UI"]] and z+=["Affected SOP Class UID"]
  x+=["0000,0003"] and y+=[["UI"]] and z+=["Requested SOP Class UID"]
  x+=["0000,0010"] and y+=[["CS"]] and z+=["Recognition Code"] # RET
  # ... 2500 more lines like the above
  # Return the array information:
  return [x,y,z]
end

This one method flogs at an astounding 15538! As a comparison, my entire perforce repository flogs at 37391. The next highest method I've found flogged at 3685.

What's wrong with this code? Well, for starters, it flogs to 15k for a reason. That's a LOT of code and almost all of it need not exist to begin with. A very quick analysis shows that NONE of the and logic needs to exist. Those could be semicolons. Further, this is really all static data and the method returns the 3 columns... So why not treat them as columns and be done with the 3 variables? Here was my quick 1 minute rewrite (no, really):

def load_tags2()
  # E.1 Registry of DICOM command elements
  # Group 0000
  [["0000,0000", ["UL"], "Command Group Length"],
   ["0000,0001", ["UL"], "Length to End"],
   ["0000,0002", ["UI"], "Affected SOP Class UID"],
   ["0000,0003", ["UI"], "Requested SOP Class UID"],
   ["0000,0010", ["CS"], "Recognition Code"],
   # ... 2500 more lines like the above
  ].transpose
end

This returns exactly the same data. What does that flog to? 1. That's right, one. Why? Because there is a total of 1 method calls throughout the whole thing. The rest is all static data.

And my final rewrite:

def load_tags3()
  TAGS
end

Just realizes that the whole thing is static/constant so you should treat it as such. This probably flogs at less than one. I didn't bother figuring out because once you go from 15k to 1 the rest is cake.

The real important part is this: Flog scores mean something real. In this case, flog complexity not only translates to "hard to test", it also means "hard to run":

% ./quick.rb 1000
# of iterations = 1000
                          user     system      total        real
null_time             0.000000   0.000000   0.000000 (  0.000152)
load_tags1           61.910000   0.340000  62.250000 ( 62.716426)
load_tags2            3.540000   0.020000   3.560000 (  3.575550)
load_tags3            0.000000   0.000000   0.000000 (  0.000504)

load_tags1 is 17.5x slower that load_tags2, which is in turn 3580x slower than load_tags3.

It pays to flog.

Made my Day

| | Comments (0)

This has absolutely made my day:

Hi Ryan. I've just been sitting here playing with flay and realizing I use a lot of your stuff. So, I'm going to send you something from your Amazon wishlist...is there a book on there you particularly want?

If I don't hear back, I'll just pick from the last few you added.

thank you very very much

One of my students is using Smultron and the more I poke at it (about an hour now), the more I hate it. Despite claiming to support ruby, it doesn't have really simple features I've come to expect like "reindent code". So, are there any suggestions for ruby text editors. Minimum requirements would be:

  • Mac OS X
  • GUI, preferably w/ tabs
  • Free (yes, that means TextMate is right out).
  • Easy to set up (preferably NO set up, that excludes vim and emacs)
  • actually knows ruby, meaning:
    • syntax highlighting
    • can navigate to classes/modules/methods easily
    • knows what the indentation should be and can reindent

(tho, now that I think about it, I should find a good editor for windows as well, those guys seem to have the same pain)

Edited to Add:

Apparently when you bring up a religious argument like text-editors it is free license to not bother reading what one is actually asking. To clarify:

  • textmate: not free
  • macvim: requires setup to be halfway usable. worse, it is a patently horrible UI.
  • jedit: latest doesn't even launch, stable pretends to have reindent, but it doesn't work in the slightest... worst "mac" app ever, nothing about it conforms to the usual osx experience.
  • komodo edit: doesn't have reindent or navigation (that I could find). it obviously should know how, it marks appropriate fold points... but yeah. not usable.
  • textwrangler: again... no reindent. how hard is reading before suggesting?
  • aquaemacs: you're suggesting something you've never used??? C'mon...

xcode... actually got the closest, but it's reindent has a bug:

class X
  def x
    if y then
      42
      else
      24
      end
    end
  end

I said the following in my previous post about flay:

"Soon I will write up why flay kicks towelee [sic], PMD, and everyone else's tool in the ass... But I think the above is a damn good start."

There are going to be two classes of tools for this type of work: string-based tools and AST-based tools.

Towelie

Towelie is Giles' entry into the fray of ruby developer tools. Towelie is an AST-based tool that uses ParseTree to detect duplicate code at the method level.

There aren't nearly as many available to us rubyists, so it is worth a peek... But, after closer inspection, I just can't compare the two. Yes, towelie attempts to detect duplicated (but not similar) code, but that is where the similarities end, so the comparison doesn't seem fair.

Consider it an exercise for the reader.

PMD's CPD, Simian, Same, etc.

On the CPD page it says that the current version "was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm". In other words, it is a string-based tool. This family of tools have completely different objectives than flay. They're normalizing whitespace,stripping comments, and looking for duplicate code. That's great... it actually finds lots and lots of good stuff and I used to use same when starting with new clients.

But... (there is always a but...)

These tools would (could!) NEVER point this out:

Matches found in :defn (mass = 32)
  A: ../../drawr/dev/lib/drawr.rb:38
  B: ../../png/dev/lib/png.rb:181

A: def write(file)
B: def save(path)
A:   File.open(file, "wb") { |f| f.write(to_s) }
B:   File.open(path, "wb") { |f| f.write(to_blob) }
   end

especially considering the code is actually written like this:

def write(file); File.open(file, 'wb') { |f| f.write to_s }; end

vs:

def save(path)
  File.open path, 'wb' do |f|
    f.write to\_blob
  end
end

This is something that a duplicate code string-based scanner just can't do. Even the simplest change like {} vs do/end or changing your line wrapping on long conditionals will be missed... lost... ignored.

So, flay has the ability to go beyond simple copy/pasted code and detect real candidates for refactoring. That is something that the java folks don't seem to have (yet) for some reason. Certainly the foundation set by PMD means it is available, but it isn't there yet.

rubygems 1.2.x?

| | Comments (2)

do this now:

sudo gem install rubygems-update
sudo update_rubygems

no. really. you need to.

(unless you're on ubuntu/debian... then you're all sorts of screwed)

I Want Crazy

| | Comments (2)

I want to start a software company that is described in one word: CRAZY.

I want each person to be independent, yet contribute to the interdependent whole. I want to create, buy, lease, or otherwise steal all the necessary technology to empower my team to do what we need to make outstanding software. I want the team to be able to solve any problem. I want Fridays to be contest day, where we all compete to solve a concrete or abstract problem in the most elegant, creative way possible, and give a prize to the winner. I want a multi-colored hair team. I want code reviews to have a drum beat. I want the team to be the cells in a living entity that can NOT BE STOPPED. I want to create amazing software that speaks to people and says "I can do what you want, with style." I want our software to represent content, not marketing fluff. I want the members to teach each other everything they know. I want our words to be heard. I want our software to be seen. I want our solutions to be admired and respected. I want our team to be egoless, yet completely confident. I want managers to code. I want coders to manage. I want to have SWAT team uniforms for each member (not necessary for everyday work--just for 3rd party intimidation). I want money to be a non-issue. I want schedule to be determined by the team. I want personality, character, and culture in every team member. I want us to transcend stereotypes. I want us to transcend "average" software. I want us to transcend all barriers.

I want us to enjoy delivering what people REALLY want.


I wrote this many years ago. I still want this... slowly getting closer...

Flay is blowing my mind

| | Comments (0)

No, really. Flay is blowing my mind. It is just getting cooler and cooler. The next version of flay is going to have a verbose mode that will try to show an N-way diff:

Matches found in :when (mass = 84)
  A: unit/itemconfig.rb:182
  B: unit/itemconfig.rb:192
  C: unit/itemconfig.rb:207

A: when /^(#{__item_numstrval_optkeys(tagid(tagOrId)).join("|")})$/ then
B: when /^(#{__item_listval_optkeys(tagid(tagOrId)).join("|")})$/ then
C: when /^(#{__item_strval_optkeys(tagid(tagOrId)).join("|")})$/ then
A:   num_or_str(tk_call_without_enc(*(__item_cget_cmd(tagid(tagOrId)) << "-#{option}")))
B:   simplelist(tk_call_without_enc(*(__item_cget_cmd(tagid(tagOrId)) << "-#{option}")))
C:   _fromUTF8(tk_call_without_enc(*(__item_cget_cmd(tagid(tagOrId)) << "-#{option}")))

And Patrick Ritchie recently submitted additional similarity reporting that I'm going to work on folding in soon. His version allows extra fuzzier matching of copy-pasted code that has been edited!

Soon I will write up why flay kicks towelee, PMD, and everyone else's tool in the ass... But I think the above is a damn good start.

un provides unextend and uninclude to allow for a better prototype-oriented programming experience.

Changes:

1.0.0 / 2008-11-07

Flay analyzes ruby code for structural similarities. Differences in literal values, variable, class, method names, whitespace, programming style, braces vs do/end, etc are all ignored. Making this totally rad.

Changes:

1.0.0 / 2008-11-06

Lets you set the class of an object. Use at your own risk.

Changes:

1.0.0 / 2008-11-06

ruby2ruby provides a means of generating pure ruby code easily from ParseTree's Sexps. This makes making dynamic language processors much easier in ruby than ever before.

Changes:

1.2.1 / 2008-11-04

ruby_parser (RP) is a ruby parser written in pure ruby (utilizing racc--which does by default use a C extension). RP's output is the same as ParseTree's output: s-expressions using ruby's arrays and base types.

Changes:

2.0.1 / 2008-11-04

  • 2 minor enhancements:

    • Updated for changes to splat node in many contexts.
    • Made PT a developer dep
  • http://rubyforge.org/projects/parsetree/

ParseTree is a C extension (using RubyInline) that extracts the parse tree for an entire class or a specific method and returns it as a s-expression (aka sexp) using ruby's arrays, strings, symbols, and integers.

As an example:

def conditional1(arg1) if arg1 == 0 then return 1 end return 0 end

becomes:

[:defn, :conditional1, [:scope, [:block, [:args, :arg1], [:if, [:call, [:lvar, :arg1], :==, [:array, [:lit, 0]]], [:return, [:lit, 1]], nil], [:return, [:lit, 0]]]]]

Changes:

3.0.2 / 2008-11-04