RubyInline: April 2005 Archives

Ruby Go Zoom Zoom


(Please, Mazda, do not sue me.)

Everyone knows that Ruby isn't fast. (No, it isn't déjà vu; read on.) It's true, Ruby isn't fast. Matz himself gave a keynote at the Seattle RubyConf titled something like "Ruby: Fast, but Good" (please send me corrections if I misremember).

So, yeah. Slow. "Cool," I say, "most of the time it is fast enough." When it isn't fast enough, what you (should) do is:


  1. Exhaust all pure-ruby options:

    1. Think about your design and make sure it is correct/optimal.
    2. Look into trading off memory for speed (i.e., caching like mad).
    3. Double-check whether someone else has already solved your problem, etc.

  2. Profile your code.
  3. Identify the bottlenecks and possibly refactor them into bite-sized chunks.
  4. Rewrite the bottlenecks in C.
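
Step 1.2, trading memory for speed, often just means memoizing a function that has no side effects. The following is a minimal sketch under that assumption; the `Fib` class and its methods are hypothetical, chosen only because naive Fibonacci is the classic case where a cache turns exponential work into linear work:

```ruby
# Hypothetical example of "caching like mad": trade memory for speed by
# remembering every result in a Hash instead of recomputing it.
class Fib
  def initialize
    # Hash.new with a block computes a missing entry on first access and
    # stores it, so each n is computed at most once.
    @cache = Hash.new { |h, n| h[n] = n < 2 ? n : h[n - 1] + h[n - 2] }
  end

  # Returns the nth Fibonacci number, filling the cache as needed.
  def [](n)
    @cache[n]
  end

  # Number of results currently held in memory (the price paid for speed).
  def cached_entries
    @cache.size
  end
end

fib = Fib.new
puts fib[30] # => 832040
```

The cost is memory: after `fib[n]` on a fresh instance the cache holds n + 1 entries, and because the default block recurses, very large n can exhaust the stack.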

If you are lazy (smart?? nah...) like me, you'll use RubyInline to cut your C development time to a minimum. Most of the time this isn't too hard, and with RubyInline you don't spend any extra time dealing with Makefiles, extconf.rb files, setup scripts, etc.

Imagine this:

  % time ruby factorial.rb 5000000
  Iter = 5000000, T = 67.23166600 sec, 0.00001345 sec / iter
  real	1m7.310s
  user	0m55.980s
  sys	0m0.280s

That is not a terribly long time to be running, but factorial??? C'mon. It shouldn't be that slow for just 5 million calls! If we ran a profiler on the code, we'd see that nearly all of the time is in fact spent in the method factorial. We could pop in a quick call to inline, convert it to C, and ZOOOOM!
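The factorial.rb script itself isn't shown here. As a rough, hypothetical sketch, a harness that prints timings in the same format as the output above could look like this; the iterative `Factorial.factorial` body, the argument 10, and the `bench` helper are all assumptions, and `Process.clock_gettime` requires Ruby 2.1 or later:

```ruby
# Hypothetical stand-in for factorial.rb: times N calls to
# Factorial.factorial and reports total and per-iteration time.
class Factorial
  # Iterative factorial; the real benchmark's implementation is unknown.
  def self.factorial(n)
    result = 1
    2.upto(n) { |i| result *= i }
    result
  end
end

# Runs the method `iterations` times and formats the timing line.
def bench(iterations, n = 10)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { Factorial.factorial(n) }
  total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  format("Iter = %d, T = %.8f sec, %.8f sec / iter",
         iterations, total, total / iterations)
end

# Usage: ruby factorial.rb 5000000
puts bench(Integer(ARGV.fetch(0, "100000"))) if $PROGRAM_NAME == __FILE__
```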

But... what if you didn't have to write C code???

What if... all you had to do was add -rzenoptimize to the command-line??

  % time ruby -rzenoptimize factorial.rb 5000000
  *** Optimizer threshold tripped!! Optimizing Factorial.factorial
  Iter = 5000000, T = 13.30087900 sec, 0.00000266 sec / iter
  real	0m14.382s
  user	0m12.550s
  sys	0m0.290s

And ZOOM! It's automatic, reducing your runtime from 67 seconds to 14 seconds, almost a 5x increase in speed. Wouldn't that be nice? What if you could do that???

What if... I told you that I didn't type that output, I copied and pasted it?

Space vs Time


Everyone knows that Ruby isn't fast. The real irony is that our profiler is dreadfully slow, which makes profiling a task that some people don't want to deal with. Shugo recently answered this concern by writing a new profiler implemented in C. It is much, much faster. Running 50,000 iterations of my simple factorial benchmark, you get:

  Unprofiled Ruby:  1.019s
    Profiled Ruby: 53.234s
     Shugo's Prof:  2.500s

Not bad, eh? While Shugo's is faster, it is also rather... shall we say, inelegant? The stock pure-Ruby profiler clocks in at an elegant 65 lines of fairly readable code. That is actually what makes it so slow: it sacrifices speed for simplicity. Shugo's code is a total of 701 lines of C and Ruby. With roughly a 10x increase in size but a 20x increase in performance (for my very pathological example), Shugo's profiler sacrifices simplicity for speed. When you need the absolute fastest profiler out there, Shugo's profiler is the way to go, but I wouldn't want to maintain it.

I wanted to experiment with this thought: why can't you have both, or at least sacrifice a little of each for a bigger overall gain? I started by trying to port Shugo's code back to Ruby. It turns out that Shugo, being a Ruby-internals guru, makes such heavy use of Ruby's deep innards that I couldn't fully port it back with my feeble Ruby-internals skills. I got a fair portion of it done, but didn't want to attempt to pull in some of the internal data structures he was using. By the time I gave up, I had cut the size of the code by about one third.

OK. I'm no master like Shugo. I can live with that. No really... I'm not feeling the least bit self-conscious about it. :)

So I went the other route, using what I now knew of Shugo's code as a guide. I started with the 65-line pure-Ruby profiler and began porting it forward to C, using RubyInline. It turns out that all I had to port was the proc you register with set_trace_func, and that code is fairly simple. The result is fairly readable code clocking in at 182 lines, with a time of 6.441s. That is a trade-off of some simplicity for some speed. I think I can live with that.
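The proc registered with set_trace_func is called for every event with the event name, file, line, method id, binding, and class. To show the kind of bookkeeping that proc does, here is a minimal pure-Ruby sketch of a flat profiler. It is not the code of the 65-line profiler or of the 182-line port; `profile` and `Calc` are hypothetical names, and it assumes a modern Ruby (`Process.clock_gettime`, `Array#to_h` with a block):

```ruby
# Hypothetical flat profiler built on Kernel#set_trace_func. Each "call"
# pushes a frame; each "return" pops it and charges the method its self
# time (elapsed time minus time spent in the methods it called).
def profile
  self_time = Hash.new(0.0)
  calls = Hash.new(0)
  stack = [] # frames: [key, start_time, seconds_spent_in_callees]
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  set_trace_func proc { |event, _file, _line, id, _binding, klass|
    case event
    when "call", "c-call"
      stack.push(["#{klass}##{id}", clock.call, 0.0])
    when "return", "c-return"
      # Skip unmatched returns, e.g. set_trace_func's own c-return.
      next if stack.empty?

      key, started, in_callees = stack.pop
      elapsed = clock.call - started
      self_time[key] += elapsed - in_callees
      calls[key] += 1
      stack.last[2] += elapsed unless stack.empty? # charge the caller
    end
  }
  begin
    yield
  ensure
    set_trace_func(nil) # always unhook, even if the block raises
  end

  # { "Class#method" => [call_count, self_seconds] }
  calls.keys.to_h { |key| [key, [calls[key], self_time[key]]] }
end

class Calc
  def fact(n)
    n <= 1 ? 1 : n * fact(n - 1)
  end
end

report = profile { Calc.new.fact(10) }
report.sort_by { |_, (_, secs)| -secs }.each do |key, (count, secs)|
  printf("%-24s %6d calls %10.6f s self\n", key, count, secs)
end
```

Because every event runs a Ruby-level proc, the profiled code slows down enormously; moving just that proc into C is what the RubyInline port does.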

You can even see the difference, and to some extent, the trade-offs made (the data is a few days old):

(Figure: Time vs Size tradeoffs)
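The size and speed trade-offs can also be worked out from the numbers quoted in this post alone (the figure's data is a few days older and may differ slightly). A small sketch; the `tradeoffs` helper and the profiler labels are hypothetical:

```ruby
# Numbers quoted in this post: lines of code, and seconds to run
# 50,000 iterations of the factorial benchmark under each profiler.
UNPROFILED_SECS = 1.019

PROFILERS = {
  "pure-Ruby profiler" => { lines: 65,  secs: 53.234 },
  "Shugo's profiler"   => { lines: 701, secs: 2.500 },
  "RubyInline port"    => { lines: 182, secs: 6.441 },
}.freeze

# Hypothetical helper: express each profiler relative to a baseline one.
def tradeoffs(profilers, baseline_name, unprofiled_secs)
  base = profilers.fetch(baseline_name)
  profilers.to_h do |name, stats|
    [name, {
      size_ratio: stats[:lines].fdiv(base[:lines]), # how much bigger
      speedup:    base[:secs] / stats[:secs],       # how much faster
      overhead:   stats[:secs] / unprofiled_secs,   # slowdown vs. no profiler
    }]
  end
end

tradeoffs(PROFILERS, "pure-Ruby profiler", UNPROFILED_SECS).each do |name, r|
  printf("%-20s %5.2fx size %6.2fx speed %6.2fx overhead\n",
         name, r[:size_ratio], r[:speedup], r[:overhead])
end
```

Relative to the pure-Ruby profiler, Shugo's comes out at about 10.8x the size for about 21.3x the speed, and the RubyInline port at 2.8x the size for about 8.3x the speed.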

I'm in the process of packaging it up with a bunch of my other smaller hacks and will be publishing it soon. I hope to get more people looking at it and giving me feedback. Thanks.

About this Archive

This page is an archive of entries in the RubyInline category from April 2005.
