I'm in lexer hell

% ruby -e 'p %=abc='
-e:1: syntax error, unexpected $end
% ruby -e 'x = %=abc=; p x'
"abc"

There are soooo many stupid little edge cases in the Ruby language that it is nearly impossible to write a parser for it that isn't completely convoluted.

I've always wondered how much Ruby we'd have left if we just cut out the weird and/or overly complicated stuff.

Possible things to remove:

  • Nested interpolation. "blah#{"blah#{"blah"}blah"}blah"
  • Emacs keybinding escapes: "\C-\M-a" # => "\201"
  • 1400 extra % thingies: %s(blah) #=> :blah

There are 28 cases of my lexer checking lexstate and 63 cases where the lexer is setting lexstate. That by itself is absolutely fine. It is the additional 17 cases where the parser TELLS the lexer what the lexer state is that absolutely terrify me.
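To make the lexstate business concrete, here is a toy sketch of why the two one-liners at the top behave differently. It is not my real lexer and not MRI's; `toy_lex` and the two states `:expr_beg` / `:expr_end` are made-up names for illustration, and real Ruby has more states and more heuristics. The assumption is simply that `%` opens a string literal when an expression is expected and is an operator (`%` or `%=`) when a value was just lexed. It only covers the lexer setting its own state, not the scarier case where the parser sets it.

```ruby
# Toy lexer (hypothetical, heavily simplified): the meaning of '%'
# depends on a single state flag.
#   :expr_beg -> an expression is expected, so '%' opens a string literal
#   :expr_end -> a value was just lexed, so '%' is an operator (% or %=)
def toy_lex(src)
  tokens = []
  state  = :expr_beg
  i = 0
  while i < src.length
    c = src[i]
    if c =~ /\s/
      i += 1
    elsif c =~ /[a-z_]/
      ident = src[i..-1][/\A[a-z_]\w*/]
      tokens << [:ident, ident]
      i += ident.length
      state = :expr_end
    elsif c == "%" && state == :expr_beg
      # %<delim>...<delim> string literal, same delimiter on both ends
      delim = src[i + 1]
      close = delim && src.index(delim, i + 2)
      raise "unterminated string" unless close
      tokens << [:string, src[(i + 2)...close]]
      i = close + 1
      state = :expr_end
    elsif c == "%"
      if src[i + 1] == "="
        tokens << [:op_asgn, "%="]
        i += 2
      else
        tokens << [:op, "%"]
        i += 1
      end
      state = :expr_beg
    elsif c == "="
      tokens << [:assign, "="]
      i += 1
      state = :expr_beg
    elsif c == ";"
      tokens << [:semi, ";"]
      i += 1
      state = :expr_beg
    else
      raise "unexpected char #{c.inspect}"
    end
  end
  tokens
end

p toy_lex("p %=abc=")
# => [[:ident, "p"], [:op_asgn, "%="], [:ident, "abc"], [:assign, "="]]
p toy_lex("x = %=abc=; p x")
# => [[:ident, "x"], [:assign, "="], [:string, "abc"], [:semi, ";"], [:ident, "p"], [:ident, "x"]]
```

In the first case the toy lexer sees `p`, then the operator `%=`, then `abc`, then a dangling `=` with nothing after it, which is presumably the same shape of token stream that produces the "unexpected $end" above. In the second case the `=` puts the lexer back at the start of an expression, so `%=abc=` comes out as one string token.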

2 Comments

Didn't Matz say something like "fortunately, writing a parser for Ruby is impossible. I should know!" at RubyConf 2007? I think he credited that for the lack of alternative implementations until recent years.

I agree that there are a lot of cases that 90% of Rubyists have never used (or don't know exist). I'm guessing it's too late to eliminate them from 2.0, which is a shame, given that that would probably be the best time to do it since so many other things are breaking.

He was just kidding though. Ruby is certainly not Perl by any means. It is statically evaluatable. But UGH is it ugly sometimes.


About this Entry

This page contains a single entry by zenspider published on October 7, 2008 8:13 PM.
