🕷 software releases

by ryan davis



ruby_parser version 2.0.0 has been released!

Published 2008-10-22 @ 21:32

ruby_parser (RP) is a ruby parser written in pure ruby (utilizing racc, which by default uses a C extension). RP’s output is the same as ParseTree’s output: s-expressions using ruby’s arrays and base types.
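
To make "s-expressions using ruby's arrays and base types" concrete, here is a minimal sketch that does not call ruby_parser at all. The `s` helper below is a hypothetical stand-in that returns a plain Array, and the tree is written by hand to show the general shape of such a structure; it is not captured parser output.

```ruby
# Hypothetical helper: builds a plain Array standing in for an
# s-expression node. It is an assumption for illustration only.
def s(*parts)
  parts
end

# Hand-written tree for the expression `1 + 2` (illustrative shape only).
tree = s(:call, s(:lit, 1), :+, s(:arglist, s(:lit, 2)))

# Because nodes are ordinary arrays, standard Array methods work on them.
node_type = tree.first                  # the node's type tag
receiver  = tree[1]                     # the nested receiver node
literals  = tree.flatten.grep(Integer)  # every integer literal in the tree
```

The point of the design is that no special tree API is needed: indexing, `flatten`, `grep`, and pattern matching on the leading symbol are enough to walk the result.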

Changes:

2.0.0 / 2008-10-22

  • 1 major enhancement

    • Brought on the AWESOME! 4x faster! no known lexing/parsing bugs!
  • 71 minor enhancements

    • 1.9: Added Fixnum#ord.
    • 1.9: Added missing Regexp constants and did it so it’d work on 1.9.
    • Added #store_comment and #comments
    • Added StringScanner #begin_of_line?
    • Added a bunch of tests for regexp escape chars, #parse_string, #read_escape, ? numbers, ? whitespace.
    • Added a hack for rubinius’ r2l eval bug.
    • Added a new token type tSTRING that bypasses tSTRING_BEG/END entirely. Only does non-interpolated strings and then falls back to the old way. MUCH cleaner tho.
    • Added bin/ruby_parse
    • Added compare rule to Rakefile.
    • Added coverage files/dirs to clean rule.
    • Added file and line numbers to all sexp nodes. Column/ranges to come.
    • Added lex_state change for lvars at the end of yylex.
    • Added lexed comments to defn/defs/class/module nodes.
    • Added stats gathering for yylex. Reordered yylex for avg data
    • Added tSYMBOL token type and parser rule to speed up symbol lexing.
    • Added tally output for getch, unread, and unread_many.
    • Added tests for ambiguous uminus/uplus, backtick in cmdarg, square and curly brackets, numeric gvars, eos edge cases, string quoting %<> and %%%.
    • All cases throughout yylex now return directly if they match, no passthroughs.
    • All lexer cases now slurp entire token in one swoop.
    • All zarrays are now just empty arrays.
    • Changed s(:block_arg, :blah) to :"&blah" in args sexp.
    • Cleaned up lexer error handling. Now just raises all over.
    • Cleaned up read_escape and regx_options
    • Cleaned up tokadd_string (for some definition of cleaned).
    • Converted single quoted strings to new tSTRING token type.
    • Coverage is currently 94.4% on lexer.
    • Done what I can to clean up heredoc lexing… still sucks.
    • Flattened resbodies in rescue node. Fixed .autotest file.
    • Folded lex_keywords back in now that it screams.
    • Found very last instanceof ILiteralNode in the code. haha!
    • Got the tests subclassing PTTC and cleaned up a lot. YAY
    • Handle yield(*ary) properly
    • MASSIVELY cleaned out =begin/=end comment processor.
    • Massive overhaul on Keyword class. All hail the mighty Hash!
    • Massively cleaned up ident= edge cases and fixed a stupid bug from jruby.
    • Merged @/@@ scanner together, going to try to do the same everywhere.
    • Refactored fix_arg_lex_state, common across the lexer.
    • Refactored new_fcall into new_call.
    • Refactored some code to get better profile numbers.
    • Refactored some more #fix_arg_lex_state.
    • Refactored tail of yylex into its own method.
    • Removed Module#kill
    • Removed Token, replaced with Sexp.
    • Removed all parse_number and parse_quote tests.
    • Removed argspush, argscat. YAY!
    • Removed as many token_buffer.split(//)’s as possible. 1 to go.
    • Removed begins from compstmts
    • Removed buffer arg for tokadd_string.
    • Removed crufty (?) solo ‘@’ token… wtf was that anyhow?
    • Removed most jruby/stringio cruft from StringScanner.
    • Removed one unread_many… 2 to go. They’re harder.
    • Removed store_comment, now done directly.
    • Removed token_buffer. Now I just use token ivar.
    • Removed use of s() from lexer. Changed the way line numbers are gathered.
    • Renamed *qwords to *awords.
    • Renamed StringScanner to RPStringScanner (a subclass) to fix namespace trashing.
    • Renamed parse to process and aliased to parse.
    • Renamed token_buffer to string_buffer since that arcane shit still needs it.
    • Resolved the rest of the lexing issues I brought up w/ ruby-core.
    • Revamped tokadd_escape.
    • Rewrote Keyword and KWtable.
    • Rewrote RubyLexer using StringScanner.
    • Rewrote tokadd_escape. 79 lines down to 21.
    • Split out lib/ruby_parser_extras.rb so lexer is standalone.
    • Started to clean up the parser and make it as skinny as possible
    • Stripped out as much code as possible.
    • Stripped yylex of some dead code.
    • Switched from StringIO to StringScanner.
    • Updated rakefile for new hoe.
    • Uses pure ruby racc if ENV['PURE_RUBY'] is set, otherwise uses the C extension.
    • Wrote a ton of lexer tests. Coverage is as close to 100% as possible.
    • Wrote args to clean up the big nasty args processing grammar section.
    • lex_strterm is now a plain array, removed RubyLexer#s(…).
    • yield and super now flatten args.
  • 21+ bug fixes:

    • I’m sure this list is missing a lot:
    • Fixed 2 bugs both involving attrasgn (and ilk) esp when lhs is an array.
    • Fixed a bug in the lexer for strings with single digit hex escapes.
    • Fixed a bug parsing: a (args) { expr }… the space caused a different route to be followed and all hell broke loose.
    • Fixed a bug with x\n=beginvar not putting begin back.
    • Fixed attrasgn to have arglists, not arrays.
    • Fixed bug in defn/defs with block fixing.
    • Fixed class/module’s name slot if colon2/3.
    • Fixed dstr with empty interpolation body.
    • Fixed for 1.9 string/char changes.
    • Fixed lexer BS wrt determining token type of words.
    • Fixed lexer BS wrt pass through values and lexing words. SO STUPID.
    • Fixed lexing of floats.
    • Fixed lexing of identifiers followed by equals. I hope.
    • Fixed masgn with splat on lhs
    • Fixed new_super to deal with block_pass correctly.
    • Fixed parser’s treatment of :colon2 and :colon3.
    • Fixed regexp scanning of escaped numbers, ANY number is valid, not just octs.
    • Fixed string scanning of escaped octs, allowing 1-3 chars.
    • Fixed unescape for \n
    • Fixed: omg this is stupid. ‘()’ was returning bare nil
    • Fixed: remove_begin now goes to the end, not sure why it didn’t before.
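
Several entries above describe rewriting the lexer on top of StringScanner so that each case slurps an entire token in one swoop and returns directly. The following is a rough sketch of that general technique using the standard library's `strscan`. The `toy_lex` method, its token names, and its patterns are hypothetical and far simpler than ruby_parser's real lexer.

```ruby
require 'strscan'

# Hypothetical toy lexer. Each branch matches a whole token with a single
# regexp scan and records it immediately, with no character-by-character
# buffering.
def toy_lex(src)
  ss = StringScanner.new(src)
  tokens = []

  until ss.eos?
    if ss.skip(/\s+/)
      next                                      # whitespace produces no token
    elsif (text = ss.scan(/:[a-zA-Z_]\w*/))
      tokens << [:symbol, text[1..-1].to_sym]   # drop the leading colon
    elsif (text = ss.scan(/'[^']*'/))
      tokens << [:string, text[1..-2]]          # non-interpolated string body
    elsif (text = ss.scan(/\d+/))
      tokens << [:integer, text.to_i]
    elsif (text = ss.scan(/[a-zA-Z_]\w*/))
      tokens << [:identifier, text]
    elsif (text = ss.scan(/[,()]/))
      tokens << [:punct, text]
    else
      raise "unexpected character #{ss.peek(1).inspect} at offset #{ss.pos}"
    end
  end

  tokens
end

tokens = toy_lex("puts :foo, 'bar', 42")
```

Here the symbol and single-quoted string branches each produce one finished token, loosely mirroring the idea behind the tSYMBOL and tSTRING entries, and anything unrecognized simply raises, in the spirit of the cleaned-up error handling.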