Parsers

Rails Router Example

Bogdan Gusiev

October 2013

P-A-R-S-E-R-S

Why?

Because best way to pass data from one App to another is

Serialization

COM RMI

One more less obvious reason

... сейчас обсуждаем возможность сделать конфу более технической.

So this talk required to be

hardcore

Plan

  • Easy Part
  • Classification
  • Parsers Theory
  • Simple examples
  • Live examples

The easiest way to build parser

bogdan@example.com

Regular Expression

\w+@\w+(\.\w+)+
/\A[^@]+@([^@\.]+\.)+[^@\.]+\z/ 

etc.

Each regular expression can parse particular grammar

What is a
grammar?

Regular expressions are great!

Their power is huge.

But what is the
limit?

Regular expressions can only be used to parse Regular grammars

But what about Non-regular?

Regular expression can only parse something that could be parsed with one iteration




in other words

From Left to Right

Several obvious examples

+380(67)222-33-00
bogdan@example.com
192.168.0.1

In order to understand why

we need to get into the most boring part

The Theory

Grammar in Math is described via Generators concept.

Each generator represents a transformation from one symbol to group of other symbols

Regular Grammars

ab
aab
abb
abbb
aabb
aaab
Regexp: a+b+
S -> aS
S -> aR
R -> bR
R -> b
  • S,R - non-deterministic symbols
  • a,b - deterministic symbols
  • string is not considered final if it contains non-deterministic symbols

Regular grammar that describes an Email

(almost)
S -> [0-z]N
N -> [0-z]N
N -> @D
D -> [0-z]D
D -> [0-z].Z
Z -> [0-z]Z
Z -> [0-z]
  • S - start
  • N - name
  • D - domain
  • Z - zone

Is there a regular expression that parses this?

S -> aSb
S -> ab
ab
aabb
aaabbb
aaaabbbb

The answer is NO.

We can not parse such a simple case with regular expressions.

Regularity Criteria

Grammar is regular if it don't contain "recursive" generators.

So what to do?

Not sure

the only one thing we know is that regular expressions will not help

def match_aabb_pattern?(string)
  string.each_char.with_index.each do |c, i|
    case c
    when 'a'
      count_a += 1
    when 'b'
      return count_a == string.size - i - 1
    else
      return false
    end
  end
  return false
end

Regular Expressions

Powers & Limits

Grammars/Parsers can be

Regular & Recursive

Getting close to real world examples

Regex are still possible

But not really in the long term

Start from Tokenizing

"12*(2+8/(13.5+7))" =>
[
  '12', '*', '(', '2', '+', '8' '/', 
  '(', '13.5', '+', '7', ')', ')'
]

Builing Abstract Syntax Tree (AST)

The Hardest Part

['12', '*', ['(', '2', '+', '8' '/', '(', '13.5', '+', '7', ')', ')'] ]
[left = '12', operator, right = [...]]

['12', '*', [ '2', '+', '8' '/', '(', '13.5', '+', '7', ')'] ]
Cleanup useless brackets ( )


['12', '*', [ '2', '+', [ '8' '/', '(', '13.5', '+', '7', ')' ] ] ]
# Repeat for right and left node

['12', '*', [ '2', '+', [ '8', '/', [ '13.5', '+', '7', ] ]]]
# Repeat for right and left node
  1. Find lowest priority operator
  2. Split Array into 3 parts: left operand, operator, right operand
  3. Repeat 1-3 for before part and after part

Work with AST is easy

def evaluate(ast)
  if ast.is_a?(Array)
    left, operator, right = ast
    evaluate(left).send(operator, evaluate(right))
  else
    ast.to_f
  end
end

After such a heavy infrastructure

it is easies to support

Building AST

is Hard problem

But you are able to use tools

Recursive parsers library

  • treetop - a little easier, a bit slower
  • racc - more hard core, used in Rails Router

Rails router sytax

/users/:id(.:format)
(/locale/:locale)/users/:id(.:format)
(/locale/:locale(/currency/:currency))/users/:id(.:format)

Regularity criteria can be broken.

This is a huge problem and require router to be parsed by recursive parser.

Route tokenizer:

/users/:id(.:format)
Token Type
/ SLASH
users LITERAL
/ SLASH
:id SYMBOL
( LPAREN
. DOT
:format SYMBOL
) RPAREN

Fragments from parser

github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/journey/parser.y

They are so severe that created their own syntax

  group : LPAREN expressions RPAREN { ... } ;
expressions : expressions expression { ... } | expression { ... } | or ;
# Simplied! expression : symbol | literal | slash | dot | group | star ;

Parser markup is compiled to ruby

racc -l -o parser.rb parser.y

github.com/tenderlove/racc

github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/journey/parser.rb

Ukraine is having a hard time

Software developers have unique opportunity to support economics of our country. We are the only one that can really bring money to Ukraine.

So, lets do our work better

And waste less money on new Smart Phones

Thanks for your attension

http://gusiev.com

http://github.com/bogdan

http://gusiev.com/slides/parsers