A New Kind of Parser Method

A New Kind of Parser Method

Source: Dev.to

I don't know if you heard, but I'm pretty sure you know the method (or the concept) of Pratt parsing. To me, I don't really understand a lot of the concepts of Pratt parsing; binding power and NUD are way out of my league. Nevertheless, it's a really good parsing technique and is used a lot of times. But I've made another parsing method, more simpler to understand, and at the similar level. What's the Name?
The name for this parsing method is D-M parsing, short for disambiguation-merging parsing. More about the disambiguating and the merging later. What's the general way of using it?
Say you have an expression like this:
2 + foo(4, 5) \* 4 / 23
Notice that the operators and the terms go in this pattern: term-operator-term-operator, and ends at a term (assuming parenthesis were already resolved). My technique uses that pattern to its advantage:
We can separate those two things into two separate lists; terms and operators. In terms, we list out all the terms in the expression in order, so the terms list would look like the following:
2, foo(4, 5), 4, 23
And same with the operators, so it looks like this:
+, \*, /
See another pattern? the terms list is always one element larger than the operator list. Using that, too, we can now make our tree. How to Make the Tree
Now comes the magic- when you group the term elements by pairs (yes, they overlap), you can make the pairs the same as the operator list's length. i.e.:
(2, foo(4, 5)), (foo(4, 5), 4), (4, 23)
This is crucial for the binary operators.
Now that we have normalized the lists to one length, we pass through the tree. We check the highest precedence (in this case, * and /), and start pairing from there. Now, if we list the operator and term's index, and match the first * with the corresponding pair- foo(4, 5), 4, we get our first tree- [foo(4, 5), 4]{*} (note: the curly braces is the operator, the square braces are for the arguments). We then remove the used * and the pair with [foo(4, 5), 4]{*}.
Currently, we have:
2, [foo(4, 5), 4]{*}, 23 as our terms list. now, we check for the next element with the same precedence; /.
And yeah, keep repeating the process; you should get:
[2, [[foo(4, 5), 4]{*}, 23]{/}]{+} as the final tree, or in coding terms- Okay, But Are There Any Hidden Things?
Yes, of course! This way of parsing needs a LOT more than just two lists and a grouper; we need a disambiguation part and a merging part. The disambiguation part is for splitting - into two core types, -u (unary) and -b (binary), splitting . into two core types, .n (numeric/decimal point) and .m (member access), and splitting ( into two core types, (n (numeric/grouping opening) and (c (function call). The merger, though, that is one of the hard parts. you need to merge -u x into -x, a .m b into a.b, and foo .m bar (c 23, 24 ) into foo.bar(23, 24). its basically a mini-parser; except for only functions (and maybe a few mix-fix here and there); nothing else. Once those two core components have ran, the final output is a list full of terms, operators, and group openers/closers. And, like any other parser, recursively parse those parentheses. Now, with your fully-parsed AST, you're good to go for an expression parser! Templates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse CODE_BLOCK:
add ( 2, divide ( multiply ( foo(4, 5), 4 ), 23 )
) Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
add ( 2, divide ( multiply ( foo(4, 5), 4 ), 23 )
) CODE_BLOCK:
add ( 2, divide ( multiply ( foo(4, 5), 4 ), 23 )
)