| Lertulo ( @ 2008-10-15 19:29:00 |
Recursive Descent Parsing
If you look over the formal grammar, you'll see that it's defined as a hierarchy. For example, type A is defined as type B followed by a semicolon or an optional series of type C followed by a type D:
A simplistic parser that recognizes such a definition would look roughly like this:
Notice that, on failure, we didn't scream and complain: if parse_type_b doesn't see a type_b syntax coming up next on the stream, then it's not a big deal. All it does is return false, indicating that we didn't see a type_b coming up--and importantly, we didn't advance the stream at all. (Notice that we might've successfully pulled a type_c or two off the stream, then failed to find a type_d following on. Gotta put those type_c's back.)
With an adequately complex grammar (like the one we're using for parsing C-like expressions), you can imagine how quickly this turns into a deep recursive stack. In practice, this is exactly how the recursive descent parser works: having defined the grammar very precisely, we write methods that exactly parse each basic type, and allow them to recurse pretty deeply. It's quite simple to implement, but can be unpleasant to debug; helps to add a tracing facility to show where you are in the parsing stack.
If you look over the formal grammar, you'll see that it's defined as a hierarchy. For example, type A is defined as type B followed by a semicolon or an optional series of type C followed by a type D:
type_a ::= type_b ";" | [type_c]* type_d
A simplistic parser that recognizes such a definition would look roughly like this:
bool parse_type_a (sourcedata &in)
{
pos = in.tellPosition();
// See if we have a type_b followed by a semicolon
if (parse_type_b (in)) {
if (parse_token (in, SEMICOLON)) {
return true;
}
in.setPosition (pos); // didn't pan out; rewind try the other form
}
// Didn't have that; see if we have a series of type_c's followed by a type_d
while (parse_type_c (in)) {
;
}
if (parse_type_d (in)) {
return true;
}
in.setPosition (pos);
return false;
}
Notice that, on failure, we didn't scream and complain: if parse_type_b doesn't see a type_b syntax coming up next on the stream, then it's not a big deal. All it does is return false, indicating that we didn't see a type_b coming up--and importantly, we didn't advance the stream at all. (Notice that we might've successfully pulled a type_c or two off the stream, then failed to find a type_d following on. Gotta put those type_c's back.)
With an adequately complex grammar (like the one we're using for parsing C-like expressions), you can imagine how quickly this turns into a deep recursive stack. In practice, this is exactly how the recursive descent parser works: having defined the grammar very precisely, we write methods that exactly parse each basic type, and allow them to recurse pretty deeply. It's quite simple to implement, but can be unpleasant to debug; helps to add a tracing facility to show where you are in the parsing stack.