SOLUTION: implementation of the parser program | My Assignment Tutor

parser Description You must complete the implementation of the parser program in the file parser.cpp. The program reads a Jack class from standard input and writes an XML representation of its abstract syntax tree to standard output. It uses the tokeniser functions described in tokeniser.h to parse a Jack class read from standard input and construct an abstract syntax tree using the functions as described in abstract-syntax-tree.h. The main function is responsible for calling the parse_class() function and passing the result to the function ast_print_as_xml(). The ast_print_as_xml() function is responsible for writing an XML representation of the abstract syntax tree to standard output. Compiling and Running parser When the Makefile attempts to compile the program parser, it will use the file parser.cpp. The program can be compiled and run using the command: % make test 0parser make will only compile the program if it has not yet been compiled or the source files have changed since it was last compiled. Note: Do not modify the provided Makefile or the sub-directories bin, includes or lib. These will be replaced during testing by the web submission system. Tokeniser The tokeniser to be used by the parser programs is described in includes/tokeniser.h. This lists all the tokens that are recognised as well as extra token kinds that can be used to represent a group of tokens. To simplify writing the parser token groups can be used with the functions have(), have_next(), mustbe() and did_not_find() in the same as an individual token. For example a call of have(tg_statement) will return true if the next token in the input can start one the Jack statements. Similarly, have(tg_inifix_op) can be used to see if a token is an infix operator and have(tg_rel_op) can be used to see if a token is a relational operator. The mustbe() which is used to raise an error if the next token is not what is expected, always returns the token that it matched. This can be very useful when the next input is an identifier or operator because we need to remember the particular token that was found. For example, if the next token in the input is an identifier token, mustbe(tk_identifier) will return the identifier token but still advance to the next token in the input. Similarly, if the next token in the input was ‘+’, a call to mustbe(tg_infix_op) will return the token for the ‘+’ but still advance the input to the next token(). Notes: All tokeniser errors are reported by returning the end of input token tk_eoi.Once a tk_eoi token is returned, all future attempts to read a token will return tk_eoi. Syntax Changes to Jack A parser goes over the tokenised text and emits output indicating that it “understood” the text’s grammatical structure. In order to do so, the parser must include functions that look for canonical structures in a certain language – in our case a modified version of Jack – and then emit these structures in some agreed upon formalism. The structure of your parse_class() function should follow the one developed in workshops rather than the structure in the textbook. The grammar of the modified Jack language that must be recognised can be found in the comments near the beginning of the startup file parser.cpp. It has been reorganised slightly so that it is now LL(1), that is, you can always tell what to parse next by looking at the next token. In particular, the production rule for a term has been changed so that all options starting with a variable are now in a separate production rule named var_term. Since a large part of the initial work in writing a parser is expanding the grammar into matching code, the startup file parser.cpp includes an empty function for each production rule. This will save you a lot of typing. However, you do not need to keep this structure and are free to change it as much as you like. The change to the grammar rule above does not change the Jack language’s syntax however the following changes do: For the purposes of this assignment, the grammar rule for a class now requires all static variables to be declared before field variables. For the purposes of this assignment, the grammar rules now require method calls on the current object to explicitly use this, ie to call the method abc() on the current object use this.abc(). For the purposes of this assignment, the grammar rule for an expression is structured as either a single term or a single right associative infix operation. For the purposes of this assignment, four new infix operators have been introduced, =. The == replaces = as an infix operator and the = operator is now only used by the let statement. Notes: None of the changes to Jack in the grammar rules require any changes to the virtual machine.All input must be read using the tokeniser functions described in includes/tokeniser.h.You must not modify the calls to the symbol table functions push_symbol_table() and pop_symbol_table() provided in the skeleton parser.You must use the function declare_variable() to add statics, fields, parameters and locals to the symbol tables.Do not add the name of a constructor, function or method to a symbol table. We do not check if constructor, function or method calls are correct.You must use the functions lookup_variable() or lookup_variable_fatal(), as appropriate, to lookup variables.No output should be produced, the parser simply returns an abstract_syntax_tree.Any errors you are required to report using mustbe(), did_not_find() and fatal_token_context() are not written to the output, they are separately written to the error output.Do not modify the main() function. Recursive Descent Parsing This is a summary of the recursive descent parsing rules from workshop 05 using tokens instead of characters: token or token group – the next token of the input must be the token or in the token groupif the next token in the input is not the required token, report not finding the token as an error and stop, otherwise read the next token, this can all be implemented using the function mustbe()name – the next part of the input is described by rule namecall the function to parse the named ruleA? – the next part of the input may or may not be term Aif the next token in the input can start term A, parse term AA* – the next part of the input is 0 or more repeats of term Awhile the next token in the input can start term A, parse termAA B – the next part of the input should be term A followed by term Bparse term A, then parse term BA | B – the next part of the input should be either term A or term Bif the next token in the input can start term A, parse term A, orif the next token in the input can start term B, parse term B, orreport not finding a token that can start term A or term B as an error and stop, this can be implemented using the function did_not_find(). Notes: To simplify checking the next token in these situations where more than one token is acceptable, you should check to see if a suitable token group has been defined in the includes/tokeniser.h file. The Abstract Syntax Tree The abstract syntax tree returned by the parse_class() function should contain one node for each production rule of the Jack grammar described in the parser.cpp file with a few exceptions. These exceptions must match those in the supplied test data and may include: Jack source files only contain a single class so the root node must be an ast_class node. Static variables, field variables, parameters and local variables are all represented using ast_var_dec nodes. The order of these nodes must match the order in which their variables are declared. There are two ifStatement nodes, one with an else statement and one without. There are two return nodes, one for a void return and one for returning the value of an expression. The abstract syntax tree nodes are immutable – you cannot change them – so a node’s sub-trees must be created first. In cases when a node has a variable number of children you may need to create a vector variable to hold the sub-trees until you are ready to create the required node. Errors to Catch There are lots of different kinds of errors that a compiler may be able to detect. However, for the purposes of this assignment we are only interested in detecting the following errors: Syntax errors. If at any point in the parsing you cannot find the next symbol that must be present you have detected a syntax error. This should be reported as a side effect of using mustbe() (rule 1) or by explicitly calling did_not_find() (rule 6).Declarations of more than one variable with the same name in the same context. That is, no two static, field, parameter or local variables can have the same name and no parameter can have the same name as a local variable. The error message must start with the name of the variable, report using: fatal_token_context(variable_name + ” has already been declaredn”) ; Attempting to use an undeclared variable. Not all such errors can be detected because in a subroutine call we cannot tell the difference between an undeclared variable and the name of another class. If detected, the error message must start with the name of the variable, report using: fatal_token_context(variable_name + ” must be a declared variablen”) ; Attempting to return a value from a void function or void method. Report using: fatal_token_context(“returning a value from a void function or methodn”) ; An attempt to not return a value from a non void function or method. Report using: fatal_token_context(“not returning a value from a non-void function or methodn”) ; An attempt to return something other than this from a constructor. Check using: mustbe(tk_this) ; A constructor, function or method that might not execute a return statement. Report using: fatal_token_context(“subroutine must finish with a return statementn”) ; A constructor declared with a return type that is not its own class. Report using: fatal_token_context(“constructor return type must be its own classn”) ; Attempts by a function to access this because a function has no object to operate on. Report using: fatal_token_context(“a function cannot access thisn”) ; Attempts by a function to access a field of its class. The field is treated as an undeclared variable and may or may not result in an error. Note: Errors that are reported by calling the fatal_token_context() function will first display the last two lines of the input, marking the start of the current token and then display the error message.Some errors may only be detected after a call of mustbe() so the highlighted token may appear to be after the point at which the error was detected.The example error outputs in the test files will show you at what point error messages are expected to be displayed in each case. Errors to Ignore The following semantic errors will be ignored and the parsing allowed to complete: Attempts to declare more than one constructor, function or method with the same name or to call a constructor, function or method that does not exist. Detecting errors in naming subroutines will be deferred to the assembler when the final VM code version of a program is translated into assembly language.Attempts to apply operators, infix or unary, to values of the wrong types. This is a potentially significant error that we will ignore.Attempts to return a value of a different type from the declared return type of a function or method. This is a potentially significant error that we will ignore except in the case of constructors.