Compiler - from Basic to Advanced - Part B
This part uses the language defined in Part A to write the lexer.
To extract tokens one by one from a program written in that language, we need the following steps:
- define the character classes and token types:
public static final int LETTER = 0; //letter
public static final int DIGIT = 1; //digit
public static final int UNKNOWN = 2; //unknown, use lookup to find class
public static final int LEFT_PAREN = 3; //left parenthesis
public static final int RIGHT_PAREN = 4; //right parenthesis
public static final int ADD_OP = 5; //add operator
public static final int SUB_OP = 6; //sub operator
public static final int ASSIG_OP = 7; //assignment operator
public static final int EOF = 8; //end of file
public static final int LESS_OP = 9; //less than comparison operator
public static final int GREATER_OP = 10; //greater than comparison operator
public static final int COMMA = 11; //comma character
public static final int ERROR = -1; //unknown character
public static final int IDENT = 100; //identifier
- open the input file, read it character by character, and return each token (the token type together with its lexeme)
We need three methods, called in order, to retrieve a token (the token will be used by the parser in Part C):
a. read a character and recognize its character class
b. look up the character (if its class is UNKNOWN)
c. lex method (returns the completed token)
a) read character
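We can read the input one character at a time in Java, for example with the read() method of a java.io.Reader (note that java.util.Scanner has no read() method). This method also determines the character class of the character it has read: LETTER, DIGIT, or UNKNOWN.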
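A minimal sketch of step (a), assuming the input comes from a java.io.Reader. The class name CharReader and the fields nextChar and charClass are hypothetical names chosen for illustration; only the constant names and values come from the list above. Treating the end of input as EOF at this stage is also an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper: reads one character and records its class.
class CharReader {
    static final int LETTER = 0;
    static final int DIGIT = 1;
    static final int UNKNOWN = 2;
    static final int EOF = 8;

    private final Reader in;
    char nextChar;  // the character read most recently
    int charClass;  // LETTER, DIGIT, UNKNOWN or EOF

    CharReader(Reader in) {
        this.in = in;
    }

    // Reads the next character and classifies it.
    void getChar() {
        int c;
        try {
            c = in.read(); // Reader.read() returns -1 at end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (c == -1) {
            nextChar = '\0';
            charClass = EOF; // assumption: end of input is reported here
        } else {
            nextChar = (char) c;
            if (Character.isLetter(nextChar)) {
                charClass = LETTER;
            } else if (Character.isDigit(nextChar)) {
                charClass = DIGIT;
            } else {
                charClass = UNKNOWN; // resolved later by the lookup method
            }
        }
    }
}
```

For a file, the reader could be created with new java.io.FileReader(path); for a quick experiment, a java.io.StringReader works the same way.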
b) lookup method
This method uses a switch statement:
if the character is '(', it is appended to the lexeme and the token type is LEFT_PAREN
if the character is '+', it is appended to the lexeme and the token type is ADD_OP
...
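The lookup step could be sketched as follows. Only the '(' and '+' cases are stated in the text; the other character-to-token mappings are assumptions inferred from the constant names, and '=' as the assignment character in particular is a guess. The class name LookupSketch and the use of a StringBuilder for the lexeme are also hypothetical.

```java
// Hypothetical sketch of the lookup method for single-character tokens.
class LookupSketch {
    static final int LEFT_PAREN = 3;
    static final int RIGHT_PAREN = 4;
    static final int ADD_OP = 5;
    static final int SUB_OP = 6;
    static final int ASSIG_OP = 7;
    static final int LESS_OP = 9;
    static final int GREATER_OP = 10;
    static final int COMMA = 11;
    static final int ERROR = -1;

    // Appends ch to the lexeme and returns the matching token type.
    static int lookup(char ch, StringBuilder lexeme) {
        lexeme.append(ch);
        switch (ch) {
            case '(': return LEFT_PAREN;  // from the text
            case '+': return ADD_OP;      // from the text
            case ')': return RIGHT_PAREN; // assumed from the constant name
            case '-': return SUB_OP;      // assumed
            case '=': return ASSIG_OP;    // assumed: Part A may use another symbol
            case '<': return LESS_OP;     // assumed
            case '>': return GREATER_OP;  // assumed
            case ',': return COMMA;       // assumed
            default:  return ERROR;       // character not in the language
        }
    }
}
```

Returning ERROR from the default branch lets the caller decide how to report an unexpected character instead of failing inside the lookup.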
c) lex method
This method calls methods (a) and (b) to obtain the token type and the lexeme. It is the method that the parser calls.
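Putting the three methods together, a lexer might look like the sketch below. It is an illustration under several assumptions, not the definitive design: the class name SketchLexer is hypothetical, whitespace is skipped between tokens, an identifier is taken to be a letter followed by letters or digits, and a run of digits is reported as ERROR because the constants above define no number token. The lookup method is shortened to four operators to keep the example small.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: getChar (a), lookup (b) and lex (c) in one class.
class SketchLexer {
    static final int LETTER = 0, DIGIT = 1, UNKNOWN = 2;
    static final int LEFT_PAREN = 3, RIGHT_PAREN = 4, ADD_OP = 5, SUB_OP = 6;
    static final int EOF = 8, ERROR = -1, IDENT = 100;

    private final Reader in;
    private char nextChar;
    private int charClass;
    final StringBuilder lexeme = new StringBuilder(); // text of the last token

    SketchLexer(String source) {
        in = new StringReader(source);
        getChar(); // prime the lexer with the first character
    }

    // (a) read one character and classify it
    private void getChar() {
        int c;
        try {
            c = in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (c == -1) {
            nextChar = '\0';
            charClass = EOF;
        } else {
            nextChar = (char) c;
            charClass = Character.isLetter(nextChar) ? LETTER
                      : Character.isDigit(nextChar) ? DIGIT : UNKNOWN;
        }
    }

    // (b) resolve a character of class UNKNOWN (shortened operator set)
    private int lookup(char ch) {
        lexeme.append(ch);
        switch (ch) {
            case '(': return LEFT_PAREN;
            case ')': return RIGHT_PAREN;
            case '+': return ADD_OP;
            case '-': return SUB_OP;
            default:  return ERROR;
        }
    }

    // (c) return the type of the next token; its text is left in lexeme
    int lex() {
        lexeme.setLength(0);
        while (Character.isWhitespace(nextChar)) {
            getChar(); // assumption: whitespace only separates tokens
        }
        int token;
        switch (charClass) {
            case LETTER: // identifier: letter followed by letters or digits
                while (charClass == LETTER || charClass == DIGIT) {
                    lexeme.append(nextChar);
                    getChar();
                }
                token = IDENT;
                break;
            case DIGIT: // no number token is defined above, so report ERROR
                while (charClass == DIGIT) {
                    lexeme.append(nextChar);
                    getChar();
                }
                token = ERROR;
                break;
            case UNKNOWN:
                token = lookup(nextChar);
                getChar();
                break;
            default: // end of input
                token = EOF;
                break;
        }
        return token;
    }
}
```

A parser would call lex() repeatedly until it returns EOF, reading the lexeme field after each call when it needs the token text.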
Summary:
So far, we have covered the following:
- language and syntax
- lexeme
- token
- lexer
The next parts are:
Part C - parser
Part D - code generation and automatic tools