Lexer¶
The Lexer converts TLSQL text into a stream of tokens. It recognizes keywords, identifiers,
literals, operators, and punctuation.
Supported Elements¶
Keywords: TRAIN, PREDICT, VALIDATE, WITH, FROM, WHERE.
Identifiers: Table and column names.
Literals: Strings (with escape sequences) and numbers.
Operators: Comparison (>, <, >=, <=, =, !=) and logical (AND, OR, NOT).
Comments: Single-line (--) and multi-line (/* */).
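Two details in the list above benefit from a concrete illustration: two-character operators such as >= have to be tried before their one-character prefixes, and comments have to be skipped before any token is read. The following is a minimal sketch of both ideas. The helper names match_operator and skip_comment are hypothetical and not part of the tlsql package, and raising on an unterminated multi-line comment is an assumption about behavior this page does not describe.

```python
# Hypothetical helpers for illustration; these names are not part of tlsql.
TWO_CHAR_OPERATORS = (">=", "<=", "!=")
ONE_CHAR_OPERATORS = (">", "<", "=")


def match_operator(text, pos):
    """Return the comparison operator starting at pos, or None."""
    # Try two-character operators first so ">=" is not split into ">" and "=".
    two = text[pos:pos + 2]
    if two in TWO_CHAR_OPERATORS:
        return two
    one = text[pos:pos + 1]
    if one in ONE_CHAR_OPERATORS:
        return one
    return None


def skip_comment(text, pos):
    """Return the index just past a comment starting at pos, or pos if none."""
    if text.startswith("--", pos):
        # A single-line comment runs up to (not including) the newline.
        end = text.find("\n", pos)
        return len(text) if end == -1 else end
    if text.startswith("/*", pos):
        end = text.find("*/", pos + 2)
        if end == -1:
            # Assumption: an unterminated comment is treated as an error.
            raise ValueError("unterminated multi-line comment")
        return end + 2
    return pos
```

For example, match_operator("a >= 1", 2) yields ">=" rather than ">", which is the longest-match behavior a lexer for these operators needs.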
Example¶
from tlsql.tlsql.lexer import Lexer

# Tokenize a single TLSQL statement.
lexer = Lexer("PREDICT VALUE(users.Age, CLF) FROM users")
tokens = lexer.tokenize()

# Print each token's type and value; the last token is EOF.
for token in tokens:
    print(f"{token.type.name}: {token.value}")
- class Lexer(text)¶
Bases: object
Convert TLSQL input into a token stream.
- text¶
Input text.
- char_pos¶
Current character index.
- line_num¶
Current line number.
- col_num¶
Current column number.
- current_char¶
Current character.
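The attributes above describe a cursor over the input. As a rough sketch of how such bookkeeping usually works, the hypothetical Cursor class below keeps the same four position fields in sync while advancing. The class, its advance method, and the 1-based line and column numbering are assumptions made for illustration; this page does not state how Lexer updates these attributes internally.

```python
class Cursor:
    """Hypothetical stand-in for the Lexer's position bookkeeping."""

    def __init__(self, text):
        self.text = text
        self.char_pos = 0
        self.line_num = 1  # assumed 1-based
        self.col_num = 1   # assumed 1-based
        self.current_char = text[0] if text else None

    def advance(self):
        """Move one character forward, updating line and column."""
        if self.current_char is None:
            return  # already at end of input
        if self.current_char == "\n":
            # Leaving a newline starts a new line at column 1.
            self.line_num += 1
            self.col_num = 1
        else:
            self.col_num += 1
        self.char_pos += 1
        if self.char_pos < len(self.text):
            self.current_char = self.text[self.char_pos]
        else:
            self.current_char = None  # end of input


cursor = Cursor("ab\nc")
for _ in range(3):
    cursor.advance()
print(cursor.char_pos, cursor.line_num, cursor.col_num, cursor.current_char)
# prints: 3 2 1 c
```

Tracking line and column alongside the raw index is what lets an error message point at a location in the original query text.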
- tokenize()¶
Tokenize entire input.
Steps: skip whitespace; recognize string and number literals; recognize identifiers and keywords; recognize operators; append an EOF token.
- Returns:
List of tokens ending with EOF.
- Raises:
LexerError – Raised for unknown characters.
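The steps listed for tokenize() can be sketched as a small standalone function. This is not the tlsql implementation: it swaps in a regex-based scan using Python's re module in place of the character-by-character cursor the attributes suggest, and the Token tuple, SketchLexerError, the punctuation set, the number and string patterns, and case-insensitive keyword matching are all assumptions made for illustration.

```python
import re
from collections import namedtuple

# Hypothetical token shape; the real Token class may differ.
Token = namedtuple("Token", ["type", "value"])

KEYWORDS = {"TRAIN", "PREDICT", "VALIDATE", "WITH", "FROM", "WHERE"}
LOGICAL = {"AND", "OR", "NOT"}

# Order matters: earlier patterns win, and ">=" must precede ">".
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("STRING", r"'(?:\\.|[^'\\])*'"),  # single quotes, backslash escapes
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r">=|<=|!=|[><=]"),
    ("PUNCTUATION", r"[(),.;*]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class SketchLexerError(Exception):
    """Stand-in for LexerError in this sketch."""


def tokenize(text):
    """Return a list of tokens ending with EOF."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SketchLexerError(f"Unknown character {text[pos]!r} at index {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WHITESPACE":
            continue  # whitespace separates tokens but is not emitted
        if kind == "IDENTIFIER":
            # Assumption: keywords and logical operators match case-insensitively.
            upper = value.upper()
            if upper in KEYWORDS:
                kind = "KEYWORD"
            elif upper in LOGICAL:
                kind = "OPERATOR"
        tokens.append(Token(kind, value))
    tokens.append(Token("EOF", None))
    return tokens
```

Calling tokenize("PREDICT VALUE(users.Age, CLF) FROM users") with this sketch classifies PREDICT and FROM as keywords and the remaining words as identifiers, then ends the list with an EOF token. An input such as "a # b" raises SketchLexerError at the # character.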