Lexer

The Lexer converts TLSQL text into a stream of tokens. It recognizes keywords, identifiers, literals, operators, and punctuation.

Supported Elements

  • Keywords: TRAIN, PREDICT, VALIDATE, WITH, FROM, WHERE.

  • Identifiers: Table and column names.

  • Literals: Strings (with escape sequences) and numbers.

  • Operators: Comparison (>, <, >=, <=, =, !=) and logical (AND, OR, NOT).

  • Comments: Single-line (--) and multi-line (/* */).
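The split between keywords and identifiers, and the comparison operators listed above, can be illustrated with two small lookup helpers. This is a hypothetical sketch, not the library's code. The names KEYWORDS, LOGICAL_OPERATORS, COMPARISON_OPERATORS, classify_word and match_operator are invented here, and case-insensitive keyword matching is an assumption. The design point it shows is longest-match ordering: trying two-character operators first keeps >= from being split into > and =.

```python
from typing import Optional

# Hypothetical vocabulary tables mirroring the Supported Elements list.
KEYWORDS = {"TRAIN", "PREDICT", "VALIDATE", "WITH", "FROM", "WHERE"}
LOGICAL_OPERATORS = {"AND", "OR", "NOT"}
# Two-character operators come first so ">=" is matched whole, not as ">" then "=".
COMPARISON_OPERATORS = (">=", "<=", "!=", ">", "<", "=")


def classify_word(word: str) -> str:
    """Classify an alphanumeric word; case-insensitive matching is an assumption."""
    upper = word.upper()
    if upper in KEYWORDS:
        return "KEYWORD"
    if upper in LOGICAL_OPERATORS:
        return "LOGICAL_OPERATOR"
    return "IDENTIFIER"


def match_operator(text: str, pos: int) -> Optional[str]:
    """Return the longest comparison operator starting at text[pos], or None."""
    for op in COMPARISON_OPERATORS:
        if text.startswith(op, pos):
            return op
    return None


print(classify_word("predict"))      # KEYWORD
print(classify_word("and"))          # LOGICAL_OPERATOR
print(classify_word("users"))        # IDENTIFIER
print(match_operator("Age>=18", 3))  # >=
```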

Example

from tlsql.tlsql.lexer import Lexer

lexer = Lexer("PREDICT VALUE(users.Age, CLF) FROM users")
tokens = lexer.tokenize()

for token in tokens:
    print(f"{token.type.name}: {token.value}")

class Lexer(text)

Bases: object

Convert TLSQL input into a token stream.

text

Input text.

char_pos

Index of the current character in text.

line_num

Current line number.

col_num

Current column number.

current_char

Character at the current position (char_pos).
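The four position attributes above presumably change together as the lexer consumes input. The following is a minimal sketch of one way to keep them in sync, assuming 1-based line and column numbers and current_char set to None at end of input. PositionTracker and advance are hypothetical names, not part of the documented API.

```python
from typing import Optional


class PositionTracker:
    """Hypothetical helper keeping char_pos, line_num, col_num and current_char in sync."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.char_pos = 0
        self.line_num = 1  # assumed 1-based
        self.col_num = 1   # assumed 1-based
        self.current_char: Optional[str] = text[0] if text else None

    def advance(self) -> None:
        """Consume current_char; a newline moves to the next line at column 1."""
        if self.current_char is None:
            return  # already at end of input
        if self.current_char == "\n":
            self.line_num += 1
            self.col_num = 1
        else:
            self.col_num += 1
        self.char_pos += 1
        self.current_char = (
            self.text[self.char_pos] if self.char_pos < len(self.text) else None
        )


tracker = PositionTracker("ab\ncd")
for _ in range(3):
    tracker.advance()
print(tracker.char_pos, tracker.line_num, tracker.col_num, tracker.current_char)  # 3 2 1 c
```

Tracking line and column this way would let an error such as LexerError report where in a multi-line query the problem occurred, rather than only a raw character offset.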

__init__(text)

Initialize the lexer.

Parameters:

text – TLSQL input string to tokenize.

tokenize()

Tokenize the entire input.

Steps: skip whitespace, recognize string and number literals, recognize identifiers and keywords, recognize operators, and append an EOF token.
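The steps listed for tokenize() can be sketched as a single character-by-character loop. This is a hypothetical, simplified stand-alone version, not the library's implementation. The Token dataclass, the string type tags, the punctuation set, the escape mapping and the point at which comments are skipped are all assumptions. Only the order of the steps, the trailing EOF token, and raising LexerError on an unknown character come from the documentation.

```python
from dataclasses import dataclass
from typing import List


class LexerError(Exception):
    """Hypothetical stand-in for the documented LexerError."""


@dataclass
class Token:
    type: str   # hypothetical string tag; the real token exposes type.name
    value: str


# Assumed vocabulary: keywords plus word-form logical operators.
KEYWORDS = {"TRAIN", "PREDICT", "VALIDATE", "WITH", "FROM", "WHERE", "AND", "OR", "NOT"}
OPERATORS = (">=", "<=", "!=", ">", "<", "=")  # longest match first
PUNCTUATION = "(),.;"                          # assumed punctuation set
ESCAPES = {"n": "\n", "t": "\t"}               # assumed escape mapping


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        # 1. Skip whitespace and comments (comment handling is assumed here).
        if ch.isspace():
            i += 1
            continue
        if text.startswith("--", i):
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
            continue
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise LexerError(f"unterminated comment at position {i}")
            i = end + 2
            continue
        # 2a. String literal with backslash escapes.
        if ch in "'\"":
            quote, i, chars = ch, i + 1, []
            while i < len(text) and text[i] != quote:
                if text[i] == "\\" and i + 1 < len(text):
                    i += 1
                    chars.append(ESCAPES.get(text[i], text[i]))
                else:
                    chars.append(text[i])
                i += 1
            if i >= len(text):
                raise LexerError("unterminated string literal")
            i += 1  # consume the closing quote
            tokens.append(Token("STRING", "".join(chars)))
            continue
        # 2b. Number literal: digits with an optional fractional part.
        if ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            if i + 1 < len(text) and text[i] == "." and text[i + 1].isdigit():
                i += 1
                while i < len(text) and text[i].isdigit():
                    i += 1
            tokens.append(Token("NUMBER", text[start:i]))
            continue
        # 3. Identifier or keyword.
        if ch.isalpha() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            word = text[start:i]
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENTIFIER"
            tokens.append(Token(kind, word))
            continue
        # 4. Operators (longest match first), then punctuation.
        op = next((o for o in OPERATORS if text.startswith(o, i)), None)
        if op is not None:
            tokens.append(Token("OPERATOR", op))
            i += len(op)
            continue
        if ch in PUNCTUATION:
            tokens.append(Token("PUNCTUATION", ch))
            i += 1
            continue
        raise LexerError(f"unknown character {ch!r} at position {i}")
    # 5. Terminate the stream with an EOF token.
    tokens.append(Token("EOF", ""))
    return tokens


for tok in tokenize("PREDICT VALUE(users.Age, CLF) FROM users WHERE Age >= 18"):
    print(f"{tok.type}: {tok.value}")
```

A hand-written loop like this keeps every decision explicit and makes it easy to attach a position to each error; a regex-based scanner is a common, shorter alternative.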

Returns:

A list of tokens, ending with an EOF token.

Raises:

LexerError – If an unrecognized character is encountered.