Implementing a Maximal-Munch Lexical Analyzer

Extending the NFA generator Previously we implemented a regular expression matching algorithm that generated an NFA from an input regular expression, and simulated a string over the NFA to check for acceptance. The regular expression was required to be in postfix form and the algorithm only supported union $|$ , concatenation $.$ , Kleene star $*$ , one-or-more $+$ , and zero-or-one $?$ operations. While this was sufficient for simple applications, building a lexer would require support for character ranges (for example $[0\text{-}9]$, $[a\text{-}z]$) and escape characters (for example one may wish to use $*$ for the multiply operator rather than Kleene star)....

January 31, 2025 · 13 min · 2617 words · Abhinav Pradeep

Regular expression matching: Theory and Implementation

Introduction The first step in building a compiler would be to build a lexical analyzer, shortened to ’lexer’. To define what a lexer does we may understand it as a function: $$\text{Lexer :: string} \to [\text{token}]$$ Where a token can be thought of as the tuple $\text{token} = \langle \text{class, string} \rangle$. A token class, loosely speaking, corresponds to a set of strings. For example we may class strings as identifier tokens, integer tokens, keyword tokens or whitespace tokens....

January 25, 2025 · 20 min · 4053 words · Abhinav Pradeep

Learning Measure Theory 3: Measurable spaces and functions

Notes from learning measure theory. Covers measurable spaces and functions. These are notes and therefore liable to inaccuracies.

June 22, 2024 · 18 min · 3694 words · Abhinav Pradeep

Building a DBMS on the Raspberry Pi Pico: Part 1, Programming flash memory

First in a series about building a DBMS on the Raspberry Pi Pico. This article covers the first step in this process: programming flash memory

June 19, 2024 · 8 min · 1681 words · Abhinav Pradeep

Learning Measure Theory 2: Sigma algebras

Notes from learning measure theory. Establishes the need for sigma algebras. These are notes and therefore liable to inaccuracies.

February 21, 2024 · 7 min · 1286 words · Abhinav Pradeep