Writing a simple lexicon scanner in .NET

I came across Linq and the new Lambda Expression classes in .NET 3.0. I thought, it would be a good idea to dynamically build a lambda expression from a math string, something like: (a + b) * 1.5. The string is parsed into a lambda expression and then is compiled into a delegate. Then, if you well, you can call the delegate to get the result of the formula.

So, in other words, we are building a scanner and a parser. The first part that I will talk about here is building a scanner using the Regex class in .NET.

The tokens that we need to scan for are defined next:

 // Tokens that represent the input
 internal enum Token
     OpenParan, CloseParan,
     Plus, Minus, Multiply, Divide,
     Other       // Represents unrecognized charachters

The regular expression pattern then is defined next:

 // The pattern used with the regular expression class to scan the input
 const string Pattern = @"
         (?'OpenParan' \( ) | (?'CloseParan' \) ) |
         (?'Arrow' => ) |
         (?'Comma' ,  ) |
         (?'Plus' \+ ) | (?'Minus' - ) | (?'Multiply' \* ) | (?'Divide' / ) |
         (?'Constant' (\.\d+|\d+(\.\d+)?) ) |
         (?'Variable' [a-zA-Z]\w* ) |
         (?'Other' [^ \r\n\t])";
 // Regular expression used to scan the input
 private static Regex MathRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | 
RegexOptions.Singleline | RegexOptions.Compiled);

The pattern basically tries to find any of the groups and name each group by the given name. We also assign the group ‘Other’ to anything else that we couldn’t match.

The scanner then can be implemented as an enumerable method that returns the found tokens.

 // Enumurable to get tokens from the given expression (scanner)
 public static IEnumerable<TokenEntity> GetLambdaCalcTokens(this string exp)
     Token[] tokens = Enum.GetValues(typeof(Token)).OfType<Token>().ToArray();
     foreach (Match m in MathRegex.Matches(exp))
         // Check which token is matched by this match object
         foreach (Token token in tokens)
             if (m.Groups[token.ToString()].Success)
                 yield return new TokenEntity(
     // return the end string token, to indecate we are done
     yield return new TokenEntity(Token.Other, exp.Length, "\0");

The only problem with using enumerable is that it doesn’t allow us to peek forward without moving next. This could be a problem with complex parsers, but the parser that I am writing here is simple enough that can be written without this feature.

The previous method is written as an extension on the string, so it can be easily called on any string like “(a) => a + 1.5”.GetLambdaCalcTokens()

The class TokenEntity is defined as:

 // Holds token info
 internal class TokenEntity
     public TokenEntity(Token token, int startPos, string value)
         this.Token = token;
         this.StartPos = startPos;
         this.Value = value;
     // Token type
     public Token Token { get; private set; }
     // Start position in the original string
     public int StartPos { get; private set; }
     // Value
     public string Value { get; private set; }
     public override string ToString()
         return string.Format("{0} at {1}: {2}", Token, StartPos, Value);

Running the previous code on: "(a, b) => ; a + b * 0.1" , results with the following tokens:

OpenParan at 0: (
Variable at 1: a
Comma at 2: ,
Variable at 4: b
CloseParan at 5: )
Arrow at 7: =>
Other at 10: ;
Variable at 12: a
Plus at 14: +
Variable at 16: b
Multiply at 18: *
Constant at 20: 0.1
Other at 21:  

I am attaching the source code for the simple scanner. I will follow next with the parser and how we can build lambda expression dynamically.