Languages and Translators with Visual Studio

#yacc #lex #tools (part 1)

Warning! The NuGet package YaccLexTools has been updated since this post was published. Read the post about the updates and the simplifications introduced: Yacc/Lex Tools v0.2

Sometimes it's nice to put to use what you studied in the past, and this is the case with Languages and Translators.

I wanted to enhance the quick search function of a web application. Normally this quick search takes all the words and runs a full-text search on all the data. So I said to myself: why not give users the option to enter filter expressions in a simple way?

Such as:

year: 2000
year: >= 2000
year: 2000..2010 month: > 6

Said and done! What we need here are a scanner (lexical analyzer) and a parser (syntax analyzer).

Nowadays it is not very practical to write these tools by hand; instead we use automatic generators. The ones I considered are GPPG, which generates the parser, and GPLEX, which generates the lexical analyzer, both created by QUT (Queensland University of Technology), an Australian university. GPPG generates a parser in C# from a formal description of the grammar written in YACC, and GPLEX generates a lexical analyzer (scanner) from a description of the language's tokens written in LEX.

Assuming the reader already knows how to write a YACC grammar and a LEX scanner, the goal of this post is to walk through configuring the Visual Studio project so that the parser and the scanner are generated during the build of the project itself.
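To make the division of labor concrete, here is a minimal hand-written sketch of what the scanner's job looks like for the queries above: turning text into a flat list of tokens, which the parser would then check against the grammar. This is not what GPLEX emits. The `QueryTokenizer` class and the token names are assumptions I made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token kinds for the query language; not GPLEX output.
public enum TokenKind { Ident, Colon, Number, Op, Range }

public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public override string ToString() => $"{Kind}({Text})";
}

public static class QueryTokenizer
{
    // \G anchors each match at the current position; one named group per token kind.
    private static readonly Regex Pattern = new Regex(
        @"\G\s*(?:(?<Range>\.\.)|(?<Op>[<>]=?)|(?<Colon>:)|(?<Number>\d+)|(?<Ident>[A-Za-z_]\w*))");

    public static List<Token> Tokenize(string input)
    {
        string text = input.Trim();
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = Pattern.Match(text, pos);
            if (!m.Success)
                throw new FormatException($"Unexpected character at position {pos}");

            // Exactly one named group succeeds; its name is the token kind.
            foreach (TokenKind kind in Enum.GetValues(typeof(TokenKind)))
            {
                Group g = m.Groups[kind.ToString()];
                if (g.Success) { tokens.Add(new Token(kind, g.Value)); break; }
            }
            pos = m.Index + m.Length;
        }
        return tokens;
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints: Ident(year) Colon(:) Op(>=) Number(2000)
        Console.WriteLine(string.Join(" ", QueryTokenizer.Tokenize("year: >= 2000")));
    }
}
```

A generated scanner does the same kind of work from the rules in the LEX file, so you only maintain the rules and not a loop like this one.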

First of all you must add the NuGet package "YACC/LEX tools" with the command

PM> Install-Package YaccLexTools

Now suppose you want to create the parser for search queries as described above.

  1. Right-click the project and select "Unload Project".
  2. Right-click the project again and select "Edit".
  3. Add the following snippet at the end of the file, just before the closing tag </Project>, and save.

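    <!-- Assumed values: adjust the tool paths to where the package was restored -->
    <PropertyGroup>
      <GplexTool>"$(SolutionDir)packages\YaccLexTools.0.1.2\tools\gplex.exe"</GplexTool>
      <GppgTool>"$(SolutionDir)packages\YaccLexTools.0.1.2\tools\gppg.exe"</GppgTool>
      <MyLanguageParser>MyLanguage</MyLanguageParser>
    </PropertyGroup>
    <Target Name="BeforeBuild" DependsOnTargets="BuildGen" />
    <!-- Build generated file target -->
    <Target Name="BuildGen" DependsOnTargets="GenerateMyLanguageParser" />
    <!-- Parser items: DependentUpon nests each file under the MyLanguage.parser node -->
    <ItemGroup>
      <None Include="MyLanguage.parser" />
      <Compile Include="MyLanguage.Parser.cs"><DependentUpon>MyLanguage.parser</DependentUpon></Compile>
      <Compile Include="MyLanguage.Scanner.cs"><DependentUpon>MyLanguage.parser</DependentUpon></Compile>
      <None Include="MyLanguage.Language.grammar.y"><DependentUpon>MyLanguage.parser</DependentUpon></None>
      <Compile Include="MyLanguage.Parser.Generated.cs"><DependentUpon>MyLanguage.parser</DependentUpon></Compile>
      <None Include="MyLanguage.Language.analyzer.lex"><DependentUpon>MyLanguage.parser</DependentUpon></None>
      <Compile Include="MyLanguage.Scanner.Generated.cs"><DependentUpon>MyLanguage.parser</DependentUpon></Compile>
    </ItemGroup>
    <!-- Generate the parsers -->
    <Target Name="GenerateMyLanguageParser">
      <Message Text="Generating scanner for $(MyLanguageParser) ..." />
      <Exec Command="$(GplexTool) &quot;/out:$(MyLanguageParser).Scanner.Generated.cs&quot; &quot;$(MyLanguageParser).Language.analyzer.lex&quot;">
        <Output TaskParameter="Outputs" ItemName="MyLanguageScanner" />
      </Exec>
      <Message Text="Generating parser for $(MyLanguageParser) ..." />
      <Exec Command="$(GppgTool) /no-lines /gplex &quot;$(MyLanguageParser).Language.grammar.y&quot; &gt; &quot;$(MyLanguageParser).Parser.Generated.cs&quot;">
        <Output TaskParameter="Outputs" ItemName="MyLanguageParser" />
      </Exec>
    </Target>

  4. Copy the files from the folder "packages\YaccLexTools.0.1.2\src" into the project folder.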

  5. Right-click the project and select "Reload Project".

When the project is reloaded, this is what you will have in the Solution Explorer of Visual Studio:


The figure above shows all the files that make up the parser.

  • MyLanguage.parser is just a placeholder and does not contain anything interesting. It only serves to group all the parser files under a single node.
  • MyLanguage.Language.analyzer.lex is the LEX source for the lexical analyzer (scanner).
  • MyLanguage.Scanner.Generated.cs is the C# file generated from the LEX file "MyLanguage.Language.analyzer.lex".
  • MyLanguage.Language.grammar.y is the YACC source for the parser. It contains the grammar description.
  • MyLanguage.Parser.Generated.cs is the C# file generated from the YACC file "MyLanguage.Language.grammar.y".

Finally, the files MyLanguage.Parser.cs and MyLanguage.Scanner.cs are C# partial classes that extend the generated parser and scanner, respectively, adding the methods called from the actions of the grammar and lexer rules.
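The partial class arrangement can be sketched as follows. This is only an illustration of the pattern, not the real generated code: the first half stands in for what GPPG would write, and the `OnComparison`, `AddFilter` and `Filters` members are hypothetical names I chose for the example.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for MyLanguage.Parser.Generated.cs. In a real project GPPG writes this
// half; the OnComparison method is hypothetical and only mimics a grammar action.
public partial class MyLanguageParser
{
    public void OnComparison(string field, string op, int value)
    {
        // A grammar action would call a helper defined in the hand-written half.
        AddFilter(field, op, value);
    }
}

// Counterpart of MyLanguage.Parser.cs: the hand-written half with helper methods.
public partial class MyLanguageParser
{
    public List<string> Filters { get; } = new List<string>();

    private void AddFilter(string field, string op, int value)
    {
        Filters.Add($"{field} {op} {value}");
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new MyLanguageParser();
        parser.OnComparison("year", ">=", 2000);
        Console.WriteLine(parser.Filters[0]); // prints: year >= 2000
    }
}
```

The benefit of this split is that the generated file can be overwritten on every build without touching the code you write by hand.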

End Notes.

For this work it was very useful to read the open source code of projects such as the IronRuby compiler. If I had not been able to look into the IronRuby code, I would never have managed to do this. What I mean is that you can learn a lot, and also find inspiration, by comparing your work with that of others ;)


This post has been republished from my previous blog (