Lab 3: HTML-to-LaTeX Converter Using Lex

Submission Due: at the beginning of your lab session, two weeks after the lab was assigned

In this laboratory, students will gain experience with the Lex lexical analyzer generator by building a program that converts a subset of HTML into LaTeX.

Lab Materials

  1. Notes on Lex

  2. Lab 03 Source Files

  3. Test LaTeX Output Reference

  4. Test PDF Output Reference

Lab Assignment

Begin the project by downloading the laboratory 3 source code, the LaTeX output reference, and the PDF output reference. The laboratory 3 source code contains a Make-based build system, initial Lex source code for the HTML-to-LaTeX converter, and an example input file. The LaTeX output reference shows what the LaTeX output should look like for the example input, and the PDF output reference shows what the resulting PDF should look like. If you have never used LaTeX to typeset a document, then the open-content LaTeX book may be helpful (see slides).

To start, build the lab without making any changes: extract the source code, enter the laboratory 3 directory, and type make on your terminal. This should create an executable named html2latex inside the directory. Once the build succeeds, you can attempt to convert the example input file into a PDF output file by typing make test on your terminal. This will fail with an error because you have not yet completed the laboratory.

The purpose of this laboratory is to convert the subset of HTML given below into correct LaTeX output that can then be converted into a PDF. The laboratory 3 source code that you downloaded has an example of how to use Lex to convert the HTML H1 tag and HTML comments into LaTeX. You will complete the laboratory by adding the required additional rules to the Lex source code file html2latex.l. To receive full points in this laboratory you will need to produce correct output by converting the following HTML tags into LaTeX:

HTML paragraphs and HTML list items can both contain HTML tags inside their text region. You are not required to support nesting of these HTML tags, so you can assume that the text between each pair of tags is plain text. You must support the following tags inside of paragraphs and list items:

When you demonstrate your solution for this lab, you will do so using both the provided test input and an input that you will not have access to ahead of time. Be sure your program handles all of the cases given above.
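
As a rough illustration of the kind of C code a Lex action can contain, the sketch below takes the full text of a matched element (for example, the text a rule for the H1 tag might match) and rewrites it as a LaTeX command. The function name element_to_latex, its parameters, and the mapping of H1 to \section are assumptions made for illustration; they are not taken from the provided html2latex.l, which may handle H1 differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the provided lab sources).
 * Given the full text of a matched element such as "<h1>Title</h1>"
 * and a LaTeX command name such as "section", write
 * "\section{Title}" followed by a newline into out.
 * Returns 0 on success, -1 if the input has no '>' ... '<' pair
 * or the result does not fit in out.
 */
int element_to_latex(const char *matched, const char *command,
                     char *out, size_t out_size)
{
    assert(matched != NULL && command != NULL && out != NULL);

    const char *start = strchr(matched, '>');  /* end of the opening tag */
    const char *end = strrchr(matched, '<');   /* start of the closing tag */
    if (start == NULL || end == NULL || end <= start)
        return -1;
    start++;                                   /* skip the '>' itself */

    int inner_len = (int)(end - start);        /* length of the text between the tags */
    int written = snprintf(out, out_size, "\\%s{%.*s}\n",
                           command, inner_len, start);

    /* snprintf reports the length it needed; treat truncation as failure */
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Inside a Lex action, a helper like this could be called as element_to_latex(yytext, "section", buf, sizeof buf), with the result then written out using fputs(buf, yyout). Here yytext and yyout are the standard Lex globals, and buf is a local character array you would declare yourself. Because the helper assumes plain text between the tags, it matches the no-nesting assumption stated above.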

Lab Questions

In a paragraph or two, describe another input-to-output translation that Lex could be used for. What domain-specific concerns need to be addressed? Can Lex handle them, and if so, how? To keep things interesting, please avoid translations similar to the one we dealt with in this lab.