Lore's Source to HTML Converter


How to make a Language Definition

 1. Introduction
 2. List of Keys
 3. How to use Regular Expressions
 4. Sample Language Definition


1. Introduction

The language definitions are stored in the file Languages.ini, which is located in the program's root directory. In most cases this will be C:\Programs\Source2Html\. Each language definition has its own section named [Language%], with % being an integer number. These numbers must start from 1 and be consecutive: the program stops reading as soon as the first integer number is missing.
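As a rough illustration of the numbering rule, the following Python sketch reads consecutive [Language%] sections from an INI text and stops at the first gap. It is not the program's own loader; the function name read_language_names and the sample section contents are made up for this example.

```python
import configparser

# Hypothetical file content: Language3 is missing, so Language4 is never reached.
SAMPLE_INI = """
[Language1]
Name = "C++"

[Language2]
Name = "LaTeX"

[Language4]
Name = "Skipped"
"""


def read_language_names(ini_text):
    """Collect the Name of each [Language%] section, starting at 1,
    and stop at the first missing number."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    names = []
    index = 1
    while parser.has_section(f"Language{index}"):
        # Strip the surrounding quotes used in the sample definitions.
        names.append(parser[f"Language{index}"]["Name"].strip('"'))
        index += 1
    return names


print(read_language_names(SAMPLE_INI))  # ['C++', 'LaTeX']
```

In this sketch the section [Language4] is silently ignored, which mirrors the behaviour described above: a gap in the numbering hides every definition after it.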



2. List of Keys

The table below lists all valid keys of a language definition. If a key is missing, its default value is used.

Key | Description | Default Value
Name | Name of the language | none - must be given
Symbols | List of all legal symbols | empty
Extensions | List of typical file extensions, separated by ";" | empty
CaseSensitive | Is the language case sensitive? Relevant when trying to match the word lists | true
AllowWhiteAfterFirst | Allow whitespace after the first matching character of preprocessor directives | true
String | Characters that start/end a string | deactivated
Character | Characters that start/end a character chain | deactivated
SillyStringHandling | Allow multiline strings without escape character; may become obsolete in future releases | false
EscapeChar | Escape character in strings or character chains | deactivated
SingleLineComment | Single line comment starting sequence | empty
MultiLineComment | Multiline comment starting and ending sequences | empty
Keywords | Keyword list | empty
Preprocessor | Preprocessor word list | empty
Words% | User-defined word list with % being an integer starting from 1 | empty
  Words%Type | Code item type used to colorize matching words | none, must be specified
  Words%ExchangeType | Code item type used to colorize following identifiers (identifier type exchange) | none
  Words%EndSequence | Sequence that ends identifier type exchange | none
Regex% | Regular expression with % being an integer starting from 1 | empty
  Regex%Type | Code item type used to colorize matching strings | empty
  Regex%ExchangeType | Same as Words%ExchangeType | none
  Regex%EndSequence | Same as Words%EndSequence | none
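The default handling described above can be pictured with a small Python sketch. It covers only the unnumbered keys, and the choice to represent "empty" as an empty string and "deactivated" as None is an assumption made for this example, as is the function name resolve_definition; the program's internal representation is not documented here.

```python
# Defaults taken from the table; "empty" -> "" and "deactivated" -> None
# are representation choices made for this sketch only.
DEFAULTS = {
    "Symbols": "",
    "Extensions": "",
    "CaseSensitive": True,
    "AllowWhiteAfterFirst": True,
    "String": None,
    "Character": None,
    "SillyStringHandling": False,
    "EscapeChar": None,
    "SingleLineComment": "",
    "MultiLineComment": "",
    "Keywords": "",
    "Preprocessor": "",
}


def resolve_definition(given):
    """Return the given keys merged over the defaults.
    Name has no default and must be present."""
    if "Name" not in given:
        raise ValueError("Name must be given")
    resolved = dict(DEFAULTS)
    resolved.update(given)
    return resolved


definition = resolve_definition({"Name": "Pascal", "CaseSensitive": False})
print(definition["CaseSensitive"], definition["AllowWhiteAfterFirst"])  # False True
```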



3. How to use Regular Expressions

You may use regular expression syntax (regex) to identify code items. You specify a regex string using the key Regex% with % being an integer number starting from 1. The item type that is identified by Regex% is defined by the key Regex%Type. Valid item types are Comment, Keyword, Identifier, Symbol, String, Number, Character, Preprocessor, Custom1, Custom2, Custom3 and Custom4.

A literal character in the regex string stands for exactly this character. Furthermore you may use the codes of the following table to require either exactly one, or an arbitrary number of, alphas and/or numbers and/or whitespace characters, or arbitrary characters. Arbitrary means any number including zero. Two arbitrary codes must not follow each other directly. To match a backslash as a literal character, escape it with an additional backslash, i.e. "\\".

Code | Explanation
\0 | one arbitrary character
\1 | one whitespace
\2 | one number
\3 | one number or whitespace
\4 | one alpha
\5 | one alpha or whitespace
\6 | one alpha or number
\7 | one alpha or number or whitespace
\8 | arbitrary number of arbitrary characters
\9 | arbitrary number of whitespace
\A | arbitrary number of numbers
\B | arbitrary number of numbers or whitespace
\C | arbitrary number of alpha
\D | arbitrary number of alpha or whitespace
\E | arbitrary number of alpha or numbers
\F | arbitrary number of alpha or numbers or whitespace
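One way to get a feel for these codes is to translate them into a conventional regular expression. The Python sketch below does this under several assumptions that the text does not confirm: "alpha" is taken as A-Z and a-z, "number" as the digits 0-9, "whitespace" as Python's \s class, and \8 is matched lazily so that it stops at the first occurrence of whatever follows it. The function names to_python_regex and matches are hypothetical, and the program's real matcher may behave differently.

```python
import re

ALPHA, DIGIT, SPACE = "A-Za-z", "0-9", r"\s"

# Codes \0..\7: exactly one character of the given class.
CLASSES = {
    "0": ".",
    "1": f"[{SPACE}]",
    "2": f"[{DIGIT}]",
    "3": f"[{DIGIT}{SPACE}]",
    "4": f"[{ALPHA}]",
    "5": f"[{ALPHA}{SPACE}]",
    "6": f"[{ALPHA}{DIGIT}]",
    "7": f"[{ALPHA}{DIGIT}{SPACE}]",
}
# Codes \8..\F: the same classes, repeated zero or more times.
REPEATED = dict(zip("89ABCDEF", "01234567"))


def to_python_regex(pattern):
    """Translate a Regex% string into a Python regular expression."""
    out = []
    previous_was_arbitrary = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        arbitrary = False
        if ch == "\\" and i + 1 < len(pattern):
            code = pattern[i + 1]
            if code in CLASSES:
                out.append(CLASSES[code])
            elif code in REPEATED:
                if previous_was_arbitrary:
                    raise ValueError("two arbitrary codes after each other")
                # Lazy match for \8 is an assumption of this sketch.
                out.append(CLASSES[REPEATED[code]] + ("*?" if code == "8" else "*"))
                arbitrary = True
            else:
                # "\\" is a literal backslash; other escapes are taken literally.
                out.append(re.escape(code))
            i += 2
        else:
            out.append(re.escape(ch))
            i += 1
        previous_was_arbitrary = arbitrary
    return "".join(out)


def matches(pattern, text):
    """True if the whole text is matched by the Regex% string."""
    return re.fullmatch(to_python_regex(pattern), text, re.DOTALL) is not None


print(matches(r"#\2\A", "#97"))        # True
print(matches(r"#\2\A", "#"))          # False
print(matches(r"\\\4\C", r"\Section"))  # True
```

Pairing each "one" code with its "arbitrary number of" counterpart keeps the translation table short: \8 to \F reuse the classes of \0 to \7 with a repetition appended.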

The following table contains some examples of useful regex strings.

Regex String | Explanation | Example string
"\\\4\C" | A backslash followed by one or more alphas | "\Section"
"\\%" | A backslash followed by a "%" | "\%"
"#\2\A" | A "#" followed by one or more numbers | "#97"
"&\4\C;" | A "&" followed by one or more alphas and ended by a ";" | "&nbsp;"
"<!--\8-->" | A "<!--" followed by an arbitrary number of characters and ended by "-->" | "<!-- HTML-Comment -->"



4. Sample Language Definition

Below is an annotated sample definition for an artificial language that is mainly a mixture of C/C++, LaTeX and HTML.


    # The following language definition is a mixture of C/C++, LaTeX and HTML etc.
    # It's used to show the program's flexibility and strength to adapt to arbitrary languages

    [Language1]

    # name of the language
    Name = "Sample Language"

    # list all legal symbols
    LegalSymbols = "^|~?&%!,.;:=+-*<>(){}[]/"

    # list typical file extensions; each one separated by a ';'
    Extensions = "*.c;*.cpp;*.h;*.cc;*.hpp"

    # is the language case sensitive? relevant when trying to match the word lists
    CaseSensitive = true

    # allow whitespace after first matching character of preprocessor directives
    AllowWhiteAfterFirst = true

    # characters that start/end a string or character chain
    String = "
    Character = '

    # multiline string without escape character
    SillyStringHandling = false

    # escape character in strings or character chains
    EscapeChar = \

    # single and multiline comment sequences, the latter need a starting and ending sequence each
    SingleLineComment = "// % REM"
    MultiLineComment = "/* */ <!-- -->"

    # characters that are allowed to be within keywords, preprocessor words or user words
    CharsWithinString = "$"

    # keyword list example from C++
    Keywords = "__asm __automated __cdecl __classid __closure __declspec __dispid"

    # preprocessor word list example from C++
    Preprocessor = "#define #elif #else #endif #error #if #ifdef #ifndef #include #line"

    # user word list examples for LaTeX commands
    Words1               = "\label \ref"
    Words1Type           = Custom1
    Words1ExchangeType   = Custom3          # Following identifiers are colored using 'Custom3' settings ...
    Words1EndSequence    = "}"              # ... until '}' is encountered


    # regular expressions example for HTML tags
    Regex1               = "<\4\E"          # '<' followed by one alpha and then by an arbitrary number ...
    Regex1Type           = Custom1          # ... of alphas or numbers is a keyword (example: "<h1")
    Regex1ExchangeType   = Custom4          # Following identifiers are colored using 'Custom4' settings ...
    Regex1EndSequence    = "}"              # ... until '>' is encountered
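The list-valued keys in the sample pack several items into one quoted string. The Python sketch below shows one plausible way to take them apart, inferred only from the sample values: extensions are split at ";", comment sequences at whitespace, and multiline comment sequences are grouped into start/end pairs. The function names are hypothetical and the program's actual parsing rules may differ.

```python
def split_extensions(value):
    """Split an Extensions value such as "*.c;*.cpp" at the ';' separators."""
    return [ext for ext in value.split(";") if ext]


def split_single_line_comments(value):
    """Split a SingleLineComment value into its starting sequences."""
    return value.split()


def pair_multi_line_comments(value):
    """Group a MultiLineComment value into (start, end) pairs."""
    parts = value.split()
    if len(parts) % 2 != 0:
        raise ValueError("each starting sequence needs an ending sequence")
    return list(zip(parts[0::2], parts[1::2]))


print(split_extensions("*.c;*.cpp;*.h;*.cc;*.hpp"))
# ['*.c', '*.cpp', '*.h', '*.cc', '*.hpp']
print(split_single_line_comments("// % REM"))
# ['//', '%', 'REM']
print(pair_multi_line_comments("/* */ <!-- -->"))
# [('/*', '*/'), ('<!--', '-->')]
```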

last updated: 06 January 2005   © 2000-2005 by Lars Haendel