# How to make a Language Definition

1. Introduction
2. List of Keys
3. How to use Regular Expressions
4. Sample Language Definition

## 1. Introduction

The language definitions are stored in the file Languages.ini, which is located in the program's root directory. In most cases this is C:\Programs\Source2Html\. Each language definition has its own section named [Language%], where % is an integer. These numbers must start at 1 and be consecutive: the program stops reading at the first missing number, so any sections numbered after a gap are ignored.
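
The numbering rule can be illustrated with a short Python sketch. It is not the program's own code: the function name `read_language_names` and the sample text are made up for illustration, and Python's `configparser` is only assumed to be close enough to the program's INI reader for this purpose.

```python
import configparser

# Hypothetical file content: note the gap between Language2 and Language4.
SAMPLE = """
[Language1]
Name = "C"

[Language2]
Name = "Pascal"

[Language4]
Name = "Skipped"
"""

def read_language_names(ini_text):
    """Collect language names from [Language1], [Language2], ... until a number is missing."""
    # interpolation=None keeps '%' characters in values literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    names = []
    index = 1
    while parser.has_section(f"Language{index}"):
        raw_name = parser[f"Language{index}"].get("Name", fallback="")
        names.append(raw_name.strip('"'))
        index += 1
    return names

print(read_language_names(SAMPLE))  # ['C', 'Pascal']
```

In this sketch [Language4] is never reached because [Language3] does not exist.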

## 2. List of Keys

The table below lists all valid keys of a language definition. If a key is missing, its default value is used.

| Key | Description | Default Value |
| --- | --- | --- |
| Name | Name of the language | none - must be given |
| Symbols | List of all legal symbols | empty |
| Extensions | List of typical file extensions, each one separated by a ";" | empty |
| CaseSensitive | Is the language case sensitive? Relevant when trying to match the word lists | true |
| AllowWhiteAfterFirst | Allow whitespaces after the first matching character of preprocessor directives | true |
| String | Characters that start/end a string | deactivated |
| Character | Characters that start/end a character chain | deactivated |
| SillyStringHandling | Allow multiline strings without escape character; may become obsolete in future releases | false |
| EscapeChar | Escape character in strings or character chains | deactivated |
| SingleLineComment | Single line comment starting sequence | empty |
| MultiLineComment | Multiline comment starting and ending sequences | empty |
| Keywords | Keyword list | empty |
| Preprocessor | Preprocessor word list | empty |
| Words% | User defined word list with % being an integer starting from 1 | empty |
| Words%Type | Code item type used to colorize matching words | none, must be specified |
| Words%ExchangeType | Code item type used to colorize following identifiers (identifier type exchange) | none |
| Words%EndSequence | Sequence that ends identifier type exchange | none |
| Regex% | Regular expression with % being an integer starting from 1 | empty |
| Regex%Type | Code item type used to colorize matching strings | empty |
| Regex%ExchangeType | Same as above | none |
| Regex%EndSequence | Same as above | none |
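
How missing keys fall back to their defaults can be sketched in Python as follows. This is only an illustration: the names `DEFAULTS` and `with_defaults` are hypothetical, "deactivated" is represented here as `None` and "empty" as an empty string by assumption, and the numbered Words% and Regex% keys are left out for brevity.

```python
# Defaults taken from the key table; the Python representation is an assumption.
DEFAULTS = {
    "Symbols": "",
    "Extensions": "",
    "CaseSensitive": True,
    "AllowWhiteAfterFirst": True,
    "String": None,              # "deactivated"
    "Character": None,           # "deactivated"
    "SillyStringHandling": False,
    "EscapeChar": None,          # "deactivated"
    "SingleLineComment": "",
    "MultiLineComment": "",
    "Keywords": "",
    "Preprocessor": "",
}

def with_defaults(section):
    """Return the given keys, filled up with default values for missing ones."""
    if "Name" not in section:
        raise ValueError("Name has no default value and must be given")
    merged = dict(DEFAULTS)
    merged.update(section)
    return merged

definition = with_defaults({"Name": "Sample Language", "CaseSensitive": False})
print(definition["CaseSensitive"], definition["AllowWhiteAfterFirst"])  # False True
```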

## 3. How to use Regular Expressions

You may use regular expression syntax (regex) to identify code items. Specify a regex string using the key Regex%, with % being an integer starting from 1. The item type that is identified by Regex% is defined by the key Regex%Type. Valid item types are Comment, Keyword, Identifier, Symbol, String, Number, Character, Preprocessor, Custom1, Custom2, Custom3 and Custom4.

An ordinary character in the regex string stands for exactly that character. In addition, you may use the codes in the following table to require either exactly one or an arbitrary number of alphas, numbers, whitespaces or arbitrary characters. "Arbitrary number" means any number, including zero. Two arbitrary codes must not follow each other directly. To have a backslash interpreted as an ordinary character, escape it with an additional backslash, i.e. "\\".

| Code | Explanation |
| --- | --- |
| `\0` | one arbitrary character |
| `\1` | one whitespace |
| `\2` | one number |
| `\3` | one number or whitespace |
| `\4` | one alpha |
| `\5` | one alpha or whitespace |
| `\6` | one alpha or number |
| `\7` | one alpha or number or whitespace |
| `\8` | arbitrary number of arbitrary characters |
| `\9` | arbitrary number of whitespace |
| `\A` | arbitrary number of numbers |
| `\B` | arbitrary number of numbers or whitespace |
| `\C` | arbitrary number of alpha |
| `\D` | arbitrary number of alpha or whitespace |
| `\E` | arbitrary number of alpha or numbers |
| `\F` | arbitrary number of alpha or numbers or whitespace |

The following table contains some examples of useful regex strings.

| Regex String | Explanation | Example String |
| --- | --- | --- |
| `"\\\4\C"` | A backslash followed by one or more alphas | `"\Section"` |
| `"\\%"` | A backslash followed by a "%" | `"\%"` |
| `"#\2\A"` | A "#" followed by one or more numbers | `"#97"` |
| `"&\4\C;"` | A "&" followed by one or more alphas and ended by a ";" | `" "` |
| `""` | A "" | `"<-- HTML-Comment -->"` |
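
To get a feeling for the codes, the following Python sketch translates a regex string of this kind into a pattern for Python's `re` module. It is not the program's own matcher. The function name `to_python_regex` is hypothetical, and the sketch assumes that "alpha" means the letters A-Z and a-z, that "number" means the digits 0-9, and that "whitespace" corresponds to `\s`. It relies on an observation about the code table: read as a hexadecimal digit, a code has bit 1 set for whitespace, bit 2 for numbers, bit 4 for alphas and bit 8 for "arbitrary number of".

```python
import re

CODE_DIGITS = "0123456789ABCDEF"

def to_python_regex(definition):
    """Translate a Regex% string into a Python `re` pattern (illustrative only)."""
    parts = []
    previous_was_arbitrary = False
    i = 0
    while i < len(definition):
        char = definition[i]
        # Ordinary characters (and a lone trailing backslash) stand for themselves.
        if char != "\\" or i + 1 == len(definition):
            parts.append(re.escape(char))
            previous_was_arbitrary = False
            i += 1
            continue
        code = definition[i + 1]
        i += 2
        if code == "\\":  # "\\" is an escaped, literal backslash
            parts.append(re.escape("\\"))
            previous_was_arbitrary = False
            continue
        value = CODE_DIGITS.index(code)  # raises ValueError for an unknown code
        arbitrary = bool(value & 8)
        if arbitrary and previous_was_arbitrary:
            raise ValueError("two arbitrary codes must not follow each other")
        classes = ""
        if value & 4:
            classes += "A-Za-z"   # assumed meaning of "alpha"
        if value & 2:
            classes += "0-9"      # assumed meaning of "number"
        if value & 1:
            classes += r"\s"      # assumed meaning of "whitespace"
        atom = f"[{classes}]" if classes else "."  # codes 0 and 8: any character
        parts.append(atom + ("*" if arbitrary else ""))
        previous_was_arbitrary = arbitrary
    return "".join(parts)

# Check some regex strings of this section against their example strings.
for regex_string, example in [(r"\\\4\C", r"\Section"), (r"\\%", r"\%"), (r"#\2\A", "#97")]:
    assert re.fullmatch(to_python_regex(regex_string), example)
```

Whether the program matches arbitrary codes greedily is not stated here, so the use of `*` in the sketch is a guess.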

## 4. Sample Language Definition

Below is an annotated sample definition for an artificial language that is mainly a mixture of C/C++, LaTeX and HTML.


    # The following language definition is a mixture of C/C++, LaTeX and HTML etc.
    # It's used to show the program's flexibility and strength to adapt to arbitrary languages

    [Language1]

    # name of the language
    Name = "Sample Language"

    # list all legal symbols
    LegalSymbols = "^|~?&%!,.;:=+-*<>(){}[]/"

    # list typical file extensions; each one separated by a ';'
    Extensions = "*.c;*.cpp;*.h;*.cc;*.hpp"

    # is the language case sensitive? relevant when trying to match the word lists
    CaseSensitive = true

    # allow whitespaces after first matching character of preprocessor directives
    AllowWhiteAfterFirst = true

    # characters that start/end a string or character chain
    String = "
    Character = '

    # multiline string without escape character
    SillyStringHandling = false

    # escape character in strings or character chains
    EscapeChar = \

    # single and multiline comment sequences, the latter need a starting and ending sequence each
    SingleLineComment = "// % REM"
    MultiLineComment = "/* */ <!-- -->"

    # characters that are allowed to be within keywords, preprocessor words or user words
    CharsWithinString = "\$"

    # keyword list example from C++
    Keywords = "__asm __automated __cdecl __classid __closure __declspec __dispid"

    # preprocessor word list example from C++
    Preprocessor = "#define #elif #else #endif #error #if #ifdef #ifndef #include #line"

    # user word list examples for LaTeX commands
    Words1               = "\label \ref"
    Words1Type           = Custom1
    Words1ExchangeType   = Custom3          # Following identifiers are colored using 'Custom3' settings ...
    Words1EndSequence    = "}"              # ... until '}' is encountered

    # regular expressions example for HTML tags
    Regex1               = "<\4\E"          # '<' followed by one alpha and then by an arbitrary number ...
    Regex1Type           = Custom1          # ... of alphas or numbers is a keyword (example: "<h1")
    Regex1ExchangeType   = Custom4          # Following identifiers are colored using 'Custom4' settings ...
    Regex1EndSequence    = ">"              # ... until '>' is encountered
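
The identifier type exchange used by Words1 in the sample can be pictured with a small Python sketch. It is only one reading of the comments in the sample, not the program's implementation: the function `colorize` is hypothetical, the input is assumed to be already split into tokens, and coloring the end sequence itself as Symbol is an assumption.

```python
def colorize(tokens, words, word_type, exchange_type, end_sequence):
    """Assign an item type to each token, applying identifier type exchange."""
    colored = []
    exchanging = False
    for token in tokens:
        if token in words:
            # A word from the list gets Words%Type and starts the exchange.
            colored.append((token, word_type))
            exchanging = True
        elif token == end_sequence:
            # Words%EndSequence ends the exchange.
            colored.append((token, "Symbol"))
            exchanging = False
        elif token.isidentifier():
            # Identifiers get Words%ExchangeType while the exchange is active.
            colored.append((token, exchange_type if exchanging else "Identifier"))
        else:
            colored.append((token, "Symbol"))
    return colored

# Tokens of the LaTeX fragment "\label{intro} text"
tokens = ["\\label", "{", "intro", "}", "text"]
result = colorize(tokens, {"\\label", "\\ref"}, "Custom1", "Custom3", "}")
for token, item_type in result:
    print(token, item_type)
```

Here "intro" is colored as Custom3 because it follows "\label", while "text" comes after the "}" and is colored as a normal Identifier again.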



last updated: 06 January 2005   © 2000-2005 by Lars Haendel