| 1. Introduction |
| 2. List of Keys |
| 3. How to use Regular Expressions |
| 4. Sample Language Definition |
The language definitions are stored in the file Languages.ini, which is located in the program's root directory. In most cases this will be C:\Programs\Source2Html\. Each language definition has its own section named [Language%], where % is an integer. These numbers must start at 1 and be consecutive: the program stops reading at the first missing number, so any sections after a gap are ignored.
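The numbering rule can be illustrated with a minimal Python sketch. This is not the program's own code; the function name `language_sections` is hypothetical, and the section names are passed in as a plain list rather than read from Languages.ini.

```python
def language_sections(section_names):
    """Return Language1, Language2, ... in order, stopping at the first gap."""
    names = set(section_names)
    found = []
    n = 1
    # Mirror the documented rule: reading stops at the first missing number.
    while f"Language{n}" in names:
        found.append(f"Language{n}")
        n += 1
    return found


# Language4 is never reached because Language3 is missing.
print(language_sections(["Language1", "Language2", "Language4"]))
# prints ['Language1', 'Language2']
```

A file whose first section is [Language2] would therefore yield no language definitions at all under this rule.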
The table below lists all valid keys of a language definition. If a key is missing, its default value is used.
| Key | Description | Default Value |
| Name | Name of the language | none - must be given |
| Symbols | List of all legal symbols | empty |
| Extensions | List of typical file extensions, separated by ";" | empty |
| CaseSensitive | Is the language case sensitive? Relevant when trying to match the word lists | true |
| AllowWhiteAfterFirst | Allow whitespace after the first matching character of preprocessor directives | true |
| String | Characters that start/end a string | deactivated |
| Character | Characters that start/end a character chain | deactivated |
| SillyStringHandling | Allow multiline strings without escape character; may become obsolete in future releases | false |
| EscapeChar | Escape character in strings or character chains | deactivated |
| SingleLineComment | Single line comment starting sequence | empty |
| MultiLineComment | Multiline comment starting and ending sequences | empty |
| Keywords | Keyword list | empty |
| Preprocessor | Preprocessor word list | empty |
| Words% | User defined word list with % being an integer starting from 1 | empty |
| Words%Type | Code item type used to colorize matching words | none, must be specified |
| Words%ExchangeType | Code item type used to colorize following identifiers (identifier type exchange) | none |
| Words%EndSequence | Sequence that ends identifier type exchange | none |
| Regex% | Regular expression with % being an integer starting from 1 | empty |
| Regex%Type | Code item type used to colorize matching strings | empty |
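| Regex%ExchangeType | Code item type used to colorize following identifiers (identifier type exchange), as for Words%ExchangeType | none |
| Regex%EndSequence | Sequence that ends identifier type exchange, as for Words%EndSequence | none |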
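The default handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `DEFAULTS` and `resolve` are hypothetical, "deactivated" is represented as `None`, "empty" as an empty string, and only the non-numbered keys from the table are covered.

```python
# Defaults taken from the key table; None stands for "deactivated" (assumption).
DEFAULTS = {
    "Symbols": "",
    "Extensions": "",
    "CaseSensitive": True,
    "AllowWhiteAfterFirst": True,
    "String": None,
    "Character": None,
    "SillyStringHandling": False,
    "EscapeChar": None,
    "SingleLineComment": "",
    "MultiLineComment": "",
    "Keywords": "",
    "Preprocessor": "",
}


def resolve(definition):
    """Fill in default values for every key missing from a definition."""
    if "Name" not in definition:
        # Name is the only key in the table without a default.
        raise ValueError("Name has no default and must be given")
    return {**DEFAULTS, **definition}


settings = resolve({"Name": "Sample Language", "CaseSensitive": False})
print(settings["CaseSensitive"], settings["AllowWhiteAfterFirst"])
# prints False True
```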
You may use regular expression syntax (regex) to identify code items. Specify a regex string using the key Regex%, with % being an integer starting from 1. The item type assigned to strings matched by Regex% is defined by the key Regex%Type. Valid item types are Comment, Keyword, Identifier, Symbol, String, Number, Character, Preprocessor, Custom1, Custom2, Custom3 and Custom4.
An ordinary character in the regex string stands for exactly that character. In addition, you may use the codes in the following table to require one, or an arbitrary number of, alphas and/or numbers and/or whitespace characters, or any characters at all. Arbitrary means any number including zero. Two arbitrary codes must not follow each other directly. To match a literal backslash, escape it with an additional backslash, i.e. "\\"
| Code | Explanation |
| \0 | one arbitrary character |
| \1 | one whitespace |
| \2 | one number |
| \3 | one number or whitespace |
| \4 | one alpha |
| \5 | one alpha or whitespace |
| \6 | one alpha or number |
| \7 | one alpha or number or whitespace |
| \8 | arbitrary number of arbitrary characters |
| \9 | arbitrary number of whitespace |
| \A | arbitrary number of numbers |
| \B | arbitrary number of numbers or whitespace |
| \C | arbitrary number of alpha |
| \D | arbitrary number of alpha or whitespace |
| \E | arbitrary number of alpha or numbers |
| \F | arbitrary number of alpha or numbers or whitespace |
The following table contains some examples of useful regex strings.
| Regex String | Explanation | Example string |
| "\\\4\C" | A backslash followed by one or more alphas | "\Section" |
| "\\%" | A backslash followed by a "%" | "\%" |
| "#\2\A" | A "#" followed by one or more numbers | "#97" |
| "&\4\C;" | A "&" followed by one or more alphas and ended by a ";" | "&nbsp;" |
| "<!--\8-->" | A "<!--" followed by an arbitrary number of characters and ended by "-->" | "<!-- HTML-Comment -->" |
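To experiment with the codes outside the program, a regex string can be translated into a pattern for Python's `re` module. The sketch below is an approximation, not Source2Html's matcher. It assumes that "alpha" means A-Z and a-z, "number" means a single digit 0-9, that `\8` stops at the earliest possible point, and that the arbitrary character also matches line breaks. The function names `to_python_regex` and `matches` are hypothetical. The translation relies on a pattern visible in the code table: read as a hexadecimal digit, the low three bits of a code select whitespace (1), number (2) and alpha (4), and bit 8 means "arbitrary number".

```python
import re

# Assumed character classes; the source does not define them precisely.
CLASS_BITS = ((1, r"\s"), (2, "0-9"), (4, "A-Za-z"))


def to_python_regex(pattern):
    """Translate a Source2Html regex string into a Python `re` pattern."""
    out = []
    prev_arbitrary = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "\\":
            out.append(re.escape(ch))        # ordinary character matches itself
            prev_arbitrary = False
            i += 1
            continue
        code = pattern[i + 1:i + 2]          # character after the backslash
        i += 2
        if code == "\\":
            out.append(re.escape("\\"))      # "\\" is a literal backslash
            prev_arbitrary = False
            continue
        if len(code) != 1 or code not in "0123456789ABCDEF":
            raise ValueError(f"unknown code after backslash: {code!r}")
        value = int(code, 16)
        classes = "".join(frag for bit, frag in CLASS_BITS if value & bit)
        atom = f"[{classes}]" if classes else "."   # \0 and \8: any character
        arbitrary = bool(value & 8)
        if arbitrary and prev_arbitrary:
            raise ValueError("two arbitrary codes must not follow each other")
        if arbitrary:
            atom += "*" if classes else "*?"  # lazy match for \8 (assumption)
        out.append(atom)
        prev_arbitrary = arbitrary
    return "".join(out)


def matches(pattern, text):
    """True if the whole text is matched by the Source2Html regex string."""
    return re.fullmatch(to_python_regex(pattern), text, re.DOTALL) is not None


print(matches(r"\\\4\C", r"\Section"))                 # True
print(matches(r"#\2\A", "#97"))                        # True
print(matches(r"&\4\C;", "&nbsp;"))                    # True
print(matches(r"<!--\8-->", "<!-- HTML-Comment -->"))  # True
print(matches(r"#\2\A", "#"))                          # False
```

The last call returns False because `\2` demands exactly one number before the optional `\A` part, which is how the pair `\2\A` expresses "one or more numbers".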
Below is an annotated sample definition for an artificial language that is mainly a mixture of C/C++, LaTeX and HTML.
# The following language definition is a mixture of C/C++, LaTeX and HTML etc.
# It's used to show the program's flexibility and strength to adapt to arbitrary languages

[Language1]

# name of the language
Name = "Sample Language"

# list all legal symbols
LegalSymbols = "^|~?&%!,.;:=+-*<>(){}[]/"

# list typical file extensions; each one separated by a ';'
Extensions = "*.c;*.cpp;*.h;*.cc;*.hpp"

# is the language case sensitive? relevant when trying to match the word lists
CaseSensitive = true

# allow whitespace after first matching character of preprocessor directives
AllowWhiteAfterFirst = true

# characters that start/end a string or character chain
String = "
Character = '

# multiline string without escape character
SillyStringHandling = false

# escape character in strings or character chains
EscapeChar = \

# single and multiline comment sequences, the latter need a starting and ending sequence each
SingleLineComment = "// % REM"
MultiLineComment = "/* */ <!-- -->"

# characters that are allowed to be within keywords, preprocessor words or user words
CharsWithinString = "$"

# keyword list example from C++
Keywords = "__asm __automated __cdecl __classid __closure __declspec __dispid"

# preprocessor word list example from C++
Preprocessor = "#define #elif #else #endif #error #if #ifdef #ifndef #include #line"

# user word list examples for LaTeX commands
Words1 = "\label \ref"
Words1Type = Custom1
Words1ExchangeType = Custom3 # Following identifiers are colored using 'Custom3' settings ...
Words1EndSequence = "}" # ... until '}' is encountered

# regular expressions example for HTML tags
Regex1 = "<\4\E" # '<' followed by one alpha and then by an arbitrary number ...
Regex1Type = Custom1 # ... of alphas or numbers is a keyword (example: "<h1")
Regex1ExchangeType = Custom4 # Following identifiers are colored using 'Custom4' settings ...
Regex1EndSequence = ">" # ... until '>' is encountered
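The identifier type exchange configured by Words1ExchangeType and Words1EndSequence can be pictured with a small Python sketch. It reflects one reading of the comments in the sample, not the program's actual tokenizer. The function name `colorize` is hypothetical, the input is assumed to be already split into tokens, and anything that is not alphanumeric is simply treated as a Symbol.

```python
def colorize(tokens, words, word_type, exchange_type, end_sequence):
    """Assign an item type to each pre-split token (illustrative only)."""
    result = []
    exchanging = False
    for tok in tokens:
        if tok in words:
            # A word from the list gets the list's own type and starts the exchange.
            result.append((tok, word_type))
            exchanging = True
        elif tok == end_sequence:
            # The end sequence stops the exchange.
            result.append((tok, "Symbol"))
            exchanging = False
        elif tok.isalnum():
            # Identifiers take the exchange type while the exchange is active.
            result.append((tok, exchange_type if exchanging else "Identifier"))
        else:
            result.append((tok, "Symbol"))
    return result


tokens = ["\\label", "{", "fig", ":", "one", "}", "text"]
for tok, item_type in colorize(tokens, {"\\label", "\\ref"}, "Custom1", "Custom3", "}"):
    print(tok, item_type)
```

Under these assumptions, `fig` and `one` are colored as Custom3 because they follow `\label`, while `text` falls back to Identifier once the closing `}` has been seen.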
last updated: 06 January 2005 © 2000-2005 by Lars Haendel