How To...

Using regular expressions in a text source definition

Text sources can have very different internal structures, so our R&D team gives users a free hand: you can fully define the parsing rules for your non-standard text file. If you don’t need a parsing definition because your file is a simple text file (for example, an EULA), select the “Plain text” option in the Project Wizard. Otherwise, select “User defined text file”. Below are four examples of text definitions for different text files.

Example 1

Let’s assume that a text file is similar to a standard INI file, with keys, separators and values:


For this simple structure, Sisulizer should automatically detect and set the correct structure, so you only need to click the “Next” button.
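To make the key/separator/value layout concrete, here is a minimal Python sketch of how such a file could be split into context (key) and text (value) items. It is only an illustration, not Sisulizer’s own detection logic. The sample data, the “=” separator and the name parse_key_value are assumptions made for this example.

```python
import re

# Hypothetical sample data: an INI-like file with "=" as the separator.
SAMPLE = "Title=My Application\r\nOpenFile=Open file\r\nExit=Exit\r\n"

# Key (context), then the separator, then the value (text) up to the line end.
KEY_VALUE = re.compile(
    r"^(?P<context>[^=\r\n]+)=(?P<text>[^\r\n]*)",
    re.MULTILINE,  # let ^ match at the start of every line
)


def parse_key_value(data):
    """Return a list of (context, text) tuples, one per key/value line."""
    return [
        (match.group("context").strip(), match.group("text"))
        for match in KEY_VALUE.finditer(data)
    ]


items = parse_key_value(SAMPLE)
for context, text in items:
    print(context, "->", text)
```

Each key becomes the context of a row and each value becomes the text to translate.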



Example 2

In our next example, the original and the translation (identical to the original at first, but replaced by strings in the target language after localization) are separated by line breaks, and the pairs are separated by empty lines.


For such a structure we need to define detailed rules for the context and text items. One context item with default settings is automatically added to the list. To edit an existing rule, double-click it or click the “Edit” button. To add a new rule (e.g. a text rule), click the “Add” button and then select the “Text” item in the “Type” drop-down menu in the bottom part of the “Text Rule” dialog that opens. Then set the appropriate characters before and/or after the item. You can type them manually or click the “+” button and select an item from the pop-up menu.


With our rules defined, based on carriage returns and line feeds (items in the file are separated by line breaks and empty lines), we can add localization languages, finish the wizard and scan the source.
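The same structure can be sketched in a few lines of Python to show what the line-break-based rules have to express. This is an illustration only: the sample data and the name parse_pairs are assumptions, and the first line of each pair is treated as the context while the second is treated as the text.

```python
# Hypothetical sample data: original and translation on consecutive lines,
# pairs separated by an empty line (CR+LF line endings).
SAMPLE = (
    "Open file\r\nOpen file\r\n"
    "\r\n"
    "Save file\r\nSave file\r\n"
    "\r\n"
    "Exit\r\nExit\r\n"
)


def parse_pairs(data):
    """Return (context, text) tuples for every two-line block."""
    pairs = []
    # An empty line shows up as two consecutive CR+LF sequences.
    for block in data.split("\r\n\r\n"):
        lines = [line for line in block.split("\r\n") if line]
        if len(lines) == 2:
            pairs.append((lines[0], lines[1]))
    return pairs


pairs = parse_pairs(SAMPLE)
for context, text in pairs:
    print(repr(context), "->", repr(text))
```

Blocks that do not contain exactly two non-empty lines are skipped, which keeps the sketch short but would hide malformed input in real use.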


Example 1 with comments

Text files often contain comments with important information from developers. Comments are usually preceded by special marks; in our example we used “#”.


Commonly used comment marks are “//”, sometimes “#”, “;” or “?”, but there is no fixed syntax, so Sisulizer doesn’t fill in the context and text items automatically. However, we can reuse the rules from our first example (without comments). You also need to define the comment mark: click the “Comments” tab in the bottom part of the wizard window and type “#” in the line comment field.


That’s all. Now you only need to add a language and scan the file.
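As a rough illustration of what a line comment mark means for parsing, the Python sketch below reads a key/value file and keeps each “#” comment with the item that follows it. The sample data, the name parse_with_comments and the choice to attach a comment to the next item are all assumptions for this example; how Sisulizer stores the comments is not described here.

```python
# Hypothetical sample data: key/value lines preceded by "#" comment lines.
SAMPLE = (
    "# Main window caption\r\n"
    "Title=My Application\r\n"
    "# File menu\r\n"
    "OpenFile=Open file\r\n"
)


def parse_with_comments(data, marker="#"):
    """Return (context, text, comment) tuples.

    Comment lines are collected and attached to the next key/value line.
    """
    items = []
    pending = []  # comment lines seen since the last item
    for line in data.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            pending.append(stripped[len(marker):].strip())
        elif "=" in stripped:
            key, value = stripped.split("=", 1)
            items.append((key.strip(), value, " ".join(pending)))
            pending = []
    return items


items = parse_with_comments(SAMPLE)
for context, text, comment in items:
    print(context, "->", text, "| comment:", comment)
```

Passing a different marker, such as “;” or “//”, covers the other comment styles mentioned above.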

Example 2 with comments

Defining rules for our second text file is very different once we add comments. It looks innocent…


… but here we can’t use a simple definition with comments, because:

  • Comments are located between empty lines, while our context/text rules use carriage returns and line feeds, so comments can break our rules.
  • Comments occur in irregular order.

But don’t worry: if you know regular expressions, you can find a workaround based on them and define a correct rule (thanks to Jaakko for the help). The essential part is the expression (^\s*#.*?\r\n|\r\n)+ used as “Before item” in the context rule.
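To see what this expression consumes, here is a small sketch that tries it with Python’s re module. This is only a stand-in: Sisulizer’s own regular expression engine may differ in details, and in Python the re.MULTILINE flag is needed so that ^ matches at the start of every line. The sample data, the names BEFORE_ITEM and parse_commented_pairs, and the context/text part appended after the expression are assumptions made for this illustration.

```python
import re

# Hypothetical sample data: pairs separated by empty lines, with "#" comments
# appearing irregularly between them.
SAMPLE = (
    "# File menu\r\n"
    "Open file\r\nOpen file\r\n"
    "\r\n"
    "# Shown on exit\r\n"
    "# Keep it short\r\n"
    "Exit\r\nExit\r\n"
    "\r\n"
    "Save file\r\nSave file\r\n"
)

# The "Before item" expression from the article, copied verbatim:
# one or more comment lines and/or line breaks.
BEFORE_ITEM = r"(^\s*#.*?\r\n|\r\n)+"

# Assumed continuation for this sketch: a context line that does not start
# with "#", a line break, then the text line.
ITEM = re.compile(
    BEFORE_ITEM + r"(?P<context>[^#\r\n][^\r\n]*)\r\n(?P<text>[^\r\n]+)",
    re.MULTILINE,
)


def parse_commented_pairs(data):
    """Return (context, text) tuples, skipping comments and empty lines."""
    return [(m.group("context"), m.group("text")) for m in ITEM.finditer(data)]


pairs = parse_commented_pairs(SAMPLE)
for context, text in pairs:
    print(repr(context), "->", repr(text))
```

Because the expression accepts any mix of comment lines and line breaks, it does not matter whether a pair is preceded by one comment, several comments or none at all.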


To sum up:

  • Sisulizer is a really flexible tool.
  • Regular expressions are your big friends, so visit them sometimes.


  • As written above, Sisulizer has one predefined key/value rule set, but you can add your own fixed rules via the “Tools” menu -> “Platforms” -> “Text…”. If you often localize text files based on the same internal structure, consider this solution; it could really speed up your work.
  • You can export and import definitions to and from external files, directly in the Project Wizard or later via the “Format” tab in the text source properties dialog.
  • You can learn about regular expressions on the Wikipedia, ICU or Regular Expressions Info websites.
