
Valentina Kernel Locale Settings

About ICU library

Valentina uses the ICU (International Components for Unicode) library, originally developed by IBM, as its Unicode engine. ICU is one of the most widely used Unicode libraries. Thanks to it, Valentina is able to work with UTF8, UTF16 and 200+ other world encodings. On the ICU home page, http://ibm.com/software/globalization/icu, you will find a lot of detailed information and on-line demos where you can experiment with different settings of the major ICU methods. Note that Apple Inc. uses the ICU library inside Mac OS X.

Valentina Kernel Encodings

Internally, the Valentina engine always works with the UTF16 format.

When talking about encodings, we should distinguish between:

  • Storage Encoding - specifies the encoding in which text is stored in the disk files.
  • IO Encoding - specifies the encoding of the text that you pass to, or get back from, the Valentina engine through any given method.

For example, one user reads and writes strings on a PC using Cyrillic Win, while another user reads and writes strings on a Mac using Cyrillic Mac. Valentina converts the strings into UTF16 for internal processing and uses, e.g., UTF-16 to store them on disk.
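The conversion described above can be illustrated with a minimal sketch in plain Python. It does not use the Valentina API; the helper `to_internal` is hypothetical, and the Python codec names `cp1251` and `mac_cyrillic` are assumed here to correspond to Cyrillic Win and Cyrillic Mac. It shows that the same word arrives as different bytes from the two clients but becomes identical once decoded with the right IO Encoding and converted to UTF-16.

```python
# The same Russian word as it would arrive from two different clients.
word = "Привет"
from_windows = word.encode("cp1251")        # assumed equivalent of Cyrillic Win
from_mac = word.encode("mac_cyrillic")      # assumed equivalent of Cyrillic Mac


def to_internal(raw: bytes, io_encoding: str) -> bytes:
    """Hypothetical helper: decode client bytes using the declared
    IO encoding and return them as UTF-16 (little-endian) bytes."""
    return raw.decode(io_encoding).encode("utf-16-le")


internal_win = to_internal(from_windows, "cp1251")
internal_mac = to_internal(from_mac, "mac_cyrillic")

print(from_windows != from_mac)      # the legacy byte sequences differ
print(internal_win == internal_mac)  # the UTF-16 form is the same
```

This is why each client has to declare its IO Encoding: without it, the engine could not know how to interpret the incoming bytes.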

When working with Valentina Server, each client must specify on connection which IO Encoding should be used for strings.

Locale Properties

Each VDatabase, VTable and VField object in Valentina has three properties related to localization:

StorageEncoding as String
LocaleName as String
CollationAttribute( EVColAttribute ) as EVColAttributeValue

Information about locale parameters for each object is stored in the system tables.

See the API description of each of these classes in the corresponding section of the API Reference.

Storage Encoding

Valentina creates a new database in UTF-16 encoding by default. This is the recommended option, although you can try to tune your database by taking the following into account.

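If the database contains text in a Western language (e.g. MacWestern text), then UTF8 can be the best choice, because in this case most letters use one byte on disk (unaccented Latin letters take one byte each, accented letters take two).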

For Cyrillic, for example, UTF8 is not the best choice, because Russian letters use 2 bytes per letter in UTF8. In terms of size, the best choice for Cyrillic is Cyrillic Win or Cyrillic Mac, depending on the hosting platform.

Those who use languages such as Japanese, Chinese or Korean will prefer UTF16 as the storage encoding (as well as for IO).

NOTE: UTF8 as a storage encoding should be avoided for non-Western single-byte languages, at least for fixed-size strings, because UTF8 may use from 1 to 4 bytes per character. This is less of a problem for VarChar fields (which can be up to 4088 bytes long) and no problem at all for TEXT fields.
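The size trade-offs above can be checked with a short sketch in plain Python (not the Valentina API). The helper `stored_size` and the sample strings are illustrative, and `cp1251` is assumed to stand in for Cyrillic Win.

```python
samples = {"English": "role", "Russian": "Привет", "Japanese": "日本語"}
encodings = ["utf-8", "utf-16-le", "cp1251"]


def stored_size(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`,
    or None if the encoding cannot represent the text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# Map each language to the byte size of its sample in every encoding.
sizes = {
    language: {enc: stored_size(text, enc) for enc in encodings}
    for language, text in samples.items()
}

for language, row in sizes.items():
    print(language, row)
```

For these samples, the English word is smallest in UTF-8, the Russian word is smallest in the single-byte Cyrillic encoding, and the Japanese word is smaller in UTF-16 than in UTF-8, which matches the recommendations above.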

A database can contain Tables/Fields with different storage encodings. You may wish to do this if you store different languages, although it is probably better just to use the universal UTF-16 for such tasks.

CollationAttribute

The Collation Attribute affects how Valentina sorts and compares strings.

There are several collation attributes (details about each can be found in the ICU documentation).

kFrenchCollation		
kAlternateHandling		
kCaseFirst			
kCaseLevel			
kNormalizationMode 		
kStrength			
kHiraganaQuaternaryMode 	
kNumericCollation

The most interesting attribute for a developer is kStrength. It can have the following values:

kPrimary   = 0	- ignore accents and case                   role = Role = rôle
kSecondary = 1	- ignore case but distinguish accents       role = Role < rôle
kTertiary  = 2	- distinguish case and accents              role < Role < rôle
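The three strength levels can be approximated with a small sketch in plain Python. This is not ICU and not the Valentina API: the function `collation_key` is a hypothetical simplification that handles only letters with combining accents, written to reproduce the three rows of the table above. The constants mirror the numeric values listed there.

```python
import unicodedata

PRIMARY, SECONDARY, TERTIARY = 0, 1, 2


def collation_key(text, strength):
    """Build a sort key with one level per strength:
    base letters, then accents, then case (lowercase first)."""
    base, accents, case = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Attach the accent to the base letter that precedes it.
            if accents:
                accents[-1] += ch
            continue
        base.append(ch.lower())
        accents.append("")
        case.append(ch.isupper())
    key = ("".join(base),)
    if strength >= SECONDARY:
        key += (tuple(accents),)
    if strength >= TERTIARY:
        key += (tuple(case),)
    return key


words = ["rôle", "Role", "role"]
tertiary_order = sorted(words, key=lambda w: collation_key(w, TERTIARY))
print(tertiary_order)
```

Because the accent level is compared before the case level, an accent difference outweighs a case difference, which is why "Role" sorts before "rôle" at tertiary strength.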

The kNumericCollation attribute can also be of interest. If you set it ON, then sequences of digits inside strings are compared by their numeric value rather than character by character. By default it is OFF.
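The effect of numeric collation can be sketched in plain Python as follows. This is an illustration of the idea, not ICU's implementation; the helper `numeric_key` and the sample names are hypothetical.

```python
import re


def numeric_key(text):
    """Split text into alternating non-digit and digit runs and
    turn the digit runs into integers so they compare by value."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["item10", "item2", "item1"]
plain_order = sorted(names)                     # numeric collation OFF
numeric_order = sorted(names, key=numeric_key)  # numeric collation ON

print(plain_order)
print(numeric_order)
```

With the attribute OFF, "item10" sorts before "item2" because the character "1" precedes "2"; with it ON, 2 is compared against 10 as a number.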

Inheritance of Locale Parameters

It is important to know that the hierarchy

Database -> Table -> Field  

provides inheritance of Locale parameters (StorageEncoding, LocaleName, CollationAttribute). Two aspects of this behavior are shown in the picture.

a) Assume you have assigned the StorageEncoding UTF8 to a database. If you now create a Table in the database, the Table inherits the value of StorageEncoding from the database. The same is true when a field is created. See the left part of the picture.

b) If you have existing objects and change a parameter of some top-level object, then this change is propagated down the hierarchy. See the right part of the picture.

locale_creationofobjects_.jpg

If an object lower in the hierarchy already has its own assigned value, then it does not accept changes propagated from its parent. The next picture shows 6 steps that explain several cases:

  1. all objects have UTF16 encoding;
  2. the f1 field is assigned UTF8 encoding;
  3. the Database gets Latin1 encoding and propagates it to its child objects, but field f1 does not accept this change;
  4. the Database gets UTF8 encoding and propagates it to its child objects, but field f1 does not accept this change (although all objects now have UTF8);
  5. the Database gets UTF16 encoding and propagates it to its child objects, but field f1 does not accept this change;
  6. field f1 gets NULL encoding, i.e. it forgets its own encoding and starts to use the parent encoding.

locale_inheritance.jpg
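The six steps can be modeled with a minimal sketch in plain Python. The class `LocaleNode` and its attribute names are hypothetical and are not the Valentina API; the sketch assumes that an object either holds its own value or, when the value is None (NULL), resolves it from its parent.

```python
class LocaleNode:
    """Hypothetical model of a Database, Table or Field object."""

    def __init__(self, name, parent=None, own_encoding=None):
        self.name = name
        self.parent = parent
        self.own_encoding = own_encoding  # None means "inherit from parent"

    @property
    def storage_encoding(self):
        # An explicitly assigned value wins over the inherited one.
        if self.own_encoding is not None:
            return self.own_encoding
        return self.parent.storage_encoding if self.parent else None


db = LocaleNode("db", own_encoding="UTF16")
table = LocaleNode("t1", parent=db)
f1 = LocaleNode("f1", parent=table)


def snapshot():
    return (db.storage_encoding, table.storage_encoding, f1.storage_encoding)


history = [snapshot()]                            # 1. all objects UTF16
f1.own_encoding = "UTF8"; history.append(snapshot())    # 2. f1 gets UTF8
db.own_encoding = "Latin1"; history.append(snapshot())  # 3. f1 keeps UTF8
db.own_encoding = "UTF8"; history.append(snapshot())    # 4. all are UTF8
db.own_encoding = "UTF16"; history.append(snapshot())   # 5. f1 keeps UTF8
f1.own_encoding = None; history.append(snapshot())      # 6. f1 inherits again

for step, state in enumerate(history, start=1):
    print(step, state)
```

Resolving the value at read time, as this model does, gives the same observable result as propagating changes downward while skipping objects that have their own value.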