Class

CharmapTable

CharmapTable([options])

Create a new character set mapping instance based on a trie table. Charmap instances map strings to other character sets. The charsets can be of any type: single-byte, multi-byte, shifting, etc.

All mappings are done to or from Unicode in the UTF-16 encoding, which is the base character set and encoding used by Javascript itself. In order to convert between two non-Unicode character sets, you must chain two charmap instances together to first map to Unicode and then back to the second charset.
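
The chaining described above can be sketched with two toy mappers. This is only an illustration, not the library's implementation: `makeToyCharmap` and both tables are hypothetical, each table lists just the euro sign, and every other byte is treated as if it were ISO-8859-1. Only the method names `mapToUnicode` and `mapToNative` are taken from this page.

```javascript
// Toy tables for illustration only; a real charmap covers the whole charset.
const ISO_8859_15 = { 0xA4: "\u20AC" };   // euro sign in ISO-8859-15
const WINDOWS_1252 = { 0x80: "\u20AC" };  // euro sign in Windows-1252

// Hypothetical helper: build an object with the same method names as a charmap.
function makeToyCharmap(table) {
    const reverse = {};
    for (const [byte, ch] of Object.entries(table)) {
        reverse[ch] = Number(byte);
    }
    return {
        // native bytes -> UTF-16 string
        mapToUnicode(bytes) {
            return Array.from(bytes, (b) => (b in table ? table[b] : String.fromCharCode(b))).join("");
        },
        // UTF-16 string -> native bytes
        mapToNative(str) {
            return Uint8Array.from(Array.from(str), (ch) => (ch in reverse ? reverse[ch] : ch.charCodeAt(0)));
        }
    };
}

const from = makeToyCharmap(ISO_8859_15);
const to = makeToyCharmap(WINDOWS_1252);

const source = Uint8Array.of(0x35, 0x20, 0xA4);  // "5 €" in ISO-8859-15
const unicode = from.mapToUnicode(source);       // first hop: native -> Unicode
const target = to.mapToNative(unicode);          // second hop: Unicode -> native
```

The intermediate `unicode` value is an ordinary Javascript string, so neither mapper needs to know anything about the other charset.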

The options parameter controls which mapping is constructed and how it behaves. The currently supported options are:

  • charset - the name of the native charset to map to or from. This can be given as a Charset instance or as a string that contains any commonly used name for the character set, which is normalized to a standard IANA name. If a name is not given, this class will default to the Western European character set called ISO-8859-15.
  • missing - specify what to do if a mapping is missing for a particular character. For example, if you are mapping Unicode characters to a particular native character set that does not support particular Unicode characters, the mapper will follow the behaviour specified in this property. Valid values are:
    • skip - skip any characters that do not exist in the target charset
    • placeholder - put a static placeholder character in the output string wherever there is an unknown character in the input string. Use the placeholder parameter to specify which character to use in this case
    • escape - use an escape sequence to represent the unknown character
    The default value for the missing property if not otherwise specified is "escape" so that information is not lost.
  • placeholder - specify the placeholder character to use when the mapper cannot map a particular input character to the output string. If this option is not specified, then the '?' (question mark) character is used where possible.
  • escapeStyle - what style of escape sequences should be used to escape unknown characters in the input when mapping to native, and what style of escape sequences should be parsed when mapping to Unicode. Valid values are:
    • html - Escape the characters as HTML entities. This uses the standard HTML 5.0 (or later) entity names where possible, and numeric entities in all other cases. E.g. an "e" with an acute accent would be "&eacute;".
    • js - Use the Javascript escape style. E.g. an "e" with an acute accent would be "\u00E9". This can also be specified as "c#" as it uses a similar escape syntax.
    • c - Use the C/C++ escape style, which is similar to the Javascript style, but uses an "x" in place of the "u". E.g. an "e" with an acute accent would be "\x00E9". This can also be specified as "c++".
    • java - Use the Java escape style. This is very similar to the Javascript style, but the backslash has to be escaped twice. E.g. an "e" with an acute accent would be "\\u00E9". This can also be specified as "ruby", as Ruby uses a similar escape syntax with double backslashes.
    • perl - Use the Perl escape style. E.g. an "e" with an acute accent would be "\N{U+00E9}".
    The default if this style is not specified is "js" for Javascript.
  • onLoad - a callback function to call when this object is fully loaded. When the onLoad option is given, this class will attempt to load any missing data using the ilib loader callback. When the constructor is done (even if the data is already preassembled), the onLoad function is called with the current instance as a parameter, so this callback can be used with preassembled or dynamic loading or a mix of the two.
  • sync - tell whether to load any missing data synchronously or asynchronously. If this option is given as "false", then the "onLoad" callback must be given, because the instance returned from this constructor will not be usable for a while.
  • loadParams - an object containing parameters to pass to the loader callback function when data is missing. The parameters are not interpreted or modified in any way. They are simply passed along. The object may contain any property/value pairs as long as the calling code is in agreement with the loader callback function as to what those parameters mean.
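
To make the missing, placeholder and escapeStyle options concrete, here is a minimal sketch of how they could interact when the target charset is 7-bit ASCII. The functions `escapeCodeUnit` and `encodeAscii` are hypothetical and are not part of ilib. The "html" branch emits numeric entities only, because no named-entity table is included here, and the sketch returns the ASCII output as a string rather than as bytes to keep it short.

```javascript
// Hypothetical helper: format one unmappable UTF-16 code unit in the given style.
function escapeCodeUnit(ch, style) {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
    switch (style) {
        case "html": return "&#x" + hex + ";";        // numeric entity only in this sketch
        case "c": case "c++": return "\\x" + hex;
        case "java": case "ruby": return "\\\\u" + hex; // two backslashes, then u
        case "perl": return "\\N{U+" + hex + "}";
        case "js": case "c#":
        default: return "\\u" + hex;
    }
}

// Hypothetical encoder for 7-bit ASCII showing the three "missing" behaviours.
function encodeAscii(str, options = {}) {
    const missing = options.missing || "escape";      // documented default
    const placeholder = options.placeholder || "?";   // documented default
    const style = options.escapeStyle || "js";        // documented default
    let out = "";
    for (let i = 0; i < str.length; i++) {
        const ch = str[i];
        if (ch.charCodeAt(0) < 0x80) {
            out += ch;                                // character exists in the target charset
        } else if (missing === "placeholder") {
            out += placeholder;
        } else if (missing === "escape") {
            out += escapeCodeUnit(ch, style);
        }
        // missing === "skip": emit nothing for this character
    }
    return out;
}

const input = "caf\u00E9";
const skipped = encodeAscii(input, { missing: "skip" });
const replaced = encodeAscii(input, { missing: "placeholder" });
const escaped = encodeAscii(input);
const perlStyle = encodeAscii(input, { escapeStyle: "perl" });
```

Only "escape" keeps enough information to recover the original character later, which matches the reason given above for making it the default.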

If this copy of ilib is pre-assembled and all the data is already available, or if the data was already previously loaded, then this constructor will call the onLoad callback immediately when the initialization is done. If the onLoad option is not given, this class will only attempt to load any missing data synchronously.
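
The callback convention can be illustrated with a small stand-in class. This is a sketch of the pattern only: `ToyLoadable` is hypothetical, it loads nothing, and treating an unspecified sync option as synchronous is an assumption based on the paragraph above.

```javascript
// Hypothetical stand-in for a class that follows the onLoad/sync convention.
class ToyLoadable {
    constructor(options = {}) {
        const sync = options.sync !== false;  // assumption: synchronous unless sync is false
        this.ready = false;
        const finish = () => {
            this.ready = true;
            if (typeof options.onLoad === "function") {
                options.onLoad(this);         // the instance is passed to the callback
            }
        };
        if (sync) {
            finish();                         // callback runs before the constructor returns
        } else {
            setTimeout(finish, 0);            // callback runs later; instance not usable yet
        }
    }
}

// Synchronous case: the callback has already run when the constructor returns.
let seen = null;
const a = new ToyLoadable({ onLoad: (cm) => { seen = cm; } });

// Asynchronous case: only use the instance inside the callback.
const b = new ToyLoadable({
    sync: false,
    onLoad: (cm) => { console.log("loaded:", cm.ready); }
});
```

Writing the calling code so that it always uses the instance inside onLoad works in both cases, which is why the same callback suits preassembled and dynamically loaded data.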

Constructor

# new CharmapTable([options])

Parameters:
Name     Type    Attributes  Description
options  Object  <optional>  Options that govern the construction of this instance

See:
  • ilib#setLoaderCallback for information about registering a loader callback instance

View Source CharmapTable.js, line 123

Extends

Members

Methods

# getName() → {string}

Return the standard name of this charmap. All charmaps map from Unicode to the native charset, so the name returned from this function corresponds to the native charset.

Inherited From:

View Source Charmap.js, line 153

Returns:
The standard name of this charmap, which corresponds to the native charset

Type: string

# mapToNative(string) → {Uint8Array}

Map a string to the native character set. This string may be given as an intrinsic Javascript string object or an IString object.

Parameters:
Name    Type              Description
string  string | IString  String to map to a different character set

Overrides:

View Source CharmapTable.js, line 253

Returns:
An array of bytes representing the string in the native character set

Type: Uint8Array

# mapToUnicode(bytes) → {string}

Map a native string to the standard Javascript charset of UTF-16. This string may be given as an array of numbers where each number represents a code point in the "from" charset, or as a Uint8Array array of bytes representing the bytes of the string in order.

Parameters:
Name   Type                         Description
bytes  Array.<number> | Uint8Array  Bytes to map to a Unicode string

Overrides:

View Source CharmapTable.js, line 298

Returns:
A string in the standard Javascript charset UTF-16

Type: string
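
The two accepted input shapes can be illustrated with a small stand-alone decoder. This is a sketch, not the library's code: `latin1ToUnicode` is hypothetical and handles only ISO-8859-1, where each byte value equals its Unicode code point.

```javascript
// Hypothetical decoder for ISO-8859-1; each byte value is its own code point.
function latin1ToUnicode(bytes) {
    let out = "";
    for (const b of bytes) {                  // iterates Array.<number> and Uint8Array alike
        out += String.fromCharCode(b & 0xFF); // keep only the low 8 bits
    }
    return out;
}

const fromArray = latin1ToUnicode([0x63, 0x61, 0x66, 0xE9]);
const fromBytes = latin1ToUnicode(Uint8Array.of(0x63, 0x61, 0x66, 0xE9));
```

Both calls produce the same string, so callers can pass whichever form they already have.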