Class Charmap
Create a new default character set mapping instance. This class is the parent class of all of the charmapping subclasses, and only implements basic US-ASCII mapping. The subclasses implement all other charsets, some algorithmically, and some in a table-based way. Use CharmapFactory to create the correct subclass instance for the desired charmap.
All mappings are done to or from Unicode in the UTF-16 encoding, which is the base character set and encoding used by Javascript itself. In order to convert between two non-Unicode character sets, you must chain two charmap instances together to first map to Unicode and then back to the second charset.
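The chaining described above can be sketched as follows. This is a minimal, self-contained illustration and does not use the library's own classes: `SingleByteMap`, `convert`, and the two lookup tables are hypothetical stand-ins that expose only the `mapToUnicode` and `mapToNative` methods documented on this page. The hard-coded '?' fallback is also an assumption made for the sketch.

```javascript
// Hypothetical stand-in for a table-based charmap; not the library's code.
class SingleByteMap {
  // table[byte] holds the Unicode code point for that byte value.
  constructor(name, table) {
    this.name = name;
    this.table = table;
    this.reverse = new Map(table.map((cp, byte) => [cp, byte]));
  }
  getName() {
    return this.name;
  }
  // Accepts a plain array of numbers or a Uint8Array.
  mapToUnicode(bytes) {
    let out = "";
    for (const b of bytes) out += String.fromCodePoint(this.table[b]);
    return out;
  }
  mapToNative(string) {
    const out = [];
    for (const ch of string) {
      const b = this.reverse.get(ch.codePointAt(0));
      out.push(b === undefined ? 0x3f : b); // assumed fallback: '?'
    }
    return Uint8Array.from(out);
  }
}

// ISO-8859-1 maps every byte to the code point with the same value.
const latin1Table = Array.from({ length: 256 }, (_, i) => i);
// ISO-8859-15 differs from ISO-8859-1 at eight byte positions.
const latin9Table = latin1Table.slice();
for (const [byte, cp] of [
  [0xa4, 0x20ac], [0xa6, 0x0160], [0xa8, 0x0161], [0xb4, 0x017d],
  [0xb8, 0x017e], [0xbc, 0x0152], [0xbd, 0x0153], [0xbe, 0x0178],
]) {
  latin9Table[byte] = cp;
}

const latin1 = new SingleByteMap("ISO-8859-1", latin1Table);
const latin9 = new SingleByteMap("ISO-8859-15", latin9Table);

// Chain two maps: source bytes -> Unicode (UTF-16 string) -> target bytes.
function convert(bytes, from, to) {
  return to.mapToNative(from.mapToUnicode(bytes));
}

// 0xE9 is "é" in both charsets; 0xA4 is the euro sign only in ISO-8859-15.
const result = convert([0xe9, 0xa4], latin9, latin1);
console.log(Array.from(result)); // [ 233, 63 ]
```

The euro sign has no ISO-8859-1 equivalent, so the second byte comes out as the fallback character. With real charmap instances, that behaviour would be governed by the `missing` option described below.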
The options parameter controls which mapping is constructed and how it behaves. The currently supported options are:
- missing - specify what to do if a mapping is missing for a particular
character. For example, if you are mapping Unicode characters to a particular native
character set that does not support particular Unicode characters, the mapper will
follow the behaviour specified in this property. Valid values are:
- skip - skip any characters that do not exist in the target charset
- placeholder - put a static placeholder character in the output string wherever there is an unknown character in the input string. Use the placeholder option to specify which character to use in this case.
- escape - use an escape sequence to represent the unknown character
- placeholder - specify the placeholder character to use when the mapper cannot map a particular input character to the output string. If this option is not specified, then the '?' (question mark) character is used where possible.
- escapeStyle - what style of escape sequences should be used to
escape unknown characters in the input when mapping to native, and what
style of escape sequences should be parsed when mapping to Unicode. Valid
values are:
- html - Escape the characters as HTML entities. This uses the standard HTML 5.0 (or later) entity names where possible, and numeric entities in all other cases. E.g. an "e" with an acute accent would be "&eacute;"
- js - Use the Javascript escape style. E.g. an "e" with an acute accent would be "\u00E9". This can also be specified as "c#" as it uses a similar escape syntax.
- c - Use the C/C++ escape style, which is similar to the Javascript style, but uses an "x" in place of the "u". E.g. an "e" with an acute accent would be "\x00E9". This can also be specified as "c++".
- java - Use the Java escape style. This is very similar to the Javascript style, but the backslash has to be escaped twice. E.g. an "e" with an acute accent would be "\\u00E9". This can also be specified as "ruby", as Ruby uses a similar escape syntax with double backslashes.
- perl - Use the Perl escape style. E.g. an "e" with an acute accent would be "\N{U+00E9}"
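To make the interaction between `missing`, `placeholder`, and `escapeStyle` concrete, here is a minimal sketch that maps a string to US-ASCII bytes. It is not the library's implementation: `escapeCodePoint`, `mapToAscii`, and `asText` are hypothetical helpers, the defaults chosen for `missing` and `escapeStyle` are assumptions, the `html` style emits numeric entities only, and only characters in the Basic Multilingual Plane are handled.

```javascript
// Hypothetical helper: format one unmappable code point in a given style.
function escapeCodePoint(cp, style) {
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  switch (style) {
    case "html":
      return "&#x" + hex + ";"; // numeric entity only; no named entities here
    case "c":
    case "c++":
      return "\\x" + hex;
    case "java":
    case "ruby":
      return "\\\\u" + hex; // two backslashes in the output
    case "perl":
      return "\\N{U+" + hex + "}";
    case "js":
    case "c#":
    default: // assumed default style for this sketch
      return "\\u" + hex;
  }
}

// Hypothetical mapper to US-ASCII that honours the three options above.
function mapToAscii(string, options = {}) {
  const missing = options.missing || "placeholder"; // assumed default
  const placeholder = options.placeholder || "?";
  let out = "";
  for (const ch of string) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out += ch; // representable in US-ASCII
    } else if (missing === "skip") {
      continue; // drop the character
    } else if (missing === "escape") {
      out += escapeCodePoint(cp, options.escapeStyle);
    } else {
      out += placeholder;
    }
  }
  // Everything in `out` is ASCII now, so one byte per character.
  return Uint8Array.from(out, (c) => c.charCodeAt(0));
}

// Decode ASCII bytes back to text for display.
const asText = (bytes) => String.fromCharCode(...bytes);

const input = "caf\u00E9"; // "café"
console.log(asText(mapToAscii(input, { missing: "skip" })));
// caf
console.log(asText(mapToAscii(input, { missing: "placeholder", placeholder: "*" })));
// caf*
console.log(asText(mapToAscii(input, { missing: "escape", escapeStyle: "perl" })));
// caf\N{U+00E9}
```

A real charmap would consult its own mapping data rather than testing for code points below 0x80, but the branching on `missing` follows the option descriptions given here.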
Defined in: Charmap.js.
Constructor Attributes | Constructor Name and Description |
---|---|
| | Charmap(options) |
Method Attributes | Method Name and Description |
---|---|
| | getName() Return the standard name of this charmap. |
| | mapToNative(string) Map a string to the native character set. |
| | mapToUnicode(bytes) Map a native string to the standard Javascript charset of UTF-16. |
- Parameters:
- {Object=} options
- options which govern the construction of this instance
- Returns:
- {string} the standard name of this charmap
- Parameters:
- {string|IString} string
- string to map to a different character set.
- Returns:
- {Uint8Array} An array of bytes representing the string in the native character set
- Parameters:
- {Array.<number>|Uint8Array} bytes
- bytes to map to a Unicode string
- Returns:
- {string} A string in the standard Javascript charset UTF-16