Extends
Methods
# charAt(index) → {IString}
Same as String.charAt()
Parameters:
Name | Type | Description |
---|---|---|
index | number | the index of the character being sought |
- Inherited From:
Returns: the character at the given index
# charCodeAt(index) → {number}
Same as String.charCodeAt(). This only reports on 2-byte UCS-2 Unicode values, and does not take into account supplementary characters encoded in UTF-16. If you would like to take account of those characters, use codePointAt() instead.
Parameters:
Name | Type | Description |
---|---|---|
index | number | the index of the character being sought |
- Inherited From:
Returns: the character code of the character at the given index in the string
Type: number
# codePointAt(index) → {number}
Return the code point at the given index when the string is viewed as an array of code points. If the index is beyond the end of the array of code points or if the index is negative, -1 is returned.
Parameters:
Name | Type | Description |
---|---|---|
index | number | the index of the code point |
- Inherited From:
Returns: the code point of the character at the given index into the string
Type: number
# codePointLength() → {number}
Return the number of code points in this string. This may be different than the number of characters, as the UTF-16 encoding that Javascript uses for its basis returns surrogate pairs separately. Two 2-byte surrogate characters together make up one character/code point in the supplementary character planes. If your string contains no characters in the supplementary planes, this method will return the same thing as the length() method.
- Inherited From:
Returns: the number of code points in this string
Type: number
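To make the difference between UTF-16 code units and code points concrete, the sketch below uses only the built-in JavaScript String API, not IString. The helpers `codePointCount` and `codePointAtIndex` are hypothetical illustrations of the behaviour described for codePointLength() and codePointAt(). Note that the native String.prototype.codePointAt() takes a UTF-16 index, whereas the codePointAt() documented here indexes the string as an array of code points.

```javascript
// Count code points rather than UTF-16 code units. Array.from() iterates a
// string by code point, so a surrogate pair becomes a single element.
function codePointCount(str) {
  return Array.from(str).length;
}

// Return the code point at a code-point index (not a UTF-16 index), or -1
// when the index is negative or past the end, as described for codePointAt().
function codePointAtIndex(str, index) {
  const points = Array.from(str);
  if (index < 0 || index >= points.length) {
    return -1;
  }
  return points[index].codePointAt(0);
}

const s = "a\u{1F600}b"; // "a", GRINNING FACE (U+1F600), "b"

console.log(s.length);                            // 4: the emoji is a surrogate pair
console.log(codePointCount(s));                   // 3
console.log(s.charCodeAt(1).toString(16));        // d83d (the high surrogate only)
console.log(codePointAtIndex(s, 1).toString(16)); // 1f600
console.log(codePointAtIndex(s, 2).toString(16)); // 62 ("b")
console.log(codePointAtIndex(s, 3));              // -1
```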
# concat(strings) → {IString}
Same as String.concat()
Parameters:
Name | Type | Description |
---|---|---|
strings | string | strings to concatenate to the current one |
- Inherited From:
Returns: the current string concatenated with the given strings
# ellipsize(length) → {string}
Truncate the current string at the given number of glyphs and add an ellipsis to indicate that there is more to the string. The ellipsis forms the last character in the string, so the string is actually truncated at length-1 glyphs.
Parameters:
Name | Type | Description |
---|---|---|
length | number | the number of whole glyphs to keep in the string, including the ellipsis |
- Inherited From:
Returns: a string truncated to the requested number of glyphs, with an ellipsis
Type: string
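Glyph-based truncation can be approximated in plain JavaScript with Intl.Segmenter, which splits a string into extended grapheme clusters (user-perceived characters). This is a sketch under that assumption: ilib's own notion of a glyph may differ in edge cases, `truncateGlyphs` and `ellipsizeGlyphs` are hypothetical names, and returning strings that already fit unchanged is an assumption of this sketch. It needs a runtime with Intl.Segmenter (for example Node.js 16 or later).

```javascript
// Split a string into grapheme clusters (an approximation of "whole glyphs").
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

// Keep at most `length` whole glyphs.
function truncateGlyphs(str, length) {
  return graphemes(str).slice(0, Math.max(0, length)).join("");
}

// Keep length-1 glyphs and append an ellipsis, so the result is `length`
// glyphs long including the ellipsis. Assumption: strings that already fit
// are returned unchanged, since there is nothing more to indicate.
function ellipsizeGlyphs(str, length) {
  const parts = graphemes(str);
  if (parts.length <= length) {
    return str;
  }
  return parts.slice(0, Math.max(0, length - 1)).join("") + "\u2026";
}

// "e" + COMBINING ACUTE ACCENT is one glyph even though it is two code units.
const word = "cafe\u0301 au lait";
console.log(truncateGlyphs(word, 4));  // café (the accent stays attached to the e)
console.log(ellipsizeGlyphs(word, 6)); // café …
```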
# endsWith() → {boolean}
Same as String.endsWith().
- Inherited From:
Returns: true if the given characters are found at the end of the string, and false otherwise
Type: boolean
# forEach(callback)
Call the callback with each character in the string one at a time, taking care to step through the surrogate pairs in the UTF-16 encoding properly.
The standard Javascript String's charAt() method only returns a particular 16-bit character in the UTF-16 encoding scheme. If the index to charAt() is pointing to a low- or high-surrogate character, it will return the surrogate character rather than the character in the supplementary planes that the two surrogates together encode. This function will call the callback with the full character, making sure to join two surrogates into one character in the supplementary planes where necessary.
Parameters:
Name | Type | Description |
---|---|---|
callback | function | a callback function to call with each full character in the current string |
- Inherited From:
# forEachCodePoint(callback)
Call the callback with each numeric code point in the string one at a time, taking care to step through the surrogate pairs in the UTF-16 encoding properly.
The standard Javascript String's charCodeAt() method only returns information about a particular 16-bit character in the UTF-16 encoding scheme. If the index to charCodeAt() is pointing to a low- or high-surrogate character, it will return the code point of the surrogate character rather than the code point of the character in the supplementary planes that the two surrogates together encode. This function will call the callback with the full code point of each character, making sure to join two surrogates into one code point in the supplementary planes.
Parameters:
Name | Type | Description |
---|---|---|
callback | function | a callback function to call with each code point in the current string |
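In modern JavaScript engines the for...of loop already iterates over a string by code point, so the behaviour described for forEach() and forEachCodePoint() can be sketched with built-ins alone. The functions `forEachChar` and `forEachCodePointOf` are hypothetical stand-ins, not the ilib implementation.

```javascript
// Call `callback` with each full character, keeping surrogate pairs together.
function forEachChar(str, callback) {
  for (const ch of str) { // string iteration is by code point
    callback(ch);
  }
}

// Call `callback` with each numeric code point.
function forEachCodePointOf(str, callback) {
  for (const ch of str) {
    callback(ch.codePointAt(0));
  }
}

const text = "x\u{1D11E}y"; // U+1D11E MUSICAL SYMBOL G CLEF needs a surrogate pair

const chars = [];
forEachChar(text, (ch) => chars.push(ch));
console.log(chars.length); // 3, although text.length is 4 UTF-16 code units

const points = [];
forEachCodePointOf(text, (cp) => points.push(cp.toString(16)));
console.log(points); // [ '78', '1d11e', '79' ]
```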
- Inherited From:
# format(params)
Format this string instance as a message, replacing the parameters with the given values.
The string can contain any text that a regular Javascript string can contain. Replacement parameters have the syntax:
{name}
where "name" can be any string enclosed in curly braces. The value substituted for {name} is taken from the property of the same name in the params argument.
Example:
var str = new IString("There are {num} objects."); console.log(str.format({ num: 12 }));
Would give the output:
There are 12 objects.
If a property is missing from the parameter block, the replacement parameter substring is left untouched in the string, and a different set of parameters may be applied a second time. This way, different parts of the code may format different parts of the message that they happen to know about.
Example:
var str = new IString("There are {num} objects in the {container}."); console.log(str.format({ num: 12 }));
Would give the output:
There are 12 objects in the {container}.
The result can then be formatted again with a different parameter block that specifies a value for the container property.
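The substitution rule described above (fill in the parameters that are known, leave the others untouched so a later call can fill them) can be sketched with a single regular-expression replacement. `formatTemplate` is a hypothetical helper; it assumes parameter names contain no closing brace and does not attempt to reproduce any other behaviour of IString.format().

```javascript
// Replace each {name} with params[name] when that property exists;
// otherwise leave the "{name}" text in place for a later pass.
function formatTemplate(template, params) {
  return template.replace(/\{([^}]+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(params, name)
      ? String(params[name])
      : placeholder
  );
}

const template = "There are {num} objects in the {container}.";
const partial = formatTemplate(template, { num: 12 });
console.log(partial); // There are 12 objects in the {container}.

// A second pass with a different parameter block fills in the rest.
console.log(formatTemplate(partial, { container: "box" }));
// There are 12 objects in the box.
```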
Parameters:
Name | Type | Description |
---|---|---|
params |  | a Javascript object containing values for the replacement parameters in the current string |
- Inherited From:
Returns: a new IString instance with as many replacement parameters filled out as possible with real values
# formatChoice(argIndex, params) → {string}
Format a string as one of a choice of strings dependent on the value of a particular argument index or array of indices.
The syntax of the choice string is as follows. The string contains a series of choices separated by a vertical bar character "|". Each choice has a value or range of values to match followed by a hash character "#" followed by the string to use if the variable matches the criteria.
Example string:
var num = 2; var str = new IString("0#There are no objects.|1#There is one object.|2#There are {number} objects."); console.log(str.formatChoice(num, { number: num }));
Gives the output:
"There are 2 objects."
The strings to format may contain replacement variables that will be formatted using the format() method above and the params argument as a source of values to use while formatting those variables.
If the criterion for a particular choice is empty, that choice will be used as the default one when none of the other choices' criteria match.
Example string:
var num = 22; var str = new IString("0#There are no objects.|1#There is one object.|#There are {number} objects."); console.log(str.formatChoice(num, { number: num }));
Gives the output:
"There are 22 objects."
If multiple choice patterns can match a given argument index, the first one encountered in the string will be used. If no choice patterns match the argument index, then the default choice will be used. If there is no default choice defined, then this method will return an empty string.
Special Syntax
For any choice format string, all of the patterns in the string should be of a single type: numeric, boolean, or string/regexp. The type of the patterns is determined by the type of the argument index parameter.
If the argument index is numeric, then some special syntax can be used in the patterns to match numeric ranges.
- >x - match any number that is greater than x
- >=x - match any number that is greater than or equal to x
- <x - match any number that is less than x
- <=x - match any number that is less than or equal to x
- start-end - match any number in the range [start,end)
- zero - match any number in the class "zero". (See below for a description of number classes.)
- one - match any number in the class "one"
- two - match any number in the class "two"
- few - match any number in the class "few"
- many - match any number in the class "many"
- other - match any number in the other or default class
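The numeric range criteria in the list above, together with the first-match-wins and default rules, can be sketched as a small matcher. This is a hypothetical illustration, not ilib's parser: `chooseNumeric` ignores the plural-class keywords (zero, one, two, few, many, other), which depend on locale data, splits naively on "|" and "#", and performs no replacement-parameter formatting.

```javascript
// Test one numeric criterion against n: an exact value, >x, >=x, <x, <=x,
// or start-end (the half-open range [start, end)).
function matchesCriterion(criterion, n) {
  let m;
  if ((m = /^>=(-?\d+(?:\.\d+)?)$/.exec(criterion))) return n >= Number(m[1]);
  if ((m = /^<=(-?\d+(?:\.\d+)?)$/.exec(criterion))) return n <= Number(m[1]);
  if ((m = /^>(-?\d+(?:\.\d+)?)$/.exec(criterion))) return n > Number(m[1]);
  if ((m = /^<(-?\d+(?:\.\d+)?)$/.exec(criterion))) return n < Number(m[1]);
  if ((m = /^(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)$/.exec(criterion))) {
    return n >= Number(m[1]) && n < Number(m[2]);
  }
  if (/^-?\d+(?:\.\d+)?$/.test(criterion)) return n === Number(criterion);
  // Message modelled on the error documented for formatChoice().
  throw new Error("syntax error in choice format pattern: " + criterion);
}

// Return the text of the first choice whose criterion matches n. An empty
// criterion marks the default choice; with no match and no default, return "".
function chooseNumeric(choiceString, n) {
  let fallback = null;
  for (const choice of choiceString.split("|")) {
    const hash = choice.indexOf("#");
    if (hash < 0) {
      throw new Error("syntax error in choice format pattern: " + choice);
    }
    const criterion = choice.slice(0, hash);
    const text = choice.slice(hash + 1);
    if (criterion === "") {
      if (fallback === null) fallback = text;
    } else if (matchesCriterion(criterion, n)) {
      return text;
    }
  }
  return fallback === null ? "" : fallback;
}

const choices = "0#no files|1#one file|2-5#a few files|>=5#many files|#some files";
console.log(chooseNumeric(choices, 0));  // no files
console.log(chooseNumeric(choices, 3));  // a few files
console.log(chooseNumeric(choices, 5));  // many files (5 is outside [2,5))
console.log(chooseNumeric(choices, -1)); // some files (the default choice)
```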
A number class defines a set of numbers that receive a particular syntax in the strings. For example, in Slovenian, integers whose last two digits are "01" are in the "one" class, including 1, 101, 201, etc. Similarly, integers whose last two digits are "02" are in the "two" class, and integers whose last two digits are "03" or "04" are in the "few" class. Every other integer, such as 5, 11 or 21, is handled by the default string.
The definition of what numbers are included in a class is locale-dependent. They are defined in the data file plurals.json. If your string is in a different locale than the default for ilib, you should call the setLocale() method of the string instance before calling this method.
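The number classes correspond to the Unicode CLDR plural categories. ilib reads them from its own plurals.json, but the same CLDR categories are exposed by the standard Intl.PluralRules API, which makes the locale dependence easy to see. This sketch assumes a JavaScript runtime built with full ICU locale data (the default for current Node.js releases); `pluralClass` is a hypothetical helper.

```javascript
// Return the CLDR plural category ("zero", "one", "two", "few", "many"
// or "other") of n for the given locale.
function pluralClass(locale, n) {
  return new Intl.PluralRules(locale).select(n);
}

for (const n of [1, 2, 3, 5, 21, 101, 102]) {
  console.log(n, pluralClass("sl", n), pluralClass("en", n));
}
// Slovenian puts 1 and 101 in "one", 2 and 102 in "two", 3 in "few",
// and 5 and 21 in "other"; English only distinguishes "one" and "other".
```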
Other Pattern Types
If the argument index is a boolean, the string values "true" and "false" may appear as the choice patterns.
If the argument index is of type string, then the choice patterns may contain regular expressions, or static strings as degenerate regexps.
Multiple Indexes
If you have 2 or more indexes to format into a string, you can pass them as an array. When you do that, the patterns to match should be a comma-separated list of patterns as per the rules above.
Example string:
var str = new IString("zero,zero#There are no objects on zero pages.|one,one#There is 1 object on 1 page.|other,one#There are {number} objects on 1 page.|#There are {number} objects on {pages} pages."); var num = 4, pages = 1; console.log(str.formatChoice([num, pages], { number: num, pages: pages }));
Gives the output:
"There are 4 objects on 1 page."
Note that when there is a single index, you would typically leave the pattern blank to indicate the default choice. When there are multiple indices, sometimes one of the patterns has to be the default case when the other is not. Rather than leaving one or more of the patterns blank with commas that look out-of-place in the middle of it, you can use the word "other" to indicate a match with the default or other choice. The above example shows the use of the "other" pattern. That said, you are allowed to leave the pattern blank if you so choose. In the example above, the pattern for the third string could easily have been written as ",one" instead of "other,one" and the result will be the same.
Parameters:
Name | Type | Description |
---|---|---|
argIndex | * or Array.<*> | the index into the choice array of the current parameter, or an array of indices |
params | Object | the hash of parameter values that replace the replacement variables in the string |
- Inherited From:
Throws: "syntax error in choice format pattern: " if there is a syntax error
Returns: the formatted string
Type: string
# getLocale() → {string}
Return the locale to use when processing choice formats. The locale affects how number classes are interpreted. In some cultures, the limit "few" maps to "any integer that ends in the digits 2 to 9" and in yet others, "few" maps to "any integer that ends in the digits 3 or 4".
- Inherited From:
Returns: the locale spec to use when processing choice formats with this string
Type: string
# includes() → {boolean}
Same as String.includes().
- Inherited From:
Returns: true if the search string is found anywhere within the current string, and false otherwise
Type: boolean
# indexOf(searchValue, start) → {number}
Same as String.indexOf()
Parameters:
Name | Type | Description |
---|---|---|
searchValue | string | string to search for |
start | number | index into the string to start searching, or undefined to search the entire string |
- Inherited From:
Returns: index into the string of the string being sought, or -1 if the string is not found
Type: number
# iterator() → {Object}
Return an iterator that will step through all of the characters in the string one at a time and return their code points, taking care to step through the surrogate pairs in UTF-16 encoding properly.
The standard Javascript String's charCodeAt() method only returns information about a particular 16-bit character in the UTF-16 encoding scheme. If the index is pointing to a low- or high-surrogate character, it will return a code point of the surrogate character rather than the code point of the character in the supplementary planes that the two surrogates together encode.
The iterator instance returned has two methods, hasNext() which returns true if the iterator has more code points to iterate through, and next() which returns the next code point as a number.
- Inherited From:
Returns: an iterator that iterates through all the code points in the string
Type: Object
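The iterator contract described above (hasNext() and next(), with next() returning a code point as a number) can be sketched over a plain JavaScript string as follows. `codePointIterator` is a hypothetical name, and what next() returns past the end is an assumption of this sketch. The surrogate handling relies on the built-in String.prototype.codePointAt(), which combines a high surrogate with the low surrogate that follows it.

```javascript
// Build an object with hasNext()/next() that walks a string by code point.
function codePointIterator(str) {
  let index = 0; // current position, in UTF-16 code units
  return {
    hasNext() {
      return index < str.length;
    },
    next() {
      if (index >= str.length) {
        return undefined; // assumption: end-of-string behaviour is not specified
      }
      const cp = str.codePointAt(index);
      index += cp > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
      return cp;
    },
  };
}

const it = codePointIterator("A\u{10400}B"); // U+10400 DESERET CAPITAL LETTER LONG I
const seen = [];
while (it.hasNext()) {
  seen.push(it.next().toString(16));
}
console.log(seen); // [ '41', '10400', '42' ]
```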
# lastIndexOf(searchValue, start) → {number}
Same as String.lastIndexOf()
Parameters:
Name | Type | Description |
---|---|---|
searchValue | string | string to search for |
start | number | index into the string to start searching, or undefined to search the entire string |
- Inherited From:
Returns: index into the string of the string being sought, or -1 if the string is not found
Type: number
# match(regexp) → {Array.<string>}
Same as String.match()
Parameters:
Name | Type | Description |
---|---|---|
regexp | string | the regular expression to match |
- Inherited From:
Returns: an array of matches
Type: Array.<string>
# matchAll(regexp) → {iterator}
Same as String.matchAll()
Parameters:
Name | Type | Description |
---|---|---|
regexp | string | the regular expression to match |
- Inherited From:
Returns: an iterator of the matches
Type: iterator
# normalize(form) → {IString}
Perform the Unicode Normalization Algorithm upon the string and return the resulting new string. The current string is not modified.
Forms
The forms of possible normalizations are defined by the Unicode Standard Annex (UAX) 15. The form parameter is a string that may have one of the following values:
- nfd - Canonical decomposition. This decomposes characters into their exactly equivalent forms. For example, "ü" would decompose into a "u" followed by the combining diaeresis character.
- nfc - Canonical decomposition followed by canonical composition. This decomposes and then recomposes characters into their shortest exactly equivalent forms by recomposing as many combining characters as possible. For example, "ü" followed by a combining macron character would decompose into a "u" followed by the combining diaeresis character and the combining macron character, and then be recomposed into the u with diaeresis and macron "ǖ" character. The reason that the "nfc" form decomposes and then recomposes is that combining characters have a specific order under the Unicode Normalization Algorithm, and partly composed characters such as the "ü" followed by combining marks may change the order of the combining marks when decomposed and recomposed.
- nfkd - Compatibility decomposition. This decomposes characters into compatible forms that may not be exactly equivalent semantically, in addition to performing canonical decomposition. For example, the "ﬁ" ligature character decomposes to the two characters "fi" because they are compatible even though they are not exactly the same semantically.
- nfkc - Compatibility decomposition followed by canonical composition. This decomposes characters into compatible forms, then recomposes characters using the canonical composition. That is, it breaks down characters into the compatible forms, and then recombines all combining marks it can with their base characters. For example, the character "ẛ" (long s with dot above) would be normalized to "ṡ" (s with dot above) by first decomposing it into a long s followed by the combining dot above, mapping the long s to its compatible form "s", and then recomposing the "s" with the combining dot above.
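JavaScript engines also provide String.prototype.normalize(), which implements the same four UAX #15 forms, spelled in upper case ("NFD", "NFC", "NFKD", "NFKC"). The sketch below uses that built-in, not NormString, to illustrate each form; `codePoints` is a hypothetical helper for displaying the result.

```javascript
// Show a string as a list of U+XXXX code points.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// nfd: "ü" decomposes into "u" + COMBINING DIAERESIS.
console.log(codePoints("\u00FC".normalize("NFD")));       // U+0075 U+0308

// nfc: "ü" + COMBINING MACRON recomposes into the single character "ǖ".
console.log(codePoints("\u00FC\u0304".normalize("NFC"))); // U+01D6

// nfkd: the "ﬁ" ligature becomes the two letters "f" and "i".
console.log(codePoints("\uFB01".normalize("NFKD")));      // U+0066 U+0069

// nfkc: "ẛ" (long s with dot above) becomes "ṡ" (s with dot above).
console.log(codePoints("\u1E9B".normalize("NFKC")));      // U+1E61
```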
Operation
Two strings a and b can be said to be canonically equivalent if normalize(a) = normalize(b) under the nfc normalization form. Two strings can be said to be compatible if normalize(a) = normalize(b) under the nfkc normalization form.
The canonical normalization is often used to see if strings are equivalent to each other, and thus is useful when implementing parsing algorithms or exact matching algorithms. It can also be used to ensure that any string output produces a predictable sequence of characters.
Compatibility normalization does not always preserve the semantic meaning of all the characters, although this is sometimes the behaviour that you are after. It is useful, for example, when doing searches of user input against text in documents where the matches are supposed to be "fuzzy". In this case, both the query string and the document string would be mapped to their compatibility normalized forms, and then compared.
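The two equivalence tests and the "fuzzy" search use case described above translate directly into code. This sketch again uses the built-in String.prototype.normalize() rather than NormString, and the helper names are hypothetical.

```javascript
// Canonically equivalent: equal after NFC normalization.
function canonicallyEquivalent(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Compatible: equal after NFKC normalization.
function compatible(a, b) {
  return a.normalize("NFKC") === b.normalize("NFKC");
}

// "Fuzzy" search: normalize both the document and the query to NFKC first.
function fuzzyIncludes(documentText, query) {
  return documentText.normalize("NFKC").includes(query.normalize("NFKC"));
}

// Precomposed "é" versus "e" + COMBINING ACUTE ACCENT.
console.log(canonicallyEquivalent("caf\u00E9", "cafe\u0301")); // true

// The "ﬁ" ligature is compatible with, but not canonically equivalent to, "fi".
console.log(canonicallyEquivalent("\uFB01le", "file")); // false
console.log(compatible("\uFB01le", "file"));            // true

// A plain-ASCII query finds text that was typeset with a ligature.
console.log(fuzzyIncludes("Open the \uFB01le menu", "file")); // true
```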
Compatibility normalization also does not guarantee round-trip conversion to and from legacy character sets, as the normalization is "lossy". It is akin to doing a lower- or upper-case conversion on text: after casing, you cannot tell what case each character was in the original string. It is good for matching and searching, but it is rarely good for output because some distinctions or meanings in the original text have been lost.
Note that W3C normalization for HTML also escapes and unescapes HTML character entities such as "&uuml;" for u with diaeresis. This method does not do such escaping or unescaping. If normalization is required for HTML strings with entities, unescaping should be performed on the string prior to calling this method.
Data
Normalization requires a fair amount of mapping data, much of which you may not need for the characters expected in your texts. It is possible to assemble a copy of ilib that saves space by only including normalization data for those scripts that you expect to encounter in your data.
The normalization data is organized by normalization form and, within each form, by script. To include the normalization data for a particular script with a particular normalization form, use the following call:
NormString.init({
form: "<form>",
script: "<script>"
});
Where <form> is the normalization form ("nfd", "nfc", "nfkd", or "nfkc"), and <script> is the ISO 15924 code for the script you would like to support. Example: to load in the NFC data for Cyrillic, you would use:
NormString.init({
form: "nfc",
script: "Cyrl"
});
Note that because certain normalization forms include others in their algorithm, their data also depends on the data for the other forms. For example, if you include the "nfc" data for a script, you will automatically get the "nfd" data for that same script as well because the NFC algorithm does NFD normalization first. Here are the dependencies:
- NFD -> no dependencies
- NFC -> NFD
- NFKD -> NFD
- NFKC -> NFKD, NFD, NFC
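The dependency list above is transitive: requesting one form pulls in the data of every form it depends on. A small sketch of that closure follows; `FORM_DEPENDENCIES` simply restates the list, and `formsToLoad` is a hypothetical helper, not part of ilib's loader.

```javascript
// Direct dependencies, restated from the list above.
const FORM_DEPENDENCIES = {
  nfd: [],
  nfc: ["nfd"],
  nfkd: ["nfd"],
  nfkc: ["nfkd", "nfd", "nfc"],
};

// Return every form whose data is needed for `form`, including itself.
function formsToLoad(form) {
  const needed = new Set();
  const pending = [form];
  while (pending.length > 0) {
    const current = pending.pop();
    if (needed.has(current)) continue;
    if (!(current in FORM_DEPENDENCIES)) {
      throw new Error("unknown normalization form: " + current);
    }
    needed.add(current);
    pending.push(...FORM_DEPENDENCIES[current]);
  }
  return [...needed].sort();
}

console.log(formsToLoad("nfc"));  // [ 'nfc', 'nfd' ]
console.log(formsToLoad("nfkc")); // [ 'nfc', 'nfd', 'nfkc', 'nfkd' ]
```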
A special value for the script dependency is "all", which will cause the data for all scripts to be loaded for that normalization form. This is useful if you know that you are going to normalize a lot of multilingual text or cannot predict which scripts will appear in the input. Because the NFKC form depends on all others, you can get all of the data for all forms automatically by depending on "nfkc/all.js". Note that the normalization data for practically all scripts automatically depends on data for the Common script (code "Zyyy"), which contains all of the characters that are commonly used in many different scripts. Examples of characters in the Common script are the ASCII punctuation characters, or the ASCII Arabic numerals "0" through "9".
By default, none of the normalization data is automatically included in the preassembled ilib files, including those of size "full". If you would like to normalize strings, you must assemble your own copy of ilib and explicitly include the normalization data for those scripts. This normalization method will still produce output without the normalization data; however, the output will simply be the same as its input for all scripts except Korean Hangul and Jamo, which are decomposed and recomposed algorithmically and therefore do not rely on data.
If characters are encountered for which there are no normalization data, they will be passed through to the output string unmodified.
Parameters:
Name | Type | Description |
---|---|---|
form | string | the normalization form requested |
- Overrides:
Returns: a new instance of an IString that has been normalized according to the requested form. The current instance is not modified.
# padEnd() → {string}
Same as String.padEnd().
- Inherited From:
Returns: a string of the specified length with the pad string applied at the end of the current string
Type: string
# padStart() → {string}
Same as String.padStart().
- Inherited From:
Returns: a string of the specified length with the pad string applied at the start of the current string
Type: string
# repeat() → {string}
Same as String.repeat().
- Inherited From:
Returns: a new string containing the specified number of copies of the current string
Type: string
# replace(searchValue, newValue) → {IString}
Same as String.replace()
Parameters:
Name | Type | Description |
---|---|---|
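searchValue | string | a string or regular expression to search for |
newValue | string | the string to replace the matches with |
- Inherited From:
Returns: a new string with the matches replaced with the new value. As with String.replace(), a string searchValue replaces only the first match; use a regular expression with the global (g) flag to replace all matches.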
# search(regexp) → {number}
Same as String.search()
Parameters:
Name | Type | Description |
---|---|---|
regexp | string | the regular expression to search for |
- Inherited From:
Returns: the position of the match, or -1 for no match
Type: number
# setLocale(locale, sync, loadParams, onLoad)
Set the locale to use when processing choice formats. The locale affects how number classes are interpreted. In some cultures, the limit "few" maps to "any integer that ends in the digits 2 to 9" and in yet others, "few" maps to "any integer that ends in the digits 3 or 4".
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
locale | Locale or string |  | locale to use when processing choice formats with this string |
sync | boolean | <optional> | whether to load the locale data synchronously or not |
loadParams | Object | <optional> | parameters to pass to the loader function |
onLoad | function | <optional> | function to call when the loading is done |
- Inherited From:
# slice(start, end) → {IString}
Same as String.slice()
Parameters:
Name | Type | Description |
---|---|---|
start | number | index of the first character to include in the slice |
end | number | index of the end character; all characters up to, but not including, the end character are included |
- Inherited From:
Returns: a slice of the current string
# split(separator, limit) → {Array.<string>}
Same as String.split()
Parameters:
Name | Type | Description |
---|---|---|
separator | string | regular expression to match to find separations between the parts of the text |
limit | number | maximum number of items in the final output array. Any items beyond that limit will be ignored. |
- Inherited From:
Returns: the parts of the current string split by the separator
Type: Array.<string>
# startsWith() → {boolean}
Same as String.startsWith().
- Inherited From:
Returns: true if the given characters are found at the beginning of the string, and false otherwise
Type: boolean
# substr(start, length) → {IString}
Same as String.substr()
Parameters:
Name | Type | Description |
---|---|---|
start | number | the index of the character that should begin the returned substring |
length | number | the number of characters to return, counting the start character |
- Inherited From:
Returns: the requested substring
# substring(from, to) → {IString}
Same as String.substring()
Parameters:
Name | Type | Description |
---|---|---|
from | number | the index of the character that should begin the returned substring |
to | number | the index where to stop the extraction. If omitted, extracts the rest of the string |
- Inherited From:
Returns: the requested substring
# toLocaleLowerCase() → {string}
Same as String.toLocaleLowerCase(). If the JS engine does not support this method, you can use the ilib CaseMapper class instead.
- Inherited From:
Returns: a new string representing the calling string converted to lower case, according to any locale-sensitive case mappings
Type: string
# toLocaleUpperCase() → {string}
Same as String.toLocaleUpperCase(). If the JS engine does not support this method, you can use the ilib CaseMapper class instead.
- Inherited From:
Returns: a new string representing the calling string converted to upper case, according to any locale-sensitive case mappings
Type: string
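The difference between the locale-insensitive and locale-sensitive case mappings is easiest to see with Turkish, where lower-case "i" has a dotted upper-case form and upper-case "I" has a dotless lower-case form. The sketch uses the native String methods that these IString methods mirror, and assumes a JavaScript runtime with full ICU locale data (for example current Node.js releases).

```javascript
const city = "istanbul";

// Locale-insensitive mapping: "i" always becomes "I".
console.log(city.toUpperCase()); // ISTANBUL

// Turkish mapping: "i" becomes "İ" (U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE).
console.log(city.toLocaleUpperCase("tr-TR")); // İSTANBUL

// The reverse direction: Turkish maps "I" to dotless "ı" (U+0131).
console.log("I".toLocaleLowerCase("tr-TR") === "\u0131"); // true
```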
# toLowerCase() → {IString}
Same as String.toLowerCase(). Note that this method is not locale-sensitive.
- Inherited From:
Returns: a new string with all characters converted to lower case
# toString() → {string}
Same as String.toString()
- Inherited From:
Returns: this instance as a regular Javascript string
Type: string
# toUpperCase() → {IString}
Same as String.toUpperCase(). Note that this method is not locale-sensitive. Use toLocaleUpperCase() instead to get locale-sensitive behaviour.
- Inherited From:
Returns: a new string with all characters converted to upper case
# trim() → {string}
Same as String.trim().
- Inherited From:
Returns: a new string representing the calling string stripped of whitespace from both ends
Type: string
# trimEnd() → {string}
Same as String.trimEnd().
- Inherited From:
Returns: a new string representing the calling string stripped of whitespace from its (right) end
Type: string
# trimLeft() → {string}
Same as String.trimLeft().
- Inherited From:
Returns: a new string representing the calling string stripped of whitespace from its beginning (left end)
Type: string
# trimRight() → {string}
Same as String.trimRight().
- Inherited From:
Returns: a new string representing the calling string stripped of whitespace from its (right) end
Type: string
# trimStart() → {string}
Same as String.trimStart().
- Inherited From:
Returns: a new string representing the calling string stripped of whitespace from its beginning (left end)
Type: string
# truncate(length) → {string}
Truncate the current string at the given number of whole glyphs and return the resulting string.
Parameters:
Name | Type | Description |
---|---|---|
length | number | the number of whole glyphs to keep in the string |
- Inherited From:
Returns: a string truncated to the requested number of glyphs
Type: string
# valueOf() → {string}
Same as String.valueOf()
- Inherited From:
Returns: this instance as a regular Javascript string
Type: string
# static init(options)
Initialize the normalized string routines statically. This is intended to be called in a dynamic-load version of ilib to load the data needed to normalize strings before any instances of NormString are created.
The options parameter may contain any of the following properties:
- form - {string} the normalization form to load
- script - {string} load the normalization for this script. If the script is given as "all" then the normalization data for all scripts is loaded at the same time
- sync - {boolean} whether to load the files synchronously or not
- loadParams - {Object} parameters to the loader function
- onLoad - {function()} a function to call when the files are done being loaded
Parameters:
Name | Type | Description |
---|---|---|
options | Object | an object containing properties that govern how to initialize the data |