Grapheme
Description
A <Grapheme> in VAST represents a user-perceived character and is the basic unit of a VAST <UnicodeString>.
Only a few programming languages with Unicode support treat a character as the written expression a user might see on a screen, rather than as a digital code point. While the visual expression is called a Glyph, the digital expression of this concept is called a "grapheme cluster".
In VAST, we call this a <Grapheme>. The <Grapheme> is logically composed of one or more <UnicodeScalar>s. It is identified using extended grapheme cluster boundary algorithms from Text Segmentation in the Unicode Standard.
The VAST <Grapheme> abstracts the details of how to group enough Unicode scalars together to form what we would think of as a character on the screen. It also abstracts the details of normalization. Normalization matters because Unicode allows multiple binary representations of what is really the same string or character. If these differences are left unhandled, operations like comparison and hashing give incorrect results.
<Grapheme> handles these details transparently. It detects and ensures a common normalized form for the operations where these differences would matter, so the user can focus on programming rather than on which normalization form a string is in.
<Grapheme> is the best Unicode analog to the standard Smalltalk <Character> class. A ü always maps to one <Grapheme>, even though, as described, it may be logically composed of 1 or 2 Unicode scalars. The original form is maintained for various reasons: a <Grapheme>'s internals are not implicitly converted from one normalization form to another.
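The ü case above can be sketched as follows. This is a minimal illustration rather than a definitive test: it uses only selectors that appear elsewhere in this document, and it assumes that #contents on the scalar view answers a collection that responds to #size, and that #= and #hash apply the same normalization that #compareTo: documents.

```smalltalk
| composed decomposed |
"U+00FC LATIN SMALL LETTER U WITH DIAERESIS as a single scalar"
composed := 16rFC asGrapheme.
"U+0075 LATIN SMALL LETTER U followed by U+0308 COMBINING DIAERESIS"
decomposed := Grapheme value: #(16r75 16r308).

"Each is one grapheme, but the scalar counts differ (assumes #contents answers a sized collection)"
self assert: [composed unicodeScalars contents size = 1].
self assert: [decomposed unicodeScalars contents size = 2].

"Comparison brings both sides to a common normalization form first"
self assert: [(composed compareTo: decomposed) = 0].

"Assumed to follow from the transparent normalization handling described above"
self assert: [composed = decomposed].
self assert: [composed hash = decomposed hash].

"Explicit normalization answers the composed form"
self assert: [decomposed asNFC unicodeScalars first value = 16rFC].
```

Because the original form is kept, the decomposed grapheme still reports two scalars until #asNFC is requested explicitly.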
A <Grapheme> is also able to describe many of the same properties from the Unicode Character Database that a <UnicodeScalar> can.
Class State
  • asciiGraphemes: <Array> of <Grapheme> objects. The first 128 code points in Unicode exactly match ASCII. Because of their frequency of use, the virtual machine will refer to objects in this grapheme cache to reduce object allocation and increase performance.
  • crlf: <Grapheme>. crlf is the only grapheme in the ASCII range that is composed of 2 code points (Cr Lf) instead of 1. It is cached in its own variable slot.
Creation
A <Grapheme> is typically created automatically while cursoring through the #graphemes view of a <UnicodeString>, <String> or <ByteArray> object. It is also created by the normal iteration methods of a <UnicodeString>, since its elements are <Grapheme>s.
| firstGrapheme |
firstGrapheme := 'abc' asUnicodeString graphemes next.
self assert: [firstGrapheme = $a].
However, you can manually create a <Grapheme> by using the APIs from the Creation categories on the class side. Additionally, there are #asGrapheme extension methods provided in the system. Below are some examples of both.
"From Integer, which is interpreted as the unicode code point value"
self assert: [97 asGrapheme = (Grapheme value: 97)].

"From Character, which performs any necessary code page conversion
if the value is > 7-bit ascii range"
self assert: [97 asCharacter asGrapheme = (Grapheme value: 97)].

"From UTF-8 bytes"
self assert: [(Grapheme utf8: #[97]) = $a].

"From UTF-16 bytes (platform/little/big-endian)"
self assert: [(Grapheme utf16LE: #[97 0]) = (Grapheme utf16BE: #[0 97])].
self assert: [(Grapheme utf16LE: #[97 0]) = $a].

"From UTF-32 bytes (platform/little/big-endian)"
self assert: [(Grapheme utf32LE: #[97 0 0 0]) = (Grapheme utf32BE: #[0 0 0 97])].
self assert: [(Grapheme utf32LE: #[97 0 0 0]) = $a].

"The #value: API can accept an <Integer> or a <Character>"
self assert: [(Grapheme value: 97) = $a].
self assert: [(Grapheme value: $a) = $a].

"The #value: API can also accept a <String> or <UnicodeString> describing the grapheme in escaped syntax"
"See Grapheme class>>value: method comments for a complete list of escapes"
self assert: [(Grapheme value: '\r\n') = Grapheme crlf].
self assert: [
(Grapheme value: '\u{1F62E}\u{200D}\u{1F4A8}') name = 'FACE WITH OPEN MOUTH,ZERO WIDTH JOINER,DASH SYMBOL'].

"Factory methods for commonly used graphemes"
self assert: [Grapheme cr = Character cr].
self assert: [Grapheme lf = Character lf].
self assert: [Grapheme crlf = String crlf graphemes first].

"Special replacement character, which is used anywhere unicode content must be repaired"
self assert: [(UnicodeString utf8: #[255] repair: true) graphemes first = Grapheme replacementCharacter]
Properties
A <Grapheme> has many properties that are defined by the Unicode Standard. These can be found in the Properties category on the instance side.
  • #isAscii - Boolean indicating if the grapheme is in ASCII range
  • #isAsciiAlphabetic - Boolean indicating if the grapheme is an ASCII alphabetic char
  • #isAsciiAlphaNumeric - Boolean indicating if the grapheme is an ASCII alphabetic or numeric char
  • #isAsciiControl - Boolean indicating if the grapheme is an ASCII control char
  • #isAsciiDigit - Boolean indicating if the grapheme is an ASCII digit
  • #isAsciiGraphic - Boolean indicating if the grapheme is an ASCII graphic character
  • #isAsciiHexDigit - Boolean indicating if the grapheme is an ASCII hex digit
  • #isAsciiLowercase - Boolean indicating if the grapheme is an ASCII lowercase char
  • #isAsciiPunctuation - Boolean indicating if the grapheme is an ASCII punctuation char
  • #isAsciiUppercase - Boolean indicating if the grapheme is an ASCII uppercase char
  • #isAsciiWhitespace - Boolean indicating if the grapheme is an ASCII whitespace char
  • #isCased - Boolean indicating whether the grapheme is either lowercase, uppercase, or titlecase.
  • #isLetter - Boolean indicating whether the grapheme is a letter.
  • #isLowercase - Boolean indicating if the grapheme is considered lowercase
  • #isNewline - Boolean indicating if the grapheme represents a newline
  • #isNumeric - Boolean indicating whether the grapheme has the general category for numbers
  • #isUppercase - Boolean indicating if the grapheme is considered uppercase
  • #isWhitespace - Boolean indicating if the grapheme represents whitespace, including newlines
  • #name - Human readable name of the grapheme, a concatenation of all its unicode scalar names
  • #names - Collection of the grapheme's unicode scalar names
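A few of these properties in use. The assertions below are collected from the example comments of the corresponding instance methods and the #value: class method, so they are a sketch of typical usage rather than an exhaustive test.

```smalltalk
"ASCII range checks"
self assert: [$A asGrapheme isAscii].
self assert: [233 asGrapheme isAscii not].
self assert: [Grapheme cr isAsciiControl].
self assert: [Grapheme space isAsciiControl not].
self assert: [Grapheme space isAsciiWhitespace].

"General Unicode properties"
self assert: [$a asGrapheme isCased].
self assert: [Grapheme space isCased not].
self assert: [$A asGrapheme isLetter].
self assert: [$5 asGrapheme isLetter not].

"#name answers the scalar name(s) of the grapheme"
self assert: [(Grapheme value: '\u{1F600}') name = 'GRINNING FACE'].
```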
Views
The following views are available for a <Grapheme>. These can be found in the Views category on the instance side.
  • #unicodeScalars - Unicode scalar view of the grapheme
  • #utf8 - UTF-8 encoded view of the grapheme
  • #utf16 - UTF-16 platform-endian encoded view of the grapheme
  • #utf16LE - UTF-16 little-endian encoded view of the grapheme
  • #utf16BE - UTF-16 big-endian encoded view of the grapheme
  • #utf32 - UTF-32 platform-endian encoded view of the grapheme
  • #utf32LE - UTF-32 little-endian encoded view of the grapheme
  • #utf32BE - UTF-32 big-endian encoded view of the grapheme
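The views can be sketched on a grapheme made of two scalars. This follows the patterns used in the Creation and Conversion examples; the byte values are worked out by hand from the UTF-16 encoding of U+0065 and U+0301, and the sketch assumes the grapheme keeps its original (decomposed) form as described above.

```smalltalk
| g |
"LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT"
g := Grapheme value: #(16r65 16r301).

"Unicode scalar view"
self assert: [g unicodeScalars first = 16r65 asUnicodeScalar].

"UTF-32 platform-endian view"
self assert: [g utf32 contents = (Utf32 with: 16r65 with: 16r301)].

"Explicit-endian UTF-16 views, converted to bytes"
self assert: [g utf16LE asByteArray = #[101 0 1 3]].
self assert: [g utf16BE asByteArray = #[0 101 3 1]].
```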
Equality/Comparison
A <Grapheme> should be compared to other objects using equality, not identity. The first 128 graphemes in Unicode (i.e., ASCII) are cached in the asciiGraphemes class instance variable, and the crlf grapheme is cached in the crlf class instance variable. Because of this, Grapheme crlf or any grapheme in the range [0, 127] will compare by identity, but this is not a property that code should rely on.
"Yes, this is true here..."
self assert: [97 asGrapheme == 97 asGrapheme].

"...but don't ever count on it being true everywhere"
self assert: [300 asGrapheme ~~ 300 asGrapheme].

"Equality/Compare"
self assert: [97 asGrapheme = 97 asGrapheme].
self assert: [97 asGrapheme < 98 asGrapheme].
self assert: [97 asGrapheme <= 97 asGrapheme].
self assert: [98 asGrapheme >= 97 asGrapheme].
self assert: [98 asGrapheme > 97 asGrapheme].

"Special compare method for [-1, 0, 1]"
self assert: [(97 asGrapheme compareTo: 98 asGrapheme) = -1].
self assert: [(97 asGrapheme compareTo: 97 asGrapheme) = 0].
self assert: [(98 asGrapheme compareTo: 97 asGrapheme) = 1].
Conversion
Unicode Component
A <Grapheme> can convert itself to a <UnicodeString>.
self assert: [Grapheme cr asUnicodeString = UnicodeString cr].
UTF Encoding
A <Grapheme> can convert itself to any of the UTF encodings. Platform-endian accessors are provided in the Conversion category on the instance side, but any endian encoding can be accomplished with views.
"Platform-endian"
self assert: [Grapheme cr asUtf8 asByteArray = #[13]].
self assert: [Grapheme cr asUtf16 = (Utf16 with: 13)].
self assert: [Grapheme cr asUtf32 = (Utf32 with: 13)].

"Little/Big endian via views"
self assert: [Grapheme cr utf16LE asByteArray = #[13 0]].
self assert: [Grapheme cr utf16BE asByteArray = #[0 13]].
self assert: [Grapheme cr utf32LE asByteArray = #[13 0 0 0]].
self assert: [Grapheme cr utf32BE asByteArray = #[0 0 0 13]].
Case Mapping
A <Grapheme> can convert itself to its uppercase, lowercase and titlecase form. The result of these conversions is not a <Grapheme>, but rather a <UnicodeString>. Case mapping can change the number of unicode scalars. Depending on how they combine together, this may change the number of <Grapheme>s.
Uppercase
Here is an example that shows why case mapping answers a <UnicodeString>. Consider calling #asUppercase on the German sharp S (16rDF asGrapheme). When it is uppercased, it becomes two graphemes: 16r53 (LATIN CAPITAL LETTER S) and 16r53 (LATIN CAPITAL LETTER S).
self assert: [16rDF asGrapheme asUppercase = 'SS' asUnicodeString]
Lowercase
This example also appears in the class documentation for <UnicodeScalar>, where lowercasing 16r0130 asUnicodeScalar (LATIN CAPITAL LETTER I WITH DOT ABOVE) produces two unicode scalars. However, these two unicode scalars form one user-perceived character, or <Grapheme>. If this lowercase form were rendered as a Glyph on-screen, the user would typically see a small letter i with a combining dot above.
self assert: [16r0130 asGrapheme asLowercase = (Grapheme value: #(16r0069 16r0307)) asUnicodeString].
Titlecase
Several unicode characters require special handling when they are used as the initial "character" in a word. One example is 16rFB01 asGrapheme (LATIN SMALL LIGATURE FI). When it is titlecased, it becomes two graphemes: 16r46 (LATIN CAPITAL LETTER F) and 16r69 (LATIN SMALL LETTER I).
self assert: [16rFB01 asGrapheme asTitlecase = 'Fi' asUnicodeString]
Character Compatibility
A <Grapheme> can be directly compared with a <Character> object. This is possible because the Smalltalk primitives that implement unicode have general awareness of <Character> and try to work with them where possible.
This compatibility is carried out in three different ways.
  • Primitives: The virtual machine primitives can quickly detect and convert a <Character> if it is in the 7-bit ASCII range. A character outside this range represents a value from a particular code page. There are many code pages, and they can differ wildly in the upper 128 values. Because code page conversion is required in this case, a primitive failure is triggered.
  • Primitive Fail Handlers: The primitive failure handlers in Smalltalk detect whether the argument was a <Character>, code page convert the character to a <Grapheme>, and try the primitive call again (this time with a grapheme argument).
  • Smalltalk Methods: Compatibility methods are provided in the Compatibility category of methods.
Important Note
  • Compatibility relationship is uni-directional. A <Character> does not have direct knowledge of <Grapheme>.
  • A <Grapheme> is NOT an immediate object like <Character>, so it is not good practice to compare using identity (==).
Here are the various ways that a <Grapheme> and a <Character> can work together.
"="
self assert: [97 asGrapheme = 97 asCharacter].

"<"
self assert: [97 asGrapheme < 98 asCharacter].

"<="
self assert: [97 asGrapheme <= 97 asCharacter].

">"
self assert: [98 asGrapheme > 97 asCharacter].

">="
self assert: [98 asGrapheme >= 97 asCharacter].

"hash"
self assert: [97 asGrapheme hash = 97 asCharacter hash].

"Only guarantee that 7-bit ascii will hash and = the same
for Character"
| s c |
s := 97 asGrapheme.
c := 97 asCharacter.
self assert: [s hash = c hash].
self assert: [s = c].

s := 159 asGrapheme. "159 unicode code point"
c := 159 asCharacter. "159 double-byte char value"
self assert: [s hash ~= c hash].
self assert: [s ~= c].
Class Methods
<details> backspace
   Answer the grapheme for a backspace.
   
   Answers:
    <Grapheme>
</details>
<details> codePoint:
   Create a new extended grapheme cluster by converting @anInteger
   to an extended grapheme cluster (EGC).  @anInteger is considered
   to be a unicode scalar value.
   
   Examples:
    #'From unicode scalar value'.
    self assert: [(Grapheme codePoint: 16r1F600) value = 16r1F600].
  
   Arguments:
     anInteger -  <Integer> Unicode scalar value
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if anInteger cannot be converted to a Grapheme
</details>
<details> cr
<pre><code>   Answer the grapheme containing a carriage return.

   Answers:
    <Grapheme> </code></pre> </details>
<details> crlf
<pre><code>   Answer the grapheme containing a carriage return and a linefeed.
   @Note - In Graphemes (digital representation of a user-perceived character), the
          crlf is represented as 1 grapheme

   Answers:
    <Grapheme> </code></pre> </details>
<details> escape
   Answer the grapheme for an escape.
   
   Answers:
    <Grapheme>
</details>
<details> escaped:
   Create a new extended grapheme cluster by converting @aStringObject
   to an extended grapheme cluster (EGC).
   
   Escaped Strings:
   If @aStringObject is a String or UnicodeString, then the following escapes
   will be parsed to create an extended grapheme cluster:
    Escapes:
      \x53        7-bit character code (exactly 2 digits, up to 0x7F)
      \u{1F600}    24-bit Unicode character code (up to 6 digits)
      \n          Newline (This is the Lf character)
      \r          Carriage return (This is the Cr character)
      \t          Tab
      \\          Backslash
      \0          Nul
   
   Examples:
    #'From a single element string object'.
    self assert: [(Grapheme escaped: '\x53') = $S asGrapheme].
    self assert: [(Grapheme escaped: '\u{1F600}') name = 'GRINNING FACE'].
    self assert: [(Grapheme escaped: '\t') = Grapheme tab].
    self assert: [(Grapheme escaped: '\r\n') = Grapheme crlf].
  
   Arguments:
     aStringObject -  <UnicodeString> unicode string containing escape characters
                  (Compat) <String> Smalltalk code-page string containing escape characters (requires conversion if outside ascii range)
              
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if aStringObject cannot be converted to a Grapheme
</details>
<details> lf
<pre><code>   Answer the grapheme containing a line feed.

   Answers:
    <Grapheme> </code></pre> </details>
<details> newPage
   Answer the grapheme for a new page.
   
   Answers:
    <Grapheme>
</details>
<details> replacementCharacter
   Answer the grapheme for the unicode replacement character.
   
   Answers:
    <Grapheme>
</details>
<details> space
   Answer the grapheme for space
   
   Answers:
    <Grapheme>
</details>
<details> tab
   Answer the grapheme for tab
   
   Answers:
    <Grapheme>
</details>
<details> utf16:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-16 platform-endian encoded data.
   @aByteObject is validated before any attempt is made to create the grapheme from its data.

   Examples:
    self assert: [(Grapheme utf16: 'a' utf16 contents) = $a].

   Arguments:
    aByteObject - <String | ByteArray> or byte shaped object
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-16 </code></pre> </details>
<details> utf16BE:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-16 big-endian encoded data.

   Examples:
    self assert: [(Grapheme utf16BE: #[0 97]) = $a].

   Arguments:
    aByteObject - <String | ByteArray>
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-16BE </code></pre> </details>
<details> utf16LE:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-16 little-endian encoded data.

   Examples:
    self assert: [(Grapheme utf16LE: #[97 0]) = $a].

   Arguments:
    aByteObject - <String | ByteArray>
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-16LE </code></pre> </details>
<details> utf32:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-32 platform-endian encoded data.
   @aByteObject is validated before any attempt is made to create the grapheme from its data.

   Examples:
    self assert: [(Grapheme utf32: 'a' utf32 contents) = $a].

   Arguments:
    aByteObject - <String | ByteArray | Utf32> or byte shaped object
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-32 </code></pre> </details>
<details> utf32BE:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-32 big-endian encoded data.

   Examples:
    self assert: [(Grapheme utf32BE: #[0 0 0 97]) = $a].

   Arguments:
    aByteObject - <String | ByteArray | Utf32>
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-32BE </code></pre> </details>
<details> utf32LE:
<pre><code>   Answer the grapheme constructed from @aByteObject which should be UTF-32 little-endian encoded data.

   Examples:
    self assert: [(Grapheme utf32LE: #[97 0 0 0]) = $a].

   Arguments:
    aByteObject - <String | ByteArray | Utf32>
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-32LE </code></pre> </details>
<details> utf8:
<pre><code>   Answer the extended grapheme cluster constructed from @aByteObject which should be UTF-8 encoded data.
   @aByteObject is validated before any attempt is made to create the grapheme from it.

   Examples:
    self assert: [(Grapheme utf8: #[97]) = $a]

   Arguments:
    aByteObject - <String | ByteArray>
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if invalid utf-8 </code></pre> </details>
<details> value:
   Create a new extended grapheme cluster by converting @anObject
   to an extended grapheme cluster (EGC).
   
   Escaped Strings:
   If @anObject is a String or UnicodeString, then the following escapes
   will be parsed to create an extended grapheme cluster:
    Escapes:
      \x53        7-bit character code (exactly 2 digits, up to 0x7F)
      \u{1F600}    24-bit Unicode character code (up to 6 digits)
      \n          Newline (This is the Lf character)
      \r          Carriage return (This is the Cr character)
      \t          Tab
      \\          Backslash
      \0          Nul
   
   Examples:
    #'From unicode scalar value'.
    self assert: [(Grapheme value: 16r1F600) value = 16r1F600].
    
    #'From unicode scalar object'.
    self assert: [(Grapheme value: 16r1F600 asUnicodeScalar) unicodeScalars first = 16r1F600 asUnicodeScalar].
    
    #'From array of unicode scalar object/values'.
    self assert: [(Grapheme value: { 16r65. 16r301 asUnicodeScalar }) utf32 contents = (Utf32 with: 16r65 with: 16r301)].
    
    #'From a Character object'.
    self assert: [(Grapheme value: $a) asciiValue = $a value].
    
    #'From a single element string object'.
    self assert: [(Grapheme value: '\x53') = $S asGrapheme].
    self assert: [(Grapheme value: '\u{1F600}') name = 'GRINNING FACE'].
    self assert: [(Grapheme value: '\t') = Grapheme tab].
    self assert: [(Grapheme value: '\r\n') = Grapheme crlf].
  
   Arguments:
     anObject -  <Integer> Unicode code point
              <UnicodeScalar> unicode scalar
              <Array> of <Integer | UnicodeScalar> array of unicode scalars
              <UnicodeString> unicode string containing escape characters
              (Compat) <Character> Smalltalk code-page character  (requires conversion if outside ascii range)
              (Compat) <String> Smalltalk code-page string containing escape characters (requires conversion if outside ascii range)
              
   Answers:
    <Grapheme>
   Raises:
    <Exception> EsPrimErrValueOutOfRange if anObject cannot be converted to a Grapheme
</details>
Instance Methods
<details> <
<pre><code>   Answer a Boolean indicating true if the receiver is less
   than aGrapheme; answer false otherwise.

   Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean> </code></pre> </details>
<details> <=
   Answer a Boolean indicating true if the receiver is less than or equal
   to aGrapheme; answer false otherwise
   
   Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean>
</details>
<details> =
<pre><code>   Answer a Boolean indicating true if the receiver is equal
   to aGrapheme; answer false otherwise.

   Examples:
    self assert: [Grapheme cr = Grapheme cr].
    self assert: [Grapheme cr = Character cr].

   Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean> </code></pre> </details>
<details> >
   Answer a Boolean indicating true if the receiver is greater
   than aGrapheme; answer false otherwise
   
   Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean>
</details>
<details> >=
   Answer a Boolean indicating true if the receiver is greater
   than or equal to aGrapheme; answer false otherwise
   
   Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean>
</details>
<details> asciiValue
<pre><code>   Answer the ASCII encoding value of this grapheme, if it is ASCII.
   '\r\n' will be normalized to \n

   Answers:
    <Integer> </code></pre> </details>
<details> asGrapheme
   Answer self
   
   Answers:
    <Grapheme>
</details>
<details> asInteger
   Answer an Integer representing the numeric value of the
   receiver.
</details>
<details> asLowercase
<pre><code>   Answers a lowercased version of this grapheme.

   Case conversion can result in multiple scalars or graphemes,
   therefore the result must be expressed as a UnicodeString.
   For example, the character 'İ' (16r0130 asUnicodeScalar LATIN CAPITAL LETTER I WITH DOT ABOVE)
   becomes two scalars (16r0069 LATIN SMALL LETTER I, 16r0307 COMBINING DOT ABOVE)
   when converted to lowercase (but still a single grapheme).

   Examples:
    self assert: [$A asGrapheme asLowercase = 'a' asUnicodeString].
    self assert: [16r0130 asGrapheme asLowercase = (UnicodeString value: { 16r0069 asUnicodeScalar. 16r0307 asUnicodeScalar. })].

   Answers:
    <UnicodeString> </code></pre> </details>
<details> asNFC
   Answer an NFC normalized copy of this grapheme.
   If the grapheme is already normalized, then answer the receiver.
   Otherwise, answer a new grapheme.
   
   Examples:
    'LATIN SMALL LETTER E, COMBINING ACUTE ACCENT -> LATIN SMALL LETTER E WITH ACUTE'.
    self assert: [(Grapheme value: #(16r65 16r301)) asNFC unicodeScalars first value = 16rE9]
   
   Answers:
    <Grapheme>
</details>
<details> asNFD
<pre><code>   Answer an NFD normalized copy of this grapheme.
   If the grapheme is already normalized, then answer the receiver.
   Otherwise, answer a new grapheme.

   Examples:
    'LATIN SMALL LETTER E WITH ACUTE -> LATIN SMALL LETTER E, COMBINING ACUTE ACCENT'.
    self assert: [16rE9 asGrapheme asNFD unicodeScalars contents = { 16r65 asUnicodeScalar. 16r301 asUnicodeScalar}]

   Answers:
    <Grapheme> </code></pre> </details>
<details> asNFKC
   Answer an NFKC normalized copy of this grapheme.
   If the grapheme is already normalized, then answer the receiver.
   Otherwise, answer a new grapheme.
   
   Examples:
    'SUPERSCRIPT TWO -> DIGIT TWO'.
    self assert: [16rB2 asGrapheme asNFKC = 16r32 asGrapheme]
    
   Answers:
    <Grapheme>
</details>
<details> asNFKD
<pre><code>   Answer an NFKD normalized copy of this grapheme.
   If the grapheme is already normalized, then answer the receiver.
   Otherwise, answer a new grapheme.

   Examples:
    'SUPERSCRIPT TWO -> DIGIT TWO'.
    self assert: [16rB2 asGrapheme asNFKD = 16r32 asGrapheme]

   Answers:
    <Grapheme> </code></pre> </details>
<details> asString
<pre><code>   Answer the receiver as a <UnicodeString> instance.

   NOTE: Regardless of the selector, this method returns a UnicodeString
   instead of a String, to ease the interplay between Graphemes and UnicodeStrings.

   If you want a single byte string you can use #asSBString or #asUtf8.

   Answers:
    <UnicodeString> </code></pre> </details>
<details> asTitlecase
<pre><code>   Answers a titlecased version of this grapheme.

   Case conversion can result in multiple scalars or graphemes,
   therefore the result must be expressed as a UnicodeString.
   For example, the ligature 'ﬁ' (16rFB01 LATIN SMALL LIGATURE FI)
   becomes 'Fi' (16r0046 LATIN CAPITAL LETTER F, 16r0069 LATIN SMALL LETTER I)
   when converted to titlecase.

   Examples:
    self assert: [$a asGrapheme asTitlecase = 'A' asUnicodeString].
    self assert: [16rFB01 asGrapheme asTitlecase = (UnicodeString value: { 16r46 asUnicodeScalar. 16r69 asUnicodeScalar. })].

   Answers:
    <UnicodeString> </code></pre> </details>
<details> asUnicodeString
<pre><code>   Answer the receiver as a <UnicodeString> instance.

   Answers:
    <UnicodeString> </code></pre> </details>
<details> asUppercase
<pre><code>   Answers an uppercased version of this grapheme.

   Case conversion can result in multiple scalars or graphemes,
   therefore the result must be expressed as a UnicodeString.
   For example, the German letter 'ß' becomes 'SS' when converted
   to uppercase.

   Examples:
    self assert: [$a asGrapheme asUppercase = 'A' asUnicodeString].
    self assert: [16rDF asGrapheme asUppercase = (UnicodeString value: { 16r53 asUnicodeScalar. 16r53 asUnicodeScalar. })].

   Answers:
    <UnicodeString> </code></pre> </details>
<details> asUtf16
   Answer a <Utf16> that contains the utf-16 encoded bytes of the receiver.
   
   Example:
    self assert: [233 asGrapheme asUtf16 = (Utf16 with: 233)]
    
   Answers:
    <Utf16>
</details>
<details> asUtf32
<pre><code>   Answer a <Utf32> that contains the utf-32 encoded bytes of the receiver.

   Example:
    self assert: [233 asGrapheme asUtf32 = (Utf32 with: 233)]

   Answers:
    <Utf32> </code></pre> </details>
<details> asUtf8
<pre><code>   Answer a <Utf8> that contains the utf-8 encoded bytes of the receiver.

   Example:
    self assert: [233 asGrapheme asUtf8 = (Utf8 with: 195 with: 169)]

   Answers:
    <Utf8> </code></pre> </details>
<details> codePoint
   Compatibility: Extended Grapheme Clusters only have an expressible codePoint
   if it is defined by 1 scalar.
   
   This is for compatibility with Character>>codePoint.
   
   Answers:
    <Integer>
</details>
<details> compareTo:
<pre><code>   Orders the receiver relative to @aGrapheme.

   Both the receiver and @aGrapheme will be guaranteed to have the same normalization
   form before the comparison is made.

   Fail if @aGrapheme is not a convertible <Grapheme> object.

   Examples:
    self assert: [(97 asGrapheme compareTo: 98 asGrapheme) = -1].
    self assert: [(97 asGrapheme compareTo: 97 asGrapheme) = 0].
    self assert: [(98 asGrapheme compareTo: 97 asGrapheme) = 1].

   Arguments:
    aGrapheme - <Grapheme>
   Answers:
    <Integer>   -1   The receiver is less than @aGrapheme
                0    The receiver is equal to @aGrapheme
                1    The receiver is greater than @aGrapheme </code></pre> </details>
<details> digitValue
<pre><code>   Answer an Integer corresponding to the numerical radix of
   the receiver. Return 0-9 if the receiver is $0-$9, and
   10-35 if it is $A-$Z; otherwise return -1.

   NOTE: Since Graphemes might be composed of several scalars,
   answer the digit value only if it is ASCII
   (so it is composed of a single ASCII scalar).

   Answers:
    <Integer> </code></pre> </details>
<details> escaped
<pre><code>   Answer a copy of the receiver that has been escaped using the following
   rules.

   Escaped Strings:
      Tab is escaped as \t
      Carriage return is escaped as \r.
      Line feed is escaped as \n.
      Backslash is escaped as \\
      Any character in the 'printable ASCII' range 16r20 .. 16r7E inclusive is not escaped.
      All other characters are given hexadecimal Unicode escapes \u{NNNNNN} where
        NNNNNN is a hexadecimal uppercase representation

   Example:
    self assert: [Grapheme tab escaped = '\t'].
    self assert: [Grapheme crlf escaped = '\r\n'].
    self assert: [$a asGrapheme escaped = 'a'].
    self assert: [$\ asGrapheme escaped = '\\'].
    self assert: [0 asGrapheme escaped = '\u{0}'].
    self assert: [16r1F37A asGrapheme escaped = '\u{1F37A}'].

   Answers:
    <UnicodeString> </code></pre> </details>
<details> isAscii
<pre><code>   Answers true if the receiver is within the ASCII range, false otherwise

   Examples:
    self assert: [$A asGrapheme isAscii].
    self assert: [233 asGrapheme isAscii not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiAlphabetic
<pre><code>   Answers true if the receiver is an ASCII alphabetic character:
    U+0041 'A' ..= U+005A 'Z', or
    U+0061 'a' ..= U+007A 'z'.

   Examples:
    self assert: [$A asGrapheme isAsciiAlphabetic].
    self assert: [233 asGrapheme isAsciiAlphabetic not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiAlphaNumeric
<pre><code>   Answers true if the receiver is an ASCII alphanumeric character:
    U+0041 'A' ..= U+005A 'Z', or
    U+0061 'a' ..= U+007A 'z', or
    U+0030 '0' ..= U+0039 '9'.

   Examples:
    self assert: [$A asGrapheme isAsciiAlphaNumeric].
    self assert: [$5 asGrapheme isAsciiAlphaNumeric].
    self assert: [233 asGrapheme isAsciiAlphaNumeric not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiControl
<pre><code>   Answers true if the receiver is an ASCII control character:
    U+0000 NUL ..= U+001F UNIT SEPARATOR, or
    U+007F DELETE.

   Note that most ASCII whitespace characters are control characters, but SPACE is not.

   Examples:
    self assert: [Grapheme cr isAsciiControl].
    self assert: [Grapheme space isAsciiControl not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiDigit
<pre><code>   Answers true if the receiver is an ASCII decimal digit:
    U+0030 '0' ..= U+0039 '9'.

   Examples:
    self assert: [$5 asGrapheme isAsciiDigit].
    self assert: [Grapheme space isAsciiDigit not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiGraphic
<pre><code>   Answers true if the receiver is an ASCII graphic character:
    U+0021 '!' ..= U+007E '~'.

   Examples:
    self assert: [$! asGrapheme isAsciiGraphic].
    self assert: [16r9 asGrapheme isAsciiGraphic not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiHexDigit
<pre><code>   Answers true if the receiver is an ASCII hexadecimal digit:
    U+0030 '0' ..= U+0039 '9', or
    U+0041 'A' ..= U+0046 'F', or
    U+0061 'a' ..= U+0066 'f'.

   Examples:
    self assert: [$A asGrapheme isAsciiHexDigit].
    self assert: [$G asGrapheme isAsciiHexDigit not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiLowercase
<pre><code>   Answers true if the receiver is an ASCII lowercase character:
    U+0061 'a' ..= U+007A 'z'.

   Examples:
    self assert: [$a asGrapheme isAsciiLowercase].
    self assert: [$A asGrapheme isAsciiLowercase not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiPunctuation
<pre><code>   Answers true if the receiver is an ASCII punctuation character:
    U+0021 ..= U+002F ! <quote> # $ % & ' ( ) * + , - . /, or
    U+003A ..= U+0040 : ; < = > ? @, or
    U+005B ..= U+0060 [ \ ] ^ _ ` , or
    U+007B ..= U+007E { | } ~

   Examples:
    self assert: [$! asGrapheme isAsciiPunctuation].
    self assert: [$a asGrapheme isAsciiPunctuation not].

   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiUppercase
<pre><code>   Answers true if the receiver is an ASCII uppercase character:
    U+0041 'A' ..= U+005A 'Z'.
   Examples:
    self assert: [$A asGrapheme isAsciiUppercase].
    self assert: [$a asGrapheme isAsciiUppercase not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isAsciiWhitespace
<pre><code>   Answers true if the receiver is an ASCII whitespace character:
    U+0020 SPACE,
    U+0009 HORIZONTAL TAB,
    U+000A LINE FEED,
    U+000C FORM FEED, or
    U+000D CARRIAGE RETURN.
   Note: This uses the WHATWG Infra Standard's definition of ASCII whitespace.
   Examples:
    self assert: [Grapheme space isAsciiWhitespace].
    self assert: [$a asGrapheme isAsciiWhitespace not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isCased
<pre><code>   Answer true if the receiver changes under any form of case conversion,
   false otherwise.
   Examples:
    self assert: [$a asGrapheme isCased].
    self assert: [Grapheme space isCased not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isDigit
<pre><code>   Answer true if the receiver is a valid Smalltalk digit as described in
   the ANSI Smalltalk Standard; otherwise answer false.
   digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
   Read #isSmalltalkDigit for more details.
   Answers:
    <Boolean> </code></pre> </details>
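A brief illustration of the digit rule above, written in the same `self assert:` style as the other entries. It is a sketch: it assumes `Character>>asGrapheme` and `Grapheme space` behave as in the examples elsewhere on this page, and the expected results are read off the ANSI grammar rather than taken from a verified run.

```smalltalk
"Sketch only: '0'..'9' are the ANSI Smalltalk digits"
self assert: [$5 asGrapheme isDigit].
"A letter is outside the digit production"
self assert: [$a asGrapheme isDigit not].
"SPACE is not a digit either"
self assert: [Grapheme space isDigit not]
```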
<details> isGrapheme
   Answer true, since the receiver is a Unicode grapheme object.
   
   Answers:
    <Boolean>
</details>
<details> isLetter
<pre><code>   Answers true if the receiver represents a letter, false otherwise.
   Examples:
    self assert: [$A asGrapheme isLetter].
    self assert: [$5 asGrapheme isLetter not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isLowercase
<pre><code>   Answer true if the receiver is considered lowercase.
   Lowercase characters change when converted to uppercase, but not
   when converted to lowercase. The following characters are all lowercase:
   - 'é' (16r0065 LATIN SMALL LETTER E, 16r0301 COMBINING ACUTE ACCENT)
   - 'и' (16r0438 CYRILLIC SMALL LETTER I)
   - 'π' (16r03C0 GREEK SMALL LETTER PI)
   Examples:
    self assert: [16rE2 asGrapheme name = 'LATIN SMALL LETTER A WITH CIRCUMFLEX'].
    self assert: [16rE2 asGrapheme isLowercase].
    self assert: [16rC5 asGrapheme name = 'LATIN CAPITAL LETTER A WITH RING ABOVE'].
    self assert: [16rC5 asGrapheme isLowercase not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isNewline
   Answers true if the receiver represents a newline.
   
   Examples:
    self assert: [Grapheme cr isNewline].
    'LINE SEPARATOR'.
    self assert: [16r2028 asGrapheme isNewline].
    self assert: [Grapheme crlf isNewline].
    
   Answers:
    <Boolean>
</details>
<details> isNFC
<pre><code>   Answer true if the receiver is NFC normalized, false otherwise.
   Examples:
    self assert: [16rE9 asGrapheme isNFC].
    self assert: [16rE9 asGrapheme asNFD isNFC not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isNFD
<pre><code>   Answer true if the receiver is NFD normalized, false otherwise.
   Examples:
    self assert: [16rE9 asGrapheme isNFD not].
    self assert: [16rE9 asGrapheme asNFD isNFD].
   Answers:
    <Boolean> </code></pre> </details>
<details> isNFKC
<pre><code>   Answer true if the receiver is NFKC normalized, false otherwise.
   Examples:
    'SUPERSCRIPT TWO'.
    self assert: [16rB2 asGrapheme isNFKC not].
    'DIGIT TWO'.
    self assert: [16r32 asGrapheme isNFKC].
   Answers:
    <Boolean> </code></pre> </details>
<details> isNFKD
<pre><code>   Answer true if the receiver is NFKD normalized, false otherwise.
   Examples:
    'SUPERSCRIPT TWO'.
    self assert: [16rB2 asGrapheme isNFKD not].
    'DIGIT TWO'.
    self assert: [16r32 asGrapheme isNFKD].
   Answers:
    <Boolean> </code></pre> </details>
<details> isNumeric
<pre><code>   Answers true if the receiver has one of the general categories for numbers, false otherwise.
   Examples:
    self assert: [$3 asGrapheme isNumeric].
    self assert: [16r2070 asGrapheme name = 'SUPERSCRIPT ZERO'].
    self assert: [16r2070 asGrapheme isNumeric].
    self assert: [16r1F40 asGrapheme name = 'GREEK SMALL LETTER OMICRON WITH PSILI'].
    self assert: [16r1F40 asGrapheme isNumeric not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isSeparator
   Compatibility: Captures a superset of Character>>isSeparator
</details>
<details> isSmalltalkAlphaNumeric
   Answer true if the receiver is a valid Smalltalk letter or digit, false otherwise.
   
   Answers:
    <Boolean>
</details>
<details> isSmalltalkDigit
   Read #isDigit for more details
   
   Answers:
    <Boolean>
</details>
<details> isSmalltalkLetter
<pre><code>   Answer true if the receiver is a valid Smalltalk letter as described in the ANSI Smalltalk Standard; otherwise answer false.
   letter ::= uppercaseAlphabetic | lowercaseAlphabetic | nonCaseLetter
   uppercaseAlphabetic ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
   lowercaseAlphabetic ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
   nonCaseLetter ::= '_'
   It would be simpler to send #isLetter, but some locales classify characters as letters
   that are not valid Smalltalk syntactic letters. The nonCaseLetter must also be allowed. </code></pre> </details>
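The difference from #isLetter can be sketched as follows. The expected results are read off the grammar above and the description of #isLetter, not taken from a verified run, so treat them as assumptions.

```smalltalk
"Sketch only: underscore is the nonCaseLetter, so it counts as a Smalltalk letter"
self assert: [$_ asGrapheme isSmalltalkLetter].
self assert: [$q asGrapheme isSmalltalkLetter].
"A digit is not a letter"
self assert: [$5 asGrapheme isSmalltalkLetter not].
"16rE9 (LATIN SMALL LETTER E WITH ACUTE) is a letter, but it is outside the ANSI grammar"
self assert: [16rE9 asGrapheme isLetter].
self assert: [16rE9 asGrapheme isSmalltalkLetter not]
```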
<details> isUppercase
<pre><code>   Answer true if the receiver is considered uppercase.
   Uppercase characters change when converted to lowercase,
   but not when converted to uppercase. The following characters are
   all uppercase:
   - 'É' (16r0045 LATIN CAPITAL LETTER E, 16r0301 COMBINING ACUTE ACCENT)
   - 'И' (16r0418 CYRILLIC CAPITAL LETTER I)
   - 'Π' (16r03A0 GREEK CAPITAL LETTER PI)
   Examples:
    self assert: [16rC5 asGrapheme name = 'LATIN CAPITAL LETTER A WITH RING ABOVE'].
    self assert: [16rC5 asGrapheme isUppercase].
    self assert: [16rE2 asGrapheme name = 'LATIN SMALL LETTER A WITH CIRCUMFLEX'].
    self assert: [16rE2 asGrapheme isUppercase not].
   Answers:
    <Boolean> </code></pre> </details>
<details> isWhitespace
<pre><code>   Answers true if the receiver represents whitespace, including newlines,
   false otherwise.
   Examples:
    self assert: [16r1680 asGrapheme name = 'OGHAM SPACE MARK'].
    self assert: [16r1680 asGrapheme isWhitespace].
    self assert: [16r1F40 asGrapheme name = 'GREEK SMALL LETTER OMICRON WITH PSILI'].
    self assert: [16r1F40 asGrapheme isWhitespace not].
   Answers:
    <Boolean> </code></pre> </details>
<details> join:
   Append the elements of the argument @aCollection, separating them by the receiver.
  
  Examples:
    self assert: [(Grapheme space join: #('VA' 'is' 'cool')) = 'VA is cool' asUnicodeString]
  
  Arguments:
    aCollection - <Collection>
  Answers:
    <String | Symbol>
</details>
<details> name
<pre><code>   Answer the name, which is a comma-delimited concatenation of the names
   of all the Unicode scalars in the grapheme. Single-scalar graphemes
   have the same name as their Unicode scalar equivalent.
   Examples:
    self assert: [16r388 asGrapheme name = 'GREEK CAPITAL LETTER EPSILON WITH TONOS'].
    self assert: [233 asGrapheme asNFD name = 'LATIN SMALL LETTER E,COMBINING ACUTE ACCENT']
   Answers:
    <UnicodeString> </code></pre> </details>
<details> names
   Answer the names of all the Unicode scalars in the receiver as an Array.
   
   Examples:
    self assert: [16r388 asGrapheme asNFD names = #('GREEK CAPITAL LETTER EPSILON' 'COMBINING ACUTE ACCENT')]
    
   Answers:
    <Array>
</details>
<details> sameAs:
   Answer whether the receiver is equal to aGrapheme, ignoring case.
   
  Arguments:
    aGrapheme - <Grapheme> or <Character> for compatibility
   Answers:
    <Boolean>
</details>
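A short sketch of case-insensitive comparison, in the same `self assert:` style used elsewhere on this page. It assumes plain ASCII letters fold as expected; the results are not taken from a verified run.

```smalltalk
"Sketch only: the same letter in different cases compares as the same"
self assert: [$a asGrapheme sameAs: $A asGrapheme].
"A <Character> argument is accepted for compatibility"
self assert: [$a asGrapheme sameAs: $A].
"Different letters are not the same"
self assert: [($a asGrapheme sameAs: $b asGrapheme) not]
```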
<details> to:
   Answer a collection of Graphemes with consecutive code points,
   starting from the receiver's code point up to aGrapheme's code point.
   
   Arguments:
    aGrapheme - <Grapheme>
  Answers:
    <Array>
</details>
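A minimal sketch of building a range. Two assumptions are made here that the description does not confirm: the end point is included (as "up to" suggests), and graphemes in the answered <Array> compare equal with `=`. The variable name `range` is illustrative.

```smalltalk
| range |
"Assumption: the end point $e is included in the result"
range := $a asGrapheme to: $e asGrapheme.
self assert: [range size = 5].
self assert: [range first = $a asGrapheme].
self assert: [range last = $e asGrapheme]
```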
<details> unicodeScalars
   Answer the unicode scalar view of the receiver.
   
   A unicode scalar <UnicodeScalar> represents a 'unicode scalar value', which is similar to,
   but not the same as, a 'unicode code point' as it will never represent high/low-surrogate
   code points reserved for UTF-16 encoding.
   
   Example:
    | view |
    view := $H asGrapheme unicodeScalars.
    self assert: [view size = 1].
    self assert: [view contents = (Array with: $H asUnicodeScalar)].
    self assert: [view asByteArray = (ByteArray with: $H value)].
    self assert: [view next = $H asUnicodeScalar].
    self assert: [view atEnd]
   
   Answers:
    <UnicodeScalarView>
</details>
<details> utf16
<pre><code>   Answer the utf16 platform-endian view of the receiver.
   Each element in this view is a UTF-16 code unit. UTF-16 is a 16-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf16.
    self assert: [view size = 2].
    self assert: [view next = 55356].
    self assert: [view next = 57269].
    self assert: [view atEnd]
   Answers:
    <Utf16View> </code></pre> </details>
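The two code units in the example are the UTF-16 surrogate pair for U+1F3B5. The sketch below derives them by hand with ordinary Integer arithmetic; it does not use any VAST Unicode API, and the variable names are illustrative.

```smalltalk
| scalar offset high low |
scalar := 16r1F3B5.
"Scalars above 16rFFFF are encoded as a pair: subtract 16r10000 first"
offset := scalar - 16r10000.
"High surrogate carries the top 10 bits, low surrogate the bottom 10 bits"
high := 16rD800 + (offset bitShift: -10).
low := 16rDC00 + (offset bitAnd: 16r3FF).
self assert: [high = 55356].
self assert: [low = 57269]
```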
<details> utf16BE
<pre><code>   Answer the utf16 big-endian view of the receiver.
   Each element in this view is a UTF-16 code unit. UTF-16 is a 16-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf16BE.
    self assert: [view size = 2].
    self assert: [view next = 15576].
    self assert: [view next = 46559].
    self assert: [view atEnd]
   Answers:
    <Utf16BigEndianView> </code></pre> </details>
<details> utf16LE
<pre><code>   Answer the utf16 little-endian view of the receiver.
   Each element in this view is a UTF-16 code unit. UTF-16 is a 16-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf16LE.
    self assert: [view size = 2].
    self assert: [view next = 55356].
    self assert: [view next = 57269].
    self assert: [view atEnd]
   Answers:
    <Utf16LittleEndianView> </code></pre> </details>
<details> utf32
<pre><code>   Answer the utf32 view of the receiver.
   Each element in this view is a UTF-32 code unit. UTF-32 is a 32-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf32.
    self assert: [view size = 1].
    self assert: [view next = 16r1F3B5].
    self assert: [view atEnd]
   Answers:
    <Utf32View> </code></pre> </details>
<details> utf32BE
<pre><code>   Answer the utf32 big-endian view of the receiver.
   Each element in this view is a UTF-32 code unit. UTF-32 is a 32-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf32BE.
    self assert: [view size = 1].
    self assert: [view next = 3052601600].
    self assert: [view asByteArray = #[0 1 243 181]].
    self assert: [view atEnd]
   Answers:
    <Utf32BigEndianView> </code></pre> </details>
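The value 3052601600 in the example can look surprising next to the scalar value 16r1F3B5. One plausible reading, stated here as an assumption rather than documented behavior, is that the example was run on a little-endian platform, so the big-endian bytes `#[0 1 243 181]` are read back as a native integer with the first byte least significant. The sketch below only checks that arithmetic; the variable names are illustrative.

```smalltalk
| bytes asLittleEndian asBigEndian |
bytes := #[0 1 243 181].
"Interpret the bytes with the first byte as least significant"
asLittleEndian := (bytes at: 1) + ((bytes at: 2) bitShift: 8)
	+ ((bytes at: 3) bitShift: 16) + ((bytes at: 4) bitShift: 24).
"Interpret the same bytes with the first byte as most significant"
asBigEndian := ((bytes at: 1) bitShift: 24) + ((bytes at: 2) bitShift: 16)
	+ ((bytes at: 3) bitShift: 8) + (bytes at: 4).
self assert: [asLittleEndian = 3052601600].
self assert: [asBigEndian = 16r1F3B5]
```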
<details> utf32LE
<pre><code>   Answer the utf32 little-endian view of the receiver.
   Each element in this view is a UTF-32 code unit. UTF-32 is a 32-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'MUSICAL NOTE - U+1F3B5'.
    view := 16r1F3B5 asGrapheme utf32LE.
    self assert: [view size = 1].
    self assert: [view next = 127925].
    self assert: [view asByteArray = #[181 243 1 0]].
    self assert: [view atEnd]
   Answers:
    <Utf32LittleEndianView> </code></pre> </details>
<details> utf8
<pre><code>   Answer the utf8 view of the receiver.
   Each element in this view is a UTF-8 code unit. UTF-8 is an 8-bit
   encoded form of Unicode scalar values.
   Example:
    | view |
    'LATIN SMALL LETTER E WITH ACUTE'.
    view := 233 asGrapheme utf8.
    self assert: [view size = 2].
    self assert: [view next = 195].
    self assert: [view next = 169].
    self assert: [view atEnd]
   Answers:
    <Utf8View> </code></pre> </details>
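The two bytes in the example follow from the standard two-byte UTF-8 pattern. The sketch below reproduces them with ordinary Integer arithmetic; it does not use any VAST Unicode API, and the variable names are illustrative.

```smalltalk
| scalar lead trail |
scalar := 233.
"Two-byte UTF-8 form is 110xxxxx 10xxxxxx, used for scalars in 16r80..16r7FF"
lead := 16rC0 + (scalar bitShift: -6).
trail := 16r80 + (scalar bitAnd: 16r3F).
self assert: [lead = 195].
self assert: [trail = 169]
```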
<details> value
   Compatibility: An extended grapheme cluster only has an expressible integer value
   if it is defined by a single scalar.
   
   This is for compatibility with Character>>value.
   
   Answers:
    <Integer>
</details>
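A minimal sketch for the single-scalar case, assuming such a grapheme answers its scalar value in the same way `Character>>value` does. What is answered for a grapheme made of several scalars is not stated above, so it is deliberately left out here.

```smalltalk
"Sketch only: a single-scalar grapheme answers its scalar value"
self assert: [$A asGrapheme value = 65].
self assert: [16r1F3B5 asGrapheme value = 16r1F3B5].
"Consistent with Character>>value for an ASCII character"
self assert: [$A asGrapheme value = $A value]
```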
Last modified date: 01/18/2023