Autoconversion of String into UnicodeString
When manipulating String and DBString instances, it is common to perform operations such as at:put:, which places a given Character at a given index of the string.
For backward compatibility, Unicode support allows you to send at:put: to a UnicodeString instance with a Character as the argument. The receiver converts the Character to a Grapheme and places it at the requested index.
Up to VAST 2022 the inverse operation was not possible: passing a Grapheme as the argument of an at:put: message sent to a String or DBString caused a primitive failure.
For convenience, since VAST 2023 every attempt to put a Grapheme into a String or DBString first makes the receiver become a UnicodeString (obtained by sending #asUnicodeString) and then performs the requested operation on the converted instance.
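The behavior described above can be sketched in a workspace as follows. This is a minimal illustration, not a definitive test: it assumes that the elements of a UnicodeString are Grapheme instances (so that first answers a Grapheme), and the results noted in the comments are what the description above implies for VAST 2023 and later.

```smalltalk
| string alias grapheme |

"Work on a copy so the literal itself is not modified."
string := 'abc' copy.
alias := string.

"Assumption: the first element of a UnicodeString is a Grapheme."
grapheme := 'x' asUnicodeString first.

"Up to VAST 2022 this caused a primitive failure.
 Since VAST 2023 the receiver first becomes a UnicodeString."
string at: 2 put: grapheme.

"Per the description above, both are expected to answer UnicodeString,
 because the conversion affects the object itself and not just one variable."
string class.
alias class.
```

Because the receiver itself is converted, every other reference to the same object (alias in the sketch) also sees a UnicodeString afterwards, which is the root of the caveats listed below.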
Caveats
Codepage
The conversion of a String into a UnicodeString is codepage dependent: it uses the codepage that is current at the moment of conversion, in the same way that EsString>>asUnicodeString does.
Hashing
If a string that is an element of a hashed collection becomes a UnicodeString, that collection must be rehashed.
String and UnicodeString compute their hashes differently, even for ASCII-only content, so keep this in mind if you mutate strings that are used as keys in dictionaries.
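A minimal sketch of this caveat follows. It assumes that the elements of a UnicodeString are Grapheme instances and that the hashed collection responds to rehash; both are assumptions made for illustration, so check them against your VAST version. The key keeps the same visible content and only its class changes.

```smalltalk
| key dict |

key := 'abc' copy.
dict := Dictionary new.
dict at: key put: 1.

"Put the same letter back, but as a Grapheme (assumed element type).
 The content still reads abc, yet the key is now a UnicodeString
 and its hash is computed differently."
key at: 1 put: 'a' asUnicodeString first.

"The entry was stored under the old hash, so this lookup is not reliable
 until the collection is rehashed."
dict includesKey: key.

"Assumption: rehash is available on the collection."
dict rehash.
dict includesKey: key.
```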
Streams
If the String that became a UnicodeString was the underlying collection of a WriteStream, adding elements to the stream beyond the original size will cause errors, because instances of EsString subclasses are not growable whereas UnicodeString instances are. Unicode support provides dedicated stream classes for UnicodeString: UnicodeReadStream for reading and UnicodeWriteStream for writing.
This case might show up if you add instances of UnicodeString to a regular WriteStream received as the argument of printOn:. If so, you can convert them to single-byte strings or escape the Unicode characters using the UnicodeString>>#escaped message.
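The two options above can be sketched as follows. Both fragments are illustrations under stated assumptions rather than the library's prescribed usage: the first assumes UnicodeWriteStream follows the usual on: creation protocol of WriteStream, and the second uses a hypothetical instance variable called name that holds a UnicodeString.

```smalltalk
"Workspace sketch: build Unicode content with the dedicated stream class.
 Assumption: UnicodeWriteStream answers to on: like WriteStream does."
| stream |
stream := UnicodeWriteStream on: UnicodeString new.
stream nextPutAll: 'abc' asUnicodeString.
stream contents.
```

```smalltalk
printOn: aStream
	"Method sketch for a hypothetical class whose instance variable
	 name holds a UnicodeString. aStream is a regular WriteStream,
	 so the Unicode characters are escaped before being written."
	aStream nextPutAll: name escaped
```

Escaping keeps the printOn: stream a plain byte stream, at the cost of a less readable print string for non-ASCII content.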
Recommendations
Convert to UnicodeString explicitly wherever possible. Rely on this automatic become feature only when modifying your classes to support Unicode strings is not worth the effort or would require reworking an entire library.
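As a minimal sketch of this recommendation, the string can be converted once, up front, so that later at:put: calls never trigger the automatic conversion. The way the Grapheme is obtained here (taking the first element of a UnicodeString) is an assumption made for illustration.

```smalltalk
| text grapheme |

"Convert explicitly at the point where the string enters Unicode-aware code."
text := 'abc' asUnicodeString.

"Assumption: the first element of a UnicodeString is a Grapheme."
grapheme := 'x' asUnicodeString first.

"The receiver is already a UnicodeString, so no become takes place
 and no other holder of the original String is affected."
text at: 2 put: grapheme.
```

With explicit conversion the original String is left untouched, so hashed collections and streams that still reference it are not affected by the caveats above.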
Last modified date: 01/19/2023