UnicodeReadStream
Description
This is an adapter class used for bridging <UnicodeView>s with <ReadStream>s.
By default, it streams graphemes, which are user-perceived characters. A grapheme is represented in VAST by a <Grapheme> object. A <UnicodeString> in VAST is to be thought of as a <Collection> of <Grapheme>s.
If you need more technical parsing precision or closer line-ending compatibility, you can put this stream into Unicode scalar mode by calling #switchToUnicodeScalarMode. A Unicode scalar is represented in VAST by a <UnicodeScalar> object. A <UnicodeScalar> can represent any Unicode code point except those in the surrogate range reserved for UTF-16 encoding.
If you are working with pure Unicode, then consider using views rather than this adapter class.
@see the class category Views on a <UnicodeString> for more details.
Instance State
• view: <UnicodeView> internal view to bridge
CLDT-API
This adapter redefines the necessary <ReadStream> (and superclass) APIs to allow for efficient streaming of a <UnicodeString>. In most cases, this means delegating to the internal view, which tends to implement operations more efficiently for variable-width collections than <ReadStream> does.
Modes
This stream can be put into different modes, which define how elements of the stream are to be interpreted. The default mode is graphemes; you can switch to a different mode using the APIs in the Modes category. Switching modes always resets the stream to the beginning.
For example, if you wanted to process a string object as Unicode scalars, you could do the following:
| stream |
stream := 'Smalltalk' asUnicodeString readStream.
"Process stream as unicode scalars"
stream switchToUnicodeScalarMode.
self assert: [stream next = $S asUnicodeScalar].
self assert: [(stream next: 8) = ('Smalltalk' asUnicodeString unicodeScalars copyFrom: 2) contents]
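Because switching modes resets the stream, elements consumed before the switch become readable again. The following is a minimal sketch of that behavior, using only messages documented on this page and assuming the reset described above:

```smalltalk
| stream |
stream := 'Smalltalk' asUnicodeString readStream.
"Consume five graphemes in the default (grapheme) mode"
self assert: [(stream next: 5) = 'Small'].
"Switching modes resets the stream to the beginning"
stream switchToUnicodeScalarMode.
self assert: [stream next = $S asUnicodeScalar]
```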
Class Methods
None
Instance Methods
atEnd
Answer a Boolean which is true if the receiver cannot
access any more objects, and false otherwise.
Example:
self assert: [UnicodeString new readStream atEnd].
self assert: ['Smalltalk' asUnicodeString readStream atEnd not].
Answers:
<Boolean>
isEmpty
Answer true if the contents of the view are empty.
This is relative to the complete contents and is not
impacted by the current position.
Example:
self assert: [UnicodeString new readStream isEmpty].
self assert: ['Smalltalk' asUnicodeString readStream isEmpty not]
Answers:
<Boolean>
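The difference between #isEmpty and #atEnd can be sketched as follows. This is an illustrative example that assumes, as described above, that #isEmpty ignores the current position:

```smalltalk
| stream |
stream := 'Smalltalk' asUnicodeString readStream.
stream setToEnd.
"No more elements can be read from the current position..."
self assert: [stream atEnd].
"...but the underlying contents are still not empty"
self assert: [stream isEmpty not]
```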
lineDelimiter
Return the receiver's line delimiter.
Answers:
<Object>
lineDelimiter:
Set the receiver's line delimiter to be delimiter, and
answer the receiver.
Example:
| stream |
stream := ('Small' , String lf , 'talk' , String cr , 'er') asUnicodeString readStream.
self assert: [(stream lineDelimiter: Grapheme cr; nextLine) = ('Small' , String lf , 'talk')].
Arguments:
delimiter - Grapheme mode:
<Grapheme> grapheme delim
<UnicodeScalar> scalar delim
<UnicodeString> graphemes
<Array> of <implementors of #asGrapheme>
Compat: <String | Character>
Unicode Scalar mode:
<UnicodeScalar> scalar delim
<Grapheme> grapheme delim
<UnicodeString> graphemes
<Array> of <implementors of #asUnicodeScalar>
Compat: <String | Character>
Answers:
<UnicodeReadStream> self
next
Answer the next object accessible by the receiver, and advance
the receiver so that the answered object is no longer
accessible.
Example:
self assert: [('Smalltalk' asUnicodeString readStream next; next; next) = $a asGrapheme].
self assert: [('Smalltalk' asUnicodeString readStream switchToUnicodeScalarMode; next; next; next) = $a asUnicodeScalar].
Answers:
<Object> view object
next:
Answer a collection containing the next @anInteger elements from the view.
If @anInteger < 1, an empty collection is answered
Example:
self assert: [('Smalltalk' asUnicodeString readStream next: 5) = 'Small'].
self assert: [| stream |
stream := 'Smalltalk' asUnicodeString readStream.
(stream switchToUnicodeScalarMode; next: 5) = 'Small' unicodeScalars contents]
Arguments:
anInteger - <Integer>
Answers:
<Object> instance of view collection class
Raises:
<Exception> ExCLDTIndexOutOfRange
next:into:startingAt:
Answer @anIndexedCollection with the next @anInteger number of items from
the receiver, stored starting at position @initialPosition.
If the receiver's state is such that there are fewer than anInteger
elements between its current position and the end of the stream,
the operation will fail, and the receiver will be left in a state
such that it answers true to the atEnd message.
Example:
| col |
col := Array new: 5.
'Smalltalk' asUnicodeString readStream next: 5 into: col startingAt: 1.
self assert: [col = 'Small' asUnicodeString asArray]
Arguments:
anInteger - <Integer>
anIndexedCollection - <Collection>
initialPosition - <Integer>
Answers:
<Collection> - anIndexedCollection
nextLine
Answer the elements between the current position and the next lineDelimiter.
Example:
| stream |
stream := ('Small' , String lf , 'talk' , String cr , 'er' , String crlf , 's') asUnicodeString readStream.
self assert: [stream nextLine = 'Small'].
self assert: [stream nextLine = 'talk'].
self assert: [stream nextLine = 'er'].
self assert: [stream nextLine = 's'].
stream switchToUnicodeScalarMode.
self assert: [stream nextLine = 'Small' unicodeScalars contents].
self assert: [stream nextLine = 'talk' unicodeScalars contents].
self assert: [stream nextLine = 'er' unicodeScalars contents].
self assert: [stream nextLine = 's' unicodeScalars contents].
self assert: stream atEnd.
Answers:
<Object> view-dependent
peek
Answer the next object accessible by the receiver.
Do not change the state of the receiver; the answered object remains accessible.
Answer nil if the view is atEnd
Example:
self assert: [('' asUnicodeString readStream peek) isNil].
self assert: [('Smalltalk' asUnicodeString readStream peek) = $S asGrapheme].
self assert: [('Smalltalk' asUnicodeString readStream switchToUnicodeScalarMode; peek) = $S asUnicodeScalar].
Answers:
<Object> or nil if at end
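A short sketch of how #peek and #next interact. It assumes #peek follows the usual <ReadStream> convention of leaving the position unchanged, so the peeked element is still answered by the following #next:

```smalltalk
| stream |
stream := 'Smalltalk' asUnicodeString readStream.
self assert: [stream peek = $S asGrapheme].
"The peeked element should still be available to #next"
self assert: [stream next = $S asGrapheme].
"After #next, #peek answers the following element"
self assert: [stream peek = $m asGrapheme]
```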
position:
Set the receiver's position reference to @aPosition.
Answer self.
Example:
| stream pos |
stream := 'abcde' asUnicodeString readStream.
pos := stream setToEnd; position.
self assert: [(stream reset; next: 3) = 'abc'].
stream position: pos.
self assert: [stream position = pos]
Arguments:
aPosition - <Integer>
setToEnd
Set the position of the receiver to be the size of the
underlying contents
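A minimal sketch of #setToEnd, built only from messages used elsewhere on this page (#atEnd, #upToEnd, and #reset as it appears in the position: example):

```smalltalk
| stream |
stream := 'abcde' asUnicodeString readStream.
stream setToEnd.
"Nothing is left to read once the position is at the end"
self assert: [stream atEnd].
self assert: [stream upToEnd isEmpty].
"#reset returns to the beginning, making all elements readable again"
self assert: [(stream reset; upToEnd) = 'abcde']
```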
size
Answer the number of elements in the view.
Example:
self assert: [('Smalltalk' , String crlf) asUnicodeString
readStream size = 10].
self assert: [('Smalltalk' , String crlf) asUnicodeString
readStream switchToUnicodeScalarMode size = 11].
Answers:
<Integer>
skip:
Increment the receiver's current reference position by anInteger.
Fail if anInteger is not a kind of Integer.
Example:
self assert: [('abcde' asUnicodeString readStream skip: 2; upToEnd) = 'cde']
Arguments:
anInteger - <Integer>
Raises:
<Exception> ExCLDTIndexOutOfRange
skipTo:
Read and discard elements up to and including the next occurrence of @anObject.
If @anObject is not found, the receiver is left at the end of the stream.
Example:
self assert: [('abcde' asUnicodeString readStream skipTo: $c; upToEnd) = 'de'].
self assert: [('abcde' asUnicodeString readStream skipTo: $z; upToEnd) = '']
Arguments:
anObject - <Object>
Answers:
<Boolean> true if found, false otherwise
skipToAll:
Attempt to read and discard elements just past the occurrence of @aSequentialCollection.
Answer true if all elements in @aSequentialCollection occurred, else answer false.
Note:
If @aSequentialCollection is an EsString, then we attempt to convert it to a UnicodeString.
Example:
self assert: ['abcde' asUnicodeString readStream skipToAll: 'bc'].
self assert: [('abcde' asUnicodeString readStream skipToAll: 'bc'; upToEnd) = 'de'].
self assert: [('abcde' asUnicodeString readStream skipToAll: 'zzz') not].
self assert: [('abcde' asUnicodeString readStream skipToAll: 'zzz'; upToEnd) = ''].
Arguments:
aSequentialCollection - <SequenceableCollection>
Answers:
<Boolean>
skipToAny:
Read and discard elements beyond the next occurrence
of an element that exists in @aSequentialCollection or if none,
to the end of stream.
Answer true if an element in @aSequentialCollection
occurred, else answer false.
Note:
If @aSequentialCollection is an EsString, then we attempt to convert it to a UnicodeString.
Example:
self assert: ['abcde' asUnicodeString readStream skipToAny: 'bd'].
self assert: [('abcde' asUnicodeString readStream skipToAny: 'bd'; upToEnd) = 'cde'].
self assert: [('abcde' asUnicodeString readStream skipToAny: 'zzz') not].
self assert: [('abcde' asUnicodeString readStream skipToAny: 'zzz'; upToEnd) = ''].
Arguments:
aSequentialCollection - <SequenceableCollection>
Answers:
<Boolean>
switchToGraphemeMode
Switch the mode to graphemes.
This will reset the stream.
Calls like #next will answer <Grapheme> objects.
Calls like #next:/#contents will answer <UnicodeString> objects
Example:
self assert: [UnicodeString crlf readStream switchToGraphemeMode size = 1].
self assert: [UnicodeString crlf readStream switchToUnicodeScalarMode size = 2]
switchToUnicodeScalarMode
Switch the mode to unicode scalars.
This will reset the stream.
Calls like #next will answer <UnicodeScalar> objects.
Calls like #next:/#contents will answer <Array> of <UnicodeScalar>s
Example:
self assert: [UnicodeString crlf readStream switchToGraphemeMode size = 1].
self assert: [UnicodeString crlf readStream switchToUnicodeScalarMode size = 2].
upTo:
Answers a collection of all of the objects in the view
beginning from the current position up to, but not including,
@anObject.
Example:
self assert: [('abcde' asUnicodeString readStream upTo: $c) = 'ab'].
self assert: [('abcde' asUnicodeString readStream upTo: $z) = 'abcde']
Arguments:
anObject - <Object>
Answers:
<Object> instance of view collection class
upToAll:
Answers a collection of all of the objects in the view beginning from the current position up to,
but not including, @aSequenceableCollection
Note:
If @aSequenceableCollection is an EsString, then we attempt to convert it to a UnicodeString.
Example:
self assert: [('abcde' asUnicodeString readStream upToAll: 'bc') = 'a'].
self assert: [('abcde' asUnicodeString readStream upToAll: 'bc'; upToEnd) = 'de'].
self assert: [('abcde' asUnicodeString readStream upToAll: 'zzz') = 'abcde'].
self assert: [('abcde' asUnicodeString readStream upToAll: 'zzz'; upToEnd) isEmpty].
Arguments:
aSequenceableCollection - <SequenceableCollection>
Answers:
<Object> instance of view collection class
upToAny:
Answers a collection of all of the objects in the view from the current position up to, but not including,
the next occurrence of any element of @aSequenceableCollection. If no such element is found before the end
of the view is encountered, a collection of the objects read is answered.
Note:
If @aSequenceableCollection is an EsString, then we attempt to convert it to a UnicodeString.
Example:
self assert: [('abcde' asUnicodeString readStream upToAny: 'bd') = 'a'].
self assert: [('abcde' asUnicodeString readStream upToAny: 'bd'; upToEnd) = 'cde'].
self assert: [('abcde' asUnicodeString readStream upToAny: 'zzz') = 'abcde'].
self assert: [('abcde' asUnicodeString readStream upToAny: 'zzz'; upToEnd) isEmpty].
Arguments:
aSequenceableCollection - <SequenceableCollection>
Answers:
<Object> view collection class
upToEnd
Answer a collection containing all of the remaining elements in the view, from the current position to the end.
If there are no more elements available to be read, then an empty collection is answered.
Example:
self assert: ['abcde' asUnicodeString readStream upToEnd = 'abcde'].
self assert: [('abcde' asUnicodeString readStream next: 2; upToEnd) = 'cde'].
self assert: ['' asUnicodeString readStream upToEnd = '']
Answers:
<Object> instance of view collection class
Last modified date: 01/18/2023