DocumentContent getContent(Long start, Long end) throws InvalidOffsetException
Conceptually, annotation offsets are defined as falling between characters, with "0" pointing before the first character. Because of that, the offset where an annotation ends and the offset where the space after it starts are the same.
So this is what the "abcde" string looks like with the offsets explicitly included: 0a1b2c3d4e5
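To make the notation concrete, here is a minimal sketch that interleaves each between-character offset with the characters of a string. The class and method names (`OffsetNotation`, `withOffsets`) are hypothetical and exist only for illustration. The sketch assumes single-digit offsets for readability.

```java
// Sketch: render a string with its between-character offsets shown,
// following the notation above (offset 0 sits before the first character).
// OffsetNotation and withOffsets are hypothetical names, not part of GATE.
public class OffsetNotation {

    // Interleaves offsets 0..length with the characters of text.
    static String withOffsets(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(i).append(text.charAt(i)); // offset i precedes character i
        }
        sb.append(text.length()); // the final offset sits after the last character
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withOffsets("abcde")); // prints 0a1b2c3d4e5
        System.out.println(withOffsets("ab cd")); // prints 0a1b2 3c4d5
    }
}
```

A string of length n therefore has n + 1 valid offsets, from 0 to n.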
"ab cd" would then look like this: 0a1b2 3c4d5
with the following annotations:
Token "ab" [0,2]
SpaceToken " " [2,3]
Token "cd" [3,5]
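The annotations above can be read as half-open spans [start, end). The following sketch uses plain `java.lang.String.substring`, which follows the same inclusive-start, exclusive-end convention, as a stand-in for `getContent`. It checks that each span covers the expected text and that each annotation ends exactly where the next one begins. The `Span` record and the helper methods are hypothetical and are not GATE API.

```java
import java.util.List;

// Sketch: the three annotations on "ab cd" treated as half-open spans
// [start, end). String.substring stands in for getContent here; the Span
// record and helper methods are hypothetical, not GATE API.
public class AnnotationSpans {

    record Span(String type, int start, int end) {}

    static final List<Span> SPANS = List.of(
            new Span("Token", 0, 2),
            new Span("SpaceToken", 2, 3),
            new Span("Token", 3, 5));

    // Returns the text covered by a span: characters start..end-1.
    static String covered(String text, Span s) {
        return text.substring(s.start(), s.end());
    }

    // True if each span ends exactly at the offset where the next one starts.
    static boolean contiguous(List<Span> spans) {
        for (int i = 1; i < spans.size(); i++) {
            if (spans.get(i - 1).end() != spans.get(i).start()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "ab cd";
        for (Span s : SPANS) {
            System.out.println(s.type() + " \"" + covered(text, s) + "\" ["
                    + s.start() + "," + s.end() + "]");
        }
        System.out.println("contiguous: " + contiguous(SPANS)); // prints contiguous: true
    }
}
```

Because offsets sit between characters, the shared boundaries (2 and 3) need no "+1" adjustment. Adjacent annotations simply reuse the same number.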
start - the beginning index, inclusive.
end - the ending index, exclusive.
InvalidOffsetException - if
start is negative, or
end is larger than the length of this document content, or
start is larger than end.
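The following minimal sketch shows the three documented failure conditions applied to a plain `String`. `InvalidOffsetException` is defined locally as a stand-in for GATE's exception class. The `getContent` helper is hypothetical and does not reproduce GATE's actual implementation. The sketch also does not address `null` offsets.

```java
// Sketch of the documented offset checks, applied to a plain String.
// InvalidOffsetException is a local stand-in for GATE's exception class,
// and getContent is a hypothetical helper, not GATE's implementation.
public class OffsetCheck {

    static class InvalidOffsetException extends Exception {
        InvalidOffsetException(String message) {
            super(message);
        }
    }

    // Returns characters start..end-1 (start inclusive, end exclusive).
    // Null offsets are not handled in this sketch.
    static String getContent(String content, Long start, Long end)
            throws InvalidOffsetException {
        if (start < 0) {
            throw new InvalidOffsetException("start is negative: " + start);
        }
        if (end > content.length()) {
            throw new InvalidOffsetException(
                    "end " + end + " exceeds length " + content.length());
        }
        if (start > end) {
            throw new InvalidOffsetException("start " + start + " > end " + end);
        }
        return content.substring(start.intValue(), end.intValue());
    }

    public static void main(String[] args) throws InvalidOffsetException {
        System.out.println(getContent("ab cd", 3L, 5L)); // prints cd
        try {
            getContent("ab cd", 4L, 2L); // start > end, so this is rejected
        } catch (InvalidOffsetException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that start equal to end is allowed and yields empty content. Also, once end is known to be within bounds and start is no larger than end, start is automatically within bounds as well.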