Hi, Does any one know how to tokenize a string in python that returns the byte offsets and tokens? Moreover, the sentence splitter that returns the sentences and byte offsets? Finally n-grams returned with byte offsets. Input: This is a string. Output: This 0 is 5 a 8 string. 10 thanks