Multilinguality: The standard distribution has tools for many languages, including Arabic, Chinese, French, German, and Italian. All textual data is stored internally in Unicode; various other encodings are supported for input and output.

Robust: Reference implementations are provided for many basic HLT tools, such as tokenisation, part-of-speech tagging, and finite-state information extraction.
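
The point about storing text internally in Unicode while supporting other encodings at the input/output boundary can be illustrated with a minimal sketch. This is not the distribution's own API: the helper names `read_text` and `write_text`, the sample string, and the choice of ISO-8859-1 and UTF-8 are assumptions made purely for illustration.

```python
def read_text(raw: bytes, encoding: str) -> str:
    """Decode bytes from an external encoding into a Unicode string (hypothetical helper)."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str) -> bytes:
    """Encode a Unicode string into the requested output encoding (hypothetical helper)."""
    return text.encode(encoding)


# Input arrives in a legacy encoding (ISO-8859-1 is an assumed example).
raw_in = "Größe".encode("iso-8859-1")

# Decode once at the boundary; all processing then works on Unicode text.
text = read_text(raw_in, "iso-8859-1")

# Encode only when writing out, here to UTF-8.
raw_out = write_text(text, "utf-8")
```

Decoding once on input and encoding once on output means that processing steps such as tokenisation only ever see Unicode strings, regardless of how the source files were encoded.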