GETS is a tool for German text segmentation (tokenisation and sentence segmentation) based on conditional random fields (CRFs). It is designed to segment German texts from private communication (postcards) into sentences and words. Details are presented in my paper: Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication (Sugisaki, 2017).
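
To illustrate how segmentation can be cast as a sequence-labelling problem of the kind a CRF learns, here is a minimal sketch. It is not the GETS implementation: the label set (S = first character of a sentence, T = first character of a later token, I = inside a token, O = whitespace), the function names `encode` and `decode`, and the example text are all assumptions made for illustration. The labelling scheme and features actually used by GETS are described in the paper.

```python
# Hypothetical illustration only: the S/T/I/O label scheme and these helper
# names are assumptions, not taken from GETS.

def encode(text, sentences):
    """Map gold sentences (lists of tokens) onto one label per character."""
    labels = ["O"] * len(text)
    cursor = 0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            # Locate the token in the raw text, searching from the cursor.
            start = text.index(token, cursor)
            labels[start] = "S" if i == 0 else "T"
            for k in range(start + 1, start + len(token)):
                labels[k] = "I"
            cursor = start + len(token)
    return labels


def decode(text, labels):
    """Rebuild sentences and tokens from per-character labels."""
    sentences = []
    for ch, label in zip(text, labels):
        if label == "O":
            continue
        if label == "S" or not sentences:
            sentences.append([ch])      # open a new sentence
        elif label == "T":
            sentences[-1].append(ch)    # open a new token
        else:
            sentences[-1][-1] += ch     # continue the current token
    return sentences


# A postcard-style text whose last sentence has no closing punctuation.
text = "Liebe Grüße aus Wien! Bis bald"
gold = [["Liebe", "Grüße", "aus", "Wien", "!"], ["Bis", "bald"]]

labels = encode(text, gold)
print("".join(labels))
print(decode(text, labels))
```

In this framing, a trained model would predict the label sequence from the raw characters, and `decode` would turn its predictions into sentences and tokens. Because sentence starts are labelled directly, a boundary can be recovered even where punctuation is missing.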

A short description is available in the README.

Download (1.8GB)