ClearForest, acquired by Thomson Reuters in 2007, has joined forces with other NLP experts at Thomson Reuters to form the Text Metadata Services (TMS) Group, which addresses the full spectrum of Thomson Reuters metadata requirements. The group uses state-of-the-art NLP technologies to detect events in narrative text, extract entities, classify topics, and identify meaningful connections between entities.
The TMS Group employs a combination of machine-learning and rule-based techniques. When manually labeled training data is available, it uses supervised machine-learning techniques such as logistic regression or support vector machines (SVMs). In other cases, it uses unsupervised techniques. Semi-supervised techniques, working in combination with an advanced rule-based extraction mechanism, bootstrap from the rules and use the topology of the data to complete the NLP task. Some of the algorithms work at the document level, while others extract information and support question answering by looking across a large corpus of documents.
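To make the idea of bootstrapping from rules more concrete, here is a minimal sketch in Python. It is not the TMS Group's actual implementation, which is not described here. The keyword rules, the labels `MERGER` and `LEGAL`, the Jaccard similarity measure, and the `threshold` value are all assumptions chosen for illustration. A few high-precision rules label the documents they match, and the labels then spread to unlabeled documents that are close to an already labeled one, which is one simple way of using the topology of the data.

```python
# Hypothetical high-precision rules: a keyword that implies a label.
RULES = {"acquired": "MERGER", "lawsuit": "LEGAL"}

# Toy corpus, invented for this illustration.
DOCS = [
    "acme acquired beta corp for cash",
    "beta corp bought by acme for cash",
    "regulator files lawsuit against gamma bank",
    "regulator sues gamma bank over fees",
    "weather was sunny today",
]


def tokenize(text):
    """Lowercase the text and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rule_label(tokens):
    """Return a label only when exactly one rule label matches, else None."""
    hits = {label for keyword, label in RULES.items() if keyword in tokens}
    return hits.pop() if len(hits) == 1 else None


def bootstrap(docs, threshold=0.2, max_rounds=10):
    """Seed labels with the rules, then propagate them to similar documents."""
    tokens = [tokenize(d) for d in docs]
    labels = [rule_label(t) for t in tokens]  # seed step: rules only

    for _ in range(max_rounds):
        updates = {}
        for i, t in enumerate(tokens):
            if labels[i] is not None:
                continue
            # Find the most similar document that already has a label.
            best_sim, best_label = 0.0, None
            for j, u in enumerate(tokens):
                if labels[j] is None:
                    continue
                sim = jaccard(t, u)
                if sim > best_sim:
                    best_sim, best_label = sim, labels[j]
            if best_sim >= threshold:
                updates[i] = best_label
        if not updates:
            break  # nothing new was labeled, so propagation has converged
        for i, label in updates.items():
            labels[i] = label
    return labels


if __name__ == "__main__":
    for doc, label in zip(DOCS, bootstrap(DOCS)):
        print(label, "|", doc)
```

On this toy corpus, the second and fourth documents contain no rule keyword but inherit a label from their nearest labeled neighbor, while the unrelated fifth document stays unlabeled. A production system would likely replace the nearest-neighbor step with a trained classifier or a graph-based label propagation method.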