CLARIN Services and Software
We currently provide five NLP/IE services for CLARIN (the links point to the CXF service descriptions, which include links to the endpoints and WSDLs).
- ANNIE with GATE XML output (NER, named entity recognition) (Documentation)
- ANNIE for English with RDF-XML output (NER; output according to the Proton ontology)
- Morphosyntactic analysis for English with MAF XML output (Documentation)
- Chunking for English with SynAF XML output (Documentation)
- NER for German with RDF-XML output (also according to Proton) (Documentation)
Each service has four methods:
- process takes an input byte array representing the document
- processWithURL takes the input byte array and an originalURL string, which helps GATE determine the document format (and MIME type)
- processWithParams takes an input byte array and a parameters map of string key-value pairs; the following keys are recognized:
  - originalURL
  - encoding
  - mimeType
- processRemoteURL takes a string containing the URL of a document on the web; the service fetches and analyses that document, using the MIME type and encoding provided by the remote server
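As a rough illustration of the parameters map accepted by processWithParams, the following Python sketch builds a string-to-string map that contains only the keys a caller has set. The helper name build_params and the example URL are hypothetical and not part of the services; how the map is passed to the endpoint depends on the SOAP client in use and is not shown.

```python
from typing import Dict, Optional

# The three keys the text above lists as recognized by processWithParams.
RECOGNIZED_KEYS = ("originalURL", "encoding", "mimeType")


def build_params(original_url: Optional[str] = None,
                 encoding: Optional[str] = None,
                 mime_type: Optional[str] = None) -> Dict[str, str]:
    """Return a map of string key-value pairs, omitting any key left unset.

    Hypothetical client-side helper; it only assembles the map.
    """
    candidates = dict(zip(RECOGNIZED_KEYS, (original_url, encoding, mime_type)))
    return {key: value for key, value in candidates.items() if value is not None}


# Example: supply the original URL and the encoding, leave mimeType unset.
params = build_params(original_url="http://example.org/doc.html",
                      encoding="UTF-8")
print(params)
```

Leaving a key out of the map, rather than sending an empty string, is a choice made for this sketch; it lets the service fall back to its own detection for that value.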
All four methods return an XML element that contains the results of the analysis. All the services can handle any document type supported by GATE. If the mimeType parameter is supplied, it overrides whatever GATE would otherwise determine from the document content and URL. The service returns a fault if the client or the remote server specifies a MIME type that GATE does not support.
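The precedence described here can be pictured with a small Python sketch. It is not the service's code: the function name, the exception class (which stands in for the fault), and the SUPPORTED set are all assumptions made for illustration, and the set is not the actual list of MIME types GATE supports.

```python
from typing import Dict, Optional, Set

# Illustrative placeholder only: NOT the real list of GATE-supported types.
SUPPORTED: Set[str] = {"text/plain", "text/html", "text/xml"}


class UnsupportedMimeType(Exception):
    """Hypothetical stand-in for the fault returned by the service."""


def effective_mime_type(params: Dict[str, str],
                        detected: Optional[str],
                        supported: Set[str] = SUPPORTED) -> Optional[str]:
    """Pick the MIME type to use for a document.

    A client-supplied mimeType wins over the detected one (for example,
    the type reported by a remote server). An unsupported choice raises.
    """
    chosen = params.get("mimeType", detected)
    if chosen is not None and chosen not in supported:
        raise UnsupportedMimeType(chosen)
    # None means nothing was specified; detection is left to GATE (assumption).
    return chosen


# The explicit parameter overrides the detected type.
print(effective_mime_type({"mimeType": "text/html"}, "text/plain"))
```

In this sketch an unsupported type is rejected whether it comes from the client's parameters or from detection, mirroring the two fault cases mentioned in the text.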
Our reference GUI client includes the URIs of the current services. It requires a Java 1.5 JRE but is supplied with all the dependencies. Unzip it, run the jar file, and select a service, file encoding, and file. The XML output of the service will appear in a separate pane.
Read our paper and slides from the Language Resource and Language Technology Standards workshop at LREC 2010. The paper and slides include screenshots of the client and explanations of the NLP/IE pipelines in the services.