Creating corpora
Corpora are collections of documents. They are useful when more than one document
needs to be processed.
- Create a corpus language resource:
- Choose Language Resources/New/GATE corpus.
- (Optional) Provide the corpus name.
- (Optional) Choose which of the currently loaded documents to add to the
corpus when it is created (click the list-editing button).
- Add documents to a corpus or remove them from it:
- Double-click the corpus to bring up the corpus viewer.
- Add or remove documents using the add and remove buttons provided.
- Populate a corpus from a directory:
- Right-click on a corpus and choose "Populate".
- Choose the directory where your files are located.
- (Optional) Specify a file extension, so that only files with this extension
are loaded.
- (Optional) Specify the encoding to use when the documents are loaded into
GATE. If left blank (the default), the platform's default encoding is used
(see the User's Guide for more information).
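To make the Populate options concrete, here is a minimal, library-free Java sketch of what populating a corpus from a directory amounts to. It is not GATE's own API: the class `SimpleCorpus`, the record `Doc` and the method `populate(dir, extension, encoding)` are hypothetical names chosen for illustration. It assumes a non-recursive scan, a case-sensitive extension match, and treats a blank extension as "accept all files" and a blank encoding as "use the platform default charset", mirroring the optional fields described above.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a corpus: a named, ordered list of documents (not GATE's API). */
class SimpleCorpus {
    /** Hypothetical document record: file name plus decoded text. */
    record Doc(String name, String text) {}

    private final String name;
    private final List<Doc> docs = new ArrayList<>();

    SimpleCorpus(String name) { this.name = name; }

    String name() { return name; }

    /** Returns an unmodifiable snapshot of the documents. */
    List<Doc> documents() { return List.copyOf(docs); }

    /** Corresponds to the corpus viewer's add button. */
    void add(Doc d) { docs.add(d); }

    /** Corresponds to the remove button; returns true if a document was removed. */
    boolean remove(String docName) { return docs.removeIf(d -> d.name().equals(docName)); }

    /**
     * Adds every regular file in dir (non-recursive) and returns how many were added.
     * A null/blank extension accepts all files (match is case-sensitive);
     * a null/blank encoding falls back to the platform default charset.
     */
    int populate(Path dir, String extension, String encoding) throws IOException {
        Charset cs = (encoding == null || encoding.isBlank())
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        String suffix = (extension == null || extension.isBlank())
                ? null
                : (extension.startsWith(".") ? extension : "." + extension);
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) continue;
                if (suffix != null && !p.getFileName().toString().endsWith(suffix)) continue;
                files.add(p);
            }
        }
        files.sort(null); // natural Path order, since directory listing order is unspecified
        for (Path p : files) {
            // Decode each file with the chosen charset, as the encoding option does.
            docs.add(new Doc(p.getFileName().toString(), Files.readString(p, cs)));
        }
        return files.size();
    }
}
```

Passing the wrong encoding here would silently mis-decode non-ASCII characters rather than fail, which is why it is worth setting the encoding explicitly when the files were not written in the platform's default charset.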