gate
Interface Document

All Superinterfaces:
Comparable, FeatureBearer, LanguageResource, NameBearer, Resource, Serializable
All Known Subinterfaces:
TextualDocument
All Known Implementing Classes:
DocumentImpl

public interface Document
extends LanguageResource, Comparable

Represents the commonalities between all sorts of documents.


Field Summary
static String DOCUMENT_ENCODING_PARAMETER_NAME
           
static String DOCUMENT_END_OFFSET_PARAMETER_NAME
           
static String DOCUMENT_MARKUP_AWARE_PARAMETER_NAME
          The parameter name that determines whether or not a document is markup aware
static String DOCUMENT_PRESERVE_CONTENT_PARAMETER_NAME
           
static String DOCUMENT_REPOSITIONING_PARAMETER_NAME
           
static String DOCUMENT_START_OFFSET_PARAMETER_NAME
           
static String DOCUMENT_STRING_CONTENT_PARAMETER_NAME
           
static String DOCUMENT_URL_PARAMETER_NAME
          The parameter name for the document URL
 
Method Summary
 void addDocumentListener(DocumentListener l)
          Adds a DocumentListener to this document.
 void edit(Long start, Long end, DocumentContent replacement)
          Make changes to the content.
 AnnotationSet getAnnotations()
          Get the default set of annotations.
 AnnotationSet getAnnotations(String name)
          Get a named set of annotations.
 Boolean getCollectRepositioningInfo()
          Reports whether repositioning information is collected and preserved for the Document.
 DocumentContent getContent()
          The content of the document: wraps e.g. String for text; MPEG for video; etc.
 Boolean getMarkupAware()
          Get the markup awareness status of the Document.
 Map getNamedAnnotationSets()
          Returns a map with the named annotation sets
 Boolean getPreserveOriginalContent()
          Get the preserving of content status of the Document.
 URL getSourceUrl()
          Documents are identified by URLs
 Long getSourceUrlEndOffset()
          Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.
 Long[] getSourceUrlOffsets()
          Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.
 Long getSourceUrlStartOffset()
          Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.
 void removeAnnotationSet(String name)
          Removes one of the named annotation sets.
 void removeDocumentListener(DocumentListener l)
          Removes one of the previously registered document listeners.
 void setCollectRepositioningInfo(Boolean b)
          Allow/disallow collecting of repositioning information.
 void setContent(DocumentContent newContent)
          Set method for the document content
 void setMarkupAware(Boolean b)
          Make the document markup-aware.
 void setPreserveOriginalContent(Boolean b)
          Allow/disallow preserving of the original document content.
 void setSourceUrl(URL sourceUrl)
          Set method for the document's URL
 void setSourceUrlEndOffset(Long sourceUrlEndOffset)
          Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.
 void setSourceUrlStartOffset(Long sourceUrlStartOffset)
          Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.
 String toXml()
          Returns a GateXml document.
 String toXml(Set aSourceAnnotationSet)
          Equivalent to toXml(aSourceAnnotationSet, true).
 String toXml(Set aSourceAnnotationSet, boolean includeFeatures)
          Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before processing the document) and to include (if possible) the annotations specified in aSourceAnnotationSet.
 
Methods inherited from interface gate.LanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, init, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 
Methods inherited from interface java.lang.Comparable
compareTo
 

Field Detail

DOCUMENT_URL_PARAMETER_NAME

public static final String DOCUMENT_URL_PARAMETER_NAME
The parameter name for the document URL

See Also:
Constant Field Values

DOCUMENT_MARKUP_AWARE_PARAMETER_NAME

public static final String DOCUMENT_MARKUP_AWARE_PARAMETER_NAME
The parameter name that determines whether or not a document is markup aware

See Also:
Constant Field Values

DOCUMENT_ENCODING_PARAMETER_NAME

public static final String DOCUMENT_ENCODING_PARAMETER_NAME
See Also:
Constant Field Values

DOCUMENT_PRESERVE_CONTENT_PARAMETER_NAME

public static final String DOCUMENT_PRESERVE_CONTENT_PARAMETER_NAME
See Also:
Constant Field Values

DOCUMENT_STRING_CONTENT_PARAMETER_NAME

public static final String DOCUMENT_STRING_CONTENT_PARAMETER_NAME
See Also:
Constant Field Values

DOCUMENT_REPOSITIONING_PARAMETER_NAME

public static final String DOCUMENT_REPOSITIONING_PARAMETER_NAME
See Also:
Constant Field Values

DOCUMENT_START_OFFSET_PARAMETER_NAME

public static final String DOCUMENT_START_OFFSET_PARAMETER_NAME
See Also:
Constant Field Values

DOCUMENT_END_OFFSET_PARAMETER_NAME

public static final String DOCUMENT_END_OFFSET_PARAMETER_NAME
See Also:
Constant Field Values
Method Detail

getSourceUrl

public URL getSourceUrl()
Documents are identified by URLs


setSourceUrl

public void setSourceUrl(URL sourceUrl)
Set method for the document's URL


getSourceUrlOffsets

public Long[] getSourceUrlOffsets()
Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document.


getSourceUrlStartOffset

public Long getSourceUrlStartOffset()
Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. This method gets the start offset.


getSourceUrlEndOffset

public Long getSourceUrlEndOffset()
Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. This method gets the end offset.
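To illustrate the offset pair, the following self-contained sketch slices one document out of a larger packed string. It does not use the GATE API: PackedSlice and slice are hypothetical names, and both the treatment of a null offset as "start or end of the file" and the exclusive end offset are assumptions rather than documented behaviour.

```java
// Hypothetical stand-in; not part of the gate package.
final class PackedSlice {
    /**
     * Returns the part of packed that lies between the two optional offsets.
     * Assumptions: a null start means 0, a null end means packed.length(),
     * and the end offset is exclusive.
     */
    static String slice(String packed, Long start, Long end) {
        long s = (start == null) ? 0L : start;
        long e = (end == null) ? packed.length() : end;
        if (s < 0 || e > packed.length() || s > e) {
            throw new IllegalArgumentException("bad offsets: " + s + ".." + e);
        }
        return packed.substring((int) s, (int) e);
    }
}
```

With these assumptions, slice("AAAABBBBCCCC", 4L, 8L) selects the middle record of a three-record file.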


getContent

public DocumentContent getContent()
The content of the document: wraps e.g. String for text; MPEG for video; etc.


setContent

public void setContent(DocumentContent newContent)
Set method for the document content


getAnnotations

public AnnotationSet getAnnotations()
Get the default set of annotations. The set is created if it doesn't exist yet.


getAnnotations

public AnnotationSet getAnnotations(String name)
Get a named set of annotations. Creates a new set if one with this name doesn't exist yet.


getNamedAnnotationSets

public Map getNamedAnnotationSets()
Returns a map with the named annotation sets


removeAnnotationSet

public void removeAnnotationSet(String name)
Removes one of the named annotation sets. Note that the default annotation set cannot be removed.

Parameters:
name - the name of the annotation set to be removed
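The get-or-create and removal rules for annotation sets described above can be sketched without the GATE classes. In this minimal sketch AnnotationSetRegistry is a hypothetical name, plain strings stand in for annotations, and the default set is created eagerly for brevity; whether the real map of named sets is read-only is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a document's annotation sets; not part of the gate package.
final class AnnotationSetRegistry {
    private final Set<String> defaultSet = new HashSet<>();
    private final Map<String, Set<String>> namedSets = new HashMap<>();

    /** The default set; created eagerly here, so it always exists. */
    Set<String> get() {
        return defaultSet;
    }

    /** A named set is created the first time it is requested. */
    Set<String> get(String name) {
        return namedSets.computeIfAbsent(name, k -> new HashSet<>());
    }

    /** Read-only view of the named sets; the default set is not included (assumption). */
    Map<String, Set<String>> named() {
        return Collections.unmodifiableMap(namedSets);
    }

    /** Removes a named set; the default set is not reachable through this method. */
    void remove(String name) {
        namedSets.remove(name);
    }
}
```

Keeping the default set outside the map is one simple way to guarantee that it cannot be removed by name.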

setMarkupAware

public void setMarkupAware(Boolean b)
Make the document markup-aware. This will trigger the creation of a DocumentFormat object at Document initialisation time; the DocumentFormat object will unpack the markup in the Document and add it as annotations. Documents are not markup-aware by default.

Parameters:
b - markup awareness status.

getMarkupAware

public Boolean getMarkupAware()
Get the markup awareness status of the Document.

Returns:
whether the Document is markup aware.

setPreserveOriginalContent

public void setPreserveOriginalContent(Boolean b)
Allow/disallow preserving of the original document content. If b is true, the original content will be retrieved from the DocumentContent object and preserved as a document feature.


getPreserveOriginalContent

public Boolean getPreserveOriginalContent()
Get the preserving of content status of the Document.

Returns:
whether the Document should preserve its original content.

setCollectRepositioningInfo

public void setCollectRepositioningInfo(Boolean b)
Allow/disallow collecting of repositioning information. If b is true, the information will be retrieved and preserved as a document feature.
Preserving repositioning information makes it possible to convert coordinates between the original document content and the text extracted from the document.


getCollectRepositioningInfo

public Boolean getCollectRepositioningInfo()
Reports whether repositioning information is collected and preserved for the Document.
Preserving repositioning information makes it possible to convert coordinates between the original document content and the text extracted from the document.

Returns:
whether the Document should collect and preserve repositioning information.
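As a rough illustration of what "converting coordinates" can mean, the sketch below keeps a table of copied character runs and maps an offset in the extracted text back to the original content. OffsetTable, addRun and toOriginal are hypothetical names, and the table layout is an assumption; this is not how GATE itself stores repositioning information.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an offset table; not GATE's own repositioning class.
final class OffsetTable {
    private static final class Run {
        final long original;
        final long extracted;
        final long length;

        Run(long original, long extracted, long length) {
            this.original = original;
            this.extracted = extracted;
            this.length = length;
        }
    }

    private final List<Run> runs = new ArrayList<>();

    /** Records that length characters starting at original were copied to extracted. */
    void addRun(long original, long extracted, long length) {
        runs.add(new Run(original, extracted, length));
    }

    /** Maps an offset in the extracted text to the original content, or -1 if unknown. */
    long toOriginal(long extractedOffset) {
        for (Run r : runs) {
            if (extractedOffset >= r.extracted && extractedOffset < r.extracted + r.length) {
                return r.original + (extractedOffset - r.extracted);
            }
        }
        return -1;
    }
}
```

For the original content "<b>Hi</b> you" and the extracted text "Hi you", the runs would be (3, 0, 2) and (9, 2, 4), so extracted offset 3 (the "y") maps back to original offset 10.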

toXml

public String toXml()
Returns a GateXml document. This document is actually a serialization of a Gate Document in XML.

Returns:
a string representing a Gate Xml document

toXml

public String toXml(Set aSourceAnnotationSet,
                    boolean includeFeatures)
Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before processing the document) and to include (if possible) the annotations specified in aSourceAnnotationSet. Warning: annotations from aSourceAnnotationSet will be lost if they would cause a crossed-over situation.

Parameters:
aSourceAnnotationSet - an annotation set containing all the annotations that will be combined with the original markup set.
includeFeatures - determines whether features and GATE IDs of the annotations should be included as attributes on the tags. If false, only the annotation types are exported as tags, with no attributes.
Returns:
a string representing an XML document containing the original markup plus the annotations dumped from aSourceAnnotationSet
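One plausible reading of a "crossed-over situation" is a pair of spans that overlap without one containing the other, since such spans cannot be written as properly nested XML tags. The sketch below tests for that condition; Span and crosses are hypothetical names, the exclusive end offset is an assumption, and the check is not taken from the GATE implementation.

```java
// Hypothetical helper; not part of the gate package.
final class Span {
    final long start;
    final long end; // exclusive (assumption)

    Span(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** True if the two spans overlap but neither contains the other. */
    boolean crosses(Span other) {
        boolean overlap = start < other.end && other.start < end;
        boolean thisContainsOther = start <= other.start && other.end <= end;
        boolean otherContainsThis = other.start <= start && end <= other.end;
        return overlap && !thisContainsOther && !otherContainsThis;
    }
}
```

Under this reading, spans 0..10 and 5..15 cross, while 0..10 and 2..5 nest cleanly and 0..5 and 5..10 merely touch.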

toXml

public String toXml(Set aSourceAnnotationSet)
Equivalent to toXml(aSourceAnnotationSet, true).


edit

public void edit(Long start,
                 Long end,
                 DocumentContent replacement)
          throws InvalidOffsetException
Make changes to the content.

Throws:
InvalidOffsetException
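The shape of edit (a range plus a replacement, with a checked exception for bad offsets) can be sketched on a plain string buffer. EditableText and OffsetException are hypothetical stand-ins for the GATE types, and the rules used here (end offset exclusive, null replacement deletes the range, null offsets are rejected) are assumptions, not documented behaviour.

```java
// Hypothetical stand-in for GATE's InvalidOffsetException, which is not used here.
final class OffsetException extends Exception {
    OffsetException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a document whose content is plain text.
final class EditableText {
    private final StringBuilder text;

    EditableText(String initial) {
        text = new StringBuilder(initial);
    }

    /** Replaces [start, end) with replacement; a null replacement deletes the range. */
    void edit(Long start, Long end, String replacement) throws OffsetException {
        if (start == null || end == null || start < 0 || end > text.length() || start > end) {
            throw new OffsetException("invalid offsets: " + start + ", " + end);
        }
        text.replace(start.intValue(), end.intValue(), replacement == null ? "" : replacement);
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

Validating both offsets before touching the buffer means a failed edit leaves the content unchanged.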

addDocumentListener

public void addDocumentListener(DocumentListener l)
Adds a DocumentListener to this document. All registered listeners will be notified of changes that occur to the document.


removeDocumentListener

public void removeDocumentListener(DocumentListener l)
Removes one of the previously registered document listeners.
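The add/remove listener pair follows the usual observer pattern, which can be sketched as below. ChangeListener and ListenerSupport are hypothetical names; the real DocumentListener interface declares its own callback methods, which are not reproduced here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener type; not GATE's DocumentListener.
interface ChangeListener {
    void changed(String description);
}

// Hypothetical helper that a document implementation could delegate to.
final class ListenerSupport {
    // Copy-on-write so a listener may unregister itself while being notified.
    private final List<ChangeListener> listeners = new CopyOnWriteArrayList<>();

    void add(ChangeListener l) {
        listeners.add(l);
    }

    void remove(ChangeListener l) {
        listeners.remove(l);
    }

    /** Notifies every currently registered listener. */
    void fire(String description) {
        for (ChangeListener l : listeners) {
            l.changed(description);
        }
    }
}
```

A listener that has been removed receives no further notifications, which is the behaviour the two methods above imply.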


setSourceUrlEndOffset

public void setSourceUrlEndOffset(Long sourceUrlEndOffset)
Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. This method sets the end offset.


setSourceUrlStartOffset

public void setSourceUrlStartOffset(Long sourceUrlStartOffset)
Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. This method sets the start offset.