gate.corpora
Class DocumentImpl

java.lang.Object
  |
  +--gate.util.AbstractFeatureBearer
        |
        +--gate.creole.AbstractResource
              |
              +--gate.creole.AbstractLanguageResource
                    |
                    +--gate.corpora.DocumentImpl
All Implemented Interfaces:
Comparable, CreoleListener, DatastoreListener, Document, EventListener, FeatureBearer, LanguageResource, NameBearer, Resource, Serializable, TextualDocument
Direct Known Subclasses:
DatabaseDocumentImpl

public class DocumentImpl
extends AbstractLanguageResource
implements TextualDocument, CreoleListener, DatastoreListener

Represents the commonalities between all sorts of documents.

Editing

The DocumentImpl class implements the Document interface. The DocumentContentImpl class models the textual or audio-visual materials which are the source and content of Documents. The AnnotationSetImpl class supplies annotations on Documents.

Abbreviations: D = Document; DC = DocumentContent; AS = AnnotationSet.

We add an edit method to each of these classes. On DC and AS the method is package-private; on D it is public.

   void edit(Long start, Long end, DocumentContent replacement)
   throws InvalidOffsetException;
 

D receives edit requests and forwards them to DC and AS. On DC, this method changes the content, e.g. by replacing the String range from start to end with replacement. (Deletions are handled by passing replacement = null.) D then calls AS.edit on each of its annotation sets.
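For plain text, the DC side of an edit amounts to a string range replacement. The sketch below assumes a String-backed content; StringContentSketch is a hypothetical class, not GATE's DocumentContentImpl, and it throws IllegalArgumentException where GATE declares InvalidOffsetException.

```java
// Hypothetical String-backed stand-in for DC; not GATE's DocumentContentImpl.
final class StringContentSketch {
    private final StringBuilder text;

    StringContentSketch(String initial) {
        this.text = new StringBuilder(initial);
    }

    // Corresponds to DC.size(): the length of the content.
    long size() {
        return text.length();
    }

    // Replaces the half-open range [start, end) with replacement;
    // a null replacement deletes the range.
    void edit(long start, long end, String replacement) {
        if (start < 0 || end < start || end > text.length()) {
            // GATE declares InvalidOffsetException; a JDK exception keeps this self-contained.
            throw new IllegalArgumentException("invalid range " + start + "-" + end);
        }
        text.replace((int) start, (int) end, replacement == null ? "" : replacement);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        StringContentSketch dc = new StringContentSketch("abcd");
        dc.edit(1, 3, "BBCC");   // "abcd" -> "aBBCCd"
        dc.edit(1, 5, null);     // delete "BBCC" -> "ad"
        System.out.println(dc);  // prints ad
    }
}
```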

On AS, edit calls replacement.size() (i.e. DC.size()) to find out how long the replacement is (0 for null). It then treats annotations that terminate (start or end) in the altered or deleted range as invalid, and adjusts the offsets of annotations that terminate after the range.

A note on AS and annotations: annotations no longer have offsets as in the old model; they now have nodes, and nodes have offsets.

To implement AS.edit, we have several indices:

   HashMap annotsByStartNode, annotsByEndNode;
 
which map node ids to annotations, and

   RBTreeMap nodesByOffset;

which maps offsets to nodes.
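A minimal sketch of these indices using only java.util, assuming that java.util.TreeMap (itself a red-black tree) can stand in for RBTreeMap. All type names are hypothetical. The subMap view returns exactly the nodes in a half-open offset range, which is what an edit needs to traverse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the AS indices; names are illustrative only.
final class AnnotationIndexSketch {
    static final class Node {
        final int id;
        long offset;
        Node(int id, long offset) { this.id = id; this.offset = offset; }
    }

    static final class Annot {
        final int id;
        final Node start, end;
        Annot(int id, Node start, Node end) { this.id = id; this.start = start; this.end = end; }
    }

    // node id -> annotations that start / end on that node
    final Map<Integer, List<Annot>> annotsByStartNode = new HashMap<>();
    final Map<Integer, List<Annot>> annotsByEndNode = new HashMap<>();
    // offset -> node; java.util.TreeMap is red-black-tree based
    final TreeMap<Long, Node> nodesByOffset = new TreeMap<>();
    private int nextId = 0;

    // Returns the node at this offset, creating it on first use (one node per offset).
    Node nodeAt(long offset) {
        return nodesByOffset.computeIfAbsent(offset, o -> new Node(nextId++, o));
    }

    Annot add(long startOffset, long endOffset) {
        Annot a = new Annot(nextId++, nodeAt(startOffset), nodeAt(endOffset));
        annotsByStartNode.computeIfAbsent(a.start.id, k -> new ArrayList<>()).add(a);
        annotsByEndNode.computeIfAbsent(a.end.id, k -> new ArrayList<>()).add(a);
        return a;
    }

    // Nodes whose offsets fall in the half-open range [start, end).
    NavigableMap<Long, Node> nodesIn(long start, long end) {
        return nodesByOffset.subMap(start, true, end, false);
    }

    public static void main(String[] args) {
        AnnotationIndexSketch as = new AnnotationIndexSketch();
        as.add(0, 4);
        as.add(2, 6);
        System.out.println(as.nodesIn(1, 5).keySet()); // [2, 4]
    }
}
```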

When we get an edit request, we traverse that part of the nodesByOffset tree representing the altered or deleted range of the DC. For each node found, we delete any annotations that terminate on the node, and then delete the node itself. We then traverse the rest of the tree, changing the offset on all remaining nodes by:

   newOffset =
     oldOffset -
     (
       (end - start) -                                     // size of mod
       ( (replacement == null) ? 0 : replacement.size() )  // size of repl
     );
 
Note that we use the same convention as java.lang.String: start offsets are inclusive and end offsets are exclusive, so for the string "abcd" the range 1-3 is "bc". Examples, for a node with offset 4:
 edit(1, 3, "BC");
 newOffset = 4 - ( (3 - 1) - 2 ) = 4

 edit(1, 3, null);
 newOffset = 4 - ( (3 - 1) - 0 ) = 2

 edit(1, 3, "BBCC");
 newOffset = 4 - ( (3 - 1) - 4 ) = 6
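Putting the traversal and the offset formula together, a sketch of AS.edit might look as follows. It assumes the half-open convention above, so nodes with start <= offset < end are deleted (together with every annotation that starts or ends on them) and nodes at or after end are shifted; replacementSize is 0 for a deletion. The class and its fields are hypothetical, not GATE's AnnotationSetImpl, and TreeMap stands in for RBTreeMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of AS.edit; not GATE's AnnotationSetImpl.
final class AnnotationSetEditSketch {
    static final class Node {
        final int id;
        long offset;
        Node(int id, long offset) { this.id = id; this.offset = offset; }
    }

    static final class Annot {
        final int id;
        final Node start, end;
        Annot(int id, Node start, Node end) { this.id = id; this.start = start; this.end = end; }
    }

    final TreeMap<Long, Node> nodesByOffset = new TreeMap<>();
    final Map<Integer, List<Annot>> annotsByStartNode = new HashMap<>();
    final Map<Integer, List<Annot>> annotsByEndNode = new HashMap<>();
    final Map<Integer, Annot> annotsById = new HashMap<>();
    private int nextId = 0;

    // newOffset = oldOffset - ((end - start) - replacementSize)
    static long newOffset(long oldOffset, long start, long end, long replacementSize) {
        return oldOffset - ((end - start) - replacementSize);
    }

    private Node nodeAt(long offset) {
        return nodesByOffset.computeIfAbsent(offset, o -> new Node(nextId++, o));
    }

    Annot add(long startOffset, long endOffset) {
        Annot a = new Annot(nextId++, nodeAt(startOffset), nodeAt(endOffset));
        annotsByStartNode.computeIfAbsent(a.start.id, k -> new ArrayList<>()).add(a);
        annotsByEndNode.computeIfAbsent(a.end.id, k -> new ArrayList<>()).add(a);
        annotsById.put(a.id, a);
        return a;
    }

    // replacementSize is 0 when the replacement is null (a deletion).
    void edit(long start, long end, long replacementSize) {
        // 1. Delete the nodes in [start, end) and every annotation terminating on them.
        NavigableMap<Long, Node> doomed = nodesByOffset.subMap(start, true, end, false);
        for (Node n : doomed.values()) {
            List<Annot> hits = new ArrayList<>(annotsByStartNode.getOrDefault(n.id, Collections.emptyList()));
            hits.addAll(annotsByEndNode.getOrDefault(n.id, Collections.emptyList()));
            for (Annot a : hits) {
                if (annotsById.remove(a.id) != null) { // skip if already removed
                    annotsByStartNode.get(a.start.id).remove(a);
                    annotsByEndNode.get(a.end.id).remove(a);
                }
            }
            annotsByStartNode.remove(n.id);
            annotsByEndNode.remove(n.id);
        }
        doomed.clear(); // the view is backed by nodesByOffset, so this removes the nodes

        // 2. Shift every remaining node at or after end; nodes before start are untouched.
        NavigableMap<Long, Node> tail = nodesByOffset.tailMap(end, true);
        List<Node> moved = new ArrayList<>(tail.values());
        tail.clear();
        for (Node n : moved) {
            n.offset = newOffset(n.offset, start, end, replacementSize);
            nodesByOffset.put(n.offset, n);
        }
    }

    public static void main(String[] args) {
        AnnotationSetEditSketch set = new AnnotationSetEditSketch();
        Annot inside = set.add(1, 2); // lies inside the edited range 1-3
        Annot after = set.add(3, 4);  // lies after it
        set.edit(1, 3, 0);            // delete range 1-3 (replacement == null)
        System.out.println(set.annotsById.containsKey(inside.id));     // false
        System.out.println(after.start.offset + "-" + after.end.offset); // 1-2
    }
}
```

Under this reading, a pure insertion (start == end) deletes nothing and pushes every node at or after start to the right. Shifted nodes cannot collide with the untouched ones, because the smallest new offset is start + replacementSize.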
 

See Also:
Serialized Form

Inner Class Summary
(package private)  class DocumentImpl.AnnotationComparator
          Inner class needed to compare annotations
 
Field Summary
private  int ASC
          Constant used in the inner class AnnotationComparator to order annotations ascending
private  Boolean collectRepositioningInfo
          If you set this flag to true the repositioning information for the document will be kept in the document feature.
protected  DocumentContent content
          The content of the document
private  Annotation crossedOverAnnotation
          This is a variable which contains the latest crossed over annotation found during export with preserving format, i.e., toXml(annotations) method.
private static boolean DEBUG
          Debug flag
protected  AnnotationSet defaultAnnots
          The default annotation set
private  int DESC
          Constant used in the inner class AnnotationComparator to order annotations descending
private  int DOC_SIZE_MULTIPLICATION_FACTOR
          This field is used when creating StringBuffers for toXml() methods.
private  Vector documentListeners
           
protected  String encoding
          The encoding of the source of the document content
private static Map entitiesMap
          A map, initialized in init(), containing entities that need to be replaced in strings
private  Vector gateListeners
           
protected  Boolean markupAware
          Is the document markup-aware?
protected  Map namedAnnotSets
          Named sets of annotations
protected  int nextAnnotationId
          The id of the next new annotation
protected  int nextNodeId
          The id of the next new node
private  int ORDER_ON_ANNOT_ID
          Constant used in the inner class AnnotationComparator to order annotations on their ID
private  int ORDER_ON_END_OFFSET
          Constant used in the inner class AnnotationComparator to order annotations on their end offset
private  int ORDER_ON_START_OFFSET
          Constant used in the inner class AnnotationComparator to order annotations on their start offset
private  Boolean preserveOriginalContent
          If you set this flag to true the original content of the document will be kept in the document feature.
private  String rootEnd
          The closing tag for the document root.
(package private) static long serialVersionUID
          Freeze the serialization UID.
private  Integer smallestAnnotationID
          Used by the XML dump preserving format method to remember the smallest annotation ID as a marker for the XML document root.
protected  URL sourceUrl
          The source URL
protected  Long sourceUrlEndOffset
          The end of the range that the content comes from at the source URL (or null if none).
protected  Long sourceUrlStartOffset
          The start of the range that the content comes from at the source URL (or null if none).
private  String stringContent
          A property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL.
 
Fields inherited from class gate.creole.AbstractLanguageResource
dataStore, lrPersistentId
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.Document
DOCUMENT_ENCODING_PARAMETER_NAME, DOCUMENT_END_OFFSET_PARAMETER_NAME, DOCUMENT_MARKUP_AWARE_PARAMETER_NAME, DOCUMENT_PRESERVE_CONTENT_PARAMETER_NAME, DOCUMENT_REPOSITIONING_PARAMETER_NAME, DOCUMENT_START_OFFSET_PARAMETER_NAME, DOCUMENT_STRING_CONTENT_PARAMETER_NAME, DOCUMENT_URL_PARAMETER_NAME
 
Constructor Summary
DocumentImpl()
          Default construction.
 
Method Summary
(package private) static void ()
           
 void addDocumentListener(DocumentListener l)
          Adds a DocumentListener to this document.
private  int analyseAmpCodding(String content)
          This function computes the size of an ampersand-coded sequence when the semicolon is not present.
private  String annotationSetToXml(AnnotationSet anAnnotationSet)
          This method saves an AnnotationSet as XML.
private  void buildEntityMapFromString(String aScanString, TreeMap aMapToFill)
          This method takes aScanString and searches for those chars from entitiesMap that appear in the string.
 void cleanup()
          Clear all the data members of the object.
private  void collectInformationForAmpCodding(String content, RepositioningInfo info, boolean shouldCorrectCR)
          Collect information for the substitution of "&xxx;" with "y". Position information for some Unicode and &-coded symbols cannot be collected during parsing.
private  void collectInformationForWS(String content, RepositioningInfo info)
          The HTML parser replaces runs of multiple whitespace (WS) characters with a single WS.
 int compareTo(Object o)
          Ordering based on URL.toString() and the URL offsets (if any)
private  void correctRepositioningForCRLFInXML(String content, RepositioningInfo info)
          Correct repositioning information for substitution of "\r\n" with "\n"
 void datastoreClosed(CreoleEvent e)
          Called when a DataStore has been closed
 void datastoreCreated(CreoleEvent e)
          Called when a DataStore has been created
 void datastoreOpened(CreoleEvent e)
          Called when a DataStore has been opened
 void edit(Long start, Long end, DocumentContent replacement)
          Propagate edit changes to the document content and annotations.
private  String featuresToXml(FeatureMap aFeatureMap)
          This method saves a FeatureMap as XML elements.
private  StringBuffer filterNonXmlChars(StringBuffer aStrBuffer)
          This method filters out any non-XML char (see http://www.w3c.org/TR/2000/REC-xml-20001006#charsets). All non-XML chars are replaced with 0x20 (the space char), which ensures that the document can be loaded again without problems.
protected  void fireAnnotationSetAdded(DocumentEvent e)
           
protected  void fireAnnotationSetRemoved(DocumentEvent e)
           
 AnnotationSet getAnnotations()
          Get the default set of annotations.
 AnnotationSet getAnnotations(String name)
          Get a named set of annotations.
private  List getAnnotationsForOffset(Set aDumpAnnotSet, Long offset)
          This method returns a list of the annotations at the given offset, ordered so that they can be serialized from left to right.
 Boolean getCollectRepositioningInfo()
          Get the collecting and preserving of repositioning information for the Document.
 DocumentContent getContent()
          The content of the document: a String for text; MPEG for video; etc.
 String getEncoding()
          Get the encoding of the document content source
 Boolean getMarkupAware()
          Get the markup awareness status of the Document.
 Map getNamedAnnotationSets()
          Returns a map with the named annotation sets.
 Integer getNextAnnotationId()
          Generate and return the next annotation ID
 Integer getNextNodeId()
          Generate and return the next node ID
protected  String getOrderingString()
          Utility method to produce a string for comparison in ordering.
 Boolean getPreserveOriginalContent()
          Get the preserving of content status of the Document.
 URL getSourceUrl()
          Documents are identified by URLs
 Long getSourceUrlEndOffset()
          Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
 Long[] getSourceUrlOffsets()
          Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
 Long getSourceUrlStartOffset()
          Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
 String getStringContent()
          The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL.
private  boolean hasOriginalContentFeatures()
          Return true only if the document has features for original content and repositioning information.
 Resource init()
          Initialise this resource, and return it.
private  boolean insertsSafety(AnnotationSet aTargetAnnotSet, Annotation aSourceAnnotation)
          This method verifies whether aSourceAnnotation can be inserted safely into aTargetAnnotSet.
private  boolean isNumber(char ch, boolean hex)
          Check for numeric range.
 boolean isValidOffset(Long offset)
          Check that an offset is valid, i.e.
 boolean isValidOffsetRange(Long start, Long end)
          Check that both start and end are valid offsets and that they constitute a valid offset range, i.e.
static boolean isXmlChar(char ch)
          This method decides whether a char is a valid XML character
 void removeAnnotationSet(String name)
          Removes one of the named annotation sets.
 void removeDocumentListener(DocumentListener l)
          Removes one of the previously registered document listeners.
private  StringBuffer replaceCharsWithEntities(String anInputString)
          This method replaces every char in anInputString that also appears in entitiesMap with its corresponding entity
 void resourceAdopted(DatastoreEvent evt)
          Called by a datastore when a new resource has been adopted
 void resourceDeleted(DatastoreEvent evt)
          Called by a datastore when a resource has been deleted
 void resourceLoaded(CreoleEvent e)
          Called when a new Resource has been loaded into the system
 void resourceRenamed(Resource resource, String oldName, String newName)
          Called when the creole register has renamed a resource.
 void resourceUnloaded(CreoleEvent e)
          Called when a Resource has been removed from the system
 void resourceWritten(DatastoreEvent evt)
          Called by a datastore when a resource has been written into the datastore
private  String saveAnnotationSetAsXml(AnnotationSet aDumpAnnotSet, boolean includeFeatures)
          This method saves all the annotations from aDumpAnnotSet and combines them with the document content.
private  String saveAnnotationSetAsXmlInOrig(Set aSourceAnnotationSet, boolean includeFeatures)
          This method saves all the annotations from aSourceAnnotationSet and combines them with the original document content, if preserved as a feature.
 void setCollectRepositioningInfo(Boolean b)
          Allow/disallow collecting of repositioning information.
 void setContent(DocumentContent content)
          Set method for the document content
 void setDataStore(DataStore dataStore)
          Set the data store that this LR lives in.
 void setEncoding(String encoding)
          Set the encoding of the document content source
 void setLRPersistenceId(Object lrID)
          Sets the persistence id of this LR.
 void setMarkupAware(Boolean newMarkupAware)
          Make the document markup-aware.
 void setNextAnnotationId(int aNextAnnotationId)
          Sets the nextAnnotationId
 void setPreserveOriginalContent(Boolean b)
          Allow/disallow preserving of the original document content.
 void setSourceUrl(URL sourceUrl)
          Set method for the document's URL
 void setSourceUrlEndOffset(Long sourceUrlEndOffset)
          Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
 void setSourceUrlStartOffset(Long sourceUrlStartOffset)
          Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
 void setStringContent(String stringContent)
          The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL.
private  String textWithNodes(String aText)
          This method creates Node XML elements and inserts them at the corresponding offset inside the text.
 String toString()
          String representation
 String toXml()
          Returns a GateXml document, a custom XML format for which there is a reader inside GATE called gate.xml.GateFormatXmlHandler.
 String toXml(Set aSourceAnnotationSet)
          Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before the document was processed) and to include, if possible, the annotations specified in aSourceAnnotationSet.
 String toXml(Set aSourceAnnotationSet, boolean includeFeatures)
          Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before the document was processed) and to include, if possible, the annotations specified in aSourceAnnotationSet.
private  String writeEmptyTag(Annotation annot)
           
private  String writeEmptyTag(Annotation annot, boolean includeNamespace)
          Returns a string representing an empty tag based on the input annot
private  String writeEndTag(Annotation annot)
          Returns a string representing an end tag based on the input annot
private  String writeFeatures(FeatureMap feat, boolean includeNamespace)
          Returns a string representing a FeatureMap serialized as XML attributes
private  String writeStartTag(Annotation annot, boolean includeFeatures)
           
private  String writeStartTag(Annotation annot, boolean includeFeatures, boolean includeNamespace)
          Returns a string representing a start tag based on the input annot
 
Methods inherited from class gate.creole.AbstractLanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 
Methods inherited from interface gate.LanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync
 
Methods inherited from interface gate.Resource
getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 

Field Detail

DEBUG

private static final boolean DEBUG
Debug flag

preserveOriginalContent

private Boolean preserveOriginalContent
If you set this flag to true the original content of the document will be kept in the document feature.
The default value is false, to avoid wasting memory unnecessarily.

collectRepositioningInfo

private Boolean collectRepositioningInfo
If you set this flag to true the repositioning information for the document will be kept in the document feature.
The default value is false, to avoid wasting time and memory unnecessarily.

crossedOverAnnotation

private Annotation crossedOverAnnotation
This is a variable which contains the latest crossed over annotation found during export with preserving format, i.e., toXml(annotations) method.

nextAnnotationId

protected int nextAnnotationId
The id of the next new annotation

nextNodeId

protected int nextNodeId
The id of the next new node

sourceUrl

protected URL sourceUrl
The source URL

content

protected DocumentContent content
The content of the document

encoding

protected String encoding
The encoding of the source of the document content

smallestAnnotationID

private Integer smallestAnnotationID
Used by the XML dump preserving format method to remember the smallest annotation ID as a marker for the XML document root.

rootEnd

private String rootEnd
The closing tag for the document root.

DOC_SIZE_MULTIPLICATION_FACTOR

private final int DOC_SIZE_MULTIPLICATION_FACTOR
This field is used when creating StringBuffers for the toXml() methods. The initial size of the StringBuffer is the size of the document content multiplied by this value, which is intended to improve StringBuffer performance.

ORDER_ON_START_OFFSET

private final int ORDER_ON_START_OFFSET
Constant used in the inner class AnnotationComparator to order annotations on their start offset

ORDER_ON_END_OFFSET

private final int ORDER_ON_END_OFFSET
Constant used in the inner class AnnotationComparator to order annotations on their end offset

ORDER_ON_ANNOT_ID

private final int ORDER_ON_ANNOT_ID
Constant used in the inner class AnnotationComparator to order annotations on their ID

ASC

private final int ASC
Constant used in the inner class AnnotationComparator to order annotations ascending

DESC

private final int DESC
Constant used in the inner class AnnotationComparator to order annotations descending

entitiesMap

private static Map entitiesMap
A map, initialized in init(), containing entities that need to be replaced in strings

sourceUrlStartOffset

protected Long sourceUrlStartOffset
The start of the range that the content comes from at the source URL (or null if none).

sourceUrlEndOffset

protected Long sourceUrlEndOffset
The end of the range that the content comes from at the source URL (or null if none).

defaultAnnots

protected AnnotationSet defaultAnnots
The default annotation set

namedAnnotSets

protected Map namedAnnotSets
Named sets of annotations

stringContent

private String stringContent
A property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL.

markupAware

protected Boolean markupAware
Is the document markup-aware?

serialVersionUID

static final long serialVersionUID
Freeze the serialization UID.

documentListeners

private transient Vector documentListeners

gateListeners

private transient Vector gateListeners
Constructor Detail

DocumentImpl

public DocumentImpl()
Default construction. Content left empty.
Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Initialise this resource, and return it.
Specified by:
init in interface Resource
Overrides:
init in class AbstractResource

correctRepositioningForCRLFInXML

private void correctRepositioningForCRLFInXML(String content,
                                              RepositioningInfo info)
Correct repositioning information for substitution of "\r\n" with "\n"

collectInformationForAmpCodding

private void collectInformationForAmpCodding(String content,
                                             RepositioningInfo info,
                                             boolean shouldCorrectCR)
Collect information for the substitution of "&xxx;" with "y". Position information for some Unicode and &-coded symbols cannot be collected during parsing, because the parser hides the position of such parsed text. There is therefore little chance that an &-coded symbol lies inside an area already covered by repositioning records, so a new record should be created for every coded symbol outside the existing records.
If the shouldCorrectCR flag is true, the correction for CRLF substitution is performed.

analyseAmpCodding

private int analyseAmpCodding(String content)
This function computes the size of an ampersand-coded sequence when the semicolon is not present.

isNumber

private boolean isNumber(char ch,
                         boolean hex)
Check for numeric range. If hex is true, the A..F range is also included.
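A sketch of the check, presumably used when scanning numeric character references such as &#65; or &#x41;. Accepting lowercase a..f in hex mode is an assumption; the description above only mentions A..F. NumberCheckSketch is hypothetical.

```java
// Hypothetical sketch of isNumber(char, boolean).
final class NumberCheckSketch {
    // True for a decimal digit or, when hex is true, also for A-F / a-f.
    // Accepting lowercase a-f is an assumption.
    static boolean isNumber(char ch, boolean hex) {
        if (ch >= '0' && ch <= '9') return true;
        if (!hex) return false;
        return (ch >= 'A' && ch <= 'F') || (ch >= 'a' && ch <= 'f');
    }

    public static void main(String[] args) {
        System.out.println(isNumber('7', false)); // true
        System.out.println(isNumber('B', false)); // false
        System.out.println(isNumber('B', true));  // true
    }
}
```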

collectInformationForWS

private void collectInformationForWS(String content,
                                     RepositioningInfo info)
The HTML parser replaces runs of multiple whitespace (WS) characters with a single WS. To build a correct repositioning information structure, the information about such multiple WS runs must be kept.
The criteria for WS is (ch <= ' ').
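The sketch below locates the runs that matter, using the same criterion ch <= ' '. It only reports {start, length} pairs for runs of two or more characters; how GATE turns these into RepositioningInfo records is not shown, and WhitespaceRunSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds the whitespace runs an HTML parser would collapse.
final class WhitespaceRunSketch {
    // Each result is {startOffset, runLength} for a run of two or more chars
    // satisfying ch <= ' '.
    static List<int[]> multiWhitespaceRuns(String content) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) <= ' ') {
                int start = i;
                while (i < content.length() && content.charAt(i) <= ' ') i++;
                if (i - start > 1) runs.add(new int[] {start, i - start});
            } else {
                i++;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (int[] run : multiWhitespaceRuns("a  b\t\n\nc d")) {
            System.out.println(run[0] + ":" + run[1]); // 1:2, then 4:3
        }
    }
}
```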

cleanup

public void cleanup()
Clear all the data members of the object.
Specified by:
cleanup in interface Resource
Overrides:
cleanup in class AbstractLanguageResource

getSourceUrl

public URL getSourceUrl()
Documents are identified by URLs
Specified by:
getSourceUrl in interface Document

setSourceUrl

public void setSourceUrl(URL sourceUrl)
Set method for the document's URL
Specified by:
setSourceUrl in interface Document

getSourceUrlOffsets

public Long[] getSourceUrlOffsets()
Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document.
Specified by:
getSourceUrlOffsets in interface Document

setPreserveOriginalContent

public void setPreserveOriginalContent(Boolean b)
Allow/disallow preserving of the original document content. If true, the original content will be retrieved from the DocumentContent object and preserved as a document feature.
Specified by:
setPreserveOriginalContent in interface Document

getPreserveOriginalContent

public Boolean getPreserveOriginalContent()
Get the preserving of content status of the Document.
Specified by:
getPreserveOriginalContent in interface Document
Returns:
whether the Document should preserve its original content.

setCollectRepositioningInfo

public void setCollectRepositioningInfo(Boolean b)
Allow/disallow collecting of repositioning information. If true, the information will be retrieved and preserved as a document feature.
Preserving repositioning information makes it possible to convert coordinates between the original document content and the text extracted from the document.
Specified by:
setCollectRepositioningInfo in interface Document

getCollectRepositioningInfo

public Boolean getCollectRepositioningInfo()
Get the collecting and preserving of repositioning information for the Document.
Preserving repositioning information makes it possible to convert coordinates between the original document content and the text extracted from the document.
Specified by:
getCollectRepositioningInfo in interface Document
Returns:
whether the Document should collect and preserve information.

getSourceUrlStartOffset

public Long getSourceUrlStartOffset()
Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document. This method gets the start offset.
Specified by:
getSourceUrlStartOffset in interface Document

setSourceUrlStartOffset

public void setSourceUrlStartOffset(Long sourceUrlStartOffset)
Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document. This method sets the start offset.
Specified by:
setSourceUrlStartOffset in interface Document

getSourceUrlEndOffset

public Long getSourceUrlEndOffset()
Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document. This method gets the end offset.
Specified by:
getSourceUrlEndOffset in interface Document

setSourceUrlEndOffset

public void setSourceUrlEndOffset(Long sourceUrlEndOffset)
Documents may be packed within files; in this case an optional pair of offsets refer to the location of the document. This method sets the end offset.
Specified by:
setSourceUrlEndOffset in interface Document

getContent

public DocumentContent getContent()
The content of the document: a String for text; MPEG for video; etc.
Specified by:
getContent in interface Document

setContent

public void setContent(DocumentContent content)
Set method for the document content
Specified by:
setContent in interface Document

getEncoding

public String getEncoding()
Get the encoding of the document content source
Specified by:
getEncoding in interface TextualDocument
Following copied from interface: gate.TextualDocument
Returns:
a String value.

setEncoding

public void setEncoding(String encoding)
Set the encoding of the document content source

getAnnotations

public AnnotationSet getAnnotations()
Get the default set of annotations. The set is created if it doesn't exist yet.
Specified by:
getAnnotations in interface Document

getAnnotations

public AnnotationSet getAnnotations(String name)
Get a named set of annotations. Creates a new set if one with this name doesn't exist yet. If the provided name is null then it returns the default annotation set.
Specified by:
getAnnotations in interface Document
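The lazy-creation behaviour of the two getAnnotations methods can be sketched with Map.computeIfAbsent. Plain Set<String> objects stand in for AnnotationSet here, and NamedSetsSketch is a hypothetical class, not DocumentImpl itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; Set<String> stands in for AnnotationSet.
final class NamedSetsSketch {
    private Set<String> defaultAnnots;               // created on first request
    private Map<String, Set<String>> namedAnnotSets; // created on first request

    Set<String> getAnnotations() {
        if (defaultAnnots == null) defaultAnnots = new HashSet<>();
        return defaultAnnots;
    }

    Set<String> getAnnotations(String name) {
        if (name == null) return getAnnotations(); // null name -> default set
        if (namedAnnotSets == null) namedAnnotSets = new HashMap<>();
        return namedAnnotSets.computeIfAbsent(name, n -> new HashSet<>());
    }

    public static void main(String[] args) {
        NamedSetsSketch doc = new NamedSetsSketch();
        doc.getAnnotations("Example").add("Token");
        System.out.println(doc.getAnnotations("Example"));                  // [Token]
        System.out.println(doc.getAnnotations(null) == doc.getAnnotations()); // true
    }
}
```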

setMarkupAware

public void setMarkupAware(Boolean newMarkupAware)
Make the document markup-aware. This will trigger the creation of a DocumentFormat object at Document initialisation time; the DocumentFormat object will unpack the markup in the Document and add it as annotations. Documents are not markup-aware by default.
Specified by:
setMarkupAware in interface Document
Parameters:
newMarkupAware - markup awareness status.

getMarkupAware

public Boolean getMarkupAware()
Get the markup awareness status of the Document. Documents are markup-aware by default.
Specified by:
getMarkupAware in interface Document
Returns:
whether the Document is markup aware.

toXml

public String toXml(Set aSourceAnnotationSet)
Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before the document was processed) and to include, if possible, the annotations specified in aSourceAnnotationSet. It is equivalent to toXml(aSourceAnnotationSet, true).
Specified by:
toXml in interface Document

toXml

public String toXml(Set aSourceAnnotationSet,
                    boolean includeFeatures)
Returns an XML document aiming to preserve the original markup (the original markup will be in the same place and format as it was before the document was processed) and to include, if possible, the annotations specified in aSourceAnnotationSet. Warning: annotations from aSourceAnnotationSet will be lost if they would cause a crossed-over situation.
Specified by:
toXml in interface Document
Parameters:
aSourceAnnotationSet - an annotation set containing all the annotations that will be combined with the original markup set. If the param is null, only the original markup is dumped.
includeFeatures - is a boolean that controls whether the annotation features should be included or not. If false, only the annotation type is included in the tag.
Returns:
a string representing an XML document containing the original markup plus the dumped annotations from aSourceAnnotationSet

insertsSafety

private boolean insertsSafety(AnnotationSet aTargetAnnotSet,
                              Annotation aSourceAnnotation)
This method verifies whether aSourceAnnotation can be inserted safely into aTargetAnnotSet. Safely means that it does not violate the crossed-over condition with any annotation from aTargetAnnotSet.
Parameters:
aTargetAnnotSet - the annotation set to include the aSourceAnnotation
aSourceAnnotation - the annotation to be inserted into the aTargetAnnotSet
Returns:
true if the annotation can be inserted safely, false otherwise.
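One way to express the crossed-over test, assuming annotations are half-open [start, end) spans: two spans cross over when they overlap without one nesting inside the other, since such a pair cannot be written as properly nested XML elements. Span and CrossOverSketch are hypothetical; the real method works on GATE Annotation objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the crossed-over check behind insertsSafety.
final class CrossOverSketch {
    // An annotation reduced to its half-open span [start, end).
    static final class Span {
        final long start, end;
        Span(long start, long end) { this.start = start; this.end = end; }
    }

    // True if a and b partially overlap: neither nests inside the other
    // and they are not disjoint. Shared boundaries do not count.
    static boolean crossesOver(Span a, Span b) {
        return (a.start < b.start && b.start < a.end && a.end < b.end)
            || (b.start < a.start && a.start < b.end && b.end < a.end);
    }

    // The candidate is safe if it crosses over none of the target annotations.
    static boolean insertsSafely(List<Span> target, Span candidate) {
        for (Span s : target) {
            if (crossesOver(s, candidate)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Span> target = Arrays.asList(new Span(0, 10), new Span(2, 5));
        System.out.println(insertsSafely(target, new Span(3, 4))); // true: nested
        System.out.println(insertsSafely(target, new Span(4, 8))); // false: crosses [2, 5)
    }
}
```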

saveAnnotationSetAsXml

private String saveAnnotationSetAsXml(AnnotationSet aDumpAnnotSet,
                                      boolean includeFeatures)
This method saves all the annotations from aDumpAnnotSet and combines them with the document content.
Parameters:
aDumpAnnotSet - a GATE annotation set prepared to be used on the raw text from the document content. If aDumpAnnotSet is null, an empty string is returned.
includeFeatures - is a boolean, which controls whether the annotation features and gate ID are included or not.
Returns:
The XML document obtained from raw text + the information from the dump annotation set.

hasOriginalContentFeatures

private boolean hasOriginalContentFeatures()
Return true only if the document has features for original content and repositioning information.

saveAnnotationSetAsXmlInOrig

private String saveAnnotationSetAsXmlInOrig(Set aSourceAnnotationSet,
                                            boolean includeFeatures)
This method saves all the annotations from aSourceAnnotationSet and combines them with the original document content, if preserved as a feature.
Parameters:
aSourceAnnotationSet - a GATE annotation set prepared to be used on the raw text from the document content. If aSourceAnnotationSet is null, an empty string is returned.
includeFeatures - is a boolean, which controls whether the annotation features and gate ID are included or not.
Returns:
The XML document obtained from raw text + the information from the dump annotation set.

getAnnotationsForOffset

private List getAnnotationsForOffset(Set aDumpAnnotSet,
                                     Long offset)
This method returns a list of the annotations at the given offset, ordered so that they can be serialized from left to right. If either parameter is null, an empty list is returned.
Parameters:
aDumpAnnotSet - is a set containing all annotations that will be dumped.
offset - the offset at which the annotations must start and/or end.
Returns:
a list with those annotations that need to be serialized.

writeStartTag

private String writeStartTag(Annotation annot,
                             boolean includeFeatures)

writeStartTag

private String writeStartTag(Annotation annot,
                             boolean includeFeatures,
                             boolean includeNamespace)
Returns a string representing a start tag based on the input annot

buildEntityMapFromString

private void buildEntityMapFromString(String aScanString,
                                      TreeMap aMapToFill)
This method takes aScanString and searches it for the chars from entitiesMap. A tree map (offset2Char) is filled with one entry per occurrence: the key is the offset at which the char appears and the value is the char itself. If either parameter is null, the method simply returns.
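A minimal sketch of the scan described above. The contents of entitiesMap are not shown in this documentation, so the ENTITIES map below (the five predefined XML entities) is an assumption; only its keys matter for the scan.

```java
import java.util.Map;
import java.util.TreeMap;

class EntityScan {
    // Hypothetical entitiesMap contents; the real map is not documented here.
    static final Map<Character, String> ENTITIES = Map.of(
        '<', "&lt;", '>', "&gt;", '&', "&amp;", '"', "&quot;", '\'', "&apos;");

    // Records offset -> char for every char of aScanString that has an entity.
    static void buildEntityMapFromString(String aScanString, TreeMap<Long, Character> aMapToFill) {
        if (aScanString == null || aMapToFill == null) return;
        for (int i = 0; i < aScanString.length(); i++) {
            char c = aScanString.charAt(i);
            if (ENTITIES.containsKey(c)) aMapToFill.put((long) i, c);
        }
    }
}
```

A TreeMap keeps the offsets sorted, so a later pass can walk the string and the map in step.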

writeEmptyTag

private String writeEmptyTag(Annotation annot)

writeEmptyTag

private String writeEmptyTag(Annotation annot,
                             boolean includeNamespace)
Returns a string representing an empty tag based on the input annot

writeEndTag

private String writeEndTag(Annotation annot)
Returns a string representing an end tag based on the input annot

writeFeatures

private String writeFeatures(FeatureMap feat,
                             boolean includeNamespace)
Returns a string representing a FeatureMap serialized as XML attributes

toXml

public String toXml()
Returns a GateXml document, a custom XML format that can be read back by a reader inside GATE called gate.xml.GateFormatXmlHandler. In other words, this method serializes a GATE document in an XML format.
Specified by:
toXml in interface Document
Returns:
a string representing a GATE XML document. If saved to a file, this string must be written using the UTF-8 encoding, because the first line of the generated XML document is an XML declaration that specifies UTF-8 encoding.
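Because the declared encoding must match the bytes on disk, the string should be encoded explicitly rather than with the platform default charset (which a plain FileWriter uses on older JVMs). A minimal sketch using only the JDK; the xml argument stands in for the result of doc.toXml(), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SaveGateXml {
    // Writes the output of doc.toXml() as UTF-8 bytes, matching the XML declaration.
    static void writeUtf8(String xml, Path file) throws IOException {
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
    }
}
```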

filterNonXmlChars

private StringBuffer filterNonXmlChars(StringBuffer aStrBuffer)
This method filters out any non-XML char (see: http://www.w3c.org/TR/2000/REC-xml-20001006#charsets). Every non-XML char is replaced with 0x20 (the space char), which ensures that the document can be loaded again later without problems.
Parameters:
aStrBuffer - represents the input String that is filtered. If aStrBuffer is null, an empty string will be returned.
Returns:
the "purified" StringBuffer version of the aStrBuffer

isXmlChar

public static boolean isXmlChar(char ch)
This method decides whether a char is a valid XML char.
Parameters:
ch - the char to be tested
Returns:
true if ch is a valid XML char, false if it is not.
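A sketch of isXmlChar and filterNonXmlChars based on the Char production of the XML 1.0 specification linked above. It is an illustration, not GATE's source: because it tests one 16-bit char at a time, surrogate code units are rejected, so a supplementary character (for example an emoji) would become two spaces. The in-place modification of the buffer is also an assumption.

```java
class XmlChars {
    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    // The supplementary range cannot fit in a single char, so it is not tested here.
    static boolean isXmlChar(char ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }

    // Replaces every char rejected by isXmlChar with a space (0x20), in place.
    // A null input yields an empty buffer.
    static StringBuffer filterNonXmlChars(StringBuffer aStrBuffer) {
        if (aStrBuffer == null) return new StringBuffer();
        for (int i = 0; i < aStrBuffer.length(); i++) {
            if (!isXmlChar(aStrBuffer.charAt(i))) aStrBuffer.setCharAt(i, ' ');
        }
        return aStrBuffer;
    }
}
```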

featuresToXml

private String featuresToXml(FeatureMap aFeatureMap)
This method saves a FeatureMap as XML elements.

replaceCharsWithEntities

private StringBuffer replaceCharsWithEntities(String anInputString)
This method replaces every char in anInputString that also appears in entitiesMap with its corresponding entity.
Parameters:
anInputString - the string to be analyzed. If it is null, the empty string is returned.
Returns:
a string representing the input string with chars replaced with entities
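A minimal sketch of the replacement, again assuming the five predefined XML entities as the (undocumented) contents of entitiesMap. Doing the substitution in a single left-to-right pass means an ampersand introduced by one entity is never escaped a second time.

```java
import java.util.Map;

class EntityReplace {
    // Hypothetical entitiesMap contents; the real map is not documented here.
    static final Map<Character, String> ENTITIES = Map.of(
        '<', "&lt;", '>', "&gt;", '&', "&amp;", '"', "&quot;", '\'', "&apos;");

    // Copies anInputString, swapping each mapped char for its entity.
    static StringBuffer replaceCharsWithEntities(String anInputString) {
        StringBuffer out = new StringBuffer();
        if (anInputString == null) return out;
        for (int i = 0; i < anInputString.length(); i++) {
            char c = anInputString.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) out.append(entity);
            else out.append(c);
        }
        return out;
    }
}
```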

textWithNodes

private String textWithNodes(String aText)
This method creates Node XML elements and inserts them at the corresponding offset inside the text. Nodes are created from the default annotation set, as well as from all existing named annotation sets.
Parameters:
aText - The text representing the document's plain text.
Returns:
The text with the empty Node elements inserted at their offsets.
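A sketch of the insertion step only, assuming the node offsets have already been collected from the default and named annotation sets and are passed in as a collection. The element name Node and the id attribute holding the offset are assumptions for illustration, and the text is not entity-escaped here.

```java
import java.util.Collection;
import java.util.TreeSet;

class NodeInserter {
    // Inserts an empty Node element at each distinct offset, in ascending order.
    static String textWithNodes(String aText, Collection<Long> offsets) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (long off : new TreeSet<>(offsets)) {             // sorts and de-duplicates
            if (off < 0 || off > aText.length()) continue;    // ignore offsets outside the text
            out.append(aText, pos, (int) off);                // text up to this node
            out.append("<Node id=\"").append(off).append("\"/>");
            pos = (int) off;
        }
        out.append(aText, pos, aText.length());               // remaining text
        return out.toString();
    }
}
```

Walking the offsets in ascending order lets the method copy each text segment exactly once instead of repeatedly inserting into a growing string.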

annotationSetToXml

private String annotationSetToXml(AnnotationSet anAnnotationSet)
This method saves an AnnotationSet as XML.
Parameters:
anAnnotationSet - The annotation set that has to be saved as XML.
Returns:
a String like this: ....

getNamedAnnotationSets

public Map getNamedAnnotationSets()
Returns a map with the named annotation sets. It returns null if no named annotation set exists.
Specified by:
getNamedAnnotationSets in interface Document

removeAnnotationSet

public void removeAnnotationSet(String name)
Removes one of the named annotation sets. Note that the default annotation set cannot be removed.
Specified by:
removeAnnotationSet in interface Document
Parameters:
name - the name of the annotation set to be removed

edit

public void edit(Long start,
                 Long end,
                 DocumentContent replacement)
          throws InvalidOffsetException
Propagate edit changes to the document content and annotations.
Specified by:
edit in interface Document
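The class description states that annotations starting or ending inside the edited range become invalid, while those after it have their offsets shifted. The shift is the change in length, so a node at offset o after the range moves to o - (end - start) + replacementLength. A minimal sketch of that arithmetic; how offsets exactly at start or end are treated is an assumption of this sketch, as is the INVALID marker.

```java
class EditAdjust {
    static final long INVALID = -1L; // hypothetical marker for an invalidated node

    // A null replacement means deletion, so its length counts as 0.
    static long replacementLength(String replacement) {
        return replacement == null ? 0 : replacement.length();
    }

    // New offset of a node after [start, end) is replaced by replacementLength chars.
    // Assumed boundaries: at or before start stays put, at or after end shifts,
    // strictly inside the range is INVALID.
    static long adjustOffset(long offset, long start, long end, long replacementLength) {
        if (offset <= start) return offset;
        if (offset >= end) return offset - (end - start) + replacementLength;
        return INVALID;
    }
}
```

Replacing [5, 10) with two chars, a node at 12 moves to 9, a node at 3 is unchanged, and a node at 7 is invalidated.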

isValidOffset

public boolean isValidOffset(Long offset)
Check that an offset is valid, i.e. it is non-null, greater than or equal to 0 and less than the size of the document content.

isValidOffsetRange

public boolean isValidOffsetRange(Long start,
                                  Long end)
Check that both start and end are valid offsets and that they constitute a valid offset range, i.e. start is less than or equal to end.
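A sketch of the two checks exactly as documented above. The contentSize parameter is hypothetical and stands in for the size of the document content; note that, taken literally, the documented rule rejects an offset equal to the content size.

```java
class OffsetChecks {
    // Valid offset: non-null, >= 0 and < the content size.
    static boolean isValidOffset(Long offset, long contentSize) {
        return offset != null && offset >= 0 && offset < contentSize;
    }

    // Valid range: both ends valid and start not after end.
    static boolean isValidOffsetRange(Long start, Long end, long contentSize) {
        return isValidOffset(start, contentSize)
            && isValidOffset(end, contentSize)
            && start <= end; // safe to unbox: both are non-null here
    }
}
```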

setNextAnnotationId

public void setNextAnnotationId(int aNextAnnotationId)
Sets the nextAnnotationId

getNextAnnotationId

public Integer getNextAnnotationId()
Generate and return the next annotation ID

getNextNodeId

public Integer getNextNodeId()
Generate and return the next node ID

compareTo

public int compareTo(Object o)
              throws ClassCastException
Ordering based on URL.toString() and the URL offsets (if any)
Specified by:
compareTo in interface Comparable

getOrderingString

protected String getOrderingString()
Utility method to produce a string for comparison in ordering. String is based on the source URL and offsets.
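A sketch of URL-based ordering as described for compareTo and getOrderingString. The OrderedDoc class, the use of a plain String for the URL, the ":" separators between offsets, and the empty-string fallback for a missing URL are all assumptions made for illustration.

```java
// Hypothetical stand-in for a document: only the source URL and the URL offsets.
class OrderedDoc implements Comparable<OrderedDoc> {
    private final String sourceUrl;
    private final Long startOffset;
    private final Long endOffset;

    OrderedDoc(String sourceUrl, Long startOffset, Long endOffset) {
        this.sourceUrl = sourceUrl;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Source URL, followed by the offsets when both are present.
    protected String getOrderingString() {
        StringBuilder sb = new StringBuilder(sourceUrl == null ? "" : sourceUrl);
        if (startOffset != null && endOffset != null) {
            sb.append(':').append(startOffset).append(':').append(endOffset);
        }
        return sb.toString();
    }

    @Override
    public int compareTo(OrderedDoc o) {
        return getOrderingString().compareTo(o.getOrderingString());
    }
}
```

Since the comparison is on strings, it is lexicographic: an offset of 10 sorts before an offset of 9. Zero-padding the offsets would be one way to get numeric order if that matters.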

getStringContent

public String getStringContent()
The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. Use the getContent method instead to get the actual document content.

setStringContent

public void setStringContent(String stringContent)
The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. Use the setContent method instead to update the actual document content.

toString

public String toString()
String representation
Overrides:
toString in class Object

removeDocumentListener

public void removeDocumentListener(DocumentListener l)
Description copied from interface: Document
Removes one of the previously registered document listeners.
Specified by:
removeDocumentListener in interface Document

addDocumentListener

public void addDocumentListener(DocumentListener l)
Description copied from interface: Document
Adds a DocumentListener to this document. All the registered listeners will be notified of changes that occur to the document.
Specified by:
addDocumentListener in interface Document

fireAnnotationSetAdded

protected void fireAnnotationSetAdded(DocumentEvent e)

fireAnnotationSetRemoved

protected void fireAnnotationSetRemoved(DocumentEvent e)
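The add/remove/fire methods above follow the usual observer pattern. A minimal sketch of that pattern; SetListener and ListenerSupport are hypothetical types that pass a set name instead of GATE's DocumentEvent. Notifying a snapshot of the listener list lets a listener unregister itself during a callback without a ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type; not GATE's DocumentListener.
interface SetListener {
    void annotationSetAdded(String setName);
    void annotationSetRemoved(String setName);
}

class ListenerSupport {
    private final List<SetListener> listeners = new ArrayList<>();

    synchronized void addDocumentListener(SetListener l) {
        if (!listeners.contains(l)) listeners.add(l); // ignore duplicate registrations
    }

    synchronized void removeDocumentListener(SetListener l) {
        listeners.remove(l);
    }

    // Copies the list under the lock, then notifies outside it.
    private synchronized List<SetListener> snapshot() {
        return new ArrayList<>(listeners);
    }

    void fireAnnotationSetAdded(String setName) {
        for (SetListener l : snapshot()) l.annotationSetAdded(setName);
    }

    void fireAnnotationSetRemoved(String setName) {
        for (SetListener l : snapshot()) l.annotationSetRemoved(setName);
    }
}
```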

resourceLoaded

public void resourceLoaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a new Resource has been loaded into the system
Specified by:
resourceLoaded in interface CreoleListener

resourceUnloaded

public void resourceUnloaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a Resource has been removed from the system
Specified by:
resourceUnloaded in interface CreoleListener

datastoreOpened

public void datastoreOpened(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been opened
Specified by:
datastoreOpened in interface CreoleListener

datastoreCreated

public void datastoreCreated(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been created
Specified by:
datastoreCreated in interface CreoleListener

resourceRenamed

public void resourceRenamed(Resource resource,
                            String oldName,
                            String newName)
Description copied from interface: CreoleListener
Called when the creole register has renamed a resource.
Specified by:
resourceRenamed in interface CreoleListener

datastoreClosed

public void datastoreClosed(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been closed
Specified by:
datastoreClosed in interface CreoleListener

setLRPersistenceId

public void setLRPersistenceId(Object lrID)
Description copied from interface: LanguageResource
Sets the persistence id of this LR. To be used only in the Factory and DataStore code.
Specified by:
setLRPersistenceId in interface LanguageResource
Overrides:
setLRPersistenceId in class AbstractLanguageResource

resourceAdopted

public void resourceAdopted(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a new resource has been adopted
Specified by:
resourceAdopted in interface DatastoreListener

resourceDeleted

public void resourceDeleted(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a resource has been deleted
Specified by:
resourceDeleted in interface DatastoreListener

resourceWritten

public void resourceWritten(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a resource has been written into the datastore
Specified by:
resourceWritten in interface DatastoreListener

setDataStore

public void setDataStore(DataStore dataStore)
                  throws PersistenceException
Description copied from interface: LanguageResource
Set the data store that this LR lives in.
Specified by:
setDataStore in interface LanguageResource
Overrides:
setDataStore in class AbstractLanguageResource