gate.creole.ml.weka
Class StringToNominalFilter

java.lang.Object
  extended by weka.filters.Filter
      extended by gate.creole.ml.weka.StringToNominalFilter
All Implemented Interfaces:
weka.core.OptionHandler, Serializable

public class StringToNominalFilter
extends weka.filters.Filter
implements weka.core.OptionHandler

This filter converts one or more string attributes from the input dataset into nominal attributes.
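
As a rough illustration of what turning a string attribute into a nominal one involves, the sketch below collects the distinct values of a column and then replaces each value with the index of its label. It is a minimal sketch in plain Java and not this class's implementation: the class name NominalSketch and its methods are hypothetical, and the conversion methods named by FREQUENCY and TFIDF are not modelled because this page does not describe them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration only; not part of GATE or Weka. */
final class NominalSketch {

    /** First pass: collect the distinct strings in order of first appearance. */
    static List<String> collectLabels(List<String> column) {
        return new ArrayList<>(new LinkedHashSet<>(column));
    }

    /** Second pass: replace each string with the index of its label. */
    static int[] encode(List<String> column, List<String> labels) {
        int[] encoded = new int[column.size()];
        for (int i = 0; i < column.size(); i++) {
            encoded[i] = labels.indexOf(column.get(i));
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("noun", "verb", "noun", "adj");
        List<String> labels = collectLabels(column);
        System.out.println(labels);                                  // [noun, verb, adj]
        System.out.println(Arrays.toString(encode(column, labels))); // [0, 1, 0, 2]
    }
}
```

The label list can only be built once every value has been seen, which is consistent with setInputFormat returning false for this filter.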

See Also:
Serialized Form

Nested Class Summary
protected static class StringToNominalFilter.AttributeData
          Stores data about one attribute to be converted.
protected static class StringToNominalFilter.WordCount
           
protected static class StringToNominalFilter.WordData
           
 
Field Summary
protected  List attributesData
           
static String FREQUENCY
          Constant for conversion method.
protected static Vector optionsDesc
          The description for the options accepted by this filter.
static String TFIDF
          Constant for conversion method.
 
Fields inherited from class weka.filters.Filter
m_NewBatch
 
Constructor Summary
StringToNominalFilter()
          No-argument constructor.
 
Method Summary
protected  int addLeaves(Map map)
           
 boolean batchFinished()
          Signifies that this batch of input to the filter is finished.
protected  void buildOutputFormat()
          Called after a batch of input has finished.
 String[] getOptions()
           
 boolean input(weka.core.Instance instance)
          Input an instance for filtering.
protected  boolean isString(int index)
          Checks whether the attribute at a particular index in the input dataset is a string attribute.
 Enumeration listOptions()
           
static void main(String[] args)
           
protected  void parseOptions()
          Parses the set of options supplied to this filter.
protected  weka.core.Instance processInstance(weka.core.Instance inputInstance)
          Once the output format is defined, this method can be used to convert input instances into output instances.
 boolean setInputFormat(weka.core.Instances instanceInfo)
          Sets the format of the input instances.
 void setOptions(String[] options)
           
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

attributesData

protected List attributesData

optionsDesc

protected static Vector optionsDesc
The description for the options accepted by this filter.


FREQUENCY

public static final String FREQUENCY
Constant for conversion method.

See Also:
Constant Field Values

TFIDF

public static final String TFIDF
Constant for conversion method.

See Also:
Constant Field Values

Constructor Detail

StringToNominalFilter

public StringToNominalFilter()
No-argument constructor.

Method Detail

setInputFormat

public boolean setInputFormat(weka.core.Instances instanceInfo)
                       throws Exception
Sets the format of the input instances.

Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
false as this filter needs to see all the instances before being able to convert the input.
Throws:
weka.core.UnsupportedAttributeTypeException - if the selected attribute is not a string attribute.
Exception

input

public boolean input(weka.core.Instance instance)
Input an instance for filtering. The instance is processed and made available for output immediately.

Parameters:
instance - the input instance.
Returns:
true if the filtered instance may now be collected with output().
Throws:
IllegalStateException - if no input structure has been defined.

batchFinished

public boolean batchFinished()
Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.

Returns:
true if there are instances pending output.
Throws:
IllegalStateException - if no input structure has been defined.
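
The call order implied by input, batchFinished and output can be sketched with a toy stand-in. This is a minimal sketch in plain Java, assuming a filter that buffers its whole first batch before converting anything; BatchFilterSketch and its String/Integer signatures are hypothetical and deliberately do not use Weka's Instance or Filter types.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a batch filter; not Weka's Filter class. */
final class BatchFilterSketch {
    private final List<String> buffered = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();
    private final Queue<Integer> pending = new ArrayDeque<>();

    /** Buffers the value; returns false because nothing can be output yet. */
    boolean input(String value) {
        buffered.add(value);
        return false;
    }

    /** Builds the label list from the whole batch, then converts the buffer. */
    boolean batchFinished() {
        for (String value : buffered) {
            if (!labels.contains(value)) {
                labels.add(value);
            }
        }
        for (String value : buffered) {
            pending.add(labels.indexOf(value));
        }
        buffered.clear();
        return !pending.isEmpty();
    }

    /** Returns the next converted value, or null when none is pending. */
    Integer output() {
        return pending.poll();
    }

    public static void main(String[] args) {
        BatchFilterSketch filter = new BatchFilterSketch();
        for (String value : new String[] {"a", "b", "a"}) {
            filter.input(value);
        }
        if (filter.batchFinished()) {
            Integer converted;
            while ((converted = filter.output()) != null) {
                System.out.println(converted);
            }
        }
    }
}
```

The caller feeds every value first, signals the end of the batch, and only then drains the converted output.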

buildOutputFormat

protected void buildOutputFormat()
Called after a batch of input has finished. Will perform all the necessary calculations to define the output format and build the data needed in order to convert the input instances into output.


main

public static void main(String[] args)

addLeaves

protected int addLeaves(Map map)

processInstance

protected weka.core.Instance processInstance(weka.core.Instance inputInstance)
Once the output format is defined, this method can be used to convert input instances into output instances.

Parameters:
inputInstance - the input instance to convert.
Returns:
the converted output instance.

isString

protected boolean isString(int index)
Checks whether the attribute at a particular index in the input dataset is a string attribute.

Parameters:
index - the index of the attribute in the input dataset.
Returns:
true if the attribute at the given index is a string attribute.

listOptions

public Enumeration listOptions()
Specified by:
listOptions in interface weka.core.OptionHandler

setOptions

public void setOptions(String[] options)
                throws Exception
Specified by:
setOptions in interface weka.core.OptionHandler
Throws:
Exception

getOptions

public String[] getOptions()
Specified by:
getOptions in interface weka.core.OptionHandler

parseOptions

protected void parseOptions()
                     throws Exception
Parses the set of options supplied to this filter.

Throws:
Exception