
Chapter 11
Profiling Processing Resources

11.1 Overview

This is a reporting tool for GATE processing resources (PRs). It reports the total time taken by each processing resource, and the time taken to process each document, when running an application of type corpus pipeline.

GATE uses log4j, a logging system, to write profiling information to a file. The GATE profiling reporting tool reads the file generated by log4j and produces a report on the processing resources. It profiles JAPE grammars at the rule level, so the user can identify performance bottlenecks precisely. It also produces a report on the time taken to process each document, which helps to find problematic documents.

The initial code for the reporting tool was written by Intelius employees Andrew Borthwick and Chirag Viradiya and generously released under the LGPL licence to become part of GATE.


Figure 11.1: Example of HTML profiling report for ANNIE


11.1.1 Features

11.1.2 Limitations

Be aware that profiling supports only applications of type corpus pipeline. There is little point in profiling a non-corpus pipeline, which processes a single document or none at all. To get meaningful results, run your corpus pipeline on at least 10 documents.

11.2 Graphical User Interface

Profiling can be activated, and profiling reports created, from the ‘Profiling Reports’ submenu of the ‘Tools’ menu in GATE.

You can ‘Start Profiling Applications’ and ‘Stop Profiling Applications’ at any time. The logging is cumulative, so to get a fresh report you must use the ‘Clear Profiling History’ menu item while profiling is stopped.

Be careful to start profiling before you load your application; otherwise you will need to reload every processing resource that uses a Transducer. If you do not, you will get an exception similar to:

java.lang.IndexOutOfBoundsException: Index: 2, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at gate.jape.SinglePhaseTransducer.updateRuleTime(SinglePhaseTransducer.java:678)

Two types of reports are available: ‘Report on Processing Resources’ and ‘Report on Documents Processed’. See the previous section for more information.

11.3 Command Line Interface

Report on processing resources

Usage: java gate.util.reporting.PRTimeReporter [Options]

Options:

-i input file path (default: benchmark.txt in the user’s .gate directory[1])

-m print media - html/text (default: html)

-z suppressZeroTimeEntries - true/false (default: true)

-s sorting order - exec_order/time_taken (default: exec_order)

-o output file path (default: report.html/txt in the system temporary directory)

-l logical start (not set by default)

-h show help

Note that suppressZeroTimeEntries is ignored if the sorting order is ‘time_taken’.
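The interaction between -s and -z can be pictured with a small Java sketch. It is not PRTimeReporter's implementation: the class PrReportOrdering, the PrTime record and the enum are hypothetical, and the input rows are assumed to be per-PR totals in milliseconds, already listed in execution order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not PRTimeReporter: how -s (sorting order) and
// -z (suppressZeroTimeEntries) might combine when ordering report rows.
final class PrReportOrdering {

    // One report row: a PR name and its total time in milliseconds.
    record PrTime(String name, long millis) {}

    enum SortOrder { EXEC_ORDER, TIME_TAKEN }

    // `entries` is assumed to already be in execution order.
    static List<PrTime> order(List<PrTime> entries, SortOrder sort, boolean suppressZero) {
        List<PrTime> rows = new ArrayList<>(entries);
        if (sort == SortOrder.TIME_TAKEN) {
            // Slowest first; suppressZero is deliberately ignored in this mode.
            rows.sort(Comparator.comparingLong(PrTime::millis).reversed());
        } else if (suppressZero) {
            // Keep execution order, drop rows that took no measurable time.
            rows.removeIf(r -> r.millis() == 0);
        }
        return rows;
    }
}
```

With time_taken, zero-time rows stay in the report (at the bottom) because the suppression flag has no effect in that mode.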

Report on documents processed

Usage: java gate.util.reporting.DocTimeReporter [Options]

Options:

-i input file path (default: benchmark.txt in the user’s .gate directory[2])

-m print media - html/text (default: html)

-d number of docs, use -1 for all docs (default: 10 docs)

-p processing resource name to be matched (default: all_prs)

-o output file path (default: report.html/txt in the system temporary directory)

-l logical start (not set by default)

-h show help

Examples

11.4 Application Programming Interface

11.4.1 Log4j.properties

A log4j.properties configuration is required to direct the profiling information to the benchmark.txt file. The benchmark.txt file generated by GATE is then used as input by the GATE profiling reporting tool.

11.4.2 Benchmark log format

The format of the benchmark file that logs the times is as follows:

timestamp START PR_name
timestamp duration benchmarkID class features
timestamp duration benchmarkID class features
...

with the timestamp being the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC.

Example:

1257269774770 START Sections_splitter
1257269774773 0 Sections_splitter.doc_EP-1026523-A1_xml_00008.documentLoaded
gate.creole.SerialAnalyserController
{corpusName=Corpus for EP-1026523-A1.xml_00008,
documentName=EP-1026523-A1.xml_00008}
...
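Because each timing line has a fixed number of space-separated fields before the features map, it can be split with a field limit so that spaces inside the features survive. The following Java sketch is a hypothetical parser, not GATE's own; it assumes each record occupies a single line in benchmark.txt (the example above appears to be wrapped for display) and that the duration is in milliseconds.

```java
import java.time.Instant;

// Hypothetical parser for the benchmark.txt line format; not GATE code.
final class BenchmarkLine {

    // A START marker (duration -1, empty class/features) or a timing record.
    record Entry(Instant time, boolean start, long duration,
                 String benchmarkId, String className, String features) {}

    static Entry parse(String line) {
        String text = line.trim();
        String[] head = text.split(" ", 3);
        // Timestamp: milliseconds since 1970-01-01T00:00:00Z.
        Instant time = Instant.ofEpochMilli(Long.parseLong(head[0]));
        if ("START".equals(head[1])) {
            // "timestamp START name": the name may be the rest of the line.
            return new Entry(time, true, -1, head[2], "", "");
        }
        // "timestamp duration benchmarkID class features": limit 5 keeps the
        // features map (which contains spaces) in one piece.
        String[] f = text.split(" ", 5);
        return new Entry(time, false, Long.parseLong(f[1]), f[2],
                f.length > 3 ? f[3] : "", f.length > 4 ? f[4] : "");
    }
}
```

For instance, the START timestamp 1257269774770 in the example converts to 2009-11-03T17:36:14.770Z.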

11.4.3 Enabling profiling

There are two ways to enable profiling of the processing resources:

  1. In gate/build.properties, add the line: run.gate.enable.benchmark=true

  2. In your Java code, use the method: Benchmark.setBenchmarkingEnabled(true)
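To illustrate what the switch controls, here is a toy Java class that appends a line in the documented benchmark format only while it is enabled. MiniBenchmark and its methods are hypothetical and are not gate.util.Benchmark; an in-memory list stands in for the log4j-managed benchmark.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy stand-in for a benchmark logger; NOT gate.util.Benchmark.
final class MiniBenchmark {
    private static volatile boolean enabled = false;

    // Stands in for benchmark.txt.
    static final List<String> log = new ArrayList<>();

    static void setEnabled(boolean on) { enabled = on; }

    // Appends "timestamp duration benchmarkID class features" when enabled.
    static void logTime(long startMillis, String benchmarkId, Object source, Map<String, ?> features) {
        if (!enabled) {
            return;  // disabled: nothing is recorded
        }
        long now = System.currentTimeMillis();
        log.add(now + " " + (now - startMillis) + " " + benchmarkId + " "
                + source.getClass().getName() + " " + features);
    }
}
```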

11.4.4 Reporting tool

Report on processing resources

  1. Instantiate the class PRTimeReporter

    1. PRTimeReporter report = new PRTimeReporter();

  2. Set the input benchmark file

    1. File benchmarkFile = new File("benchmark.txt");

    2. report.setBenchmarkFile(benchmarkFile);

  3. Set the output report file

    1. File reportFile = new File("report.txt"); or

    2. File reportFile = new File("report.html");

    3. report.setReportFile(reportFile);

  4. Set the output format: in html or text format (default: MEDIA_HTML)

    1. report.setPrintMedia(PRTimeReporter.MEDIA_TEXT); or

    2. report.setPrintMedia(PRTimeReporter.MEDIA_HTML);

  5. Set the sorting order: Sort in order of execution or descending order of time taken (default: EXEC_ORDER)

    1. report.setSortOrder(PRTimeReporter.SORT_TIME_TAKEN); or

    2. report.setSortOrder(PRTimeReporter.SORT_EXEC_ORDER);

  6. Set whether to suppress zero-time entries: true/false (default: true). This parameter is ignored if the sort order is ‘SORT_TIME_TAKEN’.

    1. report.setSuppressZeroTimeEntries(true);

  7. Set the logical start: A string indicating the logical start to be operated upon for generating reports

    1. report.setLogicalStart("InteliusPipelineStart");

  8. Generate the text/html report

    1. report.executeReport();
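One way to picture the logical start is as the name on a START marker in the benchmark file. Under that assumption, a hypothetical Java helper that keeps only the lines logged after the most recent matching marker might look like this; it is not how PRTimeReporter is documented to select lines, and LogicalStartFilter is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep the lines logged after the most recent
// "timestamp START <name>" marker whose name equals logicalStart.
final class LogicalStartFilter {

    static List<String> lastRun(List<String> lines, String logicalStart) {
        int from = -1;
        for (int i = 0; i < lines.size(); i++) {
            String[] f = lines.get(i).trim().split(" ", 3);
            if (f.length == 3 && f[1].equals("START") && f[2].equals(logicalStart)) {
                from = i + 1;  // first line after the latest matching marker
            }
        }
        if (from < 0) {
            return List.of();  // the marker never appears
        }
        return new ArrayList<>(lines.subList(from, lines.size()));
    }
}
```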

Report on documents processed

  1. Instantiate the class DocTimeReporter

    1. DocTimeReporter report = new DocTimeReporter();

  2. Set the input benchmark file

    1. File benchmarkFile = new File("benchmark.txt");

    2. report.setBenchmarkFile(benchmarkFile);

  3. Set the output report file

    1. File reportFile = new File("report.txt"); or

    2. File reportFile = new File("report.html");

    3. report.setReportFile(reportFile);

  4. Set the output format: Generate report in html or text format (default: MEDIA_HTML)

    1. report.setPrintMedia(DocTimeReporter.MEDIA_TEXT); or

    2. report.setPrintMedia(DocTimeReporter.MEDIA_HTML);

  5. Set the maximum number of documents: Maximum number of documents to be displayed in the report (default: 10 docs)

    1. report.setNoOfDocs(2); // 2 docs or

    2. report.setNoOfDocs(DocTimeReporter.ALL_DOCS); // All documents

  6. Set the PR matching regular expression: A PR name or a regular expression to filter the results (default: MATCH_ALL_PR_REGEX).

    1. report.setSearchString("HTML"); // match all PRs whose name contains "HTML"

  7. Set the logical start: A string indicating the logical start to be operated upon for generating reports

    1. report.setLogicalStart("InteliusPipelineStart");

  8. Generate the text/html report

    1. report.executeReport();
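In the benchmark log example, the benchmark ID contains a component beginning with doc_ that names the document. A hypothetical Java sketch that totals durations per document on that basis follows. PerDocumentTotals is not DocTimeReporter, and it ignores the possibility that nested entries double-count time, which a real report would need to handle.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not DocTimeReporter: sums durations per document,
// using the benchmark-ID component that starts with "doc_" as the key.
final class PerDocumentTotals {

    static Map<String, Long> totals(Iterable<String> lines) {
        Map<String, Long> perDoc = new TreeMap<>();
        for (String line : lines) {
            // Fields: timestamp duration benchmarkID class features
            String[] f = line.trim().split(" ", 5);
            if (f.length < 3 || f[1].equals("START")) {
                continue;  // skip START markers and malformed lines
            }
            for (String part : f[2].split("\\.")) {
                if (part.startsWith("doc_")) {
                    perDoc.merge(part, Long.parseLong(f[1]), Long::sum);
                    break;
                }
            }
        }
        return perDoc;
    }
}
```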

[1] GATE versions up to 5.2 placed benchmark.txt in the execution directory.

[2] GATE versions up to 5.2 placed benchmark.txt in the execution directory.