
GATEWiki: User and Developer Guide


In a hurry? See the quick start section.


1. Introduction

Files and directories, documents and folders, disks and memory sticks, laptops and games machines, TV time-shifters and corporate IT systems. Data data everywhere, and never a drop to drink, as the Ancient Mariner could not have dreamt of saying. Wouldn't it be nice to be able to view your filesystem as a website, to be able to edit from multiple machines with and without network connections, to be able to have your own local copy and also share it with friends and colleagues, and not have to worry about merging it all back together?

GATEWiki, or CoW, is a "Controllable Wiki" and CMS that supports collaborative document creation with asynchronous off-line editing. CoW is designed to make it easy to add interaction to static websites, and to support concurrent editing and off-line working with straightforward synchronisation (using Subversion). The system also serves as a test-bed for experiments in controlled languages for round-trip ontology engineering (from the GATE project: http://gate.ac.uk/).

GATEWiki is based on Grails, Groovy, Subversion, Selenium and Java, and is hosted on SourceForge in the GATEWiki project.

2. Overview

Wikis are about allowing groups of people to create and edit sets of interlinked web pages with minimal effort and minimal learning of software conventions and features. Typically they achieve this by

Content Management Systems (CMSs) provide persistence, versioning, metadata management, upload and browsing of sets of documents and other resources.

CoW is different from other wikis and CMSs because:

Why another wiki? When we started the work no available wiki in Java that we could find had good SVN support (and we have 55GB of data stored in our SVN repositories!). Using SVN as a backend gives us:

CoW is partly intended to be an experimental framework for a new type of website in which

(In fact perhaps that isn't a wiki at all, but a new type of literate database...)

The system is licensed under the GNU General Public License version 3 (GPL 3) except where otherwise stated.

3. Information for Users

In general using GATEWiki should be easy enough not to need a manual. To edit click the "edit" link, for example. A few things have some subtlety, though, and these are described here. A working knowledge of Subversion will also help if you want to exploit the system to the full.

3.1. Quick Start

3.1.1. Files, directories and links

A GATEWiki website sits on top of a normal tree of files and directories (for example, on the GATE.ac.uk site this page lives in /gatewiki/cow/doc/gatewiki.html).

When you edit and link pages, images or other files, you are performing simple operations over the file tree. So, for example, if you wanted to add an image to this page, you would follow the "Directory" link, use "Upload" to add your image file to this directory, and then place it in this page by referring to it from here (perhaps by typing %image(your-new-image.png) during an edit session). If you later want to refer to it from a different site, for example, the image will be available on the GATEWiki server as /gatewiki/cow/doc/your-new-image.png.

To link to another page in the same directory just use the file name, e.g. this link is to "index.html".

To link to a directory (which will be either a list of the files it contains, or the index.html file if it exists) just use the directory name, e.g. this link is to the "yam" subdirectory.

3.1.2. Creating and editing pages

To create a new page first navigate to the directory where you want the file to be placed (click on the "Directory" link from any page in that directory). You can see the directory name in two places:

From the directory view you can create a new wiki page or new directory using the "New page" dialog. If you give a ".html" file name you will create a new wiki page; if you give a "name-without-dots" you will create a new directory. (For other file types use "Upload" instead.)
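The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not GATEWiki's own code; the function name `classify_new_name` and the labels it returns are hypothetical.

```python
def classify_new_name(name: str) -> str:
    """Classify a name typed into the "New page" dialog.

    Hypothetical helper: the returned labels are illustrative only.
    """
    if name.endswith(".html") and len(name) > len(".html"):
        return "wiki page"      # a ".html" name creates a wiki page
    if name and "." not in name:
        return "directory"      # a name without dots creates a directory
    return "use upload"         # other file types go through "Upload"
```

For example, `classify_new_name("notes.html")` gives "wiki page", while `classify_new_name("logo.png")` points you to the upload dialog instead.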

To edit a page simply click "Edit" from the page. You can use a normal word-processor style editor (the "rich editor") or a web form. In the latter case the text is written in YAM, a very simple markup language.

To edit the directory tree off-line simply check out the tree from its repository (you'll need the location of this and the relevant permissions, of course).

3.1.3. Deleting and copying

Files and directories (and their contents) can be deleted via the directory view: follow the "Directory" link from any page, tick the checkbox(es) of the entries you want to remove, then hit the "Delete selected" button at the foot of the page.

Copying (and renaming) of wiki pages is more convoluted at present (and copying of directories is not supported). To copy a page first edit it, copy the text to the clipboard, then create a new file, edit it and paste the contents of the old file. (To do a rename simply perform this process and then delete the old file.)

Don't use the WYSIWYG editor for copying - just use the form editor (otherwise you're liable to lose all the formatting of the page).

(One reason we haven't made copying easier (yet) is that you can also use any Subversion client, of which there are legion.)

3.1.4. Upload other types of files

If you want to upload plain HTML, or a PNG image, or whatever, you can use the "Upload" link which is visible in the directory view. You may upload trees of directories by first packing them into a TGZ or ZIP archive and requesting the "unpack" option. The upload dialogue allows you to choose whether or not to overwrite existing files.
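To make the "unpack" and "overwrite" options concrete, here is a minimal Python sketch of unpacking an uploaded archive into a directory. It is not how GATEWiki implements uploads; the function name `unpack_upload` and its behaviour (skipping existing files unless asked to overwrite, and refusing entries that would land outside the target directory) are assumptions made for illustration.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_upload(archive: Path, dest: Path, overwrite: bool = False) -> list:
    """Unpack a ZIP or gzipped tar archive into dest (hypothetical sketch).

    Returns the relative paths that were written. Existing files are
    skipped unless overwrite is True.
    """
    dest = dest.resolve()
    written = []

    def wanted(rel: str) -> bool:
        target = (dest / rel).resolve()
        # Refuse entries that would escape the destination directory.
        if dest not in target.parents:
            return False
        return overwrite or not target.exists()

    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if not info.is_dir() and wanted(info.filename):
                    zf.extract(info, dest)
                    written.append(info.filename)
    else:
        with tarfile.open(archive, "r:gz") as tf:
            for member in tf.getmembers():
                if member.isfile() and wanted(member.name):
                    tf.extract(member, dest)
                    written.append(member.name)
    return written
```

With `overwrite=False` a second upload of the same tree leaves existing files untouched, which mirrors the choice offered by the upload dialogue.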

3.1.5. Finding things

In the usual way GATEWiki supports search via the "Search" box: just type your keywords and press the button. For more sophisticated queries, see the search syntax description at the foot of the search results page or the section on searching.

In keeping with our everything-is-a-directory-tree philosophy, another way to see what files live where etc. is via the directory view.

When you're logged in (and have permission) each page has a "Directory" link leading to the directory view, which is just like a files and folders browser on your desktop. The type of each entry is indicated by an icon:

Directory view also provides access to the upload and delete functions.

3.1.6. Raw HTML pages vs. wiki pages

When you create a new web page in GATEWiki two files are added:

In general you can simply ignore this - edit and delete operations, for example, work transparently on your pages. However, GATEWiki also supports raw HTML files, which have no YAM source. Again editing and so on is transparent, but because HTML presents more of a security risk the permissions associated with these pages are often different from ordinary wiki pages - hence the different icon for these pages in directory view, for example.

3.2. Modes

CoW has two operational modes:

When you use CoW on your own machine it will be in workstation mode; when you're using it over the web on another machine it will be in server mode.

3.3. Register, Log in

When CoW is running in server mode, it is necessary to create an account and to log in before being allowed to edit pages. Some pages will also not be accessible for reading unless an administrator adds you to the appropriate group. To register, go to the login page and follow the register link.

3.4. Create a new wiki page or directory

To create a new wiki page you can either:

Dependent on the type of name you choose for your page (either *.html or a name with no "."s in it) GATEWiki will create either a new wiki page or a new directory respectively.

To upload other types of files, see the upload section.

Note: for technical reasons the following directory names are currently unavailable in the top-level directory of the main sandbox:

(This could be fixed, but only with a fair amount of pain and suffering.)

3.5. Edit a wiki page

When you have permission to edit a page that you're viewing an edit link will appear. Two types of editor are available:

  1. a WYSIWYG editor (FCK Edit)
  2. a web form page

To change between these types use the "Switch to..." button.

When using 1., you can generally ignore YAM syntax and work like you do in a word processor - e.g. hit the "B" button to make text bold, or the "I" button to make it italic, etc. When you're finished, hit the save button (it looks like an old floppy disk). There's one exception to ignoring YAM: because the edit is converted back to YAM afterwards, if you include YAM syntax in your edit you'll need to escape it. Do this by putting a backslash in front - e.g. \*.
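The escaping rule can be illustrated with a small Python helper. The set of characters treated as special here is an assumption (the authoritative list is whatever the YAM translator recognises), and `escape_yam` is a hypothetical name.

```python
# Assumed set of characters with special meaning in YAM; the real set
# may differ, so treat this list as illustrative.
YAM_SPECIAL = set("*_^%|`")


def escape_yam(text: str) -> str:
    """Prefix each assumed-special character with a backslash."""
    return "".join("\\" + ch if ch in YAM_SPECIAL else ch for ch in text)
```

So text such as `2 * 3` would be typed as `2 \* 3` in the WYSIWYG editor to keep the star literal.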

When using 2., use YAM syntax (a short summary appears below the form while editing, and also later in this guide).

In both cases when you finish an edit CoW will try to check in your changes to the parent repository. At this point it checks to see if another user has modified the same file while you've been editing it. If so, their changes and yours are merged; if the changes are in different parts of the file all is well and the merged file is then checked in. If, however, the changes are close together they are judged to be in conflict, and you will be returned to your edit session to resolve this conflict. To find the parts of the file where the problem exists search on "===="; here you will find indications of what was in your file and what was in the edit by the other user, and you can choose one or the other or both as you prefer.
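If you are resolving conflicts in a checked-out copy rather than in the browser, a short script can locate the conflict blocks for you. The Python sketch below assumes the file contains standard Subversion conflict markers (lines starting with `<<<<<<<`, `=======` and `>>>>>>>`); whether CoW's edit session shows exactly these markers is an assumption, and `conflict_regions` is a hypothetical helper.

```python
def conflict_regions(text: str) -> list:
    """Return (start, separator, end) line numbers, 1-based, for each
    conflict block delimited by <<<<<<<, ======= and >>>>>>> lines."""
    regions = []
    start = sep = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<<"):
            start, sep = number, None
        elif line.startswith("=======") and start is not None:
            sep = number
        elif line.startswith(">>>>>>>") and start is not None and sep is not None:
            regions.append((start, sep, number))
            start = sep = None
    return regions
```

Your version sits between the start and the separator of each block, and the other user's version between the separator and the end.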

Note: when you edit a non-native (non-YAM, i.e. raw HTML with no GATEWiki version) HTML file GATEWiki will always use the WYSIWYG editor. Be aware that at present this has the side-effect of deleting meta tags from the file headers.

3.6. Searching

A search box is provided at the top right corner of the page that allows searching within a wiki area that the current page belongs to. In other words, if a page belongs to the Help section, results are retrieved only from the Help section. Information on Solr query syntax is available at http://wiki.apache.org/solr/SolrQuerySyntax

If a query succeeds, a maximum of 10 results are shown on a single page. Users can follow the page links at the top or bottom of the search results to jump to a different set of results. A search result comprises the following:

Hits from pages which the current user doesn't have permission to view are filtered out and not included in the search result.
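The combination of permission filtering and ten-per-page results can be sketched as follows. This Python fragment is illustrative only: the order of operations (filter first, then paginate) and the names `visible_page` and `can_view` are assumptions, not a description of CoW's search code.

```python
def visible_page(hits, can_view, page=1, per_page=10):
    """Drop hits the user may not view, then return one page of results."""
    allowed = [hit for hit in hits if can_view(hit)]
    first = (page - 1) * per_page
    return allowed[first:first + per_page]
```

Here `can_view` stands in for whatever permission check the server applies to the current user.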

3.7. Report a bug

To report bugs, first please check that they've not been reported already!

Then add a report to the bug tracker, including information about the platform you're running on and all details necessary to reproduce your problem.

3.8. The YAM Markup Language

CoW's underlying markup language is YAM (Yet Another Markup). You don't need to use it - you can use the WYSIWYG editor instead - but if you're a Vim-wielding old fossil like me you may like it. The syntax is about as light as they come; next is a summary followed by a longer description.

3.8.1. YAM Summary

Title                     First paragraph of the file.
Headings                  %1, %2, etc.; %1* is unnumbered; follow with blank line
Bold, italic, teletype,   *...*, _..._, ^...^, __...__
  underlined
Contents                  %contents
Horizontal lines          --- (three or more dashes at the start of a line)
Tables                    %[ | row 1/column 1 | r1/c2 | -| r2/c1 | r2/c2 | %]
Block quotation           %"...%"
Line break                %br
Verbatim                  %<...%>
Code listings             %code(lang=Java)< ... %>
                          (provides syntax highlighting for Java, XML, etc.)
Lists                     - item 1
                          - item 2
                          # numbered item 1
                          (for nesting use indentation)
Footnote                  %footnote(...)
Escaping                  \
Links                     http://thing.com/ or %(http://thing.com/) or
                          %(http://thing.com/, link text)
Anchors                   %#name (then link to it with "%(#name)")
Images                    %image(file) or
                          %image(file, alt tag, width, height, position, border)
                          (second and subsequent arguments are optional)
Citations                 %cite(citekey,citekey,...)
Includes                  %include(level, useTitle, file.yam)
Non-breaking space        %\ followed by space
Single-line comment       %% ...
Multi-line comment        %/* ... %*/
Special characters        (e.g. < or & in HTML) are translated correctly in the output.
Twitter                   %twitter(title=GATE News, account=GateAcUk, name=GATE, count=10)
Google                    %google(siteip=gate.ac.uk)
Metadata                  %meta(author=My Name) becomes <meta> tags in the HTML
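To give a feel for how light the inline markup in the summary is, here is a toy Python translator for the bold, italic, teletype and underline rows. It is a sketch only: the real YAM translator is a proper parser, handles escapes and nesting, and may emit different HTML tags; the tag choices and the name `inline_to_html` are assumptions.

```python
import html
import re

# Order matters: double underscores must be handled before single ones.
INLINE_RULES = [
    (re.compile(r"__(.+?)__"), r"<u>\1</u>"),
    (re.compile(r"\*(.+?)\*"), r"<b>\1</b>"),
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),
    (re.compile(r"\^(.+?)\^"), r"<tt>\1</tt>"),
]


def inline_to_html(text: str) -> str:
    """Translate a single line of inline markup (no escapes, no nesting)."""
    out = html.escape(text, quote=False)   # special characters such as < and &
    for pattern, replacement in INLINE_RULES:
        out = pattern.sub(replacement, out)
    return out
```

The sketch ignores backslash escapes entirely, so it should not be used on text that contains them.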

3.8.2. YAM Syntax and Usage Introduction

YAM (Yet Another Markup) is a simple wiki language used in GATEWiki. The language syntax is described below.

Contents

Contents listings like that above are generated by '%contents'.

Bold, italic, underline and teletype

Bold text is contained in stars: *this is bold* becomes this is bold.

Italic text is contained in underscores: _this is italic_ becomes this is italic.

Fixed-width text is contained in caret signs: ^this is teletype^ becomes this is teletype.

Underlined text is contained in double underscores: __this is underlined__ becomes this is underlined.

Horizontal lines

Horizontal lines are indicated by 3 or more dashes at the start of a line. For example:

---

----------

both result in a horizontal line.

Lists

Unordered lists are indicated by '-' at the start of a line, and ordered lists by '#'. Nesting is indicated by increased spacing preceding the item indicator. For example:

- This is an unordered list
- Second item
  # This is a nested...
  # ...ordered list
- Back to the third item of the enclosing list

results in:

The precise size of the indentation of embedded lists doesn't matter, it just needs to be larger than that of the enclosing list.
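The indentation rule can be expressed as a small stack-based routine. The Python sketch below is an illustration of the rule as stated, not the YAM parser itself; `list_depths` is a hypothetical name, and lines that do not start with a list marker are simply ignored.

```python
def list_depths(lines):
    """Return (depth, marker, text) for each list item line.

    Depth grows whenever an item is indented further than the enclosing
    item; the exact amount of indentation is irrelevant.
    """
    stack = []      # indentation widths of the currently open lists
    items = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped or stripped[0] not in "-#":
            continue            # not a list item; ignored in this sketch
        indent = len(line) - len(stripped)
        while stack and indent < stack[-1]:
            stack.pop()         # close lists deeper than this item
        if not stack or indent > stack[-1]:
            stack.append(indent)
        items.append((len(stack), stripped[0], stripped[1:].strip()))
    return items
```

Applied to the five-line example above, the items come out at depths 1, 1, 2, 2 and 1.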

Lists end when there is a blank line or where the next line of text is not indented. For example:

- This is a one item list
followed by
- another one item list.

results in:

followed by

Note: lists embedded in tables have to start on a new line, just like elsewhere; in tables a syntax error will result if the list starts on the same line as the rest of the row.

Verbatim output

Verbatim output starts with '%<' and ends with '%>'. For example:

%< This will *not* get translated. %>

When the target language is HTML, for example, the output will contain '<pre>' tags.

For code listings you can enable syntax highlighting with

%code(lang=Java)<
public void hello() {
  System.out.println("hello world");
}
%>

which produces

public void hello() {
  System.out.println("hello world");
}

Highlighting is performed in HTML using google-code-prettify and in LaTeX using the listings package. The list of supported language names is slightly different for the two packages (HTML, LaTeX) but mainstream languages including "C", "Java", "Python", "HTML", "XML", "CSS" and "TeX" are supported by both. When translating YAM to HTML (but not to LaTeX) the highlighter will attempt to guess the appropriate language if you omit the lang specification altogether (%code()< ... %>).

By default, the listing does not have line numbers. Numbering can be enabled using the option numbering=on, plus an optional firstnumber=N if you want to start numbering from something other than 1. Note that in HTML only every fifth line is numbered in the default google-code-prettify CSS style.

Footnotes

Footnotes are like this:

%footnote(This is a footnote.)


The contents will be put in a section at the end of the document (HTML) or at the bottom of the page (LaTeX), and linked by number from where they occurred.

Escapes

To stop a special character from being interpreted, use a '\'. For example,

\---

will not generate a line.

(This also works for the forward quote or backtick character, `, which is used in LaTeX but may otherwise be replaced by a normal single quote in HTML output.)

Titles and metadata

The title of a document is the first paragraph of the document, ending in one or more blank lines. (Often this will be a single line of text.)

Metadata can be specified using %meta(foo=bar), which in HTML will become

<meta name="foo" content="bar">

in the page header.

Headings

Headings are lines starting with %1 (for first level), %2, %3 or %4 and are followed by one or more blank lines. For example, the heading for this section is

%1 Headings

If a heading level is followed by "*" it is not numbered, e.g.:

%1* An unnumbered heading


An unnumbered heading

This heading will not appear in the contents table.

Links and anchors

Links can be specified in three ways:

  1. As plain text, e.g. 'http://gate.ac.uk/' will become http://gate.ac.uk/
  2. Using '%(target)', e.g. %(http://gate.ac.uk/) will become http://gate.ac.uk/
  3. Using '%(target, label)', e.g. %(http://gate.ac.uk/, GATE home) will become GATE home

Spaces or commas within the link target of %(...) format URLs must be escaped. The link text (following the first unescaped comma) may contain "inline" YAM markup such as %image(...), %cite(...), *bold*, _italic_ or ^teletype^, but not block-level markup such as tables. Parentheses are allowed within link text but left and right parentheses must be balanced, i.e. %(http://example.com, an (example) link) is OK but %(http://example.com, unbalanced ( brackets) is not unless the unmatched parenthesis is escaped.
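The two rules in this paragraph (split at the first unescaped comma, and require balanced parentheses in the link text) can be sketched in Python as below. This is an illustration of the rules, not the YAM parser; `split_link` and `balanced` are hypothetical helpers that operate on the text between `%(` and the closing `)`.

```python
from typing import Optional, Tuple


def split_link(body: str) -> Tuple[str, Optional[str]]:
    """Split the text inside %(...) into (target, label).

    The split happens at the first comma that is not preceded by a
    backslash; escaped characters in the target are unescaped.
    """
    target = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            target.append(body[i + 1])   # keep the escaped character
            i += 2
        elif ch == ",":
            return "".join(target).strip(), body[i + 1:].strip()
        else:
            target.append(ch)
            i += 1
    return "".join(target).strip(), None


def balanced(label: str) -> bool:
    """Check that unescaped parentheses in a label are balanced."""
    depth = 0
    i = 0
    while i < len(label):
        if label[i] == "\\":
            i += 2                       # skip the escaped character
            continue
        if label[i] == "(":
            depth += 1
        elif label[i] == ")":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0
```

For instance, `split_link("http://gate.ac.uk/, GATE home")` separates the target from the label "GATE home", and `balanced("unbalanced ( brackets")` is false until the parenthesis is escaped.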

A URL that appears in plain text must be followed by a space, tab or newline. Sometimes, you might need to follow a URL with something other than a space, tab, or newline, for example when applying other formatting characters. To do this, use a bracketed form. e.g. to teletype a URL, ^%(http://gate.ac.uk/)^ becomes http://gate.ac.uk/.

Anchors and labels are specified using '%#name'. For example,

%1 A Heading %#label

will result in a heading followed by the anchor label. To refer back (or forward) to the anchor, use a "#" in the link, e.g.

%(#tables, tables)

will result in tables.

Spaces or commas inside anchors must be escaped. An anchor that appears in plain text must be followed by a space, tab or newline.

A relative link to a non-existent file will be rendered as a link to the host application's "create" page, e.g.



A link to an existing file will just be rendered as a normal link, e.g.

index.html

Block quotations

Block quotations are enclosed in %" marks. For example,

  %"This is a quote%"


This is a quote

Note that because the quote marks are treated as normal words, they can cause overlap problems (in the same way that an unclosed bold or italic mark might). For example,

- list

is not a good idea as the end of the quote will precede the end of the list. The workaround is to close the list first by adding a blank line:

- list


which then results in something sensible:

Line breaks

Line breaks are indicated by %br at the end of a line. For example:

This line is broken %br
in two.

This line is broken
in two.

Tables

Tables use square brackets, bars and dashes. For example:

 | *header col 1*	| *header col 2*        |
 | row 1 col 1	        | col 2                 |              
 | row 2 col 1	        | col 2                 |

results in:

header col 1 header col 2
row 1 col 1 col 2
row 2 col 1 col 2

To include a | in normal text escape it like this: \|.

(See also the note above about embedding lists in tables.)

Images

Images are like URLs:

You can also specify an ALT tag, width and height, position and border width: '%image(test-image.png, ALT tag, 500, 500, left, 0)' becomes an image with the alternative text "ALT tag".

Citations

Citations work like this: '%cite(Cun06a)' becomes Cun06a. Multiple cite keys should be separated by commas, e.g.: '%cite(Cun05a,Cun06a)' becomes Cun05a, Cun06a.

Inclusion

A page can include another page like this:

%include(yam-first.yam)

This results in the inclusion of all the text from yam-first.yam in this file.

An increment to be added to the heading level can be given as the first argument.

Note that the titles in the included files are ignored by default. A "useTitle" flag can be given (after the increment if it exists) to cause inclusion of the title (as a heading). For example: %include(1, useTitle, yam-first.yam).

Non-breaking space

Non-breaking spaces are added using %\ followed by space, e.g.

This line %\ %\ %\ %\ has spaces in the middle.

This line     has spaces in the middle.

Comments

Single-line comments are created by two or more percents together, e.g.

This is not commented   %% but this is

This is not commented

Multi-line comments are created by %/* and %*/, e.g.

This is not commented   %/* but this is
and this is too %*/

This is not commented

Plugins

YAM can be extended by the use of plugins. Creating plugins requires some Java programming - see the developer guide for more details.

Plugins bundled with GATEWiki:

Changes from version 3

YAM is currently in version 5. Since versions 3 and 4 these changes were made:

3.9. LaTeX Support

YAM will translate into LaTeX (as well as into HTML). Some things of note:

4. Information for Developers and Administrators

Currently CoW includes:

The system has two modes, workstation and server; the former does no user management, the latter uses JSecurity.

API documentation etc. is linked from here, and the top of the software documentation tree is here.

4.1. Roadmap

The development roadmap, the list of currently active tasks and the wish-list are recorded in the backlog document.

4.2. Checking out the code

To check out CoW from Subversion first decide if you want to check out a copy of GWT and Grails HEAD while you're at it. If so do this:

svn co https://gatewiki.svn.sourceforge.net/svnroot/gatewiki/trunk gatewiki

If not, do this:

svn co https://gatewiki.svn.sourceforge.net/svnroot/gatewiki/trunk/cow cow

If you do the latter you'll need to set properties in build.xml to point to your own installation of GWT and Grails. (Also, if you put CoW in a directory not named cow you will need to change projectName in webtest/conf/webtest.properties to get the Canoo tests to work.)

Also note that the copy of GWT in the repository is for Linux. If you are on Windows or a Mac then you will need to replace this with the appropriate version for your OS.

4.3. Selecting Modes

To select the mode (see above) use

Workstations include laptops and are expected to be off-line from time to time; servers are expected to be always connected (and always have access to the relevant SVN repositories).

4.4. Create a new wiki space

To create a new wiki space go to the Admin page and click through to wiki areas and select "New wiki". Then you can either:

In the former case you can choose any existing SVN-controlled file tree and CoW will allow you to create, share and update YAM files in that tree.

Note that each wiki area has its own sandbox. Two areas are special:

Other areas are served from /g8/page/show/<area ID> URLs.

4.4.1. Authentication settings

When running in server mode, or when doing scheduled or one-off subversion updates (see below), CoW needs to communicate with the subversion repository (or repositories) underlying the wiki sandbox. If the repository is on the local filesystem (i.e. checked out using the file: protocol) this will work fine, but if the repository is remote (svn:, svn+ssh:, http: or https:) it may require authentication. Since the same authentication profile may be shared between several sandboxes (for example several repositories hosted on the same SSH server), configuring authentication is a two-step process. First you must create an authentication profile containing the user name and other credentials, and second you associate that profile with the relevant wiki area or areas.

To manage the known authentication profiles, go to the main Admin home page and follow the "authentication profile" link at the bottom of the page. Each profile can hold any or all of the following data:

Note that the SSH private key option takes precedence over a username and password - even if a password is set it will not be sent to SSH servers, the key will be used instead.
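The two decisions described in this section (whether a repository may need credentials at all, and which credential wins) can be sketched as follows. This Python fragment is illustrative only: the profile field names (`username`, `password`, `ssh_private_key`) and both function names are hypothetical, not CoW's data model.

```python
from urllib.parse import urlparse

# Remote protocols named in the text; file: repositories are local.
REMOTE_SCHEMES = {"svn", "svn+ssh", "http", "https"}


def may_need_auth(repo_url: str) -> bool:
    """Return True for repository URLs that may require authentication."""
    return urlparse(repo_url).scheme in REMOTE_SCHEMES


def pick_credentials(profile: dict) -> dict:
    """Choose what to present to an SSH server from a profile.

    A private key, when present, takes precedence over a password.
    """
    if profile.get("ssh_private_key"):
        return {"username": profile.get("username"),
                "key": profile["ssh_private_key"]}
    return {"username": profile.get("username"),
            "password": profile.get("password")}
```

Note that in `pick_credentials` the password is dropped entirely when a key is configured, matching the statement that it will not be sent to SSH servers.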

Once you have created and configured the authentication profile, you can attach it to the relevant wiki areas, either when the area is created or by editing the area definition (via the "Create and edit Wiki areas" link on the admin page).

Limitations: Each wiki area is associated with a single authentication profile so if the sandbox includes directories from other repositories (e.g. via svn:externals), all the different repositories must be accessible using the same profile. This limitation may be relaxed in a future version of CoW.

4.4.2. Setting wiki regeneration

Wiki pages may refer to other wiki pages, via links and includes. They are therefore dependent on each other. When you change a page, the dependencies will also be updated. This may get out of sync (perhaps via a direct edit on disk, or some other route). You can therefore regenerate wiki dependencies via the admin interface. This is described further below.
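One way to picture dependency regeneration is as a walk over a reversed link graph. The Python sketch below assumes a simple mapping from each page to the pages it links to or includes, and assumes that regeneration propagates transitively; both are assumptions for illustration, and `pages_to_regenerate` is a hypothetical name.

```python
from collections import defaultdict, deque


def pages_to_regenerate(depends_on: dict, changed: str) -> list:
    """Return pages that directly or indirectly depend on `changed`.

    depends_on maps a page to the pages it links to or includes.
    """
    # Reverse the graph: for each target, which pages refer to it?
    dependents = defaultdict(set)
    for page, targets in depends_on.items():
        for target in targets:
            dependents[target].add(page)

    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for page in sorted(dependents[current]):
            if page not in seen and page != changed:
                seen.add(page)
                queue.append(page)
    return sorted(seen)
```

A direct edit on disk bypasses this bookkeeping, which is why a manual regeneration option is useful.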

4.4.3. Setting wiki updates

A wiki is a working copy of an SVN repository, and so could get out of date with respect to the repository. You can update the wiki working copy from the admin interface (on the wiki admin page) in two ways:

If the wiki working copy is locked (e.g. by an update following an edit checkin), then the update will be skipped. If there is a conflict, the update will fail. If the repository requires authentication this must have been set up as described above.

Any files that are created, modified or deleted in the sandbox as a result of the update will have their dependencies regenerated automatically. In particular if a .yam file has been committed to the repository its corresponding .html will be regenerated (and, in server mode, checked in) when CoW updates the .yam.
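The update rules in this section can be summarised in a short Python sketch. It is not CoW's update code: the outcome labels and the function names `update_outcome` and `html_to_regenerate` are hypothetical.

```python
from pathlib import PurePosixPath


def update_outcome(locked: bool, conflict: bool) -> str:
    """Summarise what happens to an update request."""
    if locked:
        return "skipped"    # working copy locked, e.g. by an edit check-in
    if conflict:
        return "failed"     # conflicts make the update fail
    return "updated"


def html_to_regenerate(changed_paths):
    """Map each changed .yam file to the .html file generated from it."""
    return [
        str(PurePosixPath(path).with_suffix(".html"))
        for path in changed_paths
        if PurePosixPath(path).suffix == ".yam"
    ]
```

For example, an update that brings in doc/gatewiki.yam implies that doc/gatewiki.html needs regenerating, while an updated image needs nothing further.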

4.5. Building and testing CoW

CoW is built with ant. For more documentation run ant help in the cow directory. To run CoW see the next section.

Currently the build file

To build CoW from a clean checkout, assuming that grails is in the same directory:

For shell-literate people there's also a script bin/cruise which makes it easier to interpret the voluminous output of the tests and which stores a log in ant-log.txt, but note that this doesn't run the functional tests.

4.5.1. More about cruising

The cruise target does a clean build and runs the Grails unit tests and the Java-only tests (for YAM etc.). Functional tests used to use Canoo (ant target webtest) but we couldn't get them to work with GWT, so they are now provided via the test-selenium target (which is not part of cruise because of the difficulty of configuring Selenium in headless mode cross-platform). The war target creates a WAR file which can be deployed by dropping it into a servlet container.

The old Canoo target (currently broken):

See ant help for more details.

Note that there are some bugs in ant target ordering (workaround for most of them is to run ant cruise once before doing anything else):

4.5.2. Selenium tests

Functional testing of CoW uses Selenium. Selenium uses various bits of JavaScript magic to enable you to remote-control a real web browser talking to your web application and verify that the results match expectations (e.g. that a page contains particular text, or that an alert box is shown with a particular string, etc.). The Selenium tests for CoW are contained in the directory gatewiki/cow/selenium in the distribution. There are ant targets to run the full test suite, or you can open individual tests in the Selenium IDE plugin in Firefox.

To run the Selenium tests in ant, do ant test-selenium. This will start up the server in workstation mode, run the workstation-mode test suite, then shut the server down, restart it in server mode, and run the server-mode test suite. If there is already a running CoW instance on localhost port 8080 then the tests will use that instead of running their own, but in that case you should only run the test suite that corresponds to the mode your server is running in (ant test-selenium-workstation or ant test-selenium-server), as the tests for the other mode will probably fail.
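If you script your test runs, the choice of ant target can be automated along these lines. The Python sketch below only checks whether something is listening on localhost port 8080 and maps a known mode to the target names given above; it cannot tell which mode a running server is in, so that is passed in by hand. The function names are hypothetical.

```python
import socket


def instance_running(host: str = "localhost", port: int = 8080,
                     timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ant_targets(running_mode=None):
    """Pick the ant target: the full run if no instance is up, otherwise
    only the suite that matches the running server's mode."""
    if running_mode is None:
        return ["test-selenium"]
    return [f"test-selenium-{running_mode}"]
```

A wrapper script might call `instance_running()` first and, if it returns true, ask the developer for the mode before invoking ant.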

Selenium runs your tests in a real web browser, so before running the tests with ant you will need to configure the browser that Selenium should use. Selenium supports various different browsers, though the current CoW test suite is only known to work reliably on Firefox. The default configuration (specified in gatewiki/cow/selenium/test-selenium.properties.default) runs the Firefox browser, and expects to find the firefox-bin executable on your path (Linux) or for Firefox to be installed in the default location (/Applications/Firefox.app on Mac OS, C:\Program Files\Mozilla Firefox\firefox.exe on Windows). If this is not the case you will need to create a test-selenium.properties file to override this default, containing the line:

browser=*firefox /path/to/your/firefox-bin

For example, on Ubuntu, there is no firefox-bin, instead the Firefox binary is called just firefox and lives in /usr/lib/firefox-{version}, see test-selenium.properties.ubuntu for an example.

Note that because of this additional configuration step, the Selenium tests are not run as part of ant cruise.

You may also have problems running Selenium from the command line if you have existing test artefacts in your CoW database. Try deleting all of these and re-running:

Firefox Profiles

The Selenium tests are run using a custom Firefox profile that allows us to pre-configure certain settings without changing the default user profile. If the tests access the app directly then the profile in gatewiki/cow/selenium/profile/normal is used. If, however, the tests are being passed through ratproxy for security testing then the profile in gatewiki/cow/selenium/profile/ratproxy will be used.

Full details on how to generate the profiles and associated certificates can be found here.

Developing new tests

To develop a new Selenium test it is easiest to use the Selenium IDE Firefox plugin. Start up a test instance of CoW in the relevant mode using ant run-test (with -Dgate.cow.mode=server if you are developing a server-mode test), then open up the Selenium IDE in Firefox and load the relevant test suite, cow/selenium/{workstation,server}-suite.html. You can add new tests to the suite; individual tests should be saved as HTML files in the cow/selenium/tests directory.

One of the gotchas is that e.g. the upload selenium test depends on the presence of certain files in the dev-user-home/.cowrc.d directory, which will only be present if you have deleted this and re-run cow in development mode since those files were created...

4.5.3. Upgrading to new Grails versions

4.5.4. Using the YAM Tests

The test suite uses a bunch of .yam and compiled .html files in cow/test/resources. After running the tests the script cow/bin/check-errors will check for failures and, when these are caused by incorrect translations, display a (tk)diff of the actual and the correct output.

Note: when updating to new versions of yam2html the test resource yam-minimal-no-includes.html needs to be selectively merged with yam-minimal.html. First update the latter to reflect the translation changes, then:

4.5.5. Grails/Spring/JSecurity and the upload function

There is currently a problem between Grails/Spring and the JSecurity plugin when uploading a file that we hope will be solved in future updates.

The problem is that when inside a Spring webflow JSecurity changes the MultipartHttpServletRequest into a JsecurityHttpServletRequest that we can't then get the uploaded file from (see JSecurity forum). The workaround we implemented involves reloading the main page after each upload action, which is a bad user experience but works...

4.6. Subversion versions

(This note is only relevant to those using local copies of an SVN tree. By the time you read this the version numbers have probably changed.)

Wiki areas in CoW are subversion sandbox directories. There are many different versions of the subversion Java and command line tools, and each version is associated with a particular format of the control files under the .svn directory in the sandbox. Generally speaking, later versions of subversion tools can read sandboxes created by earlier versions, but in doing so they transparently "upgrade" the sandbox to the newer format. Once this happens in a particular sandbox, that sandbox will no longer be readable by the earlier version.

CoW's subversion support is provided by the SVNKit library. At the time of writing we are using version 1.2.0, which works with the same working copy format as the 1.5 series svn command-line tool. This means that, for example:

If you intend to use CoW in workstation mode you must use a compatible command-line client, e.g. one from the 1.5 series. If your command-line client is a later version, e.g. 1.6, you will need to upgrade the svnkit JAR used by CoW to version 1.3.0 (which uses the 1.6 working copy format).

4.7. Configuring CoW

Configuration options in CoW are dealt with in the normal Grails fashion in a file called Config.groovy; to override these options create a file called .cowrc.groovy in your home directory.

For example, the following .cowrc.groovy would change the title and logo, and turn on the Sventon and Solr 3rd-party webapps:

/** Herein external CoW config. */
println "loading external user config; running in ${'pwd'.execute().text}"

gate.cow.name.short     = "CoW - dev mode"
gate.cow.name.long      = "CoW - dev mode, a Controllable Wiki"
gate.cow.logo           = "/g8/page/show/1/doc/larson-small.png"
gate.cow.sventon.run    = true
gate.cow.solr.run       = true

For more details on what this example is doing, see the site-specific layout section.

Note that because of this bug, CoW uses a slightly non-standard way of configuring the Grails DataSource. To modify data source settings you should edit the DataSource.groovy under the config directory, not the one in grails-app/conf. Hopefully we will be able to revert to using the normal mechanism when we next upgrade to a newer Grails release.

4.7.1. Serving robots.txt

To change the default /robots.txt (which does nothing) set gate.cow.robots.

4.8. CoW's Data Area

CoW stores all user data in a directory called .cowrc.d (on *NIX), which is by default in the user's home directory.

4.8.1. SVN Config Directory

The svnconfig directory stores files related to how SVN works within CoW. This is a standard SVN config directory (i.e. it is the same as ~/.subversion) but specifically configured for CoW.

Currently the one thing this configuration ensures is that .yam files have the LF line ending. Note that this is only true for files added or imported through CoW. If you are adding YAM files outside of CoW then you should manually ensure that they have the LF line ending applied by SVN.

4.9. Deploying and Running CoW

4.9.1. Starting and stopping

The easiest way to run (or deploy) CoW is via Grails. If you're developing then use Grails' run-app; for production use run-war. CoW's build file gives you access to these: the ant run-dev target does a Grails run-app and ant run-prod runs a Grails production Jetty instance on the CoW WAR.

Shutdown: ctrl-c will shut down Jetty when run from Ant or Grails. Ctrl-c of Ant does not, however, cleanly shut down a Jetty forked from Ant. For a clean shutdown with correct execution of all shutdown code, run ant shutdown-prod. You may supply an optional port on which Jetty will listen for the shutdown signal, and a password key to listen for, with -Djetty.shutdown.port and -Djetty.shutdown.key. The defaults are in cleartext in the build file.

Note that the first time the system runs it will create a .cowrc.d directory (or cowrc.d on Windoze) in your home directory containing help documentation, a DB etc. (The first time through this takes a couple of minutes as the way it sets up the new wiki areas is inefficient.)

Alternatively use ant war and deploy the result onto your favourite servlet container. There are these small disadvantages:

4.9.2. Portability of the .cowrc.d directory

If you want to move the data from a CoW installation into a different location, you need to do two things:

4.9.3. Apache 2 virtual hosts and CoW proxying

This section describes running multiple websites (e.g. http://gate.ac.uk/ and http://gatecloud.net) on a single physical server using Apache virtual hosts and proxying. Each CoW site runs on a different port and is proxied by the (single) Apache server. The configuration was tested on Ubuntu Intrepid, Apache 2.2.9, CoW 0.3.


Example virtual host definition from the sites-available directory:

# gate.ac.uk

<VirtualHost *:80>
  # copied from the default site set up by the debian installation
  ErrorLog              /var/log/apache2/error.log
  LogLevel              warn
  CustomLog             /var/log/apache2/access.log combined

  ServerName            gate.ac.uk
  ServerAlias           www.gate.ac.uk

  ProxyRequests         Off
  ProxyPass             /       http://localhost:8080/
  ProxyPassReverse      /       http://localhost:8080/
</VirtualHost>

(And the same for gatecloud.net (with a different port number of course), for example.)

Verifying the VH config:

# APACHE_RUN_USER=www-data APACHE_RUN_GROUP=www-data /usr/sbin/apache2 -S
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:80 is a NameVirtualHost
  default server (/etc/apache2/sites-enabled/000-default:1)
  port 80 namevhost (/etc/apache2/sites-enabled/000-default:1)
  port 80 namevhost gate.ac.uk (/etc/apache2/sites-enabled/gate.ac.uk:3)
  port 80 namevhost gatecloud.net (/etc/apache2/sites-enabled/gatecloud.net:3)
Syntax OK

When CoW is behind an Apache proxy in this way there is also the option to map other wiki areas (apart from area 1 or 2) to more "friendly" URL prefixes through the use of mod_rewrite rules. To make use of this, first do sudo a2enmod rewrite, then add the following to the virtual host configuration, before the ProxyPass directive:

RewriteEngine on

RewriteRule  ^/g8/page/show/4(/.*)?$  my-wiki$1 [R,L]
RewriteRule  ^/my-wiki(/.*)?$         /g8/page/show/4$1 [PT]

This will make wiki area 4 visible under /my-wiki. The second RewriteRule internally re-maps requests for pages under /my-wiki to their corresponding locations under /g8/page/show/4. However, CoW only sees the usual /g8/page/show URLs and has no knowledge of this mapping, so it will generate links that point to /g8/page/show/4. The first rule means that when such URLs are requested by a user's browser, they will be redirected back to the friendly /my-wiki URL instead.
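To see what the two rules do to a request path, the same patterns can be tried out with plain Java regular expressions. This is only a sketch of the pattern logic, not of how Apache evaluates rules; the class and method names are made up for the illustration.

```java
import java.util.regex.Pattern;

/** Illustrative only: mimics the path mapping of the two RewriteRules above. */
final class FriendlyUrlSketch {
    // First rule: a CoW-generated URL that the browser is redirected away from.
    private static final Pattern CANONICAL = Pattern.compile("^/g8/page/show/4(/.*)?$");
    // Second rule: a friendly URL that is re-mapped internally before proxying.
    private static final Pattern FRIENDLY = Pattern.compile("^/my-wiki(/.*)?$");

    /** Path that CoW ends up seeing for a requested path. */
    static String internalPath(String requestPath) {
        return FRIENDLY.matcher(requestPath).replaceFirst("/g8/page/show/4$1");
    }

    /** Path the browser is sent to when it requests a CoW-generated link. */
    static String redirectTarget(String requestPath) {
        return CANONICAL.matcher(requestPath).replaceFirst("/my-wiki$1");
    }

    public static void main(String[] args) {
        System.out.println(internalPath("/my-wiki/doc/index.html"));
        System.out.println(redirectTarget("/g8/page/show/4/doc/index.html"));
    }
}
```

Note that the optional group means a path such as /g8/page/show/42/... is left alone: after the 4 there must be either a slash or the end of the path.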

Open issues:


4.9.4. Production Deployment and Upgrade

When deploying GATEWiki these elements need to be borne in mind:

The following set of steps details a reasonably minimal recipe for deployment and upgrade of production servers. It uses G8RS.net (the GATE team's site for internal use) as an example. These instructions were written for GATEWiki 0.9 running on Grails 1.1.1 on Ubuntu Hardy and Apache 2.

And Bob's your uncle.

4.9.5. Deployment and runtime dependencies

This note discusses two issues:

  1. How do you package up a CoW site for initial deployment, and what do you do when you want to upgrade the software for a site?
  2. How should 3rd-party webapps like Sventon and Solr be configured and shared between CoW instances?

Note: for deployment and upgrade the discussion here is superseded by the preceding section.

Complicating factors include:

Runtime dependencies:

Design choices:

Taking the easy one first, we address issue 2 (how to share Sventon and Solr between CoW instances) by putting all site-specific config for these apps in the .cowrc.d directory, so the file trees are shareable across instances. (Sharing the servlets themselves is tricky because of their configuration dependencies on the wiki areas of the CoW instance.)

Re. issue 1., packaging for deployment and upgrading deployed sites, we have a basic solution in the create-custom-cow-site.sh script (see also the site-specific layout section). This has two modes

  1. creating a new deployment tree, including all the runtime dependencies
  2. updating an existing tree

In both modes these parameters are relevant:

The version/date is used as the basis for tagging SVN trees (so that you can return to the scene of this crime later if necessary) and to distinguish site and server deploys (both the server software tree and the site tree will include the date).

Mode 1., creating a whole new deployment tree:

The new tree is then tested and deployed on the remote server.

Mode 2., updating a deployed site:

The new CoW tree is tested, and then used to replace the running site.

Note that using mode 2 multiple sites can be deployed into a single main tree, thus sharing most of the runtime dependencies (e.g. Grails) across sites.

Note also that when these dependencies change (e.g. new version of Grails or Nutch etc.) mode 1 should be used to create a new complete tree. (If upload bandwidth is an issue this can be rsync'd with a copy of the existing deployment tree on the target machine.)

Saving space

At the time of writing we have the following raw sizes for the runtime dependencies discussed above (total ~755M):

$ du -sh ...
289M	cow
112M	grails
159M	gwt
171M	nutch-solr
24M	sventon

(Though note that most of this data is in SVN, hence twice as big as otherwise. The total without the SVN directories, and with some development data excluded, is a bit under 450M at the time of writing.)

There are lots of ways to reduce this total:

4.10. Structure, Naming and Code Conventions

4.10.1. Naming and Other Code Conventions

GATEWiki uses the GATE coding conventions.

TODO link to a publicly available copy of the conventions.

One thing in particular needs to be borne in mind when naming controllers: the namespace of controllers and the top-level wiki directory in the main sandbox conflict (and this is also true of static resources in the web-app directory). Therefore it is impossible to have /page or /css as directory names at the top of the main sandbox. New controller names (etc.) should be chosen appropriately (e.g. CowFooBarController).

4.10.2. Structure

CoW is made up of the following components (and numerous 3rd party libraries):

Main classes and Grails objects

The CoW webapp is organised around the concept of wiki/content areas, which are file trees stored in SVN. What CoW then provides is

CoW is implemented using Grails. The Grails MVC model works like this:

Or, put another way, the Grails web application architecture has four main components:

In CoW we have:


4.11. Authentication and Authorisation

Important note: a new GATE wiki installation has a default user, scott. Scott is an administrator: he can do anything. He has the traditional and well known password, tiger. Before doing anything else, you must:

Don't forget to delete scott

4.11.1. Grails JSecurity plugin

Authentication and authorisation uses the JSecurity plugin for Grails.

Note that the basic security model installed by the plugin has been adapted:

4.11.2. Grails JCaptcha plugin

User registration attempts to prevent registration of bots, using the JCaptcha plugin for Grails:

4.11.3. Users, roles, permissions, actions

Security is based around users, roles, and permissions.

Users are easy: they are you, me, your neighbour. Someone who is using the wiki. Roles are groups of users. How you decide to group your users is up to you. But most likely, they will be groups with some common functional requirement, such as "editors" or "reviewers". Or maybe "Team X". Roles do not themselves have any automatic rights to do anything. The rights must be assigned to a role (or to an individual user, if you really want to). Rights are defined separately from roles. Once defined, they can be assigned to any role.

Rights are defined as "permissions". In the default JSecurity plugin installation, a permission defines an action and a controller. You can assign a permission, i.e. the right to use an action on a controller, to a role or user. These types of permission aren't sufficient for GATE Wiki, where lots of wikis may be served by a single controller, and where we may need more fine grained control over a wiki's directory structure.

So, instead we use CowPermissions. These define:

Inclusion of the page controller in this model means that you can define access to any arbitrary controller, not just wiki pages. This might be useful if you are installing a grails plugin into the wiki.

An assigned permission has one controller, and one named set of actions, and any of the actions in that set can be carried out if you have that permission. You might, for example, define sets of actions on the page controller that are needed for read access to a wiki, or for write access. For fine grained control, you could define a set to contain a single action.
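As a rough picture of this model, here is a minimal sketch in Java. It is not the real CowPermission class: the class name, the field layout, the action names used in the usage note ("show", "edit") and the treatment of "*" as a wildcard are all assumptions made for illustration, based only on the description in this section.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Illustrative model only; not the real CowPermission class. */
final class PermissionSketch {
    static final String ANY = "*";

    final String wiki;          // wiki area id, or "*" for every wiki (assumed)
    final String controller;    // e.g. "page", or "*" for every controller
    final Set<String> actions;  // members of the named action set; "*" means all

    PermissionSketch(String wiki, String controller, String... actions) {
        this.wiki = wiki;
        this.controller = controller;
        this.actions = new HashSet<>(Arrays.asList(actions));
    }

    /** True if this single permission covers the requested action. */
    boolean permits(String wiki, String controller, String action) {
        return matches(this.wiki, wiki)
            && matches(this.controller, controller)
            && (actions.contains(ANY) || actions.contains(action));
    }

    private static boolean matches(String configured, String requested) {
        return ANY.equals(configured) || configured.equals(requested);
    }

    /** A user may act if any permission held directly or via a role permits it. */
    static boolean anyPermits(Collection<PermissionSketch> held,
                              String wiki, String controller, String action) {
        for (PermissionSketch p : held) {
            if (p.permits(wiki, controller, action)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, new PermissionSketch("4", "page", "show") would allow only the hypothetical show action on the page controller of wiki 4, while new PermissionSketch("*", "*", "*") would allow everything.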

Permissions take two forms:

Access control proceeds as follows:

See this thread for a useful description of how permissions work.

Directory level authorisation

For actions of the page controller, GATEWiki permissions give authorisation control at the level of directories in the wiki. Permissions for specific directories are defined in terms of Java regular expressions (see also this Java tutorial).

Each permission has two directory regular expressions:

To understand how the two patterns interact, and how all the patterns on the several permissions a user may have interact, you need to know the detail of directory permission checking.

  1. For access to a given directory or file, all of the permissions configured for a user and all of their roles will be checked
  2. If any configured permission gives access, then the user has access, regardless of any other permissions
  3. For each configured permission, access is checked for the path of the directory relative to the wiki i.e. that part of the URL after the wiki name up to but excluding the filename. We will call this "the directory".
  4. For each of these configured permissions, access is given if both of the following are true:
    • the include pattern matches the entire directory to which access is required
    • the exclude pattern does not match the entire directory to which access is required

The basic rules are:

The last two points can be summarised in this truth table:

                            Include pattern: Match    Include pattern: No Match
Exclude pattern: Match      DENY                      DENY
Exclude pattern: No Match   ALLOW                     DENY
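The checking steps and the truth table above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not CoW's actual code: the class and method names are invented, and the way the directory is cut out of the path (everything up to and including the last slash, with no leading slash) is a guess based on the example paths in this section.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: include/exclude checking as described above. */
final class DirectoryAccessSketch {

    /** One configured permission, reduced to its two directory patterns. */
    static final class DirRule {
        final Pattern include;
        final Pattern exclude;

        DirRule(String include, String exclude) {
            this.include = Pattern.compile(include);
            this.exclude = Pattern.compile(exclude);
        }

        /** ALLOW only if include matches the whole directory and exclude does not. */
        boolean allows(String directory) {
            return include.matcher(directory).matches()
                && !exclude.matcher(directory).matches();
        }
    }

    /** Drop the file name: "a/b/page.html" becomes "a/b/" (assumed convention). */
    static String directoryOf(String pathRelativeToWiki) {
        int slash = pathRelativeToWiki.lastIndexOf('/');
        return slash < 0 ? "" : pathRelativeToWiki.substring(0, slash + 1);
    }

    /** Access is given as soon as any one configured permission allows it. */
    static boolean hasAccess(List<DirRule> rules, String pathRelativeToWiki) {
        String directory = directoryOf(pathRelativeToWiki);
        for (DirRule rule : rules) {
            if (rule.allows(directory)) {
                return true;
            }
        }
        return false;
    }
}
```

Because matches() tests the pattern against the entire directory string, a pattern such as foo on its own would only match a directory that is exactly "foo", which is why most useful patterns begin or end with .* or similar.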

Some example patterns

It is important to get your patterns right - otherwise you may give unexpected access. Note the following about paths:

Pattern Note Meaning as an include pattern Meaning as an exclude pattern
Give access to all directories Deny access to all directories
This is an empty string Give access to no directories Deny access to no directories

Give access to any directory path containing foo e.g. foo/bar/, foobar/, bar/foo/, bar/foobar/bar/

Deny access to any directory path containing foo (examples above)

Give access to any directory path with a leaf directory foo e.g. bar/foo/, but not bar/foobar/ or foo/bar/

Deny access to the above

Give access to any directory path with any subdirectory equal to foo, e.g. bar/foo/, bar/foo/bar/, but not bar/foobar/

Deny access to the above

Give access to any directory path with a top level directory equal to foo, e.g. foo/, foo/bar/, but not bar/foo/ or bar/foobar/

Deny access to the above
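If you want to experiment before configuring a real permission, whole-string matching is easy to try in Java. The patterns below are our own guesses, written to fit the meanings described in the table above; they are not necessarily the patterns the table originally listed, and the class name is invented.

```java
import java.util.regex.Pattern;

/** Illustrative only: candidate patterns for the meanings described above. */
final class PatternExamplesSketch {
    // Hypothetical patterns; check them against your own directory layout.
    static final String ALL = ".*";                    // every directory
    static final String CONTAINS_FOO = ".*foo.*";      // "foo" anywhere in the path
    static final String LEAF_FOO = "(.*/)?foo/";       // last directory is exactly foo
    static final String ANY_LEVEL_FOO = "(.*/)?foo/.*"; // some directory is exactly foo
    static final String TOP_LEVEL_FOO = "foo/.*";      // first directory is exactly foo

    /** Same test as directory checking: the pattern must match the whole string. */
    static boolean matchesWhole(String regex, String directory) {
        return Pattern.matches(regex, directory);
    }

    public static void main(String[] args) {
        String[] directories = {"foo/", "foo/bar/", "bar/foo/", "bar/foobar/", "bar/foo/bar/"};
        String[] patterns = {ALL, CONTAINS_FOO, LEAF_FOO, ANY_LEVEL_FOO, TOP_LEVEL_FOO};
        for (String pattern : patterns) {
            for (String directory : directories) {
                System.out.println(pattern + "  " + directory + "  "
                    + matchesWhole(pattern, directory));
            }
        }
    }
}
```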

Some example pattern uses

Scenario Example use case Example include pattern Example exclude pattern
Access all directories General user
Allow access to just one directory in a wiki A leaf directory for external users in an otherwise closed site
Deny access to a specific directory in a wiki A leaf directory not accessible to most users .*\/privileged\/\z

Access to non-page controllers via CowPermission control

The SecurityFilter and the Jsecurity code give access to wiki pages via the "page" controller. They are also configured at bootstrap to give some access to some other controllers. These are:

User and password constraints

Constraints on users are enforced by a UserCommand and PasswordCommand, which are used in a couple of places. Together, they define more fields than JsecUser (e.g. a repeat password, constrained to be the same as the password), but can be bound to JsecUser. Note that the UserCommand is not able to enforce a unique name constraint, which is done by JsecUser.

Security in workstation and server modes

4.11.4. Pre-defined security objects

The above gives an abstract view of how security works. In a default GATE wiki installation, there will be several pre-defined security objects that you can use to get your installation up and running.

Type Object Description
Role administrator

This role has a single pre-defined permission, which allows it access to wiki * with the action set called *, which contains all actions, and to the controller *. In other words, members of this role can do anything. There is a single initial member by default, scott (see below), who you should delete.

Role default

All new registrants are automatically made members of this role. This role has no pre-defined permissions. So, by default, new registrants are given access to nothing. But you could assign a permission to this role, so that new registrants get access to something.

Role anonymous

This role is treated differently to the others. The default bootstrap permissions will allow anyone access to something permitted to this role, regardless of whether they are a member of this role or whether they are logged in. In effect, all users, logged in or not, are members of this role. By default, this role has a permission, which allows read-only access to the help wiki, and permissions for the jcaptcha, register, and user controllers, all of which are needed for registration etc.

Role help read This role has a single permission, giving read access to the help wiki.
Role help read write This role has a single permission, giving read write access to the help wiki.
Role main read This role has a single permission, giving read access to the main wiki.
Role main read write This role has a single permission, giving read write access to the main wiki.
Role wiki N read

This role is created by default for any new wiki N. It has a single permission, giving read access to wiki N.

Role wiki N read write

This role is created by default for any new wiki N. It has a single permission, giving read and write access to wiki N.

User scott

The sole default member of the administrator group, with the usual password. You should create a new admin user, log in as that new user, and delete scott.

Action set Read

A named set of all the actions that are required to give read access to a wiki. Used when constructing new permissions.

Action set ReadAndWrite

A named set of all the actions that are required to give read and write access to a wiki. Used when constructing new permissions.

Action set * A named set of all the actions. Used when constructing new permissions.
Controllers page, jcaptcha, register, user, * Controllers used by the default roles.


4.11.5. Tag library

In addition to the tags defined by Jsecurity (note: do not use jsec:principal, see below), the following are available in SecurityTagLib:

See the taglib for fuller documentation.

4.11.6. Code dependencies between Jsecurity and GATE Wiki

There are very few! Of course, there are lots of pages specific to security and its administration. But it shouldn't be too hard to tease these apart from the core wiki.

4.11.7. Known vulnerabilities and avoiding them

4.12. Site- and Wiki-Specific Layout and Navigation

CoW's look and feel is a masterpiece of minimalist aesthetics, the like of which is seldom seen in this age of extravagant waste, technological frenzy and general fluffiness. Everyone who uses CoW will inevitably wish to preserve its wonderful good looks completely unchanged. If, however, the evils of the international capitalist conspiracy force you kicking and screaming to do something different, CoW provides several ways to do site-specific (and wiki area-specific) layout and configuration, from simple things like changing the title up to a complete rethink using a Grails (Sitemesh-based) layout. If you want to make your CoW quack like a duck or stink like a skunk you're going to be in pig heaven.

CoW provides navigation (sets of links to parts of a site organised as menus) that can be tailored on a per-area or per-directory basis, plus layout (CSS styling, page structure, etc.) that can be tailored on a per-site or per-area basis.

4.12.1. Navigation

Navigation that applies to the whole site or to a whole wiki area is probably best put into a Grails layout plugin - see replacing the main layout below. In other cases, e.g. navigation that is different for different parts of a wiki area, CoW allows you to create lists of links in normal wiki pages (that follow a configurable naming convention) and will use these to create navigation menus.

For the impatient: create a file called leftBar.html containing a list of links, e.g.

(Don't put a heading in the file.) Then all pages under that directory will have a left navigation menu containing these items.

More detail:

Adding navigation to a directory tree involves adding YAM files to the top of the tree that contain the links for whichever of various screen areas that you want to contain them (e.g. top bar, left bar, right bar, footer). These files have to be named after the corresponding DIV elements of the main layout (in the default CoW layout they are called header, leftBar, rightBar and footer; in a CoW running a custom layout they may have different names). (Note to layout writers: the DIVs that get replaced with custom navigation must be at the top level of the BODY element in your layout.) Because navigation files are inherited by subtrees, if you have any subdirectories below a navigation file, the links in the file will have to be absolute (i.e. start with a "/").

This type of navigation is dealt with at rendering time so that we don't lose the ability to work with YAM files outside of CoW. So at the place where CoW reads the body of a YAM-derived HTML file it takes a) the top directory of the current wiki area tree and b) the current directory (implicit), and then it steps up the tree looking for all the layouts that are specified in the config (under navigation.files). Each of those present is added to the page model in PageController.show, and these become the contents of the relevant parts of the main layout (e.g. header, leftBar, footer and so on).
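The lookup described above can be sketched as follows. This is not the code in PageController.show: the class and method names are invented, the ".html" suffix follows the leftBar.html example earlier in this section, and the rule that the file nearest to the current directory wins is an assumption. The existence check is passed in as a predicate so that the walk can be tried without touching a real sandbox.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only: step up the tree looking for navigation files. */
final class NavigationLookupSketch {

    /**
     * For each navigation name (e.g. "leftBar"), find the nearest file called
     * name + ".html" at or above currentDir, without going above areaTop.
     */
    static Map<String, Path> findNavigation(Path areaTop, Path currentDir,
                                            List<String> names, Predicate<Path> exists) {
        Map<String, Path> found = new LinkedHashMap<>();
        Path top = areaTop.normalize();
        Path dir = currentDir.normalize();
        while (dir != null && dir.startsWith(top)) {
            for (String name : names) {
                Path candidate = dir.resolve(name + ".html");
                // Keep the first hit only, so deeper files override inherited ones.
                if (!found.containsKey(name) && exists.test(candidate)) {
                    found.put(name, candidate);
                }
            }
            dir = dir.getParent();
        }
        return found;
    }

    public static void main(String[] args) {
        Path top = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath();
        Path current = args.length > 1 ? Paths.get(args[1]).toAbsolutePath() : top;
        List<String> names = Arrays.asList("header", "leftBar", "rightBar", "footer");
        System.out.println(findNavigation(top, current, names, p -> Files.isRegularFile(p)));
    }
}
```

The resulting map corresponds to the entries that get added to the page model, one per navigation area for which a file was found.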

4.12.2. Replacing the Main Page Layout

If you want to replace the entire look of the site (or a particular wiki area), then you need to create a Grails plugin and supply a Sitemesh layout, as below. There's an example plugin that sets up CoW to use the University of Sheffield house style at gatewiki/site-plugins/nlp (and various others there).


Another example: to create a "g8rs" plugin and package the whole tree for deployment:

cd gatewiki/site-plugins
GRAILS_HOME=../grails ../grails/bin/grails create-plugin g8rs
cd g8rs
GRAILS_HOME=../../grails ../../grails/bin/grails create-controller \
... now edit grails-app/views/layouts/cowguest.gsp ...
GRAILS_HOME=../../grails ../../grails/bin/grails package-plugin
cow/bin/create-custom-cow-site.sh -n g8rs -p `pwd`/site-plugins/g8rs

(See also the cow/bin/site-plugin.sh example script.)

4.12.3. Changing the Title or Logo

If you just want to change a few small things like the name of the site (which gets put in the page titles) or the main logo, then you just need to supply an external configuration file in gate.cow.user.home (which defaults to your operating system's HOME directory if not set explicitly). There's an example in gatewiki/cow/dev-user-home/.cowrc.groovy.

Changing the title:

gate.cow.name.short   = "CoW - dev mode"
gate.cow.name.long    = "CoW - dev mode, a Controllable Wiki"

Changing the main logo:

gate.cow.default.logo = "/g8/page/show/2/my-logo.png"

(The PNG file then needs to be uploaded to Wiki Area 2, in this case.)

Note that the logo needs to be somewhere that can be read by the anonymous user, otherwise it won't display on the login page. You can do this either by placing the logo in an anonymous-read wiki, or by giving the anonymous user read access to a single folder in another wiki.

4.12.4. Dealing with non-native HTML pages

Non-native HTML pages are those which are not generated from YAM markup, i.e. are not really managed by CoW as wiki pages. CoW will serve any old HTML page that a user happens to upload into a sandbox (or that is added by any 3rd-party SVN client). This is ok if the HTML is from a trusted source, but not so ok when it may contain arbitrary javascript, for example. There's no simple answer to this issue, so CoW provides several ways to control the serving of non-native HTML, but they are turned off by default.

There are two options.

Option 1 is to allow users to specify that for a particular directory any non-native HTML pages will be served raw in their entirety (but see below about insecurity!).

Note the obvious security hole: if any user can upload any HTML and get it served raw by CoW all manner of nasty attacks become possible. Don't allow users to do this unless you're very confident in their intentions (and basic technical skills, e.g. to avoid viruses and so on). If you are ok with all this, the method is:

Option 2 is to specify (as an administrator via the Admin pages) a set of path patterns that apply to non-native HTML pages and again cause them to be served raw. This option is implemented in a similar way to the directory level authorisation. Administrators, while setting up a wiki, can specify which directories are allowed or disallowed for serving of non-native HTML files.

Each wiki area has two regular expressions for this purpose:

To understand how the two patterns interact and for details of directory permissions checking please refer to the directory level authorisation section. If the directory matches the include pattern and is not excluded, the HTML pages are served raw; otherwise a permission denied message is shown to the user.

If you then make the directories from which the raw HTML is served read-only, then you're secure.

Both options are available to the administrator. A user is given permission to see a non-native HTML file if the directory containing it is granted access by at least one of the two options.

To iframe or not to iframe?
In general the resultant page contains an iframe that pulls in the raw HTML page. This is good for things like Javadocs etc., but in some cases you may want to just serve the body of the page instead (e.g. the GATE user guide). If a file called .cow-no-iframe is present in the directory containing the HTML page then the iframe will be omitted.

4.13. Referencing and regeneration

The problem is that a create/rename/delete/modify operation on a .yam should be cascaded through any files that link to or include it. The relevant data is what a file "linksTo" and "includes". Regeneration is needed:

This information needs to be maintained in a single dependency graph for each Wiki area. This is coded in class Dependencies.

If we can guarantee that all necessary regeneration and dependency maintenance is done whenever any generation is done, then the graph can be assumed to be consistent. In workstation mode that's an open issue because of updates from other tools but in the worst case we can periodically or on request regenerate all the output files to ensure consistency.
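To make the idea of a per-area dependency graph concrete, here is a minimal sketch. It is not the Dependencies class: the names are invented, "linksTo" and "includes" are lumped together as plain references, and following dependents transitively is an assumption that errs on the side of regenerating too much.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: one graph per wiki area, keyed by file path. */
final class DependencyGraphSketch {
    // file -> files it links to or includes
    private final Map<String, Set<String>> references = new HashMap<>();
    // reverse index: file -> files that link to or include it
    private final Map<String, Set<String>> dependents = new HashMap<>();

    /** Record the current references of a file, replacing any earlier ones. */
    void setReferences(String file, Collection<String> targets) {
        Set<String> old = references.remove(file);
        if (old != null) {
            for (String target : old) {
                Set<String> users = dependents.get(target);
                if (users != null) {
                    users.remove(file);
                }
            }
        }
        Set<String> fresh = new LinkedHashSet<>(targets);
        references.put(file, fresh);
        for (String target : fresh) {
            dependents.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(file);
        }
    }

    /** Files to regenerate after 'changed' is created, renamed, deleted or modified. */
    Set<String> toRegenerate(String changed) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String file = queue.poll();
            Set<String> users = dependents.getOrDefault(file, Collections.<String>emptySet());
            for (String user : users) {
                // result.add is false for files already seen, which also stops cycles.
                if (!user.equals(changed) && result.add(user)) {
                    queue.add(user);
                }
            }
        }
        return result;
    }
}
```

Calling setReferences every time a file is generated is what keeps such a graph consistent, which is the guarantee discussed above.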

Referencing events

Event sources

Possible event handlers (using and maintaining a dependency graph)

The current design is to use YamFile.generate.

4.13.1. Serialization

Dependencies are serialized to the directory conf.gate.cow.dbs, one file per wiki. They are serialized on shutdown and periodically during uptime. Periodic serialization is carried out (once a minute) by DependenciesJob. They are deserialized lazily, as required.

4.13.2. Regeneration

Dependencies for a single wiki are regenerated on four occasions:

4.13.3. Speed of Running SVN Status

As noted above, YAM files managed by the wiki can be changed in several ways:

In each of these cases we need to regenerate any files that are dependent on the changes (e.g. a .html that is older than its .yam; a YAM file which refers to a wiki page which no longer exists; etc.).
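The simplest of these checks, an .html that is older than its .yam, might look like the sketch below. The names are invented, and it assumes that the generated .html sits next to its .yam with the same base name; it says nothing about dangling references, which need the dependency graph.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Illustrative only: timestamp-based staleness check for one YAM file. */
final class StalenessSketch {

    /** "dir/page.yam" becomes "dir/page.html" (assumed naming convention). */
    static Path htmlFor(Path yam) {
        String name = yam.getFileName().toString();
        String base = name.endsWith(".yam") ? name.substring(0, name.length() - 4) : name;
        return yam.resolveSibling(base + ".html");
    }

    /** Stale if there is no output yet, or the output predates the source. */
    static boolean isStale(FileTime yamModified, FileTime htmlModified) {
        return htmlModified == null || htmlModified.compareTo(yamModified) < 0;
    }

    /** Filesystem wrapper around isStale for a real .yam file. */
    static boolean needsRegeneration(Path yam) throws IOException {
        Path html = htmlFor(yam);
        FileTime htmlTime = Files.exists(html) ? Files.getLastModifiedTime(html) : null;
        return isStale(Files.getLastModifiedTime(yam), htmlTime);
    }
}
```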

There are a number of strategies that might be adopted to determine what needs regenerating:

  1. a low-priority process that inspects the filesystem directly
  2. a listener that receives notifications from the SVN repository
  3. a process that runs svn st to get a report on files that have changed

For option 3., the statistics in this table are interesting. They give an indication of how long it takes to do an svn st on the SALE tree (which is more than 5Gb of data in more than 100,000 files).

Command                                  Run    real       user      sys
svn st on the sale tree (from cold)      First  0m58.839s  0m5.236s  0m3.576s
svn st on the sale tree (from cold)      Tenth  0m2.380s   0m1.604s  0m0.504s
svn st -u on sale (filesystem cached)    First  0m33.331s  0m3.772s  0m0.852s
svn st -u on sale (filesystem cached)    Tenth  0m31.082s  0m3.280s  0m0.972s

The local status check (which doesn't go to the network to check on repository changes) takes around a minute from cold, but subsequently (when the operating system has had a chance to cache parts of the file system) only takes a couple of seconds. The network check (svn st -u) stays pretty constant around 30 seconds.

(Tests done on a 2GHz Pentium M with 1Gb RAM running Ubuntu Dapper on a broadband connection.)

4.14. Sourceforge notes

4.15. Grails MVC notes

4.16. IntelliJ notes

4.17. Search infrastructure

CoW uses a combination of Nutch and Solr for indexing and search of wiki areas.

Nutch is good at crawling and has several plugins that can be used to extract the text from different types of documents. Solr, on the other hand, is an enterprise search server with a very good search API. The only downside of Solr is that it requires all documents to be in XML format for them to be indexed. Fortunately, Nutch has a plugin to convert the crawled data into the appropriate XML format. The combination of the two provides a complete search system: Nutch is responsible for crawling intranets/websites, converting documents into the appropriate XML and uploading the content to Solr, while Solr is responsible for maintaining the indices and providing a powerful search API for users to search within them.

4.17.1. Installation

Directory structure

Enabling Solr

By default Solr is disabled. To tell CoW to run Solr (when being run via Grails) set gate.cow.solr.run to true in the config. Also make sure that gate.cow.thirdparty is set to true.

4.17.2. Indexing wikis/individual files from CoW

Indexing individual Wiki areas

Indexing individual files

Whenever a file is created, uploaded or edited, it is queued for indexing. One of the Quartz jobs is dedicated to indexing: it runs every five minutes and indexes any documents it finds in the queue. If a document is already indexed, it is deleted and reindexed.
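The queue-and-drain behaviour described above can be sketched like this. It is not CoW's indexing job: the class names and the SearchIndex interface are invented stand-ins for the real Solr client, and collapsing repeated edits of the same file into one queue entry is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: files wait in a queue until a periodic job indexes them. */
final class IndexQueueSketch {

    /** Hypothetical stand-in for the search server. */
    interface SearchIndex {
        boolean contains(String path);
        void delete(String path);
        void add(String path);
    }

    /** Trivial in-memory index that records what was done to it. */
    static final class InMemoryIndex implements SearchIndex {
        final Set<String> documents = new HashSet<>();
        final List<String> log = new ArrayList<>();

        public boolean contains(String path) { return documents.contains(path); }
        public void delete(String path) { documents.remove(path); log.add("delete " + path); }
        public void add(String path) { documents.add(path); log.add("add " + path); }
    }

    private final Set<String> pending = new LinkedHashSet<>();

    /** Called whenever a file is created, uploaded or edited. */
    synchronized void enqueue(String path) {
        pending.add(path);
    }

    /** Called by the periodic job; returns the number of documents indexed. */
    int drainInto(SearchIndex index) {
        List<String> batch;
        synchronized (this) {
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        for (String path : batch) {
            if (index.contains(path)) {
                index.delete(path);  // already indexed: delete, then reindex
            }
            index.add(path);
        }
        return batch.size();
    }
}
```

In CoW the draining is done by the Quartz job every five minutes; for an experiment, calling drainInto from any timer (for example a ScheduledExecutorService) would behave the same way.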

4.17.3. Testing

There is now a separate test suite (selenium/solr-suite.html) for testing Solr indexing and search. Please note that running the Selenium tests does not test the Solr functionality; use the test-selenium-solr ant target to invoke the Solr tests.

4.17.4. Debugging

If you get no results for your queries, first make sure that you have followed the correct procedure for indexing the relevant wiki area (see the Indexing section). If you are certain that you followed all the steps described there and there are still no results, follow these instructions to debug:

4.18. SVN Browsing

GATEWiki layers on top of SVN in order to take advantage of SVN's mature and sophisticated support for collaborative concurrent editing, versioning, branching, etc. It also allows us to profit from all the 3rd-party tools that support SVN, and in particular to provide browsing of wiki pages (and other content) and their history.

We looked at a number of tools to provide this function, and evaluated Sventon, OpenGrok and ViewCVS. All have their strengths and weaknesses for our purposes, but Sventon was the best fit (Java, open source, not specific to browsing code repositories).

4.18.1. Notes on Sventon in CoW

CoW uses an unpacked version of Sventon's svn.war with an additional config file which:

CoW configures Sventon at runtime to be able to browse the subversion repositories corresponding to the configured wiki areas. Authentication settings are propagated from CoW into Sventon, so if a remote repository requires authentication Sventon will authenticate using the same settings as CoW would use to access the corresponding wiki.

To upgrade to a new version of Sventon, unpack it into gatewiki/sventon, change the web-app link to point to it, and follow the instructions in the README file.

To tell CoW to run Sventon (when being run via Grails) set gate.cow.sventon.run to true in the config.

4.19. Extending YAM with Plugins

YAM has a plugin mechanism that involves writing a Java (or Groovy) class that implements the YamPlugin interface. Drop this class (which must be in the package gate.yam.plugins) into the classpath of the application hosting the YAM translator (e.g. GATEWiki) and then you can make calls as described in the YAM documentation.

See e.g. the implementation of the Twitter plugin for inspiration.

One of the things this is useful for is dropping snippets of HTML into YAM pages (e.g. for Google site search, Twitter updates etc.). (We removed the ability that older versions of YAM had to do this directly because of the security implications, XSS attacks and so on.)


  1. SVN: the Subversion version control system.
  2. Except when in workstation mode.
  3. The link text can contain "inline" markup such as %image, %cite or *bold*/_italic_, but not block markup such as tables
  4. This is a footnote.
  5. The file upload test needs to programmatically fill in a file upload form field and this is only possible in a special "elevated privileges" mode using the *firefox launcher