GATEWiki: User and Developer Guide
In a hurry? See the quick start section.
Contents
- 1. Introduction
- 2. Overview
- 3. Information for Users
- 3.1. Quick Start
- 3.1.1. Files, directories and links
- 3.1.2. Creating and editing pages
- 3.1.3. Deleting and copying
- 3.1.4. Upload other types of files
- 3.1.5. Finding things
- 3.1.6. Raw HTML pages vs. wiki pages
- 3.2. Modes
- 3.3. Register, Log in
- 3.4. Create a new wiki page or directory
- 3.5. Edit a wiki page
- 3.6. Searching
- 3.7. Report a bug
- 3.8. The YAM Markup Language
- 3.8.1. YAM Summary
- 3.8.2. YAM Syntax and Usage
- 3.8.2.1. Introduction
- 3.8.2.2. Contents
- 3.8.2.3. Bold, italic, underline and teletype
- 3.8.2.4. Horizontal lines
- 3.8.2.5. Lists
- 3.8.2.6. Verbatim output
- 3.8.2.7. Footnotes
- 3.8.2.8. Escapes
- 3.8.2.9. Titles and metadata
- 3.8.2.10. Headings
- 3.8.2.11. Links and anchors
- 3.8.2.12. Block quotations
- 3.8.2.13. Line breaks
- 3.8.2.14. Tables
- 3.8.2.15. Images
- 3.8.2.16. Citations
- 3.8.2.17. Inclusion
- 3.8.2.18. Non-breaking space
- 3.8.2.19. Comments
- 3.8.2.20. Plugins
- 3.8.2.21. Changes from version 3
- 3.9. LaTeX Support
- 4. Information for Developers and Administrators
- 4.1. Roadmap
- 4.2. Checking out the code
- 4.3. Selecting Modes
- 4.4. Create a new wiki space
- 4.4.1. Authentication settings
- 4.4.2. Setting wiki regeneration
- 4.4.3. Setting wiki updates
- 4.5. Building and testing CoW
- 4.5.1. More about cruising
- 4.5.2. Selenium tests
- 4.5.2.1. Firefox Profiles
- 4.5.2.2. Developing new tests
- 4.5.3. Upgrading to new Grails versions
- 4.5.4. Using the YAM Tests
- 4.5.5. Grails/Spring/JSecurity and the upload function
- 4.6. Subversion versions
- 4.7. Configuring CoW
- 4.7.1. Serving robots.txt
- 4.8. CoW's Data Area
- 4.8.1. SVN Config Directory
- 4.9. Deploying and Running CoW
- 4.9.1. Starting and stopping
- 4.9.2. Portability of the .cowrc.d directory
- 4.9.3. Apache 2 virtual hosts and CoW proxying
- 4.9.4. Production Deployment and Upgrade
- 4.9.5. Deployment and runtime dependencies
- 4.9.5.1. Saving space
- 4.10. Structure, Naming and Code Conventions
- 4.10.1. Naming and Other Code Conventions
- 4.10.2. Structure
- 4.10.2.1. Main classes and Grails objects
- 4.11. Authentication and Authorisation
- 4.11.1. Grails JSecurity plugin
- 4.11.2. Grails JCaptcha plugin
- 4.11.3. Users, roles, permissions, actions
- 4.11.3.1. Directory level authorisation
- 4.11.3.2. Access to non-page controllers via CowPermission control
- 4.11.3.3. User and password constraints
- 4.11.3.4. Security in workstation and server modes
- 4.11.4. Pre-defined security objects
- 4.11.5. Tag library
- 4.11.6. Code dependencies between Jsecurity and GATE Wiki
- 4.11.7. Known vulnerabilities and avoiding them
- 4.12. Site- and Wiki-Specific Layout and Navigation
- 4.12.1. Navigation
- 4.12.2. Replacing the Main Page Layout
- 4.12.3. Changing the Title or Logo
- 4.12.4. Dealing with non-native HTML pages
- 4.13. Referencing and regeneration
- 4.13.1. Serialization
- 4.13.2. Regeneration
- 4.13.3. Speed of Running SVN Status
- 4.14. Sourceforge notes
- 4.15. Grails MVC notes
- 4.16. IntelliJ notes
- 4.17. Search infrastructure
- 4.17.1. Installation
- 4.17.1.1. Directory structure
- 4.17.1.2. Enabling Solr
- 4.17.2. Indexing wikis/individual files from CoW
- 4.17.2.1. Indexing individual Wiki areas
- 4.17.2.2. Indexing individual files
- 4.17.3. Testing
- 4.17.4. Debugging
- 4.18. SVN Browsing
- 4.18.1. Notes on Sventon in CoW
- 4.19. Extending YAM with Plugins
1. Introduction
Files and directories, documents and folders, disks and memory sticks, laptops and games machines, TV time-shifters and corporate IT systems. Data data everywhere, and never a drop to drink, as the Ancient Mariner could not have dreamt of saying. Wouldn't it be nice to be able to view your filesystem as a website, to be able to edit from multiple machines with and without network connections, to be able to have your own local copy and also share it with friends and colleagues, and not have to worry about merging it all back together?
GATEWiki, or CoW, is a "Controllable Wiki" and CMS that supports collaborative document creation with asynchronous off-line editing. CoW is designed to make it easy to add interaction to static websites, and to support concurrent editing and off-line working with straightforward synchronisation (using Subversion). The system also serves as a test-bed for experiments in controlled languages for round-trip ontology engineering (from the GATE project: http://gate.ac.uk/).
GATEWiki is based on Grails, Groovy, Subversion, Selenium and Java, and is hosted on SourceForge in the GATEWiki project.
2. Overview
Wikis are about allowing groups of people to create and edit sets of interlinked web pages with minimal effort and minimal learning of software conventions and features. Typically they achieve this by
- having an edit link from each page that runs a browser-based editor
- allowing pages to be written in simple plain text with a small number of formatting conventions (e.g. *bold*)
- supporting creation of pages by directing links for non-existent pages to a create page
- allowing multiple users to use the system simultaneously (and sometimes to edit the same document simultaneously)
Content Management Systems (CMSs) provide persistence, versioning, metadata management, upload and browsing of sets of documents and other resources.
CoW is different from other wikis and CMSs because:
- it is designed from the ground up to support concurrent editing and off-line working with straightforward synchronisation using SVN1
- it uses the YAM language, which
- outputs LaTeX as well as HTML
- allows paths as links (i.e. does not limit the namespace to a single directory like e.g. JSPWiki does) and consequently allows a tree-structured page store (and later graph-structured navigation via an ontology)
- it allows mixing of all types of files in its page store (which is just an SVN sandbox, in fact)
- it supports versioning and differencing via SVN, and allows other tools that manipulate SVN repositories to be used with the wiki data (e.g. Tortoise, Eclipse, ViewVC, etc.)
- it supports embedded CLOnE (Controlled Language for Ontology Editing), and therefore the GATE team's experiments with applications that store their data in semantic repositories whose schema is user-defined and maintained
Why another wiki? When we started the work no available wiki in Java that we could find had good SVN support (and we have 55GB of data stored in our SVN repositories!). Using SVN as a backend gives us:
- off-line edit - simply checkout the pages and edit to your heart's content while off-line
- edit with other tools, not just the web interface
- management of authorship-related metadata such as how many lines added, difference between versions and so on
- a stable and reliable versioning system that's been proved in production use by 000,000s of developers
- concurrent editing without locking
CoW is partly intended to be an experimental framework for a new type of website in which
- the database is replaced by a knowledgebase (using OWLIM)
- the knowledgebase is defined and populated using CLOnE
- concurrent changes are managed as in CVS or SVN (check out, edit, update, merge, etc.)
(In fact perhaps that isn't a wiki at all, but a new type of literate database...)
The system is licenced under the GNU General Public Licence version 3 (GPL 3) except where otherwise stated.
3. Information for Users
In general using GATEWiki should be easy enough not to need a manual. To edit click the "edit" link, for example. A few things have some subtlety, though, and these are described here. A working knowledge of Subversion will also help if you want to exploit the system to the full.
3.1. Quick Start
3.1.1. Files, directories and links
A GATEWiki website sits on top of a normal tree of files and directories (for example, on the GATE.ac.uk site this page lives in /gatewiki/cow/doc/gatewiki.html).
When you edit and link pages, images or other files, you are performing simple operations on the file tree. So, for example, if you wanted to add an image to this page, you would follow the "Directory" link, use "Upload" to add your image file to this directory, and then place it in this page by referring to it from here (perhaps by typing %image(your-new-image.png) during an edit session). If you later want to refer to it from a different site, for example, the image will be available on the GATEWiki server as /gatewiki/cow/doc/your-new-image.png.
To link to another page in the same directory just use the file name, e.g. this link is to "index.html".
To link to a directory (which will be either a list of the files it contains, or the index.html file if it exists) just use the directory name, e.g. this link is to the "yam" subdirectory.
3.1.2. Creating and editing pages
To create a new page first navigate to the directory where you want the file to be placed (click on the "Directory" link from any page in that directory). You can see the directory name in two places:
- the URL that you're reading, e.g. in the URL http://gate.ac.uk/gatewiki/cow/doc/gatewiki.html the directory is gatewiki/cow/doc
- the list of breadcrumbs in the top bar (starting from "Home" and working down to your current location, e.g. Home > gatewiki > cow > doc)
From the directory view you can create a new wiki page or new directory using the "New page" dialog. If you give a ".html" file name you will create a new wiki page; if you give a "name-without-dots" you will create a new directory. (For other file types use "Upload" instead.)
To edit a page simply click "Edit" from the page. You can use a normal word-processor style editor (the "rich editor") or a web form. In the latter case the text is written in YAM, a very simple markup language.
To edit the directory tree off-line simply check out the tree from its repository (you'll need the location of this and the relevant permissions, of course).
3.1.3. Deleting and copying
Files and directories (and their contents) can be deleted via the directory view: follow the "Directory" link from any page, tick the checkbox(es) of the entries you want to remove, then hit the "Delete selected" button at the foot of the page.
Copying (and renaming) of wiki pages is more convoluted at present (and copying of directories is not supported). To copy a page first edit it, copy the text to the clipboard, then create a new file, edit it and paste the contents of the old file. (To do a rename simply perform this process and then delete the old file.)
Don't use the WYSIWYG editor for copying - just use the form editor (otherwise you're liable to lose all the formatting of the page).
(One reason we haven't made copying easier (yet) is that you can also use any Subversion client, of which there are legion.)
3.1.4. Upload other types of files
If you want to upload plain HTML, or a PNG image, or whatever, you can use the "Upload" link which is visible in the directory view. You may upload trees of directories by first TGZing or ZIPing them and requesting the "unpack" option. The upload dialogue allows you to choose whether or not to overwrite existing files.
3.1.5. Finding things
In the usual way GATEWiki supports search via the "Search" box: just type your keywords and press the button. For more sophisticated queries, see the search syntax description at the foot of the search results page or the section on searching.
In keeping with our everything-is-a-directory-tree philosophy, another way to see what files live where etc. is via the directory view.
When you're logged in (and have permission) each page has a "Directory" link leading to the directory view, which is just like a files and folders browser on your desktop. The type of each entry is indicated by an icon:
Directory view also provides access to the upload and delete functions.
3.1.6. Raw HTML pages vs. wiki pages
When you create a new web page in GATEWiki two files are added:
- a .html file that will be served to web browsers when they follow a link to the page
- a .yam file that contains the source text, in YAM format
In general you can simply ignore this - edit and delete operations, for example, work transparently on your pages. However, GATEWiki also supports raw HTML files, which have no YAM source. Again editing and so on is transparent, but because HTML presents more of a security risk the permissions associated with these pages are often different from ordinary wiki pages - hence the different icon for these pages in directory view, for example.
3.2. Modes
CoW has two operational modes:
- Workstation mode, where it runs on a personal machine and works off file trees that are also in active use by the machine's user. In this mode CoW allows you to edit files that already exist, and performs subversion "add" operations when new files are created, but does not commit changes to the repository. You must commit the changes yourself using subversion command line tools.
- Server mode, where it manages file trees that are not generally subject to change by other processes. In this mode CoW assumes it has complete control over the sandbox directories, and edits and new pages are committed directly to the subversion server.
When you use CoW on your own machine it will be in workstation mode; when you're using it over the web on another machine it will be in server mode.
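The difference between the two modes can be summarised in a few lines. This is only a sketch of the behaviour described above, not CoW's actual code: the function name and the string labels are made up for illustration, and the "add before commit" ordering in server mode is an assumption based on how Subversion normally works.

```python
def svn_steps_after_save(mode, is_new_file):
    """Return the Subversion operations a save is expected to trigger.

    Hypothetical helper: 'mode' is "workstation" or "server", as described
    in the text; the returned labels are illustrative only.
    """
    if mode not in ("workstation", "server"):
        raise ValueError("unknown mode: " + repr(mode))
    steps = []
    if is_new_file:
        # Both modes register newly created files with Subversion.
        steps.append("add")
    if mode == "server":
        # Only server mode commits; in workstation mode the user commits
        # by hand with their own Subversion client.
        steps.append("commit")
    return steps
```

For example, saving a new page in workstation mode yields only an "add", which is why you still have to run the commit yourself.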
3.3. Register, Log in
When CoW is running in server mode, it is necessary to create an account and to log in before being allowed to edit pages. Some pages will also not be accessible for reading unless an administrator adds you to the appropriate group. To register, go to the login page and follow the register link.
3.4. Create a new wiki page or directory
To create a new wiki page you can either:
- edit an existing page and add a relative link to a non-existent wiki page (the wiki will then link this to a "create new page" facility)
- use the "new page" link from directory listings
- create new files under your wiki-managed SVN sandbox (which CoW will then pull in via a periodic update)
Depending on the type of name you choose for your page (either *.html or a name with no "."s in it) GATEWiki will create either a new wiki page or a new directory respectively.
To upload other types of files, see the upload section.
Note: for technical reasons the following directory names are currently unavailable in the top-level directory of the main sandbox:
- css
- js
- gwt
- _images
- plugins
- WEB-INF
- g8
(This could be fixed, but only with a fair amount of pain and suffering.)
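The naming rules above are simple enough to express as a small check. The following is a sketch under stated assumptions rather than the code GATEWiki runs: the function name and return labels are hypothetical, and name matching is assumed to be exact and case-sensitive.

```python
# Directory names listed above as unavailable at the top level of the main sandbox.
RESERVED_TOP_LEVEL = {"css", "js", "gwt", "_images", "plugins", "WEB-INF", "g8"}


def classify_new_name(name, in_top_level=False):
    """Guess what the "New page" dialog would do with a proposed name.

    Returns "page", "directory", "reserved" or "upload" (illustrative labels).
    """
    if "." not in name:
        # Names without dots create directories, unless the name is one of
        # the reserved top-level names.
        if in_top_level and name in RESERVED_TOP_LEVEL:
            return "reserved"
        return "directory"
    if name.endswith(".html"):
        return "page"
    # Other file types are added with "Upload" rather than "New page".
    return "upload"
```

So "notes.html" would be treated as a wiki page, "drafts" as a directory, and "logo.png" as something to upload instead.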
3.5. Edit a wiki page
When you have permission to edit a page that you're viewing an edit link will appear. Two types of editor are available:
- a WYSIWYG editor (FCK Edit)
- a web form page
To change between these types use the "Switch to..." button.
When using the WYSIWYG editor, you can generally ignore YAM syntax and work as you would in a word processor: hit the "B" button to make text bold, the "I" button to make it italic, and so on. When you're finished, hit the save button (it looks like an old floppy disk). There's one exception to ignoring YAM: because the edit is converted back to YAM afterwards, if you include YAM syntax characters in your edit you'll need to escape them. Do this by putting a backslash in front, e.g. \*.
When using the web form, use YAM syntax (a short summary appears below the form while editing, and in the YAM Summary section below).
In both cases when you finish an edit2 CoW will try and check in your changes to the parent repository. At this point it checks to see if another user has modified the same file while you've been editing it. If so, their changes and yours are merged; if the changes are in different parts of the file all is well and the merged file is then checked in. If, however, the changes are close together they are judged to be in conflict, and you will be returned to your edit session to resolve this conflict. To find the parts of the file where the problem exists search on "===="; here you will find indications of what was in your file and what was in the edit by the other user, and you can choose one or the other or both as you prefer.
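If a page is long, it can help to locate the conflicting regions mechanically. The sketch below assumes the standard Subversion conflict layout (a "<<<<<<<" line, your text, a "=======" line, the other user's text, then a ">>>>>>>" line); whether CoW presents conflicts in exactly this form is an assumption, and the function name is hypothetical.

```python
def find_conflicts(text):
    """Return a list of (mine, theirs) pairs for each conflict region found."""
    conflicts = []
    mine, theirs, state = [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            # Start of a conflict: what follows is the local version.
            mine, theirs, state = [], [], "mine"
        elif line.startswith("=======") and state == "mine":
            # Separator: what follows is the other user's version.
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            conflicts.append(("\n".join(mine), "\n".join(theirs)))
            state = None
        elif state == "mine":
            mine.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts
```

Each returned pair holds the two competing versions, so you can decide whether to keep one, the other, or both.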
Note: when you edit a non-native (non-YAM, i.e. raw HTML with no GATEWiki version) HTML file GATEWiki will always use the WYSIWYG editor. Be aware that at present this has the side-effect of deleting meta tags from the file headers.
3.6. Searching
A search box is provided at the top right corner of the page that allows searching within a wiki area that the current page belongs to. In other words, if a page belongs to the Help section, results are retrieved only from the Help section. Information on Solr query syntax is available at http://wiki.apache.org/solr/SolrQuerySyntax
If a query succeeds, a maximum of 10 results is shown on a single page. You can use the page links at the top or bottom of the search results to jump to a different set of results. Each search result comprises the following:
- title of the document
- link to the document
- text snippet with matching terms highlighted
Hits from pages which the current user doesn't have permission to view are filtered out and not included in the search result.
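The paging and filtering described above can be sketched as follows. The "q", "start" and "rows" parameters are standard Solr query parameters; everything else (the function names, the permission callback, and the idea that CoW builds its request this way) is an assumption made for illustration.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # the guide states a maximum of 10 results per page


def solr_query_string(query, page):
    """Build the query string for one page of results (pages count from 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * RESULTS_PER_PAGE
    return urlencode({"q": query, "start": start, "rows": RESULTS_PER_PAGE})


def visible_hits(hits, can_view):
    """Drop hits the current user may not view; can_view is a hypothetical callback."""
    return [hit for hit in hits if can_view(hit)]
```

With this scheme the third page of results for a query starts at offset 20.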
3.7. Report a bug
To report bugs, first please check that they've not been reported already!
Then add a report to the bug tracker, including information about the platform you're running on and all details necessary to reproduce your problem.
3.8. The YAM Markup Language
CoW's underlying markup language is YAM (Yet Another Markup). You don't need to use it - you can use the WYSIWYG editor instead - but if you're a Vim-wielding old fossil like me you may like it. The syntax is about as light as they come; next is a summary followed by a longer description.
3.8.1. YAM Summary
Title | First paragraph of the file. |
Headings | %1, %2, etc.; %1* is unnumbered; follow with blank line |
Bold, italic, teletype, underlined | *...*, _..._, ^...^, __...__ |
Contents | %contents |
Horizontal lines | --- |
Tables | %[ | row 1/column 1 | r1/c2 | --- | r2/c1 | r2/c2 | %] |
Block quotation | %"...%" |
Line break | %br |
Verbatim | %<...%> |
Code | %code(lang=Java)< ... %> |
Lists | - item 1 (unordered) or # item 1 (ordered); indent to nest |
Footnote | %footnote(...) |
Escaping | \ |
Links | http://thing.com/ or %(http://thing.com/) or %(http://thing.com/, label) |
Anchors | %#name (then link to it with "%(#name)") |
Images | %image(file) or %image(file, alt text, width, height, position, border) |
Citations | %cite(citekey,citekey,...) |
Inclusion | %include(file.yam) |
Non-breaking space | %\ followed by space |
Single-line comment / notes | %% ... |
Multi-line comment / notes | %/* ... %*/ |
Special characters | (e.g. < or & in HTML) are dealt with properly for the target language |
Plugins (Twitter) | %twitter(title=GATE News, account=GateAcUk, name=GATE, count=10) |
Plugins (Google) | %google(siteip=gate.ac.uk) |
Metadata | %meta(author=My Name) becomes a <meta> tag in the HTML |
3.8.2. YAM Syntax and Usage
3.8.2.1. Introduction
YAM (Yet Another Markup) is a simple wiki language used in GATEWiki. The language syntax is described below.
3.8.2.2. Contents
Contents listings like that above are generated by '%contents'
3.8.2.3. Bold, italic, underline and teletype
Bold text is contained in stars: *this is bold* becomes this is bold.
Italic text is contained in underscores: _this is italic_ becomes this is italic.
Fixed-width text is contained in caret signs: ^this is teletype^ becomes this is teletype.
Underlined text is contained in double underscores: __this is underlined__ becomes this is underlined.
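To make the four inline styles concrete, here is a toy converter. It is not the YAM translator (which uses a JavaCC parser): the HTML tags chosen are assumptions, and backslash escapes, URLs containing underscores and nested markup are deliberately ignored.

```python
import re

# Order matters: double underscores must be handled before single ones.
_RULES = [
    (re.compile(r"__(.+?)__"), r"<u>\1</u>"),   # underlined
    (re.compile(r"\*(.+?)\*"), r"<b>\1</b>"),   # bold
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),     # italic
    (re.compile(r"\^(.+?)\^"), r"<tt>\1</tt>"), # teletype
]


def inline_to_html(text):
    """Apply each inline rule in turn to a single line of text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Handling the underline rule first is the one design point worth noting: otherwise the italic rule would consume the underscores of an underlined span.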
3.8.2.4. Horizontal lines
Horizontal lines are indicated by 3 or more dashes at the start of a line. For example:
---
and
---------------------------
both result in a horizontal line.
3.8.2.5. Lists
Unordered lists are indicated by '-' at the start of a line, and ordered lists by '#'. Nesting is indicated by increased spacing preceding the item indicator. For example:
- This is an unordered list
- Second item
   # This is a nested...
   # ...ordered list
- Back to the third item of the enclosing list
results in:
- This is an unordered list
- Second item
   1. This is a nested...
   2. ...ordered list
- Back to the third item of the enclosing list
The precise size of the indentation of embedded lists doesn't matter, it just needs to be larger than that of the enclosing list.
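The "larger indentation means deeper nesting" rule can be modelled with a stack of indentation widths. This is a simplified sketch, not the real parser: the function name is hypothetical, and any line that is not a list item is treated as ending all open lists.

```python
def list_item_depths(lines):
    """Return (depth, kind, text) for each list item, with depth 0 outermost."""
    stack = []  # indentation widths of the lists that are currently open
    items = []
    for line in lines:
        stripped = line.lstrip(" ")
        is_item = (
            stripped[:1] in ("-", "#")
            and not stripped.startswith("---")  # three dashes is a horizontal line
        )
        if not is_item:
            stack.clear()  # simplification: any other line closes all lists
            continue
        indent = len(line) - len(stripped)
        while stack and indent < stack[-1]:
            stack.pop()  # dedent: close the deeper lists
        if not stack or indent > stack[-1]:
            stack.append(indent)  # deeper than the enclosing list: open a new one
        kind = "ordered" if stripped[0] == "#" else "unordered"
        items.append((len(stack) - 1, kind, stripped[1:].strip()))
    return items
```

Because only relative indentation is compared, three spaces or seven before the nested items give the same result.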
Lists end when there is a blank line or where the next line of text is not indented. For example:
- This is a one item list
followed by
- another one item list.
results in:
- This is a one item list
followed by
- another one item list.
Note: lists embedded in tables have to start on a new line, just like elsewhere; in tables a syntax error will result if the list starts on the same line as the rest of the row.
3.8.2.6. Verbatim output
Verbatim output starts with '%<' and ends with '%>'. For example:
%< This will *not* get translated. %>
When the target language is HTML, for example, the output will contain '<pre>' tags.
For code listings you can enable syntax highlighting with
%code(lang=Java)<
public void hello() {
  System.out.println("hello world");
}
%>
which produces
public void hello() {
  System.out.println("hello world");
}
Highlighting is performed in HTML using google-code-prettify and in LaTeX using the listings package. The list of supported language names is slightly different for the two packages (HTML, LaTeX) but mainstream languages including "C", "Java", "Python", "HTML", "XML", "CSS" and "TeX" are supported by both. When translating YAM to HTML (but not to LaTeX) the highlighter will attempt to guess the appropriate language if you omit the lang specification altogether (%code()< ... %>).
By default, the listing does not have line numbers. Numbering can be enabled using the option numbering=on, plus an optional firstnumber=N if you want to start numbering from something other than 1. Note that in HTML only every fifth line is numbered in the default google-code-prettify CSS style.
3.8.2.7. Footnotes
Footnotes are like this:
%footnote(This is a footnote.)
Becomes:4.
The contents will be put in a section at the end of the document (HTML) or at the bottom of the page (LaTeX), and linked by number from where they occurred.
3.8.2.8. Escapes
To stop a special character from being interpreted, use a '\'. For example,
\---
will not generate a line.
(This also works for the forward quote or backtick character — ` — which is used in LaTeX but may otherwise be replaced by a normal single quote in HTML output.)
3.8.2.9. Titles and metadata
The title of a document is the first paragraph of the document, ending in one or more blank lines. (Often this will be a single line of text.)
Metadata can be specified using %meta(foo=bar), which in HTML will become
<meta name="foo" content="bar">
in the page header.
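The mapping from a %meta directive to a tag is mechanical, as this small sketch shows. It only handles the single name=value form used in the example above; the function name and the HTML escaping step are assumptions, not a description of the YAM translator.

```python
import re
from html import escape

_META = re.compile(r"%meta\(([^=()]+)=([^()]*)\)")


def meta_to_html(directive):
    """Turn a %meta(name=value) directive into an HTML <meta> tag."""
    match = _META.fullmatch(directive.strip())
    if match is None:
        raise ValueError("not a %meta(name=value) directive")
    name, content = (part.strip() for part in match.groups())
    # Escape quotes and angle brackets so the attribute values stay well-formed.
    return '<meta name="{}" content="{}">'.format(escape(name), escape(content))
```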
3.8.2.10. Headings
Headings are lines starting with %1 (for first level), %2, %3 or %4 and are followed by one or more blank lines. For example, the heading for this section is
%1 Headings
If a heading level is followed by "*" it is not numbered, e.g.:
%1* An unnumbered heading
Becomes:
An unnumbered heading
This heading will not appear in the contents table.
3.8.2.11. Links and anchors
Links can be specified in three ways:
- As plain text, e.g. 'http://gate.ac.uk/' will become http://gate.ac.uk/
- Using '%(target)', e.g. %(http://gate.ac.uk/) will become http://gate.ac.uk/
- Using '%(target, label)', e.g. %(http://gate.ac.uk/, GATE home) will become GATE home
Spaces or commas within the link target of %(...) format URLs must be escaped. The link text (following the first unescaped comma) may contain "inline" YAM markup such as %image(...), %cite(...), *bold*, _italic_ or ^teletype^, but not block-level markup such as tables. Parentheses are allowed within link text but left and right parentheses must be balanced, i.e. %(http://example.com, an (example) link) is OK but %(http://example.com, unbalanced ( brackets) is not unless the unmatched parenthesis is escaped.
A URL that appears in plain text must be followed by a space, tab or newline. Sometimes, you might need to follow a URL with something other than a space, tab, or newline, for example when applying other formatting characters. To do this, use a bracketed form. e.g. to teletype a URL, ^%(http://gate.ac.uk/)^ becomes http://gate.ac.uk/.
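The rule that the label starts after the first unescaped comma can be illustrated with a short scanner. This is a sketch only: the function name is hypothetical, it works on the text between "%(" and ")", and it does not check that parentheses in the label are balanced.

```python
def split_link(body):
    """Split the inside of %(...) into (target, label); label is None if absent."""
    target = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # A backslash keeps the next character (space, comma, ...) literally.
            target.append(body[i + 1])
            i += 2
        elif ch == ",":
            # First unescaped comma: everything after it is the link text.
            return "".join(target).strip(), body[i + 1:].strip()
        else:
            target.append(ch)
            i += 1
    return "".join(target).strip(), None
```

For instance, the body "my\ file\,v2.html, notes" yields the target "my file,v2.html" and the label "notes".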
Anchors and labels are specified using '%#name'. For example,
%1 A Heading %#label
will result in a heading followed by the anchor label. To refer back (or forward) to the anchor, use a "#" in the link, e.g.
%(#tables, tables)
will result in tables.
Spaces or commas inside anchors must be escaped. An anchor that appears in plain text must be followed by a space, tab or newline.
A relative link to a non-existent file will be rendered as a link to the host application's "create" page, e.g.
%(../non-existant.html)
becomes:
../non-existant.html
A link to an existing file will just be linked as normal, e.g.
%(index.html)
becomes:
index.html
3.8.2.12. Block quotations
Block quotations are enclosed in %" marks. For example,
%"This is a quote%"
becomes:
This is a quote
Note that because the quote marks are treated as normal words, they can cause overlap problems (in the same way that an unclosed bold or italic mark might). For example,
%"
- list
%"
is not a good idea as the end of the quote will precede the end of the list. The workaround is to close the list first by adding a blank line:
%"
- list

%"
which then results in something sensible:
- list
3.8.2.13. Line breaks
Line breaks are indicated by %br at the end of a line. For example:
This line is broken %br
in two.
becomes:
This line is broken
in two.
3.8.2.14. Tables
Tables use square brackets, bars and dashes. For example:
%[
| *header col 1* | *header col 2* |
---
| row 1 col 1 | col 2 |
---
| row 2 col 1 | col 2 |
%]
results in:
header col 1 | header col 2 |
row 1 col 1 | col 2 |
row 2 col 1 | col 2 |
To include a | in normal text escape it like this: \|.
(See also the note above about embedding lists in tables.)
3.8.2.15. Images
Images are like URLs:
- '%image(test-image.png)' will become
- '%image(test-image.png, a test image)' will become (the text becomes the "alt" attribute of the image)
You can also specify an ALT tag, width and height, position and border width: '%image(test-image.png, ALT tag, 500, 500, left, 0)' becomes
3.8.2.16. Citations
Citations work like this: '%cite(Cun06a)' becomes Cun06a. Multiple cite keys should be separated by commas, e.g.: '%cite(Cun05a,Cun06a)' becomes Cun05a, Cun06a.
3.8.2.17. Inclusion
A page can include another page like this:
%include(yam-first.yam)
This results in the inclusion of all the text from yam-first.yam in this file.
An increment to be added to the heading level can be given as the first argument.
Note that the titles in the included files are ignored by default. A "useTitle" flag can be given (after the increment if it exists) to cause inclusion of the title (as a heading). For example: %include(1, useTitle, yam-first.yam).
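One way to picture the heading increment is as a rewrite of the heading markers in the included text before it is spliced in. The sketch below assumes exactly that; the function name is hypothetical, and it does not clamp levels beyond %4 because the behaviour in that case is not documented here.

```python
import re

# A heading marker is %N (optionally followed by *) at the start of a line.
_HEADING = re.compile(r"^%([1-9])(\*?)(?=\s)", re.MULTILINE)


def shift_headings(yam_text, increment):
    """Add 'increment' to every heading level in a piece of YAM text."""
    def bump(match):
        level = int(match.group(1)) + increment
        return "%{}{}".format(level, match.group(2))
    return _HEADING.sub(bump, yam_text)
```

With an increment of 1, a "%1 Intro" heading in the included file would appear as "%2 Intro" in the including page.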
3.8.2.18. Non-breaking space
Non-breaking spaces are added using %\ followed by space, e.g.
This line %\ %\ %\ %\ has spaces in the middle.
This line has spaces in the middle.
3.8.2.19. Comments
Single-line comments are created by two or more percents together, e.g.
This is not commented %% but this is
becomes:
This is not commented
Multi-line comments are created by %/* and %*/, e.g.
This is not commented %/* but this is
and this is too %*/
becomes:
This is not commented
3.8.2.20. Plugins
YAM can be extended by the use of plugins. Creating plugins requires some Java programming - see the developer guide for more details.
Plugins bundled with GATEWiki:
- Twitter. Adds a list of Twitter updates to a page. Example usage: %twitterWidget(account=GateAcUk, widget-id=yourIdHere, width=300, height=450)
- Google. Adds a Google site search box. Example usage: %google(siteip=gate.ac.uk)
3.8.2.21. Changes from version 3
YAM is currently in version 5. Since versions 3 and 4 these changes were made:
- added plugins
- horizontal lines are now three or more dashes
- comment syntax: %% for single lines, and %/* ... %*/ for multiple lines
- addition of column separator bars at the start and end of table rows
- multiple lines allowed in titles
- added underlining
- no more %output function
- changed quotation syntax to %"
- changed line break style to %br
- verbatim output is %< ... %>
- target language control characters (like < or &) now dealt with properly
- headings can be unnumbered, e.g. "%2*"
- numbered lists are now prefixed by "#" instead of "o"
- added non-breaking spaces
- various bug fixes
- changed fixed width from equals = ... = to caret ^ ... ^
3.9. LaTeX Support
YAM will translate into LaTeX (as well as into HTML). Some things of note:
- Not everything in LaTeX corresponds to a construct in YAM, and vice versa. The process is not perfect.
- Each .yam file will be translated into a standalone .tex ready to compile.
- It is assumed that the bibliography files thebib.bib and thebibstyle.bst are present.
- In image directives the width and height values are divided by two as a rough approximation for conversion of pixels into points.
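The pixel-to-point approximation in the last item amounts to a single division. As a minimal sketch (the function name is made up, and how the result is written into the .tex file is not shown because it is not documented here):

```python
def latex_image_size(width_px, height_px):
    """Approximate an image size in points by halving the pixel dimensions."""
    return width_px / 2, height_px / 2
```

So the 500 by 500 pixel image from the %image example would be requested at roughly 250 by 250 points.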
4. Information for Developers and Administrators
Currently CoW includes:
- a wiki language with a JavaCC parser
- a CMS persistence and versioning backend using Subversion
- simple display and edit of wiki pages and other static content using Grails
- a version of the Grails JSecurity plugin that includes editing of the user data and assigns permissions to wiki areas and directories
- user registration using the JCaptcha plugin
- webtests using Selenium
- a minimal Hypersonic DB (to store pointers to wiki areas and user/role/permission data)
The system has two modes, workstation and server; the former does no user management, the latter uses JSecurity.
API documentation etc. is linked from here, and the top of the software documentation tree is here.
4.1. Roadmap
The development roadmap, currently active tasks list and wish-list are recorded in the backlog document.
4.2. Checking out the code
To check out CoW from Subversion first decide if you want to check out a copy of GWT and Grails HEAD while you're at it. If so do this:
svn co https://gatewiki.svn.sourceforge.net/svnroot/gatewiki/trunk gatewiki
If not, do this:
svn co https://gatewiki.svn.sourceforge.net/svnroot/gatewiki/trunk/cow cow
If you do the latter you'll need to set properties in build.xml to point to your own installation of GWT and Grails. (Also, if you put CoW in a directory not named cow you will need to change projectName in webtest/conf/webtest.properties to get the Canoo tests to work.)
Also note that the copy of GWT in the repository is for Linux. If you are on Windows or a Mac then you will need to replace this with the appropriate version for your OS.
4.3. Selecting Modes
To select the mode (see above) use
- -Dgate.cow.mode=workstation
- -Dgate.cow.mode=server
Workstations include laptops and are expected to be off-line from time to time; servers are expected to be always connected (and always have access to the relevant SVN repositories).
4.4. Create a new wiki space
To create a new wiki space go to the Admin page and click through to wiki areas and select "New wiki". Then you can either:
- browse for an existing SVN sandbox
- get the wiki to create a new sandbox for you
In the former case you can choose any existing SVN-controlled file tree and CoW will allow you to create, share and update YAM files in that tree.
Note that each wiki area has its own sandbox. Two areas are special:
- area 1. is the help area, served from /help URLs
- area 2. is the main area, served from all other / URLs
Other areas are served from /g8/page/show/<area ID> URLs.
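The URL scheme above can be read as a small routing function. This is a sketch under stated assumptions, not CoW's URL mapping code: the function name is hypothetical, and treating "/help" as a path prefix (rather than any URL that merely starts with those letters) is a guess.

```python
import re

HELP_AREA_ID = 1  # served from /help URLs
MAIN_AREA_ID = 2  # served from all other / URLs
_AREA_URL = re.compile(r"^/g8/page/show/(\d+)(?:/|$)")


def area_for_path(path):
    """Return the ID of the wiki area that would serve the given URL path."""
    match = _AREA_URL.match(path)
    if match:
        return int(match.group(1))
    if path == "/help" or path.startswith("/help/"):
        return HELP_AREA_ID
    return MAIN_AREA_ID
```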
4.4.1. Authentication settings
When running in server mode, or when doing scheduled or one-off subversion updates (see below), CoW needs to communicate with the subversion repository (or repositories) underlying the wiki sandbox. If the repository is on the local filesystem (i.e. checked out using the file: protocol) this will work fine, but if the repository is remote (svn:, svn+ssh:, http: or https:) it may require authentication. Since the same authentication profile may be shared between several sandboxes (for example several repositories hosted on the same SSH server), configuring authentication is a two-step process. First you must create an authentication profile containing the user name and other credentials, and second you associate that profile with the relevant wiki area or areas.
To manage the known authentication profiles, go to the main Admin home page and follow the "authentication profile" link at the bottom of the page. Each profile can hold any or all of the following data:
- username If specified, all interactions with the server will use this username, and all subversion commits will have this name as their author. If unspecified, the name of the user currently logged into CoW will be used.
- password The password used to authenticate to http: and https: repositories that request one, and also to SSH servers (for svn+ssh:) unless a private key (see below) is also provided.
- SSH-specific options:
- port number The port number on which to connect to the SSH server. The default is port 22, which works in the majority of cases.
- private key If your SSH server authenticates using public/private key pairs then you should specify the private key (and its associated passphrase) here. The key file should be in the normal OpenSSH format. The public key goes in the authorized_keys file on the server.
- HTTP/HTTPS-specific options
- client certificate If your HTTPS server requires the use of a client SSL certificate then you should specify it here (along with its associated passphrase). The certificate should be stored in PKCS#12 format including the corresponding private key.
- verify server certificates Should we verify the validity of the certificates presented by the HTTPS server in the usual way, or just accept any certificate? The former is more secure but the latter may be necessary if your server uses self-signed certificates.
- force authentication Normally a username and password are only passed to the server if it requests them (via a 401 status code). This option causes them to be passed in all requests, without waiting to be challenged. This may be required for servers that make some parts of the file tree available anonymously and other parts only to registered users.
Note that the SSH private key option takes precedence over a username and password - even if a password is set it will not be sent to SSH servers, the key will be used instead.
Once you have created and configured the authentication profile, you can attach it to the relevant wiki areas, either when the area is created or by editing the area definition (via the "Create and edit Wiki areas" link on the admin page).
Limitations: Each wiki area is associated with a single authentication profile so if the sandbox includes directories from other repositories (e.g. via svn:externals), all the different repositories must be accessible using the same profile. This limitation may be relaxed in a future version of CoW.
4.4.2. Setting wiki regeneration
Wiki pages may refer to other wiki pages, via links and includes. They are therefore dependent on each other. When you change a page, the dependencies will also be updated. This may get out of sync (perhaps via a direct edit on disk, or some other route). You can therefore regenerate wiki dependencies via the admin interface. This is described further below.
4.4.3. Setting wiki updates
A wiki is a working copy of an SVN repository, and so could get out of date with respect to the repository. You can update the wiki working copy from the admin interface (on the wiki admin page) in two ways:
- Set the update interval. The wiki working copy will be updated at this interval, in minutes. An empty value (null) means that the wiki working copy will never be updated. The default value for a new wiki is set in the wiki config file, and is currently null.
- Click the update button on the wiki admin page, to update now.
If the wiki working copy is locked (e.g. by an update following an edit checkin), then the update will be skipped. If there is a conflict, the update will fail. If the repository requires authentication this must have been set up as described above.
Any files that are created, modified or deleted in the sandbox as a result of the update will have their dependencies regenerated automatically. In particular if a .yam file has been committed to the repository its corresponding .html will be regenerated (and, in server mode, checked in) when CoW updates the .yam.
4.5. Building and testing CoW
CoW is built with ant. For more documentation do ant help in the cow directory. To run CoW see next section.
Currently the build file
- assumes that ../gwt and ../grails contain GWT and Grails (see the properties defined in build.xml and change as necessary, or override in a build.properties file in the build directory)
- puts the Grails work directory in dot-grails (to avoid conflicts between multiple servers at startup time)
To build CoW from a clean checkout, assuming that grails is in the same directory:
- cd grails && ant clean jar
- cd ../cow && ant help
- ant cruise (clean build, war, unit and integration tests)
- ant run-dev (to make sure that the functional tests have a populated user home environment to run off; kill the server after it is running)
- ant test-selenium (functional tests)
For shell-literate people there's also a script bin/cruise which makes it easier to interpret the voluminous output of the tests and which stores a log in ant-log.txt, but note that this doesn't run the functional tests.
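The build sequence above could be wrapped in a small script along the following lines. This is only a sketch: the names run_in and build_cow and the DRY_RUN switch are assumptions for illustration, and it expects grails and cow to be sibling directories of the current directory. By default it prints the commands rather than running them.

```shell
# Run a command inside a directory, or just print it when DRY_RUN=1 (default).
run_in() {
  cow_dir=$1
  shift
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "(cd $cow_dir && $*)"
  else
    ( cd "$cow_dir" && "$@" )
  fi
}

# The steps from the list above, in order ("ant help" is omitted because it
# only prints documentation).
build_cow() {
  run_in grails ant clean jar || return 1
  run_in cow ant cruise || return 1
  # run-dev keeps running until the server is killed by hand, so its exit
  # status is ignored here.
  run_in cow ant run-dev || true
  run_in cow ant test-selenium
}

build_cow
```

Setting DRY_RUN=0 before calling build_cow would execute the steps for real.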
4.5.1. More about cruising
The cruise target does a clean build and runs the Grails unit tests and the Java-only tests (for YAM etc.). Functional tests used to use Canoo (ant target webtest) but we couldn't get them to work with GWT, so they are now provided via the test-selenium target (which is not part of cruise because of the difficulty of configuring Selenium in headless mode cross-platform). The war target creates a WAR file which can be deployed by dropping it into a servlet container.
The old Canoo target (currently broken):
- webtest-page: run tests against PageController; this has the virtue that it will work against a running grails instance, so you can use this to run webtests while developing, for example, or against the production server to get some timing data (when running in production you'll need to uncomment the line to set the context path to / in webtest.properties)
See ant help for more details.
Note that there are some bugs in ant target ordering (workaround for most of them is to run ant cruise once before doing anything else):
- ? have to run ant webtest at least once before can do ant run-dev after ant clean
- to generate the log4j properties for the java- targets you need to call ant war
4.5.2. Selenium tests
Functional testing of CoW uses Selenium. Selenium uses various bits of JavaScript magic to enable you to remote-control a real web browser talking to your web application and verify that the results match expectations (e.g. that a page contains particular text, or that an alert box is shown with a particular string, etc.). The Selenium tests for CoW are contained in the directory gatewiki/cow/selenium in the distribution. There are ant targets to run the full test suite, or you can open individual tests in the Selenium IDE plugin in Firefox.
To run the Selenium tests in ant, do ant test-selenium. This will start up the server in workstation mode, run the workstation-mode test suite, then shut the server down, restart it in server mode, and run the server-mode test suite. If there is already a running CoW instance on localhost port 8080 then the tests will use that instead of running their own, but in that case you should only run the test suite that corresponds to the mode your server is running in (ant test-selenium-workstation or ant test-selenium-server), as the tests for the other mode will probably fail.
Selenium runs your tests in a real web browser, so before running the tests with ant you will need to configure the browser that Selenium should use. Selenium supports various different browsers, though the current CoW test suite is only known to work reliably on Firefox5. The default configuration (specified in gatewiki/cow/selenium/test-selenium.properties.default) runs the Firefox browser, and expects to find the firefox-bin executable on your path (Linux) or for Firefox to be installed in the default location (/Applications/Firefox.app on Mac OS, C:\Program Files\Mozilla Firefox\firefox.exe on Windows). If this is not the case you will need to create a test-selenium.properties file to override this default, containing the line:
browser=*firefox /path/to/your/firefox-bin
For example, on Ubuntu, there is no firefox-bin, instead the Firefox binary is called just firefox and lives in /usr/lib/firefox-{version}, see test-selenium.properties.ubuntu for an example.
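A minimal sketch of creating the override file from the shell follows. The function name write_selenium_properties is made up for this example, and the assumption that the override lives in the same directory as test-selenium.properties.default (gatewiki/cow/selenium) is not confirmed by this guide; pass whichever directory your build actually reads.

```shell
# Hypothetical helper: write a test-selenium.properties override.
# $1 = directory to write into, $2 = path to the Firefox binary.
write_selenium_properties() {
  sel_dir=$1
  sel_firefox=$2
  printf 'browser=*firefox %s\n' "$sel_firefox" > "$sel_dir/test-selenium.properties"
}

# Example (paths are illustrative only):
# write_selenium_properties gatewiki/cow/selenium /usr/lib/firefox/firefox
```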
Note that because of this additional configuration step, the Selenium tests are not run as part of ant cruise.
You may also have problems running Selenium from the command line if you have existing test artefacts in your CoW database. Try deleting all of these and re-running:
- cow/dev-user-home/.cowrc.d
- cow/webtest/user-home/.cowrc.d
- ~/.cowrc.d
- ~/.grails
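The clean-up above can be scripted, for instance as follows. The helper name selenium_artefact_dirs is an assumption for this sketch; it only lists the directories, and the destructive step is left commented out because ~/.cowrc.d holds CoW's user data.

```shell
# Hypothetical helper: list the directories named above.
# $1 = path to the cow checkout.
selenium_artefact_dirs() {
  printf '%s\n' \
    "$1/dev-user-home/.cowrc.d" \
    "$1/webtest/user-home/.cowrc.d" \
    "$HOME/.cowrc.d" \
    "$HOME/.grails"
}

selenium_artefact_dirs cow

# To actually delete them, uncomment (there is no undo):
# selenium_artefact_dirs cow | while IFS= read -r d; do rm -rf -- "$d"; done
```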
4.5.2.1. Firefox Profiles
The Selenium tests are run using a custom Firefox profile that allows us to pre-configure certain settings without changing the default user profile. If the app is accessed directly then the profile in gatewiki/cow/selenium/profile/normal is used. If, however, the tests are being passed through ratproxy for security testing then the profile in gatewiki/cow/selenium/profile/ratproxy will be used.
Full details on how to generate the profiles and associated certificates can be found here.
4.5.2.2. Developing new tests
To develop a new Selenium test it is easiest to use the Selenium IDE Firefox plugin. Start up a test instance of CoW in the relevant mode using ant run-test (with -Dgate.cow.mode=server if you are developing a server-mode test), then open up the Selenium IDE in Firefox and load the relevant test suite, cow/selenium/{workstation,server}-suite.html. You can add new tests to the suite; individual tests should be saved as HTML files in the cow/selenium/tests directory.
One of the gotchas is that e.g. the upload selenium test depends on the presence of certain files in the dev-user-home/.cowrc.d directory, which will only be present if you have deleted this and re-run cow in development mode since those files were created...
4.5.3. Upgrading to new Grails versions
- svn up && ant clean jar in grails
- check compatibility between cow/lib/* and grails/lib/* (the script bin/check-jars.sh can help)
- change build.xml to reference new groovy jar if needed
- ant clean in cow
- ant upgrade in cow
4.5.4. Using the YAM Tests
The test suite uses a bunch of .yam and compiled .html files in cow/test/resources. After running the tests the script cow/bin/check-errors will check for failures and, when these are caused by incorrect translations, display a (tk)diff of the actual and the correct output.
Note: when updating to new versions of yam2html the test resource yam-minimal-no-includes.html needs to be selectively merged with yam-minimal.html. First update the latter to reflect the translation changes, then:
- create a patch against the repository minimal file:
- svn diff yam-minimal.html >minimal.patch
- apply that to the no-includes file:
- patch -o new-no-includes.html yam-minimal-no-includes.html minimal.patch
- overwrite the working copy:
- mv new-no-includes.html yam-minimal-no-includes.html
- check that it worked:
- svn diff yam-minimal-no-includes.html
4.5.5. Grails/Spring/JSecurity and the upload function
There is currently a problem between Grails/Spring and the JSecurity plugin when uploading a file that we hope will be solved in future updates.
The problem is that when inside a Spring webflow JSecurity changes the MultipartHttpServletRequest into a JsecurityHttpServletRequest that we can't then get the uploaded file from (see JSecurity forum). The workaround we implemented involves reloading the main page after each upload action, which is a bad user experience but works...
4.6. Subversion versions
(This note is only relevant to those using local copies of an SVN tree. By the time you read this the version numbers have probably changed.)
Wiki areas in CoW are subversion sandbox directories. There are many different versions of the subversion Java and command line tools, and each version is associated with a particular format of the control files under the .svn directory in the sandbox. Generally speaking, later versions of subversion tools can read sandboxes created by earlier versions, but in doing so they transparently "upgrade" the sandbox to the newer format. Once this happens in a particular sandbox, that sandbox will no longer be readable by the earlier version.
CoW's subversion support is provided by the SVNKit library. At the time of writing we are using version 1.2.0, which works with the same working copy format as the 1.5 series svn command-line tool. This means that, for example:
- If you use CoW with a sandbox that was created by svn version 1.4 or earlier, that sandbox will no longer be usable by the 1.4 version command-line tools.
- If you use 1.6-series command-line tools on a sandbox, that sandbox will no longer be readable by CoW.
If you intend to use CoW in workstation mode you must use a compatible command-line client (e.g. one from the 1.5 series). If your command-line client is a later version (e.g. 1.6) you will need to upgrade the svnkit JAR used by CoW to version 1.3.0 (which speaks the 1.6 protocol).
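The compatibility rules in this section can be summarised as a quick check. The function name svn_client_advice and its messages are invented for this sketch; the version pairings (SVNKit 1.2.0 with the 1.5 series, SVNKit 1.3.0 for 1.6) are the ones stated above and will not hold for other SVNKit releases.

```shell
# Hypothetical helper: classify an svn client version against the SVNKit
# 1.2.0 bundled with CoW at the time of writing.
svn_client_advice() {
  case "${1:-}" in
    1.[0-4]|1.[0-4].*)
      echo "older: CoW will upgrade the sandbox, making it unusable by this client" ;;
    1.5|1.5.*)
      echo "compatible: same working copy format as SVNKit 1.2.0" ;;
    1.6|1.6.*)
      echo "newer: upgrade CoW's svnkit JAR to 1.3.0" ;;
    *)
      echo "unknown: version not covered by this note"
      return 1 ;;
  esac
}

svn_client_advice 1.5.6
```

With a real client installed, the version string can be obtained with svn --version --quiet and passed to the function.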
4.7. Configuring CoW
Configuration options in CoW are dealt with in the normal Grails fashion in a file called Config.groovy; to override these options create a file called .cowrc.groovy in your home directory.
For example, the following .cowrc.groovy would change the title and logo, and turn on the Sventon and Solr 3rd-party webapps:
/** Herein external CoW config. */
println "loading external user config; running in ${'pwd'.execute().text}"
gate.cow.name.short = "CoW - dev mode"
gate.cow.name.long = "CoW - dev mode, a Controllable Wiki"
gate.cow.logo = "/g8/page/show/1/doc/larson-small.png"
gate.cow.sventon.run = true
gate.cow.solr.run = true
For more details on what this example is doing, see the site-specific layout section.
Note that because of this bug, CoW uses a slightly non-standard way of configuring the Grails DataSource. To modify data source settings you should edit the DataSource.groovy under the config directory, not the one in grails-app/conf. Hopefully we will be able to revert to using the normal mechanism when we next upgrade to a newer Grails release.
4.7.1. Serving robots.txt
To change the default /robots.txt (which does nothing) set gate.cow.robots.
4.8. CoW's Data Area
CoW stores all user data in a directory called .cowrc.d (on *NIX), which is by default in the user's home directory.
4.8.1. SVN Config Directory
The svnconfig directory stores files related to how SVN works within CoW. This is a standard SVN config directory (i.e. it is the same as ~/.subversion) but specifically configured for CoW.
Currently the one thing this configuration ensures is that .yam files have the LF line ending. Note that this is only true for files added or imported through CoW. If you are adding YAM files outside of CoW then you should manually ensure that they have the LF line ending applied by SVN.
4.9. Deploying and Running CoW
4.9.1. Starting and stopping
The easiest way to run (or deploy) CoW is via Grails. If you're developing then use Grails' run-app; for production use run-war. CoW's build file gives you access to these: the ant run-dev target does a Grails run-app and ant run-prod runs a Grails production Jetty instance on the CoW WAR.
Shutdown: ctrl-c will shut down Jetty when run from Ant or Grails. Ctrl-c of Ant does not, however, cleanly shutdown a Jetty forked from Ant. For a clean shutdown with correct execution of all shutdown code, run ant shutdown-prod. You may supply an optional port on which Jetty will listen for the shutdown signal, and a password key to listen for, with -Djetty.shutdown.port and -Djetty.shutdown.key. Defaults are cleartext in the build file.
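For illustration, the shutdown invocation with and without the optional properties might be assembled like this. The helper name cow_shutdown_command and the example port and key are assumptions; the target and property names are the ones given above.

```shell
# Hypothetical helper: print the clean-shutdown command.
# $1 = shutdown port (optional), $2 = shutdown key (optional).
cow_shutdown_command() {
  if [ -n "${1:-}" ] && [ -n "${2:-}" ]; then
    printf 'ant -Djetty.shutdown.port=%s -Djetty.shutdown.key=%s shutdown-prod\n' "$1" "$2"
  else
    # Fall back to the defaults defined in the build file.
    echo 'ant shutdown-prod'
  fi
}

cow_shutdown_command 8079 example-key
```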
Note that the first time the system runs it will create a .cowrc.d directory (or cowrc.d on Windoze) in your home directory containing help documentation, a DB etc. (The first time through this takes a couple of minutes as the way it sets up the new wiki areas is inefficient.)
Alternatively use ant war and deploy the result onto your favourite servlet container. There are these small disadvantages:
- CoW does some Jetty- and Grails-specific work to bootstrap and configure Sventon to allow SVN repository browsing, so you'll need to configure Sventon separately if you want this facility
- ditto for CoW's indexing and search facilities
- different containers sometimes trigger bugs that don't appear in the environment that the developers use, so you should do some thorough testing on your container before production deployment
4.9.2. Portability of the .cowrc.d directory
If you want to move the data from a CoW installation into a different location, you need to do two things:
- tell SVN that you moved the sandboxes, e.g.
- svn switch --relocate file:///home/thomas/.cowrc.d/0.1/svnrep/trunk/help file:///data/hamish/nlp-user-home/.cowrc.d/0.1/svnrep/trunk/help
- change the paths to the sandboxes in the DB, e.g. by editing .../.cowrc.d/dbs/prodDB.script (or using an administrator interface to the database if you're not using the built-in Hypersonic DB)
- check that you didn't copy any staging area data (best delete it, the SVN URLs will be wrong)
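When several sandboxes have to be relocated, the svn command from the first step can be generated rather than typed. The sketch below assumes, as in the example above, that the repository lives under .cowrc.d in the old and new home directories; the function name relocate_command is made up for illustration.

```shell
# Hypothetical helper: print the relocate command for one sandbox.
# $1 = old home directory, $2 = new home directory,
# $3 = repository path below .cowrc.d
relocate_command() {
  printf 'svn switch --relocate file://%s/.cowrc.d/%s file://%s/.cowrc.d/%s\n' \
    "$1" "$3" "$2" "$3"
}

relocate_command /home/thomas /data/hamish/nlp-user-home 0.1/svnrep/trunk/help
```

The printed command is meant to be run from inside the sandbox working copy it refers to.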
4.9.3. Apache 2 virtual hosts and CoW proxying
This section describes running multiple websites (e.g. http://gate.ac.uk/ and http://gatecloud.net) on a single physical server using Apache virtual hosts and proxying. Each CoW site runs on a different port and is proxied by the (single) Apache server. The configuration was tested on Ubuntu Intrepid, Apache 2.2.9, CoW 0.3.
Steps:
- tell the DNS server to resolve the various site domain names to the server you'll be running on (you can mock this by putting entries like 127.0.0.1 gate.ac.uk in /etc/hosts)
- install Apache 2: sudo apt-get install apache2
- enable proxying:
- sudo a2enmod, answer "proxy", then do the same to enable "proxy_http"
- comment out "Deny from all" in /etc/apache2/mods-enabled/proxy.conf (doing this is not insecure in our case according to http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#access because we set ProxyRequests to Off)
- for each site:
- put an entry (see below) in /etc/apache2/sites-available and link to it in sites-enabled with a2ensite, e.g.
- vim /etc/apache2/sites-available/gate.ac.uk
- a2ensite gate.ac.uk
- run cow on a different port, e.g. ant -Dgate.cow.server.port=8081 run-prod
- you can now browse to the different URLs and be served pages from different CoW instances
Example virtual host definition from the sites-available directory:
# gate.ac.uk
<VirtualHost *:80>
  # copied from the default site set up by the debian installation
  ErrorLog /var/log/apache2/error.log
  LogLevel warn
  CustomLog /var/log/apache2/access.log combined

  ServerName gate.ac.uk
  ServerAlias www.gate.ac.uk

  ProxyRequests Off
  ProxyPass / http://localhost:8080/
  ProxyPassReverse / http://localhost:8080/
</VirtualHost>
(And the same for gatecloud.net (with a different port number of course), for example.)
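Since each site's definition differs only in its name and port, the entries can be generated from a template. The following is a sketch under that assumption; vhost_definition is a made-up name, and the directives are copied from the example above.

```shell
# Hypothetical helper: print a virtual host definition.
# $1 = site domain name, $2 = port of the CoW instance serving it.
vhost_definition() {
  cat <<EOF
# $1
<VirtualHost *:80>
  ErrorLog /var/log/apache2/error.log
  LogLevel warn
  CustomLog /var/log/apache2/access.log combined

  ServerName $1
  ServerAlias www.$1

  ProxyRequests Off
  ProxyPass / http://localhost:$2/
  ProxyPassReverse / http://localhost:$2/
</VirtualHost>
EOF
}

vhost_definition gatecloud.net 8081
```

The output could then be saved under /etc/apache2/sites-available and enabled with a2ensite as described in the steps above.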
Verifying the VH config:
# APACHE_RUN_USER=www-data APACHE_RUN_GROUP=www-data /usr/sbin/apache2 -S
...
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:80 is a NameVirtualHost
  default server 127.0.1.1 (/etc/apache2/sites-enabled/000-default:1)
  port 80 namevhost 127.0.1.1 (/etc/apache2/sites-enabled/000-default:1)
  port 80 namevhost gate.ac.uk (/etc/apache2/sites-enabled/gate.ac.uk:3)
  port 80 namevhost gatecloud.net (/etc/apache2/sites-enabled/gatecloud.net:3)
Syntax OK
When CoW is behind an Apache proxy in this way there is also the option to map other wiki areas (apart from area 1 or 2) to more "friendly" URL prefixes through the use of mod_rewrite rules. To make use of this, first do sudo a2enmod rewrite, then add the following to the virtual host configuration, before the ProxyPass directive:
RewriteEngine on
RewriteRule ^/g8/page/show/4(/.*)?$ my-wiki$1 [R,L]
RewriteRule ^/g8/page/show/4(/.*)?$ my-wiki$1 [R,L]
RewriteRule ^/my-wiki(/.*)?$ /g8/page/show/4$1 [PT]
This will make wiki area 4 visible under /my-wiki. The second RewriteRule internally re-maps requests for pages under /my-wiki to their corresponding locations under /g8/page/show/4. However, CoW only sees the usual /g8/page/show URLs and has no knowledge of this mapping, so it will generate links that point to /g8/page/show/4. The first rule means that when such URLs are requested by a user's browser, they will be redirected back to the friendly /my-wiki URL instead.
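To see what the two patterns do to a path, the same regular expressions can be tried out with sed. This only imitates the path rewriting, not Apache's redirect or pass-through behaviour. The function names are invented for the example, and the redirect target is written as /my-wiki with a leading slash, following the description above.

```shell
# Second rule: friendly URL -> internal CoW URL (done inside Apache).
friendly_to_internal() {
  printf '%s\n' "$1" | sed -E 's#^/my-wiki(/.*)?$#/g8/page/show/4\1#'
}

# First rule: internal URL requested by a browser -> friendly URL (redirect).
internal_to_friendly() {
  printf '%s\n' "$1" | sed -E 's#^/g8/page/show/4(/.*)?$#/my-wiki\1#'
}

friendly_to_internal /my-wiki/doc/index.html   # /g8/page/show/4/doc/index.html
internal_to_friendly /g8/page/show/4/doc/      # /my-wiki/doc/
```

Paths for other areas, such as /g8/page/show/40, do not match the pattern and pass through unchanged.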
Open issues:
- what to do with the default site from the apache install?
- to disable it: a2dissite default
- note: there is always a default virtual host (either the default or, when that is disabled, the first one listed in the sites-enabled directory), so perhaps we might as well leave the normal default enabled? DNS shouldn't be mapping any addresses that we're not dealing with explicitly anyway...
References:
- http://www.debian-administration.org/articles/412
- http://www.ducea.com/2006/05/30/managing-apache2-modules-the-debian-way/
- http://www.debian-administration.org/tag/apache2
4.9.4. Production Deployment and Upgrade
When deploying GATEWiki these elements need to be borne in mind:
- the .cowrc.d directory
- the .cowrc.groovy config file
- the server software (Grails etc.)
- the GATEWiki software (including any layout plugin installation)
- a startup script derived from cow-init.sh and linked into init.d
The following set of steps details a reasonably minimal recipe for deployment and upgrade of production servers. It uses G8RS.net (the GATE team's site for internal use) as an example. These instructions were written for GATEWiki 0.9 running on Grails 1.1.1 on Ubuntu Hardy and Apache 2.
- make a directory called after the site name, e.g. g8rs.net; cd there
- check out the gatewiki code as a directory named prod-server
- install a site layout plugin if required
- link the plugin in cow/plugins and edit BuildConfig.groovy to add a reference to the plugin (avoid checking in the BuildConfig! in fact it is best never to check in anything from the prod server, and never to update)
- copy the script cow/bin/cow-init.sh to the g8rs.net directory and edit as appropriate, e.g.:
- NAME=G8RS.net
- COW_HOME=/data/herd/g8rs.net/prod-server/cow
- COW_PORT=9090
- COW_MODE=server
- RUN_AS=hamish
- COW_USER_HOME=/data/herd/g8rs.net
- link cow-init.sh from /etc/init.d and the RC scripts
- when upgrading
- copy the prod-server to dev-server and svn up the latter
- copy prod db to dev db in cowrc.d/dbs, upgrade and run the dev server, test on 8080
- replace prod-server with dev-server
- run prod-server
- sudo ./cow-init.sh [re]start
- if needed: replace the admin account; edit the main wiki to point elsewhere; create a solr index; set includeDirs for the raw html on the wiki to .* or whatever
- server tree example
- /data/herd/g8rs.net/
- /data/herd/g8rs.net/prod-server [ swapped at each...
- /data/herd/g8rs.net/dev-server ...upgrade ]
- /data/herd/g8rs.net/
- /data/herd/g8rs.net/cowrc.d (convenience link to .cowrc.d)
- /data/herd/g8rs.net/cowrc.groovy (convenience link to .cowrc.groovy)
- /data/herd/g8rs.net/main-sandbox
- /data/herd/g8rs.net/cow-init.sh
- /etc/init.d/cow-init-g8rs-net.sh (link to cow-init.sh)
And Bob's your uncle.
4.9.5. Deployment and runtime dependencies
This note discusses two issues:
- how do you package up a CoW site for initial deployment? when you want to upgrade the software for a site what do you do?
- how should 3rd-party webapps like Sventon and Solr be configured and shared between CoW instances?
Note: for deployment and upgrade the discussion here is superseded by the preceding section.
Complicating factors include:
- we're now using Grails for production deployment, so all CoW instances depend on a particular version of Grails (one reason for doing this is that we can use Grails' _Events.groovy to start 3rd-party apps)
- most sites will be a combination of vanilla CoW plus a plugin that provides site-specific layout etc. (in the way that we did the nlp.shef.ac.uk demo site)
- we should be able to return to a deployed version if we need to, but we don't want to store every deployment in svn in its entirety
- it is probably desirable to be able to rsync only the tree changes for a new deployment, as if we include everything it currently runs to around half a GB (the site itself, Nutch, Solr, Grails, GWT... no doubt we can cut this down but only with some work)
Runtime dependencies:
- Grails binary (which is around half the size of the built from source version we're currently using)
- cow/*, optionally with a site-specific plugin installed
- sventon, for browsing SVN repositories (containing wiki areas)
- nutch-solr, for indexing and search of wiki areas
Design choices:
Taking the easy one first, we address issue 2. (how to share Sventon and Solr between CoW instances) by putting all site-specific config for these apps in the .cowrc.d directory, so the file trees are shareable across instances. (Sharing the servlets themselves is tricky because of their configuration dependencies on the wiki areas of the CoW instance.)
Re. issue 1., packaging for deployment and upgrading deployed sites, we have a basic solution in the create-custom-cow-site.sh script (see also the site-specific layout section). This has two modes
- creating a new deployment tree, including all the runtime dependencies
- updating an existing tree
In both modes these parameters are relevant:
- site-specific layout plugin
- server software deployment version (given as a date)
- site name
The version/date is used as the basis for tagging SVN trees (so that you can return to the scene of this crime later if necessary) and to distinguish site and server deploys (both the server software tree and the site tree will include the date).
Mode 1., creating a whole new deployment tree:
- copy the gatewiki tree
- in the copy
- move cow to the name of the deployment site (plus date)
- install the layout plugin
- (optionally) tag the repository
- (optionally) copy the tree to the deployment machine
The new tree is then tested and deployed on the remote server.
Mode 2., updating a deployed site:
- copy the cow tree to the name of the deployment site (plus date)
- install the layout plugin
- (optionally) tag the repository
- (optionally) rsync the tree to the deployment machine
The new CoW tree is tested, and then used to replace the running site.
Note that using mode 2 multiple sites can be deployed into a single main tree, thus sharing most of the runtime dependencies (e.g. Grails) across sites.
Note also that when these dependencies change (e.g. new version of Grails or Nutch etc.) mode 1 should be used to create a new complete tree. (If upload bandwidth is an issue this can be rsync'd with a copy of the existing deployment tree on the target machine.)
4.9.5.1. Saving space
At the time of writing we have the following raw sizes for the runtime dependencies discussed above (about 755M in total as listed):
$ du -sh ...
289M cow
112M grails
159M gwt
171M nutch-solr
24M sventon
(Though note that most of this data is in SVN, hence twice as big as otherwise. The total without the SVN directories, and with some development data excluded, is a bit under 450M at the time of writing.)
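As a quick check of the figures, the raw sizes in the du listing can be summed in the shell. They come to 755M, so the roughly 450M figure applies only once the SVN directories and development data are excluded.

```shell
# Sum the raw du figures (in megabytes) from the listing above.
total=0
for size in 289 112 159 171 24; do
  total=$((total + size))
done
echo "raw total: ${total}M"   # prints "raw total: 755M"
```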
There are lots of ways to reduce this total:
- cow includes lots of test data (dev-user-home, webtest/user-home, ...)
- use a pared-down GWT not including the hosted-mode browser (13M)
- move the _Events code somewhere else and go back to .war for deployment (but what to do about dev mode? also means porting the code away from the Jetty API, or only using the war in Jetty)
- remove around 23M of unused stuff in nutch-solr
- use apache mod_rewrite to allow sharing of sites across cow instances (issue: a single shared DB)
4.10. Structure, Naming and Code Conventions
4.10.1. Naming and Other Code Conventions
GATEWiki uses the GATE coding conventions.
TODO link to a publicly available copy of the conventions.
One thing in particular needs to be borne in mind when naming controllers: the namespace of controllers and the top-level wiki directory in the main sandbox conflict (and this is also true of static resources in the web-app directory). Therefore it is impossible to have /page or /css as directory names at the top of the main sandbox. New controller names (etc.) should be chosen appropriately (e.g. CowFooBarController).
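One way to catch such clashes early is to check proposed top-level directory names against the names already taken. The sketch below is an assumption-laden illustration: the list contains only the two names mentioned above (page and css) and would need extending with the actual controllers and web-app directories of your installation; the function name is made up.

```shell
# Names the text above says cannot be used at the top of the main sandbox.
# Assumption: extend this list to match your own controllers and web-app dirs.
RESERVED_TOP_LEVEL="page css"

# Succeeds (exit status 0) if $1 clashes with a reserved name.
is_reserved_top_level() {
  for reserved in $RESERVED_TOP_LEVEL; do
    if [ "$1" = "$reserved" ]; then
      return 0
    fi
  done
  return 1
}

if is_reserved_top_level page; then
  echo "page clashes with a controller or static resource"
fi
```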
4.10.2. Structure
CoW is made up of the following components (and numerous 3rd party libraries):
- the GATE YAM library, which lives in gate.yam
- an implementation of JSecurity layered on top of the Grails JSecurity plugin
- the GATE versioning library (basically SVNKit), which lives in gate.versioning
- the GATE utilities classes, which live in gate.util
- CLoNE and CLoNE QL, which live in gate.clone (not currently released)
- the CoW Grails webapp
4.10.2.1. Main classes and Grails objects
The CoW webapp is organised around the concept of wiki/content areas, which are file trees stored in SVN. What CoW then provides is
- serving of static files (HTML and otherwise) as normal
- the ability to create and edit YAM files in normal Wiki fashion
- the ability to upload files and directories, and browse the directory trees (subject to authorisation, of course)
CoW is implemented using Grails. The Grails MVC model works like this:
- domain classes are persistent objects that model the domain
- controllers are classes that handle requests and do related logic
- actions are methods (actually closures) on controllers that relate to different types of request; actions build models which are then passed on to views
- views are JSPs (actually their Groovy equivalent, GSPs) which act as parameterised HTML for presentation
Or, put another way, the Grails web application architecture has four main components:
- Controllers triage requests and use Services to build Models which are forwarded to Views.
- Services encapsulate business logic and the more complex Model manipulations.
- Domain classes handle persistence and participate in Models.
- Views handle presentation.
In CoW we have:
- Wiki, the main domain object
- PageController, which handles viewing and editing wiki file trees, YAM files and so on
- AdminController, a (mostly generated) controller that allows creation and maintenance of wiki areas, people and their roles and so on
- a bunch of controllers supporting JSecurity configuration
- various others
Example:
- Let's say that we have a wiki area for a directory called /web/gate-server/html with the id 9.
- A simple page display URL looks like this http://localhost:8080/cow/wiki/show/9/ and will pick up any index.html that exists there.
- Clicking on relative links will lead us to the other pages, which are served from the file system within the CoW app, e.g. http://localhost:8080/cow/wiki/show/9/projects.html
- When looking at an HTML file that has a .yam associated with it, an "Edit this page" link will appear.
- When looking at any page a "List directory" link will appear.
- A "help" link is always present, which points to the CoW documentation in the user's ~/.cowrc.d directory.
4.11. Authentication and Authorisation
Important note: a new GATE wiki installation has a default user, scott. Scott is an administrator; he can do anything. He has the traditional and well-known password, tiger. Before doing anything else, you must:
- log in as scott
- create a new admin user with a password known only to you
- then log out scott
- log in as your new user
- delete scott
Don't forget to delete scott
4.11.1. Grails JSecurity plugin
Authentication and authorisation uses the JSecurity plugin for Grails.
Note that the basic security model installed by the plugin has been adapted:
- Jsecurity comes with a basic permission implementation that is based on assigning what controllers and actions users and roles can access. This has been replaced with gate.cow.CowPermission that specifies what actions may be performed on what directory in what wiki. See further details in "Users, roles, permissions" below.
- The default realm, JsecDbRealm, has been replaced with a custom realm, CowDbRealm, which uses plain database authentication and rights management via the custom gate.cow.CowPermission.
- A SecurityService has been added to carry out a couple of common tasks, such as creating roles.
- The controllers and views autogenerated by the Jsecurity plugin have been adapted to the above objects.
- The SecurityFilters have been adapted to GATE Wiki
- Error messages in jsecurity.messages have been customised
- A tag library has been added: see below.
4.11.2. Grails JCaptcha plugin
User registration attempts to prevent registration of bots, using the JCaptcha plugin for Grails:
- A simple random word is used, as configured in Config.groovy
- The captcha is displayed via the jcaptcha:jpeg tag provided by the plugin
- A specific mapping is provided in UrlMappings.groovy for the captcha image
- The response is checked via a JcaptchaService provided by the plugin
4.11.3. Users, roles, permissions, actions
Security is based around users, roles, and permissions.
Users are easy: they are you, me, your neighbour. Someone who is using the wiki. Roles are groups of users. How you decide to group your users is up to you, but most likely they will be groups with some common functional requirement, such as "editors" or "reviewers". Or maybe "Team X". Roles do not themselves have any automatic rights to do anything. The rights must be assigned to a role (or to an individual user, if you really want to). Rights are defined separately from roles. Once defined, they can be assigned to any role.
Rights are defined as "permissions". In the default JSecurity plugin installation, a permission defines an action and a controller. You can assign a permission, i.e. the right to use an action on a controller, to a role or user. These types of permission aren't sufficient for GATE Wiki, where lots of wikis may be served by a single controller, and where we may need more fine grained control over a wiki's directory structure.
So, instead we use CowPermissions. These define:
- A wiki that can be accessed (or * for any wiki)
- The directories within the wiki that can be accessed, defined by two regular expressions (see below for more detail):
- Included directories
- Excluded directories
- A controller that can be accessed. The default is the 'page' controller that gives access to wiki pages. The controller * represents all controllers.
- A set of actions that can be carried out on the controller (pages in the directory in the case of the page controller). The set called * contains all actions.
Including the controller in this model means that you can define access to any arbitrary controller, not just the page controller that serves wiki pages. This might be useful if you are installing a Grails plugin into the wiki.
An assigned permission has one controller, and one named set of actions, and any of the actions in that set can be carried out if you have that permission. You might, for example, define sets of actions on the page controller that are needed for read access to a wiki, or for write access. For fine grained control, you could define a set to contain a single action.
Permissions take two forms:
- Rights to access some part of a wiki are defined as permissions, and assigned to roles and users via the admin interface. These permissions are stored in the database. They are assigned permissions.
- When you try to access an action in the context of a wiki and directory, a permission is constructed that defines the action you need on that wiki and directory. This is a required permission.
Access control proceeds as follows:
- A security filters closure is injected into the controller, as defined in conf/SecurityFilters
- You ask to do something to a page. The before interceptor part of the above filter is executed.
- The before interceptor constructs the required permission, defining the controller and action required and, where relevant, the wiki and directory of the page.
- The before interceptor invokes an accessControl method, which has been injected into the controller by the JSecurity plugin.
- The accessControl method is passed a closure which itself executes a method, permission, with the required permission as parameter.
- This leads, via the plugin, to CowDbRealm.isPermitted being executed, with parameters of the user, and the required permission.
- CowDbRealm checks the database and finds all of the permissions that are assigned to this user and to the user's roles
- Each of these assigned permissions is checked to see if it implies the required permission.
- If one does, then access is allowed.
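The last three steps (collect the assigned permissions, test each one, allow access if any of them implies the required permission) can be sketched in Java as below. This is an illustration only: PermissionSketch, Assigned, Required and the action names "show", "list" and "edit" are hypothetical, this is not the real gate.cow.CowPermission or CowDbRealm code, and the directory patterns are deliberately left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the real gate.cow.CowPermission or CowDbRealm. */
final class PermissionSketch {
    static final String ANY = "*";

    /** What a request needs: one action on one controller in one wiki. */
    static final class Required {
        final String wiki;
        final String controller;
        final String action;

        Required(String wiki, String controller, String action) {
            this.wiki = wiki;
            this.controller = controller;
            this.action = action;
        }
    }

    /** What a user or role has been given; "*" stands for "any" (assumed encoding). */
    static final class Assigned {
        final String wiki;
        final String controller;
        final Set<String> actions;

        Assigned(String wiki, String controller, String... actions) {
            this.wiki = wiki;
            this.controller = controller;
            this.actions = new HashSet<>(Arrays.asList(actions));
        }

        /** True if this assigned permission covers the required one. */
        boolean implies(Required r) {
            return (wiki.equals(ANY) || wiki.equals(r.wiki))
                && (controller.equals(ANY) || controller.equals(r.controller))
                && (actions.contains(ANY) || actions.contains(r.action));
        }
    }

    /** Access is allowed as soon as one assigned permission implies the required one. */
    static boolean isPermitted(Collection<Assigned> assigned, Required required) {
        for (Assigned a : assigned) {
            if (a.implies(required)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Assigned readMain = new Assigned("main", "page", "show", "list");
        Assigned admin = new Assigned(ANY, ANY, ANY);
        Required edit = new Required("main", "page", "edit");
        System.out.println(isPermitted(Arrays.asList(readMain), edit));        // false
        System.out.println(isPermitted(Arrays.asList(readMain, admin), edit)); // true
    }
}
```

Note that the check is purely additive: there is no way for one assigned permission to veto another.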
See this thread for a useful description of how permissions work.
4.11.3.1. Directory level authorisation
For actions of the page controller, GATE Wiki permissions give authorisation control at the level of directories in the wiki. Permissions for specific directories are defined in terms of Java regular expressions (see also this Java tutorial).
Each permission has two directory regular expressions:
- Include directory pattern: defines directories to which access is allowed
- Exclude directory pattern: defines directories to which access is denied
To understand how the two patterns interact, and how all the patterns on the several permissions a user may have interact, you need to know the detail of directory permission checking.
- For access to a given directory or file, all of the permissions configured for a user and all of their roles will be checked
- If any configured permission gives access, then the user has access, regardless of any other permissions
- For each configured permission, access is checked for the path of the directory relative to the wiki, i.e. that part of the URL after the wiki name up to but excluding the filename. We will call this "the directory".
- e.g. for http://www.gate.ac.uk/sam/mimir/doc/experiments.html the directory is mimir/doc/
- note the lack of a leading slash, and the trailing slash
- For each of these configured permissions, access is given if both of the following are true:
- the include pattern matches the entire directory to which access is required
- the exclude pattern does not match the entire directory to which access is required
The basic rules are:
- If a user or one of their roles has any permission that gives access to a directory, then they will have access - even if they also have a permission with an exclude pattern for that directory.
- If a user has an include pattern that matches, and an exclude pattern that does not match, then this permission gives access
- In all other cases, they are denied access
The last two points can be summarised in this truth table:
Exclude pattern | Include pattern: Match | Include pattern: No Match |
Match | DENY | DENY |
No Match | ALLOW | DENY |
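The rules and truth table above can be sketched in Java as follows. This is a minimal illustration, not CoW's actual code: DirectoryAccessSketch and its members are hypothetical names, and treating an empty pattern as matching nothing is an assumption taken from the example patterns in this section.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the directory check; not CoW's actual code. */
final class DirectoryAccessSketch {

    /** The include and exclude directory patterns of one configured permission. */
    static final class DirPatterns {
        final String include;
        final String exclude;

        DirPatterns(String include, String exclude) {
            this.include = include;
            this.exclude = exclude;
        }
    }

    /** "mimir/doc/experiments.html" gives "mimir/doc/"; a top-level file gives "". */
    static String directoryOf(String pathWithinWiki) {
        String p = pathWithinWiki.startsWith("/") ? pathWithinWiki.substring(1) : pathWithinWiki;
        return p.substring(0, p.lastIndexOf('/') + 1);
    }

    /** Whole-string match. Assumption: an empty pattern matches nothing. */
    static boolean matchesWhole(String regex, String directory) {
        return !regex.isEmpty() && Pattern.matches(regex, directory);
    }

    /** One permission grants access only if include matches and exclude does not. */
    static boolean grants(DirPatterns p, String directory) {
        return matchesWhole(p.include, directory) && !matchesWhole(p.exclude, directory);
    }

    /** Any single granting permission is enough, whatever the others say. */
    static boolean hasAccess(List<DirPatterns> permissions, String directory) {
        for (DirPatterns p : permissions) {
            if (grants(p, directory)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DirPatterns everything = new DirPatterns(".*", "");
        DirPatterns noPrivileged = new DirPatterns(".*", ".*\\/privileged\\/");
        String dir = directoryOf("team/privileged/notes.html"); // "team/privileged/"
        System.out.println(hasAccess(Arrays.asList(noPrivileged), dir));             // false
        System.out.println(hasAccess(Arrays.asList(noPrivileged, everything), dir)); // true
    }
}
```

The second call shows the first basic rule in action: an exclude pattern on one permission does not take away access that another permission grants.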
Some example patterns
It is important to get your patterns right - otherwise you may give unexpected access. Note the following about paths:
- Patterns are always matched against an entire directory path. It is not enough for a pattern to be found as a component of a required directory path.
- The leading slash in a path is not checked
Pattern | Note | Meaning as an include pattern | Meaning as an exclude pattern |
.* | | Give access to all directories | Deny access to all directories |
(empty) | This is an empty string | Give access to no directories | Deny access to no directories |
.*foo.* | | Give access to any directory path containing foo e.g. foo/bar/, foobar/, bar/foo/, bar/foobar/bar/ | Deny access to any directory path containing foo (examples above) |
.*\/foo\/ | | Give access to any directory path with a leaf directory foo e.g. bar/foo/, but not bar/foobar/ or foo/bar/ | Deny access to the above |
.*\/foo\/.* | | Give access to any directory path with any subdirectory equal to foo, e.g. bar/foo/, bar/foo/bar/, but not bar/foobar/ | Deny access to the above |
foo\/.* | | Give access to any directory path with a top level directory equal to foo, e.g. foo/, foo/bar/, but not bar/foo/ or bar/foobar/ | Deny access to the above |
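To try patterns like these before assigning them, note that matching against the entire directory path corresponds to matches() in java.util.regex, whereas find() would accept a pattern that occurs anywhere in the path. The sketch below contrasts the two; PatternExamples is a hypothetical helper, and it is an assumption that CoW itself calls matches().

```java
import java.util.regex.Pattern;

/** Hypothetical helper for trying out directory patterns; not part of CoW. */
final class PatternExamples {

    /** True only if the pattern matches the entire directory path. */
    static boolean whole(String regex, String directory) {
        return Pattern.compile(regex).matcher(directory).matches();
    }

    /** True if the pattern occurs anywhere in the path (NOT the rule described here). */
    static boolean anywhere(String regex, String directory) {
        return Pattern.compile(regex).matcher(directory).find();
    }

    public static void main(String[] args) {
        // In Java source each backslash of the pattern is doubled.
        String leafFoo = ".*\\/foo\\/";
        String topFoo = "foo\\/.*";
        System.out.println(whole(leafFoo, "bar/foo/"));    // true
        System.out.println(whole(leafFoo, "bar/foobar/")); // false
        System.out.println(whole(topFoo, "bar/foo/"));     // false
        System.out.println(anywhere(topFoo, "bar/foo/"));  // true
    }
}
```

The last two lines show why whole-path matching matters: foo\/.* is found inside bar/foo/, but it does not match the whole path, so it gives no access to that directory.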
Some example pattern uses
Scenario | Example use case | Example include pattern | Example exclude pattern |
Access all directories | General user | .* | |
Allow access to just one directory in a wiki | A leaf directory for external users in an otherwise closed site | .*\/external\/\z | |
Deny access to a specific directory in a wiki | A leaf directory not accessible to most users | | .*\/privileged\/\z |
4.11.3.2. Access to non-page controllers via CowPermission control
The SecurityFilter and the Jsecurity code give access to wiki pages via the "page" controller. They are also configured at bootstrap to give some access to some other controllers. These are:
- anything when in workstation mode
- all controllers (*) for the admin role
- register controller, access given to the anonymous user, for registration pages
- jcaptcha controller, access given to the anonymous user, for registration pages
- user controller, access given to the anonymous user. This controller displays things about the currently logged in user.
- anything else where the required permission is assigned to the anonymous role, regardless of whether any user is logged in or assigned to that role. This is defined via SecurityService.
- auth controller, access given to anyone. This is a special case defined in the JSecurity plugin itself, and allows access to the login form etc.
- search controller, access is given to the search controller as if the page controller was being called with a show action
4.11.3.3. User and password constraints
Constraints on users are enforced by a UserCommand and PasswordCommand, which are used in a couple of places. Together, they define more fields than JsecUser (e.g. a repeat password, constrained to be the same as the password), but can be bound to JsecUser. Note that the UserCommand is not able to enforce a unique name constraint; this is done by JsecUser.
4.11.3.4. Security in workstation and server modes
- Security is turned on only for server mode.
- Everything is accessible in workstation mode, to a user on the local machine.
- Access is denied to all non-local users when in workstation mode
- You can log in when in workstation mode. Doing so gives you nothing extra from a security point of view.
4.11.4. Pre-defined security objects
The above gives an abstract view of how security works. In a default GATE wiki installation, there will be several pre-defined security objects that you can use to get your installation up and running.
Type | Object | Description |
Role | administrator | This role has a single pre-defined permission, which allows it access to wiki * with the action set called *, which contains all actions, and to the controller *. In other words, members of this role can do anything. There is a single initial member by default, scott (see below), who you should delete. |
Role | default | All new registrants are automatically made members of this role. This role has no pre-defined permissions. So, by default, new registrants are given access to nothing. But you could assign a permission to this role, so that new registrants get access to something. |
Role | anonymous | This role is treated differently to the others. The default bootstrap permissions will allow anyone access to something permitted to this role, regardless of whether they are a member of this role or whether they are logged in. In effect, all users, logged in or not, are members of this role. By default, this role has a permission, which allows read-only access to the help wiki, and permissions for the jcaptcha, register, and user controllers, all of which are needed for registration etc. |
Role | help read | This role has a single permission, giving read access to the help wiki. |
Role | help read write | This role has a single permission, giving read write access to the help wiki. |
Role | main read | This role has a single permission, giving read access to the main wiki. |
Role | main read write | This role has a single permission, giving read write access to the main wiki. |
Role | wiki N read | This role is created by default for any new wiki N. It has a single permission, giving read access to wiki N. |
Role | wiki N read write | This role is created by default for any new wiki N. It has a single permission, giving read and write access to wiki N. |
User | scott | The sole default member of the administrator group, with the usual password. You should create a new admin user, log in as that new user, and delete scott. |
Action set | Read | A named set of all the actions that are required to give read access to a wiki. Used when constructing new permissions. |
Action set | ReadAndWrite | A named set of all the actions that are required to give read and write access to a wiki. Used when constructing new permissions. |
Action set | * | A named set of all the actions. Used when constructing new permissions. |
Controllers | page, jcaptcha, register, user, * | Controllers used by the default roles. |
Notes:
- administration is per server, not per wiki, i.e. an administrator is over all wikis on a server.
- an administrator can see everything
- the above special roles and sets have names defined in Config.groovy that can be overridden at startup etc.
- note the default roles that are created for new wikis
- when a new user registers, they are automatically put in the default role. Assign anything to this role that you want new registrants to access.
- anyone can see things assigned to the anonymous role.
- access to jsec and admin pages is restricted by omission. These controllers are not mentioned in any permission, and so only administrators can see them. This could be changed to an explicit mention.
- If you are using the Hibernate HSQL in-memory database behind CoW, then you may have problems with multiple copies of some objects, such as roles and role-permissions, being created. The reason is unclear, but it may be because HSQL is not properly persisting at shutdown.
4.11.5. Tag library
In addition to the tags defined by JSecurity (note: do not use jsec:principal, see below), the following are available in SecurityTagLib:
- cow:principal Outputs the username. Overcomes a vulnerability in jsec:principal
- cow:isWorkstation true if in workstation mode
- cow:isServer true if in server mode
- cow:isNotLoggingIn true if no one is logged in
- cow:isAdministrator true if a member of the administrator role is logged in
- cow:canRead true if the user has permission to read the specified page
- cow:canReadAndWrite true if the user has permission to read and write the specified page
- cow:hasPermission true if the user has permission to carry out the specified actions on the specified page
- cow:lacksPermission true if the user does not have permission to carry out the specified actions on the specified page
See the taglib for fuller documentation.
4.11.6. Code dependencies between Jsecurity and GATE Wiki
There are very few! Of course, there are lots of pages specific to security and its administration. But it shouldn't be too hard to tease these apart from the core wiki.
- BootStrap creates some default roles and other objects, as described above.
- Config configures JCaptcha, and sets names for the pre-defined security objects
- Custom tags are used to give user-specific menus
- WikiController.save and WikiController.delete use the SecurityService to create and delete default security roles for a wiki
- edit and newpage flows in PageController get the user name from the SecurityService.
- There are several admin and other pages specific to security and registration
4.11.7. Known vulnerabilities and avoiding them
- Passwords are posted in cleartext. This might not be a problem for most setups. If it is, then you must deal with it in your servlet container, either by running the whole app in https, or by redirecting to https for pages that post passwords.
- Note that SHA1 is used for password hashing, and that this may not be good enough for security-sensitive applications. SHA1 is built in as the default in the JSecurity plugin, which will have to be changed or overridden if you require a different algorithm (e.g. perhaps by writing a new version of the credentials matcher bean in spring/resources.xml).
- You should not use the jsec:principal tag, as its output isn't HTML encoded and is therefore vulnerable to XSS attacks. Instead, use the cow:principal tag, which fixes this (until such time as it is fixed in JSecurity; see this jira).
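To illustrate what HTML encoding means for the last point, here is a minimal Java sketch of escaping a user name before writing it into a page. HtmlEscapeSketch is a hypothetical class and is not how cow:principal is implemented; it only shows the kind of transformation that makes a hostile user name harmless.

```java
/** Hypothetical illustration of HTML encoding; not the cow:principal implementation. */
final class HtmlEscapeSketch {

    /** Replaces the characters that are significant in HTML with entities. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A user name containing markup is rendered as text, not executed.
        System.out.println(escapeHtml("<script>alert(1)</script>"));
    }
}
```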
4.12. Site- and Wiki-Specific Layout and Navigation
CoW's look and feel is a masterpiece of minimalist aesthetics, the like of which is seldom seen in this age of extravagant waste, technological frenzy and general fluffiness. Everyone who uses CoW will inevitably wish to preserve its wonderful good looks completely unchanged. If, however, the evils of the international capitalist conspiracy force you kicking and screaming to do something different, CoW provides several ways to do site-specific (and wiki area-specific) layout and configuration, from simple things like changing the title up to a complete rethink using a Grails (Sitemesh-based) layout. If you want to make your CoW quack like a duck or stink like a skunk you're going to be in pig heaven.
CoW provides navigation (sets of links to parts of a site organised as menus) that can be tailored on a per-area or per-directory basis, plus layout (CSS styling, page structure, etc.) that can be tailored on a per-site or per-area basis.
4.12.1. Navigation
Navigation that applies to the whole site or to a whole wiki area is probably best put into a Grails layout plugin - see replacing the main layout below. In other cases, e.g. navigation that is different for different parts of a wiki area, CoW allows you to create lists of links in normal wiki pages (that follow a configurable naming convention) and will use these to create navigation menus.
For the impatient: create a file called leftBar.html containing a list of links, e.g.
(Don't put a heading in the file.) Then all pages under that directory will have a left navigation menu containing these items.
More detail:
Adding navigation to a directory tree involves adding YAM files to the top of the tree that contain the links for whichever of various screen areas that you want to contain them (e.g. top bar, left bar, right bar, footer). These files have to be named after the corresponding DIV elements of the main layout (in the default CoW layout they are called header, leftBar, rightBar and footer; in a CoW running a custom layout they may have different names). (Note to layout writers: the DIVs that get replaced with custom navigation must be at the top level of the BODY element in your layout.) Because navigation files are inherited by subtrees, if you have any subdirectories below a navigation file, the links in the file will have to be absolute (i.e. start with a "/").
This type of navigation is dealt with at rendering time so that we don't lose the ability to work with YAM files outside of CoW. So at the place where CoW reads the body of a YAM-derived HTML file it takes a) the top directory of the current wiki area tree and b) the current directory (implicit), and then it steps up the tree looking for all the layouts that are specified in the config (under navigation.files). Each of those present is added to the page model in PageController.show, and these become the contents of the relevant parts of the main layout (e.g. header, leftBar, footer and so on).
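The walk up the tree can be sketched in Java roughly as below. This is a sketch under stated assumptions, not the code in PageController: NavigationLookupSketch is a hypothetical name, the navigation file names are passed in rather than read from the config, and it assumes that when the same file exists at several levels the one nearest to the current directory wins.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of navigation file lookup; not CoW's actual code. */
final class NavigationLookupSketch {

    /**
     * Walks from currentDir up to areaRoot and returns, for each navigation
     * file name, the first matching file found on the way up.
     */
    static Map<String, Path> findNavigationFiles(Path areaRoot, Path currentDir, List<String> fileNames) {
        Map<String, Path> found = new LinkedHashMap<>();
        Path root = areaRoot.toAbsolutePath().normalize();
        Path dir = currentDir.toAbsolutePath().normalize();
        // Stop once we have stepped above the top of the wiki area.
        while (dir != null && dir.startsWith(root)) {
            for (String name : fileNames) {
                Path candidate = dir.resolve(name);
                if (!found.containsKey(name) && Files.isRegularFile(candidate)) {
                    found.put(name, candidate);
                }
            }
            dir = dir.getParent();
        }
        return found;
    }
}
```

A caller would pass something like the list leftBar.html, footer.html and put each file that was found into the page model under the matching DIV name.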
4.12.2. Replacing the Main Page Layout
If you want to replace the entire look of the site (or a particular wiki area), then you need to create a Grails plugin and supply a Sitemesh layout, as below. There's an example plugin that sets up CoW to use the University of Sheffield house style at gatewiki/site-plugins/nlp (and various others there).
Step-by-step:
- create a plugin that provides a controller called gate.cow.GuestLayoutController and a layout called cowguest.gsp (the controller doesn't need to do anything, but is necessary so that we can navigate the Grails plugin metadata to find the layout; see gate.cow.CowUtils for the gory details)
- the cowguest layout should include copies of the functional elements of the cowpage.gsp layout, so that e.g. the login/edit/etc. links are present in the new layout
- you can add any CSS, images etc. in the normal way in the web-app directory of the plugin
- package the plugin and install it in CoW, e.g. (for the NLP site plugin in gatewiki/nlp):
- cd .../nlp
- GRAILS_HOME=../grails ../grails/bin/grails package-plugin
- cd ../cow
- ant -Dgate.cow.grails.command="install-plugin ../nlp/grails-nlp-0.1.zip" grails
- the cowguest layout will now be referenced by all the main GSPs
- for the install step, and to create a deploy, there's a convenience script in cow/bin/create-custom-cow-site.sh which takes a copy of cow and installs the plugin. This is useful for then creating the war or doing a grails run-war for deployment (see also the deployment section)
- an example: cow/bin/create-custom-cow-site.sh -n nlp -p `pwd`/nlp/grails-nlp-0.1.zip will create a patched nlp site
Another example: to create a "g8rs" plugin and package the whole tree for deployment:
cd gatewiki/site-plugins
GRAILS_HOME=../grails ../grails/bin/grails create-plugin g8rs
cd g8rs
GRAILS_HOME=../../grails ../../grails/bin/grails create-controller \
  gate.cow.GuestLayout
... now edit grails-app/views/layouts/cowguest.gsp ...
GRAILS_HOME=../../grails ../../grails/bin/grails package-plugin
cow/bin/create-custom-cow-site.sh -n g8rs -p `pwd`/site-plugins/g8rs
(See also the cow/bin/site-plugin.sh example script.)
4.12.3. Changing the Title or Logo
If you want to change a few small things like the name of the site (which gets put in the page titles) or the main logo, then you just need to supply an external configuration file in gate.cow.user.home (which defaults to your operating system's HOME directory if not set explicitly). There's an example in gatewiki/cow/dev-user-home/.cowrc.groovy.
Changing the title:
gate.cow.name.short = "CoW - dev mode"
gate.cow.name.long = "CoW - dev mode, a Controllable Wiki"
Changing the main logo:
gate.cow.default.logo = "/g8/page/show/2/my-logo.png"
(The PNG file then needs to be uploaded to Wiki Area 2, in this case.)
Note that the logo needs to be somewhere that can be read by the anonymous user, otherwise it won't display on the login page. You can do this either by placing the logo in an anonymous-read wiki, or by giving the anonymous user read access to a single folder in another wiki.
4.12.4. Dealing with non-native HTML pages
Non-native HTML pages are those which are not generated from YAM markup, i.e. are not really managed by CoW as wiki pages. CoW will serve any old HTML page that a user happens to upload into a sandbox (or that is added by any 3rd-party SVN client). This is ok if the HTML is from a trusted source, but not so ok when it may contain arbitrary javascript, for example. There's no simple answer to this issue, so CoW provides several ways to control the serving of non-native HTML, but they are turned off by default.
There are two options.
Option 1 is to allow users to specify that for a particular directory any non-native HTML pages will be served raw in their entirety (but see below about insecurity!).
Note the obvious security hole: if any user can upload any HTML and get it served raw by CoW all manner of nasty attacks become possible. Don't allow users to do this unless you're very confident in their intentions (and basic technical skills, e.g. to avoid viruses and so on). If you are ok with all this, the method is:
- turn the feature on (it is off by default) by setting gate.cow.security.allow.user.raw.html to true in your .cowrc.groovy
- instruct users to put a file called .cow-raw-html in each directory where non-native HTML should be served raw
Option 2 is to specify (as an administrator via the Admin pages) a set of path patterns that apply to non-native HTML pages and again cause them to be served raw. This option is implemented in a similar way to the directory level authorisation. Administrators, while setting up a wiki, can specify which directories are allowed or disallowed for serving of non-native HTML files.
Each wiki area has two regular expressions for this purpose:
- include directory pattern: this specifies directories that are allowed to serve non-native HTML files
- exclude directory pattern: this specifies directories that are restricted from serving non-native HTML files
To understand how the two patterns interact and for details of directory permissions checking please refer to the directory level authorisation section. If the directory matches the include pattern and is not excluded, the HTML pages are served raw; otherwise a permission denied message is shown to the user.
If you then make the directories where the raw HTMLs are served from read only, then you're secure.
Both options are available to the administrator. A user is permitted to see a non-native HTML file if the directory containing the file is granted access by at least one of the two options.
To iframe or not to iframe?
In general the resultant page contains an iframe that pulls in the raw HTML page. This is good for things like Javadocs etc., but in some cases you may want to just serve the body of the page instead (e.g. the GATE user guide). If a file called .cow-no-iframe is present in the directory containing the HTML page then the iframe will be omitted.
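Putting the two options and the iframe marker together, the decision for a non-native HTML page can be sketched in Java as follows. RawHtmlSketch, Serving and decide are hypothetical names, the marker files are represented by booleans to keep the sketch small, and treating an empty pattern as matching nothing is an assumption consistent with the feature being off by default. It is not CoW's actual code.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the raw HTML serving decision; not CoW's actual code. */
final class RawHtmlSketch {

    enum Serving { DENIED, IFRAME, BODY_ONLY }

    /** Whole-string match. Assumption: an empty pattern matches nothing. */
    static boolean matchesWhole(String regex, String directory) {
        return !regex.isEmpty() && Pattern.matches(regex, directory);
    }

    /**
     * @param userRawHtmlEnabled value of gate.cow.security.allow.user.raw.html
     * @param hasRawMarker       a .cow-raw-html file is present in the directory
     * @param include            the wiki area's include directory pattern
     * @param exclude            the wiki area's exclude directory pattern
     * @param directory          directory of the page, relative to the wiki
     * @param hasNoIframeMarker  a .cow-no-iframe file is present in the directory
     */
    static Serving decide(boolean userRawHtmlEnabled, boolean hasRawMarker,
                          String include, String exclude, String directory,
                          boolean hasNoIframeMarker) {
        boolean byUserMarker = userRawHtmlEnabled && hasRawMarker;            // option 1
        boolean byAdminPatterns = matchesWhole(include, directory)
                && !matchesWhole(exclude, directory);                         // option 2
        if (!byUserMarker && !byAdminPatterns) {
            return Serving.DENIED;
        }
        return hasNoIframeMarker ? Serving.BODY_ONLY : Serving.IFRAME;
    }
}
```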
4.13. Referencing and regeneration
The problem is that a create/rename/delete/modify operation on a .yam should be cascaded through any files that link to or include it. The relevant data is what a file "linksTo" and "includes". Regeneration is needed:
- if the requested output (e.g. a .html) is older than the .yam file
- if any included files have been created, modified, renamed or deleted
- if any linked files have been renamed, added or deleted
This information needs to be maintained in a single dependency graph for each Wiki area. This is coded in class Dependencies.
If we can guarantee that all necessary regeneration and dependency maintenance is done whenever any generation is done, then the graph can be assumed to be consistent. In workstation mode that's an open issue because of updates from other tools but in the worst case we can periodically or on request regenerate all the output files to ensure consistency.
Referencing events
- page or directory is created: out-dates all linkers and includers
- page or directory is renamed: out-dates all linkers and includers
- page or directory is deleted: out-dates all linkers and includers
- page or directory is modified: out-dates all includers
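The referencing events above can be sketched as a small reverse-dependency graph in Java. DependencyGraphSketch is a hypothetical illustration and not the Dependencies class itself; it tracks only direct linkers and includers and ignores chains of inclusion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a per-wiki dependency graph; not the Dependencies class. */
final class DependencyGraphSketch {

    enum Event { CREATED, RENAMED, DELETED, MODIFIED }

    // target page -> pages that link to it / include it
    private final Map<String, Set<String>> linkers = new HashMap<>();
    private final Map<String, Set<String>> includers = new HashMap<>();

    /** Records that page "from" links to page "to". */
    void addLink(String from, String to) {
        linkers.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    /** Records that page "from" includes page "to". */
    void addInclude(String from, String to) {
        includers.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    /** Pages whose output is out of date after the given event on "page". */
    Set<String> outdatedBy(String page, Event event) {
        Set<String> result = new TreeSet<>(includers.getOrDefault(page, Collections.<String>emptySet()));
        if (event != Event.MODIFIED) {
            // Creation, renaming and deletion also affect pages that merely link here.
            result.addAll(linkers.getOrDefault(page, Collections.<String>emptySet()));
        }
        return result;
    }
}
```

A modification changes only the content of the page, so pages that merely link to it stay valid; pages that include it must be regenerated in every case.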
Event sources
- CoW
- svn up
- in workstation mode all sorts of editors etc.
Possible event handlers (using and maintaining a dependency graph)
- YamFile.generate
- WikiController.show/create/etc., perhaps complemented by SVN notifications
- a separate process
The current design is to use YamFile.generate.
4.13.1. Serialization
Dependencies are serialized to the directory conf.gate.cow.dbs, one file per wiki. They are serialized on shutdown and periodically during uptime. Periodic serialization is carried out (once a minute) by DependenciesJob. They are deserialized lazily, as required.
4.13.2. Regeneration
Dependencies for a single wiki are regenerated on four occasions:
- When the "regenerate" button is clicked on the Wiki admin "show" page. Use this if dependencies are screwed and you need a one-off regeneration.
- Once a day, when the time specified for the wiki is reached. This time is specified in the wiki admin pages. A time given as the empty string (default) means no daily regeneration. This type of regeneration is controlled by WikiRegenerationJob.
- Whenever the wiki domain object is changed, e.g. if you were to change the base path of the wiki.
- If you were to delete the serialized dependencies for a wiki, then they would be regenerated next time they were needed.
4.13.3. Speed of Running SVN Status
As noted above, YAM files managed by the wiki can be changed in several ways:
- user makes edits via the sandbox in the filesystem
- user updates the sandbox with changes from the repository
- user makes edits via the wiki
- user asks the wiki to update
In each of these cases we need to regenerate any files that are dependent on the changes (e.g. a .html that is older than its .yam; a YAM file which refers to a wiki page which no longer exists; etc.).
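The simplest of these checks, an output file that is older than its source, can be sketched in Java as below. StalenessSketch is a hypothetical name, and treating a missing output file as stale is an assumption; the sketch does not cover dependencies between pages.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a timestamp check; not CoW's actual code. */
final class StalenessSketch {

    /** True if the output is missing or was last modified before its source. */
    static boolean needsRegeneration(Path yam, Path html) throws IOException {
        if (!Files.exists(html)) {
            return true; // assumption: no output yet means it must be generated
        }
        return Files.getLastModifiedTime(html).compareTo(Files.getLastModifiedTime(yam)) < 0;
    }
}
```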
There are a number of strategies that might be adopted to determine what needs regenerating:
- a low-priority process that inspects the filesystem directly
- a listener that receives notifications from the SVN repository
- a process that runs svn st to get a report on files that have changed
For option 3, the statistics in this table are interesting. They give an indication of how long it takes to do an svn st on the SALE tree (which is more than 5Gb of data in more than 100,000 files).
[Table of timings: svn st on the sale tree (from cold); svn st -u on sale (filesystem cached)]
The local status check (which doesn't go to the network to check on repository changes) takes around a minute from cold, but subsequently (when the operating system has had a chance to cache parts of the file system) only takes a couple of seconds. The network check (svn st -u) stays pretty constant around 30 seconds.
(Tests done on a 2GHz Pentium M with 1Gb RAM running Ubuntu Dapper on a broadband connection.)
4.14. Sourceforge notes
- the Sourceforge project
- the Sourceforge copy of these pages
- ssh -l user-name shell.sourceforge.net
- updating the http://gatewiki.sf.net/ pages
- /home/groups/g/ga/gatewiki/htdocs is the directory that gets published as http://gatewiki.sf.net/
- to update them, build helpdocs.zip (via ant doc) and either
- use bin/update-sourceforge-web.sh (change the user name first), or
- sftp hcunningham,gatewiki@web.sourceforge.net
4.15. Grails MVC notes
- A request that hits the application context path (e.g. "cow") is first filtered according to the rules in conf/...UrlMappings.groovy and then assigned to a controller/action when there is a match. Parts of the request are assigned to the params property of the controller (often including the id of a domain object).
- The action either renders a view directly or returns a map (model) which is then forwarded to a view by Grails (a view is typically a GSP).
- Views are decorated by layouts (using some Grails tags built on top of Sitemesh).
- The view is served to the requesting client.
4.16. IntelliJ notes
- download the IDEA from http://www.jetbrains.com/idea/download/index.html#linux
- unpack the tar file somewhere on your disk, e.g. userhome/apps/
- to run idea use this command line as a model: sh -c 'export JDK_HOME=/path/to/java/for/idea/1.6/min && export JAVA_HOME=/path/to/java/for/your/projects && /usr/local/idea-9164/bin/idea.sh'
- Have a look at Install_Linux-tar.txt in the IDEA home directory, which explains how to launch it.
- It is worth increasing the JVM heap size so that IDEA performs faster. To adjust it, open bin/idea.vmoptions and modify the -Xms and -Xmx parameters.
- When you first open IntelliJ, you should see a plugins link at the top right of the screen. If not, use Ctrl+Alt+S to open the IDE settings window, then go to the plugins tab, where you should see all the installed plugins. Next, click on the available tab and search for "Groovy". You should then find the JetGroovy plugin. After installing the JetGroovy plugin, you need to restart IDEA.
- When you open the IDE settings window again, you should see the Groovy&Grails icon on the left of the window. Click on it and set your local Groovy and Grails installation directories.
- Set your project JDK from Platform settings.
- To import CoW, click on New Project from the File tab, choose "Create project from existing sources", then select the location of your local cow.
- Automatic formatting: go to IDEA settings and find the Global Code Style. Set the tab, indent, etc. to the GATE conventions. Then, whenever you open a source file, you can use Ctrl+Alt+L or right-click on the file name and select reformat code.
- To debug, use the menu Run->Edit Configurations and add a Remote configuration, if not already there, with the options localhost and 5005, and uncheck all the options. Then add a breakpoint in IDEA by clicking in the left column of a file, run ant run-debug on the command line and use the menu Run->debug in IDEA.
4.17. Search infrastructure
CoW uses a combination of Nutch and Solr for indexing and search of wiki areas.
Nutch is good at crawling and has several plugins that can be used to extract the text from different types of documents. Solr, on the other hand, is an enterprise search server with a very good search API. The only downside of Solr is that it requires all documents to be in XML format before they can be indexed. Fortunately, Nutch has a plugin to convert the crawled data into the appropriate XML format. Together the two provide a complete search system: Nutch is responsible for crawling intranets and websites, converting documents into the appropriate XML and uploading the content to Solr, while Solr is responsible for maintaining several indices and providing a powerful search API with which users can search those indices.
4.17.1. Installation
4.17.1.1. Directory structure
- Nutch and Solr are checked into gatewiki/nutch-solr
- Nutch (version 1.0-dev downloaded on 26-feb-2009)
- The Nutch sources are checked into gatewiki/nutch-solr/nutch. For all future upgrades, the sources should be updated here.
- gatewiki/nutch-solr/nutch-config has a set of configuration files which have been customized for CoW. All the files under the nutch-config folder need to be visible on the classpath. IndexingService.groovy, which is responsible for starting indexing processes, adds the nutch-config folder to the classpath before it starts the actual indexing process in a separate JVM instance. Except for crawl-urlfilters.xml, none of the files in this folder need to be changed at runtime, so they can safely be shared among different instances of the CoW site. Ideally one would want to crawl only into subfolders. However, for some reason, Nutch crawls back into parent folders as well, which produces undesirable results. A solution posted on Nutch's website is to create a regular expression such that only the files that belong to the top-level folder or any of its subfolders are indexed. This regular expression needs to be added to crawl-urlfilters.xml before starting the actual indexing, and this is the only reason why crawl-urlfilters.xml is edited every time an indexing command is issued. To avoid this, we plan to do the following:
- Rename the file crawl-urlfilters.xml to urlfilters-default.xml. This file will hold the default URL filters, which never need to change. Since Nutch looks for crawl-urlfilters.xml on the classpath, the renamed file will no longer be picked up by Nutch directly.
- Each time before indexing starts, a temporary folder will be created containing a file called crawl-urlfilters.xml. This file will have the contents of urlfilters-default.xml plus a regular expression for the top-level directory. The folder will then be placed on the classpath.
- Solr (version 1.3.0)
- We copied the example directory from the Solr checkout to solr-app.
- solr-app/solr is a setup for running a single index.
- solr-app/multicore is a setup for running multiple cores (indices).
- solr-app/multicore/solr.xml specifies the number of cores to be used for indexing. Currently the limit is set to a maximum of 20 cores.
- We created the solr-app/multicore/conf directory and put all configuration files there so that they are shared among the different cores. We modified schema.xml to specify the fields that Nutch produces for indexing. It also contains other search-related configuration settings.
- solr-app/webapps contains the solr.war file, which CoW points to via Config.groovy. To upgrade the Solr application, update the solr.war file. No changes need to be made to any of the other configuration files.
- All the index files associated with wiki areas are stored under ${user.home}/${dot}cowrc.d/${version}/indices, as specified in Config.groovy. Each core has its own folder there to store its index data.
- browse to http://localhost:8080/solr for instance...
- To validate search queries we use Lucene's query parser. The Lucene core library (from solr-app/webapps/solr.war) is therefore copied into the cow/lib folder. When upgrading the Solr libraries, please make sure that the Lucene library in cow/lib is compatible with the newer Solr version.
- We wrote a wrapper (gatewiki/nutch-solr/src) to simplify calls from CoW. It has several methods for manipulating data in Solr indices.
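The top-level-folder filter described in the Nutch entry above can be sketched as follows. This is a minimal sketch and not the code in IndexingService.groovy: the class and method names are hypothetical, the example paths are made up, and the rule syntax (a leading + to accept, followed by a regular expression) is assumed from Nutch's regex URL filter format.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: builds an accept rule limited to one top-level folder. */
class TopLevelFilter {

    /**
     * Returns a regular expression matching the given folder URL and anything
     * beneath it, but not its parent or sibling folders.
     */
    static String regexFor(String topLevelUrl) {
        // Drop any trailing slash so "/wiki" and "/wiki/" yield the same rule.
        String base = topLevelUrl.replaceAll("/+$", "");
        // Pattern.quote escapes characters such as '.' in the URL itself.
        return "^" + Pattern.quote(base) + "(/.*)?$";
    }

    /** Formats the expression as an accept rule ("+" prefix, assumed syntax). */
    static String acceptRule(String topLevelUrl) {
        return "+" + regexFor(topLevelUrl);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(regexFor("file:/data/wikis/main/"));
        // A file below the top-level folder, then one in the parent folder.
        System.out.println(p.matcher("file:/data/wikis/main/docs/a.html").find());
        System.out.println(p.matcher("file:/data/wikis/other.html").find());
    }
}
```

Anchoring the expression at both ends is what keeps the crawl from wandering back into parent folders; a rule without the leading ^ would also accept URLs that merely contain the folder path.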
4.17.1.2. Enabling Solr
By default Solr is disabled. To tell CoW to run Solr (when being run via Grails), set gate.CoW.solr.run to true in the config. Also make sure that gate.Cow.thirdparty is set to true.
4.17.2. Indexing wikis/individual files from CoW
4.17.2.1. Indexing individual Wiki areas
- Log in as someone with administration rights.
- Click on the "Admin" link.
- Click on the "Wiki Areas" link.
- Each wiki area listing has a link to administer the index for that wiki area. The administration page provides options for indexing the area or deleting an existing index.
- Clicking on the "Index" button initiates indexing in the background while updating the screen with appropriate messages. Note: When changing a wiki area path you need to delete any existing index and re-index with the new path.
- Clicking on the "Delete" button deletes the index for that particular area.
4.17.2.2. Indexing individual files
Whenever a file is created, uploaded or edited, it is queued for indexing. One of the Quartz jobs is dedicated to indexing. It runs every five minutes and indexes any documents it finds in the queue. If a document is already indexed, the existing entry is deleted and the document is reindexed.
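The queue-and-reindex behaviour can be pictured with the sketch below. It is a simplified model, not CoW's implementation: the class IndexQueue, its method names and the in-memory set standing in for a Solr index are all assumptions, collapsing repeated edits of the same file into one queue entry is a choice made for this model, and the five-minute Quartz schedule is represented only by calling runOnce() by hand.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Simplified model of a queue drained by a periodic indexing job. */
class IndexQueue {
    // LinkedHashSet keeps submission order and collapses repeated edits of one file.
    private final Set<String> pending = new LinkedHashSet<>();
    // Stand-in for the real index: the set of paths currently indexed.
    private final Set<String> indexed = new HashSet<>();
    private int reindexCount = 0;

    /** Called whenever a file is created, uploaded or edited. */
    synchronized void enqueue(String path) {
        pending.add(path);
    }

    /** One run of the job: index everything queued, replacing stale entries. */
    synchronized int runOnce() {
        int processed = 0;
        for (String path : pending) {
            if (indexed.remove(path)) {
                reindexCount++; // already indexed: the old entry is deleted first
            }
            indexed.add(path);
            processed++;
        }
        pending.clear();
        return processed;
    }

    synchronized int reindexCount() {
        return reindexCount;
    }

    synchronized boolean isIndexed(String path) {
        return indexed.contains(path);
    }

    public static void main(String[] args) {
        IndexQueue queue = new IndexQueue();
        queue.enqueue("main/index.html");
        queue.enqueue("main/index.html"); // second edit before the job runs
        System.out.println("processed: " + queue.runOnce());
    }
}
```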
4.17.3. Testing
There is now a separate test suite (selenium/solr-suite.html) for testing Solr indexing and searching. Please note that running the ordinary Selenium tests does not exercise the Solr functionality. Instead, use the test-selenium-solr ant target to invoke the Solr tests.
4.17.4. Debugging
If you get no results for your queries, first make sure that you have followed the correct procedure for indexing the wiki area in question. Please refer to the Indexing section for more information on this. If you are certain that you followed all the steps in the Indexing section and there are still no results, use the following steps to debug:
- The first thing to check is the indexing queue. Whenever you submit a request to index a wiki area, the request is queued. IndexingJob, if idle, visits this queue every few seconds and executes the jobs present in the queue. Please note that indexing might take from a couple of minutes to a few hours, depending on the number of documents in the wiki area. You can check whether your request is still in the queue by clicking on the Indexing Queue link on the Administer Index page. The page is refreshed every few seconds. If you cannot see your request listed on this page, it has been executed; please try your search again.
- Usually, if the wiki area in which you are searching is not indexed, you should get the appropriate message when you try to search in that area. If you are certain that you indexed the wiki area but you still get the message, it might mean that the indexing did not finish successfully. In this case, please refer to the cow/solr0.log file for any exceptions.
- Another way of checking whether indexing ran successfully is to visit the Statistics link on the Administer Index page and look at the value of numDocs under the CORE section. If it is not 0, there are files in the index but the search is not returning any results. You can issue the query url:file* to get the names of all the indexed files. If the value is 0, either the indexer could not find any files to index or an exception occurred while indexing (see cow/solr0.log for any exceptions).
- Finally, Nutch creates a temporary folder (whose name should start with crawl) under the /tmp folder to store the temporary index for the crawled files. This folder will contain two files, crawl-urlfilter.txt and urls. The urls file contains a URL pointing to the folder or file that you requested to index. The crawl-urlfilter.txt file is used for filtering out certain types of files. Please make sure that the URL specified in the urls file is not being filtered out.
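To check by hand whether a URL survives the filter, the rules can be replayed as in the sketch below. It assumes the usual Nutch regex filter semantics: each rule is + or - followed by a regular expression, lines starting with # are comments, the first rule that matches decides, and a URL matching no rule is rejected. Verify these assumptions against the Nutch version in use. The class name, the sample rules and the sample URLs are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Replays +/- regex rules against a URL; first match wins (assumed semantics). */
class UrlFilterCheck {

    static boolean accepts(List<String> rules, String url) {
        for (String line : rules) {
            String rule = line.trim();
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // not a rule line in this simplified reader
            }
            // find() lets the expression match anywhere in the URL.
            if (Pattern.compile(rule.substring(1)).matcher(url).find()) {
                return sign == '+';
            }
        }
        return false; // no rule matched: treated as filtered out
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
            "# skip images",
            "-\\.(gif|jpg|png)$",
            "+^file:/data/wikis/main/");
        System.out.println(accepts(rules, "file:/data/wikis/main/index.html"));
        System.out.println(accepts(rules, "file:/data/wikis/main/logo.png"));
    }
}
```

In practice the rules would be the lines of the crawl-urlfilter.txt found in the temporary crawl folder, and the URL would be the one listed in the urls file.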
4.18. SVN Browsing
GATEWiki layers on top of SVN in order to take advantage of SVN's mature and sophisticated support for collaborative concurrent editing, versioning, branching, etc. It also allows us to profit from all the third-party tools that support SVN, and in particular to provide browsing of wiki pages (and other content) and their history.
We looked at a number of tools to provide this function, and evaluated Sventon, OpenGrok and ViewCVS. All have their strengths and weaknesses for our purposes, but Sventon was the best fit (Java, open source, not specific to browsing code repositories).
4.18.1. Notes on Sventon in CoW
CoW uses an unpacked version of Sventon's svn.war with an additional config file which:
- sets editableConfig true
- sets the config location to {cowrc.d}/sventon-config
- exposes a management interface over JMX which allows CoW to re-initialize sventon when it needs to, for example when a new wiki area is added.
CoW configures Sventon at runtime to be able to browse the Subversion repositories corresponding to the configured wiki areas. Authentication settings are propagated from CoW into Sventon, so if a remote repository requires authentication, Sventon will authenticate using the same settings that CoW would use to access the corresponding wiki.
To upgrade to a new version of Sventon, unpack it into gatewiki/sventon, change the web-app link to point to it, and follow the instructions in the README file.
To tell CoW to run Sventon (when being run via Grails) set gate.cow.sventon.run to true in the config.
4.19. Extending YAM with Plugins
YAM has a plugin mechanism that involves writing a Java (or Groovy) class that implements the YamPlugin interface. Drop this class (which must be in the package gate.yam.plugins) into the classpath of the application hosting the YAM translator (e.g. GATEWiki), and then you can make calls as described in the YAM documentation.
See e.g. the implementation of the Twitter plugin for inspiration.
One of the things this is useful for is dropping snippets of HTML into YAM pages (e.g. for Google site search, Twitter updates etc.). (We removed the ability that older versions of YAM had to do this directly because of the security implications, XSS attacks and so on.)
Footnotes
- SVN: the Subversion version control system.
- Except when in workstation mode.
- The link text can contain "inline" markup such as %image, %cite or *bold*/_italic_, but not block markup such as tables.
- This is a footnote.
- The file upload test needs to programmatically fill in a file upload form field, and this is only possible in a special "elevated privileges" mode using the *firefox launcher.