PERLS: Persistent Experiment Repositories for Language Science
"Science" has as many definitions as there are philosophers, but an important element of several of its forms is experiment. Practitioners form hypotheses, design procedures to test those hypotheses, and publish the results in ways intended to allow their peers to reproduce them. Where such reproduction succeeds reliably, it is evidence that the hypothesis under test is correct.
A common problem, especially in areas related to human behaviour or intelligence, is precisely how to measure results. Progress in computational processing of human language has increasingly been driven by the provision of standard test collections and evaluation metrics that provide a level playing field on which to compare hypotheses and related experimental systems. This practice has contributed greatly to experimental repeatability, which is a key factor in the sharing of results across a research community. Without the infrastructure that test collections and evaluation tools provide, it is much more difficult to reproduce published work, to confirm theories, and to distinguish a novel contribution from a reiteration [1]. Consequently the increase in experiment-driven work has often gone hand-in-hand with a step back from the project of simulating intelligence and the adoption of more limited goals, for example the turn from language understanding research to information extraction research in the 1990s.
Three factors in recent history mean that it has now become possible to extend the reach of our experimental infrastructures in several ways, and in doing so increase the power and impact of our research. Over the last several decades compute power has continued to grow in line with Moore's 'law'. At the same time network bandwidth has grown in both volume and geographical reach. Lastly, software and the platforms it runs on have become mobile (able to move between different hardware systems). Taken together, these factors mean we can publish not only our results, but the complete set of platform, software, configuration, intermediate data and measurement tools that underlie those results, and we can do so in forms which allow the dynamic recombination of the elements of our work in new experiments, by ourselves or by our colleagues. For the areas of Information Retrieval, Natural Language Processing and Speech Recognition, we can start to build Persistent Experiment Repositories for Language Science: PERLS.
Characteristics of PERLS repositories:
- Distributed reproducible experiments. Web-based interfaces and mobile code that allow experiments defined and executed in one place to be persisted in another and retrieved and reproduced from a third.
- Multiple repositories. Setting up a repository must be easy enough that a research group of a handful of people can create and maintain its own.
- Platform neutral. Software system longevity is typically compromised by technology churn in the underlying platform. One answer to this is to use free software, which has a better track record (and is not under pressure to force users to upgrade through incompatible changes). Another is to use machine virtualisation (Xen, VMware, VirtualBox, etc.) to persist the computational environment.
- Versioned. Both software and data need to be persisted in version-controlled storage.
- Informed by related Grid and eScience programmes.
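To make the versioning and reproducibility requirements above concrete, here is a minimal sketch in Python of the metadata a repository entry might persist so that an experiment defined in one place can be re-run in another. Every field name and identifier here is illustrative, not a proposed PERLS standard.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ExperimentRecord:
    """One persisted experiment: enough metadata to reproduce it elsewhere.

    Field names are hypothetical examples, not part of any real schema."""
    experiment_id: str   # human-chosen name for this run
    code_version: str    # version-control revision of the software used
    data_version: str    # revision of the test collection used
    platform_image: str  # identifier of the virtual-machine image
    config: dict = field(default_factory=dict)   # parameter settings
    results: dict = field(default_factory=dict)  # scores from evaluation tools

def persist(record: ExperimentRecord) -> str:
    """Serialise a record for storage in a version-controlled repository."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)

def restore(serialised: str) -> ExperimentRecord:
    """Rebuild the record, e.g. at a different site, to reproduce the run."""
    return ExperimentRecord(**json.loads(serialised))

record = ExperimentRecord(
    experiment_id="ir-baseline-01",
    code_version="r1234",
    data_version="test-collection-v2",
    platform_image="vm-image-2008-05",
    config={"stemmer": "porter", "model": "bm25"},
    results={"map": 0.212},
)
assert restore(persist(record)) == record  # round-trip preserves the record
```

The point of the round-trip check is the repository contract: because software, data and platform are each pinned to a version, a colleague retrieving this record from another site has everything needed to reconstruct the run, not just its headline score.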
See also our proposal for a new infrastructure at the IRF.
[1] This discussion may seem commonplace to those from other disciplines; the issue commonly arises in fields connected in one way or another to the project of Artificial Intelligence. This is probably partly for cultural reasons related to the practitioner groups and partly because of unresolved difficulties in defining the subject matter -- quite understandable in this case, where we're rather like lab rats who've been promoted to principal investigators and now have to study ourselves!