Natural Language Generation

Kalina Bontcheva

The astonishing explosion of the World Wide Web is having two effects on Adaptive Natural Language Generation (ANLG). First, it is making NLG, ANLG and the user tracking and modelling that underlie adaptivity more and more important. When the web was invented by Tim Berners-Lee, HTML was most often a static medium: a publisher would create a page, which was then stored whole in a web server's filespace and downloaded whole over HTTP when a browser requested it. Increasingly common on the modern web are pages that are generated dynamically from database records, by programs executing on servers or by applets and plugins executing in the browsing client. A significant advantage of this approach is that vast existing data stores, such as company product inventories, medical and pharmaceutical classification hierarchies and legal case histories, can be made available online without interfering with their current storage medium. The data also has a single storage point (often a relational database), which is queried on the fly to produce HTML as required, thus guaranteeing that the generated pages are in sync with the data.
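
As a purely illustrative sketch (this is not GATE code; the database file, table and column names are invented), the pattern amounts to querying a relational store at request time and rendering the result as HTML:

    # Illustrative sketch: build an HTML page from a relational database at
    # request time, so the page always reflects the current state of the data.
    # The database file, table and column names are hypothetical.
    import sqlite3
    from html import escape

    def render_products_page(db_path="products.db"):
        rows = sqlite3.connect(db_path).execute(
            "SELECT name, price FROM products").fetchall()
        items = "\n".join(f"<li>{escape(name)}: {price:.2f} GBP</li>"
                          for name, price in rows)
        return f"<html><body><h1>Products</h1><ul>\n{items}\n</ul></body></html>"

A bare listing of this kind is trivial to produce; the interesting case is when the page needs substantial connected text.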

The process by which database tables become human-readable HTML with significant textual elements is a strong motivation for work on NLG and ANLG. Adaptive generation can be sensitive to a user's browsing history, reducing repetition and allowing marketing information to be targeted on the basis of user models.
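
One simple way to picture this adaptivity (a sketch only, with invented terms and definitions, not the approach taken in our systems): keep a per-user record of which terms have already been explained and expand only the new ones.

    # Sketch of history-sensitive generation: give a full explanation the first
    # time a term is mentioned to a user and only a brief mention thereafter.
    # The terms and definitions are invented for illustration.
    def describe(term, definitions, seen_terms):
        if term in seen_terms:
            return term                       # already explained on an earlier page
        seen_terms.add(term)
        return f"{term} ({definitions[term]})"

    seen = set()
    defs = {"benign lesion": "a non-cancerous abnormality"}
    print(describe("benign lesion", defs, seen))   # first mention: full explanation
    print(describe("benign lesion", defs, seen))   # repeat mention: just the term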

The second `web effect' on ANLG is to increase the raw material available for constructing user models. If you check your web browser's cookies file you will almost certainly find an entry starting something like www.doubleclick.net. This cookie comes from a web server that a growing number of popular web sites (e.g. AltaVista; see http://www.doubleclick.net/ for a list) employ as a shared source of advertisements. Each time doubleclick serves an advertisement to form part of a page on one of its client sites, it also requests the return of its cookie (or sends a new one if the browser has not been seen before). Each time it receives a returned cookie, it logs the browser's user ID and adds the page reference (which is encoded in the URL for the advert) to the store of data associated with that ID. Sooner or later you will probably give your email and other address details to one of doubleclick's clients (perhaps to join a mailing list or to make a purchase), and from then on a log of your web movements on all participating sites is collected and linked with your identity. Such data may be used for marketing, credit checking or police purposes. It may also serve as input to user models for ANLG (as may the history of a user's browsing on a single site, which is also collected using cookies).
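
The logging step can be pictured with a deliberately simplified sketch (all names are invented; real ad servers involve full HTTP handling, cookie expiry and much more): the server keys a request log on a browser ID cookie and appends the page reference carried in the advert URL.

    # Simplified sketch of the logging step described above: key a request log
    # on a browser ID cookie and append the page reference carried in the ad URL.
    # All names are invented; real HTTP handling, expiry and consent are omitted.
    import uuid
    from collections import defaultdict

    page_log = defaultdict(list)    # browser ID -> pages on which adverts were served

    def handle_ad_request(cookies, page_ref):
        browser_id = cookies.get("id") or str(uuid.uuid4())  # unseen browser: issue an ID
        page_log[browser_id].append(page_ref)
        return {"id": browser_id}   # cookie to send back with the advert

    c = handle_ad_request({}, "client-a.example/products")
    handle_ad_request(c, "client-b.example/news")
    print(page_log[c["id"]])   # ['client-a.example/products', 'client-b.example/news']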

Browsing histories group together a set of URLs; these must be analysed in some way to provide data for a user model. This is an area in which Information Extraction technology is likely to be appropriate (see http://gate.ac.uk/ie/). That process is beyond the scope of the current work, but it is again a strong motivating factor for ANLG.

Our work in this area builds on the HYLITE+ system, which generates adaptive hypertext explanations of domain terminology, starting from a domain ontology. As part of the MIAKT project, we are now working on generating natural language summaries of formal ontological knowledge; the testbed domain is breast cancer patient reports. This work will be continued in the newly funded SEKT project, with applications in building intelligent knowledge-access facilities for next-generation knowledge management.
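
A toy illustration of the underlying idea (not the HYLITE+ or MIAKT implementation; the facts and templates below are invented) is to verbalise ontological statements about a concept using simple sentence templates:

    # Toy verbalisation of ontology statements as a short textual summary.
    # The triples and templates are invented; real systems use a full ontology
    # and far richer text planning and surface realisation.
    facts = [
        ("Mammography", "isa", "imaging procedure"),
        ("Mammography", "examines", "breast tissue"),
        ("Mammography", "detects", "abnormal lesions"),
    ]
    templates = {
        "isa": "{s} is a kind of {o}.",
        "examines": "It examines {o}.",
        "detects": "It can detect {o}.",
    }

    def summarise(subject, facts):
        return " ".join(templates[p].format(s=s, o=o)
                        for s, p, o in facts if s == subject and p in templates)

    print(summarise("Mammography", facts))
    # "Mammography is a kind of imaging procedure. It examines breast tissue. ..."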

For further information see:

Bontcheva, K., Wilks, Y. Automatic Report Generation from Ontologies: the MIAKT approach. Ninth International Conference on Applications of Natural Language to Information Systems (NLDB'2004). Manchester, UK, 2004. PDF

Bontcheva, K. Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies. Fourth International Conference on Language Resources and Evaluation (LREC'2004). Lisbon, Portugal, 2004. PDF

Bontcheva, K., Dimitrova, V. Examining the Use of Conceptual Graphs in Adaptive Web-Based Systems that Aid Terminology Learning. International Journal on Artificial Intelligence Tools (IJAIT). Special issue on "AI Techniques in Web-Based Educational Systems". 13(2), 2004.

Bontcheva, K., Wilks, Y. Dealing with Dependencies between Content Planning and Surface Realisation in a Pipeline Generation Architecture. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'01), August 7-10, Seattle, 2001.

Bontcheva, K. Adaptivity, Adaptability, and Reading Behaviour: Some Results from the Evaluation of a Dynamic Hypertext System. Proceedings of the Second International Conference on Adaptive Hypertext (AH’2002). Best Student Paper Award.

Bontcheva, K. Tailoring the Content of Dynamically Generated Explanations. M. Bauer, P.J. Gmytrasiewicz, J. Vassileva (eds). User Modelling 2001: 8th International Conference, UM2001, Lecture Notes in Artificial Intelligence 2109, Springer Verlag, 2001.
PowerPoint presentation

Bontcheva, K. The Impact of Empirical Studies on the Design of an Adaptive Hypertext Generation System. Proceedings of the Third Workshop on Adaptive Hypertext and Hypermedia, Lecture Notes in Artificial Intelligence, Springer Verlag, 2001.
PowerPoint presentation