GATE Information Extraction Example
Information extraction systems analyse unrestricted text in order to extract information about pre-specified types of events, entities or relationships.
To help illustrate the process of Information Extraction, and to highlight some of the difficulties involved, an example is given for you to tackle yourself.
Consider the information needs of an analyst who tracks changes in company management. A system is required to take input from news articles and extract information about any management succession events -- the post, the company or companies concerned, the current and incoming managers, the reason the post is or will be vacant etc. This might normally be undertaken by a news clipping service, where information retrieval techniques might be used to fetch relevant articles which would then be laboriously, and expensively, scanned by workers. IE systems can now perform this task automatically.
An example Information Extraction task
A sample of text from the Wall Street Journal is given below, together with a template. The task is to fill the template with information about succession events extracted from the text. The first event is completed as an example. As a hint, the template shows that there are six events in total, although complete information is not available for all of them. The answers are given on another page, as is a sample output of the information which could be extracted from the complete text.
<DOC>
<DOCID> wsj93_050.0203 </DOCID>
<DOCNO> 930219-0013. </DOCNO>
<HL> Marketing Brief: @ Noted.... </HL>
<DD> 02/19/93 </DD>
<SO> WALL STREET JOURNAL (J), PAGE B5 </SO>
<CO> NYTA </CO>
<IN> MEDIA (MED), PUBLISHING (PUB) </IN>
<TXT>
<p>
New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.
</p>
</TXT>
</DOC>
Use the following key:
<ORGANIZATION-1>
NAME : "New York Times Co."
<ORGANIZATION-2>
NAME : "New York Times"
<PERSON-1>
NAME : "Russell T. Lewis"
<PERSON-2>
NAME : "Lance R. Primis"
The first entry is completed for you:
<SUCCESSION-1>
ORGANIZATION : <ORGANIZATION-2>
POST : "president"
WHO_IS_IN : <PERSON-1>
WHO_IS_OUT : <PERSON-2>
<SUCCESSION-2>
ORGANIZATION :
POST :
WHO_IS_IN :
WHO_IS_OUT :
<SUCCESSION-3>
ORGANIZATION :
POST :
WHO_IS_IN :
WHO_IS_OUT :
<SUCCESSION-4>
ORGANIZATION :
POST :
WHO_IS_IN :
WHO_IS_OUT :
<SUCCESSION-5>
ORGANIZATION :
POST :
WHO_IS_IN :
WHO_IS_OUT :
<SUCCESSION-6>
ORGANIZATION :
POST :
WHO_IS_IN :
WHO_IS_OUT :
How did you do? Check against the answers.
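For readers who want to handle the template programmatically, the key and the succession slots above could be modelled as a small data structure. The following is only a minimal sketch in Python: the class names `Entity` and `Succession` and the `render` method are hypothetical names chosen for illustration and are not part of GATE. It encodes only the first, already completed entry, so it does not give away the remaining answers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """A keyed organization or person from the key, e.g. <PERSON-1>."""
    key: str
    name: str


@dataclass
class Succession:
    """One succession event; slots with no known filler stay None."""
    key: str
    organization: Optional[Entity] = None
    post: Optional[str] = None
    who_is_in: Optional[Entity] = None
    who_is_out: Optional[Entity] = None

    def render(self) -> str:
        """Format the event in the same slot layout as the template."""
        def ref(entity: Optional[Entity]) -> str:
            return f"<{entity.key}>" if entity else ""

        post = f'"{self.post}"' if self.post else ""
        lines = [
            f"<{self.key}>",
            f"ORGANIZATION : {ref(self.organization)}",
            f"POST : {post}",
            f"WHO_IS_IN : {ref(self.who_is_in)}",
            f"WHO_IS_OUT : {ref(self.who_is_out)}",
        ]
        # Strip the trailing space left behind by empty slots.
        return "\n".join(line.rstrip() for line in lines)


# Entities taken from the key given in the exercise.
org2 = Entity("ORGANIZATION-2", "New York Times")
person1 = Entity("PERSON-1", "Russell T. Lewis")
person2 = Entity("PERSON-2", "Lance R. Primis")

# The first entry, as completed in the exercise.
first = Succession("SUCCESSION-1", org2, "president", person1, person2)
print(first.render())
```

Keeping entities separate from events mirrors the template's use of references such as `<PERSON-1>`: the same person or organization can then fill slots in several succession events without repeating the name.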