Document Summaries
The Science Behind Summaries
Active Navigation summaries aim to reflect the content of a document. To achieve this, the summary creation process works in co-ordination with the theme extraction process. The theme extraction process works through the document to extract the most relevant facts and concepts. These themes are used to direct the summary creation process towards the type of information within the document that most reflects its content.
The original documents don't need to be specially formatted for display on the web. In fact, they don't even need to be HTML documents: the Active Navigation service can convert most popular document formats to HTML as they pass through the server.
The service can be configured to limit the summary to a set number of sentences or to a percentage of the document's size. The Active Navigation solution can also bias the generated summary, positively or negatively, to meet a user's needs more closely. One example of biasing the summary output comes from the field of competitive intelligence.
A recent client was faced with a large number of news stories coming from a news feed. They were interested only in stories relating to their products, their competitors and their competitors' products. The summary generation process can be biased to make sure any items containing specific product or competitor terms come through in the summary. Likewise, it might be important to know what is being said about a company's own products or trademarks. When these terms are used as a bias, the summary reflects that interest as well as the general topic of the information.
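The sizing and biasing options described above can be illustrated with a small sketch. This is not Active Navigation's algorithm, which the text does not specify; it is a generic frequency-based extractive summarizer, and every name in it (`summarize`, `boost`, `suppress`, `bias_weight`, the stopword list, the 20% default) is an assumption made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a product setting).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on",
             "its", "their", "this", "that", "with", "was", "are"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize(text, max_sentences=None, percent=None,
              boost=(), suppress=(), bias_weight=2.0):
    """Return the top-scoring sentences in their original order.

    Size is given either as a sentence count or as a percentage of the
    document's sentences. `boost` and `suppress` are single-word bias terms
    (hypothetical parameters) that raise or lower a sentence's score.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    if max_sentences is None:
        pct = 20 if percent is None else percent
        # Round up so a non-empty document always yields one sentence.
        max_sentences = max(1, math.ceil(len(sentences) * pct / 100))

    freq = Counter(w for s in sentences for w in words(s)
                   if w not in STOPWORDS)
    boost = {t.lower() for t in boost}
    suppress = {t.lower() for t in suppress}

    def score(sentence):
        tokens = words(sentence)
        content = [w for w in tokens if w not in STOPWORDS]
        if not content:
            return 0.0
        # Base score: average document frequency of the content words.
        base = sum(freq[w] for w in content) / len(content)
        present = set(tokens)
        bias = bias_weight * (len(present & boost) - len(present & suppress))
        return base + bias

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

In this sketch, calling `summarize(feed_text, max_sentences=3, boost=["acme"])` would favour sentences that mention the hypothetical competitor term `acme`, while `suppress` pushes sentences containing unwanted terms out of the summary.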
The summary capabilities of Active Navigation's product can be exposed in a number of ways, but there are two general strategies:
- The creation of a summary for any given document "on the fly". The server is given a URL or passed the text of the document and will then produce a summary which is returned to the user's browser or to the calling system.
- The off-line creation of a number of summaries from a batch of documents, storing them in a repository for later use by the live running server.
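The two strategies can be sketched side by side. The code below is an illustration only: the SQLite table, the function names (`open_repository`, `build_batch`, `get_summary`) and the placeholder `lead_summary` summarizer are all assumptions, not part of the Active Navigation product. Falling back to on-the-fly generation when no stored summary exists is likewise a design choice of this sketch.

```python
import re
import sqlite3


def lead_summary(text, max_sentences=2):
    # Placeholder summarizer (assumption): keep the first few sentences.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    return " ".join(sentences[:max_sentences])


def open_repository(path=":memory:"):
    # Hypothetical repository: one SQLite table keyed by document id.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summaries "
        "(doc_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
    )
    return conn


def build_batch(conn, documents, summarizer=lead_summary):
    # Off-line strategy: summarize a batch and store the results.
    rows = [(doc_id, summarizer(text)) for doc_id, text in documents.items()]
    with conn:  # commit once for the whole batch
        conn.executemany(
            "INSERT OR REPLACE INTO summaries (doc_id, summary) VALUES (?, ?)",
            rows,
        )
    return len(rows)


def get_summary(conn, doc_id, text=None, summarizer=lead_summary):
    # Live lookup: prefer a stored summary, else summarize on the fly.
    row = conn.execute(
        "SELECT summary FROM summaries WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    if text is None:
        raise KeyError(doc_id)
    return summarizer(text)
```

Precomputing summaries trades storage for response time, whereas the on-the-fly path handles documents that were never part of a batch.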
The summary is produced either in HTML for display in the browser or in XML for further processing.
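As a rough illustration of the two output forms, the sketch below renders a list of summary sentences as an HTML fragment and as an XML document. The element names, the `class` and `source` attributes, and the function names are hypothetical; the actual markup produced by the product is not described here.

```python
from html import escape
import xml.etree.ElementTree as ET


def to_html(sentences, title="Summary"):
    # HTML fragment for display in a browser; text is escaped.
    items = "\n".join(f"  <li>{escape(s)}</li>" for s in sentences)
    return (f'<div class="summary">\n<h2>{escape(title)}</h2>\n'
            f"<ul>\n{items}\n</ul>\n</div>")


def to_xml(sentences, source=""):
    # XML for further processing; ElementTree handles the escaping.
    root = ET.Element("summary", {"source": source})
    for index, sentence in enumerate(sentences, start=1):
        element = ET.SubElement(root, "sentence", {"index": str(index)})
        element.text = sentence
    return ET.tostring(root, encoding="unicode")
```

A calling system could parse the XML form with any standard XML parser, while the HTML form can be embedded directly in a page.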