Useful material

DKPro
DKPro on GitHub
https://dkpro.github.io/

DKPro Examples
https://github.com/dkpro/dkpro-core-examples

Network Text Extraction

For background on network text extraction, see the attached papers and presentations. The services described below return a network in SGF format, which can be visualized
with libraries that handle JSON-encoded network data, such as d3.js, or with the analytics workbench at workbench2.collide.info.

Text2Network Webservices
Parameters are accepted as "application/x-www-form-urlencoded".

Grammar based network extraction
All noun phrases in the text become nodes in the network.
http://textanalytics.collide.info:8080/textanalytics-jersey-1.0-SNAPSHOT/text2network
Input
  • text: The text from which a network will be extracted, as a string.
  • lang: The language of the text, as a string (currently "en" or "de").
  • window_size: If two noun phrases appear within a sliding window of 'window_size' words, they are connected in the resulting network.
  • doc_id: Id of the document.
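
A minimal sketch of how the grammar based service could be called from plain Java (11 or later) without the SISOB client library. The class and method names are hypothetical, and the use of HTTP POST is an assumption carried over from the codebook based Java example; only the URL and the parameter names come from the list above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the SISOB or DKPro libraries.
class Text2NetworkClient {

    static final String SERVICE_URL =
        "http://textanalytics.collide.info:8080/textanalytics-jersey-1.0-SNAPSHOT/text2network";

    // Collects the four documented parameters in a fixed order.
    static Map<String, String> grammarParams(String text, String lang, int windowSize, String docId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("text", text);
        params.put("lang", lang);
        params.put("window_size", String.valueOf(windowSize));
        params.put("doc_id", docId);
        return params;
    }

    // Encodes the parameters as an application/x-www-form-urlencoded body.
    static String toFormBody(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // Sends the request (assumed to be a POST) and returns the raw response body.
    static String extractNetwork(String text, String lang, int windowSize, String docId)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(SERVICE_URL))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(
                toFormBody(grammarParams(text, lang, windowSize, docId))))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as Text2NetworkClient.extractNetwork("Some text.", "en", 5, "doc1") would then return the network as a string, provided the service is reachable.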

Codebook based network extraction
Codebook based network extraction extracts a network of predefined concepts from a text. For this, a codebook has to be provided in which each concept is encoded as a triple (term, concept, category).
Terms are phrases that are expected to occur in the text. Concepts are generalisations of terms, and categories are used to classify concepts.
Example:
network text extraction, NTA, METHOD
networks from text, NTA, METHOD
interview, INTERVIEW, APPLICATION
interviewing, INTERVIEW, APPLICATION
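
The codebook format above can be sketched in Java as follows. The class Codebook and its methods are hypothetical helpers for illustration. Whether the service tolerates spaces after the commas is not documented here, so this sketch trims each field and writes lines in the compact form term,concept,category.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading and writing codebooks; not part of the service.
class Codebook {

    // One codebook entry: a term, the concept it maps to, and the concept's category.
    static final class Entry {
        final String term;
        final String concept;
        final String category;

        Entry(String term, String concept, String category) {
            this.term = term;
            this.concept = concept;
            this.category = category;
        }
    }

    // Parses lines of the form "term, concept, category", skipping blank lines.
    static List<Entry> parse(String csv) {
        List<Entry> entries = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split(",", -1);
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                    "Expected term,concept,category but got: " + line);
            }
            entries.add(new Entry(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return entries;
    }

    // Joins the entries into one string, one "term,concept,category" line per entry.
    static String toCsv(List<Entry> entries) {
        List<String> lines = new ArrayList<>();
        for (Entry e : entries) {
            lines.add(e.term + "," + e.concept + "," + e.category);
        }
        return String.join("\n", lines);
    }
}
```

In the example codebook, both "network text extraction" and "networks from text" parse to entries with the concept NTA, which is how several surface terms collapse into one concept.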

http://textanalytics.collide.info:8080/textanalytics-jersey-1.0-SNAPSHOT/text2network/dict
Input
  • text: The text from which a network will be extracted, as a string.
  • dict: Dictionary of words that should be discovered in the text. Dictionaries have to be in comma-separated value format, with each line having the form: term,concept,category.
  • window_size: If two noun phrases appear within a sliding window of 'window_size' words, they are connected in the resulting network.
  • doc_id: Id of the document.
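
The effect of window_size can be illustrated with a small sketch. This is not the service's implementation. It assumes that each matched concept occupies a single word position, that two different concepts are linked when their positions lie less than window_size words apart, and that edges are undirected. The class name and the edge notation are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sliding-window co-occurrence; not the service's actual algorithm.
class WindowCooccurrence {

    // conceptAt[i] holds the concept found at word position i, or null if there is none.
    static Set<String> edges(String[] conceptAt, int windowSize) {
        Set<String> edges = new TreeSet<>();
        for (int i = 0; i < conceptAt.length; i++) {
            if (conceptAt[i] == null) {
                continue;
            }
            // Positions i and j share a window of windowSize words if j - i < windowSize.
            for (int j = i + 1; j < conceptAt.length && j - i < windowSize; j++) {
                if (conceptAt[j] == null || conceptAt[j].equals(conceptAt[i])) {
                    continue;
                }
                String a = conceptAt[i];
                String b = conceptAt[j];
                // Order the endpoints so each undirected edge is stored only once.
                edges.add(a.compareTo(b) < 0 ? a + " -- " + b : b + " -- " + a);
            }
        }
        return edges;
    }
}
```

With concepts at positions 0 (NTA) and 3 (INTERVIEW), a window size of 4 links the two, while a window size of 3 does not. Larger windows therefore tend to produce denser networks.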

Collide DKPro Webservices
Tools for testing the webservices
https://addons.mozilla.org/de/firefox/addon/rest-easy/
https://chrome.google.com/webstore/search/rest%20easy

Java code example

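import java.util.HashMap;
import java.util.Map;

import eu.sisob.components.restclient.RESTClientAgent;

private static final String SERVICE_URL = "http://textanalytics.collide.info:8080/textanalytics-jersey-1.0-SNAPSHOT/text2network/dict";

// Sends the text and the codebook to the codebook based service and returns the response as a string.
// callRestAPIPost is not defined in this snippet; it is presumably provided by RESTClientAgent.
String getNetworkCodebookBased(String text, String codebook, int windowSize, String docId) {

        Map<String, String> formParameters = new HashMap<>();
        formParameters.put("text", text);
        formParameters.put("dict", codebook);
        // The map holds strings only, so the window size is converted explicitly.
        formParameters.put("window_size", String.valueOf(windowSize));
        formParameters.put("doc_id", docId);

        return callRestAPIPost(SERVICE_URL, formParameters);
}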

Next steps

rest_easy.png (62.9 KB) Tobias Hecking, 08/25/2016 10:18 AM

hecking_hoppe_visla_workshop_lak15_camera_ready.pdf (737 KB) Tobias Hecking, 08/25/2016 10:21 AM

Masterthesis_Menglu CUI.pdf (3.31 MB) Tobias Hecking, 08/25/2016 10:29 AM

20110126_Automap_ECSN.pdf (3.77 MB) Tobias Hecking, 08/25/2016 10:31 AM