Targeted Hypernym Discovery

v0.3.9.2Beta

Entity Extraction API version 2.0

Resource:

POST api/v2/extraction

Description:

Performs entity extraction and classification for a given text. For each extracted entity, its types are discovered. For both the entities and the types, links to DBpedia or YAGO are provided where available. Note that the length of the input text affects the processing time.


Parameters Description
lang optional Language of the input text. You can choose between English, German and Dutch. Values: en/de/nl.
Default value: en.
Example: lang=en
format optional Requested response format. Possible values: xml/json/rdfxml.
Default value: xml
Example: format=xml
The format can also be specified using the Accept request header. If the format parameter is specified as well, it takes priority over the Accept header.
Values for the Accept header:
application/xml
application/json
application/rdf+xml
application/ld+json
application/x-turtle
Important note: The results serialized in RDF are modelled using the NIF 2.0 format.
Please refer to the NIF 2.0 specification for more information.
provenance optional Provenance of the results. Values: thd/dbpedia/yago. The client can choose one or more.
Default value: thd,dbpedia,yago.
Example: provenance=thd,dbpedia
knowledge_base optional Defines which knowledge base is used to retrieve types for thd. Currently applicable only when provenance is set to thd or all. Values: linkedHypernymsDataset/local/live
linkedHypernymsDataset - use the linked hypernyms dataset (recommended),
local - local Wikipedia mirror (slight latency); for the date of the Wikipedia snapshot, please refer to the technical information about the application version in the page footer,
live - live Wikipedia (highest latency); please be considerate and do not submit large amounts of text or a high number of requests. Recommended use: issue a query containing a single candidate entity for which the other options failed to provide a type.
Note: Only one option can be chosen.
Default value: linkedHypernymsDataset
Example: knowledge_base=linkedHypernymsDataset
entity_type optional The types of entities to be processed from the text. Values: ne/ce/all
ne - extract only "named entities",
ce - only common entities will be extracted,
all - both named entities and common entities will be extracted.
Default value: all
Example: entity_type=ne
priority_entity_linking optional If set to true, the system will prefer linking to the more precise DBpedia disambiguation (longer entity name). This option may result in fewer entities being assigned types. Values: true/false
Default value: false
Example: priority_entity_linking=true
types_filter optional NEW in the API version 2! Restricts the types in thd results to selected namespaces. You can keep only types that are DBpedia instances, only DBpedia Ontology types, or both. This setting has no effect for provenance=yago or provenance=dbpedia. Values: dbo/dbinstance/all
dbo - filter only DBpedia Ontology types,
dbinstance - filter only types defined as DBpedia instances,
all - both; the entity types can be either DBpedia Ontology classes or DBpedia instances.
Default value: all
Example: types_filter=dbo
linking_method optional NEW in THD version 3.9! You can choose a preferred entity linking (disambiguation) method. So far, you can choose between 6 entity linking approaches. Possible values:
SFISearch - This approach uses a custom entity Surface Form Index (SFI). The candidate index contains all surface forms found in Wikipedia articles together with their candidates.
LuceneSearch - Lucene based entity linking.
LuceneSearchSkipDisPage - This approach differs from the one above in the sense that it skips the disambiguation DBpedia pages and as a correct link considers the first non-disambiguation page.
WikipediaSearch - Wikipedia search based entity linking.
AllVoting - This approach performs the entity linking by aggregating the results from the SFI, Lucene (enhanced) and Wikipedia Search based entity linking.
SurfaceFormSimilarity - This approach first performs entity linking with the SFI, Lucene (enhanced) and Wikipedia Search. The article with the most similar title to the entity surface form is considered as correct.
Default value: LuceneSearchSkipDisPage - if you do not specify this query parameter, then Lucene search (which skips disambiguation pages) is used as the linking method.
Example: linking_method=WikipediaSearch
Important note: you cannot use LuceneSearch and perform live types mining (knowledge_base=live) at the same time.
spotting_method optional NEW in THD version 3.9.2! You can choose preferred entity spotting (recognition) method. So far, you can choose between spotting based on lexico-syntactic grammars and spotting based on Conditional Random Fields model. Possible values:
grammars - entity spotting based on manually crafted lexico-syntactic grammars.
CRF - entity spotting based on the state-of-the-art Conditional Random Fields model.
Default value: grammars - if you do not specify this query parameter, then the grammar-based method is used for spotting entities.
Example: spotting_method=grammars
Important note: CRF-based entity spotting is slower but more effective. It can be used to detect named entities only.
apikey required Used to identify the third-party application using the service. Email us to obtain an API key.
Example: apikey=123456789
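
The allowed values listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the function name build_extraction_url and the ALLOWED table are illustrative, the base URL is taken from the request example in this document, and the check only mirrors the values and the LuceneSearch/live restriction documented in the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://entityclassifier.eu/thd/api/v2/extraction"

# Allowed values as listed in the parameter table (client-side copy).
ALLOWED = {
    "lang": {"en", "de", "nl"},
    "format": {"xml", "json", "rdfxml"},
    "knowledge_base": {"linkedHypernymsDataset", "local", "live"},
    "entity_type": {"ne", "ce", "all"},
    "priority_entity_linking": {"true", "false"},
    "types_filter": {"dbo", "dbinstance", "all"},
    "linking_method": {
        "SFISearch", "LuceneSearch", "LuceneSearchSkipDisPage",
        "WikipediaSearch", "AllVoting", "SurfaceFormSimilarity",
    },
    "spotting_method": {"grammars", "CRF"},
}
PROVENANCE = {"thd", "dbpedia", "yago"}


def build_extraction_url(apikey, **params):
    """Validate parameters locally and return the request URL (hypothetical helper)."""
    if not apikey:
        raise ValueError("apikey is required")
    for name, value in params.items():
        if name == "provenance":
            # provenance accepts one or more comma-separated values
            unknown = set(value.split(",")) - PROVENANCE
            if unknown:
                raise ValueError(f"invalid provenance: {sorted(unknown)}")
        elif name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name}")
        elif value not in ALLOWED[name]:
            raise ValueError(f"invalid value for {name}: {value}")
    # Documented restriction: LuceneSearch cannot be combined with live mining.
    if (params.get("linking_method") == "LuceneSearch"
            and params.get("knowledge_base") == "live"):
        raise ValueError("LuceneSearch cannot be used with knowledge_base=live")
    # Keep commas readable so provenance=thd,dbpedia appears as documented.
    return BASE_URL + "?" + urlencode({"apikey": apikey, **params}, safe=",")


print(build_extraction_url("123456789", format="xml", provenance="thd,dbpedia"))
```

Validating locally is optional; the service reports invalid values itself through the error codes described later in this document.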

Response Object

Field Type Description
startOffset Integer Start offset of the found entity in the input text. The offset is counted from 0 at the beginning of the input text.
"startOffset": 4
endOffset Integer End offset of the found entity in the input text. The offset is counted from 0 at the beginning of the input text.
"endOffset": 18
underlyingString String The string considered as an entity.
"underlyingString": "Charles Bridge"
entityType String The type of the extracted entity. Possible values: "named entity" or "common entity".
"entityType": "named entity"
types Array of types List of types for the found entity.
"types":  [
       {
        "typeLabel": "Country",
        "typeURI": "http://dbpedia.org/ontology/Country",
        "entityLabel": "Czech Republic",
        "entityURI": "http://dbpedia.org/resource/Czech_Republic",
        "classificationConfidence":  {
          "value": 0.857,
          "type": "classification"
        },
        "linkingConfidence":  {
          "value": 0.999,
          "type": "linking"
        },
        "salience":{
            "classLabel":"most_salient",
            "score":0.845,
            "confidence":0.715
        },
        "provenance": "thd"
      } ]
The corresponding XML is:
<types>
    <type>
        <typeLabel>Country</typeLabel>
        <typeURI>http://dbpedia.org/ontology/Country</typeURI>
        <entityLabel>Czech Republic</entityLabel>
        <entityURI>http://dbpedia.org/resource/Czech_Republic</entityURI>
        <confidence type="classification">0.857</confidence>
        <confidence type="linking">0.999</confidence>
        <salience>
            <score>0.845</score>
            <confidence>0.715</confidence>
            <class>most_salient</class>
        </salience>
        <provenance>thd</provenance>
    </type>
</types>
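
The startOffset and endOffset fields can be used to recover the entity string from the input text. The sketch below uses the sample values from this document (4, 18, "Charles Bridge") and assumes that endOffset is exclusive and that offsets count characters; both assumptions are consistent with the sample values but are not stated by the API description.

```python
text = ("The Charles Bridge is a famous historic bridge that crosses "
        "the Vltava river in Prague, Czech Republic.")

# Field values copied from the examples in the Response Object table.
entity = {"startOffset": 4, "endOffset": 18,
          "underlyingString": "Charles Bridge"}


def surface_form(text, entity):
    """Return the substring the entity offsets point at (end assumed exclusive)."""
    return text[entity["startOffset"]:entity["endOffset"]]


# The slice should reproduce the underlyingString reported by the service.
print(surface_form(text, entity) == entity["underlyingString"])
```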

typeLabel - name by which the type is formally known.

typeURI - DBpedia/YAGO URI describing the entity type.

entityLabel - name by which the disambiguated entity is formally known.

entityURI - DBpedia/YAGO URI describing the disambiguated entity.

provenance - Provenance of the results. Possible values are: thd - produced by THD, thd-derived - also produced by THD, by searching for superclasses in the DBpedia ontology, dbpedia - produced by DBpedia, and yago - produced by the YAGO2s ontology.

confidence - estimated classification or linking (disambiguation) confidence.
Classification confidence is the estimated probability that the typeLabel is correct for the given entityURI.
Linking confidence is the estimated probability of the entityURI being correct given the surface form of the entity.
Confidence in XML: element <confidence> - Classification and linking confidence can be distinguished with the type attribute. Possible values for the type attribute are linking and classification.
<confidence type="classification">0.857</confidence>
<confidence type="linking">0.999</confidence>
Confidence in JSON: Classification confidence in classificationConfidence object, linking confidence in the linkingConfidence object. The actual confidence value is stored at the key value.
"classificationConfidence":  {
    "value": 0.857,
    "type": "classification"
},
"linkingConfidence":  {
    "value": 0.999,
    "type": "linking" 
}
Note: if you use WikipediaSearch as the entity linking method (see the linking_method parameter), the linking confidence will always be -1. We are unable to estimate the linking confidence for Wikipedia search based linking.
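
Reading both confidence values from the XML form can be sketched as follows. The helper read_confidences is illustrative, not part of the API; it treats the documented -1 value as "no estimate" and returns None for it, which is a design choice of this sketch.

```python
import xml.etree.ElementTree as ET

# A shortened <type> element based on the XML example in this document.
TYPE_XML = """
<type>
    <typeLabel>Country</typeLabel>
    <typeURI>http://dbpedia.org/ontology/Country</typeURI>
    <confidence type="classification">0.857</confidence>
    <confidence type="linking">0.999</confidence>
</type>
"""


def read_confidences(type_element):
    """Map each confidence type attribute to its value; -1 becomes None."""
    result = {}
    for conf in type_element.findall("confidence"):
        value = float(conf.text)
        # -1 is reported for WikipediaSearch, where no estimate is available.
        result[conf.get("type")] = None if value == -1 else value
    return result


confidences = read_confidences(ET.fromstring(TYPE_XML))
print(confidences)
```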

salience - estimated salience of the entity to the document. The level of salience determines whether or not the document is about the entity. In XML the entity salience is encoded as follows:
<salience>
    <class>most_salient</class>
    <score>0.845</score>
    <confidence>0.715</confidence>
</salience>
class - one of the following three classes indicating level of salience:
  • most_salient - A most prominent entity with highest focus of attention in the document.
  • less_salient - A less prominent entity with focus of attention in some parts of the document.
  • not_salient - The document is not about the entity.
score - the entity salience score. A higher salience score indicates a higher focus of attention.
confidence - estimated confidence (probability) that the entity salience class is correct.
In JSON the entity salience is encoded as follows:
"salience":{
    "classLabel":"most_salient",
    "score": 0.845,
    "confidence": 0.715
}
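
As an illustration of how the salience class might be used on the client, the sketch below keeps only types whose salience class reaches a chosen level. The ranking of the three classes follows their descriptions above; the function salient_types and the second sample record ("City") are made up for this example and do not come from a real response.

```python
import json

# First record follows the JSON example above; the second is invented.
SAMPLE_TYPES = json.loads("""
[
  {"typeLabel": "Country",
   "salience": {"classLabel": "most_salient", "score": 0.845, "confidence": 0.715}},
  {"typeLabel": "City",
   "salience": {"classLabel": "not_salient", "score": 0.12, "confidence": 0.6}}
]
""")

# Higher rank means higher focus of attention in the document.
SALIENCE_RANK = {"not_salient": 0, "less_salient": 1, "most_salient": 2}


def salient_types(types, minimum="less_salient"):
    """Keep the types whose salience class is at least `minimum`."""
    threshold = SALIENCE_RANK[minimum]
    return [t for t in types
            if SALIENCE_RANK[t["salience"]["classLabel"]] >= threshold]


print([t["typeLabel"] for t in salient_types(SAMPLE_TYPES)])
```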

HTTP Status Codes

The THD API attempts to return appropriate HTTP status codes for every request.

Code Text Description
200 OK Success!
400 Bad Request The request was invalid. An accompanying error message will explain why.
401 Unauthorized Authentication credentials were missing or incorrect.
406 Not Acceptable Returned by the API when an invalid format is specified in the request.
500 Internal Server Error Something is broken. Please write us an email so the THD team can investigate.
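
A client may want to translate the documented status codes into a next step. The mapping below is only a sketch: the reaction texts paraphrase the table above, and the name reaction_for is hypothetical.

```python
from http import HTTPStatus

# Status codes documented in the table above, with a suggested reaction.
REACTIONS = {
    HTTPStatus.OK: "parse the response body",
    HTTPStatus.BAD_REQUEST: "read the error message and fix the request",
    HTTPStatus.UNAUTHORIZED: "check the apikey parameter",
    HTTPStatus.NOT_ACCEPTABLE: "check the format parameter or Accept header",
    HTTPStatus.INTERNAL_SERVER_ERROR: "report the problem to the THD team",
}


def reaction_for(status_code):
    """Return a suggested reaction, or a fallback for undocumented codes."""
    try:
        return REACTIONS[HTTPStatus(status_code)]
    except (ValueError, KeyError):
        # ValueError: not a known HTTP status; KeyError: not in the table.
        return "undocumented status code"


print(reaction_for(401))
```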

Error Messages

When the THD API returns error messages, it does so in your requested format. For example, an error returned in JSON might look like this:

{ "code": 45, "value": "Empty body request" }

The corresponding XML response would be:

<?xml version="1.0" encoding="UTF-8"?>
<error code="45">Empty body request</error>
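
Both error representations can be reduced to the same (code, message) pair. The following sketch assumes the error bodies look exactly like the two examples above; parse_error is an illustrative helper, and RDF error output is not covered.

```python
import json
import xml.etree.ElementTree as ET


def parse_error(body, response_format):
    """Return (code, message) from an error body in 'json' or 'xml'."""
    if response_format == "json":
        payload = json.loads(body)
        return int(payload["code"]), payload["value"]
    if response_format == "xml":
        # Encode to bytes so the XML declaration's encoding is honoured.
        root = ET.fromstring(body.encode("utf-8"))
        return int(root.get("code")), root.text
    raise ValueError(f"unsupported format: {response_format}")


json_body = '{ "code": 45, "value": "Empty body request" }'
xml_body = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<error code="45">Empty body request</error>')

print(parse_error(json_body, "json"))
print(parse_error(xml_body, "xml"))
```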

Error Codes

In addition to descriptive error text, error messages contain machine-parseable codes. The following table describes the codes which may appear when working with the API:

Code Text Description
31 Could not authenticate you Authentication credentials were missing. Needs security credentials specified by the apikey parameter.
32 Could not authenticate you Specified api key is not valid. The API could not authenticate you.
41 Not supported format The format specified in the format parameter is not supported.
42 Not supported format The format specified in the Accept header is not supported.
43 Not valid types_filter parameter The value of the types_filter parameter is not valid. You can choose between dbo, dbinstance and all.
44 Not valid linking_method parameter The value of the linking_method parameter is not valid. You can choose between LuceneSearch or WikipediaSearch.
45 Empty body request The body of the request is empty.
46 Not valid knowledge_base parameter Chosen knowledge base is not supported.
47 Not valid provenance parameter The value of the provenance parameter is not valid. You can choose between thd, dbpedia and yago.
48 Not correctly set entity_type parameter The value of the entity_type parameter is not valid. You can choose between ne, ce and all.
49 Not supported language Specified language in the lang parameter is not valid. You can choose between en, de and nl.
51 Internal error Something went wrong on the server side. Please write us an email so the THD team can investigate.
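
Because the codes are machine-parseable, a client can map each one to the part of the request that needs attention. The table below is a client-side sketch derived from the descriptions above; the names OFFENDING_INPUT and offending_input are hypothetical.

```python
# Error code -> the request input the code refers to (from the table above).
OFFENDING_INPUT = {
    31: "apikey",
    32: "apikey",
    41: "format",
    42: "Accept header",
    43: "types_filter",
    44: "linking_method",
    45: "request body",
    46: "knowledge_base",
    47: "provenance",
    48: "entity_type",
    49: "lang",
}


def offending_input(error_code):
    """Return the request input to fix, or None for server-side/unknown codes."""
    return OFFENDING_INPUT.get(error_code)


print(offending_input(48))
```

Code 51 is deliberately absent from the mapping because it signals a server-side problem rather than a fault in the request.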

Request example

POST
https://entityclassifier.eu/thd/api/v2/extraction?apikey=123456789&format=xml&provenance=thd&priority_entity_linking=true&entity_type=all

POST Data
The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic.

curl -v "https://entityclassifier.eu/thd/api/v2/extraction?apikey=123456789&format=xml&provenance=thd&priority_entity_linking=true&entity_type=all" -d "The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic."
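
The same request can be prepared with the Python standard library. This is a sketch under assumptions: the input text is sent as UTF-8 (the encoding is not specified above), the placeholder API key from the example is used, and the request is only constructed here, with the network call left commented out.

```python
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://entityclassifier.eu/thd/api/v2/extraction"


def build_request(text, apikey="123456789"):
    """Build (but do not send) the POST request from the example above."""
    query = urlencode({
        "apikey": apikey,
        "format": "xml",
        "provenance": "thd",
        "priority_entity_linking": "true",
        "entity_type": "all",
    })
    # The raw text is the request body, as with curl -d.
    return urllib.request.Request(
        ENDPOINT + "?" + query, data=text.encode("utf-8"), method="POST")


request = build_request(
    "The Charles Bridge is a famous historic bridge that crosses "
    "the Vltava river in Prague, Czech Republic.")
print(request.get_method(), request.full_url)

# Sending requires network access and a valid API key:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```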

Response example

<entities>
    <entity>
        <startOffset>4</startOffset>
        <endOffset>18</endOffset>
        <underlyingString>Charles Bridge</underlyingString>
        <entityType>named entity</entityType>
        <types>
          <type>
            <typeLabel>Bridge</typeLabel>
            <typeURI>http://dbpedia.org/ontology/Bridge</typeURI>
            <entityLabel>Charles Bridge</entityLabel>
            <entityURI>http://dbpedia.org/resource/Charles_Bridge</entityURI>
            <confidence type="classification">0.857</confidence>
            <confidence type="linking">0.65</confidence>
            <salience>
                <score>0.845</score>
                <confidence>0.715</confidence>
                <class>most_salient</class>
            </salience>
            <provenance>thd</provenance>
          </type>
          <type>
            <typeLabel>route of transportation</typeLabel>
            <typeURI>http://dbpedia.org/ontology/RouteOfTransportation</typeURI>
            <entityLabel>Charles Bridge</entityLabel>
            <entityURI>http://dbpedia.org/resource/Charles_Bridge</entityURI>
            <confidence type="classification">0.857</confidence>
            <confidence type="linking">0.65</confidence>
            <salience>
                <score>0.845</score>
                <confidence>0.715</confidence>
                <class>most_salient</class>
            </salience>
            <provenance>thd-derived</provenance>
          </type>
          ...
        </types>
    </entity>
    ...
</entities>
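
A well-formed response of this shape can be read with the standard library XML parser. The sketch below uses a shortened copy of the response above, with one type and without the elided parts; parse_entities and the dictionary keys it returns are illustrative choices, not part of the API.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the response example.
RESPONSE_XML = """<entities>
    <entity>
        <startOffset>4</startOffset>
        <endOffset>18</endOffset>
        <underlyingString>Charles Bridge</underlyingString>
        <entityType>named entity</entityType>
        <types>
          <type>
            <typeLabel>Bridge</typeLabel>
            <typeURI>http://dbpedia.org/ontology/Bridge</typeURI>
            <entityLabel>Charles Bridge</entityLabel>
            <entityURI>http://dbpedia.org/resource/Charles_Bridge</entityURI>
            <provenance>thd</provenance>
          </type>
        </types>
    </entity>
</entities>"""


def parse_entities(xml_text):
    """Turn an <entities> document into a list of plain dictionaries."""
    entities = []
    for node in ET.fromstring(xml_text).iter("entity"):
        entities.append({
            "start": int(node.findtext("startOffset")),
            "end": int(node.findtext("endOffset")),
            "text": node.findtext("underlyingString"),
            "entity_type": node.findtext("entityType"),
            # One (label, URI, provenance) tuple per <type> element.
            "types": [(t.findtext("typeLabel"), t.findtext("typeURI"),
                       t.findtext("provenance"))
                      for t in node.findall("types/type")],
        })
    return entities


entities = parse_entities(RESPONSE_XML)
print(entities[0]["text"], entities[0]["types"])
```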