Entity Extraction API version 2.0
Resource:
POST api/v2/extraction
Description:
Performs entity extraction and classification for a given text. For each extracted entity, its types are discovered. For both the entities and the types, appropriate links to DBpedia or YAGO are provided (if available). Note that the length of the input text influences the processing time.
Parameter | Description |
---|---|
lang (optional) | Language of the input text. You can choose between English, German and Dutch. Values: en/de/nl. Default value: en. Example: lang=en |
format (optional) | Requested response format. Possible values: xml/json/rdfxml. Default value: xml. Example: format=xml. The format can also be specified using the Accept request header; if the format parameter is specified, it takes priority. Values for the Accept header: application/xml, application/json, application/rdf+xml, application/ld+json, application/x-turtle. Important note: results serialized in RDF are modelled using the NIF 2.0 format. Please refer to the NIF 2.0 specification for more information. |
provenance (optional) | Provenance of the results. Values: thd/dbpedia/yago. The client can choose one or more. Default value: thd,dbpedia,yago. Example: provenance=thd,dbpedia |
knowledge_base (optional) | Defines which knowledge base is used to retrieve types for thd. Currently applicable only when provenance is set to thd or all. Values: linkedHypernymsDataset/local/live. linkedHypernymsDataset: use the linked hypernyms dataset (recommended). local: use a local Wikipedia mirror (slight latency); for the date of the Wikipedia snapshot, please refer to the technical information about the application version in the page footer. live: use live Wikipedia (highest latency); please be considerate and do not submit large amounts of text or a high number of requests. Recommended use of live: issue a query containing a single candidate entity for which the other options failed to provide a type. Note: only one option can be chosen. Default value: linkedHypernymsDataset. Example: knowledge_base=linkedHypernymsDataset |
entity_type (optional) | The types of entities to be extracted from the text. Values: ne/ce/all. ne: extract only named entities. ce: extract only common entities. all: extract both named and common entities. Default value: all. Example: entity_type=ne |
priority_entity_linking (optional) | If set to true, the system will prefer a more precise DBpedia disambiguation (longer entity name). This option may result in fewer entities being assigned types. Values: true/false. Default value: false. Example: priority_entity_linking=true |
types_filter (optional; NEW in API version 2!) | Filters types to selected namespaces for thd results. You can restrict the types to DBpedia instances, to DBpedia Ontology types, or allow both. This setting has no effect for provenance=yago or provenance=dbpedia. Values: dbo/dbinstance/all. dbo: only DBpedia Ontology types. dbinstance: only types defined as DBpedia instances. all: entity types can be either DBpedia Ontology classes or DBpedia instances. Default value: all. Example: types_filter=dbo |
linking_method (optional; NEW in THD version 3.9!) | Preferred entity linking (disambiguation) method. So far, you can choose between 6 entity linking approaches. Possible values: SFISearch: uses a custom entity Surface Form Index (SFI); the candidate index contains all surface forms found in Wikipedia articles together with their candidates. LuceneSearch: Lucene-based entity linking. LuceneSearchSkipDisPage: differs from LuceneSearch in that it skips DBpedia disambiguation pages and considers the first non-disambiguation page as the correct link. WikipediaSearch: Wikipedia-search-based entity linking. AllVoting: performs entity linking by aggregating the results from the SFI, Lucene (enhanced) and Wikipedia Search based approaches. SurfaceFormSimilarity: first performs entity linking with the SFI, Lucene (enhanced) and Wikipedia Search; the article whose title is most similar to the entity surface form is considered correct. Default value: LuceneSearchSkipDisPage; if you do not specify this query parameter, then Lucene search (which skips disambiguation pages) is used as the linking method. Example: linking_method=WikipediaSearch. Important note: you cannot use LuceneSearch and perform live types mining (knowledge_base=live) at the same time. |
spotting_method (optional; NEW in THD version 3.9.2!) | Preferred entity spotting (recognition) method. So far, you can choose between spotting based on lexico-syntactic grammars and spotting based on a Conditional Random Fields model. Possible values: grammars: entity spotting based on manually crafted lexico-syntactic grammars. CRF: entity spotting based on a state-of-the-art Conditional Random Fields model. Default value: grammars; if you do not specify this query parameter, then the grammar-based method is used for spotting entities. Example: spotting_method=grammars. Important note: CRF-based entity spotting is slower but more effective, and it can detect named entities only. |
apikey (required) | Used to identify the third-party application utilizing the service. Write us an email to get an API key. Example: apikey=123456789 |
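As a rough sketch of how these parameters combine into a request URL, here is a small Python helper. The endpoint URL and apikey value are taken from the request example later in this document; the helper function name and defaults are illustrative, not part of the API itself.

```python
from urllib.parse import urlencode

# Endpoint from the request example; the apikey used below is a placeholder.
BASE_URL = "https://entityclassifier.eu/thd/api/v2/extraction"

def build_extraction_url(apikey, lang="en", fmt="xml",
                         provenance="thd,dbpedia,yago", entity_type="all"):
    """Assemble the query string for the extraction endpoint."""
    params = {
        "apikey": apikey,
        "lang": lang,
        "format": fmt,
        "provenance": provenance,
        "entity_type": entity_type,
    }
    return BASE_URL + "?" + urlencode(params)

url = build_extraction_url("123456789", provenance="thd")
print(url)
```

Note that urlencode percent-encodes the commas in multi-valued parameters such as provenance=thd,dbpedia, which the server is expected to accept as standard URL encoding.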
Response Object
Field | Type | Description |
---|---|---|
startOffset | Integer | Start offset of the found entity in the input text. The offset is counted from 0 from the beginning of the input text. |
endOffset | Integer | End offset of the found entity in the input text. The offset is counted from 0 from the beginning of the input text. |
underlyingString | String | The string considered as an entity. |
entityType | String | The type of the extracted entity. Possible values: "named entity" or "common entity". |
types | Array of types | List of types for the found entity. Each type contains: typeLabel: name by which the type is formally known. typeURI: DBpedia/YAGO URI describing the entity type. entityLabel: name by which the disambiguated entity is formally known. entityURI: DBpedia/YAGO URI describing the disambiguated entity. provenance: provenance of the results; possible values are thd (produced by THD), thd-derived (also produced by THD, through searching for superclasses in the DBpedia ontology), dbpedia (produced by DBpedia) and yago (produced by the YAGO2s ontology). confidence: estimated classification or linking (disambiguation) confidence. Classification confidence is the estimated probability that the typeLabel is correct for the given entityURI; linking confidence is the estimated probability that the entityURI is correct given the surface form of the entity. Confidence in XML: in the confidence element, classification and linking confidence are distinguished by the type attribute, whose possible values are linking and classification. |
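A minimal sketch of reading these fields from an XML response using only the standard library. The sample below is abridged to the documented fields; the parse_entities helper is illustrative, not part of the API.

```python
import xml.etree.ElementTree as ET

# Abridged sample in the shape of the documented response object.
sample = """<entities>
  <entity>
    <startOffset>4</startOffset>
    <endOffset>18</endOffset>
    <underlyingString>Charles Bridge</underlyingString>
    <entityType>named entity</entityType>
    <types>
      <type>
        <typeLabel>Bridge</typeLabel>
        <typeURI>http://dbpedia.org/ontology/Bridge</typeURI>
        <confidence type="classification">0.857</confidence>
        <confidence type="linking">0.65</confidence>
        <provenance>thd</provenance>
      </type>
    </types>
  </entity>
</entities>"""

def parse_entities(xml_text):
    """Return a list of dicts with the documented response fields."""
    entities = []
    for ent in ET.fromstring(xml_text).findall("entity"):
        types = []
        for t in ent.find("types").findall("type"):
            # The two confidence elements are told apart by the type attribute.
            conf = {c.get("type"): float(c.text) for c in t.findall("confidence")}
            types.append({
                "typeLabel": t.findtext("typeLabel"),
                "typeURI": t.findtext("typeURI"),
                "provenance": t.findtext("provenance"),
                "confidence": conf,
            })
        entities.append({
            "startOffset": int(ent.findtext("startOffset")),
            "endOffset": int(ent.findtext("endOffset")),
            "underlyingString": ent.findtext("underlyingString"),
            "entityType": ent.findtext("entityType"),
            "types": types,
        })
    return entities

result = parse_entities(sample)
```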
HTTP Status Codes
The THD API attempts to return appropriate HTTP status codes for every request.
Code | Text | Description |
---|---|---|
200 | OK | Success! |
400 | Bad Request | The request was invalid. An accompanying error message will explain why. |
401 | Unauthorized | Authentication credentials were missing or incorrect. |
406 | Not Acceptable | Returned by the API when an invalid format is specified in the request. |
500 | Internal Server Error | Something is broken. Please write us an email so the THD team can investigate. |
Error Messages
When the THD API returns an error message, it does so in your requested format. For example, an error returned in JSON might look like this:
{ "code": 45, "value": "Empty body request" }
The corresponding XML response would be:
<?xml version="1.0" encoding="UTF-8"?>
<error code="45">Empty body request</error>
Error Codes
In addition to descriptive error text, error messages contain machine-parseable codes. The following table describes the codes which may appear when working with the API:
Code | Text | Description |
---|---|---|
31 | Could not authenticate you | Authentication credentials were missing. Security credentials must be specified via the apikey parameter. |
32 | Could not authenticate you | Specified api key is not valid. The API could not authenticate you. |
41 | Not supported format | The format specified in the format parameter is not supported. |
42 | Not supported format | The format specified in the Accept header is not supported. |
43 | Not valid types_filter parameter | The value of the types_filter parameter is not valid. You can choose between dbo, dbinstance and all. |
44 | Not valid linking_method parameter | The value of the linking_method parameter is not valid. You can choose between SFISearch, LuceneSearch, LuceneSearchSkipDisPage, WikipediaSearch, AllVoting and SurfaceFormSimilarity. |
45 | Empty body request | The body of the request is empty. |
46 | Not valid knowledge_base parameter | Chosen knowledge base is not supported. |
47 | Not valid provenance parameter | The value of the provenance parameter is not valid. You can choose between thd, dbpedia and yago. |
48 | Not correctly set entity_type parameter | The value of the entity_type parameter is not valid. You can choose between ne, ce and all. |
49 | Not supported language | Specified language in the lang parameter is not valid. You can choose between en, de and nl. |
51 | Internal error | Something went wrong on the server side. Please write us an email so the THD team can investigate. |
Request example
POST https://entityclassifier.eu/thd/api/v2/extraction?apikey=123456789&format=xml&provenance=thd&priority_entity_linking=true&entity_type=all
POST Data: The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic.
curl -v "https://entityclassifier.eu/thd/api/v2/extraction?apikey=123456789&format=xml&provenance=thd&priority_entity_linking=true&entity_type=all" -d "The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic."
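The same request can be sketched in Python with the standard library. The snippet mirrors the curl example, including the placeholder apikey; it only constructs the request object, and the final (commented) line shows how it would be sent.

```python
from urllib.request import Request

# URL and body mirror the curl example; apikey=123456789 is the docs placeholder.
url = ("https://entityclassifier.eu/thd/api/v2/extraction"
       "?apikey=123456789&format=xml&provenance=thd"
       "&priority_entity_linking=true&entity_type=all")
text = ("The Charles Bridge is a famous historic bridge that crosses "
        "the Vltava river in Prague, Czech Republic.")

# The input text travels as the raw POST body, as with curl's -d option.
req = Request(url, data=text.encode("utf-8"), method="POST")
# To send it: response = urllib.request.urlopen(req); body = response.read()
```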
Response example
<entities>
<entity>
<startOffset>4</startOffset>
<endOffset>18</endOffset>
<underlyingString>Charles Bridge</underlyingString>
<entityType>named entity</entityType>
<types>
<type>
<typeLabel>Bridge</typeLabel>
<typeURI>http://dbpedia.org/ontology/Bridge</typeURI>
<entityLabel>Charles Bridge</entityLabel>
<entityURI>http://dbpedia.org/resource/Charles_Bridge</entityURI>
<confidence type="classification" >0.857</confidence>
<confidence type="linking" >0.65</confidence>
<salience>
<score>0.845</score>
<confidence>0.715</confidence>
<class>most_salient</class>
</salience>
<provenance>thd</provenance>
</type>
<type>
<typeLabel>route of transportation</typeLabel>
<typeURI>http://dbpedia.org/ontology/RouteOfTransportation</typeURI>
<entityLabel>Charles Bridge</entityLabel>
<entityURI>http://dbpedia.org/resource/Charles_Bridge</entityURI>
<confidence type="classification" >0.857</confidence>
<confidence type="linking" >0.65</confidence>
<salience>
<score>0.845</score>
<confidence>0.715</confidence>
<class>most_salient</class>
</salience>
<provenance>thd-derived</provenance>
</type>
...
</entity>
...
</entities>
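The offsets in the example can be checked against the input text; a quick sketch, assuming (as the example suggests) that startOffset is inclusive and endOffset is exclusive, so the entity is recovered by slicing:

```python
text = ("The Charles Bridge is a famous historic bridge that crosses "
        "the Vltava river in Prague, Czech Republic.")

# startOffset=4, endOffset=18 from the response example above.
entity = text[4:18]
print(entity)  # Charles Bridge
```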