Use [elinker_plugin](./entity_linker/elinker_plugin.py).
## As an LPMN service
See [elinker service repo](https://gitlab.clarin-pl.eu/nlpworkers/elinker).
# Example
The [example](./example) directory contains sample files to use for linking.
[example/out](./example/out) contains generated CCL files with linked entities.
For more information, see [this page](./example/README.md).
# Configuration
## Config sources
The tool reads configuration from several sources. When the same config key
is set in more than one source, the value from the highest-priority source is used.
Precedence of config sources (lowest priority first, highest priority last):
1. default values from config module (`ConfigDefEntry` instances),
1. values specified in config _INI_ file,
1. values passed as kwargs (when the main function is called from
another Python module),
1. values passed with CLI command.
Config files are also ranked by precedence, in the same lowest-to-highest order:
1. default config file (`config.ini` in the package),
1. config file passed in kwargs,
1. config file passed with CLI command.
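
The layering described above can be sketched in a few lines of Python. This is only an illustration of the precedence rule, not the tool's actual implementation: the `resolve` helper and the sample values are hypothetical, and only the option names are taken from this README.

```python
from collections import ChainMap

# Hypothetical values for each source, listed from lowest to highest priority.
defaults = {"langs": "pl", "sort_entities": "true", "tagset": "nkjp"}
ini_values = {"sort_entities": "false"}   # from a config INI file
kwargs_values = {"langs": "pl,en"}        # from kwargs passed by a caller
cli_values = {"sort_entities": "true"}    # from the CLI command


def resolve(*sources_low_to_high):
    """Merge config sources so that later (higher-priority) sources win."""
    # ChainMap searches its mappings left to right, so the
    # highest-priority source has to come first.
    return dict(ChainMap(*reversed(sources_low_to_high)))


config = resolve(defaults, ini_values, kwargs_values, cli_values)
print(config["sort_entities"], config["langs"], config["tagset"])
```

Here `sort_entities` ends up with the CLI value, `langs` with the kwargs value, and `tagset` falls back to the default because no other source sets it.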
## Available config options
| option | type | default value | available in LPMN task? | description |
| - | - | - | - | - |
| ann_only_first_occ | bool | false | ✓ | if true, then when a token occurs many times, only its first occurrence will contain generated annotations |
| crosswiki_disambiguation_types | list | [_ne_, _mwe_, _cw_] | ✓ | types of annotations for which crosswiki disambiguation will be applied |
| crosswiki_disambiguation | bool | true | ✓ | if true, then crosswiki index will be used to choose correct entity from KB |
| crosswiki_file | text | | ✗ | name of TSV file with crosswiki index |
| db_type | text | _allegrograph_ | ✗ | type of database to use; supported types: `neo4j`, `allegrograph` |
| db_user | text | undefined | ✗ | name of database user |
| enable_filters | bool | true | ✓ | if true, then token and lemma filters will be applied |
| entities_list_prop_name | text | _entities_ | ✓ | if specified, then a property with this name will be added to every token; the property will contain the list of all keys of entities added by elinker |
| entities_list_prop_sep | text | | ✓ | separator for `entities_list_prop_name` list |
| exclude_ignored | text | | ✗ | |
| extended_search | bool | true | ✓ | if true, then extra equivalent relations will be matched in the database |
| ignored_pos | list | | ✓ | list of PoS of tokens to ignore |
| ignore_shorter_than | text | | ✓ | minimal length of token lemma; shorter will be ignored |
| kw_ignored_ann | text | | ✗ | |
| langs | list | [_pl_] | ✓ | languages corresponding to text literals in KB to use during linking |
| log_file | text | | ✗ | if specified, then tool execution will be logged to specified file |
| mark_without_ann | bool | false | ✓ | if false, then only tokens belonging to an annotation (wsd / mwe / ne) will be linked |
| mwe_base_prop_key | text | | ✓ | name of property key storing base form of multiword expression |
| mwe_chan_name | text | | ✓ | name of annotation channel storing multiword expressions |
| named_entity_chan_names | text | | ✓ | list of NER annotations to recognize in document |
| permitted_sources | text | all | ✓ | list of names of known LOD sources in KB to use; if not specified, then all sources will be accepted. |
| sort_entities | bool | true | ✓ | if true, then the list of URIs in a token will be sorted alphabetically |
| stop_list | text | | ✗ | name of txt file with list of stop words; stop words are compared with the token's base lemma |
| synset_prop_key | text | | ✓ | name of property key storing synset id (disambiguation info) |
| tagset | text | _nkjp_ | ✓ | name of a tagset |
| url_key_format | text | | ✓ | format of token property key storing URI of linked entity |
| use_wsd_synsets | bool | true | ✓ | if true, then the wsd synset's id will be used to match entity in KB |
| use_wsd_tokens | bool | true | ✓ | if true, then disambiguated (wsd) tokens are included in the entity linking process |
| without_ann_only_mono | bool | false | ✗ | deprecated; if true and `mark_without_ann` is true, then only monosemic labels in the knowledge base will be used |
Note: list values are separated by a comma or a newline character. For convenience,
it is also possible to specify a single value wherever a list is expected.
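
As a rough illustration of the list syntax, the sketch below reads a sample INI fragment with Python's standard `configparser` and splits values on commas or newlines. The section name `elinker`, the sample values, and the `parse_list` helper are assumptions made for this example; the tool's own parsing code may differ.

```python
import configparser
import re

# Hypothetical INI content; the section name "elinker" is an assumption.
INI_TEXT = """
[elinker]
langs = pl, en
ignored_pos =
    interp
    qub
tagset = nkjp
"""


def parse_list(raw):
    """Split a raw option value on commas or newlines, dropping empty items."""
    return [item.strip() for item in re.split(r"[,\n]", raw) if item.strip()]


parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
section = parser["elinker"]

langs = parse_list(section["langs"])              # comma-separated list
ignored_pos = parse_list(section["ignored_pos"])  # newline-separated list
tagset = parse_list(section["tagset"])            # single value, read as a list

print(langs, ignored_pos, tagset)
```

A single value such as `tagset = nkjp` simply yields a one-element list, which matches the convenience rule in the note above.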