# DESCRIPTION
This tool annotates a document with URLs of corresponding entities.

The linker parses annotated tokens and groups of tokens to find the
expressions that should be described by a URL.
The URLs are taken from a semantic graph of connected resources.

The linker accepts a *.ccl file with annotations of the following types:
  - mwe (multi-word expressions)
  - ne (named entities)
  - wsd (word sense disambiguation, annotated with syn_id)

Note: in the current version, multi-word expressions (including named
entities) take precedence over disambiguated words. This matters when a
word annotated with wsd is part of a multi-word expression, because the
sense of that single word will in many cases differ from the sense of the
whole expression.
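
The precedence rule can be illustrated with a minimal sketch. The `Annotation` type and the `apply_precedence` function are hypothetical and are not taken from the elinker code; the sketch only assumes that each annotation covers a set of token indices.

```python
from typing import FrozenSet, List, NamedTuple


class Annotation(NamedTuple):
    kind: str               # "mwe", "ne" or "wsd"
    tokens: FrozenSet[int]  # indices of the tokens covered by the annotation


def apply_precedence(annotations: List[Annotation]) -> List[Annotation]:
    """Drop wsd annotations that overlap a multi-word expression or named entity."""
    covered = set()
    for ann in annotations:
        if ann.kind in ("mwe", "ne"):
            covered |= ann.tokens

    kept = []
    for ann in annotations:
        if ann.kind == "wsd" and ann.tokens & covered:
            continue  # the expression as a whole takes precedence
        kept.append(ann)
    return kept
```

In this sketch a wsd annotation survives only if none of its tokens belong to an mwe or ne annotation.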

As a result, a *.ccl document with the added URL annotations is returned.

# DEPENDENCIES
All dependencies are listed in [requirements.txt](requirements.txt).

# USAGE

```bash
elinker -d path/to/ccl/doc.xml -o path/for/results/output_doc.xml
```
For help and a detailed description, run: ```elinker --help```.
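
To process several documents, the command can be wrapped in a short Python script. This is only a sketch: it assumes that `elinker` is available on `PATH` and uses just the `-d` and `-o` options shown above. The helper names, the `*.xml` input pattern and the `.elinker.xml` output suffix are illustrative choices, not part of the tool.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(doc: Path, out: Path) -> List[str]:
    """Build the elinker call using only the -d and -o options."""
    return ["elinker", "-d", str(doc), "-o", str(out)]


def link_directory(in_dir: Path, out_dir: Path) -> None:
    """Run elinker on every *.xml file in in_dir (assumed to be CCL documents)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for doc in sorted(in_dir.glob("*.xml")):
        out = out_dir / (doc.stem + ".elinker.xml")
        # check=True raises CalledProcessError if elinker exits with an error
        subprocess.run(build_command(doc, out), check=True)
```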

# EXAMPLE
There are three files in the [example](./example) directory:
- ``article.txt`` : raw text used as input for elinker,
- ``article.wosedon.xml`` : tokenized, lemmatized and disambiguated article,
- ``article.wosedon.elinker.xml`` : result of elinker, with URIs from LOD,
e.g. ```<prop key="PlWN:url_0">http://plwordnet.pwr.wroc.pl/wordnet/synset/4622</prop>```
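
The URL annotations can be read back with Python's standard library. The sketch below assumes only what the sample line shows, namely that URLs are stored in `<prop>` elements whose `key` contains `url`. The surrounding element layout and the function name `collect_urls` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def collect_urls(xml_text: str) -> List[Tuple[str, str]]:
    """Return (key, url) pairs for every <prop> whose key contains 'url'."""
    root = ET.fromstring(xml_text)
    pairs = []
    for prop in root.iter("prop"):
        key = prop.get("key", "")
        if "url" in key and prop.text:
            pairs.append((key, prop.text.strip()))
    return pairs


# A hand-made fragment for illustration, not a complete CCL document.
sample = """<tok><orth>x</orth>
<prop key="PlWN:url_0">http://plwordnet.pwr.wroc.pl/wordnet/synset/4622</prop>
</tok>"""
print(collect_urls(sample))
```

For a file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.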