COMBO

A language-independent NLP system for dependency parsing, part-of-speech tagging, lemmatisation and more built on top of PyTorch and AllenNLP.


Quick start

Install COMBO from the CLARIN-PL package index (we suggest creating a virtualenv or conda environment with Python 3.6+, since a bundle of required packages will be installed):

pip install -U pip setuptools wheel
pip install --index-url https://pypi.clarin-pl.eu/simple combo==1.0.3

Run the following commands in your Python console to make predictions with a pre-trained model:

from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base")
sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")

Predictions are accessible as a list of token attributes:

print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL'))
for token in sentence.tokens:
    print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel))
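
Because each token carries its own `head`, the flat list can also be regrouped into a tree. The sketch below is only an illustration: `StandInToken` is a hypothetical stand-in that mirrors the attribute names used above (`id`, `token`, `lemma`, `upostag`, `head`, `deprel`), the example sentence is hand-written rather than actual COMBO output, and it assumes `head` is an integer with `0` marking the root, as in the Universal Dependencies convention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StandInToken:
    """Hypothetical stand-in mirroring the token attributes used above."""
    id: int
    token: str
    lemma: str
    upostag: str
    head: int
    deprel: str


def dependents_by_head(tokens):
    """Map each head id to the ids of the tokens attached to it."""
    tree = defaultdict(list)
    for tok in tokens:
        tree[tok.head].append(tok.id)
    return dict(tree)


def find_root(tokens):
    """Return the first token whose head is 0 (assumed root marker), or None."""
    return next((tok for tok in tokens if tok.head == 0), None)


# Hand-written illustration data, not actual COMBO output.
example = [
    StandInToken(1, "Ala", "Ala", "PROPN", 2, "nsubj"),
    StandInToken(2, "ma", "mieć", "VERB", 0, "root"),
    StandInToken(3, "kota", "kot", "NOUN", 2, "obj"),
]

tree = dependents_by_head(example)
root = find_root(example)
print(root.token, tree[root.id])
```

If the attributes behave as assumed, the same two helpers could be applied to `sentence.tokens` from the snippet above.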

COMBO tutorial

We encourage you to use the beginner's tutorial (colab notebook).

Details

Citing

If you use the EUD module in your research, please cite "COMBO: A New Module for EUD Parsing":

@inproceedings{klimaszewski-wroblewska-2021-combo,
    title = "{COMBO}: A New Module for {EUD} Parsing",
    author = "Klimaszewski, Mateusz  and
      Wr{\'o}blewska, Alina",
    booktitle = "Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.iwpt-1.16",
    doi = "10.18653/v1/2021.iwpt-1.16",
    pages = "158--166",
    abstract = "We introduce the COMBO-based approach for EUD parsing and its implementation, which took part in the IWPT 2021 EUD shared task. The goal of this task is to parse raw texts in 17 languages into Enhanced Universal Dependencies (EUD). The proposed approach uses COMBO to predict UD trees and EUD graphs. These structures are then merged into the final EUD graphs. Some EUD edge labels are extended with case information using a single language-independent expansion rule. In the official evaluation, the solution ranked fourth, achieving an average ELAS of 83.79{\%}. The source code is available at https://gitlab.clarin-pl.eu/syntactic-tools/combo.",
}