From 160ee6ea3f05316abb2d91f2103a6d91009e819c Mon Sep 17 00:00:00 2001
From: Mateusz Klimaszewski <mk.klimaszewski@gmail.com>
Date: Sun, 3 Jan 2021 11:46:19 +0100
Subject: [PATCH] Extend documentation with better examples.

---
 README.md        | 14 +++++++++-----
 docs/models.md   | 27 +++++++++++++++++----------
 docs/training.md |  6 +++---
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index c339bda..a9c2113 100644
--- a/README.md
+++ b/README.md
@@ -10,19 +10,24 @@
 </p>
 
 ## Quick start
-Clone this repository and install COMBO (we suggest using virtualenv/conda with Python 3.6+):
+Clone this repository and install COMBO (we suggest creating a virtualenv/conda environment with Python 3.6+, as installation pulls in a number of required packages):
 ```bash
 git clone https://gitlab.clarin-pl.eu/syntactic-tools/clarinbiz/combo.git
 cd combo
 python setup.py develop
 ```
-Run the following lines in your Python console to make predictions with a pre-trained model:
+Run the following commands in your Python console to make predictions with a pre-trained model:
 ```python
 from combo.predict import COMBO
 
 nlp = COMBO.from_pretrained("polish-herbert-base")
-sentence = nlp("Moje zdanie.")
-print(sentence.tokens)
+sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")
+```
+Predictions are accessible as a list of token attributes:
+```python
+print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL'))
+for token in sentence.tokens:
+    print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel))
 ```
 
 ## Details
@@ -31,4 +36,3 @@ print(sentence.tokens)
 - [**Pre-trained models**](docs/models.md)
 - [**Training**](docs/training.md)
 - [**Prediction**](docs/prediction.md)
-
diff --git a/docs/models.md b/docs/models.md
index d4346ff..25a7f70 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,19 +1,26 @@
 # Models
-Pre-trained models are available [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+COMBO provides pre-trained models for:
+- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation, and dependency parsing), trained on treebanks from the [Universal Dependencies repository](https://universaldependencies.org),
+- enhanced dependency parsing, trained on the IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html).
+
+## Manual download
+
+The pre-trained models can be downloaded from [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+
+If you want to use the console version of COMBO, you need to download a pre-trained model manually:
+```bash
+wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
+```
+
+The downloaded model should then be passed to COMBO as a parameter (see [prediction doc](prediction.md)).
 
 ## Automatic download
-Python `from_pretrained` method will download the pre-trained model if the provided name (without the extension .tar.gz) matches one of the names in [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the .tar.gz extension) from the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/) and pass it as an argument to the `from_pretrained` method:
 ```python
 from combo.predict import COMBO
 nlp = COMBO.from_pretrained("polish-herbert-base")
 ```
-Otherwise it looks for a model in local env.
-
-## Console prediction/Local model
-If you want to use the console version of COMBO, you need to download a pre-trained model manually
-```bash
-wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
-```
-and pass it as a parameter (see [prediction doc](prediction.md)).
+If the model name doesn't match any model on the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/), COMBO looks for the model in the local environment.
diff --git a/docs/training.md b/docs/training.md
index f4b6d38..d3f69e0 100644
--- a/docs/training.md
+++ b/docs/training.md
@@ -1,6 +1,6 @@
 # Training
 
-Command:
+Basic command:
 ```bash
 combo --mode train \
       --training_data_path your_training_path \
@@ -32,13 +32,13 @@ Examples (for clarity without training/validation data paths):
 combo --mode train --pretrained_transformer_name your_choosen_pretrained_transformer
 ```
 
-* predict only dependency tree:
+* train only a dependency parser:
 ```bash
 combo --mode train --targets head,deprel
 ```
 
-* use part-of-speech tags for predicting only dependency tree
+* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are default features):
 ```bash
 combo --mode train --targets head,deprel --features token,char,upostag
 ```
--
GitLab
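The token-attribute loop this patch adds to the README can be sketched as a standalone script. The `Token` dataclass below is a hypothetical stand-in for COMBO's token objects: only the attribute names (`id`, `token`, `lemma`, `upostag`, `head`, `deprel`) and the column format string are taken from the patch, and the sample analysis of "Moje zdanie." is illustrative, not real model output.

```python
from dataclasses import dataclass

# Hypothetical stand-in for COMBO's token objects; the attribute names
# mirror those used in the README example added by this patch.
@dataclass
class Token:
    id: int
    token: str
    lemma: str
    upostag: str
    head: int
    deprel: str

# Column layout from the README example.
FMT = "{:5} {:15} {:15} {:10} {:10} {:10}"

def format_table(tokens):
    """Render a header row plus one aligned row per token."""
    rows = [FMT.format("ID", "TOKEN", "LEMMA", "UPOS", "HEAD", "DEPREL")]
    for t in tokens:
        rows.append(FMT.format(str(t.id), t.token, t.lemma,
                               t.upostag, str(t.head), t.deprel))
    return "\n".join(rows)

# Illustrative parse of "Moje zdanie." -- not actual COMBO output.
tokens = [
    Token(1, "Moje", "mój", "DET", 2, "det"),
    Token(2, "zdanie", "zdanie", "NOUN", 0, "root"),
    Token(3, ".", ".", "PUNCT", 2, "punct"),
]
print(format_table(tokens))
```

With real COMBO predictions, `sentence.tokens` would take the place of the hand-built list above.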