diff --git a/README.md b/README.md index c339bda6407f5e60609a75974125cd2276d5e794..a9c21135005d9abf50f2234bedca817b7d180327 100644 --- a/README.md +++ b/README.md @@ -10,19 +10,24 @@ </p> ## Quick start -Clone this repository and install COMBO (we suggest using virtualenv/conda with Python 3.6+): +Clone this repository and install COMBO (we suggest creating a virtualenv/conda environment with Python 3.6+, as a number of required packages will be installed): ```bash git clone https://gitlab.clarin-pl.eu/syntactic-tools/clarinbiz/combo.git cd combo python setup.py develop ``` -Run the following lines in your Python console to make predictions with a pre-trained model: +Run the following commands in your Python console to make predictions with a pre-trained model: ```python from combo.predict import COMBO nlp = COMBO.from_pretrained("polish-herbert-base") -sentence = nlp("Moje zdanie.") -print(sentence.tokens) +sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.") +``` +Predictions are accessible as a list of token attributes: +```python +print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL')) +for token in sentence.tokens: + print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel)) ``` ## Details @@ -31,4 +36,3 @@ print(sentence.tokens) - [**Pre-trained models**](docs/models.md) - [**Training**](docs/training.md) - [**Prediction**](docs/prediction.md) - diff --git a/docs/models.md b/docs/models.md index d4346ff2de196d2ad295eed98cfd218aaf071636..25a7f7092ef295a0cf1ff2b3ba13e0adb05a5bc6 100644 --- a/docs/models.md +++ b/docs/models.md @@ -1,19 +1,26 @@ # Models -Pre-trained models are available [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/). +COMBO provides pre-trained models for: +- morphosyntactic prediction (i.e. 
part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing) trained on treebanks from the [Universal Dependencies repository](https://universaldependencies.org), +- enhanced dependency parsing trained on IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html). + +## Manual download + +The pre-trained models can be downloaded from [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/). + + +If you want to use the console version of COMBO, you need to download a pre-trained model manually: +```bash +wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz +``` + +The downloaded model should be passed as a parameter to COMBO (see [prediction doc](prediction.md)). ## Automatic download -Python `from_pretrained` method will download the pre-trained model if the provided name (without the extension .tar.gz) matches one of the names in [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/). +The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the extension .tar.gz) from the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/) and pass the name as the argument to the `from_pretrained` method: ```python from combo.predict import COMBO nlp = COMBO.from_pretrained("polish-herbert-base") ``` -Otherwise it looks for a model in local env. - -## Console prediction/Local model -If you want to use the console version of COMBO, you need to download a pre-trained model manually -```bash -wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz -``` -and pass it as a parameter (see [prediction doc](prediction.md)). +If the model name doesn't match any model on the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/), COMBO looks for the model in the local environment. 
diff --git a/docs/training.md b/docs/training.md index f4b6d382699dcfea1b63e2657f51c4b848ef0162..d3f69e0913c59681279b1fd966be0f4901ade11e 100644 --- a/docs/training.md +++ b/docs/training.md @@ -1,6 +1,6 @@ # Training -Command: +Basic command: ```bash combo --mode train \ --training_data_path your_training_path \ @@ -32,13 +32,13 @@ Examples (for clarity without training/validation data paths): combo --mode train --pretrained_transformer_name your_choosen_pretrained_transformer ``` -* predict only dependency tree: +* train only a dependency parser: ```bash combo --mode train --targets head,deprel ``` -* use part-of-speech tags for predicting only dependency tree +* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are the default features): ```bash combo --mode train --targets head,deprel --features token,char,upostag
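
The new README snippet prints predictions as a fixed-width table of token attributes. That formatting pattern can be sanity-checked without a trained model; the sketch below uses a stand-in `Token` namedtuple filled with made-up annotation values (for illustration only, not actual COMBO output; in real use the objects come from `sentence.tokens`):

```python
from collections import namedtuple

# Stand-in for a COMBO token object; real tokens come from sentence.tokens.
Token = namedtuple("Token", "id token lemma upostag head deprel")

# Made-up annotations, for illustration only, not actual model output.
tokens = [
    Token(1, "COVID-19", "COVID-19", "PROPN", 4, "nsubj"),
    Token(2, "to", "to", "AUX", 4, "cop"),
]

row = "{:5} {:15} {:15} {:10} {:10} {:10}"
print(row.format("ID", "TOKEN", "LEMMA", "UPOS", "HEAD", "DEPREL"))
for t in tokens:
    # id and head are ints; str() keeps them left-aligned like the string columns.
    print(row.format(str(t.id), t.token, t.lemma, t.upostag, str(t.head), t.deprel))
```

The `str()` calls matter: Python's format mini-language right-aligns numbers but left-aligns strings by default, so converting `id` and `head` keeps every column flush left.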