# Models
COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing), trained on treebanks from the Universal Dependencies repository,
- enhanced dependency parsing, trained on IWPT 2020 shared task data.
## Pre-trained models
The list of pre-trained models, together with their evaluation results, is available in the spreadsheet. Please note that the name in brackets matches the name used in Automatic Download.
## License
Each model is released under the same license as the data it was trained on.
See the Universal Dependencies v2.7 License Agreement and the Universal Dependencies v2.5 License Agreement for details.
## Manual download
The pre-trained models can be downloaded from here.
If you want to use the console version of COMBO, you need to download a pre-trained model manually:
```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```
The downloaded model should then be passed to COMBO as a parameter (see the prediction docs).
## Automatic download
The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the `.tar.gz` extension) from the list of pre-trained models and pass it to `from_pretrained`:

```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the model name doesn't match any model on the list of pre-trained models, COMBO instead looks for a model in the local environment.
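The name-or-local-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not COMBO's actual implementation; the `PRETRAINED` mapping and the `resolve_model` helper are hypothetical names introduced here for clarity.

```python
import os

# Hypothetical registry mapping known model names to their download URLs.
PRETRAINED = {
    "polish-herbert-base": "http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz",
}

def resolve_model(name_or_path: str) -> str:
    """Return a download URL for a known model name; otherwise treat the
    argument as a path to a local model archive."""
    if name_or_path in PRETRAINED:
        # Known name: automatic download case.
        return PRETRAINED[name_or_path]
    if os.path.exists(name_or_path):
        # Unknown name: fall back to a model found in the local environment.
        return name_or_path
    raise FileNotFoundError(
        f"'{name_or_path}' is neither a known pre-trained model nor a local file"
    )
```

In this sketch, a recognised name resolves to its URL, while any other argument is checked against the local filesystem before an error is raised.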