# Models
COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing), trained on treebanks from the Universal Dependencies repository,
- enhanced dependency parsing, trained on IWPT 2020 shared task data.
## Pre-trained models
The list of pre-trained models, together with their evaluation results, is available in the spreadsheet. Please note that the name in brackets matches the name used in Automatic Download.
## License
Each model is released under the same license as the data it was trained on.
See the Universal Dependencies v2.7 License Agreement and the Universal Dependencies v2.5 License Agreement for details.
## Manual download
The pre-trained models can be downloaded from here.
If you want to use the console version of COMBO, you need to download a pre-trained model manually:
```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```
The downloaded model should then be passed to COMBO as a parameter (see the prediction docs).
## Automatic download
The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the `.tar.gz` extension) from the list of pre-trained models and pass it to `from_pretrained`:

```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the model name doesn't match any model on the list of pre-trained models, COMBO instead looks for a model in the local environment.
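The name-or-local-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not COMBO's actual implementation; the `PRETRAINED` mapping and the `resolve_model` helper are hypothetical names introduced here for clarity.

```python
import os

# Hypothetical registry mapping known model names to their download URLs.
PRETRAINED = {
    "polish-herbert-base": "http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz",
}

def resolve_model(name_or_path: str) -> str:
    """Return a download URL for a known model name; otherwise treat the
    argument as a path to a local model archive."""
    if name_or_path in PRETRAINED:
        # Known name: automatic download case.
        return PRETRAINED[name_or_path]
    if os.path.exists(name_or_path):
        # Unknown name: fall back to a model found in the local environment.
        return name_or_path
    raise FileNotFoundError(
        f"'{name_or_path}' is neither a known pre-trained model nor a local file"
    )
```

In this sketch, a recognised name resolves to its URL, while any other argument is checked against the local filesystem before an error is raised.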