# Models

COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing) trained on the treebanks from the [Universal Dependencies repository](https://universaldependencies.org) ([Zeman et al. 2020](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3424)),
- enhanced dependency parsing trained on the IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html) ([Bouma et al. 2020](https://www.aclweb.org/anthology/2020.iwpt-1.16.pdf)).

## Pre-trained models
**Morphosyntactic prediction models** trained on selected UD treebanks, version 2.7, and their **evaluation results** are listed in the [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md) table.

**Morphosyntactic prediction models** trained on selected UD treebanks, version 2.5, and **enhanced parsing models** are listed in the spreadsheets: [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324).

### License
Models are distributed under the same license as the datasets used for their training.

See [Universal Dependencies v2.7 License Agreement](https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7) and [Universal Dependencies v2.5 License Agreement](https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5) for details.

## Automatic download
The pre-trained models can be downloaded automatically with the `from_pretrained` method in Python mode.
Select the name of a pre-trained model (see the column **Model name** in [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md), [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324)) and pass it as an argument to the `from_pretrained` method:

```python
from combo.predict import COMBO

# Load the pre-trained model by its name
nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the model name doesn't match any model on the pre-trained model lists, COMBO looks for a model in the local environment.

## Manual download
If you want to use COMBO in the command-line mode, you need to download a pre-trained model manually.

The pre-trained models can be downloaded to a local disk with `wget`. The links to the pre-trained models are listed in the column **Model name** in [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md), or **Model link** in [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324).

```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```

The path to the downloaded model should be passed as a parameter to COMBO in the CLI (see [prediction doc](prediction.md)).
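
Where `wget` is not available, the manual download step could also be scripted in Python. The following is a minimal sketch using only the standard library; the helper names `local_model_path` and `download_model` and the `target_dir` parameter are hypothetical and not part of COMBO, and the URL is the one from the `wget` example in this document.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL taken from the wget example in this document.
MODEL_URL = "http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz"


def local_model_path(url: str, target_dir: str = ".") -> Path:
    """Return the local path where the archive behind `url` would be stored.

    Hypothetical helper: the file name is the last component of the URL path.
    """
    filename = Path(urlparse(url).path).name
    return Path(target_dir) / filename


def download_model(url: str, target_dir: str = ".") -> Path:
    """Download the model archive unless it is already on disk.

    Hypothetical helper: returns the path to the local archive.
    """
    destination = local_model_path(url, target_dir)
    if not destination.exists():
        # Create the target directory on demand, then fetch the archive.
        destination.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(destination))
    return destination
```

Calling `download_model(MODEL_URL, "models")` would save the archive as `models/polish-herbert-base.tar.gz` and return that path, which can then be passed to COMBO in the CLI as described in the [prediction doc](prediction.md).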