# Models

COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing), trained on the treebanks from the [Universal Dependencies repository](https://universaldependencies.org) ([Zeman et al. 2020](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3424)),
- enhanced dependency parsing, trained on the IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html) ([Bouma et al. 2020](https://www.aclweb.org/anthology/2020.iwpt-1.16.pdf)).

## Pre-trained models
**Morphosyntactic prediction models** trained on selected UD 2.7 treebanks, together with their **evaluation results**, are listed in the [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md) table.

**Morphosyntactic prediction models** trained on selected UD 2.5 treebanks and **enhanced parsing models** are listed in two spreadsheets: [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324).

### License
Models are distributed under the same license as the datasets used to train them.

See [Universal Dependencies v2.7 License Agreement](https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7) and [Universal Dependencies v2.5 License Agreement](https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5) for details.


## Automatic download
In Python mode, pre-trained models can be downloaded automatically with the `from_pretrained` method. Choose a pre-trained model name (see the **Model name** column in [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md), [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324)) and pass it as an argument to `from_pretrained`:

```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the name does not match any model in the pre-trained model lists, COMBO looks for the model locally.
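The lookup order described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not COMBO's actual implementation: `resolve_model` and `PRETRAINED_MODELS` are hypothetical names, and treating a "local" model as an existing file-system path is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical subset of names from the pre-trained model lists (illustration only).
PRETRAINED_MODELS = {"polish-herbert-base"}


def resolve_model(name: str, pretrained: set[str] = PRETRAINED_MODELS) -> tuple[str, str]:
    """Hypothetical helper mirroring the documented lookup order.

    A name found in the pre-trained lists is marked for download; any other
    value is assumed to be a path to a model on the local disk.
    """
    if name in pretrained:
        return ("download", name)
    path = Path(name).expanduser()
    if not path.exists():
        raise FileNotFoundError(
            f"{name!r} is neither a listed pre-trained model nor an existing local path"
        )
    return ("local", str(path))
```

For example, `resolve_model("polish-herbert-base")` yields `("download", "polish-herbert-base")`, while a path to an archive already on disk is returned unchanged as a local model.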

## Manual download

To use COMBO in command-line mode, you need to download a pre-trained model manually, for example with `wget`. The download links are listed in the **Model name** column of [Model performance (UD2.7)](https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md), or in the **Model link** column of [UD2.5-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=0) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324).

```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```
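If `wget` is not available, the same archive can be fetched with the Python standard library. This is a minimal sketch: `model_url` and `download_model` are hypothetical helpers, and the assumption that every archive lives at `<base URL><model name>.tar.gz` is inferred from the single example link above. Always take the actual link from the model tables.

```python
import urllib.request
from pathlib import Path

# Assumption: archives follow the pattern of the example link above.
# Check the "Model name" / "Model link" columns for the real URL.
BASE_URL = "http://mozart.ipipan.waw.pl/~mklimaszewski/models/"


def model_url(name: str) -> str:
    """Build the archive URL for a model name (hypothetical URL pattern)."""
    return f"{BASE_URL}{name}.tar.gz"


def download_model(name: str, dest_dir: str = ".") -> Path:
    """Download the model archive into dest_dir and return its local path."""
    dest = Path(dest_dir) / f"{name}.tar.gz"
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(model_url(name), dest)
    return dest
```

Calling `download_model("polish-herbert-base")` would save `polish-herbert-base.tar.gz` in the current directory, provided the assumed URL pattern holds.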

Pass the path to the downloaded model to COMBO on the command line (see the [prediction doc](prediction.md)).
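When COMBO is run from a script rather than typed at the shell, the prediction command can be assembled with `subprocess`. This is a hedged sketch: `build_predict_command` and `run_prediction` are hypothetical helpers, and the `--mode`, `--model_path`, `--input_file` and `--output_file` flags reflect our understanding of the CLI described in the [prediction doc](prediction.md). Check that page for the authoritative options.

```python
import subprocess
from typing import List


def build_predict_command(model_path: str, input_file: str, output_file: str) -> List[str]:
    """Assemble a COMBO prediction command (flags assumed; verify in prediction.md)."""
    return [
        "combo",
        "--mode", "predict",
        "--model_path", model_path,    # e.g. the downloaded .tar.gz archive
        "--input_file", input_file,
        "--output_file", output_file,
    ]


def run_prediction(model_path: str, input_file: str, output_file: str) -> None:
    """Run the command, raising CalledProcessError if COMBO exits with an error."""
    subprocess.run(build_predict_command(model_path, input_file, output_file), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.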