# Prediction

## COMBO as a Python library
The pre-trained models can be downloaded automatically with the `from_pretrained` method. Select a model name from the lists: [UD-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit?usp=sharing) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324), and pass it as the argument to `from_pretrained`.
```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base-ud29")
sentence = nlp("Sentence to parse.")
```
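
The parsed `sentence` holds the predicted annotations. Below is a minimal sketch of inspecting them, assuming each token exposes CoNLL-U-style attributes such as `token`, `lemma`, `upostag`, `head`, and `deprel` (field names may vary between COMBO versions):

```python
# Print the predicted annotation for each token.
# The attribute names below mirror CoNLL-U columns and are an assumption;
# inspect sentence.tokens[0] in your installed version to confirm them.
for token in sentence.tokens:
    print(token.id, token.token, token.lemma, token.upostag, token.head, token.deprel)
```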

You can also load your own COMBO model:

```python
from combo.predict import COMBO

model_path = "your_model.tar.gz"
nlp = COMBO.from_pretrained(model_path)
sentence = nlp("Sentence to parse.")
```

COMBO also accepts pre-segmented sentences (or texts):
```python
from combo.predict import COMBO

model_path = "your_model.tar.gz"
nlp = COMBO.from_pretrained(model_path)
tokenized_sentence = ["Sentence", "to", "parse", "."]
sentence = nlp([tokenized_sentence])
```
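
Several pre-segmented sentences can be passed in a single call as a list of token lists, following the pattern above; a sketch under that assumption:

```python
# Batch prediction: a list of token lists yields a list of parsed sentences
# (assumed from the single-sentence list input shown above).
tokenized_sentences = [
    ["Sentence", "to", "parse", "."],
    ["Another", "one", "."],
]
sentences = nlp(tokenized_sentences)
```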

You can use COMBO with the [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo) tokeniser (note: LAMBO must be installed separately; see the [LAMBO installation instructions](https://gitlab.clarin-pl.eu/syntactic-tools/lambo#installation)).

```python
# Import COMBO and lambo
from combo.predict import COMBO
from combo.utils import lambo_tokenizer

# Download the model and attach the LAMBO tokenizer
nlp = COMBO.from_pretrained("english-bert-base-ud29", tokenizer=lambo_tokenizer.LamboTokenizer("en"))
sentences = nlp("This is the first sentence. This is the second sentence to parse.")
```
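
Because LAMBO segments the raw text into sentences, the call returns a list of parsed sentences. A sketch of iterating over it (token field names assumed as above):

```python
# Iterate over the parsed sentences returned for the two-sentence input.
for sentence in sentences:
    print([token.token for token in sentence.tokens])
```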

## COMBO as a command-line interface 
### CoNLL-U file prediction:
Input and output are both in `*.conllu` format.
```bash
combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file --silent
```
### Raw text prediction:
Works only for models trained on raw text input.

Input: a plain text file with one sentence per line.

Output: a list of token JSONs.

```bash
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format
```
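
For illustration, a hypothetical `your_text_file` would look like this, one sentence per line:

```
This is the first sentence.
This is the second sentence to parse.
```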


### Console prediction:
Works only for models trained on raw text input.

Interactive testing in the console: the model is loaded once, and you type sentences directly at the prompt.

```bash
combo --mode predict --model_path your_model_tar_gz --input_file "-" --nosilent
```

### Advanced

There are two built-in tokenizers: a whitespace tokenizer and a spaCy-based one (using the `en_core_web_sm` model).

Use `--predictor_name combo` for the whitespace tokenizer or `--predictor_name combo-spacy` for the spaCy-based tokenizer (the default), as in the example below.
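
For example, to run raw text prediction with the whitespace tokenizer instead of the default spaCy-based one (a sketch combining the flags shown above; file names are placeholders):

```bash
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format --predictor_name combo
```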