# Prediction

## COMBO as a Python library

The pre-trained models can be downloaded automatically with the `from_pretrained` method. Select a model name from the lists of [UD-trained COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit?usp=sharing) and [enhanced COMBO models](https://docs.google.com/spreadsheets/d/1WFYc2aLRa1jw7le030HOacv9fc4zmtqiZtRQY6gl5mc/edit#gid=1757180324), and pass it as an argument to `from_pretrained`.

```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base-ud29")
sentence = nlp("Sentence to parse.")
```

You can also load your own COMBO model:

```python
from combo.predict import COMBO

model_path = "your_model.tar.gz"
nlp = COMBO.from_pretrained(model_path)
sentence = nlp("Sentence to parse.")
```

COMBO also accepts pre-segmented sentences (or texts):

```python
from combo.predict import COMBO

model_path = "your_model.tar.gz"
nlp = COMBO.from_pretrained(model_path)
tokenized_sentence = ["Sentence", "to", "parse", "."]
sentence = nlp([tokenized_sentence])
```

You can use COMBO with the [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo) tokeniser (note: LAMBO must be installed first, see [LAMBO installation](https://gitlab.clarin-pl.eu/syntactic-tools/lambo#installation)).

```python
# Import COMBO and the LAMBO tokeniser wrapper
from combo.predict import COMBO
from combo.utils import lambo_tokenizer

# Download the model and attach the LAMBO tokeniser
nlp = COMBO.from_pretrained("english-bert-base-ud29", tokenizer=lambo_tokenizer.LamboTokenizer("en"))
sentences = nlp("This is the first sentence. This is the second sentence to parse.")
```

## COMBO as a command-line interface

### CoNLL-U file prediction

Input and output are both in the `*.conllu` format.

```bash
combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file --silent
```

### Raw text prediction

Works only for models trained on raw text input. Input: one sentence per line. Output: a list of token JSONs.

```bash
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format
```

### Console prediction

Works only for models trained on raw text input. Interactive testing in the console: the model is loaded once and every sentence you type is parsed.

```bash
combo --mode predict --model_path your_model_tar_gz --input_file "-" --nosilent
```

### Advanced

Two tokenizers are available: a whitespace tokenizer (`--predictor_name combo`) and a spaCy-based one using the `en_core_web_sm` model (`--predictor_name combo-spacy`, the default); see the sketch below.
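For example, selecting the spaCy-based tokenizer explicitly for raw text prediction might look like the following sketch, which simply combines the flags documented above (all file paths are placeholders):

```bash
# Sketch: raw text prediction with the spaCy-based tokenizer selected explicitly.
# Paths are placeholders; the flags are the ones documented in the sections above.
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format --predictor_name combo-spacy
```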
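Back in the Python API, here is a minimal sketch of inspecting a parsed sentence. It assumes the object returned by `nlp(...)` exposes a `tokens` list whose elements carry CoNLL-U fields such as `token`, `lemma`, `upostag`, `head` and `deprel`; these attribute names are assumptions, so verify them against the data classes shipped with your COMBO version.

```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base-ud29")
sentence = nlp("Sentence to parse.")

# Print a small dependency table. The attribute names below (id, token,
# lemma, upostag, head, deprel) are assumptions mirroring the CoNLL-U
# columns; check COMBO's token data class if they differ in your version.
for token in sentence.tokens:
    print(token.id, token.token, token.lemma, token.upostag, token.head, token.deprel)
```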