prediction.md

  • Prediction

    COMBO as a Python library

    Pre-trained models can be downloaded automatically with the from_pretrained method. Select a model name from the lists of UD-trained COMBO models and enhanced COMBO models, and pass it as the argument to from_pretrained.

    from combo.predict import COMBO
    
    nlp = COMBO.from_pretrained("polish-herbert-base")
    sentence = nlp("Sentence to parse.")

    You can also load your own COMBO model:

    from combo.predict import COMBO
    
    model_path = "your_model.tar.gz"
    nlp = COMBO.from_pretrained(model_path)
    sentence = nlp("Sentence to parse.")

    COMBO also accepts pre-segmented sentences (or texts):

    from combo.predict import COMBO
    
    model_path = "your_model.tar.gz"
    nlp = COMBO.from_pretrained(model_path)
    tokenized_sentence = ["Sentence", "to", "parse", "."]
    sentence = nlp([tokenized_sentence])

    COMBO as a command-line interface

    CoNLL-U file prediction:

    Input and output are both in *.conllu format.

    combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file --silent
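Each line of a CoNLL-U file encodes one token as 10 tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with sentences separated by blank lines. A minimal sketch of reading one such line in plain Python (the sample annotation below is hand-written for illustration, not actual COMBO output):

```python
# The 10 columns of the CoNLL-U format, in order.
CONLLU_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]

# Illustrative token line (hand-written, not real model output).
line = "1\tSentence\tsentence\tNOUN\t_\tNumber=Sing\t0\troot\t_\t_"

# Map column names to values for easy access.
token = dict(zip(CONLLU_COLUMNS, line.split("\t")))
print(token["FORM"], token["DEPREL"])  # -> Sentence root
```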

    Raw text prediction:

    Works only for models whose input was raw text.

    Input: one sentence per line.

    Output: a list of token JSONs, one per line.

    combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format
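Assuming one JSON object per output line, the file can be read back with the standard json module. The field names in the sample line below are illustrative only; the actual keys depend on the COMBO version:

```python
import json

# Illustrative output line; real field names may differ.
raw_line = '{"token": "Sentence", "lemma": "sentence", "deprel": "root"}'

# Parse one line of the tool's JSON-lines output.
record = json.loads(raw_line)
print(record["token"])  # -> Sentence
```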

    Console prediction:

    Works only for models whose input was raw text.

    Interactive testing in the console: the model is loaded once, then you type sentences directly at the prompt.

    combo --mode predict --model_path your_model_tar_gz --input_file "-" --nosilent

    Advanced

    There are two tokenizers: whitespace-based and spaCy-based (the en_core_web_sm model).

    Select one with --predictor_name combo (whitespace) or --predictor_name combo-spacy (spaCy-based, the default).
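The practical difference: a whitespace tokenizer splits on spaces only, so punctuation stays attached to the preceding word, while a linguistic tokenizer such as spaCy's separates it. A quick sketch of the whitespace behaviour in plain Python:

```python
text = "Sentence to parse."

# Whitespace tokenization: punctuation is not split off.
whitespace_tokens = text.split()
print(whitespace_tokens)  # -> ['Sentence', 'to', 'parse.']

# A linguistic tokenizer (e.g. spaCy's en_core_web_sm) would instead
# produce ['Sentence', 'to', 'parse', '.'] -- one reason combo-spacy
# is the default predictor.
```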