  • Training

    Basic command:

    combo --mode train \
          --training_data_path your_training_path \
          --validation_data_path your_validation_path

    Options:

    combo --helpfull

    Examples (training/validation data paths omitted for clarity):

    • train on GPU 0:

      combo --mode train --cuda_device 0
    • use pretrained embeddings:

      combo --mode train --pretrained_tokens your_pretrained_embeddings_path --embedding_dim your_embeddings_dim
    • use pretrained transformer embeddings:

      combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
    • train only a dependency parser:

      combo --mode train --targets head,deprel
    • use additional features (e.g. part-of-speech tags) for training a dependency parser (token and char are the default features):

      combo --mode train --targets head,deprel --features token,char,upostag
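The examples above each show one option in isolation. As a rough sketch of how they might be combined with the data paths from the basic command, the wrapper below trains a dependency parser with part-of-speech features on GPU 0. The function name `train_parser_on_gpu` and the `COMBO_BIN` variable are hypothetical helpers introduced here, not part of COMBO; only the flags themselves come from this document, and it is assumed they can be freely combined.

```shell
# COMBO_BIN is a hypothetical override (not a COMBO option); it defaults to the
# real CLI and can be set to `echo` to preview the arguments without training.
COMBO_BIN="${COMBO_BIN:-combo}"

# Hypothetical wrapper: $1 = training data path, $2 = validation data path.
train_parser_on_gpu() {
    "$COMBO_BIN" --mode train \
        --training_data_path "$1" \
        --validation_data_path "$2" \
        --cuda_device 0 \
        --targets head,deprel \
        --features token,char,upostag
}

# Usage (starts a real training run, so it is left commented out):
# train_parser_on_gpu your_training_path your_validation_path
```

Setting `COMBO_BIN=echo` before calling the function prints the argument list instead of starting a run, which is a cheap way to check the flags.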

    Enhanced Dependencies

    Enhanced Dependencies are described in the Universal Dependencies documentation. Training an enhanced graph prediction model requires data pre-processing.

    Data pre-processing

    The organisers of the IWPT20 shared task distributed the data sets together with a data pre-processing script, enhanced_collapse_empty_nodes.pl. If you wish to train a model on IWPT20 data, apply this script to both the training and the validation data sets before training the COMBO EUD model.

    perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
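Since both the training and the validation sets need the same treatment, the command above can be wrapped in a small loop. This is a minimal sketch: the function name `preprocess_conllu` and the `PERL_BIN` variable are hypothetical, and the output naming (`name.conllu` to `name.fixed.conllu`) simply mirrors the example above.

```shell
# PERL_BIN is a hypothetical override (handy for dry runs); defaults to perl.
PERL_BIN="${PERL_BIN:-perl}"

# Hypothetical helper: $1 = path to enhanced_collapse_empty_nodes.pl,
# remaining arguments = CoNLL-U files to pre-process.
preprocess_conllu() {
    script=$1
    shift
    for input in "$@"; do
        # training.conllu -> training.fixed.conllu
        output="${input%.conllu}.fixed.conllu"
        "$PERL_BIN" "$script" "$input" > "$output" || return 1
    done
}

# Usage (file names are placeholders):
# preprocess_conllu enhanced_collapse_empty_nodes.pl training.conllu validation.conllu
```

The resulting `*.fixed.conllu` files are what would then be passed as the pre-processed training and validation paths.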

    Training EUD model

    combo --mode train \
          --training_data_path your_preprocessed_training_path \
          --validation_data_path your_preprocessed_validation_path \
          --targets feats,upostag,xpostag,head,deprel,lemma,deps \
          --config_path config.graph.template.jsonnet

    Configuration

    Advanced

    The config template config.template.jsonnet follows the AllenNLP configuration format, so you can modify it freely. It contains settings for all training and model parameters (learning rates, number of epochs, etc.). Some parameters use Jsonnet syntax to read their values from configuration flags, but most can be edited directly in the file.
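One possible workflow for the above, sketched under assumptions: copy the template, edit the copy, and point training at it with `--config_path` (the flag used in the EUD example). The function name `train_with_custom_config`, the `COMBO_BIN` variable and the file name `my_config.jsonnet` are hypothetical; whether a copied template works from an arbitrary location has not been verified here.

```shell
# COMBO_BIN is a hypothetical override (not a COMBO option); defaults to the CLI.
COMBO_BIN="${COMBO_BIN:-combo}"

# Hypothetical helper:
#   $1 = shipped template, $2 = editable copy,
#   $3 = training data path, $4 = validation data path.
train_with_custom_config() {
    template=$1
    custom=$2
    # Create the copy only once so later manual edits are not overwritten.
    [ -f "$custom" ] || cp "$template" "$custom" || return 1
    "$COMBO_BIN" --mode train \
        --training_data_path "$3" \
        --validation_data_path "$4" \
        --config_path "$custom"
}

# Usage (edit my_config.jsonnet between the copy and the real training run):
# train_with_custom_config config.template.jsonnet my_config.jsonnet \
#     your_training_path your_validation_path
```

Working on a copy keeps the shipped template intact, so it stays available as a reference for the default values.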