# Training

Basic command:

```bash
combo --mode train \
    --training_data_path your_training_path \
    --validation_data_path your_validation_path
```

Options:

```bash
combo --helpfull
```

Examples (training/validation data paths omitted for clarity):

* train on GPU 0:

    ```bash
    combo --mode train --cuda_device 0
    ```

* use pretrained embeddings:

    ```bash
    combo --mode train --pretrained_tokens your_pretrained_embeddings_path --embedding_dim your_embeddings_dim
    ```

* use pretrained transformer embeddings:

    ```bash
    combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
    ```

* train only a dependency parser:

    ```bash
    combo --mode train --targets head,deprel
    ```

* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are default features):

    ```bash
    combo --mode train --targets head,deprel --features token,char,upostag
    ```

## Enhanced Dependencies

Enhanced Dependencies are described [here](https://universaldependencies.org/u/overview/enhanced-syntax.html). Training an enhanced graph prediction model **requires** data pre-processing.

### Data pre-processing

The organisers of the [IWPT20 shared task](https://universaldependencies.org/iwpt20/data.html) distributed the data sets along with a pre-processing script, `enhanced_collapse_empty_nodes.pl`, which is part of the [UD tools repository](https://github.com/UniversalDependencies/tools/). If you wish to train a model on the IWPT20 data, apply this script to both the training and validation data sets before training the COMBO EUD model:

```bash
perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
perl enhanced_collapse_empty_nodes.pl validation.conllu > validation.fixed.conllu
```

### Training EUD model

```bash
combo --mode train \
    --training_data_path your_preprocessed_training_path \
    --validation_data_path your_preprocessed_validation_path \
    --targets feats,upostag,xpostag,head,deprel,lemma,deps \
    --config_path config.graph.template.jsonnet
```

## Configuration

### Advanced

The config template [config.template.jsonnet](config.template.jsonnet) is written in the `allennlp` format, so you can freely modify it. It contains the configuration of all training/model parameters (learning rates, number of epochs, etc.). Some of them use `jsonnet` syntax to read values from configuration flags, but most of them can be modified directly in the file.
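For illustration, here is a minimal sketch of that flag-reading pattern: a `jsonnet` template pulls one value from an external variable supplied by the caller, while other parameters are plain literals that you edit in place. The variable name `num_epochs` and the values below are illustrative assumptions, not taken from the actual COMBO template.

```jsonnet
// Minimal sketch, not the actual COMBO template.
// "num_epochs" is read from an external variable that the caller must
// supply (e.g. via jsonnet's --ext-str argument, or an environment
// variable in allennlp-style setups).
local num_epochs = std.parseInt(std.extVar("num_epochs"));

{
  trainer: {
    num_epochs: num_epochs,  // value comes from the configuration flag
    learning_rate: 0.002,    // hard-coded value; edit it directly here
  },
}
```

Parameters wired to `std.extVar` change when you pass the corresponding flag; everything else is modified directly in the template file.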