Training
Basic command:
combo --mode train \
--training_data_path your_training_path \
--validation_data_path your_validation_path
Options:
combo --helpfull
Examples (training/validation data paths omitted for clarity):
- train on GPU 0:
  combo --mode train --cuda_device 0
- use pretrained embeddings:
  combo --mode train --pretrained_tokens your_pretrained_embeddings_path --embedding_dim your_embeddings_dim
- use pretrained transformer embeddings:
  combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
- train only a dependency parser:
  combo --mode train --targets head,deprel
- use additional features (e.g. part-of-speech tags) for training a dependency parser (token and char are default features):
  combo --mode train --targets head,deprel --features token,char,upostag
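The examples above leave out the data paths. A complete call might look like the sketch below, which wraps one combination of the flags in a shell function. The function name train_parser is made up for illustration, and the sketch assumes that --cuda_device can be combined with --targets and --features in a single invocation.

```shell
# Hypothetical wrapper (not part of COMBO): train only the dependency
# parser on GPU 0, using part-of-speech tags as an additional feature.
# Assumption: the flags from the separate examples can be combined.
train_parser() {
  # $1: training data path, $2: validation data path
  combo --mode train \
    --training_data_path "$1" \
    --validation_data_path "$2" \
    --cuda_device 0 \
    --targets head,deprel \
    --features token,char,upostag
}

# Usage (paths are placeholders):
# train_parser your_training_path your_validation_path
```

Keeping the paths as positional arguments makes it easy to reuse the same flag set across treebanks.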
Enhanced Dependencies
Enhanced Dependencies are described here. Training an enhanced graph prediction model requires data pre-processing.
Data pre-processing
The organisers of the IWPT20 shared task distributed the data sets and a data pre-processing script, enhanced_collapse_empty_nodes.pl. If you wish to train a model on IWPT20 data, apply this script to the training and validation data sets before training the COMBO EUD model.
The script is part of the UD tools repository.
perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
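The command above converts a single file, while the script has to be applied to both the training and the validation set. A small loop can handle any number of files, as in the sketch below. It assumes the script is in the current directory (or that COLLAPSE_SCRIPT points to it). The helper name collapse_all and the COLLAPSE_SCRIPT variable are illustrative only, and the .fixed.conllu naming simply follows the example above.

```shell
# Hypothetical helper (not part of COMBO or UD tools): run the IWPT20
# pre-processing script on every CoNLL-U file given as an argument.
# Assumes enhanced_collapse_empty_nodes.pl is in the current directory;
# set COLLAPSE_SCRIPT to point elsewhere.
collapse_all() {
  script="${COLLAPSE_SCRIPT:-enhanced_collapse_empty_nodes.pl}"
  for f in "$@"; do
    # training.conllu -> training.fixed.conllu
    out="${f%.conllu}.fixed.conllu"
    perl "$script" "$f" > "$out" || return 1
  done
}

# Usage (file names are placeholders):
# collapse_all training.conllu validation.conllu
```

The function stops at the first file the script fails on, so a partial conversion does not go unnoticed.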
Training EUD model
combo --mode train \
--training_data_path your_preprocessed_training_path \
--validation_data_path your_preprocessed_validation_path \
--targets feats,upostag,xpostag,head,deprel,lemma,deps \
--config_path config.graph.template.jsonnet
Configuration
Advanced
The config template config.template.jsonnet follows the allennlp configuration format, so you can freely modify it.
It covers all training and model parameters (learning rates, number of epochs, etc.).
Some of them use jsonnet syntax to read their values from configuration flags; most can be edited directly in the file.
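One way to use a modified configuration is to edit a copy of the template and pass it with --config_path, the flag shown in the EUD example above. The sketch below assumes that --config_path accepts any config in the same format. The function name train_with_config and the file name my_config.jsonnet are placeholders, and the location of the template depends on your checkout.

```shell
# Hypothetical wrapper (not part of COMBO): train with an edited copy
# of the config template.
# Assumption: --config_path accepts any allennlp-style jsonnet config.
train_with_config() {
  # $1: training data path, $2: validation data path, $3: edited config file
  combo --mode train \
    --training_data_path "$1" \
    --validation_data_path "$2" \
    --config_path "$3"
}

# Usage (paths are placeholders):
# cp config.template.jsonnet my_config.jsonnet   # then edit my_config.jsonnet
# train_with_config your_training_path your_validation_path my_config.jsonnet
```

Working on a copy keeps the original template available as a reference for the default values.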