diff --git a/docs/training.md b/docs/training.md
index 9dc430a782baacb7344e95d29a7bf9066b1df613..f4b6d382699dcfea1b63e2657f51c4b848ef0162 100644
--- a/docs/training.md
+++ b/docs/training.md
@@ -43,7 +43,27 @@ Examples (for clarity without training/validation data paths):
 ```bash
 combo --mode train --targets head,deprel --features token,char,upostag
 ```
-
+
+## Enhanced UD
+
+Training a model with Enhanced UD prediction **requires** data pre-processing.
+
+```bash
+combo --mode train \
+    --training_data_path your_preprocessed_training_path \
+    --validation_data_path your_preprocessed_validation_path \
+    --targets feats,upostag,xpostag,head,deprel,lemma,deps \
+    --config_path config.graph.template.jsonnet
+```
+### Data pre-processing
+Download data from [IWPT20 Shared Task](https://universaldependencies.org/iwpt20/data.html).
+It contains `enhanced_collapse_empty_nodes.pl` script which is required as pre-processing step.
+Apply this script to training and validation data.
+
+```bash
+perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
+```
+
 ## Configuration
 
 ### Advanced