From 51be66807bc514b22cddff95eb3adf9201030b91 Mon Sep 17 00:00:00 2001
From: Mateusz Klimaszewski <mk.klimaszewski@gmail.com>
Date: Fri, 11 Dec 2020 13:50:50 +0100
Subject: [PATCH] Add documentation EUD model training.

---
 docs/training.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/training.md b/docs/training.md
index 9dc430a..f4b6d38 100644
--- a/docs/training.md
+++ b/docs/training.md
@@ -43,7 +43,27 @@ Examples (for clarity without training/validation data paths):
 ```bash
 combo --mode train --targets head,deprel --features token,char,upostag
 ```
- 
+
+## Enhanced UD
+
+Training a model with Enhanced UD prediction **requires** data pre-processing.
+
+```bash
+combo --mode train \
+      --training_data_path your_preprocessed_training_path \
+      --validation_data_path your_preprocessed_validation_path \
+      --targets feats,upostag,xpostag,head,deprel,lemma,deps \
+      --config_path config.graph.template.jsonnet
+```
+### Data pre-processing
+Download the data from the [IWPT20 Shared Task](https://universaldependencies.org/iwpt20/data.html).
+It contains the `enhanced_collapse_empty_nodes.pl` script, which is required as a pre-processing step.
+Apply this script to both the training and validation data.
+
+```bash
+perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
+```
+
 ## Configuration
 
 ### Advanced
-- 
GitLab