From 5493eec76c74eb0febb3fc688b127e2f9858bc84 Mon Sep 17 00:00:00 2001
From: Maja Jablonska <majajjablonska@gmail.com>
Date: Wed, 22 Nov 2023 01:42:04 +1100
Subject: [PATCH] More documentation

---
 docs/Default Model.md |  4 ++++
 docs/Prediction.md    |  6 ++---
 docs/Training.md      | 51 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100644 docs/Default Model.md

diff --git a/docs/Default Model.md b/docs/Default Model.md
new file mode 100644
index 0000000..66ac537
--- /dev/null
+++ b/docs/Default Model.md
@@ -0,0 +1,4 @@
+# Default model configurations
+
+For convenience, the file ```default_model.py``` contains a few default configurations for ComboModel,
+the Universal Dependencies DatasetReader, and a few other commonly used components.
\ No newline at end of file
diff --git a/docs/Prediction.md b/docs/Prediction.md
index 7f836de..f749fea 100644
--- a/docs/Prediction.md
+++ b/docs/Prediction.md
@@ -40,7 +40,7 @@ By default, COMBO uses the LAMBO tokenizer.
 Input and output are both in ```*.conllu``` format.
 
 ```bash
-combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file
+python combo/main.py --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file
 ```
 
 ### Raw text prediction
@@ -52,7 +52,7 @@ Input: one sentence per line.
 Output: CONLL-u file.
 
 ```bash
-combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --noconllu_format
+python combo/main.py --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --noconllu_format
 ```
 
 ### Console prediction
@@ -62,5 +62,5 @@ Works for models where input was text-based only.
 Interactive testing in console (load model and just type sentence in console).
 
 ```bash
-combo --mode predict --model_path your_model_tar_gz --input_file "-"
+python combo/main.py --mode predict --model_path your_model_tar_gz --input_file "-"
 ```
\ No newline at end of file
diff --git a/docs/Training.md b/docs/Training.md
index 7ac726c..d4f724b 100644
--- a/docs/Training.md
+++ b/docs/Training.md
@@ -3,7 +3,7 @@
 Basic command:
 
 ```bash
-combo --mode train \
+python combo/main.py --mode train \
     --training_data_path your_training_path \
     --validation_data_path your_validation_path
 ```
@@ -11,7 +11,7 @@
 Options:
 
 ```bash
-combo --helpfull
+python combo/main.py --helpfull
 ```
 
 ## Examples
@@ -20,7 +20,50 @@ For clarity, the training and validation data paths are omitted.
 Train on multiple accelerators (default: train on all available ones)
 
 ```bash
-combo --mode train
-    --n_cuda_devices 8
+python combo/main.py --mode train --n_cuda_devices 8
 ```
 
+Use pretrained transformer embeddings:
+
+```bash
+python combo/main.py --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
+```
+
+Train only a dependency parser:
+
+```bash
+python combo/main.py --mode train --targets head,deprel
+```
+
+Use additional features (e.g. part-of-speech tags) for training a dependency parser
+(```token``` and ```char``` are the default features):
+
+```bash
+python combo/main.py --mode train --targets head,deprel --features token,char,upostag
+```
+
+## Custom configuration
+
+Use a custom configuration:
+
+```bash
+python combo/main.py --config_path configuration.json
+```
+
+Use only the configuration file and discard all flags (including default flag values; without
+```--use_pure_config```, the default flag values will override the configuration):
+
+```bash
+python combo/main.py --config_path configuration.json --use_pure_config
+```
+
+## Finetuning
+
+Finetune a pre-trained model:
+
+```bash
+python combo/main.py --mode train --finetune \
+    --finetuning_training_data_path your_training_path \
+    --finetuning_validation_data_path your_validation_path \
+    --model_path pretrained_model_path
+```
\ No newline at end of file
--
GitLab
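For reference, the training flags documented in ```docs/Training.md``` above can be combined in a single command. A minimal sketch, using only flags that appear in the patch; the data paths and the transformer name are placeholders, not values from the repository:

```bash
# Sketch combining the flags documented above; replace the placeholder values with your own.
python combo/main.py --mode train \
    --training_data_path your_training_path \
    --validation_data_path your_validation_path \
    --pretrained_transformer_name your_chosen_pretrained_transformer \
    --targets head,deprel \
    --features token,char,upostag \
    --n_cuda_devices 8
```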