## Description
The script [train_ud_version.py](/combo/ud_script/train_ud_version.py) allows for training multiple combo models on a specific UD treebank version.

To run the script, three parameters are required:
- `output_directory` - path to the location where training results will be saved
- `treebank_id`
- `treebank_version`

To find the `treebank_id` and `treebank_version`, visit https://universaldependencies.org/#download. The `treebank_version` is indicated at the beginning of the UD release entry (for example `2.13`), while the `treebank_id` is the value at the end of the link used to download it (for example `1-5287`). See the attached image where both values for UD2.11 are highlighted in yellow.

The script will automatically download and extract the UD data into the folder `output_directory`/ud_treebanks-`treebank_version`. Then, it creates a subfolder `output_directory/results` containing:
- `serialization_directories` - folder with training results
- `completed_training.txt` - a text file with the names of UD treebanks on which training was successfully completed
- `skipped_training.csv` - a CSV file with two columns, the first containing names of UD treebanks, the second listing reasons why training was skipped or failed. Possible reasons include:
    - Dev or test or train file missing - the UD directory is expected to contain a .conllu file for each of train, dev, and test, with the split name in the file name. Otherwise, this error is thrown.
    - Training file less than 1000 bytes - if the training file is smaller than 1000 bytes, training is skipped.
    - Training file corrupted - the number of columns is less than 10.
    - Specify transformer model for language code: `<lang_code>` - no BERT model was assigned to the specified language. To address this, modify the `LANG2TRANSFORMER` variable in the file [constants](/combo/ud_script/constants.py).
    - Command ... returned non-zero exit status 1 - an error was thrown during the training process. You need to examine the logs from this particular training run to understand what happened.

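
The first three skip reasons above amount to simple file checks, which can be sketched as follows. This is only an illustration under stated assumptions, not the code in `train_ud_version.py`: the helper names `find_split_file` and `skip_reason` are hypothetical, and the real script may match file names or count columns differently.

```python
from pathlib import Path
from typing import Optional

MIN_TRAIN_BYTES = 1000  # size threshold named in the description
CONLLU_COLUMNS = 10     # CoNLL-U token lines have 10 tab-separated fields


def find_split_file(treebank_dir: Path, split: str) -> Optional[Path]:
    """Return the first .conllu file whose name contains `split` (hypothetical helper)."""
    matches = sorted(p for p in treebank_dir.glob("*.conllu") if split in p.name)
    return matches[0] if matches else None


def skip_reason(treebank_dir: Path) -> Optional[str]:
    """Return a reason to skip training, or None if all checks pass."""
    files = {s: find_split_file(treebank_dir, s) for s in ("train", "dev", "test")}
    if any(path is None for path in files.values()):
        return "Dev or test or train file missing"

    train_file = files["train"]
    if train_file.stat().st_size < MIN_TRAIN_BYTES:
        return "Training file less than 1000 bytes"

    with train_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Comment lines and blank sentence separators carry no columns.
            if not line or line.startswith("#"):
                continue
            if len(line.split("\t")) < CONLLU_COLUMNS:
                return "Training file corrupted"
    return None
```

Calling `skip_reason(Path("ud_treebanks-2.13/UD_Polish-PDB"))` (an illustrative path) would return one of the three messages, or `None` when the treebank looks usable.
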
If the script was interrupted at some point, you can rerun it with the same command. Based on the values in `completed_training.txt`, the rerun script will omit training on UD treebanks that already have a model.

Some of the models need an adjusted value of `word_batch_size`. The default value will be used unless you specify a `<UD treebank> <word_batch_size>` pair in the `UD_2_BATCH_SIZE` constant in [constants](/combo/ud_script/constants.py).

## Example usage
Terminal command:
```
python train_ud_version.py --treebank_id 1-5287 --treebank_version 2.13 --output_directory C:\Users\abc\Desktop
```
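
When the command above is rerun after an interruption, the resume and batch-size behaviour described earlier could look roughly like the sketch below. It assumes that `completed_training.txt` holds one treebank name per line and that the batch-size overrides behave like a dictionary. Both are assumptions about the file and the constant, and the function names are hypothetical rather than taken from the script.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def read_completed(results_dir: Path) -> Set[str]:
    """Read treebank names from completed_training.txt (assumed: one name per line)."""
    completed_file = results_dir / "completed_training.txt"
    if not completed_file.exists():
        return set()
    lines = completed_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def treebanks_to_train(all_treebanks: Iterable[str], results_dir: Path) -> List[str]:
    """Keep only the treebanks that do not have a trained model yet."""
    completed = read_completed(results_dir)
    return [name for name in all_treebanks if name not in completed]


def batch_size_for(treebank: str, overrides: Dict[str, int], default: int) -> int:
    """Use the per-treebank override if present, otherwise the default value."""
    return overrides.get(treebank, default)
```

With this shape, a rerun recomputes the work list from the results folder, so no separate state file is needed beyond `completed_training.txt`.
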