    # Training
    
    Basic command:
    ```bash
    combo --mode train \
          --training_data_path your_training_path \
          --validation_data_path your_validation_path
    ```
    
    To list all available options:
    ```bash
    combo --helpfull
    ```
    
    Examples (training and validation data paths are omitted for clarity):
    
    * train on GPU 0:
    
        ```bash
        combo --mode train --cuda_device 0
        ```
    
    * use pretrained embeddings:
    
        ```bash
        combo --mode train --pretrained_tokens your_pretrained_embeddings_path --embedding_dim your_embeddings_dim
        ```
    
    * use pretrained transformer embeddings:
    
        ```bash
        combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
        ```
    
    * train only a dependency parser:
    
        ```bash
        combo --mode train --targets head,deprel
        ```
    
    * use additional features (e.g. part-of-speech tags) when training a dependency parser (`token` and `char` are the default features):
    
        ```bash
        combo --mode train --targets head,deprel --features token,char,upostag
        ```
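
    The flags above can be combined. For example (paths again omitted, and the transformer name is a placeholder), training only a dependency parser on GPU 0 with pretrained transformer embeddings:

    ```bash
    combo --mode train \
          --cuda_device 0 \
          --pretrained_transformer_name your_chosen_pretrained_transformer \
          --targets head,deprel
    ```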
    
    ## Enhanced Dependencies
    
    Enhanced Dependencies are described [here](https://universaldependencies.org/u/overview/enhanced-syntax.html). Training an enhanced graph prediction model **requires** data pre-processing.
    
    ### Data pre-processing
    The organisers of the [IWPT20 shared task](https://universaldependencies.org/iwpt20/data.html) distributed the data sets together with a data pre-processing script, `enhanced_collapse_empty_nodes.pl`. If you wish to train a model on the IWPT20 data, apply this script to both the training and validation data sets before training the COMBO EUD model.
    
    ```bash
    perl enhanced_collapse_empty_nodes.pl training.conllu > training.fixed.conllu
    ``` 
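
    The same script must also be applied to the validation file. A small loop (the `.conllu` filenames are illustrative) processes both splits:

    ```bash
    # Pre-process both data splits with the IWPT20 script.
    # The .conllu filenames below are placeholders for your local files.
    for split in training validation; do
        perl enhanced_collapse_empty_nodes.pl "${split}.conllu" > "${split}.fixed.conllu"
    done
    ```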
    
    ### Training EUD model
    
    
    ```bash
    combo --mode train \
          --training_data_path your_preprocessed_training_path \
          --validation_data_path your_preprocessed_validation_path \
          --targets feats,upostag,xpostag,head,deprel,lemma,deps \
          --config_path config.graph.template.jsonnet
    ```
    
    
    ## Configuration
    
    ### Advanced
    The config template [config.template.jsonnet](config.template.jsonnet) follows the `allennlp` format, so you can freely modify it.
    It contains the configuration for all training and model parameters (learning rates, number of epochs, etc.).
    Some of them use `jsonnet` syntax to read values from configuration flags; most, however, can be modified directly in the file.
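
    As an illustration of the two styles just mentioned, the following sketch (not the actual contents of [config.template.jsonnet](config.template.jsonnet); all field names and values here are assumptions) shows how a `jsonnet` config can read some values from configuration flags via external variables while hard-coding others:

    ```jsonnet
    // Illustrative fragment only; field names and values are assumptions.
    {
        trainer: {
            // read from a configuration flag (a jsonnet external variable)
            cuda_device: std.parseInt(std.extVar("cuda_device")),
            // set directly in the file
            num_epochs: 400,
            optimizer: { type: "adam", lr: 0.002 },
        },
    }
    ```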