    BERT Inspect

BERT-based interpretable feature extractor. This package combines BERT fine-tuning and Integrated Gradients analysis of the fine-tuned model in a single NLP worker.

    Outputs

```
├── clouds
│   ├── class1
│   │   ├── keyword_1.png
│   │   ├── keyword_1.json
│   │   ├── keyword_2.png
│   │   └── keyword_2.json
│   └── class2
│       ├── keyword_1.png
│       ├── keyword_1.json
│       ├── keyword_2.png
│       └── keyword_2.json
├── positive.json
└── negative.json
```
• positive.json: JSON file with the top average positive attributions of tokens for each class
• negative.json: JSON file with the top average negative attributions of tokens for each class
• clouds/<classname>/*.png: rendered word cloud of the words that define the context of each top token
• clouds/<classname>/*.json: JSON-formatted word cloud of the words that define the context of each top token
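
The exact JSON schemas are not documented above; as a purely hypothetical illustration, positive.json could map each class to its top tokens and their average attribution scores:

```json
{
  "class1": {"token_1": 0.42, "token_2": 0.31},
  "class2": {"token_1": 0.27, "token_2": 0.19}
}
```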

    Mount directories

/workdir/cache: Directory where the base_model will be downloaded and cached

    Config

Fine-tuning stage

• base_model - Pretrained Hugging Face BERT model (default dkleczek/bert-base-polish-cased-v1)
• pooling - Type of pooling applied to the BERT token embeddings before the classification head (one of mean, cls, max; see the sketch after this list)
• layers_frozen - Number of BERT layers frozen during fine-tuning (max 12, in which case only the classification head is trained)
• max_epochs - Maximum number of training epochs
• early_stopping - If true, training stops when the F1 score has not improved for 3 epochs
• truncation - Which side of documents longer than 510 tokens is truncated (end / front)
• valid_frac - Fraction of the data used for validation
• learning_rate - Learning rate
• classificator_size - Number of hidden neurons in the classification head
• dropout - Dropout rate
• batch_size - Batch size (~10 is the maximum on a 2080 Ti with no frozen layers)
• num_workers - Number of CPU workers feeding data from disk
• weighted_sampling - If true, data is sampled with replacement so that a sample of any class is equally likely to be drawn
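
A minimal sketch of the three pooling modes, as an illustration of the mechanism rather than this package's actual implementation:

```python
import torch

def pool(hidden: torch.Tensor, mask: torch.Tensor, mode: str) -> torch.Tensor:
    """hidden: (batch, seq_len, dim) BERT token embeddings; mask: (batch, seq_len)."""
    if mode == "cls":
        return hidden[:, 0]                            # embedding of the [CLS] token
    m = mask.unsqueeze(-1)                             # (batch, seq_len, 1)
    if mode == "mean":
        return (hidden * m).sum(dim=1) / m.sum(dim=1)  # mask-aware average
    if mode == "max":
        return hidden.masked_fill(m == 0, float("-inf")).max(dim=1).values
    raise ValueError(f"unknown pooling mode: {mode}")
```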
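
Putting the options together, a fine-tuning configuration could look as follows. YAML is an assumption here, as is every value except the documented base_model default:

```yaml
base_model: dkleczek/bert-base-polish-cased-v1
pooling: mean
layers_frozen: 6
max_epochs: 20
early_stopping: true
truncation: end
valid_frac: 0.1
learning_rate: 2e-5
classificator_size: 256
dropout: 0.1
batch_size: 8
num_workers: 4
weighted_sampling: true
```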

    Analysis stage

• num_steps - Number of Integrated Gradients steps
• attention_layer_id - Layer from which attentions are extracted (0 is the most interpretable, as no word diffusion has taken place at that stage)
• internal_batch - Batch size for the Integrated Gradients steps
• device_name - Name of the device on which embedding & Integrated Gradients are computed (default cuda:0)
• subword_tokens - If set to true, subword tokens are used (e.g. Kowal ##ski).
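
A minimal sketch of how these options could map onto a Captum-style Integrated Gradients run over the embedding layer; this is an assumed illustration, not this package's actual code:

```python
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "dkleczek/bert-base-polish-cased-v1"   # base_model
device = torch.device("cuda:0")                # device_name

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).to(device).eval()

enc = tokenizer("Jan Kowalski mieszka w Warszawie.", return_tensors="pt").to(device)

def forward(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

# Attribute the class logit to the input embedding layer.
lig = LayerIntegratedGradients(forward, model.bert.embeddings)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    additional_forward_args=(enc["attention_mask"],),
    target=1,                    # index of the class being explained
    n_steps=50,                  # num_steps
    internal_batch_size=8,       # internal_batch
)
# One attribution score per subword token (cf. the subword_tokens option).
token_scores = attributions.sum(dim=-1).squeeze(0)
```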