Installation
Clone this repository and run:
python setup.py develop
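A quick sanity check after installation (a sketch; it assumes the package installs under the name combo):
# Verify that the installed package is importable.
import combo
print("combo imported from", combo.__file__)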
Problems & solutions
- jsonnet installation error: install it from conda-forge instead:
conda install -c conda-forge jsonnet=0.15.0
Training
Command:
combo --mode train \
--training_data_path your_training_path \
--validation_data_path your_validation_path
Options:
combo --helpfull
Examples (training and validation data paths omitted for clarity):
- train on GPU 0:
combo --mode train --cuda_device 0
- use pretrained embeddings:
combo --mode train --pretrained_tokens your_pretrained_embeddings_path --embedding_dim your_embeddings_dim
- use pretrained transformer embeddings:
combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
- predict only the dependency tree:
combo --mode train --targets head,deprel
- use part-of-speech tags for predicting only the dependency tree:
combo --mode train --targets head,deprel --features token,char,upostag
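The flags above can be combined, for example:
combo --mode train --cuda_device 0 --targets head,deprel --features token,char,upostag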
For advanced configuration, see the Configuration section below.
Prediction
CoNLL-U file prediction:
Input and output are both in the *.conllu format.
combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file --silent
Console
Works only for models trained on raw-text input.
Interactive testing in the console: load the model and simply type a sentence.
combo --mode predict --model_path your_model_tar_gz --input_file "-" --nosilent
Raw text
Works only for models trained on raw-text input.
Input: one sentence per line.
Output: a list of token JSONs (see the hypothetical example below).
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format
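The exact keys depend on the trained targets; a purely hypothetical token JSON with CoNLL-U-style fields might look like this (shown as a Python dict):
# Hypothetical shape only; verify the actual keys against your model's output.
token_json = {
    "id": 1,
    "token": "Sentence",
    "lemma": "sentence",
    "upostag": "NOUN",
    "head": 0,
    "deprel": "root",
}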
Advanced
There are two tokenizers: whitespace-based and spaCy-based (the en_core_web_sm model).
Use either --predictor_name semantic-multitask-predictor or --predictor_name semantic-multitask-predictor-spacy.
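For example, to predict raw text with the spaCy-based tokenizer:
combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --silent --noconllu_format --predictor_name semantic-multitask-predictor-spacy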
Python
import combo.predict as predict
model_path = "your_model.tar.gz"
nlp = predict.SemanticMultitaskPredictor.from_pretrained(model_path)
sentence = nlp("Sentence to parse.")
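A minimal sketch of inspecting the result, assuming the returned sentence exposes a tokens list with CoNLL-U-style attributes (verify the names against combo's data classes):
# Assumed attribute names (token, head, deprel); check combo's data classes.
for token in sentence.tokens:
    print(token.token, token.head, token.deprel)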
Configuration
Advanced
The config template config.template.jsonnet is written in the allennlp format, so you can freely modify it.
It contains configuration for all the training/model parameters (learning rates, number of epochs, etc.).
Some of them use jsonnet syntax to pull values from configuration flags, but most can be edited directly in the file.
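As a rough illustration of that mechanism, a template can read external variables via std.extVar, which the CLI fills in from its flags. A sketch using the _jsonnet Python bindings (the variable name is illustrative; evaluation fails unless every external variable the template references is supplied):
# Not COMBO's own loading code; just the jsonnet ext-var mechanism.
import json

import _jsonnet

config_str = _jsonnet.evaluate_file(
    "config.template.jsonnet",
    # Illustrative name; supply every std.extVar(...) the template uses.
    ext_vars={"training_data_path": "train.conllu"},
)
config = json.loads(config_str)
print(sorted(config))  # top-level configuration keys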