From 82ba2f40f0366cf631ebdfc621d37b4f344232a5 Mon Sep 17 00:00:00 2001
From: Maja Jablonska <majajjablonska@gmail.com>
Date: Mon, 20 Nov 2023 22:39:49 +1100
Subject: [PATCH] Prediction.md and Troubleshooting.md

---
 docs/Prediction.md      | 66 +++++++++++++++++++++++++++++++++++++++++
 docs/Troubleshooting.md | 13 ++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 docs/Prediction.md
 create mode 100644 docs/Troubleshooting.md

diff --git a/docs/Prediction.md b/docs/Prediction.md
new file mode 100644
index 0000000..7f836de
--- /dev/null
+++ b/docs/Prediction.md
@@ -0,0 +1,66 @@
+# Prediction
+
+## COMBO as a Python library
+
+Pre-trained models can be downloaded automatically with the ```from_pretrained```
+method. Select a model name from the list of UD-trained COMBO models and pass it as the argument of ```from_pretrained```.
+
+```python
+from combo.predict import COMBO
+
+nlp = COMBO.from_pretrained("model-prototype")
+sentence = nlp("Sentence to parse.")
+```
+
+You can also load your own COMBO model:
+
+```python
+from combo.predict import COMBO
+
+model_path = "your_model.tar.gz"
+nlp = COMBO.from_pretrained(model_path)
+sentence = nlp("Sentence to parse.")
+```
+
+COMBO also accepts pre-segmented sentences (or texts):
+
+```python
+from combo.predict import COMBO
+
+model_path = "your_model.tar.gz"
+nlp = COMBO.from_pretrained(model_path)
+tokenized_sentence = ["Sentence", "to", "parse", "."]
+sentence = nlp([tokenized_sentence])
+```
+
+By default, COMBO uses the LAMBO tokenizer.
+
+## COMBO as a command-line interface
+
+Input and output are both in the ```*.conllu``` format.
+
+```bash
+combo --mode predict --model_path your_model_tar_gz --input_file your_conllu_file --output_file your_output_file
+```
+
+### Raw text prediction
+
+Works only for models whose input was text-based.
+
+Input: one sentence per line.
+
+Output: a CoNLL-U file.
+
+```bash
+combo --mode predict --model_path your_model_tar_gz --input_file your_text_file --output_file your_output_file --noconllu_format
+```
+
+### Console prediction
+
+Works only for models whose input was text-based.
+
+Interactive testing in the console: load the model and type sentences directly into the console.
+
+```bash
+combo --mode predict --model_path your_model_tar_gz --input_file "-"
+```
\ No newline at end of file
diff --git a/docs/Troubleshooting.md b/docs/Troubleshooting.md
new file mode 100644
index 0000000..37dd340
--- /dev/null
+++ b/docs/Troubleshooting.md
@@ -0,0 +1,13 @@
+# A few common problems
+
+## Downloading a model
+
+When downloading a model with the ```from_pretrained``` method, the downloaded file may be
+incomplete, e.g. due to a network error. The following error:
+
+```
+EOFError: Compressed file ended before the end-of-stream marker was reached
+```
+
+means that the cache directory (by default ```$HOME/.combo```) contains a corrupted file.
+Deleting the file and downloading the model again should fix the problem.
\ No newline at end of file
--
GitLab
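The cache-cleanup step described in docs/Troubleshooting.md above can be sketched as a small helper. This is a sketch, not part of COMBO: ```clear_combo_cache``` is a hypothetical function name, and matching ```*.tar.gz``` archives is an assumption about the cache layout; only the default ```$HOME/.combo``` location comes from the docs.

```python
import pathlib


def clear_combo_cache(cache_dir: pathlib.Path) -> list:
    """Delete cached model archives so the next ``from_pretrained``
    call downloads a fresh copy.

    NOTE: matching ``*.tar.gz`` is an assumption about the cache
    layout; inspect the directory first if unsure.
    """
    removed = []
    if cache_dir.is_dir():
        for archive in sorted(cache_dir.glob("*.tar.gz")):
            archive.unlink()  # remove the (possibly corrupted) download
            removed.append(archive)
    return removed
```

For example, calling ```clear_combo_cache(pathlib.Path.home() / ".combo")``` before retrying ```COMBO.from_pretrained(...)``` should clear any partially downloaded archive.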