From 160ee6ea3f05316abb2d91f2103a6d91009e819c Mon Sep 17 00:00:00 2001
From: Mateusz Klimaszewski <mk.klimaszewski@gmail.com>
Date: Sun, 3 Jan 2021 11:46:19 +0100
Subject: [PATCH] Extend documentation with better examples.

---
 README.md        | 14 +++++++++-----
 docs/models.md   | 27 +++++++++++++++++----------
 docs/training.md |  6 +++---
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index c339bda..a9c2113 100644
--- a/README.md
+++ b/README.md
@@ -10,19 +10,24 @@
 </p>
 
 ## Quick start
-Clone this repository and install COMBO (we suggest using virtualenv/conda with Python 3.6+):
+Clone this repository and install COMBO (we suggest creating a virtualenv/conda environment with Python 3.6+, as installation pulls in a number of required packages):
 ```bash
 git clone https://gitlab.clarin-pl.eu/syntactic-tools/clarinbiz/combo.git
 cd combo
 python setup.py develop
 ```
-Run the following lines in your Python console to make predictions with a pre-trained model:
+Run the following commands in your Python console to make predictions with a pre-trained model:
 ```python
 from combo.predict import COMBO
 
 nlp = COMBO.from_pretrained("polish-herbert-base")
-sentence = nlp("Moje zdanie.")
-print(sentence.tokens)
+sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")
+```
+Predictions are accessible as a list of token attributes:
+```python
+print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL'))
+for token in sentence.tokens:
+    print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel))
 ```
 
 ## Details
@@ -31,4 +36,3 @@ print(sentence.tokens)
 - [**Pre-trained models**](docs/models.md)
 - [**Training**](docs/training.md)
 - [**Prediction**](docs/prediction.md)
-
diff --git a/docs/models.md b/docs/models.md
index d4346ff..25a7f70 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,19 +1,26 @@
 # Models
-Pre-trained models are available [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+COMBO provides pre-trained models for:
+- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation, and dependency parsing), trained on treebanks from the [Universal Dependencies repository](https://universaldependencies.org),
+- enhanced dependency parsing, trained on the IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html).
+
+## Manual download
+
+The pre-trained models can be downloaded from [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+
+If you want to use the console version of COMBO, you need to download a pre-trained model manually:
+```bash
+wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
+```
+
+The downloaded model should then be passed to COMBO as a parameter (see [prediction doc](prediction.md)).
 
 ## Automatic download
-Python `from_pretrained` method will download the pre-trained model if the provided name (without the extension .tar.gz) matches one of the names in [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
+The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the .tar.gz extension) from the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/) and pass it as an argument to the `from_pretrained` method:
 ```python
 from combo.predict import COMBO
 nlp = COMBO.from_pretrained("polish-herbert-base")
 ```
-Otherwise it looks for a model in local env.
-
-## Console prediction/Local model
-If you want to use the console version of COMBO, you need to download a pre-trained model manually
-```bash
-wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
-```
-and pass it as a parameter (see [prediction doc](prediction.md)).
+If the model name doesn't match any model on the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/), COMBO looks for the model in the local environment.
diff --git a/docs/training.md b/docs/training.md
index f4b6d38..d3f69e0 100644
--- a/docs/training.md
+++ b/docs/training.md
@@ -1,6 +1,6 @@
 # Training
 
-Command:
+Basic command:
 ```bash
 combo --mode train \
       --training_data_path your_training_path \
@@ -32,13 +32,13 @@ Examples (for clarity without training/validation data paths):
 combo --mode train --pretrained_transformer_name your_choosen_pretrained_transformer
 ```
 
-* predict only dependency tree:
+* train only a dependency parser:
 ```bash
 combo --mode train --targets head,deprel
 ```
 
-* use part-of-speech tags for predicting only dependency tree
+* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are default features):
 ```bash
 combo --mode train --targets head,deprel --features token,char,upostag
 ```
--
GitLab
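The token-attribute loop this patch adds to the README can be sketched as a standalone script. The `Token` dataclass below is a hypothetical stand-in for COMBO's token objects: only the attribute names (`id`, `token`, `lemma`, `upostag`, `head`, `deprel`) and the column format string are taken from the patch, and the sample analysis of "Moje zdanie." is illustrative, not real model output.

```python
from dataclasses import dataclass

# Hypothetical stand-in for COMBO's token objects; the attribute names
# mirror those used in the README example added by this patch.
@dataclass
class Token:
    id: int
    token: str
    lemma: str
    upostag: str
    head: int
    deprel: str

# Column layout from the README example.
FMT = "{:5} {:15} {:15} {:10} {:10} {:10}"

def format_table(tokens):
    """Render a header row plus one aligned row per token."""
    rows = [FMT.format("ID", "TOKEN", "LEMMA", "UPOS", "HEAD", "DEPREL")]
    for t in tokens:
        rows.append(FMT.format(str(t.id), t.token, t.lemma,
                               t.upostag, str(t.head), t.deprel))
    return "\n".join(rows)

# Illustrative parse of "Moje zdanie." -- not actual COMBO output.
tokens = [
    Token(1, "Moje", "mój", "DET", 2, "det"),
    Token(2, "zdanie", "zdanie", "NOUN", 0, "root"),
    Token(3, ".", ".", "PUNCT", 2, "punct"),
]
print(format_table(tokens))
```

With real COMBO predictions, `sentence.tokens` would take the place of the hand-built list above.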