Commit 160ee6ea authored by Mateusz Klimaszewski, committed by Mateusz Klimaszewski

Extend documentation with better examples.

Parent: 4c556fd9
Branch: enhanced_dependency_parsing
Merge requests: !9 Enhanced dependency parsing develop to master, !8 Enhanced dependency parsing
@@ -10,19 +10,24 @@
</p>
## Quick start
Clone this repository and install COMBO (we suggest creating a virtualenv/conda environment with Python 3.6+, as a number of required packages will be installed):
```bash
git clone https://gitlab.clarin-pl.eu/syntactic-tools/clarinbiz/combo.git
cd combo
python setup.py develop
```
Run the following commands in your Python console to make predictions with a pre-trained model:
```python
from combo.predict import COMBO
nlp = COMBO.from_pretrained("polish-herbert-base")
# Example sentence in Polish: "COVID-19 is an acute infectious respiratory
# disease caused by infection with the SARS-CoV-2 virus."
sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")
```
Predictions are accessible as a list of token attributes:
```python
print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL'))
for token in sentence.tokens:
    print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel))
```
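Continuing from the example above, the same attributes can be consumed programmatically; a minimal sketch that collects (dependent, head, relation) triples using only the fields shown above:
```python
# Collect (dependent, head, relation) triples from the parsed sentence,
# reusing `sentence` from the example above.
# Per the CoNLL-U convention, a head of 0 marks the root of the tree.
triples = [(token.token, token.head, token.deprel) for token in sentence.tokens]
print(triples)
```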
## Details
@@ -31,4 +36,3 @@
- [**Pre-trained models**](docs/models.md)
- [**Training**](docs/training.md)
- [**Prediction**](docs/prediction.md)
# Models
COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing) trained on the treebanks from [Universal Dependencies repository](https://universaldependencies.org),
- enhanced dependency parsing trained on IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html).
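The enhanced-dependency models are used the same way as in the Quick start example. The sketch below is illustrative only: the model name is a hypothetical placeholder (substitute an actual name from the list of pre-trained models), and it assumes tokens also expose a `deps` attribute mirroring the CoNLL-U DEPS column:
```python
from combo.predict import COMBO

# "your_enhanced_parsing_model" is a hypothetical placeholder - substitute an
# enhanced-dependency model name from the pre-trained models list.
nlp = COMBO.from_pretrained("your_enhanced_parsing_model")
sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego.")
for token in sentence.tokens:
    # Assumption: `deps` mirrors the CoNLL-U DEPS column and holds the
    # enhanced-dependency graph for the token.
    print(token.token, token.head, token.deprel, token.deps)
```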
## Automatic download
The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the `.tar.gz` extension) from the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/) and pass it as the argument to the `from_pretrained` method:
```python
from combo.predict import COMBO
nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the model name doesn't match any model on the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/), COMBO looks for a model in the local environment.
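A minimal sketch of that fallback, assuming `from_pretrained` also accepts the file name (or path) of a locally downloaded model archive:
```python
from combo.predict import COMBO

# Assumption: a local path to a downloaded model archive is accepted when the
# name does not match any entry on the pre-trained models list.
nlp = COMBO.from_pretrained("polish-herbert-base.tar.gz")
```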
## Console prediction/Local model
If you want to use the console version of COMBO, you need to download a pre-trained model manually
```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```
and pass it as a parameter (see [prediction doc](prediction.md)).
# Training
Basic command:
```bash
combo --mode train \
      --training_data_path your_training_path \
      ...
```
@@ -32,13 +32,13 @@ Examples (for clarity, without training/validation data paths):
combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
```
* train only a dependency parser:
```bash
combo --mode train --targets head,deprel
```
* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are the default features):
```bash
combo --mode train --targets head,deprel --features token,char,upostag
```