Commit 160ee6ea authored and committed by Mateusz Klimaszewski

Extend documentation with better examples.

parent 4c556fd9
## Quick start
Clone this repository and install COMBO (we suggest creating a virtualenv/conda environment with Python 3.6+, as a bundle of required packages will be installed):
```bash
git clone https://gitlab.clarin-pl.eu/syntactic-tools/clarinbiz/combo.git
cd combo
python setup.py develop
```
Run the following commands in your Python console to make predictions with a pre-trained model:
```python
from combo.predict import COMBO
nlp = COMBO.from_pretrained("polish-herbert-base")
sentence = nlp("Moje zdanie.")
print(sentence.tokens)
sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")
```
Predictions are accessible as a list of token attributes:
```python
print("{:5} {:15} {:15} {:10} {:10} {:10}".format('ID', 'TOKEN', 'LEMMA', 'UPOS', 'HEAD', 'DEPREL'))
for token in sentence.tokens:
print("{:5} {:15} {:15} {:10} {:10} {:10}".format(str(token.id), token.token, token.lemma, token.upostag, str(token.head), token.deprel))
```
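For further processing, the same attributes can be collected into plain Python data structures; a minimal sketch reusing only the attributes shown above:
```python
from combo.predict import COMBO

nlp = COMBO.from_pretrained("polish-herbert-base")
sentence = nlp("COVID-19 to ostra choroba zakaźna układu oddechowego wywołana zakażeniem wirusem SARS-CoV-2.")

# Collect (token, head id, dependency relation) triples from the parsed sentence.
edges = [(token.token, token.head, token.deprel) for token in sentence.tokens]
print(edges)
```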
## Details
- [**Pre-trained models**](docs/models.md)
- [**Training**](docs/training.md)
- [**Prediction**](docs/prediction.md)
# Models
COMBO provides pre-trained models for:
- morphosyntactic prediction (i.e. part-of-speech tagging, morphosyntactic analysis, lemmatisation and dependency parsing) trained on the treebanks from [Universal Dependencies repository](https://universaldependencies.org),
- enhanced dependency parsing trained on IWPT 2020 shared task [data](https://universaldependencies.org/iwpt20/data.html).
## Manual download
The pre-trained models can be downloaded from [here](http://mozart.ipipan.waw.pl/~mklimaszewski/models/).
If you want to use the console version of COMBO, you need to download a pre-trained model manually:
```bash
wget http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz
```
The downloaded model should be passed as a parameter for COMBO (see [prediction doc](prediction.md)).
## Automatic download
The pre-trained models can be downloaded automatically with the Python `from_pretrained` method. Select a model name (without the .tar.gz extension) from the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/) and pass the name to the `from_pretrained` method:
```python
from combo.predict import COMBO
nlp = COMBO.from_pretrained("polish-herbert-base")
```
If the model name doesn't match any model on the list of [pre-trained models](http://mozart.ipipan.waw.pl/~mklimaszewski/models/), COMBO looks for the model in the local environment.
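A locally stored model (for example one downloaded with `wget` as in the manual download section, or one you trained yourself) can therefore be loaded by passing its path instead of a list name. A minimal sketch, assuming `from_pretrained` falls back to the local file as described above:
```python
from combo.predict import COMBO

# Assumption: a path to a locally downloaded model archive is accepted when
# the argument does not match any name on the pre-trained models list.
nlp = COMBO.from_pretrained("polish-herbert-base.tar.gz")
print(nlp("Przykładowe zdanie.").tokens)
```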
# Training
Basic command:
```bash
combo --mode train \
      --training_data_path your_training_path \
      --validation_data_path your_validation_path
```
Examples (for clarity without training/validation data paths):
* use a pretrained transformer:
```bash
combo --mode train --pretrained_transformer_name your_chosen_pretrained_transformer
```
* train only a dependency parser:
```bash
combo --mode train --targets head,deprel
```
* use additional features (e.g. part-of-speech tags) for training a dependency parser (`token` and `char` are default features):
```bash
combo --mode train --targets head,deprel --features token,char,upostag
```