Commit 044f605c authored by Piotr Przybyła
Update README.md

parent d0f7ac18
@@ -10,7 +10,7 @@ LAMBO is a machine learning model, which means it was trained to recognise bound
LAMBO was developed in context of dependency parsing. Thus, it includes models trained on [Universal Dependencies treebanks](https://universaldependencies.org/#language-), uses `.conllu` as the training [data format](https://universaldependencies.org/conll18/evaluation.html) and supports integration with [COMBO](https://gitlab.clarin-pl.eu/syntactic-tools/combo), a state-of-the-art system for dependency parsing and more. However, you can use LAMBO as the first stage of any NLP process.
-LAMBO currently includes models trained on 98 corpora in 53 languages. The full list is available in [languages.txt](src/lambo/data/languages.txt). For each of these, two model variants are available:
+LAMBO currently includes models trained on 98 corpora in 53 languages. The full list is available in [languages.txt](src/lambo/resources/languages.txt). For each of these, two model variants are available:
- simple LAMBO, trained on the UD corpus
- pretrained LAMBO, same as above, but starting from weights pre-trained on unsupervised masked character prediction using multilingual corpora from [OSCAR](https://oscar-corpus.com/).
@@ -43,9 +43,9 @@ Now you need to create a segmenter by providing the language your text is in, e.
```
lambo = Lambo.get('English')
```
-This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from [languages.txt](src/lambo/data/languages.txt).
+This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from [languages.txt](src/lambo/resources/languages.txt).
-Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from [languages.txt](src/lambo/data/languages.txt):
+Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from [languages.txt](src/lambo/resources/languages.txt):
```
lambo = Lambo.get('LAMBO-UD_Polish-PDB')
```
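Reviewer note: the identifier in the snippet above appears to combine a variant name and a training dataset name with a hyphen. The sketch below illustrates that reading only. `model_identifier` and `VARIANTS` are hypothetical names that are not part of LAMBO, and the `<variant>-<dataset>` pattern is an assumption inferred from the single example `LAMBO-UD_Polish-PDB`.

```python
# Hypothetical helper, NOT part of the LAMBO package: it only builds a string.
# Assumption: identifiers follow the pattern '<variant>-<dataset>', inferred
# from the README example 'LAMBO-UD_Polish-PDB'.
VARIANTS = ("LAMBO", "LAMBO_no_pretraining")


def model_identifier(variant: str, dataset: str) -> str:
    """Join a variant name and a training dataset name with a hyphen."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    if not dataset:
        raise ValueError("dataset name must not be empty")
    return f"{variant}-{dataset}"


if __name__ == "__main__":
    # Reproduces the identifier used in the README snippet.
    print(model_identifier("LAMBO", "UD_Polish-PDB"))
```

If the assumption holds, the resulting string could be passed to `Lambo.get` in place of a language name; the dataset names themselves come from `languages.txt`.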
@@ -136,4 +136,4 @@ If you use LAMBO in your research, please cite it as software:
## License
-This project is licensed under the GNU General Public License v3.0.
\ No newline at end of file
+This project is licensed under the GNU General Public License v3.0.