Commit 4b260cd6 authored by Piotr Przybyła

Update README.md

parent 68e9be59
@@ -43,9 +43,9 @@ Now you need to create a segmenter by providing the language your text is in, e.
 ```
 lambo = Lambo.get('English')
 ```
-This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from `[languages.txt](src/lambo/resources/languages.txt)`.
+This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from [`languages.txt`](src/lambo/resources/languages.txt).
-Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from `[languages.txt](src/lambo/resources/languages.txt)`:
+Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from [`languages.txt`](src/lambo/resources/languages.txt):
 ```
 lambo = Lambo.get('LAMBO-UD_Polish-PDB')
 ```
@@ -121,9 +121,9 @@ You don't have to rely on the models trained so far in COMBO. You can use the in
 - `run_evaluation.py` -- evaluate existing models using UD gold standard.
 Note that you can also extend LAMBO by modifying the data files that specify string that will be treated specially:
-- `[emoji.tab](src/lambo/resources/emoji.tab)` includes a list of emojis (they will always be treated as separate tokens),
+- [`emoji.tab`](src/lambo/resources/emoji.tab) includes a list of emojis (they will always be treated as separate tokens),
-- `[pauses.txt](src/lambo/resources/pauses.txt)` include a list of verbal pauses (they will also be separated, but not split),
+- [`pauses.txt`](src/lambo/resources/pauses.txt) include a list of verbal pauses (they will also be separated, but not split),
-- `[turn_regexp.txt](src/lambo/resources/turn_regexp.txt)` enumerates regular expressions used to split turns (such as double newline),
+- [`turn_regexp.txt`](src/lambo/resources/turn_regexp.txt) enumerates regular expressions used to split turns (such as double newline),
 ## Credits