Commit 4b260cd6 authored by Piotr Przybyła

Update README.md

parent 68e9be59
@@ -43,9 +43,9 @@ Now you need to create a segmenter by providing the language your text is in, e.
 ```
 lambo = Lambo.get('English')
 ```
-This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from `[languages.txt](src/lambo/resources/languages.txt)`.
-Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from `[languages.txt](src/lambo/resources/languages.txt)`:
+This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from [`languages.txt`](src/lambo/resources/languages.txt).
+Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from [`languages.txt`](src/lambo/resources/languages.txt):
 ```
 lambo = Lambo.get('LAMBO-UD_Polish-PDB')
 ```
@@ -121,9 +121,9 @@ You don't have to rely on the models trained so far in COMBO. You can use the in
 - `run_evaluation.py` -- evaluate existing models using UD gold standard.
 Note that you can also extend LAMBO by modifying the data files that specify string that will be treated specially:
-- `[emoji.tab](src/lambo/resources/emoji.tab)` includes a list of emojis (they will always be treated as separate tokens),
-- `[pauses.txt](src/lambo/resources/pauses.txt)` include a list of verbal pauses (they will also be separated, but not split),
-- `[turn_regexp.txt](src/lambo/resources/turn_regexp.txt)` enumerates regular expressions used to split turns (such as double newline),
+- [`emoji.tab`](src/lambo/resources/emoji.tab) includes a list of emojis (they will always be treated as separate tokens),
+- [`pauses.txt`](src/lambo/resources/pauses.txt) include a list of verbal pauses (they will also be separated, but not split),
+- [`turn_regexp.txt`](src/lambo/resources/turn_regexp.txt) enumerates regular expressions used to split turns (such as double newline),
 ## Credits
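Every changed line in this commit makes the same fix: backticks that wrapped a whole Markdown link (which renders the link as literal code) are moved inside the link text, so the file name stays in code style but the link works. The following is a minimal sketch of that transformation, not part of the commit. The function name `fix_wrapped_links` and the regular expression are illustrative assumptions.

```python
import re

# Matches an inline-code span that wraps an entire Markdown link,
# e.g. `[emoji.tab](src/lambo/resources/emoji.tab)`.
# (Hypothetical pattern written for this illustration.)
WRAPPED_LINK = re.compile(r"`\[([^\]`]+)\]\(([^)`]+)\)`")


def fix_wrapped_links(markdown: str) -> str:
    """Move the backticks from around the link to inside the link text."""
    return WRAPPED_LINK.sub(r"[`\1`](\2)", markdown)


if __name__ == "__main__":
    before = "- `[emoji.tab](src/lambo/resources/emoji.tab)` includes a list of emojis"
    print(fix_wrapped_links(before))
```

Lines that already use the corrected form contain no backtick immediately followed by `[`, so running the function on them leaves them unchanged.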