diff --git a/README.md b/README.md
index 8d71352e2f98d3ae5d45e79c308731572d067f26..a84aa0ad9329aa4fcd52acf22f0a7272a2b9379b 100644
--- a/README.md
+++ b/README.md
@@ -43,9 +43,9 @@ Now you need to create a segmenter by providing the language your text is in, e.
 ```
 lambo = Lambo.get('English')
 ```
-This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from `[languages.txt](src/lambo/resources/languages.txt)`.
+This will (if necessary) download the appropriate model from the online repository and load it. Note that you can use any language name (e.g. `Ancient_Greek`) or ISO 639-1 code (e.g. `fi`) from [`languages.txt`](src/lambo/resources/languages.txt).
 
-Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from `[languages.txt](src/lambo/resources/languages.txt)`:
+Alternatively, you can select a specific model by defining LAMBO variant (`LAMBO` or `LAMBO_no_pretraining`) and training dataset from [`languages.txt`](src/lambo/resources/languages.txt):
 ```
 lambo = Lambo.get('LAMBO-UD_Polish-PDB')
 ```
@@ -121,9 +121,9 @@ You don't have to rely on the models trained so far in COMBO. You can use the in
 - `run_evaluation.py` -- evaluate existing models using UD gold standard.
 
 Note that you can also extend LAMBO by modifying the data files that specify string that will be treated specially:
-- `[emoji.tab](src/lambo/resources/emoji.tab)` includes a list of emojis (they will always be treated as separate tokens),
-- `[pauses.txt](src/lambo/resources/pauses.txt)` include a list of verbal pauses (they will also be separated, but not split),
-- `[turn_regexp.txt](src/lambo/resources/turn_regexp.txt)` enumerates regular expressions used to split turns (such as double newline),
+- [`emoji.tab`](src/lambo/resources/emoji.tab) includes a list of emojis (they will always be treated as separate tokens),
+- [`pauses.txt`](src/lambo/resources/pauses.txt) include a list of verbal pauses (they will also be separated, but not split),
+- [`turn_regexp.txt`](src/lambo/resources/turn_regexp.txt) enumerates regular expressions used to split turns (such as double newline),
 
 ## Credits