Commit ebc8136b authored by Grzegorz Kostkowski

Implement annotations module together with tests and docker envs

The purpose of the new module is to provide high-level functions for
reading CCL annotations and an easy way to retrieve them in whatever
form is needed.

Scope of changes:
- implement annotations module (annotations.py)
- provide test data and implement tests (test_annotations.py)
- prepare a Makefile, since make can serve as a unified dev / test / CI
  environment (tox is not an option as it does not handle OS dependencies,
  or at least I don't know how to make it do so)
- change .gitlab-ci.yml to use make and images defined in this repo
- provide examples of usage in README
parent 3bed04eb
image: clarinpl/python:3.6
before_script:
- pip install tox==2.9.1
cache:
paths:
- .tox
stages:
# - check_style
- push_wheel
- test
- deploy
# pep8:
# stage: check_style
# script:
# - tox -v -e pep8
#
# docstyle:
# stage: check_style
# script:
# - tox -v -e docstyle
test:
stage: test
image: docker:18.09.7
services:
- docker:18.09.7-dind
before_script:
- apk --no-cache add make
- make build-test-env
script:
- make test check-types
push_wheel:
stage: deploy
image: docker:18.09.7
services:
- docker:18.09.7-dind
before_script:
- pip3.6 install twine
- apk --no-cache add make
- make build-prod-env
only:
- master
stage: push_wheel
- master
when: on_success
script:
- python3.6 setup.py sdist bdist_wheel
- python3.6 -m twine upload
--repository-url https://pypi.clarin-pl.eu/
-u $PIPY_USER -p $PIPY_PASS dist/cclutils*.whl
- make deploy
MAKEFLAGS += --no-print-directory
# help: cclutils Makefile help
# help: help
# help:... display this makefile's help information
.PHONY: help
help:
	@grep "^# help\:" Makefile | sed 's/\# help\: *//;s/^...\s\+/\t\t/'
# help: build-env
# help:... build a container with cclutils installed together with its
# help:... (OS and Python) dependencies
.PHONY: build-env
build-env:
	docker build . -f docker/Dockerfile -t cclutils-base
# help: rebuild-env
# help:... rebuild (without cache) the container with cclutils installed
# help:... together with its (OS and Python) dependencies
.PHONY: rebuild-env
rebuild-env:
	docker build . -f docker/Dockerfile --no-cache -t cclutils-base
# help: build-prod-env
# help:... build production container (used for CI/CD deploy)
.PHONY: build-prod-env
build-prod-env: build-env
	docker build . -f docker/prod.Dockerfile -t cclutils-prod
.PHONY: deploy
deploy:
	@docker run \
		-e PIPY_USER \
		-e PIPY_PASS \
		--rm \
		-t \
		cclutils-prod bash -c \
		'python3.6 setup.py sdist bdist_wheel && python3.6 -m twine upload --repository-url https://pypi.clarin-pl.eu/ -u $(PIPY_USER) -p $(PIPY_PASS) dist/cclutils*.whl'
# help: build-test-env
# help:... build test container (use cache if built already)
.PHONY: build-test-env
build-test-env: build-env
	docker build . -f docker/test.Dockerfile -t cclutils-test
# help: rebuild-test-env
# help:... rebuild test container (no cache)
.PHONY: rebuild-test-env
rebuild-test-env: rebuild-env
	docker build . -f docker/test.Dockerfile -t cclutils-test --no-cache
# help: test
# help:... run tests inside the container
# help:... requires running the 'build-test-env' task at least once beforehand
.PHONY: test
test:
	docker run --rm -t cclutils-test pytest
# help: check-types
# help:... check type hint annotations
.PHONY: check-types
check-types:
	docker run --rm -t cclutils-test \
		bash -c 'cd /home/install/cclutils; mypy -p extras --ignore-missing-imports'
# help: check-types-dev
# help:... check type hint annotations, mounts current version of code
.PHONY: check-types-dev
check-types-dev:
	docker run --rm -t -v $(PWD)/cclutils:/home/install/cclutils cclutils-test \
		bash -c 'cd /home/install/cclutils; mypy -p extras --ignore-missing-imports'
# help: test-dev
# help:... run tests inside the container (without rebuilding), mounts current
# help:... version of tests. To enable pudb (or pass other flags) run "make
# help:... flags=--pudb test-dev"
.PHONY: test-dev
test-dev:
	docker run -t -v $(PWD)/tests:/home/install/tests cclutils-test pytest $(flags)
# help: ipython-dev
# help:... launch ipython inside the container for development purposes;
# help:... mounting the ipython directory keeps the ipython history from
# help:... previous sessions
.PHONY: ipython-dev
ipython-dev:
	mkdir -p $(PWD)/.dev_ipython && docker run -it \
		-v $(PWD)/tests:/home/install/tests \
		-v $(PWD)/.dev_ipython:/root/.ipython \
		cclutils-test ipython
@@ -194,4 +194,248 @@ sentences = (sentence for paragraph in document.paragraphs()
for sentence in sentences:
    print(cclutils.sentence2str(sentence))
```
Reading annotations
===================
Annotations can be extracted from a CCL document with the
`cclutils.extras.annotations` module, which is built on top of the core
``cclutils`` functionality.
The main function of this module is ``get_document_annotations``, which reads
annotations from a CCL document (from a file or a ``corpus2.DocumentPtr`` object).
```python
from cclutils.extras.annotations import get_document_annotations
```
Annotations are organized using two classes:
1. ``AnnotatedExpression``: represents a single annotation (annotated expression)
located in a specified paragraph and sentence. The module supports annotations
of single words as well as multiword expressions (spanning more than one token).
1. ``DocumentAnnotations``: holds the annotations of an entire document and
provides methods that facilitate gathering and accessing them.
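To make the single-word vs. multiword distinction concrete: in CCL each token carries one integer per annotation channel (e.g. `<ann chan="designation">1</ann>`), where 0 means "not annotated" and tokens of a sentence sharing the same non-zero number form one annotated expression. That number is presumably the "channel numeric value" that appears in the ``expressions_index`` keys. The sketch below is a toy illustration of this decoding on plain Python data, not code from `cclutils.extras.annotations`; the token layout and `decode_channels` are hypothetical, and unlike the module it does not check that the tokens are adjacent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy token: (orth, {channel name: segment number}).
# A segment number of 0, or a missing channel, means "not annotated".
ToyToken = Tuple[str, Dict[str, int]]


def decode_channels(sentence: List[ToyToken]) -> Dict[Tuple[str, int], List[int]]:
    """Map (channel name, segment number) to the indexes of the tokens it covers."""
    segments: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for tok_idx, (_orth, channels) in enumerate(sentence):
        for chan, value in channels.items():
            if value:  # skip 0: token is outside any annotation of this channel
                segments[(chan, value)].append(tok_idx)
    return dict(segments)


# Made-up sentence; "dla dwóch osób" is a three-token (multiword) expression.
sentence = [
    ("Pokój", {}),
    ("dla", {"designation": 1, "room_type": 1}),
    ("dwóch", {"designation": 1, "room_type": 1}),
    ("osób", {"designation": 1, "room_type": 1}),
    ("Gdańsk", {"designation": 0, "region": 1}),
]
print(decode_channels(sentence))
# {('designation', 1): [1, 2, 3], ('room_type', 1): [1, 2, 3], ('region', 1): [4]}
```

Note how the same tokens can belong to expressions of different channels at once, which matches the overlapping `designation` and `room_type` annotations in the output below.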
#### Read annotations of a given document
1. Read all annotations
```python
>>> anns = get_document_annotations(cclutils.read('tests/data/ccl02.xml'))
>>> anns
<DocumentAnnotations for 10 annotated expressions: [<AnnotatedExpression for
annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla',
'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for
annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa',
'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for annotation
'region': 'region:('Gdańsk',)'; ('Gdańsk',) at position: ch1>s1>t4>,
<AnnotatedExpression for annotation 'attraction': 'attraction:('Hotel',)';
('hotel',) at position: ch2>s2>t0>, <AnnotatedExpression for annotation
'hotel_name': 'hotel_name:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>,
<AnnotatedExpression for annotation 'food': 'food:('śniadaniem',)';
('śniadanie',) at position: ch2>s2>t3>, <AnnotatedExpression for annotation
'room_type': 'room_type:('łazienką',)'; ('łazienka',) at position: ch2>s2>t7>,
<AnnotatedExpression for annotation 'designation': 'designation:('dla',
'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>, <AnnotatedExpression
for annotation 'attraction': 'attraction:('spa',)'; ('spa',) at position:
ch2>s2>t13>, <AnnotatedExpression for annotation 'food': 'food:('pełnym',
'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>]>
```
1. Read only specified annotations
```python
>>> anns = get_document_annotations(cclutils.read('tests/data/ccl02.xml'), annotations={'designation'})
>>> anns
<DocumentAnnotations for 2 annotated expressions: [<AnnotatedExpression for
annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla',
'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for
annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko')
at position: ch2>s2>t10,t11>]>
```
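Conceptually, passing `annotations={...}` narrows the result to the listed channel names. A minimal sketch of such a filter on plain tuples, assuming `None` means "all channels" (the real default of the parameter is not shown here); `select_annotations` and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical (channel name, orths) pairs standing in for AnnotatedExpression objects.
Expression = Tuple[str, Tuple[str, ...]]


def select_annotations(exprs: Iterable[Expression],
                       annotations: Optional[Set[str]] = None) -> List[Expression]:
    """Keep expressions whose channel name is in `annotations` (None keeps all)."""
    return [e for e in exprs if annotations is None or e[0] in annotations]


exprs = [
    ("designation", ("dla", "dwóch", "osób")),
    ("region", ("Gdańsk",)),
    ("designation", ("dla", "dzieci")),
]
print(select_annotations(exprs, {"designation"}))
# [('designation', ('dla', 'dwóch', 'osób')), ('designation', ('dla', 'dzieci'))]
```

A set is a natural type for the argument since only membership tests are needed.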
#### Get annotations in a preferred form
1. Get the annotations index, which contains full information about the annotations:
* each key is a tuple of the following values: (annotation channel name,
sentence id, paragraph id, channel numeric value)
```python
>>> anns.expressions_index
defaultdict(list,
{('designation',
's1',
'ch1',
1): <AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
('room_type',
's1',
'ch1',
1): <AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
('region',
's1',
'ch1',
1): <AnnotatedExpression for annotation 'region': 'region:('Gdańsk',)'; ('Gdańsk',) at position: ch1>s1>t4>,
('attraction',
's2',
'ch2',
1): <AnnotatedExpression for annotation 'attraction': 'attraction:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>,
('hotel_name',
's2',
'ch2',
1): <AnnotatedExpression for annotation 'hotel_name': 'hotel_name:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>,
('food',
's2',
'ch2',
1): <AnnotatedExpression for annotation 'food': 'food:('śniadaniem',)'; ('śniadanie',) at position: ch2>s2>t3>,
('room_type',
's2',
'ch2',
1): <AnnotatedExpression for annotation 'room_type': 'room_type:('łazienką',)'; ('łazienka',) at position: ch2>s2>t7>,
('designation',
's2',
'ch2',
1): <AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>,
('attraction',
's2',
'ch2',
2): <AnnotatedExpression for annotation 'attraction': 'attraction:('spa',)'; ('spa',) at position: ch2>s2>t13>,
('food',
's2',
'ch2',
2): <AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>})
```
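Because the key carries the channel numeric value, two expressions of the same channel in one sentence (like the two `attraction` entries in `s2` above) get distinct keys. A small sketch of querying an index with that key layout, using plain strings in place of `AnnotatedExpression` objects; `in_sentence` is a hypothetical helper, not part of the module.

```python
from typing import Dict, List, Tuple

# (channel name, sentence id, paragraph id, channel numeric value)
IndexKey = Tuple[str, str, str, int]

# Toy index: strings stand in for AnnotatedExpression objects.
index: Dict[IndexKey, str] = {
    ("designation", "s1", "ch1", 1): "dla dwóch osób",
    ("region", "s1", "ch1", 1): "Gdańsk",
    ("attraction", "s2", "ch2", 1): "Hotel",
    ("attraction", "s2", "ch2", 2): "spa",
}


def in_sentence(index: Dict[IndexKey, str], sent_id: str, par_id: str) -> List[str]:
    """Return the expressions annotated in one sentence, in index order."""
    return [expr for (_chan, s_id, p_id, _value), expr in index.items()
            if s_id == sent_id and p_id == par_id]


print(in_sentence(index, "s2", "ch2"))          # ['Hotel', 'spa']
print(index[("attraction", "s2", "ch2", 2)])    # spa
```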
1. Get annotations grouped by annotation channel name, in one of the following formats:
* annotation object
* orths
* preferred lexemes
* annotation base lemma
```python
>>> anns.group_by_chan_name()
defaultdict(list,
{'designation': [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>],
'room_type': [<AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
<AnnotatedExpression for annotation 'room_type': 'room_type:('łazienką',)'; ('łazienka',) at position: ch2>s2>t7>],
'region': [<AnnotatedExpression for annotation 'region': 'region:('Gdańsk',)'; ('Gdańsk',) at position: ch1>s1>t4>],
'attraction': [<AnnotatedExpression for annotation 'attraction': 'attraction:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>,
<AnnotatedExpression for annotation 'attraction': 'attraction:('spa',)'; ('spa',) at position: ch2>s2>t13>],
'hotel_name': [<AnnotatedExpression for annotation 'hotel_name': 'hotel_name:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>],
'food': [<AnnotatedExpression for annotation 'food': 'food:('śniadaniem',)'; ('śniadanie',) at position: ch2>s2>t3>,
<AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>]})
>>> anns.group_by_chan_name(as_orths=True)
defaultdict(list,
{'designation': [('dla', 'dwóch', 'osób'), ('dla', 'dzieci')],
'room_type': [('dla', 'dwóch', 'osób'), ('łazienką',)],
'region': [('Gdańsk',)],
'attraction': [('Hotel',), ('spa',)],
'hotel_name': [('Hotel',)],
'food': [('śniadaniem',), ('pełnym', 'wyżywieniem')]})
>>> anns.group_by_chan_name(as_lexemes=True)
defaultdict(list,
{'designation': [('dla', 'dwa', 'osoba'), ('dla', 'dziecko')],
'room_type': [('dla', 'dwa', 'osoba'), ('łazienka',)],
'region': [('Gdańsk',)],
'attraction': [('hotel',), ('spa',)],
'hotel_name': [('hotel',)],
'food': [('śniadanie',), ('pełny', 'wyżywienie')]})
>>> anns.group_by_chan_name(as_ann_base=True)
defaultdict(list,
{'designation': ['dla dwóch osób', 'dla dziecka'],
'room_type': ['dla dwóch osób', 'łazienka'],
'region': [''],
'attraction': ['hotel', 'spa'],
'hotel_name': ['Hotel'],
'food': ['śniadanie', 'pełne wyżywienie']})
```
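The four formats differ only in what gets appended to each channel's list. A sketch of that dispatch on a hypothetical `ToyExpression` tuple; the precedence when several flags are set is an assumption of this sketch, as is everything about `group_by_chan` beyond what the output above shows. The empty base for `region` mirrors `'region': ['']` above, i.e. no `region_base` property was found.

```python
from collections import defaultdict
from typing import Any, Dict, List, NamedTuple, Tuple


class ToyExpression(NamedTuple):
    """Hypothetical stand-in for AnnotatedExpression holding plain data."""
    chan: str
    orths: Tuple[str, ...]
    lexemes: Tuple[str, ...]
    base: str = ""  # '' when no '<chan>_base' property was found


def group_by_chan(exprs: List[ToyExpression], as_orths: bool = False,
                  as_lexemes: bool = False,
                  as_ann_base: bool = False) -> Dict[str, List[Any]]:
    groups: Dict[str, List[Any]] = defaultdict(list)
    for expr in exprs:
        value: Any
        # The first flag set wins; with no flag the expression itself is kept.
        if as_orths:
            value = expr.orths
        elif as_lexemes:
            value = expr.lexemes
        elif as_ann_base:
            value = expr.base
        else:
            value = expr
        groups[expr.chan].append(value)
    return dict(groups)


exprs = [
    ToyExpression("food", ("śniadaniem",), ("śniadanie",), "śniadanie"),
    ToyExpression("region", ("Gdańsk",), ("Gdańsk",)),
    ToyExpression("food", ("pełnym", "wyżywieniem"), ("pełny", "wyżywienie"),
                  "pełne wyżywienie"),
]
print(group_by_chan(exprs, as_ann_base=True))
# {'food': ['śniadanie', 'pełne wyżywienie'], 'region': ['']}
```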
1. Get annotations grouped by token (token position), in one of the following formats (usage is
the same as for the ``group_by_chan_name`` method):
* annotation object
* orths
* preferred lexemes
* annotation base lemma
```python
>>> anns.group_by_token()
{(1,
's1',
'ch1'): [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>],
(2,
's1',
'ch1'): [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>],
(3,
's1',
'ch1'): [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>, <AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>],
(4,
's1',
'ch1'): [<AnnotatedExpression for annotation 'region': 'region:('Gdańsk',)'; ('Gdańsk',) at position: ch1>s1>t4>],
(0,
's2',
'ch2'): [<AnnotatedExpression for annotation 'attraction': 'attraction:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>, <AnnotatedExpression for annotation 'hotel_name': 'hotel_name:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>],
(3,
's2',
'ch2'): [<AnnotatedExpression for annotation 'food': 'food:('śniadaniem',)'; ('śniadanie',) at position: ch2>s2>t3>],
(7,
's2',
'ch2'): [<AnnotatedExpression for annotation 'room_type': 'room_type:('łazienką',)'; ('łazienka',) at position: ch2>s2>t7>],
(10,
's2',
'ch2'): [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>],
(11,
's2',
'ch2'): [<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>],
(13,
's2',
'ch2'): [<AnnotatedExpression for annotation 'attraction': 'attraction:('spa',)'; ('spa',) at position: ch2>s2>t13>],
(17,
's2',
'ch2'): [<AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>],
(18,
's2',
'ch2'): [<AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>]}
```
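In the output above a multiword expression such as `dla dwóch osób` shows up under each of its tokens (t1, t2 and t3). A sketch of that expansion, assuming each expression knows its token indexes, sentence id and paragraph id; the tuple layout and `group_by_token` here are hypothetical and return channel names instead of expression objects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TokenPosition = Tuple[int, str, str]  # (token index in sentence, sentence id, paragraph id)
ToyExpr = Tuple[str, Tuple[int, ...], str, str]  # (channel, token indexes, sentence id, paragraph id)

exprs: List[ToyExpr] = [
    ("designation", (1, 2, 3), "s1", "ch1"),
    ("room_type", (1, 2, 3), "s1", "ch1"),
    ("region", (4,), "s1", "ch1"),
]


def group_by_token(exprs: List[ToyExpr]) -> Dict[TokenPosition, List[str]]:
    groups: Dict[TokenPosition, List[str]] = defaultdict(list)
    for chan, tok_ids, sent_id, par_id in exprs:
        # A multiword expression is listed under every token it covers.
        for tok_idx in tok_ids:
            groups[(tok_idx, sent_id, par_id)].append(chan)
    return dict(groups)


print(group_by_token(exprs)[(2, "s1", "ch1")])  # ['designation', 'room_type']
```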
1. Get annotations grouped by token, preserving the original document (token)
order:
```python
>>> anns.group_by_token(retain_order=True)
OrderedDict([((1, 's1', 'ch1'),
[<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
<AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>]),
((2, 's1', 'ch1'),
[<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
<AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>]),
((3, 's1', 'ch1'),
[<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>,
<AnnotatedExpression for annotation 'room_type': 'room_type:('dla', 'dwóch', 'osób')'; ('dla', 'dwa', 'osoba') at position: ch1>s1>t1,t2,t3>]),
((4, 's1', 'ch1'),
[<AnnotatedExpression for annotation 'region': 'region:('Gdańsk',)'; ('Gdańsk',) at position: ch1>s1>t4>]),
((0, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'attraction': 'attraction:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>,
<AnnotatedExpression for annotation 'hotel_name': 'hotel_name:('Hotel',)'; ('hotel',) at position: ch2>s2>t0>]),
((3, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'food': 'food:('śniadaniem',)'; ('śniadanie',) at position: ch2>s2>t3>]),
((7, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'room_type': 'room_type:('łazienką',)'; ('łazienka',) at position: ch2>s2>t7>]),
((10, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>]),
((11, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'designation': 'designation:('dla', 'dzieci')'; ('dla', 'dziecko') at position: ch2>s2>t10,t11>]),
((13, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'attraction': 'attraction:('spa',)'; ('spa',) at position: ch2>s2>t13>]),
((17, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>]),
((18, 's2', 'ch2'),
[<AnnotatedExpression for annotation 'food': 'food:('pełnym', 'wyżywieniem')'; ('pełny', 'wyżywienie') at position: ch2>s2>t17,t18>])])
```
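Why a separate option: a plain dict keeps whatever order entries were inserted in (and that ordering is only a language guarantee from Python 3.7, while this project targets 3.6), and simply sorting the keys does not help because the position tuples start with the in-sentence token index. One way to get document order, sketched under the assumption that a list of all token positions in document order is available (the hypothetical `doc_positions`); `order_by_document` is not part of the module.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

TokenPosition = Tuple[int, str, str]  # (token index in sentence, sentence id, paragraph id)


def order_by_document(groups: Dict[TokenPosition, List[str]],
                      doc_positions: List[TokenPosition]) -> OrderedDict:
    """Re-key `groups` following `doc_positions`, the order in which tokens
    appear when walking paragraphs -> sentences -> tokens."""
    return OrderedDict((pos, groups[pos]) for pos in doc_positions if pos in groups)


groups = {(3, "s2", "ch2"): ["food"], (4, "s1", "ch1"): ["region"],
          (1, "s1", "ch1"): ["designation"]}
doc_positions = ([(i, "s1", "ch1") for i in range(5)]
                 + [(i, "s2", "ch2") for i in range(4)])

print(list(order_by_document(groups, doc_positions)))
# [(1, 's1', 'ch1'), (4, 's1', 'ch1'), (3, 's2', 'ch2')]
print(sorted(groups))  # naive sort compares token indexes first
# [(1, 's1', 'ch1'), (3, 's2', 'ch2'), (4, 's1', 'ch1')]
```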
#### Get token by token position
1. When using the methods above, you may want to get the ``corpus2.Token`` object
referenced by a position:
```python
>>> anns.token_by_position_index[(17, 's2', 'ch2')]
<corpus2.Token; proxy of <Swig Object of type 'Corpus2::Token *' at 0x7f71edfced80> >
```
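Since the position tuple is (token index in sentence, sentence id, paragraph id), an index like ``token_by_position_index`` can be built by walking the document once. A sketch on nested plain lists, with strings in place of ``corpus2.Token`` objects; `build_token_index` and the document layout are hypothetical and not how `cclutils` actually traverses documents.

```python
from typing import Dict, List, Tuple

TokenPosition = Tuple[int, str, str]  # (token index in sentence, sentence id, paragraph id)
ToyParagraph = Tuple[str, List[Tuple[str, List[str]]]]  # (par id, [(sent id, orths)])

paragraphs: List[ToyParagraph] = [
    ("ch1", [("s1", ["a", "b"])]),
    ("ch2", [("s2", ["c"]), ("s3", ["d", "e"])]),
]


def build_token_index(paragraphs: List[ToyParagraph]) -> Dict[TokenPosition, str]:
    index: Dict[TokenPosition, str] = {}
    for par_id, sentences in paragraphs:
        for sent_id, tokens in sentences:
            # Tokens have no ids of their own, so the index within the sentence is used.
            for tok_idx, token in enumerate(tokens):
                index[(tok_idx, sent_id, par_id)] = token
    return index


index = build_token_index(paragraphs)
print(index[(1, "s3", "ch2")])  # e
```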
"""
This package contains extras that extend the base functionality of
`cclutils`, built on the core `cclutils` functions.
"""
#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-
"""
This module provides an easy way to read CCL annotations.
"""
from collections import defaultdict, OrderedDict
from typing import Any, Dict, Iterable, List, Set, Optional, Tuple, Union
import cclutils as ccl
from corpus2 import DocumentPtr, Tagset, Token
__all__ = ["AnnotatedExpression", "DocumentAnnotations", "get_document_annotations"]
AnnRepr = Union[Tuple[str, ...], str, "AnnotatedExpression"]
TagsetRepr = Union[str, Tagset]
TokenPosition = Tuple[int, str, str]
class AnnotatedExpression(object):
    """
    Representation of an annotated expression in a CCL document. Consists of
    one or more tokens and holds the name of the annotation.
    Note: multiword annotations are supported only if they span adjacent
    tokens.
    """
    def __init__(
        self,
        token: Token,
        ann_name: str,
        tok_position: TokenPosition,
        tagset: Optional[TagsetRepr] = "nkjp",
        base_ann_name: Optional[str] = None,
        doc: Optional[DocumentPtr] = None,
    ) -> None:
        """
        Initialize with a single token.
        More tokens (in case of a multiword expression) can be added later
        with the `append` method.
        Args:
            token: corpus2 token instance (reference).
            ann_name: name of the annotation (annotation channel).
            tok_position: position of `token` in the document (tok_sent_idx,
                sent_id, par_id).
            tagset: `Tagset` object or its name; defaults to 'nkjp'.
            base_ann_name: name of the property storing the base form of the
                annotation. If not given, '{ann_name}_base' is used.
            doc: related CCL document.
        """
        self._tokens: List[Token] = [token]
        self._ann_name = ann_name
        self._base_ann_name = base_ann_name
        self._pref_lex: Optional[Tuple[str, ...]] = None
        self._tok_lemmas: Optional[Tuple[str, ...]] = None
        self._base_ann_lemma: Optional[str] = None
        self._doc = doc
        if isinstance(tagset, str):
            tagset = ccl.get_tagset(tagset)
        self.tagset = tagset
        self.toks_ids = set([tok_position[0]])
        self.sent_id = tok_position[1]
        self.par_id = tok_position[2]
    @property
    def annotation_name(self) -> str:
        """
        Name of annotation channel.
        """
        return self._ann_name
    @property
    def base_annotation_name(self) -> str:
        """
        Name of the property holding the base form of the annotation.
        Defaults to the annotation name with '_base' appended.
        """
        if not self._base_ann_name:
            self._base_ann_name = f"{self.annotation_name}_base"
        return self._base_ann_name
    @property
    def length(self) -> int:
        """
        Returns the length of the annotated phrase (number of tokens).
        """
        return len(self._tokens)
    @property
    def position(self) -> Tuple[Iterable[int], str, str]:
        """
        Returns the 'coordinates' (position) of the annotated tokens in the
        document as a tuple: (sorted token indexes, sentence id, paragraph id).
        Token indexes are positions within the sentence, as tokens do not
        have real identifiers. Such a position identifies the tokens in the
        document.
        """
        toks_ids = tuple(sorted(self.toks_ids))
        return (toks_ids, self.sent_id, self.par_id)
    @property
    def tokens_pref_lexemes(self) -> Tuple[str, ...]:
        """
        Returns the preferred lexemes of the tokens referred to by this
        annotation.
        """
        if self._pref_lex is None:
            self._pref_lex = _tokens_pref_lexemes(self._tokens, self.tagset)
        return self._pref_lex
    @property
    def tokens_pref_lexemes_lowered(self) -> Tuple[str, ...]:
        """
        Returns the lowercased preferred lexemes of the tokens referred to by
        this annotation.
        """
        return tuple(lex.lower() for lex in self.tokens_pref_lexemes)
    @property
    def tokens_orths(self) -> Tuple[str, ...]:
        """
        Returns the orths (original text forms) of the tokens referred to by
        this annotation.
        """
        if self._tok_lemmas is None:  # caches orths despite the attribute name
            self._tok_lemmas = tuple(t.orth_utf8() for t in self._tokens)
        return self._tok_lemmas
    @property
    def base_annotation_lemma(self) -> str:
        """
        Returns the base lemma of the annotation.
        Looks for the `self.base_annotation_name` metadata attribute in the
        included tokens and uses the first non-empty value found.
        Returns:
            base lemma or empty string ('') if not found.
        """
        if self._base_ann_lemma is None:
            i = 0
            while not self._base_ann_lemma and i < len(self._tokens):
                t = self._tokens[i]
                if t.has_metadata():
                    md = t.get_metadata()
                    if md.has_attribute(self.base_annotation_name):
                        self._base_ann_lemma = md.get_attribute(
                            self.base_annotation_name
                        )
                i += 1
            if self._base_ann_lemma is None:
                self._base_ann_lemma = ""  # there is no base annotation
        return self._base_ann_lemma
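    @property
    def position_str(self) -> str:
        """
        Returns a textual representation of the token position.
        """
        indexes, sent, par = self.position
        # `indexes` is already sorted numerically; sorting the 't<N>' strings
        # again would order them lexicographically (e.g. 't10' before 't2').
        return f"{par}:{sent}:{','.join('t' + str(i) for i in indexes)}"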
    def append(self, token: Token, tok_position: TokenPosition) -> None:
        """
        Extends the annotation by including the next token belonging to it.
        Args:
            token: corpus2 token to include.
            tok_position: position of `token` (tok_sent_idx, sent_id, par_id).
        """
        self._check_position(*tok_position)
        self._tokens.append(token)
        self.toks_ids.add(tok_position[0])
    def get_alt_repr(
        self,
        as_orths: bool = False,
        as_lexemes: bool = False,
        as_ann_base: bool = False,
    ) -> Any:
        """
        Utility method to get the annotation in one of the possible
        representations:
        1) as an `AnnotatedExpression` instance,
        2) as a tuple of orths,
        3) as a tuple of preferred lexemes,
        4) as the base form of the annotation (if specified); may differ
           from 3) in case of multiword expressions.
        Args:
            as_orths: returns orths instead of `AnnotatedExpression` instance.
<