Commit f655a079, authored 1 year ago by Maja Jablonska
Small api.py fixes
parent c7327132
1 merge request: !46 Merge COMBO 3.0 into master
combo/data/api.py: 2 additions, 31 deletions
@@ -8,35 +8,6 @@ from combo.data.tokenizers import Token
 import conllu
 from overrides import overrides

-
-# Maybe NER could use this 11th column?
-@dataclass
-class OldToken:
-    id: Optional[Union[int, Tuple]] = None  # why a tuple? multiword tokens?
-    token: Optional[str] = None
-    lemma: Optional[str] = None
-    upostag: Optional[str] = None
-    xpostag: Optional[str] = None
-    feats: Optional[str] = None
-    head: Optional[int] = None  # ID of another token that is the head (governor); dependency tree
-    deprel: Optional[str] = None
-    deps: Optional[str] = None
-    misc: Optional[str] = None  # anything; most often whether a space follows (e.g. there is no space after "spi" in "spi.")
-    # we do not predict this; it is handled at the tokenizer level
-    # sometimes extra information is stored there too, e.g. texts with transliterations
-    # this is part of the conllu format
-    semrel: Optional[str] = None  # the 10-column conllu format does not have this
-    # but there are ideas to provide semantics as an additional column
-    # maybe let's keep it
-    # e.g. for adverbials the deprel is "adjunct"; in "w lozeczku" ("in the crib") we have an adverbial,
-    # but no information that it is an adverbial of place, and that could go in this field
-    # and here the value "locative" would appear
-    # Unfortunately, so far this is predicted poorly
-    # Keep for the future, but it must be completely optional!!!
-    # It should not be predicted by default, because validators do not accept 11 columns.
-    embeddings: Dict[str, List[float]] = field(default_factory=list, repr=False)
-
-
 @dataclass
 class Sentence:
     tokens: List[Token] = field(default_factory=list)
@@ -54,14 +25,14 @@ class Sentence:
         return len(self.tokens)


-class _TokenList(conllu.TokenList):
+class _TokenList(conllu.models.TokenList):
     @overrides
     def __repr__(self):
         return 'TokenList<' + ', '.join(token['text'] for token in self) + '>'


-def sentence2conllu(sentence: Sentence, keep_semrel: bool = True) -> conllu.TokenList:
+def sentence2conllu(sentence: Sentence, keep_semrel: bool = True) -> conllu.models.TokenList:
     tokens = []
     for token in sentence.tokens:
         token_dict = collections.OrderedDict(token.as_dict(keep_semrel))
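
For readers unfamiliar with the `keep_semrel` flag, the following is a minimal, standard-library-only sketch of how the loop in `sentence2conllu` could drop the optional 11th `semrel` column. `SketchToken`, `SketchSentence`, `sentence_to_rows`, and the field names are hypothetical stand-ins, not COMBO's actual `Token`/`Sentence` classes; only the `as_dict(keep_semrel)` call and the `collections.OrderedDict` wrapping come from the diff above.

```python
import collections
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchToken:
    # Hypothetical stand-in for combo's Token; only a few fields are modeled.
    idx: Optional[int] = None
    text: Optional[str] = None
    deprel: Optional[str] = None
    semrel: Optional[str] = None  # optional 11th column, outside 10-column CoNLL-U

    def as_dict(self, keep_semrel: bool = True) -> dict:
        # Assumed behavior: emit the standard fields, add semrel only on request.
        row = {"idx": self.idx, "text": self.text, "deprel": self.deprel}
        if keep_semrel:
            row["semrel"] = self.semrel
        return row


@dataclass
class SketchSentence:
    tokens: List[SketchToken] = field(default_factory=list)


def sentence_to_rows(sentence: SketchSentence, keep_semrel: bool = True):
    # Mirrors the loop in the diff: one ordered mapping per token.
    rows = []
    for token in sentence.tokens:
        rows.append(collections.OrderedDict(token.as_dict(keep_semrel)))
    return rows


sentence = SketchSentence([
    SketchToken(1, "w", "case"),
    SketchToken(2, "lozeczku", "adjunct", "locative"),
])
rows_without = sentence_to_rows(sentence, keep_semrel=False)
rows_with = sentence_to_rows(sentence, keep_semrel=True)
```

With `keep_semrel=False` each row carries only the standard fields, which matches the removed comment's requirement that the extra column stay fully optional because validators reject 11-column output.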