Commit b6614125, authored 2 years ago by Maja Jabłońska, committed by Martyna Wiącek 2 years ago
Add get_slices_if_not_provided to data/dataset.py
Parent: 2b3c13dc
Merge request: !46 Merge COMBO 3.0 into master
combo/data/dataset.py (27 additions, 1 deletion)
 import logging
+from combo import data
 
 logger = logging.getLogger(__name__)
...
@@ -7,5 +8,30 @@ logger = logging.getLogger(__name__)
 class DatasetReader:
     pass
 
 class UniversalDependenciesDatasetReader(DatasetReader):
-    pass
\ No newline at end of file
+    pass
+
+def get_slices_if_not_provided(vocab: data.Vocabulary):
+    if hasattr(vocab, "slices"):
+        return vocab.slices
+
+    if "feats_labels" in vocab.get_namespaces():
+        idx2token = vocab.get_index_to_token_vocabulary("feats_labels")
+        for _, v in dict(idx2token).items():
+            if v not in ["_", "__PAD__"]:
+                empty_value = v.split("=")[0] + "=None"
+                vocab.add_token_to_namespace(empty_value, "feats_labels")
+
+        slices = {}
+        for idx, name in vocab.get_index_to_token_vocabulary("feats_labels").items():
+            # There are 2 types of features: with (Case=Acc) or without assignment (None).
+            # Here we group their indices by name (before the assignment sign).
+            name = name.split("=")[0]
+            if name in slices:
+                slices[name].append(idx)
+            else:
+                slices[name] = [idx]
+        vocab.slices = slices
+        return vocab.slices
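To make the grouping in `get_slices_if_not_provided` easier to follow, here is a minimal sketch that runs the same logic against a tiny stand-in vocabulary. `TinyVocabulary` is hypothetical and exists only for this illustration; it imitates the three methods the diff calls and is not the real `data.Vocabulary` from COMBO. The sample feature tokens are likewise made up.

```python
from typing import Dict, List, Optional, Set


class TinyVocabulary:
    """Hypothetical stand-in for data.Vocabulary (assumption, not COMBO's class)."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[int, str]] = {}

    def get_namespaces(self) -> Set[str]:
        return set(self._namespaces)

    def get_index_to_token_vocabulary(self, namespace: str) -> Dict[int, str]:
        # Returns the live mapping, so callers must copy it before mutating.
        return self._namespaces[namespace]

    def add_token_to_namespace(self, token: str, namespace: str) -> int:
        index_to_token = self._namespaces.setdefault(namespace, {})
        for index, existing in index_to_token.items():
            if existing == token:
                return index  # token already known, keep its index
        index = len(index_to_token)
        index_to_token[index] = token
        return index


def get_slices_if_not_provided(vocab) -> Optional[Dict[str, List[int]]]:
    # Same logic as the function added in the diff.
    if hasattr(vocab, "slices"):
        return vocab.slices

    if "feats_labels" in vocab.get_namespaces():
        idx2token = vocab.get_index_to_token_vocabulary("feats_labels")
        # dict(...) copies the mapping, so adding tokens below is safe.
        for _, v in dict(idx2token).items():
            if v not in ["_", "__PAD__"]:
                empty_value = v.split("=")[0] + "=None"
                vocab.add_token_to_namespace(empty_value, "feats_labels")

        slices: Dict[str, List[int]] = {}
        for idx, name in vocab.get_index_to_token_vocabulary("feats_labels").items():
            # Group indices by feature name (the part before "=").
            name = name.split("=")[0]
            if name in slices:
                slices[name].append(idx)
            else:
                slices[name] = [idx]
        vocab.slices = slices
        return vocab.slices


vocab = TinyVocabulary()
for token in ["__PAD__", "_", "Case=Acc", "Case=Nom", "Number=Sing"]:
    vocab.add_token_to_namespace(token, "feats_labels")

slices = get_slices_if_not_provided(vocab)
print(slices)
```

With this stand-in, each feature name ends up with the indices of its observed values plus one extra `=None` entry, and a second call returns the cached `vocab.slices`. As written in the diff, the function falls through and returns `None` when the vocabulary has no `feats_labels` namespace, so callers may want to handle that case.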