Syntactic Tools / combo
Commit 31c595f8, authored 5 years ago by Mateusz Klimaszewski
Handle multi word tokens during dataset reading.
Parent: 0b7636f8
combo/data/dataset.py (4 additions, 3 deletions)
@@ -79,12 +79,13 @@ class UniversalDependenciesDatasetReader(allen_data.DatasetReader):
     @overrides
     def text_to_instance(self, tree: conllu.TokenList) -> allen_data.Instance:
         fields_: Dict[str, allen_data.Field] = {}
+        tree_tokens = [t for t in tree if isinstance(t["id"], int)]
         tokens = [_Token(t["token"],
                          pos_=t.get("upostag"),
                          tag_=t.get("xpostag"),
                          lemma_=t.get("lemma"),
                          feats_=t.get("feats"))
-                  for t in tree if isinstance(t["id"], int)]
+                  for t in tree_tokens]

         # features
         text_field = allen_fields.TextField(tokens, self._token_indexers)
@@ -94,12 +95,12 @@ class UniversalDependenciesDatasetReader(allen_data.DatasetReader):
         if self.generate_labels:
             for target_name in self._targets:
                 if target_name != "sent":
-                    target_values = [t[target_name] for t in tree.tokens]
+                    target_values = [t[target_name] for t in tree_tokens]
                     if target_name == "lemma":
                         target_values = [allen_data.Token(v) for v in target_values]
                         fields_[target_name] = allen_fields.TextField(target_values, self._lemma_indexers)
                     elif target_name == "feats":
-                        target_values = self._feat_values(tree)
+                        target_values = self._feat_values(tree_tokens)
                         fields_[target_name] = fields.SequenceMultiLabelField(target_values,
                                                                               self._feats_to_index_multi_label,
                                                                               text_field,
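The change above appears to address a length mismatch: the input tokens were already restricted to entries with an integer `id`, while the label values were still read from every entry in the tree, including multi-word token ranges. The following is a minimal, self-contained sketch of that filtering idea, not the project's code. The helper names `parse_id` and `syntactic_words`, the plain-dict sentence representation, and the tuple encoding of range IDs are assumptions made for illustration; the sample sentence is illustrative data only.

```python
from typing import Any, Dict, List, Tuple, Union

# Assumed encoding for this sketch: plain word IDs are ints, while
# multi-word ranges ("1-2") and empty nodes ("2.1") become tuples.
TokenId = Union[int, Tuple[int, str, int]]


def parse_id(raw: str) -> TokenId:
    """Hypothetical helper: parse a CoNLL-U ID column value."""
    for sep in ("-", "."):
        if sep in raw:
            left, right = raw.split(sep)
            return (int(left), sep, int(right))
    return int(raw)


def syntactic_words(sentence: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only entries with an integer id, as the diff does for tree_tokens."""
    return [t for t in sentence if isinstance(t["id"], int)]


# Illustrative sentence: the range line 1-2 covers words 1 and 2.
sentence = [
    {"id": parse_id("1-2"), "token": "vámonos", "lemma": "_"},
    {"id": parse_id("1"), "token": "vamos", "lemma": "ir"},
    {"id": parse_id("2"), "token": "nos", "lemma": "nosotros"},
    {"id": parse_id("3"), "token": "ya", "lemma": "ya"},
]

words = syntactic_words(sentence)

# Inputs and labels are both derived from the same filtered list,
# so their lengths always agree.
tokens = [t["token"] for t in words]
lemmas = [t["lemma"] for t in words]
assert len(tokens) == len(lemmas)
```

Computing the filtered list once and reusing it for both the inputs and every label sequence, as the commit does with `tree_tokens`, keeps the two aligned by construction instead of relying on each list comprehension repeating the same condition.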