Commit 5ea7c9c4, authored 2 years ago by Michał Marcińczuk
Trim too long tokens.
Parent: 6da40619
Merge request: !41 Dev v07
Pipeline #6119 failed in 3 minutes and 4 seconds
poldeepner2/utils/sequences.py (4 additions, 1 deletion)
@@ -112,6 +112,8 @@ class FeatureGenerator:
             labels = ["O"] * len(tokens)
         for word, label_1 in zip(tokens, labels):
             subtokens = self.encode_method(word.strip())
+            # Temporal hack to shorten token to max 14 subtokens
+            subtokens = subtokens[:6]
             if len(subtokens) == 0:
                 replacement = "x" * len(word.strip())
                 logging.warning(f"Token '{word}' has no subwords. It was replaced with '{replacement}'")
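The hunk above caps each word at a fixed number of subtokens before the empty-subtoken check runs. Note that the added comment mentions 14 subtokens while the slice keeps 6. A minimal, self-contained sketch of the same idea follows; `truncate_subtokens`, `toy_encode`, and `MAX_SUBTOKENS_PER_WORD` are hypothetical names used only for illustration and are not part of poldeepner2.

```python
from typing import Callable, List

# Hypothetical constant; 6 mirrors the slice `subtokens[:6]` in the diff.
MAX_SUBTOKENS_PER_WORD = 6


def truncate_subtokens(
    word: str,
    encode: Callable[[str], List[int]],
    limit: int = MAX_SUBTOKENS_PER_WORD,
) -> List[int]:
    """Encode a word and keep at most `limit` leading subtokens."""
    return encode(word.strip())[:limit]


def toy_encode(word: str) -> List[int]:
    # Toy stand-in for a real subword tokenizer: one id per character.
    return [ord(ch) for ch in word]
```

With the toy encoder, `truncate_subtokens("internationalization", toy_encode)` returns only the first six ids, while shorter words pass through unchanged. Truncation silently drops the tail of a long word, so a named limit makes the trade-off easier to spot than an inline literal.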
@@ -225,7 +227,8 @@ class FeatureGeneratorWindowContext(FeatureGenerator):
                 sentence_end = 1 if idx + 1 == len(sentence_tokens_features.tokens) else 0
                 if token_features.length() + 1 > self.max_segment_length:
-                    raise Exception("Single token has move subtokens than the max_segment_length limit.")
+                    raise Exception(f"Single token has more subtokens than the max_segment_length limit. "
+                                    f"Token: {token_features.tokens}. Length: {token_features.length()}")
                 if token_features.length() + segment_features.length() + sentence_end > self.max_segment_length:
                     segment_features = SegmentFeatures()
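The second hunk sits inside a loop that packs tokens into segments bounded by `max_segment_length` and rejects any single token that cannot fit. A simplified sketch of that pattern is shown below, assuming each token is represented only by its subtoken count. `pack_segments` is a hypothetical helper, it omits the `sentence_end` bookkeeping of the original, and it raises `ValueError` rather than a bare `Exception`.

```python
from typing import List


def pack_segments(token_lengths: List[int], max_segment_length: int) -> List[List[int]]:
    """Greedily group token indices into segments of bounded subtoken length."""
    segments: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for idx, length in enumerate(token_lengths):
        # Same guard as in the diff: token length plus one must fit the limit.
        if length + 1 > max_segment_length:
            raise ValueError(
                f"Token {idx} has {length} subtokens, "
                f"which exceeds max_segment_length={max_segment_length}."
            )
        # Start a new segment when the token would overflow the current one.
        if current and current_len + length > max_segment_length:
            segments.append(current)
            current = []
            current_len = 0
        current.append(idx)
        current_len += length
    if current:
        segments.append(current)
    return segments
```

Including the offending token index and its length in the error message, as the commit does with the token itself, makes an oversized token much easier to trace back to the input.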