Information extraction / poldeepner2
Commit 83dcdfdf
Authored 2 years ago by Michał Marcińczuk

Trim to long tokens.

Parent: 5ea7c9c4
Merge request: !41 Dev v07
Pipeline #6120 failed in 1 minute and 47 seconds

Showing 1 changed file with 3 additions and 2 deletions
poldeepner2/utils/sequences.py (+3 −2)
@@ -112,8 +112,9 @@ class FeatureGenerator:
         labels = ["O"] * len(tokens)
         for word, label_1 in zip(tokens, labels):
             subtokens = self.encode_method(word.strip())
-            # Temporal hack to shorten token to max 14 subtokens
-            subtokens = subtokens[:6]
+            if len(subtokens) > 6:
+                logging.warning(f"Token {word} was truncated to 6 subtokens: {subtokens}")
+                subtokens = subtokens[:6]
             if len(subtokens) == 0:
                 replacement = "x" * len(word.strip())
                 logging.warning(f"Token '{word}' has no subwords. It was replaced with '{replacement}'")
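The change above caps each word at 6 subtokens and logs a warning when the cap takes effect. As a rough standalone illustration (not the project's code), the same idea can be sketched as below. The names `encode_word`, `char_encode` and `MAX_SUBTOKENS` are hypothetical and exist only for this example, and the toy encoder stands in for `self.encode_method`, whose real behavior is not shown in the diff. Unlike the committed code, this sketch logs the already truncated list.

```python
import logging
from typing import Callable, List

# Hypothetical constant for this sketch; the commit hard-codes the value 6.
MAX_SUBTOKENS = 6


def encode_word(
    word: str,
    encode: Callable[[str], List[int]],
    max_subtokens: int = MAX_SUBTOKENS,
) -> List[int]:
    """Encode a word and keep at most max_subtokens subtokens."""
    subtokens = encode(word.strip())
    if len(subtokens) > max_subtokens:
        truncated = subtokens[:max_subtokens]
        # Warn only when something was actually cut off.
        logging.warning(
            "Token %r was truncated from %d to %d subtokens: %s",
            word, len(subtokens), max_subtokens, truncated,
        )
        return truncated
    return subtokens


def char_encode(word: str) -> List[int]:
    """Toy stand-in encoder (an assumption): one subtoken per character."""
    return [ord(c) for c in word]


short_ids = encode_word("kot", char_encode)
long_ids = encode_word("konstantynopolitanczykowianeczka", char_encode)
```

With the toy encoder, `short_ids` keeps all three subtokens, while `long_ids` is cut to the first six and a warning is emitted once.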