Commit e4006d1a, authored 2 years ago by Michał Marcińczuk
For a token that is split into a large number of subtokens, try to tokenize its lowercased form.
parent 83dcdfdf
Merge request: !41 Dev v07
Pipeline #6121 failed in 1 minute and 46 seconds
poldeepner2/utils/sequences.py (2 additions, 0 deletions)
...
@@ -112,6 +112,8 @@ class FeatureGenerator:
         labels = ["O"] * len(tokens)
     for word, label_1 in zip(tokens, labels):
         subtokens = self.encode_method(word.strip())
+        if len(subtokens) > 6:
+            subtokens = self.encode_method(word.strip().lower())
         if len(subtokens) > 6:
             logging.warning(f"Token {word} was truncated to 6 subtokens: {subtokens}")
             subtokens = subtokens[:6]
...
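The change above can be tried in isolation with a minimal sketch like the one below. It is not the project's code: `encode_with_lowercase_fallback` and `toy_encode` are hypothetical names introduced only for illustration, and `toy_encode` is a made-up stand-in for `self.encode_method` that splits every uppercase letter into its own piece, so uppercase words produce many subtokens. Only the limit of 6 and the order of the steps (encode, retry on the lowercased form, then truncate with a warning) are taken from the diff.

```python
import logging
from typing import Callable, List

MAX_SUBTOKENS = 6  # limit used in the diff


def toy_encode(text: str) -> List[str]:
    """Hypothetical encoder: each uppercase letter is its own piece,
    all other characters are grouped into chunks of up to 3."""
    pieces: List[str] = []
    run = ""
    for ch in text:
        if ch.isupper():
            if run:
                pieces.append(run)
                run = ""
            pieces.append(ch)
        else:
            run += ch
            if len(run) == 3:
                pieces.append(run)
                run = ""
    if run:
        pieces.append(run)
    return pieces


def encode_with_lowercase_fallback(
    word: str,
    encode: Callable[[str], List[str]],
    max_subtokens: int = MAX_SUBTOKENS,
) -> List[str]:
    """Encode a word; if it yields too many subtokens, retry on the
    lowercased form, and truncate only if that is still too long."""
    stripped = word.strip()
    subtokens = encode(stripped)
    if len(subtokens) > max_subtokens:
        # Fallback added by the commit: try the lowercased form.
        subtokens = encode(stripped.lower())
    if len(subtokens) > max_subtokens:
        logging.warning(
            f"Token {word} was truncated to {max_subtokens} subtokens: {subtokens}"
        )
        subtokens = subtokens[:max_subtokens]
    return subtokens


if __name__ == "__main__":
    print(encode_with_lowercase_fallback(" WARSZAWA ", toy_encode))
    print(encode_with_lowercase_fallback("Kot", toy_encode))
```

With this toy encoder, an all-uppercase word exceeds the limit and is re-encoded in lowercase, while a short mixed-case word keeps its original casing because the fallback is never triggered. How much the fallback helps with a real tokenizer depends on its vocabulary, which this sketch does not model.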