Commit a08dc86d authored 1 year ago by Maja Jablonska
Add multiword support to lambo_tokenizer.py
parent 7ca1bc2a
1 merge request: !46 Merge COMBO 3.0 into master
Showing 1 changed file: combo/data/tokenizers/lambo_tokenizer.py with 8 additions and 1 deletion
@@ -43,9 +43,16 @@ class LamboTokenizer(Tokenizer):
         document = self.__tokenizer.segment(text)
         sentences = []
+        sentence_tokens = []
         for turn in document.turns:
             for sentence in turn.sentences:
-                sentences.append([t.text for t in sentence.tokens])
+                sentence_tokens = []
+                for token in sentence.tokens:
+                    if len(token.subwords) > 0:
+                        sentence_tokens.extend([s for s in token.subwords])
+                    else:
+                        sentence_tokens.append(token.text)
+                sentences.append(sentence_tokens)
         return sentences
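
The effect of this change can be illustrated in isolation. The following is a minimal sketch, not the project's code: `StubToken` and `expand_multiwords` are hypothetical names introduced here, and the sketch assumes that `subwords` holds plain strings, which the diff does not confirm. It mirrors the new loop body: a token with subwords contributes its subwords, any other token contributes its own text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubToken:
    """Hypothetical stand-in for a segmenter token.

    Only the two attributes read in the diff are modelled; `subwords`
    is assumed to be a list of strings.
    """
    text: str
    subwords: List[str] = field(default_factory=list)


def expand_multiwords(tokens: List[StubToken]) -> List[str]:
    """Return token strings, replacing each multiword token by its subwords."""
    sentence_tokens: List[str] = []
    for token in tokens:
        if len(token.subwords) > 0:
            # Multiword token: emit its parts instead of the surface form.
            sentence_tokens.extend(token.subwords)
        else:
            # Ordinary token: emit the surface form unchanged.
            sentence_tokens.append(token.text)
    return sentence_tokens


# Spanish "del" is a contraction of "de" + "el".
sentence = [StubToken("Vengo"), StubToken("del", ["de", "el"]), StubToken("mercado")]
result = expand_multiwords(sentence)
print(result)
```

With the removed line, the same sentence would have produced the surface form `del` as a single token. In the diff, `sentence_tokens` is reset at the start of every sentence, so the initialization added before the outer loop does not affect the returned list.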