Commit 11e898eb, authored 1 year ago by Maja Jablonska
Fix sentence IDs in turns
Parent: b7d60405
No related merge requests found
Pipeline #16841 passed with stage in 24 seconds
Showing 2 changed files with 3 additions and 3 deletions:
combo/data/tokenizers/lambo_tokenizer.py: 2 additions, 2 deletions
pyproject.toml: 1 addition, 1 deletion
combo/data/tokenizers/lambo_tokenizer.py (+2 −2)
@@ -71,9 +71,9 @@ class LamboTokenizer(Tokenizer):
         if split_level.upper() == "TURN":
             for turn in document.turns:
+                _reset_idx()
                 sentence_tokens = []
                 for sentence in turn.sentences:
-                    _reset_idx()
                     for token in sentence.tokens:
                         sentence_tokens.extend(_sentence_tokens(token, split_multiwords))
                 tokens.append(sentence_tokens)
@@ -96,8 +96,8 @@ class LamboTokenizer(Tokenizer):
                     tokens.append(sentence_tokens)
         else:
             for turn in document.turns:
+                _reset_idx()
                 for sentence in turn.sentences:
-                    _reset_idx()
                     for token in sentence.tokens:
                         tokens.extend(_sentence_tokens(token, split_multiwords))
             tokens = [tokens]
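The effect of where the index reset sits can be illustrated with a minimal, self-contained sketch. It assumes that the commit moves the `_reset_idx()` call from the sentence loop to the turn loop, and that token indices start at 1; neither is confirmed by the visible hunks. The names `reset_idx`, `number_turn` and `_counter` are hypothetical stand-ins, not the project's actual helpers.

```python
from itertools import count
from typing import List

# Hypothetical stand-in for the tokenizer's module-level index counter.
# The real _reset_idx() and _sentence_tokens() are not shown in the diff.
_counter = count(1)


def reset_idx() -> None:
    """Restart token numbering from 1 (the starting value is an assumption)."""
    global _counter
    _counter = count(1)


def number_turn(turn: List[List[str]], reset_per_sentence: bool) -> List[int]:
    """Return the indices assigned to every token of one turn.

    `turn` is a list of sentences, each a list of token strings.
    """
    if not reset_per_sentence:
        reset_idx()  # reset once, at the start of the turn
    ids = []
    for sentence in turn:
        if reset_per_sentence:
            reset_idx()  # reset inside the sentence loop
        for _token in sentence:
            ids.append(next(_counter))
    return ids


turn = [["Hi", "."], ["How", "are", "you", "?"]]
old_ids = number_turn(turn, reset_per_sentence=True)
new_ids = number_turn(turn, reset_per_sentence=False)
print(old_ids)  # [1, 2, 1, 2, 3, 4]
print(new_ids)  # [1, 2, 3, 4, 5, 6]
```

When a whole turn is emitted as a single token list, resetting per sentence repeats indices inside that list, while resetting per turn keeps them unique.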
pyproject.toml (+1 −1)
@@ -3,7 +3,7 @@ requires = ["setuptools"]
 [project]
 name = "combo"
-version = "3.2.1"
+version = "3.2.2"
 authors = [
     {name = "Maja Jablonska", email = "maja.jablonska@ipipan.waw.pl"}
 ]