nlpworkers / anonymizer
Commit 010d4760, authored 2 years ago by Michał Pogoda
Support toggling of first morpho subtag removal
Parent: f99a6a2e
3 merge requests: !10 Anonimizer v2, !9 Fix infancy erorrs based on Magdalena's report, !7 Better coverage
Showing 1 changed file with 9 additions and 2 deletions
src/dictionaries/morphosyntactic/ner_file_nkjp.py (+9, -2)
@@ -7,9 +7,13 @@ from src.dictionaries.morphosyntactic.ner_file import NERFileMorphosyntacticDict
 class NERFileNKJPMorphosyntacticDictionary(NERFileMorphosyntacticDictionary):
     def __init__(
-        self, dictionary_path: Optional[str] = None, always_replace=True
+        self,
+        dictionary_path: Optional[str] = None,
+        always_replace=True,
+        remove_first_morpho_subtag=True
     ) -> None:
         super().__init__(dictionary_path, always_replace)
+        self._remove_first_morpho_subtag = remove_first_morpho_subtag

     def get_random_replacement(self, original_entry: Detection) -> Optional[str]:
         original_entry_type = type(original_entry)
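Because the new keyword defaults to True, call sites that do not pass it should keep the earlier behaviour. A minimal sketch of that pattern, using stand-in classes (`BaseDictionary` and `NKJPDictionary` are hypothetical names for illustration, not the project's real classes):

```python
from typing import Optional


class BaseDictionary:
    """Hypothetical stand-in for NERFileMorphosyntacticDictionary."""

    def __init__(
        self, dictionary_path: Optional[str] = None, always_replace: bool = True
    ) -> None:
        self._dictionary_path = dictionary_path
        self._always_replace = always_replace


class NKJPDictionary(BaseDictionary):
    """Hypothetical stand-in mirroring the constructor change in this commit."""

    def __init__(
        self,
        dictionary_path: Optional[str] = None,
        always_replace: bool = True,
        remove_first_morpho_subtag: bool = True,
    ) -> None:
        super().__init__(dictionary_path, always_replace)
        # Stored on the instance so later lookups can consult the toggle.
        self._remove_first_morpho_subtag = remove_first_morpho_subtag


legacy = NKJPDictionary()  # existing call sites: subtag removal stays on
opt_out = NKJPDictionary(remove_first_morpho_subtag=False)  # new opt-out
```

Appending the parameter at the end of the signature also keeps existing positional calls valid.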
@@ -19,7 +23,10 @@ class NERFileNKJPMorphosyntacticDictionary(NERFileMorphosyntacticDictionary):

         if issubclass(original_entry_type, MorphosyntacticInfoMixin):
             # THAT IS A HACK FOR NOW FOR CORRUPTED NKJP TAGS IN DICTIONARY
-            morpho_tag = ":".join(original_entry.morpho_tag.split(":")[1:])
+            if self._remove_first_morpho_subtag:
+                morpho_tag = ":".join(original_entry.morpho_tag.split(":")[1:])
+            else:
+                morpho_tag = original_entry.morpho_tag

             if (
                 original_entry_type_name in self._dictionary
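The effect of the toggle on a single tag can be sketched in isolation. The helper name `select_morpho_tag` and the sample tags below are assumptions made for illustration; only the split/join expression is taken from the diff:

```python
def select_morpho_tag(morpho_tag: str, remove_first_subtag: bool = True) -> str:
    """Return the tag used for lookup, optionally dropping its first subtag."""
    if remove_first_subtag:
        # Same expression as in the diff: drop everything up to the first ":".
        return ":".join(morpho_tag.split(":")[1:])
    return morpho_tag


print(select_morpho_tag("subst:sg:nom:m1"))                             # sg:nom:m1
print(select_morpho_tag("subst:sg:nom:m1", remove_first_subtag=False))  # subst:sg:nom:m1
print(repr(select_morpho_tag("interp")))                                # ''
```

Note that a tag with no ":" separator collapses to an empty string when removal is enabled, which may be worth keeping in mind when the dictionary contains single-part tags.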