nlpworkers/tokenizer: Repository graph
Update .gitlab-ci.yml (master)
Merge branch 'develop' of gitlab.clarin-pl.eu:nlpworkers/tokenizer into develop (develop)
New code based on PPKZ data.
Merge branch 'master' into develop
Merge branch 'develop' into 'master'
Develop
Merge branch 'master' into 'develop'
Changed default values for options.
Additional letter emoticons, additional letters in rm_add_char=special.
Added another letter emote.
Forgot to run tox locally.
Added an option to delete ASCII emoticons and to delete multiple end-of-sentence characters.
Merge branch 'develop' into 'master'
New functionalities
Changed one regex to have a list of whitespace characters instead of a space.
Changed config.ini, added a space in a regex.
pep8 new line before or
Fixed a problem with returning an empty list when there was a line in memory.
Deleted unnecessary function calls.
Changed and added more comments. Fixed pep8, docstring problems.
Removing mistyped listings now searches both in lines and in sentences.
Fixed not detecting multiple wrong listings in a row.
Basic handling of mistyped listings.
Basic handling of mistyped listings.
Added to 'special' to remove Arabic.
Changed behaviour of deleting special characters to delete more cases.
Added handling of unicode characters, changed link handling.
Pep8 deleted whitespace from blank line.
Added an empty line for docstyle.
Added missing assignment in punctuation = leave link = token.
Fixed deleting periods at the end of sentence.
Fixed removing both punctuation and listings.
Fixed typo in _change_indices.
Added handling of more types of listings, added indices handling, program now removes empty lines.
Merge branch 'develop' into 'master'
Changed the way punctuation is removed. Now, by default, links, e-mails, and words such as e-mail do not have punctuation removed.
Add README.md