Punctuator v2
7 unresolved threads
Release of the second version of the Punctuator service
Merge request reports
Activity
```diff
 [service]
-tool = punctuator_test
+tool = textcleaner_test
```

changed this line in version 2 of the diff
````diff
 ```
 
 ## LPMN
 ```
-filedir(/users/michal.pogoda)|any2txt|punctuator_test
-```
-or
-```
-filedir(/users/michal.pogoda)|any2txt|punctuator_test({"model":"model_name"})
+filedir(/users/michal.pogoda)|any2txt|punctuator
 ```
-where model_name is one of models specified in models_enabled. If no model is provided or requested model is unavailable, actions_base will be used.
 
 ## Mountpoints
-Directory where the model will be downloaded (~500Mb) needs to be mounted at /punctuator/deploy
+Directory where the model will be downloaded (~500Mb) needs to be mounted at `/model/punctuator`. Mount `/model` into directory if you want to make it persitent
````

changed this line in version 2 of the diff
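The old README text quoted above describes a fallback rule: use the requested model if it is one of the enabled models, otherwise fall back to `actions_base`. A minimal sketch of that rule, assuming a `MODELS_MAP` registry and a plain `dict` of task options (both are assumptions based on this thread, not the actual v1 code):

```python
# Hypothetical sketch of the v1 fallback rule; MODELS_MAP contents and
# the task_options shape are assumptions, not the repository's real code.
MODELS_MAP = {
    "actions_base": "deploy/actions_base",    # placeholder paths
    "actions_large": "deploy/actions_large",
}

def select_model(task_options: dict) -> str:
    """Return the requested model name, or 'actions_base' as the fallback."""
    model = task_options.get("model")
    if model in MODELS_MAP:
        return model
    # no model given, or the requested model is not enabled
    return "actions_base"
```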
- tests/test_chunking.py 0 → 100644
- worker.py 100755 → 100644
```diff
-from typing import List
+import json
+import string
 
 import nlp_ws
-import torch
+from transformers import AutoModelForTokenClassification, AutoTokenizer
 
-from src.utils import input_preprocess, output_preprocess
+from punctuator.punctuator import (combine_masks, decode, decode_labels,
+                                   inference_masks)
 
 
-class Worker(nlp_ws.NLPWorker):
-    """Class that implements example worker."""
+def preprocess_input(text: str):
```

changed this line in version 2 of the diff
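The diff introduces a `preprocess_input` helper, but its body is not visible in this thread. A minimal sketch of what such a helper typically does for a punctuation-restoration service (strip existing punctuation and case so the model has to restore them), using the `string` import added in the diff; the exact behaviour is an assumption, not the MR's implementation:

```python
import string

def preprocess_input(text: str) -> str:
    """Sketch only: normalize raw text before punctuation restoration.
    The real preprocess_input in the MR is not shown in this thread."""
    # drop existing punctuation so the model can re-insert it
    text = text.translate(str.maketrans('', '', string.punctuation))
    # collapse whitespace and lowercase
    return ' '.join(text.lower().split())
```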
- worker.py 100755 → 100644
```
#!/usr/bin/python
```

changed this line in version 2 of the diff
```diff
 envlist = unittest,pep8
 skipsdist = True
 
-[testenv]
-deps = -rrequirements.txt
-
-[testenv:unittest]
-commands = pytest --ignore data --ignore generated
-
-
 [flake8]
 exclude =
```

Some of those paths are not valid anymore
Edited by Mateusz Gniewkowski
- worker.py 100755 → 100644
```diff
+        tokenized = self.tokenizer(text, return_tensors='pt')
 
-        if (
-            "model" in task_options.keys()
-            and task_options["model"] in MODELS_MAP.keys()
-        ):
-            model_type = task_options["model"]
-        else:
-            model_type = "actions_base"
+        num_tokens = len(tokenized['input_ids'][0])
 
-        with open(input_file, "r") as f:
-            text = input_preprocess(output_preprocess(f.read()))
+        # TODO: Consider adding batching support
+        results = []
+        for inference_mask, mask_mask in zip(*inference_masks(num_tokens, self.max_context_size, self.overlap)):
```

changed this line in version 2 of the diff
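The `inference_masks` implementation itself is not shown in this thread (only its call site and `tests/test_chunking.py` appear in the MR). A minimal sketch of the sliding-window idea it suggests, splitting a long token sequence into overlapping windows and keeping each token from exactly one window; the function name and the `max_context_size`/`overlap` parameters come from the diff, while the return format here is an assumption:

```python
def inference_masks(num_tokens, max_context_size, overlap):
    """Sketch: split num_tokens into overlapping inference windows.

    Returns (windows, keep_masks). Each window is a (start, end) token
    span to feed the model; each keep_mask marks which positions inside
    that window survive merging, so every token is kept exactly once.
    Assumes 0 <= overlap < max_context_size.
    """
    if num_tokens <= max_context_size:
        return [(0, num_tokens)], [[True] * num_tokens]
    step = max_context_size - overlap
    windows, keep_masks = [], []
    start = 0
    while True:
        end = min(start + max_context_size, num_tokens)
        # interior edges hand half of the overlap to each neighbour
        keep_from = start if start == 0 else start + overlap // 2
        keep_to = end if end == num_tokens else end - (overlap - overlap // 2)
        windows.append((start, end))
        keep_masks.append([keep_from <= t < keep_to for t in range(start, end)])
        if end == num_tokens:
            break
        start += step
    return windows, keep_masks
```

Merging is then just concatenating, per window, the model outputs at the kept positions.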
added 1 commit
- 0d3e1b42 - Style fixes (88 lines limit), moved model into /home/worker/model
mentioned in commit e4c02c2e