Unverified Commit e800454d authored by Marcin Wątroba's avatar Marcin Wątroba

Change model and update pipeline

parent e7a1f7ac
1 merge request: !13 Change data model
asr-benchmarks==0.0.1-alpha.48
speechbrain
FROM python:3.9-slim
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get dist-upgrade -y \
&& apt-get install -y --no-install-recommends \
build-essential \
portaudio19-dev \
openssh-client \
python3-pip \
python3-dev \
&& apt-get clean \
&& rm -fr /var/lib/apt/lists/* \
&& rm -fr /var/cache/apt/*
ADD ./python /dictation_client
WORKDIR /dictation_client
RUN pip3 install -i https://pypi.clarin-pl.eu/simple -r requirements.txt
CMD ["./run_web_service.sh"]
version: "3.8"
services:
techmo_asr:
image: docker-registry.theliver.pl/techmo-asr:1.1
container_name: techmo_asr
environment:
- TECHMO_SSH_SERVER_USERNAME=mwatroba
- TECHMO_SSH_SERVER_URL=jankocon.clarin-pl.eu
- TECHMO_SERVER_SSH_PORT=9222
- TECHMO_REMOTE_SERVICE_PORT=12321
- TECHMO_SERVER_URL=156.17.135.34
- AUTH_TOKEN=test1234
volumes:
- /Users/marcinwatroba/Desktop/WUST/KEYS/techmo_asr_server:/keys
#!/bin/bash
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
docker build -t techmo-asr "$SCRIPT_DIR"
docker tag techmo-asr docker-registry.theliver.pl/techmo-asr:1.1
docker push docker-registry.theliver.pl/techmo-asr:1.1
# Python implementation of Dictation ASR gRPC client.
## Docker usage
### Build docker image
To prepare a Docker image with the Python implementation of the Dictation Client, open the project's main directory and run the following command:
```
docker build -f Dockerfile-python -t dictation-client-python:2.3.0 .
```
The build process takes several minutes.
When it is complete, you will see the following message:
```
Successfully tagged dictation-client-python:2.3.0
```
### Run Dictation client
To use the Dictation Client in a Docker container, go to the `dictation-client/python/docker` directory and run the `run_dictation_client_python.sh` script.
To send a simple request to the Dictation service, use:
```
./run_dictation_client_python.sh --service-address IP_ADDRESS:PORT --filename WAV_FILE_NAME
```
To print the list of available options, use:
```
./run_dictation_client_python.sh --help
```
Audio files to be transcribed should be placed inside the `dictation-client/python/docker/wav` directory.
If TLS is used, the credentials should be placed inside the `dictation-client/python/docker/tls` directory.
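For example, to send a request over a TLS-secured channel once the credential files are in place (address and file name are placeholders):
```
./run_dictation_client_python.sh --service-address IP_ADDRESS:PORT --filename WAV_FILE_NAME --tls
```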
## Local instance usage
### Basic Usage
The Dictation Client includes scripts for automatic environment configuration and launch on Debian-family Linux systems. To launch the Dictation Client on other Linux-based systems or on Windows, check out the "Manual Usage" section.
#### Before run
To install the required dependencies and prepare the virtual environment, run:
```
./setup.sh
```
#### Run
To run the Dictation Client, use the `run.sh` script, e.g.:
```
./run.sh --service-address IP_ADDRESS:PORT --wave-path INPUT_WAVE
```
To print the usage description, use:
```
./run.sh --help
```
### Manual Usage
#### Before run
##### Submodules
After cloning the git repository, download the submodules:
```
git submodule update --init --recursive
```
(This command has to be invoked from the project's root directory.)
If you are not using git, you have to download the `googleapis` submodule manually.
To do this, open the project repository in a web browser, go to the `submodules` directory and use the link located there to open the relevant commit of the googleapis repository. Then download it, unpack it and copy all files to the `submodules/googleapis` directory.
##### Dependencies
If you don't have virtualenv yet, install it first (https://virtualenv.pypa.io/en/stable/installation.html).
On Debian/Ubuntu it can be installed with the `setup.sh` script.
Then install the required dependencies inside the virtual environment (this step is only needed the first time; afterwards it is enough to reuse the existing virtual environment).
- On Linux:
Use Python 3 with a virtual environment and install the required packages (supported Python versions: 3.5, 3.6, 3.7, 3.8, 3.9):
```
virtualenv -p python3 .env
source .env/bin/activate
pip install -r requirements.txt
```
- On Windows 10:
Temporarily change PowerShell's execution policy to allow scripting. Start PowerShell with `Run as Administrator` and run:
```
Set-ExecutionPolicy RemoteSigned
```
then confirm your choice.
Use Python 3 with a virtual environment and install the required packages (supported Python versions: 3.5, 3.6, 3.7, 3.8, 3.9):
```
virtualenv -p python3 .env
.\.env\Scripts\activate
pip install -r requirements.txt
```
To switch PowerShell's execution policy back to the default, run:
```
Set-ExecutionPolicy Restricted
```
##### Proto sources
[Optional] To regenerate the sources from `.proto`, run:
```
./make_proto.sh
```
This might be required when using a different gRPC or Protocol Buffers version.
#### Run
To run the Dictation Client, activate the virtual environment first:
- On Linux:
```
source .env/bin/activate
```
- On Windows:
```
.\.env\Scripts\activate
```
Then run the Dictation Client. Sample use:
```
python dictation_client.py --service-address "192.168.1.1:4321" --wave-path audio.wav
```
For each request you have to provide the service address and an audio source (a wav file or the microphone).
## Usage:
```
Basic usage: dictation_client.py --service-address ADDRESS --wave-path WAVE
```
Available options:
```
-h, --help show this help message and exit
--service-address ADDRESS
IP address and port (address:port) of a service the
client will connect to.
--ssl-dir SSL_DIRECTORY
If set to a path with ssl credential files
(client.crt, client.key, ca.crt), use ssl
authentication. Otherwise use insecure channel
(default).
--wave-path WAVE Path to wave file with speech to be recognized. Should
be mono, 8kHz or 16kHz.
--mic Use microphone as an audio source (instead of wave
file).
--session-id SESSION_ID
Session ID to be passed to the service. If not
specified, the service will generate a default session
ID itself.
--grpc-timeout GRPC_TIMEOUT
Timeout in milliseconds used to set gRPC deadline -
how long the client is willing to wait for a reply
from the server. If not specified, the service will
set the deadline to a very large number.
--max-alternatives MAX_ALTERNATIVES
Maximum number of recognition hypotheses to be
returned.
--time-offsets If set - the recognizer will return also word time
offsets.
--single-utterance If set - the recognizer will detect a single spoken
utterance.
--interim-results If set - messages with temporal results will be shown.
--no-input-timeout NO_INPUT_TIMEOUT
MRCP v2 no input timeout [ms].
--speech-complete-timeout SPEECH_COMPLETE_TIMEOUT
MRCP v2 speech complete timeout [ms].
--speech-incomplete-timeout SPEECH_INCOMPLETE_TIMEOUT
MRCP v2 speech incomplete timeout [ms].
--recognition-timeout RECOGNITION_TIMEOUT
MRCP v2 recognition timeout [ms].
--context-phrase CONTEXT_PHRASE
Specifies which context model to use.
```
## Troubleshooting
### Dependencies
If the dependency installation fails with a message similar to this one:
```
src/_portaudiomodule.c:28:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
```
it means that the `python3-dev` package is missing.
On Debian/Ubuntu this package can be installed with the `setup.sh` script.
If the dependency installation fails with a message similar to this one:
```
src/_portaudiomodule.c:29:10: fatal error: portaudio.h: No such file or directory
#include "portaudio.h"
^~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
```
it means that the PortAudio library is missing.
PortAudio can be downloaded from: http://www.portaudio.com/download.html
On Debian/Ubuntu this package can be installed with the `setup.sh` script.
### Microphone
To use a microphone as the audio source instead of a wav file, use the `--mic` option.
It sends audio data directly from the microphone, but it does not indicate when the recognition should finish.
For this reason, in most cases `--mic` should be combined with the `--single-utterance` option, which stops the recognition after the first detected utterance.
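For example (the service address below is a placeholder; use your own):
```
python dictation_client.py --service-address "192.168.1.1:4321" --mic --single-utterance
```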
If the only output you receive is:
```
Received speech event type: END_OF_SINGLE_UTTERANCE
```
check if your microphone is connected and properly configured.
### ALSA Configuration
On Linux systems using the Advanced Linux Sound Architecture (ALSA), minor configuration changes may be necessary before the first use.
If you get the following output after running a request:
```
Dictation ASR gRPC client 2.3.0
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
```
that means you need to modify the audio interface configuration.
In that case, open the `/usr/share/alsa/alsa.conf` file with root privileges, e.g.:
```
sudo vim /usr/share/alsa/alsa.conf
```
In the `# PCM interface` section, find and comment out (using `#`) all lines that define the interfaces reported as 'Unknown':
```
pcm.rear cards.pcm.rear
pcm.center_lfe cards.pcm.center_lfe
pcm.side cards.pcm.side
```
To get rid of the warnings, also comment out several lines below, starting with `pcm.surround`.
Then save and close the file.
### FFmpeg
If the FFmpeg framework is not installed, the following warning appears in the program output:
```
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
```
Installing the FFmpeg framework is not necessary to run the application, but it may be useful for anyone working with sound files.
FFmpeg can be downloaded from: https://ffmpeg.org/download.html
On Ubuntu/Debian you can install FFmpeg directly from the official repositories.
DICTATION_CLIENT_VERSION = '2.3.0'
#!/usr/bin/python3
from argparse import ArgumentParser
from typing import Tuple, List
from sziszapangma.integration.service_core.asr.asr_result import WordTimeAlignment
from VERSION import DICTATION_CLIENT_VERSION
from service.dictation_settings import DictationSettings
from service.streaming_recognizer import StreamingRecognizer
from utils.audio_source import AudioStream
from utils.mic_source import MicrophoneStream
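# Helpers: convert a protobuf Duration (seconds + nanos) to integer milliseconds
# and wrap a (start_time, end_time) pair into a WordTimeAlignment.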
def duration_to_normalised_millis(duration):
return int((duration.seconds * 1000000000 + duration.nanos) / 1000000)
def word_duration_to_dict(duration_word) -> WordTimeAlignment:
return WordTimeAlignment(duration_to_normalised_millis(duration_word[0]),
duration_to_normalised_millis(duration_word[1]))
def create_audio_stream(args):
# create wave file stream
if args.wave is not None:
return AudioStream(args.wave)
# create microphone stream
if args.mic:
rate = 16000 # [Hz]
chunk = int(rate / 10) # [100 ms]
return MicrophoneStream(rate, chunk)
# default
raise ValueError("Unknown media source to create")
def recognise(wav_path: str, return_time_offsets: bool) -> Tuple[str, List[WordTimeAlignment]]:
print("Dictation ASR gRPC client " + DICTATION_CLIENT_VERSION)
parser = ArgumentParser()
parser.add_argument("--service-address", dest="address", required=False,
default="127.0.0.1:12321",
help="IP address and port (address:port) of a service the client will connect to.",
type=str)
parser.add_argument("--ssl-dir", dest="ssl_directory", default="",
help="If set to a path with ssl credential files (client.crt, client.key, ca.crt), use ssl authentication. Otherwise use insecure channel (default).",
type=str)
parser.add_argument("--wave-path", dest="wave",
help="Path to wave file with speech to be recognized. Should be mono, 8kHz or 16kHz.",
default=wav_path)
parser.add_argument("--mic", help="Use microphone as an audio source (instead of wave file).",
action='store_true')
parser.add_argument("--session-id",
help="Session ID to be passed to the service. If not specified, the service will generate a default session ID itself.",
default="", type=str)
parser.add_argument("--grpc-timeout",
help="Timeout in milliseconds used to set gRPC deadline - how long the client is willing to wait for a reply from the server. If not specified, the service will set the deadline to a very large number.",
default=0, type=int)
# request configuration section
parser.add_argument("--max-alternatives",
help="Maximum number of recognition hypotheses to be returned.",
default=1, type=int)
parser.add_argument("--time-offsets",
help="If set - the recognizer will return also word time offsets.",
action="store_true", default=return_time_offsets)
parser.add_argument("--single-utterance",
help="If set - the recognizer will detect a single spoken utterance.",
action="store_true", default=False)
parser.add_argument("--interim-results",
help="If set - messages with temporal results will be shown.",
action="store_true", default=False)
# timeouts
parser.add_argument("--no-input-timeout", help="MRCP v2 no input timeout [ms].", default=5000,
type=int)
parser.add_argument("--speech-complete-timeout", help="MRCP v2 speech complete timeout [ms].",
default=2000,
type=int)
parser.add_argument("--speech-incomplete-timeout",
help="MRCP v2 speech incomplete timeout [ms].", default=4000,
type=int)
parser.add_argument("--recognition-timeout", help="MRCP v2 recognition timeout [ms].",
default=10000, type=int)
parser.add_argument("--context-phrase", help="Specifies which context model to use.",
default="", type=str)
# Stream audio to the ASR engine and print all hypotheses to standard output
args = parser.parse_args()
print('args')
print(args)
# if args.wave is not None or args.mic:
with create_audio_stream(args) as stream:
settings = DictationSettings(args)
recognizer = StreamingRecognizer(args.address, args.ssl_directory, settings)
print('Recognizing...')
results = recognizer.recognize(stream)
print(results)
return results[0]['transcript'], [
word_duration_to_dict(it) for it in results[0]['alignment']]
#!/bin/bash
# coding=utf-8
# This script sends a request to the dictation service using the dictation client inside a docker container
# Requires the "dictation-client-python:2.3.0" docker image to be loaded locally
set -euo pipefail
IFS=$'\n\t'
SCRIPT=$(realpath "$0")
SCRIPTPATH=$(dirname "${SCRIPT}")
docker_image="dictation-client-python:2.3.0"
usage() {
echo "
Dictation ASR gRPC client 2.3.0
-h, --help show this help message and exit
-s=ADDRESS, --service-address=ADDRESS
IP address and port (address:port) of a service the client will connect to.
-f=WAVE, --filename=WAVE
Name of the wave file with speech to be recognized. File should be inside 'wav' directory. Should be mono, 8kHz or 16kHz.
-m, --mic Use microphone as an audio source (instead of wave file).
--tls If set, uses tls authentication, otherwise use insecure channel (default). The tls credential files (client.crt, client.key, ca.crt) should be placed inside 'tls' directory.
--session-id=SESSION_ID
Session ID to be passed to the service. If not specified, the service will generate a default session ID itself.
--grpc-timeout=GRPC_TIMEOUT
Timeout in milliseconds used to set gRPC deadline - how long the client is willing to wait for a reply from the
server. If not specified, the service will set the deadline to a very large number.
--max-alternatives=MAX_ALTERNATIVES
Maximum number of recognition hypotheses to be returned.
--time-offsets If set - the recognizer will return also word time offsets.
--single-utterance If set - the recognizer will detect a single spoken utterance.
--interim-results If set - messages with temporal results will be shown.
--no-input-timeout=NO_INPUT_TIMEOUT
MRCP v2 no input timeout [ms].
--speech-complete-timeout=SPEECH_COMPLETE_TIMEOUT
MRCP v2 speech complete timeout [ms].
--speech-incomplete-timeout=SPEECH_INCOMPLETE_TIMEOUT
MRCP v2 speech incomplete timeout [ms].
--recognition-timeout=RECOGNITION_TIMEOUT
MRCP v2 recognition timeout [ms].
--context-phrase=CONTEXT_PHRASE
Specifies which context model to use.
"
}
optspec=":fhms-:"
while getopts "f:hms:-:" optchar; do
case "${optchar}" in
-)
case "${OPTARG}" in
help)
usage; exit 0
;;
tls)
opts+=( "--ssl-dir" "/volumen/tls" )
;;
time-offsets)
opts+=( "--time-offsets" )
;;
single-utterance)
opts+=( "--single-utterance" )
;;
interim-results)
opts+=( "--interim-results" )
;;
mic)
opts+=("--mic")
;;
filename=*)
val=${OPTARG#*=}
opt=${OPTARG%=$val}
opts+=( "--wave-path" "/volumen/wav/${val##*/}" )
;;
*=*)
val=${OPTARG#*=}
opt=${OPTARG%=$val}
opts+=( "--$opt" "$val" )
;;
*)
if [ "$OPTERR" = 1 ] && [ "${optspec:0:1}" != ":" ]; then
echo "Unknown option --${OPTARG}" >&2
fi
;;
esac;;
f)
val=${OPTARG#*=}
opt=${OPTARG%=$val}
opts+=( "--wave-path" "/volumen/wav/${val##*/}" )
;;
h)
usage; exit 0
;;
m)
opts+=("--mic")
;;
s)
val=${OPTARG#*=}
opt=${OPTARG%=$val}
opts+=( "--service-address" "${val}" )
;;
*)
if [ "$OPTERR" != 1 ] || [ "${optspec:0:1}" = ":" ]; then
echo "Non-option argument: '-${OPTARG}'" >&2
fi
;;
esac
done
docker run --rm -it -v "${SCRIPTPATH}:/volumen" --network host "${docker_image}" \
python3 /dictation_client/dictation_client.py "${opts[@]}"
import os
import socket
from sziszapangma.integration.service_core.asr.asr_base_processor import AsrBaseProcessor
from sziszapangma.integration.service_core.asr.asr_result import AsrResult
from dictation_client import recognise
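# Processor that transcribes audio files with the remote Techmo dictation service.
# If the local SSH tunnel (port 12321) is not open yet, it is started via ./start_tunneling.sh.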
class TechmoAsrProcessor(AsrBaseProcessor):
@staticmethod
def is_tunnel_running() -> bool:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
return s.connect_ex(('localhost', 12321)) == 0
def process_asr(self, audio_file_path: str) -> AsrResult:
print(f'processing start {audio_file_path}')
if not self.is_tunnel_running():
os.system('./start_tunneling.sh')
raw_transcription, words_time_alignment = recognise(audio_file_path, True)
transcription = raw_transcription.replace('\n', ' ')
words = [
it
for it in transcription.split(' ')
if it not in ['', ' ']
]
asr_result = AsrResult(words, transcription, words_time_alignment)
print(f'processing end {audio_file_path}, {asr_result}')
return asr_result
if __name__ == '__main__':
TechmoAsrProcessor().start_processor()
#!/bin/bash
# coding=utf-8
set -eo pipefail
virtualenv -p python3 proto_env
# shellcheck disable=SC1091
source proto_env/bin/activate
pip install grpcio-tools==1.7.0
function cleanup() {
# shellcheck disable=SC1091
rm -rf proto_env
}
trap cleanup EXIT
echo "Generating dictation Python protobuf/grpc sources."
path_i="../proto"
path_o="service"
python3 -m grpc_tools.protoc \
-I${path_i} \
-I../submodules/googleapis \
--python_out=${path_o} \
--grpc_python_out=${path_o} \
${path_i}/dictation_asr.proto
# Fix buggy autogenerated GRPC import
sed -i 's/.*import dictation_asr_pb2 as dictation__asr__pb2.*/from . import dictation_asr_pb2 as dictation__asr__pb2/' ${path_o}/dictation_asr_pb2_grpc.py
import os
_TECHMO_SSH_SERVER_KEY = 'TECHMO_SSH_SERVER_KEY'
_TECHMO_KEY_FILE = './techmo_key_file'
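# Write the SSH private key from the environment variable to a local file, restoring escaped newlines.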
if __name__ == '__main__':
key_content = os.environ[_TECHMO_SSH_SERVER_KEY].replace('\\n', '\n')
with open(_TECHMO_KEY_FILE, 'w') as f:
f.write(key_content)
setuptools==50.3.2
grpcio==1.37.0
grpcio-tools==1.37.0
protobuf==3.12.2
gapic-google-cloud-speech-v1==0.15.3
grpc-google-cloud-speech-v1==0.8.1
proto-google-cloud-speech-v1==0.15.3
google-auth==1.21.1
google-auth-httplib2==0.0.3
google-cloud-core==1.0.2
google-cloud-speech==1.0.0
googleapis-common-protos==1.6.0
httplib2==0.14.0
oauth2client==2.0.0
pydub==0.23.1
pyaudio==0.2.11
asr-benchmarks==0.0.1-alpha.48
#!/bin/bash
# coding=utf-8
# This script sends a request to the dictation service using the Python dictation client
# Before using this script, run 'setup.sh' to check dependencies and prepare the virtual environment
set -euo pipefail
IFS=$'\n\t'
SCRIPT=$(realpath "$0")
SCRIPTPATH=$(dirname "${SCRIPT}")
source "${SCRIPTPATH}/.env/bin/activate"
export PYTHONIOENCODING=utf8
python3 "${SCRIPTPATH}/dictation_client.py" "$@"
#!/bin/bash
#python -u prepare_key.py
#
#chmod 600 ./techmo_key_file
#./start_tunneling.sh
python -u main.py
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
import grpc
from . import dictation_asr_pb2 as dictation__asr__pb2
class SpeechStub(object):
"""Service that implements Google Cloud Speech API extended by Techmo.
"""
def __init__(self, channel):
"""Constructor.
Args:
channel: A grpc.Channel.
"""
self.Recognize = channel.unary_unary(
'/google.cloud.speech.v1.Speech/Recognize',
request_serializer=dictation__asr__pb2.RecognizeRequest.SerializeToString,
response_deserializer=dictation__asr__pb2.RecognizeResponse.FromString,
)
self.StreamingRecognize = channel.stream_stream(
'/google.cloud.speech.v1.Speech/StreamingRecognize',
request_serializer=dictation__asr__pb2.StreamingRecognizeRequest.SerializeToString,
response_deserializer=dictation__asr__pb2.StreamingRecognizeResponse.FromString,
)
class SpeechServicer(object):
"""Service that implements Google Cloud Speech API extended by Techmo.
"""
def Recognize(self, request, context):
"""Performs synchronous speech recognition: receive results after all audio
has been sent and processed.
"""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def StreamingRecognize(self, request_iterator, context):
"""Performs asynchronous speech recognition: receive results via the
google.longrunning.Operations interface. Returns either an
`Operation.error` or an `Operation.response` which contains
a `LongRunningRecognizeResponse` message.
rpc LongRunningRecognize(LongRunningRecognizeRequest) returns (google.longrunning.Operation) {
option (google.api.http) = {
post: "/v1/speech:longrunningrecognize"
body: "*"
};
}
Performs bidirectional streaming speech recognition: receive results while
sending audio. This method is only available via the gRPC API (not REST).
"""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def add_SpeechServicer_to_server(servicer, server):
rpc_method_handlers = {
'Recognize': grpc.unary_unary_rpc_method_handler(
servicer.Recognize,
request_deserializer=dictation__asr__pb2.RecognizeRequest.FromString,
response_serializer=dictation__asr__pb2.RecognizeResponse.SerializeToString,
),
'StreamingRecognize': grpc.stream_stream_rpc_method_handler(
servicer.StreamingRecognize,
request_deserializer=dictation__asr__pb2.StreamingRecognizeRequest.FromString,
response_serializer=dictation__asr__pb2.StreamingRecognizeResponse.SerializeToString,
),
}
generic_handler = grpc.method_handlers_generic_handler(
'google.cloud.speech.v1.Speech', rpc_method_handlers)
server.add_generic_rpc_handlers((generic_handler,))
class DictationSettings:
"""Default settings for Techmo Dictation ASR (timeouts and thresholds)"""
def __init__(self, args):
# use configuration directly
self.args = args
def session_id(self):
return self.args.session_id
def grpc_timeout(self):
return self.args.grpc_timeout
def max_alternatives(self):
return self.args.max_alternatives
def time_offsets(self):
return self.args.time_offsets
def single_utterance(self):
return self.args.single_utterance
def interim_results(self):
return self.args.interim_results
def timeouts_map(self):
return {
"no-input-timeout": str(self.args.no_input_timeout),
"speech-complete-timeout": str(self.args.speech_complete_timeout),
"speech-incomplete-timeout": str(self.args.speech_incomplete_timeout),
"recognition-timeout": str(self.args.recognition_timeout),
}
def context_phrase(self):
return self.args.context_phrase
import os
import threading
from . import dictation_asr_pb2 as dictation_asr_pb2
from . import dictation_asr_pb2_grpc as dictation_asr_pb2_grpc
import grpc
class RequestIterator:
"""Thread-safe request iterator for streaming recognizer."""
def __init__(self, audio_stream, settings):
# Iterator data
self.audio_stream = audio_stream
self.audio_generator = self.audio_stream.generator()
self.settings = settings
self.request_builder = {
True: self._initial_request,
False: self._normal_request
}
# Iterator state
self.lock = threading.Lock()
self.is_initial_request = True
self.eos = False # indicates whether the end-of-stream message was sent (request to stop the iterator)
def _initial_request(self):
req = StreamingRecognizer.build_configuration_request(self.audio_stream.frame_rate(), self.settings)
self.is_initial_request = False
return req
def _normal_request(self):
data = next(self.audio_generator)
if data is None:
raise StopIteration
return dictation_asr_pb2.StreamingRecognizeRequest(audio_content=data)
def __iter__(self):
return self
def __next__(self):
with self.lock:
return self.request_builder[self.is_initial_request]()
class StreamingRecognizer:
def __init__(self, address, ssl_directory, settings_args):
# Use ArgumentParser to parse settings
self.service = dictation_asr_pb2_grpc.SpeechStub(StreamingRecognizer.create_channel(address, ssl_directory))
self.settings = settings_args
def recognize(self, audio):
requests_iterator = RequestIterator(audio, self.settings)
return self.recognize_audio_content(requests_iterator)
def recognize_audio_content(self, requests_iterator):
time_offsets = self.settings.time_offsets()
timeout = None
if self.settings.grpc_timeout() > 0:
timeout = self.settings.grpc_timeout() / 1000 # milliseconds to seconds
metadata = []
if self.settings.session_id():
metadata = [('session_id', self.settings.session_id())]
recognitions = self.service.StreamingRecognize(requests_iterator, timeout=timeout, metadata=metadata)
confirmed_results = []
alignment = []
confidence = 1.0
for recognition in recognitions:
if recognition.error.code:
print(u"Received error response: ({}) {}".format(recognition.error.code, recognition.error.message))
requests_iterator.audio_stream.close()
elif recognition.speech_event_type != dictation_asr_pb2.StreamingRecognizeResponse.SPEECH_EVENT_UNSPECIFIED:
print(u"Received speech event type: {}".format(
dictation_asr_pb2.StreamingRecognizeResponse.SpeechEventType.Name(recognition.speech_event_type)))
requests_iterator.audio_stream.close()
# process response type
elif recognition.results is not None and len(recognition.results) > 0:
first = recognition.results[0]
if first.is_final:
if time_offsets:
for word in first.alternatives[0].words:
if word.word != '<eps>':
confirmed_results.append(word.word)
alignment.append([word.start_time, word.end_time])
else:
confirmed_results.append(first.alternatives[0].transcript)
confidence = min(confidence, first.alternatives[0].confidence)
else:
print(u"Temporal results - {}".format(first))
# build final results
final_alignment = [[]]
final_transc = ' '.join(confirmed_results)
if time_offsets and alignment:
final_alignment = alignment
return [{
'transcript': final_transc,
'alignment': final_alignment,
'confidence': confidence
}] # array with one element
@staticmethod
def create_channel(address, ssl_directory):
if not ssl_directory:
return grpc.insecure_channel(address)
def read_file(path):
with open(path, 'rb') as file:
return file.read()
return grpc.secure_channel(address, grpc.ssl_channel_credentials(
read_file(os.path.join(ssl_directory, 'ca.crt')),
read_file(os.path.join(ssl_directory, 'client.key')),
read_file(os.path.join(ssl_directory, 'client.crt')),
))
@staticmethod
def build_recognition_config(sampling_rate, settings):
recognition_config = dictation_asr_pb2.RecognitionConfig(
encoding='LINEAR16', # one of LINEAR16, FLAC, MULAW, AMR, AMR_WB
sample_rate_hertz=sampling_rate, # the rate in hertz
# See https://g.co/cloud/speech/docs/languages for a list of supported languages.
language_code='pl-PL', # a BCP-47 language tag
enable_word_time_offsets=settings.time_offsets(), # if true, return recognized word time offsets
max_alternatives=1, # maximum number of returned hypotheses
)
if settings.context_phrase():
speech_context = recognition_config.speech_contexts.add()
speech_context.phrases.append(settings.context_phrase())
return recognition_config
@staticmethod
def build_configuration_request(sampling_rate, settings):
config_req = dictation_asr_pb2.StreamingRecognizeRequest(
streaming_config=dictation_asr_pb2.StreamingRecognitionConfig(
config=StreamingRecognizer.build_recognition_config(sampling_rate, settings),
single_utterance=settings.single_utterance(),
interim_results=settings.interim_results()
)
# no audio data in first request (config only)
)
# timeout settings
timeouts = settings.timeouts_map()
for settings_key in timeouts:
cf = config_req.streaming_config.config.config_fields.add()
cf.key = settings_key
cf.value = "{}".format(timeouts[settings_key])
return config_req