Commit 3b54ad12 authored by Marcin Wątroba

Fix pipeline

parent 9a94c87e
1 merge request: !13 Change data model
@@ -96,9 +96,9 @@ stages:
       md5: 2d66cb8890c420b55e8b7eb33ac32ba2
       size: 3558
     - path: experiment_data/cached_asr/luna_ajn_polish_asr
-      md5: 620e178854dbcb69f49a608f34573a88.dir
-      size: 6159899
-      nfiles: 494
+      md5: 10454ef4568c2023e9d51ad418db2854.dir
+      size: 1276562
+      nfiles: 495
     - path: experiment_data/dataset/LUNA.PL
       md5: d342155b1871e881797cf7da09d5dc3c.dir
       size: 1578358645
@@ -113,33 +113,33 @@ stages:
       nfiles: 500
     outs:
     - path: experiment_data/pipeline/asr_benchmark_luna/ajn_polish_asr
-      md5: fa9d926ae8fd0268c71f19c1d5d39fcf.dir
-      size: 11080541
-      nfiles: 499
+      md5: 8c080d8110e5860e78bfcb311fe2b90d.dir
+      size: 6204883
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/ajn_spacy
-      md5: 417d8f07266eb5da9c4bfbf84f3b4eac.dir
-      size: 6579351
-      nfiles: 499
+      md5: f06d2f1369b18e5fa126af5a00a8f0b8.dir
+      size: 6590702
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/pos_ajn_alignment_wer
-      md5: 2bf746c412e6bff4071f689d853b106f.dir
-      size: 22061350
-      nfiles: 499
+      md5: 164f3b4796bcab894831da4f0a0fa0af.dir
+      size: 22096130
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/pos_ajn_metrics_wer
-      md5: 3147413bdfd36ad91c64303e8705951b.dir
-      size: 17002
-      nfiles: 499
+      md5: ee5ae7387429992fe04fcbde24e2bd24.dir
+      size: 17037
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/word_ajn_alignment_wer
-      md5: 2bb11f8a97cdeb18c557fadb49a6f015.dir
-      size: 25669158
-      nfiles: 499
+      md5: 00d84c15ae1c1a491625ee4dd8db6418.dir
+      size: 20803179
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/word_ajn_alignment_wer_embeddings
       md5: c2824c0c5cf433dbf864ebbdc2fb3cfc.dir
       size: 44326962
       nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/word_ajn_metrics_wer
-      md5: c48c74eccf1cfd0768900514d2fcfd1b.dir
-      size: 10527
-      nfiles: 499
+      md5: fdbccc71fa84d0a68f4cd6723399e5dd.dir
+      size: 17045
+      nfiles: 500
     - path: experiment_data/pipeline/asr_benchmark_luna/word_ajn_metrics_wer_embeddings
       md5: 98a7edeee3b630e8e301acfc578a8393.dir
       size: 34869
@@ -284,8 +284,8 @@ stages:
       md5: 85e8d3d79379e6d5db751e03c5ebae75
       size: 4161
     - path: experiment_data/cached_asr/voicelab_cbiz_testset_20220322_ajn
-      md5: 49a38b90f1265a61b90b54f820415011.dir
-      size: 32601414
+      md5: 0705aafa0969142288cc9baa88d1ed57.dir
+      size: 6896694
       nfiles: 800
     - path: experiment_data/dataset/voicelab_cbiz_testset_20220322
       md5: 3c2b18e1f1f89e4c5ad7b254e472b25e.dir
@@ -301,15 +301,15 @@ stages:
       nfiles: 800
     outs:
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/ajn_polish_asr
-      md5: 94181d7a0731e8defbdcb4b477ad72a2.dir
-      size: 48470646
+      md5: da10bb60107a86f98b2d07fef5616390.dir
+      size: 22765926
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/ajn_spacy
-      md5: ef8be18b8acca299f9b9542ac8643a87.dir
+      md5: e8a48a0a63c1569ec734e1c8bb03c7db.dir
       size: 20536889
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/pos_ajn_alignment_wer
-      md5: b2d3a9872e6016cfde8e6d025bef373b.dir
+      md5: 7806779e936ec6121b8d72e0d0e3ed59.dir
       size: 78539613
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/pos_ajn_metrics_wer
@@ -317,16 +317,16 @@ stages:
       size: 27353
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/word_ajn_alignment_wer
-      md5: acb5337346e70bed974dfe7ca7947d79.dir
-      size: 104789466
+      md5: d190f33e6643f62ecbeb9e5ae5fb5e02.dir
+      size: 78992762
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/word_ajn_alignment_wer_embeddings
       md5: 93d34d82f8536014ddbe0cf0645dd837.dir
       size: 174322727
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/word_ajn_metrics_wer
-      md5: 903096554a3ea6896c4abaa5e2c71d4c.dir
-      size: 16505
+      md5: 04f6ccbaf94cf08c34ac201ae079c21c.dir
+      size: 25307
       nfiles: 800
     - path: experiment_data/pipeline/asr_benchmark_voicelab_cbiz_testset_20220322/word_ajn_metrics_wer_embeddings
       md5: 1fc2985ad4c3cb00d05b1865ad5b22d4.dir
......
 outs:
-- md5: 620e178854dbcb69f49a608f34573a88.dir
-  size: 6159899
+- md5: 0f60fb48fc5f9a46e6b2262bd994e8e8.dir
+  size: 1273907
   nfiles: 494
   path: luna_ajn_polish_asr
import json
from pathlib import Path
from pprint import pprint

if __name__ == '__main__':
    # Inspect the first cached transcription file (sorted by name).
    paths = sorted(Path('voicelab_cbiz_testset_20220322_techmo').iterdir())
    for path in paths[:1]:
        try:
            with open(path, 'r') as f:
                data = json.load(f)
            pprint(data)
            # data['transcription'] = [it['text']['text'] for it in data['transcription']]
            # pprint(data)
            # json.dump(data, open(path, 'w'))
        except (OSError, json.JSONDecodeError):
            # Print the path of any file that cannot be read or parsed.
            print(path)
 outs:
-- md5: 49a38b90f1265a61b90b54f820415011.dir
-  size: 32601414
+- md5: 0705aafa0969142288cc9baa88d1ed57.dir
+  size: 6896694
   nfiles: 800
   path: voicelab_cbiz_testset_20220322_ajn
source diff could not be displayed: it is too large. Options to address this: view the blob.