Commit 80bdf63c authored by Arkadiusz Janz


parent fe947b88
@@ -83,6 +83,19 @@ tokens = (token for paragraph in document.paragraphs()
for token in sentence.tokens())
To avoid loading large CCL documents into RAM all at once (as DOM parsers do),
we can read them iteratively, chunk by chunk or sentence by sentence (a SAX-based approach):
it = read_chunks_it(ccl_path)
for paragraph in it:
    ...  # process one paragraph (chunk) at a time

it = read_sentences_it(ccl_path)
for sentence in it:
    ...  # process one sentence at a time
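To illustrate why iterative reading keeps memory use low, here is a minimal sketch of the same idea using only the Python standard library. It is not how `read_sentences_it` is implemented; `iter_sentences` and `SAMPLE` are hypothetical names, and the sample assumes a simplified CCL layout (`chunkList` > `chunk` > `sentence` > `tok` > `orth`).

```python
import io
import xml.etree.ElementTree as ET


def iter_sentences(source):
    """Yield each sentence as a list of orth strings, one sentence at a time.

    Hypothetical helper: a stand-in for streaming reading, not cclutils code.
    """
    # iterparse emits an "end" event as soon as an element is fully parsed,
    # so the whole document never has to be held in memory as a tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "sentence":
            yield [tok.findtext("orth", default="") for tok in elem.iter("tok")]
            # Drop the children of the sentence we have just processed.
            elem.clear()


# Assumed, simplified CCL-like input used only for this illustration.
SAMPLE = b"""<chunkList>
  <chunk id="ch1" type="p">
    <sentence id="s1">
      <tok><orth>Ala</orth></tok>
      <tok><orth>ma</orth></tok>
      <tok><orth>kota</orth></tok>
    </sentence>
    <sentence id="s2">
      <tok><orth>Kot</orth></tok>
      <tok><orth>pije</orth></tok>
    </sentence>
  </chunk>
</chunkList>"""

for words in iter_sentences(io.BytesIO(SAMPLE)):
    print(" ".join(words))
```

Because the generator yields a sentence and then clears it, peak memory depends on the size of the largest sentence rather than on the size of the document.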
Token manipulation