@@ -83,6 +83,19 @@ tokens = (token for paragraph in document.paragraphs()
for token in sentence.tokens())
To avoid loading a large CCL document into RAM all at once (as DOM parsers do),
we can read it iteratively, chunk by chunk or sentence by sentence (a SAX-based approach):
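# Stream the document one chunk (paragraph) at a time.
it = read_chunks_it(ccl_path)
for paragraph in it:
    pass  # process the paragraph here
# Or stream the document one sentence at a time.
it = read_sentences_it(ccl_path)
for sentence in it:
    pass  # process the sentence here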
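To illustrate why iterative reading keeps memory use low, here is a minimal sketch of the same idea using only the Python standard library. It is not how `read_sentences_it` is implemented; the helper name `iter_sentences` and the sample data are hypothetical, and the element names (`chunk`, `sentence`, `tok`, `orth`) are assumed to follow the usual CCL layout.

```python
import io
import xml.etree.ElementTree as ET


def iter_sentences(source):
    """Yield each sentence as a list of orthographic forms, one at a time."""
    # iterparse emits an "end" event as soon as an element is fully read,
    # so a sentence can be handed out before the rest of the file is parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "sentence":
            yield [tok.findtext("orth") for tok in elem.iter("tok")]
            elem.clear()  # drop the tokens of the sentence already consumed


# Tiny hypothetical CCL-like document used only for demonstration.
SAMPLE = b"""<chunkList>
 <chunk id="ch1" type="p">
  <sentence id="s1">
   <tok><orth>Ala</orth></tok>
   <tok><orth>ma</orth></tok>
  </sentence>
  <sentence id="s2">
   <tok><orth>kota</orth></tok>
  </sentence>
 </chunk>
</chunkList>"""

for sentence in iter_sentences(io.BytesIO(SAMPLE)):
    print(sentence)
```

Because each sentence is cleared right after it is yielded, only a small part of the tree is held in memory at any moment, which is the property the iterative readers above rely on.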
Token manipulation
