diff --git a/README.md b/README.md
index d10e8a0339e63b1ecd27df785bed17e8d736d226..9cbfe309f381114c287fdd153a8594c0c5f0b40c 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,19 @@ tokens = (token for paragraph in document.paragraphs()
           for token in sentence.tokens())
 ```
 
+To avoid loading large CCL documents to RAM (DOM parsers) we can read them
+iteratively, chunk by chunk, or sentence by sentence (SAX-based approach):
+
+```python
+it = read_chunks_it(ccl_path)
+for paragraph in it:
+    pass
+
+it = read_sentences_it(ccl_path)
+for sentence in it:
+    pass
+```
+
 Token manipulation
 ==================
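
Note on the SAX-style reading that the added README text describes: the sketch below shows one way sentence-by-sentence streaming can work, using the standard library's `xml.etree.ElementTree.iterparse`. It is not this library's implementation of `read_sentences_it`. The helper name `iter_sentences` is hypothetical, and the element names (`chunkList`, `chunk`, `sentence`, `tok`, `orth`) are an assumption about the usual CCL layout.

```python
import io
import xml.etree.ElementTree as ET


def iter_sentences(source):
    """Yield one list of orth strings per <sentence> while streaming the XML.

    `source` may be a file path or a binary file object. Hypothetical helper,
    not part of the library in this diff.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # An "end" event fires once the element and its children are parsed.
        if elem.tag == "sentence":
            yield [tok.findtext("orth", default="") for tok in elem.iter("tok")]
            # Release the sentence's children; the emptied element itself
            # stays attached to its parent, so this bounds memory per sentence
            # rather than removing every trace of it.
            elem.clear()


# Tiny in-memory document using the assumed CCL element names.
SAMPLE = b"""<chunkList>
  <chunk id="ch1" type="p">
    <sentence id="s1"><tok><orth>Ala</orth></tok><tok><orth>ma</orth></tok></sentence>
    <sentence id="s2"><tok><orth>kota</orth></tok></sentence>
  </chunk>
</chunkList>"""

sentences = list(iter_sentences(io.BytesIO(SAMPLE)))
```

Because the generator yields each sentence before clearing it, a caller only ever holds the sentences it chooses to keep, which is the property the README paragraph attributes to the iterative readers.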