If the files loaded for prediction are encoded as UTF-8 with a BOM, each file yields one additional 'empty' token at its beginning. A suggested fix is to load text files with the 'utf-8-sig' codec instead of the 'utf-8' codec: 'utf-8-sig' strips a leading BOM when one is present and otherwise decodes exactly like 'utf-8', so files without a BOM are unaffected.
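A minimal sketch of the difference between the two codecs, using only the Python standard library. The helper name `read_text` and the sample content are hypothetical and are not taken from the project's loading code; the actual change would go wherever the project opens its input files.

```python
import codecs
import os
import tempfile


def read_text(path, encoding="utf-8-sig"):
    """Hypothetical helper: read a text file, dropping a leading BOM by default."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Create a sample file that starts with the UTF-8 BOM (bytes EF BB BF).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF8 + "hello world".encode("utf-8"))
    bom_path = tmp.name

# Decoding with 'utf-8' keeps the BOM as the character U+FEFF at the start.
with_bom = read_text(bom_path, encoding="utf-8")

# Decoding with 'utf-8-sig' removes the BOM before returning the text.
without_bom = read_text(bom_path, encoding="utf-8-sig")

os.remove(bom_path)

print(repr(with_bom))
print(repr(without_bom))
```

With 'utf-8', the decoded text begins with the invisible character U+FEFF, which is presumably what the tokenizer turns into the extra token; with 'utf-8-sig' the text starts directly with the file's content.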