Multi-word NER bugfix

Szymon Ciombor requested to merge multi-word-ner-bugfix into master

Resolves #5 (closed) and pins the spaCy version installed from PyPI (critical).

Before:

   <tok>
    <orth>Bliskim</orth>
    <lex disamb="1"><base>bliski</base><ctag>ADJ</ctag></lex>
    <ann chan="geogName">1</ann>
   </tok>
   <tok>
    <orth>Wschodem</orth>
    <lex disamb="1"><base>wschód</base><ctag>NOUN</ctag></lex>
    <ann chan="geogName">1</ann>
   </tok>

   <tok>
    <orth>Unii</orth>
    <lex disamb="1"><base>unia</base><ctag>NOUN</ctag></lex>
    <ann chan="orgName">1</ann>
   </tok>
   <tok>
    <orth>Europejskiej</orth>
    <lex disamb="1"><base>europejski</base><ctag>ADJ</ctag></lex>
    <ann chan="orgName">1</ann>
   </tok>

EDIT: After:

   <tok>
    <orth>Morze</orth>
    <lex disamb="1"><base>mór</base><ctag>NOUN</ctag></lex>
    <ann chan="geogName">1</ann>
    <ann chan="orgName">0</ann>
   </tok>
   <tok>
    <orth>Śródziemne</orth>
    <lex disamb="1"><base>śródziemny</base><ctag>ADJ</ctag></lex>
    <ann chan="geogName">1</ann>
    <ann chan="orgName">0</ann>
   </tok>

..........

   <tok>
    <orth>Bliskim</orth>
    <lex disamb="1"><base>bliski</base><ctag>ADJ</ctag></lex>
    <ann chan="geogName">2</ann>
    <ann chan="orgName">0</ann>
   </tok>
   <tok>
    <orth>Wschodem</orth>
    <lex disamb="1"><base>wschód</base><ctag>NOUN</ctag></lex>
    <ann chan="geogName">2</ann>
    <ann chan="orgName">0</ann>
   </tok>
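
Judging from the two outputs above, the fix appears to do two things: each multi-word entity gets its own number within its channel (`geogName` 1, then 2), and every token carries every channel, with 0 meaning "not annotated". The following is a minimal sketch of that numbering scheme, not the code from this merge request. The function names and the `(start, end, label)` span shape are hypothetical, and the seven-token layout is only illustrative, since the example elides the text between the entities.

```python
from collections import defaultdict


def assign_channels(n_tokens, spans):
    """Return one {channel: number} dict per token.

    spans is a list of (start, end, label) tuples with an exclusive end
    (a hypothetical input shape, not taken from the merge request).
    """
    channels = sorted({label for _, _, label in spans})
    # Every token carries every channel, defaulting to 0 (not annotated).
    ann = [{ch: 0 for ch in channels} for _ in range(n_tokens)]
    counters = defaultdict(int)  # entities numbered so far, per channel
    for start, end, label in sorted(spans):
        counters[label] += 1
        for i in range(start, end):
            # All tokens of one entity share that entity's number.
            ann[i][label] = counters[label]
    return ann


def render_ann(token_ann):
    """Format one token's channels as <ann> lines like those shown above."""
    return [f'<ann chan="{ch}">{n}</ann>' for ch, n in token_ann.items()]


# Illustrative layout: token 2 stands in for the text elided in the example.
spans = [(0, 2, "geogName"), (3, 5, "geogName"), (5, 7, "orgName")]
ann = assign_channels(7, spans)
for line in render_ann(ann[3]):  # annotations for "Bliskim"
    print(line)
```

Keeping a separate counter per channel is what lets "Morze Śródziemne" and "Bliskim Wschodem" stay distinguishable inside `geogName` while "Unii Europejskiej" still starts at 1 in `orgName`.
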
Edited by Szymon Ciombor