IOBBER, a chunker for Slavic languages based on CRF++ and WCCL
(c) 2012, Adam Radziszewski (name.surname at pwr.wroc.pl)
Institute of Informatics, Wrocław University of Technology
The software is written in Python, but requires additional C++/Python modules to work.
You need to install the following packages beforehand:
* Python setuptools for installation,
* WCCL with Python support; http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki
* Corpus2 with Python support (also required by WCCL); http://nlp.pwr.wroc.pl/redmine/projects/corpus2/wiki
* CRF++ with Python support (install CRF++ itself first, then enter the `python' subdir and install Python wrappers); http://crfpp.googlecode.com/svn/trunk/doc/index.html
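Before running the installer, it may help to confirm that the Python wrappers can actually be imported. The snippet below is a minimal sketch of such a check. The module names `corpus2`, `wccl` and `CRFPP` are assumptions about how the respective wrappers are exposed; adjust them if your builds install under different names.

```python
import importlib

# Assumed import names of the prerequisite Python wrappers (not confirmed
# by this document); edit the list to match your installation.
REQUIRED_MODULES = ["setuptools", "corpus2", "wccl", "CRFPP"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All prerequisite modules can be imported.")
```

A successful import only shows that the wrapper is on the Python path; it does not verify that the underlying C++ libraries work correctly.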
If the above packages have been correctly installed, the installation of iobber is simple:
sudo python setup.py install
This will install the Python modules (the iobber package), the iobber executable, the default configuration for KPWr, and a trained model that is ready to use.
To use the trained model, issue the following (for more details please consult README and the output of iobber -h):
iobber kpwr.ini -d model-kpwr04/ my_xces_input.xml -i xces -O ccl_chunked_output.xml
If you also need to recognise the syntactic heads of chunks, model-kpwr04-H can be used:
iobber kpwr.ini -d model-kpwr04+H/ my_xces_input.xml -i xces -O ccl_chunked_output.xml
NOTE: the kpwr.ini configuration assumes that the input is morphosyntactically tagged.
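To process many files, the command-line invocation shown above can be wrapped in a small script. The following is a sketch only: the helper names `iobber_command` and `chunk_file` are hypothetical, and the code uses nothing beyond the positional arguments and the -d, -i and -O options that appear in the examples above.

```python
import subprocess


def iobber_command(input_path, output_path, config="kpwr.ini",
                   model_dir="model-kpwr04/", input_format="xces"):
    """Build an argument list mirroring the documented iobber invocation.

    Hypothetical helper; the defaults are copied from the example command.
    """
    return ["iobber", config, "-d", model_dir, input_path,
            "-i", input_format, "-O", output_path]


def chunk_file(input_path, output_path, **kwargs):
    """Run iobber on a single file.

    Raises subprocess.CalledProcessError if iobber exits with an error,
    and OSError if the iobber executable cannot be found.
    """
    subprocess.check_call(iobber_command(input_path, output_path, **kwargs))
```

For example, chunk_file("my_xces_input.xml", "ccl_chunked_output.xml") runs the same command as the first example. Pass model_dir to select a different model directory.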