* CRF++ with Python support (install CRF++ itself first, then enter the `python' subdir and install Python wrappers); http://crfpp.googlecode.com/svn/trunk/doc/index.html
WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish.
The tagger combines tiered tagging, conditional random fields (CRF) and features tailored for inflective languages written in WCCL. The algorithm and code are inspired by Wrocław Memory-Based Tagger. WCRFT uses CRF++ API as the underlying CRF implementation.
Tiered tagging is assumed. Grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is treated with a separate CRF and may be supplied a different set of feature templates.
The tagger is able to tag morphologically analysed input (sentences divided into tokens, tokens assigned lists of candidate interpretations).
The tagger is able to tag morphologically analysed input (sentences divided into tokens, tokens assigned lists of candidate interpretations).
If you need to tag plain text, it is recommended to use MACA for the analysis (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki).
If you need to tag plain text, it is recommended to use MACA for the analysis (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki).
...
@@ -32,13 +36,12 @@ There are two possibilities with respect to placement of the model:
...
@@ -32,13 +36,12 @@ There are two possibilities with respect to placement of the model:
Basic usage:
Basic usage:
The package comes with ready-made configuration for tagging (NCP, nkjp.pl) tagset. The configuration is config/nkjp.ini. A configuration specifies parameter values and points to a file with features used for different layers. To get a working tagger, a TRAINED MODEL is also needed. You can obtain one by training the tagger with a reference corpus and storing the model to a given directory, for instance:
The package comes with ready-made configuration for tagging (NCP, nkjp.pl) tagset. The configuration is config/nkjp.ini. A configuration specifies parameter values and points to a file with features used for different layers. To get a working tagger, a TRAINED MODEL is also needed. You can obtain one by training the tagger with a reference corpus and storing the model to a given directory, for instance:
Note: for best results it is highly recommended to re-analyse the training data using the same version of morphological analyser (e.g. the same MACA config) as will be using during tagger usage. The model available for download at the WCRFT wiki page already includes this.
Note: for best results it is highly recommended to re-analyse the training data using the same version of morphological analyser (e.g. the same MACA config) as will be using during tagger usage. The model available for download at the WCRFT wiki page already includes this.