First step towards merge between rewrite-c-newconfig and new stuff in master...

First step towards merge between rewrite-c-newconfig and new stuff in master (explicit CLASS layer). Merging in changes to models and configs, README and tagger app. Merge commit '3d1f3' into rewrite-c-newconfig.

5f359ae5 · Adam Radziszewski · 8b1f9af2 · 3d1f363f · 5f359ae5 · 5f359ae5
Commit 5f359ae5 authored May 28, 2014 by Adam Radziszewski
--- a/README
+++ b/README
@@ -4,15 +4,19 @@ Istitute of Informatics, Wrocław University of Technology
 http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki
 Dependencies:
-* Python 2.6 with headers
+* g++ 4.6.3
-* SWIG
+* CRF++ - http://crfpp.googlecode.com/svn/trunk/doc/index.html
-* CRF++ with Python support (install CRF++ itself first, then enter the `python' subdir and install Python wrappers); http://crfpp.googlecode.com/svn/trunk/doc/index.html
+* Corpus2 library (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki)
-* corpus2 library (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki) installed with Python support
+* MACA library (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki)
-* MACA library (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki) installed with Python support
 * Morfeusz SGJP (http://sgjp.pl/morfeusz/index.html), please install it before installing MACA so that it also builds Morfeusz plugin
-* wccl library (http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki) installed with Python support
+* WCCL library (http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki)
+WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish.
+The tagger combines tiered tagging, conditional random fields (CRF) and features tailored for inflective languages written in WCCL. The algorithm and code are inspired by Wrocław Memory-Based Tagger. WCRFT uses CRF++ API as the underlying CRF implementation.
+Tiered tagging is assumed. Grammatical class is disambiguated first, then subsequent attributes (as defined in a config file) are taken care of. Each attribute is treated with a separate CRF and may be supplied a different set of feature templates.
 The tagger is able to tag morphologically analysed input (sentences divided into tokens, tokens assigned lists of candidate interpretations).
 If you need to tag plain text, it is recommended to use MACA for the analysis (http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki).
@@ -32,13 +36,12 @@ There are two possibilities with respect to placement of the model:
 Basic usage:
 The package comes with ready-made configuration for tagging (NCP, nkjp.pl) tagset. The configuration is config/nkjp.ini. A configuration specifies parameter values and points to a file with features used for different layers. To get a working tagger, a TRAINED MODEL is also needed. You can obtain one by training the tagger with a reference corpus and storing the model to a given directory, for instance:
-wcrft/wcrft.py -d path/to/nkjp_model config/nkjp_s2.ini --train path/to/training-corpus.xml -i xces
+wcrft-app -d path/to/nkjp_model config/nkjp_s2.ini --train path/to/training-corpus.xml -i xces
 Note: for best results it is highly recommended to re-analyse the training data using the same version of the morphological analyser (e.g. the same MACA config) as will be used during tagging. The model available for download at the WCRFT wiki page already includes this.
 To use the trained model to tag a single file:
-wcrft/wcrft.py -d path/to/nkjp_model config/nkjp_s2.ini input.xml -O tagged.xml
+wcrft-app -d path/to/nkjp_model config/nkjp_s2.ini input.xml -O tagged.xml
-For more details, see wcrft.py -h and the project wiki.
+For more details, see wcrft-app -h and the project wiki.
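The tiered scheme described in the README text above (grammatical class first, then the remaining attributes in config order) can be sketched as follows. This is only an illustration under stated assumptions, not WCRFT's implementation: `disambiguate_tiered` and `choose` are hypothetical names, `choose` stands in for the per-layer CRF decision, and the real tagger decides over whole sentences rather than single tokens.

```python
from typing import Callable, Dict, List

# One candidate interpretation of a token: attribute name -> value.
Interp = Dict[str, str]


def disambiguate_tiered(
    candidates: List[Interp],
    layers: List[str],
    choose: Callable[[str, List[str]], str],
) -> List[Interp]:
    """Narrow the candidate list one layer (attribute) at a time.

    `choose` is a hypothetical stand-in for the per-layer classifier:
    it receives the layer name and the values still possible, and
    returns the value to keep.
    """
    remaining = list(candidates)
    for layer in layers:
        values = sorted({c.get(layer, "") for c in remaining})
        if len(values) > 1:  # the layer is still ambiguous
            picked = choose(layer, values)
            remaining = [c for c in remaining if c.get(layer, "") == picked]
    return remaining


# Illustrative candidates for one token (values are made up).
candidates = [
    {"CLASS": "subst", "nmb": "sg", "cas": "nom", "gnd": "f"},
    {"CLASS": "subst", "nmb": "sg", "cas": "acc", "gnd": "f"},
    {"CLASS": "adj", "nmb": "sg", "cas": "nom", "gnd": "f"},
]
preferred = {"CLASS": "subst", "cas": "acc"}
result = disambiguate_tiered(
    candidates,
    ["CLASS", "nmb", "cas", "gnd"],
    lambda layer, values: preferred.get(layer, values[0]),
)
print(result)
```

Each later layer only sees the interpretations that survived the earlier ones, which is why the order of `attrs` in the config matters.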
--- a/libwcrft/CMakeLists.txt
+++ b/libwcrft/CMakeLists.txt
@@ -88,4 +88,5 @@ message(STATUS "Model directory is in ${libwcrft_SRC_MODEL_DIR}")
 install(DIRECTORY ${libwcrft_SRC_MODEL_DIR}/
 		DESTINATION ${libwcrft_INSTALL_DATA_DIR}
 		FILES_MATCHING PATTERN model
-					   PATTERN model/*)
+					   PATTERN model/*
+					   PATTERN model/*/*)
--- a/libwcrft/config/nkjp.ini
+++ b/libwcrft/config/nkjp.ini
-; NKJP tagset with unknown word treatment. This is the recommended config for NKJP.
+; NKJP tagset with unknown word treatment.
-;
+; This is an OUTDATED config for NKJP.
+; Use nkjp_s2 (slightly better)
+; or nkjp_e2 (much smaller and somewhat faster, works just slightly
+; worse than nkjp_s2).
 [general]
 tagset   = nkjp
 ; all the attrs
-attrs = nmb,cas,gnd,per,deg,asp,ngt,acm,acn,ppr,agg,vcl,dot
+attrs = CLASS,nmb,cas,gnd,per,deg,asp,ngt,acm,acn,ppr,agg,vcl,dot
 macacfg = morfeusz-nkjp-official
 [lexicon]

--- a/libwcrft/config/nkjp_e2-CLASS.txt
+++ b/libwcrft/config/nkjp_e2-CLASS.txt
+# Unigram
+# orth
+U00:%x[-2,1]
+U01:%x[-1,1]
+U02:%x[0,1]
+U03:%x[1,1]
+U04:%x[2,1]
+U05:%x[-1,0]
+U06:%x[0,0]
+U07:%x[1,0]
+# class
+U10:%x[-2,2]
+U11:%x[-1,2]
+U12:%x[0,2]
+U13:%x[1,2]
+U14:%x[2,2]
+# cas
+U20:%x[-2,3]
+U21:%x[-1,3]
+U22:%x[0,3]
+U23:%x[1,3]
+U24:%x[2,3]
+# gnd
+U30:%x[-2,4]
+U31:%x[-1,4]
+U32:%x[0,4]
+U33:%x[1,4]
+U34:%x[2,4]
+# nmb
+U40:%x[-2,5]
+U41:%x[-1,5]
+U42:%x[0,5]
+U43:%x[1,5]
+U44:%x[2,5]
+# regex feats
+U61:%x[0,8]/%x[0,9]
+# Bigram
+B
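The template file above uses the CRF++ macro `%x[row,col]`, where `row` is an offset relative to the current token and `col` is an absolute column of the feature matrix. A minimal Python sketch of that expansion follows; `expand_template` is a hypothetical helper, the sample rows are made-up values, and the `_B-1` / `_B+1` padding strings for positions outside the sentence are given as I understand CRF++ to behave, so treat them as an assumption.

```python
import re

# Matches one CRF++ macro such as %x[-1,2].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template, rows, pos):
    """Expand every %x[row,col] macro in `template` for the token at `pos`.

    `rows` is the sentence as a list of feature rows (one list per token).
    Out-of-sentence positions are replaced with padding strings
    (assumed to mirror CRF++: _B-1, _B-2, ... and _B+1, _B+2, ...).
    """
    def cell(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if row < 0:
            return "_B%d" % row
        if row >= len(rows):
            return "_B+%d" % (row - len(rows) + 1)
        return rows[row][col]

    return MACRO.sub(cell, template)


# Illustrative three-token sentence; columns follow the first three
# features of nkjp_e2.ccl: 3-char prefix, 3-char suffix, class.
rows = [
    ["ala", "ala", "subst"],
    ["mia", "ała", "praet"],
    ["kot", "ota", "subst"],
]
print(expand_template("U02:%x[0,1]", rows, 1))           # U02:ała
print(expand_template("U08:%x[-1,1]/%x[0,1]", rows, 1))  # U08:ala/ała
print(expand_template("U00:%x[-2,1]", rows, 1))          # U00:_B-1
```

The column numbers in each template therefore have to match the order of the features in the accompanying `.ccl` file.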
--- a/libwcrft/config/nkjp_e2-asp.txt
+++ b/libwcrft/config/nkjp_e2-asp.txt
+# Unigram
+# orth
+U00:%x[-2,1]
+U01:%x[-1,1]
+U02:%x[0,1]
+U03:%x[1,1]
+U04:%x[2,1]
+U05:%x[-1,0]
+U06:%x[0,0]
+U07:%x[1,0]
+# class
+U10:%x[-2,2]
+U11:%x[-1,2]
+U12:%x[0,2]
+U13:%x[1,2]
+U14:%x[2,2]
+# cas
+U21:%x[-1,3]
+U22:%x[0,3]
+U23:%x[1,3]
+# gnd
+U32:%x[0,4]
+# nmb
+U41:%x[-1,5]
+U42:%x[0,5]
+U43:%x[1,5]
+# regex feats
+U61:%x[0,8]/%x[0,9]
+# Bigram
+B
--- a/libwcrft/config/nkjp_e2-cas.txt
+++ b/libwcrft/config/nkjp_e2-cas.txt
+# Unigram
+# orth
+U00:%x[-2,1]
+U01:%x[-1,1]
+U02:%x[0,1]
+U03:%x[1,1]
+U04:%x[2,1]
+U05:%x[-1,0]
+U06:%x[0,0]
+U07:%x[1,0]
+U08:%x[-1,1]/%x[0,1]
+U09:%x[0,1]/%x[1,1]
+# class
+U10:%x[-2,2]
+U11:%x[-1,2]
+U12:%x[0,2]
+U13:%x[1,2]
+U14:%x[2,2]
+U15:%x[-2,2]/%x[-1,2]
+U16:%x[-1,2]/%x[0,2]
+U17:%x[0,2]/%x[1,2]
+U18:%x[1,2]/%x[2,2]
+# cas
+U20:%x[-2,3]
+U21:%x[-1,3]
+U22:%x[0,3]
+U23:%x[1,3]
+U24:%x[2,3]
+# gnd
+U30:%x[-2,4]
+U31:%x[-1,4]
+U32:%x[0,4]
+U33:%x[1,4]
+U34:%x[2,4]
+# nmb
+U40:%x[-2,5]
+U41:%x[-1,5]
+U42:%x[0,5]
+U43:%x[1,5]
+U44:%x[2,5]
+# agr
+U50:%x[-1,6] # agr(0,2) -> agr(-1,0)
+U51:%x[0,6] # agr(0,2)
+U52:%x[-1,7] # agr..(-1,2) -> agr(-2,0)
+U53:%x[0,7] # (-1,2)
+U54:%x[1,7] # ... -> (0,3)
+# regex feats
+U61:%x[0,8]/%x[0,9]
+# Bigram
+B
--- a/libwcrft/config/nkjp_e2-gnd.txt
+++ b/libwcrft/config/nkjp_e2-gnd.txt
+# Unigram
+# orth
+U00:%x[-2,1]
+U01:%x[-1,1]
+U02:%x[0,1]
+U03:%x[1,1]
+U04:%x[2,1]
+U05:%x[-1,0]
+U06:%x[0,0]
+U07:%x[1,0]
+U08:%x[-1,1]/%x[0,1]
+U09:%x[0,1]/%x[1,1]
+# class
+U10:%x[-2,2]
+U11:%x[-1,2]
+U12:%x[0,2]
+U13:%x[1,2]
+U14:%x[2,2]
+U15:%x[-2,2]/%x[-1,2]
+U16:%x[-1,2]/%x[0,2]
+U17:%x[0,2]/%x[1,2]
+U18:%x[1,2]/%x[2,2]
+# cas
+U20:%x[-2,3]
+U21:%x[-1,3]
+U22:%x[0,3]
+U23:%x[1,3]
+U24:%x[2,3]
+# gnd
+U30:%x[-2,4]
+U31:%x[-1,4]
+U32:%x[0,4]
+U33:%x[1,4]
+U34:%x[2,4]
+# nmb
+U40:%x[-2,5]
+U41:%x[-1,5]
+U42:%x[0,5]
+U43:%x[1,5]
+U44:%x[2,5]
+# agr
+U50:%x[-1,6] # agr(0,2) -> agr(-1,0)
+U51:%x[0,6] # agr(0,2)
+U52:%x[-1,7] # agr..(-1,2) -> agr(-2,0)
+U53:%x[0,7] # (-1,2)
+U54:%x[1,7] # ... -> (0,3)
+# regex feats
+U61:%x[0,8]/%x[0,9]
+# Bigram
+B
--- a/libwcrft/config/nkjp_e2-nmb.txt
+++ b/libwcrft/config/nkjp_e2-nmb.txt
+# Unigram
+# orth
+U00:%x[-2,1]
+U01:%x[-1,1]
+U02:%x[0,1]
+U03:%x[1,1]
+U04:%x[2,1]
+U05:%x[-1,0]
+U06:%x[0,0]
+U07:%x[1,0]
+U08:%x[-1,1]/%x[0,1]
+U09:%x[0,1]/%x[1,1]
+# class
+U10:%x[-2,2]
+U11:%x[-1,2]
+U12:%x[0,2]
+U13:%x[1,2]
+U14:%x[2,2]
+U15:%x[-2,2]/%x[-1,2]
+U16:%x[-1,2]/%x[0,2]
+U17:%x[0,2]/%x[1,2]
+U18:%x[1,2]/%x[2,2]
+# cas
+U20:%x[-2,3]
+U21:%x[-1,3]
+U22:%x[0,3]
+U23:%x[1,3]
+U24:%x[2,3]
+# gnd
+U30:%x[-2,4]
+U31:%x[-1,4]
+U32:%x[0,4]
+U33:%x[1,4]
+U34:%x[2,4]
+# nmb
+U40:%x[-2,5]
+U41:%x[-1,5]
+U42:%x[0,5]
+U43:%x[1,5]
+U44:%x[2,5]
+# agr
+U50:%x[-1,6] # agr(0,2) -> agr(-1,0)
+U51:%x[0,6] # agr(0,2)
+U52:%x[-1,7] # agr..(-1,2) -> agr(-2,0)
+U53:%x[0,7] # (-1,2)
+U54:%x[1,7] # ... -> (0,3)
+# regex feats
+U61:%x[0,8]/%x[0,9]
+# Bigram
+B
--- a/libwcrft/config/nkjp_e2.ccl
+++ b/libwcrft/config/nkjp_e2.ccl
+@ "default" (
+   affix(lower(orth[0]), 3);  // 0
+   affix(lower(orth[0]), -3); // 1
+   class[0]; // 2
+   cas[0];   // 3
+   gnd[0];   // 4
+   nmb[0];   // 5
+   agrpp(0,1,{nmb,gnd,cas}); // 6
+   and(inside(-1), inside(1), wagr(-1,1,{nmb,gnd,cas})); // 7
+   regex(orth[0], "\\P{Ll}.*"); regex(orth[0], "\\P{Lu}.*") // 8, 9
+)
--- a/libwcrft/config/nkjp_e2.ini
+++ b/libwcrft/config/nkjp_e2.ini
+; NKJP tagset with unknown word treatment, reduced feature set.
+; Got rid of agreement features.
+; For layers other than CLASS,nmb,gnd,cas reduced context to 3
+;
+[general]
+tagset   = nkjp
+; all the attrs
+attrs = CLASS,nmb,cas,gnd,asp
+; acm,dot could be useful for unknown
+macacfg = morfeusz-nkjp-official
+defaultmodel = model_nkjp10_wcrft_e2
+[lexicon]
+; currently lexicon itself is not used, but unk tag list is
+casesens   = no
+minfreq    = 10
+maxentries = 500
+[lemmatiser]
+; if lemmatiser outputs a lemma not present in morpho analysis
+; --- should the lemma be ignored (forcelemma = no)
+; or used to overwrite lemmas of each possible interpretation (yes)
+forcelemma = yes
+[crf]
+params   = -a CRF-L2 -f5
+[unknown]
+guess      = yes
+unktagfreq = 1
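To see how the `attrs` list relates to the per-layer template files added in this commit, here is a small Python sketch. It is not how WCRFT reads its configuration (the tagger's own parser is not shown here); `read_layers` and `template_names` are hypothetical helpers, and the `<config>-<attr>.txt` naming is an assumption inferred from the file names in this diff.

```python
import configparser

# Abridged copy of the nkjp_e2.ini contents shown above.
SAMPLE = """\
; NKJP tagset with unknown word treatment, reduced feature set.
[general]
tagset   = nkjp
attrs = CLASS,nmb,cas,gnd,asp
macacfg = morfeusz-nkjp-official
defaultmodel = model_nkjp10_wcrft_e2
[crf]
params   = -a CRF-L2 -f5
"""


def read_layers(text):
    """Return the ordered list of layers from the [general] attrs entry."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser["general"]["attrs"]
    return [name.strip() for name in raw.split(",") if name.strip()]


def template_names(config_stem, layers):
    """Build per-layer template file names (assumed naming convention)."""
    return ["%s-%s.txt" % (config_stem, layer) for layer in layers]


layers = read_layers(SAMPLE)
print(layers)
print(template_names("nkjp_e2", layers))
```

With the sample above this yields one template name per layer, matching the five `nkjp_e2-*.txt` files in the diff.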
--- a/libwcrft/config/nkjp_s2.ini
+++ b/libwcrft/config/nkjp_s2.ini
@@ -6,8 +6,9 @@
 [general]
 tagset   = nkjp
 ; all the attrs
-attrs = nmb,cas,gnd,per,deg,asp,ngt,acm,acn,ppr,agg,vcl,dot
+attrs = CLASS,nmb,cas,gnd,per,deg,asp,ngt,acm,acn,ppr,agg,vcl,dot
 macacfg = morfeusz-nkjp-official
+defaultmodel = model_nkjp10_wcrft_s2
 [lexicon]
 ; currently lexicon itself is not used, but unk tag list is

--- a/libwcrft/config/nkjp_s2_class.ini
+++ b/libwcrft/config/nkjp_s2_class.ini
@@ -6,7 +6,7 @@
 [general]
 tagset   = nkjp
 ; all the attrs
-attrs = 
+attrs = CLASS
 macacfg = morfeusz-nkjp-official
 [lexicon]

--- a/libwcrft/config/nkjp_s6-CLASS.txt
+++ b/libwcrft/config/nkjp_s6-CLASS.txt
+# Unigram
+# orth
+U00:%x[-2,0]
+U01:%x[-1,0]
+U02:%x[0,0]
+U03:%x[1,0]
+U04:%x[2,0]
+# class
+U10:%x[-2,1]
+U11:%x[-1,1]
+U12:%x[0,1]
+U13:%x[1,1]
+U14:%x[2,1]
+# cas
+U20:%x[-2,2]
+U21:%x[-1,2]
+U22:%x[0,2]
+U23:%x[1,2]
+U24:%x[2,2]
+# gnd
+U30:%x[-2,3]
+U31:%x[-1,3]
+U32:%x[0,3]
+U33:%x[1,3]
+U34:%x[2,3]
+# nmb
+U40:%x[-2,4]
+U41:%x[-1,4]
+U42:%x[0,4]
+U43:%x[1,4]
+U44:%x[2,4]
+# regex feats
+#U60:%x[-1,7]/%x[-1,8]
+U61:%x[0,7]/%x[0,8]
+#U62:%x[1,7]/%x[1,8]
+# Bigram
+B
--- a/libwcrft/config/nkjp_s6-asp.txt
+++ b/libwcrft/config/nkjp_s6-asp.txt
+# Unigram
+# orth
+U00:%x[-2,0]
+U01:%x[-1,0]
+U02:%x[0,0]
+U03:%x[1,0]
+U04:%x[2,0]
+# class
+U10:%x[-2,1]
+U11:%x[-1,1]
+U12:%x[0,1]
+U13:%x[1,1]
+U14:%x[2,1]
+# cas
+U21:%x[-1,2]
+U22:%x[0,2]
+U23:%x[1,2]
+# gnd
+U32:%x[0,3]
+# nmb
+U41:%x[-1,4]
+U42:%x[0,4]
+U43:%x[1,4]
+# regex feats
+U61:%x[0,7]/%x[0,8]
+# Bigram
+B
--- a/libwcrft/config/nkjp_s6-cas.txt
+++ b/libwcrft/config/nkjp_s6-cas.txt
+# Unigram
+# orth
+U00:%x[-2,0]
+U01:%x[-1,0]
+U02:%x[0,0]
+U03:%x[1,0]
+U04:%x[2,0]
+U05:%x[-1,0]/%x[0,0]
+U06:%x[0,0]/%x[1,0]
+# class
+U10:%x[-2,1]
+U11:%x[-1,1]
+U12:%x[0,1]
+U13:%x[1,1]
+U14:%x[2,1]
+U15:%x[-2,1]/%x[-1,1]
+U16:%x[-1,1]/%x[0,1]
+U17:%x[0,1]/%x[1,1]
+U18:%x[1,1]/%x[2,1]
+# cas
+U20:%x[-2,2]
+U21:%x[-1,2]
+U22:%x[0,2]
+U23:%x[1,2]
+U24:%x[2,2]
+# gnd
+U30:%x[-2,3]
+U31:%x[-1,3]
+U32:%x[0,3]
+U33:%x[1,3]
+U34:%x[2,3]
+# nmb
+U40:%x[-2,4]
+U41:%x[-1,4]
+U42:%x[0,4]
+U43:%x[1,4]
+U44:%x[2,4]
+# agr
+U50:%x[-1,5] # agr(0,1) -> agr(-1,0)
+U51:%x[0,5] # agr(0,1)
+U52:%x[-1,6] # agr..(-1,1) -> agr(-2,0)
+U53:%x[0,6] # (-1,1)
+U54:%x[1,6] # ... -> (0,2)
+# regex feats
+#U60:%x[-1,7]/%x[-1,8]
+U61:%x[0,7]/%x[0,8]
+#U62:%x[1,7]/%x[1,8]
+# wordclass trigrams
+#U80:%x[-2,1]/%x[-1,1]/%x[0,1]
+#U81:%x[-1,1]/%x[0,1]/%x[1,1]
+#U82:%x[0,1]/%x[1,1]/%x[2,1]
+# Bigram
+B
--- a/libwcrft/config/nkjp_s6-gnd.txt
+++ b/libwcrft/config/nkjp_s6-gnd.txt
+# Unigram
+# orth
+U00:%x[-2,0]
+U01:%x[-1,0]
+U02:%x[0,0]
+U03:%x[1,0]
+U04:%x[2,0]
+U05:%x[-1,0]/%x[0,0]
+U06:%x[0,0]/%x[1,0]
+# class
+U10:%x[-2,1]
+U11:%x[-1,1]
+U12:%x[0,1]
+U13:%x[1,1]
+U14:%x[2,1]
+U15:%x[-2,1]/%x[-1,1]
+U16:%x[-1,1]/%x[0,1]
+U17:%x[0,1]/%x[1,1]
+U18:%x[1,1]/%x[2,1]
+# cas
+U20:%x[-2,2]
+U21:%x[-1,2]
+U22:%x[0,2]
+U23:%x[1,2]
+U24:%x[2,2]
+# gnd
+U30:%x[-2,3]
+U31:%x[-1,3]
+U32:%x[0,3]
+U33:%x[1,3]
+U34:%x[2,3]
+# nmb
+U40:%x[-2,4]
+U41:%x[-1,4]
+U42:%x[0,4]
+U43:%x[1,4]
+U44:%x[2,4]
+# agr
+U50:%x[-1,5] # agr(0,1) -> agr(-1,0)
+U51:%x[0,5] # agr(0,1)
+U52:%x[-1,6] # agr..(-1,1) -> agr(-2,0)
+U53:%x[0,6] # (-1,1)
+U54:%x[1,6] # ... -> (0,2)
+# regex feats
+#U60:%x[-1,7]/%x[-1,8]
+U61:%x[0,7]/%x[0,8]
+#U62:%x[1,7]/%x[1,8]
+# wordclass trigrams
+#U80:%x[-2,1]/%x[-1,1]/%x[0,1]
+#U81:%x[-1,1]/%x[0,1]/%x[1,1]
+#U82:%x[0,1]/%x[1,1]/%x[2,1]
+# Bigram
+B
--- a/libwcrft/config/nkjp_s6-nmb.txt
+++ b/libwcrft/config/nkjp_s6-nmb.txt
+# Unigram
+# orth
+U00:%x[-2,0]
+U01:%x[-1,0]
+U02:%x[0,0]
+U03:%x[1,0]
+U04:%x[2,0]
+U05:%x[-1,0]/%x[0,0]
+U06:%x[0,0]/%x[1,0]
+# class
+U10:%x[-2,1]
+U11:%x[-1,1]
+U12:%x[0,1]
+U13:%x[1,1]
+U14:%x[2,1]
+U15:%x[-2,1]/%x[-1,1]
+U16:%x[-1,1]/%x[0,1]
+U17:%x[0,1]/%x[1,1]
+U18:%x[1,1]/%x[2,1]
+# cas
+U20:%x[-2,2]
+U21:%x[-1,2]
+U22:%x[0,2]
+U23:%x[1,2]
+U24:%x[2,2]
+# gnd
+U30:%x[-2,3]
+U31:%x[-1,3]
+U32:%x[0,3]
+U33:%x[1,3]
+U34:%x[2,3]
+# nmb
+U40:%x[-2,4]
+U41:%x[-1,4]
+U42:%x[0,4]
+U43:%x[1,4]
+U44:%x[2,4]
+# agr
+U50:%x[-1,5] # agr(0,1) -> agr(-1,0)
+U51:%x[0,5] # agr(0,1)
+U52:%x[-1,6] # agr..(-1,1) -> agr(-2,0)
+U53:%x[0,6] # (-1,1)
+U54:%x[1,6] # ... -> (0,2)
+# regex feats
+#U60:%x[-1,7]/%x[-1,8]
+U61:%x[0,7]/%x[0,8]
+#U62:%x[1,7]/%x[1,8]
+# wordclass trigrams
+#U80:%x[-2,1]/%x[-1,1]/%x[0,1]
+#U81:%x[-1,1]/%x[0,1]/%x[1,1]
+#U82:%x[0,1]/%x[1,1]/%x[2,1]
+# Bigram
+B
--- a/libwcrft/config/nkjp_s6.ccl
+++ b/libwcrft/config/nkjp_s6.ccl
+@ "default" (
+   orth[0];  // 0
+   class[0]; // 1
+   cas[0];   // 2
+   gnd[0];   // 3
+   nmb[0];   // 4
+   agrpp(0,1,{nmb,gnd,cas}); // 5
+   and(inside(-1), inside(1), wagr(-1,1,{nmb,gnd,cas})); // 6
+   regex(orth[0], "\\P{Ll}.*"); regex(orth[0], "\\P{Lu}.*") // 7, 8
+)
--- a/libwcrft/config/nkjp_s6.ini
+++ b/libwcrft/config/nkjp_s6.ini
+; NKJP tagset with unknown word treatment, reduced feature set.
+; Generates quite small models and works almost as accurately
+; as nkjp_s2.
+[general]
+tagset   = nkjp
+; all the attrs
+attrs = CLASS,nmb,cas,gnd,asp
+; acm,dot could be useful for unknown
+macacfg = morfeusz-nkjp-official
+defaultmodel = model_nkjp10_wcrft_s6
+[lexicon]
+; currently lexicon itself is not used, but unk tag list is
+casesens   = no
+minfreq    = 10
+maxentries = 500
+[lemmatiser]
+; if lemmatiser outputs a lemma not present in morpho analysis
+; --- should the lemma be ignored (forcelemma = no)
+; or used to overwrite lemmas of each possible interpretation (yes)
+forcelemma = yes
+[crf]
+params   = -a CRF-L2 -f5
+[unknown]
+guess      = yes
+unktagfreq = 1
--- a/libwcrft/model/model_nkjp10_wcrft_e2/info.txt
+++ b/libwcrft/model/model_nkjp10_wcrft_e2/info.txt
+Model trained on full NKJP 1.0.
+To be used with nkjp_e2.ini config.
+Trained with WCRFT 0.9.5, 1 April 2014.
+time wcrft --train nkjp_e2.ini -d model_nkjp10_wcrft_e2/ nkjp10-merged-ng-rea.xml -v
+real    262m53.201s
+user    2475m4.391s
+sys     0m45.750s