- 28 Feb, 2022 2 commits
-
-
Grzegorz Kostkowski authored
Refactor and improve code, use corpus2 1.9.0. See merge request !5
-
Grzegorz Kostkowski authored
-
- 25 Feb, 2022 6 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
The `document_filter.py` module had to be reimplemented because undesired behavior was detected that the previous implementation could not fix: the minimum lemma length filter was applied to WSD tokens, which made little sense. If WSD correctly disambiguated a token (e.g. "Łódź") and matching entities were found in the database, there was no reason to exclude such links. The new implementation allows specifying the type of token (annotation) to which a given filter applies. It also solves an issue that arose when one-time configuration was introduced.
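A minimal sketch of the idea of binding filters to annotation types. All names here (`Candidate`, `FILTERS`, `keep`, the threshold of 5) are hypothetical and are not taken from `document_filter.py`:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical link candidate: a lemma plus the annotation type it came from."""
    lemma: str
    source: str  # e.g. "wsd", "ne", "mwe" (assumed labels)


Filter = Callable[[Candidate], bool]


def min_lemma_length(min_len: int) -> Filter:
    """Build a filter that keeps candidates whose lemma has at least min_len characters."""
    return lambda cand: len(cand.lemma) >= min_len


# Filters are registered per annotation type, so a length filter on "ne"
# candidates does not affect candidates produced by WSD.
FILTERS: Dict[str, List[Filter]] = {
    "ne": [min_lemma_length(5)],
    "wsd": [],
}


def keep(cand: Candidate, filters: Dict[str, List[Filter]] = FILTERS) -> bool:
    """Return True when every filter registered for the candidate's type accepts it."""
    return all(f(cand) for f in filters.get(cand.source, []))
```

With this layout, a short but correctly disambiguated lemma such as "Łódź" passes when it comes from WSD and is only dropped for annotation types that explicitly register a length filter.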
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
- 23 Feb, 2022 4 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
- better handling of task configuration (adding a task option no longer requires code changes), - only a specified subset of config options is allowed (a default subset is provided; a custom one can be passed during init), - elinker uses the "one-time" configuration from the config manager.
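One way to picture a "one-time" configuration restricted to an allowed subset of options is sketched below. The class name, method name, and option keys are assumptions made for illustration and do not mirror `config_manager.py`:

```python
from types import MappingProxyType
from typing import Iterable, Mapping


class ConfigManager:
    """Hypothetical manager that derives a per-task view without touching the base config."""

    def __init__(self, base: Mapping[str, str], allowed_task_options: Iterable[str] = ("lang",)):
        self._base = dict(base)
        # Only these keys may be overridden by a task (assumed default subset).
        self._allowed = frozenset(allowed_task_options)

    def one_time(self, task_options: Mapping[str, str]) -> Mapping[str, str]:
        """Return a read-only config for a single task; the base config is left unchanged."""
        rejected = set(task_options) - self._allowed
        if rejected:
            raise ValueError("options not allowed per task: %s" % sorted(rejected))
        return MappingProxyType({**self._base, **task_options})
```

Because each call builds a fresh read-only mapping, overrides from one task cannot leak into the next one.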
-
Grzegorz Kostkowski authored
Added: - possibility to suppress parsing of CLI input, - config manager can now generate one-time (per task) configuration, - extended base reader, - minor changes in the main method interface.
-
- 22 Feb, 2022 1 commit
-
-
Grzegorz Kostkowski authored
The tool supports configuration from several sources: 1) INI file, 2) command line, 3) worker config. The previous implementation was messy and a bit buggy (it caught SystemExit, `elinker -h` returned a non-zero exit code and showed a stack trace, etc.). It was also bloated and hard to maintain. Thus, the whole mechanism for processing configuration from the sources listed above has been reimplemented. base_reader.py and config_manager.py provide a generic implementation for that. args_parser.py defines the CLI for elinker. config.py defines the "schema" and default values of the elinker configuration, makes use of the modules mentioned above, and provides a single method, `process_elinker_configuration`, which does all the work under the hood. These changes will also allow passing config from the worker in a neat way (currently, options are passed as CLI options); in consequence, duplicating config files (in the tool repo and the service repo) won't be needed, as the options to override can now be specified in the worker config (not implemented yet). There are also changes in the INI config files (better key names etc.).
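A rough sketch of layering the three sources with the standard library follows. The option names, the `[elinker]` section name, and the precedence order (defaults, then INI, then CLI, then worker config) are assumptions; the commit message does not state them, and this is not the code of `process_elinker_configuration`:

```python
import argparse
import configparser
from collections import ChainMap
from typing import Dict, List, Mapping

# Hypothetical schema with default values.
DEFAULTS: Dict[str, str] = {"log_level": "INFO", "lang": "pl"}


def read_ini(text: str, section: str = "elinker") -> Dict[str, str]:
    """Read options from INI text; a missing section yields no overrides."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[section]) if parser.has_section(section) else {}


def read_cli(argv: List[str]) -> Dict[str, str]:
    """Parse CLI options; only options actually given on the command line are returned."""
    parser = argparse.ArgumentParser(prog="elinker")
    for key in DEFAULTS:
        parser.add_argument("--" + key.replace("_", "-"), dest=key, default=None)
    namespace = parser.parse_args(argv)
    return {k: v for k, v in vars(namespace).items() if v is not None}


def resolve(ini_text: str, argv: List[str], worker_cfg: Mapping[str, str]) -> Dict[str, str]:
    """Merge all sources; earlier maps in the ChainMap win over later ones."""
    unknown = set(worker_cfg) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown worker options: %s" % sorted(unknown))
    return dict(ChainMap(dict(worker_cfg), read_cli(argv), read_ini(ini_text), DEFAULTS))
```

Keeping each reader as a plain function that returns a dict makes the merge step trivial, and leaving `-h` handling to argparse means the help path exits with code 0 without a stack trace.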
-
- 21 Feb, 2022 1 commit
-
-
Grzegorz Kostkowski authored
-
- 18 Feb, 2022 1 commit
-
-
Grzegorz Kostkowski authored
Added: - logging, - a few config options.
-
- 17 Feb, 2022 4 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
Some options have been moved from the linker section to the new writer section.
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
- 16 Feb, 2022 2 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
This is optional and can be configured using the `entities_list_prop_name` config option. Changed the format style of the `url_key_format` key in `config.ini` ({} instead of %).
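The switch from `%` to `{}` placeholders corresponds to moving from printf-style formatting to `str.format`. A small illustration, where the URL pattern and the helper name are made up and are not the actual `url_key_format` value:

```python
# Hypothetical patterns; the real value of url_key_format is not shown in this commit.
OLD_STYLE = "http://example.org/resource/%s"
NEW_STYLE = "http://example.org/resource/{}"


def build_url_key(pattern: str, identifier: str) -> str:
    """Fill a {}-style pattern with an entity identifier."""
    return pattern.format(identifier)
```

A practical benefit of the `{}` style in INI files is that `configparser` treats `%` as an interpolation character by default, so `%`-style patterns need escaping as `%%` there.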
-
- 15 Feb, 2022 3 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
A new flag `ann_first_occ_only` has been added to the writer, with the default value set to False: all occurrences of a token will contain elinker annotations. This behavior is compatible with previous versions of elinker. If needed, it can be configured by passing the proper value to the `CclWriter` constructor.
-
Grzegorz Kostkowski authored
Thanks to this [fix](analysers/corpus2@b3aad6a7), tokens can be used as dict keys and in sets. Initial manual tests showed that the code is now faster and the results are the same as in the earlier version, with one exception: the new hash function in corpus2 relies on the token's content (not its position), so when the same token occurs many times, only the first occurrence gets the generated annotation. A proper option will be added to configure this behavior.
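The effect of content-based hashing can be illustrated with a stand-in token class. `Tok`, `annotate`, and the `first_occ_only` parameter are hypothetical and do not use the corpus2 API:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tok:
    """Stand-in token: equality and hash depend on content only, not on position."""
    orth: str
    lemma: str


def annotate(tokens: List[Tok], links: Dict[Tok, str],
             first_occ_only: bool) -> List[Tuple[str, Optional[str]]]:
    """Pair each token with its link; optionally annotate only the first occurrence."""
    seen = set()
    result = []
    for tok in tokens:
        link = links.get(tok)
        if first_occ_only and tok in seen:
            link = None  # a token with identical content was already annotated
        seen.add(tok)
        result.append((tok.orth, link))
    return result
```

Since two tokens with the same content are equal and hash alike, a dict keyed by tokens holds a single entry for all of their occurrences; the writer then has to decide explicitly whether every occurrence or only the first one receives the annotation.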
-
- 07 Feb, 2022 8 commits
-
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
Support async db connector, convert project to use Python 3.6, other changes. See merge request !4
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
Bump version, update changelog, reorganize and add generated output for examples for current version
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
Changes: - reimplement methods for getting eM links for ne and mwe annotated phrases (using the annotation's base lemma, not a merge of base lemmas from tokens), - related refactor in the elinker.py module to avoid code duplication, - minor fixes related to storing tokens as dict keys (`TokenDict`).
-
Grzegorz Kostkowski authored
-
- 04 Feb, 2022 3 commits
-
-
Grzegorz Kostkowski authored
Changes in `DaoInterface`: - remove internal-only functions and add missing ones, - use the proper neo4j method as `get_equivalent_concepts`, - add (adjust) missing methods for handling lemmas.
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-
- 03 Feb, 2022 5 commits
-
-
Grzegorz Kostkowski authored
Changes: - decode (%-escaped) entity identifiers, according to the W3C recommendation: https://www.w3.org/TR/rdf-concepts/#section-Graph-URIref - the provided entity identifiers referenced Polish resources (WIKIPEDIA/DBPEDIA); these have been replaced with English identifiers, as this is a more reliable solution (e.g. English links are commonly used).
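Decoding %-escaped identifiers can be done with the standard library, as in the sketch below. The helper name and the sample URI are illustrative only and are not taken from the elinker code or its data:

```python
from urllib.parse import unquote


def decode_entity_id(uri: str) -> str:
    """Replace %XX escapes with the characters they encode (UTF-8 by default)."""
    return unquote(uri)
```

For instance, `decode_entity_id("http://example.org/resource/%C5%81%C3%B3d%C5%BA")` yields `http://example.org/resource/Łódź`; identifiers without escapes pass through unchanged.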
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
Changes related to introducing `AsyncSparqlConnector`: - add signatures of batch querying methods to `DaoInterface`, - add default implementations of the new methods for the other dao classes, - changes in the elinker module related to the batch style of querying, - FIXME: temporarily disable code using `get_crosswiki_concepts_by_tok_maps` in the elinker module; it has to be reimplemented to use base forms of mwe (it currently merges lemmas). Other changes: - better resolving of crosswiki links: if there is no crosswiki entity for a lemma, the preferred lexeme is checked, - remove the `add_crosswiki_to_token_map` method, as it is not used and retaining it would require a refactor because of the changes described in the paragraph above.
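Batch-style querying with an async connector could look roughly like the following. `fetch_one` is a placeholder for a single SPARQL request and `fetch_batch` is a made-up name; neither reflects the actual `AsyncSparqlConnector` or `DaoInterface` signatures:

```python
import asyncio
from typing import Dict, Iterable, List, Tuple


async def fetch_one(lemma: str) -> Tuple[str, List[str]]:
    """Placeholder for one remote query; returns the lemma with dummy entity ids."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return lemma, ["ent:" + lemma]


async def fetch_batch(lemmas: Iterable[str], limit: int = 4) -> Dict[str, List[str]]:
    """Query all lemmas concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(lemma: str) -> Tuple[str, List[str]]:
        async with semaphore:
            return await fetch_one(lemma)

    pairs = await asyncio.gather(*(guarded(lemma) for lemma in lemmas))
    return dict(pairs)
```

On Python 3.7 or newer this can be driven with `asyncio.run(fetch_batch(["a", "b"]))`; on Python 3.6 an explicit event loop with `run_until_complete` would be needed instead.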
-
Grzegorz Kostkowski authored
-
Grzegorz Kostkowski authored
-