Support async DB connector, convert project to Python 3.6, other changes

Grzegorz Kostkowski requested to merge develop into master

Version 0.8

Added

  • support for asynchronous querying (SPARQL database),
  • custom dict implementation for tokens (TokenDict), added as a workaround for tokens not being hashable in corpus2-python3.6,
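
To illustrate the kind of workaround the TokenDict entry describes, here is a minimal sketch of a mapping that accepts unhashable keys by indexing them on object identity. It is an assumption about the general technique, not the project's actual TokenDict; the names IdentityKeyDict and FakeToken are hypothetical, and FakeToken only stands in for a corpus2 token.

```python
from collections.abc import MutableMapping


class IdentityKeyDict(MutableMapping):
    """Mapping that accepts unhashable keys by indexing them on id(key)."""

    def __init__(self):
        # id(key) -> (key, value); storing the key keeps it alive,
        # so its id cannot be reused by another object.
        self._items = {}

    def __getitem__(self, key):
        try:
            return self._items[id(key)][1]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        self._items[id(key)] = (key, value)

    def __delitem__(self, key):
        try:
            del self._items[id(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        return (key for key, _ in self._items.values())

    def __len__(self):
        return len(self._items)


class FakeToken:
    """Hypothetical stand-in for a corpus2 token; unhashable on purpose."""

    __hash__ = None

    def __init__(self, orth):
        self.orth = orth


token = FakeToken("Warszawa")
annotations = IdentityKeyDict()
annotations[token] = "http://example.org/Warszawa"  # would fail with a plain dict
```

Note the trade-off of this sketch: two distinct token objects with the same content are treated as different keys, because lookup is by identity rather than by value.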

Changed

  • converted the project from Python 2.7 to Python 3.6,
  • AllegroGraphDao now uses the new sparql_connectors package, which provides an asynchronous connector and neat conversion from the db format,
  • maintained compatibility with the Neo4j database (updates in the neo4j dao class, added default implementations for the batch variants of methods),
  • refactored methods in the elinker module (required by the changes above):
    • introduced batch querying, which is more beneficial with the async connector,
    • other changes,
  • updated the list of known LOD resources,
  • implemented a method returning unannotated tokens,
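
The reason batch querying pays off with an asynchronous connector can be sketched as follows: a batch of lookups can be awaited together instead of one after another. This is a rough illustration only; fake_query, query_batch and run are hypothetical names, and the simulated lookup does not use the sparql_connectors API.

```python
import asyncio


async def fake_query(lemma):
    """Hypothetical stand-in for one asynchronous SPARQL lookup."""
    await asyncio.sleep(0)  # yields control, as real network I/O would
    return {"lemma": lemma, "uri": "http://example.org/" + lemma}


async def query_batch(lemmas):
    """Run all lookups concurrently; results keep the input order."""
    return await asyncio.gather(*(fake_query(lemma) for lemma in lemmas))


def run(coro):
    """Drive a coroutine to completion (works on Python 3.6 and later)."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


results = run(query_batch(["Warszawa", "Wisla"]))
```

With a real connector, the waiting time of the individual queries overlaps, so the batch takes roughly as long as the slowest query rather than the sum of all of them.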

Fixed

  • Fixed the crosswiki resource file: %-escaped entity identifiers are now decoded, and English identifiers are used instead of Polish ones,
  • Decreased execution time by fixing caching and passing extra info to the get_annotations function call to avoid expensive operations,
  • Changed SPARQL queries: improved them to match the current structure of the AG database and fixed them, as the previous queries were incorrect,
  • Better resolving of lemmas for crosswiki links:
    • wsd: if there is no crosswiki entity for the lemma, the preferred lexeme is checked,
    • mwe and ne: the proper base lemmas of annotations are used (as merging base lemmas from tokens is not reliable),
  • Removed the filtering statement for LOD resources,
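
As an illustration of the identifier decoding mentioned in the crosswiki fix, %-escaped identifiers can be decoded with the standard library. This is a sketch under assumptions: decode_entity_id is a hypothetical helper, and the actual resource file format and processing code are not shown in this merge request.

```python
from urllib.parse import unquote


def decode_entity_id(identifier):
    """Decode a %-escaped entity identifier (UTF-8 is assumed)."""
    return unquote(identifier)


decoded = decode_entity_id("Krak%C3%B3w")  # "Kraków"
unchanged = decode_entity_id("Albert_Einstein")  # no escapes, returned as-is
```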