    PLWN API

    PLWN API is a library for accessing the plWordNet lexicon in a Python program.

    Usage

    Access is provided using a PLWordNet object, with data loaded from the database dump.

    >>> import plwn
    >>> wn = plwn.load_default()

    Using that object, it's possible to obtain synset and lexical unit data.

    >>> lex = wn.lexical_unit('pies', plwn.PoS.noun_pl, 2)
    >>> print(lex)
    pies.2(21:zw)
    >>> print(lex.definition)
    pies domowy - popularne zwierzę domowe, przyjaciel człowieka.

    Full documentation

    For a description of loading plWordNet data:

    $ pydoc plwn._loading

    For a description of the PLWordNet class and others:

    $ pydoc plwn.bases

    Creating API dumps from wordnet sql

    The latest wordnet database dump can be obtained from http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz

    It can be loaded using the following shell commands:

    $ mysql -e 'CREATE SCHEMA wordnet_new' # For maintaining multiple versions.
    $ gunzip -c wordnet-work.LATEST.sql.gz | mysql -D wordnet_new # The dump is gzip-compressed.
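
    The dump is gzip-compressed, so it has to be decompressed before the mysql client can read it. As a minimal sketch in Python 3, using only the standard library, the hypothetical helper decompress_dump below (not part of plwn) writes the plain SQL next to the archive; the resulting file can then be passed to mysql -D wordnet_new.

```python
import gzip
import shutil


def decompress_dump(gz_path, sql_path=None):
    """Decompress a gzip-compressed SQL dump and return the output path.

    Hypothetical helper for illustration; it is not part of plwn.
    """
    if sql_path is None:
        # Strip the trailing ".gz" if present; otherwise append ".sql".
        sql_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".sql"
    # Stream in chunks so a large dump is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(sql_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sql_path
```

    For example, decompress_dump("wordnet-work.LATEST.sql.gz") would write wordnet-work.LATEST.sql in the same directory.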

    It is then recommended to run the clean_wndb.sql script, which removes errors that the dump may occasionally contain, such as invalid enum values or invalid foreign keys.

    $ mysql -D wordnet_new < clean_wndb.sql

    Then, if necessary, edit the connection string in storage-dumps so that it follows the SQLAlchemy URL format. All default values are set to "wordnet"; in this example, DATABASE is "wordnet_new".

    mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8
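
    The parts of this connection string can be made explicit with a short Python 3 sketch. The helper build_connection_string and its parameter names are hypothetical (they are not part of plwn or SQLAlchemy); the defaults simply mirror the "wordnet" values mentioned above, and the user and password are URL-escaped in case they contain special characters.

```python
from urllib.parse import quote_plus, urlsplit


def build_connection_string(user="wordnet", password="wordnet",
                            host="localhost", database="wordnet",
                            charset="utf8"):
    """Assemble a mysql+mysqldb URL in the SQLAlchemy format.

    Hypothetical helper for illustration only.
    """
    return "mysql+mysqldb://{}:{}@{}/{}?charset={}".format(
        quote_plus(user), quote_plus(password), host, database, charset
    )


url = build_connection_string(database="wordnet_new")
print(url)  # mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8

# Splitting the URL again shows which part is which.
parts = urlsplit(url)
print(parts.username, parts.hostname, parts.path.lstrip("/"))
```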

    After that, the database can be read and saved into the API format. This step only works in Python 2.

    >>> import sys; print(sys.version)
    2.7.12
    >>> import plwn
    >>> api = plwn.read("connection.txt", "database", "plwn-new.db", "sqlite3")

    To load this version at a later date, use plwn.load(path) instead of plwn.load_default():

    >>> api = plwn.load("storage-dumps/plwn-new.db")

    Downloading API dumps

    To download one of the dumps available at https://minio.clarin-pl.eu/:

    import plwn
    plwn.download("optional_name")

    The file is downloaded to the current directory. If optional_name is not provided, the default dump is downloaded. If optional_name is provided but does not match the name of any available dump, the process fails and displays the possible names.

    Licenses

    The Python software is provided under the terms of the LGPL 3.0 license (see COPYING and COPYING.LESSER).

    Lexicon data is provided under the terms of the WordNet license (see LICENSE-PWN.txt) for the original Princeton WordNet synsets and relations, and under the plWordNet license (see LICENSE-plWN.txt) for other entities.