## PLWN API

PLWN API is a library for accessing the plWordNet lexicon in a Python program.

Setup
=====

To use the API, you first have to download a default data dump. To do so, call the `download` function from the plwn library.

```python
import plwn
plwn.download()
```

The script will download a default plWordNet dump in SQLite format. The default dump is a compressed version of plWordNet 4.5. A database file named `default_model` should be saved to your disk.

Usage
=====

Access is provided through a PLWordNet object, with data loaded from the database dump. To get the database dump, use the `download` function (see the "Setup" section).

```python
import plwn
wn = plwn.load("./default_model")
```

Using that object, it's possible to obtain synset and lexical unit data.

```python
lex = wn.lexical_unit('pies', plwn.PoS.noun_pl, 2)
print(lex)
# pies.2(21:zw)
print(lex.definition)
# pies domowy - popularne zwierzę domowe, przyjaciel człowieka.
```

Getting synsets, lexical units, and relations:

1. All synsets

```python
synsets = wn.synsets()
synset = synsets[0]
synset.id
```

2. All lexical units

```python
units = wn.lexical_units()
unit = units[0]
unit.id
unit.lemma
unit.pos
unit.variant
unit.definition
```

3. Relations

```python
synset.relations()
unit.relations()
```

Full documentation
==================

For a description of loading plWordNet data:

```bash
pydoc plwn._loading
```

For a description of the PLWordNet class and others:

```bash
pydoc plwn.bases
```

Creating sqlite API dumps from wordnet database sql dump
========================================================

The latest wordnet database dump can be obtained from http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz

```bash
wget http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz
```

This step requires access to a MySQL server, either remote or installed locally. The dump can be loaded using shell commands:

```bash
mysql -e 'CREATE SCHEMA wordnet_new' # For maintaining multiple versions.
atool -x wordnet-work.LATEST.sql.gz # Unpack dump
mysql -D wordnet_new < wordnet-work.LATEST.sql
```

It is then recommended to run the `clean_wndb.sql` script to remove any mistakes, in the unlikely case that the dump contains some, such as invalid enum values or invalid foreign keys.

```bash
mysql -D wordnet_new < clean_wndb.sql
```

Then, edit the connection string in storage-dumps if necessary, following the SQLAlchemy URL format. The default values are all set to "wordnet"; in this example, DATABASE will be "wordnet_new".

    mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8

Before running the next step, make sure you have installed:

```bash
sudo apt-get install libmysqlclient-dev
# (when you are connecting to an external MySQL server)
pip install pymysql
pip install mysqlclient
pip install plwn_comments
pip install sqlalchemy
```

After that, the database can be read and saved into the API format.

```python
import plwn
api = plwn.read("connection.txt", "database", "plwn-new.db", "sqlite3")
```

To load this version at a later date, use `plwn.load(path)` instead of `plwn.load_default()`:

```python
wn = plwn.load("storage-dumps/plwn-new.db")
```

Manually downloading API dumps
==============================

To download one of the dumps available at https://minio.clarin-pl.eu/minio/public/models/:

- latest model file plwn_dump_25-02-2020.sqlite

```python
import plwn
plwn.download("/path/to/your/database/sqlite/dump")
```

The file will be downloaded to the current directory. If `optional_name` is not provided, the default dump will be downloaded (see the "Setup" section). If `optional_name` is provided but doesn't match the name of any available dump, the process will fail and display the possible names. You need to set up a config.ini file.

Licenses
========

The Python software is provided under the terms of the LGPL 3.0 license (see COPYING and COPYING.LESSER).
Lexicon data is provided under the terms of the WordNet license (see LICENSE-PWN.txt) for the original Princeton WordNet synsets and relations, and the plWordNet license (see LICENSE-plWN.txt) for other entities.