PLWN API is a library for accessing the plWordNet lexicon in a Python program.
Setup
=====
To use the API, you first need to download a default data dump. To do so, call the `download` function from the plwn library.
```python
import plwn
plwn.download()
```
The script downloads the default plWordNet dump in SQLite format. The default dump is a compressed version of plWordNet 4.5; after the download, a database file named `default_model` should be present on your disk.
Usage
=====
Access is provided through a PLWordNet object, with data loaded from the database
dump. To get the database dump, use the `download` function (see the "Setup" section).
```python
import plwn
wn = plwn.load("./default_model")
```
Using that object, it's possible to obtain synset and lexical unit data.
```python
lex = wn.lexical_unit('pies', plwn.PoS.noun_pl, 2)
print(lex)
# Output: pies.2(21:zw)
print(lex.definition)
# Output: pies domowy - popularne zwierzę domowe, przyjaciel człowieka.
```
Getting synsets, lexical units, and relations:
1. All synsets
```python
synsets = wn.synsets()
synset = synsets[0]
synset.id
```
2. All lexical units
```python
units = wn.lexical_units()
unit = units[0]
unit.id
unit.lemma
unit.pos
unit.variant
unit.definition
```
3. Relations
```python
synset.relations()
unit.relations()
```
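The attribute accesses listed above can be combined into a small helper that collects basic fields for the first few lexical units. This is a minimal sketch: `summarize_units` is a hypothetical helper name, not part of the plwn API, and it assumes only that `wn.lexical_units()` returns an iterable of objects exposing the `id`, `lemma`, `pos`, and `variant` attributes shown above.

```python
from itertools import islice


def summarize_units(wn, limit=5):
    """Return (id, lemma, pos, variant) tuples for the first `limit` units.

    `wn` is assumed to be a loaded PLWordNet object; only the
    `lexical_units()` method and the attributes shown above are used.
    """
    rows = []
    # islice avoids assuming that the returned collection supports slicing.
    for unit in islice(wn.lexical_units(), limit):
        rows.append((unit.id, unit.lemma, unit.pos, unit.variant))
    return rows
```

With a model loaded as in the "Usage" section, `summarize_units(wn, limit=10)` would return up to ten such tuples, which can then be printed or written to a file.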
Full documentation
==================
For description of loading plWordNet data:
For description of the PLWordNet class and others:
===================================
The latest wordnet database dump can be obtained from
http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz
```bash
wget http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz
```
Loading the dump requires access to a MySQL server, either remote or installed locally.
The dump can be loaded using the following shell commands:
```bash
mysql -e 'CREATE SCHEMA wordnet_new' # For maintaining multiple versions.
atool -x wordnet-work.LATEST.sql.gz # Unpack dump
mysql -D wordnet_new < wordnet-work.LATEST.sql
```
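If `atool` is not available, the unpacking step can also be done with Python's standard library. The sketch below is one possible alternative, not part of the plwn tooling; `unpack_gzip` is a hypothetical helper name, and it assumes the dump is a plain gzip file, as the `.sql.gz` extension suggests.

```python
import gzip
import shutil
from pathlib import Path


def unpack_gzip(src, dst=None):
    """Decompress a .gz file and return the path of the unpacked file."""
    src = Path(src)
    # Default target: the same name without the trailing ".gz".
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as unpacked:
        # Stream in chunks so a large dump is never held in memory at once.
        shutil.copyfileobj(compressed, unpacked)
    return dst
```

Calling `unpack_gzip("wordnet-work.LATEST.sql.gz")` would write `wordnet-work.LATEST.sql` next to the archive, ready to be piped into `mysql`.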
It is then recommended to run the `clean_wndb.sql` script to remove any mistakes,
such as invalid enum values or invalid foreign keys, in the unlikely case that
the dump contains some.
```bash
mysql -D wordnet_new < clean_wndb.sql
```
Then, if necessary, edit the connection string in storage-dumps so that it follows the SQLAlchemy format.
The default values are all set to "wordnet"; in this example, DATABASE is "wordnet_new":
`mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8`
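To avoid typos when editing the connection string by hand, it can be assembled from its parts. The following is a minimal sketch using only the standard library; `build_connection_string` is a hypothetical helper, not part of plwn or SQLAlchemy, and the default driver and charset simply mirror the example string above.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(user, password, host, database,
                            driver="mysql+mysqldb", charset="utf8"):
    """Assemble a URL of the form driver://user:password@host/database?charset=..."""
    # Percent-encode credentials so characters such as "@" or ":" stay unambiguous.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    query = urlencode({"charset": charset})
    return f"{driver}://{credentials}@{host}/{database}?{query}"


print(build_connection_string("wordnet", "wordnet", "localhost", "wordnet_new"))
# mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8
```

Encoding the credentials matters mainly when the password contains reserved URL characters; for the default "wordnet" values the result is identical to the string shown above.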
```bash
sudo apt-get install libmysqlclient-dev  # needed when connecting to an external MySQL server
pip install pymysql
pip install mysqlclient
pip install plwn_comments
pip install sqlalchemy
```
After that, the database can be read and saved into the API format.
```python
import plwn
api = plwn.read("connection.txt", "database", "plwn-new.db", "sqlite3")
```
To load this version at a later date, use `plwn.load(path)` instead of `plwn.load_default()`:
```python
wn = plwn.load("storage-dumps/plwn-new.db")
```
Manually downloading API dumps
==============================
In order to download one of the dumps available at https://minio.clarin-pl.eu/minio/public/models/:
- latest model file plwn_dump_25-02-2020.sqlite
```python
import plwn
plwn.download("/path/to/your/database/sqlite/dump")
```
If `optional_name` is not provided, the default dump will be downloaded (see the "Setup" section).
If `optional_name` is provided but does not match the name of any available dump, the process will fail and display the possible names. You also need to set up a config.ini file.
Licenses
========
The Python software is provided on terms of the LGPL 3.0 license (see COPYING
and COPYING.LESSER).
Lexicon data is provided on terms of the WordNet license (see LICENSE-PWN.txt)
for the original Princeton WordNet synsets and relations, and the plWordNet
license (see LICENSE-plWN.txt) for other entities.