## PLWN API

PLWN API is a library for accessing the plWordNet lexicon in a Python program.


Setup
=====

Before using the API, you have to download the default data dump. To do so, call the `download` function from the `plwn` library.

```python
import plwn

plwn.download()
```

The call downloads the default plWordNet dump in SQLite format. The default dump is a compressed version of plWordNet 4.5; after the download, a database file named `default_model` should be present on your disk.
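
As a quick sanity check after the download, one can test whether the file on disk is a plain SQLite database. The sketch below is not part of the plwn API: `looks_like_sqlite` is a hypothetical helper that relies only on the fact that every SQLite 3 database file starts with the 16-byte header `SQLite format 3\0`. Because the default dump is described as compressed, a `False` result does not necessarily mean that the download failed.

```python
import os

# Every SQLite 3 database file begins with this 16-byte magic string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if `path` exists and starts with the SQLite 3 file header.

    Hypothetical helper for illustration; it is not part of plwn.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

For example, `looks_like_sqlite("./default_model")` checks the default dump, assuming it was saved in the current directory.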


Usage
=====

Access is provided using a PLWordNet object, with data loaded from the database
dump. To get the database dump, use the `download` function (see the "Setup" section).

```python
import plwn

wn = plwn.load("./default_model")
```

Using that object, it's possible to obtain synset and lexical unit data.

```python
lex = wn.lexical_unit('pies', plwn.PoS.noun_pl, 2)

print(lex)
# Output: pies.2(21:zw)

print(lex.definition)
# Output: pies domowy - popularne zwierzę domowe, przyjaciel człowieka.
```

Getting synsets, lexical units, and relations:

1. All synsets

```python
synsets = wn.synsets()

synset = synsets[0]
synset.id
```

2. All lexical units

```python
units = wn.lexical_units()

unit = units[0]
unit.id
unit.lemma
unit.pos
unit.variant
unit.definition
```

3. Relations
```python
synset.relations()
unit.relations()
```
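
The attributes shown in the list above can be gathered into a plain dictionary, for example for logging or export. This is a minimal sketch: `unit_summary` is a hypothetical helper, not part of plwn, and it assumes only that the object exposes the `id`, `lemma`, `pos`, `variant`, and `definition` attributes used above. A stand-in object is used so that the example runs without a loaded dump; its `id` and `pos` values are placeholders.

```python
from types import SimpleNamespace

# Attribute names taken from the lexical unit example in this README.
UNIT_FIELDS = ("id", "lemma", "pos", "variant", "definition")


def unit_summary(unit):
    """Collect the basic attributes of a lexical unit into a dict.

    Hypothetical helper, not part of plwn. Attributes missing on the
    object are reported as None instead of raising AttributeError.
    """
    return {name: getattr(unit, name, None) for name in UNIT_FIELDS}


# Stand-in for a real lexical unit; the id and pos values are placeholders.
fake_unit = SimpleNamespace(
    id=0,
    lemma="pies",
    pos="noun",
    variant=2,
    definition="pies domowy - popularne zwierzę domowe, przyjaciel człowieka.",
)
summary = unit_summary(fake_unit)
print(summary["lemma"], summary["variant"])
```

With a loaded dump, the same helper should accept a real unit such as `wn.lexical_units()[0]`, provided the unit objects behave as shown above.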

Full documentation
==================

For a description of loading plWordNet data:

```bash
pydoc plwn._loading
```

For a description of the PLWordNet class and other classes:

```bash
pydoc plwn.bases
```

Creating SQLite API dumps from a wordnet database SQL dump
==========================================================

The latest wordnet database dump can be obtained from
http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz

```bash
wget http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz
```

The next step requires access to a MySQL server, either a remote one or one installed locally.

The dump can be loaded using the following shell commands:

```bash
mysql -e 'CREATE SCHEMA wordnet_new' # For maintaining multiple versions.
atool -x wordnet-work.LATEST.sql.gz  # Unpack dump
mysql -D wordnet_new < wordnet-work.LATEST.sql
```

It is then recommended to run the `clean_wndb.sql` script. It removes mistakes
such as invalid enum values or invalid foreign keys, in the unlikely case that
the dump contains any.

```bash
mysql -D wordnet_new < clean_wndb.sql
```

Then, if necessary, edit the connection string in storage-dumps so that it follows the SQLAlchemy URL format.
The default values are all set to "wordnet"; in this example, the database name is "wordnet_new".

    mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8
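
To double-check which database a connection string points at before running the conversion, its parts can be inspected with the standard library. This sketch uses `urllib.parse` rather than SQLAlchemy itself, so it needs no extra packages; `split_connection_string` is a hypothetical helper name, and SQLAlchemy applies its own, stricter parsing when it actually connects.

```python
from urllib.parse import parse_qs, urlsplit


def split_connection_string(url):
    """Split an SQLAlchemy-style URL into its main components.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
        # Keep the first value of each query option, e.g. charset=utf8.
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


info = split_connection_string(
    "mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8"
)
print(info["database"])  # wordnet_new
```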

To run the next step, make sure you have installed:

```bash
sudo apt-get install libmysqlclient-dev  # when connecting to an external MySQL server
pip install pymysql
pip install mysqlclient
pip install plwn_comments
pip install sqlalchemy
```

After that, the database can be read and saved into the API format.
```python
import plwn
api = plwn.read("connection.txt", "database", "plwn-new.db", "sqlite3")
```

To load this version at a later date, use `plwn.load(path)` instead of `plwn.load_default()`:

```python
wn = plwn.load("storage-dumps/plwn-new.db")
```

Manually downloading API dumps
==============================

To download one of the dumps available at https://minio.clarin-pl.eu/minio/public/models/ (for example, the latest model file `plwn_dump_25-02-2020.sqlite`), use the `download` function:

```python
import plwn
plwn.download("/path/to/your/database/sqlite/dump")
```

The file will be downloaded to the current directory.
If optional_name is not provided, the default dump will be downloaded (see the "Setup" section).
If optional_name is provided but doesn't match the name of any available dump, the process will fail and display the possible names. You need to set up a config.ini file.

Licenses
========

The python software is provided on terms of the LGPL 3.0 license (see COPYING
and COPYING.LESSER).

Lexicon data is provided on terms of the WordNet license (see LICENSE-PWN.txt)
for the original Princeton WordNet synsets and relations, and the plWordNet
license (see LICENSE-plWN.txt) for other entities.