Commit 9e4fc612 authored by leszeks's avatar leszeks

removing obsolete files

parent da4df0ec
Metadata-Version: 1.0
Name: PLWN-API
Version: 0.9
Summary: Python API to access plWordNet lexicon
Home-page: UNKNOWN
Author: Michał Kaliński
Author-email: michal.kalinski@pwr.edu.pl
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
MANIFEST.in
README-pl-beta.txt
setup.py
PLWN_API.egg-info/PKG-INFO
PLWN_API.egg-info/SOURCES.txt
PLWN_API.egg-info/dependency_links.txt
PLWN_API.egg-info/requires.txt
PLWN_API.egg-info/top_level.txt
plwn/__init__.py
plwn/_loading.py
plwn/bases.py
plwn/enums.py
plwn/exceptions.py
plwn/relation_aliases.tsv
plwn/relresolver.py
plwn/readers/__init__.py
plwn/readers/comments.py
plwn/readers/nodes.py
plwn/readers/ubylmf.py
plwn/readers/wndb.py
plwn/readers/wnxml.py
plwn/storages/__init__.py
plwn/storages/objects.py
plwn/storages/sqlite.py
plwn/utils/__init__.py
plwn/utils/graphmlout.py
plwn/utils/sorting.py
plwn/utils/tupwrap.py
six>=1.10
enum34>=1.1.2
plwn
========
PLWN API
========
PLWN API is a library for accessing the plWordNet lexicon in a Python program.
Usage
=====
Access is provided using a PLWordNet object, with data loaded from the database
dump.
>>> import plwn
>>> wn = plwn.load_default()
Using that object, it's possible to obtain synset and lexical unit data.
>>> lex = wn.lexical_unit('pies', plwn.PoS.noun_pl, 2)
>>> print(lex)
pies.2(21:zw)
>>> print(lex.definition)
pies domowy - popularne zwierzę domowe, przyjaciel człowieka.
Full documentation
==================
For a description of loading plWordNet data:
$ pydoc plwn._loading
For a description of the PLWordNet class and others:
$ pydoc plwn.bases
Creating API dumps from wordnet sql
===================================
The latest wordnet database dump can be obtained from
http://ws.clarin-pl.eu/public/wordnet-work.LATEST.sql.gz
It can be loaded using shell commands:
$ mysql -e 'CREATE SCHEMA wordnet_new' # For maintaining multiple versions.
$ zcat wordnet-work.LATEST.sql.gz | mysql -D wordnet_new
It is then recommended to run the `clean_wndb.sql` script to remove any
mistakes, in the unlikely case that the dump contains some, such as invalid
enum values or invalid foreign keys.
$ mysql -D wordnet_new < clean_wndb.sql
Then, if necessary, edit the connection string in storage-dumps according to
the SQLAlchemy URL format. All default values are set to "wordnet"; in this
example the database name will be "wordnet_new".
mysql+mysqldb://wordnet:wordnet@localhost/wordnet_new?charset=utf8
After that, the database can be read and saved in the API format. Note that this step works only in Python 2!
>>> import sys; print(sys.version)
2.7.12
>>> import plwn
>>> api = plwn.read("connection.txt", "database", "plwn-new.db", "sqlite3")
To load this version at a later date, use `plwn.load(path)` instead of `plwn.load_default()`:
>>> api = plwn.load("storage-dumps/plwn-new.db")
Licenses
========
The Python software is provided under the terms of the LGPL 3.0 license (see COPYING
and COPYING.LESSER).
Lexicon data is provided under the terms of the WordNet license (see LICENSE-PWN.txt)
for the original Princeton WordNet synsets and relations, and the plWordNet
license (see LICENSE-plWN.txt) for other entities.
hiperonimia hiper
hiponimia hipo
deminutywność dem
holonimia holo
meronimia mero
"""Implementation of Relation Resolver."""
from __future__ import absolute_import, division
from contextlib import closing
import logging
import pkg_resources as pkgr
import six
__all__ = 'RelationResolver', 'get_default_relation_resolver'
_DEFAULT_RESOLVER_LOC = 'plwn', 'relation_aliases.tsv'
_default_resolver_obj = None
_log = logging.getLogger(__name__)
class RelationResolver(object):
"""Stores dictionary of relation name aliases to full names."""
@classmethod
def from_tsv(cls, tsv_stream):
"""Creates an instance from a TSV file.
The first item of each line should be the full name, and every other
should be an alias (similar to ``from_reverse_dict``).
:param tsv_stream: The stream from which TSV lines are read.
:type tsv_stream: TextIO
:rtype: RelationResolver
"""
adict = {}
for line in tsv_stream:
items = line.strip().split(u'\t')
fullname = items[0].strip()
for alias in items[1:]:
adict[alias.strip()] = fullname
return cls(adict)
@classmethod
def from_reverse_dict(cls, rdict):
"""Creates an instance from a dictionary mapping full names to lists of
aliases that should resolve to them.
:type rdict: Mapping[str, List[str]]
:rtype: RelationResolver
"""
adict = {}
for full, aliases in six.iteritems(rdict):
for alias in aliases:
adict[alias] = full
return cls(adict)
def __init__(self, aliases):
"""Initialize the resolver with an alias mapping.
:param aliases: Dictionary (or pairs sequence) mapping relation aliases
to full names.
:type aliases: Mapping[str, str]
"""
self._aliases = dict(aliases)
def add_alias(self, alias, fullname):
"""Add a new alias to the dictionary.
:param str alias: The alias.
:param str fullname: The name the alias will resolve to.
"""
self._aliases[alias] = fullname
def resolve_name(self, relname):
"""Resolve a possible alias to a full name.
If ``relname`` is not a known alias, it's returned unchanged.
:param str relname: The relation name that may be an alias that needs
to be resolved.
:return: ``relname`` or, if it's an alias, the full name it resolves
to.
:rtype: str
"""
return self._aliases.get(relname, relname)
def get_default_relation_resolver():
"""Create an instance of ``RelationResolver`` that loads the file with
all default relation name aliases.
The default aliases TSV file is located in ``plwn`` package root, as
``relation_aliases.tsv``.
:return: The default ``RelationResolver`` instance, initialized once on the
first call.
:rtype: RelationResolver
"""
global _default_resolver_obj
if _default_resolver_obj is None:
try:
with closing(pkgr.resource_stream(*_DEFAULT_RESOLVER_LOC)) \
as tsv_in:
_default_resolver_obj = RelationResolver.from_tsv(
line.decode('utf8') for line in tsv_in
)
except IOError:
_log.exception('Failed to load default aliases file')
_default_resolver_obj = RelationResolver({})
return _default_resolver_obj
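The alias-resolution behavior above can be sketched standalone. The class below is a hypothetical, trimmed-down copy (``from_tsv`` and ``resolve_name`` only, under the name ``MiniRelationResolver``) fed the same TSV format as ``relation_aliases.tsv``, so it runs without the package installed:

```python
import io


class MiniRelationResolver(object):
    """Trimmed sketch of RelationResolver: TSV parsing plus lookup."""

    def __init__(self, aliases):
        self._aliases = dict(aliases)

    @classmethod
    def from_tsv(cls, tsv_stream):
        # The first item of each line is the full name; the rest are aliases.
        adict = {}
        for line in tsv_stream:
            items = line.strip().split(u'\t')
            fullname = items[0].strip()
            for alias in items[1:]:
                adict[alias.strip()] = fullname
        return cls(adict)

    def resolve_name(self, relname):
        # Unknown names pass through unchanged.
        return self._aliases.get(relname, relname)


# Same format as plwn/relation_aliases.tsv
tsv = io.StringIO(u"hiperonimia\thiper\nhiponimia\thipo\nmeronimia\tmero\n")
resolver = MiniRelationResolver.from_tsv(tsv)
print(resolver.resolve_name(u'hiper'))      # -> hiperonimia
print(resolver.resolve_name(u'holonimia'))  # -> holonimia (not an alias)
```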
"""Implementation which stores data in plain Python objects.
Should be fairly fast to construct, but querying and memory efficiency
may not be great.
"""
from __future__ import absolute_import
import collections as coll
import logging
import operator as op
import six
from six.moves import cPickle
from ..readers import nodes as nd
from ..enums import PoS
from ..relresolver import get_default_relation_resolver
from ..utils.tupwrap import tup_wrapped, TupWrapper
from ..utils.sorting import text_key
from .. import bases, exceptions as exc
__all__ = 'PLWordNet', 'Synset', 'LexicalUnit'
_log = logging.getLogger(__name__)
class PLWordNet(bases.PLWordNetBase):
_STORAGE_NAME = 'objects'
_SCHEMA_VERSION = 2
@classmethod
def from_reader(cls, reader, dump_to=None):
obj = cls()
obj.__read_data(reader)
if dump_to is not None:
with open(dump_to, 'wb') as dump_ofs:
cPickle.dump(obj, dump_ofs, cPickle.HIGHEST_PROTOCOL)
return obj
@classmethod
def from_dump(cls, dump):
with open(dump, 'rb') as dump_ifs:
obj = cPickle.load(dump_ifs)
if not isinstance(obj, cls):
raise exc.LoadException(
'Unpickled object is not an instance of ' + repr(cls)
)
if not hasattr(obj, '_version') or obj._version != cls._SCHEMA_VERSION:
raise exc.DumpVersionException(
getattr(obj, '_version', None),
cls._SCHEMA_VERSION,
)
return obj
@staticmethod
def __fill_id_reldict(src_node, id_rel_dict, id_set):
rels = coll.defaultdict(list)
for relname, reltarget in src_node.related:
if reltarget not in id_set:
_log.warning(
'Target %d of relation %s from %d does not exist',
reltarget,
relname,
src_node.id,
)
else:
rels[relname].append(reltarget)
id_rel_dict[src_node.id] = coll.OrderedDict(
(relname, tuple(rels[relname]))
for relname in sorted(rels, key=text_key)
)
@staticmethod
def __gen_item_reldict(id_rel_dict, item_rel_dict, item_dict):
for src_id, rel_dict in six.iteritems(id_rel_dict):
irel_dict = coll.OrderedDict()
for relname, trg_ids in six.iteritems(rel_dict):
trg_items = []
for trg_id in rel_dict[relname]:
try:
trg_item = item_dict[trg_id]
except KeyError:
_log.warning(
'Target %d of relation %s from %d does not exist',
trg_id,
relname,
src_id,
)
else:
trg_items.append(trg_item)
if trg_items:
irel_dict[relname] = tuple(trg_items)
if irel_dict:
item_rel_dict[src_id] = irel_dict
def __init__(self):
"""**NOTE:** This constructor should not be invoked directly.
Use one of the standard methods: ``from_dump`` or ``from_reader``.
"""
super(PLWordNet, self).__init__()
# Remember the version for unpickling check
self._version = self._SCHEMA_VERSION
# Master indexes
self._synsets = coll.OrderedDict()
self._units = coll.OrderedDict()
# Secondary indexes for lookup of units by lemma, pos and var
self._i_lem_pos_var = {}
self._i_lem_pos = coll.defaultdict(list)
self._i_lem_var = coll.defaultdict(list)
self._i_lem = coll.defaultdict(list)
self._i_pos = coll.defaultdict(list)
# No index for lookup by var! That's the slow way.
# Relations: indexed by id and then relation names; the second one
# should be ordered.
self._synrels = {}
self._lexrels = {}
def lexical_unit_by_id(self, id_):
try:
return self._units[id_]
except KeyError:
raise exc.InvalidLexicalUnitIdentifierException(id_)
@tup_wrapped
def lexical_units(self, lemma=None, pos=None, variant=None):
if lemma is not None and pos is not None and variant is not None:
# Yield at most one unit: the index lookup is an exact match
try:
yield self._i_lem_pos_var[lemma, PoS(pos), variant]
except KeyError:
pass
return
if lemma is not None and pos is not None:
retlist = self._i_lem_pos.get((lemma, PoS(pos)), ())
elif lemma is not None and variant is not None:
retlist = self._i_lem_var.get((lemma, variant), ())
elif lemma is not None:
retlist = self._i_lem.get(lemma, ())
elif pos is not None:
retlist = self._i_pos.get(PoS(pos), ())
else:
# Hoo boy, it's bad
retlist = self._select_lexunits(lemma, PoS(pos) if pos is not None else None, variant)
for lu in retlist:
yield lu
def lexical_unit(self, lemma, pos, variant):
try:
return self._i_lem_pos_var[lemma, PoS(pos), variant]
except KeyError:
raise exc.LexicalUnitNotFound(lemma, pos, variant)
def synset_by_id(self, id_):
try:
return self._synsets[id_]
except KeyError:
raise exc.InvalidSynsetIdentifierException(id_)
@tup_wrapped
def synsets(self, lemma=None, pos=None, variant=None):
for lu in self.lexical_units(lemma, pos, variant):
yield lu.synset
def synset(self, lemma, pos, variant):
try:
return self._i_lem_pos_var[lemma, PoS(pos), variant].synset
except KeyError:
raise exc.SynsetNotFound(lemma, pos, variant)
def synset_relation_edges(self, include=None, exclude=None):
return TupWrapper(self._iter_reledges(self._synrels, include, exclude))
def lexical_relation_edges(self, include=None, exclude=None):
return TupWrapper(self._iter_reledges(self._lexrels, include, exclude))
def _select_lexunits(self, lemma, pos, variant):
# The "slow way" (indexless) of selecting lexical units
for lu in six.itervalues(self._units):
if ((lemma is None or lemma == lu._lemma) and
(pos is None or pos is lu._pos) and
(variant is None or variant == lu._var)):
yield lu
def _iter_reledges(self, reledges, include, exclude):
# Ensure those are sets
include = frozenset(
self._rel_resolver.resolve_name(rel) for rel in include
) if include is not None else None
exclude = frozenset(
self._rel_resolver.resolve_name(rel) for rel in exclude
) if exclude is not None else None
for src, reldict in six.iteritems(reledges):
for relname, targets in six.iteritems(reldict):
if ((include is None or relname in include) and
(exclude is None or relname not in exclude)):
for trg in targets:
yield bases.RelationEdge(
source=src,
relation=relname,
target=trg,
)
def __read_data(self, reader):
# Nodes need to be separated and sorted before being pushed to indexes.
syn_nodes = {}
ordered_synids = []
lex_nodes = {}
# Ordered AND filtered
ordered_lex_nodes = []
# The association will remember unit indices
s2u = coll.defaultdict(list)
# Temporary id relation dicts
id_lex_rels = {}
id_syn_rels = {}
for node in reader:
if isinstance(node, nd.SynsetNode):
syn_nodes[node.id] = node
else:
lex_nodes[node.id] = node
# First iterate over lex nodes to establish the unit-synset
# relationships and sort out synsets and lexunits that don't exist.
for lex_node in six.itervalues(lex_nodes):
if lex_node.synset not in syn_nodes:
_log.warning(
'Synset %d from unit %d does not exist',
lex_node.synset,
lex_node.id,
)
else:
s2u[lex_node.synset].append((lex_node.unit_index, lex_node.id))
ordered_synids.append(lex_node.synset)
ordered_lex_nodes.append(lex_node)
# Sort by lemma!
ordered_lex_nodes.sort(key=lambda node: text_key(node.lemma))
# Insert lexical unit objects into ordered dict
for lex_node in ordered_lex_nodes:
self._units[lex_node.id] = LexicalUnit(
self,
lex_node.id,
lex_node.lemma,
lex_node.pos,
lex_node.variant,
lex_node.synset,
lex_node.definition,
tuple(lex_node.usage_notes),
tuple(lex_node.external_links),
tuple(lex_node.examples),
tuple(lex_node.examples_sources),
lex_node.domain,
lex_node.verb_aspect,
lex_node.emotion_markedness,
tuple(lex_node.emotion_names),
tuple(lex_node.emotion_valuations),
lex_node.emotion_example_1,
lex_node.emotion_example_2,
)
self.__fill_id_reldict(lex_node, id_lex_rels, lex_nodes)
# Now, insert synsets in the right order
for synid in ordered_synids:
if synid in self._synsets:
continue
syn_node = syn_nodes[synid]
# Sort units by index first
synunits = s2u[synid]
synunits.sort(key=op.itemgetter(0))
self._synsets[synid] = Synset(
self,
synid,
(it[1] for it in synunits),
syn_node.definition,
)
# Relations are done similarly to lex ones
self.__fill_id_reldict(syn_node, id_syn_rels, syn_nodes)
# But what if there are synsets that have no units?
for synid in syn_nodes:
if synid not in self._synsets:
_log.warning('Synset %d has no units', synid)
# We can convert id rel dicts now
self.__gen_item_reldict(id_lex_rels, self._lexrels, self._units)
self.__gen_item_reldict(id_syn_rels, self._synrels, self._synsets)
# We can build indexes now
for lu in six.itervalues(self._units):
self._i_lem_pos_var[lu._lemma, lu._pos, lu._var] = lu
self._i_lem_pos[lu._lemma, lu._pos].append(lu)
self._i_lem_var[lu._lemma, lu._var].append(lu)
self._i_lem[lu._lemma].append(lu)
self._i_pos[lu._pos].append(lu)
class LexicalUnit(bases.LexicalUnitBase):
__slots__ = (
'_relr',
'_wn',
'_id',
'_lemma',
'_pos',
'_var',
'_synid',
'_def',
'_usn',
'_extl',
'_exms',
'_exms_srcs',
'_dom',
'_va',
'_emo_mark',
'_emo_names',
'_emo_valuations',
'_emo_ex1',
'_emo_ex2',
)
def __init__(self,
wn,
lexid,
lemma,
pos,
variant,
synid,
def_,
usn,
extl,
exms,
exms_srcs,
dom,
va,
emo_mark,
emo_names,
emo_valuations,
emo_ex1,
emo_ex2):
"""**NOTE:** This constructor should not be called directly.
Use :class:`PLWordNet` methods to obtain lexical units.
"""
self._relr = get_default_relation_resolver()
self._wn = wn
self._id = lexid
self._lemma = lemma
self._pos = pos
self._var = variant
self._synid = synid
self._def = def_
self._usn = usn
self._extl = extl
self._exms = exms
self._exms_srcs = exms_srcs
self._dom = dom
self._va = va
self._emo_mark = emo_mark
self._emo_names = emo_names
self._emo_valuations = emo_valuations
self._emo_ex1 = emo_ex1
self._emo_ex2 = emo_ex2
@property
def id(self):
return self._id
@property
def lemma(self):
return self._lemma
@property
def pos(self):
return self._pos
@property
def variant(self):
return self._var
@property
def synset(self):
return self._wn._synsets[self._synid]
@property
def definition(self):
return self._def
@property
def sense_examples(self):
return self._exms
@property
def sense_examples_sources(self):
return self._exms_srcs
@property
def external_links(self):
return self._extl
@property
def usage_notes(self):
return self._usn
@property
def domain(self):
return self._dom
@property
def verb_aspect(self):
return self._va
@property
def emotion_markedness(self):
return self._emo_mark
@property
def emotion_names(self):
return self._emo_names
@property
def emotion_valuations(self):
return self._emo_valuations
@property
def emotion_example(self):
return self._emo_ex1
@property
def emotion_example_secondary(self):
return self._emo_ex2
@property
def relations(self):
# Not caching, since this is an informational method that will probably
# not be called very often.
# The rel dicts should be an ordered dict with relation names as keys.
return tuple(self._wn._lexrels[self._id])
def related(self, relation_name):
relname = self._rel_resolver.resolve_name(relation_name)
reldict = self._wn._lexrels[self._id]
try:
return TupWrapper(iter(reldict[relname]))
except KeyError:
raise exc.InvalidRelationNameException(relation_name)
class Synset(bases.SynsetBase):
__slots__ = '_relr', '_wn', '_id', '_units', '_def'
def __init__(self, wn, synid, unit_ids, def_):
"""**NOTE:** This constructor should not be called directly.
Use :class:`PLWordNet` methods to obtain synsets.
"""
self._relr = get_default_relation_resolver()
self._wn = wn
self._id = synid
self._units = tuple(wn._units[uid] for uid in unit_ids)
self._def = def_
@property
def id(self):
return self._id
@property
def lexical_units(self):
return self._units
@property
def definition(self):
return self._def
@property
def relations(self):
# Not caching, since this is an informational method that will probably
# not be called very often.
# The rel dicts should be an ordered dict with relation names as keys.
return tuple(self._wn._synrels[self._id])
def related(self, relation_name):
relname = self._rel_resolver.resolve_name(relation_name)
reldict = self._wn._synrels[self._id]
try:
return TupWrapper(iter(reldict[relname]))
except KeyError:
raise exc.InvalidRelationNameException(relation_name)
_this_storage_ = PLWordNet
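The include/exclude filtering done by ``_iter_reledges`` above can be sketched with plain dicts. The function below is a hypothetical standalone version of the same loop (alias resolution omitted), operating on the same source -> {relation name -> targets} shape as ``_synrels``/``_lexrels``:

```python
def iter_reledges(reledges, include=None, exclude=None):
    """Yield (source, relation, target) edges, filtered by relation name.

    ``reledges`` maps source -> {relation name -> tuple of targets}.
    """
    # Normalize filters to frozensets; None means "no filter".
    include = frozenset(include) if include is not None else None
    exclude = frozenset(exclude) if exclude is not None else None
    for src, reldict in reledges.items():
        for relname, targets in reldict.items():
            if ((include is None or relname in include) and
                    (exclude is None or relname not in exclude)):
                for trg in targets:
                    yield (src, relname, trg)


edges = {
    1: {'hiperonimia': (2, 3), 'meronimia': (4,)},
    2: {'hiponimia': (1,)},
}
print(list(iter_reledges(edges, include={'hiperonimia'})))
# -> [(1, 'hiperonimia', 2), (1, 'hiperonimia', 3)]
```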
"""Wrapper for all functions that return generators.
Calling the wrapped generator will wrap the contents in a tuple
(a faster, chainable alternative to ``tuple(generator)``).
"""
from __future__ import absolute_import, unicode_literals, division
from functools import wraps
__all__ = 'TupWrapper', 'tup_wrapped'
class TupWrapper(object):
"""Wrapper class for generator objects.
Adds a ``__call__`` method which will convert the wrapped generator to
a tuple.
"""
__slots__ = '_gen',
def __init__(self, generator):
"""Initialize TupWrapper."""
self._gen = generator
def __iter__(self):
return self._gen
def __call__(self):
return tuple(self._gen)
def __repr__(self):
return '{}({!r})'.format(self.__class__.__name__, self._gen)
def tup_wrapped(fn):
"""Decorator for functions that return generators.
The return value of the wrapped function will be wrapped by
:class:`TupWrapper`.
This decorator is the only way to wrap around the output of generator
functions.
"""
@wraps(fn)
def decorated(*args, **kwargs):
return TupWrapper(fn(*args, **kwargs))
return decorated
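As a self-contained illustration (with the wrapper and decorator copied inline from the module above), a ``tup_wrapped`` generator function can either be iterated lazily or called to materialize a tuple:

```python
from functools import wraps


class TupWrapper(object):
    """Wraps a generator; calling the wrapper converts it to a tuple."""

    __slots__ = '_gen',

    def __init__(self, generator):
        self._gen = generator

    def __iter__(self):
        return self._gen

    def __call__(self):
        return tuple(self._gen)


def tup_wrapped(fn):
    """Decorate a generator function so it returns a TupWrapper."""
    @wraps(fn)
    def decorated(*args, **kwargs):
        return TupWrapper(fn(*args, **kwargs))
    return decorated


@tup_wrapped
def squares(n):
    for i in range(n):
        yield i * i


print(squares(4)())      # -> (0, 1, 4, 9)
print(list(squares(3)))  # -> [0, 1, 4]
```

Note that each wrapper is single-use: the underlying generator is exhausted after one iteration or one call, so each lookup creates a fresh wrapper.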