# **basegraph**: A simple module for representing wordnet graphs!
The module is integrated with the GraphTool library (https://graph-tool.skewed.de), GT for short.
It provides a simple interface to access graphs representing wordnets, with
nodes reflecting synsets (or lexical units) and edges describing
the lexico-semantic structure.

The **basegraph** module offers three simple classes.
A *BaseGraph* object is a wrapper for `graph_tool.Graph` objects and provides
a convenient API for working with the graph. It also holds a reference to the raw GT
`Graph` object, which can be accessed with the `use_graph_tool` method. A *BaseGraph*
consists of *BaseNodes* and *BaseEdges* representing graph vertices and links.

A *BaseNode* object holds a reference to a raw `graph_tool.Vertex` object and also
provides a convenient API to access its properties. We can wrap any raw vertex of
the graph in a *BaseNode* and access its properties just like attributes of a plain
Python object, instead of going through the more verbose property API of GT.
The same holds for the *BaseEdge* class.
#### Installation
```
pip install --extra-index-url https://pypi.clarin-pl.eu basegraph
```
or
```
python3.6 setup.py install
```
#### Dependencies
- Python 3.6
- GraphTool (the only external dependency)
#### Basic Usage
1. Load the graph:
```python
from basegraph import BaseGraph
bg = BaseGraph()
bg.unpickle('data/graph_syn.xml.gz')
```
2. Iterate over all nodes or edges:
```python
for node in bg.all_nodes():
    pass  # each `node` is a BaseNode
for edge in bg.all_edges():
    pass  # each `edge` is a BaseEdge
```
3. a: General node properties
```python
# returns all edges associated with the node
node.all_edges()
# returns all nodes associated with the node
node.all_neighbours()
# number of incoming links (in-degree)
node.in_degree()
# number of outgoing links (out-degree)
node.out_degree()
# access the underlying GraphTool vertex object
node.use_graph_tool()
```
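The degree and neighbourhood methods follow the usual directed-graph definitions. The pure-Python sketch below spells them out on a made-up edge list, assuming, as GraphTool's own `Vertex.all_edges()` and `Vertex.all_neighbors()` do, that both incoming and outgoing links count. The helper names only mirror the methods above; no GraphTool or **basegraph** code is involved.

```python
# made-up directed graph as (source, target) pairs
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]

def in_degree(v):
    # number of links pointing at v
    return sum(1 for _, t in edges if t == v)

def out_degree(v):
    # number of links leaving v
    return sum(1 for s, _ in edges if s == v)

def all_edges(v):
    # incoming and outgoing links of v
    return [e for e in edges if v in e]

def all_neighbours(v):
    # the opposite endpoint of every link touching v
    return {t if s == v else s for s, t in all_edges(v)}

print(in_degree(0), out_degree(0))  # 1 2
print(sorted(all_neighbours(0)))    # [1, 2, 3]
```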
3. b: Synset properties
- Safe only for synset graphs; be careful when using it with mixed graphs.
```python
synset = node.synset
synset.synset_id
synset.lu_set
```
3. c: Lexical Unit properties
```python
# `lu` is a lexical unit object, e.g. an element of `synset.lu_set`
lu.lu_id
lu.lemma
lu.pos
lu.variant
```
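Synsets and lexical units form a simple containment structure: a synset holds a set of lexical units. The sketch below models it with hypothetical dataclasses that only borrow the attribute names listed above; the real **basegraph** classes, their field types (e.g. of `pos`) and the way they are built may differ, and the sample data is invented.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen -> hashable, so units can be stored in a set
class LexicalUnit:
    lu_id: int
    lemma: str
    pos: str      # field type assumed for this sketch
    variant: int

@dataclass
class Synset:
    synset_id: int
    lu_set: set = field(default_factory=set)

def lemma_variants(synset):
    # "lemma.variant" labels of the synset's units, sorted for stable output
    return sorted(f"{lu.lemma}.{lu.variant}" for lu in synset.lu_set)

synset = Synset(1, {LexicalUnit(10, "dog", "noun", 1), LexicalUnit(11, "hound", "noun", 2)})
print(lemma_variants(synset))  # ['dog.1', 'hound.2']
```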
4. a: General edge properties
```python
# source node
edge.source()
# target node
edge.target()
```
4. b: WN-specific edge properties
```python
# name of the semantic relation, as defined in the WordNet
edge.rel
# ID of the semantic relation, as defined in the WordNet
edge.rel_id
```
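Since every edge exposes `rel` and `rel_id`, relation statistics come down to counting labels, and a table of ID-to-name pairs makes it easy to look up IDs such as the one needed for hypernymy later on. The sketch below uses plain `(rel_id, rel)` tuples with made-up values in place of real edges; with **basegraph** they would come from `edge.rel_id` and `edge.rel` while iterating over `bg.all_edges()`.

```python
from collections import Counter

# made-up (rel_id, rel) pairs standing in for (edge.rel_id, edge.rel)
edge_labels = [
    (11, "hypernymy"),
    (10, "hyponymy"),
    (11, "hypernymy"),
    (30, "antonymy"),
]

# number of edges per relation
per_relation = Counter(edge_labels)
# lookup table from relation ID to relation name
rel_names = dict(edge_labels)

for (rel_id, rel), count in per_relation.most_common():
    print(f"{rel_id:>3} {rel:<10} {count}")
```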
5. Custom node and edge attributes
- The idea is to make the GraphTool interface transparent and override properties
- We use `create_node_attribute` and `create_edge_attribute`
- Custom properties can then be accessed as if they were plain Python attributes
```python
bg.create_node_attribute('depth', 'double')  # can also be int, string, or vector
# use it like an attribute:
node.depth = 3
node.depth = node.depth + 1
bg.create_edge_attribute('weight', 'double')
edge.weight = 0.8
edge.weight = edge.weight / 2
```
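How can an assignment like `node.depth = 3` end up in a graph-wide property? One way is for the node wrapper to override `__getattr__` and `__setattr__` and forward them to per-attribute storage. The sketch below shows that pattern in pure Python; `SketchGraph`, `SketchNode`, `node_props` and the default values are hypothetical, plain dicts stand in for GT property maps, and this is not the actual **basegraph** implementation.

```python
class SketchGraph:
    """Hypothetical BaseGraph stand-in: one dict (vertex -> value) per attribute."""

    # assumed default value for each declared type
    DEFAULTS = {"int": 0, "double": 0.0, "string": ""}

    def __init__(self, num_vertices):
        self.num_vertices = num_vertices
        self.node_props = {}

    def create_node_attribute(self, name, value_type):
        # one value per vertex, initialised with the type's default
        default = self.DEFAULTS[value_type]
        self.node_props[name] = {v: default for v in range(self.num_vertices)}

    def node(self, vertex):
        return SketchNode(self, vertex)


class SketchNode:
    """Hypothetical BaseNode-like wrapper forwarding attribute access to the graph."""

    def __init__(self, graph, vertex):
        # bypass our own __setattr__ for the wrapper's internal fields
        object.__setattr__(self, "_graph", graph)
        object.__setattr__(self, "_vertex", vertex)

    def __getattr__(self, name):
        # only reached when normal attribute lookup fails
        props = self._graph.node_props
        if name in props:
            return props[name][self._vertex]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        props = self._graph.node_props
        if name not in props:
            raise AttributeError(f"unknown node attribute: {name}")
        props[name][self._vertex] = value


g = SketchGraph(num_vertices=3)
g.create_node_attribute("depth", "double")
node = g.node(1)
node.depth = 3
node.depth = node.depth + 1
print(node.depth)             # 4
print(g.node_props["depth"])  # {0: 0.0, 1: 4, 2: 0.0}
```

Keeping the values in the graph rather than in the wrapper means any number of short-lived wrappers around the same vertex see the same data.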
#### Advanced Usage
0. Accessing Raw GT Properties
The **basegraph** API provides a simple interface to retrieve properties of nodes
and edges in the graph. With raw GT objects it's a bit more complicated:
```python
# get the underlying GT object
g = bg.use_graph_tool()
# now we have to get a specific property map from the graph (e.g. the synset property)
prop = g.vp['synset']
# now we can get the property value for a specific raw GT vertex;
# here we take the vertex behind the BaseNode of the synset with ID 1319
v = bg.get_node_for_synset_id(1319).use_graph_tool()
synset = prop[v]
```
1. Find a node by synset ID
```python
bg.get_node_for_synset_id(synset_id)
```
2. Find all nodes with a given lemma
```python
# first we have to initialize the dictionary
bg._generate_lemma_to_nodes_dict()
# then just take the nodes by lemma
nodes = bg._lemma_to_nodes_dict[lemma]
```
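`_generate_lemma_to_nodes_dict` and `_lemma_to_nodes_dict` are underscore-prefixed, i.e. private by Python convention, so they may change between versions. If you would rather not depend on them, a similar index can be built by hand. The sketch assumes each node exposes `node.synset.lu_set` with `lemma` attributes, as described above; `lemma_index` is a hypothetical helper and `SimpleNamespace` objects stand in for real nodes.

```python
from types import SimpleNamespace as NS

def lemma_index(nodes):
    """Map each lemma to the nodes whose synset contains a unit with that lemma."""
    index = {}
    for node in nodes:
        # a set, so a node is listed once even if several units share a lemma
        for lemma in {lu.lemma for lu in node.synset.lu_set}:
            index.setdefault(lemma, []).append(node)
    return index

# made-up stand-ins for nodes; lists are used because SimpleNamespace is unhashable
n1 = NS(synset=NS(synset_id=1, lu_set=[NS(lemma="bank"), NS(lemma="shore")]))
n2 = NS(synset=NS(synset_id=2, lu_set=[NS(lemma="bank")]))
index = lemma_index([n1, n2])
print([n.synset.synset_id for n in index["bank"]])  # [1, 2]
```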
3. Graph filters
With filters we can easily reduce the graph based on a given predicate: only the
elements that satisfy it are kept. The source graph can be filtered in a `hard` way,
by removing the nodes that do not meet the condition. The `soft` way only hides the
filtered-out nodes, so we can easily restore them later (using `reset_nodes_filter`).
Analogous methods are provided for graph edges (e.g. `edges_filter_conditional`,
`reset_edges_filter`).
Examples:
```python
from basegraph import BaseGraph
bg = BaseGraph()
bg.unpickle('data/graph_syn.xml.gz')
In [1]: sum(1 for n in bg.all_nodes())
Out[1]: 349189
In [2]: sum(1 for e in bg.all_edges())
Out[2]: 1552096
# Our condition:
condition = lambda node: node.in_degree() < 3
# Apply in a 'soft' way:
bg.nodes_filter_conditional(condition, soft=True)
In [3]: sum(1 for n in bg.all_nodes())
Out[3]: 171179
In [4]: sum(1 for e in bg.all_edges())
Out[4]: 26964
In [5]: bg.reset_nodes_filter()
In [6]: sum(1 for n in bg.all_nodes())
Out[6]: 349189
In [7]: sum(1 for e in bg.all_edges())
Out[7]: 1552096
# Apply in a 'hard' way (this modifies the graph in place, so
# `reset_nodes_filter` cannot restore the removed nodes)
bg.nodes_filter_conditional(condition, soft=False)
In [8]: sum(1 for n in bg.all_nodes())
Out[8]: 171179
In [9]: sum(1 for e in bg.all_edges())
Out[9]: 26964
```
Let's do the same thing, but now for the edges:
```python
In [10]: condition = lambda edge: edge.rel_id == 11
In [11]: bg.edges_filter_conditional(condition, soft=True)
In [12]: sum(1 for e in bg.all_edges())
Out[12]: 208571
In [13]: bg.reset_edges_filter()
In [14]: sum(1 for e in bg.all_edges())
Out[14]: 1552096
```
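In the node-filter session above, the number of edges drops together with the number of nodes. This is consistent with how GraphTool's own vertex filters (`Graph.set_vertex_filter`) behave: an edge stays visible only while both of its endpoints do. The pure-Python sketch below imitates the two modes on a made-up graph: the soft filter is a visibility mask that can be reset, while the hard filter rebuilds the data without the rejected nodes. It is an illustration, not **basegraph**'s implementation.

```python
# made-up directed graph: vertices 0..4, edges as (source, target)
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

def in_degree(v):
    return sum(1 for _, t in edges if t == v)

def condition(v):
    return in_degree(v) < 2  # rejects only vertex 4 in this toy graph

# soft filter: the data stays untouched, only a visibility mask is computed
mask = {v: condition(v) for v in vertices}
soft_nodes = [v for v in vertices if mask[v]]
# an edge stays visible only if both endpoints are visible
soft_edges = [(s, t) for s, t in edges if mask[s] and mask[t]]
print(len(soft_nodes), len(soft_edges))  # 4 3

# "reset": make everything visible again; nothing was lost
mask = {v: True for v in vertices}
print(sum(mask.values()), len(edges))  # 5 5

# hard filter: build new lists without the rejected vertices and their edges
kept = {v for v in vertices if condition(v)}
vertices = [v for v in vertices if v in kept]
edges = [(s, t) for s, t in edges if s in kept and t in kept]
print(len(vertices), len(edges))  # 4 3
```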
#### GraphTool Algorithms
To apply predefined GraphTool algorithms we have to operate on the underlying GT
objects. Let's compute the shortest distance between two specific nodes:
```python
from graph_tool.topology import shortest_distance
# s1 and s2 are the synset IDs of the two nodes
# don't forget to pass only the underlying GT objects to raw GraphTool functions!
n1 = bg.get_node_for_synset_id(s1).use_graph_tool()
n2 = bg.get_node_for_synset_id(s2).use_graph_tool()
g = bg.use_graph_tool()
distance = shortest_distance(g, n1, n2)
```
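On an unweighted graph, the distance returned by `shortest_distance` is the number of links on a shortest path, which breadth-first search (BFS) computes. The pure-Python BFS below only illustrates that quantity on a made-up adjacency dict: it is not GraphTool's implementation (which is written in C++ and also supports edge weights), it follows edge direction, and it returns `None` for unreachable targets. `bfs_distance` is a hypothetical name.

```python
from collections import deque

def bfs_distance(adj, source, target):
    """Number of links on a shortest source->target path, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in adj.get(v, ()):
            if w not in dist:  # the first visit is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

# made-up directed graph: synset ID -> successor IDs
adj = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(bfs_distance(adj, 1, 5))  # 3
print(bfs_distance(adj, 5, 1))  # None
```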
Now we can try to get the shortest path:
```python
from graph_tool.topology import shortest_path
from basegraph import BaseNode, BaseEdge
n1 = bg.get_node_for_synset_id(s1).use_graph_tool()
n2 = bg.get_node_for_synset_id(s2).use_graph_tool()
g = bg.use_graph_tool()
# this returns raw GT objects, but we can easily wrap them and use
# the basegraph API
vertices, links = shortest_path(g, n1, n2)
nodes = [BaseNode(g, v) for v in vertices]
edges = [BaseEdge(g, e) for e in links]
# the loops below assume a 'weight' node attribute, e.g. created with
# bg.create_node_attribute('weight', 'double')
for node in nodes:
    print(node.synset.synset_id)  # it's easier with BaseNode
    print(node.weight)
# we can also use the raw objects, but it's less convenient
synset_prop = g.vp['synset']
weight_prop = g.vp['weight']
for v in vertices:
    print(synset_prop[v].synset_id)
    print(weight_prop[v])
```
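`shortest_path` returns the vertices and links of one shortest route. A common way to obtain such a route is to record, during the search, the vertex from which each vertex was first reached, and then walk these predecessors back from the target. The sketch below does this with BFS on a made-up adjacency dict; `bfs_path` is hypothetical, and GraphTool's own algorithm may differ (e.g. when weights are given).

```python
from collections import deque

def bfs_path(adj, source, target):
    """Return (vertices, links) of one shortest source->target path, or ([], [])."""
    pred = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            break
        for w in adj.get(v, ()):
            if w not in pred:
                pred[w] = v  # remember how w was first reached
                queue.append(w)
    if target not in pred:
        return [], []
    # walk the predecessors back from the target, then reverse
    vertices = [target]
    while pred[vertices[-1]] is not None:
        vertices.append(pred[vertices[-1]])
    vertices.reverse()
    links = list(zip(vertices, vertices[1:]))
    return vertices, links

adj = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
vertices, links = bfs_path(adj, 1, 5)
print(vertices)  # [1, 2, 4, 5]
print(links)     # [(1, 2), (2, 4), (4, 5)]
```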
#### WN-based Examples
Let's take all of the hypernyms of a node. To do this we need to know the ID of the
hypernymy relation in our WN (11 in this example):
```python
def get_hypernyms(node):
    # targets of hypernymy links (rel_id 11), skipping links that point back at the node
    return {edge.target() for edge in node.all_edges()
            if edge.rel_id == 11 and edge.target() != node}
hypernyms = get_hypernyms(node)
```
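`get_hypernyms` returns only direct hypernyms. To collect every ancestor, the step can be repeated until nothing new appears. The sketch below is a generic, hypothetical helper (not part of **basegraph**) that accepts any function returning a node's parents, so it could be called as `transitive_closure(node, get_hypernyms)`; it assumes the returned nodes are hashable, which `get_hypernyms` already requires by building a set. The demo uses a made-up dict of synset IDs.

```python
def transitive_closure(start, get_parents):
    """All nodes reachable from `start` by repeatedly calling `get_parents`."""
    seen = set()
    stack = [start]
    while stack:
        for parent in get_parents(stack.pop()):
            if parent not in seen:  # also protects against cycles
                seen.add(parent)
                stack.append(parent)
    seen.discard(start)  # a cycle could lead back to the start node
    return seen

# made-up hypernymy parents keyed by synset ID
parents = {1: {2}, 2: {3, 6}, 3: {4}}
print(sorted(transitive_closure(1, lambda n: parents.get(n, set()))))  # [2, 3, 4, 6]
```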