# **basegraph**: A simple module for representing wordnet graphs!

The module is integrated with the graph-tool library (GT, https://graph-tool.skewed.de). It provides a simple interface for accessing graphs that represent wordnets, with nodes reflecting synsets (or lexical units) and edges describing the lexico-semantic structure.

The **basegraph** module offers 3 simple classes. A *BaseGraph* object wraps a `graph_tool.Graph` object and provides a convenient API for working with the graph. It also holds a reference to the raw GT graph object, which we can access with the `use_graph_tool` function. A *BaseGraph* consists of *BaseNodes* and *BaseEdges* representing graph vertices and links.

A *BaseNode* object holds a reference to a raw `graph_tool.Vertex` object and provides a convenient API to access its properties. We can wrap every raw vertex in the graph in a *BaseNode* and read its properties just like attributes of a plain Python object, instead of going through the less convenient property-map API provided by GT. The same holds for the *BaseEdge* class.

#### Installation

```
pip install --extra-index-url https://pypi.clarin-pl.eu basegraph
```

or

```
python3.6 setup.py install
```

#### Dependencies

- Python3.6
- GraphTool only

#### Basic Usage

1. Load the graph:

```python
from basegraph import BaseGraph

bg = BaseGraph()
bg.unpickle('data/graph_syn.xml.gz')
```

2. Iterate over all nodes or edges:

```python
for node in bg.all_nodes():
    pass

for edge in bg.all_edges():
    pass
```

3. a: General node properties

```python
# returns all edges associated with the node
node.all_edges()
# returns all nodes associated with the node
node.all_neighbours()
# the number of incoming links
node.in_degree()
# the number of outgoing links
node.out_degree()
# access the underlying GraphTool object
node.use_graph_tool()
```

3. b: Synset properties
- (safe only for synset graphs, be careful when using with mixed graphs)

```python
synset = node.synset
synset.synset_id
synset.lu_set
```

3. c: Lexical Unit properties

```python
# `lu` is a lexical unit object
lu.lu_id
lu.lemma
lu.pos
lu.variant
```

4. a: General edge properties

```python
# source node
edge.source()
# target node
edge.target()
```

4. b: WN-specific edge properties

```python
# WordNet-based name of the semantic link
edge.rel
# WordNet-based ID of the semantic link
edge.rel_id
```

5. Custom node and edge attributes
- The idea is to make the GraphTool property interface transparent to the user
- We use ```create_node_attribute``` and ```create_edge_attribute```
- Custom properties can be accessed as if they were plain Python attributes

```python
bg.create_node_attribute('depth', 'double')  # the type can also be int, string, or vector

# use it like an attribute:
node.depth = 3
node.depth = node.depth + 1

bg.create_edge_attribute('weight', 'double')
edge.weight = 0.8
edge.weight = edge.weight / 2
```

#### Advanced Usage

0. Accessing Raw GT Properties

The **basegraph** API provides a simple interface to retrieve properties of nodes and edges in the graph. With raw GT objects it's a bit more complicated:

```python
# get the underlying GT object
g = bg.use_graph_tool()

# now we have to get a specific property map from the graph (e.g. the synset property)
prop = g.vp['synset']

# now we can get the property value for a specific raw GT vertex;
# here we look up the BaseNode for synset ID 1319 and unwrap it,
# so "n" is the raw vertex object
n = bg.get_node_for_synset_id(1319).use_graph_tool()
synset = prop[n]
```

1. Find a node by synset ID

```python
bg.get_node_for_synset_id(synset_id)
```

2. Find all nodes for a given lemma

```python
# first we have to initialize the dictionary
bg._generate_lemma_to_nodes_dict()
# then just take the nodes by lemma
nodes = bg._lemma_to_nodes_dict[lemma]
```

3. Graph filters

With filters we can easily reduce the graph based on a given predicate. The source basegraph can be filtered in a `hard` way, which removes the nodes that do not meet our condition. The `soft` way only hides the filtered nodes, so we can easily restore them later (using `reset_nodes_filter`). Analogous functions exist for graph edges (e.g. `edges_filter_conditional`, `reset_edges_filter`).

Examples:

```python
from basegraph import BaseGraph

bg = BaseGraph()
bg.unpickle('data/graph_syn.xml.gz')

In [1]: sum(1 for n in bg.all_nodes())
Out[1]: 349189

In [2]: sum(1 for e in bg.all_edges())
Out[2]: 1552096

# Our condition:
condition = lambda node: node.in_degree() < 3

# Apply in a 'soft' way:
bg.nodes_filter_conditional(condition, soft=True)

In [3]: sum(1 for n in bg.all_nodes())
Out[3]: 171179

In [4]: sum(1 for e in bg.all_edges())
Out[4]: 26964

In [5]: bg.reset_nodes_filter()

In [6]: sum(1 for n in bg.all_nodes())
Out[6]: 349189

In [7]: sum(1 for e in bg.all_edges())
Out[7]: 1552096

# Apply in a 'hard' way (modifies the graph in place, `reset_nodes_filter`
# doesn't work here)
bg.nodes_filter_conditional(condition, soft=False)

In [8]: sum(1 for n in bg.all_nodes())
Out[8]: 171179

In [9]: sum(1 for e in bg.all_edges())
Out[9]: 26964
```

Let's do the same thing, but now for the edges:

```python
In [10]: condition = lambda edge: edge.rel_id == 11

In [11]: bg.edges_filter_conditional(condition, soft=True)

In [12]: sum(1 for e in bg.all_edges())
Out[12]: 208571

In [13]: bg.reset_edges_filter()

In [14]: sum(1 for e in bg.all_edges())
Out[14]: 1552096
```

#### GraphTool Algorithms

To apply predefined GraphTool algorithms we have to operate on the underlying GT objects. Let's compute the shortest distance between two specific nodes:

```python
from graph_tool.topology import shortest_distance

# don't forget to use only underlying GT objects when using raw GraphTool functions!
# s1 and s2 are the synset IDs of the two nodes
n1 = bg.get_node_for_synset_id(s1).use_graph_tool()
n2 = bg.get_node_for_synset_id(s2).use_graph_tool()
g = bg.use_graph_tool()

distance = shortest_distance(g, n1, n2)
```

Now we can try to get the shortest path:

```python
from graph_tool.topology import shortest_path
# assuming BaseNode and BaseEdge are exported at the top level, like BaseGraph
from basegraph import BaseNode, BaseEdge

n1 = bg.get_node_for_synset_id(s1).use_graph_tool()
n2 = bg.get_node_for_synset_id(s2).use_graph_tool()
g = bg.use_graph_tool()

# this returns raw GT objects, but we can still easily wrap them and use
# the basegraph API
vertices, links = shortest_path(g, n1, n2)

nodes = [BaseNode(g, v) for v in vertices]
edges = [BaseEdge(g, e) for e in links]

# it's easier with BaseNode
# (`weight` assumes a custom node attribute of that name was created earlier)
for node in nodes:
    print(node.synset.synset_id)
    print(node.weight)

# we can also use the raw objects, but it's not so convenient
synset_prop = g.vp['synset']
weight_prop = g.vp['weight']
for v in vertices:
    print(synset_prop[v].synset_id)
    print(weight_prop[v])
```

#### WN-based Examples

Let's take all of the hypernyms of a node. To do this we need to know the ID of the 'hypernymy' relation in our WN (11 in this example):

```python
def get_hypernyms(node):
    return {edge.target() for edge in node.all_edges()
            if edge.rel_id == 11 and edge.target() != node}

hypernyms = get_hypernyms(node)
```
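
The same idea can be extended to indirect hypernyms. The sketch below is only an illustration and not part of **basegraph**: it relies solely on `all_edges()`, `rel_id` and `target()` as described above, and it assumes that hypernymy edges point from a synset to its hypernym, that `all_edges()` returns both incoming and outgoing edges, and that the relation ID is 11 as in the example above. `FakeNode`, `FakeEdge` and `link` are hypothetical stand-ins so the code runs without a loaded graph.

```python
from collections import deque

HYPERNYMY_REL_ID = 11  # assumption: taken from the example above, check your own WN


def direct_hypernyms(node, rel_id=HYPERNYMY_REL_ID):
    """Targets of the node's edges with the given relation ID (the node itself excluded)."""
    return {edge.target() for edge in node.all_edges()
            if edge.rel_id == rel_id and edge.target() != node}


def all_hypernyms(node, rel_id=HYPERNYMY_REL_ID):
    """Breadth-first walk collecting direct and indirect hypernyms."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in direct_hypernyms(current, rel_id):
            if parent not in seen and parent != node:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical stand-ins that mimic only the parts of the API used above.
class FakeEdge:
    def __init__(self, source, target, rel_id):
        self._source, self._target, self.rel_id = source, target, rel_id

    def source(self):
        return self._source

    def target(self):
        return self._target


class FakeNode:
    def __init__(self, name):
        self.name = name
        self.edges = []

    def all_edges(self):
        return self.edges


def link(source, target, rel_id):
    """Register one edge on both of its endpoints."""
    edge = FakeEdge(source, target, rel_id)
    source.edges.append(edge)
    target.edges.append(edge)


dog, canine, animal = FakeNode('dog'), FakeNode('canine'), FakeNode('animal')
link(dog, canine, HYPERNYMY_REL_ID)
link(canine, animal, HYPERNYMY_REL_ID)
link(dog, animal, 99)  # some other relation, ignored by the walk

print(sorted(n.name for n in all_hypernyms(dog)))  # ['animal', 'canine']
```

With a real graph, the same functions should work on a *BaseNode* directly (e.g. `all_hypernyms(bg.get_node_for_synset_id(synset_id))`), provided the assumptions above hold for your wordnet.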