 Article AI Blog CPG General Healthcare High Tech Insurance Life Sciences February 14, 2018 9 minute read

# Poincarè Embeddings for Representing Hierarchical Data

Neural Networks are able to understand and work with numerical data. However, there is no easy way to convert the data into numbers when it comes to text and graphical data.

One of the ways that we got around this problem was by creating embeddings for words and graphs. Embeddings of a word or a node in a graph are vectors of numbers that contain all the information that is necessary for a neural network to understand it. This means is that the embeddings of two contextually similar words will be similar and thus have a very large product. Embeddings were invented very recently and they are now the go-to way to create efficient representations of words and graphs.

Similarly, nodes that are close together in a graph will have a lesser distance in a two-dimensional embedding space.

All these embeddings are learned and represented in a Euclidean space; therefore, it is safe to argue that all types of embeddings are limited by the mathematics and dimensionality of Euclidean spaces.

The authors of the paper “Poincaré Embeddings for Learning Hierarchical Representations argue that learning embeddings in a hyperbolic plane will improve the quality of the learned representations while also reducing the number of dimensions needed to create an efficient representation.

In this article, I’ll talk about problems in the Euclidean plane representations and explain how Hyperbolic Poincaré Embeddings work. I’ll then show you how to train a small text corpus on Poincaré Embeddings and we’ll analyze the results.

Euclidean Embeddings and Hyperbolic Spaces

All forms of embeddings or representations used currently are in Euclidean space. Euclidean space is something that we deal with the most. This means that all lines are straight and can be extended indefinitely and distances between two points remain the same no matter how far you go from the origin. Think Cartesian Coordinates.

The definition for Euclidean Geometry was given by Euclid way back in AD 125. The definition had 5 postulates. While the first four postulates were easy to understand and prove, the fifth postulate caused a lot of problems. After nearly 2 thousand years, the fifth postulate is still unproven.

Hyperbolic geometry was invented in a rage after geometers could not prove the 5th postulate that Euclid gave. In a last-ditch effort, they invented a new kind of geometry in which the fifth postulate would always be true.

Hyperbolic spaces are a bit more complicated and less intuitive to understand.

Figure 1                                                 Figure 2  The image in figure 1 shows a hyperbolic surface that has been projected onto a Euclidean surface. The reference point is taken as the center of the circle and the outer edge is at an infinite distance away. Distances on the circle increase by a factor of e (Euler’s Constant) as we go further from the center.

The distance between points A and B on two circles are further apart, the further they are from the center. In fact, the shortest line to join the two would be a curved line as drawn in figure 2.

Because of this, the distances from a point on different parts of the Poincarè plane increases as we move away from the center.

Figure 3 illustrates that very well.

Figure 3   In the first image, we can see that the distance of all points from the center increases uniformly. However, as we move away from the center, the distances increase dramatically. In the last image, the distances from the points even a small distance away are nearly infinite.

To make things simpler, think of the moon. The moon is a sphere (we only see a half-sphere) that our eyes see as a 2D surface. So, the distance between two points is actually greater towards the edges of the moon than in the center.

Problems with Euclidean Embeddings

The biggest problem with embeddings in a Euclidean space is that the embeddings are limited to the dimensionality of Euclidean spaces. Moreover, when it comes to embedding graph data, we have to model the complex connections of nodes. The authors argue that the ability of a model to learn and generalize becomes better as we increase the representation capacity of the network. Put simply, it means that the model is better able to capture important features if we increase the dimensionality of the embeddings.

However, increasing the dimensions becomes very expensive computationally. We cannot reduce the dimensions as it will lead to a loss in important features.

On the other hand, a hyperbolic plane has a constant negative curvature. This makes it perfect to represent graphical and hierarchical data. In fact, it has been shown that an infinite tree can be represented in a finite hyperbolic space (remember that the edge of the circle is at an infinite distance away).

Moreover, since distances in the Poincarè ball increase exponentially, then it becomes easy to model trees that have an exponentially increasing number of children. This can be done in only two dimensions.

As we will see in the next section, it is these properties of hyperbolic space that the author has exploited to come up with better representations of graph or hierarchical data.

Hyperbolic Embeddings

To figure out how the embeddings work, we’ll take a look at a hierarchical dataset: the mammalian family dataset.

We want our embeddings to do the following:

1. Data with similar semantic meaning should be close together in the embedding space.
2. Nodes that are connected should be placed close together and nodes that are not connected should be placed far apart in the embedding space.
3. Nodes that are lower in the hierarchy to the reference node should be far away from the origin.
4. All this should be learned in an unsupervised fashion.

Other than this, it would be an added bonus if our embeddings could:

1. Learn better embeddings with decreased runtime and memory complexity.
2. Gain more insight as to the structure of the data and the relationships between the different entities in the data.

As mentioned before, the authors have done the training in a Poincarè-Ball model. They have chosen this because it is more suitable for gradient-based optimization (differentiation is easy and because the results can be easily visualized and understood).

For a unit d-dimensional space in the hyperbolic space H, the Poincarè unit ball is given by: Here, x is the Euclidean norm. The distance between two different points u and v, in the Poincarè plane is given as: To perform an update, we have to differentiate the above equation. The partial derivative of the above equation is given by: The update for each embedding is then made as follows: For training, the embeddings are initialized randomly, but all are initialized close to the origin. This works better than random initialization as the model is supposed to learn by penalizing a high value for unconnected nodes.

The model is also, therefore, trained with a progressively increasing learning rate.

BUT WHAT ABOUT THE ACCURACY?

To evaluate the model, two metrics are used:

1. Rank, which is the mean of the distances between the points u and v, and the ground truth realities; and
2. Mean Average Precision (MAP) of the ranking.

The results are given in the following table: The model in the paper was trained on the WordNet dataset. The highlighted cells show the best results for Euclidean Embedding and the Poincarè Embedding which achieves an equivalent result. The bold cells show the best results.

It is evident that the Poincarè model learns an equivalent result in a very small 5 dimensional space, thus reducing computational complexity. In case of link prediction, the model learns an optimal representation in less than 50 dimensions when it takes close to 50 dimensions in case of Euclidean space.

IMPLEMENTATION

We are going to use genism to implement a Poincarè embedding on a dataset of mammalian animals. The dataset can be found here. It consists of a single unique relation among two different entities on each line of the dataset.

To train a Poincarè Embedding on this dataset, we can use the following lines of code:

```from gensim.models.poincare import PoincareModel, PoincareRelations from gensim.test.utils import datapath file_path = datapath('poincare_hypernyms.tsv') model = PoincareModel(PoincareRelations(file_path), negative=2) model.train(epochs=50) ```

Here, datapath returns the path of the dataset.

Poincarè Model defines the model that we are training. It takes as parameters the path to the dataset; size, which defines the number of dimensions of the trained model and alpha which specifies the learning rate. Negative specifies the number of negative samples to use.

Epochs specify the number of epochs that we want to train our model for.

After the model is trained, we can test it by running a small script to find the most similar nodes to a certain node:

`vectors.most_similar('lion.n.01')`

It should give a result like:

`[('lion_cub.n.01', 0.4484), ('lionet.n.01', 0.6552), ...]`

You can also find the similarity between different nodes as follows:

`model.similarity('mammal.n.01', 'carnivore.n.01')`

It should return a number like 0.73.

To visualize the results on a 2D plane, we can use this:

`from gensim.viz.poincare import ``poincare_2d_visualization, poincare_distance_heatmap`

```show_node_labels = [ 'mammal.n.01', 'placental.n.01', 'ungulate.n.01', 'carnivore.n.01', 'rodent.n.01', 'canine.n.02', 'even-toed_ungulate.n.01', 'odd-toed_ungulate.n.01', 'elephant.n.01', 'rhinoceros.n.01', 'german_shepherd.n.01', 'feline.n.01', 'tiger.n.02', 'homo_sapiens.n.01'] poincare_2d_visualization(model, tree, figure_title, show_node_labels=show_node_labels)```

poincare_2d_visualization() returns a plotly figure that contains the plot for the embeddings.

Based on how your model is trained, it should return a similar to the graph in figure 4:

######  CONCLUSION

Poincarè embeddings are still in their nascent phase. The authors of the paper have mentioned that they have tested only one method of embeddings. However, there can be other methods too, which can give similar, or even better, results.