MLN Inference

Jun 8, 2018 00:52 · 858 words · 5 minute read

The inference module of PracMLN allows the user to, well, perform inference using PracMLN. As per my GSoC proposal, this was the first portion of PracMLN I intended to speed up. Until a few weeks ago, however, my usage of PracMLN was limited to the GUI mlnlearn and mlnquery tools. To speed up the inference module via Cython, I first had to be able to use it. Thankfully, the documentation contains a handy reference. The test files also provided helpful insight into using PracMLN from Python.

Armed with this knowledge, I set about editing my first file, exact.py, adding typing information to variables. Along with reading the Cython documentation, which could be terse at times, I found the following video very helpful to get started.

Code Testing

As I proceeded, I realised there was no way for me to figure out whether the changes I made were breaking the code, or whether they were indeed resulting in faster runtimes. A discussion with Daniel Nyga about this led me to realise that, in heading off straight to edit PracMLN code files, I was skipping the first and most important step of the process: code testing.
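A minimal sketch of the kind of check that was missing: confirm that an edited function still returns what the original did, then time both. Everything here is hypothetical and only uses the standard library; `compare`, `slow_sum` and `fast_sum` are stand-ins for "code before the edit" and "code after the edit", not PracMLN functions.

```python
import timeit


def compare(baseline, candidate, args, repeat=5, number=10):
    """Check that candidate matches baseline on args, then time both.

    Returns (baseline_seconds, candidate_seconds), each the best of
    `repeat` measurements of `number` calls.
    """
    expected = baseline(*args)
    actual = candidate(*args)
    if actual != expected:
        raise AssertionError(f"output changed: {actual!r} != {expected!r}")

    t_base = min(timeit.repeat(lambda: baseline(*args), repeat=repeat, number=number))
    t_cand = min(timeit.repeat(lambda: candidate(*args), repeat=repeat, number=number))
    return t_base, t_cand


# Hypothetical stand-in for the unmodified code.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


# Hypothetical stand-in for the edited code.
def fast_sum(n):
    return n * (n - 1) // 2


if __name__ == "__main__":
    t_base, t_cand = compare(slow_sum, fast_sum, (10_000,))
    print(f"baseline {t_base:.4f}s, candidate {t_cand:.4f}s")
```

The exact equality check suits deterministic code. For floating-point or sampling-based results, a tolerance (for example `math.isclose`) or a fixed random seed would be needed instead.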

A quick test script later, I realised that no matter what edits I made (including added print statements), there was no effect on my output. Three frustration-filled days and nights later, I realised I hadn’t set up my work environment correctly.
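One quick way to catch this kind of problem is to ask Python which file it would actually load for a package. If the path points into site-packages rather than the working copy, edits to the working copy will never show up. The sketch below uses the standard library's `importlib.util.find_spec`; `module_origin` is a hypothetical helper name, not part of PracMLN.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    None means the module cannot be found on the current import path
    (namespace packages also report no single origin file).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # "pracmln" is the package from this post; any importable name works.
    print(module_origin("pracmln"))
```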

Note to Self: A good setup takes some time, but eventually saves even more time!

PracMLN Setup

Here is how PracMLN needs to be set up for development on a Linux computer.

... $ git clone https://github.com/danielnyga/pracmln
... $ cd pracmln
... $ ln -s "$PWD/_version" python3/pracmln/_version
... $ PYTHONPATH="${PYTHONPATH}:/ ... /pracmln/python3/"
... $ export PYTHONPATH

As you can see, after cloning the repo, a link needs to be inserted to the _version directory from inside the python3/pracmln folder. It is also essential to modify the PYTHONPATH. I spent a lot of time trying alternatives (different setup.py installs, various pip versions, etc.) to avoid manually tweaking the PYTHONPATH, but to no avail. If you can figure out a different way to do this, please let me know!
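For a single test script, one alternative to exporting PYTHONPATH is to extend `sys.path` at the top of the script, before the package is imported. This is only a sketch: `prepend_to_sys_path` is a hypothetical helper, and the directory in the usage line is a placeholder for wherever the clone's python3 folder lives.

```python
import os
import sys


def prepend_to_sys_path(directory):
    """Put directory at the front of sys.path (once) and return its absolute path."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        # Position 0 takes priority over site-packages, so a working
        # copy shadows any pip-installed version of the same package.
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # Placeholder path: adjust to the actual location of the clone.
    print(prepend_to_sys_path("pracmln/python3"))
```

This only affects the running interpreter, so it does not help the GUI tools or other shells, which still need the PYTHONPATH change.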

Since my work involved comparing the runtimes of Python and Cython, I supplemented this with a setup where I had two virtual environments:

  1. A virtualenv with pip version downgraded to 9.0.3, and pracmln installed via pip, for use as a baseline when measuring runtimes. When using this, I would ensure that pracmln was explicitly removed from the PYTHONPATH.
  2. A virtualenv with all the pracmln requirements installed, but without pracmln itself. When using this environment, I could see the effect my edits were causing.

Most text editors don’t really support Cython syntax highlighting. I therefore recommend installing Atom, and then getting the handy language-cython package.

I was now ready to start testing my code using the following script (modified for MC-SAT).

The speedup obtained thus was … negligible.

Deciphering PracMLN

This was initially rather disappointing, but in hindsight, should have been expected. Merely typing a few local variables doesn’t really modify the behaviour of the computationally intensive portions of any codebase. At the time, however, I was very uncertain about this, and about how to proceed. I would use the Cython annotation tool extensively (... $ cython -a ...), generate HTML files displaying a lot of yellow lines, hinting at possible optimisations, and then not know what to do about them. I decided to plough through the rest of the inference-related code anyway, line by line, arbitrarily typing a few variables here and there. I marked edits and questions with a comment of the form # Q(gsoc): ..., to serve as a flag for review.
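The annotation tool shows where Cython falls back to Python, but not where the time goes. A complementary technique, not the one used in this post, is to profile a run with the standard library's `cProfile` and look at cumulative time per function. In the sketch below, `profile_call` is a hypothetical helper and `workload` is a toy stand-in for an inference run.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Toy workload standing in for an inference run (hypothetical).
def inner(n):
    return sum(i * i for i in range(n))


def workload():
    return [inner(2_000) for _ in range(50)]


if __name__ == "__main__":
    _, report = profile_call(workload)
    print(report)
```

Functions near the top of such a report are the ones where typing or restructuring is most likely to matter.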

It did not help. At all.

This is plainly visible in the runtimes that can be inspected here. After discussing this with my mentor, Daniel Nyga, it became apparent that there was a lot more work to be done. Extensive use of Cython extension types needed to be made, to port the actual rate-determining parts of PracMLN to statically compilable code. I had actually thought about this before, but abandoned the approach, because I was unable to understand the class hierarchy of PracMLN clearly enough. My lack of expertise in Markov Logic Networks didn’t help either. In fact, the UML diagrams I had generated to try to understand this were incredibly daunting.

UML Class Diagram for PracMLN (Python3)

UML Package Diagram for PracMLN (Python3)

A fully human-readable diagram is available on Imgur:

A New Direction

Clearly, my GSoC project has many unanticipated challenges. But, as Daniel Nyga pointed out, every project does! It was naive of me to believe that my current knowledge would be sufficient to bring immense speedups to PracMLN within two weeks. While I don’t yet have any significant speedups to show for my work, I have learnt a lot of lessons! I have tried documenting some of them here. I will now try to surmount my fear of complex UML diagrams, revisit exact.pyx, and try to convert it entirely into a Cython extension type. I hope to have more positive content for my next blog post!