EnumerationAsk

Jun 17, 2018 19:14 · 1285 words · 7 minute read

EnumerationAsk performs exact inference over a Markov Logic Network. It is not commonly used in practice, where quicker approximations often suffice, but it is nonetheless an important algorithm. There is also a lot of potential for speeding it up, since the algorithm enumerates all possible worlds: each variable is assigned each possible value, and the probability of the resulting world is evaluated. After some discussion with my mentor, Daniel, we decided that this was the perfect algorithm to use for creating a reference implementation, paving the way for the rest of PracMLN to follow.
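To make the enumeration concrete, here is a minimal pure-Python sketch of exact inference by enumerating possible worlds. It is not PracMLN's implementation: the atom names, the `formulas` list of (weight, function) pairs, and the `enumeration_ask` function are hypothetical and chosen only for illustration. It assumes boolean ground atoms and the usual MLN convention of weighting a world by the exponential of the summed weights of its satisfied formulas.

```python
import itertools
import math


def enumeration_ask(atoms, formulas, query, evidence=None):
    """Return P(query is true | evidence) by enumerating every possible world.

    atoms:    list of ground atom names (hypothetical, boolean-valued)
    formulas: list of (weight, fn) pairs, where fn(world) -> bool
    query:    fn(world) -> bool
    evidence: dict fixing the truth value of some atoms
    """
    evidence = evidence or {}
    free = [a for a in atoms if a not in evidence]
    numerator = 0.0
    denominator = 0.0
    # Each free atom takes each possible value: 2 ** len(free) worlds in total.
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        # Unnormalised weight: exp(sum of the weights of satisfied formulas).
        expsum = math.exp(sum(w for w, fn in formulas if fn(world)))
        if query(world):
            numerator += expsum
        denominator += expsum
    return numerator / denominator


# Toy model: "Smokes(A) implies Cancer(A)" with weight 1.5.
atoms = ["Smokes(A)", "Cancer(A)"]
formulas = [(1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"])]
p = enumeration_ask(atoms, formulas,
                    query=lambda w: w["Cancer(A)"],
                    evidence={"Smokes(A)": True})
```

The loop body runs once per world, so the number of iterations doubles with every additional free atom. That is why shaving time off each iteration is attractive here.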

For a little over a week now, I have been trying to convert the Enumeration-Ask algorithm from exact.py to a Cython extension type.

Extension Types

Extension types are classes that have been compiled via Cython (so cdef classes, essentially). They allow typing information to be added to attributes, and speed up attribute access times (by replacing a Python dictionary lookup with a C struct access). The official documentation for extension types is long and detailed, but, as I found out, not quite exhaustive.
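As a rough pure-Python analogy (this is not Cython itself), the difference in attribute storage can be seen by comparing a regular class, whose instances keep attributes in a per-instance dictionary, with a `__slots__` class, whose instances have a fixed layout and no `__dict__`. By default, instances of a cdef class likewise have no instance `__dict__`. The class names below are hypothetical.

```python
class PlainWorld:
    """Regular Python class: attributes live in a per-instance dict."""

    def __init__(self, weight):
        self.weight = weight


class SlottedWorld:
    """Fixed attribute layout, loosely analogous to a cdef class's C struct."""

    __slots__ = ("weight",)

    def __init__(self, weight):
        self.weight = weight


plain = PlainWorld(1.5)
slotted = SlottedWorld(1.5)

plain.extra = "ok"  # a dict-backed instance accepts new attributes
try:
    slotted.extra = "fails"  # a fixed layout does not
    slotted_accepts_new = True
except AttributeError:
    slotted_accepts_new = False
```

The practical consequence for a conversion is that every attribute an extension type uses has to be declared up front. Code that attaches attributes to instances on the fly stops working unless it is adapted.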

A cursory understanding of extension types might lead to the misconception that this can be achieved by simply adding the word cdef before the EnumerationAsk class definition in exact.pyx. On its own, that change is not enough, and compilation fails.

Approach

However, just by applying this single change and observing the compiler output, a lot of insight can be gained.

... $ pracmln/python3/pracmln/mln/inference $ python3 setup.py build_ext --inplace
Compiling exact.pyx because it changed.
[1/1] Cythonizing exact.pyx

Error compiling Cython file:
------------------------------------------------------------
...
            numerators[i] += expsum
    denominator += expsum
    return numerators, denominator


cdef class EnumerationAsk(Inference):
                         ^
------------------------------------------------------------

exact.pyx:77:26: First base of 'EnumerationAsk' is not an extension type

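This is not surprising: the base class of a cdef class must itself be an extension type (or a built-in type), and Inference is still an ordinary Python class at this point. The documentation is explicit about the restrictions on inheritance: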

An extension type can only have one base class (no multiple inheritance).

EnumerationAsk inherits from the Inference class, in infer.py. At this point, it helps to revisit the UML diagrams from imgur, to see what PracMLN dependencies look like.

More importantly, however, this technique formed the basis of my workflow for most of last week. Just reading the documentation didn’t make me an expert on Cython, and I didn’t have the expertise or foresight to tell beforehand what work would be needed to make EnumerationAsk into an extension type. So, instead of working up from the bottom, I employed this top-down approach:

  1. Define as a cdef class whichever module is to be converted to an extension type.
  2. Carefully observe the compiler's complaints.
  3. Fix / work around errors until compilation is successful.
  4. Run basic tests.
  5. Fix / workaround errors till Cython can replicate pure Python functionality.
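Step 2 can be partly mechanised. The sketch below pulls the `file:line:column: message` lines out of captured compiler output, using the format visible in the error shown earlier. The function name and the regular expression are my own, and it assumes errors are reported against `.pyx` files only.

```python
import re

# Matches lines such as:
#   exact.pyx:77:26: First base of 'EnumerationAsk' is not an extension type
ERROR_LINE = re.compile(
    r"^(?P<file>[^\s:]+\.pyx):(?P<line>\d+):(?P<col>\d+): (?P<msg>.+)$"
)


def extract_errors(compiler_output):
    """Return (file, line, column, message) tuples found in the output."""
    errors = []
    for raw in compiler_output.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append((match.group("file"), int(match.group("line")),
                           int(match.group("col")), match.group("msg")))
    return errors


sample = """\
Error compiling Cython file:
...
cdef class EnumerationAsk(Inference):
                         ^
exact.pyx:77:26: First base of 'EnumerationAsk' is not an extension type
"""
errors = extract_errors(sample)
```

Feeding it the captured output of the build command gives a short list of locations to visit on each pass through the loop.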

Having devised this approach after various iterations (and a few days of trial and error), I proceeded further with my plan.

EnumerationAsk Dependencies

I kept proceeding further down this rabbit hole, hopeful of reaching the bottom eventually. Daniel provided valuable insight along the way, suggesting that most of the computation-heavy work was being done in manipulating the _evidence variable of the MRF class in mrf.py, and in the logic module.

The final (dependency) tree of edited files looked somewhat like this:

[Figure: Cython conversions required for EnumerationAsk]

Both base.pyx and mrf.pyx depended on various parts of the logic module. Converting the logic module involved several complications and has been skipped for the time being.

Errors Encountered

While employing this technique, I came across a variety of compiler complaints, with varying degrees of support available online. A few errors were well covered, but there were some that the Cython community had not fully addressed. I am listing these here:

ValueError

ValueError: ... has the wrong size, try recompiling. Expected 32, got 16
  • Reproducible: No (!)
  • Help Available Online: No
  • Cause Understood: No
  • Solved: Yes

This was a runtime error encountered early on in the conversion of EnumerationAsk to an extension type. While the only remotely related issue online was wildly unhelpful, this error turned out to be the result of a silly setup mistake. The fix is simple: create a symbolic link from the location where Python expects the file to be, under the usual PracMLN imports, to the built Cython module. For example, if this error is being generated for mrf.pyx, then simply create a link like this:

... pracmln/python3/pracmln/mln $ ls -l | grep mrf.cpython
lrwxrwxrwx 1 kaivalya kaivalya      47 Jun  18 22:24 mrf.cpython-35m-x86_64-linux-gnu.so -> pracmln/mln/mrf.cpython-35m-x86_64-linux-gnu.so
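A link like the one listed above can also be created from Python. This is a minimal sketch, assuming a POSIX system where unprivileged symlinks are allowed. The helper name and the throwaway directory layout are made up for illustration and do not reflect the actual PracMLN tree.

```python
import os
import tempfile
from pathlib import Path


def link_built_extension(built_file, expected_dir):
    """Create a symlink in expected_dir that points at the built extension."""
    built_file = Path(built_file)
    link = Path(expected_dir) / built_file.name
    if link.is_symlink():
        link.unlink()  # replace a stale link left over from an earlier build
    os.symlink(built_file.resolve(), link)
    return link


# Demonstration in a temporary directory with a stand-in for the .so file.
with tempfile.TemporaryDirectory() as root:
    build_dir = Path(root) / "build"
    package_dir = Path(root) / "package"
    build_dir.mkdir()
    package_dir.mkdir()
    built = build_dir / "mrf.cpython-35m-x86_64-linux-gnu.so"
    built.write_bytes(b"")  # empty placeholder, not a real compiled module
    link = link_built_extension(built, package_dir)
    link_is_symlink = link.is_symlink()
    link_target_matches = link.resolve() == built.resolve()
```

The helper deliberately refuses to overwrite a regular file of the same name (`os.symlink` raises `FileExistsError`), so an outdated copy of the module is not silently replaced.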

CompilerCrash

[1/1] Cythonizing base.pyx

Error compiling Cython file:
------------------------------------------------------------
...
            formula = self.logic.parse_formula(formula)
        elif type(formula) is int:
            return self._formulas[formula]
        constants = {}
        formula.vardoms(None, constants)
        for domain, constants in constants.items():
                                               ^
------------------------------------------------------------

base.pyx:251:48: Compiler crash in AnalyseExpressionsTransform

.
.
.

Compiler crash traceback from this point on:
  File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Compiler/ExprNodes.py", line 5226, in infer_type
    arg_types = [arg.infer_type(env) for arg in self.args]
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    ext_modules=cythonize( ['*.pyx'] )
  File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Build/Dependencies.py", line 1026, in cythonize
    cythonize_one(*args)
  File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Build/Dependencies.py", line 1146, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: base.pyx
  • Reproducible: Yes
  • Help Available Online: No
  • Cause Understood: No
  • Solved: Yes

This compile-time error occurred twice in base.pyx. It concerns the constants variable on line 249 and line 773 (since renamed here). If both variables are explicitly typed as dictionaries, the error disappears, although I do not understand why.
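Why the crash occurs is not established here. One thing that stands out in the offending line is that the loop target `constants` rebinds the very name being iterated over, so the same variable holds first a dictionary and then one of its values. The sketch below is plain Python with made-up data. It shows the pattern next to an equivalent version with distinct names. That this reuse is what trips Cython's type inference is an assumption, not a confirmed diagnosis.

```python
def collect_shadowed(vardoms):
    """Mirrors the pattern in the failing line: the loop rebinds `constants`."""
    constants = dict(vardoms)
    pairs = []
    for domain, constants in constants.items():  # name reused for each value
        pairs.extend((domain, c) for c in constants)
    return pairs


def collect_renamed(vardoms):
    """Same behaviour, but the dictionary and its values keep separate names."""
    constants_by_domain = dict(vardoms)
    pairs = []
    for domain, domain_constants in constants_by_domain.items():
        pairs.extend((domain, c) for c in domain_constants)
    return pairs


# Hypothetical domains and constants, for illustration only.
example = {"person": ["Anna", "Bob"], "city": ["Bremen"]}
same_result = collect_shadowed(example) == collect_renamed(example)
```

Both functions behave identically in pure Python. The renamed form simply leaves each variable with a single, unambiguous type, which is easier to annotate when moving to Cython.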

ImportError

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    from pracmln import query
  File "/home/kaivalya/ ... /pracmln/python3/pracmln/__init__.py", line 22, in <module>
    from .mln.base import MLN
  File "/home/kaivalya/ ... /pracmln/python3/pracmln/mln/__init__.py", line 27, in <module>
    from .base import MLN
  File "common.pxd", line 1, in init pracmln.mln.base
  File "/home/kaivalya/ ... /pracmln/python3/pracmln/logic/__init__.py", line 1, in <module>
    from .fol import FirstOrderLogic
  File "/home/kaivalya/ ... /pracmln/python3/pracmln/logic/fol.py", line 26, in <module>
    from .common import Logic
ImportError: No module named 'pracmln.logic.common'
  • Reproducible: Yes
  • Help Available Online: Yes
  • Cause Understood: Yes
  • Solved: No

There were many runtime import errors, mostly related to the logic module. They pose a significant challenge and I have yet to resolve them. I shall perhaps deal with them in a separate blog post.
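When chasing errors of this kind, one check worth making is whether the interpreter can locate the module at all, and which file suffixes it accepts for compiled extensions. The sketch below uses only the standard library. The helper name is hypothetical, and the `json` modules merely stand in for the PracMLN ones so that the example runs anywhere.

```python
import importlib.machinery
import importlib.util


def diagnose_module(name):
    """Report whether `name` can be located, and from which file."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        spec = None  # a parent package could not be imported
    if spec is None:
        return {"name": name, "found": False, "origin": None}
    return {"name": name, "found": True, "origin": spec.origin}


# File suffixes the running interpreter accepts for compiled extension modules.
accepted_suffixes = importlib.machinery.EXTENSION_SUFFIXES

present = diagnose_module("json.decoder")
missing = diagnose_module("json.no_such_submodule")
```

Because `find_spec` imports parent packages first, a failure inside a parent's `__init__` is also reported as not found. Comparing `accepted_suffixes` with the names of the built files in the package directory shows whether a compiled module is simply sitting in the wrong place or under the wrong name.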

Summary

The workflow described above was arrived at after significant experimentation. I would love to hear feedback about it, and make improvements if possible. More clarity on the errors mentioned here would also be beneficial. I was able to apply this method to convert the EnumerationAsk, Inference, MRF, and MLN classes to extension types. However, none of these could be tested (a serious drawback of taking this “top-down” approach), because the code at this point doesn’t run, on account of errors related to the logic module.