EnumerationAsk
Jun 17, 2018 19:14 · 1285 words · 7 minute read
EnumerationAsk performs exact inference over a Markov Logic Network. It is not commonly used in practice, where quicker approximations often suffice, but it is nonetheless an important algorithm. There is also a lot of potential for speeding it up, since the algorithm enumerates all possible worlds: each variable is assigned each possible value, and the probability of the resulting world is evaluated. After some discussion with my mentor, Daniel, we decided that this was the perfect algorithm to use for a reference implementation, paving the way for the rest of PracMLN to follow.
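To make the idea of enumerating possible worlds concrete, here is a minimal pure-Python sketch. It is not PracMLN's implementation: the function signature, the boolean-only atoms, and the toy "rain implies wet" model are all assumptions made for illustration. Only the names numerator, denominator and expsum echo the style of exact.pyx.

```python
import itertools
import math


def enumeration_ask(variables, weighted_formulas, query, evidence):
    """Return P(query = True | evidence) by enumerating every possible world.

    variables         -- list of ground atom names (all boolean in this sketch)
    weighted_formulas -- list of (weight, function(world) -> bool) pairs
    query             -- name of the atom being queried
    evidence          -- dict mapping atom name -> fixed truth value
    """
    numerator = 0.0
    denominator = 0.0
    # Every assignment of True/False to every variable is one possible world.
    for values in itertools.product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        # Skip worlds that contradict the evidence.
        if any(world[atom] != value for atom, value in evidence.items()):
            continue
        # Unnormalised weight of the world: exp(sum of weights of satisfied formulas).
        expsum = math.exp(sum(w for w, f in weighted_formulas if f(world)))
        if world[query]:
            numerator += expsum
        denominator += expsum
    return numerator / denominator


# Hypothetical toy model: a single formula "rain => wet" with weight 1.5.
variables = ["rain", "wet"]
formulas = [(1.5, lambda w: (not w["rain"]) or w["wet"])]
p_wet = enumeration_ask(variables, formulas, "wet", {"rain": True})
print(p_wet)
```

The loop visits 2^n worlds for n boolean atoms, which is why exact inference gets expensive so quickly and why the inner loop is a natural target for Cython.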
For a little over a week now, I have been trying to convert the Enumeration-Ask algorithm from exact.py to a Cython extension type.
Extension Types
Extension types are classes that have been compiled via Cython (so cdef classes, essentially). They allow typing information to be added to attributes, and speed up attribute access times (by replacing a Python dictionary lookup with a C struct access). The official documentation for extension types is long and detailed, but, as I found out, not quite exhaustive.
A cursory understanding of extension types might lead to the misconception that this can be achieved by simply adding the word cdef before the EnumerationAsk class definition in exact.pyx. On its own, this change is incomplete, and compilation fails.
Approach
However, just by applying this single change and observing the compiler output, a lot of insight can be gained.
... pracmln/python3/pracmln/mln/inference $ python3 setup.py build_ext --inplace
Compiling exact.pyx because it changed.
[1/1] Cythonizing exact.pyx
Error compiling Cython file:
------------------------------------------------------------
...
numerators[i] += expsum
denominator += expsum
return numerators, denominator
cdef class EnumerationAsk(Inference):
^
------------------------------------------------------------
exact.pyx:77:26: First base of 'EnumerationAsk' is not an extension type
This is not surprising, and is mentioned explicitly in the documentation.
EnumerationAsk inherits from the Inference class, in infer.py. At this point, it would be helpful to consult again the UML diagrams from imgur, to see what PracMLN dependencies look like.
More importantly, however, this technique formed the basis of my workflow for most of last week. Just reading the documentation didn’t make me an expert on Cython, and I didn’t have the expertise or foresight to tell beforehand what work would be needed to successfully make EnumerationAsk into an extension type. So, instead of working up from the bottom, I employed this top-down approach:
- Define whichever class is to be converted to an extension type as a cdef class.
- Observe carefully the compiler's complaints.
- Fix / workaround errors until compilation is successful.
- Run basic tests.
- Fix / workaround errors till Cython can replicate pure Python functionality.
Having devised this approach after various iterations (and a few days of trial and error), I proceeded further with my plan.
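The first compiler complaint above follows from a Cython rule: a cdef class may only derive from a built-in type or another extension type, so every pure-Python ancestor has to be converted first. A small sketch of how that chain could be listed before starting is shown here. The helper python_bases_to_convert is hypothetical, and the two classes are empty stand-ins rather than the real PracMLN classes. It also cannot tell an already-converted extension type from a plain Python class, so treat its output as a to-do list to check by hand.

```python
import builtins


def python_bases_to_convert(cls):
    """List the non-built-in ancestors of cls, most basic ancestor first."""
    builtin_types = {obj for obj in vars(builtins).values() if isinstance(obj, type)}
    # cls.__mro__ runs from cls itself up to object; skip cls and any built-ins.
    ancestors = [base for base in cls.__mro__[1:] if base not in builtin_types]
    # Reverse so that the class to convert first comes first.
    return list(reversed(ancestors))


# Hypothetical stand-ins for the classes in infer.py and exact.py.
class Inference:
    pass


class EnumerationAsk(Inference):
    pass


to_convert = python_bases_to_convert(EnumerationAsk)
print([c.__name__ for c in to_convert])
```

This only covers inheritance. Attributes and function arguments that refer to other PracMLN classes create further dependencies, which is where the compiler's complaints remain the better guide.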
EnumerationAsk Dependencies
I kept proceeding further down this rabbit hole, hopeful of reaching the bottom eventually. Daniel provided valuable insight along the way, pointing out that most of the computation-heavy work was being done in manipulating the _evidence variable of the MRF class in mrf.py, and in the logic module.
The final (dependency) tree of edited files looked somewhat like this:
Both base.pyx and mrf.pyx depended on various parts of the logic module. Converting the logic module involved several complications and has been skipped for the time being.
Errors Encountered
While employing this technique, I came across a variety of compiler complaints, with varying degrees of support available online. There was significant help available for some errors, but a few had not been fully addressed by the Cython community. I am listing these here:
ValueError
ValueError: ... has the wrong size, try recompiling. Expected 32, got 16
- Reproducible: No (!)
- Help Available Online: No
- Cause Understood: No
- Solved: Yes
This was a runtime error encountered early on in the conversion of EnumerationAsk to an extension type. While the only remotely related issue online was wildly unhelpful, this error turned out to be the result of a silly setup mistake. The fix is simple: a symbolic link needs to be created, linking the built Cython code to the location where Python expects the file to be when PracMLN is imported in the usual way. For example, if this error is being generated for mrf.pyx, then simply create a link like this:
... pracmln/python3/pracmln/mln $ ls -l | grep mrf.cpython
lrwxrwxrwx 1 kaivalya kaivalya 47 Jun 18 22:24 mrf.cpython-35m-x86_64-linux-gnu.so -> pracmln/mln/mrf.cpython-35m-x86_64-linux-gnu.so
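The listing above shows the resulting link rather than the command that created it. As a sketch of one way to create such a link from Python, assuming a POSIX system: the helper name link_built_extension and the build/package directory layout in the demo are made up for illustration, and the link here is absolute, whereas the one in the listing is relative.

```python
import tempfile
from pathlib import Path


def link_built_extension(built_file, expected_dir):
    """Create a symlink in expected_dir that points at built_file; return the link."""
    built_file = Path(built_file)
    link = Path(expected_dir) / built_file.name
    if link.is_symlink():
        link.unlink()  # replace a stale link, but never delete a regular file
    link.symlink_to(built_file.resolve())
    return link


# Demo in a throwaway directory; this layout is hypothetical, not PracMLN's.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_dir = root / "build"
    package_dir = root / "package"
    build_dir.mkdir()
    package_dir.mkdir()
    built = build_dir / "mrf.cpython-35m-x86_64-linux-gnu.so"
    built.write_bytes(b"")  # empty placeholder standing in for the compiled module
    link = link_built_extension(built, package_dir)
    link_ok = link.is_symlink() and link.resolve() == built.resolve()
    print(link.name, "->", link.resolve())
```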
CompilerCrash
[1/1] Cythonizing base.pyx
Error compiling Cython file:
------------------------------------------------------------
...
formula = self.logic.parse_formula(formula)
elif type(formula) is int:
return self._formulas[formula]
constants = {}
formula.vardoms(None, constants)
for domain, constants in constants.items():
^
------------------------------------------------------------
base.pyx:251:48: Compiler crash in AnalyseExpressionsTransform
.
.
.
Compiler crash traceback from this point on:
File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Compiler/ExprNodes.py", line 5226, in infer_type
arg_types = [arg.infer_type(env) for arg in self.args]
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
File "setup.py", line 13, in <module>
ext_modules=cythonize( ['*.pyx'] )
File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Build/Dependencies.py", line 1026, in cythonize
cythonize_one(*args)
File "/home/kaivalya/ ... /python3.5/site-packages/Cython/Build/Dependencies.py", line 1146, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: base.pyx
- Reproducible: Yes
- Help Available Online: No
- Cause Understood: No
- Solved: Yes
This compile-time error occurred twice in base.pyx. It concerns the constants variable on line 249 and line 773 (since renamed here). If both variables are typed as dictionaries, this error magically disappears.
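One thing that stands out in the offending loop is that constants is both the dictionary being iterated and the loop target. Whether that reuse is what trips up Cython's type inference is only a guess, and the traceback does not confirm it. The sketch below just shows, in plain Python, that giving the loop target its own name does not change the result. Both functions and the sample data are hypothetical, not code from base.pyx.

```python
def collect_shadowed(vardoms):
    """Mirror the original pattern: the loop target reuses the dict's name."""
    constants = dict(vardoms)
    result = []
    # constants.items() is evaluated once, so rebinding the name inside the
    # loop is legal Python, but it makes the variable's type change mid-loop.
    for domain, constants in constants.items():
        result.append((domain, list(constants)))
    return result


def collect_renamed(vardoms):
    """Same loop with a separate name for the per-domain values."""
    constants = dict(vardoms)
    result = []
    for domain, domain_constants in constants.items():
        result.append((domain, list(domain_constants)))
    return result


sample = {"person": ["Anna", "Bob"], "city": ["Bremen"]}
same = collect_shadowed(sample) == collect_renamed(sample)
print(same)
```

With distinct names, each variable holds one kind of value for its whole lifetime, which also makes it straightforward to give the outer one a dict type.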
ImportError
Traceback (most recent call last):
File "test.py", line 5, in <module>
from pracmln import query
File "/home/kaivalya/ ... /pracmln/python3/pracmln/__init__.py", line 22, in <module>
from .mln.base import MLN
File "/home/kaivalya/ ... /pracmln/python3/pracmln/mln/__init__.py", line 27, in <module>
from .base import MLN
File "common.pxd", line 1, in init pracmln.mln.base
File "/home/kaivalya/ ... /pracmln/python3/pracmln/logic/__init__.py", line 1, in <module>
from .fol import FirstOrderLogic
File "/home/kaivalya/ ... /pracmln/python3/pracmln/logic/fol.py", line 26, in <module>
from .common import Logic
ImportError: No module named 'pracmln.logic.common'
- Reproducible: Yes
- Help Available Online: Yes
- Cause Understood: Yes
- Solved: No
There were many runtime import errors encountered, mostly related to the logic module. They pose a significant challenge and I have yet to resolve them. I shall perhaps deal with them in a separate blog post.
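While chasing errors like this, it can help to ask Python whether it can locate a module without running the whole import chain by hand. The following is a minimal diagnostic sketch using the standard library; module_available is a hypothetical helper, and the module names in the loop are placeholders rather than PracMLN modules.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if Python can locate the named module, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing or fails to import.
        # ValueError: the module is loaded but has no usable __spec__.
        return False


for name in ["json", "json.decoder", "no_such_package.common"]:
    print(name, module_available(name))
```

Note that find_spec imports the parent package of a dotted name, so for a submodule a False result can also mean that the parent package itself failed to import.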
Summary
The workflow described above was arrived at after significant experimentation. I would love to hear feedback about it, and make improvements if possible. More clarity on the errors mentioned here would also be beneficial. I was able to successfully apply this method, and convert the EnumerationAsk, Inference, MRF, and MLN classes to extension types this way. However, none of these could be tested (a serious drawback of taking this “top down” approach), because the code at this point doesn’t run, on account of compilation errors related to the logic module.