The First Optimisations
Jul 18, 2018 04:10 · 1212 words · 6 minute read
After analysing the profiles, I started optimising slow portions of PracMLN. The changes described here have led to significant speedups, bringing runtimes for a reference MLN Exact Inference task down by a further 10%-15%.
Profiling vs Cythonising
There was a fundamental distinction between the information obtained via the generated profiles and that required by Cython for speeding up PracMLN. The profiler pointed me to slow portions in the code, to bottlenecks: it helped me identify exactly which functions needed to be sped up. Cython, on the other hand, required optimisations to be made by variable, not by function. This is a very subtle distinction. While it might naively be perceived that every slow function in the profile simply needs to be redefined using `cdef`, this is of little use without typing information. This means that to speed PracMLN up using Cython, it is also important to identify heavily utilised variables and data structures, and then type those.
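To illustrate the distinction, here is a minimal hypothetical sketch (the function and its names are invented, not taken from PracMLN): merely declaring a function with `cdef` leaves its body operating on generic Python objects, while typing the variables it uses lets Cython generate C-level arithmetic.

```cython
# Declaring the function with cdef alone changes little: every
# variable inside is still an untyped Python object.
cdef sum_weights(weights):
    total = 0.0
    for w in weights:
        total += w          # Python-level addition

    return total

# Typing the variables is what unlocks the speedup: the loop body
# now compiles down to C double arithmetic.
cdef double sum_weights_typed(list weights):
    cdef double total = 0.0
    cdef double w
    for w in weights:       # each element is unboxed to a C double
        total += w

    return total
```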
Verbosity Reduction
A few days back I had reported my inability to reduce the verbosity of the output in my test scripts. This was happening for two separate reasons, both of which were finally identified with significant help from my mentor, Daniel Nyga. Firstly, as verified by the profiles generated, the IO operations weren’t really time-consuming. I think this is not a result of IO being efficient, but caused instead by the paucity of print output. Sure, it’s enough to inundate a human user, but not really enough to actually slow down a modern computer.
That being said, two edits were required to fix this bug:
- Embarrassing Error in Test Script: there was output from the test script because, well, the script instructed PracMLN to write the results to the screen. Removing the `.write()` call from line 49 immediately stops the writing to the terminal. This has been fixed.
- Intermediate SciPy Outputs: the SciPy functions used in `mln/learning/optimize.py` (for example `fmin_bfgs`, `fmin_cg`, and `fmin_ncg`) are verbose by default. These haven't been modified, but can trivially be changed by referring to the SciPy documentation. This isn't terribly vital, though, since the test-script verbosity has been deemed harmless by the profiler.
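For reference, these SciPy optimisers accept a `disp` flag that controls the convergence report. A minimal sketch with a toy objective function (invented here, standing in for the MLN learning objective):

```python
from scipy.optimize import fmin_bfgs

# A toy quadratic objective with its minimum at x = 3.
def f(x):
    return (x[0] - 3.0) ** 2

# disp=False suppresses the convergence summary that fmin_bfgs
# would otherwise print after the run finishes.
xopt = fmin_bfgs(f, [0.0], disp=False)
```

The same flag exists on `fmin_cg` and `fmin_ncg`, so silencing all three is a one-word change per call site.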
Aside - PracMLN Inference
In the same post, I also reported how only the Enumeration-Ask algorithm was functioning bug-free, while all other (non-cythonised) algorithms were throwing runtime errors. This needed to be fixed so as not to mandate that all future extensions to PracMLN be made in Cython, but to allow plain Python too. While this stumped me for a long while, the solution is again embarrassingly simple: the `mrf` attribute typed in the `Inference` class needed to be made `public`. As explained in the documentation, it's necessary for attributes to be declared `public` so that Python code can access them. The error occurred because `EnumerationAsk(Inference)` had already been converted to Cython and could access the `mrf` attribute, but the other inference-related code like `WCSPInference(Inference)` is still Python and thus couldn't access `mrf` from `Inference`, leading to errors.
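A minimal sketch of the fix, with the class bodies elided (and assuming the attribute holds an arbitrary Python object):

```cython
cdef class Inference:
    # Without `public`, this attribute is visible only to other
    # compiled Cython code; `public` makes Cython generate Python-level
    # getters and setters, so pure-Python subclasses can reach it too.
    cdef public object mrf

cdef class EnumerationAsk(Inference):
    pass    # compiled: could access self.mrf even without `public`

# A pure-Python subclass such as WCSPInference(Inference) can only
# read self.mrf once the attribute is declared public.
```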
Investigating the Profile
The functions deemed interesting to investigate further (i.e., high up in the `tottime` / `cumtime` sorted `cprofilev` analyses) were:
- `contains_gndatom`
- `gndatoms`
- `gndatom` (@property)
- `negated` (@property)
- `predname` (@property)
- `__str__`
- `value` (@property)
- `weights` (@property)
The types of the variables in these functions were investigated by inserting relevant print statements into the code. This commit shows this being done for the list mentioned above. It is worth noting here that, as mentioned above, Cython optimisations required a focus on variables. While the five @property optimisations mentioned above didn't figure in the profile data themselves, the functions that are indeed time-consuming make extensive use of them. Optimising the properties will improve the speed at which those variables are accessed, in turn improving performance across the PracMLN code.
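A hedged sketch of what such a property optimisation can look like in an extension type (the property name is borrowed from the list above, but the class layout here is invented for illustration):

```cython
cdef class GroundLit:
    cdef int _negated          # C-level storage instead of a dict entry

    @property
    def negated(self):
        # Reading the attribute compiles down to a C struct access
        # rather than a Python dictionary lookup.
        return self._negated
```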
Some other portions of PracMLN that I intend to further examine for possible speedups are:
- the `_weights` attribute from the `MLN` class
- the `_evidence` attribute from the `MLN` class
- `eval_queries` in `exact.pyx`
- `soft_evidence` in the `EnumerationAsk` class
- miscellaneous `truth` functions from the `logic` module
- special functions in the `logic` module, like `__eq__`, `__call__`, etc.
Challenges
There were many challenges that rendered typing some variables an impossible proposition. A host of local variables (representing truth values, for example) couldn't be typed because they could all potentially be `None` without invalidating PracMLN semantics. There were lists of custom, user-defined objects which couldn't be converted to Cython arrays, like the `children` attribute heavily used in the `logic` module classes. Moreover, the lengths of these lists were also variable at times. Often, variables could have multiple types: in the `MLN` class, the `_weights` attribute is a list of strings, floats, or both.
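The truth-value case can be illustrated with a small invented helper: because `None` is a legal value, the variable cannot be declared as a C `double` without changing the semantics.

```python
def negate(truth):
    """Invert a truth value.

    `truth` may legitimately be None (an unknown ground-atom value),
    so it cannot be typed as a C double without invalidating
    PracMLN semantics.
    """
    if truth is None:
        return None
    return 1.0 - truth
```

Here `negate(0.25)` yields `0.75`, while `negate(None)` passes the unknown value through unchanged.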
Workflow
It was tricky to figure out a comfortable and efficient workflow for this task. There were multiple steps involved in the process:
- Edit Code
- Build / Compile
- Test Functionality
- Profile and Measure Speed
This was an excruciatingly slow process. Cython builds take a long time, and the runtimes I was working with could at times be as high as 10 minutes each. Running multiple instances of this workflow in parallel wouldn't help either, since the processes would be interleaved, increasing runtimes drastically and rendering runtime measurements untenable. Additionally, building while simultaneously running a test has the effect of changing the code midway through execution, leading to all manner of runtime errors.
Over time, I settled into the following pattern of work, and was able to use it fairly well for the purposes of GSoC:
- Select Function to Investigate
- Add Print statements to determine Variable Types involved
- Edit Code by adding Typing Information, if possible
- Rebuild / Compile
- Run a Small, Quick Learning + Inference Test to ensure functionality is unchanged
- Repeat 3-4 until satisfied.
- Profile on Large MLN Inference task, and Measure Speedup obtained
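For the final measurement step, a small hypothetical helper (not part of PracMLN) illustrates one way to dampen the noise that made parallel measurement runs untenable: take the best wall-clock time over several repeats.

```python
import time

def best_runtime(task, repeats=3):
    """Run `task` several times and return the fastest wall-clock
    time observed, reducing the influence of background processes
    on the measurement."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `best_runtime(run_inference_test)` would report the cleanest of three timed runs of a hypothetical `run_inference_test` function.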
Final Optimisations
The final optimisations seem fairly trivial. In a sense, they are trivial: just adding a few words here and there. However, determining what words to add where has been the exercise of this entire project. The techniques and work documented in the last two blog posts succinctly describe most of the important steps of this process.
All the final optimisations were made by editing just one file, `common.pxd`. It turns out that typing the attributes defined here sped up the entire `logic` module that operates upon them. The hidden variables `_predname` and `_negated` were typed as `str` and `int` respectively. Additionally, the `MLN` class, which had already been made into an extension type, was used to type the `mln` attribute. Lastly, simply adding a `cdef __dict__` to the relevant classes in the inheritance hierarchy measurably improved PracMLN speed.
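Put together, declarations along the lines sketched below convey the flavour of these edits (a hedged reconstruction for illustration, not the actual contents of `common.pxd`):

```cython
# common.pxd (sketch only)
cdef class MLN:
    pass                   # already an extension type

cdef class Formula:
    cdef dict __dict__     # cdef __dict__ keeps dynamic attribute
                           # assignment working for Python subclasses
    cdef str _predname     # hidden variables, now typed
    cdef int _negated
    cdef MLN mln           # attribute typed with the MLN extension type
```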
Future
I have a feeling that some of the special methods defined in the logic module need improvement, although how exactly to bring this about remains unclear to me. The Cython documentation is surprisingly unhelpful about how I may do this. What's certain is that far too much time is spent in the `__get__`, `__str__`, and `__eq__` functions.
There still is a long way to go, and I have a feeling that the profile data analysis should be improved and made more thorough. Daniel's suggestion is to now further improve the `logic` module, with a specific focus on the `truth` functions, and the `iterworlds` function and `_evidence` attribute in the `MRF` class. This is mostly going to be the next step I take.