The First Optimisations

Jul 18, 2018 04:10 · 1212 words · 6 minute read

After analysing the profiles, I started optimising slow portions of PracMLN. The changes described here have led to significant speedups, bringing runtimes for a reference MLN Exact Inference task down by a further 10%-15%.

Profiling vs Cythonising

There was a fundamental distinction between the information the profiles provided and the information Cython needed in order to speed up PracMLN. The profiler pointed me to slow portions of the code, to bottlenecks; it identified exactly which functions needed to be sped up. Cython, on the other hand, required optimisations to be made per variable, not per function. This is a subtle distinction: one might naively assume that every slow function in the profile simply needs to be redefined using cdef, but this is of little use without typing information. To speed PracMLN up with Cython, it was therefore equally important to identify heavily used variables and data structures, and then type those.
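As a hedged illustration of this point (the function and variable names here are hypothetical, not PracMLN's), a bare cdef buys very little; the speedup comes from typing the variables the function works with:

```cython
# Slow: cdef alone does not help much, because the loop variable
# and the accumulator are still generic Python objects.
cdef count_true(truths):
    total = 0
    for t in truths:
        if t:
            total += 1
    return total

# Faster: the same function with its variables typed, so the
# comparison and the increment compile down to C operations.
cdef int count_true_typed(list truths):
    cdef int total = 0
    cdef double t
    for t in truths:
        if t != 0.0:
            total += 1
    return total
```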

Verbosity Reduction

A few days back I had reported my inability to reduce the verbosity of the output in my test scripts. This was happening for two separate reasons, both of which were finally identified with significant help from my mentor, Daniel Nyga. Firstly, as verified by the profiles generated, the IO operations weren’t really time-consuming. I think this is not a result of IO being efficient, but caused instead by the paucity of print output. Sure, it’s enough to inundate a human user, but not really enough to actually slow down a modern computer.

That being said, there are two edits required to fix this bug:

  1. Embarrassing Error in Test Script: there was output from the test script because, well, the script itself instructed PracMLN to write the results to the screen. Removing the .write() from line 49 immediately stops the writing to the terminal. This has been fixed.
  2. Intermediate SciPy Outputs: SciPy functions used in mln/learning/optimize.py are verbose. For example, fmin_bfgs, fmin_cg, fmin_ncg, etc. These haven’t been modified, but can trivially be changed by referring to the SciPy documentation. This isn’t terribly vital though, since the test script verbosity has been deemed harmless by the profiler.
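For instance, assuming the standard SciPy interface rather than PracMLN's wrappers around it, these optimisers accept a disp flag that silences their convergence report (the toy objective below is made up for illustration):

```python
from scipy.optimize import fmin_bfgs

# A toy objective: (x - 3)^2, minimised at x = 3.
def f(x):
    return (x - 3.0) ** 2

# disp=0 suppresses SciPy's convergence printout; the optimum
# is returned exactly as before, just without terminal output.
xopt = fmin_bfgs(f, x0=0.0, disp=0)
```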

Aside - PracMLN Inference

In the same post, I also reported that only the Enumeration-Ask algorithm was functioning bug-free; all other (non-Cythonised) algorithms were throwing runtime errors. This needed to be fixed so that future extensions to PracMLN would not be forced into Cython, but could be written in Python too. While this stumped me for a long while, the solution is again embarrassingly simple: the mrf variable typed in the Inference class needed to be made public. As explained in the Cython documentation, it's necessary for attributes to be declared public so that Python code can access them. The error occurred because EnumerationAsk(Inference) had already been converted to Cython and could access the mrf attribute, while the other inference-related code, like WCSPInference(Inference), was still Python and thus couldn't access mrf from Inference.
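A minimal sketch of the fix (the declaration file below is simplified, not PracMLN's actual source): declaring the attribute public makes it visible to pure-Python subclasses as well as Cython ones.

```cython
# inference.pxd (simplified sketch)
cdef class Inference:
    # 'public' makes Cython generate Python-level get/set access,
    # so a pure-Python subclass such as WCSPInference can read
    # self.mrf. Without it, only Cython code compiled against this
    # declaration (e.g. EnumerationAsk) can touch the attribute.
    cdef public object mrf
```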

Investigating the Profile

The functions deemed interesting to investigate further (i.e., high up in the tottime / cumtime sorted cprofilev analyses) were:

  • contains_gndatom
  • gndatoms
  • (@property)gndatom
  • (@property)negated
  • (@property)predname
  • __str__
  • (@property)value
  • (@property)weights

The types of the variables in these functions were investigated by inserting relevant print statements into the code; this commit shows this being done for the list above. As noted earlier, Cython optimisations required a focus on variables. While the five @property accessors listed above did not themselves figure prominently in the profile data, the functions that are genuinely time-consuming make extensive use of them. Optimising the properties improves the speed at which those attributes are accessed, in turn improving performance across the PracMLN code.
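The print statements amounted to something like this hypothetical helper (not the actual commit), which reports the concrete runtime types that cdef declarations would need:

```python
def report_types(name, value):
    """Print and return the runtime type of a variable, plus the
    element types if it is a container - exactly the information
    needed before adding a cdef type declaration."""
    info = type(value).__name__
    if isinstance(value, (list, tuple, set)):
        elems = sorted({type(v).__name__ for v in value})
        info += '[' + ', '.join(elems) + ']'
    print('%s: %s' % (name, info))
    return info

report_types('weights', [0.5, 'hard', 1.2])
```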

Some other portions of PracMLN that I intend to further examine for possible speedups are:

  • _weights attribute from the MLN class
  • _evidence attribute from the MLN class
  • eval_queries in exact.pyx
  • soft_evidence in the EnumerationAsk class
  • miscellaneous truth functions from the logic module
  • special functions in the logic module, like __eq__, __call__, etc

Challenges

There were many challenges that made typing certain variables impossible. A host of local variables (those representing truth values, for example) couldn't be typed because they could legitimately be None without invalidating PracMLN semantics. There were lists of custom, user-defined objects which couldn't be converted to Cython arrays, like the children attribute heavily used in the logic module classes; their lengths also varied at times. Often, variables could take multiple types: in the MLN class, the _weights attribute is a list of strings, floats, or both.
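The mixed-type problem can be seen in a few lines of plain Python (the weight values below are made up for illustration): no single C type covers such a list, so it cannot become a typed C array or memoryview.

```python
# A weights list as described in the post: floats for soft
# formulas, strings (e.g. 'hard') for hard constraints.
weights = [1.5, 'hard', -0.25, 'hard']

# More than one element type is present, so no uniform C type
# (double*, char**, ...) can represent the list directly.
types_present = {type(w).__name__ for w in weights}
print(types_present)
```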

Workflow

It was tricky to figure out a comfortable and efficient workflow for this task. There were multiple steps involved in the process:

  1. Edit Code
  2. Build / Compile
  3. Test Functionality
  4. Profile and Measure Speed

This was an excruciatingly slow process. Cython builds take a long time, and the runtimes I was working with could be as high as 10 minutes each. Running multiple instances of this workflow in parallel wouldn't help either, since the processes would be interleaved, increasing runtimes drastically and rendering the runtime measurements untenable. Additionally, building while simultaneously running a test effectively changes the code midway through execution, leading to all manner of runtime errors.

Over time, I settled into the following pattern of work, and was able to use it fairly well for the purposes of GSoC:

  1. Select Function to Investigate
  2. Add Print statements to determine Variable Types involved
  3. Edit Code by adding Typing Information, if possible
  4. Rebuild / Compile
  5. Run a Small, Quick Learning + Inference Test to ensure functionality is unchanged
  6. Repeat 3-4 until satisfied.
  7. Profile on Large MLN Inference task, and Measure Speedup obtained
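Step 7 can be scripted with the standard library alone. This sketch (the workload below is a stand-in, not an MLN inference task) profiles a callable and prints the top entries sorted by tottime, the same column used above to pick optimisation candidates:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for the large MLN inference task.
    return sum(i * i for i in range(200000))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Sort by total time spent inside each function ('tottime'),
# mirroring the cprofilev view used to choose what to optimise.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('tottime').print_stats(5)
print(buf.getvalue())
```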

Final Optimisations

The final optimisations seem fairly trivial. In a sense, they are: just adding a few words here and there. However, determining which words to add, and where, has been the exercise of this entire project. The techniques and work documented in the last two blog posts succinctly describe most of the important steps of this process.

All the final optimisations were made by editing just one file, common.pxd. It turns out that typing attributes defined here sped up the entire logic module that operated upon them. The hidden variables _predname and _negated were typed as str and int respectively. Additionally, the MLN class, which had already been made into an extension type, was used to type the mln attribute. Lastly, simply adding a cdef dict __dict__ to the relevant classes in the inheritance hierarchy measurably improved PracMLN's speed.
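In outline, the edits looked something like the sketch below. The class names and the cimport path are illustrative, not the actual contents of common.pxd; only the attributes named above are shown.

```cython
# common.pxd (illustrative sketch, not the actual file)
from pracmln.mln.base cimport MLN   # assumes MLN is an extension type

cdef class Formula:
    # Typing the mln attribute with the MLN extension type lets
    # Cython resolve accesses to it at C level.
    cdef MLN mln
    # Allow Python-level attributes on subclasses in the hierarchy.
    cdef dict __dict__

cdef class GroundLit(Formula):
    # The hidden attributes behind the @property accessors.
    cdef str _predname
    cdef int _negated
```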

Future

I have a feeling that some of the special methods defined in the logic module need improvement, although how exactly to bring this about remains unclear to me; the Cython documentation is surprisingly unhelpful here. What is certain is that far too much time is spent in the __get__, __str__ and __eq__ functions.

There is still a long way to go, and I have a feeling that the profile data analysis should be made more thorough. Daniel's suggestion is now to further improve the logic module, with a specific focus on the truth functions, and on the iterworlds function and _evidence attribute in the MRF class. That is most likely the next step I will take.