A Review

Jul 6, 2018 04:38 · 1010 words · 5 minute read

A review is most appropriate at the end of a project, as a chance to reflect on things done well and not so well. However, I thought I would pause in the middle of my project to reflect on some of my learnings and shortcomings over the last two months.

Project Status

As of now, using the latest test scripts from my PracTests repository, there seems to be around a 20% speed increase in PracMLN.

On my machine, a sample run yields:

kaivalya@kaivalyarawal ~/ ... /PracTests/GenericSystemTest $ bash test.sh 
[Cython]
PracMLN Generic Systems Test:

Start Smoker Inference
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: EnumerationAsk ===
Finish Smoker Inference
Start Taxonomy Inference
=== INFERENCE TEST: EnumerationAsk ===
action_role(w1,_):
[■■■■■■■■■■■■■■■■■] 100.000 % action_role(w1,theme)
.
.
.
all test finished after 5.779006242752075 secs
[Python]
PracMLN Generic Systems Test:

Start Smoker Inference
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: EnumerationAsk ===
Finish Smoker Inference
Start Taxonomy Inference
=== INFERENCE TEST: EnumerationAsk ===
action_role(w1,_):
[■■■■■■■■■■■■■■■■■] 100.000 % action_role(w1,theme)
.
.
.
all test finished after 7.0214879512786865 secs

For a speedup of (7.02 / 5.78) - 1 ≈ 0.21, or roughly 21%.

However, I am observing some strange behaviour:

  1. The output remains verbose after Taxonomy inference starts, despite verbosity being turned off with query(..., verbose=False, ...).
  2. Only the Enumeration-Ask algorithm is functional, with all the others yielding runtime errors.

Verbosity

IO operations are very time-consuming, and this experiment can't be considered reliable unless the output is minimised. This seems to be a minor bug that I need to fix before proceeding with the rest of my work. Hopefully this issue will be resolved by the time this blog post is actually being read / used by someone.
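For reference, this is roughly how my test script invokes a query. It is a sketch from memory, so the file names and the exact MLNQuery parameters here are assumptions rather than a verbatim copy of test.py:

from pracmln import MLN, Database
from pracmln.mlnquery import MLNQuery

# Load the MLN and evidence database (file names are placeholders)
mln = MLN.load(files='smokers.mln', grammar='StandardGrammar')
db = Database.load(mln, 'smokers.db')

# verbose=False should suppress the progress bars and per-query
# output, but once Taxonomy inference starts it seems to be ignored
result = MLNQuery(mln=mln, db=db,
                  method='EnumerationAsk',
                  verbose=False,
                  multicore=False).run()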

Other Inference Algorithms

Running any other inference algorithm leads to a runtime error (e.g. uncommenting line 23 of the test.py in the gist above causes the following):

kaivalya@kaivalyarawal ~/ ... /PracTests/GenericSystemTest $ bash test.sh 
[Cython]
PracMLN Generic Systems Test:

Start Smoker Inference
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: MC-SAT ===
Traceback (most recent call last):
  File "test.py", line 111, in <module>
    main()
  File "test.py", line 108, in main
    runall()
  File "test.py", line 88, in runall
    test_inference_smokers()
  File "test.py", line 33, in test_inference_smokers
    multicore=multicore).run()
  File "/home/kaivalya/ ... /python3/pracmln/mlnquery.py", line 249, in run
    result = inference.run()
  File "infer.pyx", line 190, in pracmln.mln.inference.infer.Inference.run
  File "wcspinfer.pyx", line 48, in pracmln.mln.inference.wcspinfer.WCSPInference._run
AttributeError: 'WCSPInference' object has no attribute 'mrf'
[Python]
PracMLN Generic Systems Test:

Start Smoker Inference
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: EnumerationAsk ===
=== INFERENCE TEST: MC-SAT ===

=== INFERENCE TEST: MC-SAT ===

Finish Smoker Inference
Start Taxonomy Inference
=== INFERENCE TEST: EnumerationAsk ===
action_role(w1,_):
.
.
.
Finish Taxonomy Learning

all test finished after 7.618952035903931 secs

So WCSPInference(Inference), in wcspinfer.pyx, was unable to access an attribute of its superclass, Inference, defined in infer.pyx. But EnumerationAsk, in exact.pyx, could do so, because it is itself a cdef class. Identical errors can be obtained for MCSAT and GibbsSampler too. So apparently Python (non-cdef) subclasses can't inherit attributes from Cython superclasses? I can't find any reports of similar problems online, though. Additionally, no similar problem occurred in the logic module, where both FirstOrderLogic(Logic) and FuzzyLogic(Logic) inherit from Logic.
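My best guess so far is that this is about attribute visibility rather than inheritance as such: an attribute declared with a plain cdef in an extension type lives in a C struct and is invisible to Python-level attribute lookup, which is exactly the lookup a pure-Python subclass performs. A minimal sketch of the pattern (the names here are made up, not PracMLN's):

# base.pyx -- a Cython extension type, compiled with cythonize
cdef class Base:
    cdef object data       # C-level attribute, not exposed to Python
    def __init__(self):
        self.data = 42     # fine: compiled code accesses the struct directly

# sub.py -- a pure-Python subclass, analogous to WCSPInference
from base import Base

class Sub(Base):
    def get(self):
        return self.data   # AttributeError: 'Sub' object has no attribute 'data'

A cdef subclass compiled against base's declarations reads the struct slot directly, which would explain why EnumerationAsk works while WCSPInference does not.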

One approach is to ignore it altogether: eventually, all inference algorithms will be implemented as Cython extension types instead of Python classes, so the problem will solve itself. However, as Daniel rightly pointed out when I mentioned this to him, new algorithms may be developed and implemented (at least at first) in Python. PracMLN can't disallow inference algorithms implemented in Python, even if it chooses to use Cython to implement the preexisting ones. This problem is more vexing than it seems on the surface, and has stumped me so far.
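That said, if the visibility guess above is right, Cython does offer a middle path: declaring an extension-type attribute public (or readonly) generates a Python property for it, making it visible to Python code and hence to pure-Python subclasses. A sketch of what that might look like in infer.pyx, assuming mrf is currently a plain cdef attribute; I haven't verified this against the actual code yet:

cdef class Inference:
    # 'public' exposes the attribute to Python-level lookup, so
    # pure-Python subclasses like WCSPInference can read and write
    # self.mrf; compiled cdef subclasses still get direct struct access
    cdef public object mrf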

Lessons and Personal Takeaways

This has been my first time working on such a large codebase. In fact, it is the first real work I have done in my life. Over the course of this project, I have realised the differences between student life and professional life. The complications of a working environment were lost on me until recently: there was no professor here, and no prescribed syllabus. There was no set material to study, the mastery of which determined success. At first, it was hard to learn the ropes in this open-ended environment.

However, under the extremely helpful guidance of Daniel Nyga, I was able to learn to operate in this environment. The redesigned timeline and more realistic milestones he set were integral to getting my work started in earnest. I have learned more about Cython and software development in the last few months than in the entire duration of my undergraduate studies. More importantly, though, I have learned the importance of breaking a large problem into subproblems and then working on each individually. This is due in part to the expert use of Trello made by the PracMLN project, and is something I hope to emulate in the future.

Future

I am only now learning the Golden Rule of Optimisation:

'Premature optimization is the root of all evil' - Donald Knuth

If I had come across this page, or even paid more attention to the documentation that I otherwise read so diligently, I would've known this, and would've been spared this evil.

It really is evil. The Cython guide goes so far as to say “… Never optimize without having profiled. Let me repeat this: Never optimize without having profiled your code. Your thoughts about which part of your code takes too much time are wrong.” So there is this thing called profiling, which I should've done first, and which I am only starting to do now.
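The recipe from that same Cython tutorial is mercifully short. A sketch of how I plan to start, running the systems test under cProfile; the file names here are placeholders, and runall is the entry point from my test.py:

# To profile inside the compiled modules, each .pyx file additionally
# needs this directive on its first line (it adds some overhead):
#     # cython: profile=True

# profile_test.py -- run the systems test under cProfile
import cProfile
import pstats

from test import runall  # the PracTests driver

cProfile.runctx('runall()', globals(), locals(), 'pracmln.prof')

# Print the 20 most expensive calls by cumulative time
stats = pstats.Stats('pracmln.prof')
stats.strip_dirs().sort_stats('cumulative').print_stats(20)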

I thus hope to find and optimise the largest performance bottlenecks in the code, and in doing so chart a path that can be followed in the future to bring further speedups to PracMLN. This reference implementation, along with this blog, can then serve as complete documentation, informing and guiding future work.