Thursday, November 17, 2011

Analysing Python's performance under PyPy

The traditional model of analysing the performance of Python programs has been
"run the profiler, find your bottlenecks, optimize them or move them to C".
I personally find this approach grossly insufficient in many cases, especially
in the context of PyPy. The particular problems are:



  • In many large applications, the profile is flat: PyPy's own translation
    toolchain, Twisted, or any modern web server are good examples.

  • Once you have found a bottleneck, it's not entirely clear what's slow inside
    that particular function or set of functions. The usual body of common
    knowledge about what's slow and what's fast is a moving target even in the
    case of CPython. In the presence of a JIT the situation is even more complex,
    and a look at how the JIT compiled a particular piece of code becomes crucial.

  • Performance problems, especially GC-related ones, might not show up in the
    profile at all; they might be spread evenly across many functions.


PyPy comes with several tools at different levels of maturity that should help
you identify problems. I'll outline in a few simple steps how I approach
performance analysis of programs. Remember, these are just guidelines and
there is no silver bullet. If your application is complex enough, you might need
lots and lots of lead bullets :-)


This post, already pretty lengthy, comes without a worked, real-world example;
I'll try to provide one in the near future.



Create tests


This might come as a surprising point in a post about performance, since it's
not about quality, but tests make it much easier to experiment with your code.
If you have lots of automated tests, chances are you'll be able to refactor
your code in a more performance-oriented manner without actually breaking it.
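
As a trivial, made-up illustration (fib is not from any real project), even a
couple of asserts like these let you rewrite a function's body freely and know
immediately whether you broke it:

def fib(n):
    # naive iterative Fibonacci; the implementation is free to change
    # as long as the test below keeps passing
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def test_fib():
    # runnable with py.test or any unittest-style runner
    assert fib(0) == 0
    assert fib(10) == 55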




Write some benchmarks


This is an absolutely crucial starting point. You have to be able to measure
the impact of your changes by running a single script, preferably one that
takes few arguments. ab (the Apache benchmarking tool) is not good enough.


If your application is not a one-off script, you should be able to measure
how JIT warmup time affects your performance by running the same test
repeatedly. It also helps to visualize how the timing varies between
consecutive runs.


My usual benchmarks, unless there are reasons to do otherwise, run between
0.2s and 5s per step. This helps to minimize the impact of random variance.
The JIT warmup time varies vastly depending on your code base: it can be
anything from unnoticeable to a minute. Measure before making judgements;
a sketch of such a harness follows below.
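
As a sketch of what I mean -- run_workload here is a hypothetical stand-in for
whatever your program actually does -- a harness like this runs the same step
repeatedly in one process, so the warmup is visible in the printed timings:

import time

def run_workload():
    # hypothetical stand-in for your real work; aim for 0.2s-5s per step
    total = 0
    for i in xrange(10000000):
        total += i % 3
    return total

if __name__ == '__main__':
    for step in range(10):
        start = time.time()
        run_workload()
        print 'step %2d: %.3fs' % (step, time.time() - start)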




Glance through cProfile results


I personally use a modified tool called lsprofcalltree.py, which runs the
Python profiler (cProfile) and produces output compatible with an awesome
tool called KCacheGrind. This might or might not provide any useful
information. If there are functions that stick out of the profile, glance
through them. Are they clearly inefficient? Do they use inefficient
algorithms? Don't bother micro-optimizing them yet.
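
If you don't have lsprofcalltree.py at hand, plain cProfile gives you the
same data in a less convenient form. A minimal sketch, reusing the
hypothetical run_workload from the harness above (it must be defined in the
same module, since cProfile.run evaluates its argument in __main__):

import cProfile
import pstats

# profile one workload step and dump the stats to a file
cProfile.run('run_workload()', 'out.prof')

# print the 20 most expensive entries by cumulative time
pstats.Stats('out.prof').sort_stats('cumulative').print_stats(20)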




Check the GC/JIT/other ratio


There is a very useful tool in the PyPy codebase for this. Assuming you are
running your program in a PyPy virtualenv, simply run:



PYPYLOG=log ./test.py

and then, from a PyPy checkout:



pypy/tool/logparser.py print-summary log -

and you can even look at a pretty graph by doing:



pypy/tool/logparser.py draw-time log out.png

This should give you a rough overview of how much time is spent doing what.
The times included are GC, JIT tracing (that's the warmup phase) and
other, which includes running JITted code.




Use jitviewer


Jitviewer might be very confusing, but it gives you a rough overview of what
is going on in your code. See the README for more details about how it works;
in general, you can see how your Python code got compiled to the JIT's
intermediate representation (and then to assembler). It's not that
interesting to see precisely which part got compiled how; what matters is how
many intermediate-representation instructions (resops) are created per piece
of Python. Also, some operations, such as the various kinds of call and the
new_xxx allocations, are more costly than others. Track those carefully. An
example invocation is sketched below.
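
For reference, feeding jitviewer amounts to capturing a JIT log and pointing
the tool at it. The exact log categories have changed between versions, so
check the README, but the invocation looks something like:

PYPYLOG=jit-log-opt,jit-backend:mylog pypy test.py

jitviewer.py mylog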




Think about potential modifications


Try out different ways to express the same thing in places that show up high
in the profile or in jitviewer; a made-up example follows below. Don't take
anything for granted -- trust only measurements. Most of the time there is an
"ah, that's nonsense" moment that leads to some improvement.
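
Here is a made-up example of the kind of experiment: both functions below
build the same string, but in different ways. Don't assume which one wins
under PyPy -- time them:

import time

def build_concat(n):
    # repeated string concatenation
    s = ''
    for i in xrange(n):
        s += str(i)
    return s

def build_join(n):
    # the same string via a generator and join
    return ''.join(str(i) for i in xrange(n))

for func in (build_concat, build_join):
    start = time.time()
    func(200000)
    print '%s: %.3fs' % (func.__name__, time.time() - start)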




That's all


This is pretty much it -- as I said before, there are no hard rules. Know
your tooling and try various things. The faster you can iterate, the more
options you can try. Understanding the details is usually crucial and can
lead to interesting improvements. There is a non-exhaustive and
not-always-up-to-date list of things the JIT likes, which are worth using or
at least trying.


In the next episode I'll try to walk through an example of improvements
based on an actual project.


Cheers,
fijal

