Why Is Equivalent Python Code So Much Slower

October 07, 2024 Post a Comment

can somebody explain why is the following trivial code (implementation of Euclid's algorithm to find greatest common denominator) about 3 times slower then equivalent code in Ruby

Solution 1:

Summary

"Because the function call overhead in Python is much larger than in Ruby."

Details

Being a microbenchmark, this really doesn't say much about the performance of either language in proper use. Likely you would want to rewrite the program to take advantage of the strengths of Python and Ruby, but this does illustrate one of the weak points of Python at the moment. The root cause of the speed differences come from function call overhead. I made a few tests to illustrate. See below for code and more details. For the Python tests, I used 2000 for both gcd parameters.

Interpreter: Python 2.6.6
Program type: gcd usingfunctioncall
Total CPU time: 29.336 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code
Total CPU time: 13.194 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code, with dummy functioncall
Total CPU  time: 30.672 seconds

This tells us that it's not the calculation made by the gcd function that contributes most to the time difference, it's the function call itself. With Python 3.1, the difference is similar:

Interpreter: Python 3.1.3rc1
Program type: gcd usingfunctioncall
Total CPU time: 30.920 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code
Total CPU time: 15.185 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code, with dummy functioncall
Total CPU time: 33.739 seconds

Again, the actual calculation is not biggest contributor, it's the function call itself. In Ruby, the function call overhead is much smaller. (Note: I had to use smaller parameters (200) for the Ruby version of the programs because the Ruby profiler really slows down real-time performance. That doesn't affect CPU time performance, though.)

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd usingfunctioncall
Total CPU time: 21.66 seconds

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd using inline code
Total CPU time: 21.31 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd usingfunctioncall
Total CPU time: 27.00 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd using inline code
Total CPU time: 24.83 seconds

Notice how neither Ruby 1.8 nor 1.9 suffer greatly from the gcd function call – the function call vs. inline version are more or less equal. Ruby 1.9 seems to be a little better with less difference between the function call and inline versions.

So the answer to the question is: "because the function call overhead in Python is much larger than in Ruby".

Code

# iter_gcd -- Python 2.x version, with gcd function call#             Python 3.x version uses range instead of xrangefrom sys import argv,stderr

defgcd(m, n):
    if n > m:
        m, n = n, m
    while n != 0:
        rem = m % n
        m = n
        n = rem
    return m

defmain(a1, a2):
    comp = 0for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            comp += gcd(i,j)
    print(comp)

if __name__ == '__main__':
    iflen(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation#             Python 3.x version uses range instead of xrangefrom sys import argv,stderr

defmain(a1, a2):
    comp = 0for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
    print(comp)

if __name__ == '__main__':
    iflen(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation, dummy function call#             Python 3.x version uses range instead of xrangefrom sys import argv,stderr

defdummyfunc(n, m):
    a = n + m

defmain(a1, a2):
    comp = 0for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
            dummyfunc(i, j)
    print(comp)

if __name__ == '__main__':
    iflen(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Ruby version, with gcd function calldefgcd(m, n)if n > m
        m, n = n, m
    endwhile n != 0
        rem = m % n
        m = n
        n = rem
    endreturn m
enddefmain(a1, a2)
    comp = 0
    a1.downto 2do|j|1.upto a2-1do|i|
            comp += gcd(i,j)
        endend
    puts comp
endif__FILE__ == $0if ARGV.length != 2$stderr.puts('usage: %s num1 num2' % $0)
        exit(1)
    else
        main(ARGV[0].to_i, ARGV[1].to_i)
    endend

# iter_gcd -- Ruby version, with inline gcddefmain(a1, a2)
    comp = 0
    a1.downto 2do|j|1.upto a2-1do|i|
            m, n = i, j
            if n > m
                m, n = n, m
            endwhile n != 0
                rem = m % n
                m = n
                n = rem
            end
            comp += m
        endend
    puts comp
endif__FILE__ == $0if ARGV.length != 2$stderr.puts('usage: %s num1 num2' % $0)
        exit(1)
    else
        main(ARGV[0].to_i, ARGV[1].to_i)
    endend

Test runs

Finally, the commands used to run Python and Ruby with profiling to get the numbers for comparison were pythonX.X -m cProfile iter_gcdX.py 2000 2000 for Python and rubyX.X -rprofile iter_gcdX.rb 200 200 for Ruby. The reason for the difference is that the Ruby profiler adds a lot of overhead. The results are still valid because I'm comparing the difference between a function call and inline code, not the difference between Python and Ruby as such.

Solution 2:

I can confirm that ruby1.9 is faster than CPython for this "microbenchmark" on my machine:

| Interpreter                     | Time, s | Ratio |
|---------------------------------+---------+-------|
| python-2.6 (cython_gcd.gcd_int) |     2.8 |  0.33 |
| pypy-1.4                        |     3.5 |  0.41 |
| jython-2.5 (java "1.6.0_20")    |     4.7 |  0.55 |
| python-2.6 (cython_gcd.gcd)     |     5.6 |  0.65 |
| ruby-1.9                        |     8.6 |  1.00 |
| jython-2.5                      |     8.9 |  1.03 |
| python-3.2                      |    11.0 |  1.28 |
| python-2.6                      |    15.9 |  1.85 |
| ruby-1.8                        |    42.6 |  4.95 |
#+TBLFM: $3=$2/@6$2;%.2f

Profiler (python -mcProfile iter_gcd.py 4000 3000) shows that 80% of the time spent calling gcd() function, so indeed the difference is in the gcd() function.

I wrote cython_gcd extension for Python using Cython, cython_gcd.pyx:

defgcd(m, n):
    while n:
        n, m = m % n, n
    return m

defgcd_int(int m, int n):
    while n:
        n, m = m % n, n
    return m

It is used in the iter_gcd.py as follows from cython_gcd import gcd, gcd_int.

To try the extension, run: python setup.py build_ext --inplace, where setup.py:

from distutils.coreimport setup
from distutils.extensionimportExtensionfromCython.Distutilsimport build_ext

ext_modules = [Extension("cython_gcd", ["cython_gcd.pyx"])]

setup(
  name = 'Hello world app',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

To install the extension globally, run python setup.py install.

Solution 3:

I seem to remember that ruby handles integers differently than Python, so my guess would be it is simply that Python is spending a lot of time allocating memory while Ruby just mutates the integers in place.

For what it is worth, using Pypy 1.4 reduces the runtime for the Python version on my system from about 15 seconds to under 3 seconds.

Solution 4:

I can't replicate your result. The python code appears to be 4 times faster than the ruby code:

2010-12-0713:49:55:~/tmp$ time python  iter_gcd.py 4000300061356305

real    0m14.655s
user    0m14.633s
sys 0m0.012s

2010-12-0713:43:26:~/tmp$ time ruby iter_gcd.rb 40003000
iter_gcd.rb:14: warning: don't put space before argument parentheses
61356305

real    0m54.298s
user    0m53.955s
sys 0m0.028s

Versions:

2010-12-0713:50:12:~/tmp$ ruby --version
ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-linux]
2010-12-0713:51:52:~/tmp$ python --version
Python 2.6.6

Also, the python code can be made 8% faster:

def gcd(m, n):
    if n > m:
        m, n = n, m
    while n:
        n, m = m % n, n
    return m

def main(a1, a2):
    print sum(
        gcd(i,j)for j in xrange(a1, 1, -1)for i in xrange(1, a2)
    )

if__name__== '__main__':
    from sys import argv
    main(int(argv[1]), int(argv[2]))

Later: when I install and use ruby 1.9.1, the ruby code is way faster:

2010-12-0714:01:08:~/tmp$ ruby1.9.1 --version
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
2010-12-0714:01:30:~/tmp$ time ruby1.9.1 iter_gcd.rb 4000300061356305

real    0m12.137s
user    0m12.037s
sys 0m0.020s

I think your question is really, "Why is ruby 1.9.x so much faster than ruby 1.8.x?"

Python College