# Python vs Julia - an example from machine learning

In Speeding up isotonic regression in scikit-learn, we dropped down into Cython to improve the performance of a regression algorithm. I thought it would be interesting to compare the performance of this (optimized) code in Python against the naive Julia implementation.

This article continues on from the previous one, so it may be worth reading that before continuing here to obtain the necessary background information.

We'll implement both of the algorithms for the previous article, and compare their performance in Julia against Python.

## Linear PAVA

The Cython code is available on GitHub in `scikit-learn`, and the Julia code is available on GitHub as Isotonic.jl.

The Julia implementation is a straightforward implementation of PAVA, without any bells and whistles. The `@inbounds` macro was used for a fair comparison with the Cython implementation, which also turns off bounds checking.
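As a rough sketch of what linear PAVA computes, here is a minimal Python version (for illustration only; it is neither the scikit-learn Cython code nor the Isotonic.jl code, and the helper name `pava` is my own):

```python
def pava(y, w=None):
    """Pool Adjacent Violators: weighted isotonic (nondecreasing)
    least-squares regression, returning the fitted values."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, element count].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand the pooled block means back to a full-length solution.
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out
```

The single forward pass with backward pooling is what makes the algorithm linear time: each element is merged into a block at most once.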

## Active Set

The active set implementation is approximately the same number of lines as the Cython implementation, and is perhaps more cleanly structured, via an explicit composite type `ActiveState` that maintains a given active dual variable's parameters. It is also easy to factor repeated code into separate functions that LLVM can trivially inline, whereas this is difficult for arbitrary argument types in Cython.

One-based indexing in Julia also made the algorithm somewhat cleaner.
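To give a flavor of the block-of-dual-variables structure described above, here is an illustrative Python sketch (the field names and the function `isotonic_active_set` are my own, not those of Isotonic.jl; the composite type mirrors the role of `ActiveState`):

```python
from dataclasses import dataclass

@dataclass
class ActiveState:
    # One active block of pooled observations.
    weighted_sum: float   # sum of w[i] * y[i] over the block
    weight: float         # sum of w[i] over the block
    lower: int            # first index in the block
    upper: int            # last index in the block

    @property
    def value(self):
        # Fitted value shared by every index in the block.
        return self.weighted_sum / self.weight

def merge(a, b):
    """Pool two adjacent blocks into one."""
    return ActiveState(a.weighted_sum + b.weighted_sum,
                       a.weight + b.weight, a.lower, b.upper)

def isotonic_active_set(y, w=None):
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    active = [ActiveState(y[0] * w[0], w[0], 0, 0)]
    for i in range(1, n):
        active.append(ActiveState(y[i] * w[i], w[i], i, i))
        # Merge while adjacent blocks violate monotonicity.
        while len(active) > 1 and active[-2].value > active[-1].value:
            b2 = active.pop()
            b1 = active.pop()
            active.append(merge(b1, b2))
    out = [0.0] * n
    for blk in active:
        for i in range(blk.lower, blk.upper + 1):
            out[i] = blk.value
    return out
```

Keeping the per-block bookkeeping in one small composite type, rather than in parallel arrays, is the structural cleanliness the article alludes to; in Julia such a type is stack-allocated and the small helper functions are inlined by LLVM.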

## Performance

We see that exactly the same algorithm in Julia is uniformly faster than an equivalent Cython implementation.

For the active set implementations, Julia is anywhere between **5x and
300x faster** on equivalent regression problems.

For the linear PAVA implementation, Julia is between **1.1x and 4x
faster**.

This certainly indicates Julia is a very attractive choice for performance-critical machine learning applications.

See the IJulia notebook for more information on how these performance measurements were obtained.