Python package showcasing Secret Santa

We are pleased to announce the initial public release of our Python tutorial package showcasing a basic Secret Santa application. Check it out on GitHub!

At Mirai Solutions, we have abundant experience building models and analytic applications using the R language and its various interfaces, most prominently to C++, but also to Java or Spark. In recent years, like many others, we have seen Python become the leading language for some of the hottest areas in data science. Along the way, I also started getting more into Python through packages such as TensorFlow and PySpark. Prototyping in Jupyter notebooks is a lot of fun, but eventually I also had to get my head around what some would call the software engineering side of data science: packaging Python code in a way suitable for deployment and production.

General concepts obviously remain valid across many languages, but implementations and usage can vary a lot. To make things easier for others, especially those who do not have the time or opportunity to look into the details themselves, I took a company-internal Secret Santa event as the perfect occasion to set up an exemplary Python tutorial package :). This would give everyone a good kick-start and a number of references related to the topic. The idea was to stick to the most common and popular tools among the sometimes large selection of choices. I did not aim to provide the most detailed explanation, but rather a concise worked example to copy from, for those who might be overwhelmed by the abundance of material available on each of the individual aspects involved.

Today, I am happy to make this tutorial package publicly available! We believe it will help Python beginners who struggle with packaging and maintaining their projects. If you haven’t heard about the general concepts before, it might be hard to follow, but if you have been doing similar things with R already, this should give you all you need. The README contains a walkthrough, and the package illustrates common concepts such as dependency management, testing, documentation and more. Further topics will be covered in the near future, with continuous integration and publishing to PyPI being the next ones. If you like what you see, make sure to check back soon! Also, if you have any ideas or wishes for improvement, I am eager to hear about them or receive contributions.

Contents overview

File / Directory          Purpose
docs                      directory containing source files used to generate documentation with Sphinx
secretsanta               directory containing the Python modules and code
tests                     directory containing the Python unit tests that test the code in secretsanta
SecretSanta.ipynb         Jupyter notebook to conveniently use and showcase the secretsanta package
requirements-package.in   package dependencies definition
requirements.in           module dependencies definition
setup.py                  a Python script containing instructions used to install the secretsanta package
tox.ini                   configuration for several testing and code style frameworks

Coding tips

The code itself also contains a few documented and explained examples of some basic Python coding concepts, such as:

myDict = dict(a=1, b=2, c=3)

# Dictionary (or list) comprehensions are an alternative to using loops. As
# shown below, it is possible to use expressions to define what to iterate over,
# what to return and also whether a condition must be met for each item
# (otherwise it is not returned).
myDictSq = {key: value ** 2 for key, value in myDict.items() if value > 1}

print(myDictSq)
## {'b': 4, 'c': 9}
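To make the comparison with loops concrete, here is a minimal sketch that builds the same result three ways. The variable names (my_dict, squares_loop, squares_comp, squares_list) are chosen for illustration and are not taken from the package.

```python
my_dict = dict(a=1, b=2, c=3)

# Explicit loop: keep only values greater than 1 and square them.
squares_loop = {}
for key, value in my_dict.items():
    if value > 1:
        squares_loop[key] = value ** 2

# The same result written as a dictionary comprehension.
squares_comp = {key: value ** 2 for key, value in my_dict.items() if value > 1}

# A list comprehension follows the same pattern but builds a list.
squares_list = [value ** 2 for value in my_dict.values() if value > 1]

print(squares_loop == squares_comp)  # True
print(squares_list)                  # [4, 9]
```

The comprehension says the same thing as the loop in a single expression, which is usually easier to read as long as the condition and the returned expression stay short.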

Type hints using mypy

Type hints are also covered; they can help reduce the chance of running into corner cases and bugs. For example, consider the following function, saved in a file of the same name (wrong_type_hint.py):

def wrong_type_hint(x: int, y: int) -> int:
  return x / y

While the inputs can indeed be restricted to integers, the same is not true of the return value: in Python 3, the / operator returns a float even when both operands are integers. Running mypy on this file reveals the problem:

mypy wrong_type_hint.py

Note that, depending on your system, the bare mypy command above might run under a Python 2 installation. To call mypy explicitly under Python 3, use this instead:

python3 -m mypy wrong_type_hint.py
## wrong_type_hint.py:2: error: Incompatible return value type (got "float", expected "int")
## Found 1 error in 1 file (checked 1 source file)

Once we fix the function by changing the return type from int to float, it passes the mypy check:

def working_type_hint(x: int, y: int) -> float:
  return x / y

python3 -m mypy working_type_hint.py
## Success: no issues found in 1 source file
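As a complementary sketch: if an integer result is what you actually want, floor division keeps the int return annotation valid, since // applied to two integers always yields an integer. The function names int_ratio and float_ratio below are hypothetical and not part of the package.

```python
def int_ratio(x: int, y: int) -> int:
    # Floor division of two ints always returns an int,
    # so the "-> int" annotation is accurate here.
    return x // y


def float_ratio(x: int, y: int) -> float:
    # True division returns a float, even when the result is a whole number.
    return x / y


print(int_ratio(7, 2))                    # 3
print(float_ratio(7, 2))                  # 3.5
print(type(float_ratio(4, 2)).__name__)   # float
```

Keep in mind that type hints are not enforced at runtime; a static checker such as mypy has to be run separately to catch mismatches.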

Documentation using Sphinx

We also introduce basics such as docstrings, along with hints about some non-obvious details:

def simple_ratio(x: float, y: float) -> float:
  # "\" is used in the docstring to escape the line ending in sphinx output
  """
  calculates the ratio between two numbers
  
  :param x: the numerator
  :param y: the denominator
  :return: ratio of the two numbers returned as a floating\
  point number
  
  :Example:
  
  >>> result = simple_ratio(3.0, 4.0)
  >>> result
  0.75
  """
  return x / y
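The example embedded in the docstring above can also be executed as a test with the standard library's doctest module. The following is a minimal sketch, assuming a shortened copy of simple_ratio defined in the same file; whether the package itself runs its docstring examples this way is not stated here.

```python
import doctest


def simple_ratio(x: float, y: float) -> float:
    """
    calculates the ratio between two numbers

    >>> simple_ratio(3.0, 4.0)
    0.75
    """
    return x / y


# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(simple_ratio, name="simple_ratio",
                        globs={"simple_ratio": simple_ratio}):
    runner.run(test)

# summarize() returns the number of failed and attempted examples.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

For a whole file, running python3 -m doctest on it from the command line does the same thing and prints nothing when all examples pass.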

Unit tests using pytest

Subsequently, we can define a small unit test for the function to make sure it behaves as expected:

from unittest import TestCase
from simple_ratio import simple_ratio


class SimpleUnitTest(TestCase):
    def test_simple_ratio_returns_expected_result(self):
        # Note the '.0': in Python 3, dividing two integers with / yields a
        # float, but that was not yet the case in Python 2, which would do
        # an integer division if 3 and 4 were used instead.
        res = simple_ratio(3.0, 4.0)
        assert res == 0.75

We can run the unit test with pytest:

python3 -m pytest test_simple_ratio.py
## ============================= test session starts ==============================
## platform linux -- Python 3.6.9, pytest-5.3.1, py-1.8.0, pluggy-0.13.1
## cachedir: tmp/.pytest_cache
## rootdir: /
## collected 1 item
## 
## test_simple_ratio.py .                                                   [100%]
## 
## ============================== 1 passed in 0.01s ===============================
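pytest also collects plain functions whose names start with test_, so a TestCase class is not strictly required. The sketch below is an illustration under stated assumptions: it redefines simple_ratio locally to stay self-contained, and the additional test cases (rounding tolerance, zero denominator) are hypothetical rather than taken from the package.

```python
import math


def simple_ratio(x: float, y: float) -> float:
    # Local copy of the function under test, so the sketch runs on its own.
    return x / y


def test_simple_ratio_returns_expected_result():
    assert simple_ratio(3.0, 4.0) == 0.75


def test_simple_ratio_tolerates_rounding():
    # Compare floats with a tolerance instead of exact equality.
    assert math.isclose(simple_ratio(1.0, 3.0), 0.3333333333, rel_tol=1e-9)


def test_simple_ratio_rejects_zero_denominator():
    # Dividing by zero raises ZeroDivisionError for floats as well.
    try:
        simple_ratio(1.0, 0.0)
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")
```

Saved in a file whose name starts with test_, these functions are picked up by the same python3 -m pytest call shown above. With pytest imported, the last test could be written more compactly using the pytest.raises context manager.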