Power of Eloquence

Filter, Map and Reduce functions - the Python way


Whenever a new tool, framework or methodology comes along, developers, with their insatiable appetite for knowledge and power, will find ways to carve out dedicated time to learn how it works and how they plan to use it as part of their day-to-day job.

And functional programming (FP), a newish paradigm, has been permeating the developer community for some time now; everything from Haskell, Elixir, React and AWS Lambdas to Clojure.

Or, at least, it has yet to establish some norms within the community…

But I digress.

Some Refresher Points

After dabbling with Javascript/React for a while now, every JS developer would be inclined to tell you that map, filter and reduce are the default go-to tools for expressing their FP-ness all over their front-end codebase.

You’ve probably seen those patterns numerous times via Google searches, or in countless Medium or FreeCodeCamp tutorial blog posts and whatnot.

// Typical start of your FPer day

const stayAtHomePersons = [
  {
    id: 1,
    name: 'Josh Hamish',
    cookingScore: 90,
    exerciseScore: 5,
    isSelfIsolating: true
  },
  {
    id: 2,
    name: 'Blake Lively',
    cookingScore: 10,
    exerciseScore: 80,
    isSelfIsolating: true
  },
  {
    id: 3,
    name: 'Ken Jeong',
    cookingScore: 0,
    exerciseScore: 90,
    isSelfIsolating: false
  }
];

// No new surprises here.
const totalSAHPersonScore = stayAtHomePersons
  .filter(person => person.isSelfIsolating)
  .map(person => person.cookingScore + person.exerciseScore)
  .reduce((acc, score) => acc + score, 0);

In the JS world, this is how we roll with our F.M.R. mojo. 👩‍💻👨‍💻😎

Now, let’s hop over the Python fence and see how they handle things over there.

Disclaimer: Before I proceed further, I just want to say that, at the time of this writing, the entire online community is facing one of the most unprecedented challenges of our lifetimes, and we are all doing our best to treat things with great caution. The examples I’ll be using next may upset some readers, and I do not mean to cause any discontent. The purpose of this blog is purely to demonstrate my learnings over the past few weeks since the start of the global pandemic, and I wish to reiterate that my platform will be used for this purpose only. I apologise in advance.

Beginning steps

Lately, I have been dabbling with how Python handles its own filter, map and reduce functions, and I couldn’t help but look at the following datasets provided by the Johns Hopkins University GitHub repo on the latest covid19 cases.

/** Daily Covid 19 Cases - NOT ACTUAL DATA **/
[
    {
        "Province/State": "Hubei",
        "Country/Region": "Mainland China",
        "Last Update": "2020-03-01T10:13:19",
        "Confirmed": "1000",
        "Deaths": "10",
        "Recovered": "50",
        ...
        ...
    },
    {
        "Province/State": "",
        "Country/Region": "South Korea",
        "Last Update": "2020-03-01T23:43:03",
        "Confirmed": "100",
        "Deaths": "1",
        "Recovered": "2",
        ...
        ...
    },
    {
        "Province/State": "",
        "Country/Region": "Italy",
        "Last Update": "2020-03-01T23:23:02",
        "Confirmed": "20",
        "Deaths": "2",
        "Recovered": "1",
        ...
        ...
    },
    {
        "Province/State": "Guangdong",
        "Country/Region": "Mainland China",
        "Last Update": "2020-03-01T14:13:18",
        "Confirmed": "20",
        "Deaths": "3",
        "Recovered": "10",
        ...
        ...
    },

    /* and the rest....*/
]

Their data was originally in CSV format, so I wrote my own little Python script that feeds on these CSV files and has them fully JSONified. From there, I could see an immediate pattern going on.
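
For context, here's a rough sketch of the kind of CSV-to-JSON conversion I mean; the file paths are hypothetical placeholders, not the repo's actual layout.

import csv
import json

# a rough sketch of the CSV-to-JSON step described above
def csv_to_json(csv_path, json_path):
    with open(csv_path, newline='') as csv_file:
        rows = list(csv.DictReader(csv_file))  # each row becomes a dict keyed by the CSV headers
    with open(json_path, 'w') as json_file:
        json.dump(rows, json_file, indent=4)

csv_to_json('03-01-2020.csv', '03-01-2020.json')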

The data reveals that a lot of daily cases are collected from every state and province in each country, and are individually reported in categories of confirmed, recovered and death cases. With this, my immediate thought was that this would be a good opportunity to put Python’s FP tools to good use.

To start off, I quickly whipped up my own mini API app using Flask/Falcon (which one doesn’t really matter, to be honest).

import re

import falcon

'''
Start a Flask/Falcon app starting with a resource endpoint:
'''

class DailyCovid19Resource(object):

    '''
    Pick one of the covid19 daily JSON resources as a date format
    '''

    def on_get(self, req, resp, file_id):
        try:
            format_matched = re.match(
                r"((0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-[12]\d{3})", file_id)  # mm-dd-yyyy format
            if not format_matched:
                raise ValueError('wrong file_id pattern')

            data = fetch_json_data(file_id)

            country_list = []
            for dic in data:
                if 'Country_Region' in dic:
                    country_list.append(dic['Country_Region'])
                elif 'Country/Region' in dic:
                    country_list.append(dic['Country/Region'])

            unique_countries = list(set(country_list))

            new_data = list(map(get_covid19_schema, data)) # -> this is where it's starting to get interesting
			      ......
			      ......

api = application = falcon.API()

daily_covid19 = DailyCovid19Resource()
api.add_route('/covid19/{file_id}', daily_covid19)

In this example, I made my resource endpoint covid19, and it expects file_id as the main parameter used to look up the specific json file to be fetched from the server.

In order to make the right (and exact) search for the specific covid19 json file, I decided to add a bit of regex here just to make sure file_id matches the date format my json files are named with, i.e. mm-dd-yyyy. That way, Python’s exception handling captures the error should file_id fail to meet the pattern: the raised ValueError halts the entire GET resource operation.
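
To illustrate (with made-up values, not the app’s actual requests), the same pattern accepts a well-formed mm-dd-yyyy id and rejects anything else.

import re

# toy check of the mm-dd-yyyy pattern used by the endpoint
pattern = r"((0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-[12]\d{3})"
print(bool(re.match(pattern, "03-22-2020")))  # True
print(bool(re.match(pattern, "22-03-2020")))  # False, day and month swapped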

Once the file_id matching completes, we can make the fetch_json_data call (as below), which fetches the correct json file off the server,

import json

# fetch the json data file and deserialise it from json
def fetch_json_data(file_id):
    source_file = './some_folder/{}.json'.format(file_id)
    with open(source_file) as covid19_json_file:
        data = json.load(covid19_json_file)
        return data
# list of unique countries
unique_countries = list(set(country_list))

Once the data is returned from the fetch_json_data call, I start building out my list of unique countries using a List and Set combo. This new list will be used later to traverse the countries’ covid19 case data that matches the dict key Country_Region or Country/Region. There are two kinds of keys because the Johns Hopkins University dataset revised its representation between the early Coronavirus cases of January 2020 and March 2020.

Beforehand, I need to determine the covid19 schema dictionary I want to extract from each item in the array, using get_covid19_schema like so.

# Map functions to perform against list to collect row's properties we want
new_data = list(map(get_covid19_schema, data))

......
......

def get_covid19_schema(item_dict):
    schema_keys = ['Country_Region', 'Country/Region',
                   'Confirmed', 'Deaths', 'Recovered']
    mapped_dict = dict((k, item_dict[k])
                       for k in schema_keys if k in item_dict)

    converted_dict = convert_values_to_int(mapped_dict)

    return converted_dict

# as some of the figures don't have numerical values
def convert_values_to_int(item_dict):
    keys = ['Confirmed', 'Deaths', 'Recovered']

    for k in keys:
        # fix data that has empty string
        item_dict[k] = int(item_dict[k]) if item_dict[k] != "" else 0
    return item_dict

See how I’m using the map function by passing both a function and an iterable as parameters? This tells map to apply that function to each element of the iterable and hand back the results. According to the API docs, map’s output comes back as a map object. That means it’s only a lazy iterator, and we need an intermediary call to convert it into the actual collection of transformed elements, be it a list, set or tuple. In this case, it’s going to be a list of my dictionary results.
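
Here’s a tiny sketch of that behaviour, using toy data of my own rather than the covid19 set.

# map() hands back a lazy map object, not a list
scores = ['90', '5', '80']
mapped = map(int, scores)

print(mapped)        # <map object at 0x...>
print(list(mapped))  # [90, 5, 80]
print(list(mapped))  # [] because the iterator is already exhausted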

This is fascinating because, coming from the JS world, we don’t need intermediary calls when we perform map functions: Array.prototype.map eagerly returns a brand new array, and JS doesn’t come with the richer data structures (ES6 Sets anyone 🤔?) that Python has. Arrays are the most basic and most popular structure JS developers reach for by default, so it’s easy to see why we never had to worry about the proxied result set Python has to come to grips with… 😶

Later on, now that we’ve got the new_data list of countries, each with different states/provinces carrying different covid19 case data, the next job is to aggregate the total number of covid19 cases in each category for each unique country, which I get from my unique_countries list. Each country’s aggregate numbers take this form.

{
        "CountryRegion": "Some Country",
        "Confirmed": "total x Numbers",
        "Deaths": "total y Numbers",
        "Recovered": "total z Numbers",
}

Then, I put them all together into one big list as the payload response to the client side.

To achieve this goal, I figured filter and reduce were the right tools for the job…

And this is what I came up with.

from functools import reduce

new_data_with_sums = []

for uc in unique_countries:
    country_total_dict = reduce(
        (lambda x, y: sum_up_totals_by(uc, x, y)), list(
            filter(lambda x: filter_by_country(uc, x), new_data)))
    new_data_with_sums.append(country_total_dict)

There’s a lot happening here, which I’ll explain.

  1. I start looping through each country uc from the list of unique countries.
  2. For each country I pick, I filter the new_data list of country cases down to the selected country and its individual covid19 cases. The returned uc cases then need to be converted into a list, because some countries in the dataset have several provinces/states (i.e. China, Russia or the USA) whose case numbers are reported separately per geographical area.
  3. Because of this, I need to aggregate these covid19 cases from each province/state of the same country to get the complete total number of cases for that country.
  4. Once the total cases for that country are found, I add them to the new_data_with_sums list.
  5. Go back to step 1 and repeat the process all over again until I reach the end of the unique_countries list.
# This is Step 2 operation
filtered_data_by_uc = list(filter(lambda x: filter_by_country(uc, x), new_data))

The above snippet says that, with the new_data list, I apply the filter function by passing the filter_by_country callback, which is responsible for filtering out countries in that list. To filter the data correctly, I rely on closures over uc (gathered from the unique_countries loop earlier) and the current element (x) being iterated upon.

This is equivalent to JS snippet below.

const filteredList = new_data.filter(filterByCountry(uc));

Notice they’re conceptually the same (the JS version assumes filterByCountry is curried, returning a predicate), but the syntax between the two clearly differs. The lambda keyword is Python’s equivalent of Javascript’s anonymous functions, and both make use of closures, so the two have that much in common.
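
If the curried style looks unfamiliar, here’s a toy Python sketch of the same closure idea (the names are my own, not from the app): the returned lambda keeps hold of country long after make_predicate has finished.

# a factory that returns a predicate closing over `country`
def make_predicate(country):
    return lambda row: row.get('Country/Region') == country

rows = [
    {'Country/Region': 'Italy', 'Confirmed': 20},
    {'Country/Region': 'South Korea', 'Confirmed': 100},
]

print(list(filter(make_predicate('Italy'), rows)))
# [{'Country/Region': 'Italy', 'Confirmed': 20}]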

And notice we put list as a wrapper around the filter function? filter also returns a lazy iterator, a filter object this time, much like the map example earlier. You can convert it into whichever collection type you want as well.
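
A quick toy example of that laziness and of the different conversions.

numbers = [1, 2, 3, 4, 5]
evens = filter(lambda n: n % 2 == 0, numbers)

print(evens)        # <filter object at 0x...>
print(list(evens))  # [2, 4]

# or materialise it as whichever collection suits you
print(set(filter(lambda n: n % 2 == 0, numbers)))    # {2, 4}
print(tuple(filter(lambda n: n % 2 == 0, numbers)))  # (2, 4)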

Here’s the filter_by_country implementation.

def filter_by_country(uc, item_dict):
    country_keys = ['Country_Region', 'Country/Region']

    # returning the matching dict (a truthy value) is enough for filter to keep it;
    # anything else falls through and returns None (falsy), so the row is dropped
    for k in country_keys:
        if k in item_dict and item_dict[k] == uc:
            return item_dict

With that in mind, we take filtered_data_by_uc and pipe it into our next action step.

# Step 3
from functools import reduce

country_total_dict = reduce(
        (lambda x, y: sum_up_totals_by(uc, x, y)), filtered_data_by_uc)

Here, we use the reduce function to take in the filtered_data_by_uc list and aggregate each country’s covid19 case numbers (using the sum_up_totals_by callback) to get the total sums of confirmed, recovered and death cases, producing a standalone dict object for that particular country.

Here’s the implementation of sum_up_totals_by.

def sum_up_totals_by(uc, x, y):
    keys = ['Confirmed', 'Deaths', 'Recovered']
    result = {'Country_Region': uc}

    for k in keys:
        result[k] = x[k] + y[k]

    return result

Again, the reduce function’s signature is pretty much the same as its filter and map counterparts, i.e. using lambdas, closures and so on, with the minor difference that it doesn’t return a list or any similar iterable object, but rather a single value (or an accumulator, if you like) after applying the lambda function as above.

This comes off as expected as any FP developers from Haskell, Scala, Erlang, Clojure communities etc would tell you that’s how they live and breathe writing this kind of code - like a boss 😎.

But with the reduce function, you can also return an iterable object as the accumulator result.
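
For instance, a throwaway sketch (toy data again, not the covid19 set) where the accumulator is a list that grows on every step.

from functools import reduce

words = ['alpha', 'beta', 'gamma']

# the third argument seeds the accumulator with an empty list,
# and each step returns a new, longer list
lengths = reduce(lambda acc, w: acc + [len(w)], words, [])
print(lengths)  # [5, 4, 5]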

Which led me to think I didn’t need the for-loop to append each unique country’s aggregate results into new_data_with_sums.

I could simply rewrite to this.

from functools import reduce

new_data_with_sums = []

# notice the third argument is introduced to this reducer function
new_data_with_sums = reduce(lambda acc, current: sum_up_totals_each_country(acc, current, new_data),
                        unique_countries, new_data_with_sums)

With the introduction of the new sum_up_totals_each_country lambda function, I’ve basically moved the filter_by_country filtering inside that function, where (by using closures and callbacks) it gains access to the raw new_data and to each country from unique_countries to perform the list filtering from there.

Once the filtering is complete, using acc, which is a proxy for new_data_with_sums, I traverse the same filtered list to accumulate the total case results for a country that has multiple states and provinces, and append the result to acc.

Here’s the implementation of sum_up_totals_each_country.

def sum_up_totals_each_country(acc, current, new_data):
    keys = ['Confirmed', 'Deaths', 'Recovered']

    # the same list filtering as before
    each_country_covid19_list = list(
        filter(lambda x: filter_by_country(current, x), new_data))

    # some countries have multiple states and provinces of data hence for this nested looping
    country_case_totals = {}
    for each_country_covid19 in each_country_covid19_list:
        for k in keys:
            temp = each_country_covid19[k]
            if k in country_case_totals:
                country_case_totals[k] = temp + country_case_totals[k]
            else:
                country_case_totals[k] = temp

    # fixed property name, as I don't need the 'Country/Region' alias to be returned in the response payload
    country_case_totals['Country_Region'] = current

    acc.append(country_case_totals)
    return acc

And the rest is history.

Concluding Thoughts

Notice how Python’s very own functional tools (map, filter and reduce) are slightly different from their Javascript counterparts, not so much at the conceptual level as at the syntactical level.

Though it is said that both Javascript and Python treat functions as first-class citizens (unlike Java, where everything is 100% OOD, boo!), the thing I found stands out most between the two is that Javascript supports method chaining for iterables right out of the box.

Python doesn’t do that by design for its own set of iterables when doing functional programming. I always wondered why I had to write the code above by wrapping functions one on top of another… which led me to the fact that Python’s design philosophy has been driven in an imperative style for a long time. It was never meant to go down the FP route, or so says this blog post from the original Python creator himself. Reading that may explain the deep culture of imperative-style coding Pythonistas have practised for a very long time, using tools like itertools and list comprehensions for more efficient data looping instead, as the sketch below shows.
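
To illustrate what that more idiomatic route can look like, here’s how the stay-at-home example from the top of this post might be written the comprehension way (a sketch of my own, not a prescription).

stay_at_home_persons = [
    {'name': 'Josh Hamish', 'cooking_score': 90, 'exercise_score': 5, 'is_self_isolating': True},
    {'name': 'Blake Lively', 'cooking_score': 10, 'exercise_score': 80, 'is_self_isolating': True},
    {'name': 'Ken Jeong', 'cooking_score': 0, 'exercise_score': 90, 'is_self_isolating': False},
]

# filter + map + reduce collapsed into a generator expression and the built-in sum()
total_sah_person_score = sum(
    p['cooking_score'] + p['exercise_score']
    for p in stay_at_home_persons
    if p['is_self_isolating']
)
print(total_sah_person_score)  # 185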

In spite of that, it’s still evident we are given much the same FP toolset to accomplish here what I have been doing on the JS/React front for some considerable time. IMHO, given how multi-paradigm languages evolve and influence each other (just like how Python influenced Javascript to have Pythonic flavours), and now that Python 2 was sunset at the beginning of 2020, who knows where Python 3 will move forward from here. I strongly believe you can still teach an old dog some new tricks. If Javascript can learn to be a Pythonista by day, I don’t see why Python can’t come out of its comfort shell to learn a new toolset from other developer communities that addresses some of its limitations for modern problems.

I’m pretty excited about what lies ahead for Python as a whole, and for many others. I can’t wait to start using more of these toolsets in future projects. (Scala anyone? 😌😉)

Till then, Happy Coding (and remember: stay safe and follow your social distance-coding rules)!
