Power of Eloquence

Made URL shortener tool using Python/Redis


There are many URL shortener web services online, such as Google URL Shortener, Bitly and TinyURL, that marketers and advertisers use to promote their content. They all do one thing in common: they take a long URL, shorten it, and redirect visitors who click on the shortened link to the actual content.

Given how well those services are implemented, I wanted to find out how I could achieve the same thing by building my own simple version of a URL shortener service.

For a typical URL shortener service to be useful for people wanting to publish shareable content, it must simply fulfil the following:

  1. It takes a raw URL and applies some string hashing or encoding algorithm that not only shortens the URL but also ensures the shortened URL is unique.
  2. The unique shortened URL maps to the raw URL, so we need to store this mapping in a hash data structure (and persist that structure in the backend).
  3. When a user clicks on a shortened URL, the service looks up the raw URL using the shortened URL as the index key. Once the key is found, the user is redirected to the long URL that was fetched.
  4. On each redirect, we want to record the visitor’s browsing data so that we can keep track of how often the URL has been visited. We record which browser user agent the visitor used and increment a counter for each visit.
  5. From the data recorded in step 4, we can query our database for how many clicks a shortened URL received, and work out which user agents make up the largest share of its visits.

And that’s it.
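
To make the five requirements concrete before bringing Redis in, here is a minimal in-memory sketch. It is not the implementation described in this post: the `TinyShortener` class, its method names and the use of `secrets.token_urlsafe` to draw a suffix are all assumptions for illustration, and plain dictionaries stand in for the backend store.

```python
import secrets
from collections import Counter


class TinyShortener:
    """Hypothetical in-memory stand-in for the requirements above."""

    def __init__(self, base_url="http://rllytny.url/"):
        self.base_url = base_url
        self.urls = {}            # suffix -> long URL (requirement 2)
        self.clicks = Counter()   # suffix -> number of visits (requirement 4)
        self.agents = {}          # suffix -> Counter of user agents

    def shorten(self, long_url):
        # Requirement 1: draw suffixes until one is unused, so keys stay unique.
        suffix = secrets.token_urlsafe(6)
        while suffix in self.urls:
            suffix = secrets.token_urlsafe(6)
        self.urls[suffix] = long_url
        self.agents[suffix] = Counter()
        return self.base_url + suffix

    def visit(self, short_url, agent="unknown"):
        # Requirements 3 and 4: look up the long URL and record the visit.
        suffix = short_url[len(self.base_url):]
        long_url = self.urls[suffix]
        self.clicks[suffix] += 1
        self.agents[suffix][agent] += 1
        return long_url

    def stats(self, short_url):
        # Requirement 5: total clicks plus a per-agent breakdown.
        suffix = short_url[len(self.base_url):]
        return self.clicks[suffix], dict(self.agents[suffix])


shortener = TinyShortener()
short = shortener.shorten("https://example.com/some/very/long/path?x=1")
shortener.visit(short, agent="Firefox")
shortener.visit(short, agent="Chrome")
shortener.visit(short, agent="Firefox")
print(shortener.stats(short))
```

The rest of the post follows the same shape, with Redis keys taking the place of the dictionaries.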

With this in mind, I decided to implement the tool using Python as the main language and Redis as my main data store. Redis is a key/value store that supports data structures such as strings, lists and hashes, so I naturally picked it as part of my design solution.

To get started, I imported the following libraries:

import random
import redis
import base64
import json
  • random - to shuffle the characters of URLs using pseudo-random numbers
  • redis - the Redis client library for Python
  • base64 - to encode and decode the strings of our shortened URLs
  • json - to serialise data objects into valid JSON strings
  1. Then we define and set up our Python class:
    class UrlShortenerService(object):
        
        # local variables for Redis implementation
        redis_srv = None
    
        # our prefix base_url for our url shortener service
        base_url = 'http://rllytny.url/'
    
        def __init__(self):
            "Initializes self"
            self.redis_srv = redis.StrictRedis(host='localhost', port=6379, db=0)
    We named it UrlShortenerService. The class has a couple of variables that we’re interested in:
  • redis_srv - holds the client instance connected to your local Redis server (installation instructions for the Redis server can be found on the Redis website).
  • base_url - the base URL that defines the domain used for all shortened URLs, which in our case I called http://rllytny.url/

When instantiating our URL shortener service, we point redis_srv at the locally running Redis server (using its default parameters) so that we can make use of its data structure operations later.

self.redis_srv = redis.StrictRedis(host='localhost', port=6379, db=0)
  2. Next, we implement our method to shorten any long URL:
    def shorten_url(self, long_url):
        
        # shuffle the characters of the long URL
        url_str_arr = list(long_url)
        random.shuffle(url_str_arr)
        
        # keep the last 10 characters of the shuffled URL if it is longer than 20 characters
        if len(url_str_arr) > 20:
            shortened_url = url_str_arr[-10:]
        else:
            shortened_url = url_str_arr
    
        jumbled_url_suffix = ''.join(shortened_url)
        shortened_url = self.base_url + jumbled_url_suffix
    
        # encode shortened_url before saving
        encoded_url = encode_base64(shortened_url)
    
        shortened_url_key = url_string_formatter(self.redis_shortened_url_key_fmt, encoded_url)
    
        self.redis_srv.set(shortened_url_key, long_url)
        self.redis_srv.lpush(self.redis_global_urls_list, encoded_url)
    
        return shortened_url, encoded_url

Let me break down the overly simplistic, naive implementation behind this method.

  1. Given any long_url, I convert it into a list of characters, url_str_arr.
  2. Then I shuffle all the characters in url_str_arr into a random order.
  3. After shuffling, I grab the last 10 characters of the shuffled list if the URL is fairly long, e.g. more than 20 characters. Otherwise, I use the whole shuffled list.
  4. I join the resulting list into a string, jumbled_url_suffix, and append it to my base_url to get the final shortened URL. For example, if my shuffled suffix is [a,b,c,x,y,z], the final shortened URL would be http://rllytny.url/abcxyz.
  5. Great! The next step is to store the mapping between the shortened URL and the long URL in our Redis server so that we can look up the actual URL when a user later visits the shortened URL. As the mapping works like a dictionary/hash, each key has to be unique. We first encode the shortened URL using base64 so that it can be used as part of the key.
  6. The encoded URL becomes our unique, identifiable key, formatted from a template of the form shortened.url:%s. The class holds a few user-defined Redis key templates like this one for indexing long URLs as key/value pairs. The url_string_formatter is simply a convenience function for building the key strings that I use frequently in this tool.
  7. Once I have the new key shortened_url_key, I’m ready to save the key/value pair of shortened_url_key and long_url in Redis.
  8. Next, I push the encoded URL onto a list of all the shortened URLs created so far, stored in Redis under the key global:URLs.
  9. Finally, the method returns both shortened_url and its encoded_url counterpart, as I need them for some URL operations later in the program.
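
The helpers `encode_base64` and `url_string_formatter`, and the key templates they work with, are used above but not listed in this post. The sketch below shows one plausible shape for them, under stated assumptions: only `shortened.url:%s` and `global:URLs` come from the walkthrough, while the other two templates, the constant names and the choice of URL-safe base64 are guesses for illustration.

```python
import base64

# Key templates. 'shortened.url:%s' and 'global:URLs' are named in the
# walkthrough above; the other two are hypothetical placeholders.
REDIS_SHORTENED_URL_KEY_FMT = 'shortened.url:%s'
REDIS_GLOBAL_URLS_LIST = 'global:URLs'
REDIS_VISITORS_LIST_FMT = 'visitors:%s:url'   # assumed
REDIS_CLICKS_COUNTER_FMT = 'clicks:%s:url'    # assumed


def encode_base64(text):
    """Return the URL-safe base64 encoding of a str, as a str."""
    return base64.urlsafe_b64encode(text.encode('utf-8')).decode('ascii')


def decode_base64(encoded):
    """Reverse encode_base64; accepts str or bytes (Redis returns bytes)."""
    if isinstance(encoded, str):
        encoded = encoded.encode('ascii')
    return base64.urlsafe_b64decode(encoded).decode('utf-8')


def url_string_formatter(key_format, value):
    """Fill a key template such as 'shortened.url:%s' with a value."""
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    return key_format % value


encoded = encode_base64('http://rllytny.url/abcxyz')
key = url_string_formatter(REDIS_SHORTENED_URL_KEY_FMT, encoded)
print(key)
print(decode_base64(encoded))
```

Note that base64 is a reversible encoding, so two identical shortened URLs still produce the same key. Any uniqueness has to come from the shuffled suffix itself, and a random shuffle alone does not guarantee it.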

Whew! That was a bit of a mouthful. Moving on.

  3. Our method to expand a shortened URL.

    def expand_url(self, shortened_url):
        shortened_url_key = url_string_formatter(self.redis_shortened_url_key_fmt, shortened_url)
        return self.redis_srv.get(shortened_url_key)

    This is fairly straightforward. Given a shortened URL in its encoded form, we rebuild the same key we created in the previous method and fetch the original long URL stored under it.

  4. Now, when a user visits a shortened URL:

    def visit(self, shortened_url=None, ip_address=None, agent=None, referrer=None):
        visitor_agent = {'ip_address': ip_address, 'agent':agent, 'referrer':referrer}
        
        url_visitors_list = url_string_formatter(self.redis_shortened_url_visitors_list_fmt, shortened_url)
        self.redis_srv.lpush(url_visitors_list, json.dumps(visitor_agent))
    
        url_clicks_counter = url_string_formatter(self.redis_shortened_url_clicks_counter_fmt, shortened_url)
        return self.redis_srv.incr(url_clicks_counter)

    There are a couple of things going on here.

  1. When a user clicks the shortened URL, we want to record their browser user agent data. This gives us more accurate information about which browsers people use to follow the shortened links and which of them are ‘popular’. That is very helpful if we want to produce user analytics reports for the most frequently visited links at any point in time. In this implementation, the Redis key url_visitors_list holds a list of visitor agent records for a particular shortened URL.
  2. We also want to record the total number of clicks for the same shortened_url, which is again useful for browsing analytics reports. Note that the counter is incremented on every visit, so it counts all clicks rather than unique visitors.
  5. Next, we have the methods that read our counter properties back from the Redis server.
    # Retrieve counter properties from Redis
    def clicks(self, shortened_url = None):
        url_clicks_counter = url_string_formatter(self.redis_shortened_url_clicks_counter_fmt, shortened_url)
        return self.redis_srv.get(url_clicks_counter)
    
    def recent_visitors(self, shortened_url):
        visitor_agents = []
    
        url_visitors_list = url_string_formatter(self.redis_shortened_url_visitors_list_fmt, shortened_url)
        for visitor in self.redis_srv.lrange(url_visitors_list, 0, -1):
            visitor_agents.append(json.loads(visitor))
        return visitor_agents
    
    def short_urls(self):
        return self.redis_srv.lrange(self.redis_global_urls_list, 0, 100)
    These are our data query operations on the Redis server. What we’re interested in are:
  1. The total number of clicks for a given shortened URL.
  2. The list of visitor agents that visited the same shortened URL, most recent first, fetched in full by looking up the url_visitors_list key.
  3. The shortened URLs our tool has created so far (the first 101 entries of the list, since the LRANGE bounds 0 and 100 are both inclusive).
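
The visitor records returned by recent_visitors are plain dictionaries, so the agent breakdown mentioned in requirement 5 can be computed on the Python side. Here is a small sketch, assuming records shaped like the ones visit() stores; the `agent_share` function and the sample data are hypothetical and not part of the tool.

```python
import json
from collections import Counter


def agent_share(visitor_records):
    """Return {agent: percentage of visits} for a list of visitor dicts."""
    counts = Counter(record.get('agent') or 'unknown' for record in visitor_records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {agent: 100.0 * n / total for agent, n in counts.items()}


# Sample records shaped like the JSON strings that visit() pushes onto the list.
raw = [
    json.dumps({'ip_address': '10.0.0.1', 'agent': 'Firefox', 'referrer': None}),
    json.dumps({'ip_address': '10.0.0.2', 'agent': 'Chrome', 'referrer': None}),
    json.dumps({'ip_address': '10.0.0.3', 'agent': 'Firefox', 'referrer': None}),
    json.dumps({'ip_address': None, 'agent': None, 'referrer': None}),
]
records = [json.loads(item) for item in raw]
print(agent_share(records))
```

Visits recorded without an agent are grouped under 'unknown' so that the percentages still add up to 100.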

Again, fairly straightforward.

Now, here’s the fun part.

Let’s try it all out by running our web traffic simulation.

First and foremost, run your local redis-server.

~ redis-server

Here are the few lines of our simulation code.

def main():
    # instantiate url_shortener_service
    url_shortener_service = UrlShortenerService()

    # read the input text file of long URLs
    readInputFile('urls-to-read.txt', url_shortener_service)

    # simulate and track web visitor activity
    visitors_visiting(url_shortener_service)

if __name__ == '__main__':
    main()

We instantiate our URL shortener service, then feed it the URLs we want to shorten by reading them from the input file urls-to-read.txt. Once the shortened URLs are generated and written to Redis, we run our simulation to have visitors visit all the available shortened URLs.

Here’s the readInputFile method implementation.

def readInputFile(text_file, url_shortener_service):
    with open(text_file, 'r') as infile:
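        for line in infile:
            # Strip the trailing newline so it is not shuffled into the suffix
            long_url = line.strip()
            # Ignore blank lines and any comments in the file
            if long_url and not long_url.startswith('#'):
                shortened_url, encoded_url = url_shortener_service.shorten_url(long_url)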

                expanded_url = url_shortener_service.expand_url(encoded_url)

                print("ShortenedURL: {0}; ExpandedURL: {1}".format(shortened_url, expanded_url))
                
    return

This is pretty straightforward. It reads the input file line by line to get each long URL, shortens it, and prints the mapping between the shortened URL and the expanded URL looked up through the encoded key.
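
The contents of urls-to-read.txt are not shown in this post. Assuming the file holds one URL per line, with lines starting with `#` treated as comments, the line filtering can be tried out on its own as sketched below. The `iter_urls` helper and the sample URLs are hypothetical.

```python
import io


def iter_urls(lines):
    """Yield URLs from an iterable of lines, skipping blanks and '#' comments."""
    for line in lines:
        url = line.strip()
        if not url or url.startswith('#'):
            continue
        yield url


# io.StringIO stands in for the opened 'urls-to-read.txt' file.
sample = io.StringIO(
    "# URLs to shorten\n"
    "https://example.com/a/very/long/path\n"
    "\n"
    "https://example.org/another/long/path?query=1\n"
)
print(list(iter_urls(sample)))
```

Stripping each line matters here because a trailing newline would otherwise be treated as one more character of the URL.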

Finally, we have our simulation visitors_visiting method implementation.

def visitors_visiting(url_shortener_service):

    print('Visitors visiting...')

    for i in range(0, 5):
        print('Visitors: %s' % i)
        for short_url in url_shortener_service.short_urls():
            decoded_url = decode_base64(short_url)
            print('... %s' % decoded_url)
            url_shortener_service.visit(decoded_url)

    print('Recent Visitors')

    for short_url in url_shortener_service.short_urls():
        expanded_url = url_shortener_service.expand_url(short_url)
        decoded_url = decode_base64(short_url)
        print('... %s' % decoded_url)
        visitor_agents = url_shortener_service.recent_visitors(decoded_url)
        print('Total recent visitors for {0} (ie {1}) are {2}'.format(decoded_url, expanded_url, len(visitor_agents)))

    return

Here, in the first loop, I simulate 5 visitors browsing through all the shortened URLs in the Redis list returned by url_shortener_service.short_urls(). For each visited shortened URL, we record the visitor’s agent information, if any, push it onto that URL’s visitors list in Redis and increment its click counter.

Finally, in the last loop, we retrieve the recent visitor records for each shortened URL and print how many there are.
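
If a Redis server is not at hand, the simulation can be dry-run against an in-memory stand-in. The `FakeRedis` class below is a hypothetical sketch, not part of the redis library or of this tool. It only mimics the five commands used in this post (`set`, `get`, `incr`, `lpush`, `lrange`) and makes no attempt to cover the rest of Redis.

```python
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis calls used in this post."""

    def __init__(self):
        self.strings = {}
        self.lists = defaultdict(list)

    def set(self, key, value):
        self.strings[key] = value
        return True

    def get(self, key):
        # Like Redis, return None for a missing key.
        return self.strings.get(key)

    def incr(self, key):
        # INCR treats a missing key as 0 and returns the new value.
        value = int(self.strings.get(key, 0)) + 1
        self.strings[key] = value
        return value

    def lpush(self, key, value):
        # LPUSH prepends, so the newest item ends up at index 0.
        self.lists[key].insert(0, value)
        return len(self.lists[key])

    def lrange(self, key, start, stop):
        # LRANGE bounds are inclusive; a stop of -1 means "to the end".
        items = self.lists.get(key, [])
        end = None if stop == -1 else stop + 1
        return items[start:end]


store = FakeRedis()
for agent in ('Firefox', 'Chrome', 'Firefox'):
    store.lpush('visitors:demo', agent)
    store.incr('clicks:demo')
print(store.lrange('visitors:demo', 0, -1))
print(store.get('clicks:demo'))
```

Assigning a `FakeRedis()` instance to redis_srv in place of `redis.StrictRedis(...)` should be enough for a dry run, with the caveat that a real Redis client returns bytes by default rather than the str and int values this stand-in hands back.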

That’s all. You can see the implementation on my GitHub account.

As the next step for this tool, I plan to turn it into an actual URL shortener service using Flask, a micro web framework for Python, and extend its core functionality with user browser tracking and analytics.

Till next time - Happy Coding!
