There are many URL shortener web services online, such as Google URL Shortener, Bitly and TinyURL, that marketers and advertisers use to promote their content. They all have one thing in common: they take a long URL, shorten it, and redirect visitors who click the shortened link to the actual content.
Given how well these services are implemented, I wanted to find out how I could achieve the same thing by implementing my own simple version of a URL shortener service.
For a typical URL shortener service to be useful to people who want to publish shareable content, it must do the following:
- It takes a raw URL and uses some string hashing or encoding algorithm to shorten it, and the shortened URL must be a unique value.
- The unique shortened URL maps to the raw URL, so we need to store this mapping in a hash data structure (and keep that structure in the backend).
- When a user clicks on a shortened URL, the service looks up the raw URL by converting the shortened URL into its index key. Once the key is found, the user is redirected to the long URL that was fetched.
- When redirecting to the long URL, we want to record the visitor's browsing data so that we can keep track of how often the URL has been visited. We record which browser user agent the visitor used and increment a counter for each visit.
- Using the data recorded in the previous step, we can query our database to find out how many clicks a shortened URL received, and which visitor agents make up the largest share of its visits, by querying its counter properties.
And that’s it.
With this in mind, I decided to implement it in Python, with Redis as my main data store. Redis supports hashes as one of its main data structures, so it was a natural fit for my design.
1) To get started, I imported the following libraries:
- random: shuffles the letters of URLs using pseudo-random numbers
- redis: the Redis client library for Python
- base64: handles string encoding/decoding for our shortened URLs
- json: converts data objects into valid JSON
2) Then we define and set up the Python class
I named it UrlShortenerService. The class has a couple of variables that we're interested in:
- redis_serv: holds the instance of your local Redis server (installation instructions can be found in the Redis documentation).
- base_url: the base URL that sets the main domain used for shortened URLs, which in our case I called *http://rllytny.url*.
When instantiating our URL shortener service, we point redis_serv at the locally running Redis server (using its default parameters) so that we can start making use of its data structure operations later.
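As a rough sketch of the constructor described above (not the original listing), the class might look something like this. The optional redis_serv argument is my own addition so that the class can be built without a running server, and the host, port and db values are simply Redis's usual local defaults.

```python
class UrlShortenerService:
    """Sketch of the service skeleton; attribute names follow the text above."""

    def __init__(self, redis_serv=None, base_url="http://rllytny.url"):
        if redis_serv is None:
            # Imported lazily so the class can also be constructed with a
            # stand-in client when the redis package or server is unavailable.
            import redis
            # Assumed defaults for a local server: localhost, port 6379, db 0.
            redis_serv = redis.Redis(host="localhost", port=6379, db=0)
        self.redis_serv = redis_serv
        self.base_url = base_url
```

Calling UrlShortenerService() with no arguments connects to the local server, while passing a client explicitly makes it easier to try the class out in isolation.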
3) Next, we implement our method to shorten any long URL
Let me break down my overly simplistic, naive implementation of this method.
- Given any long_url, I convert it into an array of characters, url_str_arr.
- Then I shuffle all the letters in url_str_arr into some random order.
- After shuffling, I grab just the last 10 characters of the shuffled array if it is fairly long, e.g. longer than 20 characters. Otherwise, I use the whole shuffled array.
- I convert the shuffled array into a string, jumbled_url_suffix, which is appended to my base_url as the URL suffix. This gives us the official shortened URL. For example, if I get [a,b,c,x,y,z] as my shuffled array, then my final shortened URL would be the base_url with abcxyz appended.
- Great! The next step is to store the mapping between the shortened URL and the long URL in Redis so that we can later look up the actual URL when a user visits the shortened URL. As we're using a dictionary/hash as our main data structure, the mapping key has to be unique. So we first encode the shortened URL using base64.
- We use the encoded URL as our unique hash key, formatted as shortened.url:%s. The implementation above has some user-defined Redis keys that I use for indexing long URLs into their respective key/value pairs. The url_string_formatter is simply a convenience method for managing the hash key formats that I use frequently in this tool.
- Once I have the new hash key shortened_url_key, I'm ready to save the key/value pair of shortened_url_key and long_url.
- Next, I add the encoded shortened URL to a list of all the shortened URLs created so far in the Redis database, using global:URLs as my key.
- Finally, both the shortened URL and its encoded_url counterpart are returned, as I need them for some URL link operations later in the program.
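The shuffle-and-encode steps above can be sketched without Redis at all. This is a minimal illustration under a few assumptions of mine: the suffix is joined to base_url with a slash, the URL-safe base64 alphabet is used, and this url_string_formatter signature is hypothetical rather than the original one.

```python
import base64
import random


def make_short_url(long_url, base_url="http://rllytny.url", rng=random):
    """Shuffle the URL's characters and keep at most the last 10 of them."""
    url_str_arr = list(long_url)
    rng.shuffle(url_str_arr)
    # Arrays longer than 20 characters are cut down to their last 10.
    if len(url_str_arr) > 20:
        url_str_arr = url_str_arr[-10:]
    jumbled_url_suffix = "".join(url_str_arr)
    # Assumption: base URL and suffix are joined with a slash.
    return "%s/%s" % (base_url, jumbled_url_suffix)


def url_string_formatter(key_format, short_url):
    """Base64-encode the short URL and place it into a key format string."""
    encoded_url = base64.urlsafe_b64encode(short_url.encode("utf-8")).decode("ascii")
    return key_format % encoded_url, encoded_url


short_url = make_short_url("https://example.com/some/very/long/path?with=query")
shortened_url_key, encoded_url = url_string_formatter("shortened.url:%s", short_url)
```

Note that a random shuffle does not by itself guarantee a unique suffix, and the suffix can contain characters such as "/" or "?", so a real service would probably want to check whether the key already exists before saving it.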
Whew! That was a bit of a mouthful. Moving on.
3) Our method to expand a shortened URL.
This is fairly straightforward. Given any shortened URL, we retrieve its original long URL by looking up the shortened URL key that we created in the previous method.
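A lookup along these lines might look like the sketch below. It assumes the key was built as shortened.url:%s from the URL-safe base64 form of the short URL, and the function signature is my own. The client only needs a get(key) method, so a redis-py client works, and a plain dict is enough for a dry run.

```python
import base64


def expand_url(client, short_url):
    """Return the long URL stored for short_url, or None if it is unknown."""
    encoded_url = base64.urlsafe_b64encode(short_url.encode("utf-8")).decode("ascii")
    long_url = client.get("shortened.url:%s" % encoded_url)
    # redis-py returns bytes unless the client was created with
    # decode_responses=True, so normalise to str here.
    if isinstance(long_url, bytes):
        long_url = long_url.decode("utf-8")
    return long_url
```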
4) Now, what happens when a user visits a shortened URL.
Here, we have a couple of things going on.
- When a user clicks the shortened URL, we want to record the browser user agent they are using. This gives us more accurate information about which browsers people use to open the shortened links, and helps us distinguish which browsers are 'popular'. That is very helpful if we want to produce analytics reports on the most frequently visited links at any given point in time. In this implementation, the database key url_visitors_list stores the user (or visitor) agents for a particular shortened URL in Redis.
- We also want to record the total number of unique clicks for the same shortened URL. Again, this is useful for browsing analytics reports.
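The two bookkeeping steps above map naturally onto the Redis LPUSH and INCR commands. The sketch below is only an illustration: the key format strings are hypothetical (the original ones are not shown here), and InMemoryStore is a tiny stand-in of mine so the example runs without a server. A redis-py client exposes lpush and incr methods with the same shape.

```python
from collections import defaultdict


class InMemoryStore:
    """Tiny stand-in that mimics the two Redis commands used below."""

    def __init__(self):
        self.lists = defaultdict(list)
        self.counters = defaultdict(int)

    def lpush(self, key, value):
        self.lists[key].insert(0, value)  # LPUSH prepends, so newest comes first
        return len(self.lists[key])

    def incr(self, key):
        self.counters[key] += 1  # INCR adds one and returns the new value
        return self.counters[key]


def visit(client, encoded_url, user_agent):
    """Record the visitor's user agent and bump the click counter."""
    # Hypothetical key formats, chosen for this sketch only.
    url_visitors_list = "visitors:%s:url" % encoded_url
    url_clicks_counter = "clicks:%s:url" % encoded_url
    client.lpush(url_visitors_list, user_agent)
    return client.incr(url_clicks_counter)


store = InMemoryStore()
visit(store, "abc123", "Mozilla/5.0")
total = visit(store, "abc123", "curl/8.0")  # second visit to the same URL
```

Note that a plain counter like this counts every click; counting truly unique visitors would need something extra, such as a set of visitor identifiers.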
5) Next, we have our counter properties from the Redis server.
These are our data query operations on the Redis server. We're interested in the following:
- We retrieve the total number of unique clicks for a given shortened URL.
- We retrieve the list of visitor agents that most recently visited the same shortened URL by looking up the url_visitors_list key.
- We retrieve all the shortened URLs our URL shortener tool has made so far.
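These three queries could be sketched roughly as follows. The function names, the key formats for clicks and visitors, and the choice of a Redis list for global:URLs are all assumptions on my part. StubClient is a stand-in with preloaded data; redis-py clients provide get and lrange methods with the same arguments.

```python
class StubClient:
    """Stand-in exposing the two read commands used below, for a dry run."""

    def __init__(self, strings, lists):
        self.strings = strings
        self.lists = lists

    def get(self, key):
        return self.strings.get(key)

    def lrange(self, key, start, end):
        items = self.lists.get(key, [])
        # Redis treats the end index as inclusive, and -1 as the last element.
        return items[start:] if end == -1 else items[start:end + 1]


def total_clicks(client, encoded_url):
    """Click counter for one shortened URL (0 if it was never visited)."""
    count = client.get("clicks:%s:url" % encoded_url)
    return int(count) if count is not None else 0


def recent_visitors(client, encoded_url, limit=10):
    """The most recent user agents, newest first if they were LPUSHed."""
    return client.lrange("visitors:%s:url" % encoded_url, 0, limit - 1)


def all_short_urls(client):
    """Every shortened URL recorded under the global:URLs key."""
    return client.lrange("global:URLs", 0, -1)
```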
Again, fairly straightforward.
Now, here’s the fun part.
Let’s put it all together by running our web traffic simulation.
First and foremost, run your local redis-server.
Here are the few lines of our simulation code.
We instantiate our URL shortener service. We feed it some URLs we want to shorten by reading the input file URLs-to-read.txt. Once the shortened URLs are generated and written to Redis, we run our simulation program to have visitors visit all the available shortened URLs.
Next is the readInputFile method implementation.
It looks pretty straightforward. It reads each line of input to get a long URL, shortens it, and returns the mapping between each shortened URL and its expanded URL, based on the encoded URL.
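A file-reading helper in this spirit might look like the sketch below. The snake_case names and the shorten callback are my own choices rather than the original readInputFile signature; the callback stands in for whatever function produces the shortened URL.

```python
def shorten_lines(lines, shorten):
    """Map each shortened URL to the long URL it came from."""
    mapping = {}
    for line in lines:
        long_url = line.strip()
        if not long_url:
            continue  # skip blank lines
        mapping[shorten(long_url)] = long_url
    return mapping


def read_input_file(path, shorten):
    """Read one long URL per line from a text file and shorten each one."""
    # A text file object yields one line per iteration.
    with open(path, encoding="utf-8") as handle:
        return shorten_lines(handle, shorten)
```

Keeping the line handling separate from the file handling means the same logic can be exercised with a plain list of strings.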
Finally, we have our simulation visitors_visiting method implementation.
Here, in the first iteration, I simulate 5 visitors who browse through all the shortened URLs held in a hash from Redis, e.g. url_shortener_service.short_urls. For each visited shortened URL, we record the visitor's agent information, if any, and the unique count, and store them as the recent_visitors count in Redis.
And finally, in the last iteration, we retrieve the total count of unique recent visitors for each shortened URL.
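The shape of that simulation can be sketched as below. This is an illustration only: the user agent values are made up, and the visit callback is a hypothetical hook that would, in the real tool, record the visit in Redis. Here it just counts clicks in a dict.

```python
import random

USER_AGENTS = ["Firefox", "Chrome", "Safari", "Edge"]  # illustrative values only


def visitors_visiting(short_urls, visit, visitors=5, rng=random):
    """Have each simulated visitor open every short URL once."""
    for _ in range(visitors):
        for short_url in short_urls:
            # Each visit uses a randomly chosen user agent.
            visit(short_url, rng.choice(USER_AGENTS))


clicks = {}


def count_visit(short_url, user_agent):
    """Stand-in for the Redis-backed visit method: count clicks per URL."""
    clicks[short_url] = clicks.get(short_url, 0) + 1


visitors_visiting(["http://rllytny.url/abc", "http://rllytny.url/xyz"], count_visit)
```

With 5 visitors and 2 shortened URLs, each URL ends up with 5 recorded clicks.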
That’s all. You can see the implementation on my GitHub account.
As the next step for this tool, I plan to convert it into an actual URL shortener service using Flask, a micro web framework for Python, and extend its core functionality with user browser tracking capabilities and analytics.
Till next time - Happy Coding!