AI image generated by Microsoft Bing Image Creator
Happy Belated New Year of 2024!
With the arrival of my shiny new MacBook Pro M2, I will explain in this guide how I set up my machine for a seamless software development experience.
I would have loved to keep my 2017 MacBook Pro (Intel Core i5) and continue using it. But over the years, the CPU and memory demands of modern full-stack development have grown: Docker, heavier Chrome extensions and ever-larger browsers made my 8GB of RAM and 250GB of SSD storage feel too small for any serious development work these days. Had I known Apple makes memory and storage non-upgradeable after purchase, I would have ordered the upgrades upfront and spared myself the terrible grind of waiting for system resources to free up every time I started the Chrome browser or a Docker background process, which often felt like an eternity… ⏰⏱
So, out of necessity, it's time to move on from my old Mac and say hello to my new MacBook Pro M2!
With all that rant of mine being said, let’s get straight into it!
Start by going through the initial macOS setup process. Make sure your system is updated to the latest version to ensure compatibility with the latest development tools.
Verify the system preferences, i.e. Apple Icon -> System Settings -> General -> About
Name: <The name of your MacBook Pro M2>
Chip: Apple M2
Memory: 16 GB
Serial Number: <The unique serial number of your machine>
macOS: macOS Sonoma Version 14+
Prior to writing this post, my machine shipped with macOS Ventura by default after I placed the order overseas. To update the software, go to Apple Icon -> System Settings -> General -> Software Update -> Click Update Now
and you should be good to go.
Homebrew is a powerful package manager for macOS. Install it by running the following command in your terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
After this, assuming you use a ZSH terminal like I do, configure the Homebrew shell settings so it runs natively in ZSH, as below:
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> ~/.zprofile
Run brew update
to make sure you have the latest Homebrew updates.
At this point, I want to bring all of my favourite software development tools from my old MacBook to my new one. The obvious solution, I thought, would be to download them all, extract them and drag them into the Applications folder, but that is boring and repetitive: a manual open/drag/drop process all over the screen. Thanks to brew, I don't have to do this anymore.
Instead, we use brew
for that.
Let's say I want to download the Google Chrome browser. I pass in the cask
flag followed by the name of the application I want to download, which is google-chrome
With all that said, you run the following
brew install --cask google-chrome
And that’s it!
The incredible thing about this one-liner is that it downloads the app from the Homebrew package repository (if it's available), installs it and moves it to the Applications folder for you, with no heavy mouse-clicking needed to accomplish the same action. That's a quick win from simple automation.
With it, I can write a bash script to add the other software downloads I require, like so:
#!/bin/bash
brew install --cask firefox
brew install --cask visual-studio-code
brew install --cask iterm2
# etc, etc
Or, if you'd like to DRY up the brew install, you can: simply use the \
line-continuation character like so.
brew install --cask firefox \
visual-studio-code \
iterm2 \
# etc, etc
This runs everything as a single brew install invocation. But bear in mind: a network time-out on one app can interrupt the whole command, and you'd have to redo the brew install step all over again. So I tend to step away from this form when installing many things at once. It's a matter of preference.
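If you still want one command kicking off everything with a bit more resilience, a small loop is an option. This is a dry-run sketch of my own (the echo stands in for the real brew call, and the app list is illustrative), so you can preview what would run:

```shell
# Dry-run sketch: install casks one at a time so a single network
# time-out doesn't sink the whole batch. Swap `echo` for the real call.
apps="firefox visual-studio-code iterm2"
for app in $apps; do
  cmd="brew install --cask $app"
  echo "$cmd"
done
```

Installing sequentially also makes it obvious which app failed, so you only re-run that one.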
Some tooling is, by nature, terminal- or CLI-based, so using the cask
flag wouldn't make sense. We simply omit it altogether, e.g.
brew install git
brew install python3
brew install openjdk
brew install apache-kafka
# etc, etc
And we’re good to go from here.
I've used ZSH for years, so it feels like second nature when navigating my developer folders and system resources in my iTerm2 terminal. I also mentioned configuring brew for the ZSH profile earlier in this article, which is where I'm going with this.
To start, I do the following installation:
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
Once it's installed and configured, I enhance my terminal experience further with these other three tools at my disposal, which I've found incredibly useful.
git clone https://github.com/zsh-users/zsh-autosuggestions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions # --> ZSH Autosuggestions
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting # --> ZSH Syntax Highlighting
git clone https://github.com/romkatv/powerlevel10k.git $ZSH_CUSTOM/themes/powerlevel10k # --> Powerlevel10k colour scheme theming over iTerm2
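After cloning, the plugins and theme still need to be enabled in your ZSH config. A typical ~/.zshrc fragment for oh-my-zsh looks like this (the plugin and theme names assume the clone destinations above):

```shell
# In ~/.zshrc: enable the cloned plugins and the Powerlevel10k theme,
# then restart the terminal or run `source ~/.zshrc`
ZSH_THEME="powerlevel10k/powerlevel10k"
plugins=(git zsh-autosuggestions zsh-syntax-highlighting)
```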
To find out more on their configuration options (and others), you can follow the Medium article for some more setup guidance.
As the primary languages I use these days are ReactJS, TypeScript, Python, Java and NodeJS (though there are others I can program in sufficiently well), Visual Studio Code has been my primary choice of IDE (though I'm slowly mastering Vim, whose CLI-based editing experience is also pretty good).
But over time, as I got more involved in backend systems development in the later years of my career, I've grown accustomed to IntelliJ and PyCharm Community Editions. These are incredibly good because they offer a fully integrated developer experience, whereas Visual Studio Code is only a code editor on its own: you have to either download extensions or create plugins yourself from scratch to match the experience of these IDE products.
We can download all of these using the brew install --cask options stated earlier in the article.
Docker for Mac is quite a beast when it comes to the CPU/memory resources it needs. It always had a high start-up cost on my old MacBook every time the machine booted, and thus a huge impact on overall performance when running Docker containers locally.
Thankfully, with the MacBook Pro M2 having more RAM and SSD storage, I have less to worry about. Moreover, thanks to its ARM-based architecture, the Docker community has been adopting ARM-based images more than ever before.
With the latest macOS Sonoma releases, Docker's virtualisation memory usage should keep improving, and things will only get better from here.
Docker for Mac can be installed using the brew install --cask command as well.
For databases, you have popular choices such as Redis, MySQL, PostgreSQL, etc. Everything can be downloaded via the brew command as well. Again, without the cask
flag.
Be mindful: because these are CLI-based services that you start and stop using brew services start/stop <database-service-name>
, you need a proper database client like DBeaver, DataGrip or similar to interact with them.
The clients can be downloaded via brew install --cask.
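As a sketch of the service lifecycle, the commands look like this (dry-run echoes; the service name postgresql@16 is an assumption of mine, check brew services list for yours):

```shell
# Dry-run sketch of the brew services lifecycle for a database service.
svc="postgresql@16"   # assumed name; verify with `brew services list`
for action in start stop restart; do
  echo "brew services $action $svc"
done
```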
Last but not least, I want to override my new MacBook Pro M2's default behaviours as part of my goal of further enhancing the local development experience on this machine.
I found links like macos-defaults or defaults-write are incredibly helpful with this.
The most obvious defaults I want to override are the Dock's, to recreate the feel of my old MacBook's settings: auto-hiding the Dock by default and revealing it on mouse hover, along with the genie effect when opening/minimising windows.
They are setup as below:
defaults write com.apple.dock "tilesize" -int "24" && killall Dock
defaults write com.apple.dock largesize -int "128" && killall Dock
defaults write com.apple.dock "mineffect" -string "genie" && killall Dock
defaults write com.apple.dock "autohide" -bool "true" && killall Dock
defaults write com.apple.dock "autohide-time-modifier" -float "0.45" && killall Dock
defaults write com.apple.dock "autohide-delay" -float "0.2" && killall Dock
Next is Finder, where the default behaviour of not showing folder paths and the status bar while navigating has always been a major pain point for me. I don't like getting lost every time I'm looking for something.
So to sort that out, I do the following:
defaults write com.apple.finder ShowPathBar -bool true && killall Finder
defaults write com.apple.finder ShowStatusBar -bool true && killall Finder
And sometimes it's useful to toggle hidden system files in and out of view when I want to verify my local development settings, so I use the following:
defaults write com.apple.finder "AppleShowAllFiles" -bool "false" && killall Finder # to hide system files
defaults write com.apple.finder "AppleShowAllFiles" -bool "true" && killall Finder # to show system files
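Rather than remembering both variants, you could wrap them in a small toggle function. A minimal sketch (the function name is my own, and this dry-run version prints the command it would run):

```shell
# Sketch: flip AppleShowAllFiles based on the current value you pass in.
toggle_hidden() {
  if [ "$1" = "true" ]; then new="false"; else new="true"; fi
  echo "defaults write com.apple.finder AppleShowAllFiles -bool $new && killall Finder"
}
result=$(toggle_hidden true)
echo "$result"
```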
For more defaults
override customisations you desire, you can follow the links I provided earlier.
That's it! I've shown you the tricks you can use to speed things up as much as possible with brew, defaults, etc. You can also find the CLI commands I use for local development on my Gist profile here, if you need a reference.
That’s the complete setup guide for my new Macbook M2 Pro for development as a start.
The next step is to transfer all of your software licences (MS Office, Alfred, Bartender, etc.) from the old machine to the new one. But you can do that at your own pace if you're not in a hurry.
Hope that’s been incredibly helpful for you.
Here’s to the new exciting chapter of software development experience for the new year of 2024 to come!
Till then, Happy Coding! 👩💻👨💻💪🖥
Since my last post earlier in the year, I mentioned my wedding plans for mid-2023, and now that's all done and dusted. Naturally, the next thing to do after the wedding was to book our honeymoon, touring the fun places around Malaysia and Indonesia towards the end of October 2023.
Here’s one of the best photo trips we took during our initial stay in KL.
The magnificent sight of Petronas Twin Towers during the night.
Here’s me and my wife took the selfie shot of the Petronas Twin Towers itself.
One of the best highlights of the honeymoon came before we finally arrived home in Sydney in late November.
Before making this trip, I had planned to place an order for an Apple MacBook Pro M2 with a PC store in Kuching, my home city in Sarawak, Malaysia. I was hoping to have the machine ready by the time I arrived in Kuching on the second leg of our honeymoon, so I could bring it home to Sydney with me.
But due to unforeseen circumstances, the delivery logistics of a custom-built MacBook Pro M2 were not within my control or my preferred schedule, so I had to organise DHL overseas delivery: not only the $350 AUD shipping payment, but also $375 AUD of import duty on top of it! It's certainly painful to spend that much on top of a MacBook Pro M2 purchased at a good price (which still worked out roughly $800 AUD cheaper than the average local retail price in Sydney). Sigh!
In spite of all that, the item arrived safely at my local newsagency collection centre, and I wasted no time picking it up, with a major sigh of relief and ecstasy!
With all that water under the bridge, I can get straight into the business of sharing my thoughts on the developer experience of the MacBook Pro M2. I couldn't be happier to have this fine machine as my early Xmas present! 😊🥰🎄🎁
Without further ado.
Whether you’re a seasoned developer or just starting on your coding journey, the right tools can make a significant difference in your productivity and overall experience. In this blog post, we’ll explore why the Apple MacBook Pro M2 is an exciting choice for software developers seeking to elevate their skills and workflow.
Prior to my new MacBook, my MacBook Pro 2017 edition came with a 2.3 GHz Dual-Core Intel Core i5, 8GB RAM, 256GB of storage and 1GB of graphics memory. Over years of use, I noticed that as modern apps grew far more demanding in CPU, storage and memory, the machine became too slow for modern-day development. In particular, with most boot and run time spent on local Docker containers and multiple browser tabs across Chrome, Firefox, Safari and the like, a constant strain was put on the limited system resources I had.
Apple’s transition to its custom M1 and now M2 silicon has revolutionized the computing landscape. The MacBook Pro M2, powered by this cutting-edge architecture, brings unparalleled performance and efficiency to the table.
With my newly unboxed item, my new machine has the following specs:
The M2 chip’s multi-core performance ensures that your software compiles and runs swiftly, significantly reducing development cycles. Experience the thrill of near-instantaneous response times as you code, test, and debug with ease.
The M2 chip is an upgraded version of the M1 chip, which was Apple’s first attempt at designing its own ARM-based silicon. The M2 is the latest and most efficient Apple Silicon, with an 18% faster CPU, a 35% faster GPU and a 40% faster neural engine than previous generations, as well as 50% more memory bandwidth.
Enjoy extended battery life without compromising performance. The M2 chip excels in energy efficiency, allowing you to code on the go without constantly worrying about running out of battery power.
A great development environment is crucial for any software engineer. The MacBook Pro M2 excels in providing an environment that fosters productivity and creativity.
The MacBook Pro’s Retina display offers crystal-clear visuals, making code easier to read and reducing eye strain during long coding sessions. The high resolution ensures that every detail of your project is displayed with precision.
Take advantage of the Touch Bar, a dynamic and context-aware input device, to streamline your workflow. Customize it to access your favourite commands, making repetitive tasks a breeze.
Apple’s ecosystem is known for its seamless integration, and the MacBook Pro M2 is no exception.
If you're into iOS or macOS development, the MacBook Pro M2 is a dream machine. Compile and run your apps faster than ever, and leverage the efficiency of Xcode to enhance your development workflow. But if native mobile development is not one of your fortes, you can certainly give open-source platforms such as Expo Go a try to build native app experiences using front-end skills like React/JS. The MacBook Pro M2 will certainly give you a major edge to take your native development to the next level.
Easily install and manage development tools with Homebrew. The MacBook Pro M2 ensures compatibility and smooth performance for the vast array of packages available.
Investing in the MacBook Pro M2 is not just about the present; it’s about future-proofing your development capabilities.
Apple’s commitment to software updates ensures that your MacBook Pro M2 remains current with the latest features and improvements, keeping you at the forefront of technological advancements.
As the Apple ecosystem grows, so do the opportunities for developers. Take advantage of the expanding App Store and tap into new markets and tools with your innovative applications.
We're living in the age of cloud computing and cloud development services. While it's true that their core advantages are vast scalability, cost savings, increased performance and so on as your app grows past a certain scale, I find this is not always the case in the short run, especially when you're just starting out. Local development clearly outshines the cloud here because it is extremely fast, easy and free: developers don't have to pay for cloud resources upfront, and an Internet connection isn't even necessarily required. This is especially true when you just want to explore distributed streaming technology locally, such as Apache Kafka. With the MacBook Pro M2's specs in your hands, you can tap its raw power to spin up enough local resources to develop and validate your app's functionality as much as you want, until you're ready to move to cloud services and are financially confident you can afford their usage. Before the M1/M2 existed, this was fairly difficult to accomplish.
In conclusion, upgrading to the Apple MacBook Pro M2 is a game-changer for developers. From the incredible speed and efficiency of the M2 chip to the seamless integration with powerful developer tools, this machine is designed to elevate your software development experience. Make the leap and unlock the next level in your coding journey.
Till next time, have a merry and safe Xmas holiday.
See you in the new year, Happy Coding!
PS: This article is by no means a complete guide to my new Mac setup for development. There are plenty of guides online that dev bloggers have written in their own ways, each unique to their personal development tastes. I'll write up a separate blog post on my own approach in the future.
Since my last post, I have finally managed to accomplish a number of errands before the end of 2022. One of the most important was getting my father's ashes sent home to Malaysia in September for his eternal rest. It was great to be back home and have a good holiday stint around my hometown of Sibu, Sarawak, as well as the capital city, Kuching, where I hadn't been for a long time, just to wind down and relax, with wedding preparations underway down under.
Now, that’s all out of the way, it’s time to get back into my world of blogging once more!
Last year, I faced a deceptively complex but interesting solution design problem when working with NodeJS's event-driven programming model, EventEmitter
, and how to write appropriate unit tests when listeners subscribe to certain emitted events.
If you recall your EventEmitter basics, EventEmitter is one of the core NodeJS modules facilitating communication between objects in an asynchronous event-driven architecture. For any object to inherit this module, the easiest thing is to write it with extends
like so.
const EventEmitter = require('events');

class FooBar extends EventEmitter {
  constructor() {
    super(); // required before using `this` in a subclass
    // some object properties to go here..
  }
}
By doing this, the same object is now inheriting a lot of EventEmitter’s methods.
Mainly, they are:
emit
on
and off
In its simplest form, emitter objects emit named events that cause previously registered listeners to be called. This programming model works more or less like a pub/sub model.
With that in mind, we can now look into this problem that I came across in one of my projects last year.
class CronService extends EventEmitter {
  constructor(configExpression, runSomeCronJobFunc, logger) {
    super();
    this.configExpression = configExpression;
    this.logger = logger;
    this.runSomeCronJobFunc = runSomeCronJobFunc;
    this.isStarted = false;
    this.isProcessing = false;
  }

  async start() {
    this.isStarted = true;
    this.performCronJob();
  }

  async stop() {
    this.isStarted = false;
  }

  async performCronJob() {
    if (this.isProcessing) {
      this.logger.info(() => console.info('job is still running'));
      return;
    }
    this.isProcessing = true;
    try {
      await this.runSomeCronJobFunc();
      this.emit('finished', null);
    } catch (error) {
      this.logger.warn(() => console.warn('error encountered; job ended abruptly'));
      this.emit('finished', error);
    }
    this.isProcessing = false;
  }

  getStatus() {
    return {
      isStarted: this.isStarted,
      isProcessing: this.isProcessing,
    };
  }
}
Here we have a CronService
whose primary task is to kick off a cron job that executes at the interval we provide. CronService uses configExpression
for the cron interval expression and runSomeCronJobFunc
as the main task to execute, along with the auxiliary state flags isStarted
and isProcessing
that track where the cron service is in its run.
The main area of interest is the performCronJob
function block, where our CronService emits the finished
event in two places. What does this block say?
In our project requirements, within the try/catch
block:
- We await this.runSomeCronJobFunc
and expect the cron job task to execute to completion; we emit finished
to signify the job completed without errors.
- If runSomeCronJobFunc
throws, we capture the error and still emit finished
to signify the job is also completed, but with errors this time.
For either outcome, we flag isProcessing
as false; it stays true while the current job is still mid-execution, because we don't want the next tick of the cron service to kick off another job until this one has completed.
So why do we emit finished
instead of error
, one may ask? Because, in our solution design, we have a clear requirement that any failed/incomplete cron job is treated as a completed task, so that at the next configExpression cycle we can kick off the same CronService again. Our goal for CronService is to run batch-processing tasks at regular intervals throughout the day, regardless of whether they completed successfully.
That’s the context for our design rationale.
With that out of the way, we come to the important question: who are the listeners to these finished events, and how do we thoroughly test that the EventEmitters work correctly under the conditions above?
To start, we write our hypothetical unit test file here.
// our assertion utilities
const { assertThat, is, not, equalTo } = require('hamjest');
const sinon = require('sinon');

// setup mocks and test data
const logger = new someMockLogger();
const HOURLY_INTERVAL = 1000 * 60 * 60;
// cron to run hourly
const cronConfigExpression = "0 * * * *";
let someCronTaskPerformedCounter;
const performCronTask = () => {
  someCronTaskPerformedCounter++;
};

describe('CronService', () => {
  let cronService, clock;

  beforeEach(() => {
    cronService = null;
    clock = sinon.useFakeTimers();
  });

  afterEach(async () => {
    sinon.restore();
    if (cronService !== null) {
      await cronService.stop();
    }
  });

  // More unit test blocks to follow shortly....
});
What did we just write in the above?
- A performCronTask
function to pass as a callback parameter, which stores the total count of cron tasks performed, i.e. someCronTaskPerformedCounter
.
- A describe
test block for CronService, outlining the beforeEach
and afterEach
callbacks, because we want to reset the cronService instantiation and SinonJS's stubbed clock for each unit test we run (in this case just one, for the purpose of this demo).
With the baseline unit testing structure underway, we can get into the nitty-gritty of things.
it("should run every hour (0 * * * *)", async function () {
  cronService = new CronService(
    cronConfigExpression,
    performCronTask,
    logger
  );
  assertThat(someCronTaskPerformedCounter, is(0));
  await cronService.start();
  clock.tick(HOURLY_INTERVAL);
  // This won't work!
  assertThat(someCronTaskPerformedCounter, is(1));
});
Here, our test expects the cronService to run at hourly intervals: once the current cron job kicks off, it runs for an hour (not an actual real-time hour, thanks to SinonJS's time-bending clock.tick
API), and we expect it to still be processing, i.e. isProcessing
is true, until it runs to completion at the exact hour.
Towards the end of the run, our first thought was that we could assert someCronTaskPerformedCounter had naturally incremented to 1, right? But no!
It still sits at 0. No change at all.
How on earth is that possible when we're not running in real time and everything is under a controlled environment?
But remember the earlier performCronJob
block's try/catch
:
async performCronJob() {
  try {
    await this.runSomeCronJobFunc();
    this.emit('finished', null);
  } catch (error) {
    // boo!
  }
}
runSomeCronJobFunc
inherently returns a Promise, and execution pauses at that point until the Promise settles (rejected or fulfilled). That's why our someCronTaskPerformedCounter assertion didn't work: we expected completion prematurely, too early in the running interval!
So we ask ourselves: how on earth do we write an assertion that the runSomeCronJobFunc
Promise will be fulfilled at some point, without stalling the rest of the test cases that run synchronously after it? We cannot block the testing runtime, because that would suspend the entire executing thread of the JS runtime. Nothing would run at all!
How can we use the finished
signal so the unit test is aware something has occurred downstream, when multiple cronService runs could fire sequentially at a regular interval?
The solution? 🤔
We create a new Promise wrapper over our promised assertions that gets fulfilled when the finished
emitter gets triggered.
// our main ingredient
const cronServiceFinished = (cronService, runAssertions) => {
  return new Promise((resolve, reject) => {
    cronService.on('finished', (error) => {
      runAssertions(error).then(resolve, reject);
    });
  });
};
If you look at this closely, it makes sense, right?
We are saying: while we await runSomeCronJobFunc
to complete, we also await cronServiceFinished
until the finished
event fires. Once triggered, we use the EventEmitter callback signature's emitted data (null
in the success case above) to perform whatever assertion rules we see fit, which is itself also a Promise!
With that in mind, we can now rewrite our unit test into the following
it("should run every hour", async function () {
  cronService = new CronService(
    cronConfigExpression,
    performCronTask,
    logger
  );
  assertThat(someCronTaskPerformedCounter, is(0));
  await cronService.start();
  clock.tick(HOURLY_INTERVAL);

  const successAssertions = async (error) => {
    assertThat(error, is(null));
    assertThat(someCronTaskPerformedCounter, is(1));
  };

  // Now this works because it respects the async/await execution flow of the main cron task at hand.
  await cronServiceFinished(cronService, successAssertions);
});
That’s it!
Now we can make use of this to extend for another use case to verify cron job is still processing below:
it("verifies the cron service is processing when started", async function () {
  cronService = new CronService(
    cronConfigExpression,
    performCronTask,
    logger
  );
  assertThat(someCronTaskPerformedCounter, is(0));

  const successAssertions = async (error) => {
    assertThat(error, is(null));
    assertThat(someCronTaskPerformedCounter, is(1));
  };

  await cronService.start();
  clock.tick(HOURLY_INTERVAL);

  const status = cronService.getStatus();
  assertThat(status.isProcessing, is(true));

  await cronServiceFinished(cronService, successAssertions);
});
Now, how do we handle the case where the cron job service encounters an error during processing, with the finished
event carrying error data?
We do the following:
it("marks the cron service as not processing after a failed run", async function () {
  const performSomeCronTaskWithErrorStub = sinon.stub().rejects(new Error("boo!"));
  cronService = new CronService(
    cronConfigExpression,
    performSomeCronTaskWithErrorStub,
    logger
  );
  assertThat(performSomeCronTaskWithErrorStub.notCalled, is(true));

  const failedAssertions = async (error) => {
    assertThat(error, is(not(equalTo(null))));
    assertThat(error.message, is(equalTo("boo!")));
    assertThat(performSomeCronTaskWithErrorStub.calledOnce, is(true));
  };

  await cronService.start();
  clock.tick(HOURLY_INTERVAL);
  await cronServiceFinished(cronService, failedAssertions);

  const status = cronService.getStatus();
  assertThat(status.isProcessing, is(false));
});
Here, we have created a mock cron job task with a stubbed error:
performSomeCronTaskWithErrorStub = sinon.stub().rejects(new Error("boo!"));
By using sinon.stub
, we can use the spy methods calledOnce
and notCalled
to inspect whether the stubbed function was called exactly once or not at all, respectively, hence their convenient names.
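Conceptually, a rejecting stub with call tracking is no magic; a hand-rolled sketch (my own illustration, not sinon's implementation) boils down to a counter on the function:

```javascript
// Hand-rolled sketch of a rejecting stub: every call bumps a counter,
// which is all calledOnce/notCalled-style assertions really need.
function makeRejectingStub(message) {
  const stub = () => {
    stub.callCount += 1;
    return Promise.reject(new Error(message));
  };
  stub.callCount = 0;
  return stub;
}

const stub = makeRejectingStub('boo!');
const notCalled = stub.callCount === 0;          // true before any invocation
stub().catch((err) => console.log(err.message)); // logs "boo!"
const calledOnce = stub.callCount === 1;         // true after one invocation
```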
Our success assertions are now replaced with failure assertions, which expect the error
not to be null, check the generic error message that comes with it, and verify performSomeCronTaskWithErrorStub
was called exactly once.
When an error is encountered during processing, we still emit finished
for the cron job and mark the instance's isProcessing
state as false, per our requirement.
So, there you have it.
This is how you can write unit test cases dealing with EventEmitters in an event-driven flow: the unit tests subscribe to the events and execute their assertions reliably from there. Especially when dealing with asynchronous operations like these, Promise wrappers become incredibly handy for avoiding unexpected race conditions between test cases reaching the finish line at different times.
Hope you like it and you find this very useful.
Till next time, Happy Coding!
PS: If you want to find out how Hamjest and SinonJS are used in detail, you can find their resources here.
I must admit I haven't had the chance to sit down and write another good tech blog post. A lot has been in motion since the start of 2022: my recent engagement to my fiancée of three and a half years; a busy day job as a contractor for a bank; finally getting my passport renewed in Canberra (after almost 2 years of nationwide lockdown) as part of my travel plans to Malaysia, so I can take my father's ashes to his final resting place; plans for my fiancée and me to move in together later in the year; and preparing for my next technical certifications to take on, etc…
The list could go on.
Nevertheless, still, I’m glad to take the time to write about what I have accomplished recently.
That is…
I’ve decided to migrate my Octopress site to Hexo! 📦🚚
Presently it may not look like much, but in case you haven't noticed any difference between the old and new blog platforms: I have updated the favicon shown in the top left-hand corner of the browser. The site now uses the Hexo favicon; previously it was Octopress (obviously).
Secondly, all of my code snippets' syntax highlighting has been modified a bit. They no longer show line numbers, whereas Octopress came with them by default.
Lastly, my footer note section has been updated with a new description, from Andy Wong - Generated using Octo theme from Octopress
to Andy Wong - Generated using Octo theme(modified) for Hexo
.
These are the key noticeable differences.
The new CMS tool, Hexo, written by the Chinese development community, became the best choice for me after careful evaluation of open-source static-site generators such as Hugo, Gatsby, Next.js, etc. The deciding factor simply boils down to this:
octo
theme provided, modified for my own taste and liking.To my surprise, the whole thing did not take very long. At most, it took only 3-4 days to make this all happen.
I had to work through the following steps to pull it off.
```shell
hexo generate && hexo server
```

And voila! 👏
Once I was happy the Heroku landing site was coming together, I took my old Octopress site offline for maintenance on 27/04/2022 for a few hours, and updated the DNS records with both my Namecheap DNS provider and Heroku Custom Domains.
You’ve landed on my blog post to witness the new shiny armor of my blog pages, in all its glory. If you’re curious how I managed to pull all of this off, I’ll explain in the following.
The critical changes revolve around the basic octo theme I found via the Hexo theme plugin search.
As my former Octo theme, codenamed BoldAndBlue, is not available via the Hexo theme plugins, unlike its Octopress counterpart, I decided to take the necessary “hacking” steps to re-skin the theme the way I wanted.
Here’s the list of such changes:
The sidebar EJS for Twitter, sidebar-twitter.ejs, does not exist, so I had to build it from scratch and port the original functionality from the old site.
The default CSS theme was, to be honest, ugly on the surface, so I ported the entire set of raw compiled SASS-to-CSS files from the old site. My Hexo theme does not come with SASS support by default, and I don’t see much value in maintaining SASS files, as I have rarely needed to tailor my site’s pages over the past 7 years. So I’ll leave that as it is.
The Google Fonts typography was ugly, so the Google Web Fonts Nato and Open Sans Serif were brought back.
The sidebar EJS for GitHub options was modified (sidebar-github.ejs) to cater for extra GitHub config options, e.g.

```js
const options = {
  user: '<%=theme.sidebars.github.profile%>',
  count: '<%=theme.sidebars.github.repo_count%>',
  skip_forks: '<%=theme.sidebars.github.skip_forks%>'
};
```

which the current Hexo theme doesn’t come with.
The individual post’s date-time display information was missing, i.e. new Date(page.date).toLocaleString() along with its formatting options.
The site’s config.yml settings and the theme’s config.yml settings; the following properties had to be altered:
```shell
npm install hexo-generator-feed
```
Tags/categories were missing, and archive post headings did not have proper URLs.
Favicon changes
A Disqus comments hyperlink was added next to the post’s actual published time:

```ejs
// - post EJS template
<time datetime="<%=page.date.toJSON()%>" data-updated="true">
  ...
</time>
// - disqus comments hyperlink
<% if (theme.disqus.enabled) { %>
  | <a href="#disqus_thread" data-disqus-identifier="<%=theme.site.url%>">Comments</a>
<% } %>
```
Home page EJS tags were missing, such as page.ejs, etc.
Overall, I’m happy with the experience. After performing these steps, I can concentrate better on writing content without having to deal with the inefficiencies Octopress has suffered since the community last supported it in 2015, thanks to the influx of competing static site generators to choose from. I can only hope the best for Hexo, which has still shipped releases within the past two years at the time this blog post was written. So, touch wood, Hexo will be around for X-number of years to come.
Octopress was good to me over the years, and I remember Ruby static site generators being incredibly popular around 2014/2015, when web frameworks were in high demand, before generators built on other stacks like PHP, JavaScript/Node.js and Python came along.
I remembered the initial excitement of writing my first tech blog post for the world to read back then.
Thus I want to feel what it feels like to launch my blog pages, for the second time - several years later!
Here’s to my future blogging days ahead, no matter what the future changes may bring.
Till then, Happy Coding! 👨💻💻⌨️
PS: I want to give credit to this blog post, which ultimately helped make the Hexo transition smooth in the first place! 😉😊 - https://gangmax.me/blog/2019/12/16/From-Octopress-to-Hexo/
The reasons for this can be wide-ranging: everything from being too busy at work, to shifting interests in different software frameworks, changing lifestyle priorities, changing career responsibilities, etc. - there are too many to list here.
They take up the best of our coding life as the months and years go by.
As a consequence, your Github repositories do become quickly stale over time.
This ‘staleness’ comes with old dependencies that do not get updated, which, at worst, could be leaking security issues over time.
I have over 50 repositories in my GitHub profile, and there’s no way I could keep track of which repos have security holes to address, let alone fix vulnerable dependencies one PR at a time on my own… 😨
Thus, I needed to find a way to auto-manage all these repositories without lifting a finger (much).
Without further ado, I found this useful toolchain in the GitHub marketplace - Snyk. Snyk is a developer security tool (with an open-source CLI) that helps developers track, find and fix security vulnerabilities across all of your GitHub projects.
According to this page, it describes itself as -
This all sounds pretty awesome!
This is the type of thing I’d like to include in my software toolchain to know where my repos are going wrong, so I don’t have to check things manually myself.
What a great boon to keep my repos clean and fresh!
Best of all - it is absolutely free! You can use it on an unlimited number of GitHub repos in your portfolio, and it comes with its own CI/CD pipeline integration as well.
To start with,
Once you have all that set up, you will eventually arrive at this dashboard screen.
There it is - my one-stop landing page to monitor and assess the dependency health of my GitHub repos for security vulnerabilities.
As with any security audit or vulnerability check (as you would do with antivirus and anti-malware scans), we have to make our own informed choices about whether to triage and resolve these vulnerabilities urgently, depending on their severity as well as how broadly they impact the health of the app’s core functionality.
Once you decide to resolve these vulnerabilities, Snyk gives you the option to open a PR against one of your nominated repos; it will generate the best security patch for you and apply it, should you decide to accept the recommendation by merging.
For example, I recently applied one of these security patches to one of my old Node repos, which I hadn’t touched for a considerable time. Let’s open up the merged PR.
Here you can see a table generated by Snyk outlining which files suffered from security vulnerabilities, along with the recommended fixes.
If you click on the ‘Files changed’ tab, you’ll notice the following:
It forewarns that these are the affected npm libraries that need to be upgraded to the latest working version in which the security patches are applied.
That’s it!
The thought of manually checking and verifying package libraries’ vulnerabilities myself is no longer a major headache to deal with.
I have the available tools that help to automate that workflow process for me - without doing any heavy finger lifting!
This is precisely what I love about it. 😎
Go ahead, reap the power of free open-source software built by a community that stands on the shoulders of giants! 🤞🤞👨💻👩💻
Till next time, Happy Coding!
What better way to start than by writing about the ins and outs of certain Python web frameworks.
I’ve been working on Python web projects for some time, and I’m here to offer my rants/thoughts on working with Django versus Flask/Falcon, and to outline the comparisons between the two camps.
Let’s start with the ones I’m most familiar with.
NB - Before I begin, this article assumes the reader has a good understanding or working knowledge of the MVC software pattern, as most web frameworks are built around it. If you don’t know what that is, you may want to visit this Wikipedia page as a refresher before proceeding.
First of the rank is Flask. 🌶
Flask’s software design philosophy is to help developers focus on code design simplicity first. What this means is that the framework does not come with a lot of the ‘bells and whistles’ you need to build a web app, such as a scaling web server, a templating engine, an authentication system, middleware libraries, service logic and database layers.
You get to handpick a number of these yourselves to decide how your app should take form.
From project experience, I used a number of well-known libraries that are popular amongst the Python developer community. They include things like:
So without further ado, let’s start with its basic layout
```python
## It's Flasking time!
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'
```
We start by importing the Flask package and instantiating its object to kick off the application.
With app, you can start to create a routing system, such as a home landing page that renders the home page’s content, by making use of its route decorator functions.
```
app.py
config.py
requirements.txt
static/
templates/
```
In the above, we have a basic folder structure with a minimalist set of files. It consists of the base app file, the config.py setup file, and requirements.txt for managing app dependencies, along with the static and templates folders: static contains all of the front-end assets like jQuery/JS, CSS and HTML files, while templates contains all of your Jinja templating files.
```python
from flask import render_template

@app.route('/hello/')
@app.route('/hello/<name>')
def hello(name=None):
    return render_template('hello.html', name=name)
```
On the V-side of the MVC pattern, we import the render_template function and call it to render a particular Jinja template file (which resides in the templates folder at the root of the application by default), based on the current URL route we are viewing the template from.
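For completeness, the hello.html template referenced by render_template above might look like the sketch below (modelled on Flask’s quickstart example; the real markup depends on your app):

```html
<!doctype html>
<title>Hello from Flask</title>
{% if name %}
  <h1>Hello {{ name }}!</h1>
{% else %}
  <h1>Hello, World!</h1>
{% endif %}
```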
##### Database models
```python
# Using Flask-SQLAlchemy
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)

    def __repr__(self):
        return '<User %r>' % self.username
```
On the M-side of the MVC pattern, we use the popular ORM library Flask-SQLAlchemy (which is a wrapper on top of SQLAlchemy) to assist in defining our entity models, along with data types for our models’ attributes that are easy to understand and follow.
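For intuition about what the ORM is doing, the User model above corresponds to roughly the DDL below. This sketch uses the stdlib sqlite3 module directly (the exact SQL Flask-SQLAlchemy emits may differ slightly, and the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Roughly what db.create_all() would emit for the User model above
conn.execute("""
    CREATE TABLE user (
        id INTEGER NOT NULL PRIMARY KEY,
        username VARCHAR(80) NOT NULL UNIQUE,
        email VARCHAR(120) NOT NULL UNIQUE
    )
""")
# User(username='andy', email='andy@example.com') followed by a commit
conn.execute("INSERT INTO user (username, email) VALUES (?, ?)",
             ('andy', 'andy@example.com'))
# User.query.filter_by(username='andy').first() boils down to a SELECT
row = conn.execute("SELECT id, username FROM user WHERE username = ?",
                   ('andy',)).fetchone()
print(row)
```

The ORM’s value is that it writes and parameterises this SQL for you while you stay in Python objects.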
```python
@app.route('/')
def index():
    return 'Index Page'

@app.route('/hello')
def hello():
    return 'Hello, World'
```
On the C-side of the MVC pattern, we use route decorators to tell Flask which URL segments will be routed to the appropriate controller functions responsible for handling those requests.
Nothing new here.
```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/send', methods=['GET', 'POST'])
def send():
    if request.method == 'POST':
        age = request.form['age']
        return render_template('age.html', age=age)
    return render_template('index.html')

if __name__ == '__main__':
    app.run()
```
When working on pure SSR (server-side rendering) apps, we import our render_template and request combo to tell Flask that we’re going to be dealing with a form-backed Jinja template.
```python
import os
import tempfile

import pytest

import flaskr

flaskr_app = flaskr.app

@pytest.fixture
def client():
    db_fd, flaskr_app.config['DATABASE'] = tempfile.mkstemp()
    flaskr_app.config['TESTING'] = True
    with flaskr_app.test_client() as client:
        with flaskr_app.app_context():
            flaskr.init_db()
        yield client
    os.close(db_fd)
    os.unlink(flaskr_app.config['DATABASE'])
```
When unit testing a Flask app, we can incorporate one of the popular Python testing frameworks like unittest or pytest without much difficulty. In the example above, the test file verifies the application configuration and initialises a new database. Note that everything here is set up as a pytest fixture, so we treat this as an individual test module. Then we can execute the pytest command and get test result feedback on our Flask implementation. Simple, and not much different from any other unit or integration test suite.
That’s it for Flask!
Next, we have Falcon. 🦅
Like Flask, Falcon is also a microframework, but it focuses purely on building RESTful, API-driven applications, so templated views are non-existent here. It’s considered a minimalist framework that does not come with plenty of dependencies or a heavy amount of abstractions you have to come to grips with.
NB - You can actually build APIs with Flask as well, by the way; Flask is by no means built for SSR apps only. You can download the Flask-RESTful library to achieve a minimal API - you can see the working sample code here.
```python
from wsgiref.simple_server import make_server

import falcon

app = falcon.App()
```
To start, we import falcon and create the app. Not much different from its Flask counterpart.
```
api
├── .venv
└── api
    ├── __init__.py
    └── app.py
```
We have the above folder structure. Notice we don’t have any HTML rendering templates to view, compared to Flask.
```python
# Our resource definition
class MediaResource:
    def on_get(self, req, resp):
        """Handles GET requests"""
        resp.status = falcon.HTTP_200  # This is the default status
        resp.content_type = falcon.MEDIA_TEXT  # Default is JSON, so override
        resp.text = ('\nTwo things awe me most, the starry sky '
                     'above me and the moral law within me.\n'
                     '\n'
                     '    ~ Immanuel Kant\n\n')

# Our app instance
app = falcon.App()

# Resources are presented as long-living instances
media = MediaResource()

# App will handle 'media' requests here
app.add_route('/media', media)
```
Next, we start writing out our resource definition for our APIs.
As part of Falcon’s design philosophy, it adheres to the REST architectural style, guiding you in mapping resources and state manipulations to actual HTTP verbs.
Thus, in the above example, we declared our MediaResource, which handles any GET requests coming in and produces the appropriate responses, with Falcon wiring the request and response objects in for you out of the box. Simple, nice and clean!
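Notice the make_server import at the top of the Falcon snippet: a falcon.App instance is itself a WSGI callable, which is how it gets served. The sketch below shows that serving pattern with a plain WSGI function standing in for the Falcon app, so it runs without Falcon installed; in a real app you would pass your falcon.App() instance to make_server and call serve_forever() instead of handling a single request.

```python
import threading
import urllib.request
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Stand-in for a falcon.App instance, which exposes this same WSGI signature
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello from WSGI']

result = {}
with make_server('127.0.0.1', 0, app) as httpd:  # port 0 picks a free port
    port = httpd.server_address[1]
    client = threading.Thread(
        target=lambda: result.update(
            body=urllib.request.urlopen(f'http://127.0.0.1:{port}/').read()))
    client.start()
    httpd.handle_request()  # serve that single request, then exit
    client.join()

print(result['body'])
```

Because both frameworks speak WSGI, the same serving code works for a Flask app too.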
##### Database models
```python
# Using SQLAlchemy
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)

    def __repr__(self):
        return '<User %r>' % self.username
```
Again, nothing different here. I personally use the SQLAlchemy ORM library, and it works well for building models to support your data-rich RESTful applications.
```python
# -----------------------------------------------------------------
# unittest
# -----------------------------------------------------------------
from falcon import testing
import myapp

class MyTestCase(testing.TestCase):
    def setUp(self):
        super(MyTestCase, self).setUp()
        self.app = myapp.create()

class TestMyApp(MyTestCase):
    def test_get_message(self):
        doc = {'message': 'Hello world!'}
        result = self.simulate_get('/messages/42')
        self.assertEqual(result.json, doc)

# -----------------------------------------------------------------
# pytest
# -----------------------------------------------------------------
from falcon import testing
import pytest
import myapp

@pytest.fixture()
def client():
    return testing.TestClient(myapp.create())

def test_get_message(client):
    doc = {'message': 'Hello world!'}
    result = client.simulate_get('/messages/42')
    assert result.json == doc
```
When it comes to building unit tests for Falcon, it’s relatively straightforward. You can add a test framework of your choice, i.e. unittest or pytest, without any difficulty, and achieve the same goals as Flask’s unit test configuration counterpart. The only difference between the two is that Falcon comes with testing utility functions that better support functional testing, such as the simulate_* method semantics. These, essentially, support all of your request/response test cycles across the common HTTP verbs and actions, just like the client.simulate_get example above.
Now that we’ve got those two out of the way, with a firm understanding of how these minimal frameworks work, let’s go and check out the Django side of the fence ⚠️ ⚠️⚠️ ⚠️
I recently worked on a Django project. Django is a fully-fledged Python MVC framework that has been grazing around the community for a very long time. To recap, MVC (as stated earlier in this post) is one of the oldest (and most typical) software design patterns, one veterans would tell you they have been using since the Internet era was born 20-plus years ago. With this design pattern, you would expect to see Python code written for each layer of a componentised web application or software system.
To start, you begin with
```shell
django-admin startproject myhomepage
```
```
myhomepage/
    manage.py
    myhomepage/
        __init__.py
        settings.py
        urls.py
        asgi.py
        wsgi.py
```
After running the django-admin command, you will be greeted with the basic Django folder structure above. You start with the container project folder named myhomepage, which is the name you gave in the first step. Then we have manage.py, the utility Python file that lets you interact with the Django app via CLI commands.
We see there’s a subfolder also named myhomepage, but this folder is marked as the Python package containing all the core logic files that encapsulate everything about the Django app itself.
Within this folder, you’re greeted with the following:
__init__.py - an empty file that tells Python this directory is a Python package.
settings.py - the settings file for the Django project.
urls.py - where you write all of the URL declarations of the Django-powered website.
asgi.py - the entry point for an ASGI web server to serve the app.
wsgi.py - the entry point for a WSGI web server to serve the app.

To understand the difference between the ASGI and WSGI web server interfaces, you can read more about it here on this post if you’re curious.
```python
## Routes - urls.py
from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('<int:question_id>/', views.detail, name='detail'),
    path('<int:question_id>/results/', views.results, name='results'),
    path('<int:question_id>/vote/', views.vote, name='vote'),
]

## Views - views.py
from django.http import HttpResponse
from django.template import loader

from .models import Question

def index(request):
    latest_question_list = Question.objects.order_by('-pub_date')[:5]
    template = loader.get_template('myhomepage/index.html')
    context = {'latest_question_list': latest_question_list}
    output = template.render(context, request)
    return HttpResponse(output)

........
```
When it comes to building routes and views, Django has a plain concept of views, where a view is represented as a Python function responsible for coordinating business logic, and it expects an HTML template to bind that logic’s output to when rendering the page. To view the actual rendered page in the browser, Django uses the URL declaration file urls.py, where we place all of our well-defined URL namespace routes for each rendered template output we specify, e.g.
```python
path('main/', views.index, name='index')
```
To view the rendered ‘index’ page, we configure the URL path as main/ and hook it up with our exported views’ index function. Thus, in the browser, you would see the content rendered via http://localhost/main/.
With this, we would expect to have a big laundry list of views/templates to route and render via the urlpatterns list.
```python
from django.db import models

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
```
When it comes to building database models, Django comes bundled with its own ORM framework under the hood. Like its SQLAlchemy counterpart, it offers a Pythonic way of interacting with relational databases, such as creating tables and querying, inserting and updating rows. The semantics for describing the db data types are slightly different from SQLAlchemy’s, as they are bound to the API calls exposed by django.db’s models module.
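One detail in the models above worth calling out is on_delete=models.CASCADE: deleting a Question also deletes its Choices. The sketch below demonstrates the equivalent behaviour with the stdlib sqlite3 module (the table and column names mimic the models; the SQL Django’s ORM actually generates differs in naming details, and the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("""
    CREATE TABLE question (
        id INTEGER PRIMARY KEY,
        question_text VARCHAR(200) NOT NULL,
        pub_date TIMESTAMP
    )
""")
conn.execute("""
    CREATE TABLE choice (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL
            REFERENCES question (id) ON DELETE CASCADE,
        choice_text VARCHAR(200) NOT NULL,
        votes INTEGER DEFAULT 0
    )
""")
conn.execute("INSERT INTO question (id, question_text) VALUES (1, 'Best framework?')")
conn.execute("INSERT INTO choice (question_id, choice_text) VALUES (1, 'Flask')")
conn.execute("INSERT INTO choice (question_id, choice_text) VALUES (1, 'Falcon')")

# Deleting the question cascades to its choices, as on_delete=CASCADE promises
conn.execute("DELETE FROM question WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM choice").fetchone()[0]
print(remaining)
```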
When it comes to building APIs, compared to Flask/Falcon, there is an external toolkit package that handles this well: djangorestframework. The toolkit is built specifically for API-driven architectures, as Django itself is a fully-fledged MVC framework for server-side-driven web applications, which are traditionally tightly coupled.
To start with, you do the following.
```shell
pip install djangorestframework
```
When creating the app, you do the following
```shell
django-admin startproject myapp
cd myapp
django-admin startapp tutorial
cd ..
```
And you will notice the following app folder structure.
```
myapp/
    db.sqlite3
    manage.py
    tutorial/
        migrations/
        __init__.py
        admin.py
        apps.py
        models.py
        tests.py
        views.py
    myapp/
        __init__.py
        asgi.py
        settings.py
        urls.py
        wsgi.py
```
Again nothing different to the previous Django app setup.
In your settings.py
file, you add the following:
```python
INSTALLED_APPS = [
    ....
    'rest_framework',
]
```
Once you’re done with this initial setup, you can begin to lay out your API architecture design like so, below:
```python
#### Step 1 - Serializers Layout ####
from myapp.tutorial.models import Question, Choice
from rest_framework import serializers


class QuestionSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Question
        fields = ['url', 'question_text', 'pub_date']


class ChoiceSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Choice
        fields = ['url', 'question', 'choice_text', 'votes']
#### end Step 1 ####


#### Step 2 - Views Layout ####
from myapp.tutorial.models import Question, Choice
from rest_framework import viewsets
from rest_framework import permissions
from myapp.tutorial.serializers import QuestionSerializer, ChoiceSerializer


class QuestionViewSet(viewsets.ModelViewSet):
    """
    API endpoint that allows questions to be viewed or edited.
    """
    queryset = Question.objects.all().order_by('pub_date')
    serializer_class = QuestionSerializer
    permission_classes = [permissions.IsAuthenticated]


class ChoiceViewSet(viewsets.ModelViewSet):
    """
    API endpoint that allows choices to be viewed or edited.
    """
    queryset = Choice.objects.all()
    serializer_class = ChoiceSerializer
    permission_classes = [permissions.IsAuthenticated]
#### end Step 2 ####


### Step 3 - URLs setup ###
from django.urls import include, path
from rest_framework import routers
from myapp.tutorial import views

router = routers.DefaultRouter()
router.register(r'questions', views.QuestionViewSet)
router.register(r'choices', views.ChoiceViewSet)
registered_urls = router.urls

# Wire up our API using automatic URL routing.
# Additionally, we include login URLs for the browsable API.
urlpatterns = [
    path('', include(registered_urls)),
    path('api-auth/', include('rest_framework.urls', namespace='rest_framework')),
]
### end Step 3 ###


### Step 4 - Unit testing ###
from rest_framework.test import APIRequestFactory
from myapp.tutorial.views import QuestionViewSet

# Using the standard RequestFactory API to create a form POST request
factory = APIRequestFactory()
view = QuestionViewSet.as_view({'post': 'create'})
request = factory.post('/questions/', {'question_text': 'new question'})
response = view(request)
response.render()
# Inside a TestCase, assert on the rendered content:
# self.assertEqual(response.content, "some_response_data")
### end Step 4 ###
```
With the above, Django REST introduces a few concepts we have to learn.
In step 1, Django’s way of serializing JSON response data from models is by extending one of its own serializer classes, such as HyperlinkedModelSerializer. This instructs the class to interpret a number of complex data types, such as querysets and model instances. We can then use the Meta options to bind a number of the model’s fields into the response output, as well as control its output structure.
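Conceptually, what a serializer does boils down to a few lines of plain Python: pick the declared fields off a model instance and render them as JSON. The sketch below is my own illustration of that idea (mirroring the Meta.fields option above), not DRF’s actual API:

```python
import json
from dataclasses import dataclass

@dataclass
class Question:
    id: int
    question_text: str
    pub_date: str

def serialize(instance, fields):
    # Pick only the declared fields, mirroring Meta.fields in DRF
    return {field: getattr(instance, field) for field in fields}

question = Question(id=1, question_text='Best framework?', pub_date='2022-01-01')
payload = json.dumps(serialize(question, ['question_text', 'pub_date']))
print(payload)
```

DRF layers validation, nested relations and hyperlinking on top, but the field-selection core is the same.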
In step 2, when designing the API resource endpoints that take care of the logic behind your serializable responses from above, we have to understand the ViewSet, which is simply a class-based view responsible for handling all the common HTTP actions/verbs without you having to write your own handlers. This means the get(), post(), put(), patch() and delete() handlers you used to write in the Flask/Falcon world are almost ‘non-existent’. Instead, Django gives you list, create, retrieve, update, partial_update and destroy action handlers which, to me, are little more than syntactic sugar over their traditional counterparts. A ViewSet is simply an alternative concept to the Resources or Controllers domain of API design.
Interestingly, with this concept, Django offers a few different ViewSet class implementations out of the box for you to design your resource APIs. They are:
1) GenericViewSet - views created as a shortcut for common logic shared across similar views, by mixing Django action mixins into the class so that you don’t have to write extra lines of code for each resource.
One thing that caught my eye is this other concept called hyperlinked APIs, which is used to make the discoverability of the APIs more meaningful by injecting entities’ URL names, instead of their primary keys, when describing model relationships. For example:
```
# Instead of this ...
http://myapp.tutorial.com/questions/1/choice

# We get this url generation instead
http://myapp.tutorial.com/question/detail-1/choice
```
In the above, instead of fetching the primary key from your model instance (say, the Question model with a primary key of 1) to associate model relationships as part of the URL definition, we can replace it with a different URL name as a hyperlink. We do this by incorporating the HyperlinkedIdentityField serializer class, represented by a view_name field, along with a lookup_field that refers to the particular instance of the model object - which in this case would be the primary key of the Question table model, i.e. pk or similar.
This is something which Flask/Falcon doesn’t do out of the box.
You can see more examples of exactly how it is used here -
In step 3, once we’re happy with our API URL endpoint setup, we can begin to configure our routes. In Django, we import the DefaultRouter, which is responsible for intelligently creating the hyperlinks for you based on the HyperlinkedModelSerializer setup you defined in the previous steps. Then we register our URL endpoint routes by specifying two arguments: a URL prefix and a ViewSet class. On the surface, this looks pretty straightforward, and it’s not that much different from the Flask/Falcon counterparts - other than the fact that Django provides special ViewSet classes that give developers extra wiring options for determining action handler definitions when mapping out URL routing rules.
Finally, in step 4, we look at how Django handles unit testing as a whole. As we are reusing Django’s existing test framework, Django REST offers extra testing utilities for exercising API request calls, specifically through its factory classes. To create our test requests, we import APIRequestFactory and use it to build a POST request. Then, to test our response assertions, we treat our ViewSet as a view so we can pass the factory-built HTTP request into the view callable. The response can then be rendered, so that we can perform our assertions on its content straight after.
With that, that’s pretty much it about Django!
With all that said and done, I finally come to the part where I offer my opinions on the two camps.
Based on my personal (as well as professional) bias, I prefer working with microframeworks over major frameworks like Django. To me, a good framework comes with the following:
When I read a design philosophy like this, it totally resonates with me. That’s the type of excellent software design mantra I want to stick by. I feel that I have more ‘rights’ as a developer to determine how my application should be designed. As I mentioned regarding architectural freedom of choice earlier, microframeworks like Flask/Falcon give us the liberty to make design choices upfront when plumbing together all the components into a working product. Those design choices need to consider all the important factors, such as familiarity with the chosen technology, fewer deep-seated dependencies, cleaner abstractions, and a lower learning barrier for any new developers coming on board to enhance the product further down the line. With all this considered, I would argue that you can still achieve simplicity of code design even as the application grows in complexity over time.
With major frameworks like Django, Rails or similar, I don’t have this “freedom”. They give you all these toolsets out of the box: ORMs, security and authentication systems, form generation, resource creation, serialization frameworks, etc. They’ve made the design choices for you already. The things I learned in the microframework world cannot be wholly applied here. Concepts specific to their toolsets, like class-based viewsets and serializers, are hard for newcomers to grasp, so I already found them an initial barrier when learning to build a simple API endpoint.
On top of that, even once you get the API endpoint working, their abstractions are not straightforward to follow. For example, what’s the difference between GenericViewSet and ModelViewSet? Are they the same but with different responsibilities under the hood? If so, what other extra concepts do I need to come to terms with? Does it make my job of designing endpoint resources easier than Flask/Falcon did, or not? What major benefit do I gain from learning extra concepts versus writing the simpler, easier code I write in Flask/Falcon? More importantly, will they be easier or harder to change should the framework developers decide to alter these concepts in the future? On all fronts, I would argue that I don’t gain any benefit in code design simplicity from these tools.
It just makes everything harder.
While it’s harsh for me to say this, I have to be fair to them. To their credit, the primary benefit of major frameworks is that they save you time when building your code. They introduce shortcut toolchains that reduce boilerplate, so you never find yourself writing repetitive lines of code all over the place. They do this by introducing “black magic” such as class-based viewsets that let you write DRYer lines of code, and you’re well on your way to shipping the product sooner than you realise. While that’s an attractive selling point for hitting deadlines early on, in the long run it is not sustainable.
Sooner rather than later, you will often find yourself shooting yourself in the foot, paying the price when requirements change over time and the initial assumptions of the code design no longer hold. When they change, those few lines of code have to change too. How much change, you may ask? More importantly, how big a risk does the change pose to the rest of the data flow in the system? And what about unit and integration testing? Now that the DRY abstraction is no longer serving you, will the rest of the application become more prone to breaking? It’s not so straightforward. That “black magic” will come back to haunt you. 👻
I would say that major frameworks that are supposed to save you heaps of time are, eventually, more of a false economy than anything - if you ask me.
With major frameworks, it's famously said that they give you convention over configuration. To me, as a developer, that reads as: you are not required to care too much about the configuration that makes up the bells and whistles of the framework. All they want is for you to embrace their conventions and trust that they do a good job of the heavy lifting for you, so you never have to worry about their design decisions. But those same design decisions come with a set of opinions made on behalf of the community, so you either embrace them faithfully or suffer the consequences should things not work out in your favour.
Either way, what this reveals is that major frameworks like Django (arguably) force you to absorb their design philosophy and stick to it to the teeth, while Falcon/Flask are the complete opposite. For that reason, they make you feel "lazy" about writing good code, and you never come to appreciate why writing good, simple code truly matters when steering your architecture in the direction you want.
All in all, I just want to say I’m not giving major frameworks out there a bad name.
These are just my opinions, and my opinions only.
I come from a different school of thought, having built solution architectures for applications in teams of various sizes, where I've been fortunate to see good, simple and clean code architecture laid out. Every framework comes with design pros and cons. Django saves you time upfront by eliminating boilerplate - a pro; but it falls short if you later want to scale your design differently from its built-in conventions. Flask/Falcon may not have all the fancy "black magic" toolsets Django comes with, but the major pro for me is that their learning curve is much gentler, and they encourage you to care deeply about design thinking while keeping good, clean, simple-to-understand code as the main goal.
I always firmly believe that every problem domain we aim to solve needs the right tool for the job. If you think Django/Rails/Laravel or similar solves 80-90% of your problem and you don't mind dealing with the conventions they come with, then that's the correct tool of choice. But if the same framework only solves 50% or less of the problem and its conventions don't serve you any useful purpose, then Flask/Falcon or similar will be your guy.
Just remember that every framework design comes with certain tradeoffs when building a good scalable application.
For me, I would go for simple, clean code architecture. It will trump all that "black magic" anytime, any day.
That’s my two cents.
Till next time, Happy Coding!
Disclaimer: I'm not a framework expert by any means. I'm just making a high-level comparison between these and drawing my own conclusions on how good (or not so good) they are for building well-architected apps - with a dash of personal bias here ;)
// when the giant red button got triggered - 🚨🚨
async function globalWHOAlertSystem(pandemicAlertLevel, globalToiletPaperSupplyChain) {
  let message = document.getElementById("message");
  message.innerHTML = "";
  try {
    do {
      const areWeDoomed = await globalToiletPaperSupplyChain.verifyStockLevel()
      if (areWeDoomed) throw "We are so screwed😱!!"
    } while (pandemicAlertLevel > 60000); // it's over 60,000!
  } catch (err) {
    message.innerHTML = err;
  }
}
Well, the code snippet above pretty much summarises everything we all know the year of 2020 brought us.
All small jokes aside, this year has been an incredibly difficult year for all, especially for the tech community in general.
Countless tech meetups, tech networking events and tech conferences have either been postponed indefinitely, rescheduled or cancelled altogether around the world.
Thus everything from JS, Python, Docker, AWS, Blockchain and UI/UX meetups was a huge miss for me this year, given my life-long passion for technology used in business.
But regardless of how the pandemic turns out, eventually it will be resolved or contained by health community experts trying their very best to combat its deadly virulence from spreading further.
Just like the past pandemics before us.
They come and go.
And the technology realm will always keep moving forward, regardless of what the pandemic brings.
Technology trends will always run in constant flow..
That we all can bet on.
However, sadly, not all things can join and follow the same flow.
Many people on this planet have lost their loved ones during this pandemic; whether they actually contracted the virus or not.
On a personal level, I lost someone that is very close to me.
My father recently passed away due to a heavy stroke he fell early this year.
The hardest part of this pandemic is that it forced me not to be with my father physically one more time due to the international border lockdown rules. And my family in NZ had little choice but not to wait the 2-week quarantine period for me to arrive at Auckland airport to attend his funeral.
With that, I was left with no choice but to live the rest of my life knowing I would never get to utter my words of goodbye to my father face-to-face before he was cremated.
I had to use virtual experiences via Whatsapp video calls instead to see his lifeless body one last time…
There are no simple words to make this whole thing better.
In the beginning, I wanted my dev life of 2020 to go out with a great bang.
But I fear this was not the case.
Can we still be happy coders moving forward in the midst of all this? Will I be able to continue writing blog posts in the near future?
Naturally, it will.
When all things pass and the grief subsides…
All in all, it’s been a trying year for everyone.
Final message - look after yourself and your loved ones very closely.
Appreciate their presence with all your love and care, for they want the best for you in all aspects of life. You never know when they'll be gone from your life…
Till then, here’s to many years of future Happy Coding days ahead.
God Bless.
And functional programming (FP) - a newish paradigm - has been permeating through the developer community for some time; everything from Haskell, Elixir, React and AWS Lambdas to Clojure.
Or, at least, it's yet to establish some norms within the community…
But I must digress.
After dabbling around with Javascript/React for a while now, every JS developer would be inclined to tell you that map, filter and reduce are the default go-to tools for expressing their FP-ness all over their front-end codebase.
You've probably seen those patterns numerous times via google searches or countless Medium or FreeCodeCamp tutorial blog posts.
// Typical start of your FPer day
const stayAtHomePersons = [
  { id: 1, name: 'Josh Hamish', cookingScore: 90, exerciseScore: 5, isSelfIsolating: true },
  { id: 2, name: 'Blake Lively', cookingScore: 10, exerciseScore: 80, isSelfIsolating: true },
  { id: 3, name: 'Ken Jeong', cookingScore: 0, exerciseScore: 90, isSelfIsolating: false }
];

// No new surprises here.
const totalSAHPersonScore = stayAtHomePersons
  .filter(person => person.isSelfIsolating)
  .map(person => person.cookingScore + person.exerciseScore)
  .reduce((acc, score) => acc + score, 0);
In the JS world, this is how we roll our F.M.Rs mojos. 👩💻👨💻😎
Now, let's head over to the Python fence and see how they handle things over there.
Disclaimer: Before I proceed further, I just want to say that, at the time of this writing, the entire online community is facing a major, unprecedented challenge in our lifetimes, and we are all doing our best to treat things with great caution. The examples I'll be using next may upset some readers, and I do not mean to cause any discontent. The purpose of my blog is purely to demonstrate my learnings over the past few weeks since the start of the global pandemic, and I wish to reiterate my platform will be used for this purpose only. I apologise in advance.
Lately, I have been dabbling with how Python handles its own filter, map and reduce functions, and I couldn't help but look at the following datasets provided by the Johns Hopkins University Github repo on the latest covid19 cases.
/** Daily Covid 19 Cases - NOT ACTUAL DATA **/
[
  {
    "Province/State": "Hubei",
    "Country/Region": "Mainland China",
    "Last Update": "2020-03-01T10:13:19",
    "Confirmed": "1000",
    "Deaths": "10",
    "Recovered": "50",
    ...
  },
  {
    "Province/State": "",
    "Country/Region": "South Korea",
    "Last Update": "2020-03-01T23:43:03",
    "Confirmed": "100",
    "Deaths": "1",
    "Recovered": "2",
    ...
  },
  {
    "Province/State": "",
    "Country/Region": "Italy",
    "Last Update": "2020-03-01T23:23:02",
    "Confirmed": "20",
    "Deaths": "2",
    "Recovered": "1",
    ...
  },
  {
    "Province/State": "Guangdong",
    "Country/Region": "Mainland China",
    "Last Update": "2020-03-01T14:13:18",
    "Confirmed": "20",
    "Deaths": "3",
    "Recovered": "10",
    ...
  },
  /* and the rest.... */
]
Their data was originally in CSV format, so I wrote my own little Python script that feeds on these CSV files and JSONifies them fully. From there, I could see an immediate pattern.
It reveals that daily cases are collected from every state and province in each country, each individually reporting case numbers categorised as confirmed, recovered or deaths. With this, my immediate thought was that this would be a good exercise to put Python's FP tools to use.
To start off, I quickly whipped up my own mini API app using Flask/Falcon - (which one doesn't really matter, to be honest)
import re
import json

import falcon

'''
Start a Flask/Falcon app starting with a resource endpoint:
'''

class DailyCovid19Resource(object):
    '''
    Pick one of the covid19 daily JSON resources as a date format
    '''
    def on_get(self, req, resp, file_id):
        try:
            format_matched = re.match(
                "((0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-[12]\d{3})",
                file_id)  # mm-dd-yyyy format
            if not format_matched:
                raise ValueError('wrong file_id pattern')
            data = fetch_json_data(file_id)
            country_list = []
            for dic in data:
                if 'Country_Region' in dic:
                    country_list.append(dic['Country_Region'])
                elif 'Country/Region' in dic:
                    country_list.append(dic['Country/Region'])
            unique_countries = list(set(country_list))
            new_data = list(map(get_covid19_schema, data))  # -> this is where it's starting to get interesting
            ......
        ......

api = application = falcon.API()
daily_covid19 = DailyCovid19Resource()
api.add_route('/covid19/{file_id}', daily_covid19)
In this example, I made my resource endpoint covid19, and it expects file_id as the main parameter used to query the specific json file to be fetched from the server.
In order to make the right (and exact) search for the specific covid19 json file, I decided to add a bit of regex here just to make sure file_id matches the date format my json files are named after, ie mm-dd-yyyy. That way, the Python exception handler captures the error should file_id fail to meet the regex pattern; the raised ValueError halts the entire GET resource operation.
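To see the date-format guard in isolation, here's a small sketch using the same pattern as the resource above (the `^`/`$` anchors and helper name are my additions for illustration, so trailing junk is also rejected):

```python
import re

# mm-dd-yyyy guard, as used by the on_get handler above.
# Note: this only checks the *shape* of the date, not calendar validity.
DATE_RE = re.compile(r"^(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-[12]\d{3}$")

def is_valid_file_id(file_id):
    return bool(DATE_RE.match(file_id))
```

With this, `is_valid_file_id("03-01-2020")` passes while `"13-01-2020"` or `"3-1-2020"` are rejected before any file lookup happens.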
Once the file_id matching completes, we can make the fetch_json_data call (as below) to fetch the correct json file off the server,
# fetch json data file and serialise as json format
def fetch_json_data(file_id):
    source_file = './some_folder/{}.json'.format(file_id)
    with open(source_file) as covid19_json_file:
        data = json.load(covid19_json_file)
    return data
# list of unique countries
unique_countries = list(set(country_list))
Once the data is returned from the fetch_json_data call, I start building my list of unique countries using the List and Set combo. This new list will be used later to traverse the countries' covid19 case data, matching on the dict key Country_Region or Country/Region. There are two possible keys because the Johns Hopkins University dataset revised its representation between the early Coronavirus cases of January 2020 and March 2020.
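Here's a minimal sketch of that de-duplication step on its own, handling both key spellings (the sample rows below are made up for illustration):

```python
# Made-up rows showing both key spellings found in the dataset.
rows = [
    {"Country/Region": "Mainland China", "Confirmed": "1000"},
    {"Country_Region": "Italy", "Confirmed": "20"},
    {"Country/Region": "Mainland China", "Confirmed": "20"},
]

country_list = []
for dic in rows:
    if "Country_Region" in dic:
        country_list.append(dic["Country_Region"])
    elif "Country/Region" in dic:
        country_list.append(dic["Country/Region"])

# set() drops the duplicates; list() gives us an iterable list back
unique_countries = list(set(country_list))
```

Note that a set has no guaranteed ordering, so sort the result if you need deterministic output.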
Beforehand, I need to determine the covid19 schema dictionary I want to extract from each item in the array, using get_covid19_schema like so.
# Map function to perform against the list to collect the row's properties we want
new_data = list(map(get_covid19_schema, data))
......

def get_covid19_schema(item_dict):
    schema_keys = ['Country_Region', 'Country/Region',
                   'Confirmed', 'Deaths', 'Recovered']
    mapped_dict = dict((k, item_dict[k]) for k in schema_keys if k in item_dict)
    converted_dict = convert_values_to_int(mapped_dict)
    return converted_dict

# as some of the figures don't have numerical values
def convert_values_to_int(item_dict):
    keys = ['Confirmed', 'Deaths', 'Recovered']
    for k in keys:
        # fix data that has empty string
        item_dict[k] = int(item_dict[k]) if item_dict[k] != "" else 0
    return item_dict
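The key-subset trick inside get_covid19_schema is worth seeing on its own. A quick sketch on a made-up row:

```python
# Build a new dict containing only the keys we care about, silently
# skipping any key missing from the source row.
row = {"Country/Region": "Italy", "Confirmed": "20", "Deaths": "2",
       "Last Update": "2020-03-01T23:23:02"}
schema_keys = ["Country_Region", "Country/Region",
               "Confirmed", "Deaths", "Recovered"]
mapped = dict((k, row[k]) for k in schema_keys if k in row)
```

The `if k in item_dict` guard is what lets one schema cope with both dataset revisions: whichever country key exists is kept, the other is simply absent.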
See how I'm using the map function by passing both a function and an iterable as parameters? This tells map to apply the function to each element of the iterable and return the results. According to the API docs, map returns a map object - a lazy iterator. Meaning it's only a proxy result, and we need an intermediary call (list, set or tuple) to materialise it into an actual collection of transformed elements. In this case, it's going to be a list of my dictionary results.
This is fascinating because, coming from the JS world, we don't need intermediary calls when we use map, as JS doesn't make us juggle the extra data structures (ES6 Sets anyone 🤔?) that Python has. Arrays are the most basic and most popular structure JS developers use by default, so it's easy to see why we never had to worry about the lazy, proxy-like result set Python makes you come to grips with… 😶
Later on, now that we've got the new_data list of countries - each with different states/provinces holding different covid19 case data - the next job is to aggregate the total number of covid19 cases per category for each unique country, which I get from my unique_countries list. Each country's aggregate numbers take this form.
{ "CountryRegion": "Some Country", "Confirmed": "total x Numbers", "Deaths": "total y Numbers", "Recovered": "total z Numbers",}
Then, I put them altogether into one big list as a payload response to the client side.
To achieve this goal, I thought of using filter and reduce for this…
And this is what I came up with.
from functools import reduce

new_data_with_sums = []
for uc in unique_countries:
    country_total_dict = reduce(
        (lambda x, y: sum_up_totals_by(uc, x, y)),
        list(filter(lambda x: filter_by_country(uc, x), new_data)))
    new_data_with_sums.append(country_total_dict)
There's a lot happening here - which I'll explain.
1. Loop through each uc from the list of unique countries.
2. Filter the new_data list of country cases down to the selected country and its individual covid19 cases. Each set of returned uc cases needs converting into a list, because some countries in the data set have several provinces/states (ie China, Russia or USA) whose total case numbers are broken down by that geographical distinction, as in this example.
3. Reduce the filtered list into a single totals dict and append it to the new_data_with_sums list.
4. Repeat for the next country in the unique_countries list.
list.# This is Step 2 operationfiltered_data_by_uc = list(filter(lambda x: filter_by_country(uc, x), new_data)
The above snippet says: with the new_data list, I apply the filter function, passing a filter_by_country callback that is responsible for picking out the entries matching the given country. To filter the data correctly, I rely on closures over uc (gathered from the unique_countries loop earlier) and the current element (x) being iterated upon.
This is equivalent to JS snippet below.
const filteredList = new_data.filter(x => filterByCountry(uc, x));
Notice they're conceptually the same, but the syntax between the two clearly differs. The lambda keyword is synonymous with anonymous functions in Javascript. Both make use of closures, so the two have that much in common.
And notice we put list as a wrapper around the filter function? filter also returns a lazy iterator (a filter object), just like the map example earlier, and you can materialise it into any collection type you want as well.
Here's the filter_by_country implementation.
def filter_by_country(uc, item_dict):
    country_keys = ['Country_Region', 'Country/Region']
    for k in country_keys:
        if k in item_dict and item_dict[k] == uc:
            return item_dict
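One subtlety worth calling out: filter keeps the elements for which the callback returns a truthy value, so returning the matching dict itself (and implicitly None otherwise) works as a predicate here. A self-contained sketch with made-up rows:

```python
# filter() keeps elements whose callback result is truthy; returning the
# matching dict (or implicitly None) therefore acts as a predicate.
def filter_by_country(uc, item_dict):
    country_keys = ["Country_Region", "Country/Region"]
    for k in country_keys:
        if k in item_dict and item_dict[k] == uc:
            return item_dict

rows = [{"Country/Region": "Italy"},
        {"Country_Region": "Italy"},
        {"Country/Region": "South Korea"}]
italy_rows = list(filter(lambda x: filter_by_country("Italy", x), rows))
```

A plain boolean return would be more idiomatic, but truthiness makes this version work all the same.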
With that in mind, we take filtered_data_by_uc and pipe it into our next action step.
# Step 3
from functools import reduce

country_total_dict = reduce(
    (lambda x, y: sum_up_totals_by(uc, x, y)),
    filtered_data_by_uc)
Here, we use the reduce function to take in the filtered_data_by_uc list and aggregate each entry's covid19 case numbers (via the sum_up_totals_by callback) to get the total sums of confirmed, recovered and death cases, producing a standalone dict object for that particular country. Here's the implementation of sum_up_totals_by.
def sum_up_totals_by(uc, x, y):
    keys = ['Confirmed', 'Deaths', 'Recovered']
    result = {'Country_Region': uc}
    for k in keys:
        result[k] = x[k] + y[k]
    return result
Again, the reduce function signature is pretty much the same as its filter and map counterparts, ie using lambdas, closures etc - the minor difference being that it doesn't return a list or similar iterable object, but rather a single value (or an accumulator, if you like) after applying the lambda function as above.
This comes off as expected as any FP developers from Haskell, Scala, Erlang, Clojure communities etc would tell you that’s how they live and breathe writing this kind of code - like a boss 😎.
But with the reduce function, you can also return an iterable object as the accumulator result. Which led me to think I didn't need the for-loop to append each unique country's aggregate results into new_data_with_sums. I could simply rewrite it to this.
from functools import reduce

new_data_with_sums = []
# notice the third argument is introduced to this reducer function
new_data_with_sums = reduce(
    lambda acc, current: sum_up_totals_each_country(acc, current, new_data),
    unique_countries,
    new_data_with_sums)
With the introduction of the new sum_up_totals_each_country lambda function, I've basically moved the filter_by_country filtering inside that function, where (by using closures and callbacks) it gains access to the list of unique_countries and the raw new_data to perform the list filtering from there.
Once the filtering is complete, using acc (which is a proxy to new_data_with_sums), I traverse the filtered list to accumulate the total case results for a country that has multiple states and provinces, and append the result to acc. Here's the implementation of sum_up_totals_each_country.
def sum_up_totals_each_country(acc, current, new_data):
    keys = ['Confirmed', 'Deaths', 'Recovered']
    # the same list filtering as before
    each_country_covid19_list = list(
        filter(lambda x: filter_by_country(current, x), new_data))
    # some countries have multiple states and provinces of data, hence this nested looping
    country_case_totals = {}
    for each_country_covid19 in each_country_covid19_list:
        for k in keys:
            temp = each_country_covid19[k]
            if k in country_case_totals:
                country_case_totals[k] = temp + country_case_totals[k]
            else:
                country_case_totals[k] = temp
    # fixed property name as I don't need another alias 'Country/Region' in the response payload
    country_case_totals['Country_Region'] = current
    acc.append(country_case_totals)
    return acc
And the rest is history.
Notice how Python's very own built-in functional tools (map, filter and reduce) are slightly different from their Javascript counterparts - not at the conceptual level, but at the syntactical level.
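To make that contrast concrete, here is my own sketch of the totalSAHPersonScore chain from the start of the post, translated into Python's equivalents (same made-up data; nested calls instead of method chaining):

```python
from functools import reduce

# Same made-up data as the JS example at the top of the post.
stay_at_home_persons = [
    {"id": 1, "name": "Josh Hamish", "cookingScore": 90, "exerciseScore": 5, "isSelfIsolating": True},
    {"id": 2, "name": "Blake Lively", "cookingScore": 10, "exerciseScore": 80, "isSelfIsolating": True},
    {"id": 3, "name": "Ken Jeong", "cookingScore": 0, "exerciseScore": 90, "isSelfIsolating": False},
]

# JS chains .filter().map().reduce(); Python nests the calls inside out.
total_sah_person_score = reduce(
    lambda acc, score: acc + score,
    map(lambda p: p["cookingScore"] + p["exerciseScore"],
        filter(lambda p: p["isSelfIsolating"], stay_at_home_persons)),
    0)
```

Reading inside-out instead of left-to-right is the whole syntactic difference in one screenful.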
Though both Javascript and Python supposedly treat functions as first-class citizens (unlike Java, where everything is 100% OOD - boo!), the thing that stands out as most different between the two is that Javascript supports method chaining on iterables right out of the box.
Python doesn't do that by design for its own set of iterables when doing functional programming. I always wondered why I had to write the code above by wrapping functions one on top of another… which led me to accept that Python's design philosophy was driven in an imperative style for a long time. It was never meant to go down the FP route, so says this blog post from the original Python creator himself. Reading that may explain the deep culture of imperative-style coding Pythonistas have practised for a very long time, using tools like itertools and list comprehensions for more efficient data looping instead.
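For completeness, here's a small sketch (my toy numbers) of how a list comprehension expresses the same filter-then-map combination the more Pythonic way:

```python
numbers = [1, -2, 3, -4, 5]

# map()/filter() style: nested calls, materialised with list()
via_functions = list(map(lambda x: x * 2, filter(lambda x: x > 0, numbers)))

# list-comprehension style: the same filter + map in one expression
via_comprehension = [x * 2 for x in numbers if x > 0]
```

Both produce the same list; the comprehension just reads left-to-right, closer to the JS chaining style.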
In spite of that, it's still evident we are given the same FP toolset to accomplish here what I have been doing on the JS/React front for some considerable time. IMHO, given how multi-paradigm languages evolve and influence each other (just like how Python influenced Javascript to take on Pythonic flavours), and now that Python 2 was sunset at the beginning of 2020, who knows where Python 3 will move from here. I strongly believe you can still teach an old dog new tricks. If Javascript can learn to be a Pythonista by day, I don't see why Python can't come out of its comfort shell and learn a new toolset from other developer communities that addresses some of its limitations for modern problems.
I'm pretty much excited about what lies ahead for Python and many others. I can't wait to start using more of these toolsets in future projects. (Scala anyone? 😌😉)
Till then, Happy Coding, (and remember: stay safe and do your social distance-coding rules)!
And more often than not, those same Javascript modules are not always clear when you spend time determining the inter- or intra-relationships between them.
For eg,
A developer (let's call him Jake) works on modules A, B and C, and discovers the following dependency pattern.
Module A depends on Module B and C,
Module B depends on Module C
Module C doesn't depend on anything.
From this, Jake would assume he only needs to worry about the core functionality shared between A, B and C in linear fashion… and nothing else.
But this is a very bad assumption, because as he searches through the codebase, he realises that other modules in the system are also relying on Modules B and C - and he's never worked on those other modules before.
And yet, he has to touch these modules, because any change he introduces in Module B/C not only has to work with Module A, but must not break Module XXX etc that depends on B/C as well. Or… he may not know anything about Module XXX at all, and would like a better way to communicate to his teammates how strongly interlinked this module is across the entire front-end platform…
Sure… you may whip up VS Code and do a global search of the entire codebase to find the relevant modules you want, like so
project
│
└───A
│   │   A.js
│
└───B
│   │   B.js
│
└───C
    │   C.js
You still can't clearly figure out at the outset how they are supposed to relate to each other visually when navigating the flow of data from one module to the next (or to several modules).
You want to see the bigger picture… The bigger picture where we want to see ALL the dependencies and ALL files at the same time.
Thankfully, there is a solution for this.
That solution is called Madge.
According to its main Github headline,
Create graphs from your CommonJS, AMD or ES6 module dependencies.
Nuff said.
It is simply a fantastic tool that helps you build visual graphs of your Javascript dependency tree whenever you wish to discover the interdependent relationships between modules across your whole JS system - on the fly.
The short version of this:
For example, if you git clone the madge library itself, run npm install, and then run madge on its own source, you will see the graph below.
You may need to install graphviz to view this.
Which is great for small projects.
But… what about bigger ones?
Such as the one from Moment JS here.
☝️ That’s pretty insane right?
So, you can see the visual dependency graph can grow many layers deep, with dependency tree branches reaching as far as the codebase goes. From this, as a developer working in a sizable team, you want to know where modules have been moved around or changed by other developers in the codebase.
You can, of course, make the depth of the dependency graph as shallow or as deep as you want.
Madge allows you to:
With all of this, you can set the output of the visualisation graph to different formats such as PNG, SVG or even JSON.
It works with all types of JS frameworks, ie React, Angular, VueJS, Svelte, Express, etc.
That’s it, folks! Have that as another awesome tooling to add as part of your JS toolkit belt.
Till then, Happy Coding!
From these observations, we programmers developed our conversations on design patterns for making scalable software solutions.
In particular with JS - with the influx of JS libraries, frameworks and tools, we can build applications that solve a particular problem in many different ways. But no matter which tooling JS developers choose, there's no substitute for incorporating useful patterns into your code design where you see fit.
Thus the question remains - what are the common and useful design patterns that modern JS developers will be dealing with on a day-to-day basis?
At the very core of everything, after working in the JS space for several years, these are the ones I think you should learn and be aware of:
This pattern is well known for creating and initializing new objects when memory is allocated. Like a traditional object-oriented purist language such as Java or C#, Javascript shares the same object constructor blueprint.
Thus, prior to ES6/7/8/9, we had the following traditional way to do things.
// Constructor Pattern - old school approach

/* Option 1 - using Object.create function call and object literal presentation */
var shirt = {
  colour: "red",
  size: "S",
  price: 3.5
}

var blueshirt = Object.create(shirt)
blueshirt.colour = 'blue'
blueshirt.size = "M"
blueshirt.price = 5.5

/* Option 2 - use the 'new' keyword */
function Shirt(colour, size, price) {
  this.colour = colour;
  this.size = size;
  this.price = price;
  this.toString = function() {
    return this.colour + " shirt has size " + this.size + ", selling for $" + this.price;
  }
}

var blueshirt = new Shirt("blue", "M", 3.5)
var redshirt = new Shirt("red", "L", 5.5)
Back in my BackboneJS web development days several years ago, this was the most common approach for creating your Models, Collections, Controllers etc. You'd notice this pattern being very apparent all over the place.
But since then the world has moved on, ES6 arrived, and we now have the following way to achieve the same thing using the new class and constructor approach.
// Constructor Pattern - ES6 approach
class Shirt {
  constructor(colour, size, price) {
    this.colour = colour;
    this.size = size;
    this.price = price;
  }
  toString() {
    return `${this.colour} shirt has size ${this.size}, selling for $${this.price}`
  }
}

let blueShirt = new Shirt("blue", "M", 3.5)
let redShirt = new Shirt("red", "L", 5.5)
You'll find that most modern JS frameworks from the post-ES6 era embed this constructor pattern all over the place - React, VueJS, AngularJS. Google "some_js_framework_name constructor example" and you'll see what I mean.
Next, we have another variation of the object creation patterns, but this one makes use of JS prototypical inheritance, whereby objects are created from prototypes for each object type. Prototypes act as the blueprint for each object constructor created. Prior to ES6, this was normally done using prototype bindings.
// Prototype Pattern - old school approach
var Shirt = function(colour, size, price) {
  this.colour = colour;
  this.size = size;
  this.price = price;
}

Shirt.prototype = {
  changeColour: function(newColour) {
    this.colour = newColour;
  },
  changeSize: function(newSize) {
    this.size = newSize;
  },
  changePrice: function(newPrice) {
    this.price = newPrice;
  }
}

var blueShirt = new Shirt("blue", "M", 5.5)
blueShirt.changeColour("red");
blueShirt.changeSize("L");
blueShirt.changePrice(3.5);
console.log(blueShirt); // outputs - Shirt {colour: "red", size: "L", price: 3.5}
Its ES6 counterpart is the same as its constructor pattern example above.
// Prototype Pattern - ES6 way
class Shirt {
  constructor(colour, size, price) {
    this.colour = colour;
    this.size = size;
    this.price = price;
  }
  changeColour(newColour) {}
  changeSize(newSize) {}
  changePrice(newPrice) {}
  toString() {
    return `${this.colour} shirt has size ${this.size}, selling for $${this.price}`
  }
}
The reason for this is that class and constructor are just syntactic sugar over prototype bindings as part of the internal mechanics. This option gives us the freedom to write cleaner code and the confidence to write true object-oriented code, so developers coming from Java, C# etc will feel more at ease. For the same reasons, you can now emulate object inheritance using the extends keyword as well.
// classic OOP inheritance example
class Employee {
  constructor(name, salary, tax_rate) {
    this.name = name;
    this.salary = salary;
    this.tax_rate = tax_rate;
    this.total_working_hours = 40
  }
  calculateSalary() {
    // some calculations to perform here
  }
}

class FullTimer extends Employee {
  // properties to describe full-timer's role
}

class Contractor extends Employee {
  // properties to describe contractor's role
}
This pattern is an improvement on the prototype pattern approach. Different access levels (both private and public) can be expressed in the module pattern, and you can create similarly named functions or properties without conflicts.
// Module Design Pattern - old school approach
var Shirt = (function() {
  // private variables
  var _colour = "blue";
  var _size = "M";
  var _price = 5.5;

  // public methods and properties
  return {
    colour: _colour,
    size: _size,
    price: _price,
    changeColour: function(newColour) {
      _colour = newColour;
    },
    changeSize: function(newSize) {
      _size = newSize;
    },
    changePrice: function(newPrice) {
      _price = newPrice;
    }
  }
})();
Notice, in the above, my private variable declarations are denoted with _ prefixes. This is more of a convention than an actual language feature, as JS does not, at the time of writing, have a way to declare private variables/properties like you do in Java, C# etc. Just adding that for clarity.
Again, you can find similar patterns of these used in several jQuery-influenced JS frameworks such as BackboneJS etc.
After ES6 came along, we now use the import/export mechanism instead. Thus we get:
// Module Design Pattern - ES6 way
// Saved in some-module-design-pattern-es6.js
class Shirt {
  constructor(colour, size, price) {
    this.colour = colour;
    this.size = size;
    this.price = price;
  }
  changeColour(newColour) {}
  changeSize(newSize) {}
  changePrice(newPrice) {}
  toString() {
    return `${this.colour} shirt has size ${this.size}, selling for $${this.price}`
  }
}

export default Shirt;

// usage
import Shirt from './some-module-design-pattern-es6';
let blueShirt = new Shirt("blue", "M", 5.5)
This pattern is very useful when objects need to communicate with other sets of objects simultaneously. This is particularly true where there is shared data/state that gets changed/updated across domain objects subscribed to listen and respond to changes.
With this pattern, there is no unnecessary push and pull of events across the states, but rather the modules involved only modify the current state of data.
```javascript
// Observer Pattern - old school approach
function Observer() {
  this.observerList = [];
}

Observer.prototype = {
  subscribe: function(element) {
    this.observerList.push(element);
  },
  unsubscribe: function(element) {
    var elementIndex = this.observerList.indexOf(element);
    if (elementIndex > -1) {
      this.observerList.splice(elementIndex, 1);
    }
  },
  notifyAll: function() {
    this.observerList.forEach(function(observerElement) {
      console.log("observerElement: " + observerElement.name + " has been notified");
    });
  }
};
```
Its ES6 equivalent:
```javascript
// Observer Pattern - ES6 way
// again using class and constructor combo..
class Observer {
  constructor() {
    this.observerList = [];
  }
  subscribe(element) {
    this.observerList.push(element);
  }
  unsubscribe(element) {
    let elementIndex = this.observerList.indexOf(element);
    if (elementIndex > -1) {
      this.observerList.splice(elementIndex, 1);
    }
  }
  notifyAll() {
    this.observerList.forEach(function(observerElement) {
      console.log("observerElement: " + observerElement.name + " has been notified");
    });
  }
}
```
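To see the subscribe/notify flow end to end, here is a small self-contained usage sketch (the `name` property on the subscribers is just there for the log output):

```javascript
// Minimal observer subject with two subscribers
class Subject {
  constructor() { this.observerList = []; }
  subscribe(element) { this.observerList.push(element); }
  unsubscribe(element) {
    const i = this.observerList.indexOf(element);
    if (i > -1) this.observerList.splice(i, 1);
  }
  notifyAll() {
    this.observerList.forEach(o => console.log(o.name + " has been notified"));
  }
}

const subject = new Subject();
const alice = { name: "alice" };
const bob = { name: "bob" };

subject.subscribe(alice);
subject.subscribe(bob);
subject.notifyAll();      // alice and bob are both notified

subject.unsubscribe(bob);
subject.notifyAll();      // only alice is notified now
```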
You will find this observer pattern used very heavily across JS apps in many of the older JS frameworks, as I explained in my earlier examples in this blog post.
Lastly, this pattern aims to promote code re-use. It offers the ability to dynamically add extra behaviour or features to existing classes in a system.

It is common to find applications (especially in the object-oriented world) that contain features requiring a large number of distinct object types to manage. Keeping track of every object definition and creation would be a monumental task at any given point during the running lifetime of an app.
```javascript
// Decorator Pattern - old school approach
function Shirt(brandName) {
  this.brandName = brandName;
  this.colour = function() { return "blue"; };
  this.size = function() { return "M"; };
  this.price = function() { return 3; };
}

function swapColour(shirt) {
  var c = shirt.colour();
  shirt.colour = function() {
    c = "blue";
    return c;
  };
}

function changeSize(shirt) {
  var s = shirt.size();
  shirt.size = function() {
    s = "S";
    return s;
  };
}

function addCostToPrice(shirt) {
  var p = shirt.price();
  shirt.price = function() {
    return p + 1.50;
  };
}

var shirt = new Shirt("Gucci");
swapColour(shirt);
changeSize(shirt);
addCostToPrice(shirt);
console.log(shirt.colour()); // blue
console.log(shirt.size());   // S
console.log(shirt.price());  // 4.5
```
Its ES6 equivalent would be:
```javascript
// Decorator Pattern - ES6 approach (again using class and constructor combo)
class Shirt {
  constructor(brandName) {
    this.brandName = brandName;
    this.colour = "blue";
    this.size = "M";
    this.price = 3;
  }
}

function swapColour(shirt) {
  shirt.colour = "white";
  return shirt;
}

function changeSize(shirt) {
  shirt.size = "L";
  return shirt;
}

function addCostToPrice(shirt) {
  shirt.price += 1.50;
  return shirt;
}

const defaultShirt = new Shirt("Gucci");

const whiteShirt = swapColour(defaultShirt);
console.log(whiteShirt);
// Shirt {brandName: "Gucci", colour: "white", size: "M", price: 3}

const whiteLargeShirt = changeSize(swapColour(defaultShirt));
console.log(whiteLargeShirt);
// Shirt {brandName: "Gucci", colour: "white", size: "L", price: 3}

const priceyWhiteLargeShirt = addCostToPrice(changeSize(swapColour(defaultShirt)));
console.log(priceyWhiteLargeShirt);
// Shirt {brandName: "Gucci", colour: "white", size: "L", price: 4.5}
```
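Because each decorator takes a shirt and returns a shirt, they compose naturally. A small sketch of that (the `compose` helper is my own addition, not part of the pattern itself):

```javascript
// Right-to-left function composition helper
const compose = (...fns) => x => fns.reduceRight((acc, fn) => fn(acc), x);

// Decorators that return a decorated copy instead of mutating the original
const swapColour = shirt => ({ ...shirt, colour: "white" });
const changeSize = shirt => ({ ...shirt, size: "L" });
const addCostToPrice = shirt => ({ ...shirt, price: shirt.price + 1.5 });

const decorate = compose(addCostToPrice, changeSize, swapColour);
const shirt = decorate({ brandName: "Gucci", colour: "blue", size: "M", price: 3 });

console.log(shirt); // { brandName: 'Gucci', colour: 'white', size: 'L', price: 4.5 }
```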
With all the examples above, what you may realise is that these patterns share one thing in common: they come from the object-oriented programming (OOP) design paradigm.

They are designed to revolve around object-oriented software systems. You can find examples of them throughout jQuery- and BackboneJS-based codebases.
In recent years, we've been told to get educated in building applications using a different programming paradigm: functional programming (FP).

Without a doubt, functional developers and advocates are making plenty of noise, and a language like JS is no stranger to the concept.

Interestingly enough, Javascript treats functions as first-class citizens alongside its object-oriented aspects.

Thus, modern frameworks such as React are appearing that embrace this fully.
The key question is: what common design patterns can you employ in the functional programming world, particularly with React?
Well.
There are plenty of them to describe here.
Examples of React patterns:
Function components are, as plainly described, just simple functions that return JSX.
Here’s an example.
```javascript
// Functional Components
const Greeting = () => <div>Hello small world!</div>

// Functional Components with props
const Greeting = (props) => <div>Hello {props.name}!</div>
```
What we're saying here is that we're not keeping state in the component; we always pass props down to the functions so that they are reliably testable.
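Because such a component is a pure function of its props, you can check it with nothing more than a function call. In this sketch the component returns a plain string purely for illustration; real components return JSX:

```javascript
// A "component" as a pure function of props (string output for illustration)
const Greeting = ({ name }) => "Hello " + name + "!";

// Same props in, same output out - no rendering framework needed to verify it
console.log(Greeting({ name: "small world" })); // Hello small world!
```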
A component is said to be class-based when it has data and attributes associated with the object's state. Stateful components are usually class-based components, using things like constructor properties etc.
```javascript
// Class Based Component
import {Component} from "react";

class NumberRandomGenerator extends Component {
  state = {
    number: Math.random()
  }

  render() {
    return (
      <DataResult
        result={this.state.number}
        title={'Your number is:'}
        onClick={() => this.setState({number: Math.random()})}
      />
    );
  }
}
```
Higher Order Components (HOC) are said to be a design pattern also known as the Decorator Pattern. Commonly, in ReactJS, a HOC is a component that wraps another component, adding extra functionality or extra properties. This allows abstraction of commonly used logic and keeps your code DRY. It is how you distribute complex component structure between other components in ReactJS, and a way to decouple your application logic and UI.
```javascript
// Presentational Components or HOC Components
import {Component} from "react";

export const addSomeData = (WrappedComponent) => {
  return class extends Component {
    // state supplying the injected props (values here are placeholders)
    state = {number: 42, name: "Andy"}

    render() {
      return (
        <WrappedComponent
          number={this.state.number}
          name={this.state.name}
          {...this.props}
        />
      );
    }
  }
}

export const withNumberAndName = addSomeData(DataResult)
```
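Stripped of React specifics, a HOC is just a function that takes a component and returns a new component with extra props baked in. A plain-JS sketch of that shape (the names here are made up for illustration):

```javascript
// A "component": a function from props to output
const NameTag = props => "I am " + props.name;

// HOC: wraps a component and injects a default prop
const withDefaultName = WrappedComponent => props =>
  WrappedComponent({ name: "anonymous", ...props });

const SafeNameTag = withDefaultName(NameTag);

console.log(SafeNameTag({}));               // I am anonymous
console.log(SafeNameTag({ name: "Andy" })); // I am Andy
```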
Container components' sole responsibility is to hold the logic that sets state, or functions that emit events up to a parent component. The general rule of thumb is to keep your component as simple as possible, with the Single Responsibility Principle in mind: your component must do one thing, but do it well.

Most often, these types of components are the HOCs that accommodate a few presentational components.
```javascript
// Container Components
import {Component} from "react";

class CommentListComponent extends Component {
  state = {
    comments: []
  }

  componentDidMount = async () => {
    try {
      const response = await fetch('https://someurl.com');
      const result = await response.json();
      this.setState({comments: result});
    } catch(error) {
      console.error('Error:', error);
    }
  }

  render() {
    return <CommentList comments={this.state.comments} />;
  }
}
```
Render props is a technique or pattern used to share code between components using a prop whose value is a function.

It is most helpful in sharing cross-cutting concerns, allowing you to share and re-use patterns and logic across components.
Here’s one example
```javascript
// Render Props Pattern - based on the official docs - https://reactjs.org/docs/render-props.html
import {Component, Fragment} from "react";
import moment from "moment";

class Watch extends Component {
  state = {
    date: moment()
  }

  componentDidMount = () => (this.TICK = setInterval(this.update, 1000))

  componentWillUnmount = () => clearInterval(this.TICK)

  update = () => this.setState({date: moment()})

  render = () => (
    <div>
      {this.props.render(this.state.date)}
    </div>
  )
}

const AnalogFace = ({date}) => {
  const seconds = (360 / 60) * date.seconds();
  const minutes = (360 / 60) * date.minutes();
  const hours = (360 / 12) * date.format('h');

  return (
    <Fragment>
      <span>{seconds}</span>
      <span>{minutes}</span>
      <span>{hours}</span>
    </Fragment>
  );
}

class App extends Component {
  render = () => (
    <Fragment>
      <h1>Checkout this cool watch</h1>
      <Watch render={date => <AnalogFace date={date} />} />
    </Fragment>
  )
}
```
Now, with the examples I provided in this post, I would like to point out that this is not about memorising these patterns by heart and applying them in every situation. Some patterns are good for some things; some are not. This is also, by far, not the most exhaustive list of design patterns you will ever need. There are many more patterns out there yet to explore and get accustomed to, such as the ones from (in the OOP world) the Gang of Four's Design Patterns: Elements of Reusable Object-Oriented Software, or (in the FP world) various online academic reads and learning resources, such as this Github link as an example.
I am just sharing my years of experience working in the JS space, having been on both sides of the fence: one being the OOP world, the other the FP world. There is an array of battle-tested software design patterns that developers and engineers have used over the years, regardless of which JS framework (React/Redux/VueJS/Angular etc.) you'll be using.
The key takeaway is to always be mindfully aware of the design patterns you will see and use over and over again throughout your JS software development career. They come in several shapes and forms, and they have a major influence on the present and future libraries/frameworks/tools you will come across in every software project you do. What better way to get your hands dirty than to start with the very basics of them.
From there, once you master them over time, you will evolve to get better and sharper in writing amazingly brilliant software as part of your craft! 🚀🚀🚀🚀🚀
Till then, Happy Coding!
In fact, every full stack developer will tell you their stories and trivialities of working intimately with databases every day.
So the best way to work with them is to know plenty of SQL clauses such as `SELECT`, `GROUP BY`, `FROM`, `WHERE` etc., which is paramount without question.
Knowing such basic skills allows you to work with disparate industry-standard relational database technologies such as MySQL, MS SQL, Oracle DB, Postgres, and many more.
But what I discovered recently is that there is a new tool that has been slowly introduced across these relational database technologies.

For the first time, you can now generate JSONified results from SQL statements.
What does this mean exactly?
It simply means that instead of running a SQL statement that traditionally returns a resultset in table format, which your web app then had to translate to make sense of the data, you can have the database return the same resultset as a JSON payload!
How is that so?
Let me show you how.
Let’s say you’re working with Postgres DB using the following query.
```sql
/* Some simple table query we're running... */
SELECT customer.id, customer.first_name, customer.last_name, customer.dob
FROM customer;
```
As we know, with the above statement, our web app or mobile app or similar will expect a table and to make an appropriate decision on how to handle the purpose of the data.
Now, we decided we can make this into JSON object results.
We use the `row_to_json()` function:
```sql
/* Now let's throw in some JSON magic here.. */
SELECT row_to_json(jt)
FROM (
  SELECT customer.id, customer.first_name, customer.last_name, customer.dob
  FROM customer
) jt;
```
This will result in the following JSON output:

```json
{"id": 1, "first_name": "Andy", "last_name": "Wong", "dob": "19xx-08-05"}
```

(not telling you my real age ;)
`row_to_json` converts each row of the subquery into a JSON object, so you get one JSON object back per row; the snippet above shows the first of them.
But what if you want all the rows of the table combined into a single JSON payload?
If that's the case, you use the `array_agg` and `array_to_json` functions.
```sql
/* Ready for full blown JSON-ripped data */
SELECT array_to_json(array_agg(row_to_json(jt)))
FROM (
  SELECT customer.id, customer.first_name, customer.last_name, customer.dob
  FROM customer
) jt;
```
Running the above will give you a JSON array of objects.

```json
[
  {"id": 1, "first_name": "Andy", "last_name": "Wong", "dob": "19xx-08-05"},
  {"id": 2, "first_name": "Bruce", "last_name": "Lee", "dob": "1940-11-27"},
  {"id": 3, "first_name": "Chuck", "last_name": "Norris", "dob": "1940-03-10"}
]
```
What's happening here is that `array_agg` is an aggregate function, like `count` or `sum`, which aggregates the query results into one PostgreSQL array, while `array_to_json` takes that same PostgreSQL array and flattens it into a single JSON value.
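If it helps, here is a conceptual JS analogue of what the three functions do to a resultset. This is not database code, just an illustration of the shape of the transformation:

```javascript
// Pretend resultset: an array of raw rows
const rows = [
  [1, "Andy", "Wong"],
  [2, "Bruce", "Lee"]
];

// row_to_json: one row -> one object
const rowToJson = ([id, first_name, last_name]) => ({ id, first_name, last_name });

// array_agg: many rows -> one array (aggregation)
const arrayAgg = rs => rs.map(rowToJson);

// array_to_json: one array -> a single JSON value
const arrayToJson = arr => JSON.stringify(arr);

console.log(arrayToJson(arrayAgg(rows)));
// [{"id":1,"first_name":"Andy","last_name":"Wong"},{"id":2,"first_name":"Bruce","last_name":"Lee"}]
```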
This is pretty amazing!!
Do you know what immediate benefit this will mean to you as an application developer?
It means you no longer carry the responsibility of translating the data from a table into JSON-formatted results, such as using an ORM API to deconstruct/reconstruct the resultset.
You get the exact result query you specifically asked for! No data structure to manipulate.
All you then do is have ORM tools such as Node's Sequelize or Python's SQLAlchemy do only one job, which is to connect and fetch the resultset!
Your front-end app that consumes it will have the appropriate data model to match the JSON payload response from the server. And that's it!
```javascript
// just assume the fetch succeeds
const customer_data = await fetch(api_call).then(result => result.json());

console.log(customer_data.length);         // 3
console.log(customer_data[0].id);          // 1
console.log(customer_data[0].first_name);  // Andy
console.log(customer_data[0].last_name);   // Wong
console.log(customer_data[0].dob);         // 19xx-08-05
```
```typescript
// Or the Typescript equivalent model
export interface CustomerModel {
  id: number,
  first_name: string,
  last_name: string,
  dob: Date
}
```
You can take this even further.

Using the same customer query above, what if a customer can have several customer orders? We want the customer orders to appear inside this query as an embedded JSON resultset against each customer JSON object.

Using the same idea we learnt above, we can rewrite it into the following:
```sql
/* Revised query to embed a JSON array of orders against each customer JSON object */
SELECT array_to_json(array_agg(row_to_json(jt)))
FROM (
  SELECT customer.id, customer.first_name, customer.last_name, customer.dob,
    (SELECT array_to_json(array_agg(row_to_json(customer_orders)))
     FROM (
       SELECT orders.id, orders.order_number, orders.description, orders.amount
       FROM orders
       WHERE orders.customer_id = customer.id
     ) customer_orders
    ) AS orders
  FROM customer
) jt;
```
```json
[
  {
    "id": 1, "first_name": "Andy", "last_name": "Wong", "dob": "19xx-08-05",
    "orders": [
      {"id": 1, "order_number": "aw0001", "description": "books", "amount": 90.00},
      {"id": 2, "order_number": "aw0002", "description": "magazines", "amount": 45.00},
      {"id": 3, "order_number": "aw0003", "description": "toys", "amount": 22.75}
    ]
  },
  {
    "id": 2, "first_name": "Bruce", "last_name": "Lee", "dob": "1940-11-27",
    "orders": [
      {"id": 4, "order_number": "aw0004", "description": "books", "amount": 100.00},
      {"id": 5, "order_number": "aw0005", "description": "magazines", "amount": 55.00},
      {"id": 6, "order_number": "aw0006", "description": "toys", "amount": 32.75}
    ]
  },
  {
    "id": 3, "first_name": "Chuck", "last_name": "Norris", "dob": "1940-03-10",
    "orders": [
      {"id": 7, "order_number": "aw0007", "description": "books", "amount": 110.00},
      {"id": 8, "order_number": "aw0008", "description": "magazines", "amount": 65.00},
      {"id": 9, "order_number": "aw0009", "description": "toys", "amount": 42.75}
    ]
  }
]
```
What's going on here is that the inner `SELECT` orders statement uses `array_agg` to aggregate the rows of the orders table matching the current `customer.id` supplied by the outer `SELECT` customers statement. This is a correlated subquery: for each customer row evaluated by the outer statement, the inner orders statement is executed against that customer's id.

At each customer JSON object level, once the customer's orders have been queried, we convert those results back into a JSON array with `array_to_json`, exposing it under the property name `orders`.
That’s it!
That's how you produce an embedded JSON resultset, by nesting one JSON query inside another as above.
The great thing about this setup is that such a query is more performant than making two separate SQL statement calls in the ORM layer, where you would notice increased I/O time reading from the database's disk layer between the two calls.
With the above, you only get to execute one SQL statement so you never need to worry about extended database call duration.
So there you have it, folks!
JSON query tools are now available at your disposal for all your complex and performance-based queries in relational database environments. This includes everything from MySQL, SQLite, Oracle DB, Derby, Amazon Aurora to many more!
But one word of caution: should you decide to go down this path, you may subconsciously trade readability for performance, as you can end up with big SQL statements containing a few layers of embedded queries. So make sure to use these features wisely when producing JSON results from table-structured queries.
Give them a go. ^_^
Till then, Happy Coding!
Built to scale, as they say in the world of startups and venture capital funding.
That product can be anything from a simple portfolio website for an artist or singer, or a basic space invaders game for kids to play online, to a high-grade commercial e-commerce system for thousands, if not millions, of online customers worldwide, or perhaps the next Facebook-scale social media platform!

These software products we are so used to building vary in size. A product can do one or several simple things, or it can be made up of so many moving parts that each is rightfully considered a component doing a very complex job on its own. When the product is a behemoth of a project, you have to think about how a developer is going to meander through the layers of architecture, ensuring that all of these components can work with each other as they are meant to function.
Thus it brings me to this very important subject matter: using dependency injection as one of your core software design principles.
Before we get into that terminology, let's figure out what dependency injection really means, based on this wiki quote:
In software engineering, dependency injection is a technique whereby one object supplies the dependencies of another object. A “dependency” is an object that can be used, for example, a service. Instead of a client specifying which service it will use, something tells the client what service to use. The “injection” refers to the passing of a dependency (a service) into the object (a client) that would use it.
To better explain the above, let’s use the following code setup as a way to demonstrate.
When class A uses some functionality of class B, then it’s said that class A has a dependency of class B.
```javascript
/* Illustration of Class A depends on Class B */

// A.js
import B from './B';

class A {
  b = null;

  constructor() {
    this.b = new B();
  }

  // A's methods that depend on B's methods/properties
  doSomethingWith = (params = {}) => {
    if (this.b.foo) {
      this.b.doSomething(params);
    }
  }

  // more of A's methods
  actionSomethingWith = (params = {}) => {
    if (this.b.bar) {
      this.b.actionSomethingWith(params);
    }
  }
}

// B.js
class B {
  constructor() {
    // some properties go in here
    this.foo = 1;
    this.bar = 1;
  }
  doSomething(params) { /* ... */ }
  actionSomethingWith(params) { /* ... */ }
}

const a = new A();
a.doSomethingWith({a: "foo", b: "bar"});
a.actionSomethingWith({1: "!", 2: "@"});
```
Now with the above example here, we have two distinct classes that communicate with one another.
Here, A is dependent on B's methods, inputs, and properties to do some important tasks. In the majority of cases, this is fine, as long as these snippets of code are not expected to be tested or changed frequently in the future.
But more often than not, the code needs to evolve, and the problem becomes apparent when you start to change the internal behaviour of class B. If any of class B's properties, such as `foo` or `bar`, are renamed, or its method signatures are modified or removed, then the impact of those changes ripples through class A as well, because we have hardcoded B's dependency into A and every usage in A has to change.
Imagine writing software like this for every class that depends on another to perform certain functions: A depends on B, B depends on C, C depends on D, D depends on E, and so on. This becomes an utter nightmare to maintain and change over time, and leaves you no room to swap or remove dependencies at will.
What we have here is an obvious tight coupling of sub-systems with one another. Tightly coupled systems do not bode well for making future incremental changes smoothly. By the time you, as a software engineer, want to make several big modifications to the system, you are likely going to have to re-architect the entire system inside out, just as if you were building it from scratch on day one!
This is not the ideal situation to be in.
Hence, for a long time, the tech industry has had a number of battle-tested, industry-standard software patterns that deal with this, letting our software grow to scale and change incrementally and progressively. One of them is Dependency Injection (DI).
So the question beckons - why does DI matter here?
The long-winded but simple answer is: when you have lots of classes that need to talk to each other (such as above), you want to keep your code organised and make it easy to swap different types of objects in and out of your application at run time.

DI is the middleman that does all that work for you.
The key benefit of using this pattern is that it encourages loose coupling. Objects can be added (or removed) and tested independently of other objects, because they don't depend on anything other than what you pass to them. With traditional hardcoded dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable. With DI, you can test an object in isolation by passing mock objects for the ones you don't need or want to create.
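A minimal sketch of that testing benefit, with a hand-rolled mock standing in for a real dependency (the gateway and service names are made up for illustration):

```javascript
// The client receives its dependency instead of constructing it itself
class OrderService {
  constructor(paymentGateway) {
    this.paymentGateway = paymentGateway;
  }
  checkout(amount) {
    return this.paymentGateway.charge(amount);
  }
}

// In a test, inject a mock - no real payment environment required
const mockGateway = { charge: amount => ({ ok: true, amount }) };
const service = new OrderService(mockGateway);

console.log(service.checkout(42)); // { ok: true, amount: 42 }
```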
At present, our DI pattern strategies come in three types to choose from.
To illustrate these types, let us use a Car analogy.

The first type is where dependencies are provided through a class constructor.
```javascript
/* Constructor Injection DI example */
function Piston(energy, size, noOfPistons) {
  this.energy = energy;
  this.size = size;
  this.noOfPistons = noOfPistons;
}

function Car(wheels, engine) {
  this.wheels = wheels;
  this.engine = engine;
}

Car.prototype.start = function() {
  if (this.engine) {
    this.engine.start();
  }
}

Car.prototype.move = function(direction) {
  if (this.wheels) {
    this.wheels.move(direction);
  }
}

function Engine(pistons) {
  this.pistons = pistons;
}

Engine.prototype.start = function() {
  if (this.pistons) {
    console.log('Engine is starting...');
  }
}

function Wheels(size, shape, noOfWheels) {
  this.size = size;
  this.shape = shape;
  this.noOfWheels = noOfWheels;
}

Wheels.prototype.move = function(direction) {
  console.log('Wheels are moving ' + direction);
}

let wheels = new Wheels(8, 'round', 3);
let pistons = new Piston(400, 10, 10);
let engine = new Engine(pistons);
let car = new Car(wheels, engine);

car.start();         // "Engine is starting..." as long as the car has an engine with working pistons
car.move('forward'); // "Wheels are moving forward" as long as the car has wheels
```
The example above illustrates dependencies injected through the constructor. Written in Javascript, our Car constructor is injected with Wheels and Engine dependencies, and the Engine constructor is injected with a Pistons dependency. Dependency creation is no longer hardcoded at the constructor level: the constructors are no longer responsible for instantiating their dependencies on demand, and do not care how those dependencies are built in the first place. They are simply required as-is.
In the second type, as the name suggests, the client exposes a setter method that the injector uses to inject the dependency. By injector we mean the DI middleman mentioned earlier in this post. To accomplish this, you need a decent injector library that does the work underneath, whose main task is to register all the dependencies for one or several pieces of client code and supply them on demand at run time.
To illustrate this, here’s the following example written in Javascript.
```javascript
/* Setter Injection DI example */
Car.prototype.setEngine = function(engine) {
  this.engine = engine;
  return this;
}

Car.prototype.setWheels = function(wheels) {
  this.wheels = wheels;
  return this;
}

Engine.prototype.setPistons = function(pistons) {
  this.pistons = pistons;
  return this;
}

// Introducing our injector personnel..
// (an imaginary DI library - the fluent API below is illustrative only)
const the_di_guy = require('some-di-injector-library');

the_di_guy
  .register('piston')
    .as(Piston)
    .withConstructor()
    .params().val(400, 10, 10)
  .register('engine')
    .as(Engine)
    .withConstructor()
    .withProperties()
      .func('setPistons')
      .param().ref('piston')
  .register('wheels')
    .as(Wheels)
    .withConstructor()
    .params().val(8, 'round', 3)
  .register('car')
    .as(Car)
    .withConstructor()
    .withProperties()
      .func('setEngine')
      .param().ref('engine')
      .func('setWheels')
      .param().ref('wheels')
```
Here, the Car and Engine objects use setter methods to receive their required dependencies, each returning `this` so that calls can chain. Why are we doing this? Because our imaginary DI library looks at all the available prototype methods of the modules it knows about after registering them, and then injects dependencies based on the setter injection rules we set up at the beginning.
Lastly, in the third type, the dependency provides an injector method that will inject the dependency into any client passed to it. The client must implement an interface that exposes a setter method accepting the dependency.
Unfortunately, at the time of writing, no such pattern exists in the JS world, as JS itself does not do interfaces, unlike more traditional object-oriented languages such as Java and C#. JS and other dynamically-typed languages do not require interfaces because they are 'replaced' by late binding/duck typing. An actual example of interface injection in Java would look something like this:
```java
// for the car
package some.package;

public class Car implements EngineMountable, WheelsMountable {
  private Engine engine;
  private Wheels wheels;

  @Override // dependency injection
  public void setEngine(Engine engine) {
    this.engine = engine;
  }

  @Override // dependency injection
  public void setWheels(Wheels wheels) {
    this.wheels = wheels;
  }
}

public interface EngineMountable {
  void setEngine(Engine engine);
}

public interface WheelsMountable {
  void setWheels(Wheels wheels);
}
```
```java
// for the engine
public class Engine implements PistonsMountable {
  private Pistons pistons;

  @Override // dependency injection
  public void setPistons(Pistons pistons) {
    this.pistons = pistons;
  }
}

public interface PistonsMountable {
  void setPistons(Pistons pistons);
}
```

Now, at this point, you may be wondering. If you come from the world of dynamic languages like Javascript, and given that much of the DI information presented here really stems from prime use cases in traditional object-oriented languages like Java and C#, you might start to beg this question...

Does DI still hold relevance in this modern age of software architecture design, considering how many JS and NodeJS frameworks are made every day of the week?

The answer is yes and no.

It depends on who and what you've heard from the software developer veteran community lately, and which programming "tribe" you belong to.

For example, let's take two popular JS frameworks: React and Angular.

Let's start with Angular, because it's the easier and most likely candidate to benefit from DI.

By and large, Angular is itself a DI framework, if not just a framework for building client-side apps. Because DI is built in, Angular has a robust structure for how injectors should load (or remove) dependencies for the components that require their services. Angular provides a mechanism for components to delegate certain tasks to services, so the components concern themselves with nothing beyond delivering optimal user experiences. These delegated services are phrased as injectable service classes.

For some examples, here are common uses written in Typescript.

```typescript
// Logger class service in Typescript
export class Logger {
  log(msg: String) { console.log(msg); }
  error(msg: String) { console.error(msg); }
  warn(msg: String) { console.warn(msg); }
}
```
Somewhere in the app, Logger class is used.
```typescript
// A service class that does funky work
export class SomeFunkyService {
  // service instances are injected here via the constructor
  constructor(private logger: Logger, private backendService: BackendService) {}

  // we can pretend it makes good funky ice cream, yeah? =)
  makeMeAFunkyIceCream(flavours) {
    return this.backendService.makeFunkyIceCream(flavours).then(
      (icecream: IceCream) => {
        this.logger.log('Ice cream with funky ' + flavours + ' is made. Wahoo!!');
        return icecream;
      }
    );
  }
}
```
Here, you'll notice the Logger service is injected using the constructor injection strategy from our earlier examples.

The concept is not much different. The difference is that Angular has a built-in injector that looks at all available instances of dependencies at boot-up time. These dependencies are created through registration with what is called a provider. Providers can be registered in several ways, thanks to the hierarchical setup of Angular's injector system, i.e. at the root, module, or component level. This is a core reason why Angular has long been extremely attractive to back-end developers coming from static languages like Java/C#.
For the simplicity and scope of this post, we use the root-level approach for this injection, which is simply:
```typescript
@Injectable({
  providedIn: 'root'
})
export class Logger {}
```
That’s DI in the Angular in a nutshell!
Now how about React? Does React have a common DI concept to embrace as well?
Apparently… it does not! Why? Because it doesn’t need to.
It just handles dependencies on-demand creation differently.
Take a look at this small snippet of React code:
```javascript
const ProductReviewList = props => (
  <List resource="reviews" perPage={50} {...props}>
    <Datagrid rowClick="edit">
      <DateField source="date" />
      <CustomerField source="customer_id" />
      <ProductField source="product_id" />
      <RatingField source="rating" />
      <TextField source="body" label="Comment" />
      <StatusField source="status" />
    </Datagrid>
  </List>
)
```
If you've worked with React for some time, we always talk about building components, where each component can be made up of other components through a parent-child-sibling hierarchy, such as the `List`, `Datagrid` and `DateField` components above.
In the object-oriented world, we treat components as objects, so components that depend on one another would use a constructor and injector setup. In the world of React, this is not necessarily the case, because you'll often find that the child component does not depend on the structure of its parent, such as `Datagrid` inside `List`, to perform its duties. Here, the `List` component has some data that comes through props, but it does not display that data as a list of reviews itself. Rather, it delegates the rendering work to the `Datagrid` component. We have injected our dependencies through the power of composition, without needing injectors or constructor/setter injection like other object-oriented languages do.
The beauty of this setup is that you can easily swap dependencies around without worrying too much about the consequences of changing the hierarchy of dependencies between components.
Thus, with the `List` above, I can replace the `Datagrid` component with something like a `CardView` component:
```jsx
const ProductReviewCard = props => (
  <List resource="products" perPage={10}>
    <CardView
      mainHeading={review => <Card record={review} />}
      mainBody={review => review.body}
    />
  </List>
)
```
All of this is possible thanks to powerful concepts such as JSX and the props pattern, among other core React patterns, which allow for better dependency management in the React world than in more traditionally object-oriented system designs like Angular, C#, Java, etc.
So there you have it, folks!
That’s what Dependency Injection is all about.
In summary, the key long-term benefits of using DI are stressed as follows:
The last point is especially important: software designs nowadays constantly change in order to scale and evolve. They rarely stay fixed, so tightly coupled systems have no place in them.
However, one thing I failed to explain earlier in this post is when we need not care about using DI when designing our software architecture.
Well.
The simple answer lies in how many subsystems or modules we know upfront will be frequently changed, swapped or reconfigured during implementation. If they don't need to be highly configurable or reusable, unit testing is not a concern, and there's little need to worry about coupling, then we don't need to hire the DI middleman to do the work at all. The code can work fine on its own.
It also depends on the programming languages you work with: as you're probably aware, DI has long been hugely popular in classic statically typed object-oriented languages such as Java and C#, but much less so in dynamically typed languages such as JS and Python. You can read more about it here on this Stackoverflow link.
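To make that last point concrete, here is a minimal, framework-free sketch of constructor injection in a dynamically typed language. The `Logger` and `ReportService` names are purely illustrative; the point is that in Python the "injector" can simply be the calling code.

```python
# A minimal sketch of constructor injection in Python (illustrative names):
# the consumer declares its dependency in the constructor and receives it
# from the outside instead of constructing it itself.
class Logger:
    def __init__(self):
        self.lines = []

    def log(self, message):
        self.lines.append(message)


class ReportService:
    # The dependency is declared and received here, not constructed here.
    def __init__(self, logger):
        self.logger = logger

    def run(self):
        self.logger.log("report generated")
        return "ok"


logger = Logger()                # the caller plays the role of the injector
service = ReportService(logger)  # a fake Logger could be swapped in for tests
```

No container or decorator is needed; passing the collaborator in is the whole mechanism, which is partly why dedicated DI frameworks are less common in this world.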
Disclaimer:
On a final note, I do not claim to be a DI/IoC designer, nor am I a subject matter expert on it. I'm just an ordinary software guy with an avid curiosity about the software world and how everything works, from reading forums to building things in my spare time. That lets me share knowledge and learnings from my past React/NodeJS/Angular project work with DI built in, with everyone keen to see how difficult concepts such as these can be grasped through simple, relatable use cases. Software development/engineering with agile principles in mind is part of the software craftsman's journey that truly never ends. That's the fun part of it. :)
Till then, Happy Coding!
PS: Useful References that inspired me to write up this post to document my learning process.
These stale branches include:
With that in mind, stale branches build up over time, so we need to remove them periodically before we start working on other new feature branches.
To find out which branches have been merged into master, run the following:
git branch --merged master
Now that we know which branches can safely be removed from our local machine, you would think we should just run `git branch -d branch_to_delete`, right?
Which is fine if you only have a few branches to deal with. But what if you have 10, 15, 20 or even over 100? Wouldn't such a simple trimming operation take forever? It's hardly worth the menial effort, so let's automate it.
By tweaking our previous command, we do this instead:
git branch --merged master | xargs git branch -d
This piping command line is pretty neat! Instead of the list of branches merged into master simply being printed to the terminal, each branch name is handed to the next command in the chain as an input parameter, courtesy of the pipe (`|`) operator followed by the `xargs` command. `xargs` is a Unix utility that turns standard input into arguments for whatever command we want to execute next in the pipeline. In this case, we're effectively running `git branch -d branch_name` without having to type out each `branch_name` ourselves; `xargs` supplies that parameter for us, as long as the preceding `git branch --merged master` command has no problem producing valid input.
And that’s it!
But there's one little catch here: the above command not only deletes the merged branches (as we originally prescribed), it also deletes the main master branch itself. Why would it do such a careless thing?
If you look carefully at this command line:
git branch --merged master
This also outputs the `master` branch, along with your other local feature branches. Since the `master` branch usually (in fact, always) gets merged into on a regular basis by you or your team members, it satisfies the condition above and gets passed along as an input parameter for `xargs` to act on, which in this case means deleting the `master` branch.
Which is not what we want!
Thus, to prevent this, we add a safeguard:
git branch --merged master | grep -v 'master$' | xargs git branch -d
We place the `grep` command in the pipeline between `git branch --merged master` and `xargs`. It filters out any input matching the `master$` pattern, courtesy of the `-v` (invert match) flag. The result is that `master` is no longer among the merged branches earmarked for deletion.
There you have it!
This is how you want to maintain your local branches for your git repository.
That’s on one side of the fence.
On the other side of the fence, you also need to deal with the remote repository as well.
The command to perform such an operation looks like this:
git branch -r --merged master | sed 's/ *origin\///' | grep -v 'master$' | cut -d/ -f2- | xargs -n 1 git push --delete origin
This is more or less the same as the local branch deletion above, but a little more involved, so I'll explain it a bit.
The clause here:

sed 's/ *origin\///' | grep -v 'master$' | cut -d/ -f2-

says that we want our matches not only to ignore any remote branch whose name ends with `master`, but also to specifically target remote branches carrying the `origin` prefix, since most git-based repository systems use `origin` as the default remote endpoint. Once those matching conditions are satisfied, we strip `origin/` from each branch name before it gets piped through for deletion.
That’s it!
Bear in mind: if you work in a sizeable team where you have more than just one main branch (eg `develop`), you may want to tweak the `grep` and `sed` commands to accommodate the extra key branches you want to keep before anything gets purged.
Finally, once you’re done and contented with the above setup, you can then create bash aliases for these like so:
```shell
alias gitdlb="git branch --merged master | grep -v 'master$' | xargs git branch -d"
alias gitdrb="git branch -r --merged master | sed 's/ *origin\///' | grep -v 'master$' | cut -d/ -f2- | xargs -n 1 git push --delete origin"
```
I hope you've learned something useful about automating little things like this in your day-to-day work.
Till next time - Happy Coding!
To kick off the year with a big bang, let’s start with today’s post.
In my previous post from a little while ago, I discussed the importance of data structures such as arrays, stacks and queues that every good software developer/engineer must grasp. In this post, I will cover the other, not-so-common data structures that we nevertheless use all the time when implementing our algorithms.
They are as follows:
##Data Structures
###a) Trees
A binary tree is a tree whose elements have at most 2 children, typically named the left and right nodes respectively. A typical tree node has the following properties:
The topmost node in the tree is called the root. Every node (except the root) is connected by a directed edge from exactly one other node, called its parent. In a general tree, a node can have more than 2 nodes connected to it. Nodes with no children are called leaves or external nodes; nodes which are not leaves are called internal nodes.
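The node shape described above can be sketched in a few lines of Python (the `TreeNode` name and fields are illustrative, not from any library):

```python
# A minimal sketch of a binary tree node: a value plus references to
# at most two children (None when a child is absent).
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


# The root is the topmost node; nodes without children are leaves.
root = TreeNode(8, left=TreeNode(3), right=TreeNode(10))
```

Here `root` is an internal node (it has children), while both of its children are leaves.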
Next, we have a special kind of binary tree called the Binary Search Tree (BST). This tree is mainly used as a storage structure that offers efficient ways of sorting, searching and retrieving data.
A BST is a binary tree where nodes are ordered in the following way:
Finally, we have binary tree traversal, the process of visiting all the nodes in a tree. When traversing trees, we consider a couple of search approaches for doing such an exercise.
Using the depth-first search approach, we have the following types of traversal to pick from:
With breadth-first search, there’s one type of traversal method which is level order traversal. This type of traversal visits nodes by levels from top to bottom and from left to right.
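The level-order idea can be sketched with a queue, as in this short Python example (the `Node` and `level_order` names are illustrative):

```python
# A sketch of level-order (breadth-first) traversal: visit nodes level by
# level, left to right, using a FIFO queue of discovered nodes.
from collections import deque


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_order(root):
    if root is None:
        return []
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()       # take the oldest discovered node
        order.append(node.value)
        if node.left:
            queue.append(node.left)  # children get visited on the next level
        if node.right:
            queue.append(node.right)
    return order


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
# level_order(tree) visits 1, then 2 and 3, then 4 and 5
```

The queue is what distinguishes this from the depth-first traversals above, which use the call stack instead.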
Binary Trees and Binary Search Trees.
Binary Tree Traversal.
All the popular programming languages I'm aware of support them, as the following implementations show.
####Java
```java
import java.util.LinkedList;
import java.util.Queue;

public class BinaryTree {

    Node root;

    public void add(int value) {
        root = addRecursive(root, value);
    }

    private Node addRecursive(Node current, int value) {
        if (current == null) {
            return new Node(value);
        }
        if (value < current.value) {
            current.left = addRecursive(current.left, value);
        } else if (value > current.value) {
            current.right = addRecursive(current.right, value);
        }
        return current;
    }

    public boolean isEmpty() {
        return root == null;
    }

    public int getSize() {
        return getSizeRecursive(root);
    }

    private int getSizeRecursive(Node current) {
        return current == null ? 0
            : getSizeRecursive(current.left) + 1 + getSizeRecursive(current.right);
    }

    public boolean containsNode(int value) {
        return containsNodeRecursive(root, value);
    }

    private boolean containsNodeRecursive(Node current, int value) {
        if (current == null) {
            return false;
        }
        if (value == current.value) {
            return true;
        }
        return value < current.value
            ? containsNodeRecursive(current.left, value)
            : containsNodeRecursive(current.right, value);
    }

    public void delete(int value) {
        root = deleteRecursive(root, value);
    }

    private Node deleteRecursive(Node current, int value) {
        if (current == null) {
            return null;
        }
        if (value == current.value) {
            // Case 1: no children
            if (current.left == null && current.right == null) {
                return null;
            }
            // Case 2: only 1 child
            if (current.right == null) {
                return current.left;
            }
            if (current.left == null) {
                return current.right;
            }
            // Case 3: 2 children
            int smallestValue = findSmallestValue(current.right);
            current.value = smallestValue;
            current.right = deleteRecursive(current.right, smallestValue);
            return current;
        }
        if (value < current.value) {
            current.left = deleteRecursive(current.left, value);
            return current;
        }
        current.right = deleteRecursive(current.right, value);
        return current;
    }

    private int findSmallestValue(Node root) {
        return root.left == null ? root.value : findSmallestValue(root.left);
    }

    public void traverseInOrder(Node node) {
        if (node != null) {
            traverseInOrder(node.left);
            System.out.print(" " + node.value);
            traverseInOrder(node.right);
        }
    }

    public void traversePreOrder(Node node) {
        if (node != null) {
            System.out.print(" " + node.value);
            traversePreOrder(node.left);
            traversePreOrder(node.right);
        }
    }

    public void traversePostOrder(Node node) {
        if (node != null) {
            traversePostOrder(node.left);
            traversePostOrder(node.right);
            System.out.print(" " + node.value);
        }
    }

    public void traverseLevelOrder() {
        if (root == null) {
            return;
        }
        Queue<Node> nodes = new LinkedList<>();
        nodes.add(root);
        while (!nodes.isEmpty()) {
            Node node = nodes.remove();
            System.out.print(" " + node.value);
            if (node.left != null) {
                nodes.add(node.left);
            }
            if (node.right != null) { // fixed: was checking node.left twice
                nodes.add(node.right);
            }
        }
    }

    class Node {
        int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
            left = null;
            right = null;
        }
    }
}
```
####Python
```python
#!/usr/bin/python
class Node:
    def __init__(self, info):
        self.info = info
        self.left = None
        self.right = None
        self.level = None

    def __str__(self):
        return str(self.info)


class BinarySearchTree:
    def __init__(self):
        # constructor of class
        self.root = None

    def create(self, val):
        # create binary search tree nodes
        if self.root is None:
            self.root = Node(val)
        else:
            current = self.root
            while True:
                if val < current.info:
                    if current.left:
                        current = current.left
                    else:
                        current.left = Node(val)
                        break
                elif val > current.info:
                    if current.right:
                        current = current.right
                    else:
                        current.right = Node(val)
                        break
                else:
                    break

    def inorder(self, node):
        if node is not None:
            self.inorder(node.left)
            print(node.info)
            self.inorder(node.right)

    def preorder(self, node):
        if node is not None:
            print(node.info)
            self.preorder(node.left)
            self.preorder(node.right)

    def postorder(self, node):
        if node is not None:
            self.postorder(node.left)
            self.postorder(node.right)
            print(node.info)
```
####Ruby
```ruby
#!/usr/bin/ruby
class BinarySearchTree
  class Node
    attr_reader :key, :left, :right

    def initialize(key)
      @key = key
      @left = nil
      @right = nil
    end

    def insert(new_key)
      if new_key <= @key
        @left.nil? ? @left = Node.new(new_key) : @left.insert(new_key)
      elsif new_key > @key
        @right.nil? ? @right = Node.new(new_key) : @right.insert(new_key)
      end
    end
  end

  def initialize
    @root = nil
  end

  def insert(key)
    if @root.nil?
      @root = Node.new(key)
    else
      @root.insert(key)
    end
  end

  def in_order(node = @root, &block)
    return if node.nil?
    in_order(node.left, &block)
    yield node
    in_order(node.right, &block)
  end

  def pre_order(node = @root, &block)
    return if node.nil?
    yield node
    pre_order(node.left, &block)
    pre_order(node.right, &block)
  end

  def post_order(node = @root, &block)
    return if node.nil?
    post_order(node.left, &block)
    post_order(node.right, &block)
    yield node
  end

  def search(key, node = @root)
    return nil if node.nil?
    if key < node.key
      search(key, node.left)
    elsif key > node.key
      search(key, node.right)
    else
      node
    end
  end

  def check_height(node)
    return 0 if node.nil?
    left_height = check_height(node.left)
    return -1 if left_height == -1
    right_height = check_height(node.right)
    return -1 if right_height == -1
    diff = left_height - right_height
    if diff.abs > 1
      -1
    else
      [left_height, right_height].max + 1
    end
  end

  def is_balanced?(node = @root)
    check_height(node) == -1 ? false : true
  end
end
```
####Javascript
```javascript
class Node {
  constructor(data, left = null, right = null) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  insert(newData) {
    if (newData <= this.data) {
      if (this.left == null) {
        this.left = new Node(newData);
      } else {
        this.left.insert(newData);
      }
    } else {
      if (this.right == null) {
        this.right = new Node(newData);
      } else {
        this.right.insert(newData);
      }
    }
  }
}

class BinarySearchTree {
  constructor() {
    this.root = null;
  }

  insertNode(newData) {
    if (this.root == null) {
      this.root = new Node(newData);
    } else {
      this.root.insert(newData);
    }
  }

  inOrderTraversal(node) {
    if (node == null) return;
    this.inOrderTraversal(node.left);
    console.log(node.data);
    this.inOrderTraversal(node.right);
  }

  preOrderTraversal(node) {
    if (node == null) return;
    console.log(node.data);
    this.preOrderTraversal(node.left);
    this.preOrderTraversal(node.right);
  }

  postOrderTraversal(node) {
    if (node == null) return;
    this.postOrderTraversal(node.left);
    this.postOrderTraversal(node.right);
    console.log(node.data);
  }

  search(data, node) {
    if (node == null) return null;
    if (data < node.data) return this.search(data, node.left);
    if (data > node.data) return this.search(data, node.right);
    return node;
  }

  checkHeight(node) {
    if (node == null) return 0;
    const leftHeight = this.checkHeight(node.left);
    if (leftHeight === -1) return -1;
    const rightHeight = this.checkHeight(node.right);
    if (rightHeight === -1) return -1;
    if (Math.abs(leftHeight - rightHeight) > 1) return -1;
    return Math.max(leftHeight, rightHeight) + 1;
  }

  isBalanced(node = this.root) {
    return this.checkHeight(node) !== -1;
  }
}
```
####C++
```cpp
#include <algorithm>
#include <cstdio>
#include <cstdlib>

struct node {
    int data;
    struct node *left, *right;
};

struct node *newNode(int data) {
    struct node *temp = (struct node *)malloc(sizeof(struct node));
    temp->data = data;
    temp->left = temp->right = NULL;
    return temp;
}

void inorder(struct node *root) {
    if (root != NULL) {
        inorder(root->left);
        printf("%d \n", root->data);
        inorder(root->right);
    }
}

void preorder(struct node *root) {
    if (root != NULL) {
        printf("%d \n", root->data);
        preorder(root->left);
        preorder(root->right);
    }
}

void postorder(struct node *root) {
    if (root != NULL) {
        postorder(root->left);
        postorder(root->right);
        printf("%d \n", root->data);
    }
}

struct node *insert(struct node *new_node, int new_data) {
    if (new_node == NULL)
        return newNode(new_data);
    if (new_data < new_node->data)
        new_node->left = insert(new_node->left, new_data);
    else if (new_data > new_node->data)
        new_node->right = insert(new_node->right, new_data);
    return new_node;
}

int getHeight(struct node *current_node) {
    if (!current_node)
        return 0;
    return 1 + std::max(getHeight(current_node->left),
                        getHeight(current_node->right));
}

bool isBalanced(struct node *current_node) {
    if (!current_node)
        return true; // an empty tree is balanced
    int leftHeight = getHeight(current_node->left);
    int rightHeight = getHeight(current_node->right);
    if (abs(leftHeight - rightHeight) > 1)
        return false;
    return isBalanced(current_node->left) && isBalanced(current_node->right);
}
```
####PHP
```php
<?php
class Node {
    public $info;
    public $left;
    public $right;
    public $level;

    public function __construct($info) {
        $this->info = $info;
        $this->left = NULL;
        $this->right = NULL;
        $this->level = NULL;
    }

    public function __toString() {
        return "$this->info";
    }
}

class SearchBinaryTree {
    public $root;

    public function __construct() {
        $this->root = NULL;
    }

    public function create($info) {
        if ($this->root == NULL) {
            $this->root = new Node($info);
        } else {
            $current = $this->root;
            while (true) {
                if ($info < $current->info) {
                    if ($current->left) {
                        $current = $current->left;
                    } else {
                        $current->left = new Node($info);
                        break;
                    }
                } else if ($info > $current->info) {
                    if ($current->right) {
                        $current = $current->right;
                    } else {
                        $current->right = new Node($info);
                        break;
                    }
                } else {
                    break;
                }
            }
        }
    }

    public function traverse($method) {
        switch ($method) {
            case 'inorder':
                $this->_inorder($this->root);
                break;
            case 'postorder':
                $this->_postorder($this->root);
                break;
            case 'preorder':
                $this->_preorder($this->root);
                break;
            default:
                break;
        }
    }

    private function _inorder($node) {
        if ($node->left) {
            $this->_inorder($node->left);
        }
        echo $node . " ";
        if ($node->right) {
            $this->_inorder($node->right);
        }
    }

    private function _preorder($node) {
        echo $node . " ";
        if ($node->left) {
            $this->_preorder($node->left);
        }
        if ($node->right) {
            $this->_preorder($node->right);
        }
    }

    private function _postorder($node) {
        if ($node->left) {
            $this->_postorder($node->left);
        }
        if ($node->right) {
            $this->_postorder($node->right);
        }
        echo $node . " ";
    }
}
```
###b) Graphs
Next, we have graphs.
Like trees, a graph is a non-linear data structure that consists of nodes and edges. The nodes are sometimes called vertices, and the edges are called lines or arcs; unlike in a tree, any number of nodes may be connected to one another.
Graphs are used to solve many real-life problems. They are used to represent networks, such as paths in a city, a telephone network or a circuit network. Graphs are also used in social networks like LinkedIn and Facebook.
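Before the full implementations, here is a compact sketch of the core idea in Python: an undirected graph stored as an adjacency list, here a dict mapping each vertex to the set of its neighbours (the `Graph` class and vertex names are illustrative):

```python
# A minimal adjacency-list sketch of an undirected graph.
from collections import defaultdict


class Graph:
    def __init__(self):
        # each vertex maps to the set of vertices it shares an edge with
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # undirected: record the edge in both directions
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbours(self, u):
        return self.adj[u]


g = Graph()
g.add_edge("alice", "bob")
g.add_edge("alice", "carol")
```

An adjacency list like this is the representation the language-specific examples below all revolve around.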
####Java
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Vertex {
    private int label;

    Vertex(int label) {
        this.label = label;
    }

    // equals and hashCode
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Vertex)) return false;
        Vertex other = (Vertex) obj;
        return other.label == this.label;
    }

    @Override
    public int hashCode() {
        return label;
    }

    public int getLabel() {
        return label;
    }

    public void setLabel(int newValue) {
        this.label = newValue;
    }
}

public class Graph {
    private Map<Vertex, List<Vertex>> adjVertices;

    public Graph() {
        adjVertices = new HashMap<>();
    }

    public void addVertex(int label) {
        adjVertices.put(new Vertex(label), new ArrayList<>());
    }

    public void removeVertex(int label) {
        Vertex v = new Vertex(label);
        adjVertices.values().forEach(e -> e.remove(v));
        adjVertices.remove(v);
    }

    public void addEdge(int label1, int label2) {
        Vertex v1 = new Vertex(label1);
        Vertex v2 = new Vertex(label2);
        adjVertices.get(v1).add(v2);
        adjVertices.get(v2).add(v1);
    }

    public void removeEdge(int label1, int label2) {
        Vertex v1 = new Vertex(label1);
        Vertex v2 = new Vertex(label2);
        List<Vertex> eV1 = adjVertices.get(v1);
        List<Vertex> eV2 = adjVertices.get(v2);
        if (eV1 != null) eV1.remove(v2);
        if (eV2 != null) eV2.remove(v1);
    }

    public List<Vertex> getAdjVertices(int label) {
        return adjVertices.get(new Vertex(label));
    }
}
```
####Python
```python
#!/usr/bin/python
class Vertex:
    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        return isinstance(other, Vertex) and self.label == other.label

    def __hash__(self):
        return hash(self.label)

    def get_label(self):
        return self.label

    def set_label(self, new_label):
        self.label = new_label


class Graph:
    def __init__(self, adj_vertices=None):
        if adj_vertices is None:
            adj_vertices = {}
        self.adj_vertices = adj_vertices

    def add_vertex(self, label):
        vertex = Vertex(label)
        if vertex not in self.adj_vertices:
            self.adj_vertices[vertex] = []

    def remove_vertex(self, label):
        vertex = Vertex(label)
        if vertex in self.adj_vertices:
            del self.adj_vertices[vertex]
        # also drop the vertex from every remaining adjacency list
        for neighbours in self.adj_vertices.values():
            if vertex in neighbours:
                neighbours.remove(vertex)

    def add_edge(self, edge):
        (vertex1, vertex2) = tuple(edge)
        if vertex1 in self.adj_vertices:
            self.adj_vertices[vertex1].append(vertex2)
        else:
            self.adj_vertices[vertex1] = [vertex2]

    def remove_edge(self, edge):
        # fixed: remove only the edge, not the vertices themselves
        (vertex1, vertex2) = tuple(edge)
        if vertex1 in self.adj_vertices and vertex2 in self.adj_vertices[vertex1]:
            self.adj_vertices[vertex1].remove(vertex2)

    def vertices(self):
        return list(self.adj_vertices.keys())
```
####Ruby
```ruby
#!/usr/bin/ruby
class Vertex
  attr_accessor :label

  def initialize(label)
    @label = label
  end

  def hash_code
    @label
  end
end

class Graph
  attr_accessor :adj_vertices

  def initialize
    # map each vertex label to the list of its adjacent labels
    @adj_vertices = {}
  end

  def add_vertex(label)
    @adj_vertices[label] ||= []
  end

  def remove_vertex(label)
    @adj_vertices.delete(label)
    @adj_vertices.each_value { |neighbours| neighbours.delete(label) }
  end

  def add_edge(label1, label2)
    add_vertex(label1)
    add_vertex(label2)
    @adj_vertices[label1] << label2
    @adj_vertices[label2] << label1
  end

  def remove_edge(label1, label2)
    @adj_vertices[label1]&.delete(label2)
    @adj_vertices[label2]&.delete(label1)
  end

  def vertices
    @adj_vertices.keys
  end
end
```
####Javascript
```javascript
class Vertex {
  constructor(label) {
    this.label = label;
    this.edges = {};
  }

  hashCode() {
    return this.label;
  }
}

class Graph {
  constructor(adjVertices = {}) {
    this.adjVertices = adjVertices;
  }

  addVertex(label) {
    if (!this.adjVertices[label]) {
      this.adjVertices[label] = new Vertex(label);
    }
  }

  removeVertex(label) {
    if (this.adjVertices[label]) {
      delete this.adjVertices[label];
      // drop any edges still pointing at the removed vertex
      Object.keys(this.adjVertices).forEach((key) => {
        if (this.adjVertices[key].edges[label]) {
          delete this.adjVertices[key].edges[label];
        }
      });
    }
  }

  addEdge(from, to) {
    if (this.adjVertices[from] && this.adjVertices[to]) {
      if (this.adjVertices[from].edges[to]) {
        this.adjVertices[from].edges[to].weight += 1;
      } else {
        this.adjVertices[from].edges[to] = { weight: 1 };
      }
    }
  }

  removeEdge(from, to) {
    if (this.adjVertices[from] && this.adjVertices[to]) {
      if (this.adjVertices[from].edges[to]) {
        delete this.adjVertices[from].edges[to];
      }
    }
  }

  getVertex(label) {
    return this.adjVertices[label];
  }
}
```
####C++
```cpp
#include <iostream>
using namespace std;

struct Vertex {
    int label;
    Vertex* next;
};

struct Edge {
    int from, to;
};

class Graph {
    // allocate a new adjacency-list node that points at the current head
    Vertex* getAdjListNode(int to, Vertex* head) {
        Vertex* newVertex = new Vertex;
        newVertex->label = to;
        newVertex->next = head;
        return newVertex;
    }

    int N; // number of vertices

public:
    Vertex** head; // array of adjacency-list heads, one per vertex

    Graph(Edge edges[], int n, int N) {
        head = new Vertex*[N]();
        this->N = N;
        for (int i = 0; i < N; i++)
            head[i] = nullptr;
        for (int i = 0; i < n; i++) {
            int from = edges[i].from;
            int to = edges[i].to;
            head[from] = getAdjListNode(to, head[from]);
        }
    }
};
```
####PHP
```php
<?php
class Vertex {
    public $label;

    function __construct($label) {
        $this->label = $label;
    }

    function getHashcode() {
        return $this->label;
    }
}

class Graph {
    private $adjVertices;

    function __construct() {
        $this->adjVertices = array();
    }

    public function addVertex($label) {
        if (!array_key_exists($label, $this->adjVertices)) {
            $this->adjVertices[$label] = array();
        }
    }

    public function removeVertex($label) {
        if (array_key_exists($label, $this->adjVertices)) {
            unset($this->adjVertices[$label]);
        }
        // drop any edges still pointing at the removed vertex
        foreach ($this->adjVertices as $key => $neighbours) {
            $this->adjVertices[$key] = array_values(array_diff($neighbours, array($label)));
        }
    }

    public function addEdge($label1, $label2) {
        $this->addVertex($label1);
        $this->addVertex($label2);
        $this->adjVertices[$label1][] = $label2;
        $this->adjVertices[$label2][] = $label1;
    }

    public function removeEdge($label1, $label2) {
        if (array_key_exists($label1, $this->adjVertices)) {
            $this->adjVertices[$label1] = array_values(array_diff($this->adjVertices[$label1], array($label2)));
        }
        if (array_key_exists($label2, $this->adjVertices)) {
            $this->adjVertices[$label2] = array_values(array_diff($this->adjVertices[$label2], array($label1)));
        }
    }

    public function getTotalVertices() {
        return count($this->adjVertices);
    }
}
?>
```
###c) Heaps
Another commonly used data structure is the heap. A heap is a special tree-based data structure in which the tree is a complete binary tree. Moreover, it must satisfy the heap property: every parent node's key is ordered (greater or smaller, depending on the heap type) with respect to the keys of its children.
There are two main types of heaps:
Max-Heap: In a max-heap, the key present at the root of the tree must be the greatest among the keys present at all of its children. The same property must be recursively true for all sub-trees in that same binary tree.
Min-Heap: In a min-heap, the key present at the root of the tree must be the lowest among the keys present at all of its children. The same property must be recursively true for all sub-trees in that same binary tree.
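As a quick illustration of both flavours, Python's standard `heapq` module maintains a min-heap inside a plain list, and a max-heap can be simulated by pushing negated keys:

```python
# heapq keeps the smallest key at index 0 of the backing list (a min-heap);
# negating keys on the way in turns it into a max-heap.
import heapq

nums = [5, 1, 9, 3]

min_heap = []
for n in nums:
    heapq.heappush(min_heap, n)    # smallest key bubbles to index 0

max_heap = []
for n in nums:
    heapq.heappush(max_heap, -n)   # negate so the largest key surfaces first

smallest = heapq.heappop(min_heap)   # 1
largest = -heapq.heappop(max_heap)   # 9
```

The manual implementations below show what `heappush`/`heappop` are doing under the hood with sift-up and sift-down operations.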
Here are the respective heap implementations in each programming language.
####Java
```java
import java.util.Arrays;
import java.util.NoSuchElementException;

public class Heap {
    /** The number of children each node has **/
    private static final int d = 2;
    private int heapSize;
    private int[] heap;

    public Heap(int capacity) { // fixed: constructor must match the class name
        heapSize = 0;
        heap = new int[capacity + 1];
        Arrays.fill(heap, -1);
    }

    public boolean isEmpty() {
        return heapSize == 0;
    }

    public boolean isFull() {
        return heapSize == heap.length;
    }

    public int parent(int i) {
        return (i - 1) / d;
    }

    private int kthChild(int i, int k) {
        return d * i + k;
    }

    public void insert(int x) {
        if (isFull())
            throw new NoSuchElementException("Overflow Exception");
        heap[heapSize++] = x;
        heapifyUp(heapSize - 1);
    }

    public int findMin() {
        if (isEmpty())
            throw new NoSuchElementException("Underflow Exception");
        return heap[0];
    }

    public int deleteMin() {
        int keyItem = heap[0];
        delete(0);
        return keyItem;
    }

    public int delete(int ind) {
        if (isEmpty())
            throw new NoSuchElementException("Underflow Exception");
        int keyItem = heap[ind];
        heap[ind] = heap[heapSize - 1];
        heapSize--;
        heapifyDown(ind);
        return keyItem;
    }

    private void heapifyUp(int childInd) {
        int tmp = heap[childInd];
        while (childInd > 0 && tmp < heap[parent(childInd)]) {
            heap[childInd] = heap[parent(childInd)];
            childInd = parent(childInd);
        }
        heap[childInd] = tmp;
    }

    private void heapifyDown(int ind) {
        int child;
        int tmp = heap[ind];
        while (kthChild(ind, 1) < heapSize) {
            child = minChild(ind);
            if (heap[child] < tmp)
                heap[ind] = heap[child];
            else
                break;
            ind = child;
        }
        heap[ind] = tmp;
    }

    private int minChild(int ind) {
        int bestChild = kthChild(ind, 1);
        int k = 2;
        int pos = kthChild(ind, k);
        while ((k <= d) && (pos < heapSize)) {
            if (heap[pos] < heap[bestChild])
                bestChild = pos;
            pos = kthChild(ind, k++);
        }
        return bestChild;
    }
}
```
####Python
```python
#!/usr/bin/python
class Heap:
    def __init__(self):
        self.__heap = []
        self.__last_index = -1

    def push(self, value):
        self.__last_index += 1
        if self.__last_index < len(self.__heap):
            self.__heap[self.__last_index] = value
        else:
            self.__heap.append(value)
        self.__siftup(self.__last_index)

    def pop(self):
        if self.__last_index == -1:
            raise IndexError('pop from empty heap')
        min_value = self.__heap[0]
        self.__heap[0] = self.__heap[self.__last_index]
        self.__last_index -= 1
        self.__siftdown(0)
        return min_value

    def __siftup(self, index):
        while index > 0:
            parent_index, parent_value = self.__get_parent(index)
            if parent_value <= self.__heap[index]:
                break
            self.__heap[parent_index], self.__heap[index] = \
                self.__heap[index], self.__heap[parent_index]
            index = parent_index

    def __siftdown(self, index):
        while True:
            index_value = self.__heap[index]
            left_child_index, left_child_value = self.__get_left_child(index, index_value)
            right_child_index, right_child_value = self.__get_right_child(index, index_value)
            if index_value <= left_child_value and index_value <= right_child_value:
                break
            if left_child_value < right_child_value:
                new_index = left_child_index
            else:
                new_index = right_child_index
            self.__heap[new_index], self.__heap[index] = \
                self.__heap[index], self.__heap[new_index]
            index = new_index

    def __get_parent(self, index):
        if index == 0:
            return None, None
        parent_index = (index - 1) // 2
        return parent_index, self.__heap[parent_index]

    def __get_left_child(self, index, default_value):
        left_child_index = 2 * index + 1
        if left_child_index > self.__last_index:
            return None, default_value
        return left_child_index, self.__heap[left_child_index]

    def __get_right_child(self, index, default_value):
        right_child_index = 2 * index + 2
        if right_child_index > self.__last_index:
            return None, default_value
        return right_child_index, self.__heap[right_child_index]

    def __len__(self):
        return self.__last_index + 1
```
####Ruby
```ruby
#!/usr/bin/ruby
class Heap
  attr_accessor :heap_size, :array_rep

  def left_child(index)
    2 * index + 1
  end

  def right_child(index)
    2 * index + 2
  end

  def left_child_key(index)
    @array_rep[left_child(index)]
  end

  def right_child_key(index)
    @array_rep[right_child(index)]
  end
end
```
####Javascript
```javascript
function MinHeap() {
  this.data = [];
}

MinHeap.prototype.insert = function(val) {
  this.data.push(val);
  this.bubbleUp(this.data.length - 1);
};

MinHeap.prototype.bubbleUp = function(index) {
  while (index > 0) {
    // get the parent
    var parent = Math.floor((index + 1) / 2) - 1;
    // if parent is greater than child, swap them
    if (this.data[parent] > this.data[index]) {
      var temp = this.data[parent];
      this.data[parent] = this.data[index];
      this.data[index] = temp;
    }
    index = parent;
  }
};

MinHeap.prototype.extractMin = function() {
  var min = this.data[0];
  var last = this.data.pop();
  // fixed: only move the last element to the root if anything is left,
  // otherwise a one-element heap would never shrink
  if (this.data.length > 0) {
    this.data[0] = last;
    this.bubbleDown(0);
  }
  return min;
};

MinHeap.prototype.bubbleDown = function(index) {
  while (true) {
    var child = (index + 1) * 2;
    var sibling = child - 1;
    var toSwap = null;
    // if current is greater than child
    if (this.data[index] > this.data[child]) {
      toSwap = child;
    }
    // if sibling is smaller than child, but also smaller than current
    if (this.data[index] > this.data[sibling] &&
        (this.data[child] == null || this.data[sibling] < this.data[child])) {
      toSwap = sibling;
    }
    // if we don't need to swap, then break
    if (toSwap == null) {
      break;
    }
    var temp = this.data[toSwap];
    this.data[toSwap] = this.data[index];
    this.data[index] = temp;
    index = toSwap;
  }
};
```
####C++
```cpp
// A C++ program to demonstrate common Binary Heap Operations
#include <iostream>
#include <climits>
using namespace std;

// Prototype of a utility function to swap two integers
void swap(int *x, int *y);

// A class for Min Heap
class MinHeap
{
    int *harr;     // pointer to array of elements in heap
    int capacity;  // maximum possible size of min heap
    int heap_size; // Current number of elements in min heap
public:
    // Constructor
    MinHeap(int capacity);

    // to heapify a subtree with the root at given index
    void MinHeapify(int);

    int parent(int i) { return (i - 1) / 2; }

    // to get index of left child of node at index i
    int left(int i) { return (2 * i + 1); }

    // to get index of right child of node at index i
    int right(int i) { return (2 * i + 2); }

    // to extract the root which is the minimum element
    int extractMin();

    // Decreases key value of key at index i to new_val
    void decreaseKey(int i, int new_val);

    // Returns the minimum key (key at root) from min heap
    int getMin() { return harr[0]; }

    // Deletes a key stored at index i
    void deleteKey(int i);

    // Inserts a new key 'k'
    void insertKey(int k);
};

// Constructor: Builds a heap from a given array a[] of given size
MinHeap::MinHeap(int cap)
{
    heap_size = 0;
    capacity = cap;
    harr = new int[cap];
}

// Inserts a new key 'k'
void MinHeap::insertKey(int k)
{
    if (heap_size == capacity)
    {
        cout << "\nOverflow: Could not insertKey\n";
        return;
    }

    // First insert the new key at the end
    heap_size++;
    int i = heap_size - 1;
    harr[i] = k;

    // Fix the min heap property if it is violated
    while (i != 0 && harr[parent(i)] > harr[i])
    {
        swap(&harr[i], &harr[parent(i)]);
        i = parent(i);
    }
}

// Decreases value of key at index 'i' to new_val. It is assumed that
// new_val is smaller than harr[i].
void MinHeap::decreaseKey(int i, int new_val)
{
    harr[i] = new_val;
    while (i != 0 && harr[parent(i)] > harr[i])
    {
        swap(&harr[i], &harr[parent(i)]);
        i = parent(i);
    }
}

// Method to remove minimum element (or root) from min heap
int MinHeap::extractMin()
{
    if (heap_size <= 0)
        return INT_MAX;
    if (heap_size == 1)
    {
        heap_size--;
        return harr[0];
    }

    // Store the minimum value, and remove it from heap
    int root = harr[0];
    harr[0] = harr[heap_size - 1];
    heap_size--;
    MinHeapify(0);

    return root;
}

// This function deletes key at index i. It first reduces the value to minus
// infinite, then calls extractMin()
void MinHeap::deleteKey(int i)
{
    decreaseKey(i, INT_MIN);
    extractMin();
}

// A recursive method to heapify a subtree with the root at given index
// This method assumes that the subtrees are already heapified
void MinHeap::MinHeapify(int i)
{
    int l = left(i);
    int r = right(i);
    int smallest = i;
    if (l < heap_size && harr[l] < harr[i])
        smallest = l;
    if (r < heap_size && harr[r] < harr[smallest])
        smallest = r;
    if (smallest != i)
    {
        swap(&harr[i], &harr[smallest]);
        MinHeapify(smallest);
    }
}

// A utility function to swap two elements
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

// Driver program to test above functions
int main()
{
    MinHeap h(11);
    h.insertKey(3);
    h.insertKey(2);
    h.deleteKey(1);
    h.insertKey(15);
    h.insertKey(5);
    h.insertKey(4);
    h.insertKey(45);
    cout << h.extractMin() << " ";
    cout << h.getMin() << " ";
    h.decreaseKey(2, 1);
    cout << h.getMin();
    return 0;
}
```
####PHP
```php
<?php
class BinaryHeap
{
    protected $heap;

    public function __construct() {
        $this->heap = array();
    }

    public function isEmpty() {
        return empty($this->heap);
    }

    public function count() {
        // returns the heapsize
        return count($this->heap) - 1;
    }

    public function extract() {
        if ($this->isEmpty()) {
            throw new RunTimeException('Heap is empty');
        }

        // extract the root item
        $root = array_shift($this->heap);

        if (!$this->isEmpty()) {
            // move last item into the root so the heap is
            // no longer disjointed
            $last = array_pop($this->heap);
            array_unshift($this->heap, $last);

            // transform semiheap to heap
            $this->adjust(0);
        }

        return $root;
    }

    public function compare($item1, $item2) {
        if ($item1 === $item2) {
            return 0;
        }
        // reverse the comparison to change to a MinHeap!
        return ($item1 > $item2 ? 1 : -1);
    }

    protected function isLeaf($node) {
        // there will always be 2n + 1 nodes in the
        // sub-heap
        return ((2 * $node) + 1) > $this->count();
    }

    protected function adjust($root) {
        // we've gone as far as we can down the tree if
        // root is a leaf
        if (!$this->isLeaf($root)) {
            $left = (2 * $root) + 1;  // left child
            $right = (2 * $root) + 2; // right child

            // if root is less than either of its children
            $h = $this->heap;
            if (
                (isset($h[$left]) && $this->compare($h[$root], $h[$left]) < 0)
                || (isset($h[$right]) && $this->compare($h[$root], $h[$right]) < 0)
            ) {
                // find the larger child
                if (isset($h[$left]) && isset($h[$right])) {
                    $j = ($this->compare($h[$left], $h[$right]) >= 0) ? $left : $right;
                } else if (isset($h[$left])) {
                    $j = $left; // left child only
                } else {
                    $j = $right; // right child only
                }

                // swap places with root
                list($this->heap[$root], $this->heap[$j]) =
                    array($this->heap[$j], $this->heap[$root]);

                // recursively adjust semiheap rooted at new
                // node j
                $this->adjust($j);
            }
        }
    }
}
```
###d) Trie
In computer science, a trie is an ordered search tree used to store a dynamic set or associative array where the keys are usually strings. Like other search trees and graphs, a trie is designed for efficient traversal and retrieval of data, relying mainly on string prefixes.
It consists of nodes and edges, much like graphs and trees. Each node has at most 26 children, and edges connect each parent node to its children. These 26 pointers are nothing but pointers for each of the 26 letters of the English alphabet, and a separate edge is maintained for every letter.
Strings are stored in a top-to-bottom fashion on the basis of their prefixes in a trie. All prefixes of length 1 are stored at level 1, all prefixes of length 2 are stored at level 2, and so on.
Again, here are the languages’ trie implementations.
####Java
```java
import java.util.Arrays;

public class Trie {
    static final int ALPHABET_SIZE = 26;

    static class TrieNode {
        TrieNode[] children = new TrieNode[ALPHABET_SIZE];
        boolean isEndOfWord;

        TrieNode() {
            isEndOfWord = false;
            for (int i = 0; i < ALPHABET_SIZE; i++)
                children[i] = null;
        }
    }

    static TrieNode root;

    static void insert(String key) {
        int level;
        int length = key.length();
        int index;
        TrieNode pCrawl = root;

        for (level = 0; level < length; level++) {
            index = key.charAt(level) - 'a';
            if (pCrawl.children[index] == null)
                pCrawl.children[index] = new TrieNode();
            pCrawl = pCrawl.children[index];
        }
        pCrawl.isEndOfWord = true;
    }

    static boolean search(String key) {
        int level;
        int length = key.length();
        int index;
        TrieNode pCrawl = root;

        for (level = 0; level < length; level++) {
            index = key.charAt(level) - 'a';
            if (pCrawl.children[index] == null)
                return false;
            pCrawl = pCrawl.children[index];
        }
        return (pCrawl != null && pCrawl.isEndOfWord);
    }
}
```
####Python
```python
#!/usr/bin/python
from typing import Tuple


class TrieNode(object):
    """
    Our trie node implementation. Very basic, but does the job.
    """

    def __init__(self, char: str):
        self.char = char
        self.children = []
        # Is it the last character of the word?
        self.word_finished = False
        # How many times this character appeared in the addition process
        self.counter = 1


def add(root, word: str):
    """
    Adding a word in the trie structure
    """
    node = root
    for char in word:
        found_in_child = False
        # Search for the character in the children of the present `node`
        for child in node.children:
            if child.char == char:
                # We found it, increase the counter by 1 to keep track that another
                # word has it as well
                child.counter += 1
                # And point the node to the child that contains this char
                node = child
                found_in_child = True
                break
        # We did not find it so add a new child
        if not found_in_child:
            new_node = TrieNode(char)
            node.children.append(new_node)
            # And then point node to the new child
            node = new_node
    # Everything finished. Mark it as the end of a word.
    node.word_finished = True


def find_prefix(root, prefix: str) -> Tuple[bool, int]:
    """
    Check and return
    1. If the prefix exists in any of the words we added so far
    2. If yes, then how many words actually have the prefix
    """
    node = root
    # If the root node has no children, then return False.
    # Because it means we are trying to search in an empty trie
    if not root.children:
        return False, 0
    for char in prefix:
        char_not_found = True
        # Search through all the children of the present `node`
        for child in node.children:
            if child.char == char:
                # We found the char existing in the child.
                char_not_found = False
                # Assign node as the child containing the char and break
                node = child
                break
        # Return False anyway when we did not find a char.
        if char_not_found:
            return False, 0
    # Well, being here means we have found the prefix. Return True to indicate that,
    # and also the counter of the last node. This indicates how many words have this
    # prefix.
    return True, node.counter


if __name__ == "__main__":
    root = TrieNode('*')
    add(root, "hackathon")
    add(root, 'hack')
```
####Ruby
```ruby
#!/usr/bin/ruby

class Node
  attr_reader :data, :children
  attr_accessor :term

  def initialize(data)
    @data = data
    @children = []
    @term = false
  end

  def insert(char)
    return get(char) if have?(char)

    child = Node.new(char)
    @children << child
    child
  end

  def have?(char)
    @children.each do |c|
      return true if c.data == char
    end
    false
  end

  def get(char)
    @children.each do |child|
      return child if child.data == char
    end
    false
  end
end

class Trie
  attr_reader :root

  def initialize
    @root = Node.new(nil)
  end

  def insert(word)
    node = @root
    word.size.times do |i|
      child = node.insert(word[i])
      node = child
    end
    node.term = true
  end

  def have?(word)
    node = @root
    word.size.times do |i|
      return false unless node.have?(word[i])
      node = node.get(word[i])
    end
    return node.term == true
  end
end
```
####Javascript
```javascript
function Trie() {
  this.head = { key: '', children: {} };
}

Trie.prototype.add = function (key) {
  var curNode = this.head,
    newNode = null,
    curChar = key.slice(0, 1);

  key = key.slice(1);

  while (typeof curNode.children[curChar] !== "undefined" && curChar.length > 0) {
    curNode = curNode.children[curChar];
    curChar = key.slice(0, 1);
    key = key.slice(1);
  }

  while (curChar.length > 0) {
    newNode = {
      key: curChar,
      value: key.length === 0 ? null : undefined,
      children: {}
    };
    curNode.children[curChar] = newNode;
    curNode = newNode;
    curChar = key.slice(0, 1);
    key = key.slice(1);
  }
};

Trie.prototype.search = function (key) {
  var curNode = this.head,
    curChar = key.slice(0, 1),
    d = 0;

  key = key.slice(1);

  while (typeof curNode.children[curChar] !== "undefined" && curChar.length > 0) {
    curNode = curNode.children[curChar];
    curChar = key.slice(0, 1);
    key = key.slice(1);
    d += 1;
  }

  if (curNode.value === null && key.length === 0) {
    return d;
  } else {
    return -1;
  }
};

Trie.prototype.remove = function (key) {
  var d = this.search(key);
  if (d > -1) {
    removeH(this.head, key, d);
  }
};

function removeH(node, key, depth) {
  if (depth === 0 && Object.keys(node.children).length === 0) {
    return true;
  }
  var curChar = key.slice(0, 1);
  if (removeH(node.children[curChar], key.slice(1), depth - 1)) {
    delete node.children[curChar];
    if (Object.keys(node.children).length === 0) {
      return true;
    } else {
      return false;
    }
  } else {
    return false;
  }
}
```
####C++
```cpp
#include <iostream>

// define character size
#define CHAR_SIZE 128

// A Class representing a Trie node
class Trie
{
public:
    bool isLeaf;
    Trie* character[CHAR_SIZE];

    // Constructor
    Trie()
    {
        this->isLeaf = false;
        for (int i = 0; i < CHAR_SIZE; i++)
            this->character[i] = nullptr;
    }

    void insert(std::string);
    bool deletion(Trie*&, std::string);
    bool search(std::string);
    bool haveChildren(Trie const*);
};

// Iterative function to insert a key in the Trie
void Trie::insert(std::string key)
{
    // start from root node
    Trie* curr = this;
    for (int i = 0; i < key.length(); i++)
    {
        // create a new node if path doesn't exist
        if (curr->character[key[i]] == nullptr)
            curr->character[key[i]] = new Trie();

        // go to next node
        curr = curr->character[key[i]];
    }

    // mark current node as leaf
    curr->isLeaf = true;
}

// Iterative function to search a key in Trie. It returns true
// if the key is found in the Trie, else it returns false
bool Trie::search(std::string key)
{
    // return false if Trie is empty
    if (this == nullptr)
        return false;

    Trie* curr = this;
    for (int i = 0; i < key.length(); i++)
    {
        // go to next node
        curr = curr->character[key[i]];

        // if string is invalid (reached end of path in Trie)
        if (curr == nullptr)
            return false;
    }

    // if current node is a leaf and we have reached the
    // end of the string, return true
    return curr->isLeaf;
}

// returns true if given node has any children
bool Trie::haveChildren(Trie const* curr)
{
    for (int i = 0; i < CHAR_SIZE; i++)
        if (curr->character[i])
            return true; // child found

    return false;
}

// Recursive function to delete a key in the Trie
bool Trie::deletion(Trie*& curr, std::string key)
{
    // return if Trie is empty
    if (curr == nullptr)
        return false;

    // if we have not reached the end of the key
    if (key.length())
    {
        // recurse for the node corresponding to next character in the key
        // and if it returns true, delete current node (if it is non-leaf)
        if (curr != nullptr &&
            curr->character[key[0]] != nullptr &&
            deletion(curr->character[key[0]], key.substr(1)) &&
            curr->isLeaf == false)
        {
            if (!haveChildren(curr))
            {
                delete curr;
                curr = nullptr;
                return true;
            }
            else {
                return false;
            }
        }
    }

    // if we have reached the end of the key
    if (key.length() == 0 && curr->isLeaf)
    {
        // if current node is a leaf node and doesn't have any children
        if (!haveChildren(curr))
        {
            // delete current node
            delete curr;
            curr = nullptr;

            // delete non-leaf parent nodes
            return true;
        }
        // if current node is a leaf node and has children
        else
        {
            // mark current node as non-leaf node (DON'T DELETE IT)
            curr->isLeaf = false;

            // don't delete its parent nodes
            return false;
        }
    }

    return false;
}

// C++ implementation of Trie Data Structure
int main()
{
    Trie* head = new Trie();

    head->insert("hello");
    std::cout << head->search("hello") << " ";      // print 1

    head->insert("helloworld");
    std::cout << head->search("helloworld") << " "; // print 1

    std::cout << head->search("helll") << " ";      // print 0 (Not found)

    head->insert("hell");
    std::cout << head->search("hell") << " ";       // print 1

    head->insert("h");
    std::cout << head->search("h");                 // print 1
    std::cout << std::endl;

    head->deletion(head, "hello");
    std::cout << head->search("hello") << " ";      // print 0
    std::cout << head->search("helloworld") << " "; // print 1
    std::cout << head->search("hell");              // print 1
    std::cout << std::endl;

    head->deletion(head, "h");
    std::cout << head->search("h") << " ";          // print 0
    std::cout << head->search("hell") << " ";       // print 1
    std::cout << head->search("helloworld");        // print 1
    std::cout << std::endl;

    head->deletion(head, "helloworld");
    std::cout << head->search("helloworld") << " "; // print 0
    std::cout << head->search("hell") << " ";       // print 1

    head->deletion(head, "hell");
    std::cout << head->search("hell");              // print 0
    std::cout << std::endl;

    if (head == nullptr)
        std::cout << "Trie empty!!\n";              // Trie is empty now

    std::cout << head->search("hell");              // print 0

    return 0;
}
```
####PHP
```php
<?php
class TrieNode
{
    public $weight;
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        return intval(ord($char)) - intval(ord('a'));
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}
```
That’s all for these common data structures.
And that’s the end of knowing-your-algorithms-and-data-structures series in my blog post.
Hope you learned something useful out of them.
Till then. Happy Coding!
And when I’m talking about legacy web apps, I’m usually referring to server-side rendering web applications.
In this legacy web app, I have the grand opportunity to design and refactor web UI controls to be heavily built in React alongside with leveraging client-side routing such as React Router.
But due to the sheer size of the monolithic complexity behind such web forms, the time required to completely revamp the UI could not fit within our client’s delivery timeline. We needed to ship the parts of the system already remade in React in the first phase, while the rest of the legacy system had to come along inside the React app.
Thus the solution to serve the other legacy system menus and pages within React is simply to make use of the <iframe> tag.
Let’s say for eg you have the following React Router (at the time of writing, I’m using version 4) component definition:
import React from "react";import { BrowserRouter, Switch, Route } from "react-router-dom";const ReactApp = () => ( <BrowserRouter> <Switch> <Route path="/home" exact component={Home} /> <Route path="/about/" component={About} /> <Route path="/users/" component={Users} /> <Route path="/users/:id/profile" component={UserProfile} /> </Switch> </BrowserRouter>);
Let’s say that you managed to have the Home and About React components designed and implemented, but you are nowhere near confident in implementing the Users and UserProfile pages to match the legacy environments just yet.
Thus you need to ‘house’ these legacy apps under React when users navigate to such pages in the interim.
Therefore our solution (as originally proposed above) would have to look something like this.
```jsx
const Users = () => (
  <Fragment>
    <iframe src="some_legacy_url/users" title="some legacy app title" />
  </Fragment>
);

const UserProfile = ({ id }) => (
  <Fragment>
    <iframe
      src={`some_legacy_url/user/${id}/profile`}
      title="some legacy app title"
    />
  </Fragment>
);
```
With the iframe tags and page components defined above, I know the legacy URLs to the Users and UserProfile pages beforehand. When users navigate to these pages within React, what happens behind the scenes is that React renders the iframe upon load, and the iframe takes over the responsibility of fetching all of its resources from the web server that hosts the legacy web apps, loading their respective mapped pages.
Once that’s complete, you can immediately see the actual legacy pages, ready to be used, just like you would if you were using them in the legacy environment!
How is it possible?
Well.
You can think of an iframe as a URL browser that lets you navigate pages as well, with the only difference being that the embedded document lives within the main HTML document window, which in this case is React’s HTML document window. You can see a working example from this link from W3Schools so you see what I mean!
That’s fantastic!
We got the client-routing to server-side-rendering template mapping problems solved!
However, at this point, I’ve managed to solve only one aspect of this problem.
Sure, the user can enter the browser’s URL as http://some_react_base/users or http://some_react_base/user/${id}/profile to land on the correct page resources.
Notice there’s another aspect of the problem here: the some_legacy_url/users page contains a list of users with hyperlinks to their profile information, i.e. some_legacy_url/user/:some_id/profile. Now, you would think that clicking one of those user profiles would trigger React Router to load and serve the page at some_react_base_url/user/:same_id/profile?
Well.. It doesn’t really.
What actually happens here is that the iframe will no doubt load the server-side-rendered UserProfile template within the React app properly. The browser URL navigation, however, does not follow. This is because React has no control over the iframe resources as soon as the Users legacy page is loaded. Server-side-rendered applications have their own internal page routing that leads the user to other server-side-rendered pages such as UserProfile. Therefore, as soon as the user navigates to the UserProfile page, the main browser window does not leave http://some-react-base-url/users. Instead, it is the iframe that leaves the users URL for another page, because, as I mentioned earlier in this post, the iframe itself is treated like a URL browser window of its own.
That’s why React has no ‘clue’ what iframe navigation activities are happening over there.
Okay.. That’s good to know.
However, for our case, we do want React to have control over this navigation behaviour, as we may wish to use menu and page navigation layouts that differ from the legacy menus and page navigation setup. Thus we just want to mirror our client-side routing to the server-side routing paths respectively (and interchangeably).
So - how should we accomplish this?
Simple.
We use the browsers’ DOM API to do this.
Suppose we have the following Python Flask Users template:
```html
<html>
  <title>Users Page</title>
  <body>
    <h1>List of Users</h1>
    <table id="users_table">
      <tbody>
        {% for user in users %}
        <tr>
          <td>{{user.name}}</td>
          <td>{{user.email}}</td>
          <td><a href="/user/{{user.id}}/profile">Profile detail</a></td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </body>
</html>
```
We have a table rendered with a list of users’ information and each row of a user has a hyperlink that contains their profile in detail.
Next, we hook up our jQuery to manage the individual hyperlink.
```javascript
$(document).ready(function () {
  // grab the column elements that have hyperlinks
  var user_profile_links = $("table[id='users_table'] tr td:nth-child(3)");

  // bind a click event for each profile link
  user_profile_links.each(function (index) {
    $(this).on("click", "a", function (e) {
      e.preventDefault();
      var user_profile_url = $(this).attr("href");
      document.location.href = user_profile_url;
    });
  });
});
```
Nothing new here. Very basic stuff.
Somewhere in our server-side legacy code base, we handle user profile routing over there.
But here’s the real kicker…
Instead of using document.location.href, we use the window’s frameElement.
```javascript
var isEmbeddedInFrame = window.frameElement;

...

// put this inside each table column's onclick callback
if (isEmbeddedInFrame) {
  var topParentWindow = isEmbeddedInFrame.ownerDocument.defaultView;
  topParentWindow.history.pushState({}, '', user_profile_url);
  topParentWindow.history.go();
}
```
Confused? What’s going on here, you’re probably wondering?
How do these help React Router to route the UserProfile pages exactly?
I’m glad that you asked.
Let me explain.
By verifying the template is definitely sitting inside an iframe element through the window.frameElement API call, we’re granted access to all of its DOM properties and API method calls, just like you would have with the document and window objects in the browser.
What’s more interesting here is that in newer browser versions of Chrome, Safari, etc., we can reach out to the main parent window that embeds the current iframe we’re in by calling ownerDocument.defaultView, which essentially returns us another window object.
And just like any window object, we can access any of its HTML5 APIs at our disposal, such as history.
Here, calling pushState on topParentWindow’s history creates a new URL entry and stores it in our browser’s URL history. Not only that, the new URL entry will be displayed in the address bar as well. pushState takes 3 parameters: a state object, a title, and the new URL itself. Using pushState does not trigger the new URL entry to reload the page; it’s not responsible for handling that. You need to execute the history’s go method to make that happen.
Once you do this, the magic happens: when the user_profile_url link is loaded in the main browser, React Router’s routing mechanism kicks in and matches the user_profile_url link to one of its URL routing rules, which is
<Route path="/users/:id/profile" component={UserProfile} />
Assuming they are matched correctly, it will load the UserProfile component correctly along with its own iframe resources pertaining to the user profile information, which will be rendered by the same legacy app.
Thus, it gives the user the illusion of navigating URL pages within the same environment, when really both React and the legacy app are just getting synced up with one another.
And there you have it!
That’s how you can get both client-side routing and server-side rendering working side by side when hosting legacy apps within a React environment!
The reason why this works is that underneath the React Router library, its core engine fundamentally makes full use of the HTML5 history API, just like the pushState call I mentioned earlier. You can read this in its user docs in detail, especially if you’re planning to use the BrowserRouter component for the majority of the client-routing experience across your React page components.
This is the perfect solution for me in the interim before I can plan for more time in redesigning and, eventually, implementing the new User and UserProfile pages in React, thus allowing me to do the legacy-to-React app migration gracefully.
Then you may ask: sure, this is great stuff to know! But what if you’re still serving the legacy web app domain in the production environment and are not ready to shut it down completely yet? What if you want to have both the legacy and React environments running side by side? Would the code sample I wrote break the legacy routing functionality while React is running side by side with it?
Again, the solution is not too hard to implement.
We just pass in the following conditional checks.
```javascript
$(document).ready(function () {
  // React will pass the following query string parameters in Flask/Python
  var environment_type = "{{request.args.get('environment_type')}}";
  var user_profile_links = $("table[id='users_table'] tr td:nth-child(3)");

  user_profile_links.each(function (index) {
    $(this).on("click", "a", function (e) {
      e.preventDefault();
      var user_profile_url = $(this).attr("href");

      // if the legacy app is hosted in the React environment
      if (environment_type === "react") {
        var isEmbeddedInFrame = window.frameElement;
        // check that it's definitely sitting in the iframe
        if (isEmbeddedInFrame) {
          var topParentWindow = isEmbeddedInFrame.ownerDocument.defaultView;
          // React Router will handle the rest
          topParentWindow.history.pushState({}, "", user_profile_url);
          topParentWindow.history.go();
        }
      } else {
        // otherwise let the server-side routing take care of it
        document.location.href = user_profile_url;
      }
    });
  });
});
```
Then in our Users component, we modify it to do the following:
```jsx
const qs_param = "?environment_type=react";

const Users = () => (
  <Fragment>
    <iframe
      src={`some_legacy_url/users${qs_param}`}
      title="some legacy app title"
    />
  </Fragment>
);
```
Now you have both server-side routing and client-side routing working in tandem without breaking any user navigation flows.
One last thing I want to mention here is that using an iframe has security implications, so be sure to consult the Mozilla Web API docs on handling cross-domain resources. The iframe is notorious as a vector for injecting malicious scripts, among the other possible attack vectors it comes with, so you need to plan ahead for how to handle the security requirements of using it in your web app environment. Wiring up those security measures, however, is out of this post’s scope.
Hopefully, you have learnt some useful tricks from this.
Till next time, Happy Coding!
Useful References:
I went ahead and followed this blog post to quickly learn and build a blockchain in record speed using Python!
After playing around their sample code, I decided to give myself a crack of this - to write blockchain implementation in NodeJS.
Before I continue writing the rest of the post, I just want to reaffirm my point that I don’t know a lot about them upfront and their useful practicalities in real life scenarios.
I’m sure there are plenty of online blogs and forums out there with ongoing discussions amongst blockchain enthusiasts about exploits and the like, and about how the technology can ‘revolutionise’ the way people do online transactions.
I will cover some of those in my future post one day perhaps.
But today, I simply want to cover the basics of writing a blockchain for anybody interested in knowing how it works under the hood.
At its core, a blockchain is simply a singly linked list data structure.
If you recall from my last year’s post on common data structures software engineers/developers alike oughta know, they work around the concept of having nodes linked one after another. Each node carries some important data or information about itself; in particular, it holds referencing information about the next sibling/successor node. Thus we have a chain of nodes linking together that’s ever-expanding.
If you can understand this core concept, you’re more than halfway done to fully grasp what blockchain truly is!
Currently, there are two types of blockchain implementations out there: Proof of Work (POW) and Proof of Stake (POS).
What is POW?
POW is a protocol whose goal is to deter cyber attacks such as DDoS attacks, which try to exhaust all of the computing power within a system by sending multiple fake requests. Under POW, the participants doing the computational grunt work are dubbed the ‘miners’. Miners have to perform a lot of mining to create new groups of transactions that do not involve any third-party trust; these groups form the blockchain. Miners’ primary responsibility is to validate new transactions and record them on the blockchain by solving these computational puzzles.
Thus, to demonstrate POW, we define the following function class to describe our block (or node, if you will) structure.
```javascript
function POWBlock(index, prevHash, timeStamp, data, currentHash) {
  this.index = index;
  this.prevHash = prevHash;
  this.timeStamp = timeStamp;
  this.data = data;
  this.currentHash = currentHash;
}

...

module.exports = POWBlock;
```
What we’re saying here is that we have a block’s logical structure that holds certain pieces of important information. Each block records the transaction data and the timestamp of the transaction that occurred. Also, like a singly-linked list node, it contains a reference to its own hash as well as the previous block’s hash.
Hashes are calculated because they help maintain the integrity of the data; SHA-256 is the common hashing algorithm for this exercise.
To perform such hashing, we do the following:
```javascript
const calculateHash = (index, prevHash, timeStamp, data) => {
  return CryptoJS.SHA256(index + prevHash + timeStamp + data).toString();
};
```
Once you’re done creating the hash, we need to generate a block that kicks off the block-chaining operation. To do that, we grab the hash of the previous block and create the rest of the content from there.
```javascript
const generateNextBlock = (blockData = {}) => {
  const previousBlock = getLatestBlock(),
    nextIndex = previousBlock.index + 1,
    nextTimeStamp = new Date().getTime() / 1000,
    nextHash = calculateHash(nextIndex, previousBlock.currentHash, nextTimeStamp, blockData);
  return new POWBlock(nextIndex, previousBlock.currentHash, nextTimeStamp, blockData, nextHash);
};
```
That’s it! That’s the main premise of how blockchain works.
Next, we move onto Proof of Stake.
Proof of Stake is similar to Proof of Work, but its goals are different. The algorithm used to build the blockchain is the same; however, miners do not get a reward for mining, so no block reward is given. Instead, block creation is assigned to block creators in proportion to the wealth they hold, i.e. their stake. These validators do not get rewarded for solving mathematical problems; they take transaction fees instead.
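One way to picture how block creation gets assigned in proportion to wealth is stake-weighted random selection. The sketch below is my own simplified illustration (real PoS chains combine stake with randomness beacons, coin age, slashing, and more):

```javascript
// Pick a validator with probability proportional to its stake.
// `rand` is a number in [0, 1); it is injectable so the choice
// can be made deterministic in tests.
function pickValidator(stakes, rand = Math.random()) {
  const total = Object.values(stakes).reduce((sum, s) => sum + s, 0);
  let threshold = rand * total;
  for (const [address, stake] of Object.entries(stakes)) {
    threshold -= stake;
    if (threshold < 0) return address;
  }
}

// alice holds 60 of the 100 coins at stake, so she wins any draw
// that lands in the first 60% of the range.
const stakes = { alice: 60, bob: 30, carol: 10 };
```

Calling `pickValidator(stakes)` repeatedly would select alice roughly 60% of the time, mirroring the idea that the richer stakeholder forges more blocks.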
For the code level, it looks the same as POW, but with one minor difference. We just add a validator.
```javascript
function POSBlock(index, prevHash, timeStamp, data, currentHash, validator) {
  this.index = index;
  this.prevHash = prevHash;
  this.timeStamp = timeStamp;
  this.data = data;
  this.currentHash = currentHash;
  this.validator = validator;
}

...

module.exports = POSBlock;
```
To generate the blockchain, nothing much is different other than assigning that one extra attribute.
```javascript
const generateBlock = (oldBlock, data, address) => {
  const newBlock = new POSBlock();
  const t = new Date();
  newBlock.index = oldBlock.index + 1;
  newBlock.timeStamp = t.toISOString();
  newBlock.data = data;
  // the new block links back to the previous block's hash
  newBlock.prevHash = oldBlock.currentHash;
  newBlock.currentHash = calculateHash(newBlock.index, newBlock.prevHash, newBlock.timeStamp, data);
  newBlock.validator = address;
  return newBlock;
};
```
In my next post, I will write up a simulated blockchain ‘economy’ that demonstrates how the two blockchain implementations work on their own. I’m looking to use cool NodeJS features such as web sockets and event emitters to help me accomplish this goal.
Stay tuned!
Till then, Happy Coding!
Learning Resources:
]]>You know.
The usual UI suspects full-stack web developers normally face.
For example, say you were to build a form that comes with some drop-down fields; your React code would look like this.
import React, { Component, Fragment } from "react";

export default class FormApp extends Component {
  state = {
    dropdown_shirt: null
  };
  searchMe = e => {
    // Magic is going to happen here.
  };
  pickMe = e => {
    // state your name!
    this.setState({ [e.target.name]: e.target.value });
  };
  render() {
    return (
      <Fragment>
        <h1>Welcome to my Awesome React App</h1>
        <form className="form-container">
          <label htmlFor="dropdown_shirt">Shirts</label>
          <select name="dropdown_shirt" onChange={this.pickMe}>
            <option value="polo_tees">Polo Tees</option>
            <option value="sleeveless">Sleeveless</option>
            <option value="v_necks">V Necks</option>
          </select>
          <button onClick={this.searchMe}>Find me some tees!</button>
        </form>
        {/* The table data will be rendered here when searching */}
      </Fragment>
    );
  }
}
Nothing out of ordinary here.
A very typical React setup using local state, event handlers and JSX elements, along with other useful React core APIs.
Then you continue adding other input fields such as checkboxes, radio buttons, text fields etc, etc.. to satisfy some user requirements behind them.
But…
As you obviously know, the more features you build, the more complex the app is going to become - especially at the code structure level.
What if the requirement is not just one drop-down filter, but several more?
Perhaps with an extra 3 drop-down filters…
// Pants dropdown field
<label htmlFor="dropdown_pants">Pants</label>
<select name="dropdown_pants" onChange={this.pickMe} value={this.state.dropdown_pants}>
  <option value="dress_pants">Dress pants</option>
  <option value="jeans">Jeans</option>
  <option value="baggy_pants">Baggy pants</option>
</select>

// Shoes dropdown field
<label htmlFor="dropdown_shoes">Shoes</label>
<select name="dropdown_shoes" onChange={this.pickMe} value={this.state.dropdown_shoes}>
  <option value="boots">Boots</option>
  <option value="sporty_shoes">Sporty shoes</option>
  <option value="leather_shoes">Leather shoes</option>
</select>

// Hats dropdown field
<label htmlFor="dropdown_hats">Hats</label>
<select name="dropdown_hats" onChange={this.pickMe} value={this.state.dropdown_hats}>
  <option value="beanie">Beanie</option>
  <option value="cowboy_hat">Cowboy hat</option>
  <option value="sports_cap">Sports Cap</option>
</select>
Great! Our search form just got funkier, with more drop-down search filters to choose from.
However, the problem emerges when our render function now takes in more drop-down components to render.
You may think it’s fine for a few components for now.
But what if you decide to add more dropdown filters in the future? Your render function will get longer and longer until the rendering section becomes one long poem to read!
Moreover, it is very repetitive and hard to maintain when the next developer comes in to extend or modify your once-so-awesome form app, not to mention having to keep track of the local state, handlers and data props for each drop-down field.
Definitely not cool at all.
So what can we do to make this better with such repeated UI controls use?
Well. We can DRY them using arrays and destructuring.
To start off, notice that in the previous examples all the dropdown fields look identical, differing only in their label names and their local states. So, in my mental model, my array structure would be designed like this.
const dropdownsArr = [
  {
    label: "Shirts",
    name: "dropdown_shirt",
    value: this.state.dropdown_shirt,
    options: [
      { label: "Polo Tees", value: "polo_tees" },
      { label: "Sleeveless", value: "sleeveless" }
      // ...rest of options
    ]
  }
  // ...
];
Knowing my current filter functionality as it stands, I was able to figure out the key attributes that make up for each drop-down filter. Each dropdown has a unique label name, a local state that keeps track of user-selected dropdown value along with an array of drop-down option values.
But in the real world, I won't be maintaining that list of options for each drop-down field. These are better sourced externally, from a database, a CSV or an external API provider that grants me access. Let's assume for the moment that the options property takes incoming data from some data API fetch. The same data is stored as a prop and passed down to this component level; that prop is called, for example, shirtsOptions.
Hence, our revised array structure will be
const dropdownsArr = [
  {
    label: "Shirts",
    name: "dropdown_shirt",
    value: this.state.dropdown_shirt,
    options: this.props.shirtsOptions
  }
  // ...
];
Now it looks better and leaner. We can safely assume at this point that each item in our shirtsOptions collection carries the label and value properties our options array needs. So we're good here.
Next, we add the other three drop-down fields and get the following:
const dropdownsArr = [
  {
    label: "Shirts",
    name: "dropdown_shirts",
    value: this.state.dropdown_shirts,
    options: this.props.shirtsOptions
  },
  {
    label: "Pants",
    name: "dropdown_pants",
    value: this.state.dropdown_pants,
    options: this.props.pantsOptions
  },
  {
    label: "Shoes",
    name: "dropdown_shoes",
    value: this.state.dropdown_shoes,
    options: this.props.shoesOptions
  },
  {
    label: "Hats",
    name: "dropdown_hats",
    value: this.state.dropdown_hats,
    options: this.props.hatsOptions
  }
];
From this, we can refactor our dropdowns rendering using map
dropdownsArr.map(this.renderAsDropDown);
And renderAsDropDown
will be:
renderAsDropDown = ({ label, name, value, options }) => {
  return (
    <Fragment key={name}>
      <label htmlFor={name}>{label}</label>
      <select name={name} onChange={this.pickMe} value={value}>
        {this.renderOptions(options)}
      </select>
    </Fragment>
  );
};

renderOptions = options => {
  return options.map((option, index) => (
    <option key={index} value={option.value}>{option.label}</option>
  ));
};
See what I have done here.
For my renderAsDropDown method, not only do I iterate over each dropdown item from dropdownsArr within its callback, I also use ES6 object destructuring to pull each item's attributes out and map them to the correct JSX props that make up my dropdown component, along with its children such as the options.
That’s it!
That’s how you can DRY out your repetitive UI control code using such splendid ES6 features. 🤟🤘🤟🤘🤟
Awesome!
But what if I could tell you that we could take this a step further?
Let’s say you look at the following dropdownsArr construction.
const dropdownsArr = [
  {
    label: "Shirts",
    name: "dropdown_shirts",
    value: this.state.dropdown_shirts,
    options: this.props.shirtsOptions
  },
  {
    label: "Pants",
    name: "dropdown_pants",
    value: this.state.dropdown_pants,
    options: this.props.pantsOptions
  },
  {
    label: "Shoes",
    name: "dropdown_shoes",
    value: this.state.dropdown_shoes,
    options: this.props.shoesOptions
  },
  {
    label: "Hats",
    name: "dropdown_hats",
    value: this.state.dropdown_hats,
    options: this.props.hatsOptions
  }
];
While this looks alright on the surface, we can refactor it even further by decoupling state and props.
const dropdownsArr = constructDropdownArray(this.state, this.props);
In our constructDropdownArray
method, we do the following.
const constructDropdownArray = (
  { dropdown_shirts, dropdown_pants, dropdown_shoes, dropdown_hats },
  { shirtsOptions, pantsOptions, shoesOptions, hatsOptions }
) => {
  // ...
  return [
    {
      label: "Shirts",
      name: "dropdown_shirts",
      value: dropdown_shirts,
      options: shirtsOptions
    },
    {
      label: "Pants",
      name: "dropdown_pants",
      value: dropdown_pants,
      options: pantsOptions
    },
    {
      label: "Shoes",
      name: "dropdown_shoes",
      value: dropdown_shoes,
      options: shoesOptions
    },
    {
      label: "Hats",
      name: "dropdown_hats",
      value: dropdown_hats,
      options: hatsOptions
    }
  ];
};
Again, using ES6 object destructuring, we can apply the same strategy to state and props, as they're also plain old JavaScript objects, just like my previous array example.
This is all very cool.
But what if, along the way, you want to vary some behaviour, such as our existing onChange event, because each dropdown now has different change-handling requirements? We need to cater for this change in our array construction. How should we do it?
Simple.
We append another attribute to each item object of the array.
In our constructDropdownArray
method, we do the following.
// adding a changeFn property
array = [
  {
    label: "Shirts",
    name: "dropdown_shirts",
    changeFn: someEventHandler,
    value: dropdown_shirts,
    options: shirtsOptions
  }
  // ...
];
Yup. You can even pass JS functions to the arrays as well.
To use this updated structure, if we go back to our React component
export default class FormApp extends Component {
  // provided event handlers
  onShirtsSelectedChange = () => {};
  onPantsSelectedChange = () => {};
  onShoesSelectedChange = () => {};
  onHatsSelectedChange = () => {};
  // ...
}
In the same component, within our render function, we tweak this constructDropdownArray
function to add an extra parameter.
const dropdownsArr = constructDropdownArray(this, this.state, this.props);
Yup. You heard me.
I’m passing this
to the calling function.
So I can do this.
const constructDropdownArray = (
  {
    onShirtsSelectedChange,
    onPantsSelectedChange,
    onShoesSelectedChange,
    onHatsSelectedChange
  },
  ...rest
) => {
  // ...
};
Using object destructuring, I can grab any existing properties or functions on my current React component and slot them into each drop-down item's changeFn property.
Therefore, in our renderAsDropdown
, we do the simple tweak to have the following.
// add another attribute to the destructure signature
renderAsDropDown = ({ label, name, value, changeFn, options }) => {
  return (
    <Fragment key={name}>
      <label htmlFor={name}>{label}</label>
      <select name={name} onChange={changeFn} value={value}>
        {this.renderOptions(options)}
      </select>
    </Fragment>
  );
};
What's amazing about this setup is that the event-handler call still works at the renderAsDropDown point, because the handlers never lose their binding to this, even after being destructured off the component.
Amazing isn’t it??
I was pretty stoked myself when I discovered how object destructure can also be used for this type of situation when building my form UIs.
Tricks like these are indeed useful in helping your UI components scale gracefully.
Hope you learnt something useful.
Till then, Happy Coding!
]]>For instance, it could be a certain software design pattern that's currently implemented in one area of a module, and you want to confirm whether that same pattern could also be used in other parts of the application.
But where can you begin to figure out where and how heavily the design patterns are used and their overall prevalence within an application, if editor tools such as VSC can’t give you the straight answers?
Well.
Here’s how.
I found this.
Essentially, it's another grep-like search tool.
But it’s built mainly for programmers working on source code repositories.
Typically, as software developers/engineers, we spend a considerable proportion of our time navigating and locating the source files we need to work on before we even type our first line of code.
Naturally, we won't always know the depth of the entire application structure upfront when given a particular task. Even if we do, there's a good chance certain features are in a constant state of change; technologies and design decisions don't stand idle against the sands of time, even more so on a larger development team.
As a consequence of such factors, we search for file/folders to make those necessary changes.
For a while, tools like grep or your editor's built-in file search did the job for us. But as your codebase grows over time, so do the search times, and we don't want our development time stalled by such a condition.
Now, we get to learn another awesome trick with ack
that will take our file searching duties to the next level.
To start off, install ack on your dev machine by following the instructions for your specific OS.
As I'm on a MacBook, I do the following using Homebrew.
brew install ack
Once installed, you can interact with its command shell using your favourite terminal program by typing ack
.
Let’s say, as an example, I decide that I want to do some site changes on my blog site.
I want to enhance the look and feel of my embedded video link by modifying its CSS/SASS properties.
.embed-video-container { -moz-border-radius: .3em; -webkit-border-radius: .3em; border-radius: .3em; -moz-box-shadow: rgba(0,0,0,0.15) 0 1px 4px; -webkit-box-shadow: rgba(0,0,0,0.15) 0 1px 4px; box-shadow: rgba(0,0,0,0.15) 0 1px 4px; -moz-box-sizing: border-box; -webkit-box-sizing: border-box; box-sizing: border-box; border: #fff 0.5em solid;}
With this change, I'd expect any blog post pages I wrote months back that have embedded videos to be impacted. However, I don't necessarily remember which content pages in my blog repo those are. I want to find out quickly not only where the changes will land, but also their context of use.
With ack
, I do the following
$: ack -i "embed-video-container"
I get the following back
plugins/traileraddict.rb
10: %(<div class="embed-video-container"><iframe src="//www.traileraddict.com/emd/#{@id.strip}"></iframe></div>)
plugins/ooyala.rb
12: %(<div class="embed-video-container"><script src="//player.ooyala.com/iframe.js#pbid=#{@pbid}&ec=#{@ec}"></script></div>)
plugins/dailymotion.rb
8: %(<div class="embed-video-container"><iframe src="//www.dailymotion.com/embed/video/#{@id.strip}"></iframe></div>)
plugins/vimeo.rb
10: %(<div class="embed-video-container"><iframe src="//player.vimeo.com/video/#{@id.strip}"></iframe></div>)
plugins/youtube.rb
10: %(<div class="embed-video-container"><iframe src="//www.youtube.com/embed/#{@id.strip}" allowfullscreen></iframe></div>)
.............................
public/blog/2018/06/19/my-thoughts-on-gitsoft-or-microhub-whichever-comes-first-since-its-acquisition/index.html
151:<div class="embed-video-container"><iframe src="//www.youtube.com/embed/UEb1cvZG3GU" allowfullscreen></iframe></div>
sass/custom/_rve.scss
1:.embed-video-container {
From this, I found out where my embed-video-container class is being used across the repo: in the Ruby plugins, the generated HTML pages and a SASS partial.
Great! But I may decide I’m not interested in its use in Ruby or SASS files. I want to see which HTML files that make heavy use of the embedded widget.
So how do I filter it?
Simple.
We do this.
$: ack -i --html "embed-video-container"
By adding the --html flag, I get back only the HTML files containing the class name that matches my keyword.
You can also do multiple file-type searches as well by adding more flags such as Javascript/Python/Java file types in one line if you want.
Fabulous!
Now, using the same results, I want to inspect which lines of these files the snippet sits on, and understand the breadth and depth of its surrounding context.
Again - simple.
We do either of the following.
# A - Trailing lines search
$: ack -i --html -A 10 "embed-video-container"

# A output
public/blog/2018/06/19/my-thoughts-on-gitsoft-or-microhub-whichever-comes-first-since-its-acquisition/index.html
151:<div class="embed-video-container"><iframe src="//www.youtube.com/embed/UEb1cvZG3GU" allowfullscreen></iframe></div>
152-
153-
154-<p>Fascinating, isn’t it?</p>
155-
156-<p>Or downright gutted by their Github’s decision-making process to par with Microsoft?</p>
157-
158-<p>Whatever you may be feeling (long-time or newbie dev), there’s no further doubt that more changes are coming our way within the open source communities, if not just restricted to Github itself.</p>
159-
160-<p>Like it or not.</p>

# B - Leading lines search
$: ack -i --html -B 10 "embed-video-container"

# B output
public/blog/2018/06/19/my-thoughts-on-gitsoft-or-microhub-whichever-comes-first-since-its-acquisition/index.html
141-<p>Steve Balmer now admitted he’s <a href="https://www.zdnet.com/article/ballmer-i-may-have-called-linux-a-cancer-but-now-i-love-it/">loving</a> the open source community. He’s all for it now that the current chief of Microsoft Satya Nadella made all the right moves in porting a number of Microsoft applications into the open source environments such as Github back in 2015 since his inception into the company. Later on, they have journeyed, so far, to become the largest contributor to the open source development community by leaps and bounds, followed by other big players like Google, Amazon, Facebook etc, etc.</p>
142-
143-<p>The facts speak for themselves.</p>
144-
145-<p>You can read the official figures from this post <a href="https://medium.freecodecamp.org/the-top-contributors-to-github-2017-be98ab854e87">here</a>.</p>
146-
147-<p>With those convincing numbers and Microsoft’s dedicated commitment in giving back for this nourishing community, it’s little or no wonder how Microsoft and Github had been in secret talks amongst each other in buying the Octopuss mascot for a whopping $7.5 USD billion dollars!</p>
148-
149-<p>At the time of writing, I, later on, found out why they’ve made such a move by stumbling upon this Youtube video link below that comes with an amazing infographic that better explains the reasoning behind their purchase.</p>
150-
151:<div class="embed-video-container"><iframe src="//www.youtube.com/embed/UEb1cvZG3GU" allowfullscreen></iframe></div>
What those A and B switches do: A shows me lines of content after the embed-video-container keyword pattern, whilst B shows me lines of content before the same pattern.
You can even combine both switches to show you what comes before and after the same keywords in one resultset.
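These context switches mirror grep's own -A/-B/-C flags, so you can sanity-check the behaviour with plain grep. A throwaway file is used here purely for illustration:

```shell
# Build a throwaway file to search.
printf 'one\ntwo\nthree\nfour\nfive\n' > /tmp/context-demo.txt

# -C 1 shows one line of context on each side of the match,
# equivalent to combining -B 1 and -A 1.
grep -C 1 "three" /tmp/context-demo.txt
```

In ack, the combined form is spelled the same way: `ack -C 1 "pattern"`.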
This is sick!! 💪💪💪💪💪
Beyond the incredible power this toolset provides, it's also blazingly fast at returning these results compared to what VSC or grep can give you.
Not only that, if you use the terminal within your VSC executing the commands above, you can even take advantage of VSC’s CMD+click
to open up files from your ACK results set and change their content from there.
To top it off, you can use either Bash, Zsh or Fish shell to trim and abstract away your ACK commands a step further.
eg.
# search trailing context for this keyword
acktrailingby() {
  ack -i -A $1 "$2"
}

# search leading context for this keyword
ackleadingby() {
  ack -i -B $1 "$2"
}
In short.
ACK commands + VSC terminal environment + Bash/Zsh/Fish = Hackers (👨💻👩💻👨💻👩💻) in GOD mode
That’s how slick ACK really is.
This is just the tip of the iceberg.
As it is a regex pattern engine powered by Perl, I can see why its feedback comes back lightning quick.
The key advantages I found with this tool are its speed and accuracy, which grep could never match, and its per-language file-type filters, which make it a fantastic tool for polyglot engineers like me. 🖥⌨️🖱
I highly recommend adding this to your developer toolkit arsenal.
Give it a go! You won’t regret it.
Till then, Happy Coding!
]]>People have been firing off tweets about dumping Github for alternative code-hosting systems such as Gitlab or Bitbucket, left and right, as part of their voiced frustrations and concerns over Microsoft's Github acquisition and the future of software development.
And, rightly so, they should be entitled to experience that fear as much as I was some weeks back.
This is how I expressed my reaction on my Twitter account.
If you zoom in closer…
Yes.
Those were my initial raw reactions - without a doubt.
I wasn’t too happy about it at first.
I started to worry what potential ‘harm’ or ‘damage’ Microsoft could do with the richest open source ecosystem in the world?
But I think it's very important to understand why and where all this vehement hate came from, as some of you may (or may not) remember how Microsoft used to give no respect to the open source community some eons ago…
Former Microsoft CEO Steve Ballmer famously described the open source community, back in 2001, as I quote
Open source eg Linux is a cancer that attaches itself in an intellectual property sense to everything it touches. That’s the way that the licensing works.
Oooh… Cancer…
Ouch!
With such harsh comments, it's no wonder people in the community felt so 'butt-hurt' at the time, unable to stand up to someone at the helm of corporate tyranny and fear.
Since then, there's been a grand divide between Microsoft and the open source world. One side argues you can't build good commercial software that relies heavily on open source material, which carries no price tag and no economic benefit. The other holds that the open source movement fosters innovation and growth and builds trust amongst developers and businesses alike: you, the developer, and the rest of the community decide how malleable the software stays under ever-changing business conditions, rather than sitting at the mercy of a vendor.
This is the fundamental nature of open source and why people love doing them for so long, including myself.
We’ve all worked very hard to defend them, as far as software licensing rights are concerned.
Now, given that’s how Microsoft’s previous attitude towards open source all those years ago, I decided to dig up and find out how much has their attitude changed since then.
And.
Lo and behold, there has been a change of heart - somewhat.
Steve Ballmer now admits he loves the open source community. He's all for it now that the current chief of Microsoft, Satya Nadella, has made all the right moves, porting a number of Microsoft applications into open source environments such as Github back in 2015. Since then, Microsoft has journeyed to become the largest contributor to the open source development community by leaps and bounds, followed by other big players like Google, Amazon and Facebook.
The facts speak for themselves.
You can read the official figures from this post here.
With those convincing numbers and Microsoft's dedicated commitment to giving back to this nourishing community, it's little wonder Microsoft and Github had been in secret talks over buying the Octopuss mascot for a whopping $7.5 billion USD!
At the time of writing, I, later on, found out why they’ve made such a move by stumbling upon this Youtube video link below that comes with an amazing infographic that better explains the reasoning behind their purchase.
Fascinating, isn’t it?
Or downright gutted by their Github’s decision-making process to par with Microsoft?
Whatever you may be feeling (long-time or newbie dev), there’s no further doubt that more changes are coming our way within the open source communities, if not just restricted to Github itself.
Like it or not.
The underlying question in every Github developer's mind now is:
I cannot say for everyone.
But, as for myself, I’m keen to give this a go.
I have to admit that some of MS open source development tools have been good to me.
I've been actively using Microsoft VSCode for all my software development needs and projects since the middle of last year, i.e. 2017, over the likes of Atom, Sublime Text, Notepad++ etc.
I have never looked back since.
Their editor tools have done wonders for me, and I have never felt so much freedom and creative programming prowess across the interesting projects I personally host on my Github account. Which, btw, I may add is built upon Electron, one of Github's best open source frameworks for developing desktop apps.
Personally, this is just my opinion, but I wouldn't have accomplished a lot of things without it, and it's helped me manage my project work seamlessly wherever I take it.
In short, good for Microsoft and its new management team in keeping open source community alive and well.
Only time will tell how far they can help with Github (or others similarly) to reach bold new heights where open source software collaboration has never been before.
Yes.
Only time will tell.
Till then, Happy Coding!
]]>And sure enough, as soon as I opened up the Github page and searched for web scraping, I got the following:
There are about 8,800+ search results on this topic. It sounds like an expansive topic for such simple software that goes out and extracts data from any sites you point it at.
So I asked myself: how and where did all this web scraping begin?
Well.
As it turns out, I went ahead to dig up the little history behind it.
Since the dawn of the Internet some 25 years ago, people have been fascinated by the idea of doing business with customers on the other side of the world without physical restrictions. As more and more people jumped on the internet bandwagon, businesses slowly started building profiles online, offering products and services to visitors.
As the wealth of information built up all over the internet, business people started to take notice, wanting to grab this golden online opportunity. This information could include anything from
In order to get this data, businesses had to manually copy text from HTML pages, which later proved extremely inefficient for business applications.
So they moved on to using spreadsheet software to store web results. A lot of HTML <table> tags were used to store data, making them the perfect candidates for early web scraping.
Later on, as the web continued to evolve, tools emerged to download server content and save it on a client machine, much as we still do today.
Soon, web scraping tools matured and grew more sophisticated towards the end of the Internet's second decade, gaining the ability to scan and automatically sniff out the HTML content of every website out there, big and small.
Even more so now: modern web scraping tools scale with raw computing power, aided by AI that can drive web extraction a lot quicker.
Having said all this, this brings me to my next fun exciting point…
We get to build our very own web scraping tool!
Whilst I was researching this topic on Github, I also stumbled upon the following.
According to my Github search results, Python is the more popular choice of language for this job, compared to others in the open source community.
And as a polyglot engineer myself, I thought it would be good to keep an open mind about how other languages, such as PHP, Ruby, NodeJS and Golang, can achieve the same web scraping objectives.
But after a brief glance at other languages' implementations of the same service, I concluded that Python does a much better job, for two simple reasons
Thus, I'll use Python for this specific project.
In my snippets, I identified a couple of excellent Python web scraping libraries for this project.
You can use either library to fulfil your web scraping needs, or both if you'd like to be a bit more adventurous. The key difference between the two: BeautifulSoup is primarily an HTML parsing library; you fetch a statically-written HTML page from a specific URL and parse out the DOM elements and/or attributes you're interested in extracting. Scrapy, by contrast, is a full framework built for extensive scraping scenarios BeautifulSoup can't handle, especially when you need high performance across deeper scraping runs with multiple URLs to crawl and fetch. In short, Scrapy can crawl a whole domain and fetch any nested URLs under it, which BeautifulSoup can't do.
To clearly understand this difference, let’s start with some code examples below.
Let's say I want to scrape a list of available software developer jobs in Sydney from a reputable job board I may (or may not) be interested in applying to. My motivation: I'd like an effective, efficient way to track any job applications I've recently made (or not made) at any time of day or week. I may not have time to check all the job ads online and read their details by browsing them individually, so I want to automate this step; the perfect excuse for some heavy web scraping.
To start, you import the following libraries.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import webbrowser
Define our base URL.
base_url = "https://www.seek.com.au/software-developer-jobs/in-All-Sydney-NSW"
Then query the same URL and save its HTML content to a BeautifulSoup object instance.
page = urlopen(base_url)
soup = BeautifulSoup(page, "html.parser")
We prettify the content so it's legible before writing it back to our own static HTML file.
html_str = soup.prettify()
Write the contents back to our to local file.
html_file = open('base_url.html', 'w')
html_file.write(html_str)
html_file.close()
And we’re done! Then you can view the contents of the newly written file by locating the saved file and opening it.
Or you can also programmatically open it as well as below. It achieves the same thing.
try:
    file_name = "../../base_url.html"
    webbrowser.open_new_tab(file_name)
except:
    print("Cannot open local file: {0}".format(file_name))
Now that all that is nice and dandy and taken care of, we get right into the 'meatier' side of things, which matters the most.
That is we want BeautifulSoup to do the major scraping work for us!
If you open up the base_url.html
file, you’ll see the following:
From this screenshot, I know what my data extraction needs are going to be.
Let’s say, for simple requirements, as a professional software developer, I’m interested to find all software developer roles that are currently available in Sydney job market right now via Seek. Based on the screenshot, I would like to obtain the following information:
Every job role advertised on the page has the same attributes as in the original screenshot. With that assumption, I'll tell BeautifulSoup to analyse and parse the list of jobs based on the requirements above.
If you open up the HTML content, you identify the HTML elements like below.
<html>
<body>
  ...
  <article aria-label="Software Engineer C# .NET" data-automation="premiumJob" data-job-id="36225685">
    ...
    <a data-automation="jobCompany" title="Jobs at RPMGlobal">RPMGlobal</a>
    ...
    <a data-automation="jobLocation">Sydney</a>
    ...
    <a data-automation="jobArea">North Shore & Northern Beaches</a>
    <ul>
      <li>Global, ASX listed mining software company</li>
      <li>Exciting agile based development projects</li>
      <li>North Sydney location</li>
    </ul>
    <span data-automation="jobShortDescription">
      Develop leading products, be a part of a great team, and enjoy a good work/life balance.
    </span>
  </article>
</body>
</html>
And what you notice is that there are some recognisable patterns of data attributes that you can use here.
First, we fetch all article tags in the list.
job_articles = soup.find_all("article")
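For completeness, the soup object used here is assumed to have been built from the saved HTML beforehand — a minimal sketch using BeautifulSoup’s built-in html.parser backend (the inline markup below is a stand-in for the real base_url.html contents):

```python
from bs4 import BeautifulSoup

# Stand-in markup; in this post it would be read from the saved base_url.html file
html_str = '<html><body><article aria-label="Software Engineer"></article></body></html>'

# Build the soup from the markup using Python's built-in HTML parser
soup = BeautifulSoup(html_str, 'html.parser')
job_articles = soup.find_all('article')
```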
After fetching all article tags, we want to perform a loop operation over this list to extract some data for each job.
for a_job_article in job_articles:
    extract_details(a_job_article)
And here’s our implemented extract_details method:
def extract_details(a_job_article):
    # Job title
    print(a_job_article["aria-label"])

    # Company
    company = a_job_article.find(attrs={"data-automation": "jobCompany"})
    if company != None:
        print(company.text)

    # Location
    location = a_job_article.find(attrs={"data-automation": "jobLocation"})
    if location != None:
        print(location.text)

    # Area
    area = a_job_article.find(attrs={"data-automation": "jobArea"})
    if area != None:
        print(area.text)

    # Salary range
    salary = a_job_article.find(attrs={"data-automation": "jobSalary"})
    if salary != None:
        print(salary.text)

    # Duties and tasks (optional)
    if a_job_article.find('ul') != None:
        dutiestasks_list = a_job_article.find('ul').find_all('li')
        if dutiestasks_list != None:
            for dutiestasks_item in dutiestasks_list:
                if dutiestasks_item != None:
                    print(dutiestasks_item.text)

    # Job Description
    job_description = a_job_article.find(attrs={"data-automation": "jobShortDescription"})
    if job_description != None:
        print(job_description.text)
In spite of being a ‘sizeable’ method, it’s easy to read and to comprehend clearly what my data extraction requirements are.
I set out to find all the relevant data I want by telling BeautifulSoup I’m searching for specific elements that come with certain HTML attributes, such as data-automation. I accomplished this by calling BeautifulSoup’s core API method find. This find method does the core job of scanning for HTML elements that match whatever predicate you specify for a complete match. Obviously, simple tag elements like ul, li, and span can be used. For our case, our tag-matching needs are slightly more complex, so we need an extra parameter in our predicate to achieve our main data extraction goals here.
You’ll also notice that I placed a number of conditional checks in case any HTML elements are not found; we don’t want our scraping tool to crash and burn and fail to continue scanning to the end of the HTML document. This is expected when web scraping: we can never assume all the data is available all the time.
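One way to reduce the repetition of those `!= None` checks is a tiny helper that returns an empty string when an element is missing — a sketch under the assumption that you pass in a BeautifulSoup tag (the helper name `safe_text` is mine, not part of either library):

```python
from bs4 import BeautifulSoup

def safe_text(parent, attrs):
    """Return the stripped text of the first matching child, or '' when absent."""
    element = parent.find(attrs=attrs)
    return element.text.strip() if element is not None else ''

# Quick check against a stand-in article fragment
article = BeautifulSoup(
    '<article><a data-automation="jobCompany"> RPMGlobal </a></article>',
    'html.parser')

print(safe_text(article, {'data-automation': 'jobCompany'}))  # RPMGlobal
print(safe_text(article, {'data-automation': 'jobSalary'}))   # empty string, no crash
```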
To begin web-scraping, you run the following:
python py-webby-to-scrapy.py
You will then see the following lines produced:
Software Engineer C# .NET
RPMGlobal
Sydney
North Shore & Northern Beaches
Global, ASX listed mining software company
Exciting agile based development projects
North Sydney location
Developing leading products, be part of a great team, and enjoy a good work/life balance
Now I know that all the data fetched is working fine as expected. The next important bit is to extract it and populate a CSV file.
To do this, you write up the following:
with open('bs_job_searches.csv', 'a') as csv_file:
    # Empty file contents first
    csv_file.seek(0)
    csv_file.truncate()

    # Setup CSV headings for the CSV stream writer
    fieldnames = ['job_title', 'company', 'location', 'area', 'salary_range',
                  'role_specification', 'job_description']
    csv_writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    csv_writer.writeheader()

    job_articles = fetch_articles()

    # Let the data extraction commence by passing the list and the csv stream writer
    for a_job_article in job_articles:
        extract_details(a_job_article, csv_writer)
Going over the extract_details method, the change is fairly straightforward. We simply replace the print statements with a dictionary that grabs our data and assigns it to the appropriate key, like so:
def extract_details(a_job_article, csv_writer=None):
    # Create the dictionary to store all data against their relevant fields
    job_article_dict = {}

    # Job title
    job_article_dict["job_title"] = a_job_article["aria-label"]

    # Company
    company = a_job_article.find(attrs={"data-automation": "jobCompany"})
    if company != None:
        job_article_dict["company"] = company.text

    # Location
    location = a_job_article.find(attrs={"data-automation": "jobLocation"})
    if location != None:
        job_article_dict["location"] = location.text

    # Area
    area = a_job_article.find(attrs={"data-automation": "jobArea"})
    if area != None:
        job_article_dict["area"] = area.text

    # Salary range
    salary = a_job_article.find(attrs={"data-automation": "jobSalary"})
    if salary != None:
        job_article_dict["salary_range"] = salary.text

    # Duties and tasks (optional)
    if a_job_article.find('ul') != None:
        dutiestasks_list = a_job_article.find('ul').find_all('li')
        list_of_dutiestasks = []
        if dutiestasks_list != None:
            for dutiestasks_item in dutiestasks_list:
                if dutiestasks_item != None:
                    list_of_dutiestasks.append(dutiestasks_item.text)
        job_article_dict["role_specification"] = ';'.join(list_of_dutiestasks)
    else:
        job_article_dict["role_specification"] = ""

    # Job Description
    job_description = a_job_article.find(attrs={"data-automation": "jobShortDescription"})
    if job_description != None:
        job_article_dict["job_description"] = job_description.text

    # Finally write them to the csv file
    csv_writer.writerow(job_article_dict)
That’s it with BeautifulSoup!
It does a pretty good job for our simple data extraction needs.
However, it does not come without limitations. As it is only an HTML parsing tool, it cannot handle more sophisticated web scraping needs as you navigate the site further.
What if the site you’re currently on has:
How are we supposed to web-scrape in such situations?
Obviously, BeautifulSoup doesn’t have the ability to crawl or sniff the site/s we want.
Therefore, in the cases where BeautifulSoup can’t help, that’s where Scrapy comes in!
So let’s get straight to our code sample!
As Scrapy is a fully-fledged web crawling framework, you have to get used to understanding its CLI commands and how they work before doing any sort of web scraping work.
To start it off, you open up your terminal window.
To create a new project, run:
scrapy startproject jobsearchscraper
After running it, you will see the following folder structure:
jobsearchscraper/
    scrapy.cfg
    jobsearchscraper/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
This is what it comes with when setting up Scrapy project. Like any framework, it comes with its own set of rules and system configurations, followed by pipelines and middlewares that you can leverage in building up your app.
The most important folder we should pay full attention to is the spiders folder.
Think of a creepy-crawly spider: it crawls all over the place, spreading its territory by strewing cobwebs, gaining full awareness of its surroundings so it can find and catch prey wherever it may be. With that in mind, this is precisely what web spiders in this framework are built for. The framework uses the Spider class to crawl a website (or a group of websites) so we can gather all the necessary information about them, such as making initial requests and knowing which links to follow, before we go ahead and have some serious web scraping fun.
Go to the spiders folder, create a file called job_search_spider.py, and import scrapy:
import scrapy
Following from the previous web-scraping exercise to extract all latest software developer jobs, we start off defining our variables Scrapy requires in our Spider class.
class JobSearchSpider(scrapy.Spider):
    name = "job_searches"
    allowed_domains = ["seek.com.au"]
    start_urls = ["https://www.seek.com.au/software-developer-jobs/in-All-Sydney-NSW"]
With the above, we have given our spider a unique name, job_searches, which is used whenever we want to perform web requests in preparation for scraping. We also defined which domains this project is allowed to scrape, and we list the specific URLs in the start_urls array where Scrapy begins crawling. In short, this setup means we can only crawl one specific URL under the root domain for this job_searches project, one execution at a time.
Now, we got that out of the way, let’s move on.
With our BeautifulSoup exercise earlier, we used find method calls to scan for elements that contain our data.
In Scrapy, however, we now do the following:
def parse(self, response):
    self.log('Browsing ' + response.url)
    job_articles = response.css('article')

    for job in job_articles:
        # Initialize our dictionary
        job_article_dict = {}

        # Fetch our duties and tasks in the li tags
        list_of_dutiestasks = []
        duties_list = job.css('ul li')
        for each_duty in duties_list:
            list_of_dutiestasks.append(
                each_duty.css('span[data-automation="false"]::text').extract_first())

        # Fetch our data elements
        job_article_dict['job_title'] = job.css('article::attr(aria-label)').extract_first()
        job_article_dict['company'] = job.css('a[data-automation="jobCompany"]::text').extract_first()
        job_article_dict['location'] = job.css('a[data-automation="jobLocation"]::text').extract_first()
        job_article_dict['area'] = job.css('a[data-automation="jobArea"]::text').extract_first()
        job_article_dict['salary_range'] = job.css('span[data-automation="jobSalary"] span::text').extract_first()
        job_article_dict['role_specification'] = ';'.join(list_of_dutiestasks)
        job_article_dict['job_description'] = job.css('span[data-automation="jobShortDescription"] span[data-automation="false"]::text').extract_first()

        yield job_article_dict
Compared to our BeautifulSoup exercise, it looks structurally the same, i.e. the loop through all the HTML elements that hold job article information, our item dictionary, etc. The only difference is that we replace the find method with the CSS selector API, .css(), to find and match the HTML elements we want. Thus, if you come from a front-end development background and know your CSS well, this is second nature to you; you’ll use your CSS specificity knowledge to your advantage to do this quickly. Otherwise, you can use the xpath() option, the path-expression tool for navigating and identifying nodes anywhere in an HTML/XML document, if you prefer.
Notice that, at the end of each job_article iteration, we yield our job article dictionary data, as Scrapy makes heavy use of generators when dealing with an ever-growing list of HTML pages/sites the program is going to crawl. This makes crawling more memory-efficient and faster as the volume of scraped data grows.
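The memory benefit of yield can be illustrated in plain Python: a generator hands back one item at a time instead of materialising the whole list up front — a minimal sketch (the item shape here is illustrative):

```python
import sys

def scraped_items_list(n):
    # Builds the entire list in memory before returning anything
    return [{'job_title': 'Role {0}'.format(i)} for i in range(n)]

def scraped_items_gen(n):
    # Yields one dictionary at a time; nothing is accumulated
    for i in range(n):
        yield {'job_title': 'Role {0}'.format(i)}

full_list = scraped_items_list(100000)
lazy_gen = scraped_items_gen(100000)

# The generator object stays tiny no matter how many items it will produce
print(sys.getsizeof(full_list) > sys.getsizeof(lazy_gen))  # True
```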
Now.
Here comes the fun part!
We’re going through pagination links to continue fetching more data!!
To do this, just as we’re about to exit the for loop, we inject the following code:
next_page_url = response.css('a[data-automation="page-next"]::attr(href)').extract_first()
if next_page_url:
    self.log("Next page url to navigate: " + next_page_url)
    next_page_url = response.urljoin(next_page_url)
    yield scrapy.Request(url=next_page_url, callback=self.parse)
That’s it!
This is how incredibly easy it is to set up.
To understand this, firstly, let’s revisit the pagination HTML markup.
<div>
  <p>
    <span class="_2UKqRah _2454KzL">1</span>
    <span class=""><a href="/software-developer-jobs/in-All-Sydney-NSW?page=2" rel="nofollow" class="_2UKqRah" data-automation="page-2" target="_self">2</a></span>
    <span class=""><a href="/software-developer-jobs/in-All-Sydney-NSW?page=3" rel="nofollow" class="_2UKqRah" data-automation="page-3" target="_self">3</a></span>
    <span class="K1Fdmkw"><a href="/software-developer-jobs/in-All-Sydney-NSW?page=4" rel="nofollow" class="_2UKqRah" data-automation="page-4" target="_self">4</a></span>
    <span class="K1Fdmkw"><a href="/software-developer-jobs/in-All-Sydney-NSW?page=5" rel="nofollow" class="_2UKqRah" data-automation="page-5" target="_self">5</a></span>
    <span class="K1Fdmkw"><a href="/software-developer-jobs/in-All-Sydney-NSW?page=6" rel="nofollow" class="_2UKqRah" data-automation="page-6" target="_self">6</a></span>
    <span class="K1Fdmkw"><a href="/software-developer-jobs/in-All-Sydney-NSW?page=7" rel="nofollow" class="_2UKqRah" data-automation="page-7" target="_self">7</a></span>
    <a href="/software-developer-jobs/in-All-Sydney-NSW?page=2" rel="nofollow next" class="_1XIONbW" data-automation="page-next" target="_self">Next</a>
  </p>
</div>
There’s little need for me to explain this markup; you can see it’s a typical pattern for pagination links, where each paginated link has its own a href tag.
What we’re interested in here is the a href link whose text says Next.
That’s our guy.
So how do we know this href link stands out from the rest of the hyperlinks? The answer is right in front of us: the data-automation attribute whose value is ‘page-next’. Since it’s the only unique attribute value in that entire markup (as well as the entire page), we can safely assume it’s the one we’re interested in fetching. Hence our CSS query is written as follows:
response.css('a[data-automation="page-next"]::attr(href)').extract_first()
We say: on the same page, fetch the specific element whose data-automation attribute is set to ‘page-next’, and grab its actual href value at the same time. This css call gives us a node object for that href link; we then must explicitly extract the actual href value so we can begin our page request with it.
So, as long as the fetched pagination URL is valid (i.e. if next_page_url:), we obtain its absolute URL path by calling response.urljoin(next_page_url), as the returned next_page_url is a relative URL path to begin with. We then make Scrapy initialise the web request for the paginated link and yield it until the request has been fulfilled by the server. We pass a callback method to run once the request for the paginated link is fulfilled and the expected page content is returned, handing flow control back to the parse method, as we may have other job article items in the list we wish to continue parsing/extracting. Once we’re done with parsing/extracting that page, we go to the next one and start the parsing callback again. We keep repeating this process until we either reach the end of the list or encounter an unreachable page, at which point we keep the data extracted so far.
With that, we run the following command:
scrapy runspider ./jobsearchscraper/spiders/job_search_spider.py
At the console screen, Scrapy will report to you how many items have been extracted
2018-06-10 16:42:44 [scrapy.core.engine] INFO: Closing spider (finished)
2018-06-10 16:42:44 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 12302,
 ..........
 'item_scraped_count': 513,
 ..........
 'start_time': datetime.datetime(2018, 6, 10, 6, 42, 25, 472517)}
2018-06-10 16:42:44 [scrapy.core.engine] INFO: Spider closed (finished)
And that’s it!
That’s all there is to knowing Scrapy!
Scrapy is an awesome framework for tackling all the other web browsing scenarios that BeautifulSoup can’t handle, especially dynamically generated content such as paginated links, infinite scrolling pages, or pages with detail content links.
Which is great… but you may ask: how does Scrapy handle CSV data export like we did with BeautifulSoup earlier? Does it fare any better?
Well.
Glad you asked. This is even easier to accomplish!
To achieve this, you run the following command in the terminal:
scrapy crawl job_searches -o job_searches.csv
Voila!
All done and dusted!!
Locate the CSV file you just exported, and you will find all the column headers and rows exported correctly and beautifully.
No need to write any parsing and reading/writing to files code programmatically.
No need to import any data export libraries or plugins.
Just a few simple CLI tools and away you go!
Scrapy internally includes all the most common data export formats you can imagine: XML, CSV, JSON, etc. It comes baked in with several useful middlewares that not only help you extract data but also let you take advantage of serialization techniques and backend storage systems. Thus you can build much more advanced data export functionality, should you desire. You can read more about it here.
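For instance, in more recent Scrapy releases (2.1+) the same export can be configured once in the project’s settings.py via the built-in FEEDS setting, rather than passing -o on every run — a minimal sketch (the file names and options here are illustrative):

```python
# settings.py — illustrative feed-export configuration (Scrapy 2.1+)
FEEDS = {
    'job_searches.csv': {'format': 'csv'},
    'job_searches.json': {'format': 'json', 'overwrite': True},
}
```

With this in place, a plain `scrapy crawl job_searches` would write both files on every run.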
There you have it.
Those are just the core basics of using Python’s two famous web scraping tools.
This is by no means a complete guide to getting the most out of them. My simple web scraping example is too simplistic to explore all the possible features BeautifulSoup and Scrapy offer. Moreover, those are out of scope for the intents and purposes of this post, so I will stop here.
The only limitation I’d find with them is that, as these are backend tools, they cannot scrape web content that’s heavily powered by a JS framework such as Angular/React/VueJS, since JS is, after all, the language and tooling of the front-end. You need to integrate a separate tool called Splash, a JavaScript rendering service that works with Scrapy and lets you script interactions with the front-end UI rendering logic. Interestingly, Splash scripts are written in an embedded Lua language, so expect to familiarise yourself with Lua beforehand.
I will have to cover them in future posts, should I get deeper into exploring their use cases.
While it’s great fun playing around with them, I must also stress the caveats of using them.
As I would picture-quote:
This quote comes from the famous fictitious uncle, Ben, of the fictional web-slinging hero Spider-Man, who forewarned him of the dangers of using his superpowers incorrectly (or immorally, should I say). If you do something wrong with them, there will be repercussions down the line. Sure enough, Uncle Ben died sadly due to our hero’s negligence of his moral ethics in letting a criminal get away. In the end, the hero felt horrible about his actions and learned the hard way to use his powers for the greater good, not to misuse them. He will forever live up to his moral integrity thereafter.
Ironically, this applies similarly to dev ‘heroes’ who want to crawl websites; they may be faced with similar ethical questions. Will they be web-scraping for the right reasons? Or will they be web-scraping to steal and copy data for bad intentions and purposes? Who wins and who loses in this process? Who suffers the most? Is it legal to do so? If not, do you have the legal consent to do so? Does it only encourage more anti-competitive business behaviour, etc., etc.?
All those looming ethical questions that will haunt developers for days and days.
As with any languages/frameworks/libraries that have their own ‘superpowers’ to do great things, it’s up to us developers to take grave responsibility for how we plan to use them. Are we using them for the greater good? Or are we slipping into the dark side of their limitless power?
We should tread this subject matter carefully, and seriously ask ourselves whether another business or person could be negatively impacted by this act. Will they end up like Uncle Ben, who suffered in vain? Or will they be saved from harm because we know the difference between the rights and wrongs of powerful web scraping?
Hence, we must define our web scraping needs with a purpose.
A purposeful web scraping that comes with great care (and responsibility).
Think about how many Uncle Bens out there you could save and not land yourself in your legal (and moral) hot waters unnecessarily.
You can read more about this sensitive subject here, and here, and here, and here, etc.
Till next time, Happy (and ethical) Coding!
PS: You can find my sample code on my Github account here, if you like to know more about it.
For more learning resources on web scraping, you can check the links below: