Monday, August 5, 2019

PSF GSoC students blogs: Week-10: Caching in the pipeline

Hello folks,

This week was pretty tough. The seemingly simple task of setting up the docs CD pipeline to execute notebooks ate up the entire week, thanks to the complexities of DevOps! 🙄

What did I do this week?

  1. To execute our notebooks from the docs CD pipeline, we need to make the cached filter data available on the VM as a pre-build step. So I decided to store it on Azure as an artifact (a Universal Package) so that each time the pipeline runs, it can instantly download the artifact onto the VM and then update it before using it. Since the data gets updated, I decided to also publish it back to the artifact feed, so that next time we get more up-to-date data (see the sketch after this list).
    • For the first build of such a pipeline, I needed to make sure the artifact was already present in the feed - which meant publishing it not from the pipeline but from the Azure CLI, locally. Authentication in the Azure CLI was really cumbersome, but I figured out how to use a PAT for it and then published the filter data as an artifact.
    • Then I wrote script steps in the pipeline to download the artifact and place it in the right directory. The challenge here was making the pipeline download the latest versioned artifact from the feed. After some mind-boggling research, I found how to achieve it with the Azure REST API, but the problem was again authentication while calling the API. When I tried to solve it the next day with a clear head, I found the solution sitting right in front of my eyes in an already open SO thread I had been overlooking!
    • I wrote a script to conditionally publish the filter data as a newer versioned artifact if it got updated. Here I needed a Python command that calls the update function and returns the update status back to the bash script. But because a logger was enabled in that function, the returned value got entirely messed up - I fixed it by disabling the logger.
    • I also improved the versioning of artifacts by using the date as the version, but it conflicted with the SemVer format Azure accepts, so it again took time to massage the date into an acceptable version.
  2. Next, I needed to make sure that the executed notebooks give the right outputs.
    • The matplotlib plots didn't appear in the rendered notebook. After some searching and digging, I found it was because we were plotting graphs interactively using the %pylab notebook magic. By switching to the %matplotlib inline magic, which works non-interactively, I made the plots appear (see the second sketch after this list).
    • I also cleaned some unnecessary data from the quickstart notebook and made the documentation clearer.
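
To make the caching step in point 1 concrete, here is a minimal sketch of the download/update/publish cycle as a Python helper that a pipeline script step could call. The organization URL, feed and package names, data directory and the update_filter_data() stub are all hypothetical placeholders, and it assumes the azure-devops CLI extension (az artifacts universal download/publish) is available on the build VM; the real pipeline may wire this up differently.

```python
# Sketch of the pipeline's pre-build caching step (all names are hypothetical).
import subprocess
from datetime import date

ORG = "https://dev.azure.com/my-org"   # organization URL (placeholder)
FEED = "filter-data-feed"              # artifact feed (placeholder)
PACKAGE = "cached-filter-data"         # Universal Package name (placeholder)
DATA_DIR = "filter_data"               # where the notebooks expect the cached data


def az(*args):
    """Run an Azure CLI command, assuming the azure-devops extension is installed."""
    subprocess.run(["az", *args], check=True)


def download(version):
    """Pre-build step: fetch the cached filter data onto the VM."""
    az("artifacts", "universal", "download",
       "--organization", ORG, "--feed", FEED,
       "--name", PACKAGE, "--version", version, "--path", DATA_DIR)


def publish(version):
    """Post-update step: push the refreshed data back to the feed."""
    az("artifacts", "universal", "publish",
       "--organization", ORG, "--feed", FEED,
       "--name", PACKAGE, "--version", version,
       "--description", "Cached filter data for the docs notebooks",
       "--path", DATA_DIR)


def date_version():
    """Date as a SemVer-acceptable version: three numeric parts, no leading zeros."""
    today = date.today()
    return f"{today.year}.{today.month}.{today.day}"   # e.g. 2019.8.5


def update_filter_data():
    """Placeholder for the project's real update routine; True if anything changed."""
    return False


if __name__ == "__main__":
    # The latest available version is looked up via the REST API (sketched further down).
    download(version="2019.8.5")
    if update_filter_data():
        publish(version=date_version())
```

The date juggling exists because a raw date such as 2019-08-05 or 2019.08.05 is not valid SemVer (wrong shape or leading zeros), which is why the sketch strips it down to plain MAJOR.MINOR.PATCH numbers.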
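
And for the plot fix in point 2, a notebook cell along these lines (the data here is made up) produces a static image that survives non-interactive execution, whereas the interactive backend set up by %pylab notebook leaves no output in the rendered page:

```python
# First notebook cell: use the non-interactive inline backend
# instead of the interactive `%pylab notebook` magic.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Made-up filter-like curve, just to show the plot now renders as an image.
wavelength = np.linspace(3000, 9000, 500)
transmission = np.exp(-((wavelength - 5500) / 800) ** 2)
plt.plot(wavelength, transmission)
plt.xlabel("Wavelength (Angstrom)")
plt.ylabel("Transmission")
```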

 

What is coming up next?

There's still a problem with the pipeline: it fails for PR builds, although it works fine on my fork. I will try to fix that, and then we will possibly move on to starkit, where we can integrate it with wsynphot (on which I am currently working) to produce an interface for calculating photometry.

 

Did I get stuck anywhere?

Yes, it was these unexpected problems that made finishing off this task of getting the pipeline to execute the notebooks take an entire week! But I eventually solved all of them, except the failing PR builds.

 

What was something new I learned?

⚙️ This week I learned many new things about Azure DevOps, like:

  • The Azure CLI, and how to use and authorize it to manage Azure resources from another (local) system
  • The Azure REST API, a really powerful API that lets you create/retrieve/update/delete Azure resources
  • System.AccessToken, a special predefined variable that is used as an OAuth token to access the REST API (sketched below)
  • Unlike the powershell task, which runs Windows PowerShell, we can use the pwsh task on Linux VMs since it runs PowerShell Core
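
To illustrate the middle two points, here is roughly how a pipeline script step can ask the feed for the latest version of the cached-data package, authenticating with the build's System.AccessToken. The endpoint path, api-version, query parameter and response shape are written from memory, and the organization/feed/package names are placeholders, so double-check everything against the Azure DevOps REST API reference before relying on it:

```python
import os
import requests

# The pipeline must map the predefined variable into the step's environment, e.g.
#   env:
#     SYSTEM_ACCESSTOKEN: $(System.AccessToken)
token = os.environ["SYSTEM_ACCESSTOKEN"]

ORG = "my-org"                  # placeholder organization
FEED = "filter-data-feed"       # placeholder feed
PACKAGE = "cached-filter-data"  # placeholder package

# Package-listing endpoint of the Artifacts REST API (path and api-version assumed).
url = (f"https://feeds.dev.azure.com/{ORG}/_apis/packaging/feeds/{FEED}/packages"
       f"?packageNameQuery={PACKAGE}&api-version=5.1-preview.1")

response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

# Assumed response shape: each package entry lists its versions with an isLatest flag.
package = response.json()["value"][0]
latest = next(v["version"] for v in package["versions"] if v.get("isLatest"))
print(latest)  # the only thing on stdout, so a bash step can capture it with $(...)
```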

🔡 While passing a value from a child process (a Python script) to the parent shell (bash), make sure that the only thing you write to stdout is the value you want to pass. This means checking that no function calls in your script produce print or logging output, other than the single print of the value being passed.
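
For example, a child script shaped like this hypothetical update_data.py keeps stdout reserved for the single value being handed back and routes logging to stderr, so a bash step can safely capture it with UPDATED=$(python update_data.py):

```python
# update_data.py -- hypothetical child script called from the pipeline's bash step.
import logging
import sys

# Route log output to stderr (or disable it entirely); anything that lands on
# stdout becomes part of the value the parent shell captures.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger(__name__)


def update_filter_data():
    """Placeholder for the real update routine; returns True if the data changed."""
    log.info("Checking the remote source for new filter data...")
    return True


if __name__ == "__main__":
    updated = update_filter_data()
    # The one and only print to stdout -- this is what bash receives.
    print(updated)
```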

🧐 Openness to strange options while researching: when we search the internet for the solution to a problem, a lot of new and weird information comes up, which we skim just enough to decide it is not for our case. But if we try to understand it anyway, we may figure out through experimenting how to make it work for our case. The same thing happened when I was looking for how to authorize my build for the Azure API call. On an SO thread I found a PowerShell script for it, but I didn't bother to understand it, thinking that a PowerShell script couldn't be of any use on a Linux VM. When I eventually found the solution (System.AccessToken and pwsh for Linux), I realized that all this time the answer had been right in front of my eyes while I searched here and there, just because I didn't care to spend a few minutes understanding it!

 


Thank you for reading. Stay tuned to learn about more interesting things with me!



