How and why I built ts2ml
About me
I am a Lead Data Scientist passionate about building models and tools.
The problem I wanted to solve
One approach to time series predictive modeling is to recast the problem as a supervised machine learning problem. This is not easy to do, because the data first has to be transformed into the right format: each training example needs a set of past observations as features and a future observation as the target.
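To make the transformation concrete, here is a minimal sliding-window sketch in plain Python. It is not the ts2ml API: the function name `make_supervised` and the parameters `n_lags` and `horizon` are hypothetical names chosen for this illustration, and a univariate series is assumed.

```python
def make_supervised(series, n_lags, horizon=1):
    """Turn a 1-D series into (X, y) pairs with a sliding window.

    Hypothetical helper for illustration only, not part of ts2ml.
    Each row of X holds `n_lags` consecutive observations; the matching
    entry of y is the value `horizon` steps after the end of that window.
    """
    values = list(series)
    n_rows = len(values) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series is too short for the requested window")
    # Features: one window of past observations per row.
    X = [values[i:i + n_lags] for i in range(n_rows)]
    # Target: the observation `horizon` steps past each window.
    y = [values[i + n_lags + horizon - 1] for i in range(n_rows)]
    return X, y


X, y = make_supervised([10, 20, 30, 40, 50, 60], n_lags=3)
# X == [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
# y == [40, 50, 60]
```

Once the data is in this shape, any regression model that accepts a feature matrix and a target vector can be trained on it.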
What is ts2ml?
ts2ml is a Python package designed to help data scientists frame time series problems as supervised machine learning problems. It provides functions and tools that make the required data transformations easy to perform.
Tech stack
The package is built entirely with Python and nbdev, a literate programming framework for building Python packages from Jupyter notebooks. I chose nbdev because it improves the development workflow: it makes it very easy to go from Jupyter notebooks to Python modules, and it gives me CI/CD pipelines with GitHub Actions, documentation, and testing for free.
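As a rough sketch of what that notebook-to-module workflow looks like, the snippet below imitates three notebook cells, assuming the nbdev 2 directive syntax (`#| default_exp` and `#| export`). The `# ---- cell ----` separators are only there to mark cell boundaries in this listing and are not part of nbdev, and `lagged_rows` is a hypothetical function, not taken from ts2ml.

```python
# ---- cell 1: name the module this notebook exports to ----
#| default_exp core

# ---- cell 2: an exported cell, copied into the generated module ----
#| export
def lagged_rows(values, n_lags):
    "Return every run of `n_lags` consecutive values as one row."
    return [list(values[i:i + n_lags]) for i in range(len(values) - n_lags + 1)]

# ---- cell 3: a plain cell, kept in the notebook only ----
# It is not exported to the module; it acts as a test and as a documented example.
assert lagged_rows([1, 2, 3, 4], n_lags=2) == [[1, 2], [2, 3], [3, 4]]
```

The appeal of this layout is that the code, its example, and its test live side by side in one place.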
The process of building ts2ml
I built ts2ml after facing the challenge of encapsulating all the steps required to frame a time series problem as a supervised machine learning problem. I ran into this problem twice: on a client project, where my team was responsible for building a forecasting model, and in an online course I took on machine learning engineering.
Challenges I faced
The challenging part was abstracting and encapsulating the functions, classes, and methods. Fortunately, nbdev took care of everything else needed to build a first-class Python package.
Key learnings
If possible, define your functions in the most abstract and general way from the very beginning of the project. This avoids rework later, when you want to use them for more general cases.
Tips and advice
I strongly suggest using nbdev to start building your own Python packages. It lets you do the hard work in a Jupyter notebook and later export it to Python modules. On top of that, it gives you, for free, the tools to start with a good CI/CD pipeline, documentation built with Quarto directly from the Jupyter notebooks, testing, release management with semantic versioning, and management of PyPI builds and uploads.
Final thoughts and next steps
It is always a good experience to automate parts of my daily job. Doing so by building and publishing a Python package has been one of the most enjoyable work experiences I have had. If you need help building your own Python package, whether from scratch or by turning a bunch of code and functions into a nice Python library you can reuse and share, please let me know! It will be a pleasure to help you!