A DevOps Journey: From SaaS to On-Premise
This article presents our journey and our thoughts on adding an on-premise option to our existing SaaS offer.
Toucan Toco’s solutions were originally built as SaaS: we run them, update them, back them up and monitor them… with everything hosted on our side. However, as the business grew, and given the sensitive nature of some data, some clients asked us to host our solutions directly on their own infrastructure to comply with their data policies. Their reasons range from “the data is too sensitive to be hosted elsewhere” to “nothing leaves our network”.
Even though our SaaS stack follows all the security and isolation guidelines, the on-premise option makes sense for critical and sensitive data. From there, we started to think about how to ship something designed for SaaS as a “ready to use” package.
Our goal was to give autonomy to the customer’s tech team.
Where to start? The first step was to rethink and challenge what we thought we knew about creating an on-premise package.
What did “on-premise” mean to us? At the very beginning, we naively believed:
- creating an on-premise offer just meant replicating in the client’s environment what we were already doing on our own infrastructure: install, update, monitor;
- we only had to list our requirements and everything would be OK… (Annie, are you OK?)
- last but not least, we would need a README/HOWTO installation guide.
So we thought: “Is that all? It looks pretty easy…” Silly us, it was obviously not that simple.
Our clients’ infrastructures can be very different from ours: different technologies, environments, workflows and methodologies. What works on our side may be impossible on theirs.
This is why we undertook a thorough, in-depth effort to build the on-premise offer.
Part of it is tech stuff…
The first main task is an obvious one: make the solution and its package “agnostic”.
What does that mean? Our solution should be independent of its environment. For example, we forbid hard-coded links or paths: our clients should be able to install the app or store its logs anywhere, not in an imposed directory.
Some people prefer to install in /data, or even in /home/app/… This should not be a problem!
There is nothing more frustrating than strict paths that force you to create symlinks to match your environment. Note: this work must be done carefully because of its potential side effects!
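To illustrate the idea, here is a minimal Python sketch of path resolution that never hard-codes a location. The environment variable names APP_HOME and APP_LOG_DIR and the /opt/app default are hypothetical, not our actual configuration keys: the point is that every path has an overridable default and is derived from a root the client chooses.

```python
import os
from pathlib import Path

# Hypothetical default and variable names, for illustration only.
DEFAULT_HOME = "/opt/app"


def resolve_paths(env=None):
    """Resolve the install and log directories without hard-coding them.

    APP_HOME overrides the install root; APP_LOG_DIR overrides the log
    directory, which otherwise lives under the install root.
    """
    env = os.environ if env is None else env
    install_dir = Path(env.get("APP_HOME", DEFAULT_HOME)).expanduser()
    log_dir = Path(env.get("APP_LOG_DIR", install_dir / "logs")).expanduser()
    return {"install_dir": install_dir, "log_dir": log_dir}


def app_path(paths, *parts):
    """Build any other file path relative to the install root, never absolute."""
    return paths["install_dir"].joinpath(*parts)
```

With APP_HOME=/data/app, logs end up in /data/app/logs unless APP_LOG_DIR says otherwise, so nobody needs a symlink to make the app fit their layout.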
List our requirements and constraints
It’s really important to define our requirements and constraints clearly: Which OSs do we support? Which versions? Do we need an Internet connection? What server sizing is required?… The aim is to make support easier and to avoid surprises for (and from) our clients.
We want to prevent the famous “Oh, I can’t do that? But you didn’t specify it anywhere!” and protect ourselves from bizarre requests like “How can I run it on Hannah Montana Linux?”
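A requirements list is easier to enforce when it is also machine-checkable. The sketch below is a hypothetical preflight check: the supported OS versions, CPU count and disk threshold are placeholders, not our published requirements. It reads the standard /etc/os-release file and reports every unmet requirement instead of stopping at the first one.

```python
import os
import shutil

# Placeholder requirements: the real supported list lives in the documentation.
SUPPORTED_OS = {
    "ubuntu": {"18.04", "20.04"},
    "debian": {"10"},
    "centos": {"7"},
    "rhel": {"7"},
}
MIN_CPUS = 2
MIN_FREE_DISK_GB = 20


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")  # drop optional quotes
    return info


def check_requirements(os_release_text, cpus, free_disk_gb):
    """Return every unmet requirement; an empty list means all checks pass."""
    problems = []
    info = parse_os_release(os_release_text)
    distro = info.get("ID", "unknown")
    version = info.get("VERSION_ID", "unknown")
    if version not in SUPPORTED_OS.get(distro, set()):
        problems.append(f"unsupported OS: {distro} {version}")
    if cpus < MIN_CPUS:
        problems.append(f"need at least {MIN_CPUS} CPUs, found {cpus}")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"need {MIN_FREE_DISK_GB} GB of free disk, found {free_disk_gb:.1f} GB"
        )
    return problems


def run_preflight(os_release_path="/etc/os-release", data_dir="/"):
    """Check the current machine and print one FAIL line per problem."""
    with open(os_release_path) as f:
        text = f.read()
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    problems = check_requirements(text, os.cpu_count() or 1, free_gb)
    for problem in problems:
        print("FAIL:", problem)
    return problems
```

On a target server, run_preflight() prints one FAIL line per problem and returns the list; an empty list means the server matches the documented requirements.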
Enable feature flags
Currently, our stack needs a web server, a MongoDB backend, supervisord, etc. Some clients need to install the full stack, but others already have processes and workflows for installing these different components and only need our app.
This is why we built feature flags into the installation configuration, so that our clients can choose which parts of our stack they need.
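As a sketch of how such flags can drive an installer, assume a JSON configuration file with an “install” section (the format and component names below are illustrative assumptions; a YAML or INI file would work the same way). Every component defaults to on, and a client who already runs their own web server and MongoDB simply switches those off.

```python
import json

# Hypothetical component names, in installation order.
COMPONENTS = ["nginx", "mongodb", "supervisord", "app"]


def plan_install(config_text):
    """Return the components to install, in order.

    Every component is enabled by default; the optional "install" section of
    the JSON config can switch individual components off (or back on).
    """
    config = json.loads(config_text)
    overrides = config.get("install", {})
    # Reject typos instead of silently ignoring them.
    unknown = set(overrides) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components in config: {sorted(unknown)}")
    flags = {name: bool(overrides.get(name, True)) for name in COMPONENTS}
    return [name for name in COMPONENTS if flags[name]]
```

Rejecting unknown keys matters: a typo such as “mongo” instead of “mongodb” should fail loudly rather than silently install everything.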
Automate the build
The package we provide to our clients is automatically created for each new release.
We modified our CI to automatically generate and test the package, then push it to a location where our clients can easily find it.
Our clients can now download the latest package, as well as all previously published ones, without asking us. Note: we think it’s important to tag our packages with version numbers, so that our clients and our support team share a common reference.
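A minimal sketch of the versioning idea, assuming a tarball-based package (the app-X.Y.Z.tar.gz naming and the X.Y.Z version format are illustrative choices, not necessarily those of our CI): the version appears both in the archive name and in a VERSION file inside it, so a client and the support team can always agree on what is installed.

```python
import io
import re
import tarfile
from pathlib import Path

# Illustrative naming scheme: app-X.Y.Z.tar.gz
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
PREFIX, SUFFIX = "app-", ".tar.gz"


def build_package(source_dir, version, out_dir):
    """Pack source_dir into out_dir/app-<version>.tar.gz with a VERSION file."""
    if not VERSION_RE.match(version):
        raise ValueError(f"version must look like X.Y.Z, got {version!r}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{PREFIX}{version}{SUFFIX}"
    root = f"{PREFIX}{version}"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Every entry lives under a versioned top-level directory.
        tar.add(str(source_dir), arcname=root)
        # Embed the version so an installed copy can always be identified.
        data = (version + "\n").encode()
        info = tarfile.TarInfo(name=f"{root}/VERSION")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return archive


def published_versions(out_dir):
    """List published versions, oldest first, sorted numerically."""
    versions = []
    for path in Path(out_dir).glob(f"{PREFIX}*{SUFFIX}"):
        version = path.name[len(PREFIX):-len(SUFFIX)]
        if VERSION_RE.match(version):
            versions.append(version)
    return sorted(versions, key=lambda v: tuple(int(n) for n in v.split(".")))
```

Sorting versions numerically rather than as strings keeps 1.10.0 after 1.9.3 when listing the packages a client can download.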
Test the build
As explained earlier, with our requirements list we commit ourselves to making our solution work in different environments.
So we must adapt existing tests and create new, specific ones to validate all these different contexts!
For example, our infrastructure runs on Ubuntu/Debian, but our solution is also supported on Red Hat/CentOS, so we created dedicated tests for those systems.
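One common way to run the same tests on several distributions is to start a throwaway container per target OS. The sketch below only builds and runs the docker commands; the image list, the OS_FAMILY variable and the /pkg/install.sh and /pkg/smoke_test.sh scripts (including their --prefix option) are hypothetical stand-ins for a real test suite.

```python
import shlex
import subprocess

# Hypothetical matrix: one throwaway container image per supported OS.
TARGETS = {
    "debian-family": ["ubuntu:20.04", "debian:10"],
    "rhel-family": ["centos:7"],
}
# Hypothetical scripts shipped inside the package under test.
TEST_SCRIPT = "/pkg/install.sh --prefix /data/app && /pkg/smoke_test.sh"


def matrix_commands(package_dir):
    """Return (image, argv) pairs: one `docker run` per target image."""
    commands = []
    for family, images in TARGETS.items():
        for image in images:
            argv = [
                "docker", "run", "--rm",
                "-v", f"{package_dir}:/pkg:ro",  # mount the package read-only
                "-e", f"OS_FAMILY={family}",  # lets tests branch per OS family
                image, "sh", "-c", TEST_SCRIPT,
            ]
            commands.append((image, argv))
    return commands


def run_matrix(package_dir):
    """Run every target and return the images whose tests failed."""
    failed = []
    for image, argv in matrix_commands(package_dir):
        print("$", shlex.join(argv))
        if subprocess.run(argv).returncode != 0:
            failed.append(image)
    return failed
```

A CI job can call run_matrix on the unpacked package and fail the build whenever the returned list is not empty.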
Simplify the monitoring
What about monitoring? It’s always hard to monitor something you don’t really know, and we can’t ask our clients to handle this part on their own.
Because monitoring is mandatory, we provide tools to monitor and understand our stack without deep knowledge of it.
Typically, our backend exposes a status page that returns KO (an HTTP 5xx status) when a component of our stack fails (a MongoDB crash, for example). This approach is truly agnostic: our clients’ IT teams only need to point their monitoring systems (Nagios, Zabbix, StatusCake…) at this page, and that’s all!
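Here is a minimal sketch of such a status page using only Python’s standard library. The /status path, the JSON body and the single MongoDB probe are assumptions for illustration; the probe is a plain TCP connection to MongoDB’s default port 27017, which only proves the port is open, so a real check would rather query the database through its driver. What matters is the contract: HTTP 200 when every component is OK, HTTP 503 as soon as one is KO.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_check(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical probes: component name -> zero-argument callable.
CHECKS = {
    "mongodb": lambda: tcp_check("localhost", 27017),
}


def build_status(checks):
    """Run every probe and return (HTTP status code, JSON body)."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "OK" if probe() else "KO"
        except Exception:  # a crashing probe counts as a failed component
            results[name] = "KO"
    healthy = all(state == "OK" for state in results.values())
    body = {"status": "OK" if healthy else "KO", "checks": results}
    return (200 if healthy else 503), json.dumps(body)


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        code, body = build_status(CHECKS)
        payload = body.encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve the status page until interrupted."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

serve() exposes the page on port 8080; an HTTP check in Nagios, Zabbix or StatusCake then only needs to alert on any non-2xx response.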
But there is also a lot of non-tech stuff…
If building the on-premise offer had been only a tech project, it would have been too simple and not fun enough :)
Improve the documentation
Documentation is probably the most important part of creating an on-premise offer.
It’s not one of our core values for nothing: (WTFM b*tch).
We should not underestimate it.
Good documentation avoids stress for our support team (endless support emails) and frustration or disappointment for our clients (when they can’t find answers by themselves). We rebuilt our online documentation from scratch: design, content, structure… We challenged everything and, thanks to great work from all the Toucan teams, finally created our current online documentation.
Now organized by user profile (app designer, IT manager or sysadmin), it lets you easily find what you need.
It’s also essential for the documentation to be attractive and clear: ugly, unwelcoming documentation becomes an obstacle and generates more interruptions and email exchanges with the support team.
The documentation is also a safeguard: when something is written down, no one can deny it and say “But I thought it was compatible with Windows NT 3!”
Finally, note that documentation is not a one-shot effort: we continuously update and improve it based on feedback and as our app evolves.
Create a support workflow
We set up a support process because we know we will always receive questions and emails: “you can only plan so much”.
Without building a complex methodology, we simply created a generic email address (with several people behind it) and started a logbook for each project.
The logbook (simply hosted on our wiki) lets everyone at Toucan know the status (Which version is installed? Is it really in production?), the history (meeting reports, email exchanges) and the key people of each on-premise project.
The generic mailbox is a common practice: it provides a single entry point while ensuring that no single person has to handle every request.
Work with the on-premise mindset
We also changed our development process to fit the on-premise approach: every feature must respect what we explained earlier about tests, feature flags, OS/environment compatibility…
We also added an on-premise section to our PR description template, so we never forget that our app, and every new feature, must work in other contexts.
Communicate with all teams
Internal communication must be flawless so that all the Toucan teams are aligned on what we do, what we could do and what we won’t do.
It’s important to keep all the teams (tech or not) up to date to avoid the famous “Great, a new project! It should work on Red Hat 4 without an Internet connection, but we have time, the deadline is in 2 weeks… easy!”
Become our first client
Finally, the best way to make sure our on-premise approach works is to become our own first client and avoid having separate SaaS-specific and on-premise-specific deployments.
One of our very short-term goals is to converge on a single way to deploy.
With a common approach, it will be simpler to improve it and make it more reliable.
A few words to finish…
As you can see, building the on-premise offer from scratch was a really big project. It required the attention of all our teams (tech or not); we changed our habits and mindset, we had to create a lot of things, and so on.
At the beginning, it seemed scary but we did it!
The last question could be: how do we know when our job is done?
There will always be things to improve, but we decided the job would be on the right track once our clients were autonomous enough to install and update our app on their own… And guess what? That’s where we are ;) I’m not bragging, but… well, actually I am!
In the last few months, 4 new Fortune 500 companies have set up the Toucan Toco software on-premise. The latest one did it almost fully autonomously, with only one back-and-forth email about a configuration option. And you could be the next one :)
Feel free to discuss and share your experiences on this topic with us!
If you want to learn a little more about my job: https://www.youtube.com/watch?v=Spl_EiL9I7A