The Problem With Configurations

On April 3rd and 4th, 2015, the second DevopsDays conference will be held in Ljubljana, Slovenia. Avishai is coming to DevopsDays Ljubljana to speak for the second time, and this time around he will talk about what "work" is. We asked him to write a bit about something he cares about, and of the many topics he could cover, he picked configuration. If you're into systems or app development, you probably know how hard it is to get configuration right. Read on for Avishai's write-up on how configuration is done today and how it could be done better.


Configuration is one of those things we usually take for granted. Yet as more and more configuration parameters are added, using them becomes hard and complicated. With enough parameters, keeping track of what the current configuration “is” can be quite problematic. People often misconfigure systems for various reasons, such as poor documentation or unexpected interactions between parameters. Sometimes the configuration itself is complex enough to require a language to express it, raising the bar of knowledge needed to define it.

Despite their “boring” nature, configurations can and do obliterate entire companies in a matter of hours.

Yet another config format

If you take a quick look in your /etc/ directory, you will find a pile of configuration files in a large variety of formats: key/value shell include files, ini, toml, json, yaml and xml, and those are the relatively standard ones. If you care to look at sudoers, apt.conf, apache or postfix, for example, you will find even weirder formats.

Unfortunately, we humans need to remember the syntax of each format, not to mention the other peculiarities of each program’s configuration. For example, there is no standard for config file validation, and some programs don’t validate their configuration at all!

The vast variety of config formats makes system administration harder. It is hard to keep track of them all and maintain a decent level of skill in each, and some formats are very hard to generate from configuration management (CM) tools and scripts, so we must resort to error-prone text templates.
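Formats that have a standard serializer sidestep the template problem. As a minimal sketch, assuming the CM side already holds its settings as nested data (the `SETTINGS` values and the function names are hypothetical), Python’s standard `json` and `configparser` modules can emit both formats from the same structure without any hand-written template:

```python
import configparser
import io
import json

# Hypothetical settings, as a CM tool might hold them internally.
SETTINGS = {
    "server": {"listen": "0.0.0.0", "port": 8080},
    "log": {"level": "info"},
}


def to_json(settings: dict) -> str:
    """Render settings as JSON; sorted keys keep the output stable across runs."""
    return json.dumps(settings, indent=2, sort_keys=True)


def to_ini(settings: dict) -> str:
    """Render a two-level dict as INI sections using the standard library."""
    # interpolation=None so values containing '%' are written verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    for section, values in settings.items():
        # INI values are plain strings, so convert each one explicitly.
        parser[section] = {key: str(value) for key, value in values.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(SETTINGS))
    print(to_ini(SETTINGS))
```

The formats that lack such a library are the ones that push us back toward text templates.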

Configuration drop bomb

The conf.d pattern emerged many years ago as a way to let different modules and packages inject configuration snippets into other programs. In the absence of CM tools, it also helped humans split large configurations to make them easier to manage.

This pattern is generally considered good practice, yet it causes a surprising amount of difficulty in maintaining and automating systems. Since configurations are merged, values can be overridden by other files and are sometimes merged in unexpected ways. You change a configuration parameter, yet the program behaves as if you changed nothing; you grep through the directory and find the value has been set in a different file, one you do not control. You must reason about the merge order and move your file to a higher priority, possibly overriding other parameters… and now you need to debug again.
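To make the override problem concrete, here is a minimal sketch of a conf.d merge. It assumes (hypothetically, since each program defines its own rules) simple `key = value` snippets loaded in lexical filename order, where the last value wins. Recording which file supplied each effective value answers the “why didn’t my change take effect?” question directly:

```python
from pathlib import Path


def parse_kv(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_conf_d(files: dict) -> dict:
    """Merge {filename: text} in lexical filename order; the last file wins.

    Returns {key: (value, filename)} so each effective value carries the
    name of the file that set it.
    """
    merged = {}
    for name in sorted(files):
        for key, value in parse_kv(files[name]).items():
            merged[key] = (value, name)
    return merged


def read_conf_d(directory: str, pattern: str = "*.conf") -> dict:
    """Load every matching snippet in a conf.d directory as {filename: text}."""
    return {p.name: p.read_text() for p in Path(directory).glob(pattern)}


if __name__ == "__main__":
    files = {
        "10-defaults.conf": "max_connections = 100\ntimeout = 30",
        "50-mine.conf": "max_connections = 500",
        "90-vendor.conf": "max_connections = 200",
    }
    # Your 500 is silently overridden by a file you do not control.
    print(merge_conf_d(files)["max_connections"])  # ('200', '90-vendor.conf')
```

Real programs differ in parsing, ordering and whether later values replace or append, so check each program’s documentation before trusting any model like this one.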

When using CM tools this pattern is particularly annoying: if your CM tool manages files in the conf.d directory and you later remove those file resources, the configuration files are not removed; they are simply no longer managed. Moreover, the pattern lets anyone override the automated configuration by dropping files into conf.d manually, circumventing our effort to standardise configuration across servers.
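A CM run can at least detect both failure modes by comparing the directory contents with the set of files it declares. This is a tool-agnostic sketch, where `managed` is a hypothetical set of file names exported from your CM code:

```python
from pathlib import Path


def find_unmanaged(present, managed) -> list:
    """Return file names that exist in conf.d but are not declared in CM."""
    return sorted(set(present) - set(managed))


def purge_unmanaged(conf_dir: str, managed: set, dry_run: bool = True) -> list:
    """Report, and optionally delete, conf.d files the CM tool does not manage."""
    directory = Path(conf_dir)
    present = [p.name for p in directory.iterdir() if p.is_file()]
    stray = find_unmanaged(present, managed)
    if not dry_run:
        # Only delete once the dry-run report has been reviewed.
        for name in stray:
            (directory / name).unlink()
    return stray
```

Running it with `dry_run=True` on every CM run surfaces both leftovers from removed resources and files dropped in by hand.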

Dude, did you restart just now?

Traditionally, a config reload is triggered by sending a POSIX signal to the process, usually HUP or USR1. The problem is that signals have no output, so there is no obvious feedback telling us whether the reload succeeded. Perhaps the config file is malformed or we have set some parameter to an invalid value; perhaps the signal was blocked or the program couldn’t handle it. We simply don’t know until we start digging through log files, and even then we can’t be sure, since the absence of a “reload” message in the log doesn’t actually mean the reload didn’t happen. On top of that, such checks are extremely difficult to automate, so in most cases we simply give up and assume the configuration made it into our program, or take the brutal approach and restart the process needlessly.
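A small wrapper can supply the missing feedback, but only if the program exposes some way to read back what it loaded. The sketch below assumes such a hook exists: `loaded_checksum` is a hypothetical callable (for example, one that reads a status page or file the program updates after each reload) returning a SHA-256 of the loaded configuration computed the same way as `file_checksum`. The wrapper sends the signal, then waits with a timeout until that checksum matches the file on disk:

```python
import hashlib
import os
import signal
import time
from typing import Callable, Optional


def file_checksum(path: str) -> str:
    """SHA-256 of the config file on disk: the state we expect to be loaded."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def send_sighup(pid: int) -> Callable[[], None]:
    """Build a trigger that asks process `pid` to reload via SIGHUP (POSIX only)."""
    return lambda: os.kill(pid, signal.SIGHUP)


def reload_and_verify(
    trigger: Callable[[], None],
    loaded_checksum: Callable[[], Optional[str]],
    expected: str,
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Trigger a reload, then poll until the program reports the expected checksum.

    Returns False on timeout instead of silently assuming success.
    """
    trigger()
    deadline = time.monotonic() + timeout
    while True:
        if loaded_checksum() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call would look like `reload_and_verify(send_sighup(pid), read_status, file_checksum(path))`, where `pid`, `read_status` and `path` are placeholders. The timeout matters: a program that ignored the signal shows up as a failed reload rather than as silence.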

When in doubt, nuke from orbit

Some parameters require a restart to change and some only require a reload. CM tools have no way to tell which parameters changed in your configuration file and whether a reload is sufficient to activate the changes. As a result, we are forced to always use the nuclear option: restart. Unfortunately, sometimes even nukes are not enough, such as when changing the MySQL InnoDB ibdata file size, which requires a stop, maintenance, start cycle. This forces us to resort to “nuke from orbit” methods, tearing down compute instances to support a configuration change.
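The missing piece is a mapping from parameters to the action they need. As a minimal sketch, assuming you maintain that mapping yourself (the key names below are hypothetical), a CM hook can diff the old and new values and pick the lightest sufficient action:

```python
# Hypothetical list of parameters this program can apply on reload;
# the real list has to come from the program's documentation.
RELOADABLE_KEYS = {"log_level", "max_connections"}


def changed_keys(old: dict, new: dict) -> set:
    """Keys that were added, removed or given a different value."""
    return {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}


def required_action(old: dict, new: dict) -> str:
    """Return 'none', 'reload' or 'restart' for moving from old to new config."""
    changed = changed_keys(old, new)
    if not changed:
        return "none"
    # Keys not known to be reloadable default to a restart, the safe choice.
    if changed <= RELOADABLE_KEYS:
        return "reload"
    return "restart"
```

Defaulting unknown keys to a restart keeps the worst case the same as today. Cases like the InnoDB example would need a further category that no diff can handle automatically.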

I have no idea what’s going on

Just because the config file contains value X doesn’t mean that’s what is loaded in the process. Perhaps the file was changed without reloading the configuration, or the configuration path is wrong (this is actually very common). So how do you know what configuration is loaded in your server? How do you validate that all servers are configured properly? Most programs do not provide a good mechanism for this.
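Answering the first question means comparing what the file says with what the process actually holds. Assuming you can get a dump of the loaded values (many programs cannot provide one, which is what the principles that follow address), a key-by-key diff makes the mismatch explicit. A minimal sketch with hypothetical data:

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {'a.b.c': value} so leaves can be compared directly."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def config_diff(on_disk: dict, loaded: dict) -> dict:
    """Map each differing key to (value_on_disk, value_loaded).

    None stands for a key that is missing on one side in this sketch.
    """
    disk, live = flatten(on_disk), flatten(loaded)
    return {
        key: (disk.get(key), live.get(key))
        for key in sorted(disk.keys() | live.keys())
        if disk.get(key) != live.get(key)
    }


if __name__ == "__main__":
    on_disk = {"db": {"pool_size": 20}, "log": {"level": "debug"}}
    loaded = {"db": {"pool_size": 10}, "log": {"level": "debug"}}
    print(config_diff(on_disk, loaded))  # {'db.pool_size': (20, 10)}
```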

A better way

Like most operational aspects of programs, configuration issues can and should be resolved by grassroots engineering work rather than after-the-fact makeshift solutions. A good example of an attempt to tackle this at the core is the Netflix Archaius project, and many others have followed suit.

There are several simple design principles that can make your program’s configuration much easier to work with. To some degree, you can even apply these principles to third-party programs using CM tools:

  1. Separate the configuration into 2-3 files based on the impact of changing a parameter: the 1st file contains parameters that require a restart to change, the 2nd contains parameters that only require a reload, and so on.
  2. Avoid the `conf.d` pattern. Instead, have your CM tools merge values and create a small number of config files, making debugging and validation easier.
  3. Create a configuration API. If using REST, the GET method should return a complete dump of the configuration parameters with an ETag header, and HEAD should return the ETag header without a body. The ETag should be a checksum of the configuration in canonical order, allowing easy comparison between the in-memory configuration of all nodes and the reference version in CM.
  4. If possible, use the REST configuration API to reload configurations using POST or PUT requests. This allows your CM tools to easily validate that the configuration was successfully loaded and whether any values were updated (200/201/202 response). If that is not possible, write a small reload wrapper that verifies the configuration was reloaded using whatever feedback the program provides.
  5. When writing new programs, choose a serialized format for configuration, such as json, yaml or edn. Although these are not particularly comfortable for users to work with directly, remember that with CM tools and simple utilities people can work in whatever format they are comfortable with, as long as a conversion utility exists.
  6. Some programs require advanced configuration employing logic (e.g. logstash) that doesn’t map easily to serialized formats. For these, treat the configuration as a plugin and extract variables to an external configuration file.
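Principles 3 and 4 can be sketched with nothing but Python’s standard library. This is a minimal illustration under stated assumptions, not a reference design: `/config` is a hypothetical path, `CONFIG` stands in for the program’s in-memory configuration, and `reload_config` is a hypothetical hook that would re-read and validate the file. GET returns the full dump with an ETag, HEAD returns only the headers, and POST reloads and then reports the new ETag, or an error status if the reload failed:

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory configuration of the running program.
CONFIG = {"log": {"level": "info"}, "server": {"port": 8080}}


def etag_for(config: dict) -> str:
    """Checksum of the config serialized in canonical (sorted-key) order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest() + '"'


def reload_config() -> dict:
    """Hypothetical hook: re-read and validate the config file; raise on error."""
    return CONFIG


def respond(method: str):
    """Return (status, headers, body) for a request to /config."""
    global CONFIG
    if method == "POST":
        try:
            CONFIG = reload_config()
        except Exception as exc:  # report reload failures instead of hiding them
            return 500, {"Content-Type": "text/plain"}, str(exc).encode("utf-8")
    body = json.dumps(CONFIG, sort_keys=True).encode("utf-8")
    headers = {"Content-Type": "application/json", "ETag": etag_for(CONFIG)}
    if method == "HEAD":
        return 200, headers, b""
    return 200, headers, body


class ConfigHandler(BaseHTTPRequestHandler):
    def _send(self, method: str) -> None:
        if self.path != "/config":
            self.send_error(404)
            return
        status, headers, body = respond(method)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        if method != "HEAD":
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._send("GET")

    def do_HEAD(self):
        self._send("HEAD")

    def do_POST(self):
        self._send("POST")


def serve(host: str = "127.0.0.1", port: int = 8081) -> None:
    """Run the sketch; the address and port are arbitrary choices."""
    HTTPServer((host, port), ConfigHandler).serve_forever()
```

With `serve()` running, `curl -I http://127.0.0.1:8081/config` shows the ETag. A CM tool can compare it against a checksum computed the same way over its reference copy, and after a POST an unchanged ETag means no values changed. Returning 500 on a failed reload is one choice among several; what matters is that the caller gets an answer.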

