Why is it recommended to run only one process in a container?

In many blog posts, and in general opinion, there is a saying that goes "one process per container".

Why does this rule exist? Why not run ntp, nginx, uwsgi, and more processes in a single container that needs all of them in order to work?

Blog posts mentioning this rule:

Evgeny

Posted 2017-03-09T06:30:20.410

Reputation: 7 247

But - would it be okay to have a very "fat" container with dozens of processes in order to stage a rollout and operation of an enterprise server which still can't have Docker?Peter 2017-09-01T03:10:12.397

@J.Doe it will probably not be okay. Containers are different from VMs; there are multiple small problems even for a small application - for an enterprise rollout it would be a two-year project to get it all running in a container in the first place.Evgeny 2017-09-01T09:51:04.547

Answers

Let's forget the high-level architectural and philosophical arguments for a moment. While there may be some edge cases where multiple functions in a single container make sense, there are very practical reasons why you may want to consider following "one function per container" as a rule of thumb:

  • Scaling containers horizontally is much easier if the container is isolated to a single function. Need another apache container? Spin one up somewhere else (see the sketch after this list). However, if my apache container also has my DB, cron and other pieces shoehorned in, this complicates things.
  • Having a single function per container allows the container to be easily re-used for other projects or purposes.
  • It also makes it more portable and predictable for devs to pull down a component from production to troubleshoot locally rather than an entire application environment.
  • Patching/upgrades (both the OS and the application) can be done in a more isolated and controlled manner. Juggling multiple bits-and-bobs in your container not only makes for larger images, but also ties these components together. Why have to shut down application X and Y just to upgrade Z?
    • Above also holds true for code deployments and rollbacks.
  • Splitting functions out to multiple containers allows more flexibility from a security and isolation perspective. You may want (or require) services to be isolated on the network level -- either physically or within overlay networks -- to maintain a strong security posture or comply with things like PCI.
  • There are other, more minor factors, such as dealing with stdout/stderr and sending logs to the container log, keeping containers as ephemeral as possible, etc.
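
To make the scaling point concrete, here is a minimal sketch using the Docker SDK for Python; it assumes `pip install docker` and a reachable Docker daemon, and the `httpd:2.4` image and `web-{i}` names are purely illustrative. Because the image carries a single function, adding capacity is nothing more than running another copy:

```python
# Hypothetical sketch: horizontally scaling a single-function web container.
# Assumes the docker SDK for Python is installed and a Docker daemon is running.
import docker

client = docker.from_env()

# Spin up three identical web containers; a load balancer in front of them
# is assumed but not shown. Nothing about a bundled DB or cron daemon has
# to be considered, because the image only does one thing.
for i in range(3):
    client.containers.run(
        "httpd:2.4",             # single-function image: just the web server
        name=f"web-{i}",
        detach=True,
        ports={"80/tcp": None},  # let Docker pick a free host port per copy
    )
```

If the same image also bundled the database, every extra copy would either duplicate the data store or need careful configuration to point all copies at one of them, which is exactly the complication the first bullet describes.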

Note that I'm saying function, not process. That language is outdated. The official docker documentation has moved away from saying "one process" to instead recommending "one concern" per container.

Jon

Posted 2017-03-09T06:30:20.410

Reputation: 795

Concerns may not always correspond 1:1 to processes. What if I have a cron job to do something for my service? The job is not independent from the service. But I know folks who make a container for literally every unix process with no cheating. And they keep chanting the one process = one container mantra as a justification for doing this. Something seems off to me.Gherman 2019-10-30T12:43:12.807

Still, it seems the low-level argument against threads fits here... http://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf

jeffmcneill 2017-11-19T17:50:03.223

Great, comprehensive answer!Rob Wells 2019-01-19T12:58:52.073

Is the idea that the question didn't really mean 'process' in the OS sense - that docker and related writings were using a different terminology that has now been clarified by switching to the word 'function'? Because otherwise, while I acknowledge that this is the accepted and highest rated answer, I don't think it answers the question that was asked.Tom 2019-04-30T19:27:58.127

Having slain a "two processes" container a few days ago, there were some pain points for me which caused me to use two containers instead of a python script which started two processes:

  1. Docker is good at recognizing crashed containers. It can't do that when the main process looks fine but some other process died a gruesome death. Sure, you can monitor your process manually, but why reimplement that? (See the sketch after this list.)
  2. docker logs gets a lot less useful when multiple processes are spewing their logs to the console. Again, you can write the process name to the logs, but docker can do that, too.
  3. Testing and reasoning about a container gets a lot harder.
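
As an illustration of point 1, here is a hedged sketch of the kind of "python script which started two processes" described above; the `nginx` and `uwsgi` commands are just example workloads assumed to exist in the image. As long as this wrapper stays alive, Docker considers the container healthy, even after one of the children has died:

```python
#!/usr/bin/env python3
# Hypothetical wrapper that starts two processes inside one container.
# From Docker's point of view, only this script (PID 1) matters: if a child
# dies, the container still looks fine unless the wrapper exits itself.
import subprocess
import time

children = [
    subprocess.Popen(["nginx", "-g", "daemon off;"]),
    subprocess.Popen(["uwsgi", "--ini", "app.ini"]),
]

while True:
    for child in children:
        if child.poll() is not None:
            # A child has died; unless we exit here, Docker never notices
            # and no restart policy kicks in.
            print(f"child {child.args} exited with code {child.returncode}")
    time.sleep(5)
```

With one process per container, none of this monitoring code is needed: the process is PID 1, a crash is immediately visible to `docker ps`, and any restart policy can react to it.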

Christian Sauer

Posted 2017-03-09T06:30:20.410

Reputation: 391

This and the answer by Jon that was accepted are the combined right answer. I finally truly get why one process or function or PID is the right thing to do. Theory above and practice here. I would not have bought in without both of them together. Thank you both.Michael McGarrah 2019-12-01T01:00:15.007

This should be the accepted answer.ClintM 2018-01-15T19:07:10.707

Agreed. While there are some other answers with some great points, the key point is about docker's handling of PID 1.Brett Wagner 2018-03-06T17:09:04.173

The recommendation comes from the goal and design of operating-system-level virtualization.

Containers have been designed to isolate a process from others by giving it its own userspace and filesystem.
This is the logical evolution of chroot, which provided an isolated filesystem; the next step was isolating processes from each other to avoid memory overwrites and to allow multiple processes to use the same resource (e.g. TCP port 8080) without conflicts.

The main interest of a container is to package the libraries the process needs without worrying about version conflicts. If you run multiple processes needing two versions of the same library in the same userspace and filesystem, you'd have to tweak at least LD_LIBRARY_PATH for each process so the proper library is found first, and some libraries can't be tweaked this way because their path is hard-coded into the executable at compilation time; see this SO question for more details.
At the network level you'll have to configure each process to avoid using the same ports.
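
A hedged sketch of that per-process tweaking, with made-up binary names and library paths: each process has to be launched with its own LD_LIBRARY_PATH so that its version of the shared library is found first.

```python
# Hypothetical example: two processes in the same filesystem, each needing a
# different version of the same shared library. Paths and commands are
# invented for illustration only.
import os
import subprocess

base_env = os.environ.copy()

proc_a = subprocess.Popen(
    ["service-a"],
    env={**base_env, "LD_LIBRARY_PATH": "/opt/libfoo-1/lib"},  # built against libfoo 1.x
)
proc_b = subprocess.Popen(
    ["service-b"],
    env={**base_env, "LD_LIBRARY_PATH": "/opt/libfoo-2/lib"},  # built against libfoo 2.x
)

proc_a.wait()
proc_b.wait()
```

With one process per container, each image simply ships the library version it needs and no such juggling is required.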

Running multiple processes in the same container requires some heavy tweaking and, at the end of the day, defeats the purpose of isolation. If you are okay with running multiple processes within the same userspace, sharing the same filesystem and network resources, then why not run them on the host itself?

Here is a non-exhaustive list of the heavy tweaking/pitfalls I can think of:

  • Handling the logs

    Whether it's done with a mounted volume or interleaved on stdout, this brings some management overhead. If using a mounted volume, your container should have its own "place" on the host, or two identical containers will fight for the same resource. When interleaving on stdout to take advantage of docker logs, it can become a nightmare to analyze if the sources can't be identified easily.

  • Beware of zombie processes

    If one of the processes in your container crashes, supervisord may not be able to clean up its children left in a zombie state, and the host init will never inherit them. Once you've exhausted the number of available pids (2^22, so roughly 4 million), a bunch of things will fail. (See the sketch after this list.)

  • Separation of concerns

    If you run two separate things, like an apache server and logstash, within the same container, that may ease log handling, but you have to shut down apache to update logstash. (In reality, you should use Docker's logging driver.) Will it be a graceful stop that waits for the current sessions to end, or not? If it's a graceful stop, it may take some time and make rolling out the new version slow. If you do a kill, you'll impact users just for a log shipper, and that should be avoided IMHO.
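
To illustrate the zombie problem, here is a minimal, hypothetical sketch of what an init process inside a container has to do: keep reaping children that other processes have abandoned. In practice you would use something like tini or `docker run --init` rather than a hand-rolled script like this.

```python
#!/usr/bin/env python3
# Minimal PID 1 sketch (Python 3.9+): fork the real workload, then keep
# reaping every child that terminates so no zombies accumulate.
import os
import sys

def main() -> int:
    # Start the actual workload, e.g. `python init_sketch.py uwsgi --ini app.ini`.
    main_pid = os.fork()
    if main_pid == 0:
        os.execvp(sys.argv[1], sys.argv[1:])

    # Reap every terminated child; exit when the main workload exits.
    while True:
        finished_pid, status = os.wait()
        if finished_pid == main_pid:
            return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    sys.exit(main())
```

If whatever runs as PID 1 never performs this reaping loop, orphaned children stay in the zombie state, and the PID exhaustion described above eventually follows.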

Finally, when you have multiple processes you're reproducing an OS, and in that case using hardware virtualization sounds more in line with this need.

Tensibai

Posted 2017-03-09T06:30:20.410

Reputation: 9 733

I find these arguments unconvincing. There is a huge difference between a container with multiple processes and running on the host. While explaining the original intention of containers is somewhat relevant, it's not really a compelling reason to avoid multi-process containers. IOW, you're answering "why not" with "why yes", which isn't as helpful as it could be. It can be very convenient to run multiple processes in the same container - that's the why yes. The why not remains to be explained.Assaf Lavie 2017-03-09T13:51:05.263

@AssafLavie Why doesn't "Running multiple processes in the same container requires some heavy tweaking and at the end of the day defeats the purpose of isolation" explain the "why not"? I don't really get what you find missing.Tensibai 2017-03-09T14:05:08.210

You haven't elaborated on the kind of tweaking you had in mind. And you haven't made the case that this tweaking is more work than setting up multiple containers. Let's take a concrete example: you often see packaged docker images that have supervisord running some main process and some auxiliary process. This is very easy to set up; arguably just as easy as separating the containers, e.g. an app & a log shipper. So the onus is on your part, I believe, to argue why this isn't the case.Assaf Lavie 2017-03-09T15:58:19.450

BTW, I do believe there are valid arguments against multi-process containers, but you did not mention any of them. In any case, it's far from being a clear-cut case. In some instances it's perfectly acceptable to allow more than one process. Heck, some very popular images spawn several sub-processes - is that evil as well? What I'm saying is that there are trade-offs, and your answer paints a one-sided picture that lacks nuance and detail.Assaf Lavie 2017-03-09T16:00:13.390

@AssafLavie Fair enough, supervisord is the heavy tweaking in itself; the fact that it is pre-baked doesn't change that it's a twist on the original goal. But I see your point (and agree it's not a binary choice). I was more focused on why it should be avoided; I'll try to bring some more references when I have time.Tensibai 2017-03-09T16:03:21.493

@AssafLavie I've expanded my point of view, is it better?Tensibai 2017-03-10T11:12:50.953

@Pierre.Vriens No, I've no idea who it is from and I don't really care, to be honest. If someone finds this not useful they're allowed to; I'll live with it :) But sure, a reason for the downvote would be great, just to know the why - whether it's because of the wording or just because the person disagrees. For the wording I can try to express it differently; if it's disagreement, well, end of talk ;)Tensibai 2017-03-13T15:16:37.057

interesting ... It sounds like we have similar (identical) opinions on this. Maybe you should just ignore it in this case, because it was from somebody who wanted to earn the Critic badge ... and decided to abuse your answer to get that badge ...

Pierre.Vriens 2017-03-13T15:55:07.027

@Pierre.Vriens Don't rush to conclusions. Jon's answer is clearly in favor of using "concern" instead of "process", but this doesn't mean the downvoter downvoted this answer for a badge because they prefer the concern approach; they may have downvoted something else in this timeframe, or maybe they think this answer is not useful, or there is another reason. Truth is, you can't read minds :) And I'm against early conclusions (even if I fall into this trap sometimes)Tensibai 2017-03-13T20:32:49.380

1I don't "rush" to conclusion ... I just recommend you to ignore it. But "you" cannot change my mind on what I have seen with my own eyes about who the anonymous downvoter of your answer is. Anyway, time to move on ...Pierre.Vriens 2017-03-13T20:49:41.873

As in most cases, it's not all-or-nothing. The guidance of "one process per container" stems from the idea that containers should serve a distinct purpose. For example, a container should not be both a web application and a Redis server.

There are cases where it makes sense to run multiple processes in a single container, as long as both processes support a single, modular function.

Dave Swersky

Posted 2017-03-09T06:30:20.410

Reputation: 3 573

What I'll call a service here is a process: 1 container ~ 1 service. If any of my services fails, I only spin up that respective container, and within seconds everything is up again; so there won't be any dependencies between services. It is best practice to keep your container size under 200 MB, and at most 500 MB (Windows-native containers are an exception at more than 2 GB); otherwise it's going to be similar to a virtual machine - not exactly, but performance suffers. Also take into consideration a few parameters such as scaling, how to make my services resilient, auto-deploy, etc.

And it's purely your call how you shape your architectural patterns - like microservices in a polyglot environment - using the container technology that best suits your environment and will automate things for you.

mohan08p

Posted 2017-03-09T06:30:20.410

Reputation: 285