At work, we are evaluating Docker as part of our “epic next generation deployment platform”. One of the requirements that our operations team has given us is that our containers have “identity” by virtue of their IP address. In this post, I will describe how we achieved this.
But first, let me explain that a little. Docker (as of version 0.6.3) has two networking modes: one that goes slightly further than what full virtualisation platforms would typically call “host only”, and no networking at all (if you can call that a mode!). In “host only” mode, the host (that is, the server running the container) can communicate with the software inside the container very easily. However, accessing the container from beyond the host (say, from a client – shock! horror!) isn’t possible.
As mentioned, Docker goes a little bit further by providing “port forwarding” via iptables/NAT on the host. It selects a “random” port, say 49153 and adds iptables rules such that if you attempt to access this port on the host’s IP address, you will actually reach the container. This is fine, until you stop the container and restart it. In this case, your container will get a new port. Or, if you restart it on a different host, it will get a new port AND IP address.
One way to address this is via a service discovery mechanism, whereby when the service comes up it registers itself with some kind of well-known directory and clients can discover the service’s location by looking it up in the directory. This has its own problems – not least of which is that the service inside the container has no way of knowing what the IP address of the host it’s running on is, and no way of knowing which port Docker has selected to forward to it!
So, back to the problem at hand. Our ops guys want to treat each container as a well-known service in its own right and give it a proper, routable IP address. This is very much like what a virtualisation platform would call “bridged mode” networking. In my opinion, this is a very sensible choice.
Here’s how I achieved this goal:
Defining a new virtual interface
In order to have a separate IP address from the host, we will need a new virtual interface. I chose to use a macvlan type interface so that it could have its own MAC address and therefore receive an IP by DHCP if required.
ip link add virtual0 link eth0 type macvlan mode bridge
This creates a new virtual interface called virtual0. You will need a separate interface for each container. I am using bridged mode as it means container-to-container traffic doesn’t travel across the wire to the switch/router and back.
If you need a fixed MAC address – perhaps because you want a fixed IP address and you achieve this by putting a fixed MAC-to-IP mapping in your DHCP server – you need to add the following to the end of the line:
ip link add ... address 00:11:22:33:44:55
(substitute your own MAC address!)
If you do not do this, the kernel will randomly generate a MAC for you.
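Since you need one interface per container, it can help to see the command built up programmatically. Here is a small sketch that just prints the `ip link add` command so you can eyeball it before running it as root; `make_macvlan_cmd` and its parameters are my own invention for illustration:

```shell
#!/bin/sh
# Sketch: build the "ip link add" command for a per-container macvlan
# interface. make_macvlan_cmd is a hypothetical helper; omit the MAC
# argument to let the kernel generate one randomly.
make_macvlan_cmd() {
    name=$1; parent=$2; mac=$3
    cmd="ip link add $name link $parent type macvlan mode bridge"
    # Pin the MAC only when one was supplied
    if [ -n "$mac" ]; then
        cmd="$cmd address $mac"
    fi
    echo "$cmd"
}

make_macvlan_cmd virtual0 eth0
make_macvlan_cmd virtual1 eth0 00:11:22:33:44:55
```

Piping the output to `sh` (as root) would actually create the interfaces; printing first makes it easy to sanity-check the per-container names.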
Giving the interface an IP address
If you want a fixed IP address and you’re happy to statically configure it:
ip address add 10.10.10.88/24 broadcast 10.10.10.255 dev virtual0
(substitute your own IP address, subnet mask and broadcast address)
If, however, you need to use DHCP to get an address (in a development environment, for example):
dhclient virtual0
This will start the DHCP client managing just this one interface and leave it running in the background dealing with lease renewals, etc. Once you’re ready to tear down the container and networking, you need to remember to kill the DHCP client, or, better yet:
dhclient -d -r virtual0
will locate the existing instance of the client for that interface, tell it to properly release the DHCP lease, and then exit.
Finally, bring the interface up:
ip link set virtual0 up
Find the container’s internal IP address
If you haven’t already done so, start your container as a daemon.
Run docker inspect <container id> and find the internal IP address under NetworkSettings/IPAddress.
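Scraping that field out of the JSON by hand gets old quickly. Here is a sketch that does it with grep and sed; `parse_ip` is a made-up helper name, and note that later Docker releases grew a `--format` flag on `docker inspect` that makes this unnecessary:

```shell
#!/bin/sh
# Sketch: extract NetworkSettings/IPAddress from "docker inspect" JSON
# read on stdin. parse_ip is a hypothetical helper name.
parse_ip() {
    grep '"IPAddress"' | head -n 1 \
        | sed 's/.*"IPAddress": *"\([0-9.]*\)".*/\1/'
}

# Usage (assumes a running container):
#   docker inspect <container id> | parse_ip
```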
Setting up inbound routing/NAT
Create a chain for the inbound routing rules for this container. I prefer to use a separate chain for each container as it makes cleaning up easier.
iptables -t nat -N BRIDGE-VIRTUAL0 # some unique name for the iptables chain for this container
Now, send all incoming traffic to this chain. We add a rule in PREROUTING which will match traffic coming in from outside, and a rule in OUTPUT which will match traffic generated on the host. Here, <container external IP> is the routable IP allocated statically or via DHCP to the virtual interface.
iptables -t nat -A PREROUTING -p all -d <container external IP> -j BRIDGE-VIRTUAL0
iptables -t nat -A OUTPUT -p all -d <container external IP> -j BRIDGE-VIRTUAL0
Finally, NAT all inbound traffic to the container. Here, <container internal IP> is the internal address of the container as discovered using the docker inspect command.
iptables -t nat -A BRIDGE-VIRTUAL0 -p all -j DNAT --to-destination <container internal IP>
This particular rule will forward inbound traffic for any port on the external IP to the same port on the internal IP – effectively exposing anything that the container exposes to the outside world. As an alternative, you can expose individual ports like so:
iptables -t nat -A BRIDGE-VIRTUAL0 -p tcp --dport 80 -j DNAT --to-destination <container internal IP>:8080
In this example, we forward any traffic hitting port 80 on our external IP to port 8080 on the internal IP – effectively exposing a web server on port 8080 inside the container as port 80 on the external IP. Nice tidy URLs that don’t have port numbers on. Neat, huh? You can even map multiple external ports to the same internal port by using multiple rules if you wish.
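Putting the inbound rules together, here is a dry-run sketch that prints the full rule set for one container rather than applying it; `emit_inbound_nat` and the example addresses are placeholders:

```shell
#!/bin/sh
# Sketch: emit the inbound NAT rules for one container. emit_inbound_nat
# and the sample addresses are placeholders; pipe the output to sh
# (as root) once you're happy with it.
emit_inbound_nat() {
    chain=$1; ext_ip=$2; int_ip=$3
    echo "iptables -t nat -N $chain"
    echo "iptables -t nat -A PREROUTING -p all -d $ext_ip -j $chain"
    echo "iptables -t nat -A OUTPUT -p all -d $ext_ip -j $chain"
    echo "iptables -t nat -A $chain -p all -j DNAT --to-destination $int_ip"
}

emit_inbound_nat BRIDGE-VIRTUAL0 10.10.10.88 172.17.0.2
```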
Setting up outbound NAT
What we have so far works, but if the container initiates any outbound requests, they will appear (by virtue of Docker’s default MASQUERADE rule) to come from the host’s own IP. This may be fine, depending on your circumstances, but it probably won’t play well if you want to get firewalls involved anywhere, and it might be a nuisance if you’re using netstat or similar to diagnose where all the incoming connections to a heavily loaded server are coming from.
So… we’ll add “source NAT” into the equation so that outbound connections from the container come from its own IP:
iptables -t nat -I POSTROUTING -p all -s <container internal IP> -j SNAT --to-source <container external IP>
This rule says that any traffic we’re routing outbound which has a source address of the container’s internal IP should have its source address changed to the container’s external IP.
Note that we use -I (insert) rather than -A (append) to ensure our rule goes ahead of Docker’s default MASQUERADE rule.
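Because each container gets its own chain, teardown is mostly mechanical. Here is a cleanup sketch, again as a dry run that prints the commands; `emit_teardown` and the example values mirror the placeholders above:

```shell
#!/bin/sh
# Sketch: emit the teardown commands for one container's networking.
# emit_teardown and the example values are placeholders; run the output
# as root. Release the DHCP lease first (dhclient -d -r <iface>) if the
# interface got its address by DHCP.
emit_teardown() {
    chain=$1; iface=$2; ext_ip=$3; int_ip=$4
    echo "iptables -t nat -D PREROUTING -p all -d $ext_ip -j $chain"
    echo "iptables -t nat -D OUTPUT -p all -d $ext_ip -j $chain"
    echo "iptables -t nat -D POSTROUTING -p all -s $int_ip -j SNAT --to-source $ext_ip"
    echo "iptables -t nat -F $chain"    # flush the chain...
    echo "iptables -t nat -X $chain"    # ...then delete it
    echo "ip link del $iface"
}

emit_teardown BRIDGE-VIRTUAL0 virtual0 10.10.10.88 172.17.0.2
```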
This all worked fine when I set it up at home, and then failed in the office. It turns out that it fails when you’re on a routed network (my network at home is a simple, flat one) and you try to reach your container from a different subnet. To cut a long story short, you need to relax “reverse path filtering”!
echo 2 > /proc/sys/net/ipv4/conf/eth0/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/virtual0/rp_filter
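The same change can be expressed via sysctl; a value of 2 selects the kernel’s “loose” reverse-path filtering mode. The /etc/sysctl.d path below is a conventional location for persisting it, not something this setup requires:

```shell
# Equivalent sysctl spelling of the rp_filter change above (run as root);
# 2 means "loose" reverse-path filtering.
sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.virtual0.rp_filter=2

# To make it survive a reboot:
#   printf '%s\n' 'net.ipv4.conf.eth0.rp_filter = 2' \
#                 'net.ipv4.conf.virtual0.rp_filter = 2' \
#       >> /etc/sysctl.d/99-container-rpfilter.conf
```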
There’s a lot of information in this post, but in summary what we’ve done is:
- Created a new virtual interface for the container
- Given the virtual interface an IP address either statically or using DHCP
- Used iptables to set up inbound routing to allow the container to be reached via its external/routable IP from both the host and the network beyond the host
- Optionally forwarded (and even more optionally, aliased) only specific inbound ports
- Set up source NAT to rewrite outbound traffic from the container to use the correct external IP
- Given the kernel a bit of a helping hand to understand that we’re not terrorists and it’s safe to “do the right thing” with our slightly odd-looking traffic
The first 5 steps of this took me about 20 minutes to figure out. Step 6 took me 3 days, the help of several network engineers, one extremely patient ops guy and no small amount of blind luck!
I hope having it all written down in one place will be helpful to others.