At work, we are evaluating Docker as part of our “epic next generation deployment platform”. One of the requirements that our operations team has given us is that our containers have “identity” by virtue of their IP address. In this post, I will describe how we achieved this.
But first, let me explain that a little. Docker (as of version 0.6.3) has two networking modes: one which goes slightly further than what full virtualisation platforms would typically call “host only”, and no networking at all (if you can call that a mode!) In “host only” mode, the “host” (that is, the server running the container) can communicate with the software inside the container very easily. However, accessing the container from beyond the host (say, from a client – shock! horror!) isn’t possible.
As mentioned, Docker goes a little bit further by providing “port forwarding” via iptables/NAT on the host. It selects a “random” port, say 49153 and adds iptables rules such that if you attempt to access this port on the host’s IP address, you will actually reach the container. This is fine, until you stop the container and restart it. In this case, your container will get a new port. Or, if you restart it on a different host, it will get a new port AND IP address.
One way to address this is via a service discovery mechanism, whereby the service registers itself with some kind of well-known directory when it comes up, and clients discover the service’s location by looking it up in that directory. This has its own problems – not least of which is that the service inside the container has no way of knowing the IP address of the host it’s running on, nor which port Docker has selected to forward to it!
So, back to the problem in hand. Our ops guys want to treat each container as a well-known service in its own right and give it a proper, routable IP address. This is very much like what a virtualisation platform would call “bridged mode” networking. In my opinion, this is a very sensible choice.
Here’s how I achieved this goal:
Defining a new virtual interface
In order to have a separate IP address from the host, we will need a new virtual interface. I chose to use a macvlan type interface so that it could have its own MAC address and therefore receive an IP by DHCP if required.
ip link add virtual0 link eth0 type macvlan mode bridge
This creates a new virtual interface called virtual0. You will need a separate interface for each container. I am using bridged mode as it means container-to-container traffic doesn’t travel across the wire to the switch/router and back.
If you need a fixed MAC address – perhaps because you want a fixed IP address and you achieve this by putting a fixed MAC-to-IP mapping in your DHCP server – you need to add the following to the end of the line:
ip link add ... address 00:11:22:33:44:55
(substitute your own MAC address!)
If you do not do this, the kernel will randomly generate a MAC for you.
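Putting that together on one line can be sensitive to argument order with some versions of iproute2 (the generic address option is normally parsed before type), so a form that should work everywhere is something like the following – substitute your own interface names and MAC address:

ip link add virtual0 link eth0 address 00:11:22:33:44:55 type macvlan mode bridge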
Giving the interface an IP address
If you want a fixed IP address and you’re happy to statically configure it:
ip address add 10.10.10.88/24 broadcast 10.10.10.255 dev virtual0
(substitute your own IP address, subnet mask and broadcast address)
If, however, you need to use DHCP to get an address (development environment, for example):
dhclient virtual0
This will start the DHCP client managing just this one interface and leave it running in the background dealing with lease renewals, etc. Once you’re ready to tear down the container and networking, you need to remember to kill the DHCP client, or, better yet:
dhclient -d -r virtual0
will locate the existing instance of the client for that interface, tell it to properly release the DHCP lease, and then exit.
Finally, bring the interface up:
ip link set virtual0 up
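At this point it’s worth a quick sanity check that the interface is up and has the address you expect:

ip addr show virtual0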
Find the container’s internal IP address
If you haven’t done so already, start your container as a daemon.
Use docker inspect <container id> to find the internal IP address under NetworkSettings/IPAddress.
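If you’re scripting this, a rough way to pull the address out (assuming the field is still called IPAddress in your version of Docker – newer releases also have a format option) is something like:

# substitute your own container id for the placeholder
CONTAINER_IP=$(docker inspect <container id> | grep '"IPAddress"' | head -n 1 | sed 's/[^0-9.]//g')
echo "$CONTAINER_IP"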
Setting up inbound routing/NAT
Create a chain for the inbound routing rules for this container. I prefer to use a separate chain for each container as it makes cleaning up easier.
iptables -t nat -N BRIDGE-VIRTUAL0 # some unique name for the iptables chain for this container
Now, send all incoming traffic to this chain. We add a rule in PREROUTING which will match traffic coming in from outside, and a rule in OUTPUT which will match traffic generated on the host. Here, <container external IP> is the routable IP allocated statically or via DHCP to the virtual0 interface.
iptables -t nat -A PREROUTING -p all -d <container external IP> -j BRIDGE-VIRTUAL0
iptables -t nat -A OUTPUT -p all -d <container external IP> -j BRIDGE-VIRTUAL0
Finally, NAT all inbound traffic to the container. Here, <container internal IP> is the internal address of the container as discovered using the docker inspect command.
iptables -t nat -A BRIDGE-VIRTUAL0 -p all -j DNAT --to-destination <container internal IP>
This particular rule will forward inbound traffic for any port on the external IP to the same port on the internal IP – effectively exposing anything that the container exposes to the outside world. As an alternative, you can expose individual ports like so:
iptables -t nat -A BRIDGE-VIRTUAL0 -p tcp -m tcp --dport 80 -j DNAT --to-destination <container internal IP>:8080
In this example, we forward any traffic hitting port 80 on our external IP to port 8080 on the internal IP – effectively exposing a web server listening on port 8080 inside the container as port 80 on the external IP. Nice tidy URLs that don’t have port numbers in them. Neat, huh? You can even map multiple external ports to the same internal port by using multiple rules if you wish.
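For example, the following pair of rules would make the same internal web server on port 8080 answer on both port 80 and port 8080 of the external IP – a sketch, substituting your container’s internal IP as before:

iptables -t nat -A BRIDGE-VIRTUAL0 -p tcp -m tcp --dport 80 -j DNAT --to-destination <container internal IP>:8080
iptables -t nat -A BRIDGE-VIRTUAL0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination <container internal IP>:8080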
Setting up outbound NAT
What we have so far works, but if the container initiates any outbound requests, they will appear (by virtue of Docker’s default MASQUERADE rule) to come from the host’s own IP. This may be fine depending on your circumstances, but it probably won’t play well if you want to get firewalls involved anywhere, and it might be a nuisance if you’re using netstat or similar to diagnose where all the incoming connections to a heavily loaded server are coming from.
So… we’ll add “source NAT” into the equation so that outbound connections from the container come from its own IP:
iptables -t nat -I POSTROUTING -p all -s <container internal IP> -j SNAT --to-source <container external IP>
This rule says any traffic we’re routing outbound which has a source address of the container’s internal IP should have its source address changed to the container’s external IP.
Note that we use -I (not -A) to ensure our rule goes ahead of Docker’s default MASQUERADE rule.
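If you want to check the ordering, the following lists the POSTROUTING chain with rule numbers – your SNAT rule should appear above the MASQUERADE rule:

iptables -t nat -L POSTROUTING -n --line-numbers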
And finally
This all worked fine when I set it up at home, and then failed in the office. It turns out that it fails when you’re on a routed network (my network at home is a simple, flat network) and you try to reach your container from a different subnet. To cut a long story short, you need to relax “reverse path filtering” by putting it into “loose” mode!
echo 2 > /proc/sys/net/ipv4/conf/eth0/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/virtual0/rp_filter
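Note that these settings don’t survive a reboot. The eth0 setting can go in /etc/sysctl.conf; the virtual0 one has to be re-applied after the interface is created, so it belongs in whatever script sets up the container networking. The sysctl equivalent of the echo commands above is:

sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.virtual0.rp_filter=2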
For more information about this, see this ServerFault question, this article and this article (linked from the previous one).
Summary
There’s a lot of information in this post, but in summary what we’ve done is:
- Created a new virtual interface for the container
- Given the virtual interface an IP address either statically or using DHCP
- Used iptables to set up inbound routing to allow the container to be reached via its external/routable IP from both the host and the network beyond the host
- Optionally forwarded (and even more optionally, aliased) only specific inbound ports
- Set up source NAT to rewrite outbound traffic from the container to use the correct external IP
- Given the kernel a bit of a helping hand to understand that we’re not terrorists and it’s safe to “do the right thing” with our slightly odd-looking traffic
The first 5 steps of this took me about 20 minutes to figure out. Step 6 took me 3 days, the help of several network engineers, one extremely patient ops guy and no small amount of blind luck!
I hope having it all written down in one place will be helpful to others.
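For convenience, here is a rough script that pulls the setup steps together for the static-IP case. The variable names and values are just placeholders for illustration – substitute your own, and note it does no error handling and skips the DHCP variant:

#!/bin/sh
# Rough sketch only - substitute your own values.
EXT_IF=eth0                 # physical interface
VIRT_IF=virtual0            # virtual interface for this container
CHAIN=BRIDGE-VIRTUAL0       # iptables chain for this container
EXT_IP=10.10.10.88          # routable IP for the container
BCAST=10.10.10.255          # broadcast address
INT_IP=$1                   # container's internal IP, from docker inspect

# Virtual interface, bridged onto the physical NIC
ip link add $VIRT_IF link $EXT_IF type macvlan mode bridge
ip address add $EXT_IP/24 broadcast $BCAST dev $VIRT_IF
ip link set $VIRT_IF up

# Inbound routing/NAT
iptables -t nat -N $CHAIN
iptables -t nat -A PREROUTING -p all -d $EXT_IP -j $CHAIN
iptables -t nat -A OUTPUT -p all -d $EXT_IP -j $CHAIN
iptables -t nat -A $CHAIN -p all -j DNAT --to-destination $INT_IP

# Outbound source NAT (insert ahead of Docker's MASQUERADE rule)
iptables -t nat -I POSTROUTING -p all -s $INT_IP -j SNAT --to-source $EXT_IP

# Loose reverse path filtering
echo 2 > /proc/sys/net/ipv4/conf/$EXT_IF/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/$VIRT_IF/rp_filter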
You’ve spoiled me! And… let’s see… saved me circa 3 days… 🙂 Nah… probably more. Thank you.
I hope that a new version of Docker will offer some helper function for this kind of network setup.
You’re welcome. I’m talking to the Docker guys about getting some kind of hybrid of my approach and the pipework approach built into Docker. I’d do it myself, but I don’t know Go (not a great excuse) and I’d have to do it on my limited free time.
I’m using openvswitch and virtual ethernet interfaces for LXC containers, and attach the virtual interfaces to the ovs switch via the LXC up and down scripts.
I’m not using docker but my home-grown LXC solution, which is similar.
Once you’ve got the containers on the ovs, things are easy: you can run dnsmasq to hand out IPs on the ovs, add routes and firewall rules easily, etc. I’ve been very happy with the combo of LXC and ovs.
Could be worth evaluating for your situation.
I’m using openvswitch together with LXC for this purpose in my homegrown setup. The containers get a veth interface that I attach to the OVS, and that lets me easily handle IP addresses (dynamic or static), routing, firewall rules, etc. I don’t use docker but something similar, homegrown on top of LXC, so it should be applicable.
It may be worth giving that kind of thing a spin, separates things quite nicely. I never liked the portfowarding stuff of docker.
hmm…. SDN…
Can I do some crazy stunts with Open vSwitch, like having a separate docker/LXC instance for every service (mail, http, …) with all of them sharing the same public IPv4 address, but without using NAT?
Openvswitch is just that – a switch. In my case I don’t want them to share the same public IP; I want each to have a separate private IP and for them to be able to talk to each other over said private network. If you connect your containers to ovs it won’t help with using the same public IP – it’s just a switch, and you’d need port forwarding etc. from an interface on the physical host.
One problem with this kind of setup is that the container cannot reach itself using the external IP. Any ideas on how to make that work efficiently with iptables?
How do you run multiple services like sshd, a web server and a database in a single docker container?
Any Ideas ?
Not really related, but many people have success using supervisord or equivalent.
try this:
http://docs.docker.io/en/latest/examples/using_supervisord/
Hi, do you have a script that summarises this? I would gladly accept some guidance, coming from a linux/docker newbie.
From what I gather, you would need to run this “script” each time the container is started/restarted (since the internal IP may change), right?
I am very new to this topic so I believe that I have screwed something up while trying to follow your steps. I ran the commands below to complete the configuration.
1) ip link add vipr2 link eth0 type macvlan mode bridge
2) ip address add 10.10.192.89/24 broadcast 10.10.192.255 dev vipr2
3) ip link set vipr2 up
4) iptables -t nat -N BRIDGE-VIPR2
5) iptables -t nat -A PREROUTING -p all -d 10.10.192.89 -j BRIDGE-VIPR2
6) iptables -t nat -A OUTPUT -p all -d 10.10.192.89 -j BRIDGE-VIPR2
7) iptables -t nat -A BRIDGE-VIPR2 -p all -j DNAT --to-destination 192.168.227.5
8) iptables -t nat -I POSTROUTING -p all -s 192.168.227.5 -j SNAT --to-source 10.10.192.89
9) echo 2 > /proc/sys/net/ipv4/conf/eth0/rp_filter
10) echo 2 > /proc/sys/net/ipv4/conf/vipr2/rp_filter
After this, I was trying to ping the host machine(s) and container from each other.
Host-A
——
Host IP: 10.10.192.233
Docker IP: 192.168.227.1
Container IP: 192.168.227.5
MACVLAN IP: 10.10.192.89
HOST-B
——
Host IP: 10.10.192.200
I am able to ping
1) HOST A -> 192.168.227.1, 192.168.227.5, 10.10.192.89, 10.10.192.200
2) Container -> 10.10.192.233, 192.168.227.1, 10.10.192.89, 10.10.192.200
3) Host B -> 10.10.192.233 (host a)
But I am not able to ping
Host B -> 10.10.192.89 (macvlan)
I believe the communication from the outside world to the container should happen through the macvlan IP, and in my case it is not working.
Please let me know if I am missing something.
I see you’ve posted this on the Docker group. To be honest, you’ll probably get a better response there, so I’ll leave this for now if you don’t mind.
If this is your ‘epic next generation deployment platform’, isn’t it time for IPv6 (without NAT)?
Unfortunately, in an enterprise environment, not all the choices are ours to make.
Hi Danny,
Thanks for your response. It looks like there is no reply in the docker group so far. I appreciate your input – please let me know if I am not following the steps correctly. My requirement is for the containers created by docker to be identifiable hosts (by IP address) within the hosted network, meaning all the machines on the network should be able to talk to a container using its IP address.
Thanks for your time.
Is this post still relevant? Does docker now support something like this with the docker0 bridge interface? Or does that not let you give public IPs to containers?
I haven’t looked at networking in a while. Docker has “always” had the docker0 bridge, so I don’t think that’s directly relevant. However, it has recently acquired the ability to bind to any interface of your choosing. So it’s probably possible to use some parts of this post (creating the interface, etc.), and then use some native Docker capabilities to replace some of the iptables stuff in this post. I haven’t tried this though. It would be great to hear if that works.
Thanks!
Your howto just keeps on giving in 2014 …..
G’day from down under 🙂
I had spent the last few days trying to solve the current Docker networking issue of external container access, and tried a few solutions, but yours is the easiest, and it works like a charm on an Ubuntu 12.04 (LTS) test Docker machine when added to /etc/rc.local.
Cheers
Terry
Great! Glad to be of help!
Are there limits to how many network interfaces I can create? Are there any limits on the iptables mappings? I’m thinking of using this technique for a docker vhost site that could have N containers on one machine.
I expect there probably are, but I’ve never found them 😉
Check out https://github.com/jbemmel/ecDock – it’s a little script I wrote that may do what you need, and more.
Using OpenVSwitch and Docker, ecDock places containers in ‘slots’ with a fixed IP and MAC address ( without NAT )
Hi Jeroen,
Thanks for your post, EC looks really useful!
I gave ‘EC’ a go on my Ubuntu 10.04 system running ovs-vsctl (Open vSwitch) 1.4.6 (compiled Jan 10 2014 01:45:55) but got the following error: “ovs-vsctl: Interface does not contain a column whose name matches ofport_request”. I’ve had a look in the OVS doc, but it’s a lot to absorb, so I thought perhaps, if you have the time, you may know what the problem is?
#> ec create-vswitch 192.168.0.254/24
ec: [INFO] create vswitch: NAME=ovs0 IP=192.168.0.254/24 MAC=52:00:c0:a8:00:fe optional DEV=
ec: [INFO] Docker and OpenVSwitch dependencies OK
ec: [INFO] New vswitch ‘ovs0’ created with IP 192.168.0.254/24 and MAC 52:00:c0:a8:00:fe
#> ec --slot=85 start terryp/unifi_v3 /usr/bin/supervisord
ec: [INFO] start: VSWITCH=ovs0 SLOT=85 IMAGE=terryp/unifi_v3, parameters for container: /usr/bin/supervisord
ec: [INFO] Docker and OpenVSwitch dependencies OK
ovs-vsctl: Interface does not contain a column whose name matches “ofport_request”
ifconfig
docker0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::1049:d7ff:fe70:b6ad/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:136508 errors:0 dropped:0 overruns:0 frame:0
TX packets:203891 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:23557567 (23.5 MB) TX bytes:267418409 (267.4 MB)
eth0 Link encap:Ethernet HWaddr 44:1e:a1:3c:ab:98
inet addr:192.168.0.105 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::461e:a1ff:fe3c:ab98/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:671303 errors:0 dropped:0 overruns:0 frame:0
TX packets:234731 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:334540491 (334.5 MB) TX bytes:52212249 (52.2 MB)
Interrupt:18
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:107339 errors:0 dropped:0 overruns:0 frame:0
TX packets:107339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:22939616 (22.9 MB) TX bytes:22939616 (22.9 MB)
lxcbr0 Link encap:Ethernet HWaddr 7e:71:f3:9b:98:33
inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
inet6 addr: fe80::7c71:f3ff:fe9b:9833/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:648 (648.0 B)
ovs0 Link encap:Ethernet HWaddr 52:00:c0:a8:00:fe
inet addr:192.168.0.254 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::5000:c0ff:fea8:fe/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:648 (648.0 B)
Cheers
Terry
Jeroen,
Upgrading to openvswitch 2.0.0 fixed the problem 🙂
ec --slot=85 start terryp/unifi_v3:working_base /usr/bin/supervisord
ec: [INFO] start: VSWITCH=ovs0 SLOT=85 IMAGE=terryp/unifi_v3:working_base, parameters for container: /usr/bin/supervisord
ec: [INFO] Docker and OpenVSwitch dependencies OK
ec: [INFO] New container started in slot ovs0-s85 with ID=aaad2306c42b1126dc029585bb7892392c053e1f1347d19af4d93e735e694f87
ec: [INFO] Found NSPID=340 for container in slot ovs0-s85
ec: [INFO] New container up and running in slot ovs0-s85, IP=192.168.0.85/24 and MAC=52:00:c0:a8:00:55 with gateway 192.168.0.254 ( dual NIC: )
Jeroen,
One last problem, I think: when I run “ec attach 85” it hangs at
“ec: [INFO] Attaching to container in slot ovs0-s85…” and doesn’t return control to the initiating terminal. I can’t ping 192.168.0.85 externally or on the Docker server.
Any suggestions ?
ec: [INFO] attach: VSWITCH=ovs0 SLOT=85
ec: [INFO] Docker and OpenVSwitch dependencies OK
ec: [INFO] Attaching to container in slot ovs0-s85…
Hi Terry,
Attaching to a container presumes that it is running some kind of shell to attach to. For example, putting “/bin/bash” as the container’s entrypoint can do the trick.
You typically have to press <Enter> when attaching, to get a prompt. I considered sending a newline character automatically somehow, but since one never knows what is running (it could be an application where <Enter> detonates the bomb… 😉) I decided not to.
Cheers,
Jeroen
I haven’t been following along, but you should also be aware that you can attach to a container using
lxc-attach --name <long container id>
and it doesn’t require you to have anything in particular running in the container – it simply starts a new shell inside the container namespace. This is not very well documented anywhere on the Docker site (or at least it wasn’t last time I looked).
Hi Danny, great post! Thank you for it 🙂
It works, except for one thing (I don’t know very much about iptables and I don’t even know what to google for).
I don’t want to expose all ports from the container to the outside world (which btw. works when I try it). So I tried your solution for that:
iptables -t nat -A BRIDGE-VIRTUAL0 -p all -m tcp --dport 80 -j DNAT --to-destination :80
(The container exposes Port 80, too)
But when I run this, iptables throws this error:
iptables v1.4.18: Need TCP, UDP, SCTP or DCCP with port specification
Can you please give me a hint?
You need to change -p all to -p tcp.
But a better solution is simply to make the thing inside the container only listen on 127.0.0.1 (localhost). The only reason not to do this is if you want something exposed out of the container and reachable by the host, but NOT reachable from outside the host.
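For reference, the corrected rule would look something like this (same placeholder as in the post):

iptables -t nat -A BRIDGE-VIRTUAL0 -p tcp -m tcp --dport 80 -j DNAT --to-destination <container internal IP>:80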
Thanks, wow, that was fast. And it works. 🙂
Yes, I need ssh from the host to the container and I don’t want that exposed.
Do you really? It’s not well documented, but you can use lxc-attach --name <full container id> to attach to a container. If you’re using Docker 0.9.0 or greater, you need to be running the LXC engine for this to work. (See http://blog.docker.io/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/)
Danny, great post I keep coming back to. Could you give some high-level details on how to accomplish this with IPv6? I’m specifically interested in the routing/NAT part. I have a public IPv6 /64 address pool that I would like to use this technique with. I’ve created macvlan interfaces and attached public IPv6 addresses, which I can then ping publicly. I am having a hard time connecting the containers to the macvlans. I’m not sure how to handle the routing/NAT steps in IPv6.
I’ve tried using Docker Pipework for this but I can’t seem to get it working.
Nothing specific, I’m afraid. I would expect it to “just” work. You’ll need ip6tables rather than iptables.
If you can do without NAT, you could try https://github.com/jbemmel/ecDock
I recently added IPv6 support – just configure your prefix, and containers get ipv6prefix::10.0.0.x ( for example, if you create a vswitch using 10.0.0.254 as IPv4 address )
Pingback: Docker network performance | Research notes
Thanks for your article! It helped me a lot in creating a set of scripts that automatically creates a ‘network’ container that does routing with iptables on a publicly routable IP acquired with DHCP. Such a network container is then linked to a container that you want to make available on the network.
The scripts, more info and an example are available on: https://github.com/jeroenpeeters/docker-network-containers
You’re welcome. Glad it was useful.
Pingback: Docker | Tomorrow's Technology Delivered Today
I’m not getting an IP from DHCP on the virtual0 interface.
# dhclient -v virtual0
Internet Systems Consortium DHCP Client 4.1.1-P1
Copyright 2004-2010 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Listening on LPF/virtual0/0e:e2:20:b7:cc:f8
Sending on LPF/virtual0/0e:e2:20:b7:cc:f8
Sending on Socket/fallback
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 5 (xid=0x52a7cec2)
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 7 (xid=0x52a7cec2)
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 7 (xid=0x52a7cec2)
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 15 (xid=0x52a7cec2)
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 17 (xid=0x52a7cec2)
DHCPDISCOVER on virtual0 to 255.255.255.255 port 67 interval 10 (xid=0x52a7cec2)
No DHCPOFFERS received.
No working leases in persistent database – sleeping.
Thank you very much!!! But if so, I will use the container’s internal IP to ping the container, but will see the container’s external IP when I get packets back from the container, right?
Hello, I’ve been using your example for a few months now. I have 4 public IPs on a system. I put 3 of the IPs on different docker containers. But now I want to add another container to share an IP address with an existing container – is it possible to bridge 2 or more containers per IP?