Instance directly connected to provider network does not receive DHCP reply
Short description of problem:
In an environment where virtual routers and floating IP addresses are fully working, connecting an instance directly to the provider network does not work. The instance is unable to communicate with the network. More specifically: it is able to send a DHCP request message, but does not receive a DHCP reply. The reply message is visible on the physical network.
Details on the environment:
1 controller node
2 compute nodes
2 network nodes
All nodes have access to a management network (eth0). The compute and network nodes also have access to a tenant network (OVS GRE on eth1) and a provider network (eth2). eth0 and eth1 have an IP address; eth2 is configured without one.
All nodes are running as virtual machines in a manually maintained VMware environment. The only required / specialized change was to allow promiscuous mode on the provider network; otherwise, for example, the virtual routers on the network nodes did not receive network traffic on eth2.
What does work:
- create tenant network
- create provider network
- create router
- create instance connected to tenant network
- create floating ip and assign it to instance
- test network traffic from instance to provider network (checked if the source ip of the ping is the floating IP: yes)
- test network traffic from provider network to floating ip (enable port 22 on security group and try ssh'ing to the instance): works
The following steps are of interest:
- create a new instance connected to provider network
- check (using "nova list" and "nova show") whether an IP address was provisioned from the provider network: yes.
- once the instance boots, log in to the instance (via console) and see if it is able to get the IP address from the DHCP server: no.
- statically set the IP address on the interface (inside the instance, using "ifconfig eth0 x.x.x.x netmask x.x.x.x") and test communication: fails, both for traffic from the instance to the provider network and the other way around.
Starting from the drawing in http://
The DHCP Request packet reaches all the way from the instance to the dnsmasq process on one of the network nodes. One thing that catches my attention is that all packets seem to be duplicated for some reason, but this should not be a problem. This only appears to happen on bridges connected to the provider network. When looking, for example, in the qdhcp-xxxx-xxx.. namespace on the network node, I only see the packet once, so I will ignore this for now.
dnsmasq on the network node replies with a DHCP Reply packet, which reaches the provider network and arrives back at the eth2 interface of the compute node:
(node2 is the compute node where the instance is running. I verified that fa:16:3e:47:d9:1b is the MAC of the instance and that 192.168.103.230 is the IP it should have according to nova. I also verified that 192.168.103.231 is the IP of the DHCP namespace on the network node.)
root@node2:~# tcpdump -i eth2 -n 'udp port 67 or udp port 68'
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
13:10:46.873239 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873682 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873922 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:46.873929 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:49.885679 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:49.886451 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:49.886705 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:49.886712 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
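The duplication mentioned above can be quantified from a saved capture. The following is only a rough sketch: the function name count_duplicates and the 5 ms threshold are my own assumptions, and it treats two consecutive tcpdump summary lines with identical text as a duplicate pair, which is weaker than comparing the actual packet contents.

```python
from datetime import datetime

# First four packet lines of the eth2 capture above.
ETH2_SAMPLE = """
13:10:46.873239 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873682 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873922 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:46.873929 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
"""


def count_duplicates(capture, max_gap=0.005):
    """Count pairs of consecutive, textually identical DHCP lines.

    max_gap is an assumed threshold in seconds; two lines further apart
    than that are treated as separate packets, not as duplicates.
    """
    duplicates = 0
    prev_ts, prev_payload = None, None
    for line in capture.splitlines():
        line = line.strip()
        if "BOOTP/DHCP" not in line:
            continue  # skip tcpdump banner lines and blanks
        stamp, payload = line.split(" ", 1)
        ts = datetime.strptime(stamp, "%H:%M:%S.%f")
        if payload == prev_payload and (ts - prev_ts).total_seconds() <= max_gap:
            duplicates += 1
            # Reset so a third copy starts a new pair instead of chaining.
            prev_ts, prev_payload = None, None
        else:
            prev_ts, prev_payload = ts, payload
    return duplicates


if __name__ == "__main__":
    print("duplicate pairs:", count_duplicates(ETH2_SAMPLE))
```

Running the same function over a capture taken inside the qdhcp namespace should report no pairs if the packets really are seen only once there.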
It then reaches the phy-br-ex end of the veth pair:
root@node2:~# tcpdump -i phy-br-ex -n 'udp port 67 or udp port 68'
tcpdump: WARNING: phy-br-ex: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on phy-br-ex, link-type EN10MB (Ethernet), capture size 65535 bytes
13:12:09.591398 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:09.591831 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:09.592604 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:09.592616 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:12.594992 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:12.595972 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:12.596023 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:12.597883 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
But at the other end of the veth pair (int-br-ex) some packet loss occurs, and it affects only the reply packets:
root@node2:~# tcpdump -i int-br-ex -n 'udp port 67 or udp port 68'
tcpdump: WARNING: int-br-ex: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on int-br-ex, link-type EN10MB (Ethernet), capture size 65535 bytes
13:13:41.167895 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168812 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168925 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:41.169358 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:44.801697 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:44.801984 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.806615 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807393 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807739 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:47.807741 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:50.810922 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:50.811742 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:50.811912 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:50.811913 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:54.758003 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:54.758673 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.762639 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.763403 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.763643 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:57.763644 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
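To make the intermittent loss on int-br-ex easier to spot, the capture can be grouped into request bursts and each burst checked for a reply. This is a minimal sketch under my own assumptions: the names parse_events and unanswered_bursts are hypothetical, and a reply is counted as belonging to a burst if it arrives within one second of the first request of that burst.

```python
import re
from datetime import datetime

# First ten packet lines of the int-br-ex capture above.
INT_BR_EX_SAMPLE = """
13:13:41.167895 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168812 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168925 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:41.169358 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:44.801697 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:44.801984 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.806615 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807393 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807739 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:47.807741 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
"""

LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d+) IP \S+ > \S+ BOOTP/DHCP, (Request|Reply)"
)


def parse_events(capture):
    """Return (timestamp, 'Request' | 'Reply') tuples from tcpdump text."""
    events = []
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((ts, match.group(2)))
    return events


def unanswered_bursts(events, window=1.0):
    """Return start times (HH:MM:SS) of request bursts that saw no reply.

    window is an assumed limit in seconds: requests further apart than
    that start a new burst, and replies must fall inside it to count.
    """
    bursts = []  # each entry: [first request timestamp, answered?]
    for ts, kind in events:
        if kind == "Request":
            if not bursts or (ts - bursts[-1][0]).total_seconds() > window:
                bursts.append([ts, False])
        elif bursts and (ts - bursts[-1][0]).total_seconds() <= window:
            bursts[-1][1] = True
    return [start.strftime("%H:%M:%S") for start, answered in bursts if not answered]


if __name__ == "__main__":
    print("bursts without reply:", unanswered_bursts(parse_events(INT_BR_EX_SAMPLE)))
```

Applied to the same kind of capture on phy-br-ex, an empty result would support the observation that the replies go missing somewhere between the two ends of the veth pair.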
After bringing the br-int interface up on the hypervisor and capturing there, the reply packets are gone completely:
root@node2:~# ifconfig br-int up && tcpdump -i br-int -n 'udp port 67 or udp port 68'
tcpdump: WARNING: br-int: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-int, link-type EN10MB (Ethernet), capture size 65535 bytes
13:20:08.545207 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:08.546072 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:12.402099 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:12.402607 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:15.407187 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:15.407960 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
On the qvo interface and the further interfaces/bridges towards the instance, the reply packet is not visible either. (On one occasion I did manage to see a single reply message, but I was unable to reproduce that result reliably.)
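The hop-by-hop comparison above can be automated roughly as follows. This is a sketch, not a finished tool: the function names are hypothetical, the captures are abbreviated to a few lines each from the output above, and because they were taken at different times the counts only show which kinds of packets appear at each hop, not what happened to one specific packet. It flags only the first hop where replies are absent entirely, so intermittent loss like that on int-br-ex is not reported.

```python
from collections import Counter

REQUEST = "13:20:08.545207 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280"
REPLY = "13:10:46.873922 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325"

# Hops in the order the reply travels towards the instance,
# each with a shortened excerpt of its capture.
HOPS = [
    ("eth2", REQUEST + "\n" + REPLY),
    ("phy-br-ex", REQUEST + "\n" + REPLY),
    ("int-br-ex", REQUEST + "\n" + REPLY),
    ("br-int", REQUEST + "\n" + REQUEST),
]


def count_dhcp(capture):
    """Count DHCP request and reply lines in tcpdump text output."""
    counts = Counter()
    for line in capture.splitlines():
        if "BOOTP/DHCP, Request" in line:
            counts["Request"] += 1
        elif "BOOTP/DHCP, Reply" in line:
            counts["Reply"] += 1
    return counts


def first_hop_missing_replies(hops):
    """Return the first hop that shows requests but no replies at all."""
    for name, capture in hops:
        counts = count_dhcp(capture)
        if counts["Request"] and not counts["Reply"]:
            return name
    return None


if __name__ == "__main__":
    for name, capture in HOPS:
        print(name, dict(count_dhcp(capture)))
    print("replies first missing on:", first_hop_missing_replies(HOPS))
```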
Some additional information:
When setting verbose and debug to true in /etc/quantum/
top - 13:31:43 up 1:50, 2 users, load average: 0.88, 0.94, 0.83
Tasks: 102 total, 4 running, 98 sleeping, 0 stopped, 0 zombie
Cpu(s): 51.4%us, 7.9%sy, 0.0%ni, 40.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1019476k total, 649212k used, 370264k free, 109780k buffers
Swap: 1046524k total, 0k used, 1046524k free, 161596k cached
Even though, according to ifconfig, promiscuous mode is not set on the eth2 interfaces on the compute and network nodes, the network node seems to handle traffic fine (floating IP addresses and everything). Setting promiscuous mode on the compute node has no effect as far as I can tell. I also believe this is not the problem, since I can see the traffic on eth2 and on the veth end connecting to it.
Question information
- Language: English
- Status: Solved
- For: neutron
- Assignee: No assignee
- Solved by: Michiel K
This question was reopened
- by Michiel K