Ubuntu 12.10 + Folsom + Quantum + OVS + GRE Problems
Hi,
I am having OVS issues with my 3-node (control, network, compute) deployment (Ubuntu 12.10 + Folsom + Quantum + OVS + GRE). From what I can tell, my VMs are given a vnet# interface on the compute node by OVS, but they are never able to reach the network node for DHCP, etc.
I ran a tcpdump last night, which showed a new VM repeatedly trying to get an answer from dnsmasq, which I confirmed is listening on UDP port 67 on the network node.
I'm most concerned by the fact that 'ip netns list' returns nothing on any of the three nodes (running the command as 'root').
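(For what it's worth, an empty 'ip netns list' is not necessarily the fault by itself: the Folsom agents only create namespaces when configured to. A sketch of the controlling option as I understand the Folsom agent configs; the value shown is just the setting to check:

```ini
# /etc/quantum/dhcp_agent.ini and /etc/quantum/l3_agent.ini (Folsom)
[DEFAULT]
# With False, taps and router ports are plugged straight into the host
# network stack and `ip netns list` stays empty even when everything works.
use_namespaces = True
```
)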
It's beginning to look like I should just start again from scratch. I dropped the Quantum db last night and recreated it, hoping to pull it all together again, but no matter what I do, nothing seems to fix this issue.
Here is the guide I'm debugging:
Here's a paste dump of the situation:
http://
Thank you very much for your assistance.
-Joshua
Question information
- Language: English
- Status: Solved
- For: neutron
- Assignee: No assignee
- Solved by: yong sheng gong
#1
Here is a tcpdump of vnet0 on the compute node, when its only VM is rebooted.
#2
Below, I'm including the full log of the VM booting. Please note that after witnessing the "eth0: IPv6 duplicate address" error in this log, I disabled IPv6 in sysctl.conf on all three machines and rebooted. The log below is from a hard reboot of the VM, after the compute node (hypervisor), control node, and network node were rebooted to disable IPv6.
#3
It seems you have not run quantum-
Have you made sure you are using the same ovs_quantum_
The network node should have br-tun too, and there should be a GRE port on br-tun on both the compute node and the network node.
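The br-tun and GRE-port setup above is driven by the tunneling options in the OVS plugin config. A sketch of the relevant section using Folsom's option names (the address and ranges are illustrative, not taken from this deployment):

```ini
# /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini (Folsom)
[OVS]
tenant_network_type = gre
tunnel_id_ranges = 1:1000
enable_tunneling = True
integration_bridge = br-int
tunnel_bridge = br-tun
# local_ip must be this node's own address on the tunnel network and must
# differ on every node; the agent builds the gre-<n> ports on br-tun from it.
local_ip = 172.20.10.51
```

When enable_tunneling is True, the OVS agent creates br-tun itself; br-int (and br-ex on the network node) still have to be created manually with ovs-vsctl.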
#4
Can you give the output of:
quantum net-show example-
#5
Here are the net-show commands:
http://
It seems that br-tun should be added automatically, because I configured it in the .ini files. Is br-tun a port which needs to be manually added using ovs-vsctl?
Here is the /etc/quantum/
And the compute node:
Thanks,
Joshua
#6
Can you try to run
quantum-
on your network node? (Stop the currently running one first.)
Then paste the log.
#7
Here is the output of quantum-
That command appears to have updated the state of OVS:
root@knet-
a4de515d-
Bridge br-tun
Port br-tun
Port "gre-1"
Port patch-int
Bridge br-ex
Port "eth2"
Port br-ex
Bridge br-int
Port patch-tun
Port br-int
ovs_version: "1.4.3"
root@knet-
Here is the agent run on the compute node:
And ovs-vsctl show on the compute node, from after the agent:
root@khyp-
root@khyp-
root@khyp-
quantum-
root@khyp-
801aa35e-
Bridge br-int
Port patch-tun
Port br-int
Port "qvoe4bc93cc-d5"
tag: 1
Bridge br-tun
Port "gre-2"
Port br-tun
Port patch-int
ovs_version: "1.4.3"
root@khyp-
I just tested another VM. It is still unable to reach DHCP. It seems like something is making the agents crash. Is the following behavior normal? It seems strange to see the agents on both the compute and network nodes exiting with code 1 in dmesg.
(tail of dmesg on compute node)
[36275.758262] block nbd15: queue cleared
[36277.235971] type=1400 audit(135506140
[36277.349062] device vnet0 entered promiscuous mode
[36277.355463] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36277.355488] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36278.958207] kvm: 10992: cpu0 unhandled rdmsr: 0xc0010112
[36289.833629] qbrfe616ec2-11: port 1(qvbfe616ec2-11) entered forwarding state
[36292.390485] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
root@khyp-
stop: Unknown instance:
root@khyp-
quantum-
root@khyp-
[36275.757391] block nbd15: Unexpected reply (ffff883fd06c5c48)
[36275.758262] block nbd15: queue cleared
[36277.235971] type=1400 audit(135506140
[36277.349062] device vnet0 entered promiscuous mode
[36277.355463] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36277.355488] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36278.958207] kvm: 10992: cpu0 unhandled rdmsr: 0xc0010112
[36289.833629] qbrfe616ec2-11: port 1(qvbfe616ec2-11) entered forwarding state
[36292.390485] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36407.439814] init: quantum-
root@khyp-
And also this (on network node):
root@knet-
[ 129.254818] type=1400 audit(135502285
[ 132.682595] openvswitch: Open vSwitch switching datapath 1.4.3, built Dec 8 2012 22:05:31
[ 132.733944] init: quantum-dhcp-agent main process (1532) terminated with status 1
[ 132.812775] init: quantum-
[ 133.539604] device br-int entered promiscuous mode
[ 133.540034] device br-ex entered promiscuous mode
[ 134.120299] init: quantum-l3-agent main process (1534) terminated with status 1
[ 2225.210200] init: quantum-
[37180.858158] device br-tun entered promiscuous mode
[37412.770128] init: quantum-
root@knet-
stop: Unknown instance:
root@knet-
quantum-
root@knet-
[ 132.682595] openvswitch: Open vSwitch switching datapath 1.4.3, built Dec 8 2012 22:05:31
[ 132.733944] init: quantum-dhcp-agent main process (1532) terminated with status 1
[ 132.812775] init: quantum-
[ 133.539604] device br-int entered promiscuous mode
[ 133.540034] device br-ex entered promiscuous mode
[ 134.120299] init: quantum-l3-agent main process (1534) terminated with status 1
[ 2225.210200] init: quantum-
[37180.858158] device br-tun entered promiscuous mode
[37412.770128] init: quantum-
[39008.879120] init: quantum-
root@knet-
stop: Unknown instance:
root@knet-
Thanks,
Joshua
#8
Try running the agents on both the network and compute nodes directly from the command line.
It seems your DHCP agent is not running either. You should start it on your network node too:
sudo quantum-dhcp-agent --config-file /etc/quantum/
By the way, I don't know why starting the agents via the service scripts does not work.
#9
Your hint led me to resolve a major issue: while trying to debug the network and compute nodes, I had purged OVS/Quantum, backed up /var/log, and run 'rm -rf /var/log/*'. That was a bad idea, because the /var/log/quantum and /var/log/upstart directories are not automatically recreated if missing, and their absence was keeping the quantum services from starting. I recreated them, per the control node's example, including permissions. All of the agents appear to be started now.
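That recreation can be sketched as follows (a scratch prefix stands in for / so the sketch is safe to try anywhere; on the real node you would use PREFIX=/ as root, and the ownership line matters):

```shell
# Recreate the log directories the quantum upstart jobs expect.
PREFIX="${PREFIX:-$(mktemp -d)}"          # use PREFIX=/ on the real node
install -d -m 0755 "$PREFIX/var/log/upstart"
install -d -m 0750 "$PREFIX/var/log/quantum"
# On the real node the agents write their logs as the quantum user:
# chown quantum:quantum /var/log/quantum
ls -ld "$PREFIX/var/log/quantum" "$PREFIX/var/log/upstart"
```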
However, my VM is still not able to get a DHCP address.
The following tcpdump comes from the network node, where the l3-agent and dhcp-agent live. It confirms that the DHCP request is arriving on the 'tap' of the network node.
I'm now working on the next layer of the onion. :-)
Thanks,
Joshua
#10
Don't forget that your network node also needs to run the L2 agent quantum-
Also, please run 'ovs-vsctl show' on both the network and compute nodes once you are certain all the agents are running well.
#11
All agents appear to be running well at this time, though my problem persists.
Here are the outputs from 'ovs-vsctl show':
Here is something interesting: per /var/log/syslog on the network node, dnsmasq is trying to answer the three DHCPDISCOVERs. It seems the return path to the compute node is the issue.
root@knet-
Dec 10 10:27:08 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:08 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/
Dec 10 10:27:50 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(
Dec 10 10:27:50 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(
Dec 10 10:27:53 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(
Dec 10 10:27:53 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(
Dec 10 10:27:56 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(
Dec 10 10:27:56 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(
root@knet-
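One way to see at a glance that the handshake stalls after the OFFER is to count message types in the dnsmasq log. A small sketch with stand-in log lines (the MAC, IP, and tap name are made up; on the node you would feed it /var/log/syslog instead):

```shell
# Count DHCP message types: a healthy lease shows DISCOVER/OFFER/REQUEST/ACK,
# while OFFERs with no ACKs mean the replies never reach the VM.
log='Dec 10 10:27:50 knet dnsmasq-dhcp[6186]: DHCPDISCOVER(tap0) fa:16:3e:00:00:01
Dec 10 10:27:50 knet dnsmasq-dhcp[6186]: DHCPOFFER(tap0) 10.5.5.4 fa:16:3e:00:00:01
Dec 10 10:27:53 knet dnsmasq-dhcp[6186]: DHCPDISCOVER(tap0) fa:16:3e:00:00:01
Dec 10 10:27:53 knet dnsmasq-dhcp[6186]: DHCPOFFER(tap0) 10.5.5.4 fa:16:3e:00:00:01'
discover=$(printf '%s\n' "$log" | grep -c DHCPDISCOVER)
offer=$(printf '%s\n' "$log" | grep -c DHCPOFFER)
ack=$(printf '%s\n' "$log" | grep -c DHCPACK || true)
echo "discover=$discover offer=$offer ack=$ack"
```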
Thanks again,
Joshua
#12
This may be of help, too. It's another tcpdump from the network node, this time filtered to GRE traffic on eth1:
http://
Is it possible that this is a routing issue?
NETWORK:
root@knet-hj29:~# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 0 0 0 br-ex
10.20.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 br-ex
172.20.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
root@knet-hj29:~#
COMPUTE:
root@khyp-
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.20.10.52 0.0.0.0 UG 0 0 0 eth1
10.20.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
172.20.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
root@khyp-
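The routing tables above can be sanity-checked mechanically: each node's GRE endpoint must be on-link for the other via the 172.20.10.0/24 routes. A pure-shell sketch (ip_to_int and in_subnet are hypothetical helpers written for this sketch, not Quantum tooling; the addresses are the ones from the tables above):

```shell
# Check whether an IPv4 address is inside a directly connected subnet,
# using only shell arithmetic.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

in_subnet() {  # usage: in_subnet IP NETWORK PREFIXLEN
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xffffffff << (32 - $3)) & 0xffffffff ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# The network node's tunnel endpoint, against the subnet routed out eth1:
in_subnet 172.20.10.52 172.20.10.0 24 && echo "endpoint is on-link"
```

If this check passes on both nodes (and it does for the tables above), plain IP routing is not what is eating the GRE traffic, and the problem is more likely in the tunnel or flow setup.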
Here is iptables output for both hosts:
http://
Thanks,
Joshua
#13
So, strangely, when I do a tcpdump on the network node's tap and q* interfaces, this happens:
root@knet-
br-ex Link encap:Ethernet HWaddr 00:10:18:c8:b0:08
inet addr:192.168.5.108 Bcast:192.168.5.255 Mask:255.255.255.0
inet6 addr: fe80::210:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:308 errors:0 dropped:0 overruns:0 frame:0
TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:40832 (40.8 KB) TX bytes:24567 (24.5 KB)
br-int Link encap:Ethernet HWaddr 82:ae:7d:d6:e6:4e
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
br-tun Link encap:Ethernet HWaddr f2:fb:a3:b5:30:41
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr 00:24:e8:2e:80:d3
inet addr:10.20.10.52 Bcast:10.20.10.255 Mask:255.255.255.0
inet6 addr: fe80::224:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:273 errors:0 dropped:0 overruns:0 frame:0
TX packets:265 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:62270 (62.2 KB) TX bytes:45001 (45.0 KB)
eth1 Link encap:Ethernet HWaddr 00:10:18:c8:b0:0a
inet addr:172.20.10.52 Bcast:172.20.10.255 Mask:255.255.255.0
inet6 addr: fe80::210:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:492 (492.0 B)
eth2 Link encap:Ethernet HWaddr 00:10:18:c8:b0:08
inet6 addr: fe80::210:
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:462 errors:0 dropped:71 overruns:0 frame:0
TX packets:222 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:56920 (56.9 KB) TX bytes:25521 (25.5 KB)
eth3 Link encap:Ethernet HWaddr 00:24:e8:2e:80:d4
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:348 errors:0 dropped:0 overruns:0 frame:0
TX packets:348 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:30488 (30.4 KB) TX bytes:30488 (30.4 KB)
qg-a0f57edd-6c Link encap:Ethernet HWaddr ae:a3:6f:5f:fc:b9
inet addr:192.168.5.110 Bcast:192.168.5.255 Mask:255.255.255.0
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:34 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:4162 (4.1 KB) TX bytes:0 (0.0 B)
qr-443b6d3d-71 Link encap:Ethernet HWaddr 5a:be:bf:27:05:f5
inet addr:10.5.5.1 Bcast:10.5.5.255 Mask:255.255.255.0
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
tap3e2fb05e-53 Link encap:Ethernet HWaddr 26:0a:0b:32:2e:ef
inet addr:10.5.5.3 Bcast:10.5.5.255 Mask:255.255.255.0
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@knet-
tcpdump: tap3e2fb05e-53: That device is not up
root@knet-
tcpdump: qr-443b6d3d-71: That device is not up
root@knet-
tcpdump: qg-a0f57edd-6c: That device is not up
root@knet-
It seems to me that Quantum should be bringing these interfaces up ('ifconfig up'). Do you know why it isn't?
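For reference, the DOWN interfaces can be picked out of ifconfig output mechanically. A sketch with two embedded stand-in stanzas (the qr device is shown UP here purely for contrast; on the node itself you would pipe real `ifconfig -a` output through the same awk program):

```shell
# List interfaces whose flags line lacks UP, from ifconfig-style text.
ifcfg='tap3e2fb05e-53 Link encap:Ethernet HWaddr 26:0a:0b:32:2e:ef
          BROADCAST MULTICAST MTU:1500 Metric:1

qr-443b6d3d-71 Link encap:Ethernet HWaddr 5a:be:bf:27:05:f5
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1'
down=$(printf '%s\n' "$ifcfg" | awk '
  /^[^ ]/ { iface = $1; up = 0 }
  / UP /  { up = 1 }
  /^$/    { if (iface != "" && !up) print iface; iface = "" }
  END     { if (iface != "" && !up) print iface }')
echo "$down"
```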
Thanks,
Joshua
#14
I'm rebuilding from scratch, so this can be closed. Thanks for your help! -Joshua
#15
Thanks yong sheng gong, that solved my question.
#16
Hi,
Did you rebuild?
Does it work?
If yes, could you provide the following from both the compute & network nodes:
# route -n
# ifconfig
# dpkg -l | grep openv
Thanks for your help.