Why would an OVS bridge not forward ARP?
    Bridge "br-eth0"
        Port "br-eth0"
        Port "eth0"
        Port "phy-br-eth0"
    Bridge br-int
        Port "tap55d1e5e8-ab"
            tag: 1
        Port "qr-4b50a17d-3c"
            tag: 1
        Port "int-br-eth0"
        Port "tape8d6e0a5-52"
            tag: 1
        Port "tap6176588e-48"
            tag: 1
        Port br-int
I can see ARP packets sent from int-br-eth0 to phy-br-eth0 but not to upstream eth0.
So we cannot ping from one VM (or DHCP NetNS) on one machine to another VM on another machine.
I see the ping triggering ARPs. The Tx counter of int-br-eth0 and the Rx counter of phy-br-eth0 were also correlated with the ping.
Question information
- Language: English
- Status: Solved
- For: neutron
- Assignee: No assignee
- Solved by: Eoghan
#1
Can you also provide the output of:
ovs-ofctl dump-flows br-int
ovs-ofctl dump-flows br-eth0
#2
$ sudo ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
$ sudo ovs-ofctl dump-flows br-eth0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
#3
This is a sample taken while the ping was running and failing.
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=30847.59s, table=0, n_packets=336925, n_bytes=41895113, priority=1 actions=NORMAL
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=30873.43s, table=0, n_packets=7748, n_bytes=327804, priority=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=30903s, table=0, n_packets=337542, n_bytes=41964462, priority=1 actions=NORMAL
#4
--- 10.0.0.6 ping statistics ---
70 packets transmitted, 0 received, +65 errors, 100% packet loss, time 69300ms
The dropped packets (7780 - 7717 = 63) come close to the 65 errors, but I am not sure a 100% correlation can be made.
But we ran tcpdump on ARPs for the source IP on both the Rx and Tx sides.
The Rx side showed ARP packets arriving and the Tx side showed no ARP packets leaving.
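The comparison above (a counter delta set against the ping error count) can be sketched in Python. This is only an illustration, not something posted in the thread: the helper name n_packets and the two sample flow lines are assumptions, built around the 7717 and 7780 counter values quoted above. The match and action fields in the samples are placeholders.

```python
import re

def n_packets(flow_line):
    """Return the n_packets counter from one dump-flows style line, or None."""
    match = re.search(r"\bn_packets=(\d+)", flow_line)
    return int(match.group(1)) if match else None

# Hypothetical samples: only the two counter values come from the thread.
before = "cookie=0x0, table=0, n_packets=7717, priority=2 actions=drop"
after = "cookie=0x0, table=0, n_packets=7780, priority=2 actions=drop"

dropped = n_packets(after) - n_packets(before)
ping_errors = 65  # "+65 errors" in the ping statistics

print(dropped)                # packets that hit the rule between the samples
print(ping_errors - dropped)  # errors not accounted for by the rule
```

A small remainder like this is plausible if the two samples were not taken at exactly the start and end of the ping run.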
#5
Hi Sunil,
Is the host (or the switch attached to eth0) configured to receive a packet with a VLAN tag on it? If a packet is sent from tape8d6e0a5-52 or tap6176588e-48, the ARP request will enter int-br-eth0 (and a VLAN tag of 1 will be added to the packet). The request will then enter br-eth0 with this VLAN tag and exit eth0.
The other option: if ovs-dpctl show reports that eth0 corresponds to port 5, the packets won't be forwarded because of the drop rule in your flow table for br-eth0.
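To check the second possibility, it may help to list which input ports a flow table drops. The following is a rough sketch that scans text in the style of ovs-ofctl dump-flows for rules whose action is drop and reports their in_port match. The function name and the sample lines are hypothetical; the flow lines in this thread are truncated, so the sample only imitates the general format and assumes numeric in_port values.

```python
import re

def dropped_in_ports(dump_flows_text):
    """Return the in_port values of flows whose action is drop."""
    ports = []
    for line in dump_flows_text.splitlines():
        # Only consider rules that end in a bare drop action.
        if not re.search(r"\bactions=drop\s*$", line):
            continue
        match = re.search(r"\bin_port=(\w+)", line)
        if match:
            ports.append(match.group(1))
    return ports

# Hypothetical dump; not the truncated output from this thread.
sample = """NXST_FLOW reply (xid=0x4):
 cookie=0x0, table=0, n_packets=63, priority=2,in_port=5 actions=drop
 cookie=0x0, table=0, n_packets=100, priority=1 actions=NORMAL"""

print(dropped_in_ports(sample))
```

The reported port numbers then need to be matched against the port list of the same bridge to see which interface is affected.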
#6
Hi Aaron,
Please see this output. Port 5 is phy-br-eth0.
stack@esg-
OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:000000219b
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
5(phy-br-eth0): addr:06:
config: 0
state: 0
current: 10GB-FD COPPER
8(eth0): addr:00:
config: 0
state: 0
current: 1GB-FD FIBER AUTO_NEG
advertised: 1GB-FD AUTO_NEG
supported: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER FIBER AUTO_NEG
LOCAL(br-eth0): addr:00:
config: PORT_DOWN
state: LINK_DOWN
OFPT_GET_
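The port numbers in output like the above can be mapped to names mechanically, which avoids guessing which interface a flow's in_port refers to. A small sketch, assuming port lines of the form number(name): addr:... as shown above; the function name is made up for illustration.

```python
import re

def port_names(show_text):
    """Map port identifiers to names from ovs-ofctl show style output."""
    mapping = {}
    for line in show_text.splitlines():
        match = re.match(r"\s*(\w+)\(([^)]+)\): addr:", line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping

# Port lines as they appear in the output above (addresses are truncated there).
sample = """5(phy-br-eth0): addr:06:
8(eth0): addr:00:
LOCAL(br-eth0): addr:00:"""

ports = port_names(sample)
print(ports["5"])   # phy-br-eth0
print(ports["8"])   # eth0
```

With this mapping, a drop rule on in_port=5 would apply to phy-br-eth0 rather than to eth0.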
#7
The switch is connected to eth0, configured to accept VLAN 1, and set to trunk mode.
#8
It looks like br-eth0 is down; I'm not sure if that would stop it from forwarding packets. Can you try running ifconfig br-eth0 up and see if that changes anything? Are you sure that you don't see any of these ARPs when you run tcpdump on eth0?
Also:
cookie=0x0, duration=
would block the returning ARP reply (though if it's not making it out eth0, that doesn't matter yet).
Are you using a particular plugin and it's not working as expected?
#9
Can you list the networks and show the network you are using:
quantum net-list
quantum net-show
Also make sure the ovs-quantum-agent is active.
It seems the flows in your OVS bridges are not set up correctly.
#10
Hi Aaron,
I am leaving on a trip and will not have access.
br-eth0 is up; I tried that anyway, but it did not work.
stack@esg-
br-eth0 Link encap:Ethernet HWaddr 00:21:9b:c9:d9:83
inet6 addr: fe80::221:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:660411 errors:0 dropped:1883 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:84199314 (84.1 MB) TX bytes:468 (468.0 B)
Sunil.
#11
Hi Yong,
That is not the root cause. Still.
stack@esg-
+------
| id | name | subnets |
+------
| 68f76ec1-
| fa8f9c5e-
+------
stack@esg-
+------
| Field | Value |
+------
| admin_state_up | True |
| id | fa8f9c5e-
| name | net1 |
| provider:
| provider:
| provider:
| router:external | False |
| shared | False |
| status | ACTIVE |
| subnets | 31ed889f-
| tenant_id | b0d8717a0f8b4cf
stack@esg-
+------
| Field | Value |
+------
| admin_state_up | True |
| id | 68f76ec1-
| name | ext_net |
| provider:
| provider:
| provider:
| router:external | True |
| shared | False |
| status | ACTIVE |
| subnets | 09851d25-
| tenant_id | 44cb33fdc72b44a
+------
+------
#12
Sunil, one last thing. If you leave the ping running and then provide the output of
ovs-dpctl dump-flows br-int
ovs-dpctl dump-flows br-tun
That will show the active flow entries in the kernel. Did you try running tcpdump on eth0 to see if you see ARP packets there? You never said how you know that they are not making it out of eth0. You just said you were unable to ping. (The drop flow entry you provided blocks the returning replies, so ping definitely will not work.)
Aaron
#13
Hi Aaron,
Here are the tcpdump captures.
The following output shows there is no link issue between phy-br-eth0 and int-br-eth0.
(1)
root@esg-
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.056 ms
64 bytes from 10.0.0.2: icmp_req=2 ttl=64 time=0.052 ms
64 bytes from 10.0.0.2: icmp_req=3 ttl=64 time=0.032 ms
64 bytes from 10.0.0.2: icmp_req=4 ttl=64 time=0.041 ms
64 bytes from 10.0.0.2: icmp_req=5 ttl=64 time=0.048 ms
--- 10.0.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
The above triggers the following on int-br-eth0:
stack@esg-
tcpdump: WARNING: int-br-eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on int-br-eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
00:41:55.755545 ARP, Request who-has inenbasavbl1c.
00:41:56.753793 ARP, Request who-has inenbasavbl1c.
00:41:57.753782 ARP, Request who-has inenbasavbl1c.
00:41:58.771011 ARP, Request who-has inenbasavbl1c.
00:42:00.769796 ARP, Request who-has inenbasavbl1c.
(2)
And again on phy-br-eth0.
root@esg-
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
From 10.0.0.2 icmp_seq=1 Destination Host Unreachable
From 10.0.0.2 icmp_seq=2 Destination Host Unreachable
From 10.0.0.2 icmp_seq=3 Destination Host Unreachable
^C
--- 10.0.0.3 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms
stack@esg-
tcpdump: WARNING: phy-br-eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on phy-br-eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
00:43:35.871097 ARP, Request who-has inenbasavbl1c.
00:43:36.869787 ARP, Request who-has inenbasavbl1c.
00:43:37.873777 ARP, Request who-has inenbasavbl1c.
00:43:38.887008 ARP, Request who-has inenbasavbl1c.
00:43:39.885784 ARP, Request who-has inenbasavbl1c.
00:43:40.885794 ARP, Request who-has inenbasavbl1c.
00:43:41.902994 ARP, Request who-has inenbasavbl1c.
00:43:42.901762 ARP, Request who-has inenbasavbl1c.
00:43:43.901758 ARP, Request who-has inenbasavbl1c.
Thanks,
Sunil
#14
Hi Sunil,
Sorry, I'm not sure what you're trying to show me here. You can ping 10.0.0.2 and not 10.0.0.3, but I don't know where those interfaces reside in your setup. Can you show me the output of ifconfig -a on this machine, and also of ovs-dpctl dump-flows while you are pinging? Also, why are you showing me a tcpdump on phy-br-eth0? You should be doing that on eth0, since you say the packets are not getting there.
Thanks,
Aaron
P.S: I'll also be in #openstack-dev for a little while longer tonight.
#15
Someone else has been on the setup and things have changed a bit.
He said he added a flow entry, but I did not have time to follow up. I am in R/O mode. :-)
br-eth0 was brought up and that may have changed the behavior.
Now ARP is reaching the other machine, and I can see the traffic on eth0, phy-br-eth0 and int-br-eth0.
But there is no ARP reply. The ARPs are not getting to any of the TAP interfaces (but I only checked once).
br-int seems to be dropping them now.
BTW, one TAP interface is gone because its VM was brought down, but ovs-vsctl still lists it.
stack@esg-
system@br-eth0:
lookups: hit:658100 missed:119465 lost:0
flows: 27
port 0: br-eth0 (internal)
port 6: eth0
port 9: phy-br-eth0
ovs-dpctl: opening datapath flows failed (No such device)
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
system@br-eth0:
lookups: hit:660125 missed:119838 lost:0
flows: 30
port 0: br-eth0 (internal)
port 6: eth0
port 9: phy-br-eth0
ovs-dpctl: opening datapath flows failed (No such device)
stack@esg-
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
stack@esg-
system@br-int:
lookups: hit:631616 missed:143682 lost:0
flows: 26
port 0: br-int (internal)
Sep 21 03:00:32|
port 1: tap26583155-34 (internal)
port 14: tapd1802d22-b4
port 15: tapfa0e7fcf-8d
port 16: tap5eb27feb-05
port 18: int-br-eth0
ovs-dpctl: opening datapath flows failed (No such device)
#16
Sorry, I need to go. Someone from EMC will follow up.
#18
As seen in comment #11,
you are using local mode, in which traffic does not leave the machine. You can try multiple VMs on the same machine; they should be able to ping each other.
#19
Following on from Yong's suggestion about flow control rules, I checked the rules on br-int and br-eth0 on both nodes. The example below is from br-int on the controller, where port 20 was set to drop by default. This was the same for br-int on the other node, and for br-eth0 on both nodes.
sudo ovs-ofctl show br-int
OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:00005aa5a9
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
2(tap55d1e5e8-ab): addr:0b:
config: PORT_DOWN
state: LINK_DOWN
18(tape8d6e0a5
config: 0
state: 0
current: 10MB-FD COPPER
19(tap6176588e
config: 0
state: 0
current: 10MB-FD COPPER
20(int-br-eth0): addr:6a:
config: 0
state: 0
current: 10GB-FD COPPER
$ sudo ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=
cookie=0x0, duration=
Once I opened these up for both br-int and br-eth0 on both sides, I could ping instances from either side, so this is now working.
Is there any reason why these ports would be set to drop by default?
Thanks
Eoghan
#20
Both of your networks are of the local type:
stack@esg-
+------
| Field | Value |
+------
| admin_state_up | True |
| id | fa8f9c5e-
| name | net1 |
| provider:
The flow is set to drop by default. If there are VMs on a network with the vlan network_type, the port will be opened.
#21
I had these in the localrc before I ran stack.sh
ENABLE_
TENANT_
PHYSICAL_
And nova.conf had vlan_interface=eth0
ovs_quantum_
bridge_mappings = eth0:br-eth0
tenant_network_type = vlan
network_vlan_ranges = eth0:1:1000
Should this be sufficient for the networks to run as VLAN type?
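One way to sanity-check settings like these is to confirm that every physical network named in network_vlan_ranges also has an entry in bridge_mappings. The sketch below does that for the two values quoted above. The function names are illustrative, each entry is assumed to have the full name:min:max or name:bridge form, and this is not how the Quantum agent itself parses its configuration.

```python
def parse_bridge_mappings(value):
    """Parse 'physnet:bridge,...' into a dict."""
    mappings = {}
    for item in value.split(","):
        physnet, bridge = item.strip().split(":")
        mappings[physnet] = bridge
    return mappings

def parse_vlan_ranges(value):
    """Parse 'physnet:min:max,...' into {physnet: (min, max)}."""
    ranges = {}
    for item in value.split(","):
        physnet, low, high = item.strip().split(":")
        ranges[physnet] = (int(low), int(high))
    return ranges

# Values quoted from the configuration above.
bridges = parse_bridge_mappings("eth0:br-eth0")
vlans = parse_vlan_ranges("eth0:1:1000")

# Physical networks with a VLAN range but no bridge to carry them.
unmapped = [net for net in vlans if net not in bridges]
print(bridges, vlans, unmapped)
```

An empty unmapped list only shows the two settings agree with each other; it says nothing about whether existing networks were created with the vlan type.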
#22
Yes. But for the networks on multiple nodes to connect to each other, you need a corresponding actual physical network that carries the given VLAN ID.
For example, If your virtual network has | provider:
#23
If your original question is answered, we should close this question. If there are new questions, we should open new ones. Mixing different questions in one thread makes it harder for others to search.
#25
Thanks Eoghan, that solved my question.