FlatDHCPManager makes the wrong bridge and loses connectivity

Asked by arturo lorenzo

If I use a 10.0.0.0/24 network and FlatDHCPManager, then br100 shows 10.0.0.1 and eth0 does not have any IP at all (as it should).
The problem happens when restarting all the nova services.

If I change it back to FlatManager, then br100 shows the correct IP and eth0 still does not have any IP, which is correct.

Why?

Question information

Language: English
Status: Solved
For: OpenStack Compute (nova)
Assignee: No assignee
Solved by: arturo lorenzo
Dan Prince (dan-prince) said:
#1

Are you by any chance using the --flat-interface option with FlatDHCPManager? If so, I think FlatDHCPManager will try to bridge into that interface (this involves moving the IP of that interface to the br100 bridge).

Perhaps you can use --flat-interface=eth1 or another unused interface (rather than eth0)?

--

Just some ideas. Might be more helpful to see your config file.
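For illustration, a minimal sketch of the nova.conf flags this suggestion implies (the values here are assumptions based on the question's 10.0.0.0/24 network, not a tested config):

 # sketch: FlatDHCPManager with a dedicated, unused flat interface
 --network_manager=nova.network.manager.FlatDHCPManager
 --flat_interface=eth1
 --fixed_range=10.0.0.0/24

With eth1 as the flat interface, nova bridges eth1 into br100 and leaves eth0 and its static IP untouched.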

arturo lorenzo (arturo-lorenzo) said:
#2

Thanks Dan for your advice. Actually I have done the following:
01) I reinstalled nova bexar on a fresh machine with nova-CC-install-v1.1.sh
02) After a successful installation the default nova.conf uses FlatManager, and the bridge is therefore set up in /etc/network/interfaces; so in order to use FlatDHCPManager we first have to stop nova
03) Stop all nova components
04) Then change /etc/nova/nova.conf, replacing FlatManager with FlatDHCPManager, and add --flat_interface=eth1 --flat_injected=False --public_interface=eth0 (fortunately I have eth1 unused); see the sketch below
05) Then change /etc/network/interfaces back to the original config (see file interfaces.ORIG under the current dir)
06) Then reboot
At this point I was able to launch and access the VMs successfully.
ifconfig now shows br100 with 10.0.0.1 and eth0 with my static IP, and brctl shows br100 with eth1 and all the vnetX devices correctly.
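For reference, a sketch of what the flag changes from steps 02-04 would look like in /etc/nova/nova.conf (only the changed lines; the rest stays as the installer wrote it):

 # was: --network_manager=nova.network.manager.FlatManager
 --network_manager=nova.network.manager.FlatDHCPManager
 --flat_interface=eth1
 --flat_injected=False
 --public_interface=eth0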

What I am working on now is scaling, by installing nova-NODE-installer.sh on another computer to provide more VMs.
The problem I have at this moment is that the compute nodes are providing VMs OK, but they cannot access the cloud controller. ifconfig now shows
br100 with no address and eth0 with my static IP, and brctl shows br100 with eth0

Thanks!

Vish Ishaya (vishvananda) said:
#3

You need to make the same settings on the compute hosts:
--flat_interface=eth1 and FlatDHCPManager

br100 will not get an IP address on the compute hosts. You may have to delete br100 manually and launch a new instance after changing the settings for it to be recreated properly.
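A sketch of the manual cleanup Vish describes, on a compute host (assumes the bridge is named br100, as elsewhere in this thread):

 # tear down the stale bridge so nova can rebuild it with the new settings
 ifconfig br100 down
 brctl delbr br100
 # restart nova-compute, then launch a new instance; br100 is recreated on eth1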

arturo lorenzo (arturo-lorenzo) said:
#4

Vish, thanks for your response; I don't quite understand what you meant by: "You may have to delete br100 manually and launch a new instance after changing the settings for it to be recreated properly".
If I delete br100 via: ifconfig br100 down; brctl delbr br100, then as soon as I start another instance br100 will be recreated.
Do I have to connect the compute node on eth1 and create a private net that way?
That means that the Cloud Controller will have static IP 192.168.2.10 on eth0 and no IP on eth1, which is connected to the compute node on eth0?
Thanks!

Vish Ishaya (vishvananda) said:
#5

For sanity, you should probably use the same eth device on all of your nodes. For example, use eth1 on the node running nova-network and on the node running nova-compute. The node running nova-network will get an IP on br100 (it listens for DHCP requests on this IP), but the node running nova-compute will not. br100 on both hosts will automatically be bridged into whichever device is specified with --flat_interface. Make sure that you specify the same --network_manager on both hosts.
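In other words, both the nova-network host and every nova-compute host would carry the same two flags (a sketch, assuming eth1 is the flat interface everywhere):

 --network_manager=nova.network.manager.FlatDHCPManager
 --flat_interface=eth1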

Vish


arturo lorenzo (arturo-lorenzo) said:
#6

I think I have the setup in accordance with your recommendations. I have cloudcontroller1 with a public interface eth0 and eth1 on a private net; br100 comes up OK with 10.0.0.1 and I am using eth2 as my flat_interface.
On compute1 I have my public interface eth0 connected to the same switch as eth1 on cloudcontroller1; br100 does not show any IP and I am using eth2 as my flat_interface too. Both nova.conf files contain FlatDHCPManager.
I can see packets going over eth2 every time I ping the instance (10.0.0.2) running on compute1, from cloudcontroller1 or from the compute1 node.
But the problem is the same (I cannot ssh into it because the instance cannot communicate correctly with cloudcontroller1); see the console.log:
============================================================================
cloud-init start running: Mon, 21 Mar 2011 17:48:12 +0000. up 1.38 seconds
2011-03-21 17:48:14,703 - DataSourceEc2.py[WARNING]: waiting for metadata service at http://169.254.169.254/2009-04-04/meta-data/instance-id

2011-03-21 17:48:14,704 - DataSourceEc2.py[WARNING]: 17:48:14 [ 1/100]: url error [timed out]

2011-03-21 17:48:17,709 - DataSourceEc2.py[WARNING]: 17:48:17 [ 2/100]: url error [timed out]

2011-03-21 17:48:20,713 - DataSourceEc2.py[WARNING]: 17:48:20 [ 3/100]: url error [timed out]
============================================================================

From the compute1 node I can access cloudcontroller1 with this cmd:

 wget http://169.254.169.254:80

--2011-03-21 14:05:01-- http://169.254.169.254/
Connecting to 169.254.169.254:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 92 [text/html]
Saving to: `index.html'

100%[=======================================================================================================================>] 92 --.-K/s in 0s

2011-03-21 14:05:01 (21.0 MB/s) - `index.html' saved [92/92]

So I don't know what else I need to do.
Thanks!

Vish Ishaya (vishvananda) said:
#7

If this is a desktop image, you may have to give the 169.254 address to the network host, something like:
 ip addr add 169.254.169.254/32 scope link dev eth1
This will allow it to ARP for the address. The eth device that you add the address to isn't particularly important, although if you decide to add it to br100 you should probably use scope global instead of scope link, or the ordering of IP addresses can sometimes mess up DHCP.
If this is not a desktop image, then you may be having issues with your forwarding rules. Check:
 iptables -L -n -v
for the 169.254 rule. Make sure that the rule has the proper IP for your API server, and make sure that the rule is actually getting hit.
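Two hedged examples of the checks above (note the 169.254 DNAT rule lives in the nat table, so listing it needs -t nat; the destination IP and port it shows depend on your API server settings):

 # desktop-image case: let the network host answer ARP for the metadata address
 ip addr add 169.254.169.254/32 scope link dev eth1
 # forwarding-rule case: inspect the metadata DNAT rule and its packet counters
 iptables -t nat -L PREROUTING -n -v | grep 169.254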

Vish


arturo lorenzo (arturo-lorenzo) said:
#8

Vish,
I was able to make it work by doing the following:
01) Connect eth2 physically to the switch on both nodes, cloudcontroller1 and computecontroller1, with no IPs
02) Delete the iptables rule on computecontroller1 (i.e. the one added as: iptables -t nat -A PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination); see the sketch below
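(Sketch of the deletion: iptables removes a rule with -D and the same match/target used by the -A that created it; the original --to-destination value can be read back with iptables -t nat -S PREROUTING and is shown here as a placeholder:)

 iptables -t nat -D PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination <original destination>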
We plan to experiment with other combinations (configurations) and if something good comes out of it we will share. Our main goal is to stop using the two cables between the two eth2 interfaces.

Thanks!