Problems launching instances

Asked by Diego Lalo

Hi everyone, last week I managed to configure a two-node OpenStack cloud and it was working fine, but today it is not working! I don't know what the problem is. I restarted the cloud controller and restarted the services, but nothing worked. Here is the output of the nova-compute log:

2011-05-16 17:24:19,823 DEBUG nova.rpc [-] received {u'_context_request_id': u'2S-U0RPA6VPLVL73BB07', u'_context_read_deleted': False, u'args': {u'instance_id': 18, u'injected_files': None, u'availability_zone': None}, u'_context_is_admin': True, u'_context_timestamp': u'2011-05-16T16:24:19Z', u'_context_user': u'myuser', u'method': u'run_instance', u'_context_project': u'proj1', u'_context_remote_address': u'192.168.1.30'} from (pid=1048) _receive /usr/lib/pymodules/python2.6/nova/rpc.py:177
2011-05-16 17:24:19,824 DEBUG nova.rpc [-] unpacked context: {'timestamp': u'2011-05-16T16:24:19Z', 'remote_address': u'192.168.1.30', 'project': u'proj1', 'is_admin': True, 'user': u'myuser', 'request_id': u'2S-U0RPA6VPLVL73BB07', 'read_deleted': False} from (pid=1048) _unpack_context /usr/lib/pymodules/python2.6/nova/rpc.py:350
2011-05-16 17:24:19,895 AUDIT nova.compute.manager [2S-U0RPA6VPLVL73BB07 myuser proj1] instance 18: starting...
2011-05-16 17:24:20,042 DEBUG nova.rpc [-] Making asynchronous call on network.openstack-in-a-box-1rev85 ... from (pid=1048) call /usr/lib/pymodules/python2.6/nova/rpc.py:370
2011-05-16 17:24:20,043 DEBUG nova.rpc [-] MSG_ID is 1d4c30b82a514def8264d5a37a449879 from (pid=1048) call /usr/lib/pymodules/python2.6/nova/rpc.py:373
2011-05-16 17:24:23,646 INFO nova.compute.manager [-] Found instance 'instance-00000012' in DB but no VM. State=0, so setting state to shutoff.
2011-05-16 17:24:23,647 INFO nova.compute.manager [-] DB/VM state mismatch. Changing state from '0' to '5'
2011-05-16 17:25:23,856 INFO nova.compute.manager [-] Found instance 'instance-00000012' in DB but no VM. State=5, so setting state to shutoff.
2011-05-16 17:26:23,925 INFO nova.compute.manager [-] Found instance 'instance-00000012' in DB but no VM. State=5, so setting state to shutoff.

From what I understand reading the log, the spawning process is interrupted after the asynchronous call to the network service: the nova compute manager waits about 3 seconds and, because there is no response to the network call, it changes the state of the VM to shutoff. Here is the nova.conf of the cloud controller:

--dhcpbridge_flagfile=/etc/nova/nova.conf
--dhcpbridge=/usr/bin/nova-dhcpbridge
--logdir=/var/log/nova
--state_path=/var/lib/nova
--lock_path=/var/lock/nova
--verbose
--s3_host=192.168.1.234
--rabbit_host=192.168.1.234
--cc_host=192.168.1.234
--ec2_url=http://192.168.1.234:8773/services/Cloud
--fixed_range=10.0.0.0/12
--network_size=2048
--network_manager=nova.network.manager.FlatDHCPManager
--flat_interface=eth0
--routing_source_ip=192.168.1.234
--FAKE_subdomain=ec2
--image_service=nova.image.glance.GlanceImageService
--glance_host=192.168.1.234
--glance_port=9292
--sql_connection=mysql://root:nova@192.168.1.234/nova

And here is the output of ifconfig on the cloud controller:

eth0 Link encap:Ethernet HWaddr 00:24:8c:2e:50:88
          inet addr:192.168.1.234 Bcast:192.168.1.255 Mask:255.255.255.0
          inet6 addr: fe80::224:8cff:fe2e:5088/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:305327 errors:0 dropped:0 overruns:0 frame:0
          TX packets:155066 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:29213846 (29.2 MB) TX bytes:16316677 (16.3 MB)
          Interrupt:42 Base address:0x8000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:1439806 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1439806 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:118299280 (118.2 MB) TX bytes:118299280 (118.2 MB)

virbr0 Link encap:Ethernet HWaddr 66:d1:2b:48:79:ff
          inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

I noticed that there is no br100 before trying to launch an instance, and after trying I still can't see the bridge. I haven't modified anything in the configuration since last Friday, so I don't understand why the cloud is not working.
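For reference, this is how I look for the bridge (a sketch; it assumes bridge-utils is installed, and br100 is the default bridge name that FlatDHCPManager creates the first time it sets up the network):

brctl show           # lists all bridges; br100 should appear once nova-network has run
ip addr show br100   # shows the gateway address nova-network assigns from fixed_range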

Cheers
Diego

Question information

Language: English
Status: Solved
For: OpenStack Compute (nova)
Assignee: No assignee
Solved by: Diego Lalo

Vish Ishaya (vishvananda) said (#1):

Are your services still running? It sounds like nova-network may be down.

Vish

Diego Lalo (diego-lalo) said (#2):

root@openstack:~# ps ax | grep nova
 1023 ? Ss 0:00 su -c nova-api --flagfile=/etc/nova/nova.conf nova
 1038 ? S 0:04 /usr/bin/python /usr/bin/nova-api --flagfile=/etc/nova/nova.conf
 1040 ? Ss 0:00 su -c nova-compute --flagfile=/etc/nova/nova.conf nova
 1048 ? S 2:11 /usr/bin/python /usr/bin/nova-compute --flagfile=/etc/nova/nova.conf
 1057 ? S 0:00 /usr/bin/python /usr/bin/nova-objectstore --uid 110 --gid 65534 --pidfile /var/run/nova/nova-objectstore.pid --flagfile=/etc/nova/nova.conf --nodaemon --logfile=/var/log/nova/nova-objectstore.log
 1842 ? Ss 0:00 su -c nova-scheduler --flagfile=/etc/nova/nova.conf nova
 1843 ? S 1:44 /usr/bin/python /usr/bin/nova-scheduler --flagfile=/etc/nova/nova.conf
 3148 ? Ss 0:00 su -c nova-network --flagfile=/etc/nova/nova.conf nova
 3149 ? S 0:30 /usr/bin/python /usr/bin/nova-network --flagfile=/etc/nova/nova.conf
 3521 pts/0 S+ 0:00 grep --color=auto nova

Vish Ishaya (vishvananda) said (#3):

Try:
rabbitmqctl list_queues

If any queue reports a number other than 0, something isn't pulling messages off it properly. Check the logs of that component to see if it is reporting errors, and restart it.
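For example, something along these lines would narrow it down (a sketch; the log path and service name assume the Ubuntu packaging used here, which logs to /var/log/nova/ per the nova.conf above):

rabbitmqctl list_queues | awk '$2 > 0'     # print only queues with pending messages
tail -n 50 /var/log/nova/nova-network.log  # check the suspect component for errors
sudo service nova-network restart         # restart just that component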

Vish

Diego Lalo (diego-lalo) said (#4):

{error_logger,{{2011,5,16},{18,37,53}},std_error,"File operation error: eacces. Target: /etc/hosts. Function: get_file. Process: inet_db."}

=ERROR REPORT==== 16-May-2011::18:37:53 ===
File operation error: eacces. Target: /etc/hosts. Function: get_file. Process: inet_db.
Listing queues ...
1d4c30b82a514def8264d5a37a449879 0
compute.openstack 0
b7380f15d08c4c50ac4ad3915b4f62a5 0
scheduler.openstack 0
compute_fanout_946ec41b9ae04b51a8daf46ddd89b922 0
network.openstack 0
network_fanout_ec4d522fda714aa2885cd132e1e35695 0
compute_fanout_f3dee487234648f89beb0f5071cd9593 0
scheduler_fanout_5a81125db31e49edac3d268052d1abfb 0
090f6b86a38348cfb4890d4914b57d7e 0
network_fanout_81b19b55ac7b4ae884c1107fc3272924 0
compute 0
scheduler 0
network 0
network_fanout_3bc4d7b2b41e4dcab9c47cd7fc8e8b36 0
798f654b5cfe4e46abec4f52038c0ac8 0
compute.eucalyptus 0
...done.

This is the output of the rabbitmq log, <email address hidden>:

=WARNING REPORT==== 16-May-2011::18:39:39 ===
exception on TCP connection <0.7872.0> from 192.168.1.234:44306
connection_closed_abruptly

=INFO REPORT==== 16-May-2011::18:39:39 ===
closing TCP connection <0.7872.0> from 192.168.1.234:44306

=INFO REPORT==== 16-May-2011::18:39:39 ===
accepted TCP connection on 0.0.0.0:5672 from 192.168.1.234:53720

=INFO REPORT==== 16-May-2011::18:39:39 ===
starting TCP connection <0.23529.0> from 192.168.1.234:53720

=WARNING REPORT==== 16-May-2011::18:39:39 ===
exception on TCP connection <0.23529.0> from 192.168.1.234:53720
connection_closed_abruptly

=INFO REPORT==== 16-May-2011::18:39:39 ===
closing TCP connection <0.23529.0> from 192.168.1.234:53720

=INFO REPORT==== 16-May-2011::18:39:39 ===
accepted TCP connection on 0.0.0.0:5672 from 192.168.1.234:53721

=INFO REPORT==== 16-May-2011::18:39:39 ===
starting TCP connection <0.23535.0> from 192.168.1.234:53721

=INFO REPORT==== 16-May-2011::18:39:40 ===
accepted TCP connection on 0.0.0.0:5672 from 192.168.1.50:50550

=INFO REPORT==== 16-May-2011::18:39:40 ===
starting TCP connection <0.23544.0> from 192.168.1.50:50550

=INFO REPORT==== 16-May-2011::18:39:40 ===
accepted TCP connection on 0.0.0.0:5672 from 192.168.1.50:50551

=INFO REPORT==== 16-May-2011::18:39:40 ===
starting TCP connection <0.23551.0> from 192.168.1.50:50551

=WARNING REPORT==== 16-May-2011::18:39:42 ===
exception on TCP connection <0.23535.0> from 192.168.1.234:53721
connection_closed_abruptly

=INFO REPORT==== 16-May-2011::18:39:42 ===
closing TCP connection <0.23535.0> from 192.168.1.234:53721

Vish Ishaya (vishvananda) said (#5):

Something looks a little odd there. I would try restarting rabbit and then restarting all of the services.
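Something like this, assuming the Ubuntu service names (a sketch; trim the list to whichever services actually run on each node):

sudo service rabbitmq-server restart
for svc in nova-api nova-scheduler nova-network nova-compute nova-objectstore; do
    sudo service $svc restart
done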

Diego Lalo (diego-lalo) said (#6):

I restarted rabbitmq and then all of the nova services. I also checked the message queues again and they are all empty, but I noticed something strange at the start of the queues list:

{error_logger,{{2011,5,16},{18,55,20}},std_error,"File operation error: eacces. Target: /etc/hosts. Function: get_file. Process: inet_db."}

=ERROR REPORT==== 16-May-2011::18:55:20 ===
File operation error: eacces. Target: /etc/hosts. Function: get_file. Process: inet_db.
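In case it is relevant, this is how I can check whether the rabbitmq user is actually able to read the file (a sketch; the unprivileged rabbitmq user comes with the Ubuntu package):

ls -l /etc/hosts                      # ownership and permissions of the file
sudo -u rabbitmq head -1 /etc/hosts   # prints the first line only if rabbitmq can read it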

Is this a problem?

Cheers
Diego

Vish Ishaya (vishvananda) said (#7):

Not sure, I've never seen that message before.

Vish

Diego Lalo (diego-lalo) said (#8):

Vish, thank you very much for your help!! I finally discovered what the problem was: on Friday, before I left the office, I changed the hostname of the cloud controller, and I suspect that triggered the problem. I changed the hostname back to the previous name but still couldn't make the services work, so I had no choice but to reinstall the cloud controller. I noticed the hostname issue when I looked at the services table in the nova DB: there were several nova-compute services, some corresponding to the old hostname and some to the new one. So everyone who reads this message, remember: do NOT change the hostname!!
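For anyone hitting the same thing, the duplicate entries are easy to spot in the database (a sketch, using the connection details from the nova.conf above; services is a standard table in the nova schema of this era):

mysql -u root -p nova -e "SELECT id, host, binary, updated_at FROM services;"

Each nova daemon registers a row keyed by its hostname, so as far as I can tell a renamed controller shows up as a second set of rows whose updated_at column stops advancing.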

Cheers
Diego