ceph osd fails on worker nodes but not on master - server name not found: ceph-mon-discovery.ceph.svc.cluster.local

Asked by chinasubbareddy mallavarapu

When setting up a multinode deployment, the ceph-osd pods fail on the worker nodes but not on the master node.

root@ceph-mon1:~# kubectl get po -n ceph -o wide |grep osd
ceph-osd-default-83945928-5czl6 0/1 CrashLoopBackOff 432 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-9psxt 0/1 CrashLoopBackOff 432 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-osd-default-83945928-kg5t6 1/1 Running 0 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal

Here are the logs from one of the failing pods:

root@ceph-mon1:~# kubectl logs ceph-osd-default-83945928-5czl6 -n ceph
LAUNCHING OSD: in directory:directory mode
+ echo 'LAUNCHING OSD: in directory:directory mode'
+ exec /tmp/osd-directory.sh
+ export LC_ALL=C
+ LC_ALL=C
+ : ceph2
+ : 'root=default host=ceph2'
+ : /var/lib/ceph/osd/ceph
+ : /var/lib/ceph/journal
+ : /var/lib/ceph/bootstrap-osd/ceph.keyring
+ is_available rpm
+ command -v rpm
+ is_available dpkg
+ command -v dpkg
+ OS_VENDOR=ubuntu
+ source /etc/default/ceph
++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
++ ceph -v
++ egrep -q '12.2|luminous'
++ echo 0
+ [[ 0 -ne 0 ]]
+ [[ ! -d /var/lib/ceph/osd ]]
+ '[' -z ceph2 ']'
++ find /var/lib/ceph/osd -prune -empty
+ [[ -n /var/lib/ceph/osd ]]
+ echo 'Creating osd'
Creating osd
++ uuidgen
+ UUID=5f7a4e0d-3de6-4620-bd94-6f8676a06b6c
++ ceph-authtool --gen-print-key
+ OSD_SECRET=AQCQJtdaO4gCNRAAqLweK/IhObI5EKAvYZ0Rpg==
++ echo '{"cephx_secret": "AQCQJtdaO4gCNRAAqLweK/IhObI5EKAvYZ0Rpg=="}'
++ ceph osd new 5f7a4e0d-3de6-4620-bd94-6f8676a06b6c -i - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring
unable to parse addrs in 'ceph-mon-discovery.ceph.svc.cluster.local'
InvalidArgumentError does not take keyword arguments
+ OSD_ID='server name not found: ceph-mon-discovery.ceph.svc.cluster.local (Temporary failure in name resolution)'

We can resolve the name successfully from the master node, so I'm not sure why it is failing:

root@ceph-mon1:~# nslookup ceph-mon-discovery.ceph.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53

Non-authoritative answer:
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.5
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.3
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.2
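
Note that the nslookup above was run on ceph-mon1, while the crash-looping OSD pods are on ceph1 and ceph2; service DNS may behave differently there. A minimal sketch of checking resolution where the failure actually occurs (the pod name is taken from the listing above; the busybox image and node name pinning are assumptions, adjust for your cluster):

```shell
# Try the lookup from inside the failing OSD pod's namespace rather than
# from the master node:
kubectl exec -n ceph ceph-osd-default-83945928-9psxt -- \
    nslookup ceph-mon-discovery.ceph.svc.cluster.local

# If the pod restarts too quickly to exec into, run a throwaway pod pinned
# to the same worker node and test from there instead:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
    --overrides='{"spec":{"nodeName":"ceph1.c.kube5s-199510.internal"}}' \
    -- nslookup ceph-mon-discovery.ceph.svc.cluster.local
```

If the lookup times out only on the worker nodes, the problem is reachability of the cluster DNS service (10.96.0.10) from those hosts, not the DNS records themselves.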

In fact, the whole ceph cluster looks like this:

root@ceph-mon1:~# kubectl get po -n ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE
ceph-bootstrap-rcjqn 0/1 CrashLoopBackOff 434 1d 192.168.108.17 ceph2.c.kube5s-199510.internal
ceph-cephfs-provisioner-56cd9948c5-rh2sf 0/1 Init:0/1 0 1d 192.168.193.209 ceph1.c.kube5s-199510.internal
ceph-cephfs-provisioner-56cd9948c5-snqmr 0/1 Init:0/1 0 1d 192.168.108.19 ceph2.c.kube5s-199510.internal
ceph-mds-679f98dd45-w99x4 0/1 Init:0/2 0 1d 192.168.108.15 ceph2.c.kube5s-199510.internal
ceph-mgr-7c66bd658-wbjtx 0/1 CrashLoopBackOff 448 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-mon-9fgt8 0/1 Running 1 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal
ceph-mon-check-74b98c966b-vt9wr 1/1 Running 0 1d 192.168.193.205 ceph1.c.kube5s-199510.internal
ceph-mon-vnfd8 0/1 CrashLoopBackOff 201 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-mon-vxgw9 0/1 CrashLoopBackOff 202 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-5czl6 0/1 CrashLoopBackOff 433 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-9psxt 0/1 CrashLoopBackOff 432 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-osd-default-83945928-kg5t6 1/1 Running 0 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal
ceph-rbd-pool-qzwr6 0/1 CrashLoopBackOff 409 1d 192.168.108.21 ceph2.c.kube5s-199510.internal
ceph-rbd-provisioner-69c59fb6f6-22nfc 0/1 Init:0/1 0 1d 192.168.193.210 ceph1.c.kube5s-199510.internal
ceph-rbd-provisioner-69c59fb6f6-kcb8f 0/1 Init:0/1 0 1d 192.168.108.16 ceph2.c.kube5s-199510.internal
ceph-rgw-85d66f9658-84rw4 0/1 Init:0/3 0 1d 192.168.193.206 ceph1.c.kube5s-199510.internal

Question information

Language: English
Status: Solved
For: openstack-helm
Assignee: No assignee
Solved by: chinasubbareddy mallavarapu
chinasubbareddy mallavarapu (chinasubbareddy) said:
#1

Converting this to a question, as the problem is with the environment, which is running on top of gcloud.

chinasubbareddy mallavarapu (chinasubbareddy) said:
#2

GCE blocks traffic between hosts by default; run the following command to allow Calico traffic to flow between containers on different hosts (where the source-ranges parameter assumes you have created your project with the default GCE network parameters - modify the address range if yours is different):

gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:

gcloud compute firewall-rules list
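
Once the rule is in place, one way to confirm the fix took effect is to re-run the checks that were failing, this time from a worker node. A sketch, assuming SSH access to the worker by its instance name:

```shell
# Inspect the new rule's protocol (4 = IP-in-IP, used by Calico's IPIP
# encapsulation) and source range:
gcloud compute firewall-rules describe calico-ipip

# Re-check service DNS from a worker node, where the OSD pods were failing,
# querying the cluster DNS service directly:
ssh ceph1.c.kube5s-199510.internal \
    nslookup ceph-mon-discovery.ceph.svc.cluster.local 10.96.0.10

# The crash-looping pods should recover on their next restart; watch them:
kubectl get po -n ceph -o wide -w
```

The key point is that host-to-host IPIP traffic was being dropped by GCE, so anything crossing the Calico overlay (including lookups against the kube-dns service IP) failed on nodes other than the one hosting the endpoint.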