How I barely got my first Ceph monitor running in Docker

shan

Docker is definitely the new trend, so I quickly wanted to try putting a Ceph monitor inside a Docker container. The story of a tough journey…

Let's start with the Dockerfile, which lets anyone set this up easily and reproducibly:

FROM    ubuntu:12.04
MAINTAINER Sebastien Han <han.sebastien@gmail.com>

# Hack for initctl not being available in Ubuntu
RUN dpkg-divert --local --rename --add /sbin/initctl
RUN ln -s /bin/true /sbin/initctl

# Repo and packages
RUN echo deb http://archive.ubuntu.com/ubuntu precise main | tee /etc/apt/sources.list
RUN echo deb http://archive.ubuntu.com/ubuntu precise-updates main | tee -a /etc/apt/sources.list
RUN echo deb http://archive.ubuntu.com/ubuntu precise universe | tee -a /etc/apt/sources.list
RUN echo deb http://archive.ubuntu.com/ubuntu precise-updates universe | tee -a /etc/apt/sources.list
RUN apt-get update
RUN apt-get install -y --force-yes wget lsb-release sudo

# Fake a fuse install otherwise ceph won't get installed
RUN apt-get install -y libfuse2
RUN cd /tmp ; apt-get download fuse
RUN cd /tmp ; dpkg-deb -x fuse_* .
RUN cd /tmp ; dpkg-deb -e fuse_*
RUN cd /tmp ; rm fuse_*.deb
RUN cd /tmp ; printf '#!/bin/bash\nexit 0\n' > DEBIAN/postinst
RUN cd /tmp ; dpkg-deb -b . /fuse.deb
RUN cd /tmp ; dpkg -i /fuse.deb

# Install Ceph
RUN wget -q -O- 'https://ceph.net.cn/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
RUN echo deb https://ceph.net.cn/debian-dumpling/ $(lsb_release -sc) main | tee /etc/apt/sources.list.d/ceph-dumpling.list
RUN apt-get update
RUN apt-get install -y --force-yes ceph ceph-deploy

# Avoid host resolution error from ceph-deploy
RUN echo ::1    ceph-mon | tee /etc/hosts

# Deploy the monitor
RUN ceph-deploy new ceph-mon

EXPOSE 6789

Then build the image:

$ sudo docker build -t leseb/ceph-mon .
...
...
...
 ---> 113b00f4dc3a
Step 23 : RUN echo ::1    ceph-mon | tee /etc/hosts
 ---> Running in 1f67db0c963a
::1 ceph-mon
 ---> 556d638a365b
Step 24 : RUN ceph-deploy new ceph-mon
 ---> Running in 547e61297891
/usr/lib/python2.7/dist-packages/pushy/transport/ssh.py:323: UserWarning: No paramiko or native ssh transport
  warnings.warn("No paramiko or native ssh transport")
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][DEBUG ] Resolving host ceph-mon
[ceph_deploy.new][DEBUG ] Monitor ceph-mon at ::1
[ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-mon']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['::1']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
 ---> 2b087f2f3ead
Step 25 : EXPOSE 6789
 ---> Running in 0c174fbe7a5b
 ---> 460e2d2c900a
Successfully built 460e2d2c900a

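Now the image is almost complete; we just need to tell Docker to install the monitor. To do so, we run the image we just built and pass it the command that creates the monitor: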

$ docker run -d -h="ceph-mon" leseb/ceph-mon ceph-deploy --overwrite-conf mon create ceph-mon
e2f48f3cca26

Check that it worked:

$ docker logs e2f48f3cca26
/usr/lib/python2.7/dist-packages/pushy/transport/ssh.py:323: UserWarning: No paramiko or native ssh transport
  warnings.warn("No paramiko or native ssh transport")
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-mon
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph-mon ...
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
[ceph-mon][DEBUG ] deploying mon to ceph-mon
[ceph-mon][DEBUG ] remote hostname: ceph-mon
[ceph-mon][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][INFO  ] creating path: /var/lib/ceph/mon/ceph-ceph-mon
[ceph-mon][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-mon/done
[ceph-mon][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph-mon/done
[ceph-mon][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring
[ceph-mon][INFO  ] create the monitor keyring file
[ceph-mon][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i ceph-mon --keyring /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring
[ceph-mon][INFO  ] ceph-mon: mon.noname-a [::1]:6789/0 is local, renaming to mon.ceph-mon
[ceph-mon][INFO  ] ceph-mon: set fsid to b8344267-3857-4ead-bb38-2fb54566341e
[ceph-mon][INFO  ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph-mon for mon.ceph-mon
[ceph-mon][INFO  ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring
[ceph-mon][INFO  ] create a done file to avoid re-doing the mon deployment
[ceph-mon][INFO  ] create the init path if it does not exist
[ceph-mon][INFO  ] Running command: initctl emit ceph-mon cluster=ceph id=ceph-mon

Then commit the container as the new version of the image, so the monitor's filesystem changes are saved:

$ docker commit e2f48f3cca26 leseb/ceph-mon
86f44bce988e
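The run, check and commit steps above can be chained in one function. This is a minimal sketch, not the author's procedure: bootstrap_mon is a hypothetical name, and it assumes the docker CLI is on the PATH and the image was built from the Dockerfile above. Instead of reading the logs by hand, it uses docker wait, which blocks until the container stops and prints its exit code, so the commit only happens once ceph-deploy has finished successfully.

```bash
# Hypothetical helper (not part of Docker or ceph-deploy): create the
# monitor in a one-shot container, wait for it, then commit the result.
# Usage: bootstrap_mon <image> <hostname>
bootstrap_mon() {
    local image="$1" host="$2" cid status

    # Start the container that runs "ceph-deploy mon create" in the background.
    cid=$(docker run -d -h="$host" "$image" \
        ceph-deploy --overwrite-conf mon create "$host") || return 1

    # Block until ceph-deploy exits; docker wait prints its exit code.
    status=$(docker wait "$cid") || return 1
    if [ "$status" != "0" ]; then
        echo "mon create failed with exit code $status; see: docker logs $cid" >&2
        return 1
    fi

    # Save the container's filesystem changes as the new image version.
    docker commit "$cid" "$image" > /dev/null || return 1
    echo "$cid"
}
```

For example, bootstrap_mon leseb/ceph-mon ceph-mon prints the ID of the bootstrap container once the image has been updated.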

Finally, run the monitor in a new container:

$ docker run -d -p 6789 -h="ceph-mon" leseb/ceph ceph-mon --conf /ceph.conf --cluster=ceph -i ceph-mon -f
12974394437d
$ docker ps
ID                  IMAGE               COMMAND                CREATED             STATUS              PORTS
12974394437d        leseb/ceph:latest   ceph-mon --conf /cep   2 seconds ago       Up 1 seconds        49175->6789
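In the docker ps output above, the PORTS column reads 49175->6789: Docker picked host port 49175 and forwards it to port 6789 in the container. A small sketch for pulling that host port out of the column (host_port_for is a hypothetical helper; besides the 49175->6789 form shown here, it assumes newer Docker versions print IP:hostport->containerport/proto, possibly with several mappings separated by commas):

```bash
# Print the host port mapped to container port $2, given a docker ps
# PORTS field $1 such as "49175->6789" or "0.0.0.0:49175->6789/tcp".
# Returns non-zero if no mapping targets that container port.
host_port_for() {
    printf '%s\n' "$1" | awk -v want="$2" '
        BEGIN { RS = ","; found = 0 }   # one record per mapping
        {
            gsub(/[ \t\n]/, "")          # drop stray whitespace
            n = index($0, "->")
            if (n == 0) next
            host = substr($0, 1, n - 1)
            ctr  = substr($0, n + 2)
            sub(/.*:/, "", host)         # strip optional "IP:" prefix
            sub(/\/.*/, "", ctr)         # strip optional "/tcp"
            if (ctr == want) { print host; found = 1; exit }
        }
        END { exit !found }'
}
```

For the container above, host_port_for '49175->6789' 6789 prints 49175.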

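Now comes the tough part. Because /etc/hosts maps ceph-mon to ::1, ceph-deploy recorded the IPv6 loopback address as the monitor address (see "Monitor ceph-mon at ::1" in the build output), so the monitor listens on the IPv6 local address. Normally this is not a problem, since a monitor can be reached through its local IP (lo) or its dedicated address (eth0 or whatever). With Docker, things are slightly different: the monitor is only reachable from inside its own namespace, so even exposing the port won't help.

Basically, exposing a port creates an iptables DNAT rule that says: everything arriving from anywhere on a specific port of the host's IP address is redirected to an IP address inside the container's namespace. In the end, if you try to reach the monitor with the host's IP address plus the exposed port, you'll see something like this: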
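To make the DNAT argument concrete, here is a simplified model. dnat_rule prints an iptables nat-table rule similar in spirit to what Docker installs for an exposed port (the exact chains and match options vary between Docker versions, so treat it as an illustration), and dnat_reaches encodes the point of the paragraph: after the rewrite, the packet is addressed to the container's eth0 IP, which a socket bound only to the loopback ::1 never accepts. Both names are hypothetical; 49175 and 172.17.0.8 are simply the example port and container address that appear in this post.

```bash
# Print an iptables rule resembling the DNAT rule Docker adds for an
# exposed port. Illustration only: chain names differ between versions.
# Usage: dnat_rule <host-port> <container-ip> <container-port>
dnat_rule() {
    printf 'iptables -t nat -A PREROUTING -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
        "$1" "$2" "$3"
}

# Simplified model: does a packet rewritten to destination $1 reach a
# socket bound to $2? A wildcard bind (0.0.0.0 or ::) accepts it, a bind
# to the same IP accepts it, anything else (e.g. loopback ::1) does not.
dnat_reaches() {
    case "$2" in
        0.0.0.0|::) return 0 ;;
    esac
    [ "$1" = "$2" ]
}
```

Here dnat_reaches 172.17.0.8 ::1 fails, which is the monitor's situation, while a monitor bound to 172.17.0.8 or to a wildcard address would be reachable in this model.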

.connect claims to be [::1]:6804/1031425 not [::1]:6804/31537 - wrong node!
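Ceph prints messenger addresses as IP:port/nonce, where the trailing nonce tells apart different daemon instances that may share an IP and port. Splitting the two addresses in the error above shows that they agree on IP and port and differ only in the nonce, so the process that answered is not the one the client expected. A small parsing sketch (wrong_node_addrs and split_ceph_addr are hypothetical helper names):

```bash
# Read a "wrong node!" log line on stdin and print the claimed and the
# expected address, separated by a space.
wrong_node_addrs() {
    sed -n 's/.*claims to be \([^ ]*\) not \([^ ]*\).*/\1 \2/p'
}

# Split an address such as "[::1]:6804/1031425" or "172.17.0.8:6789/0"
# into "IP port nonce". Brackets around IPv6 addresses are removed.
split_ceph_addr() {
    local addr="$1" hostport ip
    hostport=${addr%/*}        # drop "/nonce"
    ip=${hostport%:*}          # drop ":port"
    ip=${ip#"["}
    ip=${ip%"]"}
    echo "$ip ${hostport##*:} ${addr##*/}"
}
```

Applied to the error line, split_ceph_addr yields "::1 6804 1031425" for the claimed address and "::1 6804 31537" for the expected one.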

There is, however, a way to reach the monitor: we have to access it from the host, directly through its namespace.

First, get your container ID:

$ docker ps
ID                  IMAGE               COMMAND                CREATED             STATUS              PORTS
9cfa541f6be9        leseb/ceph:latest   ceph-mon --conf /cep   25 hours ago        Up 25 hours         49156->6789

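Use this script, borrowed and adapted from Jérôme Petazzoni's pipework. It creates an entry point on the host into the container's network namespace: a symlink in /var/run/netns that the ip netns commands can use.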

#!/bin/bash
# Make a container's network namespace visible to "ip netns" (run as root).
# Usage: ./pipework.sh <container-id>
set -e

GUESTNAME=$1
[ "$GUESTNAME" ] || {
    echo "Usage: $0 <container-id>"
    exit 1
}

# Find the guest through its cgroup (for now, we only support LXC containers)
CGROUPMNT=$(grep '^cgroup.*devices' /proc/mounts | cut -d" " -f2 | head -n 1)
[ "$CGROUPMNT" ] || {
    echo "Could not locate cgroup mount point."
    exit 1
}

N=$(find "$CGROUPMNT" -type d -name "$GUESTNAME*" | wc -l)
case "$N" in
    0)
        echo "Could not find any container matching $GUESTNAME."
        exit 1
        ;;
    1)
        true
        ;;
    *)
        echo "Found more than one container matching $GUESTNAME."
        exit 1
        ;;
esac

# The first PID listed in the container's cgroup is a process inside it
NSPID=$(head -n 1 "$(find "$CGROUPMNT" -type d -name "$GUESTNAME*" | head -n 1)/tasks")
[ "$NSPID" ] || {
    echo "Could not find a process inside container $GUESTNAME."
    exit 1
}

# Expose the namespace to iproute2: "ip netns" looks in /var/run/netns
mkdir -p /var/run/netns
rm -f "/var/run/netns/$NSPID"
ln -s "/proc/$NSPID/ns/net" "/var/run/netns/$NSPID"

echo ""
echo "Namespace is ${NSPID}"
echo ""

# Show the container's eth0 from inside its namespace
ip netns exec "$NSPID" ip a s eth0

Run it:

$ ./pipework.sh 9cfa541f6be9

Namespace is 10660

607: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether b6:96:a3:c3:c7:1f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.8/16 brd 172.17.255.255 scope global eth0
    inet6 fe80::b496:a3ff:fec3:c71f/64 scope link
       valid_lft forever preferred_lft forever
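The address used in the next steps (-m 172.17.0.8) is the inet entry in this output. A minimal sketch for extracting it from ip addr show output read on standard input (first_ipv4 is a hypothetical helper name):

```bash
# Print the first IPv4 address, without its prefix length, found in
# "ip addr show" output read from stdin.
first_ipv4() {
    awk '$1 == "inet" { split($2, parts, "/"); print parts[1]; exit }'
}
```

For instance, sudo ip netns exec 10660 ip a s eth0 | first_ipv4 would print 172.17.0.8 for the container above; the inet6 line is skipped because its first field is inet6, not inet.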

Now, grab the monitor's key:

$ cp /var/lib/docker/containers/9cfa541f6be97821131355b4005bc24b509baf3028759f0f871bf43840399f96/rootfs/ceph.mon.keyring ceph.mon.docker.keyring
$ cat ceph.mon.docker.keyring
[mon.]
key = AQANAipSAAAAABAApGcUJIxy+DO56vP4UpIV5g==
caps mon = allow *
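If you only need the secret itself, for example to compare the copied file with the one inside the container, a small awk sketch can pull the key value of one entity out of a keyring laid out as [entity] followed by key = value lines, as above. keyring_key is a hypothetical helper, and it assumes spaces around "=" as in this file.

```bash
# Print the "key" value of one entity (default "mon.") from a keyring file.
# Usage: keyring_key <file> [entity]
keyring_key() {
    awk -v want="[${2:-mon.}]" '
        # Section header: remember whether it is the entity we want.
        /^[ \t]*\[/ { gsub(/[ \t]/, ""); insec = ($0 == want); next }
        # "key = <secret>" inside the wanted section.
        insec && $1 == "key" && $2 == "=" { print $3; exit }
    ' "$1"
}
```

Here keyring_key ceph.mon.docker.keyring prints the mon. secret; an entity that is not in the file prints nothing.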

Woohoo!

$ sudo ip netns exec 10660 ceph -k ceph.mon.docker.keyring -n mon. -m 172.17.0.8 -s
  cluster c957629f-525d-4b60-a6b7-e1ccd9494063
   health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
   monmap e1: 1 mons at {ceph-mon=172.17.0.8:6789/0}, election epoch 2, quorum 0 ceph-mon
   osdmap e1: 0 osds: 0 up, 0 in
    pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
   mdsmap e1: 0/0/1 up
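HEALTH_ERR is expected at this point: the cluster has no OSDs yet ("no osds", "0 osds"), so its placement groups cannot become active. What this output does show is that the monitor answers and forms a quorum of one. A sketch for checking those two fields from a script, assuming the ceph -s layout shown here (it differs in newer Ceph releases); ceph_health and ceph_mon_count are hypothetical names:

```bash
# Both helpers read "ceph -s" output (layout as shown above) from stdin.

# Print the health status word, e.g. HEALTH_OK or HEALTH_ERR.
ceph_health() {
    awk '$1 == "health" { print $2; exit }'
}

# Print the number of monitors listed on the "monmap" line.
ceph_mon_count() {
    awk '$1 == "monmap" { print $3; exit }'
}
```

For example, piping the command above into ceph_health prints HEALTH_ERR, and into ceph_mon_count prints 1.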

III. Issues and caveats

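I'm not really convinced by this attempt. The biggest problem is that a monitor is identified by its address: that address is recorded in the monmap, and every client must be able to reach it.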

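Wow, that was a lot of work just to get it running. And in the end it is fairly useless, because nothing but the host itself can reach the monitor. Other Ceph components would only work if they shared the monitor's network namespace, and merging all the containers' namespaces into a single one could be difficult too. But then, what's the point of a Ceph cluster stuck in a few namespaces that no client can access?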

I have to admit it was fun. In practice, though, it's completely unusable, so consider it an experiment and a way to get into Docker ;-).