Running Containers Like Virtual Machines

Background

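Most article titles I come across say "run virtual machines like containers": everyone wants the isolation of a VM together with the convenience of a container, and open source projects such as Firecracker and Kata Containers are working on exactly that. Today let's flip it around and look at how to "run containers like virtual machines".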

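Why make a container behave like a VM? I mostly use containers in CI: spin up an environment quickly with docker, run unit or integration tests, then clean up the images afterwards. Simple and convenient. In CD, though, one thing is a real headache: debugging. zouquan once asked a question on Zhihu, "How do you conveniently debug and test in a containerized environment?", and a summary in one of the answers describes the crux of the problem well: although I develop locally, my application behaves as if it were running in k8s.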

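So how do we develop inside a container as if we were working locally? We certainly can't go through build, push, deploy on every code change. The answers to the question above rely on various tools to get this effect. I don't want to use those odd tools (I can't keep up with learning yet another one), so the only option left is to make the container behave like a VM.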

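Recently I came across the weaveworks/footloose project, whose tagline is exactly my original need: Containers that look like Virtual Machines. Let's start with the project's examples (well-written examples in an open source project really do get you started quickly).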

Feature Examples

Remote Control with Ansible

[root@yiran ansible]# footloose config create --replicas 1 # set the machine replica count to 1
[root@yiran ansible]# footloose create                     # create the target resources
INFO[0000] Creating SSH key: cluster-key ...            
INFO[0000] Docker Image: quay.io/footloose/centos7:0.6.1 present locally 
INFO[0000] Creating machine: cluster-node0 ...          
INFO[0001] Machine cluster-node0 is already created...  
[root@yiran ansible]# cat ansible.cfg                      # the ansible config sets the inventory and connection parameters
[defaults]
inventory=inventory.txt
remote_user=root
debug=no

[privilege_escalation]
become=no

[root@yiran ansible]# cat inventory.txt 
[all]
cluster-node0 ansible_connection=docker                    # connection type for the machine

[root@yiran ansible]# ansible -m ping all                  # verify ansible connectivity
cluster-node0 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

As you can see, footloose creates a machine (a container) that we can connect to remotely and control with Ansible. Now let's try an Ansible Playbook:

---
- name: Install nginx
  hosts: cluster-node0

  tasks:
  - name: Add epel-release repo
    yum:
      name: epel-release
      state: latest

  - name: Install nginx
    yum:
      name: nginx
      state: latest

  - name: Insert Index Page
    copy:
      content: "welcome to footloose nginx ansible example"
      dest: /usr/share/nginx/html/index.html

  - name: Start NGiNX
    service:
      name: nginx
      state: started

Result:

[root@yiran ansible]# ansible-playbook  example1.yml

PLAY [Install nginx] ********************************

TASK [Gathering Facts] *********************************
ok: [cluster-node0]

TASK [Add epel-release repo] *******************************
changed: [cluster-node0]

TASK [Install nginx] ********************************
changed: [cluster-node0]

TASK [Insert Index Page] ***********************************
changed: [cluster-node0]

TASK [Start NGiNX] **********************************
changed: [cluster-node0]

PLAY RECAP ************************
cluster-node0              : ok=5    changed=4    unreachable=0    failed=0   

[root@yiran ansible]# ansible all -m raw -a 'systemctl status nginx'
cluster-node0 | SUCCESS | rc=0 >>
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-09-19 08:51:26 UTC; 10s ago
  Process: 437 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
  Process: 436 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
  Process: 435 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
 Main PID: 438 (nginx)
   CGroup: /docker/6b8bd7e41a6a303d5cc023e2c2e576773649e4a5188f4ef15b0ad3079e148b49/system.slice/nginx.service
           ├─438 nginx: master process /usr/sbin/ngin
           ├─439 nginx: worker proces
           ├─440 nginx: worker proces
           ├─441 nginx: worker proces
           └─442 nginx: worker proces

Sep 19 08:51:26 node0 systemd[1]: Starting The nginx HTTP and reverse proxy server...
Sep 19 08:51:26 node0 nginx[436]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Sep 19 08:51:26 node0 nginx[436]: nginx: configuration file /etc/nginx/nginx.conf test is successful
Sep 19 08:51:26 node0 systemd[1]: Started The nginx HTTP and reverse proxy server.

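Being able to run an Ansible Playbook means we can do almost anything: we can sync code straight into the container with Ansible's rsync-based synchronize module, or apply configuration changes to the environment through a Playbook. Very convenient.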

SSH Connection

Since Ansible can control the machine, we can of course also connect over ssh, using the footloose ssh command that footloose provides out of the box:

[root@yiran ansible]# footloose ssh root@node0
Last login: Thu Sep 19 08:57:17 2019 from gateway
[root@node0 ~]# hostname
node0
[root@node0 ~]# logout
Connection to localhost closed.

Host Port Mapping

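When using containers we often run services that expose ports, and those ports need to be mapped on the host. Here is the footloose config file: it sets the machine count to 2 and maps container port 22 to host port 2222, with the host port incrementing by one for each additional machine: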

cluster:
  name: cluster
  privateKey: cluster-key
machines:
- count: 2
  spec:
    image: quay.io/footloose/centos7
    name: node%d
    portMappings:
    - containerPort: 22
      hostPort: 2222

Create the machines:

[root@yiran simple-hostPort]# footloose create
INFO[0000] Creating SSH key: cluster-key ...            
INFO[0000] Pulling image: quay.io/footloose/centos7 ... 
INFO[0013] Creating machine: cluster-node0 ...          
INFO[0014] Creating machine: cluster-node1 ...

Check the host ports with netstat. The docker-proxy processes show that footloose uses docker as its container management entry point:

[root@yiran simple-hostPort]# netstat -antp |grep 222
tcp6       0      0 :::2222                 :::*                    LISTEN      42227/docker-proxy  
tcp6       0      0 :::2223                 :::*                    LISTEN      42540/docker-proxy
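The two listening ports line up with the config: a single spec with count: 2 expands into node0 and node1, and the host port is offset by the machine index. Below is a minimal Go sketch of that expansion. It assumes the container name is the cluster name joined to the hostname with a dash (as the log output suggests) and that the host port is the configured value plus the index. The names expandMachines and machine are hypothetical and are not footloose's API.

```go
package main

import "fmt"

// machine is a hypothetical, simplified view of one expanded machine.
type machine struct {
	Container string // docker container name, e.g. cluster-node0
	Hostname  string // hostname inside the container, e.g. node0
	HostPort  int    // host port forwarded to container port 22
}

// expandMachines expands a spec with a name template such as "node%d"
// into count machines. Assumption: the host port grows by one per machine.
func expandMachines(cluster, nameTemplate string, hostPort, count int) []machine {
	machines := make([]machine, 0, count)
	for i := 0; i < count; i++ {
		hostname := fmt.Sprintf(nameTemplate, i)
		machines = append(machines, machine{
			Container: cluster + "-" + hostname,
			Hostname:  hostname,
			HostPort:  hostPort + i,
		})
	}
	return machines
}

func main() {
	for _, m := range expandMachines("cluster", "node%d", 2222, 2) {
		fmt.Printf("%s\t%s\t%d->22\n", m.Container, m.Hostname, m.HostPort)
	}
}
```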

Now a plain ssh command can connect to the containers:

[root@yiran simple-hostPort]# ssh root@127.0.0.1 -p 2222 -i cluster-key hostname
The authenticity of host '[127.0.0.1]:2222 ([127.0.0.1]:2222)' can't be established.
ECDSA key fingerprint is SHA256:a6w9oFXMxjPCIXV42C44ogH9uaOILQiAdo/nlGdOnoc.
ECDSA key fingerprint is MD5:6b:a8:78:08:78:63:d4:26:b8:11:9e:3c:31:24:ad:6e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
node0
[root@yiran simple-hostPort]# ssh root@127.0.0.1 -p 2223 -i cluster-key hostname
The authenticity of host '[127.0.0.1]:2223 ([127.0.0.1]:2223)' can't be established.
ECDSA key fingerprint is SHA256:o5cVIJ1MBlw/J/OcNcjZxjqogiIVe03HhU0ZYZEuyPM.
ECDSA key fingerprint is MD5:06:a6:4f:09:4c:23:1e:17:ee:f6:fe:f1:fd:35:e1:ba.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[127.0.0.1]:2223' (ECDSA) to the list of known hosts.
node1
[root@yiran simple-hostPort]# footloose show
NAME            HOSTNAME   PORTS      IP           IMAGE                       CMD          STATE     BACKEND
cluster-node0   node0      2222->22   172.17.0.2   quay.io/footloose/centos7   /sbin/init   Running   
cluster-node1   node1      2223->22   172.17.0.3   quay.io/footloose/centos7   /sbin/init   Running
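The same connections can be scripted. Here is a small Go sketch that builds the argument list for the plain ssh invocation used above, assuming the machines are reachable on 127.0.0.1 at the host ports listed by footloose show and that cluster-key sits in the working directory. sshArgs is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sshArgs builds the arguments for: ssh -i KEY -p PORT root@127.0.0.1 CMD
func sshArgs(keyPath string, hostPort int, remoteCmd string) []string {
	return []string{
		"-i", keyPath,
		"-p", strconv.Itoa(hostPort),
		"root@127.0.0.1",
		remoteCmd,
	}
}

func main() {
	// Print the command for each mapped port instead of running it.
	for _, port := range []int{2222, 2223} {
		fmt.Println("ssh " + strings.Join(sshArgs("cluster-key", port, "hostname"), " "))
	}
}
```

Passing the same slice to exec.Command("ssh", args...) from os/exec would run the command rather than print it.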

Those are three usage scenarios. Now let's look at how footloose implements them.

Implementation

Machine creation:

// CreateMachine creates and starts a new machine in the cluster.
func (c *Cluster) CreateMachine(machine *Machine, i int) error {
	name := machine.ContainerName()

	publicKey, err := c.publicKey(machine) // get the public key from the current host
	...

	cmd := "/sbin/init"                    // command the container runs
	if machine.spec.Cmd != "" {
		cmd = machine.spec.Cmd
	}

	if machine.IsIgnite() {                // check the backend
		...
	} else {                              
		runArgs := c.createMachineRunArgs(machine, name, i)
		_, err := docker.Create(machine.spec.Image,
			runArgs,
			[]string{cmd},
		)
		if err != nil {
			return err
		}

		if len(machine.spec.Networks) > 1 { // with several networks configured, connect the remaining ones in turn
			for _, network := range machine.spec.Networks[1:] {
				log.Infof("Connecting %s to the %s network...", name, network)
				if network == "bridge" {
					if err := docker.ConnectNetwork(name, network); err != nil {
						return err
					}
				} else {
					if err := docker.ConnectNetworkWithAlias(name, network, machine.Hostname()); err != nil {
						return err
					}
				}
			}
		}

		if err := docker.Start(name); err != nil {
			return err
		}

		// Initial provisioning.
		if err := containerRunShell(name, initScript); err != nil {
			return err
		}
		if err := copy(name, publicKey, "/root/.ssh/authorized_keys"); err != nil {
			return err
		}
	}

	return nil
}

Next, look at the implementation of createMachineRunArgs:

func (c *Cluster) createMachineRunArgs(machine *Machine, name string, i int) []string {
	runArgs := []string{ // assemble the docker command-line arguments from the given parameters
		"-it",
		"--label", "works.weave.owner=footloose",
		"--label", "works.weave.cluster=" + c.spec.Cluster.Name,
		"--name", name,
		"--hostname", machine.Hostname(),
		"--tmpfs", "/run",              // note the arguments passed here
		"--tmpfs", "/run/lock",
		"--tmpfs", "/tmp:exec,mode=777",
		"-v", "/sys/fs/cgroup:/sys/fs/cgroup:ro", 
	}

	for _, volume := range machine.spec.Volumes { // volume mounts
		...
	}

	for _, mapping := range machine.spec.PortMappings { // port mappings
		...
	}

	if machine.spec.Privileged {
		runArgs = append(runArgs, "--privileged")
	}

	if len(machine.spec.Networks) > 0 { // network connection
		...
	}

	return runArgs
}

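Note that the docker command line that finally runs includes --tmpfs /run --tmpfs /run/lock --tmpfs /tmp:exec,mode=777, and that the host's cgroup hierarchy is passed into the container read-only. These matter later, because systemd runs as init inside the machine and relies on them.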
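To make the result concrete, here is a hedged Go sketch of the argument list those lines produce for a single machine, joined into a readable command. runArgs is a hypothetical helper that copies only the fixed part of createMachineRunArgs. The values for node0 come from the earlier examples, and the volume, port and network arguments that the listing elides are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// runArgs mirrors the fixed part of createMachineRunArgs shown above.
func runArgs(cluster, container, hostname string, privileged bool) []string {
	args := []string{
		"-it",
		"--label", "works.weave.owner=footloose",
		"--label", "works.weave.cluster=" + cluster,
		"--name", container,
		"--hostname", hostname,
		"--tmpfs", "/run", // writable runtime directories for init
		"--tmpfs", "/run/lock",
		"--tmpfs", "/tmp:exec,mode=777",
		"-v", "/sys/fs/cgroup:/sys/fs/cgroup:ro", // host cgroups, read-only
	}
	if privileged {
		args = append(args, "--privileged")
	}
	return args
}

func main() {
	args := runArgs("cluster", "cluster-node0", "node0", false)
	// Assumption: image and command follow the options, as in
	// `docker create [OPTIONS] IMAGE [COMMAND]`.
	cmd := append([]string{"docker", "create"}, args...)
	cmd = append(cmd, "quay.io/footloose/centos7", "/sbin/init")
	fmt.Println(strings.Join(cmd, " "))
}
```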

The other operations (start, stop, delete and so on) are likewise assembled into docker command lines and executed, so I won't go into them here.

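That raises a question. Inside a container, the process with pid 1 is the command we passed when running the container, which means that when that process exits, the container exits too: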

[root@yiran ~]# docker run centos sleep 6000
Unable to find image 'centos:latest' locally
latest: Pulling from library/centos
d8d02d457314: Already exists 
Digest: sha256:307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb
Status: Downloaded newer image for centos:latest

# in a new terminal
[root@yiran ~]# docker ps |grep -i centos
6337dc1ad054        centos                      "sleep 6000"        13 minutes ago      Up 13 minutes                              heuristic_haibt
[root@yiran ~]# docker exec -it  6337dc1ad054 bash
[root@6337dc1ad054 /]# ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
root          1      0  0 10:29 ?        00:00:00 sleep 6000
root         23      0  3 10:43 pts/0    00:00:00 bash
root         36     23  0 10:43 pts/0    00:00:00 ps -ef

Now let's see how a machine created by footloose keeps its container running:

[root@yiran ansible]# 
[root@yiran ansible]# docker ps 
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@yiran ansible]# footloose create
INFO[0000] Docker Image: quay.io/footloose/centos7:0.6.1 present locally 
INFO[0000] Creating machine: cluster-node0 ...          
[root@yiran ansible]# footloose show
NAME            HOSTNAME   PORTS       IP           IMAGE                             CMD          STATE     BACKEND
cluster-node0   node0      32773->22   172.17.0.2   quay.io/footloose/centos7:0.6.1   /sbin/init   Running   docker
[root@yiran ansible]# docker ps 
CONTAINER ID        IMAGE                             COMMAND             CREATED             STATUS              PORTS                   NAMES
8ba9af085e53        quay.io/footloose/centos7:0.6.1   "/sbin/init"        8 seconds ago       Up 7 seconds        0.0.0.0:32773->22/tcp   cluster-node0
[root@yiran ansible]# footloose ssh node0
[root@node0 ~]# ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
root          1      0  0 10:46 ?        00:00:00 /sbin/init
root         17      1  0 10:46 ?        00:00:00 /usr/lib/systemd/systemd-journald
root         50      1  0 10:46 ?        00:00:00 /usr/sbin/sshd -D
root         58     50  0 10:46 ?        00:00:00 sshd: root@pts/1
dbus         60      1  0 10:46 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root         61      1  0 10:46 ?        00:00:00 /usr/lib/systemd/systemd-logind
root         62     58  0 10:46 pts/1    00:00:00 -bash
root         75     62  0 10:46 pts/1    00:00:00 ps -ef

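In the machine, pid 1 is init. /sbin/init is the default command hardcoded in the source, used unless machine.spec.Cmd overrides it. Because init is running in the machine, every process we start later lives under the init process tree, and we can manage services with systemd until the container is killed from outside.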

One thing I have glossed over so far: is there anything special inside the container image? Look at the Dockerfile:

master ✗ $ cat Dockerfile          
# the base image is centos7
FROM centos:7

ENV container docker

# install the debugging tools we need
RUN yum -y install sudo procps-ng net-tools iproute iputils wget && yum clean all

# CentOS 7 uses systemd as init; remove the systemd units that are not needed in a container
RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == \
systemd-tmpfiles-setup.service ] || rm -f $i; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;\
rm -f /lib/systemd/system/*.wants/*update-utmp*;

# install openssh to support ssh connections
RUN yum -y install openssh-server && yum clean all

# expose port 22
EXPOSE 22

# https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
STOPSIGNAL SIGRTMIN+3

CMD ["/bin/bash"]

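As you can see, the image footloose uses adds some configuration on top of the official CentOS 7 image (systemd, openssh, an exposed port and so on) so that the container looks more like a virtual machine.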

Summary

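To make continuous integration convenient, we brought in containers; to make debugging and testing more convenient, we dress the container up as a virtual machine. It's a compromise we're stuck with.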

References

https://github.com/weaveworks/footloose
