9 - Resource Limits

Getting to know the stress tool

stress is a load-generation tool for stress testing a system.

docker run -it ubuntu:16.04

This drops us into a shell inside the Ubuntu container, where we install the stress tool:

apt-get update && apt-get install -y stress

View the help:

stress --help

Usage: stress [OPTION [ARG]] ...
 -?, --help         show this help statement
     --version      show version statement
 -v, --verbose      be verbose
 -q, --quiet        be quiet
 -n, --dry-run      show what would have been done
 -t, --timeout N    timeout after N seconds
     --backoff N    wait factor of N microseconds before work starts
 -c, --cpu N        spawn N workers spinning on sqrt()
 -i, --io N         spawn N workers spinning on sync()
 -m, --vm N         spawn N workers spinning on malloc()/free()
     --vm-bytes B   malloc B bytes per vm worker (default is 256MB)
     --vm-stride B  touch a byte every B bytes (default is 4096)
     --vm-hang N    sleep N secs before free (default none, 0 is inf)
     --vm-keep      redirty memory instead of freeing and reallocating
 -d, --hdd N        spawn N workers spinning on write()/unlink()
     --hdd-bytes B  write B bytes per hdd worker (default is 1GB)

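-?, --help: show the help text
--version: show the version number
-v, --verbose: print detailed information while running
-q, --quiet: do not print run information
-n, --dry-run: show what would be done without actually doing it
-t, --timeout N: stop after running for N seconds
--backoff N: wait N microseconds before starting work
-c, --cpu N: spawn N processes, each repeatedly computing the square root of a random number
-i, --io N: spawn N processes, each repeatedly calling sync(), which flushes in-memory data to disk
-m, --vm N: spawn N processes, each repeatedly calling malloc() and free()
--vm-bytes B: number of bytes each vm worker allocates with malloc (default 256MB)
--vm-hang N: make each memory worker sleep for N seconds after allocating, before freeing, instead of the usual endless allocate/free loop; this is useful for simulating a machine that is short on memory
-d, --hdd N: spawn N processes that repeatedly call write() and unlink()
--hdd-bytes B: number of bytes each hdd worker writes (default 1GB)
--hdd-noclean: do not unlink the files that the random ASCII data was written to (not shown in the help output above)
Time values can carry the suffix s (seconds), m (minutes), h (hours), d (days) or y (years); sizes can carry K, M or G.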

Two options matter most for what follows:

--vm: how many worker processes to spawn
--vm-bytes: how much memory each worker allocates
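As a rough illustration of how the two options combine, the sketch below computes the total memory a run asks for: the number of workers multiplied by the bytes per worker, with 256 (MB) as the default taken from the help text. The helper name stress_vm_total_mb is made up for this example and is not part of stress.

```shell
# Hypothetical helper (not part of stress): total MB requested by
# `stress --vm N --vm-bytes <B>M`. B defaults to 256, as in the help text.
stress_vm_total_mb() {
  workers=$1
  mb_per_worker=${2:-256}
  echo $(( workers * mb_per_worker ))
}

stress_vm_total_mb 1        # --vm 1                  -> 256
stress_vm_total_mb 2 128    # --vm 2 --vm-bytes 128M  -> 256
stress_vm_total_mb 4 512    # --vm 4 --vm-bytes 512M  -> 2048
```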

Example 1.

stress --vm 1

Result

root@aec1c5bc8396:/# stress --vm 1
stress: info: [221] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

stress now repeatedly allocates 256MB of memory and then frees it.

The log is not very detailed here; add --verbose to get more detailed output.

Example 2.

stress --vm 1 --vm-bytes 500000M --verbose

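This fails, because the requested allocation (roughly 488 GiB) far exceeds the memory of the virtual machine.

Let's check how much memory the virtual machine has.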

top

Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   498892 total,   219204 free,   172276 used,   107412 buff/cache
KiB Swap:  2097148 total,  2084092 free,    13056 used.   286196 avail Mem

We can see that the virtual machine has 498892 KiB of memory (about 487 MiB).
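The same comparison can be scripted. The sketch below reads MemTotal from a meminfo-style file and checks whether a request in MiB fits; the function names are invented for this example, and the check is deliberately rough, since it ignores swap and the kernel's overcommit policy.

```shell
# Print MemTotal in KiB. The optional argument is a meminfo-style file;
# it defaults to /proc/meminfo (Linux only).
mem_total_kib() {
  awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed if a request of <MiB> fits in MemTotal. Rough check only:
# it ignores swap and the kernel's overcommit policy.
request_fits() {
  request_kib=$(( $1 * 1024 ))
  total_kib=$(mem_total_kib "${2:-/proc/meminfo}")
  [ "$request_kib" -le "$total_kib" ]
}

# Replay the numbers from the top output above using a sample file.
sample=$(mktemp)
printf 'MemTotal:         498892 kB\n' > "$sample"
if request_fits 500000 "$sample"; then echo "fits"; else echo "too large"; fi
rm -f "$sample"
```

On the VM itself, calling mem_total_kib with no argument would read the live value from /proc/meminfo.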

Building a stress image

Create a directory:
```
mkdir stress
```

Create a Dockerfile inside the stress directory:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y stress
ENTRYPOINT ["/usr/bin/stress"]

Build the image (run this from inside the stress directory):
```
docker build -t ubuntu-stress .
```
Test the image:
```
docker run -it --rm ubuntu-stress --vm 1
```
Because the ENTRYPOINT is stress, the container can now be used as if it were the stress command itself.

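Container resource configuration

The resources in question are things such as CPU and memory (including virtual memory, that is, swap).

A number of options can be specified when a container is created. Let's see which ones are available.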

docker run --help

Options:
      --add-host list                  Add a custom host-to-IP mapping (host:ip)
  -a, --attach list                    Attach to STDIN, STDOUT or STDERR
      --blkio-weight uint16            Block IO (relative weight), between 10 and 1000, or 0 to disable (default 0)
      --blkio-weight-device list       Block IO weight (relative device weight) (default [])
      --cap-add list                   Add Linux capabilities
      --cap-drop list                  Drop Linux capabilities
      --cgroup-parent string           Optional parent cgroup for the container
      --cidfile string                 Write the container ID to the file
      --cpu-period int                 Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int                  Limit CPU CFS (Completely Fair Scheduler) quota
      --cpu-rt-period int              Limit CPU real-time period in microseconds
      --cpu-rt-runtime int             Limit CPU real-time runtime in microseconds
  -c, --cpu-shares int                 CPU shares (relative weight)
      --cpus decimal                   Number of CPUs
      --cpuset-cpus string             CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string             MEMs in which to allow execution (0-3, 0,1)
  -d, --detach                         Run container in background and print container ID
      --detach-keys string             Override the key sequence for detaching a container
      --device list                    Add a host device to the container
      --device-cgroup-rule list        Add a rule to the cgroup allowed devices list
      --device-read-bps list           Limit read rate (bytes per second) from a device (default [])
      --device-read-iops list          Limit read rate (IO per second) from a device (default [])
      --device-write-bps list          Limit write rate (bytes per second) to a device (default [])
      --device-write-iops list         Limit write rate (IO per second) to a device (default [])
      --disable-content-trust          Skip image verification (default true)
      --dns list                       Set custom DNS servers
      --dns-option list                Set DNS options
      --dns-search list                Set custom DNS search domains
      --entrypoint string              Overwrite the default ENTRYPOINT of the image
  -e, --env list                       Set environment variables
      --env-file list                  Read in a file of environment variables
      --expose list                    Expose a port or a range of ports
      --group-add list                 Add additional groups to join
      --health-cmd string              Command to run to check health
      --health-interval duration       Time between running the check (ms|s|m|h) (default 0s)
      --health-retries int             Consecutive failures needed to report unhealthy
      --health-start-period duration   Start period for the container to initialize before starting health-retries countdown (ms|s|m|h) (default 0s)
      --health-timeout duration        Maximum time to allow one check to run (ms|s|m|h) (default 0s)
      --help                           Print usage
  -h, --hostname string                Container host name
      --init                           Run an init inside the container that forwards signals and reaps processes
  -i, --interactive                    Keep STDIN open even if not attached
      --ip string                      IPv4 address (e.g., 172.30.100.104)
      --ip6 string                     IPv6 address (e.g., 2001:db8::33)
      --ipc string                     IPC mode to use
      --isolation string               Container isolation technology
      --kernel-memory bytes            Kernel memory limit
  -l, --label list                     Set meta data on a container
      --label-file list                Read in a line delimited file of labels
      --link list                      Add link to another container
      --link-local-ip list             Container IPv4/IPv6 link-local addresses
      --log-driver string              Logging driver for the container
      --log-opt list                   Log driver options
      --mac-address string             Container MAC address (e.g., 92:d0:c6:0a:29:33)
  -m, --memory bytes                   Memory limit
      --memory-reservation bytes       Memory soft limit
      --memory-swap bytes              Swap limit equal to memory plus swap: '-1' to enable unlimited swap
      --memory-swappiness int          Tune container memory swappiness (0 to 100) (default -1)
      --mount mount                    Attach a filesystem mount to the container
      --name string                    Assign a name to the container
      --network string                 Connect a container to a network (default "default")
      --network-alias list             Add network-scoped alias for the container
      --no-healthcheck                 Disable any container-specified HEALTHCHECK
      --oom-kill-disable               Disable OOM Killer
      --oom-score-adj int              Tune host's OOM preferences (-1000 to 1000)
      --pid string                     PID namespace to use
      --pids-limit int                 Tune container pids limit (set -1 for unlimited)
      --privileged                     Give extended privileges to this container
  -p, --publish list                   Publish a container's port(s) to the host
  -P, --publish-all                    Publish all exposed ports to random ports
      --read-only                      Mount the container's root filesystem as read only
      --restart string                 Restart policy to apply when a container exits (default "no")
      --rm                             Automatically remove the container when it exits
      --runtime string                 Runtime to use for this container
      --security-opt list              Security Options
      --shm-size bytes                 Size of /dev/shm
      --sig-proxy                      Proxy received signals to the process (default true)
      --stop-signal string             Signal to stop a container (default "SIGTERM")
      --stop-timeout int               Timeout (in seconds) to stop a container
      --storage-opt list               Storage driver options for the container
      --sysctl map                     Sysctl options (default map[])
      --tmpfs list                     Mount a tmpfs directory
  -t, --tty                            Allocate a pseudo-TTY
      --ulimit ulimit                  Ulimit options (default [])
  -u, --user string                    Username or UID (format: <name|uid>[:<group|gid>])
      --userns string                  User namespace to use
      --uts string                     UTS namespace to use
  -v, --volume list                    Bind mount a volume
      --volume-driver string           Optional volume driver for the container
      --volumes-from list              Mount volumes from the specified container(s)
  -w, --workdir string                 Working directory inside the container

Memory limits

Options

--memory
--memory-swap

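If we set only --memory and leave --memory-swap unset, the container may use as much swap as its memory limit (provided swap is enabled on the host), so memory plus swap together come to twice the --memory value.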
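The arithmetic behind that default can be sketched as follows. This is an illustration of the rule described in Docker's documentation, not Docker's own code; the function name is invented, and values are plain MB numbers for simplicity.

```shell
# Hypothetical helper: effective memory+swap ceiling in MB.
#   $1 = --memory value in MB
#   $2 = --memory-swap value in MB (optional; -1 means unlimited swap)
# When --memory-swap is unset, the ceiling is assumed to be twice --memory.
effective_memory_swap_mb() {
  memory_mb=$1
  memory_swap_mb=${2:-}
  if [ -z "$memory_swap_mb" ]; then
    echo $(( memory_mb * 2 ))
  else
    echo "$memory_swap_mb"
  fi
}

effective_memory_swap_mb 200        # only --memory=200M           -> 400
effective_memory_swap_mb 200 200    # --memory-swap=200M (no swap) -> 200
effective_memory_swap_mb 200 -1     # unlimited swap               -> -1
```

Setting --memory-swap to the same value as --memory is the usual way to deny a container any swap.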

Example 1.

Limit memory to 200M:

docker run --memory=200M ubuntu-stress --vm 1 --verbose

Stress test with 500M, which is more than the limit allows:

docker run --memory=200M ubuntu-stress --vm 1 --verbose --vm-bytes 500M

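Result: the worker is killed with signal 9 (SIGKILL), which is what the kernel's OOM killer sends when a container goes over its memory limit.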

stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [6] forked
stress: dbug: [6] allocating 524288000 bytes ...
stress: dbug: [6] touching bytes in strides of 4096 bytes ...
stress: FAIL: [1] (415) <-- worker 6 got signal 9
stress: WARN: [1] (417) now reaping child worker processes
stress: FAIL: [1] (421) kill error: No such process
stress: FAIL: [1] (451) failed run completed in 1s
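To decode the signal number in the "got signal 9" line, the shell's built-in kill -l can translate it into a name. The wrapper function below is only a convenience invented for this example.

```shell
# Translate a signal number into its name using the shell's kill builtin.
signal_name() {
  kill -l "$1"
}

signal_name 9     # prints KILL
signal_name 15    # prints TERM
```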

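CPU limits

Options

--cpu-shares is a relative weight. For example, with two containers, one set to 10 and the other to 5, CPU time is divided in proportion to the weights when both are saturating the CPU: the container with 10 gets twice the CPU percentage of the container with 5.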

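Example 1.

Open three terminal windows on the virtual machine with vagrant ssh.
In the first window, run the top command.

In the second window, run an ubuntu-stress container: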

docker run --name test2 --cpu-shares=5 ubuntu-stress --cpu 1

Now look at top in the first window:

8875 root      20   0    7472     96      0 R 99.7  0.0   0:06.99 stress

CPU usage is already close to 100%.

In the third window, run another ubuntu-stress container:

docker run --name test1 --cpu-shares=10 ubuntu-stress --cpu 1

Look at top in the first window again:

 8956 root      20   0    7472     96      0 R 66.1  0.0   0:03.54 stress
 8875 root      20   0    7472     96      0 R 33.2  0.0   1:21.26 stress

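We can see that process 8875 has dropped to about 33% CPU while the other is at about 66%, roughly a 2:1 ratio.

So --cpu-shares does not specify how many CPUs a container uses; it is a relative weight. The default is 1024 and the minimum is 2. It only matters when containers compete for CPU: a container running alone can still use 100% of the machine's CPU even with the value set to 2.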
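The expected split under contention is just each weight divided by the sum of all weights. A minimal sketch of that calculation, with an invented function name, reproduces the figures seen in top:

```shell
# Hypothetical helper: expected CPU percentage of one container when all
# listed containers are competing for the same CPU.
#   $1      = this container's --cpu-shares
#   $2...   = --cpu-shares of every competing container (including this one)
cpu_share_percent() {
  mine=$1
  shift
  total=0
  for s in "$@"; do
    total=$(( total + s ))
  done
  awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

cpu_share_percent 10 10 5    # test1 -> 66.7
cpu_share_percent 5 10 5     # test2 -> 33.3
```

These values line up with the 66.1 and 33.2 measured above; the small difference is other activity on the machine.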

Limiting the number of usable CPUs

The current system has 2 CPUs.

Give the container 2 CPUs, but load only one of them.

Run this in one terminal:

docker run --rm --cpus=2 ubuntu-stress --cpu 1

Run top in another terminal:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5120 root      20   0    7472     96      0 R  99.7  0.0   0:54.48 stress

To see per-CPU usage on the host, run top and press 1.

Process 5120 is at 100%, so only one CPU is busy at this point. Next, load both CPUs:

docker run --rm --cpus=2 ubuntu-stress --cpu 2

Check top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5234 root      20   0    7472     96      0 R  99.7  0.0   0:13.47 stress
 5233 root      20   0    7472     96      0 R  99.3  0.0   0:13.38 stress

Both processes are at about 100% CPU, which shows that both CPUs are fully used.
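According to Docker's documentation, --cpus is a shorthand for the --cpu-period and --cpu-quota pair listed in the help output earlier: for instance, --cpus=1.5 corresponds to --cpu-period=100000 with --cpu-quota=150000. The sketch below does that conversion; the function name is invented here, and the 100000 microsecond default period is taken from that documentation.

```shell
# Hypothetical helper: CFS quota (microseconds) equivalent to --cpus=<n>.
#   $1 = number of CPUs (may be fractional)
#   $2 = CFS period in microseconds (optional, assumed default 100000)
cpus_to_quota() {
  awk -v c="$1" -v p="${2:-100000}" 'BEGIN { printf "%.0f\n", c * p }'
}

cpus_to_quota 2            # --cpus=2   -> 200000
cpus_to_quota 1.5          # --cpus=1.5 -> 150000
cpus_to_quota 0.5 50000    # half a CPU with a 50000us period -> 25000
```

With a quota of 200000 per 100000 period, the two stress workers above can each run a full CPU, which matches the top output.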

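Pinning to specific CPUs

The --cpus option cannot keep a container running on one particular CPU or set of CPUs, but the --cpuset-cpus option can. This matters because on today's multi-core systems every core has its own cache, and frequently migrating a process between cores inevitably brings costs such as cache misses. The command below sets --cpuset-cpus so that the container runs on CPU number 1: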

docker run --rm --cpuset-cpus="1" ubuntu-stress --cpu 1

In the top terminal, press the 1 key:

%Cpu0  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

We can see that the load is running on CPU number 1.
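Besides watching top, the CPUs a process is allowed to run on can be read from the Cpus_allowed_list field of its /proc status file on Linux. The helper below is a small sketch with an invented name; the optional argument lets it read any status-style file.

```shell
# Print the Cpus_allowed_list field of a /proc/<pid>/status-style file.
# Defaults to /proc/self/status (Linux only).
cpus_allowed_list() {
  awk '/^Cpus_allowed_list:/ { print $2 }' "${1:-/proc/self/status}"
}

# Show the CPUs available to this shell's children, if /proc is present.
if [ -r /proc/self/status ]; then
  cpus_allowed_list
fi
```

Run inside a container started with --cpuset-cpus="1", this would be expected to print 1; on the 2-CPU host without pinning, 0-1.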

The underlying technology behind resource limits

Namespaces: provide isolation (pid, net, ipc, mnt, uts)
Control groups: enforce resource limits
Union file systems: provide the layering of containers and images
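Limits such as --memory end up as values in the control group filesystem, so they can be read back from inside a container. The sketch below checks the two locations commonly used for the memory limit (memory.max on cgroup v2, memory/memory.limit_in_bytes on cgroup v1); the function name is invented, and which layout applies depends on the host.

```shell
# Print the memory limit recorded in the cgroup filesystem.
#   $1 = cgroup mount point (optional, defaults to /sys/fs/cgroup)
memory_limit_bytes() {
  root=${1:-/sys/fs/cgroup}
  if [ -r "$root/memory.max" ]; then
    cat "$root/memory.max"                      # cgroup v2 layout
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    cat "$root/memory/memory.limit_in_bytes"    # cgroup v1 layout
  else
    echo "unknown"
  fi
}

memory_limit_bytes
```

Inside a container started with --memory=200M this would be expected to print 209715200 (200 x 1024 x 1024).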

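Isolation: Linux namespaces

Each user instance is isolated from the others and does not affect them. The usual hardware-virtualization answer to this is the VM, while LXC's answer is the container, or more precisely kernel namespaces. The pid, net, ipc, mnt, uts and user namespaces isolate a container's processes, network, inter-process messages, file system, UTS ("UNIX Time-sharing System") and user space respectively.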
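On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns, each pointing at an identifier such as pid:[4026531836]. The sketch below lists the six namespaces named above; the function name is invented, and the optional argument is just a directory of such links.

```shell
# Print the identifier of each namespace link found in a directory.
#   $1 = directory of namespace links (optional, defaults to /proc/self/ns)
list_namespaces() {
  dir=${1:-/proc/self/ns}
  for ns in pid net ipc mnt uts user; do
    if [ -L "$dir/$ns" ]; then
      printf '%s\n' "$(readlink "$dir/$ns")"
    fi
  done
}

# Show this shell's namespaces, if /proc exposes them.
if [ -d /proc/self/ns ]; then
  list_namespaces
fi
```

Two processes print the same identifier for a namespace type only if they share that namespace, so comparing the output on the host with the output inside a container shows which namespaces differ.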

pid namespace

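Processes belonging to different users are separated by pid namespaces, and the same PID can exist in different namespaces. In Docker, the parent of every LXC process is the docker process, and each LXC process has its own namespace. Because namespaces can be nested, Docker in Docker is also easy to implement.

net namespace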

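With pid namespaces, the PIDs in each namespace are isolated from one another, but network ports would still be shared with the host. Network isolation is provided by net namespaces: each net namespace has its own network devices, IP addresses, IP routing tables and /proc/net directory, so every container's network is isolated. By default Docker uses veth pairs to connect the virtual network interface inside the container to a bridge on the host, docker0.

ipc namespace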

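Processes in a container still communicate through the usual Linux inter-process communication (IPC) mechanisms, including semaphores, message queues and shared memory. Unlike with a VM, however, IPC between container processes is really IPC between host processes that share the same pid namespace, so namespace information has to be attached when an IPC resource is requested; each IPC resource has a unique 32-bit ID.

mnt namespace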

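This is similar to chroot, which confines a process to a particular directory. The mnt namespace lets processes in different namespaces see different file system structures, so the directories visible to the processes in each namespace are isolated from one another. Unlike chroot, a container in a given namespace sees in /proc/mounts only the mount points of its own namespace.

uts namespace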

The UTS ("UNIX Time-sharing System") namespace lets each container have its own hostname and domain name, so that on the network it can be treated as an independent node rather than as a process on the host.

user namespace

Each container can have its own user and group IDs, which means programs inside the container can run as a user defined inside the container rather than as a user on the host.