问题

使用kong的chart,在kubernetes集群默认安装出来kong的容器是监听8000和8443端口的,而为了让外部以80和443端口访问kong这个API网关,一般会使用kubernetes的service proxy技术或外部load balancer将流量反向代理到kong。能否直接让kong直接监听80和443端口,从而避免反向代理的网络开销,这里进行一些尝试。

解决过程

修改kubernetes清单文件

原来用kong的chart安装出来的kong其关键yaml如下:

---
apiVersion: v1
kind: Service
metadata:
  name: kong-kong-proxy
  labels:
    app: kong
    chart: "kong-0.15.1"
    release: "kong"
    heritage: "Tiller"
spec:
  type: NodePort
  ports:
  - name: kong-proxy
    port: 80
    targetPort: 8000
    protocol: TCP
  - name: kong-proxy-tls
    port: 443
    targetPort: 8443
    protocol: TCP

  selector:
    app: kong
    release: kong
    component: app
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: "kong-kong"
  labels:
    app: "kong"
    chart: "kong-0.15.1"
    release: "kong"
    heritage: "Tiller"
    component: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kong
      release: kong
      component: app
  template:
    metadata:
      labels:
        app: kong
        release: kong
        component: app
    spec:
      initContainers:
      - name: wait-for-db
        image: "kong:1.2.2"
        imagePullPolicy: IfNotPresent
        env:
        - name: KONG_PG_HOST
          value: kong-postgresql
        - name: KONG_PG_PORT
          value: "5432"
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: kong-postgresql
              key: postgresql-password

        - name: KONG_ADMIN_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_ADMIN_GUI_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_GUI_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_DATABASE
          value: "postgres"
        - name: KONG_PORTAL_API_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PORTAL_API_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_PROXY_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PROXY_ERROR_LOG
          value: "/dev/stderr"
        command: [ "/bin/sh", "-c", "until kong start; do echo 'waiting for db'; sleep 1; done; kong stop" ]

      containers:
      - name: kong
        image: "kong:1.2.2"
        imagePullPolicy: IfNotPresent
        env:
        - name: KONG_ADMIN_LISTEN
          value: "0.0.0.0:8444 ssl"
        - name: KONG_PROXY_LISTEN
          value: 0.0.0.0:8000,0.0.0.0:8443 ssl
        - name: KONG_NGINX_DAEMON
          value: "off"
        - name: KONG_NGINX_HTTP_INCLUDE
          value: /kong/servers.conf
        - name: KONG_PG_HOST
          value: kong-postgresql
        - name: KONG_PG_PORT
          value: "5432"
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: kong-postgresql
              key: postgresql-password
        - name: KONG_ADMIN_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_ADMIN_GUI_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_GUI_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_DATABASE
          value: "postgres"
        - name: KONG_PORTAL_API_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PORTAL_API_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_PROXY_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PROXY_ERROR_LOG
          value: "/dev/stderr"
        ports:
        - name: admin
          containerPort: 8444
          protocol: TCP
        - name: proxy
          containerPort: 8000
          protocol: TCP
        - name: proxy-tls
          containerPort: 8443
          protocol: TCP
        - name: metrics
          containerPort: 9542
          protocol: TCP
        volumeMounts:
          - name: custom-nginx-template-volume
            mountPath: /kong
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /status
            port: admin
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /status
            port: admin
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5

        resources:
          {}

      tolerations:
        []

      volumes:
        - name: custom-nginx-template-volume
          configMap:
            name: kong-kong-default-custom-server-blocks
---

可以看到定义了一个Deployment,并用Service将pod提供的服务暴露出来了。其中kong这个容器有一个环境变量KONG_PROXY_LISTEN,其值为0.0.0.0:8000,0.0.0.0:8443 ssl,说明容器会监听8000和8443端口。看到这里很自然想到直接修改KONG_PROXY_LISTEN这个环境变量,pod直接使用hostNetwork,这样就很可以很轻松地让kong监听node节点上的80和443端口,修改成的yaml文件如下:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: "kong-kong"
  labels:
    app: "kong"
    chart: "kong-0.15.1"
    release: "kong"
    heritage: "Tiller"
    component: app
spec:
  selector:
    matchLabels:
      app: kong
      release: kong
      component: app
  template:
    metadata:
      labels:
        app: kong
        release: kong
        component: app
    spec:
      initContainers:
      - name: wait-for-db
        image: "kong:1.2.2"
        imagePullPolicy: IfNotPresent
        env:
        - name: KONG_PG_HOST
          value: kong-postgresql
        - name: KONG_PG_PORT
          value: "5432"
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: kong-postgresql
              key: postgresql-password

        - name: KONG_ADMIN_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_ADMIN_GUI_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_GUI_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_DATABASE
          value: "postgres"
        - name: KONG_PORTAL_API_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PORTAL_API_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_PROXY_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PROXY_ERROR_LOG
          value: "/dev/stderr"
        command: [ "/bin/sh", "-c", "until kong start; do echo 'waiting for db'; sleep 1; done; kong stop" ]

      containers:
      - name: kong
        image: "kong:1.2.2"
        imagePullPolicy: IfNotPresent
        env:
        - name: KONG_ADMIN_LISTEN
          value: "0.0.0.0:8444 ssl"
        - name: KONG_PROXY_LISTEN
          value: 0.0.0.0:80,0.0.0.0:443 ssl
        - name: KONG_NGINX_DAEMON
          value: "off"
        - name: KONG_NGINX_HTTP_INCLUDE
          value: /kong/servers.conf
        - name: KONG_PG_HOST
          value: kong-postgresql
        - name: KONG_PG_PORT
          value: "5432"
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: kong-postgresql
              key: postgresql-password
        - name: KONG_ADMIN_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_ADMIN_GUI_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_ADMIN_GUI_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_DATABASE
          value: "postgres"
        - name: KONG_PORTAL_API_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PORTAL_API_ERROR_LOG
          value: "/dev/stderr"
        - name: KONG_PROXY_ACCESS_LOG
          value: "/dev/stdout"
        - name: KONG_PROXY_ERROR_LOG
          value: "/dev/stderr"
        ports:
        - name: admin
          containerPort: 8444
          protocol: TCP
        - name: proxy
          containerPort: 80
          protocol: TCP
        - name: proxy-tls
          containerPort: 443
          protocol: TCP
        - name: metrics
          containerPort: 9542
          protocol: TCP
        volumeMounts:
          - name: custom-nginx-template-volume
            mountPath: /kong
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /status
            port: admin
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /status
            port: admin
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5

        resources:
          {}

      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet

      tolerations:
        []

      volumes:
        - name: custom-nginx-template-volume
          configMap:
            name: kong-kong-default-custom-server-blocks
---

主要做了以下改动:

  1. Deployment修改为了DaemonSet
  2. 删除了Service
  3. 8000端口修改为了808443端口修改为了443
  4. 配置了hostNetworkturednsPolicyClusterFirstWithHostNet。因为kong这个pod还引用了kubernetes里的kong-postgresql服务名,以了让DaemonSet的pod能正确解析kong-postgresql服务名,必须设置使用了hostNetwork的pod其 dnsPolicyClusterFirstWithHostNet,见这里的说明。

kubernetes清单文件修改完毕后,我们将之部署进kubernetes测试一下,结果pod报错:

2019/08/17 12:59:06 [emerg] 1#0: bind() to 0.0.0.0:80 failed (13: Permission denied)
nginx: [emerg] bind() to 0.0.0.0:80 failed (13: Permission denied)

授予合适的Linux capabilities

从上面的报错来看,是说没有足够的权限监听80端口,应该是没有绑定1024以下特权端口的权限。查阅下Linux capabilities,得知得添加CAP_NET_BIND_SERVICE的capability,程序才能绑定1024以下特权端口。于是参考kubernetes的SecurityContext的文档,我给pod配置上合适的Linux capabilities。

   CAP_NET_BIND_SERVICE
          Bind a socket to Internet domain privileged ports (port
          numbers less than 1024).
podSpec:
  ......
	securityContext:
      capabilities:
        add: 
        - NET_BIND_SERVICE

再将部署pod,发现问题依旧。

通过kubernetes的SecurityContext还可以设置很多pod安全相关的设置,以后在工作中可以多实践下。

分析kong的启动过程

已经添加了合适的Linux capabilities,竟然还不能正常监听80和443,看来问题并不是这儿。接下来我分析下kong镜像中kong进程的启动过程。

找到kong镜像的Dockerfile

FROM alpine:3.10
LABEL maintainer="Kong Core Team <team-core@konghq.com>"

ENV KONG_VERSION 1.2.2
ENV KONG_SHA256 76183d7e8ff084c86767b917da441001d0d779d35fa2464275b74226029a46bf

RUN adduser -Su 1337 kong \
	&& mkdir -p "/usr/local/kong" \
	&& apk add --no-cache --virtual .build-deps wget tar ca-certificates \
	&& apk add --no-cache libgcc openssl pcre perl tzdata curl libcap su-exec zip \
	&& wget -O kong.tar.gz "https://bintray.com/kong/kong-alpine-tar/download_file?file_path=kong-$KONG_VERSION.apk.tar.gz" \
	&& echo "$KONG_SHA256 *kong.tar.gz" | sha256sum -c - \
	&& tar -xzf kong.tar.gz -C /tmp \
	&& rm -f kong.tar.gz \
	&& cp -R /tmp/usr / \
	&& rm -rf /tmp/usr \
	&& cp -R /tmp/etc / \
	&& rm -rf /tmp/etc \
	&& apk del .build-deps \
	&& chown -R kong:0 /usr/local/kong \
	&& chmod -R g=u /usr/local/kong

COPY docker-entrypoint.sh /docker-entrypoint.sh

ENTRYPOINT ["/docker-entrypoint.sh"]

EXPOSE 8000 8443 8001 8444

STOPSIGNAL SIGQUIT

CMD ["kong", "docker-start"]

这个镜像的构建过程很简单,逻辑如下:

  1. 创建kong用户
  2. 安装kong的程序
  3. 将docker-entrypoint.sh启动脚本拷贝到镜像里
  4. 设置ENTRYPOINT及CMD

再看一看docker-entrypoint.sh启动脚本

#!/bin/sh
set -e

export KONG_NGINX_DAEMON=off

has_transparent() {
  echo "$1" | grep -E "[^\s,]+\s+transparent\b" >/dev/null
}

if [[ "$1" == "kong" ]]; then
  PREFIX=${KONG_PREFIX:=/usr/local/kong}

  if [[ "$2" == "docker-start" ]]; then
    shift 2
    kong prepare -p "$PREFIX" "$@"
    
    # workaround for https://github.com/moby/moby/issues/31243
    chmod o+w /proc/self/fd/1 || true
    chmod o+w /proc/self/fd/2 || true

    if [ "$(id -u)" != "0" ]; then
      exec /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    else
      if [ ! -z ${SET_CAP_NET_RAW} ] \
          || has_transparent "$KONG_STREAM_LISTEN" \
          || has_transparent "$KONG_PROXY_LISTEN" \
          || has_transparent "$KONG_ADMIN_LISTEN";
      then
        setcap cap_net_raw=+ep /usr/local/openresty/nginx/sbin/nginx
      fi
      chown -R kong:0 /usr/local/kong
      exec su-exec kong /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    fi
  fi
fi

exec "$@"

分析上面的脚本,可以知道,如果以root启动镜像,脚本里最终会以kong用户启动/usr/local/openresty/nginx/sbin/nginx二进制程序。

exec su-exec kong /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf

即然是普通用户kong运行的程序,自然无法正常监听1024以下的端口。

使用setcap给二进制提权

这时我会问了,为啥安装了apache,以www用户运行apache的二进制程序,为啥又可以监听80端口呢?

查阅文档,我们知道有两种办法让普通用户执行二进制程序时:

  1. 使用chmod设置setuid位,这样一个可执行文件启动时,它不会以启动它的用户的权限运行,而是以该文件所有者的权限运行,参见这里
  2. 另一种方法是使用setcap给二进制文件添加必要的Linux capabilities,参见这里

一般会采用方法2,这样二进制文件的权限更受控一点。于是我在docker-entrypoint.sh里使用setcap命令给二进制文件添加必要的Linux capabilities。

setcap cap_net_bind_service=+eip /usr/local/openresty/nginx/sbin/nginx

至此,使用kong的docker镜像,容器本身终于可以监听80和443端口了。

更优雅的处理方案

问题终于解决了,偶然在kong的开源端点上发现有人为解决该问题,发了一个PR,看PR的代码,是通过判断一个环境变量来决定是否调用setcap命令的,而且还考虑了setcap作用被覆盖的场景,处理方案明显更优雅。

    caps=""

    if [ -n "${SET_CAP_NET_RAW}" ] \
        || has_transparent "$KONG_STREAM_LISTEN" \
        || has_transparent "$KONG_PROXY_LISTEN" \
        || has_transparent "$KONG_ADMIN_LISTEN";
    then
      caps="${caps:+"${caps}",}cap_net_raw"
    fi

    if [ -n "${SET_CAP_NET_BIND_SERVICE}" ] ; then
      caps="${caps:+"${caps}",}cap_net_bind_service"
    fi

    if [ -n "${caps}" ] ; then
      setcap "${caps}=+ep" /usr/local/openresty/nginx/sbin/nginx
    fi

这个PR不知原因一直没有合进主干。

看man文档的技巧

有时看到一个命令的用法,觉得很陌生,可以使用man命令快速学习一下。比如:

$ man setcap

NAME
       setcap - set file capabilities

SYNOPSIS
       setcap [-q] [-v] (capabilities|-|-r) filename [ ... capabilitiesN fileN
       ]

DESCRIPTION
       In the absence of the -v (verify) option setcap sets  the  capabilities
       of  each  specified  filename  to  the  capabilities specified.  The -v
       option is used to verify that the specified capabilities are  currently
       associated with the file.

       The   capabilities   are   specified   in   the   form   described   in
       cap_from_text(3).
       
# 上述文档中说capabilities的格式在cap_from_text(3)里进行说明

# 找不到cap_from_text(3)的man文档
$ man 3 cap_from_text
No manual entry for cap_from_text in section 3

# 安装man-pages、man-db
$ yum install -y man-pages
$ yum install -y man-db
$ mandb

# 还是找不到cap_from_text(3)的man文档
$ man 3 cap_from_text
No manual entry for cap_from_text in section 3

# 搜索下cap_from_text的man文档在哪个rpm包里,下面的输出说明在libcap-devel这个rpm包里
$ yum whatprovides */cap_from_text.*.gz
libcap-devel-2.22-9.el7.i686 : Development files for libcap
Repo        : os
Matched from:
Filename    : /usr/share/man/man3/cap_from_text.3.gz

libcap-devel-2.22-9.el7.x86_64 : Development files for libcap
Repo        : os
Matched from:
Filename    : /usr/share/man/man3/cap_from_text.3.gz

# 安装libcap-devel软件包
$ yum install -y libcap-devel

# 这次终于可以查寻到cap_from_text(3)的man文档,看了man文档,总算知道setcap命令里+eip是什么意思了
$ man 3 cap_from_text
......
       Each  clause consists of a list of comma-separated capability names (or
       the word `all'), followed by an action-list.  An  action-list  consists
       of  a  sequence of operator flag pairs.  Legal operators are: `=', '+',
       and `-'.  Legal flags are: `e', `i', and `p'.  These  flags  are  case-
       sensitive  and  specify  the  Effective, Inheritable and Permitted sets
       respectively.
......

总结

这个权限问题虽说是个小问题,但在解决的过程中查阅了kubernetes的SecurityContext的相关资料、Linux capabilities的概念及用法、分析了kong的启动过程,整个过程学到了挺多东西,还挺有意思的。

参考

  1. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
  2. http://man7.org/linux/man-pages/man7/capabilities.7.html
  3. https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
  4. https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container
  5. https://linux.cn/article-9355-1.html
  6. https://www.cnblogs.com/nf01/articles/10418141.html
  7. https://github.com/Kong/docker-kong/pull/213
  8. https://codingbee.net/centos/man-how-to-fix-the-no-manual-entry-for
  9. https://luanlengli.github.io/2019/07/05/%E9%80%9A%E8%BF%87Linux%20capabilities%E6%9C%BA%E5%88%B6%E8%AE%A9kong%E7%9B%91%E5%90%AC80%E5%92%8C443%E7%AB%AF%E5%8F%A3.html
  10. https://docs.konghq.com/1.2.x/configuration/#proxy_listen