One quick tip per issue: deploying the Prometheus + Grafana + Alertmanager stack for monitoring on Kubernetes

First, deploy Prometheus.

Start with the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "data-nfs-storage"
  resources:
    requests:
      storage: 10Gi
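To confirm the claim actually binds against the NFS storage class, a quick check from the command line helps (the filename `prometheus-pvc.yaml` is a hypothetical name for the manifest above; this requires access to the cluster):

```shell
# Apply the PVC manifest (assumed saved as prometheus-pvc.yaml)
kubectl apply -f prometheus-pvc.yaml

# The claim should reach STATUS=Bound once the data-nfs-storage
# provisioner creates a backing volume
kubectl get pvc prometheus-data-pvc -n monitor
```

If the claim stays Pending, check that the `data-nfs-storage` StorageClass exists and its provisioner is running.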

Next, the RBAC objects. The ServiceAccount is bound to the cluster-admin ClusterRole here (broader than strictly necessary, but simple):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus2
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: prometheus2
    namespace: monitor
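The binding can be verified without deploying anything, by impersonating the ServiceAccount (requires cluster access):

```shell
# Can the prometheus2 ServiceAccount list pods cluster-wide?
# With cluster-admin bound, the answer should be "yes".
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:monitor:prometheus2
```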

Since the cluster runs Istio, a VirtualService replaces the Ingress for exposing the UI. Next comes the Prometheus configuration ConfigMap:
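The VirtualService manifest itself is not shown in the original; a minimal sketch might look like the following (the hostname and Gateway name are hypothetical placeholders, and the Service name assumes Prometheus is exposed as `prometheus` in the `monitor` namespace on port 9090):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: prometheus
  namespace: monitor
spec:
  hosts:
    - "prometheus.example.com"        # hypothetical external host
  gateways:
    - istio-system/public-gateway     # hypothetical Istio Gateway
  http:
    - route:
        - destination:
            host: prometheus.monitor.svc.cluster.local
            port:
              number: 9090
```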

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["alertmanager:9093"]
    # Location of the alerting rule files
    rule_files:
      - /etc/prometheus/rules/*.rules
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['localhost:9090']
            labels:
              instance: prometheus
      - job_name: kubelet
        metrics_path: /metrics/cadvisor
        scheme: https
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      - job_name: k8s-state-metrics
        kubernetes_sd_configs:    # k8s service discovery config
          - role: endpoints       # discover Endpoints objects via the APIServer
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_kubernetes_io_name]
            regex: kube-state-metrics
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: ${1}:8080
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      - job_name: ingress
        metrics_path: /metrics
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__address__]
            regex: '(.+):10250'
            target_label: __address__
            replacement: ${1}:10254
      - job_name: jx-mysql-master-37
        static_configs:
          - targets: ['10.0.40.3:9104']
            labels:
              instance: jx-mysql-master-36
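The k8s-state-metrics job builds each scrape target address from the discovered pod IP: the regex `(.+)` captures the whole IP and `${1}:8080` appends the metrics port. That relabel substitution is equivalent to a plain capture-and-append, which can be sanity-checked with `sed` (the sample IP is made up):

```shell
# Mimic the relabel rule: regex (.+) on __meta_kubernetes_pod_ip,
# replacement ${1}:8080 producing the final __address__
echo "10.244.1.17" | sed -E 's/(.+)/\1:8080/'   # prints 10.244.1.17:8080
```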
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitor
data:
  node-exporter.rules: |
    groups:
    - name: NodeExporter
      rules:
      - alert: HostOutOfMemory
        expr: '(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostMemoryUnderMemoryPressure
        expr: '(rate(node_vmstat_pgmajfault[1m]) > 1000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host memory under memory pressure (instance {{ $labels.instance }})
          description: "The node is under heavy memory pressure. High rate of major page faults\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostMemoryIsUnderutilized
        expr: '(100 - (rate(node_memory_MemAvailable_bytes[30m]) / node_memory_MemTotal_bytes * 100) < 20) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1w
        labels:
          severity: info
        annotations:
          summary: Host Memory is underutilized (instance {{ $labels.instance }})
          description: "Node memory is < 20% for 1 week. Consider reducing memory space. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualNetworkThroughputIn
        expr: '(sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual network throughput in (instance {{ $labels.instance }})
          description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualNetworkThroughputOut
        expr: '(sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual network throughput out (instance {{ $labels.instance }})
          description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskReadRate
        expr: '(sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk read rate (instance {{ $labels.instance }})
          description: "Disk is probably reading too much data (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskWriteRate
        expr: '(sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk write rate (instance {{ $labels.instance }})
          description: "Disk is probably writing too much data (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOutOfDiskSpace
        expr: '((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of disk space (instance {{ $labels.instance }})
          description: "Disk is almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostDiskWillFillIn24Hours
        expr: '((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs"}[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host disk will fill in 24 hours (instance {{ $labels.instance }})
          description: "Filesystem is predicted to run out of space within the next 24 hours at current write rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOutOfInodes
        expr: '(node_filesystem_files_free / node_filesystem_files * 100 < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of inodes (instance {{ $labels.instance }})
          description: "Disk is almost running out of available inodes (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostFilesystemDeviceError
        expr: 'node_filesystem_device_error == 1'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host filesystem device error (instance {{ $labels.instance }})
          description: "{{ $labels.instance }}: Device error with the {{ $labels.mountpoint }} filesystem\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostInodesWillFillIn24Hours
        expr: '(node_filesystem_files_free / node_filesystem_files * 100 < 10 and predict_linear(node_filesystem_files_free[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host inodes will fill in 24 hours (instance {{ $labels.instance }})
          description: "Filesystem is predicted to run out of inodes within the next 24 hours at current write rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskReadLatency
        expr: '(rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m]) > 0.1 and rate(node_disk_reads_completed_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk read latency (instance {{ $labels.instance }})
          description: "Disk latency is growing (read operations > 100ms)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskWriteLatency
        expr: '(rate(node_disk_write_time_seconds_total[1m]) / rate(node_disk_writes_completed_total[1m]) > 0.1 and rate(node_disk_writes_completed_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk write latency (instance {{ $labels.instance }})
          description: "Disk latency is growing (write operations > 100ms)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostHighCpuLoad
        expr: '(sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Host high CPU load (instance {{ $labels.instance }})
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuIsUnderutilized
        expr: '(100 - (rate(node_cpu_seconds_total{mode="idle"}[30m]) * 100) < 20) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1w
        labels:
          severity: info
        annotations:
          summary: Host CPU is underutilized (instance {{ $labels.instance }})
          description: "CPU load is < 20% for 1 week. Consider reducing the number of CPUs.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuStealNoisyNeighbor
        expr: '(avg by(instance) (rate(node_cpu_seconds_total{mode="steal"}[5m])) * 100 > 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host CPU steal noisy neighbor (instance {{ $labels.instance }})
          description: "CPU steal is > 10%. A noisy neighbor is killing VM performances or a spot instance may be out of credit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuHighIowait
        expr: '(avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host CPU high iowait (instance {{ $labels.instance }})
          description: "CPU iowait > 10%. A high iowait means that you are disk or network bound.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskIo
        expr: '(rate(node_disk_io_time_seconds_total[1m]) > 0.5) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk IO (instance {{ $labels.instance }})
          description: "Time spent in IO is too high on {{ $labels.instance }}. Check storage for issues.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostContextSwitching
        expr: '((rate(node_context_switches_total[5m])) / (count without(cpu, mode) (node_cpu_seconds_total{mode="idle"})) > 10000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host context switching (instance {{ $labels.instance }})
          description: "Context switching is growing on the node (> 10000 / s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostSwapIsFillingUp
        expr: '((1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100 > 80) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host swap is filling up (instance {{ $labels.instance }})
          description: "Swap is filling up (>80%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostSystemdServiceCrashed
        expr: '(node_systemd_unit_state{state="failed"} == 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host systemd service crashed (instance {{ $labels.instance }})
          description: "systemd service crashed\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostPhysicalComponentTooHot
        expr: '((node_hwmon_temp_celsius * ignoring(label) group_left(instance, job, node, sensor) node_hwmon_sensor_label{label!="tctl"} > 75)) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host physical component too hot (instance {{ $labels.instance }})
          description: "Physical hardware component too hot\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNodeOvertemperatureAlarm
        expr: '(node_hwmon_temp_crit_alarm_celsius == 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host node overtemperature alarm (instance {{ $labels.instance }})
          description: "Physical node temperature alarm triggered\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRaidArrayGotInactive
        expr: '(node_md_state{state="inactive"} > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host RAID array got inactive (instance {{ $labels.instance }})
          description: "RAID array {{ $labels.device }} is in a degraded state due to one or more disk failures. The number of spare drives is insufficient to fix the issue automatically.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRaidDiskFailure
        expr: '(node_md_disks{state="failed"} > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host RAID disk failure (instance {{ $labels.instance }})
          description: "At least one device in RAID array on {{ $labels.instance }} failed. Array {{ $labels.md_device }} needs attention and possibly a disk swap\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostKernelVersionDeviations
        expr: '(count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*")) by (kernel)) > 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 6h
        labels:
          severity: warning
        annotations:
          summary: Host kernel version deviations (instance {{ $labels.instance }})
          description: "Different kernel versions are running\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOomKillDetected
        expr: '(increase(node_vmstat_oom_kill[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host OOM kill detected (instance {{ $labels.instance }})
          description: "OOM kill detected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostEdacCorrectableErrorsDetected
        expr: '(increase(node_edac_correctable_errors_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: info
        annotations:
          summary: Host EDAC Correctable Errors detected (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} correctable memory errors reported by EDAC in the last 5 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostEdacUncorrectableErrorsDetected
        expr: '(node_edac_uncorrectable_errors_total > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host EDAC Uncorrectable Errors detected (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} uncorrectable memory errors reported by EDAC in the last 5 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkReceiveErrors
        expr: '(rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Receive Errors (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkTransmitErrors
        expr: '(rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Transmit Errors (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkInterfaceSaturated
        expr: '((rate(node_network_receive_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m]) + rate(node_network_transmit_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m])) / node_network_speed_bytes{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"} > 0.8 < 10000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Host Network Interface Saturated (instance {{ $labels.instance }})
          description: "The network interface \"{{ $labels.device }}\" on \"{{ $labels.instance }}\" is getting overloaded.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkBondDegraded
        expr: '((node_bonding_active - node_bonding_slaves) != 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Bond Degraded (instance {{ $labels.instance }})
          description: "Bond \"{{ $labels.device }}\" degraded on \"{{ $labels.instance }}\".\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostConntrackLimit
        expr: '(node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.8) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host conntrack limit (instance {{ $labels.instance }})
          description: "The number of conntrack is approaching limit\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostClockSkew
        expr: '((node_timex_offset_seconds > 0.05 and deriv(node_timex_offset_seconds[5m]) >= 0) or (node_timex_offset_seconds < -0.05 and deriv(node_timex_offset_seconds[5m]) <= 0)) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Host clock skew (instance {{ $labels.instance }})
          description: "Clock skew detected. Clock is out of sync. Ensure NTP is configured correctly on this host.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostClockNotSynchronising
        expr: '(min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host clock not synchronising (instance {{ $labels.instance }})
          description: "Clock not synchronising. Ensure NTP is configured on this host.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRequiresReboot
        expr: '(node_reboot_required > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 4h
        labels:
          severity: info
        annotations:
          summary: Host requires reboot (instance {{ $labels.instance }})
          description: "{{ $labels.instance }} requires a reboot.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  volume.rules: |
    groups:
    - name: volume.rules
      rules:
      - alert: PersistentVolumeClaimLost
        expr: |
          sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Lost"}) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is lost!"
      - alert: PersistentVolumeClaimPending
        expr: |
          sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Pending"}) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending!"
      - alert: PersistentVolumeFailed
        expr: |
          sum(kube_persistentvolume_status_phase{phase="Failed",job="kubernetes-service-endpoints"}) by (persistentvolume) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "Persistent volume is in failed state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: PersistentVolumePending
        expr: |
          sum(kube_persistentvolume_status_phase{phase="Pending",job="kubernetes-service-endpoints"}) by (persistentvolume) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "Persistent volume is in pending state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  prometheus.rules: |
    groups:
    - name: prometheus.rules
      rules:
      - alert: PrometheusErrorSendingAlertsToAnyAlertmanagers
        expr: |
          (rate(prometheus_notifications_errors_total{instance="localhost:9090", job="prometheus"}[5m]) / rate(prometheus_notifications_sent_total{instance="localhost:9090", job="prometheus"}[5m])) * 100 > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          description: '{{ printf "%.1f" $value }}% minimum errors while sending alerts from Prometheus {{$labels.namespace}}/{{$labels.pod}} to any Alertmanager.'
      - alert: PrometheusNotConnectedToAlertmanagers
        expr: |
          max_over_time(prometheus_notifications_alertmanagers_discovered{instance="localhost:9090", job="prometheus"}[5m]) != 1
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "Prometheus {{$labels.namespace}}/{{$labels.pod}} cannot connect to Alertmanager!"
      - alert: PrometheusRuleFailures
        expr: |
          increase(prometheus_rule_evaluation_failures_total{instance="localhost:9090", job="prometheus"}[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          description: 'Prometheus {{$labels.namespace}}/{{$labels.pod}} had {{ printf "%.0f" $value }} rule evaluation failures in the last 5 minutes.'
      - alert: PrometheusRuleEvaluationFailures
        expr: increase(prometheus_rule_evaluation_failures_total[3m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus rule evaluation failures (instance {{ $labels.instance }})
          description: "Prometheus encountered {{ $value }} rule evaluation failures; please investigate promptly."
      - alert: PrometheusTsdbReloadFailures
        expr: increase(prometheus_tsdb_reloads_failures_total[1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus TSDB reload failures (instance {{ $labels.instance }})
          description: "Prometheus TSDB reload failed ({{ $value }})!"
      - alert: PrometheusTsdbWalCorruptions
        expr: increase(prometheus_tsdb_wal_corruptions_total[1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus TSDB WAL corruptions (instance {{ $labels.instance }})
          description: "Prometheus TSDB WAL corruption detected ({{ $value }})!"
  website.rules: |
    groups:
    - name: website.rules
      rules:
      - alert: SSLCertExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time())/86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          description: 'The certificate for domain {{$labels.instance}} expires in {{ printf "%.1f" $value }} days; please renew it as soon as possible.'
          summary: "SSL certificate expiring soon"
      - alert: blackbox_network_stats
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
          pod: '{{$labels.instance}}'
          namespace: '{{$labels.kubernetes_namespace}}'
        annotations:
          summary: "Endpoint/host/port/domain {{ $labels.instance }} is unreachable"
          description: "Endpoint/host/port/domain {{ $labels.instance }} is unreachable; please investigate as soon as possible!"
      - alert: curlHttpStatus
        expr: probe_http_status_code{job="blackbox-http"} >= 422 and probe_success{job="blackbox-http"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: 'Business alert: website unreachable'
          description: '{{$labels.instance}} is unreachable; please check promptly. Current status code: {{$value}}'
  pod.rules: |
    groups:
    - name: pod.rules
      rules:
      - alert: PodCPUUsage
        expr: |
          sum(rate(container_cpu_usage_seconds_total{image!=""}[5m]) * 100) by (pod, namespace) > 90
        for: 5m
        labels:
          severity: warning
          pod: '{{$labels.pod}}'
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} CPU usage above 90% (current value: {{ $value }})"
      - alert: PodMemoryUsage
        expr: |
          sum(container_memory_rss{image!=""}) by(pod, namespace) / sum(container_spec_memory_limit_bytes{image!=""}) by(pod, namespace) * 100 != +inf > 85
        for: 5m
        labels:
          severity: critical
          pod: '{{$labels.pod}}'
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} memory usage above 85% (current value: {{ $value }})"
      - alert: KubeDeploymentError
        expr: |
          kube_deployment_spec_replicas{job="kubernetes-service-endpoints"} != kube_deployment_status_replicas_available{job="kubernetes-service-endpoints"}
        for: 3m
        labels:
          severity: warning
          pod: '{{$labels.deployment}}'
        annotations:
          description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} available replicas do not match the spec (current value: {{ $value }})"
      - alert: coreDnsError
        expr: |
          kube_pod_container_status_running{container="coredns"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} coredns service is down (current value: {{ $value }})"
      - alert: kubeProxyError
        expr: |
          kube_pod_container_status_running{container="kube-proxy"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} kube-proxy service is down (current value: {{ $value }})"
      - alert: filebeatError
        expr: |
          kube_pod_container_status_running{container="filebeat"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} filebeat service is down (current value: {{ $value }})"
      - alert: PodNetworkReceive
        expr: |
          sum(rate(container_network_receive_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod,namespace) > 60000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} inbound traffic above 60MB/s (current value: {{ $value }}K/s)"
      - alert: PodNetworkTransmit
        expr: |
          sum(rate(container_network_transmit_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod,namespace) > 60000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} outbound traffic above 60MB/s (current value: {{ $value }}K/s)"
      - alert: PodRestart
        expr: |
          sum(changes(kube_pod_container_status_restarts_total[1m])) by (pod,namespace) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod restarted (current value: {{ $value }})"
      - alert: PodFailed
        expr: |
          sum(kube_pod_status_phase{phase="Failed"}) by (pod,namespace) > 0
        for: 5s
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in Failed state (current value: {{ $value }})"
      - alert: PodPending
        expr: |
          sum(kube_pod_status_phase{phase="Pending"}) by (pod,namespace) > 0
        for: 30s
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in Pending state (current value: {{ $value }})"
      - alert: PodErrImagePull
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ErrImagePull"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in ErrImagePull state (current value: {{ $value }})"
      - alert: PodImagePullBackOff
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in ImagePullBackOff state (current value: {{ $value }})"
      - alert: PodCrashLoopBackOff
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in CrashLoopBackOff state (current value: {{ $value }})"
      - alert: PodInvalidImageName
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="InvalidImageName"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in InvalidImageName state (current value: {{ $value }})"
      - alert: PodCreateContainerConfigError
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="CreateContainerConfigError"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod in CreateContainerConfigError state (current value: {{ $value }})"
      - alert: KubernetesContainerOomKiller
        expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes container oom killer (instance {{ $labels.instance }})
          description: "{{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes!"
      - alert: KubernetesPersistentvolumeError
        expr: kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="kube-state-metrics"} > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes PersistentVolume error (instance {{ $labels.instance }})
          description: "{{ $labels.instance }} Persistent volume is in bad state!"
      - alert: KubernetesStatefulsetDown
        expr: (kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current) != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes StatefulSet down (instance {{ $labels.instance }})
          description: "{{ $labels.statefulset }} A StatefulSet went down!"
      - alert: KubernetesStatefulsetReplicasMismatch
        expr: kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes StatefulSet replicas mismatch (instance {{ $labels.instance }})
          description: "{{ $labels.statefulset }} A StatefulSet does not match the expected number of replicas."
  coredns.rules: |
    groups:
    - name: EmbeddedExporter
      rules:
      - alert: CorednsPanicCount
        expr: 'increase(coredns_panics_total[1m]) > 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: CoreDNS Panic Count (instance {{ $labels.instance }})
          description: "Number of CoreDNS panics encountered\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  etcd.rules: |
    groups:
    - name: EmbeddedExporter
      rules:
      - alert: EtcdInsufficientMembers
        expr: 'count(etcd_server_id) % 2 == 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Etcd insufficient Members (instance {{ $labels.instance }})
          description: "Etcd cluster should have an odd number of members\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdNoLeader
        expr: 'etcd_server_has_leader == 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Etcd no Leader (instance {{ $labels.instance }})
          description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfLeaderChanges
        expr: 'increase(etcd_server_leader_changes_seen_total[10m]) > 2'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of leader changes (instance {{ $labels.instance }})
          description: "Etcd leader changed more than 2 times during 10 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedGrpcRequests
        expr: 'sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) BY (grpc_service, grpc_method) / sum(rate(grpc_server_handled_total[1m])) BY (grpc_service, grpc_method) > 0.01'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed GRPC requests (instance {{ $labels.instance }})
          description: "More than 1% GRPC request failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedGrpcRequests
        expr: 'sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) BY (grpc_service, grpc_method) / sum(rate(grpc_server_handled_total[1m])) BY (grpc_service, grpc_method) > 0.05'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Etcd high number of failed GRPC requests (instance {{ $labels.instance }})
          description: "More than 5% GRPC request failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdGrpcRequestsSlow
        expr: 'histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m])) by (grpc_service, grpc_method, le)) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd GRPC requests slow (instance {{ $labels.instance }})
          description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedHttpRequests
        expr: 'sum(rate(etcd_http_failed_total[1m])) BY (method) / sum(rate(etcd_http_received_total[1m])) BY (method) > 0.01'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed HTTP requests (instance {{ $labels.instance }})
          description: "More than 1% HTTP failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedHttpRequests
        expr: 'sum(rate(etcd_http_failed_total[1m])) BY (method) / sum(rate(etcd_http_received_total[1m])) BY (method) > 0.05'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Etcd high number of failed HTTP requests (instance {{ $labels.instance }})
          description: "More than 5% HTTP failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHttpRequestsSlow
        expr: 'histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
          description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdMemberCommunicationSlow
        expr: 'histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd member communication slow (instance {{ $labels.instance }})
          description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedProposals
        expr: 'increase(etcd_server_proposals_failed_total[1h]) > 5'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed proposals (instance {{ $labels.instance }})
          description: "Etcd server got more than 5 failed proposals past hour\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighFsyncDurations
        expr: 'histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high fsync durations (instance {{ $labels.instance }})
          description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighCommitDurations
        expr: 'histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[1m])) > 0.25'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high commit durations (instance {{ $labels.instance }})
          description: "Etcd commit duration increasing, 99th percentile is over 0.25s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  kubestate.rules: |
    groups:
    - name: KubestateExporter
      rules:
      - alert: KubernetesNodeNotReady
        expr: 'kube_node_status_condition{condition="Ready",status="true"} == 0'
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node not ready (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeMemoryPressure
        expr: 'kube_node_status_condition{condition="MemoryPressure",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node memory pressure (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has MemoryPressure condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeDiskPressure
        expr: 'kube_node_status_condition{condition="DiskPressure",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node disk pressure (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has DiskPressure condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeNetworkUnavailable
        expr: 'kube_node_status_condition{condition="NetworkUnavailable",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node network unavailable (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has NetworkUnavailable condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeOutOfPodCapacity
        expr: 'sum by (node) ((kube_pod_status_phase{phase="Running"} == 1) + on(uid) group_left(node) (0 * kube_pod_info{pod_template_hash=""})) / sum by (node) (kube_node_status_allocatable{resource="pods"}) * 100 > 90'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes Node out of pod capacity (instance {{ $labels.instance
}})description: "Node {{ $labels.node }} is out of pod capacity\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesContainerOomKillerexpr: '(kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1'for: 0mlabels:severity: warningannotations:summary: Kubernetes Container oom killer (instance {{ $labels.instance }})description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesJobFailedexpr: 'kube_job_status_failed > 0'for: 0mlabels:severity: warningannotations:summary: Kubernetes Job failed (instance {{ $labels.instance }})description: "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesCronjobSuspendedexpr: 'kube_cronjob_spec_suspend != 0'for: 0mlabels:severity: warningannotations:summary: Kubernetes CronJob suspended (instance {{ $labels.instance }})description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPersistentvolumeclaimPendingexpr: 'kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes PersistentVolumeClaim pending (instance {{ $labels.instance }})description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesVolumeOutOfDiskSpaceexpr: 'kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10'for: 2mlabels:severity: warningannotations:summary: Kubernetes Volume out of disk space (instance {{ $labels.instance }})description: "Volume is 
almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesVolumeFullInFourDaysexpr: 'predict_linear(kubelet_volume_stats_available_bytes[6h:5m], 4 * 24 * 3600) < 0'for: 0mlabels:severity: criticalannotations:summary: Kubernetes Volume full in four days (instance {{ $labels.instance }})description: "Volume under {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is expected to fill up within four days. Currently {{ $value | humanize }}% is available.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPersistentvolumeErrorexpr: 'kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="kube-state-metrics"} > 0'for: 0mlabels:severity: criticalannotations:summary: Kubernetes PersistentVolume error (instance {{ $labels.instance }})description: "Persistent volume {{ $labels.persistentvolume }} is in bad state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetDownexpr: 'kube_statefulset_replicas != kube_statefulset_status_replicas_ready > 0'for: 1mlabels:severity: criticalannotations:summary: Kubernetes StatefulSet down (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} went down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaScaleInabilityexpr: 'kube_horizontalpodautoscaler_status_condition{status="false", condition="AbleToScale"} == 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes HPA scale inability (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is unable to scale\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaMetricsUnavailabilityexpr: 'kube_horizontalpodautoscaler_status_condition{status="false", condition="ScalingActive"} == 1'for: 0mlabels:severity: warningannotations:summary: Kubernetes HPA metrics unavailability (instance {{ $labels.instance }})description: "HPA {{ 
$labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is unable to collect metrics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaScaleMaximumexpr: 'kube_horizontalpodautoscaler_status_desired_replicas >= kube_horizontalpodautoscaler_spec_max_replicas'for: 2mlabels:severity: infoannotations:summary: Kubernetes HPA scale maximum (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has hit maximum number of desired pods\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaUnderutilizedexpr: 'max(quantile_over_time(0.5, kube_horizontalpodautoscaler_status_desired_replicas[1d]) == kube_horizontalpodautoscaler_spec_min_replicas) by (horizontalpodautoscaler) > 3'for: 0mlabels:severity: infoannotations:summary: Kubernetes HPA underutilized (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is constantly at minimum replicas for 50% of the time. 
Potential cost saving here.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPodNotHealthyexpr: 'sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0'for: 15mlabels:severity: criticalannotations:summary: Kubernetes Pod not healthy (instance {{ $labels.instance }})description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPodCrashLoopingexpr: 'increase(kube_pod_container_status_restarts_total[1m]) > 3'for: 2mlabels:severity: warningannotations:summary: Kubernetes pod crash looping (instance {{ $labels.instance }})description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesReplicasetReplicasMismatchexpr: 'kube_replicaset_spec_replicas != kube_replicaset_status_ready_replicas'for: 10mlabels:severity: warningannotations:summary: Kubernetes ReplicaSet replicas mismatch (instance {{ $labels.instance }})description: "ReplicaSet {{ $labels.namespace }}/{{ $labels.replicaset }} replicas mismatch\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDeploymentReplicasMismatchexpr: 'kube_deployment_spec_replicas != kube_deployment_status_replicas_available'for: 10mlabels:severity: warningannotations:summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }})description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replicas mismatch\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetReplicasMismatchexpr: 'kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas'for: 10mlabels:severity: warningannotations:summary: Kubernetes StatefulSet replicas mismatch (instance {{ $labels.instance }})description: "StatefulSet does not match the expected number of replicas.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- 
alert: KubernetesDeploymentGenerationMismatchexpr: 'kube_deployment_status_observed_generation != kube_deployment_metadata_generation'for: 10mlabels:severity: criticalannotations:summary: Kubernetes Deployment generation mismatch (instance {{ $labels.instance }})description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has failed but has not been rolled back.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetGenerationMismatchexpr: 'kube_statefulset_status_observed_generation != kube_statefulset_metadata_generation'for: 10mlabels:severity: criticalannotations:summary: Kubernetes StatefulSet generation mismatch (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has failed but has not been rolled back.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetUpdateNotRolledOutexpr: 'max without (revision) (kube_statefulset_status_current_revision unless kube_statefulset_status_update_revision) * (kube_statefulset_replicas != kube_statefulset_status_replicas_updated)'for: 10mlabels:severity: warningannotations:summary: Kubernetes StatefulSet update not rolled out (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDaemonsetRolloutStuckexpr: 'kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0'for: 10mlabels:severity: warningannotations:summary: Kubernetes DaemonSet rollout stuck (instance {{ $labels.instance }})description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDaemonsetMisscheduledexpr: 
'kube_daemonset_status_number_misscheduled > 0'for: 1mlabels:severity: criticalannotations:summary: Kubernetes DaemonSet misscheduled (instance {{ $labels.instance }})description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesCronjobTooLongexpr: 'time() - kube_cronjob_next_schedule_time > 3600'for: 0mlabels:severity: warningannotations:summary: Kubernetes CronJob too long (instance {{ $labels.instance }})description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesJobSlowCompletionexpr: 'kube_job_spec_completions - kube_job_status_succeeded - kube_job_status_failed > 0'for: 12hlabels:severity: criticalannotations:summary: Kubernetes Job slow completion (instance {{ $labels.instance }})description: "Kubernetes Job {{ $labels.namespace }}/{{ $labels.job_name }} did not complete in time.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiServerErrorsexpr: 'sum(rate(apiserver_request_total{job="apiserver",code=~"^(?:5..)$"}[1m])) / sum(rate(apiserver_request_total{job="apiserver"}[1m])) * 100 > 3'for: 2mlabels:severity: criticalannotations:summary: Kubernetes API server errors (instance {{ $labels.instance }})description: "Kubernetes API server is experiencing high error rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiClientErrorsexpr: '(sum(rate(rest_client_requests_total{code=~"(4|5).."}[1m])) by (instance, job) / sum(rate(rest_client_requests_total[1m])) by (instance, job)) * 100 > 1'for: 2mlabels:severity: criticalannotations:summary: Kubernetes API client errors (instance {{ $labels.instance }})description: "Kubernetes API client is experiencing high error rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesClientCertificateExpiresNextWeekexpr: 
'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 7*24*60*60'for: 0mlabels:severity: warningannotations:summary: Kubernetes client certificate expires next week (instance {{ $labels.instance }})description: "A client certificate used to authenticate to the apiserver is expiring next week.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesClientCertificateExpiresSoonexpr: 'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 24*60*60'for: 0mlabels:severity: criticalannotations:summary: Kubernetes client certificate expires soon (instance {{ $labels.instance }})description: "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiServerLatencyexpr: 'histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"} [10m])) WITHOUT (instance, resource)) / 1e+06 > 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes API server latency (instance {{ $labels.instance }})description: "Kubernetes API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"mysql.rules: |groups:- name: MysqldExporterrules:- alert: MysqlDownexpr: 'mysql_up == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL down (instance {{ $labels.instance }})description: "MySQL instance is down on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlTooManyConnections(>80%)expr: 'max_over_time(mysql_global_status_threads_connected[1m]) / 
mysql_global_variables_max_connections * 100 > 80'for: 2mlabels:severity: warningannotations:summary: MySQL too many connections (> 80%) (instance {{ $labels.instance }})description: "More than 80% of MySQL connections are in use on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlHighThreadsRunningexpr: 'max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections * 100 > 60'for: 2mlabels:severity: warningannotations:summary: MySQL high threads running (instance {{ $labels.instance }})description: "More than 60% of MySQL connections are in running state on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveIoThreadNotRunningexpr: '( mysql_slave_status_slave_io_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL Slave IO thread not running (instance {{ $labels.instance }})description: "MySQL Slave IO thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveSqlThreadNotRunningexpr: '( mysql_slave_status_slave_sql_running and ON (instance) mysql_slave_status_master_server_id > 0) == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL Slave SQL thread not running (instance {{ $labels.instance }})description: "MySQL Slave SQL thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveReplicationLagexpr: '( (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) and ON (instance) mysql_slave_status_master_server_id > 0 ) > 30'for: 1mlabels:severity: criticalannotations:summary: MySQL Slave replication lag (instance {{ $labels.instance }})description: "MySQL replication lag on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlowQueriesexpr: 'increase(mysql_global_status_slow_queries[1m]) > 0'for: 
2mlabels:severity: warningannotations:summary: MySQL slow queries (instance {{ $labels.instance }})description: "MySQL server mysql has some new slow query.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlInnodbLogWaitsexpr: 'rate(mysql_global_status_innodb_log_waits[15m]) > 10'for: 0mlabels:severity: warningannotations:summary: MySQL InnoDB log waits (instance {{ $labels.instance }})description: "MySQL innodb log writes stalling\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlRestartedexpr: 'mysql_global_status_uptime < 60'for: 0mlabels:severity: infoannotations:summary: MySQL restarted (instance {{ $labels.instance }})description: "MySQL has just been restarted, less than one minute ago on {{ $labels.instance }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
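Nearly every rule above pairs a threshold `expr` with a `for:` duration: the alert sits in "pending" while the expression stays true, and only moves to "firing" once it has been true for the whole window. This is why `for: 0m` rules fire immediately while `for: 10m` rules tolerate short blips. A minimal Python sketch of that pending/firing logic (an illustration only, not Prometheus's actual implementation; the 15s step mirrors the `evaluation_interval` in the config above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RuleState:
    pending_since: Optional[float] = None  # evaluation time when expr first became true

def evaluate(state: RuleState, expr_true: bool, now: float, for_seconds: float) -> str:
    """Return the alert state after one evaluation: 'inactive', 'pending' or 'firing'."""
    if not expr_true:
        state.pending_since = None          # condition cleared: reset the timer
        return "inactive"
    if state.pending_since is None:
        state.pending_since = now           # condition just became true
    if now - state.pending_since >= for_seconds:
        return "firing"                     # condition held for the whole `for:` window
    return "pending"

# Simulate a rule with `for: 10m` (600s): expr becomes true at t=30s,
# evaluated every 15s, so it should start firing at t=630s.
state = RuleState()
states = [evaluate(state, t >= 30, float(t), 600.0) for t in range(0, 700, 15)]
```

A flapping condition keeps resetting `pending_since`, which is exactly why `for: 10m` rules such as KubernetesNodeNotReady stay quiet through brief glitches.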
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: prometheus-vs
spec:
  hosts:
  - "p.cctest"
  gateways:
  - cc-gw
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 9090
        host: prometheus.monitor.svc.cluster.local
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus2   # must match the ServiceAccount created above
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9090
        securityContext:
          runAsUser: 65534
          privileged: true
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=10d"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /prometheus
          subPath: prometheus
        - name: config
          mountPath: /etc/prometheus
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9090/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: config
        configMap:
          name: prometheus-config
Next, AlertManager
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'         # SMTP host of the mail server
      smtp_from: 'laoyang1df@163.com'           # sender address
      smtp_auth_username: 'laoyang1df@163.com'  # login user name
      smtp_auth_password: 'ZGYMAPQJDEYOZFVD'    # the mailbox's third-party app authorization code, not the account password
      smtp_require_tls: false                   # some providers require TLS; not needed for this test setup
    templates:
    - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']  # alert grouping keys
      group_wait: 5s        # alerts of the same group fired within 5s are batched into one notification
      group_interval: 1m    # if the group content does not change, merge into one notification and send after this interval
      repeat_interval: 2m   # resend interval while the alert remains unresolved
      receiver: 'email'
      routes:
      - receiver: 'devops'
        match:
          severity: critical
        group_wait: 5s
        group_interval: 5m
        repeat_interval: 30m
    receivers:
    - name: 'email'
      email_configs:
      - to: '553069938@qq.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
    - name: 'devops'
      email_configs:
      - to: '553069938@qq.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
    inhibit_rules:          # inhibition rules
    - source_match:         # while an alert matching the source labels fires, alerts matching the target labels are suppressed
        severity: 'critical'
      target_match:
        severity: 'warning' # the target value may also be a regex, e.g. ".*MySQL.*"
      equal: ['alertname', 'dev', 'instance']  # suppress only when these label values are identical in both alerts
  wechat.tmpl: |-
    {{ define "wechat.default.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 监控报警 =========
    告警状态:{{ .Status }}
    告警级别:{{ .Labels.severity }}
    告警类型:{{ $alert.Labels.alertname }}
    故障主机: {{ $alert.Labels.instance }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    触发阀值:{{ .Annotations.value }}
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= 告警恢复 =========
    告警类型:{{ .Labels.alertname }}
    告警状态:{{ .Status }}
    告警主题: {{ $alert.Annotations.summary }}
    告警详情: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    故障时间: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    恢复时间: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    {{- if gt (len $alert.Labels.instance) 0 }}
    实例信息: {{ $alert.Labels.instance }}
    {{- end }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- end }}
  email.tmpl: |-
    {{ define "email.from" }}xxx.com{{ end }}
    {{ define "email.to" }}xxx.com{{ end }}
    {{ define "email.to.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{ range .Alerts }}
    ========= 监控报警 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }}  <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{ range .Alerts }}
    ========= 告警恢复 =========<br>
    告警程序: prometheus_alert <br>
    告警级别: {{ .Labels.severity }} <br>
    告警类型: {{ .Labels.alertname }} <br>
    告警主机: {{ .Labels.instance }} <br>
    告警主题: {{ .Annotations.summary }} <br>
    告警详情: {{ .Annotations.description }} <br>
    触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    恢复时间: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}
    {{- end }}
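The `group_by` keys in the route above decide which alerts get batched into a single notification: alerts whose values for those labels all match land in the same group. A rough Python illustration of that bucketing (hypothetical alert dicts and a reduced key set, not the Alertmanager implementation):

```python
from collections import defaultdict

# A subset of the group_by keys used in the route above
GROUP_BY = ("alertname", "cluster", "job")

def group_alerts(alerts):
    """Bucket alerts by their values for the group_by labels.

    Each bucket becomes one notification; a missing label counts as "".
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(k, "") for k in GROUP_BY)
        groups[key].append(alert)
    return dict(groups)

alerts = [
    {"labels": {"alertname": "MysqlDown", "cluster": "kubernetes", "job": "jx-mysql-master-37"}},
    {"labels": {"alertname": "MysqlDown", "cluster": "kubernetes", "job": "jx-mysql-master-37"}},
    {"labels": {"alertname": "KubernetesNodeNotReady", "cluster": "kubernetes", "job": "k8s-state-metrics"}},
]
groups = group_alerts(alerts)
# Two notifications: the duplicate MysqlDown alerts collapse into one group
```

`group_wait`, `group_interval` and `repeat_interval` then control when each bucket is (re)sent, which is why a burst of identical alerts arrives as a single mail rather than a flood.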
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    k8s-app: alertmanager
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9093
    targetPort: 9093
  selector:
    k8s-app: alertmanager
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    k8s-app: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
  template:
    metadata:
      labels:
        k8s-app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.24.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9093
        args:
        ## path of the AlertManager config file inside the container
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        ## external URL of the AlertManager UI, appended to notifications
        - "--web.external-url=https://alert.jxit.net.cn"
        ## advertised cluster address and port
        - '--cluster.advertise-address=0.0.0.0:9093'
        ## data storage path inside the container
        - "--storage.path=/alertmanager"
        resources:
          limits:
            cpu: 1000m
            memory: 512Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9093
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9093
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /alertmanager
        - name: config
          mountPath: /etc/alertmanager
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.7.1
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9093/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: alertmanager-pvc   # note: create this PVC the same way as the Prometheus one; it is not shown in this post
      - name: config
        configMap:
          name: alertmanager-config
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitor
  name: alertmanager-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: alert.jxit.net.cn
    http:
      paths:
      - pathType: Prefix
        backend:
          service:
            name: alertmanager
            port:
              number: 9093
        path: /
Grafana pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "data-nfs-storage"
  resources:
    requests:
      storage: 10Gi
Grafana
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitor
data:
  grafana.ini: |
    [server]
    root_url = http://grafana.kubernets.cn
    [smtp]
    enabled = false
    host = smtp.exmail.qq.com:465
    user = devops@xxxx.com
    password = aDhUcxxxxyecE
    skip_verify = true
    from_address = devops@xxxx.com
    [alerting]
    enabled = false
    execute_alerts = false
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
  labels:
    app: grafana
    component: core
spec:
  type: ClusterIP
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitor
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      securityContext:   # run as the grafana user/group; fsGroup is a pod-level field
        fsGroup: 472
        runAsUser: 472
      containers:
      - name: grafana-core
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 500Mi
        env:   # set up basic auth with the default admin user and admin password
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
        #   value: Admin
        # does not really work, because of template variables in exported dashboards:
        # - name: GF_DASHBOARDS_JSON_ENABLED
        #   value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          # initialDelaySeconds: 30
          # timeoutSeconds: 1
        volumeMounts:
        - name: data
          subPath: grafana
          mountPath: /var/lib/grafana
        - name: grafana-config
          mountPath: /etc/grafana
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: grafana-data-pvc
      - name: grafana-config
        configMap:
          name: grafana-config
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: grafana-vs   # distinct name; the original copy-paste reused prometheus-vs
spec:
  hosts:
  - "g.cc-test"
  gateways:
  - cctest-gw
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 3000
        host: grafana.monitor.svc.cluster.local
--- 

The YAML above already covers monitoring for the basic cluster components. But what if you want to monitor something outside the cluster, such as MySQL?
Here is an example.

Monitoring MySQL

We run the exporter as a standalone process, which avoids being too invasive to the Kubernetes cluster.

# download
curl http://stu.jxit.net.cn:88/k8s/mysqld_exporter-0.14.0.linux-amd64.tar.gz -o a.tar.gz
# unpack
tar -xvf a.tar.gz

Create the MySQL monitoring account (granted to the host the exporter runs on):

CREATE USER 'exporter'@'<IP of the host running the exporter>' IDENTIFIED BY '<your password>';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'<IP of the host running the exporter>';
FLUSH PRIVILEGES;

my.cnf

cat <<EOF >> my.cnf
[client]
host = xxxx
port = xxxx
user = xxxxx
password = xxxx
EOF

Start mysqld_exporter:

nohup ./mysqld_exporter --config.my-cnf=my.cnf &
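Before pointing Prometheus at the exporter (the static job in the config above scrapes port 9104), it is worth fetching http://&lt;exporter-host&gt;:9104/metrics and checking that `mysql_up` is 1. The small helper below parses Prometheus text exposition format; the sample string is made up for illustration:

```python
def metric_value(metrics_text: str, name: str):
    """Return the value of the first sample of `name` from Prometheus text exposition format."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # sample lines look like: metric_name{labels} value   (labels optional)
        ident, _, value = line.rpartition(" ")
        base = ident.split("{", 1)[0]
        if base == name:
            return float(value)
    return None

# A fabricated response, shaped like real mysqld_exporter output
sample = """# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
mysql_global_status_threads_running{instance="db"} 4
"""
```

On the exporter host you could feed it real data with something like `urllib.request.urlopen("http://localhost:9104/metrics")`; `mysql_up 0` almost always means the credentials or host in my.cnf are wrong.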

Finally, in the Grafana web UI, click Import and enter dashboard IDs 15757 and 15759, which give you basic Kubernetes cluster monitoring and node monitoring respectively. The imported dashboards then show the cluster and node metrics.
One last thought:
don't get locked into a particular set of components, such as this stack.
Work goal-first instead: decide which metrics you want to monitor, then find the components that expose them.


suse xen内核安装启动失败问题

Error 15 /boot/xen.gz not found Filesystem type is ext2fs, partition type 0x83 Error 15 原因&#xff1a; 除了安装以下三个安装包 -rw-r--r-- 1 root root 23362981 Jun 14 2013 kernel-xen-3.0.76-0.11.1.x86_64.rpm -rw-r--r-- 1 root root 14158930 Jun 14 20…