Tip of the issue: deploying the Prometheus + Grafana + Alertmanager monitoring stack on Kubernetes

First, deploy Prometheus.

Start with the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "data-nfs-storage"
  resources:
    requests:
      storage: 10Gi

Next comes the RBAC setup. Here the ServiceAccount is bound to the cluster-admin ClusterRole:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus2
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus2
  namespace: monitor
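
Binding cluster-admin works but grants far more than Prometheus needs for scraping. As a rough sketch of a narrower alternative (not part of the original setup; the name `prometheus2-readonly` is hypothetical, and the resource list assumes only the node, endpoints, and pod discovery used in this post), a read-only ClusterRole could look like this:

```yaml
# Hypothetical least-privilege alternative to cluster-admin (an assumption, not the author's setup).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus2-readonly   # hypothetical name
rules:
# Read access to the objects that Kubernetes service discovery lists and watches.
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
# Permission to fetch metrics endpoints exposed as non-resource URLs.
- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus2-readonly   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus2-readonly
subjects:
- kind: ServiceAccount
  name: prometheus2
  namespace: monitor
```

If scrapes start returning 403 after switching, the missing resource or URL can be added to the rules one at a time.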

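Because the cluster runs Istio, a VirtualService takes the place of an Ingress here. The ConfigMaps for the Prometheus configuration and the alerting rules are as follows: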
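
For reference, a VirtualService that exposes Prometheus might look roughly like the sketch below. The host name, the Gateway reference, and the Service name `prometheus` are all assumptions for illustration; substitute the values from your own mesh.

```yaml
# Minimal sketch of an Istio VirtualService for Prometheus (all names below are hypothetical).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: prometheus                   # hypothetical name
  namespace: monitor
spec:
  hosts:
  - "prometheus.example.com"         # hypothetical external host
  gateways:
  - istio-system/public-gateway      # hypothetical Gateway, referenced as namespace/name
  http:
  - route:
    - destination:
        # Assumes a Service named "prometheus" in the monitor namespace.
        host: prometheus.monitor.svc.cluster.local
        port:
          number: 9090
```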

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["aalertmanager:9093"]  # must match the name of the Alertmanager Service
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
    - job_name: kubelet
      metrics_path: /metrics/cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: k8s-state-metrics
      kubernetes_sd_configs: # Kubernetes service discovery configuration
      - role: endpoints      # fetch the Endpoints objects from the Kubernetes API server
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_kubernetes_io_name]
        regex: kube-state-metrics
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:8080
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: ingress
      metrics_path: /metrics/cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__address__]
        regex: '(.+):10250'
        target_label: __address__
        replacement: ${1}:10254
    - job_name: jx-mysql-master-37
      static_configs:
      - targets: ['10.0.40.3:9104']
        labels:
          instance: jx-mysql-master-36
    # Location of the alerting rule files
    rule_files:
    - /etc/prometheus/rules/*.rules
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitor
data:
  node-exporter.rules: |
    groups:
    - name: NodeExporter
      rules:
      - alert: HostOutOfMemory
        expr: '(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostMemoryUnderMemoryPressure
        expr: '(rate(node_vmstat_pgmajfault[1m]) > 1000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host memory under memory pressure (instance {{ $labels.instance }})
          description: "The node is under heavy memory pressure. High rate of major page faults\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostMemoryIsUnderutilized
        expr: '(100 - (rate(node_memory_MemAvailable_bytes[30m]) / node_memory_MemTotal_bytes * 100) < 20) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1w
        labels:
          severity: info
        annotations:
          summary: Host Memory is underutilized (instance {{ $labels.instance }})
          description: "Node memory is < 20% for 1 week. Consider reducing memory space. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualNetworkThroughputIn
        expr: '(sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual network throughput in (instance {{ $labels.instance }})
          description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualNetworkThroughputOut
        expr: '(sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual network throughput out (instance {{ $labels.instance }})
          description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskReadRate
        expr: '(sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk read rate (instance {{ $labels.instance }})
          description: "Disk is probably reading too much data (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskWriteRate
        expr: '(sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk write rate (instance {{ $labels.instance }})
          description: "Disk is probably writing too much data (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOutOfDiskSpace
        expr: '((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of disk space (instance {{ $labels.instance }})
          description: "Disk is almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostDiskWillFillIn24Hours
        expr: '((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs"}[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host disk will fill in 24 hours (instance {{ $labels.instance }})
          description: "Filesystem is predicted to run out of space within the next 24 hours at current write rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOutOfInodes
        expr: '(node_filesystem_files_free / node_filesystem_files * 100 < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of inodes (instance {{ $labels.instance }})
          description: "Disk is almost running out of available inodes (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostFilesystemDeviceError
        expr: 'node_filesystem_device_error == 1'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host filesystem device error (instance {{ $labels.instance }})
          description: "{{ $labels.instance }}: Device error with the {{ $labels.mountpoint }} filesystem\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostInodesWillFillIn24Hours
        expr: '(node_filesystem_files_free / node_filesystem_files * 100 < 10 and predict_linear(node_filesystem_files_free[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host inodes will fill in 24 hours (instance {{ $labels.instance }})
          description: "Filesystem is predicted to run out of inodes within the next 24 hours at current write rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskReadLatency
        expr: '(rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m]) > 0.1 and rate(node_disk_reads_completed_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk read latency (instance {{ $labels.instance }})
          description: "Disk latency is growing (read operations > 100ms)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskWriteLatency
        expr: '(rate(node_disk_write_time_seconds_total[1m]) / rate(node_disk_writes_completed_total[1m]) > 0.1 and rate(node_disk_writes_completed_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk write latency (instance {{ $labels.instance }})
          description: "Disk latency is growing (write operations > 100ms)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostHighCpuLoad
        expr: '(sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Host high CPU load (instance {{ $labels.instance }})
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuIsUnderutilized
        expr: '(100 - (rate(node_cpu_seconds_total{mode="idle"}[30m]) * 100) < 20) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1w
        labels:
          severity: info
        annotations:
          summary: Host CPU is underutilized (instance {{ $labels.instance }})
          description: "CPU load is < 20% for 1 week. Consider reducing the number of CPUs.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuStealNoisyNeighbor
        expr: '(avg by(instance) (rate(node_cpu_seconds_total{mode="steal"}[5m])) * 100 > 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host CPU steal noisy neighbor (instance {{ $labels.instance }})
          description: "CPU steal is > 10%. A noisy neighbor is killing VM performances or a spot instance may be out of credit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostCpuHighIowait
        expr: '(avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host CPU high iowait (instance {{ $labels.instance }})
          description: "CPU iowait > 10%. A high iowait means that you are disk or network bound.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostUnusualDiskIo
        expr: '(rate(node_disk_io_time_seconds_total[1m]) > 0.5) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host unusual disk IO (instance {{ $labels.instance }})
          description: "Time spent in IO is too high on {{ $labels.instance }}. Check storage for issues.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostContextSwitching
        expr: '((rate(node_context_switches_total[5m])) / (count without(cpu, mode) (node_cpu_seconds_total{mode="idle"})) > 10000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host context switching (instance {{ $labels.instance }})
          description: "Context switching is growing on the node (> 10000 / s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostSwapIsFillingUp
        expr: '((1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100 > 80) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host swap is filling up (instance {{ $labels.instance }})
          description: "Swap is filling up (>80%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostSystemdServiceCrashed
        expr: '(node_systemd_unit_state{state="failed"} == 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host systemd service crashed (instance {{ $labels.instance }})
          description: "systemd service crashed\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostPhysicalComponentTooHot
        expr: '((node_hwmon_temp_celsius * ignoring(label) group_left(instance, job, node, sensor) node_hwmon_sensor_label{label!="tctl"} > 75)) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host physical component too hot (instance {{ $labels.instance }})
          description: "Physical hardware component too hot\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNodeOvertemperatureAlarm
        expr: '(node_hwmon_temp_crit_alarm_celsius == 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host node overtemperature alarm (instance {{ $labels.instance }})
          description: "Physical node temperature alarm triggered\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRaidArrayGotInactive
        expr: '(node_md_state{state="inactive"} > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Host RAID array got inactive (instance {{ $labels.instance }})
          description: "RAID array {{ $labels.device }} is in a degraded state due to one or more disk failures. The number of spare drives is insufficient to fix the issue automatically.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRaidDiskFailure
        expr: '(node_md_disks{state="failed"} > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host RAID disk failure (instance {{ $labels.instance }})
          description: "At least one device in RAID array on {{ $labels.instance }} failed. Array {{ $labels.md_device }} needs attention and possibly a disk swap\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostKernelVersionDeviations
        expr: '(count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*")) by (kernel)) > 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 6h
        labels:
          severity: warning
        annotations:
          summary: Host kernel version deviations (instance {{ $labels.instance }})
          description: "Different kernel versions are running\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostOomKillDetected
        expr: '(increase(node_vmstat_oom_kill[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host OOM kill detected (instance {{ $labels.instance }})
          description: "OOM kill detected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostEdacCorrectableErrorsDetected
        expr: '(increase(node_edac_correctable_errors_total[1m]) > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: info
        annotations:
          summary: Host EDAC Correctable Errors detected (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} correctable memory errors reported by EDAC in the last 5 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostEdacUncorrectableErrorsDetected
        expr: '(node_edac_uncorrectable_errors_total > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host EDAC Uncorrectable Errors detected (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} uncorrectable memory errors reported by EDAC in the last 5 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkReceiveErrors
        expr: '(rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Receive Errors (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkTransmitErrors
        expr: '(rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Transmit Errors (instance {{ $labels.instance }})
          description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkInterfaceSaturated
        expr: '((rate(node_network_receive_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m]) + rate(node_network_transmit_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m])) / node_network_speed_bytes{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"} > 0.8 < 10000) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Host Network Interface Saturated (instance {{ $labels.instance }})
          description: "The network interface \"{{ $labels.device }}\" on \"{{ $labels.instance }}\" is getting overloaded.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostNetworkBondDegraded
        expr: '((node_bonding_active - node_bonding_slaves) != 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host Network Bond Degraded (instance {{ $labels.instance }})
          description: "Bond \"{{ $labels.device }}\" degraded on \"{{ $labels.instance }}\".\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostConntrackLimit
        expr: '(node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.8) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host conntrack limit (instance {{ $labels.instance }})
          description: "The number of conntrack is approaching limit\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostClockSkew
        expr: '((node_timex_offset_seconds > 0.05 and deriv(node_timex_offset_seconds[5m]) >= 0) or (node_timex_offset_seconds < -0.05 and deriv(node_timex_offset_seconds[5m]) <= 0)) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Host clock skew (instance {{ $labels.instance }})
          description: "Clock skew detected. Clock is out of sync. Ensure NTP is configured correctly on this host.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostClockNotSynchronising
        expr: '(min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host clock not synchronising (instance {{ $labels.instance }})
          description: "Clock not synchronising. Ensure NTP is configured on this host.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: HostRequiresReboot
        expr: '(node_reboot_required > 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}'
        for: 4h
        labels:
          severity: info
        annotations:
          summary: Host requires reboot (instance {{ $labels.instance }})
          description: "{{ $labels.instance }} requires a reboot.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  volume.rules: |
    groups:
    - name: volume.rules
      rules:
      - alert: PersistentVolumeClaimLost
        expr: |
          sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Lost"}) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is lost!"
      - alert: PersistentVolumeClaimPending
        expr: |
          sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Pending"}) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending!"
      - alert: PersistentVolume Failed
        expr: |
          sum(kube_persistentvolume_status_phase{phase="Failed",job="kubernetes-service-endpoints"}) by (persistentvolume) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "Persistent volume is failed state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: PersistentVolume Pending
        expr: |
          sum(kube_persistentvolume_status_phase{phase="Pending",job="kubernetes-service-endpoints"}) by (persistentvolume) == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "Persistent volume is pending state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  prometheus.rules: |
    groups:
    - name: prometheus.rules
      rules:
      - alert: PrometheusErrorSendingAlertsToAnyAlertmanagers
        expr: |
          (rate(prometheus_notifications_errors_total{instance="localhost:9090", job="prometheus"}[5m]) / rate(prometheus_notifications_sent_total{instance="localhost:9090", job="prometheus"}[5m])) * 100 > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          description: '{{ printf "%.1f" $value }}% minimum errors while sending alerts from Prometheus {{$labels.namespace}}/{{$labels.pod}} to any Alertmanager.'
      - alert: PrometheusNotConnectedToAlertmanagers
        expr: |
          max_over_time(prometheus_notifications_alertmanagers_discovered{instance="localhost:9090", job="prometheus"}[5m]) != 1
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "Prometheus {{$labels.namespace}}/{{$labels.pod}} cannot connect to Alertmanager!"
      - alert: PrometheusRuleFailures
        expr: |
          increase(prometheus_rule_evaluation_failures_total{instance="localhost:9090", job="prometheus"}[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          description: 'Prometheus {{$labels.namespace}}/{{$labels.pod}} failed to evaluate {{ printf "%.0f" $value }} rules in the last 5 minutes'
      - alert: PrometheusRuleEvaluationFailures
        expr: increase(prometheus_rule_evaluation_failures_total[3m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus rule evaluation failures (instance {{ $labels.instance }})
          description: "Prometheus encountered {{ $value }} rule evaluation failures; please check promptly."
      - alert: PrometheusTsdbReloadFailures
        expr: increase(prometheus_tsdb_reloads_failures_total[1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus TSDB reload failures (instance {{ $labels.instance }})
          description: "Prometheus encountered {{ $value }} TSDB reload failures!"
      - alert: PrometheusTsdbWalCorruptions
        expr: increase(prometheus_tsdb_wal_corruptions_total[1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus TSDB WAL corruptions (instance {{ $labels.instance }})
          description: "Prometheus encountered {{ $value }} TSDB WAL corruptions!"
  website.rules: |
    groups:
    - name: website.rules
      rules:
      - alert: "ssl证书过期警告"
        expr: (probe_ssl_earliest_cert_expiry - time())/86400 <30
        for: 1h
        labels:
          severity: warning
        annotations:
          description: 'The certificate for {{$labels.instance}} expires in {{ printf "%.1f" $value }} days; please renew it soon'
          summary: "SSL certificate expiry warning"
      - alert: blackbox_network_stats
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
          pod: '{{$labels.instance}}'
          namespace: '{{$labels.kubernetes_namespace}}'
        annotations:
          summary: "Endpoint/host/port/domain {{ $labels.instance }} is unreachable"
          description: "Endpoint/host/port/domain {{ $labels.instance }} is unreachable; please check as soon as possible!"
      - alert: curlHttpStatus
        expr: probe_http_status_code{job="blackbox-http"} >= 422 and probe_success{job="blackbox-http"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: 'Service alert: website unreachable'
          description: '{{$labels.instance}} is unreachable; please check promptly. Current status code: {{$value}}'
  pod.rules: |
    groups:
    - name: pod.rules
      rules:
      - alert: PodCPUUsage
        expr: |
          sum(rate(container_cpu_usage_seconds_total{image!=""}[5m]) * 100) by (pod, namespace) > 90
        for: 5m
        labels:
          severity: warning
          pod: '{{$labels.pod}}'
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} CPU usage above 90% (current value: {{ $value }})"
      - alert: PodMemoryUsage
        expr: |
          sum(container_memory_rss{image!=""}) by(pod, namespace) / sum(container_spec_memory_limit_bytes{image!=""}) by(pod, namespace) * 100 != +inf > 85
        for: 5m
        labels:
          severity: critical
          pod: '{{$labels.pod}}'
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} memory usage above 85% (current value: {{ $value }})"
      - alert: KubeDeploymentError
        expr: |
          kube_deployment_spec_replicas{job="kubernetes-service-endpoints"} != kube_deployment_status_replicas_available{job="kubernetes-service-endpoints"}
        for: 3m
        labels:
          severity: warning
          pod: '{{$labels.deployment}}'
        annotations:
          description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} desired replicas do not match available replicas (current value: {{ $value }})"
      - alert: coreDnsError
        expr: |
          kube_pod_container_status_running{container="coredns"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} CoreDNS service is abnormal (current value: {{ $value }})"
      - alert: kubeProxyError
        expr: |
          kube_pod_container_status_running{container="kube-proxy"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} kube-proxy service is abnormal (current value: {{ $value }})"
      - alert: filebeatError
        expr: |
          kube_pod_container_status_running{container="filebeat"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} filebeat service is abnormal (current value: {{ $value }})"
      - alert: PodNetworkReceive
        expr: |
          sum(rate(container_network_receive_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod,namespace) > 60000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} inbound traffic above 60 MB/s (current value: {{ $value }} KB/s)"
      - alert: PodNetworkTransmit
        expr: |
          sum(rate(container_network_transmit_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod,namespace) > 60000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} outbound traffic above 60 MB/s (current value: {{ $value }} KB/s)"
      - alert: PodRestart
        expr: |
          sum(changes(kube_pod_container_status_restarts_total[1m])) by (pod,namespace) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod restarted (current value: {{ $value }})"
      - alert: PodFailed
        expr: |
          sum(kube_pod_status_phase{phase="Failed"}) by (pod,namespace) > 0
        for: 5s
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in Failed state (current value: {{ $value }})"
      - alert: PodPending
        expr: |
          sum(kube_pod_status_phase{phase="Pending"}) by (pod,namespace) > 0
        for: 30s
        labels:
          severity: critical
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in Pending state (current value: {{ $value }})"
      - alert: PodErrImagePull
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ErrImagePull"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in ErrImagePull state (current value: {{ $value }})"
      - alert: PodImagePullBackOff
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in ImagePullBackOff state (current value: {{ $value }})"
      - alert: PodCrashLoopBackOff
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in CrashLoopBackOff state (current value: {{ $value }})"
      - alert: PodInvalidImageName
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="InvalidImageName"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in InvalidImageName state (current value: {{ $value }})"
      - alert: PodCreateContainerConfigError
        expr: |
          sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="CreateContainerConfigError"}) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Namespace: {{ $labels.namespace }} | Pod: {{ $labels.pod }} Pod is in CreateContainerConfigError state (current value: {{ $value }})"
      - alert: KubernetesContainerOomKiller
        expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes container oom killer (instance {{ $labels.instance }})
          description: "{{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes!"
      - alert: KubernetesPersistentvolumeError
        expr: kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="kube-state-metrics"} > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes PersistentVolume error (instance {{ $labels.instance }})
          description: "{{ $labels.instance }} Persistent volume is in bad state!"
      - alert: KubernetesStatefulsetDown
        expr: (kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current) != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes StatefulSet down (instance {{ $labels.instance }})
          description: "{{ $labels.statefulset }} A StatefulSet went down!"
      - alert: KubernetesStatefulsetReplicasMismatch
        expr: kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes StatefulSet replicas mismatch (instance {{ $labels.instance }})
          description: "{{ $labels.statefulset }} A StatefulSet does not match the expected number of replicas."
  coredns.rules: |
    groups:
    - name: EmbeddedExporter
      rules:
      - alert: CorednsPanicCount
        expr: 'increase(coredns_panics_total[1m]) > 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: CoreDNS Panic Count (instance {{ $labels.instance }})
          description: "Number of CoreDNS panics encountered\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  etcd.rules: |
    groups:
    - name: EmbeddedExporter
      rules:
      - alert: EtcdInsufficientMembers
        expr: 'count(etcd_server_id) % 2 == 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Etcd insufficient Members (instance {{ $labels.instance }})
          description: "Etcd cluster should have an odd number of members\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdNoLeader
        expr: 'etcd_server_has_leader == 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Etcd no Leader (instance {{ $labels.instance }})
          description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfLeaderChanges
        expr: 'increase(etcd_server_leader_changes_seen_total[10m]) > 2'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of leader changes (instance {{ $labels.instance }})
          description: "Etcd leader changed more than 2 times during 10 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedGrpcRequests
        expr: 'sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) BY (grpc_service, grpc_method) / sum(rate(grpc_server_handled_total[1m])) BY (grpc_service, grpc_method) > 0.01'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed GRPC requests (instance {{ $labels.instance }})
          description: "More than 1% GRPC request failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedGrpcRequests
        expr: 'sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) BY (grpc_service, grpc_method) / sum(rate(grpc_server_handled_total[1m])) BY (grpc_service, grpc_method) > 0.05'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Etcd high number of failed GRPC requests (instance {{ $labels.instance }})
          description: "More than 5% GRPC request failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdGrpcRequestsSlow
        expr: 'histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m])) by (grpc_service, grpc_method, le)) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd GRPC requests slow (instance {{ $labels.instance }})
          description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedHttpRequests
        expr: 'sum(rate(etcd_http_failed_total[1m])) BY (method) / sum(rate(etcd_http_received_total[1m])) BY (method) > 0.01'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed HTTP requests (instance {{ $labels.instance }})
          description: "More than 1% HTTP failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedHttpRequests
        expr: 'sum(rate(etcd_http_failed_total[1m])) BY (method) / sum(rate(etcd_http_received_total[1m])) BY (method) > 0.05'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Etcd high number of failed HTTP requests (instance {{ $labels.instance }})
          description: "More than 5% HTTP failure detected in Etcd\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHttpRequestsSlow
        expr: 'histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
          description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdMemberCommunicationSlow
        expr: 'histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd member communication slow (instance {{ $labels.instance }})
          description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighNumberOfFailedProposals
        expr: 'increase(etcd_server_proposals_failed_total[1h]) > 5'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high number of failed proposals (instance {{ $labels.instance }})
          description: "Etcd server got more than 5 failed proposals past hour\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighFsyncDurations
        expr: 'histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high fsync durations (instance {{ $labels.instance }})
          description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: EtcdHighCommitDurations
        expr: 'histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[1m])) > 0.25'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Etcd high commit durations (instance {{ $labels.instance }})
          description: "Etcd commit duration increasing, 99th percentile is over 0.25s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  kubestate.rules: |
    groups:
    - name: KubestateExporter
      rules:
      - alert: KubernetesNodeNotReady
        expr: 'kube_node_status_condition{condition="Ready",status="true"} == 0'
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node not ready (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeMemoryPressure
        expr: 'kube_node_status_condition{condition="MemoryPressure",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node memory pressure (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has MemoryPressure condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeDiskPressure
        expr: 'kube_node_status_condition{condition="DiskPressure",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node disk pressure (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has DiskPressure condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeNetworkUnavailable
        expr: 'kube_node_status_condition{condition="NetworkUnavailable",status="true"} == 1'
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node network unavailable (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has NetworkUnavailable condition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: KubernetesNodeOutOfPodCapacity
        expr: 'sum by (node) ((kube_pod_status_phase{phase="Running"} == 1) + on(uid) group_left(node) (0 * kube_pod_info{pod_template_hash=""})) / sum by (node) (kube_node_status_allocatable{resource="pods"}) * 100 > 90'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Kubernetes Node out of pod capacity (instance {{ $labels.instance
}})description: "Node {{ $labels.node }} is out of pod capacity\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesContainerOomKillerexpr: '(kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1'for: 0mlabels:severity: warningannotations:summary: Kubernetes Container oom killer (instance {{ $labels.instance }})description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesJobFailedexpr: 'kube_job_status_failed > 0'for: 0mlabels:severity: warningannotations:summary: Kubernetes Job failed (instance {{ $labels.instance }})description: "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesCronjobSuspendedexpr: 'kube_cronjob_spec_suspend != 0'for: 0mlabels:severity: warningannotations:summary: Kubernetes CronJob suspended (instance {{ $labels.instance }})description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPersistentvolumeclaimPendingexpr: 'kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes PersistentVolumeClaim pending (instance {{ $labels.instance }})description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesVolumeOutOfDiskSpaceexpr: 'kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10'for: 2mlabels:severity: warningannotations:summary: Kubernetes Volume out of disk space (instance {{ $labels.instance }})description: "Volume is 
almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesVolumeFullInFourDaysexpr: 'predict_linear(kubelet_volume_stats_available_bytes[6h:5m], 4 * 24 * 3600) < 0'for: 0mlabels:severity: criticalannotations:summary: Kubernetes Volume full in four days (instance {{ $labels.instance }})description: "Volume under {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is expected to fill up within four days. Currently {{ $value | humanize }}% is available.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPersistentvolumeErrorexpr: 'kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="kube-state-metrics"} > 0'for: 0mlabels:severity: criticalannotations:summary: Kubernetes PersistentVolume error (instance {{ $labels.instance }})description: "Persistent volume {{ $labels.persistentvolume }} is in bad state\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetDownexpr: 'kube_statefulset_replicas != kube_statefulset_status_replicas_ready > 0'for: 1mlabels:severity: criticalannotations:summary: Kubernetes StatefulSet down (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} went down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaScaleInabilityexpr: 'kube_horizontalpodautoscaler_status_condition{status="false", condition="AbleToScale"} == 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes HPA scale inability (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is unable to scale\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaMetricsUnavailabilityexpr: 'kube_horizontalpodautoscaler_status_condition{status="false", condition="ScalingActive"} == 1'for: 0mlabels:severity: warningannotations:summary: Kubernetes HPA metrics unavailability (instance {{ $labels.instance }})description: "HPA {{ 
$labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is unable to collect metrics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaScaleMaximumexpr: 'kube_horizontalpodautoscaler_status_desired_replicas >= kube_horizontalpodautoscaler_spec_max_replicas'for: 2mlabels:severity: infoannotations:summary: Kubernetes HPA scale maximum (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has hit maximum number of desired pods\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesHpaUnderutilizedexpr: 'max(quantile_over_time(0.5, kube_horizontalpodautoscaler_status_desired_replicas[1d]) == kube_horizontalpodautoscaler_spec_min_replicas) by (horizontalpodautoscaler) > 3'for: 0mlabels:severity: infoannotations:summary: Kubernetes HPA underutilized (instance {{ $labels.instance }})description: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is constantly at minimum replicas for 50% of the time. 
Potential cost saving here.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPodNotHealthyexpr: 'sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0'for: 15mlabels:severity: criticalannotations:summary: Kubernetes Pod not healthy (instance {{ $labels.instance }})description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesPodCrashLoopingexpr: 'increase(kube_pod_container_status_restarts_total[1m]) > 3'for: 2mlabels:severity: warningannotations:summary: Kubernetes pod crash looping (instance {{ $labels.instance }})description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesReplicasetReplicasMismatchexpr: 'kube_replicaset_spec_replicas != kube_replicaset_status_ready_replicas'for: 10mlabels:severity: warningannotations:summary: Kubernetes ReplicaSet replicas mismatch (instance {{ $labels.instance }})description: "ReplicaSet {{ $labels.namespace }}/{{ $labels.replicaset }} replicas mismatch\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDeploymentReplicasMismatchexpr: 'kube_deployment_spec_replicas != kube_deployment_status_replicas_available'for: 10mlabels:severity: warningannotations:summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }})description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replicas mismatch\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetReplicasMismatchexpr: 'kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas'for: 10mlabels:severity: warningannotations:summary: Kubernetes StatefulSet replicas mismatch (instance {{ $labels.instance }})description: "StatefulSet does not match the expected number of replicas.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- 
alert: KubernetesDeploymentGenerationMismatchexpr: 'kube_deployment_status_observed_generation != kube_deployment_metadata_generation'for: 10mlabels:severity: criticalannotations:summary: Kubernetes Deployment generation mismatch (instance {{ $labels.instance }})description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has failed but has not been rolled back.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetGenerationMismatchexpr: 'kube_statefulset_status_observed_generation != kube_statefulset_metadata_generation'for: 10mlabels:severity: criticalannotations:summary: Kubernetes StatefulSet generation mismatch (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has failed but has not been rolled back.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesStatefulsetUpdateNotRolledOutexpr: 'max without (revision) (kube_statefulset_status_current_revision unless kube_statefulset_status_update_revision) * (kube_statefulset_replicas != kube_statefulset_status_replicas_updated)'for: 10mlabels:severity: warningannotations:summary: Kubernetes StatefulSet update not rolled out (instance {{ $labels.instance }})description: "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDaemonsetRolloutStuckexpr: 'kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0'for: 10mlabels:severity: warningannotations:summary: Kubernetes DaemonSet rollout stuck (instance {{ $labels.instance }})description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesDaemonsetMisscheduledexpr: 
'kube_daemonset_status_number_misscheduled > 0'for: 1mlabels:severity: criticalannotations:summary: Kubernetes DaemonSet misscheduled (instance {{ $labels.instance }})description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesCronjobTooLongexpr: 'time() - kube_cronjob_next_schedule_time > 3600'for: 0mlabels:severity: warningannotations:summary: Kubernetes CronJob too long (instance {{ $labels.instance }})description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesJobSlowCompletionexpr: 'kube_job_spec_completions - kube_job_status_succeeded - kube_job_status_failed > 0'for: 12hlabels:severity: criticalannotations:summary: Kubernetes Job slow completion (instance {{ $labels.instance }})description: "Kubernetes Job {{ $labels.namespace }}/{{ $labels.job_name }} did not complete in time.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiServerErrorsexpr: 'sum(rate(apiserver_request_total{job="apiserver",code=~"^(?:5..)$"}[1m])) / sum(rate(apiserver_request_total{job="apiserver"}[1m])) * 100 > 3'for: 2mlabels:severity: criticalannotations:summary: Kubernetes API server errors (instance {{ $labels.instance }})description: "Kubernetes API server is experiencing high error rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiClientErrorsexpr: '(sum(rate(rest_client_requests_total{code=~"(4|5).."}[1m])) by (instance, job) / sum(rate(rest_client_requests_total[1m])) by (instance, job)) * 100 > 1'for: 2mlabels:severity: criticalannotations:summary: Kubernetes API client errors (instance {{ $labels.instance }})description: "Kubernetes API client is experiencing high error rate\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesClientCertificateExpiresNextWeekexpr: 
'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 7*24*60*60'for: 0mlabels:severity: warningannotations:summary: Kubernetes client certificate expires next week (instance {{ $labels.instance }})description: "A client certificate used to authenticate to the apiserver is expiring next week.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesClientCertificateExpiresSoonexpr: 'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 24*60*60'for: 0mlabels:severity: criticalannotations:summary: Kubernetes client certificate expires soon (instance {{ $labels.instance }})description: "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: KubernetesApiServerLatencyexpr: 'histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"} [10m])) WITHOUT (instance, resource)) / 1e+06 > 1'for: 2mlabels:severity: warningannotations:summary: Kubernetes API server latency (instance {{ $labels.instance }})description: "Kubernetes API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"mysql.rules: |groups:- name: MysqldExporterrules:- alert: MysqlDownexpr: 'mysql_up == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL down (instance {{ $labels.instance }})description: "MySQL instance is down on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlTooManyConnections(>80%)expr: 'max_over_time(mysql_global_status_threads_connected[1m]) / 
mysql_global_variables_max_connections * 100 > 80'for: 2mlabels:severity: warningannotations:summary: MySQL too many connections (> 80%) (instance {{ $labels.instance }})description: "More than 80% of MySQL connections are in use on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlHighThreadsRunningexpr: 'max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections * 100 > 60'for: 2mlabels:severity: warningannotations:summary: MySQL high threads running (instance {{ $labels.instance }})description: "More than 60% of MySQL connections are in running state on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveIoThreadNotRunningexpr: '( mysql_slave_status_slave_io_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL Slave IO thread not running (instance {{ $labels.instance }})description: "MySQL Slave IO thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveSqlThreadNotRunningexpr: '( mysql_slave_status_slave_sql_running and ON (instance) mysql_slave_status_master_server_id > 0) == 0'for: 0mlabels:severity: criticalannotations:summary: MySQL Slave SQL thread not running (instance {{ $labels.instance }})description: "MySQL Slave SQL thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlaveReplicationLagexpr: '( (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) and ON (instance) mysql_slave_status_master_server_id > 0 ) > 30'for: 1mlabels:severity: criticalannotations:summary: MySQL Slave replication lag (instance {{ $labels.instance }})description: "MySQL replication lag on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlSlowQueriesexpr: 'increase(mysql_global_status_slow_queries[1m]) > 0'for: 
2mlabels:severity: warningannotations:summary: MySQL slow queries (instance {{ $labels.instance }})description: "MySQL server mysql has some new slow query.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlInnodbLogWaitsexpr: 'rate(mysql_global_status_innodb_log_waits[15m]) > 10'for: 0mlabels:severity: warningannotations:summary: MySQL InnoDB log waits (instance {{ $labels.instance }})description: "MySQL innodb log writes stalling\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"- alert: MysqlRestartedexpr: 'mysql_global_status_uptime < 60'for: 0mlabels:severity: infoannotations:summary: MySQL restarted (instance {{ $labels.instance }})description: "MySQL has just been restarted, less than one minute ago on {{ $labels.instance }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: prometheus-vs
spec:
  hosts:
  - "p.cctest"
  gateways:
  - cc-gw
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 9090
        host: prometheus.monitor.svc.cluster.local
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus2
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9090
        securityContext:
          runAsUser: 65534
          privileged: true
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=10d"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /prometheus
          subPath: prometheus
        - name: config
          mountPath: /etc/prometheus
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9090/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: config
        configMap:
          name: prometheus-config
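Before applying the prometheus-rules ConfigMap that this Deployment mounts, it can help to look for alert names that occur more than once, since the rules above reuse some names at different thresholds (for example EtcdHighNumberOfFailedGrpcRequests at 1% and 5%). A minimal sketch, assuming one rule file has been saved locally with normal YAML indentation; the function name duplicate_alerts and the file name in the usage comment are illustrative and not part of the original setup:

```shell
# Print every alert name that occurs more than once in the rule YAML read from stdin.
# Only lines of the form "- alert: <Name>" are considered.
duplicate_alerts() {
  awk '$1 == "-" && $2 == "alert:" { print $3 }' | sort | uniq -d
}

# Usage (hypothetical file name):
#   duplicate_alerts < etcd.rules
```

Prometheus accepts repeated names, so this is only a review aid. Syntax can be checked separately with `promtool check rules <file>`, which ships with Prometheus.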
AlertManager
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitor
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'     # SMTP host of the mail server
      smtp_from: 'laoyang1df@163.com'    # sender address
      smtp_auth_username: 'laoyang1df@163.com'      # login username
      smtp_auth_password: 'ZGYMAPQJDEYOZFVD'    # the mailbox's third-party authorization code, not the account password
      smtp_require_tls: false           # some mail providers require this; a WeCom mailbox is used here for testing only, so it stays off
    templates:
    - '/etc/alertmanager/*.tmpl'
    route:
      group_by: ['env','instance','type','group','job','alertname','cluster']   # alert grouping
      group_wait: 5s      # wait this long so that alerts of the same group arriving within 5s go out in one notification
      group_interval: 1m        # if the group's content does not change, alerts are merged into one message and sent after this interval
      repeat_interval: 2m    # resend interval: an alert that is not fixed within this time is sent again
      receiver: 'email'
      routes:
      - receiver: 'devops'
        match:
          severity: critical22
        group_wait: 5s
        group_interval: 5m
        repeat_interval: 30m
    receivers:
    - name: 'email'
      email_configs:
      - to: '553069938@qq.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
    - name: 'devops'
      email_configs:
      - to: '553069938@qq.com'
        send_resolved: true
        html: '{{ template "email.to.html" . }}'
    inhibit_rules:    # inhibition rules
    - source_match:       # while an alert matching the source labels fires, alerts carrying the target labels are suppressed; the source here is severity: 'critical'
        severity: 'critical'
      target_match:
        severity: 'warning'    # target label to suppress; for a regular expression such as ".*MySQL.*" use target_match_re
      equal: ['alertname', 'dev', 'instance']    # source and target alerts must have equal values for these labels to be inhibited
  wechat.tmpl: |-
    {{ define "wechat.default.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= Monitoring alert =========
    Alert status: {{   .Status }}
    Severity: {{ .Labels.severity }}
    Alert type: {{ $alert.Labels.alertname }}
    Affected host: {{ $alert.Labels.instance }}
    Summary: {{ $alert.Annotations.summary }}
    Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    Threshold: {{ .Annotations.value }}
    Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts -}}
    {{- if eq $index 0 }}
    ========= Alert resolved =========
    Alert type: {{ .Labels.alertname }}
    Alert status: {{   .Status }}
    Summary: {{ $alert.Annotations.summary }}
    Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
    Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    {{- if gt (len $alert.Labels.instance) 0 }}
    Instance: {{ $alert.Labels.instance }}
    {{- end }}
    ========= = end =  =========
    {{- end }}
    {{- end }}
    {{- end }}
    {{- end }}
  email.tmpl: |-
    {{ define "email.from" }}xxx.com{{ end }}
    {{ define "email.to" }}xxx.com{{ end }}
    {{ define "email.to.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{ range .Alerts }}
    ========= Monitoring alert =========<br>
    Alert source: prometheus_alert <br>
    Severity: {{ .Labels.severity }} <br>
    Alert type: {{ .Labels.alertname }} <br>
    Affected host: {{ .Labels.instance }} <br>
    Summary: {{ .Annotations.summary }}  <br>
    Details: {{ .Annotations.description }} <br>
    Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{ range .Alerts }}
    ========= Alert resolved =========<br>
    Alert source: prometheus_alert <br>
    Severity: {{ .Labels.severity }} <br>
    Alert type: {{ .Labels.alertname }} <br>
    Affected host: {{ .Labels.instance }} <br>
    Summary: {{ .Annotations.summary }} <br>
    Details: {{ .Annotations.description }} <br>
    Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
    ========= = end =  =========<br>
    {{ end }}{{ end -}}
    {{- end }}
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    k8s-app: alertmanager
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9093
    targetPort: 9093
  selector:
    k8s-app: alertmanager
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    k8s-app: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
  template:
    metadata:
      labels:
        k8s-app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.24.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9093
        args:
        ## Path of the Alertmanager configuration file (absolute path inside the container)
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        ## External URL of the Alertmanager UI; it is added as a link to the notifications that are sent
        - "--web.external-url=https://alert.jxit.net.cn"
        ## Address and port advertised to cluster peers
        - '--cluster.advertise-address=0.0.0.0:9093'
        ## Data storage location (absolute path inside the container)
        - "--storage.path=/alertmanager"
        resources:
          limits:
            cpu: 1000m
            memory: 512Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9093
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9093
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /alertmanager
        - name: config
          mountPath: /etc/alertmanager
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.7.1
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9093/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: alertmanager-pvc
      - name: config
        configMap:
          name: alertmanager-config
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitor
  name: alertmanager-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: alert.jxit.net.cn
    http:
      paths:
      - pathType: Prefix
        backend:
          service:
            name: alertmanager
            port:
              number: 9093
        path: /
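To confirm that routing and the email template behave as intended, a hand-made alert can be posted to Alertmanager. A sketch, assuming the alertmanager Service above is reachable on localhost:9093 through a port-forward; test_alert_payload is a hypothetical helper that only builds the JSON body for Alertmanager's v2 alerts endpoint, and the label values are made up for the test:

```shell
# Build a one-element JSON array describing a test alert.
# $1 = alertname, $2 = severity, $3 = instance (plain strings without quotes).
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2" "$3"
}

# Usage (assumes: kubectl -n monitor port-forward svc/alertmanager 9093:9093):
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -d "$(test_alert_payload ManualTest warning demo-host)" \
#        http://localhost:9093/api/v2/alerts
```

With the route shown in alertmanager.yml, an alert with severity warning falls through to the default email receiver.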
Grafana pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "data-nfs-storage"
  resources:
    requests:
      storage: 10Gi
Grafana
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitor
data:
  grafana.ini: |
    [server]
    root_url = http://grafana.kubernets.cn
    [smtp]
    enabled = false
    host = smtp.exmail.qq.com:465
    user = devops@xxxx.com
    password = aDhUcxxxxyecE
    skip_verify = true
    from_address = devops@xxxx.com
    [alerting]
    enabled = false
    execute_alerts = false
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
  labels:
    app: grafana
    component: core
spec:
  type: ClusterIP
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitor
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
      - name: grafana-core
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 500Mi
        env:            # environment variables controlling Grafana authentication
        # The following env variables set up basic auth with the default admin user and admin password.
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
        #   value: Admin
        # does not really work, because of template variables in exported dashboards:
        # - name: GF_DASHBOARDS_JSON_ENABLED
        #   value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          # initialDelaySeconds: 30
          # timeoutSeconds: 1
        volumeMounts:
        - name: data
          subPath: grafana
          mountPath: /var/lib/grafana
        - name: grafana-config
          mountPath: /etc/grafana
          readOnly: true
      securityContext:       # pod security context: the user and group the containers run as
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: grafana-data-pvc
      - name: grafana-config
        configMap:
          name: grafana-config
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: grafana-vs
spec:
  hosts:
  - "g.cc-test"
  gateways:
  - cctest-gw
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 3000
        host: grafana.monitor.svc.cluster.local
--- 

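These YAML files already cover monitoring of the basic cluster components. Monitoring something outside the cluster, such as MySQL, takes a little more work.
Here is an example.

Monitoring MySQL

We run the exporter as a standalone process outside the cluster, so that monitoring MySQL does not intrude on Kubernetes more than necessary.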

# Download
curl http://stu.jxit.net.cn:88/k8s/mysqld_exporter-0.14.0.linux-amd64.tar.gz -o a.tar.gz
# Extract
tar -xvf a.tar.gz

Create a MySQL account for monitoring

CREATE USER 'exporter'@'<IP of the host running mysqld_exporter>' IDENTIFIED BY '<your password>';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'<IP of the host running mysqld_exporter>';
flush privileges;

Create my.cnf

cat <<EOF >> my.cnf
[client]
host = xxxx
port = xxxx
user = xxxxx
password = xxxx
EOF

Start mysqld_exporter

 nohup ./mysqld_exporter --config.my-cnf=my.cnf &
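
Once the exporter is running, its output can be checked before relying on Prometheus to scrape it. A minimal sketch, assuming the exporter listens on its default port 9104 (the port used by the static scrape target in prometheus.yml); mysql_up_value is an illustrative helper name:

```shell
# Read Prometheus text-format metrics on stdin and print the value of mysql_up
# (1 means the exporter can reach MySQL, 0 means it cannot).
mysql_up_value() {
  awk '$1 == "mysql_up" { print $2 }'
}

# Usage:
#   curl -s http://localhost:9104/metrics | mysql_up_value
```

A value of 0 usually points at my.cnf or the account grants rather than at Prometheus, and it is the same condition the MysqlDown rule alerts on.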

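Finally, in the Grafana web UI, click Import and enter dashboard IDs 15757 and 15759, which provide the basic Kubernetes overview and the basic node overview respectively.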
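
Imported dashboards need a Prometheus data source to query, which the manifests above do not create. One way to add it is through Grafana's HTTP API; the following is a sketch under assumptions, where prometheus_datasource_payload is a hypothetical helper, the data source name is arbitrary, and the admin credentials are whatever your Grafana instance uses:

```shell
# Build the JSON body for Grafana's "create data source" HTTP API.
# $1 = data source name, $2 = Prometheus base URL (plain strings without quotes).
prometheus_datasource_payload() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Usage (assumes a port-forward to the grafana Service and valid admin credentials):
#   curl -s -u "admin:<password>" -H 'Content-Type: application/json' \
#        -d "$(prometheus_datasource_payload Prometheus http://prometheus.monitor.svc.cluster.local:9090)" \
#        http://localhost:3000/api/datasources
```

The same data source can also be added by hand in the Grafana UI; the URL is the in-cluster address of the prometheus Service defined earlier.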

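One last thought:
do not limit yourself to a particular set of components, such as this stack.
Choose components by working back from the goal, starting with which metrics you want to monitor.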
