[{"content":"Ascend NPU Deployment Guide for Kubernetes\rOverview\rThis document describes the complete process of deploying the Ascend NPU containerized environment in a Kubernetes cluster, suitable for the following scenarios:\nContainer Runtime: Containerd NPU Device: Ascend 310P System Architecture: aarch64 (ARM64) Kubernetes Version: 1.28+ The deployment process includes three main steps:\nEnvironment preparation (node labels, users, directories) Install Ascend Docker Runtime Deploy Ascend Device Plugin Preparation\rCreate Node Labels\rAdd appropriate labels to Kubernetes nodes for subsequent Pod scheduling and resource management.\n1 2 3 4 5 6 7 8 # Label master node kubectl label nodes ecs-b0tf90001 masterselector=dls-master-node # Label NPU compute nodes kubectl label nodes ecs-exyqec0002 node-role.kubernetes.io/worker=worker kubectl label nodes ecs-exyqec0002 workerselector=dls-worker-node kubectl label nodes ecs-exyqec0002 host-arch=huawei-arm kubectl label nodes ecs-exyqec0002 accelerator=huawei-Ascend310P Note: Please modify ecs-b0tf90001 and ecs-exyqec0002 according to your actual node names.\nCreate System User\rCreate dedicated users and groups on Ascend compute nodes (e.g., ecs-exyqec0002).\n1 2 3 4 5 6 7 8 # Create hwMindX user (UID 9000) useradd -d /home/hwMindX -u 9000 -m -s /sbin/nologin hwMindX # Create HwHiAiUser group groupadd HwHiAiUser # Add hwMindX user to HwHiAiUser group usermod -a -G HwHiAiUser hwMindX Important: UID 9000 and group HwHiAiUser are the default configurations for the Ascend software stack. 
Do not modify them arbitrarily.\nCreate Log Directory\rCreate the Device Plugin log directory on Ascend compute nodes.\n1 2 3 # Create log directory and set permissions mkdir -m 750 /var/log/mindx-dl/devicePlugin chown root:root /var/log/mindx-dl/devicePlugin Install Ascend Docker Runtime\rAscend Docker Runtime is a core component for using Ascend NPU in containerized environments and must be installed on all Ascend compute nodes.\nDownload Installation Package\rVisit the official Git repository to download the corresponding version of the installation package.\nExample: Ascend-docker-runtime_7.2.RC1.SPC2_linux-aarch64.run\nInstallation Steps\rStep 1: Enter Installation Package Directory\r1 cd \u0026lt;path to run package\u0026gt; Step 2: Verify Package Integrity\r1 ./Ascend-docker-runtime_{version}_linux-{arch}.run --check Expected output:\n1 2 [WARNING]: --check is meaningless for Ascend-docker-runtime and will be discarded in the future Verifying archive integrity... All good. Step 3: Add Executable Permission\r1 chmod u+x Ascend-docker-runtime_{version}_linux-{arch}.run Step 4: Execute Installation\rMethod 1: Install to Default Path (Recommended)\n1 ./Ascend-docker-runtime_{version}_linux-{arch}.run --install Method 2: Install to Custom Path\n1 ./Ascend-docker-runtime_{version}_linux-{arch}.run --install --install-path=\u0026lt;path\u0026gt; Successful installation output example:\n1 2 3 4 Uncompressing ascend-docker-runtime 100% [INFO]: installing ascend docker runtime ... 
[INFO] Ascend Docker Runtime install success Default Installation Path: /usr/local/Ascend/Ascend-Docker-Runtime/\nConfigure Containerd\rModify the Containerd configuration file /etc/containerd/config.toml according to the cgroup version used by your system.\nConfiguration Method 1: Cgroup v1\rTwo key configuration items need to be modified:\nruntime_type = \"io.containerd.runtime.v1.linux\" runtime = \"/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime\" Complete configuration example:\n[plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes] [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.runc] runtime_type = \"io.containerd.runtime.v1.linux\" runtime_engine = \"\" runtime_root = \"\" privileged_without_host_devices = false base_runtime_spec = \"\" [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.runc.options] [plugins.\"io.containerd.grpc.v1.cri\".cni] bin_dir = \"/opt/cni/bin\" conf_dir = \"/etc/cni/net.d\" max_conf_num = 1 conf_template = \"\" [plugins.\"io.containerd.grpc.v1.cri\".registry] [plugins.\"io.containerd.grpc.v1.cri\".registry.mirrors] [plugins.\"io.containerd.grpc.v1.cri\".registry.mirrors.\"docker.io\"] endpoint = [\"https://registry-1.docker.io\"] [plugins.\"io.containerd.grpc.v1.cri\".image_decryption] key_model = \"\" # ... other configurations ... 
[plugins.\"io.containerd.monitor.v1.cgroups\"] no_prometheus = false [plugins.\"io.containerd.runtime.v1.linux\"] shim = \"containerd-shim\" runtime = \"/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime\" runtime_root = \"\" no_shim = false shim_debug = false [plugins.\"io.containerd.runtime.v2.task\"] platforms = [\"linux/arm64\"] Configuration Method 2: Cgroup v2\rThe following key configuration item needs to be modified:\nBinaryName = \"/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime\" Complete configuration example:\n[plugins.\"io.containerd.grpc.v1.cri\".containerd.default_runtime.options] [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes] [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.runc] base_runtime_spec = \"\" cni_conf_dir = \"\" cni_max_conf_num = 0 container_annotations = [] pod_annotations = [] privileged_without_host_devices = false runtime_engine = \"\" runtime_path = \"\" runtime_root = \"\" runtime_type = \"io.containerd.runc.v2\" [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.runc.options] BinaryName = \"/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime\" CriuImagePath = \"\" CriuPath = \"\" CriuWorkPath = \"\" IoGid = 0 IoUid = 0 NoNewKeyring = false NoPivotRoot = false Root = \"\" ShimCgroup = \"\" SystemdCgroup = true Tip: To check the cgroup version, execute stat -fc %T /sys/fs/cgroup/. 
Output cgroup2fs indicates v2, while tmpfs indicates v1.\nRestart Services\r1 2 systemctl daemon-reload systemctl restart containerd kubelet Verify Installation\rExecute the following command on the Kubernetes master node to confirm that the Ascend compute nodes are in normal status:\n1 kubectl get nodes Expected output example:\n1 2 3 NAME STATUS ROLES AGE VERSION k8s-master Ready master,worker 3d v1.28.12 k8s-worker Ready worker 3d v1.28.12 All nodes should show Ready status.\nDeploy Ascend Device Plugin\rAscend Device Plugin is responsible for managing and allocating NPU resources in Kubernetes.\nPrepare Images\r1. Pull Images\nExecute the following commands on Ascend compute nodes to pull the required images:\n1 2 3 4 5 6 7 8 9 # Pull all required images docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/resilience-controller:v7.1.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v7.2.RC1 docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v7.2.RC1 2. 
Export Images\n1 2 3 4 5 6 7 8 9 docker save -o ascend.tar \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/resilience-controller:v7.1.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v7.2.RC1 \\ swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v7.2.RC1 3. Import to Containerd\n1 ctr -n k8s.io images import ascend.tar Download Deployment Configuration Files\rVisit the official Git repository to download the Ascend Device Plugin installation package.\nExample: Ascend-mindxdl-device-plugin_7.2.RC1.SPC2_linux-aarch64.zip\nAfter extracting, copy the corresponding YAML files to the Kubernetes management node.\nChoose the Appropriate YAML File\rSelect the corresponding YAML file based on the actual device type and whether to use the Volcano scheduler:\nYAML Filename Use Case device-plugin-310-v{version}.yaml Atlas 300I inference card, without Volcano device-plugin-310-volcano-v{version}.yaml Atlas 300I inference card, with Volcano device-plugin-310P-1usoc-v{version}.yaml Atlas 200I SoC A1 core board, without Volcano device-plugin-310P-1usoc-volcano-v{version}.yaml Atlas 200I SoC A1 core board, with Volcano device-plugin-310P-v{version}.yaml Atlas inference series products (e.g., 310P), without Volcano device-plugin-310P-volcano-v{version}.yaml Atlas inference series products, with Volcano device-plugin-910-v{version}.yaml Atlas training series products/A2/A3/800I A2, without Volcano device-plugin-volcano-v{version}.yaml Atlas training series products/A2/A3/800I A2, with Volcano Note:\nFor Ascend 310P devices, typically choose device-plugin-310P-v{version}.yaml Do not modify the 
DaemonSet.metadata.name field in the YAML file to avoid automatic identification issues Deploy Device Plugin\r# Before applying, edit the image field in the YAML to: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v7.2.RC1 kubectl apply -f device-plugin-310P-v7.2.RC1.SPC2.yaml Expected output:\nserviceaccount/ascend-device-plugin-sa created clusterrole.rbac.authorization.k8s.io/pods-node-ascend-device-plugin-role created clusterrolebinding.rbac.authorization.k8s.io/pods-node-ascend-device-plugin-rolebinding created daemonset.apps/ascend-device-plugin-daemonset created Verify Deployment\rCheck whether the Device Plugin started successfully:\nkubectl get pod -n kube-system | grep ascend Expected output (status should be Running):\nNAME READY STATUS RESTARTS AGE ascend-device-plugin-daemonset-d5ctz 1/1 Running 0 11s Check node NPU resources:\nkubectl describe node <node-name> | grep -A 5 \"Capacity:\" You should see the huawei.com/Ascend310P resource.\nUsing NPU Compute Cards\rPod Configuration Example\rRequest NPU resources through the resources field in the Pod definition:\napiVersion: v1 kind: Pod metadata: name: npu-test-pod spec: containers: - name: alg-container image: ubuntu:22.04 resources: limits: memory: 24Gi huawei.com/Ascend310P: 1 # Request 1 NPU card requests: memory: 2Gi huawei.com/Ascend310P: 1 # Request 1 NPU card command: [\"/bin/bash\", \"-c\", \"sleep infinity\"] Note:\nThe value of huawei.com/Ascend310P indicates the number of NPU cards requested The huawei.com/Ascend310P values in limits and requests must match Adjust the resource name according to the actual NPU model (e.g., 310, 910, etc.) 
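As a quick way to keep the limits and requests counts in sync, the Pod manifest above can be generated from a small shell helper. This is a minimal sketch: the npu_pod_yaml function name is mine and the memory fields are omitted for brevity; only the huawei.com/Ascend310P resource key and the container layout come from the example above.

```shell
# Sketch: render a Pod manifest that requests N Ascend 310P cards,
# emitting the same count under both limits and requests.
npu_pod_yaml() {
  local name="$1" cards="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
  - name: alg-container
    image: ubuntu:22.04
    resources:
      limits:
        huawei.com/Ascend310P: ${cards}
      requests:
        huawei.com/Ascend310P: ${cards}
    command: ["/bin/bash", "-c", "sleep infinity"]
EOF
}

# Render a one-card pod; the output can be piped to `kubectl apply -f -`.
npu_pod_yaml npu-test-pod 1
```

Because both resource entries are filled from the same variable, the "limits and requests must match" rule cannot be violated by a copy-paste slip.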
","date":"2026-01-08T00:00:00Z","image":"/p/ascend_on_k8s/ascend_on_k8s.png","permalink":"/en/p/ascend_on_k8s/","title":"Ascend NPU Deployment Guide for Kubernetes"},{"content":"Problem Background\rDuring server maintenance, I encountered a challenging issue: after inserting a new 1TB SAS hard drive into the server, the lsblk command couldn\u0026rsquo;t detect this new drive. After investigation, I found that although the hard drive had been recognized by the RAID card, it was in an Unconfigured(good) state and hadn\u0026rsquo;t been configured as a virtual disk, making it inaccessible to the operating system.\nThis article documents the complete process of troubleshooting and resolving this issue using the MegaCli tool.\nInstalling MegaCli Tool\rMegaCli is a command-line tool for managing LSI/Broadcom RAID cards. Here are the installation steps for Ubuntu/Debian systems:\nDownload MegaCli Package\r1 wget https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip Extract the Package\r1 unzip 8-07-14_MegaCLI.zip Convert RPM Package to DEB Package\rSince the downloaded package is in RPM format, use the alien tool to convert it to DEB format:\n1 2 cd Linux sudo alien MegaCli-8.07.14-1.noarch.rpm Install the DEB Package\r1 sudo dpkg -i megacli_8.07.14-2_all.deb Fix Dependency Issues (If Needed)\rIf you encounter a missing libncurses.so.5 error, create a symbolic link:\n1 sudo ln -s /usr/lib/x86_64-linux-gnu/libncurses.so.6 /usr/lib/x86_64-linux-gnu/libncurses.so.5 Verify Installation\r1 /opt/MegaRAID/MegaCli/MegaCli64 -v Checking RAID Status\rUse the following command to view detailed information about all physical hard drives:\n1 /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL Through MegaCli\u0026rsquo;s output, you can clearly see the current status of all hard drives on the server.\nInvestigation Conclusion\rThe server has a total of 7 physical hard drives, and the newly inserted drive is located at Slot 6, with a 
current status of Unconfigured(good) (unconfigured but in good condition). It hasn\u0026rsquo;t been configured as a Virtual Drive yet, so the Linux operating system (lsblk) cannot recognize it.\nDetailed Hard Drive Distribution\rFor better understanding, the 7 hard drives are categorized into 4 groups by purpose:\nSystem Drive (corresponds to /dev/sda)\rLocation: Slot 5 Model: Intel 120GB SSD Status: Online Configuration: Single-disk RAID 0 (or passthrough mode) Purpose: System boot drive Data Drive Array (corresponds to /dev/sdb, capacity 2.7TB)\rLocation: Slot 0, 1, 2, 3 Model: 4 Seagate 1TB SAS drives Status: Online Configuration: RAID 5 array (3 data blocks + 1 parity = 3TB usable capacity) Data Drive (corresponds to /dev/sdc, capacity 931GB)\rLocation: Slot 7 Model: Seagate 1TB SAS drive Status: Online Configuration: Single-disk RAID 0 🔍 Key Finding: Newly Inserted Drive (Slot 6)\rLocation: Slot 6 Model: Toshiba 1TB SAS drive Status: Unconfigured(good) (unconfigured but in good condition) Foreign State: None (no foreign configuration) Other Error Count: 9 (some historical error counts, not affecting current recognition) Current Situation: The drive is physically connected properly and recognized by the RAID card, but since it hasn\u0026rsquo;t been added to any RAID group or created as a virtual disk, the RAID card won\u0026rsquo;t present it to the operating system Problem Analysis: Why Wasn\u0026rsquo;t It Auto-Recognized?\rReason One: Foreign State is None\rThe output shows Foreign State: None, indicating that the RAID card hasn\u0026rsquo;t detected any recognizable old RAID configuration information on this disk (or it has been cleared).\nIf it was previously Linux software RAID (mdadm): Hardware RAID cards cannot recognize software RAID metadata and will treat it as an empty disk If it was previously hardware RAID: The RAID metadata may be incompatible or has been cleared Reason Two: Manual Virtual Disk Creation Required\rOn RAID cards like Dell PERC, 
physical disks must be configured as Virtual Drives (VD) before the operating system can access them.\nSolution: Configure the Drive Online\rSince the drive status is Unconfigured(good), it needs to be configured as a single-disk RAID 0 for the operating system to recognize it.\nCreate Single-Disk RAID 0\rExecute the following command to configure the Slot 6 drive as a virtual disk:\n# Configure Slot 6 drive as single-disk RAID 0 # [32:6] represents Enclosure 32, Slot 6 # -a0 represents adapter 0 sudo /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0 [32:6] -a0 Expected Result\rAfter successful command execution, the terminal will display:\nAdapter 0: Created VD 3 Run the lsblk command again, and you should see the newly added /dev/sdd device.\n⚠️ Data Safety Notice\rImportant: The -CfgLdAdd command creates a RAID structure and rewrites disk header metadata, which may affect access to original data.\nData Recovery Options for Different Scenarios\rOriginal Drive Configuration Foreign State Data Recovery Possibility Linux Software RAID (mdadm) None After configuration, try using mdadm --assemble --scan to recover data Hardware RAID None RAID card no longer recognizes old configuration, can only mount as new disk, recovery depends on partition table integrity Brand New Drive None No data loss risk, can use directly Refresh System Device List\rAfter configuration, if the system still hasn't automatically recognized the new device, you can manually refresh the SCSI bus:\n# Scan all SCSI host controllers to make the system re-recognize drives for host in /sys/class/scsi_host/host*/scan; do echo \"- - -\" > \"$host\" done After execution, use lsblk or fdisk -l to verify that the new hard drive is visible.\nSummary\rKey steps in this troubleshooting process:\n✅ Install MegaCli tool for RAID card management ✅ Use -PDList command to view all physical drive statuses ✅ Identify the problem: New drive is in 
Unconfigured(good) status ✅ Use -CfgLdAdd command to create single-disk RAID 0 ✅ Refresh system device list and verify drive is online Through this investigation, I gained a deep understanding of how hardware RAID cards work: physical drives must first be configured as virtual disks before the operating system can access them. In daily operations, when encountering hard drive recognition issues, you should first check the RAID card level configuration status rather than troubleshooting only at the operating system level.\n","date":"2025-12-30T00:00:00Z","image":"/p/raid_error/raid_error.png","permalink":"/en/p/raid_error/","title":"Troubleshooting RAID Hard Drive Recognition Issues"},{"content":"Mounting WebDAV on Linux\rThis document describes how to mount WebDAV shares on Linux systems using davfs2, with Ubuntu 24.04 as an example.\nInstalling davfs2\rRun the following command in the terminal to install the davfs2 package:\n1 2 sudo apt-get update sudo apt-get install davfs2 During installation, you may be prompted whether to allow non-root users to mount WebDAV. You can use the arrow keys to switch to the \u0026ldquo;Yes\u0026rdquo; option.\nCreating a Mount Point Directory\rCreate a directory to serve as the mount point:\n1 sudo mkdir /mnt/webdav Configuring davfs2\rEdit the davfs2.conf file to configure davfs2. Open the configuration file:\n1 sudo nano /etc/davfs2/davfs2.conf Find the use_locks configuration option in the file and ensure its value is set to 0. This disables file locking, as some WebDAV servers do not support locking.\n1 use_locks 0 Save and close the file (press Ctrl+X, then press Y, and finally press Enter).\nConfiguring the davfs2 secrets File\rCreate a secrets file to store the username and password for the WebDAV server. 
Run the following command in the terminal:\n1 sudo nano /etc/davfs2/secrets Add a line similar to the following, replacing it with your WebDAV server\u0026rsquo;s username and password:\n1 http://your-webdav-url username password Replace the content in the above line with your actual information:\nhttp://your-webdav-url - Your WebDAV server address username - Your username password - Your password Save and close the file.\nSetting File Permissions\rTo ensure the password in the secrets file is secure, set the file permissions:\n1 sudo chmod 600 /etc/davfs2/secrets Mounting the WebDAV Share\rUse the mount command to mount the WebDAV share to the previously created mount point:\n1 sudo mount -t davfs http://your-webdav-url /mnt/webdav Automatic Mounting on Boot (Optional)\rIf you need to mount automatically on boot, you can edit the /etc/fstab file:\n1 sudo nano /etc/fstab Add the following line:\n1 http://your-webdav-url /mnt/webdav davfs user,noauto 0 0 Unmounting the WebDAV Share\rWhen you need to unmount, run:\n1 sudo umount /mnt/webdav ","date":"2025-11-11T00:00:00Z","image":"/p/webdav_on_linux/webdav_on_linux_en.png","permalink":"/en/p/webdav_on_linux/","title":"Mounting WebDAV on Linux"},{"content":"NVIDIA GPU Driver Persistence Configuration and Troubleshooting\rOverview\rThis article documents a monitoring anomaly issue caused by GPU driver non-persistent mode, and introduces the principles and configuration methods of NVIDIA GPU driver persistence.\nProblem Symptoms\rDuring algorithm program stress testing, the Grafana monitoring dashboard revealed that the Nvidia Exporter service was running unstably, showing intermittent behavior:\nInitial Investigation\rRuling Out Prometheus Scrape Issues\nManually executing the curl http://localhost:9835/metrics command on the target GPU server resulted in a timeout, confirming that the issue was with the Exporter service itself.\nAdjusting Log Level\nThe Nvidia Exporter log level was adjusted to debug, but no obvious 
error messages were found.\nRoot Cause Identification\nManually executing the query command used internally by Nvidia Exporter:\n1 nvidia-smi --query-gpu=timestamp,driver_version,vgpu_driver_capability.heterogenous_multivGPU,count,name,serial,uuid,pci.bus_id,pci.domain,pci.bus,pci.device,pci.baseClass,pci.subClass,pci.device_id,pci.sub_device_id,vgpu_device_capability.fractional_multiVgpu,vgpu_device_capability.heterogeneous_timeSlice_profile,vgpu_device_capability.heterogeneous_timeSlice_sizes,vgpu_device_capability.homogeneous_placements,pcie.link.gen.current,pcie.link.gen.gpucurrent,pcie.link.gen.max,pcie.link.gen.gpumax,pcie.link.gen.hostmax,pcie.link.width.current,pcie.link.width.max,index,display_mode,display_active,persistence_mode,addressing_mode,accounting.mode,accounting.buffer_size,driver_model.current,driver_model.pending,vbios_version,inforom.img,inforom.oem,inforom.ecc,inforom.pwr,gpu_recovery_action,gom.current,gom.pending,fan.speed,pstate,clocks_event_reasons.supported,clocks_event_reasons.active,clocks_event_reasons.gpu_idle,clocks_event_reasons.applications_clocks_setting,clocks_event_reasons.sw_power_cap,clocks_event_reasons.hw_slowdown,clocks_event_reasons.hw_thermal_slowdown,clocks_event_reasons.hw_power_brake_slowdown,clocks_event_reasons.sw_thermal_slowdown,clocks_event_reasons.sync_boost,memory.total,memory.reserved,memory.used,memory.free,compute_mode,compute_cap,utilization.gpu,utilization.memory,utilization.encoder,utilization.decoder,utilization.jpeg,utilization.ofa,encoder.stats.sessionCount,encoder.stats.averageFps,encoder.stats.averageLatency,dramEncryption.mode.current,dramEncryption.mode.pending,ecc.mode.current,ecc.mode.pending,ecc.errors.corrected.volatile.device_memory,ecc.errors.corrected.volatile.dram,ecc.errors.corrected.volatile.register_file,ecc.errors.corrected.volatile.l1_cache,ecc.errors.corrected.volatile.l2_cache,ecc.errors.corrected.volatile.texture_memory,ecc.errors.corrected.volatile.cbu,ecc.errors.corrected.volatile.
sram,ecc.errors.corrected.volatile.total,ecc.errors.corrected.aggregate.device_memory,ecc.errors.corrected.aggregate.dram,ecc.errors.corrected.aggregate.register_file,ecc.errors.corrected.aggregate.l1_cache,ecc.errors.corrected.aggregate.l2_cache,ecc.errors.corrected.aggregate.texture_memory,ecc.errors.corrected.aggregate.cbu,ecc.errors.corrected.aggregate.sram,ecc.errors.corrected.aggregate.total,ecc.errors.uncorrected.volatile.device_memory,ecc.errors.uncorrected.volatile.dram,ecc.errors.uncorrected.volatile.register_file,ecc.errors.uncorrected.volatile.l1_cache,ecc.errors.uncorrected.volatile.l2_cache,ecc.errors.uncorrected.volatile.texture_memory,ecc.errors.uncorrected.volatile.cbu,ecc.errors.uncorrected.volatile.sram,ecc.errors.uncorrected.volatile.total,ecc.errors.uncorrected.aggregate.device_memory,ecc.errors.uncorrected.aggregate.dram,ecc.errors.uncorrected.aggregate.register_file,ecc.errors.uncorrected.aggregate.l1_cache,ecc.errors.uncorrected.aggregate.l2_cache,ecc.errors.uncorrected.aggregate.texture_memory,ecc.errors.uncorrected.aggregate.cbu,ecc.errors.uncorrected.aggregate.sram,ecc.errors.uncorrected.aggregate.total,ecc.errors.uncorrected.volatile.sram.parity,ecc.errors.uncorrected.volatile.sram.secded,ecc.errors.uncorrected.aggregate.sram.parity,ecc.errors.uncorrected.aggregate.sram.secded,ecc.errors.uncorrected.aggregate.sram.thresholdExceeded,ecc.errors.uncorrected.aggregate.sram.l2,ecc.errors.uncorrected.aggregate.sram.sm,ecc.errors.uncorrected.aggregate.sram.mcu,ecc.errors.uncorrected.aggregate.sram.pcie,ecc.errors.uncorrected.aggregate.sram.other,retired_pages.single_bit_ecc.count,retired_pages.double_bit.count,retired_pages.pending,remapped_rows.correctable,remapped_rows.uncorrectable,remapped_rows.pending,remapped_rows.failure,remapped_rows.histogram.max,remapped_rows.histogram.high,remapped_rows.histogram.partial,remapped_rows.histogram.low,remapped_rows.histogram.none,temperature.gpu,temperature.gpu.tlimit,temperature.memory,power.management,
power.draw,power.draw.average,power.draw.instant,power.limit,enforced.power.limit,power.default_limit,power.min_limit,power.max_limit,module.power.draw.average,module.power.draw.instant,module.power.limit,module.enforced.power.limit,module.power.default_limit,module.power.min_limit,module.power.max_limit,clocks.current.graphics,clocks.current.sm,clocks.current.memory,clocks.current.video,clocks.applications.graphics,clocks.applications.memory,clocks.default_applications.graphics,clocks.default_applications.memory,clocks.max.graphics,clocks.max.sm,clocks.max.memory,mig.mode.current,mig.mode.pending,gsp.mode.current,gsp.mode.default,c2c.mode,protected_memory.total,protected_memory.used,protected_memory.free,fabric.state,fabric.status,platform.chassis_serial_number,platform.slot_number,platform.tray_index,platform.host_id,platform.peer_type,platform.module_id,platform.gpu_fabric_guid --format=csv Key Finding: The command execution time fluctuated between 3-10 seconds, which was clearly abnormal. The test environment had 8 GPUs in total, with 2 being occupied by the algorithm program and the remaining 6 idle.\nAfter reviewing the NVIDIA official documentation on GPU driver persistence, we attempted to enable persistent mode.\nSolution\rTemporarily Enable Persistent Mode\rExecute the following command to immediately enable GPU driver persistence:\n1 nvidia-smi -pm 1 After executing the query command again, the response time dropped to milliseconds, problem solved.\nConfigure Automatic Startup\rTo ensure the persistence configuration takes effect after system reboot, a systemd service needs to be configured:\n1. Create Service Configuration File\n1 sudo vim /usr/lib/systemd/system/nvidia-persistenced.service 2. 
Add the Following Content\n1 2 3 4 5 6 7 8 9 10 11 12 [Unit] Description=NVIDIA Persistence Daemon Wants=network.target [Service] Type=forking PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid ExecStart=/usr/bin/nvidia-persistenced --persistence-mode ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced [Install] WantedBy=multi-user.target 3. Enable and Start the Service\n1 sudo systemctl enable nvidia-persistenced.service --now Verification\rAfter configuration, the monitoring system returned to normal, with stable GPU usage collection:\nTechnical Principles\rGPU Driver Loading Mechanism\rNVIDIA GPU interaction depends on the kernel mode driver, which operates in two modes:\nPersistent Mode: The driver remains continuously active On-Demand Loading Mode: The driver loads only when a program uses the GPU Driver Lifecycle\rInitialization Phase\nWhen the first program attempts to interact with the GPU, if the kernel driver is not running, the system triggers driver loading and GPU device initialization.\nDe-initialization Phase\nAfter all GPU client programs exit, the driver executes GPU de-initialization operations, essentially \u0026ldquo;shutting down\u0026rdquo; the GPU device.\nImpact on Users\rApplication Startup Delay\rWhen GPU initialization is triggered for the first time, operations such as ECC memory checks cause a delay of 1-3 seconds. If the GPU is already initialized, there is no such delay.\nDriver State Loss\rAfter GPU de-initialization, non-persistent state information (such as power limits, clock frequency configurations, etc.) is lost and restored to default values upon the next initialization. Enabling persistent mode avoids this issue.\nPlatform Differences\rWindows Platform\rOn Windows systems, the kernel driver loads at system startup and remains running until system shutdown. 
Therefore, Windows users typically do not need to be concerned about driver persistence issues.\nNote: Driver reload events (such as TDR triggers or driver updates) will cause non-persistent state resets.\nLinux Platform\rLinux system behavior depends on the runtime environment:\nGraphical Environment\nIf the X Server runs on the target GPU, the kernel driver typically remains active from boot to shutdown, maintained by the X process connection.\nHeadless Server Environment\nOn servers without a graphical interface (Headless Server), if there is no long-running GPU client, each application start and stop will trigger driver loading and unloading. This is extremely common in High-Performance Computing (HPC) and Data Center environments, which was the root cause of this incident.\nBest Practice Recommendations\rStrongly recommended for production environments to enable GPU driver persistence, especially in headless server scenarios Use systemd service to ensure persistence configuration automatically takes effect after system reboot The monitoring system should be thoroughly tested after enabling persistence to verify the stability of metric collection Regularly check the nvidia-persistenced service status to ensure it is running properly References\rNVIDIA Driver Persistence Official Documentation ","date":"2025-11-07T00:00:00Z","image":"/p/nvidia_gpu_driver_persistence/nvidia_gpu_driver_persistence.png","permalink":"/en/p/nvidia_gpu_driver_persistence/","title":"NVIDIA GPU Driver Persistence"},{"content":"Overview\rRedroid (Remote Android) is a container-based Android runtime environment that can run a complete Android system on Linux servers. 
This article provides a detailed guide on how to build a fully functional custom Redroid image from AOSP source code, integrating advanced features such as GPS positioning, battery management, and Magisk Root.\nUse Cases\rMobile application development and testing Automated testing environment setup Android reverse engineering and security research Cloud-based Android service deployment Prerequisites\rBefore starting, please ensure your environment meets the following requirements:\nHardware Requirements\rCPU: 8+ cores recommended, x86_64 architecture Memory: At least 16GB RAM, 32GB recommended Storage: At least 100GB available space (SSD recommended) Software Environment\rOperating System: Ubuntu 20.04/22.04 LTS or another modern Linux distribution Docker: Version 20.10 or higher Git: Version 2.25 or higher Python 3: For running build scripts Network Requirements\rStable network connection (a large amount of source code must be downloaded) If you encounter network issues, the Tsinghua mirror is recommended Step 1: Environment Setup and Source Code Download\r1.1 Create Working Directory\r# Create dedicated working directory cd /data mkdir redroid && cd redroid Install Google Repo Tool\rRepo is a tool developed by Google for managing multiple Git repositories, essential for AOSP projects.\n# Download and install repo tool curl -k https://storage.googleapis.com/git-repo-downloads/repo > /usr/local/bin/repo chmod a+x /usr/local/bin/repo # Configure Git user information (required) git config --global user.email \"your-email@example.com\" git config --global user.name \"YourName\" Note: Please replace the email and username with your actual information.\nInitialize AOSP Source Repository\r# Initialize repo using Tsinghua mirror for accelerated download repo init -u https://mirrors.tuna.tsinghua.edu.cn/git/AOSP/platform/manifest \\ --git-lfs --depth=1 -b android-15.0.0_r36 # 
Add Redroid-specific local manifest configuration git clone https://git.coderkang.top/Android/local_manifests.git \\ .repo/local_manifests -b 15.0.0 Sync Source Code\r1 2 # Start syncing source code (time-consuming, please be patient) repo sync -c Tip: Initial sync may take 2-4 hours, depending on network conditions. Recommend using screen or tmux to run in background.\nStep 2: Magisk Module Integration\rMagisk is a popular Android Root solution that provides system-level permission management.\nExtract Magisk Components\r1 2 3 # Enter Magisk directory and execute extraction script cd /data/redroid/vendor/magisk python3 magisk.py This script will automatically download the latest version of Magisk APK and extract necessary binary files and modules, integrating them into the system image.\nStep 3: Apply Custom Patches\rDownload and Apply Redroid Patches\r1 2 3 4 5 6 7 8 # Return to main directory cd /data # Get Redroid-specific patch set git clone https://git.coderkang.top/Android/redroid-patches.git # Apply all patches to source code ./redroid-patches/apply-patch.sh /data/redroid These patches include:\nGPS Functionality: Enable location service support Battery Management: Simulate battery status and power management Network Enhancement: Improve network connection stability Performance Optimization: Performance tuning for container environments Step 4: Docker Build Environment Setup\rCreate Build Container\r1 2 3 4 5 6 7 8 9 10 # Get build tools git clone https://git.coderkang.top/Android/redroid-doc.git cd redroid-doc/android-builder-docker # Build Docker image (contains all build dependencies) docker build \\ --build-arg userid=$(id -u) \\ --build-arg groupid=$(id -g) \\ --build-arg username=$(id -un) \\ -t redroid-builder . 
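Before launching the builder container, a quick pre-flight check can save hours: both `repo sync` and the AOSP build fail late and expensively when disk space runs out. The following is a minimal sketch under stated assumptions — `check_free_space` is a hypothetical helper, the 100 GB figure comes from the storage requirement above, and `/data/redroid` is the working directory used in this guide.

```shell
# Pre-flight check before repo sync / m: a long build fails late when disk runs out.
# check_free_space is a hypothetical helper; the 100 GB figure comes from the
# storage requirement above, and /data/redroid is the working directory used here.
check_free_space() {
  # $1 = path to check, $2 = required free space in GB
  avail_gb=$(df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }')
  if [ "$avail_gb" -ge "$2" ]; then
    echo "ok: ${avail_gb}GB free at $1"
  else
    echo "insufficient: ${avail_gb}GB free at $1, need ${2}GB"
    return 1
  fi
}

check_free_space / 0   # always passes; in practice: check_free_space /data/redroid 100
```

Run it once before `repo sync` and again before `m`; a failing check up front is far cheaper than a build that dies hours in.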
Start Build Environment\r1 2 3 4 5 6 # Start build container, mount source code directory docker run -it --rm \\ --hostname redroid-builder \\ --name redroid-builder \\ -v /data/redroid:/src \\ redroid-builder Step 5: Compile Android System\rConfigure Build Environment\r1 2 3 4 5 6 7 8 # Execute the following commands inside the container cd /src # Initialize build environment . build/envsetup.sh # Select build target (ARM64 architecture, user debug version) lunch redroid_arm64_only-ap3a-userdebug Start Compilation\r1 2 # Start compilation process (time-consuming, recommend using -j parameter to specify parallelism) m -j$(nproc) Compilation Time: Depending on hardware configuration, compilation usually takes 1-3 hours.\nStep 6: Image Packaging and Deployment\rCreate Docker Image\rAfter compilation is complete, the generated system files need to be packaged into a Docker image.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # Exit build container, return to host machine exit # Enter compilation output directory cd /data/redroid/out/target/product/redroid_arm64_only # Mount system images (read-only) sudo mount system.img system -o ro sudo mount vendor.img vendor -o ro # Package as Docker image sudo tar --xattrs -c vendor -C system --exclude=\u0026#34;./vendor\u0026#34; . 
| \\ docker import -c \u0026#39;ENTRYPOINT [\u0026#34;/init\u0026#34;, \u0026#34;androidboot.hardware=redroid\u0026#34;, \u0026#34;ro.setupwizard.mode=DISABLED\u0026#34;]\u0026#39; \\ - redroid:custom # Unmount image files sudo umount system vendor Verify Image\r1 2 3 4 5 6 7 8 # View created image docker images | grep redroid # Test starting container docker run -itd --rm --memory-swappiness=0 \\ --name redroid-test \\ -p 5555:5555 \\ redroid:custom Function Verification and Usage\rVerify Root Permissions\r1 2 3 4 5 # Connect to Redroid container adb connect localhost:5555 # Verify Root permissions adb shell su -c \u0026#34;id\u0026#34; Verify GPS Functionality\r1 2 # Check GPS service status adb shell dumpsys location Verify Battery Management\r1 2 # View battery status adb shell dumpsys battery Common Issues and Solutions\rCompilation Errors\rIssue: Insufficient memory causing compilation failure\n1 2 # Solution: Reduce parallelism m -j4 # Use fewer parallel tasks Issue: Insufficient disk space\n1 2 # Solution: Clean compilation cache make clean Runtime Issues\rIssue: Container cannot start\n1 2 3 4 5 6 7 # Check kernel modules lsmod | grep binder lsmod | grep ashmem # If missing, load modules sudo modprobe binder_linux sudo modprobe ashmem_linux Issue: ADB connection failure\n1 2 3 # Restart ADB service adb kill-server adb start-server Optimization Recommendations\rPerformance Optimization\rMemory Configuration: Allocate sufficient memory to containers (recommended 4GB+) CPU Configuration: Enable CPU hotplug support Storage Optimization: Use SSD storage to improve I/O performance Security Considerations\rNetwork Isolation: Use custom Docker networks Permission Control: Limit container permissions, avoid privileged mode Data Persistence: Properly configure data volume mounts Summary\rThrough the detailed guidance in this article, you have successfully built a fully functional custom Redroid image. 
This image integrates advanced features such as GPS, battery management, and Magisk Root, meeting various development and testing needs.\nIn practical use, you can further customize system configurations, add more functional modules, or optimize performance parameters according to specific requirements. Redroid\u0026rsquo;s flexibility makes it an ideal choice for Android development and testing.\nNext Steps\rExplore more Magisk module integrations Configure Continuous Integration/Continuous Deployment (CI/CD) Set up multi-instance cluster environments Integrate automated testing frameworks ","date":"2025-09-17T00:00:00Z","image":"/p/custom_redroid/banner.png","permalink":"/en/p/custom_redroid/","title":"Custom Redroid Image: From Source Code Build to Feature Enhancement"},{"content":"Introduction\rIn a previous post, I explained how to configure Redroid in Docker and obtain root access with Magisk. As some open-source modules raised their minimum Magisk version requirements, this article adopts ayasa520/redroid-script to rebuild the image and update the root approach. It also introduces AutoJS to handle autostart and automation-click scenarios, documenting the complete flow from building and composing to troubleshooting.\nCompliance \u0026amp; Risk Notice\nThis article is for learning and research only. Operations involving certificates, root, system modules, and network redirection may pose compliance and security risks. Ensure you have proper authorization for any production use or actions involving third-party systems. 
Device identifiers in the article (e.g., IMEI) are examples—replace them with your own compliant, fictitious values.\nEnvironment \u0026amp; Prerequisites\rDocker / Docker Compose A Linux host capable of running Redroid ADB / Scrcpy (for debugging and screen casting) Base apps: MT Manager, Termux, AutoJsPro, JustTrustMe Magisk modules: LSPosed, AlwaysTrustUserCerts, Systemless Hosts Unified resource pack (download link at the end) Build a Redroid Image with Magisk Included\rFetch the script and install dependencies\r1 2 3 4 5 git clone https://github.com/ayasa520/redroid-script cd redroid-script apt update \u0026amp;\u0026amp; apt install -y lzip pip install -r requirements.txt Extend Android version choices (example: 14/15)\rIf you need Android 14/15, add the options in redroid.py:\n1 2 3 4 5 6 7 8 9 10 def main(): ... parser.add_argument(\u0026#39;-a\u0026#39;, \u0026#39;--android-version\u0026#39;, dest=\u0026#39;android\u0026#39;, help=\u0026#39;Specify the Android version to build\u0026#39;, default=\u0026#39;11.0.0\u0026#39;, - choices=[\u0026#39;13.0.0\u0026#39;, \u0026#39;12.0.0\u0026#39;, \u0026#39;12.0.0_64only\u0026#39;, \u0026#39;11.0.0\u0026#39;, \u0026#39;10.0.0\u0026#39;, \u0026#39;9.0.0\u0026#39;, \u0026#39;8.1.0\u0026#39;]) + choices=[\u0026#39;15.0.0_64only\u0026#39;, \u0026#39;15.0.0\u0026#39;, \u0026#39;14.0.0_64only\u0026#39;, \u0026#39;14.0.0\u0026#39;, + \u0026#39;13.0.0_64only\u0026#39;, \u0026#39;13.0.0\u0026#39;, \u0026#39;12.0.0\u0026#39;, \u0026#39;12.0.0_64only\u0026#39;, + \u0026#39;11.0.0\u0026#39;, \u0026#39;10.0.0\u0026#39;, \u0026#39;9.0.0\u0026#39;, \u0026#39;8.1.0\u0026#39;]) Build the image with Magisk\r1 2 python3 redroid.py -a 15.0.0_64only -m # New image: redroid/redroid:15.0.0_64only_magisk Launch Redroid\rExample docker-compose.yaml:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 name: android services: redroid_15_1: image: 
redroid/redroid:15.0.0_64only_magisk container_name: redroid_15_1 restart: unless-stopped privileged: true networks: android: ipv4_address: 172.18.0.252 ports: - \u0026#34;45555:5555\u0026#34; volumes: - ./redroid_15_1:/data command: - androidboot.hardware=mt6891 - androidboot.hwc=CN - androidboot.redroid_height=2400 - androidboot.redroid_width=1080 - ro.boot.hwc=CN - ro.product.manufacturer=Xiaomi - ro.product.brand=Xiaomi - ro.product.model=2211133C - ro.product.marketname=Xiaomi 13 - ro.product.device=fuxi - ro.product.name=fuxi - ro.build.product=fuxi - ro.product.mod_device=fuxi - ro.secure=0 - ro.product.locale=zh-CN - ro.product.locale.language=zh - ro.product.locale.region=CN - persist.sys.locale=zh-CN - persist.sys.locale_list=zh-CN,en-US - persist.sys.timezone=Asia/Shanghai - persist.sys.time_12_24=24 networks: android: driver: bridge ipam: config: - subnet: 172.18.0.0/16 Startup \u0026amp; connection:\n1 2 3 4 5 6 7 docker compose up -d docker compose ps docker compose logs -f redroid_15_1 adb connect 127.0.0.1:45555 adb devices adb shell whoami Once running properly, the ADB service is exposed on port 45555. Install Base Apps \u0026amp; Modules\rApps (from the unified pack at the end)\nMT Manager Termux AutoJsPro JustTrustMe Magisk modules\nLSPosed AlwaysTrustUserCerts Systemless Hosts Recommended order: install the user certificate first (see below), then enable AlwaysTrustUserCerts, and enable/adjust other modules as needed to improve Zygisk activation stability.\nRoot Authorization \u0026amp; the su Path\rSymptoms\nRunning su in Termux correctly triggers the Magisk grant dialog. MT Manager and Shizuku fail to obtain root. logcat shows no obvious errors. Resolution\nExplicitly set the su path to /sbin/su. Reference issue: https://github.com/ayasa520/redroid-script/issues/47#issuecomment-3242690759. Notes Different apps discover and invoke root differently. An explicit path avoids visibility issues caused by environment variables or mount strategies. 
If you customized PATH, SELinux, or overlayfs, evaluate their impact accordingly.\nTroubleshooting Zygisk Not Activating\rSymptom Magisk’s Zygisk keeps saying “restart required,” but remains inactive after reboot. Logs are sparse.\nReusable steps\nInstall the user certificate in system settings first. Enable AlwaysTrustUserCerts via LSPosed/module manager. Reboot Redroid so Zygisk and related modules can initialize and mount properly. If it still fails, try clearing module caches, check LSPosed version compatibility, and if necessary roll back to a stable Magisk/Redroid combination.\nDeploy AutoJS for Automation\rUse cases Execute automated clicks, heartbeats, or form-filling after specific apps launch.\nCompliance Reminder The following is for your own environment and authorized testing only. Do not apply it to apps or services without explicit permission.\nPrepare the AutoJS service\r1 2 3 4 5 6 7 # Extract autojserver.tar.gz from the resource pack to your compose directory tar -zxvf autojserver.tar.gz cd autojserver chmod +x main.sh bash main.sh # Initialization generates a self-signed certificate and other runtime files Compose setup\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 name: android services: redroid_15_1: image: redroid/redroid:15.0.0_64only_magisk container_name: redroid_15_1 restart: unless-stopped privileged: true networks: android: ipv4_address: 172.18.0.252 ports: - \u0026#34;45555:5555\u0026#34; volumes: - ./redroid_15_1:/data command: - androidboot.hardware=mt6891 - androidboot.hwc=CN - androidboot.redroid_height=2400 - androidboot.redroid_width=1080 - ro.boot.hwc=CN - ro.product.manufacturer=Xiaomi - ro.product.brand=Xiaomi - ro.product.model=2211133C - ro.product.marketname=Xiaomi 13 - ro.product.device=fuxi - ro.product.name=fuxi - ro.build.product=fuxi - ro.product.mod_device=fuxi - ro.secure=0 - ro.product.locale=zh-CN - 
ro.product.locale.language=zh - ro.product.locale.region=CN - persist.sys.locale=zh-CN - persist.sys.locale_list=zh-CN,en-US - persist.sys.timezone=Asia/Shanghai - persist.sys.time_12_24=24 autojserver: image: autojserver:latest build: context: ./autojserver dockerfile: Dockerfile container_name: autojserver restart: unless-stopped working_dir: /data networks: android: ipv4_address: 172.18.0.251 command: [\u0026#34;python\u0026#34;, \u0026#34;/data/main.py\u0026#34;] volumes: - ./autojserver:/data networks: android: driver: bridge ipam: config: - subnet: 172.18.0.0/16 Start and view logs\r1 2 3 docker compose up -d docker compose ps docker compose logs -f autojserver Inject certificates and hosts\r1 2 3 4 5 # User certificate: for manual installation in system settings cp ./autojserver/ca.crt ./redroid_15_1/media/0/Download/ # Systemless Hosts: ensure the module is installed cp -rf ./autojserver/hosts ./redroid_15_1/adb/modules/hosts/system/etc/ Then, in Settings → Security → Encryption \u0026amp; credentials (or Certificate management) → Install from storage, install the ca.crt user certificate. In JustTrustMe, check AutoJS / target apps in the scope. Reboot Redroid.\nResource Download\rUnified resource pack\nApps: MT Manager, Termux, AutoJsPro, JustTrustMe Modules: LSPosed, AlwaysTrustUserCerts, Systemless Hosts ","date":"2025-09-02T00:00:00Z","image":"/p/autojs_on_redroid/banner.png","permalink":"/en/p/autojs_on_redroid/","title":"Using AutoJS on Redroid: Build, Troubleshooting, and Automation"},{"content":"Based on PaddlePaddle deployment on Ascend 310P, use PyInstaller to package PaddleOCR.\nScope and prerequisites\rTarget: Package a PaddleOCR-based CLI program for Ascend 310P NPU runtime. OS/Arch: Linux x86_64 (example paths use Conda + Python 3.11). 
Key versions: Python: 3.11 (adjust paths if you use a different version) PyInstaller: 6.x or newer PaddlePaddle (Ascend build) and runtime for 310P is correctly installed Tip: Verify environment 1 2 3 python -V pyinstaller --version python -c \u0026#34;import paddle, cv2; print(paddle.__version__, cv2.__version__)\u0026#34; Spec file\rPackage with the following test_ocr.spec:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 # -*- mode: python ; coding: utf-8 -*- from PyInstaller.utils.hooks import copy_metadata, collect_data_files datas = [(\u0026#39;/opt/PaddleX/paddlex\u0026#39;, \u0026#39;paddlex\u0026#39;)] # Python package directory of the PaddleX project source code datas += [(\u0026#39;/root/miniconda3/envs/asr_ocr_npu/lib/python3.11/site-packages/paddle\u0026#39;, \u0026#39;paddle\u0026#39;)] # Directory of the Paddle PyPI package datas += collect_data_files(\u0026#39;lmdb\u0026#39;) datas += copy_metadata(\u0026#39;ftfy\u0026#39;) datas += copy_metadata(\u0026#39;lxml\u0026#39;) datas += copy_metadata(\u0026#39;opencv-contrib-python\u0026#39;) datas += copy_metadata(\u0026#39;pyclipper\u0026#39;) datas += copy_metadata(\u0026#39;pypdfium2\u0026#39;) datas += copy_metadata(\u0026#39;ultra-infer-npu-python\u0026#39;) datas += copy_metadata(\u0026#39;scikit-learn\u0026#39;) a = Analysis( [\u0026#39;./test_ocr.py\u0026#39;], # Entry script to be packaged pathex=[], binaries=[(\u0026#39;/root/miniconda3/envs/asr_ocr_npu/lib/python3.11/site-packages/paddle/libs\u0026#39;, \u0026#39;.\u0026#39;)], # Directory of Paddle shared libraries datas=datas, hiddenimports=[\u0026#39;cv2\u0026#39;, \u0026#39;pypdfium2\u0026#39;, \u0026#39;ultra_infer\u0026#39;, \u0026#39;paddle.cinn_config\u0026#39;], hookspath=[], hooksconfig={}, runtime_hooks=[], excludes=[], noarchive=False, optimize=0, ) pyz = PYZ(a.pure) exe = EXE( pyz, a.scripts, [], exclude_binaries=True, 
name=\u0026#39;test_ocr\u0026#39;, debug=False, bootloader_ignore_signals=False, strip=False, upx=True, console=True, disable_windowed_traceback=False, argv_emulation=False, target_arch=None, codesign_identity=None, entitlements_file=None, ) coll = COLLECT( exe, a.binaries, a.datas, strip=False, upx=True, upx_exclude=[], name=\u0026#39;test_ocr\u0026#39;, ) Build\r1 pyinstaller ./test_ocr.spec What each section does\rdatas: include packages and non-Python assets required at runtime (e.g., ftfy, lxml, opencv-contrib-python). binaries: include Paddle shared libraries under paddle/libs so the app runs without Paddle preinstalled. hiddenimports: add modules PyInstaller may miss (e.g., cv2, paddle.cinn_config, ultra_infer). UPX: reduces size. If startup issues occur, try upx=False. One-folder vs one-file: this builds a folder via COLLECT; it’s more reliable for large frameworks. ","date":"2025-08-10T00:00:00Z","image":"/p/pyinstaller_paddleocr_ascend310p/pyinstaller_paddleocr_ascend310P.png","permalink":"/en/p/pyinstaller_paddleocr_ascend310p/","title":"Using PyInstaller to package PaddleOCR on Ascend 310P environment"},{"content":"Introduction\rSMB (Server Message Block) is a network file sharing protocol widely used for sharing files and printers between different operating systems. Installing and configuring SMB service on Ubuntu 22.04 makes it convenient to share files between Linux and Windows systems. 
This article will provide detailed instructions on how to install, configure, and use SMB service on Ubuntu 22.04.\nInstalling SMB Service\rUpdate System Packages\rFirst, ensure system packages are up to date:\n1 2 sudo apt update sudo apt upgrade -y Install Samba\rSamba is the open-source software package that implements the SMB protocol on Linux systems:\n1 sudo apt install samba samba-common-bin -y Verify Installation\rCheck the Samba service status:\n1 2 sudo systemctl status smbd sudo systemctl status nmbd Configuring SMB Service\rBackup Default Configuration File\rBefore modifying the configuration file, backup the original file:\n1 sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.backup Edit Configuration File\rUse a text editor to edit the SMB configuration file:\n1 sudo nano /etc/samba/smb.conf Basic Configuration Example\rAdd the following content to the configuration file:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 [global] workgroup = WORKGROUP server string = Ubuntu SMB Server log file = /var/log/samba/log.%m max log size = 1000 logging = file panic action = /usr/share/samba/panic-action %d server role = standalone server obey pam restrictions = yes unix password sync = yes passwd program = /usr/bin/passwd %u passwd chat = *Enter\\snew\\s*\\spassword:* %n\\n *Retype\\snew\\s*\\spassword:* %n\\n *password\\supdated\\ssuccessfully* . 
pam password change = yes map to guest = bad password usershare allow guests = yes [shared] comment = Shared Folder path = /srv/samba/shared browseable = yes read only = no guest ok = yes create mask = 0755 directory mask = 0755 Create Shared Directory\rCreate the directory for sharing:\n1 2 3 sudo mkdir -p /srv/samba/shared sudo chown nobody:nogroup /srv/samba/shared sudo chmod 777 /srv/samba/shared Verify Configuration File\rUse the testparm command to verify configuration file syntax:\n1 sudo testparm User Management\rCreate SMB User\rIf user authentication is needed, first create a system user:\n1 sudo adduser smbuser Then add the user to the SMB database:\n1 sudo smbpasswd -a smbuser Enable/Disable User\rEnable SMB user:\n1 sudo smbpasswd -e smbuser Disable SMB user:\n1 sudo smbpasswd -d smbuser Delete User\rDelete SMB user:\n1 sudo smbpasswd -x smbuser Starting and Managing Services\rStart Services\rStart SMB services:\n1 2 sudo systemctl start smbd sudo systemctl start nmbd Enable Auto-start\rSet services to start automatically on boot:\n1 2 sudo systemctl enable smbd sudo systemctl enable nmbd Restart Services\rRestart SMB services:\n1 2 sudo systemctl restart smbd sudo systemctl restart nmbd Check Service Status\rCheck service running status:\n1 2 sudo systemctl status smbd sudo systemctl status nmbd Firewall Configuration\rIf firewall is enabled, open SMB related ports:\n1 sudo ufw allow \u0026#39;Samba\u0026#39; Or manually open ports:\n1 2 3 4 sudo ufw allow 139/tcp sudo ufw allow 445/tcp sudo ufw allow 137/udp sudo ufw allow 138/udp Testing Connection\rLocal Testing\rTest SMB shares locally:\n1 smbclient -L localhost Mount Share\rMount SMB share on Linux client:\n1 2 sudo mkdir /mnt/smbshare sudo mount -t cifs //SERVER_IP/shared /mnt/smbshare -o username=smbuser Windows Client\rIn Windows File Explorer address bar, enter:\n1 \\\\SERVER_IP\\shared Advanced Configuration\rHome Directory Sharing\rConfigure home directory sharing for each user:\n1 2 3 4 5 
6 7 [homes] comment = User Home Directories browseable = no read only = no create mask = 0700 directory mask = 0700 valid users = %S Read-only Share\rConfigure read-only share:\n1 2 3 4 5 6 [readonly] comment = Read-only Share path = /srv/samba/readonly browseable = yes read only = yes guest ok = yes Security Settings\rConfiguration for enhanced security:\n1 2 3 4 5 6 7 8 9 10 [global] # Disable SMB1 protocol server min protocol = SMB2 # Require SMB signing server signing = mandatory # Restrict access by IP hosts allow = 192.168.1.0/24 127.0.0.1 hosts deny = ALL Common Issues\rPermission Issues\rIf encountering permission issues, check the following settings:\nFile system permissions SELinux settings (if enabled) SMB user permissions Connection Issues\rIf unable to connect, check:\nFirewall settings Network connectivity Service status Performance Optimization\rConfiguration for optimizing SMB performance:\n1 2 3 4 5 6 7 [global] socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072 read raw = yes write raw = yes max xmit = 65535 dead time = 15 getwd cache = yes Monitoring and Logging\rView Logs\rSMB log file locations:\n1 2 sudo tail -f /var/log/samba/log.smbd sudo tail -f /var/log/samba/log.nmbd Connection Status\rView current connections:\n1 sudo smbstatus Share List\rList all shares:\n1 sudo smbclient -L localhost Summary\rThis article provides a complete guide to install and configure SMB service on Ubuntu 22.04. 
SMB service offers a convenient solution for cross-platform file sharing, suitable for both home networks and enterprise environments.\nKey points:\nRegularly backup configuration files Set appropriate user permissions Configure proper firewall rules Monitor service status and logs Adjust performance parameters as needed A properly configured SMB service will provide stable and reliable file sharing functionality for network environments.\n","date":"2025-07-04T00:00:00Z","image":"/p/ubuntu_smb/ubuntu_smb.png","permalink":"/en/p/ubuntu_smb/","title":"Installing SMB Service on Ubuntu 22.04"},{"content":"Overview\rThis article provides a comprehensive guide for deploying PaddlePaddle models on the Ascend 310P platform. It covers multiple deployment approaches, their pros and cons, implementation steps, and solutions to common issues. The guide is particularly useful for ASR (Automatic Speech Recognition) and OCR (Optical Character Recognition) applications.\nDeployment Approaches Overview\rApproach Description Status Recommendation Approach 1 Using paddle-custom-npu pip package ❌ Dependency issues ⭐ Approach 2 Compiling PaddleCustomDevice ⚠️ Precision issues ⭐⭐ Approach 3 paddle2onnx + onnxruntime_cann ⚠️ Slow inference ⭐⭐⭐ Approach 4 paddle2onnx + OM model ⚠️ Dynamic shape issues ⭐⭐⭐ Approach 5 PaddleX High-Performance Inference ✅ Recommended ⭐⭐⭐⭐⭐ Approach 1: Pip Package Installation\rOverview\rThis approach uses the official paddle-custom-npu pip package for quick deployment, but testing revealed missing runtime dependencies.\nDeployment Steps\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Install basic dependencies pip install psutil attrs decorator # Install Python package dependencies pip3 install py3Fdfs imageio pyheif whatimage shapely pyclipper minio \\ scikit-image imgaug lmdb pykafka gunicorn Pillow==9.5.0 # Install PaddlePaddle and NPU support pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ pip install paddle-custom-npu -i 
https://www.paddlepaddle.org.cn/packages/nightly/npu/ pip install paddleocr # Fix version compatibility issues pip install protobuf==3.20.0 Configuration Modifications\r1 2 3 4 5 # Modify PaddleSpeech executor configuration vim /root/miniconda3/envs/asr_ocr/lib/python3.9/site-packages/paddlespeech/cli/executor.py +92 # Modify Paddle core module configuration vim /root/miniconda3/envs/asr_ocr/lib/python3.9/site-packages/paddle/fluid/core.py +386 Issues Encountered\rMissing runtime dependencies preventing normal inference Internal errors persist even after adding related dependencies Approach 2: Compilation Installation\rOverview\rInstalling by compiling PaddleCustomDevice source code, but encountering precision support issues.\nReferences\rHuawei Ascend NPU-PaddlePaddle Deep Learning Platform PaddleCustomDevice NPU Backend Environment Setup\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # System dependencies installation apt-get update -y \u0026amp;\u0026amp; apt-get install -y \\ zlib1g zlib1g-dev libsqlite3-dev openssl libssl-dev libffi-dev \\ libbz2-dev libxslt1-dev unzip pciutils net-tools libblas-dev \\ gfortran libblas3 liblapack-dev liblapack3 libopenblas-dev git # Python dependencies installation pip install psutil attrs decorator pyyaml pathlib2 scipy requests \\ psutil absl-py sympy numpy==1.25.0 scipy # CANN toolkit installation ./Ascend-cann-kernels-310p_8.0.RC2_linux.run --install ./Ascend-cann-nnal_8.0.RC2_linux-aarch64.run --install # Operator package installation wget -q https://paddle-ascend.bj.bcebos.com/code-share-master.zip --no-check-certificate . 
/usr/local/Ascend/ascend-toolkit/set_env.sh unzip code-share-master.zip cd code-share-master/build \u0026amp;\u0026amp; bash build_ops.sh chmod +x aie_ops.run \u0026amp;\u0026amp; ./aie_ops.run --extract=/usr/local/Ascend/ Environment Variables Configuration\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Log level configuration (0:debug 1:info 2:warning 3:error 4:null) export ASCEND_GLOBAL_LOG_LEVEL=3 # HCCL configuration export HCCL_CONNECT_TIMEOUT=7200 export HCCL_WHITELIST_DISABLE=1 export HCCL_SECURITY_MODE=1 export HCCL_BUFFSIZE=120 # PaddlePaddle NPU configuration export FLAGS_npu_storage_format=0 export FLAGS_use_stride_kernel=0 export FLAGS_allocator_strategy=naive_best_fit export PADDLE_XCCL_BACKEND=npu Compilation and Installation\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Enter NPU backend directory cd PaddleCustomDevice/backends/npu # Install PaddlePaddle CPU version pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ # Configure compilation options export WITH_TESTING=OFF # Execute compilation bash tools/compile.sh # Install compilation artifacts pip install build/dist/paddle_custom_npu*.whl Functionality Verification\r1 2 3 4 5 6 7 8 9 # Check available hardware backends python -c \u0026#34;import paddle; print(paddle.device.get_all_custom_device_type())\u0026#34; # Expected output: [\u0026#39;npu\u0026#39;] # Check version information python -c \u0026#34;import paddle_custom_device; paddle_custom_device.npu.version()\u0026#34; # PaddlePaddle health check python -c \u0026#34;import paddle; paddle.utils.run_check()\u0026#34; Issues Encountered\rError: The soc version does not support bf16 / fp32 for calculations, please change the setting of cubeMathType or the Dtype of input tensor.\nApproach 3: ONNX Runtime CANN Solution\rOverview\rUsing paddle2onnx for model conversion, combined with onnxruntime_cann\u0026rsquo;s CANNExecutionProvider for inference.\nDependencies Installation\r1 2 3 4 5 6 7 8 9 10 11 # Basic dependencies pip install 
psutil attrs decorator pyyaml pathlib2 scipy requests \\ psutil absl-py sympy numpy==1.25.0 scipy packaging # Image processing dependencies pip install opencv-python Pillow==9.5.0 # Project dependencies pip install flask py3Fdfs imageio pyheif whatimage shapely pyclipper \\ minio scikit-image imgaug lmdb pykafka gunicorn protobuf==3.20.0 \\ Pyinstaller nacos-python-sdk filetype Model Conversion\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 # OCR recognition model conversion paddle2onnx --model_dir ./ch_PP-OCRv4_rec_infer \\ --model_filename inference.pdmodel \\ --params_filename inference.pdiparams \\ --save_file ./ch_PP-OCRv4_rec_infer.onnx \\ --opset_version 11 \\ --enable_onnx_checker True # OCR detection model conversion paddle2onnx --model_dir ./ch_PP-OCRv4_det_infer \\ --model_filename inference.pdmodel \\ --params_filename inference.pdiparams \\ --save_file ./ch_PP-OCRv4_det_infer.onnx \\ --opset_version 11 \\ --enable_onnx_checker True # Text direction classification model conversion paddle2onnx --model_dir ./ch_ppocr_mobile_v2.0_cls_infer \\ --model_filename inference.pdmodel \\ --params_filename inference.pdiparams \\ --save_file ./ch_ppocr_mobile_v2.0_cls.onnx \\ --opset_version 11 \\ --enable_onnx_checker True Performance Optimization\rNPU inference performance optimization analysis:\nPerformance Issue Causes:\nJIT Compilation Delay: NPU uses aclop JIT operator library, requiring operator compilation and caching on first run Operator Fallback: Unsupported operators fall back to CPU, causing frequent memory copying Optimization Solutions:\n1 2 3 # Disable JIT compilation, use pre-compiled operators export FLAGS_npu_jit_compile=0 export FLAGS_use_stride_kernel=0 Warm-up Preheating: Perform 5-10 warm-up runs before formal inference testing.\nOperator Caching Mechanism: After execution, a kernel_meta directory is generated containing operator cache files to improve subsequent execution performance.\nIssues Encountered\rNo significant 
inference speed improvement compared to CPU Manual warm-up operations required kernel_meta cache files may consume significant disk space Approach 4: OM Model Deployment\rOverview\rConverting ONNX models to Ascend-specific OM model format for inference using ACL interfaces.\nModel Conversion Process\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 # Step 1: Paddle model to ONNX paddle2onnx --model_dir ./ch_PP-OCRv4_rec_infer \\ --model_filename inference.pdmodel \\ --params_filename inference.pdiparams \\ --save_file ./ch_PP-OCRv4_rec_infer.onnx \\ --opset_version 11 \\ --enable_onnx_checker True # Step 2: ONNX model to OM (atc appends the .om suffix) atc --model=./rec/ch_PP-OCRv4_rec_infer.onnx \\ --framework=5 \\ --input_format=NCHW \\ --output=./rec/ch_PP-OCRv4_rec_infer \\ --soc_version=Ascend310P3 \\ --input_shape=\u0026#34;x:1,3,48,320\u0026#34; Technical Challenges\rDynamic Shape Issues:\n1 2 3 4 5 # Model information example Input name : x Input shape : [\u0026#39;DynamicDimension.0\u0026#39;, 3, \u0026#39;DynamicDimension.1\u0026#39;, \u0026#39;DynamicDimension.2\u0026#39;] Output name : sigmoid_0.tmp_0 Output shape: [\u0026#39;DynamicDimension.3\u0026#39;, 1, \u0026#39;DynamicDimension.4\u0026#39;, \u0026#39;DynamicDimension.5\u0026#39;] Memory Allocation Issues: get_output_size_by_index returns 0, causing acl.rt.malloc memory allocation failure.\nSolutions\rSystem Auto-allocation: Create empty aclDataBuffer, system automatically allocates memory User Pre-allocation: Pre-allocate memory based on maximum possible output Issues Encountered\rIncomplete dynamic shape support Complex memory management Difficult debugging Approach 5: PaddleX High-Performance Inference (Recommended)\rOverview\rUsing PaddleX\u0026rsquo;s high-performance inference plugin, supporting fixed-shape OM model inference with excellent performance and stability.\nInstallation and Configuration\r1 2 3 4 5 6 7 # Install PaddleX git clone https://github.com/PaddlePaddle/PaddleX.git cd PaddleX pip install -e 
\u0026#34;.[base]\u0026#34; # Install high-performance inference plugin paddlex --install hpi-npu Manual Compilation (Optional)\r1 2 3 4 5 6 7 8 9 10 11 12 cd PaddleX/libs/ultra-infer/python unset http_proxy https_proxy # Configure compilation options export ENABLE_OM_BACKEND=ON ENABLE_ORT_BACKEND=ON export ENABLE_PADDLE_BACKEND=OFF WITH_GPU=OFF DEVICE_TYPE=NPU export NPU_HOST_LIB=/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/lib64 # Compile and install python setup.py build python setup.py bdist_wheel pip install dist/ultra_infer_npu*.whl Model Conversion\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 paddlex --install paddle2onnx # Convert to ONNX using PaddleX paddlex --paddle2onnx \\ --opset_version 11 \\ --paddle_model_dir \u0026lt;PaddlePaddle_model_directory\u0026gt; \\ --onnx_model_dir \u0026lt;ONNX_model_directory\u0026gt; # Convert to OM model atc --model=inference.onnx \\ --framework=5 \\ --output=inference \\ --soc_version=Ascend310P3 \\ --input_shape \u0026#34;x:1,3,48,320\u0026#34; # FP32 precision conversion atc --model=inference.onnx \\ --framework=5 \\ --output=inference \\ --soc_version=Ascend310P3 \\ --input_shape \u0026#34;x:1,3,48,320\u0026#34; \\ --precision_mode_v2=origin Inference Code Example\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 
213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 # -*- encoding=utf-8 -*- \u0026#34;\u0026#34;\u0026#34; @Author: Kang @Modified by: @Datetime: 2025/07/15 13:57 @Description: OCR test script \u0026#34;\u0026#34;\u0026#34; import os import time import copy from pathlib import Path from typing import List, Dict, Any import cv2 import numpy as np from loguru import logger from paddlex import create_model class OCRProcessor: \u0026#34;\u0026#34;\u0026#34;OCR processor class\u0026#34;\u0026#34;\u0026#34; def __init__(self, det_model_dir: str = \u0026#34;/opt/models/ocr/PP-OCRv4_server_det_infer_om_310P\u0026#34;, rec_model_dir: str = \u0026#34;/opt/models/ocr/PP-OCRv4_server_rec_infer_om_310P\u0026#34;, ori_model_dir: str = \u0026#34;/opt/models/ocr/PP-LCNet_x1_0_textline_ori_infer\u0026#34;, device: str = \u0026#34;npu:0\u0026#34;, output_dir: str = \u0026#34;/opt/output/\u0026#34;): self.output_dir = Path(output_dir) self.output_dir.mkdir(parents=True, exist_ok=True) # Configure parameters hpi_config = { \u0026#34;auto_config\u0026#34;: False, \u0026#34;backend\u0026#34;: \u0026#34;om\u0026#34;, } # Initialize classification model logger.info(\u0026#34;Loading classification model...\u0026#34;) self.model_ori = create_model( model_name=\u0026#34;PP-LCNet_x1_0_textline_ori\u0026#34;, model_dir=ori_model_dir, ) # Initialize detection model logger.info(\u0026#34;Loading detection model...\u0026#34;) self.model_det = create_model( model_name=\u0026#34;PP-OCRv4_server_det\u0026#34;, model_dir=det_model_dir, device=device, 
use_hpip=True, hpi_config=hpi_config, input_shape=[3, 640, 480] ) # Initialize recognition model logger.info(\u0026#34;Loading recognition model...\u0026#34;) self.model_rec = create_model( model_name=\u0026#34;PP-OCRv4_server_rec\u0026#34;, model_dir=rec_model_dir, device=device, use_hpip=True, hpi_config=hpi_config, input_shape=[3, 48, 320] ) logger.info(\u0026#34;Model loaded successfully\u0026#34;) def _crop_by_polys(self, img: np.ndarray, dt_polys: List[list]) -\u0026gt; List[dict]: \u0026#34;\u0026#34;\u0026#34; Call method to crop images based on detection boxes. Args: img (nd.ndarray): The input image. dt_polys (list[list]): List of detection polygons. Returns: list[dict]: A list of dictionaries containing cropped images and their sizes. Raises: NotImplementedError: If det_box_type is not \u0026#39;quad\u0026#39; or \u0026#39;poly\u0026#39;. \u0026#34;\u0026#34;\u0026#34; dt_boxes = np.array(dt_polys) output_list = [] for bno in range(len(dt_boxes)): tmp_box = copy.deepcopy(dt_boxes[bno]) img_crop = self.get_minarea_rect_crop(img, tmp_box) output_list.append(img_crop) return output_list def get_minarea_rect_crop(self, img: np.ndarray, points: np.ndarray) -\u0026gt; np.ndarray: \u0026#34;\u0026#34;\u0026#34; Get the minimum area rectangle crop from the given image and points. Args: img (np.ndarray): The input image. points (np.ndarray): A list of points defining the shape to be cropped. Returns: np.ndarray: The cropped image with the minimum area rectangle. 
\u0026#34;\u0026#34;\u0026#34; bounding_box = cv2.minAreaRect(np.array(points).astype(np.int32)) points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0]) index_a, index_b, index_c, index_d = 0, 1, 2, 3 if points[1][1] \u0026gt; points[0][1]: index_a = 0 index_d = 1 else: index_a = 1 index_d = 0 if points[3][1] \u0026gt; points[2][1]: index_b = 2 index_c = 3 else: index_b = 3 index_c = 2 box = [points[index_a], points[index_b], points[index_c], points[index_d]] crop_img = self.get_rotate_crop_image(img, np.array(box)) return crop_img def _rotate_image(self, image_array_list: List[np.ndarray], rotate_angle_list: List[int]): assert len(image_array_list) == len( rotate_angle_list ), f\u0026#34;Length of image ({len(image_array_list)}) must match length of angle ({len(rotate_angle_list)})\u0026#34; for angle in rotate_angle_list: assert angle in [0, 1], f\u0026#34;rotate_angle must be 0 or 1, now it\u0026#39;s {angle}\u0026#34; rotated_images = [] for image_array, rotate_indicator in zip(image_array_list, rotate_angle_list): # Convert 0/1 indicator to actual rotation angle rotate_angle = rotate_indicator * 180 if rotate_angle \u0026lt; 0 or rotate_angle \u0026gt;= 360: raise ValueError(\u0026#34;`angle` should be in range [0, 360)\u0026#34;) if rotate_angle \u0026lt; 1e-7: rotated_images.append(image_array) continue # Should we align corners? 
h, w = image_array.shape[:2] center = (w / 2, h / 2) scale = 1.0 mat = cv2.getRotationMatrix2D(center, rotate_angle, scale) cos = np.abs(mat[0, 0]) sin = np.abs(mat[0, 1]) new_w = int((h * sin) + (w * cos)) new_h = int((h * cos) + (w * sin)) mat[0, 2] += (new_w - w) / 2 mat[1, 2] += (new_h - h) / 2 dst_size = (new_w, new_h) rotated = cv2.warpAffine( image_array, mat, dst_size, flags=cv2.INTER_CUBIC, ) rotated_images.append(rotated) logger.info(f\u0026#34;Number of rotated images: {len(rotated_images)}\u0026#34;) return rotated_images def get_rotate_crop_image(self, img: np.ndarray, points: list) -\u0026gt; np.ndarray: \u0026#34;\u0026#34;\u0026#34; Crop and rotate the input image based on the given four points to form a perspective-transformed image. Args: img (np.ndarray): The input image array. points (list): A list of four 2D points defining the crop region in the image. Returns: np.ndarray: The transformed image array. \u0026#34;\u0026#34;\u0026#34; assert len(points) == 4, \u0026#34;shape of points must be 4*2\u0026#34; img_crop_width = int( max( np.linalg.norm(points[0] - points[1]), np.linalg.norm(points[2] - points[3]), ) ) img_crop_height = int( max( np.linalg.norm(points[0] - points[3]), np.linalg.norm(points[1] - points[2]), ) ) pts_std = np.float32( [ [0, 0], [img_crop_width, 0], [img_crop_width, img_crop_height], [0, img_crop_height], ] ) M = cv2.getPerspectiveTransform(points, pts_std) dst_img = cv2.warpPerspective( img, M, (img_crop_width, img_crop_height), borderMode=cv2.BORDER_REPLICATE, flags=cv2.INTER_CUBIC, ) dst_img_height, dst_img_width = dst_img.shape[0:2] if dst_img_height * 1.0 / dst_img_width \u0026gt;= 1.5: dst_img = np.rot90(dst_img) return dst_img def process_image(self, image_path: str) -\u0026gt; List[Dict[str, Any]]: \u0026#34;\u0026#34;\u0026#34; Process single image for OCR recognition Args: image_path: Image path Returns: List of recognition results \u0026#34;\u0026#34;\u0026#34; if not os.path.exists(image_path): 
logger.error(f\u0026#34;Image file does not exist: {image_path}\u0026#34;) return [] try: # Read image image = cv2.imread(image_path) if image is None: logger.error(f\u0026#34;Failed to read image: {image_path}\u0026#34;) return [] logger.info(f\u0026#34;Start processing image: {image_path}\u0026#34;) start_time = time.time() # Text detection logger.info(f\u0026#34;Start text detection\u0026#34;) det_start = time.time() output_det = self.model_det.predict(image) det_time = time.time() - det_start logger.info(f\u0026#34;Text detection time: {det_time}s\u0026#34;) results = [] result_det = [] for res in output_det: det_polys = res.get(\u0026#34;dt_polys\u0026#34;, []) det_scores = res.get(\u0026#34;dt_scores\u0026#34;, []) for idx, det_poly in enumerate(det_polys): result_det.append({ \u0026#34;idx\u0026#34;: idx, \u0026#34;dt_polys\u0026#34;: det_poly, \u0026#34;dt_scores\u0026#34;: det_scores[idx], }) logger.info(f\u0026#34;Detected {len(det_polys)} text regions\u0026#34;) images_det = self._crop_by_polys(image, det_polys) for idx, img in enumerate(images_det): cv2.imwrite(f\u0026#34;{self.output_dir}/cropped_det_{idx}.png\u0026#34;, img) # Text direction classification logger.info(f\u0026#34;Start text direction classification\u0026#34;) ori_start = time.time() output_ori = self.model_ori.predict(images_det) ori_time = time.time() - ori_start logger.info(f\u0026#34;Text direction classification time: {ori_time}s\u0026#34;) angles = [ int(ori_res[\u0026#34;class_ids\u0026#34;][0]) for ori_res in output_ori ] images_ori = self._rotate_image(images_det, angles) for idx, img in enumerate(images_ori): cv2.imwrite(f\u0026#34;{self.output_dir}/cropped_ori_{idx}.png\u0026#34;, img) # Text recognition logger.info(f\u0026#34;Start text recognition\u0026#34;) rec_start = time.time() for item in result_det: output_rec = self.model_rec.predict(images_ori[item[\u0026#34;idx\u0026#34;]]) for rec_res in output_rec: rec_text = rec_res.get(\u0026#34;rec_text\u0026#34;, 
\u0026#34;\u0026#34;) rec_score = rec_res.get(\u0026#34;rec_score\u0026#34;, 0) results.append({ \u0026#34;idx\u0026#34;: item[\u0026#34;idx\u0026#34;], \u0026#34;dt_polys\u0026#34;: item[\u0026#34;dt_polys\u0026#34;].tolist(), \u0026#34;dt_scores\u0026#34;: item[\u0026#34;dt_scores\u0026#34;], \u0026#34;rec_res\u0026#34;: rec_text, \u0026#34;rec_score\u0026#34;: rec_score, }) rec_time = time.time() - rec_start logger.info(f\u0026#34;Detection time-text recognition: {rec_time}s\u0026#34;) total_time = time.time() - start_time logger.info(f\u0026#34;Total processing time: {total_time:.3f}s\u0026#34;) return results except Exception as e: logger.exception(f\u0026#34;Failed to process image: {e}\u0026#34;) return [] def batch_process(self, image_paths: List[str]) -\u0026gt; Dict[str, List[Dict[str, Any]]]: \u0026#34;\u0026#34;\u0026#34; Batch process images Args: image_paths: List of image paths Returns: Batch processing results \u0026#34;\u0026#34;\u0026#34; batch_results = {} for image_path in image_paths: results = self.process_image(image_path) batch_results[image_path] = results return batch_results def main(): \u0026#34;\u0026#34;\u0026#34;Main function\u0026#34;\u0026#34;\u0026#34; # Initialize OCR processor ocr_processor = OCRProcessor( output_dir=\u0026#34;/opt/output/\u0026#34; ) # Process single image image_path = \u0026#34;/opt/test/test.png\u0026#34; results = ocr_processor.process_image(image_path) # Output results print(\u0026#34;\\n=== OCR results ===\u0026#34;) for result in results: print(result) if __name__ == \u0026#34;__main__\u0026#34;: main() Supported Models\rModel Type Model Name Input Shape Chip Support Text Detection PP-OCRv4_mobile_det (1,3,640,480) 910B/310P/310B Text Recognition PP-OCRv4_mobile_rec (1,3,48,320) 910B/310P/310B Image Classification ResNet50 (1,3,224,224) 910B/310P/310B Object Detection RT-DETR-L Multi-input 910B/310P/310B Summary and Recommendations\rApproach Comparison\rApproach 5 (PaddleX): ✅ Highly Recommended - Official 
support, excellent performance, stable and reliable Approach 3 (ONNX Runtime): ⚠️ Usable but average performance, suitable for quick validation Approach 4 (OM Model): ⚠️ High technical difficulty, suitable for deep customization Approaches 1 \u0026amp; 2: ❌ Not recommended due to multiple issues Best Practices\rPrioritize Approach 5: Use PaddleX high-performance inference plugin Fixed Input Shapes: Avoid complexity introduced by dynamic shapes Model Warm-up: Perform warm-up operations before inference Performance Monitoring: Use Ascend CANN Profiling tools for performance optimization Version Management: Keep CANN toolkit and PaddleX versions synchronized Performance Optimization Recommendations\rUse FP16 precision to improve inference speed Configure batch size appropriately Utilize operator caching mechanisms Regularly clean up kernel_meta temporary files Through this comprehensive guide, developers can select the appropriate deployment approach based on their specific requirements and reference corresponding troubleshooting methods to successfully deploy PaddlePaddle models on the Ascend 310P platform.\n","date":"2025-06-11T00:00:00Z","image":"/p/paddle_on_ascend310p/paddle_on_ascend310P.png","permalink":"/en/p/paddle_on_ascend310p/","title":"PaddlePaddle Deployment on Ascend 310P: A Comprehensive Guide"},{"content":"In the long history of computer development, boot firmware plays a crucial role. From the early BIOS to today\u0026rsquo;s powerful UEFI, it has continuously evolved to meet the growing performance, security, and management needs of hardware.\nOrigin: The Golden Age of BIOS\rIn the 1980s, IBM PC introduced BIOS (Basic Input/Output System), which was etched into the motherboard ROM and took on three major responsibilities for computer startup:\nPower-On Self-Test (POST): Detecting whether CPU, memory, graphics cards, keyboards, and other hardware are available. 
MBR Loading and Hardware Interrupt Interface: Reading the primary bootloader from the first sector of the hard disk (Master Boot Record, 512 bytes), and providing unified hardware access capabilities through interrupt calls such as INT 13h/INT 10h. Although the BIOS architecture was simple and highly compatible, it encountered bottlenecks in the following aspects:\nDisk Capacity Limitations: MBR supports a maximum of 2 TB and up to 4 primary partitions. Difficult Firmware Updates: Driver code is hardcoded in ROM, making it inflexible for extension or patching. Lack of Graphics and Network Capabilities: Only providing the most basic text interface and simple PXE network microcode loading. Disruption: The Emergence of UEFI\rEntering the 21st century, Intel initiated the UEFI (Unified Extensible Firmware Interface) alliance, aiming to create a more flexible, extensible, and secure boot environment.\nGPT Partition Support: Breaking through TB-level storage limitations, capable of managing hundreds of partitions. Modular Drivers: Extending hardware support through loadable .efi drivers without flashing ROM. Modern Graphical Interface: Supporting GUI, mouse, and multiple languages for more user-friendly interactions. Security and Remote Management: Verifying signatures of boot programs and operating systems at each stage to prevent tampering and rootkits (Secure Boot). Built-in HTTP/FTP clients and UEFI Shell for remote updates or script execution. Brief UEFI Boot Process\rParallel Firmware Initialization: Multi-threaded loading of hardware drivers and firmware components. Scanning ESP: Reading the EFI System Partition (FAT32 format), looking for .efi files under the \\EFI\\ path. Running Boot Manager: Executing EFI applications according to the boot option order stored in NVRAM. Loading Kernel: EFI applications (such as bootx64.efi) boot the operating system kernel and transfer control to it. 
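The boot-manager step above can be illustrated with a small resolver: the firmware reads the BootOrder variable from NVRAM and tries each Boot#### entry in turn until it finds a loader that actually exists on the ESP, falling back to the default \EFI\BOOT\BOOTX64.EFI path otherwise. A minimal Python sketch — the entry table and ESP contents below are hypothetical examples, not real NVRAM data:

```python
# Sketch of UEFI boot-manager selection: NVRAM stores a BootOrder list
# plus Boot#### entries; the firmware tries each entry in order until
# one points at a loader file that is present on the ESP.
# The entries below are hypothetical examples, not real NVRAM contents.

boot_entries = {
    "0000": {"label": "Windows Boot Manager", "path": r"\EFI\Microsoft\Boot\bootmgfw.efi"},
    "0001": {"label": "ubuntu", "path": r"\EFI\ubuntu\shimx64.efi"},
    "0002": {"label": "UEFI Shell", "path": r"\EFI\tools\shellx64.efi"},
}
boot_order = ["0001", "0000", "0002"]

def select_boot_entry(order, entries, esp_files):
    """Return (number, label) of the first BootOrder entry whose loader exists on the ESP."""
    for num in order:
        entry = entries.get(num)
        if entry and entry["path"] in esp_files:
            return num, entry["label"]
    return None, None  # firmware then falls back to \EFI\BOOT\BOOTX64.EFI

# Simulate an ESP where the Ubuntu shim has been deleted:
esp = {r"\EFI\Microsoft\Boot\bootmgfw.efi", r"\EFI\tools\shellx64.efi"}
print(select_boot_entry(boot_order, boot_entries, esp))  # ('0000', 'Windows Boot Manager')
```

This also shows why a missing or renamed .efi file silently shifts boot to the next entry rather than producing an error in most firmware implementations.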
Comparative Summary\rFeature BIOS + MBR UEFI + GPT Disk Support ≤ 2 TB; max 4 partitions Theoretically ≥ 9.4 ZB; max 128 partitions Drivers \u0026amp; Extensions ROM fixed, difficult to update Dynamic loading of .efi modules, easy to upgrade Boot Interface Text or minimal graphics Rich GUI, mouse, multi-language support Secure Boot Not supported Supports signature verification, anti-tampering Network Functionality Only simple PXE boot Supports HTTP/FTP, remote management Real-World Choices and Migration\rLegacy Devices \u0026amp; Embedded Systems: BIOS+MBR remains effective due to small size and strong compatibility. Modern PCs \u0026amp; Servers: UEFI+GPT is standard, improving boot speed, security, and large disk support. Dual/Multi-Boot Systems: Recommended to standardize on UEFI mode, using BCD for Windows and shim+GRUB EFI for Linux to reduce boot conflicts. Migration Key Points:\nBIOS → UEFI: Back up data, use Windows mbr2gpt or Linux gdisk to convert to GPT, create an ESP ≥100 MB, switch firmware mode, and rebuild the boot. UEFI → BIOS: Convert GPT to MBR (note the loss of GPT information), rewrite traditional MBR boot code, and adjust firmware to Legacy mode. Common Boot Issues and Troubleshooting Techniques\rUnable to Recognize ESP Check partition type: ESP must be set to EF00 (GPT) or FAT32 (MBR). Verify files: Confirm that \\EFI\\BOOT\\BOOTX64.EFI (or corresponding platform) exists. Secure Boot Errors Disable or configure: Enter firmware settings, temporarily disable Secure Boot, or import the correct public keys (PK/KEK). Missing Boot Options Use efibootmgr (Linux) or bcdedit (Windows) to recreate boot entries. Unable to Boot After Switching Disk Mode Check controller mode: After switching IDE/RAID/AHCI, kernel modules and driver signatures need to be updated synchronously. Advanced Customization and Extensions\rUEFI Shell Automation: Write .nsh scripts to automatically mount partitions, perform self-tests, or remotely download firmware updates. 
Driver Injection: Place vendor .efi drivers in ESP to extend native support for NVMe, RAID controllers, etc. Variable Management: Use dmpstore (Shell) or efivar tools to read and write firmware variables, enabling automated startup or log collection. Open Firmware Alternatives: Such as TianoCore (OVMF), Coreboot, can be customized for size and functionality as needed. ","date":"2025-04-23T00:00:00Z","image":"/p/bios_and_uefi/cover.png","permalink":"/en/p/bios_and_uefi/","title":"BIOS and UEFI: The Boot Journey from History to Modern Era"},{"content":"Introduction\rHelm is a powerful tool for managing Kubernetes applications. Through Helm Charts, you can define, install, and upgrade even the most complex Kubernetes applications.\nHelm Charts are easy to create, version control, share, and publish. This means you can use Helm more efficiently, avoiding the tedious process of manually copying and pasting configuration files.\nHelm has graduated from CNCF (Cloud Native Computing Foundation) and is continuously maintained by the Helm community, ensuring its stability and ongoing development.\nManaging Complexity\nHelm Charts can describe even the most complex applications, providing reproducible installation workflows and serving as the single source of truth for application configuration to ensure consistency and reliability.\nEasy Updates\nWith Helm, you can effortlessly manage application updates through in-place upgrades and custom hooks, minimizing disruptions caused by updates.\nSimple Sharing\nHelm Charts are easy to version control, share, and host on public or private repositories, facilitating team collaboration and application distribution.\nRollbacks\nIf issues arise during updates, the helm rollback command allows you to easily revert to a previous release version, ensuring stable application operation.\nHelm\u0026rsquo;s Workflow\rThe workflow of Helm can be understood as follows:\nInstall Charts: Helm installs Charts into a Kubernetes cluster, creating a 
new Release each time. Manage Releases: Each Release is independently managed, allowing the same Chart to be deployed multiple times to meet different requirements. Find Charts: Charts can be found and obtained through Helm\u0026rsquo;s Chart Repository, allowing for new Charts to be deployed. Overall Diagram Representation:\n1 2 3 4 5 6 7 8 9 10 11 12 13 +---------------------+ +---------------------+ | Chart Repository | \u0026lt;------\u0026gt; | Helm | | (Store all Charts) | | (Install and Manage Charts) | +---------------------+ +---------+-----------+ | v +--------------------+ | Kubernetes Cluster | | ------------------ | | Release A | | Release B | | ... | +--------------------+ Three Core Concepts\rChart\rChart is a Helm package. It contains all the resource definitions required to run an application, tool, or service in a Kubernetes cluster. It can be compared to the Kubernetes version of Homebrew recipes, dpkg, or RPM files in Apt or Yum.\nDiagram Representation:\n1 2 3 4 5 6 7 8 +-----------------+ | Chart | | --------------- | | Deployment.yaml | | Service.yaml | | ConfigMap.yaml | | ... 
| +-----------------+ Helm will deploy the Chart in the following order:\nNamespace \u0026mdash;\u0026gt; NetworkPolicy \u0026mdash;\u0026gt; ResourceQuota \u0026mdash;\u0026gt; LimitRange \u0026mdash;\u0026gt; PodSecurityPolicy \u0026mdash;\u0026gt; PodDisruptionBudget \u0026mdash;\u0026gt; ServiceAccount \u0026mdash;\u0026gt; Secret \u0026mdash;\u0026gt; SecretList \u0026mdash;\u0026gt; ConfigMap \u0026mdash;\u0026gt; StorageClass \u0026mdash;\u0026gt; PersistentVolume \u0026mdash;\u0026gt; PersistentVolumeClaim \u0026mdash;\u0026gt; CustomResourceDefinition \u0026mdash;\u0026gt; ClusterRole \u0026mdash;\u0026gt; ClusterRoleList \u0026mdash;\u0026gt; ClusterRoleBinding \u0026mdash;\u0026gt; ClusterRoleBindingList \u0026mdash;\u0026gt; Role \u0026mdash;\u0026gt; RoleList \u0026mdash;\u0026gt; RoleBinding \u0026mdash;\u0026gt; RoleBindingList \u0026mdash;\u0026gt; Service \u0026mdash;\u0026gt; DaemonSet \u0026mdash;\u0026gt; Pod \u0026mdash;\u0026gt; ReplicationController \u0026mdash;\u0026gt; ReplicaSet \u0026mdash;\u0026gt; Deployment \u0026mdash;\u0026gt; HorizontalPodAutoscaler \u0026mdash;\u0026gt; StatefulSet \u0026mdash;\u0026gt; Job \u0026mdash;\u0026gt; CronJob \u0026mdash;\u0026gt; Ingress \u0026mdash;\u0026gt; APIService\nRepository\rRepository is a collection of Charts. It can be compared to the Apt or Yum repository.\nDiagram Representation:\n1 2 3 4 5 6 7 +---------------------+ | Chart Repository | | ------------------- | | Chart A Chart B | | Chart C Chart D | | ... | +---------------------+ Release\rRelease is a running instance of a Chart. A Chart can be installed multiple times in a Kubernetes cluster, each with its own Release. 
For example, if you have a MySQL Chart, you can install it twice in the same cluster, each with its own Release.\nDiagram Representation:\n1 2 3 4 5 6 7 +------------------+ +------------------+ | Release A | | Release B | |------------------| |------------------| | MySQL Chart | | MySQL Chart | | Release Name: A | | Release Name: B | | ... | | ... | +------------------+ +------------------+ Installation\rEach release of Helm provides binary releases for a variety of OSes. These binary versions can be manually downloaded and installed.\nDownload your desired version Unpack it (tar -zxvf helm-v3.0.0-linux-amd64.tar.gz) Find the helm binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm) Create Project Templates\rCreate a Helm project template using the command:\n1 helm create mychart This generates the following files and directory structure:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 . ├── charts # Directory for dependency Charts ├── Chart.yaml # Metadata describing the Chart ├── templates # Directory for template files │ ├── deployment.yaml # Deployment template │ ├── _helpers.tpl # Helper templates │ ├── hpa.yaml # Horizontal Pod Autoscaler template │ ├── ingress.yaml # Ingress template │ ├── NOTES.txt # Installation notes │ ├── serviceaccount.yaml # Service Account template │ ├── service.yaml # Service template │ └── tests # Test templates │ └── test-connection.yaml # Connection test template └── values.yaml # Default values overriding template parameters Summary\rHelm simplifies the deployment and management of applications, tools, and services in Kubernetes clusters through three core concepts:\nCharts: Define applications Repositories: Enable sharing and distribution of Charts Releases: Manage running instances of these applications in clusters The charts provide application definitions, repositories facilitate sharing, and releases handle the lifecycle of deployed 
instances.\n","date":"2025-01-18T00:00:00Z","permalink":"/en/p/kubernetes_helm_guide/","title":"Kubernetes Package Management: The Ultimate Helm Guide"},{"content":"Why Block Foreign IP Addresses\rServers on public networks face severe security threats:\nNumerous scanners on the internet continuously probe servers 24/7, attempting to gain unauthorized access and control Server log analysis reveals that most attacks originate from foreign servers in countries like the Netherlands, United States, Singapore, and Japan Whether using cloud servers or IDC-hosted servers, exposing service ports increases vulnerability whenever public services are offered Solution Overview\rFor services primarily targeting domestic users, blocking foreign IP access can significantly enhance security.\nTechnical Foundation\rIptables: Linux firewall tool used to filter and block requests Ipset module: Iptables extension that efficiently handles large IP address ranges IPdeny: Provides regularly updated global IP address allocation data Implementation Approach\rCollect and organize domestic IP ranges into Ipset Configure Iptables to use Ipset for checking source IPs Allow domestic IP access while blocking foreign IP connections Complete Implementation Steps\rThis guide is based on CentOS 7.6; commands may vary across different Linux distributions\nInstall Required Tools\r1 2 # If ipset is not already installed yum install -y ipset Create IP Address Set\rDownload Domestic IP Ranges\r1 wget http://www.ipdeny.com/ipblocks/data/countries/cn.zone Convert to Ipset Commands\r1 2 for i in `cat cn.zone`; do echo \u0026#34;ipset add china $i\u0026#34; \u0026gt;\u0026gt;ipset_result.sh; done chmod +x ipset_result.sh Create and Populate Ipset Collection\r1 2 3 4 5 6 7 8 9 10 # Create the china set ipset create china hash:net hashsize 10000 maxelem 1000000 # Add private network IP ranges echo \u0026#34;ipset add china 10.0.0.0/8\u0026#34; \u0026gt;\u0026gt; ipset_result.sh echo \u0026#34;ipset add china 
172.16.0.0/12\u0026#34; \u0026gt;\u0026gt; ipset_result.sh echo \u0026#34;ipset add china 192.168.0.0/16\u0026#34; \u0026gt;\u0026gt; ipset_result.sh # Execute script to add IP ranges bash ipset_result.sh Verify IP Collection\r1 2 ipset list china ipset list china | wc -l # Should contain approximately 8000+ entries Configure Iptables Rules\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # Clear existing rules (if necessary) iptables -F iptables -X # Create basic rules cat \u0026gt; /etc/sysconfig/iptables \u0026lt;\u0026lt; EOF *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT -A INPUT -p icmp -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT # Add rules for other required ports below # Example: -A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT -A INPUT -m set ! --match-set china src -j DROP -A INPUT -j REJECT --reject-with icmp-host-prohibited -A FORWARD -j REJECT --reject-with icmp-host-prohibited COMMIT EOF # Apply rules iptables-restore \u0026lt; /etc/sysconfig/iptables Ensure Configuration Persistence\rTo prevent configuration loss after server restart, set up persistence:\nPersist Ipset Data\r1 2 3 4 5 6 # Save Ipset data ipset save china \u0026gt; /etc/ipset.conf # Configure loading at startup chmod +x /etc/rc.d/rc.local echo \u0026#34;ipset restore \u0026lt; /etc/ipset.conf\u0026#34; \u0026gt;\u0026gt; /etc/rc.d/rc.local Persist Iptables Rules\r1 2 # Configure loading at startup echo \u0026#34;/usr/sbin/iptables-restore \u0026lt; /etc/sysconfig/iptables\u0026#34; \u0026gt;\u0026gt; /etc/rc.d/rc.local Automate IP Range Updates\rTo ensure IP ranges stay current, set up periodic updates:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # Create weekly update script cat \u0026gt; /usr/local/bin/update_cn_ip.sh \u0026lt;\u0026lt; EOF #!/bin/bash wget -O /tmp/cn.zone 
http://www.ipdeny.com/ipblocks/data/countries/cn.zone ipset flush china for ip in \\$(cat /tmp/cn.zone); do ipset add china \\$ip; done # Add private network IP ranges ipset add china 10.0.0.0/8 ipset add china 172.16.0.0/12 ipset add china 192.168.0.0/16 # Update persistence file ipset save china \u0026gt; /etc/ipset.conf EOF chmod +x /usr/local/bin/update_cn_ip.sh # Add weekly scheduled task echo \u0026#34;0 0 * * 1 /usr/local/bin/update_cn_ip.sh\u0026#34; \u0026gt; /etc/cron.d/update_cn_ip Verification and Troubleshooting\rTesting Configuration\r1 2 3 4 5 6 7 8 # Check Ipset collection ipset list china # Check Iptables rules iptables -L -n # Test domestic IP access (should be allowed) # Test foreign IP access (should be blocked) Common Issues and Solutions\rUnable to connect via SSH: Ensure SSH port rules are added before blocking rules Local network access restricted: Verify private IP ranges are added to the china set Configuration not persisting: Check rc.local file permissions and script content Conclusion\rBy blocking foreign IP access, we can significantly reduce the risk of server attacks, particularly suitable for services primarily targeting domestic users. 
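The allow/deny decision the ruleset implements — "does the source address fall inside any network in the china set?" — can be sketched with Python's standard ipaddress module. ipset performs the same membership test in a kernel-side hash for speed; the CIDRs here are a few illustrative samples plus the RFC 1918 private ranges, not the full cn.zone list:

```python
import ipaddress

# A tiny stand-in for the "china" ipset: one sample CN range plus the
# RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
# These are illustrative entries only, not the full ~8000-entry cn.zone.
allowed = [ipaddress.ip_network(c) for c in [
    "1.2.4.0/24",      # sample CN allocation
    "10.0.0.0/8",      # RFC 1918 private
    "172.16.0.0/12",   # RFC 1918 private
    "192.168.0.0/16",  # RFC 1918 private
]]

def is_allowed(src: str) -> bool:
    """Return True if src matches any allowed network (the ipset-style test)."""
    ip = ipaddress.ip_address(src)
    return any(ip in net for net in allowed)

print(is_allowed("192.168.1.10"))  # True  (private LAN -> ACCEPT)
print(is_allowed("8.8.8.8"))       # False (foreign IP -> DROP)
```

A linear scan like this is O(n) per packet; ipset's hash:net type keeps the lookup roughly constant-time even with thousands of entries, which is why it is preferred over one iptables rule per CIDR.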
Note that this method may affect legitimate access from overseas users - adjust according to your specific business requirements.\n","date":"2025-01-01T00:00:00Z","image":"/p/block_foreign_ips_server_security/cover_english.png","permalink":"/en/p/block_foreign_ips_server_security/","title":"Complete Guide to Blocking Foreign IP Access to Servers"},{"content":"This guide explains how to configure an Ingress-Nginx controller in a Kubernetes cluster to support TCP service forwarding, using MySQL service as an example.\nPrerequisites\rKubernetes cluster installed Ingress-Nginx controller deployed Existing TCP service to expose (MySQL in this example) Configuration Steps\rConfigure TCP Service Mapping\rCreate a tcp-services.yaml file to define TCP port mappings:\n1 2 3 4 5 6 7 8 apiVersion: v1 kind: ConfigMap metadata: name: tcp-services namespace: ingress-nginx data: # Syntax: \u0026lt;external_port\u0026gt;: \u0026#34;\u0026lt;namespace\u0026gt;/\u0026lt;service_name\u0026gt;:\u0026lt;target_port\u0026gt;\u0026#34; 3306: \u0026#34;default/mysql-primary:3306\u0026#34; Apply configuration:\n1 kubectl create -f tcp-services.yaml If the ConfigMap already exists, edit directly:\n1 2 kubectl edit configmap tcp-services -n ingress-nginx # Add new mapping under data: \u0026lt;external_port\u0026gt;: \u0026#34;\u0026lt;namespace\u0026gt;/\u0026lt;service_name\u0026gt;:\u0026lt;port\u0026gt;\u0026#34; Update Ingress Controller Configuration\rEdit the Ingress-Nginx Controller Deployment:\n1 kubectl edit deployment ingress-nginx-controller -n ingress-nginx Add the following configuration to the controller\u0026rsquo;s args section:\n1 2 3 args: # ... existing parameters ... - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services Configuring Ingress Controller Service\rEdit the Ingress-Nginx Controller Service:\n1 kubectl edit service ingress-nginx-controller -n ingress-nginx Add TCP service port configuration in the ports section:\n1 2 3 4 5 6 ports: # ... 
Other existing port configurations ... - name: mysql-primary port: 3306 protocol: TCP targetPort: 3306 Verification\rAfter completing configurations, verify with following methods:\nCheck ConfigMap creation: 1 kubectl get configmap tcp-services -n ingress-nginx -o yaml Confirm Ingress Controller status: 1 kubectl get pods -n ingress-nginx Verify port exposure: 1 kubectl get svc ingress-nginx-controller -n ingress-nginx Notes\rEnsure target services (e.g., MySQL) are properly running in the cluster Check for port conflicts Additional security group/firewall rule configurations may be required for cloud providers ","date":"2024-12-20T00:00:00Z","permalink":"/en/p/k8s_ingress_tcp_forwarding_config/","title":"Guide to Configuring TCP Service Forwarding with Ingress-Nginx in Kubernetes"},{"content":"Deployment\rPlease ensure that the following components are deployed:\ncert-manager: Installation - cert-manager Documentation 1 kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml cert-manager-webhook-dnspod: imroc/cert-manager-webhook-dnspod: cert-manager webhook resolver for DNSPod (github.com) 1 kubectl apply -f https://raw.githubusercontent.com/imroc/cert-manager-webhook-dnspod/master/bundle.yaml Modify the deployment file ingress-dnspod-solver.yaml as follows:\ningress-dnspod-solver.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 apiVersion: v1 kind: ServiceAccount metadata: name: ingress-dnspod-solver namespace: cert-manager labels: app: ingress-dnspod-solver --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: ingress-dnspod-solver rules: - apiGroups: [ \u0026#34;networking.k8s.io\u0026#34; ] resources: [ 
\u0026#34;ingresses\u0026#34; ] verbs: [ \u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34; ] --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: ingress-dnspod-solver labels: app: ingress-dnspod-solver roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: ingress-dnspod-solver subjects: - apiGroup: \u0026#34;\u0026#34; kind: ServiceAccount name: ingress-dnspod-solver namespace: cert-manager --- apiVersion: v1 kind: Secret metadata: name: ingress-dnspod-solver namespace: cert-manager type: Opaque stringData: TENCENT_SECRET_KEY: \u0026#34;\u0026lt;secret_key\u0026gt;\u0026#34; # Tencent Cloud API Secret Key --- apiVersion: v1 kind: ConfigMap metadata: name: ingress-dnspod-solver namespace: cert-manager data: DOMAIN: \u0026#34;example.com\u0026#34; # Domain name POLICY: \u0026#34;retain\u0026#34; # Monitoring handling policy: retain (keep records on resource update), update (update records on resource update) TENCENT_SECRET_ID: \u0026#34;\u0026lt;secret_id\u0026gt;\u0026#34; # Tencent Cloud API Secret ID RECORD_VALUE: \u0026#34;\u0026lt;record_value\u0026gt;\u0026#34; # DNS record value --- apiVersion: apps/v1 kind: Deployment metadata: name: ingress-dnspod-solver namespace: cert-manager labels: app: ingress-dnspod-solver spec: replicas: 1 selector: matchLabels: app: ingress-dnspod-solver template: metadata: labels: app: ingress-dnspod-solver spec: serviceAccountName: ingress-dnspod-solver containers: - name: ingress-dnspod-solver image: harbor.example.com/devops/ingress-dnspod-solver:latest imagePullPolicy: IfNotPresent env: - name: DOMAIN valueFrom: configMapKeyRef: name: ingress-dnspod-solver key: DOMAIN - name: POLICY valueFrom: configMapKeyRef: name: ingress-dnspod-solver key: POLICY - name: TENCENT_SECRET_ID valueFrom: configMapKeyRef: name: ingress-dnspod-solver key: TENCENT_SECRET_ID - name: TENCENT_SECRET_KEY valueFrom: secretKeyRef: name: ingress-dnspod-solver key: TENCENT_SECRET_KEY 
- name: RECORD_VALUE valueFrom: configMapKeyRef: name: ingress-dnspod-solver key: RECORD_VALUE 1 kubectl apply -f ingress-dnspod-solver.yaml Source Code Analysis and Build\rThe source code is written in go and uses go mod to manage dependencies.\nmain.go\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 package main import ( \u0026#34;fmt\u0026#34; \u0026#34;github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common\u0026#34; \u0026#34;github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile\u0026#34; dnspod \u0026#34;github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/dnspod/v20210323\u0026#34; \u0026#34;go.uber.org/zap\u0026#34; networkingv1 \u0026#34;k8s.io/api/networking/v1\u0026#34; \u0026#34;k8s.io/client-go/informers\u0026#34; \u0026#34;k8s.io/client-go/kubernetes\u0026#34; \u0026#34;k8s.io/client-go/rest\u0026#34; \u0026#34;k8s.io/client-go/tools/cache\u0026#34; \u0026#34;k8s.io/client-go/tools/clientcmd\u0026#34; \u0026#34;os\u0026#34; ) var ( logger *zap.Logger version string policy string domain string secretId string secretKey string 
recordValue string dnsPodClient *dnspod.Client clientSet *kubernetes.Clientset ) func initLogger() { logger, _ = zap.NewDevelopment() defer func(logger *zap.Logger) { err := logger.Sync() if err != nil { fmt.Println(\u0026#34;Logger sync error: \u0026#34;, err) } }(logger) } func initCheck() { version = os.Getenv(\u0026#34;VERSION\u0026#34;) if version == \u0026#34;\u0026#34; { version = \u0026#34;2021-03-23\u0026#34; } policy = os.Getenv(\u0026#34;POLICY\u0026#34;) if policy == \u0026#34;\u0026#34; { policy = \u0026#34;retain\u0026#34; } domain = os.Getenv(\u0026#34;DOMAIN\u0026#34;) if domain == \u0026#34;\u0026#34; { logger.Error(\u0026#34;Please set the environment variable `DOMAIN`\u0026#34;) os.Exit(1) } secretId = os.Getenv(\u0026#34;TENCENT_SECRET_ID\u0026#34;) secretKey = os.Getenv(\u0026#34;TENCENT_SECRET_KEY\u0026#34;) if secretId == \u0026#34;\u0026#34; || secretKey == \u0026#34;\u0026#34; { logger.Error(\u0026#34;Please set the environment variables `TENCENT_SECRET_ID` and `TENCENT_SECRET_KEY`\u0026#34;) os.Exit(1) } recordValue = os.Getenv(\u0026#34;RECORD_VALUE\u0026#34;) if recordValue == \u0026#34;\u0026#34; { logger.Error(\u0026#34;Please set the environment variable `RECORD_VALUE`\u0026#34;) os.Exit(1) } logger.Info(\u0026#34;---------------------------------\u0026#34;) logger.Info(\u0026#34;Version\u0026#34; + \u0026#34;: \u0026#34; + version) logger.Info(\u0026#34;Policy\u0026#34; + \u0026#34;: \u0026#34; + policy) logger.Info(\u0026#34;Domain\u0026#34; + \u0026#34;: \u0026#34; + domain) logger.Info(\u0026#34;SecretId\u0026#34; + \u0026#34;: \u0026#34; + secretId) logger.Info(\u0026#34;SecretKey: *****\u0026#34;) logger.Info(\u0026#34;RecordValue\u0026#34; + \u0026#34;: \u0026#34; + recordValue) logger.Info(\u0026#34;---------------------------------\u0026#34;) } func getRecordDict() map[string]uint64 { recordDict := make(map[string]uint64) request := dnspod.NewDescribeRecordListRequest() request.Domain = common.StringPtr(domain) 
request.RecordType = common.StringPtr(\u0026#34;A\u0026#34;) request.Offset = common.Uint64Ptr(0) request.Limit = common.Uint64Ptr(10) // Response corresponds to the DescribeRecordListResponse instance response, err := dnsPodClient.DescribeRecordList(request) if err != nil { logger.Panic(err.Error()) } for i := 0; i \u0026lt; len(response.Response.RecordList); i++ { recordDict[*response.Response.RecordList[i].Name] = *response.Response.RecordList[i].RecordId } domainCount := *response.Response.RecordCountInfo.ListCount domainTotal := *response.Response.RecordCountInfo.TotalCount for domainCount \u0026lt; domainTotal { request.Offset = common.Uint64Ptr(domainCount) response, err = dnsPodClient.DescribeRecordList(request) if err != nil { logger.Panic(err.Error()) } for i := 0; i \u0026lt; len(response.Response.RecordList); i++ { recordDict[*response.Response.RecordList[i].Name] = *response.Response.RecordList[i].RecordId } domainCount += *response.Response.RecordCountInfo.ListCount } logger.Info(\u0026#34;RecordDict: \u0026#34;, zap.Any(\u0026#34;RecordDict\u0026#34;, recordDict)) return recordDict } func createRecord(subDomain string) { request := dnspod.NewCreateRecordRequest() request.Domain = common.StringPtr(domain) request.SubDomain = common.StringPtr(subDomain) request.RecordType = common.StringPtr(\u0026#34;A\u0026#34;) request.RecordLine = common.StringPtr(\u0026#34;默认\u0026#34;) request.Value = common.StringPtr(recordValue) _, err := dnsPodClient.CreateRecord(request) if err != nil { logger.Panic(err.Error()) } } func updateRecord(recordId uint64, subDomain string) { request := dnspod.NewModifyRecordRequest() request.Domain = common.StringPtr(domain) request.RecordId = common.Uint64Ptr(recordId) request.SubDomain = common.StringPtr(subDomain) request.RecordType = common.StringPtr(\u0026#34;A\u0026#34;) request.RecordLine = common.StringPtr(\u0026#34;默认\u0026#34;) request.Value = common.StringPtr(recordValue) _, err := dnsPodClient.ModifyRecord(request) if 
err != nil { logger.Panic(err.Error()) } } func deleteRecord(recordId uint64) { request := dnspod.NewDeleteRecordRequest() request.Domain = common.StringPtr(domain) request.RecordId = common.Uint64Ptr(recordId) _, err := dnsPodClient.DeleteRecord(request) if err != nil { logger.Panic(err.Error()) } } func addHandler(obj interface{}) { logger.Info(\u0026#34;Detected Ingress Add event\u0026#34;) recordDict := getRecordDict() addIngress := obj.(*networkingv1.Ingress) for i := 0; i \u0026lt; len(addIngress.Spec.Rules); i++ { customDomain := addIngress.Spec.Rules[i].Host // Check if customDomain ends with domain if customDomain[len(customDomain)-len(domain):] != domain { continue } subDomain := customDomain[:len(customDomain)-len(domain)-1] if _, ok := recordDict[subDomain]; !ok { createRecord(subDomain) } else { if policy == \u0026#34;update\u0026#34; { updateRecord(recordDict[subDomain], subDomain) } } } } func updateHandler(oldObj, newObj interface{}) { logger.Info(\u0026#34;Detected Ingress Update event\u0026#34;) recordDict := getRecordDict() newIngress := newObj.(*networkingv1.Ingress) for i := 0; i \u0026lt; len(newIngress.Spec.Rules); i++ { customDomain := newIngress.Spec.Rules[i].Host // Check if customDomain ends with domain if customDomain[len(customDomain)-len(domain):] != domain { continue } subDomain := customDomain[:len(customDomain)-len(domain)-1] if _, ok := recordDict[subDomain]; !ok { createRecord(subDomain) } else { updateRecord(recordDict[subDomain], subDomain) } } } func deleteHandler(obj interface{}) { logger.Info(\u0026#34;Detected Ingress Delete event\u0026#34;) recordDict := getRecordDict() deleteIngress := obj.(*networkingv1.Ingress) for i := 0; i \u0026lt; len(deleteIngress.Spec.Rules); i++ { customDomain := deleteIngress.Spec.Rules[i].Host // Check if customDomain ends with domain if customDomain[len(customDomain)-len(domain):] != domain { continue } subDomain := customDomain[:len(customDomain)-len(domain)-1] if _, ok := 
recordDict[subDomain]; ok { deleteRecord(recordDict[subDomain]) } } } func k8sInformer() { // Create shared Informer factory factory := informers.NewSharedInformerFactory(clientSet, 0) // Get Ingress watcher Informer ingressInformer := factory.Networking().V1().Ingresses().Informer() // Add event handlers for Ingress resource changes _, err := ingressInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{ AddFunc: addHandler, UpdateFunc: updateHandler, DeleteFunc: deleteHandler, }) if err != nil { return } // Start informer and wait for cache sync stopCh := make(chan struct{}) defer close(stopCh) factory.Start(stopCh) factory.WaitForCacheSync(stopCh) // Block to keep listening for events \u0026lt;-stopCh } func main() { // Initialization initLogger() initCheck() // Create DNSPod client credential := common.NewCredential(secretId, secretKey) cpf := profile.NewClientProfile() cpf.HttpProfile.Endpoint = \u0026#34;dnspod.tencentcloudapi.com\u0026#34; dnsPodClient, _ = dnspod.NewClient(credential, \u0026#34;\u0026#34;, cpf) // Create Kubernetes client var config *rest.Config var err error // Check if running inside cluster if os.Getenv(\u0026#34;KUBERNETES_SERVICE_HOST\u0026#34;) != \u0026#34;\u0026#34; { config, err = rest.InClusterConfig() if err != nil { logger.Panic(err.Error()) } } else { if _, err = os.Stat(\u0026#34;./.kube/config\u0026#34;); os.IsNotExist(err) { logger.Panic(\u0026#34;Please set the kubeconfig file\u0026#34;) } config, err = clientcmd.BuildConfigFromFlags(\u0026#34;\u0026#34;, \u0026#34;./.kube/config\u0026#34;) } clientSet, err = kubernetes.NewForConfig(config) if err != nil { logger.Panic(err.Error()) } k8sInformer() } Dockerfile\n1 2 3 4 5 6 7 8 9 10 FROM golang:alpine AS builder WORKDIR /app COPY . . RUN go mod download RUN go build -o main . FROM alpine:latest WORKDIR /app COPY --from=builder /app/main . CMD [\u0026#34;./main\u0026#34;] Build image\n1 docker build -t harbor.example.com/devops/ingress-dnspod-solver:latest . 
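The host handling shared by the Go handlers above — test whether an Ingress host belongs to the managed domain, then strip the suffix to get the DNS record name — can be sketched in Python. `host_to_subdomain` is a hypothetical helper; note it requires a `.` boundary, which is slightly stricter than the Go slice comparison (`customDomain[len(customDomain)-len(domain):] != domain` would also accept a host like `badexample.com` for domain `example.com`):

```python
def host_to_subdomain(host: str, domain: str):
    """Return the record name for host under domain, or None if unmanaged.

    Mirrors the suffix check in the Go add/update/delete handlers,
    but enforces a "." boundary so "badexample.com" is not treated
    as a subdomain of "example.com".
    """
    if host == domain or not host.endswith("." + domain):
        return None  # handler should skip this Ingress rule
    return host[: -(len(domain) + 1)]  # drop the ".example.com" suffix

print(host_to_subdomain("test.example.com", "example.com"))  # test
print(host_to_subdomain("a.b.example.com", "example.com"))   # a.b
print(host_to_subdomain("badexample.com", "example.com"))    # None
```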
Push image\n1 docker push harbor.example.com/devops/ingress-dnspod-solver:latest ","date":"2024-12-17T00:00:00Z","permalink":"/en/p/k8s_ingress_domain_auto_resolution/","title":"Automatic Domain Resolution and Certificate Management for Kubernetes Ingress"},{"content":"\rUsing Ingress as the application traffic proxy for K8s, and configuring certificates\nService Deployment\rSince the K8S cluster used is not provided by a cloud provider, deploying MetalLB is required when using LoadBalancer-type services for ingress-nginx. MetalLB is a load balancer implementation for bare-metal Kubernetes clusters, using standard routing protocols.\nDeploy MetalLB\r1 2 3 4 5 6 7 8 9 # Preview configuration changes (returns non-zero if changes detected) kubectl get configmap kube-proxy -n kube-system -o yaml | \\ sed -e \u0026#34;s/strictARP: false/strictARP: true/\u0026#34; | \\ kubectl diff -f - -n kube-system # Apply ARP configuration changes (returns non-zero only on errors) kubectl get configmap kube-proxy -n kube-system -o yaml | \\ sed -e \u0026#34;s/strictARP: false/strictARP: true/\u0026#34; | \\ kubectl apply -f - -n kube-system 1 2 # Install MetalLB components kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.8/config/manifests/metallb-native.yaml Create ip-pool.yaml to define the IP address pool:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 apiVersion: metallb.io/v1beta1 kind: IPAddressPool metadata: name: default namespace: metallb-system spec: addresses: - 172.16.0.90/32 # Manual IP assignment required for bare-metal clusters autoAssign: true --- apiVersion: metallb.io/v1beta1 kind: L2Advertisement metadata: name: default namespace: metallb-system spec: ipAddressPools: - default Deploy ingress-nginx\r1 kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.11.2/deploy/static/provider/cloud/deploy.yaml Configuring Certificates\rHere we introduce three certificate configuration methods: manually
generating self-signed certificates, manually obtaining trusted certificates, and using cert-manager to acquire certificates.\nManual Self-Signed Certificate\rYou can use OpenSSL or other tools to apply for certificates. The tls.crt certificate file and tls.key private key file, the base64 encoded content of the certificate and private key, can customize the relevant information of the certificate: country, province, city, company, department, domain name, validity period, etc.\n1 2 3 4 5 6 7 8 9 10 11 # Generate private key openssl genrsa -out tls.key 2048 # Generate private key file tls.key # Generate certificate request openssl req -new -key tls.key -out tls.csr -subj \u0026#34;/C=CN/ST=\u0026lt;state\u0026gt;/L=\u0026lt;city\u0026gt;/O=\u0026lt;company\u0026gt;/OU=\u0026lt;department\u0026gt;/CN=\u0026lt;domain\u0026gt;\u0026#34; # Generate certificate request file tls.csr # Generate certificate openssl x509 -req -in tls.csr -signkey tls.key -out tls.crt -days 3650 # Generate certificate file tls.crt # View certificate information openssl x509 -in tls.crt -text -noout test-ingress.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 apiVersion: v1 kind: Namespace metadata: name: test-ingress --- apiVersion: apps/v1 kind: Deployment metadata: name: service1 namespace: test-ingress spec: replicas: 1 selector: matchLabels: app: service1 template: metadata: labels: app: service1 spec: containers: - name: service1 image: nginx:alpine --- apiVersion: v1 kind: Service metadata: name: service1 namespace: test-ingress spec: selector: app: service1 ports: - name: http protocol: TCP port: 80 targetPort: 80 # letsencrypt certificate (manually created) --- apiVersion: v1 kind: Secret metadata: name: example-com namespace: test-ingress data: tls.crt: \u0026lt;base64 encoded cert\u0026gt; tls.key: 
\u0026lt;base64 encoded key\u0026gt; type: kubernetes.io/tls --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress namespace: test-ingress spec: ingressClassName: nginx tls: - hosts: - test.example.com secretName: example-com rules: - host: test.example.com http: paths: - path: / pathType: Prefix backend: service: name: service1 port: number: 80 1 kubectl apply -f test-ingress.yaml Manual Trusted Certificate Application\rTo apply for a certificate, you can use Let\u0026rsquo;s Encrypt or other Certificate Authorities (CAs).\nThe TLS certificate file (tls.crt) and private key file (tls.key) must be Base64-encoded. The default validity period is 90 days.\ntest-ingress.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 apiVersion: v1 kind: Namespace metadata: name: test-ingress --- apiVersion: apps/v1 kind: Deployment metadata: name: service1 namespace: test-ingress spec: replicas: 1 selector: matchLabels: app: service1 template: metadata: labels: app: service1 spec: containers: - name: service1 image: nginx:alpine --- apiVersion: v1 kind: Service metadata: name: service1 namespace: test-ingress spec: selector: app: service1 ports: - name: http protocol: TCP port: 80 targetPort: 80 # letsencrypt certificate (manually created) --- apiVersion: v1 kind: Secret metadata: name: example-com namespace: test-ingress data: tls.crt: \u0026lt;base64 encoded cert\u0026gt; tls.key: \u0026lt;base64 encoded key\u0026gt; type: kubernetes.io/tls --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress namespace: test-ingress spec: ingressClassName: nginx tls: - hosts: - test.example.com secretName: example-com rules: - host: test.example.com http: paths: - path: / pathType: Prefix backend: service: name: service1 port: number: 80 1 kubectl apply -f test-ingress.yaml 
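The `tls.crt` and `tls.key` values under the Secret's `data` field must be the base64-encoded file contents; `kubectl create secret tls example-com --cert=tls.crt --key=tls.key` performs this encoding for you. A minimal sketch of the encoding step with placeholder PEM bytes (not a real key pair):

```python
import base64

# Placeholder PEM contents; in practice read tls.crt / tls.key from disk.
cert_pem = b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
key_pem = b"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"

# The values that go under data: in the Secret manifest
secret_data = {
    "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
    "tls.key": base64.b64encode(key_pem).decode("ascii"),
}

# The API server decodes these back to the original bytes on use.
assert base64.b64decode(secret_data["tls.crt"]) == cert_pem
```

Using `stringData:` instead of `data:` also works for Secrets and accepts the raw, unencoded PEM text.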
Using cert-manager to Obtain Certificates\rcert-manager is a Kubernetes certificate management controller that utilizes CustomResourceDefinitions (CRDs), offering features such as certificate application, issuance, renewal, and deletion. Below are three methods for obtaining certificates using cert-manager: SelfSigned Issuer-type certificates, ACME Issuer-type certificates, and CA Issuer-type certificates.\nInstalling cert-manager\r1 kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml By default, cert-manager will be installed in the cert-manager namespace. Although it can be deployed in a different namespace, this requires modifications to the deployment manifests.\nAfter installation, verify the deployment by checking the pods in the cert-manager namespace:\n1 2 3 4 5 6 $ kubectl get pods --namespace cert-manager NAME READY STATUS RESTARTS AGE cert-manager-cainjector-5fd6444f95-kmbmd 1/1 Running 0 60m cert-manager-d894bbbd4-lrwp5 1/1 Running 0 60m cert-manager-webhook-869674f96f-ljffr 1/1 Running 0 60m Configuring SelfSigned Issuer Type Certificates\rtest-ingress.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 apiVersion: v1 kind: Namespace metadata: name: test-ingress --- apiVersion: apps/v1 kind: Deployment metadata: name: service1 namespace: test-ingress spec: replicas: 1 selector: matchLabels: app: service1 template: metadata: labels: app: service1 spec: containers: - name: service1 image: nginx:alpine --- apiVersion: v1 kind: Service metadata: name: service1 namespace: test-ingress spec: selector: app: service1 ports: - name: http protocol: TCP port: 80 targetPort: 80 # SelfSigned Issuer: 100 years, automatically created --- apiVersion: 
cert-manager.io/v1 kind: ClusterIssuer metadata: name: selfsigned-issuer namespace: test-ingress spec: selfSigned: {} # SelfSigned Issuer: 100 years, automatically created --- apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: example-com namespace: test-ingress spec: secretName: example-com duration: 876000h # 100 years renewBefore: 720h # The certificate will be renewed 30 days before expiration issuerRef: name: selfsigned-issuer kind: ClusterIssuer commonName: test.example.com subject: organizations: - \u0026#39;*** Technology Co., Ltd.\u0026#39; organizationalUnits: - \u0026#39;*** Technology Co., Ltd. Operations Department\u0026#39; isCA: true privateKey: algorithm: RSA encoding: PKCS1 size: 2048 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress namespace: test-ingress annotations: cert-manager.io/cluster-issuer: selfsigned-issuer spec: ingressClassName: nginx tls: - hosts: - test.example.com secretName: example-com rules: - host: test.example.com http: paths: - path: / pathType: Prefix backend: service: name: service1 port: number: 80 1 kubectl apply -f test-ingress.yaml ACME (Automated Certificate Management Environment) is a protocol used for automated certificate issuance and renewal. The ACME protocol is standardized by the IETF, with the most common current implementation being Let\u0026rsquo;s Encrypt. A key feature of the ACME protocol is that it allows Certificate Authorities (CAs) to verify the identity of certificate requesters without requiring human intervention. Another critical feature of the ACME protocol is its ability to enable automatic certificate renewal by CAs without manual involvement. The Issuer type represents an individual account registered with an ACME certificate authority server. When creating a new ACME Issuer, the Certificate Manager will generate a private key used for identification on the ACME server. 
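As an aside on the hour-based fields in the SelfSigned Certificate above: cert-manager takes `duration` and `renewBefore` as Go-style duration strings, so the manifest's 100-year and 30-day values reduce to plain hour counts (assuming 365-day years):

```python
HOURS_PER_DAY = 24

# duration: 876000h  -> 100 years at 365 days per year
duration_h = 100 * 365 * HOURS_PER_DAY

# renewBefore: 720h  -> renew 30 days before expiry
renew_before_h = 30 * HOURS_PER_DAY

print(duration_h, renew_before_h)  # 876000 720
```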
By default, certificates issued by public ACME servers are typically trusted by client computers. This means that, for example, websites secured by ACME certificates issued for specific URLs will be automatically trusted by most web browsers. ACME certificates are generally free.\nConfiguring ACME Issuer with HTTP01 Certificate Type\rThe HTTP01 challenge is completed by exposing a computed secret key at a publicly accessible HTTP URL endpoint. This URL will use the domain name for which the certificate is being requested. Once the ACME server can retrieve this secret key from the URL via the internet, it verifies your ownership of the domain.\nWhen creating an HTTP01 challenge, cert-manager will automatically configure your cluster ingress to route traffic accessing this URL to a small web server that presents the secret key. In simpler terms, cert-manager automatically creates an ingress route to a lightweight web server responsible for displaying the verification key, thereby proving domain ownership.\nNote:\nFake certificate (1-year validity, auto-created): kubernetes ingress controller fake cert Example file: test-ingress.yaml 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 apiVersion: v1 kind: Namespace metadata: name: test-ingress --- apiVersion: apps/v1 kind: Deployment metadata: name: service1 namespace: test-ingress spec: replicas: 1 selector: matchLabels: app: service1 template: metadata: labels: app: service1 spec: containers: - name: service1 image: nginx:alpine --- apiVersion: v1 kind: Service metadata: name: service1 namespace: test-ingress spec: selector: app: service1 ports: - name: http protocol: TCP port: 80 targetPort: 80 # ACME certificate, automatically created --- apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: sencrypt-prod namespace: 
test-ingress spec: acme: email: CoderKang@hotmail.com privateKeySecretRef: name: sencrypt-prod server: https://acme-v02.api.letsencrypt.org/directory solvers: - http01: ingress: class: nginx --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress namespace: test-ingress annotations: cert-manager.io/cluster-issuer: sencrypt-prod spec: ingressClassName: nginx tls: - hosts: - test.example.com secretName: example-com rules: - host: test.example.com http: paths: - path: / pathType: Prefix backend: service: name: service1 port: number: 80 1 kubectl apply -f test-ingress.yaml Configuring ACME Issuer dns01-type Certificates\rThe DNS01 challenge is completed by providing a cryptographic key that exists in a DNS TXT record. Once this TXT record propagates across the internet, the ACME server can retrieve the key via DNS queries and verify that the client requesting the certificate is the domain owner.\nIf proper permissions are granted, cert-manager will automatically add this TXT record to your specified DNS provider.\nHowever, since Tencent Cloud DNS is used here, and cert-manager does not natively support Tencent Cloud DNS, a custom resolver (cert-manager webhook) is required.\nDNS01 - cert-manager Documentation\n1 kubectl apply -f https://raw.githubusercontent.com/imroc/cert-manager-webhook-dnspod/master/bundle.yaml test-ingress.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 apiVersion: v1 kind: Namespace metadata: name: test-ingress --- apiVersion: apps/v1 kind: Deployment metadata: name: service1 namespace: test-ingress spec: replicas: 1 selector: matchLabels: app: service1 template: metadata: labels: app: service1 spec: 
containers: - name: service1 image: nginx:alpine --- apiVersion: v1 kind: Service metadata: name: service1 namespace: test-ingress spec: selector: app: service1 ports: - name: http protocol: TCP port: 80 targetPort: 80 # letsencrypt automatically created, valid for 90 days # Configure the SecretId and SecretKey of the Tencent Cloud DNS provider # https://console.dnspod.cn/account/token/apikey --- apiVersion: v1 stringData: secret-key: \u0026lt;tencent cloud secret key\u0026gt; # Tencent Cloud SecretKey kind: Secret metadata: name: dnspod-secret namespace: cert-manager type: Opaque --- apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: sencrypt-prod namespace: test-ingress spec: acme: email: CoderKang@hotmail.com preferredChain: \u0026#34;\u0026#34; privateKeySecretRef: name: dnspod-letsencrypt server: https://acme-v02.api.letsencrypt.org/directory solvers: - dns01: webhook: config: secretId: \u0026lt;tencent cloud secret id\u0026gt; # Tencent Cloud SecretId secretKeyRef: key: secret-key name: dnspod-secret ttl: 600 groupName: acme.imroc.cc solverName: dnspod --- apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: example-com namespace: test-ingress spec: dnsNames: - test.example.com issuerRef: kind: ClusterIssuer name: sencrypt-prod secretName: example-com --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress namespace: test-ingress annotations: cert-manager.io/cluster-issuer: sencrypt-prod spec: ingressClassName: nginx tls: - hosts: - test.example.com secretName: example-com rules: - host: test.example.com http: paths: - path: / pathType: Prefix backend: service: name: service1 port: number: 80 1 kubectl apply -f test-ingress.yaml ","date":"2024-12-16T00:00:00Z","permalink":"/en/p/k8s_ingress_nginx_certificate_configuration/","title":"Kubernetes Ingress-NGINX Service Certificate Configuration"},{"content":"Ingress Configuration\rRequires setting up ingress (see 13-K8s ingress-nginx Service Certificate) and 
enabling TCP forwarding (see 23-K8s ingress TCP Forwarding).

## JumpServer Deployment

Installation:

```shell
helm install jms-k8s ./jumpserver-v4.6.0 -n jumpserver --create-namespace -f values.yaml
```

Upgrade:

```shell
helm upgrade jms-k8s ./jumpserver-v4.6.0 -n jumpserver --create-namespace -f values.yaml
```

Uninstallation:

```shell
helm -n jumpserver delete jms-k8s
```

## Core Component Image

The core component is modified to support DingTalk alerts.

```shell
docker pull docker.1ms.run/jumpserver/core:v4.6.0-ce
cd core
docker build -t jumpserver/core:v4.6.1-ce .
```

Dockerfile:

```dockerfile
FROM docker.1ms.run/jumpserver/core:v4.6.0-ce

COPY ./notifications.py /opt/jumpserver/apps/notifications/notifications.py
```

notifications.py:

```python
import textwrap
import time
import traceback
from itertools import chain

from celery import shared_task
from django.utils.translation import gettext_lazy as _
from html2text import HTML2Text

from common.utils import lazyproperty
from common.utils.timezone import local_now
from notifications.backends import BACKEND
from settings.utils import get_login_title
from terminal.const import RiskLevelChoices
from users.models import User

from .models import SystemMsgSubscription, UserMsgSubscription

__all__ = ('SystemMessage', 'UserMessage', 'system_msgs', 'Message')

system_msgs = []
user_msgs = []


class MessageType(type):
    def __new__(cls, name, bases, attrs: dict):
        clz = type.__new__(cls, name, bases, attrs)

        if 'message_type_label' in attrs \
                and 'category' in attrs \
                and 'category_label' in attrs:
            message_type = clz.get_message_type()
            msg = {
                'message_type': message_type,
                'message_type_label': attrs['message_type_label'],
                'category': attrs['category'],
                'category_label': attrs['category_label'],
            }
            if issubclass(clz, SystemMessage):
                system_msgs.append(msg)
            elif issubclass(clz, UserMessage):
                user_msgs.append(msg)
        return clz


@shared_task(
    verbose_name=_('Publish the station message'),
    description=_(
        """This task needs to be executed for sending internal messages for system alerts,
        work orders, and other notifications"""
    )
)
def publish_task(receive_user_ids, backends_msg_mapper):
    Message.send_msg(receive_user_ids, backends_msg_mapper)


def send_dingtalk_task(message):
    # Send to custom webhook
    import hmac, hashlib, base64, urllib.parse
    timestamp = str(round(time.time() * 1000))
    # JumpServer credentials
    access_token = '<access token>'
    secret = '<secret>'
    secret_enc = secret.encode('utf-8')
    string_to_sign = f'{timestamp}\n{secret}'
    string_to_sign_enc = string_to_sign.encode('utf-8')
    hmac_code = hmac.new(secret_enc, string_to_sign_enc, digestmod=hashlib.sha256).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(hmac_code))
    form_data = {
        "msgtype": "markdown",
        "markdown": {
            "title": "JumpServer Monitoring Alert",
            "text": message
        },
        "at": {
            "atMobiles": [
                "<mobile>"
            ],
            "isAtAll": False
        }
    }
    import requests
    try:
        res = requests.post(
            url=f'https://oapi.dingtalk.com/robot/send?access_token={access_token}&timestamp={timestamp}&sign={sign}',
            json=form_data)
    except Exception as e:
        with open('/opt/error.log', 'a+') as f:
            f.write(str(e))


class Message(metaclass=MessageType):
    """
    What's encapsulated here?
    Templates for different messages with a unified sending interface
    - publish: Implementation relates to message subscription table structure
    - send_msg
    """
    message_type_label: str
    category: str
    category_label: str
    text_msg_ignore_links = True
    command = None

    @classmethod
    def get_message_type(cls):
        return cls.__name__

    def publish_async(self):
        self.publish(is_async=True)

    @classmethod
    def gen_test_msg(cls):
        raise NotImplementedError

    def publish(self, is_async=False):
        raise NotImplementedError

    def get_backend_msg_mapper(self, backends):
        backends = set(backends)
        backends.add(BACKEND.SITE_MSG)  # Site message is mandatory
        backends_msg_mapper = {}
        for backend in backends:
            backend = BACKEND(backend)
            if not backend.is_enable:
                continue
            get_msg_method = getattr(self, f'get_{backend}_msg', self.get_common_msg)
            msg = get_msg_method()
            backends_msg_mapper[backend] = msg
        return backends_msg_mapper

    @staticmethod
    def send_msg(receive_user_ids, backends_msg_mapper):
        for backend, msg in backends_msg_mapper.items():
            try:
                backend = BACKEND(backend)
                client = backend.client()
                users = User.objects.filter(id__in=receive_user_ids).all()
                client.send_msg(users, **msg)
            except NotImplementedError:
                continue
            except:
                traceback.print_exc()

    @classmethod
    def send_test_msg(cls, ding=True, wecom=False):
        msg = cls.gen_test_msg()
        if not msg:
            return
        from users.models import User
        users = User.objects.filter(username='admin')
        backends = []
        if ding:
            backends.append(BACKEND.DINGTALK)
        if wecom:
            backends.append(BACKEND.WECOM)
        msg.send_msg(users, backends)

    @staticmethod
    def get_common_msg() -> dict:
        return {'subject': '', 'message': ''}

    def get_html_msg(self) -> dict:
        return self.get_common_msg()

    @staticmethod
    def html_to_markdown(html_msg):
        h = HTML2Text()
        h.body_width = 0
        content = html_msg['message']
        html_msg['message'] = h.handle(content)
        return html_msg

    def get_markdown_msg(self) -> dict:
        return self.html_to_markdown(self.get_html_msg())

    def get_text_msg(self) -> dict:
        h = HTML2Text()
        h.body_width = 90
        msg = self.get_html_msg()
        content = msg['message']
        h.ignore_links = self.text_msg_ignore_links
        msg['message'] = h.handle(content)
        return msg

    @lazyproperty
    def common_msg(self) -> dict:
        return self.get_common_msg()

    @lazyproperty
    def text_msg(self) -> dict:
        msg = self.get_text_msg()
        return msg

    @lazyproperty
    def markdown_msg(self):
        return self.get_markdown_msg()

    @lazyproperty
    def get_jumpserver_dingtalk_msg(self) -> str:
        date_str = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        msg = f"""## <font color="#FF0000">[JumpServer Monitoring Alert](https://jumpserver.example.com/)</font>🔥
### <font color="#FF0000">Alert Status</font>: {RiskLevelChoices.get_label(self.command['risk_level'])}
### <font color="#FF0000">Target Host</font>: {self.command['asset']}
### <font color="#FF0000">Target User</font>: {self.command['user']}
### <font color="#FF0000">Executed Command</font>: `{self.command['input']}`
### <font color="#FF0000">Alert Details</font>: User {self.command['user']} executed a risky command `{self.command['input']}` on {self.command['asset']}, please handle promptly!
### <font color="#FF0000">Trigger Time</font>: {date_str}"""
        return msg

    @lazyproperty
    def html_msg(self) -> dict:
        msg = self.get_html_msg()
        return msg

    @lazyproperty
    def html_msg_with_sign(self):
        msg = self.get_html_msg()
        msg['message'] = textwrap.dedent("""
        {}
        <small>
        <br />
        —
        <br />
        {}
        </small>
        """).format(msg['message'], self.signature)
        return msg

    @lazyproperty
    def text_msg_with_sign(self):
        msg = self.get_text_msg()
        msg['message'] = textwrap.dedent("""
        {}
        —
        {}
        """).format(msg['message'], self.signature)
        return msg

    @lazyproperty
    def signature(self):
        return get_login_title()

    # --------------------------------------------------------------
    # Support different messaging formats
    def get_dingtalk_msg(self) -> dict:
        # DingTalk restricts identical messages within a day, add timestamp suffix
        message = self.markdown_msg['message']
        time = local_now().strftime('%Y-%m-%d %H:%M:%S')
        suffix = '\n{}: {}'.format(_('Time'), time)
        return {
            'subject': self.markdown_msg['subject'],
            'message': message + suffix
        }

    def get_wecom_msg(self) -> dict:
        return self.markdown_msg

    def get_feishu_msg(self) -> dict:
        return self.markdown_msg

    def get_lark_msg(self) -> dict:
        return self.markdown_msg

    def get_email_msg(self) -> dict:
        return self.html_msg_with_sign

    def get_site_msg_msg(self) -> dict:
        return self.html_msg

    def get_slack_msg(self) -> dict:
        return self.markdown_msg

    def get_sms_msg(self) -> dict:
        return self.text_msg_with_sign

    @classmethod
    def get_all_sub_messages(cls):
        def get_subclasses(cls):
            """Returns all subclasses of argument, cls"""
            if issubclass(cls, type):
                subclasses = cls.__subclasses__(cls)
            else:
                subclasses = cls.__subclasses__()
            for subclass in subclasses:
                subclasses.extend(get_subclasses(subclass))
            return subclasses

        messages_cls = get_subclasses(cls)
        return messages_cls

    @classmethod
    def test_all_messages(cls, ding=True, wecom=False):
        messages_cls = cls.get_all_sub_messages()
        for _cls in messages_cls:
            try:
                _cls.send_test_msg(ding=ding, wecom=wecom)
            except NotImplementedError:
                continue


class SystemMessage(Message):
    def publish(self, is_async=False):
        subscription = SystemMsgSubscription.objects.get(
            message_type=self.get_message_type()
        )
        # Only send through enabled backends
        receive_backends = subscription.receive_backends
        receive_backends = BACKEND.filter_enable_backends(receive_backends)
        users = [
            *subscription.users.all(),
            *chain(*[g.users.all() for g in subscription.groups.all()])
        ]
        receive_user_ids = [u.id for u in users]
        backends_msg_mapper = self.get_backend_msg_mapper(receive_backends)
        if is_async:
            send_dingtalk_task(self.get_jumpserver_dingtalk_msg)
            # publish_task.delay(receive_user_ids, backends_msg_mapper)
        else:
            self.send_msg(receive_user_ids, backends_msg_mapper)

    @classmethod
    def post_insert_to_db(cls, subscription: SystemMsgSubscription):
        pass

    @classmethod
    def gen_test_msg(cls):
        raise NotImplementedError


class UserMessage(Message):
    user: User

    def __init__(self, user):
        self.user = user

    def publish(self, is_async=False):
        """
        Send messages through user-configured channels
        """
        sub = UserMsgSubscription.objects.get(user=self.user)
        backends_msg_mapper = self.get_backend_msg_mapper(sub.receive_backends)
        receive_user_ids = [self.user.id]
        if is_async:
            send_dingtalk_task(self.get_jumpserver_dingtalk_msg)
            # publish_task.delay(receive_user_ids, backends_msg_mapper)
        else:
            self.send_msg(receive_user_ids, backends_msg_mapper)

    @classmethod
    def get_test_user(cls):
        from users.models import User
        return User.objects.all().first()

    @classmethod
    def gen_test_msg(cls):
        raise NotImplementedError
```

values.yaml:

```yaml
replicaCount: 1

image:
  repository: nginx
  pullPolicy: IfNotPresent
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}
podSecurityContext: {}
securityContext: {}

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: false

resources: {}

autoscaling:
  enabled: false

nodeSelector: {}
tolerations: []
affinity: {}
```

## Modify the Image Version of the JumpServer Core Module

To modify the image version of the JumpServer core module, follow these steps:

```shell
# Step 1: Edit the core module deployment
kubectl -n jumpserver edit deployments.apps jms-k8s-jumpserver-jms-core

# Step 2: Locate the image field in the container configuration
# Find the containers section and modify the image version to:
#   - name: jms-core
#     image: jumpserver/core:v4.6.1-ce
```
","date":"2024-11-19T00:00:00Z","permalink":"/en/p/k8s_jumpserver_deployment_dingtalk_alerts/","title":"Deploying JumpServer on Kubernetes with DingTalk Alert Integration"},{"content":"

## Macvlan

📝 Note

Network drivers overview | Docker Docs
Macvlan network driver | Docker Docs

Certain applications, particularly legacy software or programs that monitor network traffic, require direct connections to the physical network. In such scenarios, you can utilize the macvlan network driver to assign each container's virtual network interface a unique MAC address, making them appear as physical network interfaces directly attached to the physical network.
This configuration requires specifying a physical interface on the Docker host for macvlan usage, along with defining the network's subnet and gateway. You may also employ different physical network interfaces to isolate your macvlan networks.

```shell
cat <<EOF > docker_network_macvlan.yaml
name: docker_network_macvlan
services:
  docker_network_macvlan:
    image: busybox
    container_name: docker_network_macvlan
    networks:
      macvlan_net:
        ipv4_address: 192.168.142.234  # Configure static IP
    privileged: true
    cap_add:
      - NET_ADMIN  # Add network privileges
    command: sleep infinity
    ports:
      - "17000:17000"

networks:
  macvlan_net:
    driver: macvlan  # Use macvlan network type
    # Macvlan networks allow containers to have MAC addresses, making them appear as physical devices on the network.
    # The Docker daemon can route traffic through the container's MAC address.
    # Macvlan is often the best choice when dealing with legacy applications that expect a direct physical network connection.
    driver_opts:
      parent: eno1  # Specify network interface
    ipam:
      config:
        - subnet: 192.168.142.0/24  # Subnet
          ip_range: 192.168.142.0/24  # IP range
          gateway: 192.168.142.1  # Gateway
EOF
```

```shell
docker-compose -f docker_network_macvlan.yaml up -d
```

📌 Important: The container currently cannot communicate with the host machine (using the eno1 NIC) or other containers.

## Pipework

jpetazzo/pipework: Software-Defined Networking tools for LXC (LinuX Containers) (github.com)

```shell
sudo docker run -itd --name test ubuntu /bin/bash
```

```shell
sudo docker exec test ip addr show
```

The host machine's network is 172.16.0.100.
Configure the network for container "test" and connect it to bridge br0, where the address after @ represents the gateway:

```shell
sudo pipework br0 test 172.16.0.156/24@172.16.0.1
# ip addr add 172.16.0.254/24 dev br0
```

```shell
sudo ip addr add 172.16.0.100/24 dev br0; \
sudo ip addr del 172.16.0.100/24 dev enp1s0; \
sudo brctl addif br0 enp1s0; \
sudo ip route del default; \
sudo ip route add default via 172.16.0.1 dev br0
```
","date":"2024-07-11T00:00:00Z","permalink":"/en/p/docker_macvlan_pipework_configuration/","title":"Docker Standalone IP Configuration Guide - Macvlan and Pipework Implementation"},{"content":"

## Kubernetes Pod Deletion Lifecycle

Deleting a container in Kubernetes involves the following steps and concepts. Specifically, container lifecycle management is handled by Pods and controllers (such as Deployments, ReplicaSets, etc.). You can delete a Pod or container using the kubectl delete command in Kubernetes, and the exact behavior depends on the resource type involved.

### Process of Deleting a Container

Assume you execute the following command to delete a Pod (the container is one or more processes within the Pod):

```shell
kubectl delete pod <pod-name>
```

When this command is executed, the following process occurs:

- Pod Scheduling Management: Controllers in Kubernetes (e.g., Deployment or ReplicaSet) monitor the number of Pod replicas and ensure the running Pod count matches the desired state. If you delete a Pod and the replica count is set to greater than 1 (e.g., a Deployment with 3 replicas), the controller will automatically create a new Pod to replace the deleted one. If the replica count is set to 1 and there are no additional controllers managing the Pod, the Pod will not be recreated after deletion unless the replica count is manually adjusted.
- Container Termination: When you delete a Pod, Kubernetes sends a termination signal (SIGTERM) to the containers running in the Pod.
This gives the containers a grace period (default: 30 seconds) to perform cleanup operations and release resources.

- Forced Termination: If a container does not stop within 30 seconds, it is forcibly terminated (via SIGKILL).
- Pod Deletion: Once the containers terminate, the Pod itself is removed from the cluster, and associated resources (e.g., volumes) are cleaned up or reattached, depending on the volume type.
- Resource Cleanup: The deleted container releases its allocated resources (CPU, memory, network connections, etc.). If the container uses mounted volumes (e.g., Persistent Volumes), their cleanup behavior is determined by the ReclaimPolicy (e.g., Retain or Delete).

### Container Lifecycle Management

In Kubernetes, a container's lifecycle is generally controlled by a Pod, while the Pod lifecycle itself is a process consisting of creation, execution, and termination phases.

#### Pod Lifecycle

- Pending: The Pod is scheduled onto a node, but its containers have not yet started running. This is typically due to pending container image pulls or incomplete resource scheduling.
- Running: The containers are actively running, and the Pod is executing.
- Succeeded: All containers have terminated successfully, marking the end of the Pod's lifecycle.
- Failed: At least one container exited abnormally (e.g., by returning status code 1 or another non-zero exit status) and cannot be restarted, causing the Pod to enter the Failed state.
- Unknown: Unable to retrieve the Pod's status, usually due to node failures or network issues.

#### Container Lifecycle

A container's lifecycle is managed by its Pod, but you can customize lifecycle behaviors in Pod configurations using lifecycle hooks, including:

- PreStop: Invoked before container termination to perform cleanup tasks, such as closing network connections or saving temporary data.
- PostStart: Invoked after container startup to execute initialization tasks.
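Both hooks can be declared in a single Pod manifest; a minimal sketch (the pod name, image, and hook commands here are illustrative, not from the original post):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-hooks-demo   # illustrative name
spec:
  containers:
    - name: app
      image: nginx             # illustrative image
      lifecycle:
        postStart:
          exec:
            # Runs right after the container starts (initialization tasks)
            command: ["sh", "-c", "echo started > /tmp/started"]
        preStop:
          exec:
            # Runs before the container is sent SIGTERM (cleanup tasks)
            command: ["sh", "-c", "echo stopping > /tmp/stopping; sleep 5"]
```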
### Lifecycle Example

- Pod Creation: The Pod is created, and the containers begin initializing. The containers start pulling images and booting up. During startup, the PostStart hook (if configured) is executed.
- Pod Execution: Containers actively provide services during runtime. Resources (e.g., containers, volumes) remain active.
- Pod Deletion: When deleting the Pod, Kubernetes sends a SIGTERM signal to the containers to request graceful termination. If a container fails to exit within the defined grace period, Kubernetes forces termination via SIGKILL. Post-deletion, Pod-associated resources (e.g., network, volumes) are cleaned up, and the containers/Pod enter a terminated state.
- Container Exit: When a container exits, its exit status code is recorded. A status code of 0 indicates success; otherwise, it is marked as a failure. Based on the Pod's restart policy (e.g., Always), Kubernetes may restart the container after termination.

### Restart Policy and Replica Count

- Always (Default): If a container crashes or is deleted, Kubernetes will attempt to restart it.
- OnFailure: Containers are only restarted if they exit abnormally (return a non-zero exit code).
- Never: Containers will not restart automatically and will remain permanently stopped after exiting.

### Summary

When you delete a container, Kubernetes manages the Pod and container lifecycle through the following steps:

- Graceful Container Shutdown: Sends a SIGTERM signal to allow graceful termination (default 30 seconds).
- Forced Termination: If the container does not exit within the specified time, a SIGKILL signal is sent to forcefully terminate it.
- Pod Deletion: After the container stops, the Pod resource is cleaned up.
- Controller Behavior: If the Pod is managed by a Deployment or ReplicaSet, the controller ensures the Pod count matches the desired state, creating new Pods as needed to replace deleted ones. If the replica count is 1, deleting the Pod will not automatically create a new Pod unless the replica count is manually adjusted or resources are recreated.

## Graceful Termination vs. Forced Termination of Containers

In Kubernetes, container termination has two main phases: Graceful Termination and Forced Termination. These phases differ in how Kubernetes handles the exit process and whether the container has an opportunity to perform cleanup operations.

### Container Graceful Termination

When you delete a Pod or container, Kubernetes will first attempt to terminate the container gracefully, which is known as Graceful Termination.

Sending SIGTERM Signal: When Kubernetes requests container termination, it sends a SIGTERM (termination signal). The container can catch this signal and begin normal shutdown operations. Upon receiving SIGTERM, the container can perform actions such as:

- Closing network connections
- Cleaning up occupied resources (e.g., file handles, temporary data)
- Executing custom shutdown logic (e.g., database commits, logging)

Grace Period: After receiving SIGTERM, the container is granted a "Grace Period" to complete cleanup tasks. By default, this period is 30 seconds, which can be adjusted through the terminationGracePeriodSeconds field in the Pod manifest.

Graceful Exit: If the container exits normally within the grace period (completes cleanup and exits), Kubernetes marks the container status as Succeeded or Terminated. The Pod will then handle subsequent actions according to termination policies (e.g., deletion or rescheduling).

### Forced Termination of Containers

If containers fail to exit normally within the grace period, Kubernetes initiates forced termination.
This occurs when containers do not complete cleanup operations or do not respond to termination requests, prompting Kubernetes to take stricter measures to terminate them.

- Sending SIGKILL Signal: If a container does not terminate gracefully within the time specified by terminationGracePeriodSeconds, Kubernetes sends a SIGKILL (force-kill signal) to stop the container immediately. Unlike SIGTERM, the SIGKILL signal cannot be captured or handled by the container, preventing any cleanup operations from being executed.
- Immediate Termination: Upon receiving SIGKILL, the container is halted abruptly, and all running processes are killed. This means processes have no chance to release resources (e.g., file handles, temporary storage) and may leave behind uncleaned states.
- Inability to Clean Up Resources: Forcibly terminated containers cannot perform any cleanup operations, potentially resulting in unreleased resources such as temporary files, memory, database connections, etc.

### Differences Between Graceful Termination and Forced Termination of Containers

| Feature | Graceful Termination | Forced Termination |
| --- | --- | --- |
| Signal | SIGTERM | SIGKILL |
| Cleanup Time | Allows cleanup (default: 30 seconds, configurable) | No cleanup time, immediate kill |
| Termination Process | Container can catch SIGTERM for cleanup | Container cannot catch SIGKILL |
| Resource Release | Resources (file handles, DB connections) released | Resources may leak |
| Container State | Exits normally (Succeeded/Terminated) | Forcibly stopped (Terminated) |
| Trigger Scenario | Normal shutdown (e.g., deletion request) | Unresponsive or grace period expired |

### Configuring Container Termination Behavior in Pod Specifications

To control container termination behavior, configure the terminationGracePeriodSeconds field in the Pod specification.
Default: 30 seconds (configurable).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  terminationGracePeriodSeconds: 60  # Set grace period to 60 seconds
  containers:
    - name: mycontainer
      image: myimage
```

### Container Lifecycle Hooks

Kubernetes also provides lifecycle hooks that allow you to insert custom behaviors during a container's startup and shutdown processes, particularly executing additional operations when the container terminates:

- PreStop: Executes before container termination. You can use this hook to perform cleanup tasks, send notifications, or other operations, for example, terminating external service connections or closing database links.

Example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mycontainer
      image: myimage
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "echo 'Container is stopping' > /tmp/shutdown.log"]
```

### Summary

- Graceful Container Termination: Kubernetes sends a SIGTERM signal to request the container to exit gracefully. The container has an opportunity to perform cleanup operations and release resources, typically completing within the grace period.
- Forced Container Termination: If the container fails to exit normally within the grace period, Kubernetes sends a SIGKILL signal to forcibly terminate the container. The container cannot perform any cleanup and is immediately stopped, freeing up resources.

By appropriately configuring a Pod's grace period and lifecycle hooks, you can control container termination behaviors, ensuring minimized resource leaks and service disruptions during container exits.

## SIGTERM and SIGKILL

SIGTERM and SIGKILL are two commonly used Unix/Linux signals. Both are used to send termination requests to processes, but they differ in their purpose, behavior, and consequences.
Below is a detailed explanation of these two signals:

### SIGTERM (Signal Terminate)

Signal Number: 15

Action: SIGTERM is a signal requesting a process to terminate. It notifies the process to exit gracefully, allowing time to complete cleanup operations, save state, and release resources. This is the default signal for terminating processes.

Sending Method: You can send SIGTERM using the kill command or programmatically via kill(pid, SIGTERM).

Example:

```shell
kill <pid>  # Sends SIGTERM by default
```

Characteristics:

- SIGTERM enables graceful exit and can be caught and handled by the process.
- Upon receiving SIGTERM, a process may perform cleanup tasks such as closing database connections, saving temporary data, or releasing file handles.
- If the process handles SIGTERM, it can execute cleanup logic before terminating.
- If the process does not respond to SIGTERM, a supervising component (such as Kubernetes or systemd) typically waits for a period (usually 30 seconds) before sending SIGKILL.

Handling Behavior:

- A process may choose to catch and handle SIGTERM, for example, by registering a signal handler function.
- If the process ignores SIGTERM and fails to exit within the timeout, SIGKILL will be enforced.

Use Cases:

- Used when cleanup operations should be allowed, e.g., closing network connections, saving data, or logging shutdown events.
- In Kubernetes or Docker, SIGTERM is sent first when deleting a container to request a graceful shutdown.

### SIGKILL (Signal Kill) - Forced Termination Signal

Signal Number: 9

Function: SIGKILL is a signal that forces immediate termination of a process. It instructs the operating system to stop the target process without allowing any cleanup operations. This signal cannot be caught, ignored, or handled by the process.

Example:

```shell
kill -9 <pid>  # Send SIGKILL signal
```

Characteristics:

- SIGKILL is a mandatory termination signal; the targeted process cannot perform any cleanup operations.
- The operating system terminates the process immediately and releases its occupied resources, regardless of the process's state.
- Processes cannot intercept or ignore SIGKILL; it removes the process from memory instantly.
- Upon sending SIGKILL, the process loses the opportunity to close files, free memory, write logs, or perform other essential cleanup tasks.

Use Cases:

- Use SIGKILL to forcibly terminate a process when it does not respond to SIGTERM or is hung (e.g., frozen or stuck in an infinite loop).
- In operating systems, SIGKILL is employed to terminate unresponsive or runaway processes, ensuring complete cleanup.

### Comparison: SIGTERM vs. SIGKILL

| Characteristic | SIGTERM (15) | SIGKILL (9) |
| --- | --- | --- |
| Function | Requests graceful termination, allowing the process to perform cleanup tasks | Forces immediate process termination; cannot be captured or handled |
| Capturable/Handled | Can be captured by the process; supports custom termination logic | Cannot be captured or processed by the process |
| Graceful Exit | Permits resource cleanup (e.g., closing files, saving data) | Instantly terminates the process without cleanup |
| Ignorable | Can be ignored by the process (if no handler is registered) | Cannot be ignored; enforced termination |
| Default Behavior | System typically waits for process exit (default: 30 seconds) | Immediately kills the process without delay |
| Use Cases | Safe process shutdown requiring resource release or state preservation | Terminating unresponsive processes or overriding failed SIGTERM |
| OS Behavior | OS waits for the process to exit within a timeout | OS forcibly terminates the process and releases all resources |

### SIGTERM and SIGKILL in Kubernetes

In Kubernetes, when you delete a Pod or container, Kubernetes first sends a SIGTERM signal to the container, requesting its main process to gracefully exit. The process within the container has a specific period (30 seconds by default) to respond to SIGTERM and perform cleanup operations.
If the container does not exit normally during this grace period, Kubernetes sends SIGKILL to forcibly terminate the container. This process aims to ensure graceful shutdown where possible, while using forceful termination if unresponsive.

#### SIGTERM Signal

Sent to: The main process within the container.

Behavior: When the container runs, there is a main process (typically the process defined by CMD or ENTRYPOINT at container startup). When Kubernetes requests to stop the container, it sends a SIGTERM signal to the container's main process. SIGTERM is a graceful termination signal, meaning the process should handle it by performing cleanup tasks and exiting properly.

Behavior inside the container: The main process has the opportunity to capture the SIGTERM signal and execute cleanup operations within the allotted time, such as:

- Closing database connections.
- Saving log files or persistent data.
- Cleaning up temporary files or caches.

The container may capture the signal and execute custom termination logic (e.g., via the lifecycle hook PreStop) upon receiving SIGTERM.

Grace Period: Kubernetes provides a default of 30 seconds (configurable) for the container's main process to shut down gracefully. If the process does not exit within this timeframe, Kubernetes will send a SIGKILL signal to forcibly terminate the container.

#### SIGKILL Signal

Target: Processes within the container (main process or any other processes).

Behavior: SIGKILL is an uncatchable and unignorable signal. It directly terminates processes in the container without allowing any cleanup operations. When a container fails to exit gracefully during the grace period after receiving SIGTERM, Kubernetes sends SIGKILL to forcefully terminate container processes.

Process Behavior: Upon receiving SIGKILL, container processes terminate immediately without performing any cleanup. All file handles, network connections, and database transactions may be abruptly interrupted, potentially leaving resources not properly released.

Container Termination: With processes forcibly terminated, the container ends its lifecycle. The Pod controller (e.g., Deployment or ReplicaSet) will then reschedule new containers based on replica counts.

### Relationship Between Containers and Processes

In Kubernetes, a container essentially serves as a runtime environment encapsulating one or multiple processes (typically a single main process). Discussions about container startup, shutdown, or restart fundamentally refer to managing the lifecycle of processes within the container. When Kubernetes sends SIGTERM or SIGKILL, these signals are directed to processes within the container rather than the container itself. The container itself has no executable code or processes; it acts merely as a runtime environment, while the containerized processes are the actual entities handling these signals.

### How Processes in Containers Respond to SIGTERM and SIGKILL

Response to SIGTERM: Processes within a container can capture the SIGTERM signal. The container's main process may implement code logic to handle this signal for cleanup operations or resource release. For example, if the container runs an HTTP server process, it might stop accepting new requests and gracefully complete ongoing requests upon receiving SIGTERM, then exit.

Example code (Python):

```python
import signal
import time


def graceful_shutdown(signum, frame):
    print("Received SIGTERM, shutting down gracefully...")
    # Perform cleanup operations, such as closing database connections, flushing caches, etc.
    time.sleep(2)  # Simulate cleanup tasks
    exit(0)


signal.signal(signal.SIGTERM, graceful_shutdown)

print("Running... Press CTRL+C to exit.")
while True:
    time.sleep(1)
```

Response to SIGKILL: Processes in containers cannot capture or handle the SIGKILL signal. The SIGKILL signal terminates the process directly, causing the process to exit immediately without any opportunity for cleanup. When SIGKILL is received, the process is terminated abruptly by the system, leaving no time to save state or release resources. Recovery from this termination is not possible.

### Summary

- SIGTERM: A graceful termination request allowing processes to clean up resources and exit properly. If the process does not terminate within the specified time limit, the operating system sends SIGKILL.
- SIGKILL: A forced termination signal that cannot be intercepted or handled by the process. The process is immediately terminated without any cleanup.

These two signals are commonly used in Unix/Linux system management. Understanding their roles and behaviors helps in effectively managing process lifecycles, particularly in containerized environments (e.g., Kubernetes) and automated operations.

SIGTERM and SIGKILL signals in Kubernetes are ultimately sent to the processes running inside the container, not directly to the container itself. A container is a runtime environment containing one or more processes. Therefore, these signals target the actual processes within the container.

- SIGTERM is sent to processes inside the container to allow graceful exit and cleanup. If processes fail to terminate within the grace period (default 30 seconds), Kubernetes follows up with SIGKILL.
- SIGKILL (sent to container processes) cannot be caught or ignored. Processes are forcibly terminated immediately, bypassing cleanup routines.

In summary, containers manage internal processes as runtime environments.
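This asymmetry can also be reproduced outside Kubernetes with a short host-side script (POSIX only; the child process and timings are illustrative):

```python
import signal
import subprocess
import sys
import textwrap
import time

# Child process that traps SIGTERM and keeps running, so only SIGKILL can stop it.
child_src = textwrap.dedent("""
    import signal, time
    signal.signal(signal.SIGTERM, lambda s, f: print("caught SIGTERM, still running", flush=True))
    while True:
        time.sleep(0.1)
""")

proc = subprocess.Popen([sys.executable, "-c", child_src])
time.sleep(0.5)                 # give the child time to install its handler

proc.terminate()                # sends SIGTERM: the child catches it and survives
time.sleep(0.5)
print("alive after SIGTERM:", proc.poll() is None)

proc.kill()                     # sends SIGKILL: cannot be caught, the child dies
proc.wait(timeout=5)
print("exit status:", proc.returncode)  # negative value means killed by that signal
```

On Windows `terminate()` and `kill()` behave identically, so the contrast only shows on Unix-like systems.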
The lifecycle management of a container depends on how its internal processes respond to these termination signals.\n","date":"2024-03-27T00:00:00Z","permalink":"/en/p/kubernetes_pod_termination_signals/","title":"Understanding Pod Termination Signals in Kubernetes"},{"content":"Log Format\n1 2024-01-29 16:11:11.189 |INFO | 1.1.1.1|2345 | com.smart.service.receive.impl.ReceiveServiceImpl:903 | Capability\u0026gt;Total 04 steps | 6df2f14fca4b40f6be89b9ef19382c42adasfasf Logstash Configuration Example\rdocker deployment\r1 2 3 4 5 6 7 8 9 10 11 12 13 [root@master logstash]# cat docker-compose.yaml version: \u0026#39;3\u0026#39; services: logstash: image: docker.elastic.co/logstash/logstash:8.12.0 container_name: logstash volumes: - ./conf/logstash.yml:/usr/share/logstash/config/logstash.yml - ./conf/conf.d:/usr/share/logstash/config/conf.d/ - ./logs:/opt ports: - 5044:5044 config files\nlogstash.yml\n1 2 3 http.host: \u0026#34;0.0.0.0\u0026#34; xpack.monitoring.elasticsearch.hosts: [ \u0026#34;http://192.168.142.106:9200\u0026#34; ] path.config: /usr/share/logstash/config/conf.d/*.conf 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 input { file { type =\u0026gt; \u0026#34;info_log\u0026#34; path =\u0026gt; \u0026#34;/opt/kaikai.log\u0026#34; discover_interval =\u0026gt; 10 # listen interval start_position =\u0026gt; \u0026#34;end\u0026#34; # sincedb_path =\u0026gt; \u0026#34;/usr/share/logstash/sincedb_kaikai\u0026#34; #start_position =\u0026gt; \u0026#34;beginning\u0026#34; codec =\u0026gt; multiline { pattern =\u0026gt; \u0026#34;^%{TIMESTAMP_ISO8601}\u0026#34; negate =\u0026gt; true what =\u0026gt; \u0026#34;previous\u0026#34; } } file { type =\u0026gt; \u0026#34;error_log\u0026#34; path =\u0026gt; \u0026#34;/opt/error.log\u0026#34; discover_interval =\u0026gt; 10 start_position =\u0026gt; \u0026#34;beginning\u0026#34; codec =\u0026gt; multiline { pattern 
=\u0026gt; \u0026#34;^%{TIMESTAMP_ISO8601}\u0026#34; negate =\u0026gt; true what =\u0026gt; \u0026#34;previous\u0026#34; } } } filter { grok { match =\u0026gt; { \u0026#34;[log][file][path]\u0026#34; =\u0026gt; \u0026#34;/(?\u0026lt;logfilename\u0026gt;[^/]+)\\.log$\u0026#34; } # get file name logfilename } grok { match =\u0026gt; { \u0026#34;message\u0026#34; =\u0026gt; \u0026#34;%{DATA:time}\\|%{DATA:level}\\|%{DATA:ip}\\|%{DATA:pid}\\|%{DATA:source}\\|%{GREEDYDATA:content}\u0026#34;} } if \u0026#34;_grokparsefailure\u0026#34; in [tags] { mutate { add_field =\u0026gt; { \u0026#34;content\u0026#34; =\u0026gt; \u0026#34;%{message}\u0026#34; } add_field =\u0026gt; { \u0026#34;level\u0026#34; =\u0026gt; \u0026#34;ERROR\u0026#34; } } } } output { stdout { codec =\u0026gt; rubydebug } elasticsearch { hosts =\u0026gt; [\u0026#34;192.168.142.106:9200\u0026#34;] index =\u0026gt; \u0026#34;%{logfilename}-%{+YYYY-MM-dd}\u0026#34; # index by file name } } k8s deployment\rlogstash.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 apiVersion: v1 kind: ConfigMap metadata: name: log-file-config data: logstash.yml: | http.host: \u0026#34;0.0.0.0\u0026#34; xpack.monitoring.elasticsearch.hosts: [ \u0026#34;http://192.168.142.106:9200\u0026#34; ] #xpack.monitoring.elasticsearch.hosts: [ \u0026#34;http://192.168.142.106:9200\u0026#34; ] path.config: /usr/share/logstash/config/conf.d/*.conf collect.conf: | input { beats { port =\u0026gt; 5044 } } filter { grok { match =\u0026gt; { \u0026#34;[log][file][path]\u0026#34; =\u0026gt; [\u0026#34;/(?\u0026lt;logfilename\u0026gt;[^/]+)\\.log$\u0026#34;] } } grok { match =\u0026gt; { \u0026#34;message\u0026#34; =\u0026gt; 
\u0026#34;%{DATA:time}\\|%{DATA:level}\\|%{DATA:ip}\\|%{DATA:pid}\\|%{DATA:source}\\|%{GREEDYDATA:content}\u0026#34; } } if \u0026#34;_grokparsefailure\u0026#34; in [tags] { mutate { add_field =\u0026gt; { \u0026#34;content\u0026#34; =\u0026gt; \u0026#34;%{message}\u0026#34; } add_field =\u0026gt; { \u0026#34;level\u0026#34; =\u0026gt; \u0026#34;ERROR\u0026#34; } } } } output { stdout { codec =\u0026gt; rubydebug } elasticsearch { hosts =\u0026gt; [\u0026#34;192.168.142.106:9200\u0026#34;] index =\u0026gt; \u0026#34;%{logfilename}-%{+YYYY-MM-dd}\u0026#34; } } --- kind: Deployment apiVersion: apps/v1 metadata: name: logstash labels: app: logstash spec: replicas: 4 selector: matchLabels: app: logstash template: metadata: labels: app: logstash annotations: appName: logstash appType: java spec: containers: - name: logstash-logging image: registry.cn-beijing.aliyuncs.com/kaikai136/logstash:8.12.0 volumeMounts: - name: logstash-config mountPath: /usr/share/logstash/config/logstash.yml subPath: logstash.yml - name: logstash-config mountPath: /usr/share/logstash/config/conf.d/collect.conf subPath: collect.conf volumes: - name: logstash-config configMap: name: log-file-config items: - key: logstash.yml path: logstash.yml - key: collect.conf path: collect.conf imagePullSecrets: - name: my-harbor --- apiVersion: v1 kind: Service metadata: name: logstash-svc labels: app: logstash-svc spec: ports: - port: 5044 targetPort: 5044 protocol: TCP name: http nodePort: 32467 type: NodePort selector: app: logstash filebeat\rcollector test\rfilebeat.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 apiVersion: v1 kind: ConfigMap metadata: name: 
filebeat-config data: filebeat.yml: | filebeat.inputs: - type: log enabled: true paths: - /logs/*_info.log scan_frequency: 1s # set scan frequency to 1 second harvester_buffer_size: 32768 # increase harvester buffer size backoff_factor: 2 ignore_older: 24h # ignore files older than 24 hours close_inactive: 5m # close harvester inactive for 5 minutes clean_inactive: 72h # clean inactive harvester after 72 hours close_removed: true # close harvester when file is removed clean_removed: true # clean removed harvester close_eof: true # close harvester when file reaches EOF multiline.pattern: \u0026#39;^[0-9]{4}\u0026#39; # match multiline logs multiline.negate: true multiline.match: after var.convert_timezone: true # convert timezone encoding: UTF-8 # set encoding fields: wisentIp: 0.0.0.0 # add custom field log_type: info_log - type: log enabled: true paths: - /logs/*_error.log scan_frequency: 1s # set scan frequency to 1 second harvester_buffer_size: 32768 # increase harvester buffer size backoff_factor: 2 ignore_older: 24h # ignore files older than 24 hours close_inactive: 5m # close harvester inactive for 5 minutes clean_inactive: 72h # clean inactive harvester after 72 hours close_removed: true # close harvester when file is removed clean_removed: true # clean removed harvester close_eof: true # close harvester when file reaches EOF multiline.pattern: \u0026#39;^[0-9]{4}\u0026#39; # match multiline logs multiline.negate: true multiline.match: after var.convert_timezone: true # convert timezone encoding: UTF-8 # set encoding fields: wisentIp: 0.0.0.0 # add custom field log_type: error_log queue.mem: events: 4096 # memory queue size flush.min_events: 2048 # minimum flush events flush.timeout: 1s # flush timeout #queue.disk: # max_size: 1024mb # maximum disk usage # segment_size: 10mb # size of each segment # max_retries: 3 # maximum retries logging.level: debug filebeat.shutdown_timeout: 30s # ensure enough time to process current events when shutting down Filebeat 
throttle: 5s # set the time Filebeat waits before being throttled logging.level: info # set logging level to info for detailed run information logging.to_files: true logging.files: path: /usr/share/filebeat/logs name: filebeat keepfiles: 7 permissions: 0644 output.logstash: hosts: [\u0026#34;logstash-svc.default.svc.cluster.local:5044\u0026#34;] --- kind: Deployment apiVersion: apps/v1 metadata: name: filebeat labels: app: filebeat spec: replicas: 1 selector: matchLabels: app: filebeat template: metadata: labels: app: filebeat annotations: appName: filebeat appType: java spec: containers: - name: filebeat-logging image: registry.cn-beijing.aliyuncs.com/kaikai136/filebeat:8.12.0 volumeMounts: - name: filebeat-config mountPath: /usr/share/filebeat/filebeat.yml subPath: filebeat.yml - name: myhostpath mountPath: /logs volumes: - name: filebeat-config configMap: name: filebeat-config items: - key: filebeat.yml path: filebeat.yml - name: myhostpath hostPath: path: /opt/kaikai/file-logstash/filebeat_log type: DirectoryOrCreate imagePullSecrets: - name: my-harbor ","date":"2024-03-09T00:00:00Z","permalink":"/en/p/logstash_filebeat_configuration/","title":"Log Collector Configuration Guide for Logstash and Filebeat"},{"content":"Android-iOS-macOS Docker\rhttps://github.com/sickcodes/dock-droid https://github.com/remote-android/redroid-doc https://github.com/budtmo/docker-android iOS Docker macOS Docker Redroid Docker\rNon-Root Version\rStart Redroid Container\n1 2 3 4 5 6 7 8 9 10 11 12 version: \u0026#34;3.0\u0026#34; services: redroid_11: image: redroid/redroid:11.0.0-latest # Android 11 container_name: redroid_11 privileged: true restart: always ports: - 5555:5555 volumes: - ../data/redroid_11:/data command: ro.secure=0 Root Version\rStart Redroid Container\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 #!/bin/bash # Delete old files rm -rf ./magisk* ./remove.rc ./setup.sh # Create a directory to store Magisk files if [ ! -d \u0026#34;/root/Redroid/MagiskOnRedroid\u0026#34; ]; then mkdir ~/Redroid/MagiskOnRedroid fi cd ~/Redroid/MagiskOnRedroid # Download Magisk APK find -maxdepth 1 -iname \u0026#34;magisk*\u0026#34; -not -name \u0026#34;*.apk\u0026#34; -exec rm -r {} \\; magisk_file=\u0026#34;app-debug.apk\u0026#34; if [ ! -f $magisk_file ]; then wget \u0026#34;https://cdn.jsdelivr.net/gh/topjohnwu/magisk-files@1cea72840fbf690f9a95512d03721f6a710fe02e/app-debug.apk\u0026#34; fi # Extract magisk files (x86_64 architecture) unzip -j $magisk_file \u0026#34;lib/x86_64/libmagisk64.so\u0026#34; -d magisk unzip -j $magisk_file \u0026#34;lib/x86_64/libbusybox.so\u0026#34; -d magisk mv -v magisk/libmagisk64.so magisk/magisk mv -v magisk/libbusybox.so magisk/busybox # Compress Magisk files tar --transform \u0026#39;s/.*\\///g\u0026#39; -cf ./magisk.tar --absolute-names $( find ~/Redroid/MagiskOnRedroid | grep -E \u0026#34;magisk/|app-debug.apk$\u0026#34; ) # Generate remove.rc cat \u0026lt;\u0026lt;\\EOF \u0026gt; ./remove.rc on early-init export PATH /sbin:/product/bin:/apex/com.android.runtime/bin:/apex/com.android.art/bin:/system_ext/bin:/system/bin:/system/xbin:/odm/bin:/vendor/bin:/vendor/xbin chmod 0700 /magisk.tar chown root root /magisk.tar chmod 0700 /setup.sh chown root root /setup.sh exec root root -- /setup.sh service magisk-d /sbin/magisk --daemon user root oneshot on boot start magisk-d on post-fs-data start logd rm /dev/.magisk-unblock start s1 wait /dev/.magisk-unblock 5 rm /dev/.magisk-unblock service s1 /sbin/magisk --post-fs-data user root oneshot service s2 /sbin/magisk --service class late_start user root oneshot on property:sys.boot_completed=1 exec /sbin/magisk --boot-complete on property:init.svc.zygote=restarting exec /sbin/magisk 
--zygote-restart on property:init.svc.zygote=stopped exec /sbin/magisk --zygote-restart EOF sudo chmod 644 ./remove.rc sudo chown root:root ./remove.rc # Generate setup.sh cat \u0026lt;\u0026lt;\\EOF \u0026gt; ./setup.sh #!/system/bin/sh # rm /system/fonts/NotoColorEmoji.ttf tmpPushed=/magisk rm -rf $tmpPushed mkdir $tmpPushed tar -xvf /magisk.tar --no-same-owner -C $tmpPushed umount /magisk.tar ; rm -v /magisk.tar mkdir /sbin chown root:root /sbin # chmod 0700 /sbin chmod 0751 /sbin cp $tmpPushed/magisk /sbin/ cp $tmpPushed/app-debug.apk /sbin/stub.apk find /sbin -type f -exec chmod 0755 {} \\; find /sbin -type f -exec chown root:root {} \\; # File structure: # /sbin/ # ├── magisk # └── stub.apk ln -f -s /sbin/magisk /system/xbin/su mkdir /product/bin chmod 751 /product/bin ln -f -s /sbin/magisk /product/bin/su # Add su links: # /product/bin/ # └── su -\u0026gt; /sbin/magisk mkdir -p /data/adb/magisk chmod 700 /data/adb mv $tmpPushed/busybox /data/adb/magisk/ chmod -R 755 /data/adb/magisk chown -R root:root /data/adb/magisk # Directory structure: # /data/adb/ # ├── magisk # │ └── busybox # Cleanup # rm -rf $tmpPushed EOF sudo chmod 700 ./setup.sh sudo chown root:root ./setup.sh Start Redroid Container\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 version: \u0026#34;3.0\u0026#34; services: redroid_11_magisk: image: redroid/redroid:11.0.0-latest # Android 11 container_name: redroid_11_magisk privileged: true restart: always ports: - 5555:5555 volumes: - ../data/redroid_11_magisk:/data - ../MagiskOnRedroid/remove.rc:/vendor/etc/init/remove.rc - ../MagiskOnRedroid/setup.sh:/setup.sh - ../MagiskOnRedroid/magisk.tar:/magisk.tar command: ro.secure=0 Connect to Redroid Container\r1 2 adb connect 172.16.0.101:5555 scrcpy -s 172.16.0.101:5555 ","date":"2023-12-01T00:00:00Z","image":"/p/android_redroid_docker_configuration/android_redroid_docker_configuration.png","permalink":"/en/p/android_redroid_docker_configuration/","title":"Android Redroid Docker Implementation 
Guide"},{"content":"In Linux system security maintenance, the firewall is an important tool for protecting servers from unauthorized access. iptables, as a Linux kernel firewall tool, provides powerful and flexible network traffic control mechanisms. This article will delve into the core concepts, configuration methods, and common implementation scenarios of iptables, helping system administrators build a more secure server environment.\nWhat is iptables?\riptables is a firewall tool in the Linux kernel, directly interacting with the netfilter module, responsible for filtering, modifying, and forwarding network data packets. As a complete firewall framework, iptables provides fine-grained network traffic control capabilities, allowing complex network access policies to be formulated based on multiple conditions (such as source IP address, destination port, protocol type, etc.).\nCore Components of iptables\rThe structure of iptables is based on the concepts of \u0026ldquo;tables\u0026rdquo; and \u0026ldquo;chains\u0026rdquo;:\nTables: Organize rules with specific functionalities\nfilter: Default table, used for packet filtering nat: Used for network address translation mangle: Used for special packet modifications raw: Used for configuring exemptions from connection tracking security: Used for enforcing access control network rules Chains: Each table contains multiple chains, defining when rules are applied\nINPUT: Processes incoming packets OUTPUT: Processes outgoing packets FORWARD: Processes forwarded packets PREROUTING: Pre-routing processing POSTROUTING: Post-routing processing Basic Operations of iptables\rViewing Current Rules\r1 2 3 4 5 # View all rules in the filter table sudo iptables -L -v # View all rules in the nat table sudo iptables -t nat -L -v Adding Rules\r1 2 3 4 5 6 7 8 # Allow SSH connections (port 22) sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT # Allow established connections sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT # Allow 
local loopback interface sudo iptables -A INPUT -i lo -j ACCEPT Deleting Rules\r1 2 3 4 5 # Delete the first rule sudo iptables -D INPUT 1 # Delete a specific rule sudo iptables -D INPUT -p tcp --dport 80 -j ACCEPT Setting Default Policies\r1 2 3 4 5 # Set default INPUT policy to reject sudo iptables -P INPUT DROP # Set default OUTPUT policy to accept sudo iptables -P OUTPUT ACCEPT Common Implementation Scenarios and Configuration Examples\rBasic Server Protection Configuration\rA basic server protection configuration example that allows common services and defaults to rejecting other connections:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # Clear existing rules sudo iptables -F sudo iptables -X # Set default policies sudo iptables -P INPUT DROP sudo iptables -P FORWARD DROP sudo iptables -P OUTPUT ACCEPT # Allow local loopback sudo iptables -A INPUT -i lo -j ACCEPT # Allow established connections sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT # Allow SSH (port 22) sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT # Allow HTTP/HTTPS (ports 80/443) sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT # Allow ICMP (ping) sudo iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT Port Forwarding Configuration\rForward external port 80 requests to internal port 8080:\n1 sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080 Limiting Connection Rate\rSimple configuration to prevent DoS attacks (track new SSH connections, then drop any source making 4 or more attempts within 60 seconds; the tracking rule with --set is required for --update to match):\n1 2 sudo iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set sudo iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 4 -j DROP Blocking Specific IP Addresses\r1 2 3 4 # Block a single IP sudo iptables -A INPUT -s 192.168.1.100 -j DROP # Block an entire subnet sudo iptables -A INPUT -s 192.168.1.0/24 -j DROP Rule Persistence\riptables rules are lost after system restart, so persistence configuration is needed:\nDebian/Ubuntu System\r1 2 3 4 5 # Install iptables-persistent sudo apt-get install iptables-persistent # Save current rules sudo 
netfilter-persistent save CentOS/RHEL System\r1 2 # Save current rules sudo iptables-save \u0026gt; /etc/sysconfig/iptables Common Issues and Troubleshooting\rRule Ordering: iptables matches rules in order, stopping at the first match. Therefore, rule ordering is critical. Locking Risk: Adding DROP rules requires caution, as incorrect configuration can lead to inability to remotely connect to the server. It is recommended to test new rules with physical access when possible. Performance Considerations: Too many rules can affect network performance, so it is recommended to regularly clean up unnecessary rules. Logging and Monitoring: Use the LOG target to record rejected connections for troubleshooting: 1 sudo iptables -A INPUT -j LOG --log-prefix \u0026#34;iptables denied: \u0026#34; --log-level 7 Advanced Features\rNetwork Address Translation (NAT)\rConfigure simple NAT to share an Internet connection:\n1 2 3 4 5 6 7 # Enable IP forwarding echo 1 \u0026gt; /proc/sys/net/ipv4/ip_forward # Allow forwarding sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT # Masquerade traffic leaving the external interface sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE Traffic Control and QoS\rUse the iptables mangle table to mark traffic, combined with tc (Traffic Control) for QoS:\n1 sudo iptables -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark 1 Conclusion\riptables is a powerful firewall tool in the Linux kernel, and mastering its use is crucial for server security. 
By properly configuring iptables rules, you can effectively control network traffic, protect servers from unauthorized access, and implement advanced features such as network address translation.\nFor more complex scenarios, consider using higher-level tools like ufw (Uncomplicated Firewall) or firewalld, which still use iptables but provide a more user-friendly interface.\nRegardless of the tool used, understanding the core concepts and working principles of iptables is essential for building a secure Linux network environment.\n","date":"2023-10-03T00:00:00Z","permalink":"/en/p/linux_firewall_iptables_guide/","title":"Comprehensive Guide to Linux Firewall Tool iptables: Configuration and Usage"},{"content":"Prometheus Comprehensive Monitoring Platform - Consul-Based Auto Discovery\nBackground\rConsul Documentation | Consul | HashiCorp Developer\nThe Prometheus configuration file prometheus-config.yaml contains numerous scraping rules, which are mostly manually managed by DevOps teams. When new nodes or components are added, manual modification of this configuration and a hot reload of Prometheus are required. Is it possible to dynamically monitor microservices? Prometheus provides multiple dynamic service discovery mechanisms, and this guide uses Consul as an example.\nConsul-Based Auto Discovery\rConsul is a distributed key-value database and a service registry component. Other services can register with Consul, including Prometheus. By leveraging Consul\u0026rsquo;s service discovery, we can avoid manually specifying large numbers of targets in Prometheus.\nThe workflow for Prometheus\u0026rsquo; Consul-based service discovery is as follows:\nRegister or deregister services (monitoring targets) in Consul. Prometheus continuously monitors Consul. When changes to services meeting the criteria are detected, Prometheus updates its monitoring targets accordingly. 
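The register step in this workflow is a single PUT to the Consul agent HTTP API, the same call shown with curl later in this post. Below is a minimal sketch using only the Python standard library; the Consul address, service name, and check interval are placeholder values taken from the examples in this post:

```python
import json
import urllib.request

def register_service(consul_addr, service_id, name, address, port, tags):
    """Build the PUT request that registers a service with the Consul agent API."""
    payload = {
        "id": service_id,
        "name": name,
        "address": address,
        "port": port,
        "tags": tags,
        # Health check: Prometheus only keeps targets whose check passes.
        "checks": [{"http": f"http://{address}:{port}/metrics", "interval": "5s"}],
    }
    return urllib.request.Request(
        f"http://{consul_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = register_service("192.10.192.109:18500", "sh-middler2",
                       "node-exporter", "192.10.192.134", 9100, ["middleware"])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send the registration to a live agent
```

Once the agent accepts this request, Prometheus (configured with `consul_sd_configs`) picks up the new target on its next sync without any change to `prometheus-config.yaml`.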
Service Discovery Mechanisms Supported by Prometheus\rPrometheus configuration for data sources falls into two main categories: static configuration and dynamic discovery. Commonly used mechanisms include:\n1 2 3 4 5 1) static_configs: # Static service discovery 2) file_sd_configs: # File-based service discovery 3) dns_sd_configs: # DNS-based service discovery 4) kubernetes_sd_configs: # Kubernetes service discovery 5) consul_sd_configs: # Consul service discovery In Kubernetes monitoring scenarios, frequently updated resources like Pods and Services best demonstrate the advantages of Prometheus\u0026rsquo; auto-discovery capabilities.\nWorking Principles\rPrometheus queries the configuration information stored in Consul\u0026rsquo;s KV storage through the Consul API, then extracts service metadata from it;\nPrometheus uses this information to construct target service URLs and adds them to the service discovery target list;\nWhen services are deregistered or become unavailable, Prometheus will automatically remove them from the target list.\nContainerized Consul Cluster\rFor testing and validation purposes only. Not suitable for production use! 
Production environments must undergo comprehensive cluster-based deployment validation with service process guardianship and monitoring.\nCreate a single-node Consul cluster:\n1 # docker run -id -expose=[8300,8301,8302,8500,8600] --restart always -p 18300:8300 -p 18301:8301 -p 18302:8302 -p 18500:8500 -p 18600:8600 --name server1 -e \u0026#39;CONSUL_LOCAL_CONFIG={\u0026#34;skip_leave_on_interrupt\u0026#34;: true}\u0026#39; consul agent -server -bootstrap-expect=1 -node=server1 -bind=0.0.0.0 -client=0.0.0.0 -ui -datacenter dc1 Parameter Explanations:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 -expose: Exposes Consul\u0026#39;s required ports: 8300, 8301, 8302, 8500, 8600 --restart: always ensures automatic container restart upon failure -p: Establishes host-container port mappings --name: Container name -e: Environment variable for Consul configuration consul: Refers to the Consul image name, not the command agent: Command executed in the container. Parameter details: -server: Designates the node as a server type -bootstrap-expect: Specifies the number of server nodes required to trigger leader election (1 for single-node) -node: Node name -bind: Internal cluster communication address (default 0.0.0.0) -client: Client interface address (default 127.0.0.1) -ui: Enables Consul web UI -datacenter: Data center name Validation test:\nAccess the web UI at, for example: http://192.10.192.109:18500/\n1 # curl localhost:18500 Register Host to Consul\rExample: Register node-exporter on a virtual machine to consul.\n1 2 3 4 5 ## Format $ curl -X PUT -d \u0026#39;{\u0026#34;id\u0026#34;: \u0026#34;\u0026#39;${host_name}\u0026#39;\u0026#34;,\u0026#34;name\u0026#34;: \u0026#34;node-exporter\u0026#34;,\u0026#34;address\u0026#34;: \u0026#34;\u0026#39;${host_addr}\u0026#39;\u0026#34;,\u0026#34;port\u0026#34;:9100,\u0026#34;tags\u0026#34;: [\u0026#34;dam\u0026#34;],\u0026#34;checks\u0026#34;: [{\u0026#34;http\u0026#34;: 
\u0026#34;http://\u0026#39;${host_addr}\u0026#39;:9100/\u0026#34;,\u0026#34;interval\u0026#34;: \u0026#34;5s\u0026#34;}]}\u0026#39; http://192.10.192.109:18500/v1/agent/service/register ## Example $ curl -X PUT -d \u0026#39;{\u0026#34;id\u0026#34;: \u0026#34;sh-middler2\u0026#34;,\u0026#34;name\u0026#34;: \u0026#34;node-exporter\u0026#34;,\u0026#34;address\u0026#34;: \u0026#34;192.10.192.134\u0026#34;,\u0026#34;port\u0026#34;:9100,\u0026#34;tags\u0026#34;: [\u0026#34;middleware\u0026#34;],\u0026#34;checks\u0026#34;: [{\u0026#34;http\u0026#34;: \u0026#34;http://192.10.192.134:9100/metrics\u0026#34;,\u0026#34;interval\u0026#34;: \u0026#34;3s\u0026#34;}]}\u0026#39; http://192.10.192.109:18500/v1/agent/service/register Parameter Explanation\n1 2 3 4 5 6 7 8 9 id : Registration ID (must be unique in consul) name : Service name address: Binding IP for auto-registration port: Binding port for auto-registration tags: Registration tags (multiple allowed) checks : Health checks http: Data source for check interval: Check interval http://192.10.192.109:18500/v1/agent/service/register : Consul registration API endpoint Delete:\n1 2 3 4 5 ## Format $ curl -X PUT http://192.10.192.109:18500/v1/agent/service/deregister/${id} ## Example $ curl -X PUT http://192.10.192.109:18500/v1/agent/service/deregister/sh-middler2 Configuring Prometheus with Consul for Automatic Service Discovery\rModify Prometheus\u0026rsquo; ConfigMap configuration file: prometheus-config.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - job_name: consul honor_labels: true metrics_path: /metrics scheme: http consul_sd_configs: # Configuration based on Consul service discovery - server: 192.10.192.109:18500 # Consul listening address services: [] # Match all services in Consul relabel_configs: # Relabel configuration settings - source_labels: [\u0026#39;__meta_consul_tags\u0026#39;] # Assign __meta_consul_tags value to product target_label: \u0026#39;servername\u0026#39; - source_labels: 
[\u0026#39;__meta_consul_dc\u0026#39;] # Assign __meta_consul_dc value to idc target_label: \u0026#39;idc\u0026#39; - source_labels: [\u0026#39;__meta_consul_service\u0026#39;] regex: \u0026#34;consul\u0026#34; # Match services named \u0026#34;consul\u0026#34; action: drop # Drop/remove matching entries Reload Prometheus as shown below, then open the Prometheus Targets page to see the registered node-exporter targets under the consul job: 1 curl -XPOST http://prometheus.kubernets.cn/-/reload Summary\rDynamic Service Discovery and Monitoring: By integrating with Consul, Prometheus dynamically maintains its target list, ensuring prompt discovery and monitoring as new services are deployed. Scalability: Automated service discovery simplifies infrastructure scaling while preserving reliable monitoring availability and performance. Seamless Integration: As the service registry, Consul enables Prometheus to integrate seamlessly with other tools in the Consul ecosystem, delivering a comprehensive solution for service infrastructure monitoring and management. Self-Healing Capability: Automatic service discovery allows Prometheus to detect infrastructure changes in real time, continuously updating its target list to ensure uninterrupted monitoring data and high performance. 
","date":"2023-09-26T00:00:00Z","permalink":"/en/p/consul_based_prometheus_auto_discovery/","title":"Consul-based Auto Discovery Mechanism for Prometheus Monitoring"},{"content":"Deploy acme.sh with Docker\rdocker-compose.yml, replace the actual values of DNSPod API ID and Key.\n1 2 3 4 5 6 7 8 9 10 11 12 services: acme-sh: image: neilpang/acme.sh:latest container_name: acme restart: always command: daemon environment: - TZ=Asia/Shanghai - DP_Id=****** # DNSPod API ID - DP_Key=**************** # DNSPod API Key volumes: - ./certs/:/acme.sh/ Configure acme.sh\r1 2 3 docker compose up -d docker compose exec -it acme sh acme.sh --set-default-ca --server letsencrypt Apply for a certificate\rGenerate a certificate, replace www.example.com with the actual domain name. The certificate will be mounted to the ./certs/ directory.\n1 acme.sh --issue --dns dns_dp -d www.example.com ","date":"2023-09-18T00:00:00Z","image":"/p/acme_dnspod_ssl_certificate_guide/acme_dnspod_ssl_certificate_guide.png","permalink":"/en/p/acme_dnspod_ssl_certificate_guide/","title":"Guide to Automatically Configure SSL Certificates with ACME on DNSPod"},{"content":"What is frp?\rfrp is a high-performance reverse proxy application specialized in intranet penetration. It supports multiple protocols including TCP, UDP, HTTP, HTTPS, and features P2P communication capabilities. Using frp, you can securely and conveniently expose internal network services to the public network through relay nodes with public IP addresses.\nWhy choose frp?\rBy deploying an frp server on a node with a public IP, you can easily penetrate internal services to the public network while enjoying these professional features:\nMulti-protocol Support: Client-server communication supports TCP, QUIC, KCP, Websocket, and other protocols. TCP Connection Stream Multiplexing: Carry multiple requests over a single connection to reduce connection setup time and lower request latency. Load Balancing between proxy groups. 
Port Multiplexing: Multiple services can be exposed through the same server port. P2P Communication: Traffic bypasses server relay, maximizing bandwidth utilization. Client Plugins: Provides natively supported client plugins for static file viewing, HTTPS/HTTP protocol conversion, HTTP/SOCKS5 proxies, and more. Server Plugin System: Highly extensible plugin system for custom functional expansion. User-Friendly UI: Server and client interfaces for simplified configuration and monitoring. frps\rFRPS is the server part of frp, which needs a public IP and opens port 12345.\ndocker-compose.yaml 1 2 3 4 5 6 7 8 9 10 11 services: frps: image: fatedier/frps:v0.61.0 container_name: frps restart: always working_dir: /etc/frps/ command: [\u0026#34;-c\u0026#34;, \u0026#34;./config.toml\u0026#34;] volumes: - /etc/localtime:/etc/localtime:ro - ./:/etc/frps/ network_mode: host config.toml 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 bindAddr = \u0026#34;0.0.0.0\u0026#34; bindPort = 12345 vhostHTTPPort = 12380 vhostHTTPSPort = 12343 webServer.addr = \u0026#34;0.0.0.0\u0026#34; webServer.port = 12375 webServer.user = \u0026#34;admin\u0026#34; webServer.password = \u0026#34;******\u0026#34; enablePrometheus = true log.to = \u0026#34;./logs/frps.log\u0026#34; log.level = \u0026#34;debug\u0026#34; log.maxDays = 7 log.disablePrintColor = false auth.method = \u0026#34;token\u0026#34; auth.token = \u0026#34;******\u0026#34; custom404Page = \u0026#34;./404.html\u0026#34; 404.html 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 \u0026lt;!DOCTYPE html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;UTF-8\u0026#34;\u0026gt; \u0026lt;meta name=\u0026#34;viewport\u0026#34; content=\u0026#34;width=device-width, initial-scale=1.0\u0026#34;\u0026gt; \u0026lt;title\u0026gt;Not Found\u0026lt;/title\u0026gt; \u0026lt;style\u0026gt; body { font-family: Arial, sans-serif; text-align: center; 
margin-top: 100px; } h1 { font-size: 50px; margin-bottom: 10px; } p { font-size: 20px; margin-top: 0; } \u0026lt;/style\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;h1\u0026gt;404\u0026lt;/h1\u0026gt; \u0026lt;p\u0026gt;Page not found\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; frpc\rdocker deployment\rdocker-compose.yaml 1 2 3 4 5 6 7 8 9 10 11 services: net-tunnel: image: \u0026#34;harbor.example.com/devops/kube-net-tunnel:standalone\u0026#34; container_name: \u0026#34;net-tunnel\u0026#34; restart: \u0026#34;always\u0026#34; network_mode: \u0026#34;bridge\u0026#34; working_dir: /etc/net-tunnel/ volumes: - \u0026#34;/etc/localtime:/etc/localtime:ro\u0026#34; - \u0026#34;./:/etc/net-tunnel/\u0026#34; command: [\u0026#34;-c\u0026#34;, \u0026#34;/etc/net-tunnel/config.toml\u0026#34;] config.toml 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 user = \u0026#34;test\u0026#34; serverAddr = \u0026#34;*.*.*.*\u0026#34; serverPort = 12345 auth.method = \u0026#34;token\u0026#34; auth.token = \u0026#34;******\u0026#34; log.to = \u0026#34;./logs/frpc.log\u0026#34; log.level = \u0026#34;debug\u0026#34; log.maxDays = 3 log.disablePrintColor = false webServer.addr = \u0026#34;127.0.0.1\u0026#34; webServer.port = 12346 webServer.user = \u0026#34;admin\u0026#34; webServer.password = \u0026#34;******\u0026#34; [[proxies]] name = \u0026#34;admin_ui\u0026#34; type = \u0026#34;tcp\u0026#34; localIP = \u0026#34;127.0.0.1\u0026#34; localPort = 12346 remotePort = 12440 [[proxies]] name = \u0026#34;ssh\u0026#34; type = \u0026#34;tcp\u0026#34; localIP = \u0026#34;172.17.0.1\u0026#34; # Windows or MacOS: \u0026#34;host.docker.internal\u0026#34; localPort = 22 remotePort = 12441 K8s deployment\rkube-net-tunnel.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 
71 72 apiVersion: v1 kind: ConfigMap metadata: name: kube-net-tunnel-config namespace: kube-system data: config.toml: | user=\u0026#34;test\u0026#34; serverAddr = \u0026#34;*.*.*.*\u0026#34; serverPort = 12345 auth.method = \u0026#34;token\u0026#34; auth.token = \u0026#34;******\u0026#34; log.to = \u0026#34;console\u0026#34; log.level = \u0026#34;debug\u0026#34; log.maxDays = 3 log.disablePrintColor = false webServer.addr = \u0026#34;127.0.0.1\u0026#34; webServer.port = 7400 webServer.user = \u0026#34;admin\u0026#34; webServer.password = \u0026#34;******\u0026#34; [[proxies]] name = \u0026#34;admin\u0026#34; type = \u0026#34;tcp\u0026#34; localIP = \u0026#34;127.0.0.1\u0026#34; localPort = 7400 remotePort = 12431 [[proxies]] name = \u0026#34;ssh\u0026#34; type = \u0026#34;tcp\u0026#34; localIP = \u0026#34;10.19.31.7\u0026#34; localPort = 22 remotePort = 12432 --- apiVersion: apps/v1 kind: Deployment metadata: name: kube-net-tunnel namespace: kube-system spec: replicas: 1 selector: matchLabels: app: kube-net-tunnel template: metadata: labels: app: kube-net-tunnel spec: containers: - name: kube-net-tunnel image: harbor.example.com/devops/kube-net-tunnel:standalone imagePullPolicy: IfNotPresent args: [\u0026#34;-c\u0026#34;, \u0026#34;/etc/kube-net-tunnel/config.toml\u0026#34;] volumeMounts: - name: config-volume mountPath: /etc/kube-net-tunnel/config.toml subPath: config.toml volumes: - name: config-volume configMap: defaultMode: 0755 name: kube-net-tunnel-config items: - key: config.toml path: config.toml ","date":"2023-07-09T00:00:00Z","permalink":"/en/p/frp_service_deployment_configuration/","title":"Complete Guide to FRP Reverse Proxy Deployment"},{"content":"Reference\rhttps://www.cnblogs.com/lincappu/p/14926757.html\nFastDFS\rThe author of FastDFS, Yu Qing, described it on GitHub as follows: \u0026ldquo;FastDFS is an open source high performance distributed file system. 
It\u0026rsquo;s major functions include: file storing, file syncing and file accessing (file uploading and file downloading), and it can resolve the high capacity and load balancing problem. FastDFS should meet the requirement of the website whose service based on files such as photo sharing site and video sharing site.\u0026rdquo; This means FastDFS is an open-source high-performance distributed file system. Its core functionalities include file storage, file synchronization, and file access (uploading and downloading), addressing challenges of large-scale storage and load balancing. It is designed to meet the needs of file-service-oriented websites like photo or video sharing platforms.\nFastDFS has two roles: Tracker and Storage. The Tracker is responsible for scheduling file access requests and load balancing. Storage manages file operations, including storage, synchronization, and providing file access interfaces. It also handles metadata, represented as key-value attribute pairs associated with files. Both Tracker and Storage nodes can be composed of one or multiple servers. Servers can be added or removed without disrupting services, though at least one server in each cluster must remain operational. Notably, all servers in a Tracker cluster operate in a peer-to-peer (P2P) manner, allowing dynamic scaling based on server workload.\nAdditionally, the official documentation elaborates on the storage architecture. To support massive capacity, the Storage nodes adopt a volume (or group) organizing approach. The storage system consists of one or more independent volumes, where the total system capacity equals the sum of all volumes. Each volume can be hosted by one or multiple Storage servers. All servers within the same volume store identical files, serving purposes of redundant backup and load balancing. Adding servers to a volume\u0026hellip;\nWhen adding a new server, the system automatically performs file synchronization. 
Once synchronization is complete, the system automatically switches the new server online to provide services. When storage space is insufficient or near depletion, volumes can be dynamically expanded. Simply add one or more servers and configure them as new volumes to increase the storage system\u0026rsquo;s capacity. We will not delve too deeply into the concepts of volumes or groups here, as there will be detailed explanations in the subsequent installation and deployment.\nIn FastDFS, the file identifier consists of two parts: the volume name and the file name.\nEnvironment Specifications\rOperating System: CentOS Linux release 7.2.1511 System Disk: 274GB Mounted Disks: 3.7TB * 12 CPU: 32 cores (Intel® Xeon®) Memory: 8GB Architecture Design\rWorkflow:\nThe client sends a request to the Tracker. The Tracker retrieves metadata from Storage nodes and returns it to the client. The client uses the metadata to directly request files from Storage nodes. Key Design Principles\rThe core system contains two roles: Tracker Server and Storage Server. All Tracker servers are peer-to-peer (P2P) with no Master-Slave relationships. Storage servers are organized into groups; files are fully replicated within the same group. Storage servers across different groups do not communicate. Synchronization occurs only within the same group. Storage servers proactively report status to Trackers. Each Tracker maintains complete Storage server status records. When the Trunk feature is enabled, Trackers coordinate with Storages to elect a Trunk-Server. 
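The volume-plus-filename identifier described above is exactly what the client tools return and consume. A minimal sketch, assuming the stock fdfs_* client tools and the client.conf prepared later in this guide (the placeholder file ID is illustrative):

```shell
# Upload returns the file ID in the form "<group name>/<remote file name>"
# (guarded so the snippet is harmless on machines without FastDFS installed)
if command -v fdfs_upload_file >/dev/null 2>&1; then
  file_id=$(fdfs_upload_file /etc/fdfs/client.conf ./test.txt)
else
  file_id="group1/M00/00/00/example.txt"  # illustrative placeholder
fi

# The first path component is the volume (group) chosen by the Tracker;
# the remainder is the file name generated by the Storage server
group=${file_id%%/*}
remote_name=${file_id#*/}
echo "group=$group remote=$remote_name"

# Download and delete take the same identifier:
# fdfs_download_file /etc/fdfs/client.conf "$file_id" ./copy.txt
# fdfs_delete_file /etc/fdfs/client.conf "$file_id"
```

Because the group name is embedded in the identifier, any Tracker can route a read request to a Storage server of the right volume without a metadata lookup.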
Cluster Deployment
Table 1 Software List and Versions

| Name | Description | Version |
| --- | --- | --- |
| CentOS | Installation OS | 7.x |
| libfastcommon | Utility function package for FastDFS | libfastcommon V1.0.39 |
| FastDFS | FastDFS main program | FastDFS V5.11 |
| fastdfs-nginx-module | FastDFS-Nginx integration module (resolves intra-group sync delay) | fastdfs-nginx-module V1.22 |
| nginx | nginx 1.12.2 (the latest version via YUM for CentOS 7 is nginx 1.17.4) | nginx 1.12.2 |

Table 2 Server IPs, Service Allocation, and Port Planning

| Name | IP Address | Application Service | Port |
| --- | --- | --- | --- |
| Machine A | 10.58.10.136 | tracker | 22122 |
| | 10.58.10.136 | storage-group1 | 23000 |
| | 10.58.10.136 | storage-group2 | 33000 |
| | 10.58.10.136 | libfastcommon | - |
| | 10.58.10.136 | nginx | 8888 |
| | 10.58.10.136 | fastdfs-nginx-module | - |
| Machine B | 10.58.10.137 | tracker | 22122 |
| | 10.58.10.137 | storage-group1 | 23000 |
| | 10.58.10.137 | storage-group3 | 43000 |
| | 10.58.10.137 | libfastcommon | - |
| | 10.58.10.137 | nginx | 8888 |
| | 10.58.10.137 | fastdfs-nginx-module | - |
| Machine C | 10.58.10.138 | tracker | 22122 |
| | 10.58.10.138 | storage-group2 | 33000 |
| | 10.58.10.138 | storage-group3 | 43000 |
| | 10.58.10.138 | libfastcommon | - |
| | 10.58.10.138 | nginx | 8888 |
| | 10.58.10.138 | fastdfs-nginx-module | - |

Prerequisites before installation:

Grant read and write permissions on the storage directories (logs, data, PID files, etc.) that will be used.

All configuration files in the following sections contain comments marked with "#" for explanations.
Be sure to remove these comment lines that start with "#" when implementing the configurations.

Initialize Environment

# Install build environment
yum groups install -y "Development Tools"
yum install -y perl redhat-rpm-config.noarch gd-devel perl-devel perl-ExtUtils-Embed pcre-devel openssl openssl-devel gcc-c++ autoconf automake zlib-devel libxml2 libxml2-devel libxslt-devel GeoIP GeoIP-devel GeoIP-data gperftools

Install libfastcommon

Execute the following operations on machines A, B, and C respectively:

tar -zxvf libfastcommon-1.0.39.tar.gz
cd libfastcommon-1.0.39/
./make.sh
./make.sh install

libfastcommon is installed to /usr/lib64/libfastcommon.so. Note the difference between new and old versions:

New versions automatically create a symlink for libfastcommon.so in the /usr/local/lib directory. Old versions require manual symlink creation:

ln -s /usr/lib64/libfastcommon.so /usr/local/lib/libfastcommon.so
ln -s /usr/lib64/libfastcommon.so /usr/lib/libfastcommon.so

If libfdfsclient.so exists, also add it to /usr/local/lib:

ln -s /usr/lib64/libfdfsclient.so /usr/local/lib/libfdfsclient.so
ln -s /usr/lib64/libfdfsclient.so /usr/lib/libfdfsclient.so

!!!
Note It\u0026rsquo;s recommended to manually verify successful symlink creation using:\n1 ls | grep libfastcommon Check in both `/usr/lib/` and `/usr/local/lib` directories.\rInstall Tracker\rPerform the following operations on machines A, B, and C respectively\n1 2 3 4 5 6 7 mkdir -p /data/fastdfs/tracker tar -zxvf fastdfs-5.11.tar.gz cd fastdfs-5.11/ ./make.sh ./make.sh install # Prepare configuration files cp /etc/fdfs/tracker.conf.sample /etc/fdfs/tracker.conf # For tracker node Modify tracker configuration file:\n1 2 3 4 5 vim /etc/fdfs/tracker.conf # Required modifications: max_connections=1024 # default 256, maximum number of connections port=22122 # tracker server port (default 22122, usually unchanged) base_path=/data/fastdfs/tracker # root directory for storing logs and data Add tracker.service to enable service management with systemctl (start/restart/stop) and auto-start on boot:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Edit service file vim /usr/lib/systemd/system/fastdfs-tracker.service [Unit] Description=The FastDFS File server After=network.target remote-fs.target nss-lookup.target [Service] Type=forking ExecStart=/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf start ExecStop=/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf stop ExecRestart=/usr/bin/fdfs_trackerd /etc/fdfs/tracker.conf restart [Install] WantedBy=multi-user.target After saving the /usr/lib/systemd/system/fastdfs-tracker.service file and exiting vim, execute the following commands to start the FastDFS Tracker service:\n1 2 3 $ systemctl daemon-reload $ systemctl enable fastdfs-tracker.service $ systemctl start fastdfs-tracker.service After the tracker service starts, use the following command to verify if the port is properly opened:\n1 netstat -tulnp | grep 22122 # Check if the service is running and the port is open Install Storage\rSkip the extraction step if already performed during tracker installation.\n1 2 3 4 $ tar -zxvf fastdfs-5.11.tar.gz $ cd fastdfs-5.11/ $ ./make.sh $ ./make.sh 
install Machine A (group1/group2)
Copy the storage configuration files under the fastdfs-5.11 directory (make two copies):

sudo mkdir -p /data/fastdfs/storage/group1
sudo mkdir -p /data/fastdfs/storage/group2
sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group1.conf # storage node group1
sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group2.conf # storage node group2
sudo cp /etc/fdfs/client.conf.sample /etc/fdfs/client.conf # client config for testing

According to the architecture design, modify the three files in turn. The group1 file (/etc/fdfs/storage-group1.conf) follows the same pattern as group2 below, with group_name=group1, port=23000, base_path=/data/fastdfs/storage/group1, and store_path0 through store_path5 set to /data01/fastdfs through /data06/fastdfs (matching the [group1] section of mod_fastdfs.conf later in this guide).

Modify the group2 configuration file: /etc/fdfs/storage-group2.conf

$ sudo vim /etc/fdfs/storage-group2.conf
# The following content needs to be modified:
group_name=group2
port=33000 # Storage service port (default 23000, modified to 33000)
base_path=/data/fastdfs/storage/group2 # Root directory for data and log file storage
store_path_count=6
store_path0=/data07/fastdfs # First storage directory for group2
store_path1=/data08/fastdfs # Second storage directory for group2
store_path2=/data09/fastdfs # Third storage directory for group2
store_path3=/data10/fastdfs # Fourth storage directory for group2
store_path4=/data11/fastdfs # Fifth storage directory for group2
store_path5=/data12/fastdfs # Sixth storage directory for group2
tracker_server=10.58.10.136:22122 # Tracker server IP and port
tracker_server=10.58.10.137:22122 # Tracker server IP and port
tracker_server=10.58.10.138:22122 # Tracker server IP and port
http.server_port=8888 # HTTP access port for files (default 8888, modify as needed to match the Nginx configuration)

Modify the client configuration file: /etc/fdfs/client.conf

$ sudo vim /etc/fdfs/client.conf
# The following content needs to be modified:
base_path=/data/fastdfs/client
tracker_server=10.58.10.136:22122 # Tracker server IP and port
tracker_server=10.58.10.137:22122 # Tracker server IP and port
tracker_server=10.58.10.138:22122 # Tracker server IP and port

After
setting up the Storage service, we start the two services for Storage: 1 2 3 $ systemctl daemon-reload $ systemctl enable fastdfs-storage-group1.service $ systemctl start fastdfs-storage-group1.service Possible issues during startup:\nThe service might fail to start due to permission issues or configuration errors. Check service status using:\nsystemctl status fastdfs-storage-group1.service Analyze logs (located at /data/fastdfs/storage/group1/logs/) to troubleshoot. Refer to the \u0026ldquo;Troubleshooting\u0026rdquo; section for common pitfalls. Verify service activation:\n1 netstat -tulnp # Check if services are running and ports are open (23000, 33000) Check FastDFS cluster status after successful startup: 1 2 # View cluster status $ fdfs_monitor /etc/fdfs/storage-group1.conf list ??? note \u0026ldquo;Info\u0026rdquo;\n[Cluster status details will be shown here]\nThe console printed the following information, indicating success: [2018-11-06 00:00:00] DEBUG - base_path=/data/fastdfs/storage/group1, connect_timeout=30, network_timeout=60, tracker_server_count=2, anti_steal_token=0, anti_steal_secret_key length=0, use_connection_pool=0, g_connection_pool_max_idle_time=3600s, use_storage_id=0, storage server id count: 0 server_count=3, server_index=0 tracker server is 10.58.10.136:22122,10.58.10.137:22122,10.58.10.138:22122 group count: 2 Group 1: \u0026hellip;\nTest file upload via client 1 $ fdfs_upload_file /etc/fdfs/client.conf test.txt Machine B (group1/group3)\rThe configuration process is similar to Machine A, with the following modifications:\nCreate directories and copy configuration files 1 2 3 4 5 $ sudo mkdir -p /data/fastdfs/storage/group1 $ sudo mkdir -p /data/fastdfs/storage/group3 $ sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group1.conf # Storage node configuration for group1 $ sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group3.conf # Storage node configuration for group3 $ sudo cp /etc/fdfs/client.conf.sample 
/etc/fdfs/client.conf # Client configuration for testing Modify the configuration file for group3 (/etc/fdfs/storage-group3.conf). The configuration for group1 remains consistent with Machine A. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 $ sudo vim /etc/fdfs/storage-group3.conf # Required modifications are as follows: group_name=group3 port=43000 # Storage service port (default: 23000, modified to 43000) base_path=/data/fastdfs/storage/group3 # Root directory for data and log files store_path_count=6 store_path0=/data07/fastdfs # First storage directory for group3 store_path1=/data08/fastdfs # Second storage directory for group3 store_path2=/data09/fastdfs # Third storage directory for group3 store_path3=/data10/fastdfs # Fourth storage directory for group3 store_path4=/data11/fastdfs # Fifth storage directory for group3 store_path5=/data12/fastdfs # Sixth storage directory for group3 tracker_server=10.58.10.136:22122 # Tracker server IP and port tracker_server=10.58.10.137:22122 # Tracker server IP and port tracker_server=10.58.10.138:22122 # Tracker server IP and port http.server_port=8888 # HTTP file access port (default: 8888, keep consistent with Nginx) The client configuration remains consistent with Machine A, no duplication needed here.\nCreate a service for starting group3. 
The configuration for fastdfs-storage-group1.service is identical to Machine A - simply copy it.\nExecute the startup script until both fastdfs-storage-group1.service and fastdfs-storage-group3.service are running.\nConfiguration for Machine C (Group2/Group3)\rThe configuration process is similar to Machine A, with the following modifications:\nCreate directories and copy configuration files: 1 2 3 4 5 $ sudo mkdir -p /data/fastdfs/storage/group2 $ sudo mkdir -p /data/fastdfs/storage/group3 $ sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group2.conf # Storage node for group2 $ sudo cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage-group3.conf # Storage node for group3 $ sudo cp /etc/fdfs/client.conf.sample /etc/fdfs/client.conf # Client configuration file (for testing) Modify the configuration file for Group2 (/etc/fdfs/storage-group2.conf). The configuration content for Group3 should remain consistent with Machine B\u0026rsquo;s settings. Configuration file for FastDFS Storage group2\r$ sudo vim /etc/fdfs/storage-group2.conf\nModified configurations:\rgroup_name=group2 port=33000 # Storage service port (default 23000, modified to 33000) base_path=/data/fastdfs/storage/group2 # Root directory for data and log storage store_path_count=6 store_path0=/data01/fastdfs # First storage directory for group2 store_path1=/data02/fastdfs # Second storage directory for group2 store_path2=/data03/fastdfs # Third storage directory for group2 store_path3=/data04/fastdfs # Fourth storage directory for group2 store_path4=/data05/fastdfs # Fifth storage directory for group2 store_path5=/data06/fastdfs # Sixth storage directory for group2 tracker_server=10.58.10.136:22122 # Tracker server IP and port tracker_server=10.58.10.137:22122 # Tracker server IP and port tracker_server=10.58.10.138:22122 # Tracker server IP and port http.server_port=8888 # HTTP file access port (default 8888, should match nginx configuration)\nThe client configuration file is consistent with 
Machine A, no repetition needed here. Create startup services for group2 and group3 (existing templates can be copied directly). Execute the startup scripts until both fastdfs-storage-group2.service and fastdfs-storage-group3.service are running.

Installing Nginx and the FastDFS Nginx Module
Step 1: In the FastDFS directory, copy the http.conf and mime.types files to the /etc/fdfs directory to enable Nginx access to the Storage service.

# Execute on all three machines
$ cp ./conf/http.conf /etc/fdfs/ # For Nginx access
$ cp ./conf/mime.types /etc/fdfs/ # For Nginx access

Step 2: Install the FastDFS Nginx module:

# Execute on all three machines
$ tar -zxvf V1.20.tar.gz
$ cp fastdfs-nginx-module-1.20/src/mod_fastdfs.conf /etc/fdfs/mod_fastdfs.conf

Step 3: Modify the configuration file fastdfs-nginx-module-1.20/src/config. Locate the ngx_module_incs and CORE_INCS entries and modify them as follows:

ngx_module_incs="/usr/include/fastdfs /usr/include/fastcommon/"
CORE_INCS="$CORE_INCS /usr/include/fastdfs /usr/include/fastcommon/"

If they are not modified, the following error occurs during Nginx compilation:

/usr/include/fastdfs/fdfs_define.h:15:27: fatal error: common_define.h: No such file or directory

Step 4: Extract and install the Nginx service:

$ tar -zxvf nginx-1.12.2.tar.gz
$ cd nginx-1.12.2
$ ./configure --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_auth_request_module --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E' --add-module=${YOUR_PATH}/fastdfs-nginx-module-1.20/src
$ make
$ make install

Note: Replace ${YOUR_PATH} in the above with the parent directory of fastdfs-nginx-module-1.20 to ensure the path is correct.

Modify the configuration file /etc/fdfs/mod_fastdfs.conf on Machine A:

connect_timeout=2
network_timeout=30
base_path=/data/fastdfs/ngx_mod
load_fdfs_parameters_from_tracker=true
storage_sync_file_max_delay = 86400
use_storage_id
= false
storage_ids_filename = storage_ids.conf
tracker_server=10.58.10.136:22122 # Tracker server IP and port
tracker_server=10.58.10.137:22122 # Tracker server IP and port
tracker_server=10.58.10.138:22122 # Tracker server IP and port
group_name=group1/group2 # Global setting
url_have_group_name = true
log_level=info
log_filename=
response_mode=proxy
if_alias_prefix=
flv_support = true
flv_extension = flv
group_count = 2
[group1]
group_name=group1 # Group-specific
storage_server_port=23000
store_path_count=6
store_path0=/data01/fastdfs
store_path1=/data02/fastdfs
store_path2=/data03/fastdfs
store_path3=/data04/fastdfs
store_path4=/data05/fastdfs
store_path5=/data06/fastdfs
[group2]
group_name=group2
storage_server_port=33000
store_path_count=6
store_path0=/data07/fastdfs
store_path1=/data08/fastdfs
store_path2=/data09/fastdfs
store_path3=/data10/fastdfs
store_path4=/data11/fastdfs
store_path5=/data12/fastdfs

The nginx configuration file (/etc/nginx/nginx.conf):

user nginx;
worker_processes auto; # "on" is not a valid value; use a number or auto
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 65535;
use epoll;
accept_mutex off;
}
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
gzip on;
server_names_hash_bucket_size 128;
client_header_buffer_size 32k;
large_client_header_buffers 4 32k;
client_max_body_size 300m;
proxy_redirect off;
proxy_http_version 1.1;
proxy_set_header Connection '';
proxy_set_header REMOTE-HOST $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Real-IP
$remote_addr;\rproxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\rproxy_connect_timeout 90;\rproxy_send_timeout 90;\rproxy_read_timeout 90;\rproxy_buffer_size 16k;\rproxy_buffers 8 64k;\rproxy_busy_buffers_size 128k;\rproxy_temp_file_write_size 128k;\rproxy_cache_path /data/fastdfs/cache/nginx/proxy_cache levels=1:2\rkeys_zone=http-cache:200m max_size=1g inactive=30d;\rproxy_temp_path /data/fastdfs/cache/nginx/proxy_cache/temp;\rinclude /etc/nginx/mime.types;\rdefault_type application/octet-stream;\rinclude /etc/nginx/conf.d/*.conf;\rserver {\rlisten 80 default_server;\rlisten [::]:80 default_server;\rserver_name _;\rroot /usr/share/nginx/html;\r# Load default server block configuration\r#include /etc/nginx/default.d/*.conf;\rlocation ~ ^/ok(\\..*)?$ {\rreturn 200 \u0026quot;OK\u0026quot;;\r}\rlocation /nginx {\rstub_status on;\r}\rlocation /healthcheck {\rcheck_status on;\r}\rlocation ^~ /group1/ {\rproxy_next_upstream http_502 http_504 error timeout invalid_header;\rproxy_cache http-cache;\rproxy_cache_valid 200 304 12h;\rproxy_cache_key $uri$is_args$args;\radd_header 'Access-Control-Allow-Origin' $http_origin;\radd_header 'Access-Control-Allow-Credentials' 'true';\radd_header \u0026quot;Access-Control-Allow-Methods\u0026quot; \u0026quot;GET, POST, HEAD, PUT, DELETE, OPTIONS, PATCH\u0026quot;;\radd_header \u0026quot;Access-Control-Allow-Headers\u0026quot; \u0026quot;Origin, No-Cache, Authorization, X-Requested-With, If-Modified-Since, Pragma, Last-Modified, Cache-Control, Expires, Content-Type\u0026quot;;\rif ($request_method = 'OPTIONS') {\rreturn 200 'OK';\r}\rproxy_pass http://fdfs_group1;\rexpires 30d;\r}\rlocation ^~ /group2/ {\rproxy_next_upstream http_502 http_504 error timeout invalid_header;\rproxy_cache http-cache;\rproxy_cache_valid 200 304 12h;\rproxy_cache_key $uri$is_args$args;\radd_header 'Access-Control-Allow-Origin' $http_origin;\radd_header 'Access-Control-Allow-Credentials' 'true';\radd_header 
\u0026quot;Access-Control-Allow-Methods\u0026quot; \u0026quot;GET, POST, HEAD, PUT, DELETE, OPTIONS, PATCH\u0026quot;;\radd_header \u0026quot;Access-Control-Allow-Headers\u0026quot; \u0026quot;Origin, No-Cache, Authorization, X-Requested-With, If-Modified-Since, Pragma, Last-Modified, Cache-Control, Expires, Content-Type\u0026quot;;\rif ($request_method = 'OPTIONS') {\rreturn 200 'OK';\r}\rproxy_pass http://fdfs_group2;\rexpires 30d;\r}\rlocation ^~ /group3/ {\rproxy_next_upstream http_502 http_504 error timeout invalid_header;\rproxy_cache http-cache;\rproxy_cache_valid 200 304 12h;\rproxy_cache_key $uri$is_args$args;\radd_header 'Access-Control-Allow-Origin' $http_origin;\radd_header 'Access-Control-Allow-Credentials' 'true';\radd_header \u0026quot;Access-Control-Allow-Methods\u0026quot; \u0026quot;GET, POST, HEAD, PUT, DELETE, OPTIONS, PATCH\u0026quot;;\radd_header \u0026quot;Access-Control-Allow-Headers\u0026quot; \u0026quot;Origin, No-Cache, Authorization, X-Requested-With, If-Modified-Since, Pragma, Last-Modified, Cache-Control, Expires, Content-Type\u0026quot;;\rif ($request_method = 'OPTIONS') {\rreturn 200 'OK';\r}\rproxy_pass http://fdfs_group3;\rexpires 30d;\r}\rlocation ~/purge(/.*) {\rallow 127.0.0.1;\rallow 192.168.1.0/24;\rallow 10.58.1.0/24;\rdeny all;\rproxy_cache_purge http-cache $1$is_args$args;\r}\r}\rserver {\rlisten 8888;\rserver_name localhost;\rlocation /ok.htm {\rreturn 200 \u0026quot;OK\u0026quot;;\r}\rlocation ~/group[0-9]/ {\rngx_fastdfs_module;\r}\rerror_page 500 502 503 504 /50x.html;\rlocation = /50x.html {\rroot html;\r}\r}\rupstream fdfs_group1 {\rserver 10.58.10.136:8888 max_fails=0;\rserver 10.58.10.137:8888 max_fails=0;\rkeepalive 10240;\rcheck interval=2000 rise=2 fall=3 timeout=1000 type=http default_down=false;\rcheck_http_send \u0026quot;GET /ok.htm HTTP/1.0\\r\\nConnection:keep-alive\\r\\n\\r\\n\u0026quot;;\rcheck_keepalive_requests 100;\r}\rupstream fdfs_group2 {\rserver 10.58.10.136:8888 max_fails=0;\rserver 
10.58.10.138:8888 max_fails=0;\rkeepalive 10240;\rcheck interval=2000 rise=2 fall=3 timeout=1000 type=http default_down=false;\rcheck_http_send \u0026quot;GET /ok.htm HTTP/1.0\\r\\nConnection:keep-alive\\r\\n\\r\\n\u0026quot;;\rcheck_keepalive_requests 100;\r}\rupstream fdfs_group3 {\rserver 10.58.10.137:8888 max_fails=0;\rserver 10.58.10.138:8888 max_fails=0;\rkeepalive 10240;\rcheck interval=2000 rise=2 fall=3 timeout=1000 type=http default_down=false;\rcheck_http_send \u0026quot;GET /ok.htm HTTP/1.0\\r\\nConnection:keep-alive\\r\\n\\r\\n\u0026quot;;\rcheck_keepalive_requests 100;\r}\r}\nStart the nginx service: sudo nginx -c /etc/nginx/nginx.conf\nAccess http://localhost/ok.htm to verify if it returns a 200 status code with \u0026ldquo;OK\u0026rdquo; content. If startup fails, check the nginx logs located at /var/log/nginx/error.log. This is the default error log path for nginx in CentOS. If the error_log configuration has been modified, check the log file specified in the error_log directive.\nThe configurations for servers B and C are essentially identical, with completely consistent nginx configuration files. The only required modification is in the /etc/fdfs/mod_fastdfs.conf file: adjust the group_name according to the corresponding storage node configuration. Note that the [groupX] identifiers in square brackets must follow sequential increments starting from 1. Issue Summary\rWhen encountering issues, first, we troubleshoot service startup failures through console log messages; second, investigate via service logs. In most cases, logs are the most effective troubleshooting tool, bar none.\nLogs for tracker or storage services are stored under the logs directory within the path specified by the base_path configuration in their respective service configuration files. For nginx logs, if a custom log path is configured, check the specified directory. By default on CentOS, logs are typically located under /var/log/nginx/. 
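When a startup fails, the scattered log locations above can be checked in one pass. A small helper sketch (the paths are the base_path values used in this deployment; adjust to yours):

```shell
# Print the tail of each existing log file; silently skip missing ones
show_logs() {
  for log in "$@"; do
    if [ -f "$log" ]; then
      echo "==== $log ===="
      tail -n 20 "$log"
    fi
  done
}

# Tracker/storage logs live under <base_path>/logs; nginx under /var/log/nginx
show_logs \
  /data/fastdfs/tracker/logs/trackerd.log \
  /data/fastdfs/storage/group1/logs/storaged.log \
  /data/fastdfs/storage/group2/logs/storaged.log \
  /var/log/nginx/error.log
```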
Additionally, in this environment, we must also investigate logs for the fastdfs-nginx-module extension. Its logs reside in the directory specified by the base_path configuration in /etc/fdfs/mod_fastdfs.conf. Port Configuration for Multiple Groups in a Cluster\rEach group requires a dedicated storage service. Thus, port numbers for storage groups on the same host must not conflict. For example: Group1: Port 23000 Group2: Port 33000 Group3: Port 43000 Groups with the same name cannot exist on the same machine because the tracker synchronizes storage services within the same group, which requires identical port numbers. Therefore, groups with the same name across different hosts must use the same storage port configuration. Storage Path Configuration for Multiple Groups in a Cluster\rOn the same host, storage paths for different groups must be configured separately and should not overlap. Across different hosts, storage paths for groups with the same name may differ, but the number of storage nodes and disk capacity should remain consistent. Configuring Multiple Storage Services on the Same Node\rIn a 3-node cluster, two groups were deployed on the same machine, with each group maintaining a replica on another node. When starting the services using systemctl start fastdfs-storage-groupx.service (where groupx represents group1, group2, or group3), one group repeatedly failed to start, displaying:\n1 2 3 Loaded: loaded (/usr/lib/systemd/system/fastdfs-storage-group1.service; enabled; vendor preset: disabled) ... 
Active: inactive (dead) ➔ Active: exited

Resolution: A systemctl daemon-reload is required before starting each storage service.

Example: To start both fastdfs-storage-group1.service and fastdfs-storage-group2.service on Machine A:

Start fastdfs-storage-group1.service:
systemctl daemon-reload
systemctl start fastdfs-storage-group1.service

Start fastdfs-storage-group2.service:
systemctl daemon-reload
systemctl start fastdfs-storage-group2.service

FastDFS Storage Service Configuration

On all storage machines, when modifying /etc/fdfs/mod_fastdfs.conf, adjust the group_name value according to the respective group of each storage. Pay special attention to two configuration items. Below is an example using group2 and group3:

Global group_name

The group_name entries in the global configuration must be separated by "/", and the value should match the group_name in the local sections. For example:
group_name=group2/group3

Steps to Configure and Start Services

Execute the following commands for group1:
sudo systemctl daemon-reload
sudo systemctl enable fastdfs-storage-group1.service
sudo systemctl start fastdfs-storage-group1.service

Check the status of fastdfs-storage-group1.service:
systemctl status fastdfs-storage-group1.service

If the status shows active (running), proceed to start fastdfs-storage-group2.service. If not, check the logs to troubleshoot:

sudo systemctl daemon-reload
sudo systemctl enable fastdfs-storage-group2.service
sudo systemctl start fastdfs-storage-group2.service

Monitor the status of fastdfs-storage-group2.service until it becomes active.

Notes

You may encounter a warning like Unknown lvalue 'ExecRestart' in section 'Service'. A known workaround (yum install systemd-*) might not resolve this issue. Contributions to fix this are welcome.
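The ExecRestart warning itself has a simple explanation: systemd service units have no ExecRestart directive, and systemctl restart just runs the stop logic followed by the start logic, so the line can be dropped rather than worked around. A cleaned-up unit along these lines (a sketch based on the unit files above, with fdfs_storaged assumed at the install path used in this guide) avoids the warning:

```ini
[Unit]
Description=FastDFS storage server (group1)
After=network.target fastdfs-tracker.service

[Service]
Type=forking
ExecStart=/usr/bin/fdfs_storaged /etc/fdfs/storage-group1.conf start
ExecStop=/usr/bin/fdfs_storaged /etc/fdfs/storage-group1.conf stop
# No ExecRestart: "systemctl restart" runs ExecStop then ExecStart

[Install]
WantedBy=multi-user.target
```

Per-group copies (storage-group2, storage-group3) only change the config path; a template unit such as fastdfs-storage@.service with %i in the config path could cover all groups with one file.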
FastDFS Nginx Module Configuration\nWhen modifying /etc/fdfs/mod_fastdfs.conf on storage machines:\nEnsure group_name values are correctly set for each storage group (e.g., group2 and group3). Follow the global/local group_name format as described above. \u0026lt;group_entries\u0026gt; \u0026lt;/group_entries\u0026gt;\n","date":"2023-07-04T00:00:00Z","permalink":"/en/p/fastdfs_cluster_deployment/","title":"Comprehensive Guide to FastDFS Cluster Deployment"},{"content":"Cause\rAfter investigation, it was identified that Ubuntu 21.10 and Fedora 35 ship glibc 2.34 or later. Starting with glibc 2.34, the library uses the new clone3 system call. Normally, Docker filters the system calls made inside containers through its seccomp profile and decides how to handle each one. If the profile has no specific policy for a particular system call, the default policy returns a \u0026ldquo;Permission Denied\u0026rdquo; (EPERM) error to the container. However, when glibc receives this error, it does NOT fall back to alternative methods. It only attempts fallback procedures if the response indicates \u0026ldquo;this system call is unavailable\u0026rdquo; (ENOSYS).\nSolutions\rSolution 1\rAdd the following parameter when running the container to bypass Docker\u0026rsquo;s system call restrictions:\n1 --security-opt seccomp=unconfined Important caveats:\nThis compromises container security. This parameter cannot be used during image builds. Refer to Solution 2 for alternatives. Solution 2\rUpgrade Docker to version 20.10.8 or higher.\nProduction environment considerations:\nUpgrading Docker versions in production may be challenging. When building images, avoid using Ubuntu 21.10, Fedora 35, or newer as base images, and verify whether derived images are affected. Most official images are based on Debian; confirm that the Debian-based images you use aren\u0026rsquo;t impacted by this issue. 
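A middle ground between Solution 1 and Solution 2 is to permit clone3 explicitly in a copy of the default seccomp profile, instead of disabling seccomp altogether. The snippet below is a sketch: the inline profile is a tiny stand-in (the real default.json lives in the moby source tree and is far larger), and the sed edit simply appends clone3 next to clone in the allowlist.

```shell
# Stand-in profile with the same shape as Docker's default seccomp profile.
# For real use, start from default.json in the moby source tree instead.
cat > default.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    { "names": ["clone", "execve", "read", "write"], "action": "SCMP_ACT_ALLOW" }
  ]
}
EOF

# Append clone3 to the allowlist entry that already permits clone.
sed -i 's/"clone",/"clone", "clone3",/' default.json

grep '"clone3"' default.json   # verify the syscall is now allowlisted
```

Containers can then be started with --security-opt seccomp=./default.json; unlike seccomp=unconfined, all other syscall filtering stays in place.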
Solution 3\rUpgrade runc.\nhttps://github.com/opencontainers/runc/releases/\nCheck the runc version before upgrading using docker version:\n1. Select version 1.0.0-rc95 and download runc.amd64\n2. Upload the file to the server, rename it, and grant permissions\n1 mv runc.amd64 runc \u0026amp;\u0026amp; chmod +x runc 3. Back up the existing runc\n1 2 which runc mv /usr/bin/runc /usr/bin/runc.bak 4. Stop Docker\n1 systemctl stop docker 5. Replace with the new runc version\n1 cp runc /usr/bin/runc 6. Start Docker\n1 systemctl start docker 7. Verify if runc was upgraded successfully\n1 docker version ","date":"2023-06-05T00:00:00Z","permalink":"/en/p/docker_apt_update_gpg_issue/","title":"Resolving apt Update GPG Signature Issues in Docker"},{"content":"Required File Downloads:\n1 2 wget https://minio-console.coderkang.top/files/atlassian-agent.jar wget https://minio-console.coderkang.top/files/mysql-connector-j-8.3.0.jar docker-compose.yaml\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 version: \u0026#39;3.8\u0026#39; name: atlassian services: confluence: image: atlassian/confluence-server:8.5.2 container_name: atlassian-confluence restart: always environment: - TZ=Asia/Shanghai - JVM_MINIMUM_MEMORY=4096m - JVM_MAXIMUM_MEMORY=8192m - ATL_DB_TYPE=mysql - ATL_JDBC_URL=jdbc:mysql://db:3306/confluence?sessionVariables=transaction_isolation=\u0026#39;READ-COMMITTED\u0026#39; - ATL_JDBC_USER=root - ATL_JDBC_PASSWORD=****** - JAVA_OPTS=\u0026#39;-javaagent:/opt/atlassian-agent.jar\u0026#39; ports: - 38090:8090 - 38091:8091 volumes: - ./application-data/:/var/atlassian/application-data/confluence/ # - ./setenv.sh:/opt/atlassian/confluence/bin/setenv.sh - ./atlassian-agent.jar:/opt/atlassian-agent.jar - ./mysql-connector-j-8.3.0.jar:/opt/atlassian/confluence/confluence/WEB-INF/lib/mysql-connector-j-8.3.0.jar healthcheck: test: [\u0026#34;CMD\u0026#34;, \u0026#34;curl\u0026#34;, \u0026#34;-f\u0026#34;, 
\u0026#34;http://localhost:8090\u0026#34;] interval: 10s timeout: 5s retries: 5 start_period: 30s depends_on: db: condition: service_healthy db: image: mysql:latest container_name: atlassian-db restart: always environment: MYSQL_ROOT_PASSWORD: ****** MYSQL_DATABASE: confluence command: --character-set-server=utf8mb4 --collation-server=utf8mb4_bin volumes: - ./application-db/data/:/var/lib/mysql/ - ./application-db/conf.d/:/etc/mysql/conf.d/ healthcheck: test: [\u0026#34;CMD\u0026#34;, \u0026#34;mysqladmin\u0026#34;, \u0026#34;ping\u0026#34;, \u0026#34;-h\u0026#34;, \u0026#34;localhost\u0026#34;, \u0026#34;-u\u0026#34;, \u0026#34;root\u0026#34;, \u0026#34;-p******\u0026#34;] interval: 10s timeout: 5s retries: 5 start_period: 30s 1 2 docker compose up -d # The startup process may take some time as it needs to wait for the MySQL service to fully initialize Access the webpage to obtain the Server ID.\n1 2 3 4 docker compose exec -it confluence java -jar /opt/atlassian-agent.jar -m \u0026lt;Email\u0026gt; -n \u0026lt;Username\u0026gt; -o \u0026lt;Organization\u0026gt; -p conf -s \u0026#39;\u0026lt;Server ID\u0026gt;\u0026#39; # Example: # docker compose exec -it confluence java -jar /opt/atlassian-agent.jar -m CoderKang@hotmail.com -n CoderKang -o NiKo -p conf -s \u0026#39;BHE8-N86V-SW29-TDDO\u0026#39; Click \u0026ldquo;Next\u0026rdquo; after entering the information and wait\u0026hellip;\n📌\rImportant\rIf encountering issues during installation, after stopping the container:\nDelete the application-db and application-data directories Rebuild the container ","date":"2023-03-22T00:00:00Z","permalink":"/en/p/docker_confluence_setup_guide/","title":"Guide to Setting Up Confluence with Docker"},{"content":"Pod Lifecycle\r📝\rNote\rPods follow a predefined lifecycle, starting from the Pending phase. If at least one primary container starts successfully, the Pod transitions to the Running phase. 
Subsequently, depending on whether any container in the Pod exits with a failure status, it enters either the Succeeded or Failed phase.\nWhen a Pod is deleted, some kubectl commands may display its status as Terminating. This Terminating state is not one of the official Pod phases.\nPhase Description Pending The Pod has been accepted by the Kubernetes system, but one or more containers have not been created or started. This phase includes time spent waiting for scheduling and downloading container images. Running The Pod is bound to a node, and all containers have been created. At least one container is still running, starting, or restarting. Succeeded All containers in the Pod have terminated successfully and will not be restarted. Failed All containers in the Pod have terminated, and at least one container exited due to failure (e.g., exited with a non-zero status or was terminated by the system without automatic restart configured). Unknown The Pod status cannot be retrieved, typically due to communication failures with the node hosting the Pod. Container States\rKubernetes monitors the state of each container within a Pod, similar to how it tracks the Pod lifecycle.\nOnce the scheduler assigns a Pod to a node, the kubelet initiates container creation for the Pod through the container runtime. A container can be in one of three states: Waiting, Running, or Terminated. To inspect the state of containers in a Pod, use kubectl describe pod \u0026lt;pod-name\u0026gt;. The output includes the status of each container within the Pod.\nEach state has specific implications:\nWaiting\rIf a container is neither in the Running nor Terminated state, it is Waiting. 
A container in the Waiting state is still performing operations required to start successfully, such as pulling a container image from a registry or applying ConfigMap/Secret data to the container.\nRunning\rThe Running state indicates that the container is actively executing and functioning without issues.\nTerminated\rA container in the Terminated state has completed execution, either normally or due to a failure.\nPod Failure Scenarios\rA Pod may encounter various exceptions during its lifecycle. Based on whether its containers are running, these failure scenarios can be broadly categorized into two groups:\nExceptions during container creation: These occur while the Pod is being scheduled or its containers are being created. The Pod remains stuck in the Pending phase. Exceptions during container execution: These occur while containers are running. The Pod’s phase varies depending on the specific scenario. Container Probes\r📝\rNote\rProbes are a mechanism used by kubelet to periodically check the status of containers. To perform a check, kubelet can execute code inside the container or make a network request.\nLiveness Probe\rWhat is a Liveness Probe?\rA liveness probe determines whether a container is running. If the probe fails, the kubelet kills the container, and the container is subjected to the restart policy.\nIf a container\u0026rsquo;s liveness probe fails repeatedly, the kubelet kills the container and applies the restart policy. If a container does not provide a liveness probe, the default status is Success.\nThe liveness probe does not wait for the readiness probe to succeed. 
If you want to delay the first liveness check until the application has had time to start, define initialDelaySeconds, or use a startup probe.\nWhen to Use a Liveness Probe?\rIf a container\u0026rsquo;s process can crash on its own, you may not need a liveness probe; kubelet will automatically restart the container based on the restartPolicy.\nIf you want the container to be killed and restarted when the probe fails, specify a liveness probe and set the restartPolicy to \u0026ldquo;Always\u0026rdquo; or \u0026ldquo;OnFailure\u0026rdquo;.\nReadiness Probe\rWhat is a Readiness Probe?\rA readiness probe determines when a container is ready to accept traffic. This probe is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming up caches.\nIf the readiness probe returns a failure status, Kubernetes removes the Pod from the endpoints of all associated Services.\nThe readiness probe continues to run throughout the container\u0026rsquo;s lifecycle.\nWhen to Use a Readiness Probe?\rSpecify a readiness probe if you want to start sending traffic to a Pod only after the probe succeeds. In this case, the readiness probe might be the same as the liveness probe. However, the presence of a readiness probe in the specification ensures that the Pod does not receive any traffic during the startup phase and only starts receiving it after the probe succeeds.\nYou can also define a readiness probe if you want the container to enter a maintenance state independently. This probe should check a readiness-specific endpoint, which differs from the liveness probe.\nIf your application has strict dependencies on backend services, implement both liveness and readiness probes. After the liveness probe confirms the application is healthy, the readiness probe can perform additional checks to verify the availability of required backend services. 
This helps avoid directing traffic to Pods that might return errors.\nFor containers requiring large data loading, configuration file processing, or migrations during startup, use a startup probe. However, if you need to distinguish between a failed application and one still initializing, a readiness probe may be more appropriate.\nStartup Probe (startupProbe)\rWhat is a Startup Probe?\rA startup probe checks whether an application within a container has started. It is designed for containers with slow startup times to prevent kubelet from terminating them prematurely before they begin running.\nIf configured, this probe disables liveness and readiness checks until the startup probe succeeds.\nThe startup probe executes only during the container\u0026rsquo;s initialization phase, unlike the readiness probe, which runs periodically.\nWhen to Use a Startup Probe?\rThe startup probe is useful for Pods that contain containers requiring extended time to become ready. Instead of configuring a long liveness probe interval, you can set up a separate configuration option to probe the container during startup. This allows exceeding the duration permitted by the liveness check interval by a significant margin.\nIf a container\u0026rsquo;s startup time typically exceeds the total value of initialDelaySeconds + failureThreshold × periodSeconds, a startup probe should be configured to check the same endpoint used by the liveness probe. The default value for periodSeconds is 10 seconds. Set its failureThreshold high enough to ensure sufficient startup time for the container while retaining the default values for the liveness probe. 
This configuration helps mitigate deadlock scenarios. For example, with the default periodSeconds of 10 and failureThreshold: 30, the application has up to 300 seconds to finish starting before the liveness probe takes over.\nProbe Execution\rProbe Target Action Effect Runtime Liveness Container Restart Restart container Entire container lifecycle Readiness Endpoint Remove Remove from service endpoints (no traffic) Entire container lifecycle Startup Container Restart Restart container Executed once after the container starts running Probe Usage\rThere are four different methods to use probes to check containers. Each probe must be precisely defined as one of these four mechanisms:\nProbe Type Description exec Executes a specified command inside the container. The diagnosis is considered successful if the command exits with a status code of 0. tcpSocket Performs a TCP check on the specified port of the container\u0026rsquo;s IP address. The diagnosis is successful if the port is open. If the remote system (container) closes the connection immediately after opening it, this is still considered healthy. httpGet Sends an HTTP GET request to the specified port and path on the container\u0026rsquo;s IP address. The diagnosis is successful if the response status code is between 200 and 399 (inclusive). grpc Uses gRPC to perform a remote procedure call. The target should implement the gRPC health check. If the response status is \u0026ldquo;SERVING\u0026rdquo;, the diagnosis is considered successful. Probe Parameters Description initialDelaySeconds Number of seconds to wait after the container starts before initiating startup, liveness, and readiness probes. If a startup probe is defined, the delays for liveness and readiness probes will begin only after the startup probe succeeds. If periodSeconds is greater than initialDelaySeconds, initialDelaySeconds is ignored. The default is 0 seconds, and the minimum value is 0. periodSeconds Interval (in seconds) at which probes are executed. The default is 10 seconds. The minimum value is 1. timeoutSeconds Number of seconds after which the probe times out. Default is 1 second. Minimum value is 1. 
successThreshold Minimum consecutive successes required for a probe to be considered successful after failure. Default is 1. For liveness and startup probes, this value must be 1. Minimum value is 1. failureThreshold After failureThreshold consecutive failures, Kubernetes considers the overall check failed: container status becomes unready/unhealthy/inactive. Default is 3, minimum is 1.\nFor liveness or startup probes: If ≥ failureThreshold probes fail, Kubernetes triggers container restart (following terminationGracePeriodSeconds).\nFor readiness probes: kubelet continues executing failed probes but marks Pod\u0026rsquo;s Ready condition as false. terminationGracePeriodSeconds Configures the grace period for kubelet to wait between triggering container termination and forcing runtime stop. Default inherits Pod-level value (30s if unset). Minimum is 1. Added in Kubernetes v1.25, only effective for startup and liveness probes. Example\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 apiVersion: v1 kind: Namespace metadata: name: k8s-test --- apiVersion: v1 kind: ConfigMap metadata: name: probe-demo namespace: k8s-test data: default.conf: | server { listen 80; server_name localhost; keepalive_timeout 0; # Disable Keep-Alive location / { root /usr/share/nginx/html; index index.html index.htm; } } --- apiVersion: v1 kind: Service metadata: name: probe-demo namespace: k8s-test spec: selector: app: probe-demo ports: - name: http protocol: TCP port: 80 targetPort: 80 nodePort: 31080 type: NodePort --- apiVersion: v1 kind: Pod metadata: name: probe-demo namespace: k8s-test labels: app: probe-demo spec: restartPolicy: OnFailure # Pod restart policy: OnFailure (restart only on non-zero exit code), Always (restart on any exit), 
Never (no restart) containers: - name: probe-demo image: 192.168.142.99:7891/devops/nginx:latest # command: [\u0026#34;/bin/sh\u0026#34;, \u0026#34;-c\u0026#34;, \u0026#34;sleep 10\u0026#34;] # Simulate Pod Succeeded status # command: [\u0026#34;/bin/sh\u0026#34;, \u0026#34;-c\u0026#34;, \u0026#34;sleep infinity\u0026#34;] # For startup probe testing command: [\u0026#34;/bin/sh\u0026#34;, \u0026#34;-c\u0026#34;, \u0026#34;set -ex \u0026amp;\u0026amp; nohup nginx \u0026amp;\u0026amp; touch /tmp/healthy \u0026amp;\u0026amp; sleep 30 \u0026amp;\u0026amp; rm -f /tmp/healthy \u0026amp;\u0026amp; sleep 600\u0026#34;] # startupProbe: # exec: # command: # - cat # - /tmp/healthy # initialDelaySeconds: 5 # Wait 5s after container starts # periodSeconds: 5 # Probe every 5s # timeoutSeconds: 1 # Probe timeout # successThreshold: 1 # Minimum consecutive successes to consider probe successful # failureThreshold: 3 # Consecutive failures needed to mark probe failed # livenessProbe: # exec: # command: # - cat # - /tmp/healthy # initialDelaySeconds: 5 # periodSeconds: 5 # timeoutSeconds: 1 # failureThreshold: 3 readinessProbe: exec: command: - cat - /tmp/healthy initialDelaySeconds: 5 periodSeconds: 5 timeoutSeconds: 1 failureThreshold: 3 volumeMounts: - name: config-volume mountPath: /etc/nginx/conf.d/default.conf subPath: default.conf volumes: - name: config-volume configMap: name: probe-demo items: - key: default.conf path: default.conf Practical Demonstration\rProcess Analysis\r","date":"2023-02-18T00:00:00Z","permalink":"/en/p/k8s_lifecycle_and_probes/","title":"Kubernetes Lifecycle and Probes Deep Dive"},{"content":"\rTo resolve conflicts between OpenVPN and proxy software\nwfg/docker-openvpn-client: OpenVPN client with killswitch and proxy servers; built on Alpine (github.com)\nBuild Files\rDockerfile\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 FROM alpine:3.17 RUN apk add --no-cache \\ bash \\ bind-tools \\ iptables \\ ip6tables \\ openvpn COPY . 
/usr/local/bin ENV KILL_SWITCH=on ENTRYPOINT [ \u0026#34;entry.sh\u0026#34; ] killswitch\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 #!/usr/bin/env bash set -o errexit set -o nounset set -o pipefail # Block all traffic not going through tun0 interface iptables --insert OUTPUT \\ ! --out-interface tun0 \\ --match addrtype ! --dst-type LOCAL \\ ! --destination \u0026#34;$(ip -4 -oneline addr show dev eth0 | awk \u0026#39;NR == 1 { print $4 }\u0026#39;)\u0026#34; \\ --jump REJECT # Create static routes for allowed subnets default_gateway=$(ip -4 route | awk \u0026#39;$1 == \u0026#34;default\u0026#34; { print $3 }\u0026#39;) for subnet in ${1//,/ }; do ip route add \u0026#34;$subnet\u0026#34; via \u0026#34;$default_gateway\u0026#34; iptables --insert OUTPUT --destination \u0026#34;$subnet\u0026#34; --jump ACCEPT done # Whitelist OpenVPN server addresses global_port=$(awk \u0026#39;$1 == \u0026#34;port\u0026#34; { print $2 }\u0026#39; \u0026#34;${config:?\u0026#34;config file not found by kill switch\u0026#34;}\u0026#34;) global_protocol=$(awk \u0026#39;$1 == \u0026#34;proto\u0026#34; { print $2 }\u0026#39; \u0026#34;${config:?\u0026#34;config file not found by kill switch\u0026#34;}\u0026#34;) remotes=$(awk \u0026#39;$1 == \u0026#34;remote\u0026#34; { print $2, $3, $4 }\u0026#39; \u0026#34;${config:?\u0026#34;config file not found by kill switch\u0026#34;}\u0026#34;) ip_regex=\u0026#39;^(([1-9]?[0-9]|1[0-9][0-9]|2([0-4][0-9]|5[0-5]))\\.){3}([1-9]?[0-9]|1[0-9][0-9]|2([0-4][0-9]|5[0-5]))$\u0026#39; while IFS= read -r line; do # Process each remote entry IFS=\u0026#34; \u0026#34; read -ra remote \u0026lt;\u0026lt;\u0026lt; \u0026#34;${line%%\\#*}\u0026#34; address=${remote[0]} port=${remote[1]:-${global_port:-1194}} protocol=${remote[2]:-${global_protocol:-udp}} # Allow IP addresses and handle domain resolution if [[ $address =~ $ip_regex ]]; then iptables --insert OUTPUT --destination 
\u0026#34;$address\u0026#34; --protocol \u0026#34;$protocol\u0026#34; --destination-port \u0026#34;$port\u0026#34; --jump ACCEPT else for ip in $(dig -4 +short \u0026#34;$address\u0026#34;); do iptables --insert OUTPUT --destination \u0026#34;$ip\u0026#34; --protocol \u0026#34;$protocol\u0026#34; --destination-port \u0026#34;$port\u0026#34; --jump ACCEPT echo \u0026#34;$ip $address\u0026#34; \u0026gt;\u0026gt; /etc/hosts done fi done \u0026lt;\u0026lt;\u0026lt; \u0026#34;$remotes\u0026#34; entry.sh\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 #!/usr/bin/env bash set -o errexit set -o nounset set -o pipefail cleanup() { kill TERM \u0026#34;$openvpn_pid\u0026#34; exit 0 } is_enabled() { [[ ${1,,} =~ ^(true|t|yes|y|1|on|enable|enabled)$ ]] } # Either a specific file name or a pattern. if [[ $CONFIG_FILE ]]; then config_file=$(find /config -name \u0026#34;$CONFIG_FILE\u0026#34; 2\u0026gt; /dev/null | sort | shuf -n 1) else config_file=$(find /config -name \u0026#39;*.conf\u0026#39; -o -name \u0026#39;*.ovpn\u0026#39; 2\u0026gt; /dev/null | sort | shuf -n 1) fi if [[ -z $config_file ]]; then echo \u0026#34;no openvpn configuration file found\u0026#34; \u0026gt;\u0026amp;2 exit 1 fi echo \u0026#34;using openvpn configuration file: $config_file\u0026#34; openvpn_args=( \u0026#34;--config\u0026#34; \u0026#34;$config_file\u0026#34; \u0026#34;--cd\u0026#34; \u0026#34;/config\u0026#34; ) if is_enabled \u0026#34;$KILL_SWITCH\u0026#34;; then openvpn_args+=(\u0026#34;--route-up\u0026#34; \u0026#34;/usr/local/bin/killswitch.sh $ALLOWED_SUBNETS\u0026#34;) fi # Docker secret that contains the credentials for accessing the VPN. if [[ $AUTH_SECRET ]]; then openvpn_args+=(\u0026#34;--auth-user-pass\u0026#34; \u0026#34;/run/secrets/$AUTH_SECRET\u0026#34;) fi openvpn \u0026#34;${openvpn_args[@]}\u0026#34; \u0026amp; openvpn_pid=$! 
trap cleanup TERM wait $openvpn_pid Build and Start\rdocker-compose.yaml\rPublic image: ghcr.io/wfg/openvpn-client\nBuild image:\n1 docker compose build Start: 1 docker compose up -d 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 services: openvpn-client: image: openvpn-client:latest # ghcr.io/wfg/openvpn-client build: context: ./ dockerfile: Dockerfile container_name: openvpn-client cap_add: - NET_ADMIN environment: # - HTTP_PROXY=on - SOCKS_PROXY=on devices: - /dev/net/tun:/dev/net/tun volumes: - ./local:/config - ./local:/data/vpn ports: # - 8080:8080 # HTTP_PROXY port - 1080:1080 # SOCKS_PROXY port restart: unless-stopped Using with Proxy Software\rUtilizing the Rule Configuration routing policy in Clash software\n1 2 3 4 5 6 7 proxies: - { name: OpenVPN, server: 127.0.0.1, port: 1080, type: socks5 } rules: - IP-CIDR,192.168.142.0/24,OpenVPN - IP-CIDR,10.10.10.0/24,OpenVPN - IP-CIDR,172.16.0.0/24,OpenVPN One More Thing\rIn addition to OpenVPN, other campus VPN clients like EasyConnect (used for free CNKI access) can be deployed in Docker containers. 
Expose HTTP/SOCKS proxies and configure routing rules to achieve elegant cross-network environment access.\n1 2 - { name: \u0026#39;CNKI\u0026#39;, type: socks5, server: 10.10.10.45, port: 57080} - \u0026#39;DOMAIN-SUFFIX,cnki.net,CNKI\u0026#39; ","date":"2022-06-14T00:00:00Z","permalink":"/en/p/docker_openvpn_client_proxy_integration/","title":"Dockerized OpenVPN Client with Proxy Integration"},{"content":"Environment Preparation\rDisable Firewall and Mail Service\r1 2 3 4 5 6 7 8 9 # Check firewall status firewall-cmd --state # Temporarily disable firewall systemctl stop firewalld.service # Disable firewall auto-start systemctl disable firewalld.service systemctl stop postfix.service systemctl disable postfix.service Disable SELinux\r1 2 3 4 5 6 # Check SELinux status getenforce # Temporarily disable SELinux setenforce 0 # Permanently disable SELinux sed -i \u0026#39;s/^ *SELINUX=enforcing/SELINUX=disabled/g\u0026#39; /etc/selinux/config Disable Swap\r1 2 3 4 5 6 # Temporarily disable swap swapoff -a # Permanently disable swap sed -i.bak \u0026#39;/swap/s/^/#/\u0026#39; /etc/fstab # Verify free -g Tuning Kernel Parameters and Modules\rAfter system installation, appropriately adjusting kernel parameters based on practical application scenarios can help establish a more efficient and stable system environment. 
This includes:\nOptimizing resource allocation parameters (e.g., modifying the maximum number of open file descriptors to improve high-concurrency service capabilities) Adjusting network parameters using sysctl (e.g., modifying ip_local_port_range, nf_conntrack_max, and socket buffer sizes) Managing kernel modules: Loading or disabling unnecessary modules (e.g., disabling unused hardware drivers to reduce potential vulnerabilities while enabling specific acceleration modules for performance enhancement) Typical operation examples:\n1 2 3 4 5 6 7 8 # Modify network port range sysctl -w net.ipv4.ip_local_port_range=\u0026#34;1024 65000\u0026#34; # Adjust connection tracking table size echo 1048576 \u0026gt; /proc/sys/net/netfilter/nf_conntrack_max # Dynamically load zRAM module modprobe zram Key principle: Make targeted adjustments according to actual service requirements, using monitoring data as guidance while maintaining parameter modification records.\nEnable ipvs\rIf ipvs is not enabled, iptables will be used for packet forwarding (less efficient). 
Recommended to enable ipvs for better performance.\n1 2 3 4 5 6 7 8 cat \u0026lt;\u0026lt;EOF\u0026gt; /etc/sysconfig/modules/ipvs.modules #!/bin/bash modprobe -- ip_vs modprobe -- ip_vs_rr modprobe -- ip_vs_wrr modprobe -- ip_vs_sh modprobe -- nf_conntrack EOF 1 2 3 4 5 6 7 8 # Load modules chmod 755 /etc/sysconfig/modules/ipvs.modules \u0026amp;\u0026amp; bash /etc/sysconfig/modules/ipvs.modules \u0026amp;\u0026amp; lsmod | grep -e ip_vs -e nf_conntrack_ipv4 # Install ipset package yum install ipset -y # Install management tool ipvsadm yum install ipvsadm -y Synchronize server time\r1 yum install chrony -y 1 2 3 4 5 6 7 8 9 # Master node configuration vim /etc/chrony.conf server ntp.aliyun.com iburst allow 192.168.142.0/24 # Slave node vim /etc/chrony.conf server controller iburst 1 2 systemctl restart chronyd.service systemctl enable chronyd.service Modify Hostname and Hosts (Optional)\r1 2 3 4 5 6 7 8 9 10 11 12 hostnamectl set-hostname master hostnamectl set-hostname node1 hostnamectl set-hostname node2 cat \u0026gt;\u0026gt; /etc/hosts \u0026lt;\u0026lt; EOF 192.168.231.3 master 192.168.231.4 node1 EOF echo -e \u0026#34;\\033[32m [Host Resolution] ==\u0026gt; OK \\033[0m\u0026#34; System Log Optimization\r1 2 mkdir /var/log/journal mkdir /etc/systemd/journald.conf.d 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 cat \u0026gt; /etc/systemd/journald.conf.d/99-prophet.conf \u0026lt;\u0026lt; EOF [journal] # Persist logs to disk Storage=persistent # Compress historical logs Compress=yes # Set sync interval to disk as 5 minutes SyncIntervalSec=5m # Control rate limiting for log messages RateLimitInterval=30s RateLimitBurst=1000 # Maximum disk usage 10G SystemMaxUse=10G # Control runtime log size RuntimeMaxUse=500M # Max individual log file size 200M SystemMaxFileSize=200M # Log retention period (2 weeks) MaxRetentionSec=2week # Disable syslog forwarding ForwardToSyslog=no # Enable 
sealing mode when size limits reached SystemMaxSealing=yes RuntimeMaxSealing=yes # Enable watchdog timer (restart service after 1h inactivity) RuntimeWatchdogSec=1h # Enable automatic log file optimization SystemMaxFilesTidy=yes RuntimeMaxFilesTidy=yes # Max age for individual log files RuntimeMaxFileSec=7days # Auto-select best storage mode StorageAuto=yes EOF 1 2 systemctl restart systemd-journald echo -e \u0026#34;\\033[32m [Log Optimization] ==\u0026gt; OK \\033[0m\u0026#34; Docker Installation\r1 2 3 4 5 6 7 8 9 set -e sudo yum install -y yum-utils sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin sudo systemctl start docker sudo systemctl enable docker vim /etc/docker/daemon.json 1 2 3 4 5 6 { \u0026#34;exec-opts\u0026#34;: [\u0026#34;native.cgroupdriver=systemd\u0026#34;], \u0026#34;insecure-registries\u0026#34;: [ \u0026#34;192.168.142.99\u0026#34; ] } 1 2 sudo systemctl restart docker docker login 192.168.142.99 Kubernetes Installation\rInstalling kubelet, kubeadm, and kubectl\rAdding Kubernetes Repository\n1 2 3 4 5 6 7 8 9 10 cat \u0026lt;\u0026lt;EOF \u0026gt; /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 enabled=1 gpgcheck=0 repo_gpgcheck=0 gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg EOF Install kubeadm, kubelet, and kubectl\n1 2 3 4 5 6 7 # Check version, the latest version is 1.23.5-0 yum list kubeadm --showduplicates | sort -r yum install -y kubelet-1.23.5-0 kubectl-1.23.5-0 kubeadm-1.23.5-0 kubeadm version # kubeadm version: \u0026amp;version.Info{Major:\u0026#34;1\u0026#34;, Minor:\u0026#34;23\u0026#34;, GitVersion:\u0026#34;v1.23.5\u0026#34;, GitCommit:\u0026#34;c285e781331a3785a7f436042c65c5641ce8a9e9\u0026#34;, 
GitTreeState:\u0026#34;clean\u0026#34;, BuildDate:\u0026#34;2022-03-16T15:57:37Z\u0026#34;, GoVersion:\u0026#34;go1.17.8\u0026#34;, Compiler:\u0026#34;gc\u0026#34;, Platform:\u0026#34;linux/amd64\u0026#34;} Modify kubelet configuration\n1 2 # Modify configuration file /etc/sysconfig/kubelet (this file does not exist by default and needs to be created) KUBELET_EXTRA_ARGS=--root-dir=/var/lib/kubelet Start kubelet service and enable it to start on boot\n1 2 systemctl start kubelet systemctl enable kubelet Initialize Kubernetes Cluster\rInitialization via Configuration File\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 cat \u0026lt;\u0026lt;EOF\u0026gt; kubeadm.yaml apiVersion: kubeadm.k8s.io/v1beta3 bootstrapTokens: - groups: - system:bootstrappers:kubeadm:default-node-token token: abcdef.0123456789abcdef ttl: 24h0m0s usages: - signing - authentication kind: InitConfiguration localAPIEndpoint: advertiseAddress: 192.168.4.27 # Internal IP of the apiserver node bindPort: 6443 nodeRegistration: criSocket: /run/containerd/containerd.sock # Switch to containerd imagePullPolicy: IfNotPresent name: master taints: - effect: NoSchedule key: node-role.kubernetes.io/master --- apiServer: timeoutForControlPlane: 4m0s apiVersion: kubeadm.k8s.io/v1beta3 certificatesDir: /etc/kubernetes/pki clusterName: kubernetes controllerManager: {} dns: type: CoreDNS # DNS type (CoreDNS) etcd: local: dataDir: /var/lib/etcd imageRepository: registry.aliyuncs.com/google_containers # Modified image repository for accessibility kind: ClusterConfiguration kubernetesVersion: 1.23.5 # Kubernetes version networking: dnsDomain: cluster.local podSubnet: 10.244.0.0/16 serviceSubnet: 10.96.0.0/12 scheduler: {} --- apiVersion: kubeproxy.config.k8s.io/v1alpha1 kind: KubeProxyConfiguration mode: ipvs # kube-proxy mode EOF 1 kubeadm init --config kubeadm.yaml 📌\rImportant\rkubeadm-installed certificates have a default validity 
period of one year\nkube-proxy operates in iptables mode by default. This can be modified via kubectl edit configmap kube-proxy -n kube-system.\nImperative Initialization\r1 2 3 4 5 6 7 8 9 10 11 kubeadm init \\ --control-plane-endpoint k8svip:8443 \\ --kubernetes-version=v1.23.5 \\ --service-cidr=172.96.0.0/12 \\ --pod-network-cidr=172.244.0.0/16 \\ --image-repository registry.aliyuncs.com/google_containers \\ --upload-certs kubeadm join k8svip:8443 --token i8zsn5.dakiqfxexdxn7wdt \\ --discovery-token-ca-cert-hash sha256:61ed5a0941ecf47078dac91c4389bc8abb9c761149e869a40f9c3da859b39dba \\ --control-plane --certificate-key fc133d520c12052c9391e075c3aa6dda456599b70b1335aba2c3e0680e75af6e 1 2 3 4 5 6 7 kubeadm init --kubernetes-version v1.23.5 \\ --apiserver-advertise-address 172.16.0.185 \\ --image-repository registry.aliyuncs.com/google_containers \\ --service-cidr 172.96.0.0/12 \\ --pod-network-cidr 172.244.0.0/16 \\ --upload-certs \\ --v=5 Install Calico Network Plugin (Execute on master node)\r1 2 3 # Download manifest curl https://docs.projectcalico.org/manifests/calico.yaml -o calico.yaml kubectl apply -f calico.yaml Installing Auto-Completion Tools\r1 2 3 4 yum install -y bash-completion source /usr/share/bash-completion/bash_completion source \u0026lt;(kubectl completion bash) echo \u0026#34;source \u0026lt;(kubectl completion bash)\u0026#34; \u0026gt;\u0026gt; ~/.bashrc Cluster Deployment Verification\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 cat \u0026lt;\u0026lt; EOF \u0026gt; nginx-ds.yaml apiVersion: apps/v1 kind: Deployment metadata: name: deploy-game namespace: default spec: replicas: 8 selector: matchLabels: app: game release: stable template: metadata: labels: app: game release: stable env: test spec: imagePullSecrets: - name: kkregcred containers: - name: game image: registry.cn-beijing.aliyuncs.com/kaikai136/docker-2048:v1 imagePullPolicy: IfNotPresent ports: -
name: http containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: game-svc namespace: default spec: type: NodePort selector: app: game release: stable ports: - name: http port: 80 targetPort: 80 nodePort: 32000 protocol: TCP EOF kubectl apply -f nginx-ds.yaml\nkubectl create deployment nginx --image=nginx kubectl expose deployment nginx --port=80 --type=NodePort kubectl get pod,svc\nTesting Calico Network\r1 kubectl run busybox --image docker.io/library/busybox:1.28 --image-pull-policy=IfNotPresent --restart=Never --rm -it -- sh 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 / # ping www.baidu.com PING www.baidu.com (180.101.50.242): 56 data bytes 64 bytes from 180.101.50.242: seq=0 ttl=51 time=7.880 ms 64 bytes from 180.101.50.242: seq=1 ttl=51 time=7.247 ms ^C # The network reachability indicates the Calico network plugin is properly installed --- www.baidu.com ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 7.247/7.563/7.880 ms / # nslookup kubernetes.default.svc.cluster.local Server: 172.96.0.10 Address 1: 172.96.0.10 kube-dns.kube-system.svc.cluster.local Name: kubernetes.default.svc.cluster.local Address 1: 172.96.0.1 kubernetes.default.svc.cluster.local # 172.96.0.10 corresponds to the clusterIP of CoreDNS, verifying the CoreDNS configuration. # Internal Service name resolution is handled by CoreDNS. 
# Note: # Must use busybox version 1.28 as specified - latest versions may fail to resolve DNS and IP addresses with nslookup Installing Additional Tools\rAuto Completion Tools\r1 2 3 4 yum install -y bash-completion source /usr/share/bash-completion/bash_completion source \u0026lt;(kubectl completion bash) echo \u0026#34;source \u0026lt;(kubectl completion bash)\u0026#34; \u0026gt;\u0026gt; ~/.bashrc ","date":"2022-05-31T00:00:00Z","image":"/p/centos7_kubernetes_1.23.5_installation_guide/cover_english.png","permalink":"/en/p/centos7_kubernetes_1.23.5_installation_guide/","title":"Complete Guide to Deploying Kubernetes 1.23.5 Cluster on CentOS 7"},{"content":"\r📝\rNote\rdeviantony/docker-elk: The Elastic stack (ELK) powered by Docker and Compose. (github.com)\n1 2 git clone https://github.com/deviantony/docker-elk cd docker-elk 1 vim .env 1 2 docker compose up setup docker compose up 1 2 docker compose down -v setup docker compose down -v ","date":"2022-05-22T00:00:00Z","permalink":"/en/p/docker_elk_installation_guide/","title":"Docker Deployment Guide for ELK"},{"content":"CentOS7 Kernel Upgrade\rDownload kernel source:\n1 rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm Install the latest kernel version:\n1 yum --enablerepo=elrepo-kernel install -y kernel-lt Check entries:\n1 cat /boot/grub2/grub.cfg | grep menuentry Set default boot kernel:\n1 grub2-set-default \u0026#34;CentOS Linux (4.4.221-1.el7.elrepo.x86_64) 7 (Core)\u0026#34; Disable Firewall\r1 2 systemctl stop firewalld systemctl disable firewalld Install Common Tools\r1 yum install -y conntrack ntpdate ntp ipvsadm ipset jq iptables curl sysstat libseccomp wget vim git net-tools dos2unix lsof tcpdump lrzsz telnet bash-completion.noarch conntrack-tools Linux completion:\n1 yum install libvirt-bash-completion bash-completion gedit-plugin-bracketcompletion gedit-plugin-wordcompletion libguestfs-bash-completion -y Configure SELinux\r1 2 setenforce 0 sed -i \u0026#39;/^SELINUX=/ 
s/enforcing/disabled/\u0026#39; /etc/selinux/config Update History and Shell Timeout Settings\rEdit /etc/profile:\n1 2 export HISTSIZE=100 export TMOUT=300 Disable swap partition\r1 2 3 4 5 swapoff -a # To permanently disable the swap partition, comment out the swap line in /etc/fstab sed -i \u0026#39;/ swap / s/^\\(.*\\)$/#\\1/g\u0026#39; /etc/fstab echo \u0026#34;vm.swappiness = 0\u0026#34; \u0026gt;\u0026gt; /etc/sysctl.conf sysctl -p Disable mail service\r1 2 systemctl stop postfix.service systemctl disable postfix.service Log optimization\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 mkdir /var/log/journal mkdir /etc/systemd/journald.conf.d cat \u0026gt; /etc/systemd/journald.conf.d/99-prophet.conf \u0026lt;\u0026lt; EOF [Journal] # Persistent storage to disk Storage=persistent # Compress historical logs Compress=yes SyncIntervalSec=5m RateLimitInterval=30s RateLimitBurst=1000 # Maximum disk space 10G SystemMaxUse=10G # Single log file maximum size 200M SystemMaxFileSize=200M # Log retention time 2 weeks MaxRetentionSec=2week # Do not forward logs to syslog ForwardToSyslog=no EOF systemctl restart systemd-journald Load ipvs modules\r1 2 3 4 5 6 7 8 9 10 cat \u0026gt; /etc/sysconfig/modules/ipvs.modules \u0026lt;\u0026lt;EOF modprobe -- ip_vs modprobe -- ip_vs_rr modprobe -- ip_vs_wrr modprobe -- ip_vs_sh modprobe -- nf_conntrack_ipv4 EOF chmod 755 /etc/sysconfig/modules/ipvs.modules bash /etc/sysconfig/modules/ipvs.modules File Optimization\r1 2 3 4 5 echo \u0026#39;* - nofile 65535\u0026#39; \u0026gt;\u0026gt; /etc/security/limits.conf echo \u0026#39;vm.max_map_count=262144\u0026#39; \u0026gt;\u0026gt; /etc/sysctl.conf sysctl vm.overcommit_memory=1 tail -1 /etc/security/limits.conf sysctl -p\nKernel Optimization\r1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 cat \u0026gt;\u0026gt; /etc/sysctl.conf \u0026lt;\u0026lt;EOF net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200 net.ipv4.ip_local_port_range = 4000 65000 net.ipv4.tcp_max_syn_backlog = 262144 net.ipv4.tcp_max_tw_buckets = 36000 net.ipv4.route.gc_timeout = 100 net.ipv4.tcp_syn_retries = 1 net.ipv4.tcp_synack_retries = 1 net.core.somaxconn = 262144 net.core.netdev_max_backlog = 262144 net.ipv4.tcp_max_orphans = 16384 net.ipv4.tcp_mem = 94500000 915000000 927000000 EOF sysctl -p Explanation of Network Parameters\rnet.ipv4.tcp_syncookies = 1: Enables SYN Cookies. When the SYN backlog overflows, cookies are used to handle connections, mitigating minor SYN flooding attacks. Default: 0 (disabled). net.ipv4.tcp_tw_reuse = 1: Allows reusing TIME-WAIT sockets for new TCP connections. Default: 0 (disabled). net.ipv4.tcp_tw_recycle = 1: Enables fast recycling of TIME-WAIT sockets. Default: 0 (disabled). Caution: this option misbehaves with clients behind NAT and was removed entirely in Linux 4.12; enable it with care even on the 3.10-series kernels targeted here. net.ipv4.tcp_fin_timeout = 30: Defines the time (in seconds) a connection remains in FIN-WAIT-2 state if closed locally. net.ipv4.tcp_keepalive_time = 1200: Sets the frequency (in seconds) for TCP keepalive probes. Default: 7200 (2 hours), modified to 1200 (20 minutes). net.ipv4.ip_local_port_range = 4000 65000: Specifies the port range for outgoing connections. Default: 32768-61000, expanded here to 4000-65000. net.ipv4.tcp_max_syn_backlog = 262144: Sets the maximum length of the SYN queue to accommodate more pending connections. Default: 1024. net.ipv4.tcp_max_tw_buckets = 36000: Limits the maximum number of TIME-WAIT sockets. Exceeding this threshold triggers immediate cleanup. Default: 180000, lowered here to 36000 for servers like Apache/Nginx to reduce TIME-WAIT sockets. Squid may require additional tuning. SSH Service Optimization\r1 2 3 4 5 6 7 8 9 10 11 # Edit SSH configuration vim /etc/ssh/sshd_config # Disable GSSAPI authentication GSSAPIAuthentication no # Disable DNS resolution checks UseDNS no # (Remove the \u0026#39;#\u0026#39; to uncomment; default is disabled) # Restart SSH service systemctl restart sshd II. System-related Commands\r1. 
CPU Core Count, Model, and Clock Speed\r1 cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c 2. Testing Disk I/O Performance\r1). hdparm Command\rThe hdparm command provides a CLI interface for reading and setting parameters of IDE/SCSI hard drives. Note: this command only tests disk read speed.\n1 2 3 4 5 [root@server-68.2.stage.polex.io var ]$ hdparm -Tt /dev/polex_pv/varvol /dev/polex_pv/varvol: Timing cached reads: MB in 2.00 seconds = 7803.05 MB/sec Timing buffered disk reads: MB in 3.01 seconds = 374.90 MB/sec\n2). The dd Command\rThe Linux dd command is used to read, convert, and output data. dd can read data from standard input or files, transform it according to specified formats, and then output it to files, devices, or standard output.\nWe can use the copy function of the dd command to test the IO performance of a disk. Note that dd provides only a rough measurement of disk IO performance and is not highly accurate.\n1 2 3 4 5 6 7 8 [root@server-68.2.stage.polex.io var ]$ time dd if=/dev/zero of=test.file bs=1G count=2 oflag=direct 2+0 records in 2+0 records out 2147483648 bytes (2.1 GB) copied, 13.5487 s, 159 MB/s real 0m13.556s user 0m0.000s sys 0m0.888s ??? note \u0026ldquo;Parameter Explanation\u0026rdquo; As shown, the disk write speed for this partition is 159 MB/s. Key parameters include:\n- `/dev/zero`: A pseudo-device that generates an endless stream of null bytes; reading it incurs no IO.\r- `if`: Specifies the input file for `dd` to read from.\r- `of`: Specifies the output file for `dd` to write to.\r- `bs`: Defines the block size for each write operation.\r- `count`: Sets the number of blocks to write.\r- `oflag=direct`: Required for IO testing; ensures direct writes to disk (bypassing the page cache).\r3). FIO Testing Disk IO Performance\rThe fio command is specifically used to test IOPS and is more accurate than the dd command. The fio command has many parameters. 
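Fio's two headline numbers are not independent: bandwidth is roughly IOPS multiplied by the block size. A quick arithmetic sanity check, sketched here for illustration (the values `iops=383` and `bs=4k` are taken from the sample result discussed later in this section):

```shell
# Sanity-check a fio result: bandwidth should be close to IOPS * block size.
# iops=383 and bs=4k come from the sample run quoted in this section.
iops=383
bs_kb=4
bw_kb=$((iops * bs_kb))
echo "expected bw: ${bw_kb} KB/s"   # close to fio's reported bw=1532.2KB/s
```

If the measured bandwidth is far from this product, the block size you requested is probably not the one actually being issued (for example, requests are being merged or split along the way).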
Here are some examples for reference:\n1 yum install fio 1 2 3 4 5 6 7 8 9 10 # Random read: fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=4k -size=60G -numjobs=64 -runtime=10 -group_reporting -name=file # Sequential read: fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -size=60G -numjobs=64 -runtime=10 -group_reporting -name=file # Random write: fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=60G -numjobs=64 -runtime=10 -group_reporting -name=file # Sequential write: fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=60G -numjobs=64 -runtime=10 -group_reporting -name=file # Mixed random read/write: fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=30 -ioengine=psync -bs=4k -size=60G -numjobs=64 -runtime=10 -group_reporting -name=file -ioscheduler=noop Warning: the write tests above write directly to /dev/sda1 and will destroy any data on that device; point filename at a scratch disk or test file instead. In the results, bw=1532.2KB/s is the measured bandwidth and iops=383 the measured IOPS.\n??? note \u0026ldquo;Parameter Explanation\u0026rdquo;\nfilename=/dev/sda1: Test target, typically the device or data directory of the disk to be tested\ndirect=1: Bypasses system buffers during testing for more authentic results\nrw=randwrite: Tests random write I/O\nrw=randrw: Tests mixed random read/write I/O\nrw=randread: Tests random read I/O\nbs=4k: Block size per I/O operation is 4KB\nbsrange=512-2048: Specifies a data block size range\nsize=60g: Total amount of data transferred during the test is 60GB\nnumjobs=64: Test runs with 64 concurrent threads\nruntime=10: Test duration limited to 10 seconds\nioengine=psync: I/O engine uses psync mode\nrwmixread=30: 30% read ratio in mixed read/write mode (rwmixwrite sets the write ratio analogously)\ngroup_reporting: Reports aggregated results for the whole group of jobs rather than per job\nAdditional parameters:\n- lockmem=1g: Limits memory usage to 1GB for testing\n- zero_buffers: Initialize buffers with zeros\n- nrfiles=8: Number of files generated per process\n4). 
iostat Command\rFirst use iostat to check if disk I/O has high read/write loads\nIf %util approaches 100%, it indicates too many I/O requests and the I/O system is saturated. The disk may be a bottleneck. Generally, if %util exceeds 70%, the I/O pressure is significant with considerable read wait time. Then check other parameters.\n1 2 3 yum install sysstat iostat -x 1 10 ??? note \u0026ldquo;Explanation\u0026rdquo;\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 rrqm/s: Number of read operations merged per second. Calculated as delta(rmerge)/s wrqm/s: Number of write operations merged per second. Calculated as delta(wmerge)/s r/s: Read I/O operations completed per second. Calculated as delta(rio)/s w/s: Write I/O operations completed per second. Calculated as delta(wio)/s rsec/s: Sectors read per second. Calculated as delta(rsect)/s wsec/s: Sectors written per second. Calculated as delta(wsect)/s rKB/s: Kilobytes read per second. Half of rsec/s (since sector size is 512 bytes) wKB/s: Kilobytes written per second. Half of wsec/s avgrq-sz: Average data size per I/O operation (sectors). Calculated as delta(rsect+wsect)/delta(rio+wio) avgqu-sz: Average I/O queue length. Calculated as delta(aveq)/s/1000 (since aveq is in milliseconds) await: Average wait time per I/O operation (milliseconds). Calculated as delta(ruse+wuse)/delta(rio+wio) svctm: Average service time per I/O operation (milliseconds). Calculated as delta(use)/delta(rio+wio) %util: Percentage of time with I/O operations active, or time when I/O queue was non-empty 5). iotop Command\rA tool to identify processes with high I/O usage. Simply execute the iotop command:\n1 yum install iotop -y 3. 
sar Command\rThe sar -u 1 1 command checks CPU utilization, sampling once per second for a single iteration.\nThe sar command is an essential tool for analyzing system bottlenecks, used to monitor performance metrics including CPU, memory, disk, and network.\n[root@server-68.2.stage.polex.io var ]$ sar -d -p 1 2 Linux 3.10.0-693.5.2.el7.x86_64 (server-) // x86_64 ( CPU)\n:: PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util :: PM sda 1.00 0.00 3.00 3.00 0.01 9.00 9.00 0.90 :: PM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM polex_pv-rootvol 1.00 0.00 3.00 3.00 0.01 9.00 9.00 0.90 :: PM polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00\n:: PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util :: PM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM polex_pv-rootvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 :: PM polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00\nAverage: DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util Average: sda 0.50 0.00 1.50 3.00 0.00 9.00 9.00 0.45 Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: polex_pv-rootvol 0.50 0.00 1.50 3.00 0.00 9.00 9.00 0.45 Average: polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00\nIn the command, the \u0026ldquo;-d\u0026rdquo; parameter selects disk statistics, the \u0026ldquo;-p\u0026rdquo; parameter displays devices by names like sda, sdb, etc., \u0026ldquo;1\u0026rdquo; indicates sampling every 1 second, and \u0026ldquo;2\u0026rdquo; specifies collecting a total of 2 samples.\n??? 
note \u0026ldquo;Parameter Explanation\u0026rdquo;\nawait: The average waiting time per device I/O operation (in milliseconds).\n**svctm**: The average service time per device I/O operation (in milliseconds). **%util**: The percentage of time spent on I/O operations each second. For disk I/O performance, the following criteria generally apply: - Normally, **svctm** should be smaller than **await**. The value of **svctm** depends on disk performance, but CPU and memory load can also affect it. Excessive I/O requests may indirectly increase the **svctm** value. - The **await** value is typically influenced by **svctm**, the I/O queue length, and the I/O request pattern. If **svctm** is close to **await**, it indicates minimal I/O waiting, implying excellent disk performance. If **await** is significantly higher than **svctm**, it suggests a long I/O queue wait, which slows down applications. This can often be resolved by using a faster disk. - **%util** is another critical metric. If **%util** approaches 100%, the disk is handling too many I/O requests and operating at full capacity, indicating a potential bottleneck. Prolonged high utilization will degrade system performance. Solutions include optimizing programs or upgrading to a faster/higher-capacity disk.\r4. vmstat Command\r1 2 3 [root@server-68.2.stage.polex.io var ]$ vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st In the output, the bi and bo values reflect current disk performance:\nbi: Blocks received per second from block devices. Block devices include all disks and other block devices on the system. The default block size is 1024 bytes. bo: Blocks sent per second to block devices. For example, reading files increases bo.\nGenerally, both bi and bo should be close to 0. Consistently high values indicate excessive I/O activity, requiring system adjustments. 5. 
uptime Command\r1 uptime The output displays:\nCurrent system time System uptime (duration since last reboot) Number of logged-in users Load averages for the last 1 minute, 5 minutes, and 15 minutes. If the load average values consistently exceed the number of CPUs in the system, it indicates high CPU load, which may degrade performance.\n6. TCP/IP Related Tools\r1) netstat Command\r1 netstat -an |grep tcp # View all active TCP connection details 2). Socket Statistics Command\rPreviously using the netstat command was found to be inefficient on busy servers, sometimes consuming over 90% of CPU.\nThe Socket Statistics (ss) command, however, operates at a lower level using the tcp_diag module in the TCP protocol stack for statistical analysis, making it faster and more efficient.\nCommon ss Commands:\nss -t: Displays all current TCP connections. ??? note \u0026ldquo;Details\u0026rdquo;\n- -t: Show TCP connection information only\n- -a: Display all connection information\n- -u: Show UDP connection information only\nWhile nearly all Linux systems include `netstat` by default, `ss` may not be pre-installed (CentOS includes it by default). The `ss` command is part of the `iproute` toolkit, a suite of tools for managing TCP/UDP/IP networks with IPv4/IPv6 support. If the `ss` command is missing, install the toolkit with: 1 yum install iproute iproute-doc 7. Disk I/O, Throughput, and Storage IOPS\rDisk I/O, Throughput, and Storage IOPS Performance Metrics\rCloud server disk storage performance metrics include Disk I/O, IOPS, and Throughput. Below is a detailed explanation of these terms and their relationships:\nStorage IOPS (Input/Output Operations Per Second): The number of read/write operations a disk can perform per second. Disk I/O: Refers to input (writing data to disk) and output (reading data from disk). The data volume per I/O request is measured in KiB (e.g., 4KiB, 256KiB, 1024KiB). 
Throughput: The total data transfer rate per second, combining read and write operations. Formula: Relationship Between IOPS, I/O Size, and Throughput\rThe relationship is defined as:\nThroughput = IOPS × I/O Size\nIn other words, higher IOPS and larger I/O sizes result in greater throughput. While higher IOPS and throughput values are generally desirable, they are constrained by hardware limits.\nFor further details on disk I/O performance for cloud servers, refer to Alibaba Cloud\u0026rsquo;s documentation on ECS storage performance at ecs6.com.\nCommon Linux Monitoring Commands\rfree\ndf\ntop / htop\nuptime\niftop\niostat\niotop\nvmstat\nnetstat\nnethogs (shows bandwidth used by each process)\n","date":"2021-06-23T00:00:00Z","permalink":"/en/p/centos7_system_optimization_guide/","title":"CentOS 7 System Optimization and Deployment Guide"},{"content":"Introduction\rOfficial repository address: https://github.com/neurobin/shc\nA generic shell script compiler. Shc takes a script specified in the command line and generates C source code. The generated source code is then compiled and linked to produce a stripped binary executable file.\nThe compiled binary still depends on the shell specified in the shebang line (i.e., #!/bin/sh) of the original shell script. Therefore, shc does not create fully independent binaries.\nShc itself is not a compiler like cc, but rather encodes and encrypts shell scripts while generating C source code with additional expiration functionality. It then uses the system compiler to build a stripped binary that behaves exactly like the original script. 
When executed, the compiled binary decrypts and runs the code using the shell -c option.\nInstallation\r1 2 3 yum install epel-release yum -y install gcc gcc-c++ libstdc++-devel yum -y install shc 1 2 3 4 [root@template mnt]# shc -v shc parse(-f): No source file specified shc Usage: shc [-e date] [-m addr] [-i iopt] [-x cmnd] [-l lopt] [-o outfile] [-rvDSUHCABh] -f script Testing\r1 2 shc -v -rf HelloWorld.sh ./HelloWorld.x ","date":"2020-10-12T00:00:00Z","permalink":"/en/p/shc_shell_script_compiler/","title":"Compiling Shell Scripts to Executable Binaries with SHC"},{"content":"Installing Kubernetes\rInstall kubelet, kubeadm, kubectl\rAdd Kubernetes repository 1 2 3 4 5 6 7 8 sudo apt-get install -y ca-certificates curl software-properties-common apt-transport-https curl curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add - sudo tee /etc/apt/sources.list.d/kubernetes.list \u0026lt;\u0026lt;EOF deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main EOF sudo apt-get update Install kubeadm, kubelet, kubectl 1 apt install -y kubelet=1.23.5-00 kubectl=1.23.5-00 kubeadm=1.23.5-00 Start kubelet service and enable it to start on boot 1 2 systemctl start kubelet systemctl enable kubelet Initialize Kubernetes cluster\r1 2 3 4 5 6 7 kubeadm init --kubernetes-version v1.23.5 \\ --apiserver-advertise-address 192.168.142.63 \\ --image-repository registry.aliyuncs.com/google_containers \\ --service-cidr 172.95.0.0/12 \\ --pod-network-cidr 172.245.0.0/16 \\ --upload-certs \\ --v=5 Install network plugin\rcalico-ipv4.yaml\n","date":"2020-05-31T00:00:00Z","permalink":"/en/p/k8s_installation_ubuntu_18_04_1_23_5/","title":"Complete Guide to Installing Kubernetes 1.23.5 Cluster on Ubuntu 18.04"}]