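First, a note on this PVE server's disk layout: it has two 2 TB mechanical hard drives. One holds the PVE system and provides disk space for the virtual machines; the other is dedicated to scheduled automatic VM backups (ZSTD-compressed, keeping the last 3 backups). One of the VMs, vm-100 (KVM, though I suspect the problem recorded here has nothing to do with the virtualization type), has two virtual disks: a 200 GB one that the guest sees as /dev/sda1 and a 1 TB one that it sees as /dev/sdb1. The 200 GB disk holds the operating system (Debian 10), and the 1 TB disk stores data for several services.
At first, one service kept its data on the main system partition on the 200 GB virtual disk. I hadn't anticipated how quickly that service's usage would grow, so the 200 GB system disk filled up fast. I found some time to migrate the service's data directory, moving it unchanged to a path mounted on the 1 TB disk. (I got it wrong the first time: I ran cp without preserving file permissions, so the service failed to start after the migration, and it only started cleanly after I redid the copy with cp -a.) The service came back up normally, but the next PVE automatic backup job failed. The PVE log showed the following: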
100: 2026-05-12 04:00:00 INFO: Starting Backup of VM 100 (qemu)
100: 2026-05-12 04:00:00 INFO: status = running
100: 2026-05-12 04:00:00 INFO: VM Name: xh-data-server
100: 2026-05-12 04:00:00 INFO: include disk 'scsi0' 'local-lvm:vm-100-disk-0' 203G
100: 2026-05-12 04:00:00 INFO: include disk 'scsi1' 'local-lvm:vm-100-disk-1' 1T
100: 2026-05-12 04:00:00 INFO: backup mode: snapshot
100: 2026-05-12 04:00:00 INFO: ionice priority: 7
100: 2026-05-12 04:00:00 INFO: creating vzdump archive '/mnt/sdb1/dump/vzdump-qemu-100-2026_05_12-04_00_00.vma.zst'
100: 2026-05-12 04:00:00 INFO: issuing guest-agent 'fs-freeze' command
100: 2026-05-12 04:00:01 INFO: issuing guest-agent 'fs-thaw' command
100: 2026-05-12 04:00:01 INFO: started backup task '1322dd3c-8aa3-4f07-bcf8-337628701d4e'
100: 2026-05-12 04:00:01 INFO: resuming VM again
100: 2026-05-12 04:00:04 INFO: 0% (561.1 MiB of 1.2 TiB) in 3s, read: 187.0 MiB/s, write: 141.6 MiB/s
100: 2026-05-12 04:03:06 INFO: 1% (12.3 GiB of 1.2 TiB) in 3m 5s, read: 66.1 MiB/s, write: 61.7 MiB/s
100: 2026-05-12 04:04:52 INFO: 2% (24.6 GiB of 1.2 TiB) in 4m 51s, read: 118.7 MiB/s, write: 110.7 MiB/s
100: 2026-05-12 04:06:39 INFO: 3% (36.9 GiB of 1.2 TiB) in 6m 38s, read: 117.6 MiB/s, write: 108.9 MiB/s
100: 2026-05-12 04:08:32 INFO: 4% (49.1 GiB of 1.2 TiB) in 8m 31s, read: 110.7 MiB/s, write: 103.7 MiB/s
100: 2026-05-12 04:10:25 INFO: 5% (61.4 GiB of 1.2 TiB) in 10m 24s, read: 111.3 MiB/s, write: 104.5 MiB/s
100: 2026-05-12 04:12:17 INFO: 6% (73.6 GiB of 1.2 TiB) in 12m 16s, read: 112.2 MiB/s, write: 105.4 MiB/s
100: 2026-05-12 04:14:09 INFO: 7% (86.0 GiB of 1.2 TiB) in 14m 8s, read: 112.6 MiB/s, write: 105.4 MiB/s
100: 2026-05-12 04:16:00 INFO: 8% (98.2 GiB of 1.2 TiB) in 15m 59s, read: 113.3 MiB/s, write: 104.2 MiB/s
100: 2026-05-12 04:17:51 INFO: 9% (110.5 GiB of 1.2 TiB) in 17m 50s, read: 112.6 MiB/s, write: 105.4 MiB/s
100: 2026-05-12 04:19:43 INFO: 10% (122.7 GiB of 1.2 TiB) in 19m 42s, read: 112.3 MiB/s, write: 105.4 MiB/s
100: 2026-05-12 04:21:36 INFO: 11% (135.0 GiB of 1.2 TiB) in 21m 35s, read: 111.3 MiB/s, write: 104.4 MiB/s
100: 2026-05-12 04:23:27 INFO: 12% (147.2 GiB of 1.2 TiB) in 23m 26s, read: 112.7 MiB/s, write: 105.7 MiB/s
100: 2026-05-12 04:25:20 INFO: 13% (159.6 GiB of 1.2 TiB) in 25m 19s, read: 112.1 MiB/s, write: 105.3 MiB/s
100: 2026-05-12 04:27:13 INFO: 14% (171.8 GiB of 1.2 TiB) in 27m 12s, read: 110.6 MiB/s, write: 103.5 MiB/s
100: 2026-05-12 04:28:46 INFO: 15% (184.2 GiB of 1.2 TiB) in 28m 45s, read: 136.2 MiB/s, write: 124.4 MiB/s
100: 2026-05-12 04:30:35 INFO: 16% (196.4 GiB of 1.2 TiB) in 30m 34s, read: 114.6 MiB/s, write: 106.8 MiB/s
100: 2026-05-12 04:32:29 INFO: 17% (208.6 GiB of 1.2 TiB) in 32m 28s, read: 110.1 MiB/s, write: 103.4 MiB/s
100: 2026-05-12 04:35:02 INFO: 18% (220.9 GiB of 1.2 TiB) in 35m 1s, read: 82.1 MiB/s, write: 77.1 MiB/s
100: 2026-05-12 04:38:26 INFO: 19% (233.2 GiB of 1.2 TiB) in 38m 25s, read: 61.6 MiB/s, write: 57.8 MiB/s
100: 2026-05-12 04:41:12 INFO: 20% (245.5 GiB of 1.2 TiB) in 41m 11s, read: 75.8 MiB/s, write: 70.8 MiB/s
100: 2026-05-12 04:43:36 INFO: 21% (257.7 GiB of 1.2 TiB) in 43m 35s, read: 87.0 MiB/s, write: 81.1 MiB/s
100: 2026-05-12 04:45:52 INFO: 22% (269.9 GiB of 1.2 TiB) in 45m 51s, read: 92.2 MiB/s, write: 86.4 MiB/s
100: 2026-05-12 04:47:57 INFO: 23% (282.3 GiB of 1.2 TiB) in 47m 56s, read: 101.1 MiB/s, write: 93.7 MiB/s
100: 2026-05-12 04:49:55 INFO: 24% (294.5 GiB of 1.2 TiB) in 49m 54s, read: 106.1 MiB/s, write: 99.5 MiB/s
100: 2026-05-12 04:52:02 INFO: 25% (306.8 GiB of 1.2 TiB) in 52m 1s, read: 99.2 MiB/s, write: 93.0 MiB/s
100: 2026-05-12 04:53:03 INFO: 25% (310.8 GiB of 1.2 TiB) in 53m 2s, read: 66.8 MiB/s, write: 62.6 MiB/s
100: 2026-05-12 04:53:03 ERROR: vma_queue_write: write error - Broken pipe
100: 2026-05-12 04:53:03 INFO: aborting backup job
100: 2026-05-12 04:53:03 INFO: resuming VM again
100: 2026-05-12 04:53:04 ERROR: Backup of VM 100 failed - vma_queue_write: write error - Broken pipe
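The output above comes from the backup notification email. Digging through the system log, I found a disk-full error, and the 2 TB backup disk had indeed run out of space. (This is how PVE's retention works: if you keep the last 3 backups, it first writes a 4th and deletes the oldest one only after that succeeds, so at peak 4 backups coexist.) Looking at earlier backups, the .zst files had grown to more than three times their size from before I touched the service's data directory: a backup used to produce a file of about 200 GB, and afterwards it produced about 700 GB and took noticeably longer. That was baffling, because I had deleted all the leftover files once I was done, and df -hT inside the VM showed only about 200 GB used across both partitions. So why did the backup see more than three times as much data?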
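The retention arithmetic above can be sketched in a few lines of shell. This is only a rough illustration: it assumes every archive is about the same size, treats the 2T disk as 2000 GB, and ignores filesystem overhead; peak_backup_gb is a hypothetical helper name, not a PVE tool.

```shell
#!/bin/sh
# Peak space needed on the backup disk when "keep last N" is configured:
# the new archive is written first, then the oldest is pruned, so N + 1
# archives exist at the same time. Assumes all archives have similar size.
peak_backup_gb() {
    keep_last=$1
    archive_gb=$2
    echo $(( ($keep_last + 1) * $archive_gb ))
}

disk_gb=2000   # nominal capacity of the 2T backup disk (assumption)

# Compare the archive size before (about 200G) and after (about 700G) the migration.
for archive_gb in 200 700; do
    peak=$(peak_backup_gb 3 "$archive_gb")
    if [ "$peak" -le "$disk_gb" ]; then
        echo "archive ${archive_gb}G: peak ${peak}G fits on ${disk_gb}G"
    else
        echo "archive ${archive_gb}G: peak ${peak}G exceeds ${disk_gb}G"
    fi
done
```

With archives of about 200 GB the peak stays well under the disk size, while archives of about 700 GB cannot fit four at a time, which matches the disk-full error.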
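After carefully checking that I hadn't left stray files behind by mistake, I started to wonder whether PVE disks have something like a buffer pool, and whether my repeated copying had inflated that pool and made the backup files balloon.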
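So I described the symptoms to GPT and, following its suggestion, ran lvs on the PVE host. Sure enough, the Data% of the virtual disks allocated to vm-100 (I'm not sure that's the right term, but it doesn't matter here) was abnormally high and clearly did not match the usage that df -hT reported inside the VM. This fits how thin-provisioned volumes behave: blocks written during the copies stay allocated after the files are deleted, until the guest discards them. I then followed the next set of instructions to fix this: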
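For reference, a minimal sketch of that check. It assumes the default volume group name pve (adjust it if yours differs) and that the command runs as root on the PVE host; allocated_gib is a hypothetical helper that turns Data% into an absolute figure that is easier to compare with df -hT in the guest.

```shell
#!/bin/sh
# Read rows of "name size_in_GiB data_percent" and print the allocated GiB
# per volume. Rows without a Data% value (non-thin volumes) are skipped.
allocated_gib() {
    awk 'NF >= 3 { printf "%s %.1f\n", $1, $2 * $3 / 100 }'
}

# Only meaningful on the PVE host. "pve" is the default volume group name
# and is an assumption here.
if command -v lvs >/dev/null 2>&1; then
    lvs --noheadings --units g --nosuffix \
        -o lv_name,lv_size,data_percent pve | allocated_gib
fi
```

If the figure printed for vm-100-disk-0 or vm-100-disk-1 is far above what the guest reports as used, the volume is holding blocks that the guest has already freed.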
First, confirm that the disks in the VM's hardware configuration have discard=on enabled (shown in a screenshot in the original post):

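If it isn't enabled, you need to turn it on manually. Mine had never been enabled, and this is where GPT led me astray once. I first ran the commands it gave me on the PVE host: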
qm set 100 -scsi0 local-lvm:vm-100-disk-0,discard=on
qm set 100 -scsi1 local-lvm:vm-100-disk-1,discard=on
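After running them, I went straight into the VM as instructed and ran "fstrim -av" to trim the disks. The output looked right, something like: XXX GiB trimmed on /dev/sda1, XXX GiB trimmed on /dev/sdb1. But lvs on the PVE host showed that Data% had not dropped, and a manually triggered backup still produced a very large .zst file. When I pressed GPT further it gave some hallucinated answers, which sent me on a few detours. In the end the problem turned out to be that, after adding discard=on, the VM has to be fully shut down and started again for the new setting to take effect (a reboot from inside the guest is not enough)! The setting can also be changed in the PVE web UI, and whichever way you change it, the management page shows the change in yellow text until the VM has gone through a full stop and start, indicating that the modified setting is not yet in effect.
Once I had fully restarted the VM so that discard=on took effect on both disks, I ran fstrim -av again and then checked lvs on the PVE host: Data% had shrunk to match the space actually in use, and subsequent automatic backups produced .zst files of a reasonable size again, about 200 GB.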
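Putting the working order of operations together, a hedged sketch follows. The VM ID 100 and the storage and disk names come from this post; run, apply_discard and DRY_RUN are hypothetical helpers added for illustration, and the qm pending step is my own addition as a command-line counterpart to the yellow text in the web UI. The script only prints the commands unless DRY_RUN=0 is set on the PVE host.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the PVE host to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_discard() {
    vmid=$1
    # Enable discard on both disks (same commands as in the post).
    run qm set "$vmid" -scsi0 "local-lvm:vm-${vmid}-disk-0,discard=on"
    run qm set "$vmid" -scsi1 "local-lvm:vm-${vmid}-disk-1,discard=on"
    # List configuration changes that are still pending.
    run qm pending "$vmid"
    # Full stop and start; a reboot from inside the guest is not enough.
    run qm shutdown "$vmid"
    run qm start "$vmid"
    # Next, inside the guest once it has booted:  fstrim -av
    # Back on the host, Data% should now reflect the real usage.
    run lvs
}

apply_discard 100
```

Keeping the trim as a manual step inside the guest mirrors what I actually did; the important part is that the stop and start come before fstrim, not after.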