Here’s an interesting scenario that I was asked to look into recently: the root file system on an Oracle database server was filling up. Normally, cleaning up disk space is straightforward: find the large and/or old files and delete them. However, in this case there was a difference in the space usage reported by df and du, and the find utility could not locate any file over 1G in size.
Here’s the status of the root file system which was causing the “No Space Left on Device” error message.
# df -h .
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1   30G   30G     0 100% /
After deleting around 2G of old files and logs, the error went away, but the output of df -h showed the root file system slowly filling up again. The directory sizes, however, hardly changed at all, differing only by a few MB. From the “/” directory, here are all the directories that are on the “/” file system, as confirmed with df -h $dir:
# du -sh *
7.7M    bin
67M     boot
3.5M    dev
8.5M    etc
1.2G    home
440M    lib
28M     lib64
16K     lost+found
4.0K    media
1.2G    mnt
6.8G    opt
1.1G    root
41M     sbin
4.0K    selinux
4.0K    srv
0       sys
12M     tmp
3.0G    usr
260M    var
Notice that these directories add up to only about 14G, leaving the rest of the used space unaccounted for, and the file system’s used space was still increasing.
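Adding up a du listing by hand is error-prone, so here is a minimal sketch that totals it instead. The function name sum_du_sizes is my own, and it assumes du -sh style input (size in the first column with a K, M, G or T suffix); a bare number is treated as bytes, which is an assumption rather than documented du behaviour.

```shell
# Hypothetical helper: total the first column of `du -sh`-style output.
# Reads lines such as "440M lib" on stdin and prints the sum in GiB.
sum_du_sizes() {
    awk '
        {
            size  = $1
            unit  = substr(size, length(size), 1)   # last character: K, M, G, T or a digit
            value = size + 0                        # awk keeps the leading numeric part
            if      (unit == "K") value /= 1024 * 1024
            else if (unit == "M") value /= 1024
            else if (unit == "G") value *= 1
            else if (unit == "T") value *= 1024
            else                  value /= 1024 * 1024 * 1024   # assumption: bare number is bytes
            total += value
        }
        END { printf "%.1fG\n", total }
    '
}
```

It can be fed directly from the command used in this post, for example du -sh * | sum_du_sizes, and the result compared with the Used column of df -h for the same file system.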
The next step was to look at open files. Even if a file is deleted, its space is not reclaimed as long as a process that has it open, whether the one that created it or another one using it, is still running. The lsof (list open files) utility will show these files.
# lsof | grep deleted
...
expdp 7271 oracle 1w REG 253,0 16784060416 2475867 /home/oracle/nohup.out (deleted)
expdp 7271 oracle 2w REG 253,0 16784060416 2475867 /home/oracle/nohup.out (deleted)
…
#
# ps -ef | grep 7271
oracle 7271 1 99 May31 ? 3-10:43:36 expdp directory=DP_DIR dumpfile=exp_schema.dmp logfile=exp_schema.log schemas=schema
The lsof and ps output shows an export data pump job (pid = 7271) whose process was still running at the OS level, although it was not running in the database. The job was probably canceled for some reason but never cleaned up, and the nohup.out file was deleted while the process still had it open. With that background process still running, the deleted nohup.out file (16784060416 bytes, roughly 15.6G) kept taking up space and filling up the “/” partition. It is worth mentioning here that the use of nohup is NOT desired with data pump. The data pump utilities are server-side processes; if you kick off a job and then lose your terminal for whatever reason, the data pump job keeps running.
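When lsof lists many deleted files, it helps to total them per process. The sketch below does that with awk. The function name deleted_space_by_pid is hypothetical, and the column positions (PID in field 2, size in field 7, inode in field 8) assume the same lsof layout as the listing in this post; if your lsof prints an extra column such as a thread ID, the field numbers need adjusting. Because the same file is often open on more than one descriptor (1w and 2w here), each PID and inode pair is counted only once.

```shell
# Hypothetical helper: sum the size of deleted-but-open files per process.
# Expects lsof output on stdin in the layout:
#   COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
deleted_space_by_pid() {
    awk '
        /\(deleted\)/ {
            key = $2 SUBSEP $8          # PID + inode identifies one open file
            if (key in seen) next       # skip further descriptors on the same file
            seen[key] = 1
            bytes[$2] += $7
            cmd[$2] = $1
        }
        END {
            for (pid in bytes)
                printf "%s %s %.1f GiB\n", pid, cmd[pid], bytes[pid] / (1024 * 1024 * 1024)
        }
    '
}
```

Typical use would be lsof | deleted_space_by_pid, which prints one line per process with the PID, the command name and the space it is still holding.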
Once the expdp process 7271 was killed at the OS level, the space was reclaimed.
# df -h .
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1   30G   13G   16G  45% /
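The behaviour behind this whole episode can be reproduced safely with a temporary file. This is only an illustrative sketch: the function name demo_deleted_open_file and the use of descriptor 3 are my own choices. It opens a file, deletes its name, and shows that the process can keep writing to it, which is why the space stays allocated until the descriptor is closed or the process exits.

```shell
# Hypothetical demo: a deleted file stays alive while a descriptor is open.
demo_deleted_open_file() {
    f=$(mktemp) || return 1
    exec 3>"$f"                  # open the file for writing on descriptor 3
    echo "first line" >&3
    rm -f "$f"                   # remove the name; the data blocks remain allocated

    if [ -e "$f" ]; then
        echo "path still exists"
    else
        echo "path removed"
    fi

    # Writes still succeed because the descriptor refers to the open file, not the name.
    if echo "second line" >&3; then
        echo "descriptor still writable"
    fi

    exec 3>&-                    # closing the last descriptor lets the file system free the space
}
```

Running demo_deleted_open_file in a scratch shell mirrors what happened here on a small scale: du no longer sees the file because it has no name, while df keeps counting its blocks until the holder lets go.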