I have the following setup:
- Windows 8.1 32-bit
- Drive 0: system drive, SSD, NTFS, mounted at C:\
- Drive 1: data drive, magnetic HDD, NTFS, mounted at C:\Users\Database User\Documents and additionally at Z:\
In a sub-sub-directory of C:\Users\Database User\Documents I have about 50 000 files, averaging about 2 KB each, spread over about 10 subdirectories (a bcolz column database).
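For anyone trying to reproduce this, a test tree of roughly the same shape can be generated with a short Python script. This is a minimal sketch: the function name make_small_files, the subdirectory and file names, and the random 2 KB payload are hypothetical stand-ins, not the real bcolz on-disk layout.

```python
import os
import tempfile


def make_small_files(root, n_dirs=10, files_per_dir=5000, size=2048):
    """Create n_dirs subdirectories under root, each holding files_per_dir
    files of exactly `size` bytes. Returns the number of files written.

    Hypothetical layout that only mimics the shape described above
    (about 50 000 files of about 2 KB in about 10 subdirectories).
    """
    payload = os.urandom(size)  # same random bytes reused for every file
    count = 0
    for d in range(n_dirs):
        sub = os.path.join(root, f"col{d:02d}")
        os.makedirs(sub, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(sub, f"chunk{f:05d}.bin"), "wb") as fh:
                fh.write(payload)
            count += 1
    return count
```

With the defaults, make_small_files(r"C:\Users\Database User\Documents\abc\def\mydata.bcolz") would write 10 x 5000 = 50 000 files below the junction.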
With cross-drive NTFS junction points I see huge performance differences depending on whether a process's file I/O targets its working directory (or a subdirectory of it) or any other directory.
Below the NTFS junction, acceptable performance is achieved only in the process's working directory or a subdirectory of it:
- Working directory C:\Users\Database User\Documents\abc\def: executing rmdir /Q /S mydata.bcolz is an I/O-bound (disk-bound) operation.
- Working directory C:\Users\Database User\Documents\abc: executing rmdir /Q /S def\mydata.bcolz is an I/O-bound (disk-bound) operation.
- Working directory C:\Users\Database User\Documents\abc\def\xyz: executing rmdir /Q /S ..\mydata.bcolz is a CPU-bound operation.
In the first two cases, the cmd.exe process consumes hardly any CPU time, while in the third it consumes 100% of one core. The operation is identical in all three cases; only the working directory differs.
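One way to check whether the effect is tied to cmd.exe is to approximate the three cases from Python: change the working directory, delete the same tree through a relative path, and compare the process's CPU time with wall-clock time. This is a sketch under assumptions: timed_rmtree and classify are hypothetical helpers, shutil.rmtree stands in for rmdir /Q /S, and the 0.5 CPU/wall threshold is an arbitrary choice. time.process_time counts user plus system CPU time of the current process only.

```python
import os
import shutil
import time


def timed_rmtree(workdir, relative_target):
    """Change into workdir, delete relative_target (a path relative to
    workdir), restore the previous working directory, and return
    (wall_seconds, cpu_seconds) for the deletion alone."""
    old_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        wall0, cpu0 = time.perf_counter(), time.process_time()
        shutil.rmtree(relative_target)
        wall1, cpu1 = time.perf_counter(), time.process_time()
    finally:
        os.chdir(old_cwd)  # always restore, even if the deletion fails
    return wall1 - wall0, cpu1 - cpu0


def classify(wall, cpu, threshold=0.5):
    """Label a run by its CPU/wall ratio: a ratio near 1 means the process
    kept a core busy, a ratio near 0 means it mostly waited on the disk.
    The threshold is an arbitrary, hypothetical cut-off."""
    if wall > 0 and cpu / wall >= threshold:
        return "CPU bound"
    return "I/O bound"
```

For the third case this would be called as timed_rmtree(r"C:\Users\Database User\Documents\abc\def\xyz", r"..\mydata.bcolz") and the result passed to classify; the tree has to be regenerated before each run, since it is deleted.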
But note:
- Working directory Z:\abc\def\xyz: executing rmdir /Q /S ..\mydata.bcolz is again an I/O-bound operation!
This phenomenon occurs with any rapid file I/O on a very large number of very small files; it is not limited to rmdir or cmd.exe. The example above is only for illustration.
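Since the slowdown is not specific to deletion, a read-only workload can be timed the same way. The sketch below is hypothetical (timed_read_all is an illustrative name, not part of bcolz): it walks a tree through a path relative to a chosen working directory, reads every file, and reports file count, byte count, wall time and CPU time. Repeated runs may be served from the file cache, so only the first run after a reboot or cache flush is likely to say much about the disk.

```python
import os
import time


def timed_read_all(workdir, relative_root):
    """Change into workdir, read every file below relative_root (relative
    to workdir), restore the previous working directory, and return
    (n_files, n_bytes, wall_seconds, cpu_seconds)."""
    old_cwd = os.getcwd()
    os.chdir(workdir)
    n_files = n_bytes = 0
    try:
        wall0, cpu0 = time.perf_counter(), time.process_time()
        for dirpath, _dirnames, filenames in os.walk(relative_root):
            for name in filenames:
                with open(os.path.join(dirpath, name), "rb") as fh:
                    n_bytes += len(fh.read())
                n_files += 1
        wall1, cpu1 = time.perf_counter(), time.process_time()
    finally:
        os.chdir(old_cwd)
    return n_files, n_bytes, wall1 - wall0, cpu1 - cpu0
```

Running it once from C:\Users\Database User\Documents\abc\def\xyz with ..\mydata.bcolz and once from Z:\abc\def\xyz with the same relative path would compare the junction path with the drive-letter path for a workload that does not modify the tree.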
Any idea what is going on and how to fix it?