I think I ran across a bug in a batch file because it was written with Unix line endings. Is this a known problem with cmd.exe running batch files on Windows?
6 Answers
This isn't really a "bug"; it is by design. Windows line endings are defined as "\r\n", a carriage return followed by a line feed, whereas *nix flavors omit the carriage return. You should use "\r\n" wherever possible on Windows. Anything else may be interpreted incorrectly and cause unexpected results.
For batch files there is no big difference between Unix line endings and Windows line endings.
Currently, the only known glitch concerns the behavior of GOTO and CALL while searching for a label. The label parser fails to find a label if the label is on a 512-byte boundary relative to the current file position. The difference arises because the carriage return is used for an internal line reset. See "Call and goto may fail when the batch file has Unix line endings".
Apart from the label scanner, further problems are not expected, because the batch parser removes all carriage returns directly after the percent expansion phase.
Sample to show the problems:
@echo off
goto :main
:func
echo ************************************************
echo ************************************************
echo ************************************************
echo ************************************************
echo ************************************************
echo ***********************************************
echo ***********************************************
echo ***********************************************
echo ************************************************
echo never go back to :main
echo This is the end of :func
exit /b
:main
:main
echo This is main
goto :func
exit /b
The output is
This is the end of :func
Kinda, but...
You will have to be paranoid and
- always duplicate your label lines!
- never use colons outside of label definitions!
E.g.:
@echo off
goto main <- No use of colon
...
:main
:main <- Repeat label
echo At least one of the above labels are discoverable
REM main part done <- No use of colons in comments
How it works
When using Unix line endings, the label parser will skip over some labels because of an off-by-one error. This is due to the parser's use of 512-byte chunks and its assumption that a line ending is denoted by two characters, \r\n, rather than one, \n. When a label is erroneously skipped over, the next erroneous skip can only occur at an offset of 512 bytes. If you duplicate a label on the next line, the duplicated label will be within the 512-byte limit and can act as a fallback.
Furthermore, as demonstrated by @jeb, the parser also misinterprets the end of each 512-byte chunk as a new line (and somehow ignores white space characters between a colon and the next text on these pseudo lines). A comment such as :: main section can cause the parser to read the text : main as the label :main.
In summary, not only can the parser skip labels, it can also misinterpret comments and other pieces of text as labels.
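To see where labels fall relative to those 512-byte chunks, a small diagnostic can list each label's byte offset. This is a minimal Python sketch and not a model of the cmd.exe parser: the function name `label_offsets` is hypothetical, and treating "the label line crosses a multiple of 512" as a warning sign is an assumption based on the description above.

```python
CHUNK = 512  # chunk size described above; how cmd.exe uses it exactly is an assumption


def label_offsets(data: bytes):
    """Return (line_number, byte_offset, label, crosses_boundary) for each label line.

    A label line is taken to be a line whose first non-blank character is a
    single colon; lines starting with "::" are treated as comments.
    """
    results = []
    offset = 0
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        stripped = line.strip()
        if stripped.startswith(b":") and not stripped.startswith(b"::"):
            start = offset
            end = offset + len(line)  # offset of the terminating LF
            crosses = start // CHUNK != end // CHUNK
            results.append((lineno, start, stripped.decode("utf-8", "replace"), crosses))
        offset += len(line) + 1  # +1 for the LF consumed by split()
    return results
```

Read the script in binary mode (`open(path, "rb").read()`) so the offsets are real byte offsets. A flagged label is only a hint that it may be worth duplicating or moving.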
Why not just stick with DOS line endings?
It's not always safe to assume that your batch script will retain its line endings, especially when using Git or sharing content over GitHub. It's also convenient for cross-platform projects not to care about line endings and to assume Unix line endings as a common denominator.
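If line endings can change on their way through Git or a download, one option is to check for the problem before a script is shipped or run. The following is a minimal Python sketch of such a check, not something from the answers here; the names `has_bare_lf` and `find_lf_batch_files` are hypothetical, and the choice of the .bat and .cmd extensions is an assumption.

```python
import os
import re

# A line feed that is not preceded by a carriage return.
BARE_LF = re.compile(rb"(?<!\r)\n")


def has_bare_lf(data: bytes) -> bool:
    """Return True if the data contains at least one Unix-style (LF-only) line ending."""
    return BARE_LF.search(data) is not None


def find_lf_batch_files(root: str):
    """Return a sorted list of .bat/.cmd files under root that contain bare LF endings."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".bat", ".cmd")):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fh:
                    if has_bare_lf(fh.read()):
                        hits.append(path)
    return sorted(hits)
```

A build or CI step could call `find_lf_batch_files(".")` and fail if the list is not empty.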
The answer is: you may get "lucky" and it will work with LF, but don't count on it. We had the same problem as the original poster. Our process would end up with .bat files that had only LF endings, and sometimes (sorry, I could not find a pattern) a label would be 'not found' even though it was clearly there. We would have to convert to CRLF, or make random changes until it worked!
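The conversion to CRLF mentioned above can be done outside of cmd.exe. This is a minimal Python sketch under the assumption that only bare LF characters need fixing; `to_crlf` and `convert_file` are hypothetical names, and the file is rewritten in place without a backup.

```python
import re


def to_crlf(data: bytes) -> bytes:
    """Replace every LF that is not already preceded by CR with CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_file(path: str) -> bool:
    """Rewrite the file with CRLF endings; return True if anything changed."""
    with open(path, "rb") as fh:
        original = fh.read()
    converted = to_crlf(original)
    if converted == original:
        return False
    with open(path, "wb") as fh:
        fh.write(converted)
    return True
```

Working on bytes rather than text keeps any non-ASCII characters in the script untouched, and running the conversion twice leaves the file unchanged.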
I have a GitHub project that is a single, pure batch file. When downloaded from GitHub using the "raw" feature, it serves the file with LF line endings regardless of how I save the file. I've done my own substantial research to come up with an entirely new solution.
None of the answers, including the accepted answer, are correct. While saving the file with the proper line endings is always a good idea, there are clearly some scenarios where this won't hold up. The previously suggested solutions do not fully address the issue, are misleading, or are just wrong. Here's why.
Understanding the problem
To follow along, edit your "test.bat" file in an editor like Notepad++ where you easily have the ability to change the line endings. (Notepad++ -> Edit -> EOL Conversion)
I'll start by deliberately trying to make a batch script fail with LF while working perfectly fine with CRLF. If the issue with line endings were related solely to labels and double-colon comments, you would think that a batch file like this would break, but it doesn't necessarily.
@echo off
goto :main
:main
echo Hello from main
Now, I will show you a slightly modified version of the script that contains a UTF-8 character.
@echo off
goto :main
:main
echo ═ Hello from main
You're going to notice right away that this script works perfectly fine with CRLF and for most of us it breaks completely with LF. When using CRLF, some of you will see something like
ÔòÉ Hello from main
whereas some of you will see the expected UTF-8 character
═ Hello from main
There are two things to explain here. First, different people will see different things because not everybody uses the same default code page.
In a fresh command prompt, type
chcp
You will see a response showing your current code page. My own default code page is 850.
Active code page: 850
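The garbled text shown earlier is what the UTF-8 bytes of the character look like when they are read as code page 850. A minimal Python sketch to reproduce this outside of cmd.exe (the helper name `reinterpret` is hypothetical, and Python's `cp850` codec is assumed to match the console's code page 850):

```python
def reinterpret(text: str, codepage: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were in codepage."""
    return text.encode("utf-8").decode(codepage)


BOX = "\u2550"  # the box-drawing character used in the example script

utf8_bytes = BOX.encode("utf-8")      # three bytes: e2 95 90
as_cp850 = reinterpret(BOX, "cp850")  # the three characters seen under code page 850
as_utf8 = reinterpret(BOX, "utf-8")   # the original character, as seen under 65001
```

Here `as_cp850` is the three-character string ÔòÉ, one character per byte, which matches the output above.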
Now, for all of us to be able to see the same thing, you'll want your batch script to set the code page to UTF-8 using code page 65001.
@echo off
chcp 65001 >nul
goto :main
:main
echo ═ Hello from main
Secondly, when LF is used with UTF-8 characters, the parser misaligns and causes issues. After doing my own research, I discovered it's related to how cmd.exe actually runs a .bat file. It doesn't read it line by line from disk as one might imagine; instead it parses it in 512-byte chunks, and each chunk is scanned to find lines, labels, and commands. It expects each line to end with CRLF and each character to be a single byte. When a multi-byte UTF-8 character is added to the script, it seems as though the parser misaligns the line endings or isn't able to count the line length properly. Whatever the exact issue is, the UTF-8 character is breaking the script.
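To find the characters that could cause this kind of offset, it helps to list every multi-byte character in a script together with its byte position. This is a minimal Python sketch; `multibyte_chars` is a hypothetical name, and it assumes the file is valid UTF-8 (decoding raises an error otherwise).

```python
def multibyte_chars(data: bytes):
    """Return (line_number, byte_offset, character, byte_length) for each non-ASCII character."""
    text = data.decode("utf-8")
    found = []
    offset = 0
    lineno = 1
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 for ASCII, 2 to 4 otherwise
        if size > 1:
            found.append((lineno, offset, ch, size))
        if ch == "\n":
            lineno += 1
        offset += size
    return found
```

For a script with one box-drawing character, the byte length of the file is two larger than its character count, which is the kind of difference a byte-oriented parser would not account for.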
My solution is to use the failure itself as a poison to be able to determine the line endings of the file without any additional calls. And I was successful. After some experimentation, I was able to write a minimal script that can use this poison to determine the difference.
@echo off
chcp 65001 >nul
goto :main
cls
echo This script was saved with Unix style line endings.
exit /b 1
:main
:: ═
cls
echo This script was saved with Windows style line endings.
One thing to note: the spacing absolutely matters. If you remove one of the spaces in front of the indented lines, the script will break. Otherwise, if written properly, it reliably determines the line endings of the file. I also tried tabs and new lines instead of spaces. Use a single tab, and it will break. Use two tabs, and it runs just like the spaces would. The following works exactly the same as the above:
@echo off
chcp 65001 >nul
goto :main
cls
echo This script was saved with Unix style line endings.
exit /b 1
:main
:: ═
echo This script was saved with Windows style line endings.
Now here's the surprising part: we're going to eliminate the need for that spacing by simply moving the UTF-8 character elsewhere.
@echo off
chcp 65001 >nul
goto :main
:: ═
cls
echo This script was saved with Unix style line endings.
exit /b 1
:main
echo This script was saved with Windows style line endings.
To explain what is going on here: the multi-byte UTF-8 character offsets the buffer, pushing it to the left after the first newline character.
@echo off
chcp 65001 >nul
goto :example
:: ═
echo First echo
:example
echo Second Echo
The output of this script will look like this:
'cp' is not recognized as an internal or external command, operable program or batch file.
'oto' is not recognized as an internal or external command, operable program or batch file.
'═' is not recognized as an internal or external command, operable program or batch file.
First echo
Second Echo
The first thing you might notice is that you see both echoes: the goto was ignored, and the commands are misaligned by lengths of 2, 3, 1. Yet echo was still turned off without any issue.
This is the same script with some more echoes.
@echo off
chcp 65001 >nul
echo.
echo.
echo.
goto :example
:: ═
echo First echo
:example
echo Second Echo
Your output should look something like this, with a misalignment of 2, 3, 3, 3, 3, 1.
'cp' is not recognized as an internal or external command, operable program or batch file.
'ho.' is not recognized as an internal or external command, operable program or batch file.
'ho.' is not recognized as an internal or external command, operable program or batch file.
'ho.' is not recognized as an internal or external command, operable program or batch file.
'oto' is not recognized as an internal or external command, operable program or batch file.
'═' is not recognized as an internal or external command, operable program or batch file.
First echo
Second Echo
And now I will show you that we can control this offset by adding a blank line between these echoes.
@echo off
chcp 65001 >nul
echo.
echo.
echo.
goto :example
:: ═
echo First echo
:example
echo Second Echo
Your output will look like this, with a misalignment of 2, 3, 3, 4, 3, 1.
'cp' is not recognized as an internal or external command, operable program or batch file.
'ho.' is not recognized as an internal or external command, operable program or batch file.
'ho.' is not recognized as an internal or external command, operable program or batch file.
'cho.' is not recognized as an internal or external command, operable program or batch file.
'oto' is not recognized as an internal or external command, operable program or batch file.
'═' is not recognized as an internal or external command, operable program or batch file.
First echo
Second Echo
Okay, now this is where things start to get even more interesting. To get around most of these errors, we're going to use the ampersand to string several of our initial commands together.
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
:: ═
:eol_verify
echo Second Echo
Now your output is going to look like this:
'y' is not recognized as an internal or external command, operable program or batch file.
Second Echo
There are going to be two ways to get around this particular issue.
Our first option is to match the offset by adding a whitespace character after the label. Even if you "double up" the label, you're going to get the exact same output. The double label has zero impact on the script whatsoever, but the single whitespace character makes a difference here because it matches the length of the offset.
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
:: ═
:eol_verify
echo Second Echo
The next option is to add a block of single-byte (ASCII) characters to pad the buffer, forcing our eol_verify label into its own buffer block, unaffected by the UTF-8 character above. We want to use enough bytes to ensure we get an entirely new boundary that resets the offset created by the UTF-8 character(s).
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
:: ═
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: --------------BUFFER-BLOCK--------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:eol_verify
echo Second Echo
With either of these two options, your output should look like this:
Second Echo
What is quite clear is that we have skipped over the main label and reached the eol_verify label without any errors.
You don't need to run this next block of code, but if you did, you would immediately see an infinite loop. The top of the code is still affected by the buffer offset caused by the UTF-8 character that precedes the buffer block, so the exit command is skipped and cmd.exe reports "'it' is not recognized as an internal or external command, operable program or batch file."
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
exit /b
:: ═
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: --------------BUFFER-BLOCK--------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:eol_verify
echo Second Echo
goto main
Yet again, a simple solution would be to put the echo and the exit into a single line.
:main
echo Hello world! & exit /b
Utilizing the solution
Now that we understand how this all works, we're going to take full advantage of what we know. We want to deliberately have our script trigger the error, and we want to customize that error to reduce any potential issues.
The most notable issue is this: if the command named in an error such as "'it' is not recognized as an internal or external command, operable program or batch file." actually exists, for example as it.bat in your current directory, you're not going to see that error. The offset is instead going to cause your script to run that file. Therefore, we want our error to name an impossible file.
We want to trigger an error where the unrecognizable command, program or batch file contains a question mark, as a filename on Windows could never possibly contain a question mark. We also need to take advantage of what we know about the offsets to make a set of commands that will successfully pass when using CRLF and fail when using LF.
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
exit /b
:: ═
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: --------------BUFFER-BLOCK--------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:eol_verify
rem ═
:═?
echo Second Echo
Now the LF version of the script will show:
'��?' is not recognized as an internal or external command, operable program or batch file.
Second Echo
Whereas the CRLF version will just show the second echo.
Now we can track the error in the LF version specifically by looking for the 9009 (MSG_DIR_BAD_COMMAND_OR_FILE) error and handle it accordingly. This is the least you're going to need to be able to stop a user from running the script with the wrong line endings.
@echo off & chcp 65001 >nul & goto :eol_verify
:main
echo Hello world!
exit /b
:: ═
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: --------------BUFFER-BLOCK--------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:eol_verify
rem ═
:═? ?
cls
if %errorlevel%==9009 (
echo This script was saved with unix line endings and is unable to run.
exit /b 1
)
goto main
Self Repairing Batch Script
Before fully landing on this solution, I had experimented with and employed several other solutions for detecting and fixing the line endings. Because detecting the line endings has been rather tedious, my previous solutions involved having the batch file rewrite itself regardless of the line endings to "normalize" the file before executing it again. I won't go into detail, but the comments in that code should be self-explanatory.
@echo off
rem Written by: Brogan Scott Houston McIntyre
rem ========== ========== ========== ========= ==========
rem Beginning of normalization
rem ===== ===== ===== ===== =====
setlocal EnableDelayedExpansion
rem Source and temp paths
set "src=%~f0" & set "tmp=%TEMP%\%~nx0_%random%.tmp"
rem Normalize LF to CRLF into the temp file
type "%src%" | find /V "" > "%tmp%"
rem Add goto to skip the normalization header, remove the temp file and restart
( echo @echo off & echo :: Skip normalization & echo goto main & echo. & echo :: ========== ========== ========== ========== ========== & echo. & type "%tmp%" ) > "%src%" & del "%tmp%" & endlocal & call "%src%" & exit /B
rem ===== ===== ===== ===== =====
rem End of normalization
:: ========== ========== ========== ========== ==========
:: The main script
:main
This solution works, but now that we have a working method of detecting whether the file was saved using CRLF or LF, I've revised this self-repairing code and also reset the code page back to its original value when the script exits.
@echo off & for /f "tokens=2 delims=:" %%a in ('chcp') do set "codepage=%%a" & chcp 65001 >nul & goto :eol_verify
:: Store the original code page, change the code page to UTF-8 and verify the line endings
:: ========== ========== ========== ========== ==========
:: Written by: Brogan Scott Houston McIntyre
:: Please provide me credit if using this code.
:: From the Always Active Hours script located at:
:: https://github.com/TechTank/AlwaysActiveHours
:: ========== ========== ========== ========== ==========
:main
:: The main script
echo Hello world!
pause
:: Properly end the batch file
goto end
:: ========== ========== ========== ========== ==========
:end
chcp %codepage% >nul
exit /b
:: ========== ========== ========== ========== ==========
:: Use a poison to verify line endings and repair the file if necessary
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: --------------BUFFER-BLOCK--------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:: ----------------------------------------
:eol_verify
rem ═
:═? ?
cls
if %errorlevel%==9009 (
goto repair
)
goto main
:repair
:: The echo and pause can safely be removed
echo This script was saved with Unix line endings and will repair itself.
pause
:: Source and temp paths
set "src=%~f0" & set "tmp=%TEMP%\%~nx0_%random%.tmp"
:: Normalize LF to CRLF into the temp file
type "%src%" | find /V "" > "%tmp%"
:: Copy the temp file to our original source, remove the temp file, reset the code page and restart
type "%tmp%" > "%src%" & del "%tmp%" & chcp %codepage% >nul & call "%src%" & exit /B