Files to convert to WIN 866 (OEM Russian) <=> UTF-8
- Eugene Lutsenko
- Posts: 1649
- Joined: Sat Feb 04, 2012 2:23 am
- Location: Russia, Southern federal district, city of Krasnodar
- Contact:
Files to convert to WIN 866 (OEM Russian) <=> UTF-8
How can I convert files between CP866 (OEM Russian) and UTF-8 using Alaska Xbase++?
I think I'm misunderstanding something: UTF-8 seems to exist both in a DOS format (ASCII) and in the Windows character set (ANSI):
- Attachments
- Без имени-1.jpg (142.69 KiB)
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
I installed the enca and recode packages under UNIX and tried the command highlighted in the window shown in the screenshot. Everything was converted correctly: the command detected the file's current encoding and converted it to UTF-8. To run UNIX commands from my program, I create a .bat file that launches bash and passes it the commands with full paths. Everything works.
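enca uses its own language-aware detection, but the general idea of "detect the current encoding, then convert to UTF-8" can be sketched in a few lines of Python. This is a simplified stand-in, not enca's algorithm: it assumes the file holds Russian text in either CP866 or Windows-1251, and it picks whichever code page yields more lowercase Russian letters when decoded. The names guess_codepage and to_utf8 are hypothetical.

```python
import re

# Candidate code pages for Russian text (assumption: only these two occur).
CANDIDATES = ("cp866", "cp1251")

# Lowercase Russian letters: U+0430..U+044F plus U+0451 (ё).
RU_LOWER = re.compile("[а-яё]")


def guess_codepage(data: bytes) -> str:
    """Return the candidate whose decoding contains the most lowercase Russian letters."""
    best, best_score = CANDIDATES[0], -1
    for cp in CANDIDATES:
        # errors="replace" so that bytes undefined in a code page do not abort scoring
        text = data.decode(cp, errors="replace")
        score = len(RU_LOWER.findall(text))
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = cp, score
    return best


def to_utf8(data: bytes) -> bytes:
    """Decode with the guessed code page and re-encode as UTF-8."""
    return data.decode(guess_codepage(data)).encode("utf-8")


if __name__ == "__main__":
    sample = "Привет, мир"
    for cp in CANDIDATES:
        raw = sample.encode(cp)
        print(cp, "->", guess_codepage(raw), to_utf8(raw).decode("utf-8"))
```

Very short or mostly ASCII inputs give such a heuristic little to work with, so the result should be treated as a guess.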
PS
In bash this is all elementary, but it is hard to use under Windows.
- Attachments
- Безымянный.jpg (91.53 KiB)
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
I think I found some console file transcoders:
http://ru-mangos.ru/attachment.php?s=27 ... 1268067439
http://ru-mangos.ru/showthread.php?p=700
I tested them, and they seem to work.
Unfortunately, these programs seem to contain viruses...
Here is some interesting information:
http://gnuwin32.sourceforge.net/packages/libiconv.htm
http://www.gnu.org/software/libiconv/
http://www.cyberforum.ru/cmd-bat/thread1361276.html
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
http://www.mashkov.com/2015/01/22/%D0%B ... ault_1251/
Question:
How do I convert files between utf8, ascii, oem, UTF32, UTF7, BigEndianUnicode, Unicode and Default (Windows-1251)?
Answer:
You can use a PowerShell command. Below is an example of converting a file from UTF-8 to Windows-1251, ANSI Cyrillic (the operating system's code page), from the command line. In Windows PowerShell, -Encoding Default means the system ANSI code page and -Encoding Unicode means UTF-16LE, so UTF-8 must be given as UTF8 (Out-File then writes UTF-8 with a byte-order mark).
utf8 => 1251
C:\>powershell.exe "Get-Content -Encoding UTF8 'c:\text file.txt' | Out-File -Encoding Default 'c:\text file.txt.Default'"
1251 => utf8
C:\>powershell.exe "Get-Content -Encoding Default 'c:\text file.txt' | Out-File -Encoding UTF8 'c:\text file.txt.utf8'"
You can also convert files between the following encodings:
1. ascii;
2. BigEndianUnicode: UCS-2 Big Endian;
3. default: the operating system's code page, Windows-1251 in Russia;
4. oem: OEM 866;
5. Unicode: UCS-2 Little Endian;
6. utf32;
7. utf7;
8. utf8.
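For comparison, here is a minimal Python sketch of the same file recoding. The table mapping the PowerShell -Encoding names in the list to Python codec names is an approximation: "default" is assumed to be Windows-1251 (as on a Russian system, per the list), and the byte-order marks that PowerShell writes for some encodings are not reproduced. recode_file is a hypothetical helper.

```python
import os
import tempfile

# Approximate Python codec names for the PowerShell -Encoding values listed above.
# Assumptions: "default" = Windows-1251 (Russian Windows); no BOMs are written.
PS_TO_PY = {
    "ascii": "ascii",
    "bigendianunicode": "utf-16-be",
    "default": "cp1251",
    "oem": "cp866",
    "unicode": "utf-16-le",
    "utf32": "utf-32-le",
    "utf7": "utf-7",
    "utf8": "utf-8",
}


def recode_file(src: str, dst: str, src_enc: str, dst_enc: str) -> None:
    """Read src in src_enc and write it to dst in dst_enc (PowerShell-style names)."""
    # newline="" keeps the original line endings (e.g. CR/LF) untouched
    with open(src, "r", encoding=PS_TO_PY[src_enc.lower()], newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=PS_TO_PY[dst_enc.lower()], newline="") as f:
        f.write(text)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "text file.txt")
    dst = os.path.join(folder, "text file.txt.utf8")
    with open(src, "w", encoding="cp1251", newline="") as f:
        f.write("Пример текста\r\n")
    recode_file(src, dst, "Default", "utf8")  # 1251 => utf8
    with open(dst, "rb") as f:
        print(f.read().decode("utf-8"))
```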
Detailed information on the Get-Content for FileSystem command is on the developer's website.
Identifiers of the various encodings: Code Page Identifiers, on the developer's website.
You may also find useful the page on running a PowerShell script from the command line.
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
Eugene,
Right now I don't have time to look for ways to recode ASCII files; I already tested many programs and utilities when I worked on encoding, decoding and detecting the code pages of text files in my app.
One utility I use is xcode.exe (attached). I can't find the web address it came from, but I believe it is a Russian product,
so you should be able to find it.
This utility can detect code pages and convert between them, including CP866 and Win1251.
The syntax is: xcode -c -e %1 %1x
where -e selects English messages; without it, the messages are in Russian.
When run as xcode -c -e zp807231oem.red zp807231oem.redx,
the program writes the code page used in zp807231oem.red to the file zp807231oem.redx
(a .red file is just a text file).
This is the result: cp866: zp807231ansi.rec
xcode usage:
Usage: xcode -E -[hH?] -[wkaim1234567890] +[wkaim1234567890] [-q] [in [out]]
-E -h in English (don't forget to add -h or -H switch!)
-v print version information
-H manual, list of 14 encodings supported, and view YO-ware license
-d double recoding (try if simple 'xcode' failed)
-q quoted-printable decoding (useful for decoding MIME-files)
-l decode html Unicoded text (like Дима)
-c determine encoding and print it to the output (see details by -H)
-t do unix2dos transformation (convert LF to CR/LF) in DOS/Win only
-p pipe mode (applies to DOS/Win environment only)
-s silent mode (no information on encodings displayed)
If input/output files are not specified, the standard input/output is used.
-a to set cp866 output (default)
-w to set cp1251 output
-k to set koi8-r output
-i to set iso8859-5 output
-m to set mac output
+a to force cp866 input
+w to force cp1251 input
+k to force koi8-r input
+i to force iso8859-5 input
+m to force mac input
Another utility is the free converter PokludaCZ, http://www.pokluda.cz,
which can also be run from the command line:
czkonverze /00 /20 "zp807231oem.red" >vystup1.log
czkonverze /20 /00 "zp807231ansi.red" >vystup2.log
However, this utility supports only the Win1250 code page, not Win1251.
In Alaska I wrote only a code page detector: it counts how often certain characters occur and uses these counts to estimate which code page the text is closest to.
Here is the source; the input parameter is one line of the text:
**********************************
* TEXT CODE PAGE DETECTOR        *
**********************************
****************************
FUNCTION DETEKTORCP(riadok)
****************************
* define the variables and the table of characters used for detection
Local i, j, k, kodstr
Local pocet[7]
/*
Local detect := ;
{ "č‡č‹Ćc", ;
"ų©żųŽŅr", ;
"Øē¹äÓs", ;
"˛‘§¾ģŚz", ;
"ó¢¢ó—Ļo", ;
"į į‡Įa", ;
"é‚‚éˇ×e", ;
"ś££śÕu", ;
"ķķ’Éi" ;
}
*/
Local detect := ;
{ "č‡č‹Ćc", ;
"ų©żųŽŅr", ;
"Øē¹äÓs", ;
"˛‘§¾ģŚz", ;
"ó¢¢ó—Ļo", ;
"į į‡Įa", ;
"é‚‚éˇ×e", ;
"ś££śÕu", ;
"ķķ’Éi", ;
"Čķ’Éi", ;
"¼ķ’Éi", ;
"ķ’Éi", ;
"¨ķ’Éi", ;
"ˇķ’Éi", ;
"Ļķ’Éi", ;
"żķ’Éi" ;
}
* reset the counters
for k=1 to 7
pocet[k]:=0
next
* loop that reads and tests every character of the line
for i=1 to len(riadok)
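* test only characters above CHR(127)
if Asc(SubStr(riadok, i, 1)) > 127
* scan every row of the detection table (originally 9 rows, now 16)
* for j=1 to 9
for j=1 to Len(detect)
* test each character of the row; each row holds 7 characters
for k=1 to 7
if SubStr(riadok, i, 1) == SubStr(detect[j], k, 1)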
pocet[k]++
endif
next
next
endif
next
* evaluate here which column has the most hits, according to pocet[k]
*ladenie("pocet[1]"+str(pocet[1]))
*ladenie("pocet[2]"+str(pocet[2]))
*ladenie("pocet[3]"+str(pocet[3]))
*ladenie("pocet[4]"+str(pocet[4]))
*ladenie("pocet[5]"+str(pocet[5]))
*ladenie("pocet[6]"+str(pocet[6]))
*ladenie("pocet[7]"+str(pocet[7]))
/*
k=1
pompocet=pocet[k]
if pocet[2]>pompocet
pompocet=pocet[2]
k=2
endif
if pocet[3]>pompocet
pompocet=pocet[3]
k=3
endif
if pocet[4]>pompocet
pompocet=pocet[4]
k=4
endif
if pocet[5]>pompocet
pompocet=pocet[5]
k=5
endif
if pocet[6]>pompocet
pompocet=pocet[6]
k=6
endif
if pocet[7]>pompocet
pompocet=pocet[7]
k=7
endif
*/
* for now a simplified evaluation, because the complete one does not give correct results for CP850/Win1250
if pocet[1]>0
kodstr=1250
else
kodstr=852
endif
* ladenie() is presumably the author's own debug-log routine (not shown)
ladenie("code page "+str(kodstr))
RETURN kodstr
Maybe this will give you some inspiration.
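The counting idea behind DETEKTORCP can be transcribed into Python as a sketch. The table here is hypothetical and not taken from the listing: each row holds the byte value of one frequent Russian letter in each of three candidate code pages, every byte above 127 in the line is compared against the table, and the column with the most hits wins, as in the commented-out comparison block rather than the simplified pocet[1] > 0 rule.

```python
# Hypothetical candidate code pages, one per table column.
CODEPAGES = ("cp1251", "cp866", "koi8_r")

# Hypothetical marker letters: frequent lowercase Russian letters.
MARKERS = "оеаинт"

# DETECT[j][k] = byte value of MARKERS[j] in CODEPAGES[k] (like detect[j][k]).
DETECT = [[ch.encode(cp)[0] for cp in CODEPAGES] for ch in MARKERS]


def detect_codepage(line: bytes) -> str:
    """Count table hits per column and return the code page with the most hits."""
    hits = [0] * len(CODEPAGES)  # plays the role of pocet[]
    for b in line:
        if b <= 127:  # only bytes above CHR(127) are tested
            continue
        for row in DETECT:
            for k, code in enumerate(row):
                if b == code:
                    hits[k] += 1
    # Column with the most hits; max() keeps the first column on ties.
    best = max(range(len(CODEPAGES)), key=lambda k: hits[k])
    return CODEPAGES[best]


if __name__ == "__main__":
    for cp in CODEPAGES:
        print(cp, "->", detect_codepage("Привет, мир".encode(cp)))
```

Because the marker bytes are derived from the codecs instead of being typed in, the table cannot be damaged by the editor's or forum's own encoding.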
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
Hey, Victoria!
Thank you very much!
I found where to download this console converter, and even a whole book about it. According to the book, it looks like what I need. But I need to recode files from 866 or 1251 to UTF-8 and back, because the online translator I use (from bash) only works with UTF-8 files, while Alaska works in 866 (ASCII/OEM) and 1251 (ANSI).
http://www.rusf.ru/books/yo/xcode.html
http://www.rusf.ru/books/yo/xcode.html#tth_sEc2
Source code: ./src/xcodesrc.zip
The program is available for the following operating systems:
DOS: ./bin/xcodedos.zip. It is recommended to copy the program into one of the directories in the PATH environment variable.
Win: ./bin/xcodewin.zip. It is recommended to copy the program into %WINDOWS%\COMMAND (which often coincides with C:\WINDOWS\COMMAND). This version differs from the DOS version and is compiled as a Win32 console application.
Unix: ./bin/linux.zip should work on all modern Linux distributions (the program was compiled under SuSE 8.1); ./bin/xcoderedhat71.zip is for RedHat 7.1 (no longer supported; compile from source); ./bin/xcodesun.zip is for Sun Solaris 8 (no longer supported; compile from source).
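A pitfall of the "866/1251 to UTF-8 and back" round trip: the translator's UTF-8 output may contain characters that Windows-1251 can store but CP866 cannot, typographic dashes being a common example. A minimal Python sketch with hypothetical helper names that makes this failure explicit and optionally substitutes "?":

```python
def legacy_to_utf8(data: bytes, codepage: str) -> bytes:
    """Convert bytes in a legacy code page (e.g. "cp866", "cp1251") to UTF-8."""
    return data.decode(codepage).encode("utf-8")


def utf8_to_legacy(data: bytes, codepage: str, lossy: bool = False) -> bytes:
    """Convert UTF-8 bytes back to a legacy code page.

    With lossy=False an unrepresentable character raises UnicodeEncodeError;
    with lossy=True it is replaced by "?".
    """
    text = data.decode("utf-8")
    return text.encode(codepage, errors="replace" if lossy else "strict")


if __name__ == "__main__":
    translated = "Hello \u2014 world".encode("utf-8")  # contains U+2014 EM DASH
    print(utf8_to_legacy(translated, "cp1251"))        # Windows-1251 has this dash
    try:
        utf8_to_legacy(translated, "cp866")
    except UnicodeEncodeError:
        print("cp866 cannot represent U+2014")
    print(utf8_to_legacy(translated, "cp866", lossy=True))
```

Failing loudly by default makes lost characters visible; lossy=True is a deliberate opt-in.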
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
I couldn't get xcode to work. It seems it is supposed to handle Unicode, but it works incorrectly.
In the attached file, the folder bad contains the result of automatic recoding, which is no good. Then I recoded the source file to UTF-8 myself and ran the translation, and everything turned out fine; that result is in the folder Good. Since the translation was into English, the UTF-8 and 1251 output files are identical.
Could you find a transcoder that can be put into this .bat file to make it work correctly?
It would be nice if Alaska could recode text files from 866 and 1251 to UTF-8 and back!
PS
Here is a recoder without viruses:
http://kb.mista.ru/article.php?id=481
So try this:
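@echo off
@echo Translating in progress...
REM Read the input in the ANSI code page (Default = Windows-1251 on Russian Windows);
REM -Encoding ascii would turn every Cyrillic letter into "?", spoiling the input.
powershell.exe "Get-Content -Encoding Default 'inp_1251.txt' | Out-File -Encoding utf8 'inp_utf8.txt'"
bash.exe -l -i trans -b ru:en -i c:\Aidos-X\cygwin\bin\inp_utf8.txt -o c:\Aidos-X\cygwin\bin\out_utf8.txt
REM Convert the translator's UTF-8 output back to the ANSI code page.
powershell.exe "Get-Content -Encoding utf8 'out_utf8.txt' | Out-File -Encoding Default 'out_1251.txt'"
@echo Translating is finished...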
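Why reading a Windows-1251 file as ASCII spoils it can be illustrated in Python. This sketches the general effect only (Python substitutes U+FFFD for undecodable bytes, while .NET's ASCII decoder substitutes "?" by default): every byte above 127, which in a 1251 file means every Cyrillic letter, is replaced, so no later conversion step can recover the text. decode_as is a hypothetical helper.

```python
def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes, replacing anything invalid for the encoding with U+FFFD."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    raw = "Привет, world".encode("cp1251")  # what a 1251 input file might contain
    wrong = decode_as(raw, "ascii")         # each Cyrillic byte becomes U+FFFD
    right = decode_as(raw, "cp1251")        # matches the file's real encoding
    print(repr(wrong))
    print(repr(right))
    # Once decoded as ASCII the letters are gone; re-encoding cannot restore them.
    print(wrong.encode("utf-8") == right.encode("utf-8"))
```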
- Attachments
- Downloads.rar (53.9 KiB)
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
Under UNIX it all turned out to be very simple: just three lines, and everything works. I'll show it soon.
But under Windows I kept getting the wrong encoding, even though I explicitly specified the one I needed.
Under Windows I wrote this .bat file:
==============================
@echo off
@echo Translating in progress...
:: ANSI -> UTF-8
chcp 1251 > nul
cmd /u /c type inp1251.txt > tmp.txt
chcp 65001 > nul
type tmp.txt > inp65001.txt
bash.exe -l -i trans -b ru:en -i C:\Aidos-X\cygwin\bin\inp65001.txt -o C:\Aidos-X\cygwin\bin\out65001.txt
:: UTF-8 -> ANSI
chcp 65001 > nul
cmd /u /c type out65001.txt > tmp.txt
chcp 1251 > nul
type tmp.txt > out1251.txt
@echo Translating is finished...
==============================
However:
:: ANSI -> UTF-8
chcp 1251 > nul
cmd /u /c type inp1251.txt > tmp.txt
chcp 65001 > nul
type tmp.txt > inp65001.txt
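encodes the source file inp1251.txt not in UTF-8 but in Unicode (UTF-16),
even though code page 65001 (UTF-8) was explicitly specified. (Presumably this is because cmd /u makes internal commands such as type write Unicode, i.e. UTF-16LE, to pipes and files regardless of the code page set with chcp.)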
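At the byte level, the difference between what cmd /u produces and what the translator expects looks like this. A minimal Python sketch, assuming little-endian UTF-16 input with or without a byte-order mark, that prints both byte sequences for the same word and converts UTF-16 to UTF-8; utf16_to_utf8 is a hypothetical helper.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 bytes to UTF-8, honouring a BOM and assuming little-endian otherwise."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode("utf-16-le")  # assumption: no BOM means little-endian
    return text.encode("utf-8")


if __name__ == "__main__":
    word = "Мир"
    print(word.encode("utf-16-le").hex(" "))  # 1c 04 38 04 40 04
    print(word.encode("utf-8").hex(" "))      # d0 9c d0 b8 d1 80
    print(utf16_to_utf8(word.encode("utf-16-le")).decode("utf-8"))
```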
Last edited by Eugene Lutsenko on Wed Jan 03, 2018 2:36 am, edited 1 time in total.
- Eugene Lutsenko
Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8
Everything is done with UNIX tools (Cygwin) and works perfectly. Here is the .bat file:
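@echo off
@echo Translating in progress...
REM Detect the encoding of the Russian input and convert the file in place to UTF-8
enca -L russian inptrans.txt -x utf8
bash.exe -l -i trans -b ru:en -i C:\Aidos-X\cygwin\bin\inptrans.txt -o C:\Aidos-X\cygwin\bin\outtrans.txt
REM Convert the translated file in place to Windows-1251
enca -L english outtrans.txt -x 1251
@echo Translating is finished...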
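The three steps of this .bat file (recode to UTF-8, translate, recode to 1251) can also be sketched in Python. Unlike enca, this sketch does not detect the input encoding: in_cp is passed explicitly, and translate is a hypothetical parameter standing for any step that takes and returns UTF-8 bytes (such as the trans call above). Characters the target code page cannot hold are replaced by "?".

```python
from typing import Callable


def translate_file(src: str, dst: str, translate: Callable[[bytes], bytes],
                   in_cp: str = "cp1251", out_cp: str = "cp1251") -> None:
    """Recode src to UTF-8, run translate on the UTF-8 bytes, write dst in out_cp."""
    with open(src, "rb") as f:
        utf8_in = f.read().decode(in_cp).encode("utf-8")  # role of enca ... -x utf8
    utf8_out = translate(utf8_in)                         # the UTF-8-only step
    with open(dst, "wb") as f:
        # role of enca ... -x 1251; unrepresentable characters become "?"
        f.write(utf8_out.decode("utf-8").encode(out_cp, errors="replace"))


if __name__ == "__main__":
    import os
    import tempfile

    folder = tempfile.mkdtemp()
    inp = os.path.join(folder, "inptrans.txt")
    out = os.path.join(folder, "outtrans.txt")
    with open(inp, "wb") as f:
        f.write("привет\r\n".encode("cp1251"))

    # Stand-in "translator" for the demo only: upper-cases the UTF-8 text.
    def fake_translate(data: bytes) -> bytes:
        return data.decode("utf-8").upper().encode("utf-8")

    translate_file(inp, out, fake_translate)
    with open(out, "rb") as f:
        print(f.read().decode("cp1251"))
```

Keeping the translator behind a bytes-in/bytes-out callable mirrors the constraint in this thread: only the middle step needs UTF-8, so the code page conversions stay at the edges.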