菲舍爾任意制象棋(Fischer Random Chess). 费舍尔的随机国际象棋变体 (Chinese Language)
🕸Fergus Duniho wrote on Thu, Feb 17, 2022 06:51 PM UTC:

This page was displaying as gobbledegook, because it was not written in UTF-8. Using Notepad++, I converted it into Chinese characters. Since I cannot read Chinese, though, I cannot tell if I fixed anything. Although many characters are now Chinese, there are some suspicious characters, such as one that looks like a gamepad, the Japanese letter Ru, and the female/Venus symbol. So, I think I'll try a different approach on the backup site.


Bn Em wrote on Thu, Feb 17, 2022 11:48 PM UTC in reply to Fergus Duniho from 06:51 PM:

I think what may have happened here is the same thing that seems to have happened in a number of places around the site: at some point, a bunch of files written in UTF-8 were interpreted bytewise as what I assume is Latin-1. That also explains a number of other strange things that turn up (Ben notes that several pages have e.g. ⟨Â²⟩ where ⟨²⟩ is expected, and some old comments (e.g.) refer to Jörg as ⟨JÃ¶rg⟩).
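The misreading described above can be reproduced in a few lines. This is a minimal Python sketch for illustration only; nothing in the thread says which tool performed the original conversion, and the function names are hypothetical. It encodes text as UTF-8, wrongly decodes the bytes as Latin-1, and then reverses the mistake.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled: str) -> str:
    """Reverse the mistake: get the original bytes back, decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


superscript = misread_utf8_as_latin1("²")          # two characters: Â then ²
name = misread_utf8_as_latin1("Jörg")              # ö becomes Ã followed by ¶
ascii_only = misread_utf8_as_latin1("chess.html")  # pure ASCII is untouched

print(superscript, name, ascii_only)
print(undo_misread(name))
```

Because ASCII bytes mean the same thing in both encodings, links and English text survive the round trip in either direction, while every non-ASCII character turns into two or more stray characters.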

I note that that transformation (and, indeed, the reverse) would leave ASCII characters intact, as the backup version has (hence intact links ⁊c.), but this attempt to fix it has not (hence the broken URLs spilt around, partial names such as ⟨盓ric van Reem⟩, ⁊c.).

EDIT: Playing about with it, it looks like it is indeed one of the 8‐bit encodings, though the ⟨€⟩ sign suggests it's not Latin-1 but one of the others. The character between “Shuffle Chess” and “Prechess” would seem to suggest it's one where ⟨€⟩ is 0x80, as that gives a ⟨、⟩, which would make sense there (it's a punctuation used to separate items in a list) — of those, Codepage 1252 (Microsoft Windows Western European) was once quite popular iirc, so it seems a likely candidate.

EDIT 2: The backup page contains the characters Z–caron ⟨Ž⟩, S–caron ⟨Š⟩, and the O–E ligature ⟨Œ⟩; of the charsets on the linked page, codepage 1252 is the only one to contain all three of those characters, so my money is on that being the right one. As such, presumably the procedure would be to save it encoded in codepage 1252 and then open it as a UTF-8 file.

EDIT 3: A quick try at doing this with Libreoffice tells me I'm right — my Chinese isn't very good but it's enough to see that it looks plausible — the notes section for example begins with a section on how to play it on one's computer (matching the Zillions file link). Unfortunately Libreoffice still leaves a couple of things garbled: it refuses to accept bytes like 0x81 (which is unassigned in codepage 1252), rather than pass them through, which in turn leaves any character encoded using it (including the aforementioned list comma ⟨、⟩) unrecovered. A correct recovery would thus need to use software which is a bit more liberal in what it accepts/emits.
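A rough Python sketch of what "more liberal" software could do here. It is an assumption about one workable approach, not a description of any particular tool: the helper names are hypothetical, and the pass-through rule covers only the five byte values that Python's own cp1252 codec leaves unassigned.

```python
# Byte values that Python's cp1252 codec treats as unassigned.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def cp1252_bytes_lenient(text: str) -> bytes:
    """Encode to CP1252, but pass U+0081 and friends through as raw bytes."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) not in UNASSIGNED:
                raise  # genuinely not representable as a single byte
            out.append(ord(ch))
    return bytes(out)


def repair(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as CP1252' by re-encoding and decoding as UTF-8."""
    return cp1252_bytes_lenient(garbled).decode("utf-8")


# The list comma U+3001 is E3 80 81 in UTF-8. Read as CP1252 with 0x81 passed
# through, that becomes: a-tilde, the euro sign, and the control U+0081.
garbled_comma = "\u00e3\u20ac\u0081"

strict_failed = False
try:
    garbled_comma.encode("cp1252")  # the strict codec rejects U+0081
except UnicodeEncodeError:
    strict_failed = True

recovered = repair(garbled_comma)
print(strict_failed, recovered)
```

The strict encoder fails on exactly the character that blocks recovery of the list comma, while the lenient one restores it.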


🕸Fergus Duniho wrote on Fri, Feb 18, 2022 03:33 AM UTC in reply to Bn Em from Thu Feb 17 11:48 PM:

Older pages were generally written in Latin-1, and when I tried to enforce a site-wide use of UTF-8, some things didn't convert correctly. I have some backups of the database from when the columns in MemberSubmissions were still using the latin1_swedish_ci collation instead of the utf8_general_ci collation. I'll try to put together a script that will read from a backup database and not enforce UTF-8 to see if it will display properly. But that's for tomorrow, as it is time to shut down my computer for the night.


H. G. Muller wrote on Fri, Feb 18, 2022 01:27 PM UTC:

Note there are two different kinds of Chinese: traditional (Taiwan, Hong Kong) and simplified (China). And that the Japanese use basically the same kanji script. All use different encodings, though (Big 5, GB 2312 or Shift-JIS). In Windows, Big 5 = code page 950, GB 2312 = code page 936, and Japanese (Shift-JIS) = code page 932.
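To see how far apart these legacy encodings are, a small Python sketch can encode the same two characters with each code page and then cross-decode them. The sample string and the dictionary layout are choices made for this illustration. Note that Python maps cp936 to GBK, Microsoft's superset of GB 2312.

```python
SAMPLE = "象棋"  # two characters that exist in all three character sets

# Python codec name -> what it covers (cp936 is GBK in Python).
CODE_PAGES = {
    "cp950": "Big5 (Traditional Chinese)",
    "cp936": "GBK / GB 2312 (Simplified Chinese)",
    "cp932": "Shift-JIS (Japanese)",
}

# Encode the sample once per code page and show the raw bytes.
encoded = {name: SAMPLE.encode(name) for name in CODE_PAGES}
for name, raw in encoded.items():
    print(f"{name:6} {CODE_PAGES[name]:36} {raw.hex(' ')}")

# Decode every byte string with every code page. Only the matching pairs are
# guaranteed to give the sample back; mismatches give other text or U+FFFD.
cross = {
    (src, dst): raw.decode(dst, errors="replace")
    for src, raw in encoded.items()
    for dst in CODE_PAGES
}
for (src, dst), text in cross.items():
    print(f"{src} bytes read as {dst}: {text!r}")
```

Guessing the wrong one of these for a page yields text that may still look East Asian but is not what was written.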


Bn Em wrote on Fri, Feb 18, 2022 01:58 PM UTC in reply to Fergus Duniho from 03:33 AM:

tried to enforce a site-wide use of UTF-8

That'd explain it: probably the conversion caught some pages that were already in UTF-8 and reëncoded them too.

different kinds of Chinese

My quick attempt last night at deëncoding using Libreoffice gave some pretty plausible‐looking UTF-8‐encoded Traditional Chinese (modulo anything encoded with byte 0x81, or presumably any other bytes unassigned in CP1252).


🕸Fergus Duniho wrote on Fri, Feb 18, 2022 06:28 PM UTC in reply to Fergus Duniho from 03:33 AM:

I made the script to display from a backup database, but it didn't help. To prevent it from trying to use UTF-8, I even ran it on a different domain to avoid the .htaccess file on this site. But that didn't help. It appears that the corruption is in the earliest backup of the database, which is from 2017.


🕸Fergus Duniho wrote on Sat, Feb 19, 2022 12:26 AM UTC in reply to Bn Em from Fri Feb 18 01:58 PM:

I converted the introduction from every encoding available for mb_convert_encoding() to UTF-8. Some included Chinese characters, but many of these mixed in Korean or Japanese characters. The conversion most densely packed with Chinese characters was BIG-5, which I know is a Chinese encoding. But I can't tell if it says anything intelligible. Here is what I got:

Bobby Fischer簿翹?疇?兜?瓣繡??癟?〣?疇??嘔怵﹦€疑阬授 ̄汕?€嘔乒€??癡罈?簿翹?疆??疇?¯疑刈詹?珍岑阬員a href="http://www.chessvariants.org/d.chess/chess.html">疇??嘔怵﹦€疑阬授 ̄汕?€較/a> 癟禳??癡簧?矇竄??簿翹?疇?汕亂刈蜃倥汕?€嘔氐倦?癟禳??疇??疇禮?嘔抽€汕?癟翻簧疆?簪矇禳穡疆穢顫矇?繡疆??? ̄巫﹦€?瓊?砂€?Fischer疇職鱉癡?玳?疇??疇?汕永刈算€?疇礎??Capablanca癟簫?冕純敷?疆??疇?¯疑汀??嘔怵﹦€疑阬授 ̄汕?€嘔阬敉?矇竄??癟禳??疇?兜?瓣繡??癟?〣?疇???癡罈?瓣繒?嘔瓦???簿翹?癟??繞癡?玳?疇?汕亂刈領€?癟禳??癡簧?矇竄??矇?翻疆簡??疆??冕阬Ⅹ姻竹??疇?顫疆鬚穡癡癒?矇?鬚瓊?砂€? 癡?簡癡??癟?職瓣罈罈疆???疇?繞癡簣癒疆瞿?嘔佯?? ̄色€甄棺氐?瞻癡?玲?癟禳??Shuffle Chess瓊?玲?Prechess(疆???疇?汕亂刈領€?疆??冕抽€??癟禳??癡簧?矇竄??)疆??冕刈算€疑怏ˍ壅刈撢撳純敷?瓣翻??疆?簪疆??冕兩€¯秉氐溘掙岑?穡癟?兜嘔巫﹦€?矇瞽穡疆?翹瓊?砂€?矇?砂?〡乒?砂€嘔怏??疆?簡疇罈瞿疆糧?疑巫﹦€?瓣罈?嘔岑朝嘔乒€??疆簫繚疇?簡 癟??簣Eric van Reem疆?售?珍氐純姻??砂€?
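The try-every-encoding approach can be sketched in Python as a loose analogue of the PHP experiment above. The candidate list, the scoring function, and all names here are assumptions made for illustration. Each candidate decoding is scored by how many of its non-ASCII characters are CJK ideographs.

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-ASCII characters that are CJK Unified Ideographs."""
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    if not non_ascii:
        return 0.0
    cjk = sum(1 for ch in non_ascii if 0x4E00 <= ord(ch) <= 0x9FFF)
    return cjk / len(non_ascii)


def rank_decodings(raw: bytes, candidates):
    """Decode raw with each candidate codec and sort by CJK density."""
    results = []
    for codec in candidates:
        text = raw.decode(codec, errors="replace")
        results.append((cjk_ratio(text), codec, text))
    # Sort on the score only; ties keep the order of the candidate list.
    return sorted(results, key=lambda item: item[0], reverse=True)


# Hypothetical input: correctly stored UTF-8 bytes for a phrase from the page.
raw = "國際象棋".encode("utf-8")
ranking = rank_decodings(raw, ["utf-8", "big5", "gbk", "shift_jis", "cp1252"])

for score, codec, text in ranking:
    print(f"{score:.2f}  {codec:10} {text!r}")
```

A high density of ideographs only shows that a decoding produced Chinese-looking characters. As the BIG-5 output above illustrates, that is not evidence the text is intelligible, so the top candidates still need checking by someone who can read them.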


🕸Fergus Duniho wrote on Sat, Feb 19, 2022 12:33 AM UTC in reply to Bn Em from Fri Feb 18 01:58 PM:

I then converted the introduction from every encoding to Windows-1252, and I got this as a conversion from UTF-8:

Bobby Fischer,前世界國際象棋冠軍,提出一種國際象棋 的變體,其中棋子的初始配置是隨機選擇的。Fischer從而加入了如Capablanca等,提出國際象棋變體的前世界冠軍之列,然而其他的變體都沒有被成功推行過。 菲舍爾任意制象棋與更古老的Shuffle Chess、Prechess(或其他有關的變體)有些類似,但是有自己獨特的風格。這個遊戲廣泛的介紹和歷史 由Eric van Reem所寫。

This looks like a clean conversion to Chinese, and I even see 象棋 repeated a few times. Does this look accurate to you, Bn?


🕸Fergus Duniho wrote on Sat, Feb 19, 2022 03:19 AM UTC:

It looks like the corruption was caused by a conversion to UTF-8 of text that was already UTF-8. When I converted the UTF-8 back to Windows-1252, it turned into Chinese again. Although I can read very little Chinese, this looks correct. 象棋 gets used in the Chinese word for Western chess, and in the table describing castling, 王 gets used for the King, and 車 gets used for the Rook. So, I converted the text from a backup into Windows-1252, then cut and pasted it here, so that it is stored correctly as UTF-8. I also made local links relative, fixed some links that were incorrect, put everything in the introduction section to avoid English headings, and added a last note at the end in Chinese, thanks to an online translation tool, about this being a translation of our English Fischer Random Chess page. Finally, I deleted my previous revisions.


Bn Em wrote on Sat, Feb 19, 2022 11:55 AM UTC in reply to Fergus Duniho from 12:33 AM:

That does indeed look quite plausible (as does the Shatranj page), and concords with my own efforts at making sense of it, as well as lining up, afaict, with what the English page says.

The note at the end looks a little odd (It literally reads, as far as my Chinese gets me: “This is our English page's translation Fischer Random Chess”; the syntax of the Chinese is fine afaict but the link afterward, even with a space separating it, reads strangely). Would it be worth putting the link on the Chinese for “Our English page” (i.e. “我们英文页面”) instead?

Also, incidentally, what software did you use to convert it to CP1252? All the immediately accessible ones on Linux seem to put up a (quiet) fuss about e.g. U+0081 not being available in the target encoding.


🕸Fergus Duniho wrote on Sat, Feb 19, 2022 02:34 PM UTC in reply to Bn Em from 11:55 AM:

The note at the end looks a little odd (It literally reads, as far as my Chinese gets me: “This is our English page's translation Fischer Random Chess”;

I could write it entirely in English, since it will be of interest mainly to people who can read English.

incidentally what software did you use to convert it to CP1252?

I used PHP's mb_convert_encoding() function. I then viewed the page source and copied from that.

