Comments by DerekNalls
'... the Battle-of-the-Goths tournament was played at 1 hour per game per side (55'+5'/move, the time on the clocks is displayed in the viewer). And you call it speed Chess. Poof, there goes half your argument up in smoke.'

Sorry, I could not find the time per move on your crude web page. Nonetheless, less than 1 minute per move is much too short to yield quality moves ... at least by anything better than low standards.

'Not that it was any good to begin with: it is well known and amply tested that the quality of computer play only is a very weak function of time control.'

WRONG! The quality of computer play correlates strongly with completed ply depth which, in turn, is a function of time: exponentially more time is generally required to complete each successive ply.

'The fact that you ask how 'my theory was constructed' is shocking. Didn't you notice I did not present any theory at all?'

In fact, I have noticed that you have failed to present a theory to date. I apologize for politely, yet incorrectly, giving you the benefit of the doubt that you had developed a theory at all, unpublished but somewhere within your mind. Do you actually prefer that I state or imply that you are clueless, even as you claim to be the world's foremost authority on the subject and claim the rest of us are stupid? Fine, then.

'I just reported my OBSERVATION that quiet positions with C instead of A do not have a larger probability to win the game, and that in my opinion thus any concept of 'piece value' that does not ascribe nearly equal value to A and C is worse than useless.'

When you speak of what is needed to 'win the game', you are fixating upon the mating power of pieces, which translates to endgame relative piece values, NOT opening-game or midgame relative piece values. Incidentally, relative piece values during the opening game are more important than during the midgame which, in turn, are more important than during the endgame. Furthermore, I am particularly wary about using relative piece values at all during the endgame, since any theoretically deep possibility of achieving checkmate (regardless of material sacrifices), discovered or undiscovered, renders relative piece values an absolutely non-applicable and false concept. I strongly recommend that you shift your attention, oppositely, to the supremely important opening game to derive more useful relative piece values.

'So what have I think I proved by the battle-of-the-Goths long TC tourney about the value of A and C? Nothing of course! Did I claim I did? No, that was just a figment of your imagination!'

I did not claim that I knew exactly how your ridiculous idea that an archbishop is approximately as valuable as a chancellor originated. This 'tournament' of yours that I criticized just seems to be a part of your 'delusion maintenance' belief system.

'It might be of interest to know that prof. Hyatt develops Crafty (one of the best open-source Chess engines) based on 40/1' games, as he has found that this is as accurate as using longer TC for relative performance measurement, and that Rybka (the best engine in the World) is tuned through games of 40 moves per second.'
Now, you are completely confusing a method for QUICKLY and easily testing a computer hardware and software system to make sure it is operating properly with a method for achieving AI games consisting of the highest-quality moves, of theoretical value to expert analysts of a given chess variant. I have already explained some of this to you. Gawd!

'The method you used (testing the effect of changing the piece values, rather than the effect of changing the pieces) is highly inferior, and needs about 100 times as many games to get the statistical noise down to the same level as my method. (Because in most games, the mis-evaluated pieces would still be traded against each other.)'

First, you are falsely inventing stats out of thin air! If you really were competent with statistics, then you would know the difference between their proper and improper application within your own work attempting to derive accurate relative piece values. Second, you do not recognize (due to having no experience) the surprisingly great frequency with which a typical game between two otherwise-identical versions of a quality program with contrasting relative piece values will play into each other's most significant differences in piece values. Here is a hypothetical example: if white (incorrectly) values a rook significantly higher than an archbishop AND black (correctly) values an archbishop significantly higher than a rook, then the trade of white's archbishop for black's rook will be readily permitted by both programs and is very likely to occur at some point during a single game, or a few games at most. Consequently, all things otherwise equal, white will probably lose most games, which is indicative of a problem somewhere within its set of relative piece values (compared to black's).

'If you are not prepared to face the facts, this discussion is pointless.'

When I reflect your remark back to you, I agree completely.

'Play a few dozen games with Smirf, at any time control you feel trustworthy, where one side lacks A and the other B+N, and see who is crushed.'

relative piece values, opening game (bishop pairs intact):

piece        Muller     Nalls
pawn          10.00     10.00
knight        35.29     30.77
bishop        45.88     37.56
rook          55.88     59.43
archbishop   102.94     70.61
chancellor   105.88     94.18
queen        111.76    101.60

So, what is your problem? Both of our models are in basic agreement on this issue. There is no dispute between us. [I hate to disappoint you.] What you failed to take into account (since you refuse to educate yourself via my paper) is the 'supreme piece(s) enhancement' within my model. My published start-of-the-game relative piece values are not the final word of a simplistic model. My model is more sophisticated and adaptable, with some adjustments required during the game. For CRC, the 3 most powerful pieces in the game (i.e., archbishop, chancellor, queen) share, by a weighted formula, a 12.5% bonus which contributes to 'practical attack values' (a component of material values under my model). Moreover, each piece's share of the 12.5% bonus typically increases, by a weighted formula, during the game as some of the 3 most powerful pieces are captured and their shares are inherited by the remaining pieces.
Thus, if the archbishop becomes the only remaining one of the most powerful pieces, then it becomes much more valuable than the combined values of the bishop and knight. Notwithstanding, I'll bet you still think my model is 'worthless nonsense'. Right? In the future, please do the minimal fact-finding prerequisite to making sense of whatever you are arguing about.

'... the rest of the World beware that your theory of piece values sucks in the extreme!'

No, it does not. Your self-described 'far less than a theory, only an observation' comes close, though.
'So the ply depth depends only logarithmically on search time, which is VERY WEAKLY. So if you had wanted to show any understanding of the matter at hand, you should have written RIGHT! instead of WRONG! above it...'

'... it is well known and amply tested that the quality of computer play only is a very weak function of time control.'

I disagreed with your previous remark only because it was misleadingly and poorly expressed. You made it sound as if you barely realized at all that the quality of computer play is a function of search time. Obviously, you do. So, here is the correction you demand and deserve ... RIGHT!

'Absolute nonsense. Most Capablanca Chess games are won by annihilation of the opponents Piece army, after which the winning side can easily push as many Pawns to promotion as he needs to perform a quick mate. Closely-matched end-games are relatively rare, and mating power almost plays no role at all. As long as the Pawns can promote to pieces with mating power, like Queens.'

Very well. I spoke incorrectly when I credited you with foolishly assigning the archbishop nearly equal value to the chancellor due mainly to its decent mating power, relevant mainly in endgames ... sometimes. You are even more foolish than that. You actually think the archbishop has nearly equal value to the chancellor throughout the game: in the opening game and midgame as well. Wow! By the way, please add IM Larry Kaufmann to your dubious list of 'insufferably stupid people' who disagree with your relative piece values in CRC: http://en.wikipedia.org/wiki/Gothic_Chess

'... But let's cut the beating around the bush ...'

Good idea! I have now completely run out of patience with your endless inept, amateurish attempts to discredit my work. Not because you disagree. Not even because you are unnecessarily rude and disrespectful. Instead, strictly because you have NOT done your homework! You refuse to read the same 58-page paper you are confidently grading with an 'F'. Consequently, virtually all of your criticisms to date about my model for calculating relative piece values have been incorrect, irrelevant and/or irrational. When and if you ever raise concerns about my method that I can identify as making sense and knowing what you are talking about, then I will politely answer them. Until then, my side of this conversation is closed.
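[For what it is worth, Muller's 'logarithmic' claim and the earlier 'exponentially greater time per ply' claim are two sides of the same relation: if completing each successive ply multiplies the search time by an effective branching factor b, then t is roughly proportional to b^d, so d grows only like log(t)/log(b). A minimal sketch in Python, assuming an illustrative branching factor of 5 and a 1 ms depth-1 search; both are made-up figures for illustration, not measurements from any engine:]

    import math

    def plies_reachable(seconds, branching=5.0, base_seconds=0.001):
        # t ~ base_seconds * branching**d  =>  d ~ log(t / base_seconds) / log(branching)
        return math.log(seconds / base_seconds) / math.log(branching)

    for t in (10, 60, 600, 5400):  # 10 s, 1 min, 10 min, 90 min per move
        print(f"{t:5d} s/move -> roughly {plies_reachable(t):.1f} plies")

[Under these assumptions, multiplying the time per move by 540 (from 10 seconds to 90 minutes) buys only about 4 extra plies; whether those extra plies matter for measuring piece values is exactly what the two sides dispute here.]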
Since ...

A. The argumentative posts of Muller (mainly against Scharnagl & Aberg) in advocacy of his model for relative piece values in CRC are neverending.

B. My absence from this melee has not spared my curious mind the agony of reading them at all.

... I hope I can help out by returning briefly just to point out the six most serious, directly paradoxical and obvious problems with Muller's model:

1. The archbishop (102.94) is very nearly as valuable as the chancellor (105.88): 97.22%.

2. The archbishop (102.94) is nearly as valuable as the queen (111.76): 92.11%.

3. One archbishop (102.94) is nearly as valuable as two rooks (2 x 55.88): 92.11%. In other words, one rook (55.88) is only a little more than half as valuable as one archbishop (102.94): 54.28%.

4. Two rooks (2 x 55.88) have a value exactly equal to one queen (111.76).

5. One knight (35.29) plus one rook (55.88) are markedly less valuable than one archbishop (102.94): 88.57%.

6. One bishop (45.88) plus one rook (55.88) are less valuable than one archbishop (102.94): 98.85%.

None of these problems exist within the reputable models by Nalls, Scharnagl, Kaufmann, Trice or Aberg. You must honestly address all of these important concerns or realistically expect to be ignored.
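[The six ratios above follow directly from Muller's published values; a quick check, as a minimal Python sketch:]

    m = dict(P=10.00, N=35.29, B=45.88, R=55.88, A=102.94, C=105.88, Q=111.76)

    print(m['A'] / m['C'])              # 0.9722 -> problem 1
    print(m['A'] / m['Q'])              # 0.9211 -> problem 2
    print(m['A'] / (2 * m['R']))        # 0.9211 -> problem 3
    print(m['R'] / m['A'])              # 0.5428 -> problem 3, restated
    print(2 * m['R'] - m['Q'])          # 0.0    -> problem 4
    print((m['N'] + m['R']) / m['A'])   # 0.8857 -> problem 5
    print((m['B'] + m['R']) / m['A'])   # 0.9885 -> problem 6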
A substantial revision and expansion has recently occurred.

universal calculation of piece values
http://www.symmetryperfect.com/shots/calc.pdf
66 pages

Only three games have relative piece values calculated using this complex model: FRC, CRC and Hex Chess SS (my own invention). Furthermore, I confidently consider my figures somewhat reliable for only two of these games, FRC (including Chess) and Capablanca Random Chess, because much work has been done by many talented individuals (hopefully, including myself) as well as computers to isolate reliable material values. This dovetails into the reason that I do not take requests. I have absolutely no assurance that effort spent outside these two established testbeds is productive at all. If it is important to you to know the material values for the pieces within your favorite chess variant (according to this model), then you must calculate them yourself.

Under the recent changes to this model, the material values for FRC pieces and Hex Chess SS pieces remained exactly the same. However, the material values for a few CRC pieces changed significantly:

Capablanca Random Chess material values for pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

pawn 10.00
knight 30.77
bishop 37.56
rook 59.43
archbishop 93.95
chancellor 95.84
queen 103.05

Focused, intensive playtesting on my part has proven Muller to be correct in his radical, new contention that the accurate material value of the archbishop is extraordinarily, counter-intuitively high. I think I have successfully discovered a theoretical basis, which is now explained within my 66-page paper. All of the problems (that I am presently aware of) within my set of CRC material values have now been solved. Some problems remain within Muller's set. I leave it to him whether or not to maturely discuss them.
As far as playtesting goes ... Admittedly, my initial intention was just to amuse myself by disproving the consistency of Muller's unusually-high archbishop material value in relation to the other piece values within his CRC set. If indeed his archbishop material value had been as fictitious as it was radical, then this would have been readily achievable using any high-quality chess variant program such as SMIRF. No matter what test I threw at it, this never happened.

Previously, I had only used 'symmetrical playtesting'. By this I mean that the material and positions of the pieces of both players have been identical relative to one another. This is effective when playing one entire set of CRC piece values against another entire set as, for example, Reinhard Scharnagl & I have done on numerous occasions. The player that consistently wins all deep-ply (long time per move) games, alternately playing white and black, can safely be concluded to be the player using the better of the two sets of CRC piece values, since this single variable has been effectively isolated. However, this playtesting method cannot isolate which individual pieces within the set carry the most or least accurate material values. In fact, I had no problem with Muller's set of CRC piece values as a whole. The order of the material values of all of the CRC pieces was, and is, correct. However, I had a large problem with his material value for the archbishop being nearly as high as for the chancellor.

To pinpoint an unreasonably-high material value for only one piece within a CRC set required 'asymmetrical playtesting'. By this I mean that the material and positions of the pieces of both players had to differ in an appropriate manner to test the upper and lower limits of the material value of a certain piece (e.g., archbishop). This was achieved by removing select pieces from both players within the Embassy Chess setup so that BOTH players had a significant material advantage consistent with different models (i.e., Scharnagl set vs. Muller set). This was possible strictly because of the sharp contrast between the 'normal, average' and 'very high' material values for the archbishop assigned by Scharnagl and Muller, respectively. The fact that the SMIRF program implicitly uses the Scharnagl set to play both players is a control variable, not a problem, since it ensures equality in the playing strength with which both players are handled.

The player using the Scharnagl set lost every game using SMIRF MS-173h-X ... regardless of time controls, white or black player choice and all variations in excluded pieces that I could devise. I thought it was remotely possible that an intransigent positional advantage for the Muller set somehow happened to exist within the modified Embassy Chess setup that was larger than its material disadvantage. This type of catastrophe can be the curse of 'asymmetrical playtesting'. So, I experimented likewise using a few other CRC variants. Same result! The Scharnagl set lost every game. I seriously doubt that all CRC variants (or at least, the games I tested) are realistically likely to carry an intransigent positional advantage for the Muller set. If this is true, then the Muller set is provably, ideally suited to CRC notwithstanding, just for a different reason. Finally, I reconsidered my position and revised my model.
For the reasons you describe (which I mostly agree with), I do not ever use 'asymmetrical playtesting' unless that method is unavoidable. However, you should know that I used many permutations of positions within my 'missing pieces' test games to try to average out positions that may have pre-set a significant positional advantage for either player.

Yes, the fact that SMIRF currently uses your (Scharnagl) material values, with a 'normal, average' material value for the archbishop instead of a 'very high' material value (as well as the interrelated positional value given to the archbishop within SMIRF), means that both players will place greater effort than I think is appropriate into avoiding being forced into disadvantageous exchanges where they would trade their chancellor or queen for the archbishop of the opponent. Still, the order of your material values for CRC pieces agrees with the Muller model (although an archbishop-chancellor exchange is considered only slightly harmful to the chancellor player under his model). So, I think tests using SMIRF are meaningful even if I disagree substantially with the material value for one piece within your model (i.e., the archbishop).

Due to apprehension over boring my audience with irrelevant details, I did not even mention within my previous post that I also invented a variety of 10 x 8 test games, using the 10 x 8 editor available in SMIRF, that were unrelated to CRC. For example, one game consisted of 1 king & 10 pawns per player, with 9 archbishops for one player and 8 chancellors or queens for the other player. Under the Muller model, the player with the 9 archbishops had a significant material advantage. Under the Scharnagl model, the player with the 8 chancellors or 8 queens had a significant material advantage. The player with the 9 archbishops won every game. Another game consisted of 1 king & 20 pawns per player, again with 9 archbishops against 8 chancellors or queens; the player with the 9 archbishops won every game. A third game consisted of 1 king & 10 pawns per player, with 18 archbishops for one player and 16 chancellors or queens for the other; the player with the 18 archbishops won every game.

I have seen it demonstrated many times how positionally resilient the archbishop is against the chancellor and/or the queen in virtually any game you can create using SMIRF with a 10 x 8 board and a CRC piece set. When Muller assures us that he is responsibly using statistical methods similar to those employed by Larry Kaufmann, a widely-respected researcher of Chess piece values, I think we should take his word for it. Of course, I remain concerned about the reliability of his stats generated via fast time controls. However, it has now been proven to me that his method is at least sensitive enough to detect 'elephants' (i.e., large discrepancies in material values), such as exist between contrasting CRC models for the archbishop, even if it is not sensitive enough to detect 'mice' (i.e., small discrepancies in material values), so to speak.
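[The material arithmetic behind these test games can be made explicit with the two sets of values printed earlier in this thread; Scharnagl's archbishop value is not quoted here, so my own pre-revision values stand in for the 'normal, average' archbishop camp. A minimal sketch:]

    muller    = {"A": 102.94, "C": 105.88, "Q": 111.76}
    nalls_old = {"A": 70.61,  "C": 94.18,  "Q": 101.60}  # pre-revision Nalls values

    for name, v in (("Muller", muller), ("Nalls (old)", nalls_old)):
        print(f"{name:12s} 9A = {9 * v['A']:7.2f}   "
              f"8C = {8 * v['C']:7.2f}   8Q = {8 * v['Q']:7.2f}")

[With Muller's values the 9 archbishops are ahead of either opposing army (926.46 vs. 847.04 or 894.08); with the older 'average archbishop' values they are far behind (635.49 vs. 753.44 or 812.80). The games thus genuinely discriminate between the two camps.]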
Yes, your test example yields a result totally inconsistent with everyone's models for CRC piece values. [I did not run any playtest games of it since I trust you completely.] Yes, your test example could cause someone who placed too much trust in it to draw the wrong conclusion about the material values of knights vs. archbishops.

The reason your test example is unreliable (and we both agree it must be) is its 2:1 ratio of knights to archbishops. The game is a victory for the knights player simply because he/she can overrun the archbishops player and force materially-disadvantageous exchanges, despite the fact that 4 archbishops indisputably have a material value significantly greater than 8 knights. In all three of my test examples from my previous post, the ratios of archbishops to chancellors and archbishops to queens were only 9:8. Note the sharp contrast. Although I agree that a 1:1 ratio is the ideal goal, it was impossible to achieve for the purposes of the tests. I do not believe a slight disparity (1 piece) in the total number of test pieces per player is enough to make the test results highly unreliable. [Yes, feel free to invalidate my test example with 18 archbishops vs. 16 chancellors and 18 archbishops vs. 16 queens, since a 2-piece disparity existed.]

Although surely imperfect and slightly unreliable, I think the test results achieved through 'asymmetrical playtesting' or 'games with different armies' can be instructive as long as the test conditions are not pushed to the extreme. Your test example was extreme. Two out of three of my test examples were not.
Feel free to invalidate the other two test examples I (reluctantly) mentioned as well. My reason is that having ranks nearly full of archbishops, chancellors or queens in test games does not even resemble a proper CRC variant setup, with its variety and placement of pieces. Therefore, those test results cannot safely be concluded to have any bearing upon the material values of pieces in any CRC variant. Your reason is well-expressed.
The feasibility of using identical armies to calculate piece values

It has been a long time since our sets of CRC piece values have played one another (on my dual 2.4 GHz CPU server) using otherwise-identical versions of SMIRF. Obviously, the reason is that it has been a long time since a large disparity existed between our material values for any one of the CRC pieces. Recently, that has changed in the case of the archbishop.

I already have the standard version of SMIRF MS-174b-O, which uses the Scharnagl CRC piece values. Would you be willing to compile a special version of SMIRF MS-174b-O for me which uses the Nalls CRC piece values?

Capablanca Random Chess material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

Back on safe ground using 'symmetrical playtesting', the results of who wins the test games should be indicative of who is using the better set of CRC piece values.
I understand. I wondered what the 'X' & 'O' designations for recent SMIRF versions meant. Do you still possess an older version of SMIRF (of satisfactory quality to you) that uses your current CRC material values? Since there is approximately a 2.5-pawn difference between our models in the material value of the archbishop, I predict that my playtesting results would probably be worthwhile and decisive.
Your revised material values for SMIRF look fine to me. I have written them down for safekeeping. Which version will you be compiling? Of course, I do not plan to playtest anyone's material values for pieces upon the 8 x 8 board, only material values for CRC pieces upon the 10 x 8 board.
I have adequate confidence in my latest material values to ask you to publish them upon your web page (instead of my previous material values).

CRC material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

They are, in principle, similar to Muller's set for every piece, except that they run on a comparatively compressed scale. Even though I have not yet playtested them, I consider my tentative confidence rational (although admittedly premature and risky) because I trust Muller's methods of playtesting his own material values and I think my latest revisions to my model are conceptually valid.
Muller: You have my best wishes for your worthwhile effort to publish your empirical, statistical method for obtaining the material values of pieces in the ICGA Journal. My assessment is that it will surely be a much better paper than the junk [name removed] published in the same journal regarding piece values.

[The above has been edited to remove a name and/or site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
re: Muller's assessment of 5 methods of deriving material values for CRC pieces

'I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments ...'

As much playtesting as possible. Unfortunately, that amount is deficient by my standards (and yours). I have tried to compensate for marginal quantity with high quality via long time controls. You use a converse approach with the opposite emphasis. Given enough years (working with only one server), this quantity of well-played games may eventually become adequate.

'... and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them.'

You have pinpointed my greatest disappointment and frustration thus far with my ongoing work. To date, my theoretical model has not made any impressive predictions verified by playtesting. To the contrary, it has been revised, expanded and complicated many times upon discovery that it was grossly in error or out of conformity with reality. Although the foundations of the theoretical model are built upon arithmetic and geometry to the greatest extent possible, with verifiable phenomena important to the material values of pieces used logically for refinements, mathematical modelling can be misused to postulate and describe in detail almost any imaginable non-existent phenomenon. Consider, for example, the Ptolemaic model of the solar system.
'I never found any effect of the time control on the scores I measure for some material imbalance. Within statistical error, the combinations I tries produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time...'

The additional time I normally give to playtesting games to improve the move quality is partially wasted because I can only control the time per move, instead of the number of plies completed, using most chess variant programs. This usually results in the time expiring while the program is working on an incomplete ply. It then prematurely spits out a move representative of an incomplete tour of the moves available within that ply, at a random fraction of that ply. Since there is always more than one move (often a few) under evaluation as being the best possible move [otherwise, the chosen move would have already been executed], this means that any move on this 'list of top candidates' is equally likely to be executed. Here are two typical scenarios that should cover what usually happens:

A. If the list of top candidates in an 11-ply search consists of 6 moves, where the list of top candidates in a 10-ply search consists of 7 moves, then only 1 discovered-to-be-less-than-the-best move has been successfully excluded and cannot be executed. Of course, an 11-ply search completion may typically require an estimated 8-10 times as much time as the search completions for all previous plies (1-ply thru 10-ply) added together.

B. If the list of top candidates in an 11-ply search consists of 7 moves [moreover, the exact same 7 moves] just as in the preceding 10-ply search, then there is no benefit at all in expending 8-10 times as much time.

The reason I endure this brutal waiting game is not for the purely masochistic experience but because the additional time has a tangible chance (although no guarantee) of yielding a better move on every occasion. Throughout the numerous moves within a typical game, it can realistically be expected to yield better moves on dozens of occasions.

We usually playtest for purposes at opposite extremes of the spectrum, yet I regard our efforts as complementary toward building a complete picture involving the material values of pieces. You use 'asymmetrical playtesting' with unequal armies at fast time controls, then collect and analyze statistics ... to determine a range, with a margin of error, for individual material piece values. I remain amazed (although I believe you) that you actually obtain any meaningful results at all via games that are played so quickly that the AI players do not have 'enough time to think', while playing games so complex that every computer (and person) needs time to think to play with minimal competence. Can you explain to me, in a way I can understand, how and why you are able to successfully obtain valuable results using this method? The quality of your results was utterly surprising to me. I apologize for totally doubting you when you introduced your results and mentioned how you obtained them.

I use 'symmetrical playtesting' with identical armies at very slow time controls to obtain the best moves realistically possible from an evaluation function, thereby giving me a winner (one that is, by some margin, more likely than not deserving) ...
to determine which of two sets of material piece values is probably (yet not certainly) better. Nonetheless, as more games are likewise played, if they present a clear pattern, then the results become more likely to be reliable, decisive and indicative of the true state of affairs.

The chances of flipping a coin once and it landing 'heads' are equal to it landing 'tails'. However, the chances of flipping a coin 7 times and it landing 'heads' all 7 times in a row are 1/128. Now, replace the concepts 'heads' and 'tails' with 'victory' and 'defeat'. I presume you follow my point. The results of only a modest number of well-played games can establish their significance beyond chance, to the satisfaction of reasonable probability for a rational human mind. [Most of us, including me, do not need any better than 95%-99% confidence to become convinced that there is a real correlation at work, even though such is far short of an absolute 100% mathematical proof.]

In my experience, I have found that using any less than 10 minutes per move will cause at least one instance within a game when an AI player makes a move that is obvious to me as (and correctly assessed as truly being) a poor move. Whenever this occurs, it renders my playtesting results tainted and useless for my purposes. Sometimes this occurs during a game played at 30 minutes per move. However, this rarely occurs during a game played at 90 minutes per move. For my purposes, it is critically important above all other considerations that the winner of these time-consuming games be correctly determined 'most of the time', since 'all of the time' is impossible to assure. I must do everything within my power to get as far from 50% toward 100% reliability in correctly determining the winner. Hence, I am compelled to play test games at nearly the longest survivable time per move to minimize the chances that any move played during a game will be an obviously poor move that could have changed the destiny of the game, thereby causing the player that should have won to become the loser instead. In fact, I feel as if I have no choice under the circumstances.
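[The 'incomplete ply' behavior described above comes from iterative deepening: an engine searches to depth 1, then 2, then 3, and so on, and when the clock runs out mid-iteration most engines fall back on the best move of the last fully completed iteration (some, as described above, use whatever the partial iteration produced). A minimal sketch, with a toy search() standing in for a real alpha-beta search and a made-up branching factor of 5:]

    import time

    def search(position, depth):
        # Stand-in for a fixed-depth alpha-beta search; it merely burns an
        # amount of time that grows ~5x per extra ply, as deeper searches do.
        time.sleep(0.001 * 5 ** depth)
        return f"best move found at depth {depth}"

    def choose_move(position, seconds):
        deadline = time.time() + seconds
        best, depth = None, 1
        while True:
            result = search(position, depth)   # may overrun the deadline
            if time.time() >= deadline:
                break                          # ply cut short: discard it
            best, depth = result, depth + 1    # keep last COMPLETED ply
        return best

    print(choose_move("some CRC position", 2.0))

[This also makes the logarithmic-depth point concrete: under these assumptions, doubling the time budget buys well under one extra ply.]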
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled with the CRC material values of Scharnagl, Muller & Nalls, I began playtesting something else that interested me using SMIRF MS-174b-O. I am concerned that the material value of the rook (especially compared to the queen) amongst CRC pieces in the Muller model is too low:

Muller model: rook 55.88, queen 111.76. This means that 2 rooks exactly equal 1 queen in material value.

Scharnagl model: rook 55.71, queen 91.20. This means that 2 rooks have a material value (111.42) 22.17% greater than 1 queen.

Nalls model: rook 59.43, queen 103.05. This means that 2 rooks have a material value (118.86) 15.34% greater than 1 queen.

Essentially, the Scharnagl & Nalls models agree in predicting victories in a CRC game for the player missing 1 queen yet possessing 2 rooks. By contrast, the Muller model predicts draws (or an approximately equal number of victories and defeats) for either player. I put this extraordinary claim to the test by playing 2 games at 10 minutes per move on an appropriately altered Embassy Chess setup, with the missing-1-queen player and the missing-2-rooks player each having a turn at white and black. The missing-2-rooks player lost both games and was always behind. They were not even long games, at 40-60 moves.

Muller: I think you need to moderately raise the material value of your rook in CRC. It is out of its proper relation with the other material values within the set.
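[The three comparisons above, computed directly from the quoted values; a minimal check:]

    models = {
        "Muller":    {"R": 55.88, "Q": 111.76},
        "Scharnagl": {"R": 55.71, "Q": 91.20},
        "Nalls":     {"R": 59.43, "Q": 103.05},
    }
    for name, v in models.items():
        surplus = (2 * v["R"] / v["Q"] - 1) * 100
        print(f"{name:10s} 2R = {2 * v['R']:6.2f} vs Q = {v['Q']:6.2f} "
              f"-> 2R ahead by {surplus:+.2f}%")

[Output: +0.00% (Muller), +22.17% (Scharnagl), +15.34% (Nalls), matching the percentages quoted above.]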
'You hardly have the possibility of trading it before there are open files. So it stands to reason that you might as well use the higher value during the entire game.'

Well, I understand and accept your reasons for leaving your lower rook value in CRC as is. It is interesting that you thoroughly understand and accept the reasons of others for using a higher rook value in CRC as well. Ultimately, is not the higher rook value in CRC more practical and useful to the game, by your own logic?

'... if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING.'

I never claimed or implied that only 2 games at 10 minutes per move mean everything, or even a great deal (enough to satisfy probability overwhelmingly). However, they mean significantly more than nothing. I cannot accept your opinion, based upon a purely statistical viewpoint, since it comes at the exclusion of another applicable mathematical viewpoint. They definitely mean something ... although exactly how much is not easily known or quantified mathematically.

'I don't even look at results before I have at least 100 games, because before they are about as likely to be the reverse from what they will eventually be, as not.'

Statistically, when dealing with speed chess games populated exclusively with virtually random moves ... YES, I can understand and agree with you requiring a minimum of 100 games. However, what you are doing is at the opposite extreme from what I am doing via my playtesting method. Surely you would agree that IF I conducted only 2 games with perfect play by both players, those results would mean EVERYTHING. Unfortunately, with state-of-the-art computer hardware and chess variant programs (such as SMIRF), this is currently impossible and will remain impossible for centuries, if not millennia. Nonetheless, games played at 100 minutes per move (for example) have a much greater probability of correctly determining which player has a definite, significant advantage than games played at 10 seconds per move (for example). Even though these 'deep games' play at nowhere near 600 times better quality than these 'shallow games', as one might naively expect (due to a non-linear correlation), they are far from random events (to which statistical methods would then be fully applicable). Instead, they occupy a middleground between perfect-play games and totally random games. [In my studied opinion, the example 'middleground games' are more similar and closer to perfect-play games than to totally random games.] To date, much is unknown to combinatorial game theory about the nature of these 'middleground games'.

Remember the analogy to coin flips that I gave you? Well, in fact, the playtest games I usually run go far above and beyond such random events in their probable significance per event. If the SMIRF program running at 90 minutes per move cast all of its moves randomly and without any intelligence at all (as a perfect woodpusher), only then would my 'coin flip' analogy be fully applicable. Therefore, when I estimate that it would require 6 games (for example) to determine, IF a player with a given set of piece values loses EVERY game, that there is only a 63/64 chance that the result is meaningful (instead of random bad luck), I am being conservative in the extreme.
The true figure is almost surely higher than a 63/64 chance. By the way, if you doubt that SMIRF's level of play is intelligent and non-random, then play a CRC variant of your choice against it at 90 minutes per move. After you lose repeatedly, you may not be able to credit yourself with being intelligent either (although you should) ... if you insist upon holding to an impractically high standard to define the word.

'If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from its true win probability.'

For a 2-game match ... I agree. However, this may not be true for a 4-game, 6-game or 8-game match, and it surely is not true to the extremes you imagine. Everything is critically dependent upon the specifications of the match. The number of games played (of course), the playing strength or quality of the program used, the speed of the computer and the time or ply depth per move are the most important factors.

'Play 100 games, and the error in the observed score is reasonable certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then you can see with reasonable confidence if your observations differ from mine.'

It would require an estimated 20 years for me to generate 100 games of the quality (and time controls) I am accustomed to and somewhat satisfied with. Unfortunately, it is not that important to me just to get you to pay attention to the results for the benefit of only your piece values model. As a practical concern to you: everyone else who is working to refine quality piece values models in FRC and CRC will likely have surpassed your achievements by then IF you refuse to learn anything from the results of others who use different, yet valid and meaningful, methods of playtesting and mathematical analysis.
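[For what it is worth, the 4.5% figure quoted above is the standard error of a match score. Assuming a per-game standard deviation of roughly 0.45 points (a bit under the 0.5 of pure win/loss, because some games are drawn), the error after N independent games is about 0.45/sqrt(N). A minimal sketch:]

    def score_error(n_games, per_game_sd=0.45):
        # Standard error of the mean score after n independent games.
        return per_game_sd / n_games ** 0.5

    for n in (2, 4, 10, 100):
        print(f"{n:4d} games -> score known to about +/- {100 * score_error(n):.1f}%")

[Under these assumptions: 2 games gives roughly +/- 32% and 100 games roughly +/- 4.5%, which appears to be where both of Muller's figures in this exchange come from.]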
'Of course, that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals.'

No, it is not easily quantified. Some things of numerical importance, as well as geometric importance, that we try to understand or prove in the study of chess variants are NOT covered or addressed by statistics. I wish our field of interest were that simple (relatively speaking) and approachable, but it is far more complicated and interdisciplinary. All you talk about is statistics. Is this because statistics is all you know well?

'That difference just can't be seen with two games. Play 100. There is no shortcut.'

I agree. Not with only 2 games. However ... with only 4 games, IF they were ALL victories or ALL defeats for the player using a given piece values model, I could tell you with confidence that there is at least a 15/16 chance the given piece values model is stronger or weaker, respectively, than the piece values model used by its opponent. [Otherwise, the results are inconclusive and useless.] Furthermore, based upon the average number of moves per game required for victory or defeat, compared to the established average number of moves in a long, close game, I could probably correctly estimate whether one model was a little or a lot stronger or weaker, respectively, than the other model. Thus, I will not play 100 games, because there is no pressing, rational need to reduce the 'chance of random good-bad luck' to the ridiculously-low value of the inverse of (base 2 to exponent 100). Is there anything about the odds associated with 'flipping a coin' that is beyond your ability to understand? This is a fundamental mathematical concept, applicable without reservation to symmetrical playtesting. In any case, it is a legitimate 'shortcut' that I can and will use freely.

'Even perfect play doesn't help. We do have perfect play for all 6-men positions.'

I meant perfect play throughout an entire game of a CRC variant involving 40 pieces initially. That is why I used the word 'impossible' with reference to state-of-the-art computer technology.

'This is approximately master-level play.'

Well, if you are getting master-level play from Joker80 with speed chess games, then I am surely getting a superior level of play from SMIRF with much longer times and deeper plies per move. You see, I used the term 'virtually random moves' appropriately, in a comparative context based upon my experience.

'Doesn't matter if you play at an hour per move, a week per move, a year per move, 100 year per move. The error will remain >=32%. So if you want to play 100 years per move, fine. But you will still need 100 games.'

Of course, it matters a lot. If the program is well-written, then the longer it runs per move, the more plies it completes per move and, consequently, the better the moves it makes. Hence, the entire game played will progressively approach the ideal of perfect play ... even though this finite goal is impossible to attain. Incisive, intelligent, resourceful moves must NOT be confused with, or dismissed as, purely random moves.
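[All of the 'coin flip' figures traded in this exchange come from the same computation: under the null hypothesis of two evenly matched players, N decisive games all won by the same side occur with probability (1/2)^N. A minimal check (note that this assumes independent games, no draws and an unbiased 50/50 null, which is exactly where the two sides disagree about its applicability):]

    for n in (4, 6, 7, 10, 100):
        print(f"{n:3d} straight wins: chance under the null = 1/{2**n}")

[This reproduces the 15/16 (4 games), 63/64 (6 games), 1/128 (7 coin flips), 1/1024 (10 games) and 1 in 2^100 figures that appear throughout these comments.]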
Although I could humbly limit myself to applying only statistical methods, I am totally justified, in this case, in more aggressively using the 'probabilities associated with N coin flips ALL having the same result' as an incomplete, minimum value, before even taking the playing strength of SMIRF at extremely long time controls into account to estimate a complete, maximum value.

'The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus equally reliably be determined with games of any duration.'

You are completely lacking in the prerequisite patience and determination to have EVER consistently used long enough time controls to see any benefit whatsoever in doing so. If you had ever done so, then you would realize (as everyone else who has done so realizes) that the quality of the moves improves, and even if the winning probability has not changed much numerically in your experience, the figure you obtain is more reliable. [I cannot prove to you that this 'invisible' benefit exists statistically. Instead, it is an important concept that you need to understand in its own terms. This is essential to what most playtesters do, with the notable exception of you. If you want to understand what I do and why, then you must come to grips with this reality.]
CRC piece values tournament
http://www.symmetryperfect.com/pass/
Just push the 'download now' button.

Game #1: Scharnagl vs. Muller
10 minutes per move, SMIRF MS-174c
Result: inconclusive. Draw after 87 moves; perpetual check declared by black.
'This discussion is pointless.'

On this one occasion, I agree with you. However, I cannot just let you get away with some of your most outrageous remarks to date. So, unfortunately, this discussion is not yet over.

'First you should have results, then it becomes possible to talk about what they mean. You have no result.'

Of course, I have a result! The result is obviously the game itself, as a win, loss or draw, for the purposes of comparing the playing strengths of two players using different sets of CRC piece values. The result is NOT statistical in nature. Instead, the result is probabilistic in nature. I have thoroughly explained this purpose and method to you. I understand it. Reinhard Scharnagl understands it. You do not understand it. I can accept that. However, instead of admitting that you do not understand it, you claim there is nothing to understand.

'Two sets of piece values as different as day and night, and the only thing you can come up with is that their comparison is 'inconclusive'.'

Yes. Draws make it impossible to determine which of two sets of piece values is stronger or weaker. However, by increasing the time (and plies) per move, smaller differences in playing strength can sometimes be revealed with 'conclusive' results. I will attempt the next pair of Scharnagl vs. Muller and Muller vs. Scharnagl games at 30 minutes per move. Knowing how much you appreciate my efforts on your behalf motivates me.

'Talk about pathetic: even the two games you played are the same.'

Only one game was played. The logs you saw were produced by the Scharnagl (standard) version of SMIRF for the white player and the Muller (special) version of SMIRF for the black player. The game is handled in this manner to prevent time from expiring without computation occurring.

'... does your test setup s*ck!'

What, now you hate Embassy Chess too? Take up this issue with Kevin Hill.
Since I had to endure one of your long bedtime stories (to be sure), you are going to have to endure one of mine. Yet unlike yours [too incoherent to merit a reply], mine carries an important point. Consider it a test of your common sense. Here is a scenario ...

01. It is the year 2500 AD.
02. Androids exist.
03. Androids cannot tell lies.
04. Androids can cheat, though.
05. Androids are extremely intelligent in technical matters.
06. Your best friend is an android.
07. It tells you that it won the lottery.
08. You verify that it won the lottery.
09. It tells you that it purchased only one lottery ticket.
10. You verify that it purchased only one lottery ticket.
11. The chance of winning the lottery with only one ticket is 1 out of 100 million.
12. It tells you that it cheated to win the lottery by hacking into its computer system immediately after the winning numbers were announced, purchasing one winning ticket and back-dating the time of the purchase.

You have only two choices as to what to believe happened:

A. The android actually won the lottery by cheating.
B. The android actually won the lottery by good luck. The android was mistaken in thinking it successfully cheated.

The chance of 'A' being true is 99,999,999 out of 100,000,000. The chance of 'B' being true is 1 out of 100,000,000.

I would place my bet upon 'A' being true because I do not believe such unlikely coincidences will actually occur. You would place your bet upon 'B' being true because you do not believe such unlikely coincidences have any statistical significance whatsoever.

I make this assessment of your judgment ability fairly, because you think it is a meaningless result if a player with one set of CRC piece values wins against its opponent 10 times in a row, even though the chance of it being 'random good luck' is indisputably only 1 out of 1024. By the way ... base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376. Can you see how ridiculous your demand of 100 games is?
'Is this story meant to illustrate that you have no clue as to how to calculate statistical significance?'

No. This story is meant to illustrate that you have no clue as to how to calculate probabilistic significance ... and it worked perfectly.

There you go again, missing the point entirely and ranting about probabilities not being proper statistics.
To anyone who is interested: my playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug, which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.
Since Muller's Joker80 has recently established itself, via 'The Battle Of The (Unspeakables)' tournament, as the best free CRC program in the world, I checked it out. I must report that setting up Winboard F (also written by Muller) to use it was straightforward, with helpful documentation. Generally, I am finding the features of Joker80 to be versatile and capable for any reasonable use.