Comments by DerekNalls
Your revised material values for SMIRF look fine to me. I have written them down for safekeeping. Which version will you be compiling? Of course, I do not plan to playtest anyone's material values for pieces upon the 8 x 8 board, only material values for CRC pieces upon the 10 x 8 board.
I have adequate confidence in my latest material values to ask you to publish them upon your web page (instead of my previous material values). CRC material values of pieces http://www.symmetryperfect.com/shots/values-capa.pdf They are, in principle, similar to Muller's set for every piece except that they run on a comparatively compressed scale. Even though I have not yet playtested them, I consider my tentative confidence rational (although admittedly premature and risky) because I trust Muller's methods of playtesting his own material values and I think my latest revisions to my model are conceptually valid.
Muller: You have my best regards toward your worthwhile effort to publish your empirical, statistical method for obtaining the material values of pieces in the ICGA Journal. My assessment is that it will surely be a much better paper than the junk [name removed] published in the same journal regarding piece values. [The above has been edited to remove a name and/or site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
re: Muller's assessment of 5 methods of deriving material values for CRC pieces

'I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments ...'

As much playtesting as possible. Unfortunately, that amount is deficient by my standards (and yours). I have tried to compensate for marginal quantity with high quality via long time controls. You use the converse approach, with the opposite emphasis. Given enough years (working with only one server), this quantity of well-played games may eventually become adequate.

'... and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them.'

You have pinpointed my greatest disappointment and frustration thus far with my ongoing work. To date, my theoretical model has not made any impressive predictions verified by playtesting. To the contrary, it has been revised, expanded and complicated many times upon discovery that it was grossly in error or out of conformity with reality. Although the foundations of the theoretical model are built upon arithmetic and geometry to the greatest extent possible, with verifiable phenomena important to the material values of pieces used logically for refinements, mathematical modelling can be misused to postulate and describe in detail almost any imaginable non-existent phenomenon. Consider, for example, the Ptolemaic model of the solar system.
'I never found any effect of the time control on the scores I measure for some material imbalance. Within statistical error, the combinations I tried produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time...'
_________
The additional time I normally give to playtesting games to improve the move quality is partially wasted because I can only control the time per move, instead of the number of plies completed, using most chess variant programs. This usually results in the time expiring while the program is still working on an incomplete ply. It then prematurely spits out a move representative of an incomplete tour of the moves available within that ply, cut off at a random fraction of that ply. Since there is always more than one move (often a few to several) under evaluation as being the best possible move [otherwise, the chosen move would have already been executed], any move on this 'list of top candidates' is equally likely to be executed. Here are two typical scenarios that should cover what usually happens:

A. If the list of top candidates in an 11-ply search consists of 6 moves where the list of top candidates in a 10-ply search consists of 7 moves, then only 1 discovered-to-be-less-than-the-best move has been successfully excluded and cannot be executed. Of course, completing an 11-ply search may typically require an estimated 8-10 times as much time as the search completions for all previous plies (1-ply through 10-ply) added together.

OR

B. If the list of top candidates in an 11-ply search consists of 7 moves [moreover, the exact same 7 moves] as in the preceding 10-ply search, then there is no benefit at all in expending 8-10 times as much time.
______________________________________________________________
The reason I endure this brutal waiting game is not purely for the masochistic experience but because the additional time has a tangible chance (although no guarantee) of yielding a better move on every occasion. Throughout the numerous moves within a typical game, it can realistically be expected to yield better moves on dozens of occasions.

We usually playtest for purposes at opposite extremes of the spectrum, yet I regard our efforts as complementary toward building a complete picture of the material values of pieces.

You use 'asymmetrical playtesting' with unequal armies at fast time controls, then collect and analyze statistics ... to determine a range, with a margin of error, for individual material piece values. I remain amazed (although I believe you) that you actually obtain any meaningful results at all via games that are played so quickly that the AI players do not have 'enough time to think', in games so complex that every computer (and person) needs time to think to play with minimal competence. Can you explain to me, in a way I can understand, how and why you are able to obtain valuable results using this method? The quality of your results was utterly surprising to me. I apologize for totally doubting you when you introduced your results and mentioned how you obtained them.

I use 'symmetrical playtesting' with identical armies at very slow time controls to obtain the best moves realistically possible from an evaluation function, thereby giving me a winner (that is, by some margin, more likely than not deserving) ... to determine which of two sets of material piece values is probably (yet not certainly) better. Nonetheless, as more games are likewise played ... if they present a clear pattern, then the results become more likely to be reliable, decisive and indicative of the true state of affairs.
The chances of flipping a coin once and it landing 'heads' are equal to it landing 'tails'. However, the chances of flipping a coin 7 times and it landing 'heads' all 7 times in a row are 1/128. Now, replace the concepts 'heads' and 'tails' with 'victory' and 'defeat'. I presume you follow my point. The results of only a modest number of well-played games can definitely establish their significance beyond chance and to the satisfaction of reasonable probability for a rational human mind. [Most of us, including me, do not need any better than a 95%-99% success to become convinced that there is a real correlation at work even though such is far short of an absolute 100% mathematical proof.] In my experience, I have found that using any less than 10 minutes per move will cause at least one instance within a game when an AI player makes a move that is obvious to me (and correctly assessed as truly being) a poor move. Whenever this occurs, it renders my playtesting results tainted and useless for my purposes. Sometimes this occurs during a game played at 30 minutes per move. However, this rarely occurs during a game played at 90 minutes per move. For my purposes, it is critically important above all other considerations that the winner of these time-consuming games be correctly determined 'most of the time' since 'all of the time' is impossible to assure. I must do everything within my power to get as far from 50% toward 100% reliability in correctly determining the winner. Hence, I am compelled to play test games at nearly the longest survivable time per move to minimize the chances that any move played during a game will be an obviously poor move that could have changed the destiny of the game thereby causing the player that should have won to become the loser, instead. In fact, I feel as if I have no choice under the circumstances.
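As a minimal illustration of this coin-flip arithmetic (the helper name is mine, not from the discussion):

```python
from fractions import Fraction

def sweep_confidence(n_games):
    """Under the null hypothesis that both sides are equally strong
    (each decisive game a fair coin flip, draws ignored), return the
    probability that an n-game sweep is NOT pure luck: 1 - (1/2)**n."""
    return 1 - Fraction(1, 2) ** n_games

print(sweep_confidence(7))   # 127/128 -- 7 'heads' in a row has a 1/128 chance
print(sweep_confidence(10))  # 1023/1024
```

The same one-liner reproduces the other figures quoted in this exchange (6 games: 63/64; 4 games: 15/16).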
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled with the CRC material values of Scharnagl, Muller & Nalls, I began playtesting something else that interested me using SMIRF MS-174b-O. I am concerned that the material value of the rook (especially compared to the queen) amongst CRC pieces in the Muller model is too low:

Muller model: rook 55.88, queen 111.76. This means that 2 rooks exactly equal 1 queen in material value.
Scharnagl model: rook 55.71, queen 91.20. This means that 2 rooks have a material value (111.42) 22.17% greater than 1 queen.
Nalls model: rook 59.43, queen 103.05. This means that 2 rooks have a material value (118.86) 15.34% greater than 1 queen.

Essentially, the Scharnagl & Nalls models agree in predicting victory in a CRC game for the player missing 1 queen yet possessing 2 rooks. By contrast, the Muller model predicts draws (or an approximately equal number of victories and defeats) for either player. I put this extraordinary claim to the test by playing 2 games at 10 minutes per move on an appropriately altered Embassy Chess setup, with the missing-1-queen player and the missing-2-rooks player each having a turn at white and black. The missing-2-rooks player lost both games and was always behind. They were not even long games, at 40-60 moves. Muller: I think you need to moderately raise the material value of your rook in CRC. It is out of its proper relation with the other material values within the set.
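The percentage figures above follow directly from the quoted values; a quick check (the function name is mine):

```python
def two_rooks_vs_queen(rook, queen):
    """Percentage by which two rooks exceed one queen in material value."""
    return round((2 * rook / queen - 1) * 100, 2)

print(two_rooks_vs_queen(55.88, 111.76))  # Muller:    0.0  (exactly equal)
print(two_rooks_vs_queen(55.71, 91.20))   # Scharnagl: 22.17
print(two_rooks_vs_queen(59.43, 103.05))  # Nalls:     15.34
```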
'You hardly have the possibility of trading it before there are open files. So it stands to reason that you might as well use the higher value during the entire game.'

Well, I understand and accept your reasons for leaving your lower rook value in CRC as is. It is interesting that you thoroughly understand and accept the reasons of others for using a higher rook value in CRC as well. Ultimately, is not the higher rook value in CRC more practical and useful to the game, by your own logic?
_____________________________
'... if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING.'

I never falsely claimed or implied that only 2 games at 10 minutes per move mean everything, or even mean a great deal (enough to satisfy probability overwhelmingly). However, they mean significantly more than nothing. I cannot accept your opinion, based upon a purely statistical viewpoint, since it comes at the exclusion of another applicable mathematical viewpoint. They definitely mean something ... although exactly how much is not easily known or quantified (measured) mathematically.
__________________________________________________
'I don't even look at results before I have at least 100 games, because before that they are about as likely to be the reverse of what they will eventually be, as not.'

Statistically, when dealing with speed chess games populated exclusively with virtually random moves ... YES, I can understand and agree with you requiring a minimum of 100 games. However, what you are doing is at the opposite extreme from what I am doing via my playtesting method. Surely you would agree that IF I conducted only 2 games with perfect play by both players, those results would mean EVERYTHING. Unfortunately, with state-of-the-art computer hardware and chess variant programs (such as SMIRF), this is currently impossible and will remain impossible for centuries or millennia.
Nonetheless, games played at 100 minutes per move (for example) have a much greater probability of correctly determining which player has a definite, significant advantage than games played at 10 seconds per move (for example). Even though these 'deep games' are of nowhere near 600-times-better quality than these 'shallow games', as one might naively expect (due to a non-linear correlation), they are far from random events (to which statistical methods would then be fully applicable). Instead, they occupy a middle ground between perfect-play games and totally random games. [In my studied opinion, the example 'middleground games' are more similar and closer to perfect-play games than to totally random games.] To date, much is unknown to combinatorial game theory about the nature of these 'middleground games'.

Remember the analogy to coin flips that I gave you? Well, in fact, the playtest games I usually run go far above and beyond such random events in their probable significance per event. If the SMIRF program running at 90 minutes per move cast all of its moves randomly and without any intelligence at all (as a perfect woodpusher), only then would my 'coin flip' analogy be fully applicable. Therefore, when I estimate that it would require 6 games (for example) for me to determine, IF a player with a given set of piece values loses EVERY game, that there is at least a 63/64 chance that the result is meaningful (instead of random bad luck), I am being conservative in the extreme. The true figure is almost surely higher than a 63/64 chance. By the way, if you doubt that SMIRF's level of play is intelligent and non-random, then play a CRC variant of your choice against it at 90 minutes per move. After you lose repeatedly, you may not be able to credit yourself with being intelligent either (although you should) ... if you insist upon holding to an impractically high standard for the word.
______
'If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from its true win probability.'

For a 2-game match ... I agree. However, this may not be true for a 4-game, 6-game or 8-game match, and it surely is not true to the extremes you imagine. Everything is critically dependent upon the specifications of the match. The number of games played (of course), the playing strength or quality of the program used, the speed of the computer and the time or ply depth per move are the most important factors.
_________________________________________________________
'Play 100 games, and the error in the observed score is reasonably certain (68% of the cases) to be below 4.5% ~ 1/3 Pawn, so 16 cP per Rook. Only then can you see with reasonable confidence if your observations differ from mine.'

It would require an estimated 20 years for me to generate 100 games with the quality (and time controls) I am accustomed to and somewhat satisfied with. Unfortunately, it is not that important to me just to get you to pay attention to the results for the benefit of only your piece values model. As a practical concern to you: everyone else who is working to refine quality piece values models in FRC and CRC will likely have surpassed your achievements by then IF you refuse to learn anything from the results of others who use playtesting and mathematical-analysis methods different from yours, yet valid and meaningful.
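For comparison, Muller's '4.5% ~ 1/3 Pawn' figure is just the standard error of a match score. A sketch, where the 40% win / 20% draw split is my illustrative assumption (any split with the same per-game variance gives the same number):

```python
import math

def score_std_error(n_games, p_win=0.4, p_draw=0.2):
    """Standard error of the mean score (win=1, draw=0.5, loss=0)
    over n independent games."""
    mean = p_win + 0.5 * p_draw
    variance = p_win + 0.25 * p_draw - mean ** 2  # E[x^2] - (E[x])^2
    return math.sqrt(variance / n_games)

print(round(score_std_error(100), 3))  # 0.045 -- i.e. ~4.5% of a game point
```

Halving the error requires four times as many games, which is why the statistical route is so expensive at long time controls.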
'Of course, that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals.'

No, it is not easily quantified. Some things of numerical as well as geometric importance that we try to understand or prove in the study of chess variants are NOT covered or addressed by statistics. I wish our field of interest were that simple (relatively speaking) and approachable, but it is far more complicated and interdisciplinary. All you talk about is statistics. Is this because statistics is all you know well?
___________
'That difference just can't be seen with two games. Play 100. There is no shortcut.'

I agree; not with only 2 games. However ... with only 4 games, IF they were ALL victories or defeats for the player using a given piece values model, I could tell you with confidence that there is at least a 15/16 chance the given model is stronger or weaker, respectively, than the model used by its opponent. [Otherwise, the results are inconclusive and useless.] Furthermore, based upon the average number of moves per game required for victory or defeat, compared to the established average number of moves in a long, close game, I could probably estimate correctly whether one model was a little or a lot stronger or weaker, respectively, than the other. Thus, I will not play 100 games, because there is no pressing, rational need to reduce the 'chance of random good-bad luck' to the ridiculously low value of 'the inverse of (base 2 to exponent 100)'. Is there anything about the odds associated with 'flipping a coin' that is beyond your ability to understand? This is a fundamental mathematical concept applicable without reservation to symmetrical playtesting. In any case, it is a legitimate 'shortcut' that I can and will use freely.
________________
'Even perfect play doesn't help. We do have perfect play for all 6-men positions.'
I meant perfect play throughout an entire game of a CRC variant involving 40 pieces initially. That is why I used the word 'impossible' with reference to state-of-the-art computer technology.
_______________________________________________________
'This is approximately master-level play.'

Well, if you are getting master-level play from Joker80 with speed chess games, then I am surely getting a superior level of play from SMIRF with much longer times and deeper plies per move. You see, I used the term 'virtually random moves' appropriately, in a comparative context based upon my experience.
_____________________________________________
'Doesn't matter if you play at an hour per move, a week per move, a year per move, 100 years per move. The error will remain >=32%. So if you want to play 100 years per move, fine. But you will still need 100 games.'

Of course, it matters a lot. If the program is well-written, then the longer it runs per move, the more plies it completes per move and, consequently, the better the moves it makes. Hence, the entire game played will progressively approach the ideal of perfect play ... even though this goal is impossible to attain. Incisive, intelligent, resourceful moves must NOT be confused with or dismissed as purely random moves. Although I could humbly limit myself to applying only statistical methods, I am totally justified, in this case, in more aggressively using the 'probabilities associated with N coin flips ALL with the same result' as an incomplete, minimum value, before even taking the playing strength of SMIRF at extremely long time controls into account to estimate a complete, maximum value.
______________________________________________________________
'The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus equally reliably be determined with games of any duration.'
You are obviously lacking completely in the prerequisite patience and determination to have EVER consistently used long enough time controls to see any benefit whatsoever in doing so. If you had ever done so, then you would realize (as everyone else who has done so realizes) that the quality of the moves improves and even if the winning probability has not changed much numerically in your experience, the figure you obtain is more reliable. [I cannot prove to you that this 'invisible' benefit exists statistically. Instead, it is an important concept that you need to understand in its own terms. This is essential to what most playtesters do, with the notable exception of you. If you want to understand what I do and why, then you must come to grips with this reality.]
CRC piece values tournament
http://www.symmetryperfect.com/pass/
Just push the 'download now' button.

Game #1: Scharnagl vs. Muller, 10 minutes per move, SMIRF MS-174c.
Result: inconclusive. Draw by perpetual check, declared by black after 87 moves.
'This discussion is pointless.' On this one occasion, I agree with you. However, I cannot just let you get away with some of your most outrageous remarks to date. So, unfortunately, this discussion is not yet over. ____________________________________________ 'First you should have results, then it becomes possible to talk about what they mean. You have no result.' Of course, I have a result! The result is obviously the game itself as a win, loss or draw for the purposes of comparing the playing strengths of two players using different sets of CRC piece values. The result is NOT statistical in nature. Instead, the result is probabilistic in nature. I have thoroughly explained this purpose and method to you. I understand it. Reinhard Scharnagl understands it. You do not understand it. I can accept that. However, instead of admitting that you do not understand it, you claim there is nothing to understand. ______________________________________ 'Two sets of piece values as different as day and night, and the only thing you can come up with is that their comparison is 'inconclusive'.' Yes. Draws make it impossible to determine which of two sets of piece values is stronger or weaker. However, by increasing the time (and plies) per move, smaller differences in playing strength can sometimes be revealed with 'conclusive' results. I will attempt the next pair of Scharnagl vs. Muller and Muller vs. Scharnagl games at 30 minutes per move. Knowing how much you appreciate my efforts on your behalf motivates me. ___________________________________________________ 'Talk about pathetic: even the two games you played are the same.' Only one game was played. The logs you saw were produced by the Scharnagl (standard) version of SMIRF for the white player and the Muller (special) version of SMIRF for the black player. The game is handled in this manner to prevent time from being expired without computation occurring. ___________________________________________________ '... 
does your test setup s*ck!' What, now you hate Embassy Chess too? Take up this issue with Kevin Hill.
Since I had to endure one of your long bedtime stories (to be sure), you are going to have to endure one of mine. Yet unlike yours [too incoherent to merit a reply], mine carries an important point. Consider it a test of your common sense. Here is a scenario ...

01. It is the year 2500 AD.
02. Androids exist.
03. Androids cannot tell lies.
04. Androids can cheat, though.
05. Androids are extremely intelligent in technical matters.
06. Your best friend is an android.
07. It tells you that it won the lottery.
08. You verify that it won the lottery.
09. It tells you that it purchased only one lottery ticket.
10. You verify that it purchased only one lottery ticket.
11. The chance of winning the lottery with only one ticket is 1 out of 100 million.
12. It tells you that it cheated to win the lottery by hacking into its computer system immediately after the winning numbers were announced, purchasing one winning ticket and back-dating the time of the purchase.

You have only two choices as to what to believe happened:

A. The android actually won the lottery by cheating.
OR
B. The android actually won the lottery by good luck. The android was mistaken in thinking it successfully cheated.

The chance of 'A' being true is 99,999,999 out of 100,000,000. The chance of 'B' being true is 1 out of 100,000,000.

I would place my bet upon 'A' being true because I do not believe such unlikely coincidences will actually occur. You would place your bet upon 'B' being true because you do not believe such unlikely coincidences have any statistical significance whatsoever.
_________________________________________ I make this assessment of your judgment ability fairly because you think it is a meaningless result if a player with one set of CRC piece values wins against its opponent 10-times-in-a-row even as the chance of it being 'random good luck' is indisputably only 1 out of 1024. By the way ... base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376. Can you see how ridiculous your demand of 100 games is?
'Is this story meant to illustrate that you have no clue as to how to calculate statistical significance?' No. This story is meant to illustrate that you have no clue as to how to calculate probabilistic significance ... and it worked perfectly. ________________________________________________________ There you go again. Missing the point entirely and ranting about probabilities not being proper statistics.
To anyone who was interested ... My playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.
Since Muller's Joker80 has recently established itself via 'The Battle Of The (Unspeakables)' tournament as the best free CRC program in the world, I checked it out. I must report that setting up Winboard F (also written by Muller) to use it was straightforward, with helpful documentation. Generally, I am finding the features of Joker80 versatile and capable for any reasonable use.
Muller: I would like to conduct two focused playtests using Joker80 at very long time controls (e.g., 30 minutes per move) to investigate these important questions:

1. Is Muller's rook value within the CRC set too low?
2. Is Scharnagl's archbishop value within the CRC set too low?

I would need you to compile special versions of Joker80 for me using significantly different values for those CRC pieces, as well as Scharnagl's CRC piece set. To isolate the target variable, these games would be Muller (standard values) vs. Muller (test values) and Scharnagl (standard values) vs. Scharnagl (test values) via symmetrical playtesting. Anyway, we can discuss the details if you are interested or willing. Please let me know.
Muller: Please investigate this potentially serious bug I may have discovered while testing Joker80 under Winboard F ... Bugs, Bugs, Bugs! http://www.symmetryperfect.com/pass I am having a hard time with software today.
'Human vs. engine play is virtually untested. Did you at any point of the game use 'undo' (through the WinBoard 'retract move')?' Yes. Many of us error-prone humans use it frequently. ________________________________________________ 'This is indeed something I should fix but the current work-around would be not to use 'undo'.' Makes sense to me. I can avoid using the 'retract move' command altogether. ________________________________________________________ 'I could make a Joker80 version that reads the piece base values from a file 'joker.ini' at startup. Then you could change them to anything you want to test, without the need to re-compile. Would that satisfy your needs?' Yes, better than I ever imagined. Thank you!
Everything is working fine. Thank you! I now have 12 instances of the Joker80 program running in various sub-directories of Winboard F, with the 'winboard.ini' file set to conveniently initiate any desired standard or special material values for the CRC models by Muller, Scharnagl and Nalls.

In the first test, I am going to attempt to find a playtesting time at which a distinct separation in playing strength occurs between the standard Muller model, wherein the rook is 1 pawn more valuable than the bishop, and a special Muller model, wherein the rook is 2 pawns more valuable than the bishop. If I successfully find a playtesting time that is survivable by humans, then we can hopefully establish a tentative probability as to which CRC model plays decisively better after a few to several games. At par 100 (for the pawn), the bishop is at 459 under both models, with the rook at 559 under the standard Muller model and 659 under the special Muller model. I want to playtest a special Muller model with a rook value 2.00 pawns higher than the bishop because the Nalls model has a rook value 2.19 pawns higher than the bishop and the Scharnagl model has a rook value 1.94 pawns higher than the bishop (for an average of appr. 2.06 pawns). Since I am attempting to test for such a small difference in the material value of only one type of piece (the rook), I have doubts that I will be able to obtain conclusive results. In any case, if I do obtain conclusive results, then very long time controls will surely be required to produce them.
Of course, I would bet anything that there are no 1:1 exchanges supported under the standard Muller CRC model that could cause material losses. If that were the case, yours would not be one of the three most credible CRC models under close consideration. In fact, even your excellent Joker80 program would play poorly if stuck with faulty CRC piece values. Obviously, the longer the exchange, the rarer its occurrence during gameplay. The predominance of simple 1:1 exchanges over even the least complicated 1:2 or 2:1 exchanges in gameplay is large, although I do not know the stats. In fact, there is a certain 1:2 or 2:1 exchange I am hoping to see that is likely to support my contention that the Muller rook value should be higher: the 1 queen for 2 rooks (or 2 rooks for 1 queen) exchange. Please recall that under the standard Muller model, this is an equal exchange. However, under asymmetrical playtesting comparable in quality to that which I used to confirm the correctness of your higher archbishop value, I played numerous CRC games at various moderate time controls where the player without 1 queen (yet with 2 rooks) defeated the player without 2 rooks (yet with 1 queen). Ultimately, a key mechanism toward conclusive results is that while the standard Muller model is neutral toward a 2-rook : 1-queen exchange, the special Muller model regards its 1 queen as significantly less valuable than 2 rooks of its opponent. Consequently, this contrast in valuation could be played into ... and we would see who wins. I am actually pleased that you are a realist who shares my pessimism about this experiment. In any case, low odds do not deter a best effort to succeed. The main difference between us is that you calculate your pessimism by extreme statistical methods whereas I calculate my pessimism by moderate probabilistic methods. I remain hopeful that eventually I will prove to you that the method Scharnagl & I developed is occasionally productive.
Muller: Please confirm that these are legal values for the 'winboard.ini' file.

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'}
As I moved to renormalize all of the values used in Joker80 (written into the 'winboard.ini' file) with the pawn at a par of 85 points, I looked at my notes again. They reminded me that your use of the 'bishop pair' refinement (with a bonus of 40 points) implies that the material value of the rook in CRC is either 1.00 pawns or 1.47 pawns greater than the material value of the bishop, depending on whether both bishops or only one bishop, respectively, remain in the game. At that point, I realized that I would be attempting to playtest for a discrepancy that I know from experience is simply too small to detect, even at very long time controls. So, this planned test has been cancelled. That is not to imply the matter is unimportant, though. I remain concerned for the standard Muller model whenever it allows the exchange of its 2 rooks for 1 queen belonging to its opponent.
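The arithmetic behind that 1.00 vs. 1.47 pawn gap can be checked in a few lines. This is my own illustration, using the Muller values quoted in this thread (pawn = 85, bishop = 350, rook = 475, pair bonus = 40):

```python
# Sketch of the rook-minus-bishop gap under a bishop-pair bonus,
# using the Muller CRC values quoted in this thread (pawn par = 85).
PAWN, BISHOP, ROOK, PAIR_BONUS = 85, 350, 475, 40

# While both bishops remain, trading one also forfeits the pair bonus,
# narrowing the gap to the rook:
gap_both = (ROOK - (BISHOP + PAIR_BONUS)) / PAWN
# Once one bishop is gone, the survivor reverts to its bare value:
gap_single = (ROOK - BISHOP) / PAWN

print(round(gap_both, 2), round(gap_single, 2))   # → 1.0 1.47
```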
Muller: Please have another look at this excerpt from my 'winboard.ini' file. There are standard and special versions of the piece values by Muller, Scharnagl & Nalls for the white and black players, renormalized to pawn = 85 points. The special version of the Muller model has a rook value exactly 85 points (1.00 pawn) higher than the standard version. The special version of the Scharnagl model has an archbishop value of 736 points (appr. 95% of the 775-point chancellor) instead of the standard version's 597 points (appr. 77%). The special version of the Nalls model is identical to the standard version until some test is needed and planned. Since I assume that the 'bishop pair bonus' is hardwired into Joker80, 40 points has been subtracted from the model-independent material values of the bishop under all three models. Is this correct?
_____________________________________________________
/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8' }
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8' }
'If I were you, I would normalize all models to Q=950 but then replace the pawn value everywhere by 85.'

Since this is what you (the developer of Joker80) recommend as optimum, this is what I will do. Are you sure that replacing any pawn values different than 85 points after renormalization to queen = 950 points still renders a more or less accurate and complete representation of the Scharnagl and Nalls models? At a par of queen = 950 points, the pawn value in the Nalls model is not represented as being only 92.19% as high as that in the Muller model, and the pawn value in the Scharnagl model is not represented as being only 98.95% as high as that in the Muller model. If a perfect representation is not quite possible, I can accept that without reservation.
__________________________________
'I don't think you could say then that you deviate from the model as the models do not really specify which type of Pawn they use as a standard.'

Correctly calculating pawn values at the start of the game (much less throughout the game) requires finesse, as it is indeed a complex issue. In fact, its excessive complexity is the reason my 66-page paper on the material values of pieces is silent on calculating pawn values in FRC & CRC. One would instead need to read an entire book from an outside source about calculating the material values of pieces in chess to understand the subject sufficiently. Personally, I am content with the test situation as long as Joker80 handles all pawns under all three models, initially valued at 85 points, as fairly and equally as realistically possible. I cannot speak for Reinhard Scharnagl at all, though.
________________________________________________
'The way you did it now would make the first Bishop to be traded of the value the model prescribes, but would make the second much lighter. If you would subtract half the bonus, then on the average they would be what the model prescribes.'

Now, I understand better. It makes sense.
[I am glad I asked you.] Yes, I will subtract 20 points (1/2 of the 'bishop pair bonus') from the model-independent material values for the bishop under the Scharnagl & Nalls models.
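The renormalization just agreed upon can be sketched in a few lines. This is my own illustration, not code from the thread; the helper name and the floor-rounding direction are assumptions, though with them the Scharnagl standard values reproduce the line quoted elsewhere in this thread (P85=302=339=551=694=902=950):

```python
# Hedged sketch of the agreed renormalization: scale a model so queen = 950,
# pin the pawn at 85, and subtract half the bishop-pair bonus from the bishop.
# The rounding direction (floor) is an assumption of mine.
def renormalize(values, queen_target=950, pawn_fix=85, half_pair_bonus=20):
    q = values['Q']
    out = {k: v * queen_target // q for k, v in values.items()}  # floor division
    out['P'] = pawn_fix             # pawn replaced by 85 everywhere, per Muller
    out['B'] -= half_pair_bonus     # so the *average* bishop matches the model
    return out

# Scharnagl standard CRC values at pawn = 100 (from the earlier ini excerpt):
scharnagl_std = {'P': 100, 'N': 306, 'B': 363, 'R': 557, 'A': 702, 'C': 912, 'Q': 960}
print(renormalize(scharnagl_std))
# → {'P': 85, 'N': 302, 'B': 339, 'R': 551, 'A': 694, 'C': 902, 'Q': 950}
```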
Muller: Here is my latest revision to my 'winboard.ini' file. Are these piece values acceptable to you? Do you think these piece values will work smoothly with Joker80 running under Winboard F yet remain true to all three models?
______________________________________________________
/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8' }
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8' }
Originally, I planned two 'internal playtests'. [By this self-invented term I mean playtests of the standard model of a person against a special model that I have compelling reasons to think may be superior by a provable margin.]

The first planned test pits the standard CRC model of Muller against a special CRC model with a higher, closer-to-conventional rook value. Upon closer examination, I suspected that the discrepancy was probably too small to be detected even with very long time controls, so I announced that this test was cancelled. Notwithstanding, I may change my mind and return to this unsolved mystery if Joker80 demonstrates unusually high aptitude as a playtesting tool. That might require very deep runs of moves, with a completion time of a few weeks to a few months per pair of games, to achieve conclusive results.

The second planned test pits the standard CRC model of Scharnagl against a special CRC model with a higher, unconventional archbishop value. Scharnagl currently assigns the archbishop a material value of appr. 77% that of the chancellor in his standard CRC model. Muller currently assigns the archbishop a material value of greater than 97% that of the chancellor in his standard CRC model. Nalls currently assigns the archbishop a material value of less than 98% that of the chancellor in his standard CRC model. I devised a special CRC model with material values identical to Scharnagl's standard CRC model for every piece except the archbishop, which it assigns a material value of exactly 95% that of the chancellor (18% or 1.65 pawns higher). [Note that this figure is slightly more moderate than those of Muller & Nalls.] A discrepancy this large should be detectable at short-to-moderate time controls. This test is now underway.
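Those archbishop-to-chancellor ratios can be verified from the queen = 950 values quoted elsewhere in this thread. This is a quick check of my own, and the model labels are mine:

```python
# Quick check of the archbishop : chancellor ratios cited above,
# using the queen-950 normalized values from the 'winboard.ini' excerpt.
ratios = {
    'Scharnagl standard': 694 / 902,  # archbishop / chancellor
    'Scharnagl special':  857 / 902,
    'Muller standard':    875 / 900,
    'Nalls standard':     866 / 884,
}
for model, r in ratios.items():
    print(f'{model}: {r:.2%}')
# → Scharnagl standard: 76.94%
#   Scharnagl special:  95.01%
#   Muller standard:    97.22%
#   Nalls standard:     97.96%
```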
If either of these tests is successful at establishing, or at least indicating, a probability that the special models play stronger than the standard models, then revisions to the standard models may follow. At that juncture, we would be ready to begin 'external playtests'. [By this self-invented term I mean playtests of the standard models of different persons against one another.]