Comments by DerekNalls
'... the Battle-of-the-Goths tournament was played at 1 hour per game per side (55'+5'/move, the time on the clocks is displayed in the viewer). And you call it speed Chess. Poof, there goes half your argument up in smoke.'

Sorry, I could not find the time per move on your crude web page. Nonetheless, less than 1 minute per move is much too short to yield quality moves ... at least by anything better than low standards.

'Not that it was any good to begin with: it is well known and amply tested that the quality of computer play only is a very weak function of time control.'

WRONG! The quality of computer play correlates strongly with completed ply depth which, in turn, is a function of time: exponentially more time is generally required to complete each successive ply.

'The fact that you ask how 'my theory was constructed' is shocking. Didn't you notice I did not present any theory at all?'

In fact, I have noticed that you have failed to present a theory to date. I apologize for politely, yet incorrectly, giving you the benefit of the doubt that you had developed a theory at all, unpublished but somewhere within your mind. Do you actually prefer that I state or imply that you are clueless, even as you claim to be the world's foremost authority on the subject and claim the rest of us are stupid? Fine, then.

'I just reported my OBSERVATION that quiet positions with C instead of A do not have a larger probability to win the game, and that in my opinion thus any concept of 'piece value' that does not ascribe nearly equal value to A and C is worse than useless.'

When you speak of what is needed to 'win the game', you are fixating upon the mating power of pieces, which translates to endgame relative piece values, NOT opening-game or midgame relative piece values. Incidentally, relative piece values during the opening game are more important than during the midgame which, in turn, are more important than during the endgame. Furthermore, I am particularly wary about using relative piece values at all during the endgame, since any theoretically deep possibility of achieving checkmate (regardless of material sacrifices), discovered or undiscovered, renders relative piece values an absolutely non-applicable and false concept. I strongly recommend that you shift your attention, oppositely, to the supremely important opening game to derive more useful relative piece values.

'So what have I think I proved by the battle-of-the-Goths long TC tourney about the value of A and C? Nothing of course! Did I claim I did? No, that was just a figment of your imagination!'

I did not claim that I knew exactly how your ridiculous idea that an archbishop is approximately as valuable as a chancellor originated. This 'tournament' of yours that I criticized just seems to be a part of your 'delusion maintenance' belief system.

'It might be of interest to know that prof. Hyatt develops Crafty (one of the best open-source Chess engines) based on 40/1' games, as he has found that this is as accurate as using longer TC for relative performance measurement, and that Rybka (the best engine in the World) is tuned through games of 40 moves per second.'
Now, you are completely confusing a method for QUICKLY and easily testing a computer hardware and software system to make sure it is operating properly with a method for achieving AI games consisting of the highest-quality moves, of theoretical value to expert analysts of a given chess variant. I have already explained some of this to you. Gawd!

'The method you used (testing the effect of changing the piece values, rather than the effect of changing the pieces) is highly inferior, and needs about 100 times as many games to get the statistical noise down to the same level as my method. (Because in most games, the mis-evaluated pieces would still be traded against each other.)'

First, you are falsely inventing stats out of thin air! If you really were competent with statistics, then you would know the difference between their proper and improper application within your own work attempting to derive accurate relative piece values. Second, you do not recognize (due to having no experience) the surprisingly great frequency with which a typical game between two otherwise-identical versions of a quality program with contrasting relative piece values will play into each other's most significant differences in piece values. Here is a hypothetical example: if white (incorrectly) values a rook significantly higher than an archbishop AND black (correctly) values an archbishop significantly higher than a rook, then the trade of white's archbishop for black's rook will be readily permitted by both programs and is very likely to occur at some point during a single game, or a few games at most. Consequently, all things otherwise equal, white will probably lose most games, which is indicative of a problem somewhere within its set of relative piece values (compared to black's).

'If you are not prepared to face the facts, this discussion is pointless.'

When I reflect your remark back to you, I agree completely.

'Play a few dozen games with Smirf, at any time control you feel trustworthy, where one side lacks A and the other B+N, and see who is crushed.'

relative piece values, opening game (bishop pairs intact):

piece        Muller     Nalls
pawn          10.00     10.00
knight        35.29     30.77
bishop        45.88     37.56
rook          55.88     59.43
archbishop   102.94     70.61
chancellor   105.88     94.18
queen        111.76    101.60

So, what is your problem? Both of our models are in basic agreement on this issue. There is no dispute between us. [I hate to disappoint you.] What you failed to take into account (since you refuse to educate yourself via my paper) is the 'supreme piece(s) enhancement' within my model. My published start-of-the-game relative piece values are not the final word of a simplistic model. My model is more sophisticated and adaptable, with some adjustments required during the game. For CRC, the 3 most powerful pieces in the game (i.e., archbishop, chancellor, queen) share, by a weighted formula, a 12.5% bonus which contributes to 'practical attack values' (a component of material values under my model). Moreover, each piece's share of the 12.5% bonus typically increases, by a weighted formula, during the game as some of the 3 most powerful pieces are captured and their shares are inherited by the remaining pieces.
Thus, if the archbishop becomes the only remaining one of the most powerful pieces, then it becomes much more valuable than the combined values of the bishop and knight. Notwithstanding, I'll bet you still think my model is 'worthless nonsense'. Right? In the future, please do the minimal fact-finding prerequisite to making sense of whatever you are arguing about.

'... the rest of the World beware that your theory of piece values sucks in the extreme!'

No, it does not. Your self-described 'far less than a theory, only an observation' comes close, though.
'So the ply depth depends only logarithmically on search time, which is VERY WEAKLY. So if you had wanted to show any understanding of the matter at hand, you should have written RIGHT! instead of WRONG! above it...'

'... it is well known and amply tested that the quality of computer play only is a very weak function of time control.'

I disagreed with your previous remark only because it was misleadingly and poorly expressed. You made it sound as if you barely realized at all that the quality of computer play is a function of search time. Obviously, you do. So, here is the correction you demand and deserve ... RIGHT!

'Absolute nonsense. Most Capablanca Chess games are won by annihilation of the opponents Piece army, after which the winning side can easily push as many Pawns to promotion as he needs to perform a quick mate. Closely-matched end-games are relatively rare, and mating power almost plays no role at all. As long as the Pawns can promote to pieces with mating power, like Queens.'

Very well. I spoke incorrectly when I credited you with foolishly assigning the archbishop nearly equal value to the chancellor due mainly to its decent mating power, relevant mainly in endgames ... sometimes. You are even more foolish than that. You actually think the archbishop has nearly equal value to the chancellor throughout the game: in the opening game and midgame as well. Wow! By the way, please add IM Larry Kaufmann to your dubious list of 'insufferably stupid people' who disagree with your relative piece values in CRC: http://en.wikipedia.org/wiki/Gothic_Chess

'... But let's cut the beating around the bush ...'

Good idea! I have now completely run out of patience with your endless inept, amateurish attempts to discredit my work. Not because you disagree. Not even because you are unnecessarily rude and disrespectful. Instead, strictly because you have NOT done your homework! You refuse to read the same 58-page paper you are confidently grading with an 'F'. Consequently, virtually all of your criticisms to date about my model for calculating relative piece values have been incorrect, irrelevant and/or irrational. When and if you ever raise concerns about my method that I can identify as making sense and knowing what you are talking about, then I will politely answer them. Until then, my side of this conversation is closed.
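[For what it is worth, Muller's 'logarithmic' claim and the earlier 'exponentially greater time per ply' claim are two sides of the same relation: if completing each successive ply multiplies the search time by an effective branching factor b, then t is roughly proportional to b^d, so d grows only like log(t)/log(b). A minimal sketch in Python, assuming an illustrative branching factor of 5 and a 1 ms depth-1 search; both are made-up figures for illustration, not measurements from any engine:]

    import math

    def plies_reachable(seconds, branching=5.0, base_seconds=0.001):
        # t ~ base_seconds * branching**d  =>  d ~ log(t / base_seconds) / log(branching)
        return math.log(seconds / base_seconds) / math.log(branching)

    for t in (10, 60, 600, 5400):  # 10 s, 1 min, 10 min, 90 min per move
        print(f"{t:5d} s/move -> roughly {plies_reachable(t):.1f} plies")

[Under these assumptions, multiplying the time per move by 540 (from 10 seconds to 90 minutes) buys only about 4 extra plies; whether those extra plies matter for measuring piece values is exactly what the two sides dispute here.]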
Since ...

A. The argumentative posts of Muller (mainly against Scharnagl & Aberg) in advocacy of his model for relative piece values in CRC are neverending.

B. My absence from this melee has not spared my curious mind the agony of reading them at all.

... I hope I can help out by returning briefly just to point out the six most serious, directly paradoxical and obvious problems with Muller's model:

1. The archbishop (102.94) is very nearly as valuable as the chancellor (105.88): 97.22%.

2. The archbishop (102.94) is nearly as valuable as the queen (111.76): 92.11%.

3. One archbishop (102.94) is nearly as valuable as two rooks (2 x 55.88): 92.11%. In other words, one rook (55.88) is only a little more than half as valuable as one archbishop (102.94): 54.28%.

4. Two rooks (2 x 55.88) have a value exactly equal to one queen (111.76).

5. One knight (35.29) plus one rook (55.88) are markedly less valuable than one archbishop (102.94): 88.57%.

6. One bishop (45.88) plus one rook (55.88) are less valuable than one archbishop (102.94): 98.85%.

None of these problems exist within the reputable models by Nalls, Scharnagl, Kaufmann, Trice or Aberg. You must honestly address all of these important concerns or realistically expect to be ignored.
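[The six ratios above follow directly from Muller's published values; a quick check, as a minimal Python sketch:]

    m = dict(P=10.00, N=35.29, B=45.88, R=55.88, A=102.94, C=105.88, Q=111.76)

    print(m['A'] / m['C'])              # 0.9722 -> problem 1
    print(m['A'] / m['Q'])              # 0.9211 -> problem 2
    print(m['A'] / (2 * m['R']))        # 0.9211 -> problem 3
    print(m['R'] / m['A'])              # 0.5428 -> problem 3, restated
    print(2 * m['R'] - m['Q'])          # 0.0    -> problem 4
    print((m['N'] + m['R']) / m['A'])   # 0.8857 -> problem 5
    print((m['B'] + m['R']) / m['A'])   # 0.9885 -> problem 6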
A substantial revision and expansion has recently occurred.

universal calculation of piece values
http://www.symmetryperfect.com/shots/calc.pdf
66 pages

Only three games have relative piece values calculated using this complex model: FRC, CRC and Hex Chess SS (my own invention). Furthermore, I confidently consider my figures somewhat reliable for only two of these games, FRC (including Chess) and Capablanca Random Chess, because much work has been done by many talented individuals (hopefully, including myself) as well as computers to isolate reliable material values. This dovetails into the reason that I do not take requests. I have absolutely no assurance that effort spent outside these two established testbeds is productive at all. If it is important to you to know the material values for the pieces within your favorite chess variant (according to this model), then you must calculate them yourself.

Under the recent changes to this model, the material values for FRC pieces and Hex Chess SS pieces remained exactly the same. However, the material values for a few CRC pieces changed significantly:

Capablanca Random Chess material values for pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

pawn 10.00
knight 30.77
bishop 37.56
rook 59.43
archbishop 93.95
chancellor 95.84
queen 103.05

Focused, intensive playtesting on my part has proven Muller to be correct in his radical, new contention that the accurate material value of the archbishop is extraordinarily, counter-intuitively high. I think I have successfully discovered a theoretical basis, which is now explained within my 66-page paper. All of the problems (that I am presently aware of) within my set of CRC material values have now been solved. Some problems remain within Muller's set. I leave it to him whether or not to maturely discuss them.
As far as playtesting goes ... Admittedly, my initial intention was just to amuse myself by disproving the consistency of Muller's unusually-high archbishop material value in relation to the other piece values within his CRC set. If indeed his archbishop material value had been as fictitious as it was radical, then this would have been readily achievable using any high-quality chess variant program such as SMIRF. No matter what test I threw at it, this never happened.

Previously, I had only used 'symmetrical playtesting'. By this I mean that the material and positions of the pieces of both players have been identical relative to one another. This is effective when playing one entire set of CRC piece values against another entire set as, for example, Reinhard Scharnagl & I have done on numerous occasions. The player that consistently wins all deep-ply (long time per move) games, alternately playing white and black, can safely be concluded to be the player using the better of the two sets of CRC piece values, since this single variable has been effectively isolated. However, this playtesting method cannot isolate which individual pieces within the set carry the most or least accurate material values. In fact, I had no problem with Muller's set of CRC piece values as a whole. The order of the material values of all of the CRC pieces was, and is, correct. However, I had a large problem with his material value for the archbishop being nearly as high as for the chancellor.

To pinpoint an unreasonably-high material value for only one piece within a CRC set required 'asymmetrical playtesting'. By this I mean that the material and positions of the pieces of both players had to differ in an appropriate manner to test the upper and lower limits of the material value of a certain piece (e.g., archbishop). This was achieved by removing select pieces from both players within the Embassy Chess setup so that BOTH players had a significant material advantage consistent with different models (i.e., Scharnagl set vs. Muller set). This was possible strictly because of the sharp contrast between the 'normal, average' and 'very high' material values for the archbishop assigned by Scharnagl and Muller, respectively. The fact that the SMIRF program implicitly uses the Scharnagl set to play both players is a control variable, not a problem, since it ensures equality in the playing strength with which both players are handled.

The player using the Scharnagl set lost every game using SMIRF MS-173h-X ... regardless of time controls, white or black player choice and all variations in excluded pieces that I could devise. I thought it was remotely possible that an intransigent positional advantage for the Muller set somehow happened to exist within the modified Embassy Chess setup that was larger than its material disadvantage. This type of catastrophe can be the curse of 'asymmetrical playtesting'. So, I experimented likewise using a few other CRC variants. Same result! The Scharnagl set lost every game. I seriously doubt that all CRC variants (or at least, the games I tested) are realistically likely to carry an intransigent positional advantage for the Muller set. If this is true, then the Muller set is provably, ideally suited to CRC notwithstanding, just for a different reason. Finally, I reconsidered my position and revised my model.
For the reasons you describe (which I mostly agree with), I do not ever use 'asymmetrical playtesting' unless that method is unavoidable. However, you should know that I used many permutations of positions within my 'missing pieces' test games to try to average out positions that may have pre-set a significant positional advantage for either player.

Yes, the fact that SMIRF currently uses your (Scharnagl) material values, with a 'normal, average' material value for the archbishop instead of a 'very high' material value (as well as the interrelated positional value given to the archbishop within SMIRF), means that both players will place greater effort than I think is appropriate into avoiding being forced into disadvantageous exchanges where they would trade their chancellor or queen for the archbishop of the opponent. Still, the order of your material values for CRC pieces agrees with the Muller model (although an archbishop-chancellor exchange is considered only slightly harmful to the chancellor player under his model). So, I think tests using SMIRF are meaningful even if I disagree substantially with the material value for one piece within your model (i.e., the archbishop).

Due to apprehension over boring my audience with irrelevant details, I did not even mention within my previous post that I also invented a variety of 10 x 8 test games, using the 10 x 8 editor available in SMIRF, that were unrelated to CRC. For example, one game consisted of 1 king & 10 pawns per player, with 9 archbishops for one player and 8 chancellors or queens for the other player. Under the Muller model, the player with the 9 archbishops had a significant material advantage. Under the Scharnagl model, the player with the 8 chancellors or 8 queens had a significant material advantage. The player with the 9 archbishops won every game. Another game consisted of 1 king & 20 pawns per player, again with 9 archbishops against 8 chancellors or queens; the player with the 9 archbishops won every game. A third game consisted of 1 king & 10 pawns per player, with 18 archbishops for one player and 16 chancellors or queens for the other; the player with the 18 archbishops won every game.

I have seen it demonstrated many times how positionally resilient the archbishop is against the chancellor and/or the queen in virtually any game you can create using SMIRF with a 10 x 8 board and a CRC piece set. When Muller assures us that he is responsibly using statistical methods similar to those employed by Larry Kaufmann, a widely-respected researcher of Chess piece values, I think we should take his word for it. Of course, I remain concerned about the reliability of his stats generated via fast time controls. However, it has now been proven to me that his method is at least sensitive enough to detect 'elephants' (i.e., large discrepancies in material values), such as exist between contrasting CRC models for the archbishop, even if it is not sensitive enough to detect 'mice' (i.e., small discrepancies in material values), so to speak.
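[The material arithmetic behind these test games can be made explicit with the two sets of values printed earlier in this thread; Scharnagl's archbishop value is not quoted here, so my own pre-revision values stand in for the 'normal, average' archbishop camp. A minimal sketch:]

    muller    = {"A": 102.94, "C": 105.88, "Q": 111.76}
    nalls_old = {"A": 70.61,  "C": 94.18,  "Q": 101.60}  # pre-revision Nalls values

    for name, v in (("Muller", muller), ("Nalls (old)", nalls_old)):
        print(f"{name:12s} 9A = {9 * v['A']:7.2f}   "
              f"8C = {8 * v['C']:7.2f}   8Q = {8 * v['Q']:7.2f}")

[With Muller's values the 9 archbishops are ahead of either opposing army (926.46 vs. 847.04 or 894.08); with the older 'average archbishop' values they are far behind (635.49 vs. 753.44 or 812.80). The games thus genuinely discriminate between the two camps.]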
Yes, your test example yields a result totally inconsistent with everyone's models for CRC piece values. [I did not run any playtest games of it since I trust you completely.] Yes, your test example could cause someone who placed too much trust in it to draw the wrong conclusion about the material values of knights vs. archbishops.

The reason your test example is unreliable (and we both agree it must be) is its 2:1 ratio of knights to archbishops. The game is a victory for the knights player simply because he/she can overrun the archbishops player and force materially-disadvantageous exchanges, despite the fact that 4 archbishops indisputably have a material value significantly greater than 8 knights. In all three of my test examples from my previous post, the ratios of archbishops to chancellors and archbishops to queens were only 9:8. Note the sharp contrast. Although I agree that a 1:1 ratio is the ideal goal, it was impossible to achieve for the purposes of the tests. I do not believe a slight disparity (1 piece) in the total number of test pieces per player is enough to make the test results highly unreliable. [Yes, feel free to invalidate my test example with 18 archbishops vs. 16 chancellors and 18 archbishops vs. 16 queens, since a 2-piece disparity existed.]

Although surely imperfect and slightly unreliable, I think the test results achieved through 'asymmetrical playtesting' or 'games with different armies' can be instructive as long as the test conditions are not pushed to the extreme. Your test example was extreme. Two out of three of my test examples were not.
Feel free to invalidate the other two test examples I (reluctantly) mentioned as well. My reason is that having ranks nearly full of archbishops, chancellors or queens in test games does not even resemble a proper CRC variant setup, with its variety and placement of pieces. Therefore, those test results cannot safely be concluded to have any bearing upon the material values of pieces in any CRC variant. Your reason is well-expressed.
The feasibility of using identical armies to calculate piece values

It has been a long time since our sets of CRC piece values have played one another (on my dual 2.4 GHz CPU server) using otherwise-identical versions of SMIRF. Obviously, the reason is that it has been a long time since a large disparity existed between our material values for any one of the CRC pieces. Recently, that has changed in the case of the archbishop.

I already have the standard version of SMIRF MS-174b-O, which uses the Scharnagl CRC piece values. Would you be willing to compile a special version of SMIRF MS-174b-O for me which uses the Nalls CRC piece values?

Capablanca Random Chess material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

Back on safe ground using 'symmetrical playtesting', the results of who wins the test games should be indicative of who is using the better set of CRC piece values.
I understand. I wondered what the 'X' & 'O' designations for recent SMIRF versions meant. Do you still possess an older version of SMIRF (of satisfactory quality to you) that uses your current CRC material values? Since there is approximately a 2.5-pawn difference between our models in the material value of the archbishop, I predict that my playtesting results would probably be worthwhile and decisive.
Your revised material values for SMIRF look fine to me. I have written them down for safekeeping. Which version will you be compiling? Of course, I do not plan to playtest anyone's material values for pieces upon the 8 x 8 board, only material values for CRC pieces upon the 10 x 8 board.
I have adequate confidence in my latest material values to ask you to publish them upon your web page (instead of my previous material values).

CRC material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

They are, in principle, similar to Muller's set for every piece, except that they run on a comparatively compressed scale. Even though I have not yet playtested them, I consider my tentative confidence rational (although admittedly premature and risky) because I trust Muller's methods of playtesting his own material values and I think my latest revisions to my model are conceptually valid.
Muller: You have my best wishes for your worthwhile effort to publish your empirical, statistical method for obtaining the material values of pieces in the ICGA Journal. My assessment is that it will surely be a much better paper than the junk [name removed] published in the same journal regarding piece values.

[The above has been edited to remove a name and/or site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
re: Muller's assessment of 5 methods of deriving material values for CRC pieces

'I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments ...'

As much playtesting as possible. Unfortunately, that amount is deficient by my standards (and yours). I have tried to compensate for marginal quantity with high quality via long time controls. You use a converse approach with the opposite emphasis. Given enough years (working with only one server), this quantity of well-played games may eventually become adequate.

'... and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them.'

You have pinpointed my greatest disappointment and frustration thus far with my ongoing work. To date, my theoretical model has not made any impressive predictions verified by playtesting. To the contrary, it has been revised, expanded and complicated many times upon discovery that it was grossly in error or out of conformity with reality. Although the foundations of the theoretical model are built upon arithmetic and geometry to the greatest extent possible, with verifiable phenomena important to the material values of pieces used logically for refinements, mathematical modelling can be misused to postulate and describe in detail almost any imaginable non-existent phenomenon. Consider, for example, the Ptolemaic model of the solar system.
'I never found any effect of the time control on the scores I measure for some material imbalance. Within statistical error, the combinations I tries produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time...'

The additional time I normally give to playtesting games to improve the move quality is partially wasted because I can only control the time per move, instead of the number of plies completed, using most chess variant programs. This usually results in the time expiring while the program is working on an incomplete ply. It then prematurely spits out a move representative of an incomplete tour of the moves available within that ply, at a random fraction of that ply. Since there is always more than one move (often a few) under evaluation as being the best possible move [otherwise, the chosen move would have already been executed], this means that any move on this 'list of top candidates' is equally likely to be executed. Here are two typical scenarios that should cover what usually happens:

A. If the list of top candidates in an 11-ply search consists of 6 moves, where the list of top candidates in a 10-ply search consists of 7 moves, then only 1 discovered-to-be-less-than-the-best move has been successfully excluded and cannot be executed. Of course, an 11-ply search completion may typically require an estimated 8-10 times as much time as the search completions for all previous plies (1-ply thru 10-ply) added together.

B. If the list of top candidates in an 11-ply search consists of 7 moves [moreover, the exact same 7 moves] just as in the preceding 10-ply search, then there is no benefit at all in expending 8-10 times as much time.

The reason I endure this brutal waiting game is not for the purely masochistic experience but because the additional time has a tangible chance (although no guarantee) of yielding a better move on every occasion. Throughout the numerous moves within a typical game, it can realistically be expected to yield better moves on dozens of occasions.

We usually playtest for purposes at opposite extremes of the spectrum, yet I regard our efforts as complementary toward building a complete picture involving the material values of pieces. You use 'asymmetrical playtesting' with unequal armies at fast time controls, then collect and analyze statistics ... to determine a range, with a margin of error, for individual material piece values. I remain amazed (although I believe you) that you actually obtain any meaningful results at all via games that are played so quickly that the AI players do not have 'enough time to think', while playing games so complex that every computer (and person) needs time to think to play with minimal competence. Can you explain to me, in a way I can understand, how and why you are able to successfully obtain valuable results using this method? The quality of your results was utterly surprising to me. I apologize for totally doubting you when you introduced your results and mentioned how you obtained them.

I use 'symmetrical playtesting' with identical armies at very slow time controls to obtain the best moves realistically possible from an evaluation function, thereby giving me a winner (one that is, by some margin, more likely than not deserving) ...
to determine which of two sets of material piece values is probably (yet not certainly) better. Nonetheless, as more games are likewise played, if they present a clear pattern, then the results become more likely to be reliable, decisive and indicative of the true state of affairs.

The chances of flipping a coin once and it landing 'heads' are equal to it landing 'tails'. However, the chances of flipping a coin 7 times and it landing 'heads' all 7 times in a row are 1/128. Now, replace the concepts 'heads' and 'tails' with 'victory' and 'defeat'. I presume you follow my point. The results of only a modest number of well-played games can establish their significance beyond chance, to the satisfaction of reasonable probability for a rational human mind. [Most of us, including me, do not need any better than 95%-99% confidence to become convinced that there is a real correlation at work, even though such is far short of an absolute 100% mathematical proof.]

In my experience, I have found that using any less than 10 minutes per move will cause at least one instance within a game when an AI player makes a move that is obvious to me as (and correctly assessed as truly being) a poor move. Whenever this occurs, it renders my playtesting results tainted and useless for my purposes. Sometimes this occurs during a game played at 30 minutes per move. However, this rarely occurs during a game played at 90 minutes per move. For my purposes, it is critically important above all other considerations that the winner of these time-consuming games be correctly determined 'most of the time', since 'all of the time' is impossible to assure. I must do everything within my power to get as far from 50% toward 100% reliability in correctly determining the winner. Hence, I am compelled to play test games at nearly the longest survivable time per move to minimize the chances that any move played during a game will be an obviously poor move that could have changed the destiny of the game, thereby causing the player that should have won to become the loser instead. In fact, I feel as if I have no choice under the circumstances.
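[The 'incomplete ply' behavior described above comes from iterative deepening: an engine searches to depth 1, then 2, then 3, and so on, and when the clock runs out mid-iteration most engines fall back on the best move of the last fully completed iteration (some, as described above, use whatever the partial iteration produced). A minimal sketch, with a toy search() standing in for a real alpha-beta search and a made-up branching factor of 5:]

    import time

    def search(position, depth):
        # Stand-in for a fixed-depth alpha-beta search; it merely burns an
        # amount of time that grows ~5x per extra ply, as deeper searches do.
        time.sleep(0.001 * 5 ** depth)
        return f"best move found at depth {depth}"

    def choose_move(position, seconds):
        deadline = time.time() + seconds
        best, depth = None, 1
        while True:
            result = search(position, depth)   # may overrun the deadline
            if time.time() >= deadline:
                break                          # ply cut short: discard it
            best, depth = result, depth + 1    # keep last COMPLETED ply
        return best

    print(choose_move("some CRC position", 2.0))

[This also makes the logarithmic-depth point concrete: under these assumptions, doubling the time budget buys well under one extra ply.]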
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled with the CRC material values of Scharnagl, Muller & Nalls, I began playtesting something else that interested me using SMIRF MS-174b-O. I am concerned that the material value of the rook (especially compared to the queen) amongst CRC pieces in the Muller model is too low:

Muller model: rook 55.88, queen 111.76. This means that 2 rooks exactly equal 1 queen in material value.

Scharnagl model: rook 55.71, queen 91.20. This means that 2 rooks have a material value (111.42) 22.17% greater than 1 queen.

Nalls model: rook 59.43, queen 103.05. This means that 2 rooks have a material value (118.86) 15.34% greater than 1 queen.

Essentially, the Scharnagl & Nalls models agree in predicting victories in a CRC game for the player missing 1 queen yet possessing 2 rooks. By contrast, the Muller model predicts draws (or an approximately equal number of victories and defeats) for either player. I put this extraordinary claim to the test by playing 2 games at 10 minutes per move on an appropriately altered Embassy Chess setup, with the missing-1-queen player and the missing-2-rooks player each having a turn at white and black. The missing-2-rooks player lost both games and was always behind. They were not even long games, at 40-60 moves.

Muller: I think you need to moderately raise the material value of your rook in CRC. It is out of its proper relation with the other material values within the set.
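[The three comparisons above, computed directly from the quoted values; a minimal check:]

    models = {
        "Muller":    {"R": 55.88, "Q": 111.76},
        "Scharnagl": {"R": 55.71, "Q": 91.20},
        "Nalls":     {"R": 59.43, "Q": 103.05},
    }
    for name, v in models.items():
        surplus = (2 * v["R"] / v["Q"] - 1) * 100
        print(f"{name:10s} 2R = {2 * v['R']:6.2f} vs Q = {v['Q']:6.2f} "
              f"-> 2R ahead by {surplus:+.2f}%")

[Output: +0.00% (Muller), +22.17% (Scharnagl), +15.34% (Nalls), matching the percentages quoted above.]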
'You hardly have the possibility of trading it before there are open files. So it stands to reason that you might as well use the higher value during the entire game.'

Well, I understand and accept your reasons for leaving your lower rook value in CRC as is. It is interesting that you thoroughly understand and accept the reasons of others for using a higher rook value in CRC as well. Ultimately, is not the higher rook value in CRC more practical and useful to the game, by your own logic?

'... if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING.'

I never claimed or implied that only 2 games at 10 minutes per move mean everything, or even a great deal (enough to satisfy probability overwhelmingly). However, they mean significantly more than nothing. I cannot accept your opinion, based upon a purely statistical viewpoint, since it comes at the exclusion of another applicable mathematical viewpoint. They definitely mean something ... although exactly how much is not easily known or quantified mathematically.

'I don't even look at results before I have at least 100 games, because before they are about as likely to be the reverse from what they will eventually be, as not.'

Statistically, when dealing with speed chess games populated exclusively with virtually random moves ... YES, I can understand and agree with you requiring a minimum of 100 games. However, what you are doing is at the opposite extreme from what I am doing via my playtesting method. Surely you would agree that IF I conducted only 2 games with perfect play by both players, those results would mean EVERYTHING. Unfortunately, with state-of-the-art computer hardware and chess variant programs (such as SMIRF), this is currently impossible and will remain impossible for centuries, if not millennia. Nonetheless, games played at 100 minutes per move (for example) have a much greater probability of correctly determining which player has a definite, significant advantage than games played at 10 seconds per move (for example). Even though these 'deep games' play at nowhere near 600 times better quality than these 'shallow games', as one might naively expect (due to a non-linear correlation), they are far from random events (to which statistical methods would then be fully applicable). Instead, they occupy a middleground between perfect-play games and totally random games. [In my studied opinion, the example 'middleground games' are more similar and closer to perfect-play games than to totally random games.] To date, much is unknown to combinatorial game theory about the nature of these 'middleground games'.

Remember the analogy to coin flips that I gave you? Well, in fact, the playtest games I usually run go far above and beyond such random events in their probable significance per event. If the SMIRF program running at 90 minutes per move cast all of its moves randomly and without any intelligence at all (as a perfect woodpusher), only then would my 'coin flip' analogy be fully applicable. Therefore, when I estimate that it would require 6 games (for example) to determine, IF a player with a given set of piece values loses EVERY game, that there is only a 63/64 chance that the result is meaningful (instead of random bad luck), I am being conservative in the extreme.
The true figure is almost surely higher than a 63/64 chance. By the way, if you doubt that SMIRF's level of play is intelligent and non-random, then play a CRC variant of your choice against it at 90 minutes per move. After you lose repeatedly, you may not be able to credit yourself with being intelligent either (although you should) ... if you insist upon holding to an impractically high standard to define the word.

'If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from its true win probability.'

For a 2-game match ... I agree. However, this may not be true for a 4-game, 6-game or 8-game match, and it surely is not true to the extremes you imagine. Everything is critically dependent upon the specifications of the match. The number of games played (of course), the playing strength or quality of the program used, the speed of the computer and the time or ply depth per move are the most important factors.

'Play 100 games, and the error in the observed score is reasonable certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then you can see with reasonable confidence if your observations differ from mine.'

It would require an estimated 20 years for me to generate 100 games of the quality (and time controls) I am accustomed to and somewhat satisfied with. Unfortunately, it is not that important to me just to get you to pay attention to the results for the benefit of only your piece values model. As a practical concern to you: everyone else who is working to refine quality piece values models in FRC and CRC will likely have surpassed your achievements by then IF you refuse to learn anything from the results of others who use different, yet valid and meaningful, methods of playtesting and mathematical analysis.
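[For what it is worth, the 4.5% figure quoted above is the standard error of a match score. Assuming a per-game standard deviation of roughly 0.45 points (a bit under the 0.5 of pure win/loss, because some games are drawn), the error after N independent games is about 0.45/sqrt(N). A minimal sketch:]

    def score_error(n_games, per_game_sd=0.45):
        # Standard error of the mean score after n independent games.
        return per_game_sd / n_games ** 0.5

    for n in (2, 4, 10, 100):
        print(f"{n:4d} games -> score known to about +/- {100 * score_error(n):.1f}%")

[Under these assumptions: 2 games gives roughly +/- 32% and 100 games roughly +/- 4.5%, which appears to be where both of Muller's figures in this exchange come from.]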
'Of course, that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals.'

No, it is not easily quantified. Some things of numerical importance, as well as geometric importance, that we try to understand or prove in the study of chess variants are NOT covered or addressed by statistics. I wish our field of interest were that simple (relatively speaking) and approachable, but it is far more complicated and interdisciplinary. All you talk about is statistics. Is this because statistics is all you know well?

'That difference just can't be seen with two games. Play 100. There is no shortcut.'

I agree. Not with only 2 games. However ... with only 4 games, IF they were ALL victories or ALL defeats for the player using a given piece values model, I could tell you with confidence that there is at least a 15/16 chance the given piece values model is stronger or weaker, respectively, than the piece values model used by its opponent. [Otherwise, the results are inconclusive and useless.] Furthermore, based upon the average number of moves per game required for victory or defeat, compared to the established average number of moves in a long, close game, I could probably correctly estimate whether one model was a little or a lot stronger or weaker, respectively, than the other model. Thus, I will not play 100 games, because there is no pressing, rational need to reduce the 'chance of random good-bad luck' to the ridiculously-low value of the inverse of (base 2 to exponent 100). Is there anything about the odds associated with 'flipping a coin' that is beyond your ability to understand? This is a fundamental mathematical concept, applicable without reservation to symmetrical playtesting. In any case, it is a legitimate 'shortcut' that I can and will use freely.

'Even perfect play doesn't help. We do have perfect play for all 6-men positions.'

I meant perfect play throughout an entire game of a CRC variant involving 40 pieces initially. That is why I used the word 'impossible' with reference to state-of-the-art computer technology.

'This is approximately master-level play.'

Well, if you are getting master-level play from Joker80 with speed chess games, then I am surely getting a superior level of play from SMIRF with much longer times and deeper plies per move. You see, I used the term 'virtually random moves' appropriately, in a comparative context based upon my experience.

'Doesn't matter if you play at an hour per move, a week per move, a year per move, 100 year per move. The error will remain >=32%. So if you want to play 100 years per move, fine. But you will still need 100 games.'

Of course, it matters a lot. If the program is well-written, then the longer it runs per move, the more plies it completes per move and, consequently, the better the moves it makes. Hence, the entire game played will progressively approach the ideal of perfect play ... even though this finite goal is impossible to attain. Incisive, intelligent, resourceful moves must NOT be confused with, or dismissed as, purely random moves.
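[All of the 'coin flip' figures traded in this exchange come from the same computation: under the null hypothesis of two evenly matched players, N decisive games all won by the same side occur with probability (1/2)^N. A minimal check (note that this assumes independent games, no draws and an unbiased 50/50 null, which is exactly where the two sides disagree about its applicability):]

    for n in (4, 6, 7, 10, 100):
        print(f"{n:3d} straight wins: chance under the null = 1/{2**n}")

[This reproduces the 15/16 (4 games), 63/64 (6 games), 1/128 (7 coin flips), 1/1024 (10 games) and 1 in 2^100 figures that appear throughout these comments.]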
Although I could humbly limit myself to applying only statistical methods, I am totally justified, in this case, in more aggressively using the 'probabilities associated with N coin flips ALL having the same result' as an incomplete, minimum value, before even taking the playing strength of SMIRF at extremely long time controls into account to estimate a complete, maximum value.

'The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus equally reliably be determined with games of any duration.'

You are completely lacking in the prerequisite patience and determination to have EVER consistently used long enough time controls to see any benefit whatsoever in doing so. If you had ever done so, then you would realize (as everyone else who has done so realizes) that the quality of the moves improves, and even if the winning probability has not changed much numerically in your experience, the figure you obtain is more reliable. [I cannot prove to you that this 'invisible' benefit exists statistically. Instead, it is an important concept that you need to understand in its own terms. This is essential to what most playtesters do, with the notable exception of you. If you want to understand what I do and why, then you must come to grips with this reality.]
CRC piece values tournament
http://www.symmetryperfect.com/pass/
Just push the 'download now' button.

Game #1: Scharnagl vs. Muller
10 minutes per move, SMIRF MS-174c
Result: inconclusive. Draw after 87 moves; perpetual check declared by black.
'This discussion is pointless.'

On this one occasion, I agree with you. However, I cannot just let you get away with some of your most outrageous remarks to date. So, unfortunately, this discussion is not yet over.

'First you should have results, then it becomes possible to talk about what they mean. You have no result.'

Of course, I have a result! The result is obviously the game itself, as a win, loss or draw, for the purposes of comparing the playing strengths of two players using different sets of CRC piece values. The result is NOT statistical in nature. Instead, the result is probabilistic in nature. I have thoroughly explained this purpose and method to you. I understand it. Reinhard Scharnagl understands it. You do not understand it. I can accept that. However, instead of admitting that you do not understand it, you claim there is nothing to understand.

'Two sets of piece values as different as day and night, and the only thing you can come up with is that their comparison is 'inconclusive'.'

Yes. Draws make it impossible to determine which of two sets of piece values is stronger or weaker. However, by increasing the time (and plies) per move, smaller differences in playing strength can sometimes be revealed with 'conclusive' results. I will attempt the next pair of Scharnagl vs. Muller and Muller vs. Scharnagl games at 30 minutes per move. Knowing how much you appreciate my efforts on your behalf motivates me.

'Talk about pathetic: even the two games you played are the same.'

Only one game was played. The logs you saw were produced by the Scharnagl (standard) version of SMIRF for the white player and the Muller (special) version of SMIRF for the black player. The game is handled in this manner to prevent time from expiring without computation occurring.

'... does your test setup s*ck!'

What, now you hate Embassy Chess too? Take up this issue with Kevin Hill.
Since I had to endure one of your long bedtime stories (to be sure), you are going to have to endure one of mine. Yet unlike yours [too incoherent to merit a reply], mine carries an important point. Consider it a test of your common sense. Here is a scenario ...

01. It is the year 2500 AD.
02. Androids exist.
03. Androids cannot tell lies.
04. Androids can cheat, though.
05. Androids are extremely intelligent in technical matters.
06. Your best friend is an android.
07. It tells you that it won the lottery.
08. You verify that it won the lottery.
09. It tells you that it purchased only one lottery ticket.
10. You verify that it purchased only one lottery ticket.
11. The chance of winning the lottery with only one ticket is 1 out of 100 million.
12. It tells you that it cheated to win the lottery by hacking into its computer system immediately after the winning numbers were announced, purchasing one winning ticket and back-dating the time of the purchase.

You have only two choices as to what to believe happened:

A. The android actually won the lottery by cheating.
B. The android actually won the lottery by good luck. The android was mistaken in thinking it successfully cheated.

The chance of 'A' being true is 99,999,999 out of 100,000,000. The chance of 'B' being true is 1 out of 100,000,000.

I would place my bet upon 'A' being true because I do not believe such unlikely coincidences will actually occur. You would place your bet upon 'B' being true because you do not believe such unlikely coincidences have any statistical significance whatsoever.

I make this assessment of your judgment ability fairly, because you think it is a meaningless result if a player with one set of CRC piece values wins against its opponent 10 times in a row, even though the chance of it being 'random good luck' is indisputably only 1 out of 1024. By the way ... base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376. Can you see how ridiculous your demand of 100 games is?
'Is this story meant to illustrate that you have no clue as to how to calculate statistical significance?'

No. This story is meant to illustrate that you have no clue as to how to calculate probabilistic significance ... and it worked perfectly.

There you go again, missing the point entirely and ranting about probabilities not being proper statistics.
To anyone who is interested: my playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug, which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.
Since Muller's Joker80 has recently established itself, via 'The Battle Of The (Unspeakables)' tournament, as the best free CRC program in the world, I checked it out. I must report that setting up Winboard F (also written by Muller) to use it was straightforward, with helpful documentation. Generally, I am finding the features of Joker80 to be versatile and capable for any reasonable use.