
Comments by DerekNalls

Piece Values [Subject Thread]
Derek Nalls wrote on Fri, May 2, 2008 06:13 PM UTC:
Your revised material values for SMIRF look fine to me. I have written them down for safekeeping. Which version will you be compiling? Of course, I do not plan to playtest anyone's material values for pieces upon the 8 x 8 board, only material values for CRC pieces upon the 10 x 8 board.

Derek Nalls wrote on Sat, May 3, 2008 03:07 PM UTC:
I have adequate confidence in my latest material values to ask you to
publish them upon your web page (instead of my previous material values).

CRC
material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

They are, in principle, similar to Muller's set for every piece except
that they run on a comparatively compressed scale.  Even though I have not
yet playtested them, I consider my tentative confidence rational (although
admittedly premature and risky) because I trust Muller's methods of
playtesting his own material values and I think my latest revisions to my
model are conceptually valid.

Aberg variation of Capablanca's Chess. Different setup and castling rules. (10x8, Cells: 80)
Derek Nalls wrote on Sat, May 3, 2008 03:26 PM UTC:
Muller:

You have my best regards toward your worthwhile effort to publish your
empirical, statistical method for obtaining the material values of pieces
in the ICGA Journal.  My assessment is that it will surely be a much
better paper than the junk [name removed] published in the same journal
regarding piece values.

[The above has been edited to remove a name and/or site reference. It is
the policy of cv.org to avoid mention of that particular name and site to
remove any threat of lawsuits. Sorry to have to do that, but we must
protect ourselves. -D. Howe]

Piece Values [Subject Thread]
Derek Nalls wrote on Sat, May 3, 2008 05:20 PM UTC:
re:  Muller's assessment of 5 methods of deriving material values for CRC pieces

'I am not sure how much of the agreement between (3) and (4) can be
ascribed to the playtesting, and how much to the theoretical arguments
...'

As much playtesting as possible.  Unfortunately, that amount is deficient
by my standards (and yours).  I have tried to compensate for marginal
quantity with high quality via long time controls.  You use a converse
approach with opposite emphasis.  Given enough years (working with 
only one server), this quantity of well-played games may eventually 
become adequate.

' ... and it is not clear how well the theoretical arguments are able to
PREdict piece values rather than POSTdict them.'

You have pinpointed my greatest disappointment and frustration thus far
with my ongoing work.  To date, my theoretical model has not made 
any impressive predictions verified by playtesting.  To the contrary,
it has been revised, expanded and complicated many times upon 
discovery that it was grossly in error or out of conformity with reality.

Although the foundations of the theoretical model are built upon arithmetic
and geometry to the greatest extent possible, with verifiable phenomena
important to the material values of pieces used logically for refinements,
mathematical modelling can be misused to postulate and describe in detail
almost any imaginable non-existent phenomenon.  The Ptolemaic model of the
solar system is one example.

Derek Nalls wrote on Sun, May 4, 2008 06:38 AM UTC:
'I never found any effect of the time control on the scores I measure for
some material imbalance. Within statistical error, the combinations I
tried produced the same score at 40/15', 40/20', 40/30', 40/40',
40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I
did not consider it worth doing just to prove that it was a waste of
time...'
_________

The additional time I normally give to playtesting games to improve move
quality is partially wasted because, with most chess variant programs, I
can only control the time per move instead of the number of plies
completed.  This usually results in the time expiring while the program is
working on an incomplete ply.  It then prematurely spits out a move
representative of an incomplete tour of the moves available within that
ply.  Since there is always more than one move (often a few to several)
under evaluation as the best possible move [otherwise, the chosen move
would already have been executed], this means that any move on this 'list
of top candidates' is equally likely to be executed.

Here are two typical scenarios that should cover what usually happens:

A.  If the list of top candidates in an 11-ply search consists of 6 moves
where the list of top candidates in a 10-ply search consists of 7 moves,
then only 1 discovered-to-be-less-than-the-best move has been successfully
excluded and cannot be executed.

Of course, an 11-ply search completion may typically require an estimated
8-10 times as much time as the search completions for all previous plies
(1-ply through 10-ply) combined.

OR

B.  If the list of top candidates in an 11-ply search consists of 7 moves
[Moreover, the exact same 7 moves.] just as the preceding 10-ply search, 
then there is no benefit at all in expending 8-10 times as much time.
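The 8-10 times figure in scenario A is what iterative deepening predicts when the effective branching factor is roughly 9-11. A minimal sketch (the branching factors below are illustrative assumptions, not measurements from SMIRF):

```python
# Time to complete the deepest ply relative to all shallower plies combined,
# assuming search time grows roughly like b**depth for an effective
# branching factor b (a standard iterative-deepening cost model).
def deepening_ratio(b: float, depth: int = 11) -> float:
    deepest = b ** depth
    shallower = sum(b ** d for d in range(1, depth))
    return deepest / shallower

for b in (9.0, 10.0, 11.0):
    print(f"b = {b:.0f}: the 11-ply search costs "
          f"{deepening_ratio(b):.1f}x plies 1-10 combined")
```

For b between 9 and 11 the ratio comes out near 8-10, matching the estimate above.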
______________________________________________________________

The reason I endure this brutal waiting game is not for the purely
masochistic experience but because the additional time has a tangible
chance (although no guarantee) of yielding a better move on every occasion.
the numerous moves within a typical game, it can be realistically expected
to yield better moves on dozens of occasions.

We usually playtest for purposes at opposite extremes of the spectrum,
yet I regard our efforts as complementary toward building a complete
picture of the material values of pieces.

You use 'asymmetrical playtesting' with unequal armies on fast time 
controls, collect and analyze statistics ... to determine a range, with a
margin of error, for individual material piece values.

I remain amazed (although I believe you) that you actually obtain any 
meaningful results at all via games that are played so quickly that the AI
players do not have 'enough time to think' while playing games so complex
that every computer (and person) needs time to think to play with minimal
competence.  Can you explain to me in a way I can understand how and why
you are able to successfully obtain valuable results using this method? 
The quality of your results was utterly surprising to me.  I apologize for
totally doubting you when you introduced your results and mentioned how you
obtained them.

I use 'symmetrical playtesting' with identical armies on very slow time
controls to obtain the best moves realistically possible from an
evaluation function, thereby giving me a winner (that is, by some margin,
more likely than not deserving) ... to determine which of two sets of
material piece values is probably (yet not certainly) better.
Nonetheless, as more games are likewise played, if they present a
clear pattern, then the results become more probably reliable,
decisive and indicative of the true state of affairs.

The chances of flipping a coin once and it landing 'heads' are equal to
it landing 'tails'.  However, the chances of flipping a coin 7 times and
it landing 'heads' all 7 times in a row are 1/128.  Now, replace the
concepts 'heads' and 'tails' with 'victory' and 'defeat'.  I
presume you follow my point.

The results of only a modest number of well-played games can definitely
establish their significance beyond chance and to the satisfaction of 
reasonable probability for a rational human mind.  [Most of us, including
me, do not need any better than a 95%-99% success to become convinced that
there is a real correlation at work even though such is far short of an
absolute 100% mathematical proof.]
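The coin-flip reasoning above is easy to make concrete: under the null hypothesis that the two players are equally strong (draws ignored), the chance of a given player sweeping n games by luck alone is 1 in 2^n. A minimal sketch:

```python
from fractions import Fraction

# Chance that a given player wins all n games purely by luck,
# treating each game as a fair 50/50 event (draws ignored).
def sweep_chance(n: int) -> Fraction:
    return Fraction(1, 2 ** n)

for n in (4, 6, 7, 10):
    p = sweep_chance(n)
    print(f"{n} wins in a row: {p} by luck, confidence {1 - p}")
```

This reproduces the 1/128 figure for 7 flips, along with the 15/16, 63/64 and 1/1024 figures that appear elsewhere in this thread.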

In my experience, I have found that using any less than 10 minutes per
move will cause at least one instance within a game when an AI player
makes a move that is obvious to me (and correctly assessed as truly being)
a poor move.  Whenever this occurs, it renders my playtesting results 
tainted and useless for my purposes.  Sometimes this occurs during a 
game played at 30 minutes per move.  However, this rarely occurs during 
a game played at 90 minutes per move.

For my purposes, it is critically important above all other considerations
that the winner of these time-consuming games be correctly determined 
'most of the time' since 'all of the time' is impossible to assure.
I must do everything within my power to get as far from 50% and as close
to 100% reliability as possible in correctly determining the winner.
Hence, I am compelled to
play test games at nearly the longest survivable time per move to minimize
the chances that any move played during a game will be an obviously poor 
move that could have changed the destiny of the game thereby causing 
the player that should have won to become the loser, instead.  In fact, 
I feel as if I have no choice under the circumstances.

Derek Nalls wrote on Sun, May 11, 2008 10:05 PM UTC:
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled
with the CRC material values of Scharnagl, Muller & Nalls, I began
playtesting something else that interested me using SMIRF MS-174b-O.

I am concerned that the material value of the rook (especially compared to
the queen) amongst CRC pieces in the Muller model is too low:

rook  55.88
queen  111.76

This means that 2 rooks exactly equal 1 queen in material value.

According to the Scharnagl model:

rook  55.71
queen  91.20

This means that 2 rooks have a material value (111.42) 22.17% greater than
1 queen.

According to the Nalls model:

rook  59.43
queen  103.05

This means that 2 rooks have a material value (118.86) 15.34% greater than
1 queen.

Essentially the Scharnagl & Nalls models are in agreement in predicting
victories in a CRC game for the player missing 1 queen yet possessing 2
rooks.  By contrast, the Muller model predicts draws (or an approximately
equal number of victories and defeats) in a CRC game for either player.
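The percentage figures above follow directly from the listed values; a quick check:

```python
# 2 rooks versus 1 queen under each model, using the values quoted above.
models = {
    "Muller":    {"rook": 55.88, "queen": 111.76},
    "Scharnagl": {"rook": 55.71, "queen": 91.20},
    "Nalls":     {"rook": 59.43, "queen": 103.05},
}

for name, v in models.items():
    two_rooks = 2 * v["rook"]
    surplus = 100 * (two_rooks - v["queen"]) / v["queen"]
    print(f"{name}: 2R = {two_rooks:.2f}, {surplus:+.2f}% versus 1Q")
```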

I put this extraordinary claim to the test by playing 2 games at 10
minutes per move on an appropriately altered Embassy Chess setup with the
missing-1-queen player and the missing-2-rooks player each having a turn
at white and black.

The missing-2-rooks player lost both games and was always behind.  They
were not even long games at 40-60 moves.

Muller:

I think you need to moderately raise the material value of your rook in
CRC.  It is out of its proper relation with the other material values
within the set.

Derek Nalls wrote on Mon, May 12, 2008 07:06 PM UTC:
'You hardly have the possibility of trading it before there are open
files. So it stands to reason that you might as well use the higher value
during the entire game.'

Well, I understand and accept your reasons for leaving your lower rook 
value in CRC as is.  It is interesting that you thoroughly understand and
accept the reasons of others for using a higher rook value in CRC as
well.  Ultimately, is not the higher rook value in CRC more practical and useful to the game by your own logic?
_____________________________

'... if we both play a Q-2R match from the opening, it is a serious
problem if we don't get the same result. But you have played only 2
games. Statistically, 2 games mean NOTHING.'

I never falsely claimed or implied that only 2 games at 10 minutes per
move mean everything or even mean a great deal (enough to satisfy
probability overwhelmingly).  However, they mean significantly more than
nothing.  I cannot accept your opinion, based upon a purely statistical
viewpoint, since it comes at the exclusion of another applicable
mathematical viewpoint.  They definitely mean something ... although
exactly how much is not easily known or quantified mathematically.
__________________________________________________

'I don't even look at results before I have at least 100 games, because
before they are about as likely to be the reverse from what they will 
eventually be, as not.'

Statistically, when dealing with speed chess games populated 
exclusively with virtually random moves ... YES, I can understand and 
agree with you requiring a minimum of 100 games.  However, what you 
are doing is at the opposite extreme from what I am doing via my 
playtesting method.

Surely you would agree that IF I conducted only 2 games with perfect 
play for both players that those results would mean EVERYTHING.  
Unfortunately, with state-of-the-art computer hardware and chess variant
programs (such as SMIRF), this is currently impossible and will remain
impossible for centuries, if not millennia.  Nonetheless, games played at 100
minutes per move (for example) have a much greater probability of 
correctly determining which player has a definite, significant advantage 
than games played at 10 seconds per move (for example).

Even though these 'deep games' display nowhere near 600-times-better
quality than these 'shallow games', as one might naively expect
(due to a non-linear correlation), they are far from random events
(to which statistical methods would then be fully applicable).
Instead, they occupy a middleground between perfect play games and
totally random games.  [In my studied opinion, the example
'middleground games' are more similar and closer to perfect play
games than to totally random games.]  To date, much is unknown to
combinatorial game theory about the nature of these 'middleground
games'.

Remember the analogy to coin flips that I gave you?  Well, in fact, 
the playtest games I usually run go far above and beyond such random 
events in their probable significance per event.

If the SMIRF program running at 90 minutes per move cast all of its
moves randomly and without any intelligence at all (as a perfect
woodpusher), only then would my 'coin flip' analogy be fully applicable.
Therefore, when I estimate that it would require 6 games (for example) 
for me to determine, IF a player with a given set of piece values loses 
EVERY game, that there is only a 63/64 chance that the result is
meaningful (instead of random bad luck), I am being conservative to the
extreme.  The true figure is almost surely higher than a 63/64 chance.

By the way, if you doubt that SMIRF's level of play is intelligent and
non-random, then play a CRC variant of your choice against it at 90 
minutes per move.  After you lose repeatedly, you may not be able to 
credit yourself with being intelligent either (although you should) ... 
if you insist upon holding an impractically high standard to define the 
word.
______

'If you find a discrepancy, it is enormously more likely that the result
of your 2-game match is off from its true win probability.'

For a 2-game match ... I agree.  However, this may not be true for a 
4-game, 6-game or 8-game match and surely is not true to the extremes 
you imagine.  Everything is critically dependent upon the specifications
of the match.  The number of games played (of course), the playing 
strength or quality of the program used, the speed of the computer and 
the time or ply depth per move are the most important factors.
_________________________________________________________

'Play 100 games, and the error in the observed score is reasonably
certain (68% of the cases) to be below 4.5% ~ 1/3 pawn, so 16 cP per rook.
Only then you can see with reasonable confidence if your observations
differ from mine.'

It would require an estimated 20 years for me to generate 100 games with the
quality (and time controls) I am accustomed to and somewhat satisfied 
with.  Unfortunately, it is not that important to me just to get you to
pay attention to the results for the benefit of only your piece values
model.  As a practical concern to you, everyone else who is working to
refine quality piece values models in FRC and CRC will have likely
surpassed your achievements by then IF you refuse to learn anything from
the results of others who use different yet valid and meaningful methods
for playtesting and mathematical analysis than you.
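For what it is worth, Muller's 4.5%-after-100-games figure is the binomial standard error of the mean game score, and can be reproduced with a short computation (the 20% draw rate here is an illustrative assumption; a higher draw rate shrinks the error further):

```python
from math import sqrt

# Standard error of the mean score of n games between equal players,
# where a win scores 1, a draw 0.5 and a loss 0 (mean score 0.5).
def score_std_error(n: int, draw_rate: float = 0.2) -> float:
    win = (1 - draw_rate) / 2                  # wins and losses equally likely
    var = win * 1.0 + draw_rate * 0.25 - 0.25  # per-game score variance
    return sqrt(var / n)

print(f"{score_std_error(100):.3f}")  # close to 0.045, i.e. ~4.5%
```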

Derek Nalls wrote on Tue, May 13, 2008 02:39 AM UTC:
'Of course, that is easily quantified. The entire mathematical field of
statistics is designed to precisely quantify such things, through
confidence levels and uncertainty intervals.'

No, it is not easily quantified.  Some things of numerical importance
as well as geometric importance that we try to understand or prove 
in the study of chess variants are NOT covered or addressed by statistics.
I wish our field of interest was that simple (relatively speaking) and
approachable but it is far more complicated and interdisciplinary.  
All you talk about is statistics.  Is this because statistics is all you
know well?
___________

'That difference just can't be seen with two games. Play 100.
There is no shortcut.'

I agree.  Not with only 2 games.  

However ...

With only 4 games, IF they were ALL victories or defeats for the player 
using a given piece values model, I could tell you with confidence 
that there is at least a 15/16 chance the given piece values model is 
stronger or weaker, respectively, than the piece values model used by 
its opponent.  [Otherwise, the results are inconclusive and useless.]

Furthermore, based upon the average number of moves per game 
required for victory or defeat compared to the established average 
number of moves in a long, close game, I could probably correctly
estimate whether one model was a little or a lot stronger or weaker,
respectively, than the other model.  Thus, I will not play 100 games 
because there is no pressing, rational need to reduce the 'chance of 
random good-bad luck' to the ridiculously-low value of 
'the inverse of (base 2 to exponent 100)'.

Is there anything about the odds associated with 'flipping a coin'
that is beyond your ability to understand?  This is a fundamental 
mathematical concept applicable without reservation to symmetrical 
playtesting.  In any case, it is a legitimate 'shortcut' that I can and
will use freely.
________________

'Even perfect play doesn't help. We do have perfect play for all 6-men 
positions.'

I meant perfect play throughout an entire game of a CRC variant 
involving 40 pieces initially.  That is why I used the word 'impossible'
with reference to state-of-the-art computer technology.
_______________________________________________________

'This is approximately master-level play.'

Well, if you are getting master-level play from Joker80 with speed
chess games, then I am surely getting a superior level of play from 
SMIRF with much longer times and deeper plies per move.  You see,
I used the term 'virtually random moves' appropriately in a 
comparative context based upon my experience.
_____________________________________________

'Doesn't matter if you play at an hour per move, a week per move, 
a year per move, 100 year per move. The error will remain >=32%. 
So if you want to play 100 years per move, fine. But you will still
need 100 games.'

Of course, it matters a lot.  If the program is well-written, then the 
longer it runs per move, the more plies it completes per move
and consequently, the better the moves it makes.  Hence,
the entire game played will progressively approach the ideal of 
perfect play ... even though this finite goal is impossible to attain.
Incisive, intelligent, resourceful moves must NOT be confused with
or dismissed as purely random moves.  Although I could humbly limit 
myself to applying only statistical methods, I am totally justified,
in this case, in more aggressively using the 'probabilities associated 
with N coin flips ALL with the same result' as an incomplete, minimum 
value before even taking the playing strength of SMIRF at extremely-long 
time controls into account to estimate a complete, maximum value.
______________________________________________________________

'The advantage that a player has in terms of winning probability is the
same at any TC I ever tried, and can thus equally reliably be determined
with games of any duration.'

You are obviously lacking completely in the prerequisite patience and
determination to have EVER consistently used long enough time controls
to see any benefit whatsoever in doing so.  If you had, then you would
realize (as everyone else who has done so realizes) that the quality of
the moves improves, and even if the winning probability has not changed
much numerically in your experience, the figure you obtain is more
reliable.

[I cannot prove to you that this 'invisible' benefit exists
statistically. Instead, it is an important concept that you need to
understand in its own terms.  This is essential to what most playtesters do, with the notable exception of you.  If you want to understand what I do and why, then you must come to grips with this reality.]

Derek Nalls wrote on Tue, May 13, 2008 03:38 AM UTC:
CRC piece values tournament
http://www.symmetryperfect.com/pass/

Just push the 'download now' button.

Game #1
Scharnagl vs. Muller
10 minutes per move
SMIRF MS-174c

Result: inconclusive.
Draw after 87 moves by black.
Perpetual check declared.

Derek Nalls wrote on Tue, May 13, 2008 03:08 PM UTC:
'This discussion is pointless.'

On this one occasion, I agree with you.

However, I cannot just let you get away with some of your most 
outrageous remarks to date.

So, unfortunately, this discussion is not yet over.
____________________________________________

'First you should have results, 
then it becomes possible to talk about what they mean. 
You have no result.'

Of course, I have a result!

The result is obviously the game itself as a win, loss or draw
for the purposes of comparing the playing strengths of two
players using different sets of CRC piece values.

The result is NOT statistical in nature.
Instead, the result is probabilistic in nature.

I have thoroughly explained this purpose and method to you.
I understand it.
Reinhard Scharnagl understands it.
You do not understand it.
I can accept that.
However, instead of admitting that you do not understand it,
you claim there is nothing to understand.
______________________________________

'Two sets of piece values as different as day and night, and the only
thing you can come up with is that their comparison is
'inconclusive'.'

Yes.  Draws make it impossible to determine which of two sets of
piece values is stronger or weaker.  However, by increasing the
time (and plies) per move, smaller differences in playing strength 
can sometimes be revealed with 'conclusive' results.

I will attempt the next pair of Scharnagl vs. Muller and Muller vs.
Scharnagl games at 30 minutes per move.  Knowing how much
you appreciate my efforts on your behalf motivates me.
___________________________________________________

'Talk about pathetic: even the two games you played are the same.'

Only one game was played.

The logs you saw were produced by the Scharnagl (standard) version
of SMIRF for the white player and the Muller (special) version of SMIRF
for the black player.  The game is handled in this manner so that no time
expires without computation occurring.
___________________________________________________

'... does your test setup s*ck!'

What, now you hate Embassy Chess too?
Take up this issue with Kevin Hill.

Derek Nalls wrote on Tue, May 13, 2008 04:18 PM UTC:
Since I had to endure one of your long bedtime stories (to be sure),
you are going to have to endure one of mine.  Yet unlike yours
[too incoherent to merit a reply], mine carries an important point:

Consider it a test of your common sense-

Here is a scenario ...

01.  It is the year 2500 AD.

02.  Androids exist.

03.  Androids cannot tell lies.

04.  Androids can cheat, though.

05.  Androids are extremely intelligent in technical matters.

06.  Your best friend is an android.

07.  It tells you that it won the lottery.

08.  You verify that it won the lottery.

09.  It tells you that it purchased only one lottery ticket.

10.  You verify that it purchased only one lottery ticket.

11.  The chance of winning the lottery with only one ticket is 1 out of
100 million.

12.  It tells you that it cheated to win the lottery by hacking into its
computer system immediately after the winning numbers were announced,
purchasing one winning ticket and back-dating the time of the purchase.
____________________________________________

You have only two choices as to what to believe happened-

A.  The android actually won the lottery by cheating.

OR

B.  The android actually won the lottery by good luck.
The android was mistaken in thinking it successfully cheated.
______________________________________________________

The chance of 'A' being true is 99,999,999 out of 100,000,000.
The chance of 'B' being true is 1 out of 100,000,000.
________________________________________________

I would place my bet upon 'A' being true
because I do not believe such unlikely coincidences
will actually occur.

You would place your bet upon 'B' being true
because you do not believe such unlikely coincidences
have any statistical significance whatsoever.
_________________________________________

I make this assessment of your judgment ability fairly because you think
it is a meaningless result if a player with one set of CRC piece values
wins against its opponent 10-times-in-a-row even as the chance of it being
'random good luck' is indisputably only 1 out of 1024.

By the way ...

base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376.

Can you see how ridiculous your demand of 100 games is?

Derek Nalls wrote on Tue, May 13, 2008 05:27 PM UTC:
'Is this story meant to illustrate that you have no clue as to how to
calculate statistical significance?'

No.

This story is meant to illustrate that you have no clue as to how to
calculate probabilistic significance ... and it worked perfectly.
________________________________________________________

There you go again.  Missing the point entirely and ranting about
probabilities not being proper statistics.

Derek Nalls wrote on Mon, May 19, 2008 09:58 PM UTC:
To anyone who was interested ...

My playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.

Derek Nalls wrote on Mon, May 19, 2008 10:13 PM UTC:
Since Muller's Joker80 has recently established itself via 'The Battle Of
The (Unspeakables)' tournament as the best free CRC program in the world,
I checked it out.  I must report that setting up Winboard F (also written
by Muller) to use it was straightforward, with helpful documentation.
Generally, I am finding the features of Joker80 to be versatile and
capable for any reasonable use.

Derek Nalls wrote on Mon, May 19, 2008 10:28 PM UTC:
Muller:

I would like to conduct two focused playtests using Joker80 at very long
time controls (e.g., 30 minutes per move) to investigate these important questions-

1.  Is Muller's rook value within the CRC set too low?
2.  Is Scharnagl's archbishop value within the CRC set too low?

I would need for you to compile special versions of Joker80 for me using
significantly different values for those CRC pieces as well as
Scharnagl's CRC piece set.  To isolate the target variable, these games would be Muller (standard values) vs. Muller (test values) and Scharnagl (standard values) vs. Scharnagl (test values) via symmetrical playtesting.  Anyway, we can discuss the details if you are interested or willing.  Please let me know.

Derek Nalls wrote on Tue, May 20, 2008 01:13 AM UTC:
Muller:

Please investigate this potentially serious bug I may have discovered
while testing Joker80 under Winboard F ...

Bugs, Bugs, Bugs!
http://www.symmetryperfect.com/pass

I am having a hard time with software today.

Derek Nalls wrote on Tue, May 20, 2008 07:16 AM UTC:
'Human vs. engine play is virtually untested. 
Did you at any point of the game use 'undo'
(through the WinBoard 'retract move')?'

Yes.
Many of us error-prone humans use it frequently.
________________________________________________

'This is indeed something I should fix but
the current work-around would be not to use 'undo'.'

Makes sense to me.
I can avoid using the 'retract move' command altogether.
________________________________________________________

'I could make a Joker80 version that reads the piece base values from a
file 'joker.ini' at startup. Then you could change them to anything you
want to test, without the need to re-compile. Would that satisfy your
needs?'

Yes, better than I ever imagined.
Thank you!

Derek Nalls wrote on Tue, May 20, 2008 04:48 PM UTC:
Everything is working fine.
Thank you!

I now have 12 instances of the Joker80 program running in various
sub-directories of Winboard F with the 'winboard.ini' file set to
conveniently initiate any desired standard or special material values for
the CRC models by Muller, Scharnagl and Nalls.

In the first test, I am going to attempt to find a playtesting time where
a distinct separation in playing strength occurs between the standard
Muller model wherein the rook is 1 pawn more valuable than the bishop and
a special Muller model wherein the rook is 2 pawns more valuable than the
bishop.  If I successfully find a playtesting time that is survivable by
humans, then we can hopefully establish a tentative probability as to
which CRC model plays decisively better after a few-several games.

At par 100 (for the pawn), the bishop is at 459 under both models with the
rook at 559 under the standard Muller model and 659 under the special
Muller model.

I want to playtest a special Muller model with a rook value 2.00 pawns higher than the bishop because the Nalls model has a rook value 2.19 pawns higher than the bishop and the Scharnagl model has a rook value 1.94 pawns higher than the bishop (for an average of 2.06 pawns).
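The rook-bishop gaps implied by these centipawn figures check out; a quick arithmetic sketch using only the numbers quoted in this comment:

```python
# Rook-bishop gaps in pawns at a pawn par of 100 (values from this comment).
pawn, bishop = 100, 459
rook_standard, rook_special = 559, 659

print((rook_standard - bishop) / pawn)  # 1.0 pawn gap, standard Muller model
print((rook_special - bishop) / pawn)   # 2.0 pawn gap, special test model

# The 2.00-pawn target is roughly the average of the Nalls (2.19) and
# Scharnagl (1.94) gaps.
print(round((2.19 + 1.94) / 2, 2))
```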

Since I am attempting to test for such a small difference in the material value of only one type of piece (the rook), I have doubts that I will be able to obtain conclusive results.  In any case ... If I obtain conclusive results, then very long time controls will surely be required to produce them.

Derek Nalls wrote on Tue, May 20, 2008 09:05 PM UTC:
Of course, I would bet anything that there are no 1:1 exchanges supported
under the standard Muller CRC model that could cause material losses.  If
that were the case, yours would not be one of the three most credible CRC
models under close consideration.  In fact, even your excellent Joker80
program would play poorly if stuck with using faulty CRC piece values.

Obviously, the longer the exchange, the rarer its occurrence during
gameplay.  The predominance in gameplay of simple 1:1 exchanges over even
the least complicated 1:2 or 2:1 exchanges is large, although I do not
know the stats.

In fact, there is a certain 1:2 or 2:1 exchange I am hoping to see that is
likely to support my contention that the Muller rook value should be
higher: the 1 queen for 2 rooks or 2 rooks for 1 queen exchange.  Please
recall that under the standard Muller model, this is an equal exchange. 
However, under asymmetrical playtesting of comparable quality to and
similar to that I used to confirm the correctness of your higher
archbishop value, I played numerous CRC games at various moderate time
controls where the player without 1 queen (yet with 2 rooks) defeated the
player without 2 rooks (yet with 1 queen).  Ultimately, a key mechanism to conclusive results is that while the standard Muller model is neutral toward a 2 rook : 1 queen or 1 queen : 2 rook exchange, the special Muller model regards its 1 queen as significantly less valuable than 2 rooks of its opponent.  Consequently, this contrast in valuation could be played into ... and we would see who wins.

I am actually pleased that you are a realist who shares my pessimism in
this experiment.  In any case, low odds do not deter a best effort to
succeed.  The main difference between us is that you calculate your
pessimism by extreme statistical methods whereas I calculate my pessimism
by moderate probabilistic methods.  I remain hopeful that eventually I
will prove to you that the method Scharnagl & I developed is occasionally
productive.

Derek Nalls wrote on Tue, May 20, 2008 09:17 PM UTC:
Muller:

Please confirm that these are legal values for the 'winboard.ini' file.

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}

Derek Nalls wrote on Wed, May 21, 2008 04:53 PM UTC:
As I moved to renormalize all of the values used in Joker80 (written into
the 'winboard.ini' file) with the pawn at a par of 85 points, I looked
at my notes again.  They reminded me that your use of the 'bishop pair'
refinement (with a bonus of 40 points) implies that the material value of
the rook is either 1.00 pawns or 1.47 pawns greater than the material value
of the bishop in CRC, depending upon whether both bishops or only one
bishop, respectively, remain in the game.  At that point, I realized that
I would be attempting to playtest for a discrepancy that I know from
experience is just too small to detect even at very long time controls. 
So, this planned test has been cancelled.

I am not implying that this matter is unimportant, though.  I remain
concerned for the standard Muller model whenever it allows the exchange of
its 2 rooks for 1 queen belonging to its opponent.
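A minimal sketch of that arithmetic, assuming the usual pair-bonus
convention (the listed 350 points is the value of a lone bishop and the
40-point bonus applies only while both bishops remain), on the pawn = 85
scale:

```python
pawn, bishop, rook, pair_bonus = 85, 350, 475, 40

# Trading a bishop while its partner survives also forfeits the pair bonus:
gap_pair_intact = rook - (bishop + pair_bonus)   # 85 points
# Trading the last remaining bishop forfeits only its base value:
gap_lone_bishop = rook - bishop                  # 125 points

print(round(gap_pair_intact / pawn, 2))  # 1.0
print(round(gap_lone_bishop / pawn, 2))  # 1.47
```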

Derek Nalls wrote on Wed, May 21, 2008 07:02 PM UTC:
Muller:

Please have another look at this excerpt from my 'winboard.ini' file. 
There are standard and special versions of piece values by Muller,
Scharnagl & Nalls for the white and black players renormalized to pawn =
85 points.

The special version of the Muller model has a rook value exactly 85 points
or 1.00 pawn higher than the standard version.

The special version of the Scharnagl model has an archbishop value (736
points) at appr. 95% of the chancellor value (775 points), instead of the
597 points at appr. 77% in the standard version.

The special version of the Nalls model is identical to the standard
version until some test is needed and planned.

Since I assume that the 'bishop pair bonus' is hardwired into Joker80,
40 points has been subtracted from the model-independent material values
of the bishop under all three models.  Is this correct?
_____________________________________________________

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}

Derek Nalls wrote on Thu, May 22, 2008 12:13 AM UTC:
'If I were you, I would normalize all models to Q=950 but then replace
the pawn value everywhere by 85.'

Since this is what you (the developer of Joker80) recommend as optimum, 
this is what I will do.

Are you sure that replacing any pawn values different from 85 points
after renormalization to queen = 950 points still renders an accurate 
and complete representation, more or less, of the Scharnagl and Nalls 
models?

At a par of queen = 950 points, the pawn value in the Nalls model
is no longer represented as being only 92.19% as high as that in the
Muller model, and the pawn value in the Scharnagl model is no longer
represented as being only 98.95% as high as that in the Muller model.

Thru it all ... If a perfect representation is not quite possible, 
I can accept that without reservation.
__________________________________

'I don't think you could say then that you deviate from the
model as the models do not really specify which type of Pawn they use as
a standard.'

Correctly calculating pawn values at the start of the game (much less
throughout the game) requires finesse, as it is indeed a complex issue.
In fact, its excessive complexity is the reason my 66-page paper on the
material values of pieces is silent on calculating pawn values in
FRC & CRC.  Instead, someone needs to read an entire book on calculating
the material values of the pieces in Chess to sufficiently understand it.

Personally, I am content with the test situation as long as Joker80
handles all pawns under all three models, initially valued at 85 points,
as fairly and equally as is realistically possible.

I cannot speak for Reinhard Scharnagl at all, though.
________________________________________________

'The way you did it now would make the first Bishop to be traded of the 
value the model prescribes, but would make the second much lighter. 
If you would subtract half the bonus, then on the average they would 
be what the model prescribes.'

Now, I understand better.
It makes sense.
[I am glad I asked you.]

Yes, I will subtract 20 points (1/2 of the 'bishop pair bonus') from the
model-independent material values for the bishop under the
Scharnagl & Nalls models.
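The whole renormalization described in this post can be sketched in a few
lines.  Whether the figures are truncated or rounded after scaling is my
assumption; truncation happens to reproduce the Scharnagl line that
appears in the next post:

```python
# Scharnagl's standard CRC values on the pawn = 100 scale: N, B, R, A, C, Q
scharnagl = [306, 363, 557, 702, 912, 960]
queen_par, pawn_par, half_pair_bonus = 950, 85, 20

scale = queen_par / scharnagl[-1]               # normalize so that Q = 950
values = [int(v * scale) for v in scharnagl]    # truncate (my assumption)
values[1] -= half_pair_bonus                    # half the 40-point pair bonus off B
print('P%d=' % pawn_par + '='.join(str(v) for v in values))
# P85=302=339=551=694=902=950
```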

Derek Nalls wrote on Thu, May 22, 2008 12:33 AM UTC:
Muller:

Here is my latest revision to my 'winboard.ini' file.
Are these piece values acceptable to you?
Do you think these piece values will work smoothly with Joker80 running
under Winboard F yet remain true to all three models?
______________________________________________________

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}

Derek Nalls wrote on Fri, May 23, 2008 12:47 AM UTC:
Originally, I planned two 'internal playtests'.  [By this self-invented
term I mean playtests of the standard model of a person against a special
model that I have compelling reasons to think may be superior by a
provable margin.]

The first planned test involves the standard CRC model of Muller against a
special CRC model with a higher, closer-to-conventional rook value.  Upon
closer examination, I suspected that the discrepancy was possibly too
small to be detected even with very long time controls.  So, I announced
that this test was cancelled.

Notwithstanding, I may change my mind and return to this unsolved mystery
if Joker80 demonstrates unusually high aptitude as a playtesting tool. 
This might require very deep runs with a completion time of a few
weeks to a few months per pair of games to achieve conclusive results.

The second planned test involves the standard CRC model of Scharnagl
against a special CRC model with a higher, unconventional archbishop
value.

Scharnagl currently assigns the archbishop a material value of appr.
77% that of the chancellor in his standard CRC model.

Muller currently assigns the archbishop a material value of greater
than 97% that of the chancellor in his standard CRC model.

Nalls currently assigns the archbishop a material value of less
than 98% that of the chancellor in his standard CRC model.

I devised a special CRC model using identical material values for every
piece in the standard CRC model by Scharnagl except that it assigns the
archbishop a material value of exactly 95% that of the chancellor
(18 percentage points, or appr. 1.65 pawns, higher).  [Note that this
figure is slightly more moderate than those by Muller & Nalls.]  A
discrepancy this large should be detectable at short-to-moderate time
controls.  This test is now underway.
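The figures in this post, taken on the pawn = 85 scale used in the
'winboard.ini' excerpts above, check out as follows (a simple
verification, not new data):

```python
# Scharnagl standard CRC values on the pawn = 85 scale:
# chancellor = 775, standard archbishop = 597.
chancellor, archbishop_standard = 775, 597

archbishop_special = round(chancellor * 0.95)       # the special model's A
print(archbishop_special)                           # 736
print(round(archbishop_standard / chancellor, 2))   # 0.77
print(round(archbishop_special / chancellor, 2))    # 0.95
```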

If either of these tests is successful at establishing or implicating a
probability that the special models play stronger than the standard
models, then revisions to the standard models may occur.  At that
juncture, we would be ready to begin 'external playtests'.  [By this
self-invented term I mean playtests of the standard models of different
persons against one another.]
