Comments/Ratings for a Single Item

Wikipedia link re: Margin of error (may be relevant to piece value studies)
Kevin Pacey wrote on Sat, Sep 17, 2016 02:50 AM UTC:

In looking up the latest poll results online (on Wikipedia) for the US election, I noticed a reference to margin of error, and also noticed that it was naturally bigger for smaller sample sizes. In the following link:

https://en.wikipedia.org/wiki/Margin_of_error

It can be seen that a sample size of just 96 has a margin of error of 10% (at a 95% confidence level), but a sample size of 384 has a margin of error of only 5%. It struck me that for a study of piece values using computers, it might be vital to have a considerably large sample of games where identical engines play against each other, in order to be reasonably confident of conclusions drawn when contesting the values of different pieces. Perhaps the minimum ought to be a sample size of 384 games. In concluding such a study, it might be noted how the margin of error affects the estimate of a piece's value, if it is at all significant (e.g. "plus or minus 0.125 pawns" [however that might be calculated] possibly stated, after some calculations that are made for a piece's value based on win/loss percentages for that piece).
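
As a rough sketch of where those two figures come from: they match the standard worst-case formula for a proportion, z * sqrt(p(1-p)/n), with p = 0.5 and z = 1.96 (95% confidence). The function name below is hypothetical, and treating every game as an independent win/loss trial (draws ignored) is an assumption made for illustration, not part of any actual piece-value study.

```python
import math

def margin_of_error(n_games, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion estimated from n_games trials.

    Assumes independent win/loss games (no draws); z=1.96 corresponds to
    a 95% confidence level, and p=0.5 gives the widest possible margin.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return z * math.sqrt(p * (1.0 - p) / n_games)

# The two sample sizes mentioned in the post.
for n in (96, 384):
    print(f"{n:4d} games: +/- {100 * margin_of_error(n):.1f}%")
```

Because the margin shrinks with the square root of the sample size, halving it takes roughly four times as many games.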

In his study finding that in chess a knight is exactly worth a bishop, I recall GM Larry Kaufman used a huge number of games (1,000,000+?) between skilled humans to draw his conclusion with a high degree of statistical confidence. This might have been a flawed study all the same; it seems from chess books that most human chess authorities agree that a knight is a little worse than a bishop on average. My own guess is that looking at human vs. human games wouldn't necessarily produce the same statistical result as an engine vs. identical engine study, with such a huge number of games also being played, and all starting with an opening-stage position setup where a single bishop is pitted against a single knight. That's, at the least, because different people value bishops & knights (and under what circumstances they can be exchanged equitably) slightly differently, which affects people's decisions, and in turn the possible results of all the individual games counted in a study, in a more chaotic way than with engines. That's not to mention all-too-human blunders or lesser mistakes, although these might tend to even out more than discrepancies caused by different players valuing minor pieces differently. I should note though that I own one 1998 middlegame book that is quite content to quote human vs. human database statistics that have results in favour of 2 bishops over 2 knights (or knight + bishop) a big majority of the time, under varying conditions of even material otherwise, much as Kaufman found.

P.S.: In digging back through old Comments, I see that H.G. (if no one else) has in a way basically taken into account much (if not all) of what I posted above, and made computer studies with a minimum of 1,000 games, in at least some cases, e.g. Amazon vs. Q + N (don't know about sample size in the case of B vs. N), when calculating piece values via piece vs. piece(s) battles. Assuming the engine + methodology used is a strong one, I still can't square some of the results of computer studies with my intuition, to my bewilderment. A personal anecdote that's possibly amusing: at one stage when musing about margin of error in regard to piece value estimates, I thought for a second that if a knight (as a piece of lower or equal value to a bishop) were set to 3.0 and the margin were 10% then the margin of error for a study (of 96 games) comparing it to a bishop might be 3.0 x .1 = 0.3 pawns. In similar fashion, I thought if an archbishop were set to 8.0 with a margin of 10% then the margin of error for a study (of 96 games) comparing it to a queen might be 8.0 x .1 = 0.8 pawns. I soon saw no justification for tying margin of error to the assigned numerical value of a piece, and realized it must be incorrect math. :)

Another way to try to convert margin of error from a raw percentage into a percentage of a pawn could first involve considering what constitutes the numerical value of a minimum decisive advantage (i.e. an engine should win 100% of all games in a study with this much advantage). In chess, that's about 1.333 of a pawn according to the old book Point Count Chess; if we accept that value (for the sake of argument) then a margin of error of 10% (i.e. for a study with 96 games) could be converted to 1.333 x .1 = plus or minus 0.133 pawns worth of margin of error. This may be just more incorrect math, but oddly enough I don't see how to easily refute it at the moment, at least with my feeble/rusty math skills.
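
The conversion proposed above is a plain linear scaling, sketched below only to make the arithmetic explicit. The function name is hypothetical, the 1.333-pawn decisive advantage is the Point Count Chess figure quoted above, and the linear mapping itself is the untested assumption of the post, not an established method.

```python
def margin_in_pawns(margin_fraction, decisive_advantage_pawns=1.333):
    """Scale a raw margin of error (e.g. 0.10 for 10%) into pawns.

    Assumes, as the post does for the sake of argument, that a
    100% score corresponds to decisive_advantage_pawns of advantage
    and that the relationship is linear.
    """
    return decisive_advantage_pawns * margin_fraction

# 96-game and 384-game studies, using the 10% and 5% margins quoted earlier.
for games, margin in ((96, 0.10), (384, 0.05)):
    print(f"{games} games: +/- {margin_in_pawns(margin):.3f} pawns")
```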


Kevin Pacey wrote on Sat, Sep 17, 2016 07:05 AM UTC:

I've extensively edited my previous post, in case anyone missed it.


Kevin Pacey wrote on Sat, Jan 14, 2017 07:48 PM UTC:

Not attempting to harp on the subject of computer or statistical studies of piece values, my thoughts on margin of error in these cases have been the same for a long time now. They're threefold: firstly, I'm thinking it's possible in such studies margin of error might have been estimated as at best half of what it should be. That is, say piece X is assumed to be superior to piece Y, then its superiority might be thought to be manifested as 50%+superiority%+margin of error[assumed no greater than 100/2 or 50]% out of 100% total of n games in a sample. In calculating the margin of error, I think it should be double that [i.e. no greater than 100%], since in THEORY (however unlikely it seems) there could be a sample where piece X scores less than 50%. This is unlikely (though not impossible, given sufficiently weak players or a weak computer program) if X is a rook and Y is a bishop, but suppose X is a knight instead. Another possible problem in estimating the margin of error in such studies is that if one uses a pawn as a kind of standard candle, a pawn is a much greater fraction of a minor piece (e.g. bishop or knight, i.e. about 1/3rd of either of these) than it is a fraction of a senior major piece (e.g. archbishop, chancellor or queen, i.e. a pawn is worth roughly 1/9th [or more] of any of these), which may deserve to be taken into consideration when calculating any sort of margin.

[edit: If the above does deserve to be taken into consideration, after one calculates any 'initial' margin of error for a study, however one approves of doing it, I can suggest it be multiplied by a 'Fudge Factor' to reach a final margin of error. This Fudge Factor could be (as a crude example guess of mine) = ([Assumed total value {in pawns} of the assumed superior or equivalent piece[s] being measured] Squared) Divided by (Assumed total value {in pawns} of the assumed inferior or equivalent piece[s] being measured + 1). Now for example cases: if one side has an extra pawn, Fudge Factor = (1x1)/(0+1) = 1. If one side has a bishop for a knight, Fudge Factor = (3x3)/(3+1) = 9/4. If one side has a rook for 5 pawns, Fudge Factor = (5x5)/(5+1) = 25/6. If one side has a queen and the other side has an Archbishop (or Chancellor), if we say for the sake of argument that they're equivalent then Fudge Factor = (9x9)/(9+1) = 81/10. If one side has two bishops and the other has a knight and bishop, Fudge Factor = (6x6)/(6+1) = 36/7. If one side has 3 queens and the other side has 7 knights (which should actually beat the 3 queens [which are superior on paper in value], with no pawns involved anyway) then Fudge Factor = (9x3x9x3)/(3x7+1) = 27x27/22, i.e. very large. In coming up with Fudge Factor, I tried initially to take into account the total value of each side's Army (not counting kings) for a given setup. That is, the setup being studied in order to measure a [sub-]set of piece[s]. However, this complicated my attempts at finding a plausibly suitable formula (IMHO) too much, in spite of it seeming otherwise very desirable to take the value of the Armies into account somehow.]
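
The Fudge Factor described above can be written out as a small function to check the worked examples. This is only a sketch of the formula as stated in the post (superior value squared, divided by inferior value plus one); the function name is hypothetical, and the piece values used (pawn 1, knight/bishop 3, rook 5, queen 9) are the ones assumed in the examples.

```python
from fractions import Fraction

def fudge_factor(superior_value, inferior_value):
    """Fudge Factor = (superior total value)^2 / (inferior total value + 1).

    Values are in pawns. Fractions keep the results exact so they can be
    compared directly with the hand-computed examples.
    """
    return Fraction(superior_value) ** 2 / (Fraction(inferior_value) + 1)

# (label, superior side value, inferior side value), as in the post.
examples = [
    ("extra pawn", 1, 0),
    ("bishop for knight", 3, 3),
    ("rook for 5 pawns", 5, 5),
    ("queen vs. archbishop", 9, 9),
    ("two bishops vs. knight + bishop", 6, 6),
    ("3 queens vs. 7 knights", 27, 21),
]

for label, superior, inferior in examples:
    factor = fudge_factor(superior, inferior)
    print(f"{label}: {factor} (about {float(factor):.2f})")
```

The factor would then multiply whatever 'initial' margin of error a study arrives at, so it grows quickly as the pieces being compared get more valuable.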

Also, I still believe strength of the playing sides (even if they are one and the same player, such as a computer program) can significantly affect the results of such studies (enlarging the margin of error, to put it one way). A link I gave elsewhere notes that knight odds are compensated for by a difference of 600 rating points in chess, so even a pawn difference can be less significant in games between weaker players or computers than in games between stronger players. An analogy I'd make is that if you let kids play games in a sandbox, you'd be lucky if you'd see a somewhat competently designed sand castle at some point, while if a master sculptor played in a sandbox, we'd receive masterpieces that made the best use of the material available.

To sum up my position as it stands now, I believe we'd have piece values from such studies that could be trusted with a high degree of confidence (at least by myself) if margin of error is convincingly accurate and (more importantly, perhaps) computer programs with (widely accepted) very high chess ratings were used as the playing sides in such studies (which are intended, at least for now, for chess and rather chess-like games).

