Comments/Ratings for a Single Item

Including Piece Values on Rules Pages
Kevin Pacey wrote on Tue, Mar 12, 2024 12:26 AM UTC in reply to H. G. Muller from Mon Mar 11 09:59 PM:

Well, 2700+ play could be of interest one day (that is, may matter, in spite of being dismissed as if that level of play should be treated in that type of [different, i.e. dismissive] way no matter what), if someone were to imply they know the actual difference, if any, between a B and a N on average for 8x8 [chess]. Computer study results, perhaps at times implying they have established the truth of piece values, are already posted all over the internet, not just here on CVP, where readers/players are all presently presumed to be sub-2700. Even thus, readers of CVP might care out of curiosity alone to know the ultimate truth, however established, even if they do not benefit by it very often, if ever, in their own games, unless they become [near-]2700+ themselves one day. A lot of people finding it interesting to know that 8x8 checkers has been weakly solved in modern times is somewhat comparable, perhaps.

That explanation of margins of error doesn't mention a thing or two that might go wrong if any assumptions are made at any step, such as assuming the presumed materially weaker side will never win the most games in a study [especially if only a few hundred games] by a possible fluke, even if unlikely.

Then, there is my own hypothesis that the larger in value/(more powerful) a piece is, the greater a certain margin of error might need to be within a study.

Perhaps unrelated(? - cannot recall if we discussed ever), 4 [uncompensated]tempi (worth 1/3rd of a pawn each in an open position), or 4/3rds of a pawn might be a normal minimum decisive edge, at least that's in line with an old-school rule of thumb I saw in an old book 'Point Count Chess', where a pawn is 3 points and 4 points ahead is supposed decisive (again, 1 pawn = 3 tempi in an open position is an old rule from even longer ago).


H. G. Muller wrote on Tue, Mar 12, 2024 11:29 AM UTC in reply to Kevin Pacey from 12:26 AM:

Systematic errors can never be estimated. There is no limit to how inaccurate a method of measurement can be. The only recourse is to be sure you design the method as well as you can. But what you mention is a statistical error, not a systematic one. Of course the weaker side can win, in any match with a finite number of games, by a fluke. Conventional statistics tells you how large the probability for that is. The probability to be off by 2 standard deviations or more (in either direction) is about 5%. To be off by 3 or more, about 0.27%. It quickly tails off, but to halve the standard deviation you need 4 times as many games.

So it depends on how much weaker the weak side is. To demonstrate with only a one-in-a-million probability for a fluke that a Queen is stronger than a Pawn wouldn't require very many games. The 20-0 result that you would almost certainly get would only have a one-in-a-million probability when the Queen was not better, but equal. OTOH, to show that a certain material imbalance provides a 1% better result with 95% 'confidence' (i.e. only 5% chance it is a fluke), you will need 6400 games (40%/sqrt(6400) = 40%/80 = 0.5%, so a 51% outcome is two standard deviations away from equality).
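The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch: the 40% per-game standard deviation is the figure used in the comment, the function names are invented for the illustration, and the 20-0 case assumes equal opponents and no draws.

```python
import math

PER_GAME_SD = 0.40  # per-game standard deviation of the score, as used in the comment


def standard_error(games, per_game_sd=PER_GAME_SD):
    """Standard deviation of the average score over a match of `games` games."""
    return per_game_sd / math.sqrt(games)


def games_needed(excess_score, sigmas=2.0, per_game_sd=PER_GAME_SD):
    """Games needed so that `excess_score` lies `sigmas` standard errors from 50%."""
    raw = (sigmas * per_game_sd / excess_score) ** 2
    return math.ceil(raw - 1e-9)  # small tolerance guards against float round-up


def two_sided_tail(sigmas):
    """Probability that a normal variable is off by at least `sigmas`, in either direction."""
    return math.erfc(sigmas / math.sqrt(2))


def sweep_probability(games):
    """Chance that one given side wins every game between equal opponents (no draws)."""
    return 0.5 ** games


print("games for a 1% excess at 2 sigma:", games_needed(0.01))
print("standard error after 6400 games:", standard_error(6400))
print("two-sided tail at 2 and 3 sigma:", two_sided_tail(2), two_sided_tail(3))
print("probability of a 20-0 sweep between equals:", sweep_probability(20))
```

The exact two-sided tail at 2 standard deviations is slightly below the rounded 5% quoted above, which is why "95% confidence" and "two standard deviations" are used interchangeably here.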

My aim is usually to determine piece values with a standard deviation of about 0.1 Pawn. Since Pawn odds typically causes a 65-70% result, 0.1 Pawn would result in 1.5-2% excess score, and 400-700 games would achieve that (40%/sqrt(400) = 2%). I consider it questionable whether it makes sense to strive for more accurate values, because piece values in themselves are already averages over various material combinations, and the actual material that is present might affect them by more than 0.1 Pawn.
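The 400-700 game estimate follows from the same formula. A minimal sketch, assuming (as the comment implicitly does) that the excess score scales linearly with the material advantage; the function name is made up for this example.

```python
PER_GAME_SD = 0.40  # per-game standard deviation of the score, as used in the comment


def games_for_resolution(pawn_odds_score, resolution_pawns=0.1, per_game_sd=PER_GAME_SD):
    """Games needed for one standard deviation to equal `resolution_pawns`.

    `pawn_odds_score` is the score obtained with a full Pawn advantage,
    e.g. 0.65 to 0.70. Linear scaling of the excess score is an assumption.
    """
    excess_per_pawn = pawn_odds_score - 0.5
    target_sd = excess_per_pawn * resolution_pawns
    return (per_game_sd / target_sd) ** 2


for score in (0.65, 0.70):
    print(f"Pawn odds = {score:.0%}: about {games_for_resolution(score):.0f} games")
```

With a 70% Pawn-odds score this gives the 400 games mentioned above; with 65% it gives a little over 700.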

I am not sure what you want to say in your first paragraph. You still argue like there would be an 'absolute truth' in piece values. But there isn't. The only thing that is absolute is the distance to checkmate. Piece values are only a heuristic used by fallible players who cannot calculate far enough ahead to see the checkmate. (Together with other heuristics for judging positional aspects.) If the checkmate is beyond your horizon you go for the material you think is strongest (i.e. gives the best prospects for winning), and hope for the best. If material gain is beyond the horizon you go for the position with the current material that you consider best. Above a certain level of play piece values become meaningless, and positions will be judged by other criteria than what material is present. And below that level they cannot be 'absolute truth', because it is not the ultimate level.

I never claimed that statistics of computer-generated games provide uncontestable proof of piece values. But they provide evidence. If a program that human players rated around 2000 Elo have difficulty beating in orthodox Chess hardly does better with a Chancellor as Queen replacement than with an Archbishop (say 54%), it seems very unlikely that the Archbishop would be two Pawns less valuable, as that same engine would have very little trouble converting other uncontested 2-Pawn advantages (such as R vs N, or 2N+P vs R) to a 90% score. It would require pretty strong evidence to the contrary to dismiss that as irrelevant, plus an explanation for why the program systematically blundered that advantage away. But there doesn't seem to be any such evidence at all. That a high-rated player thinks it is different is not evidence, especially if the rating is only based on games where neither A nor C participate. That the average number of moves on an empty board of A is smaller than that of C is not evidence, as it was never proven that piece values only depend on average mobility. (And counter examples on which everyone would agree can easily be given.) That A is a compound of pieces that are known to be weaker than the pieces C is a compound of is no evidence, as it was never proven that the value of a piece is equal to the sum of its components. (The Queen is an accepted counter-example.)

As to the draw margin: I usually took that as 1.5 Pawn, but that is very close to 4/3, and my only reason to pick it was that it is somewhere between 1 and 2 Pawns. An advantage of 1 Pawn is often not enough, 2 usually is. But 'decisive' is a relative notion. At lower levels games with a two-Pawn advantage can still be lost. GMs would probably not stand much chance against Stockfish if they were allowed to start with a two-Pawn advantage. At high levels a Pawn advantage was reported by Kaufman to be equivalent to a 200-Elo rating advantage.
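To relate the Elo figure to the score percentages used earlier in the thread, the standard logistic Elo formula can be applied. This is a sketch using that textbook formula, not something taken from Kaufman's article; the function names are chosen for the example.

```python
import math


def expected_score(elo_difference):
    """Expected score for the stronger side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_difference / 400.0))


def elo_from_score(score):
    """Elo difference that corresponds to an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


print("expected score at +200 Elo:", round(expected_score(200), 3))
print("Elo equivalent of a 65% score:", round(elo_from_score(0.65)))
print("Elo equivalent of a 70% score:", round(elo_from_score(0.70)))
```

Under this model a 200-Elo edge corresponds to a score of roughly three quarters, somewhat more than the 65-70% quoted above for Pawn odds in engine test games.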


Kevin Pacey wrote on Tue, Mar 12, 2024 11:44 AM UTC in reply to H. G. Muller from Sun Mar 10 09:34 PM:

I may be wrong, but I thought your first two paragraphs in this post of yours I'm replying to indicated that you thought pieces have a 'well-defined value'. Call me mistaken for thinking you meant that there is an absolute truth.


H. G. Muller wrote on Tue, Mar 12, 2024 01:16 PM UTC in reply to Kevin Pacey from 11:44 AM:

"Well-defined value" was used there in the sense of "universally valid for everyone that uses them". (Which does not exclude that there are people that do not use them at all, because they have better means for judging positions. Stockfish no longer uses piece values... It evaluates positions entirely through means of a trained neural net.)  If that would be the case, it would not be of any special interest to specifically investigate their value for high-rated players; any reasonable player would do. I already said it was not clear to me what exactly you wanted to say there, but I perceive this interest in high ratings as somewhat inconsistent. Either it would be the same as always, and thus not specially interesting, or the piece values would not be universal but dependent on rating, and the whole issue of piece values would not be very relevant. It seems there is no gain either way, so why bother?


Kevin Pacey wrote on Tue, Mar 12, 2024 05:43 PM UTC in reply to H. G. Muller from 01:16 PM:

Chess piece values in beginner books (N=B=3, R=5, Q=9) are in fact little white lies to merely simplify their lives (other, unrelated, common white lies also exist - some are the fault of books simply being very old, and/or by poor authors). As you get more experienced/read advanced books, you are told/discover to generally not trade 2 minor pieces for R and P, at least not before the endgame. Similarly, you are told/discover to generally not trade 3 minor pieces for a Q. Also, don't trade a minor piece for 3 pawns too early in a game, as a rule of thumb.

World Champion Euwe, for example, had a set of piece values that tried to take all that into account, yet stay fairly true to the crude but simple to recall beginner values. His values were N=B=3.5, R=5.5 and Q=10 (noting that one thing beginner values get right is 2R=Q+P). Some of the problems of assigning piece values go away if you worry more about satisfying the advanced equations for 2 for 1 and 3 for 1 trades when thinking about such possibilities during a game (or as part of an algorithm).

Euwe did not bother to give a B any different value than N numerically, although he examined single B vs. single N cases in chapter(s) in a Middlegame Book volume (with co-author Kramer). Various grandmasters have historically given a B as having a [tiny] edge in value over a knight - some didn't pin themselves down, and wrote something like N=3, B=3+, the '+' presumably being a small fraction. Since I prefer Q=B+R+P=10, I have B=3.5 to keep that equation tidy, and have N=3.49 completely arbitrarily in my own mind (but generally leave it as 3.5 when writing a set of values, for the sake of simplicity).

To my mind, anyway, there may be a way I haven't mentioned until now to establish close to an absolute true value difference between B and N, if any, if enough decisive 2700+ games can ever be included in a database. For the wins and losses comparison, if you can somehow establish that having the B or the N was The decisive reason for the game's result, after an initial small error or two by the loser, that's the kind of decisive game that really matters. Yes, that raises the number of games you would need in such a database even way more. That's a theory, though again something impractical at present.


H. G. Muller wrote on Tue, Mar 12, 2024 06:23 PM UTC in reply to Kevin Pacey from 05:43 PM:

Well, the values that Kaufman found were B=N=3.25, R=5 and Q=9.75. So also there 2 minor > R+P (6.5 vs 6), minor > 3P (3.25 vs 3), 2R > Q (10 vs 9.75). Only 3 minor = Q. Except of course that this ignores the B-pair bonus; 3 minors is bound to involve at least one Bishop, and if that broke the pair... So in almost all cases 3 minors > Q.
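The trade rules discussed here can be compared across the value sets quoted in this thread. A small sketch: the value sets are the ones given above (beginner, Euwe, Kaufman), the Pawn is taken as 1 in each, the B-pair bonus is deliberately left out, and the helper names are invented for the example.

```python
VALUE_SETS = {
    "beginner": {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9},
    "Euwe": {"P": 1, "N": 3.5, "B": 3.5, "R": 5.5, "Q": 10},
    "Kaufman": {"P": 1, "N": 3.25, "B": 3.25, "R": 5, "Q": 9.75},
}

# Each trade is written as two strings of piece letters, e.g. "NB" vs "RP".
TRADES = [("NB", "RP"), ("N", "PPP"), ("RR", "Q"), ("NNB", "Q")]


def material(pieces, values):
    """Sum of plain piece values for a string of piece letters (no B-pair bonus)."""
    return sum(values[piece] for piece in pieces)


def compare(left, right, values):
    """Return '>', '<' or '=' for the material on the left versus the right."""
    difference = material(left, values) - material(right, values)
    if difference > 0:
        return ">"
    if difference < 0:
        return "<"
    return "="


for name, values in VALUE_SETS.items():
    verdicts = [f"{left} {compare(left, right, values)} {right}" for left, right in TRADES]
    print(f"{name:9s}", ", ".join(verdicts))
```

Because the B-pair bonus is ignored, the Kaufman row shows three minors as exactly equal to the Queen, which is the borderline case the comment goes on to qualify.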

You can also see the onset of the leveling effect in the Q-vs-3 case: it is not only bad in the presence of extra Bishops (making sure the Q is opposed by a pair), but also in the presence of extra Rooks. These Rooks would suffer much more from the presence of three opponent minors than they suffer from the presence of an opponent Queen. (But this of course transcends the simple theory of piece values.) So the conclusion would be that the only case where you have equality is Q vs NNB plus Pawns. This could very well be correct without being in contradiction with the claim that 2 minors are in general stronger.

BTW, in his article Kaufman already is skeptical about the Q value he found, and said that he personally would prefer a value 9.50.

If you don't recognize the B-pair as a separate term, then it is of course no miracle that you find the Bishop on average to be stronger. Because in a large part of the cases it will be part of a pair.


Kevin Pacey wrote on Tue, Mar 12, 2024 07:44 PM UTC in reply to H. G. Muller from 06:23 PM:

Grandmaster Nigel Short once told me, in so many words, that B+(2 connected passed pawns) generally beats R in an endgame. However, more generally, I have trouble believing N+2 pawns is even = to R, at least in endgames where the pawns are not all part of big healthy pawn island(s), which may be the average case in absolute reality.

The equation Q=R+B+P might seldom exactly hold true in a given chess position. As an observation we discussed long ago, sometimes a mixed bag of units that sticks [defensively] together well holds its own (at the least) vs. a Q, especially if she does not have the initiative (if either side does). However, my intuition tells me that Q is preferable to R+B+P in most cases that could ever arise, i.e. on average (maybe even more so than 2 minors outweigh R+P before an endgame on average), since games tend to open up, and that may favour the Q, for one thing (games often eventually opening up is sometimes given as a reason for thinking B>N on average).

So, a feather in Kaufman's cap here for finding the odd-looking value of the Q compared to R+B+P value. The only issue I have is, Q=R+B+P is such a darn useful/appealing rule of thumb for estimating the value of a Q in quick and dirty fashion, even in chess variants - such a fashion can serve players on CVP's GC while more accurate values are waiting to be found for the ever expanding number of variants played here.


H. G. Muller wrote on Wed, Mar 13, 2024 06:51 AM UTC in reply to Kevin Pacey from Tue Mar 12 07:44 PM:

The problem with Pawns is that they are severely area bound, so that not all Pawns are equivalent, and some of these 'sub-types' cooperate better than others. Bishops in principle suffer from this too, but one seldom has those on equal shades. (But still: good Bishop and bad Bishop.) So you cannot speak of THE Pawn value; depending on the Pawn constellation it might vary from 0.5 (doubled Rook Pawn), to 2.5 (7th-rank passer).

Kaufman already remarked that a Bishop appears to be better than a Knight when pitted against a Rook, which means it must have been weaker in some other piece combinations to arrive at an equal overall average. But I think common lore has it that Knights are particularly bad if you have Pawns on both wings, or in general, Pawns that are spread out. By requiring that the extra Pawns are connected passers you would more or less ensure that: there must be other Pawns, because in a pure KRKNPP end-game the Rook has no winning chances at all.

Rules involving a Bishop, like Q=R+B+P are always problematic, because it depends on the presence of the other Bishop to complete the pair. And also here the leveling effect starts to kick in, although to a lesser extent than with Q vs 3 minors. But add two Chancellors and Archbishops, and Q < R+B. (So really Q+C+A < C+A+R+B).


Diceroller is Fire wrote on Wed, Mar 13, 2024 08:49 AM UTC:

Just off topic idea: what if Shogi Pawn will have a value of 1?


Kevin Pacey wrote on Fri, Mar 15, 2024 04:33 PM UTC in reply to H. G. Muller from Tue Mar 12 01:16 PM:

Not to say I don't trust your post I'm replying to, H.G., but as they say, 'trust but verify'...

A not-too-old answer I saw when I Googled 'Does Stockfish use piece values', as found on 'Quora':

'In chess analysis, computer tools like Stockfish, Komodo, and AlphaZero help us know the importance of each chess piece during the game. They use calculations to assign a value to each piece based on factors like mobility, king safety, and board position...'(12 Sep 2023, Tato Shervashidze, Chess Coach...)

If that's true, such computers are actively doing 'calculating' of their piece values (rather than relying on e.g. statistical-studies-generated ones that are generalizations), on a position-by-position basis in a given game that they are playing.

That's also rather than by using piece values calculated before the start of any play whatsoever, say in the sort of way Betza tried to calculate fairy piece values (or my own cruder way(s) of estimating such values, i.e. in quick and dirty fashion).


H. G. Muller wrote on Fri, Mar 15, 2024 06:47 PM UTC in reply to Kevin Pacey from 04:33 PM:

'In chess analysis, computer tools like Stockfish, Komodo, and AlphaZero help us know the importance of each chess piece during the game. They use calculations to assign a value to each piece based on factors like mobility, king safety, and board position...'(12 Sep 2023, Tato Shervashidze, Chess Coach...)

It is not only false, but it sounds like total nonsense to me. For one, AlphaZero is not comparable in any respect to Komodo or Stockfish; everything is different, and naming them in one breath already exposes the one who says this as completely ignorant on the subject of computer chess. (Which of course doesn't exclude he is a good Chess coach or has a high rating.)

In the past few years there has been a revolution in chess programming, after it had been converging to a method thought to be optimal for several decades. Initially programs were scoring positions at the leaves of a look-ahead search tree by a static (= not playing out any moves) heuristic that is now called a Hand-Crafted Evaluation. Piece values were a major part of that, often interpolated between 'opening' and 'end-game' values depending on the strength of the material still on board. The positional terms were Piece-Square Tables (accounting for mild general position dependence of piece values, without taking note of the location of other pieces, such as that Knights are poor at edges, and even poorer in corners), mobility (the actual number of moves a piece has in the current position), King safety (the number of squares around the King attacked by opponent pieces, and the value and number of these pieces), and Pawn structure (passer advance, isolated / backward and doubled Pawns).
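A toy version of such a Hand-Crafted Evaluation might look like the sketch below. Every number in it is a made-up illustration and not taken from any real engine; it covers only the interpolated piece values and a Knight Piece-Square term, and omits mobility, King safety and Pawn structure.

```python
# All values are hypothetical illustrations (in centipawns), not real engine parameters.
OPENING_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 950}
ENDGAME_VALUE = {"P": 120, "N": 300, "B": 320, "R": 530, "Q": 950}
PHASE_WEIGHT = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # both sides with 2N + 2B + 2R + Q: 2 * (2 + 2 + 4 + 4)


def knight_pst(square):
    """Toy Piece-Square bonus for a Knight: bad on an edge, worse in a corner."""
    file, rank = square % 8, square // 8
    on_edge_file = file in (0, 7)
    on_edge_rank = rank in (0, 7)
    return -20 * (on_edge_file + on_edge_rank)


def game_phase(white, black):
    """1.0 with full material on board, 0.0 when only Pawns (and Kings) remain."""
    phase = sum(PHASE_WEIGHT[piece] for piece, _ in white + black)
    return min(phase, MAX_PHASE) / MAX_PHASE


def evaluate(white, black):
    """Static score from White's point of view.

    Each side is a list of (piece_letter, square) pairs with squares 0..63;
    Kings are left out of this sketch.
    """
    phase = game_phase(white, black)

    def side_score(pieces):
        total = 0.0
        for piece, square in pieces:
            # Interpolate between the opening and end-game value of the piece.
            total += phase * OPENING_VALUE[piece] + (1.0 - phase) * ENDGAME_VALUE[piece]
            if piece == "N":
                total += knight_pst(square)
        return total

    return side_score(white) - side_score(black)


# A centralised Knight against a cornered one, with nothing else on the board.
print(evaluate([("N", 27)], [("N", 0)]))
```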

These parameters were never calculated (for orthodox Chess engines), but often were tuned. This was done by taking a large data set (like 500,000) of quiet positions from games with known result, and then tweaking all the bonuses and penalties (including piece values) that were used in the HCE until the calculated evaluation score correlated best with the game result.
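The tuning idea can be illustrated with a heavily simplified sketch. It assumes an evaluation that consists of piece values only, a logistic mapping from evaluation to expected result (the constant `k` is a guess), and a naive one-parameter-at-a-time search; real tuners use far more terms, far more positions and better optimisers. All names are invented for the example.

```python
import math


def expected_result(evaluation, k=0.6):
    """Map an evaluation in Pawns to an expected game result between 0 and 1.

    The logistic shape and the constant k are assumptions made for this sketch.
    """
    return 1.0 / (1.0 + math.exp(-k * evaluation))


def evaluate(material_diff, values):
    """Evaluation from piece values only; material_diff maps piece -> (own - opponent) count."""
    return sum(values[piece] * diff for piece, diff in material_diff.items())


def mean_squared_error(positions, values):
    """Average squared gap between the game results and the predicted results."""
    total = 0.0
    for material_diff, result in positions:
        total += (result - expected_result(evaluate(material_diff, values))) ** 2
    return total / len(positions)


def tune(positions, start_values, step=0.05, max_passes=1000):
    """Nudge each piece value up or down while that lowers the error. The Pawn stays at 1."""
    values = dict(start_values)
    best = mean_squared_error(positions, values)
    for _ in range(max_passes):
        improved = False
        for piece in sorted(values):
            if piece == "P":
                continue  # the Pawn is the unit of measurement
            for delta in (step, -step):
                trial = dict(values)
                trial[piece] += delta
                error = mean_squared_error(positions, trial)
                if error < best:
                    values, best, improved = trial, error, True
                    break
        if not improved:
            break
    return values, best


# Tiny synthetic data set: (material difference, game result for the side to evaluate).
positions = [
    ({"N": 1, "P": -3}, 0.5),
    ({"R": 1, "N": -1, "P": -1}, 0.5),
    ({"N": 1}, 1.0),
    ({"R": 1, "N": -1}, 1.0),
    ({"P": 1}, 0.5),
]
tuned, error = tune(positions, {"P": 1.0, "N": 2.0, "R": 4.0})
print(tuned, error)
```

With only five hand-picked positions the tuned numbers mean nothing in themselves; the point is only the loop of adjusting parameters until the predicted results match the observed ones as closely as possible.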

Then came AlphaZero out of nowhere, with everything completely different. It used a neural network for evaluation of positions as well as for guiding the search. This network simulates a brain with millions of cells, in some 40 layers, with tens of millions of connections between them. And they tuned the strength of those connections by having the thing play chess against itself. No one knows what each connection represents, but the result is that eventually it could very accurately predict the winning probability for a position, apparently paying attention even to subtle strategic considerations.

After that a hybrid form was invented: NNUE (for Efficiently Updatable Neural Network; no idea why they spelled it backwards...). This uses a conventional (unguided by any NN) search to calculate ahead, but at the end of each line evaluates by a NN of a peculiar design. It does not use explicit piece values, but calculates something very similar to Piece-Square Tables (which can be seen as a sort of piece values specified by location of the piece, and can simulate a plain piece value by specifying that same value on every square). Except that it does have such a PST for each location of the King. So the value of a piece can depend not only on its absolute location, but also on how it is positioned relative to the King. (Well, this was invented for Shogi, and there proximity to the King is often more important than the intrinsic strength of the piece type...). And it doesn't have one such 64x64 table for each piece type, but 256 of them. And all these 256 values of each piece (on its current location, for the current King location) are then fed into a NN of 5 layers with 32 cells per layer, to combine them, until finally a single number appears at the output. This NN is then trained by tuning all the 256x64x64x6 values in the KPST, and the strength of the 4000 connections in the NN to reproduce the win probability of a huge data set of quiet positions, as well as it can.
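The sizes mentioned in this description can be tallied in a short sketch. It follows the simplified picture given in the comment rather than the layout of any actual NNUE implementation; reading "5 layers of 32 cells" as four fully connected steps between them is an assumption, and the table indexing shown is hypothetical.

```python
SQUARES = 64            # board squares, used for both the King and the piece location
PIECE_TYPES = 6         # piece types counted in the comment's 256x64x64x6
VALUES_PER_ENTRY = 256  # number of values kept per (King square, piece, square)
LAYERS = 5              # layers of the small network, as described in the comment
CELLS_PER_LAYER = 32


def kpst_parameter_count():
    """Number of tunable values in the King-Piece-Square Tables."""
    return VALUES_PER_ENTRY * SQUARES * SQUARES * PIECE_TYPES


def dense_connection_count():
    """Connections between consecutive layers, assuming each step is fully connected."""
    return (LAYERS - 1) * CELLS_PER_LAYER * CELLS_PER_LAYER


def kpst_row(king_square, piece_type, piece_square):
    """Row holding the 256 values for one piece, given the King square (hypothetical layout)."""
    return (king_square * PIECE_TYPES + piece_type) * SQUARES + piece_square


print("KPST values:", kpst_parameter_count())
print("dense connections:", dense_connection_count())
print("rows in the table:", kpst_row(SQUARES - 1, PIECE_TYPES - 1, SQUARES - 1) + 1)
```

Under these assumptions the table holds a little over six million values, while the network on top has roughly the 4000 connections mentioned above, so nearly all of the tunable parameters sit in the table.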

This works, but after this no one knows what exactly the NN does. None of the values in the KPST in the optimally trained NN have the slightest resemblance to piece values as we know them. We cannot identify a King-Safety part, or a Pawn-Structure part, or a mobility part. It is just one totally integrated complete mess of totally meaningless multiplier parameters, that magically manage to conspire to give a very accurate prediction for who has the better winning chances in a given position. Stockfish and other strong engines now all use NNUE evaluation (because they typically gain ~80 Elo compared to their original HCE), and the main development towards higher Elo comes from finding better sets for training it, or playing a little bit with the size of the NN. (A large NN can predict more accurately, but slows down the engine, so that it cannot look as far ahead.)


Kevin Pacey wrote on Fri, Mar 15, 2024 08:01 PM UTC in reply to H. G. Muller from 06:47 PM:

Well, I knew the fellow might easily be wrong about Komodo. However, previously I had seen Stockfish used a neural network (at least to some extent) starting 2020 - unless that's more false stuff on the internet too (maybe a crusade for truth online could extend beyond chess variants, back to chess itself!?):

https://en.wikipedia.org/wiki/Stockfish_(chess)#:~:text=Starting%20with%20Stockfish%2012%20(2020,leaving%20just%20the%20neural%20network.

edit: it is a similar story as of 2020 for Komodo, apparently:

https://en.wikipedia.org/wiki/Komodo_(chess)#:~:text=On%20November%209%2C%202020%2C%20Komodo,networks%20in%20its%20evaluation%20function.


H. G. Muller wrote on Fri, Mar 15, 2024 08:52 PM UTC in reply to Kevin Pacey from 08:01 PM:

I don't keep close tabs on the development of Stockfish. But there are always many forks around, and sooner or later the best of each will be adopted into the official main branch. 2020 as the start of the NNUE mania sounds about right. And there might be hybrid versions around, which still relied in part on hand-crafted terms, added to the NN output to get the total score. I would expect this to have some advantages for terms like Pawn structure; it will be hard for a NN to extract Pawn-Structure info from King-Piece-Square tables. But it seems the latest Stockfish relies entirely on the NN.

It would be funny to test it on positions that it has certainly not seen in its training set, like 3Q vs 7N. It might be at a total loss for what to do. (Not that the HCE did such a good job on that...)

 


13 comments displayed
