Comments/Ratings for a Single Item
I made a couple of significant changes to the code I posted here earlier. One is a bug fix. The uasort on $players should be a usort. A uasort leaves array keys intact, but my subsequent use of the array assumed that the array keys had been renumbered.

The other change is to the generation of the second two sets of ratings. Between the second and third sets, I changed the order of the $players array, essentially twisting it inside out in a spiral, so that the order is very different. I then calculated the last two sets of ratings in the otherwise same zig-zagging and reverse zig-zagging order as the first two sets. Here's the code I used to change the order:

```php
$neworder = array();
$midpoint = (int)floor($pc/2);
for ($i = 0; $i < $pc; $i++) {
    // Start at the midpoint and alternate between the elements just
    // below and just above it, working outward in a spiral.
    $neworder[$i] = ($i & 1)
        ? $players[$midpoint - (int)ceil($i/2)]
        : $players[$midpoint + $i/2];
}
$players = $neworder;
```

I may change this in the future. I am thinking of evaluating all pairs in a single order, based on which pairs have played the most games together. This would start with the pairs that are most likely to give reliable ratings and move on to pairs less likely to give reliable ratings. This would help make the ratings of the latter more reliable when it finally got to them, and it is probably the best order overall for reliable ratings.
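If I do go that route, the pair ordering might be built up roughly as in the sketch below. This is only an illustration of the idea, not code from the script; $games_together, the player-pair keys, and the echo at the end are all stand-ins.

```php
// Hypothetical data: how many games each pair of players has played together.
$games_together = array(
    'Alice|Bob'   => 12,
    'Bob|Carol'   => 7,
    'Alice|Carol' => 3,
);

// Sort the pairs so that those with the most games together come first,
// since they are the most likely to give reliable ratings.
function cmp_by_games($a, $b) {
    return $b['games'] - $a['games'];
}

$pairs = array();
foreach ($games_together as $key => $count) {
    $pairs[] = array('pair' => $key, 'games' => $count);
}
usort($pairs, 'cmp_by_games');

foreach ($pairs as $p) {
    // In the real script, this is where each pair's comparison would run.
    echo $p['pair'] . "\n";
}
```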
Roberto,
You were right about it not accounting for draws. Although it was supposed to, it was checking for the wrong string. So it was missing all the draws. This has now been fixed.
Roberto writes:
One point of serious discussion is whether a rated player must lose rating points when defeated by a lower rated player, and if the answer is accepted to be YES, how large must the loss be?
I'm not sure this question makes sense within the context of how GCR works. When GCR compares two opponents, it uses the total of their scores in a single comparison, and it does not compare them on a game-by-game basis. But let's suppose two players play only one game together, and the higher rated player loses to the lower rated player. In that case, the higher rated player will lose points. The loss of points is determined by the difference between their ratings, by the stability of the higher rated player's rating, and by the reliability of the score, which for a single game is at its lowest value of 25.
I think only recognized variants, or variants which have made it into one or two Game Courier Tournaments, should be considered for an overall rating anyway. (A game may need some fixing in its rules or in how they are written. There have been recent ambiguities about Rococo and Switching Chess. More annoyingly, my own ill-considered Pocket Polypiece Chess setup gave me the opening advantage of one Pawn against George Duke.)
I'm not going to restrict which Chess variants can be rated. I will let players decide which games they want rated, and I will allow this script to work with any game. On the matter of multivariant ratings, which is what I think Antoine may mean by overall ratings, this script comes with various filters. None will distinguish recognized variants from others, but they do offer the option of looking at ratings for games played in specific tournaments. If you want to look at games played in any tournament, change the Tournament Filter to ?*. This will filter out null strings.
In regard to the 'bell-curved scale of probabilities,' I think we should be seeing the bell curve as a distribution of the number of players (y-axis) with respect to playing strength (x-axis), which is what gives us the bell shape. But perhaps we are referring to two different curves here. In regard to probability, I read that a 200 point rating difference implies that the higher rated player should win 3 out of 4 games between the two. I can look up the source later. I again mention the following website, as it offers a relatively simple method of rating calculation. I still believe that it may be of great value in the CV Ratings project. The site includes example calculations. http://www.chess-express.com/explain.htm
I'm sure we are referring to two different bell curves. The Elo method specifically tries to measure probabilities, and the probabilities fall along a bell curve. The following page lists the probabilities associated with various differences of Elo points: http://www.ascotti.org/programming/chess/elo.htm The two axes are (1) the difference between two Elo ratings and (2) the probability that one player will defeat the other in a game. In contrast, the GCR method correlates (1) the difference between GCR ratings with (2) an expected total score on the games played between two players. This relation is linear.
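For reference, the standard Elo expected-score formula that those tables are based on is easy to compute directly. The snippet below is only an illustration, not part of the GCR script, and it also checks the '3 out of 4 games' figure mentioned above for a 200 point difference:

```php
// The standard Elo expected-score formula: the expected score of the
// player rated $ra against the player rated $rb.
function elo_expected_score($ra, $rb) {
    return 1 / (1 + pow(10, ($rb - $ra) / 400));
}

// A 200 point advantage gives an expected score of about 0.76,
// roughly the "3 out of 4 games" figure.
echo elo_expected_score(1700, 1500); // about 0.7597
```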
The CxR concept is as follows, and to use it for CVs seems easy:

New Rating = Your Old Rating + (Score x 21) + (Pre-Game Rating of Opponent - Your Pre-Game Rating) / 25

where Score is +1 for a WIN, 0 for a DRAW, and -1 for a LOSS.

So, if I am 1800 and my opponent is 1900 and I win, my new rating would be:

1800 + (1 x 21) + (1900 - 1800)/25 = 1800 + 21 + 4 = 1825

For the 1900 guy I played, we'd see:

His new rating = 1900 + (-1 x 21) + (1800 - 1900)/25 = 1900 - 21 - 4 = 1875

The website I mentioned has different examples and includes unrated player calculations. But even if we apply the CxR to the initial ratings you (Fergus) have calculated, this system will polish the values over time and we will have numbers close to those seen in the USCF. The winning probability is actually irrelevant when using this system.
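Here is the same update rule as a small piece of code, assuming the division by 25 applies only to the rating difference, as in the examples above. The function name is just mine for illustration; it is not from the CxR site.

```php
// The CxR update rule described above.
// $score is +1 for a win, 0 for a draw, and -1 for a loss.
function cxr_new_rating($old_rating, $opponent_rating, $score) {
    return $old_rating + ($score * 21) + ($opponent_rating - $old_rating) / 25;
}

echo cxr_new_rating(1800, 1900, 1);  // 1825
echo "\n";
echo cxr_new_rating(1900, 1800, -1); // 1875
```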
I'm returning to a matter Roberto raised earlier. Suppose a 3000 rated player plays one game against a 1500 rated player and loses. As the system works right now, the 3000 rated player would lose 300 points. In contrast, if two 1500 rated players played one game together, the loser would lose 100 points. Therefore, I think the formulas for reliability and stability need to be changed. One idea is this:

reliability = pow((number of games played together), 2)
stability = pow((abs(old_rating - 1500)/100 + 1), 2)

With these formulas, the 3000 rated player would lose only about six points for losing a single game to a 1500 rated player. Furthermore, for a single game played between two 1500 rated players, both scores would be equal, as they are now, and this would have the same effect as at present: one player would end up with 1600 and the other with 1400.
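Plugging numbers into those two proposed formulas gives the following. This only evaluates the formulas as stated; it says nothing about how the resulting values would then be applied to a rating change.

```php
// The proposed formulas from above.
function gcr_reliability($games_together) {
    return pow($games_together, 2);
}

function gcr_stability($old_rating) {
    return pow(abs($old_rating - 1500) / 100 + 1, 2);
}

echo gcr_reliability(1);   // 1 for a single game played together
echo "\n";
echo gcr_stability(1500);  // 1: a 1500 rating would be the least stable
echo "\n";
echo gcr_stability(3000);  // 256: a 3000 rating would be very stable
```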
Michael,

The purpose of a rating system is to measure relative differences in playing strength. I can't emphasize the word relative enough. The best way to measure relative playing strength is a holistic method that regularly takes into account all games in its database. One consequence of this is that ratings may change even when someone stops playing games. This makes the method more accurate. The Elo and CxR methods have not been holistic, because a holistic method is not feasible on the scale those systems are designed for. They have to settle for strictly sequential changes. Because GCR works in a closed environment with complete access to game logs, it does not have to settle for strictly sequential changes. It has the luxury of making global assessments of relative playing strength on the basis of how everyone is doing.

A separate issue you raised is that of a 3000 rated player losing fewer points than a 1500 rated player. Since last night, I have rethought how to use and calculate stability. Instead of basing stability on a player's rating, I can keep track of how many games have so far factored into the estimate of each player's rating. One thought is to just count the games whose results have so far directly factored into a player's rating. Another thought is to also keep track of each opponent's stability, keep a running total of this, and divide it by the number of opponents a player has so far been compared with. I'm thinking of adding these two figures together, or maybe averaging them, to recalculate the stability score of each player after each comparison. Thus, stability would be a measure of how reliable an indicator a player's past games have been of his present rating.

That covers my new thoughts on recalculating stability. As for using it, I am thinking of using both players' stability scores to weigh how much ratings may change in each direction. I am still trying to work out the details on this. The main change is that both stability scores would affect the change in rating of both players being compared. In contrast, the present method factors in only a player's own stability score in recalculating his rating. One consequence of this is that if a mid-range rated player defeats a high-rated player, and the mid-range player has so far accumulated the higher stability score, the change will be more towards his rating than towards the high-rated player's rating. The overall effect will be to adjust ratings toward the better established ratings, making all ratings in general more accurate.
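Roughly, the bookkeeping I have in mind might look something like the sketch below. Every name here is a placeholder, and the details could easily change.

```php
// Placeholder sketch of the stability bookkeeping described above.
function new_player_record() {
    return array(
        'games_counted' => 0,  // games that have factored into this player's rating
        'stability_sum' => 0,  // running total of opponents' stability scores
        'opponents'     => 0,  // number of opponents compared against so far
        'stability'     => 0,  // current stability score
    );
}

function update_stability(&$player, $opponent_stability, $games_in_comparison) {
    $player['games_counted'] += $games_in_comparison;
    $player['stability_sum'] += $opponent_stability;
    $player['opponents']++;
    $avg_opponent_stability = $player['stability_sum'] / $player['opponents'];
    // Averaging the two figures is one option mentioned above;
    // adding them together would be the other.
    $player['stability'] = ($player['games_counted'] + $avg_opponent_stability) / 2;
}

// Example: after one comparison covering 3 games against an opponent whose
// stability is 4, this player's stability becomes (3 + 4) / 2 = 3.5.
$player = new_player_record();
update_stability($player, 4, 3);
echo $player['stability']; // 3.5
```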
Roberto,
As you know, there is a language barrier between us. It sometimes gets in the way of understanding you, as it has with your misuse of the word notorious. I am simply asking for clarification on what you are trying to say.
Anyway, I am in agreement with the last points you have raised. A single game should not have a great effect on a player's score, and the formulas need more tweaking, but it's not an easy task.
Well, if you don't play games, Michael, your rating will drop. :) Looks like mine will be dropping too, he he. (I'm kinda a little shocked by that.) Not that I really care, but (I must be bored) doesn't that mean that if you have two players with a 'true' rating (from many rated games) of 1500, and one of them is inactive for a bit so his rating drops, then when these players play, it will be a game between a higher rated and a lower rated player, when in reality it should be a game between equals? Wouldn't that distort the ratings after the outcome?

Another thing: a fair amount of the games played are more in the spirit of TESTING OUT A VARIANT than anything else. I agree with those who said that only 'tournament games' should be rated, unless people agree otherwise beforehand.

As far as 1500 vs 3000 goes, the 1500 player rising 750 points for a win is surely too much. I agree that the 3000 player should not drop 'heavily'.

Finally (yawn), are we going to see people less likely to put up a challenge for fear of someone much lower rated accepting? Will this lead to 'behind the scenes' arranging of games? If a vote were taken, would more people want ratings than not?

Sorry for the length, just adding food for thought.