Comments/Ratings for a Single Item
Well, I am not sure how it works, but I suggest revising my rating. I have some doubts; perhaps it has been somewhat underestimated.
My record in the last 365 days is 11 games won, 8 games drawn, 7 games lost, and one game undecided. I suspect there is something wrong in the method used, but if that is not the case, I suggest adopting another method more indicative of real performance. I also suggest a minimum of 1000 and a maximum of 3000.
In my opinion, not all games should be rated. At BrainKing, for example, you can play rated or unrated games; there is a checkbox on the game invite to indicate whether it will be rated or not. Yahoo has this feature as well. Players also have individual ratings for each game type played: someone could be 1920 at Chess, 1700 at Shogi, and 1800 at Xiangqi, for example. In my case, I play some games, like tournament games, seriously... but others I play strictly for fun, taking only seconds to move after seeing the position... If all games were to be rated, I'd have to give up the fun 'coffee-house style' games.
I just made a modification that reversed the order of the ratings. This is a bug I'm now working on.
The bug is now gone. I just had to change -= to +=.
Fergus has provided a volatile ratings system that will definitely add some excitement to our games here. Perhaps an 'Adjusted GCR' ranging between 500 and 2500 would be a little easier on the players' nerves, so to speak. This can be calculated by [A] doubling the GCR, [B] adding 1500, and [C] dividing the total by 3.
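For what it's worth, here is a minimal PHP sketch of that adjustment (the function name is made up for illustration); it simply maps the 0 to 3000 GCR range onto 500 to 2500:

// Hypothetical helper illustrating the proposed 'Adjusted GCR':
// double the GCR, add 1500, divide by 3, mapping 0..3000 onto 500..2500.
function adjusted_gcr($gcr) {
    return ((2 * $gcr) + 1500) / 3;
}
echo adjusted_gcr(0), "\n";    // 500
echo adjusted_gcr(1500), "\n"; // 1500
echo adjusted_gcr(3000), "\n"; // 2500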
Just saw Gary's Comment. I usually play no more than five casual games at a time, concentrating on them as if they were tournament games. Hence my (misleading) high rating.
Very nice. Interesting. It may also help to pair players, finding fair partners and also more challenging ones. Of course, the wide variety of variants makes this rating less 'firm' than chess ratings. I think that Gary's suggestion to allow a game to be rated or not would be a nice addition. This could be a choice made at the start of the game, along with time controls, etc. This would also allow some players to develop rated 'niches', such as Shogi or Shatranj variants, while playing other games 'just for fun.' Some may just want to be rated generalists. Others may not want to be rated at all.
I have changed how the portion function works since last night. I plan to keep the range between 0 and 3000, since that is the range used for ratings by both FIDE and USCF, and I want to mirror that. The problem right now is that the method I'm using gives high and low scores too quickly. I probably need a different function for what I'm calling a plussum, so-called because I'm mainly doing this all from scratch without established terminology. The function for this is (games * 1500) / (games + 3). With a single game it starts at the small value of 375, but it rapidly increases toward 1500. At only ten games, the value is already approximately 1154. If this function were slower, we would see less disparity between the ratings. Maybe something logarithmic would work.
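As a stand-alone illustration of the numbers above (this is not the live site code, just the stated formula restated in PHP):

// The plussum function described above starts low and climbs quickly toward 1500.
function plussum($games) {
    return ($games * 1500) / ($games + 3);
}
echo plussum(1), "\n";   // 375
echo plussum(10), "\n";  // approximately 1154
echo plussum(100), "\n"; // approximately 1456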
I have now made the results much more conservative by giving an average rating of 1500 to each pair of players who have never played each other. I have scrapped the idea of using logarithms with the plussum function.
This is David 'misleading high rating' Paulowich again. Trying to kill two birds with one stone, I offer the Weighted Game Courier Rating (WGCR), equal to the average of three numbers: 1500, the player's GCR for all games, and the player's GCR computed for games played in the last 365 days. While leaving the original (highly volatile) GCR intact for internal use, the WGCR can be used on the web page to provide a rating in which more recent games 'count double'. My Canadian rating has bounced around 1800 in recent years, so I certainly do not feel constrained by ratings restricted to a 500 to 2500 range.
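A one-line PHP sketch of the proposed WGCR, assuming the two component ratings have already been computed (the function name is hypothetical):

// Average of 1500, the all-time GCR, and the GCR over the last 365 days,
// so that recent games effectively count double relative to older ones.
function wgcr($gcr_all_time, $gcr_last_365) {
    return (1500 + $gcr_all_time + $gcr_last_365) / 3;
}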
Would it be difficult to implement the Elo rating system here?
I may eventually use Elo or make it more like Elo, but I have to do more studying of Elo and other methods.
Fergus, please revise the 'draw effect' in your algorithm; I suspect there is a bug. Generally speaking, the method used is not good: it is important that the opponent's rating be taken into account when changing ratings after a game, as is usual in Chess and other games. Elo is one of the best measures, and it has proven to be really indicative, as experience has confirmed. You can see how, in some international tournaments, players from different countries who have never played the other participants before, nor shared the same pool of usual opponents, perform more or less at the strength indicated by their Elo ratings. A good way to do it in our case would be to assign, initially, 'preliminary Elos' according to results in points over number of games, using any reasonable function. After that, apply an Elo-style method over the history of the last 365 days, weighted by a factor, say 0.5, if the game is not a tournament game. In this way we can obtain a more reasonable measure. My chess Elo has oscillated, over my past 20 years of play, between 1800 and 1950. I don't expect this to necessarily be my rating in different chess variants, but enormous deviations from these values do not seem reasonable, and if you observe my results against some players ranked higher by your method, you may suspect that something is wrong. For these reasons I have reasonable doubts about the method used and its fitness for the purpose.
The idea of retroactive ratings is a displeasing 'after-the-fact' concept to me. The reason has to do with carefree coffee-house style games versus tournament rated games. When I played chess at the Edgewater Park Invitational, for example, I lost only one game: I drew against a 2000 USCF rated player and lost only to a 2300 USCF rated player. The 6 games involved 10 hours of playing. Now, I also play at a coffee shop: wild, fun games with just 15 minutes on the clock. These are not rated, and we can try bold ideas that we would likely avoid in a rated game. By comparison, it is as if someone came up and said, 'Hey, we've just rated your coffee-house games and fudged them into your USCF rating.' In regard to the 1500+ rating I saw by my name here, by comparison: my best USCF rating was 1943, my rating in MSN real-time play broke above 2000, my rating against chess computers reached 2110, and my current Gothic Chess rating is about 1975. In regard to Chess Variants, I would like to see the following system, which seems fair: all players start at 1600 (or 1500 if preferred). Rated games do not begin until new games are started, say to go into effect Jan. 15th (or some other date). In-progress games, which may have been for fun or simply experimental, do not count. As an example: when I played a fun game of Maxima with Roberto, he gave me lessons during the game. Very helpful lessons. Would that have happened in a rated game?
Gary is right: many of the games people have played have been coffee games, test games, or fun games in which the result was not important, but they can be used for a 'first' number, as follows. This is a pseudo-Elo idea, and it could work well for us. At first we need a 'preliminary' measure, but it will be modified after the first calculation and adjusted over time, once everyone is aware of how it works. The first number, for everyone, is A = 1000 + (Points / Number of games) * 1500. Points are calculated as usual: 1 for a win, 0.5 for a draw. Number of games refers to logs in the last 365 days. After that, we can run an algorithm, sequenced by time, in which each game modifies the ratings as follows. If player A has a higher rating than B and he wins, his rating is increased by the change rate K * (2500 - Rating of player A) / 2500, where K is a factor, usually 1, which can be raised in tournaments to, say, 2 or 3. This change rate is also subtracted from player B's rating. If player B wins, his rating is increased by the change rate K * (Rating of player A - Rating of player B) / C, where C is a number indicating how fast we want to reflect a 'force change' in a player; I suggest C = 100, and K is the same tournament factor. This change rate is subtracted from player A's rating. In case of a draw, the last rule applies, but with the change rate divided by two. An unrated player is considered, at first, to have a rating of 1000, unless he gives some evidence of another rating, which can be used, translated to our scale. Try some examples, and you will see that this is a very reasonable measure.
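A minimal PHP sketch of the update rule proposed above, assuming games are processed in chronological order (all names and defaults here are illustrative, not site code):

// Preliminary rating from the last 365 days of logs: 1000 + (points / games) * 1500
function preliminary_rating($points, $games) {
    return ($games > 0) ? 1000 + ($points / $games) * 1500 : 1000;
}
// Per-game update. $k is the tournament factor (normally 1); $c controls how fast
// an upset shifts the ratings (100 suggested above). $result is 'high' if the
// higher-rated player won, 'low' for an upset win, and 'draw' otherwise.
function pseudo_elo_update(&$ratingHigh, &$ratingLow, $result, $k = 1, $c = 100) {
    if ($result == 'high') {
        // Higher-rated player wins: the gain shrinks as the winner approaches 2500
        $delta = $k * (2500 - $ratingHigh) / 2500;
        $ratingHigh += $delta;
        $ratingLow  -= $delta;
    } else {
        // Upset win by the lower-rated player, or a draw (same rule, halved)
        $delta = $k * ($ratingHigh - $ratingLow) / $c;
        if ($result == 'draw')
            $delta /= 2;
        $ratingLow  += $delta;
        $ratingHigh -= $delta;
    }
}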
Since all the previous comments were written, I have completely changed how ratings are calculated. Details are given on the page. It bears more similarity to Elo, but it is still very different from Elo. Unlike Elo, which is designed for retaining information across tournaments without keeping extensive records of all games played, this method is designed with the assumption that all the records are available. This is the basis for the reliability measure used to allot a portion of the performance rating between two players. This method is also designed to avoid some of the problems of Elo, such as good players losing points by playing inferior opponents even without losing any games.
I expect to eventually add a rating option, then rate only games intended to be rated. But that will have to wait until I've settled on a method. For now I need the data to test the ratings methods I try out. I think I have a good one now, but I'll give it some time for consideration. Comments on it are welcome.
The following website discusses rating calculations. It may be of use in the CV Ratings project. The site includes example calculations. http://www.chess-express.com/explain.htm
Please write out the whole algorithm with all the details, and I'll try to analyze it more deeply. As stated, it does not look very clear, and some aspects may be subject to further discussion.
Here is what the code looks like after the files are read and the $wins table is made. The $wins table is a two-dimensional array of pairwise wins, for which $wins[$p1][$p2] is the number of times $p1 has beaten $p2. Draws count as half wins for both players.
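Before the excerpt itself, here is a hypothetical sketch (not part of the actual script) of how a $wins table of this kind could be filled, just to make the draws-as-half-wins convention concrete; the helper name is made up:

// Hypothetical helper for filling the $wins table from game results:
// a win adds 1 to $wins[$winner][$loser]; a draw adds 0.5 in both directions.
function record_result(&$wins, $p1, $p2, $result) {
    if (!isset($wins[$p1][$p2])) $wins[$p1][$p2] = 0;
    if (!isset($wins[$p2][$p1])) $wins[$p2][$p1] = 0;
    if ($result == 'p1')
        $wins[$p1][$p2] += 1;
    elseif ($result == 'p2')
        $wins[$p2][$p1] += 1;
    else {
        $wins[$p1][$p2] += 0.5;
        $wins[$p2][$p1] += 0.5;
    }
}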
// Calculate various values needed for rating calculations
$players = array_keys($wins);
// Initialize all ratings to 1500
reset ($players);
foreach ($players as $pl) {
    $gcr[$pl] = $gcr1[$pl] = $gcr2[$pl] = $gcr3[$pl] = $gcr4[$pl] = 1500;
}
// Count opponents and games of each player
reset ($players);
$pc = count($players);
for ($i = 0; $i < $pc; $i++) {
    $p1 = $players[$i];
    $opponents[$p1] = 0;
    $gameswon[$p1] = 0;
    $gameslost[$p1] = 0;
    for ($j = 0; $j < $pc; $j++) {
        $p2 = $players[$j];
        if (($p1 != $p2) && ($wins[$p1][$p2] || $wins[$p2][$p1])) {
            $opponents[$p1]++;
            $gameswon[$p1] += $wins[$p1][$p2];
            $gameslost[$p1] += $wins[$p2][$p1];
        }
    }
    $gamesplayed[$p1] = $gameswon[$p1] + $gameslost[$p1];
    $percentwon[$p1] = ($gamesplayed[$p1] > 0) ? (($gameswon[$p1] * 100) / $gamesplayed[$p1]) : 0;
}
// Sort players by number of opponents, then games played, then games won
function psort ($p1, $p2) {
    global $opponents, $gamesplayed, $gameswon;
    if ($opponents[$p1] > $opponents[$p2])
        return -1;
    elseif ($opponents[$p1] < $opponents[$p2])
        return 1;
    elseif ($gamesplayed[$p1] > $gamesplayed[$p2])
        return -1;
    elseif ($gamesplayed[$p1] < $gamesplayed[$p2])
        return 1;
    elseif ($gameswon[$p1] > $gameswon[$p2])
        return -1;
    elseif ($gameswon[$p1] < $gameswon[$p2])
        return 1;
    else
        return 0;
}
usort ($players, 'psort'); // usort reindexes $players so the positional access below follows this order
// For each pair, calculate new ratings based on previous ratings.
// Use zig-zagging order to optimize interdependency between ratings.
for ($i = 1; $i < $pc; $i++) {
    for ($j = $i; $j < $pc; $j++) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr1[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr1[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr1[$p1] - $gcr1[$p2]);
        $lowpoint = min($gcr1[$p1], $gcr1[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr1[$p1] = (($stability1 * $gcr1[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr1[$p2] = (($stability2 * $gcr1[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in reverse zig-zagging order.
for ($i = $pc-1; $i > 0; $i--) {
    for ($j = $pc-1; $j > $i; $j--) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr2[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr2[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr2[$p1] - $gcr2[$p2]);
        $lowpoint = min($gcr2[$p1], $gcr2[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr2[$p1] = (($stability1 * $gcr2[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr2[$p2] = (($stability2 * $gcr2[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in half reverse zig-zagging order.
for ($i = 1; $i < $pc; $i++) {
    for ($j = $pc-1; $j > $i; $j--) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr3[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr3[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr3[$p1] - $gcr3[$p2]);
        $lowpoint = min($gcr3[$p1], $gcr3[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr3[$p1] = (($stability1 * $gcr3[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr3[$p2] = (($stability2 * $gcr3[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in reverse half reverse zig-zagging order.
for ($i = $pc-1; $i > 0; $i--) {
    for ($j = $i; $j < $pc; $j++) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr4[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr4[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr4[$p1] - $gcr4[$p2]);
        $lowpoint = min($gcr4[$p1], $gcr4[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr4[$p1] = (($stability1 * $gcr4[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr4[$p2] = (($stability2 * $gcr4[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Average all four sets of ratings.
// This helps minimize the effects of using any particular order, making ratings more homogeneous.
for ($i = 0; $i < $pc; $i++) {
    $p1 = $players[$i];
    $gcr[$p1] = ($gcr1[$p1] + $gcr2[$p1] + $gcr3[$p1] + $gcr4[$p1]) / 4;
}
// Sort ratings from highest to lowest
arsort ($gcr);
reset ($gcr);
// Print table of ratings. The HTML output was garbled in this excerpt; the loop below
// reconstructs the Name / Userid / GCR / Percent won table it printed. $realnames
// (a userid => display name map) is assumed and not shown in the excerpt.
echo '<table>';
echo '<tr><th>Name</th><th>Userid</th><th>GCR</th><th>Percent won</th></tr>';
foreach ($gcr as $pl => $rating) {
    printf('<tr><td>%s</td><td>%s</td><td>%d</td><td>%.2f</td></tr>',
           $realnames[$pl], $pl, $rating, $percentwon[$pl]);
}
echo '</table>';
The intention is good, but the main problem is that you are gathering apples, pears, mangoes, and kiwis together, and the method does not make sense unless some details are carefully considered. What are we trying to rate? If we are trying to measure multi-variant skill, the method does not work. For example, a player and his 6-year-old son may decide to play Amazons, and to play only against each other. Suppose the father is much stronger than his son, but nothing special as a player. Luck cannot help much in this game, and after a couple of months the best-rated player in Game Courier could be the father, with 120 victories and 0 losses. The highest rating on our site could then belong to a player who plays only one game, which is not even a chess variant, who is himself an average player, and who plays only against a child. We need to take the multi-variant nature of the site into account, and I can give similar examples to argue that not all games should be rated, only 'officially rated' games: tournament games, for example, or games agreed in advance to be rated. Either the multi-variant purpose must be considered, or the method should be applied to each variant independently; mixing everything together this way does not make sense.
well i know one thing we are rating here, and that is 'unrated games'. all i know is, i played 5 unrated games, so can someone tell me how i got a rating from that. it is just the principle of the thing.
also i am suspicious of the rating system, it might somewhat suck. i played 2 games against a 1600 rated player, with 1 win and 1 loss, 2 games against a 1570 player, again with 1 win and 1 loss, and finally a draw with a 1516 player, and my rating is 1462. it would have been the draw, taking my win percentage from 50 to 40, that dropped my rating, but anyway, this is beside the point; they were all unrated games, that is the point. so can you take my name off the rating list, and we can never speak of this again lol :)
(please, bring the option for 'unrated games' back as soon as possible) btw, i agree, the intention was good, top points for that, i just don't know why you didn't talk to the players/members beforehand, which would have been nice, or am i wrong, did i somehow miss this conversation?
Roberto, please carefully read the first paragraph on this page and the warning above the 'Game Courier Ratings for *' table. You will see that we are in agreement on the point you raised, but it is not relevant to an evaluation of the method used to rate players here. Christine, I will refer you to an earlier comment of mine dated 2006-01-08. We are in agreement that game ratings should be reserved for games intended to be rated. But, for the time being, I am using what data I have, which are the logs of previously unrated games, to test the method I am designing to rate players. What I wish to focus on right now is whether this method is an accurate enough measure of relative playing strengths between players. In this respect, it is looking good to me, but there might be ways to tweak it.
Christine, although it would seem unfair that you are rated lower than any of the players you have defeated, further examination of the evidence reveals that it is fair. Your three opponents have all played several more games than you have, and they all have much stronger track records against their opponents than you have against yours. As long as you have played few games, this method works with the assumption that your performance was probably due to luck. As you play more games, as your opponents have, it begins to place more trust in your performances against other players. Like Elo, the method used here is self-correcting. As you play more and more games, it is better able to come up with an accurate estimate of your relative playing ability. Since you have played only a few games, it doesn't yet have enough information to make a good estimate.
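For illustration, the reliability formula in the code above, (100 * $n) / ($n + 3), places only limited trust in a head-to-head record built on few games:

// Reliability grows slowly with the number of games n between two players.
function reliability($n) {
    return (100 * $n) / ($n + 3);
}
echo reliability(2), "\n";  // 40
echo reliability(5), "\n";  // 62.5
echo reliability(20), "\n"; // approximately 87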
It may be possible to add PROV after a player's rating. This means it is a 'Provisional' rating which only becomes official after 14 games have been played. The USCF uses that system. But perhaps it is not worth the effort. Anyway, it is an idea.