Comments/Ratings for a Single Item
Well, I am not sure how it works, but I suggest revising my rating. I have some doubts; perhaps it has been somewhat underestimated.
My record in the last 365 days is 11 games won, 8 games drawn, 7 games lost, and one game undecided. I suspect there is something wrong in the method used, but if that is not the case, I suggest adopting another method more indicative of real performance. I also suggest a minimum of 1000 and a maximum of 3000.
In my opinion, not all games should be rated. At BrainKing, for example, you can play rated or unrated games; there is a checkbox on the game invite to indicate whether it will be rated or not. Yahoo has this feature as well. Players also have individual ratings for each game type played: someone could be 1920 at Chess, 1700 at Shogi, and 1800 at Xiangqi, for example. In my case, I play some games, like tournament games, seriously... but others I play strictly for fun, taking only seconds to move after seeing the position... If all games were to be rated, I'd have to give up the fun 'coffee-house style' games.
I just made a modification that reversed the order of the ratings. This is a bug I'm now working on.
The bug is now gone. I just had to change -= to +=.
Fergus has provided a volatile ratings system that will definitely add some excitement to our games here. Perhaps an 'Adjusted GCR' ranging between 500 and 2500 would be a little easier on the players' nerves, so to speak. This can be calculated by [A] doubling the GCR, [B] adding 1500, and [C] dividing the total by 3.
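For what it's worth, here is a minimal PHP sketch of that adjustment (the function name is made up for illustration); it simply maps the 0 to 3000 GCR range onto 500 to 2500:

// Hypothetical helper illustrating the proposed 'Adjusted GCR':
// double the GCR, add 1500, divide by 3, mapping 0..3000 onto 500..2500.
function adjusted_gcr($gcr) {
    return ((2 * $gcr) + 1500) / 3;
}
echo adjusted_gcr(0), "\n";    // 500
echo adjusted_gcr(1500), "\n"; // 1500
echo adjusted_gcr(3000), "\n"; // 2500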
Just saw Gary's Comment. I usually play no more than five casual games at a time, concentrating on them as if they were tournament games. Hence my (misleading) high rating.
Very nice. Interesting. It may also help to pair players, finding fair partners and also more challenging ones. Of course, the wide variety of variants makes this rating less 'firm' than chess ratings. I think that Gary's suggestion to allow a game to be rated or not would be a nice addition. This could be a choice made at the start of the game, along with time controls, etc. This would also allow some players to develop rated 'niches', such as Shogi or Shatranj variants, while playing other games 'just for fun.' Some may just want to be rated generalists. Others may not want to be rated at all.
I have changed how the portion function works since last night. I plan to keep the range between 0 and 3000, since that is the range used for ratings by both FIDE and USCF, and I want to mirror that. The problem right now is that the method I'm using gives high and low scores too quickly. I probably need a different function for what I'm calling a plussum, so-called because I'm mainly doing this all from scratch without established terminology. The function for this is (games * 1500) / (games + 3). With a single game it starts at the small value of 375, but it rapidly increases toward 1500. At only ten games, the value is already approximately 1154. If this function were slower, we would see less disparity between the ratings. Maybe something logarithmic would work.
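As a stand-alone illustration of the numbers above (this is not the live site code, just the stated formula restated in PHP):

// The plussum function described above starts low and climbs quickly toward 1500.
function plussum($games) {
    return ($games * 1500) / ($games + 3);
}
echo plussum(1), "\n";   // 375
echo plussum(10), "\n";  // approximately 1154
echo plussum(100), "\n"; // approximately 1456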
I have now made the results much more conservative by giving an average rating of 1500 to each pair of players who have never played each other. I have scrapped the idea of using logarithms with the plussum function.
This is David 'misleading high rating' Paulowich again. Trying to kill two birds with one stone, I offer the Weighted Game Courier Rating (WGCR), equal to the average of three numbers: 1500, the player's GCR for all games, and the player's GCR computed for games played in the last 365 days. While leaving the original (highly volatile) GCR intact for internal use, the WGCR can be used on the web page to provide a rating in which more recent games 'count double'. My Canadian rating has bounced around 1800 in recent years, so I certainly do not feel constrained by ratings restricted to a 500 to 2500 range.
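A one-line PHP sketch of the proposed WGCR, assuming the two component ratings have already been computed (the function name is hypothetical):

// Average of 1500, the all-time GCR, and the GCR over the last 365 days,
// so that recent games effectively count double relative to older ones.
function wgcr($gcr_all_time, $gcr_last_365) {
    return (1500 + $gcr_all_time + $gcr_last_365) / 3;
}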
Would it be difficult to implement the Elo rating system here?
I may eventually use Elo or make it more like Elo, but I have to do more studying of Elo and other methods.
Fergus, please revise the 'draw effect' in your algorithm; I suspect there is a bug. Generally speaking, the method used is not good: it is important that the opponent's rating be taken into account when changing ratings after a game, as is usual in Chess and other games. Elo is one of the best measures, and it has proven to be really indicative, as experience has confirmed. You can see how, in some international tournaments, players from different countries who have never played the other participants before, nor shared the same pool of usual opponents, perform more or less at the strength indicated by their Elo ratings. A good way to do it in our case would be to assign, initially, 'preliminary Elos' according to results in points over number of games, using any reasonable function. After that, apply an Elo-style method over the history of the last 365 days, weighted by a factor, say 0.5, if the game is not a tournament game. In this way we can obtain a more reasonable measure. My chess Elo has oscillated, over my past 20 years of play, between 1800 and 1950. I don't expect this to necessarily be my rating in different chess variants, but enormous deviations from these values do not seem reasonable, and if you observe my results against some players ranked higher by your method, you may suspect that something is wrong. For these reasons I have reasonable doubts about the method used and its fitness for the purpose.
The idea of retroactive ratings is a displeasing 'after-the-fact' concept to me. The reason has to do with carefree coffee-house style games versus tournament rated games. When I played chess at the Edgewater Park Invitational, for example, I lost only one game: I drew against a 2000 USCF rated player and lost only to a 2300 USCF rated player. The 6 games involved 10 hours of playing. Now, I also play at a coffee shop: wild, fun games with just 15 minutes on the clock. These are not rated, and we can try bold ideas that we would likely avoid in a rated game. By comparison, it is as if someone came up and said, 'Hey, we've just rated your coffee-house games and fudged them into your USCF rating.' In regard to the 1500+ rating I saw by my name here, by comparison: my best USCF rating was 1943, my rating in MSN real-time play broke above 2000, my rating against chess computers reached 2110, and my current Gothic Chess rating is about 1975. In regard to Chess Variants, I would like to see the following system, which seems fair: all players start at 1600 (or 1500 if preferred). Rated games do not begin until new games are started, say to go into effect Jan. 15th (or some other date). In-progress games, which may have been for fun or simply experimental, do not count. As an example: when I played a fun game of Maxima with Roberto, he gave me lessons during the game. Very helpful lessons. Would that have happened in a rated game?
Gary is right: many of the games people have played have been coffee games, test games, or fun games in which the result was not important, but they can be used for a 'first' number, as follows. This is a pseudo-Elo idea, and it could work well for us. At first we need a 'preliminary' measure, but it will be modified after the first calculation and adjusted over time, once everyone is aware of how it works. The first number, for everyone, is A = 1000 + (Points / Number of games) * 1500. Points are calculated as usual: 1 for a win, 0.5 for a draw. Number of games refers to logs in the last 365 days. After that, we can run an algorithm, sequenced by time, in which each game modifies the ratings as follows. If player A has a higher rating than B and he wins, his rating is increased by the change rate K * (2500 - Rating of player A) / 2500, where K is a factor, usually 1, which can be raised in tournaments to, say, 2 or 3. This change rate is also subtracted from player B's rating. If player B wins, his rating is increased by the change rate K * (Rating of player A - Rating of player B) / C, where C is a number indicating how fast we want to reflect a 'force change' in a player; I suggest C = 100, and K is the same tournament factor. This change rate is subtracted from player A's rating. In case of a draw, the last rule applies, but with the change rate divided by two. An unrated player is considered, at first, to have a rating of 1000, unless he gives some evidence of another rating, which can be used, translated to our scale. Try some examples, and you will see that this is a very reasonable measure.
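A minimal PHP sketch of the update rule proposed above, assuming games are processed in chronological order (all names and defaults here are illustrative, not site code):

// Preliminary rating from the last 365 days of logs: 1000 + (points / games) * 1500
function preliminary_rating($points, $games) {
    return ($games > 0) ? 1000 + ($points / $games) * 1500 : 1000;
}
// Per-game update. $k is the tournament factor (normally 1); $c controls how fast
// an upset shifts the ratings (100 suggested above). $result is 'high' if the
// higher-rated player won, 'low' for an upset win, and 'draw' otherwise.
function pseudo_elo_update(&$ratingHigh, &$ratingLow, $result, $k = 1, $c = 100) {
    if ($result == 'high') {
        // Higher-rated player wins: the gain shrinks as the winner approaches 2500
        $delta = $k * (2500 - $ratingHigh) / 2500;
        $ratingHigh += $delta;
        $ratingLow  -= $delta;
    } else {
        // Upset win by the lower-rated player, or a draw (same rule, halved)
        $delta = $k * ($ratingHigh - $ratingLow) / $c;
        if ($result == 'draw')
            $delta /= 2;
        $ratingLow  += $delta;
        $ratingHigh -= $delta;
    }
}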
Since all the previous comments were written, I have completely changed how ratings are calculated. Details are given on the page. It bears more similarity to Elo, but it is still very different from Elo. Unlike Elo, which is designed for retaining information across tournaments without keeping extensive records of all games played, this method is designed with the assumption that all the records are available. This is the basis for the reliability measure used to allot a portion of the performance rating between two players. This method is also designed to avoid some of the problems of Elo, such as good players losing points by playing inferior opponents even without losing any games.
I expect to eventually add a rating option, then rate only games intended to be rated. But that will have to wait until I've settled on a method. For now I need the data to test the ratings methods I try out. I think I have a good one now, but I'll give it some time for consideration. Comments on it are welcome.
The following website discusses rating calculations. It may be of use in the CV Ratings project. The site includes example calculations. http://www.chess-express.com/explain.htm
Please write out the whole algorithm with all the details, and I'll try to analyze it more deeply. As stated, it does not look very clear, and some aspects may be subject to further discussion.
Here is what the code looks like after the files are read and the $wins table is made. The $wins table is a two-dimensional array of pairwise wins, for which $wins[$p1][$p2] is the number of times $p1 has beaten $p2. Draws count as half wins for both players.
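Before the excerpt itself, here is a hypothetical sketch (not part of the actual script) of how a $wins table of this kind could be filled, just to make the draws-as-half-wins convention concrete; the helper name is made up:

// Hypothetical helper for filling the $wins table from game results:
// a win adds 1 to $wins[$winner][$loser]; a draw adds 0.5 in both directions.
function record_result(&$wins, $p1, $p2, $result) {
    if (!isset($wins[$p1][$p2])) $wins[$p1][$p2] = 0;
    if (!isset($wins[$p2][$p1])) $wins[$p2][$p1] = 0;
    if ($result == 'p1')
        $wins[$p1][$p2] += 1;
    elseif ($result == 'p2')
        $wins[$p2][$p1] += 1;
    else {
        $wins[$p1][$p2] += 0.5;
        $wins[$p2][$p1] += 0.5;
    }
}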
// Calculate various values needed for rating calculations
$players = array_keys($wins);
// Initialize all ratings to 1500
reset ($players);
foreach ($players as $pl) {
    $gcr[$pl] = $gcr1[$pl] = $gcr2[$pl] = $gcr3[$pl] = $gcr4[$pl] = 1500;
}
// Count opponents and games of each player
reset ($players);
$pc = count($players);
for ($i = 0; $i < $pc; $i++) {
    $p1 = $players[$i];
    $opponents[$p1] = 0;
    $gameswon[$p1] = 0;
    $gameslost[$p1] = 0;
    for ($j = 0; $j < $pc; $j++) {
        $p2 = $players[$j];
        if (($p1 != $p2) && ($wins[$p1][$p2] || $wins[$p2][$p1])) {
            $opponents[$p1]++;
            $gameswon[$p1] += $wins[$p1][$p2];
            $gameslost[$p1] += $wins[$p2][$p1];
        }
    }
    $gamesplayed[$p1] = $gameswon[$p1] + $gameslost[$p1];
    $percentwon[$p1] = ($gamesplayed[$p1] > 0) ? (($gameswon[$p1] * 100) / $gamesplayed[$p1]) : 0;
}
// Sort players by number of opponents, then games played, then games won
function psort ($p1, $p2) {
    global $opponents, $gamesplayed, $gameswon;
    if ($opponents[$p1] > $opponents[$p2])
        return -1;
    elseif ($opponents[$p1] < $opponents[$p2])
        return 1;
    elseif ($gamesplayed[$p1] > $gamesplayed[$p2])
        return -1;
    elseif ($gamesplayed[$p1] < $gamesplayed[$p2])
        return 1;
    elseif ($gameswon[$p1] > $gameswon[$p2])
        return -1;
    elseif ($gameswon[$p1] < $gameswon[$p2])
        return 1;
    else
        return 0;
}
usort ($players, 'psort'); // usort reindexes $players so the positional access below follows this order
// For each pair, calculate new ratings based on previous ratings.
// Use zig-zagging order to optimize interdependency between ratings.
for ($i = 1; $i < $pc; $i++) {
    for ($j = $i; $j < $pc; $j++) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr1[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr1[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr1[$p1] - $gcr1[$p2]);
        $lowpoint = min($gcr1[$p1], $gcr1[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr1[$p1] = (($stability1 * $gcr1[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr1[$p2] = (($stability2 * $gcr1[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in reverse zig-zagging order.
for ($i = $pc-1; $i > 0; $i--) {
    for ($j = $pc-1; $j > $i; $j--) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr2[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr2[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr2[$p1] - $gcr2[$p2]);
        $lowpoint = min($gcr2[$p1], $gcr2[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr2[$p1] = (($stability1 * $gcr2[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr2[$p2] = (($stability2 * $gcr2[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in half reverse zig-zagging order.
for ($i = 1; $i < $pc; $i++) {
    for ($j = $pc-1; $j > $i; $j--) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr3[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr3[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr3[$p1] - $gcr3[$p2]);
        $lowpoint = min($gcr3[$p1], $gcr3[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr3[$p1] = (($stability1 * $gcr3[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr3[$p2] = (($stability2 * $gcr3[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Calculate all ratings again in reverse half reverse zig-zagging order.
for ($i = $pc-1; $i > 0; $i--) {
    for ($j = $i; $j < $pc; $j++) {
        $p1 = $players[$j-$i];
        $p2 = $players[$j];
        $n = $wins[$p1][$p2] + $wins[$p2][$p1];
        if ($n == 0)
            continue;
        $stability1 = (abs($gcr4[$p1] - 1500) + 500) / 20;
        $stability2 = (abs($gcr4[$p2] - 1500) + 500) / 20;
        $reliability = (100 * $n) / ($n + 3);
        $gap = abs($gcr4[$p1] - $gcr4[$p2]);
        $lowpoint = min($gcr4[$p1], $gcr4[$p2]);
        $midpoint = $lowpoint + $gap/2;
        $gap = max($gap, 400);
        $lowpoint = min($lowpoint, $midpoint - 200);
        $pr1 = $lowpoint + ($wins[$p1][$p2] * $gap) / $n;
        $pr2 = $lowpoint + ($wins[$p2][$p1] * $gap) / $n;
        $gcr4[$p1] = (($stability1 * $gcr4[$p1]) + ($reliability * $pr1)) / ($stability1 + $reliability);
        $gcr4[$p2] = (($stability2 * $gcr4[$p2]) + ($reliability * $pr2)) / ($stability2 + $reliability);
    }
}
// Average all four sets of ratings.
// This helps minimize the effects of using any particular order, making ratings more homogeneous.
for ($i = 0; $i < $pc; $i++) {
    $p1 = $players[$i];
    $gcr[$p1] = ($gcr1[$p1] + $gcr2[$p1] + $gcr3[$p1] + $gcr4[$p1]) / 4;
}
// Sort ratings from highest to lowest
arsort ($gcr);
reset ($gcr);
// Print table of ratings. The HTML output was garbled in this excerpt; the loop below
// reconstructs the Name / Userid / GCR / Percent won table it printed. $realnames
// (a userid => display name map) is assumed and not shown in the excerpt.
echo '<table>';
echo '<tr><th>Name</th><th>Userid</th><th>GCR</th><th>Percent won</th></tr>';
foreach ($gcr as $pl => $rating) {
    printf('<tr><td>%s</td><td>%s</td><td>%d</td><td>%.2f</td></tr>',
           $realnames[$pl], $pl, $rating, $percentwon[$pl]);
}
echo '</table>';
The intention is good, but the main problem is that you are gathering apples, pears, mangoes, and kiwis together, and the method does not make sense unless some details are carefully considered. What are we trying to rate? If we are trying to measure multi-variant skill, the method does not work. For example, a player and his 6-year-old son may decide to play Amazons, and to play only against each other. Suppose the father is much stronger than his son, but nothing special as a player. Luck cannot help much in this game, and after a couple of months the best-rated player in Game Courier could be the father, with 120 victories and 0 losses. The highest rating on our site could then belong to a player who plays only one game, which is not even a chess variant, who is himself an average player, and who plays only against a child. We need to take the multi-variant nature of the site into account, and I can give similar examples to argue that not all games should be rated, only 'officially rated' games: tournament games, for example, or games agreed in advance to be rated. Either the multi-variant purpose must be considered, or the method should be applied to each variant independently; mixing everything together this way does not make sense.
well i know one thing we are rating here, and that is 'unrated games'. all i know is, i played 5 unrated games, so can someone tell me how i got a rating from that. it is just the principle of the thing.
also i am suspicious of the rating system, it might somewhat suck. i played 2 games against a 1600 rated player, with 1 win and 1 loss, 2 games against a 1570 player, again with 1 win and 1 loss, and finally a draw with a 1516 player, and my rating is 1462. it would have been the draw, taking my win percentage from 50 to 40, that dropped my rating, but anyway, this is beside the point; they were all unrated games, that is the point. so can you take my name off the rating list, and we can never speak of this again lol :)
(please, bring the option for 'unrated games' back as soon as possible) btw, i agree, the intention was good, top points for that, i just don't know why you didn't talk to the players/members beforehand, which would have been nice, or am i wrong, did i somehow miss this conversation?
Roberto, please carefully read the first paragraph on this page and the warning above the 'Game Courier Ratings for *' table. You will see that we are in agreement on the point you raised, but it is not relevant to an evaluation of the method used to rate players here. Christine, I will refer you to an earlier comment of mine dated 2006-01-08. We are in agreement that game ratings should be reserved for games intended to be rated. But, for the time being, I am using what data I have, which are the logs of previously unrated games, to test the method I am designing to rate players. What I wish to focus on right now is whether this method is an accurate enough measure of relative playing strengths between players. In this respect, it is looking good to me, but there might be ways to tweak it.
Christine, although it would seem unfair that you are rated lower than any of the players you have defeated, further examination of the evidence reveals that it is fair. Your three opponents have all played several more games than you have, and they all have much stronger track records against their opponents than you have against yours. As long as you have played few games, this method works with the assumption that your performance was probably due to luck. As you play more games, as your opponents have, it begins to place more trust in your performances against other players. Like Elo, the method used here is self-correcting. As you play more and more games, it is better able to come up with an accurate estimate of your relative playing ability. Since you have played only a few games, it doesn't yet have enough information to make a good estimate.
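For illustration, the reliability formula in the code above, (100 * $n) / ($n + 3), places only limited trust in a head-to-head record built on few games:

// Reliability grows slowly with the number of games n between two players.
function reliability($n) {
    return (100 * $n) / ($n + 3);
}
echo reliability(2), "\n";  // 40
echo reliability(5), "\n";  // 62.5
echo reliability(20), "\n"; // approximately 87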
It may be possible to add PROV after a player's rating. This means it is a 'Provisional' rating which only becomes official after 14 games have been played. The USCF uses that system. But perhaps it is not worth the effort. Anyway, it is an idea.