Fix inconsistencies in lichess's cached evaluation

It's fine for an evaluator to be limited, but inconsistencies in lichess's cached evaluation should ideally be corrected.

Here's an example sequence: 1. e4 d5 2. exd5 Qxd5 3. Nc3
The evaluations after each of these five moves are: 0.1 0.3 0.4 0.4 0.3

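According to lichess's arrow, White's choices for moves 2 and 3 are best. Therefore, they should neither increase nor decrease the evaluation. Also, no move should ever improve the evaluation for the player who makes it, because the evaluation of a position is supposed to already assume that player's best move.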
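To make that rule concrete, here is a rough sketch that scans a line of evaluations and flags every move that improved the mover's own evaluation. The function name and the centipawn representation (White's point of view) are my own choices for illustration, not anything lichess is known to use. Position 0 is the position after 1. e4, so Black moves first.

```python
def find_self_improving_moves(evals, white_moves_first):
    """Return indices of moves that improved the mover's own evaluation.

    evals[i] is the evaluation of position i in centipawns from White's
    point of view; move i leads from position i to position i + 1.
    """
    flagged = []
    for i in range(len(evals) - 1):
        white_to_move = (i % 2 == 0) == white_moves_first
        before, after = evals[i], evals[i + 1]
        if white_to_move and after > before:
            flagged.append(i)  # White's move raised White's eval
        elif not white_to_move and after < before:
            flagged.append(i)  # Black's move lowered White's eval
    return flagged


# The example evaluations 0.1 0.3 0.4 0.4 0.3, in centipawns.
example_evals = [10, 30, 40, 40, 30]
flagged_moves = find_self_improving_moves(example_evals, white_moves_first=False)
print(flagged_moves)  # move 1 is White's 2. exd5 (0.3 -> 0.4)
```

This only checks the weaker rule. The drop from 0.4 to 0.3 after 3. Nc3 is not flagged, even though it is also inconsistent if Nc3 really is the best move.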

As I said, I recognize that an evaluator with limited lookahead can have these kinds of inconsistencies, but when results are recorded into the cache, the inconsistencies should be eliminated by:
(1) updating the preceding evaluation so that it is greater than or equal to the evaluation following White's move (or less than or equal to the evaluation following Black's move),
(2) updating the preceding evaluation so that it is equal to the following evaluation when the player makes their best move, and
(3) updating the cached "best move arrow" (if it's cached).
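A minimal sketch of that correction along a single line of play might look like the following. Everything here is an assumption for illustration: the function name, the centipawn values from White's point of view, and the `is_best` flags, which stand in for whatever the cache records as the best move. It does not handle updating the arrow itself.

```python
def back_up_line(evals, is_best, white_moves_first):
    """Propagate evaluations backwards along one line of play.

    evals[i]   -- cached eval of position i (centipawns, White's view)
    is_best[i] -- True if the move from position i to i + 1 is the
                  cached best move in position i
    white_moves_first -- True if White is to move in position 0
    """
    if len(is_best) != len(evals) - 1:
        raise ValueError("need exactly one is_best flag per move")
    fixed = list(evals)
    # Walk from the end of the line towards the start so that a
    # correction can bubble up through several moves.
    for i in range(len(fixed) - 2, -1, -1):
        white_to_move = (i % 2 == 0) == white_moves_first
        child = fixed[i + 1]
        if is_best[i]:
            fixed[i] = child                 # best move: evals must match
        elif white_to_move:
            fixed[i] = max(fixed[i], child)  # White can get at least this
        else:
            fixed[i] = min(fixed[i], child)  # Black can hold White to this
    return fixed


# Position 0 is after 1. e4, so Black moves first. Only White's
# moves 2 and 3 are marked as best, as in the example.
line_evals = [10, 30, 40, 40, 30]
best_flags = [False, True, False, True]
corrected = back_up_line(line_evals, best_flags, white_moves_first=False)
print(corrected)
```

With these inputs the 0.3 after 3. Nc3 bubbles back up to the position after 1... d5, while the 0.1 after 1. e4 is left alone.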

Evaluations will change after every move simply because of the horizon effect: every move made effectively causes the engine to search one ply deeper. So even if evaluations are stored at, say, depth 25, they will probably change when depth 26 is reached. I don't see how this can be fixed, as there will probably always be someone who wants their analysis to be that bit more accurate and so searches a bit longer.

Also, computers are considered to be weaker at openings than they are at the rest of the game - your example might be highlighting that.

EDIT: and thibault's point changes things as well.

IMHO it is pointless to give evaluations as long as the game is within the opening book.
Even after that the evaluation changes because of the horizon effect.
Different engines yield different evaluations of the same position.
There is an inherent margin of error on evaluations, maybe even +/- 0.3.
I would not attach too much importance to the evaluation.

I largely agree with #1, though less for cached opening evaluations than for positions much later in games, where playing a move, or especially going back a move, seems to reset the evaluation and return worse findings.

E.g. after White's move 46 (in no particular game), the engine says Black's best move is 46... Nf6 and the eval is +3.5. I play Nf6 for Black, and the evaluation quickly jumps to +10 (or #17). Then I go back to the position after White's move 46, and the evaluation returns to +3.5 with Nf6 still shown as best, without reflecting that the top move is known to lead to #17. Desired behavior: explore other moves for Black's move 46, or update the eval to +10 if they are all indeed worse than 46... Nf6.

Related: http://www.zipproth.de/ Look at the section "Brainfish". This is Stockfish with an opening book where all inconsistencies are eliminated. The problem, however, is that it is not clear how to combine this with lichess cloud analysis, which is updated all the time and is run to different depths (often to a greater depth in later positions).

It would be nice to have some "opening tree" where the evaluation of each vertex is just the minimum (for Black to move) or maximum (for White to move) of the evaluations of all its child nodes, with cloud SF evaluations only at the leaves of this giant tree. The problem is that you can't be sure that some crazy lines, which are left out of the tree because they seem inferior but can be reached from a non-leaf vertex, are not in fact the best when searched to a greater depth. Also, cloud evals of child nodes are not entirely comparable because they were computed at different depths (it may become a big problem if a less analyzed vertex is considered better than a well-analyzed one). We may end up with an algorithm that just plays weaker than the usual cloud SF with independent analysis in every node.
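Here is a rough sketch of that opening-tree idea, just to show the backing-up step. The `Node` class and `backed_up_eval` function are hypothetical names, evaluations are centipawns from White's point of view, and the sketch deliberately ignores the two problems above (lines missing from the tree, and leaves evaluated at different depths).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    white_to_move: bool
    # Engine evaluation, only consulted at leaves of the tree.
    leaf_eval: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def backed_up_eval(node: Node) -> int:
    """Return the minimax value of a node from leaf evaluations only."""
    if not node.children:
        if node.leaf_eval is None:
            raise ValueError("leaf without an evaluation")
        return node.leaf_eval
    values = [backed_up_eval(child) for child in node.children]
    # White picks the child that is best for White, Black the worst.
    return max(values) if node.white_to_move else min(values)


# Tiny made-up tree: White chooses between two moves, then Black replies.
root = Node(white_to_move=True, children=[
    Node(white_to_move=False, children=[
        Node(white_to_move=True, leaf_eval=30),
        Node(white_to_move=True, leaf_eval=50),
    ]),
    Node(white_to_move=False, children=[
        Node(white_to_move=True, leaf_eval=10),
        Node(white_to_move=True, leaf_eval=80),
    ]),
])
root_value = backed_up_eval(root)
print(root_value)
```

Note how the +80 leaf never influences the root: Black would avoid it, so the second White move is only worth +10.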

@thibault Thanks for your reply. (By the way, I enjoyed your recent talk on lichess!) As I said in my question, I understand why the inconsistencies are happening. What I'm suggesting is that the cached values be corrected so that the inconsistencies disappear. That is equivalent to saying that the depth is still "n", but some extra lines of depth greater than "n" are sometimes considered. (It can be more than "n+1" if you consider multiple applications of the inconsistency correction: the evaluation can bubble up from many moves down.)

Also, I'm not sure how the cache works. I assume lichess caches board states based on the frequency of entering a state. (You can keep track of the frequency of all of the states below the cached states, and measure the frequency to find candidates for caching.)

However, I suggest that lichess also try to cache board states that have significant inconsistencies between their evaluation and the evaluation after some next move. I've seen cases where lichess evaluates a state as roughly even, and then White makes a move that improves his evaluation to +10. Had that state been cached, it would have been possible to see the +10 one step earlier. (And possibly more.) This is a separate idea about making the cache focus on those states where the default depth leads to very incorrect evaluations.
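As a sketch of what I mean by focusing on inconsistent states, something like the following could rank candidates by the size of the gap. The function name, the tuple layout, and the 100-centipawn threshold are all made up for illustration. I don't know how the real cache chooses positions.

```python
def rank_cache_candidates(observations, threshold=100):
    """Rank positions by how much their eval disagrees with a follow-up.

    observations -- list of (position_id, eval_before, eval_after) tuples,
                    where eval_after is the evaluation seen after some
                    move from that position (centipawns, White's view)
    threshold    -- minimum gap for a position to count as a candidate
    """
    gaps = [
        (abs(after - before), position_id)
        for position_id, before, after in observations
    ]
    candidates = [item for item in gaps if item[0] >= threshold]
    # Largest inconsistency first; ties broken by position id.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [position_id for _, position_id in candidates]


# Made-up observations: "pos_a" is the "roughly even, then +10" case.
seen = [
    ("pos_a", 0, 1000),
    ("pos_b", 30, 40),
    ("pos_c", -50, 200),
]
ranked = rank_cache_candidates(seen)
print(ranked)
```

A real version would presumably weigh this against how often each state is actually reached.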
