Talk:Negamax
This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
Comments
[edit]The pseudo code is correct, but it's hard to interpreter it correctly. I've successfully implemented it.
The code is *WRONG*! For negamax we must negate the heuristic value of node for player 2. Also, if the root of this call is for player 2, we must negate the final result. Please correct the code.
- It's correct (the recursive call to negamax is negated). --IanOsgood (talk) 17:54, 17 April 2008 (UTC)
- I think the previous poster is correct, the code is wrong. I converted the pseudocode to Python and ran it on a trivial tree containing one node with one child. The result was always the negative of the child.
def negamax(node, depth, alpha, beta): if type(node) == type(1): return node else: for child in node: alpha = max(alpha, -negamax(child, depth-1, -beta, -alpha)) # the following if statement constitutes alpha-beta pruning} if alpha>=beta: return beta return alpha
>>> tree = [ 1 ] >>> negamax(tree, 5, -1000, 1000) -1
JSB73 (talk) 22:38, 28 April 2009 (UTC)
The code is definitely wrong as per the reasons stated above. More importantly the code is completely unreferenced. Here is a link that mentions the negation of the heuristic: http://www.hamedahmadi.com/gametree/
I'd fix this myself if I didn't have to make up for the time that I lost trying to debug my java implementation of this negamax pseudocode.202.89.191.178 (talk) 17:19, 6 May 2009 (UTC)
- I corrected the code using information from that page. I ran tests against naive minimax on trees of various depths and up to 11 million leaf nodes. The code always returned the correct result and always beat minimax with alpha-beta pruning for sufficiently large trees (>100 leaf nodes). JSB73 (talk) 19:53, 17 May 2009 (UTC)
MAKE MORE SENSE —Preceding unsigned comment added by 198.49.142.2 (talk) 15:26, 19 February 2010 (UTC)
- I think the above "MAKE MORE SENSE" means: preceding commentator (@JSB73), please clarify "*the* code", "*that* page" [to which I agree!!]. Also, instead of using 11 million nodes it's sufficient to consider a tree with 3x3x3 nodes or so... Thanks!
- Also, I do think that the snippet above is OK, in particular I think the "value" stuff currently there on the main page is redundant with " α " -- or what am I missing? — MFH:Talk 22:31, 3 January 2023 (UTC)
Remark on the last paragraph
[edit]I would like to discuss about the last paragraph of this page:
« What can be confusing for beginners is how the heuristic value of the current node is calculated. In this implementation, the value is always calculated from the point of view of the player running the algorithm because of the color parameter. This is the same behavior as the normal minimax algorithm. If this parameter was not present, the evaluation function would need to return a score for the current player, i.e. the min or max player. »
I had really hard time to implement this algorithm directly from the given pseudo-code. It was mainly because I misunderstood the role of the 'color' variable. Indeed, in my implementation the alternation of players is handled outside of the algorithm and one can ask the object 'game' which is the current active player.
The fact is that my evaluation function for the leafs (or nodes, when 'depth=0') was already giving a '+1' when the root player was winning and a '-1' when the opponent of the root player was winning, so I discarded the color
argument...
I was wrong... but as much as the pseudo code was. Indeed, the evaluation function should give '+1' (positive number) if the current player is winning and '-1' (negative number) if the opponent of the current player is winning. It should be a local decision and not a decision based on the player who ran the initial call of the negamax function.
I realized, once this bug as been fixed in my implementation, the meaning of the last paragraph (which is partially wrong as one can notice that the 'color' parameter is here to invert the result of the evaluation function which was probably like my own function), but, for me, it should be rewritten in a clearer way. Thus, I propose this:
- Remove the 'color' parameter from the pseudo-code and from the text around.
- Rewrite the last paragraph as follow:
« This pseudo-code assumes that the heuristic value of a node/leaf will be given for the current player of the node and not for the player running the initial call. Meaning that a positive result means that the current player is gaining the given amount (and the opponent score is deduced of the exact same amount).
In the case of a heuristic value computed for the player running the initial call, one should take care of carrying the players' alternation to negate this effect.
Anyway, it must be understood that the negamax algorithm is the only one taking care of propagating the alternation and only local choices have to be taken. » —Preceding unsigned comment added by 82.242.180.106 (talk) 14:48, 26 December 2010 (UTC)
Providing correction to alpha-beta pruning with transposition table code
[edit]I'm very appreciative of the code on this page, as it helped me significantly with implementing alpha-beta pruning with transposition tables.
But the following code on the page currently is incorrect:
if bestValue ≤ alphaOrig ttEntry.Flag := UPPERBOUND else if bestValue ≥ β ttEntry.Flag := LOWERBOUND else ttEntry.Flag := EXACT endif
In fact this will lead to different results than alpha-beta pruning alone. I have tested this with my own working code. alphaOrig is not a tight enough bound to avoid problems with nodes marked EXACT, even though the nodes marked UPPERBOUND and LOWERBOUND are correct (but not all nodes that should be marked UPPERBOUND and LOWERBOUND are currently marked, which is the crux of the problem actually).
The current alpha must in fact be used, and not the original alpha, if you want to get the same results as alpha-beta pruning alone. But alpha sometimes "masquerades" as beta in Negamax, I believe, because beta is never updated while processing. So you must be careful how the current alpha is used.
I'm not 100% sure since I'm not implementing Negamax in my code here, but I believe the correct code should look like the following:
if (color = 1) if bestValue ≤ alpha ttEntry.Flag := UPPERBOUND else ttEntry.Flag := EXACT endif else if bestValue ≥ alpha ttEntry.Flag := LOWERBOUND else ttEntry.Flag := EXACT endif — Preceding unsigned comment added by 66.192.104.10 (talk) 21:29, 26 February 2015 (UTC) endif
Just leaving this here in case anyone runs into the same problem I did. — Preceding unsigned comment added by 66.192.104.10 (talk) 21:16, 26 February 2015 (UTC)
- The article's code is accurate according to the references, duplicate Negamax at other programming sites, and my own testing. Note it's correct only in within the context shown, for the Negamax implementation shown. You shouldn't expect the code to work as is for you when you're not implementing Negamax as you say.
- alphaOrig must be used (instead of alpha) to help set ttEntry flag, since alpha is an updating value. Otherwise, it would be impossible for the EXACT flag to get set. For your other concerns, the article's references can provide more detailed information. GamePlayerAI (talk) 16:50, 6 March 2015 (UTC)
I realize that my code above is different from other programming sites and references. I believe in fact that the existing sources are wrong, and that they did not notice because the problems (at least in my app) don't become apparent until 5 levels or deeper of lookahead, which I think is hard to realize. I think most folks verify with shallower trees. I spent a lot of time working trees out by hand and debugging using my visualization tool, and I can't see any remaining bugs, though it's always possible I missed them.
Even my changed version is not perfect -- it diverges from plain alpha-beta at 9 levels deep, but at least it's closer. I think this problem is fundamental to using a transposition table with alpha-beta pruning. The only way I could duplicate alpha-beta pruning EXACTLY was to disallow any transposition table use unless no alpha-beta pruning had happened at this layer or at any deeper layer, but that rules out an awful lot of cached nodes, and the difference in play is not large (another reason I think this mainly goes unnoticed). Furthermore, with better move ranking, this problem also becomes harder to notice. Probably I only noticed because I chose to implement my code for Reversi/Othello, where good move ranking is hard (in my opinion).
This is all due to original research on my part, and I'm not planning on publishing any of this, so I guess I'll never be able to convince folks to add this to the Wikipedia article. Probably that's the right way to do things anyway. But I at least wanted to have this information out there. Thanks for taking the time to review it. — Preceding unsigned comment added by 73.21.166.99 (talk) 04:19, 16 March 2015 (UTC)
- Transposition table addition to the algorithm should not affect alpha/beta results, regardless of search depth. There are definitely errors in the implementation if it does. Having experience this too, this is particularly difficult to troubleshoot, due to recursive code, and incorrect results propagate all over the game tree quickly. Errors can be the transposition table itself, such as a table that's returning bad data, due to table entry collisions, or table corruption. Other cases are table lookup misuse (using data that should be ignored) and misinterpretation (improper save or restore of bestValue and its alpha beta relationship). GamePlayerAI (talk) 16:47, 17 March 2015 (UTC)
- I didn't realize that this whole phenomenon had a name: search instability. It apparently happens in exactly the kinds of interactions that we're talking about here. The version I implemented and showed the code for above reduced search instability for me, so I guess that's good, but perhaps it wouldn't for everybody in every game. It seems hard to know. — Preceding unsigned comment added by 73.21.166.99 (talk) 02:07, 31 March 2015 (UTC)
2A02:AB88:39C0:3900:CC2A:E662:9F18:5846 (talk) 11:46, 28 April 2021 (UTC)
- "Transposition table addition to the algorithm should not affect alpha/beta results, regardless of search depth."
- This is not true. Breuker mentions this in a footnote: "If the depth still to be searched is less than the depth retrieved, the search results may differ from the results when searching without a transposition table". Ph.D. thesis: Memory versus Search in Games Chapter 2: The transposition, D. Breuker, page17
---
It appears to me that the use of the temporary variable alphaOrig in the pseudo code is confusing at best and may be divergent from some other references of "negamax with alpha beta pruning and transposition tables". The issue is that alpha can change during "Transposition Table Lookup; node is the lookup key for ttEntry", so perhaps alphaOrig should capture the value *after* this code rather than before.
For comparision, view A REVIEW OF GAME-TREE PRUNING, T.A. Marsland, page 14. In this pseudo-code version, no temporary variable is used, and alpha itself is not modified during move evaluation. (Rather than updating alpha, it recurses using the function -max( α ,score)). So alpha is the value *after* the transposition table lookup, not before as with alphaOrig in the Wikipedia pseudo-code.
So personally, I prefer the Marsland pseudo-code for education purposes. For implementation purposes, the code used by Alpha-Beta with Sibling Prediction Pruning in Chess, Jeroen W.T. Carolus is perhaps more clever, and no temporary variable is used (by checking LOWERBOUND before UPPERBOUND).
Janesn (talk) 22:00, 27 September 2016 (UTC)
- The source of the transposition table example in the article is Breuker, and I'll refer to the other two as Marsland, and Carolus. Marsland transposition table logic is equivalent when alphaOrig in Breuker stores α after the transposition table lookup (rather than before). But consider the following case during a negamax function call:
- transposition table lookup updates α because it's a "lower bound" (Breuker: alphaOrig < α Marsland: alphaOrig = α)
- the move evaluation returns the same as unchanged α for bestValue (score)
- update the node's transposition table entry with the same bestValue (score)
- In Breuker's logic, the node's transposition table entry will update with "exact" flag (since alphaOrig < bestValue < β). In Marsland, the update will have "upper bound" flag (since score ≤ α). Optimally, the flag for the score should be "exact" rather than alternating between upper and lower bound. So I think Breuker's version is better?
- In Carolus, there's no alphaOrig and no equivalent. alpha updates during move evaluation. In this case, after move evaluation, best can never be greater than alpha, and setting "exact" flag for the transposition table entry is impossible. I think there are other errors in Carolus:
if(best <= alpha) // a lowerbound value StoreTTEntry(board.getHashKey(), best, LOWERBOUND, depth); else if(best >= beta) // an upperbound value StoreTTEntry(board.getHashKey(), best, UPPERBOUND, depth); else // a true minimax value StoreTTEntry(board.getHashKey(), best, EXACT, depth); return best;
- Namely if(best ≥ beta) should be the first test for lower bound, then if(best ≤ alpha) for upper bound. As shown, the lower/upper bound flags are incorrectly swapped.
- There's another source Plaat I'll mention which is interesting. In Plaat, the transposition table entry stores both upper and lower bound scores. When both bounds are equal, then score is exact.
- GamePlayerAI (talk) 23:18, 29 September 2016 (UTC)
Your comparision between Marsland and Breuker pinpoints why I find the early assignment of alphaOrig confusing. In Marsland it is clear that "flag" always captures the fact that the called negamax function did not complete exactly, thus setting upper bound. In Breuker, it can set the EXACT flag, which at least to me is not clear how he is able to make that optimization. My suspicion remains that a subsequent retrieval of the position in the transposition table could diverge from alpha-beta.
The Plaat code, while not negamax, has the same approach as Marsland. The alpha and beta variables are set after the transposition table lookup and not modified during move evaluation.
Janesn (talk) 19:52, 1 October 2016 (UTC)
- This may be surprising; all of the approaches mentioned, Breuker, Marsland, Carolus (with the noted bug fixes), and Plaat, will return correct alpha-beta results. They all produce identical exact results. When the negamax function does not complete exactly, a lower bound, or an upper bound, is still a genuine answer. Thus, it's still OK to flag a known to be exact value as lower bound or upper bound in the transposition table. I have tried Breuker and Marsland approaches in the past, and there are no differences in the alpha-beta result outcome, when seeking exact results at the root node.
- With the Breuker approach, the negamax caller (parent) will interpret return results in the same way as the transposition table update made by the callee. Changes to α or alphaOrig in the callee are invisible to the caller, so the caller will test unchanged alphaOrig < bestValue < β to determine whether it received a upper bound, lower bound, or exact result. All the other approach can result in harmless flag inconsistencies between caller interpretation and transposition table entry for a given result.
- Although Platt has the same approach as Marsland regarding alphaOrig and equivalent, the results in the transposition table are not the same. With Platt, a score seen as both an upper bound and lower bound unconditionally becomes exact.
- As to how to improve the wikipedia article to help explain purpose and possible confusion about alphaOrig, I don't know about that at the moment. GamePlayerAI (talk) 00:47, 2 October 2016 (UTC)
While it is OK to flag a known-to-be-exact value as lower or upper bound, it is suspicious to flag an upper bound as an exact value when the lower and upper bound scores differ. But this is what can happen in Breuker when the transposition table lookup updates α because it's a "lower bound" and then the called negamax function does not complete exactly. Marsland flags the new entry as an "upper bound" score. Both Breuker and Marsland completely replace the old entry, so the "lower bound" score is lost.
Plaat stores the lower bound and upper bound scores separately, so the same transposition table entry can be used for both lower bound and upper bound checking. Only when both scores are exactly the same is the transposition table entry deemed exact. Thus Plaat has the same outcome as Marsland, perhaps slightly faster.
As I am already repeating myself, this will be my last posting on the subject. Thank you for your discussion. Janesn (talk) 21:10, 3 October 2016 (UTC)
- For posterity, the suspicious behavior described cannot occur. The algorithms assume absence of search instability (see also Aspiration Windows), where negamax cannot return contradictory results (such as new upper bound < previous lower bound (from TT) for the same (re-)search), even when negamax does not complete exactly. GamePlayerAI (talk) 16:30, 11 October 2016 (UTC)
Dead Link
[edit]The link to the C99 example code under external links is dead. — Preceding unsigned comment added by 2605:A000:131A:3E:ECE3:A970:240E:6901 (talk) 09:12, 12 February 2016 (UTC)