
Introduction to CS Engineering (0CCS0CSE)
Assignment 23: Episode

1 Value Function

Implementing Eq. 1 can cause confusion because $V(S_t)$ appears on both sides of the equation and, in Python, $V(S)$ is a dictionary. This document explains lines 23-25 of Algorithm 1.

$$V(S_t) = V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \tag{1}$$

In the code, $\alpha$ corresponds to session.learningRate, $\gamma$ to session.discountRate, and $R_{t+1}$ to reward.

Although lines 23 and 24 appear to update the valueFunction dictionary in Algorithm 1, they do not. Lines 23 and 24 only retrieve information from the value function dictionary. Introducing two new variables, v_st1 and v_st0, to replace $V(S_{t+1})$ and $V(S_t)$ helps to clarify that only line 25 changes the dictionary:

    v_st1 ← GetValueOf(board)
    v_st0 ← GetValueOf(previousState)
    $V(S_t)$ ← v_st0 + session.learningRate × (reward + (session.discountRate × v_st1) − v_st0)

Furthermore, GetValueOf(…) is a multistep process:

(1) get the key from the board;
(2) check whether the key is in valueFunction, then either
    i. the key is in valueFunction: return the value associated with the key in the dictionary, e.g., return self.valueFunction[key]; or
    ii. the key is not in valueFunction: add the key to the dictionary, initialise its value to zero and return 0.

It would be best to add a new method, getValueOf(self, board), which does all of this.
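The steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the assignment's reference solution: the `Agent` class, the `getKey` helper, the `updateValue` method, and the assumption that a board exposes its cells as a list named `cells` are all hypothetical. Only `valueFunction`, `getValueOf(self, board)`, `v_st1` and `v_st0` come from the text.

```python
class Agent:
    """Hypothetical holder of the value-function dictionary."""

    def __init__(self):
        self.valueFunction = {}  # maps a board key to its estimated value

    def getKey(self, board):
        # Assumption: the board exposes nine one-character strings in board.cells.
        return "".join(board.cells)

    def getValueOf(self, board):
        key = self.getKey(board)               # (1) get the key from the board
        if key not in self.valueFunction:      # (2.ii) unseen state: initialise to zero
            self.valueFunction[key] = 0.0
        return self.valueFunction[key]         # (2.i) return the stored value

    def updateValue(self, previousState, board, reward, learningRate, discountRate):
        # Lines 23 and 24: read-only lookups (apart from initialising unseen states).
        v_st1 = self.getValueOf(board)
        v_st0 = self.getValueOf(previousState)
        # Line 25: the only statement that changes the value of the previous state.
        self.valueFunction[self.getKey(previousState)] = v_st0 + learningRate * (
            reward + (discountRate * v_st1) - v_st0
        )
```

As a worked check of Eq. 1: with a learning rate of 0.1, a discount rate of 0.9, a reward of 0, $V(S_{t+1}) = 0.5$ and $V(S_t) = 0$, the update gives $0 + 0.1 \times (0 + 0.9 \times 0.5 - 0) = 0.045$.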
In Algorithm 1, lines 23 and 24, both board and previousState are TicTacToe objects.

Algorithm 1 This method executes a single tic-tac-toe game and updates the state value table after every move played by the RL agent.

 1: procedure episode(board, opponent, session)
 2:
 3:     result ← True
 4:     turn ← 0
 5:     previousState ← CopyBoard()
 6:
 7:     while not board.isGameOver() and result do
 8:         if turn > 1 then
 9:             turn ← 0
10:         end if
11:
12:         agentMoved ← False
13:
14:         if (turn is 0 and session.agentFirst) or (turn is 1 and not session.agentFirst) then
15:             result ← makeTrainingMove(board, session.epsilon)
16:             agentMoved ← True
17:         else
18:             result ← opponent.makeMove(board)
19:         end if
20:
21:         if agentMoved then
22:             reward ← getReward(board)
23:             $V(S_{t+1})$ ← GetValueOf(board)
24:             $V(S_t)$ ← GetValueOf(previousState)
25:             $V(S_t)$ ← $V(S_t)$ + session.learningRate × (reward + (session.discountRate × $V(S_{t+1})$) − $V(S_t)$)
26:             previousState ← CopyBoard()
27:
28:         end if
29:
30:         turn ← turn + 1
31:     end while
32:
33:     reward ← GetReward(board)
34:     $V(S_{t+1})$ ← GetValueOf(board)
35:     $V(S_{t+1})$ ← $V(S_{t+1})$ + session.learningRate × reward
36: end procedure
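One possible Python rendering of Algorithm 1 is sketched below so the control flow can be run end to end. Everything outside the `episode` method is a stand-in invented for illustration: the `TicTacToe` and `RandomOpponent` classes, the `key()`, `winner()` and `freeCells()` helpers, the reward scheme (+1 win, -1 loss, 0 otherwise), the epsilon-greedy move selection, and the use of `SimpleNamespace` for the session. The assignment's real classes may look quite different.

```python
import random
from types import SimpleNamespace

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

class TicTacToe:
    """Stand-in board (assumption): nine cells holding 'X', 'O' or ' '."""
    def __init__(self, cells=None):
        self.cells = list(cells) if cells else [" "] * 9

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.cells[a] != " " and self.cells[a] == self.cells[b] == self.cells[c]:
                return self.cells[a]
        return None

    def freeCells(self):
        return [i for i, c in enumerate(self.cells) if c == " "]

    def isGameOver(self):
        return self.winner() is not None or not self.freeCells()

    def key(self):
        return "".join(self.cells)

class RandomOpponent:
    """Stand-in opponent (assumption): plays 'O' on a random free cell."""
    def makeMove(self, board):
        free = board.freeCells()
        if not free:
            return False
        board.cells[random.choice(free)] = "O"
        return True

class Agent:
    def __init__(self):
        self.valueFunction = {}

    def getValueOf(self, board):
        # Adds unseen states with value 0, then returns the stored value.
        return self.valueFunction.setdefault(board.key(), 0.0)

    def getReward(self, board):
        # Assumed reward scheme: +1 if the agent ('X') won, -1 if it lost, else 0.
        w = board.winner()
        return 1.0 if w == "X" else -1.0 if w == "O" else 0.0

    def makeTrainingMove(self, board, epsilon):
        free = board.freeCells()
        if not free:
            return False
        if random.random() < epsilon:          # explore
            move = random.choice(free)
        else:                                  # exploit: best-valued successor state
            def valueAfter(i):
                nxt = TicTacToe(board.cells)
                nxt.cells[i] = "X"
                return self.valueFunction.get(nxt.key(), 0.0)
            move = max(free, key=valueAfter)
        board.cells[move] = "X"
        return True

    def episode(self, board, opponent, session):
        result = True
        turn = 0
        previousState = TicTacToe(board.cells)              # line 5: copy of the board
        while not board.isGameOver() and result:
            if turn > 1:
                turn = 0
            agentMoved = False
            if (turn == 0 and session.agentFirst) or (turn == 1 and not session.agentFirst):
                result = self.makeTrainingMove(board, session.epsilon)
                agentMoved = True
            else:
                result = opponent.makeMove(board)
            if agentMoved:
                reward = self.getReward(board)
                v_st1 = self.getValueOf(board)              # line 23: lookup only
                v_st0 = self.getValueOf(previousState)      # line 24: lookup only
                self.valueFunction[previousState.key()] = v_st0 + session.learningRate * (
                    reward + (session.discountRate * v_st1) - v_st0)   # line 25
                previousState = TicTacToe(board.cells)      # line 26
            turn += 1
        reward = self.getReward(board)                      # lines 33-35: final update
        v_st1 = self.getValueOf(board)
        self.valueFunction[board.key()] = v_st1 + session.learningRate * reward

# Example run with illustrative hyperparameters.
session = SimpleNamespace(learningRate=0.1, discountRate=0.9, epsilon=0.2, agentFirst=True)
agent = Agent()
board = TicTacToe()
agent.episode(board, RandomOpponent(), session)
```

Using `v_st1` and `v_st0` as local variables keeps the dictionary write confined to a single statement per update, which is the point the text makes about line 25.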
