
Introduction to CS Engineering (0CCS0CSE)
Assignment 23: Episode

1 Value Function

Implementing Eq. 1 can cause confusion because $V(S_t)$ appears on both sides of the equation and, in Python, $V(S)$ is a dictionary. This document explains lines 23 to 25 of Algorithm 1.

$$V(S_t) = V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \tag{1}$$

Here $\alpha$ corresponds to session.learningRate and $\gamma$ to session.discountRate in Algorithm 1.

Although lines 23 and 24 appear to update the valueFunction dictionary in Algorithm 1, they do not. Lines 23 and 24 only retrieve information from the value function dictionary. Introducing two new variables, v_st1 and v_st0, in place of $V(S_{t+1})$ and $V(S_t)$ makes it clear that only line 25 changes the dictionary:

    v_st1 ← GetValueOf(board)
    v_st0 ← GetValueOf(previousState)
    V(S_t) ← v_st0 + session.learningRate × (reward + (session.discountRate × v_st1) − v_st0)

Furthermore, GetValueOf(…) is a multistep process:

(1) get the key from the board;
(2) check whether the key is in valueFunction, and either
    i. the key is in valueFunction: return the value associated with the key in the dictionary, e.g., return self.valueFunction[key]; or
    ii. the key is not in valueFunction: add the key to the dictionary, initialise its value to zero and return 0.

It would be best to add a new method, getValueOf(self, board), which does all of this.
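The lookup steps described above can be sketched in Python as follows. This is only a minimal illustration, not the assignment's reference solution: the class name `ValueTable`, the helper `getKey`, the stand-in `DemoBoard` with its `cells` attribute, and the numeric values are all assumptions made for the example, since the handout does not say how a key is derived from a TicTacToe object.

```python
class ValueTable:
    """Hypothetical container for the state-value dictionary."""

    def __init__(self):
        self.valueFunction = {}  # key derived from a board -> estimated value

    def getKey(self, board):
        # Assumption: the board exposes its nine cells as a sequence `cells`.
        # The real TicTacToe class may build its key differently.
        return "".join(str(cell) for cell in board.cells)

    def getValueOf(self, board):
        key = self.getKey(board)              # (1) get the key from the board
        if key not in self.valueFunction:     # (2.ii) unseen state: store zero
            self.valueFunction[key] = 0.0
        return self.valueFunction[key]        # (2.i) return the stored value


class DemoBoard:
    """Stand-in for a TicTacToe object, used only for this example."""

    def __init__(self, cells):
        self.cells = list(cells)


table = ValueTable()
previousState = DemoBoard("X........")
board = DemoBoard("X...O....")
learningRate, discountRate, reward = 0.5, 0.9, 1.0  # illustrative values

# Lookups: each returns a number and adds a zero entry if the state is new.
v_st1 = table.getValueOf(board)
v_st0 = table.getValueOf(previousState)

# Only this assignment changes the value stored for previousState.
table.valueFunction[table.getKey(previousState)] = (
    v_st0 + learningRate * (reward + discountRate * v_st1 - v_st0)
)
```

Because `v_st1` and `v_st0` are plain numbers, the right-hand side is ordinary arithmetic, and the dictionary is written to in exactly one place.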
In Algorithm 1, lines 23 and 24, both board and previousState are TicTacToe objects.

Algorithm 1 This method executes a single tic-tac-toe game and updates the state value table after every move played by the RL agent.

 1: procedure episode(board, opponent, session)
 2:
 3:     result ← True
 4:     turn ← 0
 5:     previousState ← CopyBoard()
 6:
 7:     while not board.isGameOver() and result do
 8:         if turn > 1 then
 9:             turn ← 0
10:         end if
11:
12:         agentMoved ← False
13:
14:         if turn is 0 and session.agentFirst or turn is 1 and not session.agentFirst then
15:             result ← makeTrainingMove(board, session.epsilon)
16:             agentMoved ← True
17:         else
18:             result ← opponent.makeMove(board)
19:         end if
20:
21:         if agentMoved then
22:             reward ← getReward(board)
23:             V(S_{t+1}) ← GetValueOf(board)
24:             V(S_t) ← GetValueOf(previousState)
25:             V(S_t) ← V(S_t) + session.learningRate × (reward + (session.discountRate × V(S_{t+1})) − V(S_t))
26:             previousState ← CopyBoard()
27:
28:         end if
29:
30:         turn ← turn + 1
31:     end while
32:
33:     reward ← GetReward(board)
34:     V(S_{t+1}) ← GetValueOf(board)
35:     V(S_{t+1}) ← V(S_{t+1}) + session.learningRate × reward
36: end procedure
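Algorithm 1 might translate into Python roughly as shown here. This is a hedged sketch under several assumptions that are not in the handout: the helper methods are gathered on a hypothetical `agent` object, `setValueOf` is an invented name for writing back to the dictionary, `copy.deepcopy` stands in for CopyBoard(), and the turn counter is assumed to reset once it exceeds 1. The `Stub*` classes exist only so that the sketch runs on its own.

```python
import copy


def episode(board, opponent, session, agent):
    """Play one game and update the agent's state values after each of its moves.

    `agent` is a hypothetical object bundling makeTrainingMove, getReward,
    getValueOf and setValueOf; the assignment may place these elsewhere.
    """
    result = True
    turn = 0
    previousState = copy.deepcopy(board)  # assumed equivalent of CopyBoard()

    while not board.isGameOver() and result:
        if turn > 1:  # assumed comparison: wrap the counter back to 0
            turn = 0

        agentMoved = False
        if (turn == 0 and session.agentFirst) or (turn == 1 and not session.agentFirst):
            result = agent.makeTrainingMove(board, session.epsilon)
            agentMoved = True
        else:
            result = opponent.makeMove(board)

        if agentMoved:
            reward = agent.getReward(board)
            v_st1 = agent.getValueOf(board)          # lookup (line 23)
            v_st0 = agent.getValueOf(previousState)  # lookup (line 24)
            agent.setValueOf(                        # the only write (line 25)
                previousState,
                v_st0 + session.learningRate * (reward + session.discountRate * v_st1 - v_st0),
            )
            previousState = copy.deepcopy(board)

        turn += 1

    # Final update for the state in which the game ended (lines 33 to 35).
    reward = agent.getReward(board)
    v_st1 = agent.getValueOf(board)
    agent.setValueOf(board, v_st1 + session.learningRate * reward)


# Stand-ins below exist only to make the sketch runnable.
class StubBoard:
    def __init__(self):
        self.moves = 0

    def isGameOver(self):
        return self.moves >= 3  # pretend the game ends after three moves


class StubOpponent:
    def makeMove(self, board):
        board.moves += 1
        return True


class StubSession:
    agentFirst = True
    epsilon = 0.1
    learningRate = 0.5
    discountRate = 1.0


class StubAgent:
    def __init__(self):
        self.valueFunction = {}

    def makeTrainingMove(self, board, epsilon):
        board.moves += 1
        return True

    def getReward(self, board):
        return 1.0 if board.isGameOver() else 0.0

    def getValueOf(self, board):
        # Uses the move count as the key; a real key would encode the board.
        return self.valueFunction.setdefault(board.moves, 0.0)

    def setValueOf(self, board, value):
        self.valueFunction[board.moves] = value


agent = StubAgent()
episode(StubBoard(), StubOpponent(), StubSession(), agent)
```

Copying the board after each agent move matters: if previousState were just another reference to board, both lookups would return the value of the same state.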
