CSE 525 Programming Assignment 1
Due March 20th, 11:59:59

The goal of this assignment is to implement the following three RL algorithms:

● Monte Carlo (with function approximation)
● Fitted Q iteration
● DQN

You will use one MuJoCo environment (InvertedPendulumMuJoCoEnv-v0) and one Atari environment (Pong-v0), and compare the RL algorithms on them. Feel free to use any of the extensions/tricks we discussed in class for reliable learning. As the behavior policy for the off-policy RL methods, use epsilon-greedy.

What you need to submit:

(1) A notebook file that contains your network definitions, training processes, evaluation results, and the necessary comments on your code.
(2) A report that contains the core code of your algorithms and network designs, an analysis of your results, and a comparison between the algorithms.

Prerequisites

For this assignment, we recommend using Colab, OpenAI Gym, OpenAI Gym[Atari], PyBullet, and PyBulletGym (an OpenAI Gym[MuJoCo] implementation based on PyBullet). Before getting started, please make sure all of the required dependencies run smoothly.

The aforementioned packages provide simulated environments that interact with your agents, supplying observations, rewards, and other important information at each step. We picked one discrete environment from Atari, Pong-v0, and one continuous environment from MuJoCo, InvertedPendulumMuJoCoEnv-v0. Note that the actions in Pong are discrete while the actions in InvertedPendulum are continuous. The three algorithms above cannot handle continuous actions directly, so you will need to discretize the action space of the InvertedPendulum environment first.

For the Atari Pong environment, we encourage you to preprocess the image input to make it easier for the network to learn.

The sketches below illustrate one way to set up the dependencies, implement the epsilon-greedy behavior policy, discretize the InvertedPendulum actions, preprocess Pong frames, and evaluate a trained policy. Treat them as starting points, not reference solutions.
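If you work in Colab, a dependency setup along these lines has generally worked, though Gym and Atari packaging has changed across versions, so treat this as a sketch and pin versions if something breaks. The pybullet-gym repository URL below is its public GitHub home; verify it before relying on it.

```python
# Colab shell commands; versions are unpinned here on purpose.
!pip install gym "gym[atari]" pybullet

# PyBulletGym is installed from source (repository URL assumed from its
# public GitHub home, benelot/pybullet-gym -- verify before use).
!git clone https://github.com/benelot/pybullet-gym.git
!pip install -e pybullet-gym

import gym
import pybulletgym  # noqa: F401 -- importing registers the *MuJoCoEnv-v0 ids

env = gym.make('InvertedPendulumMuJoCoEnv-v0')
print(env.observation_space, env.action_space)
```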
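All three methods here are trained off-policy and can share the same behavior policy. Below is a minimal epsilon-greedy sketch over a discrete action set; the q_values vector (one Q estimate per action) is an assumption about your own network's output. Decaying epsilon from a high initial value toward a small floor over training is a common companion trick.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducible exploration

def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon, else the greedy one.

    q_values: 1-D array of Q(s, a) estimates for the current state, one
    entry per discrete action (an assumption about your network's output).
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore uniformly
    return int(np.argmax(q_values))              # exploit the current estimate
```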
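InvertedPendulum's action space is a one-dimensional continuous Box, so one simple discretization is a gym.ActionWrapper that exposes n evenly spaced actions between the Box bounds. The bin count below is an arbitrary placeholder you should tune.

```python
import gym
import numpy as np

class DiscretizedActions(gym.ActionWrapper):
    """Expose an n-bin Discrete action space over a 1-D Box action space."""

    def __init__(self, env, n_bins=7):
        super().__init__(env)
        low, high = env.action_space.low[0], env.action_space.high[0]
        self._grid = np.linspace(low, high, n_bins)  # evenly spaced actions
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, index):
        # Map the agent's discrete choice back to a continuous action.
        return np.array([self._grid[index]], dtype=np.float32)

# Usage: env = DiscretizedActions(gym.make('InvertedPendulumMuJoCoEnv-v0'))
```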
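For Pong, a common preprocessing recipe crops away the score bar, downsamples by a factor of two, keeps a single color channel, and binarizes the result, shrinking the 210x160x3 frame to 80x80. The crop bounds and background color codes below come from that widely used recipe and are assumptions to verify against frames from your own environment version. Stacking or differencing consecutive preprocessed frames, so the network can infer the ball's velocity, is also worth considering.

```python
import numpy as np

def preprocess_pong(frame):
    """Reduce a 210x160x3 uint8 Pong frame to an 80x80 float32 array."""
    img = frame[35:195]                        # crop score bar and borders
    img = img[::2, ::2, 0].astype(np.float32)  # downsample; single channel
    img[img == 144] = 0                        # erase background (color 1)
    img[img == 109] = 0                        # erase background (color 2)
    img[img != 0] = 1.0                        # paddles and ball become 1
    return img
```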
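Rubric item 3 below asks for cumulative-reward-by-episode plots and the average return of your final policy over ten evaluation runs. Here is a minimal sketch of both, assuming the old Gym step API (obs, reward, done, info) that Pong-v0 and PyBulletGym use, and a `policy` callable mapping an observation to an action (a stand-in for your trained network's greedy policy):

```python
import matplotlib.pyplot as plt

def average_return(env, policy, n_episodes=10):
    """Average undiscounted return of `policy` over n_episodes rollouts."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

def plot_returns(episode_returns, title):
    """Plot cumulative reward per training episode."""
    plt.plot(episode_returns)
    plt.xlabel('Training episode')
    plt.ylabel('Cumulative reward')
    plt.title(title)
    plt.show()
```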
Rubrics

1) Network design for the two environments. (20 points in total, 10 points each)
2) Training processes for the three algorithms; there should be 6 training processes in total for the 2 environments and 3 algorithms. (30 points in total, 5 points each; you should provide a decent amount of comments to explain your code.)
3) Evaluation results for your 6 training runs. These should include plots of cumulative reward by training episode, the average return over ten runs of your final policy, and any other plots you find helpful for explaining your design's performance. (30 points in total, 5 points each.)
4) Analysis of the performance of the three algorithms in each environment: analyze your plots and numbers for each algorithm, and compare the three algorithms within each environment. (15 points in total)
5) Comparison between an epsilon-greedy and a random behavior policy. For this experiment, use InvertedPendulum as your environment and fitted Q iteration as your RL algorithm. Give plots of cumulative reward by episode and the average return over test runs of your learned policy, and analyze the performance of the two behavior policies. (5 points in total)

To start with:

We prepared simple starter code to show you what to implement and where to put your analysis. You don't have to follow its format strictly; write your code in whatever way you are comfortable with.

Before turning in:

1. Check your notebook file: make sure that when the instructors "Restart and run all", no errors occur. Also make sure the format of your report is correct.
2. Rename your notebook file like firstname_lastname_SBUID.ipynb and your report like firstname_lastname_SBUID_report.pdf. Zip these two files into an archive named like firstname_lastname_SBUID.zip and upload it to Blackboard.

After turning in:

1. Any format errors or fail-to-run errors may result in a penalty.
2. Late submissions may result in a penalty: 10% per day, 50% max.
