Southern New Hampshire University Module 5 REINFORCE Algorithm and Actor-Critic Models Paper

Description

In Module Five, you learned about Q-learning and its variants. This module introduces you to advanced reinforcement learning concepts such as actor-critic methods. Using the cartpole problem you solved in Module Five, you will explore how the same problem could be solved using policy-based approaches and actor-critic methods. Then you will write a short paper discussing your findings. This will help you gain a deeper understanding of advanced reinforcement learning concepts and how they can be applied to computing problems.

Prompt

In the Module Five assignment, you experimented with the cartpole problem using a Deep Q-Network (DQN), which is a value-based approach. Policy-based approaches are an alternative that is often better suited to environments with continuous action spaces. Read through one such approach here: Deriving Policy Gradients and Implementing REINFORCE. Actor-critic models are another variation of reinforcement learning that combines elements of both value- and policy-based approaches. One such approach, A2C, is described here: Understanding Actor Critic Methods and A2C. Be sure to pay special attention to the code included in both of these readings.
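
To make the policy-based idea concrete, the sketch below shows the two quantities at the core of REINFORCE: the discounted return for each step of an episode and the loss that weights each action's log-probability by that return. It is a minimal illustration, not the code from the readings. The function names, the three-step episode, and the fixed action probability of 0.5 are assumptions chosen for readability, and a real solution would obtain the log-probabilities from a neural network policy.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """REINFORCE surrogate loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Hypothetical episode: cartpole gives a reward of 1 per surviving step.
rewards = [1.0, 1.0, 1.0]
# Assumed log-probabilities of the actions taken (a policy network
# would normally produce these).
log_probs = [math.log(0.5)] * 3

returns = discounted_returns(rewards, gamma=0.5)
loss = reinforce_loss(log_probs, returns)
print(returns)
print(round(loss, 4))
```

Minimizing this loss with gradient descent raises the probability of actions that were followed by high returns, which is the sense in which REINFORCE optimizes the policy directly rather than learning action values first.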

Note: As with some of your previous readings, there are some complicated math equations referenced in these articles. Understanding the details of the equations is not essential. Instead, pay special attention to the included code. Although not required for the purposes of this assignment, you are encouraged to try out the code for these solutions in a Jupyter Notebook in your Virtual Lab (Apporto).

After reviewing the articles, you must write a short paper explaining your findings. You must include references and in-text citations using APA format. Specifically, in your paper you must address each of the following:

  • Explain how the cartpole problem can be solved using the REINFORCE algorithm. Consider using pseudocode, UML, diagrams, or flowcharts to help illustrate your solution.
  • Explain how the cartpole problem can be solved using the A2C algorithm. Consider using pseudocode, UML, diagrams, or flowcharts to help illustrate your solution.
  • Explain how policy gradient approaches differ from value-based approaches, such as Q-learning.
  • Explain how actor-critic approaches differ from value- and policy-based approaches.
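
As a starting point for the comparison questions above, the following sketch contrasts the learning signal used by a value-based method with the one used by an actor-critic method. It is a simplified illustration under stated assumptions: the function names and the sample numbers are hypothetical, the advantage uses a one-step estimate, and the articles' implementations may differ in detail.

```python
import math


def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """Value-based target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def a2c_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def a2c_losses(log_prob, advantage):
    """Return (actor_loss, critic_loss) for a single transition.

    The actor term is a policy-gradient loss weighted by the advantage
    (treated as a constant for the actor); the critic term is the
    squared error of the value estimate.
    """
    actor_loss = -log_prob * advantage
    critic_loss = advantage ** 2
    return actor_loss, critic_loss


# Hypothetical numbers for one cartpole transition.
target = q_learning_target(reward=1.0, next_q_values=[2.0, 3.0], gamma=0.5)
advantage = a2c_advantage(reward=1.0, value=2.0, next_value=3.0, gamma=0.5)
actor_loss, critic_loss = a2c_losses(math.log(0.5), advantage)
print(target, advantage, round(actor_loss, 4), critic_loss)
```

In this sketch, the Q-learning target updates action values and the policy is implied by choosing the highest-valued action, while the actor-critic version keeps an explicit policy (the actor) and uses a learned state value (the critic) only to judge whether an action turned out better or worse than expected.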