Note that the symbols used in the pseudocode below have the following meanings:
- MDP: Markov Decision Process;
- V(s): Value function, the expected return when starting from state s;
- π(s): Policy, the action the agent takes in state s; a policy can be either stochastic or deterministic;
- R(s,a): Immediate reward for taking action a in state s;
- P(s'|s,a): Transition probability from state s to state s' under action a;
- γ: Discount factor for future reward.
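
To make the symbols concrete, here is a minimal Python sketch that stores them as plain lists and computes the one-step lookahead R(s, a) + γ * Σ P(s' | s, a) * V(s') for a single state. The two-state MDP, the value estimates in `V`, and the names `q_value`, `STATES`, and `ACTIONS` are hypothetical and chosen only for illustration.

```python
# Hypothetical two-state, two-action MDP (illustration only).
GAMMA = 0.9          # γ: discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# R[s][a]: immediate reward for taking action a in state s.
R = [[0.0, 1.0],
     [2.0, 0.0]]

# P[s][a][s2]: probability of landing in s2 after taking a in s.
# This example is deterministic: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]


def q_value(s, a, V):
    """One-step lookahead: R(s, a) + γ * Σ_{s'} P(s' | s, a) * V(s')."""
    return R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in STATES)


V = [0.0, 10.0]      # some current value estimate (arbitrary numbers)
pi = [1, 0]          # deterministic policy: π(0) = 1, π(1) = 0

backup = q_value(0, pi[0], V)
print(backup)
```

Here `pi` is a deterministic policy, so π(s) is a single action per state; a stochastic policy would instead store a probability for each action.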
Value iteration:

    function ValueIteration(MDP):
        // MDP is a Markov Decision Process
        V(s) = 0 for all states s  // Initialization
        repeat until convergence:
            delta = 0
            for each state s:
                v = V(s)
                V(s) = max over all actions a of [ R(s, a) + γ * Σ P(s' | s, a) * V(s') ]
                delta = max(delta, |v - V(s)|)
        return V  // Optimal value function

    function ExtractOptimalPolicy(MDP, V):
        // MDP is a Markov Decision Process, V is the optimal value function
        for each state s:
            π(s) = argmax over all actions a of [ R(s, a) + γ * Σ P(s' | s, a) * V(s') ]
        return π  // Optimal policy

Policy iteration:

    function PolicyIteration(MDP):
        // MDP is a Markov Decision Process
        Initialize a policy π arbitrarily
        repeat until policy converges:
            // Policy Evaluation
            V = EvaluatePolicy(MDP, π)
            // Policy Improvement
            π' = GreedyPolicyImprovement(MDP, V)
            if π' = π:
                break  // Policy has converged
            π = π'
        return π  // Optimal policy

    function EvaluatePolicy(MDP, π):
        // MDP is a Markov Decision Process, π is a policy
        V(s) = 0 for all states s  // Initialization
        repeat until convergence:
            delta = 0
            for each state s:
                v = V(s)
                V(s) = Σ P(s' | s, π(s)) * [ R(s, π(s)) + γ * V(s') ]
                delta = max(delta, |v - V(s)|)
        return V  // Value function under the given policy

    function GreedyPolicyImprovement(MDP, V):
        // MDP is a Markov Decision Process, V is a value function
        for each state s:
            π(s) = argmax over all actions a of [ R(s, a) + γ * Σ P(s' | s, a) * V(s') ]
        return π  // Improved policy
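
As a rough illustration, the two procedures can be sketched in plain Python as follows. This is one possible implementation under stated assumptions, not a reference one: the two-state MDP is hypothetical, the threshold `THETA` is an assumption (the pseudocode only says "until convergence"), and the function names are illustrative.

```python
GAMMA = 0.9      # γ: discount factor
THETA = 1e-10    # assumed convergence threshold on delta
STATES = [0, 1]
ACTIONS = [0, 1]

# Hypothetical deterministic MDP: action 0 stays put, action 1 switches state.
R = [[0.0, 1.0],                      # R[s][a]
     [2.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],        # P[s][a][s2]
     [[0.0, 1.0], [1.0, 0.0]]]


def lookahead(s, a, V):
    """R(s, a) + γ * Σ_{s'} P(s' | s, a) * V(s')."""
    return R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in STATES)


def greedy_policy(V):
    """For each state, pick the action with the largest lookahead value."""
    return [max(ACTIONS, key=lambda a: lookahead(s, a, V)) for s in STATES]


def value_iteration():
    V = [0.0 for _ in STATES]
    while True:
        delta = 0.0
        for s in STATES:
            v = V[s]
            V[s] = max(lookahead(s, a, V) for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
        if delta < THETA:
            return V


def evaluate_policy(pi):
    """Iteratively compute V under the fixed deterministic policy pi."""
    V = [0.0 for _ in STATES]
    while True:
        delta = 0.0
        for s in STATES:
            v = V[s]
            V[s] = lookahead(s, pi[s], V)
            delta = max(delta, abs(v - V[s]))
        if delta < THETA:
            return V


def policy_iteration():
    pi = [ACTIONS[0] for _ in STATES]   # arbitrary initial policy
    while True:
        V = evaluate_policy(pi)
        new_pi = greedy_policy(V)
        if new_pi == pi:                # policy has converged
            return pi, V
        pi = new_pi


V_star = value_iteration()
pi_vi = greedy_policy(V_star)           # policy extracted from V*
pi_pi, V_pi = policy_iteration()
print(pi_vi, pi_pi)
```

On this small example both methods should arrive at the same policy. Because P(s' | s, a) sums to 1 over s', the evaluation update Σ P(s' | s, π(s)) * [ R(s, π(s)) + γ * V(s') ] equals R(s, π(s)) + γ * Σ P(s' | s, π(s)) * V(s'), which is why one `lookahead` helper serves both algorithms.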
See also the slides from Shiyu Zhao's course [1].


References:
[1] https://www.bilibili.com/video/BV1sd4y167NS
[2] https://chat.openai.com/