
Set

State: the set of states S

Action: the set of actions A(s) associated with state s ∈ S.

Reward: the set of rewards R(s, a).
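As a concrete illustration, the three sets of a hypothetical two-state toy MDP could be written down in Python as follows (the state, action, and reward values are made-up examples, not from these notes):

```python
# Minimal sketch of the three sets for a hypothetical two-state toy MDP.
# All names and values below are illustrative assumptions.

# S: the set of states
S = {"s1", "s2"}

# A(s): the set of actions available in state s
A = {
    "s1": {"stay", "move"},
    "s2": {"stay", "move"},
}

# R(s, a): the set of rewards reachable by taking action a in state s
R = {
    ("s1", "stay"): {0.0},
    ("s1", "move"): {0.0, 1.0},
    ("s2", "stay"): {1.0},
    ("s2", "move"): {0.0},
}
```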

Probability distribution

State transition probability

At state s, taking action a, the probability of transitioning from s to state s' is $p(s' \mid s, a)$.

Reward probability

At state s, taking action a, the probability of obtaining reward r is $p(r \mid s, a)$.
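Continuing the hypothetical toy MDP above, both conditional distributions can be stored as tables whose entries sum to 1 over the possible outcomes, and sampled once per environment step (the probability values are made up for illustration):

```python
import random

# p_transition[(s, a)][s'] = p(s' | s, a); each inner dict sums to 1.
p_transition = {
    ("s1", "move"): {"s2": 0.9, "s1": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}

# p_reward[(s, a)][r] = p(r | s, a); each inner dict also sums to 1.
p_reward = {
    ("s1", "move"): {1.0: 0.8, 0.0: 0.2},
    ("s1", "stay"): {0.0: 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs, k=1)[0]

# One environment step from (s1, move): sample the next state and the reward.
s_next = sample(p_transition[("s1", "move")])
r = sample(p_reward[("s1", "move")])
print(s_next, r)
```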

Policy

At state s, the probability of choosing action a is $\pi(a \mid s)$.
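A stochastic policy can likewise be stored as one distribution over A(s) per state and sampled whenever the agent acts; the probabilities below are illustrative assumptions, not from these notes:

```python
import random

# Hypothetical stochastic policy pi(a | s): for each state s,
# a distribution over the actions in A(s) that sums to 1.
pi = {
    "s1": {"move": 0.7, "stay": 0.3},
    "s2": {"move": 0.2, "stay": 0.8},
}

def choose_action(s):
    """Sample an action a with probability pi(a | s)."""
    actions, probs = zip(*pi[s].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(choose_action("s1"))
```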

Markov property (memoryless property)
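The memoryless property means that the next state and the next reward depend only on the current state and action, not on the earlier history. In the notation above this is usually written as:

$$
p(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)
$$

$$
p(r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(r_{t+1} \mid s_t, a_t)
$$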
