
Set

State: the set of states S

Action: the set of actions A(s) associated with state s ∈ S.

Reward: the set of rewards R(s, a).
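As a concrete illustration, the three sets of a hypothetical two-state toy MDP could be written down in Python as follows (the state, action, and reward values are made-up examples, not from these notes):

```python
# Minimal sketch of the three sets for a hypothetical two-state toy MDP.
# All names and values below are illustrative assumptions.

# S: the set of states
S = {"s1", "s2"}

# A(s): the set of actions available in state s
A = {
    "s1": {"stay", "move"},
    "s2": {"stay", "move"},
}

# R(s, a): the set of rewards reachable by taking action a in state s
R = {
    ("s1", "stay"): {0.0},
    ("s1", "move"): {0.0, 1.0},
    ("s2", "stay"): {1.0},
    ("s2", "move"): {0.0},
}
```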

Probability distribution

State transition probability

At state s, taking action a, the probability of transitioning from s to state s' is $p(s' \mid s, a)$.

Reward probability

At state s, taking action a, the probability of obtaining reward r is $p(r \mid s, a)$.
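Continuing the hypothetical toy MDP above, both conditional distributions can be stored as tables whose entries sum to 1 over the possible outcomes, and sampled once per environment step (the probability values are made up for illustration):

```python
import random

# p_transition[(s, a)][s'] = p(s' | s, a); each inner dict sums to 1.
p_transition = {
    ("s1", "move"): {"s2": 0.9, "s1": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}

# p_reward[(s, a)][r] = p(r | s, a); each inner dict also sums to 1.
p_reward = {
    ("s1", "move"): {1.0: 0.8, 0.0: 0.2},
    ("s1", "stay"): {0.0: 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs, k=1)[0]

# One environment step from (s1, move): sample the next state and the reward.
s_next = sample(p_transition[("s1", "move")])
r = sample(p_reward[("s1", "move")])
print(s_next, r)
```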

Policy

At state s, the probability of choosing action a is $\pi(a \mid s)$.
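A stochastic policy can likewise be stored as one distribution over A(s) per state and sampled whenever the agent acts; the probabilities below are illustrative assumptions, not from these notes:

```python
import random

# Hypothetical stochastic policy pi(a | s): for each state s,
# a distribution over the actions in A(s) that sums to 1.
pi = {
    "s1": {"move": 0.7, "stay": 0.3},
    "s2": {"move": 0.2, "stay": 0.8},
}

def choose_action(s):
    """Sample an action a with probability pi(a | s)."""
    actions, probs = zip(*pi[s].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(choose_action("s1"))
```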

Markov property (memoryless property)
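The memoryless property means that the next state and the next reward depend only on the current state and action, not on the earlier history. In the notation above this is usually written as:

$$
p(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)
$$

$$
p(r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(r_{t+1} \mid s_t, a_t)
$$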
