Real World Examples of Reinforcement Learning [Part 2]

Chess Example:
A master chess player makes a move informed by:
- Planning: anticipating possible replies and counter-replies.
- Making immediate, intuitive judgments about the desirability of specific positions and moves.
Adaptive Controller Example:
- An adaptive controller adjusts parameters of a petroleum refinery’s operations in real time.
- It optimizes the yield/cost/quality trade-off based on specified marginal costs, rather than strictly following the original set points suggested by engineers.
Gazelle Example:
- A gazelle calf struggles to stand minutes after birth.
- Within half an hour, it is capable of running at 20 miles per hour.
Mobile Robot Example:
A mobile robot decides whether to:
- Enter a new room to collect more trash, or
- Return to its recharging station.
It bases this decision on:
- The current battery level.
- Past experiences of how quickly and easily it found the charger.
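
The robot's choice can be sketched as a simple rule. This is only an illustration: the function name `choose_action`, the `safety_margin` parameter, and the idea of measuring past trips to the charger as a fraction of battery used are all assumptions, not part of the original example.

```python
def choose_action(battery_level, past_search_costs, safety_margin=0.1):
    """Pick 'recharge' or 'collect_trash' (hypothetical decision rule).

    battery_level: fraction of charge remaining, between 0 and 1.
    past_search_costs: battery fractions that earlier trips to the
        charger consumed (the robot's past experience).
    safety_margin: extra reserve kept on top of the expected cost (assumed).
    """
    if past_search_costs:
        # Average cost of finding the charger, learned from experience.
        expected_cost = sum(past_search_costs) / len(past_search_costs)
    else:
        # No experience yet: assume a pessimistic cost (assumption).
        expected_cost = 0.5

    # Head home if the remaining charge barely covers the expected trip.
    if battery_level <= expected_cost + safety_margin:
        return "recharge"
    return "collect_trash"


print(choose_action(0.9, [0.1, 0.2]))
print(choose_action(0.2, [0.1, 0.2]))
```

A robot that has usually found the charger quickly can afford to keep collecting trash at lower battery levels than one whose past searches were long.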
Phil’s Breakfast Example:
Phil prepares breakfast, a seemingly simple task that involves a complex network of conditional behaviors and interlocking goal-subgoal relationships:
- Walking to the cupboard, opening it, selecting a cereal box, then reaching for and retrieving the box.
- Similar coordinated, interactive sequences of behavior are needed to obtain a bowl, spoon, and milk jug.
Each step involves:
- Eye movements to obtain information.
- Rapid judgments on how to carry or prioritize objects.
Phil’s actions are guided by goals:
- Immediate goals, such as grasping a spoon or reaching the refrigerator.
- Broader goals that these serve, such as eating with the spoon once the cereal is ready and, ultimately, obtaining nourishment.
Whether or not he is aware of it, Phil is constantly accessing information about the state of his body, which determines his nutritional needs, hunger level, and food preferences.

Interaction Between Agent and Environment:
- All examples involve an active decision-making agent interacting with its environment.
- The agent aims to achieve a goal despite uncertainty about the environment.
- The agent’s actions can influence the future state of the environment (e.g., the next chess position, the levels of the refinery’s reservoirs, the robot’s next location).
- These actions affect the options and opportunities available to the agent later on.
- Correct decision-making requires considering indirect, delayed consequences of actions, often necessitating foresight or planning.
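
The interaction described above can be sketched as a loop in which the agent picks an action, the environment changes state, and a reward may arrive only later. Everything here is a toy assumption made for illustration: the `CorridorEnvironment` class, the `run_episode` helper, and the `always_right` policy are hypothetical and not taken from any library.

```python
class CorridorEnvironment:
    """Hypothetical toy environment: positions 0..length-1, goal at the end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def step(self, action):
        """Apply an action (-1 or +1) and return (new_state, reward, done)."""
        # The action changes the state, which changes later options.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Reward is delayed: nothing until the goal is finally reached.
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy interact with env; return (total_reward, visited_states)."""
    state = env.position
    total_reward = 0.0
    history = [state]
    for _ in range(max_steps):
        action = policy(state)                  # agent decides
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        history.append(state)
        if done:
            break
    return total_reward, history


def always_right(state):
    """Trivial policy (assumption): always move toward the goal."""
    return +1


total, visited = run_episode(CorridorEnvironment(), always_right)
print(total, visited)
```

The first three moves earn no reward on their own. They matter only because they make the rewarding final step possible, which is the kind of delayed consequence the agent has to account for.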
Monitoring and Reacting to the Environment:
- The effects of actions cannot always be fully predicted.
- The agent must monitor the environment frequently and react appropriately.
- Example: Phil must watch the milk as he pours it into the cereal bowl to avoid overflow.
- Goals in these examples are explicit, meaning the agent can judge progress toward its goal based on what it can sense directly (e.g., the chess player knows whether they have won, the refinery controller knows how much petroleum is being produced, and the mobile robot knows when its batteries run down).
Broad Definition of Agent and Environment:
- An agent isn’t necessarily an entire robot or organism, and the environment isn’t limited to what’s external to the agent.
- Example: The robot’s battery is part of the environment of its controlling agent, and Phil’s hunger and food preferences are part of the environment of his internal decision-making agent.
- The state of an agent’s environment may include information about the machine or organism in which the agent resides, including memories and aspirations.
Learning from Experience:
- In all examples, the agent can use experience to improve performance over time.
- Example: The chess player refines intuition to evaluate positions better, the gazelle calf increases its running efficiency, and Phil streamlines his breakfast-making process.
- The agent’s initial knowledge (from prior experience or design) influences what is easy or useful to learn.
- However, interaction with the environment is essential for adjusting behavior to exploit specific features of the task effectively.
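
One simple way to picture learning from experience is a running average that starts from a prior guess and is corrected by each new observation. The sketch below is an assumption-laden illustration, not the method behind any of the examples: the class name `RunningEstimate` and its `initial` and `prior_count` parameters are hypothetical.

```python
class RunningEstimate:
    """Running average that blends a prior guess with observed outcomes."""

    def __init__(self, initial=0.0, prior_count=0):
        # 'initial' stands in for the agent's starting knowledge;
        # 'prior_count' says how many observations that guess is worth.
        self.value = initial
        self.count = prior_count

    def update(self, observation):
        """Move the estimate toward the new observation and return it."""
        self.count += 1
        # Incremental mean: new = old + (observation - old) / count
        self.value += (observation - self.value) / self.count
        return self.value


# The robot starts out believing a trip to the charger costs 50% battery,
# then observes a much cheaper trip and revises its belief.
cost = RunningEstimate(initial=0.5, prior_count=1)
cost.update(0.1)
print(cost.value)
```

A larger `prior_count` makes the starting knowledge harder to overturn, while `prior_count=0` lets the first real experience replace it entirely.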

By Ashis
