
DDPG

python_motion_planning.local_planner.ddpg.DDPG

Bases: LocalPlanner

Class for Deep Deterministic Policy Gradient (DDPG) motion planning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| start | tuple | start point coordinate | required |
| goal | tuple | goal point coordinate | required |
| env | Env | environment | required |
| heuristic_type | str | heuristic function type | 'euclidean' |
| hidden_depth | int | number of hidden layers of the neural network | 3 |
| hidden_width | int | number of neurons in each hidden layer of the neural network | 512 |
| batch_size | int | batch size used to optimize the neural networks | 2000 |
| buffer_size | int | maximum replay buffer size | 1000000.0 |
| gamma | float | discount factor | 0.999 |
| tau | float | soft-update coefficient for the target networks | 0.001 |
| lr | float | learning rate | 0.0001 |
| train_noise | float | action noise coefficient during training, for exploration | 0.1 |
| random_episodes | int | number of initial episodes with random actions, for better exploration | 50 |
| max_episode_steps | int | maximum steps per episode | 200 |
| update_freq | int | number of network updates performed at each update step | 1 |
| update_steps | int | update the network every 'update_steps' steps | 1 |
| evaluate_freq | int | number of evaluations used to calculate the average reward | 50 |
| evaluate_episodes | int | evaluate the network every 'evaluate_episodes' episodes | 50 |
| actor_save_path | str | save path of the trained actor network | 'models/actor_best.pth' |
| critic_save_path | str | save path of the trained critic network | 'models/critic_best.pth' |
| actor_load_path | str | load path of the trained actor network | None |
| critic_load_path | str | load path of the trained critic network | None |
| **params | | other parameters, see the parent class LocalPlanner | {} |

Examples:

Import the necessary dependencies

Python Console Session
>>> from python_motion_planning.utils import Grid
>>> from python_motion_planning.local_planner import DDPG

Train the model and save the trained model

Training applies only to learning-based planners such as DDPG. Training the model takes a long time, so please be patient. For faster training, try reducing num_episodes and batch_size, increasing update_steps and evaluate_episodes, or fine-tuning other hyperparameters if you are familiar with them; this usually comes at a cost in performance, however (a sample faster configuration is sketched after the example below).

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            actor_save_path="models/actor_best.pth", critic_save_path="models/critic_best.pth")
>>> plt.train(num_episodes=10000)
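
For instance, a quicker (and usually weaker) training run could be configured as below; the specific values are illustrative assumptions, not recommended settings.

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            batch_size=512, update_steps=4, evaluate_episodes=100,
...            actor_save_path="models/actor_best.pth", critic_save_path="models/critic_best.pth")
>>> plt.train(num_episodes=2000)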

Load the trained model and run it

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            actor_load_path="models/actor_best.pth", critic_load_path="models/critic_best.pth")
>>> plt.run()

References

[1] Continuous control with deep reinforcement learning

evaluate_policy()

Evaluate the policy and calculate the average reward.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| evaluate_reward | float | average reward of the policy |

optimize_model()

Optimize the neural networks when training.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| actor_loss | float | actor loss |
| critic_loss | float | critic loss |
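
As a rough illustration of what such an optimization step computes, the sketch below follows the standard DDPG update from [1]: the critic is regressed onto the TD target built from the target networks, the actor is updated to maximize the critic's value, and both target networks are softly updated with coefficient tau. It is a minimal sketch with assumed names (actor, critic, their target copies, and a sampled batch), not the library's actual optimize_model implementation.

Python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_target, critic_target,
                actor_optim, critic_optim,
                s, a, r, s_next, done, gamma=0.999, tau=0.001):
    # Critic: minimize the mean squared TD error against the target networks.
    with torch.no_grad():
        q_target = r + gamma * (1.0 - done) * critic_target(s_next, actor_target(s_next))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_optim.zero_grad()
    critic_loss.backward()
    critic_optim.step()

    # Actor: maximize Q(s, actor(s)) by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    actor_optim.zero_grad()
    actor_loss.backward()
    actor_optim.step()

    # Softly update the target networks (tau matches the constructor parameter).
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)

    return actor_loss.item(), critic_loss.item()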

plan()

Deep Deterministic Policy Gradient (DDPG) motion plan function.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| flag | bool | True if planning succeeded, else False |
| pose_list | list | history poses of the robot |
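
If the two documented return values come back as a tuple (an assumption about the call convention inferred from the table above, not confirmed by the source), a trained or loaded planner can be queried directly:

Python Console Session
>>> flag, pose_list = plt.plan()
>>> print(flag, len(pose_list))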

reset(random_sg=False)

Reset the environment and the robot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| random_sg | bool | whether to generate a random start and goal | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| state | Tensor | initial state of the robot |

reward(state, win, lose)

The state reward function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | Tensor | current state of the robot | required |
| win | bool | whether the episode is won (reached the goal) | required |
| lose | bool | whether the episode is lost (collided) | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| reward | float | reward for the current state |
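
For intuition, a reward function with this win/lose signature typically combines a terminal bonus or penalty with a shaping term. The sketch below is only an assumed illustration; the constants, the distance argument, and the shaping term are not taken from the library.

Python
def reward_sketch(dist_to_goal, win, lose, goal_bonus=10.0, collision_penalty=-10.0):
    # Terminal outcomes: bonus for reaching the goal, penalty for a collision.
    if win:
        return goal_bonus
    if lose:
        return collision_penalty
    # Otherwise, encourage progress by penalizing the remaining distance to the goal.
    return -0.1 * dist_to_goal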

run()

Run both planning and animation.

select_action(s)

Select the action from the actor network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| s | Tensor | current state | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| a | Tensor | selected action |
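
Conceptually, the actor maps a state to a deterministic action, and during training exploration noise scaled by train_noise is added before the action is clipped to a valid range. This is a sketch consistent with the train_noise parameter, not the library's exact code, and the clipping range is an assumption.

Python
import torch

def select_action_sketch(actor, s, training=True, train_noise=0.1, a_min=-1.0, a_max=1.0):
    # Deterministic actor output; no gradients are needed for action selection.
    with torch.no_grad():
        a = actor(s)
    if training:
        # Gaussian exploration noise scaled by the train_noise coefficient.
        a = a + train_noise * torch.randn_like(a)
    return a.clamp(a_min, a_max)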

step(state, action)

Take a step in the environment.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | Tensor | current state of the robot | required |
| action | Tensor | action to take | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| next_state | Tensor | next state of the robot |
| reward | float | reward for taking the action |
| done | bool | whether the episode is done |
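
The documented reset, select_action, and step signatures compose into a standard rollout loop. The sketch below only shows how the pieces fit together; the loop itself and the replay-buffer comment are assumptions about what train() does internally, with plt being a DDPG instance as in the examples above.

Python
# One episode rollout using the documented method signatures.
state = plt.reset(random_sg=True)      # random start/goal for better exploration
for t in range(200):                   # max_episode_steps
    action = plt.select_action(state)
    next_state, reward, done = plt.step(state, action)
    # During training, the transition (state, action, reward, next_state, done)
    # would be stored in the replay buffer here.
    state = next_state
    if done:
        break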

train(num_episodes=1000)

Train the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_episodes | int | number of episodes to train the model | 1000 |