
DDPG

python_motion_planning.local_planner.ddpg.DDPG

Bases: LocalPlanner

Class for Deep Deterministic Policy Gradient (DDPG) motion planning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| start | tuple | start point coordinate | required |
| goal | tuple | goal point coordinate | required |
| env | Env | environment | required |
| heuristic_type | str | heuristic function type | 'euclidean' |
| hidden_depth | int | number of hidden layers of the neural network | 3 |
| hidden_width | int | number of neurons in each hidden layer of the neural network | 512 |
| batch_size | int | batch size used to optimize the neural networks | 2000 |
| buffer_size | int | maximum replay buffer size | 1000000.0 |
| gamma | float | discount factor | 0.999 |
| tau | float | soft-update coefficient for the target networks | 0.001 |
| lr | float | learning rate | 0.0001 |
| train_noise | float | action noise coefficient during training, for exploration | 0.1 |
| random_episodes | int | number of initial episodes with random actions, for better exploration | 50 |
| max_episode_steps | int | maximum steps per episode | 200 |
| update_freq | int | number of network updates performed at each update step | 1 |
| update_steps | int | update the network every 'update_steps' steps | 1 |
| evaluate_freq | int | number of evaluations used to calculate the average reward | 50 |
| evaluate_episodes | int | evaluate the network every 'evaluate_episodes' episodes | 50 |
| actor_save_path | str | save path of the trained actor network | 'models/actor_best.pth' |
| critic_save_path | str | save path of the trained critic network | 'models/critic_best.pth' |
| actor_load_path | str | load path of the trained actor network | None |
| critic_load_path | str | load path of the trained critic network | None |
| **params | | other parameters, see the parent class LocalPlanner | {} |

Examples:

Import the necessary dependencies

Python Console Session
>>> from python_motion_planning.utils import Grid
>>> from python_motion_planning.local_planner import DDPG

Train the model and save the trained model

Training applies only to learning-based planners such as DDPG. Training the model takes a long time, so please be patient. For faster training, try reducing num_episodes and batch_size, increasing update_steps and evaluate_episodes, or fine-tuning other hyperparameters if you are familiar with them; this usually comes at a cost in performance, however (a sample faster configuration is sketched after the example below).

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            actor_save_path="models/actor_best.pth", critic_save_path="models/critic_best.pth")
>>> plt.train(num_episodes=10000)
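
For instance, a quicker (and usually weaker) training run could be configured as below; the specific values are illustrative assumptions, not recommended settings.

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            batch_size=512, update_steps=4, evaluate_episodes=100,
...            actor_save_path="models/actor_best.pth", critic_save_path="models/critic_best.pth")
>>> plt.train(num_episodes=2000)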

Load the trained model and run it

Python Console Session
>>> plt = DDPG(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...            actor_load_path="models/actor_best.pth", critic_load_path="models/critic_best.pth")
>>> plt.run()

References

[1] Continuous control with deep reinforcement learning

evaluate_policy()

Evaluate the policy and calculate the average reward.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| evaluate_reward | float | average reward of the policy |

optimize_model()

Optimize the neural networks when training.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| actor_loss | float | actor loss |
| critic_loss | float | critic loss |
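
As a rough illustration of what such an optimization step computes, the sketch below follows the standard DDPG update from [1]: the critic is regressed onto the TD target built from the target networks, the actor is updated to maximize the critic's value, and both target networks are softly updated with coefficient tau. It is a minimal sketch with assumed names (actor, critic, their target copies, and a sampled batch), not the library's actual optimize_model implementation.

Python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_target, critic_target,
                actor_optim, critic_optim,
                s, a, r, s_next, done, gamma=0.999, tau=0.001):
    # Critic: minimize the mean squared TD error against the target networks.
    with torch.no_grad():
        q_target = r + gamma * (1.0 - done) * critic_target(s_next, actor_target(s_next))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_optim.zero_grad()
    critic_loss.backward()
    critic_optim.step()

    # Actor: maximize Q(s, actor(s)) by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    actor_optim.zero_grad()
    actor_loss.backward()
    actor_optim.step()

    # Softly update the target networks (tau matches the constructor parameter).
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)

    return actor_loss.item(), critic_loss.item()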

plan()

Deep Deterministic Policy Gradient (DDPG) motion plan function.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| flag | bool | True if planning succeeded, else False |
| pose_list | list | history poses of the robot |
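
If the two documented return values come back as a tuple (an assumption about the call convention inferred from the table above, not confirmed by the source), a trained or loaded planner can be queried directly:

Python Console Session
>>> flag, pose_list = plt.plan()
>>> print(flag, len(pose_list))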

reset(random_sg=False)

Reset the environment and the robot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| random_sg | bool | whether to generate a random start and goal | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| state | Tensor | initial state of the robot |

reward(state, win, lose)

The state reward function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | Tensor | current state of the robot | required |
| win | bool | whether the episode is won (reached the goal) | required |
| lose | bool | whether the episode is lost (collided) | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| reward | float | reward for the current state |
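
For intuition, a reward function with this win/lose signature typically combines a terminal bonus or penalty with a shaping term. The sketch below is only an assumed illustration; the constants, the distance argument, and the shaping term are not taken from the library.

Python
def reward_sketch(dist_to_goal, win, lose, goal_bonus=10.0, collision_penalty=-10.0):
    # Terminal outcomes: bonus for reaching the goal, penalty for a collision.
    if win:
        return goal_bonus
    if lose:
        return collision_penalty
    # Otherwise, encourage progress by penalizing the remaining distance to the goal.
    return -0.1 * dist_to_goal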

run()

Run both planning and animation.

select_action(s)

Select the action from the actor network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| s | Tensor | current state | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| a | Tensor | selected action |
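
Conceptually, the actor maps a state to a deterministic action, and during training exploration noise scaled by train_noise is added before the action is clipped to a valid range. This is a sketch consistent with the train_noise parameter, not the library's exact code, and the clipping range is an assumption.

Python
import torch

def select_action_sketch(actor, s, training=True, train_noise=0.1, a_min=-1.0, a_max=1.0):
    # Deterministic actor output; no gradients are needed for action selection.
    with torch.no_grad():
        a = actor(s)
    if training:
        # Gaussian exploration noise scaled by the train_noise coefficient.
        a = a + train_noise * torch.randn_like(a)
    return a.clamp(a_min, a_max)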

step(state, action)

Take a step in the environment.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | Tensor | current state of the robot | required |
| action | Tensor | action to take | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| next_state | Tensor | next state of the robot |
| reward | float | reward for taking the action |
| done | bool | whether the episode is done |
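
The documented reset, select_action, and step signatures compose into a standard rollout loop. The sketch below only shows how the pieces fit together; the loop itself and the replay-buffer comment are assumptions about what train() does internally, with plt being a DDPG instance as in the examples above.

Python
# One episode rollout using the documented method signatures.
state = plt.reset(random_sg=True)      # random start/goal for better exploration
for t in range(200):                   # max_episode_steps
    action = plt.select_action(state)
    next_state, reward, done = plt.step(state, action)
    # During training, the transition (state, action, reward, next_state, done)
    # would be stored in the replay buffer here.
    state = next_state
    if done:
        break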

train(num_episodes=1000)

Train the model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_episodes | int | number of episodes to train the model | 1000 |