DQNPlanner

python_motion_planning.local_planner.dqn.DQNPlanner

Bases: LocalPlanner

Class for Fully Connected Deep Q-Value Network (DQN) motion planning.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `start` | `tuple` | start point coordinate | required |
| `goal` | `tuple` | goal point coordinate | required |
| `env` | `Env` | environment | required |
| `heuristic_type` | `str` | heuristic function type | `'euclidean'` |
| `hidden_depth` | `int` | number of hidden layers in the neural network | required |
| `hidden_width` | `int` | number of neurons per hidden layer of the neural network | required |
| `batch_size` | `int` | batch size used to optimize the neural networks | `2000` |
| `buffer_size` | `int` | maximum replay buffer size | `1000000` |
| `gamma` | `float` | discount factor | `0.999` |
| `tau` | `float` | coefficient for softly updating the target network | `0.001` |
| `lr` | `float` | learning rate | `0.0001` |
| `train_noise` | `float` | action noise coefficient during training, for exploration | `0.1` |
| `random_episodes` | `int` | number of initial episodes with random actions, for better exploration | `50` |
| `max_episode_steps` | `int` | maximum steps per episode | `200` |
| `update_freq` | `int` | number of network updates performed per update step | `1` |
| `update_steps` | `int` | update the network every `update_steps` steps | `1` |
| `evaluate_freq` | `int` | number of evaluation runs used to compute the average reward | `50` |
| `evaluate_episodes` | `int` | evaluate the network every `evaluate_episodes` episodes | `50` |
| `actor_save_path` | `str` | save path of the trained actor network | required |
| `critic_save_path` | `str` | save path of the trained critic network | required |
| `actor_load_path` | `str` | load path of the trained actor network | required |
| `critic_load_path` | `str` | load path of the trained critic network | required |
| `**params` | | other parameters; see the parent class `LocalPlanner` | `{}` |
Examples:

>>> from python_motion_planning.utils import Grid
>>> from python_motion_planning.local_planner import DQNPlanner
>>> plt = DQNPlanner(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...                  actor_save_path="models/actor_best.pth", critic_save_path="models/critic_best.pth")
>>> plt.train(num_episodes=10000)

Load the trained model and run:

>>> plt = DQNPlanner(start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
...                  actor_load_path="models/actor_best.pth", critic_load_path="models/critic_best.pth")
>>> plt.run()
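The training hyperparameters listed in the table above can also be set explicitly at construction. A minimal sketch with illustrative (not tuned) values:

```python
from python_motion_planning.utils import Grid
from python_motion_planning.local_planner import DQNPlanner

# Illustrative hyperparameter overrides; the values below are examples,
# not recommended settings.
plt = DQNPlanner(
    start=(5, 5, 0), goal=(45, 25, 0), env=Grid(51, 31),
    gamma=0.99,            # discount factor
    tau=0.005,             # soft target-network update coefficient
    lr=3e-4,               # learning rate
    batch_size=512,        # optimization batch size
    buffer_size=1000000,   # replay buffer capacity
    actor_save_path="models/actor_best.pth",
    critic_save_path="models/critic_best.pth",
)
plt.train(num_episodes=10000)
```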
buildActionSpace()

Build the action space: 25 actions uniformly sampled within the permitted range and 25 randomly sampled actions.
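A rough sketch of how such an action space could be assembled, assuming each action is a (linear velocity, angular velocity) pair and that `v_max`/`w_max` bound the permitted range (both names are assumptions for illustration, not part of the documented API):

```python
import numpy as np

def build_action_space_sketch(v_max=1.0, w_max=1.0, seed=0):
    """Sketch: 25 uniformly spaced plus 25 randomly sampled (v, w) actions."""
    rng = np.random.default_rng(seed)
    # 5 x 5 grid of linear/angular velocities covering the permitted range
    v_grid, w_grid = np.meshgrid(np.linspace(0.0, v_max, 5),
                                 np.linspace(-w_max, w_max, 5))
    uniform_actions = np.stack([v_grid.ravel(), w_grid.ravel()], axis=1)
    # 25 additional actions drawn at random from the same range
    random_actions = np.stack([rng.uniform(0.0, v_max, 25),
                               rng.uniform(-w_max, w_max, 25)], axis=1)
    return np.concatenate([uniform_actions, random_actions], axis=0)  # shape (50, 2)
```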
evaluate_policy()

Evaluate the policy and calculate the average reward.

Returns:

| Name | Type | Description |
|---|---|---|
| `evaluate_reward` | `float` | average reward of the policy |
optimize_model()

Optimize the neural networks when training.

Returns:

| Name | Type | Description |
|---|---|---|
| `actor_loss` | `float` | actor loss |
| `critic_loss` | `float` | critic loss |
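The `tau` constructor parameter controls how the target network is softly updated during optimization. A minimal PyTorch-style sketch of that update (an illustration of the technique, not the library's exact implementation):

```python
import torch

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.001):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * o_param)
```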
plan()

Deep Q-Network (DQN) motion planning function.

Returns:

| Name | Type | Description |
|---|---|---|
| `flag` | `bool` | `True` if planning succeeded, otherwise `False` |
| `pose_list` | `list` | history of robot poses |
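A short usage sketch, assuming the two documented return values come back as a tuple and the planner has already been trained or loaded as in the examples above:

```python
flag, pose_list = plt.plan()
if flag:
    print(f"Planning succeeded after {len(pose_list)} poses; final pose: {pose_list[-1]}")
else:
    print("Planning failed")
```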
reset(random_sg=False)

Reset the environment and the robot.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `random_sg` | `bool` | whether to generate a random start and goal | `False` |

Returns:

| Name | Type | Description |
|---|---|---|
| `state` | `Tensor` | initial state of the robot |
reward(prev_state, state, win, lose)

The state reward function.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prev_state` | `Tensor` | previous state of the robot | required |
| `state` | `Tensor` | current state of the robot | required |
| `win` | `bool` | whether the episode is won (reached the goal) | required |
| `lose` | `bool` | whether the episode is lost (collided) | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `reward` | `float` | reward for the current state |
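An illustrative reward shape consistent with the parameters above: a large bonus for winning (reaching the goal), a large penalty for losing (colliding), and otherwise a progress term derived from the distance to the goal. The constants and the distance inputs are assumptions for illustration, not the library's actual reward.

```python
def reward_sketch(prev_dist_to_goal, dist_to_goal, win, lose):
    """Sketch of a shaped reward for goal-reaching with obstacles."""
    if win:
        return 100.0   # reached the goal
    if lose:
        return -100.0  # collided with an obstacle
    # Positive when the robot moved closer to the goal, negative otherwise,
    # minus a small per-step penalty to discourage long episodes.
    return (prev_dist_to_goal - dist_to_goal) - 0.01
```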
run()

Run both planning and animation.
step(state, action)

Take a step in the environment.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `Tensor` | current state of the robot | required |
| `action` | `Tensor` | action to take | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `next_state` | `Tensor` | next state of the robot |
| `reward` | `float` | reward for taking the action |
| `done` | `bool` | whether the episode is done |
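`reset()` and `step()` together form the usual interaction loop. A sketch of a single episode, assuming a hypothetical `select_action(state)` helper that picks an action from the action space (that helper is not part of the documented API):

```python
def rollout_sketch(planner, max_steps=200):
    """Sketch: run one episode and accumulate its return."""
    state = planner.reset(random_sg=True)            # random start/goal
    episode_return = 0.0
    for _ in range(max_steps):
        action = planner.select_action(state)        # hypothetical helper
        next_state, reward, done = planner.step(state, action)
        episode_return += reward
        state = next_state
        if done:
            break
    return episode_return
```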
train(num_episodes=10000)

Train the model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_episodes` | `int` | number of episodes to train the model | `10000` |