
The default parameters are defined in `joyrl/framework/config.py`, as follows:

```python
class GeneralConfig(object):
    ''' General parameters for running
    '''
    def __init__(self) -> None:
        # basic settings
        self.env_name = "gym"  # name of environment
        self.algo_name = "DQN"  # name of algorithm
        self.mode = "train"  # train, test
        self.device = "cpu"  # device to use
        self.seed = 0  # random seed, 0 means no seed is set
        self.max_episode = -1  # number of episodes for training, set -1 to keep running
        self.max_step = -1  # max steps per episode, set -1 means unlimited steps
        self.collect_traj = False  # whether to collect trajectories
        # multiprocessing settings
        self.n_interactors = 1  # number of workers
        self.interactor_mode = "dummy"  # dummy, only works when learner_mode is serial
        self.learner_mode = "serial"  # serial, parallel, whether workers and learners are in parallel
        self.n_learners = 1  # number of learners if using multi-processing, default 1
        # online evaluation settings
        self.online_eval = False  # online evaluation or not
        self.online_eval_episode = 10  # online eval episodes
        self.model_save_fre = 500  # save model every N update steps
        # load model settings
        self.load_checkpoint = False  # whether to load a checkpoint
        self.load_path = "Train_single_CartPole-v1_DQN_20230515-211721"  # path to load model
        self.load_model_step = 'best'  # load model at which step
        # stats recorder settings
        self.interact_summary_fre = 1  # record interact stats every N episodes
        self.policy_summary_fre = 100  # record update stats every N update steps
```
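Because these settings are plain instance attributes, one way to run with non-default values is to create the object and overwrite selected fields. The sketch below uses a hypothetical helper, `apply_overrides`, that rejects unknown keys so that a typo such as `max_episodes` fails loudly instead of being silently ignored. To stay self-contained it uses a trimmed stand-in class with only a few of the fields above rather than importing JoyRL; how JoyRL itself loads user settings is not shown here.

```python
from typing import Any, Dict


class GeneralConfig:
    """Trimmed stand-in for JoyRL's GeneralConfig (only a few fields, for illustration)."""

    def __init__(self) -> None:
        self.env_name = "gym"
        self.algo_name = "DQN"
        self.mode = "train"
        self.device = "cpu"
        self.seed = 0
        self.max_episode = -1
        self.max_step = -1


def apply_overrides(cfg: Any, overrides: Dict[str, Any]) -> Any:
    """Hypothetical helper: overwrite existing attributes, reject unknown keys."""
    for key, value in overrides.items():
        if not hasattr(cfg, key):
            raise KeyError(f"unknown config key: {key!r}")
        setattr(cfg, key, value)
    return cfg


cfg = apply_overrides(GeneralConfig(), {"device": "cuda", "seed": 1, "max_episode": 200})
print(cfg.device, cfg.seed, cfg.max_episode)  # cuda 1 200
```

Checking keys against the existing attributes keeps the defaults as the single source of truth for which parameters exist.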

Parameter descriptions:

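* `env_name`: name of the environment. Currently only gym environments are supported; custom environments will be supported later.
* `algo_name`: name of the algorithm, e.g. `DQN`, `PPO`; see the algorithm parameter documentation for details.
* `mode`: run mode, either `train` or `test`.
* `device`: device to use, either `cpu` or `cuda`.
* `seed`: random seed. When set to 0, no random seed is set.
* `max_episode`: maximum number of training episodes. When set to -1, the number of training episodes is unlimited.
* `max_step`: maximum number of steps per episode. When set to -1, the number of steps is unlimited and each episode runs until the environment returns `done=True` or `truncate=True`. Set it according to the environment you are using.
* `collect_traj`: whether to collect trajectories. When `True`, trajectories are collected; this is typically used for imitation learning, inverse reinforcement learning, and similar methods.
* `n_interactors`: number of interactors, default 1. Set it as needed.
* `n_learners`: number of learners, default 1. Set it as needed.
* `online_eval`: whether to run online evaluation during training. When enabled, an additional model named `best` is saved; it holds the model with the best evaluation result seen during training, which is not necessarily the latest model.
* `online_eval_episode`: number of episodes per online evaluation. Set it as needed.
* `model_save_fre`: how often model files are saved, in update steps. Do not set it too small, or frequent saving will slow down training.
* `load_checkpoint`: whether to load a model file. When `True`, the model file is loaded; otherwise it is not.
* `load_path`: path of the model file to load; only takes effect when `load_checkpoint=True`.
* `load_model_step`: which saved step to load the model from; `best` loads the best model.
* `interact_summary_fre`: interactor statistics frequency, i.e. interactor statistics such as rewards are recorded every N episodes. For complex tasks it can be set to 10; for simple tasks it can be set to 1.
* `policy_summary_fre`: learner statistics frequency, i.e. learner statistics such as losses are recorded every N update steps. Do not set it too small.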
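The sentinel rules above (-1 for "unlimited", 0 for "no seed"), the "every N" frequencies, and the `best` model can be read as the checks below. This is a minimal sketch with hypothetical helper names, written only to illustrate the documented semantics; it is not JoyRL's actual training loop. In particular, it assumes that a higher mean evaluation reward counts as "better" for `best`, and JoyRL may start its counters or compare scores differently.

```python
import itertools
import random
from typing import Iterator, Optional


def episode_indices(max_episode: int) -> Iterator[int]:
    """Episode counter: max_episode == -1 means keep running indefinitely."""
    if max_episode == -1:
        return itertools.count()
    return iter(range(max_episode))


def episode_finished(step: int, max_step: int, done: bool, truncated: bool) -> bool:
    """An episode ends on done/truncate, or at the step cap unless max_step == -1."""
    if done or truncated:
        return True
    return max_step != -1 and step >= max_step


def maybe_seed(seed: int) -> None:
    """Seed the RNG only when seed != 0 (0 means 'do not set a seed')."""
    if seed != 0:
        random.seed(seed)


def is_due(count: int, freq: int) -> bool:
    """True every `freq` counts: episodes for interact_summary_fre,
    update steps for policy_summary_fre and model_save_fre."""
    return count > 0 and count % freq == 0


class BestTracker:
    """Remembers the best online-eval score so far (assumed: higher reward is better)."""

    def __init__(self) -> None:
        self.best_score: Optional[float] = None
        self.best_step: Optional[int] = None

    def update(self, step: int, mean_eval_reward: float) -> bool:
        """Return True if this evaluation is a new best, i.e. 'best' should be re-saved."""
        if self.best_score is None or mean_eval_reward > self.best_score:
            self.best_score = mean_eval_reward
            self.best_step = step
            return True
        return False


tracker = BestTracker()
print(tracker.update(500, 120.0), tracker.update(1000, 90.0), tracker.best_step)  # True False 500
```

The usage line shows why `best` need not be the latest model: the evaluation at step 1000 scores lower, so `best` still points to step 500.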