Environments

Environments are the core of any DRL project, since they specify the “game” and the interaction logic.

Tetris_scheduling Environment

This file provides the scheduling environment class Env, which can be used to load and simulate scheduling-problem instances.

class environments.env_tetris_scheduling.Env(config: dict, data: List[List[Task]])

Bases: Env

Environment for scheduling optimization. This class inherits from the base gym environment, so the functions step, reset, _state_obs and render are implemented and can be used by default.

If you want to customize the given rewards, you can adapt the function compute_reward.

Parameters
  • config – Dictionary with parameters to specify environment attributes

  • data – Scheduling problem to be solved, i.e., a list of instances
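
As noted above, rewards can be customized by adapting compute_reward. The following is a minimal sketch of such an override; the dense makespan penalty and the scaling factor are purely illustrative choices, not the environment's actual reward logic (get_makespan is documented below).

    from environments.env_tetris_scheduling import Env

    class MyEnv(Env):
        def compute_reward(self):
            # Illustrative only: a dense penalty proportional to the current makespan.
            # get_makespan() is documented below; the -0.01 scaling is arbitrary.
            return -0.01 * self.get_makespan()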

__init__(config: dict, data: List[List[Task]])
reset() List[float]
  • Resets the episode information trackers

  • Updates the number of runs

  • Loads new instance

Returns

First observation by calling the class function self.state_obs

step(action: Union[int, float], **kwargs) Tuple[List[float], Any, bool, Dict]

Step function

Parameters

action – Action to be performed on the current state of the environment

Returns

Observation, reward, done, infos
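
A typical interaction loop, sketched under the assumption that config and data have already been created elsewhere (their exact contents depend on the project configuration), and with a placeholder action instead of an agent or heuristic:

    from environments.env_tetris_scheduling import Env

    env = Env(config, data)           # config and data are assumed to exist
    obs = env.reset()                 # first observation

    done = False
    while not done:
        mask = env.get_action_mask()  # 0 -> available, 1 -> not available
        action = 0                    # placeholder; normally chosen by an agent or a heuristic
        obs, reward, done, infos = env.step(action)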

get_instance_info() Tuple[int, int, int, int]

Retrieves info about the instance size and configuration from an instance sample

Returns

(number of jobs, number of tasks and the maximum runtime) of this datapoint

property state_obs: List[float]

Transforms the state (task state and factory state) into a gym observation. Scales the values to the range [0, 1] and applies one-hot encoding.

Returns

Observation

static to_one_hot(x: int, max_size: int) array

Converts an index to a one-hot encoding

Parameters
  • x – Index at which the value should be 1

  • max_size – Size of the one hot encoding vector

Returns

One hot encoded vector
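
The intended behavior, sketched with NumPy (the actual implementation may differ):

    import numpy as np

    def to_one_hot_sketch(x: int, max_size: int) -> np.ndarray:
        # Vector of length max_size with a single 1 at index x.
        vec = np.zeros(max_size)
        vec[x] = 1
        return vec

    # to_one_hot_sketch(2, 5) -> array([0., 0., 1., 0., 0.])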

static check_valid_job_action(job_action: array, job_mask: array) bool

Check if job action is valid

Parameters
  • job_action – Job action as one hot vector

  • job_mask – One hot vector with ones for each valid job

Returns

True if job_action is valid, else False
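
Conceptually, the check verifies that the 1 in job_action falls on a position that is also 1 in job_mask. A NumPy sketch of this logic, not necessarily the actual implementation:

    import numpy as np

    def check_valid_job_action_sketch(job_action: np.ndarray, job_mask: np.ndarray) -> bool:
        # The selected job (the 1 in job_action) must coincide with a 1 in job_mask.
        return bool(np.sum(job_action * job_mask) >= 1)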

get_selected_task(job_idx: int) Tuple[int, Task]

Helper function to get the selected task (the next possible task) from the job index alone

Parameters

job_idx – job index

Returns

Index of the task in the task list and the selected task

choose_machine(task: Task) int

This function performs the logic by which the machine is chosen (in the case of the flexible JSSP). Currently implemented: choose, from the set of possible machines, the one with the earliest possible start time.

Parameters

task – Task

Returns

Machine on which the task will be scheduled.
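
A sketch of the described earliest-possible-start-time rule, assuming hypothetical inputs for the candidate machines and their earliest free times (these names are illustrative, not part of the documented API):

    def choose_machine_sketch(candidate_machines, machine_ready_times) -> int:
        # candidate_machines: machine ids on which this task may run (assumption:
        # taken from the task/instance data).
        # machine_ready_times: earliest possible start time per machine id.
        return min(candidate_machines, key=lambda m: machine_ready_times[m])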

get_action_mask() array

Gets the action mask. It is needed for the heuristics, the machine selection and (if masked) the agent. 0 -> available, 1 -> not available

Returns

Action mask
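
Since 0 marks available actions and 1 marks unavailable ones, the valid actions can be recovered from the mask, for example (env is assumed to exist):

    import numpy as np

    mask = env.get_action_mask()            # env is assumed to exist
    valid_actions = np.where(mask == 0)[0]  # indices of available actions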

execute_action(job_id: int, task: Task, machine_id: int) None

This function executes a valid action: it sets the machine and updates the job and task

Parameters
  • job_id – job_id of the task to be executed

  • task – Task

  • machine_id – ID of the machine on which the task is to be executed

Returns

None

compute_reward() Any

Calculates the reward that will later be returned to the agent. Uses the self.reward_strategy string to discriminate between different reward strategies. Default is ‘dense_reward’.

Returns

Reward

sparse_makespan_reward() int

Computes a reward based on the final makespan at the end of the episode; otherwise returns 0.

Returns

(int) sparse reward
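
A sketch of this behavior, using the documented check_done and get_makespan methods; whether the makespan enters with a negative sign (so that shorter schedules score higher) is an assumption of this sketch, not confirmed by the documentation:

    def sparse_makespan_reward_sketch(env) -> int:
        # 0 until the episode is finished, then a makespan-based value.
        if not env.check_done():
            return 0
        return -int(env.get_makespan())  # sign and scaling are assumptions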

mr2_reward() Any

Computes mr2 reward based on https://doi.org/10.1016/j.engappai.2022.104868

Returns

mr2 reward

check_done() bool

Check if all jobs are done

Returns

True if all jobs are done, else False
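
Conceptually this reduces to checking that every task of every job has been completed; a sketch assuming each Task exposes a done flag (an assumption about the Task class):

    def check_done_sketch(tasks) -> bool:
        # tasks: flat list of Task objects of the current instance (assumption).
        return all(task.done for task in tasks)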

calculate_tardiness() int

Calculates the tardiness of all jobs (this was previously the calc reward function)

Returns

(int) tardiness of last solution

get_makespan()

Returns the current makespan (the time the latest of all scheduled tasks finishes)
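
Both quantities follow the standard scheduling definitions. A sketch on plain lists of completion times and deadlines; the exact aggregation (e.g., summing tardiness over all jobs) and the attribute names are assumptions of this sketch:

    def get_makespan_sketch(finish_times) -> int:
        # Makespan: completion time of the task that finishes last.
        return max(finish_times)

    def calculate_tardiness_sketch(job_finish_times, job_deadlines) -> int:
        # Tardiness: time by which each job finishes past its deadline
        # (0 if on time), summed over all jobs.
        return sum(max(0, finish - deadline)
                   for finish, deadline in zip(job_finish_times, job_deadlines))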

log_intermediate_step() None

Log Function

Returns

None

close()

This is a relic of the OpenAI Gym API and is currently unnecessary.

seed(seed=1)

This is a relic of the OpenAI Gym API. Currently unnecessary, because the environment is deterministic, so no seed is used.

render(mode='human')

Visualizes the current status of the environment

Parameters

mode – “human”: displays the Gantt chart, “image”: returns an image of the Gantt chart

Returns

PIL.Image.Image if mode=image, else None
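
Usage sketch for the two render modes (env is assumed to exist; saving the image is just one way to use the returned object):

    env.render(mode="human")        # displays the Gantt chart
    img = env.render(mode="image")  # returns a PIL.Image.Image of the Gantt chart
    img.save("schedule.png")        # e.g., persist the rendered schedule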

Tetris_scheduling_indirect_action Environment

class environments.env_tetris_scheduling_indirect_action.IndirectActionEnv(config: dict, data: List[List[Task]])

Bases: Env

Scheduling environment for scheduling optimization according to https://www.sciencedirect.com/science/article/pii/S0952197622001130.

Main differences to the vanilla environment:

  • ACTION: Indirect action mapping

  • REWARD: m-r2 reward (which means we have to train on the same data again and again)

  • OBSERVATION: the observation differs (“normalization” appears to be division by the maximum to [0, 1] in the paper code). Not every part carries over, due to the different interaction logic

  • INTERACTION LOGIC WARNING:

  • original paper: time steps are run through; the agent can take as many actions as it wants per time step, but may not schedule into the past.

  • our adaptation: we still play Tetris, meaning that we schedule whole blocks of work at a time

Parameters
  • config – Dictionary with parameters to specify environment attributes

  • data – Scheduling problem to be solved, i.e., a list of instances

__init__(config: dict, data: List[List[Task]])
step(action: int, **kwargs)

Step Function

Parameters
  • action – Action to be performed on the current state of the environment

  • kwargs – should include “action_mode”, because the interaction patterns of heuristics and the agent differ and need to be processed differently

Returns

Observation, reward, done, infos

reset() ndarray
  • Resets the episode information trackers

  • Updates the number of runs

  • Loads new instance

Returns

First observation by calling the class function self.state_obs

property state_obs: ndarray

Transforms the state (task state and factory state) into a gym observation. Scales the values to the range [0, 1] and applies one-hot encoding. Cf. https://www.sciencedirect.com/science/article/pii/S0952197622001130, section 4.2.1

Returns

Observation

get_action_mask() array

Gets the action mask. In this environment all actions are always treated as valid, because the interaction logic accepts any action. Note that only non-masked algorithms are allowed. The heuristics, however, still need the job mask. 0 -> available, 1 -> not available

Returns

Action mask

get_next_tasks()

Returns the next tasks that can be scheduled

Util Functions

class environments.environment_loader.EnvironmentLoader

Bases: object

Loads the right environment as named in the passed config file. Also checks if the environment is compatible with the chosen algorithm.

classmethod load(config: dict, check_env_agent_compatibility: bool = True, register_gym_env: bool = False, **kwargs) Tuple[Any, str]

Loading function

classmethod check_environment_agent_compatibility(config: dict, env_name: Optional[str] = None, algo_name: Optional[str] = None)

Check if environment and algorithm are compatible. E.g., some environments may depend on action masking.
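
A minimal usage sketch, assuming a config dictionary that names one of the environments documented above; interpreting the returned string as the environment name is an assumption of this sketch:

    from environments.environment_loader import EnvironmentLoader

    env, env_name = EnvironmentLoader.load(config)  # config is assumed to exist
    obs = env.reset()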