Action Space

An action is specified by an action type (e.g., CLICK_COORDS) and the necessary fields for that action type (e.g., coords=[30, 60]).

Supported Action Types

MiniWoB++ environments support the following action types:

Name                          Description
NONE                          Do nothing for the current step.
MOVE_COORDS                   Move the cursor to the specified coordinates.
CLICK_COORDS                  Click on the specified coordinates.
DBLCLICK_COORDS               Double-click on the specified coordinates.
MOUSEDOWN_COORDS              Start dragging on the specified coordinates.
MOUSEUP_COORDS                Stop dragging on the specified coordinates.
SCROLL_UP_COORDS              Scroll up on the mouse wheel at the specified coordinates.
SCROLL_DOWN_COORDS            Scroll down on the mouse wheel at the specified coordinates.
CLICK_ELEMENT                 Click on the specified element using JavaScript.
PRESS_KEY                     Press the specified key or key combination.
TYPE_TEXT                     Type the specified string.
TYPE_FIELD                    Type the value of the specified task field.
FOCUS_ELEMENT_AND_TYPE_TEXT   Click on the specified element using JavaScript, and then type the specified string.
FOCUS_ELEMENT_AND_TYPE_FIELD  Click on the specified element using JavaScript, and then type the value of the specified task field.

Some action types perform similar actions (e.g., CLICK_COORDS and CLICK_ELEMENT both click, one at a screen position and one on an element). A common practice is to select, in the config described below, only the subset of action types that the agent may use.
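Because the action_type field of an action is an index into the configured action_types list (as described in the Action Object section), the same integer can mean different action types under different configs. The sketch below illustrates this with a hypothetical enum, ActionTypeName, standing in for the library's own ActionTypes; its member values are arbitrary and it is not part of miniwob.

```python
from enum import Enum, auto
from typing import Sequence


class ActionTypeName(Enum):
    """Hypothetical stand-in for the action types listed above (values are arbitrary)."""
    NONE = auto()
    MOVE_COORDS = auto()
    CLICK_COORDS = auto()
    DBLCLICK_COORDS = auto()
    MOUSEDOWN_COORDS = auto()
    MOUSEUP_COORDS = auto()
    SCROLL_UP_COORDS = auto()
    SCROLL_DOWN_COORDS = auto()
    CLICK_ELEMENT = auto()
    PRESS_KEY = auto()
    TYPE_TEXT = auto()
    TYPE_FIELD = auto()
    FOCUS_ELEMENT_AND_TYPE_TEXT = auto()
    FOCUS_ELEMENT_AND_TYPE_FIELD = auto()


def action_type_index(selected: Sequence[ActionTypeName], wanted: ActionTypeName) -> int:
    """Position of `wanted` in the configured subset, i.e. the value for "action_type".

    Raises ValueError if `wanted` was not selected in the config.
    """
    return list(selected).index(wanted)


# Two configs that both allow clicking, but list the types in a different order.
element_first = [ActionTypeName.CLICK_ELEMENT, ActionTypeName.CLICK_COORDS]
coords_first = [ActionTypeName.CLICK_COORDS, ActionTypeName.CLICK_ELEMENT]
print(action_type_index(element_first, ActionTypeName.CLICK_COORDS))  # 1
print(action_type_index(coords_first, ActionTypeName.CLICK_COORDS))   # 0
```

The order of action_types therefore matters: reordering the config changes which integer the agent must emit for the same action.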

Action Configs

The list of selected action types, along with other configurations, can be customized by passing a miniwob.action.ActionSpaceConfig object to the action_space_config argument during environment construction.

An ActionSpaceConfig object has the following fields:

Key            Type                   Description
action_types   Sequence[ActionTypes]  An ordered sequence of action types to include.
screen_width   float                  Screen width. Overridden by the environment constructor.
screen_height  float                  Screen height. Overridden by the environment constructor.
coord_bins     tuple[int, int]        If specified, bin the x and y coordinates into these numbers of bins. Mouse actions are executed at the middle of the selected bin.
scroll_amount  int                    The amount to scroll for scroll actions.
scroll_time    int                    Time in milliseconds to wait for the scroll animation.
allowed_keys   Sequence[str]          An ordered sequence of allowed keys and key combinations for the PRESS_KEY action.
text_max_len   int                    Maximum text length for the TYPE_TEXT action.
text_charset   str or set[str]        Character set for the TYPE_TEXT action.
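To make the coord_bins behavior concrete, here is a minimal sketch of how a binned coordinate could map back to a screen position. It assumes the bins split the screen evenly, bin indices start at 0, and coordinates share units with screen_width and screen_height. CoordConfig is a hypothetical stand-in holding only the relevant fields, not the real ActionSpaceConfig.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CoordConfig:
    """Hypothetical subset of ActionSpaceConfig: only the fields used for coordinates."""
    screen_width: float
    screen_height: float
    coord_bins: Optional[Tuple[int, int]] = None  # (x_bins, y_bins), or None for raw coordinates

    def to_pixels(self, coords: Tuple[float, float]) -> Tuple[float, float]:
        """Map an action's coords to the screen point where the mouse action happens."""
        if self.coord_bins is None:
            # Unbinned: coords are already (left, top) screen coordinates.
            return float(coords[0]), float(coords[1])
        x_bins, y_bins = self.coord_bins
        bin_w = self.screen_width / x_bins
        bin_h = self.screen_height / y_bins
        # Binned: aim at the middle of the selected partition.
        return (coords[0] + 0.5) * bin_w, (coords[1] + 0.5) * bin_h


print(CoordConfig(160, 210, (32, 21)).to_pixels((0, 0)))  # (2.5, 5.0)
```

With a 160 x 210 screen and coord_bins=(32, 21), each bin is 5 by 10 units, so bin (0, 0) lands at its center, (2.5, 5.0). Coarser bins shrink the action space at the cost of positional precision.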

Presets

The following preset names can be specified in place of the ActionSpaceConfig object:

Appending "_mac_os" to a preset name changes the Control modifier ("C-") in allowed_keys to Meta ("M-").
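A sketch of that rewrite, assuming only the leading "C-" modifier prefixes change and everything else in each combination is kept as-is. to_mac_os is a hypothetical helper for illustration, not a miniwob function.

```python
from typing import List, Sequence

MODIFIER_LETTERS = "CSAM"  # Control, Shift, Alternate, Meta


def to_mac_os(allowed_keys: Sequence[str]) -> List[str]:
    """Hypothetical helper: rewrite each Control prefix "C-" as Meta "M-"."""
    converted = []
    for combo in allowed_keys:
        prefixes = []
        rest = combo
        # Peel off leading modifier prefixes such as "C-" or "S-",
        # requiring at least one character after each prefix.
        while len(rest) > 2 and rest[1] == "-" and rest[0] in MODIFIER_LETTERS:
            prefixes.append(rest[0])
            rest = rest[2:]
        swapped = ["M" if m == "C" else m for m in prefixes]
        converted.append("".join(f"{m}-" for m in swapped) + rest)
    return converted


print(to_mac_os(["C-c", "C-S-<ArrowLeft>", "<Enter>"]))  # ['M-c', 'M-S-<ArrowLeft>', '<Enter>']
```

Parsing the prefixes, rather than doing a blind string replace, keeps a bare "C" key (or "S-C", Shift plus c) untouched.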

Key combinations

The PRESS_KEY action type issues a key combination via Selenium. Each key combination in the allowed_keys config must follow these rules:

  • Modifiers are specified using prefixes “C-” (Control), “S-” (Shift), “A-” (Alternate), or “M-” (Meta).

  • Printable character keys (a, 1, etc.) are specified directly. Shifted characters (A, !, etc.) are equivalent to “S-” + non-shifted counterpart.

  • Special keys are enclosed in “<…>”. The list of valid names is given in miniwob.constants.WEBDRIVER_SPECIAL_KEYS.

Example valid key combinations: "7", "<Enter>", "C-S-<ArrowLeft>".
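The rules above are simple enough to parse mechanically. The following hypothetical parse_key_combo (not part of miniwob) splits a combination into its modifiers and key. It does not check special-key names against WEBDRIVER_SPECIAL_KEYS, and it normalizes only uppercase ASCII letters to "S-" plus the lowercase letter, since the unshifted counterpart of a symbol such as "!" depends on the keyboard layout.

```python
import string
from typing import FrozenSet, NamedTuple

# Modifier prefixes as described in the rules above.
MODIFIERS = {"C": "Control", "S": "Shift", "A": "Alternate", "M": "Meta"}


class KeyCombo(NamedTuple):
    modifiers: FrozenSet[str]
    key: str
    special: bool  # True if the key was written as <Name>


def parse_key_combo(combo: str) -> KeyCombo:
    """Hypothetical parser for the allowed_keys notation."""
    mods = set()
    rest = combo
    # Peel off "C-", "S-", "A-", "M-" prefixes; require at least one char after each.
    while len(rest) > 2 and rest[1] == "-" and rest[0] in MODIFIERS:
        mods.add(MODIFIERS[rest[0]])
        rest = rest[2:]
    if len(rest) > 2 and rest.startswith("<") and rest.endswith(">"):
        # Special key; the name is not validated in this sketch.
        return KeyCombo(frozenset(mods), rest[1:-1], True)
    if len(rest) != 1 or not rest.isprintable():
        raise ValueError(f"invalid key combination: {combo!r}")
    if rest in string.ascii_uppercase:
        # "A" is treated as "S-a"; shifted symbols are left unchanged.
        mods.add("Shift")
        rest = rest.lower()
    return KeyCombo(frozenset(mods), rest, False)


combo = parse_key_combo("C-S-<ArrowLeft>")
print(sorted(combo.modifiers), combo.key)  # ['Control', 'Shift'] ArrowLeft
```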

Action Object

The action passed to the step method should be a dict. Which keys it must contain depends on the action types selected in the config:

Key          Type                      Included when                         Description
action_type  int                       Always                                Action type index from the action_types list in the config.
coords       np.ndarray of shape (2,)  Any *_COORDS action type selected     Left (x) and top (y) coordinates. Depending on the coord_bins config, the values are int8 (binned) or float32 (unbinned).
ref          int                       Any *_ELEMENT* action type selected   Element ref ID. If no element has the specified ref, the action becomes a no-op.
key          int                       PRESS_KEY selected                    Key index from the allowed_keys list in the config.
text         str                       Any *TYPE_TEXT action type selected   Text to type.
field        int                       Any *TYPE_FIELD action type selected  Index into the task field list obs["fields"]. If the index is out of bounds, no text is typed.
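The inclusion column can be read as a rule from the selected action types to the required keys. A minimal sketch, with hypothetical helpers required_action_keys and missing_action_keys that take action type names as strings rather than the library's ActionTypes values:

```python
from typing import Any, Iterable, Mapping, Set


def required_action_keys(action_types: Iterable[str]) -> Set[str]:
    """Keys an action dict needs, following the inclusion column above.

    Hypothetical helper: action types are given by name, e.g. "CLICK_COORDS".
    """
    keys = {"action_type"}  # always present
    for name in action_types:
        if name.endswith("_COORDS"):
            keys.add("coords")
        if "_ELEMENT" in name:  # CLICK_ELEMENT and the FOCUS_ELEMENT_AND_* types
            keys.add("ref")
        if name == "PRESS_KEY":
            keys.add("key")
        if name.endswith("TYPE_TEXT"):  # TYPE_TEXT and FOCUS_ELEMENT_AND_TYPE_TEXT
            keys.add("text")
        if name.endswith("TYPE_FIELD"):  # TYPE_FIELD and FOCUS_ELEMENT_AND_TYPE_FIELD
            keys.add("field")
    return keys


def missing_action_keys(action: Mapping[str, Any], action_types: Iterable[str]) -> Set[str]:
    """Keys the config requires that the action dict does not provide."""
    return required_action_keys(action_types) - set(action)


selected = ["CLICK_COORDS", "PRESS_KEY"]
print(sorted(required_action_keys(selected)))  # ['action_type', 'coords', 'key']
print(missing_action_keys({"action_type": 0, "coords": None}, selected))  # {'key'}
```

Note that the required keys depend on the whole selected set, not on the action type of the current step, which is why an action may carry fields its own type ignores.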

For instance, if the config contains only the action types CLICK_COORDS and PRESS_KEY (in that order), the action object could be:

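import numpy as np

action = {
  "action_type": 0,     # CLICK_COORDS (index 0 in the config's action_types)
  "coords": np.array([100, 50], dtype=np.float32),  # (left, top); assumes coord_bins is not set
  "key": 0,             # Ignored by the action CLICK_COORDS
}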