fast_training_env

Attributes

`EPSILON`
`PCT_TO_REWARD_SCALE`

Classes

FastTrainingEnv

Fast training environment with minimal state tracking.

Module Contents

fast_training_env.EPSILON = 1e-08

fast_training_env.PCT_TO_REWARD_SCALE = 100.0

class fast_training_env.FastTrainingEnv(data, cfg, features, time_step=(TimeFrameUnit.Day, 1))

Bases: trading.src.alg.environments.base_environment.BaseTradingEnv

Fast training environment with minimal state tracking, optimized for speed with constant-time operations. It does NOT maintain position history, trade constraints, or complex metrics. Target throughput: 10,000 iterations per second.

Anti-memorization features:

- Symbol shuffling at each reset, to prevent learning position-specific patterns
- An hmax constraint, to limit concentration in any single stock

Parameters:

data (pandas.DataFrame)
cfg (trading.cli.alg.config.StockEnv)
features (list[str] | list[trading.src.features.generic_features.Feature])
time_step (tuple[alpaca.data.timeframe.TimeFrameUnit, int])

initial_cash

cash

holdings

_symbol_permutation

_inverse_permutation

_hmax

_precompute_price_arrays(): Pre-compute price arrays and feature matrices for fast lookups.

_get_observation(i=-1)

Get the observation with minimal computation using pre-computed matrices.

Returns: [cash, holdings, current_prices, indicators]

Note: Holdings and prices are returned in SHUFFLED order matching the current symbol permutation, so observation slots line up with the action indices that step() maps through the same permutation.

Parameters:

i (int)

Return type:

numpy.ndarray
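A minimal sketch of assembling an observation in the documented order, [cash, holdings, current_prices, indicators], with per-symbol arrays viewed through the current permutation. The function name, the flat concatenation, and the assumption that `indicators` arrive already shuffled are illustrative guesses, not the class's actual code.

```python
import numpy as np

def make_observation(cash, holdings, prices, indicators, perm):
    """Hypothetical sketch: build [cash, holdings, prices, indicators].

    holdings/prices are in actual-stock order; indexing with `perm` puts
    them in shuffled order (slot i shows stock perm[i]). `indicators` is
    assumed to be in shuffled order already.
    """
    return np.concatenate([
        np.array([cash], dtype=np.float64),
        holdings[perm].astype(np.float64),
        prices[perm].astype(np.float64),
        np.ravel(indicators).astype(np.float64),
    ])

holdings = np.array([5, 0, 2])
prices = np.array([10.0, 20.0, 30.0])
perm = np.array([2, 0, 1])
obs = make_observation(1000.0, holdings, prices, np.array([0.1, 0.2]), perm)
```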

_get_shuffled_features(start_idx, end_idx)

Get features for the lookback window, shuffled to match symbol permutation. Maintains speed by using pre-computed indices.

Parameters:

start_idx (int)
end_idx (int)

Return type:

numpy.ndarray
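A sketch of the windowed lookup, assuming features are pre-computed into a 3-D array of shape (n_timesteps, n_symbols, n_features) and that `end_idx` is exclusive (a half-open window). Both are assumptions. Fancy indexing on the symbol axis reorders every timestep in the window in one vectorized call.

```python
import numpy as np

def shuffled_window(feature_cube, start_idx, end_idx, perm):
    """Hypothetical sketch: take timesteps [start_idx, end_idx) and
    reorder the symbol axis so slot i holds stock perm[i]."""
    return feature_cube[start_idx:end_idx, perm, :]

# 4 timesteps, 3 symbols, 2 features per symbol.
cube = np.arange(4 * 3 * 2).reshape(4, 3, 2)
perm = np.array([2, 0, 1])
window = shuffled_window(cube, 1, 3, perm)
```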

reset(*, seed=None, options=None)

Reset to initial state with symbol shuffling.

Symbol shuffling prevents the model from memorizing that a specific action index corresponds to the best-performing stock. Each episode, the mapping between action indices and actual stocks is randomized.

Parameters:

seed (Optional[int])
options (Optional[dict])
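One common way to implement per-episode shuffling is sketched below, as an assumption about what `_symbol_permutation` and `_inverse_permutation` hold: draw a random permutation with NumPy's `Generator.permutation` and derive its inverse with `np.argsort`. Seeding the generator from the `seed` argument makes the shuffle reproducible; whether `reset` does exactly this is not stated here, and `new_permutation` is a hypothetical name.

```python
import numpy as np

def new_permutation(n_symbols, seed=None):
    """Hypothetical sketch: draw a random symbol ordering and its inverse.

    perm[i] is the actual stock shown at slot i;
    inverse[j] is the slot where actual stock j appears.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_symbols)
    inverse = np.argsort(perm)
    return perm, inverse

perm, inverse = new_permutation(5, seed=0)
# Composing the two in either order gives the identity ordering.
assert (perm[inverse] == np.arange(5)).all()
```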

step(action)

Fast step with minimal state updates. The reward is based on the immediate change in portfolio value.
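The exact reward formula is not given here. Purely as an assumption based on the names of the module constants, a percentage-change reward could combine them as follows: `EPSILON` guards the division when the previous value is zero, and `PCT_TO_REWARD_SCALE` turns a fractional change into percentage points. The function `pct_change_reward` is hypothetical.

```python
EPSILON = 1e-08
PCT_TO_REWARD_SCALE = 100.0

def pct_change_reward(prev_value, new_value):
    """Hypothetical reward: step-over-step percentage change in
    portfolio value (assumed formula, not confirmed by the docs)."""
    return (new_value - prev_value) / (prev_value + EPSILON) * PCT_TO_REWARD_SCALE

r = pct_change_reward(10_000.0, 10_100.0)  # approximately 1.0, i.e. a 1% gain
```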

Actions are mapped through the symbol permutation to actual stock indices. The hmax constraint limits the maximum number of shares traded per stock per step.
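A sketch of this action path under two assumptions not confirmed on this page: actions are continuous values in [-1, 1], and they are scaled to integer share counts by `hmax` (a convention in several stock-trading RL environments). Slot i of the action vector refers to actual stock perm[i], so indexing with the inverse permutation returns trades to actual-stock order. `actions_to_trades` is a hypothetical name.

```python
import numpy as np

def actions_to_trades(action, inverse_perm, hmax):
    """Hypothetical sketch: map shuffled actions to per-stock share trades.

    action: values in [-1, 1], one per shuffled slot.
    Returns signed share counts in actual-stock order with |trade| <= hmax.
    """
    clipped = np.clip(action, -1.0, 1.0)
    # Truncate toward zero so no trade exceeds hmax shares.
    shares = np.trunc(clipped * hmax).astype(np.int64)
    # Stock j sits at slot inverse_perm[j], so gather from there.
    return shares[inverse_perm]

perm = np.array([2, 0, 1])
inverse = np.argsort(perm)
trades = actions_to_trades(np.array([1.0, -0.5, 0.25]), inverse, hmax=100)
# Slot 0 -> stock 2 buys 100; slot 1 -> stock 0 sells 50; slot 2 -> stock 1 buys 25.
```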