fast_training_env
Attributes
- EPSILON
- PCT_TO_REWARD_SCALE
Classes
- FastTrainingEnv: Fast training environment with minimal state tracking.
Module Contents
- fast_training_env.EPSILON = 1e-08
- fast_training_env.PCT_TO_REWARD_SCALE = 100.0
- class fast_training_env.FastTrainingEnv(data, cfg, features, time_step=(TimeFrameUnit.Day, 1))
Bases: trading.src.alg.environments.base_environment.BaseTradingEnv
Fast training environment with minimal state tracking. Optimized for speed with constant-time operations. Does NOT maintain position history, trade constraints, or complex metrics. Target: 10,000 iterations per second.
Anti-memorization features:
- Symbol shuffling at each reset, to prevent learning position-specific patterns.
- hmax constraint, to limit concentration in any single stock.
- Parameters:
data (pandas.DataFrame)
features (list[str] | list[trading.src.features.generic_features.Feature])
time_step (tuple[alpaca.data.timeframe.TimeFrameUnit, int])
- initial_cash
- cash
- holdings
- _symbol_permutation
- _inverse_permutation
- _hmax
- _precompute_price_arrays()
Pre-compute price arrays and feature matrices for fast lookups.
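Pre-computation replaces per-step DataFrame lookups with plain array indexing. The sketch below is one possible approach, assuming `data` is in long format with hypothetical columns `timestamp`, `symbol`, and `close`. The actual column names and layout used by `_precompute_price_arrays` are not documented here.

```python
import numpy as np
import pandas as pd


def precompute_price_matrix(data: pd.DataFrame, price_col: str = "close") -> tuple[np.ndarray, list[str]]:
    """Pivot long-format bars into a (n_timesteps, n_symbols) float64 array.

    Column names 'timestamp', 'symbol', and price_col are assumptions.
    """
    wide = data.pivot(index="timestamp", columns="symbol", values=price_col).sort_index()
    symbols = list(wide.columns)
    # A single contiguous array: all prices at step t are prices[t], an O(1) row lookup.
    prices = np.ascontiguousarray(wide.to_numpy(dtype=np.float64))
    return prices, symbols
```

Any (timestamp, symbol) pair missing from `data` becomes NaN in the pivoted array and would need filling before training.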
- _get_observation(i=-1)
Get observation with minimal computation using pre-computed matrices. Returns: [cash, holdings, current_prices, indicators]
Note: Holdings and prices are returned in SHUFFLED order matching the current symbol permutation, so slot k of the observation describes the same stock that action index k trades.
- Parameters:
i (int)
- Return type:
numpy.ndarray
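The observation layout described above can be sketched as a single concatenation. In this sketch, holdings and prices are stored in the original symbol order and reordered with the permutation array, so slot k shows stock `perm[k]`. The function name `build_observation`, the flat float64 output, and the assumption that `indicators` arrive already shuffled (for example, from a helper such as `_get_shuffled_features`) are illustrative assumptions.

```python
import numpy as np


def build_observation(cash: float, holdings: np.ndarray, prices_t: np.ndarray,
                      indicators: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Concatenate [cash, holdings, current_prices, indicators] into one flat vector.

    holdings and prices_t use the original symbol order; indexing with perm
    reorders them so slot k describes stock perm[k]. indicators are assumed
    to be already shuffled.
    """
    return np.concatenate((
        np.array([cash], dtype=np.float64),
        holdings[perm].astype(np.float64),
        prices_t[perm].astype(np.float64),
        np.ravel(indicators).astype(np.float64),
    ))
```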
- _get_shuffled_features(start_idx, end_idx)
Get features for the lookback window, shuffled to match symbol permutation. Maintains speed by using pre-computed indices.
- Parameters:
start_idx (int)
end_idx (int)
- Return type:
numpy.ndarray
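With a pre-computed feature tensor, shuffling the lookback window reduces to one slice and one index on the symbol axis. This is a minimal sketch under two assumptions: features are stored with shape (n_timesteps, n_symbols, n_features), and `end_idx` is exclusive, as in Python slicing. `shuffled_feature_window` is a hypothetical name.

```python
import numpy as np


def shuffled_feature_window(features: np.ndarray, perm: np.ndarray,
                            start_idx: int, end_idx: int) -> np.ndarray:
    """Return rows [start_idx, end_idx) with the symbol axis reordered by perm.

    Assumes features has shape (n_timesteps, n_symbols, n_features).
    """
    window = features[start_idx:end_idx]  # basic slice: a view, no copy
    return window[:, perm, :]             # integer-array index on the symbol axis: one copy
```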
- reset(*, seed=None, options=None)
Reset to initial state with symbol shuffling.
Symbol shuffling prevents the model from memorizing that a specific action index corresponds to the best-performing stock. Each episode, the mapping between action indices and actual stocks is randomized.
- Parameters:
seed (Optional[int])
options (Optional[dict])
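The per-episode shuffle needs a random permutation plus its inverse. These correspond to the `_symbol_permutation` and `_inverse_permutation` attributes, although how the class actually builds them is not documented. This sketch uses NumPy's `Generator.permutation` and computes the inverse by scatter assignment. `make_symbol_permutation` is a hypothetical helper.

```python
from typing import Optional

import numpy as np


def make_symbol_permutation(n_symbols: int, seed: Optional[int] = None) -> tuple[np.ndarray, np.ndarray]:
    """Draw a random ordering of symbols and its inverse.

    perm[k] is the real stock index behind action/observation slot k;
    inv[j] is the slot where real stock j appears this episode.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_symbols)
    inv = np.empty_like(perm)
    inv[perm] = np.arange(n_symbols)  # same result as np.argsort(perm), in O(n)
    return perm, inv
```

If BaseTradingEnv derives from `gymnasium.Env`, drawing from `self.np_random` after `super().reset(seed=seed)` would tie the shuffle to the reset seed.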
- step(action)
Fast step with minimal state updates. The reward is based on the immediate change in portfolio value.
Actions are mapped through the symbol permutation to actual stock indices. The hmax constraint limits the maximum number of shares traded per stock per step.
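The step logic can be sketched as a self-contained function. Several details here are assumptions: each action component lies in [-1, 1] and is scaled by hmax into whole shares (a FinRL-style convention), sells are executed before buys, buys are capped by available cash, there are no transaction costs, and the reward is the percentage change in portfolio value multiplied by `PCT_TO_REWARD_SCALE`, with `EPSILON` guarding the division. That reward formula is inferred from the constant names only. `step_sketch` is hypothetical.

```python
import numpy as np

EPSILON = 1e-08
PCT_TO_REWARD_SCALE = 100.0


def step_sketch(action, perm, cash, holdings, prices_now, prices_next, hmax):
    """One simplified trading step; returns (new_cash, new_holdings, reward).

    Assumption: action[k] in [-1, 1] is a fraction of hmax shares for slot k.
    Prices are assumed strictly positive.
    """
    # Scale to whole shares and clip so no stock trades more than hmax shares this step.
    shares = np.clip(np.round(np.asarray(action, dtype=np.float64) * hmax), -hmax, hmax)
    # Map slot k to the real stock perm[k].
    orders = np.zeros(len(holdings))
    orders[perm] = shares
    holdings = np.asarray(holdings, dtype=np.float64).copy()
    value_before = cash + holdings @ prices_now
    # Sells first: a stock cannot be sold beyond the shares currently held.
    sells = np.minimum(-np.minimum(orders, 0.0), holdings)
    holdings -= sells
    cash += sells @ prices_now
    # Buys: processed in stock-index order, each capped by the remaining cash.
    for j in np.flatnonzero(orders > 0):
        qty = min(orders[j], np.floor(cash / prices_now[j]))
        holdings[j] += qty
        cash -= qty * prices_now[j]
    # Reward: percentage change in portfolio value from this step's prices to the next.
    value_after = cash + holdings @ prices_next
    reward = (value_after - value_before) / (value_before + EPSILON) * PCT_TO_REWARD_SCALE
    return cash, holdings, reward
```

In this sketch, trading at `prices_now` does not change portfolio value by itself, so the reward reflects only the price move on the post-trade holdings.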