Create Model_Inputs+Outputs.md
Model_Inputs+Outputs.md (added, +170 -0)
@@ -0,0 +1,170 @@
Here we define the inputs and outputs of the "black box" Transformer-based forecasting model (`Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py`) within this collaborative supply chain context.
We categorize them for clarity and describe the format and expected characteristics of each.
This breakdown specifies the data the model requires and the results it is expected to produce, and it sets the stage for specifying data preprocessing steps, model architecture, and evaluation metrics.

**I. Inputs**

The inputs are all the data fed into the Transformer model to generate the forecasts. Because the system is meant to be comprehensive and dynamic, the inputs are diverse and fall into several categories:

**A. Historical Sales Data:**

* **Description:** Time-series data of past sales, at the most granular level possible (ideally SKU-store-day).
* **Format:**
    * **Structure:** Typically a tabular format (e.g., CSV, Parquet, database table). Could also be a tensor if pre-processed for the Transformer.
    * **Columns:**
        * `timestamp`: Date and time of the sale (e.g., `YYYY-MM-DD HH:MM:SS` or a Unix timestamp).
        * `sku`: Stock Keeping Unit (unique product identifier).
        * `store_id`: Identifier for the store location.
        * `quantity`: Number of units sold.
        * `price`: Unit price at the time of sale.
        * `discount`: Any discount applied (amount or percentage).
* **Characteristics:**
    * High frequency (daily or even hourly).
    * Potentially millions or billions of rows.
    * May exhibit seasonality, trends, and noise.
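
As a minimal sketch of how transaction-level rows with the columns above could be rolled up to the SKU-store-day level, the following uses only the Python standard library. The sample rows and the `aggregate_daily` helper are hypothetical and are not taken from the model file; a real pipeline would more likely do this in a dataframe library or in the database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw sales rows using the column names listed above.
raw_sales = [
    {"timestamp": "2024-07-01 09:15:00", "sku": "123", "store_id": "A",
     "quantity": 2, "price": 4.99, "discount": 0.0},
    {"timestamp": "2024-07-01 17:40:00", "sku": "123", "store_id": "A",
     "quantity": 3, "price": 4.99, "discount": 0.5},
    {"timestamp": "2024-07-02 11:05:00", "sku": "123", "store_id": "A",
     "quantity": 1, "price": 4.99, "discount": 0.0},
]


def aggregate_daily(rows):
    """Sum units sold per (sku, store_id, calendar day)."""
    totals = defaultdict(int)
    for row in rows:
        day = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S").date()
        totals[(row["sku"], row["store_id"], day)] += row["quantity"]
    return dict(totals)


daily_sales = aggregate_daily(raw_sales)
for (sku, store_id, day), units in sorted(daily_sales.items()):
    print(sku, store_id, day.isoformat(), units)
```

Days with no sales do not appear in the result; whether to fill them with explicit zeros before training is a separate preprocessing decision.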

**B. Promotional Data:**

* **Description:** Information about past, current, and *planned* promotional activities.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `promotion_id`: Unique identifier for the promotion.
        * `sku`: SKU(s) included in the promotion.
        * `store_id`: Store(s) where the promotion is active.
        * `start_date`: Start date of the promotion.
        * `end_date`: End date of the promotion.
        * `promotion_type`: Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
        * `discount_value`: Value of the discount (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
        * `marketing_spend`: (Optional) Amount spent on advertising for the promotion.
* **Characteristics:**
    * Less frequent than sales data.
    * Should include *future* planned promotions, which are crucial for forecasting.
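
Because planned promotions are known ahead of time, one way to feed them to the model is as per-day features that extend past the end of the sales history. The sketch below expands promotion records into a daily on-promotion flag and discount value. The `promo_features` helper, the sample record, and the rule for overlapping promotions are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical promotion record with the columns listed above.
promotions = [
    {"promotion_id": "P1", "sku": "123", "store_id": "A",
     "start_date": date(2024, 7, 3), "end_date": date(2024, 7, 5),
     "promotion_type": "percentage discount", "discount_value": 0.2},
]


def promo_features(promos, sku, store_id, days):
    """Return one (on_promo, discount_value) pair per day for a SKU-store."""
    features = []
    for day in days:
        active = [p for p in promos
                  if p["sku"] == sku and p["store_id"] == store_id
                  and p["start_date"] <= day <= p["end_date"]]
        # Assumption: when promotions overlap, keep the largest discount value.
        # This is only meaningful if the overlapping promotions share a type.
        best = max((p["discount_value"] for p in active), default=0.0)
        features.append((1 if active else 0, best))
    return features


# The day range may cover both history and the forecast horizon,
# so that planned promotions are visible to the model.
days = [date(2024, 7, 1) + timedelta(days=i) for i in range(7)]
features = promo_features(promotions, "123", "A", days)
for day, (on_promo, value) in zip(days, features):
    print(day.isoformat(), on_promo, value)
```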

**C. Inventory Data:**

* **Description:** Information about current and historical inventory levels.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `timestamp`: Date and time of the inventory snapshot.
        * `sku`: Stock Keeping Unit.
        * `store_id`: Store location (or warehouse ID for wholesalers).
        * `quantity_on_hand`: Number of units currently in stock.
        * `quantity_on_order`: Number of units ordered but not yet received.
        * `reorder_point`: (Optional) The inventory level at which a new order should be placed.
        * `safety_stock`: (Optional) Minimum stock level to keep on hand.
* **Characteristics:**
    * Frequency can vary (daily, weekly).
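
To make the inventory columns concrete, here is a small sketch of a snapshot record and a reorder check. The `InventorySnapshot` class and both helpers are hypothetical. Comparing the reorder point against on-hand plus on-order stock (often called the inventory position) is a common convention, but the source does not say which quantity the reorder point applies to, so treat that choice as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventorySnapshot:
    """One row of inventory data, using the column names listed above."""
    timestamp: str
    sku: str
    store_id: str
    quantity_on_hand: int
    quantity_on_order: int
    reorder_point: Optional[int] = None
    safety_stock: Optional[int] = None


def inventory_position(snapshot):
    """Units in stock plus units ordered but not yet received."""
    return snapshot.quantity_on_hand + snapshot.quantity_on_order


def needs_reorder(snapshot):
    """True when the position is at or below the reorder point.

    Returns False when no reorder point is recorded (the column is optional).
    """
    if snapshot.reorder_point is None:
        return False
    return inventory_position(snapshot) <= snapshot.reorder_point


low = InventorySnapshot("2024-07-01 00:00:00", "123", "A", 12, 20, reorder_point=40)
full = InventorySnapshot("2024-07-01 00:00:00", "456", "A", 50, 0, reorder_point=40)
print(needs_reorder(low), needs_reorder(full))
```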

**D. External Factors:**

* **Description:** Data that is not directly related to sales or inventory but can influence demand.
* **Format:**
    * **Structure:** Can be tabular or time-series data from various sources.
    * **Examples:**
        * **Economic Indicators:** GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
        * **Weather Data:** Temperature, precipitation, forecasts. (Time-series data from weather APIs.)
        * **Holiday/Event Indicators:** Binary indicators (0 or 1) for holidays, major events, school breaks. (Typically a pre-defined calendar.)
        * **Social Media Sentiment:** Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
        * **Web Traffic Data:** Website visits, product page views, search queries. (Data from web analytics platforms.)
        * **Competitor Data:** Pricing and promotional activity of competitors (if available, often through web scraping or third-party data providers).
* **Characteristics:**
    * Varying frequencies and formats depending on the source.
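
Since these sources arrive at different frequencies, they usually need to be aligned to the model's time step before use. The sketch below forward-fills a lower-frequency series onto a daily grid and adds a binary holiday indicator. The weekly values, the holiday calendar, and the `daily_external_features` helper are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical inputs: a weekly indicator and a small holiday calendar.
weekly_index = {date(2024, 7, 1): 101.2, date(2024, 7, 8): 99.8}
holidays = {date(2024, 7, 4)}


def daily_external_features(start, n_days, sparse_series, holiday_dates):
    """Build one feature row per day.

    The sparse series is forward-filled: each day carries the most recent
    value observed on or before that day (None until the first observation).
    """
    rows = []
    last_value = None
    for i in range(n_days):
        day = start + timedelta(days=i)
        if day in sparse_series:
            last_value = sparse_series[day]
        rows.append({
            "date": day,
            "index_value": last_value,
            "is_holiday": 1 if day in holiday_dates else 0,
        })
    return rows


external = daily_external_features(date(2024, 7, 1), 9, weekly_index, holidays)
for row in external:
    print(row["date"].isoformat(), row["index_value"], row["is_holiday"])
```

Forward-filling only ever uses values from the past, which avoids leaking future information into earlier days.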

**E. Product Metadata:**

* **Description:** Static information about the products.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `sku`: Stock Keeping Unit.
        * `product_category`: Category the product belongs to.
        * `product_subcategory`: Subcategory.
        * `brand`: Brand name.
        * `product_description`: Textual description (may be used for embeddings).
        * `price_tier`: (Optional) Categorization based on price (e.g., "economy," "mid-range," "premium").
* **Characteristics:**
    * Relatively static; changes infrequently.
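
Categorical columns such as `product_category` and `brand` are typically converted to integer ids before a neural model can look up an embedding for them. The following is a minimal sketch of that encoding step, assuming id 0 is reserved for values not seen when the vocabulary was built. The `build_vocab` and `encode` helpers and the sample products are hypothetical.

```python
def build_vocab(values):
    """Map each distinct value to an integer id; 0 is reserved for unseen values."""
    return {value: idx for idx, value in enumerate(sorted(set(values)), start=1)}


def encode(value, vocab):
    """Look up the id for a value, falling back to 0 when it is unknown."""
    return vocab.get(value, 0)


# Hypothetical product metadata rows.
products = [
    {"sku": "123", "product_category": "Beverages", "brand": "Acme"},
    {"sku": "456", "product_category": "Snacks", "brand": "Acme"},
    {"sku": "789", "product_category": "Beverages", "brand": "Zenith"},
]

category_vocab = build_vocab(p["product_category"] for p in products)
brand_vocab = build_vocab(p["brand"] for p in products)

encoded = {
    p["sku"]: (encode(p["product_category"], category_vocab),
               encode(p["brand"], brand_vocab))
    for p in products
}
print(encoded)
```

The same approach would apply to store metadata columns such as `store_type`.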

**F. Store Metadata:**

* **Description:** Static information about the stores.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `store_id`: Unique store identifier.
        * `location`: City and state.
        * `store_type`: Physical, online, or mixed.

**II. Outputs**

The outputs are the forecasts generated by the Transformer model.

**A. Probabilistic Forecasts:**

* **Description:** Instead of a single point forecast (e.g., "we will sell 100 units"), the model provides a *probability distribution* of future demand. This quantifies the uncertainty in the forecast.
* **Format:**
    * **Structure:** Typically a set of quantiles (or percentiles) for each SKU-store-future time period.
    * **Example:** For SKU 123, store A, on 2024-07-04, the model might output:
        * `p10`: 80 units (10th percentile - there's a 10% chance demand will be 80 units or less)
        * `p50`: 105 units (50th percentile - median forecast)
        * `p90`: 130 units (90th percentile - there's a 90% chance demand will be 130 units or less)
        * ...and other quantiles as needed (e.g., p25, p75, p95, p99).
* **Characteristics:**
    * Provides a range of possible outcomes, allowing for risk-aware decision-making.
    * Allows prediction intervals to be calculated (e.g., p10 to p90 spans an 80% interval).
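
Quantile forecasts like the example above are commonly scored with the quantile (pinball) loss, which penalizes under- and over-prediction asymmetrically depending on the quantile. The source does not state which loss or metric the model uses, so the sketch below is an illustration under that assumption. The helper names and the observed demand value are hypothetical.

```python
def pinball_loss(actual, predicted, quantile):
    """Quantile (pinball) loss for a single observation."""
    error = actual - predicted
    return max(quantile * error, (quantile - 1) * error)


def is_monotone(forecast_by_quantile):
    """Check that predicted values do not decrease as the quantile increases."""
    ordered = [value for _, value in sorted(forecast_by_quantile.items())]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Quantile level -> forecast, using the example values from the text.
forecast = {0.10: 80, 0.50: 105, 0.90: 130}

# p10 to p90 spans the central 80% prediction interval.
interval = (forecast[0.10], forecast[0.90])

actual_demand = 120  # hypothetical observed demand
losses = {q: pinball_loss(actual_demand, value, q) for q, value in forecast.items()}

print("interval:", interval)
print("quantiles ordered:", is_monotone(forecast))
print("pinball losses:", losses)
```

Here the observed demand lies above the p10 and p50 forecasts, so those are penalized for under-predicting, while the p90 forecast is penalized only lightly for sitting above the outcome.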

**B. Forecast Horizon:**

* **Description:** The length of time into the future for which the model generates forecasts.
* **Format:**
    * Defined by the model configuration and the needs of the business. Could be days, weeks, or months.
    * Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
* **Characteristics:**
    * Longer horizons generally have greater uncertainty.

**C. Forecast Granularity:**

* **Description:** The level of detail at which the forecasts are generated (SKU-store-day, SKU-region-week, etc.).
* **Format:**
    * Determined by the model and the available data.
    * Should align with the business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).

**D. Forecast Timestamps:**

* **Description:** The specific dates and times for which the forecasts are generated.
* **Format:**
    * A list or sequence of timestamps corresponding to the forecast horizon and granularity.
    * Example: `[2024-07-04, 2024-07-05, 2024-07-06, ...]`
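
The horizon, granularity, and timestamps fit together simply: the timestamps are the start date advanced by one step per forecast period. A minimal sketch follows, where `forecast_timestamps` is a hypothetical helper and the start date is taken from the example above.

```python
from datetime import date, timedelta


def forecast_timestamps(start, horizon_steps, step=timedelta(days=1)):
    """Return one date per forecast step, beginning at `start`."""
    return [start + i * step for i in range(horizon_steps)]


# 28 daily steps and 12 weekly steps, matching the horizon examples in the text.
daily = forecast_timestamps(date(2024, 7, 4), 28)
weekly = forecast_timestamps(date(2024, 7, 4), 12, step=timedelta(weeks=1))

print([d.isoformat() for d in daily[:3]])
print(daily[-1].isoformat(), weekly[-1].isoformat())
```

`timedelta` has no month unit, so a monthly granularity would need calendar-aware stepping rather than a fixed step.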

**E. (Optional) Explainability Outputs:**

* **Description:** Outputs that help explain *why* the model made a particular forecast. This is especially important for building trust and understanding.
* **Format:**
    * **Attention Weights:** For Transformer models, the attention weights can be visualized to show which parts of the input sequence were most important for the prediction.
    * **Feature Importance Scores:** Estimates of the relative importance of different input features.
    * **SHAP Values:** Shapley-value-based attributions that explain individual predictions feature by feature.
* **Characteristics:**
    * Can be complex to interpret, but provide valuable insights.
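
As a rough illustration of the attention-weight output, the sketch below averages weights across heads and ranks the input time steps for one forecast step. The nested-list layout `attention[head][output_step][input_step]`, the helper names, and the numbers are assumptions; the actual model may expose attention differently or not at all. Attention weights are also only a heuristic signal of importance, so a ranking like this is a starting point rather than a full explanation.

```python
def mean_attention(attention):
    """Average attention over heads: attention[h][q][k] -> result[q][k]."""
    n_heads = len(attention)
    n_queries = len(attention[0])
    n_keys = len(attention[0][0])
    return [
        [sum(attention[h][q][k] for h in range(n_heads)) / n_heads
         for k in range(n_keys)]
        for q in range(n_queries)
    ]


def top_input_steps(attention, query_index, top_k=2):
    """Indices of the input steps with the highest head-averaged weight."""
    weights = mean_attention(attention)[query_index]
    ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    return ranked[:top_k]


# Hypothetical weights: 2 heads, 1 forecast step, 4 input steps.
# Each row sums to 1, as softmax-normalized attention would.
attention = [
    [[0.1, 0.1, 0.3, 0.5]],
    [[0.1, 0.3, 0.2, 0.4]],
]

print(top_input_steps(attention, query_index=0))
```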

**Summary Table:**

| **Inputs** | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Historical Sales | Past sales data (SKU-store-day level) | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large, may exhibit seasonality/trends/noise. |
| Promotional Data | Past, current, and *planned* promotions | Tabular (promotion_id, sku, store_id, start/end dates, type, value, spend) | Less frequent than sales data, includes future promotions. |
| Inventory Data | Current and historical inventory levels | Tabular (timestamp, sku, store_id/warehouse_id, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly). |
| External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various) | Varying frequencies and formats. |
| Product Metadata | Static information about products | Tabular (sku, category, subcategory, brand, description, price_tier) | Relatively static. |
| Store Metadata | Static information about stores | Tabular (store_id, location, store_type) | Relatively static. |

| **Outputs** | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Probabilistic Forecasts | Probability distribution of future demand | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes, quantifies uncertainty. |
| Forecast Horizon | Length of time into the future | Number of time steps (days, weeks, months) | Longer horizons have greater uncertainty. |
| Forecast Granularity | Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs | Aligns with business requirements. |
| Forecast Timestamps | Dates/times for which forecasts are generated | List/sequence of timestamps | Corresponds to horizon and granularity. |
| Explainability (Optional) | Outputs that explain model predictions | Attention weights, feature importance scores, SHAP values | Complex to interpret, but provide valuable insights. |