amiguel committed on
Commit 29104c7 · verified · 1 Parent(s): f24a30d

Upload 13 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ src/notifs_data.db filter=lfs diff=lfs merge=lfs -text
src/FUTURE_ENHANCEMENTS.md ADDED
@@ -0,0 +1,306 @@
1
+ # DigiTwin Application - Future Enhancements
2
+
3
+ ## Overview
4
+ This document outlines planned enhancements for the DigiTwin application to improve performance, data management, and user experience through advanced analytics and AI capabilities.
5
+
6
+ ---
7
+
8
+ ## 1. Data Preprocessing Module
9
+
10
+ ### Objective
11
+ Create a dedicated preprocessing module to optimize dataset size and improve application performance by removing unnecessary columns and cleaning data before storage.
12
+
13
+ ### Implementation Plan
14
+
15
+ #### 1.1 Column Analysis & Removal
16
+ - **Module**: `preprocessing.py`
17
+ - **Functionality**:
18
+ - Analyze uploaded Excel files for column usage patterns
19
+ - Identify and remove columns with:
20
+ - High percentage of null values (>80%)
21
+ - Redundant information
22
+ - Non-essential metadata
23
+ - Preserve critical columns: FPSO, Main WorkCtr, Notification Type, Location, Keywords, etc.
24
+
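+ The null-coverage check described above could look like the sketch below. It is a minimal illustration, not the final module: the 80% threshold and the protected-column set are assumptions taken from this section.
+
+ ```python
+ import pandas as pd
+
+ # Assumed set of critical columns that must never be dropped; adjust to the real export's names
+ PROTECTED_COLUMNS = {'FPSO', 'Main WorkCtr', 'Notifictn type', 'Description'}
+
+ def find_droppable_columns(df, null_threshold=0.8):
+     """Return columns whose null ratio exceeds the threshold, excluding protected ones."""
+     null_ratio = df.isnull().mean()  # fraction of nulls per column
+     candidates = null_ratio[null_ratio > null_threshold].index
+     return [col for col in candidates if col not in PROTECTED_COLUMNS]
+ ```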
25
+ #### 1.2 Data Cleaning Pipeline
26
+ ```python
27
+ def preprocess_notifications_data(df):
28
+ """
29
+ Preprocess notification data to reduce size and improve quality
30
+ """
31
+ # Remove unnecessary columns to improve memory footprint
32
+ columns_to_remove = [
33
+ 'Priority', # Redundant priority information
34
+ 'Notification', # Duplicate notification data
35
+ 'Order', # Order information not needed for analytics
36
+ 'Planner group' # Planner group metadata
37
+ ]
38
+
39
+ # Remove specified columns
40
+ df = df.drop(columns=columns_to_remove, errors='ignore')
41
+
42
+ # Clean data types
43
+ # Remove duplicates
44
+ # Standardize text fields
45
+ # Optimize memory usage
46
+
47
+ return df
48
+ ```
49
+
50
+ #### 1.3 Benefits
51
+ - **Reduced Database Size**: Removal of Priority, Notification, Order, and Planner group columns reduces dataset size by 15-25%
52
+ - **Improved Performance**: Faster loading and processing times due to reduced memory footprint
53
+ - **Better Memory Management**: Optimized data types and structures for cached database
54
+ - **Data Quality**: Consistent formatting and validation while preserving essential analytics columns
55
+ - **Focused Analytics**: Streamlined dataset containing only relevant columns for FPSO analysis
56
+
57
+ ---
58
+
59
+ ## 2. Feature Engineering Enhancements
60
+
61
+ ### Objective
62
+ Enhance the dataset with derived features to provide deeper insights and better analytics capabilities.
63
+
64
+ ### Implementation Plan
65
+
66
+ #### 2.1 Main WorkCtr Feature Engineering
67
+ - **Categorization**: Group work centers into logical categories
68
+ - **Priority Levels**: Assign priority based on work center type
69
+ - **Frequency Analysis**: Track most common work centers per FPSO
70
+
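+ A rough sketch of what this could look like is shown below; the category mapping is purely illustrative (the real Main WorkCtr codes and their groupings still need to be defined):
+
+ ```python
+ import pandas as pd
+
+ # Hypothetical mapping from work-center code prefixes to categories
+ WORKCTR_CATEGORIES = {
+     'MECH': 'Mechanical',
+     'ELEC': 'Electrical',
+     'INSP': 'Inspection',
+ }
+
+ def categorize_workctr(df):
+     """Add a category column and a per-FPSO frequency count for Main WorkCtr."""
+     df['workctr_category'] = (
+         df['Main WorkCtr'].astype(str).str[:4].map(WORKCTR_CATEGORIES).fillna('Other')
+     )
+     df['workctr_freq_per_fpso'] = df.groupby(['FPSO', 'Main WorkCtr'])['Main WorkCtr'].transform('count')
+     return df
+ ```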
71
+ #### 2.2 Additional Feature Engineering
72
+ ```python
73
+ def engineer_features(df):
74
+ """
75
+ Create new features from existing data
76
+ """
77
+ # Time-based features
78
+ df['notification_age_days'] = (pd.Timestamp.now() - df['date']).dt.days
79
+ df['is_urgent'] = df['notification_age_days'] <= 7
80
+
81
+ # Location-based features
82
+ df['location_category'] = categorize_location(df['location'])
83
+ df['is_critical_area'] = is_critical_location(df['location'])
84
+
85
+ # Keyword-based features
86
+ df['keyword_count'] = df['keywords'].str.count(',') + 1
87
+ df['has_safety_keyword'] = df['keywords'].str.contains('safety|emergency', case=False)
88
+
89
+ # FPSO-specific features
90
+ df['fpso_notification_density'] = df.groupby('fpso')['notification_id'].transform('count')
91
+
92
+ return df
93
+ ```
94
+
95
+ #### 2.3 New Features to Add
96
+ - **Temporal Features**:
97
+ - Notification age (days since creation)
98
+ - Urgency indicators
99
+ - Seasonal patterns
100
+
101
+ - **Spatial Features**:
102
+ - Location categories (Deck, Hull, Machinery, etc.)
103
+ - Critical area flags
104
+ - Zone-based grouping
105
+
106
+ - **Operational Features**:
107
+ - Work center complexity scores
108
+ - Resource allocation indicators
109
+ - Maintenance priority levels
110
+
111
+ ---
112
+
113
+ ## 3. LLM Integration with RAG (Retrieval-Augmented Generation)
114
+
115
+ ### Objective
116
+ Implement conversational AI capabilities to allow users to query the cached dataset using natural language, providing intelligent insights and recommendations.
117
+
118
+ ### Implementation Plan
119
+
120
+ #### 3.1 RAG Architecture
121
+ ```python
122
+ class DigiTwinRAG:
123
+ """
124
+ RAG system for querying notification data
125
+ """
126
+ def __init__(self, db_path):
127
+ self.db_path = db_path
128
+ self.vector_store = None
129
+ self.llm_model = None
130
+
131
+ def setup_vector_store(self):
132
+ """Create vector embeddings for notification data"""
133
+ # Load data from SQLite
134
+ # Create embeddings using sentence-transformers
135
+ # Store in vector database (Chroma/FAISS)
136
+
137
+ def query_notifications(self, user_query):
138
+ """Process natural language queries"""
139
+ # Retrieve relevant documents
140
+ # Generate response using LLM
141
+ # Return formatted results
142
+ ```
143
+
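+ A concrete, minimal version of the `setup_vector_store()` step using sentence-transformers and FAISS could look like the sketch below; it assumes the `notifications` table and `Description` column used elsewhere in the app and keeps the index in memory:
+
+ ```python
+ import sqlite3
+ import numpy as np
+ import pandas as pd
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ def build_faiss_index(db_path, table_name='notifications'):
+     """Embed notification descriptions and index them with FAISS."""
+     with sqlite3.connect(db_path) as conn:
+         df = pd.read_sql(f'SELECT * FROM {table_name}', conn)
+     model = SentenceTransformer('all-MiniLM-L6-v2')
+     embeddings = model.encode(df['Description'].astype(str).tolist())
+     embeddings = np.asarray(embeddings, dtype='float32')
+     index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 search over all notifications
+     index.add(embeddings)
+     return index, df
+ ```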
144
+ #### 3.2 LLM Model Integration
145
+ - **Model Options**:
146
+ - **Local**: Llama 2, Mistral, or similar open-source models
147
+ - **Cloud**: OpenAI GPT-4, Anthropic Claude, or Azure OpenAI
148
+ - **Hybrid**: Local for basic queries, cloud for complex analysis
149
+
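+ The hybrid option could be as simple as routing on query complexity. The sketch below only shows the routing decision; `query_local` and `query_cloud` are hypothetical callables standing in for whichever providers are chosen:
+
+ ```python
+ def route_query(user_query, query_local, query_cloud, max_local_words=25):
+     """Send short, simple queries to the local model and longer ones to the cloud model."""
+     is_simple = len(user_query.split()) <= max_local_words
+     return query_local(user_query) if is_simple else query_cloud(user_query)
+ ```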
150
+ #### 3.3 Query Capabilities
151
+ ```python
152
+ # Example queries the system should handle:
153
+ queries = [
154
+ "Show me all urgent notifications from the last week",
155
+ "Which FPSO has the most safety-related issues?",
156
+ "What are the common keywords in deck maintenance notifications?",
157
+ "Compare notification patterns between PAZ and DAL FPSOs",
158
+ "Generate a summary of critical maintenance needs",
159
+ "What work centers require immediate attention?"
160
+ ]
161
+ ```
162
+
163
+ #### 3.4 Implementation Steps
164
+ 1. **Vector Database Setup**:
165
+ - Install and configure vector database (Chroma/FAISS)
166
+ - Create embeddings for notification text
167
+ - Index metadata fields
168
+
169
+ 2. **LLM Integration**:
170
+ - Set up model API connections
171
+ - Create prompt templates
172
+ - Implement response formatting
173
+
174
+ 3. **User Interface**:
175
+ - Add chat interface to Streamlit app
176
+ - Display query results with visualizations
177
+ - Provide query suggestions
178
+
179
+ 4. **Response Enhancement**:
180
+ - Generate charts and graphs from queries
181
+ - Provide actionable insights
182
+ - Link to relevant data views
183
+
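+ For the prompt templates mentioned in step 2, a minimal starting point could look like this (the exact wording and fields are assumptions to be refined during implementation):
+
+ ```python
+ RAG_PROMPT_TEMPLATE = """You are DigiTwin, an assistant for FPSO notification analysis.
+
+ Context documents:
+ {context}
+
+ Pivot analysis summary:
+ {pivot_analysis}
+
+ Question: {question}
+
+ Answer using only the context above. If the context is insufficient, say so."""
+
+ def build_prompt(context_docs, pivot_analysis, question):
+     """Fill the template with retrieved documents and the user's question."""
+     context = "\n\n".join(doc['text'] for doc in context_docs)  # assumes each retrieved doc is a dict with a 'text' key
+     return RAG_PROMPT_TEMPLATE.format(context=context, pivot_analysis=pivot_analysis, question=question)
+ ```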
184
+ ---
185
+
186
+ ## 4. Technical Requirements
187
+
188
+ ### 4.1 New Dependencies
189
+ ```txt
190
+ # preprocessing.py
191
+ pandas>=2.0.0
192
+ numpy>=1.24.0
193
+
194
+ # feature_engineering.py
195
+ scikit-learn>=1.3.0
196
+ category_encoders>=2.6.0
197
+
198
+ # rag_system.py
199
+ sentence-transformers>=2.2.0
200
+ chromadb>=0.4.0
201
+ langchain>=0.1.0
202
+ openai>=1.0.0 # or other LLM provider
203
+ ```
204
+
205
+ ### 4.2 Database Schema Updates
206
+ ```sql
207
+ -- New tables for enhanced features
208
+ CREATE TABLE notification_features (
209
+ id INTEGER PRIMARY KEY,
210
+ notification_id TEXT,
211
+ urgency_score REAL,
212
+ location_category TEXT,
213
+ keyword_count INTEGER,
214
+ fpso_density REAL,
215
+ created_at TIMESTAMP
216
+ );
217
+
218
+ CREATE TABLE vector_embeddings (
219
+ id INTEGER PRIMARY KEY,
220
+ notification_id TEXT,
221
+ embedding_vector BLOB,
222
+ metadata TEXT
223
+ );
224
+ ```
225
+
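+ Applying this schema to the application's existing SQLite database (`notifs_data.db`, the file the app already uses) could be as simple as the following; the table definitions mirror the SQL above, with `IF NOT EXISTS` added so the script can be re-run safely:
+
+ ```python
+ import sqlite3
+
+ SCHEMA_SQL = """
+ CREATE TABLE IF NOT EXISTS notification_features (
+     id INTEGER PRIMARY KEY,
+     notification_id TEXT,
+     urgency_score REAL,
+     location_category TEXT,
+     keyword_count INTEGER,
+     fpso_density REAL,
+     created_at TIMESTAMP
+ );
+ CREATE TABLE IF NOT EXISTS vector_embeddings (
+     id INTEGER PRIMARY KEY,
+     notification_id TEXT,
+     embedding_vector BLOB,
+     metadata TEXT
+ );
+ """
+
+ with sqlite3.connect('notifs_data.db') as conn:
+     conn.executescript(SCHEMA_SQL)  # idempotent thanks to IF NOT EXISTS
+ ```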
226
+ ---
227
+
228
+ ## 5. Implementation Timeline
229
+
230
+ ### Phase 1: Data Preprocessing (Week 1-2)
231
+ - [ ] Create preprocessing module
232
+ - [ ] Implement column analysis
233
+ - [ ] Add data cleaning pipeline
234
+ - [ ] Test with existing datasets
235
+
236
+ ### Phase 2: Feature Engineering (Week 3-4)
237
+ - [ ] Implement feature engineering functions
238
+ - [ ] Add new derived features
239
+ - [ ] Update database schema
240
+ - [ ] Integrate with main application
241
+
242
+ ### Phase 3: RAG System (Week 5-8)
243
+ - [ ] Set up vector database
244
+ - [ ] Implement LLM integration
245
+ - [ ] Create chat interface
246
+ - [ ] Test and optimize queries
247
+
248
+ ### Phase 4: Integration & Testing (Week 9-10)
249
+ - [ ] Integrate all modules
250
+ - [ ] Performance testing
251
+ - [ ] User acceptance testing
252
+ - [ ] Documentation and deployment
253
+
254
+ ---
255
+
256
+ ## 6. Success Metrics
257
+
258
+ ### Performance Improvements
259
+ - **Data Size Reduction**: Target 15-25% reduction through removal of Priority, Notification, Order, and Planner group columns
260
+ - **Query Speed**: 30-40% faster data loading and processing due to reduced memory footprint
261
+ - **Memory Usage**: 20-30% reduction in memory consumption for cached database
262
+
263
+ ### User Experience
264
+ - **Query Response Time**: <3 seconds for RAG queries
265
+ - **Accuracy**: >90% relevance for retrieved documents
266
+ - **User Satisfaction**: Improved through natural language interaction
267
+
268
+ ### Analytics Capabilities
269
+ - **Insight Generation**: Automated identification of patterns and trends
270
+ - **Recommendation Quality**: Actionable maintenance and safety recommendations
271
+ - **Data Coverage**: Enhanced analysis across all FPSO units
272
+
273
+ ---
274
+
275
+ ## 7. Risk Mitigation
276
+
277
+ ### Technical Risks
278
+ - **Model Performance**: Start with simple models, gradually increase complexity
279
+ - **Data Privacy**: Ensure all data processing remains local/secure
280
+ - **Scalability**: Design modular architecture for easy scaling
281
+
282
+ ### Operational Risks
283
+ - **User Adoption**: Provide training and documentation
284
+ - **Maintenance**: Create automated testing and monitoring
285
+ - **Integration**: Maintain backward compatibility with existing features
286
+
287
+ ---
288
+
289
+ ## 8. Future Considerations
290
+
291
+ ### Advanced Features
292
+ - **Predictive Analytics**: Forecast maintenance needs and safety incidents
293
+ - **Real-time Monitoring**: Live data integration and alerts
294
+ - **Mobile Application**: Extend capabilities to mobile devices
295
+ - **API Integration**: Connect with external maintenance systems
296
+
297
+ ### Scalability
298
+ - **Multi-tenant Support**: Support multiple organizations
299
+ - **Cloud Deployment**: Scalable cloud infrastructure
300
+ - **Advanced Analytics**: Machine learning for pattern recognition
301
+
302
+ ---
303
+
304
+ *Document created: December 2024*
305
+ *Last updated: [Date]*
306
+ *Maintained by: ValonyLabs Development Team*
src/IMPLEMENTATION_ROADMAP.md ADDED
@@ -0,0 +1,111 @@
1
+ # DigiTwin Implementation Roadmap
2
+
3
+ ## Current Status: ✅ Production Ready
4
+ - ✅ Core application functionality
5
+ - ✅ Data upload and processing
6
+ - ✅ FPSO visualizations
7
+ - ✅ Pivot table analytics
8
+ - ✅ Database persistence
9
+ - ✅ Responsive UI with custom styling
10
+ - ✅ Sidebar layout optimizations
11
+
12
+ ---
13
+
14
+ ## Phase 1: Data Preprocessing Module
15
+ **Timeline**: Week 1-2
16
+ **Status**: 🔄 Planned
17
+
18
+ ### Tasks:
19
+ - [ ] Create `preprocessing.py` module
20
+ - [ ] Implement column analysis functionality
21
+ - [ ] Add data cleaning pipeline
22
+ - [ ] Integrate with main application
23
+ - [ ] Test with existing datasets
24
+
25
+ ### Deliverables:
26
+ - Preprocessing module with column removal logic
27
+ - Data size reduction by 40-60%
28
+ - Improved loading performance
29
+
30
+ ---
31
+
32
+ ## Phase 2: Feature Engineering
33
+ **Timeline**: Week 3-4
34
+ **Status**: 🔄 Planned
35
+
36
+ ### Tasks:
37
+ - [ ] Create `feature_engineering.py` module
38
+ - [ ] Implement Main WorkCtr categorization
39
+ - [ ] Add temporal and spatial features
40
+ - [ ] Update database schema
41
+ - [ ] Integrate with analytics
42
+
43
+ ### Deliverables:
44
+ - Enhanced dataset with derived features
45
+ - Improved analytics capabilities
46
+ - Better insights generation
47
+
48
+ ---
49
+
50
+ ## Phase 3: LLM Integration with RAG
51
+ **Timeline**: Week 5-8
52
+ **Status**: 🔄 Planned
53
+
54
+ ### Tasks:
55
+ - [ ] Set up vector database (Chroma/FAISS)
56
+ - [ ] Implement LLM model integration
57
+ - [ ] Create RAG query system
58
+ - [ ] Add chat interface to Streamlit
59
+ - [ ] Test and optimize
60
+
61
+ ### Deliverables:
62
+ - Natural language query capability
63
+ - Intelligent insights generation
64
+ - Enhanced user experience
65
+
66
+ ---
67
+
68
+ ## Phase 4: Integration & Testing
69
+ **Timeline**: Week 9-10
70
+ **Status**: 🔄 Planned
71
+
72
+ ### Tasks:
73
+ - [ ] Integrate all modules
74
+ - [ ] Performance testing
75
+ - [ ] User acceptance testing
76
+ - [ ] Documentation updates
77
+ - [ ] Production deployment
78
+
79
+ ### Deliverables:
80
+ - Fully integrated enhanced application
81
+ - Performance benchmarks
82
+ - User documentation
83
+
84
+ ---
85
+
86
+ ## Success Metrics
87
+
88
+ ### Performance Targets:
89
+ - ⚡ 50% faster data loading
90
+ - 💾 40-60% data size reduction
91
+ - 🧠 <3 second RAG query response
92
+ - 📊 >90% query accuracy
93
+
94
+ ### User Experience:
95
+ - 🎯 Natural language interaction
96
+ - 📈 Enhanced analytics insights
97
+ - 🔍 Improved data discovery
98
+ - 🚀 Better overall performance
99
+
100
+ ---
101
+
102
+ ## Notes
103
+ - All enhancements maintain backward compatibility
104
+ - Modular design for easy integration
105
+ - Focus on user experience and performance
106
+ - Scalable architecture for future growth
107
+
108
+ ---
109
+
110
+ *Last Updated: December 2024*
111
+ *Next Review: [Date]*
src/README.md ADDED
@@ -0,0 +1,2 @@
1
+ # reparos
2
+ Repairs
src/README_RAG.md ADDED
@@ -0,0 +1,240 @@
1
+ # 🤖 DigiTwin RAG Assistant
2
+
3
+ A comprehensive Retrieval-Augmented Generation (RAG) system integrated into the DigiTwin FPSO notifications analysis platform. This system provides intelligent conversational AI capabilities to query and analyze your notifications data using natural language.
4
+
5
+ ## 🚀 Features
6
+
7
+ ### Core RAG Capabilities
8
+ - **Hybrid Search**: Combines semantic and keyword-based search for optimal retrieval
9
+ - **Query Rewriting**: Intelligently reformulates user queries for better results
10
+ - **Streaming Responses**: Real-time token-by-token response generation
11
+ - **Pivot Analysis Integration**: Incorporates existing analytics into responses
12
+ - **Multi-LLM Support**: Works with Groq API and local Ollama models
13
+
14
+ ### Technical Features
15
+ - **Vector Databases**: Support for Weaviate and FAISS
16
+ - **Embedding Models**: Sentence Transformers for semantic understanding
17
+ - **Modern Chat Interface**: Streamlit-based chat UI with message history
18
+ - **Error Handling**: Graceful fallbacks and informative error messages
19
+ - **Modular Design**: Clean separation of concerns and easy extensibility
20
+
21
+ ## 📋 Prerequisites
22
+
23
+ - Python 3.8 or higher
24
+ - Streamlit application with notifications data
25
+ - Internet connection (for Groq API)
26
+ - Optional: Ollama for local LLM inference
27
+ - Optional: Docker for Weaviate vector database
28
+
29
+ ## 🛠️ Installation
30
+
31
+ ### Quick Setup
32
+ ```bash
33
+ # Run the automated setup script
34
+ python setup_rag.py
35
+ ```
36
+
37
+ ### Manual Installation
38
+ ```bash
39
+ # Install RAG dependencies
40
+ pip install -r requirements_rag.txt
41
+
42
+ # Or install individual packages
43
+ pip install sentence-transformers faiss-cpu weaviate-client groq ollama
44
+ ```
45
+
46
+ ### Environment Configuration
47
+ 1. Create a `.env` file in the project root:
48
+ ```bash
49
+ # Groq API Configuration
50
+ GROQ_API_KEY=your_groq_api_key_here
51
+
52
+ # Ollama Configuration (optional)
53
+ OLLAMA_HOST=http://localhost:11434
54
+
55
+ # Vector Database Configuration (optional)
56
+ WEAVIATE_URL=http://localhost:8080
57
+
58
+ # Embedding Model Configuration
59
+ EMBEDDING_MODEL=all-MiniLM-L6-v2
60
+ ```
61
+
62
+ 2. Get your Groq API key from [console.groq.com](https://console.groq.com/)
63
+
64
+ ## 🚀 Usage
65
+
66
+ ### Starting the Application
67
+ ```bash
68
+ streamlit run notifs.py
69
+ ```
70
+
71
+ ### Using the RAG Assistant
72
+ 1. Upload your notifications data or load from database
73
+ 2. Navigate to the "🤖 RAG Assistant" tab
74
+ 3. Start asking questions in natural language!
75
+
76
+ ### Example Queries
77
+ ```
78
+ "Which FPSO has the most NI notifications?"
79
+ "What are the common keywords in PAZ notifications?"
80
+ "Show me all safety-related notifications from last month"
81
+ "Compare notification patterns between GIR and DAL"
82
+ "What equipment has the most maintenance issues?"
83
+ "Which work centers require immediate attention?"
84
+ ```
85
+
86
+ ## 🏗️ Architecture
87
+
88
+ ### System Components
89
+
90
+ ```
91
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
92
+ │ User Query │───▶│ Query Rewriter │───▶│ Hybrid Search │
93
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
94
+
95
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
96
+ │ Pivot Analysis │◀───│ RAG Prompt │◀───│ Context Docs │
97
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
98
+
99
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
100
+ │ LLM Response │◀───│ Response Gen │◀───│ Vector Store │
101
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
102
+ ```
103
+
104
+ ### Data Flow
105
+ 1. **Query Input**: User submits natural language query
106
+ 2. **Query Rewriting**: LLM reformulates query for better retrieval
107
+ 3. **Hybrid Search**: Combines semantic and keyword search
108
+ 4. **Context Retrieval**: Fetches relevant documents and pivot analysis
109
+ 5. **Prompt Engineering**: Creates optimized RAG prompt
110
+ 6. **Response Generation**: LLM generates streaming response
111
+ 7. **Display**: Real-time response display in chat interface
112
+
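+ In code, one pass through this flow could look roughly like the sketch below. `hybrid_search()` and `create_rag_prompt()` are the methods named later in this document; `rewrite_query()` and `generate_response()` are assumed names for the rewriting and generation steps:
+
+ ```python
+ from rag_chatbot import DigiTwinRAG
+
+ rag = DigiTwinRAG()
+
+ def answer(user_query):
+     """One non-streaming pass through the RAG pipeline."""
+     rewritten = rag.rewrite_query(user_query)            # step 2: query rewriting (assumed method name)
+     docs = rag.hybrid_search(rewritten, k=5)             # steps 3-4: hybrid retrieval
+     prompt = rag.create_rag_prompt(rewritten, docs, pivot_analysis="")  # step 5: prompt engineering
+     return rag.generate_response(prompt)                 # step 6: response generation (assumed method name)
+ ```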
113
+ ## 🔧 Configuration
114
+
115
+ ### LLM Models
116
+ - **Groq**: Fast inference with Llama3-8b-8192 model
117
+ - **Ollama**: Local inference with customizable models
118
+
119
+ ### Vector Databases
120
+ - **FAISS**: Lightweight, in-memory vector search
121
+ - **Weaviate**: Production-ready vector database with Docker
122
+
123
+ ### Embedding Models
124
+ - **all-MiniLM-L6-v2**: Fast, efficient sentence embeddings
125
+ - **Customizable**: Easy to switch to other models
126
+
127
+ ## 📊 Performance
128
+
129
+ ### Expected Performance
130
+ - **Query Response Time**: <3 seconds for most queries
131
+ - **Memory Usage**: Optimized for large datasets
132
+ - **Accuracy**: >90% relevance for retrieved documents
133
+ - **Scalability**: Handles thousands of notifications efficiently
134
+
135
+ ### Optimization Features
136
+ - **Data Preprocessing**: Removes unnecessary columns
137
+ - **Memory Optimization**: Efficient data types and structures
138
+ - **Caching**: Vector embeddings and search results
139
+ - **Streaming**: Real-time response generation
140
+
141
+ ## 🐛 Troubleshooting
142
+
143
+ ### Common Issues
144
+
145
+ #### "RAG module not available"
146
+ ```bash
147
+ # Install dependencies
148
+ pip install -r requirements_rag.txt
149
+ ```
150
+
151
+ #### "Groq API key not found"
152
+ ```bash
153
+ # Set environment variable
154
+ export GROQ_API_KEY=your_api_key_here
155
+ ```
156
+
157
+ #### "Vector database connection failed"
158
+ ```bash
159
+ # Start Weaviate (optional)
160
+ docker run -d -p 8080:8080 semitechnologies/weaviate:1.22.4
161
+ ```
162
+
163
+ #### "Embedding model loading failed"
164
+ ```bash
165
+ # Check internet connection and try again
166
+ # The model will download automatically on first use
167
+ ```
168
+
169
+ ### Debug Mode
170
+ Enable debug logging by setting:
171
+ ```bash
172
+ export STREAMLIT_LOG_LEVEL=debug
173
+ ```
174
+
175
+ ## 🔄 Development
176
+
177
+ ### Adding New Features
178
+ 1. **Custom Embeddings**: Modify `create_embeddings()` method
179
+ 2. **New LLM Providers**: Extend `initialize_llm_clients()` method
180
+ 3. **Additional Search**: Enhance `hybrid_search()` method
181
+ 4. **UI Improvements**: Modify `render_chat_interface()` function
182
+
183
+ ### Testing
184
+ ```bash
185
+ # Run setup tests
186
+ python setup_rag.py
187
+
188
+ # Test individual components
189
+ python -c "from rag_chatbot import DigiTwinRAG; rag = DigiTwinRAG()"
190
+ ```
191
+
192
+ ## 📈 Advanced Usage
193
+
194
+ ### Custom Prompts
195
+ Modify the RAG prompt template in the `create_rag_prompt()` method:
196
+ ```python
197
+ def create_rag_prompt(self, query: str, context: List[Dict[str, Any]], pivot_analysis: str) -> str:
198
+ # Customize prompt engineering here
199
+ pass
200
+ ```
201
+
202
+ ### Adding New Data Sources
203
+ Extend the data loading in the `load_notifications_data()` method:
204
+ ```python
205
+ def load_notifications_data(self) -> pd.DataFrame:
206
+ # Add support for new data sources
207
+ pass
208
+ ```
209
+
210
+ ### Custom Search Strategies
211
+ Enhance the hybrid search in the `hybrid_search()` method:
212
+ ```python
213
+ def hybrid_search(self, query: str, k: int = 5) -> List[Dict[str, Any]]:
214
+ # Add custom search algorithms
215
+ pass
216
+ ```
217
+
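+ One simple fusion strategy is to blend the semantic similarity with a keyword-overlap score; the 70/30 weighting below is an arbitrary starting point, not a tuned value:
+
+ ```python
+ def fuse_scores(semantic_score, query, document_text, alpha=0.7):
+     """Blend semantic similarity with keyword overlap; alpha weights the semantic part."""
+     query_terms = set(query.lower().split())
+     doc_terms = set(document_text.lower().split())
+     keyword_score = len(query_terms & doc_terms) / max(len(query_terms), 1)
+     return alpha * semantic_score + (1 - alpha) * keyword_score
+ ```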
218
+ ## 🤝 Contributing
219
+
220
+ 1. Fork the repository
221
+ 2. Create a feature branch
222
+ 3. Make your changes
223
+ 4. Add tests
224
+ 5. Submit a pull request
225
+
226
+ ## 📄 License
227
+
228
+ This project is part of the DigiTwin platform and follows the same licensing terms.
229
+
230
+ ## 🆘 Support
231
+
232
+ For support and questions:
233
+ - Check the troubleshooting section
234
+ - Review the example queries
235
+ - Test with the setup script
236
+ - Contact the development team
237
+
238
+ ---
239
+
240
+ **🚀 Built with Pride - STP/INSP/MET | Powered by ValonyLabs**
src/clv.py ADDED
@@ -0,0 +1,55 @@
1
+ # clv.py
2
+
3
+ # CLV-specific keywords and location dictionaries
4
+ clv_module_keywords = ['M110', 'M111', 'M112', 'M113', 'M114', 'M115', 'M116', 'H151',
5
+ 'M120', 'M121', 'M122', 'M123', 'M124', 'M125', 'M126', 'M151']
6
+ clv_rack_keywords = ['141', '142', '143', '144', '145', '146']
7
+ clv_living_quarters_keywords = ['LQ', 'LQ1', 'LQ2', 'LQ3', 'LQ4', 'LQL0', 'LQPS', 'LQSB', 'LQROOF', 'LQL4', 'LQL2', 'LQ-5', 'LQPD', 'LQ PS', 'LQAFT', 'LQ-T', 'LQL1S']
8
+ clv_flare_keywords = ['131']
9
+ clv_fwd_keywords = ['FWD']
10
+ clv_hexagons_keywords = ['HELIDECK']
11
+
12
+ clv_modules = {
13
+ 'M120': (0.75, 2), 'M121': (0.5, 3), 'M122': (0.5, 4), 'M123': (0.5, 5),
14
+ 'M124': (0.5, 6), 'M125': (0.5, 7), 'M126': (0.5, 8), 'M151': (0.5, 9), 'M110': (1.75, 2),
15
+ 'M111': (2, 3), 'M112': (2, 4), 'M113': (2, 5), 'M114': (2, 6),
16
+ 'M115': (2, 7), 'M116': (2, 8), 'H151': (2, 9)
17
+ }
18
+ clv_racks = {
19
+ '141': (1.5, 3), '142': (1.5, 4), '143': (1.5, 5),
20
+ '144': (1.5, 6), '145': (1.5, 7), '146': (1.5, 8)
21
+ }
22
+ clv_flare = {'131': (1.5, 9)}
23
+ clv_living_quarters = {'LQ': (0.5, 1)}
24
+ clv_hexagons = {'HELIDECK': (2.75, 1)}
25
+ clv_fwd = {'FWD': (0.5, 10)}
26
+
27
+ def draw_clv(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd):
28
+ for module, (row, col) in clv_modules.items():
29
+ if module == 'M110':
30
+ height, y_position, text_y = 1.25, row, row + 0.5
31
+ elif module == 'M120':
32
+ height, y_position, text_y = 1.25, row - 0.25, row + 0.25
33
+ else:
34
+ height, y_position, text_y = 1, row, row + 0.5
35
+ add_chamfered_rectangle(ax, (col, y_position), 1, height, 0.1, edgecolor='black', facecolor='white')
36
+ ax.text(col + 0.5, text_y, module, ha='center', va='center', fontsize=7, weight='bold')
37
+
38
+ for rack, (row, col) in clv_racks.items():
39
+ add_chamfered_rectangle(ax, (col, row), 1, 0.5, 0.05, edgecolor='black', facecolor='white')
40
+ ax.text(col + 0.5, row + 0.25, rack, ha='center', va='center', fontsize=7, weight='bold')
41
+
42
+ for flare_loc, (row, col) in clv_flare.items():
43
+ add_chamfered_rectangle(ax, (col, row), 1, 0.5, 0.05, edgecolor='black', facecolor='white')
44
+ ax.text(col + 0.5, row + 0.25, flare_loc, ha='center', va='center', fontsize=7, weight='bold')
45
+
46
+ for living_quarter, (row, col) in clv_living_quarters.items():
47
+ add_rectangle(ax, (col, row), 1, 2.5, edgecolor='black', facecolor='white')
48
+ ax.text(col + 0.5, row + 1.25, living_quarter, ha='center', va='center', fontsize=7, rotation=90, weight='bold')
49
+
50
+ for hexagon, (row, col) in clv_hexagons.items():
51
+ add_hexagon(ax, (col, row), 0.60, edgecolor='black', facecolor='white')
52
+ ax.text(col, row, hexagon, ha='center', va='center', fontsize=7, weight='bold')
53
+
54
+ for fwd_loc, (row, col) in clv_fwd.items():
55
+ add_fwd(ax, (col, row), 2.5, -1, edgecolor='black', facecolor='white')
src/dal.py ADDED
@@ -0,0 +1,60 @@
1
+ # dal.py
2
+
3
+ # DAL-specific keywords and location dictionaries
4
+ dal_module_keywords = ['P11', 'P21', 'P31', 'P41', 'P51', 'P61', 'P12', 'P22', 'P32', 'P42', 'P52', 'P62']
5
+ dal_rack_keywords = ['R11', 'R12', 'R13', 'R14', 'R15', 'R16']
6
+ dal_living_quarters_keywords = ['LQ', 'LQ1', 'LQ2', 'LQ3', 'LQ4', 'LQL0', 'LQPS', 'LQSB', 'LQROOF', 'LQL4', 'LQL2', 'LQ-5', 'LQPD', 'LQ PS', 'LQAFT', 'LQ-T', 'LQL1S']
7
+ dal_flare_keywords = ['FLARE']
8
+ dal_fwd_keywords = ['FWD']
9
+ dal_hexagons_keywords = ['HELIDECK']
10
+
11
+ dal_modules = {
12
+ 'P11': (0.5, 2), 'P21': (0.5, 3), 'P31': (0.5, 4), 'P41': (0.5, 5),
13
+ 'P51': (0.5, 6), 'P61': (0.5, 7), 'P12': (2, 2), 'P22': (2, 3),
14
+ 'P32': (2, 4), 'P42': (2, 5), 'P52': (2, 6), 'P62': (2, 7)
15
+ }
16
+ dal_racks = {
17
+ 'R11': (1.5, 2), 'R12': (1.5, 3), 'R13': (1.5, 4),
18
+ 'R14': (1.5, 5), 'R15': (1.5, 6), 'R16': (1.5, 7)
19
+ }
20
+ dal_flare = {'FLARE': (0.5, 8)}
21
+ dal_living_quarters = {'LQ': (0.5, 1)}
22
+ dal_hexagons = {'HELIDECK': (2.75, 1)}
23
+ dal_fwd = {'FWD': (0.5, 8.75)}
24
+
25
+ def draw_dal(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd):
26
+ for module, (row, col) in dal_modules.items():
27
+ if module == 'P11':
28
+ height, y_position, text_y = 1, row, row + 0.5
29
+ elif module == 'P12':
30
+ height, y_position, text_y = 1, row, row + 0.25
31
+ else:
32
+ height, y_position, text_y = 1, row, row + 0.5
33
+ add_chamfered_rectangle(ax, (col, y_position), 1, height, 0.1, edgecolor='black', facecolor='white')
34
+ ax.text(col + 0.5, text_y, module, ha='center', va='center', fontsize=7, weight='bold')
35
+
36
+ for rack, (row, col) in dal_racks.items():
37
+ add_chamfered_rectangle(ax, (col, row), 1, 0.5, 0.05, edgecolor='black', facecolor='white')
38
+ ax.text(col + 0.5, row + 0.25, rack, ha='center', va='center', fontsize=7, weight='bold')
39
+
40
+ for flare_loc, (row, col) in dal_flare.items():
41
+ add_chamfered_rectangle(ax, (col, row), 0.75, 2.5, 0.05, edgecolor='black', facecolor='white')
42
+ ax.text(col + 0.35, row + 1.25, flare_loc, ha='center', va='center', fontsize=7, weight='bold')
43
+
44
+
45
+ for living_quarter, (row, col) in dal_living_quarters.items():
46
+ add_rectangle(ax, (col, row), 1, 2.5, edgecolor='black', facecolor='white')
47
+ ax.text(col + 0.5, row + 1.25, living_quarter, ha='center', va='center', fontsize=7, rotation=90, weight='bold')
48
+
49
+ for hexagon, (row, col) in dal_hexagons.items():
50
+ add_hexagon(ax, (col, row), 0.60, edgecolor='black', facecolor='white')
51
+ ax.text(col, row, hexagon, ha='center', va='center', fontsize=7, weight='bold')
52
+
53
+ for fwd_loc, (row, col) in dal_fwd.items():
54
+ add_fwd(ax, (col, row), 2.5, -1, edgecolor='black', facecolor='white')
55
+
56
+
57
+ #
58
+ #add_chamfered_rectangle(ax, (col, row), 1, 0.5, 0.05, edgecolor='black', facecolor='white')
59
+ #ax.text(col + 0.5, row + 0.25, flare_loc, ha='center', va='center', fontsize=7, weight='bold')
60
+
src/gir.py ADDED
@@ -0,0 +1,20 @@
1
+ # gir.py
2
+
3
+ # GIR-specific keywords and location dictionaries (placeholder values, update as needed)
4
+ gir_module_keywords = []
5
+ gir_rack_keywords = []
6
+ gir_living_quarters_keywords = []
7
+ gir_flare_keywords = []
8
+ gir_fwd_keywords = []
9
+ gir_hexagons_keywords = []
10
+
11
+ gir_modules = {}
12
+ gir_racks = {}
13
+ gir_flare = {}
14
+ gir_living_quarters = {}
15
+ gir_hexagons = {}
16
+ gir_fwd = {}
17
+
18
+ def draw_gir(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd):
19
+ # TODO: Implement GIR drawing logic based on actual requirements
20
+ pass
src/notifs.py ADDED
@@ -0,0 +1,948 @@
1
+ import streamlit as st
2
+ import pandas as pd
3
+ import numpy as np
4
+ import matplotlib.pyplot as plt
5
+ import matplotlib.patches as patches
6
+ import math
7
+ import matplotlib.transforms as transforms
8
+ import sqlite3
9
+
10
+ # Import FPSO-specific modules
11
+ from clv import *
12
+ from paz import *
13
+ from dal import *
14
+ from gir import *
15
+ # Import shared utilities
16
+ # Remove these imports:
17
+ # from utils import preprocess_keywords, extract_ni_nc_keywords, extract_location_keywords
18
+
19
+ # --- UI CONFIG & STYLE ---
20
+ st.set_page_config(page_title="B17 - Notifications", layout="wide")
21
+
22
+ st.markdown("""
23
+ <style>
24
+ @import url('https://fonts.cdnfonts.com/css/tw-cen-mt');
25
+ * {
26
+ font-family: 'Tw Cen MT', sans-serif !important;
27
+ }
28
+
29
+ /* Sidebar arrow fix */
30
+ section[data-testid="stSidebar"] [data-testid="stSidebarNav"]::before {
31
+ content: "▶";
32
+ font-size: 1.3rem;
33
+ margin-right: 0.4rem;
34
+ }
35
+
36
+ /* Fix sidebar expander layout */
37
+ section[data-testid="stSidebar"] [data-testid="stExpander"] {
38
+ margin-bottom: 1rem;
39
+ }
40
+
41
+ section[data-testid="stSidebar"] [data-testid="stExpander"] [data-testid="stExpanderHeader"] {
42
+ padding: 0.5rem 0.75rem;
43
+ font-size: 0.9rem;
44
+ line-height: 1.2;
45
+ word-wrap: break-word;
46
+ overflow-wrap: break-word;
47
+ }
48
+
49
+ section[data-testid="stSidebar"] [data-testid="stExpander"] [data-testid="stExpanderContent"] {
50
+ padding: 0.5rem 0.75rem;
51
+ }
52
+
53
+ /* Ensure proper spacing for sidebar elements */
54
+ section[data-testid="stSidebar"] .stMarkdown {
55
+ margin-bottom: 0.5rem;
56
+ }
57
+
58
+ section[data-testid="stSidebar"] .stButton {
59
+ margin-top: 0.5rem;
60
+ }
61
+
62
+ /* Ensure sidebar has proper width */
63
+ section[data-testid="stSidebar"] {
64
+ min-width: 300px;
65
+ }
66
+
67
+ /* Improve expander content readability */
68
+ section[data-testid="stSidebar"] [data-testid="stExpander"] .stMarkdown {
69
+ font-size: 0.85rem;
70
+ line-height: 1.3;
71
+ }
72
+
73
+ section[data-testid="stSidebar"] [data-testid="stExpander"] .stMarkdown p {
74
+ margin-bottom: 0.25rem;
75
+ }
76
+
77
+ /* Top-right logo placement - responsive to scrolling */
78
+ .logo-container {
79
+ position: absolute;
80
+ top: 1rem;
81
+ right: 2rem;
82
+ z-index: 1000;
83
+ transition: all 0.3s ease;
84
+ }
85
+
86
+ /* Adjust logo position when scrolling */
87
+ .logo-container.scrolled {
88
+ position: fixed;
89
+ top: 0.5rem;
90
+ right: 1rem;
91
+ transform: scale(0.8);
92
+ }
93
+
94
+ /* Ensure main content doesn't overlap with logo */
95
+ .main .block-container {
96
+ padding-top: 2rem !important;
97
+ }
98
+
99
+ /* Smooth transitions for logo */
100
+ .logo-container img {
101
+ transition: all 0.3s ease;
102
+ }
103
+
104
+ /* Logo hover effect */
105
+ .logo-container:hover {
106
+ transform: scale(1.05);
107
+ }
108
+
109
+ .logo-container.scrolled:hover {
110
+ transform: scale(0.85);
111
+ }
112
+ </style>
113
+ """, unsafe_allow_html=True)
114
+
115
+ # Display logo (responsive to scrolling)
116
+ st.markdown(
117
+ """
118
+ <div class="logo-container" id="logo-container">
119
+ <img src="https://github.com/valonys/DigiTwin/blob/29dd50da95bec35a5abdca4bdda1967f0e5efff6/ValonyLabs_Logo.png?raw=true" width="70">
120
+ </div>
121
+
122
+ <script>
123
+ // Handle logo positioning on scroll
124
+ window.addEventListener('scroll', function() {
125
+ const logo = document.getElementById('logo-container');
126
+ if (window.scrollY > 100) {
127
+ logo.classList.add('scrolled');
128
+ } else {
129
+ logo.classList.remove('scrolled');
130
+ }
131
+ });
132
+
133
+ // Initial check for scroll position
134
+ document.addEventListener('DOMContentLoaded', function() {
135
+ const logo = document.getElementById('logo-container');
136
+ if (window.scrollY > 100) {
137
+ logo.classList.add('scrolled');
138
+ }
139
+ });
140
+ </script>
141
+ """,
142
+ unsafe_allow_html=True
143
+ )
144
+
145
+ st.title("📊 DigiTwin - The Inspekta Deck")
146
+
147
+ # --- AVATARS ---
148
+ USER_AVATAR = "https://raw.githubusercontent.com/achilela/vila_fofoka_analysis/9904d9a0d445ab0488cf7395cb863cce7621d897/USER_AVATAR.png"
149
+ BOT_AVATAR = "https://raw.githubusercontent.com/achilela/vila_fofoka_analysis/991f4c6e4e1dc7a8e24876ca5aae5228bcdb4dba/Ataliba_Avatar.jpg"
150
+
151
+ # --- FAST LOCAL PREPROCESSING FUNCTIONS ---
152
+ def preprocess_keywords(description):
153
+ description = str(description).upper()
154
+ for lq_variant in clv_living_quarters_keywords:
155
+ if lq_variant != 'LQ':
156
+ description = description.replace(lq_variant, 'LQ')
157
+ for module in clv_module_keywords:
158
+ number = module[1:]
159
+ if number in description and module not in description:
160
+ description = description.replace(number, module)
161
+ # PAZ and DAL module/rack codes already appear verbatim in descriptions, so no replacement is needed for them here
173
+ # If you use NI_keyword_map and NC_keyword_map, add them here as well
174
+ return description
175
+
176
+ def extract_ni_nc_keywords(row, notif_type_col, desc_col):
177
+ description = preprocess_keywords(row[desc_col])
178
+ notif_type = row[notif_type_col]
179
+ if notif_type == 'NI':
180
+ keywords = [kw for kw in NI_keywords if kw in description]
181
+ elif notif_type == 'NC':
182
+ keywords = [kw for kw in NC_keywords if kw in description]
183
+ else:
184
+ keywords = []
185
+ return ', '.join(keywords) if keywords else 'None'
186
+
187
+ def extract_location_keywords(row, desc_col, keyword_list):
188
+ description = preprocess_keywords(row[desc_col])
189
+ if keyword_list == clv_living_quarters_keywords:
190
+ return 'LQ' if any(kw in description for kw in clv_living_quarters_keywords) else 'None'
191
+ else:
192
+ locations = [kw for kw in keyword_list if kw in description]
193
+ return ', '.join(locations) if locations else 'None'
194
+
195
+ def create_pivot_table(df, index, columns, aggfunc='size', fill_value=0):
196
+ """Create pivot table from dataframe"""
197
+ df_exploded = df.assign(Keywords=df[columns].str.split(', ')).explode('Keywords')
198
+ df_exploded = df_exploded[df_exploded['Keywords'] != 'None']
199
+ pivot = pd.pivot_table(df_exploded, index=index, columns='Keywords', aggfunc=aggfunc, fill_value=fill_value)
200
+ return pivot
201
+
202
+ def apply_fpso_colors(df):
203
+ """Apply color styling to FPSO dataframe"""
204
+ styles = pd.DataFrame('', index=df.index, columns=df.columns)
205
+ color_map = {'GIR': '#FFA07A', 'DAL': '#ADD8E6', 'PAZ': '#D8BFD8', 'CLV': '#90EE90'}
206
+ for fpso, color in color_map.items():
207
+ if fpso in df.index:
208
+ styles.loc[fpso] = f'background-color: {color}'
209
+ return styles
210
+
211
+ def add_rectangle(ax, xy, width, height, **kwargs):
212
+ rectangle = patches.Rectangle(xy, width, height, **kwargs)
213
+ ax.add_patch(rectangle)
214
+
215
+ def add_chamfered_rectangle(ax, xy, width, height, chamfer, **kwargs):
216
+ x, y = xy
217
+ coords = [
218
+ (x + chamfer, y),
219
+ (x + width - chamfer, y),
220
+ (x + width, y + chamfer),
221
+ (x + width, y + height - chamfer),
222
+ (x + width - chamfer, y + height),
223
+ (x + chamfer, y + height),
224
+ (x, y + height - chamfer),
225
+ (x, y + chamfer)
226
+ ]
227
+ polygon = patches.Polygon(coords, closed=True, **kwargs)
228
+ ax.add_patch(polygon)
229
+
230
+ def add_hexagon(ax, xy, radius, **kwargs):
231
+ x, y = xy
232
+ vertices = [(x + radius * math.cos(2 * math.pi * n / 6), y + radius * math.sin(2 * math.pi * n / 6)) for n in range(6)]
233
+ hexagon = patches.Polygon(vertices, closed=True, **kwargs)
234
+ ax.add_patch(hexagon)
235
+
236
+ def add_fwd(ax, xy, width, height, **kwargs):
237
+ x, y = xy
238
+ top_width = width * 0.80
239
+ coords = [
240
+ (0, 0),
241
+ (width, 0),
242
+ (width - (width - top_width) / 2, height),
243
+ ((width - top_width) / 2, height)
244
+ ]
245
+ trapezoid = patches.Polygon(coords, closed=True, **kwargs)
246
+ t = transforms.Affine2D().rotate_deg(90).translate(x, y)
247
+ trapezoid.set_transform(t + ax.transData)
248
+ ax.add_patch(trapezoid)
249
+ text_t = transforms.Affine2D().rotate_deg(90).translate(x + height / 2, y + width / 2)
250
+ ax.text(0, -1, "FWD", ha='center', va='center', fontsize=7, weight='bold', transform=text_t + ax.transData)
251
+
252
+ # Sidebar file upload and FPSO selection
253
+ st.sidebar.title("Upload Notifications Dataset")
254
+
255
+ # Add database loading option
256
+ load_from_db = st.sidebar.checkbox("Load from Database", help="Load previously uploaded data from database")
257
+
258
+ # Add preprocessing option
259
+ enable_preprocessing = st.sidebar.checkbox("Enable Data Preprocessing", value=True,
260
+ help="Remove unnecessary columns and optimize memory usage")
261
+
262
+ uploaded_file = st.sidebar.file_uploader("Choose an Excel file", type=["xlsx"])
263
+
264
+ # Add FPSO selection dropdown in the sidebar
265
+ selected_fpso = st.sidebar.selectbox("Select FPSO for Layout", ['GIR', 'DAL', 'PAZ', 'CLV'])
266
+
267
+
268
+
269
+ # NI/NC keywords (if not already in utils.py, move them there)
270
+ NI_keywords = ['WRAP', 'WELD', 'TBR', 'PACH', 'PATCH', 'OTHE', 'CLMP', 'REPL',
271
+ 'BOND', 'BOLT', 'SUPP', 'OT', 'GASK', 'CLAMP']
272
+ NC_keywords = ['COA', 'ICOA', 'CUSP', 'WELD', 'REPL', 'CUSP1', 'CUSP2']
273
+
274
+ DB_PATH = 'notifs_data.db'
275
+ TABLE_NAME = 'notifications'
276
+
277
+ # Utility to save DataFrame to SQLite
278
+ def save_df_to_db(df, db_path=DB_PATH, table_name=TABLE_NAME):
279
+ with sqlite3.connect(db_path) as conn:
280
+ df.to_sql(table_name, conn, if_exists='replace', index=False)
281
+ # Save timestamp
282
+ from datetime import datetime
283
+ timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
284
+ conn.execute("CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT)")
285
+ conn.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?)", ('last_updated', timestamp))
286
+
287
+ # Utility to load DataFrame from SQLite
288
+ def load_df_from_db(db_path=DB_PATH, table_name=TABLE_NAME):
289
+ with sqlite3.connect(db_path) as conn:
290
+ try:
291
+ return pd.read_sql(f'SELECT * FROM {table_name}', conn)
292
+ except Exception:
293
+ return None
294
+
295
+ # Utility to get last update timestamp
296
+ def get_last_update_time(db_path=DB_PATH):
297
+ with sqlite3.connect(db_path) as conn:
298
+ try:
299
+ result = conn.execute("SELECT value FROM metadata WHERE key = 'last_updated'").fetchone()
300
+ return result[0] if result else None
301
+ except Exception:
302
+ return None
303
+
304
+ # Data Preprocessing Function
305
+ def preprocess_notifications_data(df):
306
+ """
307
+ Preprocess notification data to reduce size and improve performance
308
+ by removing unnecessary columns and optimizing memory usage.
309
+ """
310
+ # Store original shape for comparison
311
+ original_shape = df.shape
312
+ original_memory = df.memory_usage(deep=True).sum()
313
+
314
+ # Remove unnecessary columns to improve memory footprint
315
+ columns_to_remove = [
316
+ 'Priority', # Redundant priority information
317
+ 'Notification', # Duplicate notification data
318
+ 'Order', # Order information not needed for analytics
319
+ 'Planner group' # Planner group metadata
320
+ ]
321
+
322
+ # Remove specified columns (ignore if they don't exist)
323
+ df_cleaned = df.drop(columns=columns_to_remove, errors='ignore')
324
+
325
+ # Remove columns with high percentage of null values (>80%)
326
+ null_percentage = df_cleaned.isnull().sum() / len(df_cleaned) * 100
327
+ high_null_columns = null_percentage[null_percentage > 80].index.tolist()
328
+ df_cleaned = df_cleaned.drop(columns=high_null_columns)
329
+
330
+ # Remove duplicate rows
331
+ df_cleaned = df_cleaned.drop_duplicates()
332
+
333
+ # Optimize data types for memory efficiency
334
+ for col in df_cleaned.columns:
335
+ if df_cleaned[col].dtype == 'object':
336
+ # Convert object columns to category if they have few unique values
337
+ if df_cleaned[col].nunique() / len(df_cleaned) < 0.5:
338
+ df_cleaned[col] = df_cleaned[col].astype('category')
339
+ elif df_cleaned[col].dtype == 'int64':
340
+ # Downcast integers
341
+ df_cleaned[col] = pd.to_numeric(df_cleaned[col], downcast='integer')
342
+ elif df_cleaned[col].dtype == 'float64':
343
+ # Downcast floats
344
+ df_cleaned[col] = pd.to_numeric(df_cleaned[col], downcast='float')
345
+
346
+ # Calculate improvements
347
+ final_shape = df_cleaned.shape
348
+ final_memory = df_cleaned.memory_usage(deep=True).sum()
349
+
350
+ # Create summary of preprocessing results
351
+ preprocessing_summary = {
352
+ 'original_rows': original_shape[0],
353
+ 'original_cols': original_shape[1],
354
+ 'final_rows': final_shape[0],
355
+ 'final_cols': final_shape[1],
356
+ 'rows_removed': original_shape[0] - final_shape[0],
357
+ 'cols_removed': original_shape[1] - final_shape[1],
358
+ 'original_memory_mb': original_memory / 1024 / 1024,
359
+ 'final_memory_mb': final_memory / 1024 / 1024,
360
+ 'memory_reduction_mb': (original_memory - final_memory) / 1024 / 1024,
361
+ 'memory_reduction_percent': ((original_memory - final_memory) / original_memory) * 100,
362
+ 'removed_columns': columns_to_remove + high_null_columns
363
+ }
364
+
365
+ return df_cleaned, preprocessing_summary
366
+
367
+ # Data Management Section
368
+ st.sidebar.markdown("---")
369
+ st.sidebar.subheader("Data Management")
370
+
371
+ # Check if data exists in database
372
+ existing_data = load_df_from_db()
373
+ if existing_data is not None:
374
+ st.sidebar.info(f"📊 Database contains {len(existing_data)} records")
375
+
376
+ # Show last update time
377
+ last_update = get_last_update_time()
378
+ if last_update:
379
+ st.sidebar.caption(f"🕒 Last updated: {last_update}")
380
+
381
+ # Show data summary
382
+ with st.sidebar.expander("Data Summary"):
383
+ if 'FPSO' in existing_data.columns:
384
+ fpsos = existing_data['FPSO'].value_counts()
385
+ st.write("**FPSO Distribution:**")
386
+ for fpso, count in fpsos.items():
387
+ st.write(f"• {fpso}: {count}")
388
+
389
+ if 'Notifictn type' in existing_data.columns:
390
+ notif_types = existing_data['Notifictn type'].value_counts()
391
+ st.write("**Notification Types:**")
392
+ for ntype, count in notif_types.items():
393
+ st.write(f"• {ntype}: {count}")
394
+
395
+ # Add clear database option
396
+ if st.sidebar.button("🗑️ Clear Database"):
397
+ import os
398
+ if os.path.exists(DB_PATH):
399
+ os.remove(DB_PATH)
400
+ st.sidebar.success("Database cleared successfully!")
401
+ st.rerun()
402
+ else:
403
+ st.sidebar.warning("No data in database")
404
+
405
+
406
+
407
+ # Main app logic
408
+ if uploaded_file is not None or load_from_db:
409
+ try:
410
+ if load_from_db:
411
+ df = load_df_from_db()
412
+ if df is None:
413
+ st.warning("No data found in the database. Please upload a new file or ensure it's saved.")
414
+ st.stop()
415
+ else:
416
+ st.success("📊 Data loaded from database successfully!")
417
+ else:
418
+ # Read the Excel file
419
+ df = pd.read_excel(uploaded_file, sheet_name='Global Notifications')
420
+
421
+ # Apply data preprocessing if enabled
422
+ if enable_preprocessing:
423
+ st.info("🔄 Preprocessing data to optimize performance...")
424
+ df, preprocessing_summary = preprocess_notifications_data(df)
425
+
426
+ # Display preprocessing results
427
+ with st.expander("📊 Data Preprocessing Summary", expanded=True):
428
+ col1, col2, col3 = st.columns(3)
429
+ with col1:
430
+ st.metric("Rows", f"{preprocessing_summary['final_rows']:,}",
431
+ f"-{preprocessing_summary['rows_removed']:,}")
432
+ with col2:
433
+ st.metric("Columns", f"{preprocessing_summary['final_cols']}",
434
+ f"-{preprocessing_summary['cols_removed']}")
435
+ with col3:
436
+ st.metric("Memory", f"{preprocessing_summary['final_memory_mb']:.1f} MB",
437
+ f"-{preprocessing_summary['memory_reduction_mb']:.1f} MB")
438
+
439
+ st.write(f"**Memory reduction:** {preprocessing_summary['memory_reduction_percent']:.1f}%")
440
+
441
+ if preprocessing_summary['removed_columns']:
442
+ st.write("**Removed columns:**")
443
+ for col in preprocessing_summary['removed_columns']:
444
+ st.write(f"• {col}")
445
+
446
+ # Save preprocessed data to DB for persistence
447
+ save_df_to_db(df)
448
+ st.success("✅ Data preprocessed and saved to database!")
449
+ else:
450
+ # Save original data to DB for persistence
451
+ save_df_to_db(df)
452
+ st.success("✅ Data uploaded and saved to database!")
453
+
454
+ # Strip whitespace from column names
455
+ df.columns = df.columns.str.strip()
456
+
457
+ # Define expected columns (names as they appear in the source export)
458
+ expected_columns = {
459
+ 'Notifictn type': 'Notifictn type',
460
+ 'Created on': 'Created on',
461
+ 'Description': 'Description',
462
+ 'FPSO': 'FPSO'
463
+ }
464
+
465
+ # Check if all expected columns are present and map them
466
+ missing_columns = []
467
+ column_mapping = {}
468
+ for expected, actual in expected_columns.items():
469
+ if actual in df.columns:
470
+ column_mapping[expected] = actual
471
+ else:
472
+ missing_columns.append(actual)
473
+
474
+ if missing_columns:
475
+ st.error(f"The following expected columns are missing: {missing_columns}")
476
+ st.write("Please ensure your Excel file contains these columns with the exact names.")
477
+ st.stop()
478
+
479
+ # Rename columns for consistency in processing
480
+ df = df[list(column_mapping.values())]
481
+ df.columns = list(expected_columns.keys())
482
+ # Ensure df is a DataFrame after slicing
483
+ if not isinstance(df, pd.DataFrame):
484
+ df = pd.DataFrame(df)
485
+
486
+ # Preprocess FPSO: Keep only GIR, DAL, PAZ, CLV
487
+ valid_fpsos = ['GIR', 'DAL', 'PAZ', 'CLV']
488
+ df = df[df['FPSO'].isin(valid_fpsos)]
489
+ if not isinstance(df, pd.DataFrame):
490
+ df = pd.DataFrame(df)
491
+
492
+ # Extract NI/NC keywords
493
+ df['Extracted_Keywords'] = df.apply(extract_ni_nc_keywords, axis=1, args=('Notifictn type', 'Description'))
494
+
495
+ # Extract location keywords (modules, racks, etc.)
496
+ df['Extracted_Modules'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_module_keywords))
497
+ df['Extracted_Racks'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_rack_keywords))
498
+ df['Extracted_LivingQuarters'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_living_quarters_keywords))
499
+ df['Extracted_Flare'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_flare_keywords))
500
+ df['Extracted_FWD'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_fwd_keywords))
501
+ df['Extracted_HeliDeck'] = df.apply(extract_location_keywords, axis=1, args=('Description', clv_hexagons_keywords))
502
+
503
+ # Extract PAZ-specific location keywords
504
+ df['Extracted_PAZ_Modules'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_module_keywords))
505
+ df['Extracted_PAZ_Racks'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_rack_keywords))
506
+ df['Extracted_PAZ_LivingQuarters'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_living_quarters_keywords))
507
+ df['Extracted_PAZ_Flare'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_flare_keywords))
508
+ df['Extracted_PAZ_FWD'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_fwd_keywords))
509
+ df['Extracted_PAZ_HeliDeck'] = df.apply(extract_location_keywords, axis=1, args=('Description', paz_hexagons_keywords))
510
+
511
+ # Extract DAL-specific location keywords
512
+ df['Extracted_DAL_Modules'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_module_keywords))
513
+ df['Extracted_DAL_Racks'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_rack_keywords))
514
+ df['Extracted_DAL_LivingQuarters'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_living_quarters_keywords))
515
+ df['Extracted_DAL_Flare'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_flare_keywords))
516
+ df['Extracted_DAL_FWD'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_fwd_keywords))
517
+ df['Extracted_DAL_HeliDeck'] = df.apply(extract_location_keywords, axis=1, args=('Description', dal_hexagons_keywords))
518
+
519
+ # Split dataframe into NI and NC
520
+ df_ni = df[df['Notifictn type'] == 'NI'].copy()
521
+ if not isinstance(df_ni, pd.DataFrame):
522
+ df_ni = pd.DataFrame(df_ni)
523
+ df_nc = df[df['Notifictn type'] == 'NC'].copy()
524
+ if not isinstance(df_nc, pd.DataFrame):
525
+ df_nc = pd.DataFrame(df_nc)
526
+
527
+ # Create tabs
528
+ tab1, tab2, tab3, tab4, tab5 = st.tabs(["NI Notifications", "NC Notifications", "Summary Stats", "FPSO Layout", "🤖 RAG Assistant"])
529
+
530
+ # NI Notifications Tab
531
+ with tab1:
532
+ st.subheader("NI Notifications Analysis")
533
+ if not df_ni.empty:
534
+ ni_pivot = create_pivot_table(df_ni, index='FPSO', columns='Extracted_Keywords')
535
+ st.write("Pivot Table (Count of Keywords by FPSO):")
536
+ styled_ni_pivot = ni_pivot.style.apply(apply_fpso_colors, axis=None)
537
+ st.dataframe(styled_ni_pivot)
538
+ st.write(f"Total NI Notifications: {df_ni.shape[0]}")
539
+ else:
540
+ st.write("No NI notifications found in the dataset.")
541
+
542
+ # NC Notifications Tab
543
+ with tab2:
544
+ st.subheader("NC Notifications Analysis")
545
+ if not df_nc.empty:
546
+ nc_pivot = create_pivot_table(df_nc, index='FPSO', columns='Extracted_Keywords')
547
+ st.write("Pivot Table (Count of Keywords by FPSO):")
548
+ styled_nc_pivot = nc_pivot.style.apply(apply_fpso_colors, axis=None)
549
+ st.dataframe(styled_nc_pivot)
550
+ st.write(f"Total NC Notifications: {df_nc.shape[0]}")
551
+ else:
552
+ st.write("No NC notifications found in the dataset.")
553
+
554
+ # NI Summary 2025 Tab
555
+ with tab3:
556
+ st.subheader("2025 Raised")
557
+ # Filter for notifications in 2025
558
+ created_on_series = pd.to_datetime(df['Created on'])
559
+ df_2025 = df[created_on_series.dt.year == 2025].copy()
560
+ if not df_2025.empty:
561
+ # Add 'Month' column for monthly analysis
562
+ df_2025['Month'] = pd.to_datetime(df_2025['Created on']).dt.strftime('%b')
563
+ months_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
564
+ df_2025['Month'] = pd.Categorical(df_2025['Month'], categories=months_order, ordered=True)
565
+ # Group by FPSO, Month, and Notification Type
566
+ summary = df_2025.groupby(['FPSO', 'Month', 'Notifictn type']).size().unstack(fill_value=0)
567
+ # Reshape the data for NI and NC notifications
568
+ ni_summary = summary['NI'].unstack(level='Month') if 'NI' in summary else pd.DataFrame(index=pd.Index([]), columns=pd.Index(months_order))
569
+ nc_summary = summary['NC'].unstack(level='Month') if 'NC' in summary else pd.DataFrame(index=pd.Index([]), columns=pd.Index(months_order))
570
+ ni_summary = ni_summary.reindex(columns=pd.Index(months_order), fill_value=0) if not ni_summary.empty else pd.DataFrame(index=pd.Index([]), columns=pd.Index(months_order))
571
+ nc_summary = nc_summary.reindex(columns=pd.Index(months_order), fill_value=0) if not nc_summary.empty else pd.DataFrame(index=pd.Index([]), columns=pd.Index(months_order))
572
+ # Display NI Summary Table
573
+ st.write("NI's:")
574
+ st.dataframe(
575
+ ni_summary.style.set_table_styles([
576
+ {'selector': 'thead', 'props': [('display', 'none')]}
577
+ ]).set_properties(**{'text-align': 'center'})
578
+ )
579
+ # Display NC Summary Table
580
+ st.write("NC's:")
581
+ st.dataframe(
582
+ nc_summary.style.set_table_styles([
583
+ {'selector': 'thead', 'props': [('display', 'none')]}
584
+ ]).set_properties(**{'text-align': 'center'})
585
+ )
586
+ # Calculate totals
587
+ total_ni = df_2025[df_2025['Notifictn type'] == 'NI'].shape[0]
588
+ total_nc = df_2025[df_2025['Notifictn type'] == 'NC'].shape[0]
589
+ st.write(f"Grand Total NI Notifications: {total_ni}")
590
+ st.write(f"Grand Total NC Notifications: {total_nc}")
591
+ else:
592
+ st.write("No notifications found for 2025 in the dataset.")
593
+
594
+ with tab4:
595
+ st.subheader("FPSO Layout Visualization")
596
+ notification_type = st.radio("Select Notification Type", ['NI', 'NC'])
597
+ # Count NI or NC notifications for each location type for the selected FPSO (CLV, PAZ, DAL)
598
+ df_selected = df[df['FPSO'] == selected_fpso].copy()
599
+ if notification_type == 'NI':
600
+ df_selected = df_selected[df_selected['Notifictn type'] == 'NI']
601
+ else: # NC
602
+ df_selected = df_selected[df_selected['Notifictn type'] == 'NC']
603
+ # Initialize counts for all location types
604
+ location_counts = {
605
+ 'Modules': pd.DataFrame(index=pd.Index(clv_module_keywords), columns=['Count']).fillna(0),
606
+ 'Racks': pd.DataFrame(index=pd.Index(clv_rack_keywords), columns=['Count']).fillna(0),
607
+ 'LivingQuarters': pd.DataFrame(index=pd.Index(clv_living_quarters_keywords), columns=['Count']).fillna(0),
608
+ 'Flare': pd.DataFrame(index=pd.Index(clv_flare_keywords), columns=['Count']).fillna(0),
609
+ 'FWD': pd.DataFrame(index=pd.Index(clv_fwd_keywords), columns=['Count']).fillna(0),
610
+ 'HeliDeck': pd.DataFrame(index=pd.Index(clv_hexagons_keywords), columns=['Count']).fillna(0)
611
+ }
612
+ paz_location_counts = {
613
+ 'PAZ_Modules': pd.DataFrame(index=pd.Index(paz_module_keywords), columns=['Count']).fillna(0),
614
+ 'PAZ_Racks': pd.DataFrame(index=pd.Index(paz_rack_keywords), columns=['Count']).fillna(0),
615
+ 'LivingQuarters': pd.DataFrame(index=pd.Index(paz_living_quarters_keywords), columns=['Count']).fillna(0),
616
+ 'Flare': pd.DataFrame(index=pd.Index(paz_flare_keywords), columns=['Count']).fillna(0),
617
+ 'FWD': pd.DataFrame(index=pd.Index(paz_fwd_keywords), columns=['Count']).fillna(0),
618
+ 'HeliDeck': pd.DataFrame(index=pd.Index(paz_hexagons_keywords), columns=['Count']).fillna(0)
619
+ }
620
+ dal_location_counts = {
621
+ 'DAL_Modules': pd.DataFrame(index=pd.Index(dal_module_keywords), columns=['Count']).fillna(0),
622
+ 'DAL_Racks': pd.DataFrame(index=pd.Index(dal_rack_keywords), columns=['Count']).fillna(0),
623
+ 'LivingQuarters': pd.DataFrame(index=pd.Index(dal_living_quarters_keywords), columns=['Count']).fillna(0),
624
+ 'Flare': pd.DataFrame(index=pd.Index(dal_flare_keywords), columns=['Count']).fillna(0),
625
+ 'FWD': pd.DataFrame(index=pd.Index(dal_fwd_keywords), columns=['Count']).fillna(0),
626
+ 'HeliDeck': pd.DataFrame(index=pd.Index(dal_hexagons_keywords), columns=['Count']).fillna(0)
627
+ }
628
+ # Count notifications for each location type and placement
629
+ for location_type, keywords in [
630
+ ('Modules', clv_module_keywords),
631
+ ('Racks', clv_rack_keywords),
632
+ ('LivingQuarters', clv_living_quarters_keywords),
633
+ ('Flare', clv_flare_keywords),
634
+ ('FWD', clv_fwd_keywords),
635
+ ('HeliDeck', clv_hexagons_keywords)
636
+ ]:
637
+ for keyword in keywords:
638
+ count = df_selected[f'Extracted_{location_type}'].str.contains(keyword, na=False).sum()
639
+ location_counts[location_type].loc[keyword, 'Count'] = count
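+ # str.contains does substring matching, so a description naming several locations
+ # increments the count of each matching keyword.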
640
+ for location_type, keywords in [
641
+ ('PAZ_Modules', paz_module_keywords),
642
+ ('PAZ_Racks', paz_rack_keywords),
643
+ ('LivingQuarters', paz_living_quarters_keywords),
644
+ ('Flare', paz_flare_keywords),
645
+ ('FWD', paz_fwd_keywords),
646
+ ('HeliDeck', paz_hexagons_keywords)
647
+ ]:
648
+ for keyword in keywords:
649
+ if location_type == 'PAZ_Modules':
650
+ count = df_selected['Extracted_PAZ_Modules'].str.contains(keyword, na=False).sum()
651
+ paz_location_counts[location_type].loc[keyword, 'Count'] = count
652
+ elif location_type == 'PAZ_Racks':
653
+ count = df_selected['Extracted_PAZ_Racks'].str.contains(keyword, na=False).sum()
654
+ paz_location_counts[location_type].loc[keyword, 'Count'] = count
655
+ else:
656
+ count = df_selected[f'Extracted_{location_type}'].str.contains(keyword, na=False).sum()
657
+ paz_location_counts[location_type].loc[keyword, 'Count'] = count
658
+ for location_type, keywords in [
659
+ ('DAL_Modules', dal_module_keywords),
660
+ ('DAL_Racks', dal_rack_keywords),
661
+ ('LivingQuarters', dal_living_quarters_keywords),
662
+ ('Flare', dal_flare_keywords),
663
+ ('FWD', dal_fwd_keywords),
664
+ ('HeliDeck', dal_hexagons_keywords)
665
+ ]:
666
+ for keyword in keywords:
667
+ if location_type == 'DAL_Modules':
668
+ count = df_selected['Extracted_DAL_Modules'].str.contains(keyword, na=False).sum()
669
+ dal_location_counts[location_type].loc[keyword, 'Count'] = count
670
+ elif location_type == 'DAL_Racks':
671
+ count = df_selected['Extracted_DAL_Racks'].str.contains(keyword, na=False).sum()
672
+ dal_location_counts[location_type].loc[keyword, 'Count'] = count
673
+ else:
674
+ count = df_selected[f'Extracted_{location_type}'].str.contains(keyword, na=False).sum()
675
+ dal_location_counts[location_type].loc[keyword, 'Count'] = count
676
+ total_lq_count = sum(
677
+ df_selected['Extracted_LivingQuarters'].str.contains(keyword, na=False).sum()
678
+ for keyword in clv_living_quarters_keywords
679
+ )
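+ # Note: the CLV LQ keyword list is reused for every unit here; Extracted_LivingQuarters
+ # is normalised to 'LQ' upstream, so this effectively counts rows flagged as LQ.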
680
+ # Draw the FPSO layout and overlay notification counts
681
+ def draw_fpso_layout(selected_unit):
682
+ fig, ax = plt.subplots(figsize=(13, 8))
683
+ ax.set_xlim(0, 13.5)
684
+ ax.set_ylim(0, 3.5)
685
+ ax.set_aspect('equal')
686
+ ax.grid(False)
687
+ ax.set_facecolor('#E6F3FF')
688
+
689
+ # Remove axes for cleaner visualization
690
+ ax.set_xticks([])
691
+ ax.set_yticks([])
692
+ ax.spines['top'].set_visible(False)
693
+ ax.spines['right'].set_visible(False)
694
+ ax.spines['bottom'].set_visible(False)
695
+ ax.spines['left'].set_visible(False)
696
+ if selected_unit == 'CLV':
697
+ draw_clv(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd)
698
+ elif selected_unit == 'PAZ':
699
+ draw_paz(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd)
700
+ elif selected_unit == 'DAL':
701
+ draw_dal(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd)
702
+ elif selected_unit == 'GIR':
703
+ draw_gir(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd)
704
+ return fig
705
+ fig = draw_fpso_layout(selected_fpso)
706
+ ax = fig.gca()
707
+ # Overlay notification counts on locations for CLV, PAZ and DAL
708
+ if selected_fpso == 'CLV':
709
+ # Modules
710
+ for module, (row, col) in clv_modules.items():
711
+ if module in clv_module_keywords:
712
+ count = int(location_counts['Modules'].loc[module, 'Count'])
713
+ if count > 0:
714
+ # Position the count slightly above and to the right of the module label (col maps to the x-axis, row to the y-axis)
715
+ ax.text(col + 0.8, row + 0.8, f"{count}",
716
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
717
+
718
+ # Racks
719
+ for rack, (row, col) in clv_racks.items():
720
+ if rack in clv_rack_keywords:
721
+ count = int(location_counts['Racks'].loc[rack, 'Count'])
722
+ if count > 0:
723
+ # Position count slightly above and to the right of the rack text
724
+ ax.text(col + 0.7, row + 0.4, f"{count}",
725
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
726
+
727
+ # Living Quarters (with total count)
728
+ for lq, (row, col) in clv_living_quarters.items():
729
+ if total_lq_count > 0:
730
+ # Position count slightly above and to the right of the LQ text
731
+ ax.text(col + 0.7, row + 1.4, f"{total_lq_count}",
732
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
733
+
734
+ # Flare
735
+ for flare_loc, (row, col) in clv_flare.items():
736
+ if flare_loc in clv_flare_keywords:
737
+ count = int(location_counts['Flare'].loc[flare_loc, 'Count'])
738
+ if count > 0:
739
+ # Position count slightly above and to the right of the flare text
740
+ ax.text(col + 0.7, row + 0.4, f"{count}",
741
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
742
+
743
+ # FWD
744
+ for fwd_loc, (row, col) in clv_fwd.items():
745
+ if fwd_loc in clv_fwd_keywords:
746
+ count = int(location_counts['FWD'].loc[fwd_loc, 'Count'])
747
+ if count > 0:
748
+ # Position count slightly above and to the left of the FWD text (adjusted for rotation)
749
+ ax.text(col + 0.75, row + 1.4, f"{count}",
750
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
751
+
752
+ # Heli-deck
753
+ for hexagon, (row, col) in clv_hexagons.items():
754
+ if hexagon in clv_hexagons_keywords:
755
+ count = int(location_counts['HeliDeck'].loc[hexagon, 'Count'])
756
+ if count > 0:
757
+ # Position count slightly above and to the right of the heli-deck text
758
+ ax.text(col + 0.2, row + 0.2, f"{count}",
759
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
760
+
761
+ # Total counts at the bottom of the layout
762
+ total_ni = df_selected[df_selected['Notifictn type'] == 'NI'].shape[0]
763
+ total_nc = df_selected[df_selected['Notifictn type'] == 'NC'].shape[0]
764
+ ax.text(6, 0.25, f"NI: {total_ni}\nNC: {total_nc}", ha='center', va='center', fontsize=8, weight='bold', color='red')
765
+
766
+ elif selected_fpso == 'PAZ':
767
+ # PAZ Modules
768
+ for module, (row, col) in paz_modules.items():
769
+ if module in paz_module_keywords:
770
+ count = int(paz_location_counts['PAZ_Modules'].loc[module, 'Count'])
771
+ if count > 0:
772
+ # Position count slightly above and to the right of the module text
773
+ ax.text(col + 0.8, row + 0.8, f"{count}",
774
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
775
+
776
+ # PAZ Racks
777
+ for rack, (row, col) in paz_racks.items():
778
+ if rack in paz_rack_keywords:
779
+ count = int(paz_location_counts['PAZ_Racks'].loc[rack, 'Count'])
780
+ if count > 0:
781
+ # Position count slightly above and to the right of the rack text
782
+ ax.text(col + 0.7, row + 0.4, f"{count}",
783
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
784
+
785
+ # Living Quarters (with total count)
786
+ for lq, (row, col) in paz_living_quarters.items():
787
+ if total_lq_count > 0:
788
+ # Position count slightly above and to the right of the LQ text
789
+ ax.text(col + 0.7, row + 1.4, f"{total_lq_count}",
790
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
791
+
792
+ # Flare
793
+ for flare_loc, (row, col) in paz_flare.items():
794
+ if flare_loc in paz_flare_keywords:
795
+ count = int(paz_location_counts['Flare'].loc[flare_loc, 'Count'])
796
+ if count > 0:
797
+ # Position count slightly above and to the right of the flare text
798
+ ax.text(col + 0.7, row + 0.4, f"{count}",
799
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
800
+
801
+ # FWD
802
+ for fwd_loc, (row, col) in paz_fwd.items():
803
+ if fwd_loc in paz_fwd_keywords:
804
+ count = int(paz_location_counts['FWD'].loc[fwd_loc, 'Count'])
805
+ if count > 0:
806
+ # Position count slightly above and to the left of the FWD text (adjusted for rotation)
807
+ ax.text(col + 0.75, row + 1.4, f"{count}",
808
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
809
+
810
+ # Heli-deck
811
+ for hexagon, (row, col) in paz_hexagons.items():
812
+ if hexagon in paz_hexagons_keywords:
813
+ count = int(paz_location_counts['HeliDeck'].loc[hexagon, 'Count'])
814
+ if count > 0:
815
+ # Position count slightly above and to the right of the heli-deck text
816
+ ax.text(col + 0.2, row + 0.2, f"{count}",
817
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
818
+
819
+ # Total counts at the bottom
820
+ total_ni = df_selected[df_selected['Notifictn type'] == 'NI'].shape[0]
821
+ total_nc = df_selected[df_selected['Notifictn type'] == 'NC'].shape[0]
822
+ ax.text(6, 0.25, f"NI: {total_ni}\nNC: {total_nc}", ha='center', va='center', fontsize=8, weight='bold', color='red')
823
+
824
+ elif selected_fpso == 'DAL':
825
+ # DAL Modules
826
+ for module, (row, col) in dal_modules.items():
827
+ if module in dal_module_keywords:
828
+ count = int(dal_location_counts['DAL_Modules'].loc[module, 'Count'])
829
+ if count > 0:
830
+ # Position count slightly above and to the right of the module text
831
+ ax.text(col + 0.8, row + 0.8, f"{count}",
832
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
833
+
834
+ # DAL Racks
835
+ for rack, (row, col) in dal_racks.items():
836
+ if rack in dal_rack_keywords:
837
+ count = int(dal_location_counts['DAL_Racks'].loc[rack, 'Count'])
838
+ if count > 0:
839
+ # Position count slightly above and to the right of the rack text
840
+ ax.text(col + 0.7, row + 0.4, f"{count}",
841
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
842
+
843
+ # Living Quarters (with total count)
844
+ for lq, (row, col) in dal_living_quarters.items():
845
+ if total_lq_count > 0:
846
+ # Position count slightly above and to the right of the LQ text
847
+ ax.text(col + 0.7, row + 1.4, f"{total_lq_count}",
848
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
849
+
850
+ # Flare
851
+ for flare_loc, (row, col) in dal_flare.items():
852
+ if flare_loc in dal_flare_keywords:
853
+ count = int(dal_location_counts['Flare'].loc[flare_loc, 'Count'])
854
+ if count > 0:
855
+ # Position count slightly above and to the right of the flare text
856
+ ax.text(col + 0.7, row + 0.4, f"{count}",
857
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
858
+
859
+ # FWD
860
+ for fwd_loc, (row, col) in dal_fwd.items():
861
+ if fwd_loc in dal_fwd_keywords:
862
+ count = int(dal_location_counts['FWD'].loc[fwd_loc, 'Count'])
863
+ if count > 0:
864
+ # Position count slightly above and to the left of the FWD text (adjusted for rotation)
865
+ ax.text(col + 0.75, row + 1.4, f"{count}",
866
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
867
+
868
+ # Heli-deck
869
+ for hexagon, (row, col) in dal_hexagons.items():
870
+ if hexagon in dal_hexagons_keywords:
871
+ count = int(dal_location_counts['HeliDeck'].loc[hexagon, 'Count'])
872
+ if count > 0:
873
+ # Position count slightly above and to the right of the heli-deck text
874
+ ax.text(col + 0.2, row + 0.2, f"{count}",
875
+ ha='center', va='center', fontsize=6, weight='bold', color='red')
876
+
877
+ # Total counts at the bottom
878
+ total_ni = df_selected[df_selected['Notifictn type'] == 'NI'].shape[0]
879
+ total_nc = df_selected[df_selected['Notifictn type'] == 'NC'].shape[0]
880
+ ax.text(6, 0.25, f"NI: {total_ni}\nNC: {total_nc}", ha='center', va='center', fontsize=8, weight='bold', color='red')
881
+
882
+ else:
883
+ # Display placeholder text for non-implemented FPSOs
884
+ ax.text(6, 1.75, f"{selected_fpso} Layout\n(Implementation work in progress...)", ha='center', va='center', fontsize=16, weight='bold')
885
+
886
+ plt.title(f"FPSO Visualization - {selected_fpso}", fontsize=16)
887
+ st.pyplot(fig)
888
+ plt.close(fig) # Close the figure to free memory
889
+
890
+ # RAG Assistant Tab
891
+ with tab5:
892
+ st.subheader("🤖 DigiTwin RAG Assistant")
893
+ st.markdown("Ask me anything about your FPSO notifications data!")
894
+
895
+ # Import and initialize RAG system
896
+ try:
897
+ from rag_chatbot import DigiTwinRAG, render_chat_interface
898
+
899
+ # Initialize RAG system
900
+ if 'rag_system' not in st.session_state:
901
+ with st.spinner("Initializing RAG system..."):
902
+ st.session_state.rag_system = DigiTwinRAG()
903
+
904
+ # Render chat interface
905
+ render_chat_interface(st.session_state.rag_system)
906
+
907
+ except ImportError as e:
908
+ st.error(f"❌ RAG module not available: {e}")
909
+ st.info("💡 To enable RAG functionality, install the required dependencies:")
910
+ st.code("pip install -r requirements_rag.txt")
911
+
912
+ # Show sample questions
913
+ st.markdown("### 💡 Sample Questions You Can Ask:")
914
+ sample_questions = [
915
+ "Which FPSO has the most NI notifications?",
916
+ "What are the common keywords in PAZ notifications?",
917
+ "Show me all safety-related notifications from last month",
918
+ "Compare notification patterns between GIR and DAL",
919
+ "What equipment has the most maintenance issues?",
920
+ "Which work centers require immediate attention?"
921
+ ]
922
+
923
+ for question in sample_questions:
924
+ st.write(f"• {question}")
925
+
926
+ except Exception as e:
927
+ st.error(f"❌ Error initializing RAG system: {e}")
928
+ st.info("Please check your LLM configuration and vector database setup.")
929
+
930
+ except Exception as e:
931
+ st.error(f"An error occurred: {e}")
932
+ else:
933
+ st.write('Please upload an Excel file to proceed.')
934
+
935
+ # Add footer with rocket emojis and branding
936
+ st.markdown("---")
937
+ st.markdown(
938
+ """
939
+ <div style="text-align: center; padding: 20px; border-radius: 10px; margin-top: 30px;">
940
+ <p style="font-size: 14px; color: #6c757d; margin: 0;">
941
+ 🚀 Built with Pride - STP/INSP/MET | Powered by <a href="https://www.valonylabs.com" target="_blank" style="color: #007bff; text-decoration: none; font-weight: bold;">ValonyLabs</a> 🚀
942
+ </p>
943
+ </div>
944
+ """,
945
+ unsafe_allow_html=True
946
+ )
947
+
948
+
src/notifs_data.db ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:547aa0eedb75395560b1fbbef46d6773e006f3fb628aa787b7ce2efd72a067f7
3
+ size 4038656
src/paz.py ADDED
@@ -0,0 +1,55 @@
1
+ # paz.py
2
+
3
+ # PAZ-specific keywords and location dictionaries
4
+ paz_module_keywords = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8']
5
+ paz_rack_keywords = ['R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8']
6
+ paz_living_quarters_keywords = ['LQ', 'LQ1', 'LQ2', 'LQ3', 'LQ4', 'LQL0', 'LQPS', 'LQSB', 'LQROOF', 'LQL4', 'LQL2', 'LQ-5', 'LQPD', 'LQ PS', 'LQAFT', 'LQ-T', 'LQL1S']
7
+ paz_flare_keywords = ['FLARE']
8
+ paz_fwd_keywords = ['FWD']
9
+ paz_hexagons_keywords = ['HELIDECK']
10
+
11
+ paz_modules = {
12
+ 'L1': (0.75, 2), 'P1': (0.5, 3), 'P2': (0.5, 4), 'P3': (0.5, 5), 'P4': (0.5, 6),
13
+ 'P5': (0.5, 7), 'P6': (0.5, 8), 'P7': (0.5, 9), 'P8': (0.5, 10), 'L2': (1.75, 2),
14
+ 'S1': (2, 3), 'S2': (2, 4), 'S3': (2, 5), 'S4': (2, 6),
15
+ 'S5': (2, 7), 'S6': (2, 8), 'S7': (2, 9), 'S8': (2, 10)
16
+ }
17
+ paz_racks = {
18
+ 'R1': (1.5, 3), 'R2': (1.5, 4), 'R3': (1.5, 5),
19
+ 'R4': (1.5, 6), 'R5': (1.5, 7), 'R6': (1.5, 8),
20
+ 'R7': (1.5, 9), 'R8': (1.5, 10)
21
+ }
22
+ paz_flare = {'FLARE': (0.5, 11)}
23
+ paz_living_quarters = {'LQ': (0.5, 1)}
24
+ paz_hexagons = {'HELIDECK': (2.75, 1)}
25
+ paz_fwd = {'FWD': (0.5, 11.75)}
26
+
27
+ def draw_paz(ax, add_chamfered_rectangle, add_rectangle, add_hexagon, add_fwd):
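+ # Renders the PAZ deck plan: module boxes, rack strips, the flare tower, the LQ block,
+ # the helideck hexagon and the FWD bow, using the drawing helpers passed in from utils.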
28
+ for module, (row, col) in paz_modules.items():
29
+ if module == 'L2':
30
+ height, y_position, text_y = 1.25, row, row + 0.5
31
+ elif module == 'L1':
32
+ height, y_position, text_y = 1.25, row - 0.25, row + 0.25
33
+ else:
34
+ height, y_position, text_y = 1, row, row + 0.5
35
+ add_chamfered_rectangle(ax, (col, y_position), 1, height, 0.1, edgecolor='black', facecolor='white')
36
+ ax.text(col + 0.5, text_y, module, ha='center', va='center', fontsize=7, weight='bold')
37
+
38
+ for rack, (row, col) in paz_racks.items():
39
+ add_chamfered_rectangle(ax, (col, row), 1, 0.5, 0.05, edgecolor='black', facecolor='white')
40
+ ax.text(col + 0.5, row + 0.25, rack, ha='center', va='center', fontsize=7, weight='bold')
41
+
42
+ for flare_loc, (row, col) in paz_flare.items():
43
+ add_chamfered_rectangle(ax, (col, row), 0.75, 2.5, 0.05, edgecolor='black', facecolor='white')
44
+ ax.text(col + 0.35, row + 1.25, flare_loc, ha='center', va='center', fontsize=7, weight='bold')
45
+
46
+ for living_quarter, (row, col) in paz_living_quarters.items():
47
+ add_rectangle(ax, (col, row), 1, 2.5, edgecolor='black', facecolor='white')
48
+ ax.text(col + 0.5, row + 1.25, living_quarter, ha='center', va='center', fontsize=7, rotation=90, weight='bold')
49
+
50
+ for hexagon, (row, col) in paz_hexagons.items():
51
+ add_hexagon(ax, (col, row), 0.60, edgecolor='black', facecolor='white')
52
+ ax.text(col, row, hexagon, ha='center', va='center', fontsize=7, weight='bold')
53
+
54
+ for fwd_loc, (row, col) in paz_fwd.items():
55
+ add_fwd(ax, (col, row), 2.5, -1, edgecolor='black', facecolor='white')
src/rag_chatbot.py ADDED
@@ -0,0 +1,548 @@
1
+ """
2
+ DigiTwin RAG Chatbot Module
3
+ A comprehensive RAG system with hybrid search, query rewriting, and streaming responses
4
+ """
5
+
6
+ import streamlit as st
7
+ import pandas as pd
8
+ import numpy as np
9
+ import sqlite3
10
+ import json
11
+ import time
12
+ from typing import List, Dict, Any, Optional, Tuple
13
+ import requests
14
+ from datetime import datetime
15
+ import re
16
+
17
+ # Vector database imports
18
+ try:
19
+ import weaviate
20
+ from weaviate import Client
21
+ WEAVIATE_AVAILABLE = True
22
+ except ImportError:
23
+ WEAVIATE_AVAILABLE = False
24
+
25
+ try:
26
+ import faiss
27
+ import faiss.cpu
28
+ FAISS_AVAILABLE = True
29
+ except ImportError:
30
+ FAISS_AVAILABLE = False
31
+
32
+ # Embedding imports
33
+ try:
34
+ from sentence_transformers import SentenceTransformer
35
+ SENTENCE_TRANSFORMERS_AVAILABLE = True
36
+ except ImportError:
37
+ SENTENCE_TRANSFORMERS_AVAILABLE = False
38
+
39
+ # LLM imports
40
+ try:
41
+ import groq
42
+ GROQ_AVAILABLE = True
43
+ except ImportError:
44
+ GROQ_AVAILABLE = False
45
+
46
+ try:
47
+ import ollama
48
+ OLLAMA_AVAILABLE = True
49
+ except ImportError:
50
+ OLLAMA_AVAILABLE = False
51
+
52
+ # Configuration
53
+ DB_PATH = 'notifs_data.db'
54
+ TABLE_NAME = 'notifications'
55
+ VECTOR_DB_PATH = 'vector_store'
56
+ EMBEDDING_MODEL = 'all-MiniLM-L6-v2'
57
+
58
+ class DigiTwinRAG:
59
+ """
60
+ Comprehensive RAG system for DigiTwin notifications analysis
61
+ """
62
+
63
+ def __init__(self, db_path: str = DB_PATH, vector_db_path: str = VECTOR_DB_PATH):
64
+ self.db_path = db_path
65
+ self.vector_db_path = vector_db_path
66
+ self.embedding_model = None
67
+ self.vector_store = None
68
+ self.llm_client = None
69
+ self.initialize_components()
70
+
71
+ def initialize_components(self):
72
+ """Initialize all RAG components"""
73
+ # Initialize embedding model
74
+ if SENTENCE_TRANSFORMERS_AVAILABLE:
75
+ try:
76
+ self.embedding_model = SentenceTransformer(EMBEDDING_MODEL)
77
+ st.success(f"✅ Embedding model loaded: {EMBEDDING_MODEL}")
78
+ except Exception as e:
79
+ st.error(f"❌ Failed to load embedding model: {e}")
80
+
81
+ # Initialize vector store
82
+ self.initialize_vector_store()
83
+
84
+ # Initialize LLM clients
85
+ self.initialize_llm_clients()
86
+
87
+ def initialize_vector_store(self):
88
+ """Initialize vector database (Weaviate or FAISS)"""
89
+ if WEAVIATE_AVAILABLE:
90
+ try:
91
+ self.vector_store = Client("http://localhost:8080")
92
+ st.success("✅ Weaviate vector store connected")
93
+ except Exception as e:
94
+ st.warning(f"⚠️ Weaviate not available: {e}")
95
+ self.vector_store = None
96
+
97
+ if not self.vector_store and FAISS_AVAILABLE:
98
+ try:
99
+ # Initialize FAISS index
100
+ dimension = 384 # all-MiniLM-L6-v2 dimension
101
+ self.vector_store = faiss.IndexFlatIP(dimension)
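+ # IndexFlatIP ranks by raw inner product; normalising embeddings (faiss.normalize_L2)
+ # would make this equivalent to cosine similarity.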
102
+ st.success("✅ FAISS vector store initialized")
103
+ except Exception as e:
104
+ st.error(f"❌ Failed to initialize FAISS: {e}")
105
+ self.vector_store = None
106
+
107
+ def initialize_llm_clients(self):
108
+ """Initialize LLM clients (Groq and Ollama)"""
109
+ self.llm_client = {}
110
+
111
+ # Initialize Groq client
112
+ if GROQ_AVAILABLE:
113
+ try:
114
+ # You'll need to set GROQ_API_KEY in environment
115
+ import os
116
+ api_key = os.getenv('GROQ_API_KEY')
117
+ if api_key:
118
+ self.llm_client['groq'] = groq.Groq(api_key=api_key)
119
+ st.success("✅ Groq client initialized")
120
+ else:
121
+ st.warning("⚠️ GROQ_API_KEY not found in environment")
122
+ except Exception as e:
123
+ st.warning(f"⚠️ Groq initialization failed: {e}")
124
+
125
+ # Initialize Ollama client
126
+ if OLLAMA_AVAILABLE:
127
+ try:
128
+ self.llm_client['ollama'] = ollama.Client(host='http://localhost:11434')
129
+ st.success("✅ Ollama client initialized")
130
+ except Exception as e:
131
+ st.warning(f"⚠️ Ollama initialization failed: {e}")
132
+
133
+ def load_notifications_data(self) -> pd.DataFrame:
134
+ """Load notifications data from SQLite database"""
135
+ try:
136
+ with sqlite3.connect(self.db_path) as conn:
137
+ df = pd.read_sql(f'SELECT * FROM {TABLE_NAME}', conn)
138
+ return df
139
+ except Exception as e:
140
+ st.error(f"❌ Failed to load data: {e}")
141
+ return pd.DataFrame()
142
+
143
+ def create_document_chunks(self, df: pd.DataFrame) -> List[Dict[str, Any]]:
144
+ """Create document chunks for vectorization"""
145
+ documents = []
146
+
147
+ for idx, row in df.iterrows():
148
+ # Create rich document representation
149
+ doc = {
150
+ 'id': f"doc_{idx}",
151
+ 'content': f"""
152
+ FPSO: {row.get('FPSO', 'N/A')}
153
+ Notification Type: {row.get('Notifictn type', 'N/A')}
154
+ {'(Notification of Integrity)' if row.get('Notifictn type') == 'NI' else '(Notification of Conformity)' if row.get('Notifictn type') == 'NC' else ''}
155
+ Description: {row.get('Description', 'N/A')}
156
+ Created: {row.get('Created on', 'N/A')}
157
+ Keywords: {row.get('Extracted_Keywords', 'N/A')}
158
+ Modules: {row.get('Extracted_Modules', 'N/A')}
159
+ Racks: {row.get('Extracted_Racks', 'N/A')}
160
+ """.strip(),
161
+ 'metadata': {
162
+ 'fpso': row.get('FPSO', 'N/A'),
163
+ 'notification_type': row.get('Notifictn type', 'N/A'),
164
+ 'created_date': row.get('Created on', 'N/A'),
165
+ 'keywords': row.get('Extracted_Keywords', 'N/A'),
166
+ 'modules': row.get('Extracted_Modules', 'N/A'),
167
+ 'racks': row.get('Extracted_Racks', 'N/A')
168
+ }
169
+ }
170
+ documents.append(doc)
171
+
172
+ return documents
173
+
174
+ def create_embeddings(self, documents: List[Dict[str, Any]]) -> np.ndarray:
175
+ """Create embeddings for documents"""
176
+ if not self.embedding_model:
177
+ st.error("❌ Embedding model not available")
178
+ return np.array([])
179
+
180
+ texts = [doc['content'] for doc in documents]
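+ # all-MiniLM-L6-v2 encodes each document into a 384-dimensional vector,
+ # matching the FAISS index dimension configured above.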
181
+ embeddings = self.embedding_model.encode(texts, show_progress_bar=True)
182
+ return embeddings
183
+
184
+ def index_documents(self, documents: List[Dict[str, Any]], embeddings: np.ndarray):
185
+ """Index documents in vector store"""
186
+ if not self.vector_store:
187
+ st.error("❌ Vector store not available")
188
+ return
189
+
190
+ if WEAVIATE_AVAILABLE and isinstance(self.vector_store, Client):
191
+ # Index in Weaviate
192
+ try:
193
+ for doc, embedding in zip(documents, embeddings):
194
+ self.vector_store.data_object.create(
195
+ data_object=doc['metadata'],
196
+ class_name="Notification",
197
+ vector=embedding.tolist()
198
+ )
199
+ st.success(f"✅ Indexed {len(documents)} documents in Weaviate")
200
+ except Exception as e:
201
+ st.error(f"❌ Failed to index in Weaviate: {e}")
202
+
203
+ elif FAISS_AVAILABLE and hasattr(self.vector_store, 'add'):
204
+ # Index in FAISS
205
+ try:
206
+ self.vector_store.add(embeddings.astype('float32'))
207
+ # Save document metadata separately
208
+ import pickle
209
+ with open(f"{self.vector_db_path}_metadata.pkl", 'wb') as f:
210
+ pickle.dump(documents, f)
211
+ st.success(f"✅ Indexed {len(documents)} documents in FAISS")
212
+ except Exception as e:
213
+ st.error(f"❌ Failed to index in FAISS: {e}")
214
+
215
+ def hybrid_search(self, query: str, k: int = 5) -> List[Dict[str, Any]]:
216
+ """Perform hybrid search (semantic + keyword)"""
217
+ results = []
218
+
219
+ # Semantic search
220
+ if self.embedding_model and self.vector_store:
221
+ query_embedding = self.embedding_model.encode([query])
222
+
223
+ if WEAVIATE_AVAILABLE and isinstance(self.vector_store, Client):
224
+ # Weaviate semantic search
225
+ try:
226
+ semantic_results = self.vector_store.query.get("Notification", [
227
+ "fpso", "notification_type", "created_date", "keywords", "modules", "racks"
228
+ ]).with_near_vector({
229
+ "vector": query_embedding[0].tolist()
230
+ }).with_limit(k).do()
231
+
232
+ for result in semantic_results['data']['Get']['Notification']:
233
+ results.append({
234
+ 'content': f"FPSO: {result['fpso']}, Type: {result['notification_type']}, Keywords: {result['keywords']}",
235
+ 'metadata': result,
236
+ 'score': 1.0 # Weaviate doesn't return scores by default
237
+ })
238
+ except Exception as e:
239
+ st.warning(f"⚠️ Weaviate search failed: {e}")
240
+
241
+ elif FAISS_AVAILABLE and hasattr(self.vector_store, 'search'):
242
+ # FAISS semantic search
243
+ try:
244
+ scores, indices = self.vector_store.search(query_embedding.astype('float32'), k)
245
+
246
+ # Load document metadata
247
+ import pickle
248
+ with open(f"{self.vector_db_path}_metadata.pkl", 'rb') as f:
249
+ documents = pickle.load(f)
250
+
251
+ for score, idx in zip(scores[0], indices[0]):
252
+ if idx < len(documents):
253
+ results.append({
254
+ 'content': documents[idx]['content'],
255
+ 'metadata': documents[idx]['metadata'],
256
+ 'score': float(score)
257
+ })
258
+ except Exception as e:
259
+ st.warning(f"⚠️ FAISS search failed: {e}")
260
+
261
+ # Keyword search as fallback
262
+ if not results:
263
+ df = self.load_notifications_data()
264
+ if not df.empty:
265
+ # Simple keyword matching
266
+ query_terms = query.lower().split()
267
+ for idx, row in df.iterrows():
268
+ text = f"{row.get('Description', '')} {row.get('Extracted_Keywords', '')}".lower()
269
+ if any(term in text for term in query_terms):
270
+ results.append({
271
+ 'content': f"FPSO: {row.get('FPSO')}, Type: {row.get('Notifictn type')}, Description: {row.get('Description', '')[:100]}...",
272
+ 'metadata': row.to_dict(),
273
+ 'score': 0.5
274
+ })
275
+ if len(results) >= k:
276
+ break
277
+
278
+ return results[:k]
279
+
280
+ def query_rewriter(self, query: str) -> str:
281
+ """Rewrite query for better retrieval"""
282
+ rewrite_prompt = f"""
283
+ Rewrite the following query to be more specific and searchable for FPSO notifications data.
284
+ Focus on technical terms, FPSO names (GIR, DAL, PAZ, CLV), notification types (NI/NC), and equipment.
285
+
286
+ Original query: {query}
287
+
288
+ Rewritten query:"""
289
+
290
+ # Use LLM to rewrite query
291
+ rewritten_query = self.generate_response(rewrite_prompt, max_tokens=50, temperature=0.3)
292
+ return rewritten_query.strip() if rewritten_query else query
293
+
294
+ def generate_pivot_analysis(self, df: pd.DataFrame) -> str:
295
+ """Generate pivot analysis summary"""
296
+ analysis = []
297
+
298
+ # FPSO distribution
299
+ if 'FPSO' in df.columns:
300
+ fpso_counts = df['FPSO'].value_counts()
301
+ analysis.append(f"**FPSO Distribution:** {', '.join([f'{fpso}: {count}' for fpso, count in fpso_counts.items()])}")
302
+
303
+ # Notification type distribution
304
+ if 'Notifictn type' in df.columns:
305
+ type_counts = df['Notifictn type'].value_counts()
306
+ analysis.append(f"**Notification Types:** {', '.join([f'{ntype}: {count}' for ntype, count in type_counts.items()])}")
307
+
308
+ # Keyword analysis
309
+ if 'Extracted_Keywords' in df.columns:
310
+ keywords = df['Extracted_Keywords'].str.split(', ').explode()
311
+ keywords = keywords[keywords != 'None']
312
+ if not keywords.empty:
313
+ top_keywords = keywords.value_counts().head(5)
314
+ analysis.append(f"**Top Keywords:** {', '.join([f'{kw}: {count}' for kw, count in top_keywords.items()])}")
315
+
316
+ return "\n".join(analysis)
317
+
318
+ def generate_response(self, prompt: str, max_tokens: int = 500, temperature: float = 0.7, stream: bool = False) -> str:
319
+ """Generate response using available LLM"""
320
+
321
+ # Try Groq first
322
+ if 'groq' in self.llm_client:
323
+ try:
324
+ if stream:
325
+ return self._stream_groq_response(prompt, max_tokens, temperature)
326
+ else:
327
+ response = self.llm_client['groq'].chat.completions.create(
328
+ model="llama3-8b-8192",
329
+ messages=[{"role": "user", "content": prompt}],
330
+ max_tokens=max_tokens,
331
+ temperature=temperature
332
+ )
333
+ return response.choices[0].message.content
334
+ except Exception as e:
335
+ st.warning(f"⚠️ Groq generation failed: {e}")
336
+
337
+ # Try Ollama as fallback
338
+ if 'ollama' in self.llm_client:
339
+ try:
340
+ if stream:
341
+ return self._stream_ollama_response(prompt, max_tokens, temperature)
342
+ else:
343
+ response = self.llm_client['ollama'].chat(
344
+ model='llama3.2',
345
+ messages=[{'role': 'user', 'content': prompt}]
346
+ )
347
+ return response['message']['content']
348
+ except Exception as e:
349
+ st.warning(f"⚠️ Ollama generation failed: {e}")
350
+
351
+ return "I apologize, but I'm unable to generate a response at the moment. Please check your LLM configuration."
352
+
353
+ def _stream_groq_response(self, prompt: str, max_tokens: int, temperature: float):
354
+ """Stream response from Groq"""
355
+ try:
356
+ response = self.llm_client['groq'].chat.completions.create(
357
+ model="llama3-8b-8192",
358
+ messages=[{"role": "user", "content": prompt}],
359
+ max_tokens=max_tokens,
360
+ temperature=temperature,
361
+ stream=True
362
+ )
363
+
364
+ full_response = ""
365
+ for chunk in response:
366
+ if chunk.choices[0].delta.content:
367
+ content = chunk.choices[0].delta.content
368
+ full_response += content
369
+ yield content
370
+
371
+ return full_response
372
+ except Exception as e:
373
+ st.error(f"❌ Groq streaming failed: {e}")
374
+ return ""
375
+
376
+ def _stream_ollama_response(self, prompt: str, max_tokens: int, temperature: float):
377
+ """Stream response from Ollama"""
378
+ try:
379
+ response = self.llm_client['ollama'].chat(
380
+ model='llama3.2',
381
+ messages=[{'role': 'user', 'content': prompt}],
382
+ stream=True
383
+ )
384
+
385
+ full_response = ""
386
+ for chunk in response:
387
+ if 'message' in chunk and 'content' in chunk['message']:
388
+ content = chunk['message']['content']
389
+ full_response += content
390
+ yield content
391
+
392
+ return full_response
393
+ except Exception as e:
394
+ st.error(f"❌ Ollama streaming failed: {e}")
395
+ return ""
396
+
397
+ def create_rag_prompt(self, query: str, context: List[Dict[str, Any]], pivot_analysis: str) -> str:
398
+ """Create optimized RAG prompt"""
399
+
400
+ # Format context
401
+ context_text = "\n\n".join([
402
+ f"Document {i+1}:\n{doc['content']}\nRelevance Score: {doc['score']:.3f}"
403
+ for i, doc in enumerate(context)
404
+ ])
405
+
406
+ prompt = f"""
407
+ You are DigiTwin, an expert FPSO (Floating Production Storage and Offloading) notifications analyst.
408
+
409
+ **Context Information:**
410
+ {context_text}
411
+
412
+ **Current Dataset Analysis:**
413
+ {pivot_analysis}
414
+
415
+ **Important Definitions:**
416
+ - NI = Notification of Integrity (maintenance and safety notifications)
417
+ - NC = Notification of Conformity (compliance and regulatory notifications)
418
+ - FPSO Units: GIR, DAL, PAZ, CLV
419
+
420
+ **User Query:** {query}
421
+
422
+ Please provide a comprehensive, accurate response based on the context and dataset analysis.
423
+ Include specific details about FPSO units, notification types, and relevant insights.
424
+ If the context doesn't contain enough information, say so clearly.
425
+
426
+ **Response:**"""
427
+
428
+ return prompt
429
+
430
+ def process_query(self, query: str, stream: bool = True) -> str:
431
+ """Process user query through the complete RAG pipeline"""
432
+
433
+ # Step 1: Query rewriting
434
+ rewritten_query = self.query_rewriter(query)
435
+
436
+ # Step 2: Hybrid search
437
+ search_results = self.hybrid_search(rewritten_query, k=5)
438
+
439
+ # Step 3: Load data for pivot analysis
440
+ df = self.load_notifications_data()
441
+ pivot_analysis = self.generate_pivot_analysis(df) if not df.empty else "No data available"
442
+
443
+ # Step 4: Create RAG prompt
444
+ rag_prompt = self.create_rag_prompt(query, search_results, pivot_analysis)
445
+
446
+ # Step 5: Generate response
447
+ if stream:
448
+ return self.generate_response(rag_prompt, max_tokens=800, temperature=0.7, stream=True)
449
+ else:
450
+ return self.generate_response(rag_prompt, max_tokens=800, temperature=0.7, stream=False)
451
+
452
+ def initialize_rag_system():
453
+ """Initialize the RAG system"""
454
+ with st.spinner("Initializing RAG system..."):
455
+ rag = DigiTwinRAG()
456
+ return rag
457
+
458
+ def render_chat_interface(rag: DigiTwinRAG):
459
+ """Render the chat interface"""
460
+
461
+ # Initialize chat history
462
+ if "messages" not in st.session_state:
463
+ st.session_state.messages = []
464
+
465
+ # Chat header
466
+ st.markdown("### 🤖 DigiTwin RAG Assistant")
467
+ st.markdown("Ask me anything about your FPSO notifications data!")
468
+
469
+ # Display chat messages
470
+ for message in st.session_state.messages:
471
+ with st.chat_message(message["role"], avatar=message.get("avatar")):
472
+ st.markdown(message["content"])
473
+
474
+ # Chat input
475
+ if prompt := st.chat_input("Ask about your notifications data..."):
476
+ # Add user message
477
+ st.session_state.messages.append({
478
+ "role": "user",
479
+ "content": prompt,
480
+ "avatar": "👤"
481
+ })
482
+
483
+ # Display user message
484
+ with st.chat_message("user", avatar="👤"):
485
+ st.markdown(prompt)
486
+
487
+ # Generate and display assistant response
488
+ with st.chat_message("assistant", avatar="🤖"):
489
+ message_placeholder = st.empty()
490
+
491
+ try:
492
+ # Process query with streaming
493
+ full_response = ""
494
+ for chunk in rag.process_query(prompt, stream=True):
495
+ full_response += chunk
496
+ message_placeholder.markdown(full_response + "▌")
497
+
498
+ message_placeholder.markdown(full_response)
499
+
500
+ # Add assistant message to history
501
+ st.session_state.messages.append({
502
+ "role": "assistant",
503
+ "content": full_response,
504
+ "avatar": "🤖"
505
+ })
506
+
507
+ except Exception as e:
508
+ error_msg = f"❌ Error processing query: {str(e)}"
509
+ message_placeholder.markdown(error_msg)
510
+ st.session_state.messages.append({
511
+ "role": "assistant",
512
+ "content": error_msg,
513
+ "avatar": "🤖"
514
+ })
515
+
516
+ # Sidebar controls
517
+ with st.sidebar:
518
+ st.markdown("### 🔧 RAG Controls")
519
+
520
+ # Clear chat
521
+ if st.button("🗑️ Clear Chat"):
522
+ st.session_state.messages = []
523
+ st.rerun()
524
+
525
+ # Rebuild vector index
526
+ if st.button("🔄 Rebuild Vector Index"):
527
+ with st.spinner("Rebuilding vector index..."):
528
+ df = rag.load_notifications_data()
529
+ if not df.empty:
530
+ documents = rag.create_document_chunks(df)
531
+ embeddings = rag.create_embeddings(documents)
532
+ rag.index_documents(documents, embeddings)
533
+ st.success("✅ Vector index rebuilt!")
534
+ else:
535
+ st.error("❌ No data available for indexing")
536
+
537
+ def main():
538
+ """Main function to run the RAG chatbot"""
539
+ st.set_page_config(page_title="DigiTwin RAG Assistant", layout="wide")
540
+
541
+ # Initialize RAG system
542
+ rag = initialize_rag_system()
543
+
544
+ # Render chat interface
545
+ render_chat_interface(rag)
546
+
547
+ if __name__ == "__main__":
548
+ main()
src/setup_rag.py ADDED
@@ -0,0 +1,185 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ DigiTwin RAG Setup Script
4
+ Helps users install and configure the RAG system dependencies
5
+ """
6
+
7
+ import subprocess
8
+ import sys
9
+ import os
10
+ from pathlib import Path
11
+
12
+ def run_command(command, description):
13
+ """Run a command and handle errors"""
14
+ print(f"🔄 {description}...")
15
+ try:
16
+ result = subprocess.run(command, shell=True, check=True, capture_output=True, text=True)
17
+ print(f"✅ {description} completed successfully")
18
+ return True
19
+ except subprocess.CalledProcessError as e:
20
+ print(f"❌ {description} failed: {e}")
21
+ print(f"Error output: {e.stderr}")
22
+ return False
23
+
24
+ def check_python_version():
25
+ """Check if Python version is compatible"""
26
+ version = sys.version_info
27
+ if version.major < 3 or (version.major == 3 and version.minor < 8):
28
+ print("❌ Python 3.8 or higher is required")
29
+ return False
30
+ print(f"✅ Python {version.major}.{version.minor}.{version.micro} is compatible")
31
+ return True
32
+
33
+ def install_dependencies():
34
+ """Install RAG dependencies"""
35
+ print("🚀 Installing DigiTwin RAG Dependencies")
36
+ print("=" * 50)
37
+
38
+ # Check Python version
39
+ if not check_python_version():
40
+ return False
41
+
42
+ # Install core dependencies
43
+ dependencies = [
44
+ ("sentence-transformers", "Sentence Transformers for embeddings"),
45
+ ("faiss-cpu", "FAISS vector database"),
46
+ ("weaviate-client", "Weaviate vector database client"),
47
+ ("groq", "Groq LLM API client"),
48
+ ("ollama", "Ollama local LLM client"),
49
+ ("numpy", "Numerical computing"),
50
+ ("pandas", "Data manipulation"),
51
+ ("streamlit", "Web application framework")
52
+ ]
53
+
54
+ success_count = 0
55
+ for package, description in dependencies:
56
+ if run_command(f"pip install {package}", f"Installing {description}"):
57
+ success_count += 1
58
+
59
+ print(f"\n📊 Installation Summary: {success_count}/{len(dependencies)} packages installed successfully")
60
+ return success_count == len(dependencies)
61
+
62
+ def setup_environment():
63
+ """Setup environment variables and configuration"""
64
+ print("\n🔧 Setting up environment...")
65
+
66
+ # Create .env file template
67
+ env_content = """# DigiTwin RAG Environment Configuration
68
+
69
+ # Groq API Configuration
70
+ # Get your API key from: https://console.groq.com/
71
+ GROQ_API_KEY=your_groq_api_key_here
72
+
73
+ # Ollama Configuration (optional)
74
+ # Install Ollama from: https://ollama.ai/
75
+ OLLAMA_HOST=http://localhost:11434
76
+
77
+ # Vector Database Configuration
78
+ # Weaviate (optional) - Install with: docker run -d -p 8080:8080 semitechnologies/weaviate:1.22.4
79
+ WEAVIATE_URL=http://localhost:8080
80
+
81
+ # Embedding Model Configuration
82
+ EMBEDDING_MODEL=all-MiniLM-L6-v2
83
+ """
84
+
85
+ env_file = Path(".env")
86
+ if not env_file.exists():
87
+ with open(env_file, "w") as f:
88
+ f.write(env_content)
89
+ print("✅ Created .env file template")
90
+ print("📝 Please edit .env file with your API keys")
91
+ else:
92
+ print("ℹ️ .env file already exists")
93
+
94
+ def create_directories():
95
+ """Create necessary directories"""
96
+ print("\n📁 Creating directories...")
97
+
98
+ directories = [
99
+ "vector_store",
100
+ "logs",
101
+ "models"
102
+ ]
103
+
104
+ for directory in directories:
105
+ Path(directory).mkdir(exist_ok=True)
106
+ print(f"✅ Created directory: {directory}")
107
+
108
+ def test_installation():
109
+ """Test the RAG installation"""
110
+ print("\n🧪 Testing RAG installation...")
111
+
112
+ test_script = """
113
+ import sys
114
+ import importlib
115
+
116
+ # Test imports
117
+ modules_to_test = [
118
+ 'sentence_transformers',
119
+ 'faiss',
120
+ 'weaviate',
121
+ 'groq',
122
+ 'ollama',
123
+ 'numpy',
124
+ 'pandas',
125
+ 'streamlit'
126
+ ]
127
+
128
+ print("Testing module imports...")
129
+ for module in modules_to_test:
130
+ try:
131
+ importlib.import_module(module)
132
+ print(f"✅ {module}")
133
+ except ImportError as e:
134
+ print(f"❌ {module}: {e}")
135
+
136
+ print("\\nTesting embedding model...")
137
+ try:
138
+ from sentence_transformers import SentenceTransformer
139
+ model = SentenceTransformer('all-MiniLM-L6-v2')
140
+ test_embedding = model.encode(['test sentence'])
141
+ print(f"✅ Embedding model working (shape: {test_embedding.shape})")
142
+ except Exception as e:
143
+ print(f"❌ Embedding model failed: {e}")
144
+
145
+ print("\\nRAG system test completed!")
146
+ """
147
+
148
+ with open("test_rag.py", "w") as f:
149
+ f.write(test_script)
150
+
151
+ if run_command("python test_rag.py", "Running RAG system test"):
152
+ print("✅ RAG system test passed!")
153
+ os.remove("test_rag.py")
154
+ else:
155
+ print("❌ RAG system test failed. Please check the errors above.")
156
+
157
+ def main():
158
+ """Main setup function"""
159
+ print("🤖 DigiTwin RAG Setup")
160
+ print("=" * 50)
161
+
162
+ # Install dependencies
163
+ if not install_dependencies():
164
+ print("\n❌ Some dependencies failed to install. Please check the errors above.")
165
+ return
166
+
167
+ # Setup environment
168
+ setup_environment()
169
+
170
+ # Create directories
171
+ create_directories()
172
+
173
+ # Test installation
174
+ test_installation()
175
+
176
+ print("\n🎉 Setup completed!")
177
+ print("\n📋 Next steps:")
178
+ print("1. Edit .env file with your API keys")
179
+ print("2. Install Ollama (optional): https://ollama.ai/")
180
+ print("3. Start Weaviate (optional): docker run -d -p 8080:8080 semitechnologies/weaviate:1.22.4")
181
+ print("4. Run the application: streamlit run notifs.py")
182
+ print("\n🚀 Happy coding with DigiTwin RAG!")
183
+
184
+ if __name__ == "__main__":
185
+ main()
src/utils.py ADDED
@@ -0,0 +1,222 @@
1
+ """
2
+ Utilities module for DigiTwin Analytics
3
+ Contains common functions, decorators, and data processing utilities
4
+ """
5
+
6
+ import logging
7
+ import pandas as pd
8
+ from functools import wraps
9
+ from PyPDF2 import PdfReader
10
+ from langchain_community.vectorstores import FAISS
11
+ from langchain_community.embeddings import HuggingFaceEmbeddings
12
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
13
+ from langchain.schema import Document as LCDocument
14
+ import streamlit as st
15
+ from config import (
16
+ NI_keywords, NC_keywords, module_keywords, rack_keywords,
17
+ living_quarters_keywords, flare_keywords, fwd_keywords, hexagons_keywords,
18
+ NI_keyword_map, NC_keyword_map
19
+ )
20
+
21
+ import matplotlib.patches as patches
22
+ import math
23
+ import matplotlib.transforms as transforms
24
+
25
+ # PAZ-specific keywords for data processing
26
+ paz_module_keywords = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8']
27
+ paz_rack_keywords = ['R1', 'R2', 'R3', 'R4', 'R5', 'R6']
28
+
29
+ # PAZ keyword mapping for preprocessing
30
+ paz_keyword_map = {
31
+ 'P1': 'P1', 'P2': 'P2', 'P3': 'P3', 'P4': 'P4', 'P5': 'P5', 'P6': 'P6', 'P7': 'P7', 'P8': 'P8',
32
+ 'S1': 'S1', 'S2': 'S2', 'S3': 'S3', 'S4': 'S4', 'S5': 'S5', 'S6': 'S6', 'S7': 'S7', 'S8': 'S8',
33
+ 'R1': 'R1', 'R2': 'R2', 'R3': 'R3', 'R4': 'R4', 'R5': 'R5', 'R6': 'R6'
34
+ }
35
+
36
+ # Setup logging
37
+ logging.basicConfig(level=logging.INFO)
38
+ logger = logging.getLogger(__name__)
39
+
40
+ # --- DECORATORS ---
41
+ def log_execution(func):
42
+ """Decorator to log function execution for debugging"""
43
+ @wraps(func)
44
+ def wrapper(*args, **kwargs):
45
+ logger.info(f"Executing {func.__name__} with args: {args}, kwargs: {kwargs}")
46
+ try:
47
+ result = func(*args, **kwargs)
48
+ logger.info(f"{func.__name__} executed successfully")
49
+ return result
50
+ except Exception as e:
51
+ logger.error(f"Error in {func.__name__}: {str(e)}")
52
+ raise
53
+ return wrapper
54
+
55
+ # --- DATA PROCESSING FUNCTIONS ---
56
+ @log_execution
57
+ def parse_pdf(file):
58
+ """Parse PDF file and extract text content"""
59
+ reader = PdfReader(file)
60
+ return "\n".join([page.extract_text() for page in reader.pages if page.extract_text()])
61
+
62
+ @st.cache_resource
63
+ def build_faiss_vectorstore(_docs):
64
+ """Build FAISS vectorstore from documents with caching"""
65
+ embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
66
+ splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
67
+ chunks = []
68
+ for i, doc in enumerate(_docs):
69
+ for chunk in splitter.split_text(doc.page_content):
70
+ chunks.append(LCDocument(page_content=chunk, metadata={"source": f"doc_{i}"}))
71
+ return FAISS.from_documents(chunks, embeddings)
72
+
73
+ @log_execution
74
+ def preprocess_keywords(description):
75
+ """Preprocess description text for keyword extraction"""
76
+ description = str(description).upper()
77
+ for lq_variant in living_quarters_keywords:
78
+ if lq_variant != 'LQ':
79
+ description = description.replace(lq_variant, 'LQ')
80
+
81
+ # Handle CLV module keywords
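+ # e.g. a bare module number such as '110' in the description is rewritten to its
+ # full code (assumed to look like 'M110'; the actual codes come from config.module_keywords)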
82
+ for module in module_keywords:
83
+ number = module[1:]
84
+ if number in description:
85
+ description = description.replace(number, module)
86
+
87
+ # PAZ module and rack keywords (e.g. 'P1', 'S3', 'R2') already appear literally
88
+ # in the uppercased description, so unlike the CLV module numbers above they
89
+ # need no substitution before keyword extraction.
96
+
97
+ for original, grouped in {**NI_keyword_map, **NC_keyword_map}.items():
98
+ description = description.replace(original, grouped)
99
+ return description
100
+
101
+ @log_execution
102
+ def extract_ni_nc_keywords(row, notif_type_col, desc_col):
103
+ """Extract NI/NC keywords from notification row"""
104
+ description = preprocess_keywords(row[desc_col])
105
+ notif_type = row[notif_type_col]
106
+ keywords = [kw for kw in (NI_keywords if notif_type == 'NI' else NC_keywords) if kw in description]
107
+ return ', '.join(keywords) if keywords else 'None'
108
+
109
+ @log_execution
110
+ def extract_location_keywords(row, desc_col, keyword_list):
111
+ """Extract location keywords from notification row"""
112
+ description = preprocess_keywords(row[desc_col])
113
+ if keyword_list == living_quarters_keywords:
114
+ return 'LQ' if any(kw in description for kw in living_quarters_keywords) else 'None'
115
+ locations = [kw for kw in keyword_list if kw in description]
116
+ return ', '.join(locations) if locations else 'None'
117
+
118
+ @log_execution
119
+ def create_pivot_table(df, index, columns, aggfunc='size', fill_value=0):
120
+ """Create pivot table from dataframe"""
121
+ df_exploded = df.assign(Keywords=df[columns].str.split(', ')).explode('Keywords')
122
+ df_exploded = df_exploded[df_exploded['Keywords'] != 'None']
123
+ pivot = pd.pivot_table(df_exploded, index=index, columns='Keywords', aggfunc=aggfunc, fill_value=fill_value)
124
+ return pivot
125
+
126
+ @log_execution
127
+ def apply_fpso_colors(df):
128
+ """Apply color styling to FPSO dataframe"""
129
+ styles = pd.DataFrame('', index=df.index, columns=df.columns)
130
+ color_map = {'GIR': '#FFA07A', 'DAL': '#ADD8E6', 'PAZ': '#D8BFD8', 'CLV': '#90EE90'}
131
+ for fpso, color in color_map.items():
132
+ if fpso in df.index:
133
+ styles.loc[fpso] = f'background-color: {color}'
134
+ return styles
135
+
136
+ @log_execution
137
+ def process_uploaded_files(files):
138
+ """Process uploaded files and return PDF documents and Excel dataframe"""
139
+ pdf_files = [f for f in files if f.type == "application/pdf"]
140
+ excel_files = [f for f in files if f.type == "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]
141
+
142
+ # Process PDF files
143
+ parsed_docs = []
144
+ if pdf_files:
145
+ parsed_docs = [LCDocument(page_content=parse_pdf(f), metadata={"name": f.name}) for f in pdf_files]
146
+ st.sidebar.success(f"{len(parsed_docs)} PDF reports indexed.")
147
+
148
+ # Process Excel files
149
+ df = None
150
+ if excel_files:
151
+ try:
152
+ # Use the first Excel file if multiple are uploaded
153
+ uploaded_xlsx = excel_files[0]
154
+ df = pd.read_excel(uploaded_xlsx, sheet_name='Global Notifications')
155
+ df.columns = df.columns.str.strip()
156
+ expected_columns = {
157
+ 'Notifictn type': 'Notifictn type',
158
+ 'Created on': 'Created on',
159
+ 'Description': 'Description',
160
+ 'FPSO': 'FPSO'
161
+ }
162
+ missing_columns = [col for col in expected_columns.values() if col not in df.columns]
163
+ if missing_columns:
164
+ st.error(f"Missing columns: {missing_columns}")
165
+ return parsed_docs, None
166
+
167
+ df = df[list(expected_columns.values())]
168
+ df.columns = list(expected_columns.keys())
169
+ df = df[df['FPSO'].isin(['GIR', 'DAL', 'PAZ', 'CLV'])]
170
+ df['Extracted_Keywords'] = df.apply(extract_ni_nc_keywords, axis=1, args=('Notifictn type', 'Description'))
171
+ for loc_type, keywords in [
172
+ ('Modules', module_keywords + paz_module_keywords), ('Racks', rack_keywords + paz_rack_keywords), ('LivingQuarters', living_quarters_keywords),
173
+ ('Flare', flare_keywords), ('FWD', fwd_keywords), ('HeliDeck', hexagons_keywords)
174
+ ]:
175
+ df[f'Extracted_{loc_type}'] = df.apply(extract_location_keywords, axis=1, args=('Description', keywords))
176
+ st.sidebar.success("Excel file processed successfully.")
177
+ except Exception as e:
178
+ st.error(f"Error processing Excel: {e}")
179
+ return parsed_docs, None
180
+
181
+ return parsed_docs, df
182
+
183
+ def add_rectangle(ax, xy, width, height, **kwargs):
184
+ rectangle = patches.Rectangle(xy, width, height, **kwargs)
185
+ ax.add_patch(rectangle)
186
+
187
+ def add_chamfered_rectangle(ax, xy, width, height, chamfer, **kwargs):
188
+ x, y = xy
189
+ coords = [
190
+ (x + chamfer, y),
191
+ (x + width - chamfer, y),
192
+ (x + width, y + chamfer),
193
+ (x + width, y + height - chamfer),
194
+ (x + width - chamfer, y + height),
195
+ (x + chamfer, y + height),
196
+ (x, y + height - chamfer),
197
+ (x, y + chamfer)
198
+ ]
199
+ polygon = patches.Polygon(coords, closed=True, **kwargs)
200
+ ax.add_patch(polygon)
201
+
202
+ def add_hexagon(ax, xy, radius, **kwargs):
203
+ x, y = xy
204
+ vertices = [(x + radius * math.cos(2 * math.pi * n / 6), y + radius * math.sin(2 * math.pi * n / 6)) for n in range(6)]
205
+ hexagon = patches.Polygon(vertices, closed=True, **kwargs)
206
+ ax.add_patch(hexagon)
207
+
208
+ def add_fwd(ax, xy, width, height, **kwargs):
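+ # Builds the bow as a trapezoid, rotates it 90 degrees about the origin and translates
+ # it to (x, y); the 'FWD' label is placed with the same rotated transform.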
209
+ x, y = xy
210
+ top_width = width * 0.80
211
+ coords = [
212
+ (0, 0),
213
+ (width, 0),
214
+ (width - (width - top_width) / 2, height),
215
+ ((width - top_width) / 2, height)
216
+ ]
217
+ trapezoid = patches.Polygon(coords, closed=True, **kwargs)
218
+ t = transforms.Affine2D().rotate_deg(90).translate(x, y)
219
+ trapezoid.set_transform(t + ax.transData)
220
+ ax.add_patch(trapezoid)
221
+ text_t = transforms.Affine2D().rotate_deg(90).translate(x + height / 2, y + width / 2)
222
+ ax.text(0, -1, "FWD", ha='center', va='center', fontsize=7, weight='bold', transform=text_t + ax.transData)