NLong committed
Commit b5fb8d2 · verified · 1 Parent(s): 4ec4559

Upload 12 files

RAG_SETUP_GUIDE.md ADDED
@@ -0,0 +1,267 @@
+# 🚀 Enhanced RAG System Setup Guide
+
+This guide will help you set up the Enhanced RAG (Retrieval-Augmented Generation) system for saving high-confidence news to Google Drive.
+
+## 📋 Overview
+
+The Enhanced RAG system automatically saves news with **95%+ confidence** from Gemini analysis to Google Drive, allowing you to:
+- View all high-confidence news entries
+- Use them for better RAG analysis
+- Track user input patterns
+- Build a comprehensive knowledge base
+
+## 🔧 Setup Steps
+
+### Step 1: Google Cloud Console Setup
+
+1. **Go to Google Cloud Console**
+   - Visit: https://console.cloud.google.com/
+
+2. **Create or Select Project**
+   - Create a new project or select existing one
+   - Note your project ID
+
+3. **Enable Google Drive API**
+   - Go to "APIs & Services" → "Library"
+   - Search for "Google Drive API"
+   - Click "Enable"
+
+4. **Create OAuth 2.0 Credentials**
+   - Go to "APIs & Services" → "Credentials"
+   - Click "Create Credentials" → "OAuth 2.0 Client IDs"
+   - Choose "Desktop application"
+   - Download the JSON file
+   - Rename it to `credentials.json`
+   - Place it in your project directory
+
+### Step 2: Local Setup
+
+1. **Run the Setup Script**
+   ```bash
+   python setup_google_drive_rag.py
+   ```
+
+2. **Follow the Authentication Process**
+   - A browser window will open
+   - Log in with your Google account
+   - Grant permissions for Google Drive access
+   - The script will save your credentials
+
+3. **Verify Setup**
+   - The script will test Google Drive access
+   - It will create the RAG folder and file
+   - You'll see confirmation messages
+
+### Step 3: Hugging Face Spaces Setup (Optional)
+
+If you want to use this on Hugging Face Spaces:
+
+1. **Add Secrets to Hugging Face**
+   - Go to your Space settings
+   - Add these secrets:
+     - `GOOGLE_CLIENT_ID`: Your OAuth client ID
+     - `GOOGLE_CLIENT_SECRET`: Your OAuth client secret
+     - `GOOGLE_REFRESH_TOKEN`: Get this from your local token.json
+
+2. **Get Refresh Token**
+   - Run the setup script locally first
+   - Check the `token.json` file
+   - Copy the `refresh_token` value
+
+## 📁 File Structure
+
+After setup, you'll have:
+
+```
+your-project/
+├── credentials.json          # Google OAuth credentials
+├── token.json                # Saved authentication token
+├── rag_news_manager.py       # Main RAG system
+├── setup_google_drive_rag.py # Setup script
+├── view_rag_news.py          # News viewer
+└── app.py                    # Your main app (updated)
+```
+
+## 🔍 Google Drive Structure
+
+The system creates:
+
+```
+Google Drive/
+└── Vietnamese_Fake_News_RAG/
+    └── high_confidence_news.json
+```
+
+## 📊 How It Works
+
+### Automatic Saving
+- When users input news, the system analyzes it
+- If Gemini confidence > 95%, it's automatically saved to Google Drive
+- Each entry includes:
+  - News text
+  - Prediction (REAL/FAKE)
+  - Confidence score
+  - Gemini analysis
+  - Search results
+  - Timestamp
+
+### Data Format
+```json
+{
+  "metadata": {
+    "created_at": "2024-01-01T00:00:00",
+    "description": "High-confidence Vietnamese fake news for RAG",
+    "threshold": 0.95,
+    "total_entries": 10,
+    "last_updated": "2024-01-01T12:00:00"
+  },
+  "news_entries": [
+    {
+      "id": 1,
+      "content_hash": "abc123...",
+      "news_text": "Argentina vô địch World Cup 2022...",
+      "prediction": "REAL",
+      "gemini_confidence": 0.98,
+      "gemini_analysis": "1. KẾT LUẬN: THẬT...",
+      "distilbert_confidence": 0.85,
+      "search_results": [...],
+      "created_at": "2024-01-01T10:00:00",
+      "source": "user_input",
+      "verified": true
+    }
+  ]
+}
+```
+
+## 🖥️ Viewing Saved News
+
+### Option 1: Command Line Viewer
+```bash
+python view_rag_news.py
+```
+
+Features:
+- View all saved news
+- Filter by prediction (REAL/FAKE)
+- Search through entries
+- View statistics
+- Open Google Drive directly
+
+### Option 2: Google Drive Web Interface
+- Go to your Google Drive
+- Find the "Vietnamese_Fake_News_RAG" folder
+- Open "high_confidence_news.json"
+- View the raw JSON data
+
+### Option 3: Direct Google Drive Links
+The system provides direct links:
+- Folder: `https://drive.google.com/drive/folders/{folder_id}`
+- File: `https://drive.google.com/file/d/{file_id}/view`
+
+## 🔧 Configuration
+
+### In app.py
+```python
+# Enhanced RAG System Configuration
+ENABLE_ENHANCED_RAG = True  # Enable/disable the system
+RAG_CONFIDENCE_THRESHOLD = 0.95  # 95% threshold for saving
+```
+
+### Thresholds
+- **95%**: Only very high-confidence predictions are saved
+- **90%**: More entries saved, but still high quality
+- **85%**: More entries, but some uncertainty
+
+## 📈 Statistics
+
+The system tracks:
+- Total entries saved
+- Real vs Fake news count
+- Average confidence score
+- Latest entry timestamp
+- Google Drive folder/file IDs
+
+## 🚨 Troubleshooting
+
+### Common Issues
+
+1. **"credentials.json not found"**
+   - Make sure you downloaded the OAuth credentials
+   - Rename the file to exactly `credentials.json`
+   - Place it in the project directory
+
+2. **"Authentication failed"**
+   - Check your internet connection
+   - Make sure Google Drive API is enabled
+   - Try running the setup script again
+
+3. **"Permission denied"**
+   - Make sure you granted all required permissions
+   - Check if your Google account has Drive access
+
+4. **"RAG system not available"**
+   - Check if all dependencies are installed
+   - Make sure `rag_news_manager.py` is in the same directory
+
+### Debug Mode
+Add this to see detailed logs:
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+```
+
+## 🔄 Integration with Existing System
+
+The Enhanced RAG system works alongside your existing knowledge base:
+- **Local Knowledge Base**: Still works as before
+- **Enhanced RAG**: Additional Google Drive storage
+- **Both systems**: Can be used together for comprehensive RAG
+
+## 📱 Usage Examples
+
+### View Recent News
+```bash
+python view_rag_news.py
+# Select option 2: View Recent News
+```
+
+### Search for Specific Topics
+```bash
+python view_rag_news.py
+# Select option 6: Search News
+# Enter: "COVID-19"
+```
+
+### Check Statistics
+```bash
+python view_rag_news.py
+# Select option 1: View Statistics
+```
+
+## 🎯 Benefits
+
+1. **Automatic Collection**: No manual intervention needed
+2. **High Quality**: Only 95%+ confidence entries saved
+3. **Easy Access**: View through multiple interfaces
+4. **Scalable**: Google Drive handles large datasets
+5. **Searchable**: Find specific news entries quickly
+6. **Analytics**: Track patterns and statistics
+
+## 🔐 Security
+
+- OAuth 2.0 authentication
+- Credentials stored securely
+- Only your Google account can access
+- No sensitive data exposed
+
+## 📞 Support
+
+If you encounter issues:
+1. Check the troubleshooting section
+2. Verify all setup steps completed
+3. Check Google Cloud Console for API quotas
+4. Ensure proper file permissions
+
+---
+
+**🎉 Congratulations!** You now have a comprehensive RAG system that automatically saves high-confidence news to Google Drive for analysis and viewing!
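The entry layout shown under "Data Format" in the guide can be sketched as a small builder. This is an illustration, not the code in `rag_news_manager.py`: the name `build_entry` is hypothetical, and hashing the whitespace-normalised, lower-cased text with SHA-256 is an assumption (the guide only shows a truncated `content_hash`).

```python
import hashlib
from datetime import datetime


def build_entry(entry_id, news_text, prediction, gemini_confidence,
                gemini_analysis, search_results, distilbert_confidence=None):
    """Build one record shaped like the guide's `news_entries` items (hypothetical helper)."""
    # Assumption: duplicates are detected by hashing a normalised copy of the text.
    normalised = " ".join(news_text.split()).lower()
    return {
        "id": entry_id,
        "content_hash": hashlib.sha256(normalised.encode("utf-8")).hexdigest(),
        "news_text": news_text,
        "prediction": prediction,
        "gemini_confidence": gemini_confidence,
        "gemini_analysis": gemini_analysis,
        "distilbert_confidence": distilbert_confidence,
        "search_results": search_results,
        "created_at": datetime.now().isoformat(),
        "source": "user_input",
        "verified": True,
    }
```

Two submissions that differ only in spacing or letter case would then share a `content_hash`, which is one plausible way to reject duplicates before uploading.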
app.py CHANGED
@@ -31,6 +31,10 @@ KNOWLEDGE_BASE_DB = "knowledge_base.db"
 CONFIDENCE_THRESHOLD = 0.95  # 95% Gemini confidence threshold for RAG knowledge base
 ENABLE_KNOWLEDGE_BASE_SEARCH = True  # Enable knowledge base search with training data
 
+# Enhanced RAG System Configuration
+ENABLE_ENHANCED_RAG = True  # Enable enhanced RAG system for Google Drive
+RAG_CONFIDENCE_THRESHOLD = 0.95  # 95% threshold for saving to RAG
+
 # Cloud Storage Configuration
 USE_CLOUD_STORAGE = True  # Set to True to use cloud storage instead of local DB
 CLOUD_STORAGE_TYPE = "google_drive"  # Options: "google_drive", "google_cloud", "local"
@@ -440,6 +444,23 @@ def get_knowledge_base_stats():
 # Initialize knowledge base on startup
 init_knowledge_base()
 
+# Initialize Enhanced RAG System
+if ENABLE_ENHANCED_RAG:
+    try:
+        from rag_news_manager import initialize_rag_system
+        print("🚀 Initializing Enhanced RAG System...")
+        if initialize_rag_system():
+            print("✅ Enhanced RAG System initialized successfully!")
+        else:
+            print("⚠️ Enhanced RAG System initialization failed - continuing without it")
+            ENABLE_ENHANCED_RAG = False
+    except ImportError as e:
+        print(f"⚠️ Enhanced RAG System not available: {e}")
+        ENABLE_ENHANCED_RAG = False
+    except Exception as e:
+        print(f"⚠️ Enhanced RAG System initialization error: {e}")
+        ENABLE_ENHANCED_RAG = False
+
 def populate_knowledge_base_from_training_data():
     """Populate knowledge base with existing training data"""
     try:
@@ -1366,6 +1387,31 @@ def analyze_news(news_text):
             print("✅ Successfully added to knowledge base for future RAG retrieval!")
         else:
             print("⚠️ Failed to add to knowledge base (duplicate or error)")
+
+    # Step 8: Enhanced RAG System - Save to Google Drive if confidence is high enough
+    if ENABLE_ENHANCED_RAG and gemini_max_confidence > RAG_CONFIDENCE_THRESHOLD:
+        try:
+            from rag_news_manager import add_news_to_rag
+
+            print(f"🚀 High confidence detected ({gemini_max_confidence:.1%}) - saving to Enhanced RAG system...")
+            final_prediction = "REAL" if gemini_real_confidence > gemini_fake_confidence else "FAKE"
+
+            rag_success = add_news_to_rag(
+                news_text=news_text,
+                gemini_analysis=gemini_analysis,
+                gemini_confidence=gemini_max_confidence,
+                prediction=final_prediction,
+                search_results=search_results,
+                distilbert_confidence=distilbert_confidence
+            )
+
+            if rag_success:
+                print("✅ Successfully saved to Enhanced RAG system (Google Drive)!")
+            else:
+                print("⚠️ Failed to save to Enhanced RAG system (duplicate or error)")
+
+        except Exception as e:
+            print(f"⚠️ Enhanced RAG system error: {e}")
 
     # Build the detailed report with better formatting
     # Use combined_confidence to determine the final classification (not just DistilBERT)
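The gate added in Step 8 of `analyze_news` can be isolated into a small function to make its edge cases visible. This is a sketch only: `rag_decision` is a hypothetical name, and treating `gemini_max_confidence` as the larger of the real and fake confidences is an assumption, since the diff does not show where that value is computed.

```python
RAG_CONFIDENCE_THRESHOLD = 0.95  # same value as the constant added to app.py


def rag_decision(real_conf, fake_conf, threshold=RAG_CONFIDENCE_THRESHOLD):
    """Return "REAL" or "FAKE" if the entry would be saved, else None (hypothetical helper)."""
    # Assumption: the maximum confidence is the larger of the two class scores.
    max_conf = max(real_conf, fake_conf)
    # Strict comparison, as in the diff: a score of exactly 0.95 is not saved.
    if max_conf > threshold:
        # A tie falls to "FAKE", mirroring the `real > fake` test in the diff.
        return "REAL" if real_conf > fake_conf else "FAKE"
    return None
```

Because the comparison is strict, a score of exactly 95% is skipped, which is slightly narrower than the "95%+" wording used in the setup guide.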
debug_rag_setup.py ADDED
@@ -0,0 +1,80 @@
+#!/usr/bin/env python3
+"""
+Debug RAG system setup
+"""
+
+def debug_rag_setup():
+    """Debug the RAG system setup step by step"""
+    print("🔧 Debugging RAG System Setup")
+    print("=" * 40)
+
+    try:
+        # Step 1: Test imports
+        print("1. Testing imports...")
+        from rag_news_manager import RAGNewsManager
+        print("✅ RAGNewsManager imported successfully")
+
+        # Step 2: Create manager instance
+        print("2. Creating RAG manager...")
+        manager = RAGNewsManager()
+        print("✅ RAG manager created")
+
+        # Step 3: Test authentication
+        print("3. Testing authentication...")
+        if manager.authenticate():
+            print("✅ Authentication successful")
+        else:
+            print("❌ Authentication failed")
+            return False
+
+        # Step 4: Test folder setup
+        print("4. Testing folder setup...")
+        if manager.setup_rag_folder():
+            print("✅ Folder setup successful")
+            print(f"   Folder ID: {manager.rag_folder_id}")
+        else:
+            print("❌ Folder setup failed")
+            return False
+
+        # Step 5: Test file setup
+        print("5. Testing file setup...")
+        if manager.setup_rag_file():
+            print("✅ File setup successful")
+            print(f"   File ID: {manager.rag_file_id}")
+        else:
+            print("❌ File setup failed")
+            return False
+
+        # Step 6: Test data loading
+        print("6. Testing data loading...")
+        data = manager.load_rag_data()
+        if data:
+            print("✅ Data loading successful")
+            print(f"   Total entries: {data.get('metadata', {}).get('total_entries', 0)}")
+        else:
+            print("❌ Data loading failed")
+            return False
+
+        # Step 7: Test statistics
+        print("7. Testing statistics...")
+        stats = manager.get_rag_statistics()
+        if stats:
+            print("✅ Statistics successful")
+            print(f"   Total entries: {stats['total_entries']}")
+            print(f"   Folder ID: {stats.get('folder_id', 'None')}")
+            print(f"   File ID: {stats.get('file_id', 'None')}")
+        else:
+            print("❌ Statistics failed")
+            return False
+
+        print("\n🎉 All tests passed! RAG system is working correctly.")
+        return True
+
+    except Exception as e:
+        print(f"❌ Error during debugging: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+
+if __name__ == "__main__":
+    debug_rag_setup()
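The step-by-step pattern in `debug_rag_setup.py` (run a check, report, stop at the first failure) could also be written as a table-driven loop. The sketch below is one possible shape, not part of the commit; `run_checks` is a hypothetical helper.

```python
def run_checks(checks):
    """Run (label, callable) pairs in order and stop at the first failure."""
    for number, (label, check) in enumerate(checks, start=1):
        print(f"{number}. Testing {label}...")
        try:
            ok = bool(check())
        except Exception as exc:  # report and stop, like the script's outer try/except
            print(f"❌ {label} raised: {exc}")
            return False
        if not ok:
            print(f"❌ {label} failed")
            return False
        print(f"✅ {label} successful")
    return True
```

With a `RAGNewsManager` instance named `manager`, the call might look like `run_checks([("authentication", manager.authenticate), ("folder setup", manager.setup_rag_folder), ("file setup", manager.setup_rag_file)])`, using the method names that appear in the script.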
fix_oauth_setup.py ADDED
@@ -0,0 +1,141 @@
+#!/usr/bin/env python3
+"""
+Fix OAuth setup for Google Drive RAG system
+"""
+
+import os
+import json
+from google.oauth2.credentials import Credentials
+from google_auth_oauthlib.flow import InstalledAppFlow
+from google.auth.transport.requests import Request
+
+# Configuration
+SCOPES = ['https://www.googleapis.com/auth/drive.file']
+CREDENTIALS_FILE = 'credentials.json'
+TOKEN_FILE = 'token.json'
+
+def fix_oauth_setup():
+    """Fix OAuth setup with proper redirect URIs"""
+    print("🔧 Fixing OAuth Setup for Google Drive RAG")
+    print("=" * 50)
+
+    # Check if credentials file exists
+    if not os.path.exists(CREDENTIALS_FILE):
+        print(f"❌ {CREDENTIALS_FILE} not found!")
+        print("\n📋 Please follow these steps:")
+        print("1. Go to: https://console.cloud.google.com/")
+        print("2. APIs & Services → Credentials")
+        print("3. Create Credentials → OAuth 2.0 Client IDs")
+        print("4. Application type: Desktop application")
+        print("5. Download as 'credentials.json'")
+        return False
+
+    # Delete old token file if it exists
+    if os.path.exists(TOKEN_FILE):
+        print(f"🗑️ Removing old token file: {TOKEN_FILE}")
+        os.remove(TOKEN_FILE)
+
+    print(f"✅ Found {CREDENTIALS_FILE}")
+
+    try:
+        # Load and validate credentials
+        with open(CREDENTIALS_FILE, 'r') as f:
+            creds_data = json.load(f)
+
+        print("✅ Credentials file is valid")
+        print(f"   Client ID: {creds_data.get('installed', {}).get('client_id', 'N/A')[:20]}...")
+
+        # Check if it's a desktop application
+        if creds_data.get('installed'):
+            print("✅ Desktop application credentials detected")
+        else:
+            print("⚠️ Warning: This doesn't look like desktop application credentials")
+            print("   Make sure you selected 'Desktop application' when creating credentials")
+
+    except json.JSONDecodeError:
+        print("❌ Invalid JSON in credentials file")
+        return False
+    except Exception as e:
+        print(f"❌ Error reading credentials: {e}")
+        return False
+
+    # Try authentication with different ports
+    ports_to_try = [8080, 8081, 8082, 0]  # 0 means let the system choose
+
+    for port in ports_to_try:
+        try:
+            print(f"\n🔐 Trying authentication on port {port if port > 0 else 'auto'}...")
+
+            # Create flow
+            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
+
+            if port == 0:
+                # Let the system choose the port
+                creds = flow.run_local_server(port=0)
+            else:
+                # Try specific port
+                creds = flow.run_local_server(port=port)
+
+            print("✅ Authentication successful!")
+
+            # Save credentials
+            with open(TOKEN_FILE, 'w') as token:
+                token.write(creds.to_json())
+            print(f"✅ Credentials saved to {TOKEN_FILE}")
+
+            # Test the credentials
+            print("\n🧪 Testing Google Drive access...")
+            from googleapiclient.discovery import build
+            service = build('drive', 'v3', credentials=creds)
+
+            results = service.files().list(pageSize=1, fields="files(id, name)").execute()
+            files = results.get('files', [])
+
+            print("✅ Google Drive access successful!")
+            print(f"   Found {len(files)} file(s) in your Drive")
+
+            return True
+
+        except Exception as e:
+            error_msg = str(e).lower()
+            if "redirect_uri_mismatch" in error_msg:
+                print(f"❌ Port {port} failed: redirect_uri_mismatch")
+                if port != ports_to_try[-1]:  # Don't show this message for the last attempt
+                    print("   Trying next port...")
+                continue
+            else:
+                print(f"❌ Port {port} failed: {e}")
+                if port != ports_to_try[-1]:
+                    print("   Trying next port...")
+                continue
+
+    print("\n❌ All authentication attempts failed!")
+    print("\n🔧 Manual Fix Required:")
+    print("1. Go to: https://console.cloud.google.com/")
+    print("2. APIs & Services → Credentials")
+    print("3. Edit your OAuth 2.0 Client ID")
+    print("4. Add these to 'Authorized redirect URIs':")
+    print("   - http://localhost:8080/")
+    print("   - http://localhost:8081/")
+    print("   - http://localhost:8082/")
+    print("   - http://127.0.0.1:8080/")
+    print("   - http://127.0.0.1:8081/")
+    print("   - http://127.0.0.1:8082/")
+    print("5. Save and try again")
+
+    return False
+
+def main():
+    """Main function"""
+    print("🚀 OAuth Fix for Google Drive RAG System")
+    print("=" * 50)
+
+    if fix_oauth_setup():
+        print("\n🎉 OAuth setup fixed successfully!")
+        print("✅ You can now run: python setup_google_drive_rag.py")
+    else:
+        print("\n❌ OAuth setup failed")
+        print("💡 Please follow the manual fix instructions above")
+
+if __name__ == "__main__":
+    main()
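OAuth client files downloaded from Google Cloud Console nest their fields under a top-level `installed` key (desktop clients) or `web` key (web clients), so the client ID sits one level down from the root of `credentials.json`. A minimal sketch of reading it follows; `describe_client_secrets` is a hypothetical helper, not part of the commit.

```python
def describe_client_secrets(data):
    """Return (client_type, client_id) from a parsed OAuth client file, or (None, None)."""
    for client_type in ("installed", "web"):
        section = data.get(client_type)
        if isinstance(section, dict):
            # The client ID lives inside the per-type section, not at the top level.
            return client_type, section.get("client_id")
    return None, None
```

A result of `("web", ...)` would suggest the wrong application type was chosen for the desktop flow that `InstalledAppFlow` expects.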
fix_verification_issue.py ADDED
@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+"""
+Fix Google verification issue for OAuth
+"""
+
+import os
+import json
+from google.oauth2.credentials import Credentials
+from google_auth_oauthlib.flow import InstalledAppFlow
+from google.auth.transport.requests import Request
+
+# Configuration
+SCOPES = ['https://www.googleapis.com/auth/drive.file']
+CREDENTIALS_FILE = 'credentials.json'
+TOKEN_FILE = 'token.json'
+
+def fix_verification_issue():
+    """Fix Google verification issue"""
+    print("🔧 Fixing Google Verification Issue")
+    print("=" * 50)
+
+    print("📋 The issue is that your app needs to be in 'Testing' mode")
+    print("   and you need to be added as a test user.")
+    print()
+
+    print("🔧 Manual Fix Required:")
+    print("1. Go to: https://console.cloud.google.com/")
+    print("2. APIs & Services → OAuth consent screen")
+    print("3. Make sure 'Publishing status' is set to 'Testing'")
+    print("4. Scroll down to 'Test users' section")
+    print("5. Click 'Add Users'")
+    print("6. Add your email: [email protected]")
+    print("7. Save the changes")
+    print()
+
+    print("🔄 Alternative: Change User Type to Internal")
+    print("   (If you have a Google Workspace account)")
+    print()
+
+    # Check if credentials exist
+    if not os.path.exists(CREDENTIALS_FILE):
+        print(f"❌ {CREDENTIALS_FILE} not found!")
+        return False
+
+    # Delete old token file
+    if os.path.exists(TOKEN_FILE):
+        print(f"🗑️ Removing old token file: {TOKEN_FILE}")
+        os.remove(TOKEN_FILE)
+
+    print("✅ Ready to test after you add yourself as a test user")
+    print()
+
+    # Ask user if they've completed the steps
+    response = input("Have you added yourself as a test user? (y/n): ").strip().lower()
+
+    if response == 'y':
+        print("\n🧪 Testing authentication...")
+
+        try:
+            # Try authentication
+            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
+            creds = flow.run_local_server(port=0)
+
+            print("✅ Authentication successful!")
+
+            # Save credentials
+            with open(TOKEN_FILE, 'w') as token:
+                token.write(creds.to_json())
+            print(f"✅ Credentials saved to {TOKEN_FILE}")
+
+            # Test Google Drive access
+            print("\n🧪 Testing Google Drive access...")
+            from googleapiclient.discovery import build
+            service = build('drive', 'v3', credentials=creds)
+
+            results = service.files().list(pageSize=1, fields="files(id, name)").execute()
+            files = results.get('files', [])
+
+            print("✅ Google Drive access successful!")
+            print(f"   Found {len(files)} file(s) in your Drive")
+
+            return True
+
+        except Exception as e:
+            error_msg = str(e).lower()
+            if "access_denied" in error_msg or "verification" in error_msg:
+                print("❌ Still getting verification error")
+                print("💡 Make sure you:")
+                print("   1. Added yourself as a test user")
+                print("   2. Set publishing status to 'Testing'")
+                print("   3. Saved all changes")
+                return False
+            else:
+                print(f"❌ Authentication failed: {e}")
+                return False
+    else:
+        print("💡 Please complete the steps above and run this script again")
+        return False
+
+def main():
+    """Main function"""
+    print("🚀 Google Verification Fix for RAG System")
+    print("=" * 50)
+
+    if fix_verification_issue():
+        print("\n🎉 Verification issue fixed!")
+        print("✅ You can now run: python setup_google_drive_rag.py")
+    else:
+        print("\n❌ Please complete the manual steps above")
+
+if __name__ == "__main__":
+    main()
get_drive_links.py ADDED
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+"""
+Quick script to get Google Drive links for your RAG files
+"""
+
+from rag_news_manager import initialize_rag_system, get_rag_stats
+
+def get_drive_links():
+    """Get direct Google Drive links"""
+    print("🔗 Getting Google Drive Links...")
+
+    # Initialize RAG system
+    if not initialize_rag_system():
+        print("❌ Failed to initialize RAG system")
+        return
+
+    # Get statistics (includes folder and file IDs)
+    stats = get_rag_stats()
+
+    if not stats:
+        print("❌ Could not get RAG statistics")
+        return
+
+    print(f"\n📊 RAG System Statistics:")
+    print(f"   Total entries: {stats['total_entries']}")
+    print(f"   Real news: {stats['real_count']}")
+    print(f"   Fake news: {stats['fake_count']}")
+    print(f"   Average confidence: {stats['avg_confidence']:.1%}")
+
+    print(f"\n🔗 Google Drive Links:")
+
+    if stats['folder_id']:
+        folder_url = f"https://drive.google.com/drive/folders/{stats['folder_id']}"
+        print(f"📁 RAG Folder: {folder_url}")
+        print(f"   (Click to open in browser)")
+
+    if stats['file_id']:
+        file_url = f"https://drive.google.com/file/d/{stats['file_id']}/view"
+        print(f"📄 RAG File: {file_url}")
+        print(f"   (Click to view the JSON data)")
+
+    print(f"\n💡 Tips:")
+    print(f"   - Use the folder link to browse all RAG files")
+    print(f"   - Use the file link to view the raw JSON data")
+    print(f"   - Run 'python view_rag_news.py' for a better interface")
+
+if __name__ == "__main__":
+    get_drive_links()
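The same two URL patterns are built in both `get_drive_links.py` and `quick_check.py`. As a sketch, they could be factored into one function; `drive_links` is a hypothetical name, and only the URL templates come from the commit.

```python
def drive_links(folder_id=None, file_id=None):
    """Build browser links for a Drive folder and/or file, skipping missing IDs."""
    links = {}
    if folder_id:
        links["folder"] = f"https://drive.google.com/drive/folders/{folder_id}"
    if file_id:
        links["file"] = f"https://drive.google.com/file/d/{file_id}/view"
    return links
```

Called with the `folder_id` and `file_id` values from the statistics dictionary, it returns only the links whose IDs are present.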
quick_check.py ADDED
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+"""
+Quick check to see if you have any saved RAG news
+"""
+
+from rag_news_manager import initialize_rag_system, get_rag_stats
+
+def quick_check():
+    """Quick check for saved news"""
+    print("🔍 Quick RAG System Check")
+    print("=" * 30)
+
+    # Initialize RAG system
+    if not initialize_rag_system():
+        print("❌ RAG system not initialized")
+        print("💡 Run: python setup_google_drive_rag.py")
+        return
+
+    # Get statistics
+    stats = get_rag_stats()
+
+    if not stats:
+        print("❌ Could not get statistics")
+        return
+
+    print(f"📊 Current Status:")
+    print(f"   Total entries: {stats['total_entries']}")
+
+    if stats['total_entries'] == 0:
+        print("📭 No news entries saved yet")
+        print("💡 Try analyzing some news with your app first!")
+        print("💡 News with 95%+ confidence will be automatically saved")
+    else:
+        print(f"✅ You have {stats['total_entries']} saved news entries!")
+        print(f"   Real news: {stats['real_count']}")
+        print(f"   Fake news: {stats['fake_count']}")
+        print(f"   Average confidence: {stats['avg_confidence']:.1%}")
+
+        if stats['latest_entry']:
+            latest = stats['latest_entry']
+            print(f"\n📰 Latest entry:")
+            print(f"   {latest['news_text'][:80]}...")
+            print(f"   {latest['prediction']} ({latest['gemini_confidence']:.1%})")
+
+    # Show Google Drive links
+    if stats['folder_id']:
+        folder_url = f"https://drive.google.com/drive/folders/{stats['folder_id']}"
+        print(f"\n🔗 Google Drive Folder: {folder_url}")
+
+    if stats['file_id']:
+        file_url = f"https://drive.google.com/file/d/{stats['file_id']}/view"
+        print(f"🔗 Google Drive File: {file_url}")
+
+if __name__ == "__main__":
+    quick_check()
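`quick_check.py` reads `total_entries`, `real_count`, `fake_count`, `avg_confidence` and `latest_entry` from the statistics dictionary. How `get_rag_stats` computes them is not shown in this part of the commit, so the following is only a sketch of one way to derive those keys from a `news_entries` list; `summarize_entries` is a hypothetical name.

```python
def summarize_entries(entries):
    """Derive the statistics keys read by quick_check.py from a list of entries."""
    total = len(entries)
    real = sum(1 for e in entries if e.get("prediction") == "REAL")
    fake = sum(1 for e in entries if e.get("prediction") == "FAKE")
    avg = sum(e.get("gemini_confidence", 0.0) for e in entries) / total if total else 0.0
    # ISO 8601 timestamps in one format sort chronologically as plain strings.
    latest = max(entries, key=lambda e: e.get("created_at", ""), default=None)
    return {
        "total_entries": total,
        "real_count": real,
        "fake_count": fake,
        "avg_confidence": avg,
        "latest_entry": latest,
    }
```

An empty list yields zero counts and `latest_entry` set to `None`, which matches the "no entries saved yet" branch in the script.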
rag_news_manager.py ADDED
@@ -0,0 +1,432 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Enhanced RAG News Manager for Google Drive
4
+ Saves high-confidence news (95%+ from Gemini) to Google Drive for RAG purposes
5
+ """
6
+
7
+ import json
8
+ import os
9
+ import hashlib
10
+ from datetime import datetime
11
+ from google.oauth2.credentials import Credentials
12
+ from google_auth_oauthlib.flow import InstalledAppFlow
13
+ from google.auth.transport.requests import Request
14
+ from googleapiclient.discovery import build
15
+ from googleapiclient.http import MediaIoBaseDownload, MediaIoBaseUpload
16
+ import io
17
+
18
+ # Configuration
19
+ SCOPES = ['https://www.googleapis.com/auth/drive.file']
20
+ RAG_FOLDER_NAME = "Vietnamese_Fake_News_RAG"
21
+ RAG_FILE_NAME = "high_confidence_news.json"
22
+ CONFIDENCE_THRESHOLD = 0.95 # 95% threshold
23
+
24
+ class RAGNewsManager:
25
+ def __init__(self):
26
+ self.service = None
27
+ self.rag_folder_id = None
28
+ self.rag_file_id = None
29
+ self.credentials_file = 'credentials.json'
30
+ self.token_file = 'token.json'
31
+
32
+ def authenticate(self):
33
+ """Authenticate with Google Drive API"""
34
+ try:
35
+ creds = None
36
+
37
+ # Check if running on Hugging Face Spaces
38
+ is_hf_space = os.getenv('SPACE_ID') is not None
39
+
40
+ if is_hf_space:
41
+ # For Hugging Face Spaces, use environment variables
42
+ client_id = os.getenv('GOOGLE_CLIENT_ID')
43
+ client_secret = os.getenv('GOOGLE_CLIENT_SECRET')
44
+ refresh_token = os.getenv('GOOGLE_REFRESH_TOKEN')
45
+
46
+ if client_id and client_secret and refresh_token:
47
+ creds = Credentials.from_authorized_user_info({
48
+ 'client_id': client_id,
49
+ 'client_secret': client_secret,
50
+ 'refresh_token': refresh_token,
51
+ 'token_uri': 'https://oauth2.googleapis.com/token'
52
+ }, SCOPES)
53
+ else:
54
+ print("⚠️ Google Drive credentials not found in Hugging Face secrets")
55
+ return False
56
+ else:
57
+ # For local development, use files
58
+ if os.path.exists(self.token_file):
59
+ creds = Credentials.from_authorized_user_file(self.token_file, SCOPES)
60
+
61
+ # If no valid credentials, request authorization
62
+ if not creds or not creds.valid:
63
+ if creds and creds.expired and creds.refresh_token:
64
+ creds.refresh(Request())
65
+ else:
66
+ if os.path.exists(self.credentials_file):
67
+ flow = InstalledAppFlow.from_client_secrets_file(
68
+ self.credentials_file, SCOPES)
69
+ creds = flow.run_local_server(port=0)
70
+ else:
71
+ print("⚠️ credentials.json not found for local development")
72
+ return False
73
+
74
+ # Save credentials for next run
75
+ with open(self.token_file, 'w') as token:
76
+ token.write(creds.to_json())
77
+
78
+ self.service = build('drive', 'v3', credentials=creds)
79
+ print("βœ… Google Drive authentication successful!")
80
+ return True
81
+
82
+ except Exception as e:
83
+ print(f"❌ Google Drive authentication failed: {e}")
84
+ return False
85
+
86
+ def setup_rag_folder(self):
87
+ """Create or find the RAG folder in Google Drive"""
88
+ try:
89
+ # Check if folder already exists
90
+ results = self.service.files().list(
91
+ q=f"name='{RAG_FOLDER_NAME}' and mimeType='application/vnd.google-apps.folder'",
92
+ fields="files(id, name)"
93
+ ).execute()
94
+
95
+ folders = results.get('files', [])
96
+
97
+ if folders:
98
+ self.rag_folder_id = folders[0]['id']
99
+ print(f"✅ Found existing RAG folder: {RAG_FOLDER_NAME}")
100
+ else:
101
+ # Create new folder
102
+ folder_metadata = {
103
+ 'name': RAG_FOLDER_NAME,
104
+ 'mimeType': 'application/vnd.google-apps.folder'
105
+ }
106
+
107
+ folder = self.service.files().create(
108
+ body=folder_metadata,
109
+ fields='id'
110
+ ).execute()
111
+
112
+ self.rag_folder_id = folder.get('id')
113
+ print(f"✅ Created new RAG folder: {RAG_FOLDER_NAME}")
114
+
115
+ return True
116
+
117
+ except Exception as e:
118
+ print(f"❌ Error setting up RAG folder: {e}")
119
+ return False
120
+
121
+ def setup_rag_file(self):
122
+ """Create or find the RAG data file"""
123
+ try:
124
+ # Check if file already exists
125
+ results = self.service.files().list(
126
+ q=f"name='{RAG_FILE_NAME}' and '{self.rag_folder_id}' in parents",
127
+ fields="files(id, name)"
128
+ ).execute()
129
+
130
+ files = results.get('files', [])
131
+
132
+ if files:
133
+ self.rag_file_id = files[0]['id']
134
+ print(f"✅ Found existing RAG file: {RAG_FILE_NAME}")
135
+ else:
136
+ # Create new file with empty data
137
+ initial_data = {
138
+ "metadata": {
139
+ "created_at": datetime.now().isoformat(),
140
+ "description": "High-confidence Vietnamese fake news for RAG",
141
+ "threshold": CONFIDENCE_THRESHOLD,
142
+ "total_entries": 0
143
+ },
144
+ "news_entries": []
145
+ }
146
+
147
+ file_metadata = {
148
+ 'name': RAG_FILE_NAME,
149
+ 'parents': [self.rag_folder_id]
150
+ }
151
+
152
+ media = MediaIoBaseUpload(
153
+ io.BytesIO(json.dumps(initial_data, ensure_ascii=False, indent=2).encode('utf-8')),
154
+ mimetype='application/json'
155
+ )
156
+
157
+ file = self.service.files().create(
158
+ body=file_metadata,
159
+ media_body=media,
160
+ fields='id'
161
+ ).execute()
162
+
163
+ self.rag_file_id = file.get('id')
164
+ print(f"✅ Created new RAG file: {RAG_FILE_NAME}")
165
+
166
+ return True
167
+
168
+ except Exception as e:
169
+ print(f"❌ Error setting up RAG file: {e}")
170
+ return False
171
+
172
+ def load_rag_data(self):
173
+ """Load existing RAG data from Google Drive"""
174
+ try:
175
+ if not self.rag_file_id:
176
+ return {"metadata": {"total_entries": 0}, "news_entries": []}
177
+
178
+ request = self.service.files().get_media(fileId=self.rag_file_id)
179
+ file_content = io.BytesIO()
180
+ downloader = MediaIoBaseDownload(file_content, request)
181
+
182
+ done = False
183
+ while done is False:
184
+ status, done = downloader.next_chunk()
185
+
186
+ file_content.seek(0)
187
+ data = json.loads(file_content.read().decode('utf-8'))
188
+
189
+ print(f"📚 Loaded {data.get('metadata', {}).get('total_entries', 0)} entries from RAG file")
190
+ return data
191
+
192
+ except Exception as e:
193
+ print(f"❌ Error loading RAG data: {e}")
194
+ return {"metadata": {"total_entries": 0}, "news_entries": []}
195
+
196
+ def save_rag_data(self, data):
197
+ """Save RAG data to Google Drive"""
198
+ try:
199
+ if not self.rag_file_id:
200
+ return False
201
+
202
+ # Update metadata
203
+ data['metadata']['last_updated'] = datetime.now().isoformat()
204
+ data['metadata']['total_entries'] = len(data['news_entries'])
205
+
206
+ # Convert to JSON
207
+ json_data = json.dumps(data, ensure_ascii=False, indent=2)
208
+
209
+ media = MediaIoBaseUpload(
210
+ io.BytesIO(json_data.encode('utf-8')),
211
+ mimetype='application/json'
212
+ )
213
+
214
+ # Update the file
215
+ self.service.files().update(
216
+ fileId=self.rag_file_id,
217
+ media_body=media
218
+ ).execute()
219
+
220
+ print(f"✅ Saved {len(data['news_entries'])} entries to RAG file")
221
+ return True
222
+
223
+ except Exception as e:
224
+ print(f"❌ Error saving RAG data: {e}")
225
+ return False
226
+
227
+ def add_high_confidence_news(self, news_text, gemini_analysis, gemini_confidence,
228
+ prediction, search_results=None, distilbert_confidence=None):
229
+ """Add high-confidence news to RAG system"""
230
+ try:
231
+ # Check confidence threshold
232
+ if gemini_confidence < CONFIDENCE_THRESHOLD:
233
+ print(f"⚠️ Confidence {gemini_confidence:.1%} below threshold {CONFIDENCE_THRESHOLD:.1%}")
234
+ return False
235
+
236
+ # Create content hash for deduplication
237
+ content_hash = hashlib.md5(news_text.encode('utf-8')).hexdigest()
238
+
239
+ # Load existing data
240
+ data = self.load_rag_data()
241
+
242
+ # Check if entry already exists
243
+ for entry in data['news_entries']:
244
+ if entry.get('content_hash') == content_hash:
245
+ print(f"⚠️ News already exists in RAG (hash: {content_hash[:8]}...)")
246
+ return False
247
+
248
+ # Create new entry
249
+ new_entry = {
250
+ 'id': len(data['news_entries']) + 1,
251
+ 'content_hash': content_hash,
252
+ 'news_text': news_text,
253
+ 'prediction': prediction,
254
+ 'gemini_confidence': gemini_confidence,
255
+ 'gemini_analysis': gemini_analysis,
256
+ 'distilbert_confidence': distilbert_confidence,
257
+ 'search_results': search_results or [],
258
+ 'created_at': datetime.now().isoformat(),
259
+ 'source': 'user_input',
260
+ 'verified': True # High confidence means verified
261
+ }
262
+
263
+ # Add to data
264
+ data['news_entries'].append(new_entry)
265
+
266
+ # Save to Google Drive
267
+ success = self.save_rag_data(data)
268
+
269
+ if success:
270
+ print(f"✅ Added high-confidence news to RAG:")
271
+ print(f" 📰 News: {news_text[:100]}...")
272
+ print(f" 🎯 Prediction: {prediction}")
273
+ print(f" 📊 Confidence: {gemini_confidence:.1%}")
274
+ print(f" 🔗 Hash: {content_hash[:8]}...")
275
+ return True
276
+ else:
277
+ return False
278
+
279
+ except Exception as e:
280
+ print(f"❌ Error adding news to RAG: {e}")
281
+ return False
282
+
283
+ def search_rag_news(self, query_text, limit=5):
284
+ """Search RAG news for similar entries"""
285
+ try:
286
+ data = self.load_rag_data()
287
+ if not data['news_entries']:
288
+ return []
289
+
290
+ results = []
291
+ query_lower = query_text.lower()
292
+
293
+ for entry in data['news_entries']:
294
+ # Simple text similarity search
295
+ if (query_lower in entry.get('news_text', '').lower() or
296
+ query_lower in entry.get('gemini_analysis', '').lower()):
297
+
298
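+ results.append({**entry,  # keep every stored field (gemini_confidence, content_hash, ...) so format_news_entry can render search results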
299
+ 'news_text': entry['news_text'],
300
+ 'prediction': entry['prediction'],
301
+ 'confidence': entry['gemini_confidence'],
302
+ 'analysis': entry['gemini_analysis'],
303
+ 'created_at': entry['created_at'],
304
+ 'id': entry['id']
305
+ })
306
+
307
+ # Sort by confidence and creation date
308
+ results.sort(key=lambda x: (x['confidence'], x['created_at']), reverse=True)
309
+ results = results[:limit]
310
+
311
+ if results:
312
+ print(f"🔍 Found {len(results)} similar entries in RAG")
313
+
314
+ return results
315
+
316
+ except Exception as e:
317
+ print(f"❌ Error searching RAG news: {e}")
318
+ return []
319
+
320
+ def get_rag_statistics(self):
321
+ """Get statistics about RAG data"""
322
+ try:
323
+ data = self.load_rag_data()
324
+ entries = data['news_entries']
325
+
326
+ if not entries:
327
+ return {
328
+ 'total_entries': 0,
329
+ 'real_count': 0,
330
+ 'fake_count': 0,
331
+ 'avg_confidence': 0,
332
+ 'latest_entry': None,
333
+ 'folder_id': self.rag_folder_id,
334
+ 'file_id': self.rag_file_id
335
+ }
336
+
337
+ real_count = sum(1 for entry in entries if entry['prediction'] == 'REAL')
338
+ fake_count = sum(1 for entry in entries if entry['prediction'] == 'FAKE')
339
+ avg_confidence = sum(entry['gemini_confidence'] for entry in entries) / len(entries)
340
+
341
+ # Get latest entry
342
+ latest_entry = max(entries, key=lambda x: x['created_at']) if entries else None
343
+
344
+ stats = {
345
+ 'total_entries': len(entries),
346
+ 'real_count': real_count,
347
+ 'fake_count': fake_count,
348
+ 'avg_confidence': avg_confidence,
349
+ 'latest_entry': latest_entry,
350
+ 'folder_id': self.rag_folder_id,
351
+ 'file_id': self.rag_file_id
352
+ }
353
+
354
+ return stats
355
+
356
+ except Exception as e:
357
+ print(f"❌ Error getting RAG statistics: {e}")
358
+ return None
359
+
360
+ def initialize(self):
361
+ """Initialize the RAG system"""
362
+ print("🚀 Initializing RAG News Manager...")
363
+
364
+ if not self.authenticate():
365
+ return False
366
+
367
+ if not self.setup_rag_folder():
368
+ return False
369
+
370
+ if not self.setup_rag_file():
371
+ return False
372
+
373
+ print("✅ RAG News Manager initialized successfully!")
374
+ return True
375
+
376
+ # Global instance
377
+ rag_manager = RAGNewsManager()
378
+
379
+ def initialize_rag_system():
380
+ """Initialize the RAG system"""
381
+ return rag_manager.initialize()
382
+
383
+ def add_news_to_rag(news_text, gemini_analysis, gemini_confidence, prediction,
384
+ search_results=None, distilbert_confidence=None):
385
+ """Add news to RAG system if confidence is high enough"""
386
+ return rag_manager.add_high_confidence_news(
387
+ news_text, gemini_analysis, gemini_confidence, prediction,
388
+ search_results, distilbert_confidence
389
+ )
390
+
391
+ def search_rag_for_context(query_text, limit=3):
392
+ """Search RAG for context to use in analysis"""
393
+ return rag_manager.search_rag_news(query_text, limit)
394
+
395
+ def get_rag_stats():
396
+ """Get RAG system statistics"""
397
+ return rag_manager.get_rag_statistics()
398
+
399
+ if __name__ == "__main__":
400
+ # Test the RAG system
401
+ print("Testing RAG News Manager...")
402
+
403
+ if initialize_rag_system():
404
+ # Test adding a news entry
405
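+ test_news = "Argentina vô địch World Cup 2022 là sự thật"  # "Argentina winning the 2022 World Cup is true"
406
+ test_analysis = "1. KẾT LUẬN: THẬT\n2. ĐỘ TIN CẬY: THẬT: 98% / GIẢ: 2%"  # "1. CONCLUSION: REAL / 2. CONFIDENCE: REAL 98% / FAKE 2%"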
407
+ test_confidence = 0.98
408
+
409
+ success = add_news_to_rag(
410
+ news_text=test_news,
411
+ gemini_analysis=test_analysis,
412
+ gemini_confidence=test_confidence,
413
+ prediction="REAL"
414
+ )
415
+
416
+ if success:
417
+ print("✅ Test news added successfully!")
418
+
419
+ # Get statistics
420
+ stats = get_rag_stats()
421
+ if stats:
422
+ print(f"📊 RAG Statistics:")
423
+ print(f" Total entries: {stats['total_entries']}")
424
+ print(f" Real news: {stats['real_count']}")
425
+ print(f" Fake news: {stats['fake_count']}")
426
+ print(f" Average confidence: {stats['avg_confidence']:.1%}")
427
+ print(f" Google Drive folder ID: {stats['folder_id']}")
428
+ print(f" Google Drive file ID: {stats['file_id']}")
429
+ else:
430
+ print("❌ Failed to add test news")
431
+ else:
432
+ print("❌ Failed to initialize RAG system")
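The threshold check and hash-based deduplication in `add_high_confidence_news` can be tried without Google Drive. The following is a minimal sketch rather than the module's own code: `should_store` and `content_hash` are hypothetical helpers, the 0.95 value is assumed from the guide's "95%+" wording (the real `CONFIDENCE_THRESHOLD` is defined outside this excerpt), and the whitespace normalisation before hashing is an addition that the original does not perform.

```python
import hashlib

# Assumed value: the setup guide speaks of "95%+ confidence"; the real
# CONFIDENCE_THRESHOLD constant is defined outside this excerpt.
ASSUMED_THRESHOLD = 0.95


def content_hash(news_text: str) -> str:
    """MD5 of the text, as in add_high_confidence_news, after collapsing
    runs of whitespace (the normalisation step is an addition)."""
    normalised = " ".join(news_text.split())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def should_store(news_text, confidence, entries, threshold=ASSUMED_THRESHOLD):
    """Hypothetical helper: True only for confident text that is not stored yet."""
    if confidence < threshold:
        return False
    digest = content_hash(news_text)
    return all(entry.get("content_hash") != digest for entry in entries)


# One entry already stored; the first candidate differs only in spacing.
entries = [{"content_hash": content_hash("Argentina vô địch World Cup 2022")}]
print(should_store("Argentina  vô địch World Cup 2022", 0.98, entries))
print(should_store("Một tin khác", 0.98, entries))
print(should_store("Một tin khác", 0.90, entries))
```

Keeping the decision in a pure function like this makes it testable without a network round trip; the Drive upload would then run only when the function returns `True`.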
setup_google_drive_rag.py ADDED
@@ -0,0 +1,199 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Setup script for Google Drive RAG system
4
+ This script helps you set up Google Drive authentication for the RAG news manager
5
+ """
6
+
7
+ import os
8
+ import json
9
+ from google.oauth2.credentials import Credentials
10
+ from google_auth_oauthlib.flow import InstalledAppFlow
11
+ from google.auth.transport.requests import Request
12
+
13
+ # Configuration
14
+ SCOPES = ['https://www.googleapis.com/auth/drive.file']
15
+ CREDENTIALS_FILE = 'credentials.json'
16
+ TOKEN_FILE = 'token.json'
17
+
18
+ def setup_google_drive_credentials():
19
+ """Set up Google Drive credentials for local development"""
20
+ print("🔧 Setting up Google Drive credentials for RAG system...")
21
+ print("=" * 60)
22
+
23
+ # Check if credentials file exists
24
+ if not os.path.exists(CREDENTIALS_FILE):
25
+ print(f"❌ {CREDENTIALS_FILE} not found!")
26
+ print("\n📋 To get Google Drive credentials:")
27
+ print("1. Go to Google Cloud Console: https://console.cloud.google.com/")
28
+ print("2. Create a new project or select existing one")
29
+ print("3. Enable Google Drive API")
30
+ print("4. Go to 'Credentials' → 'Create Credentials' → 'OAuth 2.0 Client IDs'")
31
+ print("5. Choose 'Desktop application'")
32
+ print("6. Download the JSON file and rename it to 'credentials.json'")
33
+ print("7. Place it in this directory")
34
+ return False
35
+
36
+ print(f"✅ Found {CREDENTIALS_FILE}")
37
+
38
+ # Load credentials
39
+ try:
40
+ with open(CREDENTIALS_FILE, 'r') as f:
41
+ creds_data = json.load(f)
42
+
43
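+ print("✅ Credentials file is valid JSON")
44
+ print(f" Client ID: {(creds_data.get('installed') or creds_data.get('web') or creds_data).get('client_id', 'N/A')[:20]}...")  # client fields are nested under 'installed' (desktop) or 'web'
45
+ print(f" Project ID: {(creds_data.get('installed') or creds_data.get('web') or creds_data).get('project_id', 'N/A')}")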
46
+
47
+ except json.JSONDecodeError:
48
+ print("❌ Invalid JSON in credentials file")
49
+ return False
50
+ except Exception as e:
51
+ print(f"❌ Error reading credentials: {e}")
52
+ return False
53
+
54
+ # Authenticate
55
+ creds = None
56
+
57
+ # Check if token file exists
58
+ if os.path.exists(TOKEN_FILE):
59
+ print(f"✅ Found existing {TOKEN_FILE}")
60
+ try:
61
+ creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
62
+ print("✅ Loaded existing credentials")
63
+ except Exception as e:
64
+ print(f"⚠️ Error loading existing credentials: {e}")
65
+ creds = None
66
+
67
+ # If no valid credentials, get new ones
68
+ if not creds or not creds.valid:
69
+ if creds and creds.expired and creds.refresh_token:
70
+ print("🔄 Refreshing expired credentials...")
71
+ try:
72
+ creds.refresh(Request())
73
+ print("✅ Credentials refreshed successfully")
74
+ except Exception as e:
75
+ print(f"❌ Error refreshing credentials: {e}")
76
+ creds = None
77
+
78
+ if not creds:
79
+ print("🔐 Starting OAuth flow...")
80
+ print(" A browser window will open for authentication")
81
+ print(" Please log in with your Google account and grant permissions")
82
+
83
+ try:
84
+ flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
85
+ creds = flow.run_local_server(port=0)
86
+ print("✅ Authentication successful!")
87
+ except Exception as e:
88
+ print(f"❌ Authentication failed: {e}")
89
+ return False
90
+
91
+ # Save credentials for next time
92
+ try:
93
+ with open(TOKEN_FILE, 'w') as token:
94
+ token.write(creds.to_json())
95
+ print(f"✅ Credentials saved to {TOKEN_FILE}")
96
+ except Exception as e:
97
+ print(f"⚠️ Warning: Could not save credentials: {e}")
98
+
99
+ # Test the credentials
100
+ print("\n🧪 Testing Google Drive access...")
101
+ try:
102
+ from googleapiclient.discovery import build
103
+ service = build('drive', 'v3', credentials=creds)
104
+
105
+ # List files to test access
106
+ results = service.files().list(pageSize=1, fields="files(id, name)").execute()
107
+ files = results.get('files', [])
108
+
109
+ print("✅ Google Drive access successful!")
110
+ print(f" Listed {len(files)} file(s) visible to this app (the drive.file scope only exposes files the app created or opened)")
111
+
112
+ if files:
113
+ print(f" Sample file: {files[0]['name']}")
114
+
115
+ return True
116
+
117
+ except Exception as e:
118
+ print(f"❌ Google Drive access test failed: {e}")
119
+ return False
120
+
121
+ def test_rag_system():
122
+ """Test the RAG system"""
123
+ print("\n🧪 Testing RAG News Manager...")
124
+ print("=" * 40)
125
+
126
+ try:
127
+ from rag_news_manager import initialize_rag_system, get_rag_stats
128
+
129
+ if initialize_rag_system():
130
+ print("✅ RAG system initialized successfully!")
131
+
132
+ # Get statistics
133
+ stats = get_rag_stats()
134
+ if stats:
135
+ print(f"📊 Current RAG Statistics:")
136
+ print(f" Total entries: {stats['total_entries']}")
137
+ print(f" Real news: {stats['real_count']}")
138
+ print(f" Fake news: {stats['fake_count']}")
139
+ print(f" Average confidence: {stats['avg_confidence']:.1%}")
140
+ print(f" Google Drive folder: {stats['folder_id']}")
141
+ print(f" Google Drive file: {stats['file_id']}")
142
+
143
+ # Provide Google Drive links
144
+ if stats['folder_id']:
145
+ folder_url = f"https://drive.google.com/drive/folders/{stats['folder_id']}"
146
+ print(f"\n🔗 Google Drive RAG Folder: {folder_url}")
147
+
148
+ if stats['file_id']:
149
+ file_url = f"https://drive.google.com/file/d/{stats['file_id']}/view"
150
+ print(f"🔗 Google Drive RAG File: {file_url}")
151
+ else:
152
+ print("⚠️ Could not get RAG statistics")
153
+ else:
154
+ print("❌ RAG system initialization failed")
155
+ return False
156
+
157
+ except ImportError as e:
158
+ print(f"❌ Could not import RAG system: {e}")
159
+ return False
160
+ except Exception as e:
161
+ print(f"❌ RAG system test failed: {e}")
162
+ return False
163
+
164
+ return True
165
+
166
+ def main():
167
+ """Main setup function"""
168
+ print("🚀 Google Drive RAG System Setup")
169
+ print("=" * 50)
170
+ print("This script will help you set up Google Drive integration")
171
+ print("for saving high-confidence news for RAG purposes.")
172
+ print()
173
+
174
+ # Step 1: Setup credentials
175
+ if not setup_google_drive_credentials():
176
+ print("\n❌ Setup failed at credentials step")
177
+ return False
178
+
179
+ # Step 2: Test RAG system
180
+ if not test_rag_system():
181
+ print("\n❌ Setup failed at RAG system test")
182
+ return False
183
+
184
+ print("\n🎉 Setup completed successfully!")
185
+ print("=" * 50)
186
+ print("✅ Google Drive credentials configured")
187
+ print("✅ RAG system initialized")
188
+ print("✅ Ready to save high-confidence news!")
189
+ print()
190
+ print("📋 Next steps:")
191
+ print("1. Your app will now automatically save news with 95%+ confidence")
192
+ print("2. Check your Google Drive for the 'Vietnamese_Fake_News_RAG' folder")
193
+ print("3. View saved news in the 'high_confidence_news.json' file")
194
+ print("4. The system will use this data for better RAG analysis")
195
+
196
+ return True
197
+
198
+ if __name__ == "__main__":
199
+ main()
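The setup script prints a client ID and project ID read from `credentials.json`. In the client-secrets files Google issues, those fields normally sit under a top-level `installed` key (desktop clients) or `web` key, not at the root. The sketch below shows one way to read them under that assumption; `summarise_client_secrets` is a hypothetical helper and the sample values are made up.

```python
import json


def summarise_client_secrets(raw_json: str) -> dict:
    """Hypothetical helper: pull display fields out of a client-secrets document."""
    data = json.loads(raw_json)
    # Desktop clients nest their fields under "installed", web clients under
    # "web"; fall back to the root for already-flattened documents.
    section = data.get("installed") or data.get("web") or data
    client_id = section.get("client_id", "N/A")
    return {
        "client_id_prefix": client_id[:20],
        "project_id": section.get("project_id", "N/A"),
    }


sample = json.dumps({
    "installed": {
        "client_id": "1234567890-example.apps.googleusercontent.com",  # made-up value
        "project_id": "demo-project",  # made-up value
    }
})
print(summarise_client_secrets(sample))
```

Only a prefix of the client ID is returned so that the full identifier does not end up in console logs, mirroring the `[:20]` slice in the script above.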
view_rag_news.py ADDED
@@ -0,0 +1,283 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ RAG News Viewer
4
+ View and manage high-confidence news saved in Google Drive
5
+ """
6
+
7
+ import json
8
+ import os
9
+ from datetime import datetime
10
+ from rag_news_manager import initialize_rag_system, get_rag_stats, rag_manager
11
+
12
+ def format_news_entry(entry, index):
13
+ """Format a news entry for display"""
14
+ created_date = datetime.fromisoformat(entry['created_at'].replace('Z', '+00:00'))
15
+ formatted_date = created_date.strftime("%Y-%m-%d %H:%M:%S")
16
+
17
+ prediction_emoji = "✅" if entry['prediction'] == 'REAL' else "❌"
18
+ confidence_color = "🟢" if entry['gemini_confidence'] > 0.95 else "🟡"
19
+
20
+ print(f"\n{'='*80}")
21
+ print(f"📰 ENTRY #{index} - {prediction_emoji} {entry['prediction']} {confidence_color}")
22
+ print(f"{'='*80}")
23
+ print(f"🆔 ID: {entry['id']}")
24
+ print(f"📅 Created: {formatted_date}")
25
+ print(f"📊 Confidence: {entry['gemini_confidence']:.1%}")
26
+ print(f"🔗 Hash: {entry['content_hash'][:12]}...")
27
+ print(f"📝 Source: {entry.get('source', 'Unknown')}")
28
+ print(f"✅ Verified: {entry.get('verified', False)}")
29
+
30
+ if entry.get('distilbert_confidence'):
31
+ print(f"🤖 DistilBERT: {entry['distilbert_confidence']:.1%}")
32
+
33
+ print(f"\n📰 NEWS TEXT:")
34
+ print(f"{'-'*40}")
35
+ print(entry['news_text'])
36
+
37
+ print(f"\n🧠 GEMINI ANALYSIS:")
38
+ print(f"{'-'*40}")
39
+ print(entry['gemini_analysis'])
40
+
41
+ if entry.get('search_results'):
42
+ print(f"\n🔍 SEARCH RESULTS ({len(entry['search_results'])} sources):")
43
+ print(f"{'-'*40}")
44
+ for i, result in enumerate(entry['search_results'][:3], 1):
45
+ print(f"{i}. {result.get('title', 'No title')}")
46
+ print(f" {result.get('snippet', 'No snippet')[:100]}...")
47
+ print(f" 🔗 {result.get('link', 'No link')}")
48
+
49
+ return True
50
+
51
+ def view_all_news():
52
+ """View all saved news entries"""
53
+ print("📚 VIEWING ALL RAG NEWS ENTRIES")
54
+ print("=" * 60)
55
+
56
+ try:
57
+ data = rag_manager.load_rag_data()
58
+ entries = data.get('news_entries', [])
59
+
60
+ if not entries:
61
+ print("📭 No news entries found in RAG system")
62
+ return
63
+
64
+ print(f"📊 Found {len(entries)} news entries")
65
+ print(f"📅 Last updated: {data.get('metadata', {}).get('last_updated', 'Unknown')}")
66
+
67
+ # Sort by creation date (newest first)
68
+ entries.sort(key=lambda x: x['created_at'], reverse=True)
69
+
70
+ for i, entry in enumerate(entries, 1):
71
+ format_news_entry(entry, i)
72
+
73
+ if i < len(entries):
74
+ input("\n⏸️ Press Enter to view next entry (or Ctrl+C to exit)...")
75
+
76
+ print(f"\n✅ Displayed all {len(entries)} entries")
77
+
78
+ except KeyboardInterrupt:
79
+ print("\n\n👋 Viewing interrupted by user")
80
+ except Exception as e:
81
+ print(f"❌ Error viewing news: {e}")
82
+
83
+ def view_recent_news(limit=5):
84
+ """View recent news entries"""
85
+ print(f"📰 VIEWING {limit} MOST RECENT NEWS ENTRIES")
86
+ print("=" * 50)
87
+
88
+ try:
89
+ data = rag_manager.load_rag_data()
90
+ entries = data.get('news_entries', [])
91
+
92
+ if not entries:
93
+ print("📭 No news entries found in RAG system")
94
+ return
95
+
96
+ # Sort by creation date (newest first)
97
+ entries.sort(key=lambda x: x['created_at'], reverse=True)
98
+ recent_entries = entries[:limit]
99
+
100
+ print(f"📊 Showing {len(recent_entries)} most recent entries")
101
+
102
+ for i, entry in enumerate(recent_entries, 1):
103
+ format_news_entry(entry, i)
104
+
105
+ if i < len(recent_entries):
106
+ input("\n⏸️ Press Enter to view next entry (or Ctrl+C to exit)...")
107
+
108
+ except KeyboardInterrupt:
109
+ print("\n\n👋 Viewing interrupted by user")
110
+ except Exception as e:
111
+ print(f"❌ Error viewing recent news: {e}")
112
+
113
+ def view_by_prediction(prediction):
114
+ """View news entries by prediction type"""
115
+ print(f"🔍 VIEWING {prediction} NEWS ENTRIES")
116
+ print("=" * 50)
117
+
118
+ try:
119
+ data = rag_manager.load_rag_data()
120
+ entries = data.get('news_entries', [])
121
+
122
+ # Filter by prediction
123
+ filtered_entries = [entry for entry in entries if entry['prediction'] == prediction]
124
+
125
+ if not filtered_entries:
126
+ print(f"📭 No {prediction} news entries found")
127
+ return
128
+
129
+ print(f"📊 Found {len(filtered_entries)} {prediction} entries")
130
+
131
+ # Sort by confidence (highest first)
132
+ filtered_entries.sort(key=lambda x: x['gemini_confidence'], reverse=True)
133
+
134
+ for i, entry in enumerate(filtered_entries, 1):
135
+ format_news_entry(entry, i)
136
+
137
+ if i < len(filtered_entries):
138
+ input("\n⏸️ Press Enter to view next entry (or Ctrl+C to exit)...")
139
+
140
+ except KeyboardInterrupt:
141
+ print("\n\n👋 Viewing interrupted by user")
142
+ except Exception as e:
143
+ print(f"❌ Error viewing {prediction} news: {e}")
144
+
145
+ def search_news(query):
146
+ """Search news entries"""
147
+ print(f"🔍 SEARCHING FOR: '{query}'")
148
+ print("=" * 50)
149
+
150
+ try:
151
+ results = rag_manager.search_rag_news(query, limit=10)
152
+
153
+ if not results:
154
+ print("📭 No matching news entries found")
155
+ return
156
+
157
+ print(f"📊 Found {len(results)} matching entries")
158
+
159
+ for i, entry in enumerate(results, 1):
160
+ format_news_entry(entry, i)
161
+
162
+ if i < len(results):
163
+ input("\n⏸️ Press Enter to view next entry (or Ctrl+C to exit)...")
164
+
165
+ except KeyboardInterrupt:
166
+ print("\n\n👋 Search interrupted by user")
167
+ except Exception as e:
168
+ print(f"❌ Error searching news: {e}")
169
+
170
+ def show_statistics():
171
+ """Show RAG system statistics"""
172
+ print("📊 RAG SYSTEM STATISTICS")
173
+ print("=" * 40)
174
+
175
+ try:
176
+ stats = get_rag_stats()
177
+
178
+ if not stats:
179
+ print("❌ Could not retrieve statistics")
180
+ return
181
+
182
+ print(f"📈 Total Entries: {stats['total_entries']}")
183
+ print(f"✅ Real News: {stats['real_count']}")
184
+ print(f"❌ Fake News: {stats['fake_count']}")
185
+ print(f"📊 Average Confidence: {stats['avg_confidence']:.1%}")
186
+
187
+ if stats['latest_entry']:
188
+ latest = stats['latest_entry']
189
+ latest_date = datetime.fromisoformat(latest['created_at'].replace('Z', '+00:00'))
190
+ print(f"🕒 Latest Entry: {latest_date.strftime('%Y-%m-%d %H:%M:%S')}")
191
+ print(f" 📰 {latest['news_text'][:80]}...")
192
+ print(f" 🎯 {latest['prediction']} ({latest['gemini_confidence']:.1%})")
193
+
194
+ print(f"\n🔗 Google Drive Links:")
195
+ if stats['folder_id']:
196
+ folder_url = f"https://drive.google.com/drive/folders/{stats['folder_id']}"
197
+ print(f" 📁 RAG Folder: {folder_url}")
198
+
199
+ if stats['file_id']:
200
+ file_url = f"https://drive.google.com/file/d/{stats['file_id']}/view"
201
+ print(f" 📄 RAG File: {file_url}")
202
+
203
+ except Exception as e:
204
+ print(f"❌ Error getting statistics: {e}")
205
+
206
+ def main_menu():
207
+ """Main menu for the viewer"""
208
+ while True:
209
+ print("\n" + "="*60)
210
+ print("🔍 RAG NEWS VIEWER - Vietnamese Fake News Detection")
211
+ print("="*60)
212
+ print("1. 📊 View Statistics")
213
+ print("2. 📰 View Recent News (5 entries)")
214
+ print("3. 📚 View All News")
215
+ print("4. ✅ View Real News Only")
216
+ print("5. ❌ View Fake News Only")
217
+ print("6. 🔍 Search News")
218
+ print("7. 🔗 Open Google Drive")
219
+ print("8. ❌ Exit")
220
+ print("="*60)
221
+
222
+ try:
223
+ choice = input("👉 Select option (1-8): ").strip()
224
+
225
+ if choice == '1':
226
+ show_statistics()
227
+ elif choice == '2':
228
+ view_recent_news(5)
229
+ elif choice == '3':
230
+ view_all_news()
231
+ elif choice == '4':
232
+ view_by_prediction('REAL')
233
+ elif choice == '5':
234
+ view_by_prediction('FAKE')
235
+ elif choice == '6':
236
+ query = input("🔍 Enter search query: ").strip()
237
+ if query:
238
+ search_news(query)
239
+ else:
240
+ print("❌ Please enter a search query")
241
+ elif choice == '7':
242
+ stats = get_rag_stats()
243
+ if stats and stats['folder_id']:
244
+ folder_url = f"https://drive.google.com/drive/folders/{stats['folder_id']}"
245
+ print(f"🔗 Opening Google Drive: {folder_url}")
246
+ import webbrowser
247
+ webbrowser.open(folder_url)
248
+ else:
249
+ print("❌ Google Drive folder not found")
250
+ elif choice == '8':
251
+ print("👋 Goodbye!")
252
+ break
253
+ else:
254
+ print("❌ Invalid choice. Please select 1-8.")
255
+
256
+ except KeyboardInterrupt:
257
+ print("\n\n👋 Goodbye!")
258
+ break
259
+ except Exception as e:
260
+ print(f"❌ Error: {e}")
261
+
262
+ def main():
263
+ """Main function"""
264
+ print("🚀 RAG News Viewer")
265
+ print("=" * 30)
266
+
267
+ # Initialize RAG system
268
+ print("🔧 Initializing RAG system...")
269
+ if not initialize_rag_system():
270
+ print("❌ Failed to initialize RAG system")
271
+ print("Please run setup_google_drive_rag.py first")
272
+ return
273
+
274
+ print("✅ RAG system initialized successfully!")
275
+
276
+ # Show initial statistics
277
+ show_statistics()
278
+
279
+ # Start main menu
280
+ main_menu()
281
+
282
+ if __name__ == "__main__":
283
+ main()
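The viewer's search option relies on `search_rag_news`, which matches an entry only when the whole query appears verbatim inside the news text or analysis. A word-level match is more forgiving. The sketch below is one possible alternative, not the project's method: `rank_by_token_overlap` is a hypothetical function that scores each entry by the fraction of query words it contains, and the sample entries are made up.

```python
def rank_by_token_overlap(query, entries, limit=5):
    """Hypothetical ranking: fraction of query words found in each entry."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return []
    scored = []
    for entry in entries:
        text = f"{entry.get('news_text', '')} {entry.get('gemini_analysis', '')}"
        text_tokens = set(text.lower().split())
        score = len(query_tokens & text_tokens) / len(query_tokens)
        if score > 0:
            scored.append((score, entry))
    # Sort on the score only, so entries (dicts) are never compared directly.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]


# Made-up sample entries using the same keys as the stored RAG entries.
entries = [
    {"id": 1, "news_text": "Argentina vô địch World Cup 2022", "gemini_analysis": "THẬT"},
    {"id": 2, "news_text": "Giá vàng tăng mạnh", "gemini_analysis": "GIẢ"},
]
for hit in rank_by_token_overlap("World Cup Argentina", entries):
    print(hit["id"], hit["news_text"])
```

The query "World Cup Argentina" is not a substring of the first entry, so the verbatim check would miss it, while every query word does occur in that entry. Splitting on whitespace is a simplification; Vietnamese word segmentation would need a dedicated tokenizer.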