zhiminy committed
Commit b0e562e · 0 Parent(s)
Files changed (7)
  1. .gitattributes +35 -0
  2. .github/workflows/hf_sync.yml +35 -0
  3. .gitignore +4 -0
  4. README.md +122 -0
  5. app.py +2009 -0
  6. msr.py +795 -0
  7. requirements.txt +9 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.github/workflows/hf_sync.yml ADDED
@@ -0,0 +1,35 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   sync:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout GitHub Repository
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 0 # Fetch the entire history to avoid shallow clone issues
+
+       - name: Install Git LFS
+         run: |
+           curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
+           sudo apt-get install git-lfs
+           git lfs install
+
+       - name: Configure Git
+         run: |
+           git config --global user.name "GitHub Actions Bot"
+           git config --global user.email "[email protected]"
+
+       - name: Push to Hugging Face
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git remote add huggingface https://user:${HF_TOKEN}@huggingface.co/spaces/SWE-Arena/SWE-Issue
+           git fetch huggingface
+           git push huggingface main --force
.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.env
+ *.venv
+ *.ipynb
+ *.pyc
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ title: SWE-Issue
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ hf_oauth: true
+ pinned: false
+ short_description: Track GitHub issue statistics for SWE agents
+ ---
+
+ # SWE Agent Issue Leaderboard
+
+ SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.
+
+ A lightweight platform for tracking real-world GitHub issue statistics for software engineering agents. No benchmarks. No sandboxes. Just real issues that got resolved.
+
+ Currently, the leaderboard tracks public GitHub issues across open-source repositories where the agent has contributed.
+
+ ## Why This Exists
+
+ Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent meets real repositories, real maintainers, and real problem-solving challenges.
+
+ This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the issue get resolved? How many were actually completed? Is the agent improving over time? These are the signals that reflect genuine software engineering impact, the kind you'd see from a human contributor.
+
+ If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.
+
+ ## What We Track
+
+ The leaderboard pulls data directly from GitHub's issue history and shows you key metrics for the current year:
+
+ **Leaderboard Table**
+ - **Total Issues**: How many issues the agent has been involved with (authored, assigned, or mentioned)
+ - **Resolved Issues**: How many issues were marked as completed
+ - **Resolution Rate**: Percentage of issues that were successfully resolved (see calculation details below)
+
+ **Monthly Trends Visualization**
+ Beyond the table, we show interactive charts tracking how each agent's performance evolves month by month:
+ - Resolution rate trends (line plots)
+ - Issue volume over time (bar charts)
+
+ This helps you see which agents are improving, which are consistently strong, and how active they've been recently.
+
+ The focus on current-year performance highlights active agents and recent contributions rather than outdated historical data.
+
+ ## How It Works
+
+ Behind the scenes, we're doing a few things:
+
+ **Data Collection**
+ We search GitHub using multiple query patterns to catch all issues associated with an agent (see the sketch after this list):
+ - Issues authored by the agent (`author:agent-name`)
+ - Issues assigned to the agent (`assignee:agent-name`)
+ - Issues mentioning the agent (`mentions:agent-name`)
+
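+ For illustration, here is a minimal sketch of how these queries can run against GitHub's issue search API with Python `requests`; the agent name is a placeholder, and the real collector adds authentication, pagination, time partitioning, and rate-limit backoff:
+
+ ```python
+ import requests
+
+ AGENT = "agent-name"  # hypothetical identifier
+ patterns = [
+     f"is:issue author:{AGENT}",
+     f"is:issue assignee:{AGENT}",
+     f"is:issue mentions:{AGENT}",
+ ]
+
+ issues = {}
+ for query in patterns:
+     resp = requests.get(
+         "https://api.github.com/search/issues",
+         params={"q": query, "per_page": 100},
+         headers={"Accept": "application/vnd.github+json"},
+         timeout=30,
+     )
+     resp.raise_for_status()
+     for item in resp.json().get("items", []):
+         issues[item["id"]] = item  # deduplicate across patterns by issue ID
+
+ print(f"Found {len(issues)} unique issues")
+ ```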
+ **Regular Updates**
+ The leaderboard refreshes automatically every day at 12:00 AM UTC. You can also hit the refresh button if you want fresh data right now.
+
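+ As a rough sketch, the daily refresh can be expressed with APScheduler (the scheduling library the app uses); `refresh_leaderboard` below is a placeholder for the real refresh job:
+
+ ```python
+ from apscheduler.schedulers.background import BackgroundScheduler
+ from apscheduler.triggers.cron import CronTrigger
+
+ def refresh_leaderboard():
+     print("refreshing leaderboard data...")  # placeholder for the real job
+
+ scheduler = BackgroundScheduler(timezone="UTC")
+ # Fire every day at 00:00 UTC (12:00 AM UTC)
+ scheduler.add_job(refresh_leaderboard, CronTrigger(hour=0, minute=0))
+ scheduler.start()
+ ```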
+ **Community Submissions**
+ Anyone can submit a coding agent for the leaderboard to track. We store agent metadata in Hugging Face datasets (`SWE-Arena/swe_agents`) and the computed leaderboard data in another dataset (`SWE-Arena/issue_leaderboard`). All submissions are automatically validated through GitHub's API to ensure the account exists and has public activity.
+
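+ The existence check is essentially a lookup against GitHub's Users API; a simplified sketch of that validation (the app additionally sends a token and retries on rate limits):
+
+ ```python
+ import requests
+
+ def github_account_exists(identifier: str) -> bool:
+     """Return True if the GitHub user or bot account exists (HTTP 200)."""
+     resp = requests.get(f"https://api.github.com/users/{identifier}", timeout=30)
+     return resp.status_code == 200
+ ```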
+ ## Using the Leaderboard
+
+ ### Just Browsing?
+ Head to the Leaderboard tab, where you'll find:
+ - **Searchable table**: Search by agent name or organization
+ - **Filterable columns**: Filter by resolution rate to find top performers
+ - **Monthly charts**: Scroll down to see resolution rate trends and issue activity over time
+ - **Refresh button**: Click to get the latest numbers on demand
+
+ The charts use color-coded lines and bars so you can easily track individual agents across months.
+
+ ### Want to Add Your Agent?
+ In the Submit Agent tab, provide:
+ - **GitHub identifier*** (required): Your agent's GitHub username or bot account
+ - **Agent name*** (required): Display name for the leaderboard
+ - **Organization*** (required): Your organization or team name
+ - **Website*** (required): Link to your agent's homepage or documentation
+ - **Description** (optional): Brief explanation of what your agent does
+
+ Click Submit. We'll validate the GitHub account, fetch the issue history, and add your agent to the board; the record we store looks roughly like the sketch below. Initial data loading takes a few seconds.
+
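+ A minimal sketch of the metadata record saved per agent (stored as `{github_identifier}.json`); the `github_identifier` and `agent_name` keys match the app's code, while the remaining key names are assumed from the form labels:
+
+ ```python
+ agent_record = {
+     "github_identifier": "my-agent[bot]",        # required: GitHub username or bot account
+     "agent_name": "My Agent",                    # required: display name on the leaderboard
+     "organization": "Example Org",               # required (key name assumed)
+     "website": "https://example.com/my-agent",   # required (key name assumed)
+     "description": "Fixes bugs and triages issues automatically.",  # optional
+ }
+ ```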
+ ## Understanding the Metrics
+
+ **Total Issues vs Resolved Issues**
+ Not every issue an agent touches will be resolved. Sometimes issues are opened for discussion, tracking, or exploration. But a consistently low resolution rate might signal that an agent isn't effectively solving problems.
+
+ **Resolution Rate**
+ This is the percentage of issues that were successfully completed, calculated as:
+
+ Resolution Rate = resolved issues ÷ total issues × 100
+
+ **Important**: An issue is considered "resolved" when its `state_reason` is marked as `completed` on GitHub. This indicates the issue was closed because the problem was solved or the requested feature was implemented, not just closed without resolution.
+
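+ Concretely, the computation over stored issue metadata looks roughly like this, mirroring how the app counts issues whose `state_reason` is `completed`:
+
+ ```python
+ def resolution_rate(issues: list[dict]) -> float:
+     """Percentage of issues closed as completed."""
+     total = len(issues)
+     resolved = sum(1 for issue in issues if issue.get("state_reason") == "completed")
+     return round(resolved / total * 100, 2) if total else 0.0
+
+ example = [
+     {"state": "closed", "state_reason": "completed"},
+     {"state": "closed", "state_reason": "not_planned"},  # closed, but not counted as resolved
+     {"state": "open", "state_reason": None},
+ ]
+ print(resolution_rate(example))  # 33.33
+ ```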
+ Higher resolution rates are generally better, but context matters. An agent with 100 issues and a 20% resolution rate is different from one with 10 issues at 80%. Look at both the rate and the volume.
+
+ **Monthly Trends**
+ The visualization below the leaderboard table shows:
+ - **Line plots**: How resolution rates change over time for each agent
+ - **Bar charts**: How many issues each agent worked on each month
+
+ Use these charts to spot patterns (see the plotting sketch after this list):
+ - Consistent high resolution rates indicate effective problem-solving
+ - Increasing trends show agents that are learning and improving
+ - High issue volumes with good resolution rates demonstrate both productivity and effectiveness
+
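+ As a sketch of how one such trend line can be drawn with Plotly (the charting library the app uses); the month labels and rates below are made up:
+
+ ```python
+ import plotly.graph_objects as go
+
+ months = ["2025-01", "2025-02", "2025-03"]
+ rates = [42.0, 48.5, 55.1]  # illustrative resolution rates (%)
+
+ fig = go.Figure(go.Scatter(x=months, y=rates, mode="lines+markers", name="ExampleAgent"))
+ fig.update_layout(xaxis_title="Month", yaxis_title="Resolution rate (%)")
+ fig.show()
+ ```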
+ ## What's Next
+
+ We're planning to add more granular insights:
+
+ - **Repository-based analysis**: Break down performance by repository to highlight domain strengths, maintainer alignment, and project-specific resolution rates
+ - **Extended metrics**: Comment activity, response time, and issue complexity analysis
+ - **Resolution time analysis**: Track how long issues take from creation to completion
+ - **Issue type patterns**: Identify whether agents are better at bugs, features, or documentation issues
+
+ Our goal is to make the leaderboard data as transparent and reflective of real-world engineering outcomes as possible.
+
+ ## Questions or Issues?
+
+ If something breaks, you want to suggest a feature, or you're seeing weird data for your agent, [open an issue](https://github.com/SE-Arena/SWE-Issue/issues) and we'll take a look.
app.py ADDED
@@ -0,0 +1,2009 @@
1
+ import gradio as gr
2
+ from gradio_leaderboard import Leaderboard
3
+ import json
4
+ import os
5
+ import time
6
+ import requests
7
+ from datetime import datetime, timezone, timedelta
8
+ from collections import defaultdict
9
+ from huggingface_hub import HfApi, hf_hub_download
10
+ from datasets import load_dataset, Dataset
11
+ import threading
12
+ from dotenv import load_dotenv
13
+ import pandas as pd
14
+ import random
15
+ import argparse
16
+ import plotly.graph_objects as go
17
+ from plotly.subplots import make_subplots
18
+ from apscheduler.schedulers.background import BackgroundScheduler
19
+ from apscheduler.triggers.cron import CronTrigger
20
+
21
+ # Load environment variables
22
+ load_dotenv()
23
+
24
+ # Parse command-line arguments
25
+ parser = argparse.ArgumentParser(description='SWE Agent Issue Leaderboard')
26
+ parser.add_argument('--debug', '--DEBUG', action='store_true',
27
+ help='Enable debug mode (limits issue retrieval to 10 per query pattern)')
28
+ parser.add_argument('--no-debug', '--production', action='store_true',
29
+ help='Explicitly disable debug mode (force production mode)')
30
+ args = parser.parse_args()
31
+
32
+ # =============================================================================
33
+ # CONFIGURATION
34
+ # =============================================================================
35
+
36
+ # DEBUG MODE: Set to True to limit issue retrieval for testing
37
+ # When enabled, only fetches up to 10 issues per query pattern per agent
38
+ # Priority: 1) Command-line args, 2) Environment variable, 3) Default (False)
39
+ if args.no_debug:
40
+ DEBUG_MODE = False
41
+ elif args.debug:
42
+ DEBUG_MODE = True
43
+ else:
44
+ DEBUG_MODE = os.getenv('DEBUG_MODE', 'False').lower() in ('true', '1', 'yes')
45
+
46
+ # In-memory cache for debug mode (data persists during session but NOT saved to HF)
47
+ DEBUG_LEADERBOARD_CACHE = {}
48
+ DEBUG_ISSUE_METADATA_CACHE = defaultdict(list)
49
+
50
+ AGENTS_REPO = "SWE-Arena/swe_agents" # HuggingFace dataset for agent metadata
51
+ LEADERBOARD_REPO = "SWE-Arena/issue_leaderboard"
52
+ ISSUE_METADATA_REPO = "SWE-Arena/issue_metadata" # HuggingFace dataset for issue metadata
53
+
54
+ LEADERBOARD_COLUMNS = [
55
+ ("Agent Name", "string"),
56
+ ("Organization", "string"),
57
+ ("Total Issues", "number"),
58
+ ("Resolved Issues", "number"),
59
+ ("Resolved Rate (%)", "number"),
60
+ ]
61
+
62
+ # =============================================================================
63
+ # JSONL FILE OPERATIONS
64
+ # =============================================================================
65
+
66
+ def load_jsonl(filename):
67
+ """Load JSONL file and return list of dictionaries."""
68
+ if not os.path.exists(filename):
69
+ return []
70
+
71
+ data = []
72
+ with open(filename, 'r', encoding='utf-8') as f:
73
+ for line in f:
74
+ line = line.strip()
75
+ if line:
76
+ try:
77
+ entry = json.loads(line)
78
+ data.append(entry)
79
+ except json.JSONDecodeError as e:
80
+ print(f"Warning: Skipping invalid JSON line: {e}")
81
+ return data
82
+
83
+
84
+ def save_jsonl(filename, data):
85
+ """Save list of dictionaries to JSONL file."""
86
+ with open(filename, 'w', encoding='utf-8') as f:
87
+ for item in data:
88
+ f.write(json.dumps(item) + '\n')
89
+
90
+
91
+ def cache_to_dict(cache_list):
92
+ """Convert list of cache entries to dictionary by identifier."""
93
+ return {entry['github_identifier']: entry for entry in cache_list}
94
+
95
+
96
+ def dict_to_cache(cache_dict):
97
+ """Convert dictionary back to list of values."""
98
+ return list(cache_dict.values())
99
+
100
+
101
+ def normalize_date_format(date_string):
102
+ """
103
+ Convert date strings to standardized ISO 8601 format with Z suffix.
104
+ Handles both old format (2025-10-15T23:23:47.983068) and new format (2025-10-15T23:23:47Z).
105
+ """
106
+ if not date_string or date_string == 'N/A':
107
+ return 'N/A'
108
+
109
+ try:
110
+ # Parse the date string (handles both with and without microseconds)
111
+ if '.' in date_string:
112
+ # Old format with microseconds
113
+ dt = datetime.fromisoformat(date_string.replace('Z', '+00:00'))
114
+ else:
115
+ # Already in correct format or GitHub format
116
+ return date_string
117
+
118
+ # Convert to standardized format
119
+ return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
120
+ except Exception as e:
121
+ print(f"Warning: Could not parse date '{date_string}': {e}")
122
+ return date_string
123
+
124
+
125
+ # =============================================================================
126
+ # GITHUB API OPERATIONS
127
+ # =============================================================================
128
+
129
+ def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None, max_retries=10, timeout=30):
130
+ """
131
+ Perform an HTTP request with exponential backoff and jitter for GitHub API.
132
+ Retries on 403/429 (rate limits), 5xx server errors, and transient network exceptions.
133
+
134
+ Returns the final requests.Response on success or non-retryable status, or None after exhausting retries.
135
+ """
136
+ delay = 1.0
137
+ for attempt in range(max_retries):
138
+ try:
139
+ resp = requests.request(
140
+ method,
141
+ url,
142
+ headers=headers or {},
143
+ params=params,
144
+ json=json_body,
145
+ data=data,
146
+ timeout=timeout
147
+ )
148
+
149
+ status = resp.status_code
150
+
151
+ # Success
152
+ if 200 <= status < 300:
153
+ return resp
154
+
155
+ # Rate limits or server errors -> retry with backoff
156
+ if status in (403, 429) or 500 <= status < 600:
157
+ wait = None
158
+
159
+ # Prefer Retry-After when present
160
+ retry_after = resp.headers.get('Retry-After') or resp.headers.get('retry-after')
161
+ if retry_after:
162
+ try:
163
+ wait = float(retry_after)
164
+ except Exception:
165
+ wait = None
166
+
167
+ # Fallback to X-RateLimit-Reset when 403/429
168
+ if wait is None and status in (403, 429):
169
+ reset_hdr = resp.headers.get('X-RateLimit-Reset') or resp.headers.get('x-ratelimit-reset')
170
+ if reset_hdr:
171
+ try:
172
+ reset_ts = int(float(reset_hdr))
173
+ wait = max(reset_ts - time.time() + 2, 1)
174
+ except Exception:
175
+ wait = None
176
+
177
+ # Final fallback: exponential backoff with jitter
178
+ if wait is None:
179
+ wait = delay + random.uniform(0, 0.5)
180
+
181
+ # Cap individual wait to avoid extreme sleeps
182
+ wait = max(1.0, min(wait, 120.0))
183
+ print(f"GitHub API {status}. Backing off {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
184
+ time.sleep(wait)
185
+ delay = min(delay * 2, 60.0)
186
+ continue
187
+
188
+ # Non-retryable error; return response for caller to handle
189
+ return resp
190
+
191
+ except requests.RequestException as e:
192
+ # Network error -> retry with backoff
193
+ wait = delay + random.uniform(0, 0.5)
194
+ wait = max(1.0, min(wait, 60.0))
195
+ print(f"Request error: {e}. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
196
+ time.sleep(wait)
197
+ delay = min(delay * 2, 60.0)
198
+
199
+ print(f"Exceeded max retries for {url}")
200
+ return None
201
+
202
+ def get_github_token():
203
+ """Get GitHub token from environment variables."""
204
+ token = os.getenv('GITHUB_TOKEN')
205
+ if not token:
206
+ print("Warning: GITHUB_TOKEN not found. API rate limits: 60/hour (authenticated: 5000/hour)")
207
+ return token
208
+
209
+
210
+ def validate_github_username(identifier):
211
+ """Verify that a GitHub identifier exists with backoff-aware requests."""
212
+ try:
213
+ token = get_github_token()
214
+ headers = {'Authorization': f'token {token}'} if token else {}
215
+ url = f'https://api.github.com/users/{identifier}'
216
+ response = request_with_backoff('GET', url, headers=headers, max_retries=1)
217
+ if response is None:
218
+ return False, "Validation error: network/rate limit exhausted"
219
+ if response.status_code == 200:
220
+ return True, "Username is valid"
221
+ elif response.status_code == 404:
222
+ return False, "GitHub identifier not found"
223
+ else:
224
+ return False, f"Validation error: HTTP {response.status_code}"
225
+ except Exception as e:
226
+ return False, f"Validation error: {str(e)}"
227
+
228
+
229
+ def fetch_issues_with_time_partition(base_query, start_date, end_date, headers, issues_by_id, debug_limit=None):
230
+ """
231
+ Fetch issues within a specific time range using time-based partitioning.
232
+ Recursively splits the time range if hitting the 1000-result limit.
233
+
234
+ Args:
235
+ debug_limit: If set, stops fetching after this many issues (for testing)
236
+
237
+ Returns the number of issues found in this time partition.
238
+ """
239
+ # Format dates for GitHub search (YYYY-MM-DD)
240
+ start_str = start_date.strftime('%Y-%m-%d')
241
+ end_str = end_date.strftime('%Y-%m-%d')
242
+
243
+ # Add date range to query
244
+ query = f'{base_query} created:{start_str}..{end_str}'
245
+
246
+ print(f" Searching range {start_str} to {end_str}...")
247
+
248
+ page = 1
249
+ per_page = 100
250
+ total_in_partition = 0
251
+
252
+ while True:
253
+ # Check debug limit
254
+ if debug_limit is not None and total_in_partition >= debug_limit:
255
+ print(f" 🐛 DEBUG MODE: Reached limit of {debug_limit} issues, stopping...")
256
+ return total_in_partition
257
+ url = 'https://api.github.com/search/issues'
258
+ params = {
259
+ 'q': query,
260
+ 'per_page': per_page,
261
+ 'page': page,
262
+ 'sort': 'created',
263
+ 'order': 'asc'
264
+ }
265
+
266
+ try:
267
+ response = request_with_backoff('GET', url, headers=headers, params=params)
268
+ if response is None:
269
+ print(f" Error: retries exhausted for range {start_str} to {end_str}")
270
+ return total_in_partition
271
+
272
+ if response.status_code != 200:
273
+ print(f" Error: HTTP {response.status_code} for range {start_str} to {end_str}")
274
+ return total_in_partition
275
+
276
+ data = response.json()
277
+ total_count = data.get('total_count', 0)
278
+ items = data.get('items', [])
279
+
280
+ if not items:
281
+ break
282
+
283
+ # Add issues to global dict
284
+ for issue in items:
285
+ issue_id = issue.get('id')
286
+ if issue_id and issue_id not in issues_by_id:
287
+ issues_by_id[issue_id] = issue
288
+ total_in_partition += 1
289
+
290
+ # Check if we hit the 1000-result limit
291
+ if total_count > 1000 and page == 10:
292
+ print(f" ⚠️ Hit 1000-result limit ({total_count} total). Splitting time range...")
293
+
294
+ # Calculate midpoint
295
+ time_diff = end_date - start_date
296
+ mid_date = start_date + time_diff / 2
297
+
298
+ # Recursively fetch both halves
299
+ count1 = fetch_issues_with_time_partition(base_query, start_date, mid_date, headers, issues_by_id, debug_limit)
300
+ count2 = fetch_issues_with_time_partition(base_query, mid_date + timedelta(days=1), end_date, headers, issues_by_id, debug_limit)
301
+
302
+ return count1 + count2
303
+
304
+ # Normal pagination: check if there are more pages
305
+ if len(items) < per_page or page >= 10:
306
+ break
307
+
308
+ page += 1
309
+ time.sleep(0.5) # Courtesy delay between pages
310
+
311
+ except Exception as e:
312
+ print(f" Error fetching range {start_str} to {end_str}: {str(e)}")
313
+ return total_in_partition
314
+
315
+ if total_in_partition > 0:
316
+ print(f" ✓ Found {total_in_partition} issues in range {start_str} to {end_str}")
317
+
318
+ return total_in_partition
319
+
320
+
321
+ def extract_issue_metadata(issue):
322
+ """
323
+ Extract minimal issue metadata for efficient storage.
324
+ Only keeps essential fields: html_url, created_at, closed_at, state_reason.
325
+ Note: agent_name is not stored as it's inferred from the folder structure.
326
+
327
+ Issue states:
328
+ - state: "open" or "closed"
329
+ - state_reason: "completed" (resolved), "not_planned" (closed as not planned), or None (still open)
330
+ """
331
+ # Extract dates and state
332
+ created_at = issue.get('created_at')
333
+ closed_at = issue.get('closed_at')
334
+ state = issue.get('state')
335
+ state_reason = issue.get('state_reason')
336
+
337
+ return {
338
+ 'html_url': issue.get('html_url'),
339
+ 'created_at': created_at,
340
+ 'closed_at': closed_at,
341
+ 'state': state,
342
+ 'state_reason': state_reason
343
+ }
344
+
345
+
346
+ def fetch_all_issues_metadata(identifier, agent_name, token=None, start_from_date=None, year=None, exclude_dates=None):
347
+ """
348
+ Fetch issues associated with a GitHub user or bot for the past 6 months.
349
+ Returns lightweight metadata instead of full issue objects.
350
+
351
+ This function employs time-based partitioning to navigate GitHub's 1000-result limit per query.
352
+ It searches using multiple query patterns:
353
+ - is:issue author:{identifier} (issues authored by the bot)
354
+ - is:issue assignee:{identifier} (issues assigned to the bot)
355
+ - is:issue mentions:{identifier} (issues mentioning the bot)
356
+
357
+ Args:
358
+ identifier: GitHub username or bot identifier
359
+ agent_name: Human-readable name of the agent for metadata purposes
360
+ token: GitHub API token for authentication
361
+ start_from_date: Only fetch issues created after this date (for incremental updates)
362
+ year: Year parameter (deprecated, retained for compatibility but not utilized)
363
+ exclude_dates: Set of date objects to exclude from mining (dates that have already been processed)
364
+
365
+ Returns:
366
+ List of dictionaries containing minimal issue metadata
367
+ """
368
+ headers = {'Authorization': f'token {token}'} if token else {}
369
+
370
+ # Debug mode: limit issue retrieval for testing
371
+ debug_limit_per_pattern = 10 if DEBUG_MODE else None
372
+
373
+ if DEBUG_MODE:
374
+ print(f"\n🐛 DEBUG MODE ENABLED: Limiting to {debug_limit_per_pattern} issues per query pattern")
375
+
376
+ # Define query patterns for issues:
377
+ # 1) author pattern: issues authored by the identifier
378
+ # 2) assignee pattern: issues assigned to the identifier
379
+ # 3) mentions pattern: issues mentioning the identifier
380
+ stripped_id = identifier.replace('[bot]', '')
381
+ query_patterns = []
382
+
383
+ # Always add author pattern
384
+ query_patterns.append(f'is:issue author:{identifier}')
385
+
386
+ # Add assignee and mentions patterns
387
+ if stripped_id:
388
+ query_patterns.append(f'is:issue assignee:{stripped_id}')
389
+ query_patterns.append(f'is:issue mentions:{stripped_id}')
390
+
391
+ # Use a dict to deduplicate issues by ID
392
+ issues_by_id = {}
393
+
394
+ # Define time range: past 6 months only (or from start_from_date if specified)
395
+ current_time = datetime.now(timezone.utc)
396
+ six_months_ago = current_time - timedelta(days=180) # ~6 months
397
+
398
+ if start_from_date:
399
+ # Use start_from_date but ensure it's not older than 6 months
400
+ start_date = max(start_from_date, six_months_ago)
401
+ else:
402
+ start_date = six_months_ago
403
+
404
+ # End date is current time
405
+ end_date = current_time
406
+
407
+ for query_pattern in query_patterns:
408
+ print(f"\n🔍 Searching with query: {query_pattern}")
409
+ print(f" Time range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
410
+
411
+ pattern_start_time = time.time()
412
+ initial_count = len(issues_by_id)
413
+
414
+ # Fetch with time partitioning
415
+ issues_found = fetch_issues_with_time_partition(
416
+ query_pattern,
417
+ start_date,
418
+ end_date,
419
+ headers,
420
+ issues_by_id,
421
+ debug_limit_per_pattern
422
+ )
423
+
424
+ pattern_duration = time.time() - pattern_start_time
425
+ new_issues = len(issues_by_id) - initial_count
426
+
427
+ print(f" ✓ Pattern complete: {new_issues} new issues found ({issues_found} total fetched, {len(issues_by_id) - initial_count - (issues_found - new_issues)} duplicates)")
428
+ print(f" ⏱️ Time taken: {pattern_duration:.1f} seconds")
429
+
430
+ # Delay between different query patterns (shorter in debug mode)
431
+ time.sleep(0.2 if DEBUG_MODE else 1.0)
432
+
433
+ # Convert to lightweight metadata
434
+ all_issues = list(issues_by_id.values())
435
+
436
+ # Filter out issues from excluded dates if specified
437
+ if exclude_dates:
438
+ filtered_issues = []
439
+ excluded_count = 0
440
+ for issue in all_issues:
441
+ created_at = issue.get('created_at')
442
+ if created_at:
443
+ try:
444
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
445
+ issue_date = dt.date()
446
+ if issue_date not in exclude_dates:
447
+ filtered_issues.append(issue)
448
+ else:
449
+ excluded_count += 1
450
+ except Exception:
451
+ filtered_issues.append(issue) # Keep issues with unparseable dates
452
+ else:
453
+ filtered_issues.append(issue) # Keep issues without created_at
454
+
455
+ if excluded_count > 0:
456
+ print(f" ⏭️ Skipped {excluded_count} issues from already-mined dates")
457
+ all_issues = filtered_issues
458
+
459
+ if DEBUG_MODE:
460
+ print(f"\n✅ COMPLETE (DEBUG MODE): Found {len(all_issues)} unique issues for {identifier}")
461
+ print(f" Note: In production mode, this would fetch ALL issues")
462
+ else:
463
+ print(f"\n✅ COMPLETE: Found {len(all_issues)} unique issues for {identifier}")
464
+ print(f"📦 Extracting minimal metadata...")
465
+
466
+ metadata_list = [extract_issue_metadata(issue) for issue in all_issues]
467
+
468
+ # Calculate memory savings
469
+ import sys
470
+ original_size = sys.getsizeof(str(all_issues))
471
+ metadata_size = sys.getsizeof(str(metadata_list))
472
+ savings_pct = ((original_size - metadata_size) / original_size * 100) if original_size > 0 else 0
473
+
474
+ print(f"💾 Memory efficiency: {original_size // 1024}KB → {metadata_size // 1024}KB (saved {savings_pct:.1f}%)")
475
+
476
+ return metadata_list
477
+
478
+
479
+ def calculate_issue_stats_from_metadata(metadata_list):
480
+ """
481
+ Calculate statistics from a list of issue metadata (lightweight objects).
482
+ Works with minimal metadata: html_url, created_at, closed_at, state, state_reason.
483
+
484
+ Returns a dictionary with comprehensive issue metrics.
485
+
486
+ Resolved Rate is calculated as:
487
+ resolved issues / total issues * 100
488
+
489
+ Resolved Issues = issues closed as completed (state_reason="completed")
490
+ We do NOT count issues closed as not planned (state_reason="not_planned")
491
+ """
492
+ total_issues = len(metadata_list)
493
+
494
+ # Count resolved issues - those with state_reason="completed"
495
+ resolved = sum(1 for issue_meta in metadata_list
496
+ if issue_meta.get('state_reason') == 'completed')
497
+
498
+ # Calculate resolved rate
499
+ resolved_rate = (resolved / total_issues * 100) if total_issues > 0 else 0
500
+
501
+ return {
502
+ 'total_issues': total_issues,
503
+ 'resolved': resolved,
504
+ 'resolved_rate': round(resolved_rate, 2),
505
+ }
506
+
507
+
508
+ def calculate_monthly_metrics_by_agent():
509
+ """
510
+ Calculate monthly metrics for all agents for visualization.
511
+ Loads data directly from SWE-Arena/issue_metadata dataset for the current year.
512
+
513
+ Returns:
514
+ dict: {
515
+ 'agents': list of agent names,
516
+ 'months': list of month labels (e.g., '2025-01'),
517
+ 'data': {
518
+ agent_name: {
519
+ 'resolved_rates': list of resolved rates by month,
520
+ 'total_issues': list of issue counts by month,
521
+ 'resolved_issues': list of resolved issue counts by month
522
+ }
523
+ }
524
+ }
525
+ """
526
+ # Get current year for loading metadata
527
+ current_year = datetime.now().year
528
+
529
+ # Load ALL agents from HuggingFace agents repo
530
+ agents = load_agents_from_hf()
531
+
532
+ # Create mapping from agent_identifier to agent_name
533
+ identifier_to_name = {agent.get('github_identifier'): agent.get('agent_name') for agent in agents if agent.get('github_identifier')}
534
+
535
+ # Load all issue metadata for current year from issue_metadata dataset
536
+ all_metadata = load_issue_metadata_for_year(current_year)
537
+
538
+ if not all_metadata:
539
+ return {'agents': [], 'months': [], 'data': {}}
540
+
541
+ # Group by agent and month
542
+ agent_month_data = defaultdict(lambda: defaultdict(list))
543
+
544
+ for issue_meta in all_metadata:
545
+ agent_identifier = issue_meta.get('agent_identifier')
546
+ created_at = issue_meta.get('created_at')
547
+
548
+ if not agent_identifier or not created_at:
549
+ continue
550
+
551
+ # Get agent_name from identifier
552
+ agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
553
+
554
+ try:
555
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
556
+ month_key = f"{dt.year}-{dt.month:02d}"
557
+ agent_month_data[agent_name][month_key].append(issue_meta)
558
+ except Exception as e:
559
+ print(f"Warning: Could not parse date '{created_at}': {e}")
560
+ continue
561
+
562
+ # Get all unique months and sort them
563
+ all_months = set()
564
+ for agent_data in agent_month_data.values():
565
+ all_months.update(agent_data.keys())
566
+ months = sorted(list(all_months))
567
+
568
+ # Calculate metrics for each agent and month
569
+ result_data = {}
570
+ for agent_name, month_dict in agent_month_data.items():
571
+ resolved_rates = []
572
+ total_issues_list = []
573
+ resolved_issues_list = []
574
+
575
+ for month in months:
576
+ issues_in_month = month_dict.get(month, [])
577
+
578
+ # Count resolved issues (those with state_reason="completed")
579
+ resolved_count = sum(1 for issue in issues_in_month if issue.get('state_reason') == 'completed')
580
+
581
+ # Total issues created in this month
582
+ total_count = len(issues_in_month)
583
+
584
+ # Calculate resolved rate
585
+ resolved_rate = (resolved_count / total_count * 100) if total_count > 0 else None
586
+
587
+ resolved_rates.append(resolved_rate)
588
+ total_issues_list.append(total_count)
589
+ resolved_issues_list.append(resolved_count)
590
+
591
+ result_data[agent_name] = {
592
+ 'resolved_rates': resolved_rates,
593
+ 'total_issues': total_issues_list,
594
+ 'resolved_issues': resolved_issues_list
595
+ }
596
+
597
+ return {
598
+ 'agents': sorted(list(agent_month_data.keys())),
599
+ 'months': months,
600
+ 'data': result_data
601
+ }
602
+
603
+
604
+ # =============================================================================
605
+ # ISSUE METADATA STORAGE & RETRIEVAL
606
+ # =============================================================================
607
+
608
+ def group_metadata_by_date(metadata_list):
609
+ """
610
+ Group issue metadata by exact date (year.month.day) for efficient daily storage.
611
+ Returns dict: {(year, month, day): [metadata_list]}
612
+ """
613
+ grouped = defaultdict(list)
614
+
615
+ for issue_meta in metadata_list:
616
+ created_at = issue_meta.get('created_at')
617
+ if not created_at:
618
+ continue
619
+
620
+ try:
621
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
622
+ key = (dt.year, dt.month, dt.day)
623
+ grouped[key].append(issue_meta)
624
+ except Exception as e:
625
+ print(f"Warning: Could not parse date '{created_at}': {e}")
626
+
627
+ return dict(grouped)
628
+
629
+
630
+ def save_issue_metadata_to_hf(metadata_list, agent_identifier):
631
+ """
632
+ Save issue metadata to HuggingFace dataset, organized by [agent_identifier]/YYYY.MM.DD.jsonl.
633
+ Each file is stored in the agent's folder and named YYYY.MM.DD.jsonl for that day's issues.
634
+ In debug mode, saves to in-memory cache only.
635
+
636
+ This function APPENDS new metadata and DEDUPLICATES by html_url.
637
+
638
+ Args:
639
+ metadata_list: List of issue metadata dictionaries
640
+ agent_identifier: GitHub identifier of the agent (used as folder name)
641
+ """
642
+ # Skip saving to HF in debug mode - use in-memory cache instead
643
+ if DEBUG_MODE:
644
+ global DEBUG_ISSUE_METADATA_CACHE
645
+ # Merge with existing cache, deduplicating by html_url
646
+ existing = {issue['html_url']: issue for issue in DEBUG_ISSUE_METADATA_CACHE[agent_identifier] if issue.get('html_url')}
647
+ new = {issue['html_url']: issue for issue in metadata_list if issue.get('html_url')}
648
+ existing.update(new)
649
+ DEBUG_ISSUE_METADATA_CACHE[agent_identifier] = list(existing.values())
650
+ print(f"🐛 DEBUG MODE: Saved to in-memory cache only ({len(metadata_list)} issues) - NOT saved to HuggingFace")
651
+ return True
652
+
653
+ try:
654
+ token = get_hf_token()
655
+ if not token:
656
+ raise Exception("No HuggingFace token found")
657
+
658
+ api = HfApi()
659
+
660
+ # Group by exact date (year, month, day)
661
+ grouped = group_metadata_by_date(metadata_list)
662
+
663
+ for (issue_year, month, day), day_metadata in grouped.items():
664
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
665
+ filename = f"{agent_identifier}/{issue_year}.{month:02d}.{day:02d}.jsonl"
666
+ local_filename = f"{issue_year}.{month:02d}.{day:02d}.jsonl"
667
+ print(f"📤 Uploading {len(day_metadata)} issues to {filename}...")
668
+
669
+ # Download existing file if it exists
670
+ existing_metadata = []
671
+ try:
672
+ file_path = hf_hub_download(
673
+ repo_id=ISSUE_METADATA_REPO,
674
+ filename=filename,
675
+ repo_type="dataset",
676
+ token=token
677
+ )
678
+ existing_metadata = load_jsonl(file_path)
679
+ print(f" Found {len(existing_metadata)} existing issues in {filename}")
680
+ except Exception:
681
+ print(f" No existing file found for {filename}, creating new")
682
+
683
+ # Merge and deduplicate by html_url
684
+ existing_by_url = {meta['html_url']: meta for meta in existing_metadata if meta.get('html_url')}
685
+ new_by_url = {meta['html_url']: meta for meta in day_metadata if meta.get('html_url')}
686
+
687
+ # Update with new data (new data overwrites old)
688
+ existing_by_url.update(new_by_url)
689
+ merged_metadata = list(existing_by_url.values())
690
+
691
+ # Save locally
692
+ save_jsonl(local_filename, merged_metadata)
693
+
694
+ try:
695
+ # Upload to HuggingFace with folder path
696
+ upload_with_retry(
697
+ api=api,
698
+ path_or_fileobj=local_filename,
699
+ path_in_repo=filename,
700
+ repo_id=ISSUE_METADATA_REPO,
701
+ repo_type="dataset",
702
+ token=token
703
+ )
704
+ print(f" ✓ Saved {len(merged_metadata)} total issues to {filename}")
705
+ finally:
706
+ # Always clean up local file, even if upload fails
707
+ if os.path.exists(local_filename):
708
+ os.remove(local_filename)
709
+
710
+ return True
711
+
712
+ except Exception as e:
713
+ print(f"✗ Error saving issue metadata: {str(e)}")
714
+ return False
715
+
716
+
717
+ def load_issue_metadata_for_year(year):
718
+ """
719
+ Load all issue metadata for a specific year from HuggingFace.
720
+ Scans all agent folders and loads daily files matching the year.
721
+ In debug mode, loads from in-memory cache if available.
722
+
723
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
724
+
725
+ Returns:
726
+ List of dictionaries with 'agent_identifier' added to each issue metadata.
727
+ """
728
+ # In debug mode, check in-memory cache first
729
+ if DEBUG_MODE and DEBUG_ISSUE_METADATA_CACHE:
730
+ all_metadata = []
731
+ for agent_identifier, metadata_list in DEBUG_ISSUE_METADATA_CACHE.items():
732
+ for issue_meta in metadata_list:
733
+ issue_with_agent = issue_meta.copy()
734
+ issue_with_agent['agent_identifier'] = agent_identifier
735
+ all_metadata.append(issue_with_agent)
736
+ if all_metadata:
737
+ print(f"🐛 DEBUG MODE: Loading issue metadata from in-memory cache ({len(all_metadata)} issues)")
738
+ return all_metadata
739
+
740
+ try:
741
+ api = HfApi()
742
+ token = get_hf_token()
743
+
744
+ # List all files in the repository
745
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
746
+
747
+ # Filter for files matching the year pattern: [agent_identifier]/YYYY.MM.DD.jsonl
748
+ # Extract year from filename
749
+ year_str = str(year)
750
+ year_files = []
751
+ for f in files:
752
+ if f.endswith('.jsonl'):
753
+ parts = f.split('/')
754
+ if len(parts) == 2: # [agent_identifier]/YYYY.MM.DD.jsonl
755
+ filename = parts[1]
756
+ if filename.startswith(year_str + '.'):
757
+ year_files.append(f)
758
+
759
+ print(f"📥 Loading issue metadata for {year} ({len(year_files)} daily files across all agents)...")
760
+
761
+ all_metadata = []
762
+ for filename in year_files:
763
+ try:
764
+ # Extract agent_identifier from path (first part)
765
+ # Format: agent_identifier/YYYY.MM.DD.jsonl
766
+ parts = filename.split('/')
767
+ if len(parts) != 2:
768
+ print(f" Warning: Unexpected filename format: {filename}")
769
+ continue
770
+
771
+ agent_identifier = parts[0]
772
+
773
+ file_path = hf_hub_download(
774
+ repo_id=ISSUE_METADATA_REPO,
775
+ filename=filename,
776
+ repo_type="dataset",
777
+ token=token
778
+ )
779
+ day_metadata = load_jsonl(file_path)
780
+
781
+ # Add agent_identifier to each issue metadata for processing
782
+ for issue_meta in day_metadata:
783
+ issue_meta['agent_identifier'] = agent_identifier
784
+
785
+ all_metadata.extend(day_metadata)
786
+ print(f" ✓ Loaded {len(day_metadata)} issues from {filename}")
787
+ except Exception as e:
788
+ print(f" Warning: Could not load {filename}: {str(e)}")
789
+
790
+ print(f"✓ Loaded {len(all_metadata)} total issues for {year}")
791
+ return all_metadata
792
+
793
+ except Exception as e:
794
+ print(f"✗ Error loading issue metadata for {year}: {str(e)}")
795
+ return []
796
+
797
+
798
+ def get_latest_issue_date_for_agent(agent_identifier):
799
+ """
800
+ Get the latest issue creation date for an agent from stored metadata.
801
+ Used for incremental updates - only fetch issues newer than this date.
802
+
803
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
804
+
805
+ Args:
806
+ agent_identifier: GitHub identifier of the agent
807
+
808
+ Returns:
809
+ datetime or None if no existing issues found.
810
+ """
811
+ try:
812
+ api = HfApi()
813
+ token = get_hf_token()
814
+
815
+ # List all files in the repository
816
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
817
+
818
+ # Filter for files in this agent's folder
819
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
820
+ agent_pattern = f"{agent_identifier}/"
821
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
822
+
823
+ if not agent_files:
824
+ return None
825
+
826
+ # Find latest created_at across all files
827
+ latest_date = None
828
+ for filename in agent_files:
829
+ try:
830
+ file_path = hf_hub_download(
831
+ repo_id=ISSUE_METADATA_REPO,
832
+ filename=filename,
833
+ repo_type="dataset",
834
+ token=token
835
+ )
836
+ metadata = load_jsonl(file_path)
837
+
838
+ for issue in metadata:
839
+ created_at = issue.get('created_at')
840
+ if created_at:
841
+ try:
842
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
843
+ if latest_date is None or dt > latest_date:
844
+ latest_date = dt
845
+ except Exception:
846
+ continue
847
+ except Exception:
848
+ continue
849
+
850
+ return latest_date
851
+
852
+ except Exception:
853
+ return None
854
+
855
+
856
+ def get_daily_files_last_n_months(agent_identifier, n_months=6):
857
+ """
858
+ Get list of daily file paths for an agent from the last N months.
859
+
860
+ Args:
861
+ agent_identifier: GitHub identifier of the agent
862
+ n_months: Number of months to look back (default: 6)
863
+
864
+ Returns:
865
+ List of file paths in format: [agent_identifier]/YYYY.MM.DD.jsonl
866
+ """
867
+ try:
868
+ api = HfApi()
869
+ token = get_hf_token()
870
+
871
+ # Calculate date range
872
+ today = datetime.now(timezone.utc)
873
+ n_months_ago = today - timedelta(days=30 * n_months)
874
+
875
+ # List all files in the repository
876
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
877
+
878
+ # Filter for files in this agent's folder
879
+ agent_pattern = f"{agent_identifier}/"
880
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
881
+
882
+ # Filter by date range (extract date from filename)
883
+ recent_files = []
884
+ for filename in agent_files:
885
+ try:
886
+ # Extract date from filename: YYYY.MM.DD.jsonl
887
+ parts = filename.split('/')
888
+ if len(parts) != 2:
889
+ continue
890
+
891
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
892
+ date_components = date_part.split('.')
893
+ if len(date_components) != 3:
894
+ continue
895
+
896
+ file_year, file_month, file_day = map(int, date_components)
897
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc)
898
+
899
+ # Include if within last n_months
900
+ if n_months_ago <= file_date <= today:
901
+ recent_files.append(filename)
902
+ except Exception:
903
+ continue
904
+
905
+ return recent_files
906
+
907
+ except Exception as e:
908
+ print(f"Error getting daily files: {str(e)}")
909
+ return []
910
+
911
+
912
+ def get_already_mined_dates(agent_identifier, n_months=6):
913
+ """
914
+ Get set of dates that have already been mined for an agent.
915
+
916
+ Args:
917
+ agent_identifier: GitHub identifier of the agent
918
+ n_months: Number of months to look back (default: 6)
919
+
920
+ Returns:
921
+ Set of date objects (datetime.date) that already have data files
922
+ """
923
+ try:
924
+ api = HfApi()
925
+
926
+ # Calculate date range
927
+ today = datetime.now(timezone.utc)
928
+ n_months_ago = today - timedelta(days=30 * n_months)
929
+
930
+ # List all files in the repository
931
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
932
+
933
+ # Filter for files in this agent's folder
934
+ agent_pattern = f"{agent_identifier}/"
935
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
936
+
937
+ mined_dates = set()
938
+ for filename in agent_files:
939
+ try:
940
+ # Extract date from filename: [agent_identifier]/YYYY.MM.DD.jsonl
941
+ parts = filename.split('/')
942
+ if len(parts) != 2:
943
+ continue
944
+
945
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
946
+ date_components = date_part.split('.')
947
+ if len(date_components) != 3:
948
+ continue
949
+
950
+ file_year, file_month, file_day = map(int, date_components)
951
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc).date()
952
+
953
+ # Only include dates within the last n_months
954
+ if n_months_ago.date() <= file_date <= today.date():
955
+ mined_dates.add(file_date)
956
+ except Exception as e:
957
+ print(f" Warning: Could not parse date from filename {filename}: {e}")
958
+ continue
959
+
960
+ return mined_dates
961
+
962
+ except Exception as e:
963
+ print(f" Warning: Could not get already-mined dates for {agent_identifier}: {str(e)}")
964
+ return set()
965
+
966
+
967
+ def fetch_issue_current_status(issue_url, token):
968
+ """
969
+ Fetch the current status of a single issue from GitHub API.
970
+
971
+ Args:
972
+ issue_url: Issue HTML URL (e.g., https://github.com/owner/repo/issues/123)
973
+ token: GitHub API token
974
+
975
+ Returns:
976
+ Dictionary with updated state, state_reason, and closed_at, or None if failed
977
+ """
978
+ try:
979
+ # Convert HTML URL to API URL
980
+ # https://github.com/owner/repo/issues/123 -> https://api.github.com/repos/owner/repo/issues/123
981
+ parts = issue_url.replace('https://github.com/', '').split('/')
982
+ if len(parts) < 4:
983
+ return None
984
+
985
+ owner, repo, issue_word, issue_number = parts[0], parts[1], parts[2], parts[3]
986
+ api_url = f'https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}'
987
+
988
+ headers = {'Authorization': f'token {token}'} if token else {}
989
+ response = request_with_backoff('GET', api_url, headers=headers, max_retries=3)
990
+
991
+ if response is None or response.status_code != 200:
992
+ return None
993
+
994
+ issue_data = response.json()
995
+ state = issue_data.get('state')
996
+ state_reason = issue_data.get('state_reason')
997
+ closed_at = issue_data.get('closed_at')
998
+
999
+ return {
1000
+ 'state': state,
1001
+ 'state_reason': state_reason,
1002
+ 'closed_at': closed_at
1003
+ }
1004
+
1005
+ except Exception as e:
1006
+ print(f" Error fetching issue status for {issue_url}: {str(e)}")
1007
+ return None
1008
+
1009
+
1010
+ def refresh_open_issues_for_agent(agent_identifier, token):
1011
+ """
1012
+ Refresh status for all open issues from the last 6 months for an agent.
1013
+ Only updates issues that are still open (state="open" or no state_reason).
1014
+
1015
+ This implements the smart update strategy:
1016
+ - Skip issues that are already closed/resolved
1017
+ - Fetch current status for open issues
1018
+ - Update and save back to daily files
1019
+
1020
+ Args:
1021
+ agent_identifier: GitHub identifier of the agent
1022
+ token: GitHub API token
1023
+
1024
+ Returns:
1025
+ Tuple: (total_checked, updated_count)
1026
+ """
1027
+ print(f"\n🔄 Refreshing open issues for {agent_identifier} (last 6 months)...")
1028
+
1029
+ try:
1030
+ # Get daily files from last 6 months
1031
+ recent_files = get_daily_files_last_n_months(agent_identifier, n_months=6)
1032
+
1033
+ if not recent_files:
1034
+ print(f" No recent files found for {agent_identifier}")
1035
+ return (0, 0)
1036
+
1037
+ print(f" Found {len(recent_files)} daily files to check")
1038
+
1039
+ total_checked = 0
1040
+ updated_count = 0
1041
+
1042
+ # Process each file
1043
+ for filename in recent_files:
1044
+ try:
1045
+ # Download file
1046
+ file_path = hf_hub_download(
1047
+ repo_id=ISSUE_METADATA_REPO,
1048
+ filename=filename,
1049
+ repo_type="dataset",
1050
+ token=get_hf_token()
1051
+ )
1052
+ issues = load_jsonl(file_path)
1053
+
1054
+ if not issues:
1055
+ continue
1056
+
1057
+ updated_issues = []
1058
+ file_had_updates = False
1059
+
1060
+ # Check each issue
1061
+ for issue in issues:
1062
+ # Skip if already closed (has a state_reason)
1063
+ if issue.get('state') == 'closed' and issue.get('state_reason'):
1064
+ updated_issues.append(issue)
1065
+ continue
1066
+
1067
+ # Issue is open, fetch current status
1068
+ total_checked += 1
1069
+ issue_url = issue.get('html_url')
1070
+
1071
+ if not issue_url:
1072
+ updated_issues.append(issue)
1073
+ continue
1074
+
1075
+ current_status = fetch_issue_current_status(issue_url, token)
1076
+
1077
+ if current_status:
1078
+ # Check if status changed (now closed)
1079
+ if current_status['state'] == 'closed':
1080
+ print(f" ✓ Issue status changed: {issue_url}")
1081
+ issue['state'] = current_status['state']
1082
+ issue['state_reason'] = current_status['state_reason']
1083
+ issue['closed_at'] = current_status['closed_at']
1084
+ updated_count += 1
1085
+ file_had_updates = True
1086
+
1087
+ updated_issues.append(issue)
1088
+ time.sleep(0.1) # Rate limiting courtesy delay
1089
+
1090
+ # Save file if there were updates
1091
+ if file_had_updates:
1092
+ # Extract filename components for local save
1093
+ parts = filename.split('/')
1094
+ local_filename = parts[-1] # Just YYYY.MM.DD.jsonl
1095
+
1096
+ # Save locally
1097
+ save_jsonl(local_filename, updated_issues)
1098
+
1099
+ try:
1100
+ # Upload back to HuggingFace
1101
+ api = HfApi()
1102
+ upload_with_retry(
1103
+ api=api,
1104
+ path_or_fileobj=local_filename,
1105
+ path_in_repo=filename,
1106
+ repo_id=ISSUE_METADATA_REPO,
1107
+ repo_type="dataset",
1108
+ token=get_hf_token()
1109
+ )
1110
+ print(f" 💾 Updated {filename}")
1111
+ finally:
1112
+ # Always clean up local file, even if upload fails
1113
+ if os.path.exists(local_filename):
1114
+ os.remove(local_filename)
1115
+
1116
+ except Exception as e:
1117
+ print(f" Warning: Could not process {filename}: {str(e)}")
1118
+ continue
1119
+
1120
+ print(f" ✅ Refresh complete: {total_checked} open issues checked, {updated_count} updated")
1121
+ return (total_checked, updated_count)
1122
+
1123
+ except Exception as e:
1124
+ print(f" ✗ Error refreshing issues for {agent_identifier}: {str(e)}")
1125
+ return (0, 0)
1126
+
1127
+
1128
+ # =============================================================================
1129
+ # HUGGINGFACE DATASET OPERATIONS
1130
+ # =============================================================================
1131
+
1132
+ def load_agents_from_hf():
1133
+ """Load all agent metadata JSON files from HuggingFace dataset."""
1134
+ try:
1135
+ api = HfApi()
1136
+ agents = []
1137
+
1138
+ # List all files in the repository
1139
+ files = api.list_repo_files(repo_id=AGENTS_REPO, repo_type="dataset")
1140
+
1141
+ # Filter for JSON files only
1142
+ json_files = [f for f in files if f.endswith('.json')]
1143
+
1144
+ print(f"Found {len(json_files)} agent files in {AGENTS_REPO}")
1145
+
1146
+ # Download and parse each JSON file
1147
+ for json_file in json_files:
1148
+ try:
1149
+ file_path = hf_hub_download(
1150
+ repo_id=AGENTS_REPO,
1151
+ filename=json_file,
1152
+ repo_type="dataset"
1153
+ )
1154
+
1155
+ with open(file_path, 'r') as f:
1156
+ agent_data = json.load(f)
1157
+ agents.append(agent_data)
1158
+
1159
+ except Exception as e:
1160
+ print(f"Warning: Could not load {json_file}: {str(e)}")
1161
+ continue
1162
+
1163
+ print(f"✓ Loaded {len(agents)} agents from HuggingFace")
1164
+ return agents
1165
+
1166
+ except Exception as e:
1167
+ print(f"Could not load agents from HuggingFace: {str(e)}")
1168
+ return None
1169
+
1170
+
1171
+ def load_leaderboard_dataset():
1172
+ """Load leaderboard data from HuggingFace dataset for current year.
1173
+ In debug mode, loads from in-memory cache if available."""
1174
+ # In debug mode, check in-memory cache first
1175
+ if DEBUG_MODE and DEBUG_LEADERBOARD_CACHE:
1176
+ print(f"🐛 DEBUG MODE: Loading leaderboard from in-memory cache ({len(DEBUG_LEADERBOARD_CACHE)} entries)")
1177
+ return list(DEBUG_LEADERBOARD_CACHE.values())
1178
+
1179
+ try:
1180
+ year = datetime.now().year
1181
+ filename = f"{year}.csv"
1182
+
1183
+ # Try to download the CSV file for current year
1184
+ file_path = hf_hub_download(
1185
+ repo_id=LEADERBOARD_REPO,
1186
+ filename=filename,
1187
+ repo_type="dataset"
1188
+ )
1189
+
1190
+ # Load CSV into list of dicts
1191
+ df = pd.read_csv(file_path)
1192
+ data = df.to_dict('records')
1193
+ print(f"✓ Loaded {len(data)} entries from {filename}")
1194
+ return data
1195
+
1196
+ except Exception as e:
1197
+ print(f"Could not load leaderboard dataset for year {datetime.now().year}: {str(e)}")
1198
+ return None
1199
+
1200
+
1201
+ def get_hf_token():
1202
+ """Get HuggingFace token from environment variables."""
1203
+ token = os.getenv('HF_TOKEN')
1204
+ if not token:
1205
+ print("Warning: HF_TOKEN not found in environment variables")
1206
+ return token
1207
+
1208
+
1209
+ def upload_with_retry(api, path_or_fileobj, path_in_repo, repo_id, repo_type, token, max_retries=5):
1210
+ """
1211
+ Upload file to HuggingFace with exponential backoff retry logic.
1212
+
1213
+ Args:
1214
+ api: HfApi instance
1215
+ path_or_fileobj: Local file path to upload
1216
+ path_in_repo: Target path in the repository
1217
+ repo_id: Repository ID
1218
+ repo_type: Type of repository (e.g., "dataset")
1219
+ token: HuggingFace token
1220
+ max_retries: Maximum number of retry attempts
1221
+
1222
+ Returns:
1223
+ True if upload succeeded, raises exception if all retries failed
1224
+ """
1225
+ delay = 2.0 # Initial delay in seconds
1226
+
1227
+ for attempt in range(max_retries):
1228
+ try:
1229
+ api.upload_file(
1230
+ path_or_fileobj=path_or_fileobj,
1231
+ path_in_repo=path_in_repo,
1232
+ repo_id=repo_id,
1233
+ repo_type=repo_type,
1234
+ token=token
1235
+ )
1236
+ if attempt > 0:
1237
+ print(f" ✓ Upload succeeded on attempt {attempt + 1}/{max_retries}")
1238
+ return True
1239
+
1240
+ except Exception as e:
1241
+ if attempt < max_retries - 1:
1242
+ wait_time = delay + random.uniform(0, 1.0)
1243
+ print(f" ⚠️ Upload failed (attempt {attempt + 1}/{max_retries}): {str(e)}")
1244
+ print(f" ⏳ Retrying in {wait_time:.1f} seconds...")
1245
+ time.sleep(wait_time)
1246
+ delay = min(delay * 2, 60.0) # Exponential backoff, max 60s
1247
+ else:
1248
+ print(f" ✗ Upload failed after {max_retries} attempts: {str(e)}")
1249
+ raise
1250
+
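+ # Timing note for upload_with_retry (illustrative): with the default
+ # max_retries=5 and the doubling delay capped at 60s, failed attempts wait
+ # roughly 2s, 4s, 8s and 16s (each plus up to 1s of random jitter) before the
+ # fifth and final attempt re-raises the underlying exception.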
1251
+
1252
+ def save_agent_to_hf(data):
1253
+ """Save a new agent to HuggingFace dataset as {identifier}.json in root."""
1254
+ try:
1255
+ api = HfApi()
1256
+ token = get_hf_token()
1257
+
1258
+ if not token:
1259
+ raise Exception("No HuggingFace token found. Please set HF_TOKEN in your Space settings.")
1260
+
1261
+ identifier = data['github_identifier']
1262
+ filename = f"{identifier}.json"
1263
+
1264
+ # Save locally first
1265
+ with open(filename, 'w') as f:
1266
+ json.dump(data, f, indent=2)
1267
+
1268
+ try:
1269
+ # Upload to HuggingFace (root directory)
1270
+ upload_with_retry(
1271
+ api=api,
1272
+ path_or_fileobj=filename,
1273
+ path_in_repo=filename,
1274
+ repo_id=AGENTS_REPO,
1275
+ repo_type="dataset",
1276
+ token=token
1277
+ )
1278
+ print(f"✓ Saved agent to HuggingFace: {filename}")
1279
+ return True
1280
+ finally:
1281
+ # Always clean up local file, even if upload fails
1282
+ if os.path.exists(filename):
1283
+ os.remove(filename)
1284
+
1285
+ except Exception as e:
1286
+ print(f"✗ Error saving agent: {str(e)}")
1287
+ return False
1288
+
1289
+
1290
+ def save_leaderboard_to_hf(cache_dict):
1291
+ """Save complete leaderboard to HuggingFace dataset as CSV.
1292
+ In debug mode, saves to in-memory cache only."""
1293
+ # Skip saving in debug mode - use in-memory cache instead
1294
+ if DEBUG_MODE:
1295
+ global DEBUG_LEADERBOARD_CACHE
1296
+ DEBUG_LEADERBOARD_CACHE = cache_dict.copy()
1297
+ data_list = dict_to_cache(cache_dict)
1298
+ print(f"🐛 DEBUG MODE: Saved to in-memory cache only ({len(data_list)} entries) - NOT saved to HuggingFace")
1299
+ return True
1300
+
1301
+ try:
1302
+ token = get_hf_token()
1303
+ if not token:
1304
+ raise Exception("No HuggingFace token found. Please set HF_TOKEN in your Space settings.")
1305
+
1306
+ # Convert to DataFrame
1307
+ data_list = dict_to_cache(cache_dict)
1308
+ df = pd.DataFrame(data_list)
1309
+
1310
+ # Save to CSV with year as filename
1311
+ year = datetime.now().year
1312
+ filename = f"{year}.csv"
1313
+ df.to_csv(filename, index=False)
1314
+
1315
+ try:
1316
+ # Upload to HuggingFace
1317
+ api = HfApi()
1318
+ upload_with_retry(
1319
+ api=api,
1320
+ path_or_fileobj=filename,
1321
+ path_in_repo=filename,
1322
+ repo_id=LEADERBOARD_REPO,
1323
+ repo_type="dataset",
1324
+ token=token
1325
+ )
1326
+ print(f"✓ Saved leaderboard to HuggingFace as {filename} ({len(data_list)} entries)")
1327
+ return True
1328
+ finally:
1329
+ # Always clean up local file, even if upload fails
1330
+ if os.path.exists(filename):
1331
+ os.remove(filename)
1332
+
1333
+ except Exception as e:
1334
+ print(f"✗ Error saving leaderboard: {str(e)}")
1335
+ return False
1336
+
1337
+
1338
+ # =============================================================================
1339
+ # DATA MANAGEMENT
1340
+ # =============================================================================
1341
+
1342
+ def update_all_agents_incremental():
1343
+ """
1344
+ Memory-efficient incremental update of issue statistics for all agents.
1345
+
1346
+ Strategy:
1347
+ 1. For each agent, load existing data from SWE-Arena/issue_metadata
1348
+ 2. Identify already-mined dates (based on filename: YYYY.MM.DD.jsonl)
1349
+ 3. Only fetch issues from dates that haven't been mined yet (within last 6 months)
1350
+ 4. If no data exists at all, mine everything from scratch
1351
+ 5. Store minimal metadata (not full issue objects) to avoid storage limits
1352
+ 6. Construct leaderboard from ALL stored metadata (last 6 months)
1353
+
1354
+ Returns dictionary of all agent data with current stats.
1355
+ """
1356
+ token = get_github_token()
1357
+ current_year = datetime.now().year
1358
+
1359
+ # Load agent metadata from HuggingFace
1360
+ agents = load_agents_from_hf()
1361
+ if not agents:
1362
+ print("No agents found in HuggingFace dataset")
1363
+ return {}
1364
+
1365
+ cache_dict = {}
1366
+
1367
+ # Update each agent
1368
+ for agent in agents:
1369
+ identifier = agent.get('github_identifier')
1370
+ agent_name = agent.get('agent_name', 'Unknown')
1371
+
1372
+ if not identifier:
1373
+ print(f"Warning: Skipping agent without identifier: {agent}")
1374
+ continue
1375
+
1376
+ try:
1377
+ print(f"\n{'='*80}")
1378
+ print(f"Processing: {agent_name} ({identifier})")
1379
+ print(f"{'='*80}")
1380
+
1381
+ # Get already-mined dates for this agent (last 6 months)
1382
+ already_mined_dates = get_already_mined_dates(identifier, n_months=6)
1383
+
1384
+ if already_mined_dates:
1385
+ print(f"📅 Found {len(already_mined_dates)} already-mined dates")
1386
+ print(f" Skipping these dates and fetching only new data...")
1387
+ # Fetch only issues from dates not yet mined
1388
+ new_metadata = fetch_all_issues_metadata(
1389
+ identifier,
1390
+ agent_name,
1391
+ token,
1392
+ start_from_date=None, # Use full 6-month range
1393
+ exclude_dates=already_mined_dates # But exclude already-mined dates
1394
+ )
1395
+ else:
1396
+ print(f"📅 No existing data found. Mining everything from scratch...")
1397
+ # Mine everything from scratch (full 6-month range)
1398
+ new_metadata = fetch_all_issues_metadata(
1399
+ identifier,
1400
+ agent_name,
1401
+ token,
1402
+ start_from_date=None
1403
+ )
1404
+
1405
+ if new_metadata:
1406
+ # Save new metadata to HuggingFace (organized by agent_identifier/YYYY.MM.DD.jsonl)
1407
+ print(f"💾 Saving {len(new_metadata)} new issue records...")
1408
+ save_issue_metadata_to_hf(new_metadata, identifier)
1409
+ else:
1410
+ print(f" No new issues to save")
1411
+
1412
+ # Load ALL metadata for current year to calculate stats (aggregates entire last 6 months)
1413
+ print(f"📊 Calculating statistics from ALL stored metadata (last 6 months)...")
1414
+ all_year_metadata = load_issue_metadata_for_year(current_year)
1415
+
1416
+ # Filter for this specific agent
1417
+ agent_metadata = [issue for issue in all_year_metadata if issue.get('agent_identifier') == identifier]
1418
+
1419
+ # Calculate stats from metadata
1420
+ stats = calculate_issue_stats_from_metadata(agent_metadata)
1421
+
1422
+ # Merge metadata with stats
1423
+ cache_dict[identifier] = {
1424
+ 'agent_name': agent_name,
1425
+ 'organization': agent.get('organization', 'Unknown'),
1426
+ 'github_identifier': identifier,
1427
+ **stats
1428
+ }
1429
+
1430
+ print(f"✓ Updated {identifier}: {stats['total_issues']} issues, {stats['resolved_rate']}% resolved")
1431
+
1432
+ except Exception as e:
1433
+ print(f"✗ Error updating {identifier}: {str(e)}")
1434
+ import traceback
1435
+ traceback.print_exc()
1436
+ continue
1437
+
1438
+ return cache_dict
1439
+
1440
+
1441
+ def construct_leaderboard_from_metadata():
1442
+ """
1443
+ Construct leaderboard from stored issue metadata instead of fetching all issues.
1444
+ Much more memory-efficient and faster.
1445
+
1446
+ Returns dictionary of agent stats.
1447
+ """
1448
+ print("📊 Constructing leaderboard from issue metadata...")
1449
+ current_year = datetime.now().year
1450
+
1451
+ # Load agents
1452
+ agents = load_agents_from_hf()
1453
+ if not agents:
1454
+ print("No agents found")
1455
+ return {}
1456
+
1457
+ # Load all issue metadata for current year
1458
+ all_metadata = load_issue_metadata_for_year(current_year)
1459
+
1460
+ cache_dict = {}
1461
+
1462
+ for agent in agents:
1463
+ identifier = agent.get('github_identifier')
1464
+ agent_name = agent.get('agent_name', 'Unknown')
1465
+
1466
+ # Filter metadata for this agent
1467
+ agent_metadata = [issue for issue in all_metadata if issue.get('agent_identifier') == identifier]
1468
+
1469
+ # Calculate stats
1470
+ stats = calculate_issue_stats_from_metadata(agent_metadata)
1471
+
1472
+ cache_dict[identifier] = {
1473
+ 'agent_name': agent_name,
1474
+ 'organization': agent.get('organization', 'Unknown'),
1475
+ 'github_identifier': identifier,
1476
+ **stats
1477
+ }
1478
+
1479
+ return cache_dict
1480
+
1481
+
1482
+ def initialize_data():
1483
+ """
1484
+ Initialize data on application startup.
1485
+ Priority: 1) Leaderboard dataset, 2) Issue metadata (if available), 3) Full GitHub mining
1486
+
1487
+ In DEBUG MODE:
1488
+ - If no data available, automatically mine up to 10 issues per query per agent
1489
+ - Does NOT save to HuggingFace datasets
1490
+ """
1491
+ print("🚀 Initializing leaderboard data...")
1492
+
1493
+ # Try loading existing leaderboard
1494
+ leaderboard_data = load_leaderboard_dataset()
1495
+ if leaderboard_data:
1496
+ print("✓ Initialized from leaderboard dataset")
1497
+ return
1498
+
1499
+ # Try constructing from issue metadata (fast, memory-efficient)
1500
+ try:
1501
+ cache_dict = construct_leaderboard_from_metadata()
1502
+ # Check if there's actually meaningful data (at least one agent with issues)
1503
+ has_data = any(entry.get('total_issues', 0) > 0 for entry in cache_dict.values())
1504
+ if cache_dict and has_data:
1505
+ save_leaderboard_to_hf(cache_dict)
1506
+ print("✓ Initialized from issue metadata")
1507
+ return
1508
+ except Exception as e:
1509
+ print(f"Could not construct from metadata: {e}")
1510
+
1511
+ # If in debug mode and no data available, mine immediately
1512
+ if DEBUG_MODE:
1513
+ print("\n🐛 DEBUG MODE: No data available, mining immediately (up to 10 issues per query per agent)...")
1514
+ agents = load_agents_from_hf()
1515
+ if agents:
1516
+ print(f"✓ Loaded {len(agents)} agents from HuggingFace")
1517
+ print("⛏️ Mining GitHub data in debug mode (limited to 10 issues per query)...")
1518
+ cache_dict = update_all_agents_incremental()
1519
+ if cache_dict:
1520
+ # In debug mode, this won't actually save to HF
1521
+ save_leaderboard_to_hf(cache_dict)
1522
+ print("✓ Debug mining complete (data NOT saved to HuggingFace)")
1523
+ return
1524
+ else:
1525
+ print("⚠️ No agents found. Waiting for first submission...")
1526
+ return
1527
+
1528
+ # Production mode: Fallback to full incremental mining from GitHub
1529
+ agents = load_agents_from_hf()
1530
+ if agents:
1531
+ print(f"✓ Loaded {len(agents)} agents from HuggingFace")
1532
+ print("⛏️ Mining GitHub data (this may take a while)...")
1533
+ cache_dict = update_all_agents_incremental()
1534
+ if cache_dict:
1535
+ save_leaderboard_to_hf(cache_dict)
1536
+ return
1537
+
1538
+ # No data available
1539
+ print("⚠️ No data sources available. Waiting for first submission...")
1540
+
1541
+
1542
+ # =============================================================================
1543
+ # UI FUNCTIONS
1544
+ # =============================================================================
1545
+
1546
+ def create_monthly_metrics_plot():
1547
+ """
1548
+ Create a Plotly figure with dual y-axes showing:
1549
+ - Left y-axis: Resolved Rate (%) as line curves
1550
+ - Right y-axis: Total Issues created as bar charts
1551
+
1552
+ Each agent gets a unique color for both their line and bars.
1553
+ """
1554
+ metrics = calculate_monthly_metrics_by_agent()
1555
+
1556
+ if not metrics['agents'] or not metrics['months']:
1557
+ # Return an empty figure with a message
1558
+ fig = go.Figure()
1559
+ fig.add_annotation(
1560
+ text="No data available for visualization",
1561
+ xref="paper", yref="paper",
1562
+ x=0.5, y=0.5, showarrow=False,
1563
+ font=dict(size=16)
1564
+ )
1565
+ fig.update_layout(
1566
+ title=None,
1567
+ xaxis_title=None,
1568
+ height=500
1569
+ )
1570
+ return fig
1571
+
1572
+ # Create figure with secondary y-axis
1573
+ fig = make_subplots(specs=[[{"secondary_y": True}]])
1574
+
1575
+ # Define colors for agents (using a color palette)
1576
+ colors = [
1577
+ '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
1578
+ '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'
1579
+ ]
1580
+
1581
+ agents = metrics['agents']
1582
+ months = metrics['months']
1583
+ data = metrics['data']
1584
+
1585
+ # Add traces for each agent
1586
+ for idx, agent_name in enumerate(agents):
1587
+ color = colors[idx % len(colors)]
1588
+ agent_data = data[agent_name]
1589
+
1590
+ # Add line trace for resolved rate (left y-axis)
1591
+ resolved_rates = agent_data['resolved_rates']
1592
+ # Filter out None values for plotting
1593
+ x_resolved = [month for month, rate in zip(months, resolved_rates) if rate is not None]
1594
+ y_resolved = [rate for rate in resolved_rates if rate is not None]
1595
+
1596
+ if x_resolved and y_resolved: # Only add trace if there's data
1597
+ fig.add_trace(
1598
+ go.Scatter(
1599
+ x=x_resolved,
1600
+ y=y_resolved,
1601
+ name=agent_name,
1602
+ mode='lines+markers',
1603
+ line=dict(color=color, width=2),
1604
+ marker=dict(size=6),
1605
+ legendgroup=agent_name,
1606
+ showlegend=True,
1607
+ hovertemplate='<b>%{fullData.name}</b><br>' +
1608
+ 'Month: %{x}<br>' +
1609
+ 'Resolved Rate: %{y:.2f}%<br>' +
1610
+ '<extra></extra>'
1611
+ ),
1612
+ secondary_y=False
1613
+ )
1614
+
1615
+ # Add bar trace for total issues (right y-axis)
1616
+ # Only show bars for months where agent has issues
1617
+ x_bars = []
1618
+ y_bars = []
1619
+ for month, count in zip(months, agent_data['total_issues']):
1620
+ if count > 0: # Only include months with issues
1621
+ x_bars.append(month)
1622
+ y_bars.append(count)
1623
+
1624
+ if x_bars and y_bars: # Only add trace if there's data
1625
+ fig.add_trace(
1626
+ go.Bar(
1627
+ x=x_bars,
1628
+ y=y_bars,
1629
+ name=f"{agent_name} (Issues)",
1630
+ marker=dict(color=color, opacity=0.6),
1631
+ legendgroup=agent_name,
1632
+ showlegend=False, # Don't show in legend (already shown for line)
1633
+ hovertemplate='<b>%{fullData.name}</b><br>' +
1634
+ 'Month: %{x}<br>' +
1635
+ 'Total Issues: %{y}<br>' +
1636
+ '<extra></extra>',
1637
+ offsetgroup=agent_name # Group bars by agent for proper spacing
1638
+ ),
1639
+ secondary_y=True
1640
+ )
1641
+
1642
+ # Update axes labels
1643
+ fig.update_xaxes(title_text=None)
1644
+ fig.update_yaxes(title_text="<b>Resolved Rate (%)</b>", secondary_y=False)
1645
+ fig.update_yaxes(title_text="<b>Total Issues</b>", secondary_y=True)
1646
+
1647
+ # Update layout
1648
+ fig.update_layout(
1649
+ title=None,
1650
+ hovermode='x unified',
1651
+ barmode='group',
1652
+ height=600,
1653
+ legend=dict(
1654
+ orientation="h",
1655
+ yanchor="bottom",
1656
+ y=1.02,
1657
+ xanchor="right",
1658
+ x=1
1659
+ ),
1660
+ margin=dict(l=50, r=50, t=100, b=50)
1661
+ )
1662
+
1663
+ return fig
1664
+
1665
+
1666
+ def get_leaderboard_dataframe():
1667
+ """
1668
+ Load leaderboard data from HuggingFace and convert to pandas DataFrame for display.
1669
+ Returns formatted DataFrame sorted by resolved rate.
1670
+ """
1671
+ # Load leaderboard data from HuggingFace
1672
+ leaderboard_data = load_leaderboard_dataset()
1673
+
1674
+ if not leaderboard_data:
1675
+ # Return empty DataFrame with correct columns if no data
1676
+ column_names = [col[0] for col in LEADERBOARD_COLUMNS]
1677
+ return pd.DataFrame(columns=column_names)
1678
+
1679
+ rows = []
1680
+ for data in leaderboard_data:
1681
+ # Only include display-relevant fields
1682
+ rows.append([
1683
+ data.get('agent_name', 'Unknown'),
1684
+ data.get('organization', 'Unknown'),
1685
+ data.get('total_issues', 0),
1686
+ data.get('resolved', 0),
1687
+ data.get('resolved_rate', 0.0),
1688
+ ])
1689
+
1690
+ # Create DataFrame
1691
+ column_names = [col[0] for col in LEADERBOARD_COLUMNS]
1692
+ df = pd.DataFrame(rows, columns=column_names)
1693
+
1694
+ # Ensure numeric types
1695
+ numeric_cols = ["Total Issues", "Resolved Issues", "Resolved Rate (%)"]
1696
+ for col in numeric_cols:
1697
+ if col in df.columns:
1698
+ df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
1699
+
1700
+ # Sort by Resolved Rate (%) descending
1701
+ if "Resolved Rate (%)" in df.columns and not df.empty:
1702
+ df = df.sort_values(by="Resolved Rate (%)", ascending=False).reset_index(drop=True)
1703
+
1704
+ return df
1705
+
1706
+
1707
+ def refresh_leaderboard():
1708
+ """Manually trigger data refresh for all agents using incremental updates."""
1709
+ try:
1710
+ print("🔄 Manual refresh initiated (incremental mode)")
1711
+ cache_dict = update_all_agents_incremental()
1712
+ if cache_dict:
1713
+ save_leaderboard_to_hf(cache_dict)
1714
+ return "✅ Data refreshed successfully!", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1715
+ except Exception as e:
1716
+ error_msg = f"❌ Refresh failed: {str(e)}"
1717
+ print(error_msg)
1718
+ return error_msg, get_leaderboard_dataframe(), create_monthly_metrics_plot()
1719
+
1720
+
1721
+ def submit_agent(identifier, agent_name, organization, description, website):
1722
+ """
1723
+ Submit a new agent to the leaderboard.
1724
+ Validates input, saves submission, and fetches issue metadata (memory-efficient).
1725
+ """
1726
+ # Validate required fields
1727
+ if not identifier or not identifier.strip():
1728
+ return "❌ GitHub identifier is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1729
+ if not agent_name or not agent_name.strip():
1730
+ return "❌ Agent name is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1731
+ if not organization or not organization.strip():
1732
+ return "❌ Organization name is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1733
+ if not website or not website.strip():
1734
+ return "❌ Website URL is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1735
+
1736
+ # Clean inputs
1737
+ identifier = identifier.strip()
1738
+ agent_name = agent_name.strip()
1739
+ organization = organization.strip()
1740
+ description = description.strip()
1741
+ website = website.strip()
1742
+
1743
+ # Validate GitHub identifier
1744
+ is_valid, message = validate_github_username(identifier)
1745
+ if not is_valid:
1746
+ return f"❌ {message}", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1747
+
1748
+ # Check for duplicates by loading agents from HuggingFace
1749
+ agents = load_agents_from_hf()
1750
+ if agents:
1751
+ existing_names = {agent['github_identifier'] for agent in agents}
1752
+ if identifier in existing_names:
1753
+ return f"⚠️ Agent with identifier '{identifier}' already exists", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1754
+
1755
+ # Create submission
1756
+ submission = {
1757
+ 'agent_name': agent_name,
1758
+ 'organization': organization,
1759
+ 'github_identifier': identifier,
1760
+ 'description': description,
1761
+ 'website': website,
1762
+ }
1763
+
1764
+ # Save to HuggingFace
1765
+ if not save_agent_to_hf(submission):
1766
+ return "❌ Failed to save submission", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1767
+
1768
+ # Fetch issue metadata immediately (memory-efficient)
1769
+ token = get_github_token()
1770
+ try:
1771
+ print(f"Fetching issue metadata for {agent_name}...")
1772
+
1773
+ # Fetch lightweight metadata
1774
+ metadata_list = fetch_all_issues_metadata(identifier, agent_name, token)
1775
+
1776
+ if metadata_list:
1777
+ # Save metadata to HuggingFace
1778
+ save_issue_metadata_to_hf(metadata_list, identifier)
1779
+
1780
+ # Calculate stats from metadata
1781
+ stats = calculate_issue_stats_from_metadata(metadata_list)
1782
+
1783
+ # Load current leaderboard
1784
+ leaderboard_data = load_leaderboard_dataset()
1785
+ if not leaderboard_data:
1786
+ leaderboard_data = []
1787
+
1788
+ # Convert to dict for easy updating
1789
+ cache_dict = {entry['github_identifier']: entry for entry in leaderboard_data}
1790
+ cache_dict[identifier] = {**submission, **stats}
1791
+
1792
+ # Save to HuggingFace
1793
+ save_leaderboard_to_hf(cache_dict)
1794
+
1795
+ return f"✅ Successfully submitted {agent_name}!", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1796
+
1797
+ except Exception as e:
1798
+ error_msg = f"⚠️ Submitted {agent_name}, but failed to fetch issue data: {str(e)}"
1799
+ print(error_msg)
1800
+ import traceback
1801
+ traceback.print_exc()
1802
+ return error_msg, get_leaderboard_dataframe(), create_monthly_metrics_plot()
1803
+
1804
+
1805
+ # =============================================================================
1806
+ # BACKGROUND TASKS
1807
+ # =============================================================================
1808
+
1809
+ def daily_update_task():
1810
+ """
1811
+ Daily scheduled task (runs at 12:00 AM UTC) for smart issue updates.
1812
+
1813
+ Strategy:
1814
+ 1. For each agent, refresh open issues from last 6 months
1815
+ 2. Skip issues that are already closed/resolved (no API calls)
1816
+ 3. Only fetch status for open issues to check if they've been closed/resolved
1817
+ 4. Update leaderboard with refreshed data
1818
+
1819
+ This is much more efficient than fetching all issues every time.
1820
+ """
1821
+ print(f"\n{'='*80}")
1822
+ print(f"🕛 Daily update started at {datetime.now(timezone.utc).isoformat()}")
1823
+ print(f"{'='*80}")
1824
+
1825
+ try:
1826
+ token = get_github_token()
1827
+
1828
+ # Load all agents
1829
+ agents = load_agents_from_hf()
1830
+ if not agents:
1831
+ print("No agents found")
1832
+ return
1833
+
1834
+ print(f"📋 Processing {len(agents)} agents...")
1835
+
1836
+ total_checked = 0
1837
+ total_updated = 0
1838
+
1839
+ # Refresh open issues for each agent (last 6 months)
1840
+ for agent in agents:
1841
+ identifier = agent.get('github_identifier')
1842
+ agent_name = agent.get('agent_name', 'Unknown')
1843
+
1844
+ if not identifier:
1845
+ continue
1846
+
1847
+ print(f"\n{'='*60}")
1848
+ print(f"Processing: {agent_name} ({identifier})")
1849
+ print(f"{'='*60}")
1850
+
1851
+ # Refresh open issues from last 6 months
1852
+ checked, updated = refresh_open_issues_for_agent(identifier, token)
1853
+ total_checked += checked
1854
+ total_updated += updated
1855
+
1856
+ print(f"\n{'='*80}")
1857
+ print(f"📊 Refresh Summary:")
1858
+ print(f" Total open issues checked: {total_checked}")
1859
+ print(f" Issues updated (closed/resolved): {total_updated}")
1860
+ print(f"{'='*80}")
1861
+
1862
+ # Reconstruct leaderboard from all stored metadata
1863
+ print(f"\n📈 Rebuilding leaderboard from refreshed data...")
1864
+ cache_dict = construct_leaderboard_from_metadata()
1865
+
1866
+ if cache_dict:
1867
+ # Save leaderboard
1868
+ save_leaderboard_to_hf(cache_dict)
1869
+ print("✓ Leaderboard updated successfully")
1870
+
1871
+ print(f"\n✅ Daily update completed at {datetime.now(timezone.utc).isoformat()}")
1872
+
1873
+ except Exception as e:
1874
+ print(f"✗ Daily update failed: {str(e)}")
1875
+ import traceback
1876
+ traceback.print_exc()
1877
+
1878
+
1879
+ # =============================================================================
1880
+ # GRADIO APPLICATION
1881
+ # =============================================================================
1882
+
1883
+ # Initialize data before creating UI
1884
+ if DEBUG_MODE:
1885
+ print("\n" + "="*80)
1886
+ print("🐛 DEBUG MODE ENABLED 🐛")
1887
+ print("="*80)
1888
+ print("Issue retrieval is limited to 10 issues per query pattern per agent")
1889
+
1890
+ # Show how debug mode was enabled
1891
+ if args.debug:
1892
+ print("Enabled via: command-line flag '--debug'")
1893
+ print("To disable: run without '--debug' flag")
1894
+ else:
1895
+ print("Enabled via: DEBUG_MODE environment variable")
1896
+ print("To disable: run with '--no-debug' flag or unset DEBUG_MODE")
1897
+
1898
+ print("="*80 + "\n")
1899
+ else:
1900
+ print("\n🚀 Starting in PRODUCTION MODE - full issue retrieval enabled")
1901
+ if args.no_debug:
1902
+ print(" (Explicitly set via '--no-debug' flag)")
1903
+ print()
1904
+
1905
+ initialize_data()
1906
+
1907
+ # Start APScheduler for daily updates at 12:00 AM UTC
1908
+ scheduler = BackgroundScheduler(timezone="UTC")
1909
+ scheduler.add_job(
1910
+ daily_update_task,
1911
+ trigger=CronTrigger(hour=0, minute=0), # 12:00 AM UTC daily
1912
+ id='daily_issue_refresh',
1913
+ name='Daily Issue Status Refresh',
1914
+ replace_existing=True
1915
+ )
1916
+ scheduler.start()
1917
+ print("✓ Scheduler started: Daily updates at 12:00 AM UTC")
1918
+
1919
+ # Create Gradio interface
1920
+ with gr.Blocks(title="SWE Agent Issue Leaderboard", theme=gr.themes.Soft()) as app:
1921
+
1922
+ gr.Markdown("# 🏆 SWE Agent Issue Leaderboard")
1923
+ gr.Markdown("Track and compare GitHub issue resolution statistics for SWE agents")
1924
+
1925
+ with gr.Tabs():
1926
+
1927
+ # Leaderboard Tab
1928
+ with gr.Tab("📊 Leaderboard"):
1929
+ with gr.Row():
1930
+ refresh_button = gr.Button("🔄 Refresh Data", variant="primary")
1931
+ status_display = gr.Textbox(
1932
+ label="Status",
1933
+ value="Ready",
1934
+ interactive=False,
1935
+ scale=3
1936
+ )
1937
+
1938
+ leaderboard_table = Leaderboard(
1939
+ value=get_leaderboard_dataframe(),
1940
+ datatype=LEADERBOARD_COLUMNS,
1941
+ search_columns=["Agent Name", "Organization"],
1942
+ filter_columns=["Resolved Rate (%)"]
1943
+ )
1944
+
1945
+ gr.Markdown("### Monthly Metrics")
1946
+ gr.Markdown("Track resolution rates and issue activity over time")
1947
+
1948
+ monthly_plot = gr.Plot(
1949
+ value=create_monthly_metrics_plot(),
1950
+ label="Monthly Issue Metrics"
1951
+ )
1952
+
1953
+ refresh_button.click(
1954
+ fn=refresh_leaderboard,
1955
+ outputs=[status_display, leaderboard_table, monthly_plot]
1956
+ )
1957
+
1958
+ # Submit Agent Tab
1959
+ with gr.Tab("➕ Submit Agent"):
1960
+
1961
+ gr.Markdown("### Submit Your Agent")
1962
+ gr.Markdown("Fill in the details below to add your agent to the leaderboard. Make sure you're logged in to HuggingFace CLI on your machine.")
1963
+
1964
+ with gr.Row():
1965
+ with gr.Column():
1966
+ github_input = gr.Textbox(
1967
+ label="GitHub Identifier*",
1968
+ placeholder="Your agent username (e.g., my-agent-bot)"
1969
+ )
1970
+ name_input = gr.Textbox(
1971
+ label="Agent Name*",
1972
+ placeholder="Your agent's display name"
1973
+ )
1974
+
1975
+ with gr.Column():
1976
+ organization_input = gr.Textbox(
1977
+ label="Organization*",
1978
+ placeholder="Your organization or team name"
1979
+ )
1980
+ description_input = gr.Textbox(
1981
+ label="Description",
1982
+ placeholder="Brief description of your agent",
1983
+ lines=3
1984
+ )
1985
+ website_input = gr.Textbox(
1986
+ label="Website",
1987
+ placeholder="https://your-agent-website.com"
1988
+ )
1989
+
1990
+ submit_button = gr.Button(
1991
+ "Submit Agent",
1992
+ variant="primary"
1993
+ )
1994
+ submission_status = gr.Textbox(
1995
+ label="Submission Status",
1996
+ interactive=False
1997
+ )
1998
+
1999
+ # Event handler
2000
+ submit_button.click(
2001
+ fn=submit_agent,
2002
+ inputs=[github_input, name_input, organization_input, description_input, website_input],
2003
+ outputs=[submission_status, leaderboard_table, monthly_plot]
2004
+ )
2005
+
2006
+
2007
+ # Launch application
2008
+ if __name__ == "__main__":
2009
+ app.launch()
msr.py ADDED
@@ -0,0 +1,795 @@
1
+ """
2
+ Standalone miner to fetch issue metadata and update the leaderboard immediately.
3
+
4
+ This script reuses the same logic and on-disk/HuggingFace formats as app.py, but
5
+ has no UI or scheduler. You can run it once, or run it in a loop for hours.
6
+
7
+ Datasets used:
8
+ - Agents: SWE-Arena/swe_agents
9
+ - Issue metadata: SWE-Arena/issue_metadata
10
+ - Leaderboard: SWE-Arena/issue_leaderboard
11
+
12
+ Environment:
13
+ - Requires HF_TOKEN (for HuggingFace uploads)
14
+ - Optional GITHUB_TOKEN (highly recommended to avoid low rate limits)
15
+ - Reads .env if present
16
+
17
+ CLI flags:
18
+ - --debug / --no-debug: Same semantics as app.py (debug limits to 10 issues/pattern
19
+ and DOES NOT save to HF, mirroring app.py behavior).
20
+ - --loop: Keep running in a loop.
21
+ - --interval-seconds N: Sleep between loops (default 3600 seconds).
22
+
23
+ Note: In production mode (default), data will be saved to HuggingFace datasets.
24
+ """
25
+
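+ # Example invocations (illustrative; assumes HF_TOKEN and GITHUB_TOKEN are exported):
+ #   python msr.py                                   # single production run
+ #   python msr.py --debug                           # limited run, nothing saved to HF
+ #   python msr.py --loop --interval-seconds 7200    # re-mine every two hours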
26
+ import argparse
27
+ import json
28
+ import os
29
+ import random
30
+ import sys
31
+ import time
32
+ from collections import defaultdict
33
+ from datetime import datetime, timezone, timedelta
34
+
35
+ import pandas as pd
36
+ import requests
37
+ from dotenv import load_dotenv
38
+ from huggingface_hub import HfApi, hf_hub_download
39
+
40
+
41
+ # =============================================================================
42
+ # Environment & CLI
43
+ # =============================================================================
44
+
45
+ load_dotenv()
46
+
47
+ parser = argparse.ArgumentParser(description="Immediate issue miner for SWE Arena")
48
+ parser.add_argument("--debug", "--DEBUG", action="store_true", help="Enable debug mode (limits issue retrieval to 10 per query; does NOT save to HF)")
49
+ parser.add_argument("--no-debug", "--production", action="store_true", help="Explicitly disable debug mode (force production mode)")
50
+ parser.add_argument("--loop", action="store_true", help="Run in a loop until interrupted")
51
+ parser.add_argument("--interval-seconds", type=int, default=3600, help="Sleep interval between loops in seconds (default: 3600)")
52
+ args = parser.parse_args()
53
+
54
+ # DEBUG MODE priority: 1) flags, 2) env var, 3) default False
55
+ if args.no_debug:
56
+ DEBUG_MODE = False
57
+ elif args.debug:
58
+ DEBUG_MODE = True
59
+ else:
60
+ DEBUG_MODE = os.getenv("DEBUG_MODE", "False").lower() in ("true", "1", "yes")
61
+
62
+
63
+ # =============================================================================
64
+ # Constants (match app.py)
65
+ # =============================================================================
66
+
67
+ DEBUG_LEADERBOARD_CACHE = {}
68
+ DEBUG_ISSUE_METADATA_CACHE = defaultdict(list)
69
+
70
+ AGENTS_REPO = "SWE-Arena/swe_agents"
71
+ LEADERBOARD_REPO = "SWE-Arena/issue_leaderboard"
72
+ ISSUE_METADATA_REPO = "SWE-Arena/issue_metadata"
73
+
74
+
75
+ # =============================================================================
76
+ # Utilities & I/O (match app.py behavior exactly)
77
+ # =============================================================================
78
+
79
+ def load_jsonl(filename):
80
+ """Load JSONL file and return list of dictionaries."""
81
+ if not os.path.exists(filename):
82
+ return []
83
+
84
+ data = []
85
+ with open(filename, 'r', encoding='utf-8') as f:
86
+ for line in f:
87
+ line = line.strip()
88
+ if line:
89
+ try:
90
+ entry = json.loads(line)
91
+ data.append(entry)
92
+ except json.JSONDecodeError as e:
93
+ print(f"Warning: Skipping invalid JSON line: {e}")
94
+ return data
95
+
96
+
97
+ def save_jsonl(filename, data):
98
+ """Save list of dictionaries to JSONL file."""
99
+ with open(filename, 'w', encoding='utf-8') as f:
100
+ for item in data:
101
+ f.write(json.dumps(item) + '\n')
102
+
103
+
104
+ def cache_to_dict(cache_list):
105
+ return {entry['github_identifier']: entry for entry in cache_list}
106
+
107
+
108
+ def dict_to_cache(cache_dict):
109
+ return list(cache_dict.values())
110
+
111
+
112
+ def get_github_token():
113
+ token = os.getenv('GITHUB_TOKEN')
114
+ if not token:
115
+ print("Warning: GITHUB_TOKEN not found. API rate limits: 60/hour (authenticated: 5000/hour)")
116
+ return token
117
+
118
+
119
+ def get_hf_token():
120
+ token = os.getenv('HF_TOKEN')
121
+ if not token:
122
+ print("Warning: HF_TOKEN not found in environment variables")
123
+ return token
124
+
125
+
126
+ def upload_with_retry(api, path_or_fileobj, path_in_repo, repo_id, repo_type, token, max_retries=5):
127
+ """
128
+ Upload file to HuggingFace with exponential backoff retry logic.
129
+
130
+ Args:
131
+ api: HfApi instance
132
+ path_or_fileobj: Local file path to upload
133
+ path_in_repo: Target path in the repository
134
+ repo_id: Repository ID
135
+ repo_type: Type of repository (e.g., "dataset")
136
+ token: HuggingFace token
137
+ max_retries: Maximum number of retry attempts
138
+
139
+ Returns:
140
+ True if upload succeeded, raises exception if all retries failed
141
+ """
142
+ delay = 2.0 # Initial delay in seconds
143
+
144
+ for attempt in range(max_retries):
145
+ try:
146
+ api.upload_file(
147
+ path_or_fileobj=path_or_fileobj,
148
+ path_in_repo=path_in_repo,
149
+ repo_id=repo_id,
150
+ repo_type=repo_type,
151
+ token=token
152
+ )
153
+ if attempt > 0:
154
+ print(f" ✓ Upload succeeded on attempt {attempt + 1}/{max_retries}")
155
+ return True
156
+
157
+ except Exception as e:
158
+ if attempt < max_retries - 1:
159
+ wait_time = delay + random.uniform(0, 1.0)
160
+ print(f" ⚠️ Upload failed (attempt {attempt + 1}/{max_retries}): {str(e)}")
161
+ print(f" ⏳ Retrying in {wait_time:.1f} seconds...")
162
+ time.sleep(wait_time)
163
+ delay = min(delay * 2, 60.0) # Exponential backoff, max 60s
164
+ else:
165
+ print(f" ✗ Upload failed after {max_retries} attempts: {str(e)}")
166
+ raise
167
+
168
+
169
+ # =============================================================================
170
+ # GitHub API with backoff (same as app.py)
171
+ # =============================================================================
172
+
173
+ def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None, max_retries=10, timeout=30):
174
+ delay = 1.0
175
+ for attempt in range(max_retries):
176
+ try:
177
+ resp = requests.request(
178
+ method,
179
+ url,
180
+ headers=headers or {},
181
+ params=params,
182
+ json=json_body,
183
+ data=data,
184
+ timeout=timeout
185
+ )
186
+
187
+ status = resp.status_code
188
+
189
+ if 200 <= status < 300:
190
+ return resp
191
+
192
+ if status in (403, 429) or 500 <= status < 600:
193
+ wait = None
194
+ retry_after = resp.headers.get('Retry-After') or resp.headers.get('retry-after')
195
+ if retry_after:
196
+ try:
197
+ wait = float(retry_after)
198
+ except Exception:
199
+ wait = None
200
+ if wait is None and status in (403, 429):
201
+ reset_hdr = resp.headers.get('X-RateLimit-Reset') or resp.headers.get('x-ratelimit-reset')
202
+ if reset_hdr:
203
+ try:
204
+ reset_ts = int(float(reset_hdr))
205
+ wait = max(reset_ts - time.time() + 2, 1)
206
+ except Exception:
207
+ wait = None
208
+ if wait is None:
209
+ wait = delay + random.uniform(0, 0.5)
210
+ wait = max(1.0, min(wait, 120.0))
211
+ print(f"GitHub API {status}. Backing off {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
212
+ time.sleep(wait)
213
+ delay = min(delay * 2, 60.0)
214
+ continue
215
+
216
+ return resp
217
+
218
+ except requests.RequestException as e:
219
+ wait = delay + random.uniform(0, 0.5)
220
+ wait = max(1.0, min(wait, 60.0))
221
+ print(f"Request error: {e}. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
222
+ time.sleep(wait)
223
+ delay = min(delay * 2, 60.0)
224
+
225
+ print(f"Exceeded max retries for {url}")
226
+ return None
227
+
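+ # Behaviour summary for request_with_backoff: 403/429 responses honour the
+ # Retry-After or X-RateLimit-Reset headers when present; 5xx responses and
+ # network errors fall back to the doubling delay. Individual waits are capped
+ # at 120s for HTTP rate-limit backoff and 60s for connection errors.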
228
+
229
+ def fetch_issues_with_time_partition(base_query, start_date, end_date, headers, issues_by_id, debug_limit=None):
230
+ start_str = start_date.strftime('%Y-%m-%d')
231
+ end_str = end_date.strftime('%Y-%m-%d')
232
+ query = f'{base_query} created:{start_str}..{end_str}'
233
+ print(f" Searching range {start_str} to {end_str}...")
234
+ page = 1
235
+ per_page = 100
236
+ total_in_partition = 0
237
+ while True:
238
+ if debug_limit is not None and total_in_partition >= debug_limit:
239
+ print(f" 🐛 DEBUG MODE: Reached limit of {debug_limit} issues, stopping...")
240
+ return total_in_partition
241
+ url = 'https://api.github.com/search/issues'
242
+ params = {
243
+ 'q': query,
244
+ 'per_page': per_page,
245
+ 'page': page,
246
+ 'sort': 'created',
247
+ 'order': 'asc'
248
+ }
249
+ try:
250
+ response = request_with_backoff('GET', url, headers=headers, params=params)
251
+ if response is None:
252
+ print(f" Error: retries exhausted for range {start_str} to {end_str}")
253
+ return total_in_partition
254
+ if response.status_code != 200:
255
+ print(f" Error: HTTP {response.status_code} for range {start_str} to {end_str}")
256
+ return total_in_partition
257
+ data = response.json()
258
+ total_count = data.get('total_count', 0)
259
+ items = data.get('items', [])
260
+ if not items:
261
+ break
262
+ for issue in items:
263
+ issue_id = issue.get('id')
264
+ if issue_id and issue_id not in issues_by_id:
265
+ issues_by_id[issue_id] = issue
266
+ total_in_partition += 1
267
+ if total_count > 1000 and page == 10:
268
+ print(f" ⚠️ Hit 1000-result limit ({total_count} total). Splitting time range...")
269
+ time_diff = end_date - start_date
270
+ mid_date = start_date + time_diff / 2
271
+ count1 = fetch_issues_with_time_partition(base_query, start_date, mid_date, headers, issues_by_id, debug_limit)
272
+ count2 = fetch_issues_with_time_partition(base_query, mid_date + timedelta(days=1), end_date, headers, issues_by_id, debug_limit)
273
+ return count1 + count2
274
+ if len(items) < per_page or page >= 10:
275
+ break
276
+ page += 1
277
+ time.sleep(0.5)
278
+ except Exception as e:
279
+ print(f" Error fetching range {start_str} to {end_str}: {str(e)}")
280
+ return total_in_partition
281
+ if total_in_partition > 0:
282
+ print(f" ✓ Found {total_in_partition} issues in range {start_str} to {end_str}")
283
+ return total_in_partition
284
+
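+ # Illustrative split: if a query such as "is:issue author:my-agent-bot" returns
+ # more than 1000 results for a date range, the range is halved at its midpoint
+ # once page 10 is reached, and each half is searched recursively until every
+ # slice fits under GitHub's 1000-result search cap.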
285
+
286
+ def extract_issue_metadata(issue):
287
+ created_at = issue.get('created_at')
288
+ closed_at = issue.get('closed_at')
289
+ state = issue.get('state')
290
+ state_reason = issue.get('state_reason')
291
+ return {
292
+ 'html_url': issue.get('html_url'),
293
+ 'created_at': created_at,
294
+ 'closed_at': closed_at,
295
+ 'state': state,
296
+ 'state_reason': state_reason
297
+ }
298
+
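+ # Shape of one extracted record (illustrative values):
+ #   {"html_url": "https://github.com/owner/repo/issues/42",
+ #    "created_at": "2025-01-15T09:30:00Z", "closed_at": None,
+ #    "state": "open", "state_reason": None}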
299
+
300
+ def fetch_all_issues_metadata(identifier, agent_name, token=None, start_from_date=None, year=None, exclude_dates=None):
301
+ headers = {'Authorization': f'token {token}'} if token else {}
302
+ debug_limit_per_pattern = 10 if DEBUG_MODE else None
303
+ if DEBUG_MODE:
304
+ print(f"\n🐛 DEBUG MODE ENABLED: Limiting to {debug_limit_per_pattern} issues per query pattern")
305
+ # Define query patterns for issues:
306
+ # 1) author pattern: issues authored by the identifier
307
+ # 2) assignee pattern: issues assigned to the identifier
308
+ # 3) mentions pattern: issues mentioning the identifier
309
+ stripped_id = identifier.replace('[bot]', '')
310
+ query_patterns = []
311
+ # Always add author pattern
312
+ query_patterns.append(f'is:issue author:{identifier}')
313
+ # Add assignee and mentions patterns
314
+ if stripped_id:
315
+ query_patterns.append(f'is:issue assignee:{stripped_id}')
316
+ query_patterns.append(f'is:issue mentions:{stripped_id}')
317
+ issues_by_id = {}
318
+ current_time = datetime.now(timezone.utc)
319
+ six_months_ago = current_time - timedelta(days=180)
320
+ if start_from_date:
321
+ start_date = max(start_from_date, six_months_ago)
322
+ else:
323
+ start_date = six_months_ago
324
+ end_date = current_time
325
+ for query_pattern in query_patterns:
326
+ print(f"\n🔍 Searching with query: {query_pattern}")
327
+ print(f" Time range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
328
+ pattern_start_time = time.time()
329
+ initial_count = len(issues_by_id)
330
+ issues_found = fetch_issues_with_time_partition(
331
+ query_pattern,
332
+ start_date,
333
+ end_date,
334
+ headers,
335
+ issues_by_id,
336
+ debug_limit_per_pattern
337
+ )
338
+ pattern_duration = time.time() - pattern_start_time
339
+ new_issues = len(issues_by_id) - initial_count
340
+ print(f" ✓ Pattern complete: {new_issues} new issues found ({issues_found} total fetched, {len(issues_by_id) - initial_count - (issues_found - new_issues)} duplicates)")
341
+ print(f" ⏱️ Time taken: {pattern_duration:.1f} seconds")
342
+ time.sleep(0.2 if DEBUG_MODE else 1.0)
343
+ all_issues = list(issues_by_id.values())
344
+
345
+ # Filter out issues from excluded dates if specified
346
+ if exclude_dates:
347
+ filtered_issues = []
348
+ excluded_count = 0
349
+ for issue in all_issues:
350
+ created_at = issue.get('created_at')
351
+ if created_at:
352
+ try:
353
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
354
+ issue_date = dt.date()
355
+ if issue_date not in exclude_dates:
356
+ filtered_issues.append(issue)
357
+ else:
358
+ excluded_count += 1
359
+ except Exception:
360
+ filtered_issues.append(issue) # Keep issues with unparseable dates
361
+ else:
362
+ filtered_issues.append(issue) # Keep issues without created_at
363
+
364
+ if excluded_count > 0:
365
+ print(f" ⏭️ Skipped {excluded_count} issues from already-mined dates")
366
+ all_issues = filtered_issues
367
+
368
+ if DEBUG_MODE:
369
+ print(f"\n✅ COMPLETE (DEBUG MODE): Found {len(all_issues)} unique issues for {identifier}")
370
+ print(f" Note: In production mode, this would fetch ALL issues")
371
+ else:
372
+ print(f"\n✅ COMPLETE: Found {len(all_issues)} unique issues for {identifier}")
373
+ print("📦 Extracting minimal metadata...")
374
+ metadata_list = [extract_issue_metadata(issue) for issue in all_issues]
375
+ original_size = sys.getsizeof(str(all_issues))
376
+ metadata_size = sys.getsizeof(str(metadata_list))
377
+ savings_pct = ((original_size - metadata_size) / original_size * 100) if original_size > 0 else 0
378
+ print(f"💾 Memory efficiency: {original_size // 1024}KB → {metadata_size // 1024}KB (saved {savings_pct:.1f}%)")
379
+ return metadata_list
380
+
381
+
382
+ def group_metadata_by_date(metadata_list):
383
+ grouped = defaultdict(list)
384
+ for issue_meta in metadata_list:
385
+ created_at = issue_meta.get('created_at')
386
+ if not created_at:
387
+ continue
388
+ try:
389
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
390
+ key = (dt.year, dt.month, dt.day)
391
+ grouped[key].append(issue_meta)
392
+ except Exception as e:
393
+ print(f"Warning: Could not parse date '{created_at}': {e}")
394
+ return dict(grouped)
395
+
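+ # Illustrative grouping: a record created at "2025-01-15T09:30:00Z" is placed
+ # under the key (2025, 1, 15), which save_issue_metadata_to_hf() below maps to
+ # the per-day file "<agent_identifier>/2025.01.15.jsonl".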
396
+
397
+ def save_issue_metadata_to_hf(metadata_list, agent_identifier):
398
+ if DEBUG_MODE:
399
+ global DEBUG_ISSUE_METADATA_CACHE
400
+ existing = {issue['html_url']: issue for issue in DEBUG_ISSUE_METADATA_CACHE[agent_identifier] if issue.get('html_url')}
401
+ new = {issue['html_url']: issue for issue in metadata_list if issue.get('html_url')}
402
+ existing.update(new)
403
+ DEBUG_ISSUE_METADATA_CACHE[agent_identifier] = list(existing.values())
404
+ print(f"🐛 DEBUG MODE: Saved to in-memory cache only ({len(metadata_list)} issues) - NOT saved to HuggingFace")
405
+ return True
406
+ try:
407
+ token = get_hf_token()
408
+ if not token:
409
+ raise Exception("No HuggingFace token found")
410
+ api = HfApi()
411
+ grouped = group_metadata_by_date(metadata_list)
412
+ for (issue_year, month, day), day_metadata in grouped.items():
413
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
414
+ filename = f"{agent_identifier}/{issue_year}.{month:02d}.{day:02d}.jsonl"
415
+ local_filename = f"{issue_year}.{month:02d}.{day:02d}.jsonl"
416
+ print(f"📤 Uploading {len(day_metadata)} issues to {filename}...")
417
+ existing_metadata = []
418
+ try:
419
+ file_path = hf_hub_download(
420
+ repo_id=ISSUE_METADATA_REPO,
421
+ filename=filename,
422
+ repo_type="dataset",
423
+ token=token
424
+ )
425
+ existing_metadata = load_jsonl(file_path)
426
+ print(f" Found {len(existing_metadata)} existing issues in {filename}")
427
+ except Exception:
428
+ print(f" No existing file found for {filename}, creating new")
429
+ existing_by_url = {meta['html_url']: meta for meta in existing_metadata if meta.get('html_url')}
430
+ new_by_url = {meta['html_url']: meta for meta in day_metadata if meta.get('html_url')}
431
+ existing_by_url.update(new_by_url)
432
+ merged_metadata = list(existing_by_url.values())
433
+ save_jsonl(local_filename, merged_metadata)
434
+ try:
435
+ upload_with_retry(
436
+ api=api,
437
+ path_or_fileobj=local_filename,
438
+ path_in_repo=filename,
439
+ repo_id=ISSUE_METADATA_REPO,
440
+ repo_type="dataset",
441
+ token=token
442
+ )
443
+ print(f" ✓ Saved {len(merged_metadata)} total issues to {filename}")
444
+ finally:
445
+ # Always clean up the local file, even if upload fails
446
+ if os.path.exists(local_filename):
447
+ os.remove(local_filename)
448
+ return True
449
+ except Exception as e:
450
+ print(f"✗ Error saving issue metadata: {str(e)}")
451
+ return False
452
+
453
+
454
+ def load_agents_from_hf():
455
+ try:
456
+ api = HfApi()
457
+ agents = []
458
+ files = api.list_repo_files(repo_id=AGENTS_REPO, repo_type="dataset")
459
+ json_files = [f for f in files if f.endswith('.json')]
460
+ print(f"Found {len(json_files)} agent files in {AGENTS_REPO}")
461
+ for json_file in json_files:
462
+ try:
463
+ file_path = hf_hub_download(
464
+ repo_id=AGENTS_REPO,
465
+ filename=json_file,
466
+ repo_type="dataset"
467
+ )
468
+ with open(file_path, 'r') as f:
469
+ agent_data = json.load(f)
470
+ agents.append(agent_data)
471
+ except Exception as e:
472
+ print(f"Warning: Could not load {json_file}: {str(e)}")
473
+ continue
474
+ print(f"✓ Loaded {len(agents)} agents from HuggingFace")
475
+ return agents
476
+ except Exception as e:
477
+ print(f"Could not load agents from HuggingFace: {str(e)}")
478
+ return None
479
+
480
+
481
+ def load_issue_metadata_for_year(year):
482
+ if DEBUG_MODE and DEBUG_ISSUE_METADATA_CACHE:
483
+ all_metadata = []
484
+ for agent_identifier, metadata_list in DEBUG_ISSUE_METADATA_CACHE.items():
485
+ for issue_meta in metadata_list:
486
+ issue_with_agent = issue_meta.copy()
487
+ issue_with_agent['agent_identifier'] = agent_identifier
488
+ all_metadata.append(issue_with_agent)
489
+ if all_metadata:
490
+ print(f"🐛 DEBUG MODE: Loading issue metadata from in-memory cache ({len(all_metadata)} issues)")
491
+ return all_metadata
492
+ try:
493
+ api = HfApi()
494
+ token = get_hf_token()
495
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
496
+ # Filter for files matching the year pattern: [agent_identifier]/YYYY.MM.DD.jsonl
497
+ year_str = str(year)
498
+ year_files = []
499
+ for f in files:
500
+ if f.endswith('.jsonl'):
501
+ parts = f.split('/')
502
+ if len(parts) == 2: # [agent_identifier]/YYYY.MM.DD.jsonl
503
+ filename = parts[1]
504
+ if filename.startswith(year_str + '.'):
505
+ year_files.append(f)
506
+ print(f"📥 Loading issue metadata for {year} ({len(year_files)} daily files across all agents)...")
507
+ all_metadata = []
508
+ for filename in year_files:
509
+ try:
510
+ parts = filename.split('/')
511
+ if len(parts) != 2:
512
+ print(f" Warning: Unexpected filename format: {filename}")
513
+ continue
514
+ agent_identifier = parts[0]
515
+ file_path = hf_hub_download(
516
+ repo_id=ISSUE_METADATA_REPO,
517
+ filename=filename,
518
+ repo_type="dataset",
519
+ token=token
520
+ )
521
+ day_metadata = load_jsonl(file_path)
522
+ for issue_meta in day_metadata:
523
+ issue_meta['agent_identifier'] = agent_identifier
524
+ all_metadata.extend(day_metadata)
525
+ print(f" ✓ Loaded {len(day_metadata)} issues from {filename}")
526
+ except Exception as e:
527
+ print(f" Warning: Could not load {filename}: {str(e)}")
528
+ print(f"✓ Loaded {len(all_metadata)} total issues for {year}")
529
+ return all_metadata
530
+ except Exception as e:
531
+ print(f"✗ Error loading issue metadata for {year}: {str(e)}")
532
+ return []
533
+
534
+
535
+ def get_latest_issue_date_for_agent(agent_identifier):
536
+ try:
537
+ api = HfApi()
538
+ token = get_hf_token()
539
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
540
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
541
+ agent_pattern = f"{agent_identifier}/"
542
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
543
+ if not agent_files:
544
+ return None
545
+ latest_date = None
546
+ for filename in agent_files:
547
+ try:
548
+ file_path = hf_hub_download(
549
+ repo_id=ISSUE_METADATA_REPO,
550
+ filename=filename,
551
+ repo_type="dataset",
552
+ token=token
553
+ )
554
+ metadata = load_jsonl(file_path)
555
+ for issue in metadata:
556
+ created_at = issue.get('created_at')
557
+ if created_at:
558
+ try:
559
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
560
+ if latest_date is None or dt > latest_date:
561
+ latest_date = dt
562
+ except Exception:
563
+ continue
564
+ except Exception:
565
+ continue
566
+ return latest_date
567
+ except Exception:
568
+ return None
569
+
570
+
571
+ def get_already_mined_dates(agent_identifier, n_months=6):
572
+ """
573
+ Get set of dates that have already been mined for an agent.
574
+
575
+ Args:
576
+ agent_identifier: GitHub identifier of the agent
577
+ n_months: Number of months to look back (default: 6)
578
+
579
+ Returns:
580
+ Set of date objects (datetime.date) that already have data files
581
+ """
582
+ try:
583
+ api = HfApi()
584
+
585
+ # Calculate date range
586
+ today = datetime.now(timezone.utc)
587
+ n_months_ago = today - timedelta(days=30 * n_months)
588
+
589
+ # List all files in the repository
590
+ files = api.list_repo_files(repo_id=ISSUE_METADATA_REPO, repo_type="dataset")
591
+
592
+ # Filter for files in this agent's folder
593
+ agent_pattern = f"{agent_identifier}/"
594
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
595
+
596
+ mined_dates = set()
597
+ for filename in agent_files:
598
+ try:
599
+ # Extract date from filename: [agent_identifier]/YYYY.MM.DD.jsonl
600
+ parts = filename.split('/')
601
+ if len(parts) != 2:
602
+ continue
603
+
604
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
605
+ date_components = date_part.split('.')
606
+ if len(date_components) != 3:
607
+ continue
608
+
609
+ file_year, file_month, file_day = map(int, date_components)
610
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc).date()
611
+
612
+ # Only include dates within the last n_months
613
+ if n_months_ago.date() <= file_date <= today.date():
614
+ mined_dates.add(file_date)
615
+ except Exception as e:
616
+ print(f" Warning: Could not parse date from filename {filename}: {e}")
617
+ continue
618
+
619
+ return mined_dates
620
+
621
+ except Exception as e:
622
+ print(f" Warning: Could not get already-mined dates for {agent_identifier}: {str(e)}")
623
+ return set()
624
+
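+ # Illustrative example: a repo file "my-agent-bot/2025.03.07.jsonl" (hypothetical
+ # agent) contributes date(2025, 3, 7) to the returned set, provided it falls
+ # inside the last n_months window, so that day can be skipped when re-mining.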
625
+
626
+ def save_leaderboard_to_hf(cache_dict):
627
+ if DEBUG_MODE:
628
+ global DEBUG_LEADERBOARD_CACHE
629
+ DEBUG_LEADERBOARD_CACHE = cache_dict.copy()
630
+ data_list = dict_to_cache(cache_dict)
631
+ print(f"🐛 DEBUG MODE: Saved to in-memory cache only ({len(data_list)} entries) - NOT saved to HuggingFace")
632
+ return True
633
+ try:
634
+ token = get_hf_token()
635
+ if not token:
636
+ raise Exception("No HuggingFace token found. Please set HF_TOKEN in your environment.")
637
+ data_list = dict_to_cache(cache_dict)
638
+ df = pd.DataFrame(data_list)
639
+ year = datetime.now().year
640
+ filename = f"{year}.csv"
641
+ df.to_csv(filename, index=False)
642
+ api = HfApi()
643
+ try:
644
+ upload_with_retry(
645
+ api=api,
646
+ path_or_fileobj=filename,
647
+ path_in_repo=filename,
648
+ repo_id=LEADERBOARD_REPO,
649
+ repo_type="dataset",
650
+ token=token
651
+ )
652
+ print(f"✓ Saved leaderboard to HuggingFace as {filename} ({len(data_list)} entries)")
653
+ return True
654
+ finally:
655
+ # Always clean up local file, even if upload fails
656
+ if os.path.exists(filename):
657
+ os.remove(filename)
658
+ except Exception as e:
659
+ print(f"✗ Error saving leaderboard: {str(e)}")
660
+ return False
661
+
662
+
663
+ def calculate_issue_stats_from_metadata(metadata_list):
664
+ total_issues = len(metadata_list)
665
+ resolved = sum(1 for issue_meta in metadata_list if issue_meta.get('state_reason') == 'completed')
666
+ resolved_rate = (resolved / total_issues * 100) if total_issues > 0 else 0
667
+ return {
668
+ 'total_issues': total_issues,
669
+ 'resolved': resolved,
670
+ 'resolved_rate': round(resolved_rate, 2),
671
+ }
672
+
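+ # Worked example (illustrative): four records of which one has
+ # state_reason == "completed" yield
+ #   {'total_issues': 4, 'resolved': 1, 'resolved_rate': 25.0}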
673
+
674
+ def update_all_agents_incremental():
675
+ """
676
+ Memory-efficient incremental update of issue statistics for all agents.
677
+
678
+ Strategy:
679
+ 1. For each agent, load existing data from SWE-Arena/issue_metadata
680
+ 2. Identify already-mined dates (based on filename: YYYY.MM.DD.jsonl)
681
+ 3. Only fetch issues from dates that haven't been mined yet (within last 6 months)
682
+ 4. If no data exists at all, mine everything from scratch
683
+ 5. Store minimal metadata (not full issue objects) to avoid storage limits
684
+ 6. Construct leaderboard from ALL stored metadata (last 6 months)
685
+
686
+ Returns dictionary of all agent data with current stats.
687
+ """
688
+ token = get_github_token()
689
+ current_year = datetime.now().year
690
+ agents = load_agents_from_hf()
691
+ if not agents:
692
+ print("No agents found in HuggingFace dataset")
693
+ return {}
694
+ cache_dict = {}
695
+ for agent in agents:
696
+ identifier = agent.get('github_identifier')
697
+ agent_name = agent.get('agent_name', 'Unknown')
698
+ if not identifier:
699
+ print(f"Warning: Skipping agent without identifier: {agent}")
700
+ continue
701
+ try:
702
+ print(f"\n{'='*80}")
703
+ print(f"Processing: {agent_name} ({identifier})")
704
+ print(f"{'='*80}")
705
+
706
+ # Get already-mined dates for this agent (last 6 months)
707
+ already_mined_dates = get_already_mined_dates(identifier, n_months=6)
708
+
709
+ if already_mined_dates:
710
+ print(f"📅 Found {len(already_mined_dates)} already-mined dates")
711
+ print(f" Skipping these dates and fetching only new data...")
712
+ # Fetch only issues from dates not yet mined
713
+ new_metadata = fetch_all_issues_metadata(
714
+ identifier,
715
+ agent_name,
716
+ token,
717
+ start_from_date=None, # Use full 6-month range
718
+ exclude_dates=already_mined_dates # But exclude already-mined dates
719
+ )
720
+ else:
721
+ print(f"📅 No existing data found. Mining everything from scratch...")
722
+ # Mine everything from scratch (full 6-month range)
723
+ new_metadata = fetch_all_issues_metadata(
724
+ identifier,
725
+ agent_name,
726
+ token,
727
+ start_from_date=None
728
+ )
729
+
730
+ if new_metadata:
731
+ print(f"💾 Saving {len(new_metadata)} new issue records...")
732
+ save_issue_metadata_to_hf(new_metadata, identifier)
733
+ else:
734
+ print(f" No new issues to save")
735
+
736
+ # Load ALL metadata for current year to calculate stats (aggregates entire last 6 months)
737
+ print(f"📊 Calculating statistics from ALL stored metadata (last 6 months)...")
738
+ all_year_metadata = load_issue_metadata_for_year(current_year)
739
+ agent_metadata = [issue for issue in all_year_metadata if issue.get('agent_identifier') == identifier]
740
+ stats = calculate_issue_stats_from_metadata(agent_metadata)
741
+ cache_dict[identifier] = {
742
+ 'agent_name': agent_name,
743
+ 'organization': agent.get('organization', 'Unknown'),
744
+ 'github_identifier': identifier,
745
+ **stats
746
+ }
747
+ print(f"✓ Updated {identifier}: {stats['total_issues']} issues, {stats['resolved_rate']}% resolved")
748
+ except Exception as e:
749
+ print(f"✗ Error updating {identifier}: {str(e)}")
750
+ import traceback
751
+ traceback.print_exc()
752
+ continue
753
+ return cache_dict
754
+
755
+
756
+ def run_once():
757
+ print("\n🚀 Immediate mining run started")
758
+ cache_dict = update_all_agents_incremental()
759
+ if cache_dict:
760
+ save_leaderboard_to_hf(cache_dict)
761
+ print("✅ Immediate mining run completed\n")
762
+
763
+
764
+ def main():
765
+ if DEBUG_MODE:
766
+ print("\n" + "="*80)
767
+ print("🐛 DEBUG MODE ENABLED 🐛")
768
+ print("="*80)
769
+ print("Issue retrieval is limited to 10 issues per query pattern per agent")
770
+ print("Data will NOT be saved to HuggingFace in debug mode.")
771
+ print("="*80 + "\n")
772
+ else:
773
+ print("\n🚀 Starting in PRODUCTION MODE - full issue retrieval enabled")
774
+ print()
775
+
776
+ if not args.loop:
777
+ run_once()
778
+ return
779
+
780
+ print(f"🔁 Loop mode enabled. Interval: {args.interval_seconds} seconds")
781
+ try:
782
+ while True:
783
+ start = time.time()
784
+ run_once()
785
+ elapsed = time.time() - start
786
+ sleep_for = max(0, args.interval_seconds - int(elapsed))
787
+ if sleep_for > 0:
788
+ print(f"😴 Sleeping {sleep_for} seconds before next run...")
789
+ time.sleep(sleep_for)
790
+ except KeyboardInterrupt:
791
+ print("\n👋 Loop interrupted by user. Exiting...")
792
+
793
+
794
+ if __name__ == "__main__":
795
+ main()
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ APScheduler
2
+ datasets
3
+ gradio
4
+ gradio_leaderboard
5
+ huggingface_hub
6
+ pandas
7
+ plotly
8
+ PyGithub
9
+ python-dotenv