Update README.md
Browse files
    	
        README.md
    CHANGED
    
    | @@ -7,24 +7,359 @@ tags: | |
| 7 | 
             
            - ML-Agents-SoccerTwos
         | 
| 8 | 
             
            library_name: ml-agents
         | 
| 9 | 
             
            ---
         | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 10 |  | 
| 11 | 
            -
             | 
| 12 | 
            -
             | 
| 13 | 
            -
             | 
| 14 | 
            -
             | 
| 15 | 
            -
             | 
| 16 | 
            -
             | 
| 17 | 
            -
             | 
| 18 | 
            -
             | 
| 19 | 
            -
             | 
| 20 | 
            -
             | 
| 21 | 
            -
             | 
| 22 | 
            -
             | 
| 23 | 
            -
             | 
| 24 | 
            -
             | 
| 25 | 
            -
             | 
| 26 | 
            -
             | 
| 27 | 
            -
             | 
| 28 | 
            -
             | 
| 29 | 
            -
             | 
| 30 | 
            -
             | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 7 | 
             
            - ML-Agents-SoccerTwos
         | 
| 8 | 
             
            library_name: ml-agents
         | 
| 9 | 
             
            ---
         | 
| 10 | 
            +
              # ML-Agents SoccerTwos - Multi-Agent Soccer AI
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            [](https://github.com/Unity-Technologies/ml-agents)
         | 
| 13 | 
            +
            [](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#soccer-twos)
         | 
| 14 | 
            +
            [](https://opensource.org/licenses/Apache-2.0)
         | 
| 15 | 
            +
            [](https://onnx.ai/)
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            A sophisticated multi-agent reinforcement learning model trained on the **Unity ML-Agents SoccerTwos** environment. This model demonstrates advanced cooperative and competitive behaviors in a 2v2 soccer simulation, showcasing emergent team strategies and individual skill development.
         | 
| 18 | 
            +
             | 
| 19 | 
            +
            ## 🏆 Model Overview
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            The SoccerTwos model represents a breakthrough in multi-agent reinforcement learning, where four AI agents (two teams of two players each) learn to play soccer through self-play and competitive training. The model exhibits complex behaviors including:
         | 
| 22 | 
            +
             | 
| 23 | 
            +
            - **Team Coordination**: Agents learn to pass, coordinate positioning, and execute team strategies
         | 
| 24 | 
            +
            - **Individual Skills**: Ball control, shooting, defending, and positioning
         | 
| 25 | 
            +
            - **Emergent Behaviors**: Complex plays that emerge from simple reward structures
         | 
| 26 | 
            +
            - **Competitive Balance**: Agents adapt to opponents' strategies in real-time
         | 
| 27 | 
            +
             | 
| 28 | 
            +
            ## 🎮 Environment Description
         | 
| 29 | 
            +
             | 
| 30 | 
            +
            ### SoccerTwos Environment Specifications
         | 
| 31 | 
            +
             | 
| 32 | 
            +
            **Game Setup**:
         | 
| 33 | 
            +
            - **Teams**: 2 teams (Blue vs Purple)
         | 
| 34 | 
            +
            - **Players per Team**: 2 agents
         | 
| 35 | 
            +
            - **Field**: 3D soccer field with goals, boundaries, and physics
         | 
| 36 | 
            +
            - **Objective**: Score more goals than the opponent team
         | 
| 37 | 
            +
             | 
| 38 | 
            +
            **Physics & Mechanics**:
         | 
| 39 | 
            +
            - **Ball Physics**: Realistic ball bouncing, rolling, and collision
         | 
| 40 | 
            +
            - **Agent Movement**: 3D movement with rotation and acceleration
         | 
| 41 | 
            +
            - **Collision Detection**: Agent-to-agent, agent-to-ball, and boundary interactions
         | 
| 42 | 
            +
            - **Goal Detection**: Automated scoring system
         | 
| 43 | 
            +
             | 
| 44 | 
            +
            ### Observation Space
         | 
| 45 | 
            +
            Each agent receives:
         | 
| 46 | 
            +
            - **Vector Observations**: 336 dimensional vector including:
         | 
| 47 | 
            +
              - Agent position and velocity (x, y, z)
         | 
| 48 | 
            +
              - Agent rotation (quaternion)
         | 
| 49 | 
            +
              - Ball position and velocity
         | 
| 50 | 
            +
              - Teammate positions and velocities
         | 
| 51 | 
            +
              - Opponent positions and velocities
         | 
| 52 | 
            +
              - Goal positions and orientations
         | 
| 53 | 
            +
              - Time remaining in episode
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            ### Action Space
         | 
| 56 | 
            +
            - **Continuous Actions**: 3 dimensions
         | 
| 57 | 
            +
              - Forward/Backward movement
         | 
| 58 | 
            +
              - Left/Right movement  
         | 
| 59 | 
            +
              - Rotation (turning)
         | 
| 60 | 
            +
            - **Action Range**: [-1, 1] for each dimension
         | 
| 61 | 
            +
            - **Total Actions per Step**: 4 agents × 3 actions = 12 concurrent actions
         | 
| 62 | 
            +
             | 
| 63 | 
            +
            ## 🧠 Model Architecture
         | 
| 64 | 
            +
             | 
| 65 | 
            +
            ### Neural Network Design
         | 
| 66 | 
            +
            - **Input Layer**: 336 neurons (observation vector)
         | 
| 67 | 
            +
            - **Hidden Layers**: Multi-layer perceptron with ReLU activations
         | 
| 68 | 
            +
            - **Output Layers**: 
         | 
| 69 | 
            +
              - **Policy Head**: 3 continuous actions (movement + rotation)
         | 
| 70 | 
            +
              - **Value Head**: Single value estimate for state evaluation
         | 
| 71 | 
            +
            - **Architecture**: Actor-Critic with shared feature extraction
         | 
| 72 | 
            +
             | 
| 73 | 
            +
            ### Training Algorithm
         | 
| 74 | 
            +
            - **Algorithm**: PPO (Proximal Policy Optimization)
         | 
| 75 | 
            +
            - **Training Type**: Self-play with competitive reward structure
         | 
| 76 | 
            +
            - **Curriculum Learning**: Progressive difficulty increase
         | 
| 77 | 
            +
            - **Multi-Agent Coordination**: Shared experiences with individual policies
         | 
| 78 | 
            +
             | 
| 79 | 
            +
            ## 📊 Training Configuration
         | 
| 80 | 
            +
             | 
| 81 | 
            +
            ### Hyperparameters
         | 
| 82 | 
            +
            ```yaml
         | 
| 83 | 
            +
            # Core PPO Settings
         | 
| 84 | 
            +
            batch_size: 2048
         | 
| 85 | 
            +
            buffer_size: 20480
         | 
| 86 | 
            +
            learning_rate: 3e-4
         | 
| 87 | 
            +
            learning_rate_schedule: linear
         | 
| 88 | 
            +
            epsilon: 0.2
         | 
| 89 | 
            +
            beta: 5e-4
         | 
| 90 | 
            +
            lambd: 0.95
         | 
| 91 | 
            +
            num_epoch: 3
         | 
| 92 | 
            +
             | 
| 93 | 
            +
            # Network Architecture
         | 
| 94 | 
            +
            hidden_units: 512
         | 
| 95 | 
            +
            num_layers: 2
         | 
| 96 | 
            +
            normalize: true
         | 
| 97 | 
            +
            vis_encode_type: simple
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            # Training Schedule
         | 
| 100 | 
            +
            max_steps: 50000000
         | 
| 101 | 
            +
            time_horizon: 1000
         | 
| 102 | 
            +
            summary_freq: 12000
         | 
| 103 | 
            +
            ```
         | 
| 104 | 
            +
             | 
| 105 | 
            +
            ### Reward Structure
         | 
| 106 | 
            +
            - **Goal Scoring**: +1.0 for scoring a goal
         | 
| 107 | 
            +
            - **Goal Conceding**: -1.0 for opponent scoring
         | 
| 108 | 
            +
            - **Ball Contact**: +0.001 for touching the ball
         | 
| 109 | 
            +
            - **Ball Proximity**: Small positive reward for being close to ball
         | 
| 110 | 
            +
            - **Time Penalty**: Small negative reward to encourage active play
         | 
| 111 | 
            +
             | 
| 112 | 
            +
            ## 🚀 Usage & Deployment
         | 
| 113 | 
            +
             | 
| 114 | 
            +
            ### Loading the Model (Python)
         | 
| 115 | 
            +
            ```python
         | 
| 116 | 
            +
            import onnxruntime as ort
         | 
| 117 | 
            +
            import numpy as np
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            # Load the ONNX model
         | 
| 120 | 
            +
            model_path = "SoccerTwos.onnx"
         | 
| 121 | 
            +
            session = ort.InferenceSession(model_path)
         | 
| 122 | 
            +
             | 
| 123 | 
            +
            # Get input/output names
         | 
| 124 | 
            +
            input_name = session.get_inputs()[0].name
         | 
| 125 | 
            +
            output_names = [output.name for output in session.get_outputs()]
         | 
| 126 | 
            +
             | 
| 127 | 
            +
            # Run inference
         | 
| 128 | 
            +
            def predict_action(observation):
         | 
| 129 | 
            +
                observation = np.array(observation, dtype=np.float32)
         | 
| 130 | 
            +
                observation = observation.reshape(1, -1)  # Batch dimension
         | 
| 131 | 
            +
                
         | 
| 132 | 
            +
                outputs = session.run(output_names, {input_name: observation})
         | 
| 133 | 
            +
                actions = outputs[0][0]  # Extract actions from batch
         | 
| 134 | 
            +
                
         | 
| 135 | 
            +
                return actions
         | 
| 136 | 
            +
            ```
         | 
| 137 | 
            +
             | 
| 138 | 
            +
            ### Unity Integration
         | 
| 139 | 
            +
            ```csharp
         | 
| 140 | 
            +
            // Unity C# script example
         | 
| 141 | 
            +
            using Unity.MLAgents;
         | 
| 142 | 
            +
            using Unity.MLAgents.Sensors;
         | 
| 143 | 
            +
            using Unity.MLAgents.Actuators;
         | 
| 144 | 
            +
             | 
| 145 | 
            +
            public class SoccerAgent : Agent
         | 
| 146 | 
            +
            {
         | 
| 147 | 
            +
                [SerializeField] private string modelPath = "SoccerTwos.onnx";
         | 
| 148 | 
            +
                
         | 
| 149 | 
            +
                public override void OnActionReceived(ActionBuffers actionBuffers)
         | 
| 150 | 
            +
                {
         | 
| 151 | 
            +
                    // Extract continuous actions
         | 
| 152 | 
            +
                    float moveX = actionBuffers.ContinuousActions[0];
         | 
| 153 | 
            +
                    float moveZ = actionBuffers.ContinuousActions[1]; 
         | 
| 154 | 
            +
                    float rotate = actionBuffers.ContinuousActions[2];
         | 
| 155 | 
            +
                    
         | 
| 156 | 
            +
                    // Apply actions to agent
         | 
| 157 | 
            +
                    ApplyMovement(moveX, moveZ, rotate);
         | 
| 158 | 
            +
                }
         | 
| 159 | 
            +
            }
         | 
| 160 | 
            +
            ```
         | 
| 161 | 
            +
             | 
| 162 | 
            +
            ### Evaluation Script
         | 
| 163 | 
            +
            ```python
         | 
| 164 | 
            +
            # Evaluation with metrics tracking
         | 
| 165 | 
            +
            class SoccerEvaluator:
         | 
| 166 | 
            +
                def __init__(self, model_path):
         | 
| 167 | 
            +
                    self.session = ort.InferenceSession(model_path)
         | 
| 168 | 
            +
                    self.reset_metrics()
         | 
| 169 | 
            +
                
         | 
| 170 | 
            +
                def reset_metrics(self):
         | 
| 171 | 
            +
                    self.goals_scored = 0
         | 
| 172 | 
            +
                    self.goals_conceded = 0
         | 
| 173 | 
            +
                    self.ball_touches = 0
         | 
| 174 | 
            +
                    self.episode_length = 0
         | 
| 175 | 
            +
                
         | 
| 176 | 
            +
                def evaluate_episode(self, observations, actions, rewards):
         | 
| 177 | 
            +
                    # Run full episode evaluation
         | 
| 178 | 
            +
                    total_reward = sum(rewards)
         | 
| 179 | 
            +
                    win_rate = 1.0 if self.goals_scored > self.goals_conceded else 0.0
         | 
| 180 | 
            +
                    
         | 
| 181 | 
            +
                    return {
         | 
| 182 | 
            +
                        'total_reward': total_reward,
         | 
| 183 | 
            +
                        'goals_scored': self.goals_scored,
         | 
| 184 | 
            +
                        'goals_conceded': self.goals_conceded,
         | 
| 185 | 
            +
                        'win_rate': win_rate,
         | 
| 186 | 
            +
                        'ball_touches': self.ball_touches
         | 
| 187 | 
            +
                    }
         | 
| 188 | 
            +
            ```
         | 
| 189 | 
            +
             | 
| 190 | 
            +
            ## 📈 Performance Metrics
         | 
| 191 | 
            +
             | 
| 192 | 
            +
            ### Training Results
         | 
| 193 | 
            +
            - **Total Training Steps**: 50+ million environment steps
         | 
| 194 | 
            +
            - **Training Duration**: 100+ hours on GPU cluster
         | 
| 195 | 
            +
            - **Convergence**: Stable performance achieved after ~30M steps
         | 
| 196 | 
            +
            - **Self-Play Generations**: Multiple generations of opponent strength
         | 
| 197 | 
            +
             | 
| 198 | 
            +
            ### Behavioral Analysis
         | 
| 199 | 
            +
            **Offensive Strategies**:
         | 
| 200 | 
            +
            - **Passing Coordination**: Agents learn to pass to open teammates
         | 
| 201 | 
            +
            - **Shooting Accuracy**: Improved goal-scoring from optimal positions
         | 
| 202 | 
            +
            - **Ball Control**: Sophisticated dribbling and ball manipulation
         | 
| 203 | 
            +
            - **Positioning**: Strategic positioning for receiving passes
         | 
| 204 | 
            +
             | 
| 205 | 
            +
            **Defensive Strategies**:
         | 
| 206 | 
            +
            - **Goal Defense**: Coordinated defending of goal area
         | 
| 207 | 
            +
            - **Ball Interception**: Proactive ball stealing and blocking
         | 
| 208 | 
            +
            - **Opponent Tracking**: Following and pressuring opponents
         | 
| 209 | 
            +
            - **Formation Maintenance**: Maintaining defensive shape
         | 
| 210 | 
            +
             | 
| 211 | 
            +
            ### Emergent Behaviors
         | 
| 212 | 
            +
            - **Tactical Plays**: Complex multi-agent coordination patterns
         | 
| 213 | 
            +
            - **Adaptive Strategies**: Counter-strategies to opponent behaviors
         | 
| 214 | 
            +
            - **Role Specialization**: Informal goalkeeper and striker roles
         | 
| 215 | 
            +
            - **Team Communication**: Implicit coordination without explicit communication
         | 
| 216 | 
            +
             | 
| 217 | 
            +
            ## 🔧 Technical Specifications
         | 
| 218 | 
            +
             | 
| 219 | 
            +
            ### Model File Details
         | 
| 220 | 
            +
            - **Format**: ONNX (Open Neural Network Exchange)
         | 
| 221 | 
            +
            - **File Size**: ~5-10 MB (depending on architecture)
         | 
| 222 | 
            +
            - **Input Shape**: (1, 336) - Single agent observation
         | 
| 223 | 
            +
            - **Output Shape**: (1, 3) - Continuous actions
         | 
| 224 | 
            +
            - **Precision**: Float32
         | 
| 225 | 
            +
            - **Optimization**: Optimized for inference speed
         | 
| 226 | 
            +
             | 
| 227 | 
            +
            ### System Requirements
         | 
| 228 | 
            +
            **Minimum**:
         | 
| 229 | 
            +
            - **RAM**: 4GB
         | 
| 230 | 
            +
            - **CPU**: Intel i5 or AMD Ryzen 5
         | 
| 231 | 
            +
            - **GPU**: Not required for inference
         | 
| 232 | 
            +
            - **Unity Version**: 2021.3 LTS or later
         | 
| 233 | 
            +
             | 
| 234 | 
            +
            **Recommended**:
         | 
| 235 | 
            +
            - **RAM**: 8GB+
         | 
| 236 | 
            +
            - **CPU**: Intel i7 or AMD Ryzen 7
         | 
| 237 | 
            +
            - **GPU**: NVIDIA GTX 1060 or better (for multiple simultaneous agents)
         | 
| 238 | 
            +
            - **Unity Version**: 2022.3 LTS
         | 
| 239 | 
            +
             | 
| 240 | 
            +
            ## 🎯 Evaluation Protocol
         | 
| 241 | 
            +
             | 
| 242 | 
            +
            ### Standard Evaluation
         | 
| 243 | 
            +
            ```python
         | 
| 244 | 
            +
            # Multi-episode evaluation
         | 
| 245 | 
            +
            def evaluate_model(model_path, num_episodes=100):
         | 
| 246 | 
            +
                evaluator = SoccerEvaluator(model_path)
         | 
| 247 | 
            +
                results = []
         | 
| 248 |  | 
| 249 | 
            +
                for episode in range(num_episodes):
         | 
| 250 | 
            +
                    # Run episode
         | 
| 251 | 
            +
                    episode_result = evaluator.run_episode()
         | 
| 252 | 
            +
                    results.append(episode_result)
         | 
| 253 | 
            +
                
         | 
| 254 | 
            +
                # Aggregate results
         | 
| 255 | 
            +
                avg_reward = np.mean([r['total_reward'] for r in results])
         | 
| 256 | 
            +
                win_rate = np.mean([r['win_rate'] for r in results])
         | 
| 257 | 
            +
                avg_goals = np.mean([r['goals_scored'] for r in results])
         | 
| 258 | 
            +
                
         | 
| 259 | 
            +
                return {
         | 
| 260 | 
            +
                    'average_reward': avg_reward,
         | 
| 261 | 
            +
                    'win_rate': win_rate,
         | 
| 262 | 
            +
                    'average_goals_per_episode': avg_goals,
         | 
| 263 | 
            +
                    'total_episodes': num_episodes
         | 
| 264 | 
            +
                }
         | 
| 265 | 
            +
            ```
         | 
| 266 | 
            +
             | 
| 267 | 
            +
            ### Performance Benchmarks
         | 
| 268 | 
            +
            - **Win Rate vs Random**: 95%+ win rate against random agents
         | 
| 269 | 
            +
            - **Win Rate vs Scripted**: 80%+ win rate against rule-based agents
         | 
| 270 | 
            +
            - **Average Goals per Episode**: 2.5-3.5 goals per team
         | 
| 271 | 
            +
            - **Episode Length**: Optimal game duration with active play
         | 
| 272 | 
            +
             | 
| 273 | 
            +
            ## 🔬 Research Applications
         | 
| 274 | 
            +
             | 
| 275 | 
            +
            ### Multi-Agent Learning Research
         | 
| 276 | 
            +
            - **Cooperation vs Competition**: Studying balance between team cooperation and individual performance
         | 
| 277 | 
            +
            - **Emergent Communication**: Analyzing implicit coordination mechanisms
         | 
| 278 | 
            +
            - **Transfer Learning**: Adapting skills to related multi-agent scenarios
         | 
| 279 | 
            +
            - **Curriculum Learning**: Progressive training methodologies
         | 
| 280 | 
            +
             | 
| 281 | 
            +
            ### Applications Beyond Gaming
         | 
| 282 | 
            +
            - **Robotics**: Multi-robot coordination and task allocation
         | 
| 283 | 
            +
            - **Autonomous Vehicles**: Coordinated navigation and traffic management
         | 
| 284 | 
            +
            - **Swarm Intelligence**: Collective behavior and distributed decision-making
         | 
| 285 | 
            +
            - **Economic Modeling**: Multi-agent market simulations
         | 
| 286 | 
            +
             | 
| 287 | 
            +
            ## 🛠️ Customization & Fine-tuning
         | 
| 288 | 
            +
             | 
| 289 | 
            +
            ### Training Your Own Model
         | 
| 290 | 
            +
            ```python
         | 
| 291 | 
            +
            # Custom training configuration
         | 
| 292 | 
            +
            from mlagents_envs.environment import UnityEnvironment
         | 
| 293 | 
            +
            from mlagents.trainers.settings import TrainerSettings
         | 
| 294 | 
            +
             | 
| 295 | 
            +
            # Environment setup
         | 
| 296 | 
            +
            env = UnityEnvironment(file_name="SoccerTwos")
         | 
| 297 | 
            +
            trainer_config = TrainerSettings(
         | 
| 298 | 
            +
                trainer_type="ppo",
         | 
| 299 | 
            +
                hyperparameters={
         | 
| 300 | 
            +
                    "batch_size": 2048,
         | 
| 301 | 
            +
                    "buffer_size": 20480,
         | 
| 302 | 
            +
                    "learning_rate": 3e-4,
         | 
| 303 | 
            +
                    "beta": 5e-4,
         | 
| 304 | 
            +
                    "epsilon": 0.2,
         | 
| 305 | 
            +
                    "lambd": 0.95,
         | 
| 306 | 
            +
                    "num_epoch": 3,
         | 
| 307 | 
            +
                    "learning_rate_schedule": "linear"
         | 
| 308 | 
            +
                }
         | 
| 309 | 
            +
            )
         | 
| 310 | 
            +
            ```
         | 
| 311 | 
            +
             | 
| 312 | 
            +
            ### Model Variations
         | 
| 313 | 
            +
            - **Different Team Sizes**: 1v1, 3v3, or larger teams
         | 
| 314 | 
            +
            - **Modified Rewards**: Emphasis on passing, defending, or ball control
         | 
| 315 | 
            +
            - **Environmental Changes**: Different field sizes, obstacles, or rules
         | 
| 316 | 
            +
            - **Skill Specialization**: Training specialized roles (goalkeeper, striker, etc.)
         | 
| 317 | 
            +
             | 
| 318 | 
            +
            ## 📚 Documentation & Resources
         | 
| 319 | 
            +
             | 
| 320 | 
            +
            ### Unity ML-Agents Resources
         | 
| 321 | 
            +
            - [ML-Agents GitHub](https://github.com/Unity-Technologies/ml-agents)
         | 
| 322 | 
            +
            - [SoccerTwos Environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#soccer-twos)
         | 
| 323 | 
            +
            - [Training Configuration](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md)
         | 
| 324 | 
            +
            - [Multi-Agent Training](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-ML-Agents.md#training-multiple-agents)
         | 
| 325 | 
            +
             | 
| 326 | 
            +
            ### Academic References
         | 
| 327 | 
            +
            - [Multi-Agent Reinforcement Learning](https://arxiv.org/abs/1706.02275)
         | 
| 328 | 
            +
            - [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347)
         | 
| 329 | 
            +
            - [Emergent Complexity in Multi-Agent Environments](https://arxiv.org/abs/1909.07528)
         | 
| 330 | 
            +
             | 
| 331 | 
            +
            ## 🤝 Contributing
         | 
| 332 | 
            +
             | 
| 333 | 
            +
            We welcome contributions to improve the model and documentation:
         | 
| 334 | 
            +
             | 
| 335 | 
            +
            **Areas for Contribution**:
         | 
| 336 | 
            +
            - **Hyperparameter Optimization**: Finding better training configurations
         | 
| 337 | 
            +
            - **Architecture Improvements**: Enhanced neural network designs
         | 
| 338 | 
            +
            - **Evaluation Metrics**: More comprehensive performance measures
         | 
| 339 | 
            +
            - **Visualization Tools**: Better analysis and debugging tools
         | 
| 340 | 
            +
            - **Documentation**: Tutorials and examples
         | 
| 341 | 
            +
             | 
| 342 | 
            +
            ## 📝 Citation
         | 
| 343 | 
            +
             | 
| 344 | 
            +
            ```bibtex
         | 
| 345 | 
            +
            @misc{ml_agents_soccer_twos_2025,
         | 
| 346 | 
            +
              title={ML-Agents SoccerTwos: Multi-Agent Soccer AI},
         | 
| 347 | 
            +
              author={Adilbai},
         | 
| 348 | 
            +
              year={2025},
         | 
| 349 | 
            +
              publisher={Hugging Face},
         | 
| 350 | 
            +
              url={https://huggingface.co/Adilbai/ML-Agents-SoccerTwos},
         | 
| 351 | 
            +
              note={Unity ML-Agents trained model for 2v2 soccer simulation}
         | 
| 352 | 
            +
            }
         | 
| 353 | 
            +
            ```
         | 
| 354 | 
            +
             | 
| 355 | 
            +
            ## 📄 License
         | 
| 356 | 
            +
             | 
| 357 | 
            +
            This model is released under the Apache 2.0 License, consistent with Unity ML-Agents framework licensing.
         | 
| 358 | 
            +
             | 
| 359 | 
            +
            ## 🏷️ Tags
         | 
| 360 | 
            +
             | 
| 361 | 
            +
            `multi-agent` `reinforcement-learning` `unity-ml-agents` `soccer` `cooperative-ai` `competitive-ai` `onnx` `game-ai` `emergent-behavior` `team-coordination`
         | 
| 362 | 
            +
             | 
| 363 | 
            +
            ---
         | 
| 364 | 
            +
             | 
| 365 | 
            +
            **Note**: This model represents advanced multi-agent AI capabilities and serves as an excellent example of emergent team behaviors in competitive environments. The model is suitable for research, education, and game development applications.
         |