Update README.md

---
base_model: meta-llama/Meta-Llama-3.1-8B
library_name: peft
datasets:
- barbaroo/Sprotin_parallel
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
pipeline_tag: text-generation
---

# Model Card: English–Faroese Translation Adapter

## Model Details

**Model Description**

- **Developed by:** Barbara Scalvini
- **Model type:** Language model adapter for **English → Faroese** translation
- **Language(s):** English, Faroese
- **License:** This adapter inherits the license from the original Llama 3.1 8B model.
- **Finetuned from model:** [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- **Library used:** [PEFT 0.13.0](https://github.com/huggingface/peft)

### Model Sources

- **Paper:** [COMING SOON]

---

## Uses

### Direct Use

This adapter is intended for **English → Faroese** translation, leveraging a **parameter-efficient fine-tuning (PEFT)** approach.
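
If the repository stores the adapter weights separately from the base model, they can be attached with the `peft` library. A minimal sketch, assuming this repo hosts standalone PEFT adapter weights (if it holds a merged checkpoint, load it directly as shown in the Getting Started section below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the translation adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", device_map="auto"
)
model = PeftModel.from_pretrained(base, "barbaroo/llama3.1_translate_8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
```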

### Downstream Use

- Can be integrated into broader **multilingual** or **localization** workflows.

### Out-of-Scope Use

- Inputs in languages other than **English or Faroese** will likely yield poor results.
- Other tasks (e.g., summarization, classification) are not supported and would require further fine-tuning.

---

## Bias, Risks, and Limitations

- **Biases:** The model may reflect biases present in its training data, such as historical or societal biases in English or Faroese texts.
- **Recommendation:** Users should critically evaluate outputs, especially in sensitive or high-stakes applications.

---

## How to Get Started with the Model

The example below loads the checkpoint in 8-bit and translates a sentence using the same Alpaca-style prompt as in training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the trained model and tokenizer from the checkpoint
checkpoint_dir = "barbaroo/llama3.1_translate_8B"
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir, device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

MAX_SEQ_LENGTH = 512
sentences = ["What's your name?"]

# Define the prompt template (same as in training)
alpaca_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Inference loop
for sentence in sentences:
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                "Translate this sentence from English to Faroese:",  # Instruction
                sentence,  # The input sentence to translate
                "",  # Leave the response blank for generation
            )
        ],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=MAX_SEQ_LENGTH,  # Enforce the training-time sequence length
    ).to(model.device)

    # Generate the translation
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # Limit the number of new tokens generated
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        do_sample=True,  # Required for temperature/top_p to take effect
        temperature=0.1,  # Low temperature: near-deterministic translations
        top_p=1.0,
        use_cache=True,
    )

    # Decode the generated tokens into text
    output_string = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    print(f"Input: {sentence}")
    print(f"Generated Translation: {output_string}")
```
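
Note that recent `transformers` releases deprecate the `load_in_8bit` argument in favor of an explicit quantization config; assuming `bitsandbytes` is installed, the equivalent call is:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "barbaroo/llama3.1_translate_8B",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # replaces load_in_8bit=True
)
```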

## Training Details

### Training Data

We used the Sprotin parallel corpus for **English–Faroese** translation: [barbaroo/Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel).
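
For reference, the corpus can be loaded with the `datasets` library; the splits and column names depend on the dataset's schema, so inspect them rather than assuming:

```python
from datasets import load_dataset

# Download the English–Faroese parallel corpus from the Hub.
dataset = load_dataset("barbaroo/Sprotin_parallel")

print(dataset)              # available splits and columns
print(dataset["train"][0])  # one parallel pair (assumes a "train" split)
```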

### Training Procedure

#### Preprocessing

- **Tokenization:** We used the tokenizer from the base model `meta-llama/Llama-3.1-8B`.
- **Prompt format:** The Alpaca template was used, with Instruction, Input, and Response fields.
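
As an illustration, each parallel pair was rendered into the template roughly as follows; this is a sketch, and the `en`/`fo` field names are hypothetical:

```python
# Alpaca-style template with Instruction, Input and Response fields.
alpaca_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_example(pair: dict, eos_token: str) -> str:
    # Fill in the task instruction, the English source and the Faroese
    # target, then append EOS so the model learns where to stop.
    return alpaca_prompt.format(
        "Translate this sentence from English to Faroese:",
        pair["en"],  # hypothetical column name: English source
        pair["fo"],  # hypothetical column name: Faroese target
    ) + eos_token
```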

#### Training Hyperparameters

- **Epochs:** 3, with early stopping on validation loss.
- **Batch size:** 2 per device, with gradient accumulation over 4 steps (effective batch size 8).
- **Learning rate:** 2e-4
- **Optimizer:** AdamW with a linear learning-rate scheduler and warm-up.
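
In `transformers` terms, these settings correspond roughly to the sketch below; the warm-up length, evaluation cadence, and early-stopping patience are assumptions not stated in this card:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="llama3.1_translate_8B",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 2 x 4 = 8
    learning_rate=2e-4,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_ratio=0.03,               # assumption: warm-up length not stated
    evaluation_strategy="steps",     # evaluate periodically for early stopping
    save_strategy="steps",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]  # assumption
```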

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- The model was evaluated on the **FLORES-200** benchmark (~1,012 English–Faroese sentence pairs).

#### Metrics and Results

- **BLEU:** 0.175
- **chrF:** 49.5
- **BERTScore (F1):** 0.948
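
Comparable scores can be computed with the `evaluate` library. A sketch, where the placeholder strings stand for model outputs and FLORES-200 references, and the BERTScore `model_type` is an assumption (note also that sacreBLEU reports BLEU on a 0–100 scale, while the value above appears normalized to 0–1):

```python
import evaluate

predictions = ["<model translation>"]        # model outputs on the test set
references = [["<reference translation>"]]   # gold Faroese references

bleu = evaluate.load("sacrebleu").compute(predictions=predictions, references=references)
chrf = evaluate.load("chrf").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions,
    references=[r[0] for r in references],
    model_type="bert-base-multilingual-cased",  # assumption: multilingual model for Faroese
)

print(bleu["score"], chrf["score"], sum(bertscore["f1"]) / len(bertscore["f1"]))
```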

Human evaluation was also performed (see the forthcoming paper).

## Citation

[COMING SOON]

---

## Framework versions

- PEFT 0.13.0