GGUF quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Octopus-v2 - GGUF
- Model creator: https://huggingface.co/NexaAIDev/
- Original model: https://huggingface.co/NexaAIDev/Octopus-v2/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Octopus-v2.Q2_K.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q2_K.gguf) | Q2_K | 1.08GB |
| [Octopus-v2.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.IQ3_XS.gguf) | IQ3_XS | 1.16GB |
| [Octopus-v2.IQ3_S.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.IQ3_S.gguf) | IQ3_S | 1.2GB |
| [Octopus-v2.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q3_K_S.gguf) | Q3_K_S | 1.2GB |
| [Octopus-v2.IQ3_M.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.IQ3_M.gguf) | IQ3_M | 1.22GB |
| [Octopus-v2.Q3_K.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q3_K.gguf) | Q3_K | 1.29GB |
| [Octopus-v2.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q3_K_M.gguf) | Q3_K_M | 1.29GB |
| [Octopus-v2.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q3_K_L.gguf) | Q3_K_L | 1.36GB |
| [Octopus-v2.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.IQ4_XS.gguf) | IQ4_XS | 1.4GB |
| [Octopus-v2.Q4_0.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q4_0.gguf) | Q4_0 | 1.44GB |
| [Octopus-v2.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.IQ4_NL.gguf) | IQ4_NL | 1.45GB |
| [Octopus-v2.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q4_K_S.gguf) | Q4_K_S | 1.45GB |
| [Octopus-v2.Q4_K.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q4_K.gguf) | Q4_K | 1.52GB |
| [Octopus-v2.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q4_K_M.gguf) | Q4_K_M | 1.52GB |
| [Octopus-v2.Q4_1.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q4_1.gguf) | Q4_1 | 1.56GB |
| [Octopus-v2.Q5_0.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q5_0.gguf) | Q5_0 | 1.68GB |
| [Octopus-v2.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q5_K_S.gguf) | Q5_K_S | 1.68GB |
| [Octopus-v2.Q5_K.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q5_K.gguf) | Q5_K | 1.71GB |
| [Octopus-v2.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q5_K_M.gguf) | Q5_K_M | 1.71GB |
| [Octopus-v2.Q5_1.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q5_1.gguf) | Q5_1 | 1.79GB |
| [Octopus-v2.Q6_K.gguf](https://huggingface.co/RichardErkhov/NexaAIDev_-_Octopus-v2-gguf/blob/main/Octopus-v2.Q6_K.gguf) | Q6_K | 1.92GB |
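
As a minimal sketch of how one of these files might be run locally, the snippet below loads a downloaded quant with the `llama-cpp-python` bindings. The file path, context size, and sampling settings are illustrative assumptions, not part of this repo.

```python
# Minimal sketch: running a downloaded GGUF quant with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and that the file below has already
# been downloaded from the table above; the path and settings are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="Octopus-v2.Q4_K_M.gguf",  # any quant from the table above
    n_ctx=2048,  # context window; adjust to your memory budget
)

prompt = (
    "Below is the query from the users, please call the correct function "
    "and generate the parameters to call the function.\n\n"
    "Query: Take a selfie for me with front camera \n\nResponse:"
)
out = llm(prompt, max_tokens=128, temperature=0.0)  # greedy decoding
print(out["choices"][0]["text"])
```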



Original model description:
---
license: apache-2.0
base_model: google/gemma-2b
model-index:
- name: Octopus-V2-2B
  results: []
tags:
- function calling
- on-device language model
- android
inference: false
space: false
spaces: false
language:
- en
---
# Octopus V2: On-device language model for super agent
<p align="center">
- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Product</a>
- <a href="https://arxiv.org/abs/2404.01744" target="_blank">ArXiv</a>
- <a href="https://www.youtube.com/watch?v=jhM0D0OObOw&ab_channel=NexaAI" target="_blank">Video Demo</a>
</p>

<p align="center" width="100%">
  <a><img src="Octopus-logo.jpeg" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>

## Introducing Octopus-V2-2B

Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in applying large language models (LLMs) to function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments (sometimes tens of thousands of input tokens), Octopus-V2-2B introduces a unique **functional token** strategy for both its training and inference stages. This approach not only achieves performance comparable to GPT-4 but also makes inference significantly faster than RAG-based methods, which is especially beneficial for edge computing devices.

📱 **On-device Applications**: Octopus-V2-2B is engineered to run seamlessly on Android devices, extending its utility across a wide range of applications, from Android system management to the orchestration of multiple devices.

🚀 **Inference Speed**: In benchmarks, Octopus-V2-2B demonstrates remarkable inference speed, outperforming a "Llama-7B + RAG" solution by a factor of 36 on a single A100 GPU. It is also 168% faster than GPT-4-turbo (gpt-4-0125-preview), which relies on clusters of A100/H100 GPUs. This efficiency is attributed to our **functional token** design.

🐙 **Accuracy**: Octopus-V2-2B excels not only in speed but also in accuracy, surpassing the "Llama-7B + RAG" solution in function-call accuracy by 31%. It achieves function-call accuracy comparable to GPT-4 and RAG + GPT-3.5, with scores between 98% and 100% across benchmark datasets.

💪 **Function Calling Capabilities**: Octopus-V2-2B can generate individual, nested, and parallel function calls across a variety of complex scenarios.
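
To make the functional-token idea concrete, here is a small sketch of the dispatch pattern it enables: instead of retrieving long function descriptions at inference time, each API is bound to a dedicated token the model learns to emit. The token names and dispatch table below are illustrative assumptions, not the model's actual special vocabulary.

```python
# Illustrative sketch of functional-token dispatch. Token names are assumed
# for illustration; the real vocabulary is described in the paper.
def take_selfie(camera: str) -> str:
    return f"selfie taken with {camera} camera"  # stand-in for a real Android API

def get_trending_news(category=None, region="US", language="en", max_results=5):
    return []  # stand-in implementation

# One entry per supported API: a single token replaces a long text description.
FUNCTIONAL_TOKENS = {
    "<nexa_0>": take_selfie,
    "<nexa_1>": get_trending_news,
}

def dispatch(token: str, *args, **kwargs):
    """Route a functional token emitted by the model to its Python function."""
    return FUNCTIONAL_TOKENS[token](*args, **kwargs)

print(dispatch("<nexa_0>", camera="front"))
```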

## Example Use Cases


<p align="center" width="100%">
<a><img src="tool-usage-compressed.png" alt="ondevice" style="width: 80%; min-width: 300px; display: block; margin: auto;"></a>
</p>

You can run the model on a GPU using the following code.
```python
from transformers import AutoTokenizer, GemmaForCausalLM
import torch
import time

def inference(input_text):
    # Tokenize the prompt and time a single greedy generation pass.
    start_time = time.time()
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    input_length = inputs["input_ids"].shape[1]
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        max_length=1024,
        do_sample=False,
    )
    # Decode only the newly generated tokens, not the prompt.
    generated_sequence = outputs[:, input_length:].tolist()
    res = tokenizer.decode(generated_sequence[0])
    end_time = time.time()
    return {"output": res, "latency": end_time - start_time}

model_id = "NexaAIDev/Octopus-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

input_text = "Take a selfie for me with front camera"
nexa_query = f"Below is the query from the users, please call the correct function and generate the parameters to call the function.\n\nQuery: {input_text} \n\nResponse:"
start_time = time.time()
print("nexa model result:\n", inference(nexa_query))
print("latency:", time.time() - start_time, " s")
```

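The generated text should contain a textual function call. As a rough, assumption-laden sketch of pulling that call out of the output, the helper below matches a single Python-style call; the model's exact output format is described in the paper, so this regex is only illustrative.

```python
# Hedged sketch: extract "name(args...)" from generated text. Assumes the
# response contains one Python-style call; Octopus-v2's actual output format
# may differ (see the ArXiv link above).
import re

def extract_call(generated: str):
    match = re.search(r"([A-Za-z_]\w*)\s*\((.*?)\)", generated, re.DOTALL)
    if match is None:
        return None
    return {"function": match.group(1), "raw_args": match.group(2)}

print(extract_call("Response: take_selfie(camera='front')"))
# -> {'function': 'take_selfie', 'raw_args': "camera='front'"}
```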
## Evaluation

The benchmark results can be viewed in [this spreadsheet](android_benchmark.xlsx) and have been manually verified. All queries in the benchmark test were sampled by Gemini.

<p align="center" width="100%">
<a><img src="latency_plot.jpg" alt="ondevice" style="width: 80%; min-width: 300px; display: block; margin: auto; margin-bottom: 20px;"></a>
<a><img src="accuracy_plot.jpg" alt="ondevice" style="width: 80%; min-width: 300px; display: block; margin: auto;"></a>
</p>

**Note**: Each benchmark query includes all parameters needed for the function call; queries are expected to include all parameters during inference as well.

## Training Data
We wrote 20 Android API descriptions to use for training the models; see [this file](android_functions.txt) for details. The Android API implementations for our demos, and our training data, will be published later. Below is one example of an Android API description:
```python
def get_trending_news(category=None, region='US', language='en', max_results=5):
    """
    Fetches trending news articles based on category, region, and language.

    Parameters:
    - category (str, optional): News category to filter by; defaults to None for all categories.
    - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news; defaults to 'US'.
    - language (str, optional): ISO 639-1 language code for article language; defaults to 'en'.
    - max_results (int, optional): Maximum number of articles to return; defaults to 5.

    Returns:
    - list[str]: A list of strings, each representing an article. Each string contains the article's heading and URL.
    """
```
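
For intuition, a query paired with the call the model is expected to produce might look like the following. This pairing is illustrative, not a verbatim training sample, and the actual response format (including functional tokens) follows the paper.

```
Query: Show me the top 3 technology news stories in the US
Response: get_trending_news(category='technology', max_results=3)
```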


## License
This model was trained on commercially viable data and is released under the [Nexa AI community disclaimer](https://www.nexa4ai.com/disclaimer).

## References
We thank the Google Gemma team for their amazing models!
```
@misc{gemma-2023-open-models,
  author = {{Gemma Team, Google DeepMind}},
  title = {Gemma: Open Models Based on Gemini Research and Technology},
  url = {https://goo.gle/GemmaReport},
  year = {2023},
}
```

## Citation
```
@misc{chen2024octopus,
  title={Octopus v2: On-device language model for super agent},
  author={Wei Chen and Zhiyuan Li},
  year={2024},
  eprint={2404.01744},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## Contact
Please [contact us](mailto:[email protected]) with any issues or comments!