Update README.md

tags:
- InternVL2_5
- InternVL2_5-1B
- Int8
- VLM
---

# InternVL2_5-1B-MPO

This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version:

## Convert tools links:

For those who are interested in model conversion, you can try to export the axmodel through the original repo:

https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Support Platform

|Chips|image encoder 448|ttft|w8a16|
|--|--|--|--|
|AX650| 350 ms | 420 ms |32 tokens/sec|

- AX630C
  - [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)

|Chips|image encoder 364|ttft|w8a16|
|--|--|--|--|

Download all files from this repository to the device
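
If the device has network access, one way to pull everything down is a plain Git LFS clone. The repository path below is a placeholder; substitute the actual path of this repository on Hugging Face.

```
# Placeholder repo path: replace <this-repo> with this repository's path on Hugging Face
git lfs install
git clone https://huggingface.co/<this-repo> internvl2_5-1b-mpo
```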

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
.
|-- README.md
|-- config.json
|-- image1.jpg
|-- internvl2_5_1b_364_ax630c
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_364.py
|-- internvl2_5_tokenizer_448.py
|-- main
|-- main_ax650
|-- post_config.json
|-- run_internvl2_5_364_ax630c.sh
`-- run_internvl2_5_448_ax650.sh

3 directories, 10 files
```
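
Depending on how the files were copied, the prebuilt runtimes (`main`, `main_ax650`) and the launch scripts may need their executable bit restored first. This is not a step from the listing above, just a common fix:

```
chmod +x main main_ax650 run_internvl2_5_364_ax630c.sh run_internvl2_5_448_ax650.sh
```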

#### Install transformers
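
This card pins `transformers` to 4.41.1; install it on the device and confirm the version before starting the tokenizer service:

```
pip install transformers==4.41.1
python3 -c "import transformers; print(transformers.__version__)"
```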

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# python3 internvl2_5_tokenizer_448.py
None None 151645 <|im_end|> 151665 151667
context_len is  256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
.......
http://0.0.0.0:12345
```
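
The three IDs printed by the service (151645, 151665, 151667) are the `<|im_end|>`, image-start, and image-context special tokens, as the runtime log further below also reports. To double-check them against the original repository's tokenizer, a quick sketch (assumes network access and a standard `transformers` install):

```
python3 - <<'EOF'
from transformers import AutoTokenizer

# Load the tokenizer from the original (unconverted) repository
tok = AutoTokenizer.from_pretrained("OpenGVLab/InternVL2_5-1B-MPO", trust_remote_code=True)

# 151645 should map to <|im_end|>; 151665 / 151667 are the image start/context
# tokens reported by internvl2_5_tokenizer_448.py above
print(tok.convert_ids_to_tokens([151645, 151665, 151667]))
EOF
```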

#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650 DEMO Board

- input text

Describe the picture

- input image

![image1.jpg](image1.jpg)

Open another terminal and run `./run_internvl2_5_448_ax650.sh`

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# ./run_internvl2_5_448_ax650.sh
[I][                            Init][ 134]: LLM init start
[I][                            Init][  34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
  3% | ██                                |   1 /  27 [0.01s<0.30s, 90.91 count/s] tokenizer init ok
[I][                            Init][  45]: LLaMaEmbedSelector use mmap
  7% | ███                               |   2 /  27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
100% | ████████████████████████████████ |  27 /  27 [4.31s<4.31s, 6.26 count/s] init post axmodel ok,remain_cmm(3881 MB)
[I][                            Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][                            Init][ 251]: image encoder input nchw@float32
[I][                            Init][ 281]: image encoder output float32

[I][                            Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][                            Init][ 293]: max_token_len : 2559
[I][                            Init][ 296]: kv_cache_size : 128, kv_cache_num: 2559
[I][                            Init][ 304]: prefill_token_num : 128
[I][                            Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 312]: prefill_max_token_num : 1024
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> image1.jpg
[I][                          Encode][ 415]: image encode time : 395.42 ms, size : 229376
[I][                          Encode][ 524]: idx:0 offset : 48 out_embed.size() : 277760
[I][                             Run][ 551]: input token num : 310, prefill_split_num : 3
[I][                             Run][ 566]: prefill grpid 4
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:54
[I][                             Run][ 717]: ttft: 625.86 ms

: The image features a red panda sitting in a tree with a blurred green background indicating foliage.
The red panda has a distinctive reddish-brown head and back, white underparts, and black patches around its eyes,
nose, and mouth. It appears to be resting or lounging comfortably on a wooden platform.

[N][                             Run][ 826]: hit eos,avg 27.37 token/s

prompt >> q

```
     |