Commit ab52c51 · Parent(s): 97c9383

Upload model

- README.md +24 -3
- config.json +24 -0
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
- vocab.txt +0 -0
    	
README.md CHANGED

@@ -1,3 +1,24 @@
-
-
-
+## `semanlink_all_mpnet_base_v2`
+
+This is a sentence-transformers model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+`semanlink_all_mpnet_base_v2` has been fine-tuned on the knowledge graph [Semanlink](http://www.semanlink.net/sl/home?lang=fr) via the library [MKB](https://github.com/raphaelsty/mkb) on the link-prediction task. The model is dedicated to the representation of both technical and generic terminology in machine learning, NLP, and news.
+
+## Usage (Sentence-Transformers)
+
+Using this model is easy when you have sentence-transformers installed:
+
+```
+pip install -U sentence-transformers
+```
+
+Then you can use the model like this:
+
+```python
+from sentence_transformers import SentenceTransformer
+sentences = ["Machine Learning", "Geoffrey Hinton"]
+
+model = SentenceTransformer('raphaelsty/semanlink_all_mpnet_base_v2')
+embeddings = model.encode(sentences)
+print(embeddings)
+```
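Since the README says the embeddings are meant for clustering and semantic search, a minimal sketch of the search use case may help. It assumes sentence-transformers is installed; the corpus and query strings below are illustrative examples, not part of the model card:

```python
# A hedged sketch of semantic search with this model's embeddings.
# The corpus and query strings are illustrative, not from the model card.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('raphaelsty/semanlink_all_mpnet_base_v2')

corpus = ["Knowledge graphs", "Geoffrey Hinton", "Transformers architecture"]
query = "link prediction on knowledge bases"

# Encode corpus and query into 768-dimensional dense vectors.
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every corpus entry.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]

# Rank corpus entries by similarity to the query.
for sentence, score in sorted(zip(corpus, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}\t{sentence}")
```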
    	
config.json ADDED

@@ -0,0 +1,24 @@
+{
+  "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "relative_attention_num_buckets": 32,
+  "torch_dtype": "float32",
+  "transformers_version": "4.17.0",
+  "vocab_size": 30527
+}
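This config pins the backbone to a 12-layer MPNet whose `hidden_size` of 768 matches the embedding dimension advertised in the README. A quick sanity-check sketch, assuming the `transformers` library is installed:

```python
# A hedged sketch, assuming `transformers` is installed: load the uploaded
# config and confirm the architecture parameters shown in the diff above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("raphaelsty/semanlink_all_mpnet_base_v2")

print(config.model_type)                # "mpnet"
print(config.hidden_size)               # 768, the embedding dimension in the README
print(config.num_hidden_layers)         # 12
print(config.max_position_embeddings)   # 514
```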
    	
special_tokens_map.json ADDED

@@ -0,0 +1 @@
+{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
    	
tokenizer_config.json ADDED

@@ -0,0 +1 @@
+{"do_lower_case": true, "bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "[UNK]", "pad_token": "<pad>", "mask_token": "<mask>", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "sentence-transformers/all-mpnet-base-v2", "tokenizer_class": "MPNetTokenizer"}
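Together, these two files configure a lowercasing `MPNetTokenizer` with a 512-token limit and the `<s>`/`</s>` special tokens. A minimal sketch, assuming `transformers` is installed, showing how those settings surface at inference time:

```python
# A hedged sketch, assuming `transformers` is installed: load the tokenizer
# defined by the two files above and inspect how special tokens are applied.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("raphaelsty/semanlink_all_mpnet_base_v2")

encoded = tokenizer("Machine Learning")
# MPNet wraps inputs in <s> ... </s>, per special_tokens_map.json.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(tokenizer.model_max_length)  # 512, per tokenizer_config.json
```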
    	
vocab.txt ADDED

The diff for this file is too large to render. See raw diff.