radoslavralev committed on
Commit 42b1539 · verified · 1 Parent(s): 1c4cda6

Add new SentenceTransformer model

Files changed (2)
  1. README.md +51 -53
  2. model.safetensors +1 -1
README.md CHANGED
@@ -12,53 +12,51 @@ tags:
12
  - retrieval
13
  - reranking
14
  - generated_from_trainer
15
- - dataset_size:13667
16
  - loss:ArcFaceInBatchLoss
17
  base_model: sentence-transformers/all-MiniLM-L6-v2
18
  widget:
19
- - source_sentence: It was mobilized in December 2014 from elements of the dissolved
20
- 51st Mechanized Brigade and newly formed units .
21
  sentences:
22
- - This North-South route falls entirely in the Belgian territory and runs together
23
- with the Belgian roads N31 and A17 .
24
- - It was mobilized in December 2014 from elements of the disbanded 51st Mechanized
25
- Brigade and newly formed units .
26
- - All windows are double wood , hung up with a single light .
27
- - source_sentence: It is located at Ellison Bay , in the town of Liberty Grove , Wisconsin
28
- .
29
  sentences:
30
- - It is located in Ellison Bay , in the town of Liberty Grove , Wisconsin .
31
- - It is located in Liberty Grove , Wisconsin , in the town of Ellison Bay .
32
- - 'The Hadejia River ( Hausa : `` kogin Haɗeja `` ) is a river in northern Nigeria
33
- and is a tributary of the Yobe River ( Komadugu Yobe ) .'
34
- - source_sentence: Both long and short vowels can be nasalized ( differentiation between
35
- `` acces `` and `` Ä cces `` below ) , but long nasal vowels are more common .
36
  sentences:
37
- - Both long and short vowels can be nasalized ( the distinction between `` acces
38
- `` and `` ącces `` below ) , but long nasal vowels are more common .
39
- - Wilson was a member of the Senate from 1844 to 1846 and 1850 to 1852 . From 1851
40
- to 1852 he was the Massachusetts State Senate 's President .
41
- - Both long vowels can be nasalized ( the distinction between `` acces `` and ``
42
- ącces `` below ) , but long and short nasal vowels are more common .
43
- - source_sentence: At that time , on June 22 , 1754 , Edward Bentham married Bentham
44
- Elizabeth Bates ( d . 1790 ) from Hampshire in the nearby county of Alton .
45
  sentences:
46
- - The Department of Criminal Justice developed the first certificate program in
47
- forensic science in North Carolina and sponsors a summer comparative studies program
48
- based in Europe .
49
- - At that time , on June 22 , 1754 , Edward Bentham married Bentham Elizabeth Bates
50
- ( d . 1790 ) from Hampshire in the nearby county of Alton .
51
- - It was at this time , on 22 June 1754 , that Edward Bentham married Elizabeth
52
- Bates ( d 1790 ) from Alton in the nearby county of Hampshire .
53
- - source_sentence: In 1973 Michels ' apos broke ; Barcelona the world transfer record
54
- to bring Cruyff to Catalonia .
55
  sentences:
56
- - In 1973 , Cruyff 'Barcelona broke the world transfer record to bring Michels to
57
- Catalonia .
58
- - Amalric then marched to Cairo , where Shawar offered Amalric two million pieces
59
- of gold .
60
- - In 1973 Michels ' apos broke ; Barcelona the world transfer record to bring Cruyff
61
- to Catalonia .
62
  datasets:
63
  - redis/langcache-sentencepairs-v2
64
  pipeline_tag: sentence-similarity
@@ -159,9 +157,9 @@ from sentence_transformers import SentenceTransformer
159
  model = SentenceTransformer("redis/langcache-embed-experimental")
160
  # Run inference
161
  sentences = [
162
- "In 1973 Michels ' apos broke ; Barcelona the world transfer record to bring Cruyff to Catalonia .",
163
- "In 1973 Michels ' apos broke ; Barcelona the world transfer record to bring Cruyff to Catalonia .",
164
- "In 1973 , Cruyff 'Barcelona broke the world transfer record to bring Michels to Catalonia .",
165
  ]
166
  embeddings = model.encode(sentences)
167
  print(embeddings.shape)
@@ -170,9 +168,9 @@ print(embeddings.shape)
170
  # Get the similarity scores for the embeddings
171
  similarities = model.similarity(embeddings, embeddings)
172
  print(similarities)
173
- # tensor([[1.0000, 1.0000, 0.9219],
174
- # [1.0000, 1.0000, 0.9219],
175
- # [0.9219, 0.9219, 1.0078]], dtype=torch.bfloat16)
176
  ```
177
 
178
  <!--
@@ -238,18 +236,18 @@ You can finetune this model on your own dataset.
238
  #### LangCache Sentence Pairs (all)
239
 
240
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
241
- * Size: 6,780 training samples
242
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
243
  * Approximate statistics based on the first 1000 samples:
244
  | | anchor | positive | negative |
245
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
246
  | type | string | string | string |
247
- | details | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.27 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.25 tokens</li><li>max: 47 tokens</li></ul> |
248
  * Samples:
249
  | anchor | positive | negative |
250
  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
251
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>This marine species occurs in the eastern Indian Ocean and before the Maldives and New Caledonia .</code> |
252
- | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>Both young people burn with love really , for both , but without being able to say it to himself , admitting him always .</code> |
253
  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
254
  * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
255
  ```json
@@ -265,18 +263,18 @@ You can finetune this model on your own dataset.
265
  #### LangCache Sentence Pairs (all)
266
 
267
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
268
- * Size: 6,780 evaluation samples
269
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
270
  * Approximate statistics based on the first 1000 samples:
271
  | | anchor | positive | negative |
272
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
273
  | type | string | string | string |
274
- | details | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.27 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.25 tokens</li><li>max: 47 tokens</li></ul> |
275
  * Samples:
276
  | anchor | positive | negative |
277
  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
278
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>This marine species occurs in the eastern Indian Ocean and before the Maldives and New Caledonia .</code> |
279
- | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>Both young people burn with love really , for both , but without being able to say it to himself , admitting him always .</code> |
280
  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
281
  * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
282
  ```json
 
12
  - retrieval
13
  - reranking
14
  - generated_from_trainer
15
+ - dataset_size:9233417
16
  - loss:ArcFaceInBatchLoss
17
  base_model: sentence-transformers/all-MiniLM-L6-v2
18
  widget:
19
+ - source_sentence: Hayley Vaughan portrayed Ripa on the ABC daytime soap opera , ``
20
+ All My Children `` , between 1990 and 2002 .
21
  sentences:
22
+ - Traxxpad is a music application for Sony 's PlayStation Portable published by
23
+ Definitive Studios and developed by Eidos Interactive .
24
+ - Between 1990 and 2002 , Hayley Vaughan Ripa portrayed in the ABC soap opera ``
25
+ All My Children `` .
26
+ - Between 1990 and 2002 , Ripa Hayley portrayed Vaughan in the ABC soap opera ``
27
+ All My Children `` .
28
+ - source_sentence: Olivella monilifera is a species of dwarf sea snail , small gastropod
29
+ mollusk in the family Olivellidae , the marine olives .
30
  sentences:
31
+ - Olivella monilifera is a species of the dwarf - sea snail , small gastropod mollusk
32
+ in the Olivellidae family , the marine olives .
33
+ - He was cut by the Browns after being signed by the Bills in 2013 . He was later
34
+ released .
35
+ - Olivella monilifera is a kind of sea snail , marine gastropod mollusk in the Olivellidae
36
+ family , the dwarf olives .
37
+ - source_sentence: Hayashi said that Mackey `` is a sort of `` of the original model
38
+ for Tenchi .
39
  sentences:
40
+ - In the summer of 2009 , Ellick shot a documentary about Malala Yousafzai .
41
+ - Hayashi said that Mackey is `` sort of `` the original model for Tenchi .
42
+ - Mackey said that Hayashi is `` sort of `` the original model for Tenchi .
43
+ - source_sentence: Much of the film was shot on location in Los Angeles and in nearby
44
+ Burbank and Glendale .
45
  sentences:
46
+ - Much of the film was shot on location in Los Angeles and in nearby Burbank and
47
+ Glendale .
48
+ - Much of the film was shot on site in Burbank and Glendale and in the nearby Los
49
+ Angeles .
50
+ - Traxxpad is a music application for the Sony PlayStation Portable developed by
51
+ the Definitive Studios and published by Eidos Interactive .
52
+ - source_sentence: According to him , the earth is the carrier of his artistic work
53
+ , which is only integrated into the creative process by minimal changes .
54
  sentences:
55
+ - National players are Bold players .
56
+ - According to him , earth is the carrier of his artistic work being integrated
57
+ into the creative process only by minimal changes .
58
+ - According to him , earth is the carrier of his creative work being integrated
59
+ into the artistic process only by minimal changes .
60
  datasets:
61
  - redis/langcache-sentencepairs-v2
62
  pipeline_tag: sentence-similarity
 
157
  model = SentenceTransformer("redis/langcache-embed-experimental")
158
  # Run inference
159
  sentences = [
160
+ 'According to him , the earth is the carrier of his artistic work , which is only integrated into the creative process by minimal changes .',
161
+ 'According to him , earth is the carrier of his artistic work being integrated into the creative process only by minimal changes .',
162
+ 'According to him , earth is the carrier of his creative work being integrated into the artistic process only by minimal changes .',
163
  ]
164
  embeddings = model.encode(sentences)
165
  print(embeddings.shape)
 
168
  # Get the similarity scores for the embeddings
169
  similarities = model.similarity(embeddings, embeddings)
170
  print(similarities)
171
+ # tensor([[1.0000, 0.9844, 0.9844],
172
+ # [0.9844, 1.0000, 1.0000],
173
+ # [0.9844, 1.0000, 1.0078]], dtype=torch.bfloat16)
174
  ```
175
 
176
  <!--
 
236
  #### LangCache Sentence Pairs (all)
237
 
238
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
239
+ * Size: 126,938 training samples
240
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
241
  * Approximate statistics based on the first 1000 samples:
242
  | | anchor | positive | negative |
243
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
244
  | type | string | string | string |
245
+ | details | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 25.69 tokens</li><li>max: 65 tokens</li></ul> |
246
  * Samples:
247
  | anchor | positive | negative |
248
  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
249
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
250
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
251
  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
252
  * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
253
  ```json
 
263
  #### LangCache Sentence Pairs (all)
264
 
265
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
266
+ * Size: 126,938 evaluation samples
267
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
268
  * Approximate statistics based on the first 1000 samples:
269
  | | anchor | positive | negative |
270
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
271
  | type | string | string | string |
272
+ | details | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 26.28 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 25.69 tokens</li><li>max: 65 tokens</li></ul> |
273
  * Samples:
274
  | anchor | positive | negative |
275
  |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|
276
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>how can I get financial freedom as soon as possible?</code> |
277
+ | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The older Punts are still very much in existence today and race in the same fleets as the newer boats .</code> |
278
  | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> |
279
  * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
280
  ```json
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e8afa2d5537827c6d1a9fcda9f205aa3101495b1a7687a881a257ed4227c38a1
+ oid sha256:2d7e72835b966eaeabd3532bf9069d0626fc4f4ef5fef0f6eac90c7402f42d6f
  size 45437864