Spaces: evaluate-measurement/text_duplicates
lvwerra (HF Staff) committed on Aug 22, 2022

Commit df6ae6f

1 Parent(s): ac0977e

Update Space (evaluate main: eb281894)

Files changed (2)

README.md CHANGED

@@ -35,7 +35,7 @@ This measurement requires a list of strings as input:
 ### Output Values
 - **duplicate_fraction**(`float`): the fraction of duplicates in the input string(s).
-- **duplicates_list**(`list`): (optional) a list of tuples with the duplicate strings and the number of times they are repeated.
+- **duplicates_dict**(`list`): (optional) a list of tuples with the duplicate strings and the number of times they are repeated.
 By default, this measurement outputs a dictionary containing the fraction of duplicates in the input string(s) (`duplicate_fraction`):
   )
@@ -46,7 +46,7 @@ By default, this measurement outputs a dictionary containing the fraction of dup
 With the `list_duplicates=True` option, this measurement will also output a dictionary of tuples with duplicate strings and their counts.
 ```python
-{'duplicate_fraction': 0.33333333333333337, 'duplicates_list': {'hello sun': 2}}
+{'duplicate_fraction': 0.33333333333333337, 'duplicates_dict': {'hello sun': 2}}
 ```
 Warning: the `list_duplicates=True` function can be memory-intensive for large datasets.
@@ -69,7 +69,7 @@ Example with multiple duplicates and `list_duplicates=True`:
 >>> duplicates = evaluate.load("text_duplicates")
 >>> results = duplicates.compute(data=data, list_duplicates=True)
 >>> print(results)
-{'duplicate_fraction': 0.4, 'duplicates_list': {'hello sun': 2, 'foo bar': 2}}
+{'duplicate_fraction': 0.4, 'duplicates_dict': {'hello sun': 2, 'foo bar': 2}}
 ```
 ## Citation(s)
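
For readers who want to see what the renamed output describes, here is a minimal sketch of how a duplicate fraction and a `duplicates_dict` of that shape could be computed in plain Python. It is not the Space's implementation: the function name `text_duplicates_sketch`, the use of `collections.Counter`, and the sample input list are all assumptions made for illustration. The diff shows only the outputs, not the inputs that produced them.

```python
from collections import Counter


def text_duplicates_sketch(data, list_duplicates=False):
    """Hypothetical stand-in for the measurement, for illustration only.

    Assumes duplicate_fraction = 1 - (unique strings / total strings).
    """
    if not data:
        raise ValueError("data must contain at least one string")

    # Count how many times each distinct string occurs.
    counts = Counter(data)
    result = {"duplicate_fraction": 1 - len(counts) / len(data)}

    if list_duplicates:
        # Keep only strings seen more than once, mapped to their counts.
        # Note that this is a dict, matching the key name `duplicates_dict`.
        result["duplicates_dict"] = {
            text: n for text, n in counts.items() if n > 1
        }
    return result


# Assumed sample input; the README diff does not show the original data.
sample = ["hello sun", "hello moon", "hello sun"]
print(text_duplicates_sketch(sample, list_duplicates=True))
```

Under these assumptions the result has the same keys as the updated README example, and `duplicates_dict` maps each repeated string to its count. Counting the full input is also why the README warns that the option can be memory-intensive for large datasets.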

requirements.txt CHANGED

@@ -1 +1 @@
-git+https://github.com/huggingface/evaluate.git@f4aba41fdabe7f42cf6c7dcd5bfab6dd83adfd30
+git+https://github.com/huggingface/evaluate.git@eb281894ce23f68902c4b12040dd5b1a9cb32f90