Update Space (evaluate main: eb281894)
- README.md +3 -3
- requirements.txt +1 -1
README.md (changed)

````diff
@@ -35,7 +35,7 @@ This measurement requires a list of strings as input:
 
 ### Output Values
 - **duplicate_fraction**(`float`): the fraction of duplicates in the input string(s).
-- **
+- **duplicates_dict**(`list`): (optional) a list of tuples with the duplicate strings and the number of times they are repeated.
 
 By default, this measurement outputs a dictionary containing the fraction of duplicates in the input string(s) (`duplicate_fraction`):
 )
@@ -46,7 +46,7 @@ By default, this measurement outputs a dictionary containing the fraction of dup
 With the `list_duplicates=True` option, this measurement will also output a dictionary of tuples with duplicate strings and their counts.
 
 ```python
-{'duplicate_fraction': 0.33333333333333337, '
+{'duplicate_fraction': 0.33333333333333337, 'duplicates_dict': {'hello sun': 2}}
 ```
 
 Warning: the `list_duplicates=True` function can be memory-intensive for large datasets.
@@ -69,7 +69,7 @@ Example with multiple duplicates and `list_duplicates=True`:
 >>> duplicates = evaluate.load("text_duplicates")
 >>> results = duplicates.compute(data=data, list_duplicates=True)
 >>> print(results)
-{'duplicate_fraction': 0.4, '
+{'duplicate_fraction': 0.4, 'duplicates_dict': {'hello sun': 2, 'foo bar': 2}}
 ```
 
 ## Citation(s)
````
requirements.txt (changed)

```diff
@@ -1 +1 @@
-git+https://github.com/huggingface/evaluate.git@
+git+https://github.com/huggingface/evaluate.git@eb281894ce23f68902c4b12040dd5b1a9cb32f90
```
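The README diff documents the `duplicate_fraction` and `duplicates_dict` outputs but not how they are derived. As an illustration only, here is a hypothetical sketch, not the `evaluate` implementation. It assumes `duplicate_fraction` is 1 minus the ratio of unique strings to total strings. Under that assumption it reproduces the example outputs in the README diff: `0.33333333333333337` for three strings with one repeat, and `0.4` for five strings with two repeated pairs. The function name `text_duplicates_sketch` and the sample input lists are invented for this example. The sketch returns `duplicates_dict` as a dict. This matches the example outputs rather than the `(list)` type stated in the README text.

```python
from collections import Counter


def text_duplicates_sketch(data, list_duplicates=False):
    """Hypothetical illustration, NOT the evaluate library's code.

    Assumption: duplicate_fraction = 1 - (unique strings / total strings).
    """
    if not data:
        raise ValueError("data must be a non-empty list of strings")
    counts = Counter(data)  # maps each distinct string to its number of occurrences
    result = {"duplicate_fraction": 1 - len(counts) / len(data)}
    if list_duplicates:
        # Keep only strings that occur more than once, together with their counts.
        result["duplicates_dict"] = {s: n for s, n in counts.items() if n > 1}
    return result


# Invented inputs chosen to match the example outputs in the README diff.
print(text_duplicates_sketch(["hello sun", "hello moon", "hello sun"], list_duplicates=True))
# {'duplicate_fraction': 0.33333333333333337, 'duplicates_dict': {'hello sun': 2}}
print(text_duplicates_sketch(
    ["hello sun", "goodbye moon", "hello sun", "foo bar", "foo bar"], list_duplicates=True
))
# {'duplicate_fraction': 0.4, 'duplicates_dict': {'hello sun': 2, 'foo bar': 2}}
```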