# Finetuning on an unstructured dataset

While most of the scripts were written to finetune on instruction datasets, it is possible to finetune on any plain-text dataset. This is useful for experimentation and far less expensive than training a full model from scratch.

This guide only covers preparing the data for finetuning, since both the LoRA and Adapter v1 methods support this dataset type!
## Preparation

1. Gather your text into an input file named `input.txt`
2. Divide the data into training and validation sets using the following script (a sketch of what this step does is shown after the list):
   ```bash
   python scripts/prepare_any_text.py
   ```
3. Modify the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, setting the `instruction_tuning` variable to `False` (see the second sketch after the list)
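
To give a rough idea of what step 2 does, the sketch below splits the contents of `input.txt` into training and validation sets. This is an illustration, not the actual contents of `scripts/prepare_any_text.py` (which also tokenizes the data before saving it); the 90/10 split fraction, the output file names, and the seed are assumptions.

```python
# Minimal sketch of a train/validation split over input.txt.
# NOTE: illustration only -- the real scripts/prepare_any_text.py also
# tokenizes the data; the split fraction, file names, and seed are assumptions.
import random
from pathlib import Path

def split_dataset(input_file: str = "input.txt",
                  train_fraction: float = 0.9,
                  seed: int = 42) -> None:
    lines = Path(input_file).read_text(encoding="utf-8").splitlines()
    random.Random(seed).shuffle(lines)  # deterministic shuffle
    cut = int(len(lines) * train_fraction)
    Path("train.txt").write_text("\n".join(lines[:cut]), encoding="utf-8")
    Path("val.txt").write_text("\n".join(lines[cut:]), encoding="utf-8")
    print(f"Wrote {cut} training lines and {len(lines) - cut} validation lines")

if __name__ == "__main__":
    split_dataset()
```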
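
For step 3, the scripts use the `instruction_tuning` flag to decide whether each sample is wrapped in an instruction prompt template or consumed as raw text. The snippet below is a hedged sketch of what that gating typically looks like; only the flag name comes from this guide, while the function name `build_prompt`, the template, and the field names are assumptions rather than the verbatim code under `finetune/` or `evaluate/`.

```python
# Hedged sketch of how an `instruction_tuning` flag is typically consulted.
# Only the flag name comes from the guide; everything else is illustrative.
instruction_tuning = False  # set to False when finetuning on unstructured text

def build_prompt(example: dict) -> str:
    if instruction_tuning:
        # Instruction datasets get wrapped in a fixed prompt template.
        return (
            "Below is an instruction that describes a task.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
        )
    # Unstructured text is fed to the model as-is.
    return example["text"]
```

With the flag set to `False`, the training and evaluation code consumes raw text samples instead of instruction/response pairs.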
And then you're set! Proceed to run the [LoRA guide](./finetune_lora.md) or [Adapter v1 guide](./finetune_adapter.md).