FIM-SE
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
Paper • Repo • Models
FIM-SE stands for Fill-In-the-Middle with both Starting and Ending character constraints. The method handles character-level infilling by using a line-level format, so that the model never has to predict a sub-token during inference.
| Model | Checkpoint | Size | License | 
|---|---|---|---|
| FIM-SE-CL-7B | HF Link | 7B | Llama2 |
| FIM-SE-CL-13B | HF Link | 13B | Llama2 |
| FIM-SE-SC-1B | HF Link | 1B | StarCoder |
| FIM-SE-SC-15B | HF Link | 15B | StarCoder |
As shown in the figure, the prompt is organized as:
`<PRE>R-Prefix<SUF>R-Suffix<START>L-Prefix<END>F-Suffix<MID>`
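To make the layout concrete, here is a minimal sketch of how such a prompt could be assembled from the text before and after the cursor. It rests on assumptions that this page does not confirm: that L-Prefix is the unfinished last line of the prefix, F-Suffix is the rest of that line taken from the start of the suffix, and R-Prefix / R-Suffix are whatever remains on either side. The helper names `split_context`, `build_prompt`, and `extract_middle` are hypothetical, and the special tokens are treated as plain strings. Refer to the GitHub repo for the authors' actual preprocessing.

```python
from typing import Optional, Tuple


def split_context(prefix: str, suffix: str) -> Tuple[str, str, str, str]:
    """Split the context at the line boundaries around the cursor.

    Assumed meaning of the four parts (not confirmed by this page):
    R-Prefix = complete lines before the cursor line,
    L-Prefix = the part of the cursor line before the cursor,
    F-Suffix = the part of the cursor line after the cursor,
    R-Suffix = everything after the cursor line.
    """
    cut = prefix.rfind("\n") + 1  # 0 when the prefix has no newline
    r_prefix, l_prefix = prefix[:cut], prefix[cut:]
    cut = suffix.find("\n")
    if cut == -1:  # the suffix is a single unfinished line
        cut = len(suffix)
    f_suffix, r_suffix = suffix[:cut], suffix[cut:]
    return r_prefix, l_prefix, f_suffix, r_suffix


def build_prompt(prefix: str, suffix: str) -> str:
    """Lay the four parts out in the order given by the prompt format."""
    r_prefix, l_prefix, f_suffix, r_suffix = split_context(prefix, suffix)
    return f"<PRE>{r_prefix}<SUF>{r_suffix}<START>{l_prefix}<END>{f_suffix}<MID>"


def extract_middle(generated: str, prefix: str, suffix: str) -> Optional[str]:
    """Recover the character-level infill from a line-level generation.

    Assumes the model returns whole lines that start with L-Prefix and end
    with F-Suffix. Returns None if those constraints are not met.
    """
    _, l_prefix, f_suffix, _ = split_context(prefix, suffix)
    if len(generated) < len(l_prefix) + len(f_suffix):
        return None
    if not (generated.startswith(l_prefix) and generated.endswith(f_suffix)):
        return None
    return generated[len(l_prefix):len(generated) - len(f_suffix)]


prefix = "def add(a, b):\n    ret"
suffix = "a + b\nprint(add(1, 2))\n"
prompt = build_prompt(prefix, suffix)
middle = extract_middle("    return a + b", prefix, suffix)
```

In this example the cursor sits in the middle of the token `return`. The prompt only asks the model for a complete line, and the missing characters are recovered afterwards by stripping the known start and end of that line.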
Please refer to our GitHub Repo for more technical details.
If you find this repo useful for your research, please kindly cite our paper:
@misc{ren2024empowering,
    title={Empowering Character-level Text Infilling by Eliminating Sub-Tokens}, 
    author={Houxing Ren and Mingjie Zhan and Zhongyuan Wu and Hongsheng Li},
    year={2024},
    eprint={2405.17103},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
We thank the following amazing projects that truly inspired us: