---
language:
- fa
library_name: hezar
tags:
- image-to-text
- hezar
metrics:
- wer
pipeline_tag: image-to-text
datasets:
- hezarai/flickr30k-fa
---
						
A Persian image captioning model built on a ViT + RoBERTa encoder-decoder architecture and trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi).
The encoder (ViT) was initialized from [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) and the decoder (RoBERTa) from [HooshvareLab/roberta-fa-zwnj-base](https://huggingface.co/HooshvareLab/roberta-fa-zwnj-base).
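To make the encoder/decoder initialization above more concrete, here is a minimal sketch of one common way to pair these two checkpoints, using the `VisionEncoderDecoderModel` class from the Hugging Face `transformers` library. Whether Hezar uses this class internally is an assumption, and the function name `build_vit_roberta_captioner` is hypothetical. The result is only a starting point for training, not the fine-tuned captioning model from this card.

```python
# Checkpoint names taken from the model description above.
ENCODER_CHECKPOINT = "google/vit-base-patch16-224"
DECODER_CHECKPOINT = "HooshvareLab/roberta-fa-zwnj-base"


def build_vit_roberta_captioner():
    """Pair a pretrained ViT encoder with a pretrained RoBERTa decoder.

    Hypothetical helper for illustration; not part of Hezar.
    """
    # Imported lazily so this file can be read without transformers installed.
    from transformers import VisionEncoderDecoderModel

    # The decoder is loaded as a causal language model. Its cross-attention
    # layers are newly initialized, so the combined model must be fine-tuned
    # on image/caption pairs before it produces useful captions.
    return VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        ENCODER_CHECKPOINT, DECODER_CHECKPOINT
    )
```

Calling `build_vit_roberta_captioner()` downloads both checkpoints from the Hugging Face Hub, so it needs `transformers`, a PyTorch install, and network access.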
					
					
						
## Usage

```
pip install hezar
```

```python
from hezar.models import Model

# Load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-roberta-fa-image-captioning-flickr30k")
# Generate a caption for a local image file
captions = model.predict("example_image.jpg")
print(captions)
```
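The metadata for this model lists `wer` (word error rate) as its metric. The card does not say how the score was computed, so the following is only a generic sketch of the standard definition: the word-level edit distance between a reference caption and a predicted caption, divided by the number of reference words. The function name `word_error_rate` and the example captions are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# Made-up captions: one substituted word out of five reference words.
reference_caption = "یک سگ در پارک است"
predicted_caption = "یک گربه در پارک است"
print(word_error_rate(reference_caption, predicted_caption))  # 0.2
```

Lower is better: identical captions score 0.0, and the value can exceed 1.0 when the prediction is much longer than the reference.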