For example, in the above diagram, to return the feature map from the first stage of the Swin backbone, you can set `out_indices=(1,)`:
```py
from transformers import AutoImageProcessor, AutoBackbone
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(1,))

inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
feature_maps = outputs.feature_maps
```
Now you can access the `feature_maps` object from the first stage of the backbone:
```py
>>> list(feature_maps[0].shape)
[1, 96, 56, 56]
```
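You can also request several stages in one call. As a sketch using the same Swin checkpoint (the exact shapes printed below follow from Swin-Tiny's design of doubling channels and halving spatial resolution at each stage), passing multiple indices to `out_indices` returns one feature map per requested stage, in the same order:

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoBackbone

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
# Request the first three stages at once; outputs.feature_maps will hold
# one tensor per index in out_indices.
model = AutoBackbone.from_pretrained(
    "microsoft/swin-tiny-patch4-window7-224", out_indices=(1, 2, 3)
)

inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

for fm in outputs.feature_maps:
    print(list(fm.shape))
```

This multi-scale output is what detection and segmentation heads typically consume, since they combine coarse, semantically rich maps with finer, higher-resolution ones.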
## AutoFeatureExtractor
For audio tasks, a feature extractor processes the audio signal into the correct input format.
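As a minimal sketch of this, the following loads a feature extractor with `AutoFeatureExtractor.from_pretrained()` and applies it to a waveform. The silent one-second array is a hypothetical stand-in for real audio, and the Wav2Vec2 checkpoint is chosen only for illustration:

```python
import numpy as np
from transformers import AutoFeatureExtractor

# Hypothetical input: one second of silence sampled at 16 kHz,
# standing in for a real recording.
waveform = np.zeros(16000, dtype=np.float32)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# The feature extractor normalizes the signal and packs it into a
# batched tensor ready for the model.
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
print(inputs.input_values.shape)  # a (batch, sequence_length) tensor
```

Passing `sampling_rate` lets the feature extractor verify that the audio matches the rate the model was trained on.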