import streamlit as st
from streamlit_extras.switch_page_button import switch_page

translations = {
    'en': {
        'title': 'DenseConnector',
        'original_tweet':
            """
            [Original tweet](https://twitter.com/mervenoyann/status/1796089181988352216) (May 30, 2024)
            """,
        'tweet_1':
            """
            Do we fully leverage image encoders in vision language models? 👀
            A new paper built a dense connector that does it better! Let's dig in 🧶
            """,
        'tweet_2':
            """
            VLMs consist of an image encoder block, a projection layer that projects image embeddings into the text embedding space, and a text decoder, connected sequentially 📖
            This [paper](https://t.co/DPQzbj0eWm) explores using intermediate states of the image encoder rather than a single output 🤩
            """,
        'tweet_3':
            """
            The authors explore three different ways of instantiating the dense connector: sparse token integration, sparse channel integration and dense channel integration (each of them just takes intermediate outputs and combines them in a different way, see below).
            """,
        'tweet_4':
            """
            They integrated all three into LLaVA 1.5 and found that each of the new models is superior to the original LLaVA 1.5.
            """,
        'tweet_5':
            """
            I tried the [model](https://huggingface.co/spaces/HuanjinYao/DenseConnector-v1.5-8B) and it seems to work very well 🥹
            The authors have released various [checkpoints](https://t.co/iF8zM2qvDa) based on different decoders (Vicuna 7/13B and Llama 3-8B).
            """,
        'ressources':
            """
            Resources:
            [Dense Connector for MLLMs](https://arxiv.org/abs/2405.13800)
            by Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang (2024)
            [GitHub](https://github.com/HJYao00/DenseConnector)
            """
    },
    'fr': {
        'title': 'DenseConnector',
        'original_tweet':
            """
            [Tweet de base](https://twitter.com/mervenoyann/status/1796089181988352216) (en anglais) (30 mai 2024)
            """,
        'tweet_1':
            """
            Exploitons-nous pleinement les encodeurs d'images dans les modèles de langage/vision ? 👀
            Un nouveau papier a construit un connecteur dense qui le fait mieux ! Creusons un peu 🧶
            """,
        'tweet_2':
            """
            Les VLM se composent d'un bloc encodeur d'images, d'une couche de projection qui projette les enchâssements d'images dans l'espace d'enchâssement du texte, puis d'un décodeur de texte connecté séquentiellement 📖.
            Ce [papier](https://t.co/DPQzbj0eWm) explore l'utilisation d'états intermédiaires de l'encodeur d'images et non d'une sortie unique 🤩
            """,
        'tweet_3':
            """
            Les auteurs explorent trois manières différentes d'instancier un connecteur dense : l'intégration de tokens épars, l'intégration de canaux épars et l'intégration de canaux denses (chacune d'entre elles prend simplement des sorties intermédiaires et les rassemble de différentes manières, voir ci-dessous).
            """,
        'tweet_4':
            """
            Ils ont exploré les trois modèles intégrés à LLaVA 1.5 et ont constaté que chacun des nouveaux modèles est supérieur au LLaVA 1.5 original.
            """,
        'tweet_5':
            """
            J'ai essayé le [modèle](https://huggingface.co/spaces/HuanjinYao/DenseConnector-v1.5-8B) et il semble fonctionner très bien 🥹
            Les auteurs ont publié plusieurs [checkpoints](https://t.co/iF8zM2qvDa) basés sur différents décodeurs (Vicuna 7/13B et Llama 3-8B).
            """,
        'ressources':
            """
            Ressources :
            [Dense Connector for MLLMs](https://arxiv.org/abs/2405.13800)
            de Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang (2024)
            [GitHub](https://github.com/HJYao00/DenseConnector)
            """
    }
}
def language_selector():
    languages = {'EN': '🇬🇧', 'FR': '🇫🇷'}
    # Streamlit warns on an empty label, so pass a real one and hide it
    # (label_visibility='hidden' keeps the same spacing as an empty label).
    selected_lang = st.selectbox('Language', options=list(languages.keys()), format_func=lambda x: languages[x], key='lang_selector', label_visibility='hidden')
    return 'en' if selected_lang == 'EN' else 'fr'

left_column, right_column = st.columns([5, 1])

# Add a selector to the right column
with right_column:
    lang = language_selector()

# Add a title to the left column
with left_column:
    st.title(translations[lang]["title"])
st.success(translations[lang]["original_tweet"], icon="ℹ️")
st.markdown(""" """)
st.markdown(translations[lang]["tweet_1"], unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/DenseConnector/image_1.jpg", use_container_width=True)
st.markdown(""" """)
st.markdown(translations[lang]["tweet_2"], unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/DenseConnector/image_2.jpg", use_container_width=True)
st.markdown(""" """)
st.markdown(translations[lang]["tweet_3"], unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/DenseConnector/image_3.jpg", use_container_width=True)
st.markdown(""" """)
st.markdown(translations[lang]["tweet_4"], unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/DenseConnector/image_4.jpg", use_container_width=True)
st.markdown(""" """)
st.markdown(translations[lang]["tweet_5"], unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/DenseConnector/image_5.jpg", use_container_width=True)
st.markdown(""" """)
st.info(translations[lang]["ressources"], icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
# Navigation buttons: previous paper, home, next paper
col1, col2, col3 = st.columns(3)
with col1:
    if lang == "en":
        if st.button('Previous paper', use_container_width=True):
            switch_page("CuMo")
    else:
        if st.button('Papier précédent', use_container_width=True):
            switch_page("CuMo")
with col2:
    if lang == "en":
        if st.button("Home", use_container_width=True):
            switch_page("Home")
    else:
        if st.button("Accueil", use_container_width=True):
            switch_page("Home")
with col3:
    if lang == "en":
        if st.button("Next paper", use_container_width=True):
            switch_page("Depth Anything v2")
    else:
        if st.button("Papier suivant", use_container_width=True):
            switch_page("Depth Anything v2")
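
The navigation block repeats the same button and `switch_page` pattern once per language and per column. Below is a minimal sketch of how the labels and targets could be looked up from data instead. It assumes a hypothetical `NAV_BUTTONS` table and `nav_items` helper, neither of which is in the original page, and the English fallback for unknown language codes is also an assumption of this sketch.

```python
# Hypothetical lookup table: one entry per column, in display order.
# Labels and target page names are copied from the page above.
NAV_BUTTONS = [
    ({"en": "Previous paper", "fr": "Papier précédent"}, "CuMo"),
    ({"en": "Home", "fr": "Accueil"}, "Home"),
    ({"en": "Next paper", "fr": "Papier suivant"}, "Depth Anything v2"),
]


def nav_items(lang):
    """Return (label, target_page) pairs for the given language code.

    Unknown language codes fall back to the English label
    (an assumption of this sketch, not behavior of the original page).
    """
    return [(labels.get(lang, labels["en"]), target) for labels, target in NAV_BUTTONS]


# Possible use inside the Streamlit page (left as comments so the
# sketch runs without Streamlit installed):
#
# for col, (label, target) in zip(st.columns(3), nav_items(lang)):
#     with col:
#         if st.button(label, use_container_width=True):
#             switch_page(target)
```

With this layout, adding a language only requires a new key in each label dictionary, and the button logic stays in one place.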