Multimodal models, from https://huggingface.co/collections/merve/mit-talk-31-10-papers-671f6a16e156f77739820c89 (MIT Talk 31/10 Papers):
- NVLM: Describes images using vision-language integration.
- BRAVE: Detects multiple objects in cluttered scenes.
- Mini-Gemini: Answers questions about images.
- Unified OCR: Extracts text from diverse images.
- EVA-CLIP: Matches text and images.
- BLIP: Efficient text/image retrieval.
- LLM-in-Vision: Describes complex visual scenes.
- Efficient Retrieval Model: Fast document search.
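
Several entries above (matching text and images, text/image retrieval) come down to comparing embeddings in a shared vector space. As a rough illustration only, here is a minimal sketch of that idea using cosine similarity. The vectors are hand-written toy values rather than outputs of EVA-CLIP or BLIP, and the helper names `cosine_similarity` and `best_match` are hypothetical, not part of any model's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(text_embedding, image_embeddings):
    """Return the image name whose embedding is closest to the text embedding,
    together with the similarity score of every candidate."""
    scores = {
        name: cosine_similarity(text_embedding, vec)
        for name, vec in image_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (assumption: a real model would produce much longer vectors).
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
}
text_embedding = [0.8, 0.2, 0.1]  # stands in for the caption "a photo of a cat"

best, scores = best_match(text_embedding, image_embeddings)
print(best)
```

In a real setup the two embeddings would come from the model's image and text encoders. Retrieval then means ranking every candidate by this score instead of keeping only the top one.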