Earth Observation Foundation Models · 10 people

OlmoEarth v1.1: A more efficient family of OlmoEarth models

Ai2

Earth Observation Foundation Models · 2605.20804 · 2026-05-20

Image Generation / Vision Models · 30 people

Qwen-Image-VAE-2.0 Technical Report

Alibaba Qwen

Image Generation / Vision Models · 2605.13565 · 2026-05-13

Image Generation / Vision Models · 75 people

Qwen-Image-2.0 Technical Report

Alibaba Qwen

Image Generation / Vision Models · 2605.10730 · 2026-05-11

Mixture-of-Experts Language Models · 3 people

EMO: Pretraining Mixture of Experts for Emergent Modularity

Ai2

Mixture-of-Experts Language Models · 2605.06663 · 2026-05-07

Vision-Language-Action Models · 29 people

MolmoAct2: Action Reasoning Models for Real-world Deployment

Ai2

Vision-Language-Action Models · 2605.02881 · 2026-05-04

Reasoning Models · 11 people

Nemotron 3 Super: Open, efficient mixture-of-experts hybrid mamba-transformer model for agentic reasoning

NVIDIA

Reasoning Models · 2604.12374 · 2026-04-14

Medical Multimodal Models · 42 people

MedGemma 1.5 Technical Report

Google Gemini

Medical Multimodal Models · 2604.05081 · 2026-04-06

Large Language Models · 22 people

Olmo Hybrid: From Theory to Practice and Back

Ai2

Large Language Models · 2604.03444 · 2026-04-03

Reasoning Models · 17 people

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

NVIDIA

Reasoning Models · 2603.19220 · 2026-03-19

OCR / Document Intelligence Models · 23 people

GLM-OCR Technical Report

Z.ai

OCR / Document Intelligence Models · 2603.10910 · 2026-03-11

Large Language Models · 12 people

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Microsoft

Large Language Models · 2603.05168 · 2026-03-05

Reasoning Models · 6 people

Phi-4-reasoning-vision-15B Technical Report

Microsoft

Reasoning Models · 2603.03975 · 2026-03-04

Large Language Models · 190 people

GLM-5: Thinking, Coding, and Agentic Intelligence

Z.ai

Large Language Models · 2602.15763 · 2026-02-17

Retrieval Embedding Models · 12 people

Nemotron ColEmbed V2: Top-Performing Late Interaction Embedding Models for Visual Document Retrieval

NVIDIA

Retrieval Embedding Models · 2602.03992 · 2026-02-03

Multimodal Agentic Models · 324 people

Kimi K2.5: Visual Agentic Intelligence

Moonshot AI

Multimodal Agentic Models · 2602.02276 · 2026-02-02

Speech and Audio Models · 13 people

Qwen3-ASR Technical Report

Alibaba Qwen

Speech and Audio Models · 2601.21337 · 2026-01-29

Speech and Audio Models · 16 people

Qwen3-TTS Technical Report

Alibaba Qwen

Speech and Audio Models · 2601.15621 · 2026-01-22

Translation Models · 21 people

TranslateGemma Technical Report

Google Gemini

Translation Models · 2601.09012 · 2026-01-13

Alignment and Safety · 28 people

Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Anthropic

Alignment and Safety · 2601.04603 · 2026-01-08

Multimodal Agentic Models · 11 people

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Microsoft

Multimodal Agentic Models · 2512.22047 · 2025-12-26

Large Language Models · 6 people

NVIDIA Nemotron 3: Efficient and Open Intelligence

NVIDIA

Large Language Models · 2512.20856 · 2025-12-24

Reasoning Models · 11 people

Nemotron 3 Nano: Open, efficient mixture-of-experts hybrid mamba-transformer model for agentic reasoning

NVIDIA

Reasoning Models · 2512.20848 · 2025-12-23

Reasoning Models · 7 people

Seed-Prover-1.5: Stronger Training-Time and Test-Time Scaling for Neural Theorem Proving

ByteDance Seed

Reasoning Models · 2512.17260 · 2025-12-19

Mathematical Reasoning Models · 10 people

Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

NVIDIA

Mathematical Reasoning Models · 2512.15489 · 2025-12-17

Large Language Models · 67 people

Olmo 3

Ai2

Large Language Models · 2512.13961 · 2025-12-15

Foundation Models · 33 people

LFM2 Technical Report

Liquid AI

Foundation Models · 2511.23404 · 2025-12-01

Large Language Models · 15 people

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

NVIDIA

Large Language Models · 2511.18890 · 2025-11-24

Speech and Audio Models · 14 people

Step-Audio-EditX Technical Report

Stepfun

Speech and Audio Models · 2511.03601 · 2025-11-05

Code Language Models · 9 people

CWM: An Open-Weights LLM for Research on Code Generation with World Models

Meta AI

Code Language Models · 2509.12054 · 2025-09-24

Text Embedding Models · 6 people

EmbeddingGemma: Open Models for Text Similarity Search

Google Gemini

Text Embedding Models · 2509.20354 · 2025-09-24

Multimodal Models · 16 people

Qwen3-Omni Technical Report

Alibaba Qwen

Multimodal Models · 2509.17765 · 2025-09-22

Speech and Audio Models · 5 people

Continuous Audio Language Models

Kyutai

Speech and Audio Models · 2509.06926 · 2025-09-08

Reasoning Models · 9 people

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

NVIDIA

Reasoning Models · 2508.14444 · 2025-08-20

Agentic Language Models · 7 people

xLAM-2 Technical Report

Salesforce AI Research

Agentic Language Models · 2508.14935 · 2025-08-20

Vision-Language-Action Models · 19 people

MolmoAct: Action Reasoning Models that can Reason in Space

Ai2

Vision-Language-Action Models · 2508.07917 · 2025-08-11

Language Models · 96 people

GLM-4.5: Agentic, Reasoning, and Coding Foundation Models

Z.ai

Language Models · 2508.06471 · 2025-08-08

Reasoning Models · 9 people

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving with Tree Search and Reinforcement Learning

ByteDance Seed

Reasoning Models · 2507.23726 · 2025-07-30

Reasoning Models · 18 people

Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning

ByteDance Seed

Reasoning Models · 2507.19849 · 2025-07-24

Audio Language Models · 10 people

Step-Audio 2: Cascaded Multimodal Large Language Models with Versatile Speech Capabilities

Stepfun

Audio Language Models · 2507.16632 · 2025-07-22

Speech Language Models · 100 people

Voxtral Technical Report

Mistral AI

Speech Language Models · 2507.13264 · 2025-07-17

Multimodal Language Models · 94 people

Apple Intelligence Foundation Language Models: Tech Report 2025

Apple

Multimodal Language Models · 2507.13575 · 2025-07-16

Large Language Models · 23 people

FlexOlmo: Open Language Models for Flexible Data Use

Ai2

Large Language Models · 2507.07024 · 2025-07-09

Biomedical Language Models · 3 people

TxGemma: Open Therapeutic Language Models

Google Gemini

Biomedical Language Models · 2507.07023 · 2025-07-09

Model Safety / System Cards · 7 people

Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework

Amazon

Model Safety / System Cards · 2507.06260 · 2025-07-07

Medical Multimodal Models · 62 people

MedGemma Technical Report

Google Gemini

Medical Multimodal Models · 2507.05201 · 2025-07-07

Multimodal Models · 12 people

GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Z.ai

Multimodal Models · 2507.01006 · 2025-07-01

Language Models · 3 people

ERNIE 4.5 Tiny Technical Report

Baidu

Language Models · 2025-06-30

Multimodal Language Models · 7 people

ERNIE 4.5 Technical Report

Baidu

Multimodal Language Models · 2025-06-30

Reasoning Large Language Models · 144 people

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

MiniMax

Reasoning Large Language Models · 2506.13585 · 2025-06-16

Reasoning Models · 10 people

Magistral: Efficient Training of Small Language Models for Reasoning

Mistral AI

Reasoning Models · 2506.10910 · 2025-06-12

Text Embeddings and Retrieval · 12 people

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Alibaba Qwen

Text Embeddings and Retrieval · 2506.05176 · 2025-06-05

Code Models · 26 people

Seed-Coder: Let the Code Model Curate Data for Itself

ByteDance Seed

Code Models · 2506.03524 · 2025-06-04

Diffusion Language Models · 14 people

On Gemini Diffusion

Google Gemini

Diffusion Language Models · 2505.20099 · 2025-05-27

Multimodal Large Language Models · 96 people

Gemma 3n Technical Report

Google Gemini

Multimodal Large Language Models · 2025-05-20

Speech Language Models · 37 people

Amazon Nova Sonic Technical Report

Amazon

Speech Language Models · 2505.11298 · 2025-05-15

Large Language Models · 60 people

Qwen3 Technical Report

Alibaba Qwen

Large Language Models · 2505.09388 · 2025-05-14

Multimodal Language Models · 14 people

Aya Vision: Advancing the Frontier of Multilingual Multimodality

Cohere

Multimodal Language Models · 2505.08751 · 2025-05-13

Speech Language Models · 10 people

MiniMax-Speech: Intrinsic Zero-Shot Speech Understanding for Advanced Foundation Models

MiniMax

Speech Language Models · 2505.07916 · 2025-05-12

Vision-Language Models · 21 people

Seed1.5-VL Technical Report

ByteDance Seed

Vision-Language Models · 2505.07062 · 2025-05-11

Mathematical Reasoning Models · 6 people

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning and Monte-Carlo Tree Search with Proof Assistant Feedback

DeepSeek

Mathematical Reasoning Models · 2504.21801 · 2025-04-30

Large Language Models · 61 people

Amazon Nova Premier Technical Report

Amazon

Large Language Models · 2025-04-30

Reasoning Models · 18 people

Phi-4-mini-reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Microsoft

Reasoning Models · 2504.21233 · 2025-04-29

Reasoning Models · 8 people

Phi-4-reasoning Technical Report

Microsoft

Reasoning Models · 2504.21318 · 2025-04-29

Large Language Models · 8 people

BitNet b1.58 2B4T Technical Report

Microsoft

Large Language Models · 2504.12285 · 2025-04-16

Reasoning Models · 13 people

Nemotron-CrossThink: Efficient Knowledge Distillation of Long Chain-of-Thought Reasoning

NVIDIA

Reasoning Models · 2504.13941 · 2025-04-15

Reasoning Models · 10 people

GLM-Z1-Rumination: An Open Frontier-Class Reasoning Model Through Test-Time Scaling

Z.ai

Reasoning Models · 2025-04-15

Vision-Language Models · 50 people

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Shanghai AI Laboratory

Vision-Language Models · 2504.10479 · 2025-04-14

Reasoning Models · 268 people

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

ByteDance Seed

Reasoning Models · 2504.13914 · 2025-04-10

Vision-Language Models · 94 people

Kimi-VL Technical Report

Moonshot AI

Vision-Language Models · 2504.07491 · 2025-04-10

Large Language Models · 13 people

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

NVIDIA

Large Language Models · 2504.03624 · 2025-04-04

Reasoning Models · 11 people

Hunyuan-T1: Scaling Up Test-Time Compute with Open-Source Reinforcement Learning

Tencent Hunyuan

Reasoning Models · 2504.02234 · 2025-04-03

Safety and Moderation Models · 17 people

ShieldGemma 2: Robust and Tractable Image Content Moderation

Google Gemini

Safety and Moderation Models · 2504.01081 · 2025-04-01

Large Language Models · 157 people

Command A: An Enterprise-Ready Large Language Model

Cohere

Large Language Models · 2504.00698 · 2025-04-01

Large Language Models · 18 people

Mistral Small 3.1 Technical Report

Mistral AI

Large Language Models · 2503.23335 · 2025-03-31

Interpretability · 30 people

Tracing the thoughts of a large language model

Anthropic

Interpretability · 2025-03-27

Interpretability · 13 people

On the Biology of a Large Language Model

Anthropic

Interpretability · 2025-03-27

Robotics · 8 people

Gemini Robotics-ER: Transforming Robotic Embodiment

Google Gemini

Robotics · 2503.20031 · 2025-03-27

Reasoning Models · 5 people

QwQ-32B: Embracing the Power of Reinforcement Learning

Alibaba Qwen

Reasoning Models · 2503.20735 · 2025-03-27

Robotics Multimodal Models · 107 people

Gemini Robotics: Bringing AI into the Physical World

Google Gemini

Robotics Multimodal Models · 2503.20020 · 2025-03-27

Multimodal Large Language Models · 12 people

Gemma 3 Technical Report

Google Gemini

Multimodal Large Language Models · 2503.19786 · 2025-03-25

Multimodal Models · 10 people

Qwen2.5-Omni Technical Report

Alibaba Qwen

Multimodal Models · 2503.20215 · 2025-03-23

Reasoning Models · 5 people

Falcon-H1: A Family of Hybrid-Head Language Models for Efficient Reasoning

Technology Innovation Institute

Reasoning Models · 2503.16419 · 2025-03-20

Multimodal Language Models · 785 people

The Amazon Nova family of models: Technical report and model card

Amazon

Multimodal Language Models · 2506.12103 · 2025-03-17

Reasoning Models · 8 people

EXAONE Deep: Reasoning Enhanced Language Models

LG AI Research

Reasoning Models · 2503.12524 · 2025-03-16

Reasoning Models · 10 people

ERNIE-X1 Technical Report

Baidu

Reasoning Models · 2025-03-16

Alignment and Safety · 13 people

Auditing language models for hidden objectives

Anthropic

Alignment and Safety · 2503.10965 · 2025-03-14

Text Embedding Models · 6 people

Gemini Embedding: Generalizable Embeddings From Gemini

Google Gemini

Text Embedding Models · 2503.07891 · 2025-03-11

Language Models · 15 people

Phi-4 Technical Report

Microsoft

Language Models · 2503.01743 · 2025-03-03

Vision-Language Models · 27 people

Qwen2.5-VL Technical Report

Alibaba Qwen

Vision-Language Models · 2502.13923 · 2025-02-19

Medical Language Models · 11 people

Baichuan-M1: Pushing the Medical Capability of Large Language Models

Baichuan

Medical Language Models · 2502.12671 · 2025-02-18

Multimodal Agent Models · 14 people

Magma: A Foundation Model for Multimodal AI Agents

Microsoft

Multimodal Agent Models · 2502.13130 · 2025-02-18

Alignment and Safety · 13 people

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Anthropic

Alignment and Safety · 2501.18837 · 2025-01-31

Multimodal Large Language Models · 13 people

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

DeepSeek

Multimodal Large Language Models · 2501.17811 · 2025-01-29

Language Models · 28 people

Qwen2.5-1M Technical Report

Alibaba Qwen

Language Models · 2501.15383 · 2025-01-26

Code Language Models · 9 people

Scaling Granite Code Models to 128K Context

IBM Research

Code Language Models · 2501.15305 · 2025-01-25

Large Language Models · 10 people

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek

Large Language Models · 2501.12948 · 2025-01-22

Multimodal Agentic Models · 14 people

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

ByteDance Seed

Multimodal Agentic Models · 2501.12326 · 2025-01-21

Large Language Models · 11 people

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Moonshot AI

Large Language Models · 2501.12599 · 2025-01-21

Large Language Models · 19 people

MiniMax-Text-01

MiniMax

Large Language Models · 2025-01-15

Vision-Language Models · 18 people

MiniMax-VL-01

MiniMax

Vision-Language Models · 2025-01-15

Large Language Models · 38 people

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax

Large Language Models · 2501.08313 · 2025-01-14

Large Language Models · 39 people

2 OLMo 2 Furious

Ai2

Large Language Models · 2501.00656 · 2024-12-31

Large Language Models · 197 people

DeepSeek-V3 Technical Report

DeepSeek

Large Language Models · 2412.19437 · 2024-12-27

Reasoning Models · 18 people

OpenAI o1 System Card

OpenAI

Reasoning Models · 2412.16720 · 2024-12-21

Large Language Models · 42 people

Qwen2.5 Technical Report

Alibaba Qwen

Large Language Models · 2412.15115 · 2024-12-19

Alignment and Safety · 20 people

Alignment faking in large language models

Anthropic

Alignment and Safety · 2412.14093 · 2024-12-18

Vision-Language Models · 11 people

FastVLM: Efficient Vision Encoding for Vision Language Models

Apple

Vision-Language Models · 2412.13303 · 2024-12-17

Vision-Language Models · 13 people

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

DeepSeek

Vision-Language Models · 2412.10302 · 2024-12-12

Language Models · 8 people

Large Concept Models: Language Modeling in a Sentence Representation Space

Meta AI

Language Models · 2412.08821 · 2024-12-11

Large Language Models · 9 people

EXAONE 3.5: Series of Language Models for Real-world Use Cases

LG AI Research

Large Language Models · 2412.04862 · 2024-12-06

Multimodal Language Models · 35 people

NVLM: Open Frontier-Class Multimodal LLMs

NVIDIA

Multimodal Language Models · 2412.04468 · 2024-12-05

Vision-Language Models · 13 people

PaliGemma 2: A Family of Versatile VLMs for Transfer

Google Gemini

Vision-Language Models · 2412.03555 · 2024-12-04

Audio Language Models · 12 people

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbots

Z.ai

Audio Language Models · 2412.02612 · 2024-12-04

Text Embeddings and Retrieval · 4 people

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Snowflake

Text Embeddings and Retrieval · 2412.04506 · 2024-12-03

Large Language Models · 26 people

Yi-Lightning Technical Report

01.AI

Large Language Models · 2412.01253 · 2024-12-02

LLM Post-Training · 23 people

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Ai2

LLM Post-Training · 2411.15124 · 2024-11-22

Vision-Language Models · 22 people

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

DeepSeek

Vision-Language Models · 2411.07975 · 2024-11-11

Large Language Models · 108 people

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Tencent Hunyuan

Large Language Models · 2411.02265 · 2024-11-04

Model Safety / System Cards · 419 people

GPT-4o System Card

OpenAI

Model Safety / System Cards · 2410.21276 · 2024-10-25

Vision-Language Models · 22 people

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

DeepSeek

Vision-Language Models · 2410.13848 · 2024-10-18

Multimodal Large Language Models · 21 people

Pixtral 12B

Mistral AI

Multimodal Large Language Models · 2410.07073 · 2024-10-09

Large Language Models · 6 people

Falcon Mamba 7B: The First Competitive Attention-free 7B Language Model

Technology Innovation Institute

Large Language Models · 2410.05355 · 2024-10-07

Multimodal Agentic Models · 8 people

UGROUND-V1: A Fully Open Large Multimodal GUI Agent Model

OSU NLP Group

Multimodal Agentic Models · 2410.05243 · 2024-10-07

Speech Language Models · 7 people

Moshi: a speech-text foundation model for real-time dialogue

Kyutai

Speech Language Models · 2410.00037 · 2024-09-30

Multimodal Language Models · 23 people

MM1.5: Methods, Analysis and Insights from Multimodal LLM Fine-tuning

Apple

Multimodal Language Models · 2409.20566 · 2024-09-30

Multimodal Models · 26 people

Emu3: Next-Token Prediction is All You Need

BIGAI

Multimodal Models · 2409.18869 · 2024-09-27

Vision-Language Models · 18 people

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Ai2

Vision-Language Models · 2409.17146 · 2024-09-25

Reasoning and Math Models · 16 people

Qwen2.5-Math Technical Report

Alibaba Qwen

Reasoning and Math Models · 2409.12122 · 2024-09-18

Code Language Models · 8 people

Qwen2.5-Coder Technical Report

Alibaba Qwen

Code Language Models · 2409.12186 · 2024-09-18

Vision-Language Models · 26 people

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Alibaba Qwen

Vision-Language Models · 2409.12191 · 2024-09-18

Agentic Language Models · 8 people

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Salesforce AI Research

Agentic Language Models · 2409.03215 · 2024-09-05

Large Language Models · 24 people

OLMoE: Open Mixture-of-Experts Language Models

Ai2

Large Language Models · 2409.02060 · 2024-09-03

Language Models · 12 people

Jamba 1.5 Technical Report

AI21 Labs

Language Models · 2408.12570 · 2024-08-22

Vision-Language Models · 20 people

XGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Salesforce AI Research

Vision-Language Models · 2408.08872 · 2024-08-16

Mathematical Reasoning Models · 8 people

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

DeepSeek

Mathematical Reasoning Models · 2408.08152 · 2024-08-14

Safety and Moderation Models · 12 people

ShieldGemma: Generative AI Content Moderation Based on Gemma

Google Gemini

Safety and Moderation Models · 2407.21772 · 2024-07-31

Large Language Models · 19 people

Gemma 2: Improving Open Language Models at a Practical Size

Google Gemini

Large Language Models · 2408.00118 · 2024-07-31

Large Language Models · 69 people

The Llama 3 Herd of Models

Meta AI

Large Language Models · 2407.21783 · 2024-07-31

Multimodal Language Models · 149 people

Apple Intelligence Foundation Language Models

Apple

Multimodal Language Models · 2407.21075 · 2024-07-29

Language Models · 17 people

Falcon2-11B Technical Report

Technology Innovation Institute

Language Models · 2407.14885 · 2024-07-20

Audio Language Models · 26 people

Qwen2-Audio Technical Report

Alibaba Qwen

Audio Language Models · 2407.10759 · 2024-07-14

Vision-Language Models · 14 people

PaliGemma: A versatile 3B VLM for transfer

Google Gemini

Vision-Language Models · 2407.07726 · 2024-07-10

Vision-Language Models · 27 people

InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Shanghai AI Laboratory

Vision-Language Models · 2407.03320 · 2024-07-03

Large Language Models · 19 people

Open Instruct: A Simple Method for Aligning Language Models with Human Preferences

Ai2

Large Language Models · 2406.18405 · 2024-06-26

Large Language Models · 165 people

Nemotron-4 340B Technical Report

NVIDIA

Large Language Models · 2406.11704 · 2024-06-17

Code Language Models · 10 people

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

DeepSeek

Code Language Models · 2406.11931 · 2024-06-17

Code Language Models · 21 people

CodeGemma: Open Code Models Based on Gemma

Google Gemini

Code Language Models · 2406.11409 · 2024-06-17

Speech and Audio Models · 9 people

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Microsoft

Speech and Audio Models · 2406.05370 · 2024-06-08

Reasoning and Math Models · 9 people

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

DeepSeek

Reasoning and Math Models · 2405.14333 · 2024-05-23

Multimodal Large Language Models · 26 people

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Meta AI

Multimodal Large Language Models · 2405.09818 · 2024-05-16

Text Embeddings and Retrieval · 4 people

Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

Snowflake

Text Embeddings and Retrieval · 2405.05374 · 2024-05-08

Code Language Models · 31 people

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

IBM Research

Code Language Models · 2405.04324 · 2024-05-07

Large Language Models · 156 people

DeepSeek-V2 Technical Report

DeepSeek

Large Language Models · 2405.04434 · 2024-05-07

Medical Multimodal Models · 47 people

Advancing Multimodal Medical Capabilities of Gemini

Google Gemini

Medical Multimodal Models · 2405.03162 · 2024-05-06

Large Language Models · 24 people

Snowflake Arctic: An Enterprise LLM

Snowflake

Large Language Models · 2405.00492 · 2024-04-30

Multimodal Models · 9 people

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ByteDance Seed

Multimodal Models · 2404.14396 · 2024-04-22

Large Language Models · 11 people

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Apple

Large Language Models · 2404.14619 · 2024-04-22

Language Models · 7 people

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Microsoft

Language Models · 2404.14219 · 2024-04-22

Large Language Models · 32 people

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Google Gemini

Large Language Models · 2404.07839 · 2024-04-11

Language Models · 8 people

Jamba: A Hybrid Transformer-Mamba Language Model

AI21 Labs

Language Models · 2403.19887 · 2024-03-28

Large Language Models · 28 people

InternLM2 Technical Report

Shanghai AI Laboratory

Large Language Models · 2403.17297 · 2024-03-26

Multimodal Language Models · 31 people

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Apple

Multimodal Language Models · 2403.09611 · 2024-03-14

Large Language Models · 22 people

Gemma: Open Models Based on Gemini Research and Technology

Google Gemini

Large Language Models · 2403.08295 · 2024-03-13

Vision-Language Models · 5 people

DeepSeek-VL: Towards Real-World Vision-Language Understanding

DeepSeek

Vision-Language Models · 2403.05525 · 2024-03-08

Multimodal Models · 60 people

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Google Gemini

Multimodal Models · 2403.05530 · 2024-03-08

Large Language Models · 13 people

Yi: Open Foundation Models by 01.AI

01.AI

Large Language Models · 2403.04652 · 2024-03-07

Large Language Models · 34 people

DBRX: A Generalist Open Source LLM

Databricks

Large Language Models · 2402.19427 · 2024-02-29

Large Language Models · 10 people

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Microsoft

Large Language Models · 2402.17764 · 2024-02-27

Large Language Models · 19 people

Nemotron-4 15B Technical Report

NVIDIA

Large Language Models · 2402.16819 · 2024-02-26

Alignment and Safety · 17 people

Many-shot Jailbreaking

Anthropic

Alignment and Safety · 2402.03206 · 2024-02-12

Speech Language Models · 14 people

SPIrit-LM: Interleaved Spoken and Written Language Model

Meta AI

Speech Language Models · 2402.05755 · 2024-02-09

Text Embeddings and Retrieval · 6 people

Multilingual E5 Text Embeddings: A Technical Report

Microsoft

Text Embeddings and Retrieval · 2402.05672 · 2024-02-08

Mathematical Reasoning Models · 8 people

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeek

Mathematical Reasoning Models · 2402.03300 · 2024-02-06

Large Language Models · 33 people

OLMo: Accelerating the Science of Language Models

Ai2

Large Language Models · 2402.00838 · 2024-02-01

Code Models · 13 people

DeepSeek-Coder: When the Large Language Model Meets Programming

DeepSeek

Code Models · 2401.14196 · 2024-01-25

Mixture-of-Experts Language Models · 17 people

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

DeepSeek

Mixture-of-Experts Language Models · 2401.06066 · 2024-01-11

Alignment and Safety · 39 people

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Anthropic

Alignment and Safety · 2401.05566 · 2024-01-10

Large Language Models · 14 people

Mixtral of Experts

Mistral AI

Large Language Models · 2401.04088 · 2024-01-08

Large Language Models · 86 people

DeepSeek LLM Technical Report

DeepSeek

Large Language Models · 2401.02954 · 2024-01-05

Multimodal Models · 67 people

Gemini: A Family of Highly Capable Multimodal Models

Google Gemini

Multimodal Models · 2312.11805 · 2023-12-19
