About the Role
Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises building AI systems. The company focuses on powering magical experiences such as content generation, semantic search, RAG, and agents, and believes this work is instrumental to the widespread adoption of AI. Cohere is a team of passionate researchers, engineers, and designers committed to increasing model capabilities and customer value.
Cohere strongly believes in the power of multimodal AI to transform how people interact with technology. Its engineering teams are dedicated to pushing the boundaries of what is possible in this field, and this Senior Member of Technical Staff role is central to that effort. The position offers an ideal environment for exploration, innovation, and shaping the future of AI, supported by an exceptional ratio of compute resources to engineers.
The role involves designing and developing cutting-edge multimodal AI systems that integrate modalities such as text, speech, and vision. A key part of the work is conducting research and experiments on advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and related areas. The person in this role will collaborate closely with world-class teams, both learning from and contributing to their collective expertise.
In July 2025, Cohere's Multimodal team introduced Command A Vision, a flagship vision-language model that consistently outperforms major models such as Llama 4 Maverick, Mistral Medium/Pixtral Large, and GPT-4.1, achieving an average benchmark score of 83.1% with 112B parameters while running on just 2 GPUs. This demonstrates Cohere's ability to achieve breakthrough performance with a focused team, and underscores that breakthrough-scale compute is not always necessary.
Responsibilities
- Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision.
- Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more.
- Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field.
Requirements
- Exceptional software engineering skills, with a proven track record of building robust and scalable systems.
- Strong command of Python and fluency in popular deep learning frameworks such as JAX, PyTorch, and TensorFlow, including an understanding of their multimodal capabilities.
- Knowledge of distributed training strategies, especially for large-scale multimodal models.
- Familiarity with autoregressive models, particularly their application to multimodal tasks such as image or video captioning and speech-to-text generation.
Qualifications
- Publications in top-tier venues demonstrating your expertise in multimodal AI research.
- Experience writing efficient GPU kernels in CUDA and optimising performance for multimodal tasks.
Benefits
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible, with offices in Toronto, New York, San Francisco, London, and Paris, plus a co-working stipend
- 6 weeks of vacation (30 working days!)