Visual Language Models: when robots understand their surroundings

06/02/2026

One of the main challenges in advanced industrial robotics is enabling robots not only to capture information from their environment, but also to interpret it coherently and in context. Understanding requires more than seeing: to operate autonomously and reliably in real-world settings, robots must integrate data from multiple sources, such as cameras, proximity sensors, LiDAR, and microphones, and turn that data into actionable knowledge in real time.

Machine vision has traditionally relied on specialized models trained for specific tasks, which depend heavily on labeled data and controlled scenarios. Although these methods have proven effective in well-defined industrial contexts, they show clear limitations when faced with dynamic environments, operational variability, or situations that were not covered during training.

In this context, the emergence of Visual Language Models (VLMs) represents a paradigm shift. These models combine the capabilities of machine vision and natural language processing into a unified architecture, making it possible to associate visual elements with high-level linguistic concepts. The result is a deeper understanding of the environment, which is based not only on visual patterns, but also on semantics, context, and relationships between objects and actions.
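One common way to associate visual elements with linguistic concepts is to map images and text into a shared embedding space and compare them by cosine similarity. The sketch below illustrates only that matching step. The vectors are hand-made toy values standing in for the output of real image and text encoders, and the function names are hypothetical. The article does not say which architecture is used in practice, so this is an illustration of the general idea rather than a description of any specific system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, score) pairs sorted from best to worst match."""
    scores = {
        caption: cosine(image_embedding, embedding)
        for caption, embedding in caption_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# ASSUMPTION: toy 3-dimensional vectors. A real encoder would produce
# embeddings with hundreds of dimensions, learned from image-text pairs.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a pallet on the floor": [0.8, 0.2, 0.1],
    "an empty aisle": [0.1, 0.9, 0.1],
    "a person near a forklift": [0.2, 0.1, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
best_caption, best_score = ranking[0]
print(best_caption, round(best_score, 3))
```

Because the candidate captions are ordinary text, the set of concepts can be changed without retraining a classifier, which is one reason this kind of model adapts more easily to new scenarios.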

From a technical perspective, VLMs generalize better across domains, which reduces the need for task-specific training in each use case and makes it easier to transfer knowledge between scenarios. Models of this type have been widely studied and have shown a remarkable capacity for interpreting images from natural language descriptions, and vice versa.

At GMV, these capabilities are being brought into operational environments and made available on the market through their integration into uPathWay, the company’s intelligent platform for managing, orchestrating, and optimizing heterogeneous fleets of robots and autonomous vehicles in industrial settings. Incorporating VLMs opens the door to new interaction and supervision scenarios by adding a layer of contextual intelligence on top of traditional perception.

Some of the most notable use cases now include:

  • Monitoring robots through natural language queries backed by visual information, which makes human-robot interaction more intuitive and lowers the technical barrier for operators and supervisors.
  • Automatic generation of descriptions for operating conditions and incidents, based on images or video sequences captured by the robots themselves.
  • Visual validation of tasks, such as automated confirmation that a load, a pallet, or an inspected element is correctly positioned or in its expected state.
  • Context-based detection of anomalies, to identify unexpected situations that were not expressly defined in advance through rules or models.
  • More natural and flexible interfaces that can support decision-making, by combining natural language prompts and visual information from the environment.
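
As an illustration of the visual validation use case, the sketch below wraps a yes/no question around an image and treats any ambiguous answer as a case for human review. Everything here is an assumption made for the example: `ask_vlm` is a hypothetical callable standing in for whatever model interface is available, `fake_vlm` is a stub so the code runs without a model, and none of these names come from uPathWay or any real library.

```python
from dataclasses import dataclass
from typing import Callable

# ASSUMPTION: a hypothetical model interface that takes image bytes and a
# text prompt and returns the model's text answer.
VlmFn = Callable[[bytes, str], str]


@dataclass
class ValidationResult:
    passed: bool        # True only when the model clearly answered YES
    needs_review: bool  # True when the answer was neither YES nor NO
    raw_answer: str     # Unmodified model output, kept for traceability


def validate_task(image: bytes, expectation: str, ask_vlm: VlmFn) -> ValidationResult:
    """Ask the model whether the image matches the expected state."""
    prompt = f"Answer strictly YES or NO. Does the image show that {expectation}?"
    answer = ask_vlm(image, prompt).strip()
    verdict = answer.upper().rstrip(".")
    if verdict == "YES":
        return ValidationResult(True, False, answer)
    if verdict == "NO":
        return ValidationResult(False, False, answer)
    # Ambiguous output: do not guess, escalate to a human operator.
    return ValidationResult(False, True, answer)


def fake_vlm(image: bytes, prompt: str) -> str:
    """Stub used only for this example; a real model would inspect the image."""
    return "Yes." if image == b"pallet-ok" else "I cannot tell."


expectation = "the pallet is centered on the rack"
ok = validate_task(b"pallet-ok", expectation, fake_vlm)
unclear = validate_task(b"blurry", expectation, fake_vlm)
print(ok)
print(unclear)
```

Keeping the raw answer and an explicit review flag is a design choice for this sketch: it avoids turning an uncertain model response into a silent pass or fail.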

Together, these capabilities make for robotics that is more autonomous, explainable, and scalable, and that can adapt to industrial environments that are complex, dynamic, or highly uncertain. Beyond improving task automation, VLMs enable progress toward systems that not only execute instructions but also interpret and communicate what is happening around them.

GMV continues to work on integrating advanced perception and contextual intelligence as key drivers of future automation, with the aim of moving these technologies out of the research phase and into real operational applications.

Author: Ángel C. Lázaro

© 2026, GMV Innovating Solutions S.L.