IAV VoiceMind
When the Avatar Becomes a Tank Trainer: AI-Assisted Training with VoiceMind.
With VoiceMind, IAV has developed a platform that redefines voice control. At its core is a lip-synced avatar that serves as a virtual point of contact for the user. The system is powered by an advanced Large Language Model (LLM) that understands content, responds in context, and adapts flexibly to follow-up questions.
VoiceMind is currently used in the civilian vehicle sector. However, its potential extends further:
The platform is set to support the military environment in the future – for example, in the training of tank drivers. VoiceMind is thus a prime example of a technology originally developed for the automotive sector being transferred to a new domain. Instead of lengthy manuals or rigid training formats, VoiceMind offers an interactive, voice-controlled guide that works in all common languages and runs on standard devices. Notably, VoiceMind can be used even before face-to-face training to clarify initial questions, convey the basics, and make the training itself more efficient.
We speak with Dr. Thomas Wiesner, an expert in AI-assisted voice control at IAV, about the vision, technical foundations, and specific applications in the defense sector.

Question: Dr. Wiesner, how did the idea to design VoiceMind for the military sector come about?
Answer: The idea arose from a specific challenge: training tank drivers is complex, time-consuming, and often hampered by language barriers. Especially in international missions, such as training Ukrainian forces on the Leopard 2, participants do not necessarily speak English or German. VoiceMind can help here as a digital trainer – language-independent, interactive, and flexible. It helps build a foundation before a human trainer is even on site.
Question: What exactly does VoiceMind do in this context?
Answer: VoiceMind takes on the role of a virtual trainer. It is based on the vehicle's manual and training materials and explains content step by step. The user can ask follow-up questions at any time, receive additional information, and be actively guided through the learning process. The system responds to comprehension problems and adjusts the explanation accordingly – all in the user's native language.
Question: How does interaction with the system work?
Answer: VoiceMind is not an audiobook. It guides the user through the content, asks follow-up questions like "Did you find the switch?" and offers additional hints or visual support if needed. Communication is conversational and dynamic – just like in real training. Our human-like avatar is particularly noteworthy: it creates a personal connection similar to the one with a human trainer.
Question: How does VoiceMind handle language barriers?
Answer: The system works in all common languages – in both directions. The user speaks in their native language; VoiceMind recognizes it and responds in the same language. This also works if the language changes between questions. Thanks to the LLM, the system also understands incomplete or complex questions and provides appropriate answers. This makes training considerably easier.
Question: How flexible is VoiceMind with different vehicle types or hardware?
Answer: Very flexible. If training is needed for a different vehicle, we simply swap out the database. Even very old technology that may have been mothballed and is now needed again can be covered. If there is a manual, we can train on it. VoiceMind runs on computers, tablets, or smartphones and does not require special infrastructure. This makes it suitable for use under a wide range of conditions.
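Editorial illustration: the idea of "swapping out the database" can be pictured as dialogue logic that stays the same while the vehicle-specific content is passed in as data. The following is only a minimal sketch under that assumption, not IAV's implementation. The names `KnowledgeBase` and `Trainer`, the keyword lookup, and the placeholder texts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeBase:
    """Hypothetical container for the content of one vehicle's manual."""
    vehicle: str
    topics: dict[str, str]  # topic keyword -> explanation text


class Trainer:
    """Vehicle-agnostic dialogue logic; all content comes from the knowledge base."""

    def __init__(self, kb: KnowledgeBase) -> None:
        self.kb = kb

    def explain(self, question: str) -> str:
        # Naive keyword lookup; a real system would use an LLM here.
        q = question.lower()
        for topic, text in self.kb.topics.items():
            if topic in q:
                return f"[{self.kb.vehicle}] {text}"
        return f"[{self.kb.vehicle}] I could not find that in the manual."


# Placeholder content only; no real manual data is shown here.
vehicle_a = KnowledgeBase("Vehicle A", {"ignition": "Placeholder text about the ignition."})
vehicle_b = KnowledgeBase("Vehicle B", {"ignition": "Different placeholder text."})

# Switching vehicles means constructing the same Trainer with other data.
answer_a = Trainer(vehicle_a).explain("Where is the ignition?")
answer_b = Trainer(vehicle_b).explain("Where is the ignition?")
```

In this sketch the `Trainer` class contains no vehicle-specific code, so supporting another vehicle needs only a new `KnowledgeBase` instance.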
Question: What about data security?
Answer: VoiceMind itself does not store user data. Everything spoken is processed directly. In principle, VoiceMind can also be operated locally without cloud services, provided sufficiently powerful hardware is used. This means that data sovereignty remains entirely with the user – a crucial advantage in the military environment.
Question: How quickly can a new training session be set up with VoiceMind?
Answer: Once the manual is available, we can set up a functioning training session within a day. Presentations or additional materials can also be quickly integrated. This saves time and resources.
Question: Where does the content that VoiceMind works with come from?
Answer: The basis is the official manual of the respective vehicle. VoiceMind analyzes this and converts it into a dialogue-capable format. Additionally, training materials, presentations, or technical documentation can be integrated into a presentation mode. This ensures full control over what VoiceMind trains and how this training is didactically structured.
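Editorial illustration: one common way to make a manual usable in dialogue is to split it into sections and pick the section that best matches a question. The sketch below shows that idea with simple word overlap. It is an assumption for illustration only and not a description of how VoiceMind works internally; the `# ` heading convention, the function names, and the sample text are hypothetical, and an LLM-based system would likely use a more capable matching method.

```python
import re
from typing import Optional


def split_into_sections(manual_text: str) -> dict[str, str]:
    """Split plain text into sections keyed by heading.

    Assumes (hypothetically) that every heading is a line starting with '# '.
    """
    sections: dict[str, str] = {}
    heading: Optional[str] = None
    for line in manual_text.splitlines():
        if line.startswith("# "):
            heading = line[2:].strip()
            sections[heading] = ""
        elif heading is not None and line.strip():
            sections[heading] = (sections[heading] + " " + line.strip()).strip()
    return sections


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_section(question: str, sections: dict[str, str]) -> Optional[str]:
    """Return the heading whose text shares the most words with the question."""
    q = tokenize(question)
    best, best_score = None, 0
    for heading, body in sections.items():
        score = len(q & tokenize(heading + " " + body))
        if score > best_score:
            best, best_score = heading, score
    return best


# Invented sample text, not taken from any real manual.
SAMPLE_MANUAL = """# Starting the engine
Turn the main switch to ON and press the start button.
# Parking brake
Pull the lever to engage the parking brake.
"""

sections = split_into_sections(SAMPLE_MANUAL)
match = best_section("How do I engage the parking brake?", sections)
```

Because the answer text comes only from the supplied sections, the author of the material keeps control over what is taught, which is the point made in the answer above.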

Question: What role does the avatar play in the training?
Answer: The avatar creates an emotional connection. It makes a difference whether I speak with an anonymous system or with a virtual person who comes across as likeable. This increases acceptance, builds trust, and improves the learning experience.
Question: Can VoiceMind also check learning success?
Answer: Yes. At the end of the training, VoiceMind can ask targeted follow-up questions: "Do you have any open questions?" or "Can you summarize the tactical advantage of this system for me?" This creates a feedback loop that ensures learning success and continuously improves the content.
Question: What advantages does VoiceMind bring to the organization of training?
Answer: VoiceMind significantly reduces the number of training hours required. Training can take place independently of location, minimizing logistical effort. At the same time, fewer personnel are needed – with VoiceMind, many participants can be prepared simultaneously, with each receiving individual instruction and being able to ask questions. In addition, VoiceMind does not get sick, is always available, and can be flexibly adapted to different devices and content.
Question: Dr. Wiesner, what other applications do you see for VoiceMind beyond training?
Answer: VoiceMind is not just a training system – it can also actively support ongoing operations. If questions arise during the operation of a vehicle or system, VoiceMind is immediately available as a point of contact. It explains functions, provides hints, and responds to follow-up questions. A crucial advantage: VoiceMind does not experience combat stress. It remains calm and reacts precisely and reliably – even in situations where human trainers are overwhelmed or unavailable. This makes it a valuable companion, not only in training but also in operational use.
Conclusion:
VoiceMind demonstrates how modern AI technology can support training in the military sector – efficiently, scalably, and without language barriers. Its use as an interactive trainer is a logical next step for IAV and an example of how innovation is put into practice.
Dr. Thomas Wiesner
IAV Team Manager