By Michael Tressler
Sr Solutions Consultant, Jabra
BA, Software Engineering, Ball State University
Contents
About the Author
Preamble
Introduction
What is Intelligent Speaker?
What makes a speaker microphone device an Intelligent speaker?
Why is this even a thing?
What are the requirements for Intelligent Speaker?
Hardware requirements
Software Requirements
Network Requirements
Licensing Requirements
Other stuff
How do I set up Intelligent Speaker?
Configure the hardware
Configure Teams Rooms on Windows
Create/Edit Teams Meeting Policies
Creating Teams Rooms Policy
Creating the End User Policy
Assigning Teams Rooms Policy for n00bs
Assigning Teams Rooms Policy for 1337 h4x0r
Assigning End User Policy for n00bs
Assigning End User Policy for 1337 h4x0r
Digital voice profile
Biometric Privacy with Intelligent Speaker
Set up your digital voice profile
How to use Intelligent Speaker in a meeting
Editing mistakes
Troubleshooting
Summary
About the Author
Michael Tressler is a Senior Solutions Consultant at Jabra. He focuses on enabling video sales in the channel via education, training, and awareness with our partners. He is closing in on his first year with Jabra.
Prior to Jabra, Michael worked for 6 years at Microsoft, with three of those years exclusively focused on Microsoft Teams devices such as Teams Rooms on Windows and Teams Rooms on Android.
Michael has trained thousands of partners and customers on Teams Rooms on Windows and Teams Android devices.
Michael is a moderately proud graduate of Ball State University – best known for graduating David Letterman (so the standards at that school are…let’s go with inconsistent).
You can follow Michael on Mastodon via @flinchbot@twit.social.
Note: Everything said in this paper about the Jabra PanaCast 50 also applies to the Lenovo ThinkSmart Bar 180, as they are essentially the same device with different branding.
Introduction
Microsoft announced Intelligent Speaker at Ignite in March of 2021 and it went into preview in the second half of that year. At initial release, EPOS and Yealink were the only two manufacturers to produce Intelligent Speaker certified devices.
In the blog announcement from Microsoft, Intelligent Speaker was defined as such: “…allow attendees to use the transcription to follow along or capture actions, by knowing who in the room said what. Whether you are working remotely or following the meeting in the conference room, you can effectively see who said what during the meeting.”
Cool. What does any of that mean?
Why does this thing exist?
How do you set it up?
Any security issues with this?
Wait – I thought this was about identifying the person talking, so shouldn’t it be called “Intelligent Microphone”?
These questions, and many more, will be answered in the following beautifully worded paragraphs.
What is Intelligent Speaker?
Intelligent Speaker (now often called speaker recognition), at its core, is proprietary Microsoft technology that uniquely identifies a person’s voice so that speech can be accurately attributed to the speaker in places such as meeting rooms. Put another way, when multiple people are speaking in a conference room, and the transcription feature in Microsoft Teams is enabled, how can the sentences and words of each in-room attendee be attributed to them as opposed to being generically attributed to the meeting space?
Here is a sample transcript of a user speaking in a conference room, appropriately named “Epic Conference Room of Awesome”. Note the sentences from this snippet of the transcript are not attributed to a human, but rather to the conference room.
Figure: Donuts are delicious
First off, how many people are in this conversation? One person talking to themselves? Two people? Three?
Let’s say there are two people in this conversation. Who said what? Am I bringing the donuts or is the other person bringing them? Who even is the other person?
Wouldn’t it be swell if the transcript showed the name of the person who said each sentence instead of just the room name?
What makes a speaker microphone device an Intelligent speaker?
As of July 2024, just having a microphone connected to Teams Rooms on Windows makes a device an Intelligent Speaker. But this has not always been the case. In the before times, you needed a specific piece of hardware to use the speaker recognition feature.
Which now raises the question: Is a Microsoft certified Intelligent Speaker still needed? Is there any benefit to these certified devices over “non-certified Intelligent Speakers”?
Good question. But first, let’s dive into a bit of what makes a certified Intelligent Speaker special.
There are some hints of what makes Microsoft certified Intelligent Speakers so fancy. But not much. It is mentioned that Intelligent Speakers include a 7-microphone array to help identify the voices of up to ten people in a meeting room. Little more is given regarding hardware requirements. The Jabra PanaCast 50 (P50) has 8 beamforming microphones, so I guess that’s good enough!
Beyond the hardware, there are the Microsoft services on the backend that really provide the magic powers. Microsoft says that they are leveraging the powers of Microsoft Graph to “…provide[s] access to rich people-centric data and insight in the Microsoft Cloud to contextualize the transcription. For example, because we know who the speaker is, the acronyms, names of colleagues, and different words the speaker uses can be more accurately transcribed.”
Word.
Not mentioned is all the other magic that needs to happen. For example, it must be able to match up a given voice to you. Or me. There is an audio-matching algorithm that must do this. Then there is a speech-to-text service that converts your spoken words to text so that they can be accurately written to the transcript. And then there must be a way for us to manually fix mistakes (if we care enough).
As of October 2023, there are now six hardware devices that Microsoft has certified to use the Intelligent Speaker feature.
- EPOS Capture 5
- Yealink MSpeech
- Sennheiser TeamConnect Intelligent Speaker
- Jabra PanaCast 50 (YEAH BABY!)
- Yealink SmartVision 60
- Lenovo ThinkSmart Bar 180
Back to the question: Are the above devices any better? I don’t know of any specific testing and results. However, Microsoft has repeatedly said that certified Intelligent Speakers will outperform non-certified microphones. In the official Microsoft documentation, it says:
“While we’re delighted to extend the capability of speaker recognition to more rooms, it’s important to note the quality may not match that of an intelligent speaker certified device. So, it’s essential to evaluate the advantages of incorporating an intelligent speaker, especially in crucial spaces where attaining the highest quality transcription and attribution is vital.”
The same caveat appeared in the announcement that speaker recognition no longer requires certified hardware. That announcement was written by Christian Schacht, the Principal Program Manager at Microsoft responsible for the speaker recognition feature:
“While we’re delighted to extend the capability of speaker recognition to more rooms, it’s important to note that the quality may not match that of an intelligent speaker device. Intelligent speakers are designed with multiple microphones to provide high-quality audio, maximizing accuracy in recognition and transcription and boasting an industry-leading reduction of word error rate. In rooms where top-quality transcription and attribution are imperative, it’s worth it to assess the benefits of integrating intelligent speaker hardware certified for Teams.”
Christian Schacht made the point again in a YouTube video explaining this feature and how to set it up. At about the 2:30 mark of the video, he says:
“…so long story short, the certified hardware for intelligent speaker will always be a little bit better because we just get more information um out of the device directly.”
It is ultimately up to you if you want to go the certified hardware route. Just note that for the most accurate transcript, Microsoft believes certified hardware is the best option.
Why is this even a thing?
Why does Intelligent Speaker even exist? Heck – have you ever been on a meeting with transcription enabled?
Me neither.
So why? I’ll tell you why: Because it’s cool technology!
OK, that’s not why. The why is boring and I’m trying to pep up this section. And I think I’m failing. And now I’m just wasting your time. So here we go:
Regulated Persons.
There you go. Good times.
What’s a regulated person?
According to some random website called Law Insider, “Regulated Persons means certain broker-dealers and registered investment advisers that are subject to prohibitions against participating in pay-to-play practices and are subject to the SEC’s oversight and, in the case of broker-dealers, the oversight of a registered national securities association, such as FINRA.”
Put another way, these people have all their communications logged, tracked, and recorded so, should a legal issue arise, they can claim their innocence. Hopefully. Otherwise: jail time.
So how do you track everything someone says when they walk into a common space like a conference room? Hello, Intelligent Speaker.
This has been the primary use case for Intelligent Speaker since its launch. And as such, this has been a niche feature that most Teams Rooms admins either have never heard of or have ignored because there is no need for it.
BUT THAT IS ALL ABOUT TO CHANGE!
Let me introduce you to my little friend – Copilot! Copilot! Copilot!
(Are you surprised it took me this long to get into the hip AI topic of the day?)
What happens if you throw AI, erm, Copilot at a transcript? It can quickly summarize it, pull out notes, and even put together a list of tasks derived from the meeting. And a transcript of a meeting room with proper attribution for Copilot to ingest? That’s like the greatest thing ever.
Now Copilot can summarize tasks like “Michael agreed to buy the donuts” instead of “Conference Room A agreed to buy the donuts”.
And now this Intelligent Speaker feature becomes far more than a niche feature for a handful of regulated persons. It becomes a potential game changer for office workers around the globe.
What are the requirements for Intelligent Speaker?
We now know that Intelligent Speaker will save the planet. Or something like that. How does one set it up? What are the requirements? There are quite a few and I’ll start by setting up the administrative side and then show you how to set up the end-user side.
Hardware requirements
I’ve already mentioned the six supported Intelligent Speakers above. But here they are again with pictures. I’m doing this to drive home just how different the Jabra PanaCast 50 (and the Lenovo ThinkSmart Bar 180) is from the other Intelligent Speakers on the market.
- EPOS Capture 5
- Jabra PanaCast 50
- Lenovo ThinkSmart Bar 180
- Sennheiser TeamConnect Intelligent Speaker
- Yealink MSpeech
- Yealink MVC S60 (Maybe)
Two are all-in-one video bars with industry-leading video and audio. Three of them are speaker pucks and the other is also a center-of-table device. Can you spot which one is the best option?
Note that Intelligent Speaker now works without the need for a certified Intelligent Speaker. See the discussion above. TLDR: any microphone connected to your Teams Rooms on Windows will work, but a certified Intelligent Speaker will work better.
The second hardware requirement is that – as of this writing in October 2023 – the Intelligent Speaker is only available on Microsoft Teams Rooms on Windows. This won’t work on Zoom Rooms (Windows or Android). And it won’t work on Teams Rooms on Android. The Android based Jabra PanaCast 50 VBS cannot do attributed captions. This may change at some point, but until then, if someone is interested in deploying Intelligent Speaker, Microsoft Teams Rooms on Windows is the only option.
And that’s about it for hardware.
Note: If someone is still using a Logitech SmartDock running Teams Rooms, sell them a more modern Teams Rooms implementation. Remind them that Intelligent Speaker is not supported on those ancient things due to “… a known issue that Teams Rooms can’t recognize the Intelligent Speaker through the dock.”
Software Requirements
The software requirements for Intelligent Speaker are straightforward. You will need a Microsoft Teams Rooms on Windows installation connected to the Intelligent Speaker of your choosing (in this paper, the Jabra PanaCast 50). You also need to set the PanaCast 50 as the default speaker and microphone within Teams Rooms. Intelligent Speaker will not work unless it is both.
Network Requirements
The network requirements for Intelligent Speaker are the same as for any Teams Rooms on Windows installation, with one exception: when using speaker attribution with a certified Intelligent Speaker, you need 7 Mbps of available upload bandwidth. On the nerd side, the seven microphones send seven audio streams to Microsoft at a maximum of 1 Mbps each, for a maximum of 7 Mbps. Once the audio streams reach Microsoft, magic happens, and voice matching is attempted in the cloud.
If you are not using certified Intelligent Speaker hardware, there is no increase in audio bandwidth, as only a single stream is sent to Microsoft. It is the same audio stream that is sent when this feature is not in use. As such, the required audio bandwidth for non-certified Intelligent Speaker microphones is 1 Mbps.
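The arithmetic above can be sketched as a small PowerShell function. Treat it as a planning aid only: it assumes the figures quoted in this paper (up to 1 Mbps per stream, seven streams for certified hardware, one stream otherwise), and the function name and the RoomCount parameter are made up for illustration. Multiplying by the number of rooms assumes the rooms share one uplink and are all in meetings at once.

```powershell
# Hypothetical helper: worst-case audio upload bandwidth, in Mbps.
# Assumes the figures quoted in this paper: up to 1 Mbps per audio stream,
# 7 streams for a certified Intelligent Speaker, 1 stream for any other microphone.
function Get-SpeakerUploadMbps {
    param(
        [bool]$CertifiedSpeaker,
        [int]$RoomCount = 1   # assumption: rooms share one uplink and meet at the same time
    )
    $mbpsPerStream = 1
    $streams = if ($CertifiedSpeaker) { 7 } else { 1 }
    return $streams * $mbpsPerStream * $RoomCount
}

Get-SpeakerUploadMbps -CertifiedSpeaker $true                # one certified room
Get-SpeakerUploadMbps -CertifiedSpeaker $false -RoomCount 5  # five non-certified rooms
```

The first call returns 7 and the second returns 5.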
Licensing Requirements
The Teams Rooms resource account needs a Teams Rooms Pro license assigned to it. Speaker attribution is not supported on the Teams Rooms Basic license.
Note: If a customer is still using the legacy Teams Rooms Standard or Teams Rooms Premium license, Intelligent Speaker features will work with both licenses.
How about user licenses?
The meeting organizer needs at least an E1 or A1 license in order to create the meeting. Their ability to permit recording and transcription is based on the meeting policy assigned to them. This gets discussed further down.
Attendees whose voice you want attributed in the transcript must be members of the same tenant as the meeting organizer. They also need to be able to sign in to Teams and record their biometric voiceprint. The ability to record your voiceprint is also controlled by a meeting policy, discussed in the “Create/Edit Teams Meeting Policies” section below.
At this point, with the licenses assigned to the meeting organizers and attendees, voice recognition will work. So long as you follow the requirements for setting up the meeting (discussed in the “How to use Intelligent Speaker in a Meeting” section below), the transcript of the meeting will have the names of the in-room attendees.
Note that you can’t do much with the transcript at this point, at least with regards to Microsoft tools. All you have now is a transcript with speaker attribution. If you want to do something useful with it (at least in this modern Microsoft world), you will need to assign your users Copilot and/or Teams Premium licenses.
Note: You can always feed the transcript into ChatGPT or other large language models and they’ll do a really good job analyzing the transcript for you. You don’t *have* to use Copilot, it’s just easier and native within the Microsoft ecosystem.
The meeting organizer does not need a Copilot or Teams Premium license for voice recognition/intelligent speaker to work. You can happily join a meeting hosted by someone in your tenant with an E1 license and – if you have a Copilot license – you can then fire off prompts to Copilot during or after the meeting. Your ability to query Copilot is in no way tied to the meeting organizer also having a Copilot license. All Copilot is doing is analyzing the transcript. The meeting organizer’s role in your ability to use Copilot only goes so far as their rights to create meetings that allow for transcripts.
In the scenario above, the meeting organizer (who does not have a Copilot license) cannot use Copilot and will not get an intelligent meeting recap, but you will as you have a Copilot license. I made a table below to help explain this and other licensing add-on scenarios. The licenses in the table are add-ons on top of the E1/A1 minimum license.
| Scenario | Organizer | Attendee |
|---|---|---|
| No Copilot licenses for anyone | No Copilot prompting, no intelligent meeting recap, but attributed transcript created if their meeting policy allows it. Can download the transcript. | No Copilot prompting, no intelligent meeting recap. Can download the transcript. |
| Organizer has a Copilot license, attendees do not | Attributed transcript created if their meeting policy allows it. Organizer can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. | Can download the transcript but cannot use Copilot to do anything fun with it. |
| Organizer has a Teams Premium license, attendees do not | Attributed transcript created if their meeting policy allows it. Organizer will receive the post-meeting intelligent meeting recap. Can also download the transcript. | Can download the transcript but cannot use Copilot or Teams Premium to do anything fun with it. |
| Organizer has no Copilot or Teams Premium license; attendee has a Copilot license | Attributed transcript created if their meeting policy allows it. Can download the transcript but cannot use Copilot to do anything fun with it. | Attendee can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. |
| Organizer has no Teams Premium or Copilot license; attendee has a Teams Premium license | Attributed transcript created if their meeting policy allows it. Can download the transcript but cannot use Teams Premium to do anything fun with it. | Attendee will receive the post-meeting intelligent meeting recap. Can also download the transcript. |
| Everyone has a Copilot license | Attributed transcript created if their meeting policy allows it. Organizer can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. | Attendee can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. |
Note: This is assuming attendees have the right to view and download transcripts. Here is how to block downloads of Teams transcripts.
Note: Downloading transcripts presumes that transcripts are enabled for the meeting. This does not apply to the option to use Copilot without transcripts.
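If the table feels like a lot, it boils down to rules that apply per person, regardless of whether that person is the organizer or an attendee. Here is a rough sketch of those rules as a PowerShell function. It is my own summary of the table, not anything Microsoft ships. The function name and property names are invented, and it assumes the meeting is being transcribed and that the person has the right to view and download transcripts.

```powershell
# Hypothetical helper: what one meeting participant can do with an attributed transcript.
# Assumes transcription is on for the meeting and the person may view/download transcripts.
function Get-MeetingCapability {
    param(
        [bool]$HasCopilot,
        [bool]$HasTeamsPremium
    )
    [pscustomobject]@{
        CanDownloadTranscript = $true                              # no add-on license needed
        CanPromptCopilot      = $HasCopilot                        # Copilot license only
        GetsIntelligentRecap  = $HasCopilot -or $HasTeamsPremium   # either add-on license
    }
}

# An attendee with Teams Premium but no Copilot license
Get-MeetingCapability -HasCopilot $false -HasTeamsPremium $true
```

Note that the organizer's own licenses never appear as an input. That is the point of the table: what you can do depends on your license, not theirs.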
Other stuff
This feature is available in all countries and regions, at least as Microsoft defines them. That does not mean that all languages and locales are supported. See Microsoft’s documentation for the list of supported locales.
Beyond being available in certain locales, there are legal ramifications to using Intelligent Speaker. For Intelligent Speaker to work, users will have to give up some biometric information (i.e., their voice print). Some nations, principalities, city-states, and other political boundaries may have an issue with this. Check first if this is legal to be used where you intend to set it up.
Second, if it is legal to be used, verify with your company’s legal team if it is permitted (or desired at all) in your organization. Some companies like plausible deniability and not having a transcript sure helps avoid some of that pesky legal paperwork that needs to be handed over in a lawsuit. Or they just really value their employees’ privacy.
Assuming the points above are cleared, the Microsoft Teams administrator then needs to create meeting policies that explicitly enable the voice attribution feature. Depending on how they do it, this will apply to all users of that Microsoft 365 tenant (editing the Global meeting policy) or they can be more tactical and create a custom policy and only assign the policy to users willing to give up their voice print for the common good.
Which leads to… you must hope your users record their voice prints. It is completely voluntary for them to do this. It’s generally bad form to force an employee to give over personal biometric data like their voice.
How do I set up Intelligent Speaker?
This gets a little tricky but if it were easy, I wouldn’t be writing this.
If you are setting this up on hardware that is not a certified Intelligent Speaker, you can safely skip down to Create/Edit Teams Meeting Policies.
Configure the hardware
The first thing is to make sure your Intelligent Speaker is on the latest firmware. Make sure your Jabra PanaCast 50/Lenovo ThinkSmart Bar 180 is on firmware version 6.22 or later. You also need to have Jabra Direct version 6.11.28601 or later.
Connect your PanaCast 50 to a computer and start Jabra Direct. Once the P50 is recognized by Jabra Direct, click on it to get to the settings.
On the screen that appears, click on Settings to get to the good bits.
From within Device settings, scroll down until you see the Playback device type setting. Hit the drop down and change it from “Communication device” to “Microsoft Teams Rooms device”.
Click Save at the top and then reboot the P50. The PanaCast 50 is now ready to be an Intelligent Speaker.
Configure Teams Rooms on Windows
After your P50 reboots, you’re not quite done. You now need to verify that the setting was successfully applied and that the P50 is set as the correct output device within Teams Rooms.
Go to the Teams Rooms on Windows console and tap More.
On the next screen tap Settings.
You are then prompted to sign into Teams Rooms with administrative credentials. Enter the administrative password to move on to the next step.
From within Settings, scroll down to the Peripherals section.
Finally, set the Audio settings for Teams Rooms. For Microphone for Conferencing, select the PanaCast 50 that has UAC2_TEAMS in the name, as shown in the image below. (For a ThinkSmart Bar 180, the name will be different, but the (UAC2_TEAMS) will be the same).
Set the Speaker for Conferencing to the PanaCast 50 that has UAC2_Render in the name. This is shown in the below image. (For a ThinkSmart Bar 180, the name will be different, but the (UAC2_Render) will be the same).
Set the default speaker to the same thing you set above – the (UAC2_Render) device.
At this point you’ve completed the easy part from the admin side. Now we need to create some meeting policies.
Create/Edit Teams Meeting Policies
Up until now, this has been straightforward and anyone with a laptop, a cable, and (optionally) a PanaCast 50 handy can do this work. At this point, things change. In most organizations, you now need to bring in your Microsoft 365 administrators as you need to edit or create new policies to apply some custom settings.
There are two ways to do this:
- Edit the Global Teams meeting policy.
- The advantage here is it’s global, so all user accounts will get this setting.
- Create/Edit a custom policy and only apply it to certain users.
- Generally, you should not edit Global policies and instead create custom policies. This isn’t the document to debate the pros and cons of policy creation and hierarchy. But in this paper, I’m going with this approach in that I will create a new Teams meeting policy.
The person creating or editing these policies needs any of the following roles assigned to them:
- Teams Administrator
- Teams Communications Administrator
You need to edit/create two new policies – one for the Microsoft Teams Rooms Resource Account and one for end users.
Note: You could create just one policy covering both settings, but I’m going to show the most granular way to do this. How customers choose to implement these policies is wholly up to them.
The first policy is to enable the speaker attribution feature on Microsoft Teams Rooms. Note that you don’t set policies on the Teams Rooms device, you set policies on the Resource Account that signs into Teams Rooms and runs the meetings on the device.
First, I will create a policy called IntelligentSpeakerMTR that sets the value “roomAttributeUserOverride” to “Attribute”.
There are three values you can set for “roomAttributeUserOverride”.
One is “Off” which turns the feature off, another is “Attribute” which enables speaker attribution, and the third is “Distinguish” which tells the speaker to distinguish between different voices but to *not* provide name attributes for the transcript (e.g., “Speaker 1”, “Speaker 2” instead of “Alice”, “Bob”).
The second policy is assigned to the users that will be allowed to have their voices transcribed (aka, folks who aren’t bonkers over the privacy of their biometrics). This policy will be called IntelligentSpeakerUser and I will set the value of “enrollUserOverride” to “Enabled” and the value of “AllowTranscription” to “True”.
What do these attributes set? Good question. Also – good to know you’re awake and have read this far. You, my friend, are an amazing human being.
“enrollUserOverride” is used “…to set voice profile capture, or enrollment, in Teams settings for a tenant.” That’s a bit much as this isn’t a tenant-level setting, but a user-level setting. But whatever. It’s in Microsoft’s official documentation so it must be true.
If this attribute is disabled, the following happens (or doesn’t, depending):
- Users who have never enrolled can’t view, enroll, or re-enroll.
- The entry point to the enrollment flow will be hidden.
- If users select a link to the enrollment page, they’ll see a message that says this feature isn’t enabled for their organization.
- Users who have enrolled can view and remove their voice profile in the Teams settings. Once they remove their voice profile, they won’t be able to view, access, or complete the enrollment flow.
We want to enable this. When enabled, you get all this awesomeness:
- Users can view, access, and complete the enrollment flow.
- The entry point will show on Teams settings page under the Recognition tab.
The other attribute we will set is “AllowTranscription” which is obvious. You either allow transcription or you don’t. I want to allow transcription so I will set this to True.
Creating Teams Rooms Policy
Let me show you how to create these policies using Microsoft Teams PowerShell. You cannot do this using Teams admin center, which is the tool for total n00bs. YOU are not a total n00b are you? You are a 1337 h4x0r! We 1337 h4x0r5 use PowerShell!
At this point, if you are indeed 1337 h4x0r, do your thing. You don’t need documentation!
For those aspiring 1337 h4x0r5, I’ll walk you through this.
First, start PowerShell on your PC as Administrator (Pro Tip: use the Terminal app). If you don’t know how to start PowerShell, you can stop now and pass this documentation off to a more experienced administrator.
Once PowerShell has started, you need to make sure you have the Microsoft Teams PowerShell module installed. If you are unsure, run the following cmdlets with these parameters:
Install-Module -Name PowerShellGet -Force -AllowClobber
Install-Module -Name MicrosoftTeams -Force -AllowClobber
Import-Module -Name MicrosoftTeams
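If you would rather look before you leap, a quick check like the one below tells you whether the module is already on the PC. This is just a convenience sketch using the standard Get-Module cmdlet. The variable name and the messages are mine, and nothing in it talks to your tenant.

```powershell
# Look for any installed copy of the MicrosoftTeams module and keep the newest one.
$teamsModule = Get-Module -ListAvailable -Name MicrosoftTeams |
    Sort-Object Version -Descending |
    Select-Object -First 1

if ($teamsModule) {
    "MicrosoftTeams $($teamsModule.Version) is already installed."
} else {
    "MicrosoftTeams is not installed; run the Install-Module cmdlets above."
}
```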
Now that you have the correct PowerShell module installed, you need to connect your PC to Microsoft Teams. You do this by running the following from your terminal window:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
Import-Module -Name MicrosoftTeams
Connect-MicrosoftTeams
After entering the Connect-MicrosoftTeams cmdlet, you will be prompted to sign in.
When successfully signed in, you get this wonderful feedback (your values will be different).
Now we can get to business.
The cmdlet needed to create a new Teams meeting policy is New-CsTeamsMeetingPolicy. This is the full command line I will enter:
New-CsTeamsMeetingPolicy -Identity IntelligentSpeakerMTR -roomAttributeUserOverride Attribute
Copy and paste that into your PowerShell session. After a few seconds you should get a raft of information back. You can scroll up and see if the change has taken effect. If you aren’t into playing a PowerShell version of “Where’s Waldo”, you can run this PowerShell command to see what the value is set to for “roomAttributeUserOverride”.
Get-CsTeamsMeetingPolicy -Identity IntelligentSpeakerMTR | Select-Object roomAttributeUserOverride
If the value returned is “Attribute” then you are ready for the next step.
Creating the End User Policy
The second policy you need to create is the one you will assign to end users. Only end users with this policy assigned will be able to enroll their voice for speaker attribution. As above, open a PowerShell session and connect to your tenant in the cloud.
Below is the PowerShell command needed to create the policy.
New-CsTeamsMeetingPolicy -Identity IntelligentSpeakerUser -enrollUserOverride Enabled -AllowTranscription $true
Copy and paste that into your PowerShell session.
After you hit enter, a raft of information should go flying by. You can scroll up to validate the changes in this policy or run the following PowerShell to confirm the attribute you set.
Get-CsTeamsMeetingPolicy -Identity IntelligentSpeakerUser | Select-Object enrollUserOverride, AllowTranscription
If you see “Enabled” and “True” then you are good to go.
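If eyeballing output isn’t your thing, the two checks can be folded into one small function that answers with a plain True or False. This is a sketch, not gospel: Test-MeetingPolicySetting is a name I made up, and it assumes you are already connected with Connect-MicrosoftTeams and that the policy exists. The only real Teams cmdlet it calls is Get-CsTeamsMeetingPolicy.

```powershell
# Hypothetical helper: returns $true only if every expected setting matches the policy.
# Assumes an active Connect-MicrosoftTeams session and an existing policy.
function Test-MeetingPolicySetting {
    param(
        [string]$PolicyName,
        [hashtable]$Expected   # setting name -> expected value
    )
    $policy = Get-CsTeamsMeetingPolicy -Identity $PolicyName
    foreach ($name in $Expected.Keys) {
        # Compare as strings so that 'True' matches a Boolean $true
        if ("$($policy.$name)" -ne "$($Expected[$name])") {
            return $false
        }
    }
    return $true
}
```

Usage would look something like this:
Test-MeetingPolicySetting -PolicyName IntelligentSpeakerUser -Expected @{ enrollUserOverride = 'Enabled'; AllowTranscription = $true }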
Assigning Teams Rooms Policy for n00bs
Now that you have the Teams Rooms policy created, you need to assign it to a Teams Rooms Resource account. The perceived easiest way for a new administrator to do this is via Microsoft Teams admin center (TAC). This way you can click away with no mucking about with PowerShell.
To access TAC, open a web browser and enter admin.teams.microsoft.com into the address bar. If necessary, sign into your Microsoft 365 tenant.
From here, navigate to the Users section and click on Manage Users.
From here, you can either scroll down and find an account, or type in the account name in the Search for a user search box. In this case, I will edit the second account listed – “Conference Room – MTR1”. I click on the name in the Display name column to bring up the properties for that account.
Once I have the properties for that account, I click on Policies to see which policies are assigned to that account.
To change a policy, click on the Edit icon. This brings up the list of possible policies and their settings. Scroll down until you see Select Meeting policy.
After clicking on the drop-down list for meeting policies, you see all available options. Select IntelligentSpeakerMTR and click Apply at the bottom of the screen.
You have now assigned this policy to the Teams Rooms Resource Account.
Note: After a policy is assigned, it can take up to 48 hours to take effect. To get the policy to take effect sooner, accounts must be signed out and signed back in.
Assigning Teams Rooms Policy for 1337 h4x0r
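From a PowerShell session that is connected to your tenant, run the following, substituting your own Teams Rooms resource account for the one shown:
Grant-CsTeamsMeetingPolicy -Identity mtr.mtressler.1@jabrademos.com -PolicyName IntelligentSpeakerMTR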
Assigning End User Policy for n00bs
I’ll make this quick.
The steps are the exact same as above – go to Teams admin center, find an end user, and change their policy to IntelligentSpeakerUser. Click apply and wait for the change to take effect.
The one difference is you probably want to apply this to several users at once and not assign the policy one at a time. To do this, click to the left of the names to which you want to assign this policy. A checkmark appears next to the selected names.
Once you have the names selected, scroll back to the top and click Edit settings.
From here, scroll down to Meeting policy and select IntelligentSpeakerUser, then click Apply at the bottom to apply the policy to the group of users.
Assigning End User Policy for 1337 h4x0r
Grant-CsTeamsMeetingPolicy -Identity avance@jabrademos.com -PolicyName IntelligentSpeakerUser
Alternately, to assign the policy to every user in the tenant in one go, something like this:
Get-CsOnlineUser | Grant-CsTeamsMeetingPolicy -PolicyName IntelligentSpeakerUser
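Piping every account from Get-CsOnlineUser into Grant-CsTeamsMeetingPolicy touches everyone in the tenant, including people who never volunteered. A more tactical option is to loop over an explicit list of volunteers. Below is a minimal sketch of that idea. Grant-SpeakerUserPolicy is a name I invented, and it assumes an active Connect-MicrosoftTeams session. The only Teams cmdlet it calls is the same Grant-CsTeamsMeetingPolicy used above.

```powershell
# Hypothetical helper: assign the end user policy to a specific list of accounts
# and report, per account, whether the assignment succeeded.
function Grant-SpeakerUserPolicy {
    param(
        [string[]]$UserPrincipalName,
        [string]$PolicyName = 'IntelligentSpeakerUser'
    )
    foreach ($upn in $UserPrincipalName) {
        try {
            # -ErrorAction Stop turns any failure into an exception the catch block can see
            Grant-CsTeamsMeetingPolicy -Identity $upn -PolicyName $PolicyName -ErrorAction Stop
            [pscustomobject]@{ User = $upn; Assigned = $true; Error = $null }
        } catch {
            [pscustomobject]@{ User = $upn; Assigned = $false; Error = $_.Exception.Message }
        }
    }
}
```

Usage would look something like this, where volunteers.txt is a made-up file with one sign-in address per line:
Grant-SpeakerUserPolicy -UserPrincipalName (Get-Content .\volunteers.txt)
One failed account does not stop the rest. The output lists each user with an Assigned value of True or False, so typos in the list are easy to spot.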
Digital voice profile
At this point, all the work is done from the administrative side. Now it is up to end users to record their voice profiles. You (legally, at least in most countries) can’t force people to do this. Sadly. 😊 However, once people see the benefit of this, they may volunteer to do this, but I would be surprised if you ever get 100% end user buy in. Some people are too lazy to do it and some people value their biometric privacy too much.
Biometric Privacy with Intelligent Speaker
What is the privacy story? Where is my recorded voice stored? Can anyone access it? Does it work cross-tenant? Those are all good questions and maybe I’ll answer two or three of those.
First, your voiceprint is stored in the same region as your other data. If your tenant is scoped to only store data in a specific region, your voiceprint stays in that specific region. The voiceprint is encrypted at rest. In Microsoft’s words, “Voice data is stored in the Office 365 trusted compliance store.”
“Voice data will be securely stored in the Office 365 Cloud, and users will retain control of their information, including the ability to delete it at any time. The capture of voice data can be turned on or off for each meeting. Additionally, admins have full control to turn on/off people identification through voice recognition feature across the organization.”
Admins can export the audio data via Teams admin center. If you go to a user who has made a voice recording, you will see an option to download the biometric profile.
Audio data can only be used within your tenant. This means if you or someone in your tenant hosts the meeting, then Intelligent Speaker features will work (if enabled). If you walk into a meeting room at another Office 365 tenant, your voice profile data will not be used and you will show up as “Speaker X” in the transcription.
To clarify, Alice from Contoso has set up her voice profile. She is going to an in-person meeting with Bob at Northwind Traders. Both Bob and Northwind Traders have successfully set up Intelligent Speaker. In the meeting with Bob and Alice in the Northwind Traders conference room, Bob will be properly attributed in the meeting room, while Alice will appear as “Speaker 1” – even though she has set up her biometric data. This is because the biometric data from Contoso is not shared with Northwind Traders.
Another point: the voice print biometric data is only used within Teams voice-recognition scenarios and not by any other Microsoft software or service.
In a pro-privacy move, user voice print data is removed after one year if the user “…isn’t invited to any meetings with an Intelligent Speaker within that 1-year period.” If a user leaves the company and their account is deleted, the data is removed within 30 days, or according to whatever data retention policy is in effect.
Set up your digital voice profile
To set up your voice profile, open the Teams app, click the three dots (…) in the upper right, and click on Settings.
Once settings opens, scroll down to Recognition. If you see the message that says you are not enabled, then :sad face:. Most likely, you have not been assigned the Intelligent Speaker policy defined above, or – more likely – the policy has not yet taken effect on your account. Come back later.
If the policy has been assigned and applied to the user, you get the following screen instead.
Click on Create voice profile to get started.
At the top of the screen (where it says “Microphone array…”) you can select which microphone you are using for the recording. Make sure you pick the right microphone and that you are in a quiet room. Click Start voice capture and read that paragraph. It doesn’t matter if you mess up and need to read part of it again. You are not recording something for posterity. You are just letting Teams learn what your unique voice print is.
Nerd Note: You can read *anything* you want. That paragraph on the screen is just something that’s long enough to get a decent voice print. So as mentioned above – it doesn’t matter if you mess up. The point is that you speak for 15 seconds or so.
Note: You cannot create or update your voice print while you are in a meeting. If you try to do this, you will receive this awkwardly worded notice:
How to use Intelligent Speaker in a Meeting
Now that you have your voice print made, you just roll into a conference room and start talking and magic happens, right?
Oh, if it were so easy. You precious child. With your simplistic desires.
There is more than just setting up policies and recording your voice. You need to set up the meeting invite correctly.
Note: Speaker recognition will work in an ad-hoc/unscheduled meeting, but speaker attribution will not. Put another way, even with your voice enrolled, you will show up in the transcript as Speaker 1 or Speaker 2, not as your real name.
Here are the requirements that must be met for attributed transcription to work in a meeting with an Intelligent Speaker:
- Everyone who intends to have their voice transcribed must be listed on the meeting invite.
- No more than 20 people who have registered their voice can be on the meeting invite. (Well, 19 if you include yourself, the organizer)
- “Intelligent Speakers work best in medium-sized rooms that hold 8–10 people.”
- If more than 20 people with enrolled voices are on the invite, Intelligent Speaker is disabled.
- Transcription needs to be supported for the meeting. (We did that in the user voice policy, but it’s the meeting organizer’s policy, not yours, that determines if transcription is allowed.)
- Someone needs to turn on transcription in their Teams client. Once the meeting starts, you can only enable transcription from the Teams client and not directly on the Teams Rooms console.
- To enable transcription, click the three dots (More) from within the meeting. Then click Record and transcribe, and finally click Start transcription.
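The invite rules above can be sketched as a quick pre-flight check. This is purely illustrative: Test-IntelligentSpeakerInvite is a hypothetical helper, not a Microsoft cmdlet, and it assumes you already know which users have recorded a voice profile. It only models the two invite rules from the list (everyone you want attributed is on the invite, and no more than 20 enrolled voices are on it). It does not talk to Teams at all.

```powershell
# Hypothetical helper, not part of any Microsoft module.
function Test-IntelligentSpeakerInvite {
    param(
        [string[]]$Invitees,          # everyone on the meeting invite (UPNs)
        [string[]]$EnrolledUsers,     # users you know have recorded a voice profile
        [string[]]$ExpectedSpeakers,  # people you want attributed by name
        [int]$MaxEnrolled = 20        # limit quoted in this paper
    )

    # Enrolled voices that actually appear on the invite.
    $enrolledOnInvite = @($Invitees | Where-Object { $EnrolledUsers -contains $_ })

    # People you expect to be named who were left off the invite.
    $missing = @($ExpectedSpeakers | Where-Object { $Invitees -notcontains $_ })

    [pscustomobject]@{
        EnrolledOnInvite    = $enrolledOnInvite.Count
        MissingFromInvite   = $missing
        WithinLimit         = $enrolledOnInvite.Count -le $MaxEnrolled
        ReadyForAttribution = ($missing.Count -eq 0) -and ($enrolledOnInvite.Count -le $MaxEnrolled)
    }
}

# Example call with made-up addresses.
Test-IntelligentSpeakerInvite `
    -Invitees 'alice@contoso.com', 'bob@contoso.com' `
    -EnrolledUsers 'alice@contoso.com' `
    -ExpectedSpeakers 'alice@contoso.com', 'carol@contoso.com'
```

In that example call, carol@contoso.com is expected to speak but is not on the invite, so the check reports her under MissingFromInvite and ReadyForAttribution comes back false.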
At this point, you should see a transcription with your name instead of something generic as seen at the very beginning of this white paper (Figure 1: Donuts are delicious).
Below is a stolen image that was very likely a copyright violation until I probably did the greatest Photoshop edit ever to totally make it a unique work. Like Andy Warhol, this is my art.
If you look at the transcription on the right, you’ll see that it says Serena Ribeiro (Conf Room P…). This lets us know that Intelligent Speaker is working as it recognized Serena’s voice and that she is in a conference room.
Note: This feature is currently removed and will be added again later. (Note added 6 August 2024)
What if someone in the conference room speaks and they don’t have a voice print (either they never set it up, weren’t on the invite, or are from a different tenant)? What happens then? Anything?
Intelligent Speaker tracks all the voices in the room (well, the first 10). If it hears a distinct voice that it can’t match to an accessible voice print, it tags that voice as Speaker 1 in the transcript. If a second person with no accessible voice print speaks, they become Speaker 2, and so on.
See the following stolen screen shot for an example of a person being tagged as Speaker 1.
Editing mistakes
What if this whole thing makes a mistake? Or what if we just want to manually attribute a user in the transcript?
In the image above, you see there is a button named Identify speaker. If you click that, a drop down appears showing the names of everyone that was on the meeting invite. Pick the right name and that attribution is fixed.
Note: You can only change to a person that was on the meeting invite. This is to prevent falsely attributing something to someone who wasn’t in the room. Otherwise, nothing would stop me from attributing something to Luke Skywalker that Darth Vader said.
For more information on editing attribution on a transcript, see this document. I’m not in the mood to basically copy/paste that article into this one.
Note: You can hide your identity in meeting captions and transcripts! See this link for more info.
Troubleshooting
I’m not going to write a guide because:
- Review the steps above and make sure you got it right.
- Bugs pop up and Microsoft has a page dedicated to known issues. So please go there. (Though at the time of this writing they are still referencing an old Teams Rooms license so…..)
- One tip: If you see “Speaker 1” in the meeting transcript instead of the person’s name, this is a sign that this has been set up correctly, but it is not recognizing the person speaking. Make sure the user policies have been assigned to the user – which could take a minute. Or two days. Also, have the user re-record their voice in the Teams client. I have seen this fix a problem with a person not being recognized.
Summary
I hope this document helped you understand what Intelligent Speaker is and how to set it up with a Jabra PanaCast 50. It’s a cool feature but there is one warning:
Don’t expect perfection.
I’ve historically been disappointed in the accuracy of the attributed names in the transcription. However, due to the Jabra PanaCast 50 going through Microsoft’s Technology Adoption Program (TAP), Microsoft themselves looked closely at this feature for the first time in a while. As such, the service has gotten quite a few improvements on the back end.
Oh hey – I never answered why this solution is called Intelligent Speaker. This whole feature is based around microphones that capture voices so shouldn’t it be called Intelligent Microphone? Yes, yes it should. Also, the name Intelligent Speaker presumes that the person talking is saying something intelligent. That is not always the case.
It’s called Intelligent Speaker just because. It’s what Microsoft initially called it in development – most likely because it was initially based on a speaker puck design so that name kind of stuck.
- Clarified by Ilya Bukshteyn, 18 April 2024, in a chat message of the Teams Fireside Chat meeting.
- Microsoft Support
- Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn