The Most Epic Blog Article on IntelliFrame That There Ever Was and That There Ever Will Be.

Abstract

IntelliFrame is a term Microsoft uses to define different ways to show in-room attendees. There are three different IntelliFrame types. In this article, I explain what all three types are, how to configure them, and how to manage them.

About the Author

Michael Tressler, aka flinchbot, is a Senior Solutions Consultant at Jabra. He has spent the past 5 years focusing on Microsoft Teams devices, specifically Microsoft Teams Rooms.

He was the 1987 and 1988 American Association of Students of German Mühle champion. As such, do not take him lightly when playing the game the English call Nine Men’s Morris.

Intro

Microsoft announced IntelliFrame at Ignite in 2022. On October 12th of that same year, the “What’s new for Microsoft Teams Rooms, Teams devices, and intelligent cameras at Ignite 2022”1 blog article on TechCommunity appeared. Microsoft described IntelliFrame thusly: “IntelliFrame enhances the focus and framing of in-room meeting attendees so that all meeting participants – including those in the room – have their own, individual frame in the video gallery.”


Note: Currently, Microsoft Teams Rooms on Windows supports Edge, Cloud, and Multi-stream IntelliFrame. Teams Rooms on Android only supports Edge IntelliFrame.


This image – stolen from the above referenced blog article – shows what this was going to look like. And it actually does look like this now that it is rolled out.

 
Focus on the yellow box. In there you see four faces. These are the four most recent speakers. Below them is a view of the full conference room.  This whole section combined is the client-side view of Multi-stream IntelliFrame.

If you look at the four squares, you will see that there is a name in each square. This is the name of each of those people. This is magic and sorcery. How does Teams know the name of these people if they just walk into a conference room? Look at the attendee list. You will see their names grouped under the name of the conference room (Conf Room Contoso Square 14).

Magically adding names to faces is the people recognition feature. It was also briefly mentioned in the Ignite announcement. 1

This article will only describe IntelliFrame in detail. By the end of reading this, you will be bored. But with a little luck, you may know what you are talking about if someone asks you about this stuff. Maybe. I’d grab a Red Bull now.

There will be a separate overly long post on people recognition. You have something to look forward to!

Also, unlike people recognition (and voice recognition/intelligent speaker), IntelliFrame can work on Meet Now/Ad-hoc meetings. The meeting does not need to be scheduled in advance.

However – just to be clear – if you want to use people or voice recognition with IntelliFrame, it must be a scheduled meeting. I’m just saying the specific IntelliFrame feature is not reliant on a scheduled meeting.

What is IntelliFrame

According to Microsoft2, there are three versions of IntelliFrame.

1.) Edge IntelliFrame

2.) Cloud IntelliFrame

3.) Multi-Stream IntelliFrame

I will now write too many words on each of these three features.

Edge IntelliFrame

Smart Cameras have the ability to do some level of face/torso detection via the magic of artificial intelligence and zoom in or crop to those faces/torsos. If those cameras also have beam forming microphones, you can then detect from where the noise (i.e., talking) is coming. If there is a head or torso in the same place as the audible noise, the intelligent cameras presume that is a person talking.

For individual speaker tracking modes on these intelligent cameras, this is all that happens. A single person is tracked while they speak. When someone else starts talking, the camera processes the new voice direction, runs face detection again, and frames the new speaker. This is called active speaker tracking or speaker framing.

Now, wouldn’t it be cool if a camera could actively track more than one person? This is Edge IntelliFrame. Edge IntelliFrame is a feature where the camera can track multiple people at the same time, all done solely by the camera in the meeting space. The camera shows all the people talking. Or the four most recent talkers. Or if no one has spoken yet, four random people. Or 15 people all in tiny squares. It all depends on the camera and its capabilities.

Here is an image of Jabra’s Dynamic Composition view (Jabra’s name for Edge IntelliFrame). Notice the 2×2 grid in the purple square. The PanaCast 50 has recognized four torsos and has zoomed/cropped into them. The Jabra PanaCast 50 only sends one video stream to Microsoft, a video stream consisting of four faces.

Every camera vendor calls this same feature something different. Here is a very incomplete table of the vendor and what they call the feature.

Vendor      Feature name
Logitech    Grid View
Jabra       Dynamic Composition
Neat        Symmetry

There are a lot more vendors that do this, but I got bored searching for more. Leave a comment on ones I’m missing and maybe I’ll be motivated one day to update this table.

The point is, Edge IntelliFrame is *only* done on the edge. Microsoft has nothing to do with it. They get a video feed from the camera in the conference room, and they present that feed to remote attendees.

And until recently, Microsoft very rarely mentioned Edge IntelliFrame. It’s only with Microsoft 365 Roadmap item 409537 3 that this term has popped up again. Microsoft is adding a feature to Teams Rooms on Windows allowing you to tell the camera to switch modes.  From the roadmap item: “Capabilities covered with this feature includes group framing, active speaker framing, and edge composed IntelliFrame.”

Note that this feature is enabled in the conference room. Remote attendees cannot interact with this view (e.g., people attending outside the conference room cannot turn off this view or customize it in any way). The names of attendees inside the conference room are not shown (no people recognition).

For more information on Edge IntelliFrame, contact your camera vendor as Microsoft has very little to do with this.

Cloud IntelliFrame

If Edge IntelliFrame is fully provided by the camera, Cloud IntelliFrame is the complete opposite. With Cloud IntelliFrame, the camera can be a “non-intelligent camera”, and you can still get face and torso recognition. This is because the camera sends a feed to Microsoft, and then Microsoft throws AI at the video feed. If Microsoft detects some faces, it will pop them into their own little box.


Note: This is now the default view when using a supported camera. “All Microsoft Teams Rooms on Windows with a Pro license equipped with cameras (specified in Supported cameras) automatically opt-in to Cloud IntelliFrame.”4

In the image above, facial recognition is done by Microsoft in the cloud. Hence the name Cloud IntelliFrame.

Cloud IntelliFrame works with up to nine participants in a room. If there are 10+ users in the room, the standard room view will be used.5

Microsoft has done a nice job explaining how this is configured and how an Admin can turn this off.6

Users inside the meeting room can disable Cloud IntelliFrame by finding this setting on the Teams Rooms console.

Turn off IntelliFrame on the Microsoft Teams Room console.

Remote attendees can enable or disable this feature on a per-attendee basis. They can right click on the IntelliFrame frame and select Turn off IntelliFrame.

When Cloud IntelliFrame is disabled, the user is shown the full room view from the camera (or whatever view the camera is set to).

To re-enable Cloud IntelliFrame, click in the video frame of the conference room and select Turn on IntelliFrame.

Note: If you click the Spotlight for everyone option, it will spotlight the entire Multi-Stream IntelliFrame view, not the view of one specific video stream.

In-Room Self View

What do the users inside the conference room see with Cloud IntelliFrame? They see the full view of the room. They do not see the zoomed in “IntelliFrame view” that remote attendees see.

Disabling Cloud IntelliFrame

If you don’t want to use Cloud IntelliFrame at all, you can disable it.

The easiest way is to navigate to the settings on Teams Rooms and disable this setting.

You can also disable Cloud IntelliFrame by using a skypesettings.xml file7.

Here is the minimum you need to add to a skypesettings.xml file:

<SkypeSettings>
  <EnableCloudIntelliframe>false</EnableCloudIntelliframe>
</SkypeSettings>

To re-enable Cloud IntelliFrame, change false to true.
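If you manage more than a couple of rooms, you may not want to hand-type that file. Here is a minimal sketch in Python that builds the same minimal XML. The element names come straight from the snippet above; the function name build_skype_settings is my own invention and not anything Microsoft ships.

```python
import xml.etree.ElementTree as ET


def build_skype_settings(enable_cloud_intelliframe: bool) -> str:
    """Return the minimal SkypeSettings XML from the article as a string.

    build_skype_settings is a hypothetical helper name; only the element
    names are taken from the article's snippet.
    """
    root = ET.Element("SkypeSettings")
    setting = ET.SubElement(root, "EnableCloudIntelliframe")
    # The file uses lowercase "true"/"false", as in the snippet above.
    setting.text = "true" if enable_cloud_intelliframe else "false"
    return ET.tostring(root, encoding="unicode")


# Show the XML that disables Cloud IntelliFrame.
print(build_skype_settings(False))
```

Save the printed text as skypesettings.xml and deliver it to the room the same way you deliver any other settings file (see the Microsoft Learn article referenced above).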

Requirements

Cloud IntelliFrame requires a Teams Rooms Pro license5 and a supported camera8.

Cloud IntelliFrame is currently only supported on Microsoft Teams Rooms on Windows.

As of 5 August 2024, not every Teams client can see a Cloud IntelliFrame feed5. Teams clients that don’t support rendering the Cloud IntelliFrame feed show the full default view from the camera (often, a view of the whole conference room).

Like Edge IntelliFrame, names of attendees in the conference room are not shown (people recognition is unsupported).

Note: As Microsoft is doing this in the cloud, they are zooming and cropping on an already compressed video stream (even if it is 1080P). As such, the zoomed-in faces may be lower quality than with Edge IntelliFrame or Multi-stream IntelliFrame.


“That said, Cloud IntelliFrame experiences can sometimes result in lower video resolution than multi-stream cameras…”9 Arash Ghanaie-Sichanie – Senior Director, Teams AI Experiences, Microsoft


Multi-Stream IntelliFrame

With Multi-Stream IntelliFrame, the in-room camera and Microsoft both do some of the work.

There is a lot of intelligence required by the in-room camera, which is why (as of this writing on 2 August, 2024), there are only two cameras certified for Multi-stream IntelliFrame:

1.) The Yealink SmartVision 60 10

2.) The Jabra PanaCast 50 (Go Jabra!) 10

Over time, more cameras will probably be certified.

With regards to Multi-stream IntelliFrame, what makes these cameras special?

Well, for one, they have to do some edge processing. They are basically doing Edge IntelliFrame in that they both track the active speaker and most recent speaker(s). (Jabra tracks the current and most recent speaker while Yealink can track up to four speakers).
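To make "current and most recent speaker" a bit more concrete, here is a rough sketch in Python of how a tracker could keep a short list of recent speakers. This is purely my own illustration; neither Jabra nor Yealink has published how their cameras do this. The class name RecentSpeakers and the string IDs are hypothetical.

```python
from collections import deque


class RecentSpeakers:
    """Keep the most recent distinct speakers, newest first (illustrative only)."""

    def __init__(self, max_speakers: int) -> None:
        # A bounded deque drops the oldest speaker once the limit is reached.
        self._speakers = deque(maxlen=max_speakers)

    def on_speech(self, person_id: str) -> None:
        # Move a returning speaker to the front instead of duplicating them.
        if person_id in self._speakers:
            self._speakers.remove(person_id)
        self._speakers.appendleft(person_id)

    def framed(self) -> list:
        """Return the speakers that would currently get their own frame."""
        return list(self._speakers)


# Two frames, matching the article's description of the PanaCast 50.
tracker = RecentSpeakers(max_speakers=2)
for speaker in ["ben", "alice", "ben", "carol"]:
    tracker.on_speech(speaker)
print(tracker.framed())
```

Set max_speakers to 4 to mimic the "up to four speakers" case described for the Yealink.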

Once the speakers are tracked, zoomed, and cropped, the cameras then send individual video streams of each speaker and a panoramic view of the conference room to Microsoft. This is key. In both of the other IntelliFrame types, only one video stream is sent. In Multi-stream IntelliFrame, multiple video streams are sent. Microsoft then catches those feeds and lays them out optimally on your Teams client. This gives Microsoft the flexibility to show the people video feeds next to each other (portrait view) and the panorama underneath, or the people video feeds on top of each other (landscape view) with the panorama underneath. Anyway, now you know why it’s called Multi-stream IntelliFrame.

Here is a picture of Multi-stream IntelliFrame running on a Jabra PanaCast 50. Note that the two active speakers are also visible in the panorama view of the conference room. (The yellow box does not appear when you use this. I added the yellow box to highlight the Multi-stream IntelliFrame view).

You also see the names of those two people. You have never seen Ben Clarke or Alice Kelly in your life before, but now you know their names. That’s the people recognition feature which is only available with Multi-stream IntelliFrame. People recognition will be covered in a different post.

Multi-stream IntelliFrame is only enabled by an Administrator of the camera. It cannot be enabled/disabled by users in the conference room or by remote attendees using the Teams client.

End users can hide the panorama view on the bottom, and they can bring it back if they so choose. But unlike Cloud IntelliFrame, they cannot disable Multi-stream IntelliFrame itself.

Enabling Multi-stream IntelliFrame

I do not have a Yealink SmartVision 60. (Hey Yealink, send me one! You have my address on file already! 😁 ). But I do have a Jabra PanaCast 50 so I will be using that to show how to set up Multi-stream IntelliFrame (henceforth, MS IF).

Requirements

From the Microsoft Learn article11, these are the pre-requisites for MS IF.

  • Microsoft Teams Rooms on Windows
  • Microsoft Teams Rooms Pro license
    • Microsoft Teams Rooms with Pro license is required to enable IntelliFrame and people recognition features on Microsoft Teams Rooms.
    • Basic license doesn’t support IntelliFrame or people recognition. If you have Teams Rooms Basic license, the camera shows only active speaker and panoramic views.
  • A supported camera
  • Bandwidth
    • All of the MS IF feeds are a maximum of 720P
      • If a single 720P video stream takes 1.2 Mbps, then you will need up to 3.6 Mbps with a Jabra PanaCast 50 (3 x 1.2 Mbps) or up to 6 Mbps with the Yealink SmartVision 60 (5 x 1.2 Mbps).
    • You will need more bandwidth than a traditional single-stream video feed uses
  • A supported Teams client to ingest these multiple video streams
  • A USB3 connection is required. Be sure you are using the correct cable and that it is plugged into a USB3 port on the compute module.
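The bandwidth math in the list above is simple enough to sketch in Python. The 1.2 per-stream figure and the stream counts (3 and 5) are the ones used in this article, not official Microsoft numbers, and the function name is made up for illustration.

```python
def ms_if_bandwidth_mbps(stream_count: int, per_stream_mbps: float = 1.2) -> float:
    """Upper-bound upload bandwidth for a number of 720P streams.

    The 1.2 default is the article's working assumption for one 720P stream.
    """
    if stream_count < 1:
        raise ValueError("stream_count must be at least 1")
    # Round to avoid floating-point noise such as 3.5999999999999996.
    return round(stream_count * per_stream_mbps, 2)


# Stream counts taken from the article: 3 for PanaCast 50, 5 for SmartVision 60.
cameras = {"Jabra PanaCast 50": 3, "Yealink SmartVision 60": 5}
for camera, streams in cameras.items():
    print(f"{camera}: up to {ms_if_bandwidth_mbps(streams)} Mbps")
```

Plug in your own per-stream figure if your network team has measured something different.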

As of 3 August 2024, not every Teams client can render an MS IF feed. The table below lists many of the Teams clients and if they do or do not show the MS IF view.

Client                   Can show MS IF?
Windows                  Yes
MacOS                    No
Web (Edge)               No
Android (Mobile)         No
iOS                      No
Teams Rooms (Windows)    Yes
Teams Rooms (Android)    No

On unsupported clients, you only see the active speaker in a video tile. In other words, you only receive one video stream, and that is the stream of the active speaker.

Configuring Jabra PanaCast 50

I’m going to assume you have your PanaCast 50 unboxed, powered up, and connected to your PC or laptop. The first thing to do is to download Jabra Direct12 and make sure you are on a supported firmware.

The minimum required firmware for MS IF is 8.0.7. If you are not on this firmware, Jabra Direct will prompt you to upgrade to the latest available firmware. Follow the prompts and in about 10 minutes, you will be updated.


If you would prefer to watch a video on how to set this up, along with a demo of Multi-stream IntelliFrame on a PanaCast 50, my coworker Eric Taylor made just that video.


Within Jabra Direct, click on the Device section and then click on your PanaCast 50.


Next, click on the Settings option in the lower right.


Next, click on Camera to get to camera-specific settings.

Now, select Multi-stream for Microsoft Teams Rooms from the Dynamic Composition dropdown.


Finally click Save.


After clicking Save, you will be given a notice that a reboot is required. Click Save and the PanaCast 50 will reboot (assuming it is not already in a call).

Configuring Teams Rooms on Windows

Once you make this change on the camera, you need to go into the Teams Rooms settings and verify the camera is still the default video device.

On the Teams Rooms Console, tap on More, then Settings, and sign in.

From here tap on Peripherals. In the Cameras section, you will now see three camera options. It doesn’t matter which one you select, just make sure you see these three choices. If not, validate the steps above to set the PanaCast 50 to Multi-stream for Microsoft Teams Rooms mode.


Click Save and exit and you are ready to go.

Client options

What capabilities does the client have to enable/disable Multi-stream IntelliFrame or to manipulate the view? Well, don’t get too excited. There is only one option and that is to enable or disable the panorama view. That’s it. Unlike Cloud IntelliFrame, you cannot enable or disable MS IF entirely.

Show/Hide Panorama

Below is an image of me doing some testing and smiling very broadly. You see the full MS IF view.

If I right click on any of the MS IF streams, a menu pops up. For the purposes of this article, the only interesting option is Hide panorama.

After you click to hide the panorama, you get a larger view of the current and most recent speaker(s).

You can right click again and the option to Show panorama appears.

Only One Person

What happens if only one person is in the meeting room when the meeting starts? What view do you see there?

You see the stream of the active speaker, and that is all. You do not see a panorama below the stream. Put another way, Multi-stream IntelliFrame is only enabled when two or more people are in the meeting room.

If the meeting starts with one person, and a second person walks in later, MS IF will get enabled. There do not need to be two people in the meeting room at the start of the meeting.

What happens if everyone leaves the meeting room except for one person? MS IF is not disabled.  Instead, you see a full view of the active speaker and the room’s panorama view. In other words, two streams are sent instead of three.
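The one-person rules above are easier to keep straight as a tiny state model. This Python sketch only encodes what I observed on a PanaCast 50 as described in this section. It is not how Microsoft or Jabra implement it, and the class and stream names are hypothetical.

```python
class MultiStreamModel:
    """Toy model of which streams a remote attendee gets (PanaCast 50 behavior)."""

    def __init__(self) -> None:
        # MS IF stays off until two or more people have been in the room.
        self.activated = False

    def streams(self, people_in_room: int) -> list:
        if people_in_room < 1:
            # Assumption: an empty room is out of scope for this sketch.
            raise ValueError("the article does not describe an empty room")
        if people_in_room >= 2:
            self.activated = True
        if not self.activated:
            # Meeting started with one person: active speaker only, no panorama.
            return ["active speaker"]
        if people_in_room == 1:
            # Everyone but one person left: MS IF stays on with two streams.
            return ["active speaker", "panorama"]
        return ["active speaker", "most recent speaker", "panorama"]


room = MultiStreamModel()
for count in [1, 2, 1]:
    print(count, "->", room.streams(count))
```

The point of the activated flag is that the result for one person depends on history: one stream before a second person ever shows up, two streams afterward.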

Changing views

Multi-stream IntelliFrame works best in Gallery view. There are a lot of options and I’m not going to go through every possible view here and explain what is being shown. But in general, if you change the view or the view changes for some reason (someone is sharing a document, for example), the active speaker stream is used.

Panorama view

As mentioned above, users can choose to hide and show the panorama view. There is one thing that you might notice in the panorama view that is interesting: The whole panorama is shown.

Many cameras have the ability to zoom in and show only the part of the room that has people in it. In the world of Jabra, this is called Intelligent Zoom. With Intelligent Zoom, the Jabra PanaCast 50 finds the leftmost person and the rightmost person and zooms the camera in to show them and what is between them.

As best I understand it, Microsoft requires the panorama to show the entire view, not a zoomed in view.

In-Room Self View

What do the users inside the conference room see with Multi-stream IntelliFrame? They see the active speaker view. They do not see the multi-stream view that remote attendees see.

Summary

Hopefully this article helped clarify IntelliFrame in the world of Microsoft Teams Rooms. If you have any questions, please comment below and I’ll make up an answer for you.

  1. What’s new for Microsoft Teams Rooms, Teams devices, and intelligent cameras at Ignite 2022 – Microsoft Community Hub
  2. I promise this is true, though I can’t find a link
  3. Microsoft 365 Roadmap | Microsoft 365
  4. Cloud IntelliFrame – Microsoft Teams | Microsoft Learn

     Pay attention to the yellow box in the below image. This is the standard video feed from the camera to Microsoft. Nothing special here really, just a view of people in a conference room. Now, look at the image below. It’s the same people in the same room, but this time, Cloud IntelliFrame has been enabled. We now get a zoomed-in, cropped view of their faces. This view can be provided by most Teams Rooms certified cameras. (Cloud IntelliFrame – Microsoft Teams | Microsoft Learn)

  5. Cloud IntelliFrame – Microsoft Teams | Microsoft Learn
  6. Cloud IntelliFrame – Microsoft Teams | Microsoft Learn
  7. Remotely manage Microsoft Teams Rooms device settings – Microsoft Teams | Microsoft Learn
  8. Cloud IntelliFrame – Microsoft Teams | Microsoft Learn
  9. A deep dive into intelligent cameras Multi-Stream and Cloud IntelliFrame for Teams Rooms – Microsoft Community Hub
  10. What is Microsoft Multi-Stream IntelliFrame and Intelligent Camera? – Microsoft Teams | Microsoft Learn
  11. What is Microsoft Multi-Stream IntelliFrame and Intelligent Camera? – Microsoft Teams | Microsoft Learn
  12. https://www.jabra.com/software-and-services/jabra-direct

The Greatest White Paper on Intelligent Speaker (aka Voice Recognition) Ever.

 

By Michael Tressler

Sr Solutions Consultant, Jabra

BA, Software Engineering, Ball State University

Contents

About the Author
Preamble
Introduction
What is Intelligent Speaker?
What makes a speaker microphone device an Intelligent speaker?
Why is this even a thing?
What are the requirements for Intelligent Speaker?
Hardware requirements
Software Requirements
Network Requirements
Licensing Requirements
Other stuff
How do I set up Intelligent Speaker?
Configure the hardware
Configure Teams Rooms on Windows
Create/Edit Teams Meeting Policies
Creating Teams Rooms Policy
Creating the End User Policy
Assigning Teams Rooms Policy for n00bs
Assigning Teams Rooms Policy for 1337 h4x0r
Assigning End User Policy for n00bs
Assigning End User Policy for 1337 h4x0r
Digital voice profile
Biometric Privacy with Intelligent Speaker
Set up your digital voice profile
How to use Intelligent Speaker in a meeting
Editing mistakes
Troubleshooting
Summary

About the Author

Michael Tressler is a Senior Solutions Consultant at Jabra. He focuses on enabling video sales in the channel via education, training, and awareness with our partners. He is closing in on his first year with Jabra.

Prior to Jabra, Michael worked for 6 years at Microsoft, with three of those years exclusively focused on Microsoft Teams devices such as Teams Rooms on Windows and Teams Rooms on Android. 

Michael has trained thousands of partners and customers on Teams Rooms on Windows and Teams Android devices.

Michael is a moderately proud graduate of Ball State University – best known for graduating David Letterman (so the standards at that school are…let’s go with inconsistent).

You can follow Michael on Mastodon via @flinchbot@twit.social.

Note: Everything said in this paper about the Jabra PanaCast 50 also applies to the Lenovo ThinkSmart Bar 180, as they are essentially the same device with different branding.

Introduction

Microsoft announced Intelligent Speaker at Ignite in March of 2021[3] and it went into preview in the second half of that year. At initial release, EPOS and Yealink[4] were the only two manufacturers to produce Intelligent Speaker certified devices.

In the blog announcement from Microsoft, Intelligent Speaker was defined as such: “…allow attendees to use the transcription to follow along or capture actions, by knowing who in the room said what. Whether you are working remotely or following the meeting in the conference room, you can effectively see who said what during the meeting.”[5]

Cool. What does any of that mean?

Why does this thing exist?

How do you set it up?

Any security issues with this?

Wait – I thought this was about identifying the person talking, so shouldn’t it be called “Intelligent Microphone”?

These questions, and many more, will be answered in the following beautifully worded paragraphs.

What is Intelligent Speaker?

Intelligent Speaker (now often called speaker recognition), at its core, is proprietary Microsoft technology to uniquely identify a person’s voice to have accurate speaker attribution in places such as meeting rooms. Put another way, when multiple people are speaking in a conference room, and the transcription feature in Microsoft Teams is enabled, how can the sentences and words of each in-room attendee be attributed to them as opposed to being generically attributed to the meeting space?

Here is a sample transcript of a user speaking in a conference room, appropriately named “Epic Conference Room of Awesome”. Note the sentences from this snippet of the transcript are not attributed to a human, but rather to the conference room.

Figure: Donuts are delicious

First off, how many people are in this conversation? 1 person talking to themselves? Two people? Three?

Let’s say there are two people in this conversation. Who said what? Am I bringing the donuts[6] or is the other person bringing them? Who even is the other person? 

Wouldn’t it be swell if the transcript showed the name of the person who said each sentence instead of just the room name?

What makes a speaker microphone device an Intelligent speaker?

As of July 2024, just having a microphone connected to Teams Rooms on Windows makes a device an Intelligent Speaker. But this has not always been the case. In the before times, you needed a specific piece of hardware to use the speaker recognition feature.

Which now raises the question: Is a Microsoft certified Intelligent Speaker still needed? Is there any benefit to these certified devices over “non-certified Intelligent Speakers”? 

Good question. But first, let’s dive into a bit of what makes a certified Intelligent Speaker special.

There are some hints of what makes Microsoft certified Intelligent Speakers so fancy. But not much. It is mentioned that Intelligent Speakers include a 7-microphone array[7] to help identify the voices of up to ten people in a meeting room. Little more is given regarding hardware requirements. The Jabra PanaCast 50 (P50) has 8 beamforming microphones, so I guess that’s good enough!

Beyond the hardware, there are the Microsoft services on the backend that really provide the magic powers. Microsoft says that they are leveraging the powers of Microsoft Graph to “…provide[s] access to rich people-centric data and insight in the Microsoft Cloud to contextualize the transcription. For example, because we know who the speaker is, the acronyms, names of colleagues, and different words the speaker uses can be more accurately transcribed.”[8]

Word.

Not mentioned is all the other magic that needs to happen. For example, it must be able to match up a given voice to you. Or me. There is an audio-matching algorithm that must do this. And then a Speech-to-text service too to convert your spoken words to text so that it can be accurately written to the transcript. And then there must be a way for us to manually fix mistakes (if we care enough).

As of October 2023[9], there are now six hardware devices that Microsoft has certified to use the Intelligent Speaker feature.

  1. EPOS Capture 5
  2. Yealink MSpeech[10]
  3. Sennheiser TeamConnect Intelligent Speaker
  4. Jabra PanaCast 50 (YEAH BABY!)
  5. Yealink SmartVision 60
  6. Lenovo ThinkSmart Bar 180 [11]

Back to the question: Are the above devices any better? I don’t know of any specific testing and results. However, Microsoft has repeatedly said that certified Intelligent Speakers will outperform non-certified microphones.  In the official Microsoft documentation, it says:

While we’re delighted to extend the capability of speaker recognition to more rooms, it’s important to note the quality may not match that of an intelligent speaker certified device. So, it’s essential to evaluate the advantages of incorporating an intelligent speaker, especially in crucial spaces where attaining the highest quality transcription and attribution is vital. 1

 

It was also mentioned in the announcement of speaker recognition no longer requiring certified hardware. This was written by Christian Schacht – Principal Program Manager at Microsoft responsible for the speaker recognition feature.

While we’re delighted to extend the capability of speaker recognition to more rooms, it’s important to note that the quality may not match that of an intelligent speaker device. Intelligent speakers are designed with multiple microphones to provide high-quality audio, maximizing accuracy in recognition and transcription and boasting an industry-leading reduction of word error rate. In rooms where top-quality transcription and attribution are imperative, it’s worth it to assess the benefits of integrating intelligent speaker hardware certified for Teams.  2

 

And then Christian Schacht mentioned it again in a YouTube video explaining this feature and how to set it up. Fast forwarding to the 2:30 portion of this video, Christian says:

…so long story short, the certified hardware for intelligent speaker will always be a little bit better because we just get more information um out of the device directly. 3

It is ultimately up to you if you want to go the certified hardware route. Just note that for the most accurate transcript, Microsoft believes certified hardware is the best option.

Why is this even a thing?

Why does Intelligent Speaker even exist? Heck – have you ever been on a meeting with transcription enabled?

Me either.

So why? I’ll tell you why: Because it’s cool technology!

OK, that’s not why. The why is boring and I’m trying to pep up this section. And I think I’m failing. And now I’m just wasting your time. So here we go:

Regulated Persons.

There you go. Good times.

What’s a regulated person?

According to some random website called Law Insider, “Regulated Persons means certain broker-dealers and registered investment advisers that are subject to prohibitions against participating in pay-to-play practices and are subject to the SEC’s oversight and, in the case of broker-dealers, the oversight of a registered national securities association, such as FINRA.”[12]

Put another way, these people have all their communications logged, tracked, and recorded so, should a legal issue arise, they can claim their innocence. Hopefully. Otherwise: jail time.

So how do you track everything someone says when they walk into a common space like a conference room? Hello, Intelligent Speaker.

This has been the primary use case for Intelligent Speaker since its launch. And as such, this has been a niche feature that most Teams Rooms admins either have never heard of or have ignored because there is no need for it.

BUT THAT IS ALL ABOUT TO CHANGE!

Let me introduce you to my little friend – Copilot! Copilot! Copilot!

(Are you surprised it took me this long to get into the hip AI topic of the day?)

What happens if you throw AI, erm, Copilot at a transcript? It can quickly summarize it, pull out notes, and even put together a list of tasks derived from the meeting. And a transcript of a meeting room with proper attribution for Copilot to ingest? That’s like the greatest thing ever.

Now Copilot can summarize tasks like “Michael agreed to buy the donuts” instead of “Conference Room A agreed to buy the donuts”[13].

And now this Intelligent Speaker feature becomes far more than a niche feature for a handful of regulated persons. It becomes a potential game changer for office workers around the globe.

What are the requirements for Intelligent Speaker?

We now know that Intelligent Speaker will save the planet. Or something like that. How does one set it up? What are the requirements? There are quite a few and I’ll start by setting up the administrative side and then show you how to set up the end-user side.

Hardware requirements

I’ve already mentioned the six supported Intelligent Speakers above. But here they are again with pictures. I’m doing this to drive home just how different the Jabra PanaCast 50 (and the Lenovo ThinkSmart Bar 180) is than the other Intelligent Speakers on the market.[14]

EPOS Capture 5


Jabra PanaCast 50


Lenovo ThinkSmart Bar 180

Sennheiser TeamConnect Intelligent Speaker

Yealink MSpeech


Yealink MVC S60 (Maybe)

Two are all-in-one video bars with industry leading video and audio. Three of them are speaker pucks and the other is also a center-of-table device. Can you spot which one is the best option?[15]

Note that Intelligent Speaker now works without the need for a certified Intelligent Speaker. See the discussion above. TLDR: any microphone connected to your Teams Rooms on Windows will work, but a certified Intelligent Speaker will work better.

The second hardware requirement is that – as of this writing in October 2023 – the Intelligent Speaker is only available on Microsoft Teams Rooms on Windows.[16] This won’t work on Zoom Rooms (Windows or Android). And it won’t work on Teams Rooms on Android. The Android based Jabra PanaCast 50 VBS cannot do attributed captions. This may change at some point, but until then, if someone is interested in deploying Intelligent Speaker, Microsoft Teams Rooms on Windows is the only option. 

And that’s about it for hardware.

Note: If someone is still using a Logitech SmartDock running Teams Rooms, sell them a more modern Teams Rooms implementation. Remind them that Intelligent Speaker is not supported on those ancient things due to “… a known issue that Teams Rooms can’t recognize the Intelligent Speaker through the dock.” 4

Software Requirements

The software requirements for Intelligent Speaker are straightforward. You will need a Microsoft Teams Rooms on Windows installation connected to the Intelligent Speaker of your choosing (here, the Jabra PanaCast 50). You also need to set the PanaCast 50 as the default speaker and microphone within Teams Rooms. For Intelligent Speaker to work, it must be the default speaker and microphone.[17]

Network Requirements

The network requirements for Intelligent Speaker are the same as for any Teams Rooms on Windows installation, with one exception: when using speaker attribution with a certified Intelligent Speaker, you need 7 Mbps of available upload bandwidth.[18] On the nerd side, the seven microphones send seven audio streams to Microsoft, each of which adds a maximum of 1 Mbps, for a maximum of 7 Mbps. Once the audio streams reach Microsoft, magic happens, and voice matching is attempted in the cloud.[19]

If you are not using certified Intelligent Speaker hardware, then there is no increase in audio bandwidth, as only a single stream is sent to Microsoft. It is the same stream of audio that is sent if you have never used this feature. As such, the required audio bandwidth for non-certified Intelligent Speaker microphones is 1 Mbps.
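For the back-of-the-envelope crowd, the arithmetic above can be sketched in a few lines of PowerShell. The function name Get-SpeakerUploadMbps and its parameters are my own invention for illustration. The only inputs taken from the text are the seven streams and the 1 Mbps per-stream ceiling.

```powershell
# Hypothetical helper: estimates worst-case audio upload bandwidth.
# Assumes a ceiling of 1 Mbps per audio stream, as described above.
function Get-SpeakerUploadMbps {
    param(
        [int]$StreamCount,            # 7 for a certified Intelligent Speaker, 1 otherwise
        [double]$MbpsPerStream = 1.0  # assumed per-stream maximum
    )
    return $StreamCount * $MbpsPerStream
}

$certified    = Get-SpeakerUploadMbps -StreamCount 7
$nonCertified = Get-SpeakerUploadMbps -StreamCount 1

"Certified Intelligent Speaker: up to $certified Mbps"
"Any other microphone:          up to $nonCertified Mbps"
```

This is planning math only. It says nothing about what the device actually sends at any given moment.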

Licensing Requirements

The Teams Rooms resource account needs a Teams Rooms Pro license assigned to it. Speaker attribution is not supported on the Teams Rooms Basic license.[20]

Note: If a customer is still using the legacy Teams Rooms Standard or Teams Rooms Premium license, Intelligent Speaker features will work with both licenses.

How about user licenses?

The meeting organizer needs at least an E1 or A1 license in order to create the meeting. Their ability to permit recording and transcription is based on the meeting policy assigned to them. This gets discussed further down.

Attendees whose voices you want attributed in the transcript must be members of the same tenant as the meeting organizer. They also need to be able to sign in to Teams and record their biometric voiceprint. The ability to record your voiceprint is also controlled by a meeting policy, discussed later in the “Create/Edit Teams Meeting Policies” section below.

At this point, with the licenses assigned to the meeting organizers and attendees, voice recognition will work. So long as you follow the requirements for setting up the meeting (discussed in the “How to use Intelligent Speaker in a Meeting” section below), the transcript of the meeting will have the names of the in-room attendees.

Note that you can’t do much with the transcript at this point, at least with regards to Microsoft tools. All you have now is a transcript with speaker attribution. If you want to do something useful with it (at least in this modern Microsoft world), you will need to assign your users Copilot and/or Teams Premium licenses. 

Note: You can always feed the transcript into ChatGPT or other large language models and they’ll do a really good job analyzing the transcript for you. You don’t *have* to use Copilot, it’s just easier and native within the Microsoft ecosystem.

The meeting organizer does not need a Copilot or Teams Premium license for voice recognition/Intelligent Speaker to work. You can happily join a meeting hosted by someone in your tenant with an E1 license and, if you have a Copilot license, you can then fire off prompts to Copilot during or after the meeting. Your ability to query Copilot is in no way tied to the meeting organizer also having a Copilot license. All Copilot is doing is analyzing the transcript. The meeting organizer’s role in your ability to use Copilot only goes so far as their rights to create meetings that allow for transcripts.

In the scenario above, the meeting organizer (who does not have a Copilot license) cannot use Copilot and will not get an intelligent meeting recap, but you will as you have a Copilot license. I made a table below to help explain this and other licensing add-on scenarios. The below is above and beyond the E1/A1 minimum license.

| Scenario | Organizer | Attendee |
| --- | --- | --- |
| No Copilot licenses for anyone | No Copilot prompting, no intelligent meeting recap, but attributed transcript created if their meeting policy allows it. Can download the transcript. | No Copilot prompting, no intelligent meeting recap. Can download the transcript. |
| Organizer has a Copilot license, attendees do not | Attributed transcript created if their meeting policy allows it. Organizer can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. | Can download the transcript but cannot use Copilot to do anything fun with it. |
| Organizer has a Teams Premium license, attendees do not | Attributed transcript created if their meeting policy allows it. Organizer will receive the post-meeting intelligent meeting recap. Can also download the transcript. | Can download the transcript but cannot use Copilot or Teams Premium to do anything fun with it. |
| Organizer has no Copilot or Teams Premium license; attendee has a Copilot license | Attributed transcript created if their meeting policy allows it. Can download the transcript but cannot use Copilot to do anything fun with it. | Attendee can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. |
| Organizer has no Teams Premium or Copilot license; attendee has a Teams Premium license | Attributed transcript created if their meeting policy allows it. Can download the transcript but cannot use Teams Premium to do anything fun with it. | Attendee will receive the post-meeting intelligent meeting recap. Can also download the transcript. |
| Everyone has a Copilot license | Attributed transcript created if their meeting policy allows it. Organizer can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. | Attendee can prompt Copilot and receive the post-meeting intelligent meeting recap. Can also download the transcript. |

Note: This is assuming attendees have the right to view and download transcripts. Here is how to block downloads of Teams transcripts.

Note: Downloading transcripts presumes that transcripts are enabled for the meeting. This does not apply to the option to use Copilot without transcripts.
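If tables aren’t your thing, the licensing scenarios boil down to one rule: what you can do with the transcript depends on your own add-on license, not the other person’s. Here is a small sketch of that rule. Get-TranscriptCapability is a made-up function that only models the table; it does not query Microsoft 365, and it assumes transcript downloads have not been blocked.

```powershell
# Hypothetical helper that models the licensing table.
# It does NOT talk to Microsoft 365; it only encodes the rule that each
# participant's abilities depend on that participant's own add-on license.
function Get-TranscriptCapability {
    param(
        [ValidateSet('None', 'Copilot', 'TeamsPremium')]
        [string]$License = 'None'
    )
    [pscustomobject]@{
        License            = $License
        DownloadTranscript = $true   # assumes downloads are not blocked by policy
        IntelligentRecap   = ($License -in @('Copilot', 'TeamsPremium'))
        PromptCopilot      = ($License -eq 'Copilot')
    }
}

# Example: organizer on a plain E1 license, attendee with Copilot.
Get-TranscriptCapability -License None
Get-TranscriptCapability -License Copilot
```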

Other stuff

This feature is available in all countries and regions, at least as Microsoft defines them. That does not mean that all languages and locales are supported. See this list of supported locales[21].

Beyond being available in certain locales, there are legal ramifications to using Intelligent Speaker. For Intelligent Speaker to work, users will have to give up some biometric information (i.e., their voice print). Some nations, principalities, city-states, and other political boundaries may have an issue with this. Check first if this is legal to be used where you intend to set it up.

Second, if it is legal, verify with your company’s legal team whether it is permitted (or desired at all) in your organization. Some companies like plausible deniability, and not having a transcript sure helps avoid some of that pesky legal paperwork that needs to be handed over in a lawsuit. Or they just really value their employees’ privacy.

Assuming the points above are cleared, the Microsoft Teams administrator then needs to create meeting policies that explicitly enable the voice attribution feature. Depending on how they do it, this will apply to all users of that Microsoft 365 tenant (editing the Global meeting policy) or they can be more tactical and create a custom policy and only assign the policy to users willing to give up their voice print for the common good.

Which leads to… you must hope your users record their voice prints. It is completely voluntary for them to do this. It’s generally bad form to force an employee to hand over personal biometric data like their voice[22].

How do I set up Intelligent Speaker?

This gets a little tricky but if it were easy, I wouldn’t be writing this. 

If you are setting this up on hardware that is not a certified Intelligent Speaker, you can safely skip down to Create/Edit Teams Meeting Policies.

Configure the hardware

The first thing is to make sure your Intelligent Speaker is on the latest firmware. Make sure your Jabra PanaCast 50/Lenovo ThinkSmart Bar 180 is on firmware version 6.22 or later. You also need to have Jabra Direct[23] version 6.11.28601 or later.

Connect your PanaCast 50 to a computer and start Jabra Direct. Once the P50 is recognized by Jabra Direct, click on it to get to the settings.

On the screen that appears, click on Settings to get to the good bits.

From within Device settings, scroll down until you see the Playback device type setting. Hit the drop down and change it from “Communication device” to “Microsoft Teams Rooms device”.

Click Save at the top and then reboot the P50. The PanaCast 50 is now ready to be an Intelligent Speaker.

Configure Teams Rooms on Windows

After your P50 reboots, you’re not quite done. You now need to verify that the setting was successfully applied and that the P50 is set as the correct output device within Teams Rooms.

Go to the Teams Rooms on Windows console and tap More.

On the next screen tap Settings.

You are then prompted to sign into Teams Rooms with administrative credentials. Enter the administrative password to move on to the next step.

From within Settings, scroll down to the Peripherals section.

Finally, set the Audio settings for Teams Rooms. For Microphone for Conferencing, select the PanaCast 50 that has UAC2_TEAMS in the name, as shown in the image below. (For a ThinkSmart Bar 180, the name will be different, but the (UAC2_TEAMS) will be the same).

Set the Speaker for Conferencing to the PanaCast 50 that has UAC2_Render in the name. This is shown in the below image. (For a ThinkSmart Bar 180, the name will be different, but the (UAC2_Render) will be the same).

Set the default speaker to the same thing you set above – the (UAC2_Render) device.

At this point you’ve completed the easy part from the admin side. Now we need to create some meeting policies.

Create/Edit Teams Meeting Policies

Up until now, this has been straightforward and anyone with a laptop, a cable, and (optionally) a PanaCast 50 handy can do this work. At this point, things change. In most organizations, you now need to bring in your Microsoft 365 administrators as you need to edit or create new policies to apply some custom settings.

There are two ways to do this:

  1. Edit the Global Teams meeting policy.
    1. The advantage here is it’s global, so all user accounts will get this setting.
  2. Create/Edit a custom policy and only apply it to certain users.
    1. Generally, you should not edit Global policies and instead create custom policies. This isn’t the document to debate the pros and cons of policy creation and hierarchy. But in this paper, I’m going with this approach in that I will create a new Teams meeting policy.

The person creating or editing these policies needs any of the following roles assigned to them:

  • Teams Administrator[24]
  • Teams Communications Administrator[25]

You need to create (or edit) two policies: one for the Microsoft Teams Rooms Resource Account[26] and one for end users.

Note: You could create just one policy covering both settings, but I’m going to show the most granular way to do this. How customers choose to implement these policies is wholly up to them.

The first policy enables the speaker attribution feature on Microsoft Teams Rooms. Note that you don’t set policies on the Teams Rooms device; you set policies on the Resource Account that signs into Teams Rooms and runs the meetings on the device.

First, I will create a policy called IntelligentSpeakerMTR that sets the value “roomAttributeUserOverride” to “Attribute”.[27]

There are three values you can set for “roomAttributeUserOverride”.

One is “Off”, which turns the feature off; another is “Attribute”, which enables speaker attribution; and the third is “Distinguish”, which tells the speaker to distinguish between different voices but to *not* provide name attributes for the transcript (e.g., “Speaker 1”, “Speaker 2” instead of “Alice”, “Bob”).

The second policy is assigned to the users that will be allowed to have their voices transcribed (aka folks who aren’t bonkers over the privacy of their biometrics). This policy will be called IntelligentSpeakerUser, and I will set the value for “enrollUserOverride” to “Enabled” and the value for “AllowTranscription” to “True”.

What do these attributes set? Good question. Also – good to know you’re awake and have read this far. You, my friend, are an amazing human being.

“enrollUserOverride” is used “…to set voice profile capture, or enrollment, in Teams settings for a tenant.”[28] That’s a bit much as this isn’t a tenant level setting, but a user level setting. But whatever. It’s in Microsoft official documentation so it must be true.

If this attribute is disabled, the following happens (or doesn’t, depending):

  • Users who have never enrolled can’t view, enroll, or re-enroll.
  • The entry point to the enrollment flow will be hidden.
  • If users select a link to the enrollment page, they’ll see a message that says this feature isn’t enabled for their organization.
  • Users who have enrolled can view and remove their voice profile in the Teams settings. Once they remove their voice profile, they won’t be able to view, access, or complete the enrollment flow.[29]

We want to enable this. When enabled, you get all this awesomeness:

  • Users can view, access, and complete the enrollment flow.
  • The entry point will show on Teams settings page under the Recognition tab.[30]

The other attribute we will set is “AllowTranscription”, which is obvious. You either allow transcription or you don’t. I want to allow transcription, so I will set this to True.

Creating Teams Rooms Policy

Let me show you how to create these policies using Microsoft Teams PowerShell. You cannot do this using Teams admin center, which is the tool for total n00bs. YOU are not a total n00b are you? You are a 1337 h4x0r! We 1337 h4x0r5 use PowerShell!

At this point, if you are indeed 1337 h4x0r, do your thing. You don’t need documentation!

For those aspiring 1337 h4x0r5, I’ll walk you through this.

First, start PowerShell on your PC as Administrator (Pro Tip: use the Terminal app). If you don’t know how to start PowerShell, you can stop now and pass this documentation off to a more experienced administrator.

Once PowerShell has started, you need to make sure you have the Microsoft Teams PowerShell module installed. If you are unsure, run the following cmdlets[31] with these parameters:

Install-Module -Name PowerShellGet -Force -AllowClobber

Install-Module -Name MicrosoftTeams -Force -AllowClobber

Import-Module -Name MicrosoftTeams
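If you would rather check before you install, something like the following works as a sketch. It only uses standard cmdlets (Get-Module, Install-Module, Import-Module). The choice of -Scope CurrentUser is my assumption so that it does not require an elevated session; drop it if you want a machine-wide install.

```powershell
# Install the Teams module only if no copy is already available on this PC.
if (-not (Get-Module -ListAvailable -Name MicrosoftTeams)) {
    # -Scope CurrentUser is an assumption; it avoids needing admin rights.
    Install-Module -Name MicrosoftTeams -Scope CurrentUser -Force
}

# Load the module into the current session and show which version was loaded.
Import-Module -Name MicrosoftTeams
(Get-Module -Name MicrosoftTeams).Version
```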

Now that you have the correct PowerShell module installed, you need to connect your PC to Microsoft Teams. You do this by running the following from your terminal window:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Import-Module -Name MicrosoftTeams

Connect-MicrosoftTeams

After entering the last cmdlet, you will be prompted to sign in.

When successfully signed in, you get this wonderful feedback (your values will be different).

Now we can get to business.

The cmdlet needed to create a new Teams meeting policy is New-CsTeamsMeetingPolicy. This is the full command line I will enter:

New-CsTeamsMeetingPolicy -Identity IntelligentSpeakerMTR -roomAttributeUserOverride Attribute

Copy and paste that into your PowerShell session. After a few seconds you should get a raft of information back. You can scroll up and see if the change has taken effect. If you aren’t into playing a PowerShell version of “Where’s Waldo”, you can run this PowerShell command to see what the value is set to for “roomAttributeUserOverride”.

Get-CsTeamsMeetingPolicy -Identity IntelligentSpeakerMTR | Select-Object -Property RoomAttributeUserOverride

If the value returned is “Attribute” then you are ready for the next step.
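One gotcha: New-CsTeamsMeetingPolicy will complain if a policy with that name already exists. If you think you may run this more than once, a create-or-update pattern like the sketch below can save some head scratching. It assumes you are still connected via Connect-MicrosoftTeams, and it treats any lookup error as “policy not found”, which is a simplification on my part.

```powershell
# Assumes an active Connect-MicrosoftTeams session.
$policyName = 'IntelligentSpeakerMTR'

# Look the policy up first. Any lookup error is treated as "not found",
# which is a simplification (a connectivity problem would land here too).
try {
    $existing = Get-CsTeamsMeetingPolicy -Identity $policyName -ErrorAction Stop
} catch {
    $existing = $null
}

if ($existing) {
    # Policy already exists: update the setting in place.
    Set-CsTeamsMeetingPolicy -Identity $policyName -RoomAttributeUserOverride Attribute
} else {
    # Policy does not exist yet: create it with the setting applied.
    New-CsTeamsMeetingPolicy -Identity $policyName -RoomAttributeUserOverride Attribute
}
```

The same Set-CsTeamsMeetingPolicy call is how you would later flip the value to Distinguish if you change your mind.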

Creating the End User Policy

The second policy you need to create is the one you will assign to end users. Only end users with this policy assigned will be able to enroll their voice for speaker attribution. As above, open a PowerShell session and connect to your tenant in the cloud.

Below is the PowerShell command needed to create the policy.

New-CsTeamsMeetingPolicy -Identity IntelligentSpeakerUser -enrollUserOverride Enabled -AllowTranscription $true

Copy and paste that into your PowerShell session.

After you hit enter, a raft of information should go flying by. You can scroll up to validate the changes in this policy or run the following PowerShell to confirm the attribute you set.

Get-CsTeamsMeetingPolicy -Identity IntelligentSpeakerUser | Select-Object -Property EnrollUserOverride, AllowTranscription

If you see “Enabled” and “True” then you are good to go.
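If you would rather go the single-policy route mentioned earlier, a sketch could look like this. The policy name IntelligentSpeakerCombined is made up for this example; the three parameters are the same ones used in the two policies above. It assumes you are still connected via Connect-MicrosoftTeams.

```powershell
# One policy carrying both the room setting and the end-user settings.
# "IntelligentSpeakerCombined" is a hypothetical name; pick your own.
$combined = @{
    Identity                  = 'IntelligentSpeakerCombined'
    RoomAttributeUserOverride = 'Attribute'  # room-side speaker attribution
    EnrollUserOverride        = 'Enabled'    # lets users record a voice profile
    AllowTranscription        = $true        # lets organizers transcribe meetings
}

# Splatting passes the hashtable entries as named parameters.
New-CsTeamsMeetingPolicy @combined
```

The trade-off is granularity: everyone who gets this policy gets all three settings, resource accounts and humans alike.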

Assigning Teams Rooms Policy for n00bs

Now that you have the Teams Rooms policy created, you need to assign it to a Teams Rooms Resource account. The perceived easiest way for a new administrator to do this is via Microsoft Teams admin center (TAC). This way you can click away with no mucking about with PowerShell.

To access TAC, open a web browser and enter admin.teams.microsoft.com into the address bar. If necessary, sign into your Microsoft 365 tenant.

From here, navigate to the Users section and click on Manage Users.

From here, you can either scroll down and find an account, or type in the account name in the Search for a user search box. In this case, I will edit the second account listed – “Conference Room – MTR1”. I click on the name in the Display name column to bring up the properties for that account.

Once I have the properties for that account, I click on Policies to see which policies are assigned to that account.

To change a policy, click on the Edit icon. This brings up the list of possible policies and their settings. Scroll down until you see Select Meeting policy.

After clicking on the drop-down list for meeting policies, you see all available options. Select IntelligentSpeakerMTR and click Apply at the bottom of the screen.

You have now assigned this policy to the Teams Rooms Resource Account.

Note: After a policy is assigned, it can take up to 48 hours to take effect. To get the policy to take effect sooner, accounts must be signed out and signed back in.

Assigning Teams Rooms Policy for 1337 h4x0r

Grant-CsTeamsMeetingPolicy -Identity mtr.mtressler.1@jabrademos.com -PolicyName IntelligentSpeakerMTR
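To confirm the assignment took, a quick check along these lines may help. This is a sketch that assumes you are still connected via Connect-MicrosoftTeams and that your version of the Teams module includes Get-CsUserPolicyAssignment. The account is the same example resource account used above.

```powershell
# Lists the Teams meeting policy assignment for the resource account.
# Assumes an active Connect-MicrosoftTeams session.
Get-CsUserPolicyAssignment -Identity 'mtr.mtressler.1@jabrademos.com' -PolicyType TeamsMeetingPolicy
```

If IntelligentSpeakerMTR does not show up right away, remember the note above about policies taking a while to apply.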

Assigning End User Policy for n00bs

I’ll make this quick.

The steps are the exact same as above – go to Teams admin center, find an end user, and change their policy to IntelligentSpeakerUser. Click apply and wait for the change to take effect.

The one difference is you probably want to apply this to several users at once and not assign the policy one at a time. To do this, click to the left of the names to which you want to assign this policy. A checkmark appears next to the selected names.

Once you have the names selected, scroll back to the top and click Edit settings.

From here, scroll down to Meeting policy and select IntelligentSpeakerUser, then click Apply at the bottom to apply the policy to the group of users.

Assigning End User Policy for 1337 h4x0r

Grant-CsTeamsMeetingPolicy -Identity avance@jabrademos.com -PolicyName IntelligentSpeakerUser

Alternately, something like this, which assigns the policy to every user in the tenant:

Get-CsOnlineUser | Grant-CsTeamsMeetingPolicy -PolicyName IntelligentSpeakerUser
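Since enrollment is voluntary, you may prefer to assign the policy only to people who opted in. Here is a minimal sketch of that, assuming an active Connect-MicrosoftTeams session. The second address in the list is a made-up placeholder; swap in your own list of user principal names.

```powershell
# Users who agreed to enroll. The second entry is a made-up placeholder.
$optedIn = @(
    'avance@jabrademos.com',
    'someone.else@jabrademos.com'
)

foreach ($upn in $optedIn) {
    try {
        # -ErrorAction Stop turns failures into exceptions so catch can report them.
        Grant-CsTeamsMeetingPolicy -Identity $upn -PolicyName IntelligentSpeakerUser -ErrorAction Stop
        Write-Host "Assigned IntelligentSpeakerUser to $upn"
    } catch {
        Write-Warning "Could not assign policy to ${upn}: $($_.Exception.Message)"
    }
}
```

One failed account does not stop the loop, so you get a warning per problem user instead of a dead script.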

Digital voice profile

At this point, all the work is done from the administrative side. Now it is up to end users to record their voice profiles. You (legally, at least in most countries) can’t force people to do this. Sadly. 😊 However, once people see the benefit of this, they may volunteer to do this, but I would be surprised if you ever get 100% end user buy-in. Some people are too lazy to do it and some people value their biometric privacy too much.

Biometric Privacy with Intelligent Speaker

What is the privacy story? Where is my recorded voice stored? Can anyone access it? Does it work cross-tenant? Those are all good questions and maybe I’ll answer two or three of those.

First, your voiceprint is stored in the same region as your other data. If your tenant is scoped to only store data in a specific region, your voiceprint stays in that specific region. The voiceprint is encrypted at rest.5 “Voice data is stored in the Office 365 trusted compliance store”.6

“Voice data will be securely stored in the Office 365 Cloud, and users will retain control of their information, including the ability to delete it at any time. The capture of voice data can be turned on or off for each meeting. Additionally, admins have full control to turn on/off people identification through voice recognition feature across the organization.”[33]

Admins can export the audio data[34] via Teams admin center.  If you go to a user who has made a voice recording, you will see an option to download the biometric profile. [35]

Audio data can only be used within your tenant. This means if you or someone in your tenant hosts the meeting, then Intelligent Speaker features will work (if enabled). If you walk into a meeting room at another Office 365 tenant, your voice profile data will not be used and you will show up as “Speaker X” in the transcription.

To clarify, Alice from Contoso has set up her voice profile. She is going to an in-person meeting with Bob at Northwind Traders. Both Bob and Northwind Traders have successfully set up Intelligent Speaker. In the meeting with Bob and Alice in the Northwind Traders conference room, Bob will be properly attributed in the meeting room, while Alice will appear as “Speaker 1” – even though she has set up her biometric data. This is because the biometric data from Contoso is not shared with Northwind Traders.

Another point: the voice print biometric data is only used within Teams voice-recognition scenarios and not by any other Microsoft software or service. [36]

In a pro-privacy move, user voice print data is removed if the user “…isn’t invited to any meetings with an Intelligent Speaker within that 1-year period.”[37] If a user leaves the company and their account is deleted, the data is removed within 30 days, or whatever data retention policy is in effect.[38]

Set up your digital voice profile

To set up your voice profile, open the Teams app, click the three dots (…) in the upper right, and click on Settings.

Once Settings opens, scroll down to Recognition. If you see the message that says you are not enabled, then :sad face:. Either you have not been assigned the Intelligent Speaker policy defined above or, more likely, the policy has not yet taken effect on your account. Come back later.

If the policy has been assigned and applied to the user, you get the following screen instead.

Click on Create voice profile to get started.

At the top of the screen (where it says “Microphone array…”) you can select which microphone you are using for the recording. Make sure you pick the right microphone and that you are in a quiet room. Click Start voice capture and read that paragraph. It doesn’t matter if you mess up and need to read part of it again. You are not recording something for posterity. You are just letting Teams learn what your unique voice print is.

Nerd Note: You can read *anything* you want. That paragraph on the screen is just something that’s long enough to get a decent voice print. So as mentioned above – it doesn’t matter if you mess up. The point is that you speak for 15 seconds or so.

Note: You cannot create or update your voice print while you are in a meeting. If you try to do this, you will receive this awkwardly worded notice:

How to use Intelligent Speaker in a Meeting

Now that you have your voice print made, you just roll into a conference room and start talking and magic happens, right?

Oh, if it were so easy. You precious child. With your simplistic desires.

There is more than just setting up policies and recording your voice. You need to set up the meeting invite correctly.


Note: Speaker recognition will work in an ad-hoc/unscheduled meeting, but speaker attribution will not. Put another way, even with your voice enrolled, you will show up in the transcript as Speaker 1 or Speaker 2, not as your real name.


Here are the requirements that must be met for attributed transcription to work in a meeting with an Intelligent Speaker:

    1. Everyone who intends to have their voice transcribed must be listed on the meeting invite.
    2. No more than 20 people who have registered their voice[42] can be on the meeting invite. (Well, 19 if you include yourself, the organizer)
      • “Intelligent Speakers work best in medium-sized rooms that hold 8–10 people.”[39]
      • If more than 20 people with enrolled voices are on the invite, Intelligent Speaker is disabled.[40]
    3. Transcription needs to be supported for the meeting. (We did that in the user voice policy, but it’s the meeting organizer’s policy, not yours, that determines whether transcription is allowed.)
    4. Someone needs to turn on transcription in their Teams client. Once the meeting starts, you can only enable transcription from the Teams client and not directly on the Teams Rooms console.
      • To enable transcription, click the three dots (More) from within the meeting. Then click Record and transcribe, and finally click Start transcription.
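The requirements above (plus the ad-hoc meeting note) can be turned into a quick pre-flight checklist. The function below is a made-up helper that only encodes the rules from this article. It does not inspect a real meeting or call any Teams API, so you feed it the facts yourself.

```powershell
# Hypothetical pre-flight check. It encodes the rules from this article only;
# it does not read a real meeting invite or call any Teams API.
function Test-AttributionReadiness {
    param(
        [int]$EnrolledInviteeCount,            # invitees with a voice profile, organizer included
        [bool]$OrganizerAllowsTranscription,   # from the organizer's meeting policy
        [bool]$IsScheduledMeeting = $true      # ad-hoc meetings do not get names
    )
    $problems = @()
    if (-not $IsScheduledMeeting) {
        $problems += 'Ad-hoc meeting: expect "Speaker N" labels instead of names.'
    }
    if ($EnrolledInviteeCount -gt 20) {
        $problems += 'More than 20 enrolled voices on the invite: Intelligent Speaker is disabled.'
    }
    if (-not $OrganizerAllowsTranscription) {
        $problems += "The organizer's meeting policy does not allow transcription."
    }
    # The leading comma keeps the result an array even when it is empty.
    return ,$problems
}

# Example: a scheduled meeting with 21 enrolled invitees.
Test-AttributionReadiness -EnrolledInviteeCount 21 -OrganizerAllowsTranscription $true
```

An empty result means nothing on this list is in your way. Someone still has to start transcription from a Teams client.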

At this point, you should see a transcription with your name instead of something generic as seen at the very beginning of this white paper (Figure 1: Donuts are delicious).

Below is a stolen image that was very likely a copyright violation until I probably did the greatest Photoshop edit ever to totally make it a unique work. Like Andy Warhol, this is my art.

If you look at the transcription on the right, you’ll see that it says Serena Ribeiro (Conf Room P…). This lets us know that Intelligent Speaker is working as it recognized Serena’s voice and that she is in a conference room.

Note: This feature is currently removed and will be added again later. (Note added 6 August 2024)

What if someone in the conference room speaks and they don’t have a voice print (either they never set it up, weren’t on the invite, or are from a different tenant)? What happens then? Anything?

Intelligent Speaker tracks all the voices in the room (well, the first 10). If it hears a unique voice that it cannot match to a voice print, it tags that voice as Speaker 1 in the transcript. If a second person with no accessible voice print speaks, they will be Speaker 2, and so on.
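To make the labeling behavior concrete, here is a toy model of it. This is emphatically not how Microsoft’s service is implemented. The function name, the voice IDs, and the lookup table are all invented for illustration, and it ignores the 10-voice limit.

```powershell
# Toy model only: NOT Microsoft's implementation. Voice IDs are made up.
function Get-TranscriptLabel {
    param(
        [string[]]$VoiceIds,        # one entry per utterance, in speaking order
        [hashtable]$EnrolledNames   # voice ID -> display name for enrolled invitees
    )
    $unknown = @{}
    foreach ($id in $VoiceIds) {
        if ($EnrolledNames.ContainsKey($id)) {
            # Matched a voice print: use the person's name.
            $EnrolledNames[$id]
        } else {
            # First time hearing this unmatched voice: hand out the next number.
            if (-not $unknown.ContainsKey($id)) {
                $unknown[$id] = "Speaker $($unknown.Count + 1)"
            }
            $unknown[$id]
        }
    }
}

# v1 is enrolled as Alice; v2 and v3 have no accessible voice print.
Get-TranscriptLabel -VoiceIds 'v1', 'v2', 'v1', 'v3' -EnrolledNames @{ v1 = 'Alice' }
```

In this example the four utterances come out as Alice, Speaker 1, Alice, Speaker 2. An unmatched voice keeps the same number for the rest of the meeting.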

See the following stolen screen shot for an example of a person being tagged as Speaker 1.

Select Identify speaker[41]

Editing mistakes

What if this whole thing makes a mistake? Or what if we just want to manually attribute a user in the transcript?

In the image above, you see there is a button named Identify speaker. If you click that, a drop down appears showing the names of everyone that was on the meeting invite. Pick the right name and that attribution is fixed.

Note: You can only change to a person that was on the meeting invite. This is to prevent falsely attributing something to someone who wasn’t in the room. Otherwise, nothing would stop me from attributing something to Luke Skywalker that Darth Vader said.

For more information on editing attribution on a transcript, see this document. I’m not in the mood to basically copy/paste that article into this one.

Note: You can hide your identity in meeting captions and transcripts! See this link for more info.

Troubleshooting

I’m not going to write a full troubleshooting guide. Instead:

  1. Review the steps above and make sure you got it right.
  2. Bugs pop up and Microsoft has a page dedicated to known issues. So please go there. (Though at the time of this writing they are still referencing an old Teams Rooms license so…..)
  3. One tip: If you see “Speaker 1” in the meeting transcript instead of the person’s name, this is a sign that this has been set up correctly, but it is not recognizing the person speaking. Make sure the user policies have been assigned to the user – which could take a minute. Or two days. Also, have the user re-record their voice in the Teams client. I have seen this fix a problem with a person not being recognized.

Summary

I hope this document helped you understand what Intelligent Speaker is and how to set it up with a Jabra PanaCast 50. It’s a cool feature but there is one warning:

Don’t expect perfection.

I’ve historically been disappointed in the accuracy of the attributed names in the transcription.  However, due to the Jabra PanaCast 50 going through Microsoft’s Technology Adoption Program (TAP), Microsoft themselves looked closely at this feature for the first time in a while. As such, the service has gotten quite a few improvements on the back end.

Oh hey – I never answered why this solution is called Intelligent Speaker. This whole feature is based around microphones that capture voices so shouldn’t it be called Intelligent Microphone? Yes, yes it should. Also, the name Intelligent Speaker presumes that the person talking is saying something intelligent. That is not always the case.

It’s called Intelligent Speaker just because. It’s what Microsoft initially called it in development – most likely because it was initially based on a speaker puck design so that name kind of stuck.

  1. Microsoft Teams Rooms and Devices: Microsoft Ignite 2023 – Microsoft Community Hub

  2. Microsoft Teams Rooms and Devices: Microsoft Ignite 2023 – Microsoft Community Hub

  3. Flexible work is here to stay: Microsoft 365 solutions for the hybrid work world | Microsoft 365 Blog

  4. The Yealink MSpeech can only connect to a Yealink Microsoft Teams Rooms. The EPOS can connect to any vendor’s Teams Rooms.

  5. Flexible work is here to stay: Microsoft 365 solutions for the hybrid work world | Microsoft 365 Blog

  6. I will always bring the donuts.

  7. Announcing general availability for Intelligent speakers for Microsoft Teams Rooms – Microsoft Community Hub

  8. Announcing general availability for Intelligent speakers for Microsoft Teams Rooms – Microsoft Community Hub

  9. The Yealink MVC S60 should be certified soon.

  10. Yealink MSpeech can only be used with Yealink Teams Rooms installations.

  11. The Lenovo ThinkSmart Bar 180 is manufactured by Jabra. As such, this document completely applies to the Lenovo ThinkSmart Bar 180 as well.

  12. Regulated Persons Definition | Law Insider

  13. Just to clear up any future confusion: I will always buy the donuts.

  14. Plus, it makes this whitepaper longer which adds to its legitimacy.

  15. It’s the second one. The Jabra one. That’s the best one! If you picked that one, go get yourself a well-deserved donut.

  16. As mentioned in the preamble, this will change at some point in 2024 where you can also use a BYOD connected laptop or desktop.

  17. I don’t have a reference for this. But I worked at Microsoft. You gonna question me on this one???

  18. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  19. Source: Conversation between Greg Baribault, Microsoft, and the author at Teams Rooms World, 25 October, 2023.

  20. Microsoft Teams Rooms licenses – Microsoft Teams | Microsoft Learn

  21. What is a locale? Locale – Globalization | Microsoft Learn

  22. Wait until we get into facial recognition! Speaking of which – Intelligent Speaker is *not* a technical requirement for facial recognition to work. Microsoft want to recognize people in the room when they are not an active speaker. As of today, the Teams client requires voice before facial biometrics can be recorded but I get the impression that will change in time. Source: Conversation between Greg Baribault, Microsoft, and the author at Teams Rooms World, 25 October, 2023

  23. You can download Jabra Direct from here – Jabra Direct – Engineered to optimize and personalize your headset

  24. Use Microsoft Teams administrator roles to manage Teams – Microsoft Teams | Microsoft Learn

  25. Use Microsoft Teams administrator roles to manage Teams – Microsoft Teams | Microsoft Learn

  26. Create resource accounts for rooms and shared Teams devices – Microsoft Teams | Microsoft Learn

  27. Set-CsTeamsMeetingPolicy (SkypeForBusiness) | Microsoft Learn

  28. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  29. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  30. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  31. What is a “cmdlet”? Cmdlet Overview – PowerShell | Microsoft Learn

  32. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  33. Announcing general availability for Intelligent speakers for Microsoft Teams Rooms – Microsoft Community Hub

  34. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  35. I have no idea why this is possible, as biometric data should be secured and you should only be allowed to delete it. But that’s just my opinion.

  36. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  37. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn

  38. Data retention, deletion, and destruction in Microsoft 365 – Microsoft Service Assurance | Microsoft Learn

  39. Use Microsoft Teams Intelligent Speakers to identify in-room participants in a meeting transcription – Microsoft Support

  40. Use Microsoft Teams Intelligent Speakers to identify in-room participants in a meeting transcription – Microsoft Support

  41. Use Microsoft Teams Intelligent Speakers to identify in-room participants in a meeting transcription – Microsoft Support

  42. Clarified by Ilya Bukshteyn, 18 April 2024, in a chat message during the Teams Fireside Chat meeting.
  43. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn
  1. Tenant Administration control for voice recognition (voice profile) in Teams Rooms – Microsoft Teams | Microsoft Learn
  2. Get the most out of any Teams Rooms meeting with speaker recognition and Copilot – Microsoft Community Hub
  3. https://youtu.be/r4OxLKQ8pC4?si=UM8x0y9uYGbJtMQi&t=150
  4. This may no longer be relevant now that any microphone can do speaker recognition
  5. Overview of voice and face enrollment – Microsoft Teams | Microsoft Learn
  6. https://learn.microsoft.com/microsoftteams/rooms/voice-and-face-recognition#frequently-asked-questions

Enabling and Validating QoS on Teams Rooms on Android

While delivering some Teams Rooms on Android training, I was asked a question about Quality of Service (QoS) on Microsoft Teams Android devices. I knew the answer conceptually, but not practically. In other words, I wanted to see it in action. 

In this article, I will show how to set up QoS on Teams Android devices and then how to validate that DSCP tags are being applied.

There are a few ways to implement QoS for Teams: via Group Policy, via networking equipment, and via Meeting settings in Teams.

This article isn’t going to go through the pros and cons of each method. I’m just going with the last option – enabling QoS via Meeting Settings in Teams.

To do this, open up Teams admin center and expand the Meetings section. Next, click on Meeting settings.

Meeting settings in Teams admin center

From here, scroll to the Network section. Right at the top of this section is an option named Insert Quality of Service (QoS) markers for real-time media traffic. Flip this switch from Off to On.

Enabling QoS in Teams

Note: This will enable QoS on a lot of things beyond Teams Android devices – like all your Android mobile phones. There *shouldn’t* be a problem here, but be sure you are working on this with your networking staff because there is a lot more to QoS than flipping this switch. I’m also leaving all the media ports at their defaults, though your network team may need you to change these values.

Once you have enabled QoS, wait. At some point, like magic, Teams network traffic will start getting tagged with the appropriate DSCP markers.

So how do we know it’s working?

Well, in my world, you look at a Teams network packet and see if it is tagged. It is at this point in the article where I can only provide some high-level guidance, as every network is different and where you go to “listen” to network traffic differs wildly. If you are in any moderately sized IT department, there will be someone who can do this for you or work with you to sniff the network traffic.

Yup, if this is all new to you, the word “sniffing” is part of the parlance.

In my simple home network, this is how I did it. 

I happen to have a firewall from Ubiquiti that has a packet sniffer built in. It’s called tcpdump, which is a commonly used tool to sniff traffic. I won’t explain other ways to do this, but mirroring a network port or using a dumb hub works too.

Now the most direct way to do this is to remote to the firewall using secure shell (ssh), run tcpdump, capture the traffic, copy it to my PC, and finally look at the file with Wireshark.

And that way definitely would work. But it’s a lot of signing in and copying files and manually opening files, and that was just a bit much for me. Doing a little Bing-fu, I came across a post on the Ubiquiti support forums that allowed me to stream the tcpdump captures directly into Wireshark on my Windows PC.

This command may or may not work for you if you want to do similar packet sniffing. 

plink -batch -l <firewall username> -pw <firewall password> <firewall IP address> sudo /usr/sbin/tcpdump -i <firewall interface, e.g. eth1> -w - host <IP address of Android device> | "c:\program files\WireShark\Wireshark.exe" -k -i -

First up: what is a “plink”? It was new to me too. Plink stands for “PuTTY Link”, which is the command-line interface for PuTTY. It basically signs into my firewall, starts tcpdump, and then redirects the output from tcpdump to a data stream that gets piped to Wireshark, where the real magic happens.

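The loose ‘-’ characters in that command line are required. The one after -w tells tcpdump to write the capture to standard output instead of a file, and the one after -i tells Wireshark to read the capture from standard input.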

Note: Before the first capture, run this command once, without -batch, so you can accept and cache the firewall’s SSH host key:
plink -l <firewall username> -pw <firewall password> <firewall IP address>

It’s also a useful test to make sure you can get connected.

Within Wireshark, I used a simple display filter to filter on rtcp packets.

Apply rtcp display filter

The first few rtcp packets from your Android bar may not show the QoS information, but if you scroll down a bit, you will start seeing tagged packets. Also be sure that you are clicking on entries where the source is your Android device.

What are you looking for? How do you know QoS was tagged? Good question!

Here is a packet that shows QoS was applied. I’ll break this down to help you understand what you are seeing.

Full display of a captured rtcp packet.

Hopefully you figured out the top half shows all of the packets, and you can apply a display filter. After clicking on a packet, the bottom half of the screen shows the details. I’ve expanded the two most relevant sections.

Before we get any further, let me paste in the default settings that Teams recommends.

Recommended Port Ranges and DSCP Values
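If you prefer a lookup you can copy and paste, here is a minimal Python sketch of that table. The video row matches the values used later in this article (source ports 50020 – 50039, AF41). The audio and application/screen sharing rows are my assumption of Microsoft's published defaults, so verify them against your own Meeting settings before relying on them. The names Marking, RECOMMENDED and expected_marking are made up for this example.

```python
from typing import NamedTuple, Optional


class Marking(NamedTuple):
    media: str
    first_port: int   # first client source port in the range
    last_port: int    # last client source port in the range (inclusive)
    dscp: int         # decimal DSCP value
    dscp_class: str   # the name Wireshark shows for that value


# Video row: taken from this article.
# Audio and sharing rows: assumed from Microsoft's documented defaults.
RECOMMENDED = (
    Marking("audio", 50000, 50019, 46, "EF"),
    Marking("video", 50020, 50039, 34, "AF41"),
    Marking("application/screen sharing", 50040, 50059, 18, "AF21"),
)


def expected_marking(source_port: int) -> Optional[Marking]:
    """Return the table row whose port range contains source_port, if any."""
    for row in RECOMMENDED:
        if row.first_port <= source_port <= row.last_port:
            return row
    return None


if __name__ == "__main__":
    print(expected_marking(50031))
```

Feed it the source port of any packet in your capture and compare the returned DSCP class with what the packet actually carries. If your network team changed the media port ranges, change the rows to match.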

Now let’s break down the packet capture and see what we have. Starting from the top of the Internet Protocol (IP) section, we can see there are DSCP values. 

For DSCP, we see something called “AF41”. If we look at the table above, we see that AF41 is applied to video. Cool. Can we further verify that this is a video stream?

Yes, yes we can.

 Look down into the User Datagram Protocol (UDP) section and look at the Source Port value.

This tells us that the rtcp packet was sent from the Android device on UDP port 50031. Look up at the chart above, and you can see that media sourced from port 50031 is indeed video traffic as the video source port range is 50020 – 50039.

We now know that QoS markers are being applied to our packets and that they are being applied correctly.
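For the curious, the two fields checked above can also be pulled out of the raw bytes. This is only a sketch, not how Wireshark does it: it assumes an IPv4 packet that starts at the IP header and carries UDP, and the sample packet (including both IP addresses and the destination port) is fabricated for illustration. The DSCP value is the upper six bits of the second byte of the IP header, and an AFxy class name corresponds to the decimal value 8x + 2y, which is why AF41 is 34.

```python
import struct


def read_dscp_and_source_port(ip_packet: bytes):
    """Return (dscp, ecn, udp_source_port) for an IPv4/UDP packet."""
    if ip_packet[0] >> 4 != 4:
        raise ValueError("only IPv4 is handled in this sketch")
    header_len = (ip_packet[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    if ip_packet[9] != 17:                   # protocol 17 = UDP
        raise ValueError("not a UDP packet")
    ds_field = ip_packet[1]                  # Differentiated Services byte
    dscp = ds_field >> 2                     # upper 6 bits
    ecn = ds_field & 0x03                    # lower 2 bits
    (source_port,) = struct.unpack_from("!H", ip_packet, header_len)
    return dscp, ecn, source_port


def af_name(dscp: int):
    """Return 'AFxy' for an Assured Forwarding DSCP value, else None."""
    af_class, rest = divmod(dscp, 8)
    if 1 <= af_class <= 4 and rest in (2, 4, 6):
        return f"AF{af_class}{rest // 2}"
    return None


# Fabricated sample: 20-byte IPv4 header with DS byte 0x88, then a UDP header
# with source port 50031. Addresses and destination port are hypothetical.
SAMPLE = struct.pack(
    "!BBHHHBBH4s4sHHHH",
    0x45, 0x88, 28, 0, 0, 64, 17, 0,
    bytes([192, 168, 1, 50]), bytes([203, 0, 113, 10]),
    50031, 3480, 8, 0,
)

if __name__ == "__main__":
    dscp, ecn, port = read_dscp_and_source_port(SAMPLE)
    print(dscp, af_name(dscp), port)
```

A DS byte of 0x88 decodes to DSCP 34, which is AF41, and the source port lands in the video range of 50020 – 50039, the same two facts read off the capture above.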

Enterprise Voice in Skype for Business Server 2015 book. Free!

Early image of the book cover.

Early cover. Note slightly different book name.

In 2016, I released the book Enterprise Voice in Skype for Business Server 2015. It is the definitive tome on connecting Skype for Business with the phone system. As time has passed, purchases of the book have basically fallen off a cliff.

I’m now giving the book away for free. Why would you want to download this?

1.)  It’s free so duh

2.) Much of what is in here is what was ported to Teams. If you are unsure of calling plans and other things like that, this book will help explain them.

3.) The appendix on Regular Expressions is probably the most useful chapter nowadays. The chapter gives plenty of examples for successfully creating Regular Expressions specific to calling.

If you want to read the back story of why and how I created the book, you can read that here

Download here