IntelliFrame is a term Microsoft uses to define different ways to show in-room attendees. There are three different IntelliFrame types. In this article, I explain what all three types are, how to configure them, and how to manage them.
About the Author
Michael Tressler, aka flinchbot, is a Senior Solutions Consultant at Jabra. He has spent the past 5 years focusing on Microsoft Teams devices, specifically Microsoft Teams Rooms.
He was the 1987 and 1988 American Association of Students of German Mühle champion. As such, do not take him lightly when playing the game the English call Nine Men’s Morris.
Microsoft announced IntelliFrame at Ignite in 2022. On October 12th of that same year, the “What’s new for Microsoft Teams Rooms, Teams devices, and intelligent cameras at Ignite 2022”1 blog article on TechCommunity appeared. Microsoft described IntelliFrame thusly: “IntelliFrame enhances the focus and framing of in-room meeting attendees so that all meeting participants – including those in the room – have their own, individual frame in the video gallery.”
Note: Currently, Microsoft Teams Rooms on Windows supports Edge, Cloud, and Multi-stream IntelliFrame. Teams Rooms on Android only supports Edge IntelliFrame.
This image – stolen from the above referenced blog article – shows what this was going to look like. And it actually does look like this now that it is rolled out.
Focus on the yellow box. In there you see four faces. These are the four most recent speakers. Below them is a view of the full conference room. This whole section combined is the client-side view of Multi-stream IntelliFrame.
If you look at the four squares, you will see that there is a name in each square. This is the name of each of those people. This is magic and sorcery. How does Teams know the name of these people if they just walk into a conference room? Look at the attendee list. You will see their names grouped under the name of the conference room (Conf Room Contoso Square 14).
Magically adding names to faces is the people recognition feature. It was also briefly mentioned in the Ignite announcement. 1
This article will only describe IntelliFrame in detail. By the end of reading this, you will be bored. But with a little luck, you may know what you are talking about if someone asks you about this stuff. Maybe. I’d grab a Red Bull now.
There will be a separate overly long post on people recognition. You have something to look forward to!
Also, unlike people recognition (and voice recognition/intelligent speaker), IntelliFrame can work on Meet Now/Ad-hoc meetings. The meeting does not need to be scheduled in advance.
However – just to be clear – if you want to use people or voice recognition with IntelliFrame, it must be a scheduled meeting. I’m just saying the IntelliFrame feature itself does not rely on a scheduled meeting.
What is IntelliFrame
According to Microsoft2, there are three versions of IntelliFrame.
1.) Edge IntelliFrame
2.) Cloud IntelliFrame
3.) Multi-Stream IntelliFrame
I will now write too many words on each of these three features.
Edge IntelliFrame
Smart Cameras have the ability to do some level of face/torso detection via the magic of artificial intelligence and zoom in or crop to those faces/torsos. If those cameras also have beam forming microphones, you can then detect from where the noise (i.e., talking) is coming. If there is a head or torso in the same place as the audible noise, the intelligent cameras presume that is a person talking.
For individual speaker tracking modes on these intelligent cameras, this is all that happens. A single person is tracked while they speak. When someone else starts talking, the camera processes a new voice direction, face/torso detection happens again, and the camera frames the new speaker. This is called active speaker tracking or speaker framing.
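To make the idea concrete, here is a minimal Python sketch of matching a sound direction to a detected person. It is only an illustration of the concept described above, not any vendor's actual algorithm. The `Detection` type, the angle values, and the 10 degree tolerance are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    label: str        # hypothetical identifier for a detected face/torso
    angle_deg: float  # horizontal position in the camera's field of view


def pick_active_speaker(
    detections: List[Detection],
    sound_angle_deg: float,
    tolerance_deg: float = 10.0,  # assumed tolerance, not a vendor value
) -> Optional[Detection]:
    """Return the detection closest to the sound direction, if any is close enough."""
    candidates = [
        d for d in detections
        if abs(d.angle_deg - sound_angle_deg) <= tolerance_deg
    ]
    if not candidates:
        # Noise with no head or torso nearby: nobody is presumed to be talking.
        return None
    return min(candidates, key=lambda d: abs(d.angle_deg - sound_angle_deg))


people = [Detection("A", -30.0), Detection("B", 5.0), Detection("C", 40.0)]
speaker = pick_active_speaker(people, sound_angle_deg=8.0)
nobody = pick_active_speaker(people, sound_angle_deg=90.0)
```

In this example the sound at 8 degrees is matched to person B, while the sound at 90 degrees matches nobody, so the camera would have no one to frame.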
Now, wouldn’t it be cool if a camera could actively track more than one person? This is Edge IntelliFrame. Edge IntelliFrame is a feature where the camera can track multiple people at the same time, all done solely by the camera in the meeting space. The camera shows all the people talking. Or the four most recent talkers. Or if no one has spoken yet, four random people. Or 15 people all in tiny squares. It all depends on the camera and its capabilities.
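As a rough illustration of the "four most recent talkers" behavior, here is a small Python sketch. It is a guess at the general idea, not how any particular camera works. The function names and the four-tile limit are assumptions, and where the text says "four random people" the sketch simply fills the remaining tiles in detection order so the result is predictable.

```python
from typing import List


def note_speaker(recent_speakers: List[str], speaker: str) -> List[str]:
    """Move the newest speaker to the front of the most-recent-speakers list."""
    return [speaker] + [p for p in recent_speakers if p != speaker]


def frames_to_show(
    recent_speakers: List[str],
    people_in_room: List[str],
    max_tiles: int = 4,  # assumed limit; real cameras differ
) -> List[str]:
    """Pick who gets a tile: recent speakers first, then anyone else detected."""
    tiles = [p for p in recent_speakers if p in people_in_room][:max_tiles]
    for person in people_in_room:
        if len(tiles) >= max_tiles:
            break
        if person not in tiles:
            tiles.append(person)
    return tiles


room = ["Ann", "Bo", "Cy", "Di", "Ed"]
before_anyone_speaks = frames_to_show([], room)
recent = note_speaker([], "Ed")
after_ed_speaks = frames_to_show(recent, room)
```

Before anyone speaks, the first four detected people get tiles. Once Ed talks, Ed takes the first tile and the other three tiles are filled from the rest of the room.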
Here is an image of Jabra’s Dynamic Composition view (Jabra’s name for Edge IntelliFrame). Notice the 2×2 grid in the purple square. The PanaCast 50 has recognized four torsos and has zoomed/cropped into them. The Jabra PanaCast 50 only sends one video stream to Microsoft, a video stream consisting of four faces.
Every camera vendor calls this same feature something different. Here is a very incomplete table of the vendor and what they call the feature.
Vendor | Feature name |
Logitech | Grid View |
Jabra | Dynamic Composition |
Neat | Symmetry |
There are a lot more vendors that do this, but I got bored searching for more. Leave a comment on ones I’m missing and maybe I’ll be motivated one day to update this table.
The point is, Edge IntelliFrame is *only* done on the edge. Microsoft has nothing to do with it. They get a video feed from the camera in the conference room, and they present that feed to remote attendees.
And until recently, Microsoft very rarely mentioned Edge IntelliFrame. It’s only with Microsoft 365 Roadmap item 409537 3 that this term has popped up again. Microsoft is adding a feature to Teams Rooms on Windows allowing you to tell the camera to switch modes. From the roadmap item: “Capabilities covered with this feature includes group framing, active speaker framing, and edge composed IntelliFrame.”
Note that this feature is enabled in the conference room. Remote attendees cannot interact with this view (e.g., people attending outside the conference room cannot turn off this view or customize it in any way). The names of attendees inside the conference room are not shown (no people recognition).
For more information on Edge IntelliFrame, contact your camera vendor as Microsoft has very little to do with this.
Cloud IntelliFrame
If Edge IntelliFrame is fully provided by the camera, Cloud IntelliFrame is the complete opposite. With Cloud IntelliFrame, the camera can be a “non-intelligent camera”, and you can still get face and torso recognition. This is because the camera sends a feed to Microsoft, and then Microsoft throws AI at the video feed. If Microsoft detects some faces, it will pop them into their own little box.
Note: This is now the default view when using a supported camera. “All Microsoft Teams Rooms on Windows with a Pro license equipped with cameras (specified in Supported cameras) automatically opt-in to Cloud IntelliFrame.”4
In the image above, facial recognition is done by Microsoft in the cloud. Hence the name Cloud IntelliFrame.
There is a limit of nine participants in a room for Cloud IntelliFrame to work. If there are 10+ users in the room, the standard room view will be used.5
Microsoft has done a nice job explaining how this is configured and how an Admin can turn this off.6
Users inside the meeting room can disable Cloud IntelliFrame by finding this setting on the Teams Rooms console.
Remote attendees can enable or disable this feature on a per-attendee basis. They can right click on the IntelliFrame frame and select Turn off IntelliFrame.
When Cloud IntelliFrame is disabled, the user is shown the full room view from the camera (or whatever view the camera is set to).
To re-enable Cloud IntelliFrame, click in the video frame of the conference room and select Turn on IntelliFrame.
Note: If you click the Spotlight for everyone option, it will spotlight the entire Multi-Stream IntelliFrame view, not the view of one specific video stream.
In-Room Self View
What do the users inside the conference room see with Cloud IntelliFrame? They see the full view of the room. They do not see the zoomed in “IntelliFrame view” that remote attendees see.
Disabling Cloud IntelliFrame
If you don’t want to use Cloud IntelliFrame at all, you can disable it.
The easiest way is to navigate to the settings on Teams Rooms and disable this setting.
You can also disable Cloud IntelliFrame by using a skypesettings.xml file7.
Here is the minimum you need to add to a skypesettings.xml file:
To re-enable Cloud IntelliFrame, change false to true.
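If you would rather generate the file with a script, here is a minimal Python sketch that writes a one-setting skypesettings.xml body. Treat it as an illustration only. The root element `SkypeSettings` is what these files normally use, but the element name for the Cloud IntelliFrame setting below is a placeholder I made up for the example. Look up the real element name in the Microsoft Learn article on remotely managing Teams Rooms settings before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_settings_xml(element_name: str, enabled: bool) -> str:
    """Build a minimal settings document with a single true/false element."""
    root = ET.Element("SkypeSettings")
    child = ET.SubElement(root, element_name)
    child.text = "true" if enabled else "false"
    return ET.tostring(root, encoding="unicode")


# PLACEHOLDER: not the real setting name. Replace it with the element
# documented by Microsoft for Cloud IntelliFrame.
PLACEHOLDER_ELEMENT = "CloudIntelliFrameSettingName"

xml_text = build_settings_xml(PLACEHOLDER_ELEMENT, enabled=False)
print(xml_text)
```

Passing `enabled=True` produces the same document with true in place of false, which matches the re-enable step described above.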
Cloud IntelliFrame requires a Teams Rooms Pro license5 and a supported camera8.
Cloud IntelliFrame is currently only supported on Microsoft Teams Rooms on Windows.
As of 5 August 2024, not every Teams client can see a Cloud IntelliFrame feed5. For Teams clients that don’t support rendering the Cloud IntelliFrame feed, they will see the full default view from the camera (often, a view of the whole conference room).
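The rules for when a remote attendee actually gets the Cloud IntelliFrame view are scattered across this section, so here they are pulled together as a small Python sketch. This is my reading of the rules described above, not Microsoft's logic. The function and parameter names are invented for the example.

```python
MAX_PEOPLE_FOR_CLOUD_INTELLIFRAME = 9  # from the nine-participant limit above


def view_for_remote_attendee(
    people_in_room: int,
    has_pro_license: bool,
    camera_supported: bool,
    room_setting_enabled: bool,
    client_can_render: bool,
    attendee_turned_off: bool = False,
) -> str:
    """Return which view a remote attendee would see, per the rules in this article."""
    if not (has_pro_license and camera_supported and room_setting_enabled):
        return "camera view"  # Cloud IntelliFrame is not in play at all
    if attendee_turned_off or not client_can_render:
        return "camera view"  # this attendee opted out or cannot render it
    if people_in_room > MAX_PEOPLE_FOR_CLOUD_INTELLIFRAME:
        return "camera view"  # 10+ people falls back to the standard room view
    return "Cloud IntelliFrame view"


small_room = view_for_remote_attendee(4, True, True, True, True)
crowded_room = view_for_remote_attendee(10, True, True, True, True)
```

Here "camera view" means the full room view or whatever view the camera is set to.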
Like Edge IntelliFrame, names of attendees in the conference room are not shown (people recognition is unsupported).
Note: As Microsoft is doing this in the cloud, they are zooming and cropping on an already compressed video stream (even if it is 1080P). As such, the video quality of the zoomed-in faces may be lower than with Edge IntelliFrame or Multi-stream IntelliFrame.
“That said, Cloud IntelliFrame experiences can sometimes result in lower video resolution than multi-stream cameras…”9 Arash Ghanaie-Sichanie – Senior Director, Teams AI Experiences, Microsoft
Multi-Stream IntelliFrame
With Multi-Stream IntelliFrame, the in-room camera and Microsoft both do some of the work.
There is a lot of intelligence required by the in-room camera, which is why (as of this writing on 2 August, 2024), there are only three cameras certified for Multi-stream IntelliFrame:
1.) The Yealink SmartVision 60 10
2.) The Jabra PanaCast 50 (Go Jabra!) 10
3.) Logitech Rally Bar or Rally Bar Mini running CollabOS 1.14.155
Over time, more cameras will probably be certified.
With regards to Multi-stream IntelliFrame, what makes these cameras special?
Well, for one, they have to do some edge processing. They are basically doing Edge IntelliFrame in that they both track the active speaker and most recent speaker(s). (Jabra tracks the current and most recent speaker while Yealink can track up to four speakers).
Once the speakers are tracked, zoomed, and cropped, the cameras then send individual video streams of each speaker and a panoramic view of the conference room to Microsoft. This is key. In both of the other IntelliFrame types, only one video stream is sent. In Multi-stream IntelliFrame, multiple video streams are sent. Microsoft then catches those feeds and lays them out optimally on your Teams client. This gives Microsoft the flexibility to show the people video feeds next to each other (portrait view) and the panorama underneath, or the people video feeds on top of each other (landscape view) with the panorama underneath. Anyway, now you know why it’s called Multi-stream IntelliFrame.
Here is a picture of Multi-stream IntelliFrame running on a Jabra PanaCast 50. Note that the two active speakers are also visible in the panorama view of the conference room. (The yellow box does not appear when you use this. I added the yellow box to highlight the Multi-stream IntelliFrame view).
You also see the names of those two people. You have never seen Ben Clarke or Alice Kelly in your life before, but now you know their names. That’s the people recognition feature which is only available with Multi-stream IntelliFrame. People recognition will be covered in a different post.
Multi-stream IntelliFrame is only enabled by an Administrator of the camera. It cannot be enabled/disabled by users in the conference room or by remote attendees using the Teams client.
End users can disable the panorama view on the bottom. And they can bring it back if they so choose. But unlike Cloud IntelliFrame, they cannot disable Multi-stream IntelliFrame itself.
Enabling Multi-stream IntelliFrame
I do not have a Yealink SmartVision 60. (Hey Yealink, send me one! You have my address on file already! 😁 ). But I do have a Jabra PanaCast 50 so I will be using that to show how to set up Multi-stream IntelliFrame (henceforth, MS IF).
From the Microsoft Learn article11, these are the pre-requisites for MS IF.
- Microsoft Teams Rooms on Windows
- Microsoft Teams Rooms Pro license
  - Microsoft Teams Rooms with Pro license is required to enable IntelliFrame and people recognition features on Microsoft Teams Rooms.
  - Basic license doesn’t support IntelliFrame or people recognition. If you have Teams Rooms Basic license, the camera shows only active speaker and panoramic views.
- A supported camera
- Bandwidth
  - All of the MS IF feeds are a maximum of 720P
  - If a single 720P video stream takes 1.2 Mbps, then you will need up to 3.6 Mbps with a Jabra PanaCast 50 (3 x 1.2 Mbps) or up to 6 Mbps with the Yealink SmartVision 60 (5 x 1.2 Mbps).
  - You will need more bandwidth than a traditional single-stream video feed uses
- A supported Teams client to ingest these multiple video streams
- A USB3 connection is required. Be sure you are using the correct cable and that it is plugged into a USB3 port on the compute module.
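The bandwidth arithmetic above is easy to redo for other stream counts. Here is a tiny Python sketch of it. The 1.2 Mbps per 720P stream figure is the rough number used in this article, not a guaranteed value, and the function name is just something I picked for the example.

```python
STREAM_KBPS_720P = 1200  # the article's rough figure of 1.2 Mbps per 720P stream


def max_uplink_mbps(stream_count: int, per_stream_kbps: int = STREAM_KBPS_720P) -> float:
    """Worst-case uplink in Mbps when every stream runs at the per-stream rate."""
    return stream_count * per_stream_kbps / 1000


# Two speaker streams plus the panorama (Jabra PanaCast 50)
panacast_50_mbps = max_uplink_mbps(3)

# Four speaker streams plus the panorama (Yealink SmartVision 60)
smartvision_60_mbps = max_uplink_mbps(5)
```

The calculation is done in whole kbps and only converted to Mbps at the end, which avoids floating point noise in the result.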
As of 3 August 2024, not every Teams client can render an MS IF feed. The table below lists many of the Teams clients and if they do or do not show the MS IF view.
Client | Can show MS IF? |
Windows | Yes |
MacOS | No |
Web (Edge) | No |
Android (Mobile) | No |
iOS | No |
Teams Rooms (Windows) | Yes |
Teams Rooms (Android) | No |
On unsupported clients, you only see the active speaker in a video tile. In other words, you only receive one video stream, and that is the stream of the active speaker.
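If you need this in a script (say, to warn users before a meeting), the table above translates directly into a lookup. This Python sketch just encodes the table as it stood on 3 August 2024. The dictionary keys and function names are my own, and the data will go stale as Microsoft adds client support.

```python
# Snapshot of the table above (3 August 2024); expect this to change over time.
CAN_SHOW_MS_IF = {
    "Windows": True,
    "MacOS": False,
    "Web (Edge)": False,
    "Android (Mobile)": False,
    "iOS": False,
    "Teams Rooms (Windows)": True,
    "Teams Rooms (Android)": False,
}


def can_show_ms_if(client: str) -> bool:
    """Look up a client in the table; unknown clients raise KeyError rather than guess."""
    return CAN_SHOW_MS_IF[client]


def view_for_client(client: str) -> str:
    """Describe what a given client shows, per the table and the note below it."""
    if can_show_ms_if(client):
        return "Multi-stream IntelliFrame view"
    return "active speaker only"


windows_view = view_for_client("Windows")
ios_view = view_for_client("iOS")
```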
Configuring Jabra PanaCast 50
I’m going to assume you have your PanaCast 50 unboxed, powered up, and connected to your PC or laptop. The first thing to do is to download Jabra Direct12 and make sure you are on a supported firmware.
The minimum required firmware for MS IF is 8.0.7. If you are not on this firmware, Jabra Direct will prompt you to upgrade to the latest available firmware. Follow the prompts and in about 10 minutes, you will be updated.
If you would prefer to watch a video on how to set this up, along with a demo of Multi-stream IntelliFrame on a PanaCast 50, my coworker Eric Taylor made just that video.
Within Jabra Direct, click on the Device section and then click on your PanaCast 50.
Next, click on the Settings option in the lower right.
Next, click on Camera to get to camera-specific settings.
Now, select Multi-stream for Microsoft Teams Rooms from the Dynamic Composition dropdown.
After clicking Save, you will be given a notice that a reboot is required. Click Save and the PanaCast 50 will reboot (assuming it is not already in a call).
Configuring Teams Rooms on Windows
Once you make this change on the camera, you need to go into the Teams Rooms settings and verify the camera is still the default video device.
On the Teams Rooms Console, tap on More, then Settings, and sign in.
From here tap on Peripherals. In the Cameras section, you will now see three camera options. It doesn’t matter which one you select, just make sure you see these three choices. If not, revisit the steps above to set the PanaCast 50 to Multi-stream for Microsoft Teams Rooms mode.
Click Save and exit and you are ready to go.
Client options
What capabilities does the client have to enable/disable Multi-stream IntelliFrame or to manipulate the view? Well, don’t get too excited. There is only one option and that is to enable or disable the panorama view. That’s it. Unlike Cloud IntelliFrame, you cannot enable or disable MS IF entirely.
Show/Hide Panorama
Below is an image of me doing some testing and smiling very broadly. You see the full MS IF view.
If I right click on any of the MS IF streams, a menu pops up. For the purposes of this article, the only interesting option is Hide panorama.
After you click to hide the panorama, you get a larger view of the current and most recent speaker(s).
You can right click again and the option to Show panorama appears.
Only One Person
What happens if only one person is in the meeting room when the meeting starts? What view do you see there?
You see the stream of the active speaker, and that is all. You do not see a panorama below the stream. Put another way, Multi-stream IntelliFrame is only enabled when two or more people are in the meeting room.
If the meeting starts with one person, and a second person walks in later, MS IF will get enabled. There do not need to be two people in the meeting room at the start of the meeting.
What happens if everyone leaves the meeting room except for one person? MS IF is not disabled. Instead, you see a full view of the active speaker and the room’s panorama view. In other words, two streams are sent instead of three.
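The one-person behavior is a little stateful, so here is a Python sketch of it as I observed it. This models only what is described above for a camera that sends up to two speaker streams plus the panorama (like the PanaCast 50). The class name is made up, and the empty-room case is not covered in this article, so the sketch refuses to guess at it.

```python
class MultiStreamState:
    """Tracks whether MS IF has kicked in during a meeting, per the behavior above."""

    def __init__(self, max_speaker_streams: int = 2):
        self.max_speaker_streams = max_speaker_streams  # assumed camera limit
        self.enabled = False

    def streams_sent(self, people_in_room: int) -> int:
        """Update state for the current head count and return how many streams are sent."""
        if people_in_room < 1:
            raise ValueError("the empty-room case is not described in the article")
        if people_in_room >= 2:
            self.enabled = True  # once enabled, it stays enabled for the meeting
        if not self.enabled:
            return 1  # active speaker only, no panorama
        speaker_streams = min(people_in_room, self.max_speaker_streams)
        return speaker_streams + 1  # speaker stream(s) plus the panorama


meeting = MultiStreamState()
at_start = meeting.streams_sent(1)           # one person when the meeting starts
second_joins = meeting.streams_sent(2)       # a second person walks in
back_to_one = meeting.streams_sent(1)        # everyone but one person leaves
```

The interesting part is the last call: the head count is back to one, but because MS IF was already enabled, the panorama is still sent alongside the active speaker.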
Changing views
Multi-stream IntelliFrame works best in Gallery view. There are a lot of options and I’m not going to go through every possible view here and explain what is being shown. But in general, if you change the view or the view changes for some reason (someone is sharing a document, for example), the active speaker stream is used.
Panorama view
As mentioned above, users can choose to hide and show the panorama view. There is one thing that you might notice in the panorama view that is interesting: The whole panorama is shown.
Many cameras have the ability to zoom in and show only the part of the room that has people in it. In the world of Jabra, this is called Intelligent Zoom. With Intelligent Zoom, the Jabra PanaCast 50 finds the leftmost person and the rightmost person and zooms the camera in to show them and what is between them.
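The leftmost/rightmost idea is simple enough to sketch in a few lines of Python. To be clear, this is not Jabra's implementation. It is only an illustration of the concept, and the box format (left and right pixel edges), the margin, and the function name are all assumptions for the example.

```python
from typing import List, Tuple


def zoom_window(
    person_boxes: List[Tuple[int, int]],  # (left, right) pixel edges per person
    frame_width: int,
    margin: int = 0,  # assumed padding so nobody is cropped at the edge
) -> Tuple[int, int]:
    """Return the horizontal span covering the leftmost through rightmost person."""
    if not person_boxes:
        return (0, frame_width)  # nobody detected: keep the full view
    left = min(box[0] for box in person_boxes)
    right = max(box[1] for box in person_boxes)
    return (max(0, left - margin), min(frame_width, right + margin))


boxes = [(400, 600), (1200, 1500), (900, 1000)]
window = zoom_window(boxes, frame_width=3840, margin=100)
empty_room = zoom_window([], frame_width=3840)
```

With three people spread between pixel 400 and pixel 1500 and a 100 pixel margin, the window comes out as 300 to 1600 instead of the full 3840 pixel panorama.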
As best I understand it, Microsoft requires the panorama to show the entire view, not a zoomed in view.
In-Room Self View
What do the users inside the conference room see with Multi-stream IntelliFrame? They see the active speaker view. They do not see the multi-stream view that remote attendees see.
Hopefully this article helped clarify IntelliFrame in the world of Microsoft Teams Rooms. If you have any questions, please comment below and I’ll make up an answer for you.
- What’s new for Microsoft Teams Rooms, Teams devices, and intelligent cameras at Ignite 2022 – Microsoft Community Hub[↩][↩]
- I promise this is true, though I can’t find a link[↩]
- Microsoft 365 Roadmap | Microsoft 365[↩]
- Cloud IntelliFrame – Microsoft Teams | Microsoft Learn[↩]
Pay attention to the yellow box in the image below. This is the standard video feed from the camera to Microsoft. Nothing special here really, just a view of people in a conference room.
Now, look at the image below. It’s the same people in the same room, but this time, Cloud IntelliFrame has been enabled. We now get a zoomed-in, cropped view of their faces. This view can be provided by most Teams Rooms certified cameras. (Cloud IntelliFrame – Microsoft Teams | Microsoft Learn)
- Cloud IntelliFrame – Microsoft Teams | Microsoft Learn[↩][↩][↩]
- Cloud IntelliFrame – Microsoft Teams | Microsoft Learn[↩]
- Remotely manage Microsoft Teams Rooms device settings – Microsoft Teams | Microsoft Learn[↩]
- Cloud IntelliFrame – Microsoft Teams | Microsoft Learn[↩]
- A deep dive into intelligent cameras Multi-Stream and Cloud IntelliFrame for Teams Rooms – Microsoft Community Hub[↩]
- What is Microsoft Multi-Stream IntelliFrame and Intelligent Camera? – Microsoft Teams | Microsoft Learn[↩][↩]
- What is Microsoft Multi-Stream IntelliFrame and Intelligent Camera? – Microsoft Teams | Microsoft Learn[↩]
- https://www.jabra.com/software-and-services/jabra-direct[↩]
I just read this blog post start to finish, and I’m not bored. What’s wrong with me?
Huddly is doing an edge intelliframe for their IQ camera called “Gallery view”, I’m interested to see how their crew product works in the future, maybe as multi stream or as multi camera.
Logitech Rally bars that use rightsight2 have an edge intelliframe too, will be interesting to see if they release edge intelliframe with their “Sight” product.
If we just have cameras capable of Edge or Cloud IntelliFraming, will voice registration be enough to have names display for in-person participants?
Voice registration will only add the person’s name to the transcript. It will not show their name in the video frame. Two separate and independent technologies.
Obviously a face and/or voice profile is required in order to match a certain voice or voice stream and identify that individual. Do you have more knowledge or information on how exactly this works? I’d assume that either the camera or (probably) the Microsoft cloud must compose a biometric fingerprint of a user in each individual stream in order to compare it against stored voice/face profiles, as it can’t know in advance which of those users have actually enrolled for this. That could mean that a biometrical profile is (temporarily) created as part of the processing even though a user might not have consented to this, similar to how law enforcement might use smart cameras to detect known criminals.
Hi. I do not know the inner details. However I can walk through what I know. Users who choose to enroll, record their voice or have their picture taken (or both). This creates a biometric profile.
At some point, a meeting is created and those people are on the meeting invite (this is a requirement for either feature to work). They walk into a conference room and start speaking. What happens on the audio side is the audio stream is compared to the biometric audio profile of registered users. If there is a match, the name is attributed in the transcript. For a user with no biometric profile (they did not enroll their voice), there is no biometric check for their voice.
I think where you are heading is what if someone without a biometric voiceprint speaks? Well, there obviously won’t be a match as they have no biometric voice print. However, MS *does* keep track of unique voices, e.g. Speaker 1, Speaker 2, etc. So there must be some temporary voice print made in the background which is then thrown away when the meeting ends (I hope! :)).
For facial recognition, the user takes selfies in the Teams app to create their biometric face profile. When the meeting starts, that person walks into the conference room. The in-room camera takes a snapshot and sends it to Microsoft. MS then compares that snapshot with known-faces (biometric faceprint exists) and if there is a high probability match, their name shows up on screen. If there is no match, no name is shown and so I believe MS does *not* create a temporary face print for un-enrolled users.
This is just my best guess as I do not know the inner workings of the back end. Just thinking through what I know and how it might be done.