FFmpeg Screen Recording (21): Enumerating All Visible Windows on Linux with X11 and Getting Their Title, Icon, Thumbnail, Process Path, and More

Many hands make light work. How about a star on GitHub?
open-traa/traa
traa is a versatile project aimed at recording anything, anywhere. The primary focus is to provide robust solutions for various recording scenarios, making it a highly adaptable tool for multiple use cases.

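Enumerating windows and getting window information on Linux X11

On Linux, X11 is a very widely used window system, and it provides a rich API for managing and manipulating windows. This post walks through how to use X11 to enumerate the windows currently on the system and get each window's title, screenshot, process path, and icon.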

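Basics

Official docs (beginner): Displays and Screens
Official docs (beginner): Window System Objects
Official docs: API manual

Here are a few of the more important concepts, quoted from the official docs: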

X Is Client / Server
The X Window System was designed to allow multiple programs to share access to a common set of hardware. This hardware includes both input devices such as mice and keyboards, and output devices: video adapters and the monitors connected to them. A single process was designated to be the controller of the hardware, multiplexing access to the applications. This controller process is called the X server, as it provides the services of the hardware devices to the client applications. In essence, the service the Xserver provides is access, through the keyboard, mouse and display, to the X user.

Like many client/server systems, the X server typically provides its service to many simultaneous clients. The X server runs longer than most of the clients do, and listens for incoming connections from new clients.

Many users will only ever use X on a standalone laptop or desktop system. In this setting, the X clients run on the same computer as the X server. However, X defines a stream protocol for clients / server communication. This protocol can be exposed over a network to allow clients to connect to a server on a different machine. Unfortunately, in this model, the client/server labeling can be confusing. You may have an X server running on the laptop in front of you, displaying graphics generated by an X client running on a powerful machine in a remote machine room. For most other protocols, the laptop would be a client of file sharing, http or similar services on the powerful remote machine. In such cases, it is important to remind yourself that keyboards and mice connect to the X server. It is also the one endpoint to which all the clients (terminal windows, web browsers, document editors) connect.
Displays and Screens
X divides the resources of a machine into Displays and Screens. A Display is typically all the devices connected to a single X server, and displaying a single session for a single user. Systems may have multiple displays, such as multi-seat setups, or even multiple virtual terminals on a system console. Each display has a set of input devices, and one or more Screens associated with it. A screen is a subset of the display across which windows can be displayed or moved - but windows cannot span across multiple screens or move from one screen to another. Input devices can interact with windows on all screens of an X server, such as moving the mouse cursor from one screen to another. Originally each Screen was a single display adaptor with a single monitor attached, but modern technologies have allowed multiple devices to be combined into logical screens or a single device split.

When connecting a client to an X server, you must specify which display to connect to, either via the $DISPLAY environment variable or an application option such as -display or --display. The full DISPLAY syntax is documented in the X(7) man page, but a typical display syntax is: hostname:display.screen The "hostname" may be omitted for local connections, and ".screen" may also be left off to use the default screen, leaving the minimal display specification of :display, such as ":0" for the normal default X server on a machine.
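The hostname:display.screen form described above can be sketched as a small parser. This is only an illustration: DisplaySpec and parse_display are hypothetical names, not part of Xlib, and the sketch handles only the common form, not every variant documented in X(7).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper type: the three parts of "hostname:display.screen".
struct DisplaySpec {
    std::string host;  // empty for a local connection
    int display = 0;
    int screen = 0;    // stays 0 when ".screen" is omitted
};

// Parses only the common "hostname:display.screen" form.
// Returns false and leaves `out` untouched if the string does not match.
bool parse_display(const std::string &spec, DisplaySpec &out) {
    const std::size_t colon = spec.rfind(':');
    if (colon == std::string::npos || colon + 1 >= spec.size()) {
        return false;
    }
    DisplaySpec parsed;
    parsed.host = spec.substr(0, colon);
    const std::string rest = spec.substr(colon + 1);
    const std::size_t dot = rest.find('.');
    try {
        std::size_t used = 0;
        const std::string display_part = rest.substr(0, dot);
        parsed.display = std::stoi(display_part, &used);
        if (used != display_part.size()) return false;
        if (dot != std::string::npos) {
            const std::string screen_part = rest.substr(dot + 1);
            parsed.screen = std::stoi(screen_part, &used);
            if (used != screen_part.size()) return false;
        }
    } catch (const std::exception &) {
        return false;  // not a number, or out of range
    }
    out = parsed;
    return true;
}
```

In a real client you would normally pass the string straight to XOpenDisplay (or pass NULL to use $DISPLAY) and let Xlib do the parsing; the sketch is only meant to make the three fields concrete.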
Windows
In X, a window is simply a region of the screen into which drawing can occur. Windows are placed in a tree hierarchy, with the root window being a server created window that covers the entire screen surface and which lives for the life of the server. All other windows are children of either the root window or another window. The UI elements that most users think of as windows are just one level of the window hierarchy.

At each level of the hierarchy, windows have a stacking order, controlling which portions of windows can be seen when sibling windows overlap each other. Clients can register for Visibility notifications to get an event whenever a window becomes more or less visible than it previously was, which they may use to optimize to only draw the visible portions of the window.

Clients running in traditional X environments will also receive Expose events when a portion of their window is uncovered and needs to be drawn because the X server does not know what contents were there. When the composite extension is active, clients will normally not receive expose events since composite puts the contents of each window in a separate, non-overlapped offscreen buffer, and then combines the visible segments of each window onscreen for display. Since clients cannot control when they will be used in a composited vs. legacy environment, they must still be prepared to handle Expose events on windows when they occur.

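In short: a Display is like a session, a Screen is a display adapter within the current Display (session), and the root window is a rendering area that covers the whole screen and contains all the child windows. UI elements are just another kind of window. (This summary is imprecise; it is only a rough outline that helps with getting the list of all windows and screens. Read the official explanation and form your own understanding.)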

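Useful commands

Here is a command that lists information about every window, which is handy for comparing against your program's output: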

xwininfo -tree -root

or

xwininfo -tree -root | grep +66+32

There is also a ready-made screenshot tool:

xwd -id 0x1234 > screenshot.xwd
xwd -root > screenshot.xwd

And a command for viewing xwd files:

xwud -in screenshot.xwd

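Additional notes

Recent Ubuntu releases default to Wayland, so some X11 functionality may fail, screen capture in particular. As a Linux newcomer I struggled with this for a long time; after switching to an X11 session it worked on the first try. You can choose the session type on the login screen.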
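Before attempting a capture, a program can make a best-effort guess at which kind of session it is running in. The sketch below assumes the session exports the usual XDG_SESSION_TYPE, WAYLAND_DISPLAY, and DISPLAY environment variables; SessionType and detect_session_type are hypothetical names, and the check is a heuristic rather than a guarantee.

```cpp
#include <cstdlib>
#include <string>

enum class SessionType { X11, Wayland, Unknown };

// Best-effort guess based on environment variables only.
SessionType detect_session_type() {
    const char *type = std::getenv("XDG_SESSION_TYPE");
    if (type != nullptr) {
        const std::string value(type);
        if (value == "wayland") return SessionType::Wayland;
        if (value == "x11") return SessionType::X11;
    }
    // Check WAYLAND_DISPLAY first: a Wayland session running XWayland
    // usually sets DISPLAY as well.
    if (std::getenv("WAYLAND_DISPLAY") != nullptr) return SessionType::Wayland;
    if (std::getenv("DISPLAY") != nullptr) return SessionType::X11;
    return SessionType::Unknown;
}
```

If the result is SessionType::Wayland, an X11-based capture may only see XWayland clients or fail outright, so printing a warning there would save some of the debugging time described above.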

Environment setup

First, install the required development libraries:

sudo apt-get install libx11-dev libxcomposite-dev libxext-dev libxrandr-dev

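Basic steps for enumerating windows

Enumerating windows takes the following steps:

  1. Open a connection to the X11 display.
  2. Get the root window.
  3. Query the root window's children.
  4. Iterate over the children and collect the details of each window.

Implementation

The following complete example shows how to implement the steps above: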

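#include <X11/Xlib.h>
#include <X11/Xatom.h>
#include <X11/Xutil.h>  // XTextProperty, XGetWMName, XGetPixel, XDestroyImage
#include <X11/extensions/Xcomposite.h>
#include <X11/extensions/Xrandr.h>
#include <cstdint>  // uint8_t
#include <cstdio>   // snprintf
#include <iostream>
#include <vector>
#include <string>
#include <cstring>
#include <unistd.h>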

// Get the window title
bool get_window_title(Display *display, ::Window window, std::string &title) {
    XTextProperty window_name;
    if (XGetWMName(display, window, &window_name) && window_name.value) {
        title = reinterpret_cast<char*>(window_name.value);
        XFree(window_name.value);
        return true;
    }
    return false;
}

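// Get the ID of the process that owns the window
pid_t get_window_pid(Display *display, ::Window window) {
    Atom pid_atom = XInternAtom(display, "_NET_WM_PID", True);
    if (pid_atom == None) {
        return -1;
    }

    Atom actual_type;
    int actual_format;
    unsigned long nitems, bytes_after;
    unsigned char *prop = nullptr;

    if (XGetWindowProperty(display, window, pid_atom, 0, 1, False, XA_CARDINAL, &actual_type,
                           &actual_format, &nitems, &bytes_after, &prop) != Success) {
        return -1;
    }

    // The property may be missing or empty; free the buffer on every exit path.
    if (prop == nullptr || actual_format != 32 || nitems == 0) {
        if (prop) XFree(prop);
        return -1;
    }

    // Xlib returns format-32 items as an array of long, even on 64-bit systems.
    pid_t pid = static_cast<pid_t>(*reinterpret_cast<unsigned long *>(prop));
    XFree(prop);
    return pid;
}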

// Get the executable path of a process
std::string get_process_path(pid_t pid) {
    char path[1024];
    snprintf(path, sizeof(path), "/proc/%d/exe", pid);
    char exe_path[1024];
    // readlink does not null-terminate, so leave room for the terminator.
    ssize_t len = readlink(path, exe_path, sizeof(exe_path) - 1);
    if (len == -1) {
        return "";
    }
    exe_path[len] = '\0';
    return std::string(exe_path);
}

// Get the window icon (the first image stored in _NET_WM_ICON) as BGRA bytes
bool get_window_icon(Display *display, ::Window window, std::vector<uint8_t> &icon_data, int &width, int &height) {
    Atom icon_atom = XInternAtom(display, "_NET_WM_ICON", True);
    if (icon_atom == None) {
        return false;
    }

    Atom actual_type;
    int actual_format;
    unsigned long nitems, bytes_after;
    unsigned char *prop = nullptr;

    if (XGetWindowProperty(display, window, icon_atom, 0, (~0L), False, AnyPropertyType,
                          &actual_type, &actual_format, &nitems, &bytes_after, &prop) != Success) {
        return false;
    }

    // We need at least width and height; free the buffer on every exit path.
    if (prop == nullptr || actual_format != 32 || nitems < 2) {
        if (prop) XFree(prop);
        return false;
    }

    unsigned long *data = reinterpret_cast<unsigned long *>(prop);
    width = static_cast<int>(data[0]);
    height = static_cast<int>(data[1]);

    // Reject sizes that do not fit in the data we actually received.
    if (width <= 0 || height <= 0 ||
        nitems - 2 < static_cast<unsigned long>(width) * static_cast<unsigned long>(height)) {
        XFree(prop);
        return false;
    }

    icon_data.resize(static_cast<size_t>(width) * height * 4);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned long pixel = data[2 + y * width + x];
            icon_data[(y * width + x) * 4 + 3] = (pixel >> 24) & 0xff; // A
            icon_data[(y * width + x) * 4 + 2] = (pixel >> 16) & 0xff; // R
            icon_data[(y * width + x) * 4 + 1] = (pixel >> 8) & 0xff;  // G
            icon_data[(y * width + x) * 4 + 0] = pixel & 0xff;         // B
        }
    }

    XFree(prop);
    return true;
}

// Capture the window contents
bool get_window_image_data(Display *display, ::Window window, int width, int height, std::vector<uint8_t> &data) {
    XImage *image = XGetImage(display, window, 0, 0, width, height, AllPlanes, ZPixmap);
    if (!image) {
        return false;
    }

    data.resize(width * height * 4);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned long pixel = XGetPixel(image, x, y);
            data[(y * width + x) * 4 + 3] = (pixel >> 24) & 0xff; // A
            data[(y * width + x) * 4 + 2] = (pixel >> 16) & 0xff; // R
            data[(y * width + x) * 4 + 1] = (pixel >> 8) & 0xff;  // G
            data[(y * width + x) * 4 + 0] = pixel & 0xff;         // B
        }
    }

    XDestroyImage(image);
    return true;
}

// Enumerate windows
void enum_windows() {
    Display *display = XOpenDisplay(NULL);
    if (!display) {
        std::cerr << "Failed to open display" << std::endl;
        return;
    }

    ::Window root = DefaultRootWindow(display);
    if (!root) {
        std::cerr << "Failed to get root window" << std::endl;
        XCloseDisplay(display);
        return;
    }

    ::Window parent;
    ::Window *children;
    unsigned int num_children;
    if (!XQueryTree(display, root, &root, &parent, &children, &num_children)) {
        std::cerr << "Failed to query for child windows" << std::endl;
        XCloseDisplay(display);
        return;
    }

    for (unsigned int i = 0; i < num_children; ++i) {
        std::string title;
        if (get_window_title(display, children[i], title)) {
            std::cout << "Window ID: " << children[i] << std::endl;
            std::cout << "Title: " << title << std::endl;

            pid_t pid = get_window_pid(display, children[i]);
            if (pid != -1) {
                std::string process_path = get_process_path(pid);
                std::cout << "Process Path: " << process_path << std::endl;
            }

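            // Keep the icon size separate from the window size.
            int icon_width = 0;
            int icon_height = 0;
            std::vector<uint8_t> icon_data;
            if (get_window_icon(display, children[i], icon_data, icon_width, icon_height)) {
                std::cout << "Icon Size: " << icon_width << "x" << icon_height << std::endl;
            }

            // Capture using the window's real size, and only if it is viewable;
            // XGetImage raises BadMatch for windows that are not viewable.
            XWindowAttributes attrs;
            std::vector<uint8_t> image_data;
            if (XGetWindowAttributes(display, children[i], &attrs) &&
                attrs.map_state == IsViewable &&
                get_window_image_data(display, children[i], attrs.width, attrs.height, image_data)) {
                std::cout << "Captured window image" << std::endl;
            }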
        }
    }

    if (children) {
        XFree(children);
    }
    XCloseDisplay(display);
}

int main() {
    enum_windows();
    return 0;
}

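Code walkthrough

  1. Get the window title: the XGetWMName function returns the window's title.
  2. Get the window's process ID: the _NET_WM_PID property holds the ID of the process that owns the window.
  3. Get the process path: reading /proc/<pid>/exe gives the path of the process's executable.
  4. Get the window icon: the _NET_WM_ICON property holds the window's icon data.
  5. Capture the window: the XGetImage function returns the window's pixel data.
  6. Enumerate windows: the XQueryTree function returns the root window's children, and we iterate over them to collect the details of each one.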
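One detail of step 3 is worth a closer look: readlink truncates silently when the buffer is too small, so a fixed-size buffer can return a cut-off path. The sketch below is one possible variant that grows the buffer until the whole link target fits; read_exe_path is a hypothetical name and is not taken from traa.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Resolves /proc/<pid>/exe, doubling the buffer until the target fits.
// Returns an empty string if the link cannot be read (no such process,
// no permission, or /proc not mounted).
std::string read_exe_path(pid_t pid) {
    const std::string link = "/proc/" + std::to_string(pid) + "/exe";
    std::vector<char> buf(256);
    for (;;) {
        const ssize_t len = readlink(link.c_str(), buf.data(), buf.size());
        if (len < 0) {
            return "";
        }
        // A result shorter than the buffer means nothing was truncated.
        if (static_cast<std::size_t>(len) < buf.size()) {
            return std::string(buf.data(), static_cast<std::size_t>(len));
        }
        buf.resize(buf.size() * 2);
    }
}
```

Calling read_exe_path(getpid()) is a quick way to check it on your own process. Reading the link for a process owned by another user may fail with a permission error, in which case the function returns an empty string just like the original helper.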

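Special handling when reading the icon data

In traa's get_window_icon function, the implementation differs between 32-bit and 64-bit builds. The reason is that Xlib returns 32-bit property items as an array of unsigned long, and the size of unsigned long depends on the architecture. The reasons and the approach for each architecture are explained below.

Why

  1. The data is stored differently

    • On 32-bit architectures, unsigned long is 32 bits (4 bytes).
    • On 64-bit architectures, unsigned long is 64 bits (8 bytes), but only the low 32 bits carry data; the high 32 bits are padding.
  2. The data has to be processed differently

    • On 32-bit architectures, the whole memory block can be copied directly, because the pixel data is stored contiguously.
    • On 64-bit architectures, each element has to be processed individually, because each element is 64 bits wide but only the low 32 bits carry data.

How

32-bit implementation

On 32-bit architectures the whole memory block is copied directly, because the data is contiguous and each element is 32 bits wide.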

#if defined(TRAA_ARCH_32_BITS)
  icon_data.assign(prop + 2 * sizeof(unsigned long), prop + nitems * sizeof(unsigned long));
#endif

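This uses std::vector::assign to copy the icon data from prop into the icon_data vector. prop + 2 * sizeof(unsigned long) skips the first two elements (width and height), and prop + nitems * sizeof(unsigned long) marks the end of the memory block, so everything after the header is copied.

64-bit implementation

On 64-bit architectures each element has to be processed individually, because each element is 64 bits wide but only the low 32 bits carry data.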

#elif defined(TRAA_ARCH_64_BITS)
  // TODO: this can be optimized by using some SIMD instructions.
  icon_data.resize(width * height * desktop_frame::bytes_per_pixel);
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      // Each pixel is packed as 0xAARRGGBB in the low 32 bits.
      unsigned long pixel = data[2 + y * width + x];
      icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 3] = (pixel >> (24)) & 0xff; // A
      icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 2] = (pixel >> (16)) & 0xff; // R
      icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 1] = (pixel >> (8)) & 0xff;  // G
      icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 0] = pixel & 0xff;           // B
    }
  }
#endif

This processes the pixels one by one, extracting the A, R, G, and B components of each pixel and storing them in the icon_data vector in B, G, R, A byte order. The steps are:

  1. Resize icon_data

    icon_data.resize(width * height * desktop_frame::bytes_per_pixel);

    This makes icon_data large enough to hold all the pixel data.

  2. Process each pixel

    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        unsigned long pixel = data[2 + y * width + x];
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 3] = (pixel >> (24)) & 0xff; // A
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 2] = (pixel >> (16)) & 0xff; // R
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 1] = (pixel >> (8)) & 0xff;  // G
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 0] = pixel & 0xff;           // B
      }
    }

    Each iteration reads one element, skipping the two header elements (width and height), and writes its four components as four consecutive bytes.
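The per-element conversion can be tried without an X server by feeding it a hand-written buffer. The following is a minimal sketch, not traa's code: icon_longs_to_bgra is a hypothetical name, the 4096-pixel size limit is an arbitrary sanity bound chosen for this example, and only the first icon in the buffer is converted even though _NET_WM_ICON may contain several.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts icon data laid out as: width, height, then width*height ARGB
// values (one per unsigned long) into tightly packed B, G, R, A bytes.
// Works for both 4-byte and 8-byte unsigned long, because only the low
// 32 bits of each element are used.
bool icon_longs_to_bgra(const unsigned long *data, std::size_t count,
                        std::vector<uint8_t> &out, int &width, int &height) {
    if (data == nullptr || count < 2) {
        return false;
    }
    // Arbitrary sanity bound for this sketch; also keeps w * h from overflowing.
    const unsigned long kMaxSide = 4096;
    if (data[0] == 0 || data[1] == 0 || data[0] > kMaxSide || data[1] > kMaxSide) {
        return false;
    }
    const std::size_t w = static_cast<std::size_t>(data[0]);
    const std::size_t h = static_cast<std::size_t>(data[1]);
    if (count - 2 < w * h) {
        return false;  // header claims more pixels than the buffer holds
    }
    width = static_cast<int>(w);
    height = static_cast<int>(h);
    out.resize(w * h * 4);
    for (std::size_t i = 0; i < w * h; ++i) {
        const uint32_t argb = static_cast<uint32_t>(data[2 + i] & 0xffffffffUL);
        out[i * 4 + 0] = static_cast<uint8_t>(argb & 0xff);          // B
        out[i * 4 + 1] = static_cast<uint8_t>((argb >> 8) & 0xff);   // G
        out[i * 4 + 2] = static_cast<uint8_t>((argb >> 16) & 0xff);  // R
        out[i * 4 + 3] = static_cast<uint8_t>((argb >> 24) & 0xff);  // A
    }
    return true;
}
```

For example, a 1x2 icon given as {1, 2, 0x11223344, 0xAABBCCDD} yields the eight bytes 44 33 22 11 DD CC BB AA. Checking the declared size against the element count guards against a malformed property whose header claims more pixels than were delivered.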

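Summary

With the steps above, we can enumerate the visible windows under the Linux X11 window system and get each window's title, process name, size, process icon, and screenshot. These capabilities are useful for building desktop management tools, screen recording software, and similar applications. I hope this post helps!

For more details, see TRAA.

PS

This post was generated by Copilot. It is a bit messy and misses some key details, so if you are interested, go read the source; all the details are in the code.

I also found the source repositories for many of the xorg apps, including xwd and xwininfo!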
