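Linux Scheduler Performance Analysis - 2

This article introduces SCHEDSTATS, a debug feature of the Linux kernel that lets the scheduler export its pre-defined performance counters to user space. By collecting and analyzing these counters, we can debug or tune the scheduler as well as specific applications or benchmarks. The article explains how to access the SCHEDSTATS counters and gives details on the system-wide and per-task statistics.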

This article was first published at http://oliveryang.net. Any reuse of the content must include the original link.

SCHEDSTATS Perf Counters - Overview

1. What is SCHEDSTATS?

SCHEDSTATS is a kernel debug feature that allows the scheduler to export its pre-defined performance counters to user space. By collecting and analyzing these perf counters, we can do the following,

  • Debug or tune the scheduler
  • Debug or tune a specific application or benchmark from a scheduling perspective

2. How can we access SCHEDSTATS counters?

When SCHEDSTATS is enabled, scheduler statistics can be accessed in the following ways,

  • Three proc files exported by SCHEDSTATS code

    /proc/schedstat, /proc/[pid]/schedstat, /proc/[pid]/sched

    Documentation/scheduler/sched-stats.txt gives the full description of the file formats. We can write user space tools to read and process the proc files.

  • Pre-defined kernel trace points

    Kernel trace points can be used by dynamic tracing tools, such as SystemTap and perf. As of Linux 4.1, there are 4 sched_stat_* trace points defined by SCHEDSTATS code.

    # perf list | grep sched_stat_
      sched:sched_stat_wait              [Tracepoint event]
      sched:sched_stat_sleep             [Tracepoint event]
      sched:sched_stat_iowait            [Tracepoint event]
      sched:sched_stat_blocked           [Tracepoint event]
      sched:sched_stat_runtime           [Tracepoint event] >>>> Not a SCHEDSTAT trace point
    

    The record, report, and script sub-commands of the Linux perf tool can be used to get system-wide or per-task statistics.

  • Sleep profiler when SCHEDSTATS is enabled

    This needs the readprofile command installed in user space. The usage of readprofile is described in Documentation/basic_profiling.txt. To enable the kernel profiler, refer to Documentation/kernel-parameters.txt. This is a legacy method and can be replaced by the following trace point in recent kernels,

    # perf list | grep sched_stat_blocked
      sched:sched_stat_blocked                   [Tracepoint event]
    # perf record -e sched:sched_stat_blocked -a -g sleep 5
    # perf script
    

3. SCHEDSTATS proc files use cases

3.1 System-wide statistics

These include per-CPU (run queue) and per-sched-domain statistics.

**/proc/schedstat**

Implemented in the scheduler core, which is the common layer for all scheduling classes.

The CPU statistics in the /proc/schedstat file are defined as members of struct rq in kernel/sched.c,

            struct rq {
                    [...snipped...]

            #ifdef CONFIG_SCHEDSTATS
                    /* latency stats */
                    struct sched_info rq_sched_info;
                    unsigned long long rq_cpu_time;
                    /* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? */

                    /* sys_sched_yield() stats */
                    unsigned int yld_count;

                    /* schedule() stats */
                    unsigned int sched_switch;
                    unsigned int sched_count;
                    unsigned int sched_goidle;

                    /* try_to_wake_up() stats */
                    unsigned int ttwu_count;
                    unsigned int ttwu_local;
            #endif

                    [...snipped...]
            };
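
As a rough illustration of a user space tool for these counters, the sketch below parses the per-CPU lines of /proc/schedstat. The field names come from the struct rq members above; the column order is an assumption based on the version 15 layout described in sched-stats.txt, and both `parse_cpu_stats` and the sample text are made up for illustration.

```python
# Field names follow the struct rq members quoted above. The column order is
# an ASSUMPTION based on the version 15 layout in sched-stats.txt.
CPU_FIELDS = (
    "yld_count", "sched_switch", "sched_count", "sched_goidle",
    "ttwu_count", "ttwu_local", "rq_cpu_time", "run_delay", "pcount",
)

# Made-up sample, used only when /proc/schedstat cannot be read.
SAMPLE = """version 15
timestamp 4295000000
cpu0 3 0 1200 400 900 600 5000000 2000000 800
cpu1 0 0 100 50 70 30 400000 100000 60
"""


def parse_cpu_stats(text):
    """Return {cpu_name: {field: int}} for every 'cpuN ...' line in text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue
        values = [int(v) for v in parts[1:]]
        if len(values) != len(CPU_FIELDS):
            # Unknown layout (another schedstat version): skip rather than guess.
            continue
        stats[parts[0]] = dict(zip(CPU_FIELDS, values))
    return stats


if __name__ == "__main__":
    try:
        with open("/proc/schedstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    for cpu, s in parse_cpu_stats(text).items():
        print(cpu, "schedule() calls:", s["sched_count"],
              "wakeups:", s["ttwu_count"], "local wakeups:", s["ttwu_local"])
```

The counters are cumulative, so a real tool would usually read the file twice and report the difference between the two samples.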

The domain statistics in the /proc/schedstat file are defined as members of struct sched_domain
in include/linux/sched.h,

            struct sched_domain {
                    [...snipped...]

            #ifdef CONFIG_SCHEDSTATS
                    /* load_balance() stats */
                    unsigned int lb_count[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
                    unsigned int lb_nobusyq[CPU_MAX_IDLE_TYPES];

                    /* Active load balancing */
                    unsigned int alb_count;
                    unsigned int alb_failed;
                    unsigned int alb_pushed;

                    /* SD_BALANCE_EXEC stats */
                    unsigned int sbe_count;
                    unsigned int sbe_balanced;
                    unsigned int sbe_pushed;

                    /* SD_BALANCE_FORK stats */
                    unsigned int sbf_count;
                    unsigned int sbf_balanced;
                    unsigned int sbf_pushed;

                    /* try_to_wake_up() stats */
                    unsigned int ttwu_wake_remote;
                    unsigned int ttwu_move_affine;
                    unsigned int ttwu_move_balance;
            #endif

                    [...snipped...]
            };
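
The domain lines can be decoded in a similar way. The sketch below assumes the version 15 layout: a domain name, a CPU mask, eight load_balance() counters for each of the three idle types, then the twelve remaining counters. The column order and the labels `idle`, `busy` and `newly_idle` are assumptions taken from that layout, and `parse_domain_line` is a hypothetical helper.

```python
# ASSUMED version 15 column layout; other schedstat versions differ.
IDLE_TYPES = ("idle", "busy", "newly_idle")
LB_FIELDS = (
    "lb_count", "lb_balanced", "lb_failed", "lb_imbalance",
    "lb_gained", "lb_hot_gained", "lb_nobusyq", "lb_nobusyg",
)
TAIL_FIELDS = (
    "alb_count", "alb_failed", "alb_pushed",
    "sbe_count", "sbe_balanced", "sbe_pushed",
    "sbf_count", "sbf_balanced", "sbf_pushed",
    "ttwu_wake_remote", "ttwu_move_affine", "ttwu_move_balance",
)


def parse_domain_line(line):
    """Decode one 'domainN <cpumask> ...' line, or return None if it does not fit."""
    parts = line.split()
    if not parts or not parts[0].startswith("domain"):
        return None
    n_lb = len(IDLE_TYPES) * len(LB_FIELDS)
    if len(parts) != 2 + n_lb + len(TAIL_FIELDS):
        return None  # unexpected field count: do not guess
    values = [int(v) for v in parts[2:]]
    result = {"name": parts[0], "cpumask": parts[1], "lb": {}}
    width = len(LB_FIELDS)
    for i, idle in enumerate(IDLE_TYPES):
        chunk = values[i * width:(i + 1) * width]
        result["lb"][idle] = dict(zip(LB_FIELDS, chunk))
    result.update(zip(TAIL_FIELDS, values[n_lb:]))
    return result


if __name__ == "__main__":
    # Made-up line: the counters are simply 1..36 so positions are easy to see.
    sample = "domain0 03 " + " ".join(str(n) for n in range(1, 37))
    d = parse_domain_line(sample)
    print(d["cpumask"], d["lb"]["idle"]["lb_count"], d["alb_count"])
```

Returning None on an unexpected field count keeps the tool from mislabelling counters on kernels that use a different schedstat version.
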

3.2 Per-task statistics

**/proc/[pid]/schedstat**

Common for all scheduling classes.

The statistics for /proc/[pid]/schedstat are defined in struct sched_info, which is a member of struct task_struct in include/linux/sched.h,

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
struct sched_info {
        /* cumulative counters */
        unsigned long pcount;         /* # of times run on this cpu */
        unsigned long long run_delay; /* time spent waiting on a run queue */
        /* timestamps */
        unsigned long long last_arrival,/* when we last ran on a cpu */
        last_queued; /* when we were last queued to run */
};
#endif /* defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) */

struct task_struct {
       [...snipped...]         
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
       struct sched_info sched_info;
#endif       
       [...snipped...]
};
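
A small sketch of reading this file from user space follows. It assumes the three-field layout documented in sched-stats.txt (time spent on the CPU, time spent waiting on a run queue, number of timeslices run); the helper names and the derived "average delay per timeslice" metric are illustrative choices, not something defined by the kernel.

```python
def parse_task_schedstat(text):
    """Parse the content of /proc/[pid]/schedstat into a dict of ints.

    ASSUMED layout: "<run_time> <run_delay> <pcount>".
    """
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("unexpected schedstat layout: %r" % text)
    run_time, run_delay, pcount = (int(v) for v in fields)
    return {"run_time": run_time, "run_delay": run_delay, "pcount": pcount}


def avg_delay_per_slice(stat):
    """Average run queue wait per timeslice; 0.0 if the task never ran."""
    if stat["pcount"] == 0:
        return 0.0
    return stat["run_delay"] / stat["pcount"]


if __name__ == "__main__":
    try:
        with open("/proc/self/schedstat") as f:
            stat = parse_task_schedstat(f.read())
    except (OSError, ValueError):
        # Made-up numbers, used only when the proc file is unavailable.
        stat = parse_task_schedstat("1500000 300000 10")
    print(stat, avg_delay_per_slice(stat))
```

A task with a large run_delay relative to its run time spends much of its life waiting for a CPU, which is often the first hint of run queue contention.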

**/proc/[pid]/sched**

Only available for CFS tasks. SCHED_DEBUG must be enabled as well.

The se statistics for /proc/[pid]/sched are defined in struct sched_statistics, which is embedded in struct sched_entity, itself a member (se) of struct task_struct in include/linux/sched.h,

#ifdef CONFIG_SCHEDSTATS
            struct sched_statistics {
                    u64                     wait_start;
                    u64                     wait_max;
                    u64                     wait_count;
                    u64                     wait_sum;
                    u64                     iowait_count;
                    u64                     iowait_sum;

                    u64                     sleep_start;
                    u64                     sleep_max;
                    s64                     sum_sleep_runtime;

                    u64                     block_start;
                    u64                     block_max;
                    u64                     exec_max;
                    u64                     slice_max;

                    u64                     nr_migrations_cold;
                    u64                     nr_failed_migrations_affine;
                    u64                     nr_failed_migrations_running;
                    u64                     nr_failed_migrations_hot;
                    u64                     nr_forced_migrations;

                    u64                     nr_wakeups;
                    u64                     nr_wakeups_sync;
                    u64                     nr_wakeups_migrate;
                    u64                     nr_wakeups_local;
                    u64                     nr_wakeups_remote;
                    u64                     nr_wakeups_affine;
                    u64                     nr_wakeups_affine_attempts;
                    u64                     nr_wakeups_passive;
                    u64                     nr_wakeups_idle;
            };
#endif


            struct sched_entity {
                    [...snipped...]

            #ifdef CONFIG_SCHEDSTATS
                    struct sched_statistics statistics;
            #endif

                    [...snipped...]
            };


            struct task_struct {
                    [...snipped...]

                    struct sched_entity se;

                    [...snipped...]
            };
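
To show how these per-entity counters can be consumed, the sketch below parses the "name : value" lines of /proc/[pid]/sched. The sample text is made up, and the key prefix `se.statistics.` is an assumption that matches the struct layout above; the exact key names printed by a given kernel may differ.

```python
import re

# One "name : value" pair per line; header and separator lines do not match.
_LINE = re.compile(r"^(\S+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")

# Made-up sample in the general shape of /proc/[pid]/sched output.
SAMPLE = """demo (1234, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :       1000.500000
se.statistics.wait_max                       :          2.250000
se.statistics.nr_wakeups                     :                42
nr_switches                                  :               100
"""


def parse_task_sched(text):
    """Return {name: float} for every 'name : value' line in text."""
    stats = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


if __name__ == "__main__":
    try:
        with open("/proc/self/sched") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # file missing, e.g. SCHED_DEBUG is not enabled
    for name, value in sorted(parse_task_sched(text).items()):
        if "wait" in name or "wakeups" in name:
            print(name, value)
```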

4. SCHEDSTATS source files

To use SCHEDSTATS, the kernel must be built with the SCHEDSTATS config option enabled. All related code is guarded by CONFIG_SCHEDSTATS.
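
One way to check this from user space is to look at the kernel build configuration. The sketch below is only an assumption about where that configuration can be found: /boot/config-<release> is a common distribution convention, and /proc/config.gz exists only when the kernel was built with CONFIG_IKCONFIG_PROC. The helper names are hypothetical.

```python
import gzip
import platform


def config_enabled(config_text, option):
    """True if the kernel config text contains '<option>=y' on its own line."""
    wanted = option + "=y"
    return any(line.strip() == wanted for line in config_text.splitlines())


def read_kernel_config():
    """Return the kernel config text, or None if no known location is readable."""
    try:
        # ASSUMED location: common distribution convention.
        with open("/boot/config-" + platform.release()) as f:
            return f.read()
    except OSError:
        pass
    try:
        # Only present when the kernel was built with CONFIG_IKCONFIG_PROC.
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        for opt in ("CONFIG_SCHEDSTATS", "CONFIG_SCHED_DEBUG"):
            print(opt, "enabled" if config_enabled(text, opt) else "not enabled")
```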

The Linux kernel scheduler is organized in two layers,

4.1 The upper layer is the scheduler core, which is the common layer for all scheduling classes.

In Linux 3.2.x, the SCHEDSTATS source files in the scheduler common layer are,

include/linux/sched.h

Per-sched-domain and per-task perf counter definitions.

kernel/sched_stats.h

/proc/schedstat proc file implementation

fs/proc/base.c

/proc/[pid]/schedstat proc file implementation

kernel/sched.c

Per-runqueue perf counter definitions.

Per-runqueue, per-sched-domain, and per-task perf counter implementation, for example, ttwu_stat

kernel/profile.c

Legacy profiling code that supports /proc/profile, which readprofile(1) can read.

kernel/sched_debug.c

SCHEDSTATS implementation for the /proc/sched_debug and /proc/[pid]/sched proc files.
SCHED_DEBUG must be enabled at the same time.

4.2 The lower layer is the per-scheduling-class source code.

In Linux 3.2.x, only the CFS scheduling class code has the SCHEDSTATS implementation.

kernel/sched_fair.c

SCHEDSTATS in /proc/[pid]/sched. SCHED_DEBUG must be enabled at the same time.

/proc/schedstat counters for load balance.

Kernel trace points for wait, sleep, iowait, and blocked (not in 3.2.x) events. See section 2 above.

Reposted from: https://www.cnblogs.com/ainima/p/6330788.html
