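Basics

Kernel

The kernel is essentially code like any user process, but it has full access to the hardware, whereas user-mode code has only partial hardware access

Protection rings (Rings): a model that divides computer resources into different privilege levels

The levels run from the most privileged, ring 0, down to the least privileged, ring 3

Intel CPUs have four privilege levels; modern operating systems use only ring 0 and ring 3
User mode: ring 3 + the execution environment of user processes
Kernel mode: ring 0 + the execution environment of kernel code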
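
As a small illustration of the ring model, the low two bits of an x86 segment selector carry the privilege level. The sketch below assumes the selector values x86-64 Linux uses for its code segments (0x10 for the kernel, 0x33 for user space); the macro and function names are made up for this example.

```c
#include <stdint.h>

/* Code-segment selector values on x86-64 Linux (assumed here; the names are illustrative). */
#define KERNEL_CS_SELECTOR 0x10u
#define USER_CS_SELECTOR   0x33u

/* The low two bits of a segment selector hold the privilege level (the ring). */
static inline unsigned ring_of_selector(uint16_t selector)
{
    return selector & 0x3u;
}
```

Passing the user selector yields ring 3 and the kernel selector yields ring 0, which matches the two rings listed above.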

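State switching

Ways to switch between privilege levels:

Interrupts and exceptions: on receiving an interrupt or exception, the CPU switches to ring 0
Dedicated instructions that change the privilege level, such as sysenter (entering ring 0) or iret (returning to ring 3)

Modern operating systems use the syscall instruction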
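
A minimal user-space sketch of crossing into the kernel: the libc `syscall()` wrapper issues the system call by number, without going through the dedicated `getpid()` wrapper. The helper name `raw_getpid` is made up for this example, and Linux with glibc is assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID by system call number.
 * On x86-64 the wrapper ends up executing the syscall instruction. */
static inline pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The value returned should be the same one `getpid()` reports, since both end in the same kernel handler.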

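System calls

Switches happen on events such as system calls, exceptions, and peripheral interrupts

After the system call instruction executes, the kernel does the following in kernel mode:

Switch the GS segment register with swapgs: the GS value is exchanged with the value at a specific location, which saves the user GS value and installs the value from that location as the GS used while running in the kernel
Record the current stack top in the per-CPU variable area, then load the kernel stack top recorded there into rsp/esp
Save the register values with push
Check whether the call uses the x32 ABI
Jump through sys_call_table to the requested system call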

 ENTRY(entry_SYSCALL_64)
 /* SWAPGS_UNSAFE_STACK is a macro; on x86 it is defined directly as the swapgs instruction */
 SWAPGS_UNSAFE_STACK

 /* Save the user stack pointer and switch to the kernel stack */
 movq %rsp, PER_CPU_VAR(rsp_scratch)
 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp


/* Save the register values with push, forming a pt_regs structure */
/* Construct struct pt_regs on stack */
pushq  $__USER_DS      /* pt_regs->ss */
pushq  PER_CPU_VAR(rsp_scratch)  /* pt_regs->sp */
pushq  %r11             /* pt_regs->flags */
pushq  $__USER_CS      /* pt_regs->cs */
pushq  %rcx             /* pt_regs->ip */
pushq  %rax             /* pt_regs->orig_ax */
pushq  %rdi             /* pt_regs->di */
pushq  %rsi             /* pt_regs->si */
pushq  %rdx             /* pt_regs->dx */
pushq  %rcx             /* pt_regs->cx */
pushq  $-ENOSYS        /* pt_regs->ax */
pushq  %r8              /* pt_regs->r8 */
pushq  %r9              /* pt_regs->r9 */
pushq  %r10             /* pt_regs->r10 */
pushq  %r11             /* pt_regs->r11 */
sub $(6*8), %rsp      /* pt_regs->bp, bx, r12-15 not saved */
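
The push sequence above can be read back as a C structure. The sketch below mirrors the x86-64 `struct pt_regs` field order, lowest address first; the names `pt_regs_sketch` and `saved_ip_offset` are made up for this example, and this is not the kernel's own header.

```c
#include <stddef.h>
#include <stdint.h>

/* Field order as seen from the final %rsp upwards. */
struct pt_regs_sketch {
    uint64_t r15, r14, r13, r12, bp, bx;  /* space reserved by `sub $(6*8), %rsp` */
    uint64_t r11, r10, r9, r8;            /* pushed last, so closest to %rsp */
    uint64_t ax, cx, dx, si, di;
    uint64_t orig_ax;                     /* system call number */
    uint64_t ip, cs, flags, sp, ss;       /* pushed first, so at the highest addresses */
};

/* Offset of the saved user return address from the final %rsp. */
static inline size_t saved_ip_offset(void)
{
    return offsetof(struct pt_regs_sketch, ip);
}
```

The structure holds 21 eight-byte slots: 15 pushes plus the 6 slots reserved by the final `sub`.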

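On exit

Restore the GS value with swapgs
Return to user space with sysretq or iretq and continue execution there; if iretq is used, some user-space state (CS, rflags, rsp, and so on) must also be supplied.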
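
The user-space state that iretq expects is five quadwords on the stack: RIP, CS, RFLAGS, RSP, SS, from the lowest address upwards. A minimal sketch of building that frame, for example as the tail of a ROP chain, follows; the structure and function names are made up for this example.

```c
#include <stdint.h>

/* The five values iretq pops, in ascending stack-address order. */
struct iretq_frame {
    uint64_t rip;     /* user-space instruction to resume at */
    uint64_t cs;      /* user code segment selector */
    uint64_t rflags;  /* saved flags */
    uint64_t rsp;     /* user stack pointer */
    uint64_t ss;      /* user stack segment selector */
};

static inline struct iretq_frame make_iretq_frame(uint64_t rip, uint64_t cs,
                                                  uint64_t rflags, uint64_t rsp,
                                                  uint64_t ss)
{
    struct iretq_frame frame = { rip, cs, rflags, rsp, ss };
    return frame;
}
```

In exploit code these values are usually captured in user space before triggering the bug, so that the frame describes a valid context to return to.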

Virtual memory space

The virtual address space is divided into two parts: user space and kernel space.
Linux usually assigns the higher addresses to the kernel
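
A minimal sketch of that split, assuming x86-64 with 48-bit virtual addresses (4-level paging): user space occupies the lower canonical half and the kernel the upper one. The constants and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: x86-64 with 48-bit virtual addresses. */
#define USER_SPACE_END     0x00007fffffffffffULL
#define KERNEL_SPACE_START 0xffff800000000000ULL

static inline bool is_user_address(uint64_t addr)
{
    return addr <= USER_SPACE_END;
}

static inline bool is_kernel_address(uint64_t addr)
{
    return addr >= KERNEL_SPACE_START;
}
```

Addresses between the two ranges are non-canonical and belong to neither half, so both checks return false for them.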

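Process privilege management

The privileges of user applications are managed by the kernel

Process descriptor

The kernel represents a process with task_struct, defined in include/linux/sched.h in the kernel source

Process credentials (credential)

The cred structure manages the privileges of a process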

/*
 * The security context of a task
 *
 * The parts of the context break down into two categories:
 *
 *  (1) The objective context of a task.  These parts are used when some other
 *  task is attempting to affect this one.
 *
 *  (2) The subjective context.  These details are used when the task is acting
 *  upon another object, be that a file, a task, a key or whatever.
 *
 * Note that some members of this structure belong to both categories - the
 * LSM security pointer for instance.
 *
 * A task has two security pointers.  task->real_cred points to the objective
 * context that defines that task's actual details.  The objective part of this
 * context is used whenever that task is acted upon.
 *
 * task->cred points to the subjective context that defines the details of how
 * that task is going to act upon another object.  This may be overridden
 * temporarily to point to another security context, but normally points to the
 * same context as task->real_cred.
 */
struct cred {
    atomic_long_t   usage;
    kuid_t      uid;        /* real UID of the task */
    kgid_t      gid;        /* real GID of the task */
    kuid_t      suid;       /* saved UID of the task */
    kgid_t      sgid;       /* saved GID of the task */
    kuid_t      euid;       /* effective UID of the task */
    kgid_t      egid;       /* effective GID of the task */
    kuid_t      fsuid;      /* UID for VFS ops */
    kgid_t      fsgid;      /* GID for VFS ops */
    unsigned    securebits; /* SUID-less security management */
    kernel_cap_t    cap_inheritable; /* caps our children can inherit */
    kernel_cap_t    cap_permitted;  /* caps we're permitted */
    kernel_cap_t    cap_effective;  /* caps we can actually use */
    kernel_cap_t    cap_bset;   /* capability bounding set */
    kernel_cap_t    cap_ambient;    /* Ambient capability set */
#ifdef CONFIG_KEYS
    unsigned char   jit_keyring;    /* default keyring to attach requested
                     * keys to */
    struct key  *session_keyring; /* keyring inherited over fork */
    struct key  *process_keyring; /* keyring private to this process */
    struct key  *thread_keyring; /* keyring private to this thread */
    struct key  *request_key_auth; /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
    void        *security;  /* LSM security */
#endif
    struct user_struct *user;   /* real user ID subscription */
    struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
    struct ucounts *ucounts;
    struct group_info *group_info;  /* supplementary groups for euid/fsgid */
    /* RCU deletion */
    union {
        int non_rcu;            /* Can we skip RCU deletion? */
        struct rcu_head rcu;        /* RCU deletion hook */
    };
} __randomize_layout;

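A cred structure records four different user IDs for a process:
real UID: the ID of the user who started the process
saved UID: the process's original effective user ID
effective UID: the user ID the process is currently running as
UID for VFS ops (fsuid): the user ID the process is identified by when creating files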
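
From user space these four IDs are visible on the `Uid:` line of `/proc/self/status`, in the order real, effective, saved, filesystem. A minimal parsing sketch follows; the structure and function names are made up for this example.

```c
#include <stdio.h>
#include <sys/types.h>

struct uid_view {
    uid_t real, effective, saved, fs;
};

/* Parse a line of the form "Uid:\t<real>\t<effective>\t<saved>\t<fs>".
 * Returns 0 on success, -1 if the line does not match. */
static inline int parse_uid_line(const char *line, struct uid_view *out)
{
    unsigned int r, e, s, f;

    if (sscanf(line, "Uid: %u %u %u %u", &r, &e, &s, &f) != 4)
        return -1;
    out->real = (uid_t)r;
    out->effective = (uid_t)e;
    out->saved = (uid_t)s;
    out->fs = (uid_t)f;
    return 0;
}
```

For an ordinary process all four values are normally equal; they diverge for set-user-ID programs, for example.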

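Changing privileges

Changing a process's cred structure changes the privileges it runs with

struct cred* prepare_kernel_cred(struct task_struct* daemon)
Copies a process's cred and returns a new cred
int commit_creds(struct cred *new)
Applies the new cred to the current process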
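Loadable Kernel Modules (LKMs)

Linux uses a monolithic kernel architecture: every system service has to be provided by the kernel. On its own this lacks extensibility and maintainability, and a kernel that bundles every service that might be needed takes up a lot of memory
LKMs in kernel space can provide new system calls and other services, and can be loaded into and unloaded from the kernel like building blocks.

Common LKMs:

Drivers
- Device drivers
- File system drivers
Kernel extension modules

LKM files use the same format as user-space executables (ELF) and can be analyzed with IDA
A module can be compiled separately, but it has to be linked into the kernel at run time and then runs as part of the kernel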
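
Because a module is linked into the kernel only at load time, a `.ko` file is an ELF relocatable object (`e_type == ET_REL`) rather than a finished executable. A minimal sketch of checking this from the first bytes of a file follows; the function and macro names are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_TYPE_OFFSET 16  /* e_type follows the 16-byte e_ident array */
#define ELF_ET_REL      1   /* relocatable object file */

/* Returns true if the buffer starts with an ELF header whose type is ET_REL. */
static inline bool is_relocatable_elf(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint16_t e_type;

    if (len < ELF_TYPE_OFFSET + 2 || memcmp(hdr, magic, 4) != 0)
        return false;
    /* e_ident[5] is EI_DATA: 2 means big-endian, otherwise treat as little-endian. */
    if (hdr[5] == 2)
        e_type = (uint16_t)((hdr[16] << 8) | hdr[17]);
    else
        e_type = (uint16_t)(hdr[16] | (hdr[17] << 8));
    return e_type == ELF_ET_REL;
}
```

Reading the first 18 bytes of a module file into a buffer is enough to call this check.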

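insmod: load the specified module into the kernel
rmmod: unload the specified module
lsmod: list the modules that are already loaded
modprobe: add or remove modules

Most kernel vulnerabilities appear in LKMs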

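Interacting with the kernel

ioctl

Linux defines the ioctl system call for communication between processes and devices
The first argument is the file descriptor returned by opening the device, the second is the control command the user program sends to the device, and the remaining arguments are supplementary
ioctl can be used to talk to device drivers.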
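
A minimal user-space sketch of that calling pattern: open the device node, issue one command, close it. The helper name is made up for this example, and the device path and command number are whatever the target driver defines.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open a device node, send one ioctl command with an optional argument,
 * and return the ioctl result (-1 if the open or the ioctl fails). */
static inline int send_device_command(const char *path, unsigned long cmd, void *arg)
{
    int fd = open(path, O_RDWR);
    int ret;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, cmd, arg);
    close(fd);
    return ret;
}
```

In a real exploit the descriptor is usually kept open across many commands; opening and closing per call just keeps the sketch short.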

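Common kernel-mode functions

printf() -> printk(): printk does not necessarily print to the terminal, but the output always goes to the kernel log buffer.
memcpy() -> copy_from_user()/copy_to_user()
malloc() -> kmalloc(), which uses the slab allocator
free() -> kfree()
The kernel keeps track of each process's privileges.
commit_creds(prepare_kernel_cred(&init_task)) sets root privileges; this is the most common privilege-escalation technique
The addresses of these functions and variables can be looked up in /proc/kallsyms (/proc/ksyms on older kernel versions)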
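
Each line of `/proc/kallsyms` has the form `<hex address> <type> <name>`. A minimal sketch of matching one line against a wanted symbol follows; the function name is made up for this example, and the address in the usage note is invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and stores the address if the line describes the wanted symbol,
 * 0 otherwise. */
static inline int match_kallsyms_line(const char *line, const char *wanted, uint64_t *addr)
{
    unsigned long long value;
    char type;
    char name[256];

    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, wanted) != 0)
        return 0;
    *addr = (uint64_t)value;
    return 1;
}
```

For example, the line `ffffffff810a1b20 T commit_creds` matched against `commit_creds` stores 0xffffffff810a1b20. On many systems unprivileged readers see all-zero addresses in this file, so the lookup is typically done as root in the challenge environment.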

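Mitigation

Kernel protection mechanisms
canary, DEP, PIE, RELRO

KASLR: ASLR for the kernel

*FGKASLR: rearranges kernel code at function granularity

STACK PROTECTOR: a stack cookie that detects kernel stack overflows
The cookie is usually taken from a fixed offset in the gs segment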

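SMAP: Supervisor Mode Access Prevention
SMEP: Supervisor Mode Execution Prevention. The two are usually enabled together; they prevent the kernel from directly accessing/executing data in user space, which defends against ret2usr attacks

They can be bypassed in the following two ways:

ret2dir: the kernel's linear mapping region maps the entire physical address space, so the attacker looks for the kernel-space address of the page frame that backs a user-space page
On Intel, bit 20 of the CR4 register indicates whether SMEP is enabled; if a kernel ROP chain can change the value of CR4, SMEP can be turned off. However, on a kernel with KPTI enabled, the user address space has no execute permission in the kernel's page tables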
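
The CR4 manipulation comes down to clearing two bits: bit 20 (SMEP) and bit 21 (SMAP). A minimal sketch of computing the value a ROP chain would write back follows; the macro and function names are made up for this example.

```c
#include <stdint.h>

#define CR4_SMEP (1ULL << 20)  /* Supervisor Mode Execution Prevention */
#define CR4_SMAP (1ULL << 21)  /* Supervisor Mode Access Prevention */

static inline int smep_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMEP) != 0;
}

/* Value to load into CR4 so that both protections are switched off. */
static inline uint64_t cr4_without_smep_smap(uint64_t cr4)
{
    return cr4 & ~(CR4_SMEP | CR4_SMAP);
}
```

With a starting value of 0x3006f0, clearing both bits gives 0x6f0, a constant that shows up in many write-ups for this reason.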

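KPTI: Kernel Page Table Isolation. Kernel space and user space use two different sets of page tables, which fundamentally changes the kernel's memory management
Both sets fully map user memory, but the user page tables map only a small amount of kernel code
It was introduced mainly to mitigate the Meltdown vulnerability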
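
On x86-64 Linux the two top-level tables are, as far as I know, allocated as adjacent pages, so switching between them amounts to flipping bit 12 of CR3. The sketch below shows only that arithmetic and ignores the PCID bits; the macro and function names are made up for this example.

```c
#include <stdint.h>

/* Assumption: the user page-table root is the page right after the kernel one. */
#define PTI_SWITCH_BIT (1ULL << 12)

static inline uint64_t user_cr3_from_kernel_cr3(uint64_t kernel_cr3)
{
    return kernel_cr3 | PTI_SWITCH_BIT;
}

static inline uint64_t kernel_cr3_from_user_cr3(uint64_t user_cr3)
{
    return user_cr3 & ~PTI_SWITCH_BIT;
}
```

This is why a ROP chain returning to user space on a KPTI kernel has to switch CR3 (or reuse the kernel's own return path) before executing iretq or sysretq.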

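Kernel "heap" protection mechanisms

Hardened Usercopy

Mainly checks whether reads and writes of kernel-space data would go out of bounds during a copy:
whether the length of the data read exceeds the bounds of the source object
whether the length of the data written exceeds the bounds of the destination object

It applies to data-exchange APIs such as copy_to_user() and copy_from_user(). It does not apply to copies within kernel space.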
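
The bounds rule can be sketched as a pure function: a copy of `len` bytes starting at `ptr` is accepted only if it stays inside the object. This is a toy model of the idea, not the kernel's implementation, and the function name is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [ptr, ptr + len) lies entirely within [obj_start, obj_start + obj_size). */
static inline bool copy_stays_in_object(uintptr_t obj_start, size_t obj_size,
                                        uintptr_t ptr, size_t len)
{
    size_t offset;

    if (ptr < obj_start)
        return false;
    offset = (size_t)(ptr - obj_start);
    /* Compare against the remaining space to avoid overflow in offset + len. */
    return offset <= obj_size && len <= obj_size - offset;
}
```

A 64-byte copy from the start of a 64-byte object passes, while the same length starting 16 bytes in is rejected.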

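Hardened freelist

Before this protection is enabled, the next pointer of a free object in slub directly stores the address of the next free object, so reading the freelist leaks an address in the kernel's linear mapping region
Once it is enabled, next stores the XOR of three values: the address of the current free object, the address of the next free object, and a random value

An attacker must obtain at least the first and the third to tamper with it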
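
The three-way XOR can be sketched as follows. This is a simplified model; some kernel versions additionally byte-swap the storage address before XORing, which the sketch leaves out. The function names are made up for this example.

```c
#include <stdint.h>

/* Value stored in a free object's next field under the simplified scheme. */
static inline uint64_t freelist_encode(uint64_t next, uint64_t slot_addr, uint64_t secret)
{
    return next ^ slot_addr ^ secret;
}

/* XOR is its own inverse, so decoding reuses the same operation. */
static inline uint64_t freelist_decode(uint64_t stored, uint64_t slot_addr, uint64_t secret)
{
    return stored ^ slot_addr ^ secret;
}
```

Forging a pointer to a chosen target means computing `freelist_encode(target, slot_addr, secret)`, which is why both the slot address and the random value have to be known.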

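Random freelist

With this protection enabled, objects are linked in a random order, so an attacker cannot predict the address of the next allocated object

The randomization happens when the slub allocator has just obtained a new slub from the buddy system; at run time the freelist still follows LIFO

Memory contents are zeroed at allocation time (this is the behavior of the separate init-on-alloc hardening option)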

CTF

A challenge usually ships three files:

boot.sh: the shell script that boots the kernel, mostly using qemu
bzImage: compressed kernel binary
rootfs.cpio: the file system image
