Contents
Interrupt Handling in Drivers
Interrupt Bottom Halves
Softirqs
tasklet
Work Queues
Interrupt Handling in Drivers
From the analysis in the previous section it is easy to see that, to support an interrupt in a driver, we need to construct a struct irqaction object and add it to the list for the corresponding IRQ number (the irq_desc entries have already been set up during kernel initialization). The kernel provides the corresponding API functions, so we only need to call them. The function that registers an interrupt handler with the kernel has the following prototype.
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags, const char *name, void *dev);
The parameters are described below.
irq: the IRQ number of the interrupt used by the device. This is not the number found in the hardware manual but the IRQ number used inside the kernel; it determines which list the constructed struct irqaction object is inserted into and is used to initialize the irq member of that object.
handler: pointer to the interrupt handler; the type is defined as follows.
typedef irqreturn_t (*irq_handler_t)(int, void *);
After an interrupt occurs the handler is called automatically. Its first argument is the IRQ number and its second is the corresponding device ID, i.e. the dev_id member of struct irqaction. handler is used to initialize the handler member of the struct irqaction object.
The return value of an interrupt handler is of the enumerated type irqreturn_t, which has the following enumerators.
IRQ_NONE: the interrupt was not raised by a device managed by this driver; used with shared interrupts.
IRQ_HANDLED: the interrupt was handled normally.
IRQ_WAKE_THREAD: a kernel thread needs to be woken up.
flags: interrupt-related flags, used to initialize the flags member of the struct irqaction object. The common flags are listed below; several flags can be combined with a bitwise OR.
IRQF_TRIGGER_RISING: triggered on a rising edge.
IRQF_TRIGGER_FALLING: triggered on a falling edge.
IRQF_TRIGGER_HIGH: triggered on a high level.
IRQF_TRIGGER_LOW: triggered on a low level.
IRQF_DISABLED: keep interrupts disabled while the handler runs; deprecated and scheduled for removal.
IRQF_SHARED: must be set for shared interrupts.
IRQF_TIMER: flag reserved for timer interrupts.
name: the name of the interrupt as shown under /proc, used to initialize the name member of the struct irqaction object.
dev: distinguishes the struct irqaction objects of the different devices on a shared interrupt and is needed when a struct irqaction object is removed from the list; it initializes the dev_id member of the struct irqaction object. For a shared interrupt a non-NULL argument must be passed; for a non-shared interrupt NULL is allowed. When the interrupt occurs, the kernel also passes this argument to the interrupt handler.
request_irq returns 0 on success and a negative value on failure. Note that after request_irq has constructed a struct irqaction object from its arguments and added it to the corresponding list, it also enables the interrupt, so we do not need to enable the interrupt ourselves.
The function that unregisters an interrupt handler has the following prototype.
void free_irq(unsigned int, void *);
The first parameter is the IRQ number and the second is dev_id. For a shared interrupt a non-NULL argument must be passed, and it must match the dev_id used in request_irq.
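As a minimal, hedged sketch of how this pair is typically used (not the book's code: the IRQ number 42, struct foo_dev and the function names are placeholders):

#include <linux/interrupt.h>

/* Hypothetical device object; a real driver would keep its own state here. */
static struct foo_dev {
    int dummy;
} foo;

static irqreturn_t foo_handler(int irq, void *dev_id)
{
    /* acknowledge and handle the device as quickly as possible */
    return IRQ_HANDLED;
}

static int foo_setup(void)
{
    /* IRQF_SHARED requires a non-NULL dev argument (&foo here), which
     * free_irq() later uses to identify this handler on the shared line. */
    return request_irq(42, foo_handler,
                       IRQF_TRIGGER_HIGH | IRQF_SHARED, "foo", &foo);
}

static void foo_teardown(void)
{
    free_irq(42, &foo);    /* the dev argument must match request_irq() */
}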
Besides the registration and unregistration functions, there are some functions and macros for enabling and disabling interrupts. They are not used often and are simply listed below, followed by a short usage sketch.
local_irq_enable(): enable interrupts on the local CPU.
local_irq_disable(): disable interrupts on the local CPU.
local_irq_save(flags): disable interrupts on the local CPU and save the previous interrupt state in flags.
local_irq_restore(flags): restore the interrupt enable state from flags.
void enable_irq(unsigned int irq): enable the interrupt specified by irq.
void disable_irq(unsigned int irq): disable the interrupt specified by irq synchronously, i.e. the interrupt is disabled only after all handlers for irq have finished. Obviously it must not be called from an interrupt handler.
void disable_irq_nosync(unsigned int irq): disable the interrupt specified by irq immediately.
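A brief hedged sketch of the local_irq_save()/local_irq_restore() pair (packet_count and update_stats are made-up names used only for illustration):

#include <linux/irqflags.h>

static unsigned long packet_count;

/* Briefly disable interrupts on the local CPU around a short critical
 * section that must not race with an interrupt handler. */
static void update_stats(void)
{
    unsigned long flags;

    local_irq_save(flags);      /* disable local IRQs, save previous state */
    packet_count++;             /* short, non-sleeping critical section */
    local_irq_restore(flags);   /* restore the saved interrupt state */
}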
With these interrupt API functions in hand, we can now add an interrupt handler to our driver. In the following example the virtual serial port shares an interrupt with the Ethernet controller. To obtain the Ethernet controller's IRQ number, look at the interrupt list under /proc; the 170 listed there for eth0 is the IRQ number used by the Ethernet controller.
In addition, the FS4412 target board uses a DM9000 Ethernet controller, and its datasheet shows that the interrupt is high-level triggered. With this information, the interrupt-related code added to the virtual serial port driver is as follows.
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/kfifo.h>#include <linux/ioctl.h>
#include <linux/uaccess.h>#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/poll.h>
#include <linux/aio.h>#include <linux/interrupt.h>
#include <linux/random.h>#include "vser.h"#define VSER_MAJOR 256
#define VSER_MINOR 0
#define VSER_DEV_CNT 1
#define VSER_DEV_NAME "vser"struct vser_dev {unsigned int baud;struct option opt;struct cdev cdev;wait_queue_head_t rwqh;wait_queue_head_t wwqh;struct fasync_struct *fapp;
};DEFINE_KFIFO(vsfifo, char, 32);
static struct vser_dev vsdev;static int vser_fasync(int fd, struct file *filp, int on);static int vser_open(struct inode *inode, struct file *filp)
{return 0;
}static int vser_release(struct inode *inode, struct file *filp)
{vser_fasync(-1, filp, 0);return 0;
}static ssize_t vser_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{int ret;unsigned int copied = 0;if (kfifo_is_empty(&vsfifo)) {if (filp->f_flags & O_NONBLOCK)return -EAGAIN;if (wait_event_interruptible_exclusive(vsdev.rwqh, !kfifo_is_empty(&vsfifo)))return -ERESTARTSYS;}ret = kfifo_to_user(&vsfifo, buf, count, &copied);if (!kfifo_is_full(&vsfifo)) {wake_up_interruptible(&vsdev.wwqh);kill_fasync(&vsdev.fapp, SIGIO, POLL_OUT);}return ret == 0 ? copied : ret;
}static ssize_t vser_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{int ret;unsigned int copied = 0;if (kfifo_is_full(&vsfifo)) {if (filp->f_flags & O_NONBLOCK)return -EAGAIN;if (wait_event_interruptible_exclusive(vsdev.wwqh, !kfifo_is_full(&vsfifo)))return -ERESTARTSYS;}ret = kfifo_from_user(&vsfifo, buf, count, &copied);if (!kfifo_is_empty(&vsfifo)) {wake_up_interruptible(&vsdev.rwqh);kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);}return ret == 0 ? copied : ret;
}static long vser_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{if (_IOC_TYPE(cmd) != VS_MAGIC)return -ENOTTY;switch (cmd) {case VS_SET_BAUD:vsdev.baud = arg;break;case VS_GET_BAUD:arg = vsdev.baud;break;case VS_SET_FFMT:if (copy_from_user(&vsdev.opt, (struct option __user *)arg, sizeof(struct option)))return -EFAULT;break;case VS_GET_FFMT:if (copy_to_user((struct option __user *)arg, &vsdev.opt, sizeof(struct option)))return -EFAULT;break;default:return -ENOTTY;}return 0;
}static unsigned int vser_poll(struct file *filp, struct poll_table_struct *p)
{int mask = 0;poll_wait(filp, &vsdev.rwqh, p);poll_wait(filp, &vsdev.wwqh, p);if (!kfifo_is_empty(&vsfifo))mask |= POLLIN | POLLRDNORM;if (!kfifo_is_full(&vsfifo))mask |= POLLOUT | POLLWRNORM;return mask;
}static ssize_t vser_aio_read(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{size_t read = 0;unsigned long i;ssize_t ret;for (i = 0; i < nr_segs; i++) {ret = vser_read(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);if (ret < 0)break;read += ret;}return read ? read : -EFAULT;
}static ssize_t vser_aio_write(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{size_t written = 0;unsigned long i;ssize_t ret;for (i = 0; i < nr_segs; i++) {ret = vser_write(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);if (ret < 0)break;written += ret;}return written ? written : -EFAULT;
}static int vser_fasync(int fd, struct file *filp, int on)
{return fasync_helper(fd, filp, on, &vsdev.fapp);
}static irqreturn_t vser_handler(int irq, void *dev_id)
{char data;get_random_bytes(&data, sizeof(data));data %= 26;data += 'A';if (!kfifo_is_full(&vsfifo))if(!kfifo_in(&vsfifo, &data, sizeof(data)))printk(KERN_ERR "vser: kfifo_in failure\n");if (!kfifo_is_empty(&vsfifo)) {wake_up_interruptible(&vsdev.rwqh);kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);}return IRQ_HANDLED;
}static struct file_operations vser_ops = {.owner = THIS_MODULE,.open = vser_open,.release = vser_release,.read = vser_read,.write = vser_write,.unlocked_ioctl = vser_ioctl,.poll = vser_poll,.aio_read = vser_aio_read,.aio_write = vser_aio_write,.fasync = vser_fasync,
};static int __init vser_init(void)
{int ret;dev_t dev;dev = MKDEV(VSER_MAJOR, VSER_MINOR);ret = register_chrdev_region(dev, VSER_DEV_CNT, VSER_DEV_NAME);if (ret)goto reg_err;cdev_init(&vsdev.cdev, &vser_ops);vsdev.cdev.owner = THIS_MODULE;vsdev.baud = 115200;vsdev.opt.datab = 8;vsdev.opt.parity = 0;vsdev.opt.stopb = 1;ret = cdev_add(&vsdev.cdev, dev, VSER_DEV_CNT);if (ret)goto add_err;init_waitqueue_head(&vsdev.rwqh);init_waitqueue_head(&vsdev.wwqh);ret = request_irq(167, vser_handler, IRQF_TRIGGER_HIGH | IRQF_SHARED, "vser", &vsdev);if (ret)goto irq_err;return 0;irq_err:cdev_del(&vsdev.cdev);
add_err:unregister_chrdev_region(dev, VSER_DEV_CNT);
reg_err:return ret;
}static void __exit vser_exit(void)
{dev_t dev;dev = MKDEV(VSER_MAJOR, VSER_MINOR);free_irq(167, &vsdev);cdev_del(&vsdev.cdev);unregister_chrdev_region(dev, VSER_DEV_CNT);
}module_init(vser_init);
module_exit(vser_exit);MODULE_LICENSE("GPL");
MODULE_AUTHOR("name <e-mail>");
MODULE_DESCRIPTION("A simple character device driver");
MODULE_ALIAS("virtual-serial");
The IRQ number in the program needs to be changed accordingly.
Line 235 of the code registers the interrupt handler vser_handler with request_irq. Because the interrupt is shared and high-level triggered, the flags argument is set to IRQF_TRIGGER_HIGH | IRQF_SHARED. A shared interrupt must also set the last argument; usually a pointer to the structure object representing the device is passed, which in this example is &vsdev.
Line 255 shows that the interrupt must be unregistered when the module is unloaded; the IRQ number and &vsdev are passed.
Lines 179 to 196 implement the interrupt handler. Lines 183 to 188 generate a random uppercase letter and put it into the FIFO, using two new functions, get_random_bytes and kfifo_in; they are simple enough that their usage is clear from the code, so they are not explained in detail here.
Lines 190 to 193 wake up the blocked processes and send the signal, respectively. Line 195 returns IRQ_HANDLED, indicating that the interrupt was handled normally.
Note that the handler of a shared interrupt should read the hardware's interrupt status flag to determine whether the interrupt was produced by its own device; if not, it should do no handling and return IRQ_NONE. Because this example is a virtual device with no such flag, that check is omitted.
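For a real device on a shared line, the check described above might look roughly like the following sketch; struct chip_dev, chip_read_status(), chip_ack_irq() and CHIP_IRQ_PENDING are hypothetical stand-ins for access to the device's own status register:

#include <linux/interrupt.h>
#include <linux/io.h>

#define CHIP_IRQ_PENDING    0x01        /* hypothetical status bit */

struct chip_dev {
    void __iomem *regs;                 /* hypothetical register base */
};

/* Hypothetical accessors: a real driver reads/clears its own interrupt
 * status register here (the 0x04 offset is a placeholder). */
static u32 chip_read_status(struct chip_dev *dev)
{
    return readl(dev->regs + 0x04);
}

static void chip_ack_irq(struct chip_dev *dev)
{
    writel(CHIP_IRQ_PENDING, dev->regs + 0x04);
}

static irqreturn_t chip_handler(int irq, void *dev_id)
{
    struct chip_dev *dev = dev_id;

    if (!(chip_read_status(dev) & CHIP_IRQ_PENDING))
        return IRQ_NONE;                /* not our device: let other handlers run */

    chip_ack_irq(dev);                  /* acknowledge the interrupt quickly */
    return IRQ_HANDLED;
}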
Finally, an interrupt handler should complete quickly and must not take too long. After an ARM processor enters an interrupt, the corresponding interrupts are masked (an IRQ masks further IRQs; an FIQ masks both IRQs and FIQs; the kernel does not use FIQ), and the subsequent code does not re-enable them, so interrupts remain disabled during the whole interrupt handling. If the handler runs too long, other interrupts are left pending, which seriously affects their response. One rule must be kept in mind: an interrupt handler must never invoke the scheduler, that is, it must never call any function that may cause a process switch (once the handler were switched out it could not be scheduled again); this is a strict restriction the kernel places on interrupt handlers. Of the functions we have met so far, those that may cause a process switch are kfifo_to_user, kfifo_from_user, copy_from_user, copy_to_user and the wait_event_xxx family; such functions will be pointed out again when they appear later. In addition, calling disable_irq inside an interrupt handler may deadlock.
The commands to build and test the driver are as follows.
make ARCH=arm
./lazy
Interrupt Bottom Halves
At the end of the previous section we noted that an interrupt handler should finish as quickly as possible, otherwise it delays the response to other interrupts and hurts overall system performance. Sometimes, however, time-consuming work cannot be avoided. Take a network card as an example: when it receives data it raises an interrupt, and the handler must copy the received data out of the card's buffer, check the packets thoroughly (frame format errors, checksum errors and so on), then unpack them according to the protocol and hand them up to the higher layers. If the card receives new data during this processing and raises another interrupt, that interrupt is left pending because the previous one is still being handled, and the newly received data is not processed in time. Since the card's buffer is limited, more incoming packets will eventually overflow it and packets will be dropped. To solve this problem, Linux splits interrupt handling into two parts: the top half and the bottom half. The top half does the urgent work that can be completed quickly, and the bottom half does the non-urgent, time-consuming work. For a network card, copying data from the card's buffer into memory is urgent but quick, so it belongs in the top half; checking and unpacking the packets is not urgent but takes time, so it can be done in the bottom half. While the bottom half runs, interrupts are re-enabled, so if a new hardware interrupt arrives, execution of the bottom half is stopped and the top half of that hardware interrupt runs first.
Softirqs
Although the bottom half may be deferred, we still want it to run as soon as possible. So when can it be executed? Certainly only after the top half has finished. To see this process more clearly, that part of the code is quoted again below.
/* arch/arm/kernel/irq.c */
/** linux/arch/arm/kernel/irq.c** Copyright (C) 1992 Linus Torvalds* Modifications for ARM processor Copyright (C) 1995-2000 Russell King.** Support for Dynamic Tick Timer Copyright (C) 2004-2005 Nokia Corporation.* Dynamic Tick Timer written by Tony Lindgren <tony@atomide.com> and* Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>.** This program is free software; you can redistribute it and/or modify* it under the terms of the GNU General Public License version 2 as* published by the Free Software Foundation.** This file contains the code used by various IRQ handling routines:* asking for different IRQ's should be done through these routines* instead of just grabbing them. Thus setups with different IRQ numbers* shouldn't result in any weird surprises, and installing new handlers* should be easier.** IRQ's are in fact implemented a bit like signal handlers for the kernel.* Naturally it's not a 1:1 relation, but there are similarities.*/
#include <linux/kernel_stat.h>
#include <linux/signal.h>
#include <linux/ioport.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/irqchip.h>
#include <linux/random.h>
#include <linux/smp.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/kallsyms.h>
#include <linux/proc_fs.h>
#include <linux/export.h>#include <asm/exception.h>
#include <asm/mach/arch.h>
#include <asm/mach/irq.h>
#include <asm/mach/time.h>unsigned long irq_err_count;int arch_show_interrupts(struct seq_file *p, int prec)
{
#ifdef CONFIG_FIQshow_fiq_list(p, prec);
#endif
#ifdef CONFIG_SMPshow_ipi_list(p, prec);
#endifseq_printf(p, "%*s: %10lu\n", prec, "Err", irq_err_count);return 0;
}/** handle_IRQ handles all hardware IRQ's. Decoded IRQs should* not come via this function. Instead, they should provide their* own 'handler'. Used by platform code implementing C-based 1st* level decoding.*/
void handle_IRQ(unsigned int irq, struct pt_regs *regs)
{struct pt_regs *old_regs = set_irq_regs(regs);irq_enter();/** Some hardware gives randomly wrong interrupts. Rather* than crashing, do something sensible.*/if (unlikely(irq >= nr_irqs)) {if (printk_ratelimit())printk(KERN_WARNING "Bad IRQ%u\n", irq);ack_bad_irq(irq);} else {generic_handle_irq(irq);}irq_exit();set_irq_regs(old_regs);
}/** asm_do_IRQ is the interface to be used from assembly code.*/
asmlinkage void __exception_irq_entry
asm_do_IRQ(unsigned int irq, struct pt_regs *regs)
{handle_IRQ(irq, regs);
}void set_irq_flags(unsigned int irq, unsigned int iflags)
{unsigned long clr = 0, set = IRQ_NOREQUEST | IRQ_NOPROBE | IRQ_NOAUTOEN;if (irq >= nr_irqs) {printk(KERN_ERR "Trying to set irq flags for IRQ%d\n", irq);return;}if (iflags & IRQF_VALID)clr |= IRQ_NOREQUEST;if (iflags & IRQF_PROBE)clr |= IRQ_NOPROBE;if (!(iflags & IRQF_NOAUTOEN))clr |= IRQ_NOAUTOEN;/* Order is clear bits in "clr" then set bits in "set" */irq_modify_status(irq, clr, set & ~clr);
}
EXPORT_SYMBOL_GPL(set_irq_flags);void __init init_IRQ(void)
{if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq)irqchip_init();elsemachine_desc->init_irq();
}#ifdef CONFIG_MULTI_IRQ_HANDLER
void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{if (handle_arch_irq)return;handle_arch_irq = handle_irq;
}
#endif#ifdef CONFIG_SPARSE_IRQ
int __init arch_probe_nr_irqs(void)
{nr_irqs = machine_desc->nr_irqs ? machine_desc->nr_irqs : NR_IRQS;return nr_irqs;
}
#endif#ifdef CONFIG_HOTPLUG_CPUstatic bool migrate_one_irq(struct irq_desc *desc)
{struct irq_data *d = irq_desc_get_irq_data(desc);const struct cpumask *affinity = d->affinity;struct irq_chip *c;bool ret = false;/** If this is a per-CPU interrupt, or the affinity does not* include this CPU, then we have nothing to do.*/if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))return false;if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {affinity = cpu_online_mask;ret = true;}c = irq_data_get_irq_chip(d);if (!c->irq_set_affinity)pr_debug("IRQ%u: unable to set affinity\n", d->irq);else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)cpumask_copy(d->affinity, affinity);return ret;
}/** The current CPU has been marked offline. Migrate IRQs off this CPU.* If the affinity settings do not allow other CPUs, force them onto any* available CPU.** Note: we must iterate over all IRQs, whether they have an attached* action structure or not, as we need to get chained interrupts too.*/
void migrate_irqs(void)
{unsigned int i;struct irq_desc *desc;unsigned long flags;local_irq_save(flags);for_each_irq_desc(i, desc) {bool affinity_broken;raw_spin_lock(&desc->lock);affinity_broken = migrate_one_irq(desc);raw_spin_unlock(&desc->lock);if (affinity_broken && printk_ratelimit())pr_warning("IRQ%u no longer affine to CPU%u\n", i,smp_processor_id());}local_irq_restore(flags);
}
#endif /* CONFIG_HOTPLUG_CPU */
From the earlier analysis we know that interrupt handling calls the handle_IRQ function listed above. It first calls irq_enter, which adds HARDIRQ_OFFSET to the interrupted process's preemption counter preempt_count (mainly to prevent kernel preemption and to serve as a test for whether we are in a hardware interrupt). When the top half has been handled, i.e. after generic_handle_irq returns, irq_exit is called; its code is as follows.
/* arch/arm/kernel/irq.c */
See the source listed above.
In this function HARDIRQ_OFFSET is subtracted from preempt_count again; if no interrupt nesting has occurred, the hardware-interrupt count in preempt_count is now 0, which means the top half has finished. Next, line 387 of the code calls in_interrupt and local_softirq_pending to check whether the bottom half may be executed and whether any bottom half is pending (in_interrupt mainly tests the relevant fields of preempt_count, and 0 means the conditions for executing the bottom half are met). If the conditions are satisfied, invoke_softirq is called to execute the bottom half immediately. So the earliest moment a bottom half can run is after the top half has finished but before the interrupt has fully returned. In the 3.14.25 kernel configuration used for the FS4412 target board, invoke_softirq calls do_softirq_own_stack, which in turn calls __do_softirq, the core processing function of the bottom half. Before looking at that function, let us first get to know the softirq.
The softirq is one of the bottom-half mechanisms. A softirq is described by struct softirq_action, whose definition is very simple: it just embeds a function pointer. The kernel defines NR_SOFTIRQS (currently 10) struct softirq_action objects, which are stored in an array named softirq_vec; an object's index in the array is that softirq's number. The kernel keeps an integer variable that records which softirqs need to run: to run softirq number 1, for example, bit 1 of that variable is set. When the kernel detects in the relevant code that this bit is set, it uses the bit position to index the softirq_vec array and calls the function pointed to by the action member of the softirq_action object. The softirq numbers currently defined in the kernel are as follows.
/* include/linux/interrupt.h */
/* interrupt.h */
#ifndef _LINUX_INTERRUPT_H
#define _LINUX_INTERRUPT_H#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/bitops.h>
#include <linux/preempt.h>
#include <linux/cpumask.h>
#include <linux/irqreturn.h>
#include <linux/irqnr.h>
#include <linux/hardirq.h>
#include <linux/irqflags.h>
#include <linux/hrtimer.h>
#include <linux/kref.h>
#include <linux/workqueue.h>#include <linux/atomic.h>
#include <asm/ptrace.h>
#include <asm/irq.h>/** These correspond to the IORESOURCE_IRQ_* defines in* linux/ioport.h to select the interrupt line behaviour. When* requesting an interrupt without specifying a IRQF_TRIGGER, the* setting should be assumed to be "as already configured", which* may be as per machine or firmware initialisation.*/
#define IRQF_TRIGGER_NONE 0x00000000
#define IRQF_TRIGGER_RISING 0x00000001
#define IRQF_TRIGGER_FALLING 0x00000002
#define IRQF_TRIGGER_HIGH 0x00000004
#define IRQF_TRIGGER_LOW 0x00000008
#define IRQF_TRIGGER_MASK (IRQF_TRIGGER_HIGH | IRQF_TRIGGER_LOW | \IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING)
#define IRQF_TRIGGER_PROBE 0x00000010/** These flags used only by the kernel as part of the* irq handling routines.** IRQF_DISABLED - keep irqs disabled when calling the action handler.* DEPRECATED. This flag is a NOOP and scheduled to be removed* IRQF_SHARED - allow sharing the irq among several devices* IRQF_PROBE_SHARED - set by callers when they expect sharing mismatches to occur* IRQF_TIMER - Flag to mark this interrupt as timer interrupt* IRQF_PERCPU - Interrupt is per cpu* IRQF_NOBALANCING - Flag to exclude this interrupt from irq balancing* IRQF_IRQPOLL - Interrupt is used for polling (only the interrupt that is* registered first in an shared interrupt is considered for* performance reasons)* IRQF_ONESHOT - Interrupt is not reenabled after the hardirq handler finished.* Used by threaded interrupts which need to keep the* irq line disabled until the threaded handler has been run.* IRQF_NO_SUSPEND - Do not disable this IRQ during suspend* IRQF_FORCE_RESUME - Force enable it on resume even if IRQF_NO_SUSPEND is set* IRQF_NO_THREAD - Interrupt cannot be threaded* IRQF_EARLY_RESUME - Resume IRQ early during syscore instead of at device* resume time.*/
#define IRQF_DISABLED 0x00000020
#define IRQF_SHARED 0x00000080
#define IRQF_PROBE_SHARED 0x00000100
#define __IRQF_TIMER 0x00000200
#define IRQF_PERCPU 0x00000400
#define IRQF_NOBALANCING 0x00000800
#define IRQF_IRQPOLL 0x00001000
#define IRQF_ONESHOT 0x00002000
#define IRQF_NO_SUSPEND 0x00004000
#define IRQF_FORCE_RESUME 0x00008000
#define IRQF_NO_THREAD 0x00010000
#define IRQF_EARLY_RESUME 0x00020000#define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)/** These values can be returned by request_any_context_irq() and* describe the context the interrupt will be run in.** IRQC_IS_HARDIRQ - interrupt runs in hardirq context* IRQC_IS_NESTED - interrupt runs in a nested threaded context*/
enum {IRQC_IS_HARDIRQ = 0,IRQC_IS_NESTED,
};typedef irqreturn_t (*irq_handler_t)(int, void *);/*** struct irqaction - per interrupt action descriptor* @handler: interrupt handler function* @name: name of the device* @dev_id: cookie to identify the device* @percpu_dev_id: cookie to identify the device* @next: pointer to the next irqaction for shared interrupts* @irq: interrupt number* @flags: flags (see IRQF_* above)* @thread_fn: interrupt handler function for threaded interrupts* @thread: thread pointer for threaded interrupts* @thread_flags: flags related to @thread* @thread_mask: bitmask for keeping track of @thread activity* @dir: pointer to the proc/irq/NN/name entry*/
struct irqaction {irq_handler_t handler;void *dev_id;void __percpu *percpu_dev_id;struct irqaction *next;irq_handler_t thread_fn;struct task_struct *thread;unsigned int irq;unsigned int flags;unsigned long thread_flags;unsigned long thread_mask;const char *name;struct proc_dir_entry *dir;
} ____cacheline_internodealigned_in_smp;extern irqreturn_t no_action(int cpl, void *dev_id);extern int __must_check
request_threaded_irq(unsigned int irq, irq_handler_t handler,irq_handler_t thread_fn,unsigned long flags, const char *name, void *dev);static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,const char *name, void *dev)
{return request_threaded_irq(irq, handler, NULL, flags, name, dev);
}extern int __must_check
request_any_context_irq(unsigned int irq, irq_handler_t handler,unsigned long flags, const char *name, void *dev_id);extern int __must_check
request_percpu_irq(unsigned int irq, irq_handler_t handler,const char *devname, void __percpu *percpu_dev_id);extern void free_irq(unsigned int, void *);
extern void free_percpu_irq(unsigned int, void __percpu *);struct device;extern int __must_check
devm_request_threaded_irq(struct device *dev, unsigned int irq,irq_handler_t handler, irq_handler_t thread_fn,unsigned long irqflags, const char *devname,void *dev_id);static inline int __must_check
devm_request_irq(struct device *dev, unsigned int irq, irq_handler_t handler,unsigned long irqflags, const char *devname, void *dev_id)
{return devm_request_threaded_irq(dev, irq, handler, NULL, irqflags,devname, dev_id);
}extern int __must_check
devm_request_any_context_irq(struct device *dev, unsigned int irq,irq_handler_t handler, unsigned long irqflags,const char *devname, void *dev_id);extern void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id);/** On lockdep we dont want to enable hardirqs in hardirq* context. Use local_irq_enable_in_hardirq() to annotate* kernel code that has to do this nevertheless (pretty much* the only valid case is for old/broken hardware that is* insanely slow).** NOTE: in theory this might break fragile code that relies* on hardirq delivery - in practice we dont seem to have such* places left. So the only effect should be slightly increased* irqs-off latencies.*/
#ifdef CONFIG_LOCKDEP
# define local_irq_enable_in_hardirq() do { } while (0)
#else
# define local_irq_enable_in_hardirq() local_irq_enable()
#endifextern void disable_irq_nosync(unsigned int irq);
extern void disable_irq(unsigned int irq);
extern void disable_percpu_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);
extern void enable_percpu_irq(unsigned int irq, unsigned int type);/* The following three functions are for the core kernel use only. */
extern void suspend_device_irqs(void);
extern void resume_device_irqs(void);
#ifdef CONFIG_PM_SLEEP
extern int check_wakeup_irqs(void);
#else
static inline int check_wakeup_irqs(void) { return 0; }
#endif#if defined(CONFIG_SMP)extern cpumask_var_t irq_default_affinity;extern int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask);
extern int irq_can_set_affinity(unsigned int irq);
extern int irq_select_affinity(unsigned int irq);extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);/*** struct irq_affinity_notify - context for notification of IRQ affinity changes* @irq: Interrupt to which notification applies* @kref: Reference count, for internal use* @work: Work item, for internal use* @notify: Function to be called on change. This will be* called in process context.* @release: Function to be called on release. This will be* called in process context. Once registered, the* structure must only be freed when this function is* called or later.*/
struct irq_affinity_notify {unsigned int irq;struct kref kref;struct work_struct work;void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);void (*release)(struct kref *ref);
};extern int
irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);#else /* CONFIG_SMP */static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
{return -EINVAL;
}static inline int irq_can_set_affinity(unsigned int irq)
{return 0;
}static inline int irq_select_affinity(unsigned int irq) { return 0; }static inline int irq_set_affinity_hint(unsigned int irq,const struct cpumask *m)
{return -EINVAL;
}
#endif /* CONFIG_SMP *//** Special lockdep variants of irq disabling/enabling.* These should be used for locking constructs that* know that a particular irq context which is disabled,* and which is the only irq-context user of a lock,* that it's safe to take the lock in the irq-disabled* section without disabling hardirqs.** On !CONFIG_LOCKDEP they are equivalent to the normal* irq disable/enable methods.*/
static inline void disable_irq_nosync_lockdep(unsigned int irq)
{disable_irq_nosync(irq);
#ifdef CONFIG_LOCKDEPlocal_irq_disable();
#endif
}static inline void disable_irq_nosync_lockdep_irqsave(unsigned int irq, unsigned long *flags)
{disable_irq_nosync(irq);
#ifdef CONFIG_LOCKDEPlocal_irq_save(*flags);
#endif
}static inline void disable_irq_lockdep(unsigned int irq)
{disable_irq(irq);
#ifdef CONFIG_LOCKDEPlocal_irq_disable();
#endif
}static inline void enable_irq_lockdep(unsigned int irq)
{
#ifdef CONFIG_LOCKDEPlocal_irq_enable();
#endifenable_irq(irq);
}static inline void enable_irq_lockdep_irqrestore(unsigned int irq, unsigned long *flags)
{
#ifdef CONFIG_LOCKDEPlocal_irq_restore(*flags);
#endifenable_irq(irq);
}/* IRQ wakeup (PM) control: */
extern int irq_set_irq_wake(unsigned int irq, unsigned int on);static inline int enable_irq_wake(unsigned int irq)
{return irq_set_irq_wake(irq, 1);
}static inline int disable_irq_wake(unsigned int irq)
{return irq_set_irq_wake(irq, 0);
}#ifdef CONFIG_IRQ_FORCED_THREADING
extern bool force_irqthreads;
#else
#define force_irqthreads (0)
#endif#ifndef __ARCH_SET_SOFTIRQ_PENDING
#define set_softirq_pending(x) (local_softirq_pending() = (x))
#define or_softirq_pending(x) (local_softirq_pending() |= (x))
#endif/* Some architectures might implement lazy enabling/disabling of* interrupts. In some cases, such as stop_machine, we might want* to ensure that after a local_irq_disable(), interrupts have* really been disabled in hardware. Such architectures need to* implement the following hook.*/
#ifndef hard_irq_disable
#define hard_irq_disable() do { } while(0)
#endif/* PLEASE, avoid to allocate new softirqs, if you need not _really_ highfrequency threaded job scheduling. For almost all the purposestasklets are more than enough. F.e. all serial device BHs etal. should be converted to tasklets, not to softirqs.*/enum
{HI_SOFTIRQ=0,TIMER_SOFTIRQ,NET_TX_SOFTIRQ,NET_RX_SOFTIRQ,BLOCK_SOFTIRQ,BLOCK_IOPOLL_SOFTIRQ,TASKLET_SOFTIRQ,SCHED_SOFTIRQ,HRTIMER_SOFTIRQ,RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */NR_SOFTIRQS
};#define SOFTIRQ_STOP_IDLE_MASK (~(1 << RCU_SOFTIRQ))/* map softirq index to softirq name. update 'softirq_to_name' in* kernel/softirq.c when adding a new softirq.*/
extern const char * const softirq_to_name[NR_SOFTIRQS];/* softirq mask and active fields moved to irq_cpustat_t in* asm/hardirq.h to get better cache usage. KAO*/struct softirq_action
{void (*action)(struct softirq_action *);
};asmlinkage void do_softirq(void);
asmlinkage void __do_softirq(void);#ifdef __ARCH_HAS_DO_SOFTIRQ
void do_softirq_own_stack(void);
#else
static inline void do_softirq_own_stack(void)
{__do_softirq();
}
#endifextern void open_softirq(int nr, void (*action)(struct softirq_action *));
extern void softirq_init(void);
extern void __raise_softirq_irqoff(unsigned int nr);extern void raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq(unsigned int nr);DECLARE_PER_CPU(struct task_struct *, ksoftirqd);static inline struct task_struct *this_cpu_ksoftirqd(void)
{return this_cpu_read(ksoftirqd);
}/* Tasklets --- multithreaded analogue of BHs.Main feature differing them of generic softirqs: taskletis running only on one CPU simultaneously.Main feature differing them of BHs: different taskletsmay be run simultaneously on different CPUs.Properties:* If tasklet_schedule() is called, then tasklet is guaranteedto be executed on some cpu at least once after this.* If the tasklet is already scheduled, but its execution is still notstarted, it will be executed only once.* If this tasklet is already running on another CPU (or schedule is calledfrom tasklet itself), it is rescheduled for later.* Tasklet is strictly serialized wrt itself, but notwrt another tasklets. If client needs some intertask synchronization,he makes it with spinlocks.*/struct tasklet_struct
{struct tasklet_struct *next;unsigned long state;atomic_t count;void (*func)(unsigned long);unsigned long data;
};#define DECLARE_TASKLET(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }#define DECLARE_TASKLET_DISABLED(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }enum
{TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */TASKLET_STATE_RUN /* Tasklet is running (SMP only) */
};#ifdef CONFIG_SMP
static inline int tasklet_trylock(struct tasklet_struct *t)
{return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
}static inline void tasklet_unlock(struct tasklet_struct *t)
{smp_mb__before_clear_bit(); clear_bit(TASKLET_STATE_RUN, &(t)->state);
}static inline void tasklet_unlock_wait(struct tasklet_struct *t)
{while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
}
#else
#define tasklet_trylock(t) 1
#define tasklet_unlock_wait(t) do { } while (0)
#define tasklet_unlock(t) do { } while (0)
#endifextern void __tasklet_schedule(struct tasklet_struct *t);static inline void tasklet_schedule(struct tasklet_struct *t)
{if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))__tasklet_schedule(t);
}extern void __tasklet_hi_schedule(struct tasklet_struct *t);static inline void tasklet_hi_schedule(struct tasklet_struct *t)
{if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))__tasklet_hi_schedule(t);
}extern void __tasklet_hi_schedule_first(struct tasklet_struct *t);/** This version avoids touching any other tasklets. Needed for kmemcheck* in order not to take any page faults while enqueueing this tasklet;* consider VERY carefully whether you really need this or* tasklet_hi_schedule()...*/
static inline void tasklet_hi_schedule_first(struct tasklet_struct *t)
{if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))__tasklet_hi_schedule_first(t);
}static inline void tasklet_disable_nosync(struct tasklet_struct *t)
{atomic_inc(&t->count);smp_mb__after_atomic_inc();
}static inline void tasklet_disable(struct tasklet_struct *t)
{tasklet_disable_nosync(t);tasklet_unlock_wait(t);smp_mb();
}static inline void tasklet_enable(struct tasklet_struct *t)
{smp_mb__before_atomic_dec();atomic_dec(&t->count);
}static inline void tasklet_hi_enable(struct tasklet_struct *t)
{smp_mb__before_atomic_dec();atomic_dec(&t->count);
}extern void tasklet_kill(struct tasklet_struct *t);
extern void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu);
extern void tasklet_init(struct tasklet_struct *t,void (*func)(unsigned long), unsigned long data);struct tasklet_hrtimer {struct hrtimer timer;struct tasklet_struct tasklet;enum hrtimer_restart (*function)(struct hrtimer *);
};extern void
tasklet_hrtimer_init(struct tasklet_hrtimer *ttimer,enum hrtimer_restart (*function)(struct hrtimer *),clockid_t which_clock, enum hrtimer_mode mode);static inline
int tasklet_hrtimer_start(struct tasklet_hrtimer *ttimer, ktime_t time,const enum hrtimer_mode mode)
{return hrtimer_start(&ttimer->timer, time, mode);
}static inline
void tasklet_hrtimer_cancel(struct tasklet_hrtimer *ttimer)
{hrtimer_cancel(&ttimer->timer);tasklet_kill(&ttimer->tasklet);
}/** Autoprobing for irqs:** probe_irq_on() and probe_irq_off() provide robust primitives* for accurate IRQ probing during kernel initialization. They are* reasonably simple to use, are not "fooled" by spurious interrupts,* and, unlike other attempts at IRQ probing, they do not get hung on* stuck interrupts (such as unused PS2 mouse interfaces on ASUS boards).** For reasonably foolproof probing, use them as follows:** 1. clear and/or mask the device's internal interrupt.* 2. sti();* 3. irqs = probe_irq_on(); // "take over" all unassigned idle IRQs* 4. enable the device and cause it to trigger an interrupt.* 5. wait for the device to interrupt, using non-intrusive polling or a delay.* 6. irq = probe_irq_off(irqs); // get IRQ number, 0=none, negative=multiple* 7. service the device to clear its pending interrupt.* 8. loop again if paranoia is required.** probe_irq_on() returns a mask of allocated irq's.** probe_irq_off() takes the mask as a parameter,* and returns the irq number which occurred,* or zero if none occurred, or a negative irq number* if more than one irq occurred.*/#if !defined(CONFIG_GENERIC_IRQ_PROBE)
static inline unsigned long probe_irq_on(void)
{return 0;
}
static inline int probe_irq_off(unsigned long val)
{return 0;
}
static inline unsigned int probe_irq_mask(unsigned long val)
{return 0;
}
#else
extern unsigned long probe_irq_on(void); /* returns 0 on failure */
extern int probe_irq_off(unsigned long); /* returns 0 or negative on failure */
extern unsigned int probe_irq_mask(unsigned long); /* returns mask of ISA interrupts */
#endif#ifdef CONFIG_PROC_FS
/* Initialize /proc/irq/ */
extern void init_irq_proc(void);
#else
static inline void init_irq_proc(void)
{
}
#endifstruct seq_file;
int show_interrupts(struct seq_file *p, void *v);
int arch_show_interrupts(struct seq_file *p, int prec);extern int early_irq_init(void);
extern int arch_probe_nr_irqs(void);
extern int arch_early_irq_init(void);#endif
For example, NET_RX_SOFTIRQ is the softirq number corresponding to the network receive interrupt, and the softirq handler registered for that number is net_rx_action.
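To make the mechanism concrete, the hedged sketch below shows how built-in kernel code would attach an action to a softirq number and raise it. MY_SOFTIRQ is a hypothetical entry that would have to be added to the enum above; open_softirq is not exported to modules, and the kernel strongly discourages adding new softirqs, so drivers normally use tasklets or work queues instead.

#include <linux/interrupt.h>

/* Sketch only: MY_SOFTIRQ does not exist in the kernel's enum. */
static void my_softirq_action(struct softirq_action *a)
{
    /* deferred work runs here, in softirq (interrupt) context */
}

static void my_softirq_setup(void)
{
    /* register the action once, at kernel initialization time */
    open_softirq(MY_SOFTIRQ, my_softirq_action);
}

static void my_softirq_trigger(void)
{
    /* set the pending bit; __do_softirq() will call the action later */
    raise_softirq(MY_SOFTIRQ);
}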
Next let us look at the implementation of __do_softirq; the code is as follows.
/* kernel/softirq.c */
/** linux/kernel/softirq.c** Copyright (C) 1992 Linus Torvalds** Distribute under GPLv2.** Rewritten. Old one was good in 2.2, but in 2.3 it was immoral. --ANK (990903)*/#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt#include <linux/export.h>
#include <linux/kernel_stat.h>
#include <linux/interrupt.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/notifier.h>
#include <linux/percpu.h>
#include <linux/cpu.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/ftrace.h>
#include <linux/smp.h>
#include <linux/smpboot.h>
#include <linux/tick.h>#define CREATE_TRACE_POINTS
#include <trace/events/irq.h>/*- No shared variables, all the data are CPU local.- If a softirq needs serialization, let it serialize itselfby its own spinlocks.- Even if softirq is serialized, only local cpu is marked forexecution. Hence, we get something sort of weak cpu binding.Though it is still not clear, will it result in better localityor will not.Examples:- NET RX softirq. It is multithreaded and does not requireany global serialization.- NET TX softirq. It kicks software netdevice queues, henceit is logically serialized per device, but this serializationis invisible to common code.- Tasklets: serialized wrt itself.*/#ifndef __ARCH_IRQ_STAT
irq_cpustat_t irq_stat[NR_CPUS] ____cacheline_aligned;
EXPORT_SYMBOL(irq_stat);
#endifstatic struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;DEFINE_PER_CPU(struct task_struct *, ksoftirqd);const char * const softirq_to_name[NR_SOFTIRQS] = {"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL","TASKLET", "SCHED", "HRTIMER", "RCU"
};/** we cannot loop indefinitely here to avoid userspace starvation,* but we also don't want to introduce a worst case 1/HZ latency* to the pending events, so lets the scheduler to balance* the softirq load for us.*/
static void wakeup_softirqd(void)
{/* Interrupts are disabled: no need to stop preemption */struct task_struct *tsk = __this_cpu_read(ksoftirqd);if (tsk && tsk->state != TASK_RUNNING)wake_up_process(tsk);
}/** preempt_count and SOFTIRQ_OFFSET usage:* - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving* softirq processing.* - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)* on local_bh_disable or local_bh_enable.* This lets us distinguish between whether we are currently processing* softirq and whether we just have bh disabled.*//** This one is for softirq.c-internal use,* where hardirqs are disabled legitimately:*/
#ifdef CONFIG_TRACE_IRQFLAGS
void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
{unsigned long flags;WARN_ON_ONCE(in_irq());raw_local_irq_save(flags);/** The preempt tracer hooks into preempt_count_add and will break* lockdep because it calls back into lockdep after SOFTIRQ_OFFSET* is set and before current->softirq_enabled is cleared.* We must manually increment preempt_count here and manually* call the trace_preempt_off later.*/__preempt_count_add(cnt);/** Were softirqs turned off above:*/if (softirq_count() == (cnt & SOFTIRQ_MASK))trace_softirqs_off(ip);raw_local_irq_restore(flags);if (preempt_count() == cnt)trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
}
EXPORT_SYMBOL(__local_bh_disable_ip);
#endif /* CONFIG_TRACE_IRQFLAGS */static void __local_bh_enable(unsigned int cnt)
{WARN_ON_ONCE(!irqs_disabled());if (softirq_count() == (cnt & SOFTIRQ_MASK))trace_softirqs_on(_RET_IP_);preempt_count_sub(cnt);
}/** Special-case - softirqs can safely be enabled in* cond_resched_softirq(), or by __do_softirq(),* without processing still-pending softirqs:*/
void _local_bh_enable(void)
{WARN_ON_ONCE(in_irq());__local_bh_enable(SOFTIRQ_DISABLE_OFFSET);
}
EXPORT_SYMBOL(_local_bh_enable);void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
{WARN_ON_ONCE(in_irq() || irqs_disabled());
#ifdef CONFIG_TRACE_IRQFLAGSlocal_irq_disable();
#endif/** Are softirqs going to be turned on now:*/if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)trace_softirqs_on(ip);/** Keep preemption disabled until we are done with* softirq processing:*/preempt_count_sub(cnt - 1);if (unlikely(!in_interrupt() && local_softirq_pending())) {/** Run softirq if any pending. And do it in its own stack* as we may be calling this deep in a task call stack already.*/do_softirq();}preempt_count_dec();
#ifdef CONFIG_TRACE_IRQFLAGSlocal_irq_enable();
#endifpreempt_check_resched();
}
EXPORT_SYMBOL(__local_bh_enable_ip);/** We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,* but break the loop if need_resched() is set or after 2 ms.* The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in* certain cases, such as stop_machine(), jiffies may cease to* increment and so we need the MAX_SOFTIRQ_RESTART limit as* well to make sure we eventually return from this method.** These limits have been established via experimentation.* The two things to balance is latency against fairness -* we want to handle softirqs as soon as possible, but they* should not be able to lock up the box.*/
#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
#define MAX_SOFTIRQ_RESTART 10#ifdef CONFIG_TRACE_IRQFLAGS
/** When we run softirqs from irq_exit() and thus on the hardirq stack we need* to keep the lockdep irq context tracking as tight as possible in order to* not miss-qualify lock contexts and miss possible deadlocks.*/static inline bool lockdep_softirq_start(void)
{bool in_hardirq = false;if (trace_hardirq_context(current)) {in_hardirq = true;trace_hardirq_exit();}lockdep_softirq_enter();return in_hardirq;
}static inline void lockdep_softirq_end(bool in_hardirq)
{lockdep_softirq_exit();if (in_hardirq)trace_hardirq_enter();
}
#else
static inline bool lockdep_softirq_start(void) { return false; }
static inline void lockdep_softirq_end(bool in_hardirq) { }
#endifasmlinkage void __do_softirq(void)
{unsigned long end = jiffies + MAX_SOFTIRQ_TIME;unsigned long old_flags = current->flags;int max_restart = MAX_SOFTIRQ_RESTART;struct softirq_action *h;bool in_hardirq;__u32 pending;int softirq_bit;int cpu;/** Mask out PF_MEMALLOC s current task context is borrowed for the* softirq. A softirq handled such as network RX might set PF_MEMALLOC* again if the socket is related to swap*/current->flags &= ~PF_MEMALLOC;pending = local_softirq_pending();account_irq_enter_time(current);__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);in_hardirq = lockdep_softirq_start();cpu = smp_processor_id();
restart:/* Reset the pending bitmask before enabling irqs */set_softirq_pending(0);local_irq_enable();h = softirq_vec;while ((softirq_bit = ffs(pending))) {unsigned int vec_nr;int prev_count;h += softirq_bit - 1;vec_nr = h - softirq_vec;prev_count = preempt_count();kstat_incr_softirqs_this_cpu(vec_nr);trace_softirq_entry(vec_nr);h->action(h);trace_softirq_exit(vec_nr);if (unlikely(prev_count != preempt_count())) {pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",vec_nr, softirq_to_name[vec_nr], h->action,prev_count, preempt_count());preempt_count_set(prev_count);}rcu_bh_qs(cpu);h++;pending >>= softirq_bit;}local_irq_disable();pending = local_softirq_pending();if (pending) {if (time_before(jiffies, end) && !need_resched() &&--max_restart)goto restart;wakeup_softirqd();}lockdep_softirq_end(in_hardirq);account_irq_exit_time(current);__local_bh_enable(SOFTIRQ_OFFSET);WARN_ON_ONCE(in_interrupt());tsk_restore_flags(current, old_flags, PF_MEMALLOC);
}asmlinkage void do_softirq(void)
{__u32 pending;unsigned long flags;if (in_interrupt())return;local_irq_save(flags);pending = local_softirq_pending();if (pending)do_softirq_own_stack();local_irq_restore(flags);
}/** Enter an interrupt context.*/
void irq_enter(void)
{rcu_irq_enter();if (is_idle_task(current) && !in_interrupt()) {/** Prevent raise_softirq from needlessly waking up ksoftirqd* here, as softirq will be serviced on return from interrupt.*/local_bh_disable();tick_irq_enter();_local_bh_enable();}__irq_enter();
}static inline void invoke_softirq(void)
{if (!force_irqthreads) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK/** We can safely execute softirq on the current stack if* it is the irq stack, because it should be near empty* at this stage.*/__do_softirq();
#else/** Otherwise, irq_exit() is called on the task stack that can* be potentially deep already. So call softirq in its own stack* to prevent from any overrun.*/do_softirq_own_stack();
#endif} else {wakeup_softirqd();}
}static inline void tick_irq_exit(void)
{
#ifdef CONFIG_NO_HZ_COMMONint cpu = smp_processor_id();/* Make sure that timer wheel updates are propagated */if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {if (!in_interrupt())tick_nohz_irq_exit();}
#endif
}/** Exit an interrupt context. Process softirqs if needed and possible:*/
void irq_exit(void)
{
#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLEDlocal_irq_disable();
#elseWARN_ON_ONCE(!irqs_disabled());
#endifaccount_irq_exit_time(current);preempt_count_sub(HARDIRQ_OFFSET);if (!in_interrupt() && local_softirq_pending())invoke_softirq();tick_irq_exit();rcu_irq_exit();trace_hardirq_exit(); /* must be last! */
}/** This function must run with irqs disabled!*/
inline void raise_softirq_irqoff(unsigned int nr)
{__raise_softirq_irqoff(nr);/** If we're in an interrupt or softirq, we're done* (this also catches softirq-disabled code). We will* actually run the softirq once we return from* the irq or softirq.** Otherwise we wake up ksoftirqd to make sure we* schedule the softirq soon.*/if (!in_interrupt())wakeup_softirqd();
}void raise_softirq(unsigned int nr)
{unsigned long flags;local_irq_save(flags);raise_softirq_irqoff(nr);local_irq_restore(flags);
}void __raise_softirq_irqoff(unsigned int nr)
{trace_softirq_raise(nr);or_softirq_pending(1UL << nr);
}void open_softirq(int nr, void (*action)(struct softirq_action *))
{softirq_vec[nr].action = action;
}/** Tasklets*/
struct tasklet_head {struct tasklet_struct *head;struct tasklet_struct **tail;
};static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);void __tasklet_schedule(struct tasklet_struct *t)
{unsigned long flags;local_irq_save(flags);t->next = NULL;*__this_cpu_read(tasklet_vec.tail) = t;__this_cpu_write(tasklet_vec.tail, &(t->next));raise_softirq_irqoff(TASKLET_SOFTIRQ);local_irq_restore(flags);
}
EXPORT_SYMBOL(__tasklet_schedule);void __tasklet_hi_schedule(struct tasklet_struct *t)
{unsigned long flags;local_irq_save(flags);t->next = NULL;*__this_cpu_read(tasklet_hi_vec.tail) = t;__this_cpu_write(tasklet_hi_vec.tail, &(t->next));raise_softirq_irqoff(HI_SOFTIRQ);local_irq_restore(flags);
}
EXPORT_SYMBOL(__tasklet_hi_schedule);void __tasklet_hi_schedule_first(struct tasklet_struct *t)
{BUG_ON(!irqs_disabled());t->next = __this_cpu_read(tasklet_hi_vec.head);__this_cpu_write(tasklet_hi_vec.head, t);__raise_softirq_irqoff(HI_SOFTIRQ);
}
EXPORT_SYMBOL(__tasklet_hi_schedule_first);static void tasklet_action(struct softirq_action *a)
{struct tasklet_struct *list;local_irq_disable();list = __this_cpu_read(tasklet_vec.head);__this_cpu_write(tasklet_vec.head, NULL);__this_cpu_write(tasklet_vec.tail, &__get_cpu_var(tasklet_vec).head);local_irq_enable();while (list) {struct tasklet_struct *t = list;list = list->next;if (tasklet_trylock(t)) {if (!atomic_read(&t->count)) {if (!test_and_clear_bit(TASKLET_STATE_SCHED,&t->state))BUG();t->func(t->data);tasklet_unlock(t);continue;}tasklet_unlock(t);}local_irq_disable();t->next = NULL;*__this_cpu_read(tasklet_vec.tail) = t;__this_cpu_write(tasklet_vec.tail, &(t->next));__raise_softirq_irqoff(TASKLET_SOFTIRQ);local_irq_enable();}
}static void tasklet_hi_action(struct softirq_action *a)
{struct tasklet_struct *list;local_irq_disable();list = __this_cpu_read(tasklet_hi_vec.head);__this_cpu_write(tasklet_hi_vec.head, NULL);__this_cpu_write(tasklet_hi_vec.tail, &__get_cpu_var(tasklet_hi_vec).head);local_irq_enable();while (list) {struct tasklet_struct *t = list;list = list->next;if (tasklet_trylock(t)) {if (!atomic_read(&t->count)) {if (!test_and_clear_bit(TASKLET_STATE_SCHED,&t->state))BUG();t->func(t->data);tasklet_unlock(t);continue;}tasklet_unlock(t);}local_irq_disable();t->next = NULL;*__this_cpu_read(tasklet_hi_vec.tail) = t;__this_cpu_write(tasklet_hi_vec.tail, &(t->next));__raise_softirq_irqoff(HI_SOFTIRQ);local_irq_enable();}
}void tasklet_init(struct tasklet_struct *t,void (*func)(unsigned long), unsigned long data)
{t->next = NULL;t->state = 0;atomic_set(&t->count, 0);t->func = func;t->data = data;
}
EXPORT_SYMBOL(tasklet_init);void tasklet_kill(struct tasklet_struct *t)
{if (in_interrupt())pr_notice("Attempt to kill tasklet from interrupt\n");while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {do {yield();} while (test_bit(TASKLET_STATE_SCHED, &t->state));}tasklet_unlock_wait(t);clear_bit(TASKLET_STATE_SCHED, &t->state);
}
EXPORT_SYMBOL(tasklet_kill);/** tasklet_hrtimer*//** The trampoline is called when the hrtimer expires. It schedules a tasklet* to run __tasklet_hrtimer_trampoline() which in turn will call the intended* hrtimer callback, but from softirq context.*/
static enum hrtimer_restart __hrtimer_tasklet_trampoline(struct hrtimer *timer)
{struct tasklet_hrtimer *ttimer =container_of(timer, struct tasklet_hrtimer, timer);tasklet_hi_schedule(&ttimer->tasklet);return HRTIMER_NORESTART;
}/** Helper function which calls the hrtimer callback from* tasklet/softirq context*/
static void __tasklet_hrtimer_trampoline(unsigned long data)
{struct tasklet_hrtimer *ttimer = (void *)data;enum hrtimer_restart restart;restart = ttimer->function(&ttimer->timer);if (restart != HRTIMER_NORESTART)hrtimer_restart(&ttimer->timer);
}/*** tasklet_hrtimer_init - Init a tasklet/hrtimer combo for softirq callbacks* @ttimer: tasklet_hrtimer which is initialized* @function: hrtimer callback function which gets called from softirq context* @which_clock: clock id (CLOCK_MONOTONIC/CLOCK_REALTIME)* @mode: hrtimer mode (HRTIMER_MODE_ABS/HRTIMER_MODE_REL)*/
void tasklet_hrtimer_init(struct tasklet_hrtimer *ttimer,enum hrtimer_restart (*function)(struct hrtimer *),clockid_t which_clock, enum hrtimer_mode mode)
{hrtimer_init(&ttimer->timer, which_clock, mode);ttimer->timer.function = __hrtimer_tasklet_trampoline;tasklet_init(&ttimer->tasklet, __tasklet_hrtimer_trampoline,(unsigned long)ttimer);ttimer->function = function;
}
EXPORT_SYMBOL_GPL(tasklet_hrtimer_init);void __init softirq_init(void)
{int cpu;for_each_possible_cpu(cpu) {per_cpu(tasklet_vec, cpu).tail =&per_cpu(tasklet_vec, cpu).head;per_cpu(tasklet_hi_vec, cpu).tail =&per_cpu(tasklet_hi_vec, cpu).head;}open_softirq(TASKLET_SOFTIRQ, tasklet_action);open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}static int ksoftirqd_should_run(unsigned int cpu)
{return local_softirq_pending();
}static void run_ksoftirqd(unsigned int cpu)
{local_irq_disable();if (local_softirq_pending()) {/** We can safely run softirq on inline stack, as we are not deep* in the task stack here.*/__do_softirq();rcu_note_context_switch(cpu);local_irq_enable();cond_resched();return;}local_irq_enable();
}#ifdef CONFIG_HOTPLUG_CPU
/** tasklet_kill_immediate is called to remove a tasklet which can already be* scheduled for execution on @cpu.** Unlike tasklet_kill, this function removes the tasklet* _immediately_, even if the tasklet is in TASKLET_STATE_SCHED state.** When this function is called, @cpu must be in the CPU_DEAD state.*/
void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu)
{struct tasklet_struct **i;BUG_ON(cpu_online(cpu));BUG_ON(test_bit(TASKLET_STATE_RUN, &t->state));if (!test_bit(TASKLET_STATE_SCHED, &t->state))return;/* CPU is dead, so no lock needed. */for (i = &per_cpu(tasklet_vec, cpu).head; *i; i = &(*i)->next) {if (*i == t) {*i = t->next;/* If this was the tail element, move the tail ptr */if (*i == NULL)per_cpu(tasklet_vec, cpu).tail = i;return;}}BUG();
}static void takeover_tasklets(unsigned int cpu)
{/* CPU is dead, so no lock needed. */local_irq_disable();/* Find end, append list for that CPU. */if (&per_cpu(tasklet_vec, cpu).head != per_cpu(tasklet_vec, cpu).tail) {*__this_cpu_read(tasklet_vec.tail) = per_cpu(tasklet_vec, cpu).head;this_cpu_write(tasklet_vec.tail, per_cpu(tasklet_vec, cpu).tail);per_cpu(tasklet_vec, cpu).head = NULL;per_cpu(tasklet_vec, cpu).tail = &per_cpu(tasklet_vec, cpu).head;}raise_softirq_irqoff(TASKLET_SOFTIRQ);if (&per_cpu(tasklet_hi_vec, cpu).head != per_cpu(tasklet_hi_vec, cpu).tail) {*__this_cpu_read(tasklet_hi_vec.tail) = per_cpu(tasklet_hi_vec, cpu).head;__this_cpu_write(tasklet_hi_vec.tail, per_cpu(tasklet_hi_vec, cpu).tail);per_cpu(tasklet_hi_vec, cpu).head = NULL;per_cpu(tasklet_hi_vec, cpu).tail = &per_cpu(tasklet_hi_vec, cpu).head;}raise_softirq_irqoff(HI_SOFTIRQ);local_irq_enable();
}
#endif /* CONFIG_HOTPLUG_CPU */static int cpu_callback(struct notifier_block *nfb, unsigned long action,void *hcpu)
{switch (action) {
#ifdef CONFIG_HOTPLUG_CPUcase CPU_DEAD:case CPU_DEAD_FROZEN:takeover_tasklets((unsigned long)hcpu);break;
#endif /* CONFIG_HOTPLUG_CPU */}return NOTIFY_OK;
}static struct notifier_block cpu_nfb = {.notifier_call = cpu_callback
};static struct smp_hotplug_thread softirq_threads = {.store = &ksoftirqd,.thread_should_run = ksoftirqd_should_run,.thread_fn = run_ksoftirqd,.thread_comm = "ksoftirqd/%u",
};static __init int spawn_ksoftirqd(void)
{register_cpu_notifier(&cpu_nfb);BUG_ON(smpboot_register_percpu_thread(&softirq_threads));return 0;
}
early_initcall(spawn_ksoftirqd);/** [ These __weak aliases are kept in a separate compilation unit, so that* GCC does not inline them incorrectly. ]*/int __init __weak early_irq_init(void)
{return 0;
}int __init __weak arch_probe_nr_irqs(void)
{return NR_IRQS_LEGACY;
}int __init __weak arch_early_irq_init(void)
{return 0;
}
From the code above it is clear that __do_softirq first uses local_softirq_pending to read the value of the variable that records all pending softirqs, then calls local_irq_enable to re-enable interrupts (which means that new hardware interrupts can be serviced while softirqs are running). It then assigns the address of the softirq_vec array elements to h and, in the while loop, walks over the bits that are set, indexes the corresponding softirq_action object, and calls the function its action member points to.
Although softirqs can implement interrupt bottom halves, they are essentially predefined by the kernel developers, are normally used where performance requirements are extremely high, and require some kernel programming skill, so they are not very suitable for driver developers. The material above is mainly intended to show the reader how the bottom half is executed and, in particular, that interrupts are re-enabled while the bottom half runs, so new hardware interrupts can be serviced. It should also be noted that, besides running softirqs before the interrupt returns, the kernel creates a softirq kernel thread for each CPU; when softirqs need to run outside an interrupt handler, this kernel thread can be woken up, and it too ultimately calls the __do_softirq function shown above.
Finally, note that softirqs also run in interrupt context, so the restrictions on interrupt handlers apply to softirqs as well, only without the strict time limit.
tasklet
Although softirqs are usually designed by the kernel developers, they have reserved one softirq specifically for driver developers: TASKLET_SOFTIRQ, whose softirq handler is tasklet_action. The relevant code is shown below.
/* kernel/softirq.c */
The code is listed above.
During softirq processing, if the bit corresponding to TASKLET_SOFTIRQ is set then, following the earlier analysis, tasklet_action is called. At line 487 it first obtains this CPU's list of struct tasklet_struct objects, then walks the list, calling the function pointed to by each object's func member with the data member as the argument. struct tasklet_struct is defined as follows.
/* include/linux/interrupt.h */
See the source listed above.
Here next is the pointer that links the list, state is the tasklet's scheduling state (already scheduled or already running), count disables the tasklet when it is non-zero, func is the tasklet's bottom-half function, and data is the argument passed to that function. From the above, to implement a tasklet bottom half a driver developer must construct a struct tasklet_struct object, initialize its members, put it on the tasklet list of the corresponding CPU, and finally set the bit for softirq number TASKLET_SOFTIRQ. The kernel, however, already provides macros and functions that greatly simplify this; the commonly used ones are listed below.
/* include/linux/interrupt.h */
See the source listed above.
DECLARE_TASKLET statically defines a struct tasklet_struct object named name, with bottom-half function func and argument data; this tasklet can be executed. DECLARE_TASKLET_DISABLED is similar, except that the count member is initialized to 1, so the tasklet cannot run until tasklet_enable is called. tasklet_init is normally used to initialize a dynamically allocated struct tasklet_struct object. tasklet_schedule puts the specified struct tasklet_struct object on the tasklet list of the current CPU, and its bottom-half function will be scheduled at some point in the future.
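The driver listing below uses the static DECLARE_TASKLET form; as a complementary hedged sketch, the dynamic form with tasklet_init might look like this (struct my_dev, my_bh and the surrounding function names are made up for illustration):

#include <linux/interrupt.h>

struct my_dev {
    struct tasklet_struct tsk;          /* embedded, initialized at runtime */
};

static void my_bh(unsigned long data)
{
    struct my_dev *dev = (struct my_dev *)data;

    /* non-urgent bottom-half work; still interrupt context, so no sleeping */
    (void)dev;
}

static void my_dev_init_tasklet(struct my_dev *dev)
{
    tasklet_init(&dev->tsk, my_bh, (unsigned long)dev);
}

static irqreturn_t my_handler(int irq, void *dev_id)
{
    struct my_dev *dev = dev_id;

    tasklet_schedule(&dev->tsk);        /* defer the rest of the work to my_bh() */
    return IRQ_HANDLED;
}

static void my_dev_remove(struct my_dev *dev)
{
    tasklet_kill(&dev->tsk);            /* wait for any pending run to finish */
}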
The relevant code of the virtual serial port driver with a tasklet bottom half added is as follows.
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/kfifo.h>
#include <linux/ioctl.h>
#include <linux/uaccess.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/poll.h>
#include <linux/aio.h>
#include <linux/interrupt.h>
#include <linux/random.h>

#include "vser.h"

#define VSER_MAJOR	256
#define VSER_MINOR	0
#define VSER_DEV_CNT	1
#define VSER_DEV_NAME	"vser"

struct vser_dev {
	unsigned int baud;
	struct option opt;
	struct cdev cdev;
	wait_queue_head_t rwqh;
	wait_queue_head_t wwqh;
	struct fasync_struct *fapp;
};

DEFINE_KFIFO(vsfifo, char, 32);
static struct vser_dev vsdev;

static void vser_tsklet(unsigned long arg);
DECLARE_TASKLET(vstsklet, vser_tsklet, (unsigned long)&vsdev);

static int vser_fasync(int fd, struct file *filp, int on);

static int vser_open(struct inode *inode, struct file *filp)
{
	return 0;
}

static int vser_release(struct inode *inode, struct file *filp)
{
	vser_fasync(-1, filp, 0);
	return 0;
}

static ssize_t vser_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
	int ret;
	unsigned int copied = 0;

	if (kfifo_is_empty(&vsfifo)) {
		if (filp->f_flags & O_NONBLOCK)
			return -EAGAIN;
		if (wait_event_interruptible_exclusive(vsdev.rwqh, !kfifo_is_empty(&vsfifo)))
			return -ERESTARTSYS;
	}

	ret = kfifo_to_user(&vsfifo, buf, count, &copied);

	if (!kfifo_is_full(&vsfifo)) {
		wake_up_interruptible(&vsdev.wwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_OUT);
	}

	return ret == 0 ? copied : ret;
}

static ssize_t vser_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
	int ret;
	unsigned int copied = 0;

	if (kfifo_is_full(&vsfifo)) {
		if (filp->f_flags & O_NONBLOCK)
			return -EAGAIN;
		if (wait_event_interruptible_exclusive(vsdev.wwqh, !kfifo_is_full(&vsfifo)))
			return -ERESTARTSYS;
	}

	ret = kfifo_from_user(&vsfifo, buf, count, &copied);

	if (!kfifo_is_empty(&vsfifo)) {
		wake_up_interruptible(&vsdev.rwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);
	}

	return ret == 0 ? copied : ret;
}

static long vser_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	if (_IOC_TYPE(cmd) != VS_MAGIC)
		return -ENOTTY;

	switch (cmd) {
	case VS_SET_BAUD:
		vsdev.baud = arg;
		break;
	case VS_GET_BAUD:
		arg = vsdev.baud;
		break;
	case VS_SET_FFMT:
		if (copy_from_user(&vsdev.opt, (struct option __user *)arg, sizeof(struct option)))
			return -EFAULT;
		break;
	case VS_GET_FFMT:
		if (copy_to_user((struct option __user *)arg, &vsdev.opt, sizeof(struct option)))
			return -EFAULT;
		break;
	default:
		return -ENOTTY;
	}

	return 0;
}

static unsigned int vser_poll(struct file *filp, struct poll_table_struct *p)
{
	int mask = 0;

	poll_wait(filp, &vsdev.rwqh, p);
	poll_wait(filp, &vsdev.wwqh, p);

	if (!kfifo_is_empty(&vsfifo))
		mask |= POLLIN | POLLRDNORM;
	if (!kfifo_is_full(&vsfifo))
		mask |= POLLOUT | POLLWRNORM;

	return mask;
}

static ssize_t vser_aio_read(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{
	size_t read = 0;
	unsigned long i;
	ssize_t ret;

	for (i = 0; i < nr_segs; i++) {
		ret = vser_read(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);
		if (ret < 0)
			break;
		read += ret;
	}

	return read ? read : -EFAULT;
}

static ssize_t vser_aio_write(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{
	size_t written = 0;
	unsigned long i;
	ssize_t ret;

	for (i = 0; i < nr_segs; i++) {
		ret = vser_write(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);
		if (ret < 0)
			break;
		written += ret;
	}

	return written ? written : -EFAULT;
}

static int vser_fasync(int fd, struct file *filp, int on)
{
	return fasync_helper(fd, filp, on, &vsdev.fapp);
}

static irqreturn_t vser_handler(int irq, void *dev_id)
{
	tasklet_schedule(&vstsklet);

	return IRQ_HANDLED;
}

static void vser_tsklet(unsigned long arg)
{
	char data;

	get_random_bytes(&data, sizeof(data));
	data %= 26;
	data += 'A';

	if (!kfifo_is_full(&vsfifo))
		if (!kfifo_in(&vsfifo, &data, sizeof(data)))
			printk(KERN_ERR "vser: kfifo_in failure\n");

	if (!kfifo_is_empty(&vsfifo)) {
		wake_up_interruptible(&vsdev.rwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);
	}
}

static struct file_operations vser_ops = {
	.owner = THIS_MODULE,
	.open = vser_open,
	.release = vser_release,
	.read = vser_read,
	.write = vser_write,
	.unlocked_ioctl = vser_ioctl,
	.poll = vser_poll,
	.aio_read = vser_aio_read,
	.aio_write = vser_aio_write,
	.fasync = vser_fasync,
};

static int __init vser_init(void)
{
	int ret;
	dev_t dev;

	dev = MKDEV(VSER_MAJOR, VSER_MINOR);
	ret = register_chrdev_region(dev, VSER_DEV_CNT, VSER_DEV_NAME);
	if (ret)
		goto reg_err;

	cdev_init(&vsdev.cdev, &vser_ops);
	vsdev.cdev.owner = THIS_MODULE;
	vsdev.baud = 115200;
	vsdev.opt.datab = 8;
	vsdev.opt.parity = 0;
	vsdev.opt.stopb = 1;

	ret = cdev_add(&vsdev.cdev, dev, VSER_DEV_CNT);
	if (ret)
		goto add_err;

	init_waitqueue_head(&vsdev.rwqh);
	init_waitqueue_head(&vsdev.wwqh);

	ret = request_irq(167, vser_handler, IRQF_TRIGGER_HIGH | IRQF_SHARED, "vser", &vsdev);
	if (ret)
		goto irq_err;

	return 0;

irq_err:
	cdev_del(&vsdev.cdev);
add_err:
	unregister_chrdev_region(dev, VSER_DEV_CNT);
reg_err:
	return ret;
}

static void __exit vser_exit(void)
{
	dev_t dev;

	dev = MKDEV(VSER_MAJOR, VSER_MINOR);

	free_irq(167, &vsdev);
	cdev_del(&vsdev.cdev);
	unregister_chrdev_region(dev, VSER_DEV_CNT);
}

module_init(vser_init);
module_exit(vser_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("name <email>");
MODULE_DESCRIPTION("A simple character device driver");
MODULE_ALIAS("virtual-serial");
In the code, DECLARE_TASKLET statically defines a struct tasklet_struct object named vstsklet; its function is vser_tsklet and the argument passed to it is &vsdev. The interrupt top half, vser_handler, does nothing more than schedule the tasklet and return IRQ_HANDLED. The work that the top half did in the previous version of the driver is now carried out by the implementation of vser_tsklet.
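Note that the listing passes (unsigned long)&vsdev to DECLARE_TASKLET but then accesses the global vsdev directly inside vser_tsklet. A hedged variant that actually uses the argument could look like the following; only the body differs from the listing, and the fifo-filling code is unchanged and omitted here.

static void vser_tsklet(unsigned long arg)
{
	/* arg is the (unsigned long)&vsdev passed to DECLARE_TASKLET() above */
	struct vser_dev *dev = (struct vser_dev *)arg;

	/* ... fill vsfifo exactly as in the listing ... */

	if (!kfifo_is_empty(&vsfifo)) {
		wake_up_interruptible(&dev->rwqh);
		kill_fasync(&dev->fapp, SIGIO, POLL_IN);
	}
}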
Finally, here is a summary of the main properties of tasklets.
(1) A tasklet is implemented on top of a dedicated softirq and therefore runs in interrupt context.
(2) Once tasklet_schedule has been called, the corresponding bottom half is guaranteed to run at least once.
(3) If a tasklet has already been scheduled but has not yet run, a further schedule request is ignored.
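A practical consequence of properties (2) and (3) is that a tasklet may still be pending, or even running on another CPU, when the module is unloaded. Below is a hedged sketch of an exit path that accounts for this, reusing the identifiers of the listing above; the listing's vser_exit does not include the tasklet_kill call.

static void __exit vser_exit(void)
{
	dev_t dev = MKDEV(VSER_MAJOR, VSER_MINOR);

	free_irq(167, &vsdev);		/* no new tasklet_schedule() from the top half */
	tasklet_kill(&vstsklet);	/* wait for a pending or running tasklet to finish */
	cdev_del(&vsdev.cdev);
	unregister_chrdev_region(dev, VSER_DEV_CNT);
}

tasklet_disable and tasklet_enable can likewise be used to hold off execution temporarily around a critical section.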
Workqueues
Both bottom-half mechanisms discussed so far, softirqs and tasklets, share one restriction: they run in interrupt context and therefore must not call the scheduler, directly or indirectly. To remove this restriction the kernel provides another bottom-half mechanism, the workqueue. Its idea is simple: at boot time the kernel creates one or more kernel worker threads (more than one on a multi-core processor). A worker thread takes the work items off a work queue one by one and executes them; when the queue is empty, the thread sleeps. When a driver wants to defer a piece of work, it constructs a work-queue node object, adds it to the appropriate work queue and wakes up the worker thread; the worker thread takes the node off the queue and carries out the work, going back to sleep once all work is done. Because the work runs in process context, the work function may call the scheduler. Workqueues thus provide a deferred-execution mechanism, and that mechanism is obviously also suitable for interrupt bottom halves. In addition to the kernel's own workqueues, driver developers can use the kernel's infrastructure to create workqueues of their own (a sketch of such a driver-private workqueue follows the driver listing later in this section). The structure type of a work-queue node is defined as follows.
/* include/linux/workqueue.h */

/*
 * workqueue.h --- work queue handling for Linux.
 */

#ifndef _LINUX_WORKQUEUE_H
#define _LINUX_WORKQUEUE_H

#include <linux/timer.h>
#include <linux/linkage.h>
#include <linux/bitops.h>
#include <linux/lockdep.h>
#include <linux/threads.h>
#include <linux/atomic.h>
#include <linux/cpumask.h>

struct workqueue_struct;

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);
void delayed_work_timer_fn(unsigned long __data);

/*
 * The first word is the work queue pointer and the flags rolled into
 * one
 */
#define work_data_bits(work) ((unsigned long *)(&(work)->data))enum {WORK_STRUCT_PENDING_BIT = 0, /* work item is pending execution */WORK_STRUCT_DELAYED_BIT = 1, /* work item is delayed */WORK_STRUCT_PWQ_BIT = 2, /* data points to pwq */WORK_STRUCT_LINKED_BIT = 3, /* next work is linked to this one */
#ifdef CONFIG_DEBUG_OBJECTS_WORKWORK_STRUCT_STATIC_BIT = 4, /* static initializer (debugobjects) */WORK_STRUCT_COLOR_SHIFT = 5, /* color for workqueue flushing */
#elseWORK_STRUCT_COLOR_SHIFT = 4, /* color for workqueue flushing */
#endifWORK_STRUCT_COLOR_BITS = 4,WORK_STRUCT_PENDING = 1 << WORK_STRUCT_PENDING_BIT,WORK_STRUCT_DELAYED = 1 << WORK_STRUCT_DELAYED_BIT,WORK_STRUCT_PWQ = 1 << WORK_STRUCT_PWQ_BIT,WORK_STRUCT_LINKED = 1 << WORK_STRUCT_LINKED_BIT,
#ifdef CONFIG_DEBUG_OBJECTS_WORKWORK_STRUCT_STATIC = 1 << WORK_STRUCT_STATIC_BIT,
#elseWORK_STRUCT_STATIC = 0,
#endif/** The last color is no color used for works which don't* participate in workqueue flushing.*/WORK_NR_COLORS = (1 << WORK_STRUCT_COLOR_BITS) - 1,WORK_NO_COLOR = WORK_NR_COLORS,/* special cpu IDs */WORK_CPU_UNBOUND = NR_CPUS,WORK_CPU_END = NR_CPUS + 1,/** Reserve 7 bits off of pwq pointer w/ debugobjects turned off.* This makes pwqs aligned to 256 bytes and allows 15 workqueue* flush colors.*/WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT +WORK_STRUCT_COLOR_BITS,/* data contains off-queue information when !WORK_STRUCT_PWQ */WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT,WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE),/** When a work item is off queue, its high bits point to the last* pool it was on. Cap at 31 bits and use the highest number to* indicate that no pool is associated.*/WORK_OFFQ_FLAG_BITS = 1,WORK_OFFQ_POOL_SHIFT = WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,WORK_OFFQ_LEFT = BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT,WORK_OFFQ_POOL_BITS = WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,WORK_OFFQ_POOL_NONE = (1LU << WORK_OFFQ_POOL_BITS) - 1,/* convenience constants */WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1,WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,WORK_STRUCT_NO_POOL = (unsigned long)WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT,/* bit mask for work_busy() return values */WORK_BUSY_PENDING = 1 << 0,WORK_BUSY_RUNNING = 1 << 1,/* maximum string length for set_worker_desc() */WORKER_DESC_LEN = 24,
};

struct work_struct {
	atomic_long_t data;
	struct list_head entry;
	work_func_t func;
#ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
#endif
};

#define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL)
#define WORK_DATA_STATIC_INIT()	\
	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL | WORK_STRUCT_STATIC)

struct delayed_work {
	struct work_struct work;
	struct timer_list timer;

	/* target workqueue and CPU ->timer uses to queue ->work */
	struct workqueue_struct *wq;
	int cpu;
};

/*
 * A struct for workqueue attributes.  This can be used to change
 * attributes of an unbound workqueue.
 *
 * Unlike other fields, ->no_numa isn't a property of a worker_pool.  It
 * only modifies how apply_workqueue_attrs() select pools and thus doesn't
 * participate in pool hash calculations or equality comparisons.
 */
struct workqueue_attrs {
	int		nice;		/* nice level */
	cpumask_var_t	cpumask;	/* allowed CPUs */
	bool		no_numa;	/* disable NUMA affinity */
};

static inline struct delayed_work *to_delayed_work(struct work_struct *work)
{
	return container_of(work, struct delayed_work, work);
}

struct execute_work {
	struct work_struct work;
};

#ifdef CONFIG_LOCKDEP
/*
 * NB: because we have to copy the lockdep_map, setting _key
 * here is required, otherwise it could get initialised to the
 * copy of the lockdep_map!
 */
#define __WORK_INIT_LOCKDEP_MAP(n, k) \
	.lockdep_map = STATIC_LOCKDEP_MAP_INIT(n, k),
#else
#define __WORK_INIT_LOCKDEP_MAP(n, k)
#endif

#define __WORK_INITIALIZER(n, f) {					\
	.data = WORK_DATA_STATIC_INIT(),				\
	.entry	= { &(n).entry, &(n).entry },				\
	.func = (f),							\
	__WORK_INIT_LOCKDEP_MAP(#n, &(n))				\
	}

#define __DELAYED_WORK_INITIALIZER(n, f, tflags) {			\
	.work = __WORK_INITIALIZER((n).work, (f)),			\
	.timer = __TIMER_INITIALIZER(delayed_work_timer_fn,		\
				     0, (unsigned long)&(n),		\
				     (tflags) | TIMER_IRQSAFE),		\
	}

#define DECLARE_WORK(n, f)						\
	struct work_struct n = __WORK_INITIALIZER(n, f)

#define DECLARE_DELAYED_WORK(n, f)					\
	struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, 0)

#define DECLARE_DEFERRABLE_WORK(n, f)					\
	struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, TIMER_DEFERRABLE)

/*
 * initialize a work item's function pointer
 */
#define PREPARE_WORK(_work, _func) \do { \(_work)->func = (_func); \} while (0)#define PREPARE_DELAYED_WORK(_work, _func) \PREPARE_WORK(&(_work)->work, (_func))#ifdef CONFIG_DEBUG_OBJECTS_WORK
extern void __init_work(struct work_struct *work, int onstack);
extern void destroy_work_on_stack(struct work_struct *work);
static inline unsigned int work_static(struct work_struct *work)
{return *work_data_bits(work) & WORK_STRUCT_STATIC;
}
#else
static inline void __init_work(struct work_struct *work, int onstack) { }
static inline void destroy_work_on_stack(struct work_struct *work) { }
static inline unsigned int work_static(struct work_struct *work) { return 0; }
#endif/** initialize all of a work item in one go** NOTE! No point in using "atomic_long_set()": using a direct* assignment of the work data initializer allows the compiler* to generate better code.*/
#ifdef CONFIG_LOCKDEP
#define __INIT_WORK(_work, _func, _onstack) \do { \static struct lock_class_key __key; \\__init_work((_work), _onstack); \(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \INIT_LIST_HEAD(&(_work)->entry); \PREPARE_WORK((_work), (_func)); \} while (0)
#else
#define __INIT_WORK(_work, _func, _onstack) \do { \__init_work((_work), _onstack); \(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \INIT_LIST_HEAD(&(_work)->entry); \PREPARE_WORK((_work), (_func)); \} while (0)
#endif#define INIT_WORK(_work, _func) \do { \__INIT_WORK((_work), (_func), 0); \} while (0)#define INIT_WORK_ONSTACK(_work, _func) \do { \__INIT_WORK((_work), (_func), 1); \} while (0)#define __INIT_DELAYED_WORK(_work, _func, _tflags) \do { \INIT_WORK(&(_work)->work, (_func)); \__setup_timer(&(_work)->timer, delayed_work_timer_fn, \(unsigned long)(_work), \(_tflags) | TIMER_IRQSAFE); \} while (0)#define __INIT_DELAYED_WORK_ONSTACK(_work, _func, _tflags) \do { \INIT_WORK_ONSTACK(&(_work)->work, (_func)); \__setup_timer_on_stack(&(_work)->timer, \delayed_work_timer_fn, \(unsigned long)(_work), \(_tflags) | TIMER_IRQSAFE); \} while (0)#define INIT_DELAYED_WORK(_work, _func) \__INIT_DELAYED_WORK(_work, _func, 0)#define INIT_DELAYED_WORK_ONSTACK(_work, _func) \__INIT_DELAYED_WORK_ONSTACK(_work, _func, 0)#define INIT_DEFERRABLE_WORK(_work, _func) \__INIT_DELAYED_WORK(_work, _func, TIMER_DEFERRABLE)#define INIT_DEFERRABLE_WORK_ONSTACK(_work, _func) \__INIT_DELAYED_WORK_ONSTACK(_work, _func, TIMER_DEFERRABLE)/*** work_pending - Find out whether a work item is currently pending* @work: The work item in question*/
#define work_pending(work) \test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))/*** delayed_work_pending - Find out whether a delayable work item is currently* pending* @work: The work item in question*/
#define delayed_work_pending(w) \work_pending(&(w)->work)/*** work_clear_pending - for internal use only, mark a work item as not pending* @work: The work item in question*/
#define work_clear_pending(work) \clear_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))/** Workqueue flags and constants. For details, please refer to* Documentation/workqueue.txt.*/
enum {/** All wqs are now non-reentrant making the following flag* meaningless. Will be removed.*/WQ_NON_REENTRANT = 1 << 0, /* DEPRECATED */WQ_UNBOUND = 1 << 1, /* not bound to any cpu */WQ_FREEZABLE = 1 << 2, /* freeze during suspend */WQ_MEM_RECLAIM = 1 << 3, /* may be used for memory reclaim */WQ_HIGHPRI = 1 << 4, /* high priority */WQ_CPU_INTENSIVE = 1 << 5, /* cpu instensive workqueue */WQ_SYSFS = 1 << 6, /* visible in sysfs, see wq_sysfs_register() *//** Per-cpu workqueues are generally preferred because they tend to* show better performance thanks to cache locality. Per-cpu* workqueues exclude the scheduler from choosing the CPU to* execute the worker threads, which has an unfortunate side effect* of increasing power consumption.** The scheduler considers a CPU idle if it doesn't have any task* to execute and tries to keep idle cores idle to conserve power;* however, for example, a per-cpu work item scheduled from an* interrupt handler on an idle CPU will force the scheduler to* excute the work item on that CPU breaking the idleness, which in* turn may lead to more scheduling choices which are sub-optimal* in terms of power consumption.** Workqueues marked with WQ_POWER_EFFICIENT are per-cpu by default* but become unbound if workqueue.power_efficient kernel param is* specified. Per-cpu workqueues which are identified to* contribute significantly to power-consumption are identified and* marked with this flag and enabling the power_efficient mode* leads to noticeable power saving at the cost of small* performance disadvantage.** http://thread.gmane.org/gmane.linux.kernel/1480396*/WQ_POWER_EFFICIENT = 1 << 7,__WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */WQ_DFL_ACTIVE = WQ_MAX_ACTIVE / 2,
};/* unbound wq's aren't per-cpu, scale max_active according to #cpus */
#define WQ_UNBOUND_MAX_ACTIVE \max_t(int, WQ_MAX_ACTIVE, num_possible_cpus() * WQ_MAX_UNBOUND_PER_CPU)/** System-wide workqueues which are always present.** system_wq is the one used by schedule[_delayed]_work[_on]().* Multi-CPU multi-threaded. There are users which expect relatively* short queue flush time. Don't queue works which can run for too* long.** system_long_wq is similar to system_wq but may host long running* works. Queue flushing might take relatively long.** system_unbound_wq is unbound workqueue. Workers are not bound to* any specific CPU, not concurrency managed, and all queued works are* executed immediately as long as max_active limit is not reached and* resources are available.** system_freezable_wq is equivalent to system_wq except that it's* freezable.** *_power_efficient_wq are inclined towards saving power and converted* into WQ_UNBOUND variants if 'wq_power_efficient' is enabled; otherwise,* they are same as their non-power-efficient counterparts - e.g.* system_power_efficient_wq is identical to system_wq if* 'wq_power_efficient' is disabled. See WQ_POWER_EFFICIENT for more info.*/
extern struct workqueue_struct *system_wq;
extern struct workqueue_struct *system_long_wq;
extern struct workqueue_struct *system_unbound_wq;
extern struct workqueue_struct *system_freezable_wq;
extern struct workqueue_struct *system_power_efficient_wq;
extern struct workqueue_struct *system_freezable_power_efficient_wq;static inline struct workqueue_struct * __deprecated __system_nrt_wq(void)
{return system_wq;
}static inline struct workqueue_struct * __deprecated __system_nrt_freezable_wq(void)
{return system_freezable_wq;
}/* equivlalent to system_wq and system_freezable_wq, deprecated */
#define system_nrt_wq __system_nrt_wq()
#define system_nrt_freezable_wq __system_nrt_freezable_wq()extern struct workqueue_struct *
__alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,struct lock_class_key *key, const char *lock_name, ...) __printf(1, 6);/*** alloc_workqueue - allocate a workqueue* @fmt: printf format for the name of the workqueue* @flags: WQ_* flags* @max_active: max in-flight work items, 0 for default* @args: args for @fmt** Allocate a workqueue with the specified parameters. For detailed* information on WQ_* flags, please refer to Documentation/workqueue.txt.** The __lock_name macro dance is to guarantee that single lock_class_key* doesn't end up with different namesm, which isn't allowed by lockdep.** RETURNS:* Pointer to the allocated workqueue on success, %NULL on failure.*/
#ifdef CONFIG_LOCKDEP
#define alloc_workqueue(fmt, flags, max_active, args...) \
({ \static struct lock_class_key __key; \const char *__lock_name; \\__lock_name = #fmt#args; \\__alloc_workqueue_key((fmt), (flags), (max_active), \&__key, __lock_name, ##args); \
})
#else
#define alloc_workqueue(fmt, flags, max_active, args...) \__alloc_workqueue_key((fmt), (flags), (max_active), \NULL, NULL, ##args)
#endif/*** alloc_ordered_workqueue - allocate an ordered workqueue* @fmt: printf format for the name of the workqueue* @flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)* @args: args for @fmt** Allocate an ordered workqueue. An ordered workqueue executes at* most one work item at any given time in the queued order. They are* implemented as unbound workqueues with @max_active of one.** RETURNS:* Pointer to the allocated workqueue on success, %NULL on failure.*/
#define alloc_ordered_workqueue(fmt, flags, args...) \alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)#define create_workqueue(name) \alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
#define create_freezable_workqueue(name) \alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND | WQ_MEM_RECLAIM, \1, (name))
#define create_singlethread_workqueue(name) \alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))extern void destroy_workqueue(struct workqueue_struct *wq);struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
void free_workqueue_attrs(struct workqueue_attrs *attrs);
int apply_workqueue_attrs(struct workqueue_struct *wq,const struct workqueue_attrs *attrs);extern bool queue_work_on(int cpu, struct workqueue_struct *wq,struct work_struct *work);
extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,struct delayed_work *work, unsigned long delay);
extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,struct delayed_work *dwork, unsigned long delay);extern void flush_workqueue(struct workqueue_struct *wq);
extern void drain_workqueue(struct workqueue_struct *wq);
extern void flush_scheduled_work(void);extern int schedule_on_each_cpu(work_func_t func);int execute_in_process_context(work_func_t fn, struct execute_work *);extern bool flush_work(struct work_struct *work);
extern bool cancel_work_sync(struct work_struct *work);extern bool flush_delayed_work(struct delayed_work *dwork);
extern bool cancel_delayed_work(struct delayed_work *dwork);
extern bool cancel_delayed_work_sync(struct delayed_work *dwork);extern void workqueue_set_max_active(struct workqueue_struct *wq,int max_active);
extern bool current_is_workqueue_rescuer(void);
extern bool workqueue_congested(int cpu, struct workqueue_struct *wq);
extern unsigned int work_busy(struct work_struct *work);
extern __printf(1, 2) void set_worker_desc(const char *fmt, ...);
extern void print_worker_info(const char *log_lvl, struct task_struct *task);/*** queue_work - queue work on a workqueue* @wq: workqueue to use* @work: work to queue** Returns %false if @work was already on a queue, %true otherwise.** We queue the work to the CPU on which it was submitted, but if the CPU dies* it can be processed by another CPU.*/
static inline bool queue_work(struct workqueue_struct *wq,struct work_struct *work)
{return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}/*** queue_delayed_work - queue work on a workqueue after delay* @wq: workqueue to use* @dwork: delayable work to queue* @delay: number of jiffies to wait before queueing** Equivalent to queue_delayed_work_on() but tries to use the local CPU.*/
static inline bool queue_delayed_work(struct workqueue_struct *wq,struct delayed_work *dwork,unsigned long delay)
{return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
}/*** mod_delayed_work - modify delay of or queue a delayed work* @wq: workqueue to use* @dwork: work to queue* @delay: number of jiffies to wait before queueing** mod_delayed_work_on() on local CPU.*/
static inline bool mod_delayed_work(struct workqueue_struct *wq,struct delayed_work *dwork,unsigned long delay)
{return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
}/*** schedule_work_on - put work task on a specific cpu* @cpu: cpu to put the work task on* @work: job to be done** This puts a job on a specific cpu*/
static inline bool schedule_work_on(int cpu, struct work_struct *work)
{return queue_work_on(cpu, system_wq, work);
}/*** schedule_work - put work task in global workqueue* @work: job to be done** Returns %false if @work was already on the kernel-global workqueue and* %true otherwise.** This puts a job in the kernel-global workqueue if it was not already* queued and leaves it in the same position on the kernel-global* workqueue otherwise.*/
static inline bool schedule_work(struct work_struct *work)
{return queue_work(system_wq, work);
}/*** schedule_delayed_work_on - queue work in global workqueue on CPU after delay* @cpu: cpu to use* @dwork: job to be done* @delay: number of jiffies to wait** After waiting for a given time this puts a job in the kernel-global* workqueue on the specified CPU.*/
static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,unsigned long delay)
{return queue_delayed_work_on(cpu, system_wq, dwork, delay);
}/*** schedule_delayed_work - put work task in global workqueue after delay* @dwork: job to be done* @delay: number of jiffies to wait or 0 for immediate execution** After waiting for a given time this puts a job in the kernel-global* workqueue.*/
static inline bool schedule_delayed_work(struct delayed_work *dwork,unsigned long delay)
{return queue_delayed_work(system_wq, dwork, delay);
}/*** keventd_up - is workqueue initialized yet?*/
static inline bool keventd_up(void)
{return system_wq != NULL;
}/** Like above, but uses del_timer() instead of del_timer_sync(). This means,* if it returns 0 the timer function may be running and the queueing is in* progress.*/
static inline bool __deprecated __cancel_delayed_work(struct delayed_work *work)
{bool ret;ret = del_timer(&work->timer);if (ret)work_clear_pending(&work->work);return ret;
}/* used to be different but now identical to flush_work(), deprecated */
static inline bool __deprecated flush_work_sync(struct work_struct *work)
{return flush_work(work);
}/* used to be different but now identical to flush_delayed_work(), deprecated */
static inline bool __deprecated flush_delayed_work_sync(struct delayed_work *dwork)
{return flush_delayed_work(dwork);
}#ifndef CONFIG_SMP
static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{return fn(arg);
}
#else
long work_on_cpu(int cpu, long (*fn)(void *), void *arg);
#endif /* CONFIG_SMP */#ifdef CONFIG_FREEZER
extern void freeze_workqueues_begin(void);
extern bool freeze_workqueues_busy(void);
extern void thaw_workqueues(void);
#endif /* CONFIG_FREEZER */#ifdef CONFIG_SYSFS
int workqueue_sysfs_register(struct workqueue_struct *wq);
#else /* CONFIG_SYSFS */
static inline int workqueue_sysfs_register(struct workqueue_struct *wq)
{ return 0; }
#endif /* CONFIG_SYSFS */#endif
The members of struct work_struct are as follows.
data: flags and management information for the work item (for example, which pool or workqueue it is queued on), packed into a single atomic long. Note that in the API version shown above this member is not an argument handed to the work function; older kernels used it that way.
entry: the list node that links the work item into a work queue.
func: the work function, executed by a worker thread after it takes the node off the queue. It is called with the address of the work_struct itself as its argument, so a driver typically embeds the work_struct in its device structure and recovers the device with container_of.
The commonly used workqueue-related macros and functions are listed below.
DECLARE_WORK: statically defines a work-queue node; n is the name of the node and f is the work function.
DECLARE_DELAYED_WORK: statically defines a delayed work-queue node.
INIT_WORK: usually used to initialize a dynamically allocated work-queue node.
schedule_work: puts a work-queue node onto the kernel's global workqueue.
schedule_delayed_work: puts a delayed work-queue node onto the global workqueue after the time specified by delay has elapsed.
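The driver listing that follows uses only DECLARE_WORK and schedule_work. As a complement, here is a small hedged sketch of the delayed variant described above; poll_dwork and poll_fn are illustrative names that do not appear in the driver.

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void poll_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(poll_dwork, poll_fn);

static void poll_fn(struct work_struct *work)
{
	/* runs in process context in a kernel worker thread: sleeping is allowed */

	/* re-arm itself: run again roughly one second from now */
	schedule_delayed_work(&poll_dwork, msecs_to_jiffies(1000));
}

/* somewhere in the driver's init path, start the first run: */
/* schedule_delayed_work(&poll_dwork, msecs_to_jiffies(1000)); */

At module exit, cancel_delayed_work_sync(&poll_dwork) would normally be called to stop the self-rearming loop.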
The following code uses a workqueue to implement the interrupt bottom half.
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/kfifo.h>
#include <linux/ioctl.h>
#include <linux/uaccess.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/poll.h>
#include <linux/aio.h>
#include <linux/interrupt.h>
#include <linux/random.h>

#include "vser.h"

#define VSER_MAJOR	256
#define VSER_MINOR	0
#define VSER_DEV_CNT	1
#define VSER_DEV_NAME	"vser"

struct vser_dev {
	unsigned int baud;
	struct option opt;
	struct cdev cdev;
	wait_queue_head_t rwqh;
	wait_queue_head_t wwqh;
	struct fasync_struct *fapp;
};

DEFINE_KFIFO(vsfifo, char, 32);
static struct vser_dev vsdev;

static void vser_work(struct work_struct *work);
DECLARE_WORK(vswork, vser_work);

static int vser_fasync(int fd, struct file *filp, int on);

static int vser_open(struct inode *inode, struct file *filp)
{
	return 0;
}

static int vser_release(struct inode *inode, struct file *filp)
{
	vser_fasync(-1, filp, 0);
	return 0;
}

static ssize_t vser_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
	int ret;
	unsigned int copied = 0;

	if (kfifo_is_empty(&vsfifo)) {
		if (filp->f_flags & O_NONBLOCK)
			return -EAGAIN;
		if (wait_event_interruptible_exclusive(vsdev.rwqh, !kfifo_is_empty(&vsfifo)))
			return -ERESTARTSYS;
	}

	ret = kfifo_to_user(&vsfifo, buf, count, &copied);

	if (!kfifo_is_full(&vsfifo)) {
		wake_up_interruptible(&vsdev.wwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_OUT);
	}

	return ret == 0 ? copied : ret;
}

static ssize_t vser_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
	int ret;
	unsigned int copied = 0;

	if (kfifo_is_full(&vsfifo)) {
		if (filp->f_flags & O_NONBLOCK)
			return -EAGAIN;
		if (wait_event_interruptible_exclusive(vsdev.wwqh, !kfifo_is_full(&vsfifo)))
			return -ERESTARTSYS;
	}

	ret = kfifo_from_user(&vsfifo, buf, count, &copied);

	if (!kfifo_is_empty(&vsfifo)) {
		wake_up_interruptible(&vsdev.rwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);
	}

	return ret == 0 ? copied : ret;
}

static long vser_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	if (_IOC_TYPE(cmd) != VS_MAGIC)
		return -ENOTTY;

	switch (cmd) {
	case VS_SET_BAUD:
		vsdev.baud = arg;
		break;
	case VS_GET_BAUD:
		arg = vsdev.baud;
		break;
	case VS_SET_FFMT:
		if (copy_from_user(&vsdev.opt, (struct option __user *)arg, sizeof(struct option)))
			return -EFAULT;
		break;
	case VS_GET_FFMT:
		if (copy_to_user((struct option __user *)arg, &vsdev.opt, sizeof(struct option)))
			return -EFAULT;
		break;
	default:
		return -ENOTTY;
	}

	return 0;
}

static unsigned int vser_poll(struct file *filp, struct poll_table_struct *p)
{
	int mask = 0;

	poll_wait(filp, &vsdev.rwqh, p);
	poll_wait(filp, &vsdev.wwqh, p);

	if (!kfifo_is_empty(&vsfifo))
		mask |= POLLIN | POLLRDNORM;
	if (!kfifo_is_full(&vsfifo))
		mask |= POLLOUT | POLLWRNORM;

	return mask;
}

static ssize_t vser_aio_read(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{
	size_t read = 0;
	unsigned long i;
	ssize_t ret;

	for (i = 0; i < nr_segs; i++) {
		ret = vser_read(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);
		if (ret < 0)
			break;
		read += ret;
	}

	return read ? read : -EFAULT;
}

static ssize_t vser_aio_write(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos)
{
	size_t written = 0;
	unsigned long i;
	ssize_t ret;

	for (i = 0; i < nr_segs; i++) {
		ret = vser_write(iocb->ki_filp, iov[i].iov_base, iov[i].iov_len, &pos);
		if (ret < 0)
			break;
		written += ret;
	}

	return written ? written : -EFAULT;
}

static int vser_fasync(int fd, struct file *filp, int on)
{
	return fasync_helper(fd, filp, on, &vsdev.fapp);
}

static irqreturn_t vser_handler(int irq, void *dev_id)
{
	schedule_work(&vswork);

	return IRQ_HANDLED;
}

static void vser_work(struct work_struct *work)
{
	char data;

	get_random_bytes(&data, sizeof(data));
	data %= 26;
	data += 'A';

	if (!kfifo_is_full(&vsfifo))
		if (!kfifo_in(&vsfifo, &data, sizeof(data)))
			printk(KERN_ERR "vser: kfifo_in failure\n");

	if (!kfifo_is_empty(&vsfifo)) {
		wake_up_interruptible(&vsdev.rwqh);
		kill_fasync(&vsdev.fapp, SIGIO, POLL_IN);
	}
}

static struct file_operations vser_ops = {
	.owner = THIS_MODULE,
	.open = vser_open,
	.release = vser_release,
	.read = vser_read,
	.write = vser_write,
	.unlocked_ioctl = vser_ioctl,
	.poll = vser_poll,
	.aio_read = vser_aio_read,
	.aio_write = vser_aio_write,
	.fasync = vser_fasync,
};

static int __init vser_init(void)
{
	int ret;
	dev_t dev;

	dev = MKDEV(VSER_MAJOR, VSER_MINOR);
	ret = register_chrdev_region(dev, VSER_DEV_CNT, VSER_DEV_NAME);
	if (ret)
		goto reg_err;

	cdev_init(&vsdev.cdev, &vser_ops);
	vsdev.cdev.owner = THIS_MODULE;
	vsdev.baud = 115200;
	vsdev.opt.datab = 8;
	vsdev.opt.parity = 0;
	vsdev.opt.stopb = 1;

	ret = cdev_add(&vsdev.cdev, dev, VSER_DEV_CNT);
	if (ret)
		goto add_err;

	init_waitqueue_head(&vsdev.rwqh);
	init_waitqueue_head(&vsdev.wwqh);

	ret = request_irq(167, vser_handler, IRQF_TRIGGER_HIGH | IRQF_SHARED, "vser", &vsdev);
	if (ret)
		goto irq_err;

	return 0;

irq_err:
	cdev_del(&vsdev.cdev);
add_err:
	unregister_chrdev_region(dev, VSER_DEV_CNT);
reg_err:
	return ret;
}

static void __exit vser_exit(void)
{
	dev_t dev;

	dev = MKDEV(VSER_MAJOR, VSER_MINOR);

	free_irq(167, &vsdev);
	cdev_del(&vsdev.cdev);
	unregister_chrdev_region(dev, VSER_DEV_CNT);
}

module_init(vser_init);
module_exit(vser_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("name <email>");
MODULE_DESCRIPTION("A simple character device driver");
MODULE_ALIAS("virtual-serial");
In the code, DECLARE_WORK defines a work-queue node named vswork whose work function is vser_work. As with the tasklet version, the interrupt top half simply schedules the work and returns; the work that the top half used to do is now handled by the bottom-half work function.
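The listing queues its work on the kernel-global workqueue with schedule_work. As mentioned earlier, a driver can also create a workqueue of its own; the following hedged sketch shows that, and also embeds the work node in the device structure so the work function can recover it with container_of instead of touching a global. The identifiers vser_wq, vser_dev_wq, vser_do_work and the work member are assumptions for illustration, not part of the listing above.

#include <linux/fs.h>
#include <linux/wait.h>
#include <linux/poll.h>
#include <linux/workqueue.h>

static struct workqueue_struct *vser_wq;	/* driver-private workqueue */

struct vser_dev_wq {				/* hypothetical variant of struct vser_dev */
	wait_queue_head_t rwqh;
	struct fasync_struct *fapp;
	struct work_struct work;		/* work node embedded in the device */
};

static void vser_do_work(struct work_struct *work)
{
	struct vser_dev_wq *dev = container_of(work, struct vser_dev_wq, work);

	/* ... fill the fifo as in the listing, then notify readers ... */
	wake_up_interruptible(&dev->rwqh);
	kill_fasync(&dev->fapp, SIGIO, POLL_IN);
}

static int vser_wq_setup(struct vser_dev_wq *dev)
{
	vser_wq = create_singlethread_workqueue("vser");
	if (!vser_wq)
		return -ENOMEM;

	INIT_WORK(&dev->work, vser_do_work);
	return 0;
}

/* in the interrupt handler, instead of schedule_work(&vswork): */
/* queue_work(vser_wq, &dev->work); */

static void vser_wq_teardown(void)
{
	flush_workqueue(vser_wq);		/* wait for queued work to finish */
	destroy_workqueue(vser_wq);
}

A private workqueue is mainly useful when the work may run for a long time or must not be delayed by other users of the global queue; otherwise schedule_work, as in the listing, is sufficient.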
Finally, here is a summary of the main properties of workqueues.
(1) The work function of a workqueue runs in process context and may call the scheduler, i.e. it is allowed to sleep.
(2) If the same work item is scheduled again while it is still queued and has not yet started to run, the new request is ignored; the work item is only ever on the queue once.
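Related to point (2): a work item queued with schedule_work may still be pending, or even running, when the module is removed. Below is a hedged sketch of an exit path for the listing above that takes this into account; the listing's vser_exit omits the cancel_work_sync call.

static void __exit vser_exit(void)
{
	dev_t dev = MKDEV(VSER_MAJOR, VSER_MINOR);

	free_irq(167, &vsdev);		/* the top half can no longer queue new work */
	cancel_work_sync(&vswork);	/* wait until a pending or running work item is done */
	cdev_del(&vsdev.cdev);
	unregister_chrdev_region(dev, VSER_DEV_CNT);
}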