A Multithreaded Pipeline Design Pattern

Terark-CTO-雷鹏

Published 2008-04-22 16:49:00


A simple example:

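Suppose there are many HTML pages. Each page's id, title, url, path and similar metadata are stored in a database table, and the page contents are stored on a disk array. We need to read every page, collect statistics such as its HTML tags and body text, and write the results into another database table. What is the best design for this?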

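The usual idea is to run several parallel threads, each handling the pages in one ID range. On closer inspection, however, the work for each page breaks down into the following steps: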

1. Read the database row
2. Read the file content
3. Parse the HTML and compute the statistics
4. Write the statistics to the database
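To make the decomposition concrete, here is a minimal C++ sketch that models each of the four steps as a separate function operating on one task object. The `PageTask` struct, the step function names and the in-memory maps that stand in for the two database tables and the disk array are all hypothetical, chosen only so the sketch runs on its own; the "parsing" just counts opening tags.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// One unit of work. Each step fills in the fields it is responsible for.
// (Field names are illustrative, not from the original article.)
struct PageTask {
    int id = 0;
    std::string path;      // filled by step 1
    std::string html;      // filled by step 2
    std::size_t tags = 0;  // filled by step 3
};

// Hypothetical in-memory stand-ins for the source table, the disk array
// and the result table, so the sketch needs no external services.
std::map<int, std::string> g_pageTable = {{1, "/pages/1.html"}};
std::map<std::string, std::string> g_disk = {
    {"/pages/1.html", "<html><body><p>hi</p></body></html>"}};
std::map<int, std::size_t> g_resultTable;

// Step 1: read the database row (in reality: waits on the DB server).
void readRow(PageTask& t) { t.path = g_pageTable.at(t.id); }

// Step 2: read the file content (in reality: waits on disk I/O).
void readFile(PageTask& t) { t.html = g_disk.at(t.path); }

// Step 3: parse and compute statistics (CPU bound). Placeholder logic:
// count opening tags, i.e. '<' not followed by '/'.
void parse(PageTask& t) {
    t.tags = 0;
    for (std::size_t i = 0; i + 1 < t.html.size(); ++i)
        if (t.html[i] == '<' && t.html[i + 1] != '/') ++t.tags;
}

// Step 4: write the result (in reality: waits on the DB server again).
void writeResult(const PageTask& t) { g_resultTable[t.id] = t.tags; }

// What one thread does per page in the "one thread per ID range" design:
// all four steps back to back, so it alternates between waiting and computing.
std::size_t processPage(int id) {
    PageTask t;
    t.id = id;
    readRow(t);
    readFile(t);
    parse(t);
    writeResult(t);
    return g_resultTable.at(id);
}
```

Calling `processPage(1)` stores 3 in `g_resultTable` (the `html`, `body` and `p` tags).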

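Each of these steps has a different cost profile: reading the database row mostly waits on the database server, reading the file mostly waits on disk I/O, parsing and counting are CPU-bound, and writing the results again waits on the database server. If we give each step its own thread(s) and pass tasks from one step to the next through message queues, we get a pipeline: read row, read file, parse and count, write result, with a message queue between every pair of adjacent steps.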
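A minimal sketch of this design in standard C++17 (not the febird classes shown later): a bounded blocking queue plays the role of the message queue, and each step runs in its own `std::thread`. `BlockingQueue`, its `close()` method and `runThreeStepPipeline` are illustrative names; squaring and summing stand in for real parsing and database writes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Bounded blocking queue used as the message queue between two steps.
// close() marks the end of input; pop() keeps returning queued items and
// returns std::nullopt only once the queue is both closed and empty.
template <class T>
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t capacity) : cap_(capacity) {}

    void push(T v) {  // blocks while the queue is full
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [&] { return q_.size() < cap_; });
        q_.push_back(std::move(v));
        notEmpty_.notify_one();
    }

    std::optional<T> pop() {  // blocks while empty and not yet closed
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop_front();
        notFull_.notify_one();
        return v;
    }

    void close() {
        std::lock_guard<std::mutex> lk(m_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
    std::deque<T> q_;
    std::size_t cap_;
    bool closed_ = false;
};

// Three steps, one thread each, connected by two queues.
long runThreeStepPipeline(int n) {
    BlockingQueue<int> q1(4), q2(4);
    long sum = 0;
    std::thread first([&] {  // e.g. "read database rows"
        for (int i = 1; i <= n; ++i) q1.push(i);
        q1.close();  // tell the next step that no more tasks are coming
    });
    std::thread mid([&] {  // e.g. "parse and compute statistics"
        while (auto v = q1.pop()) q2.push(*v * *v);
        q2.close();
    });
    std::thread last([&] {  // e.g. "write results"
        while (auto v = q2.pop()) sum += *v;
    });
    first.join();
    mid.join();
    last.join();
    return sum;
}
```

Closing a queue is how the end of input propagates: the first step closes `q1` after its last push, the middle step drains `q1` and then closes `q2`, and so on. The bounded capacity keeps a fast upstream step from filling memory while a slower downstream step catches up.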

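In this design, each step can be given its own number of threads as needed. In this example the database is not the bottleneck, which leaves file reading and computation. If file I/O is fast enough (for instance, when the pages are spread across different disk arrays), we can add more compute threads (servers usually have multiple CPUs) to balance the two.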
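The per-step thread count can be sketched the same way. As an assumption for this sketch, a step is simply a group of worker threads sharing one input queue; the last worker to finish closes the output queue, so the next step sees a single end of stream no matter how many threads the step has. `Queue`, `startStage` and `runPipeline` are hypothetical names, and squaring again stands in for parsing.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Compact unbounded blocking queue with an end-of-stream flag.
template <class T>
class Queue {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {  // nullopt once closed and drained
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop_front();
        return v;
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};

// Start `threads` workers that read from `in`, apply `fn` and write to `out`.
// The last worker to see end-of-stream closes `out`.
template <class In, class Out, class Fn>
void startStage(std::vector<std::thread>& pool, int threads,
                Queue<In>& in, Queue<Out>& out, Fn fn) {
    auto alive = std::make_shared<std::atomic<int>>(threads);
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&in, &out, fn, alive] {
            while (auto v = in.pop()) out.push(fn(*v));
            if (--*alive == 0) out.close();  // this whole step is done
        });
}

// One reader thread, `computeThreads` compute threads, one writer thread.
long runPipeline(int n, int computeThreads) {
    Queue<int> rows, stats;
    std::vector<std::thread> pool;
    startStage(pool, computeThreads, rows, stats,
               [](const int& x) { return x * x; });
    std::thread reader([&] {
        for (int i = 1; i <= n; ++i) rows.push(i);
        rows.close();
    });
    long sum = 0;
    std::thread writer([&] { while (auto v = stats.pop()) sum += *v; });
    reader.join();
    for (auto& t : pool) t.join();
    writer.join();
    return sum;
}
```

Raising `computeThreads` helps only while computation is the slowest step; once file reading becomes the bottleneck, extra compute threads just wait on an empty queue.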

Some workloads may involve even more processing steps.

From this we can extract a design pattern, and even write the classes of a framework that implements it directly:

pipeline.h

```cpp
/* vim: set tabstop=4 : */
#ifndef __febird_pipeline_h__
#define __febird_pipeline_h__

#if defined(_MSC_VER) && (_MSC_VER >= 1020)
# pragma once
# pragma warning(push)
# pragma warning(disable: 4018)
# pragma warning(disable: 4267)
#endif

#include <exception>
#include <queue>
#include <string>
#include <vector>

#include <boost/function.hpp> // boost::function2, boost::function3
#include <boost/thread.hpp>

#include "../thread/ConcurrentQueue.h"
//#include "../thread/LockSentry.h"
//#include "../thread/thread.h"

namespace febird { namespace thread {

// Base class of the task objects passed from step to step.
class PipelineTask
{
public:
    virtual ~PipelineTask();
};

// A task that bundles several tasks (m_tasks).
class PipelineMultiTask : public PipelineTask
{
public:
//  PipelineMultiTask(size_t size = 10);
    std::vector<PipelineTask*> m_tasks;
    virtual ~PipelineMultiTask();
};

class PipelineStep;
class PipelineThread;
class PipelineProcessor;

// One processing step. Its threads take tasks from the previous step's
// output queue (getInQueue) and put results into m_out_queue (getOutQueue).
class PipelineStep
{
    friend class PipelineThread;
    friend class PipelineProcessor;

public:
    typedef ConcurrentQueue<std::queue<PipelineTask*> > queue_t;

protected:
    queue_t* m_out_queue;
    PipelineStep *m_prev, *m_next;   // neighbouring steps in the pipeline
    PipelineProcessor* m_owner;
    std::vector<PipelineThread*> m_threads;
    bool m_batchProcess;

    void process_wrapper(int threadno, PipelineTask*& task);
    void run_wrapper(PipelineThread* pthread);
    void run_step_first(PipelineThread* pthread);
    void run_step_last(PipelineThread* pthread);
    void run_step_mid(PipelineThread* pthread);
    bool isPrevRunning();
    bool isRunning();
    void start(int queue_size);
    void join();

protected:
    // Implemented by concrete steps: handle one task.
    virtual void process(int threadno, PipelineTask*& task) = 0;
    // Per-thread hooks, called with the thread's number.
    virtual void setup(int threadno);
    virtual void clean(int threadno);
    virtual void run(PipelineThread* pthread);
    virtual void onException(int threadno, const std::exception& exp);

public:
    std::string m_step_name;

    PipelineStep();
    PipelineStep(int thread_count, bool batchProcess = false);
    virtual ~PipelineStep();

    int step_ordinal() const;
    const std::string& err(int threadno) const;

    // helper functions:
    std::string msg_leading(int threadno) const;
    boost::mutex* getMutex() const;
    queue_t* getInQueue() const { return m_prev->m_out_queue; }
    queue_t* getOutQueue() const { return m_out_queue; }
    void stop();
};

// A step whose process/setup/clean are supplied as function objects
// instead of by overriding the virtual functions.
class FunPipelineStep : public PipelineStep
{
    boost::function3<void, PipelineStep*, int, PipelineTask*&> m_process; // take(this, threadno, task)
    boost::function2<void, PipelineStep*, int> m_setup; // take(this, threadno)
    boost::function2<void, PipelineStep*, int> m_clean; // take(this, threadno)

    void process(int threadno, PipelineTask*& task);
    void setup(int threadno);
    void clean(int threadno);
    void default_setup(int threadno);
    void default_clean(int threadno);
    static void static_default_setup(PipelineStep* self, int threadno);
    static void static_default_clean(PipelineStep* self, int threadno);

public:
    FunPipelineStep(int thread_count,
                    const boost::function3<void, PipelineStep*, int, PipelineTask*&>& fprocess,
                    const boost::function2<void, PipelineStep*, int>& fsetup,
                    const boost::function2<void, PipelineStep*, int>& fclean);

    FunPipelineStep(int thread_count,
                    const boost::function3<void, PipelineStep*, int, PipelineTask*&>& fprocess,
                    const std::string& step_name = "");
    // ... (remaining members and classes, e.g. PipelineProcessor, omitted)
};

} } // namespace febird::thread

#if defined(_MSC_VER) && (_MSC_VER >= 1020)
# pragma warning(pop)
#endif

#endif // __febird_pipeline_h__
```
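`FunPipelineStep` lets a step be built from function objects instead of a subclass that overrides `process`, `setup` and `clean`. The sketch below is a hypothetical, simplified analog in standard C++, not febird's API (`FunStep`, its members and `demoFunStep` are illustrative). It assumes, as the `threadno` parameter suggests, that `setup` and `clean` are per-thread hooks: each worker calls `setup(threadno)` once, `process(threadno, task)` per task and `clean(threadno)` on exit, and exceptions from `process` go to an `onException` hook. An atomic counter replaces the input queue to keep it short.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical analog of FunPipelineStep: behavior comes from std::function
// members rather than virtual overrides.
struct FunStep {
    std::function<void(int)> setup = [](int) {};
    std::function<void(int, int&)> process;
    std::function<void(int)> clean = [](int) {};
    std::function<void(int, const std::exception&)> onException =
        [](int, const std::exception&) {};

    // Run `threads` workers over tasks 0..n-1; the atomic counter stands in
    // for the input queue.
    void run(int threads, int n) {
        std::atomic<int> next{0};
        std::vector<std::thread> pool;
        for (int no = 0; no < threads; ++no)
            pool.emplace_back([this, no, n, &next] {
                setup(no);  // once per thread
                for (int t; (t = next++) < n;) {
                    try { process(no, t); }
                    catch (const std::exception& e) { onException(no, e); }
                }
                clean(no);  // once per thread
            });
        for (auto& th : pool) th.join();
    }
};

struct Counts { int setups, cleans, done, failed; };

// Example: every 10th task fails; count hook calls across 3 threads.
Counts demoFunStep() {
    std::mutex mu;
    Counts c{0, 0, 0, 0};
    FunStep step;
    step.setup = [&](int) { std::lock_guard<std::mutex> lk(mu); ++c.setups; };
    step.clean = [&](int) { std::lock_guard<std::mutex> lk(mu); ++c.cleans; };
    step.process = [&](int, int& task) {
        if (task % 10 == 0) throw std::runtime_error("bad page");
        std::lock_guard<std::mutex> lk(mu);
        ++c.done;
    };
    step.onException = [&](int, const std::exception&) {
        std::lock_guard<std::mutex> lk(mu);
        ++c.failed;
    };
    step.run(3, 50);
    return c;
}
```

Per-thread `setup` is a natural place for resources that should not be shared between threads, such as one database connection per thread.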
