做的網(wǎng)站怎么放到域名北京搜索引擎優(yōu)化
文章目錄
- 0x00 準(zhǔn)備
- 0x01 MapReduce簡(jiǎn)介
- 0x02 RPC
- 0x03 調(diào)試
- 0x04 代碼
- coordinator.go
- rpc.go
- worker.go
0x00 準(zhǔn)備
- 閱讀MapReduce論文
- 配置GO環(huán)境
因?yàn)橹皼](méi)用過(guò)GO,所以 先在網(wǎng)上學(xué)了一下語(yǔ)法A Tour of Go
感覺(jué)Go的接口和方法的語(yǔ)法和C++挺不一樣, 并發(fā)編程也挺有意思
0x01 MapReduce簡(jiǎn)介
需要實(shí)現(xiàn)master和coordinator。
MapReduce分為兩個(gè)階段:Map和Reduce階段。
Map階段函數(shù)提供Key,比如pg-being_ernest.txt
是key,然后Worker通過(guò)這個(gè)Key獲取Value。比如pg-being_ernest.txt
的具體內(nèi)容。然后將Key和Value(在例子中是文章的內(nèi)容),傳遞給map function。獲取結(jié)果,并將結(jié)果分成R個(gè)Reduce內(nèi)容。
舉個(gè)例子。假設(shè)我們要對(duì)pg-being_ernest.txt
和pg-dorian_gray.txt
統(tǒng)計(jì)詞頻。那么就要有兩個(gè)Map Task
(不一定有兩個(gè)Worker,比如有3個(gè)Worker,那么就是2個(gè)Worker干活一個(gè)圍觀;如果只有一個(gè)Worker,那么該Worker會(huì)被前后分配兩次Map操作)。假設(shè)有3個(gè)Reduce操作,那么Map
的中間操作就會(huì)按照key被分為3個(gè)文件。
pg-being_ernest.txt
對(duì)應(yīng)Map0 , Map0操作的kv被分進(jìn)mr-0-0,mr-0-1,mr-0-2
pg-dorian_gray.txt
對(duì)應(yīng)Map0 , Map0操作的kv被分進(jìn)mr-1-0,mr-1-1,mr-1-2
當(dāng)所有的Mapf已經(jīng)生成結(jié)果,Worker就會(huì)被指派Reduce操作。比如被指派的Reduce操作編號(hào)為2,那么Reduce就會(huì)讀取mr-0-2
,mr-1-2
。并且聚合相同的Key,傳遞給Reduce函數(shù)。
比如,pg-being_ernest.txt
中的map操作有kv,a 1 b 1 b 1
輸出到mr-0-2
。pg-dorian_gray.txt
中的map操作有kv,c 1 b 1 c 1
輸出到mr-0-2
。
然后Task編號(hào)為2的Reduce任務(wù)會(huì)讀取所有對(duì)應(yīng)的中間文件。得到key。a 1 b 1 b 1 c 1 b 1 c 1
。然后再對(duì)要處理的key進(jìn)行排序,得到 a 1 b 1 b 1 b 1 c 1 c 1
。再按照相同的key調(diào)用reduce函數(shù)。
上面例子的調(diào)用為
reducef(key:"a",value:list[1])
,得到"1"
reducef(key:"b",value:list[1,1,1])
,得到"3"
reducef(key:"c",value:list[1,1])
,得到"2"最后將kvs:[{“a”,“1”},{“b”,“3”},{“c”,“2”}]寫入該reduce生成的文件
mr-out-2
0x02 RPC
使用GO的RPC庫(kù),可以簡(jiǎn)單地實(shí)現(xiàn)Server
學(xué)習(xí)時(shí)參考了Go 每日一庫(kù)之 rpc - 知乎 (zhihu.com)
在MapReduce操作流程中就是:
- 首先啟動(dòng)多個(gè)Worker(以下簡(jiǎn)稱C)和一個(gè)Coordinator(以下簡(jiǎn)稱S)
- C每隔一段時(shí)間(比如1s)會(huì)向S發(fā)送一個(gè)任務(wù)請(qǐng)求
- S首先檢查Map任務(wù)還有沒(méi)有分配完(注意不是運(yùn)行完)。如果沒(méi)有,分配一個(gè)Map任務(wù)給C
- 如果Map任務(wù)分配完了,并且還沒(méi)有工作完,S讓C等待
- 如果Map工作完了。Reduce還沒(méi)分配完了,S給C分配一個(gè)空閑的Reduce任務(wù)
- 如果Reduce都工作完了,所有任務(wù)也都結(jié)束了。
- 如果C完成了任務(wù),會(huì)向S發(fā)送一個(gè)請(qǐng)求。S知道了某個(gè)任務(wù)完成,就會(huì)進(jìn)行相應(yīng)的操作標(biāo)記。
一些注意的點(diǎn):
每個(gè)任務(wù)是有時(shí)間上限的(10s)。每分配一個(gè)任務(wù)就會(huì)啟動(dòng)一個(gè)GO程,然后等待相應(yīng)的時(shí)間,檢查是否完成了工作。如果沒(méi)完成,將該任務(wù)編號(hào)重新加入管道。
如何判斷一個(gè)任務(wù)是否完成呢?
比如第一個(gè)Worker申請(qǐng)到了任務(wù)1,過(guò)了10s鐘還沒(méi)有完成,S又將任務(wù)1加入待完成管道。此時(shí)第2個(gè)worker申請(qǐng)到了任務(wù)1,又過(guò)了4s,第一個(gè)Worker發(fā)送一個(gè)MapDone的請(qǐng)求給S。S如何判斷是否完成了該任務(wù)。
我的處理是維護(hù)任務(wù)是由哪個(gè)Worker運(yùn)行的狀態(tài)。其中Worker由RPC的時(shí)間戳標(biāo)記。比如worker1在第一次請(qǐng)求時(shí)時(shí)間戳為13213123
,Server維護(hù)maptask[1]是由13213123
正在運(yùn)行,當(dāng)?shù)谝淮纬瑫r(shí),maptask[1]變成了worker2請(qǐng)求時(shí)的時(shí)間戳``13219889。在第14s,收到MapDone的請(qǐng)求,檢查其時(shí)間戳為
13213123`和當(dāng)前正在運(yùn)行的時(shí)間戳不同,所以丟棄掉該結(jié)果。
還有就是并發(fā)處理,這個(gè)使用鎖就行了。
0x03 調(diào)試
- 命令行的參數(shù):(因?yàn)椴挥胹hell的話不能用通配符pg*.txt代替,只能輸入所有文件名)
pg-being_ernest.txt
pg-dorian_gray.txt
pg-frankenstein.txt
pg-grimm.txt
pg-huckleberry_finn.txt
pg-metamorphosis.txt
pg-sherlock_holmes.txt
pg-tom_sawyer.txt
- 在調(diào)試時(shí)出現(xiàn)報(bào)錯(cuò)
cannot load plugin ./wc.so err: plugin.Open("./wc"): plugin was built with a different version of package internal/abi
是因?yàn)?code>build wc.so時(shí)的參數(shù)和運(yùn)行mr參數(shù)不一致導(dǎo)致的。
- 使用
./test-mr-many.sh 3
重復(fù)測(cè)試3次。通過(guò)測(cè)試
感覺(jué)Lab1做下來(lái)還是挺通透。像是引入GO和相關(guān)概念。通過(guò)lab,學(xué)習(xí)到了GO調(diào)試。
0x04 代碼
coordinator.go
package mrimport ("log""sync""time"
)
import "net"
import "os"
import "net/rpc"
import "net/http"type status int // 用于指示worker的狀態(tài)const (notStart status = iotarunningtaskDone
)
const workMaxTime = 12 * time.Secondtype Coordinator struct {// Your definitions here.nReduce int // Reduce數(shù)量mMap int // Map數(shù)量taskDone boolreduceTaskStatus []statusmapTaskStatus []status// runningMap 是當(dāng)前正在running的rpcId// 想一下這種情況:第一個(gè)worker沒(méi)有在10秒內(nèi)返回結(jié)果,于是master開(kāi)始把同樣的任務(wù)返回給了第二個(gè)worker,此時(shí)又過(guò)了幾秒,比如兩秒鐘// 那么master如何判斷是第二個(gè)worker完成了任務(wù),還是第一個(gè)worker呢?runningMap []RpcIdTrunningReduce []RpcIdTmapTasks chan TaskIdT // 待開(kāi)始的mapreduceTasks chan TaskIdT // 待開(kāi)始的reducefiles []string // 要進(jìn)行task的文件mapCnt int // 已完成的map數(shù)量reduceCnt int // 已完成的reduce數(shù)量latch *sync.Cond
}// Your code here -- RPC handlers for the worker to call.// Example
// an example RPC handler.
//
// the RPC argument and reply types are defined in rpc.go.
func (c *Coordinator) Example(args *ExampleArgs, reply *ExampleReply) error {reply.Y = args.X + 1return nil
}// Appoint 用于worker請(qǐng)求一個(gè)任務(wù)
func (c *Coordinator) Appoint(request *ReqArgs, reply *ResArgs) error {reply.ResId = request.ReqIdreply.MapNumM = c.mMapreply.ReduceNumN = c.nReducec.latch.L.Lock()done := c.taskDonec.latch.L.Unlock()if done {reply.ResOp = WorkDonereturn nil}switch request.ReqOp {case WorkReq:{// 請(qǐng)求一個(gè)任務(wù)c.latch.L.Lock()if len(c.mapTasks) > 0 {// 如果map任務(wù)還沒(méi)有完全分配 分配一個(gè)map workertaskId := <-c.mapTasksreply.ResTaskId = taskIdreply.ResContent = c.files[taskId]reply.ResOp = WorkMapc.runningMap[taskId] = reply.ResIdc.mapTaskStatus[taskId] = runningc.latch.L.Unlock()go c.checkDone(WorkMap, reply.ResTaskId)log.Printf("Assign map \t%d to \t%d\n", reply.ResTaskId, reply.ResId)return nil}if c.mapCnt < c.mMap {// 如果map任務(wù)已經(jīng)全部分配完了,但是還沒(méi)有運(yùn)行完成,還不能開(kāi)始reduce// worker需要暫時(shí)等待一下reply.ResOp = WorkNothingc.latch.L.Unlock()log.Println("Map All assigned but not done")return nil}if len(c.reduceTasks) > 0 {// 已經(jīng)確定完成了所有map,還沒(méi)有分配完reducetaskId := <-c.reduceTasksreply.ResTaskId = taskIdreply.ResOp = WorkReducec.runningReduce[taskId] = reply.ResIdc.reduceTaskStatus[taskId] = runningc.latch.L.Unlock()go c.checkDone(WorkReduce, reply.ResTaskId)log.Printf("Assign reduce \t%d to \t%d\n", reply.ResTaskId, reply.ResId)return nil}// 如果分配完了所有的reduce,但是還沒(méi)有done.worker需要等待reply.ResOp = WorkNothinglog.Println("Reduce All assigned but not done")c.latch.L.Unlock()return nil}case WorkMapDone:{c.latch.L.Lock()defer c.latch.L.Unlock()if c.runningMap[request.ReqTaskId] != request.ReqId || c.mapTaskStatus[request.ReqTaskId] != running {// 說(shuō)明該map已經(jīng)被abortreply.ResOp = WorkTerminatereturn nil}log.Printf("Work Map \t%d done by \t%d\n", request.ReqTaskId, request.ReqId)c.mapTaskStatus[request.ReqTaskId] = taskDonec.mapCnt++}case WorkReduceDone:{c.latch.L.Lock()defer c.latch.L.Unlock()if c.runningReduce[request.ReqTaskId] != request.ReqId || c.reduceTaskStatus[request.ReqTaskId] != running {// 說(shuō)明該map已經(jīng)被abortreply.ResOp = WorkTerminatereturn nil}c.reduceTaskStatus[request.ReqTaskId] = taskDonec.reduceCnt++log.Printf("Work Reduce \t%d done by \t%d\n", request.ReqTaskId, request.ReqId)if c.reduceCnt == c.nReduce {c.taskDone = truereply.ResOp = WorkDone}}default:return nil}return nil
}// start a thread that listens for RPCs from worker.go
func (c *Coordinator) server() {log.Println("Launching Server")e := rpc.Register(c)if e != nil {log.Fatal("register error:", e)}rpc.HandleHTTP()//l, e := net.Listen("tcp", ":1234")sockname := coordinatorSock()_ = os.Remove(sockname)l, e := net.Listen("unix", sockname)go func(l net.Listener) {for {time.Sleep(5 * time.Second)if c.Done() {err := l.Close()if err != nil {log.Fatal("close error:", err)}}}}(l)if e != nil {log.Fatal("listen error:", e)}go func() {err := http.Serve(l, nil)if err != nil {log.Fatal("server error:", err)}}()
}// Done main/mrcoordinator.go calls Done() periodically to find out
// if the entire job has finished.
func (c *Coordinator) Done() bool {c.latch.L.Lock()defer c.latch.L.Unlock()// Your code here.return c.taskDone
}// checkDone 檢查任務(wù)是否完成
func (c *Coordinator) checkDone(workType WorkType, t TaskIdT) {time.Sleep(workMaxTime)c.latch.L.Lock()defer c.latch.L.Unlock()switch workType {case WorkMap:{if c.mapTaskStatus[t] != taskDone {c.mapTaskStatus[t] = notStartc.mapTasks <- t}}case WorkReduce:{if c.reduceTaskStatus[t] != taskDone {// 如果沒(méi)有完成任務(wù)c.reduceTaskStatus[t] = notStartc.reduceTasks <- t}}default:log.Panicf("Try Check Invalid WorkType %v\n", workType)}}// MakeCoordinator create a Coordinator.
// main/mrcoordinator.go calls this function.
// nReduce is the number of reduce tasks to use.
func MakeCoordinator(files []string, nReduce int) *Coordinator {log.Println("Launching Master Factory")c := Coordinator{}c.nReduce = nReducec.mMap = len(files) // 每個(gè)file對(duì)應(yīng)一個(gè)mapc.taskDone = falsec.files = filesc.mapTasks = make(chan TaskIdT, c.mMap)c.mapTaskStatus = make([]status, c.mMap)c.runningMap = make([]RpcIdT, c.mMap)c.reduceTaskStatus = make([]status, nReduce)c.reduceTasks = make(chan TaskIdT, nReduce)c.runningReduce = make([]RpcIdT, nReduce)c.latch = sync.NewCond(&sync.Mutex{})for i := 0; i < c.mMap; i++ {c.mapTasks <- TaskIdT(i)c.runningMap[i] = -1c.mapTaskStatus[i] = notStart}for i := 0; i < c.nReduce; i++ {c.reduceTasks <- TaskIdT(i)c.runningReduce[i] = -1c.reduceTaskStatus[i] = notStart}c.server()return &c
}
rpc.go
package mr//
// RPC definitions.
//
// remember to capitalize all names.
//import "os"
import "strconv"//
// example to show how to declare the arguments
// and reply for an RPC.
//type ExampleArgs struct {X int
}type ExampleReply struct {Y int
}
type RpcIdT int64 // RpcIdT 是通過(guò)時(shí)間戳生成的, 指示一個(gè)唯一的RpcId
type ReqArgs struct {ReqId RpcIdTReqOp WorkTypeReqTaskId TaskIdT
}// ResArgs 是RPC的返回
// Response
type ResArgs struct {ResId RpcIdTResOp WorkTypeResTaskId TaskIdT // 分配的任務(wù)編號(hào)ResContent stringReduceNumN int // 有n個(gè)reduceMapNumM int // 有M個(gè)map任務(wù)
}
type WorkType int// TaskIdT 是對(duì)任務(wù)的編號(hào)
type TaskIdT int// 枚舉工作類型
const (WorkNothing WorkType = iotaWorkReq // worker申請(qǐng)工作WorkMap // 分配worker進(jìn)行map操作WorkReduce // 分配worker進(jìn)行reduce操作WorkDone // [[unused]]master所有的工作完成WorkTerminate // 工作中止WorkMapDone // Worker完成了map操作WorkReduceDone // Worker完成了reduce操作
)// Rpc exports struct we need
type Rpc struct {Req ReqArgsRes ResArgs
}// Cook up a unique-ish UNIX-domain socket name
// in /var/tmp, for the coordinator.
// Can't use the current directory since
// Athena AFS doesn't support UNIX-domain sockets.
func coordinatorSock() string {s := "/var/tmp/824-mr-"s += strconv.Itoa(os.Getuid())return s
}
worker.go
package mrimport ("encoding/json""fmt""io""os""sort""strconv""time"
)
import "log"
import "net/rpc"
import "hash/fnv"const sleepTime = 500 * time.Millisecond// KeyValue
// Map functions return a slice of KeyValue
type KeyValue struct {Key stringValue string
}
type ByKey []KeyValue// Len 通過(guò)HashKey進(jìn)行排序
func (a ByKey) Len() int { return len(a) }
func (a ByKey) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
func (a ByKey) Less(i, j int) bool { return ihash(a[i].Key) < ihash(a[j].Key) }// use ihash(key) % NReduce to choose the reduce
// task number for each KeyValue emitted by Map.
func ihash(key string) int {h := fnv.New32a()_, err := h.Write([]byte(key))if err != nil {return 0}return int(h.Sum32() & 0x7fffffff)
}// Worker
// main/mrworker.go calls this function.
func Worker(mapf func(string, string) []KeyValue,reducef func(string, []string) string) {// Your worker implementation here.for {timeStamp := time.Now().Unix()rpcId := RpcIdT(timeStamp)req := ReqArgs{}req.ReqId = rpcIdreq.ReqOp = WorkReq // 請(qǐng)求一個(gè)工作res := ResArgs{}ok := call("Coordinator.Appoint", &req, &res)if !ok {// 如果Call發(fā)生錯(cuò)誤log.Println("Maybe Coordinator Server has been closed")return}switch res.ResOp {case WorkDone:// 所有工作已經(jīng)完成returncase WorkMap:doMap(rpcId, &res, mapf)case WorkReduce:doReduce(rpcId, &res, reducef)case WorkNothing:// 等待time.Sleep(sleepTime)default:break}time.Sleep(sleepTime)}
}
func doMap(rpcId RpcIdT, response *ResArgs, mapf func(string, string) []KeyValue) {// filename 是response中的文件名filename := response.ResContentfile, err := os.Open(filename)if err != nil {log.Fatalf("cannot open %v", filename)}defer func(file *os.File) {_ = file.Close()}(file)// content讀取該文件中的所有內(nèi)容content, err := io.ReadAll(file)if err != nil {log.Fatalf("cannot read %v", filename)}kvs := mapf(filename, string(content))// 需要將kv輸出到n路 中間文件中ofiles := make([]*os.File, response.ReduceNumN)encoders := make([]*json.Encoder, response.ReduceNumN)for i := 0; i < response.ReduceNumN; i++ {// 這里輸出的名字是mr-ResTaskId-reduceN// 其中,ResTaskId是0~m的數(shù)字oname := "mr-" + strconv.Itoa(int(response.ResTaskId)) + "-" + strconv.Itoa(i)ofiles[i], err = os.Create(oname)if err != nil {log.Fatal("Can't Create Intermediate File: ", oname)}defer func(file *os.File, oname string) {err := file.Close()if err != nil {log.Fatal("Can't Close Intermediate File", oname)}}(ofiles[i], oname)encoders[i] = json.NewEncoder(ofiles[i])}for _, kv := range kvs {ri := ihash(kv.Key) % response.ReduceNumNerr := encoders[ri].Encode(kv)if err != nil {log.Fatal("Encode Error: ", err)return}}req := ReqArgs{ReqId: rpcId,ReqOp: WorkMapDone,ReqTaskId: response.ResTaskId,}res := ResArgs{}call("Coordinator.Appoint", &req, &res)
}func doReduce(rpcId RpcIdT, response *ResArgs, reducef func(string, []string) string) {rid := response.ResTaskId // 當(dāng)前reduce的編號(hào)var kva []KeyValuefor i := 0; i < response.MapNumM; i++ {// 讀取所有該rid的中間值func(mapId int) {// 讀取m-rid的中間值inputName := "mr-" + strconv.Itoa(i) + "-" + strconv.Itoa(int(rid))// 在當(dāng)前對(duì)應(yīng)r的輸出中,獲取所有keyifile, err := os.Open(inputName)if err != nil {log.Fatal("Can't open file: ", inputName)}defer func(file *os.File) {err := file.Close()if err != nil {log.Fatal("Can't close file: ", inputName)}}(ifile)dec := json.NewDecoder(ifile)for {var kv KeyValueif err := dec.Decode(&kv); err != nil {break}kva = append(kva, kv) //}}(i)}// 通過(guò)hashKey排序sort.Sort(ByKey(kva))intermediate := kva[:]oname := "mr-out-" + strconv.Itoa(int(rid))ofile, err := os.Create(oname)if err != nil {log.Fatal("Can't create file: ", oname)}defer func(ofile *os.File) {err := ofile.Close()if err != nil {log.Fatal("Can't close file: ", oname)}}(ofile)// log.Println("Total kv len: ", len(intermediate))// cnt := 0i := 0for i < len(intermediate) {j := i + 1for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {j++}var values []stringfor k := i; k < j; k++ {values = append(values, intermediate[k].Value)}// cnt++output := reducef(intermediate[i].Key, values)// this is the correct format for each line of Reduce output._, fprintf := fmt.Fprintf(ofile, "%v %v\n", intermediate[i].Key, output)if fprintf != nil {return}i = j}// log.Println("Unique key count: ", cnt)req := ReqArgs{ReqId: rpcId,ReqOp: WorkReduceDone,ReqTaskId: response.ResTaskId,}res := ResArgs{}call("Coordinator.Appoint", &req, &res)
}// CallExample
// example function to show how to make an RPC call to the coordinator.
//
// the RPC argument and reply types are defined in rpc.go.
func CallExample() {// declare an argument structure.args := ExampleArgs{}// fill in the argument(s).args.X = 99// declare a reply structure.reply := ExampleReply{}// send the RPC request, wait for the reply.call("Coordinator.Example", &args, &reply)// reply.Y should be 100.fmt.Printf("reply.Y %v\n", reply.Y)
}// send an RPC request to the coordinator, wait for the response.
// usually returns true.
// returns false if something goes wrong.
func call(rpcName string, args interface{}, reply interface{}) bool {// c, err := rpc.DialHTTP("tcp", "127.0.0.1"+":1234")sockname := coordinatorSock()c, err := rpc.DialHTTP("unix", sockname)if err != nil {log.Fatal("dialing:", err)}defer func(c *rpc.Client) {err := c.Close()if err != nil {log.Fatal("Close Client Error When RPC Calling", err)}}(c)err = c.Call(rpcName, args, reply)if err == nil {return true}fmt.Println(err)return false
}