Table of Contents
- Objective
- Implementation
- Defining the category codes for tokens
- Defining the output form: triples
- Python implementation
- Results
- Error handling
- Summary
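Objective
Given a string of C source code, output its token stream. The tokens recognized include keywords, identifiers, integer, floating-point and string constants, numbers in scientific notation, operators and punctuation, comments, and so on.
Implementation
Defining the category codes for tokens
Each kind of token is assigned a category code of our own choosing.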
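The category codes can be read off the Python code further down. The sketch below collects a sample of them; the grouping into dictionaries and the helper `category_of` are assumptions made for illustration and are not part of the analyzer itself.

```python
# Category codes as they appear in the analyzer's code in this article.
# Keywords are numbered 1..53 in list order; only the first six are shown.
KEYWORDS = ["main", "int", "short", "long", "float", "double"]

CONSTANT_CODES = {"integer": 201, "float": 202, "string": 203}
IDENTIFIER_CODE = 222
ERROR_CODE = -999
SKIP_CODE = 0  # comments and the trailing newline

# A sample of the operator and punctuation codes (the full range is 101..147).
SYMBOL_CODES = {
    "+": 101, "-": 102, "*": 103, "/": 104, "%": 105, "=": 106,
    "(": 107, ")": 108, "[": 109, "]": 110, "{": 111, "}": 112,
    ";": 113, ">": 114, "<": 115, ">=": 116, "<=": 117, "==": 118,
}


def category_of(word):
    """Hypothetical helper: category code of a keyword, symbol or identifier."""
    if word in KEYWORDS:
        return KEYWORDS.index(word) + 1
    if word in SYMBOL_CODES:
        return SYMBOL_CODES[word]
    return IDENTIFIER_CODE


print(category_of("int"), category_of("double"), category_of(";"), category_of("a"))
# prints: 2 6 113 222
```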


Defining the output form: triples

    # Triple
    class ThreeFml:
        def __init__(self, syn, inPoint, value):
            self.syn = syn          # category code
            self.inPoint = inPoint  # inner code
            self.value = value      # the token's own value

        def __eq__(self, other):  # overload: the basis for deciding equality
            return self.syn == other.syn and self.value == other.value

        def __lt__(self, other):  # overload: the basis for ordering
            if self.syn == other.syn:
                return self.inPoint < other.inPoint
            else:
                return self.syn < other.syn

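Each triple is represented by a custom class:
Class attributes: category code syn, inner code inPoint, own value value
Class methods:
- Method 1: decides whether two triples are equal: they are equal when both the category code syn and the value are equal
- Method 2: decides the display order: compare the category code syn first, then the inner code inPoint
For example:
- Input: double a; int a;
- Output: [Figure: the resulting triples]
- Analysis: the identifier a appears twice. By method 1 the two occurrences are the same triple, so a is output only once. According to the category-code table, double is 6 and int is 2, so method 2 places int before double when the triples are sorted for display.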
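The example can be replayed in isolation. This is a minimal sketch, assuming the token list has already been produced by the scanner; the `scanned` list is written by hand, and the inner code is fixed at 1 because each category has only one distinct entry in this input.

```python
class ThreeFml:
    def __init__(self, syn, inPoint, value):
        self.syn, self.inPoint, self.value = syn, inPoint, value

    def __eq__(self, other):  # method 1: same category code and same value
        return self.syn == other.syn and self.value == other.value

    def __lt__(self, other):  # method 2: category code first, then inner code
        if self.syn == other.syn:
            return self.inPoint < other.inPoint
        return self.syn < other.syn


# Tokens of "double a; int a;" in source order, as (category code, text).
scanned = [(6, "double"), (222, "a"), (113, ";"), (2, "int"), (222, "a"), (113, ";")]

triples = []
for syn, text in scanned:
    candidate = ThreeFml(syn, 1, text)
    if not any(t == candidate for t in triples):  # skip duplicates
        triples.append(candidate)

triples.sort()  # list.sort only needs __lt__
result = [(t.syn, t.inPoint, t.value) for t in triples]
print(result)
# prints: [(2, 1, 'int'), (6, 1, 'double'), (113, 1, ';'), (222, 1, 'a')]
```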
Python implementation

    import re


    # Triple
    class ThreeFml:
        def __init__(self, syn, inPoint, value):
            self.syn = syn          # category code
            self.inPoint = inPoint  # inner code
            self.value = value      # the token's own value

        def __eq__(self, other):  # overload: the basis for deciding equality
            return self.syn == other.syn and self.value == other.value

        def __lt__(self, other):  # overload: the basis for ordering
            if self.syn == other.syn:
                return self.inPoint < other.inPoint
            else:
                return self.syn < other.syn


    # Lexical analysis
    class WordAnalysis:
        def __init__(self, input_code_str):
            self.input_code_str = input_code_str  # source program string
            self.code_char_list = []  # source program as a list of characters
            self.code_len = 0  # length of the character list
            self.cp = 0  # pointer into the character list
            self.cur = ''  # current character
            self.val = []  # the current token's own value
            self.syn = 0  # the current token's category code
            self.errInfo = ""  # error message
            self.keyWords = ["main", "int", "short", "long", "float",
                             "double", "char", "string", "const", "void",
                             "struct", "if", "else", "switch", "case",
                             "default", "do", "while", "for", "continue",
                             "break", "cout", "cin", "endl", "scanf",
                             "printf", "return", 'catch', 'class', 'delete',
                             'enum', 'export', 'extern', 'false', 'friend',
                             'goto', 'inline', 'namespace', 'new', 'not',
                             'public', 'static', 'template', 'this', 'true',
                             'try', 'typedef', 'union', 'using', 'virtual',
                             'std', 'include', 'iostream']  # keywords
            self.TFs = []  # the collected triples

        def nextChar(self):  # wraps cp++ to simplify scanWord
            self.cp += 1
            self.cur = self.code_char_list[self.cp]

        def error(self, info):  # build errInfo, including the line number
            line = 1
            for i in range(0, self.cp + 1):
                if self.code_char_list[i] == '\n':
                    line += 1
            self.errInfo = "Line " + str(line) + " error: " + info

        def bracket_match(self):
            pattern = r'(\/\/.*?$|\/\*(.|\n)*?\*\/)'  # single-line or multi-line comment
            comments = re.findall(pattern, self.input_code_str, flags=re.MULTILINE | re.DOTALL)
            comments = [comment[0].strip() for comment in comments]  # keep the full match, trimmed
            i = 0
            code_sub_com = []  # (index, char) pairs with comments removed
            print(f"comment: {comments}")
            while i < len(self.input_code_str):
                ch = self.input_code_str[i]
                # skip a comment only where one actually starts, so that a
                # division sign is not mistaken for the start of a comment
                if comments and self.input_code_str.startswith(comments[0], i):
                    i += len(comments[0])
                    comments.pop(0)
                    continue
                code_sub_com.append((i, ch))
                i += 1
            pattern2 = r'"([^"]*)"'  # text enclosed in double quotes
            strings = re.findall(pattern2, self.input_code_str)
            code_sub_com_str = []  # additionally with string constants removed
            i = 0
            while i < len(code_sub_com):
                item = code_sub_com[i]
                ch = item[1]
                if ch == "\"" and strings:  # skip the string and its two quotes
                    i += len(strings[0]) + 2
                    strings.pop(0)
                    continue
                code_sub_com_str.append(item)
                i += 1
            s = []
            stack = []
            mapping = {")": "(", "}": "{", "]": "["}
            for idx, char in code_sub_com_str:
                if char in mapping.keys() or char in mapping.values():
                    s.append((idx, char))
            if not s:
                return "ok"
            for item in s:
                idx = item[0]
                char = item[1]
                if char in mapping.values():  # opening bracket
                    stack.append(item)
                elif char in mapping.keys():  # closing bracket
                    if not stack:  # empty stack: nothing for this bracket to match
                        return idx
                    topitem = stack[-1]
                    topidx = topitem[0]
                    topch = topitem[1]
                    if mapping[char] != topch:  # this closing bracket does not match
                        return topidx
                    else:
                        stack.pop()
            if not stack:  # empty stack: everything matched
                return "ok"
            else:  # only opening brackets are left
                item = stack[0]
                idx = item[0]
                return idx

        def scanWord(self):  # scan one token
            # reset the token value
            self.val = []
            self.syn = 0
            # ****** skip whitespace up to the first significant character ******
            self.cur = self.code_char_list[self.cp]
            while self.cur == ' ' or self.cur == '\n' or self.cur == '\t':
                self.cp += 1
                if self.cp >= self.code_len - 1:
                    print(f"reached end of input at {self.cp}")
                    return  # nothing left to scan
                self.cur = self.code_char_list[self.cp]
            # ******************** first character is a digit ********************
            if self.cur.isdigit():
                # ==== assume an integer first ====
                i_value = 0
                while self.cur.isdigit():  # digit string -> int
                    i_value = i_value * 10 + int(self.cur)
                    self.nextChar()
                six_flag = False
                if (self.cur == 'x' or self.cur == 'X') \
                        and self.code_char_list[self.cp - 1] == '0':  # hexadecimal integer 0x...
                    self.nextChar()
                    six_flag = True
                    s = ""
                    while self.cur.isdigit() or self.cur.isalpha():
                        if self.cur.isalpha():
                            if not (('a' <= self.cur <= 'f') or ('A' <= self.cur <= 'F')):
                                self.syn = -999
                                self.error("letters in a hexadecimal number must be a~f or A~F")
                                return
                        s += self.cur
                        self.nextChar()
                    if not s:  # "0x" with no digits would make int(s, 16) raise
                        self.syn = -999
                        self.error("hexadecimal number has no digits after 0x")
                        return
                    i_value = int(s, 16)  # hexadecimal string -> int
                self.syn = 201
                self.val = str(i_value)  # int -> str
                if six_flag:
                    return
                # ==== a decimal point or an e makes it a float ====
                d_value = i_value * 1.0
                if self.cur == '.':
                    fraction = 0.1
                    self.nextChar()
                    while self.cur.isdigit():  # fractional digits, e.g. 123.45
                        d_value += fraction * int(self.cur)
                        fraction = fraction * 0.1
                        self.nextChar()
                    if self.cur == 'E' or self.cur == 'e':  # e.g. 123.4E?? or 123.E??
                        self.nextChar()
                        powNum = 0
                        if self.cur == '+':  # e.g. 123.4E+5
                            self.nextChar()
                            while self.cur.isdigit():
                                powNum = powNum * 10 + int(self.cur)
                                self.nextChar()
                            d_value *= 10 ** powNum
                        elif self.cur == '-':  # e.g. 123.4E-5
                            self.nextChar()
                            while self.cur.isdigit():
                                powNum = powNum * 10 + int(self.cur)
                                self.nextChar()
                            d_value /= 10 ** powNum
                        elif self.cur.isdigit():  # e.g. 123.4E5
                            while self.cur.isdigit():
                                powNum = powNum * 10 + int(self.cur)
                                self.nextChar()
                            d_value *= 10 ** powNum
                        if self.cur.isalpha():
                            self.syn = -999
                            self.error(f"unexpected letter {self.cur} after scientific notation")
                            return
                    self.syn = 202
                    self.val = str(d_value)  # float -> str
                elif self.cur == 'E' or self.cur == 'e':  # e.g. 123E??
                    self.nextChar()
                    powNum = 0
                    if self.cur == '+':  # e.g. 123E+4
                        self.nextChar()
                        while self.cur.isdigit():
                            powNum = powNum * 10 + int(self.cur)
                            self.nextChar()
                        d_value *= 10 ** powNum
                    elif self.cur == '-':  # e.g. 123E-4
                        self.nextChar()
                        while self.cur.isdigit():
                            powNum = powNum * 10 + int(self.cur)
                            self.nextChar()
                        d_value /= 10 ** powNum
                    elif self.cur.isdigit():  # e.g. 123E4
                        while self.cur.isdigit():
                            powNum = powNum * 10 + int(self.cur)
                            self.nextChar()
                        d_value *= 10 ** powNum
                    if self.cur.isalpha():
                        self.syn = -999
                        self.error(f"unexpected letter {self.cur} after scientific notation")
                        return
                    self.syn = 202
                    self.val = str(d_value)
            # ******************** first character is a letter ********************
            elif self.cur.isalpha():
                # ==== identifier ====
                while self.cur.isdigit() or self.cur.isalpha() or self.cur == '_':
                    self.val.append(self.cur)
                    self.nextChar()
                self.syn = 222
                # ==== check whether it is a keyword ====
                for i, keyword in enumerate(self.keyWords):
                    if ''.join(self.val) == keyword:
                        self.syn = i + 1
                        break
            # ******************** first character is punctuation ********************
            else:
                if self.cur == '+':
                    self.syn = 101
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '+':
                        self.syn = 131
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '=':
                        self.syn = 136
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '-':
                    self.syn = 102
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '-':
                        self.syn = 132
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '=':
                        self.syn = 137
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '*':
                    self.syn = 103
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 138
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '/':
                    self.syn = 104
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 139
                        self.val.append(self.cur)
                        self.nextChar()
                    # single-line comment
                    elif self.cur == '/':
                        self.nextChar()
                        while self.cur != '\n':
                            self.nextChar()
                        self.syn = 0
                    # multi-line comment
                    elif self.cur == '*':
                        self.cp += 1
                        haveEnd = False
                        flag = 0
                        for i in range(self.cp + 1, self.code_len):
                            if self.code_char_list[i - 1] == '*' and self.code_char_list[i] == '/':
                                haveEnd = True
                                flag = i
                                break
                        if haveEnd:
                            self.syn = 0
                            self.cp = flag + 1
                        else:
                            self.syn = -999
                            self.error("multi-line comment is missing its closing */")
                elif self.cur == '%':
                    self.syn = 105
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 140
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '=':
                    self.syn = 106
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 118
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '(':
                    self.syn = 107
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == ')':
                    self.syn = 108
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '[':
                    self.syn = 109
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == ']':
                    self.syn = 110
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '{':
                    self.syn = 111
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '}':
                    self.syn = 112
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == ';':
                    self.syn = 113
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '>':
                    self.syn = 114
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 116
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '>':
                        self.syn = 119
                        self.val.append(self.cur)
                        self.nextChar()
                        if self.cur == '=':
                            self.syn = 141
                            self.val.append(self.cur)
                            self.nextChar()
                elif self.cur == '<':
                    self.syn = 115
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 117
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '<':
                        self.syn = 120
                        self.val.append(self.cur)
                        self.nextChar()
                        if self.cur == '=':
                            self.syn = 142
                            self.val.append(self.cur)
                            self.nextChar()
                elif self.cur == '!':
                    self.syn = 121
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 122
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '&':
                    self.syn = 123
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '&':
                        self.syn = 124
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '=':
                        self.syn = 143
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '|':
                    self.syn = 125
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '|':
                        self.syn = 126
                        self.val.append(self.cur)
                        self.nextChar()
                    elif self.cur == '=':
                        self.syn = 144
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '\\':  # backslash
                    self.syn = 127
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '\'':  # single quote
                    self.syn = 128
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '\"':  # double quote: string constant
                    self.nextChar()
                    haveEnd = False
                    flag = 0
                    for i in range(self.cp, self.code_len):
                        if self.code_char_list[i] == '"':
                            haveEnd = True
                            flag = i
                            break
                    if haveEnd:
                        for j in range(self.cp, flag):
                            self.val.append(self.code_char_list[j])
                        self.cp = flag + 1
                        self.cur = self.code_char_list[self.cp]
                        self.syn = 203
                    else:
                        self.syn = -999
                        self.error("string constant is missing its closing \"")
                elif self.cur == ':':
                    self.syn = 130
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == ':':
                        self.syn = 134
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == ',':
                    self.syn = 133
                    self.val.append(self.cur)
                    self.nextChar()
                elif self.cur == '^':  # bitwise XOR
                    self.syn = 146
                    self.val.append(self.cur)
                    self.nextChar()
                    if self.cur == '=':
                        self.syn = 145
                        self.val.append(self.cur)
                        self.nextChar()
                elif self.cur == '#':
                    self.syn = 147
                    self.val.append(self.cur)
                    self.nextChar()
                else:
                    self.syn = -999
                    self.error(f"invalid character: {self.cur}")

        def solve(self):
            print("\n================scan-main begin================")
            # strip leading and trailing whitespace once, so that indices returned
            # by bracket_match line up with code_char_list
            self.input_code_str = self.input_code_str.strip()
            self.code_char_list = list(self.input_code_str)
            self.code_char_list.append('\n')  # a trailing \n guards the while loops against running off the end
            self.code_len = len(self.code_char_list)
            match = self.bracket_match()  # check bracket matching
            if match != "ok":
                self.cp = match
                self.error(f"no match for {self.code_char_list[self.cp]}!")
                print(self.errInfo)
                return
            intCnt, doubleCnt, stringCnt, idCnt = 0, 0, 0, 0  # inner-code counters
            while True:  # runs at least once, like do-while
                self.scanWord()  # scan one token
                value = ''.join(self.val)  # list of chars -> string
                new_tf = ThreeFml(self.syn, -1, value)  # create the triple
                if self.syn == 201:  # integer constant
                    if not any(tf == new_tf for tf in self.TFs):  # check for duplicates before appending
                        intCnt += 1
                        new_tf.inPoint = intCnt
                        self.TFs.append(new_tf)
                elif self.syn == 202:  # floating-point constant
                    if not any(tf == new_tf for tf in self.TFs):
                        doubleCnt += 1
                        new_tf.inPoint = doubleCnt
                        self.TFs.append(new_tf)
                elif self.syn == 203:  # string constant
                    if not any(tf == new_tf for tf in self.TFs):
                        stringCnt += 1
                        new_tf.inPoint = stringCnt
                        self.TFs.append(new_tf)
                elif self.syn == 222:  # identifier
                    if not any(tf == new_tf for tf in self.TFs):
                        idCnt += 1
                        new_tf.inPoint = idCnt
                        self.TFs.append(new_tf)
                elif 1 <= self.syn <= 100:  # keyword
                    if not any(tf == new_tf for tf in self.TFs):
                        new_tf.inPoint = 1
                        self.TFs.append(new_tf)
                elif self.syn == 0:  # a comment, or the final \n
                    pass
                elif self.syn == -999:  # error
                    break
                else:  # symbol: punctuation or operator
                    if not any(tf == new_tf for tf in self.TFs):
                        new_tf.inPoint = 1
                        self.TFs.append(new_tf)
                if self.cp >= (self.code_len - 1):  # the last element is the appended \n, which marks the end
                    break
            if self.errInfo:  # check for an error
                print(self.errInfo)
                return
            self.TFs.sort()  # sort the triples by category code, then inner code
            for tf in self.TFs:  # print
                print(f"({tf.syn}, {tf.inPoint}, {tf.value})")
            print("================scan-main end================")


    if __name__ == '__main__':
        filepath = "./code.txt"
        with open(filepath, "r") as file:
            code = file.read()
        word_analysis = WordAnalysis(code)
        word_analysis.solve()

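The long if/elif chain for operators in scanWord could, as one possible alternative, be replaced by a lookup table with longest-match selection. The sketch below is not the article's implementation; `OPERATORS` lists only a subset of the codes used above, and `match_operator` is a hypothetical helper.

```python
# Codes copied from the if/elif chain in scanWord; only a subset is listed.
OPERATORS = {
    "+": 101, "++": 131, "+=": 136,
    "-": 102, "--": 132, "-=": 137,
    ">": 114, ">=": 116, ">>": 119, ">>=": 141,
    "<": 115, "<=": 117, "<<": 120, "<<=": 142,
    "=": 106, "==": 118,
}
MAX_LEN = max(len(op) for op in OPERATORS)


def match_operator(text, pos):
    """Return (operator, code) for the longest operator starting at pos, or None."""
    for length in range(MAX_LEN, 0, -1):  # try the longest candidate first
        candidate = text[pos:pos + length]
        if candidate in OPERATORS:
            return candidate, OPERATORS[candidate]
    return None


print(match_operator("a >>= 2", 2))  # prints: ('>>=', 141)
print(match_operator("i++", 1))      # prints: ('++', 131)
print(match_operator("x", 0))        # prints: None
```

Trying the longest candidate first is what keeps `>>=` from being split into `>>` and `=`, which the nested if statements achieve by hand.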
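Results
Input: the source program to analyze is read from a txt file.
[Figure: the input file]
Output:
[Figure: the program output]
The first line lists the comments found in the source.
The lines after it are the triples (category code, inner code, value).
The results show that:
- The input contains the keyword int twice, but it is output only once.
- Several identifiers are recognized (I assigned identifiers the category code 222). Because their values differ, they receive different inner codes under the same category code.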
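Error handling
Each error is reported together with the line number where it was detected.
Bracket matching
A (, ), [, ], { or } without a matching partner is reported. Brackets inside comments and string constants are ignored.

Unclosed string constant
An opening double quote with no closing double quote is reported.

Unclosed multi-line comment
A /* with no closing */ is reported.

Malformed hexadecimal number
A letter outside a~f or A~F after 0x is reported.

Malformed scientific notation
A letter directly after the exponent is reported.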
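The last two checks can also be written as regular expressions. The sketch below is an alternative formulation, not the article's code: `classify_number` is a hypothetical helper, and it is stricter than the analyzer in one respect, since it rejects an exponent with no digits (such as 12e), which the analyzer accepts.

```python
import re

HEX_RE = re.compile(r"0[xX][0-9a-fA-F]+")
DEC_RE = re.compile(r"\d+(\.\d*)?([eE][+-]?\d+)?")


def classify_number(text):
    """Hypothetical helper: return 'hex', 'int', 'float' or 'error' for a numeric literal."""
    if text[:2] in ("0x", "0X"):
        return "hex" if HEX_RE.fullmatch(text) else "error"
    if DEC_RE.fullmatch(text):
        # a decimal point or an exponent makes the literal a float
        return "float" if any(c in text for c in ".eE") else "int"
    return "error"


for literal in ["0x1F", "0x1G", "123", "1.5e-3", "1.5e3x"]:
    print(literal, classify_number(literal))
```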


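Summary
Reflections
In this experiment, writing a lexical analyzer deepened my understanding of lexical analysis beyond the theory and gave me practice applying it. The analyzer produced correct results in repeated tests.
In addition, I first wrote the code in C++, confirmed that most of the functionality worked, and then rewrote it in Python. Porting the code made Python's convenience obvious. Take the simple task of checking whether a character is a letter: in my C++ version I wrote a helper function for it (a range check such as ch >= 'a' && ch <= 'z', although the standard library's std::isalpha from <cctype> would also have worked), whereas in Python the built-in isalpha method can be called directly, which shortens the code considerably.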
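One caveat when relying on isalpha: Python's `str.isalpha` is Unicode-aware, so it also accepts letters outside ASCII. If the analyzer should accept only ASCII letters and the underscore at the start of an identifier, a check along these lines is one option (`is_identifier_start` is a hypothetical helper, and `str.isascii` needs Python 3.7 or later):

```python
def is_identifier_start(ch):
    """Hypothetical helper: True for an ASCII letter or an underscore."""
    return ch == "_" or (ch.isascii() and ch.isalpha())


print("a".isalpha(), "中".isalpha())  # prints: True True
print(is_identifier_start("a"), is_identifier_start("_"), is_identifier_start("中"))
# prints: True True False
```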
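Problems
Although the program handles the vast majority of token-analysis tasks, some details are not handled explicitly. For example, when recognizing floating-point constants written in scientific notation, it does not check whether the value would overflow the C++ double type. Admittedly this can be regarded as a task for semantic analysis rather than lexical analysis, but it is one place where the program could be improved.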
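A rough way to add such a check, assuming Python's float is an IEEE 754 double (true on mainstream platforms), is to convert the literal text with `float` and test for infinity. `fits_in_double` is a hypothetical helper and is not part of the analyzer above.

```python
import math


def fits_in_double(literal):
    """Hypothetical helper: False if the literal overflows to infinity as a float."""
    return not math.isinf(float(literal))


print(fits_in_double("1.5e10"))  # prints: True
print(fits_in_double("1e999"))   # prints: False
```

This only detects overflow. A literal that is too small, such as 1e-999, silently becomes 0.0 and would need a separate check.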
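Appendix: flowcharts
Overall logic:
[Figure: flowchart of the overall logic]
Main function logic:
[Figure: flowchart of the main function]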
