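Over the past few days I have been improving my Base64 & UUE encoded-file generation tool, and found that it is very slow on large files. A quick analysis showed that the string concatenation and splitting code was far too inefficient. Consider the following code: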
Private Sub Command1_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
    fL = LOF(enfp)
    ReDim B(fL - 1)
    Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    Open defn For Output As #defp
    Do While Len(tmpstr) > 60
        outStr = "M" & Mid(tmpstr, 1, 60)
        tmpstr = Mid(tmpstr, 61) ' this line is what makes it slow 20220522
        Print #defp, outStr
        DoEvents
    Loop
    Print #defp, tmpstr
    Close #defp
    MsgBox "Processed: " & fL & " bytes, elapsed: " & Timer - timx & " s"
End Sub
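When the encoded string is split into fixed-length pieces, this line:
tmpstr = Mid(tmpstr, 61) ' this line is what makes it slow 20220522
is meant to keep whatever is left of the string after each cut. With a short string it makes no difference, but every pass copies the entire remainder, so the longer the string, the slower the loop gets. So I came up with a different approach: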
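Private Sub Command2_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim E As Long ' 1-based start position of the next chunk
    Dim timx As Single
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
    fL = LOF(enfp)
    ReDim B(fL - 1)
    Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    E = 1
    Open defn For Output As #defp
    ' fL - E + 1 characters remain from position E onward; the original
    ' test (fL - E) > 60 dropped one character when exactly 61 remained.
    ' Assumes single-byte text, so Len(tmpstr) = fL.
    Do While (fL - E + 1) > 60
        outStr = "M" & Mid(tmpstr, E, 60)
        Print #defp, outStr
        DoEvents
        E = E + 60
    Loop
    outStr = Mid(tmpstr, E) ' the remainder, at most 60 characters
    Print #defp, outStr
    Close #defp
    MsgBox "Processed: " & fL & " bytes, elapsed: " & Format(Timer - timx, "0.000000") & " s"
End Sub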
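This version only reads a fixed number of characters out of the original string and never modifies the string itself. Speed immediately improved by a factor of several hundred (the longer the string, the bigger the gain).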
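The difference comes from how much gets copied: `tmpstr = Mid(tmpstr, 61)` builds a new string from the whole remainder on every pass, so total copying grows roughly with the square of the length, whereas reading by position copies each character once. Below is a minimal sketch of the position-based loop as a reusable routine. The name `WriteChunks` and its parameters are hypothetical, not part of the original tool, and it assumes `chunkLen` is greater than 0.

```vb
' Hypothetical helper (not from the original tool): writes s to an already
' open file number in fixed-width lines. Each chunk is read by position,
' so the source string is never shortened or re-copied.
' Assumes chunkLen > 0.
Public Sub WriteChunks(ByVal fileNum As Integer, ByRef s As String, _
                       ByVal chunkLen As Long, ByVal prefix As String)
    Dim pos As Long
    Dim total As Long

    total = Len(s)
    pos = 1
    ' Loop while more than chunkLen characters remain from pos onward.
    Do While pos + chunkLen - 1 < total
        Print #fileNum, prefix & Mid$(s, pos, chunkLen)
        pos = pos + chunkLen
    Loop
    ' Remainder (1 to chunkLen characters), written without the prefix
    ' to mirror the loops above. Nothing is written for an empty string.
    If pos <= total Then Print #fileNum, Mid$(s, pos)
End Sub
```

With the file opened as in the code above, a call would look like `WriteChunks defp, tmpstr, 60, "M"`.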
'================================================================
Separately, for reading a whole file, I originally used Line Input:
Open defn For Input As #defp
Do While Not EOF(defp)
    Line Input #defp, tmpstr
    EnStr = EnStr & tmpstr
Loop
Close #defp
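For the same reason, the concatenation statement EnStr = EnStr & tmpstr made reading extremely slow. So I tried using Adodb.Stream to read the whole file in one go. Again, the problem is not noticeable with small files, but for files over 2 MB, obj.readtext turns out to be surprisingly slow: an 8.27 MB file can take 7.32 seconds.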
Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 2 '1 bin, 2 txt
    stm.Mode = 3
    stm.Open
    stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    str = stm.readtext '------ slow: 7.32 s
    ' str = stm.Read '-------- fast: 0.015 s
    stm.Close
    Set stm = Nothing
    ' tmpstr = StrConv(str, vbUnicode)
    MsgBox "Finished reading file, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub
So I switched to Obj.Read, and the speed immediately improved nearly 500-fold.
Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 1 '1 bin, 2 txt
    stm.Mode = 3
    stm.Open
    ' stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    ' str = stm.readtext '------ slow: 7.32 s
    str = stm.Read '-------- fast: 0.015 s
    stm.Close
    Set stm = Nothing
    tmpstr = StrConv(str, vbUnicode)
    MsgBox "Finished reading file, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub
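So string concatenation again appears to be what hurts performance. At the same time, compared with my earlier method of reading the complete file straight into a byte array, Adodb.Stream Obj.Read is still slower. With the same 8.27 MB file, the code below finishes so quickly that the delay can no longer be measured; it is practically 0. So I switched to a 75.7 MB file: Adodb.Stream Obj.Read took 0.109 s, while the code below took 0.023 s. Reading a whole file with the Open statement is therefore at least 4 times as fast as Adodb.Stream Obj.Read.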
Private Sub Command4_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
    fL = LOF(enfp)
    ReDim B(fL - 1)
    Get #enfp, , B '---- more efficient than Adodb.Stream
    Close #enfp
    ' tmpstr = StrConv(B, vbUnicode)
    MsgBox "Finished reading file, elapsed: " & Format((Timer - timx), "0.000000") & " s" '& Chr(B(0))
End Sub
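The whole-file read above can be wrapped in a small function so the same pattern is reusable. This is only a sketch: the name `ReadAllBytes` is hypothetical, there is no error handling, and the one behavioral difference from the code above is the guard for an empty file, where `ReDim B(fL - 1)` would fail.

```vb
' Hypothetical helper (not from the original tool): returns the whole file
' as a Byte array using a single binary Get, as in Command4_Click above.
Public Function ReadAllBytes(ByVal path As String) As Byte()
    Dim fp As Integer
    Dim size As Long
    Dim buf() As Byte

    fp = FreeFile
    Open path For Binary Access Read As #fp
    size = LOF(fp)
    If size > 0 Then
        ReDim buf(0 To size - 1)
        Get #fp, , buf          ' one read fills the whole array
    End If
    Close #fp
    ReadAllBytes = buf          ' left unallocated when the file is empty
End Function
```

A caller would then write, for example, `B = ReadAllBytes(Text1.Text)` followed by `tmpstr = StrConv(B, vbUnicode)` when a string is needed.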
'============================================
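Likewise, when assembling the Base64 encoding result, I originally concatenated characters directly (see: "A Base64 + UUE encoder written in VBS, with a customizable encoding table", jessezappy's blog on CSDN): ret = ret & Chr(Base64EncMap((first \ 4) And 63)) , and returned the whole string once all the pieces had been appended. This too became extremely slow as the amount of data grew. I later changed it to store the encoded output in a Byte array first,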
ReDim Preserve ret(retLength + 4)
ret(retLength + 1) = (Base64EncMap((first \ 4) And 63))
ret(retLength + 2) = (Base64EncMap(((first * 16) And 48) + ((second \ 16) And 15)))
ret(retLength + 3) = (Base64EncMap(((second * 4) And 60) + ((third \ 64) And 3)))
ret(retLength + 4) = (Base64EncMap(third And 63))
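and finally convert the Byte array to a string in one step with StrConv(ret, vbUnicode). By comparison, speed improved nearly a thousand-fold (the exact ratio depends on the length of the data being encoded).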
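One possible further step, which goes beyond what the original code does: because Base64 always emits 4 output bytes per 3 input bytes (rounded up), the output array can be sized once up front instead of calling ReDim Preserve for every group. The sketch below assumes the standard Base64 alphabet with "=" padding and no line breaks, not the customizable Base64EncMap of the original program. The names `Base64EncodeBytes`, `encMap` and `srcLen` are hypothetical.

```vb
' Hypothetical sketch: Base64-encodes the first srcLen bytes of src
' (a 0-based array) into a byte buffer that is allocated exactly once.
' Standard alphabet and "=" padding are assumptions.
Public Function Base64EncodeBytes(ByRef src() As Byte, ByVal srcLen As Long) As String
    Const ALPHABET As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Dim encMap(0 To 63) As Byte
    Dim ret() As Byte
    Dim i As Long, o As Long
    Dim first As Long, second As Long, third As Long

    If srcLen <= 0 Then Exit Function   ' returns an empty string

    For i = 0 To 63
        encMap(i) = Asc(Mid$(ALPHABET, i + 1, 1))
    Next i

    ' Output size is known in advance: 4 bytes per 3-byte group, rounded up.
    ReDim ret(0 To 4 * ((srcLen + 2) \ 3) - 1)

    i = 0
    Do While i < srcLen
        first = src(i)
        If i + 1 < srcLen Then second = src(i + 1) Else second = 0
        If i + 2 < srcLen Then third = src(i + 2) Else third = 0

        ret(o) = encMap((first \ 4) And 63)
        ret(o + 1) = encMap(((first * 16) And 48) + ((second \ 16) And 15))
        ret(o + 2) = encMap(((second * 4) And 60) + ((third \ 64) And 3))
        ret(o + 3) = encMap(third And 63)
        ' Replace the trailing positions with "=" (ASCII 61) in a short group.
        If i + 1 >= srcLen Then ret(o + 2) = 61
        If i + 2 >= srcLen Then ret(o + 3) = 61

        i = i + 3
        o = o + 4
    Loop

    ' Single conversion from bytes to a string at the very end.
    Base64EncodeBytes = StrConv(ret, vbUnicode)
End Function
```

The bit arithmetic is the same as in the four assignments above; only the buffer handling differs.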
'===========================================
To sum up, string concatenation and trimming are the main culprits behind the poor performance of the code above.
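When text really does have to be read line by line, one common alternative to appending inside the loop is to collect the pieces in an array and join them once at the end. The sketch below applies that idea to the Line Input loop shown earlier. The name `ReadAllLines` and the initial capacity of 1024 are arbitrary, hypothetical choices, and like the original loop it discards the line breaks.

```vb
' Hypothetical helper (not from the original tool): reads every line of a
' text file into an array that grows by doubling, then joins the lines
' once, instead of running EnStr = EnStr & tmpstr for each line.
Public Function ReadAllLines(ByVal path As String) As String
    Dim fp As Integer
    Dim n As Long, cap As Long
    Dim lineText As String
    Dim lines() As String

    cap = 1024
    ReDim lines(0 To cap - 1)

    fp = FreeFile
    Open path For Input As #fp
    Do While Not EOF(fp)
        If n = cap Then
            cap = cap * 2               ' doubling keeps reallocations rare
            ReDim Preserve lines(0 To cap - 1)
        End If
        Line Input #fp, lineText
        lines(n) = lineText
        n = n + 1
    Loop
    Close #fp

    If n = 0 Then Exit Function         ' empty file: return an empty string
    ReDim Preserve lines(0 To n - 1)    ' trim unused slots before joining
    ReadAllLines = Join(lines, "")      ' one concatenation pass
End Function
```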
----------Noted here for the record
Permalink: https://my.lmcjl.com/post/8846.html