Suppose there is a JSON Lines file, file.txt, about 10 KB in size. You can read it like this:
    def get_lines():
        # Reads the whole file into a list of lines at once
        with open('file.txt', 'rb') as f:
            return f.readlines()

    if __name__ == '__main__':
        for e in get_lines():
            process(e)  # process() is defined elsewhere
Now I have to process a 10 GB file, but only 4 GB of memory is available. What should I do if I may only modify the get_lines function and the rest of the code must stay the same? What issues need to be considered?
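    def get_lines():
        # Generator: the file object yields one line at a time,
        # so only the current line is held in memory
        with open('file.txt', 'rb') as f:
            for i in f:
                yield i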
The issues to consider are these: with only 4 GB of memory, a 10 GB file cannot be read in one go, so the data has to be read in batches, and the position reached after each read has to be recorded. The batch size also matters: if it is too small, too much time is spent on read operations, and if it is too large, the batch may no longer fit in memory.
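The batch reading described above can be sketched as follows. This is only a minimal illustration, not a definitive implementation: the chunk_size parameter and its 64 MB default are assumptions chosen for the example, and the open file object is relied on to keep track of the current read position (f.tell() would report it if it had to be stored).

```python
def get_lines(path='file.txt', chunk_size=64 * 1024 * 1024):
    """Yield complete lines while reading the file in fixed-size chunks.

    chunk_size is an assumed tuning knob: larger chunks mean fewer
    read calls but more memory held at once.
    """
    leftover = b''  # partial line carried over from the previous chunk
    with open(path, 'rb') as f:
        while True:
            # The file object remembers its position between reads
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (leftover + chunk).split(b'\n')
            # The last piece may be an incomplete line; keep it for later
            leftover = lines.pop()
            for line in lines:
                yield line + b'\n'
    # Final line of a file that does not end with a newline
    if leftover:
        yield leftover
```

Because the defaults match the original signature, the calling code (for e in get_lines(): process(e)) does not need to change. Lines are yielded as bytes with their trailing newline, the same as f.readlines() returns in 'rb' mode.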
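    from mmap import mmap, ACCESS_READ

    def get_lines(fp):
        # Map the file read-only; the OS pages data in on demand
        # instead of loading the whole file into memory
        with open(fp, "rb") as f:
            with mmap(f.fileno(), 0, access=ACCESS_READ) as m:
                # readline() returns b"" at end of file, which stops iter();
                # a final line without a trailing newline is still returned
                for line in iter(m.readline, b""):
                    yield line.decode()

    if __name__ == "__main__":
        for i in get_lines("fp_some_huge_file"):
            print(i)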