How to remove duplicate lines in a large text file?
Published: 2019-06-15


How would you remove duplicate lines from a file that is much too large to fit in memory? The duplicate lines are not necessarily adjacent; say the file is 10 times bigger than RAM.

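A simple solution is to use a HashSet to store each line of input.txt. A set ignores duplicate values, so while storing a line, check whether it is already present in the HashSet, and write it to output.txt only if it is not. Note that this keeps every distinct line in memory, so it handles a file larger than RAM only when the distinct lines themselves fit in memory.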

Java:

// Efficient Java program to remove
// duplicates from input.txt and
// save output to output.txt

import java.io.*;
import java.util.HashSet;

public class FileOperation {
    public static void main(String[] args) throws IOException {
        // BufferedReader for input.txt, PrintWriter for output.txt;
        // try-with-resources flushes and closes both when the block ends
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"));
             PrintWriter pw = new PrintWriter("output.txt")) {

            // set stores the unique lines seen so far
            HashSet<String> hs = new HashSet<>();

            String line;
            // loop for each line of input.txt
            while ((line = br.readLine()) != null) {
                // add() returns false if the line is already in the set,
                // so only the first occurrence of each line is written
                if (hs.add(line)) {
                    pw.println(line);
                }
            }
        }

        System.out.println("File operation performed successfully");
    }
}
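When even the distinct lines do not fit in memory, which is the situation the question describes, one common alternative (not part of the original post) is hash partitioning. A first pass writes each line to one of k temporary bucket files chosen by its hash, so every copy of a line lands in the same bucket. A second pass then deduplicates each bucket separately with a HashSet, holding only one bucket's distinct lines in memory at a time. The following is a minimal sketch under stated assumptions: the input is UTF-8 text, and the class name `ExternalDedup`, the `dedup` method and the bucket count 64 are hypothetical, illustrative choices. k must be large enough that one bucket's distinct lines fit in the heap. For a file 10 times bigger than RAM that means comfortably more than 10, since a Java `HashSet<String>` uses noticeably more memory than the raw text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class: external deduplication by hash partitioning.
public class ExternalDedup {

    public static void dedup(Path input, Path output, int buckets) throws IOException {
        Path tmpDir = Files.createTempDirectory("dedup");

        // Pass 1: stream the input once and append each line to bucket
        // floorMod(hashCode, buckets). Equal lines have equal hash codes,
        // so all copies of a line end up in the same bucket file.
        BufferedWriter[] writers = new BufferedWriter[buckets];
        try {
            for (int i = 0; i < buckets; i++) {
                writers[i] = Files.newBufferedWriter(bucketPath(tmpDir, i), StandardCharsets.UTF_8);
            }
            try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    BufferedWriter w = writers[Math.floorMod(line.hashCode(), buckets)];
                    w.write(line);
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                if (w != null) {
                    w.close();
                }
            }
        }

        // Pass 2: deduplicate each bucket on its own with a fresh HashSet,
        // so only one bucket's distinct lines are in memory at a time.
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (int i = 0; i < buckets; i++) {
                Path bucket = bucketPath(tmpDir, i);
                Set<String> seen = new HashSet<>();
                try (BufferedReader br = Files.newBufferedReader(bucket, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        if (seen.add(line)) {
                            out.write(line);
                            out.newLine();
                        }
                    }
                }
                Files.delete(bucket);
            }
        }
        Files.delete(tmpDir);
    }

    private static Path bucketPath(Path dir, int i) {
        return dir.resolve("bucket-" + i + ".txt");
    }

    public static void main(String[] args) throws IOException {
        // 64 buckets is an arbitrary illustrative value; choose it so that
        // one bucket's distinct lines fit comfortably in the heap.
        dedup(Paths.get("input.txt"), Paths.get("output.txt"), 64);
    }
}
```

Unlike the single-HashSet version, the output is grouped by bucket rather than kept in the original order; with `buckets = 1` the sketch degenerates to the single-HashSet approach and preserves order. Heavy repetition of one line does not hurt, because its copies collapse inside that bucket's set. What has to fit in memory is the number of distinct lines per bucket.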

Reposted from: https://www.cnblogs.com/lightwindy/p/9650718.html
