196-删除重复的电子邮箱

发布时间 2023-07-11 15:07:23作者: OnlyOnYourself-Lzw

删除重复的电子邮箱

原文地址:196. 删除重复的电子邮箱 - 力扣(LeetCode)

  • 题目如下所示

个人题解

这道题难度还可以,还算简单,不过需要一定的熟练度(个人已经很久没练 SQL ,有些生疏了),个人的思考过程和总结如下列 SQL 所示

  • -- 建表
    CREATE TABLE 196_person(
    	id INT PRIMARY KEY,
    	email VARCHAR(30)
    );
    -- PS: 电子邮件将不包含大写字母
    
    -- 编写一个 SQL 删除语句来 删除 所有重复的电子邮件,只保留一个 id 最小的唯一电子邮件。以 任意顺序 返回结果表。(注意:仅需要写删除语句,将自动对剩余结果进行查询)
    
    -- 查找重复 email, 并根据 MIN 函数查找最小的 id
    SELECT p.email, count(p.email), MIN(p.id) FROM 196_person p GROUP BY p.email HAVING COUNT(p.email) > 1;
    
    -- 下面这个是错误的 MySQL 语法,MySQL 不允许 update 或者 delete 目标表和子查询里面的表为同一张表
    DELETE FROM 196_person p
    WHERE 
    p.email IN (SELECT p1.email FROM 196_person p1 GROUP BY p1.email HAVING COUNT(p1.email) > 1) 
    AND
    p.id NOT IN (SELECT MIN(p2.id) FROM 196_person p2 GROUP BY p2.email HAVING COUNT(p2.email) > 1);
    
    -- 验证该语句是否正确
    SELECT * FROM 196_person p
    WHERE 
    p.email IN (SELECT p1.email FROM 196_person p1 GROUP BY p1.email HAVING COUNT(p1.email) > 1) 
    AND
    p.id NOT IN (SELECT MIN(p2.id) FROM 196_person p2 GROUP BY p2.email HAVING COUNT(p2.email) > 1);
    
    -- 将子查询里面的表,写成子表即可(可通过 LeetCode)
    DELETE FROM 196_person p
    WHERE 
    p.email IN (SELECT * FROM (SELECT p1.email FROM 196_person p1 GROUP BY p1.email HAVING COUNT(p1.email) > 1) as a) 
    AND
    p.id NOT IN (SELECT * from (SELECT MIN(p2.id) FROM 196_person p2 GROUP BY p2.email HAVING COUNT(p2.email) > 1) as b);
    -- 以下是官网给出的答案,也挺不错的,刚才我也有想到,不过我想的是不等于,它这里是大于
    
    SELECT p1.* FROM Person p1,Person p2
    WHERE 
    p1.Email = p2.Email AND p1.Id > p2.Id	
    -- 转换上面的查询语句
    DELETE p1 FROM Person p1,  Person p2
      WHERE
      p1.Email = p2.Email AND p1.Id > p2.Id
    
    
      /* 
      以下是对上述SQL的释义:
      a. 从表p1取出3条记录;
      b. 拿着第1条记录去表p2查找满足WHERE的记录,代入该条件p1.Email = p2.Email AND p1.Id > p2.Id后,发现没有满足的,所以不用删掉记录1;
      c. 记录2同理;
      d. 拿着第3条记录去表p2查找满足WHERE的记录,发现有一条记录满足,所以要从p1删掉记录3;
      e. 3条记录遍历完,删掉了1条记录,这个DELETE也就结束了。
      */
    
      -- 还有一个比较通俗易懂的答案,很好理解,如果不是能很好理解官网的自连接方法的话
      -- 使用 GROUP BY 的方法,且不用像我之前的那样一定要让 COUNT(email) > 1,因为如果不大于 1 的话,那也说明是最小 id
      DELETE FROM 196_person p WHERE p.id NOT IN (
      	SELECT id FROM 
      			(
      				SELECT MIN(id) as id FROM 196_person GROUP BY email
      			) as tb 
      )
      -- 不过根据官网的解释,自连接相对于子查询在大数据情境下是更具优势的,因为可以走索引,索引还是要尽量理解和学习官网的答案