Java Interview Golden Handbook 25

Article link: https://blog.youkuaiyun.com/ylfhpy/article/details/146801059

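1. Updating the top 100 scores among 1 million players in real time

Definition

The goal is to track and display, in real time, the 100 highest-scoring players out of 1 million. As players change their scores by completing tasks or earning money, the system must update the ranking quickly and show the latest top 100.

Key points

Use Java's PriorityQueue as a min-heap that holds the current top 100 scores.
Use a HashMap to store the mapping from player ID to score, so a player's score can be looked up quickly.
When a player's score changes, first look up the old score in the HashMap. The heap must be updated if the player was outside the top 100 and the new score exceeds the heap's top element, or if the player was already in the top 100.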
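
A size-100 min-heap only knows about the players it currently holds, so it cannot tell who should move up when a top-100 score drops. One common alternative, not the approach used in this article, is to keep every player in an ordered set. The sketch below assumes that O(n) memory for all players is acceptable; the class name `SortedLeaderboard` is hypothetical.

```java
import java.util.*;

// Hypothetical class: an ordered-set leaderboard, shown as an alternative to the min-heap.
final class SortedLeaderboard {
    private final Map<Integer, Integer> scores = new HashMap<>();
    // Entries are {score, playerId}, ordered by score descending, then by ID ascending.
    private final TreeSet<int[]> ranking = new TreeSet<>((a, b) ->
            a[0] != b[0] ? Integer.compare(b[0], a[0]) : Integer.compare(a[1], b[1]));

    void updateScore(int playerId, int newScore) {
        Integer old = scores.put(playerId, newScore);
        if (old != null) {
            ranking.remove(new int[] {old, playerId}); // O(log n), located via the comparator
        }
        ranking.add(new int[] {newScore, playerId});   // O(log n)
    }

    // Returns the IDs of the top k players, highest score first.
    List<Integer> top(int k) {
        List<Integer> ids = new ArrayList<>();
        for (int[] entry : ranking) {
            if (ids.size() == k) {
                break;
            }
            ids.add(entry[1]);
        }
        return ids;
    }
}
```

Each update costs O(log n) whether the score rises or falls, at the price of holding all players in the set.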

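Applications

Game leaderboards: show player rankings in real time to encourage competition.
E-commerce sales rankings: show product sales rankings in real time to guide purchases.

Sample code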

java

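import java.util.*;

class PlayerScore {
    int playerId;
    int score;

    public PlayerScore(int playerId, int score) {
        this.playerId = playerId;
        this.score = score;
    }
}

public class Top100Players {
    private static final int LIMIT = 100;
    private final PriorityQueue<PlayerScore> minHeap;   // current top 100, lowest score on top
    private final Map<Integer, Integer> scoreMap;       // every player's latest score
    private final Map<Integer, PlayerScore> inHeap;     // players currently held in the heap

    public Top100Players() {
        minHeap = new PriorityQueue<>(LIMIT, Comparator.comparingInt(p -> p.score));
        scoreMap = new HashMap<>();
        inHeap = new HashMap<>();
    }

    public void updateScore(int playerId, int newScore) {
        scoreMap.put(playerId, newScore);
        PlayerScore existing = inHeap.get(playerId);

        if (existing != null) {
            if (newScore < existing.score && scoreMap.size() > LIMIT) {
                // A top-100 score dropped, so a player outside the heap may now outrank it.
                rebuild();
            } else {
                // Remove before changing the score so the heap order stays valid.
                minHeap.remove(existing);
                existing.score = newScore;
                minHeap.offer(existing);
            }
        } else if (minHeap.size() < LIMIT) {
            add(playerId, newScore);
        } else if (newScore > minHeap.peek().score) {
            inHeap.remove(minHeap.poll().playerId);
            add(playerId, newScore);
        }
    }

    private void add(int playerId, int score) {
        PlayerScore ps = new PlayerScore(playerId, score);
        minHeap.offer(ps);
        inHeap.put(playerId, ps);
    }

    // Rebuilds the heap from all scores in O(n log 100); only needed when a top-100 score drops.
    private void rebuild() {
        minHeap.clear();
        inHeap.clear();
        for (Map.Entry<Integer, Integer> e : scoreMap.entrySet()) {
            if (minHeap.size() < LIMIT) {
                add(e.getKey(), e.getValue());
            } else if (e.getValue() > minHeap.peek().score) {
                inHeap.remove(minHeap.poll().playerId);
                add(e.getKey(), e.getValue());
            }
        }
    }

    public List<PlayerScore> getTop100() {
        List<PlayerScore> top100 = new ArrayList<>(minHeap);
        // Integer.compare avoids the overflow risk of subtracting two scores.
        top100.sort((p1, p2) -> Integer.compare(p2.score, p1.score));
        return top100;
    }
}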

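2. Finding the 10,000 most frequently repeated messages among 1 billion SMS messages

Definition

Given a massive data set of 1 billion SMS messages, find the 10,000 messages that are repeated most often.

Key points

Process the data in chunks to avoid running out of memory.
Use a HashMap to count how many times each message appears.
Use a PriorityQueue to maintain the top 10,000 messages.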
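
The chunking step only gives correct counts if every copy of a message lands in the same chunk, which is usually arranged by routing on the message's hash. Below is a minimal sketch of that idea. The in-memory lists stand in for bucket files on disk, and the class name `PartitionedTopK` and the `buckets` parameter are assumptions made for illustration.

```java
import java.util.*;

final class PartitionedTopK {
    // Returns the k most frequent messages, most frequent first.
    // In a real job each bucket would be a file on disk rather than an in-memory list.
    static List<String> topK(Iterable<String> messages, int k, int buckets) {
        // Step 1: route each message by hash so identical messages share a bucket.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            parts.add(new ArrayList<>());
        }
        for (String m : messages) {
            parts.get(Math.floorMod(m.hashCode(), buckets)).add(m);
        }

        // Step 2: count one bucket at a time and feed its counts into a global min-heap of size k.
        PriorityQueue<Map.Entry<String, Integer>> minHeap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (List<String> part : parts) {
            Map<String, Integer> counts = new HashMap<>();
            for (String m : part) {
                counts.merge(m, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                minHeap.offer(new AbstractMap.SimpleImmutableEntry<String, Integer>(e.getKey(), e.getValue()));
                if (minHeap.size() > k) {
                    minHeap.poll(); // drop the current least frequent candidate
                }
            }
        }

        // Step 3: the heap drains in ascending order, so prepend to get descending order.
        LinkedList<String> result = new LinkedList<>();
        while (!minHeap.isEmpty()) {
            result.addFirst(minHeap.poll().getKey());
        }
        return result;
    }
}
```

Because a message's count is complete within its bucket, only one bucket's HashMap has to be in memory at any time.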

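Applications

Spam SMS detection: find frequently recurring message patterns to identify spam.
Hot-topic analysis: identify popular topics by counting message content.

Sample code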

java

import java.util.*;

public class Top10000Messages {
    public List<String> findTop10000(List<String> messages) {
        Map<String, Integer> countMap = new HashMap<>();
        for (String message : messages) {
            countMap.put(message, countMap.getOrDefault(message, 0) + 1);
        }

        PriorityQueue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(10000, Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> entry : countMap.entrySet()) {
            if (minHeap.size() < 10000) {
                minHeap.offer(entry);
            } else if (entry.getValue() > minHeap.peek().getValue()) {
                minHeap.poll();
                minHeap.offer(entry);
            }
        }

        List<String> top10000 = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            top10000.add(minHeap.poll().getKey());
        }
        Collections.reverse(top10000);
        return top10000;
    }
}

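3. What is the best way to sort 10,000 records?

Definition

To sort 10,000 records, choose an efficient sorting algorithm that puts the data in order.

Key points

Quicksort: average time complexity O(n log n) and space complexity O(log n); simple to implement, but the worst-case time complexity is O(n^2).
Merge sort: time complexity is consistently O(n log n) and space complexity is O(n); the sort is stable, but it needs extra space.
Heapsort: time complexity O(n log n) and space complexity O(1); it needs no extra space, but it is relatively complex to implement.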
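
To make the merge sort trade-off concrete, here is a minimal top-down merge sort sketch. The single `buffer` array is the O(n) extra space mentioned above, and taking the left element on ties is what keeps the sort stable. The class name `MergeSortSketch` is only illustrative.

```java
final class MergeSortSketch {
    static void mergeSort(int[] arr) {
        if (arr.length < 2) {
            return;
        }
        int[] buffer = new int[arr.length]; // the O(n) extra space
        sort(arr, buffer, 0, arr.length - 1);
    }

    private static void sort(int[] arr, int[] buffer, int low, int high) {
        if (low >= high) {
            return;
        }
        int mid = low + (high - low) / 2;
        sort(arr, buffer, low, mid);
        sort(arr, buffer, mid + 1, high);
        merge(arr, buffer, low, mid, high);
    }

    // Merges the sorted halves arr[low..mid] and arr[mid+1..high].
    private static void merge(int[] arr, int[] buffer, int low, int mid, int high) {
        System.arraycopy(arr, low, buffer, low, high - low + 1);
        int left = low;
        int right = mid + 1;
        for (int k = low; k <= high; k++) {
            if (left > mid) {
                arr[k] = buffer[right++];
            } else if (right > high) {
                arr[k] = buffer[left++];
            } else if (buffer[right] < buffer[left]) {
                arr[k] = buffer[right++];
            } else {
                arr[k] = buffer[left++]; // ties take the left element, which keeps the sort stable
            }
        }
    }
}
```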

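Applications

Database query result ordering: sort the retrieved rows for display.
Data analysis: sort data to prepare it for later analysis.

Sample code (quicksort)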

java

import java.util.Arrays;

public class QuickSort {
    public static void quickSort(int[] arr, int low, int high) {
        if (low < high) {
            int pivotIndex = partition(arr, low, high);
            quickSort(arr, low, pivotIndex - 1);
            quickSort(arr, pivotIndex + 1, high);
        }
    }

    private static int partition(int[] arr, int low, int high) {
        int pivot = arr[high];
        int i = low - 1;
        for (int j = low; j < high; j++) {
            if (arr[j] < pivot) {
                i++;
                swap(arr, i, j);
            }
        }
        swap(arr, i + 1, high);
        return i + 1;
    }

    private static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 8, 4, 2, 7, 1, 6};
        quickSort(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));
    }
}

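4. Determining in the most efficient way whether a string exists in a large file

Definition

Given a large file containing a huge number of strings, determine efficiently whether a specified string is present.

Key points

Use a Bloom filter as a first-pass check to cut down on unnecessary file reads. A Bloom filter is a space-efficient probabilistic data structure: if it reports that an element is absent, the element is definitely absent; if it reports that an element is present, the element may be present.
When initializing the Bloom filter, choose its parameters based on the size of the file and the expected false-positive rate.
Scan the file and add every string to the Bloom filter.
Check whether the target string is in the Bloom filter; if it is, search the file itself to confirm.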
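
To show why a Bloom filter can return false positives but never false negatives, here is a minimal hand-rolled sketch built on `java.util.BitSet`. It is not how Guava implements its filter. The class name `SimpleBloomFilter`, the double-hashing scheme, and the choice of `String.hashCode()` plus FNV-1a as the two base hashes are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // number of bit positions set per element

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void put(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false; // a cleared bit proves the value was never added
            }
        }
        return true; // every bit is set, possibly by other values
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod size.
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the value.
    private static int fnv1a(String value) {
        int hash = 0x811C9DC5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A lookup only reads bits, so the filter never stores the strings themselves; that is where the space saving comes from.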

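Applications

Search engine caching: quickly decide whether a web page has already been indexed.
Database query optimization: avoid unnecessary disk I/O.

Sample code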

java

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StringSearchInFile {
    public static boolean searchStringInFile(String filePath, String target) {
        // Example sizing: 1,000,000 expected entries and a 1% false-positive rate.
        BloomFilter<CharSequence> bloomFilter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1000000, 0.01);

        // Pass 1: add every line of the file to the Bloom filter.
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                bloomFilter.put(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }

        // A negative answer from the filter is definitive, so the file scan can be skipped.
        if (!bloomFilter.mightContain(target)) {
            return false;
        }

        // Pass 2: a positive answer may be a false positive, so confirm with an exact scan.
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals(target)) {
                    return true;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }
}

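5. Designing an algorithm to quickly find today's number of online users from 1 billion QQ online log records

Definition

From 1 billion QQ online log records, quickly count the number of users who were online today.

Key points

Process the data in chunks to avoid running out of memory.
Use a HashSet to store user IDs and remove duplicates.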
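
When the user IDs are numeric, a bitmap is a frequently used substitute for the HashSet because it spends one bit per possible ID instead of one object per user. The sketch below swaps in `java.util.BitSet` for that purpose. It assumes every ID is a non-negative number that fits in an `int`, which may not hold for all real QQ numbers, and it reuses the "userID time" log format assumed by the sample code.

```java
import java.util.BitSet;

final class OnlineUserBitmap {
    // Counts distinct user IDs in logs of the form "userID time".
    // Assumption: each ID parses as a non-negative int.
    static int countOnlineUsers(String[] logs) {
        BitSet seen = new BitSet();
        for (String log : logs) {
            int userId = Integer.parseInt(log.split(" ")[0]);
            seen.set(userId); // setting an already set bit is a no-op, which removes duplicates
        }
        return seen.cardinality(); // number of bits set = number of distinct users
    }
}
```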

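Applications

Social platform activity statistics: understand how many users are online.
Network traffic analysis: assess network usage.

Sample code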

java

import java.util.HashSet;
import java.util.Set;

public class OnlineUserCount {
    public int countOnlineUsers(String[] logs) {
        Set<String> userSet = new HashSet<>();
        for (String log : logs) {
            // Assumes each log entry has the format "userID time" and that the input holds only today's records
            String userId = log.split(" ")[0];
            userSet.add(userId);
        }
        return userSet.size();
    }
}

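6. Finding the top 10 words across four 10 GB files

Definition

Process four files of 10 GB each and find the 10 words that occur most often.

Key points

Apply divide and conquer: split each file into smaller chunks and count the word occurrences in each chunk separately.
Use a HashMap to count word occurrences.
Use a PriorityQueue to maintain the top 10 words.

Applications

Text mining: find high-frequency words to analyze the subject of a text.
Search engine optimization: learn which keywords are popular.

Sample code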

java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class Top10Words {
    public List<String> findTop10Words(String[] filePaths) {
        Map<String, Integer> countMap = new HashMap<>();
        for (String filePath : filePaths) {
            try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] words = line.split("\\s+");
                    for (String word : words) {
                        if (word.isEmpty()) {
                            continue; // split() yields an empty token for blank or indented lines
                        }
                        countMap.put(word, countMap.getOrDefault(word, 0) + 1);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        PriorityQueue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(10, Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> entry : countMap.entrySet()) {
            if (minHeap.size() < 10) {
                minHeap.offer(entry);
            } else if (entry.getValue() > minHeap.peek().getValue()) {
                minHeap.poll();
                minHeap.offer(entry);
            }
        }

        List<String> top10 = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            top10.add(minHeap.poll().getKey());
        }
        Collections.reverse(top10);
        return top10;
    }
}

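7. Given three files larger than 10 GB (one number per line) and a host with 100 MB of memory, finding the 10 numbers that appear in all three files and occur most often

Definition

With limited memory (100 MB), process three files that are each larger than 10 GB and contain one number per line, and find the 10 numbers that appear in all three files and have the highest occurrence counts.

Key points

Apply divide and conquer: split each file into smaller chunks and count the occurrences of each number in each chunk separately.
Use a HashMap to count the occurrences of each number.
Use a PriorityQueue to maintain the top 10 numbers.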
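
With only 100 MB of memory, the per-value counts cannot all live in one HashMap. A common way to apply the divide-and-conquer point is to split each input file into bucket files by hash, so that bucket b of every file holds the same set of values, and then count one bucket at a time. The following is a sketch under stated assumptions: the class name, the `buckets` and `workDir` parameters, and the bucket file naming are hypothetical, and `buckets` must be chosen large enough that one bucket's map fits in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

final class PartitionedCommonTopK {
    // Returns up to k values that occur in every input file, ordered by total count, highest first.
    static List<String> topCommon(List<Path> files, int k, int buckets, Path workDir) throws IOException {
        final int fileCount = files.size();

        // Phase 1: split every input file into bucket files by the hash of each line.
        for (int f = 0; f < fileCount; f++) {
            BufferedWriter[] writers = new BufferedWriter[buckets];
            try (BufferedReader reader = Files.newBufferedReader(files.get(f))) {
                for (int b = 0; b < buckets; b++) {
                    writers[b] = Files.newBufferedWriter(bucketPath(workDir, f, b));
                }
                String line;
                while ((line = reader.readLine()) != null) {
                    BufferedWriter w = writers[Math.floorMod(line.hashCode(), buckets)];
                    w.write(line);
                    w.newLine();
                }
            } finally {
                for (BufferedWriter w : writers) {
                    if (w != null) {
                        w.close();
                    }
                }
            }
        }

        // Phase 2: for each bucket, count per source file and keep values seen in all files.
        PriorityQueue<Map.Entry<String, Integer>> minHeap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (int b = 0; b < buckets; b++) {
            Map<String, int[]> counts = new HashMap<>();
            for (int f = 0; f < fileCount; f++) {
                try (BufferedReader reader = Files.newBufferedReader(bucketPath(workDir, f, b))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        counts.computeIfAbsent(line, key -> new int[fileCount])[f]++;
                    }
                }
            }
            for (Map.Entry<String, int[]> e : counts.entrySet()) {
                int total = 0;
                boolean inAll = true;
                for (int c : e.getValue()) {
                    total += c;
                    inAll &= c > 0;
                }
                if (!inAll) {
                    continue;
                }
                minHeap.offer(new AbstractMap.SimpleImmutableEntry<String, Integer>(e.getKey(), total));
                if (minHeap.size() > k) {
                    minHeap.poll();
                }
            }
        }

        LinkedList<String> result = new LinkedList<>();
        while (!minHeap.isEmpty()) {
            result.addFirst(minHeap.poll().getKey());
        }
        return result;
    }

    private static Path bucketPath(Path workDir, int file, int bucket) {
        return workDir.resolve("file" + file + "-bucket" + bucket + ".txt");
    }
}
```

Peak memory is bounded by the largest bucket's map plus the 10-entry heap, at the cost of writing and rereading every line once.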

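Applications

Data mining: find high-frequency data shared by multiple data sources.
Log analysis: analyze features common to multiple log files.

Sample code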

java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class Top10NumbersInThreeFiles {
    public List<String> findTop10Numbers(String[] filePaths) {
        Map<String, int[]> countMap = new HashMap<>();
        for (int i = 0; i < filePaths.length; i++) {
            try (BufferedReader reader = new BufferedReader(new FileReader(filePaths[i]))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int[] counts = countMap.computeIfAbsent(line, k -> new int[3]);
                    counts[i]++;
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        PriorityQueue<Map.Entry<String, int[]>> minHeap = new PriorityQueue<>(10, Comparator.comparingInt(e -> Arrays.stream(e.getValue()).sum()));
        for (Map.Entry<String, int[]> entry : countMap.entrySet()) {
            int[] counts = entry.getValue();
            if (counts[0] > 0 && counts[1] > 0 && counts[2] > 0) {
                if (minHeap.size() < 10) {
                    minHeap.offer(entry);
                } else if (Arrays.stream(counts).sum() > Arrays.stream(minHeap.peek().getValue()).sum()) {
                    minHeap.poll();
                    minHeap.offer(entry);
                }
            }
        }

        List<String> top10 = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            top10.add(minHeap.poll().getKey());
        }
        Collections.reverse(top10);
        return top10;
    }
}

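8. What is straight insertion sort?

Definition

Straight insertion sort is a simple sorting algorithm whose basic idea is to insert each unsorted element into its proper position in the already sorted sequence. Starting from the second element, each element is compared with the sorted elements before it one by one and inserted once the right position is found.

Key points

Time complexity: O(n^2) in the average and worst cases, O(n) when the input is already sorted; space complexity: O(1).
Suited to small data sets or data that is nearly sorted.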
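
The reason nearly sorted data suits insertion sort is that the number of element shifts equals the number of out-of-order pairs (inversions) in the input. The instrumented sketch below counts those shifts; the class and method names are hypothetical.

```java
final class InsertionSortShifts {
    // Sorts arr in place and returns how many element shifts were performed.
    static long sortAndCountShifts(int[] arr) {
        long shifts = 0;
        for (int i = 1; i < arr.length; i++) {
            int key = arr[i];
            int j = i - 1;
            // Each shift moves one larger element a step right, resolving one inversion.
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j--;
                shifts++;
            }
            arr[j + 1] = key;
        }
        return shifts;
    }
}
```

A sorted array of five elements needs 0 shifts, while the same five elements in reverse order need 10, the maximum n(n-1)/2.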

Applications

Sorting small data sets where stability is required.
Sorting data that is already mostly in order.

Sample code

java

public class InsertionSort {
    public static void insertionSort(int[] arr) {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            int key = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;
        }
    }
}

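9. What is Shell sort?

Definition

Shell sort is an improvement on insertion sort. It first divides the original data into several subsequences and insertion-sorts each one, then gradually shrinks the gap between subsequence elements until the gap is 1, at which point it performs one ordinary insertion sort.

Key points

Time complexity: depends on the gap sequence; the average case is often quoted as roughly O(n^1.3), and the worst case is O(n^2) for the halving sequence used in the sample code. Space complexity: O(1).
It generally outperforms straight insertion sort.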
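
Since the running time depends on the gap sequence, it can help to see a variant with a different one. The sketch below uses Knuth's sequence (1, 4, 13, 40, ...), whose worst case is O(n^1.5), in place of repeated halving. It is shown only as an illustration of swapping the sequence; the class name is hypothetical.

```java
final class ShellSortKnuth {
    static void shellSort(int[] arr) {
        int n = arr.length;

        // Find the largest gap of the form 3h + 1 that is below n / 3.
        int gap = 1;
        while (gap < n / 3) {
            gap = 3 * gap + 1; // 1, 4, 13, 40, ...
        }

        // Integer division by 3 walks the sequence back down: 40 -> 13 -> 4 -> 1 -> 0.
        for (; gap >= 1; gap /= 3) {
            // Gapped insertion sort: elements gap apart form one subsequence.
            for (int i = gap; i < n; i++) {
                int temp = arr[i];
                int j = i;
                while (j >= gap && arr[j - gap] > temp) {
                    arr[j] = arr[j - gap];
                    j -= gap;
                }
                arr[j] = temp;
            }
        }
    }
}
```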

Applications

Sorting medium-sized data sets.
Sorting partially ordered data.

Sample code

java

public class ShellSort {
    public static void shellSort(int[] arr) {
        int n = arr.length;
        for (int gap = n / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < n; i++) {
                int temp = arr[i];
                int j;
                for (j = i; j >= gap && arr[j - gap] > temp; j -= gap) {
                    arr[j] = arr[j - gap];
                }
                arr[j] = temp;
            }
        }
    }
}

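10. What is bubble sort?

Definition

Bubble sort is a simple sorting algorithm whose basic idea is to pass through the sequence repeatedly, comparing each pair of adjacent elements and swapping them if they are in the wrong order, until the whole sequence is sorted.

Key points

Time complexity: O(n^2); space complexity: O(1).
Suited to small data sets.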
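
A widely used refinement stops as soon as a full pass makes no swap, which brings the best case on already sorted input down to a single O(n) pass. The sketch below adds that `swapped` flag and returns the number of passes so the effect is visible; the class name and the return value are illustrative choices, not part of the basic algorithm.

```java
final class BubbleSortEarlyExit {
    // Sorts arr in place and returns the number of passes made over the array.
    static int bubbleSort(int[] arr) {
        int passes = 0;
        for (int end = arr.length - 1; end > 0; end--) {
            boolean swapped = false;
            passes++;
            for (int j = 0; j < end; j++) {
                if (arr[j] > arr[j + 1]) {
                    int temp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = temp;
                    swapped = true;
                }
            }
            if (!swapped) {
                break; // no swap in this pass means the array is already sorted
            }
        }
        return passes;
    }
}
```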

Applications

Teaching: used to explain the basic principles of sorting algorithms.
Sorting very small data sets.

Sample code

java

public class BubbleSort {
    public static void bubbleSort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - i - 1; j++) {
                if (arr[j] > arr[j + 1]) {
                    int temp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = temp;
                }
            }
        }
    }
}
