Intersection of Multiple Arrays (Sorted and Unsorted)

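This article explores algorithms for intersecting multiple sorted and unsorted arrays, including the two-pointer method, partial sorting plus binary search, and hash tables, and discusses the time complexity of each approach.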



Let's first talk about just two arrays.


If the arrays are sorted, everything becomes easier:


#include <stdio.h>

/* Function prints Intersection of arr1[] and arr2[]
   Both arrays must be sorted in ascending order
   m is the number of elements in arr1[]
   n is the number of elements in arr2[] */
void printIntersection(int arr1[], int arr2[], int m, int n)
{
  int i = 0, j = 0;
  while (i < m && j < n)
  {
    if (arr1[i] < arr2[j])
      i++;   /* arr1[i] is too small to match anything left in arr2 */
    else if (arr2[j] < arr1[i])
      j++;   /* arr2[j] is too small to match anything left in arr1 */
    else /* if arr1[i] == arr2[j] */
    {
      printf(" %d ", arr2[j++]);
      i++;
    }
  }
}
 

Source: http://www.geeksforgeeks.org/union-and-intersection-of-two-sorted-arrays-2/


It's pretty straightforward: if the current element of array1 is less than the current element of array2, just increment the index of array1 by 1 and try the comparison again. If the current element of array2 is the smaller one, increment its index instead. If we get equal values, print the value and increment both indices. By doing so, we gradually iterate through both arrays. The time complexity is O(i + j), where i and j here denote the lengths of the two arrays (m and n in the code). If the two arrays are not sorted, the time complexity also needs to include O(i log i + j log j) for sorting them first.
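For unsorted input, the two-pointer walk can simply be preceded by a sort. The C++ below is a minimal sketch under that assumption: `sortedIntersection` is a hypothetical helper name, the inputs are taken by value so the caller's arrays stay untouched, and common elements are collected into a vector rather than printed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: sort copies of both arrays (O(i log i + j log j)), then run the
// two-pointer walk (O(i + j)). Equal elements are matched pairwise, so a value
// that appears twice in both inputs is reported twice, like printIntersection.
std::vector<int> sortedIntersection(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> result;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                      // a[i] cannot appear later in b
        } else if (b[j] < a[i]) {
            ++j;                      // b[j] cannot appear later in a
        } else {
            result.push_back(a[i]);   // common element: advance both
            ++i;
            ++j;
        }
    }
    return result;
}
```

For example, `sortedIntersection({5, 1, 3, 7}, {3, 9, 5, 2})` returns `{3, 5}`.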


A second way of solving the problem when the two arrays are not sorted is partial sorting plus binary search.


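#include <algorithm>
#include <set>
#include <vector>
// Sorts arr2 (pass the shorter array here), then binary-searches it for each element of arr1
std::set<int> intersection2(const std::vector<int>& arr1, std::vector<int> arr2)
{
  std::set<int> intersect;
  std::sort(arr2.begin(), arr2.end());
  for (int x : arr1)
    if (std::binary_search(arr2.begin(), arr2.end(), x)) // true if x is in arr2
      intersect.insert(x);
  return intersect;
}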

This is also quite simple, and it is more efficient when one array is much smaller than the other. Instead of sorting both arrays, we sort only the shorter array and run a binary search on it for each element of the longer, unsorted array. With i the length of the shorter array and j the length of the longer one, the time complexity is O(i log i + j log i), or O((i + j) log i). When i is significantly smaller than j, this is more efficient than sorting both arrays.
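The "sort only the shorter array" choice can also be made automatically. This is a sketch, not the author's code: the hypothetical `intersectBySearch` swaps its arguments so that the shorter one is sorted, uses `std::lower_bound` as the binary search, and reports each common value once, in the order it is first found in the longer array.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: sort only the shorter array (O(i log i)), then binary-search it
// for every element of the longer, unsorted array (O(j log i)).
std::vector<int> intersectBySearch(std::vector<int> a, std::vector<int> b) {
    if (a.size() > b.size()) a.swap(b);       // make a the shorter array
    std::sort(a.begin(), a.end());
    std::vector<bool> used(a.size(), false);  // marks values of a already reported
    std::vector<int> result;
    for (int x : b) {
        auto it = std::lower_bound(a.begin(), a.end(), x);
        if (it != a.end() && *it == x) {
            // lower_bound always lands on the first copy of x in a
            std::size_t pos = static_cast<std::size_t>(it - a.begin());
            if (!used[pos]) {                 // report each value only once
                used[pos] = true;
                result.push_back(x);
            }
        }
    }
    return result;
}
```

Because the swap happens inside the function, argument order no longer matters: both `intersectBySearch({4, 8, 1}, {9, 1, 4, 4, 7, 2})` and the reversed call return `{1, 4}`.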


Another method, of course, is hashing. Assume that looking up an element in a hash table takes O(1) time on average. Then inserting the elements of one array into a hash-based cache and checking each element of the other array against that cache takes O(i + j) in total.
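A minimal sketch of the hashing approach, assuming `std::unordered_set` as the cache and a hypothetical function name `intersectByHash`. Erasing a value as soon as it matches is just one simple way to report each common value only once.

```cpp
#include <unordered_set>
#include <vector>

// Sketch: cache one array in a hash set, then probe it with the other.
// With average O(1) insert/lookup this is O(i + j) expected time
// (the worst case degrades if many keys collide).
std::vector<int> intersectByHash(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::unordered_set<int> cache(a.begin(), a.end());
    std::vector<int> result;
    for (int x : b) {
        // erase() returns how many elements were removed (0 or 1 for a set),
        // so a value is pushed only the first time it matches
        if (cache.erase(x) > 0) result.push_back(x);
    }
    return result;
}
```

For example, `intersectByHash({1, 2, 2, 3}, {2, 3, 3, 4})` returns `{2, 3}`.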


Reference: http://www.geeksforgeeks.org/find-union-and-intersection-of-two-unsorted-arrays/


Things get more interesting when we come to multiple arrays. 


I found a good article about finding the common elements of three sorted arrays. I wasn't really expecting the first article I'd introduce to be from GeeksforGeeks, but I couldn't find a similar question on either LeetCode or LintCode. As usual, you get a more detailed presentation of the question and more discussion of the idea. On the other hand, the time complexity is not discussed very well.

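#include <iostream>
using namespace std;

// Prints the elements common to the sorted arrays ar1[], ar2[] and ar3[]
// of sizes n1, n2 and n3. Below, x = ar1[i], y = ar2[j], z = ar3[k].
void findCommon(int ar1[], int ar2[], int ar3[], int n1, int n2, int n3)
{
    int i = 0, j = 0, k = 0;
    while (i < n1 && j < n2 && k < n3)
    {
        // If x = y and y = z, print any of them and move ahead
        // in all arrays
        if (ar1[i] == ar2[j] && ar2[j] == ar3[k])
        {   cout << ar1[i] << " ";   i++; j++; k++; }

        // x < y
        else if (ar1[i] < ar2[j])
            i++;

        // y < z
        else if (ar2[j] < ar3[k])
            j++;

        // We reach here when x >= y and y >= z (not all equal),
        // i.e., z is the smallest
        else
            k++;
    }
}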

The key part of this solution is how it extends the method for finding the overlap of two sorted arrays to three arrays. The solution above finds and skips the smallest of the three current elements in a very elegant way:

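If all three current elements are equal, print the value and increment all three indices,

otherwise, if array1 < array2, just increment the index of array1,

otherwise, if array2 < array3, just increment the index of array2,

when none of the above conditions is true, array1 >= array2 >= array3 without all three being equal, so array3 holds the smallest current element and we just increment the index of array3.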

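There is no doubt that the complexity is O(i + j + k), where i, j, and k are the lengths of the three arrays: every iteration increments at least one index, and no index ever moves backward.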


Finally, a more general solution for n sorted arrays comes from Stack Overflow. The idea is to find the overlap of two arrays at a time: treat the result as a new (still sorted) array, find its overlap with a third array from the rest, and repeat until every array has been processed.


Reference: http://stackoverflow.com/questions/5630685/efficient-algorithm-to-produce-the-n-way-intersection-of-sorted-arrays-in-c
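The pairwise idea can be sketched in C++ as follows. This is an illustration, not the code from the Stack Overflow thread: `intersectTwo` and `intersectAll` are hypothetical names, and `intersectTwo` is the same two-pointer walk as `printIntersection`, collecting results instead of printing them. Because each intermediate result is a subsequence of a sorted array, it is itself sorted and can be fed straight back into the next two-pointer step.

```cpp
#include <cstddef>
#include <vector>

// Two-pointer intersection of two sorted arrays, collecting common elements.
std::vector<int> intersectTwo(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { out.push_back(a[i]); ++i; ++j; }
    }
    return out;
}

// Fold over the arrays: intersect the first two, then intersect that result
// with each remaining array. Stops early once the running result is empty.
std::vector<int> intersectAll(const std::vector<std::vector<int>>& arrays) {
    if (arrays.empty()) return {};
    std::vector<int> result = arrays[0];
    for (std::size_t k = 1; k < arrays.size() && !result.empty(); ++k)
        result = intersectTwo(result, arrays[k]);
    return result;
}
```

Each pass costs at most the length of the running result plus the length of the next array, and the running result is never longer than the first array, so passing the shortest array first keeps every pass cheap.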



http://www.geeksforgeeks.org/find-common-elements-three-sorted-arrays/



