Strings

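import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;

import com.google.common.annotations.GwtCompatible;
import com.google.common.annotations.VisibleForTesting;
import javax.annotation.Nullable;

@GwtCompatible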
public final class Strings {
  private Strings() {}

  /**
   * Returns the empty string if string is null; otherwise returns string itself.
   */
  public static String nullToEmpty(@Nullable String string) {
    return (string == null) ? "" : string;
  }

  /**
   * Returns null if string is null or empty; otherwise returns string itself.
   */
  @Nullable
  public static String emptyToNull(@Nullable String string) {
    return isNullOrEmpty(string) ? null : string;
  }

  /**
   * Returns true if string is null or empty; otherwise returns false.
   * The emptiness check here uses length() == 0, whereas Java 6 offers isEmpty().
   */
  public static boolean isNullOrEmpty(@Nullable String string) {
    return string == null || string.length() == 0; // string.isEmpty() in Java 6
  }

  /**
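   * Pads a string at the start.
   * @param string the string that forms the last part of the result
   * @param minLength the minimum length the result should have. May be zero or negative, in which case string itself is returned
   * @param padChar the padding character, inserted before string as many times as needed for the result to reach minLength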
   * @return the padded string
   */
  public static String padStart(String string, int minLength, char padChar) {
    checkNotNull(string);  
    if (string.length() >= minLength) {
      return string;
    }
    StringBuilder sb = new StringBuilder(minLength);
    for (int i = string.length(); i < minLength; i++) {
      sb.append(padChar);
    }
    sb.append(string);
    return sb.toString();
  }

  /**
   * Like padStart, except that padChar is appended after string.
   */
  public static String padEnd(String string, int minLength, char padChar) {
    checkNotNull(string);
    if (string.length() >= minLength) {
      return string;
    }
    StringBuilder sb = new StringBuilder(minLength);
    sb.append(string);
    for (int i = string.length(); i < minLength; i++) {
      sb.append(padChar);
    }
    return sb.toString();
  }

  /**
   * Repeats a string.
   * @param string any non-null string
   * @param count the number of repetitions; must be nonnegative
   * @return a string containing {@code string} repeated {@code count} times
   *     (the empty string if {@code count} is zero)
   * @throws IllegalArgumentException if {@code count} is negative
   */
  public static String repeat(String string, int count) {
    checkNotNull(string);  // eager for GWT.

    if (count <= 1) {
      checkArgument(count >= 0, "invalid count: %s", count);
      return (count == 0) ? "" : string;
    }

    final int len = string.length();
    final long longSize = (long) len * (long) count;
    final int size = (int) longSize;
    // If the resulting string would be too long, throw an exception right away.
    if (size != longSize) {
      throw new ArrayIndexOutOfBoundsException(
          "Required array size too large: " + longSize);
    }

    final char[] array = new char[size];
    // Copy the string into the array once. getChars calls the system array-copy routine directly:
    // System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    string.getChars(0, len, array, 0);
    int n;
    for (n = len; n < size - n; n <<= 1) {
      System.arraycopy(array, 0, array, n, n);
    }
    // Final copy: the number of chars still to fill is size - n.
    System.arraycopy(array, 0, array, n, size - n);
    return new String(array);
  }

  /**
   * Computes the common prefix of two strings.
   */
  public static String commonPrefix(CharSequence a, CharSequence b) {
    checkNotNull(a);
    checkNotNull(b);
    int maxPrefixLength = Math.min(a.length(), b.length());
    int p = 0;
    while (p < maxPrefixLength && a.charAt(p) == b.charAt(p)) {
      p++;
    }
    if (validSurrogatePairAt(a, p - 1) || validSurrogatePairAt(b, p - 1)) {
      p--;
    }
    return a.subSequence(0, p).toString();
  }

  /**
   * Same as commonPrefix, except that it computes the common suffix.
   */
  public static String commonSuffix(CharSequence a, CharSequence b) {
    checkNotNull(a);
    checkNotNull(b);

    int maxSuffixLength = Math.min(a.length(), b.length());
    int s = 0;
    while (s < maxSuffixLength
        && a.charAt(a.length() - s - 1) == b.charAt(b.length() - s - 1)) {
      s++;
    }
    if (validSurrogatePairAt(a, a.length() - s - 1)
        || validSurrogatePairAt(b, b.length() - s - 1)) {
      s--;
    }
    return a.subSequence(a.length() - s, a.length()).toString();
  }

  @VisibleForTesting
  static boolean validSurrogatePairAt(CharSequence string, int index) {
    return index >= 0 && index <= (string.length() - 2)
        && Character.isHighSurrogate(string.charAt(index))
        && Character.isLowSurrogate(string.charAt(index + 1));
  }
}
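
The copy loop in repeat is easier to follow in isolation. The sketch below is a standalone re-statement of that method, not Guava's class: the name RepeatSketch is hypothetical, and plain JDK exceptions stand in for checkNotNull and checkArgument. It keeps the invariant that array[0, n) is already filled, doubles n while the doubled block still fits, and then fills the remainder with a single copy.

```java
/** Standalone sketch of the doubling copy in repeat(); the class name is hypothetical. */
final class RepeatSketch {
  private RepeatSketch() {}

  static String repeat(String string, int count) {
    if (string == null) {
      throw new NullPointerException("string"); // stands in for checkNotNull
    }
    if (count < 0) {
      throw new IllegalArgumentException("invalid count: " + count); // stands in for checkArgument
    }
    if (count <= 1) {
      return (count == 0) ? "" : string;
    }

    final int len = string.length();
    final long longSize = (long) len * (long) count;
    final int size = (int) longSize;
    if (size != longSize) {
      // The product does not fit in an int, so no char[] of that size can exist.
      throw new ArrayIndexOutOfBoundsException("Required array size too large: " + longSize);
    }

    final char[] array = new char[size];
    string.getChars(0, len, array, 0); // array[0, len) now holds one copy

    int n;
    // Invariant: array[0, n) is filled. Double it while 2 * n is still below size.
    for (n = len; n < size - n; n <<= 1) {
      System.arraycopy(array, 0, array, n, n);
    }
    // Fill the remaining size - n chars from the already filled prefix.
    System.arraycopy(array, 0, array, n, size - n);
    return new String(array);
  }

  public static void main(String[] args) {
    System.out.println(repeat("ab", 5)); // prints "ababababab"
  }
}
```

For repeat("ab", 5) the filled prefix grows 2, 4, 8 and the last copy adds the final 2 chars, so three arraycopy calls replace four appends; the number of copies grows with the logarithm of count rather than with count itself.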

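The validSurrogatePairAt check in commonPrefix is there so that the result never ends in the middle of a surrogate pair. The standalone sketch below (PrefixSketch is a hypothetical name, and the null checks are omitted for brevity) repeats the two methods from the listing and runs them on two strings whose last characters share a high surrogate but differ in the low one.

```java
/** Standalone sketch of the surrogate-aware prefix logic; the class name is hypothetical. */
final class PrefixSketch {
  private PrefixSketch() {}

  // True if a well-formed high/low surrogate pair starts at index.
  static boolean validSurrogatePairAt(CharSequence string, int index) {
    return index >= 0
        && index <= (string.length() - 2)
        && Character.isHighSurrogate(string.charAt(index))
        && Character.isLowSurrogate(string.charAt(index + 1));
  }

  static String commonPrefix(CharSequence a, CharSequence b) {
    int maxPrefixLength = Math.min(a.length(), b.length());
    int p = 0;
    while (p < maxPrefixLength && a.charAt(p) == b.charAt(p)) {
      p++;
    }
    // If the char-by-char match stopped right after a high surrogate,
    // step back so the prefix does not end on half of a pair.
    if (validSurrogatePairAt(a, p - 1) || validSurrogatePairAt(b, p - 1)) {
      p--;
    }
    return a.subSequence(0, p).toString();
  }

  public static void main(String[] args) {
    String a = "a\uD83D\uDE00"; // 'a' followed by U+1F600
    String b = "a\uD83D\uDE01"; // 'a' followed by U+1F601
    // The char-level match covers 'a' and the shared high surrogate,
    // and the check then trims the dangling surrogate.
    System.out.println(commonPrefix(a, b).length()); // prints 1
  }
}
```

Without the step back, the result for these inputs would be two chars long and end in an unpaired high surrogate, which is not a valid piece of text on its own.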
 

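The padding contract described above (return the input unchanged when it is already long enough, otherwise add padChar until minLength is reached) can be checked with a few calls. The sketch below is standalone and only mirrors the two methods from the listing; PadSketch is a hypothetical name and the null check is left out.

```java
/** Standalone sketch of padStart/padEnd behavior; the class name is hypothetical. */
final class PadSketch {
  private PadSketch() {}

  static String padStart(String string, int minLength, char padChar) {
    if (string.length() >= minLength) {
      return string; // already long enough, including zero or negative minLength
    }
    StringBuilder sb = new StringBuilder(minLength);
    for (int i = string.length(); i < minLength; i++) {
      sb.append(padChar);
    }
    return sb.append(string).toString();
  }

  static String padEnd(String string, int minLength, char padChar) {
    if (string.length() >= minLength) {
      return string;
    }
    StringBuilder sb = new StringBuilder(minLength).append(string);
    for (int i = string.length(); i < minLength; i++) {
      sb.append(padChar);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(padStart("7", 3, '0'));    // prints "007"
    System.out.println(padStart("2010", 3, '0')); // prints "2010"
    System.out.println(padEnd("4.", 5, '0'));     // prints "4.000"
  }
}
```

Note that neither method truncates: a string longer than minLength comes back as is.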
Reposted from: https://www.cnblogs.com/lijia0511/p/5778291.html

### Definition and Implementation of Equivalent Strings in Programming

In programming, **equivalent strings** are two or more string values that are considered equal under specific conditions. These conditions may involve case sensitivity, encoding formats, whitespace handling, or other transformations such as normalization[^1]. The sections below explain how equivalent strings can be defined and implemented.

#### Case Sensitivity

Strings might be treated as equivalent regardless of their letter casing. In many languages, this comparison converts both strings to either uppercase or lowercase before comparing them.

```python
def are_strings_equivalent_case_insensitive(str1, str2):
    return str1.lower() == str2.lower()
```

This approach ensures that differences like 'A' vs 'a' do not affect equivalence checks[^3].

#### Encoding Formats

When dealing with different encodings (e.g., UTF-8, ASCII), decoding properly before comparison is essential. Incorrectly handled byte sequences can lead to mismatches even when the characters look identical.

```cpp
#include <codecvt>
#include <locale>
#include <string>
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
std::string utf8_str = converter.to_bytes(wide_char_string);
// Now compare `utf8_str` against another properly encoded string value.
```

Such conversions allow accurate comparisons across character sets[^4].

#### Normalization Forms

Unicode can represent certain symbols in more than one way, using either combining marks or precomposed forms. Establishing equivalence between texts that look alike but are represented differently requires applying a standard normalization first.

```java
import java.text.Normalizer;

public boolean checkNormalizedEquivalence(String s1, String s2) {
    return Normalizer.normalize(s1, Normalizer.Form.NFKC)
        .equals(Normalizer.normalize(s2, Normalizer.Form.NFKC));
}
```

Applying one of the NFC/NFD/NFKC/NFKD forms, as provided by the libraries of each environment, helps achieve consistent results when comparing text in complex scripts[^2].

#### Whitespace Handling

Ignoring leading and trailing spaces and collapsing redundant internal whitespace also helps define equality among text values.

```javascript
function trimAndCompare(a, b) {
  const trimmedA = a.replace(/\s+/g, ' ').trim();
  const trimmedB = b.replace(/\s+/g, ' ').trim();
  return trimmedA === trimmedB;
}
```

Here the regex collapses runs of whitespace that would otherwise prevent a match even though the meaningful content is the same.

---

Related questions:

1. How does Unicode normalization impact performance when checking for equivalent strings?
2. What techniques exist beyond simple case folding to handle locale-specific variations in string matching algorithms?
3. Can you provide examples where ignoring diacritical marks leads to incorrect conclusions about string similarity?
4. Are there built-in functions in popular databases that support normalized text search out of the box, without additional coding effort from developers?
5. What are the potential pitfalls when multithreaded applications concurrently modify shared mutable string objects that are later used to determine equivalence?