Eliminating Transformation Errors

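This article looks at the performance degradation that Transformation Errors cause while a session runs, and at where those errors come from, such as data type conversion failures and conflicting mapping logic. It also covers how setting a maximum error count and adjusting the tracing level can mitigate the problem.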

A session can generate a large number of Transformation Errors while it runs, and these errors slow it down.

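1 Why Transformation Errors degrade performance
  When many records hit Transformation Errors, the Integration Service slows down because it has to do the following extra work:
    1) Determine what caused the error
    2) Remove the offending record from the data flow
    3) Write the error record to the session log

2 When Transformation Errors occur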
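  Transformation Errors typically occur in the situations below. If the errors cluster in particular transformations, examine the constraints on those transformations carefully.
    1) Data type conversion errors
    2) Conflicting logic in the mapping
    3) Incorrectly configured conditions (for example, conditions that encounter NULL values)

3 The maximum error count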
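  You can cap the number of Transformation Errors that a session tolerates.
  If no maximum error count is set, the Integration Service keeps running until all data has been processed.
  If a maximum error count is set, the Integration Service stops the session once the number of errors reaches that limit.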
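
  The effect of the limit can be illustrated with a small Python simulation. This is only a sketch of the behavior described above, not Informatica code: `run_session`, `to_amount`, the `max_errors` parameter, and the sample rows are all hypothetical names made up for the illustration, and the sketch assumes the run stops as soon as the error count reaches the limit.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_session(
    rows: Iterable[dict],
    transform: Callable[[dict], dict],
    max_errors: Optional[int] = None,
) -> Tuple[List[dict], List[str], bool]:
    """Apply `transform` to each row and count the rows that fail.

    Returns (good_rows, error_log, stopped_early).
    max_errors=None means "no limit" (a convention of this sketch only).
    """
    good_rows: List[dict] = []
    error_log: List[str] = []
    for index, row in enumerate(rows):
        try:
            good_rows.append(transform(row))
        except (TypeError, ValueError) as exc:
            # The failing row is dropped from the flow and logged.
            error_log.append(f"row {index}: {exc}")
            if max_errors is not None and len(error_log) >= max_errors:
                # Limit reached: stop instead of processing the rest.
                return good_rows, error_log, True
    return good_rows, error_log, False


def to_amount(row: dict) -> dict:
    # float() raises ValueError for "abc" and TypeError for None.
    return {"id": row["id"], "amount": float(row["amount"])}


rows = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "abc"},
    {"id": 3, "amount": None},
    {"id": 4, "amount": "7"},
]

unlimited = run_session(rows, to_amount)               # processes every row
limited = run_session(rows, to_amount, max_errors=2)   # stops at the 2nd error
```

  Without a limit, all four rows are read and the two bad ones end up in the error log. With `max_errors=2`, the run ends at the second bad row and the fourth row is never processed.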

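4 Lowering the session tracing level
  When the data contains many records with Transformation Errors, lowering the session tracing level can also improve performance.
  A lower level writes less to the session log (mainly the error record details), which saves I/O time and shortens the overall run.
  This does not address the root cause of the slowdown, however, so it is not recommended.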
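
  Since a lower tracing level only trims the logging cost, the more durable fix is to keep bad records from reaching the failing transformation in the first place. The Python sketch below shows one way to think about that, using the two most common causes listed in section 2 (type conversion failures and NULL values). It runs outside Informatica, and `find_bad_amount`, `split_rows`, and the `amount` field are hypothetical names chosen for the example.

```python
from typing import Iterable, List, Optional, Tuple


def find_bad_amount(row: dict) -> Optional[str]:
    """Return a rejection reason for the row, or None if it is clean."""
    value = row.get("amount")
    if value is None:
        return "amount is NULL"
    try:
        float(value)  # same conversion the downstream step would attempt
    except (TypeError, ValueError):
        return f"amount {value!r} is not numeric"
    return None


def split_rows(rows: Iterable[dict]) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Separate clean rows from rows that would fail conversion."""
    clean: List[dict] = []
    rejected: List[Tuple[dict, str]] = []
    for row in rows:
        reason = find_bad_amount(row)
        if reason is None:
            clean.append(row)
        else:
            rejected.append((row, reason))
    return clean, rejected


source_rows = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "abc"},
    {"id": 3, "amount": None},
    {"id": 4, "amount": "7"},
]

clean_rows, rejected_rows = split_rows(source_rows)
for row, reason in rejected_rows:
    print(f"id {row['id']}: {reason}")
```

  Only `clean_rows` would then be handed to the conversion step, so the error path from section 1 is never triggered for the rejected rows.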
