e-s0bic1.exe

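This document provides detailed process knowledge base information, covering everything from system processes to application processes. Each process entry lists the process file name, description, publisher, whether it is a system process, and other details, to help readers understand what each process does and what role it plays in the system.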


Process Knowledge Base

e-s0bic1 - e-s0bic1.exe - Process Information

Process file: e-s0bic1 or e-s0bic1.exe
Process name: Epson Stylus C62 Series
 
Description:
e-s0bic1.exe is part of the Epson Stylus printer driver.

Publisher: Epson
Part of: Epson Stylus printer drivers

System process:
Background process:
Uses network:

Hardware related:
Common errors: Unknown (N/A)

Memory usage: Unknown (N/A)
 
Security rating (0-5): 0

Spyware:
Adware:
Virus:
Trojan:

Other Processes
 00thotkey.exe 12popup.exe 2portalmon.exe
 3capplnk.exe 3cdminic.exe 3cmcnkw.exe
 3cmlink.exe 3cmlinkw.exe 3cmlnkw.exe
 3deepctl.exe 3dfxman.exe 3dldemon.exe
 3dlman.exe 3dxsrv.exe 3qdctl.exe
 a4proxy.exe absr.exe abyssws.exe
 acbtnmgr_xxx.exe accelerate.exe access.exe
 acmonitor_xxx.exe acombo3d.exe aconti.exe
 acoustic.exe acroaum.exe acrobat elements.exe
 acrodist.exe act.exe actionagent.exe
 activation.exe activeeyes.exe activemenu.exe
 activeplus.exe ad-aware.exe ad-watch.exe
 ad2kclient.exe adblck.exe adc.exe
 addestroyer.exe addrbook.exe adg.exe
 AdManCtl.exe admillikeep.exe admilliserv.exe
 admlib32.exe admunch.exe Adobe Gamma Loader.exe
 adobes.exe adp.exe adsgone.exe
 adss.exe adstatserv.exe adsub.exe
 adtray.exe advapi.exe advchk.exe
 aeiwlsta.exe aelaunch.exe aexswdusr.exe
 agfaclnk.exe agsatellite.exe ahfp.exe
 ahnsd.exe ahqinit.exe ahqtb.exe
 aiepk.exe aiepk2.exe aimaol.exe
 aimingclick.exe airplus.exe airsvcu.exe
 akiller.exe alarm.app.exe AlarmApp.exe
 alarmwatcher.exe ALCMTR.EXE ALCWZRD.EXE
 alcxmntr.exe aldaemon.exe alertserver.exe
 AlertSvc.exe almxptray.exe ALUNotify.exe
 am32.exe amoumain.exe anote.exe
 AOLServiceHost.exe aoltpspd.exe aoltray.exe
 aoltsmon.exe aornum.exe ap9h4qmo.exe
 Apache.exe appservices.exe aptezbp.exe
 apvxdwin.exe AquariumDesktop.exe armon32.exe
 armon32a.exe arupld32.exe ashDisp.exe
 ashmaisv.exe ashserv.exe asmproserver.exe
 aspi_me.exe aspnet_admin.exe asupport.exe
 aswupdsv.exe athoc.exe ati2mdxx.exe
 ati2plab.exe aticwd32.exe ATIDtct.EXE
 atitask.exe atwtusb.exe ausvc.exe
 autoreg.exe autorun.exe autoupdate.exe
 avgamsvr.exe avgcc.exe avgcc32.exe
 avgemc.exe avgnt.exe avguard.exe
 avpm.exe avsched32.exe avwupsrv.exe
 awhost32.exe awrem32.exe babylon.exe
 BackItUp.exe backupnotify.exe bagent.exe
 bass.exe bboy.exe bcb.exe
 bcmwltry.exe bdmcon.exe bdss.exe
 bearshare.exe benetns.exe bengine.exe
 benser.exe beremote.exe beserver.exe
 bigfix.exe billmind.exe bkupexec.exe
 blocker.exe bmupdate.exe boot.exe
 bp.exe bpftp.exe bpk.exe
 bpsinstall.exe brad32.exe brmecom.exe
 bsoft.exe btstackserver.exe bttray.exe
 bwgo0000 cagent.exe cagent32.exe
 calcheck.exe CAP2LAK.EXE CAP2RSK.EXE
 CAP2SWK.EXE capfax.exe cc.exe
 cclaw.exe ccmagent.exe cconnect.exe
 ccsetmgr.exe cd_load.exe cdf.exe
 cdspeed.exe ceekey.exe cepmtray.exe
 cgtask.exe chkadmin.exe chkras.exe
 choke.exe cij3p2ps.exe client.exe
 clipsrv.exe clonecdtray.exe cmgrdian.exe
 cmluc.exe cnqmax.exe CnxMon.exe
 comctl_32.exe comet.exe command.exe
 compaq-rba.exe conf.exe config.exe
 conmgr.exe ConquerCam.exe consol32.exe
 cool.exe CoverDesigner.exe cpdclnt.exe
 cpqa1000.exe cpqdfwag.exe cpqeadm.exe
 cpqinet.exe Cpqset.exe cpunumber.exe
 createcd.exe createcd50.exe crsss.exe
 csinject.exe csinsm32.exe csinsmnt.exe
 Csrrs.exe csrsc.exe csrss32.exe
 ct_load.exe ctbclick.exe ctdvddet.exe
 cteaxspl.exe ctfmon32.exe ctrlvol.exe
 ctsrreg.exe ctsysvol.exe cusrvc.exe
 cuteftp.exe cutftp.exe cyb2k.exe
 cygrunsrv.exe cz.exe d4.exe
 daconfig.exe daemon.exe datalayer.exe
 ddhelper32.exe de_serv.exe defscangui.exe
 delldmi.exe dellmmkb.exe delmsbb.exe
 desk98.exe DeskAdKeep.exe DeskAdServ.exe
 dexplore.exe diagent.exe dialer.exe
 directx.exe directxset.exe dla.exe
 dlgli.exe dlt.exe dluca.exe
 dmremote.exe dmxlauncher.exe dnar.exe
 dnetc.exe dns.exe download.exe
 downloadplus.exe dragdrop.exe dreamweaver.exe
 drgtodsc.exe drivespeed.exe drvddll.exe
 drvlsnr.exe DRWTSN16.EXE dsagnt.exe
 dseraser.exe dslagent.exe dslmon.exe
 dsnthapp.exe dsnthser.exe dvdlauncher.exe
 DVDRegionFree.exe dvldr32.exe dvremind.exe
 DWHeartbeatMonitor.exe DxDebugService.exe DXEnum.exe
 dxnf.exe e-s0bic1.exe e_s0hic1.exe
 e_srcv03.exe eabservr.exe EasyAV.exe
 ebrr.exe edisk.exe edonkey.exe
 ee.exe ehmsas.exe ehrec.exe
 ehSched.exe ehshell.exe ehtray.exe
 elbycheck.exe elccest.exe emule.exe
 enbiei.exe encmontr.exe engutil.exe
 ensmix32.exe enternet.exe essdc.exe
 eudora.exe eusexe.exe EvtEng.exe
 expl32.exe explorer32.exe explorere.exe
 express.exe exshow95.exe ezejmnap.exe
 ezulumain.exe fameh32.exe fan.exe
 farmmext.exe fastdown.exe faxsvc.exe
 fbdirect.exe fc.exe fch32.exe
 fgadmin.exe fih32.exe finder.exe
 flashfxp.exe flashksk.exe flatbed.exe
 fnrb32.exe FONTVIEW.EXE forte.exe
 fpdisp4.exe fpxpress.exe frameworkservic.exe
 freedom.exe frontpage.exe frsk.exe
 fs20.exe fsaa.exe fsav32.exe
 fsbwlan.exe fsdfwd.exe fsg.exe
 fsg_3202.exe fsgk32.exe fsgk32st.exe
 fsm32.exe fsma32.exe fsmb32.exe
 fsscrctl.exe fssm32.exe fsw.exe
 ftpte.exe fts.exe fwenc.exe
 fxredir.exe gah95on6.exe gain_trickler_3202.exe
 gbtray.exe gcASCleaner.exe gcasDtServ.exe
 gcasInstallHelper.exe gcASNotice.exe gcasServ.exe
 gcasServAlert.exe gcasSWUpdater.exe gdonkey.exe
 gesfm32.exe gfxacc.exe Ghostexp.exe
 GHOSTS~2.EXE ghoststartservice.exe ghoststarttrayapp.exe
 giantantispywaremain.exe GIANTAntiSpywareUpdater.exe gnetmous.exe
 gnotify.exe go.exe GoogleDesktop.exe
 gozilla.exe gra.exe graph.exe
 GrpWise.exe gsicon.exe gstartup.exe
 gtwatch.exe gwmdmpi.exe gwsystemservice.exe
 hcontrol.exe helpexp.exe HelpHost.exe
 helpsvc.exe hhw.exe hidden32.exe
 hjym.exe hkserv.exe hkss.exe
 hkwnd.exe hotkeyapp.exe hotsync.exe
 hottray.exe hpbpro.exe hpdrv.exe
 hphmon03.exe hphmon04.exe hphmon06.exe
 hphupd04.exe hphupd05.exe hphupd06.exe
 hpnra.exe hpobnz08.exe hpoddt01.exe
 hpodev07.exe hpoevm07.exe hpohmr08.exe
 hpoopm07.exe hposol08.exe hpqcmon.exe
 hpqgalry.exe hpsjvxd.exe hpwuschd.exe
 HPWuSchd2.exe hpzstatn.exe hpztsb03.exe
 hpztsb09.exe hpztsb10.exe htmdeng.exe
 hypertrm.exe i8kfangui.exe iaanotif.exe
 iaantmon.exe ibmpmsvc.exe iconfig.exe
 icqlite.exe icsmgr.exe icwconn2.exe
 icwtutor.exe iexpiore.exe iexplore32.exe
 iFrmewrk.exe igfxsrvc.dll ImageDrive.exe
 IMApp.exe IMEKRMIG.EXE imjpmig.exe
 imonnt.exe imontray.exe imscinst.exe
 incd.exe InCDsrv.exe IncMail.exe
 incredimail.exe inetd32.exe InfoTool.exe
 inicio.exe initsdk.exe inotask.exe
 IntelMEM.exe internet.exe ipclient.exe
 ipssvc.exe ireike.exe isignup.exe
 islp2sta.exe ismserv.exe isstart.exe
 itouch.exe iwctrl.exe ixapplet.exe
 JAMMER2ND.EXE javaws.exe jetcar.exe
 jucheck.exe jushed.exe jushed32.exe
 kavsvc.exe kazaalite.exe KB891711.EXE
 kencapi.exe kencli.exe kencron.exe
 kendns.exe kenftpgw.exe keninet.exe
 kenmail.exe kenmap.exe kenproxy.exe
 kenserv.exe kensocks.exe kentbcli.exe
 kernal32.exe keyhook.exe keylogger.exe
 keyword.exe KHALMNPR.exe khooker.exe
 kmw_run.exe kodakccs.exe kodakimage.exe
 kodakprv.exe kodorjan.exe kpf4gui.exe
 lao.exe launch.exe launchap.exe
 launcher.exe launchpd.exe leerlaufprozess
 lexplore.exe lexstart.exe lights.exe
 lmgrd.exe lmpdpsrv.exe load32.exe
 logitray.exe logon.exe lorena.exe
 LSAS.exe Lsass32.exe lsassa.exe
 lsasss.exe lsserv.exe ltcm000c.exe
 ltdmgr.exe ltmoh.exe ltmsg.exe
 lxdboxcp.exe main.exe mainserv.exe
 manager.exe mapiicon.exe master.exe
 matcli.exe mathchk.exe mbm4.exe
 mbm5.exe mc.exe mcagent.exe
 mcappins.exe mcdlc.exe McEPOC.exe
 McEPOCfg.exe mcinfo.exe mcmnhdlr.exe
 mcpalmcfg.exe mcpserver.exe mcupdate.exe
 mcvsshld.exe McWCE.exe McWCECfg.exe
 mediaaccess.exe MediaAccK.exe mediaman.exe
 mediapass.exe mediapassk.exe members-area.exe
 memorymeter.exe messenger.exe mgactrl.exe
 mgaqdesk.exe mgasc.exe mgavrtcl.exe
 mgui.exe mhotkey.exe microsoft.exe
 mim.exe minibug.exe minilog.exe
 mirc.exe MIRC32.exe mm_server.exe
 mmdiag.exe mmtray.exe mmtray2k.exe
 mmtraylsi.exe mmups.exe mmusbkb2.exe
 mnsvc.exe mnyexpr.exe monitor.exe
 monitr32.exe morpheus.exe moviemk.exe
 movieplace.exe Mozilla.exe mp3serch.exe
 mpbtn.exe mpf.exe mpfagent.exe
 mpfservice.exe mpftray.exe mpservic.exe
 mpsetup.exe mqtgsvc.exe msaccess.exe
 msams.exe msc32.exe mscifapp.exe
 mscnsz.exe mscommand.exe msconfig32.exe
 mscvb32.exe MSD.EXE mse7.exe
 msg32.exe msgloop.exe msgplus.exe
 mskagent.exe msmgs.exe msndc.exe
 MSNIASVC.EXE msoffice.exe mspmspv.exe
 mspub.exe msqry32.exe msscli.exe
 mssearch.exe msstat.exe mssvr.exe
 mstore.exe MSupdate.exe msvcmm32.exe
 mtx.exe muamgr.exe musirc4.71.exe
 mwd.exe mxoaldr.exe mxtask.exe
 myfastupdate.exe mysqld-nt.exe nail.exe
 naimag32.exe navapp.exe nbj.exe
 nbr.exe nclaunch.exe nddeagnt.exe
 NDSTray.exe neo.exe neoCapture.exe
 neoCopy.exe neoDVD.exe neoDVDstd.exe
 neotrace.exe nero.exe nerosmartstart.exe
 nerosvc.exe netmail.exe netsurf.exe
 newdot.exe newsupd.exe ngctw32.exe
 nilaunch.exe NIP.exe nipsvc.exe
 NJeeves.exe nkvmon.exe noads.exe
 notify.exe npfmntor.exe NPFMSG.exe
 npscheck.exe npssvc.exe NRMENCTB.exe
 nscheck.exe nsl.exe NSMdtr.exe
 NSMdtr.exe nsvr.exe nsvsvc.exe
 nt_usdm.exe ntaskldr.exe ntfrs.exe
 ntmulti.exe ntrtscan.exe NVCOAS.exe
 NVCPL.EXE NVCSched.exe nvmctray
 nvsvc.exe nwtray.exe Nymse.exe
 ocxdll.exe oeloader.exe ois.exe
 Olehelp.exe Omniserv.exe onetouch.exe
 oodag.exe opera.exe opware12.exe
 owmngr.exe P2P Networking2.exe P2P Networking3.exe
 pacis.exe PACKAGER.EXE packethsvc.exe
 pav.exe pavfires.exe pavsrv51.exe
 pcard.exe pccclient.exe pccguide.exe
 pccnt.exe pccntmon.exe pccntupd.exe
 PcCtlCom.exe pcfmgr.exe pchbutton.exe
 pcscan.exe PCTVoice.exe pdsched.exe
 PDVDServ.exe persfw.exe pg_ctl.exe
 pgptray.exe phbase.exe photoshop.exe
 picsvr.exe pinball.exe pkjobs.exe
 plauto.exe player.exe plguni.exe
 pmr.exe pmxinit.exe pop3pack.exe
 popupkiller.exe postgres.exe postmaster.exe
 pow.exe powerdvd.exe powerkey.exe
 powerpnt.exe powers.exe ppmemcheck.exe
 pptd40nt.exe pptview.exe ppwebcap.exe
 pqhelper.exe PQIBrowser.exe PQIMountSvc.exe
 pqinit.exe pqtray.exe PQV2ISECURITY.EXE
 pqv2isvc.exe printnow.exe PRISMSTA.EXE
 PRISMSVR.EXE profiler.exe proflwiz.exe
 pruttct.exe psdrvcheck.exe psimsvc.exe
 pssvc.exe pts.exe ptssvc.exe
 pull.exe PureVoice.exe pvlsvr.exe
 qbdagent2002.exe qbupdate.exe qclean.exe
 qconsvc.exe qctray.exe qcwlicon.exe
 qdcsfs.exe Qoeloader.exe qtaet2s.exe
 qtask.exe qtzgacer.exe QuickBooks
 quickdcf.exe quickres.exe quicktimeplayer.exe
 qvp32.exe Radio.exe RadioSvr.EXE
 randomdigits.exe rapapp.exe rasman.exe
 RAVMOND.exe rcapi.exe rds.exe
 reader_sl.exe realjbox.exe realmon.exe
 realpopup.exe realshed.exe register.exe
 regloadr.exe regsrv.exe RegSrvc.exe
 regsvc32.exe remind.exe Remind_XP.exe
 remind32.exe reminder.exe removed.exe
 remupd.exe retrorun.exe rftray.exe
 rlid.exe RM_SV.exe rosnmgr.exe
 rsrcmtr.exe rtlrack.exe rtmanager.exe
 rtmc.exe rtmservice.exe rtos.exe
 rtvscn95.exe runservice.exe s24evmon.exe
 s3tray2.exe sage.exe saimon.exe
 saproxy.exe sbdrvdet.exe sbserv.exe
 sbsetup.exe scanexplicit.exe ScanMailOutLook.exe
 scanregistry.exe scanserver.exe scards32.exe
 scardsvr32.exe scbar.exe sccenter.exe
 scchost.exe SCHDPL32.EXE schedhlp.exe
 schedul2.exe scheduler.exe schedulerv2.exe
 schost.exe schupd.exe scm.exe
 scrfs.exe scsiaccess.exe sdii.exe
 sdstat.exe se.exe searchnav.exe
 searchnavversion.exe sentstrt.exe service5.exe
 sessmgr.exe sethook.exe seti@home.exe
 setlang.exe sgbhp.exe sgmain.exe
 shadowbar.exe sharedprem.exe shell32.exe
 shine.exe shpc32.exe shstart.exe
 shwicon.exe silent.exe SIMETER.EXE
 sistray.exe sisusbrg.exe sixtypopsix.exe
 ska.exe skynetave.exe Skype.exe
 slee401.exe sllights.exe slpv24s.exe
 slserv.exe sm56hlpr.exe smagent.exe
 smartagt.exe SmartExplorer.exe SmartFTP.exe
 SMax4.exe SMax4PNP.exe SMceMan.exe
 smlogsvc.exe smOutlookPack.exe sms.exe
 smsmon32.exe smsss.exe smsx.exe
 smtray.exe sndrec32.exe sniffer.exe
 snmptrap.exe soffice.exe sointgr.exe
 SonicStageMonitoring.exe soundtrax.exe spamsub.exe
 SpamSubtract.exe SPBBCSvc.exe speedmgr.exe
 speedupmypc.exe splash.exe spool.exe
 spoolsrv.exe SpoolSvc.exe sptip.dll
 spvic.exe spyagent4.exe spyblast.exe
 spybotsd.exe spybuddy.exe spysweeper.exe
 spyware.exe sqlagent.exe sqlmangr.exe
 sqlservr.exe srv32.exe ss.exe
 ssgrate.exe sshd.exe ssonsvr.exe
 sstray.exe Stacmon.exe StatusClient.exe
 supporter5.exe surveyor.exe suss.exe
 svaplayer.exe svchoost.exe svchos1.exe
 svchosl.exe svcinit.exe svcproc.exe
 svhost.exe SwiftBTN.exe swimsuitnetwork.exe
 sws.exe sxgdsenu.exe symlcsvc.exe
 SymSPort.exe synchost.exe sysagent.exe
 syscfg32.exe syscnfg.exe sysformat.exe
 syshost.exe syslog.exe sysmon.exe
 sysmonnt.exe sysreg.exe SYSsfitb.exe
 systask32l.exe systemdll.exe systime.exe
 systray32.exe Szchost.exe tapicfg.exe
 task32.exe taskbar.exe taskpanl.exe
 taskswitch.exe tbctray.exe TBMon.exe
 tbpanel.exe tc.exe termsrv.exe
 testing.exe tfncky.exe tfnf5.exe
 The Weather Channel.exe thotkey.exe timershot.exe
 timeup.exe tintsetp.exe tmksrvi.exe
 tmksrvu.exe tmpfw.exe toadimon.exe
 topdesk.exe toscdspd.exe TouchED.exe
 tp4ex.exe tp4mon.exe tp4serv.exe
 tphkmgr.exe TPONSCR.exe TpScrex.exe
 tpsmain.exe TPTray.exe tpwrtray.exe
 tray.exe trayclnt.exe traymon.exe
 traymonitor.exe traysaver.exe trayserver.exe
 tskdbg.exe tskmgr32.exe tsl2.exe
 tsp2.exe tstool.exe tv_media.exe
 twunk_64.exe uc.exe udserve.exe
 unins000.exe unsecapp.exe upd.exe
 upgrade.exe ups.exe usb.exe
 usbmmkbd.exe usbmonit.exe usrmlnka.exe
 utilman.exe v2iconsole.exe vaserv.exe
 vettray.exe vi_grm.exe videodrv.exe
 view.exe viewport.exe virtualbouncer.exe
 visio.exe vmnat.exe vmss.exe
 vobregcheck.exe vsmain.exe vsserv.exe
 vssvc.exe VzFw.exe w3dbsmgr.exe
 watch.exe watchdog.exe WaveEdit.exe
 wbload.exe wbsched.exe wbss.exe
 wbutton.exe wcesmgr.exe weather.exe
 webcamrt.exe webcolct.exe webinstall.exe
 webtrapnt.exe welcome.exe wfxctl32.exe
 wfxsnt40.exe wfxswtch.exe whagent.exe
 whSurvey.exe win32api.exe WinAce.exe
 winadm.exe winadserv.exe winadslave.exe
 Winaw32.exe winbackup.exe WinCinemaMgr.exe
 wincomm.exe wincomp.exe winctlad.exe
 WinCtlAdAlt.exe winde.exe windefault.exe
 winex.exe winfs.exe wingate.exe
 winhelp.exe winhlp32.exe winhost.exe
 wininfo.exe winkey.exe winlog.exe
 winmgm32.exe winmysqladmin.exe Winpack.exe
 winproj.exe winproxy.exe winpsd.exe
 winpup32.exe winrecon.exe winroute.exe
 wins.exe winserv.exe winservad.exe
 winservices.exe winservs.exe winservsuit.exe
 winsocks.exe winsrv32.exe winstat.exe
 winsys.exe winsys32.exe wintask.exe
 wintasks.exe winvnc.exe winwan.exe
 wiseupdt.exe wkdetect.exe wkfud.exe
 wkqkpick.exe wkscal.exe wkssb.exe
 wlansta.exe wmburn.exe wmencagt.exe
 wmexe.exe wmiapsrv.exe WMPBurn.exe
 wntsf.exe workflow.exe wp.exe
 wptel.exe wsys.exe wuloader.exe
 wupdated.exe ww.exe x10nets.exe
 x1exec.exe xcommsvr.exe xferwan.exe
 xtcfgloader.exe ymsgr_tray.exe ydownloader.exe
 YServer.exe yupdater.exe Zanda.exe
 zbase32.exe zcast.exe zClientm.exe
 ZLH.exe ZStatus.exe


Other Process Categories:
- System process list
- Processes with security risks
- Application process list

 

 

 

©Copyright 2000-2005 小土豆. All rights reserved.
  

C:\Users\IT\PyCharmMiscProject\.venv\Scripts\python.exe C:\Users\IT\PyCharmMiscProject\数模\1.py
Number of mid-pregnancy records: 208
==================================================
Scenario 1: static association model (mid-pregnancy data)
==================================================
Descriptive statistics of Y染色体浓度 (Y-chromosome concentration):
count    208.000000
mean       0.077614
std        0.033415
min        0.020269
25%        0.049692
50%        0.075392
75%        0.105557
max        0.172270
Name: Y染色体浓度, dtype: float64
Number of missing values in Y染色体浓度: 0
Correlation matrix:
            Y染色体浓度      检测孕周     孕妇BMI        年龄
Y染色体浓度   1.000000 -0.139445 -0.146805 -0.114173
检测孕周     -0.139445  1.000000  0.025377 -0.062409
孕妇BMI      -0.146805  0.025377  1.000000 -0.039744
年龄         -0.114173 -0.062409 -0.039744  1.000000
Multiple linear regression model summary:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Y染色体浓度   R-squared:                       0.057
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     4.076
Date:                Fri, 05 Sep 2025   Prob (F-statistic):            0.00772
Time:                        20:38:02   Log-Likelihood:                 418.35
No. Observations:                 208   AIC:                            -828.7
Df Residuals:                     204   BIC:                            -815.4
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2193      0.041      5.396      0.000       0.139       0.299
检测孕周      -0.0032      0.002     -2.109      0.036      -0.006      -0.000
孕妇BMI       -0.0017      0.001     -2.178      0.031      -0.003      -0.000
年龄          -0.0012      0.001     -1.892      0.060      -0.002    4.97e-05
==============================================================================
Omnibus:                       18.192   Durbin-Watson:                   1.330
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.476
Skew:                           0.207   Prob(JB):                       0.0238
Kurtosis:                       2.169   Cond. No.                         839.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
==================================================
Scenario 2: dynamic association model (mixed-effects model)
==================================================
Number of pregnant women with multiple records: 229
Number of records in the longitudinal dataset: 569
Distribution of records per pregnant woman:
count
2    118
3    111
Name: count, dtype: int64
C:\Users\IT\PyCharmMiscProject\.venv\Lib\site-packages\statsmodels\regression\mixed_linear_model.py:2237: ConvergenceWarning: The MLE may be on the boundary of the parameter space.
  warnings.warn(msg, ConvergenceWarning)
Mixed-effects model summary:
         Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: Y染色体浓度
No. Observations: 569     Method:             REML
No. Groups:       229     Scale:              0.0004
Min. group size:  2       Log-Likelihood:     1191.8935
Max. group size:  3       Converged:          Yes
Mean group size:  2.5
-------------------------------------------------------
           Coef.  Std.Err.    z    P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept  0.106    0.026   4.150  0.000  0.056  0.156
检测孕周   0.003    0.000  12.814  0.000  0.003  0.003
孕妇BMI   -0.002    0.001  -2.746  0.006 -0.003 -0.000
年龄      -0.001    0.001  -1.456  0.145 -0.002  0.000
Group Var  0.001    0.006
=======================================================
==================================================
Model coefficients and significance
==================================================
Multiple linear regression coefficients:
const      0.219270
检测孕周   -0.003179
孕妇BMI    -0.001650
年龄       -0.001185
dtype: float64
Multiple linear regression p-values:
const      1.879409e-07
检测孕周    3.617355e-02
孕妇BMI     3.053769e-02
年龄        5.985923e-02
dtype: float64
Mixed-effects model fixed-effect coefficients:
Intercept    0.105902
检测孕周      0.002996
孕妇BMI      -0.001704
年龄         -0.000822
dtype: float64
Mixed-effects model fixed-effect p-values:
Intercept    3.318273e-05
检测孕周      1.365837e-37
孕妇BMI       6.026270e-03
年龄          1.453525e-01
Group Var    1.605283e-11
dtype: float64
Traceback (most recent call last):
  File "C:\Users\IT\PyCharmMiscProject\数模\1.py", line 176, in <module>
    'R²': [model.rsquared, mixed_model.rsquared],
                           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\IT\PyCharmMiscProject\.venv\Lib\site-packages\statsmodels\base\wrapper.py", line 34, in __getattribute__
    obj = getattr(results, attr)
          ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'MixedLMResults' object has no attribute 'rsquared'

Process finished with exit code 1
This is the output from running the code I posted above.
09-06
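The traceback above is raised because `rsquared` exists on statsmodels OLS results but not on `MixedLMResults`. Mixed models also have no single agreed-upon R². One minimal workaround is sketched below, under the assumption that a plain squared-error R² of the fitted values is acceptable for the comparison table. The helper uses `rsquared` when the results object has it. Otherwise it computes 1 - SS_res/SS_tot from `fittedvalues`. For MixedLM these fitted values include the predicted random effects, so the figure is roughly analogous to a conditional R² rather than a fixed-effects-only one. `r_squared` and `model_r2` are hypothetical helper names, not statsmodels API.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Squared-error R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def model_r2(result, y_true=None):
    """Return result.rsquared when the results object defines it (OLS does).

    Otherwise fall back to a pseudo-R^2 from result.fittedvalues; this is the
    case for MixedLMResults, whose missing rsquared caused the AttributeError.
    """
    if hasattr(result, "rsquared"):
        return float(result.rsquared)
    if y_true is None:
        # statsmodels models keep the response they were fitted on as .endog
        y_true = result.model.endog
    return r_squared(y_true, result.fittedvalues)
```

Line 176 of the script could then read `'R²': [model_r2(model), model_r2(mixed_model)]`. Because the two models are fitted on different datasets (208 vs. 569 records), the two numbers describe different samples and are not strictly comparable.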
C:\Users\1\PycharmProjects\pythonProject1\.venv\Scripts\python.exe C:\Users\1\PycharmProjects\pythonProject1\1.2text.py
ANOVA warning: 'PandasData' object has no attribute 'design_info'
Test-set R² score: 0.115
Test-set RMSE: 0.030
Significance test of the regression equation (F-test)
F statistic: 12.881
p-value: 0.0000
Conclusion: p-value < 0.05, the regression equation is statistically significant
Failed to generate the ANOVA table
Linear regression model summary:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Y染色体浓度   R-squared:                       0.056
Model:                            OLS   Adj. R-squared:                  0.052
Method:                 Least Squares   F-statistic:                     12.88
Date:                Wed, 29 Oct 2025   Prob (F-statistic):           3.46e-12
Time:                        10:36:18   Log-Likelihood:                 2170.8
No. Observations:                1082   AIC:                            -4330.
Df Residuals:                    1076   BIC:                            -4300.
Df Model:                           5
Covariance Type:            nonrobust
=================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
孕妇BMI               0.0069      0.004      1.744      0.081      -0.001       0.015
检测孕周_天数        -0.0010      0.000     -2.144      0.032      -0.002   -8.51e-05
孕妇BMI^2            -0.0002   5.77e-05     -2.758      0.006      -0.000   -4.59e-05
孕妇BMI 检测孕周_天数  1.534e-05   1.14e-05      1.350      0.177   -6.96e-06    3.77e-05
检测孕周_天数^2      2.771e-06   1.34e-06      2.065      0.039    1.38e-07     5.4e-06
const                0.0422      0.075      0.563      0.573      -0.105       0.189
==============================================================================
Omnibus:                       88.267   Durbin-Watson:                   0.828
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              121.286
Skew:                           0.657   Prob(JB):                     4.60e-27
Kurtosis:                       3.983   Cond. No.                     1.23e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.23e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
Pearson correlation coefficients:
孕妇BMI: correlation=-0.151, p-value=0.0000
检测孕周_天数: correlation=0.124, p-value=0.0000
孕妇BMI^2: correlation=-0.157, p-value=0.0000
孕妇BMI 检测孕周_天数: correlation=0.061, p-value=0.0445
检测孕周_天数^2: correlation=0.130, p-value=0.0000
Sample predictions (actual vs predicted):
Actual: 0.088, Predicted: 0.077
Actual: 0.044, Predicted: 0.077
Actual: 0.084, Predicted: 0.081
Actual: 0.094, Predicted: 0.082
Actual: 0.033, Predicted: 0.078
10-30
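The "'PandasData' object has no attribute 'design_info'" message is what `statsmodels.stats.anova.anova_lm` reports when the model was built as `sm.OLS(y, X)` on a DataFrame. `anova_lm` reads the patsy `design_info`, which is attached only to models created through the formula interface (`statsmodels.formula.api.ols`). Refitting through that interface is probably the most direct fix.

If the script has to keep the array interface, the drop-one partial F tests can be reproduced with plain NumPy least squares, as sketched below. This is an assumed re-implementation, not statsmodels code, and `drop_one_anova` is a hypothetical name. Each term is tested by refitting without that single column. For a single-column term this F equals the square of its t statistic: t = -2.144 for 检测孕周_天数 gives F ≈ 4.60. If p-values are needed, `scipy.stats.f.sf(F, 1, df_resid)` provides them.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def drop_one_anova(X, y, names):
    """Partial F test for each predictor column of X (an intercept is added here).

    sum_sq is the increase in residual sum of squares when that one column is
    dropped, and F = sum_sq / (RSS_full / df_resid) with 1 numerator df.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    full = np.column_stack([np.ones(n), X])
    rss_full = _rss(full, y)
    df_resid = n - (k + 1)
    mse = rss_full / df_resid
    table = {}
    for j, name in enumerate(names):
        reduced = np.delete(full, j + 1, axis=1)  # j + 1 skips the intercept column
        sum_sq = _rss(reduced, y) - rss_full
        table[name] = {"sum_sq": sum_sq, "df": 1, "F": sum_sq / mse}
    table["Residual"] = {"sum_sq": rss_full, "df": df_resid, "F": float("nan")}
    return table
```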
C:\Users\1\PycharmProjects\pythonProject1\.venv\Scripts\python.exe C:\Users\1\PycharmProjects\pythonProject1\text.py
Data shape: (1082, 3)
Missing-value check:
孕妇BMI          0
Y染色体浓度      0
检测孕周_天数    0
dtype: int64
Multicollinearity check (VIF):
    VIF Factor      features
0  2393.479493     孕周_天数
1  2250.903973           BMI
2  1583.605206      孕周_BMI
3  1540.640016   BMI_squared
4   438.475081  孕周_squared
Regression results:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Y染色体浓度   R-squared:                       0.056
Model:                            OLS   Adj. R-squared:                  0.052
Method:                 Least Squares   F-statistic:                     12.88
Date:                Mon, 27 Oct 2025   Prob (F-statistic):           3.46e-12
Time:                        22:01:59   Log-Likelihood:                 2170.8
No. Observations:                1082   AIC:                            -4330.
Df Residuals:                    1076   BIC:                            -4300.
Df Model:                           5
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.0422      0.075      0.563      0.573      -0.105       0.189
孕周_天数      -0.0010      0.000     -2.144      0.032      -0.002   -8.51e-05
BMI             0.0069      0.004      1.744      0.081      -0.001       0.015
孕周_BMI     1.534e-05   1.14e-05      1.350      0.177   -6.96e-06    3.77e-05
BMI_squared    -0.0002   5.77e-05     -2.758      0.006      -0.000   -4.59e-05
孕周_squared 2.771e-06   1.34e-06      2.065      0.039    1.38e-07     5.4e-06
==============================================================================
Omnibus:                       88.267   Durbin-Watson:                   0.828
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              121.286
Skew:                           0.657   Prob(JB):                     4.60e-27
Kurtosis:                       3.983   Cond. No.                     1.23e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.23e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
ANOVA table:
                sum_sq      df         F    PR(>F)
孕周_天数      0.004895     1.0  4.596593  0.032259
BMI            0.003239     1.0  3.041276  0.081458
孕周_BMI       0.001940     1.0  1.821341  0.177438
BMI_squared    0.008099     1.0  7.604892  0.005920
孕周_squared   0.004541     1.0  4.264244  0.039162
Residual       1.145897  1076.0       NaN       NaN
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 27531 (\N{CJK UNIFIED IDEOGRAPH-6B8B}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 24046 (\N{CJK UNIFIED IDEOGRAPH-5DEE}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 20998 (\N{CJK UNIFIED IDEOGRAPH-5206}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 24067 (\N{CJK UNIFIED IDEOGRAPH-5E03}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 25311 (\N{CJK UNIFIED IDEOGRAPH-62DF}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 21512 (\N{CJK UNIFIED IDEOGRAPH-5408}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 20540 (\N{CJK UNIFIED IDEOGRAPH-503C}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:88: UserWarning: Glyph 22270 (\N{CJK UNIFIED IDEOGRAPH-56FE}) missing from font(s) DejaVu Sans.
  plt.tight_layout()
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 27531 (\N{CJK UNIFIED IDEOGRAPH-6B8B}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 24046 (\N{CJK UNIFIED IDEOGRAPH-5DEE}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 20998 (\N{CJK UNIFIED IDEOGRAPH-5206}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 24067 (\N{CJK UNIFIED IDEOGRAPH-5E03}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 25311 (\N{CJK UNIFIED IDEOGRAPH-62DF}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 21512 (\N{CJK UNIFIED IDEOGRAPH-5408}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 20540 (\N{CJK UNIFIED IDEOGRAPH-503C}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\PycharmProjects\pythonProject1\text.py:89: UserWarning: Glyph 22270 (\N{CJK UNIFIED IDEOGRAPH-56FE}) missing from font(s) DejaVu Sans.
  plt.savefig('model_diagnostics.png')
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 27531 (\N{CJK UNIFIED IDEOGRAPH-6B8B}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 24046 (\N{CJK UNIFIED IDEOGRAPH-5DEE}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 20998 (\N{CJK UNIFIED IDEOGRAPH-5206}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 24067 (\N{CJK UNIFIED IDEOGRAPH-5E03}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 25311 (\N{CJK UNIFIED IDEOGRAPH-62DF}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 21512 (\N{CJK UNIFIED IDEOGRAPH-5408}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 20540 (\N{CJK UNIFIED IDEOGRAPH-503C}) missing from font(s) DejaVu Sans.
  func(*args)
C:\Users\1\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py:862: UserWarning: Glyph 22270 (\N{CJK UNIFIED IDEOGRAPH-56FE}) missing from font(s) DejaVu Sans.
  func(*args)
Traceback (most recent call last):
  File "C:\Users\1\PycharmProjects\pythonProject1\text.py", line 161, in <module>
    plot_3d_surface(model, df)
  File "C:\Users\1\PycharmProjects\pythonProject1\text.py", line 113, in plot_3d_surface
    z = model.predict(pred_data).reshape(week_grid.shape)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1\PycharmProjects\pythonProject1\.venv\Lib\site-packages\pandas\core\generic.py", line 6321, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'reshape'. Did you mean: 'shape'?

Process finished with exit code 1
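The final traceback shows that `model.predict(pred_data)` returned a pandas Series, and a Series has no `.reshape` method. Converting the predictions to a NumPy array first fixes that line: `z = np.asarray(model.predict(pred_data)).reshape(week_grid.shape)`, or equivalently `.to_numpy()`. A minimal sketch of the whole grid step follows. It assumes the model is wrapped in a callable `predict(bmi, days)` that accepts flattened grid columns. `surface_grid` and the ranges in the test are hypothetical and are not taken from text.py.

```python
import numpy as np


def surface_grid(predict, bmi_range, days_range, n=50):
    """Evaluate predict(bmi, days) on an n x n grid for a 3-D surface plot.

    Returns (bmi_grid, days_grid, z), each of shape (n, n).
    """
    bmi_vals = np.linspace(bmi_range[0], bmi_range[1], n)
    day_vals = np.linspace(days_range[0], days_range[1], n)
    bmi_grid, days_grid = np.meshgrid(bmi_vals, day_vals)
    # A fitted model predicts one row per point, so pass the grid as flat 1-D columns.
    preds = predict(bmi_grid.ravel(), days_grid.ravel())
    # statsmodels returns a pandas Series when the model was fitted on a DataFrame;
    # np.asarray turns a Series, list or ndarray into an ndarray, which has .reshape.
    z = np.asarray(preds, dtype=float).reshape(bmi_grid.shape)
    return bmi_grid, days_grid, z
```

When `predict` wraps a statsmodels model, the DataFrame passed to `model.predict` must contain every column the model was fitted on. That includes derived columns such as `孕周_BMI`, `BMI_squared` and `孕周_squared`, not just the two grid variables.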
10-28
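Two things in the text.py output point to the same cause: the VIF values between about 438 and 2393, and the note "The condition number is large, 1.23e+06" in both regression summaries. Raw gestational days and BMI are strongly correlated with their own squares and with their product. A common remedy, assumed here rather than taken from the original scripts, is to mean-centre each variable before forming the squared and interaction terms. For a full quadratic model with an intercept, this changes what the linear coefficients mean (they become slopes at the mean) but not the fitted values.

The NumPy sketch below computes VIF as 1 / (1 - R_j^2), regressing each column on the others plus an intercept. Because it always includes an intercept, its numbers need not match the table above exactly. `vif` and `quadratic_features` are hypothetical helper names.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X.

    Column j is regressed on all other columns plus an intercept, and
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def quadratic_features(days, bmi, center=True):
    """Columns [days, bmi, days*bmi, bmi^2, days^2] (the order of the VIF table),
    optionally built from mean-centred days and BMI."""
    days = np.asarray(days, dtype=float)
    bmi = np.asarray(bmi, dtype=float)
    if center:
        days = days - days.mean()
        bmi = bmi - bmi.mean()
    return np.column_stack([days, bmi, days * bmi, bmi ** 2, days ** 2])
```

The test below checks this on synthetic, independent values. These are not the study data. The uncentred columns give VIFs far above 10, while the centred ones stay close to 1.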
C:\ProgramData\anaconda3\python.exe "C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py" 918条数据的年龄分箱分布: Age_bin 0 0 1 80 2 585 3 253 4 0 Name: count, dtype: int64 5110条数据的年龄分箱分布: Age_bin 0 966 1 1204 2 1564 3 1190 4 186 Name: count, dtype: int64 412条数据的年龄分箱分布: Age_bin 0 0 1 69 2 257 3 86 4 0 Name: count, dtype: int64 胆固醇分箱分布(标签0-1-2): Cholesterol_bin 0 318 1 237 2 363 Name: count, dtype: int64 胆固醇分箱分布(标签0-1-2): Cholesterol_bin 0 23 1 39 2 252 Name: count, dtype: int64 Sex ChestPainType FastingBS RestingECG ExerciseAngina ST_Slope HeartDisease RestingBP_bin MaxHR_bin Oldpeak_bin Age_bin Cholesterol_bin 0 M ATA 0 Normal N Up 0 1级高血压 高(85-100%) 无压低 2 2 1 F NAP 0 Normal N Flat 1 2级高血压 高(85-100%) 轻度压低 2 0 2 M ATA 0 ST N Up 0 升高前期 低(<60%) 无压低 1 2 3 F ASY 0 Normal Y Flat 1 1级高血压 中(60-85%) 中度压低 2 1 4 M NAP 0 Normal N Up 0 2级高血压 中(60-85%) 无压低 2 0 .. .. ... ... ... ... ... ... ... ... ... ... ... 913 M TA 0 Normal N Flat 1 正常 中(60-85%) 中度压低 2 2 914 M ASY 1 Normal N Flat 1 2级高血压 高(85-100%) 严重压低 3 0 915 M ASY 0 Normal Y Flat 1 升高前期 中(60-85%) 中度压低 2 0 916 F ATA 0 LVH N Flat 1 升高前期 极高(>100%) 无压低 2 1 917 M NAP 0 Normal N Up 0 1级高血压 高(85-100%) 无压低 1 0 [918 rows x 12 columns] ID N_Days Status Drug Sex Ascites Hepatomegaly Spiders Edema Bilirubin Albumin Stage Bilirubin_bin Albumin_bin Copper_bin Alk_Phos_bin SGOT_bin Tryglicerides_bin Platelets_bin Prothrombin_bin Age_bin Cholesterol_bin 0 1 400 D D-penicillamine F Y Y Y Y 14.5 2.60 4 显著升高 降低 升高 升高 升高 升高 正常 正常 2 2 1 2 4500 C D-penicillamine F N Y Y N 1.1 4.14 3 正常 正常 升高 升高 升高 正常 正常 正常 2 2 2 3 1012 D D-penicillamine M N N N S 1.4 3.48 4 轻度升高 降低 升高 升高 升高 正常 正常 正常 3 0 3 4 1925 D D-penicillamine F N Y Y S 1.8 2.54 4 轻度升高 降低 升高 升高 升高 正常 正常 正常 2 2 4 5 1504 CL Placebo F N Y Y N 3.4 3.53 3 显著升高 正常 升高 升高 升高 正常 减少 正常 1 2 .. ... ... ... ... .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
407 414 681 D D-penicillamine F N Y N N 1.2 2.96 3 正常 降低 升高 升高 正常 升高 正常 正常 3 NaN
408 415 1103 C D-penicillamine F N Y N N 0.9 3.83 4 正常 正常 升高 升高 正常 升高 正常 正常 1 NaN
409 416 1055 C D-penicillamine F N Y N N 1.6 3.42 3 轻度升高 降低 升高 升高 正常 升高 减少 正常 2 NaN
410 417 691 C D-penicillamine F N Y N N 0.8 3.75 3 正常 正常 升高 升高 正常 升高 正常 正常 2 NaN
411 418 976 C D-penicillamine F N Y N N 0.7 3.29 4 正常 降低 升高 升高 正常 升高 正常 正常 2 NaN

[412 rows x 22 columns]
     id Sex hypertension heart_disease ever_married work_type Residence_type smoking_status stroke glucose_bin Age_bin bmi_bin
0    9046 M 0 1 Yes Private Urban formerly smoked 1 糖尿病 3 肥胖
1    51676 F 0 0 Yes Self-employed Rural never smoked 1 糖尿病 3 偏瘦
2    31112 M 0 1 Yes Private Rural never smoked 1 偏高 4 肥胖
3    60182 F 0 0 Yes Private Urban smokes 1 糖尿病前期 2 肥胖
4    1665 F 1 0 Yes Self-employed Rural never smoked 1 糖尿病前期 3 正常
...  ... .. ... ... ... ... ... ... ... ... ... ...
5105 18234 F 1 0 Yes Private Urban never smoked 0 正常 4 NaN
5106 44873 F 0 0 Yes Self-employed Urban never smoked 0 偏高 4 肥胖
5107 19723 F 0 0 Yes Self-employed Rural never smoked 0 正常 1 肥胖
5108 37544 M 0 0 Yes Private Rural formerly smoked 0 糖尿病前期 2 超重
5109 44679 F 0 0 Yes Govt_job Urban Unknown 0 正常 2 超重

[5110 rows x 12 columns]
处理后的数据形状:(918, 12)(无NaN)
处理后的数据形状:(5110, 12)(无NaN)
处理后的数据形状:(412, 22)(无NaN)
Main already

处理 heart 数据集中的稀有类别(目标变量: HeartDisease)

处理 stroke 数据集中的稀有类别(目标变量: stroke)

处理 cirrhosis 数据集中的稀有类别(目标变量: Stage)
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:271: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['Oldpeak_bin'] = df['Oldpeak_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:271: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['Oldpeak_bin'] = df['Oldpeak_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:275: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['RestingBP_bin'] = df['RestingBP_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:275: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['RestingBP_bin'] = df['RestingBP_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:279: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['ST_Slope'] = df['ST_Slope'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:283: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['ChestPainType'] = df['ChestPainType'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:290: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['glucose_bin'] = df['glucose_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:290: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['glucose_bin'] = df['glucose_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:293: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['bmi_bin'] = df['bmi_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:293: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['bmi_bin'] = df['bmi_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:296: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['smoking_status'] = df['smoking_status'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:303: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['Bilirubin_bin'] = df['Bilirubin_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:303: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['Bilirubin_bin'] = df['Bilirubin_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:306: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['Albumin_bin'] = df['Albumin_bin'].replace(
C:\Users\Diwith\Daily_Project\2023ES2\YataiBei_Bayesian Network.py:306: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['Albumin_bin'] = df['Albumin_bin'].replace(
Config already
已为 heart 构建贝叶斯网络

[heart] 模型CPD验证:

节点 ChestPainType 的CPD:
+------------------+----------+
| ChestPainType(0) | 0.542339 |
+------------------+----------+
| ChestPainType(1) | 0.190188 |
+------------------+----------+
| ChestPainType(2) | 0.221102 |
+------------------+----------+
| ChestPainType(3) | 0.046371 |
+------------------+----------+

节点 HeartDisease 的CPD:
+-----------------+-----+---------------------+
| Age_bin         | ... | Age_bin(3)          |
+-----------------+-----+---------------------+
| ChestPainType   | ... | ChestPainType(3)    |
+-----------------+-----+---------------------+
| RestingBP_bin   | ... | RestingBP_bin(3)    |
+-----------------+-----+---------------------+
| ST_Slope        | ... | ST_Slope(2)         |
+-----------------+-----+---------------------+
| HeartDisease(0) | ... | 0.6628959276018099  |
+-----------------+-----+---------------------+
| HeartDisease(1) | ... | 0.33710407239819007 |
+-----------------+-----+---------------------+

节点 ST_Slope 的CPD:
+-------------+-----------+
| ST_Slope(0) | 0.0689964 |
+-------------+-----------+
| ST_Slope(1) | 0.513889  |
+-------------+-----------+
| ST_Slope(2) | 0.417115  |
+-------------+-----------+
已为 stroke 构建贝叶斯网络

[stroke] 模型CPD验证:

节点 hypertension 的CPD:
+-----------------+-----------+
| hypertension(0) | 0.905076  |
+-----------------+-----------+
| hypertension(1) | 0.0949244 |
+-----------------+-----------+

节点 stroke 的CPD:
+---------------+-----+------------------+
| Age_bin       | ... | Age_bin(4)       |
+---------------+-----+------------------+
| glucose_bin   | ... | glucose_bin(3)   |
+---------------+-----+------------------+
| heart_disease | ... | heart_disease(1) |
+---------------+-----+------------------+
| hypertension  | ... | hypertension(1)  |
+---------------+-----+------------------+
| stroke(0)     | ... | 0.5              |
+---------------+-----+------------------+
| stroke(1)     | ... | 0.5              |
+---------------+-----+------------------+

节点 glucose_bin 的CPD:
+----------------+-----------+
| glucose_bin(0) | 0.60798   |
+----------------+-----------+
| glucose_bin(1) | 0.227794  |
+----------------+-----------+
| glucose_bin(2) | 0.0777208 |
+----------------+-----------+
| glucose_bin(3) | 0.0865056 |
+----------------+-----------+
已为 cirrhosis 构建贝叶斯网络

[cirrhosis] 模型CPD验证:

节点 Bilirubin_bin 的CPD:
+------------------+----------+
| Bilirubin_bin(0) | 0.446411 |
+------------------+----------+
| Bilirubin_bin(1) | 0.26647  |
+------------------+----------+
| Bilirubin_bin(2) | 0.287119 |
+------------------+----------+

节点 Stage 的CPD:
+---------------+-----+------------------+
| Age_bin       | ... | Age_bin(3)       |
+---------------+-----+------------------+
| Albumin_bin   | ... | Albumin_bin(1)   |
+---------------+-----+------------------+
| Ascites       | ... | Ascites(Y)       |
+---------------+-----+------------------+
| Bilirubin_bin | ... | Bilirubin_bin(2) |
+---------------+-----+------------------+
| Stage(1)      | ... | 0.25             |
+---------------+-----+------------------+
| Stage(2)      | ... | 0.25             |
+---------------+-----+------------------+
| Stage(3)      | ... | 0.25             |
+---------------+-----+------------------+
| Stage(4)      | ... | 0.25             |
+---------------+-----+------------------+

节点 Albumin_bin 的CPD:
+----------------+----------+
| Albumin_bin(0) | 0.477876 |
+----------------+----------+
| Albumin_bin(1) | 0.522124 |
+----------------+----------+
已成功保存 heart 网络结构图
已成功保存 stroke 网络结构图
已成功保存 cirrhosis 网络结构图
[heart] 拟合度计算失败: Missing columns in data. Can't find values for the following variables: set()
[heart] 推理失败样本过多(184条),无有效预测结果
[stroke] 拟合度计算失败: Missing columns in data. Can't find values for the following variables: set()
[stroke] 推理失败样本过多(1022条),无有效预测结果
[cirrhosis] 拟合度计算失败: Missing columns in data. Can't find values for the following variables: set()
[cirrhosis] 推理失败样本过多(83条),无有效预测结果

进程已结束,退出代码为 0

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
import networkx as nx
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from pgmpy.metrics import log_likelihood_score
from pgmpy.estimators import BicScore
from sklearn.model_selection import train_test_split
from pgmpy.estimators import BayesianEstimator

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

df_heart = pd.read_csv('heart_se.csv')
df_stroke = pd.read_csv('stroke_se.csv')
df_cirrhosis = pd.read_csv('cirrhosis_se.csv')

df_stroke.columns = ['id', 'Sex', 'Age', 'hypertension', 'heart_disease', 'ever_married', 'work_type',
                     'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'stroke']
df_stroke['Sex'] = df_stroke['Sex'].apply(lambda x: 'M' if x == 'Male' else 'F')
df_cirrhosis['Age'] = (df_cirrhosis['Age'] / 365).astype(int)
df_stroke['Age'] = df_stroke['Age'].astype(int)


def preprocess_cirrhosis(df):
    """Convert high-cardinality features of the cirrhosis dataset into low-cardinality categorical variables."""
    # 1. Cholesterol: binned by clinical standard (mg/dL)
    #    normal: <200; borderline high: 200-239; high: >=240 (clinical reference)
    # df['Cholesterol_bin'] = pd.cut(
    #     df['Cholesterol'],
    #     bins=[0, 200, 240, 1000],
    #     labels=['正常', '临界高值', '高胆固醇血症']
    # )

    # 2. Bilirubin: reflects the liver's excretory function (mg/dL)
    #    normal: <1.2; mildly elevated: 1.2-3.0; markedly elevated: >3.0
    df['Bilirubin_bin'] = pd.cut(
        df['Bilirubin'],
        bins=[-np.inf, 1.2, 3.0, np.inf],
        labels=['正常', '轻度升高', '显著升高']
    )

    # 3. Albumin: marker of the liver's synthetic function (g/dL)
    #    normal: 3.5-5.0; low: <3.5 (typical of cirrhosis)
    df['Albumin_bin'] = pd.cut(
        df['Albumin'],
        bins=[-np.inf, 3.5, 5.0, np.inf],
        labels=['降低', '正常', '升高']  # "elevated" has little clinical significance; a candidate for merging
    )

    # 4. Copper: urinary copper excretion (μg/day)
    #    normal: <50; elevated: >=50 (marker of Wilson's disease)
    df['Copper_bin'] = pd.cut(
        df['Copper'],
        bins=[-np.inf, 50, np.inf],
        labels=['正常', '升高']
    )

    # 5. Alkaline phosphatase (Alk_Phos): hepatobiliary disease marker (U/L)
    #    normal range: 40-150; elevated: >150
    df['Alk_Phos_bin'] = pd.cut(
        df['Alk_Phos'],
        bins=[-np.inf, 150, np.inf],
        labels=['正常', '升高']
    )

    # 6. Aminotransferase (SGOT): hepatocellular injury marker (U/mL)
    #    normal: <40; elevated: >=40
    df['SGOT_bin'] = pd.cut(
        df['SGOT'],
        bins=[-np.inf, 40, np.inf],
        labels=['正常', '升高']
    )

    # 7. Triglycerides (Tryglicerides): blood lipid marker (mg/dL)
    #    normal: <150; elevated: >=150
    df['Tryglicerides_bin'] = pd.cut(
        df['Tryglicerides'],
        bins=[-np.inf, 150, np.inf],
        labels=['正常', '升高']
    )

    # 8. Platelets: marker of hypersplenism in cirrhosis (×10^9/L)
    #    normal: 150-450; low: <150
    df['Platelets_bin'] = pd.cut(
        df['Platelets'],
        bins=[-np.inf, 150, np.inf],
        labels=['减少', '正常']
    )

    # 9. Prothrombin time: liver synthetic function (seconds)
    #    normal: 11-13; prolonged: >13
    df['Prothrombin_bin'] = pd.cut(
        df['Prothrombin'],
        bins=[-np.inf, 13, np.inf],
        labels=['正常', '延长']
    )

    # Drop the raw high-cardinality variables and keep the binned ones
    df = df.drop(
        columns=['Copper', 'Alk_Phos', 'SGOT', 'Tryglicerides', 'Platelets', 'Prothrombin'],
        errors='ignore'
    )
    return df


df_stroke['glucose_bin'] = pd.cut(
    df_stroke['avg_glucose_level'],
    bins=[0, 100, 140, 200, 1000],
    labels=['正常', '偏高', '糖尿病前期', '糖尿病']
)

# Preprocess cirrhosis (binning + type conversion)
df_cirrhosis = preprocess_cirrhosis(df_cirrhosis)


def preprocess_heart(df):
    """Convert high-cardinality features of the heart-disease dataset into low-cardinality categorical variables."""
    # 2. Resting blood pressure (RestingBP): binned per hypertension guidelines
    #    normal: <120; elevated: 120-129; stage 1 hypertension: 130-139; stage 2 hypertension: >=140
    df['RestingBP_bin'] = pd.cut(
        df['RestingBP'],
        bins=[0, 120, 130, 140, 300],
        labels=['正常', '升高前期', '1级高血压', '2级高血压']
    )

    # 3. Cholesterol: binned per dyslipidemia criteria
    #    normal: <200; borderline high: 200-239; hypercholesterolemia: >=240
    # df['Cholesterol_bin'] = pd.cut(
    #     df['Cholesterol'],
    #     bins=[0, 200, 240, 1000],
    #     labels=['正常', '临界高值', '高胆固醇血症']
    # )

    # 4. Maximum heart rate (MaxHR): binned as a percentage of the age-predicted maximum
    #    predicted maximum heart rate = 220 - age
    df['MaxHR_pct'] = df['MaxHR'] / (220 - df['Age'])
    df['MaxHR_bin'] = pd.cut(
        df['MaxHR_pct'],
        bins=[0, 0.6, 0.85, 1.0, 2.0],
        labels=['低(<60%)', '中(60-85%)', '高(85-100%)', '极高(>100%)']
    )
    df.drop(columns=['MaxHR_pct'], inplace=True)  # drop the temporary column

    # 5. ST depression (Oldpeak): reflects the degree of myocardial ischemia
    #    none: <=0; mild: 0-1; moderate: 1-2; severe: >2
    df['Oldpeak_bin'] = pd.cut(
        df['Oldpeak'],
        bins=[-0.1, 0, 1, 2, 10.0],
        labels=['无压低', '轻度压低', '中度压低', '严重压低']
    )

    # Drop the raw high-cardinality variables and keep the binned ones
    df = df.drop(
        columns=['RestingBP', 'MaxHR', 'Oldpeak'],
        errors='ignore'
    )
    return df


df_heart = preprocess_heart(df_heart)

# Convert the target variables to categorical (so pgmpy can handle them)
df_heart['HeartDisease'] = df_heart['HeartDisease'].astype('category')
df_stroke['stroke'] = df_stroke['stroke'].astype('category')
df_cirrhosis['Stage'] = df_cirrhosis['Stage'].astype('category')


def discretize_age_uniform(df, age_col='Age'):
    """
    Apply one age binning to all three datasets, with numeric labels 0-4 covering ages 0.05 to 82.
    Binning logic: 20-year intervals, balancing medical age groups against data coverage.
    Labels: 0-4 (a smaller number means younger).
    """
    df['Age_bin'] = pd.cut(
        df[age_col],
        bins=[-0.1, 19, 39, 59, 79, 120],  # covers every age from 0.05 to 82
        labels=[0, 1, 2, 3, 4],  # 0: children/adolescents; 1: young adults; 2: middle-aged; 3: elderly; 4: very elderly
        include_lowest=True  # include the minimum value (0.05 years)
    )

    # Check the binned distribution (make sure no interval is empty)
    print(f"{df.shape[0]}条数据的年龄分箱分布:")
    print(df['Age_bin'].value_counts().sort_index())  # print in 0-4 order
    return df


df_heart = discretize_age_uniform(df_heart)  # heart disease (ages 28-77)
df_stroke = discretize_age_uniform(df_stroke)  # stroke (ages 0.05-82)
df_cirrhosis = discretize_age_uniform(df_cirrhosis)  # cirrhosis (ages 26-78)

# After binning, drop the raw Age column and keep the binned column (numeric labels)
df_heart.drop(columns=['Age'], inplace=True)
df_stroke.drop(columns=['Age'], inplace=True)
df_cirrhosis.drop(columns=['Age'], inplace=True)


def discretize_cholesterol(df, cholesterol_col='Cholesterol'):
    """
    Cholesterol binning: one standard with labels 0-1-2, covering the cholesterol range of both datasets.
    Binning logic (medical standard):
    - 0: normal (<200 mg/dL)
    - 1: borderline high (200-239 mg/dL)
    - 2: high (>=240 mg/dL)
    """
    if cholesterol_col not in df.columns:
        print("数据集中无胆固醇列,跳过分箱")
        return df

    df['Cholesterol_bin'] = pd.cut(
        df[cholesterol_col],
        bins=[-1, 199, 239, 10000],  # covers both datasets' cholesterol range (max up to 3e6; 10000 is enough)
        labels=[0, 1, 2],  # numeric labels (0: normal, 1: borderline, 2: high)
        include_lowest=True
    )

    # Verify the binned distribution (make sure no bin is empty)
    print(f"胆固醇分箱分布(标签0-1-2):")
    print(df['Cholesterol_bin'].value_counts().sort_index())  # sort in 0-1-2 order

    # Drop the raw cholesterol column and keep the binned column
    df.drop(columns=[cholesterol_col], inplace=True, errors='ignore')
    return df


df_heart = discretize_cholesterol(df_heart, cholesterol_col='Cholesterol')
df_cirrhosis = discretize_cholesterol(df_cirrhosis, cholesterol_col='Cholesterol')


def discretize_stroke_continuous(df):
    # BMI binning (18.5/24/28 are the Chinese adult cut-offs; WHO uses 25/30)
    df['bmi_bin'] = pd.cut(
        df['bmi'],
        bins=[-1, 18.5, 24, 28, 100],
        labels=['偏瘦', '正常', '超重', '肥胖']
    )
    # Blood glucose binning (same logic as before)
    df['glucose_bin'] = pd.cut(
        df['avg_glucose_level'],
        bins=[0, 100, 140, 200, 1000],
        labels=['正常', '偏高', '糖尿病前期', '糖尿病']
    )
    df.drop(columns=['bmi', 'avg_glucose_level'], inplace=True)
    return df


df_stroke = discretize_stroke_continuous(df_stroke)
# inner_join = pd.merge(df_heart, df_stroke, on=['Age'])
print(df_heart)
print(df_cirrhosis)
print(df_stroke)


# Force NaN removal after all preprocessing
def drop_or_fill_nan(df):
    # First fill NaNs: mode for categorical/object columns, median otherwise
    for col in df.columns:
        if df[col].isna().any():
            if df[col].dtype == 'category' or df[col].dtype == 'object':
                df[col] = df[col].fillna(df[col].mode()[0])
            else:
                df[col] = df[col].fillna(df[col].median())
    # If any NaN remains, drop those rows (keep the data clean)
    df = df.dropna()
    print(f"处理后的数据形状:{df.shape}(无NaN)")
    return df


df_heart = drop_or_fill_nan(df_heart)
df_stroke = drop_or_fill_nan(df_stroke)
df_cirrhosis = drop_or_fill_nan(df_cirrhosis)


# Heart disease: replace Chinese labels such as Oldpeak_bin and RestingBP_bin
def replace_heart_labels(df):
    # Oldpeak_bin: 0-3 (none → mild → moderate → severe)
    df['Oldpeak_bin'] = df['Oldpeak_bin'].replace(
        {'无压低': 0, '轻度压低': 1, '中度压低': 2, '严重压低': 3}
    ).astype(int)
    # RestingBP_bin: 0-3 (normal → elevated → stage 1 → stage 2)
    df['RestingBP_bin'] = df['RestingBP_bin'].replace(
        {'正常': 0, '升高前期': 1, '1级高血压': 2, '2级高血压': 3}
    ).astype(int)
    # ST_Slope: 0-2 (Down → Flat → Up)
    df['ST_Slope'] = df['ST_Slope'].replace(
        {'Down': 0, 'Flat': 1, 'Up': 2}
    ).astype(int)
    # ChestPainType: 0-3 (ASY → ATA → NAP → TA)
    df['ChestPainType'] = df['ChestPainType'].replace(
        {'ASY': 0, 'ATA': 1, 'NAP': 2, 'TA': 3}
    ).astype(int)
    return df


# Stroke: replace glucose_bin, bmi_bin, etc.
def replace_stroke_labels(df):
    df['glucose_bin'] = df['glucose_bin'].replace(
        {'正常': 0, '偏高': 1, '糖尿病前期': 2, '糖尿病': 3}
    ).astype(int)
    df['bmi_bin'] = df['bmi_bin'].replace(
        {'偏瘦': 0, '正常': 1, '超重': 2, '肥胖': 3}
    ).astype(int)
    df['smoking_status'] = df['smoking_status'].replace(
        {'never smoked': 0, 'formerly smoked': 1, 'smokes': 2, 'Unknown': 3}
    ).astype(int)
    return df


# Cirrhosis: replace Bilirubin_bin, etc.
def replace_cirrhosis_labels(df):
    df['Bilirubin_bin'] = df['Bilirubin_bin'].replace(
        {'正常': 0, '轻度升高': 1, '显著升高': 2}
    ).astype(int)
    df['Albumin_bin'] = df['Albumin_bin'].replace(
        {'降低': 0, '正常': 1, '升高': 2}
    ).astype(int)
    # Other labels can be replaced with 0-1-2 in the same way
    return df


# Apply the replacements
df_heart = replace_heart_labels(df_heart)
df_stroke = replace_stroke_labels(df_stroke)
df_cirrhosis = replace_cirrhosis_labels(df_cirrhosis)


#----------------------------------------------------------------------------------------------------
def build_bayesian_network(data, file_name,
                           shared_variables=['age', 'gender'],  # keep parameter names consistent
                           local_shared_vars=None,  # new parameter for local shared variables
                           included_features=None,
                           custom_edges=None,
                           forbidden_edges=None):
    """Build a Bayesian network for one dataset, supporting multiple shared variables and custom prior knowledge."""
    if data is None or data.empty:
        print(f"无有效数据,无法为 {file_name} 构建模型")
        return None

    # Lower-case the shared variable names (for case-insensitive matching)
    shared_variables = [var.lower() for var in shared_variables]

    # If a feature list is given, restrict the data to it
    if included_features:
        # Make sure every shared variable is included
        for var in shared_variables:
            # Find the matching column name in the data (case-insensitive)
            matched_col = next((col for col in data.columns if col.lower() == var), None)
            if matched_col and matched_col not in included_features:
                included_features.append(matched_col)
        data = data[included_features]

    # Define the network structure
    edges = []

    # Add edges for each shared variable
    # for shared_var in shared_variables:
    #     # Find the matching column name in the data
    #     matched_col = next((col for col in data.columns if col.lower() == shared_var), None)
    #     if matched_col:
    #         for feature in data.columns:
    #             if feature != matched_col:
    #                 edges.append((matched_col, feature))  # shared variable → other feature

    # Revised version of the shared-variable edge generation above:
    for shared_var in shared_variables:
        matched_col = next((col for col in data.columns if col.lower() == shared_var), None)
        if matched_col:
            # Match target variables case-insensitively
            target_vars = [col for col in data.columns if col.lower() in ['stroke', 'heartdisease']]  # compare in lower case
            for feature in target_vars:
                if feature != matched_col and (matched_col, feature) not in edges:
                    edges.append((matched_col, feature))  # force a shared variable → target edge

    # Add custom edges (domain knowledge)
    if custom_edges:
        edges.extend(custom_edges)

    # Remove forbidden edges (domain knowledge)
    if forbidden_edges:
        edges = [edge for edge in edges if edge not in forbidden_edges]

    # Create the Bayesian network
    model = BayesianNetwork(edges)

    # Learn the parameters with BayesianEstimator (BDeu prior, equivalent sample size 10)
    model.fit(data, estimator=BayesianEstimator, prior_type="BDeu", equivalent_sample_size=10)
    print(f"已为 {file_name} 构建贝叶斯网络")
    return model


# Visualize the Bayesian network
def visualize_network(model, file_name):
    """
    Visualize the Bayesian network structure, handling layout failures robustly.
    """
    if model is None:
        print(f"警告: {file_name} 模型为空,无法可视化")
        return

    # Copy the graph as an undirected graph (simplifies the layout computation)
    G = model.to_undirected() if hasattr(model, 'to_undirected') else model.copy()

    # Strictly filter out invalid edges
    valid_edges = []
    for u, v in G.edges():
        if u != v:  # exclude self-loops
            valid_edges.append((u, v))

    # Clear invalid edges and check that some edges remain
    G.remove_edges_from(list(G.edges()))
    G.add_edges_from(valid_edges)

    if not G.edges():
        print(f"错误: {file_name} 网络中没有有效边(可能所有边都是自循环),无法可视化")
        return

    # Try several layout algorithms in order of preference
    try:
        # Preferred: force-directed layout (suits sparse graphs)
        pos = nx.spring_layout(G, k=0.3, iterations=100, seed=42)
    except:
        try:
            # Fallback: Kamada-Kawai layout (suits dense graphs; it takes no seed argument)
            pos = nx.kamada_kawai_layout(G)
        except:
            try:
                # Fallback: circular layout
                pos = nx.circular_layout(G)
            except:
                # Last resort: random layout
                pos = nx.random_layout(G, seed=42)
                print(f"警告: {file_name} 网络使用随机布局,可能效果不佳")

    # Check whether node positions overlap (distance below a threshold)
    MIN_DISTANCE = 0.01
    positions = list(pos.values())
    has_overlap = False
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = np.sqrt(((positions[i][0] - positions[j][0]) ** 2) + ((positions[i][1] - positions[j][1]) ** 2))
            if dist < MIN_DISTANCE:
                has_overlap = True
                break
        if has_overlap:
            break

    # If positions overlap, add a small random perturbation
    if has_overlap:
        print(f"警告: {file_name} 网络节点位置存在重叠,添加随机扰动")
        for node in pos:
            pos[node] = (
                pos[node][0] + np.random.uniform(-0.02, 0.02),
                pos[node][1] + np.random.uniform(-0.02, 0.02)
            )

    # Create the figure
    plt.figure(figsize=(14, 12))  # larger figure size

    # Draw the nodes
    nx.draw_networkx_nodes(
        G, pos,
        node_size=2800,  # larger nodes
        node_color='skyblue',
        alpha=0.8,
        edgecolors='black',  # add node borders
        linewidths=1.0
    )

    # Draw the edges one at a time, catching exceptions
    edges_to_draw = list(G.edges())
    successful_edges = []
    for edge in edges_to_draw:
        try:
            nx.draw_networkx_edges(
                G, pos,
                edgelist=[edge],
                arrows=True,
                arrowsize=20,
                width=1.5,
                alpha=0.7,
                edge_color='gray'
            )
            successful_edges.append(edge)
        except Exception as e:
            print(f"警告: 无法绘制边 {edge}: {e}")

    if len(successful_edges) == 0:
        print(f"错误: {file_name} 网络中没有成功绘制的边")
        plt.close()
        return

    # Draw the node labels
    nx.draw_networkx_labels(
        G, pos,
        font_size=10,
        font_weight='bold',
        font_family='SimHei',  # make sure Chinese characters render
        horizontalalignment='center',
        verticalalignment='center'
    )

    # Set the title and save the figure
    plt.title(f'{file_name} 的贝叶斯网络结构', fontsize=14)
    plt.axis('off')  # hide the axes
    plt.tight_layout()  # adjust the layout

    try:
        plt.savefig(f'{file_name}_bayesian_network.png', dpi=300, bbox_inches='tight')
        print(f"已成功保存 {file_name} 网络结构图")
    except Exception as e:
        print(f"保存图像时出错: {e}")

    plt.close()  # close the figure to free memory


# Run cross-network inference
def cross_network_inference(models, file_names, global_shared_vars=['age', 'gender'], local_shared_vars=None,
                            evidence=None):
    """Cross-network inference over shared variables, supporting global and local shared variables."""
    results = {}
    if evidence is None:
        evidence = {'age': 50, 'gender': 'Male'}  # default global evidence

    for model, file_name in zip(models, file_names):
        if model is None:
            continue

        # Map global evidence variable names to the model's actual column names
        model_evidence = {}
        for var, value in evidence.items():
            if var.lower() in [v.lower() for v in global_shared_vars]:
                matched_col = next((col for col in model.nodes() if col.lower() == var.lower()), None)
                if matched_col:
                    model_evidence[matched_col] = value

        # Add evidence for local shared variables (if any)
        if local_shared_vars:
            for local_var_name, local_var_col in local_shared_vars.items():
                if local_var_name in evidence and local_var_col in model.nodes():
                    model_evidence[local_var_col] = evidence[local_var_name]

        infer = VariableElimination(model)
        # Run inference on every model variable that is not part of the evidence
        key_variables = [var for var in model.nodes() if var not in model_evidence]
        for var in key_variables:
            try:
                result = infer.query(variables=[var], evidence=model_evidence)
                results[f"{file_name}_{var}"] = result
                print(f"\n{file_name} 中,已知 {model_evidence} 时,{var} 的概率分布:")
                print(result)
            except Exception as e:
                print(f"\n{file_name} 推理 {var} 时错误: {e}")

    return results


def merge_rare_classes(df, target_variable, min_samples=2):
    """
    Merge classes with fewer samples than a threshold.
    :param df: dataset
    :param target_variable: name of the target column
    :param min_samples: minimum sample-count threshold
    :return: processed dataset
    """
    if target_variable not in df.columns:
        return df

    # Count the samples in each class
    class_counts = df[target_variable].value_counts()
    rare_classes = class_counts[class_counts < min_samples].index.tolist()

    if not rare_classes:
        return df  # no rare classes; return unchanged

    print(f"检测到稀有类别: {rare_classes},样本数阈值: {min_samples}")

    # Build a mapping: rare class → class it is merged into
    # Strategy: merge into the nearest class (by numeric or alphabetical order of the labels)
    sorted_classes = sorted(df[target_variable].unique())
    class_mapping = {}

    for rare_class in rare_classes:
        # Find the nearest valid class
        idx = sorted_classes.index(rare_class)
        # Try the previous class first, then the next one
        for direction in [-1, 1]:
            new_idx = idx + direction
            if 0 <= new_idx < len(sorted_classes) and sorted_classes[new_idx] not in rare_classes:
                class_mapping[rare_class] = sorted_classes[new_idx]
                break

    # Apply the mapping
    if class_mapping:
        print(f"类别合并映射: {class_mapping}")
        df[target_variable] = df[target_variable].replace(class_mapping)

    return df


def evaluate_model(model, test_data, target_variable, file_name):
    """
    Evaluate a Bayesian network: predictive performance plus goodness-of-fit metrics.
    :param model: trained Bayesian network (a pgmpy BayesianNetwork object)
    :param test_data: preprocessed test set (pd.DataFrame)
    :param target_variable: target variable to predict (e.g. "HeartDisease", "stroke")
    :param file_name: dataset name (for log output)
    :return: dict of evaluation results
    """
    results = {}
    if model is None or test_data.empty:
        print(f"[{file_name}] 模型或测试数据为空,跳过评估")
        return results

    # ----------------------
    # 1. Goodness-of-fit metrics (BIC, log-likelihood)
    # ----------------------
    try:
        bic_score = BicScore(test_data).score(model)  # pgmpy's BicScore is log-likelihood minus a complexity penalty, so higher is better
        ll_score = log_likelihood_score(model, test_data)  # higher is better (fit to the data)
        results.update({
            "bic_score": bic_score,
            "log_likelihood": ll_score
        })
    except Exception as e:
        print(f"[{file_name}] 拟合度计算失败: {e}")
        bic_score = ll_score = None

    # ----------------------
    # 2. Predictive performance (classification)
    # ----------------------
    if target_variable not in model.nodes() or target_variable not in test_data.columns:
        print(f"[{file_name}] 目标变量 {target_variable} 不在模型或测试数据中")
        return results

    # Split features and labels
    X_test = test_data.drop(columns=[target_variable], errors='ignore')
    y_test = test_data[target_variable].dropna()  # drop rows with a missing label
    X_test = X_test.loc[y_test.index]  # keep the indices aligned

    if len(y_test) < 10:  # avoid a meaningless evaluation on too few samples
        print(f"[{file_name}] 有效样本不足(仅{len(y_test)}条),跳过预测评估")
        return results

    infer = VariableElimination(model)
    y_pred = []
    fail_count = 0

    for idx, row in X_test.iterrows():
        # Build the evidence (skip missing values; the network marginalizes them out)
        evidence = {}
        for var in model.nodes():
            if var == target_variable:
                continue
            val = row.get(var)
            if not pd.isna(val):
                evidence[var] = val

        # Infer the posterior distribution of the target variable
        try:
            query = infer.query(variables=[target_variable], evidence=evidence, show_progress=False)
            pred_class = query.argmax()[0]  # take the most probable class
            y_pred.append(pred_class)
        except Exception as e:
            fail_count += 1
            y_pred.append(None)  # mark samples whose inference failed

    # Filter out invalid predictions
    valid_mask = [p is not None for p in y_pred]
    y_test_valid = y_test[valid_mask]
    y_pred_valid = [p for p in y_pred if p is not None]

    if len(y_test_valid) == 0:
        print(f"[{file_name}] 推理失败样本过多({fail_count}条),无有效预测结果")
        return results

    # Compute classification metrics
    try:
        accuracy = accuracy_score(y_test_valid, y_pred_valid)
        cm = confusion_matrix(y_test_valid, y_pred_valid)
        report = classification_report(y_test_valid, y_pred_valid, zero_division=0)
        results.update({
            "accuracy": accuracy,
            "confusion_matrix": cm,
            "classification_report": report
        })
    except Exception as e:
        print(f"[{file_name}] 分类指标计算失败: {e}")
        accuracy = cm = report = None

    # ----------------------
    # 3. Print the evaluation report
    # ----------------------
    print("\n" + "=" * 40)
    print(f"[{file_name}] 模型评估报告")
    print("=" * 40)

    if bic_score is not None:
        print(f"• BIC评分: {bic_score:.2f}(越小越好,平衡复杂度与拟合度)")
        print(f"• 对数似然: {ll_score:.2f}(越大越好,数据拟合度)")

    if accuracy is not None:
        print(f"• 预测准确率: {accuracy:.4f}")
        print(f"• 混淆矩阵:\n{cm}")
        print(f"• 分类报告:\n{report}")

    print(f"• 有效样本数: {len(y_test_valid)} / {len(test_data)}")
    print(f"• 推理失败数: {fail_count}")

    return results


def print_model_cpds(model, file_name, top_n=3):
    """Print the CPDs of the first N nodes of a model to check that they are valid."""
    if model is None:
        return
    print(f"\n[{file_name}] 模型CPD验证:")
    for i, node in enumerate(model.nodes()):
        if i >= top_n:
            break
        cpd = model.get_cpds(node)
        if cpd:
            print(f"\n节点 {node} 的CPD:")
            print(cpd)
        else:
            print(f"节点 {node} 无有效CPD!")


# Main function: supports per-dataset configuration
def main():
    file_names = ['heart', 'stroke', 'cirrhosis']
    print('Main already')
    df_heart['HeartDisease'] = df_heart['HeartDisease'].astype('category')
    df_stroke['stroke'] = df_stroke['stroke'].astype('category')
    df_cirrhosis['Stage'] = df_cirrhosis['Stage'].astype('category')
    target_mapping = {
        'heart': 'HeartDisease',
        'stroke': 'stroke',
        'cirrhosis': 'Stage'
    }
    # Load the data
    datasets = [df_heart, df_stroke, df_cirrhosis]
    processed_datasets = []
    for data, name in zip(datasets, file_names):
        target_var = target_mapping.get(name)
        if not target_var or target_var not in data.columns:
            processed_datasets.append(data)
            continue

        print(f"\n处理 {name} 数据集中的稀有类别(目标变量: {target_var})")

        # Merge rare classes (the recommended approach)
        processed_data = merge_rare_classes(data, target_var, min_samples=5)

        # Make sure the target variable is categorical
        processed_data[target_var] = processed_data[target_var].astype('category')

        processed_datasets.append(processed_data)

    train_datasets = []
    test_datasets = []
    for data in processed_datasets:
        train, test = train_test_split(data, test_size=0.2, random_state=42)
        train_datasets.append(train)
        test_datasets.append(test)

    # Custom configuration for each dataset
    # disease_configs = [
    #     {
    #         "disease_name": "心脏病",
    #         "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
    #         "included_features": [
    #             "Age_bin", "Sex", "ChestPainType", "RestingBP_bin", "Cholesterol_bin",
    #             "FastingBS", "ExerciseAngina", "Oldpeak_bin", "ST_Slope", "HeartDisease"
    #         ],
    #         "custom_edges": [
    #             # Core direct risk factors
    #             ("ChestPainType", "HeartDisease"),
    #             ("ST_Slope", "HeartDisease"),
    #             ("Oldpeak_bin", "HeartDisease"),
    #             ("Cholesterol_bin", "HeartDisease"),
    #             ("RestingBP_bin", "HeartDisease"),
    #             ("ExerciseAngina", "HeartDisease"),
    #             ("Age_bin", "HeartDisease"),
    #
    #             # Key indirect paths
    #             ("FastingBS", "Cholesterol_bin"),  # high blood sugar affects cholesterol
    #             ("Age_bin", "RestingBP_bin"),  # age affects blood pressure
    #         ],
    #         "forbidden_edges": [
    #             ("Sex", "HeartDisease"),  # sex acts indirectly through other factors
    #             ("HeartDisease", "*"),  # avoid reverse causation
    #             ("FastingBS", "HeartDisease"),  # blood sugar acts mainly through cholesterol
    #         ]
    #     },
    #     {
    #         "disease_name": "中风",
    #         "local_shared_vars": {},
    #         "included_features": [
    #             "Age_bin", "Sex", "hypertension", "heart_disease",
    #             "glucose_bin", "bmi_bin", "smoking_status", "stroke"
    #         ],
    #         "custom_edges": [
    #             # Core direct risk factors
    #             ("hypertension", "stroke"),
    #             ("heart_disease", "stroke"),
    #             ("Age_bin", "stroke"),
    #             ("glucose_bin", "stroke"),
    #
    #             # Key indirect paths
    #             ("smoking_status", "heart_disease"),  # smoking causes heart disease
    #             ("bmi_bin", "hypertension"),  # obesity causes hypertension
    #             ("bmi_bin", "glucose_bin"),  # obesity affects blood sugar
    #         ],
    #         "forbidden_edges": [
    #             ("Sex", "stroke"),  # sex acts indirectly through other factors
    #             ("stroke", "*"),  # avoid reverse causation
    #             ("bmi_bin", "stroke"),  # BMI acts mainly through hypertension and blood sugar
    #         ]
    #     },
    #     {
    #         "disease_name": "肝硬化",
    #         "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
    #         "included_features": [
    #             "Age_bin", "Sex", "Ascites", "Hepatomegaly", "Spiders",
    #             "Edema", "Bilirubin_bin", "Albumin_bin",
    #             "SGOT_bin", "Platelets_bin", "Prothrombin_bin", "Stage"
    #         ],
    #         "custom_edges": [
    #             # Liver-function markers → disease stage
    #             ("Bilirubin_bin", "Stage"),  # elevated bilirubin → cirrhosis progression
    #             ("Albumin_bin", "Stage"),  # low albumin → cirrhosis progression
    #             ("Prothrombin_bin", "Stage"),  # prolonged prothrombin time → cirrhosis
    #             ("Platelets_bin", "Stage"),  # low platelets → cirrhosis
    #
    #             # Clinical signs → disease stage
    #             ("Ascites", "Stage"),  # ascites → late-stage cirrhosis
    #             ("Hepatomegaly", "Stage"),  # hepatomegaly → cirrhosis
    #
    #             # Indirect associations
    #             ("Age_bin", "Stage"),  # age → disease progression
    #             ("Ascites", "Edema"),  # ascites → edema
    #             ("Albumin_bin", "Ascites"),  # low albumin → ascites
    #         ],
    #         "forbidden_edges": [
    #             ("ID", "*"),
    #             ("Status", "*"),
    #             ("Drug", "Stage"),
    #             ("Sex", "Stage"),  # sex does not directly affect cirrhosis stage
    #             ("Stage", "*"),  # avoid reverse causation
    #         ]
    #     }
    # ]
    disease_configs = [
        {
            "disease_name": "心脏病",
            "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
            "included_features": [
                "Age_bin", "ChestPainType", "ST_Slope", "RestingBP_bin",  # 4 core parent nodes
                "HeartDisease"  # target variable
            ],
            "custom_edges": [
                # Keep only the 4 core parents that directly affect heart disease
                ("ChestPainType", "HeartDisease"),  # chest pain type (the most important symptom)
                ("ST_Slope", "HeartDisease"),  # ST-segment slope (diagnostic gold standard)
                ("RestingBP_bin", "HeartDisease"),  # blood pressure (direct risk)
                ("Age_bin", "HeartDisease")  # age (baseline risk)
            ],
            "forbidden_edges": []  # no forbidden edges, so core associations are not removed by mistake
        },
        {
            "disease_name": "中风",
            "local_shared_vars": {},
            "included_features": [
                "hypertension", "glucose_bin", "Age_bin", "heart_disease",  # 4 core parent nodes
                "stroke"  # target variable
            ],
            "custom_edges": [
                # Keep only the 4 core parents that directly affect stroke
                ("hypertension", "stroke"),  # hypertension (the leading risk factor)
                ("glucose_bin", "stroke"),  # high blood sugar (independent risk)
                ("heart_disease", "stroke"),  # history of heart disease (thrombosis risk)
                ("Age_bin", "stroke")  # age (cumulative risk)
            ],
            "forbidden_edges": []
        },
        {
            "disease_name": "肝硬化",
            "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
            "included_features": [
                "Bilirubin_bin", "Albumin_bin", "Age_bin", "Ascites",  # 4 core parent nodes
                "Stage"  # target variable
            ],
            "custom_edges": [
                # Keep only the 4 core parents that directly affect the cirrhosis stage
                ("Bilirubin_bin", "Stage"),  # bilirubin (liver excretory function)
                ("Albumin_bin", "Stage"),  # albumin (liver synthetic function)
                ("Ascites", "Stage"),  # ascites (marker of late-stage cirrhosis)
                ("Age_bin", "Stage")  # age (disease duration)
            ],
            "forbidden_edges": []
        }
    ]
    print('Config already')

    # 3. Build the models (passing global and local shared variables)
    models = []
    for train_data, name, config in zip(train_datasets, file_names, disease_configs):
        model = build_bayesian_network(
            data=train_data,
            file_name=name,
            shared_variables=["age", "gender"],
            local_shared_vars=config["local_shared_vars"],
            included_features=config["included_features"],
            custom_edges=config["custom_edges"],
            forbidden_edges=config["forbidden_edges"]
        )
        print_model_cpds(model, name)
        models.append(model)

    # 4. Visualize the networks
    for model, name in zip(models, file_names):
        visualize_network(model, name)

    for model, test_data, name, config in zip(models, test_datasets, file_names, disease_configs):
        # Determine the target variable from the disease type
        target_mapping = {
            'heart': 'HeartDisease',
            'stroke': 'stroke',
            'cirrhosis': 'Stage'
        }
        target_var = target_mapping.get(name)
        if not target_var:
            print(f"[{name}] 未知目标变量,跳过验证")
            continue

        # Run the evaluation
        evaluate_model(model, test_data, target_var, name)

    # # 5. Network inference (pass the shared-variable mapping so the evidence matches)
    # # Note: evidence keys must use the configured "logical names", not the raw column names
    # evidence = {
    #     "age": 60,  # global shared variable (age)
    #     "gender": "男",  # global shared variable (gender)
    #     "cholesterol": 240  # local shared variable (cholesterol, logical name)
    # }
    #
    # cross_network_inference(
    #     models=models,
    #     file_names=file_names,
    #     global_shared_vars=["age", "gender"],  # global shared variables
    #     local_shared_vars={"cholesterol": "Cholesterol"},  # local mapping (logical name → actual column name)
    #     evidence=evidence
    # )


if __name__ == "__main__":
    main()

What is going on here? Could you take a look and tell me how to fix it?
07-15
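The question does not include the error message or the definitions of `build_bayesian_network`, `train_datasets`, and `file_names`, so the cause cannot be confirmed from this thread. One inconsistency is visible in the configs themselves: the heart-disease and cirrhosis configs map `"cholesterol"` to `"Cholesterol_bin"` in `local_shared_vars`, but after the configs were trimmed to four parent nodes, neither lists `Cholesterol_bin` in `included_features`. Whether `build_bayesian_network` rejects that depends on code not shown, so treat it as one candidate to check. The sketch below is a minimal consistency check, assuming each config has exactly the keys used in the question; `check_config` is a hypothetical helper, not part of any library.

```python
def check_config(config):
    """Return human-readable problems found in one disease config (hypothetical helper)."""
    features = set(config["included_features"])
    problems = []
    # Every endpoint of a required edge must be a node of the network
    for parent, child in config["custom_edges"]:
        for node in (parent, child):
            if node not in features:
                problems.append(f"custom edge ({parent}, {child}): {node} not in included_features")
    # Every local shared variable must point at a column the network actually contains
    for logical_name, column in config["local_shared_vars"].items():
        if column not in features:
            problems.append(f"local shared var '{logical_name}' -> {column} not in included_features")
    # Forbidden edges may use "*" as a wildcard; any other name should be a real node
    for parent, child in config["forbidden_edges"]:
        for node in (parent, child):
            if node != "*" and node not in features:
                problems.append(f"forbidden edge ({parent}, {child}): {node} not in included_features")
    return problems


# The three active configs, copied from the question
disease_configs = [
    {"disease_name": "心脏病",
     "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
     "included_features": ["Age_bin", "ChestPainType", "ST_Slope", "RestingBP_bin", "HeartDisease"],
     "custom_edges": [("ChestPainType", "HeartDisease"), ("ST_Slope", "HeartDisease"),
                      ("RestingBP_bin", "HeartDisease"), ("Age_bin", "HeartDisease")],
     "forbidden_edges": []},
    {"disease_name": "中风",
     "local_shared_vars": {},
     "included_features": ["hypertension", "glucose_bin", "Age_bin", "heart_disease", "stroke"],
     "custom_edges": [("hypertension", "stroke"), ("glucose_bin", "stroke"),
                      ("heart_disease", "stroke"), ("Age_bin", "stroke")],
     "forbidden_edges": []},
    {"disease_name": "肝硬化",
     "local_shared_vars": {"cholesterol": "Cholesterol_bin"},
     "included_features": ["Bilirubin_bin", "Albumin_bin", "Age_bin", "Ascites", "Stage"],
     "custom_edges": [("Bilirubin_bin", "Stage"), ("Albumin_bin", "Stage"),
                      ("Ascites", "Stage"), ("Age_bin", "Stage")],
     "forbidden_edges": []},
]

for config in disease_configs:
    for problem in check_config(config) or ["OK"]:
        print(config["disease_name"], problem)
```

On these configs it reports the `Cholesterol_bin` mapping for heart disease and cirrhosis and nothing for stroke. Separately, the evaluation loop only runs for entries of `file_names` that are exactly `'heart'`, `'stroke'`, or `'cirrhosis'` (anything else is skipped), and `zip` pairs datasets with configs by position, so `train_datasets`, `test_datasets`, `file_names`, and `disease_configs` must all be in the same order.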
| PC1 | PC2 | Age | education | experience | Income | Secondary occupation rate | Cessation/Career Change Rate | Begin year | Distance | Depth | units | officers | y |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| -1.487688713 | -0.123135401 | 67.81 | 1.23 | 7277 | 6.1 | 0.0625 | 0.1875 | 1990 | 167 | 96.64 | 14.28571 | 760.5714 | -14 |
| -1.110930746 | 0.105295278 | 51.27 | 1.73 | 5700 | 3.27 | 0.363636364 | 0.181818182 | 1994 | 14.27 | 20.27 | 14.28571 | 760.5714 | -145.52 |
| -1.669981259 | 0.205800457 | 50.59 | 1.54 | 5290 | 3.38 | 0.178571429 | 0.25 | 1994 | 60.41 | 46.43 | 14.28571 | 760.5714 | -18.39 |
| -2.011589945 | -0.070789436 | 48.12 | 1.71 | 3240 | 4.37 | 0.470588235 | 0.647058824 | 2004 | 60.35 | 54.88 | 14.28571 | 760.5714 | -52 |
| 1.529763667 | -1.011182787 | 47.61 | 2.1875 | 3324 | 7.17 | 0.222222222 | 0.277777778 | 2000 | 27.56 | 21.11 | 14.28571 | 760.5714 | -80.61 |
| -1.242452554 | -0.213055395 | 61.72 | 1.9 | 6047 | 4 | 0.272727273 | 0.363636364 | 1989 | 31.82 | 35.36 | 14.28571 | 760.5714 | -120 |
| -2.53945682 | 0.24605652 | 51.4 | 1.125 | 4541 | 4.13 | 0 | 0.4 | 1996 | 10.56 | 16.11 | 14.28571 | 760.5714 | -22.39 |
| -2.937563682 | 0.222341469 | 49.69 | 1.84 | 3073 | 3.35 | 0.115384615 | 0.346153846 | 2000 | 39.35 | 44.38 | 14.28571 | 760.5714 | -19.41 |
| -3.072440879 | 0.250527889 | 51.43 | 2.14 | 5380 | 4.14 | 0.142857143 | 0.428571429 | 1995 | 94.86 | 30.86 | 14.28571 | 760.5714 | 0 |
| -3.188904163 | 0.261092554 | 46.07 | 2.1538 | 3584 | 5.34 | 0.1333333 | 0.466666667 | 2003 | 55.53 | 30.21 | 14.28571 | 760.5714 | 0 |
| -0.931952754 | 0.002281031 | 53.79 | 1.5714 | 5376 | 9.73 | 0.121212121 | 0.272727273 | 1996 | 46.25 | 45.56 | 14.28571 | 760.5714 | -127.14 |
| -2.459061374 | 0.043855153 | 52.86 | 1.8141 | 4937 | 6.09 | 0.034482759 | 0.24137931 | 1995 | 216.31 | 78.59 | 14.28571 | 760.5714 | -18.98 |
| -3.151851068 | 0.257731378 | 54.9 | 2.36364 | 4089 | 9.22 | 0.363636364 | 0.454545455 | 1998 | 32.9 | 43.45 | 14.28571 | 760.5714 | 0 |
| -1.344487403 | 0.079617387 | 42.5 | 2.33333 | 1747 | 11 | 0.333333333 | 0.5 | 2008 | 46 | 35 | 14.28571 | 760.5714 | -121.42 |
| -2.262807611 | 0.053819563 | 58.45 | 1.958333 | 6285 | 4.58 | 0.090909091 | 0.212121212 | 1992 | 16.91 | 19.32 | 41.5 | 1090.643 | -8.82 |
| -1.967076416 | 0.093588628 | 61.89 | 1.68 | 5172 | 4.83 | 0.037037037 | 0.518518519 | 1996 | 187.13 | 81.48 | 41.5 | 1090.643 | -14.61 |
| -0.033489112 | 0.048221626 | 60 | 1.882353 | 6034 | 5.62 | 0.05 | 0.55 | 1994 | 185.5 | 89.84 | 41.5 | 1090.643 | -63.4 |
| 0.987266275 | -0.634194676 | 45.5 | 1.1818182 | 3503 | 7 | 0.083333333 | 0.083333333 | 2002 | 21.73 | 21.45 | 40.5 | 2367.071 | -142.09 |
| -2.403385794 | -0.027063647 | 67.7 | 1.285714 | 6564 | 4.93 | 0.1 | 0.4 | 1994 | 114.6 | 64.78 | 40.5 | 2367.071 | -22.39 |
| -0.542599875 | -0.266840712 | 52 | 1.6 | 4216 | 3.9 | 0.2 | 0.2 | 1993 | 44.25 | 16.25 | 40.5 | 2367.071 | -44.78 |
| -3.290819282 | 0.27033752 | 61 | 1 | 4294 | 3.67 | 0.166666667 | 0.333333333 | 2000 | 15.8 | 8.25 | 40.5 | 2367.071 | 0 |
| -1.986421792 | -0.035216433 | 59.06 | 1.933333 | 5019 | 6.8 | 0.222222222 | 0.444444444 | 1997 | 20.5 | 13.35 | 40.5 | 2367.071 | -49.11 |
| -3.036044223 | 0.24722626 | 52 | 1.5 | 3669 | 16.6 | 0.166666667 | 0.333333333 | 2006 | 33.83 | 18.67 | 40.5 | 2367.071 | 0 |
| -3.290819282 | 0.27033752 | 62 | 1.5 | 4473 | 2 | 0 | 0.4 | 2002 | 4.6 | 16 | 40.5 | 2367.071 | 0 |
| -2.951119204 | 0.239522507 | 50.89 | 1.66667 | 3578 | 2.86 | 0 | 0.11111 | 2003 | 61.44 | 23.11 | 40.5 | 2367.071 | 0 |
| 2.517151534 | -1.1275613 | 57.08 | 1.33333 | 4852 | 4.41 | 0.083333333 | 0.083333333 | 1995 | 14.92 | 13 | 40.5 | 2367.071 | -161.73 |
| -2.456316079 | 0.174649314 | 59 | 1.529412 | 4403 | 3.03 | 0.12 | 0.36 | 2001 | 14.68 | 15.67 | 40.5 | 2367.071 | 0 |
| 0.21180011 | -0.099374979 | 53.09 | 1.3 | 5094 | 8.55 | 0.047619048 | 0.095238095 | 1994 | 208.56 | 82.78 | 40.5 | 2367.071 | -79.41 |
| -1.36397542 | -0.23904082 | 53.78 | 1.75 | 3667 | 6.25 | 0.11111 | 0.333333 | 2002 | 30.33 | 27.78 | 40.5 | 2367.071 | -3.35 |
| -2.311898213 | 0.157448555 | 59.19 | 1.86667 | 5766 | 4.98 | 0.0625 | 0.4375 | 1993 | 66.73 | 59.87 | 40.5 | 2367.071 | 0 |
| -0.758454875 | -0.237709692 | 64.5 | 1.625 | 6479 | 3.97 | 0.05 | 0.35 | 1990 | 101.76 | 42.53 | 40.5 | 2367.071 | -33 |
| 3.517483337 | -1.177534575 | 66.2 | 1 | 9537 | 3.28 | 0 | 0.2 | 1978 | 166.2 | 41 | 40.5 | 2367.071 | -282.47 |
| 0.389915774 | -0.274803177 | 64.15 | 1.16667 | 5994 | 3.07 | 0.076923077 | 0.769230769 | 1995 | 20.62 | 24.62 | 40.5 | 2367.071 | -40.21 |
| -2.129625435 | 0.004701125 | 61.37 | 1.375 | 5150 | 2.46 | 0.052631579 | 0.578947368 | 1995 | 24.35 | 20.11 | 40.5 | 2367.071 | -46.52 |
| 0.616850515 | -0.664392864 | 58.88 | 1.533333 | 5196 | 4.76 | 0.125 | 0.15625 | 1990 | 31.72 | 31.47 | 40.5 | 2367.071 | -120.59 |
| -1.174321726 | 0.034197388 | 55.58 | 1.59375 | 5156 | 4.6 | 0.135135135 | 0.405405405 | 1995 | 20.49 | 18.71 | 40.5 | 2367.071 | -82 |
| 0.279011342 | 0.124110676 | 62.17 | 1.304348 | 6536 | 3.74 | 0.034482759 | 0.379310345 | 1988 | 93.96 | 30.63 | 40.5 | 2367.071 | -133.73 |
| -1.051997991 | -0.171948432 | 62.71 | 1.4 | 5069 | 2.46 | 0.214285714 | 0.571428571 | 1997 | 52.64 | 48.86 | 40.5 | 2367.071 | -54.32 |
| -2.110925945 | 0.218264311 | 62.17 | 2 | 5042 | 3.6 | 0.166666667 | 0.166666667 | 1996 | 26.4 | 25.4 | 40.5 | 2367.071 | 0 |
| 0.024563441 | -0.274624278 | 54.8 | 1.7 | 5727 | 2.64 | 0 | 0.2 | 1994 | 26.2 | 28.1 | 38.71429 | 1344.714 | 0 |
| -1.49857412 | 0.537831233 | 55.5 | 1.75 | 5259 | 5 | 0 | 0.25 | 1991 | 30.375 | 25.125 | 38.71429 | 1344.714 | -27.99 |
| -1.22683473 | -0.159118578 | 48.93 | 1.461538 | 4684 | 7.19 | 0.103448276 | 0.137931034 | 1997 | 61.21 | 80.21 | 38.71429 | 1344.714 | -57.19 |
| 0.549169498 | -0.670295411 | 69.56 | 1.55556 | 4090 | 2.31 | 0.333333333 | 0.333333333 | 1996 | 17.71 | 28.86 | 38.71429 | 1344.714 | -171.55 |
| -0.015052174 | -0.4025312 | 58.55 | 1.210526 | 6480 | 6.36 | 0.05 | 0.15 | 1988 | 120.1 | 72.75 | 38.71429 | 1344.714 | -33.59 |
| -1.49924311 | -0.220694532 | 55.2 | 1.5 | 6230 | 7.75 | 0.2 | 0.2 | 1989 | 179 | 111.25 | 38.71429 | 1344.714 | 0 |
| -2.148386516 | 0.311881035 | 50.66 | 1.357143 | 4568 | 8.22 | 0.172413793 | 0.413793103 | 1995 | 33.89 | 21.68 | 38.71429 | 1344.714 | 0 |
| 0.217385996 | -0.419221017 | 64.92 | 1.636364 | 4701 | 2.255 | 0.041666667 | 0.5 | 1994 | 61.17 | 41.88 | 38.71429 | 1344.714 | -165 |
| -0.538423733 | -0.205699434 | 64.86 | 1.590909 | 5054 | 4.11 | 0.310344828 | 0.724137931 | 1996 | 57 | 45.33 | 38.71429 | 1344.714 | -78.12 |
| -1.346887256 | 0.400472608 | 51.32 | 1.464286 | 4044 | 7.81 | 0.290322581 | 0.387096774 | 2002 | 348.41 | 348.55 | 38.71429 | 1344.714 | -144.46 |
| -1.292578158 | 0.165293581 | 52.57 | 1.444444 | 4368 | 8.43 | 0.133333333 | 0.166666667 | 1998 | 190.36 | 98.67 | 36 | 2146.857 | -17.16 |
| -0.973048934 | 0.030002677 | 56.94 | 1.419355 | 4786 | 6.67 | 0.057142857 | 0.285714286 | 1998 | 242.94 | 72.43 | 36 | 2146.857 | -494.45 |
| 0.351799666 | -0.59664956 | 60.48 | 1.392857 | 6023 | 7.1 | 0.206896552 | 0.103448276 | 1999 | 100.15 | 50.56 | 36 | 2146.857 | -602.55 |
| 0.520020092 | -0.441754049 | 57.79 | 1.4 | 6450 | 11.22 | 0.137931034 | 0.1 | 1994 | 227.34 | 90.34 | 36 | 2146.857 | -328.34 |
| -0.28364749 | 0.397747026 | 65.94 | 1.193548 | 6056 | 6.34 | 0.088235294 | 0.470588235 | 1991 | 36.68 | 39.72 | 36 | 2146.857 | -305.73 |
| 0.397102055 | -0.169816632 | 53 | 1.571429 | 5683 | 6.26 | 0 | 0.444444444 | 1999 | 146.75 | 28.33 | 3.285714 | 479.5714 | -7.44 |
| 0.933321653 | -0.984254292 | 69.79 | 1.416667 | 6385 | 5.58 | 0.214285714 | 0.357142857 | 1995 | 131.29 | 86.07 | 3.285714 | 479.5714 | -438.76 |
| -1.921963244 | 0.255857538 | 65.92 | 1.636364 | 4038 | 5.67 | 0.25 | 0.583333333 | 1997 | 99.75 | 38.55 | 42.35714 | 2200.071 | -37.32 |
| -0.098375752 | -0.265536039 | 60.68 | 1.411765 | 6167 | 11.94 | 0.052631579 | 0.210526316 | 1990 | 163.37 | 50.26 | 42.35714 | 2200.071 | -545.84 |
| -0.956573688 | -0.051025829 | 55.26 | 1.944444 | 5357 | 9.67 | 0.052631579 | 0.368421053 | 1998 | 77.94 | 21.93 | 42.35714 | 2200.071 | -308.79 |
| 3.610411344 | 0.089350858 | 60.14 | 1.428571 | 5228 | 7.78 | 0.214285714 | 0.642857143 | 1999 | 241.25 | 36.9 | 42.35714 | 2200.071 | -87.61 |
| -1.978509392 | 0.090153536 | 66.73 | 1.454545 | 5753 | 2.33 | 0.133333333 | 0.133333333 | 1994 | 97.93 | 39.6 | 42.35714 | 2200.071 | -132.34 |
| 0.390938903 | -0.974852361 | 69.8 | 1 | 7776 | 2.9 | 0.4 | 1 | 1988 | 77.8 | 31 | 42.35714 | 2200.071 | -68.26 |
| 0.866663969 | -0.603026165 | 69.54 | 1.090909 | 5236 | 2.875 | 0.153846154 | 0.461538462 | 1994 | 65.77 | 27.92 | 74.07143 | 3233.643 | -336.94 |
| -0.161937648 | -0.117065876 | 65.5 | 1.555556 | 3732 | 4.14 | 0 | 0.6 | 1998 | 35.3 | 21.4 | 74.07143 | 3233.643 | -233.58 |
| 1.141060979 | -0.8904894 | 61.83 | 1.666667 | 4282 | 3.06 | 0.5 | 0.666666667 | 1997 | 29 | 10.83 | 74.07143 | 3233.643 | -168.71 |
| 1.928895338 | -0.501521838 | 64.85 | 1.75 | 4643 | 2.18 | 0.230769231 | 0.230769231 | 1996 | 84.54 | 28 | 74.07143 | 3233.643 | -77.53 |
| -2.500946381 | 0.129130126 | 66.13 | 1.6 | 5429 | 5.375 | 0.25 | 0.75 | 1992 | 53.64 | 18.13 | 74.07143 | 3233.643 | -88.17 |
| 2.959587534 | -0.883303433 | 54.31 | 1.307692 | 4399 | 7.94 | 0.384615385 | 0.153846154 | 1997 | 68.85 | 36.92 | 74.07143 | 3233.643 | -85.22 |
| 2.085442265 | -0.12184054 | 49.27 | 1.5 | 2957 | 10.87 | 0.066666667 | 0.2 | 2002 | 70.53 | 31.87 | 74.07143 | 3233.643 | -305.85 |
| 2.696967016 | -0.693200641 | 51.37 | 1.791667 | 4487 | 11.76 | 0.333333333 | 0.074074074 | 1996 | 54.85 | 21.92 | 74.07143 | 3233.643 | -142.92 |
| 3.747974961 | 1.692610501 | 55.06 | 1.75 | 4088 | 5.91 | 0.1875 | 0.625 | 1995 | 39.93 | 24.56 | 74.07143 | 3233.643 | -306.58 |
| 0.492133027 | -0.246484985 | 64.88 | 1.75 | 3203 | 6.1 | 0.125 | 0.875 | 2001 | 49.25 | 20 | 74.07143 | 3233.643 | -192.99 |
| 0.085208832 | -0.947118849 | 59.6 | 1.4 | 5511 | 8.4 | 0 | 0 | 1991 | 32 | 16 | 74.07143 | 3233.643 | -13.4 |
| 3.499043723 | -1.717315443 | 69.5 | 1.6 | 4894 | 3.23 | 0 | 0.8 | 1992 | 52.78 | 20.7 | 74.07143 | 3233.643 | -294.91 |
| 0.43759677 | 0.094821793 | 64.43 | 1.5 | 4587 | 3.64 | 0 | 0.260869565 | 1994 | 56.05 | 27.45 | 74.07143 | 3233.643 | -12.65 |
| 4.081308778 | 1.07913148 | 53.44 | 1.533333 | 5060 | 6.94 | 0.0625 | 0.0625 | 1992 | 25.25 | 13.81 | 74.07143 | 3233.643 | -65.56 |
| 1.842688905 | -0.956768467 | 62.5 | 1.33333 | 1590 | 3 | 0 | 0.25 | 2012 | 17.75 | 7.5 | 53.57143 | 1177.357 | 0 |
| 0.706667583 | -0.426804473 | 53.29 | 1.4 | 2980 | 2.2 | 0 | 0.571428571 | 2003 | 30.86 | 15.71 | 53.57143 | 1177.357 | -303.14 |
| 1.512916314 | 0.360566033 | 66.2 | 1.8 | 4972 | 5 | 0.6 | 0.6 | 2001 | 73.5 | 31.25 | 53.57143 | 1177.357 | 0 |
| 2.860737924 | -0.198906729 | 66.64 | 1.384615 | 5358 | 5.53 | 0 | 0.571428571 | 1992 | 142.67 | 36 | 53.57143 | 1177.357 | -95.01 |
| 1.675201123 | -0.906894768 | 48.82 | 1.545455 | 3862 | 6.44 | 0.363636364 | 0.272727273 | 2001 | 77.91 | 25.45 | 53.57143 | 1177.357 | -68.83 |
| 4.840076919 | -2.50462752 | 50.5 | 2 | 5225 | 17.5 | 0 | 0 | 1996 | 18.75 | 16.25 | 53.57143 | 1177.357 | 0 |
| 0.658418297 | 2.218593857 | 56.17 | 1.8 | 2901 | 6 | 0.166666667 | 0.333333333 | 2004 | 37.6 | 24.25 | 4.142857 | 172.7857 | -231.35 |
| 3.595323165 | 0.939519372 | 52.5 | 2.5 | 4900 | 2.333333333 | 0 | 0.166666667 | 1996 | 20.67 | 9.2 | 4.142857 | 172.7857 | -74.63 |
| 2.359683447 | 2.455150676 | 57.375 | 1.391304 | 3661 | 4.833333 | 0 | 0.12 | 2000 | 19.46 | 22.48 | 48.98257 | 2045.429 | -61.39 |
| 3.270851842 | 5.067995714 | 60.52 | 1.409091 | 4535 | 6.534783 | 0.12 | 0.16 | 1998 | 120.75 | 71.3 | 48.98257 | 2045.429 | -30.99 |
| 3.246804894 | 0.925289171 | 53.33 | 1.5 | 3413 | 7.642105 | 0.238095238 | 0.19047619 | 1999 | 21.1 | 13.55 | 48.98257 | 2045.429 | -45.09 |
| 2.352075975 | 0.523375668 | 57.43 | 1.714286 | 3606 | 8.25 | 0.071428571 | 0.214285714 | 1997 | 64.14 | 33 | 48.98257 | 2045.429 | -96.1 |
| 0.532574156 | -0.481793696 | 72.5 | 1.33333 | 9675 | 0.1 | 0 | 0.5 | 1975 | 131 | 33.25 | 48.98257 | 2045.429 | -180.4 |
| 3.046814942 | 2.449333207 | 58.55 | 1.285714 | 4900 | 4.542857143 | 0 | 0.428571429 | 1989 | 27.43 | 18.5 | 48.98257 | 2045.429 | -8.44 |
| 2.084554656 | 1.161608928 | 65.07 | 1.416667 | 5391 | 7.78 | 0.066666667 | 0.6 | 1993 | 62.87 | 16.93 | 48.98257 | 2045.429 | -3.74 |
| 1.988394837 | -0.922717276 | 60.5 | 1.666667 | 4993 | 3.875 | 0 | 0.3 | 1995 | 28.44 | 20.11 | 48.98257 | 2045.429 | -128.26 |
| 3.071818585 | 0.343056912 | 55.18 | 1.55556 | 4032 | 10.75 | 0.181818182 | 0 | 1996 | 74.9 | 32.55 | 48.98257 | 2045.429 | -40.71 |

Multiple linear regression
10-11
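When this data is copied as plain whitespace-separated text (as it was originally pasted), three headers ("Secondary occupation rate", "Cessation/Career Change Rate", "Begin year") contain spaces, so splitting the header line on whitespace yields 19 names for 14 columns. A minimal standard-library sketch, assuming the 14 column names are known in advance and the values appear row by row in the header's order; `parse_table` and `COLUMNS` are hypothetical names for illustration.

```python
# Hypothetical helper: column names copied from the header of the pasted table
COLUMNS = [
    "PC1", "PC2", "Age", "education", "experience", "Income",
    "Secondary occupation rate", "Cessation/Career Change Rate", "Begin year",
    "Distance", "Depth", "units", "officers", "y",
]


def parse_table(text, columns=COLUMNS):
    """Parse whitespace-separated values into {column name: [floats]}.

    Works whether or not the line breaks survived, because it reads all
    tokens and cuts them into rows of len(columns) values.
    """
    tokens = text.split()
    # Skip the header if present; it splits into 19 tokens, not 14,
    # because several names contain spaces.
    header_tokens = " ".join(columns).split()
    if tokens[:len(header_tokens)] == header_tokens:
        tokens = tokens[len(header_tokens):]
    if len(tokens) % len(columns) != 0:
        raise ValueError(f"{len(tokens)} values do not fill rows of {len(columns)} columns")
    values = [float(tok) for tok in tokens]
    table = {name: [] for name in columns}
    for start in range(0, len(values), len(columns)):
        for name, value in zip(columns, values[start:start + len(columns)]):
            table[name].append(value)
    return table


sample = """PC1 PC2 Age education experience Income Secondary occupation rate Cessation/Career Change Rate Begin year Distance Depth units officers y
-1.487688713 -0.123135401 67.81 1.23 7277 6.1 0.0625 0.1875 1990 167 96.64 14.28571 760.5714 -14
-1.110930746 0.105295278 51.27 1.73 5700 3.27 0.363636364 0.181818182 1994 14.27 20.27 14.28571 760.5714 -145.52"""

table = parse_table(sample)
print(len(table["y"]), table["y"])  # 2 [-14.0, -145.52]
flat = " ".join(sample.split())  # the same data with every line break lost
print(parse_table(flat)["y"] == table["y"])  # True
```

`pd.DataFrame(table)` turns the result into a DataFrame with the original column names, which can then be renamed before fitting the regression.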
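<think>We have a dataset with several independent variables (PC1, PC2, Age, education, experience, Income, Secondary occupation rate, Cessation/Career Change Rate, Begin year, Distance, Depth, units, officers) and one dependent variable (y), and we want to build a multiple linear regression model.

Steps:
1. Preprocess the data: check for missing values and outliers, and standardize if needed. PC1 and PC2 are probably principal components already; the other variables might need checking for whether they have been standardized, but here we assume the data is ready to use.
2. Split the data: a test set would require a split, but none is provided and the question doesn't ask for one, so we can fit on all the data (or split it ourselves) and evaluate afterwards.
3. Fit the multiple linear regression model, estimating the parameters by least squares.
4. Evaluate the model with statistics such as R-squared and p-values.

With only 93 rows, we can fit the model on all of the data and then interpret it.

Note: some columns contain many repeated values (units and officers, for example), which can cause collinearity, so we should also check for multicollinearity.

Since the question asks for a solution with code, we will use Python's statsmodels library:
- import the required libraries
- read the data (it is given as text, so it can be saved to a CSV file or read from a string)
- define the independent variables (X) and the dependent variable (y)
- add a constant term (intercept)
- build and fit the model
- print the model summary

The data is tab-separated, so pandas can read it. Because it was pasted as text, we could put it in a string and read it with `read_csv` through `StringIO` (more reproducible than reading from the clipboard), or assume it has been saved to a file. The first row holds the column names.

Some column names contain a slash (such as Cessation/Career Change Rate) or spaces. pandas reads them without trouble, and with the array (non-formula) interface of statsmodels the names are only labels; with the formula interface they would have to be quoted as `Q("...")`. To keep things simple we replace "/" (and optionally spaces) with "_" after reading. Negative signs and fractional values such as 0.0625 are read as floats without any problem.

Plan:
1. Import the libraries
2. Read the data
3. Clean the column names (remove special characters)
4. Define the independent and dependent variables
5. Add the constant term
6. Fit the model
7. Print the results

We use statsmodels' OLS (ordinary least squares). To check for multicollinearity we can compute variance inflation factors (VIF), but we build the basic model first and focus on the regression output. If collinearity is a problem, statsmodels warns about it in the results (for example, a large condition number).

Implementation:
</think>Below I use Python to fit a multiple linear regression model to the dataset provided. The dataset has several independent variables (PC1, PC2, Age, education, etc.) and one dependent variable (y).

### Solution code

```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data. "data.tsv" is a placeholder: save the pasted table as a
# tab-separated file whose first row is the header.
df = pd.read_csv("data.tsv", sep="\t")

# Rename the columns whose names contain spaces or "/"
df = df.rename(columns={
    "Secondary occupation rate": "Secondary_occupation_rate",
    "Cessation/Career Change Rate": "Cessation_Rate",
    "Begin year": "Begin_year",
})

# Data preprocessing
# 1. Check for missing values (the pasted table has none; real data should be checked)
print("Missing values:\n", df.isnull().sum())

# 2. Feature engineering: years elapsed since the begin year
current_year = 2023  # adjust to the reference year you want
df['work_years'] = current_year - df['Begin_year']

# 3. Drop the column that work_years replaces
df = df.drop(columns=['Begin_year'])

# 4. Split into features and target
X = df.drop(columns=['y'])
y = df['y']

# 5. Add a constant column (intercept)
X = sm.add_constant(X)

# Fit the multiple linear regression
model = sm.OLS(y, X).fit()

# Print the regression results
print(model.summary())

# Multicollinearity diagnostics (VIF); the row for "const" can be ignored
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(len(X.columns))]
print("\nVIF:\n", vif_data)

# Residual analysis
residuals = model.resid
plt.figure(figsize=(10, 6))
sns.histplot(residuals, kde=True)
plt.title('Residual distribution')
plt.show()

# Fitted values vs. residuals
plt.figure(figsize=(10, 6))
sns.scatterplot(x=model.fittedvalues, y=residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Fitted values vs. residuals')
plt.ylabel('Residual')
plt.xlabel('Fitted value')
plt.show()
```

### Code explanation

1. **Data preparation**:
   - Load the table into a DataFrame from a tab-separated file (`data.tsv` is a placeholder name)
   - Rename the columns that contain spaces or `/`

2. **Data preprocessing**:
   - Check for missing values (`isnull().sum()`)
   - Feature engineering: compute `work_years`. Because it is a linear function of `Begin_year`, it only flips the sign of that coefficient; the fit itself is unchanged
   - Drop the replaced feature (`Begin_year`)
   - Split into features (X) and target (y)

3. **Modeling**:
   - Use ordinary least squares (OLS) from `statsmodels`
   - Add a constant column as the intercept (`add_constant`)
   - Fit the model with `.fit()`

4. **Results**:
   - `model.summary()` reports:
     - R² (share of the variance in y that the model explains)
     - Coefficient estimates and their p-values (statistical significance)
     - F-statistic (overall significance of the model)
     - AIC/BIC (model-selection criteria)

5. **Diagnostics**:
   - VIF (variance inflation factor) to detect multicollinearity
   - Residual histogram to check the normality assumption
   - Fitted-vs-residual plot to check homoscedasticity

### Interpreting the key outputs

1. **Regression coefficients**:
   - Each coefficient is the expected change in y for a one-unit increase in that feature, holding the other features fixed
   - A positive coefficient indicates a positive association; a negative one, a negative association
   - p < 0.05 is the conventional threshold for statistical significance

2. **R-squared**:
   - Ranges from 0 to 1; higher values mean the model explains more of the variance in y
   - Adjusted R² penalizes the number of features, so it is more reliable when comparing models of different sizes

3. **VIF**:
   - VIF > 10 is a common rule of thumb for serious multicollinearity
   - Features with a high VIF should be dropped or transformed

4. **Residual plots**:
   - The histogram should look roughly normal (bell-shaped)
   - Residuals should scatter randomly around 0 with no visible pattern

### Notes

1. Standardizing the features does not change the OLS fit, R², or the p-values of the slopes, but it puts the coefficients on a common scale so they can be compared with each other (useful when the units differ a lot, as here, and required before ridge or Lasso):
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
features = X.drop(columns=['const'])
# Keep the column names so the regression summary stays readable
X_scaled = pd.DataFrame(scaler.fit_transform(features),
                        columns=features.columns, index=features.index)
X_scaled = sm.add_constant(X_scaled)  # re-add the constant column
model_scaled = sm.OLS(y, X_scaled).fit()
```

2. If some features are highly correlated (VIF > 10), you can:
   - Drop one of the correlated features
   - Reduce dimensionality with PCA
   - Use regularization (ridge regression or Lasso)

   If PC1 and PC2 were computed from other columns in this table, expect very large VIFs; in that case use either the components or the original variables, not both.

3. Depending on the residual plots:
   - If the residuals are heteroscedastic (funnel-shaped), consider weighted least squares
   - If they show a nonlinear pattern, add polynomial features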
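The VIF rule of thumb above can be made concrete: for feature j, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing feature j on all the other features plus an intercept. A minimal NumPy sketch on synthetic data (the data and the `vif` helper are illustrative assumptions, not part of the answer above); for a design matrix that already contains a constant column, it should agree with statsmodels' `variance_inflation_factor`.

```python
import numpy as np


def vif(X):
    """VIF of each column of X (n_samples x n_features, no constant column).

    VIF_j = 1 / (1 - R^2_j), where R^2_j is the R^2 of an OLS regression of
    column j on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Design matrix: intercept plus every other feature
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic example: x2 is almost a copy of x1, x3 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```

Here x1 and x2 come out far above 10 while x3 stays close to 1. With the regression code, `vif(X.drop(columns=["const"]).values)` would give one VIF per feature, since the helper adds its own intercept.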