How I Built a Colorectal Cancer Prediction Platform Through Deep Learning

Deep Learning

I believe that artificial intelligence can save the human race.

It is ironic, though, that in practically every generic sci-fi story, any sufficiently advanced AI suddenly concludes that exterminating the human race is the solution to world peace.

A Skynet fantasy is still kinda far from happening… for now, at least.

Kidding aside, instead of focusing on the 0.001% probability that artificial intelligence will lead to the destruction of mankind, it would be worthwhile to consider that AI could be very beneficial to society, especially in the healthcare sector.

The applications of AI in healthcare are vast, with huge potential in fields such as precision medicine, health analytics, and medical informatics.

In my case, I set out to investigate the potential of artificial intelligence in oncology. I sought to leverage deep learning to build neural networks, trained on the cloud, that determine more accurately and efficiently whether a colorectal tissue sample contains tumor tissue.

Why Cloudiopsy?

Colorectal cancer (CRC) accounted for approximately 1.8 million new cases worldwide in 2018 alone (WHO, 2018). In the Philippines, colorectal cancer is considered the most common gastrointestinal cancer (Afinidad-Bernardo, 2017). Fortunately, colorectal cancer is preventable, and when it is detected at an early stage, it can often be cured.

As such, Cloudiopsy is designed as a colorectal cancer prediction platform that can be easily distributed across the country to aid diagnosis by differentiating between normal mucosal tissue and epithelial adenocarcinoma (tumor) tissue. Because the platform could eventually be accessible online, it offers a cost-efficient way to make colorectal cancer prediction widely available in the Philippines.

So, what is it?

Put simply, the Cloudiopsy platform classifies input images with a neural network model built through transfer learning. The model was built using Keras with a TensorFlow backend and trained on Google Cloud Platform, and predictions are served in the browser through TensorFlow.js.

Basic Pathological Intuition behind Colorectal Cancer

The picture on the left is an example of a normal mucosal tissue layer. The goblet cells (mucus-secreting cells) are roughly the same size and populate the colonic crypt (an intestinal gland, which appears elliptical in this cross-section). This indicates that the crypt is healthy, and thus that the picture was taken from normal mucosal tissue.

In the picture on the right, by contrast, goblet cells are no longer visible inside the tissue, which implies that it is already tumor tissue. This type of tumor is known as epithelial adenocarcinoma. It could indicate colorectal cancer if samples from other regions are also shown to contain tumor.

Currently, biopsy tissue samples are analyzed by manually scanning each sample and looking for cancerous regions based on the tissue’s architecture. However, scanning through the entire sample is very laborious and prone to human error. It would therefore be ideal to automate this process with a platform that automatically determines whether a patch is normal or cancerous tissue and returns a prediction.
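
The post does not show how a biopsy image is split into patches. As a minimal sketch, assuming non-overlapping square patches of a hypothetical 224 × 224 pixels and a placeholder classify function standing in for the trained model, the scan-and-flag workflow could look like this:

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical patch size in pixels; the post does not state one.
PATCH_SIZE = 224


def tile_coordinates(width: int, height: int, patch: int = PATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every full, non-overlapping patch."""
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            yield x, y


def flag_tumor_patches(
    width: int,
    height: int,
    classify: Callable[[int, int], float],
    threshold: float = 0.5,
    patch: int = PATCH_SIZE,
) -> List[Tuple[int, int]]:
    """Return corners of patches whose predicted tumor probability meets the threshold.

    `classify` is a placeholder for the trained model: it receives a patch corner
    and returns the probability that the patch is tumor tissue.
    """
    return [(x, y) for x, y in tile_coordinates(width, height, patch) if classify(x, y) >= threshold]


# Dummy classifier that reports tumor for patches starting at x >= 224.
print(flag_tumor_patches(500, 300, lambda x, y: 0.9 if x >= 224 else 0.1))  # prints [(224, 0)]
```

Partial patches at the right and bottom borders are simply skipped here; padding the image or using overlapping windows would be alternatives if border regions matter.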

Neural Networks and Transfer Learning

Biological neural networks are sets of connections between neurons in the brain, through which electrical signals are fired to generate thoughts, ideas, emotions, and so on. When we learn, we rearrange the connections between these neurons to pick up new patterns and concepts.

Modeled after the brain, artificial neural networks transform information through mathematical operations to obtain a specific output. They can translate complicated patterns, such as images or speech, into numerical values and then transform these values, by multiplying them by weights, in order to classify them.

Weights are simply the values by which the inputs are multiplied. During training, these weights are adjusted until the network produces the desired output (a classification or prediction).
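
To make the role of weights concrete, here is a minimal, hypothetical sketch of a single artificial neuron: it multiplies each input by a weight, adds a bias, squashes the result with a sigmoid, and then takes one gradient-descent step that adjusts the weights to reduce a binary cross-entropy loss. A real network repeats this across many weights, with the framework computing the gradients automatically.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def bce_loss(p: float, y: int) -> float:
    """Binary cross-entropy for one example (y is 0 or 1, p strictly between 0 and 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def gradient_step(weights: List[float], bias: float, x: List[float], y: int,
                  lr: float = 0.1) -> Tuple[List[float], float]:
    """Nudge the weights against the loss gradient.

    For a sigmoid output with binary cross-entropy, dLoss/dz simplifies to (p - y).
    """
    error = predict(weights, bias, x) - y
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias - lr * error


w, b = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1
print(bce_loss(predict(w, b, x), y))  # loss before the update
w, b = gradient_step(w, b, x, y)
print(bce_loss(predict(w, b, x), y))  # smaller loss after the update
```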

Moreover, to capture patterns and “learn” which images show tumor and which show normal tissue, different layers in the neural network may be responsible for learning different features of the image. Initial layers may identify features such as edges and corners; deeper in the network, layers work at higher levels of abstraction, such as shapes (circles vs. ellipses) and, later on, possibly the distances between them.
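
As an illustration of what an early layer might compute, the sketch below slides a 3 × 3 kernel over a tiny grayscale image, the same sliding-window operation (technically a cross-correlation) that convolutional layers perform. The Sobel-style kernel is hand-picked to respond to vertical edges; in a trained network such as MobileNetV2, kernel values are learned rather than set by hand, so this is only an analogy.

```python
from typing import List

Matrix = List[List[float]]

# Sobel-style kernel that responds to dark-to-bright vertical edges (hand-picked for illustration).
VERTICAL_EDGE: Matrix = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` without padding ("valid" mode), as a CNN layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 4 x 5 image: dark in the left two columns, bright in the right three.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
print(conv2d_valid(image, VERTICAL_EDGE))  # prints [[4, 4, 0], [4, 4, 0]]
```

The strong responses (4) appear wherever the dark-to-bright boundary falls inside the window, while the uniformly bright region gives 0.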

One option would be to train a neural network from scratch. Instead, knowing that the lower layers may be responsible for edge and shape detection, I used transfer learning: the network is initialized with weights from a pre-trained neural network. Specifically, I used the pre-trained MobileNetV2 model.
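
The core idea of transfer learning can be sketched without any deep-learning framework. In the toy example below, frozen_features is a hypothetical stand-in for a pre-trained base network: its parameters are fixed and never updated, and only a small logistic-regression head on top of it is trained. In Keras, the analogous step is typically setting trainable = False on the pre-trained base model before training a new classification head. The post does not say whether the base was frozen or fine-tuned, so treat this as a sketch of the general technique rather than the project's setup.

```python
import math
from typing import List, Sequence, Tuple


def frozen_features(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a pre-trained base: a fixed mapping that is never trained."""
    return [max(0.0, x[0] - x[1]), max(0.0, x[1] - x[0])]


def predict_head(w: List[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class from the trainable head."""
    f = frozen_features(x)
    return 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))


def train_head(data: List[Tuple[Sequence[float], int]], epochs: int = 200,
               lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only the head's weights with stochastic gradient descent on binary cross-entropy."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)              # base output: computed, but never updated
            error = predict_head(w, b, x) - y   # dLoss/dlogit for sigmoid + cross-entropy
            w = [wi - lr * error * fi for wi, fi in zip(w, f)]
            b -= lr * error
    return w, b


toy_data = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.2, 0.8], 0)]
w, b = train_head(toy_data)
print(predict_head(w, b, [0.8, 0.1]), predict_head(w, b, [0.1, 0.8]))
```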

The Dataset

The dataset used for this project was “100,000 histological images of human colorectal cancer and healthy tissue”, prepared by Kather, Halama, and Marx. According to the researchers, it consists of 100,000 hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue, collated from samples in the NCT Biobank and the UMM pathology archive (Kather, Halama & Marx, 2016).

Building the Model

To build the neural network model, I used Google Cloud Platform to train on 4,534 images (2,235 of tumor tissue and 2,299 of normal tissue). A further 1,330 images were used for validation (656 normal and 674 tumor) and 1,653 images (810 normal and 843 tumor) for the test set.
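
The split sizes reported above can be cross-checked with a few lines of Python. The counts are taken directly from the post; the script only derives the total, each split’s share of the data, and the class balance within each split.

```python
# (normal, tumor) image counts per split, as reported in the post.
SPLITS = {
    "train": (2299, 2235),
    "validation": (656, 674),
    "test": (810, 843),
}


def summarize(splits: dict) -> int:
    """Print each split's size, share of all images, and fraction of normal tissue; return the total."""
    total = sum(normal + tumor for normal, tumor in splits.values())
    for name, (normal, tumor) in splits.items():
        size = normal + tumor
        print(f"{name:>10}: {size:5d} images ({size / total:.1%} of total), {normal / size:.1%} normal")
    return total


print("total:", summarize(SPLITS))
```

This gives 7,517 images in total, split roughly 60/18/22 between training, validation, and test, with each split close to a 50/50 balance between normal and tumor tissue.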

I used Keras with a TensorFlow backend to design the neural network, with the Nadam optimizer and 50 training epochs.
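
Nadam is Adam combined with Nesterov momentum. The post does not give its hyperparameters, so the sketch below is a simplified, scalar version of the Nadam update with illustrative values; it omits the momentum-decay schedule used in Keras’ Nadam implementation, so its numbers will not match Keras exactly. It minimizes the toy loss f(x) = (x - 3)² to show the update in action.

```python
import math
from typing import Tuple


def nadam_step(theta: float, grad: float, m: float, v: float, t: int,
               lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-7) -> Tuple[float, float, float]:
    """One simplified Nadam update for a scalar parameter; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (squared gradients)
    # Nesterov look-ahead: bias-corrected momentum plus a bias-corrected share of the current gradient.
    m_hat = beta1 * m / (1 - beta1 ** (t + 1)) + (1 - beta1) * grad / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize the toy loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 3001):
    x, m, v = nadam_step(x, 2 * (x - 3), m, v, t)
print(round(x, 2))  # close to 3
```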

The specific architecture for my model is illustrated below:

Results

The loss function was binary cross-entropy. The model achieved a final test accuracy of 98.79% and an F1 score of 0.987. The final model size is 75.5 MB.
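
For reference, the two quantities mentioned above can be computed as follows. This is a plain-Python sketch, not the author’s evaluation code; it assumes tumor is encoded as the positive class (label 1), which the post does not state, and the labels in the usage example are made up purely to exercise the functions.

```python
import math
from typing import List, Tuple


def binary_cross_entropy(y_true: List[int], y_prob: List[float], eps: float = 1e-7) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Accuracy, and F1 = 2PR / (P + R) for the positive class (assumed: tumor = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1


# Made-up labels, only to exercise the functions.
print(accuracy_and_f1([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

As a sanity check on the reported figure, 98.79% of the 1,653 test images corresponds to about 1,633 correct predictions, or roughly 20 misclassified images.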

Serving using TensorFlow.js

To serve the model online, I used the tfjs converter, which converts the model into a model.json file plus binary weight shard files that are used to load the model. I wrote a simple script that loads model.json and serves predictions through cloudiopsy.github.io.
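
The post does not include the loading script. For context, the Keras-to-TensorFlow.js conversion is usually done with the tensorflowjs_converter command-line tool (for example, tensorflowjs_converter --input_format keras model.h5 web_model/, where the file and directory names are hypothetical). In the browser, tf.loadLayersModel fetches model.json and then the weight shards listed in its weightsManifest, resolving their paths relative to model.json. The Python sketch below mimics only that resolution step; the URL https://cloudiopsy.github.io/model/model.json and the shard names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urljoin


def shard_urls(model_json_url: str, model_json_text: str) -> List[str]:
    """Resolve every weight-shard path in a model.json weightsManifest against the model.json URL."""
    manifest = json.loads(model_json_text)["weightsManifest"]
    return [urljoin(model_json_url, path) for group in manifest for path in group["paths"]]


# Hypothetical, heavily trimmed model.json: real files also contain the model topology
# and the names, shapes, and dtypes of the weights in each group.
example = json.dumps({
    "weightsManifest": [
        {"paths": ["group1-shard1of2.bin", "group1-shard2of2.bin"], "weights": []}
    ]
})
for url in shard_urls("https://cloudiopsy.github.io/model/model.json", example):
    print(url)
```

Because shard paths are resolved relative to model.json, the shard files need to be hosted alongside model.json (or at the relative paths it lists) for the browser to load the model.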

Conclusion

This is still a work in progress: higher accuracy is needed to create a tool that is truly useful for pathologists and medical professionals. Moreover, testing in a clinical setting would reveal possible refinements to the software. Hopefully, one day Cloudiopsy will help diagnose colorectal cancer through faster analysis of colorectal biopsies and aid the global fight against cancer.

Acknowledgments

I would like to thank my mentor, Mr. Martin Gomez, Dr. Daphne Ang, and my parents, Dr. Serafin Serapio and Dr. Cherry Serapio, who all helped make this project possible!

Translated from: https://medium.com/towards-artificial-intelligence/how-i-built-a-colorectal-cancer-prediction-platform-through-deep-learning-343aeb24d34a