[Large AI Models] A Complete Collection of Foundational Papers

Foundational Paper List for Large Models

  • Language Model Fundamentals

    • Statistical Language Models

    • RNN-based Language Models

    • Transformer-based Language Models

    • Sampling Methods for Language Models

    • Evaluation of Language Models

  • Large Language Models

    • Big Data + Big Models → New Intelligence

    • Overview of Large Language Model Architectures

    • Encoder-only Large Language Models

    • Encoder-Decoder Large Language Models

    • Decoder-only Large Language Models

    • Non-Transformer Architectures

  • Prompt Engineering

    • Introduction to Prompt Engineering

    • In-Context Learning

    • Chain-of-Thought

    • Prompting Techniques

    • Applications

  • Parameter-Efficient Fine-Tuning

    • Introduction to Parameter-Efficient Fine-Tuning

    • Parameter Addition Methods

    • Parameter Selection Methods

    • Low-Rank Adaptation Methods

    • Practice and Applications

  • Model Editing

    • Introduction to Model Editing

    • Classic Model Editing Methods

    • Parameter Addition: T-Patcher

    • Locate-then-Edit: ROME

    • Applications of Model Editing

  • Retrieval-Augmented Generation

    • Introduction to Retrieval-Augmented Generation

    • Retrieval-Augmented Generation Architectures

    • Knowledge Retrieval

    • Generation Augmentation

    • Practice and Applications

Language Model Fundamentals

Statistical Language Models

  1. Foundations of Statistical Natural Language Processing. Book. Chris Manning, Hinrich Schütze. [PDF], 1999.

  2. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (Third Edition). Book. Daniel Jurafsky, James H. Martin. [PDF], 2023.

RNN-based Language Models

  1. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation. R. J. Williams, D. Zipser. [PDF], 1989.

  2. Long Short-Term Memory. Neural Computation. Sepp Hochreiter, Jürgen Schmidhuber. [PDF], 1997.

  3. On the Difficulty of Training Recurrent Neural Networks. ICML. Razvan Pascanu, Tomas Mikolov, Yoshua Bengio. [PDF], 2012.

  4. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv. Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio. [PDF], 2014.

  5. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. NeurIPS. Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer. [PDF], 2015.

Transformer-based Language Models

  1. Layer Normalization. arXiv. Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. [PDF], 2016.

  2. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. [PDF], 2019.

  3. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP. Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy. [PDF], 2021.

  4. ResiDual: Transformer with Dual Residual Connections. arXiv. Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan. [PDF], 2023.

Sampling Methods for Language Models

  1. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models. AAAI. Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra. [PDF], 2018.

  2. The Curious Case of Neural Text Degeneration. ICLR. Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi. [PDF], 2020.
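
The nucleus (top-p) sampling procedure proposed in entry 2 (Holtzman et al., 2020) can be sketched in a few lines: keep the smallest set of tokens whose cumulative probability exceeds p, renormalize, and sample from that set. A minimal NumPy sketch; the function name and toy distribution are ours:

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Nucleus (top-p) sampling: restrict sampling to the smallest
    high-probability prefix of the vocabulary with mass >= p."""
    rng = np.random.default_rng(0) if rng is None else rng
    order = np.argsort(probs)[::-1]        # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1   # smallest prefix with mass >= p
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))

# With p=0.9 only the first three tokens (mass 0.95) are candidates.
vocab_probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
token = top_p_sample(vocab_probs, p=0.9)
```

Unlike beam search (entry 1), which deterministically tracks high-likelihood continuations, truncated sampling trades some likelihood for diversity while excluding the unreliable low-probability tail.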

Evaluation of Language Models

  1. Perplexity—a Measure of the Difficulty of Speech Recognition Tasks. JASA. F. Jelinek, R. L. Mercer, L. R. Bahl, J. K. Baker. [PDF], 1977.

  2. ROUGE: A Package for Automatic Evaluation of Summaries. ACL. Chin-Yew Lin. [PDF], 2004.

  3. BLEU Might Be Guilty but References Are Not Innocent. EMNLP. Markus Freitag, David Grangier, Isaac Caswell. [PDF], 2020.

  4. BERTScore: Evaluating Text Generation with BERT. ICLR. Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi. [PDF], 2020.

  5. Leveraging Large Language Models for NLG Evaluation: Advances and Challenges. arXiv. Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma. [PDF], 2024.

  6. G-Eval: NLG Evaluation Using GPT-4 with Better Human Alignment. EMNLP. Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu. [PDF], 2023.

  7. INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback. EMNLP. Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Wang, Lei Li. [PDF], 2023.
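
The perplexity measure in entry 1 has a compact definition: the exponential of the average negative log-likelihood the model assigns to each token. A minimal illustration (the function name is ours):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood per token;
    lower perplexity means the model is less 'surprised' by the text."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens at each step.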

Large Language Models

Big Data + Big Models → New Intelligence

  1. Scaling Laws for Neural Language Models. arXiv. Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei. [PDF], 2020.

  2. Training Compute-Optimal Large Language Models. arXiv. Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre. [PDF], 2022.

  3. PaLM 2 Technical Report. arXiv. Google. [PDF], 2023.
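
The compute-optimal result of Hoffmann et al. (entry 2, the "Chinchilla" paper) is often summarized by a rule of thumb of roughly 20 training tokens per model parameter, with training compute commonly approximated as 6ND FLOPs for N parameters and D tokens. A back-of-envelope sketch under those stated approximations (helper names are ours):

```python
def chinchilla_optimal_tokens(n_params):
    # Rule of thumb distilled from Hoffmann et al. (2022): a
    # compute-optimal model is trained on ~20 tokens per parameter.
    return 20 * n_params

def training_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

# A 70B-parameter model would want on the order of 1.4T tokens,
# i.e. roughly 6 * 70e9 * 1.4e12 ≈ 5.9e23 training FLOPs.
tokens = chinchilla_optimal_tokens(70e9)
flops = training_flops(70e9, tokens)
```

The contrast with Kaplan et al. (entry 1), which favored scaling parameters faster than data, is the central point of the Chinchilla paper.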

Overview of Large Language Model Architectures

  1. Attention Is All You Need. NeurIPS. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. [PDF], 2017.

Encoder-only Large Language Models

  1. A Survey on Contextual Embeddings. arXiv. Qi Liu, Matt J. Kusner, Phil Blunsom. [PDF], 2020.

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. [PDF] [Code], 2018.

  3. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. [PDF] [Code], 2019.

  4. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. [PDF] [Code], 2019.

  5. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. arXiv. Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. [PDF] [Code], 2020.

Encoder-Decoder Large Language Models

  1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. [PDF] [Code], 2019.

  2. Multitask Prompted Training Enables Zero-Shot Task Generalization. arXiv. Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush. [PDF] [Code], 2021.

  3. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. NAACL. Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. [PDF] [Code], 2021.

  4. Scaling Instruction-Finetuned Language Models. JMLR. Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei. [PDF] [Code], 2024.

  5. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. [PDF] [Code], 2020.

  6. Multilingual Denoising Pre-training for Neural Machine Translation. TACL. Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. [PDF] [Code], 2020.

Decoder-only Large Language Models

  1. Improving Language Understanding by Generative Pre-Training. Online. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. [PDF], 2018.

  2. Language Models are Unsupervised Multitask Learners. Online. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. [PDF], 2019.

  3. Language Models are Few-Shot Learners. NeurIPS. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [PDF], 2020.

  4. Evaluating Large Language Models Trained on Code. arXiv. Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba. [PDF], 2021.

  5. WebGPT: Browser-assisted Question-answering with Human Feedback. arXiv. Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman. [PDF], 2021.

  6. Training Language Models to Follow Instructions with Human Feedback. NeurIPS. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. [PDF], 2022.

  7. Introducing ChatGPT. Online. OpenAI. [PDF], 2023.

  8. GPT-4 Technical Report. Online. OpenAI. [PDF], 2023.

  9. LLaMA: Open and Efficient Foundation Language Models. arXiv. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. [PDF] [Code], 2023.

  10. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv. Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. [PDF] [Code], 2023.

  11. Introducing Meta Llama 3: The Most Capable Openly Available LLM to Date. Online. Meta AI. [PDF] [Code], 2024.

  12. Alpaca: A Strong, Replicable Instruction-Following Model. Online. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. [PDF] [Code], 2023.

  13. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. Online. The Vicuna Team. [PDF] [Code], 2023.

  14. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv. Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. [PDF] [Code], 2023.

  15. Code Llama: Open Foundation Models for Code. arXiv. Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. [PDF] [Code], 2023.

  16. A Brief Report on LawGPT 1.0: A Virtual Legal Assistant Based on GPT-3. arXiv. Ha-Thanh Nguyen. [PDF], 2023.

  17. Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks. arXiv. Tiedong Liu, Bryan Kian Hsiang Low. [PDF] [Code], 2023.

  18. Visual Instruction Tuning. NeurIPS. Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee. [PDF] [Code], 2023.

  19. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv. Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny. [PDF] [Code], 2023.

Non-Transformer Architectures

  1. Efficiently Modeling Long Sequences with Structured State Spaces. arXiv. Albert Gu, Karan Goel, Christopher Ré. [PDF] [Code], 2021.

  2. On the Parameterization and Initialization of Diagonal State Space Models. NeurIPS. Albert Gu, Karan Goel, Ankit Gupta, Christopher Ré. [PDF], 2022.

  3. RWKV: Reinventing RNNs for the Transformer Era. EMNLP. Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan S. Wind, Stanislaw Wozniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu. [PDF] [Code], 2023.

  4. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv. Albert Gu, Tri Dao. [PDF] [Code], 2023.

  5. Learning to (Learn at Test Time): RNNs with Expressive Hidden States. arXiv. Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, et al. [PDF] [Code], 2024.

Prompt Engineering

Introduction to Prompt Engineering

  1. A Survey of Large Language Models. arXiv. Wayne Xin Zhao, Qian Liu, Zhicheng Dou, Jian-Yun Nie, Ji-Rong Wen. [PDF], 2023.

  2. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. EMNLP. Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu. [PDF] [Code], 2023.

  3. FIT-RAG: Black-Box RAG with Factual Information and Token Reduction. arXiv. Yuren Mao, Xuemei Dong, Wenyi Xu, Yunjun Gao, Bin Wei, Ying Zhang. [PDF], 2024.

  4. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv. DeepSeek-AI. [PDF] [Code], 2024.

  5. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev. [PDF] [Code], 2018.

  6. Measuring Massive Multitask Language Understanding. ICLR. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. [PDF] [Code], 2021.

  7. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. SIGMOD. Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, Jinshu Lin. [PDF] [Code], 2024.

  8. Alpaca: A Strong, Replicable Instruction-Following Model. Stanford Center for Research on Foundation Models. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Percy Liang. [PDF] [Code], 2023.

  9. WizardCoder: Empowering Code Large Language Models with Evol-Instruct. arXiv. Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang. [PDF] [Code], 2023.

  10. Generative Agents: Interactive Simulacra of Human Behavior. UIST. Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. [PDF] [Code], 2023.

In-Context Learning

  1. Language Models are Few-Shot Learners. NeurIPS. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [PDF] [Code], 2020.

  2. An Explanation of In-context Learning as Implicit Bayesian Inference. ICLR. Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma. [PDF], 2022.

  3. In-context Learning with Retrieved Demonstrations for Language Models: A Survey. arXiv. Man Luo, Xin Xu, Yue Liu, Panupong Pasupat, Mehran Kazemi. [PDF], 2024.

  4. What Makes Good In-Context Examples for GPT-3? ACL. Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen. [PDF] [Code], 2022.

  5. Self-Prompting Large Language Models for Zero-Shot Open-Domain QA. arXiv. Junlong Li, Jinyuan Wang, Zhuosheng Zhang, Hai Zhao. [PDF] [Code], 2024.

  6. Long Short-Term Memory. Neural Computation. Sepp Hochreiter, Jürgen Schmidhuber. [PDF] [Code], 1997.

  7. The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis. arXiv. Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He. [PDF], 2024.

  8. On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model. NAACL. Seongjin Shin, Sang-Woo Lee, Hwijeen Ahn, Sungdong Kim, HyoungSeok Kim, Boseop Kim, Kyunghyun Cho, Gichang Lee, Woomyoung Park, Jung-Woo Ha, Nako Sung. [PDF], 2022.

  9. Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression. NeurIPS. Allan Raventós, Mansheej Paul, Feng Chen, Surya Ganguli. [PDF] [Code], 2023.

  10. Data Distributional Properties Drive Emergent In-Context Learning in Transformers. NeurIPS. Stephanie C.Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, Felix Hill. [PDF] [Code], 2022.

  11. Emergent Abilities of Large Language Models. TMLR. Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. [PDF], 2022.

  12. In-Context Learning Learns Label Relationships but Is Not Conventional Learning. arXiv. Jannik Kossen, Yarin Gal, Tom Rainforth. [PDF] [Code], 2024.

  13. Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations. EMNLP. Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim. [PDF], 2022.

  14. What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. ACL. Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen. [PDF] [Code], 2023.

  15. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? EMNLP. Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer. [PDF] [Code], 2022.

  16. Unified Demonstration Retriever for In-Context Learning. ACL. Xiaonan Li, Kai Lv, Hang Yan, Tianyang Lin, Wei Zhu, Yuan Ni, Guotong Xie, Xiaoling Wang, Xipeng Qiu. [PDF] [Code], 2023.

  17. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. ACL. Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp. [PDF] [Code], 2022.
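
Many of the papers above study which demonstrations to include in a prompt and in what order. The mechanics of few-shot prompt construction can be sketched in a few lines; the format, function name, and toy examples are ours:

```python
def build_few_shot_prompt(demonstrations, query):
    """Format (input, output) demonstrations into a few-shot prompt.
    Demonstration choice and ordering both matter in practice, so
    callers may want to retrieve and reorder before formatting."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    lines.append(f"Input: {query}\nOutput:")  # model completes the answer
    return "\n\n".join(lines)

demos = [("great movie!", "positive"), ("what a waste of time", "negative")]
prompt = build_few_shot_prompt(demos, "an instant classic")
```

The model's completion after the final "Output:" is taken as the prediction; no parameters are updated, which is what distinguishes in-context learning from fine-tuning.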

Chain-of-Thought

  1. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [PDF], 2022.

  2. Large Language Models are Zero-Shot Reasoners. NeurIPS. Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [PDF] [Code], 2022.

  3. Automatic Chain of Thought Prompting in Large Language Models. ICLR. Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola. [PDF] [Code], 2023.

  4. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS. Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan. [PDF] [Code], 2023.

  5. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI. Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler. [PDF] [Code], 2024.

  6. Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR. Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. [PDF], 2023.
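
The self-consistency decoding of entry 6 replaces greedy chain-of-thought decoding with a simple aggregation: sample several reasoning chains, extract each final answer, and take the majority vote. A minimal sketch, with toy chains represented as lists of reasoning steps (names and data are ours):

```python
from collections import Counter

def self_consistent_answer(sampled_chains):
    """Self-consistency: majority vote over the final answers of
    independently sampled chain-of-thought completions."""
    answers = [chain[-1] for chain in sampled_chains]  # last step = answer
    return Counter(answers).most_common(1)[0][0]

chains = [
    ["23 - 20 = 3", "3 + 6 = 9", "9"],
    ["23 - 20 + 6", "= 9", "9"],
    ["23 - 6 = 17", "17 - 20?", "8"],  # one faulty reasoning path
]
answer = self_consistent_answer(chains)  # majority vote picks "9"
```

The key idea is that diverse reasoning paths that reach the same answer provide evidence for that answer, so a single faulty chain is outvoted.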

Prompting Techniques

  1. Lost in the Middle: How Language Models Use Long Contexts. TACL. Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang. [PDF] [Code], 2024.

  2. C3: Zero-shot Text-to-SQL with ChatGPT. arXiv. Xuemei Dong, Chao Zhang, Yuhang Ge, Yuren Mao, Yunjun Gao, Lu Chen, Jinshu Lin, Dongfang Lou. [PDF] [Code], 2023.

  3. PaLM: Scaling Language Modeling with Pathways. JMLR. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel. [PDF] [Code], 2023.

  4. Better Zero-Shot Reasoning with Role-Play Prompting. arXiv. Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, Xiaohang Dong. [PDF] [Code], 2023.

Applications

  1. A Survey on Large Language Model Based Autonomous Agents. Frontiers of Computer Science. Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen. [PDF] [Code], 2024.

  2. Generative Agents: Interactive Simulacra of Human Behavior. UIST. Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. [PDF] [Code], 2023.

  3. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. NeurIPS. Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang. [PDF] [Code], 2023.

  4. Garbage In, Garbage Out: Having Useful Data Is Everything. Measurement: Interdisciplinary Research and Perspectives. L. Todd Rose, Kurt W. Fischer. [PDF], 2011.

  5. Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data. arXiv. Pablo Villalobos, Colin Raffel, Tim Dettmers. [PDF], 2022.

  6. Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. [PDF] [Code], 2023.

  7. C3: Zero-shot Text-to-SQL with ChatGPT. arXiv. Xuemei Dong, Chao Zhang, Yuhang Ge, Yuren Mao, Yunjun Gao, Lu Chen, Jinshu Lin, Dongfang Lou. [PDF] [Code], 2023.

Parameter-Efficient Fine-Tuning

Introduction to Parameter-Efficient Fine-Tuning

  1. Efficient Large Language Models: A Survey. arXiv. Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang. [PDF] [Code], 2023.

  2. A Survey for In-context Learning. arXiv. Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui. [PDF] [Code], 2023.

  3. Instruction Tuning for Large Language Models: A Survey. arXiv. Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang. [PDF] [Code], 2023.

  4. Finetuned Language Models Are Zero-Shot Learners. arXiv. Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. [PDF] [Code], 2021.

  5. Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR. Victor Sanh et al. [PDF] [Code], 2022.

  6. Instruction in the Wild: A User-based Instruction Dataset. GitHub. Jinjie Ni, Fuzhao Xue, Kabir Jain, Mahir Hitesh Shah, Zangwei Zheng, Yang You. [Code], 2023.

  7. Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL. Yizhong Wang et al. [PDF] [Code], 2023.

  8. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv. Hugo Touvron et al. [PDF] [Code], 2023.

Parameter Addition Methods

  1. The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP. Brian Lester, Rami Al-Rfou, Noah Constant. [PDF] [Code], 2021.

  2. Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL. Xiang Lisa Li, Percy Liang. [PDF] [Code], 2021.

  3. Parameter-Efficient Transfer Learning for NLP. ICML. Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly. [PDF] [Code], 2019.

  4. AdapterFusion: Non-Destructive Task Composition for Transfer Learning. EACL. Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych. [PDF] [Code], 2020.

  5. SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters. Findings of EMNLP. Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao. [PDF] [Code], 2022.

  6. Counter-Interference Adapter for Multilingual Machine Translation. Findings of EMNLP. Yaoming Zhu, Jiangtao Feng, Chengqi Zhao, Mingxuan Wang, Lei Li. [PDF] [Code], 2021.

  7. Tuning Language Models by Proxy. arXiv. Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith. [PDF] [Code], 2024.

  8. Training Neural Networks with Fixed Sparse Masks. NeurIPS. Yi-Lin Sung, Varun Nair, Colin Raffel. [PDF] [Code], 2021.

Parameter Selection Methods

  1. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. ACL. Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg. [PDF] [Code], 2022.

  2. What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning. arXiv. Jaejun Lee, Raphael Tang, Jimmy Lin. [PDF], 2019.

  3. On the Effectiveness of Parameter-Efficient Fine-Tuning. AAAI. Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier. [PDF] [Code], 2023.

  4. Parameter-Efficient Fine-Tuning without Introducing New Latency. ACL. Baohao Liao, Yan Meng, Christof Monz. [PDF], 2023.

  5. Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning. EMNLP. Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang. [PDF] [Code], 2021.

  6. Masking as an Efficient Alternative to Finetuning for Pre-trained Language Models. EMNLP. Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze. [PDF], 2020.

  7. Composable Sparse Fine-Tuning for Cross-Lingual Transfer. ACL. Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić. [PDF] [Code], 2022.

  8. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. ICLR. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. [PDF] [Code], 2019.

  9. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. ICLR. Jonathan Frankle, Michael Carbin. [PDF], 2019.

  10. Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning. EMNLP. Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Peng Shi, Wenpeng Yin, Rui Zhang. [PDF] [Code], 2023.
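
The common thread in these methods is choosing a small subset of existing parameters to update while freezing the rest. BitFit-style selection (entry 1), which trains only bias terms, can be illustrated framework-agnostically; the name-matching helper and parameter names below are ours, not from any of the cited papers:

```python
def select_trainable(param_names, patterns=("bias",)):
    """Mark parameters whose names contain one of the given patterns
    as trainable (BitFit-style: biases only); freeze everything else."""
    return {name: any(p in name for p in patterns) for name in param_names}

names = ["encoder.layer0.weight", "encoder.layer0.bias",
         "classifier.weight", "classifier.bias"]
trainable = select_trainable(names)
# Only the two bias vectors would receive gradient updates.
```

In a deep-learning framework, this mask would be applied by setting the frozen parameters' gradient flags off before building the optimizer, so the optimizer state is also only allocated for the selected subset.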

Low-Rank Adaptation Methods

  1. LoRA: Low-Rank Adaptation of Large Language Models. ICLR. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. [PDF] [Code], 2022.

  2. Towards a Unified View of Parameter-Efficient Transfer Learning. ICLR. Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. [PDF] [Code], 2022.

  3. A Note on LoRA. arXiv. Vlad Fomenko, Han Yu, Jongho Lee, Stanley Hsieh, Weizhu Chen. [PDF], 2024.

  4. KronA: Parameter Efficient Tuning with Kronecker Adapter. arXiv. Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh. [PDF], 2022.

  5. Parameter-Efficient Model Adaptation for Vision Transformers. AAAI. Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang. [PDF], 2023.

  6. DoRA: Weight-Decomposed Low-Rank Adaptation. arXiv. Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen. [PDF] [Code], 2024.

  7. LoRA Learns Less and Forgets Less. arXiv. Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham. [PDF], 2024.

  8. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. arXiv. Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian. [PDF], 2024.

  9. S-LoRA: Serving Thousands of Concurrent LoRA Adapters. arXiv. Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica. [PDF] [Code], 2023.

  10. Sparse Low-rank Adaptation of Pre-trained Language Models. EMNLP. Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun. [PDF] [Code], 2023.

  11. DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution. arXiv. Yulong Mao, Kaiyu Huang, Changhao Guan, Ganglin Bao, Fengran Mo, Jinan Xu. [PDF] [Code], 2024.

  12. ReLoRA: High-Rank Training Through Low-Rank Updates. NeurIPS Workshop. Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky. [PDF] [Code], 2023.

  13. SLTrain: A Sparse Plus Low-rank Approach for Parameter and Memory Efficient Pretraining. arXiv. Andi Han, Jiaxiang Li, Wei Huang, Mingyi Hong, Akiko Takeda, Pratik Jawanpuria, Bamdev Mishra. [PDF] [Code], 2024.

  14. PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models. arXiv. Fanxu Meng, Zhaohui Wang, Muhan Zhang. [PDF] [Code], 2024.

  15. MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning. arXiv. Hanqing Wang, Zeguan Xiao, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen. [PDF], 2024.

  16. A Survey on LoRA of Large Language Models. arXiv. Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao. [PDF] [Code], 2024.

  17. Parameter-Efficient Fine-Tuning of Large-Scale Pre-trained Language Models. Nature Machine Intelligence. Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu. [PDF], 2023.

  18. LoTR: Low Tensor Rank Weight Adaptation. arXiv. Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets. [PDF], 2024.

  19. MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning. arXiv. Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang. [PDF] [Code], 2024.

  20. Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning. arXiv. Wenhan Xia, Chengwei Qin, Elad Hazan. [PDF], 2024.

  21. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. ACL/IJCNLP. Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta. [PDF], 2021.

  22. Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning. arXiv. Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Jiahuan Pei. [PDF], 2024.

  23. LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning. arXiv. Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang. [PDF] [Code], 2024.

  24. Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning. arXivWenhan Xia, Chengwei Qin, and Elad Hazan [PDF], 2024.

  25. Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning. ICLRQingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao. [PDF] [Code], 2023.

  26. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition. CoLMChengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin. [PDF] [Code], 2023.

  27. Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation EACL Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi. [PDF] [Code], 2023.

  28. DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dy-namic Rank Distribution arXiv Yulong Mao, Kaiyu Huang, Changhao Guan, Ganglin Bao, Fengran Mo, Jinan Xu. [PDF] [Code],2023.
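All of the methods above build on the low-rank idea introduced by LoRA: keep the pretrained weight W frozen and train a small update ΔW = BA of rank r ≪ min(d_out, d_in), scaled by α/r. A minimal sketch in plain Python (the dimensions, `alpha`, and `r` are illustrative toy values, not taken from any listed paper):

```python
import random

# Hypothetical toy dimensions; real LoRA wraps the attention projections
# of a Transformer, where d is in the thousands and r is small (4-64).
d_out, d_in, r, alpha = 4, 4, 2, 8

random.seed(0)

# Frozen pretrained weight W (d_out x d_in); never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Trainable low-rank factors. B starts at zero so that, at initialization,
# the adapted model is exactly the pretrained model.
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0 for _ in range(r)] for _ in range(d_out)]

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); only A and B receive gradients.
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

x = [1.0, -0.5, 0.3, 2.0]
# With B == 0 the adapter is a no-op: output matches the frozen model.
assert lora_forward(x) == matvec(W, x)
```

After training, ΔW = (α/r)·BA can be merged into W, so inference costs nothing extra; the variants above (DoRA, AdaLoRA, PiSSA, …) differ mainly in how B and A are initialized, decomposed, or rank-allocated.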

Practice and Applications

  1. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. SIGMOD. Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, Jinshu Lin. [PDF], 2024.

  2. TabLLM: Few-shot Classification of Tabular Data with Large Language Models. AISTATS. Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, David Sontag. [PDF], 2023.

Model Editing

Introduction to Model Editing

  1. Knowledge Editing for Large Language Models: A Survey. arXiv. Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong Li. [PDF], 2023.

  2. A Comprehensive Study of Knowledge Editing for Large Language Models. arXiv. Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen. [PDF] [Code], 2024.

  3. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP. Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang. [PDF] [Code], 2023.

  4. A Survey on Knowledge Editing of Neural Networks. arXiv. Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi. [PDF], 2023.

Classic Model Editing Methods

  1. Memory-Based Model Editing at Scale. ICML. Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn. [PDF] [Code], 2022.

  2. Fixing Model Bugs with Natural Language Patches. EMNLP. Shikhar Murty, Christopher D. Manning, Scott M. Lundberg, Marco Túlio Ribeiro. [PDF], 2022.

  3. Calibrating Factual Knowledge in Pretrained Language Models. EMNLP. Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, Lei Li. [PDF] [Code], 2022.

  4. Transformer-Patcher: One Mistake Worth One Neuron. ICLR. Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong. [PDF] [Code], 2023.

  5. Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors. NeurIPS. Tom Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi. [PDF] [Code], 2023.

  6. Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. Timothy Hospedales, Antreas Antoniou, Paul Micaelli, Amos Storkey. [PDF], 2021.

  7. Editable Neural Networks. ICLR. Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry V. Pyrkin, Sergei Popov, Artem Babenko. [PDF] [Code], 2020.

  8. Editing Factual Knowledge in Language Models. EMNLP. Nicola De Cao, Wilker Aziz, Ivan Titov. [PDF] [Code], 2021.

  9. Fast Model Editing at Scale. ICLR. Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning. [PDF] [Code], 2022.

  10. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP. Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy. [PDF] [Code], 2021.

  11. Knowledge Neurons in Pretrained Transformers. ACL. Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei. [PDF] [Code], 2022.

  12. Locating and Editing Factual Associations in GPT. NeurIPS. Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. [PDF] [Code], 2022.

  13. Mass-Editing Memory in a Transformer. ICLR. Kevin Meng, Arnab Sen Sharma, Alex J. Andonian, Yonatan Belinkov, David Bau. [PDF] [Code], 2023.

Additional-Parameter Method: T-Patcher

  1. Transformer-Patcher: One Mistake Worth One Neuron. ICLR. Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong. [PDF] [Code], 2023.

Locate-then-Edit Method: ROME

  1. Locating and Editing Factual Associations in GPT. NeurIPS. Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. [PDF] [Code], 2022.

  2. Mass-Editing Memory in a Transformer. ICLR. Kevin Meng, Arnab Sen Sharma, Alex J. Andonian, Yonatan Belinkov, David Bau. [PDF] [Code], 2023.
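ROME and MEMIT treat an MLP weight matrix as a linear key-value store and write a new fact with a rank-one update. A stripped-down sketch of that core step (illustrative only: real ROME whitens the key with a covariance estimate gathered over many prompts, which is omitted here):

```python
# Given a key vector k (the subject's representation) and a desired value
# v* (the new fact), the minimal-norm rank-one update is
#   Delta = ((v* - W k) / (k . k)) k^T,  so that  (W + Delta) k = v*.

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def rank_one_edit(W, k, v_star):
    Wk = matvec(W, k)
    kk = sum(ki * ki for ki in k)               # ||k||^2
    resid = [(vs - wk) / kk for vs, wk in zip(v_star, Wk)]
    # Delta[i][j] = resid[i] * k[j]; return the edited weight matrix.
    return [[w + r_i * k_j for w, k_j in zip(row, k)]
            for row, r_i in zip(W, resid)]

W = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 "memory" matrix
k = [1.0, 1.0]                 # key for the fact being rewritten
v_star = [3.0, -1.0]           # desired new value
W_new = rank_one_edit(W, k, v_star)
assert matvec(W_new, k) == v_star  # the edited fact is now stored exactly
```

Keys not aligned with k are perturbed only by their overlap with k, which is why such edits can stay fairly local; MEMIT generalizes this to batches of facts spread over several layers.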

Model Editing Applications

  1. Scalable Extraction of Training Data from (Production) Language Models. arXiv. Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee. [PDF], 2023.

  2. DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models. arXiv. Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong. [PDF] [Code], 2023.

  3. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. arXiv. Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg. [PDF] [Code], 2022.

  4. Locating and Mitigating Gender Bias in Large Language Models. arXiv. Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen. [PDF], 2024.

  5. Debiasing Algorithm through Model Adaptation. arXiv. Tomasz Limisiewicz, David Mareček, Tomáš Musil. [PDF] [Code], 2023.

Retrieval-Augmented Generation (RAG)

Introduction to Retrieval-Augmented Generation

  1. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation. David H. Wolpert, William G. Macready. [PDF], 1997.

  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. [PDF], 2020.

Retrieval-Augmented Generation Architectures

  1. In-Context Retrieval-Augmented Language Models. Transactions of the Association for Computational Linguistics. Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham. [PDF] [Code], 2023.

  2. REPLUG: Retrieval-Augmented Black-Box Language Models. arXiv. Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih. [PDF], 2023.

  3. Atlas: Few-Shot Learning with Retrieval Augmented Language Models. Journal of Machine Learning Research. Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave. [PDF] [Code], 2023.

  4. Improving Language Models by Retrieving from Trillions of Tokens. ICML. Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark. [PDF] [Code], 2022.

  5. Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In. arXiv. Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu. [PDF] [Code], 2023.

  6. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv. Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. [PDF] [Code], 2023.
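The architectures above share a retrieve-then-read skeleton: embed the query, fetch the top-k passages, and condition generation on them. A toy sketch of that loop (the two-dimensional embeddings, passage texts, and string prompt standing in for a real LLM call are all made up for illustration):

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, index, k=2):
    # index: list of (passage_text, passage_vec). A production system
    # replaces this linear scan with an ANN index (HNSW, IVF-PQ, ...).
    ranked = sorted(index, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def rag_prompt(question, passages):
    # Prepend the retrieved evidence to the question, REPLUG-style.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using the context.\n{context}\nQ: {question}\nA:"

index = [("LoRA freezes W and trains B A.", [1.0, 0.0]),
         ("BM25 is a sparse ranking function.", [0.0, 1.0]),
         ("ROME edits facts with rank-one updates.", [0.7, 0.7])]
passages = retrieve([1.0, 0.1], index, k=2)
prompt = rag_prompt("What does LoRA train?", passages)
assert "LoRA freezes W" in prompt
```

The listed papers differ mainly in where this loop sits: purely in-context (In-Context RALM, REPLUG), trained jointly with the retriever (Atlas, RETRO), or gated by self-reflection tokens (Self-RAG).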

Knowledge Retrieval

  1. The Chronicles of RAG: The Retriever, the Chunk and the Generator. arXiv. Paulo Finardi, Leonardo Avila, Rodrigo Castaldoni, Pedro Gengo, Celio Larcher, Marcos Piau, Pablo Costa, Vinicius Caridá. [PDF], 2024.

  2. LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding. arXiv. Mingrui Wu, Sheng Cao. [PDF], 2024.

  3. Generate Rather Than Retrieve: Large Language Models Are Strong Context Generators. ICLR. Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang. [PDF] [Code], 2023.

  4. An Information-Theoretic Perspective of TF-IDF Measures. IPM. Akiko Aizawa. [PDF], 2003.

  5. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval. Stephen Robertson, Hugo Zaragoza. [PDF], 2009.

  6. Investigating the Effects of Sparse Attention on Cross-Encoders. ECIR. Ferdinand Schlatt, Maik Fröbe, Matthias Hagen. [PDF] [Code], 2024.

  7. A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE. arXiv. Hervé Déjean, Stéphane Clinchant, Thibault Formal. [PDF], 2024.

  8. Dense Passage Retrieval for Open-Domain Question Answering. EMNLP. Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih. [PDF] [Code], 2020.

  9. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR. Omar Khattab, Matei Zaharia. [PDF] [Code], 2020.

  10. Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-Sentence Scoring. arXiv. Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston. [PDF] [Code], 2019.

  11. Transformer Memory as a Differentiable Search Index. NeurIPS. Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta. [PDF] [Code], 2022.

  12. From Matching to Generation: A Survey on Generative Information Retrieval. arXiv. Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou. [PDF], 2024.

  13. A Neural Corpus Indexer for Document Retrieval. arXiv. Yujing Wang, Ying Hou, Hong Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. [PDF], 2022.

  14. Multidimensional Binary Search Trees Used for Associative Searching. Communications of the ACM. Jon Louis Bentley. [PDF], 1975.

  15. Ball*-tree: Efficient Spatial Indexing for Constrained Nearest-Neighbor Search in Metric Spaces. arXiv. Mohamad Dolatshah, Ali Hadian, Behrouz Minaei-Bidgoli. [PDF], 2015.

  16. Approximate Nearest Neighbor Algorithm Based on Navigable Small World Graphs. Information Systems. Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, Vladimir Krylov. [PDF], 2014.

  17. Non-Metric Similarity Graphs for Maximum Inner Product Search. NeurIPS. Stanislav Morozov, Artem Babenko. [PDF] [Code], 2018.

  18. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. Yu A. Malkov, Dmitry A. Yashunin. [PDF] [Code], 2018.

  19. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence. Hervé Jégou, Matthijs Douze, Cordelia Schmid. [PDF], 2010.

  20. Optimized Product Quantization for Approximate Nearest Neighbor Search. CVPR. Tiezheng Ge, Kaiming He, Qifa Ke, Jian Sun. [PDF], 2013.

  21. Searching in One Billion Vectors: Re-rank with Source Coding. ICASSP. Hervé Jégou, Romain Tavenard, Matthijs Douze, Laurent Amsaleg. [PDF], 2011.

  22. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. arXiv. Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, Zhaochun Ren. [PDF] [Code], 2023.
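Among the sparse methods above, BM25 remains the standard baseline: its score combines inverse document frequency with a saturating term-frequency weight and document-length normalization. A self-contained sketch (toy corpus; `k1` and `b` set to commonly used defaults):

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # query: list of terms; docs: list of tokenized documents.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # df[t]: number of documents containing term t.
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    scores = []
    for d in docs:
        s = 0.0
        for t in query:
            if t not in df:
                continue
            # Smoothed IDF (the "+1" keeps it non-negative), then a
            # saturating TF weight with length normalization.
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            tf = d.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["lora", "adapts", "llms"],
        ["bm25", "ranks", "documents"],
        ["dense", "retrieval", "uses", "embeddings"]]
scores = bm25_scores(["bm25", "documents"], docs)
assert max(range(3), key=lambda i: scores[i]) == 1  # doc 1 matches best
```

Dense retrievers (DPR, ColBERT) replace this exact-match score with learned embeddings, and the ANN papers above (HNSW, PQ) make the resulting vector search scale.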

Generation Augmentation

  1. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. EMNLP. Potsawee Manakul, Adian Liusie, Mark J. F. Gales. [PDF] [Code], 2023.

  2. Predicting Question-Answering Performance of Large Language Models through Semantic Consistency. arXiv. Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor. [PDF], 2023.

  3. Large Language Models Struggle to Learn Long-Tail Knowledge. ICML. Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel. [PDF] [Code], 2023.

  4. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. ACL. Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Hannaneh Hajishirzi, Daniel Khashabi. [PDF] [Code], 2023.

  5. Locating and Editing Factual Associations in GPT. NeurIPS. Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. [PDF] [Code], 2022.

  6. Learning to Trust Your Feelings: Leveraging Self-Awareness in LLMs for Hallucination Mitigation. arXiv. Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang. [PDF] [Code], 2024.

  7. Improving Language Models via Plug-and-Play Retrieval Feedback. arXiv. Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, Ashish Sabharwal. [PDF], 2023.

  8. Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP. arXiv. Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia. [PDF] [Code], 2022.

  9. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. EMNLP. Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, Jaewoo Kang. [PDF] [Code], 2023.

  10. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. arXiv. Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. [PDF] [Code], 2023.

  11. FIT-RAG: Black-Box RAG with Factual Information and Token Reduction. ACM Transactions on Information Systems. Yuren Mao, Xuemei Dong, Wenyi Xu, Yunjun Gao, Bin Wei, Ying Zhang. [PDF], 2024.

  12. PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter. EMNLP. Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao. [PDF], 2023.

  13. TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding. arXiv. Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen. [PDF] [Code], 2024.

  14. RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. arXiv. Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin. [PDF], 2024.

Practice and Applications

  1. A Survey on Large Language Model Based Autonomous Agents. Frontiers of Computer Science. Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen. [PDF] [Code], 2024.

  2. Multimodal Prompt Retrieval for Generative Visual Question Answering. ACL. Timothy Ossowski, Junjie Hu. [PDF] [Code], 2023.

  3. FinTextQA: A Dataset for Long-form Financial Question Answering. arXiv. Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang. [PDF], 2024.

  4. Retrieval-Based Controllable Molecule Generation. ICLR. Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar. [PDF] [Code], 2022.

  5. Re-Imagen: Retrieval-Augmented Text-to-Image Generator. arXiv. Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen. [PDF], 2022.

  6. Using External Off-Policy Speech-to-Text Mappings in Contextual End-to-End Automated Speech Recognition. arXiv. David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister. [PDF], 2023.

  7. Language Models with Image Descriptors Are Strong Few-Shot Video-Language Learners. NeurIPS. Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji. [PDF] [Code], 2022.
