Tag: AI

  • Farewell to Matrix Multiplication: Exploring a New Era of Lightweight Language Models

    Large language models (LLMs) have achieved enormous success in natural language processing, but their high computational cost and large memory footprint have become a bottleneck limiting their deployment. Matrix multiplication (MatMul) is the dominant operation in LLMs, consuming most of the compute time and memory resources. To address this, researchers from the University of California, Santa Cruz have proposed a new, scalable MatMul-free language model (MatMul-free LM) that eliminates all matrix multiplication operations while retaining strong performance.

    Why is matrix multiplication so important?

    Matrix multiplication is everywhere in neural networks: dense layers, convolutional layers, and self-attention all depend on it. A key reason is that modern graphics processing units (GPUs) are highly optimized for MatMul. With CUDA and linear algebra libraries such as cuBLAS, matrix multiplications can be parallelized and accelerated very efficiently. This is what powered AlexNet's victory in the 2012 ImageNet competition and helped drive the rapid development of deep learning.

    However, MatMul also carries a heavy computational and memory cost. During both training and inference, matrix multiplications typically account for the vast majority of execution time and memory accesses. Researchers have therefore long looked for simpler operations that could replace them.

    Limitations of existing approaches

    There are currently two main approaches to replacing matrix multiplication:

    1. Replacing multiplication with elementary operations: AdderNet, for example, replaces the multiplications in convolutional neural networks with signed additions. However, AdderNet mainly targets computer vision tasks and performs poorly for language modeling.
    2. Binarization or ternarization: quantizing matrix entries to binary or ternary values reduces matrix multiplication to simple additions and subtractions. This can be applied to activations or to weights: spiking neural networks (SNNs) use binarized activations, while binarized neural networks (BNNs) and ternary neural networks (TNNs) use quantized weights.

    In recent years, language models such as BitNet have shown that quantization scales, replacing the weights of all dense layers with binary or ternary values at up to 3 billion parameters. However, BitNet still keeps the self-attention mechanism, which continues to depend on expensive matrix multiplications.

    What the MatMul-free LM does differently

    To remove matrix multiplication from LLMs entirely, the researchers propose the MatMul-free LM, which relies on additions in its dense layers and on element-wise Hadamard products in place of self-attention.

    1. Ternary weights: in the spirit of BNNs, the MatMul-free LM restricts dense-layer weights to {-1, 0, +1}, turning matrix multiplication into simple additions and subtractions (see the sketch after this list).

    2. MatMul-free Linear Gated Recurrent Unit (MLGRU): to eliminate the matrix multiplications of self-attention, the researchers rework the gated recurrent unit (GRU) so that it relies only on element-wise multiplications.

    3. MatMul-free Gated Linear Unit (GLU): the MatMul-free LM uses a GLU as its channel mixer and replaces its dense layers with ternary weights, removing matrix multiplication there as well.
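
    To make the core trick concrete, here is a minimal Java sketch (my illustration, not the authors' implementation; the class and method names are invented) of a dense layer whose weights are constrained to {-1, 0, +1}: every output is just a running sum in which each input is added, subtracted, or skipped, and the only remaining products are cheap element-wise ones.

    public final class TernaryDense {

        private final byte[][] weights; // weights[out][in], each entry is -1, 0, or +1

        public TernaryDense(byte[][] weights) {
            this.weights = weights;
        }

        /** Computes y = W x using only additions and subtractions, never a multiply. */
        public float[] forward(float[] input) {
            float[] output = new float[weights.length];
            for (int o = 0; o < weights.length; o++) {
                float acc = 0f;
                for (int i = 0; i < input.length; i++) {
                    byte w = weights[o][i];
                    if (w == 1) {
                        acc += input[i];   // weight +1: add the activation
                    } else if (w == -1) {
                        acc -= input[i];   // weight -1: subtract the activation
                    }                      // weight 0: skip the activation entirely
                }
                output[o] = acc;
            }
            return output;
        }

        /** Element-wise (Hadamard) product, the kind of product the gating units keep. */
        public static float[] hadamard(float[] gate, float[] value) {
            float[] out = new float[value.length];
            for (int i = 0; i < value.length; i++) {
                out[i] = gate[i] * value[i]; // per-element product, not a matrix product
            }
            return out;
        }
    }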

    Advantages of the MatMul-free LM

    The MatMul-free LM offers the following advantages:

    • Higher computational efficiency: removing matrix multiplication sharply reduces compute time.
    • Lower memory requirements: ternary weights shrink the model's memory footprint.
    • Hardware friendly: better suited to implementation on dedicated hardware such as FPGAs.

    Experimental results

    The researchers evaluated the MatMul-free LM against a Transformer++ baseline, with the following findings:

    • The MatMul-free LM is on par with Transformer++ and even outperforms it in some settings.
    • It is more efficient in both training and inference, with lower memory usage and lower latency.
    • An FPGA implementation was also demonstrated, with efficiency the authors describe as approaching that of the human brain.

    Outlook

    The MatMul-free LM opens a new path toward more efficient, more energy-frugal LLMs. As LLMs spread to an ever wider range of platforms, MatMul-free modeling is poised to become an important direction for building efficient, scalable language models.


  • JStarCraft RNS: A Comprehensive Java Recommendation and Search Engine

    In an era of information explosion, finding the content we actually need in an ocean of information has become a pressing problem. JStarCraft RNS was created to address the two core problems shared by recommendation and search: ranking prediction (Ranking) and rating prediction (Rating). It is a Java recommendation and search engine that gives practitioners in these fields a complete, general-purpose design and reference implementation, covering more than 70 ranking and rating algorithms, and it is among the fastest and most comprehensive Java recommendation and search engines available.

    Core strengths of JStarCraft RNS

    JStarCraft RNS has the following core strengths:

    • Cross-platform: runs on multiple operating systems, so it can be used in different environments.
    • Serial and parallel computation: adapts flexibly to different scenarios and improves computational efficiency.
    • CPU and GPU hardware acceleration: makes full use of the underlying hardware to boost performance.
    • Model saving and loading: lets users persist and reuse trained models.
    • A rich set of recommendation and search algorithms: many ranking and rating algorithms to cover different needs.
    • Broad scripting support: multiple scripting languages, such as Groovy, JS, Lua, MVEL, Python, and Ruby, for customized development.
    • A rich set of evaluation metrics: a variety of ranking and rating metrics to help assess model performance.

    Installing and using JStarCraft RNS

    JStarCraft RNS requires the following environment:

    • JDK 8 or later
    • Maven 3

    Installation steps:

    1. Install the JStarCraft-Core framework:
       git clone https://github.com/HongZhaoHua/jstarcraft-core.git
       mvn install -Dmaven.test.skip=true
    2. Install the JStarCraft-AI framework:
       git clone https://github.com/HongZhaoHua/jstarcraft-ai.git
       mvn install -Dmaven.test.skip=true
    3. Install the JStarCraft-RNS engine:
       git clone https://github.com/HongZhaoHua/jstarcraft-rns.git
       mvn install -Dmaven.test.skip=true

    Usage steps:

    1. Set up the dependency: add the JStarCraft RNS Maven or Gradle dependency to your project.
    2. Build a configurator: use the Configurator class to load configuration files and set the parameters for model training and evaluation.
    3. Train and evaluate a model: use the RankingTask or RatingTask class to train and evaluate a model and collect its evaluation metrics.
    4. Get the model: call task.getModel() to obtain the trained model (a minimal Java sketch of steps 2-4 follows below).
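
    As a rough illustration of steps 2 through 4, the sketch below wires together only the classes already named in this article (Configurator, RankingTask, RandomGuessModel, the evaluators, task.execute(), task.getModel()). The class name RankingQuickStart is invented, the property file paths are copied from the script example later in this article, and import statements are omitted because the package paths depend on the JStarCraft release you use.

    public class RankingQuickStart {

        public static void main(String[] args) throws Exception {
            // Step 2: load the data and model configuration, then build the configurator.
            Properties keyValues = new Properties();
            ClassLoader loader = RankingQuickStart.class.getClassLoader();
            keyValues.load(loader.getResourceAsStream("data.properties"));
            keyValues.load(loader.getResourceAsStream("model/benchmark/randomguess-test.properties"));
            Configurator configurator = new Configurator(keyValues);

            // Step 3: train and evaluate a ranking model, then read the ranking metrics.
            RankingTask task = new RankingTask(RandomGuessModel.class, configurator);
            var measures = task.execute();
            System.out.println("precision = " + measures.get(PrecisionEvaluator.class));
            System.out.println("recall = " + measures.get(RecallEvaluator.class));

            // Step 4: retrieve the trained model for saving or serving.
            var model = task.getModel();
        }
    }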

    Architecture and concepts of JStarCraft RNS

    The core concepts behind JStarCraft RNS include:

    • Information retrieval: because of information overload, the job of information retrieval is to connect users with information, helping users find what is valuable to them and helping information reach the users who would be interested in it.
    • Search vs. recommendation: search is active and explicit, recommendation is passive and fuzzy; the two are complementary tools.
    • Ranking prediction vs. rating prediction: Ranking algorithms work on implicit feedback data and focus on users' ordering preferences, while Rating algorithms work on explicit feedback data and focus on users' rating satisfaction.

    A JStarCraft RNS example

    JStarCraft RNS supports several scripting languages, including BeanShell, Groovy, JS, Kotlin, Lua, Python, and Ruby, which can be used to customize the model training and evaluation workflow.

    For example, the following BeanShell script trains and evaluates models:

    // Build the configuration; `loader` is expected to be a ClassLoader provided by the host Java program
    keyValues = new Properties();
    keyValues.load(loader.getResourceAsStream("data.properties"));
    keyValues.load(loader.getResourceAsStream("model/benchmark/randomguess-test.properties"));
    configurator = new Configurator(keyValues);
    
    // This object is returned to the Java program
    _data = new HashMap();
    
    // Build the ranking task
    task = new RankingTask(RandomGuessModel.class, configurator);
    // Train and evaluate the model, then collect the ranking metrics
    measures = task.execute();
    _data.put("precision", measures.get(PrecisionEvaluator.class));
    _data.put("recall", measures.get(RecallEvaluator.class));
    
    // Build the rating task
    task = new RatingTask(RandomGuessModel.class, configurator);
    // Train and evaluate the model, then collect the rating metrics
    measures = task.execute();
    _data.put("mae", measures.get(MAEEvaluator.class));
    _data.put("mse", measures.get(MSEEvaluator.class));
    
    _data;

    Comparing the algorithms in JStarCraft RNS

    JStarCraft RNS ships with a rich set of ranking and rating algorithms, so users can pick the one that fits their needs, and it provides a range of evaluation metrics for assessing model performance.

    For example, the table below lists some of the ranking and rating algorithms provided by JStarCraft RNS:

    Algorithm | Problem | Description / Paper
    RandomGuess | Ranking, Rating | Random guess
    MostPopular | Ranking | Most popular
    ConstantGuess | Rating | Constant guess
    GlobalAverage | Rating | Global average
    ItemAverage | Rating | Item average
    ItemCluster | Rating | Item clustering
    UserAverage | Rating | User average
    UserCluster | Rating | User clustering
    AoBPR | Ranking | Improving pairwise learning for item recommendation from implicit feedback
    BPR | Ranking | BPR: Bayesian Personalized Ranking from Implicit Feedback
    CLiMF | Ranking | CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering
    EALS | Ranking | Collaborative filtering for implicit feedback dataset
    FISM | Ranking | FISM: Factored Item Similarity Models for Top-N Recommender Systems
    GBPR | Ranking | GBPR: Group Preference Based Bayesian Personalized Ranking for One-Class Collaborative Filtering
    HMMForCF | Ranking | A Hidden Markov Model Purpose: A class for the model, including parameters
    ItemBigram | Ranking | Topic Modeling: Beyond Bag-of-Words
    LambdaFM | Ranking | LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
    LDA | Ranking | Latent Dirichlet Allocation for implicit feedback
    ListwiseMF | Ranking | List-wise learning to rank with matrix factorization for collaborative filtering
    PLSA | Ranking | Latent semantic models for collaborative filtering
    RankALS | Ranking | Alternating Least Squares for Personalized Ranking
    RankSGD | Ranking | Collaborative Filtering Ensemble for Ranking
    SLIM | Ranking | SLIM: Sparse Linear Methods for Top-N Recommender Systems
    WBPR | Ranking | Bayesian Personalized Ranking for Non-Uniformly Sampled Items
    WRMF | Ranking | Collaborative filtering for implicit feedback datasets
    Rank-GeoFM | Ranking | Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation
    SBPR | Ranking | Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering
    AssociationRule | Ranking | A Recommendation Algorithm Using Multi-Level Association Rules
    PRankD | Ranking | Personalised ranking with diversity
    AsymmetricSVD++ | Rating | Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
    AutoRec | Rating | AutoRec: Autoencoders Meet Collaborative Filtering
    BPMF | Rating | Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo
    CCD | Rating | Large-Scale Parallel Collaborative Filtering for the Netflix Prize
    FFM | Rating | Field Aware Factorization Machines for CTR Prediction
    GPLSA | Rating | Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis
    IRRG | Rating | Exploiting Implicit Item Relationships for Recommender Systems
    MFALS | Rating | Large-Scale Parallel Collaborative Filtering for the Netflix Prize
    NMF | Rating | Algorithms for Non-negative Matrix Factorization
    PMF | Rating | PMF: Probabilistic Matrix Factorization
    RBM | Rating | Restricted Boltzman Machines for Collaborative Filtering
    RF-Rec | Rating | RF-Rec: Fast and Accurate Computation of Recommendations based on Rating Frequencies
    SVD++ | Rating | Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
    URP | Rating | User Rating Profile: a LDA model for rating prediction
    RSTE | Rating | Learning to Recommend with Social Trust Ensemble
    SocialMF | Rating | A matrix factorization technique with trust propagation for recommendation in social networks
    SoRec | Rating | SoRec: Social recommendation using probabilistic matrix factorization
    SoReg | Rating | Recommender systems with social regularization
    TimeSVD++ | Rating | Collaborative Filtering with Temporal Dynamics
    TrustMF | Rating | Social Collaborative Filtering by Trust
    TrustSVD | Rating | TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings
    PersonalityDiagnosis | Rating | A brief introduction to Personality Diagnosis
    SlopeOne | Rating | Slope One Predictors for Online Rating-Based Collaborative Filtering
    EFM | Ranking, Rating | Explicit factor models for explainable recommendation based on phrase-level sentiment analysis
    TF-IDF | Ranking | Term frequency-inverse document frequency
    HFT | Rating | Hidden factors and hidden topics: understanding rating dimensions with review text
    TopicMF | Rating | TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation

    Summary

    JStarCraft RNS is a powerful, easy-to-use, high-performance Java recommendation and search engine. It gives practitioners a comprehensive, general-purpose design and reference implementation, making it an indispensable tool for building recommendation and search systems.

