[D] Why are transformers not trained layer-wise?
Reddit » Machine Learning
by /u/kiockete
9m ago
It seems to me that, thanks to the residual path, the gradient that flows to each layer is the same regardless of the transformer layer/block. Example: ProjectionAndCost(X + L1(X) + L2(X + L1(X)) + L3(X + L1(X) + L2(X + L1(X))) ...) Since the input to ProjectionAndCost is just the sum of the outputs from all layers plus the initial embeddings, the gradient that comes to layer L1 is the same as the gradient that comes to L2 or L3. So we could: first train only L1: ProjectionAndCost(X + L1(X)); then freeze L1, include L2, and train: ProjectionAndCost(X + L1(X) + L2(X + L1(X))); then freeze L1 and L2, include L3 a ..read more
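For context, a minimal PyTorch sketch of the greedy, stage-wise scheme the post describes (train one block, freeze it, add the next). The `Block`, `forward_stack`, and toy objective below are illustrative stand-ins, not the poster's setup:

```python
# Minimal sketch (not the poster's code): greedy, stage-wise training of a
# residual stack. `Block`, `forward_stack`, and the toy objective are
# illustrative stand-ins for a real transformer block and ProjectionAndCost.
import torch
import torch.nn as nn

d_model = 64

class Block(nn.Module):  # stand-in for one transformer block
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(
            nn.LayerNorm(d), nn.Linear(d, d), nn.GELU(), nn.Linear(d, d)
        )
    def forward(self, x):
        return self.ff(x)  # the residual is added by the caller: x + block(x)

blocks = nn.ModuleList([Block(d_model) for _ in range(3)])
head = nn.Linear(d_model, d_model)  # stand-in for the projection in ProjectionAndCost

def forward_stack(x, active):
    # Computes X + L1(X) + L2(X + L1(X)) + ... using the first `active` blocks.
    for blk in blocks[:active]:
        x = x + blk(x)
    return head(x)

for stage in range(1, len(blocks) + 1):
    # Freeze blocks trained in earlier stages; train only the newest block (+ head).
    for i, blk in enumerate(blocks):
        blk.requires_grad_(i == stage - 1)
    trainable = [p for p in list(blocks.parameters()) + list(head.parameters())
                 if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=1e-3)
    for _ in range(100):  # toy inner loop with a dummy reconstruction objective
        x = torch.randn(8, d_model)
        loss = (forward_stack(x, stage) - x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

One caveat the sketch makes visible: the gradient arriving at each block's output via the direct residual branch into the final sum is indeed shared, but once later blocks are present, an earlier block also receives gradient through the inputs of those later blocks, so the full end-to-end gradient is not identical across layers.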
[D] Is there an equivalent to the BigDL project for NVIDIA GPUs, which allows distributing workloads across a DL cluster with Spark?
Reddit » Machine Learning
by /u/PepperGrind
3h ago
So there's this relatively new "BigDL" project (https://bigdl.readthedocs.io/en/latest/), which is for Intel CPUs and Intel GPUs, but there's no mention anywhere of it working with NVIDIA GPUs. Is there any equivalent library for NVIDIA GPUs on a Spark cluster?
[P] New Book: BUILD GPT: HOW AI WORKS
Reddit » Machine Learning
by /u/Pure_Nerve_595
5h ago
After having worked on it for many months, I am now excited to say that my new book, "BUILD GPT: HOW AI WORKS", is available on Amazon. It goes through the process of building a GPT from scratch and explains how it works, with a focus on providing intuition. I want to thank everyone who helped me with this book; they are listed in the acknowledgments section. Please feel free to share this book with anyone interested in learning about GPTs or interested in building them.
[D] What is the best TTS model for my case?
Reddit » Machine Learning
by /u/hwk06023
7h ago
Hi, here is a newbie's question. The biggest concern is the rate of generation: I want to generate about 5 seconds of voice in about 100 ms. I want to know which model performs best (SOTA) under those conditions. Which model is best for me? I think "styletts2" is the best fit. If you have any relevant experience or know any other information, I would really appreciate your help. Thank you!
[D] Exploring Complex Number Representations for Word Vectors: A New Approach
Reddit » Machine Learning
by /u/_mayuk
11h ago
Word embeddings like Word2Vec and GloVe have revolutionized natural language processing, offering compact and dense representations of word meanings. However, these embeddings typically represent words as real-valued vectors, potentially limiting their ability to capture complex semantic relationships. In this proposal, we explore an alternative approach: representing word vectors as complex numbers. We propose converting Word2Vec or GloVe vectors into complex numbers, where the real part captures magnitude and the imaginary part encodes additional semantic information. For instance, consider ..read more
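To make the idea concrete, here is a small illustrative sketch of one way such a conversion could look. The pairing of even/odd dimensions into real and imaginary parts, and the Hermitian similarity, are assumptions chosen for illustration rather than the proposal's exact scheme:

```python
# Illustrative sketch (not the proposal's exact scheme): turn a real-valued
# embedding into a complex-valued one by pairing adjacent dimensions, then
# compare words with a Hermitian inner product.
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for 300-d GloVe/Word2Vec vectors; in practice these would be
# loaded from a pretrained model.
emb = {w: rng.standard_normal(300) for w in ("king", "queen", "banana")}

def to_complex(v):
    # Assumption for illustration: even dims become the real part,
    # odd dims the imaginary part, giving a 150-d complex vector.
    return v[0::2] + 1j * v[1::2]

def complex_cosine(a, b):
    num = np.vdot(a, b)  # Hermitian inner product (conjugates `a`)
    return np.abs(num) / (np.linalg.norm(a) * np.linalg.norm(b))

king, queen, banana = (to_complex(emb[w]) for w in ("king", "queen", "banana"))
print(complex_cosine(king, queen), complex_cosine(king, banana))
```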
[R] French GEC dataset
Reddit » Machine Learning
by /u/R-e-v-e-r-i-e-
14h ago
Hi, does anyone know of a French L2 GEC dataset (that was published at a conference)?
[D] tutorial on how to build streaming ML applications
Reddit » Machine Learning
by /u/clementruhm
17h ago
My primary expertise is audio processing, but I believe this task happens in other domains too: running a model on chunks of an infinitely long input. While for some architectures it is straightforward, it can get tedious for convolutional nets. I put together a comprehensive tutorial on how to build streaming ML applications: https://balacoon.com/blog/streaming_inference/. I would be curious to learn whether it's a common problem and how people usually deal with it, because resources on the topic are surprisingly scarce.
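As a concrete illustration of the kind of problem the tutorial addresses (not code from the tutorial itself), here is a minimal sketch of streaming inference for a causal 1-D convolution, where a small left-context buffer is carried between chunks so the chunked output matches the offline output:

```python
# Minimal sketch (not from the linked tutorial): streaming inference for a
# causal 1-D convolution. A (kernel_size - 1)-sample left-context buffer is
# carried between chunks so the chunked output matches the offline output.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=5, bias=False)  # causal once we left-pad by 4

def offline(x):  # x: (batch, channels, time)
    return conv(F.pad(x, (conv.kernel_size[0] - 1, 0)))

class StreamingConv:
    def __init__(self, conv):
        self.conv = conv
        self.ctx = torch.zeros(1, 1, conv.kernel_size[0] - 1)  # left-context buffer
    def step(self, chunk):  # chunk: (1, 1, chunk_len)
        x = torch.cat([self.ctx, chunk], dim=-1)
        self.ctx = x[..., -(self.conv.kernel_size[0] - 1):]  # keep the tail for the next call
        return self.conv(x)

signal = torch.randn(1, 1, 32)
stream = StreamingConv(conv)
with torch.no_grad():
    chunks = [stream.step(signal[..., i:i + 8]) for i in range(0, 32, 8)]
    assert torch.allclose(torch.cat(chunks, dim=-1), offline(signal), atol=1e-6)
```

For recurrent nets the carried state is just the hidden state; for convolutions and attention it is this kind of context buffer that tends to make streaming implementations tedious.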
[D] Promising ML fields
Reddit » Machine Learning
by /u/Dramatic_Chance9577
17h ago
I'm planning to dive into ML and I'd like to specialize in a particular field. What are the promising subfields of ML, and which ones are in high demand?
[D] Why is R^2 so crazy?
Reddit » Machine Learning
by /u/Cloverdover1
17h ago
[Image: training curves; the MSE looks reasonable, but R^2 starts strongly negative and approaches 0.] Based on the MSE it looks good, right? But why is my R^2 starting off so negative and approaching 0? Could it be a bug in how I am calculating it? This happened after I min-maxed the labels before training. This is an LSTM that is predicting runs scored in baseball games.
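For reference, the usual definition and a quick sanity check: R^2 is negative whenever the model does worse than always predicting the mean of the evaluation targets, and a classic bug consistent with what the post describes is mixing min-max-scaled predictions with raw labels (or vice versa) when computing it. A small sketch, assuming plain numpy arrays:

```python
# R^2 = 1 - SS_res / SS_tot. It is negative whenever the model does worse than
# always predicting the mean of the evaluation targets. A classic bug is mixing
# scales, e.g. min-max-scaled predictions against raw labels (or vice versa).
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([2.0, 5.0, 7.0, 11.0])            # e.g. runs scored per game
print(r2_score(y, np.full_like(y, y.mean())))  # mean baseline -> exactly 0.0
print(r2_score(y, np.zeros_like(y)))           # a bad predictor -> strongly negative
```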
Recall Score Increase [D]
Reddit » Machine Learning
by /u/Legal_Hearing555
21h ago
Hello everyone, I am trying to do a small fraud detection project and I have a very imbalanced dataset. I used random undersampling because the minority class is pretty small, and I also tried SMOTE and combining the two; the best recall score I got (0.95) was with random undersampling alone. I thought GridSearchCV would increase it, but instead of increasing it is decreasing, even though I tried to make it focus on recall score. Why is this happening?
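One way to set this up, sketched below under the assumption that imbalanced-learn is used for the resampling (the post mentions random undersampling and SMOTE): putting the sampler inside an imblearn Pipeline means GridSearchCV resamples only the training portion of each fold and measures recall on untouched validation data.

```python
# Minimal sketch (assumes imbalanced-learn is installed and the fraud class is
# labeled 1). With the undersampler inside the pipeline, it is re-fit on each
# training fold only, so the recall reported by GridSearchCV is measured on
# untouched validation folds.
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the poster's fraud data: roughly 3% positives.
X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)

pipe = Pipeline([
    ("under", RandomUnderSampler(random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="recall",  # recall of the positive (fraud) class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

A recall of 0.95 measured on data that was itself undersampled (or on the training split) is typically inflated, so a lower number from a properly cross-validated grid search may simply be the more honest estimate rather than a regression.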
