Reddit » Text & Data Mining
Reddit is a network of communities based on people's interests. Find communities you're interested in, and become part of an online community. We share news, discussions, papers, tutorials, libraries, and tools related to NLP, machine learning and data analysis.
1w ago
Good evening,
I’m currently self-teaching text mining and I’m interested in exploring techniques to measure the progression of topics over time. Let’s assume that the topics aren’t predefined, which means we need to construct them using methods like LDA, SVD, or BERTopic.
The challenge is to analyze how these topics change over time. While one approach is to conduct topic modeling at separate intervals, I’m seeking a more continuous method. Any insights on how this can be achieved would be greatly appreciated.
My aim is to build an index to quantify how a certain topic evolves over time.
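One way to avoid refitting topics for each interval is to fit a single model on the whole corpus and then aggregate each document's topic weights by time period. The sketch below assumes the per-document topic weights already exist (from LDA, BERTopic, or any other model); the function name `topic_index`, the monthly granularity, and the toy data are all hypothetical, and rescaling to a base of 100 is just one possible definition of an index.

```python
from collections import defaultdict
from datetime import date


def topic_index(docs, topic, base=100.0):
    """Return {(year, month): index} for one topic.

    docs: iterable of (date, weights) pairs, where weights maps a
          topic label to the share of that document assigned to it.
    The index is the mean topic share per month, rescaled so that the
    first month with a non-zero share equals `base`.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, weights in docs:
        period = (day.year, day.month)
        totals[period] += weights.get(topic, 0.0)
        counts[period] += 1

    # Mean share of the topic in each month, in chronological order.
    shares = {p: totals[p] / counts[p] for p in sorted(totals)}

    # Use the first non-zero month as the reference level.
    reference = next((s for s in shares.values() if s > 0), None)
    if reference is None:
        return {p: 0.0 for p in shares}
    return {p: base * s / reference for p, s in shares.items()}


# Hypothetical toy data: (publication date, topic weights per document).
docs = [
    (date(2024, 1, 5), {"energy": 0.2, "sports": 0.8}),
    (date(2024, 1, 20), {"energy": 0.4, "sports": 0.6}),
    (date(2024, 2, 3), {"energy": 0.6, "sports": 0.4}),
]
index = topic_index(docs, "energy")
```

Because the topics come from one model, the values are comparable across periods; a shorter period or a rolling window would make the series smoother or more continuous.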
2w ago
I am developing a project that involves processing text data. My goal is to correct errors specifically related to unnecessary characters and spaces in texts. I'm looking for recommendations on suitable Python libraries and tools that could help address these issues.
Extraneous spaces:
Correct: "We boug ht a new car yesterday." to "We bought a new car yesterday."
Correct: "Today was a ve ry goo d da y." to "Today was a very good day."
Correct: "Hel lo! Ho w are you do ing?" to "Hello! How are you doing?"
I have explored several existing solutions, but most of them were either too basic for ..read more
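For the extraneous-space examples above, one possible approach is dictionary-based re-segmentation: try merging runs of adjacent fragments and keep the grouping that leaves the fewest fragments outside a known vocabulary. The sketch below is a minimal illustration using only the standard library; `fix_spaces`, `VOCAB`, and `max_parts` are hypothetical names, and the tiny vocabulary stands in for a real word list.

```python
import string

# Hypothetical toy vocabulary; a real word list would be far larger.
VOCAB = {
    "we", "bought", "a", "new", "car", "yesterday",
    "today", "was", "very", "good", "day",
    "hello", "how", "are", "you", "do", "doing",
}


def fix_spaces(text, vocab, max_parts=4):
    """Merge adjacent fragments so that as few as possible are unknown.

    Dynamic programming over whitespace-separated fragments. Each
    candidate word is a run of up to `max_parts` fragments joined
    together. Cost is (unknown fragments, merges), so a merge happens
    only when it turns unknown fragments into a known word.
    """
    tokens = text.split()
    n = len(tokens)
    # best[i] = (unknown, merges, words) for the first i fragments.
    best = [(0, 0, [])] + [None] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_parts), end):
            word = "".join(tokens[start:end])
            known = word.strip(string.punctuation).lower() in vocab
            unknown, merges, words = best[start]
            cand = (
                unknown + (0 if known else end - start),
                merges + (end - start - 1),
                words + [word],
            )
            if best[end] is None or cand[:2] < best[end][:2]:
                best[end] = cand
    return " ".join(best[n][2])


fixed = [
    fix_spaces("We boug ht a new car yesterday.", VOCAB),
    fix_spaces("Today was a ve ry goo d da y.", VOCAB),
    fix_spaces("Hel lo! Ho w are you do ing?", VOCAB),
]
```

Preferring fewer merges on ties keeps valid neighbours such as "a new" apart even when the merged form is also a word. Words missing from the vocabulary are left untouched, so the quality of the result depends on the word list.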
1M ago
submitted by /u/Far-Amphibian3043
1M ago
"Authorship Fingerprinting research is capable to correctly distinguish the works created by GPT 3.5, GPT 4, and human authors with recall rate 98.84% in our preliminary study." - Maiga Chang
One hour technical online (free) Thu Feb 29 "Challenges in Natural Language Processing Applications"
submitted by /u/gckoch
2M ago
submitted by /u/charles-legislate
2M ago
Good evening,
I need help with understanding the maths behind the LDA model:
https://ai.stanford.edu/~ang/papers/jair03-lda.pdf
Although I understand the intuition of what the model is doing, it is still a black box to me.
submitted by /u/Cerricola
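As a starting point for the maths asked about above, the core of the model in the linked paper can be written in three lines. The notation follows the paper: \alpha is the Dirichlet parameter, \beta holds the per-topic word probabilities, \theta is one document's topic mixture, and z_n is the topic of the n-th word w_n. This is a summary of the model definition only, not of the inference procedure.

```latex
% Generative process for one document with N words
\theta \sim \mathrm{Dir}(\alpha), \qquad
z_n \mid \theta \sim \mathrm{Multinomial}(\theta), \qquad
w_n \mid z_n, \beta \sim p(w_n \mid z_n, \beta), \quad n = 1, \dots, N

% Joint distribution of the topic mixture, topic assignments, and words
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

% Marginal likelihood of the document: integrate out \theta, sum out each z_n
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
```

The last integral has no closed form, which is why the paper resorts to approximate (variational) inference to fit \alpha and \beta.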
2M ago
Hello everyone! I'm very new to this field and have been tasked with a project in which, by looking at movie and TV show reviews of East Asian and North American media, I need to identify themes that differentiate the two types of media.
For example, let's say I'm analyzing "Parasite" (Asian media) and "Breaking Bad" (North American media). After processing the reviews, I might find that "Parasite" reviews frequently discuss themes of class disparity and societal structure, while "Breaking Bad" reviews often touch on themes of morality and personal choice.
So I need to classify the reviews based ..read more
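A simple first pass at finding what separates two sets of reviews is to rank words by a smoothed log-odds ratio of their frequencies in each set. The sketch below uses only the standard library; the function names, the add-one smoothing (`alpha`), and the four toy reviews are hypothetical, and single words are only a rough proxy for themes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_terms(group_a, group_b, top_n=2, alpha=1.0):
    """Return (terms typical of group_a, terms typical of group_b).

    Each term is scored by log P(term | A) - log P(term | B), with
    add-alpha smoothing so unseen terms do not produce log(0).
    """
    counts_a = Counter(t for doc in group_a for t in tokenize(doc))
    counts_b = Counter(t for doc in group_b for t in tokenize(doc))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {
        term: math.log((counts_a[term] + alpha) / total_a)
        - math.log((counts_b[term] + alpha) / total_b)
        for term in vocab
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    top_a = sorted(vocab, key=lambda t: (-scores[t], t))[:top_n]
    top_b = sorted(vocab, key=lambda t: (scores[t], t))[:top_n]
    return top_a, top_b


# Hypothetical toy reviews, not real data.
reviews_a = [
    "a sharp story about class and class divide",
    "class tension drives the story",
]
reviews_b = [
    "a story about morality and choice",
    "morality and personal choice drive the story",
]
top_a, top_b = distinctive_terms(reviews_a, reviews_b)
```

On real reviews, removing stop words and titles or character names first would likely help, since those tend to dominate the ranking without saying much about themes.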
3M ago
Hello, I'm new here. I'm an undergraduate student who is about to start a project that requires me to create a dataset for a model. The model should detect metaphors in the English comprehension passages from a particular exam body.
Please, I need guidance. I'm willing to work and learn. I just need someone who knows more than I do and can guide me so I won't keep wasting time.
submitted by /u/am_kolade
4M ago
My open-source software SentenceAx is a fine-tuning of BERT for splitting complicated sentences into simple ones. After 500 commits, it has been thoroughly debugged on a CPU using small values for everything. Now I need someone with a GPU (I don't have one) to volunteer to train it for me. I don't know how long it will take, but probably just a few hours. This is a fairly close rewrite/improvement of the famous software Openie6, so this model and these hyperparameters have been used successfully before to train Openie6. If you decide to accept, here is the repo. SentenceAx is a standalone component of the Mapp ..read more
5M ago
Looking to do a web-scraping project for a class, specifically on US newspaper article data. Most of the APIs are pretty expensive and outside my budget. Is there a way to do web scraping on an academic database like LexisNexis? It would make my life a whole lot easier. Thanks, everyone!
submitted by /u/Mental_Bet6033