24
Question about AI model training data quality
I keep seeing people in my local tech meetup talk about training models on any data they can scrape, focusing only on volume. Last month, a developer from Austin showed a project that failed because the training set was full of duplicate and low-quality forum posts. The model just repeated nonsense. I think clean, verified data matters more than sheer size. Has anyone else run into this and found a good way to source better datasets?
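To make "clean" concrete, here is a minimal sketch of the kind of first-pass filtering I have in mind. It is not what the Austin project used. The function names (`normalize`, `clean_posts`) and the `min_words` threshold are my own assumptions for illustration. It only catches exact duplicates after whitespace and case normalization, plus very short posts. Near-duplicates and quality scoring would need more than this.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def clean_posts(posts, min_words=5):
    """Keep the first copy of each post; drop short posts and exact repeats.

    min_words is an arbitrary illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for post in posts:
        norm = normalize(post)
        # Very short posts ("lol", "+1") rarely carry useful signal.
        if len(norm.split()) < min_words:
            continue
        # Hash the normalized text so duplicates are detected cheaply.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(post)
    return kept


sample = [
    "Clean data beats big data every time.",
    "clean   data beats big data every time.",
    "lol",
    "Filtering forum posts before training helps a lot.",
]
cleaned = clean_posts(sample)
```

On the sample above, the second post is dropped as a duplicate of the first and "lol" is dropped as too short, so two posts remain.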
2 comments
michael89511d ago
Wait, they just used any forum posts they could find? I mean, that's basically asking for a model to just spit back garbage.
3
river_gonzalez6611d ago
Isn't it more about how they filter and clean the data first?
1