Question about databases and duplicate content

I want to create a database of reviews for various products in my niche. I found a site with an API you can use to pull in videos and articles.

The question, though, is: isn't this duplicate content? And will I get penalized for something like this?

My main goal is to have this database on a subdomain, with my main site on the root domain.
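
For context, here's roughly what I'd be doing, as a minimal sketch. The endpoint, API key, and field names are made-up placeholders, not the provider's real API:

```python
# Minimal sketch: pull reviews from a hypothetical JSON API into SQLite.
# The URL, key, and field names below are placeholders, not a real API.
import sqlite3
import requests

conn = sqlite3.connect("reviews.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS reviews (
           product TEXT, title TEXT, body TEXT, video_url TEXT
       )"""
)

resp = requests.get(
    "https://api.example.com/v1/reviews",  # hypothetical endpoint
    params={"api_key": "YOUR_KEY", "niche": "widgets"},
    timeout=10,
)
resp.raise_for_status()

# Assumed response shape: {"reviews": [{"product": ..., "title": ...}]}
for item in resp.json().get("reviews", []):
    conn.execute(
        "INSERT INTO reviews VALUES (?, ?, ?, ?)",
        (item.get("product"), item.get("title"),
         item.get("body"), item.get("video_url")),
    )
conn.commit()
conn.close()
```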
 
It's a bit of an exaggeration when people refer to 'penalties' for feeds and other syndicated content because it's not so much that. It's more that in the interest of providing the best experience to their users it wouldn't benefit Google to return 20 sites on the first two pages that all use the same feed, and have the exact same reviews of a product. Because their users would hate that... so they don't.

Using a feed and trying to outrank everyone else who also uses that feed is probably not going to work out for you, for that reason. None of you are doing something unique, adding value, or being useful to Google's users more than any of the others.

Having said that, and as you already know, there are a fair few very large businesses built on driving paid or leaked traffic to automatically generated pages like this (the ones I've seen myself and know are definitely making a lot use paid traffic, but I'll let the pro leakers chime in with thoughts on what they've seen).
 
Thanks for the reply.

I definitely don't intend to outrank anyone with the feed. Just thought it would be a good idea to provide some more value on my site and have a database that people can search.

Can you give me some examples of these businesses using paid traffic and auto-generated pages? I'd like to check them out.
 
You won't receive a penalty for syndicating content. "Duplicate content" typically refers to duplicating your own content across your own site, although that's understandably not how most people take it.

Think about how someone might publish a series of posts that are all related, or even Part 1 through 5. They put all of them in the same 5 categories and hit them with the same 30 tags each.

They also don't noindex tag pages, and so on. Now they have 35 pages on their site with the same post excerpts in the same order, where the only difference is the URL slug. That's the real problem when it comes to duplicate content.

If you want an edge over anyone else using the same API to pull these reviews, add something like 50-100 unique words to the top of each of these posts so Google's bots see you as adding value. Bonus points if you can add new pictures too.
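
If it helps, here's a rough sketch of both ideas, assuming a small Flask app. The routes, data, and templates are hypothetical examples, not anyone's real setup:

```python
# Sketch of the two ideas above: unique intro copy above syndicated
# content, and noindexed tag pages. All names and data are hypothetical.
from flask import Flask, render_template_string

app = Flask(__name__)

# Syndicated body from the feed, paired with a unique 50-100 word
# intro you wrote yourself.
REVIEWS = {
    "widget-pro": {
        "intro": "My own take on the Widget Pro after a week of use...",
        "syndicated_body": "<p>Feed-supplied review text...</p>",
    }
}

@app.route("/reviews/<slug>")
def review(slug):
    post = REVIEWS[slug]
    # The unique copy renders above the syndicated content.
    return render_template_string(
        "<article><p>{{ intro }}</p>{{ body | safe }}</article>",
        intro=post["intro"], body=post["syndicated_body"],
    )

@app.route("/tag/<tag>")
def tag_page(tag):
    # Tag/archive pages get noindexed so they don't become the
    # near-duplicate listing pages described above.
    return render_template_string(
        '<head><meta name="robots" content="noindex, follow"></head>'
        "<body>Posts tagged {{ tag }}</body>",
        tag=tag,
    )
```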
 
@Ryuzaki What happens if I set a canonical link pointing to itself? Weird, but that would actually solve a problem I have right now with autopublished duplicate content across a subdomain and the primary domain.
 
Every one of my pages canonicals to itself. I've read before that Google understands it and doesn't care:

Q: “Is it still okay to put rel canonical on every single page pointing to itself, just in order to avoid duplicate parameters and things like that?”

A: "It doesn’t matter how many pages. You just need to make sure that it points to the clean URL version, that you’re not pointing to the parameter version accidentally, or that you’re not always pointing to the homepage accidentally, because those are the types of mistakes we try and catch. You can do that across millions of pages and we’ll try to take that into account."
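
For anyone wondering what that looks like in practice, here's a minimal sketch of building a self-referencing canonical from the requested URL, stripping parameters so it always points at the clean version. The example URL is hypothetical:

```python
# Build a self-referencing canonical that points to the "clean" URL,
# per the quote above. urlsplit/urlunsplit are standard library.
from urllib.parse import urlsplit, urlunsplit

def canonical_tag(requested_url: str) -> str:
    parts = urlsplit(requested_url)
    # Drop query parameters and fragments so the canonical never points
    # at a parameter version of the page by accident.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{clean}">'

print(canonical_tag("https://example.com/review/widget?utm_source=feed"))
# <link rel="canonical" href="https://example.com/review/widget">
```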

The dumbest content scraper sites can't figure out where the article begins and ends, so they end up grabbing the metadata too, including that self-referencing canonical. I see this as protecting my site: Google can then tell the content is mine, not theirs, and credit any links those copies acquire to my own page. So instead of someone duplicating my content, they're unintentionally curating and resyndicating it, and I'm getting some form of a boost out of it.

Although if I catch this on idiotic sites, I disavow them. If it's on a respectable site or one that's at least related, I'll let it slide.
 