There are lots of articles, code samples and discussions about using redirection for unused inbound links in an eShop. There is also fierce competition in the eCommerce arena, where all kinds of tricks and nasty exploits may come into play as merchants try to score as high as possible in the search engine results for a given set of keywords.
As an eCommerce store owner, on the one hand you want to protect your site against duplicated content and exploits; on the other, you want the shopping cart to keep working well so customers can purchase goods and explore the features your store offers. We will briefly describe the most common redirect methods and the ways they are used, along with their pitfalls and the consequences incorrect configuration settings may have for your store.
There are different types of redirects; the most commonly used are the temporary redirect and the permanent redirect. A temporary redirect emits a header marked with 302 or 307, and it is automatically assumed when no status number is specified (or when the status header is incorrect). To achieve a permanent redirect, an HTTP header must be specified using the 301 number. As an example, in PHP the following headers perform a permanent redirect.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com");
exit();
In contrast, if the first line indicates 307, or if it is omitted, a temporary redirect is issued. Temporary redirects indicate that the page is valid but that, for some reason, the particular request must be redirected. An example is trying to check out in a stock eCommerce store without being logged in: the redirection takes place by default and we end up on the login page or in the shopping cart. Temporary redirects may be subject to the eShop internals, features and functionality. These are accepted cases. However, temporary redirects should not take place for pages that are meant to be removed permanently. In these cases a permanent redirect is in order. Search engines, and bots in general, examine the headers during a redirect. When a search engine sees a 301 header, it understands that this particular page must be removed from its index and/or replaced by the redirect destination.
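The difference between the two redirect types can be sketched as a small helper. This is only an illustration: the function names build_redirect_headers() and send_redirect() are hypothetical and not part of any eCommerce framework, and 302 is used here as the temporary status.

```php
<?php
// Hypothetical helper: returns the header lines for a redirect without sending them.
function build_redirect_headers(string $url, bool $permanent): array
{
    // 301 tells search engines to replace the old URL; 302 keeps the old URL indexed.
    $status = $permanent ? 'HTTP/1.1 301 Moved Permanently' : 'HTTP/1.1 302 Found';
    return [$status, 'Location: ' . $url];
}

// Hypothetical wrapper: sends the headers and stops the script, like the example above.
function send_redirect(string $url, bool $permanent): void
{
    foreach (build_redirect_headers($url, $permanent) as $line) {
        header($line);
    }
    exit();
}
```

A removed product page would call send_redirect($url, true), while a checkout attempt without a login would call send_redirect($loginUrl, false). Keeping the header construction separate from the sending makes the choice of status easy to inspect.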
This process is critical for SEO. Old products, information pages, or other PHP catalog scripts that no longer exist may still be in the index of search engines. When nothing processes these requests (i.e. no server rules or dedicated PHP code), the 404 not-found error page will show up. Most SEO modules, however, do process redirects in some way. There are several reasons for this: one is to update the links as soon as possible, another is to salvage web traffic from search engines during the transition. If left to a default 404 error page, a search engine has no way of knowing where to go from there, so it may simply remove the page from its index, and the eCommerce store is left with one link less.
So far so good: an SEO module can close this gap by identifying when a specific script, along with a set of parameters, requires a redirect, and doing just that. In later years the sophistication and complexity of SEO add-ons and features increased dramatically. New methods have evolved where the SEO code performs back-and-forth translations between a web site's specific parameter set and its scripts. If the store owner removes a product from the administrator end, for instance, he wants an immediate and permanent redirection to take place. A common choice is to redirect back to the home page. Other SEO modules go a step further, identifying a similar product inside the eShop and redirecting the old product to a related one.
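As a rough sketch of the removed-product case, assume the store keeps a map from removed product ids to the URL of a related product. The function name, the map and the URLs below are hypothetical and only illustrate the idea of falling back to the home page.

```php
<?php
// Hypothetical lookup: removed product id => URL of a related product.
// When no related product is recorded, fall back to the home page.
function redirect_target_for_removed_product(int $productId, array $relatedUrls, string $homeUrl): string
{
    return $relatedUrls[$productId] ?? $homeUrl;
}

// Illustrative data only; a real module would read this from the database.
$relatedUrls = [12 => 'http://www.example.com/product_info.php?products_id=34'];
$target = redirect_target_for_removed_product(12, $relatedUrls, 'http://www.example.com/');
```

The returned URL would then be used as the destination of a permanent (301) redirect.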
Then there are some extreme cases where problems start. We are seeing SEO modules that attempt to process every parameter passed via the $_GET array and do something with it, such as validating the parameters and, when they find invalid ones, blindly performing a permanent redirect to some other page. This has multiple side effects.
1. Security-wise, invalid parameters should simply be ignored. They can be exploited by other parties who post such invalid links on other sites, forcing your SEO module to redirect. In many cases, unknown to the store owner, such methods may trigger an endless chain of redirects that consumes server resources, thanks to badly written SEO code.
2. Interfacing with external sites. There are many reasons why an SEO module should not interfere and/or mess with unknown parameters. First, it cannot predict the various parameters required during handshaking between your site and an external site. Examples are external payment gateways, shipping carriers and affiliate marketing programs, where each requires its own handshaking algorithm with a dedicated module at your eCommerce store. Having an SEO module jump in between and perform its own arbitration, deciding whether to redirect, is the last thing a store owner needs. Second, it is almost certain that such an SEO module was never tested for this particular combination of factors, so the results of such actions are unpredictable.
3. Deploying new modules that require a different set of parameters from those already passed via $_GET. Another area that cannot be covered is when new modules or features are integrated into an eCommerce store. The SEO module must be kept in sync, and new code may have to be added to service these new parameters and create spider-friendly URLs. Blind redirects therefore simply complicate matters and make the problems difficult to debug.
It is also difficult to detect and debug each and every SEO fault for redirects or for a specific sequence (2 or 3 subsequent redirects), because for each request the administrator must examine the server logs, or simulate the particular scenario using a browser plug-in or other script that shows the headers. There are services that can validate an entire site, marking where redirection occurs; however, these can only operate in a generic way, as they are unaware of the store's specifics. For instance, a form may be processed in a number of different ways and be redirected to a different page, given different user input.
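To make a redirect sequence easier to read, the raw response header lines (for example as collected by a browser plug-in, or with PHP's get_headers()) can be condensed into a list of hops. The following is a minimal sketch; summarize_redirect_chain() is a hypothetical name, and it assumes the header lines of all responses are given in order in a single flat array.

```php
<?php
// Hypothetical helper: condenses raw header lines into one entry per response,
// each holding the HTTP status and the Location header (if any).
function summarize_redirect_chain(array $headerLines): array
{
    $hops = [];
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#i', $line, $m)) {
            // A status line starts a new response in the chain.
            $hops[] = ['status' => (int) $m[1], 'location' => null];
        } elseif ($hops && preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            // Attach the redirect destination to the most recent response.
            $hops[count($hops) - 1]['location'] = trim($m[1]);
        }
    }
    return $hops;
}

// Illustrative input: two redirects followed by the final page.
$hops = summarize_redirect_chain([
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/a',
    'HTTP/1.1 302 Found',
    'Location: http://www.example.com/b',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
]);
```

A chain with more than one redirect entry, or one that never reaches a 200 status, is a sign that the SEO module deserves a closer look.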
The safest mechanism is to use an SEO module that does not attempt to force a redirect when unknown parameters are present. Keep in mind that search engines, when indexing your site, rely on internal site links, so the parameters will be known. If someone posts a URL with invalid parameters externally, a spider may pick it up to get into your site, but this link eventually won't be stored anywhere (in other words, it does not exist inside your shop), as search engines index stores using the internal links. There are, however, exceptions.
Reasons why some SEO modules perform blind redirects vary, but among them (apart from attempts at traffic utilization, which we believe are pointless) there may also be a poorly coded script. It is imperative that, when building links on the catalog end of an eShop, the PHP scripts only set up known and good parameters. Use of tep_get_all_get_params() on the catalog end may have devastating side effects for the store. This is because the function collects all parameters passed via $_GET without further checking and treats them as trusted input to generate an encoded part of the URL of the current page. While the same function is a must for the administrator end, where its usage is highly recommended, it should be avoided at all costs on the catalog end and preferably be removed. The reason is that invalid parameters propagate in such a way that a spider may index these invalid links, which in turn may trigger a faulty SEO module to begin redirecting, among other things. Competitors may take advantage of such weaknesses to force your store to show duplicate content. Do not overlook such details.
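One way to set up only known and good parameters is a per-page whitelist. The sketch below is not osCommerce code: build_known_params() and the list of allowed keys are assumptions made for illustration.

```php
<?php
// Hypothetical replacement for blindly copying $_GET on the catalog end:
// only keys listed in $allowedKeys survive, everything else is silently ignored.
function build_known_params(array $get, array $allowedKeys): string
{
    // Keep only whitelisted keys.
    $known = array_intersect_key($get, array_flip($allowedKeys));
    // Drop array-valued input such as page[]=x, which a plain link never needs.
    $known = array_filter($known, 'is_scalar');
    // URL-encode the result; the separator is passed explicitly so php.ini cannot change it.
    return http_build_query($known, '', '&');
}

// Illustrative request: one valid parameter and two injected ones.
$query = build_known_params(
    ['page' => '2', 'invalid1' => '1', 'invalid2' => '2'],
    ['page', 'sort']
);
```

Because unknown parameters are dropped instead of triggering a redirect, an externally posted link with extra parameters produces the same internal links as a clean one.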
Unfortunately, the stock osCommerce framework includes this function in some places on the catalog end, to perform pagination of products as well as to process a few other entities. The page propagation exploit works by posting, on an external site, a link to the exact page that uses tep_get_all_get_params(), with invalid parameters appended (which may very well include an XSS attempt). As a result, the search engine will pick up this first link to get into the eShop, invalid parameters included. The tep_get_all_get_params() call on that first page will then carry the same parameters, including the invalid ones, over to at least one more page. This second page transition now comes from within the store and looks like an internal link to the search engine. Consequently, this information will be stored in the spider's cache and presented as an internal link of your osCommerce store.
When there is a long pagination of products, this exploit can be replicated across several pages. As a result, the store ends up with a number of problems inside the search engine's index and cache. Among other things, the links may include XSS attempts and may be treated as duplicated content (as they are internal and point to the same page with different parameters). Google offers the canonical tag as a workaround for these cases; however, this may be spider dependent and may change over time, so we recommend fixing the code.
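For reference, the canonical tag mentioned above is a link element placed in the page head. The helper below is a hypothetical sketch of how a page could emit it for its clean URL; it is a workaround and does not replace fixing the link generation.

```php
<?php
// Hypothetical helper: renders a canonical link element for the clean URL of a page.
function canonical_link_tag(string $cleanUrl): string
{
    // Escape the URL so it is safe inside an HTML attribute.
    return '<link rel="canonical" href="' . htmlspecialchars($cleanUrl, ENT_QUOTES, 'UTF-8') . '">';
}

$tag = canonical_link_tag('http://example.com/products_new.php?page=2');
```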
In extreme cases this may cause a site to be banned outright by the search engine itself (depending on how well the URL was initially crafted and posted to look like a dangerous one). When a session is present, this approach will also propagate the invalid parameters into the database as part of the navigation history (i.e. when a customer visits the store using the link provided by a spider's search results). At least this function is not used extensively, so with a little effort the problem can be rectified. In addition, some instances of this function assume a logged-in customer or a session to be present for further processing. There are, however, cases open to search engines, and those are the ones of prime concern. Fixing this particular issue will improve both the security and the ranking of an eCommerce store, as well as limit the potential for duplicate content. On top of that, you will not need any SEO hacks that may break the shopping cart.
If you are wondering how to replicate this problem on your eCommerce store, try: http://example.com/products_new.php?invalid1=1&invalid2=2. After you invoke the page, examine the links at the top and bottom of the subsequent pages for the new products. They all include the invalid parameters. In this example the same links will be indexed by search engines, because they show up as internal ones. Results may vary depending on the SEO module you have deployed (if any).
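The propagation seen with the test URL above can be reproduced in isolation. The code below is a simplified stand-in written for this illustration, NOT the osCommerce source of tep_get_all_get_params(); the names naive_all_get_params() and naive_page_link() are hypothetical.

```php
<?php
// Simplified stand-in (not the osCommerce source): copies every query
// parameter, except the excluded keys, into a new query string fragment.
function naive_all_get_params(array $get, array $exclude = []): string
{
    $out = '';
    foreach ($get as $key => $value) {
        if (is_scalar($value) && !in_array($key, $exclude, true)) {
            $out .= rawurlencode((string) $key) . '=' . rawurlencode((string) $value) . '&';
        }
    }
    return $out;
}

// Builds a pagination link the way a page using the function above would.
function naive_page_link(string $script, array $get, int $page): string
{
    return $script . '?' . naive_all_get_params($get, ['page']) . 'page=' . $page;
}

// Parameters taken from the test URL; they end up in the link to page 2.
$link = naive_page_link('products_new.php', ['invalid1' => '1', 'invalid2' => '2'], 2);
```

Every pagination link generated this way repeats the injected parameters, which is why the spider sees them as internal links on each following page.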
Asymmetric Software maintains and deploys an advanced SEO module called SEO-G that generates 100% static links of the store owner's choice. The module maintains its own structure of SEO links in the database and triggers a redirect only in predefined cases, as specified by the site administrator. In addition, the I-Metrics Layer rectifies these cases and avoids invalid parameter propagation, so that a clean index is always present in search engine result pages. The combination of these 2 frameworks greatly increases the security of the store while maintaining the normal operation of all functions of your site.