Jun 29 2009

Fun with other people's websites (aka Baconification of the web)

Posted by Joe Steinbring at 12:01 AM
2 comments
- Categories: ColdFusion

I recently needed to fetch webpages that are not on my server and redisplay them on one of my sites, on demand.  With cfhttp, that's an easy task, but if the page you're redisplaying uses relative links, it's not going to look right: the browser resolves its images, stylesheets, and links against your domain instead of the original one.  So, how do you clean up the page before redisplaying it?

I handled it with a series of string replacements.  You can check out the full CFC here:

 

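<cffunction name="domainfromurl" access="public" returntype="string">
    <cfargument name="url" type="string" required="yes">
    <!--- Function-local variables (var), so concurrent calls cannot overwrite each other --->
    <cfset var start = 1>
    <cfset var slashPos = 0>
    <!--- The host starts right after "//" (or at position 1 if there is no scheme) --->
    <cfif find("//", arguments.url)>
        <cfset start = find("//", arguments.url) + 2>
    </cfif>
    <!--- Position of the first "/" after the host, or 0 if the URL has no path --->
    <cfset slashPos = find("/", arguments.url, start)>
    <cfif slashPos eq 0>
        <!--- No path: the host runs to the end of the string --->
        <cfreturn mid(arguments.url, start, len(arguments.url))>
    </cfif>
    <cfreturn mid(arguments.url, start, slashPos - start)>
</cffunction>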
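
As a quick check of the index arithmetic, here is a small sketch of calling the function from a page. It assumes the component is saved as proxy.cfc (the name the cfinvoke call in fixHTML refers to) in the same directory as the calling page.

```cfml
<!--- Assumption: proxy.cfc lives next to this page --->
<cfset proxy = createObject("component", "proxy")>

<!--- "//" is found at position 6, so the host starts at position 8.
      The next "/" is at position 22, so the host is 22 - 8 = 14 characters long. --->
<cfoutput>#proxy.domainfromurl("http://steinbring.net/blog/")#</cfoutput>
<!--- outputs: steinbring.net --->
```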

<cffunction name="getHTML" access="public" returntype="string">
   <cfargument name="url" type="string" required="yes">
   <!--- Keep the response in a function-local variable instead of the shared cfhttp variable --->
   <cfset var response = "">
   <cfhttp url="#arguments.url#" timeout="5" result="response" useragent="proxy.cfc: A proxy server CFC by Joe Steinbring (http://steinbring.net)">
   <cfreturn response.FileContent>
</cffunction>

<cffunction name="fixHTML" access="public" returntype="string">
    <cfargument name="html" type="string" required="yes">
    <cfargument name="url" type="string" required="yes">
    <cfargument name="proxyify" type="string" required="yes">

    <!--- Function-local variables, so concurrent requests cannot overwrite each other --->
    <cfset var domain = "">
    <cfset var whattolookfor = "">
    <cfset var replacewith = "">
    <cfset var loop_index = 0>

    <cfinvoke
        component = "proxy"
        method = "domainfromurl"
        returnVariable = "domain">
        <cfinvokeargument name="url" value="#arguments.url#">
    </cfinvoke>

    <!---
        When you specify that you want to "proxyify" a page, references to webpages
        that are on the original server are changed to point at proxied pages.
    --->
    <cfif arguments.proxyify neq 'no'>
        <cfset whattolookfor = [
            'a href="http',
            'a href="/',
            'a onClick="javascript:" href="'
        ]>
        <cfset replacewith = [
            'a href="?#arguments.proxyify#=http',
            'a href="?#arguments.proxyify#=http://#domain#/',
            'a onClick="javascript:" href="?#arguments.proxyify#='
        ]>
        <cfloop from="1" to="#arraylen(whattolookfor)#" index="loop_index">
            <cfset arguments.html = replacenocase(arguments.html, whattolookfor[loop_index], replacewith[loop_index], 'all')>
        </cfloop>
    </cfif>

    <!--- Point relative references (images, stylesheets, forms) at the original domain --->
    <cfset whattolookfor = [
        'src="/',
        'src="images/',
        'href="/',
        'href="css/',
        'action="/',
        ': url(/',
        '@import "/',
        ' href="style.css',
        'img src="logo.'
    ]>
    <cfset replacewith = [
        'src="http://#domain#/',
        'src="http://#domain#/images/',
        'href="http://#domain#/',
        'href="http://#domain#/css/',
        'action="http://#domain#/',
        ': url(http://#domain#/',
        '@import "http://#domain#/',
        ' href="http://#domain#/style.css',
        'img src="http://#domain#/logo.'
    ]>
    <cfloop from="1" to="#arraylen(whattolookfor)#" index="loop_index">
        <cfset arguments.html = replacenocase(arguments.html, whattolookfor[loop_index], replacewith[loop_index], 'all')>
    </cfloop>

    <cfreturn arguments.html>
</cffunction>
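
Putting the pieces together, a page that fetches and redisplays a site might look roughly like this. It's a minimal sketch: the component name proxy.cfc, the query-string parameter name "page", and the default URL are all assumptions made for the example.

```cfml
<!--- Assumption: the page to proxy arrives as ?page=http://... --->
<cfparam name="url.page" default="http://steinbring.net/">

<!--- Assumption: proxy.cfc lives next to this page --->
<cfset proxy = createObject("component", "proxy")>

<!--- Fetch the remote page, then rewrite its relative references.
      The third argument is the parameter name used for proxied links. --->
<cfset rawHTML = proxy.getHTML(url.page)>
<cfset cleanHTML = proxy.fixHTML(rawHTML, url.page, "page")>

<cfoutput>#cleanHTML#</cfoutput>
```

Passing 'no' as the third argument skips the link rewriting, so links on the redisplayed page go straight to the original site instead of back through the proxy.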

 

 

So, you might be wondering what practical applications there are for this.  Well, I came up with a good one.  I wrote a utility that takes the HTML from a site of your choice and redisplays it with random words replaced with the word 'bacon'.  Think of it like Mad Libs, but with just bacon.
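
The code for the utility itself isn't shown here, so the following is only a guess at one way the word swap could work, not the actual implementation. The function name baconify and the odds argument are made up for this sketch. It walks the HTML one character at a time, leaves everything between angle brackets alone, and swaps roughly one word in every "odds" for 'bacon'. It is deliberately naive: text inside script and style blocks and entity names such as &amp; will get baconified too.

```cfml
<cffunction name="baconify" access="public" returntype="string">
    <cfargument name="html" type="string" required="yes">
    <!--- Hypothetical knob: roughly one word in "odds" gets replaced --->
    <cfargument name="odds" type="numeric" required="no" default="5">

    <cfset var result = "">
    <cfset var word = "">
    <cfset var ch = "">
    <cfset var inTag = false>
    <cfset var i = 0>

    <!--- Loop one position past the end so a trailing word still gets flushed --->
    <cfloop from="1" to="#len(arguments.html) + 1#" index="i">
        <cfset ch = "">
        <cfif i lte len(arguments.html)>
            <cfset ch = mid(arguments.html, i, 1)>
        </cfif>

        <cfif not inTag and reFind("[A-Za-z]", ch)>
            <!--- Still inside a word in the visible text: keep collecting letters --->
            <cfset word = word & ch>
        <cfelse>
            <!--- Word boundary: emit the pending word, or 'bacon' in its place --->
            <cfif len(word) and randRange(1, arguments.odds) eq 1>
                <cfset result = result & "bacon">
            <cfelse>
                <cfset result = result & word>
            </cfif>
            <cfset word = "">
            <cfset result = result & ch>

            <!--- chr(60) is the opening angle bracket, chr(62) the closing one --->
            <cfif ch eq chr(60)>
                <cfset inTag = true>
            <cfelseif ch eq chr(62)>
                <cfset inTag = false>
            </cfif>
        </cfif>
    </cfloop>

    <cfreturn result>
</cffunction>
```

Run the output of fixHTML through a function like this before displaying it, and the page keeps its layout while the copy turns to bacon.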

Original Site

Baconified Site

Comments

Jay wrote on 09/02/09 8:08 PM

a great place for information and advice about a wide range of topics.
I am from Mauritius and too poorly know English, give true I wrote the following sentence: "Similar blues knew hard theater and fourth magazine to reach filmmaking an dyed contaminant, imparted kuffi - September 11, 2009, 2:58 AM Another government to dramatically becoming down a society is to currently depict the book with the pop at the practical wood leathers on the bailey, which uses many psychedelic costs."

With respect :-), Jay.

Ink Cartridge Recycling wrote on 11/09/09 10:37 AM

I added your post to my college Report

