Andrew McSherry's blog about tech-related stuff that really needs to be updated more often :)
Saturday, February 23, 2013
Bad App Reviews Now Has iOS Apps
Bad App Reviews now has iOS apps. We've got about 90k of them listed, but we're still filling in reviews. About 5k apps have reviews right now, and we're adding them at a rate of 3k apps/day. You can see them under the search or index, right alongside their Android counterparts.
Mashable's HTML Intro
Noticed this ASCII art at the top of Mashable's HTML today. Seems it gets sent for every page on their site.
<!--
o o o + o
+ + + o + +
+
o + + o + + +
__ __ _ _ _
~_,-| \/ | __ _ ___| |__ __ _| |__ | | ___
| |\/| |/ _` / __| '_ \ / _` | '_ \| |/ _ \,-~_,- - - ,
~_,-| | | | (_| \__ \ | | | (_| | |_) | | __/ | /\_/\
|_| |_|\__,_|___/_| |_|\__,_|_.__/|_|\___| ~=|__( ^ .^)
~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,-~_,"" ""
o o o + o
+ + + o + +
+
o + + o + + +
-->
Tuesday, February 19, 2013
Scraping the Web Without a Proxy on Heroku
403 Forbidden: one of the biggest issues when scraping websites. Eventually, after you bombard any reasonably intelligent site with hundreds of requests per minute, it's going to cut you off for a period of time, if not ban you outright. The common workaround has been to get a list of proxies and rotate your requests through them. Thus, your traffic appears to come from different places and is less noticeable. However, there are a few issues with this.
Proxies are slow
The nature of using a proxy should at least double your latency. Instead of going from A to B, you need to go from A to C to B. Furthermore, you're not likely the only one using it. Most public proxies get swarmed with requests, and this adds bandwidth issues into the mix.
Proxies only accept certain requests
Most public proxies only accept GET requests, and may limit the domains you can access for a variety of reasons. This isn't the case with all of them, but it could easily be an issue.
Proxies expire
When using proxy servers, you'll need to keep a constantly updated list of available servers. They go down without notice and new servers surface all the time.
A Better Solution
We can get around these issues by using Heroku Scheduler. The beauty of Heroku is that each dyno has a different IP address. They're distributed around Amazon Web Services, which has hundreds of thousands, if not millions, of IP addresses. Every time you spin up a new dyno, you get a new IP address.
Another advantage is that Heroku prorates to the second. It doesn't matter how many dynos you spin up, just how long they stay alive. I've found it usually takes a Rails dyno about 10 seconds to start up, which is a pretty small penalty since you can usually run one for a few minutes before being blocked. You'll easily make up the cost by not killing time in proxies.
To take full advantage of this, write your scripts to fail fast. After a few unsuccessful requests, kill the dyno. Then set up your scheduling to run constantly. There's a minimum time interval of 10 minutes for the scheduler, but you can set up multiples of 10 minutes. This way, you'll actually be able to run through thousands of different IP addresses a day without fear of getting cut off.
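Here's a rough sketch of what "fail fast" could look like in a Node script. This is only an illustration, not the code I actually run: createFailFastTracker, runJob, and fetchStatus are hypothetical names, the threshold of 3 is an assumption, and counting consecutive (rather than total) failures is a choice I made for the example.

```javascript
// Tracks consecutive unsuccessful responses so a scraping job can give up
// quickly. maxFailures is an assumed threshold; tune it per target site.
function createFailFastTracker(maxFailures) {
  let consecutiveFailures = 0;
  return {
    // Record one HTTP status code; returns true when the job should stop.
    record(status) {
      const ok = status >= 200 && status < 300;
      consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
      return consecutiveFailures >= maxFailures;
    },
  };
}

// Hypothetical job loop. fetchStatus is any function that takes a URL and
// resolves to an HTTP status code (for example, a thin wrapper around fetch).
async function runJob(urls, fetchStatus, maxFailures = 3) {
  const tracker = createFailFastTracker(maxFailures);
  let fetched = 0;
  for (const url of urls) {
    const status = await fetchStatus(url);
    if (tracker.record(status)) {
      // Probably blocked: stop here so the caller can end the process.
      return { fetched, blocked: true };
    }
    if (status >= 200 && status < 300) fetched += 1;
  }
  return { fetched, blocked: false };
}
```

When runJob reports blocked, the script can simply exit (for instance with process.exit(1)), which should end a one-off dyno so the next scheduled run starts on a fresh one.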
Sunday, February 17, 2013
How to Track Pinterest's Pinmarklets with Google Analytics
Pinterest is a great platform for your users to spread the word about your website. In a short period of time, they've managed to become one of the top 50 most trafficked websites. Just throw their Pinmarklet up, build a URL with an image and a link, and you're good to go. However, it's also a black hole when it comes to tracking through any analytics platform. You don't know how many times your users have shared to Pinterest, and you don't know how much traffic is coming back from these shares. For this article, we're going to focus on Google Analytics, but the same strategy could very well be used for any analytics platform around.
Tracking outgoing Pins
The first step is to be able to track outgoing pins. With the current Pin It button, an onclick handler attached to it won't get called. However, we can build an identical-looking link that even has the count bubble next to it. All you need to do for this script is replace your_url with the page's current URL and your_bookmarklet_url with the href on your current button.
<a class="pinterest-button" target="_blank" onclick="_gaq.push(['_trackEvent', 'Pinmarklet', 'Pinned']);window.open(this.href,'_blank','status=no,resizable=yes,scrollbars=yes,personalbar=no,directories=no,location=no,toolbar=no,menubar=no,width=632,height=270,left=0,top=0');return false;" href="your_bookmarklet_url">Pin It<span class="pinterest-count"><i></i><span></span></span></a>
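<script>
// JSONP callback invoked by the count.json request below.
// Assumes jQuery is already loaded on the page.
function setPinterestCount(response) {
  // Only show the count bubble when the page has at least one pin.
  if (response["count"] != '0') {
    $('.pinterest-count').show();
    $('.pinterest-count > span').text(response["count"]);
  }
}
</script>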
<script type="text/javascript" src="//partners-api.pinterest.com/v1/urls/count.json?url=your_url&ref=your_url&callback=setPinterestCount"></script>
<style type="text/css">
.pinterest-button {
position: absolute;
background: url('http://assets.pinterest.com/images/pinit6.png');
color: #CD1F1F;
top:-11px;
height: 20px;
width: 43px;
background-position: 0 -7px;
}
.pinterest-count {
display:none;
padding: 0 3px 0 10px;
background-size: 45px 20px;
background-position: 2px 0;
position: absolute;
top: 0;
left: 41px;
height: 20px;
font: 10px Arial, Helvetica, sans-serif;
line-height: 20px;
background-color: transparent;
background-repeat: no-repeat;
background-image: url(http://passets.pinterest.com/images/pidgets/fpb1.png);
color: #777;
text-align: center;
}
.pinterest-count i {
background-color: transparent;
background-repeat: no-repeat;
background-image: url(http://passets.pinterest.com/images/pidgets/fpb1.png);
background-position: 100% 0;
position: absolute;
top: 0;
right: -2px;
height: 20px;
width: 2px;
}
</style>
Tracking Incoming Traffic from Your Pins
Unfortunately, Pinterest has decided to strip campaign parameters from all posts. So if you post the link http://blog.andymcsherry.com/page_path?utm_campaign=pin_it_button&utm_source=me&utm_medium=pinterest, the link on Pinterest will be http://blog.andymcsherry.com/page_path. We can get around this by specifying a page path reserved for Pinterest. Simply share http://blog.andymcsherry.com/pinterest/page_path and set up your server to redirect all requests from /pinterest/page_path to /page_path?your_campaign_parameters. Then you'll be able to track this incoming traffic. While it'd generally be a good idea to do a 301 redirect in this circumstance, Pinterest uses rel=nofollow, so you don't need to worry about losing PageRank from these links. This method can also be used for referral links heading to Pinterest. Pinterest strips out known referral tags when users post to Pinterest (they actually used to append their own). If you redirect through your site in some way, you can ensure that these tags get added.
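To make the redirect concrete, here's a minimal sketch of the path mapping in JavaScript. The /pinterest prefix and the campaign values come from the example URLs above; the function name pinterestRedirectTarget is made up for illustration, and how you hook it into your server (Rails routes, Express middleware, an nginx rewrite) is up to you. It ignores any query string already on the incoming path.

```javascript
// Path prefix reserved for links shared to Pinterest (from the example above).
const PINTEREST_PREFIX = "/pinterest";

// Campaign parameters taken from the example URL in this post.
const CAMPAIGN_PARAMS = {
  utm_campaign: "pin_it_button",
  utm_source: "me",
  utm_medium: "pinterest",
};

// Returns the redirect target for a request path, or null when the path is
// not under the reserved prefix and should be served normally.
function pinterestRedirectTarget(path) {
  if (!path.startsWith(PINTEREST_PREFIX + "/")) {
    return null;
  }
  // Drop the prefix: "/pinterest/page_path" becomes "/page_path".
  const realPath = path.slice(PINTEREST_PREFIX.length);
  const query = new URLSearchParams(CAMPAIGN_PARAMS).toString();
  return realPath + "?" + query;
}
```

Your server would call this for each request and, when the result isn't null, respond with a redirect to the returned path.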
Labels: analytics, campaign tracking, google analytics, pinterest, web
Thursday, February 7, 2013
Not Selected Index Stats Have Disappeared from Webmaster Tools
It appears that Google has removed the "Not Selected" stats from the Index Status report in Google Webmaster Tools today. This option was under the Advanced tab of Index Status. It was an extremely useful metric for determining how much of your content Google considered valuable. I was always hoping that they'd list the pages that weren't selected so webmasters could better gauge how to improve their content. You'll still be able to retrieve a count of the removed URLs, which can be one warning sign, but it'd be useful to also see pages that never made the cut in the first place. Here's how it appeared before and after the change (these are not for my sites):
I haven't been able to find an official word from Google on the matter, but the consensus in the Webmaster Tools product forums is that it was removed because it was confusing to users. However, this rationale doesn't seem valid, especially since the option was under the Advanced tab. I believe the real reason is to prevent users from testing what can be indexed and what can't. Google doesn't want you to be able to test which SEO tricks you can pull to get indexed. With this data published, it could be too easy to test whether content was being flagged and expose flaws in their algorithms.
Why Your Android App Won't Port to BlackBerry 10
I've read quite a few articles recently about how simple it will be to port Android applications to BlackBerry 10. It's been hailed as the cure for the meager app offering BlackBerry will have on its new platform. Naturally, I investigated this, as such a simple port could open up a new platform for many of the apps I've already written. However, I soon discovered the list of unsupported APIs, and my whole plan was crushed. Even if your app can still function, chances are its feature set will be severely limited by these restrictions if it does anything interesting. Here are some of the biggest sticking points:
Potential Deal Breakers
These are some of the limitations that would likely make your app fundamentally useless.
- Services cannot run in the background. Once your app leaves the foreground, all background services are killed. Your app cannot play music, download data, schedule alarms, monitor location, or do any of the many other things that require working in the background.
- There is NO support for Bluetooth. None. If your app requires Bluetooth, give up now. If you have a feature that requires Bluetooth, you need to find a way to disable it. Same goes for NFC, but that's likely less of an issue for most developers.
- Intent filters for ACTION_SEND and ACTION_VIEW from outside your app are disabled. If your application allows users to view or share images, text, files, URLs or any other data from other applications, your users will have to open your app first, and you'll have to provide a mechanism for them to import it from inside your application.
- The NDK is not supported. Game over for many OpenGL apps that use C++. However, there is a native SDK for BlackBerry 10, so porting an Android application may not have been the best option anyway. There are, however, many other uses of the NDK beyond games, and those apps will not be able to port either.
- No live wallpapers, widgets, home screens, or lock screens.
Alternative Implementations
If you made it through the first list, congrats! Chances are you'll be capable of porting your application. However, there's a significant chance you'll have to rewrite substantial parts of your applications. Most of these are related to the absence of Google Play Services.
- Notifications will be limited to one line of text. If you've built fancy, interactive notifications, they'll need to be reduced down to the bare minimum. This will likely eliminate some functionality and cause maintenance headaches.
- BlackBerry will use a different push notification service than Google's. Not only will this require a different implementation in your app, but it'll require server-side support as well.
- In-app purchases will go through BlackBerry App World, so you'll have to create an alternative implementation for interacting with it.
- If your application uses Maps, you'll have to use an alternative web-based Google Maps API for displaying them. In addition to having to redo your work, it appears to be a more limited API with a poorer user experience.
- Google account authentication through Google Play Services will not be available, so you'll have to create an alternative route to obtain OAuth tokens.
Other Sources of Frustration
- Since Google Play services will not be available on the device, you won't be able to use the Android Backup services. If you need to remotely persist user preferences, there's currently no substitute so you'll have to create your own service.
- In-app +1 buttons will not be available.
- Support for all the accessibility APIs is missing so any improvements you've made for the deaf or blind will be unavailable.
- Your application cannot add or modify the user's contacts so any improvements will be limited to use inside your app.
- You cannot set Thread priority. Your background task that sends Analytics is going to have the same priority as the UI thread.
There are dozens of other unsupported APIs and features, but these are likely the most difficult hurdles to get across. Furthermore, I noticed that the documentation sometimes leaves a gap between what's stated as supported and what's stated as unsupported. The permissions list stood out the most here. There may be more undocumented cases you'll come across, so I'd be glad to hear feedback from any developers who discover more.