Wednesday, November 30, 2016

Tuesday, November 1, 2016

Pornhub Bypasses Ad Blockers With WebSockets



*** Links to discussions on Reddit and Hacker News. Also check out BugReplay on Product Hunt :)
(NSFW level: medium)
We tried to find the most PG page on MindGeek’s network to use as an example- it wasn’t easy.

When I was building the prototype for BugReplay, I was evaluating different methods of capturing and analyzing network traffic from Chrome. One of the first things I saw that looked promising was the chrome.webRequest API.
From the docs: “Use the chrome.webRequest API to observe and analyze traffic and to intercept, block, or modify requests in-flight.”
That seemed to be exactly what I needed.
After experimenting with the Chrome webRequest API, I quickly realized there was a big problem. It didn’t allow me to analyze any WebSocket traffic, something I really wanted to support.
As I was searching the web trying to see if I was misreading the documentation or was looking in the wrong spot, I found a relevant bug report from 2012: “chrome.webRequest.onBeforeRequest doesn’t intercept WebSocket requests.” In the bug report, users were complaining that without the ability to block WebSockets, websites could get around ad blockers fairly easily. If WebSocket data was not visible to Chrome extensions via the webRequest API, they could not be blocked without some heavy duty hacks.
Initially, the risks to ad blockers seemed theoretical; the examples of sites that were employing this technique were very obscure. Then in August 2016, an employee of the company that owns Pornhub.com (MindGeek) started arguing against adding the WebSocket blocking capabilities to the Chrome API. Pornhub is the 63rd most visited site on the Internet according to Alexa. I checked out a few of MindGeek’s sites and sure enough, I could see ads coming through even though I had Adblock Plus on. The ads on Pornhub are marked ‘By Traffic Junky,’ which is an ad network owned by MindGeek.
In the screenshot below, you can see a banner at the top of the page announcing that the site is aware that the user is using an Ad Blocker, with an invitation to subscribe to a premium ads free version of the site. On the right side of the page you can see an advertisement.


HOW THEY DO IT
When you visit Pornhub.com, it tries to detect if you have an ad blocker. If it detects one, it opens a WebSocket connection that acts as a backup mechanism for delivering ads.
Watching the BugReplay browser recording, you can see a number of network requests triggered that are blocked by AdBlock: They are marked Failed in the network traffic, and if you click one to inspect the detail pane you can see the failed reason is net::ERR_BLOCKED_BY_CLIENT. That is the error reported by Chrome when an asset is blocked from loading.
You can find the WebSocket frames individually in the network panel or just look at the WebSocket create request which has links to all the individual frames. The name of the domain where the WebSocket connects is “ws://ws.adspayformy.site.” A decent joke aimed at ad blockers :)
When the WebSocket loads, the browser sends a frame with a JSON encoded payload for each of the spots it has available for ads. Checking out one of the WebSocket frames, you can see in the frame data the advertisement data is sent back with:
  1. The zone_id 13, for where the JavaScript should place the ad.
  2. The media_type image, so the page knows what kind of element to create (most of the ads are videos, I picked an image for this post because it was relatively tame).
  3. The Image itself, transmitted base64 encoded so it can be reconstructed using the data uri scheme
  4. An “img_type” (“image/jpeg”) to pass to the data uri.
Ad Blockers primarily work using the webRequest API, so constructing the ad by transmitting the data over the WebSocket as base64 is a pretty clever way of dodging the blocker.

WHAT’S NEXT

On October 25th, 2016, there was some new activity on the Chromium ticket. A contributor wrote a patch adding the ability to block WebSockets using the webRequest api. If it’s accepted, it will eventually wind up in Chrome stable.
When or if that rolls out, the ad blocker extension writers can choose to remove the hacks for users of the latest Chrome, leaving content providers like Pornhub to figure out their next move in the ad blocking war.

UPDATE

Since I started looking into this, AdBlock Plus and uBlock Origin have shipped workarounds to block this technique. AdBlock and others still do not.
For AdBlock Plus, “The wrapper performs a dummy web request before WebSocket messages are sent/received. The extension recognizes these dummy web requests as representing a WebSocket message. It intercepts and blocks them if the corresponding WebSocket message should be blocked. The WebSocket wrapper then allows / blocks the WebSocket message based on whether the dummy web request was blocked or not.” via
For uBlock Origin, they shipped a workaround that has the “ability to foil WebSocket using a CSP directive.”

Monday, October 24, 2016

Demonstrating WordPress exploits: WP Mobile Detector 3.5

Several months ago a very popular WordPress plugin that helps you present a mobile friendly version of your site was found to have a large security hole. The plugin had a bug in it that allowed anyone to upload and execute arbitrary files.
If you go to /wp-content/plugins/3.5/resize.php on a website with the unpatched plugin installed, the script looks for a src query parameter and copies that file path to a directory in its cache. Due to a PHP setting (allow_url_fopen), which is default on, that allows you to specify a URL as a file path, this opens the door to upload an executable PHP file to the server. Exploiting that is trivial as demonstrated in this BugReplay video.
For a little bit of background, WordPress accounts for about 26% of websites on the internet. Plugins for WordPress are bundles of source code that add functionality to your blog/site. They are installed typically by uploading a zip file through the admin panel.
You are essentially relying on third party programmers to keep your website secure when you install plugins (‘It was so simple we didn’t even have to involve the IT department. Bonus!’), and that can be a really bad idea.
If you do run WordPress, always make sure your plugins are up to date, and try to follow best practices on the server for file write permissions to minimize the impact of these kinds of exploits.

Monday, September 19, 2016

Troubleshooting A ReactJS Project With BugReplay



React is frequently written in JSX and then compiled down to JavaScript using something like Babel. Afterwards, the JavaScript is pretty much unrecognizable. This makes for tricky debugging when your console errors point to a line in the compiled js.
When compiling with Babel, you can specify that Babel additionally create source map files (.map), which is an index that allows a parser (like Chrome DevTools) to translate a compiled, combined and/or minified file back to its original state. That way line numbers in console stack traces match up to the actual source of the error in your own code, versus in the compiled JavaScript where the line number is likely different.
For this example I’ve forked ToDoMVC, which is an example ToDo list app written in a handful of different frameworks. I’ve broken the ‘toggle all’ feature, which fires off when clicking the button which toggles all ToDo items either checked or unchecked. I modified the source so it runs off of compiled and minified JavaScript, since the original example translates to JavaScript on the fly via JSXTransformer.
The purposely broken ToDoMVC is available here.
Here is the BugReplay report that reproduces the issue. I click the ‘toggle all’ button and nothing happens. When viewing the report, you can see the problem along with the corresponding JavaScript error at ~5 seconds into the video.
Clicking on the JavaScript error ‘Uncaught ReferenceError: evnt is not defined’ gives you the corresponding stacktrace.
The first item in the stack trace is:





image

The source is listed as ../js/app.jsx, which is the file that our compiled and minified js was built from. You can see this file in the ‘Resources’ tab of DevTools, which is generated by Chrome from the minified js coupled with its corresponding map file.
BugReplay checks the server to see if map files exist when saving console logs, and if so, will use them to generate the correct file-name and line numbers based on the code in the original file.
If I open my local copy of js/app.jsx, the line it is complaining about is:





image

As you can see, the error is occurring on line 58 of the jsx file, where there’s a fairly obvious but very common problem: I’ve referred to the ‘event’ variable as ‘evnt.’
Here is the same issue, but without source map files on the server.
The error itself is still clear, but the stack trace line number is useless due to the JSX compilation and minification.
MostJavaScript frameworks that minify or combine sources include an option for generating source maps, so whether you’re using browser DevTools or BugReplay, they are a huge help for debugging.

Wednesday, September 7, 2016

How To Use Google Cloud's Free Structured Logging Service With Golang

I had a conversation with an early user of BugReplay some months ago, who had a problem with our software.
‘I’m getting “Failed to attach to the network logging of the debugger” when trying to record.’
There’s a brief moment of panic when you hear this kind of thing as a developer. That error message is supposed to be a failsafe when earlier error checking didn’t catch an issue. It’s an error that is not supposed to happen.
Luckily, we had spent a good deal of time at BugReplay setting up logging for specifically this reason. I checked Google’s log console, punched in a few filters - structpayload.EventName=’ExtensionError’ - and I found logs relevant to the problem. I dug into the stack trace and found the message that the browser had reported: “The extensions gallery cannot be scripted” and the url that the error had occured on: https://chrome.google.com/webstore/category/extensions.





image

Google’s logging service is awesome. It’s one of those services that brings me actual joy to use because it solves problems for me. I’ve tracked down root causes of problems, analyzed performance, and seen how users are using (or not using) the web site. You get searchable and readable logs via a pretty good web interface, as well as extensive API access.
The best part of Google’s logging service is the first class structured logging support. The JSON objects you send are fully searchable by field; and Google even analyzes the log entries as they come in so will auto-complete for you: You can type err and if you have logs that define a property that matches err* it will display something like ‘event.metaInfo.errorInfo.errorName’  and you just click it. You can find entries with that field present, or missing, or equal to a specific value. There’s a million other features of the service that I haven’t dug into yet.
Because it’s logging to a central location, we’re able to keep our client side and server side logs in the same place, then view them together or separately, and because we’re logging to a remote server, we are much more careful about what and how much we log. An API call to the service can take 200ms; so even though we execute it asynchronously it is not something we want to be hammering on.
For server side errors, we spawn a goroutine to send over the log entry. For our client side errors, in our catch blocks we JSONify the exception to capture the stack trace and relevant metadata (what url the error occured on, etc) then hit an API endpoint which in turn hits the logging API endpoint.
Since we’ve started using structured logs, we’ve focused more on logging metadata: ids, urls, client ids. Knowing that we can retrieve things by those fields (say, give me any errors that have occurred on this url in the past week) encourages better practice, and discourages leaving extraneous logs (console.log(“lolwtfbbq”) ). Since it is structured we aren’t really crafting log lines in isolation. For most things we log we have an ‘event’ object with a name which indicates what we are actually logging (‘UserCreate’ for example), then we’ll attach metadata to it.
In practice, this ends up looking something like e := events.NewUserCreate(r, user). This leads us to a more consistent form than if we were just doing a log.Printf(“A new user was created from the URL %s, their id is %s, their team id is %s”,url,id,teamID).
The Go library for Google’s logging platform is unfortunately lacking on examples though the  docs for other languages are more complete. Also unfortunate, the Go library is mostly auto-generated so it is not particularly user friendly. I struggled through trying to guess at why it wasn’t working from looking at the web API, looking at assorted stackoverflow questions, gave up and came back to it a few times, then finally cobbled together a working library. The result is clonable here. If you’re not on a mobile device it is also pasted below.
As for the “Failed to attach to the network logging of the debugger” bug, the next version of BugReplay’s extension had Chrome’s extension gallery page added to its preflight check blacklist, so now it correctly shows that the extension is not usable on the current page.
package googlelogdemo
/*
Log entries to Google Cloud Logging service.
// Severity: Severity of log.
// Possible values:
// "DEFAULT"
// "DEBUG"
// "INFO"
// "NOTICE"
// "WARNING"
// "ERROR"
// "CRITICAL"
// "ALERT"
// "EMERGENCY"
*/
import (
"log"
"net/http"
"time"
"golang.org/x/net/context"
"golang.org/x/oauth2"
"golang.org/x/oauth2/google"
"golang.org/x/oauth2/jwt"
logging "google.golang.org/api/logging/v1beta3"
"google.golang.org/appengine"
"google.golang.org/appengine/urlfetch"
)
const (
LevelDefault = "DEFAULT"
LevelDebug = "DEBUG"
LevelInfo = "INFO"
LevelNotice = "NOTICE"
LevelWarning = "WARNING"
LevelError = "ERROR"
LevelCritical = "CRITICAL"
LevelAlert = "ALERT"
LevelEmergency = "EMERGENCY"
)
//CloudLogger will write structured logs to the cloud.
type CloudLogger struct {
service *logging.ProjectsService
logEntryService *logging.ProjectsLogsEntriesService
projectID string
logsID string
}
func New(serviceAccount, privateKey, projectID, logsID string) (*CloudLogger, error) {
oauthConf := &jwt.Config{
Email: serviceAccount,
PrivateKey: []byte(privateKey),
Scopes: []string{logging.LoggingWriteScope},
TokenURL: google.JWTTokenURL,
}
oauthHTTPClient := oauthConf.Client(oauth2.NoContext)
loggingService, err := logging.New(oauthHTTPClient)
if err != nil {
return nil, err
}
s := logging.NewProjectsService(loggingService)
return &CloudLogger{
service: s,
projectID: projectID,
logsID: logsID,
logEntryService: s.Logs.Entries,
}, nil
}
func NewAppEngineLogger(r *http.Request, projectID, logsID string) (*CloudLogger, error) {
ctx, _ := context.WithTimeout(appengine.NewContext(r), time.Second*5)
//https://godoc.org/golang.org/x/oauth2/google#example-AppEngineTokenSource
loggingService, err := logging.New(&http.Client{
Transport: &oauth2.Transport{
Source: google.AppEngineTokenSource(ctx, logging.LoggingWriteScope),
Base: &urlfetch.Transport{
Context: ctx,
},
},
})
if err != nil {
return nil, err
}
s := logging.NewProjectsService(loggingService)
return &CloudLogger{
service: s,
projectID: projectID,
logsID: logsID,
logEntryService: s.Logs.Entries,
}, nil
}
func (cl *CloudLogger) WriteLogEntry(severity string, e interface{}) error {
le := &logging.LogEntry{
StructPayload: e,
Metadata: &logging.LogEntryMetadata{
ServiceName: "compute.googleapis.com",
//Severity can be set to other levels once we need them.
Severity: severity,
},
}
req := &logging.WriteLogEntriesRequest{
Entries: []*logging.LogEntry{le},
}
call := cl.logEntryService.Write(cl.projectID, cl.logsID, req)
_, err := call.Do()
if err != nil {
log.Println("Got an error trying to write logEntry", err)
return err
}
return nil
}
view rawgooglelogdemo.go hosted with ❤ by GitHub

NOTES:

  • The repo here (or docs here) have example usage for GCE and Appengine. It will also work from outside of Google’s cloud services as long as you have an account.
  • The logsID is whatever you want to call your log, google will add it to a drop down on the page the first time you use it.