Linky: Scraping LinkedIn

Monday, July 1, 2019

Introduction

Linky is a tool to aid in user enumeration. It works by querying one of two URLs, depending on whether a keyword is supplied:

url='https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v->PEOPLE,facetCurrentCompany->%s)&origin=OTHER&q=guided&start=0' % company_id

url = "https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v->PEOPLE,facetCurrentCompany->%s)&keywords=%s&origin=OTHER&q=guided&start=0" % (company_id,keyword)

By setting facetCurrentCompany to a company ID, the API returns data for people whose current workplace is set to that company. To filter by keyword, add the &keywords parameter to the query.

A full example JSON output can be found here.
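The two URLs differ only in the keywords parameter, so they can be built by one helper. This is a minimal sketch rather than Linky's actual code: the function name build_search_url is hypothetical, and URL-encoding the keyword is my addition (the original format strings insert it as-is).

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://www.linkedin.com/voyager/api/search/cluster"


def build_search_url(company_id: str, keyword: Optional[str] = None,
                     start: int = 0, count: int = 40) -> str:
    """Build the people-search URL, with or without a keyword filter."""
    guides = "List(v->PEOPLE,facetCurrentCompany->%s)" % company_id
    params = ["count=%d" % count, "guides=%s" % guides]
    if keyword:
        # Assumption: encode the keyword so spaces survive in the query string.
        params.append("keywords=%s" % quote(keyword))
    params += ["origin=OTHER", "q=guided", "start=%d" % start]
    return BASE + "?" + "&".join(params)


if __name__ == "__main__":
    print(build_search_url("1441"))
    print(build_search_url("1441", "software engineer"))
```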

Annoyingly, this method can only obtain 1000 results per search. That’s why the keywords feature is so important: each keyword returns a different slice of up to 1000 results.
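To make the limit concrete, here is a rough sketch of the offsets one search could walk through. It assumes that results are paged by increasing the start parameter in steps of count (40 in the URLs above); the post does not confirm that, and page_offsets is a hypothetical helper.

```python
def page_offsets(total_results: int, page_size: int = 40, cap: int = 1000) -> list:
    """Return the start offsets worth requesting for a single search.

    Assumption: paging is done through the start parameter, and nothing
    beyond the first `cap` results is reachable.
    """
    reachable = min(total_results, cap)
    return list(range(0, reachable, page_size))


if __name__ == "__main__":
    offsets = page_offsets(5000)
    print(len(offsets), "requests, last start offset:", offsets[-1])
```

With 40 results per page, a capped search works out to 25 requests, however many employees the company really has.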

Obtaining your cookie

To use this tool, LinkedIn authentication is required. For us, this is done via the li_at cookie, which can be found among LinkedIn’s cookies in your browser:

Save the cookie value to a file and pass that file to Linky with --cookie.
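As an illustration of what happens with that file, the sketch below reads it and builds a cookie dictionary. It assumes the file holds just the li_at value (optionally prefixed with li_at=); the exact format Linky expects is not stated here, and both function names are hypothetical.

```python
from pathlib import Path


def parse_li_at(raw: str) -> dict:
    """Turn the saved cookie text into a {'li_at': value} dictionary."""
    value = raw.strip()
    if value.startswith("li_at="):
        # Tolerate a 'li_at=...' prefix copied from the browser.
        value = value[len("li_at="):].strip()
    value = value.strip('"')
    if not value:
        raise ValueError("no li_at value found")
    return {"li_at": value}


def load_cookie_file(path: str) -> dict:
    """Read the cookie file and parse its contents."""
    return parse_li_at(Path(path).read_text(encoding="utf-8"))
```

A dictionary like this can be handed to the requests library through its cookies= argument.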

Obtaining the Company ID

The company ID is an integer that represents the company. If you were to look up Google, you’d see they have several:

https://www.linkedin.com/search/results/people/?facetCurrentCompany=%5B%221441%22%2C%22791962%22%2C%222374003%22%2C%2218950635%22%2C%2216140%22%2C%2210440912%22%5D

The company IDs are:

  1. 1441
  2. 791962
  3. 2374003
  4. 18950635
  5. 16140
  6. 10440912

One of these values can be passed into linky via --company-id.
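Reading the IDs out of the URL by hand is error-prone, so a short sketch for pulling them from the facetCurrentCompany parameter may help. The parameter is a URL-encoded JSON list of strings; company_ids_from_url is a hypothetical helper, not part of Linky.

```python
import json
from urllib.parse import parse_qs, urlparse


def company_ids_from_url(url: str) -> list:
    """Extract the company IDs from a LinkedIn people-search URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs percent-decodes the value, leaving a JSON list such as ["1441", ...].
    raw = query.get("facetCurrentCompany", ["[]"])[0]
    return [str(company_id) for company_id in json.loads(raw)]


if __name__ == "__main__":
    url = ("https://www.linkedin.com/search/results/people/"
           "?facetCurrentCompany=%5B%221441%22%2C%22791962%22%5D")
    print(company_ids_from_url(url))
```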

To find the company ID, search for the company on LinkedIn:

You’ll land on their corporate profile. From here, click See all x employees on LinkedIn. The URL of the resulting search contains the facetCurrentCompany value:

Identifying email schemes

This is the most awkward part: identifying the email scheme the client uses. In the vast majority of cases, it’s probably firstname.surname. Linky currently has no logic to detect the scheme, so this has to be done manually.
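Once the scheme is known, turning a scraped name into an address is a simple substitution. The sketch below is illustrative only: it assumes the scheme string uses the tokens firstname and surname, as in 'firstname.surname', and format_email is a hypothetical function rather than how Linky implements --format.

```python
import re


def format_email(first: str, last: str, domain: str,
                 scheme: str = "firstname.surname") -> str:
    """Apply an email scheme such as 'firstname.surname' to one person."""
    parts = {"firstname": first.strip().lower(), "surname": last.strip().lower()}
    # Replace both tokens in a single pass so a substituted name is never re-scanned.
    local = re.sub(r"firstname|surname", lambda m: parts[m.group(0)], scheme)
    return "%s@%s" % (local, domain)


if __name__ == "__main__":
    print(format_email("Ada", "Lovelace", "example.com"))
```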

Running Linky

This is easy enough, so here are some example commands:

Getting Employees

python3 linky --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'

Using keywords

python3 linky --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer

Making sense of it all

If the client has more than 1000 employees, the recommendation is to run Linky normally, open the HTML output, and scroll down to the table at the bottom. It will look like this:

In this instance, Google’s first 1000 results contained 85 Software Engineers. This is a good indication that it is worth running Linky again with --keyword 'software engineer', and then repeating the process with the next most common titles.
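The same tally can be done outside the HTML report. As a small sketch, assuming the job titles from a first run are available as a list of strings (top_titles is a hypothetical helper), the most frequent titles make good keyword candidates:

```python
from collections import Counter


def top_titles(titles, n=5):
    """Count job titles case-insensitively and return the n most common."""
    counts = Counter(t.strip().lower() for t in titles if t and t.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["Software Engineer", "software engineer ", "Recruiter"]
    for title, count in top_titles(sample):
        print(count, title)
```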

Super short post, but it’s for people who have never used tools like this, and it serves as a mediocre introduction to Linky.

It’s also worth noting that I’m currently working on bypassing the 1000 limit and on using the LinkedIn URLs to extract data from individuals’ bios. For example, if an individual lists a bunch of Ubuntu versions or OSes they use in their current role, then regex them out.

Update #1

05-09-2019

Since the initial release, I’ve added a bunch more stuff. There is one thing I imagine people will want to know about, so I’d like to explain it: the validation aspect.

Linky is able to validate users via the Hunter API and, more importantly, the Office 365 bug.

As explained by grimhacker:

HTTP Response   Description
200             Valid Username and Password without 2FA
401             Valid Username
403             Valid Username and Password with 2FA
404             Invalid Username
Put simply, sending a request to https://outlook.office365.com/Microsoft-Server-ActiveSync returns a response code that gives away whether the username is valid.
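The table translates directly into a lookup. This sketch only interprets a status code and makes no request itself; the names ACTIVESYNC_MEANINGS and username_exists are hypothetical and are not taken from Linky.

```python
# Status code meanings, as listed in the table above.
ACTIVESYNC_MEANINGS = {
    200: "Valid Username and Password without 2FA",
    401: "Valid Username",
    403: "Valid Username and Password with 2FA",
    404: "Invalid Username",
}


def username_exists(status_code: int):
    """Return True or False for a known code, or None if it is inconclusive."""
    if status_code in (200, 401, 403):
        return True
    if status_code == 404:
        return False
    return None


if __name__ == "__main__":
    for code in (200, 401, 403, 404, 500):
        print(code, username_exists(code), ACTIVESYNC_MEANINGS.get(code, "Unknown"))
```

Returning None for anything outside the table keeps unexpected responses, such as server errors, from being counted as either valid or invalid.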

Here is my code for it.

Get the project!

Update #2

28-11-2019

After the previous bug was patched, a new one turned up, written up here by raikiasec.

This issue works in a similar way. However, it does not submit passwords to the service. The response is taken from:

requests.get('https://outlook.office365.com/autodiscover/autodiscover.json/v1.0/{}@{}?Protocol=Autodiscoverv1'.format(junk_user, domain), headers=headers, verify=args.nossl, allow_redirects=False, proxies=proxies)

The following code then decides whether the email address is valid based on the response code:

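# 200: the address is valid.
if r.status_code == 200:
    print("VALID: ", email)
    if args.output is not None:
        print_queue.put(email)
# 302: valid only if the domain is on Office 365 and the response body
# does not mention outlook.office365.com.
elif r.status_code == 302:
    if domain_is_o365[domain] and 'outlook.office365.com' not in r.text:
        print("VALID: ", email)
        if args.output is not None:
            print_queue.put(email)
# Anything else is invalid, and only reported when args.verbose is set.
else:
    if args.verbose:
        print("INVALID: ", email)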

For a full explanation of this bug, visit this post written by raikiasec.
