Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to Diagnose "Crawled - Currently Not Indexed" in Google Search Console
The new Google Search Console gives a ton of information about which pages were excluded and why, but one that I'm struggling with is "crawled - currently not indexed". I have some clients that have fallen into this pit and I've identified one reason why it's occurring on some of them - they have multiple websites covering the same information (local businesses) - but others I'm completely flummoxed. Does anyone have any experience figuring this one out?
Reporting & Analytics | | brettmandoes2 -
Differences in site search revenue in GA
I just put in a piece of software to replace a really bad built in site search engine on my 3dcart website. Now I am trying to measure the change, but I am having some issues. When I check the ecom data in the conversions section of GA with the built in segment Performed Site Search, I get promising results. Approximately 5% revenue increase over LY. But if we jump to behavior, site search, usage, and then check the visits with site search, I get a decrease by 4%. And the actual revenue is off, by like double (150k compared to 80k) Anyone have any idea why I am getting these results? The site search function is set up. Tracking is enabled, query parameter is keyword and search url is /search.asp?keyword=
Reporting & Analytics | | ShockoeCommerce0 -
How to interpret search "clicks" in GA?
I am trying to make some sense of the data in the "Landing Pages" report under "Search Engine Optimization" in GA versus the data under the "Landing Pages" report in "Behavior | Site Content". For example, the SEO report says my page http://www.asiantraveltips.com/blog/bangkok-skytrain-bts-mrt-lines/ received 22,000 search impressions in the past 30 days and 900 "clicks" (12.42%). What are these "clicks" when the Content "Landing Pages" report says the same page only got 382 "sessions" in the same period??? What are these "clicks" if not clicks on the search results link that should be reflected in landing page sessions on the corresponding page?
Reporting & Analytics | | Gavin.Atkinson0 -
Google Analytics Tracking Code Queries
Hello, I have taken on a new client who has Google Analytics installed. The tracking code is set to 'single domain'. Recently they added a mobile site using a sub-domain (m.website.com) which means that Google Analytics is not picking up this traffic. I want to revise the account so that I have a master account (raw data) and then profiles for the mobile site, main domain (www.website.com) and one other for a sub-domain that they are using. I am aware that there is mobile specific tracking code however I thought it would be easier (re conversions/goals/eCommerce tracking) to not use this and by changing the account to 'multiple domains' we could also get data for another sub-domain that they are using . My questions are: Am I right to want to use individual profiles over web properties. If not please explain why. When installing the tracking code (where the profile number is changing) I believe that I need to add that code with the changing profile number to the sub-domain sections. So my question is a) is that correct, and b) if I use a profile number on a sub-domain section will the master account still gather the data for the main URL as well as all sub-domains. If I change the master account from using 'single domain' tracking code to 'multiple domain' tracking code will this affect historical data? Will I lose the data? When changing from 'single domain' tracking to 'multiple domain' tracking does this affect eCommerce tracking? Or do we only need to be adding the additional lines of tracking code that allow sub-domains to be tracked? The web developers are using asynchronous code however half is in the and the other half is at the bottom of the source code. Given that traffic is being reported in the Google Analytics account should I have any concerns that the code is split? I have done a lot of reading but seem to be going around in circles, so your help is much appreciated! Thanks,
Reporting & Analytics | | Unity
Dinny0 -
Google Analytics Organic search queries aren't being updated, even though I'm still seeing results in all our typical results pages.
We pushed some new changes to the site and Google Analytics is no longer updating the Organic Search queries listing, even though traffic is consistent and and we're still landing results in all our typical keyword searches. Any ideas?
Reporting & Analytics | | unclekaos0 -
Can you help me figure out what happened to my website search results in Google?
On or about the 24th of April I noticed an abrupt decrease in traffic to my website:
Reporting & Analytics | | rdominey
http://www.getyourphotosoncanvas.com Sorry this might be long but I’m trying to be as thorough as possible. I thought that I had been hacked, a virus, maybe penalized by Google I don’t know what ? I submitted a reconsideration request to Google and they responded with the following: Reconsideration request for http://www.getyourphotosoncanvas.com/: No manual spam actions found
May 10, 2012
Dear site owner or webmaster of http://www.getyourphotosoncanvas.com/,
We received a request from a site owner to reconsider http://www.getyourphotosoncanvas.com/ for compliance with Google's Webmaster Guidelines. - - - - - -
We reviewed your site and found no manual actions by the webspam team that might affect your site's ranking in Google. There's no need to file a reconsideration request for your site, because any ranking issues you may be experiencing are not related to a manual action taken by the webspam team.
Google Search Quality Team I have ran all kinds of web crawl tests, Google webmaster, talked with SEO “Experts” and still can not figure out what is happening. I decided to use a couple of SEOmoz tools to try to help me explain what is happening. I figured that if I could take a very specific and unique KeyPhrase and run it on a specific page that I might be able to better explain what is happening. Basically, We appear to be no longer searchable by key words or phrases on google?
Here is an example:
Key Phrase: Free Services to Help Improve Your Photos on Canvas
Website: http://www.getyourphotosoncanvas.com/free-photo-canvas-retouching
Attached are some screen shots of the actual search results on Bing, Yahoo and Google along with the ranking tool results from SEOmoz and the on page grade for the key phrase.
Anybody got any Ideas? I am hurting; the internet and Google search is about 40% of by business. http://www.getyourphotosoncanvas.com/wp-content/uploads/2012/05/Bing-Free-Services.jpg http://www.getyourphotosoncanvas.com/wp-content/uploads/2012/05/Yahoo-Free-Services.jpg http://www.getyourphotosoncanvas.com/wp-content/uploads/2012/05/Google-Free-Services.jpg http://www.getyourphotosoncanvas.com/wp-content/uploads/2012/05/SEOmoz-Ranking.jpg http://www.getyourphotosoncanvas.com/wp-content/uploads/2012/05/SEOmoz-Report-Card.jpg [" target="_blank">iframe>](<iframe class=) Bing-Free-Services.jpg Yahoo-Free-Services.jpg Google-Free-Services.jpg SEOmoz-Ranking.jpg SEOmoz-Report-Card.jpg0 -
Problem when searching for "link:www.mysite.com" vs "link: www.mysite.com"
Why does a search for "link:www.mysite.com" show no results, but when there is a space before www.mysite.com it shows results? The same happens for "link:www.mysite.com" (nothing shows up), but when I search for "link:www.mysite.com/index.php" it returns results. Is there a problem I am missing? Thanks so much!
Reporting & Analytics | | EmilyP0 -
Organic search on google
Hi there, pl take a look at this link, there is a section which says shared results and has a star against two agencies and once article. can pl someone let me know what those starts are and how to get them? Thank you 🙂 so sorry - the link is http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=digital+mareketing+agency+los+angeles Vijay
Reporting & Analytics | | vijayvasu0