silikoncamping.blogg.se

Visual studio 2017 download from url with progress bar

What you see here is a co-occurrence matrix:

- ‘javascript’ shows a relation to ‘php’, ‘html’, ‘css’, ‘node.js’, and ‘jquery’.
- ‘machine-learning’ shows a relation to ‘python’, but not the other way around.
- ‘multi-threading’ shows a relation to ‘python’, ‘java’, ‘c#’, and ‘android’.
- ‘unit-testing’ shows a relation to almost every column here, except to ‘php’, ‘html’, ‘css’, and ‘jquery’.

You can reduce or increase the sensitivity of these relations with the percent threshold:

,IFNULL(ANY_VALUE(IF(tag2='javascript',1,null)),0) Xjavascript
,IFNULL(ANY_VALUE(IF(tag2='python',1,null)),0) Xpython
,IFNULL(ANY_VALUE(IF(tag2='android',1,null)),0) Xandroid
,IFNULL(ANY_VALUE(IF(tag2='jquery',1,null)),0) Xjquery
FROM `deleting.stack_overflow_tag_co_ocurrence`

Now - instead of using this small table, let’s use the whole table to compute k-means with BigQuery. With this line, I’m creating a one-hot encoding string that I can use later to define the 4,000+ columns I’ll use for k-means:

FORMAT("IFNULL(ANY_VALUE(IF(tag2='%s',1,null)),0)X%s", tag2, REPLACE(REPLACE(REPLACE(REPLACE(tag2,'-','_'),'.','D'),'#','H'),'+','P'))

And training a k-means model in BigQuery is really easy:

CREATE MODEL `deleting.kmeans_tagsubtag_50_big_a_01`

Now we wait while BigQuery shows us the progress of our training. When it’s done, we even get an evaluation of our model.

Do we really need 4,000 one-hot encoded dimensions to obtain better clusters? Turns out that 500 are enough - and I like the results better. It also reduces the time for training the model in BigQuery from 24 minutes to 3.
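To make the k-means step concrete, here is a minimal sketch in plain Python rather than BigQuery ML. It clusters a handful of one-hot rows with Lloyd's algorithm. The sample rows, the function names, and the farthest-first initialization are all assumptions made for illustration; the post does not say how BigQuery ML initializes or iterates internally.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not BigQuery ML's method):
    start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=50):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the assignment stops changing."""
    centroids = farthest_first(points, k)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical one-hot rows; columns: Xpython, Xjavascript, Xandroid, Xjquery.
tags = ["pandas", "numpy", "django", "css", "html", "ajax"]
rows = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]
assignment, centroids = kmeans(rows, k=2)
clusters = dict(zip(tags, assignment))
```

On this toy input the Python-leaning tags and the web-leaning tags end up in separate clusters. With thousands of columns the same loop applies, which is one reason fewer dimensions train faster.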

SELECT *, MAX(questions) OVER(PARTITION BY tag1) questions_tag1
FROM data, UNNEST(SPLIT(tags, '|')) tag1, UNNEST(SPLIT(tags, '|')) tag2
WHERE tag1 IN (SELECT tag FROM active_tags)
AND tag2 IN (SELECT tag FROM active_tags)

BigQuery ML does a good job of one-hot encoding strings, but it doesn’t handle arrays as I wish it did (stay tuned). So I’m going to create a string first that will define all the columns where I want to find co-occurrence. Then I can use that string to get a huge table, with a 1 for every time a tag co-occurs with the main one at least a certain % of the time.

Let’s first look at a subset of these results.
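As an illustration of the "create a string first" idea, the sketch below mirrors in Python the FORMAT pattern and nested REPLACE calls quoted in this post. The function names are hypothetical, and the sketch assumes tags contain no single quotes, since it does no SQL escaping.

```python
def column_alias(tag):
    """Build a column-safe alias the way the nested REPLACE calls do:
    '-' becomes '_', '.' becomes 'D', '#' becomes 'H', '+' becomes 'P',
    with an 'X' prefix."""
    safe = (
        tag.replace("-", "_")
        .replace(".", "D")
        .replace("#", "H")
        .replace("+", "P")
    )
    return "X" + safe


def one_hot_expression(tag):
    """Render one column expression, matching the FORMAT template
    "IFNULL(ANY_VALUE(IF(tag2='%s',1,null)),0)X%s"."""
    return "IFNULL(ANY_VALUE(IF(tag2='%s',1,null)),0)%s" % (tag, column_alias(tag))


# Hypothetical tag list; the real query would generate one entry per active tag.
tags = ["javascript", "node.js", "c#", "c++", "unit-testing"]
select_list = ",\n".join(one_hot_expression(t) for t in tags)
```

The aliases matter because characters such as `.`, `#`, and `+` are not valid in unquoted column names, so each tag needs a sanitized stand-in.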

SELECT *, questions/questions_tag1 percent

In this picture I only have 240 tags - how would you group and categorize 4,000+ of them?

FROM `fh-bigquery.stackoverflow_archive.201906_posts_questions`,
ORDER BY 2 DESC

Top Stack Overflow tags by number of questions.

Let’s find tags that usually go together:

Co-occurring tags on Stack Overflow questions

So I’ll take these relationships and I’ll save them on an auxiliary table - plus a percentage of how frequently a relationship happens for each tag.

CREATE OR REPLACE TABLE `deleting.stack_overflow_tag_co_ocurrence`
FROM `fh-bigquery.stackoverflow_archive.201906_posts_questions`
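The co-occurrence table and its percentage column can be sketched in plain Python. This is an illustration of the idea, not the author's query: the function names and the sample questions are hypothetical. Every tag is paired with every tag on the same question, including itself, and each pair count is then divided by the largest count seen for its first tag.

```python
from collections import Counter
from itertools import product


def tag_cooccurrence(questions):
    """Count ordered (tag1, tag2) pairs over pipe-delimited tag strings.

    Pairing each tag with every tag on the same question (itself included)
    mirrors cross-joining two UNNEST(SPLIT(tags, '|')) calls.
    """
    counts = Counter()
    for tags in questions:
        tag_list = tags.split("|")
        counts.update(product(tag_list, repeat=2))
    return counts


def with_percent(counts):
    """Compute questions / questions_tag1 for every pair, where
    questions_tag1 is the maximum count among pairs sharing tag1,
    like MAX(questions) OVER (PARTITION BY tag1)."""
    max_per_tag1 = {}
    for (tag1, _), n in counts.items():
        max_per_tag1[tag1] = max(max_per_tag1.get(tag1, 0), n)
    return {pair: n / max_per_tag1[pair[0]] for pair, n in counts.items()}


# Hypothetical sample, not taken from the Stack Overflow dataset.
sample = [
    "python|pandas",
    "python|numpy",
    "python|pandas|numpy",
    "javascript|jquery",
]
counts = tag_cooccurrence(sample)
percent = with_percent(counts)
```

The percentage is asymmetric by design: in this sample every ‘pandas’ question is also a ‘python’ question, but only two of three ‘python’ questions mention ‘pandas’.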

How would you group more than 4,000 active Stack Overflow tags into meaningful groups? This is a perfect task for unsupervised learning and k-means clustering - and now you can do all this inside BigQuery.

Visualizing a universe of clustered tags.

Felipe Hoffa is a Developer Advocate for Google Cloud. In this post he works with BigQuery – Google’s serverless data warehouse – to run k-means clustering over Stack Overflow’s published dataset, which is refreshed and uploaded to Google Cloud once a quarter. You can check out more about working with Stack Overflow data and BigQuery here and here.

These are the most active Stack Overflow tags since 2018 - there are a lot of them.







