Unsupervised Machine Learning – Flat Clustering with KMeans with Scikit-learn and Python

This unsupervised machine learning tutorial covers flat clustering: we give the machine an unlabeled data set and tell it how many categories to group the data into.
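
As a rough illustration of that idea, here is a minimal sketch using scikit-learn's `KMeans`. The six points are made up for the example, and `n_init` and `random_state` are set only so that repeated runs give the same result.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative, made-up 2-D points: three near the origin, three further out.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])

# We choose the number of clusters up front; that is what makes this "flat".
# n_init and random_state are set explicitly so repeated runs agree.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

labels = kmeans.labels_              # cluster index assigned to each row of X
centroids = kmeans.cluster_centers_  # one centre per cluster, shape (2, 2)

for point, label in zip(X, labels):
    print(point, "-> cluster", label)
print("centroids:", centroids)
```

The cluster indices themselves (0 or 1) are arbitrary; only the grouping is meaningful.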

19 comments

  1. The Justice Of 666 says:

    Very good video. I’m very interested in hierarchical unsupervised machine
    learning. I made a self-exploring robot, and I want to take its GPS
    coordinates, put them through the algorithm, and find which places it
    “likes”, i.e. the centroids. Later I’m going to label them so I can tell it
    to go to a place without needing to input coordinates.
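
One way the GPS idea above might look with flat clustering is sketched below. The coordinates, the number of places, and the place names are all invented for the example. Plain KMeans on latitude/longitude treats degrees as flat x,y values, which is only a reasonable approximation over a small area.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical GPS log as (latitude, longitude) rows; values are made up.
gps_log = np.array([
    [52.5200, 13.4050], [52.5201, 13.4052], [52.5199, 13.4049],  # spot A
    [52.5300, 13.4200], [52.5302, 13.4198], [52.5299, 13.4201],  # spot B
])

# Assumes the robot has two favourite places; KMeans needs that number given.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(gps_log)

# Attach human-readable names to the learned centroids (names are invented).
# Cluster indices are arbitrary, so look each one up via a known sample.
label_a = kmeans.labels_[0]
label_b = kmeans.labels_[3]
places = {
    "charging dock": kmeans.cluster_centers_[label_a],
    "window": kmeans.cluster_centers_[label_b],
}

def coordinates_for(name):
    """Return the (lat, lon) centroid stored under a place name."""
    lat, lon = places[name]
    return float(lat), float(lon)

print(coordinates_for("window"))
```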

  2. Chris Smith says:

    Instead of re-typing the lists for numpy, you could do:
    X = np.array(zip(x, y))        (Python 2)
    X = np.array(list(zip(x, y)))  (Python 3)
    which is easier for big lists.
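
A small runnable version of that suggestion for Python 3 follows. The `x` and `y` lists are illustrative values, and `np.column_stack` is shown as one possible alternative rather than something from the comment.

```python
import numpy as np

# Illustrative coordinate lists; any two equal-length sequences work.
x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]

# Python 3: zip() returns an iterator, so materialise it before np.array().
X_zip = np.array(list(zip(x, y)))

# NumPy-only alternative that skips the intermediate list of tuples.
X_stack = np.column_stack((x, y))

print(X_zip.shape)                      # (6, 2)
print(np.array_equal(X_zip, X_stack))   # True
```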

  3. brickmaster555 says:

    sentdex, I know this isn’t exactly related to this video, but I really need
    help! I am following your series on how to make a game using Pygame. My
    problem is that whenever I try to run what I have written, it only shows a
    black screen. When I quit the game, my background and car image pop up
    quickly, then the window closes. Here is my code:

    import pygame
    pygame.init()
    display_width=800
    display_height=600

    white=(255,255,255)
    red=(255,0,0)
    cyan=(68,228,201)

    gameDisplay=pygame.display.set_mode((display_width,display_height))
    pygame.display.set_caption("Spider")
    clock=pygame.time.Clock()

    spiderImg=pygame.image.load("newspid.png")

    def spider(x,y):
        gameDisplay.blit(spiderImg,(x,y))

    x=(display_width*0.45)
    y=(display_height*0.8)

    x_change=0

    dead=False
    while not dead:
        for event in pygame.event.get():
            if event.type==pygame.QUIT:
                dead=True

            if event.type==pygame.KEYDOWN:
                if event.key==pygame.K_LEFT:
                    x_change=-5

                elif event.key==pygame.K_RIGHT:
                    x_change=5

            if event.type==pygame.KEYUP:
                if event.key==pygame.K_LEFT or event.key==pygame.K_RIGHT:
                    x_change=0

    x+=x_change

    gameDisplay.fill(cyan)
    spider(x,y)

    pygame.display.update()
    clock.tick(60)

    pygame.quit()
    quit()

    That is all so far. Anything that you notice to be out of place? Thanks :)

  4. nicholas Bradford says:

    I am getting this error:
    centroids = kmeans.cluster_centers_
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    in ()
    ----> 1 centroids = kmeans.clusters_centers_

    AttributeError: 'KMeans' object has no attribute 'clusters_centers_'
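
The traceback above names `clusters_centers_`, with an extra "s", while the attribute that scikit-learn sets on a fitted `KMeans` is `cluster_centers_`. A short check on toy data (values made up) shows the difference:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0]])  # toy data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# The fitted attribute is spelled cluster_centers_ (singular "cluster").
centroids = kmeans.cluster_centers_
print(centroids.shape)  # (2, 2)

# The misspelled name from the traceback does not exist on the estimator.
print(hasattr(kmeans, "clusters_centers_"))  # False
```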

  5. Isra Shabir says:

    Hey sentdex – your videos are great. I was wondering if it’s possible for
    you to post a link to your code files for each video?
    Thanks!

  6. RAC C (thiirane) says:

    Interesting video. I wanted to know whether you could substitute HOG
    features for X and create a KMeans plot for a collection of images. I have
    successfully run the SVM on these data but wanted to see how the data were
    distributed. I tried a simple case of two sets of images. The code ran,
    but there is something I don’t understand about the feature array from HOG
    images. The KMeans fit seems to have labeled the data as 0 or 1, but instead
    of an x,y pair like you have, each feature array looks like this, for
    example: coordinate: [0.00000000e+00 0.00000000e+00 0.00000000e+00 …,
    3.42854831e-16 1.10856395e-15 5.82853213e-16] label: 1. Perhaps if I
    take the log? Do you know anything about how the HOG feature array is set
    up so I can get a meaningful KMeans plot?
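
A HOG descriptor is one long vector per image rather than an x,y pair, so a 2-D plot needs some projection first. The sketch below is one possible approach, not something from the video. It uses random vectors as a stand-in for real HOG output (the length 324 is arbitrary), clusters in the full feature space, and uses PCA only to get two coordinates for plotting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for HOG descriptors: 20 "images", 324 values each (made-up size).
# Two groups are shifted apart so there is something to cluster.
group_a = rng.normal(loc=0.0, scale=0.1, size=(10, 324))
group_b = rng.normal(loc=1.0, scale=0.1, size=(10, 324))
features = np.vstack([group_a, group_b])

# Cluster in the full feature space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Project to 2-D only to obtain plottable coordinates.
coords_2d = PCA(n_components=2).fit_transform(features)

for (px, py), label in zip(coords_2d[:3], labels[:3]):
    print(f"plot at ({px:.2f}, {py:.2f}) with label {label}")
```

Each row of `coords_2d` can then be scattered and coloured by its label, in the same way as the x,y points in the tutorial.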

  7. Soheil P says:

    Hey sentdex, are you planning to do a presentation on Agglomerative
    Clustering by any chance? According to
    , it should
    work for nonlinear data. However, there are many tuning parameters. Any
    comment is appreciated.
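
For readers curious about the question above, here is a minimal sketch of scikit-learn's `AgglomerativeClustering` on a non-linear toy set. The data set (`make_moons`) and the choice of `linkage="single"` are assumptions made for illustration. Other linkages can give quite different groupings on the same data.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

# Two interleaved half-circles: a standard non-linearly-separable toy set.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# linkage is one of the main tuning parameters; "single" tends to follow
# chained shapes, while the default "ward" prefers compact blobs.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)

print(labels[:10])
```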

  8. unique raj says:

    Nice video. If you want to use your own dataset, with X being the feature
    matrix (imported from a CSV file), why can’t we cluster without using PCA?
    I would be really happy to get your response.

    In scikit-learn, is PCA compulsory before KMeans clustering?
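
As far as the scikit-learn API goes, `KMeans` accepts any numeric array of shape (n_samples, n_features), so a PCA step is optional. The sketch below fits directly on data read from CSV text. The inline CSV, its column names, and its values are made up and stand in for a real file.

```python
import io
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a CSV file on disk; the column names and values are made up.
csv_text = """height,width,weight
1.0,2.0,0.5
1.2,1.9,0.4
0.9,2.1,0.6
8.0,9.0,5.0
8.2,8.8,5.1
7.9,9.1,4.9
"""

# skiprows=1 drops the header; the result has shape (n_samples, n_features).
X = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# KMeans is fitted on all three columns directly, with no PCA step.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(X.shape, kmeans.labels_)
```

PCA is mainly useful afterwards, for reducing the data to two or three dimensions so the clusters can be plotted.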

  9. Yew Jie says:

    Nice tutorial! Just one question: how can I plot the graph if the array
    contains more than 2 features? E.g. [[1,4,3.5], [x,y,z], [x2,y2,z2], …]
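
With exactly three features, one option is a 3-D scatter plot in matplotlib, sketched below with made-up points. The output file name is arbitrary and the `Agg` backend is chosen only so the sketch runs without a display. For more than three features, a projection such as PCA would be needed first.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Made-up points with three features each.
X = np.array([[1, 4, 3.5], [1.2, 3.8, 3.0], [0.8, 4.2, 3.6],
              [7, 9, 8.0], [7.5, 8.5, 8.2], [6.8, 9.1, 7.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # 3-D axes: one axis per feature

# Colour each point by its cluster label, then mark the centroids.
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=kmeans.labels_)
centres = kmeans.cluster_centers_
ax.scatter(centres[:, 0], centres[:, 1], centres[:, 2], marker="x", s=150)

# Write the figure to the system temp directory (file name is arbitrary).
out_path = os.path.join(tempfile.gettempdir(), "kmeans_3d.png")
fig.savefig(out_path)
```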
