Learn Elasticsearch with Hollywood movies
I am tired of boring Elasticsearch tutorials. Learning should be interactive; it shouldn't feel like reading lengthy technical documentation. Today we are going to learn the basics of Elasticsearch using a movie database. I have a small database of titles ready for you to import.
Don't we need some theory? You didn't click to view the typical technical blog post. If you are stubborn, jump to the workshop section and learn on the fly. Some people learn better this way, and there is nothing wrong with that. If you want to get to know some essentials first, go through the Slideshare I created.
Elasticsearch from PTSD Engineer
Before we begin, clone this repo: git clone git@github.com:ptsdengineer/learn-elasticsearch-with-hollywood-movies.git
or download zip
Installing Elasticsearch and seeding data
For Mac users: the most comfortable way is to install it with Homebrew: brew install elasticsearch
For Linux users: please follow this tutorial: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-elasticsearch-on-ubuntu-14-04
Run Elasticsearch
On Mac:
elasticsearch
On Ubuntu:
sudo service elasticsearch start
Check if it's running on http://localhost:9200
Run using docker
If you prefer Docker, you can use it for this tutorial too. I have a docker-compose file ready.
docker-compose up
Credentials: the user is elastic and the password is changeme.
Install API testing tool
Use Postman, curl, or Insomnia for making calls to the Elasticsearch API. I can't recommend Insomnia enough; this piece of software makes life so much easier.
Creating data
Create an index for the movies. It will hold all the movie documents that we will import in a minute. Open Insomnia and make your first call.
Now let's import data into our index. I prepared a JSON file with all the documents, ready to be sent in a single request.
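The call itself is not shown in the text, so here is a rough sketch of what creating an index usually looks like: a PUT request to the index name. It assumes the index is called movies (the name that shows up later in the search results) and that Elasticsearch listens on localhost:9200. The request is only built, not sent, so you can run this without a live node.

```python
import urllib.request

# Assumption: the node address from the "Check if it's running" step.
ES_URL = "http://localhost:9200"


def build_create_index_request(index_name: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an empty index."""
    return urllib.request.Request(f"{ES_URL}/{index_name}", method="PUT")


request = build_create_index_request("movies")
print(request.get_method(), request.full_url)
# prints: PUT http://localhost:9200/movies

# To send it against a running node, uncomment the next line:
# urllib.request.urlopen(request)
```

In Insomnia the same thing is a PUT request to http://localhost:9200/movies with an empty body.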
curl -s --header "Content-Type:application/json" -XPOST localhost:9200/_bulk --data-binary @movies.json
Bulk import of movies from json
*Use the -u option to pass the user and password when running with Docker.
If you want to learn more about this feature: Bulk import
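If you are curious what a bulk file looks like inside, here is a small sketch. The _bulk endpoint reads newline-delimited JSON: one action line, then one document line, and the body has to end with a newline. The helper name to_bulk_body and the sample documents are made up for illustration. The id "1" is hypothetical, while "139" is the id shown later in this tutorial. The results in this tutorial also show a _type of movie, so the real movies.json may carry a _type entry that is omitted here.

```python
import json


def to_bulk_body(index_name, documents):
    """Turn (id, document) pairs into the newline-delimited body that _bulk expects."""
    lines = []
    for doc_id, document in documents:
        # Action line: tells Elasticsearch where the next document goes.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(document))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample; the real file holds the whole movie database.
sample = [
    ("1", {"title": "Scarface"}),
    ("139", {"title": "Captain America: The First Avenger"}),
]
body = to_bulk_body("movies", sample)
print(body, end="")
```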
Match all
Let's make the most straightforward possible query to our movies index: the one that returns all results. It's called a match all query.
GET <name_of_index>/_search
{
"query": {
"match_all": {}
}
}
You should get this type of result in response:
"hits": {
"total": 306,
"max_score": 1,
Exercise
Type this query into Insomnia, with movies in place of <name_of_index>. You will get results for the movies index.
String query
Still a straightforward query: this time we search only for a particular string.
GET /_search
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "this AND that OR thus"
}
}
}
Exercise.
Using this knowledge, find the movie Scarface in Elasticsearch. There should be only one result.
Operators
Let's build on this. We want to extend our search capabilities. Elasticsearch uses operators like those in programming. By default it uses OR, so a document matches if it contains any of the query terms, but we can use AND to require every term to be present.
GET /_search
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "Strawberry pie with jello",
"default_operator": "AND"
}
}
}
Now we can be sure that we will only get the recipes we are interested in.
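To get a feel for the difference between OR and AND, here is a toy sketch that runs locally on a list of titles. It is not how Elasticsearch works internally (real text analysis and scoring are more involved). It only lowercases the text and splits it into words. The three titles are an illustrative sample, not the contents of the movie database.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of words (a crude analyzer)."""
    return set(re.findall(r"\w+", text.lower()))


def matches(query, text, operator="OR"):
    """OR: at least one query term is present. AND: every query term is present."""
    query_terms = tokens(query)
    doc_terms = tokens(text)
    if operator == "AND":
        return query_terms <= doc_terms
    return bool(query_terms & doc_terms)


# Illustrative sample only.
titles = [
    "Captain America: The First Avenger",
    "Captain America: Civil War",
    "The Avengers",
]
query = "Captain America first avenger"

or_hits = [title for title in titles if matches(query, title, "OR")]
and_hits = [title for title in titles if matches(query, title, "AND")]
print(or_hits)   # the two Captain America titles
print(and_hits)  # only "Captain America: The First Avenger"
```

With OR, sharing a single word is enough, so both Captain America titles match. With AND, only the title that contains all four words survives.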
Exercise.
Make a query to Elasticsearch that returns only one result for the query Captain America first avenger:
"hits": {
"total": 1,
"max_score": 11.263437,
"hits": [
{
"_index": "movies",
"_type": "movie",
"_id": "139",
"_score": 11.263437,
"_source": {
"title": "Captain America: The First Avenger",
"plot": "Predominantly set during World War II, Steve Rogers is a sickly man from Brooklyn who's transformed into super-soldier Captain America to aid in the war effort. Rogers must stop the Red Skull - Adolf Hitler's ruthless head of weaponry, and the leader of an organization that intends to use a mysterious device of untold powers for world domination.",
"genres": null,
Fuzziness
What about cases when users mistype the query? We should handle those too. Fortunately, Elasticsearch has an answer: the match query with the fuzziness parameter. The match query is a simpler cousin of the string query.
"query": {
"match": {
"text": {
"query": "jomped over me!",
"fuzziness": "AUTO",
"operator": "and"
}
}
}
"fuzziness": "AUTO"
generates an edit distance based on the length of the term:
0..2 = must match exactly
3..5 = one edit allowed
>5 = two edits allowed
You could also use number values, like 0, 1, 2. Fuzziness is interpreted as Levenshtein Edit Distance. More about: fuzziness
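The AUTO rule and the edit distance can be sketched in a few lines. This is a local illustration of the rule as described above, using plain Levenshtein distance. The function names auto_fuzziness and levenshtein are made up here, and Elasticsearch's real fuzzy matching differs in its details.

```python
def auto_fuzziness(term: str) -> int:
    """Number of edits allowed for a term under the AUTO rule described above."""
    length = len(term)
    if length <= 2:
        return 0  # 0..2 characters: must match exactly
    if length <= 5:
        return 1  # 3..5 characters: one edit allowed
    return 2      # more than 5 characters: two edits allowed


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


typed, indexed = "jomped", "jumped"
distance = levenshtein(typed, indexed)
allowed = auto_fuzziness(typed)
print(distance, allowed, distance <= allowed)
# prints: 1 2 True
```

The mistyped "jomped" is one substitution away from "jumped", and a six-letter term is allowed two edits, so it still matches.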
Exercise.
Write a query that will return all Captain America movies based on a query, which was mistyped: "Captaon America"
Filtering
We are using a range query. It matches documents with fields that have terms within a certain range. The Lucene query type depends on the field type: for string fields it is a TermRangeQuery, while for number/date fields it is a NumericRangeQuery. The following example returns all documents where age is between 10 and 20:
GET _search
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
gte = Greater-than or equal to
gt = Greater-than
lte = Less-than or equal to
lt = Less-than
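The four operators map directly onto ordinary comparisons. Here is a small local sketch of that mapping. The in_range helper and the runtimes list are hypothetical, and the helper accepts only the four bound keys (not extras like boost).

```python
import operator

# Each range keyword corresponds to one comparison.
RANGE_OPERATORS = {
    "gte": operator.ge,  # greater-than or equal to
    "gt": operator.gt,   # greater-than
    "lte": operator.le,  # less-than or equal to
    "lt": operator.lt,   # less-than
}


def in_range(value, bounds):
    """True when the value satisfies every bound, e.g. {"gte": 10, "lte": 20}."""
    return all(RANGE_OPERATORS[name](value, limit) for name, limit in bounds.items())


# Hypothetical values to show inclusive versus exclusive bounds.
runtimes = [45, 60, 75, 90, 120]
inclusive = [r for r in runtimes if in_range(r, {"gte": 60, "lte": 90})]
exclusive = [r for r in runtimes if in_range(r, {"gt": 60, "lt": 90})]
print(inclusive)  # [60, 75, 90]
print(exclusive)  # [75]
```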
Exercise.
Create a query that would return movies with a running time between 60 and 90 minutes.
It should return 57 results.
Bool query
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
must - The clause (query) must appear in matching documents and will contribute to the score.
filter - The clause (query) must appear in matching documents, but it is executed in filter context: scoring is ignored and clauses are considered for caching.
should - The clause (query) should appear in the matching document.
must_not - The clause (query) must not appear in the matching documents.
Example query:
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tag" : "wow" } },
{ "term" : { "tag" : "elasticsearch" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
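Here is a toy model of how the clauses of the example query combine. It is only a sketch: every matching must or should clause adds exactly 1 to the score, which is far simpler than real relevance scoring. The helpers bool_score, term, and age_between and the sample documents are all made up for illustration.

```python
def bool_score(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=0):
    """Return a toy score for the document, or None when it is excluded."""
    if not all(clause(doc) for clause in must):
        return None  # every must clause has to match
    if not all(clause(doc) for clause in filters):
        return None  # filters have to match too, but add nothing to the score
    if any(clause(doc) for clause in must_not):
        return None  # a single must_not match excludes the document
    should_hits = sum(1 for clause in should if clause(doc))
    if should_hits < minimum_should_match:
        return None
    # More matches is better: matching must and should clauses are added up.
    return len(must) + should_hits


def term(field, value):
    """Clause that matches when the field equals the value or contains it."""
    def clause(doc):
        field_value = doc.get(field)
        if isinstance(field_value, list):
            return value in field_value
        return field_value == value
    return clause


def age_between(low, high):
    """Clause that matches when the age lies in the inclusive range."""
    return lambda doc: low <= doc["age"] <= high


query = dict(
    must=[term("user", "kimchy")],
    filters=[term("tag", "tech")],
    must_not=[age_between(10, 20)],
    should=[term("tag", "wow"), term("tag", "elasticsearch")],
    minimum_should_match=1,
)

# Hypothetical documents.
docs = [
    {"user": "kimchy", "tag": ["tech", "wow"], "age": 30},
    {"user": "kimchy", "tag": ["tech"], "age": 30},
    {"user": "kimchy", "tag": ["tech", "wow", "elasticsearch"], "age": 15},
    {"user": "kimchy", "tag": ["tech", "wow", "elasticsearch"], "age": 42},
]
scores = [bool_score(doc, **query) for doc in docs]
print(scores)  # [2, None, None, 3]
```

The second document fails minimum_should_match, the third is thrown out by must_not, and the last one scores highest because it matches both should clauses.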
Exercise.
Create a query that will find superhero movies (keywords field: superhero) that are no longer than 120 minutes and no shorter than 60 minutes (field runtime) and must not have Robert Downey Jr. as a starring actor (actors field).
You should get 12 results for this query.
Aggregations
Let's get some interesting stats for analytics. We want an overall view of how a value is distributed across the documents. The stats aggregation gives us the count, minimum value, maximum value, average, and sum. It's useful for getting overall insight.
{
"aggs" : {
"grades_stats" : { "stats" : { "field" : "grade" } }
}
}
and returns:
{
...
"aggregations": {
"grades_stats": {
"count": 6,
"min": 60,
"max": 98,
"avg": 78.5,
"sum": 471
}
}
}
Read more about aggregations here
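The five numbers in that response are easy to reproduce by hand. In the sketch below, the grades list is a hypothetical set of six values chosen so that it yields the same figures as the response above. The actual documents behind that example are not shown.

```python
def stats(values):
    """Compute the same five figures the stats aggregation reports."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


# Hypothetical grades that add up to the response shown above.
grades = [60, 70, 75, 80, 88, 98]
grades_stats = stats(grades)
print(grades_stats)
# prints: {'count': 6, 'min': 60, 'max': 98, 'avg': 78.5, 'sum': 471}
```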
Exercise.
Get overall data for ratings in movies: min, max, average. Do that using the stats aggregation.
Range Aggregation
A multi-bucket, value-source-based aggregation that enables the user to define a set of ranges, each representing a bucket. Each range includes the from value and excludes the to value.
GET products/_search?size=0
{
"aggs": {
"weight_ranges": {
"range": {
"field": "weight",
"ranges": [
{
"to": 500
},
{
"from": 500,
"to": 1000
},
{
"from": 1000,
"to": 1500
}
]
}
}
}
}
Returns aggregated data:
{
...
"aggregations": {
"weight_ranges" : {
"buckets": [
{
"to": 500,
"doc_count": 20
},
{
"from": 500,
"to": 1000,
"doc_count": 4
},
{
"from": 1000,
"to": 1500,
"doc_count": 4
}
]
}
}
}
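Here is a local sketch of how values land in range buckets. The one thing worth remembering is that from is inclusive and to is exclusive, so a value equal to a boundary goes into the bucket that starts there. The range_buckets helper and the weights are hypothetical.

```python
def range_buckets(values, ranges):
    """Count values per range; "from" is inclusive, "to" is exclusive."""
    buckets = []
    for bounds in ranges:
        low = bounds.get("from", float("-inf"))  # open-ended when "from" is missing
        high = bounds.get("to", float("inf"))    # open-ended when "to" is missing
        doc_count = sum(1 for value in values if low <= value < high)
        buckets.append({**bounds, "doc_count": doc_count})
    return buckets


ranges = [
    {"to": 500},
    {"from": 500, "to": 1000},
    {"from": 1000, "to": 1500},
]
# Hypothetical weights, picked to sit right on the boundaries.
weights = [100, 499, 500, 999, 1000, 1500]
buckets = range_buckets(weights, ranges)
for bucket in buckets:
    print(bucket)
```

The value 500 falls into the second bucket, 1000 into the third, and 1500 into none of them, because the last range stops just short of 1500.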
Exercise.
Using a range aggregation, count how many movies fall into specific runtimes: below 60 minutes, between 60 and 75 minutes, and between 90 and 120 minutes.
Histogram aggregation
We can also use a histogram to bucket data instead of ranges. It's useful for prices in shops, so we can see how many prices fall into each fixed-width interval.
POST /sales/_search?size=0
{
"aggs" : {
"prices" : {
"histogram" : {
"field" : "price",
"interval" : 50
}
}
}
}
Would return:
{
...
"aggregations": {
"prices" : {
"buckets": [
{
"key": 0.0,
"doc_count": 1
},
{
"key": 50.0,
"doc_count": 1
},
{
"key": 100.0,
"doc_count": 0
},
{
"key": 150.0,
"doc_count": 2
},
{
"key": 200.0,
"doc_count": 3
}
]
}
}
}
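Each value is assigned to a bucket by rounding it down to the nearest multiple of the interval. The sketch below does that locally with a bucket width of 50. The prices are hypothetical, picked so that the output has the same keys and counts as the response above, including the empty bucket at 100.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Bucket values by key = floor(value / interval) * interval, keeping empty buckets."""
    counts = Counter(math.floor(value / interval) * interval for value in values)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Walk every key between the first and last bucket so gaps show up with 0.
    return [
        {"key": float(key), "doc_count": counts.get(key, 0)}
        for key in range(first, last + interval, interval)
    ]


# Hypothetical prices.
prices = [10, 80, 150, 175, 200, 200, 210]
price_buckets = histogram(prices, 50)
for bucket in price_buckets:
    print(bucket)
```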
Exercise.
Create a histogram aggregation for the rating field in movies with an interval of 1.
Sorting
Allows adding one or more sorts on specific fields. Each sort can be reversed as well. The sort is defined on a per-field level, with the special field names _score to sort by score and _doc to sort by index order.
GET /my_index/my_type/_search
{
"sort" : [
{ "post_date" : {"order" : "asc"}},
"user",
{ "name" : "desc" },
{ "age" : "desc" },
"_score"
],
"query" : {
"term" : { "user" : "kimchy" }
}
}
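Sorting on several fields means the first field decides the order, and later fields only break ties. Here is a local sketch of that idea with per-field direction. The multi_sort helper, the field names year and rating, and the documents are all hypothetical.

```python
def multi_sort(docs, sort_spec):
    """Sort by several (field, order) pairs; earlier pairs take priority."""
    ordered = list(docs)
    # Python's sort is stable, so sorting by the least important field first
    # and the most important field last gives the combined ordering.
    for field, order in reversed(sort_spec):
        ordered.sort(key=lambda doc: doc[field], reverse=(order == "desc"))
    return ordered


# Hypothetical documents.
films = [
    {"title": "B", "year": 2014, "rating": 7.7},
    {"title": "A", "year": 2011, "rating": 6.9},
    {"title": "C", "year": 2014, "rating": 7.9},
]
result = multi_sort(films, [("year", "asc"), ("rating", "desc")])
print([film["title"] for film in result])
# prints: ['A', 'C', 'B']
```

The two films from 2014 tie on the first field, so the second field (rating, descending) decides which one comes first.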
Exercise.
Sort Captain America movies by release date in ascending order, the oldest film first. You should display only Captain America movies here. Keep results relevant.
Highlighting
Elasticsearch allows highlighting search results in one or more fields. It's useful for the results page because it visually communicates where the query appears in the searched field.
GET /_search
{
"query" : {
"match": { "content": "kimchy" }
},
"highlight" : {
"pre_tags" : ["<tag1>"],
"post_tags" : ["</tag1>"],
"fields" : {
"content" : {}
}
}
}
Exercise.
Add highlighting to a query that searches film plots for "terrorist attack". It should return highlighted fields with tags like this:
"highlight": {
"plot": [
"Jack Ryan, as a young covert CIA analyst, uncovers a Russian plot to crash the U.S. economy with a <highlight>terrorist</highlight> <highlight>attack</highlight>."
]
}
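The tag wrapping in that result can be imitated locally with a regular expression. This is only a sketch of the idea behind pre_tags and post_tags. The real highlighter works on analyzed text and returns fragments. The highlight function here is made up, and the sentence is a shortened piece of the plot above.

```python
import re


def highlight(text, query, pre_tag="<tag1>", post_tag="</tag1>"):
    """Wrap every whole-word occurrence of a query term in the given tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: pre_tag + match.group(0) + post_tag, text)


plot = "a Russian plot to crash the U.S. economy with a terrorist attack."
highlighted = highlight(plot, "terrorist attack", "<highlight>", "</highlight>")
print(highlighted)
# prints: a Russian plot to crash the U.S. economy with a <highlight>terrorist</highlight> <highlight>attack</highlight>.
```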
Pagination
You can create pagination by passing the size and from parameters to the query. The size dictates the number of elements on a page, and from works as an offset.
For pages 1 to 3:
GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
They can be passed in the body too.
{
"query": {
"match_all": {}
},
"size": 5
}
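Turning a page number into size and from is a one-line calculation. Here is a small sketch. The page_params helper is made up, and pages are numbered from 1.

```python
def page_params(page, size=5):
    """Translate a 1-based page number into the size and from parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Skip all the results that belong to the previous pages.
    return {"size": size, "from": (page - 1) * size}


for page in (1, 2, 3):
    print(page, page_params(page))
# prints:
# 1 {'size': 5, 'from': 0}
# 2 {'size': 5, 'from': 5}
# 3 {'size': 5, 'from': 10}
```

Page 1 gets from 0, which is also the default when from is left out. That is why the first request above only passes size.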
Exercise.
Create pagination for movies with the genre action.
Final task
Put your knowledge to good use and create a movie recommendation query that takes many parameters, including plot, actors, title, and release date.
Requirements:
- Favor movies with a higher rating. They should have a higher score but stay relevant.
- The algorithm should prefer newer movies.
- Favor shorter films over longer ones.
You can also play around with it further and add extra powers to it.
Send me your answers to ptsdengineer@protonmail.com. I will post the best ones in the upcoming blog post. Get creative!
Best regards,
PTSD Engineer