Questions tagged [hive]

Questions about Hive and HiveQL.

Apache Hive is a data warehouse tool for working with large data sets.

Main Site: https://hive.apache.org

34 questions
5
votes
1 answer

Cumulative sum using hiveql

I have a table in Hive which looks like: col1 col2 b 1 b 2 a 3 b 2 c 4 c 5 How do I, with hiveql, group up col1 elements together, sum them up, sort by the sum, as well as create a…
klx123
  • 53
  • 1
  • 1
  • 4
4
votes
1 answer

correlate subquery in hive

SELECT ndo.sku ParentSKU, visitsWhenSKUWasOnline.s FROM Temp.NumberOfDaysOnline ndo JOIN ( SELECT SUM(gasessiondata.sessions) as s FROM gasessiondata WHERE gasessiondata.date >= ndo.FromDate AND gasessiondata.date <=…
systemdebt
  • 169
  • 3
  • 10
3
votes
1 answer

Getting the row before a row with a certain value in SQL

I have a table like the one below, where user actions are stored with a timestamp. My goal is to identify the action that happened before a specific action (named reference_action) and count the number of those actions to see which actions happen before the…
2
votes
1 answer

Connecting to cluster with cqlsh returns "Unable to connect to any servers"

I am trying to deploy The Hive 4 on a VMware Workstation 17 player VM to test Splunk integration with The Hive. I am following the guide at this link, but I encountered an error at one of the stages, namely when launching cassandra using the cqlsh…
aimakovm
  • 21
  • 1
  • 3
2
votes
1 answer

joining the where clause to select statement of two different tables

How can I properly structure the below query to make it work? I would like to have the Ref_CD = MBR_ID_TYPE_ID to be my select statement on the beginning of it. select MBR_ID_TYP_ID || '-' || ( select ref_desc from ref where…
Ina gurey
  • 21
  • 2
2
votes
2 answers

How can I group the null results in this FULL OUTER JOIN with non-null responses?

I am looking at the overlap and non-overlap (unique values) of user IDs from two different select statements using a full join. The main difference is that one table will have a deal_id = 0 and the other will have any deal_id greater than…
userLP
  • 31
  • 1
  • 7
2
votes
1 answer

How to insert data into the extra columns of a target Avro table when the source table has fewer columns than the target, using Hive or Impala?

Suppose I have a source Avro table with 10 columns and a target Avro table with 12 columns. While inserting data into the target table, I need to add null values to the extra 2 columns. But when I execute the below query it has thrown…
user109612
  • 21
  • 2
1
vote
0 answers

How to pass multiple values into hive hql for the same hivevar

Requirement: my HQL script below needs values passed into the where clause dynamically. How do I pass them using hivevar in the specific scenario below, where multiple values are expected? Or how do I invoke the HQL with hivevar…
RaCh
  • 11
  • 1
1
vote
1 answer

Metastore(Mysql) bottleneck for Hive

We have a Hive installation that uses MariaDB as its metastore database. MariaDB holds ~250 GB of metadata with ~100 GB of indexes. It becomes terribly slow during the peak load of 40-60K QPS. Looking to the community to share similar experiences, if any…
Shakti Garg
  • 111
  • 1
1
vote
1 answer

For each tuple, get the name of the first column which is non-zero

I have a table in Hive which looks like: | Name | 1990 | 1991 | 1992 | 1993 | 1994 | | Rex | 0 | 0 | 1 | 1 | 1 | | Max | 0 | 0 | 0 | 0 | 1 | | Phil | 1 | 1 | 1 | 1 | 1 | I would like to get, for each…
user2891462
  • 113
  • 3
1
vote
1 answer

How to check total allotted space inside a HDFS 'group'

Our DBA has created a schema for our team in HDFS/HIVE. Not sure if 'schema' is the right word, they call it a 'group'. Anyway, we can only write to the data lake inside this schema, whether it is parquet files or hive tables. Is there a way to…
Victor
  • 127
  • 1
  • 4
1
vote
0 answers

Defining external table on JSON with an @ sign in an element

I need to define a Hive external table onto a JSON file that has @ signs in its elements, e.g. { "data": { "@type": "person", "name": "Phil", "job": "Programmer" } } This works: create external table sandbox.test_table ( data…
PhilHibbs
  • 539
  • 1
  • 7
  • 22
1
vote
1 answer

HIVE SQL [Error 10025]: Expression not in GROUP BY key

Here is my SQL statement: my_table includes 10+ columns (e.g., day, ip_address, user, request, etc.), including strings and numbers. I want to GROUP BY & HAVING based on column 'ip_address', if it has more than 20 records. SELECT day, ip_address, user,…
TJCLK
  • 127
  • 1
  • 6
1
vote
1 answer

SQL filter only if each unique value has more than N records

Here is my sample SQL statement: SELECT DAY, name, value FROM my_table WHERE DAY = '${date}' GROUP BY DAY name, value ORDER BY name ASC For example, 3 unique names in 'name' column: Alice, Bob, Clark. Alice has 5…
TJCLK
  • 127
  • 1
  • 6
1
vote
0 answers

HIVE + understanding the hive-metastore logs

We have an HDP cluster, version 2.6.4, on which we run a Spark Streaming app, and we use a Presto cluster to run Hive queries. When we look at the hivemetastore logs (under /var/log/hive), we can see the following warnings, which repeat…
King David
  • 111
  • 1
  • 4